CASAS.data¶
CASAS.data implements pyActLearn.CASAS.data.CASASData.
Sensor Event File Format¶
pyActLearn.CASAS.data.CASASData can load raw smart home sensor event logs from either raw text (legacy) files or comma-separated (.csv) files.
Legacy format¶
Here is a snippet of smart home sensor event logs in raw text format:
2009-06-01 17:51:20.055202 M046 ON
2009-06-01 17:51:22.036689 M046 OFF
2009-06-01 17:51:28.053264 M046 ON
2009-06-01 17:51:30.072223 M046 OFF
2009-06-01 17:51:35.046958 M045 OFF
2009-06-01 17:51:41.096098 M045 ON
2009-06-01 17:51:44.096236 M046 ON
2009-06-01 17:51:45.053722 M045 OFF
2009-06-01 17:51:46.015612 M045 ON
2009-06-01 17:51:47.005712 M046 OFF
2009-06-01 17:51:48.004619 M046 ON
2009-06-01 17:51:49.076356 M046 OFF
2009-06-01 17:51:50.035392 M046 ON
The following example loads sensor event logs in legacy text format into pyActLearn.CASAS.data.CASASData.
from pyActLearn.CASAS.data import CASASData
data = CASASData(path='twor.summer.2009/annotate')
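To make the legacy layout concrete, the sketch below splits one line of the log shown above into a timestamp, a sensor ID, and a sensor state using only the standard library. The helper name parse_legacy_line is hypothetical and is not part of pyActLearn; it assumes whitespace-separated fields and ignores anything after the fourth field.

```python
from datetime import datetime


def parse_legacy_line(line):
    """Split one legacy-format line into (timestamp, sensor, state).

    Hypothetical helper for illustration; not part of pyActLearn.
    """
    # Assumed layout: date, time, sensor ID, sensor state, separated by whitespace.
    date_str, time_str, sensor, state = line.split()[:4]
    timestamp = datetime.strptime(
        date_str + " " + time_str, "%Y-%m-%d %H:%M:%S.%f"
    )
    return timestamp, sensor, state


event = parse_legacy_line("2009-06-01 17:51:20.055202 M046 ON")
print(event)
```

In practice CASASData performs the loading itself; the sketch only shows how the columns of the text file can be read.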
CSV format¶
Some of the smart home datasets have been updated to CSV format. Those datasets usually come with metadata about the smart home, including the floorplan, sensor locations, annotated activities, and other information.
The binary sensor events are logged in the file event.csv. Here is a snippet of it:
2/1/2009,8:00:38 AM,M048,OFF,,,
2/1/2009,8:00:38 AM,M049,OFF,,,
2/1/2009,8:00:39 AM,M028,ON,,,
2/1/2009,8:00:39 AM,M042,ON,,,
2/1/2009,8:00:40 AM,M029,ON,,,
2/1/2009,8:00:40 AM,M042,OFF,,,
2/1/2009,8:00:40 AM,L003,OFF,,,
2/1/2009,8:00:42 AM,M043,OFF,,,
2/1/2009,8:00:42 AM,M037,ON,,,
2/1/2009,8:00:42 AM,M050,OFF,,,
2/1/2009,8:00:42 AM,M044,OFF,,,
2/1/2009,8:00:42 AM,M028,OFF,,,
2/1/2009,8:00:43 AM,M029,OFF,,,
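As a rough illustration of this layout, the sketch below reads rows like the ones above with the standard csv module. It assumes the first four columns are date, time, sensor ID, and state, and that dates are written month/day/year; neither assumption is confirmed by the documentation, and the function name parse_event_rows is hypothetical.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the snippet above.
sample = (
    "2/1/2009,8:00:38 AM,M048,OFF,,,\n"
    "2/1/2009,8:00:39 AM,M028,ON,,,\n"
)


def parse_event_rows(text_stream):
    """Parse CSV event rows into (timestamp, sensor, state) tuples.

    Hypothetical helper; assumes month/day/year dates and a 12-hour clock.
    """
    events = []
    for row in csv.reader(text_stream):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        date_str, time_str, sensor, state = row[:4]
        timestamp = datetime.strptime(
            date_str + " " + time_str, "%m/%d/%Y %I:%M:%S %p"
        )
        events.append((timestamp, sensor, state))
    return events


events = parse_event_rows(io.StringIO(sample))
print(events)
```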
The metadata about the smart home is stored in a JSON file. Here is a snippet of the metadata for the twor dataset:
{
  "name": "TWOR_2009_test",
  "floorplan": "TWOR_2009.png",
  "sensors": [
    {
      "name": "M004",
      "type": "Motion",
      "locX": 0.5605087077755726,
      "locY": 0.061440840882448416,
      "sizeX": 0.0222007722007722,
      "sizeY": 0.018656716417910446,
      "description": ""
    },
  ],
  "activities": [
    {
      "name": "Meal Preparation",
      "color": "#FF8A2BE2",
      "is_noise": false,
      "is_ignored": false
    },
  ]
}
To load such a dataset, pass the directory path to the constructor of pyActLearn.CASAS.data.CASASData.
from pyActLearn.CASAS.data import CASASData
data = CASASData(path='twor.summer.2009/')
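The metadata file can also be inspected directly with the standard json module. The sketch below uses a shortened, self-contained copy of the metadata snippet above (the trailing commas in that snippet mark elided entries, so it is not valid JSON as printed) and builds two simple lookups. The variable names are illustrative only.

```python
import json

# Shortened, valid-JSON version of the metadata snippet, for illustration.
metadata_text = """
{
  "name": "TWOR_2009_test",
  "floorplan": "TWOR_2009.png",
  "sensors": [
    {"name": "M004", "type": "Motion", "description": ""}
  ],
  "activities": [
    {"name": "Meal Preparation", "color": "#FF8A2BE2",
     "is_noise": false, "is_ignored": false}
  ]
}
"""

metadata = json.loads(metadata_text)

# Map each sensor name to its type.
sensor_types = {sensor["name"]: sensor["type"] for sensor in metadata["sensors"]}

# Keep only the activities that are not flagged as ignored.
active = [a["name"] for a in metadata["activities"] if not a["is_ignored"]]

print(sensor_types, active)
```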
Note
The constructor of pyActLearn.CASAS.data.CASASData determines the format of the sensor log by checking whether the path is a directory or a file. If it is a file, the constructor assumes that it is in legacy raw text format. If it is a directory, the constructor looks for event.csv within the directory for binary sensor events, and dataset.json for metadata about the smart home.
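The dispatch rule in this note can be sketched as follows. This is a minimal illustration of the described behaviour, not the library's actual code; the function detect_casas_format and the shape of its return value are assumptions made for the example.

```python
import os
import tempfile


def detect_casas_format(path):
    """Mimic the file-versus-directory rule described in the note.

    Hypothetical helper; the real constructor may differ in detail.
    """
    if os.path.isdir(path):
        # Directory: expect event.csv and dataset.json inside it.
        return {
            "format": "csv",
            "events": os.path.join(path, "event.csv"),
            "metadata": os.path.join(path, "dataset.json"),
        }
    if os.path.isfile(path):
        # Single file: treated as legacy raw text.
        return {"format": "legacy", "events": path, "metadata": None}
    raise FileNotFoundError(path)


# Small demonstration using a temporary directory and file.
with tempfile.TemporaryDirectory() as dataset_dir:
    legacy_file = os.path.join(dataset_dir, "events.txt")
    with open(legacy_file, "w") as handle:
        handle.write("2009-06-01 17:51:20.055202 M046 ON\n")
    dir_result = detect_casas_format(dataset_dir)
    file_result = detect_casas_format(legacy_file)

print(dir_result["format"], file_result["format"])
```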
Event Pre-processing¶
Raw sensor event data may need to be pre-processed before a learning algorithm can consume it. For algorithms like the Hidden Markov Model, only the raw sensor series is needed. For algorithms like decision tree, random forest, and multi-layer perceptron, statistical features are calculated within a sliding window of fixed or variable length. For data used in a stacked auto-encoder, the input needs to be normalized to the range 0 to 1.
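One common way to bring a feature column into the range 0 to 1 is min-max scaling. The sketch below shows that technique in plain Python; it is an assumption that this matches what the library does when normalization is requested, and the function name min_max_normalize is hypothetical.

```python
def min_max_normalize(column):
    """Scale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(column), max(column)
    if high == low:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


durations = [2.0, 6.0, 10.0]  # made-up window durations in seconds
scaled = min_max_normalize(durations)
print(scaled)
```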
pyActLearn.CASAS.data.CASASData.populate_feature() handles the pre-processing of all binary sensor events. The statistical features implemented in this function include:
- Window Duration
- Last Sensor
- Hour of the Event
- Seconds of the Event
- Sensor Count
- Sensor Elapse Time
- Dominant Sensor
Methods to enable and disable specific features or activities are provided as well.
Please refer to the pyActLearn.CASAS.data.CASASData API reference for more information.
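To give a feel for what sliding-window features look like, the sketch below computes a few of the features listed above (window duration, last sensor, hour of the event, sensor count, dominant sensor) over fixed-length windows of events. It is a simplified illustration under assumed definitions, not the implementation behind populate_feature(); the function names and the exact meaning given to each feature are assumptions.

```python
from collections import Counter
from datetime import datetime

# (timestamp, sensor) pairs loosely based on the legacy log snippet.
events = [
    (datetime(2009, 6, 1, 17, 51, 20), "M046"),
    (datetime(2009, 6, 1, 17, 51, 22), "M046"),
    (datetime(2009, 6, 1, 17, 51, 35), "M045"),
    (datetime(2009, 6, 1, 17, 51, 41), "M045"),
    (datetime(2009, 6, 1, 17, 51, 44), "M046"),
]


def window_features(window):
    """Compute illustrative statistics for one window of events."""
    first_time, last_time = window[0][0], window[-1][0]
    counts = Counter(sensor for _, sensor in window)
    return {
        "window_duration": (last_time - first_time).total_seconds(),
        "last_sensor": window[-1][1],
        "hour": last_time.hour,
        "sensor_count": dict(counts),
        # Assumed definition: the sensor that fired most often in the window.
        "dominant_sensor": counts.most_common(1)[0][0],
    }


def sliding_windows(event_list, size):
    """Slide a fixed-length window over the events, one event at a time."""
    return [
        window_features(event_list[end - size + 1:end + 1])
        for end in range(size - 1, len(event_list))
    ]


features = sliding_windows(events, 3)
for row in features:
    print(row)
```

Each window ends at the event being described, so a window of three events produces one feature vector for every event from the third onward.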
Export Data¶
After the data are pre-processed, the features and labels can be exported to an Excel file (.xlsx) via pyActLearn.CASAS.data.CASASData.write_to_xlsx().
pyActLearn.CASAS.data.CASASData.export_hdf5() saves the pre-processed features and target labels in hdf5 format. The metadata is saved as attributes of the root node of the hdf5 dataset.
The hdf5 file can be viewed using hdfviewer.
Here is an example that loads raw sensor events and saves them to an hdf5 dataset file.
from pyActLearn.CASAS.data import CASASData
data = CASASData(path='datasets/twor.2009/')
data.populate_feature(method='stat', normalized=True, per_sensor=True)
data.export_hdf5(filename='hdf5/twor_2009_stat.hdf5', comments='')
API Reference¶
- class pyActLearn.CASAS.data.CASASData(path)
  Bases: object
  A class to load activity data from CASAS smart home datasets.
  The class loads raw activity sensor events from CASAS smart home datasets. It provides methods to pre-process the data for later use by learning algorithms for activity recognition. The pre-processed data can be exported to xlsx files for verification, and to an hdf5 file for faster reading and searching when evaluating an activity recognition algorithm.
  Parameters: path (str) – Path to a dataset directory, or to the dataset event.rst file for a dataset in legacy format.
  Variables:
  - sensor_list (dict) – A dictionary containing sensor information.
  - activity_list (dict) – A dictionary containing activity information.
  - event_list (list of dict) – List of data used to store raw events.
  - x (numpy.ndarray) – 2D numpy array that contains calculated feature data.
  - y (numpy.ndarray) – 2D numpy array that contains the activity label corresponding to the feature array.
  - data_path (str) – Path to the data file.
  - home (pyActLearn.CASAS.home.CASASHome) – CASAS.home.CASASHome object that stores the home information associated with the dataset.
  - is_legacy (bool) – Defaults to False. Whether the loaded dataset is in legacy format.
  - is_stat_feature (bool) – Whether to calculate statistical features or use raw data in x.
  - is_labeled (bool) – Whether the given dataset is labeled.
  - time_list (list of datetime.datetime) – Datetime of each entry in x. Used for back annotation, and for splitting the dataset by weeks or days.
  - feature_list (dict) – A dictionary of statistical features used in statistical feature calculation.
  - routines (dict) – Function routines that need to run every time features are calculated. Excluded from pickling.
  - num_enabled_features (int) – Number of enabled features.
  - num_static_features (int) – Number of features related to the window.
  - num_per_sensor_features (int) – Number of features that need to be calculated per enabled sensor.
  - events_in_window (int) – Number of sensor events (or statistical features of a sliding window) grouped in a feature vector.
- disable_activity(activity_label)
  Disable an activity.
  Parameters: activity_label (str) – Activity label.
- disable_feature(feature_name)
  Disable a feature.
  Parameters: feature_name (str) – Feature name.
- disable_routine(routine)
  Disable a routine.
  Checks the list of enabled features to see whether the routine is used by other features. If no feature needs the routine, it is disabled.
  Parameters: routine (pyActLearn.CASAS.stat_features.FeatureRoutineTemplate) – Routine to be disabled.
- enable_activity(activity_label)
  Enable an activity.
  Parameters: activity_label (str) – Activity label.
  Returns: The index of the enabled activity.
  Return type: int
- enable_feature(feature_name)
  Enable a feature.
  Parameters: feature_name (str) – Feature name.
- enable_routine(routine)
  Enable a given routine.
  Parameters: routine (pyActLearn.CASAS.stat_features.FeatureRoutineTemplate) – Routine to be enabled.
- enable_sensor(sensor_name)
  Enable a sensor.
  Parameters: sensor_name (str) – Sensor name.
  Returns: The index of the enabled sensor.
  Return type: int
- export_fuel(directory, break_by='week', comments='')
  Export feature and label vector into hdf5 file and store the class information in a pickle file.
  Parameters:
- export_hdf5(filename, comments='', bg_activity='Other_Activity', driver=None)
  Export the dataset into a hdf5 dataset file with meta-data logged in attributes.
  To load the data, you can use the pyActLearn.CASAS.h5py.CASASH5PY class.
  Parameters:
- get_activities_by_indices(activity_ids)
  Get a group of activities by their corresponding indices.
  Parameters: activity_ids (list of int) – A list of activity indices.
  Returns: A list of activity labels in the same order.
  Return type: list of str
- get_activity_by_index(activity_id)
  Get activity name by index.
  Parameters: activity_id (int) – Activity index.
  Returns: Activity label.
  Return type: str
- get_activity_color(activity_label)
  Find the color string for the activity.
  Parameters: activity_label (str) – Activity label.
  Returns: RGB color string.
  Return type: str
- get_activity_index(activity_label)
  Get index of an activity.
  Parameters: activity_label (str) – Activity label.
  Returns: Activity index (-1 if not found or not enabled).
  Return type: int
- get_enabled_activities()
  Get label list of all enabled activities.
  Returns: List of activity labels.
  Return type: list of str
- get_enabled_sensors()
  Get the names of all enabled sensors.
  Returns: List of sensor names.
  Return type: list of str
- get_feature_by_index(index)
  Get feature name by index.
  Parameters: index (int) – Column index of feature.
  Returns: (feature name, sensor name) tuple. If it is not a per-sensor feature, the sensor name is None.
  Return type: tuple of str
- get_feature_string_by_index(index)
  Get the string describing the feature specified by column index.
  Parameters: index (int) – Column index of feature.
  Returns: Feature string.
  Return type: str
- get_sensor_by_index(sensor_id)
  Get the name of a sensor by index.
  Parameters: sensor_id (int) – Sensor index.
  Returns: Sensor name.
  Return type: str
- get_sensor_index(sensor_name)
  Get sensor index.
  Parameters: sensor_name (str) – Sensor name.
  Returns: Sensor index (-1 if not found or not enabled).
  Return type: int
- sensor_list (