CASAS.data

CASAS.data implements pyActLearn.CASAS.data.CASASData.

Sensor Event File Format

pyActLearn.CASAS.data.CASASData can load raw smart home sensor event logs from either raw text (legacy) files or comma-separated (.csv) files.

Legacy format

Here is a snippet of the sensor event log of a smart home in raw text format:

2009-06-01 17:51:20.055202   M046    ON
2009-06-01 17:51:22.036689   M046    OFF
2009-06-01 17:51:28.053264   M046    ON
2009-06-01 17:51:30.072223   M046    OFF
2009-06-01 17:51:35.046958   M045    OFF
2009-06-01 17:51:41.096098   M045    ON
2009-06-01 17:51:44.096236   M046    ON
2009-06-01 17:51:45.053722   M045    OFF
2009-06-01 17:51:46.015612   M045    ON
2009-06-01 17:51:47.005712   M046    OFF
2009-06-01 17:51:48.004619   M046    ON
2009-06-01 17:51:49.076356   M046    OFF
2009-06-01 17:51:50.035392   M046    ON

The following example loads sensor event logs in legacy text format into pyActLearn.CASAS.data.CASASData.

from pyActLearn.CASAS.data import CASASData
data = CASASData(path='twor.summer.2009/annotate')
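
To make the legacy layout concrete, the sketch below parses one line into a timestamp, a sensor ID, and a state using only the standard library. It is an illustration, not the parser pyActLearn uses: the function name parse_legacy_line is hypothetical, and any columns after the fourth are simply ignored.

```python
from datetime import datetime


def parse_legacy_line(line):
    """Split one legacy log line into (timestamp, sensor, state)."""
    # Fields are separated by runs of whitespace: date, time, sensor ID, state.
    date_str, time_str, sensor, state = line.split()[:4]
    timestamp = datetime.strptime(date_str + " " + time_str,
                                  "%Y-%m-%d %H:%M:%S.%f")
    return timestamp, sensor, state


sample = "2009-06-01 17:51:20.055202   M046    ON"
timestamp, sensor, state = parse_legacy_line(sample)
print(timestamp.isoformat(), sensor, state)
```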

CSV format

Some of the smart home datasets have been updated to CSV format. Those datasets usually come with meta-data about the smart home, including the floorplan, sensor locations, annotated activities, and other information.

The binary sensor events are logged in the file event.csv. Here is a snippet of it:

2/1/2009,8:00:38 AM,M048,OFF,,,
2/1/2009,8:00:38 AM,M049,OFF,,,
2/1/2009,8:00:39 AM,M028,ON,,,
2/1/2009,8:00:39 AM,M042,ON,,,
2/1/2009,8:00:40 AM,M029,ON,,,
2/1/2009,8:00:40 AM,M042,OFF,,,
2/1/2009,8:00:40 AM,L003,OFF,,,
2/1/2009,8:00:42 AM,M043,OFF,,,
2/1/2009,8:00:42 AM,M037,ON,,,
2/1/2009,8:00:42 AM,M050,OFF,,,
2/1/2009,8:00:42 AM,M044,OFF,,,
2/1/2009,8:00:42 AM,M028,OFF,,,
2/1/2009,8:00:43 AM,M029,OFF,,,
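
As a rough illustration of this layout, the following sketch reads rows in the event.csv style with the standard csv module. It assumes the date column is month/day/year with a 12-hour clock, which the snippet alone does not confirm, and it ignores the trailing empty columns, whose meaning is not described here. The function name parse_event_rows is hypothetical.

```python
import csv
import io
from datetime import datetime

SAMPLE = """2/1/2009,8:00:38 AM,M048,OFF,,,
2/1/2009,8:00:39 AM,M028,ON,,,
2/1/2009,8:00:40 AM,L003,OFF,,,
"""


def parse_event_rows(text):
    """Yield (timestamp, sensor, state) tuples from event.csv-style text."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        date_str, time_str, sensor, state = row[:4]
        # Assumption: dates are month/day/year with a 12-hour clock.
        timestamp = datetime.strptime(date_str + " " + time_str,
                                      "%m/%d/%Y %I:%M:%S %p")
        yield timestamp, sensor, state


events = list(parse_event_rows(SAMPLE))
for event in events:
    print(event)
```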

The meta-data about the smart home is stored in a JSON file. Here is a snippet of the meta-data for the twor dataset:

{
"name": "TWOR_2009_test",
"floorplan": "TWOR_2009.png",
"sensors": [
   {
      "name": "M004",
      "type": "Motion",
      "locX": 0.5605087077755726,
      "locY": 0.061440840882448416,
      "sizeX": 0.0222007722007722,
      "sizeY": 0.018656716417910446,
      "description": ""
   },
],
"activities": [
   {
      "name": "Meal Preparation",
      "color": "#FF8A2BE2",
      "is_noise": false,
      "is_ignored": false
   },
]}

To load such a dataset, provide the directory path to the constructor of pyActLearn.CASAS.data.CASASData.

from pyActLearn.CASAS.data import CASASData
data = CASASData(path='twor.summer.2009/')
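
The meta-data file can also be inspected directly with the standard json module. The sketch below uses a cleaned-up copy of the snippet shown earlier (the trailing commas that mark elided entries are removed so the text is valid JSON) and builds two simple lookups. How pyActLearn itself interprets fields such as is_noise and is_ignored is not covered here; filtering on is_ignored is only an assumption for illustration.

```python
import json

# Cleaned-up copy of the meta-data snippet; elided entries are left out.
METADATA_TEXT = """
{
  "name": "TWOR_2009_test",
  "floorplan": "TWOR_2009.png",
  "sensors": [
    {"name": "M004", "type": "Motion",
     "locX": 0.5605087077755726, "locY": 0.061440840882448416,
     "sizeX": 0.0222007722007722, "sizeY": 0.018656716417910446,
     "description": ""}
  ],
  "activities": [
    {"name": "Meal Preparation", "color": "#FF8A2BE2",
     "is_noise": false, "is_ignored": false}
  ]
}
"""

metadata = json.loads(METADATA_TEXT)

# Map each sensor name to its type.
sensor_types = {s["name"]: s["type"] for s in metadata["sensors"]}

# Assumption: activities flagged is_ignored would be skipped.
kept_activities = [a["name"] for a in metadata["activities"]
                   if not a["is_ignored"]]

print(sensor_types, kept_activities)
```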

Note

The constructor of pyActLearn.CASAS.data.CASASData determines the format of the sensor log by checking whether the path is a file or a directory. If it is a file, the constructor assumes legacy raw text format. If it is a directory, the constructor looks inside it for event.csv, which holds the binary sensor events, and dataset.json, which holds the meta-data about the smart home.
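
The rule in this note can be sketched with os.path as follows. This is not the constructor's actual code: the function name detect_casas_format is hypothetical, and raising FileNotFoundError when a file is missing is an assumption about error handling that the note does not specify.

```python
import os
import tempfile


def detect_casas_format(path):
    """Return 'legacy' for a file path and 'csv' for a dataset directory."""
    if os.path.isdir(path):
        # A CSV dataset directory is expected to hold both files.
        missing = [name for name in ("event.csv", "dataset.json")
                   if not os.path.isfile(os.path.join(path, name))]
        if missing:
            raise FileNotFoundError(
                "missing in %s: %s" % (path, ", ".join(missing)))
        return "csv"
    if os.path.isfile(path):
        return "legacy"
    raise FileNotFoundError(path)


# Build a throwaway dataset directory and a legacy file to try it out.
with tempfile.TemporaryDirectory() as root:
    for name in ("event.csv", "dataset.json"):
        open(os.path.join(root, name), "w").close()
    legacy_file = os.path.join(root, "annotate")
    open(legacy_file, "w").close()
    dir_format = detect_casas_format(root)
    file_format = detect_casas_format(legacy_file)

print(dir_format, file_format)
```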

Event Pre-processing

Raw sensor event data may need to be pre-processed before a learning algorithm can consume it. Algorithms such as hidden Markov models need only the raw sensor event series. For algorithms such as decision trees, random forests, and multi-layer perceptrons, statistical features are calculated within a sliding window of fixed or variable length. For data fed to a stacked auto-encoder, the input needs to be normalized between 0 and 1.

The pyActLearn.CASAS.data.CASASData.populate_feature() function handles the pre-processing of all binary sensor events, including the calculation of statistical features.

Methods to enable and disable specific features or activities are provided as well. Please refer to pyActLearn.CASAS.data.CASASData API reference for more information.
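
To illustrate the idea of sliding-window features normalized between 0 and 1, the sketch below computes, for each fixed-length window of events, the fraction of events that came from each sensor. This single feature is chosen only for illustration and is not claimed to be one of the features populate_feature() implements; the function name window_count_features is hypothetical.

```python
def window_count_features(events, sensors, window_size):
    """Return one feature vector per sliding window of `window_size` events.

    Each vector holds, per sensor, the fraction of events in the window
    that came from that sensor, so every value lies between 0 and 1.
    """
    column = {name: i for i, name in enumerate(sensors)}
    features = []
    # The window slides forward by one event at a time.
    for end in range(window_size, len(events) + 1):
        counts = [0] * len(sensors)
        for sensor, _state in events[end - window_size:end]:
            counts[column[sensor]] += 1
        features.append([c / window_size for c in counts])
    return features


events = [("M046", "ON"), ("M046", "OFF"), ("M045", "OFF"), ("M045", "ON")]
x = window_count_features(events, sensors=["M045", "M046"], window_size=2)
print(x)
```

Dividing by the window length keeps every value in the range the stacked auto-encoder case requires without a separate normalization pass.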

Export Data

After the data are pre-processed, the features and labels can be exported to an Excel file (.xlsx) with pyActLearn.CASAS.data.CASASData.write_to_xlsx().

pyActLearn.CASAS.data.CASASData.export_hdf5() will save the pre-processed features and target labels in hdf5 format. The meta-data is saved as attributes of the root node of hdf5 dataset. The hdf5 file can be viewed using hdfviewer.

Here is an example that loads raw sensor events and saves them to an hdf5 dataset file.

from pyActLearn.CASAS.data import CASASData
data = CASASData(path='datasets/twor.2009/')
data.populate_feature(method='stat', normalized=True, per_sensor=True)
data.export_hdf5(filename='hdf5/twor_2009_stat.hdf5', comments='')

API Reference

class pyActLearn.CASAS.data.CASASData(path)[source]

Bases: object

A class to load activity data from CASAS smart home datasets.

The class loads raw activity sensor events from CASAS smart home datasets and provides methods to pre-process the data for activity recognition learning algorithms. The pre-processed data can be exported to xlsx files for verification, and to an hdf5 file for faster read and search when evaluating an activity recognition algorithm.

Parameters:

path (str) – Path to a dataset directory, or to the dataset event.rst file for a dataset in legacy format.

Variables:
  • sensor_list (dict) – A dictionary containing sensor information.
  • activity_list (dict) – A dictionary containing activity information.
  • event_list (list of dict) – List of data used to store raw events.
  • x (numpy.ndarray) – 2D numpy array that contains calculated feature data.
  • y (numpy.ndarray) – 2D numpy array that contains activity label corresponding to feature array
  • data_path (str) – path to data file.
  • home (pyActLearn.CASAS.home.CASASHome) – CASAS.home.CASASHome object that stores the home information associated with the dataset.
  • is_legacy (bool) – Whether the loaded dataset is in legacy format. Defaults to False.
  • is_stat_feature (bool) – Calculate statistical features or use raw data in x
  • is_labeled (bool) – If given dataset is labeled
  • time_list (list of datetime.datetime) – Datetime of each entry in x. Used for back annotation, and splitting dataset by weeks or days.
  • feature_list (dict) – A dictionary of statistical features used in statistical feature calculation
  • routines (dict) – Function routines that need to run every time features are calculated. Excluded from pickling.
  • num_enabled_features (int) – Number of enabled features.
  • num_static_features (int) – Number of features related to window
  • num_per_sensor_features (int) – Number of features that need to be calculated per enabled sensor
  • events_in_window (int) – Number of sensor events (or statistical features of a sliding window) grouped in a feature vector.
disable_activity(activity_label)[source]

Disable an activity

Parameters:activity_label (str) – Activity label
disable_feature(feature_name)[source]

Disable a feature

Parameters:feature_name (str) – Feature name.
disable_routine(routine)[source]

Disable a routine

Check the list of enabled features to see whether the routine is used by any other feature. If no feature needs the routine, disable it.

Parameters:routine (pyActLearn.CASAS.stat_features.FeatureRoutineTemplate) – routine to be disabled
disable_sensor(sensor_name)[source]

Disable a sensor

Parameters:sensor_name (str) – Sensor Name
enable_activity(activity_label)[source]

Enable an activity

Parameters:activity_label (str) – Activity label
Returns:The index of the enabled activity
Return type:int
enable_feature(feature_name)[source]

Enable a feature

Parameters:feature_name (str) – Feature name.
enable_routine(routine)[source]

Enable a given routine

Parameters:routine (pyActLearn.CASAS.stat_features.FeatureRoutineTemplate) – routine to be enabled
enable_sensor(sensor_name)[source]

Enable a sensor

Parameters:sensor_name (str) – Sensor Name
Returns:The index of the enabled sensor
Return type:int
export_fuel(directory, break_by='week', comments='')[source]

Export feature and label vector into hdf5 file and store the class information in a pickle file

Parameters:
  • directory (str) – The directory to save hdf5 and complementary dataset information
  • break_by (str) – Select the way to split the data, either by 'week' or 'day'
  • comments (str) – Additional comments to add
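
The break_by option suggests grouping rows by the timestamps kept in time_list. A minimal sketch of such a grouping is shown below. It is not the implementation of export_fuel: the function name split_indices is hypothetical, and the use of ISO calendar weeks (Monday to Sunday) for 'week' is an assumption.

```python
from collections import OrderedDict
from datetime import datetime


def split_indices(time_list, break_by="week"):
    """Group row indices of `time_list` by calendar day or ISO week."""
    if break_by not in ("week", "day"):
        raise ValueError("break_by must be 'week' or 'day'")
    groups = OrderedDict()
    for index, stamp in enumerate(time_list):
        if break_by == "day":
            key = stamp.date()
        else:
            iso = stamp.isocalendar()
            key = (iso[0], iso[1])  # (ISO year, ISO week number)
        groups.setdefault(key, []).append(index)
    return groups


time_list = [
    datetime(2009, 6, 1, 17, 51, 20),  # Monday
    datetime(2009, 6, 1, 23, 59, 59),  # same day
    datetime(2009, 6, 7, 8, 0, 0),     # Sunday of the same ISO week
    datetime(2009, 6, 8, 8, 0, 0),     # Monday of the next ISO week
]
by_day = split_indices(time_list, "day")
by_week = split_indices(time_list, "week")
print(list(by_day.values()), list(by_week.values()))
```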
export_hdf5(filename, comments='', bg_activity='Other_Activity', driver=None)[source]

Export the dataset into a hdf5 dataset file with meta-data logged in attributes.

To load the data, you can use pyActLearn.CASAS.h5py.CASASH5PY class.

Parameters:
  • filename (str) – Path of the hdf5 file to write.
  • comments (str) – Additional comments to add.
  • bg_activity (str) – Background activity label.
  • driver (str) – h5py dataset R/W driver.
get_activities_by_indices(activity_ids)[source]

Get a group of activities by their corresponding indices

Parameters:activity_ids (list of int) – A list of activity indices
Returns:A list of activity labels in the same order
Return type:list of str
get_activity_by_index(activity_id)[source]

Get the name of an activity by its index

Parameters:activity_id (int) – Activity index
Returns:Activity label
Return type:str
get_activity_color(activity_label)[source]

Find the color string for the activity.

Parameters:activity_label (str) – activity label
Returns:RGB color string
Return type:str
get_activity_index(activity_label)[source]

Get Index of an activity

Parameters:activity_label (str) – Activity label
Returns:Activity index (-1 if not found or not enabled)
Return type:int
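
The -1 convention for labels that are unknown or not enabled can be illustrated with a small stand-alone registry. Everything in this sketch is hypothetical, including the class name LabelIndex, the label "Sleep", and the assumption that indices follow the sorted order of the enabled labels; how CASASData actually assigns indices is not documented in this section.

```python
class LabelIndex:
    """Hypothetical registry mapping enabled labels to indices."""

    def __init__(self, labels):
        # Every known label starts out disabled.
        self._enabled = {label: False for label in labels}

    def enable(self, label):
        self._enabled[label] = True
        return self.index_of(label)

    def disable(self, label):
        self._enabled[label] = False

    def enabled_labels(self):
        # Assumption: indices follow the sorted order of enabled labels.
        return sorted(name for name, on in self._enabled.items() if on)

    def index_of(self, label):
        labels = self.enabled_labels()
        return labels.index(label) if label in labels else -1


activities = LabelIndex(["Meal Preparation", "Sleep", "Other_Activity"])
activities.enable("Sleep")
activities.enable("Meal Preparation")
print(activities.enabled_labels())
print(activities.index_of("Other_Activity"))
```

Callers of a lookup that follows this convention need to check for -1 before using the result as an array index, since -1 is itself a valid index in Python and NumPy.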
get_enabled_activities()[source]

Get label list of all enabled activities

Returns:list of activity labels
Return type:list of str
get_enabled_sensors()[source]

Get the names of all enabled sensors

Returns:List of sensor names
Return type:list of str
get_feature_by_index(index)[source]

Get Feature Name by Index

Parameters:index (int) – column index of feature
Returns:
(feature name, sensor name) tuple.
If it is not per-sensor feature, the sensor name is None.
Return type:tuple of str
get_feature_string_by_index(index)[source]

Get the string describing the feature specified by column index

Parameters:index (int) – column index of feature
Returns:Feature string
Return type:str
get_sensor_by_index(sensor_id)[source]

Get the name of sensor by index

Parameters:sensor_id (int) – Sensor index
Returns:Sensor name
Return type:str
get_sensor_index(sensor_name)[source]

Get Sensor Index

Parameters:sensor_name (str) – Sensor Name
Returns:Sensor index (-1 if not found or not enabled)
Return type:int
populate_feature(method='raw', normalized=True, per_sensor=True)[source]

Populate the feature vector in x and activities in y

Parameters:
  • method (str) – The method to convert sensor events into feature vector. Available methods are 'raw' and 'stat'.
  • normalized (bool) – Will each feature be normalized between 0 and 1?
  • per_sensor (bool) – For features related with sensor ID, are they
summary()[source]

Print summary of loaded datasets

write_to_xlsx(filename, start=0, end=-1)[source]

Write to file in xlsx format

Parameters:
  • filename (str) – xlsx file name.
  • start (int) – start index.
  • end (int) – end index.