CASAS.h5py

The pyActLearn.CASAS.h5py module implements the class pyActLearn.CASAS.h5py.CASASHDF5.

Dataset Structure

HDF5 is a data model, library and file format for storing and managing data. The h5py package is the Python interface for reading and writing HDF5 files. You can open and inspect an HDF5 file with HDFView.

The pre-processed feature array x is stored as the dataset /features. The corresponding target labels are stored as the dataset /targets. The time of each entry is stored in /time as an array of bytes (h5py cannot store NumPy arrays of Python str directly).
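
Because /time holds bytes rather than str, each entry has to be decoded before it can be used as a timestamp. The sketch below shows one way to do that with the standard library. The sample values and the layout in TIME_FORMAT are assumptions made for illustration; they are not taken from an actual dataset file, so check the format used in your own file first.

```python
from datetime import datetime

# Assumed timestamp layout; adjust it to match the strings stored in /time.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Hypothetical raw entries, shaped like the bytes read back from /time.
raw_time = [b"2009-06-10 00:00:00.024668", b"2009-06-10 00:00:46.069471"]


def decode_time(raw, fmt=TIME_FORMAT):
    """Decode a sequence of byte strings into datetime objects."""
    return [datetime.strptime(item.decode("utf-8"), fmt) for item in raw]


timestamps = decode_time(raw_time)
elapsed = timestamps[1] - timestamps[0]  # time between the two entries
```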

The meta-data of the smart home is stored as attributes of the root node. The table below describes each of these attributes.

Attribute      Description
bg_target      Name of the background activity.
comment        Description of the dataset.
days           List of (start, stop) index tuples, one per segment, when the dataset is split by days.
weeks          List of (start, stop) index tuples, one per segment, when the dataset is split by weeks.
features       Feature name corresponding to each column in the /features dataset.
targets        List of activity labels.
target_color   List of color strings, one per activity, for visualization.
sources        List of dataset names in the file.
sensors        List of sensor names.

The image below gives a glimpse of the HDF5 structure in HDFView.

../_images/CASASHDF5_HDFView.png

Smart home pre-processed data in hdf5 format.

Load and Fetch Data from HDF5

pyActLearn.CASAS.h5py.CASASHDF5 provides several methods for accessing and loading the data from an HDF5 file. The dataset is usually split by weeks and by days. The method pyActLearn.CASAS.h5py.CASASHDF5.fetch_data() loads the time, features and target labels for the time frame given by the names of the start split and the stop split.

Here is a code snippet that loads the data from a range of splits and trains a support vector machine.

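import sklearn.svm
from pyActLearn.CASAS.h5py import CASASHDF5
# Load dataset (the constructor parameter is named filename)
ds = CASASHDF5(filename='twor_statNormPerSensor.hdf5')
# Training: fetch every entry between the two named splits
time, feature, target = ds.fetch_data(start_split='week_1', stop_split='week_4')
x = feature
y = target.flatten().astype(int)  # flatten to a 1-D integer label vector
model = sklearn.svm.SVC(kernel='rbf')
model.fit(x, y)
# Testing: this reuses the training range; pass later split names to evaluate on held-out data
time, feature, target = ds.fetch_data(start_split='week_1', stop_split='week_4')
x = feature
y = model.predict(x)
# Close the dataset when done
ds.close()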
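
The example above predicts on the same range of splits it was trained on. For a held-out evaluation, the split names can first be divided into a training range and a testing range. The following is a minimal sketch that works on a list shaped like the return value of get_weeks_info(). The split names and counts are made-up values, and split_train_test is a hypothetical helper, not part of pyActLearn.

```python
def split_train_test(weeks_info, train_fraction=0.75):
    """Divide (name, count) pairs into a training and a testing range.

    Returns two (start_split, stop_split) name pairs, chosen so that roughly
    `train_fraction` of all items fall into the training range.
    """
    if len(weeks_info) < 2:
        raise ValueError("need at least two splits")
    total = sum(count for _, count in weeks_info)
    seen = 0
    cut = len(weeks_info) - 1  # always leave at least one split for testing
    for position, (_, count) in enumerate(weeks_info[:-1], start=1):
        seen += count
        if seen >= train_fraction * total:
            cut = position
            break
    names = [name for name, _ in weeks_info]
    return (names[0], names[cut - 1]), (names[cut], names[-1])


# Made-up values in the (name, number of items) shape described for get_weeks_info().
weeks_info = [("week_0", 900), ("week_1", 1100), ("week_2", 1000),
              ("week_3", 1000), ("week_4", 1000)]
train_range, test_range = split_train_test(weeks_info)
```

The resulting names could then be passed as start_split and stop_split to fetch_data(), provided the file actually defines those splits. Whether the stop split itself is included in the returned range should be checked against the library before relying on this.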

API Reference

class pyActLearn.CASAS.h5py.CASASHDF5(filename, mode='r', driver=None)[source]

Bases: object

CASASHDF5 class to create and retrieve CASAS smart home data from an HDF5 file.

The data saved to or retrieved from an HDF5 data file are features pre-calculated by the CASASData class. The HDF5 data file also contains meta-data about the dataset, which includes a description of each feature, splits by week and/or splits by day.

Variables:

_file (h5py.File) – h5py.File object that represents root group.

Parameters:
  • filename (str) – HDF5 File Name
  • mode (str) – ‘r’ to load from an existing file, ‘w’ to create a new HDF5 data file
close()[source]

Close Dataset

create_comments(comment)[source]

Add comments to dataset

Parameters:comment (str) – Comments to the dataset
create_features(feature_array, feature_description)[source]

Create Feature Dataset

Parameters:
  • feature_array (numpy.ndarray) – Numpy array holding calculated feature vectors
  • feature_description (list of str) – List of strings that describe each column of feature vectors.
create_sensors(sensors)[source]

Add sensors list to attributes

If the sensor IDs in the dataset are not binary coded, the sensor list needs to be provided along with the feature vectors.

Parameters:sensors (list of str) – List of sensor names corresponding to the IDs in the feature array.
create_splits(days, weeks)[source]

Create splits by days and weeks

Parameters:
  • days (list of int) – Start index for each day
  • weeks (list of int) – Start index for each week
create_targets(target_array, target_description, target_colors)[source]

Create Target Dataset

Parameters:
  • target_array (numpy.ndarray) – Numpy array holding target labels
  • target_description (list of str) – List of strings that describe each target class.
  • target_colors (list of str) – List of color values corresponding to each target class.
create_time_list(time_array)[source]

Create Time List

Parameters:time_array (list of datetime) – datetime corresponding to each feature vector in feature dataset.
fetch_data(start_split=None, stop_split=None, pre_load=0)[source]

Fetch data between start and stop splits

Parameters:
  • start_split (str) – Name of the split where the data begins
  • stop_split (str) – Name of the split where the data ends
  • pre_load (int) – Number of extra entries to load before the start split.
Returns:

A tuple of all sources, each sliced to the range defined by the splits.

The sources are returned in the order (‘time’, ‘feature’, ‘target’).

Return type:

tuple of numpy.ndarray
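
To make the role of the three parameters concrete, here is a minimal sketch of slicing several sources between two named splits. It is not the library's implementation. It assumes that each split maps to a (start, stop) index pair with an exclusive stop, that the range runs from the start of start_split to the end of stop_split, and that pre_load is clamped at the beginning of the data. The helper slice_between_splits and all sample values are hypothetical.

```python
def slice_between_splits(sources, splits, start_split=None, stop_split=None, pre_load=0):
    """Slice every source sequence between two named splits.

    `splits` maps a split name to a (start, stop) index pair, with `stop`
    exclusive. Missing split names default to the first and last split.
    """
    names = list(splits)
    first = splits[start_split if start_split is not None else names[0]][0]
    last = splits[stop_split if stop_split is not None else names[-1]][1]
    first = max(0, first - pre_load)  # extra leading entries, clamped at index 0
    return tuple(source[first:last] for source in sources)


# Hypothetical sources and split table.
time = ["t0", "t1", "t2", "t3", "t4", "t5"]
feature = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5]]
target = [0, 0, 1, 1, 2, 2]
splits = {"week_0": (0, 2), "week_1": (2, 4), "week_2": (4, 6)}

# One extra entry is loaded before the start of week_1.
t, x, y = slice_between_splits((time, feature, target), splits,
                               start_split="week_1", stop_split="week_1", pre_load=1)
```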

flush()[source]

Write To File

get_bg_target()[source]

Get the description of the target class considered background in the dataset.

Returns:Name of the class which is considered background in the dataset. Usually it is ‘Other_Activity’.
Return type:str
get_bg_target_id()[source]

Get the id of the target class considered background.

Returns:The index of the target class which is considered background in the dataset.
Return type:int
get_days_info()[source]

Get splits by day.

Returns:
List of (key, value) tuples, where key is the name of the split and value is the number of items in that split.
Return type:List of tuple
get_feature_description_by_index(i)[source]

Get the description of feature column \(i\).

Parameters:i (int) – Column index.
Returns:Corresponding column description.
Return type:str
get_sensor_by_index(i)[source]

Get sensor name by index

Parameters:i (int) – Index to sensor
get_target_color_by_index(i)[source]

Get the color string of target class \(i\).

Parameters:i (int) – Class index.
Returns:Corresponding target class color string.
Return type:str
get_target_description_by_index(i)[source]

Get target description by class index \(i\).

Parameters:i (int) – Class index.
Returns:Corresponding target class description.
Return type:str
get_target_descriptions()[source]

Get list of target descriptions

Returns:List of target class description strings.
Return type:list of str
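
A list of target descriptions is mainly useful for turning integer class indices, such as the output of a classifier, back into readable activity names. The sketch below illustrates this with plain lists. The description list and the predicted indices are hypothetical stand-ins for the values returned by get_target_descriptions() and by a model; only ‘Other_Activity’ is a name mentioned in this documentation.

```python
from collections import Counter

# Hypothetical stand-in for the list returned by get_target_descriptions().
target_descriptions = ["Other_Activity", "Sleep", "Cook"]

# Hypothetical class indices, e.g. as produced by a classifier.
predicted_ids = [1, 1, 0, 2, 1]

# Translate each class index into its description.
predicted_names = [target_descriptions[i] for i in predicted_ids]

# Count how often each activity was predicted.
counts = Counter(predicted_names)
```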
get_weeks_info()[source]

Get splits by week.

Returns:
List of (key, value) tuples, where key is the name of the split and value is the number of items in that split.
Return type:List of tuple
is_bg_target(i=None, label=None)[source]

Check if the target class given by i or label is considered background.

Parameters:
  • i (int) – Class index.
  • label (str) – Class name.
Returns:

True if it is considered background.

Return type:

bool
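
The check can be pictured as a comparison against the stored background target name, using either the class index or the class name. The following is a sketch of that idea only, not the library's code: is_background is a hypothetical function, the target list is made up, and returning False when neither argument is given is a choice made here, not documented behaviour.

```python
def is_background(targets, bg_target, i=None, label=None):
    """Return True if the class given by index `i` or name `label` is background."""
    if i is not None:
        return targets[i] == bg_target
    if label is not None:
        return label == bg_target
    return False  # nothing to check; this fallback is an assumption of the sketch


# Hypothetical target list; 'Other_Activity' is the usual background name.
targets = ["Other_Activity", "Sleep", "Cook"]
bg_target = "Other_Activity"

by_index = is_background(targets, bg_target, i=0)
by_label = is_background(targets, bg_target, label="Sleep")
```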

num_between_splits(start_split=None, stop_split=None)[source]

Get the number of items between splits

Parameters:
  • start_split (str) – Begin of data
  • stop_split (str) – End of data
Returns:

The number of items between two splits.

Return type:

int

num_features()[source]

Get number of features in the dataset

num_sensors()[source]

Return the number of sensors in the sensor list

num_targets()[source]

Total number of target classes.

Returns:Total number of target classes.
Return type:int
set_background_target(target_name)[source]

Set ‘target_name’ as background target

Parameters:target_name (str) – Name of background target