Event-based Scoring

[Minnen2006] and [Ward2011] proposed a set of performance metrics and visualizations for continuous activity recognition. In both papers, the authors examined the issues specific to continuous activity recognition and argued that the traditional multi-class evaluation methods fail to capture common artefacts found in continuous AR.

In both papers, the false positives and false negatives are further divided into six categories to faithfully capture the nature of those errors in the context of continuous AR.

Whenever an error occurs, it is both a false positive with respect to the prediction label and a false negative with respect to the ground truth label.

False positive errors are divided into the following three categories:

  • Insertion (I): A FP that corresponds exactly to an inserted return.
  • Merge (M): A FP that occurs between two TP segments within a merge return.
  • Overfill (O): A FP that occurs at the start or end of a partially matched return.

False negative errors are divided into the following three categories:

  • Deletion (D): A FN that corresponds exactly to a deleted event.
  • Fragmenting (F): A FN that occurs between two TP segments within a fragmented event.
  • Underfill (U): A FN that occurs at the start or end of a detected event.
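To make the six categories concrete, the toy binary sequence below (a hypothetical example, not taken from either paper) marks where each error type would appear.

    # Hypothetical binary streams: 1 marks the target activity, 0 marks background.
    truth      = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
    prediction = [0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

    # Index 1:      FP before a matched event               -> Overfill (O)
    # Indices 2-3:  matched                                 -> TP
    # Index 4:      FN splitting one truth event in two     -> Fragmenting (F)
    # Index 5:      matched                                 -> TP
    # Index 6:      FP after a matched event                -> Overfill (O)
    # Indices 8-9:  truth event with no overlapping return  -> Deletion (D)
    # Index 11:     predicted event with no truth overlap   -> Insertion (I)
    # Index 12:     FN at the start of a detected event     -> Underfill (U)
    # Indices 13, 15-16: matched                            -> TP
    # Index 14:     FP joining two matched truth events     -> Merge (M)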

API Reference

pyActLearn.performance.event.score_segment(truth, prediction, bg_label=-1)[source]

Score Segments

According to [Minnen2006] and [Ward2011], a segment is defined as the largest part of an event on which the comparison between the ground truth and the output of the recognition system can be made in an unambiguous way. However, in this piece of code we drop the requirement that a segment be the largest such part of an event: as long as there is a match between the prediction and the ground truth, it is recognized as a segment.

There are four possible outcomes to be scored: TP, TN, FP and FN. In event-based performance scoring, the FP and FN are further divided into the following cases:

  • Insertion (I): A FP that corresponds exactly to an inserted return.
  • Merge (M): A FP that occurs between two TP segments within a merge return.
  • Overfill (O): A FP that occurs at the start or end of a partially matched return.
  • Deletion (D): A FN that corresponds exactly to a deleted event.
  • Fragmenting (F): A FN that occurs between two TP segments within a fragmented event.
  • Underfill (U): A FN that occurs at the start or end of a detected event.
Parameters:
  • truth (numpy.ndarray) – Ground truth array, shape (num_samples, )
  • prediction (numpy.ndarray) – Prediction array, shape (num_samples, )
  • bg_label (int) – Label of the background class (default: -1)
Returns:

An array with truth and event-based scoring labels

Return type:

numpy.ndarray
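A minimal usage sketch is shown below. The label values are hypothetical, and calling score_segment a second time with the arguments swapped to obtain the scoring with respect to the prediction is an assumption made for illustration; the exact encoding of the returned scoring labels is defined by the library.

    import numpy as np
    from pyActLearn.performance.event import score_segment

    # Hypothetical label streams: -1 is background, 0 and 1 are activity classes.
    truth      = np.array([-1,  0,  0,  0, -1, -1,  1,  1, -1, -1])
    prediction = np.array([-1, -1,  0,  0,  0, -1, -1,  1,  1, -1])

    # Scoring with respect to the ground truth (FNs split into D/F/U).
    truth_scoring = score_segment(truth, prediction, bg_label=-1)

    # Scoring with respect to the prediction (FPs split into I/M/O); swapping
    # the arguments here is an assumption, not documented behaviour.
    prediction_scoring = score_segment(prediction, truth, bg_label=-1)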

pyActLearn.performance.event.per_class_event_scoring(num_classes, truth, prediction, truth_scoring, prediction_scoring)[source]

Create per-class event scoring to identify the contribution of event-based errors to the traditional recall and precision measures.

Instead of building an event analysis diagram (EAD) as proposed in the previous two papers, we look at Recall and Precision separately.

Recall is defined as TP/(TP + FN). In other words, how often does the system predict yes when the actual label is yes? The false negative errors, namely Deletion, Fragmenting, and Underfill, add up to the FN count. A Deletion means a total miss of an activity. An Underfill represents an error at the begin or end boundary of an event. A Fragmenting represents a glitch in the prediction.

Precision is defined as TP/(TP + FP). In other words, how often is the actual label yes when the system predicts yes? The false positive errors, namely Insertion, Merge, and Overfill, add up to the FP count. In the task of ADL recognition, an Insertion may be caused by human error in labeling. An Overfill represents a disagreement about the begin/end boundary of an activity, while a Merge is a glitch in the prediction.
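As a quick numeric illustration (the counts below are made up), the tallies show how the event-based error categories feed the two denominators:

    # Hypothetical per-class counts for one activity class.
    tp = 80
    deletion, fragmenting, underfill = 5, 3, 12   # false-negative categories
    insertion, merge, overfill = 4, 2, 14         # false-positive categories

    fn = deletion + fragmenting + underfill       # 20
    fp = insertion + merge + overfill             # 20

    recall = tp / (tp + fn)        # 80 / 100 = 0.80
    precision = tp / (tp + fp)     # 80 / 100 = 0.80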

The function goes through the scoring of the prediction and the ground truth and returns two summaries that capture the contribution of each of those errors to the Recall and Precision scores.

Parameters:
  • num_classes (int) – Total number of target classes
  • truth (numpy.ndarray) – Ground truth array, shape (num_samples, )
  • prediction (numpy.ndarray) – Prediction array, shape (num_samples, )
  • truth_scoring (numpy.ndarray) – Event scoring with respect to ground truth labels (i.e. false negatives are further divided into Deletion, Fragmenting, and Underfill). The information in this array is used to fill Recall measurement.
  • prediction_scoring (numpy.ndarray) – Event scoring with respect to prediction labels (i.e. false positives are further divided into Insertion, Merging and Overfill). The information in this array is used to fill Precision measurement.
Returns:

Tuple of event-based scoring summaries for recall and precision. Each summary array has a shape of (num_classes, ).

Return type:

tuple of numpy.ndarray
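A usage sketch under the same assumptions as the score_segment example above (hypothetical labels, scoring arrays obtained by calling score_segment in both directions):

    import numpy as np
    from pyActLearn.performance.event import score_segment, per_class_event_scoring

    num_classes = 2  # hypothetical: activity classes 0 and 1, background is -1

    truth      = np.array([-1,  0,  0,  0, -1, -1,  1,  1, -1, -1])
    prediction = np.array([-1, -1,  0,  0,  0, -1, -1,  1,  1, -1])

    # Swapping the arguments to obtain the prediction-side scoring is an
    # assumption made for illustration.
    truth_scoring = score_segment(truth, prediction, bg_label=-1)
    prediction_scoring = score_segment(prediction, truth, bg_label=-1)

    # Each summary has shape (num_classes, ) and attributes the per-class
    # errors to the recall and precision measurements respectively.
    recall_summary, precision_summary = per_class_event_scoring(
        num_classes, truth, prediction, truth_scoring, prediction_scoring)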

pyActLearn.performance.event.per_class_segment_scoring(num_classes, truth, prediction, truth_scoring, prediction_scoring)[source]

Create per-class event scoring to identify the contribution of event-based errors to the traditional recall and precision measures. The counts are based on event segments instead of individual sensor events.

Parameters:
  • num_classes (int) – Total number of target classes
  • truth (numpy.ndarray) – Ground truth array, shape (num_samples, )
  • prediction (numpy.ndarray) – Prediction array, shape (num_samples, )
  • truth_scoring (numpy.ndarray) – Event scoring with respect to ground truth labels (i.e. false negatives are further divided into Deletion, Fragmenting, and Underfill). The information in this array is used to fill Recall measurement.
  • prediction_scoring (numpy.ndarray) – Event scoring with respect to prediction labels (i.e. false positives are further divided into Insertion, Merging and Overfill). The information in this array is used to fill Precision measurement.
Returns:

Tuple of event-based scoring summaries for recall and precision. Each summary array has a shape of (num_classes, ).

Return type:

tuple of numpy.ndarray
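Continuing the per_class_event_scoring sketch above, the call is identical; only the counting granularity changes:

    from pyActLearn.performance.event import per_class_segment_scoring

    # Same inputs as the per_class_event_scoring sketch; tallies are made per
    # event segment rather than per individual sensor event.
    recall_summary, precision_summary = per_class_segment_scoring(
        num_classes, truth, prediction, truth_scoring, prediction_scoring)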