Multi-Class Performance Metrics

In most of the literature, standard multi-class performance metrics are used to evaluate an activity recognition algorithm. The module pyActLearn.performance provides the following functions, which build the confusion matrix and calculate per-class performance as well as overall micro- and macro-averaged performance.

pyActLearn.performance.get_confusion_matrix(num_classes, label, predicted)[source]

Calculate the confusion matrix based on ground truth and predicted labels

Parameters:
  • num_classes (int) – Number of classes
  • label (list of int) – ground truth labels
  • predicted (list of int) – predicted labels
Returns:

Confusion matrix (num_classes by num_classes)

Return type:

numpy.array
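
A minimal sketch of the same computation, assuming integer class labels in the range [0, num_classes) and a matrix with ground truth along the rows and predictions along the columns (the orientation is an assumption rather than something confirmed from the library source):

    import numpy as np

    def confusion_matrix_sketch(num_classes, label, predicted):
        # Count (truth, prediction) pairs into a num_classes x num_classes matrix.
        matrix = np.zeros((num_classes, num_classes))
        for truth, guess in zip(label, predicted):
            matrix[truth][guess] += 1
        return matrix

    # Example: 3 classes, 6 samples
    labels = [0, 0, 1, 1, 2, 2]
    predictions = [0, 1, 1, 1, 2, 0]
    print(confusion_matrix_sketch(3, labels, predictions))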

pyActLearn.performance.get_performance_array(confusion_matrix)[source]

Calculate performance metrics based on the given confusion matrix

[Sokolova2009] provides a detailed analysis for multi-class performance metrics.

Per-class performance metrics (a numpy sketch of these formulas follows the list):

  1. True_Positive: number of samples that belong to class and classified correctly
  2. True_Negative: number of samples that correctly classified as not belonging to class
  3. False_Positive: number of samples that do not belong to the class but are classified as the class
  4. False_Negative: number of samples that belong to the class but are not classified as the class
  5. Accuracy: Overall, how often is the classifier correct? (TP + TN) / (TP + TN + FP + FN)
  6. Misclassification: Overall, how often is it wrong? (FP + FN) / (TP + TN + FP + FN)
  7. Recall: When it’s actually yes, how often does it predict yes? TP / (TP + FN)
  8. False Positive Rate: When it’s actually no, how often does it predict yes? FP / (FP + TN)
  9. Specificity: When it’s actually no, how often does it predict no? TN / (FP + TN)
  10. Precision: When it predicts yes, how often is it correct? TP / (TP + FP)
  11. Prevalence: How often does the yes condition actually occur in our sample? Total(class) / Total(samples)
  12. F(1) Measure: 2 * (precision * recall) / (precision + recall)
  13. G Measure: sqrt(precision * recall)
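
The per-class formulas above can be read directly off the confusion matrix. The following is an illustrative numpy sketch, not the library's implementation; it assumes rows are ground truth and columns are predictions, and it does not guard against division by zero for empty classes:

    import numpy as np

    def per_class_metrics_sketch(cm):
        cm = np.asarray(cm, dtype=float)
        total = cm.sum()
        tp = np.diag(cm)            # belong to the class and classified correctly
        fn = cm.sum(axis=1) - tp    # belong to the class but classified otherwise
        fp = cm.sum(axis=0) - tp    # classified as the class but belong elsewhere
        tn = total - tp - fp - fn   # everything else
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return {
            'accuracy': (tp + tn) / total,
            'misclassification': (fp + fn) / total,
            'recall': recall,
            'false_positive_rate': fp / (fp + tn),
            'specificity': tn / (fp + tn),
            'precision': precision,
            'prevalence': (tp + fn) / total,
            'f1': 2 * precision * recall / (precision + recall),
            'g_measure': np.sqrt(precision * recall),
        }

Each entry in the returned dictionary is a numpy array with one value per class.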

Overall performance metrics for the classifier (see the sketch after this list):

  1. Average Accuracy: The average per-class effectiveness of a classifier
  2. Weighted Accuracy: The average effectiveness of a classifier weighted by the prevalence of each class
  3. Precision (micro): Agreement of the data class labels with those of a classifier, calculated from sums of per-text decisions
  4. Recall (micro): Effectiveness of a classifier to identify class labels, calculated from sums of per-text decisions
  5. F-Score (micro): Relationship between the data's positive labels and those given by a classifier, based on sums of per-text decisions
  6. Precision (macro): An average per-class agreement of the data class labels with those of a classifier
  7. Recall (macro): An average per-class effectiveness of a classifier to identify class labels
  8. F-Score (macro): Relationship between the data's positive labels and those given by a classifier, based on a per-class average
  9. Exact Matching Ratio: The average per-text exact classification
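
A sketch of the micro- and macro-averaged quantities under the same assumptions (illustrative only; the names and the interpretation of weighted accuracy are choices made here, and the library's own return layout may differ):

    import numpy as np

    def overall_metrics_sketch(cm):
        cm = np.asarray(cm, dtype=float)
        total = cm.sum()
        tp = np.diag(cm)
        fn = cm.sum(axis=1) - tp
        fp = cm.sum(axis=0) - tp
        tn = total - tp - fp - fn

        average_accuracy = ((tp + tn) / total).mean()
        # One common interpretation: per-class accuracy weighted by class prevalence.
        weighted_accuracy = np.sum(((tp + tn) / total) * ((tp + fn) / total))

        # Micro averages: pool the per-class counts first, then apply the formulas.
        micro_precision = tp.sum() / (tp.sum() + fp.sum())
        micro_recall = tp.sum() / (tp.sum() + fn.sum())
        micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

        # Macro averages: apply the formulas per class, then average.
        macro_precision = (tp / (tp + fp)).mean()
        macro_recall = (tp / (tp + fn)).mean()
        macro_f1 = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)

        # In single-label multi-class classification every error is one FP and one FN,
        # so fp.sum() == fn.sum() and the micro metrics all equal the exact matching ratio.
        exact_matching_ratio = tp.sum() / total

        return {
            'average_accuracy': average_accuracy,
            'weighted_accuracy': weighted_accuracy,
            'micro_precision': micro_precision,
            'micro_recall': micro_recall,
            'micro_f1': micro_f1,
            'macro_precision': macro_precision,
            'macro_recall': macro_recall,
            'macro_f1': macro_f1,
            'exact_matching_ratio': exact_matching_ratio,
        }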

Note

In multi-class classification, where each input is assigned to one and only one class, every misclassified sample contributes exactly one false positive (for the predicted class) and one false negative (for the true class), so the summed FP and FN counts are equal. Consequently, Micro-Precision == Micro-Recall == Micro-FScore == Exact Matching Ratio.

Parameters:
  • num_classes (int) – Number of classes
  • confusion_matrix (numpy.array) – Confusion Matrix (numpy array of num_class by num_class)
Returns:

Tuple of overall performance and per-class performance arrays

Return type:

tuple of numpy.array
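
Putting the two calls together (a usage sketch; the arguments follow the signatures above, and the unpacking order of the returned tuple is taken from the description rather than verified against the source):

    from pyActLearn.performance import get_confusion_matrix, get_performance_array

    # Ground truth and predictions as integer class indices
    labels = [0, 0, 1, 1, 2, 2]
    predictions = [0, 1, 1, 1, 2, 0]

    cm = get_confusion_matrix(3, labels, predictions)
    # Assumed unpacking order: (overall performance, per-class performance)
    overall, per_class = get_performance_array(cm)
    print(cm)
    print(overall)
    print(per_class)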