Multi-Class Performance Metrics

In most of the literature, standard multi-class performance metrics are used to evaluate an activity recognition algorithm. The module pyActLearn.performance provides the following functions, which build the confusion matrix and calculate per-class performance as well as overall micro- and macro-averaged performance.

pyActLearn.performance.get_confusion_matrix(num_classes, label, predicted)[source]

Calculate the confusion matrix based on ground truth and predicted labels

Parameters:
  • num_classes (int) – Number of classes
  • label (list of int) – ground truth labels
  • predicted (list of int) – predicted labels
Returns:

Confusion matrix (num_classes by num_classes)

Return type:

numpy.array
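
A minimal sketch of the same computation, assuming integer class labels in the range [0, num_classes) and a matrix with ground truth along the rows and predictions along the columns (the orientation is an assumption rather than something confirmed from the library source):

    import numpy as np

    def confusion_matrix_sketch(num_classes, label, predicted):
        # Count (truth, prediction) pairs into a num_classes x num_classes matrix.
        matrix = np.zeros((num_classes, num_classes))
        for truth, guess in zip(label, predicted):
            matrix[truth][guess] += 1
        return matrix

    # Example: 3 classes, 6 samples
    labels = [0, 0, 1, 1, 2, 2]
    predictions = [0, 1, 1, 1, 2, 0]
    print(confusion_matrix_sketch(3, labels, predictions))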

pyActLearn.performance.get_performance_array(confusion_matrix)[source]

Calculate performance metrics based on the given confusion matrix

[Sokolova2009] provides a detailed analysis for multi-class performance metrics.

Per-class performance metrics (a numpy sketch of these formulas follows the list):

  1. True_Positive: number of samples that belong to class and classified correctly
  2. True_Negative: number of samples that correctly classified as not belonging to class
  3. False_Positive: number of samples that do not belong to the class but are classified as the class
  4. False_Negative: number of samples that belong to the class but are not classified as the class
  5. Accuracy: Overall, how often is the classifier correct? (TP + TN) / (TP + TN + FP + FN)
  6. Misclassification: Overall, how often is it wrong? (FP + FN) / (TP + TN + FP + FN)
  7. Recall: When it’s actually yes, how often does it predict yes? TP / (TP + FN)
  8. False Positive Rate: When it’s actually no, how often does it predict yes? FP / (FP + TN)
  9. Specificity: When it’s actually no, how often does it predict no? TN / (FP + TN)
  10. Precision: When it predicts yes, how often is it correct? TP / (TP + FP)
  11. Prevalence: How often does the yes condition actually occur in our sample? Total(class) / Total(samples)
  12. F(1) Measure: 2 * (precision * recall) / (precision + recall)
  13. G Measure: sqrt(precision * recall)
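
The per-class formulas above can be read directly off the confusion matrix. The following is an illustrative numpy sketch, not the library's implementation; it assumes rows are ground truth and columns are predictions, and it does not guard against division by zero for empty classes:

    import numpy as np

    def per_class_metrics_sketch(cm):
        cm = np.asarray(cm, dtype=float)
        total = cm.sum()
        tp = np.diag(cm)            # belong to the class and classified correctly
        fn = cm.sum(axis=1) - tp    # belong to the class but classified otherwise
        fp = cm.sum(axis=0) - tp    # classified as the class but belong elsewhere
        tn = total - tp - fp - fn   # everything else
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return {
            'accuracy': (tp + tn) / total,
            'misclassification': (fp + fn) / total,
            'recall': recall,
            'false_positive_rate': fp / (fp + tn),
            'specificity': tn / (fp + tn),
            'precision': precision,
            'prevalence': (tp + fn) / total,
            'f1': 2 * precision * recall / (precision + recall),
            'g_measure': np.sqrt(precision * recall),
        }

Each entry in the returned dictionary is a numpy array with one value per class.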

Overall performance metrics for the classifier (see the sketch after this list):

  1. Average Accuracy: The average per-class effectiveness of a classifier
  2. Weighted Accuracy: The average effectiveness of a classifier weighted by the prevalence of each class
  3. Precision (micro): Agreement of the data class labels with those of a classifier, calculated from sums of per-text decisions
  4. Recall (micro): Effectiveness of a classifier to identify class labels, calculated from sums of per-text decisions
  5. F-Score (micro): Relationship between the data's positive labels and those given by a classifier, based on sums of per-text decisions
  6. Precision (macro): An average per-class agreement of the data class labels with those of a classifier
  7. Recall (macro): An average per-class effectiveness of a classifier to identify class labels
  8. F-Score (macro): Relationship between the data's positive labels and those given by a classifier, based on a per-class average
  9. Exact Matching Ratio: The average per-text exact classification
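
A sketch of the micro- and macro-averaged quantities under the same assumptions (illustrative only; the names and the interpretation of weighted accuracy are choices made here, and the library's own return layout may differ):

    import numpy as np

    def overall_metrics_sketch(cm):
        cm = np.asarray(cm, dtype=float)
        total = cm.sum()
        tp = np.diag(cm)
        fn = cm.sum(axis=1) - tp
        fp = cm.sum(axis=0) - tp
        tn = total - tp - fp - fn

        average_accuracy = ((tp + tn) / total).mean()
        # One common interpretation: per-class accuracy weighted by class prevalence.
        weighted_accuracy = np.sum(((tp + tn) / total) * ((tp + fn) / total))

        # Micro averages: pool the per-class counts first, then apply the formulas.
        micro_precision = tp.sum() / (tp.sum() + fp.sum())
        micro_recall = tp.sum() / (tp.sum() + fn.sum())
        micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

        # Macro averages: apply the formulas per class, then average.
        macro_precision = (tp / (tp + fp)).mean()
        macro_recall = (tp / (tp + fn)).mean()
        macro_f1 = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)

        # In single-label multi-class classification every error is one FP and one FN,
        # so fp.sum() == fn.sum() and the micro metrics all equal the exact matching ratio.
        exact_matching_ratio = tp.sum() / total

        return {
            'average_accuracy': average_accuracy,
            'weighted_accuracy': weighted_accuracy,
            'micro_precision': micro_precision,
            'micro_recall': micro_recall,
            'micro_f1': micro_f1,
            'macro_precision': macro_precision,
            'macro_recall': macro_recall,
            'macro_f1': macro_f1,
            'exact_matching_ratio': exact_matching_ratio,
        }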

Note

In multi-class classification, where each input is assigned to one and only one class, every misclassified sample contributes exactly one false positive (for the predicted class) and one false negative (for the true class), so the summed FP and FN counts are equal. Consequently, Micro-Precision == Micro-Recall == Micro-FScore == Exact Matching Ratio.

Parameters:
  • num_classes (int) – Number of classes
  • confusion_matrix (numpy.array) – Confusion Matrix (numpy array of num_class by num_class)
Returns:

Tuple of overall performance and per-class performance arrays

Return type:

tuple of numpy.array
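
Putting the two calls together (a usage sketch; the arguments follow the signatures above, and the unpacking order of the returned tuple is taken from the description rather than verified against the source):

    from pyActLearn.performance import get_confusion_matrix, get_performance_array

    # Ground truth and predictions as integer class indices
    labels = [0, 0, 1, 1, 2, 2]
    predictions = [0, 1, 1, 1, 2, 0]

    cm = get_confusion_matrix(3, labels, predictions)
    # Assumed unpacking order: (overall performance, per-class performance)
    overall, per_class = get_performance_array(cm)
    print(cm)
    print(overall)
    print(per_class)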