Evaluation
47 building blocks and models in the evaluation category.
Accuracy
Overall classification accuracy (TP+TN)/(Total)
Precision/Recall/F1
Precision, Recall, F1-Score (per class & macro/micro)
Confusion Matrix
Visualize TP, TN, FP, FN across classes
ROC-AUC Curve
Receiver Operating Characteristic & Area Under Curve
PR Curve
Precision-Recall curve (for imbalanced datasets)
Log Loss
Cross-entropy loss for probabilistic predictions
Cohen's Kappa
Agreement metric accounting for chance
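As a rough illustration of how the classification entries above relate to one another, the following sketch derives accuracy, precision, recall, F1, and Cohen's kappa from the four confusion-matrix counts for a binary problem. The function names and the `positive` parameter are hypothetical, chosen for this example only; they are not part of any block in the catalog.

```python
def binary_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Derive common metrics from the confusion counts."""
    tp, tn, fp, fn = binary_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Ratios with an empty denominator are reported as 0.0 (a convention, not a rule).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (total * total)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}
```

For multi-class problems the same counts would be computed per class and then averaged (macro) or pooled (micro).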
MAE
Mean Absolute Error
MSE / RMSE
Mean Squared Error / Root MSE
R² Score
Coefficient of determination
MAPE
Mean Absolute Percentage Error
Residual Plot
Visualize prediction residuals
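The regression entries above share the same raw ingredient, the per-sample error. A minimal sketch, assuming plain Python lists of equal length and a hypothetical function name, might compute them together like this:

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R^2 and MAPE (hypothetical helper)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # R^2 = 1 - SS_res / SS_tot; undefined when the targets are constant.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    # MAPE is undefined for zero targets; such samples are skipped here,
    # which is an assumption of this sketch rather than a standard.
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "mape": mape}
```

The `errors` list is also what a residual plot would place on its vertical axis, against the predictions or an input feature.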
Precision@K
Precision at top K results
Recall@K
Recall at top K results
MAP
Mean Average Precision
MRR
Mean Reciprocal Rank
NDCG
Normalized Discounted Cumulative Gain
Hit Rate
Fraction of queries with at least one relevant result
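The ranking entries above can be sketched for a single query as follows, assuming binary relevance, a ranked list of item IDs, and a set of relevant IDs. All function names are hypothetical. MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over all queries, and hit rate is the fraction of queries with a non-zero reciprocal rank within the cutoff.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Share of the top-k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Share of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each relevant hit."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked, relevant, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```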
Catalog Coverage
Percentage of items ever recommended
Diversity Score
Intra-list diversity of recommendations
Novelty Score
Average popularity rank of recommended items (less popular items count as more novel)
Serendipity
Unexpected but relevant recommendations
CTR / Conversion
Click-through rate & conversion metrics
BLEU Score
Bilingual Evaluation Understudy (translation/generation)
ROUGE Score
Recall-Oriented Understudy for Gisting Evaluation (summarization)
BERTScore
Semantic similarity using BERT embeddings
Perplexity
Exponentiated average negative log-likelihood per token (lower is better)
Faithfulness
Factual consistency with source (RAG)
Answer Relevance
Relevance of generated answer to query (RAG)
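Of the text-generation entries above, perplexity has the most compact definition: the exponential of the average negative log-probability the model assigned to each observed token. A minimal sketch, assuming the per-token probabilities are already available as a list (the function name is hypothetical):

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities assigned to each observed token."""
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)
```

A model that assigns probability 0.25 to every token has a perplexity of 4, which can be read as being as uncertain as a uniform choice among four options.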
IoU / Jaccard
Intersection over Union (detection/segmentation)
mAP (Detection)
Mean Average Precision for object detection
Dice Coefficient
Segmentation overlap metric, 2|A∩B| / (|A| + |B|)
PSNR / SSIM
Image quality metrics (generation/super-resolution)
FID Score
Fréchet Inception Distance (generative models)
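The IoU / Jaccard and Dice Coefficient entries above measure the same overlap on different scales. The sketch below computes both for two flattened binary masks of equal length; the function name and the convention of returning 1.0 for two empty masks are assumptions of this example.

```python
def iou_and_dice(mask_a, mask_b):
    """Return (IoU, Dice) for two flat binary masks of equal length."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    union = size_a + size_b - intersection

    # Two empty masks are treated as a perfect match (assumed convention).
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / (size_a + size_b) if (size_a + size_b) else 1.0
    return iou, dice
```

The two values are monotonically related by Dice = 2 · IoU / (1 + IoU), so they always rank a set of predictions in the same order.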
Silhouette Score
Cluster cohesion and separation
Davies-Bouldin Index
Average similarity of each cluster to its most similar cluster (lower is better)
Calinski-Harabasz
Variance ratio criterion
ARI / NMI
Adjusted Rand Index / Normalized Mutual Info
A/B Test Runner
Statistical A/B test framework
Statistical Significance
P-value and confidence interval calculator
Uplift Model
Incremental impact measurement
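As one example of what the A/B Test Runner and Statistical Significance entries above might compute, the following sketch runs a pooled two-proportion z-test on conversion counts from a control and a variant. This is an assumed choice of test, not necessarily the one those blocks use, and the function and parameter names are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (every or no user converted): nothing to test.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The normal approximation is reasonable only for fairly large samples; with small counts an exact test would be the safer choice.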
K-Means Clustering
Partition-based clustering algorithm minimizing within-cluster variance
PCA (Principal Component Analysis)
Dimensionality reduction via eigendecomposition of covariance matrix
Elbow Method
Heuristic for selecting optimal K in clustering via inertia curve
Context Recall
RAG evaluation metric measuring retrieval completeness against ground truth
Linear Regression
Fundamental regression model fitting linear relationships with OLS
Gradient Boosting (XGBoost/LightGBM)
Ensemble method building sequential trees on residual errors