Reference Library

ML Building Blocks & Models

A comprehensive reference of the components you need to design production ML systems, from data ingestion and feature engineering to deployment, evaluation, RAG, and LLM ops.

241 Components · 46 Models · 21 Categories
Category

Data Ingestion

6 components

Category

Data Processing

5 components

Category

Feature Engineering

5 components

Category

Model Training

25 components

Train/Test Split

Split data into training, validation, and test sets
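
A minimal sketch of a three-way split, assuming a simple shuffled (non-stratified) split is acceptable. The function name `train_val_test_split` and its parameters are illustrative, not an API from any particular library.

```python
import random

def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of `items` and cut it into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be non-negative and sum to < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100), val_frac=0.2, test_frac=0.1, seed=42)
```

For imbalanced labels or time-ordered data, a stratified or chronological split is usually the safer choice; this sketch does neither.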

Model Training

Train ML model with hyperparameter tuning

Hyperparameter Tuning

Grid search, random search, or Bayesian optimization

Cross-Validation

K-fold, stratified, or time-series cross-validation
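
As a rough illustration of the plain k-fold variant only (stratified and time-series splitting need extra logic not shown here), the index bookkeeping might look like this. `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for plain, unshuffled k-fold CV."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

Each sample lands in exactly one validation fold, so averaging the k validation scores uses every sample once for evaluation.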

Full Fine-tuning

Update all model parameters on task-specific data

LoRA

Low-Rank Adaptation - add small trainable low-rank matrices alongside frozen weights, typically in attention layers
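
The low-rank update can be sketched in plain Python as W + (alpha / r) * B @ A, with W frozen and only A and B trained. This is an illustrative toy on nested lists, assuming the common convention of initialising B to zero; the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

d_out, d_in, r = 8, 8, 2
w = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen base weight
a = [[0.01 * (j + 1) for j in range(d_in)] for _ in range(r)]     # trainable
b = [[0.0] * r for _ in range(d_out)]                             # trainable, zero init

# With B initialised to zero the adapted layer starts identical to the base layer.
w_start = lora_effective_weight(w, a, b, alpha=4)
full_params = d_out * d_in        # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)  # parameters updated by LoRA
```

The saving grows with layer size: the trainable count scales with r * (d_in + d_out) rather than d_in * d_out.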

QLoRA

Quantized LoRA - 4-bit quantization + LoRA for memory efficiency

Adapter Layers

Insert small trainable modules between frozen transformer layers

Prefix Tuning

Learn continuous soft prompts prepended to each layer

Prompt Tuning

Learn task-specific prompt embeddings (input layer only)

IA³

Infused Adapter by Inhibiting and Amplifying Inner Activations - learn rescaling vectors for activations

Instruction Tuning

Fine-tune on instruction-following datasets (e.g., Alpaca, ShareGPT)

RLHF

Reinforcement Learning from Human Feedback with reward model

Reward Modeling

Train reward model from human preference comparisons

DPO

Direct Preference Optimization - optimize the policy directly on preference pairs, without a separately trained reward model or RL loop
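
A sketch of the per-pair DPO loss, assuming the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model are already available. The function and argument names are illustrative.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)]) for one preference pair."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: zero margin, loss equals log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy has shifted probability mass toward the chosen response: lower loss.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

The implicit reward here is the scaled log-ratio between policy and reference, which is why no explicit reward model appears in the loss.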

ORPO

Odds Ratio Preference Optimization - single-stage SFT + preference

Constitutional AI

Self-improvement via AI feedback based on constitutional principles

Feature Extraction

Freeze base model, train only the classification head

Domain Adaptation

Adapt pretrained model to a new domain (e.g., medical, legal)

Continued Pretraining

Further pretrain on domain-specific corpus before fine-tuning

Knowledge Distillation

Train smaller student model to mimic larger teacher model

Multi-Task Learning

Train on multiple tasks simultaneously with shared representations

Transfer Learning

Reuse pretrained model knowledge for new tasks via fine-tuning or feature extraction

Active Learning

Iteratively select most informative samples for labeling to minimize annotation cost

Model Quantization

Reduce model precision (FP32→INT8/INT4) for faster inference and smaller footprint
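
To make the precision reduction concrete, here is a small sketch of symmetric per-tensor INT8 quantization. It assumes a single scale for the whole tensor and round-to-nearest; real toolchains often use per-channel scales and calibration data. The function names are illustrative.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The round-trip error per value is bounded by half a quantization step (scale / 2), which is why small weights such as 0.003 collapse to zero here.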

Category

Evaluation

47 components

Accuracy

Overall classification accuracy (TP+TN)/(Total)

Precision/Recall/F1

Precision, Recall, F1-Score (per class & macro/micro)
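
A minimal sketch of the per-class computation, assuming hard label predictions. Treating each class in turn as `positive` and averaging the results gives the macro variant. The function name is illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this example there are 3 true positives, 1 false positive, and 1 false negative, so all three metrics come out to 0.75.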

Confusion Matrix

Visualize TP, TN, FP, FN across classes

ROC-AUC Curve

Receiver Operating Characteristic & Area Under Curve

PR Curve

Precision-Recall curve (for imbalanced datasets)

Log Loss

Cross-entropy loss for probabilistic predictions

Cohen's Kappa

Agreement metric accounting for chance
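
The chance correction can be sketched as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each rater's label frequencies. This is an unweighted, two-rater sketch with an illustrative function name.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Here the raters agree on 3 of 4 items (p_o = 0.75), but half of that agreement is expected by chance (p_e = 0.5), so kappa is 0.5.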

MAE

Mean Absolute Error

MSE / RMSE

Mean Squared Error / Root MSE

R² Score

Coefficient of determination

MAPE

Mean Absolute Percentage Error

Residual Plot

Visualize prediction residuals

Precision@K

Precision at top K results

Recall@K

Recall at top K results

MAP

Mean Average Precision

MRR

Mean Reciprocal Rank

NDCG

Normalized Discounted Cumulative Gain
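
A short sketch of NDCG@K, assuming graded relevance scores listed in ranked order and the linear-gain form rel / log2(rank + 1). Some implementations use the exponential gain 2^rel - 1 instead, which gives different numbers. The function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items (linear gain)."""
    # enumerate() is 0-based, so position p (1-based) has discount log2(p + 1).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0], k=4)   # already in ideal order
swapped = ndcg_at_k([1, 3], k=2)         # best item ranked second
```

Normalizing by the ideal ordering keeps the score in [0, 1], so queries with different numbers of relevant items can be averaged.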

Hit Rate

Fraction of queries with at least one relevant result

Catalog Coverage

Percentage of items ever recommended

Diversity Score

Intra-list diversity of recommendations

Novelty Score

How unpopular (long-tail) the recommended items are, e.g. their average popularity rank

Serendipity

Unexpected but relevant recommendations

CTR / Conversion

Click-through rate & conversion metrics

BLEU Score

Bilingual Evaluation Understudy (translation/generation)

ROUGE Score

Recall-Oriented Understudy for Gisting Evaluation (summarization)

BERTScore

Semantic similarity using BERT embeddings

Perplexity

Language model quality metric: exponentiated average negative log-likelihood per token (lower is better)
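
As a small worked example, perplexity can be computed from per-token natural-log probabilities as exp(-mean(log p)). The function name and input format are assumptions for illustration.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among 4 options.
uniform_4 = perplexity([math.log(0.25)] * 10)
```

Perplexity values are only comparable between models that share the same tokenizer and evaluation text.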

Faithfulness

Factual consistency with source (RAG)

Answer Relevance

Relevance of generated answer to query (RAG)

IoU / Jaccard

Intersection over Union (detection/segmentation)
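
For the detection case, a minimal sketch of IoU between two axis-aligned boxes, assuming the (x_min, y_min, x_max, y_max) corner format. The function name is illustrative.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes contribute no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = box_iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

For segmentation masks the same ratio is taken over pixel sets rather than box areas.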

mAP (Detection)

Mean Average Precision for object detection

Dice Coefficient

Segmentation overlap metric

PSNR / SSIM

Image quality metrics (generation/super-resolution)

FID Score

Fréchet Inception Distance (generative models)

Silhouette Score

Cluster cohesion and separation

Davies-Bouldin Index

Average similarity of each cluster to its most similar cluster (lower is better)

Calinski-Harabasz

Variance ratio criterion

ARI / NMI

Adjusted Rand Index / Normalized Mutual Info

A/B Test Runner

Statistical A/B test framework

Statistical Significance

P-value and confidence interval calculator

Uplift Model

Incremental impact measurement

K-Means Clustering

Partition-based clustering algorithm minimizing within-cluster variance

PCA (Principal Component Analysis)

Dimensionality reduction via eigendecomposition of covariance matrix

Elbow Method

Heuristic for selecting optimal K in clustering via inertia curve

Context Recall

RAG evaluation metric measuring retrieval completeness against ground truth

Linear Regression

Fundamental regression model fitting linear relationships with OLS

Gradient Boosting (XGBoost/LightGBM)

Ensemble method building sequential trees on residual errors

Category

Data Generation

31 components

Gaussian Generator

Generate data from Gaussian/Normal distribution

GAN Data Generator

Generative Adversarial Network for synthetic data

VAE Generator

Variational Autoencoder for data generation

Diffusion Generator

Diffusion model for high-quality synthetic data

CTGAN

Conditional Tabular GAN for structured data

TVAE

Tabular VAE for synthetic tabular data

Copula Generator

Copula-based synthetic data preserving correlations

Faker Generator

Rule-based fake data (names, addresses, etc.)

Time Series Generator

Synthetic time series (ARIMA, seasonal patterns)

LLM Data Generator

Use LLMs to generate synthetic training data

SMOTE

Synthetic Minority Over-sampling Technique
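
The core SMOTE idea, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in plain Python as follows. This is a simplified illustration for purely numerical features; the function name, defaults, and brute-force neighbour search are assumptions, not a library API.

```python
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic points on segments between minority samples and their neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=5)
```

Because every synthetic point is a convex combination of two real minority points, it always lies inside the region spanned by the minority class. Oversampling should be applied to the training split only, after the data has been split.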

SMOTE-NC

SMOTE for mixed numerical/categorical data

Borderline-SMOTE

SMOTE focusing on borderline samples

ADASYN

Adaptive Synthetic Sampling

Random Oversampler

Simple random duplication of minority class

Random Undersampler

Random removal from majority class

Tomek Links

Remove majority-class samples that form Tomek links (cross-class nearest-neighbour pairs)

Edited Nearest Neighbors

Remove samples misclassified by k-NN

Cluster Centroids

Replace majority class with cluster centroids

NearMiss

Heuristic undersampling based on distance

SMOTE + ENN

SMOTE followed by Edited Nearest Neighbors cleaning

SMOTE + Tomek

SMOTE followed by Tomek links removal

Image Augmentation

Rotate, flip, crop, color jitter, mixup, cutout

Text Augmentation

Synonym replacement, back-translation, EDA

Audio Augmentation

Time stretch, pitch shift, noise injection

Mixup

Convex combination of training examples
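
A minimal sketch of the convex combination, assuming feature vectors and one-hot labels as plain lists and a mixing weight drawn from Beta(alpha, alpha). The function name and the default alpha are illustrative.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels with a Beta-distributed weight."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

x_mix, y_mix, lam = mixup([1.0, 2.0], [1.0, 0.0],
                          [3.0, 6.0], [0.0, 1.0],
                          rng=random.Random(0))
```

The mixed label stays a valid probability distribution because the same weight is applied to inputs and labels.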

CutMix

Cut and paste patches between images

Differential Privacy

Add noise for differential privacy guarantees

Federated Synthesis

Generate synthetic data in federated setting

Condensed Nearest Neighbour (CNN)

Undersampling by finding minimal consistent subset preserving 1-NN boundary

Batch Normalization

Normalize layer inputs across mini-batch for stable deep learning training
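
The training-time computation can be sketched as follows: each feature is standardized with the mini-batch mean and (biased) variance, then rescaled by learnable gamma and beta. This sketch omits the running statistics used at inference time, and the function name is illustrative.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature column of `batch` (list of rows), then scale and shift."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(n_features)]
    return [[gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
             for j in range(n_features)]
            for row in batch]

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

After normalization both columns have roughly zero mean and unit variance despite their different input scales, which is the property that stabilizes training.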

Category

Deployment

6 components

Category

Monitoring

5 components

Category

Storage

4 components

Category

Orchestration

5 components

Category

RAG Pipeline

15 components

Category

LLM Operations

6 components

Category

Agentic Systems

6 components

Category

Multi-Agent

6 components

Category

Vector Databases

6 components

Category

Computer Vision

17 components

Category

NLP

5 components

Category

Responsible AI

6 components

Category

LLM Models

16 components

Category

NLP & Embedding Models

13 components

Category

3D Models

6 components

ML System Design Reference · Built by QnA Lab