Data Generation
31 building blocks and models in the data generation category.
Gaussian Generator
Generate data from Gaussian/Normal distribution
GAN Data Generator
Generative Adversarial Network for synthetic data
VAE Generator
Variational Autoencoder for data generation
Diffusion Generator
Diffusion model for high-quality synthetic data
CTGAN
Conditional Tabular GAN for structured data
TVAE
Tabular VAE for synthetic tabular data
Copula Generator
Copula-based synthetic data preserving correlations
Faker Generator
Rule-based fake data (names, addresses, etc.)
Time Series Generator
Synthetic time series (ARIMA, seasonal patterns)
LLM Data Generator
Use LLMs to generate synthetic training data
SMOTE
Synthetic Minority Over-sampling Technique
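The interpolation idea behind SMOTE can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by this catalog or by any library; the function name `smote_sample` and its parameters `k`, `n_new`, and `seed` are hypothetical.

```python
import math
import random

def smote_sample(minority, k=3, n_new=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority)
```

Every synthetic point lies on a line segment between two existing minority samples, so the new data stays inside the region the minority class already occupies.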
SMOTE-NC
SMOTE for mixed numerical/categorical data
Borderline-SMOTE
SMOTE focusing on borderline samples
ADASYN
Adaptive Synthetic Sampling
Random Oversampler
Simple random duplication of minority class
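Random oversampling can be sketched as drawing existing minority samples with replacement until every class matches the largest one. The helper `random_oversample` below is a hypothetical name used for illustration only.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until all
    classes are as large as the biggest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        extra = rng.choices(pool, k=target - count)  # sampling with replacement
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out

X = [(0,), (1,), (2,), (3,), (9,)]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
```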
Random Undersampler
Random removal from majority class
Tomek Links
Remove majority-class samples that form Tomek links (opposite-class nearest-neighbor pairs)
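A Tomek link is a pair of samples from different classes that are each other's nearest neighbor. The sketch below finds such pairs by brute force and drops the majority-class member; the function names and the `majority_label` parameter are assumptions made for this example.

```python
import math

def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbors
    with different class labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair
            if y[idx] == majority_label}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
y = [0, 1, 0, 0, 0]          # label 0 is the majority class
X_clean, y_clean = remove_majority_in_links(X, y, majority_label=0)
```

Here only samples 0 and 1 form a link, so the majority sample at index 0 is removed and the minority sample is kept.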
Edited Nearest Neighbors
Remove samples whose label disagrees with the prediction of their k nearest neighbors
Cluster Centroids
Replace majority class with cluster centroids
NearMiss
Heuristic undersampling that selects majority samples by their distance to minority samples
SMOTE + ENN
SMOTE followed by Edited Nearest Neighbors cleaning
SMOTE + Tomek
SMOTE followed by Tomek links removal
Image Augmentation
Rotate, flip, crop, color jitter, mixup, cutout
Text Augmentation
Synonym replacement, back-translation, EDA (Easy Data Augmentation)
Audio Augmentation
Time stretch, pitch shift, noise injection
Mixup
Convex combination of training examples
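As a minimal sketch of that convex combination: two inputs and their one-hot labels are blended with the same weight, drawn from a Beta(alpha, alpha) distribution. The function name `mixup`, the default `alpha=0.2`, and the `rng` parameter are assumptions for illustration.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels with a weight
    lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

rng = random.Random(0)
x_mix, y_mix, lam = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 6.0], [0.0, 1.0], rng=rng)
```

Because the labels are mixed with the same weight as the inputs, the resulting soft label still sums to 1.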
CutMix
Cut and paste patches between images
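The patch-and-relabel step can be sketched on small nested lists standing in for images. In practice the box position and size are usually sampled at random; this illustration takes them as explicit arguments, and the name `cutmix` and its parameters are hypothetical.

```python
def cutmix(img_a, img_b, label_a, label_b, top, left, height, width):
    """Paste a height x width patch of img_b into img_a and mix the labels
    in proportion to the pasted area."""
    rows, cols = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]          # copy so img_a is untouched
    bottom = min(top + height, rows)           # clip the box to the image
    right = min(left + width, cols)
    for r in range(top, bottom):
        for c in range(left, right):
            mixed[r][c] = img_b[r][c]
    pasted = (bottom - top) * (right - left)
    lam = 1 - pasted / (rows * cols)           # share of img_a that remains
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, label

img_a = [[0] * 4 for _ in range(4)]
img_b = [[1] * 4 for _ in range(4)]
mixed, label = cutmix(img_a, img_b, [1.0, 0.0], [0.0, 1.0],
                      top=1, left=1, height=2, width=2)
```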
Differential Privacy
Add calibrated noise to provide differential privacy guarantees
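One common way to add such noise is the Laplace mechanism, sketched here for a counting query. This is an assumed example and not necessarily the mechanism this building block uses; `laplace_noise` and `private_count` are hypothetical names.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(values, predicate, epsilon, rng):
    """Epsilon-DP count: a counting query has sensitivity 1, so the
    Laplace mechanism adds noise with scale 1 / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 61, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means a larger noise scale: stronger privacy at the cost of a less accurate answer.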
Federated Synthesis
Generate synthetic data in federated setting
Condensed Nearest Neighbour (CNN)
Undersampling that finds a small consistent subset preserving the 1-NN decision boundary
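The condensing rule can be sketched as follows: start from one stored sample, add every sample the current store misclassifies with 1-NN, and repeat until a full pass adds nothing. This is a generic illustration of Hart's rule; library undersamplers typically seed the store differently (for example with all minority samples), and the name `condensed_nn` is hypothetical.

```python
import math

def condensed_nn(X, y):
    """Hart's condensing rule: keep adding samples that the current store
    misclassifies with 1-NN until a full pass adds nothing."""
    store = [0]                                   # seed with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in store:
                continue
            nearest = min(store, key=lambda j: math.dist(X[i], X[j]))
            if y[nearest] != y[i]:                # misclassified, so keep it
                store.append(i)
                changed = True
    return sorted(store)

X = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
y = [0, 0, 0, 1, 1, 1]
kept = condensed_nn(X, y)
```

The result is consistent (every original sample is classified correctly by the kept subset) but depends on sample order and is not guaranteed to be the smallest such subset.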
Batch Normalization
Normalize layer inputs across each mini-batch to stabilize deep learning training
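The normalization step can be sketched for a small batch of feature vectors. This illustration covers training-mode statistics only (no running averages for inference) and uses scalar `gamma` and `beta` for brevity; the function name `batch_norm` is an assumption.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the mini-batch, then scale by
    gamma and shift by beta."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n   # biased variance
                 for col, m in zip(columns, means)]
    return [[gamma * (v - m) / math.sqrt(var + eps) + beta
             for v, m, var in zip(row, means, variances)]
            for row in batch]

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch)
```

After normalization each column has mean close to 0 and variance close to 1, regardless of the original feature scale.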