MAPE in Machine Learning
Mean Absolute Percentage Error (MAPE) is the most universally recognized percentage-based metric for evaluating regression and forecasting models. Its appeal is immediate: MAPE expresses prediction error as a percentage, making it scale-free and instantly interpretable by technical and non-technical audiences alike. When a demand planner hears that the forecast has 12% MAPE, they know exactly what it means without asking about units, scales, or baselines.
But MAPE carries a set of well-documented pitfalls that trip up even experienced practitioners. It explodes when actual values are zero or near-zero, it penalizes over-predictions more heavily than under-predictions (a subtle asymmetry first noted by Armstrong in 1985), and it has no upper bound on the high side while being capped at 100% on the low side. These quirks have spawned a family of variants -- SMAPE, WMAPE, MAAPE -- each attempting to patch a specific weakness.
Despite these limitations, MAPE remains the de facto standard for forecast accuracy in supply chain, retail demand planning, energy forecasting, and financial projections across the globe. From Flipkart's inventory optimization during Big Billion Days to Swiggy Instamart's daily demand prediction for perishable goods, Indian companies routinely track MAPE as a primary KPI. Understanding when MAPE works, when it misleads, and which variant to reach for is essential knowledge for any ML engineer working on regression or time series problems.
Concept Snapshot
- What It Is
- A scale-free regression metric that expresses the average prediction error as a percentage of the actual values, enabling cross-series and cross-product comparisons.
- Category
- Evaluation
- Complexity
- Beginner
- Inputs / Outputs
- Predictions array + ground truth array (non-zero actuals) → single MAPE score (non-negative percentage, 0% is perfect)
- System Placement
- Post-prediction evaluation stage in training, validation, production monitoring, and demand planning dashboards
- Also Known As
- Mean Absolute Percent Error, Percentage Error, APE (when not averaged), Mean Percentage Deviation
- Typical Users
- Data Scientists, ML Engineers, Demand Planners, Supply Chain Analysts, Business Analysts, Product Managers
- Prerequisites
- Basic statistics (mean, absolute value, percentages), Understanding of regression tasks, Familiarity with MAE (Mean Absolute Error)
- Key Terms
- percentage errorabsolute percentage errorasymmetry biasSMAPEWMAPEMAAPEscale-free metricdemand forecastingzero-value problemforecast accuracy
Why This Concept Exists
The Scale Problem That MAE Cannot Solve
Imagine you are managing two product lines at Flipkart. Product A sells for ₹50,000 (a premium laptop) and Product B sells for ₹500 (a phone case). Your demand forecasting model has an MAE of 2 units for both products. Is the model equally good for both? Absolutely not. Being off by 2 units when the actual demand is 10 laptops (20% error) is far worse than being off by 2 units when actual demand is 2,000 phone cases (0.1% error). MAE, being an absolute metric, cannot distinguish between these situations.
MAPE solves this by normalizing the error against the actual value, converting every prediction error into a percentage. Now Product A shows 20% MAPE and Product B shows 0.1% MAPE -- immediately revealing where attention is needed. This normalization is what makes MAPE scale-free: you can compare forecast accuracy across products, time series, geographies, and business units using a single, universal yardstick.
Historical Roots in Business Forecasting
MAPE's dominance in industry predates machine learning by decades. Supply chain managers, operations researchers, and demand planners adopted percentage-based accuracy measures in the 1970s and 1980s because their work inherently involved comparing forecasts across thousands of SKUs with wildly different volume scales. When Makridakis organized the first M-Competition in 1982 to systematically compare forecasting methods, percentage-based metrics including MAPE featured prominently in the evaluation framework.
The academic community quickly identified problems with MAPE. Armstrong (1985) was the first to formally document MAPE's asymmetry, noting that "it has a bias favoring estimates that are below the actual values." This observation triggered a long line of research into improved variants: Makridakis proposed SMAPE (Symmetric MAPE) in 1993 for the M3 competition, Hyndman and Koehler (2006) advocated abandoning percentage-based metrics entirely in favor of MASE (Mean Absolute Scaled Error), and Kim and Kim (2016) introduced MAAPE (Mean Arctangent Absolute Percentage Error) as a bounded alternative.
Why It Persists Despite Known Flaws
Despite four decades of criticism, MAPE remains the most widely used forecast accuracy metric in industry. The reason is simple: communication trumps mathematical purity. When a VP of Supply Chain asks "how accurate is our forecast?", the answer "12% MAPE" is immediately actionable. It means: on average, we are off by 12%. No further explanation needed. Try explaining MASE or RMSSE to the same audience and watch their eyes glaze over.
This tension between statistical rigor and practical communication is at the heart of MAPE's endurance. As an ML engineer, your job is not to avoid MAPE entirely, but to understand its failure modes, know when to use variants, and always complement it with at least one metric that does not share its weaknesses.
Core Intuition & Mental Model
The Simplest Way to Think About MAPE
MAPE answers one question: "On average, what percentage of the actual value did my predictions miss by?" If your delivery time model predicts 45 minutes for a trip that actually takes 50 minutes, the absolute percentage error for that single prediction is . MAPE is simply the average of these percentage errors across all predictions.
The percentage framing is what gives MAPE its power. An error of ₹1,000 means nothing without context. Is it ₹1,000 on a ₹10,000 product (10% -- concerning) or ₹1,000 on a ₹10,00,000 property (0.1% -- excellent)? MAPE embeds this context automatically.
The Asymmetry You Must Understand
Here is the single most important thing about MAPE that most tutorials gloss over: MAPE penalizes over-predictions more than under-predictions of the same magnitude. This is not a minor technicality -- it can systematically bias your model selection and your forecasts.
Consider two scenarios with the same absolute error of 50:
- Actual = 150, Predicted = 100 (under-predict by 50): APE = 50/150 = 33.3%
- Actual = 100, Predicted = 150 (over-predict by 50): APE = 50/100 = 50.0%
Same absolute miss, but MAPE punishes the over-prediction 50% more harshly. Why? Because the denominator is the actual value. When the actual is large, dividing by it shrinks the percentage. When the actual is small, the same absolute error becomes a larger percentage. This means MAPE systematically favors models that under-predict, which can be disastrous for inventory planning (you will consistently stock too little) or capacity planning (you will consistently under-provision).
The Zero Problem
The other critical intuition: MAPE divides by the actual value. If the actual value is zero, you get division by zero -- the metric is undefined. If the actual value is very small (say, 0.001), even a tiny absolute error produces an astronomically large percentage error, potentially thousands of percent. This makes MAPE completely unreliable for data with zeros or near-zero values, which is extremely common in real-world scenarios like intermittent demand (products that sell zero units on many days), rare event prediction, or any dataset with a natural lower bound near zero.
Rule of thumb: If more than 5% of your actual values are zero or near-zero, do not use MAPE. Reach for WMAPE, SMAPE, or MAE instead.
Technical Foundations
The Standard MAPE Formula
MAPE is defined as:
where:
- is the number of samples
- is the actual (true) value for sample (must be non-zero)
- is the predicted value for sample
- The result is expressed as a percentage
Note: Some implementations (including scikit-learn) omit the factor, returning MAPE as a decimal in rather than a percentage. Always check the convention.
Properties
- Range: , with 0% indicating perfect predictions. There is no finite upper bound.
- Scale-free: MAPE is unitless (a percentage), enabling comparison across datasets with different target scales.
- Asymmetric: Under-predictions have APE bounded by 100% (when ), but over-predictions have unbounded APE. This creates a systematic bias favoring under-prediction.
- Non-negative: Absolute values ensure all individual errors are non-negative.
- Undefined at zero: is undefined when any .
- Non-differentiable: Like MAE, the absolute value function introduces non-differentiability at zero error.
The Asymmetry Formally
Let be the absolute error. Then:
- Under-prediction (): , which is bounded by (100%) when
- Over-prediction (): , which can exceed 100% without limit as
This asymmetry means MAPE-minimizing models will systematically under-forecast.
Key Variants
Symmetric MAPE (SMAPE):
SMAPE bounds the denominator by using the average of actual and predicted values, making it symmetric in the sense that swapping and gives the same result. SMAPE ranges from 0% to 200%. However, Hyndman (2014) showed that SMAPE is still not truly symmetric and favors over-forecasting (the opposite bias from MAPE).
Weighted MAPE (WMAPE):
WMAPE weights each observation's error by its share of total actual volume. High-volume items dominate the metric, low-volume items have negligible impact. This is equivalent to MAE divided by mean actual value. WMAPE avoids the zero-denominator problem when individual actuals are zero (as long as the sum is non-zero) and naturally handles the asymmetry issue because it is computed as a ratio of aggregated sums rather than an average of ratios.
Mean Arctangent Absolute Percentage Error (MAAPE):
MAPE replaced by its arctangent, which bounds the error to , eliminating the infinity problem when actuals are zero or near-zero. Proposed by Kim and Kim (2016).
Computational Complexity
MAPE requires operations: one subtraction, one absolute value, one division per sample, plus final averaging. Identical computational cost to MAE with one additional division per element.
Internal Architecture
The MAPE computation pipeline extends the standard MAE architecture with a normalization step that divides each absolute error by its corresponding actual value. This seemingly simple addition introduces significant complexity around zero handling, numerical stability, and variant selection.

The critical architectural difference from MAE is the zero-check gate after absolute error computation. Production systems must decide how to handle zero or near-zero actuals: skip them (biasing the metric toward non-zero observations), clip to a minimum denominator (introducing an arbitrary threshold), or switch to a variant like WMAPE that avoids per-element division entirely.
Key Components
Input Validator
Validates that y_true and y_pred have matching shapes, checks for NaN/Inf values, and flags zero or near-zero values in y_true. Emits warnings or errors based on the configured zero-handling policy.
Residual Computer
Performs element-wise subtraction to compute raw residuals. Same as MAE -- this step is shared across all error metrics.
Absolute Value Transformer
Applies to each residual, converting all errors to non-negative values. For SMAPE, also computes for the symmetric denominator.
Zero-Value Gate
The critical component unique to MAPE. Inspects each and routes computation: if or , applies the configured handling strategy (skip, clip, replace with fallback metric). Logs the count of zero-handled samples for monitoring.
Percentage Normalizer
Divides each absolute error by its corresponding actual value: . For WMAPE, replaces per-element division with aggregate division: . For SMAPE, uses as the denominator.
Aggregator
Averages percentage errors across all valid samples (excluding zeros if configured to skip). Multiplies by 100 to convert to percentage form. For WMAPE, computes the ratio of aggregated sums directly.
Output Formatter
Returns MAPE as a scalar percentage with metadata: sample count, number of zero-excluded observations, the variant used, and optionally the per-element APE distribution for percentile analysis.
Data Flow
Data flows through the pipeline with a critical branch point at the zero-value gate. The write path is straightforward: predictions and actuals enter, residuals are computed, absolute values are taken, and each is divided by the actual value before averaging. The key complexity lies in the zero-handling decision. In production systems at companies like Swiggy, the pipeline often computes MAPE, WMAPE, and MAE in parallel from the same residuals, branching after the absolute value step to avoid redundant computation. The zero-value gate logs statistics about how many observations were excluded, providing a data quality signal. If more than a configurable threshold (typically 5-10%) of observations trigger zero handling, the system may automatically fall back to WMAPE or MAE and alert the monitoring team.
The architecture diagram shows predictions and ground truth merging at a subtraction step, flowing through absolute value computation, then hitting a critical zero-check decision gate (highlighted in red). Non-zero values proceed through percentage normalization (orange) while zero values are routed to a handling strategy. Both paths converge at the sum-and-average aggregator, producing the final MAPE percentage (green). Dashed lines show variant selection paths for SMAPE and WMAPE as alternative computation routes.
How to Implement
Implementing MAPE correctly in production requires navigating three challenges that do not exist for simpler metrics like MAE: zero-value handling, variant selection, and the asymmetry bias. Most teams start with scikit-learn's mean_absolute_percentage_error for offline evaluation, which returns MAPE as a decimal (not a percentage) and handles zeros by producing infinity values. For training loops, TensorFlow and PyTorch do not include MAPE as a built-in loss function because its non-differentiability at zero and undefined behavior at make it unsuitable for gradient-based optimization -- you should train with MAE or Huber loss and evaluate with MAPE.
For production monitoring and demand forecasting, custom implementations are almost always necessary. The key decisions are: (1) how to handle zero actuals -- skip them, clip the denominator to a minimum value, or switch to WMAPE, (2) whether to report standard MAPE, SMAPE, or WMAPE based on the data characteristics, and (3) how to complement MAPE with directional metrics that reveal the asymmetry bias.
A critical implementation detail: scikit-learn's mean_absolute_percentage_error returns the result as a fraction (e.g., 0.12 for 12% error), while most business contexts expect a percentage (12%). Always document and verify the convention in your codebase to avoid off-by-100x errors in dashboards and alerts.
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error
# Ground truth and predictions (demand forecasting example)
y_true = np.array([100, 150, 200, 50, 300, 80])
y_pred = np.array([110, 130, 210, 45, 280, 95])
# 1. Standard MAPE via scikit-learn (returns as fraction, not %)
mape_sklearn = mean_absolute_percentage_error(y_true, y_pred)
print(f"MAPE (sklearn, fraction): {mape_sklearn:.4f}")
print(f"MAPE (percentage): {mape_sklearn * 100:.2f}%")
# Output: MAPE (percentage): 11.08%
# 2. MAPE with sample weights (prioritize high-value predictions)
weights = np.array([2, 1, 1, 3, 1, 2]) # Weight perishables higher
mape_weighted = mean_absolute_percentage_error(
y_true, y_pred, sample_weight=weights
)
print(f"Weighted MAPE: {mape_weighted * 100:.2f}%")
# 3. Multi-output MAPE (multiple forecast horizons)
y_true_multi = np.array([[100, 110], [200, 190], [150, 160]])
y_pred_multi = np.array([[105, 115], [180, 195], [155, 150]])
mape_multi = mean_absolute_percentage_error(
y_true_multi, y_pred_multi, multioutput='raw_values'
)
print(f"Per-horizon MAPE: {mape_multi * 100}") # Per column
# WARNING: sklearn produces inf when y_true contains zeros!
y_with_zero = np.array([100, 0, 200])
y_pred_zero = np.array([110, 5, 190])
mape_inf = mean_absolute_percentage_error(y_with_zero, y_pred_zero)
print(f"MAPE with zero: {mape_inf}") # inf!This demonstrates scikit-learn's mean_absolute_percentage_error, introduced in version 0.24. Critical points: (1) it returns MAPE as a fraction (0.12), not a percentage (12%) -- multiply by 100 for reporting, (2) it produces inf when any actual value is zero, so always check your data first, (3) it supports sample weights and multi-output, making it usable for weighted variants and multi-horizon forecasts. For production use, always wrap this in a function that checks for zeros and handles them gracefully.
import numpy as np
from typing import Optional, Dict
def mape(
y_true: np.ndarray,
y_pred: np.ndarray,
zero_strategy: str = "skip",
epsilon: float = 1e-8
) -> float:
"""
Standard MAPE with configurable zero handling.
Args:
y_true: Actual values
y_pred: Predicted values
zero_strategy: 'skip' (exclude zeros), 'clip' (clip denom to epsilon),
'error' (raise ValueError)
epsilon: Minimum denominator for 'clip' strategy
Returns:
MAPE as a percentage (e.g., 12.5 means 12.5%)
"""
y_true = np.asarray(y_true, dtype=np.float64)
y_pred = np.asarray(y_pred, dtype=np.float64)
if y_true.shape != y_pred.shape:
raise ValueError(f"Shape mismatch: {y_true.shape} vs {y_pred.shape}")
abs_errors = np.abs(y_true - y_pred)
if zero_strategy == "skip":
mask = np.abs(y_true) > epsilon
if not mask.any():
raise ValueError("All actual values are zero")
ape = abs_errors[mask] / np.abs(y_true[mask])
elif zero_strategy == "clip":
denom = np.maximum(np.abs(y_true), epsilon)
ape = abs_errors / denom
elif zero_strategy == "error":
if np.any(np.abs(y_true) < epsilon):
raise ValueError("Zero actual values found; MAPE is undefined")
ape = abs_errors / np.abs(y_true)
else:
raise ValueError(f"Unknown zero_strategy: {zero_strategy}")
return float(np.mean(ape) * 100)
def smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
"""
Symmetric MAPE. Range: [0%, 200%].
Avoids division by zero when both actual and predicted are nonzero.
"""
y_true = np.asarray(y_true, dtype=np.float64)
y_pred = np.asarray(y_pred, dtype=np.float64)
numerator = np.abs(y_true - y_pred)
denominator = (np.abs(y_true) + np.abs(y_pred)) / 2
# Handle case where both actual and predicted are zero
mask = denominator > 0
result = np.zeros_like(numerator)
result[mask] = numerator[mask] / denominator[mask]
return float(np.mean(result) * 100)
def wmape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
"""
Weighted MAPE. Equivalent to MAE / mean(|y_true|).
Handles individual zeros gracefully; only fails if ALL actuals are zero.
"""
y_true = np.asarray(y_true, dtype=np.float64)
y_pred = np.asarray(y_pred, dtype=np.float64)
total_abs_error = np.sum(np.abs(y_true - y_pred))
total_actual = np.sum(np.abs(y_true))
if total_actual == 0:
raise ValueError("WMAPE undefined: sum of actuals is zero")
return float(total_abs_error / total_actual * 100)
def maape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
"""
Mean Arctangent APE. Bounded in [0, pi/2). Handles zeros gracefully.
Proposed by Kim & Kim (2016).
"""
y_true = np.asarray(y_true, dtype=np.float64)
y_pred = np.asarray(y_pred, dtype=np.float64)
abs_errors = np.abs(y_true - y_pred)
# Use safe division, defaulting to 0 where y_true is 0
with np.errstate(divide='ignore', invalid='ignore'):
ratios = np.where(y_true != 0, abs_errors / np.abs(y_true), abs_errors)
return float(np.mean(np.arctan(ratios)))
# Demonstration with demand forecasting data
y_actual = np.array([120, 0, 85, 200, 15, 300, 0, 50])
y_forecast = np.array([130, 5, 80, 190, 20, 310, 3, 55])
print("=== MAPE Variants Comparison ===")
print(f"MAPE (skip zeros): {mape(y_actual, y_forecast, 'skip'):.2f}%")
print(f"MAPE (clip zeros): {mape(y_actual, y_forecast, 'clip'):.2f}%")
print(f"SMAPE: {smape(y_actual, y_forecast):.2f}%")
print(f"WMAPE: {wmape(y_actual, y_forecast):.2f}%")
print(f"MAAPE (radians): {maape(y_actual, y_forecast):.4f}")
print(f"\nZero observations: {np.sum(y_actual == 0)} / {len(y_actual)}")This provides production-ready implementations of all four major MAPE variants. Key design choices: (1) mape() supports three zero-handling strategies -- skip (most common in practice), clip (numerically stable but introduces bias), and error (for pipelines that must guarantee no zeros), (2) smape() uses the Makridakis (1993) formulation with denominator as the average of actual and predicted absolute values, (3) wmape() avoids per-element division entirely by computing the ratio of aggregated sums, making it naturally robust to individual zeros, (4) maape() bounds the error via arctangent, eliminating infinity for all inputs. The comparison output shows how these variants diverge on data with zeros -- WMAPE and MAAPE remain well-defined while standard MAPE must skip or clip.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
class MAPEMetric:
"""
Stateful MAPE metric for PyTorch evaluation loops.
Accumulates predictions across batches, computes MAPE at the end.
WARNING: Do NOT use MAPE as a training loss. Use MAE or Huber instead,
then evaluate with MAPE after training.
"""
def __init__(self, epsilon: float = 1e-8):
self.epsilon = epsilon
self.reset()
def reset(self):
self.total_ape = 0.0
self.count = 0
self.skipped_zeros = 0
def update(self, y_pred: torch.Tensor, y_true: torch.Tensor):
"""Accumulate APE values from a batch."""
abs_errors = torch.abs(y_true - y_pred)
# Skip near-zero actuals
mask = torch.abs(y_true) > self.epsilon
valid_ape = abs_errors[mask] / torch.abs(y_true[mask])
self.total_ape += valid_ape.sum().item()
self.count += mask.sum().item()
self.skipped_zeros += (~mask).sum().item()
def compute(self) -> dict:
"""Compute MAPE from accumulated values."""
if self.count == 0:
return {"mape_pct": float('inf'), "valid_count": 0, "skipped": self.skipped_zeros}
mape_val = (self.total_ape / self.count) * 100
return {
"mape_pct": mape_val,
"valid_count": self.count,
"skipped_zeros": self.skipped_zeros
}
# Example: Evaluate a trained model
def evaluate_model_mape(model, dataloader, device='cpu'):
"""Evaluate regression model and report MAPE."""
model.eval()
mape_metric = MAPEMetric()
mae_total = 0.0
n_samples = 0
with torch.no_grad():
for X_batch, y_batch in dataloader:
X_batch, y_batch = X_batch.to(device), y_batch.to(device)
predictions = model(X_batch)
# Update MAPE (for percentage-based reporting)
mape_metric.update(predictions, y_batch)
# Also track MAE (for unit-based reporting)
mae_total += torch.abs(y_batch - predictions).sum().item()
n_samples += len(y_batch)
results = mape_metric.compute()
results["mae"] = mae_total / n_samples
return results
# Demo with synthetic data
torch.manual_seed(42)
model = nn.Sequential(
nn.Linear(5, 32), nn.ReLU(),
nn.Linear(32, 16), nn.ReLU(),
nn.Linear(16, 1)
)
# Train with MAE loss (NOT MAPE)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.L1Loss() # MAE for training
X = torch.randn(500, 5)
y = torch.randn(500, 1) * 10 + 50 # Positive target values
dataset = TensorDataset(X, y)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
# Quick training
for epoch in range(20):
for X_batch, y_batch in train_loader:
optimizer.zero_grad()
loss = criterion(model(X_batch), y_batch)
loss.backward()
optimizer.step()
# Evaluate with MAPE
results = evaluate_model_mape(model, train_loader)
print(f"MAPE: {results['mape_pct']:.2f}%")
print(f"MAE: {results['mae']:.3f}")
print(f"Valid samples: {results['valid_count']}")
print(f"Skipped (zero actuals): {results['skipped_zeros']}")This demonstrates the correct pattern for using MAPE with PyTorch: train with MAE (L1 loss), evaluate with MAPE. MAPE should never be used as a training loss because (1) it is undefined at , (2) its gradients explode near zero, and (3) its asymmetry biases the model toward under-prediction. The MAPEMetric class follows the stateful metric pattern used in TorchMetrics and Keras -- accumulating values across batches and computing the final metric once. The evaluate_model_mape function reports both MAPE (for stakeholder communication) and MAE (for debugging), along with metadata about how many zero-valued samples were skipped.
import numpy as np
from collections import deque
from typing import List
class MAPEMonitor:
"""Production MAPE monitor with asymmetry and drift detection."""
def __init__(self, mape_threshold: float = 15.0, history_size: int = 168):
self.mape_threshold = mape_threshold
self.history: deque = deque(maxlen=history_size)
self._errors: List[float] = []
self._actuals: List[float] = []
self._over_count = 0
self._total_count = 0
def record(self, y_true: float, y_pred: float):
"""Record a single prediction."""
self._errors.append(abs(y_true - y_pred))
self._actuals.append(abs(y_true))
self._total_count += 1
if y_pred > y_true:
self._over_count += 1
def flush_window(self) -> dict:
"""Compute WMAPE and asymmetry for current window."""
errors = np.array(self._errors)
actuals = np.array(self._actuals)
total_actual = np.sum(actuals)
wmape = (np.sum(errors) / total_actual * 100) if total_actual > 0 else float('inf')
over_ratio = self._over_count / max(self._total_count, 1)
result = {
'wmape_pct': wmape,
'over_predict_ratio': over_ratio,
'sample_count': self._total_count,
'zero_count': int(np.sum(actuals < 1e-8))
}
self.history.append(result)
# Reset
self._errors, self._actuals = [], []
self._over_count, self._total_count = 0, 0
return result
def check_alerts(self) -> List[str]:
"""Check threshold, asymmetry, and drift alerts."""
alerts = []
if not self.history:
return alerts
latest = self.history[-1]
if latest['wmape_pct'] > self.mape_threshold:
alerts.append(f"WMAPE {latest['wmape_pct']:.1f}% > {self.mape_threshold}%")
if latest['over_predict_ratio'] > 0.65:
alerts.append(f"Over-prediction bias: {latest['over_predict_ratio']:.0%}")
if len(self.history) >= 24:
drift = latest['wmape_pct'] - self.history[-24]['wmape_pct']
if drift > 3.0:
alerts.append(f"MAPE drift: +{drift:.1f}pp over 24h")
return alerts
# Demo
monitor = MAPEMonitor(mape_threshold=15.0)
np.random.seed(42)
for hour in range(48):
actuals = np.random.exponential(50, 100)
preds = actuals + np.random.normal(3, 5 + hour * 0.2, 100)
for a, p in zip(actuals, preds):
monitor.record(float(a), float(p))
result = monitor.flush_window()
alerts = monitor.check_alerts()
if hour % 12 == 0:
print(f"Hour {hour}: WMAPE={result['wmape_pct']:.1f}%")
for alert in alerts:
print(f" [ALERT] {alert}")This production monitor tracks three key signals: (1) WMAPE as the primary metric (zero-safe, volume-weighted), (2) over-prediction ratio to detect MAPE's asymmetry bias affecting model behavior, and (3) drift detection comparing current MAPE against a 24-hour baseline. This pattern is used in demand forecasting at companies like Swiggy Instamart and Flipkart, where MAPE drift signals model staleness or data distribution shifts.
# MAPE monitoring configuration for demand forecasting pipeline
metrics:
mape:
variant: "wmape" # Prefer WMAPE for zero-safety
report_standard_mape: true # Also compute standard MAPE for reference
report_smape: false
zero_handling:
strategy: "skip" # skip | clip | error
clip_epsilon: 1e-6 # Min denominator for clip strategy
max_zero_ratio: 0.10 # Alert if >10% zeros
thresholds:
excellent: 5.0 # MAPE < 5%
good: 10.0 # 5% <= MAPE < 10%
acceptable: 20.0 # 10% <= MAPE < 20%
critical: 30.0 # MAPE >= 30% triggers alert
# Asymmetry monitoring
asymmetry:
track_direction: true
over_predict_alert: 0.65 # Alert if >65% over-predictions
under_predict_alert: 0.65
# Slicing for segment-level MAPE
slices:
- dimension: "product_category"
values: ["electronics", "grocery", "fashion", "home"]
- dimension: "demand_volume"
values: ["low", "medium", "high"]
- dimension: "region"
values: ["north", "south", "east", "west"]
# Companion metrics (always compute alongside MAPE)
companion_metrics:
- "mae" # Absolute error in original units
- "median_absolute_percentage_error" # Robust to outliers
- "forecast_bias" # Mean error (directional)
- "p90_absolute_percentage_error" # 90th percentile APE
# Drift detection
drift:
baseline_window: "7d" # Compare against last 7 days
drift_threshold_pp: 3.0 # Alert on 3 percentage point increase
check_interval: "1h"Common Implementation Mistakes
- ●
Dividing by zero without handling: Computing
abs(y_true - y_pred) / y_truewheny_truecontains zeros produces infinity or NaN. Always check for zeros first. scikit-learn's implementation will returninf-- this is by design, not a bug. - ●
Confusing fraction vs percentage: scikit-learn returns MAPE as a fraction (0.12 = 12%). Many teams report this raw value to stakeholders, who interpret it as 0.12% instead of 12%. Always multiply by 100 and label clearly.
- ●
Using MAPE as a training loss: MAPE's undefined gradient at and explosion near zero make it unsuitable for backpropagation. Train with MAE or Huber loss, then evaluate with MAPE.
- ●
Ignoring the asymmetry bias: MAPE systematically favors models that under-predict. If you select models based on MAPE alone, you will choose models that consistently forecast too low -- potentially catastrophic for inventory planning or capacity provisioning.
- ●
Averaging MAPE across series incorrectly: Computing MAPE per series and then averaging gives equal weight to all series regardless of volume. A series with 10 units and 50% MAPE dominates the average as much as a series with 10,000 units and 5% MAPE. Use WMAPE for volume-weighted accuracy.
- ●
Applying MAPE to intermittent demand: Products with many zero-demand periods (spare parts, seasonal items) make MAPE undefined or misleadingly large. Use WMAPE or the Croston method with MAE for intermittent demand.
- ●
Not reporting companion metrics: MAPE alone hides the magnitude of errors. A 10% MAPE on ₹100 items means ₹10 average error, but 10% MAPE on ₹1,00,000 items means ₹10,000 average error. Always report MAE alongside MAPE for context.
- ●
Comparing MAPE across SMAPE and standard formulations: Different tools use different MAPE definitions. scikit-learn uses standard MAPE, the M3/M4 competitions used SMAPE, and supply chain tools often use WMAPE. Comparing numbers across formulations is meaningless.
When Should You Use This?
Use When
You need a scale-free, universally understood accuracy metric: When reporting to business stakeholders who need to compare forecast accuracy across products, regions, or business units with different scales, MAPE's percentage format is immediately interpretable.
Your actual values are strictly positive and well above zero: MAPE works best when the denominator (actual values) is comfortably far from zero -- revenue forecasting, high-volume demand planning, energy consumption prediction.
Cross-series or cross-product comparison is required: Unlike MAE (which is scale-dependent), MAPE lets you compare accuracy across a ₹500 product and a ₹50,000 product on the same scale.
Industry benchmarking demands it: Supply chain, retail, and energy industries use MAPE as the standard accuracy KPI. If your company or client expects MAPE, you need to report it (possibly alongside better metrics).
You want to assess proportional accuracy: When a 10% miss on a ₹100 item and a 10% miss on a ₹1,00,000 item should be considered equally bad, MAPE's proportional framing is correct.
Demand forecasting with stable, non-zero demand: For high-volume SKUs in FMCG, grocery, or established product lines where zero-demand days are rare, MAPE is the industry standard metric.
Avoid When
Actual values include zeros or near-zero values: MAPE is undefined at zero and explodes near zero. Use WMAPE, MAAPE, or MAE instead. This includes intermittent demand (spare parts, seasonal products) and any count-based target where zero is common.
You are training a model (not evaluating): MAPE should never be used as a training loss due to undefined gradients at zero, explosion near zero, and asymmetry bias. Use MAE, MSE, or Huber loss for training.
Directional accuracy matters: If under-predicting by 10% has different consequences than over-predicting by 10% (e.g., inventory costs vs stockout costs), MAPE's asymmetry will mislead you. Use asymmetric loss functions or track forecast bias separately.
Your data has a wide range including very small values: Even without exact zeros, values close to zero produce disproportionately large percentage errors that dominate the average. Filter to or use WMAPE.
You need a bounded metric: MAPE has no upper bound -- a single bad prediction can produce 1,000%+ error. Use SMAPE (bounded 0-200%) or MAAPE (bounded 0 to ) if you need bounded error.
Comparing across different MAPE conventions: If some of your systems use standard MAPE, others use SMAPE, and others use WMAPE, the numbers are not comparable. Standardize on one variant before comparing.
Key Tradeoffs
The Central Tradeoff: Interpretability vs. Mathematical Soundness
MAPE's greatest strength -- intuitive percentage interpretation -- is inseparable from its greatest weakness -- division by actual values. This single design choice gives MAPE its scale-free property (strength) while introducing undefined behavior at zero, asymmetry, and unbounded errors (weaknesses). You cannot fix one without losing the other.
MAPE vs. MAE
| Property | MAPE | MAE |
|---|---|---|
| Scale | Percentage (scale-free) | Original units (scale-dependent) |
| Zeros | Undefined | Well-defined |
| Asymmetry | Favors under-prediction | Symmetric |
| Cross-series comparison | Direct | Requires normalization |
| Stakeholder communication | Excellent | Good (if units are meaningful) |
| Training loss | Never | Yes (L1 loss) |
MAPE vs. SMAPE
SMAPE was designed to fix MAPE's asymmetry, but Hyndman (2014) showed it introduces the opposite bias: SMAPE favors over-prediction. Neither metric is truly symmetric. SMAPE also behaves oddly when both actual and predicted values are small, producing values near 200% even for minor absolute errors. Use WMAPE instead if you need a symmetric, zero-safe percentage metric.
MAPE vs. WMAPE
WMAPE is often the best practical alternative. It avoids per-element division (so individual zeros do not cause problems), naturally weights by volume (so high-demand items matter more, which aligns with business impact), and is equivalent to MAE normalized by total actual volume. The tradeoff: low-volume items have almost no impact on WMAPE, so you need a separate metric to monitor their accuracy.
When to Report Multiple Variants
In production, the best practice is to report WMAPE as the primary metric (zero-safe, business-aligned), MAE as the absolute companion (for understanding error magnitude in original units), and standard MAPE on the subset of non-zero actuals (for backward compatibility with industry benchmarks). Track forecast bias (mean error, not absolute) separately to detect directional patterns that MAPE hides.
| Metric | Zero-Safe | Symmetric | Bounded | Volume-Weighted | Best For |
|---|---|---|---|---|---|
| MAPE | No | No | No | No | Industry benchmark |
| SMAPE | Partial | No (favors over) | Yes (0-200%) | No | M-competition |
| WMAPE | Yes | Yes | No | Yes | Business forecasting |
| MAAPE | Yes | No | Yes (0-pi/2) | No | Research, intermittent |
| MASE | Yes | Yes | No | No | Academic comparison |
Alternatives & Comparisons
MAE measures absolute error in the original units of the target variable, making it robust to zeros and symmetric by construction. Choose MAE when your data includes zeros or when you need to understand error magnitude in real-world units (e.g., '8 minutes off' vs '12% off'). Choose MAPE when you need cross-product or cross-scale comparisons that MAE cannot provide. In practice, report both: MAE for error magnitude, MAPE for proportional accuracy.
RMSE penalizes large errors quadratically while MAPE penalizes proportionally. A 100-unit error on a 200-unit actual is 50% MAPE regardless of the absolute scale, but contributes 10,000 to MSE. Use RMSE when large individual errors are catastrophic and need heavy penalization. Use MAPE when proportional accuracy matters more than absolute error magnitude. For safety-critical applications, RMSE is strongly preferred.
R² measures the proportion of variance explained by the model relative to a naive mean predictor. It is bounded (typically 0 to 1) and scale-free like MAPE, but interpretable differently: R² = 0.85 means the model explains 85% of variance, not that errors are 15%. Use R² for comparing model architectures on the same dataset; use MAPE for communicating forecast accuracy to business stakeholders. R² handles zeros gracefully and does not suffer from MAPE's asymmetry.
Pros, Cons & Tradeoffs
Advantages
Scale-free and universally interpretable: Expressed as a percentage, MAPE allows direct comparison across products, regions, time horizons, and business units without normalization. A supply chain manager understands '12% MAPE' instantly.
Industry standard for demand forecasting: MAPE is the most widely adopted accuracy KPI in supply chain, retail, energy, and financial forecasting. Using it enables benchmarking against published industry standards and competitor comparisons.
Unit-independent: Unlike MAE (measured in rupees, minutes, kilograms), MAPE is dimensionless. This makes it ideal for dashboards that aggregate accuracy across heterogeneous metric types.
Simple to compute and explain: The formula is elementary -- average of percentage errors. No statistical background required to understand or communicate the result. Cost of computation is with negligible constants.
Effective for strictly positive, non-zero data: When actual values are comfortably above zero (revenue, energy consumption, high-volume demand), MAPE provides clean, meaningful percentage accuracy that aligns with business intuition.
Widely supported in ML tooling: Available in scikit-learn, TensorFlow, statsmodels, Prophet, Apache Spark MLlib, and virtually every forecasting library. No custom implementation needed for basic use cases.
Disadvantages
Undefined when actual values are zero: Division by zero makes MAPE completely unusable for intermittent demand, count data with zeros, or any target that can be exactly zero. This is not a minor edge case -- zero actuals are extremely common in real-world data.
Asymmetric penalty structure: Over-predictions are penalized more heavily than under-predictions of the same absolute magnitude. This bias systematically favors under-forecasting models, which can lead to chronic stockouts in inventory planning.
Unbounded on the high side: A single prediction where the actual is small can produce APE of 1,000%+ or more, disproportionately inflating the mean. A single outlier can make the entire MAPE meaningless.
Explosion near zero: Even if actuals are not exactly zero, small actual values (e.g., 0.01) produce enormous percentage errors (e.g., an error of 1.0 on actual 0.01 = 10,000% APE) that dominate the average.
Equal weighting across all observations: Standard MAPE weights a low-volume product (10 units/day) equally with a high-volume product (10,000 units/day). This misaligns with business impact, where errors on high-volume items matter far more. WMAPE fixes this.
Not suitable as a training loss: Non-differentiability at zero error, undefined gradient at zero actual, and asymmetry bias make MAPE unsuitable for gradient-based optimization. You must train with a different loss and evaluate with MAPE.
Failure Modes & Debugging
Division by zero on zero actuals
Cause
Actual values () contain exact zeros, which is common in intermittent demand (spare parts, seasonal items), count-based targets, and datasets with natural lower bounds at zero. Scikit-learn returns inf, NumPy produces inf or nan depending on the implementation.
Symptoms
MAPE returns inf or nan. Downstream dashboards show blank or broken charts. Alerting systems trigger false positives or silently suppress valid metrics. Pipeline-level metric aggregation becomes contaminated, with a single zero observation invalidating the entire batch's MAPE.
Mitigation
Use WMAPE instead of standard MAPE -- it aggregates sums before dividing, avoiding per-element zero issues. If standard MAPE is required, filter out zero observations and report the exclusion count. Never silently skip zeros without logging. For intermittent demand, use specialized metrics like MASE or the Croston method.
Asymmetry-induced under-forecasting bias
Cause
MAPE's mathematical structure penalizes over-predictions more heavily than under-predictions of the same absolute magnitude (50/100 = 50% vs 50/150 = 33%). When used as a model selection criterion, this systematically selects models that under-forecast.
Symptoms
Forecast bias is consistently negative (predictions below actuals). Inventory systems experience chronic stockouts. Capacity planning under-provisions resources. The model appears to have low MAPE while actually performing poorly in terms of business impact.
Mitigation
Track forecast bias (mean signed error) alongside MAPE. If bias consistently trends negative, the MAPE asymmetry is affecting model selection. Use WMAPE (which does not suffer from per-observation asymmetry) or supplement MAPE with a directional accuracy metric. In production, establish bias thresholds: if mean bias exceeds ±2% of mean actual, investigate.
Near-zero actual value explosion
Cause
Actual values are not exactly zero but very small (e.g., 0.001, 0.1). Even modest absolute errors produce enormous percentage errors: an error of 1.0 on an actual of 0.1 produces 1,000% APE. A handful of such observations can dominate the entire MAPE.
Symptoms
MAPE is much larger than expected given the absolute error magnitude. Discrepancy between MAPE and MAE: MAE looks reasonable while MAPE appears catastrophic. Removing a small number of low-value observations dramatically changes MAPE.
Mitigation
Set a minimum denominator threshold: exclude observations where and report the exclusion count. Alternatively, use MAAPE (Mean Arctangent APE), which bounds each observation's contribution to regardless of the denominator magnitude. For datasets with mixed scales, WMAPE is a robust default.
Outlier-dominated MAPE from single bad predictions
Cause
A single prediction with a very large absolute percentage error (e.g., predicting 500 when actual is 5, producing 9,900% APE) inflates the mean dramatically. Unlike RMSE where squaring is the amplifier, in MAPE the amplifier is division by a small denominator.
Symptoms
MAPE is very high but median APE is normal. Removing 1-2 observations causes MAPE to drop dramatically. MAPE and Median Absolute Percentage Error (MdAPE) diverge significantly.
Mitigation
Report Median Absolute Percentage Error (MdAPE) alongside MAPE to detect outlier influence. If MAPE >> MdAPE, outliers are skewing the mean. Apply percentile-based filtering (e.g., exclude top 1% APE values) and report both filtered and unfiltered MAPE. Use trimmed MAPE (remove top and bottom 5% of APE values) for a robust central estimate.
Cross-formulation comparison error
Cause
Different systems use different MAPE variants without documentation. System A reports standard MAPE, System B reports SMAPE, System C reports WMAPE. Teams compare these numbers directly, drawing incorrect conclusions about relative model quality.
Symptoms
Model A appears better in one dashboard but worse in another. Reported accuracy metrics are inconsistent across teams. Model selection decisions are based on non-comparable numbers.
Mitigation
Standardize on a single variant across the organization and document it explicitly. Include the formula in dashboard tooltips. Tag metric values with the variant used (e.g., mape_standard, mape_weighted, smape). Automated validation in the metrics pipeline should reject comparisons across different variants.
Misleading cross-series average MAPE
Cause
Computing per-series MAPE and then averaging across series gives equal weight to all series regardless of volume. A low-volume series with 80% MAPE dominates the average as much as a high-volume series with 5% MAPE, even though the business impact is vastly different.
Symptoms
Overall MAPE appears poor despite the model performing well on high-volume (business-critical) items. Efforts to reduce overall MAPE focus on low-volume edge cases rather than high-impact items. Stakeholders lose trust in the metric.
Mitigation
Use WMAPE for cross-series aggregation, which naturally weights by volume. If standard MAPE is required, report it at the series level and aggregate using a volume-weighted average. Segment reporting into volume tiers (high/medium/low) with separate MAPE targets for each tier.
Placement in an ML System
Where MAPE Sits in the ML System
MAPE occupies the evaluation and monitoring layer of the ML pipeline, sitting between the model's prediction output and the business decision layer. It is not part of the model itself -- it is a measurement tool applied to the model's outputs.
In a typical demand forecasting system at a company like Flipkart or Swiggy, the flow is: (1) the model produces predictions for next-day demand per SKU-store combination, (2) these predictions are logged to a prediction store, (3) after the prediction horizon elapses, actual demand is observed, (4) the evaluation pipeline joins predictions with actuals and computes MAPE (along with companion metrics), (5) MAPE feeds into dashboards, alerting systems, and model retraining triggers.
MAPE also appears in the model selection phase during development. When comparing multiple candidate models (e.g., LightGBM vs Prophet vs DeepAR), MAPE on the validation set is often the primary ranking criterion -- which is precisely where its asymmetry bias can silently influence model choice toward under-forecasting approaches.
Production Pattern: Use WMAPE as the primary monitoring metric (zero-safe, volume-weighted), standard MAPE on the non-zero subset for industry benchmark compatibility, and forecast bias (mean signed error) for directional monitoring. This three-metric combination covers MAPE's weaknesses while preserving its communication advantages.
Pipeline Stage
Evaluation / Monitoring
Upstream
- Model serving endpoint (predictions)
- Ground truth collection pipeline
- Feature store (for segmented evaluation)
Downstream
- Monitoring dashboards and alerting
- Model selection and champion/challenger comparison
- Business reporting (forecast accuracy KPIs)
- Retraining triggers
Scaling Bottlenecks
MAPE itself is computationally trivial -- with minimal memory requirements. The bottleneck in production is not the metric computation but the data pipeline that delivers paired (prediction, actual) values. In demand forecasting systems, actuals often arrive days or weeks after predictions are made, requiring temporal join infrastructure to align forecasts with observed outcomes. At scale (millions of SKUs across hundreds of stores), this join operation dominates cost.
For real-time monitoring, the zero-checking logic adds a branch per observation. At 100K predictions/second, this is negligible. The practical bottleneck is maintaining sliding-window accumulators across multiple MAPE variants (standard, WMAPE, per-segment) simultaneously, which requires careful memory management on streaming infrastructure like Kafka Streams or Flink.
Production Case Studies
Swiggy Instamart's demand forecasting team uses MAPE and WMAPE to evaluate predictions for perishable and non-perishable SKUs across their dark store network. Their engineering blog describes how they adapted metric alignment to account for the different business impacts of over- vs under-forecasting: over-forecasting perishables leads to spoilage waste, while under-forecasting leads to stockouts and lost orders. They explicitly address MAPE's asymmetry by using cost-aware metrics alongside MAPE.
The adaptive metric framework enabled Swiggy Instamart to reduce food waste by aligning forecasting incentives with actual business costs, rather than optimizing blindly for symmetric MAPE. WMAPE is used as the primary accuracy KPI across their demand planning dashboards.
Amazon's Forecast service documentation explicitly discusses MAPE alongside WAPE (Weighted Absolute Percentage Error, equivalent to WMAPE) for evaluating demand forecasts. They note that MAPE uses an unweighted average that emphasizes forecasting error on all items equally regardless of demand volume, and recommend WAPE when forecasting accuracy should be proportional to item volume. Their documentation serves as a reference for the MAPE vs WMAPE decision framework used across the retail industry.
Amazon Forecast defaults to WAPE as the primary accuracy metric, acknowledging MAPE's limitations for real-world demand datasets with varying volumes and potential zeros. The documentation has become a widely cited industry reference for metric selection.
The Makridakis forecasting competitions have been the most influential benchmarks in time series forecasting since 1982. The M3 competition (2000) and M4 competition (2018) used sMAPE (Symmetric MAPE) as a primary evaluation metric alongside MASE. The M5 competition (2020) shifted to RMSSE for point accuracy, partly due to the documented issues with MAPE and sMAPE. This evolution across competitions traces the forecasting community's gradual move away from percentage-based metrics.
The M4 competition evaluated 61 forecasting methods across 100,000 time series. The winning hybrid method (Smyl, 2020) achieved 11.374% sMAPE overall, outperforming pure statistical methods by ~9.4%. The competition's shift away from standard MAPE to sMAPE, and eventually to scaled metrics in M5, reflects the community's recognition of MAPE's fundamental limitations.
Uber's marketplace forecasting team uses MAPE to evaluate demand forecasts for ride requests across cities and time windows. Their engineering blog describes how they manage MAPE's challenges in practice: low-demand periods (late night, rural areas) produce inflated percentage errors, and the team uses segment-specific MAPE thresholds -- tighter for high-demand urban zones, more lenient for sparse suburban areas. They complement MAPE with absolute metrics for capacity planning.
Uber reports using ensemble forecasting models evaluated with MAPE, achieving sub-15% MAPE on aggregate urban demand. The segment-specific threshold approach prevents low-volume edge cases from masking poor performance on business-critical high-volume segments.
Tooling & Ecosystem
The standard implementation for offline MAPE evaluation in Python. Returns MAPE as a fraction (not percentage). Supports sample weights and multi-output. Returns inf when actuals contain zeros -- by design. Available since scikit-learn 0.24.
Provides time series forecasting tools (ARIMA, ETS, VAR) with MAPE evaluation built into the diagnostics. Useful for classical time series models where MAPE is the default accuracy metric. Also includes SMAPE and other forecast error metrics.
Meta's open-source forecasting library. Reports MAPE in its cross-validation diagnostics via performance_metrics(). Commonly used for business time series (daily sales, website traffic). Supports automatic hyperparameter tuning with MAPE as the optimization target.
Deep learning toolkit for probabilistic time series forecasting. Includes MAPE, sMAPE, MASE, and MSIS as built-in evaluation metrics via its Evaluator class. Integrates with DeepAR, Transformer, and other neural forecasting models.
Distributed evaluation framework for large-scale regression models. Supports MAPE via the RegressionEvaluator with metricName='mae' (compute MAE, then normalize externally for MAPE). Suitable for evaluating forecasts over millions of SKU-store-day combinations.
Modern forecasting framework providing both classical (StatsForecast) and neural (NeuralForecast) models. Includes MAPE, SMAPE, WMAPE, and MASE in its evaluation suite. Optimized for large-scale time series (thousands of series in parallel) with GPU acceleration.
Research & References
Hyndman, R.J. and Koehler, A.B. (2006)International Journal of Forecasting, 22(4), 679-688
The landmark paper that systematically analyzed the shortcomings of MAPE and sMAPE, proposing MASE (Mean Absolute Scaled Error) as a superior alternative. Demonstrated that percentage-based metrics are degenerate when actuals include zeros and recommended abandoning them for cross-series comparisons.
Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2020)International Journal of Forecasting, 36(1), 54-74
The M4 competition used sMAPE as a primary evaluation metric across 100,000 time series. Results showed that hybrid ML-statistical methods outperformed pure approaches, with the winning method achieving 11.374% sMAPE. The paper's detailed accuracy tables serve as benchmark references for the forecasting community.
Kim, S. and Kim, H. (2016)International Journal of Forecasting, 32(3), 669-679
Proposed MAAPE (Mean Arctangent Absolute Percentage Error), which replaces MAPE's division with an arctangent function to bound errors in . Demonstrated superior behavior on intermittent demand data where MAPE is undefined or misleadingly large.
Chen, C., Twycross, J., and Garibaldi, J.M. (2017)PLOS ONE, 12(3), e0174202
Proposed a bounded relative error metric that addresses MAPE's zero-denominator and asymmetry issues simultaneously. Evaluated against MAPE, sMAPE, and MASE on multiple real-world datasets, showing improved stability and interpretability.
Hyndman, R.J. (2014)Hyndsight Blog (Technical Note)
Demonstrated that sMAPE (Symmetric MAPE) is not actually symmetric -- it favors over-forecasting, the opposite bias from standard MAPE. This influential blog post is widely cited in forecasting literature and contributed to the M5 competition's decision to abandon percentage-based metrics.
Interview & Evaluation Perspective
Common Interview Questions
- ●
What is MAPE and when would you use it over MAE or RMSE?
- ●
Explain MAPE's asymmetry problem. How does it affect model selection?
- ●
What happens to MAPE when actual values are zero? How would you handle this?
- ●
Compare MAPE, SMAPE, and WMAPE. When would you use each?
- ●
How would you set up MAPE monitoring for a demand forecasting system with thousands of SKUs?
- ●
Why should you never use MAPE as a training loss function?
Key Points to Mention
- ●
MAPE is scale-free (percentage-based), which enables cross-product and cross-series comparison -- its primary advantage over MAE.
- ●
MAPE is undefined when actual values are zero and explodes near zero. Always check for zeros before computing. Use WMAPE for zero-safe evaluation.
- ●
MAPE has an asymmetry bias: it penalizes over-predictions more than under-predictions, systematically favoring models that under-forecast. Track forecast bias (mean signed error) to detect this.
- ●
WMAPE = sum(|errors|) / sum(|actuals|) -- aggregates before dividing, avoiding per-element zero issues and weighting by volume. It is the preferred variant for production demand forecasting.
- ●
SMAPE was designed to fix MAPE's asymmetry but introduces the opposite bias (favors over-forecasting). Neither is truly symmetric. WMAPE is a better alternative for practical use.
- ●
Never use MAPE as a training loss: undefined gradients at y=0, explosion near zero, and asymmetry bias make it unsuitable for optimization. Train with MAE or Huber, evaluate with MAPE.
Pitfalls to Avoid
- ●
Claiming SMAPE solves MAPE's asymmetry -- it reverses the bias direction instead of eliminating it. Hyndman (2014) demonstrated this clearly.
- ●
Forgetting that scikit-learn returns MAPE as a fraction (0.12), not a percentage (12%). Quoting the wrong convention in an interview is a credibility issue.
- ●
Stating MAPE is bounded between 0% and 100% -- it is bounded below by 0% but has no upper bound on the high side. Under-predictions are bounded at 100%, over-predictions are unbounded.
- ●
Ignoring the business impact of MAPE's asymmetry: under-forecasting bias leads to stockouts in inventory management, a critical operational failure that directly impacts revenue.
Senior-Level Expectation
A senior candidate should articulate the full decision tree: (1) check data for zeros -- if present, immediately switch to WMAPE or MAAPE, (2) if using standard MAPE, always report forecast bias alongside it to detect asymmetry effects, (3) for cross-series aggregation, use WMAPE (volume-weighted) rather than averaging per-series MAPE, (4) in production monitoring, set up three-metric dashboards (WMAPE + MAE + bias) with segment-level breakdowns. They should reference the Hyndman-Koehler (2006) paper and the M-competition evolution from MAPE to sMAPE to MASE as evidence of the metric's limitations. Cost analysis should include the business cost of MAPE-induced under-forecasting (e.g., stockout costs in Indian FMCG, estimated at 2-5% of revenue for chronic under-forecasters) and frame metric selection as a business decision, not just a technical one.
Summary
MAPE (Mean Absolute Percentage Error) is the most widely used percentage-based metric for evaluating regression and forecasting models. Its formula -- the average of absolute percentage errors -- produces a scale-free, universally interpretable accuracy measure. When a demand planner reports 12% MAPE, every stakeholder in the room understands what it means.
However, MAPE carries three fundamental limitations that every ML practitioner must internalize: (1) it is undefined when actual values are zero and explodes near zero, making it unusable for intermittent demand or zero-inclusive targets, (2) it has an asymmetry bias that penalizes over-predictions more than under-predictions, systematically favoring models that under-forecast (a potentially costly bias in inventory and capacity planning), and (3) it is unbounded on the high side, allowing a single low-denominator observation to produce thousands-of-percent error that dominates the average.
The practical response to these limitations is to use WMAPE (Weighted MAPE) as the primary production metric -- it avoids per-element division, naturally weights by volume, and handles individual zeros gracefully. Complement WMAPE with MAE for absolute error magnitude and forecast bias (mean signed error) for directional monitoring. Report standard MAPE only on the non-zero subset for backward compatibility with industry benchmarks. This three-metric approach -- WMAPE for business KPIs, MAE for debugging, bias for directionality -- covers all of MAPE's weaknesses while preserving the percentage-based communication that makes it so valuable to business stakeholders.