What is a good MAPE score?

There is no universal "good" MAPE threshold -- it depends entirely on the domain, forecast horizon, and data characteristics. However, general industry guidelines exist: - ** 30% MAPE**: Poor. Usually indicates model issues, data quality problems, or inherently unpredictable demand (fashion items, promotions). These ranges vary significantly by industry. Energy load forecasting routinely achieves 2-5% MAPE because demand is stable and well-patterned. Fashion retail may see 30-50% MAPE on new-season items because demand is inherently uncertain. Always calibrate your expectations against domain-specific benchmarks rather than arbitrary thresholds.

Why does MAPE penalize over-predictions more than under-predictions?

The asymmetry comes from dividing by the actual value. Consider a product with actual demand of 100 units: - **Under-predict by 50**: Predicted = 50, APE = |100 - 50| / 100 = **50%** - **Over-predict by 50**: Predicted = 150, APE = |100 - 150| / 100 = **50%** These look symmetric, right? Now flip the scenario with actual = 50: - **Under-predict by 50**: Predicted = 0, APE = |50 - 0| / 50 = **100%** (maximum) - **Over-predict by 50**: Predicted = 100, APE = |50 - 100| / 50 = **100%** Still symmetric? Now consider actual = 200 vs actual = 50 with the same absolute error of 50: - Actual = 200, Error = 50: APE = 50/200 = **25%** - Actual = 50, Error = 50: APE = 50/50 = **100%** The denominator matters enormously. When the actual value is large (which happens during over-demand periods), percentage errors are compressed. When the actual value is small (under-demand periods), percentage errors are amplified. Since over-predictions occur when actuals are small and under-predictions occur when actuals are large, the net effect is that over-predictions get penalized more. Armstrong (1985) was the first to formally document this bias.

How is WMAPE different from MAPE?

The key difference is **when division happens**: - **MAPE** divides each error by its corresponding actual, then averages: $\frac{1}{n} \sum \frac{|e_i|}{|y_i|}$ - **WMAPE** sums all errors and all actuals separately, then divides once: $\frac{\sum |e_i|}{\sum |y_i|}$ This seemingly small change has three major consequences: 1. **Zero-safe**: Individual zeros in $y_i$ do not cause division by zero because they contribute to the sum of actuals without being individual denominators. 2. **Volume-weighted**: High-volume items automatically contribute more to the metric, aligning with business impact. A 20-unit error on a 1,000-unit product matters more than a 2-unit error on a 10-unit product. 3. **No asymmetry per observation**: WMAPE is equivalent to MAE divided by mean actual, which does not favor over- or under-prediction at the individual observation level. For these reasons, WMAPE is the preferred variant in production demand forecasting. Amazon Forecast defaults to WAPE (their name for WMAPE), and Swiggy Instamart uses WMAPE as their primary accuracy KPI.

Is SMAPE really symmetric? Should I use it?

Despite its name, SMAPE is **not truly symmetric**. Hyndman (2014) demonstrated that SMAPE has the opposite bias from MAPE: it favors **over-forecasting** rather than under-forecasting. The reason is that the denominator $(|y| + |\hat{y}|)/2$ is influenced by the predicted value. When predictions are high, the denominator is larger, shrinking the percentage error. Additionally, SMAPE has an awkward range of 0% to 200%, which is unintuitive to stakeholders. It also behaves oddly when both actual and predicted values are small -- a prediction of 1 vs actual of 0 produces SMAPE of 200%, which is hard to interpret meaningfully. The M3 and M4 competitions used SMAPE, giving it visibility, but the M5 competition (2020) abandoned it in favor of RMSSE, reflecting the forecasting community's growing consensus that SMAPE's "fix" introduces as many problems as it solves. **Recommendation**: Use WMAPE instead of SMAPE. It is truly symmetric (at the aggregate level), handles zeros, and is more intuitive to stakeholders. If you must use a per-observation percentage metric that is somewhat symmetric, consider MAAPE (Kim and Kim, 2016).

Can I use MAPE for classification or probability outputs?

No. MAPE is strictly a regression metric designed for continuous targets with meaningful magnitudes. Using it for classification outputs is mathematically nonsensical and will produce misleading results. For **class labels** (0, 1, 2, ...), MAPE would divide by the class label value, making class 0 undefined and making errors on class 1 appear larger than errors on class 5 -- which has no valid interpretation. For **probability outputs** (0.0 to 1.0), MAPE would produce enormous percentage errors near probability 0 (e.g., predicting 0.1 when actual is 0.01 gives 900% APE) while treating errors near probability 1.0 as minor. Instead, use: - **Classification**: Accuracy, precision, recall, F1, AUC-ROC, log loss - **Probabilities**: Brier score, log loss, calibration plots - **Rankings**: NDCG, MAP, MRR

How do I handle MAPE for intermittent demand (many zero observations)?

Intermittent demand -- where many periods have zero sales -- is MAPE's worst-case scenario. Standard MAPE is simply unusable because the zeros make it undefined. Here are four practical approaches: 1. **WMAPE**: Sum all absolute errors and divide by sum of all actuals. As long as total demand across the evaluation period is non-zero, WMAPE is well-defined. This is the simplest fix and works well when at least some periods have non-zero demand. 2. **MAAPE**: Replace $|e_i/y_i|$ with $\arctan(|e_i/y_i|)$, which bounds each observation to $[0, \pi/2)$ even when $y_i = 0$ (the arctangent of infinity is $\pi/2$). Proposed specifically for intermittent demand by Kim and Kim (2016). 3. **MASE**: Mean Absolute Scaled Error, proposed by Hyndman and Koehler (2006), scales errors by the in-sample MAE of a naive forecast. Handles zeros and provides a meaningful baseline comparison. 4. **Croston's method + MAE**: For intermittent demand, use Croston's method (or SBA, TSB variants) for forecasting and evaluate with MAE rather than any percentage metric. This avoids the zero-denominator problem entirely. In Indian e-commerce, intermittent demand is common for long-tail SKUs (niche electronics, specialty items on Flipkart or Amazon India). Teams typically use WMAPE aggregated at the category level and MAE at the SKU level for these items.

What is the relationship between MAPE and the median?

This is a subtle but important statistical connection: **minimizing MAPE does not produce median forecasts** in the way that minimizing MAE does. Minimizing MAE produces predictions of the **conditional median** of $y$. Minimizing MAPE, however, produces predictions of the **conditional median of the percentage errors**, which is not the same as the median of $y$. Specifically, MAPE minimization yields: $$\hat{y}^* = \text{median}(y) \cdot \exp(-\sigma^2 / 2)$$ for log-normal distributions, where $\sigma^2$ is the variance of the log-transformed target. This means MAPE-optimal predictions are biased **downward** compared to the true median -- another manifestation of the under-forecasting bias. In practice, this means: if you train a model with a MAPE-like loss, your predictions will be systematically lower than the true central tendency. For demand forecasting, this translates to chronic under-stocking. Train with MAE (for median prediction) or MSE (for mean prediction), then evaluate with MAPE for reporting purposes.

How much does MAPE evaluation cost in production for a large-scale system?

The computational cost of MAPE itself is negligible -- $O(n)$ with about 4 floating-point operations per observation (subtract, abs, divide, accumulate). For 10 million daily predictions, this takes about 50ms on a single core. The real costs are in the surrounding infrastructure: **Data pipeline costs**: Joining predictions with actuals requires temporal alignment infrastructure. For a demand forecasting system with 1M SKUs across 500 stores, that is 500M prediction-actual pairs to join daily. On AWS, this might cost ₹5,000-10,000/day (~$60-120) in compute (EMR/Glue). **Storage costs**: Storing prediction logs with actuals for historical MAPE trending at 500M rows/day (compressed) requires approximately 5-10 GB/day, or ~3.6 TB/year. On S3, this costs ~₹6,000/year ($72). On a data warehouse like Redshift or BigQuery, query costs dominate at ~₹500-2,000/month ($6-24) depending on query patterns. **Monitoring infrastructure**: Real-time MAPE dashboards with segment-level breakdowns (per category, per region, per store) require streaming computation. On Kafka + Flink, infrastructure costs run ₹20,000-50,000/month ($240-600) for a moderately sized setup. Total cost for production MAPE monitoring on a large-scale Indian e-commerce platform: roughly ₹30,000-80,000/month ($360-960), dominated by data pipeline and monitoring infrastructure rather than the metric computation itself.

Evaluation

MAPE in Machine Learning

Mean Absolute Percentage Error (MAPE) is the most universally recognized percentage-based metric for evaluating regression and forecasting models. Its appeal is immediate: MAPE expresses prediction error as a percentage, making it scale-free and instantly interpretable by technical and non-technical audiences alike. When a demand planner hears that the forecast has 12% MAPE, they know exactly what it means without asking about units, scales, or baselines.

But MAPE carries a set of well-documented pitfalls that trip up even experienced practitioners. It explodes when actual values are zero or near-zero, it penalizes over-predictions more heavily than under-predictions (a subtle asymmetry first noted by Armstrong in 1985), and it has no upper bound on the high side while being capped at 100% on the low side. These quirks have spawned a family of variants -- SMAPE, WMAPE, MAAPE -- each attempting to patch a specific weakness.

Despite these limitations, MAPE remains the de facto standard for forecast accuracy in supply chain, retail demand planning, energy forecasting, and financial projections across the globe. From Flipkart's inventory optimization during Big Billion Days to Swiggy Instamart's daily demand prediction for perishable goods, Indian companies routinely track MAPE as a primary KPI. Understanding when MAPE works, when it misleads, and which variant to reach for is essential knowledge for any ML engineer working on regression or time series problems.

Concept Snapshot

What It Is: A scale-free regression metric that expresses the average prediction error as a percentage of the actual values, enabling cross-series and cross-product comparisons.
Category: Evaluation
Complexity: Beginner
Inputs / Outputs: Predictions array + ground truth array (non-zero actuals) → single MAPE score (non-negative percentage, 0% is perfect)
System Placement: Post-prediction evaluation stage in training, validation, production monitoring, and demand planning dashboards
Also Known As: Mean Absolute Percent Error, Percentage Error, APE (when not averaged), Mean Percentage Deviation
Typical Users: Data Scientists, ML Engineers, Demand Planners, Supply Chain Analysts, Business Analysts, Product Managers
Prerequisites: Basic statistics (mean, absolute value, percentages), Understanding of regression tasks, Familiarity with MAE (Mean Absolute Error)
Key Terms: percentage errorabsolute percentage errorasymmetry biasSMAPEWMAPEMAAPEscale-free metricdemand forecastingzero-value problemforecast accuracy

Why This Concept Exists

The Scale Problem That MAE Cannot Solve

Imagine you are managing two product lines at Flipkart. Product A sells for ₹50,000 (a premium laptop) and Product B sells for ₹500 (a phone case). Your demand forecasting model has an MAE of 2 units for both products. Is the model equally good for both? Absolutely not. Being off by 2 units when the actual demand is 10 laptops (20% error) is far worse than being off by 2 units when actual demand is 2,000 phone cases (0.1% error). MAE, being an absolute metric, cannot distinguish between these situations.

MAPE solves this by normalizing the error against the actual value, converting every prediction error into a percentage. Now Product A shows 20% MAPE and Product B shows 0.1% MAPE -- immediately revealing where attention is needed. This normalization is what makes MAPE scale-free: you can compare forecast accuracy across products, time series, geographies, and business units using a single, universal yardstick.

Historical Roots in Business Forecasting

MAPE's dominance in industry predates machine learning by decades. Supply chain managers, operations researchers, and demand planners adopted percentage-based accuracy measures in the 1970s and 1980s because their work inherently involved comparing forecasts across thousands of SKUs with wildly different volume scales. When Makridakis organized the first M-Competition in 1982 to systematically compare forecasting methods, percentage-based metrics including MAPE featured prominently in the evaluation framework.

The academic community quickly identified problems with MAPE. Armstrong (1985) was the first to formally document MAPE's asymmetry, noting that "it has a bias favoring estimates that are below the actual values." This observation triggered a long line of research into improved variants: Makridakis proposed SMAPE (Symmetric MAPE) in 1993 for the M3 competition, Hyndman and Koehler (2006) advocated abandoning percentage-based metrics entirely in favor of MASE (Mean Absolute Scaled Error), and Kim and Kim (2016) introduced MAAPE (Mean Arctangent Absolute Percentage Error) as a bounded alternative.

Why It Persists Despite Known Flaws

Despite four decades of criticism, MAPE remains the most widely used forecast accuracy metric in industry. The reason is simple: communication trumps mathematical purity. When a VP of Supply Chain asks "how accurate is our forecast?", the answer "12% MAPE" is immediately actionable. It means: on average, we are off by 12%. No further explanation needed. Try explaining MASE or RMSSE to the same audience and watch their eyes glaze over.

This tension between statistical rigor and practical communication is at the heart of MAPE's endurance. As an ML engineer, your job is not to avoid MAPE entirely, but to understand its failure modes, know when to use variants, and always complement it with at least one metric that does not share its weaknesses.

Core Intuition & Mental Model

The Simplest Way to Think About MAPE

MAPE answers one question: "On average, what percentage of the actual value did my predictions miss by?" If your delivery time model predicts 45 minutes for a trip that actually takes 50 minutes, the absolute percentage error for that single prediction is $|50 - 45| / 50 = 10\%$ . MAPE is simply the average of these percentage errors across all predictions.

The percentage framing is what gives MAPE its power. An error of ₹1,000 means nothing without context. Is it ₹1,000 on a ₹10,000 product (10% -- concerning) or ₹1,000 on a ₹10,00,000 property (0.1% -- excellent)? MAPE embeds this context automatically.

The Asymmetry You Must Understand

Here is the single most important thing about MAPE that most tutorials gloss over: MAPE penalizes over-predictions more than under-predictions of the same magnitude. This is not a minor technicality -- it can systematically bias your model selection and your forecasts.

Consider two scenarios with the same absolute error of 50:

Actual = 150, Predicted = 100 (under-predict by 50): APE = 50/150 = 33.3%
Actual = 100, Predicted = 150 (over-predict by 50): APE = 50/100 = 50.0%

Same absolute miss, but MAPE punishes the over-prediction 50% more harshly. Why? Because the denominator is the actual value. When the actual is large, dividing by it shrinks the percentage. When the actual is small, the same absolute error becomes a larger percentage. This means MAPE systematically favors models that under-predict, which can be disastrous for inventory planning (you will consistently stock too little) or capacity planning (you will consistently under-provision).

The Zero Problem

The other critical intuition: MAPE divides by the actual value. If the actual value is zero, you get division by zero -- the metric is undefined. If the actual value is very small (say, 0.001), even a tiny absolute error produces an astronomically large percentage error, potentially thousands of percent. This makes MAPE completely unreliable for data with zeros or near-zero values, which is extremely common in real-world scenarios like intermittent demand (products that sell zero units on many days), rare event prediction, or any dataset with a natural lower bound near zero.

Rule of thumb: If more than 5% of your actual values are zero or near-zero, do not use MAPE. Reach for WMAPE, SMAPE, or MAE instead.

Technical Foundations

The Standard MAPE Formula

MAPE is defined as:

$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$

where:

$n$ is the number of samples
$y_i$ is the actual (true) value for sample $i$ (must be non-zero)
$\hat{y}_i$ is the predicted value for sample $i$
The result is expressed as a percentage

Note: Some implementations (including scikit-learn) omit the $\times 100\%$ factor, returning MAPE as a decimal in $[0, \infty)$ rather than a percentage. Always check the convention.

Properties

Range: $\text{MAPE} \in [0, \infty)$ , with 0% indicating perfect predictions. There is no finite upper bound.
Scale-free: MAPE is unitless (a percentage), enabling comparison across datasets with different target scales.
Asymmetric: Under-predictions have APE bounded by 100% (when $\hat{y}_i = 0$ ), but over-predictions have unbounded APE. This creates a systematic bias favoring under-prediction.
Non-negative: Absolute values ensure all individual errors are non-negative.
Undefined at zero: $\text{MAPE}$ is undefined when any $y_i = 0$ .
Non-differentiable: Like MAE, the absolute value function introduces non-differentiability at zero error.

The Asymmetry Formally

Let $e = |y - \hat{y}|$ be the absolute error. Then:

Under-prediction ( $\hat{y} < y$ ): $\text{APE} = \frac{e}{y}$ , which is bounded by $\frac{y}{y} = 1$ (100%) when $\hat{y} = 0$
Over-prediction ( $\hat{y} > y$ ): $\text{APE} = \frac{e}{y}$ , which can exceed 100% without limit as $\hat{y} \to \infty$

This asymmetry means MAPE-minimizing models will systematically under-forecast.

Key Variants

Symmetric MAPE (SMAPE):

$\text{SMAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{2 |y_i - \hat{y}_i|}{|y_i| + |\hat{y}_i|} \times 100\%$

SMAPE bounds the denominator by using the average of actual and predicted values, making it symmetric in the sense that swapping $y$ and $\hat{y}$ gives the same result. SMAPE ranges from 0% to 200%. However, Hyndman (2014) showed that SMAPE is still not truly symmetric and favors over-forecasting (the opposite bias from MAPE).

Weighted MAPE (WMAPE):

$\text{WMAPE} = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{\sum_{i=1}^{n} |y_i|} \times 100\%$

WMAPE weights each observation's error by its share of total actual volume. High-volume items dominate the metric, low-volume items have negligible impact. This is equivalent to MAE divided by mean actual value. WMAPE avoids the zero-denominator problem when individual actuals are zero (as long as the sum is non-zero) and naturally handles the asymmetry issue because it is computed as a ratio of aggregated sums rather than an average of ratios.

Mean Arctangent Absolute Percentage Error (MAAPE):

$\text{MAAPE} = \frac{1}{n} \sum_{i=1}^{n} \arctan\left(\left| \frac{y_i - \hat{y}_i}{y_i} \right|\right)$

MAPE replaced by its arctangent, which bounds the error to $[0, \pi/2)$ , eliminating the infinity problem when actuals are zero or near-zero. Proposed by Kim and Kim (2016).

Computational Complexity

MAPE requires $O(n)$ operations: one subtraction, one absolute value, one division per sample, plus final averaging. Identical computational cost to MAE with one additional division per element.

Internal Architecture

The MAPE computation pipeline extends the standard MAE architecture with a normalization step that divides each absolute error by its corresponding actual value. This seemingly simple addition introduces significant complexity around zero handling, numerical stability, and variant selection.

MAPE (Mean Absolute Percentage Error) in ML Systems Architecture — The architecture diagram shows predictions and ground truth merging at a subtraction step, flowin...

The critical architectural difference from MAE is the zero-check gate after absolute error computation. Production systems must decide how to handle zero or near-zero actuals: skip them (biasing the metric toward non-zero observations), clip to a minimum denominator (introducing an arbitrary threshold), or switch to a variant like WMAPE that avoids per-element division entirely.

Key Components

Input Validator

Validates that y_true and y_pred have matching shapes, checks for NaN/Inf values, and flags zero or near-zero values in y_true. Emits warnings or errors based on the configured zero-handling policy.

Residual Computer

Performs element-wise subtraction $r_i = y_i - \hat{y}_i$ to compute raw residuals. Same as MAE -- this step is shared across all error metrics.

Absolute Value Transformer

Applies $|r_i|$ to each residual, converting all errors to non-negative values. For SMAPE, also computes $|y_i| + |\hat{y}_i|$ for the symmetric denominator.

Zero-Value Gate

The critical component unique to MAPE. Inspects each $y_i$ and routes computation: if $y_i = 0$ or $|y_i| < \epsilon$ , applies the configured handling strategy (skip, clip, replace with fallback metric). Logs the count of zero-handled samples for monitoring.

Percentage Normalizer

Divides each absolute error by its corresponding actual value: $|r_i| / |y_i|$ . For WMAPE, replaces per-element division with aggregate division: $\sum |r_i| / \sum |y_i|$ . For SMAPE, uses $2|r_i| / (|y_i| + |\hat{y}_i|)$ as the denominator.

Aggregator

Averages percentage errors across all valid samples (excluding zeros if configured to skip). Multiplies by 100 to convert to percentage form. For WMAPE, computes the ratio of aggregated sums directly.

Output Formatter

Returns MAPE as a scalar percentage with metadata: sample count, number of zero-excluded observations, the variant used, and optionally the per-element APE distribution for percentile analysis.

Data Flow

Data flows through the pipeline with a critical branch point at the zero-value gate. The write path is straightforward: predictions and actuals enter, residuals are computed, absolute values are taken, and each is divided by the actual value before averaging. The key complexity lies in the zero-handling decision. In production systems at companies like Swiggy, the pipeline often computes MAPE, WMAPE, and MAE in parallel from the same residuals, branching after the absolute value step to avoid redundant computation. The zero-value gate logs statistics about how many observations were excluded, providing a data quality signal. If more than a configurable threshold (typically 5-10%) of observations trigger zero handling, the system may automatically fall back to WMAPE or MAE and alert the monitoring team.

The architecture diagram shows predictions and ground truth merging at a subtraction step, flowing through absolute value computation, then hitting a critical zero-check decision gate (highlighted in red). Non-zero values proceed through percentage normalization (orange) while zero values are routed to a handling strategy. Both paths converge at the sum-and-average aggregator, producing the final MAPE percentage (green). Dashed lines show variant selection paths for SMAPE and WMAPE as alternative computation routes.

How to Implement

Implementing MAPE correctly in production requires navigating three challenges that do not exist for simpler metrics like MAE: zero-value handling, variant selection, and the asymmetry bias. Most teams start with scikit-learn's mean_absolute_percentage_error for offline evaluation, which returns MAPE as a decimal (not a percentage) and handles zeros by producing infinity values. For training loops, TensorFlow and PyTorch do not include MAPE as a built-in loss function because its non-differentiability at zero and undefined behavior at $y = 0$ make it unsuitable for gradient-based optimization -- you should train with MAE or Huber loss and evaluate with MAPE.

For production monitoring and demand forecasting, custom implementations are almost always necessary. The key decisions are: (1) how to handle zero actuals -- skip them, clip the denominator to a minimum value, or switch to WMAPE, (2) whether to report standard MAPE, SMAPE, or WMAPE based on the data characteristics, and (3) how to complement MAPE with directional metrics that reveal the asymmetry bias.

A critical implementation detail: scikit-learn's mean_absolute_percentage_error returns the result as a fraction (e.g., 0.12 for 12% error), while most business contexts expect a percentage (12%). Always document and verify the convention in your codebase to avoid off-by-100x errors in dashboards and alerts.

MAPE, SMAPE, and WMAPE with Scikit-learn34 lines

import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Ground truth and predictions (demand forecasting example)
y_true = np.array([100, 150, 200, 50, 300, 80])
y_pred = np.array([110, 130, 210, 45, 280, 95])

# 1. Standard MAPE via scikit-learn (returns as fraction, not %)
mape_sklearn = mean_absolute_percentage_error(y_true, y_pred)
print(f"MAPE (sklearn, fraction): {mape_sklearn:.4f}")
print(f"MAPE (percentage):        {mape_sklearn * 100:.2f}%")
# Output: MAPE (percentage): 11.08%

# 2. MAPE with sample weights (prioritize high-value predictions)
weights = np.array([2, 1, 1, 3, 1, 2])  # Weight perishables higher
mape_weighted = mean_absolute_percentage_error(
    y_true, y_pred, sample_weight=weights
)
print(f"Weighted MAPE: {mape_weighted * 100:.2f}%")

# 3. Multi-output MAPE (multiple forecast horizons)
y_true_multi = np.array([[100, 110], [200, 190], [150, 160]])
y_pred_multi = np.array([[105, 115], [180, 195], [155, 150]])

mape_multi = mean_absolute_percentage_error(
    y_true_multi, y_pred_multi, multioutput='raw_values'
)
print(f"Per-horizon MAPE: {mape_multi * 100}")  # Per column

# WARNING: sklearn produces inf when y_true contains zeros!
y_with_zero = np.array([100, 0, 200])
y_pred_zero = np.array([110, 5, 190])
mape_inf = mean_absolute_percentage_error(y_with_zero, y_pred_zero)
print(f"MAPE with zero: {mape_inf}")  # inf!

This demonstrates scikit-learn's mean_absolute_percentage_error, introduced in version 0.24. Critical points: (1) it returns MAPE as a fraction (0.12), not a percentage (12%) -- multiply by 100 for reporting, (2) it produces inf when any actual value is zero, so always check your data first, (3) it supports sample weights and multi-output, making it usable for weighted variants and multi-horizon forecasts. For production use, always wrap this in a function that checks for zeros and handles them gracefully.

Complete MAPE Variants: Standard, SMAPE, WMAPE, MAAPE110 lines

import numpy as np
from typing import Optional, Dict

def mape(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    zero_strategy: str = "skip",
    epsilon: float = 1e-8
) -> float:
    """
    Standard MAPE with configurable zero handling.
    
    Args:
        y_true: Actual values
        y_pred: Predicted values
        zero_strategy: 'skip' (exclude zeros), 'clip' (clip denom to epsilon),
                       'error' (raise ValueError)
        epsilon: Minimum denominator for 'clip' strategy
    Returns:
        MAPE as a percentage (e.g., 12.5 means 12.5%)
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    
    if y_true.shape != y_pred.shape:
        raise ValueError(f"Shape mismatch: {y_true.shape} vs {y_pred.shape}")
    
    abs_errors = np.abs(y_true - y_pred)
    
    if zero_strategy == "skip":
        mask = np.abs(y_true) > epsilon
        if not mask.any():
            raise ValueError("All actual values are zero")
        ape = abs_errors[mask] / np.abs(y_true[mask])
    elif zero_strategy == "clip":
        denom = np.maximum(np.abs(y_true), epsilon)
        ape = abs_errors / denom
    elif zero_strategy == "error":
        if np.any(np.abs(y_true) < epsilon):
            raise ValueError("Zero actual values found; MAPE is undefined")
        ape = abs_errors / np.abs(y_true)
    else:
        raise ValueError(f"Unknown zero_strategy: {zero_strategy}")
    
    return float(np.mean(ape) * 100)


def smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Symmetric MAPE. Range: [0%, 200%].
    Avoids division by zero when both actual and predicted are nonzero.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    
    numerator = np.abs(y_true - y_pred)
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2
    
    # Handle case where both actual and predicted are zero
    mask = denominator > 0
    result = np.zeros_like(numerator)
    result[mask] = numerator[mask] / denominator[mask]
    
    return float(np.mean(result) * 100)


def wmape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Weighted MAPE. Equivalent to MAE / mean(|y_true|).
    Handles individual zeros gracefully; only fails if ALL actuals are zero.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    
    total_abs_error = np.sum(np.abs(y_true - y_pred))
    total_actual = np.sum(np.abs(y_true))
    
    if total_actual == 0:
        raise ValueError("WMAPE undefined: sum of actuals is zero")
    
    return float(total_abs_error / total_actual * 100)


def maape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Mean Arctangent APE. Bounded in [0, pi/2). Handles zeros gracefully.
    Proposed by Kim & Kim (2016).
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    
    abs_errors = np.abs(y_true - y_pred)
    # Use safe division, defaulting to 0 where y_true is 0
    with np.errstate(divide='ignore', invalid='ignore'):
        ratios = np.where(y_true != 0, abs_errors / np.abs(y_true), abs_errors)
    
    return float(np.mean(np.arctan(ratios)))


# Demonstration with demand forecasting data
y_actual = np.array([120, 0, 85, 200, 15, 300, 0, 50])
y_forecast = np.array([130, 5, 80, 190, 20, 310, 3, 55])

print("=== MAPE Variants Comparison ===")
print(f"MAPE (skip zeros):   {mape(y_actual, y_forecast, 'skip'):.2f}%")
print(f"MAPE (clip zeros):   {mape(y_actual, y_forecast, 'clip'):.2f}%")
print(f"SMAPE:               {smape(y_actual, y_forecast):.2f}%")
print(f"WMAPE:               {wmape(y_actual, y_forecast):.2f}%")
print(f"MAAPE (radians):     {maape(y_actual, y_forecast):.4f}")
print(f"\nZero observations:   {np.sum(y_actual == 0)} / {len(y_actual)}")

This provides production-ready implementations of all four major MAPE variants. Key design choices: (1) mape() supports three zero-handling strategies -- skip (most common in practice), clip (numerically stable but introduces bias), and error (for pipelines that must guarantee no zeros), (2) smape() uses the Makridakis (1993) formulation with denominator as the average of actual and predicted absolute values, (3) wmape() avoids per-element division entirely by computing the ratio of aggregated sums, making it naturally robust to individual zeros, (4) maape() bounds the error via arctangent, eliminating infinity for all inputs. The comparison output shows how these variants diverge on data with zeros -- WMAPE and MAAPE remain well-defined while standard MAPE must skip or clip.

MAPE in PyTorch for Evaluation (Not Training)105 lines

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class MAPEMetric:
    """
    Stateful MAPE metric for PyTorch evaluation loops.
    Accumulates predictions across batches, computes MAPE at the end.
    
    WARNING: Do NOT use MAPE as a training loss. Use MAE or Huber instead,
    then evaluate with MAPE after training.
    """
    
    def __init__(self, epsilon: float = 1e-8):
        self.epsilon = epsilon
        self.reset()
    
    def reset(self):
        self.total_ape = 0.0
        self.count = 0
        self.skipped_zeros = 0
    
    def update(self, y_pred: torch.Tensor, y_true: torch.Tensor):
        """Accumulate APE values from a batch."""
        abs_errors = torch.abs(y_true - y_pred)
        
        # Skip near-zero actuals
        mask = torch.abs(y_true) > self.epsilon
        valid_ape = abs_errors[mask] / torch.abs(y_true[mask])
        
        self.total_ape += valid_ape.sum().item()
        self.count += mask.sum().item()
        self.skipped_zeros += (~mask).sum().item()
    
    def compute(self) -> dict:
        """Compute MAPE from accumulated values."""
        if self.count == 0:
            return {"mape_pct": float('inf'), "valid_count": 0, "skipped": self.skipped_zeros}
        
        mape_val = (self.total_ape / self.count) * 100
        return {
            "mape_pct": mape_val,
            "valid_count": self.count,
            "skipped_zeros": self.skipped_zeros
        }


# Example: Evaluate a trained model
def evaluate_model_mape(model, dataloader, device='cpu'):
    """Evaluate regression model and report MAPE."""
    model.eval()
    mape_metric = MAPEMetric()
    mae_total = 0.0
    n_samples = 0
    
    with torch.no_grad():
        for X_batch, y_batch in dataloader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            predictions = model(X_batch)
            
            # Update MAPE (for percentage-based reporting)
            mape_metric.update(predictions, y_batch)
            
            # Also track MAE (for unit-based reporting)
            mae_total += torch.abs(y_batch - predictions).sum().item()
            n_samples += len(y_batch)
    
    results = mape_metric.compute()
    results["mae"] = mae_total / n_samples
    return results


# Demo with synthetic data
torch.manual_seed(42)
model = nn.Sequential(
    nn.Linear(5, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1)
)

# Train with MAE loss (NOT MAPE)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.L1Loss()  # MAE for training

X = torch.randn(500, 5)
y = torch.randn(500, 1) * 10 + 50  # Positive target values

dataset = TensorDataset(X, y)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Quick training
for epoch in range(20):
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()

# Evaluate with MAPE
results = evaluate_model_mape(model, train_loader)
print(f"MAPE: {results['mape_pct']:.2f}%")
print(f"MAE:  {results['mae']:.3f}")
print(f"Valid samples: {results['valid_count']}")
print(f"Skipped (zero actuals): {results['skipped_zeros']}")

This demonstrates the correct pattern for using MAPE with PyTorch: train with MAE (L1 loss), evaluate with MAPE. MAPE should never be used as a training loss because (1) it is undefined at $y = 0$ , (2) its gradients explode near zero, and (3) its asymmetry biases the model toward under-prediction. The MAPEMetric class follows the stateful metric pattern used in TorchMetrics and Keras -- accumulating values across batches and computing the final metric once. The evaluate_model_mape function reports both MAPE (for stakeholder communication) and MAE (for debugging), along with metadata about how many zero-valued samples were skipped.

Production MAPE Monitoring with Asymmetry Tracking78 lines

import numpy as np
from collections import deque
from typing import List


class MAPEMonitor:
    """Production MAPE monitor with asymmetry and drift detection."""
    
    def __init__(self, mape_threshold: float = 15.0, history_size: int = 168):
        self.mape_threshold = mape_threshold
        self.history: deque = deque(maxlen=history_size)
        self._errors: List[float] = []
        self._actuals: List[float] = []
        self._over_count = 0
        self._total_count = 0
    
    def record(self, y_true: float, y_pred: float):
        """Record a single prediction."""
        self._errors.append(abs(y_true - y_pred))
        self._actuals.append(abs(y_true))
        self._total_count += 1
        if y_pred > y_true:
            self._over_count += 1
    
    def flush_window(self) -> dict:
        """Compute WMAPE and asymmetry for current window."""
        errors = np.array(self._errors)
        actuals = np.array(self._actuals)
        
        total_actual = np.sum(actuals)
        wmape = (np.sum(errors) / total_actual * 100) if total_actual > 0 else float('inf')
        over_ratio = self._over_count / max(self._total_count, 1)
        
        result = {
            'wmape_pct': wmape,
            'over_predict_ratio': over_ratio,
            'sample_count': self._total_count,
            'zero_count': int(np.sum(actuals < 1e-8))
        }
        self.history.append(result)
        
        # Reset
        self._errors, self._actuals = [], []
        self._over_count, self._total_count = 0, 0
        return result
    
    def check_alerts(self) -> List[str]:
        """Check threshold, asymmetry, and drift alerts."""
        alerts = []
        if not self.history:
            return alerts
        latest = self.history[-1]
        
        if latest['wmape_pct'] > self.mape_threshold:
            alerts.append(f"WMAPE {latest['wmape_pct']:.1f}% > {self.mape_threshold}%")
        if latest['over_predict_ratio'] > 0.65:
            alerts.append(f"Over-prediction bias: {latest['over_predict_ratio']:.0%}")
        if len(self.history) >= 24:
            drift = latest['wmape_pct'] - self.history[-24]['wmape_pct']
            if drift > 3.0:
                alerts.append(f"MAPE drift: +{drift:.1f}pp over 24h")
        return alerts


# Demo
monitor = MAPEMonitor(mape_threshold=15.0)
np.random.seed(42)
for hour in range(48):
    actuals = np.random.exponential(50, 100)
    preds = actuals + np.random.normal(3, 5 + hour * 0.2, 100)
    for a, p in zip(actuals, preds):
        monitor.record(float(a), float(p))
    result = monitor.flush_window()
    alerts = monitor.check_alerts()
    if hour % 12 == 0:
        print(f"Hour {hour}: WMAPE={result['wmape_pct']:.1f}%")
    for alert in alerts:
        print(f"  [ALERT] {alert}")

This production monitor tracks three key signals: (1) WMAPE as the primary metric (zero-safe, volume-weighted), (2) over-prediction ratio to detect MAPE's asymmetry bias affecting model behavior, and (3) drift detection comparing current MAPE against a 24-hour baseline. This pattern is used in demand forecasting at companies like Swiggy Instamart and Flipkart, where MAPE drift signals model staleness or data distribution shifts.

Configuration Example45 lines

# MAPE monitoring configuration for demand forecasting pipeline
metrics:
  mape:
    variant: "wmape"          # Prefer WMAPE for zero-safety
    report_standard_mape: true  # Also compute standard MAPE for reference
    report_smape: false
    
    zero_handling:
      strategy: "skip"         # skip | clip | error
      clip_epsilon: 1e-6       # Min denominator for clip strategy
      max_zero_ratio: 0.10     # Alert if >10% zeros
    
    thresholds:
      excellent: 5.0           # MAPE < 5%
      good: 10.0               # 5% <= MAPE < 10%
      acceptable: 20.0         # 10% <= MAPE < 20%
      critical: 30.0           # MAPE >= 30% triggers alert
    
    # Asymmetry monitoring
    asymmetry:
      track_direction: true
      over_predict_alert: 0.65  # Alert if >65% over-predictions
      under_predict_alert: 0.65
    
    # Slicing for segment-level MAPE
    slices:
      - dimension: "product_category"
        values: ["electronics", "grocery", "fashion", "home"]
      - dimension: "demand_volume"
        values: ["low", "medium", "high"]
      - dimension: "region"
        values: ["north", "south", "east", "west"]
    
    # Companion metrics (always compute alongside MAPE)
    companion_metrics:
      - "mae"                   # Absolute error in original units
      - "median_absolute_percentage_error"  # Robust to outliers
      - "forecast_bias"         # Mean error (directional)
      - "p90_absolute_percentage_error"     # 90th percentile APE
    
    # Drift detection
    drift:
      baseline_window: "7d"    # Compare against last 7 days
      drift_threshold_pp: 3.0  # Alert on 3 percentage point increase
      check_interval: "1h"

Common Implementation Mistakes

●
Dividing by zero without handling: Computing abs(y_true - y_pred) / y_true when y_true contains zeros produces infinity or NaN. Always check for zeros first. scikit-learn's implementation will return inf -- this is by design, not a bug.
●
Confusing fraction vs percentage: scikit-learn returns MAPE as a fraction (0.12 = 12%). Many teams report this raw value to stakeholders, who interpret it as 0.12% instead of 12%. Always multiply by 100 and label clearly.
●
Using MAPE as a training loss: MAPE's undefined gradient at $y = 0$ and explosion near zero make it unsuitable for backpropagation. Train with MAE or Huber loss, then evaluate with MAPE.
●
Ignoring the asymmetry bias: MAPE systematically favors models that under-predict. If you select models based on MAPE alone, you will choose models that consistently forecast too low -- potentially catastrophic for inventory planning or capacity provisioning.
●
Averaging MAPE across series incorrectly: Computing MAPE per series and then averaging gives equal weight to all series regardless of volume. A series with 10 units and 50% MAPE dominates the average as much as a series with 10,000 units and 5% MAPE. Use WMAPE for volume-weighted accuracy.
●
Applying MAPE to intermittent demand: Products with many zero-demand periods (spare parts, seasonal items) make MAPE undefined or misleadingly large. Use WMAPE or the Croston method with MAE for intermittent demand.
●
Not reporting companion metrics: MAPE alone hides the magnitude of errors. A 10% MAPE on ₹100 items means ₹10 average error, but 10% MAPE on ₹1,00,000 items means ₹10,000 average error. Always report MAE alongside MAPE for context.
●
Comparing MAPE across SMAPE and standard formulations: Different tools use different MAPE definitions. scikit-learn uses standard MAPE, the M3/M4 competitions used SMAPE, and supply chain tools often use WMAPE. Comparing numbers across formulations is meaningless.

When Should You Use This?

Use When

You need a scale-free, universally understood accuracy metric: When reporting to business stakeholders who need to compare forecast accuracy across products, regions, or business units with different scales, MAPE's percentage format is immediately interpretable.
Your actual values are strictly positive and well above zero: MAPE works best when the denominator (actual values) is comfortably far from zero -- revenue forecasting, high-volume demand planning, energy consumption prediction.
Cross-series or cross-product comparison is required: Unlike MAE (which is scale-dependent), MAPE lets you compare accuracy across a ₹500 product and a ₹50,000 product on the same scale.
Industry benchmarking demands it: Supply chain, retail, and energy industries use MAPE as the standard accuracy KPI. If your company or client expects MAPE, you need to report it (possibly alongside better metrics).
You want to assess proportional accuracy: When a 10% miss on a ₹100 item and a 10% miss on a ₹1,00,000 item should be considered equally bad, MAPE's proportional framing is correct.
Demand forecasting with stable, non-zero demand: For high-volume SKUs in FMCG, grocery, or established product lines where zero-demand days are rare, MAPE is the industry standard metric.

Avoid When

Actual values include zeros or near-zero values: MAPE is undefined at zero and explodes near zero. Use WMAPE, MAAPE, or MAE instead. This includes intermittent demand (spare parts, seasonal products) and any count-based target where zero is common.
You are training a model (not evaluating): MAPE should never be used as a training loss due to undefined gradients at zero, explosion near zero, and asymmetry bias. Use MAE, MSE, or Huber loss for training.
Directional accuracy matters: If under-predicting by 10% has different consequences than over-predicting by 10% (e.g., inventory costs vs stockout costs), MAPE's asymmetry will mislead you. Use asymmetric loss functions or track forecast bias separately.
Your data has a wide range including very small values: Even without exact zeros, values close to zero produce disproportionately large percentage errors that dominate the average. Filter to $|y| > \text{threshold}$ or use WMAPE.
You need a bounded metric: MAPE has no upper bound -- a single bad prediction can produce 1,000%+ error. Use SMAPE (bounded 0-200%) or MAAPE (bounded 0 to $\pi/2$ ) if you need bounded error.
Comparing across different MAPE conventions: If some of your systems use standard MAPE, others use SMAPE, and others use WMAPE, the numbers are not comparable. Standardize on one variant before comparing.

Key Tradeoffs

The Central Tradeoff: Interpretability vs. Mathematical Soundness

MAPE's greatest strength -- intuitive percentage interpretation -- is inseparable from its greatest weakness -- division by actual values. This single design choice gives MAPE its scale-free property (strength) while introducing undefined behavior at zero, asymmetry, and unbounded errors (weaknesses). You cannot fix one without losing the other.

MAPE vs. MAE

Property	MAPE	MAE
Scale	Percentage (scale-free)	Original units (scale-dependent)
Zeros	Undefined	Well-defined
Asymmetry	Favors under-prediction	Symmetric
Cross-series comparison	Direct	Requires normalization
Stakeholder communication	Excellent	Good (if units are meaningful)
Training loss	Never	Yes (L1 loss)

MAPE vs. SMAPE

SMAPE was designed to fix MAPE's asymmetry, but Hyndman (2014) showed it introduces the opposite bias: SMAPE favors over-prediction. Neither metric is truly symmetric. SMAPE also behaves oddly when both actual and predicted values are small, producing values near 200% even for minor absolute errors. Use WMAPE instead if you need a symmetric, zero-safe percentage metric.

MAPE vs. WMAPE

WMAPE is often the best practical alternative. It avoids per-element division (so individual zeros do not cause problems), naturally weights by volume (so high-demand items matter more, which aligns with business impact), and is equivalent to MAE normalized by total actual volume. The tradeoff: low-volume items have almost no impact on WMAPE, so you need a separate metric to monitor their accuracy.

When to Report Multiple Variants

In production, the best practice is to report WMAPE as the primary metric (zero-safe, business-aligned), MAE as the absolute companion (for understanding error magnitude in original units), and standard MAPE on the subset of non-zero actuals (for backward compatibility with industry benchmarks). Track forecast bias (mean error, not absolute) separately to detect directional patterns that MAPE hides.

Metric	Zero-Safe	Symmetric	Bounded	Volume-Weighted	Best For
MAPE	No	No	No	No	Industry benchmark
SMAPE	Partial	No (favors over)	Yes (0-200%)	No	M-competition
WMAPE	Yes	Yes	No	Yes	Business forecasting
MAAPE	Yes	No	Yes (0-pi/2)	No	Research, intermittent
MASE	Yes	Yes	No	No	Academic comparison

Alternatives & Comparisons

MAE (Mean Absolute Error)

MAE measures absolute error in the original units of the target variable, making it robust to zeros and symmetric by construction. Choose MAE when your data includes zeros or when you need to understand error magnitude in real-world units (e.g., '8 minutes off' vs '12% off'). Choose MAPE when you need cross-product or cross-scale comparisons that MAE cannot provide. In practice, report both: MAE for error magnitude, MAPE for proportional accuracy.

MSE/RMSE

RMSE penalizes large errors quadratically while MAPE penalizes proportionally. A 100-unit error on a 200-unit actual is 50% MAPE regardless of the absolute scale, but contributes 10,000 to MSE. Use RMSE when large individual errors are catastrophic and need heavy penalization. Use MAPE when proportional accuracy matters more than absolute error magnitude. For safety-critical applications, RMSE is strongly preferred.

R² Score

R² measures the proportion of variance explained by the model relative to a naive mean predictor. It is bounded (typically 0 to 1) and scale-free like MAPE, but interpretable differently: R² = 0.85 means the model explains 85% of variance, not that errors are 15%. Use R² for comparing model architectures on the same dataset; use MAPE for communicating forecast accuracy to business stakeholders. R² handles zeros gracefully and does not suffer from MAPE's asymmetry.

Pros, Cons & Tradeoffs

Advantages

Scale-free and universally interpretable: Expressed as a percentage, MAPE allows direct comparison across products, regions, time horizons, and business units without normalization. A supply chain manager understands '12% MAPE' instantly.
Industry standard for demand forecasting: MAPE is the most widely adopted accuracy KPI in supply chain, retail, energy, and financial forecasting. Using it enables benchmarking against published industry standards and competitor comparisons.
Unit-independent: Unlike MAE (measured in rupees, minutes, kilograms), MAPE is dimensionless. This makes it ideal for dashboards that aggregate accuracy across heterogeneous metric types.
Simple to compute and explain: The formula is elementary -- average of percentage errors. No statistical background required to understand or communicate the result. Cost of computation is $O(n)$ with negligible constants.
Effective for strictly positive, non-zero data: When actual values are comfortably above zero (revenue, energy consumption, high-volume demand), MAPE provides clean, meaningful percentage accuracy that aligns with business intuition.
Widely supported in ML tooling: Available in scikit-learn, TensorFlow, statsmodels, Prophet, Apache Spark MLlib, and virtually every forecasting library. No custom implementation needed for basic use cases.

Disadvantages

Undefined when actual values are zero: Division by zero makes MAPE completely unusable for intermittent demand, count data with zeros, or any target that can be exactly zero. This is not a minor edge case -- zero actuals are extremely common in real-world data.
Asymmetric penalty structure: Over-predictions are penalized more heavily than under-predictions of the same absolute magnitude. This bias systematically favors under-forecasting models, which can lead to chronic stockouts in inventory planning.
Unbounded on the high side: A single prediction where the actual is small can produce APE of 1,000%+ or more, disproportionately inflating the mean. A single outlier can make the entire MAPE meaningless.
Explosion near zero: Even if actuals are not exactly zero, small actual values (e.g., 0.01) produce enormous percentage errors (e.g., an error of 1.0 on actual 0.01 = 10,000% APE) that dominate the average.
Equal weighting across all observations: Standard MAPE weights a low-volume product (10 units/day) equally with a high-volume product (10,000 units/day). This misaligns with business impact, where errors on high-volume items matter far more. WMAPE fixes this.
Not suitable as a training loss: Non-differentiability at zero error, undefined gradient at zero actual, and asymmetry bias make MAPE unsuitable for gradient-based optimization. You must train with a different loss and evaluate with MAPE.

Use WMAPE for cross-series aggregation, which naturally weights by volume. If standard MAPE is required, report it at the series level and aggregate using a volume-weighted average. Segment reporting into volume tiers (high/medium/low) with separate MAPE targets for each tier.

Placement in an ML System

Where MAPE Sits in the ML System

MAPE occupies the evaluation and monitoring layer of the ML pipeline, sitting between the model's prediction output and the business decision layer. It is not part of the model itself -- it is a measurement tool applied to the model's outputs.

In a typical demand forecasting system at a company like Flipkart or Swiggy, the flow is: (1) the model produces predictions for next-day demand per SKU-store combination, (2) these predictions are logged to a prediction store, (3) after the prediction horizon elapses, actual demand is observed, (4) the evaluation pipeline joins predictions with actuals and computes MAPE (along with companion metrics), (5) MAPE feeds into dashboards, alerting systems, and model retraining triggers.

MAPE also appears in the model selection phase during development. When comparing multiple candidate models (e.g., LightGBM vs Prophet vs DeepAR), MAPE on the validation set is often the primary ranking criterion -- which is precisely where its asymmetry bias can silently influence model choice toward under-forecasting approaches.

Production Pattern: Use WMAPE as the primary monitoring metric (zero-safe, volume-weighted), standard MAPE on the non-zero subset for industry benchmark compatibility, and forecast bias (mean signed error) for directional monitoring. This three-metric combination covers MAPE's weaknesses while preserving its communication advantages.

Pipeline Stage

Evaluation / Monitoring

Upstream

Model serving endpoint (predictions)
Ground truth collection pipeline
Feature store (for segmented evaluation)

Downstream

Monitoring dashboards and alerting
Model selection and champion/challenger comparison
Business reporting (forecast accuracy KPIs)
Retraining triggers

Scaling Bottlenecks

MAPE itself is computationally trivial -- $O(n)$ with minimal memory requirements. The bottleneck in production is not the metric computation but the data pipeline that delivers paired (prediction, actual) values. In demand forecasting systems, actuals often arrive days or weeks after predictions are made, requiring temporal join infrastructure to align forecasts with observed outcomes. At scale (millions of SKUs across hundreds of stores), this join operation dominates cost.

For real-time monitoring, the zero-checking logic adds a branch per observation. At 100K predictions/second, this is negligible. The practical bottleneck is maintaining sliding-window accumulators across multiple MAPE variants (standard, WMAPE, per-segment) simultaneously, which requires careful memory management on streaming infrastructure like Kafka Streams or Flink.

Production Case Studies

SwiggyFood Delivery / Quick Commerce (India)

Swiggy Instamart's demand forecasting team uses MAPE and WMAPE to evaluate predictions for perishable and non-perishable SKUs across their dark store network. Their engineering blog describes how they adapted metric alignment to account for the different business impacts of over- vs under-forecasting: over-forecasting perishables leads to spoilage waste, while under-forecasting leads to stockouts and lost orders. They explicitly address MAPE's asymmetry by using cost-aware metrics alongside MAPE.

Outcome:

The adaptive metric framework enabled Swiggy Instamart to reduce food waste by aligning forecasting incentives with actual business costs, rather than optimizing blindly for symmetric MAPE. WMAPE is used as the primary accuracy KPI across their demand planning dashboards.

AmazonE-commerce / Cloud (Global)

Amazon's Forecast service documentation explicitly discusses MAPE alongside WAPE (Weighted Absolute Percentage Error, equivalent to WMAPE) for evaluating demand forecasts. They note that MAPE uses an unweighted average that emphasizes forecasting error on all items equally regardless of demand volume, and recommend WAPE when forecasting accuracy should be proportional to item volume. Their documentation serves as a reference for the MAPE vs WMAPE decision framework used across the retail industry.

Outcome:

Amazon Forecast defaults to WAPE as the primary accuracy metric, acknowledging MAPE's limitations for real-world demand datasets with varying volumes and potential zeros. The documentation has become a widely cited industry reference for metric selection.

Makridakis Competitions (M3/M4/M5)Forecasting Research

The Makridakis forecasting competitions have been the most influential benchmarks in time series forecasting since 1982. The M3 competition (2000) and M4 competition (2018) used sMAPE (Symmetric MAPE) as a primary evaluation metric alongside MASE. The M5 competition (2020) shifted to RMSSE for point accuracy, partly due to the documented issues with MAPE and sMAPE. This evolution across competitions traces the forecasting community's gradual move away from percentage-based metrics.

Outcome:

The M4 competition evaluated 61 forecasting methods across 100,000 time series. The winning hybrid method (Smyl, 2020) achieved 11.374% sMAPE overall, outperforming pure statistical methods by ~9.4%. The competition's shift away from standard MAPE to sMAPE, and eventually to scaled metrics in M5, reflects the community's recognition of MAPE's fundamental limitations.

UberRide-sharing / Logistics (Global)

Uber's marketplace forecasting team uses MAPE to evaluate demand forecasts for ride requests across cities and time windows. Their engineering blog describes how they manage MAPE's challenges in practice: low-demand periods (late night, rural areas) produce inflated percentage errors, and the team uses segment-specific MAPE thresholds -- tighter for high-demand urban zones, more lenient for sparse suburban areas. They complement MAPE with absolute metrics for capacity planning.

Outcome:

Uber reports using ensemble forecasting models evaluated with MAPE, achieving sub-15% MAPE on aggregate urban demand. The segment-specific threshold approach prevents low-volume edge cases from masking poor performance on business-critical high-volume segments.

Tooling & Ecosystem

scikit-learn (mean_absolute_percentage_error)

PythonOpen Source

The standard implementation for offline MAPE evaluation in Python. Returns MAPE as a fraction (not percentage). Supports sample weights and multi-output. Returns inf when actuals contain zeros -- by design. Available since scikit-learn 0.24.

statsmodels

PythonOpen Source

Provides time series forecasting tools (ARIMA, ETS, VAR) with MAPE evaluation built into the diagnostics. Useful for classical time series models where MAPE is the default accuracy metric. Also includes SMAPE and other forecast error metrics.

Prophet (by Meta)

Python / ROpen Source

Meta's open-source forecasting library. Reports MAPE in its cross-validation diagnostics via performance_metrics(). Commonly used for business time series (daily sales, website traffic). Supports automatic hyperparameter tuning with MAPE as the optimization target.

GluonTS (by AWS)

PythonOpen Source

Deep learning toolkit for probabilistic time series forecasting. Includes MAPE, sMAPE, MASE, and MSIS as built-in evaluation metrics via its Evaluator class. Integrates with DeepAR, Transformer, and other neural forecasting models.

Apache Spark MLlib (RegressionEvaluator)

Scala / Python / JavaOpen Source

Distributed evaluation framework for large-scale regression models. Supports MAPE via the RegressionEvaluator with metricName='mae' (compute MAE, then normalize externally for MAPE). Suitable for evaluating forecasts over millions of SKU-store-day combinations.

Nixtla (StatsForecast / NeuralForecast)

PythonOpen Source

Modern forecasting framework providing both classical (StatsForecast) and neural (NeuralForecast) models. Includes MAPE, SMAPE, WMAPE, and MASE in its evaluation suite. Optimized for large-scale time series (thousands of series in parallel) with GPU acceleration.

Research & References

Another look at measures of forecast accuracy

Hyndman, R.J. and Koehler, A.B. (2006)International Journal of Forecasting, 22(4), 679-688

The landmark paper that systematically analyzed the shortcomings of MAPE and sMAPE, proposing MASE (Mean Absolute Scaled Error) as a superior alternative. Demonstrated that percentage-based metrics are degenerate when actuals include zeros and recommended abandoning them for cross-series comparisons.

The M4 Competition: 100,000 time series and 61 forecasting methods

Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2020)International Journal of Forecasting, 36(1), 54-74

The M4 competition used sMAPE as a primary evaluation metric across 100,000 time series. Results showed that hybrid ML-statistical methods outperformed pure approaches, with the winning method achieving 11.374% sMAPE. The paper's detailed accuracy tables serve as benchmark references for the forecasting community.

A new metric of absolute percentage error for intermittent demand forecasts

Kim, S. and Kim, H. (2016)International Journal of Forecasting, 32(3), 669-679

Proposed MAAPE (Mean Arctangent Absolute Percentage Error), which replaces MAPE's division with an arctangent function to bound errors in $[0, \pi/2)$ . Demonstrated superior behavior on intermittent demand data where MAPE is undefined or misleadingly large.

A new accuracy measure based on bounded relative error for time series forecasting

Chen, C., Twycross, J., and Garibaldi, J.M. (2017)PLOS ONE, 12(3), e0174202

Proposed a bounded relative error metric that addresses MAPE's zero-denominator and asymmetry issues simultaneously. Evaluated against MAPE, sMAPE, and MASE on multiple real-world datasets, showing improved stability and interpretability.

Errors on percentage errors

Hyndman, R.J. (2014)Hyndsight Blog (Technical Note)

Demonstrated that sMAPE (Symmetric MAPE) is not actually symmetric -- it favors over-forecasting, the opposite bias from standard MAPE. This influential blog post is widely cited in forecasting literature and contributed to the M5 competition's decision to abandon percentage-based metrics.

Interview & Evaluation Perspective

Common Interview Questions

●
What is MAPE and when would you use it over MAE or RMSE?
●
Explain MAPE's asymmetry problem. How does it affect model selection?
●
What happens to MAPE when actual values are zero? How would you handle this?
●
Compare MAPE, SMAPE, and WMAPE. When would you use each?
●
How would you set up MAPE monitoring for a demand forecasting system with thousands of SKUs?
●
Why should you never use MAPE as a training loss function?

Key Points to Mention

●
MAPE is scale-free (percentage-based), which enables cross-product and cross-series comparison -- its primary advantage over MAE.
●
MAPE is undefined when actual values are zero and explodes near zero. Always check for zeros before computing. Use WMAPE for zero-safe evaluation.
●
MAPE has an asymmetry bias: it penalizes over-predictions more than under-predictions, systematically favoring models that under-forecast. Track forecast bias (mean signed error) to detect this.
●
WMAPE = sum(|errors|) / sum(|actuals|) -- aggregates before dividing, avoiding per-element zero issues and weighting by volume. It is the preferred variant for production demand forecasting.
●
SMAPE was designed to fix MAPE's asymmetry but introduces the opposite bias (favors over-forecasting). Neither is truly symmetric. WMAPE is a better alternative for practical use.
●
Never use MAPE as a training loss: undefined gradients at y=0, explosion near zero, and asymmetry bias make it unsuitable for optimization. Train with MAE or Huber, evaluate with MAPE.

Pitfalls to Avoid

●
Claiming SMAPE solves MAPE's asymmetry -- it reverses the bias direction instead of eliminating it. Hyndman (2014) demonstrated this clearly.
●
Forgetting that scikit-learn returns MAPE as a fraction (0.12), not a percentage (12%). Quoting the wrong convention in an interview is a credibility issue.
●
Stating MAPE is bounded between 0% and 100% -- it is bounded below by 0% but has no upper bound on the high side. Under-predictions are bounded at 100%, over-predictions are unbounded.
●
Ignoring the business impact of MAPE's asymmetry: under-forecasting bias leads to stockouts in inventory management, a critical operational failure that directly impacts revenue.

Senior-Level Expectation

A senior candidate should articulate the full decision tree: (1) check data for zeros -- if present, immediately switch to WMAPE or MAAPE, (2) if using standard MAPE, always report forecast bias alongside it to detect asymmetry effects, (3) for cross-series aggregation, use WMAPE (volume-weighted) rather than averaging per-series MAPE, (4) in production monitoring, set up three-metric dashboards (WMAPE + MAE + bias) with segment-level breakdowns. They should reference the Hyndman-Koehler (2006) paper and the M-competition evolution from MAPE to sMAPE to MASE as evidence of the metric's limitations. Cost analysis should include the business cost of MAPE-induced under-forecasting (e.g., stockout costs in Indian FMCG, estimated at 2-5% of revenue for chronic under-forecasters) and frame metric selection as a business decision, not just a technical one.

Summary

MAPE (Mean Absolute Percentage Error) is the most widely used percentage-based metric for evaluating regression and forecasting models. Its formula -- the average of absolute percentage errors $\frac{1}{n} \sum |\frac{y_i - \hat{y}_i}{y_i}| \times 100\%$ -- produces a scale-free, universally interpretable accuracy measure. When a demand planner reports 12% MAPE, every stakeholder in the room understands what it means.

However, MAPE carries three fundamental limitations that every ML practitioner must internalize: (1) it is undefined when actual values are zero and explodes near zero, making it unusable for intermittent demand or zero-inclusive targets, (2) it has an asymmetry bias that penalizes over-predictions more than under-predictions, systematically favoring models that under-forecast (a potentially costly bias in inventory and capacity planning), and (3) it is unbounded on the high side, allowing a single low-denominator observation to produce thousands-of-percent error that dominates the average.

The practical response to these limitations is to use WMAPE (Weighted MAPE) as the primary production metric -- it avoids per-element division, naturally weights by volume, and handles individual zeros gracefully. Complement WMAPE with MAE for absolute error magnitude and forecast bias (mean signed error) for directional monitoring. Report standard MAPE only on the non-zero subset for backward compatibility with industry benchmarks. This three-metric approach -- WMAPE for business KPIs, MAE for debugging, bias for directionality -- covers all of MAPE's weaknesses while preserving the percentage-based communication that makes it so valuable to business stakeholders.

Concept Snapshot

Why This Concept Exists

The Scale Problem That MAE Cannot Solve

Historical Roots in Business Forecasting

Why It Persists Despite Known Flaws

Core Intuition & Mental Model

The Simplest Way to Think About MAPE

The Asymmetry You Must Understand

The Zero Problem

Technical Foundations

The Standard MAPE Formula

Properties

The Asymmetry Formally

Key Variants

Computational Complexity

Internal Architecture

Key Components

Data Flow

How to Implement

Common Implementation Mistakes

When Should You Use This?

Use When

Avoid When

Key Tradeoffs

The Central Tradeoff: Interpretability vs. Mathematical Soundness

MAPE vs. MAE

MAPE vs. SMAPE

MAPE vs. WMAPE

When to Report Multiple Variants

Alternatives & Comparisons

Pros, Cons & Tradeoffs

Advantages

Disadvantages

Failure Modes & Debugging

Division by zero on zero actuals

Asymmetry-induced under-forecasting bias

Near-zero actual value explosion

Outlier-dominated MAPE from single bad predictions

Cross-formulation comparison error

Misleading cross-series average MAPE

Placement in an ML System

Where MAPE Sits in the ML System

Pipeline Stage

Upstream

Downstream

Scaling Bottlenecks

Production Case Studies

Tooling & Ecosystem

Research & References

Interview & Evaluation Perspective

Common Interview Questions

Key Points to Mention

Pitfalls to Avoid

Senior-Level Expectation

Summary

Related Blocks & Further Reading

Related ML Blocks

Further Reading