
🌊 Time Series Analysis & Forecasting: Deep Dive & Best Practices

Concise, clear, and validated revision notes on time series analysis and forecasting in Python, structured for beginners and practitioners.

Introduction

Time series analysis and forecasting are essential techniques in data science for predicting future values based on historical temporal patterns. A time series is a sequence of data points indexed in chronological order, typically at successive equally spaced time intervals.

Why Time Series Analysis Matters

Time series forecasting supports a wide range of organizational functions:

  • Demand Planning: Predict future demand and optimize inventory
  • Financial Forecasting: Estimate sales, revenue, and market trends
  • Resource Optimization: Anticipate capacity needs and allocate resources
  • Anomaly Detection: Monitor systems and detect unusual patterns
  • Strategic Planning: Make data-driven decisions with confidence intervals

Key Distinction

  • Time Series Analysis: Descriptive examination of historical patterns, trends, and statistical properties
  • Time Series Forecasting: Predictive modeling to estimate future values based on past observations

Fundamental Concepts

Temporal Dependencies

Unlike traditional supervised learning where observations are independent and identically distributed (i.i.d.), time series data exhibits temporal dependencies. Each observation at time $t$ is correlated with previous observations, creating autocorrelation that must be explicitly modeled.

Autocorrelation Function (ACF)

The ACF measures the linear relationship between observations separated by $k$ time lags:

\[\text{ACF}(k) = \frac{\text{Cov}(y_t, y_{t-k})}{\text{Var}(y_t)} = \frac{\sum_{t=k+1}^{n}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n}(y_t - \bar{y})^2}\]

Partial Autocorrelation Function (PACF)

The PACF measures the correlation between $y_t$ and $y_{t-k}$ after removing the linear influence of intermediate lags $1, 2, …, k-1$.
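
Both functions can be computed numerically with statsmodels before any modeling; a minimal sketch, assuming a pandas Series named series:

from statsmodels.tsa.stattools import acf, pacf

# Numeric ACF and PACF values for the first 20 lags (series is an assumed pandas Series)
acf_values = acf(series, nlags=20)
pacf_values = pacf(series, nlags=20)

print(acf_values[:5])   # lag 0 is always 1.0
print(pacf_values[:5])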


Time Series Components

Every time series can be decomposed into four fundamental components:

1. Trend Component ($T_t$)

The trend represents long-term directional movement:

  • Linear: $T_t = \beta_0 + \beta_1 t$
  • Exponential: $T_t = \beta_0 e^{\beta_1 t}$
  • Polynomial: $T_t = \beta_0 + \beta_1 t + \beta_2 t^2 + …$

2. Seasonal Component ($S_t$)

Seasonality refers to periodic fluctuations at fixed intervals:

  • Hourly (minute- or second-level data)
  • Daily (hourly data)
  • Weekly (daily data)
  • Monthly (daily/weekly data)
  • Quarterly (monthly data)
  • Yearly (monthly/quarterly data)

The seasonal period $m$ defines the number of observations per cycle (e.g., $m = 12$ for monthly data with yearly seasonality, $m = 7$ for daily data with weekly seasonality).

3. Cyclical Component ($C_t$)

Cycles are longer-term fluctuations without fixed periodicity, often related to economic or business conditions.

4. Irregular Component ($R_t$)

The residual or noise represents random, unpredictable variations. Ideally:

  • $E[R_t] = 0$ (zero mean)
  • $\text{Var}(R_t) = \sigma^2$ (constant variance)
  • $\text{Cov}(R_t, R_{t-k}) = 0$ for $k \neq 0$ (no autocorrelation)

Decomposition Models

Additive Model (constant seasonal variation): \(y_t = T_t + S_t + C_t + R_t\)

Multiplicative Model (proportional seasonal variation): \(y_t = T_t \times S_t \times C_t \times R_t\)
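
Taking logarithms turns a multiplicative structure into an additive one, which is why a log transform is often applied before an additive decomposition:

\[\ln y_t = \ln T_t + \ln S_t + \ln C_t + \ln R_t\]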

from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

decomposition = seasonal_decompose(df['value'], model='additive', period=12)

fig, axes = plt.subplots(4, 1, figsize=(14, 10))
decomposition.observed.plot(ax=axes[0], title='Original Series')
decomposition.trend.plot(ax=axes[1], title='Trend')
decomposition.seasonal.plot(ax=axes[2], title='Seasonal')
decomposition.resid.plot(ax=axes[3], title='Residual')
plt.tight_layout()

Stationarity and Preprocessing

Definition of Stationarity

A time series $\{y_t\}$ is weakly (covariance) stationary if:

  1. Constant Mean: $E[y_t] = \mu$ for all $t$
  2. Constant Variance: $\text{Var}(y_t) = \sigma^2$ for all $t$
  3. Autocovariance depends only on lag: $\text{Cov}(y_t, y_{t-k}) = \gamma_k$ for all $t$

Strong (Strict) Stationarity: The joint distribution of $(y_{t_1}, …, y_{t_n})$ is identical to $(y_{t_1+h}, …, y_{t_n+h})$ for any $h$.

Why Stationarity Matters

Most statistical forecasting models (ARIMA, exponential smoothing) assume stationarity because it:

  • Enables reliable parameter estimation
  • Gives forecasts consistent statistical properties
  • Makes sample statistics meaningful
  • Simplifies theoretical analysis
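
Before running the formal tests below, a quick visual check is to plot rolling statistics: if the rolling mean or standard deviation drifts over time, the series is unlikely to be stationary. A minimal sketch, assuming the same df['value'] column used elsewhere in this post:

import matplotlib.pyplot as plt

# Rolling mean and standard deviation over a 12-observation window
rolling_mean = df['value'].rolling(window=12).mean()
rolling_std = df['value'].rolling(window=12).std()

plt.figure(figsize=(12, 4))
plt.plot(df['value'], label='Original')
plt.plot(rolling_mean, label='Rolling mean')
plt.plot(rolling_std, label='Rolling std')
plt.legend()
plt.title('Rolling statistics check')
plt.show()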

Testing for Stationarity

Augmented Dickey-Fuller (ADF) Test

Tests for a unit root in the series.

Hypotheses:

  • $H_0$: Series has a unit root (non-stationary)
  • $H_1$: Series is stationary

Decision: Reject $H_0$ if p-value < 0.05

from statsmodels.tsa.stattools import adfuller

def adf_test(series, name=''):
    result = adfuller(series, autolag='AIC')
    print(f'\n{name}')
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print(f'Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.3f}')
    
    if result[1] < 0.05:
        print("✓ Series is stationary (reject H₀)")
    else:
        print("✗ Series is non-stationary (fail to reject H₀)")
    
    return result[1] < 0.05

KPSS Test

Complementary test for stationarity.

Hypotheses:

  • $H_0$: Series is stationary
  • $H_1$: Series is non-stationary

Decision: Reject $H_0$ if p-value < 0.05

from statsmodels.tsa.stattools import kpss

def kpss_test(series, name=''):
    result = kpss(series, regression='c', nlags='auto')
    print(f'\n{name}')
    print(f'KPSS Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    
    if result[1] > 0.05:
        print("✓ Series is stationary (fail to reject H₀)")
    else:
        print("✗ Series is non-stationary (reject H₀)")

Combined Interpretation:

| ADF result | KPSS result | Interpretation |
|---|---|---|
| Stationary | Stationary | Series is stationary |
| Non-stationary | Non-stationary | Series is non-stationary |
| Non-stationary | Stationary | Trend stationary: detrend, then re-test |
| Stationary | Non-stationary | Difference stationary: difference, then re-test |
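
Running both tests together with the helper functions defined above gives this combined read directly; a minimal sketch:

# Combined check using the adf_test and kpss_test helpers defined above
adf_says_stationary = adf_test(df['value'], name='ADF on value')
kpss_test(df['value'], name='KPSS on value')

if not adf_says_stationary:
    print("Consider differencing or detrending before modeling.")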

Achieving Stationarity

1. Differencing

First-Order Differencing: \(\Delta y_t = y_t - y_{t-1}\)

df['first_diff'] = df['value'].diff()

Second-Order Differencing: \(\Delta^2 y_t = \Delta y_t - \Delta y_{t-1} = y_t - 2y_{t-1} + y_{t-2}\)

df['second_diff'] = df['value'].diff().diff()

Seasonal Differencing (period $m$): \(\Delta_m y_t = y_t - y_{t-m}\)

# For monthly data with yearly seasonality
df['seasonal_diff'] = df['value'].diff(12)

Combined Differencing: \(\Delta_m \Delta y_t = (y_t - y_{t-1}) - (y_{t-m} - y_{t-m-1})\)

Guidelines:

  • Apply seasonal differencing before non-seasonal if both needed
  • Rarely need $d + D > 2$
  • Over-differencing introduces MA behavior
  • Differencing is reversible via cumulative sum (cumsum); see the sketch below
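
As noted in the last guideline, a first difference can be undone once the first observation is known; a minimal sketch:

import numpy as np
import pandas as pd

# Invert a first difference: prepend the first observation, then cumulatively sum
first_diff = df['value'].diff().dropna()
restored = pd.concat([df['value'].iloc[:1], first_diff]).cumsum()

print(np.allclose(restored.values, df['value'].values))  # True, up to floating-point error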

2. Transformation

Logarithmic Transformation (variance stabilization): \(z_t = \ln(y_t)\)

import numpy as np
df['log_value'] = np.log(df['value'])

Box-Cox Transformation (automatic power selection): \(y_t^{(\lambda)} = \begin{cases} \frac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(y_t), & \lambda = 0 \end{cases}\)

from scipy.stats import boxcox
transformed, lambda_optimal = boxcox(df['value'])
print(f'Optimal λ: {lambda_optimal:.4f}')

3. Detrending

from scipy.signal import detrend
detrended = detrend(df['value'])

Statistical Models

Exponential Smoothing Methods

Exponential smoothing assigns exponentially decreasing weights to past observations.

Simple Exponential Smoothing (SES)

For series with no trend or seasonality:

\[\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}\]

where $0 < \alpha \leq 1$ is the smoothing parameter.

Equivalent form: \(\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha(y_t - \hat{y}_{t|t-1})\)

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

model = SimpleExpSmoothing(train_data)
fit = model.fit(smoothing_level=0.2, optimized=True)  # alpha fixed at 0.2; omit smoothing_level to have it estimated
forecast = fit.forecast(steps=10)

Holt's Linear Trend Method

Extends SES with a trend component:

\[\begin{align} \text{Level: } & \ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\ \text{Trend: } & b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1} \\ \text{Forecast: } & \hat{y}_{t+h|t} = \ell_t + hb_t \end{align}\]
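
statsmodels implements this method as the Holt class; a minimal sketch, using the same train_data as the other examples:

from statsmodels.tsa.holtwinters import Holt

# Holt's linear trend method; smoothing parameters are estimated when optimized=True
model = Holt(train_data, damped_trend=False)
fit = model.fit(optimized=True)
forecast = fit.forecast(steps=10)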

Holt-Winters Seasonal Method

Additive Seasonality: \(\hat{y}_{t+h|t} = \ell_t + hb_t + s_{t+h-m(k+1)}\)

Multiplicative Seasonality: \(\hat{y}_{t+h|t} = (\ell_t + hb_t) \times s_{t+h-m(k+1)}\)

where $k = \lfloor (h-1)/m \rfloor$

from statsmodels.tsa.holtwinters import ExponentialSmoothing

model = ExponentialSmoothing(
    train_data,
    seasonal_periods=12,
    trend='add',
    seasonal='add',
    damped_trend=False
)
fit = model.fit()
forecast = fit.forecast(steps=12)

ARIMA Models

AutoRegressive Integrated Moving Average models are cornerstone methods for time series forecasting.

ARIMA(p, d, q) Components

p (AR order): Number of autoregressive terms \(\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p\)

d (Differencing order): Number of differences to achieve stationarity

q (MA order): Number of moving average terms \(\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q\)

Mathematical Formulation

Using the backshift operator $B$ where $By_t = y_{t-1}$:

\[\phi(B)(1-B)^d y_t = \theta(B)\varepsilon_t\]

Expanded: \((1 - \phi_1 B - \cdots - \phi_p B^p)(1-B)^d y_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t\)

where $\varepsilon_t \sim \text{WN}(0, \sigma^2)$ (white noise).
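
For example, ARIMA(1,1,1) written out as a difference equation is:

\[(1 - \phi_1 B)(1 - B)y_t = (1 + \theta_1 B)\varepsilon_t \;\Longrightarrow\; y_t = (1 + \phi_1)y_{t-1} - \phi_1 y_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1}\]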

Parameter Selection

ACF and PACF Patterns:

| Model | ACF pattern | PACF pattern |
|---|---|---|
| AR(p) | Decays exponentially | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Decays exponentially |
| ARMA(p,q) | Decays exponentially | Decays exponentially |

Information Criteria:

\(\text{AIC} = -2\ln(L) + 2k\) \(\text{BIC} = -2\ln(L) + k\ln(n)\) \(\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}\)

where $k$ = number of parameters, $n$ = sample size, $L$ = likelihood.

Lower values indicate better models.
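
A common way to use these criteria is a small grid search over candidate orders, keeping the fit with the lowest AIC; a minimal sketch (fixing d=1 purely for illustration):

from statsmodels.tsa.arima.model import ARIMA

# Grid search over small (p, q) values at d=1, ranked by AIC
results = []
for p in range(3):
    for q in range(3):
        try:
            fit = ARIMA(train_data, order=(p, 1, q)).fit()
            results.append(((p, 1, q), fit.aic, fit.bic))
        except Exception:
            continue  # skip orders that fail to estimate

best_order, best_aic, best_bic = min(results, key=lambda r: r[1])
print(f'Best order by AIC: {best_order} (AIC={best_aic:.1f}, BIC={best_bic:.1f})')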

Implementation

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# Visualize ACF and PACF
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
plot_acf(train_data, lags=40, ax=axes[0], title='ACF')
plot_pacf(train_data, lags=40, ax=axes[1], title='PACF')
plt.tight_layout()

# Fit ARIMA
model = ARIMA(train_data, order=(1, 1, 1))
fit = model.fit()

# Summary
print(fit.summary())

# Forecast
forecast = fit.forecast(steps=10)
forecast_ci = fit.get_forecast(steps=10).conf_int()

# Diagnostics
fit.plot_diagnostics(figsize=(14, 8))

Auto ARIMA

import pmdarima as pm

auto_model = pm.auto_arima(
    train_data,
    start_p=0, start_q=0,
    max_p=5, max_q=5,
    d=None,  # Automatically determine d
    start_P=0, start_Q=0,
    max_P=2, max_Q=2,
    m=12,  # Seasonal period
    seasonal=True,
    stationary=False,
    information_criterion='aic',
    stepwise=True,
    suppress_warnings=True,
    error_action='ignore',
    trace=True
)

print(auto_model.summary())

SARIMA Models

Seasonal ARIMA extends ARIMA for seasonal patterns.

SARIMA(p,d,q)(P,D,Q,m) Notation

Non-seasonal terms: (p, d, q)
Seasonal terms: (P, D, Q, m)

where $m$ is the seasonal period.

Mathematical Formulation

\[\phi(B)\Phi(B^m)(1-B)^d(1-B^m)^D y_t = \theta(B)\Theta(B^m)\varepsilon_t\]

where:

  • $\Phi(B^m) = 1 - \Phi_1 B^m - \Phi_2 B^{2m} - \cdots - \Phi_P B^{Pm}$
  • $\Theta(B^m) = 1 + \Theta_1 B^m + \Theta_2 B^{2m} + \cdots + \Theta_Q B^{Qm}$

Example: SARIMA(1,1,1)(1,1,1)₁₂ for monthly data:

\[(1-\phi_1 B)(1-\Phi_1 B^{12})(1-B)(1-B^{12})y_t = (1+\theta_1 B)(1+\Theta_1 B^{12})\varepsilon_t\]

Implementation

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(
    train_data,
    order=(1, 1, 1),              # (p, d, q)
    seasonal_order=(1, 1, 1, 12), # (P, D, Q, m)
    enforce_stationarity=False,
    enforce_invertibility=False
)
fit = model.fit(disp=False, maxiter=200)

# Forecast
forecast = fit.forecast(steps=12)
forecast_obj = fit.get_forecast(steps=12)
forecast_df = forecast_obj.summary_frame(alpha=0.05)

Best Practices:

  • $D \leq 1$ (rarely need more)
  • $d + D \leq 2$ (total differencing)
  • Start with small P and Q (usually 0 or 1)
  • Validate residuals are white noise

Machine Learning Approaches

Feature Engineering for Time Series

Lag Features

def create_lag_features(df, target_col, lags):
    """Create lag features"""
    for lag in lags:
        df[f'lag_{lag}'] = df[target_col].shift(lag)
    return df

df = create_lag_features(df, 'value', lags=[1, 2, 3, 7, 14, 30])

Rolling Window Features

def create_rolling_features(df, target_col, windows):
    """Create rolling statistics"""
    for window in windows:
        df[f'rolling_mean_{window}'] = df[target_col].rolling(window).mean()
        df[f'rolling_std_{window}'] = df[target_col].rolling(window).std()
        df[f'rolling_min_{window}'] = df[target_col].rolling(window).min()
        df[f'rolling_max_{window}'] = df[target_col].rolling(window).max()
        df[f'rolling_median_{window}'] = df[target_col].rolling(window).median()
    return df

df = create_rolling_features(df, 'value', windows=[7, 14, 30])

Exponentially Weighted Features

df['ewm_mean_7'] = df['value'].ewm(span=7, adjust=False).mean()
df['ewm_std_7'] = df['value'].ewm(span=7, adjust=False).std()

Temporal Features

def create_temporal_features(df):
    """Extract date/time features"""
    df['month'] = df.index.month
    df['quarter'] = df.index.quarter
    df['dayofweek'] = df.index.dayofweek
    df['dayofyear'] = df.index.dayofyear
    df['weekofyear'] = df.index.isocalendar().week
    df['is_weekend'] = (df.index.dayofweek >= 5).astype(int)
    df['is_month_start'] = df.index.is_month_start.astype(int)
    df['is_month_end'] = df.index.is_month_end.astype(int)
    df['is_quarter_start'] = df.index.is_quarter_start.astype(int)
    df['is_quarter_end'] = df.index.is_quarter_end.astype(int)
    return df

Cyclical Encoding

def encode_cyclical_features(df):
    """Encode cyclical features using sine/cosine"""
    df['month_sin'] = np.sin(2 * np.pi * df.index.month / 12)
    df['month_cos'] = np.cos(2 * np.pi * df.index.month / 12)
    df['day_sin'] = np.sin(2 * np.pi * df.index.dayofweek / 7)
    df['day_cos'] = np.cos(2 * np.pi * df.index.dayofweek / 7)
    df['hour_sin'] = np.sin(2 * np.pi * df.index.hour / 24)
    df['hour_cos'] = np.cos(2 * np.pi * df.index.hour / 24)
    return df

Time Series Cross-Validation

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5, gap=0, test_size=30)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    print(f'Fold {fold+1}:')
    print(f'  Train: {train_idx[0]} to {train_idx[-1]}')
    print(f'  Test: {test_idx[0]} to {test_idx[-1]}')
    
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    
    # Train and evaluate

Gradient Boosting Implementation

import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Prepare features
X = df[feature_cols]
y = df['target']

# Remove NaN
X = X.dropna()
y = y.loc[X.index]

# Split
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Model
model = HistGradientBoostingRegressor(
    max_iter=200,
    max_depth=6,
    learning_rate=0.1,
    min_samples_leaf=20,
    l2_regularization=1.0,
    random_state=42
)

# Train
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f'MAE: {mae:.2f}')
print(f'RMSE: {rmse:.2f}')

# Feature importance: HistGradientBoostingRegressor has no feature_importances_,
# so use permutation importance on the held-out set instead
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
importance_df = pd.DataFrame({
    'feature': feature_cols,
    'importance': perm.importances_mean
}).sort_values('importance', ascending=False)

print(importance_df.head(10))

Multi-Step Forecasting Strategies

Recursive Strategy

def recursive_forecast(model, X_last, n_steps, lag_features):
    """Recursive multi-step forecasting"""
    forecasts = []
    X_current = X_last.copy()
    
    for step in range(n_steps):
        # Predict next step
        pred = model.predict([X_current])[0]
        forecasts.append(pred)
        
        # Shift lag features forward (assumes lag_features = ['lag_1', 'lag_2', ...] with consecutive lags)
        for i in range(len(lag_features)-1, 0, -1):
            X_current[lag_features[i]] = X_current[lag_features[i-1]]
        X_current[lag_features[0]] = pred
        
    return np.array(forecasts)

Direct Strategy

from sklearn.multioutput import MultiOutputRegressor

# Create multi-horizon targets
horizon = 10
y_multi = pd.concat([
    df['target'].shift(-h) for h in range(1, horizon+1)
], axis=1)
y_multi.columns = [f'h{h}' for h in range(1, horizon+1)]
y_multi = y_multi.dropna()

# Align features
X_multi = X.loc[y_multi.index]

# Train
multi_model = MultiOutputRegressor(
    HistGradientBoostingRegressor(max_iter=100)
)
multi_model.fit(X_multi, y_multi)

# Predict all horizons at once
predictions = multi_model.predict(X_test)

Facebook Prophet

Prophet decomposes time series into trend, seasonality, and holidays.

\[y(t) = g(t) + s(t) + h(t) + \varepsilon_t\]

where:

  • $g(t)$: Piecewise linear/logistic growth
  • $s(t)$: Periodic seasonality (Fourier series)
  • $h(t)$: Holiday effects
  • $\varepsilon_t$: Error term

Basic Implementation

from prophet import Prophet

# Prepare data
df_prophet = pd.DataFrame({
    'ds': dates,  # datetime column
    'y': values   # target column
})

# Initialize
model = Prophet(
    growth='linear',  # or 'logistic'
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    seasonality_mode='additive',  # or 'multiplicative'
    changepoint_prior_scale=0.05,
    seasonality_prior_scale=10.0,
    holidays_prior_scale=10.0,
    interval_width=0.95
)

# Fit
model.fit(df_prophet)

# Forecast
future = model.make_future_dataframe(periods=365, freq='D')
forecast = model.predict(future)

# Visualize
fig1 = model.plot(forecast)
fig2 = model.plot_components(forecast)

Custom Seasonality and Holidays

# Add custom seasonality
model.add_seasonality(
    name='monthly',
    period=30.5,
    fourier_order=5
)

# Define holidays
holidays = pd.DataFrame({
    'holiday': 'special_event',
    'ds': pd.to_datetime(['2024-01-01', '2024-07-04', '2024-12-25']),
    'lower_window': -1,
    'upper_window': 1
})

model = Prophet(holidays=holidays)

# Add country holidays
model.add_country_holidays(country_name='US')

# Add regressors
df_prophet['marketing_spend'] = marketing_data
model.add_regressor('marketing_spend')

Cross-Validation

from prophet.diagnostics import cross_validation, performance_metrics

df_cv = cross_validation(
    model,
    initial='730 days',
    period='180 days',
    horizon='365 days',
    parallel='processes'
)

df_perf = performance_metrics(df_cv)
print(df_perf[['horizon', 'mape', 'rmse']].head())

Changepoint Detection with Ruptures

What are Changepoints?

Changepoints are time indices where statistical properties change:

  • Mean shifts: $E[y_t] \neq E[y_{t+1}]$
  • Variance changes: $\text{Var}(y_t) \neq \text{Var}(y_{t+1})$
  • Distribution changes
  • Trend breaks

Algorithms

PELT (Pruned Exact Linear Time)

Optimal detection with complexity $O(n)$:

import ruptures as rpt

algo = rpt.Pelt(model="l2", min_size=10, jump=1)
algo.fit(signal)
changepoints = algo.predict(pen=10)

Dynamic Programming

Exact segmentation when the number of changepoints $K$ is known in advance:

algo = rpt.Dynp(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(n_bkps=3)

Binary Segmentation

Fast approximate method:

algo = rpt.Binseg(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(n_bkps=3)

Cost Functions

| Cost model | Detects | Use case |
|---|---|---|
| l2 | Mean changes | Gaussian data |
| l1 | Median changes | Robust to outliers |
| rbf | Distribution changes | Non-linear patterns |
| normal | Mean + variance changes | Simultaneous changes |
| linear | Slope changes | Regression breakpoints |

Penalty Selection

BIC-like criterion: \(\text{pen} \approx 2 \log(n) \sigma^2\)

n = len(signal)
sigma = np.std(signal)
penalty = 2 * np.log(n) * sigma**2

algo = rpt.Pelt(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(pen=penalty)

Visualization

rpt.display(signal, true_chg_pts=[], computed_chg_pts=changepoints)
plt.title('Changepoint Detection')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

Model Evaluation

Metrics

Mean Absolute Error (MAE)

\[\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|\]

Properties: Scale-dependent, robust to outliers, same units as target

from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_true, y_pred)

Root Mean Squared Error (RMSE)

$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$

Properties: Scale-dependent, penalizes large errors, same units

from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

Mean Absolute Percentage Error (MAPE)

$\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$

Properties: Scale-independent, undefined for $y_i = 0$, asymmetric
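
A minimal implementation that skips zero actuals (scikit-learn 0.24+ also ships mean_absolute_percentage_error):

import numpy as np

def mape(y_true, y_pred):
    """MAPE in percent, ignoring observations where the actual value is zero."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100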

Mean Absolute Scaled Error (MASE)

$\text{MASE} = \frac{\text{MAE}}{\frac{1}{n-m}\sum_{t=m+1}^{n}|y_t - y_{t-m}|}$

Properties: Scale-independent, no division-by-zero issue, MASE < 1 means the model beats the (seasonal) naive forecast

def mase(y_true, y_pred, y_train, m=1):
    """MASE: model MAE scaled by the in-sample MAE of the seasonal naive forecast (lag m)."""
    y_train = np.asarray(y_train)
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    model_mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    return model_mae / naive_mae

Residual Diagnostics

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import shapiro, probplot

residuals = y_true - y_pred

# Ljung-Box test
lb = acorr_ljungbox(residuals, lags=[10], return_df=True)
print(lb)

# Normality
stat, pval = shapiro(residuals)
print(f'Shapiro-Wilk p-value: {pval:.4f}')

# Visual
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes[0,0].plot(residuals)
axes[0,0].axhline(0, color='r', linestyle='--')
axes[0,1].hist(residuals, bins=30)
plot_acf(residuals, lags=40, ax=axes[1,0])
probplot(residuals, plot=axes[1,1])
plt.tight_layout()

Python Libraries Overview

| Library | Purpose | Strengths | Installation |
|---|---|---|---|
| Statsmodels | Statistical modeling | Rigorous tests, diagnostics | pip install statsmodels |
| Scikit-learn | Machine learning | Scalable, flexible | pip install scikit-learn |
| Prophet | Business forecasting | Automated, interpretable | pip install prophet |
| Ruptures | Changepoint detection | Fast algorithms | pip install ruptures |

Best Practices

1. Data Preparation

  • Handle missing values appropriately
  • Check temporal ordering
  • Apply proper scaling/transformation

2. Model Selection

  • Start simple (naive, SES); see the baseline sketch after this list
  • Match model to data characteristics
  • Use domain knowledge
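
A naive (last value) and a seasonal naive forecast take a few lines and set the floor any serious model must beat; a minimal sketch, assuming train and test are pandas Series and a monthly seasonal period of 12:

import numpy as np

# Naive: repeat the last training value; seasonal naive: repeat the last full season
naive_forecast = np.repeat(train.iloc[-1], len(test))
seasonal_naive = np.tile(train.iloc[-12:].values, int(np.ceil(len(test) / 12)))[:len(test)]

print('Naive MAE:', np.mean(np.abs(test.values - naive_forecast)))
print('Seasonal naive MAE:', np.mean(np.abs(test.values - seasonal_naive)))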

3. Validation

  • Use TimeSeriesSplit
  • Evaluate multiple metrics
  • Check residuals

4. Production

  • Monitor performance
  • Update regularly
  • Document assumptions

Terminology Tables

Table 1: Lifecycle Terminology

| General | Statistical | ML | Business |
|---|---|---|---|
| Exploration | EDA | Profiling | Assessment |
| Preprocessing | Transformation | Feature Engineering | Preparation |
| Stationarity | Unit Root Test | Distribution Analysis | Trend Check |
| Forecasting | Prediction | Inference | Estimation |

Table 2: Hierarchical Jargon

| Level | Category | Terms |
|---|---|---|
| Problem | Forecasting | Univariate, Multivariate |
| Model | Statistical | ARIMA, SARIMA, ETS |
|  | ML | XGBoost, LSTM |
| Component | Trend | Linear, Exponential |
|  | Seasonal | Fixed, Multiple |
| Metric | Accuracy | MAE, RMSE, MASE |

Table 3: Library Comparison

| Feature | Statsmodels | Scikit-learn | Prophet | Ruptures |
|---|---|---|---|---|
| Init | ARIMA(data, order) | Model() | Prophet() | Pelt(model) |
| Fit | .fit() | .fit(X, y) | .fit(df) | .fit(signal) |
| Predict | .forecast() | .predict() | .predict() | .predict(pen) |

Complete Code Examples

Example 1: ARIMA End-to-End

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import mean_absolute_error

# Load
df = pd.read_csv('data.csv', parse_dates=['date'], index_col='date')
train, test = df[:int(len(df)*0.8)], df[int(len(df)*0.8):]

# Test stationarity
adf = adfuller(train['value'])
print(f'ADF p-value: {adf[1]:.4f}')

# Fit
model = ARIMA(train['value'], order=(2,1,2))
fit = model.fit()

# Forecast
forecast = fit.forecast(steps=len(test))

# Evaluate
mae = mean_absolute_error(test['value'], forecast)
print(f'MAE: {mae:.2f}')

# Plot
plt.plot(test.index, test['value'], label='Actual')
plt.plot(test.index, forecast, label='Forecast')
plt.legend()

Example 2: Prophet

import pandas as pd
from prophet import Prophet
from sklearn.metrics import mean_absolute_error

df = pd.DataFrame({'ds': dates, 'y': values})
train = df[df['ds'] < '2024-10-01']
test = df[df['ds'] >= '2024-10-01']

model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.fit(train)

forecast = model.predict(test[['ds']])
mae = mean_absolute_error(test['y'], forecast['yhat'])
print(f'MAE: {mae:.2f}')

Example 3: ML with Scikit-learn

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# Features
for lag in [1,7,14]:
    df[f'lag_{lag}'] = df['value'].shift(lag)
df = df.dropna()

X = df[[c for c in df.columns if c != 'value']]
y = df['value']

# Split
X_train, X_test = X[:int(len(X)*0.8)], X[int(len(X)*0.8):]
y_train, y_test = y[:int(len(y)*0.8)], y[int(len(y)*0.8):]

# Train
model = HistGradientBoostingRegressor(max_iter=200)
model.fit(X_train, y_train)

# Evaluate
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f'MAE: {mae:.2f}')

Example 4: Changepoint Detection

import numpy as np
import matplotlib.pyplot as plt
import ruptures as rpt

signal = np.concatenate([
    np.random.normal(10, 1, 200),
    np.random.normal(20, 2, 300),
    np.random.normal(5, 1, 500)
])

algo = rpt.Pelt(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(pen=10)

rpt.display(signal, [], changepoints)
plt.show()

Example 5: Ensemble

# Train models (reuses the train/test splits and feature matrices from the earlier examples)
arima_forecast = ARIMA(train, order=(2,1,2)).fit().forecast(len(test))
prophet_forecast = Prophet().fit(train_df).predict(test_df)['yhat'].values
ml_forecast = HistGradientBoostingRegressor().fit(X_train, y_train).predict(X_test)

# Ensemble
ensemble = (arima_forecast + prophet_forecast + ml_forecast) / 3
mae = mean_absolute_error(test, ensemble)
print(f'Ensemble MAE: {mae:.2f}')

Appendices

Appendix A: Common Pitfalls

  1. Data leakage
  2. Over-differencing
  3. Ignoring seasonality
  4. Wrong CV strategy
  5. Single metric evaluation

Appendix B: Formula Reference

  • Differencing: $\Delta y_t = y_t - y_{t-1}$
  • ARIMA: $\phi(B)(1-B)^d y_t = \theta(B)\varepsilon_t$
  • MAE: $\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$
  • MASE: $\frac{\text{MAE}}{\text{MAE}_{\text{naive}}}$

Appendix C: Troubleshooting

  • Won't converge: Check stationarity, reduce the model order
  • Poor accuracy: Try different models, add features
  • Patterned residuals: Adjust parameters, add components



Conclusion

Time series forecasting combines statistical foundations with modern machine learning. Key takeaways:

  • Start simple: Naive baselines establish performance floors
  • Match models to data: Consider trend, seasonality, and complexity
  • Validate properly: Use TimeSeriesSplit, never shuffle
  • Ensemble when possible: Combining models often improves robustness
  • Monitor continuously: Models degrade, retrain regularly

The Python ecosystem offers complementary tools:

  • Statsmodels for statistical rigor
  • Scikit-learn for ML flexibility
  • Prophet for business forecasting
  • Ruptures for changepoint detection

No universal solution exists. Success depends on understanding your data, choosing appropriate methods, validating rigorously, and iterating based on results.


Document Metadata:

  • Version: 1.0
  • Date: December 6, 2025
  • Validated: All code tested with Python 3.8+
  • Dependencies: statsmodels>=0.13, scikit-learn>=1.0, prophet>=1.1, ruptures>=1.1

This post is licensed under CC BY 4.0 by the author.