Time Series Analysis & Forecasting: Deep Dive & Best Practices
Concise, validated revision notes on time series analysis and forecasting in Python (Statsmodels, scikit-learn, Prophet, Ruptures), structured for beginners and practitioners.
Table of Contents
- Introduction
- Fundamental Concepts
- Time Series Components
- Stationarity and Preprocessing
- Statistical Models
- Machine Learning Approaches
- Changepoint Detection with Ruptures
- Model Evaluation
- Python Libraries Overview
- Best Practices
- Terminology Tables
- Complete Code Examples
- Appendices
Introduction
Time series analysis and forecasting are essential techniques in data science for predicting future values based on historical temporal patterns. A time series is a sequence of data points indexed in chronological order, typically at successive equally spaced time intervals.
Why Time Series Analysis Matters
Time series forecasting enables organizations to:
- Demand Planning: Predict future demand and optimize inventory
- Financial Forecasting: Estimate sales, revenue, and market trends
- Resource Optimization: Anticipate capacity needs and allocate resources
- Anomaly Detection: Monitor systems and detect unusual patterns
- Strategic Planning: Make data-driven decisions with confidence intervals
Key Distinction
- Time Series Analysis: Descriptive examination of historical patterns, trends, and statistical properties
- Time Series Forecasting: Predictive modeling to estimate future values based on past observations
Fundamental Concepts
Temporal Dependencies
Unlike traditional supervised learning where observations are independent and identically distributed (i.i.d.), time series data exhibits temporal dependencies. Each observation at time $t$ is correlated with previous observations, creating autocorrelation that must be explicitly modeled.
Autocorrelation Function (ACF)
The ACF measures the linear relationship between observations separated by $k$ time lags:
\[\text{ACF}(k) = \frac{\text{Cov}(y_t, y_{t-k})}{\text{Var}(y_t)} = \frac{\sum_{t=k+1}^{n}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n}(y_t - \bar{y})^2}\]
Partial Autocorrelation Function (PACF)
The PACF measures the correlation between $y_t$ and $y_{t-k}$ after removing the linear influence of the intermediate lags $1, 2, \ldots, k-1$.
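For a quick numerical look at both functions, statsmodels exposes acf and pacf directly. A minimal sketch, assuming y is a pandas Series or NumPy array:
from statsmodels.tsa.stattools import acf, pacf
acf_vals = acf(y, nlags=20)    # lag 0 is always 1.0
pacf_vals = pacf(y, nlags=20)
print(acf_vals[:5])
print(pacf_vals[:5])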
Time Series Components
Every time series can be decomposed into four fundamental components:
1. Trend Component ($T_t$)
The trend represents long-term directional movement:
- Linear: $T_t = \beta_0 + \beta_1 t$
- Exponential: $T_t = \beta_0 e^{\beta_1 t}$
- Polynomial: $T_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots$
2. Seasonal Component ($S_t$)
Seasonality refers to periodic fluctuations at fixed intervals:
- Hourly (sub-hourly data)
- Daily (hourly data)
- Weekly (daily data)
- Monthly (daily/weekly data)
- Quarterly (monthly data)
- Yearly (monthly/quarterly data)
The seasonal period $m$ defines observations per cycle.
3. Cyclical Component ($C_t$)
Cycles are longer-term fluctuations without fixed periodicity, often related to economic or business conditions.
4. Irregular Component ($R_t$)
The residual or noise represents random, unpredictable variations. Ideally:
- $E[R_t] = 0$ (zero mean)
- $\text{Var}(R_t) = \sigma^2$ (constant variance)
- $\text{Cov}(R_t, R_{t-k}) = 0$ for $k \neq 0$ (no autocorrelation)
Decomposition Models
Additive Model (constant seasonal variation): \(y_t = T_t + S_t + C_t + R_t\)
Multiplicative Model (proportional seasonal variation): \(y_t = T_t \times S_t \times C_t \times R_t\)
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt
decomposition = seasonal_decompose(df['value'], model='additive', period=12)
fig, axes = plt.subplots(4, 1, figsize=(14, 10))
decomposition.observed.plot(ax=axes[0], title='Original Series')
decomposition.trend.plot(ax=axes[1], title='Trend')
decomposition.seasonal.plot(ax=axes[2], title='Seasonal')
decomposition.resid.plot(ax=axes[3], title='Residual')
plt.tight_layout()
Stationarity and Preprocessing
Definition of Stationarity
A time series $\{y_t\}$ is weakly (covariance) stationary if:
- Constant Mean: $E[y_t] = \mu$ for all $t$
- Constant Variance: $\text{Var}(y_t) = \sigma^2$ for all $t$
- Autocovariance depends only on lag: $\text{Cov}(y_t, y_{t-k}) = \gamma_k$ for all $t$
Strong (Strict) Stationarity: The joint distribution of $(y_{t_1}, \ldots, y_{t_n})$ is identical to that of $(y_{t_1+h}, \ldots, y_{t_n+h})$ for any $h$.
Why Stationarity Matters
Most statistical forecasting models (ARIMA, exponential smoothing) assume stationarity because:
- Enables reliable parameter estimation
- Forecasts have consistent statistical properties
- Sample statistics are meaningful
- Simplifies theoretical analysis
Testing for Stationarity
Augmented Dickey-Fuller (ADF) Test
Tests for a unit root in the series.
Hypotheses:
- $H_0$: Series has a unit root (non-stationary)
- $H_1$: Series is stationary
Decision: Reject $H_0$ if p-value < 0.05
from statsmodels.tsa.stattools import adfuller

def adf_test(series, name=''):
    result = adfuller(series, autolag='AIC')
    print(f'\n{name}')
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.3f}')
    if result[1] < 0.05:
        print("✓ Series is stationary (reject H₀)")
    else:
        print("✗ Series is non-stationary (fail to reject H₀)")
    return result[1] < 0.05
KPSS Test
Complementary test for stationarity.
Hypotheses:
- $H_0$: Series is stationary
- $H_1$: Series is non-stationary
Decision: Reject $H_0$ if p-value < 0.05
from statsmodels.tsa.stattools import kpss

def kpss_test(series, name=''):
    result = kpss(series, regression='c', nlags='auto')
    print(f'\n{name}')
    print(f'KPSS Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    if result[1] > 0.05:
        print("✓ Series is stationary (fail to reject H₀)")
    else:
        print("✗ Series is non-stationary (reject H₀)")
Combined Interpretation:
| ADF | KPSS | Interpretation |
|---|---|---|
| Stationary | Stationary | Series is stationary |
| Non-stationary | Non-stationary | Series is non-stationary |
| Stationary | Non-stationary | Trend stationary |
| Non-stationary | Stationary | Difference stationary |
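A small helper that applies the decision table above in one call; a sketch assuming a clean, NaN-free series:
from statsmodels.tsa.stattools import adfuller, kpss

def check_stationarity(series):
    """Combine ADF and KPSS verdicts per the interpretation table above."""
    adf_p = adfuller(series, autolag='AIC')[1]
    kpss_p = kpss(series, regression='c', nlags='auto')[1]
    adf_ok = adf_p < 0.05        # ADF rejects the unit root
    kpss_ok = kpss_p > 0.05      # KPSS fails to reject stationarity
    if adf_ok and kpss_ok:
        return 'stationary'
    if not adf_ok and not kpss_ok:
        return 'non-stationary'
    return 'trend stationary' if adf_ok else 'difference stationary'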
Achieving Stationarity
1. Differencing
First-Order Differencing: \(\Delta y_t = y_t - y_{t-1}\)
df['first_diff'] = df['value'].diff()
Second-Order Differencing: \(\Delta^2 y_t = \Delta y_t - \Delta y_{t-1} = y_t - 2y_{t-1} + y_{t-2}\)
df['second_diff'] = df['value'].diff().diff()
Seasonal Differencing (period $m$): \(\Delta_m y_t = y_t - y_{t-m}\)
# For monthly data with yearly seasonality
df['seasonal_diff'] = df['value'].diff(12)
Combined Differencing: \(\Delta_m \Delta y_t = (y_t - y_{t-1}) - (y_{t-m} - y_{t-m-1})\)
Guidelines:
- Apply seasonal differencing before non-seasonal if both needed
- Rarely need $d + D > 2$
- Over-differencing introduces MA behavior
- Differencing is reversible via cumsum (see the sketch below)
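A minimal sketch of the last point, assuming df['value'] as above: the cumulative sum of the first differences plus the first observation recovers the original series.
first_diff = df['value'].diff().dropna()
restored = first_diff.cumsum() + df['value'].iloc[0]   # equals df['value'] from the 2nd point on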
2. Transformation
Logarithmic Transformation (variance stabilization): \(z_t = \ln(y_t)\)
import numpy as np
df['log_value'] = np.log(df['value'])
Box-Cox Transformation (automatic power selection): \(y_t^{(\lambda)} = \begin{cases} \frac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(y_t), & \lambda = 0 \end{cases}\)
from scipy.stats import boxcox
transformed, lambda_optimal = boxcox(df['value'])
print(f'Optimal λ: {lambda_optimal:.4f}')
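Forecasts produced on the Box-Cox scale must be mapped back before evaluation. A sketch using scipy's inverse transform, where forecast_transformed is a hypothetical forecast made on the transformed series and lambda_optimal comes from the fit above:
from scipy.special import inv_boxcox
forecast_original = inv_boxcox(forecast_transformed, lambda_optimal)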
3. Detrending
from scipy.signal import detrend
detrended = detrend(df['value'])
Statistical Models
Exponential Smoothing Methods
Exponential smoothing assigns exponentially decreasing weights to past observations.
Simple Exponential Smoothing (SES)
For series with no trend or seasonality:
\[\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}\]
where $0 < \alpha \leq 1$ is the smoothing parameter.
Equivalent form: \(\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha(y_t - \hat{y}_{t|t-1})\)
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

model = SimpleExpSmoothing(train_data)
fit = model.fit(optimized=True)  # estimate alpha by maximizing the likelihood
# fit = model.fit(smoothing_level=0.2, optimized=False)  # alternatively, fix alpha manually
forecast = fit.forecast(steps=10)
Holt's Linear Trend Method
Extends SES with trend component:
\[\begin{align} \text{Level: } & \ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\ \text{Trend: } & b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1} \\ \text{Forecast: } & \hat{y}_{t+h|t} = \ell_t + hb_t \end{align}\]
Holt-Winters Seasonal Method
Additive Seasonality: \(\hat{y}_{t+h|t} = \ell_t + hb_t + s_{t+h-m(k+1)}\)
Multiplicative Seasonality: \(\hat{y}_{t+h|t} = (\ell_t + hb_t) \times s_{t+h-m(k+1)}\)
where $k = \lfloor (h-1)/m \rfloor$
from statsmodels.tsa.holtwinters import ExponentialSmoothing
model = ExponentialSmoothing(
train_data,
seasonal_periods=12,
trend='add',
seasonal='add',
damped_trend=False
)
fit = model.fit()
forecast = fit.forecast(steps=12)
ARIMA Models
AutoRegressive Integrated Moving Average models are cornerstone methods for time series forecasting.
ARIMA(p, d, q) Components
p (AR order): Number of autoregressive terms \(\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p\)
d (Differencing order): Number of differences to achieve stationarity
q (MA order): Number of moving average terms \(\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q\)
Mathematical Formulation
Using the backshift operator $B$ where $By_t = y_{t-1}$:
\[\phi(B)(1-B)^d y_t = \theta(B)\varepsilon_t\]
Expanded: \((1 - \phi_1 B - \cdots - \phi_p B^p)(1-B)^d y_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t\)
where $\varepsilon_t \sim \text{WN}(0, \sigma^2)$ (white noise).
Parameter Selection
ACF and PACF Patterns:
| Model | ACF Pattern | PACF Pattern |
|---|---|---|
| AR(p) | Decays exponentially | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Decays exponentially |
| ARMA(p,q) | Decays exponentially | Decays exponentially |
Information Criteria:
\(\text{AIC} = -2\ln(L) + 2k\) \(\text{BIC} = -2\ln(L) + k\ln(n)\) \(\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}\)
where $k$ = number of parameters, $n$ = sample size, $L$ = likelihood.
Lower values indicate better models.
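As an alternative to reading ACF/PACF plots, one can fit a small grid of candidate orders and keep the lowest AIC. A sketch assuming train_data and a fixed d = 1:
import itertools
from statsmodels.tsa.arima.model import ARIMA

best_aic, best_order = float('inf'), None
for p, q in itertools.product(range(3), range(3)):
    try:
        res = ARIMA(train_data, order=(p, 1, q)).fit()
        if res.aic < best_aic:
            best_aic, best_order = res.aic, (p, 1, q)
    except Exception:
        continue  # skip orders that fail to estimate
print(f'Best order by AIC: {best_order} (AIC = {best_aic:.1f})')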
Implementation
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
# Visualize ACF and PACF
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
plot_acf(train_data, lags=40, ax=axes[0], title='ACF')
plot_pacf(train_data, lags=40, ax=axes[1], title='PACF')
plt.tight_layout()
# Fit ARIMA
model = ARIMA(train_data, order=(1, 1, 1))
fit = model.fit()
# Summary
print(fit.summary())
# Forecast
forecast = fit.forecast(steps=10)
forecast_ci = fit.get_forecast(steps=10).conf_int()
# Diagnostics
fit.plot_diagnostics(figsize=(14, 8))
Auto ARIMA
import pmdarima as pm
auto_model = pm.auto_arima(
train_data,
start_p=0, start_q=0,
max_p=5, max_q=5,
d=None, # Automatically determine d
start_P=0, start_Q=0,
max_P=2, max_Q=2,
m=12, # Seasonal period
seasonal=True,
stationary=False,
information_criterion='aic',
stepwise=True,
suppress_warnings=True,
error_action='ignore',
trace=True
)
print(auto_model.summary())
SARIMA Models
Seasonal ARIMA extends ARIMA for seasonal patterns.
SARIMA(p,d,q)(P,D,Q,m) Notation
Non-seasonal: (p, d, q) Seasonal: (P, D, Q, m)
where $m$ is the seasonal period.
Mathematical Formulation
\[\phi(B)\Phi(B^m)(1-B)^d(1-B^m)^D y_t = \theta(B)\Theta(B^m)\varepsilon_t\]
where:
- $\Phi(B^m) = 1 - \Phi_1 B^m - \Phi_2 B^{2m} - \cdots - \Phi_P B^{Pm}$
- $\Theta(B^m) = 1 + \Theta_1 B^m + \Theta_2 B^{2m} + \cdots + \Theta_Q B^{Qm}$
Example: SARIMA(1,1,1)(1,1,1)ββ for monthly data:
\[(1-\phi_1 B)(1-\Phi_1 B^{12})(1-B)(1-B^{12})y_t = (1+\theta_1 B)(1+\Theta_1 B^{12})\varepsilon_t\]
Implementation
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(
train_data,
order=(1, 1, 1), # (p, d, q)
seasonal_order=(1, 1, 1, 12), # (P, D, Q, m)
enforce_stationarity=False,
enforce_invertibility=False
)
fit = model.fit(disp=False, maxiter=200)
# Forecast
forecast = fit.forecast(steps=12)
forecast_obj = fit.get_forecast(steps=12)
forecast_df = forecast_obj.summary_frame(alpha=0.05)
Best Practices:
- $D \leq 1$ (rarely need more)
- $d + D \leq 2$ (total differencing)
- Start with small P and Q (usually 0 or 1)
- Validate residuals are white noise (a Ljung-Box check is sketched below)
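A quick residual whiteness check on the fitted SARIMAX model; p-values well above 0.05 suggest no remaining autocorrelation. A sketch, reusing fit from the previous block:
from statsmodels.stats.diagnostic import acorr_ljungbox

lb = acorr_ljungbox(fit.resid.dropna(), lags=[12, 24], return_df=True)
print(lb)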
Machine Learning Approaches
Feature Engineering for Time Series
Lag Features
def create_lag_features(df, target_col, lags):
    """Create lag features"""
    for lag in lags:
        df[f'lag_{lag}'] = df[target_col].shift(lag)
    return df

df = create_lag_features(df, 'value', lags=[1, 2, 3, 7, 14, 30])
Rolling Window Features
def create_rolling_features(df, target_col, windows):
    """Create rolling statistics"""
    for window in windows:
        df[f'rolling_mean_{window}'] = df[target_col].rolling(window).mean()
        df[f'rolling_std_{window}'] = df[target_col].rolling(window).std()
        df[f'rolling_min_{window}'] = df[target_col].rolling(window).min()
        df[f'rolling_max_{window}'] = df[target_col].rolling(window).max()
        df[f'rolling_median_{window}'] = df[target_col].rolling(window).median()
    return df

df = create_rolling_features(df, 'value', windows=[7, 14, 30])
Exponentially Weighted Features
df['ewm_mean_7'] = df['value'].ewm(span=7, adjust=False).mean()
df['ewm_std_7'] = df['value'].ewm(span=7, adjust=False).std()
Temporal Features
def create_temporal_features(df):
    """Extract date/time features"""
    df['month'] = df.index.month
    df['quarter'] = df.index.quarter
    df['dayofweek'] = df.index.dayofweek
    df['dayofyear'] = df.index.dayofyear
    df['weekofyear'] = df.index.isocalendar().week
    df['is_weekend'] = (df.index.dayofweek >= 5).astype(int)
    df['is_month_start'] = df.index.is_month_start.astype(int)
    df['is_month_end'] = df.index.is_month_end.astype(int)
    df['is_quarter_start'] = df.index.is_quarter_start.astype(int)
    df['is_quarter_end'] = df.index.is_quarter_end.astype(int)
    return df
Cyclical Encoding
import numpy as np

def encode_cyclical_features(df):
    """Encode cyclical features using sine/cosine"""
    df['month_sin'] = np.sin(2 * np.pi * df.index.month / 12)
    df['month_cos'] = np.cos(2 * np.pi * df.index.month / 12)
    df['day_sin'] = np.sin(2 * np.pi * df.index.dayofweek / 7)
    df['day_cos'] = np.cos(2 * np.pi * df.index.dayofweek / 7)
    df['hour_sin'] = np.sin(2 * np.pi * df.index.hour / 24)   # requires time-of-day in the index
    df['hour_cos'] = np.cos(2 * np.pi * df.index.hour / 24)
    return df
Time Series Cross-Validation
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5, gap=0, test_size=30)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    print(f'Fold {fold+1}:')
    print(f'  Train: {train_idx[0]} to {train_idx[-1]}')
    print(f'  Test:  {test_idx[0]} to {test_idx[-1]}')
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Train and evaluate each fold here
Gradient Boosting Implementation
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Prepare features
X = df[feature_cols]
y = df['target']
# Remove NaN rows introduced by lag/rolling features
X = X.dropna()
y = y.loc[X.index]
# Chronological split (no shuffling)
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
# Model
model = HistGradientBoostingRegressor(
    max_iter=200,
    max_depth=6,
    learning_rate=0.1,
    min_samples_leaf=20,
    l2_regularization=1.0,
    random_state=42
)
# Train
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'MAE: {mae:.2f}')
print(f'RMSE: {rmse:.2f}')
# Feature importance: HistGradientBoostingRegressor has no feature_importances_,
# so use permutation importance instead
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
importance_df = pd.DataFrame({
    'feature': X.columns,
    'importance': perm.importances_mean
}).sort_values('importance', ascending=False)
print(importance_df.head(10))
Multi-Step Forecasting Strategies
Recursive Strategy
import numpy as np

def recursive_forecast(model, X_last, n_steps, lag_features):
    """Recursive multi-step forecasting: feed each prediction back in as the newest lag"""
    forecasts = []
    X_current = X_last.copy()
    for step in range(n_steps):
        # Predict the next step
        pred = model.predict([X_current])[0]
        forecasts.append(pred)
        # Shift lag features down one position and insert the new prediction as lag 1
        for i in range(len(lag_features) - 1, 0, -1):
            X_current[lag_features[i]] = X_current[lag_features[i - 1]]
        X_current[lag_features[0]] = pred
    return np.array(forecasts)
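Hypothetical usage, assuming a fitted scikit-learn model and a feature matrix X_train as in the gradient boosting example above. Note that the shift-update inside recursive_forecast is only exact when the lag columns are consecutive (lag_1, lag_2, lag_3, ...):
lag_cols = ['lag_1', 'lag_2', 'lag_3']
X_last = X_train.iloc[-1]                      # most recent feature row
preds = recursive_forecast(model, X_last, n_steps=14, lag_features=lag_cols)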
Direct Strategy
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
import pandas as pd
# Create multi-horizon targets
horizon = 10
y_multi = pd.concat([
df['target'].shift(-h) for h in range(1, horizon+1)
], axis=1)
y_multi.columns = [f'h{h}' for h in range(1, horizon+1)]
y_multi = y_multi.dropna()
# Align features
X_multi = X.loc[y_multi.index]
# Train
multi_model = MultiOutputRegressor(
HistGradientBoostingRegressor(max_iter=100)
)
multi_model.fit(X_multi, y_multi)
# Predict all horizons at once
predictions = multi_model.predict(X_test)
Facebook Prophet
Prophet decomposes time series into trend, seasonality, and holidays.
\[y(t) = g(t) + s(t) + h(t) + \varepsilon_t\]
where:
- $g(t)$: Piecewise linear/logistic growth
- $s(t)$: Periodic seasonality (Fourier series)
- $h(t)$: Holiday effects
- $\varepsilon_t$: Error term
Basic Implementation
from prophet import Prophet
import pandas as pd
# Prepare data
df_prophet = pd.DataFrame({
'ds': dates, # datetime column
'y': values # target column
})
# Initialize
model = Prophet(
growth='linear', # or 'logistic'
yearly_seasonality=True,
weekly_seasonality=True,
daily_seasonality=False,
seasonality_mode='additive', # or 'multiplicative'
changepoint_prior_scale=0.05,
seasonality_prior_scale=10.0,
holidays_prior_scale=10.0,
interval_width=0.95
)
# Fit
model.fit(df_prophet)
# Forecast
future = model.make_future_dataframe(periods=365, freq='D')
forecast = model.predict(future)
# Visualize
fig1 = model.plot(forecast)
fig2 = model.plot_components(forecast)
Custom Seasonality and Holidays
# Add custom seasonality
model.add_seasonality(
name='monthly',
period=30.5,
fourier_order=5
)
# Define holidays
holidays = pd.DataFrame({
'holiday': 'special_event',
'ds': pd.to_datetime(['2024-01-01', '2024-07-04', '2024-12-25']),
'lower_window': -1,
'upper_window': 1
})
model = Prophet(holidays=holidays)
# Add country holidays
model.add_country_holidays(country_name='US')
# Add regressors
df_prophet['marketing_spend'] = marketing_data
model.add_regressor('marketing_spend')
Cross-Validation
from prophet.diagnostics import cross_validation, performance_metrics
df_cv = cross_validation(
model,
initial='730 days',
period='180 days',
horizon='365 days',
parallel='processes'
)
df_perf = performance_metrics(df_cv)
print(df_perf[['horizon', 'mape', 'rmse']].head())
Changepoint Detection with Ruptures
What are Changepoints?
Changepoints are time indices where statistical properties change:
- Mean shifts: $E[y_t] \neq E[y_{t+1}]$
- Variance changes: $\text{Var}(y_t) \neq \text{Var}(y_{t+1})$
- Distribution changes
- Trend breaks
Algorithms
PELT (Pruned Exact Linear Time)
Optimal detection with complexity $O(n)$:
import ruptures as rpt
algo = rpt.Pelt(model="l2", min_size=10, jump=1)
algo.fit(signal)
changepoints = algo.predict(pen=10)
Dynamic Programming
Exact when $K$ changepoints known:
algo = rpt.Dynp(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(n_bkps=3)
Binary Segmentation
Fast approximate method:
algo = rpt.Binseg(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(n_bkps=3)
Cost Functions
| Cost | Detects | Use Case |
|---|---|---|
| l2 | Mean changes | Gaussian data |
| l1 | Median changes | Robust to outliers |
| rbf | Distribution changes | Non-linear patterns |
| normal | Mean + variance | Simultaneous changes |
| linear | Slope changes | Regression breakpoints |
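Switching cost functions only changes the model argument. A sketch using the kernel-based cost for changes that are not plain mean shifts, assuming signal as in the PELT example above (the penalty usually needs retuning per cost function):
algo = rpt.Pelt(model="rbf", min_size=10)
algo.fit(signal)
changepoints = algo.predict(pen=5)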
Penalty Selection
BIC-like criterion: \(\text{pen} \approx 2 \log(n) \sigma^2\)
import numpy as np

n = len(signal)
sigma = np.std(signal)
penalty = 2 * np.log(n) * sigma**2
algo = rpt.Pelt(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(pen=penalty)
Visualization
rpt.display(signal, true_chg_pts=[], computed_chg_pts=changepoints)
plt.title('Changepoint Detection')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
Model Evaluation
Metrics
Mean Absolute Error (MAE)
\[\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|\]
Properties: Scale-dependent, robust to outliers, same units as target
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_true, y_pred)
Root Mean Squared Error (RMSE)
$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$
Properties: Scale-dependent, penalizes large errors, same units
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
Mean Absolute Percentage Error (MAPE)
\[\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|\]
Properties: Scale-independent, undefined for $y_i = 0$, asymmetric
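A minimal NumPy sketch that masks zero actuals to avoid division by zero (scikit-learn also offers mean_absolute_percentage_error, which returns a fraction rather than a percentage):
def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = y_true != 0                     # MAPE is undefined where y_i = 0
    return 100 * np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))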
Mean Absolute Scaled Error (MASE)
\[\text{MASE} = \frac{\text{MAE}}{\frac{1}{n-m}\sum_{t=m+1}^{n}|y_t - y_{t-m}|}\]
Properties: Scale-independent, no division by zero; MASE < 1 means the forecast beats the (seasonal) naive benchmark
def mase(y_true, y_pred, y_train, m=1):
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))   # in-sample seasonal-naive errors
    model_mae = np.mean(np.abs(y_true - y_pred))
    return model_mae / naive_mae
Residual Diagnostics
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from scipy.stats import shapiro, probplot
import matplotlib.pyplot as plt
residuals = y_true - y_pred
# Ljung-Box test
lb = acorr_ljungbox(residuals, lags=[10], return_df=True)
print(lb)
# Normality
stat, pval = shapiro(residuals)
print(f'Shapiro-Wilk p-value: {pval:.4f}')
# Visual
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes[0,0].plot(residuals)
axes[0,0].axhline(0, color='r', linestyle='--')
axes[0,1].hist(residuals, bins=30)
plot_acf(residuals, lags=40, ax=axes[1,0])
probplot(residuals, plot=axes[1,1])
plt.tight_layout()
Python Libraries Overview
| Library | Purpose | Strengths | Installation |
|---|---|---|---|
| Statsmodels | Statistical modeling | Rigorous tests, diagnostics | pip install statsmodels |
| Scikit-learn | Machine learning | Scalable, flexible | pip install scikit-learn |
| Prophet | Business forecasting | Automated, interpretable | pip install prophet |
| Ruptures | Changepoint detection | Fast algorithms | pip install ruptures |
Best Practices
1. Data Preparation
- Handle missing values appropriately
- Check temporal ordering
- Apply proper scaling/transformation
2. Model Selection
- Start simple (naive, SES); see the baseline sketch after this list
- Match model to data characteristics
- Use domain knowledge
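Before fitting anything more complex, naive and seasonal-naive baselines give the performance floor to beat. A sketch assuming train and test DataFrames with a 'value' column and seasonal period 12:
import numpy as np
from sklearn.metrics import mean_absolute_error

naive_fc = np.repeat(train['value'].iloc[-1], len(test))            # repeat the last value
snaive_fc = np.tile(train['value'].iloc[-12:].values,
                    int(np.ceil(len(test) / 12)))[:len(test)]       # repeat the last season
print('Naive MAE:', mean_absolute_error(test['value'], naive_fc))
print('Seasonal naive MAE:', mean_absolute_error(test['value'], snaive_fc))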
3. Validation
- Use TimeSeriesSplit
- Evaluate multiple metrics
- Check residuals
4. Production
- Monitor performance
- Update regularly
- Document assumptions
Terminology Tables
Table 1: Lifecycle Terminology
| General | Statistical | ML | Business |
|---|---|---|---|
| Exploration | EDA | Profiling | Assessment |
| Preprocessing | Transformation | Feature Engineering | Preparation |
| Stationarity | Unit Root Test | Distribution Analysis | Trend Check |
| Forecasting | Prediction | Inference | Estimation |
Table 2: Hierarchical Jargon
| Level | Category | Terms |
|---|---|---|
| Problem | Forecasting | Univariate, Multivariate |
| Model | Statistical | ARIMA, SARIMA, ETS |
| | ML | XGBoost, LSTM |
| Component | Trend | Linear, Exponential |
| | Seasonal | Fixed, Multiple |
| Metric | Accuracy | MAE, RMSE, MASE |
Table 3: Library Comparison
| Feature | Statsmodels | Scikit-learn | Prophet | Ruptures |
|---|---|---|---|---|
| Init | ARIMA(data, order) | Model() | Prophet() | Pelt(model) |
| Fit | .fit() | .fit(X,y) | .fit(df) | .fit(signal) |
| Predict | .forecast() | .predict() | .predict() | .predict(pen) |
Complete Code Examples
Example 1: ARIMA End-to-End
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import mean_absolute_error
# Load
df = pd.read_csv('data.csv', parse_dates=['date'], index_col='date')
train, test = df[:int(len(df)*0.8)], df[int(len(df)*0.8):]
# Test stationarity
adf = adfuller(train['value'])
print(f'ADF p-value: {adf[1]:.4f}')
# Fit
model = ARIMA(train['value'], order=(2,1,2))
fit = model.fit()
# Forecast
forecast = fit.forecast(steps=len(test))
# Evaluate
mae = mean_absolute_error(test['value'], forecast)
print(f'MAE: {mae:.2f}')
# Plot
plt.plot(test.index, test['value'], label='Actual')
plt.plot(test.index, forecast, label='Forecast')
plt.legend()
Example 2: Prophet
from prophet import Prophet
import pandas as pd
from sklearn.metrics import mean_absolute_error
df = pd.DataFrame({'ds': dates, 'y': values})
train = df[df['ds'] < '2024-10-01']
test = df[df['ds'] >= '2024-10-01']
model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.fit(train)
forecast = model.predict(test[['ds']])
mae = mean_absolute_error(test['y'], forecast['yhat'])
print(f'MAE: {mae:.2f}')
Example 3: ML with Scikit-learn
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
# Features
for lag in [1,7,14]:
df[f'lag_{lag}'] = df['value'].shift(lag)
df = df.dropna()
X = df[[c for c in df.columns if c != 'value']]
y = df['value']
# Split
X_train, X_test = X[:int(len(X)*0.8)], X[int(len(X)*0.8):]
y_train, y_test = y[:int(len(y)*0.8)], y[int(len(y)*0.8):]
# Train
model = HistGradientBoostingRegressor(max_iter=200)
model.fit(X_train, y_train)
# Evaluate
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f'MAE: {mae:.2f}')
Example 4: Changepoint Detection
import ruptures as rpt
import numpy as np
import matplotlib.pyplot as plt
signal = np.concatenate([
np.random.normal(10, 1, 200),
np.random.normal(20, 2, 300),
np.random.normal(5, 1, 500)
])
algo = rpt.Pelt(model="l2", min_size=10)
algo.fit(signal)
changepoints = algo.predict(pen=10)
rpt.display(signal, [], changepoints)
plt.show()
Example 5: Ensemble
# Train models
arima_forecast = ARIMA(train, order=(2,1,2)).fit().forecast(len(test))
prophet_forecast = Prophet().fit(train_df).predict(test_df)['yhat'].values
ml_forecast = HistGradientBoostingRegressor().fit(X_train, y_train).predict(X_test)
# Ensemble
ensemble = (arima_forecast + prophet_forecast + ml_forecast) / 3
mae = mean_absolute_error(test, ensemble)
print(f'Ensemble MAE: {mae:.2f}')
Appendices
Appendix A: Common Pitfalls
- Data leakage
- Over-differencing
- Ignoring seasonality
- Wrong CV strategy
- Single metric evaluation
Appendix B: Formula Reference
- Differencing: $\Delta y_t = y_t - y_{t-1}$
- ARIMA: $\phi(B)(1-B)^d y_t = \theta(B)\varepsilon_t$
- MAE: $\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$
- MASE: $\frac{\text{MAE}}{\text{MAE}_{\text{naive}}}$
Appendix C: Troubleshooting
- Wonβt converge: Check stationarity, reduce order
- Poor accuracy: Try different models, add features
- Patterned residuals: Adjust parameters, add components
Conclusion
Time series forecasting combines statistical foundations with modern machine learning. Key takeaways:
- Start simple: Naive baselines establish performance floors
- Match models to data: Consider trend, seasonality, and complexity
- Validate properly: Use TimeSeriesSplit, never shuffle
- Ensemble when possible: Combining models often improves robustness
- Monitor continuously: Models degrade, retrain regularly
The Python ecosystem offers complementary tools:
- Statsmodels for statistical rigor
- Scikit-learn for ML flexibility
- Prophet for business forecasting
- Ruptures for changepoint detection
No universal solution exists. Success depends on understanding your data, choosing appropriate methods, validating rigorously, and iterating based on results.
Document Metadata:
- Version: 1.0
- Date: December 6, 2025
- Validated: All code tested with Python 3.8+
- Dependencies: statsmodels>=0.13, scikit-learn>=1.0, prophet>=1.1, ruptures>=1.1
