Implementing a Rolling Median Filter for GPS Drift Removal
Implementing a rolling median filter for GPS drift removal is a deterministic, low-latency approach to cleaning noisy telemetry. By applying a sliding window over your coordinate stream and calculating the median latitude and longitude independently, you can suppress extreme outliers while preserving legitimate directional changes. Unlike a moving average, which smooths all deviations equally, the median inherently rejects GPS jumps caused by multipath reflections, urban canyon signal bounce, or temporary satellite lock loss. This makes it highly effective for fleet telematics pipelines where predictable latency and minimal memory overhead are mandatory.
Why Median Over Alternatives?
GPS receivers output positions with inherent stochastic noise. A simple moving average reduces variance but introduces phase lag and smears sharp turns. State-space estimators, such as the one covered in Kalman Filtering for GPS Noise Reduction, are optimal under linear-Gaussian assumptions but require tuning process and measurement noise covariances and maintaining state across batches.
The rolling median keeps no state beyond the samples in the current window. It requires no covariance matrices, has no warm-up period, and executes in O(N log W) time, where N is the number of samples and W is the window size. For batch preprocessing or edge devices with constrained compute, it provides a robust first-pass cleaner before heavier algorithms are applied. You can read more about foundational cleaning workflows in GPS Data Preprocessing & Cleaning Fundamentals.
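To make the contrast with a moving average concrete, here is a minimal sketch on a synthetic one-dimensional track. The values and the variable names (track, rolling_mean, rolling_median) are illustrative assumptions, not real telemetry.

```python
import pandas as pd

# Synthetic 1-D track: a steady value with one impulse outlier at index 3.
track = pd.Series([10.0, 10.0, 10.0, 100.0, 10.0, 10.0, 10.0])

# Same window for both filters so the comparison is fair.
window = 3
rolling_mean = track.rolling(window, min_periods=1, center=True).mean()
rolling_median = track.rolling(window, min_periods=1, center=True).median()

# The mean spreads the spike over every window that contains it;
# the median discards it because it is never the middle value.
print(rolling_mean.tolist())
print(rolling_median.tolist())
```

In this toy case the median output stays flat, while the mean is pulled upward at every position whose window overlaps the spike.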
Production-Ready Implementation
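Fleet telemetry rarely arrives perfectly aligned. Pings contain irregular intervals, dropped packets, and occasional NaN coordinates. The following function handles chronological sorting, applies the rolling median independently to latitude and longitude, and keeps the original value wherever the median would move a point by more than a configurable drift threshold. As a consequence, a spike that sits farther than the threshold from the local median passes through unchanged, so choose the threshold with that trade-off in mind.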
```python
import numpy as np
import pandas as pd


def gps_rolling_median_filter(
    df: pd.DataFrame,
    lat_col: str = "latitude",
    lon_col: str = "longitude",
    ts_col: str = "timestamp",
    window: int = 5,
    max_drift_meters: float = 150.0,
) -> pd.DataFrame:
    """
    Apply a rolling median filter to GPS coordinates to remove drift.
    Reverts to original values if the median shift exceeds max_drift_meters.
    """
    if df.empty:
        return df.copy()

    # Ensure chronological order and isolate working copy
    df = df.sort_values(ts_col).copy()

    # Compute rolling median (with min_periods=1, NaNs inside a window are skipped)
    # See official docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html
    lat_median = df[lat_col].rolling(window, min_periods=1, center=True).median()
    lon_median = df[lon_col].rolling(window, min_periods=1, center=True).median()

    # Vectorized Haversine displacement between raw and median positions (meters)
    R = 6371000.0  # Earth radius in meters
    dlat = np.radians(lat_median - df[lat_col])
    dlon = np.radians(lon_median - df[lon_col])
    a = (np.sin(dlat / 2) ** 2 +
         np.cos(np.radians(df[lat_col])) * np.cos(np.radians(lat_median)) *
         np.sin(dlon / 2) ** 2)
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    displacement_m = R * c

    # Apply filter only where displacement is within threshold
    # NaN displacements evaluate to False, preserving original NaNs
    mask = displacement_m <= max_drift_meters
    df.loc[mask, lat_col] = lat_median[mask]
    df.loc[mask, lon_col] = lon_median[mask]
    return df
```
Key Configuration Parameters
| Parameter | Recommendation | Impact |
|---|---|---|
| window | 3–9 samples | Smaller windows react faster to turns but pass more noise. Larger windows smooth aggressively but risk clipping legitimate route deviations. Match window size to your device’s sampling frequency (e.g., window=5 at 1 Hz covers ~5 seconds). |
| center=True | Batch processing | Aligns the median to the middle of the window, eliminating phase lag. For real-time streaming pipelines, switch to center=False and accept a one-way delay of (window-1)/2 samples. |
| min_periods=1 | Always enabled | Prevents leading/trailing rows from dropping to NaN when the full window isn’t available. |
| max_drift_meters | 50–200 m | Acts as a safety clamp. Legitimate highway curves rarely exceed 10–20 m displacement over a 3–5 sample window. Setting this threshold prevents the filter from “snapping” coordinates during valid high-speed maneuvers. |
Handling Real-World Telemetry Edge Cases
Irregular Timestamps
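With an integer window, pandas rolling operations count rows, not elapsed time. If your device samples at variable rates (e.g., 0.5 Hz to 2 Hz), resample to a fixed frequency first, for example with df.set_index(ts_col)[[lat_col, lon_col]].resample("1s").mean(), before applying the median, so that every window spans a consistent time interval. This assumes ts_col holds datetime values; the lowercase "1s" alias avoids the deprecation warning that recent pandas versions raise for "1S".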
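As a minimal sketch of the resampling step, assume a small frame of hypothetical pings with uneven spacing. The column names match the function defaults; the coordinates and timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical pings with uneven spacing and a gap around the 2-second mark.
pings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:00.0",
        "2024-01-01 00:00:00.4",
        "2024-01-01 00:00:01.2",
        "2024-01-01 00:00:03.1",
    ]),
    "latitude": [52.0000, 52.0002, 52.0004, 52.0010],
    "longitude": [13.0000, 13.0002, 13.0004, 13.0010],
})

# Bin onto a fixed 1-second grid; only the numeric coordinate columns are averaged.
regular = (
    pings.set_index("timestamp")[["latitude", "longitude"]]
    .resample("1s")
    .mean()
)

# Seconds with no ping become NaN rows rather than disappearing.
print(regular)
```

Selecting the coordinate columns before resampling keeps non-numeric columns (device IDs, status strings) out of the mean. Empty bins show up as NaN rows, which the rolling median then skips.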
NaN Propagation
The Haversine calculation produces NaN when either original or median coordinates are missing. The boolean mask displacement_m <= max_drift_meters safely evaluates to False for NaN values, leaving original coordinates untouched. If you require strict gap-filling, chain df.interpolate(method="linear", limit=3) before median filtering.
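Both behaviors described above can be verified in isolation. This is a small sketch with made-up numbers; displacement_m and max_drift_meters mirror the names in the function, while the latitude series is hypothetical.

```python
import numpy as np
import pandas as pd

max_drift_meters = 150.0

# Hypothetical displacements in meters; the NaN stands for a missing fix.
displacement_m = pd.Series([5.0, np.nan, 300.0])

# Comparisons against NaN are False, so the missing row is never overwritten.
mask = displacement_m <= max_drift_meters
print(mask.tolist())

# Optional gap-filling before filtering: bridge at most 3 consecutive NaNs.
latitude = pd.Series([52.0, np.nan, np.nan, 52.3])
filled = latitude.interpolate(method="linear", limit=3)
print(filled.tolist())
```

The limit argument matters for long outages: gaps longer than the limit are only partially filled, so extended signal loss still shows up as NaN instead of a straight interpolated line.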
Memory & Throughput
For datasets exceeding 10M rows, pandas rolling operations remain highly optimized via Cython. If you’re processing raw 1D arrays without timestamps, scipy.signal.medfilt() offers a lighter footprint, but it always zero-pads the array edges. When you need control over boundary handling (mode="nearest" vs mode="constant"), use scipy.ndimage.median_filter() instead; refer to the SciPy documentation for the available modes.
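The edge behavior of the two SciPy routines is easiest to see side by side. The sketch below assumes SciPy is installed and uses an arbitrary five-element array; scipy.signal.medfilt pads with zeros, while scipy.ndimage.median_filter with mode="nearest" repeats the edge value.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import medfilt

# Arbitrary 1-D samples with one spike in the middle.
x = np.array([10.0, 12.0, 50.0, 11.0, 13.0])

# medfilt treats positions beyond the array as 0.
zero_padded = medfilt(x, kernel_size=3)

# median_filter with mode="nearest" repeats the first/last sample instead.
edge_repeated = median_filter(x, size=3, mode="nearest")

print(zero_padded.tolist())
print(edge_repeated.tolist())
```

Both calls remove the spike, but they disagree at the last element because of the padding. For coordinate values far from zero, repeating the edge value is usually the less surprising choice.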
Coordinate Precision
GPS receivers typically output 6–8 decimal places. The median filter preserves this precision. Avoid rounding before filtering; rounding introduces quantization noise that can artificially inflate displacement calculations.
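To get a feel for how much rounding costs, the following sketch converts one unit in the last decimal place of a latitude into meters. It assumes the same spherical Earth radius as the filter's Haversine step; lat_step_meters is a hypothetical helper, not part of any library.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical Earth radius, as in the Haversine step


def lat_step_meters(decimals: int) -> float:
    """North-south distance covered by one unit in the last decimal place."""
    return EARTH_RADIUS_M * math.radians(10.0 ** -decimals)


# Print the ground distance represented by each level of precision.
for decimals in (4, 5, 6, 7):
    print(decimals, round(lat_step_meters(decimals), 4))
```

Under this assumption, four decimal places correspond to roughly 11 m per step and six decimal places to roughly 0.11 m, so rounding to four places can add error on the same scale as the drift the filter is meant to remove.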
When to Upgrade to State-Space Models
The rolling median is a non-parametric, outlier-resistant cleaner. It excels at removing impulse noise and multipath spikes but does not model velocity, acceleration, or heading continuity. Transition to recursive Bayesian estimators when your pipeline requires:
- Predictive tracking during prolonged signal loss (>3 seconds)
- Sensor fusion (IMU + GNSS + wheel speed)
- Probabilistic uncertainty bounds for downstream routing algorithms
Even then, the median filter remains an excellent preprocessing step to sanitize raw inputs before feeding them into state-space models, reducing filter divergence and tuning complexity.