Implementing a Rolling Median Filter for GPS Drift Removal

Implementing a rolling median filter for GPS drift removal is a deterministic, low-latency approach to cleaning noisy telemetry. By applying a sliding window over your coordinate stream and calculating the median latitude and longitude independently, you can suppress extreme outliers while preserving legitimate directional changes. Unlike a moving average, which smooths all deviations equally, the median inherently rejects GPS jumps caused by multipath reflections, urban canyon signal bounce, or temporary satellite lock loss. This makes it highly effective for fleet telematics pipelines where predictable latency and minimal memory overhead are mandatory.

Why Median Over Alternatives?

GPS receivers output positions with inherent stochastic noise. A simple moving average reduces variance but introduces phase lag and smears sharp turns. State-space estimators, such as the one covered in Kalman Filtering for GPS Noise Reduction, offer optimal tracking under linear-Gaussian assumptions, but they require tuning process and measurement noise covariances and maintaining state across batches.
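
To make the contrast concrete, here is a minimal sketch that runs a centered rolling mean and a centered rolling median over the same track. The latitude values are synthetic and the window of 3 is chosen only for illustration.

```python
import pandas as pd

# Synthetic latitude track: steady northward motion with one multipath-style spike.
lat = pd.Series([52.0000, 52.0001, 52.0002, 52.0500, 52.0004, 52.0005, 52.0006])

window = 3
mean_smoothed = lat.rolling(window, center=True, min_periods=1).mean()
median_smoothed = lat.rolling(window, center=True, min_periods=1).median()

# Side-by-side view: the mean spreads the spike over neighboring rows,
# while the median discards it as long as it is a minority within the window.
comparison = pd.DataFrame(
    {"raw": lat, "mean": mean_smoothed, "median": median_smoothed}
)
print(comparison)
```

In this toy track the mean pulls three rows more than 0.01° off course, while every median value stays on the underlying path.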

The rolling median keeps no state beyond the current window. It requires no covariance matrices, needs no warm-up period, and executes in O(N log W) time for N samples, where W is the window size. For batch preprocessing or edge devices with constrained compute, it provides a robust first-pass cleaner before heavier algorithms are applied. You can read more about foundational cleaning workflows in GPS Data Preprocessing & Cleaning Fundamentals.

Production-Ready Implementation

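Fleet telemetry rarely arrives perfectly aligned. Pings arrive at irregular intervals, packets get dropped, and coordinates are occasionally NaN. The following function handles chronological sorting, applies the rolling median independently to latitude and longitude, and reverts to original values when the median shift exceeds a configurable drift threshold. Note the consequence of that last step: a fix that sits farther than the threshold from its windowed median passes through unchanged, so set the threshold above the largest jump you want the filter to correct.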

import numpy as np
import pandas as pd

def gps_rolling_median_filter(
    df: pd.DataFrame,
    lat_col: str = "latitude",
    lon_col: str = "longitude",
    ts_col: str = "timestamp",
    window: int = 5,
    max_drift_meters: float = 150.0
) -> pd.DataFrame:
    """
    Apply a rolling median filter to GPS coordinates to remove drift.
    Reverts to original values if the median shift exceeds max_drift_meters.
    """
    if df.empty:
        return df.copy()

    # Ensure chronological order and isolate working copy
    df = df.sort_values(ts_col).copy()

    # Compute rolling median (pandas ignores NaNs by default)
    # See official docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html
    lat_median = df[lat_col].rolling(window, min_periods=1, center=True).median()
    lon_median = df[lon_col].rolling(window, min_periods=1, center=True).median()

    # Vectorized Haversine displacement calculation (meters)
    R = 6371000.0  # Earth radius in meters
    dlat = np.radians(lat_median - df[lat_col])
    dlon = np.radians(lon_median - df[lon_col])
    a = (np.sin(dlat / 2)**2 +
         np.cos(np.radians(df[lat_col])) * np.cos(np.radians(lat_median)) *
         np.sin(dlon / 2)**2)
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    displacement_m = R * c

    # Apply filter only where displacement is within threshold
    # NaN displacements evaluate to False, preserving original NaNs
    mask = displacement_m <= max_drift_meters
    df.loc[mask, lat_col] = lat_median[mask]
    df.loc[mask, lon_col] = lon_median[mask]

    return df
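
The displacement check can be examined in isolation. The sketch below factors the same Haversine formula into a standalone helper (the name haversine_m and the sample track are hypothetical, not part of the function above) and shows which rows a 150 m threshold would allow the median to overwrite.

```python
import numpy as np
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters; inputs in degrees (scalars or Series)."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    # Guard against rounding pushing `a` marginally above 1 before the sqrt.
    return 2 * radius_m * np.arcsin(np.sqrt(np.minimum(a, 1.0)))

# Hypothetical track: near-stationary vehicle with one ~1.1 km spike at row 3.
lat = pd.Series([48.10000, 48.10001, 48.10002, 48.11000, 48.10001, 48.10000, 48.10002])
lon = pd.Series([11.50000] * 7)

lat_med = lat.rolling(5, min_periods=1, center=True).median()
lon_med = lon.rolling(5, min_periods=1, center=True).median()

displacement_m = haversine_m(lat, lon, lat_med, lon_med)
within_threshold = displacement_m <= 150.0  # rows the median may overwrite

print(pd.DataFrame({"displacement_m": displacement_m.round(1),
                    "replaced": within_threshold}))
```

With these values the jitter rows are replaced by their medians, while the spike at row 3 lies roughly 1.1 km from its median and is therefore left as-is. If spikes of that size should be corrected, the threshold has to be raised accordingly.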

Key Configuration Parameters

| Parameter | Recommendation | Impact |
| --- | --- | --- |
| window | 3–9 samples | Smaller windows react faster to turns but pass more noise. Larger windows smooth aggressively but risk clipping legitimate route deviations. Match window size to your device’s sampling frequency (e.g., window=5 at 1 Hz covers ~5 seconds). |
| center=True | Batch processing | Aligns the median to the middle of the window, eliminating phase lag. For real-time streaming pipelines, switch to center=False and accept a one-way delay of (window-1)/2 samples. |
| min_periods=1 | Always enabled | Prevents leading/trailing rows from dropping to NaN when the full window isn’t available. |
| max_drift_meters | 50–200 m | Acts as a safety clamp. Legitimate highway curves rarely exceed 10–20 m displacement over a 3–5 sample window. Setting this threshold prevents the filter from “snapping” coordinates during valid high-speed maneuvers. |
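
For the streaming case (the center=False behavior), a trailing window can also be kept by hand. The following is a minimal sketch using only the standard library; the class name StreamingMedianFilter is hypothetical, and it assumes every incoming fix is a valid, non-NaN coordinate pair.

```python
from collections import deque
from statistics import median
from typing import Deque, Tuple

class StreamingMedianFilter:
    """Trailing-window median over a single (lat, lon) stream."""

    def __init__(self, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        # deque(maxlen=...) drops the oldest fix automatically once full.
        self._lat: Deque[float] = deque(maxlen=window)
        self._lon: Deque[float] = deque(maxlen=window)

    def update(self, lat: float, lon: float) -> Tuple[float, float]:
        """Add one fix and return the median of the fixes seen in the window."""
        self._lat.append(lat)
        self._lon.append(lon)
        return median(self._lat), median(self._lon)

flt = StreamingMedianFilter(window=3)
fixes = [
    (40.0000, -75.0000),
    (40.0001, -75.0001),
    (40.0500, -75.0002),  # latitude spike
    (40.0003, -75.0003),
    (40.0004, -75.0004),
]
smoothed = [flt.update(lat, lon) for lat, lon in fixes]
for raw, out in zip(fixes, smoothed):
    print(raw, "->", out)
```

On the steadily changing longitude, the output for the fifth fix equals the fourth raw value, which is the (window-1)/2 = 1 sample delay described above; the latitude spike never reaches the output.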

Handling Real-World Telemetry Edge Cases

Irregular Timestamps

With an integer window, rolling operations in pandas count rows, not elapsed time. If your device samples at variable rates (e.g., 0.5 Hz to 2 Hz), resample to a fixed frequency first using df.set_index(ts_col).resample("1s").mean() before applying the median (recent pandas releases deprecate the uppercase "1S" alias). This ensures each window covers a consistent time span; seconds with no pings become NaN rows.
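
A minimal sketch of that resampling step, assuming the default column names used by the function above and a handful of made-up pings:

```python
import pandas as pd

# Made-up pings: two fixes inside the first second, then a gap at 00:00:02.
pings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:00.0", "2024-01-01 00:00:00.5",
        "2024-01-01 00:00:01.0", "2024-01-01 00:00:03.0",
        "2024-01-01 00:00:04.0",
    ]),
    "latitude": [52.0000, 52.0002, 52.0004, 52.0010, 52.0012],
    "longitude": [13.0000, 13.0002, 13.0004, 13.0010, 13.0012],
})

# Average all fixes that fall into the same one-second bin.
regular = (
    pings.set_index("timestamp")[["latitude", "longitude"]]
    .resample("1s")
    .mean()
)
print(regular)
```

The two fixes in the first second are averaged into one row, and the missing second shows up as a NaN row rather than being silently skipped, so the rolling window afterwards spans a fixed number of seconds.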

NaN Propagation

The Haversine calculation produces NaN when either original or median coordinates are missing. The boolean mask displacement_m <= max_drift_meters safely evaluates to False for NaN values, leaving original coordinates untouched. If you require strict gap-filling, chain df.interpolate(method="linear", limit=3) before median filtering.
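
Both behaviors can be checked on toy data. This sketch uses invented values; the displacement numbers are placeholders rather than output from the filter.

```python
import numpy as np
import pandas as pd

# Comparisons against NaN are False, so such rows never enter the mask.
displacement_m = pd.Series([2.0, np.nan, 500.0])
mask = displacement_m <= 150.0
print(mask.tolist())

# Optional gap-filling: linear interpolation across short gaps.
lat = pd.Series([51.5000, 51.5001, np.nan, np.nan, 51.5004, 51.5005])
filled = lat.interpolate(method="linear", limit=3)
print(filled.tolist())

# Caveat: limit caps how many consecutive NaNs are filled, so a longer gap
# is only partially bridged rather than skipped entirely.
long_gap = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0])
partially_filled = long_gap.interpolate(method="linear", limit=3)
print(partially_filled.tolist())
```

If partially filled gaps are undesirable, one option is to identify long gaps first and leave them entirely as NaN before interpolating the rest.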

Memory & Throughput

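For datasets exceeding 10M rows, pandas rolling operations remain highly optimized via Cython. If you’re processing raw 1D arrays without timestamps, scipy.signal.medfilt() offers a lighter footprint, but it always zero-pads the array edges and has no boundary option. When edge behavior matters, use scipy.ndimage.median_filter() instead and refer to the SciPy documentation for its boundary handling options (mode="nearest" vs mode="constant"). Neither function skips NaN values the way pandas rolling does.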
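
The boundary difference between the two SciPy functions shows up on a small array. This is a sketch with toy values (not real coordinates), assuming SciPy is installed:

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import medfilt

# Toy 1D signal with a single spike at index 3.
signal = np.array([10.0, 11.0, 12.0, 50.0, 14.0, 15.0, 16.0])

# medfilt pads both ends with zeros before taking the median.
zero_padded = medfilt(signal, kernel_size=3)

# median_filter with mode="nearest" repeats the edge value instead.
edge_replicated = median_filter(signal, size=3, mode="nearest")

print(zero_padded)
print(edge_replicated)
```

Both calls remove the spike, but the last sample differs: zero padding pulls it down to 15.0, while edge replication keeps 16.0. For coordinates far from zero, the edge-replicating mode is usually the safer choice.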

Coordinate Precision

GPS receivers typically output 6–8 decimal places. The median filter preserves this precision. Avoid rounding before filtering; rounding introduces quantization noise that can artificially inflate displacement calculations.
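
To get a feel for the scale of that quantization, the sketch below converts rounding precision into a worst-case north-south error. It assumes the same spherical Earth radius as the filter; the helper name max_rounding_error_m is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0
# Length of one degree of latitude on the spherical model, about 111.2 km.
METERS_PER_DEG_LAT = EARTH_RADIUS_M * math.pi / 180.0

def max_rounding_error_m(decimals: int) -> float:
    """Worst-case north-south error from rounding latitude to `decimals` places."""
    half_step_deg = 0.5 * 10.0 ** (-decimals)
    return half_step_deg * METERS_PER_DEG_LAT

for decimals in (3, 4, 5, 6):
    print(decimals, round(max_rounding_error_m(decimals), 3))
```

Rounding to four decimal places can move a fix by up to roughly 5.6 m north-south, which is already a noticeable share of the 10–20 m displacements discussed earlier; at six places the error drops below 0.1 m.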

When to Upgrade to State-Space Models

The rolling median is a non-parametric, outlier-resistant cleaner. It excels at removing impulse noise and multipath spikes but does not model velocity, acceleration, or heading continuity. Transition to recursive Bayesian estimators when your pipeline requires:

  • Predictive tracking during prolonged signal loss (>3 seconds)
  • Sensor fusion (IMU + GNSS + wheel speed)
  • Probabilistic uncertainty bounds for downstream routing algorithms

Even then, the median filter remains an excellent preprocessing step: sanitizing raw inputs before they reach a state-space model reduces filter divergence and tuning complexity.