Tuning DBSCAN eps and min_samples for Delivery Truck Stops

Start with eps equivalent to roughly 50–150 meters, which is about 7.8e-6 to 2.4e-5 radians, and min_samples between 3 and 6 points, then refine using a K-distance graph and dwell-time validation. Always use metric='haversine' with coordinates in radians so that distances follow Earth’s curvature. Be careful with units: values such as 0.0005–0.0015 match 50–150 m only as degrees of latitude; 0.001 radians is about 6.4 km. This approach anchors spatial clustering to real-world GPS accuracy, vehicle dwell behavior, and route density, forming the backbone of reliable stop detection and dwell-time analytics.

Map Parameters to Telematics Reality

Delivery trucks stream high-frequency GPS pings (typically 1 per 5–30 seconds). Raw trajectories contain spatial noise from urban canyons, multipath errors, and intersection idling. DBSCAN’s density-based approach does not assume spherical clusters or a fixed cluster count, which makes it well suited to fleet stop clustering. However, untuned parameters either fragment genuine stops into several micro-stops or merge distinct route segments into one.

  • eps (neighborhood radius): Set to GPS horizontal accuracy + spatial spread of the stop. Commercial telematics report 5–15 m accuracy under clear skies, but urban drift can exceed 30 m. A 50–100 m radius safely captures loading docks or parking bays while excluding adjacent travel lanes. Convert meters to radians: meters / 6371000 (Earth’s mean radius).
  • min_samples (core point threshold): Equals the minimum pings required to distinguish a true stop from traffic. At 1 ping/10s, min_samples=4 corresponds to roughly 30–40 seconds stationary (four pings span three 10-second intervals). Align this with your operational definition (e.g., >2 min for deliveries, >15 min for mandated breaks).
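The two conversions above can be sketched in a few lines. The helper names meters_to_radians and min_samples_for_dwell are illustrative, not part of any library, and the min_samples helper assumes a fixed reporting interval with no dropped pings.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, as used in the text


def meters_to_radians(meters: float) -> float:
    """Convert a ground distance in meters to a haversine angle in radians."""
    return meters / EARTH_RADIUS_M


def min_samples_for_dwell(min_dwell_s: float, ping_interval_s: float) -> int:
    """Pings expected during min_dwell_s at a fixed interval (first ping counts).

    Assumption: regular reporting with no dropped pings.
    """
    return math.floor(min_dwell_s / ping_interval_s) + 1


for m in (50, 100, 150):
    print(f"{m} m -> eps = {meters_to_radians(m):.2e} rad")

# Four pings span 30 s at 1 ping per 10 s
print(min_samples_for_dwell(30, 10))
```

In practice pings are dropped or delayed, so a value derived this way is a starting point that the dwell-time validation should confirm.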

Systematic Tuning Workflow

  1. Velocity pre-filter: Drop points where speed > 5 km/h. This removes highway cruising and cuts computational overhead before clustering.
  2. Convert to radians: DBSCAN with metric='haversine' expects [lat, lon] in radians. Swapping axes or skipping conversion silently corrupts distances. See the scikit-learn DBSCAN documentation for strict metric requirements.
  3. Generate K-distance plot: Compute the distance to the k-th nearest neighbor for all points, sort descending, and plot. The “elbow” (inflection point) indicates a stable eps. For the plot, set k = min_samples.
  4. Iterate & validate: Run DBSCAN, extract centroids, calculate dwell times, and cross-reference with ground truth (driver logs, geofenced depots, or customer POIs). Adjust eps ±10–20% and min_samples ±1 until false positives (traffic signals) and false negatives (short curbside drops) stabilize.
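Step 4 can be approximated with a small parameter sweep. The sketch below is one possible way to do it, not a prescribed method: sweep_dbscan is a hypothetical helper, the coordinates are synthetic, and cluster and noise counts are only a proxy until they are compared against ground truth.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def sweep_dbscan(coords_deg, eps_m_values, min_samples_values):
    """Run DBSCAN over a small grid and report cluster and noise counts."""
    coords_rad = np.radians(coords_deg)  # columns must be [lat, lon]
    rows = []
    for eps_m in eps_m_values:
        for ms in min_samples_values:
            labels = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=ms,
                            metric='haversine').fit_predict(coords_rad)
            n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
            rows.append({'eps_m': eps_m, 'min_samples': ms,
                         'n_clusters': n_clusters,
                         'n_noise': int(np.sum(labels == -1))})
    return rows


# Synthetic demo (assumed data): two stops about 1 km apart,
# each a 5x5 grid of pings spread over roughly 20 m.
offsets = np.linspace(-1e-4, 1e-4, 5)
grid = np.array([(dlat, dlon) for dlat in offsets for dlon in offsets])
stop_a = np.array([40.7000, -74.0000]) + grid
stop_b = np.array([40.7090, -74.0000]) + grid
coords = np.vstack([stop_a, stop_b])

rows = sweep_dbscan(coords, eps_m_values=(80, 100, 120),
                    min_samples_values=(3, 4, 5))
for row in rows:
    print(row)
```

A plateau where the cluster count stays constant across neighboring eps and min_samples values suggests the parameters are in a stable region.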

Production-Ready Tuning Script

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters

def tune_dbscan_stops(df, lat_col='lat', lon_col='lon',
                      speed_col='speed_kmh', speed_threshold=5.0,
                      eps_m=100.0, min_samples=4, k_neighbors=4):
    """
    Pre-filter by speed, plot the K-distance curve, and run a baseline DBSCAN.
    Returns the filtered DataFrame (coordinates unchanged, in degrees) with a
    'cluster' column, and the K-distances in radians sorted descending.
    """
    # 1. Filter by speed to isolate stationary/slow-moving points
    df_filtered = df[df[speed_col] <= speed_threshold].copy()
    if len(df_filtered) == 0:
        raise ValueError("No points below speed threshold.")

    # 2. Convert to radians [lat, lon] for Haversine metric
    coords_rad = np.radians(df_filtered[[lat_col, lon_col]].values)
    eps_rad = eps_m / EARTH_RADIUS_M  # e.g. 100 m is about 1.57e-5 rad

    # 3. K-distance plot for eps selection. Querying the fitted points returns
    #    each point as its own first neighbor, which matches DBSCAN's
    #    min_samples (it also counts the point itself).
    nbrs = NearestNeighbors(n_neighbors=k_neighbors, metric='haversine')
    nbrs.fit(coords_rad)
    distances, _ = nbrs.kneighbors(coords_rad)
    k_distances = np.sort(distances[:, -1])[::-1]  # k-th neighbor, descending

    plt.figure(figsize=(8, 4))
    plt.plot(k_distances, marker='.', linestyle='-', markersize=4)
    plt.axhline(y=eps_rad, color='r', linestyle='--',
                label=f'eps={eps_rad:.2e} rad (~{eps_m:.0f} m)')
    plt.xlabel('Points (sorted by distance)')
    plt.ylabel(f'{k_neighbors}-th Nearest Neighbor Distance (radians)')
    plt.title('K-Distance Plot for DBSCAN eps Tuning')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

    # 4. Run DBSCAN with baseline parameters (eps in radians)
    db = DBSCAN(eps=eps_rad, min_samples=min_samples, metric='haversine')
    labels = db.fit_predict(coords_rad)
    df_filtered['cluster'] = labels  # -1 marks noise points

    return df_filtered, k_distances

# Usage example:
# df_stops, k_dist = tune_dbscan_stops(telematics_df, eps_m=100, min_samples=4, k_neighbors=4)

Validation & Operational Edge Cases

  • False Positives (Traffic Signals/Intersections): Increase min_samples, or split clusters wherever consecutive pings are more than 60 s apart so that repeated passes through the same intersection are not pooled into one apparent stop.
  • False Negatives (Short Curbside Drops): Lower eps to about 8.6e-6 radians (~55 m) and reduce min_samples to 2–3, but enforce a minimum dwell duration post-clustering to filter GPS jitter.
  • GPS Drift Compensation: Apply a rolling median or lightweight Kalman filter before clustering. Raw telematics often jump 10–20m even when stationary.
  • Temporal Gaps: DBSCAN ignores time. If a truck leaves a depot, drives for 3 hours, and returns, spatial proximity alone will merge them. Always pair spatial clustering with a temporal break threshold (e.g., time_gap > 15 minutes splits clusters).
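The temporal break threshold can be sketched with pandas. This is a minimal illustration, assuming a DataFrame with a 'cluster' column from DBSCAN and a datetime 'timestamp' column; the function name split_clusters_by_time and the 'visit' column are hypothetical.

```python
import pandas as pd


def split_clusters_by_time(df, cluster_col='cluster', time_col='timestamp',
                           max_gap=pd.Timedelta(minutes=15)):
    """Add a 'visit' id that breaks each spatial cluster at large time gaps."""
    # Drop noise points (label -1) and order pings within each cluster
    out = df[df[cluster_col] != -1].sort_values([cluster_col, time_col]).copy()
    gap = out.groupby(cluster_col)[time_col].diff()
    # A new visit starts at the first ping of a cluster (gap is NaT)
    # or whenever the gap to the previous ping exceeds max_gap
    new_visit = gap.isna() | (gap > max_gap)
    out['visit'] = new_visit.cumsum() - 1
    return out


# Assumed example: a truck sits at a depot in the morning, leaves,
# and returns to the same spatial cluster about 3.5 hours later.
demo = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 08:00', '2024-01-01 08:01', '2024-01-01 08:02',
        '2024-01-01 11:30', '2024-01-01 11:31',
    ]),
    'cluster': [0, 0, 0, 0, 0],
})
visits = split_clusters_by_time(demo)
print(visits[['timestamp', 'cluster', 'visit']])
```

Dwell times are then computed per visit rather than per spatial cluster, so the morning and midday depot stays are reported separately.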

For formal GPS accuracy calibration, reference the FAA GNSS / GPS reference or NMEA 0183 specifications when mapping hardware-reported HDOP/VDOP values to your eps baseline.

Scaling & Post-Clustering Dwell Calculation

Once eps and min_samples stabilize, compute dwell times using timestamp deltas rather than point counts. Group by cluster, sort by timestamp, and calculate max(timestamp) - min(timestamp). Filter out clusters below your operational minimum (e.g., <2 minutes) and merge adjacent clusters separated by <5 minutes of driving time. For fleets exceeding 10,000 daily pings, pass algorithm='ball_tree' to NearestNeighbors and DBSCAN, or query a BallTree directly. BallTree supports metric='haversine', whereas KDTree does not, and tree-based lookups avoid the O(n²) cost of brute-force pairwise distances. Always persist the final parameter set alongside route metadata to enable automated re-tuning when hardware upgrades or seasonal traffic patterns shift spatial density.
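The dwell calculation described above might look like the following. It is a sketch under assumptions: the column names 'cluster', 'timestamp', 'lat', and 'lon' are placeholders, dwell_times is a hypothetical helper, and the centroid is a plain mean of coordinates, which is adequate only for clusters spanning tens of meters.

```python
import pandas as pd


def dwell_times(df, group_col='cluster', time_col='timestamp',
                min_dwell=pd.Timedelta(minutes=2)):
    """Summarize each stop by arrival, departure, centroid, and dwell time."""
    stops = df[df[group_col] != -1]  # drop DBSCAN noise points
    summary = stops.groupby(group_col).agg(
        arrival=(time_col, 'min'),
        departure=(time_col, 'max'),
        lat=('lat', 'mean'),
        lon=('lon', 'mean'),
        n_pings=(time_col, 'count'),
    )
    summary['dwell'] = summary['departure'] - summary['arrival']
    # Keep only stops that meet the operational minimum
    return summary[summary['dwell'] >= min_dwell]


# Assumed example: one 5-minute stop, one 1-minute pause, one noise ping
demo = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 08:00', '2024-01-01 08:03', '2024-01-01 08:05',
        '2024-01-01 09:00', '2024-01-01 09:01',
        '2024-01-01 09:30',
    ]),
    'lat': [40.7000, 40.7001, 40.7000, 40.7100, 40.7100, 40.7500],
    'lon': [-74.0000, -74.0000, -74.0001, -74.0100, -74.0101, -74.0500],
    'cluster': [0, 0, 0, 1, 1, -1],
})
result = dwell_times(demo)
print(result[['arrival', 'departure', 'n_pings', 'dwell']])
```

If clusters have already been split by a temporal break threshold, passing that finer grouping column as group_col yields one dwell time per visit instead of per location.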