Tuning DBSCAN eps and min_samples for Delivery Truck Stops
Start with eps between roughly 7.8e-6 and 2.4e-5 radians (~50–150 meters, computed as meters / 6,371,000) and min_samples between 3 and 6 points, then refine using a K-distance graph and dwell-time validation. Always use metric='haversine' with coordinates in radians to account for Earth’s curvature. Do not reuse degree-based values: 0.001 is about 111 m when measured in degrees but about 6.4 km when measured in radians. This approach anchors spatial clustering to real-world GPS accuracy, vehicle dwell behavior, and route density, forming the backbone of reliable Stop Detection & Dwell Time Analytics.
Map Parameters to Telematics Reality
Delivery trucks stream high-frequency GPS pings (typically 1 per 5–30 seconds). Raw trajectories contain spatial noise from urban canyons, multipath errors, and intersection idling. DBSCAN’s density-based approach does not assume spherical clusters, which makes it a natural fit for fleet stop clustering (see DBSCAN for Fleet Stop Clustering). However, blind tuning either fragments micro-stops or merges distinct route segments.
- eps (neighborhood radius): Set to GPS horizontal accuracy + spatial spread of the stop. Commercial telematics report 5–15 m accuracy under clear skies, but urban drift can exceed 30 m. A 50–100 m radius safely captures loading docks or parking bays while excluding adjacent travel lanes. Convert meters to radians: meters / 6371000 (Earth’s mean radius in meters).
- min_samples (core point threshold): Equals the minimum number of pings required to distinguish a true stop from traffic. At 1 ping/10s, min_samples=4 ≈ 30–40 seconds stationary. Align this with your operational definition (e.g., >2 min for deliveries, >15 min for mandated breaks).
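The two conversions above can be sketched as small helpers. This is a minimal illustration, not part of any library: the function names `meters_to_radians` and `min_samples_for_dwell` are hypothetical, and the second one assumes evenly spaced pings and counts the span between the first and last ping (n pings cover n - 1 intervals).

```python
import math

EARTH_RADIUS_M = 6_371_000  # Earth's mean radius in meters, as used in the text


def meters_to_radians(meters: float) -> float:
    """Convert a ground distance in meters to a haversine eps in radians."""
    return meters / EARTH_RADIUS_M


def min_samples_for_dwell(min_dwell_s: float, ping_interval_s: float) -> int:
    """Pings needed so that first-to-last ping spans at least min_dwell_s.

    Assumes a constant ping interval (hypothetical simplification).
    """
    return math.ceil(min_dwell_s / ping_interval_s) + 1


# Example: a 100 m radius and a 30-second minimum stop at 1 ping per 10 s
eps_rad = meters_to_radians(100)
k = min_samples_for_dwell(30, 10)
print(f"eps = {eps_rad:.3e} rad, min_samples = {k}")
```

Keeping eps in meters everywhere in configuration and converting only at the call site makes it harder to confuse degrees with radians.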
Systematic Tuning Workflow
- Velocity pre-filter: Drop points where speed > 5 km/h. This removes highway cruising and cuts computational overhead before clustering.
- Convert to radians: DBSCAN with metric='haversine' expects [lat, lon] in radians. Swapping axes or skipping conversion silently corrupts distances. See the scikit-learn DBSCAN documentation for strict metric requirements.
- Generate K-distance plot: Compute the distance to the k-th nearest neighbor for all points, sort descending, and plot. The “elbow” (the point where the curve bends most sharply) indicates a stable eps. For the plot, set k = min_samples.
- Iterate & validate: Run DBSCAN, extract centroids, calculate dwell times, and cross-reference with ground truth (driver logs, geofenced depots, or customer POIs). Adjust eps by ±10–20% and min_samples by ±1 until false positives (traffic signals) and false negatives (short curbside drops) stabilize.
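The iterate-and-validate step can be sketched as a small parameter sweep. This is an illustrative helper under stated assumptions, not a prescribed method: `sweep_dbscan` is a hypothetical name, eps is supplied in meters and converted with the mean Earth radius, and cluster and noise counts stand in for the ground-truth comparison you would run in practice.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def sweep_dbscan(coords_rad, eps_m=100.0, min_samples=4, eps_step=0.2):
    """Run DBSCAN for eps * (1 - step, 1, 1 + step) and min_samples - 1, 0, + 1.

    coords_rad: array of [lat, lon] in radians.
    Returns one summary dict per parameter combination.
    """
    results = []
    for scale in (1 - eps_step, 1.0, 1 + eps_step):
        for ms in (min_samples - 1, min_samples, min_samples + 1):
            if ms < 1:
                continue  # min_samples must be positive
            eps_rad = eps_m * scale / EARTH_RADIUS_M
            labels = DBSCAN(eps=eps_rad, min_samples=ms,
                            metric='haversine').fit_predict(coords_rad)
            n_noise = int(np.sum(labels == -1))
            n_clusters = len(set(labels.tolist())) - (1 if n_noise else 0)
            results.append({'eps_m': eps_m * scale, 'min_samples': ms,
                            'n_clusters': n_clusters, 'n_noise': n_noise})
    return results
```

If the cluster count barely changes across the grid, the baseline is probably stable; large swings suggest eps sits near the elbow's steep side and should be checked against driver logs or geofences.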
Production-Ready Tuning Script
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def tune_dbscan_stops(df, lat_col='lat', lon_col='lon',
                      speed_col='speed_kmh', speed_threshold=5.0,
                      eps_m=100.0, min_samples=4, k_neighbors=4):
    """
    Pre-filter, convert, and plot K-distance for DBSCAN stop tuning.
    Returns the speed-filtered DataFrame (with a 'cluster' column added)
    and the K-distances in radians, sorted descending.
    """
    # 1. Filter by speed to isolate stationary/slow-moving points
    df_filtered = df[df[speed_col] <= speed_threshold].copy()
    if len(df_filtered) == 0:
        raise ValueError("No points below speed threshold.")

    # 2. Convert to radians [lat, lon] for Haversine metric
    coords_rad = np.radians(df_filtered[[lat_col, lon_col]].values)
    eps_rad = eps_m / EARTH_RADIUS_M  # meters -> radians (100 m is ~1.57e-5 rad)

    # 3. K-distance plot for eps selection
    # Querying the fitted points counts each point as its own first neighbor,
    # which matches how DBSCAN's min_samples includes the point itself.
    nbrs = NearestNeighbors(n_neighbors=k_neighbors, metric='haversine')
    nbrs.fit(coords_rad)
    distances, _ = nbrs.kneighbors(coords_rad)
    k_distances = np.sort(distances[:, -1])[::-1]  # k-th neighbor, descending

    plt.figure(figsize=(8, 4))
    plt.plot(k_distances, marker='.', linestyle='-', markersize=4)
    plt.axhline(y=eps_rad, color='r', linestyle='--',
                label=f'eps={eps_rad:.2e} rad (~{eps_m:.0f} m)')
    plt.xlabel('Points (sorted by distance)')
    plt.ylabel(f'{k_neighbors}-th Nearest Neighbor Distance (radians)')
    plt.title('K-Distance Plot for DBSCAN eps Tuning')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

    # 4. Run DBSCAN with baseline parameters
    db = DBSCAN(eps=eps_rad, min_samples=min_samples, metric='haversine')
    labels = db.fit_predict(coords_rad)
    df_filtered['cluster'] = labels
    return df_filtered, k_distances


# Usage example:
# df_stops, k_dist = tune_dbscan_stops(telematics_df, eps_m=100.0, min_samples=4, k_neighbors=4)
Validation & Operational Edge Cases
- False Positives (Traffic Signals/Intersections): Increase min_samples or apply a temporal gap threshold (e.g., split a cluster wherever consecutive pings are more than 60s apart).
- False Negatives (Short Curbside Drops): Lower eps to about 50 m (≈ 7.8e-6 radians) and reduce min_samples to 2–3, but enforce a minimum dwell duration post-clustering to filter GPS jitter.
- GPS Drift Compensation: Apply a rolling median or lightweight Kalman filter before clustering. Raw telematics often jump 10–20m even when stationary.
- Temporal Gaps: DBSCAN ignores time. If a truck leaves a depot, drives for 3 hours, and returns, spatial proximity alone will merge the two visits into one cluster. Always pair spatial clustering with a temporal break threshold (e.g., time_gap > 15 minutes splits clusters).
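The temporal break threshold can be sketched with pandas as follows. This is a minimal example under assumptions: the column names `cluster` and `timestamp`, the output column `visit_id`, and the function name `split_clusters_by_time` are all hypothetical, and noise points (label -1) are simply dropped.

```python
import pandas as pd


def split_clusters_by_time(df, cluster_col='cluster', time_col='timestamp',
                           max_gap=pd.Timedelta(minutes=15)):
    """Split each spatial cluster into separate visits at large time gaps.

    Returns a copy without noise points and with a 'visit_id' column that is
    unique per (cluster, contiguous time window).
    """
    out = (df[df[cluster_col] != -1]
           .sort_values([cluster_col, time_col])
           .copy())
    # Time since the previous ping within the same spatial cluster
    gap = out.groupby(cluster_col)[time_col].diff()
    # A new visit starts at the first ping of a cluster or after a long gap
    new_visit = gap.isna() | (gap > max_gap)
    out['visit_id'] = new_visit.cumsum() - 1
    return out
```

With this split, a depot visited in the morning and again in the afternoon yields two visits that share a cluster label but carry different `visit_id` values.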
For formal GPS accuracy calibration, consult the FAA GNSS/GPS reference or the NMEA 0183 specification when mapping hardware-reported HDOP/VDOP values to your eps baseline.
Scaling & Post-Clustering Dwell Calculation
Once eps and min_samples stabilize, compute dwell times from timestamp deltas rather than point counts: group by cluster, sort by timestamp, and take max(timestamp) - min(timestamp). Filter out clusters below your operational minimum (e.g., <2 minutes) and merge adjacent clusters separated by <5 minutes of driving time. For fleets exceeding 10,000 daily pings, make sure neighbor searches run on a BallTree rather than brute force, for example by passing algorithm='ball_tree' to NearestNeighbors and DBSCAN; in scikit-learn, BallTree supports metric='haversine' while KDTree does not. This avoids the O(n²) cost of computing all pairwise distances. Always persist the final parameter set alongside route metadata to enable automated re-tuning when hardware upgrades or seasonal traffic patterns shift spatial density.
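A minimal sketch of the dwell calculation, assuming a DataFrame with `lat`, `lon`, and `timestamp` columns plus a grouping column that identifies each stop (the names and the function `compute_dwell` are hypothetical). The centroid is a plain coordinate mean, which is a reasonable approximation only because stop clusters span tens of meters.

```python
import pandas as pd


def compute_dwell(df, group_col='cluster', time_col='timestamp',
                  min_dwell=pd.Timedelta(minutes=2)):
    """Summarize each stop and keep those with dwell >= min_dwell."""
    stops = (df[df[group_col] != -1]            # drop DBSCAN noise points
             .groupby(group_col)
             .agg(arrival=(time_col, 'min'),
                  departure=(time_col, 'max'),
                  n_pings=(time_col, 'count'),
                  lat=('lat', 'mean'),          # simple centroid (assumption)
                  lon=('lon', 'mean'))
             .reset_index())
    # Dwell is the elapsed time between first and last ping, not a ping count
    stops['dwell'] = stops['departure'] - stops['arrival']
    return stops[stops['dwell'] >= min_dwell].reset_index(drop=True)
```

Using timestamps rather than ping counts keeps dwell estimates correct when the device changes its reporting interval or drops pings.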