How we process 10,000 live video streams for a state transportation department
Statewide CCTV networks come with painful constraints: heterogeneous hardware, changing lighting, bandwidth limits, and the need for deterministic failover. Here is how we tackled them.
Architecture
- Split the pipeline into ingest, decode, detection, tracking, and event-routing stages, each independently autoscalable; a minimal sketch of the stage decoupling follows this list.
- Used transformer-based embeddings for cross-camera re-ID and cached them in a vector store keyed by time bucket and region (see the cache sketch below).
- Pushed lightweight detectors to edge nodes while reserving heavier models for regional hubs to keep latency predictable.
- Added dead-letter queues and circuit breakers so individual camera failures did not cascade (see the breaker sketch below).
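To make the stage decoupling concrete, here is a minimal in-process sketch using Python threads and bounded queues. It is an illustration only: the production system ran each stage as a separate service behind its own message queue, and every name below is hypothetical.

```python
import queue
import threading

def make_stage(name, work_fn, inbox, outbox=None):
    """Consume items from inbox, apply work_fn, forward results to outbox."""
    def loop():
        while True:
            item = inbox.get()
            if item is None:            # poison pill: shut this stage down
                if outbox is not None:
                    outbox.put(None)    # propagate shutdown downstream
                break
            result = work_fn(item)
            if outbox is not None:
                outbox.put(result)
    t = threading.Thread(target=loop, name=name, daemon=True)
    t.start()
    return t

# Bounded queues between stages double as backpressure points.
decode_q = queue.Queue(maxsize=256)
detect_q = queue.Queue(maxsize=256)
track_q = queue.Queue(maxsize=256)

# Placeholder work functions; the real ones wrap the decoder, detector, etc.
make_stage("decode", lambda frame: frame, decode_q, detect_q)
make_stage("detect", lambda frame: {"frame": frame, "boxes": []}, detect_q, track_q)
make_stage("track", lambda det: det, track_q)
```

Because each stage only talks to its queues, scaling a stage means adding consumers on its inbox, which is what made independent autoscaling straightforward.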
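The useful detail in the embedding cache is the composite key. Below is a sketch assuming a NumPy-backed store; the real deployment used a dedicated vector database, and the bucket size and names here are illustrative.

```python
from collections import defaultdict
import numpy as np

TIME_BUCKET_S = 300  # hypothetical 5-minute buckets keep searches local in time

class EmbeddingCache:
    """Toy stand-in for a vector store keyed by (time bucket, region)."""

    def __init__(self):
        self._buckets = defaultdict(list)  # (bucket, region) -> [(track_id, vec)]

    def put(self, track_id, ts, region, vec):
        key = (int(ts) // TIME_BUCKET_S, region)
        self._buckets[key].append((track_id, vec / np.linalg.norm(vec)))

    def nearest(self, ts, region, vec, k=5):
        """Cosine search restricted to the query's own time bucket and region."""
        key = (int(ts) // TIME_BUCKET_S, region)
        q = vec / np.linalg.norm(vec)
        scored = sorted(self._buckets.get(key, []), key=lambda c: -float(q @ c[1]))
        return scored[:k]
```

Restricting search to one bucket keeps re-ID lookups cheap at this stream count; queries near a bucket boundary would also check the adjacent bucket.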
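The circuit breakers followed a simple count-and-cooldown policy per camera; the thresholds below are hypothetical. While a circuit is open, that camera's frames go straight to its dead-letter queue for later replay.

```python
import time

class CameraBreaker:
    """Per-camera breaker: open after repeated failures, retry after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_s=60):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.cooldown_s:
            self.opened_at = None
            self.failures = self.failure_threshold - 1  # half-open: one strike reopens
            return True
        return False                                    # open: skip this camera

    def record(self, ok):
        if ok:
            self.failures = 0                           # healthy again: fully close
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()                # trip: stop hitting the camera
```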
Performance work
- Profiled GPU utilization across stages; batching and CUDA stream tuning brought utilization above 80% (see the streams sketch after this list).
- Reduced frame-processing latency by ~40% with a hybrid LLM+CNN routing step that decides which frames need heavy processing (routing sketch below).
- Applied quantization and mixed precision where possible without tanking mAP on night and bad-weather edge cases (precision sketch below).
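The CUDA stream tuning boils down to overlapping host-to-device copies with inference. A minimal PyTorch sketch, assuming a CUDA device and pinned-memory batches; the stand-in detector and sizes are illustrative.

```python
import torch

copy_stream = torch.cuda.Stream()
infer_stream = torch.cuda.Stream()

# Stand-in for the real detection model.
detector = torch.nn.Conv2d(3, 16, 3).cuda().eval()

def run_batch(cpu_frames):
    """cpu_frames: pinned (B, 3, H, W) tensor; pinning enables async copies."""
    with torch.cuda.stream(copy_stream):
        gpu_frames = cpu_frames.cuda(non_blocking=True)
    infer_stream.wait_stream(copy_stream)        # order the copy before compute
    with torch.cuda.stream(infer_stream), torch.no_grad():
        return detector(gpu_frames)

frames = torch.randn(32, 3, 224, 224).pin_memory()  # one batch of decoded frames
out = run_batch(frames)
torch.cuda.synchronize()
```

With two batches in flight, the copy for batch N+1 overlaps the compute for batch N, which is the overlap that batching plus streams exploits.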
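We cannot reproduce the production router here, but the shape of the decision is simple: a cheap scorer triages every frame, and only active or ambiguous scenes reach the heavy models. In this sketch the light scorer is a stand-in linear probe and the thresholds are invented; the real router combined a small CNN with an LLM-based scene classifier.

```python
import torch

# Stand-in for the cheap triage model.
light_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.LazyLinear(1)).eval()

def route(frame, low=0.2, high=0.8):
    """Return which path a frame takes: 'skip', 'light', or 'heavy'."""
    with torch.no_grad():
        score = torch.sigmoid(light_model(frame.unsqueeze(0))).item()
    if score < low:
        return "skip"     # near-static scene: reuse the previous result
    if score > high:
        return "heavy"    # clear activity: full detector + tracker
    return "light"        # ambiguous: lightweight edge detector only

decision = route(torch.randn(3, 224, 224))
```

The latency win comes from the skip path: frames that do not need heavy processing never touch the expensive models.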
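Two precision levers are sketched below, assuming PyTorch: autocast for mixed-precision inference on GPU hubs, and dynamic int8 quantization for CPU-bound edge nodes. The model is a stand-in; per the bullet above, each such change was checked against night and bad-weather cases before rollout.

```python
import torch

# Stand-in model for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
).eval()

# GPU hubs: fp16 matmuls where safe, fp32 elsewhere, via autocast.
if torch.cuda.is_available():
    x = torch.randn(8, 512, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16), torch.no_grad():
        y = model.cuda()(x)

# Edge nodes: int8 weights for linear layers, activations quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model.cpu(), {torch.nn.Linear}, dtype=torch.qint8
)
```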
Outcomes
- Processes 10K+ streams per day with real-time safety analytics and automated incident detection.
- Reduced manual monitoring costs by ~60% and gave planners a richer dataset for infrastructure decisions.
- Clear observability (traces, metrics, and tagged examples) made it easier for non-ML teams to operate the pipeline; a metrics sketch follows.
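On the metrics side, a small, stable surface is what non-ML operators actually watch. A sketch using prometheus_client; metric and label names are illustrative, not the production schema.

```python
from prometheus_client import Counter, Histogram, start_http_server

FRAMES = Counter("frames_processed_total", "Frames through the pipeline",
                 ["camera_id", "path"])
STAGE_LATENCY = Histogram("stage_latency_seconds", "Per-stage processing time",
                          ["stage"])

start_http_server(9200)  # expose /metrics for the scraper

def process_frame(camera_id, frame):
    with STAGE_LATENCY.labels(stage="detect").time():
        path = "heavy"                       # the routing decision would go here
        FRAMES.labels(camera_id=camera_id, path=path).inc()
```

Tagging counters by routing path makes it easy to see, per camera, how often the heavy models actually run.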