CascadeCBF: Probabilistic Counting for Sparse Spatial Point Clouds
Keywords: probabilistic data structures, counting Bloom filter, spatial data, cardinality estimation
Abstract. Spatial point datasets often exhibit longtail distributions, where few locations receive most observations. Traditional counting Bloom filters struggle with such distributions due to fixed counter bit-depths. We present CascadeCBF, a multi-layer probabilistic data structure using cascading overflow: counters in lower layers handle infrequent locations, while overflow cascades to higher layers for hotspots. CascadeCBF supports spatial operations (union, intersection, aggregation) directly on the compressed representation. Evaluation on Zipfian distributions shows that CascadeCBF with minimal increment matches the memory efficiency of Spectral Bloom Filters and Count-Min Sketch, while the multi-layer design enables high-precision counting for highly skewed data with a 1.3-1.6× higher throughput.