SPARC-HOI: Sparse Anti-Chain Bitset Mining for High-Occupancy Itemsets with Anti-Monotone Pruning
Quan Van
Vertical bitset representations give high-occupancy itemset (HOI) miners high throughput on dense datasets, but performance degrades on sparse databases, where the bitsets are large and mostly empty. This paper introduces SPARC-HOI, a sparse anti-chain bitset algorithm that switches between dense and sparse bitset representations according to occupancy density thresholds. SPARC-HOI combines three components: an anti-chain pruning layer that eliminates dominated candidates before bitset intersection, a compressed sparse-row bitset layout for sparse transaction sets, and an adaptive representation selector that chooses the optimal bitset format for each projected database. Experiments on nine real-world datasets spanning dense retail, sparse web-click, and mixed biomedical logs show consistent speedups over dense-only vertical HOI miners.
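The idea of switching representations by density can be illustrated with a minimal Python sketch. It is not the SPARC-HOI implementation: the names `make_tidset`, `intersect`, and `DENSITY_THRESHOLD`, the 0.05 cutoff, and the definition of density as the fraction of transactions in a tidset are all assumptions made for illustration. A sorted tuple of transaction ids stands in for the compressed sparse-row layout, and a Python integer stands in for the dense bitset.

```python
# Illustrative sketch only; names, threshold, and layouts are hypothetical.
DENSITY_THRESHOLD = 0.05  # assumed cutoff, not a value from the paper


def make_tidset(tids, n_transactions, threshold=DENSITY_THRESHOLD):
    """Store a transaction-id set as a dense bitmask or a sparse sorted tuple."""
    tids = sorted(set(tids))
    density = len(tids) / n_transactions if n_transactions else 0.0
    if density >= threshold:
        mask = 0
        for t in tids:
            mask |= 1 << t  # bit t is set when transaction t is present
        return ("dense", mask)
    return ("sparse", tuple(tids))


def to_tids(tidset):
    """Expand either representation into a sorted list of transaction ids."""
    kind, data = tidset
    if kind == "sparse":
        return list(data)
    out, t = [], 0
    while data:
        if data & 1:
            out.append(t)
        data >>= 1
        t += 1
    return out


def intersect(a, b, n_transactions, threshold=DENSITY_THRESHOLD):
    """Intersect two tidsets and re-select the format of the result."""
    (kind_a, data_a), (kind_b, data_b) = a, b
    if kind_a == "dense" and kind_b == "dense":
        common = to_tids(("dense", data_a & data_b))  # single bitwise AND
    elif kind_a == "sparse" and kind_b == "sparse":
        common = sorted(set(data_a) & set(data_b))
    else:
        # Mixed case: probe the dense mask with each id from the sparse side.
        sparse, mask = (data_a, data_b) if kind_a == "sparse" else (data_b, data_a)
        common = [t for t in sparse if (mask >> t) & 1]
    return make_tidset(common, n_transactions, threshold)


if __name__ == "__main__":
    n = 100
    frequent = make_tidset(range(0, 50), n)  # density 0.50, stored dense
    rare = make_tidset([3, 10, 75], n)       # density 0.03, stored sparse
    print(intersect(frequent, rare, n))      # result is re-selected as sparse
```

In this sketch the format is re-selected after every intersection, so a projected tidset that has thinned out falls back to the sparse layout even when both inputs were dense.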