Academic Researcher & Systems Developer

Hi, I'm Quan Van

A developer and researcher passionate about scientific inquiry and translating complex academic algorithms into high-performance, low-level database systems and agentic architectures.

Core Expertise

Low-Level Engineering Meets Data Mining

Predictable Vector DBs

Author of PomaiDB, an embedded, single-threaded, LSM-based vector store built for resource-constrained edge devices and for flash memory longevity.

Advanced Pattern Mining

Developing algorithms (CHUO-Miner, HUPP-Miner, MHOUI-Miner) for mining high-utility, occupancy-aware, and closed itemsets.

Privacy-Preserving Systems

Designing shape-hiding, keyless tree-mining architectures (VIFP) that run recursive operations directly over encrypted and secret-shared data.

Portfolio Highlights

Featured Systems & Libraries

PomaiDB

C++20 · Vector Indexing

Predictable, edge-native vector database designed to run single-threaded on an append-only LSM storage engine. It links into lightweight ARM and x86 configurations.

dm

C++20 · Data Mining

The core C++ Data Mining library implementing high-occupancy and closed utility itemset discovery, vertical bitsets, and shape-hiding encrypted tree processing.

Cheeserag

Go · RAG

Local-first RAG and agent orchestration framework integrating C++ LLM inference servers and embedded vector databases into autonomous loops.

palloc

C / Rust · Memory Systems

A hardware-aware memory allocator that partitions DRAM banks to preserve long-term performance isolation and prevent out-of-memory (OOM) failures on edge systems.

Academic Work

Recent Research Publications

May 21, 2026

SPARC-HOI: Sparse Anti-Chain Bitset Mining for High-Occupancy Itemsets with Anti-Monotone Pruning

Quan Van

High-occupancy itemset mining with vertical bitset representations achieves high throughput on dense datasets but degrades on sparse databases where bitsets are large and mostly empty. This paper introduces SPARC-HOI, a sparse anti-chain bitset algorithm for high-occupancy itemset mining that switches between dense and sparse bitset representations based on occupancy density thresholds. SPARC-HOI incorporates an anti-chain pruning layer that eliminates dominated candidates before bitset intersection, a compressed sparse-row bitset layout for sparse transaction sets, and an adaptive representation selector that chooses the optimal bitset format per projected database. Experiments on nine real-world datasets spanning dense retail, sparse web-click, and mixed biomedical logs demonstrate consistent speed improvements over dense-only vertical HOI miners.

May 20, 2026

MEDM-Gen: Medical Event Data Mining with Generative Augmentation for Rare Pattern Discovery

Quan Van

Medical event databases are characterized by severe class imbalance, rare co-occurrence patterns, and heterogeneous utility semantics derived from clinical outcome data. Standard high-utility itemset mining algorithms fail on medical event logs because rare but clinically significant patterns fall below minimum support thresholds while frequent but clinically trivial patterns dominate the output. This paper introduces MEDM-Gen, a medical event data mining framework with generative augmentation that combines a conditional variational augmentation module for rare pattern synthesis with a utility-aware mining algorithm incorporating clinical significance weights. MEDM-Gen produces rare-pattern-aware high-utility itemsets validated against held-out clinical event datasets.

May 20, 2026

Closed High-Occupancy Itemset Mining: Definitions, Algorithms, and Compact Representations

Quan Van

High-occupancy itemset mining produces patterns that cover a large fraction of each supporting transaction, but the output can be exponentially redundant. This paper formalizes Closed High-Occupancy Itemsets (CHOI), where an itemset is closed if no proper superset has identical support and equal or higher average occupancy. We prove the CHOI closure operator is well-defined, establish that CHOIs form a complete and lossless condensed representation, and design an efficient algorithm using vertical bitsets, a closure-checking operator, and early-termination pruning to enumerate CHOIs without redundancy.
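
As a rough illustration of the closedness definition in this abstract, here is a brute-force C++ check. It is a sketch under stated assumptions, not the paper's algorithm: it assumes the occupancy of an itemset X in a transaction t is |X| / |t|, averaged over the transactions that contain X, and it assumes every transaction is non-empty. The names `measure` and `is_closed` are hypothetical. Only single-item extensions are tested, because support cannot grow when items are added, so any superset with identical support implies a one-item extension with identical support.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

using Itemset  = std::set<int>;          // hypothetical integer item ids
using Database = std::vector<Itemset>;   // one itemset per transaction

struct Stats {
    std::size_t support = 0;      // number of transactions containing X
    double avg_occupancy = 0.0;   // mean of |X| / |t| over those transactions
};

// Scan the database and compute support and average occupancy of x.
Stats measure(const Itemset& x, const Database& db) {
    Stats s;
    double sum = 0.0;
    for (const Itemset& t : db) {
        if (std::includes(t.begin(), t.end(), x.begin(), x.end())) {
            ++s.support;
            sum += static_cast<double>(x.size()) / static_cast<double>(t.size());
        }
    }
    if (s.support > 0) s.avg_occupancy = sum / static_cast<double>(s.support);
    return s;
}

// x is closed if no one-item extension keeps the same support with
// equal or higher average occupancy.
bool is_closed(const Itemset& x, const Database& db) {
    const Stats sx = measure(x, db);
    if (sx.support == 0) return false;  // convention in this sketch: unsupported sets are not closed

    Itemset all_items;
    for (const Itemset& t : db) all_items.insert(t.begin(), t.end());

    for (int item : all_items) {
        if (x.count(item) != 0) continue;
        Itemset y = x;
        y.insert(item);
        const Stats sy = measure(y, db);
        if (sy.support == sx.support && sy.avg_occupancy >= sx.avg_occupancy) return false;
    }
    return true;
}
```

For the four transactions {1,2,3}, {1,2}, {1,2,4}, {3,4}, the itemset {1} is not closed because {1,2} appears in the same three transactions, while {1,2} is closed because adding 3 or 4 drops the support to one.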
