Scaling in Transformer Architectures: The Mathematical Rationale behind $\sqrt{d_k}$
A derivation of the variance explosion in high-dimensional dot products, the softmax saturation it induces, and the resulting damage to gradient propagation.
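The scaling rationale can be checked empirically. A minimal sketch (illustrative values, not from the paper): the dot product of two independent standard-normal vectors has variance that grows linearly with $d_k$, and dividing by $\sqrt{d_k}$ restores unit variance.

```python
# Sketch: variance of raw vs. scaled dot products, assuming i.i.d. N(0, 1) entries.
import numpy as np

rng = np.random.default_rng(0)
d_k = 256
n_trials = 20_000

q = rng.standard_normal((n_trials, d_k))
k = rng.standard_normal((n_trials, d_k))
scores = np.sum(q * k, axis=1)      # raw dot products: Var ≈ d_k
scaled = scores / np.sqrt(d_k)      # scaled scores: Var ≈ 1

print(scores.var())   # close to 256
print(scaled.var())   # close to 1
```

Without the division, score magnitudes grow with dimension and push the softmax into its saturated regime, which is the failure mode the abstract refers to.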
An exploration of the context window in Large Language Models, detailing its token-based architecture, $O(N^2)$ computational complexity, and the 'Lost in the Middle' phenomenon.
A comprehensive breakdown of Multi-Head Attention, the mathematical framework that allows Transformers to capture parallel semantic subspaces simultaneously.
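The "parallel semantic subspaces" idea reduces to a reshape: the model dimension is split into per-head slices that attend independently and are then concatenated back. A minimal sketch with toy sizes (assumed here, not from the article):

```python
# Sketch: the head-split/merge round-trip in Multi-Head Attention (toy shapes).
import numpy as np

seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads

x = np.arange(seq_len * d_model, dtype=float).reshape(seq_len, d_model)

# (seq, d_model) -> (n_heads, seq, d_head): each head sees its own subspace slice
heads = x.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

# after per-head attention, heads are concatenated back to (seq, d_model)
merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
assert np.array_equal(merged, x)  # the split/merge round-trip is lossless
```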
An investigation into the mathematical foundations of Generative AI, focusing on Reverse Diffusion processes and Latent Space formalization for image and text synthesis.
A technical analysis of the Causal Mask, the structural constraint that enforces autoregressive generation in Decoder-only Transformers. This paper derives the mechanism's mathematical basis and its role in preventing attention leakage across future tokens.
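The mask itself is a small construction. A sketch (illustrative, not the paper's code): position $i$ may only attend to positions $j \le i$, so future scores are set to $-\infty$ before the softmax and receive exactly zero attention weight.

```python
# Sketch: building and applying a causal mask to a toy score matrix.
import numpy as np

def causal_mask(n):
    # upper-triangular True entries mark the "future" positions to block
    return np.triu(np.ones((n, n), dtype=bool), k=1)

scores = np.zeros((4, 4))
scores[causal_mask(4)] = -np.inf

# row-wise softmax: exp(-inf) = 0, so masked positions get zero weight
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(weights[0])  # [1. 0. 0. 0.] — the first token can only see itself
```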
A technical exploration of majority class reduction strategies. This paper formalizes Random Undersampling, the NearMiss heuristic suite, and Tomek Link boundary cleaning for improving classifier performance on high-imbalance network traffic datasets.
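The simplest of the three strategies can be sketched in a few lines (illustrative pure Python, not the paper's code): Random Undersampling keeps the minority class intact and draws a matching-size random subset of the majority class.

```python
# Sketch: Random Undersampling to a 1:1 class ratio (toy labels assumed).
import random

def random_undersample(X, y, majority_label, seed=0):
    minority = [(x, t) for x, t in zip(X, y) if t != majority_label]
    majority = [(x, t) for x, t in zip(X, y) if t == majority_label]
    rng = random.Random(seed)
    kept = rng.sample(majority, k=len(minority))  # downsample to balance
    balanced = minority + kept
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [t for _, t in balanced]

X = list(range(10))
y = [0] * 8 + [1] * 2          # 8 majority vs. 2 minority samples
Xb, yb = random_undersample(X, y, majority_label=0)
print(sum(yb), len(yb))        # 2 4 -> two samples of each class
```

NearMiss and Tomek Link cleaning replace the uniform random draw with distance-based selection rules, but the balancing skeleton is the same.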
A rigorous mathematical investigation into the Synthetic Minority Over-sampling Technique (SMOTE). This paper details the k-NN selection process and the geometric foundations of linear interpolation used to expand decision boundaries in imbalanced datasets.
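The geometric core of SMOTE is a single interpolation: a synthetic point is placed uniformly at random on the segment between a minority sample and one of its minority-class neighbors. A sketch (toy, pure Python; a full SMOTE also runs the k-NN search that selects `neighbor`):

```python
# Sketch: SMOTE's linear-interpolation step, x_new = x + gap * (x_nn - x).
import random

def smote_point(sample, neighbor, rng):
    gap = rng.random()  # gap in [0, 1): position along the segment
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(42)
a, b = [1.0, 1.0], [3.0, 5.0]
synthetic = smote_point(a, b, rng)
# every coordinate is a convex combination, so the point stays between a and b
print(synthetic)
```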
A technical formalization of the Scaled Dot-Product Attention mechanism. This paper analyzes the interaction between Queries, Keys, and Values, providing a step-by-step numerical derivation of the attention pipeline.
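The pipeline the abstract describes can be walked through numerically with toy Q/K/V values (chosen here for illustration): scores $= QK^T/\sqrt{d_k}$, weights $= \mathrm{softmax}(\text{scores})$, output $= \text{weights} \cdot V$.

```python
# Sketch: the full scaled dot-product attention pipeline on a 1-query example.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted mix of values

Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[10.0, 0.0], [0.0, 10.0]])
out = attention(Q, K, V)
print(out)  # leans toward the first value row, since Q matches the first key
```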
A technical investigation into Byte Pair Encoding (BPE), the subword tokenization standard for Large Language Models. This paper details the iterative transition from character-level granularity to high-density subword dictionaries.
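One iteration of the character-to-subword transition can be sketched in pure Python (toy corpus, illustrative): count adjacent symbol pairs across the vocabulary and merge the most frequent pair into a new subword symbol.

```python
# Sketch: a single BPE training iteration (count pairs, merge the top pair).
from collections import Counter

def most_frequent_pair(words):
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)  # ties resolve to the first-seen pair

def merge_pair(words, pair):
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # fuse into a subword
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# character-level corpus: word (as a symbol tuple) -> frequency
words = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2}
pair = most_frequent_pair(words)   # ('l', 'o'), seen 7 times
words = merge_pair(words, pair)
print(pair, words)
```

Real BPE training simply repeats these two steps until the target vocabulary size is reached.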
An analytical study of Recurrent Neural Networks (RNNs), examining the mathematical mechanics of parameter sharing, temporal hidden states, and the vanishing gradient bottleneck.
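Parameter sharing and the temporal hidden state are visible in a scalar-sized recurrence (toy weights assumed here): the same parameters are reused at every timestep, and $h_t$ carries information forward.

```python
# Sketch: a scalar RNN cell, h_t = tanh(W_xh * x_t + W_hh * h_{t-1} + b).
import math

def rnn_step(x_t, h_prev, w_xh, w_hh, b):
    return math.tanh(w_xh * x_t + w_hh * h_prev + b)

w_xh, w_hh, b = 0.5, 0.9, 0.0   # one shared parameter set for all timesteps
h = 0.0
for x_t in [1.0, 0.5, -1.0]:    # a length-3 input sequence
    h = rnn_step(x_t, h, w_xh, w_hh, b)
print(h)
```

The vanishing-gradient bottleneck the abstract mentions arises from backpropagating through this same `tanh` and the repeated multiplication by `w_hh` at every step.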
A formal exploration of affine quantization mapping. This paper details the derivation of scaling factors and zero-points for converting FP32 tensors to INT8 precision while preserving structural fidelity during inference.
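The scale/zero-point derivation is compact enough to sketch directly (generic asymmetric affine quantization, not tied to any particular framework):

```python
# Sketch: FP32 -> INT8 affine quantization with derived scale and zero-point.
def quant_params(x_min, x_max, qmin=-128, qmax=127):
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)   # the integer that represents 0.0
    return scale, zero_point

def quantize(x, scale, zp, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale) + zp))

def dequantize(q, scale, zp):
    return (q - zp) * scale

scale, zp = quant_params(-1.0, 3.0)   # toy tensor range [-1, 3]
q = quantize(0.5, scale, zp)
print(q, dequantize(q, scale, zp))    # round-trip error bounded by scale / 2
```

The "structural fidelity" claim corresponds to the bounded round-trip error: every representable value dequantizes to within half a quantization step of the original.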
A comparative study between engineered standard vectors and learned embedding vectors, exploring latent feature spaces and semantic arithmetic in Deep Learning.
A rigorous mathematical analysis of the divergence between Cell State ($C_L$) and Hidden State ($h_L$) at the terminal step of Long Short-Term Memory architectures. Explores the functional roles of these states in Many-to-One and Many-to-Many topologies.
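The divergence has a one-line cause: the hidden state is a gated, squashed view of the cell state, $h_t = o_t \cdot \tanh(C_t)$, so the two are generally unequal at the terminal step. A sketch with toy scalar gates (fixed illustrative values standing in for the learned sigmoid gates):

```python
# Sketch: C_t and h_t diverge because h_t = o_t * tanh(C_t).
import math

def lstm_step(x_t, h_prev, c_prev):
    f_t, i_t, o_t = 0.9, 0.5, 0.3            # toy forget/input/output gates
    c_tilde = math.tanh(x_t + h_prev)        # candidate cell update
    c_t = f_t * c_prev + i_t * c_tilde       # cell state: additive memory lane
    h_t = o_t * math.tanh(c_t)               # hidden state: filtered exposure
    return h_t, c_t

h, c = 0.0, 0.0
for x in [1.0, 1.0, 1.0]:
    h, c = lstm_step(x, h, c)
print(h, c)  # h != c: the output gate exposes only part of the memory
```

In a Many-to-One topology only the terminal `h` feeds the output head, while `c` exists purely to carry long-range memory between steps.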
A systematic categorization of algorithmic training methodologies in Artificial Intelligence, analyzing the mathematical foundations of Supervised, Unsupervised, and Reinforcement Learning.
A rigorous mathematical analysis of dimensionality reduction in Deep Learning, exploring the formal mechanics of Max, Average, and Global Pooling across 1D, 2D, and 3D architectures.
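The three pooling variants differ only in the reduction applied per window. A 1D sketch (pure Python, stride equal to the window size, illustrative):

```python
# Sketch: Max, Average, and Global pooling over a 1D feature sequence.
def max_pool_1d(xs, k):
    return [max(xs[i:i + k]) for i in range(0, len(xs) - k + 1, k)]

def avg_pool_1d(xs, k):
    return [sum(xs[i:i + k]) / k for i in range(0, len(xs) - k + 1, k)]

def global_max_pool(xs):
    return max(xs)   # collapses the whole spatial axis to a single value

xs = [1.0, 3.0, 2.0, 8.0, 4.0, 4.0]
print(max_pool_1d(xs, 2))   # [3.0, 8.0, 4.0]
print(avg_pool_1d(xs, 2))   # [2.0, 5.0, 4.0]
print(global_max_pool(xs))  # 8.0
```

The 2D and 3D cases apply the same per-window reduction over square and cubic windows respectively.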
A deep dive comparing LoRA and QLoRA, analyzing their mathematical mechanics, memory constraints, and how they democratize LLM fine-tuning.
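LoRA's memory argument fits in a few lines (illustrative numpy sketch, toy dimensions assumed): the frozen weight $W$ is augmented with a low-rank delta $BA$ scaled by $\alpha/r$, so only $r(d_{in} + d_{out})$ parameters are trained instead of $d_{in} \cdot d_{out}$.

```python
# Sketch: the LoRA forward pass, y = W x + (alpha / r) * B A x.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 16, 2, 4

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01     # trainable, small init
B = np.zeros((d_out, r))                      # trainable, zero init

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# with B = 0 the adapter is a no-op, matching the pretrained model exactly
assert np.allclose(lora_forward(x), W @ x)
print(A.size + B.size, "trainable vs", W.size, "frozen parameters")
```

QLoRA keeps this same adapter structure but stores the frozen `W` in 4-bit precision, which is where the additional memory savings come from.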
A rigorous mathematical and architectural analysis of Distributed Data Parallel (DDP) in PyTorch. Explores the GIL bottlenecks of legacy systems and the efficiency of the Multi-process Ring All-Reduce topology.
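The Ring All-Reduce topology can be simulated in a single process (illustrative sketch, not the DDP implementation): each of $N$ workers forwards one chunk per step around the ring, and after $N-1$ reduce-scatter steps plus $N-1$ all-gather steps every worker holds the full gradient sum while only ever transmitting one chunk at a time.

```python
# Sketch: single-process simulation of Ring All-Reduce over n workers,
# each gradient split into n chunks.
def ring_all_reduce(worker_chunks):
    n = len(worker_chunks)
    data = [list(c) for c in worker_chunks]
    # reduce-scatter: after n-1 steps, worker i holds the full sum of chunk (i+1) % n
    for step in range(n - 1):
        sends = [(w, (w - step) % n) for w in range(n)]
        vals = [data[w][idx] for w, idx in sends]      # snapshot: simultaneous sends
        for (w, idx), val in zip(sends, vals):
            data[(w + 1) % n][idx] += val
    # all-gather: circulate each completed chunk once around the ring
    for step in range(n - 1):
        sends = [(w, (w + 1 - step) % n) for w in range(n)]
        vals = [data[w][idx] for w, idx in sends]
        for (w, idx), val in zip(sends, vals):
            data[(w + 1) % n][idx] = val
    return data

grads = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
out = ring_all_reduce(grads)
print(out[0])  # every worker ends with [111.0, 222.0, 333.0]
```

Because DDP launches one OS process per worker, each running its own interpreter, this communication pattern sidesteps the GIL contention that throttled the older single-process, multi-thread approach.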
A rigorous mathematical exploration of convolutional operations in 1D, 2D, and 3D spaces. Analyzing receptive field dynamics, computational complexity, and dimensionality mapping in deep neural networks.
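The 1D case makes the receptive-field dynamics concrete. A sketch (pure Python, valid mode, using the deep-learning convention of cross-correlation without kernel flipping):

```python
# Sketch: valid-mode 1D convolution; the receptive field slides by one step.
def conv1d(signal, kernel):
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]            # a simple edge-detector kernel
print(conv1d(x, w))             # [-2.0, -2.0, -2.0]
```

The output length `len(x) - len(w) + 1` is the 1D instance of the dimensionality mapping the abstract refers to; 2D and 3D convolutions apply the same sliding inner product over higher-rank windows.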
An analysis of memory management strategies for resource-constrained Edge AI devices, focusing on the mechanics of Python Generators and the 'yield' primitive.
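The streaming pattern at the heart of that strategy can be sketched directly: a generator yields one batch at a time, so peak memory stays proportional to the batch rather than the dataset (illustrative code, not from the article).

```python
# Sketch: lazy batching with a generator; `yield` suspends and resumes state.
def read_batches(records, batch_size):
    batch = []
    for rec in records:           # `records` can itself be a lazy iterator
        batch.append(rec)
        if len(batch) == batch_size:
            yield batch           # hand one batch to the caller, keep state
            batch = []
    if batch:
        yield batch               # flush the final partial batch

stream = (i * i for i in range(7))        # lazy source, nothing materialized
batches = list(read_batches(stream, 3))
print(batches)  # [[0, 1, 4], [9, 16, 25], [36]]
```

On an edge device the `list(...)` call would be replaced by a consuming loop, so no more than one batch is ever resident in memory.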