Recent Posts
-
April 04, 2026
Accelerating LLM Inference - Speculative Decoding
LLMs are incredibly capable, but they are notoriously slow to run. Because they generate text one token at a time, decoding $K$ tokens requires $K$ serial runs of the model. Speculative Decoding is a novel algorithm that breaks this bottleneck, all...
-
February 15, 2026
Preference Leakage: Contamination in LLM as a Judge
Modern development relies on two pillars of efficiency and scalability: Synthetic Data Generation and Automated Evaluations, as shown in the pipeline that is now the industry standard for aligning and benchmarking models. Recently I explore...
-
February 06, 2026
LayoutLM vs. LLMs + OCR: When Specialized Models Still Win
Over the last couple of years, I’ve seen more and more document pipelines quietly converge on the same pattern: OCR a PDF, dump everything into a large language model, and hope the model figures out the rest. And to be fair — sometimes it works r...
-
January 25, 2026
Transformer -- Decoder-Only Model Explained in Code
This is the very first post of many in my Transformer series, keeping track of my own learning notes. Two types of model classes: I will mainly use the transformers library from HuggingFace. The transformers library has two types of model classes: AutoM...
-
January 19, 2026
Floating-Point Computation Errors
Floating-point computations are well known for their susceptibility to round-off errors. In this post, I aim to document a couple of scenarios where these errors occur and explore potential workarounds where applicable. Scenario 1: When the component...