Honglei Xie

Recent Posts

  • April 04, 2026

    Accelerating LLM Inference - Speculative Decoding

LLMs are incredibly capable, but they are notoriously slow to run. Because they generate text one token at a time, decoding $K$ tokens requires $K$ serial runs of the model. Speculative Decoding is a novel algorithm that breaks this bottleneck, all...

  • February 15, 2026

    Preference Leakage: Contamination in LLM as a Judge

Modern development relies on two pillars of efficiency and scalability: synthetic data generation and automated evaluation, as shown in the pipeline that is now the industry standard for aligning and benchmarking models. Recently I explore...

  • February 06, 2026

    LayoutLM vs. LLMs + OCR: When Specialized Models Still Win

Over the last couple of years, I’ve seen more and more document pipelines quietly converge on the same pattern: OCR a PDF, dump everything into a large language model, and hope the model figures out the rest. And to be fair, sometimes it works r...

  • January 25, 2026

Transformer -- Decoder-Only Models Explained in Code

This is the very first post in a series on Transformers, keeping track of my own learning notes. I will mainly use the transformers library from HuggingFace. The transformers library has two types of model classes: AutoM...

  • January 19, 2026

Floating-Point Computation Errors

Floating-point computations are well known for their susceptibility to round-off errors. In this post, I aim to document a couple of scenarios where these errors occur and explore potential workarounds where applicable. Scenario 1: When the component...