[Image: Architecture notice PDF page and LLM index reference method example]

Cutting Output Tokens by 90% and Latency by 87% with Index References in LLM-Based PDF Chunking

TL;DR: Just ask the LLM "from where to where," and let the server retrieve the text directly. Measured on 3 pages: 90% fewer output tokens, 87% lower latency, 61% cost savings.

Background: From Docling to PyMuPDF + VLM

While building an AI system for architecture regulation review, we needed to split architecture notice/guideline PDFs into semantically meaningful chunks. These chunks serve as retrieval units in a RAG pipeline. We initially used IBM's Docling, which uses OCR models to analyze document structure before chunking. However, we ran into two problems: ...
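The excerpt's core idea, asking the model only for "from where to where" and slicing the original text server-side, can be sketched as follows. The function names, the line-numbering scheme, and the span format are illustrative assumptions, not the article's actual implementation:

```python
# Sketch of index-reference chunking: number the page's lines, ask the LLM
# to return (start, end) line spans instead of copying text, then resolve
# the spans against the original text on the server.

def number_lines(page_text: str) -> str:
    """Prefix each line with its index so the LLM can reference it."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(page_text.splitlines())
    )

def resolve_chunks(page_text: str, spans: list[dict]) -> list[str]:
    """Turn LLM-returned {'start': i, 'end': j} spans (inclusive) into chunk text."""
    lines = page_text.splitlines()
    return ["\n".join(lines[s["start"] : s["end"] + 1]) for s in spans]

page = (
    "Article 1 Height Limits\n"
    "Buildings shall not exceed 4 floors.\n"
    "Article 2 Coverage\n"
    "BCR shall not exceed 60%."
)
# Suppose the LLM replied with these spans instead of echoing the text:
spans = [{"start": 0, "end": 1}, {"start": 2, "end": 3}]
chunks = resolve_chunks(page, spans)
```

Because the model emits only a handful of integers per chunk rather than the chunk text itself, output tokens (and therefore latency and cost) drop sharply, which is consistent with the figures quoted above.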

March 9, 2026 · 6 min · Kim Bo-geun
[Image: Magnifying glass focusing on text in a document]

Nearly Making an Illegal Building Legal: Catching Vision AI's Single-Character Hallucination

Author: Kim Bo-geun

What happens when a building code review AI confuses "4 floors or less" with "4 floors or more"? The height limit gets inverted, and an illegal building gets judged as legal. This article is about the journey to catch that single-character difference.

The Problem: Tables Are Retrieved but Unreliable

The building code review system analyzes building-related PDFs, such as district unit plans and design guidelines, to extract standards like building coverage ratio (BCR), floor area ratio (FAR), and height limits. The PDF preprocessing pipeline uses Docling to parse documents, chunk text, and generate embeddings for hybrid search (keyword + semantic). ...
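The hybrid search the excerpt mentions combines a keyword signal with an embedding-similarity signal. A minimal sketch of that score fusion is below; the toy keyword matcher, the cosine similarity, and the `alpha` weight are illustrative assumptions, not the system's actual retrieval code:

```python
# Toy hybrid (keyword + semantic) scoring: a weighted sum of a keyword-match
# score and cosine similarity between query and document embeddings.

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document (toy keyword match)."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, doc: str,
                 q_vec: list[float], d_vec: list[float],
                 alpha: float = 0.5) -> float:
    """Blend keyword and semantic scores; alpha weights the keyword side."""
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```

The weakness the article goes on to describe follows from this design: retrieval scores like these measure whether the right table was found, not whether every character in it was transcribed correctly, so a one-character inversion can survive retrieval untouched.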

February 11, 2026 · 10 min · Kim Bo-geun