LLM Context Window Optimization: Stop Wasting Tokens on HTML
Raw HTML wastes 60-80% of your LLM context on junk. Here's how to extract only the content that matters.
Alex Rivera
ML Engineer

Every token counts. When you feed raw HTML to an LLM, you're paying for navigation menus, cookie notices, and tracking scripts alongside the content you actually want. Below, we walk through three extraction strategies and compare the results.
The Token Waste Problem
Analyze any modern webpage and you'll find:
- 20-30%: Actual article content
- 25-35%: Navigation and header/footer
- 15-25%: Ads and promotional content
- 10-20%: Scripts, styles, and metadata
- 5-10%: Cookie banners and popups
A 10,000-token page might contain only 2,500 tokens of useful content. At GPT-4 input pricing of roughly $0.01 per 1K tokens, that's about $0.075 wasted per page on junk.
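To make the arithmetic explicit, here's a minimal sketch of that estimate. The $0.01-per-1K-tokens rate is an assumption matching the figures above; substitute your model's actual input price.

```ts
// Estimate tokens and dollars wasted on non-content markup for one page.
// Assumes ~$0.01 per 1K input tokens; adjust for your model's pricing.
function estimateWaste(totalTokens: number, contentTokens: number, usdPer1kTokens = 0.01) {
  const wastedTokens = totalTokens - contentTokens;
  return { wastedTokens, wastedUsd: (wastedTokens / 1000) * usdPer1kTokens };
}

// A 10,000-token page with 2,500 tokens of real content:
console.log(estimateWaste(10_000, 2_500)); // { wastedTokens: 7500, wastedUsd: 0.075 }
```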
Content Extraction Strategies
Strategy 1: DOM-Based Extraction
Simple but unreliable: look for content in `<article>`, `<main>`, or site-specific CSS classes:

```ts
// Grab the first <article> element's text, if the page has one
const content = document.querySelector("article")?.textContent;
```
Problem: different sites use different structures, so the selectors quickly become a maintenance nightmare.
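In practice, this usually turns into a hand-maintained map of selectors, one entry per site, that silently breaks whenever a site redesigns. A hypothetical sketch (the domains and selectors are made up):

```ts
// Hypothetical per-site selector map: every new source needs a new entry,
// and every redesign silently breaks the entry for that site.
const SELECTORS: Record<string, string> = {
  "example-news.com": "article .story-body",
  "some-blog.example": "main #post-content",
  "docs.example.org": ".markdown-body",
};

function extractContent(doc: Document, hostname: string): string | null {
  const selector = SELECTORS[hostname] ?? "article, main";
  return doc.querySelector(selector)?.textContent ?? null;
}
```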
Strategy 2: Readability Algorithms
Mozilla's Readability and similar tools score DOM elements by text density:
```ts
import { Readability } from "@mozilla/readability";

// Readability scores the DOM and returns the article it finds (or null)
const article = new Readability(document).parse();
```
Problem: it still lets some junk through and has no notion of semantic importance.
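If you're extracting server-side rather than in a browser, Readability needs a DOM to work on. A sketch using jsdom, assuming the page HTML has already been fetched:

```ts
import { JSDOM } from "jsdom";
import { isProbablyReaderable, Readability } from "@mozilla/readability";

function extractArticle(html: string, url: string) {
  // Build a DOM from the fetched HTML so Readability can score it
  const { document } = new JSDOM(html, { url }).window;

  // Cheap pre-check: skip pages Readability is unlikely to handle well
  if (!isProbablyReaderable(document)) return null;

  // Returns { title, content, textContent, excerpt, ... } or null
  return new Readability(document).parse();
}
```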
Strategy 3: AI-Powered Extraction (Tryb)
Use an LLM to identify and extract only the main content:
```ts
// One call: fetch the page, strip boilerplate with an LLM, return clean Markdown
const { markdown } = await tryb.read(url, { clean_with_ai: true });
```
Benefit: 95%+ accuracy, understands context, outputs clean Markdown.
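Once you have clean Markdown, the downstream call is simple: put the extracted content straight into the prompt instead of raw HTML. A sketch using the OpenAI Node SDK (the model and prompt here are just placeholders):

```ts
import OpenAI from "openai";

const openai = new OpenAI();

async function summarize(url: string) {
  // Extract only the main content, then prompt on that instead of raw HTML
  const { markdown } = await tryb.read(url, { clean_with_ai: true });

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder; use whichever model you target
    messages: [
      { role: "system", content: "Summarize the following article in three bullet points." },
      { role: "user", content: markdown },
    ],
  });

  return completion.choices[0].message.content;
}
```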
Before/After Comparison
| Metric | Raw HTML | Tryb Cleaned |
|---|---|---|
| Token count | 12,450 | 2,890 |
| Content ratio | 23% | 98% |
| GPT-4 cost | $0.12 | $0.03 |
| Response quality | Poor (confused by junk) | Excellent |
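Exact counts vary from page to page; you can reproduce the comparison for your own URLs by counting tokens on both versions. A sketch assuming the js-tiktoken package:

```ts
import { encodingForModel } from "js-tiktoken";

// Count tokens the way the target model's tokenizer would
const enc = encodingForModel("gpt-4");
const tokenCount = (text: string) => enc.encode(text).length;

async function compare(url: string) {
  const rawHtml = await (await fetch(url)).text();
  const { markdown } = await tryb.read(url, { clean_with_ai: true });

  console.log("raw HTML tokens:  ", tokenCount(rawHtml));
  console.log("cleaned MD tokens:", tokenCount(markdown));
}
```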
RAG Pipeline Integration
For RAG systems, clean extraction is even more critical. Junk content pollutes your vector database and degrades retrieval quality.
```ts
// RAG ingestion pipeline: extract clean content, chunk it, embed it, store it
async function ingestUrl(url: string) {
  // 1. Extract clean content as Markdown
  const { markdown, title } = await tryb.read(url);

  // 2. Chunk intelligently (by section)
  const chunks = chunkByHeadings(markdown);

  // 3. Embed each chunk and store it alongside its source metadata
  for (const chunk of chunks) {
    const { data } = await openai.embeddings.create({
      model: "text-embedding-3-small", // any embedding model works here
      input: chunk,
    });
    await vectorDb.insert({ url, title, chunk, embedding: data[0].embedding });
  }
}
```
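The pipeline assumes a `chunkByHeadings` helper that isn't defined above. One minimal way to implement it, assuming standard Markdown headings, is to split at heading lines so each chunk covers a single section:

```ts
// Split Markdown into chunks at heading lines (#, ##, ### ...),
// keeping each heading together with the body text that follows it.
function chunkByHeadings(markdown: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];

  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n").trim());

  return chunks.filter((chunk) => chunk.length > 0);
}
```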
Start Optimizing Today
Try the Tryb Playground to see the difference clean extraction makes for your LLM applications.

Alex Rivera
ML Engineer at Tryb
Alex specializes in LLM optimization and RAG systems. Former research engineer at Anthropic.


