LLM & RAG
Dec 10, 2024 · 7 min read

LLM Context Window Optimization: Stop Wasting Tokens on HTML

Raw HTML wastes 60-80% of your LLM context on junk. Here's how to extract only the content that matters.

Alex Rivera, ML Engineer

Every token counts. When you feed raw HTML to an LLM, you're paying for navigation menus, cookie notices, and tracking scripts. Here's how to extract only what matters.

The Token Waste Problem

Analyze any modern webpage and you'll find:

  • 20-30%: Actual article content
  • 25-35%: Navigation and header/footer
  • 15-25%: Ads and promotional content
  • 10-20%: Scripts, styles, and metadata
  • 5-10%: Cookie banners and popups

A 10,000-token page might contain only 2,500 tokens of useful content. At GPT-4 Turbo input pricing ($0.01 per 1K tokens), the other 7,500 junk tokens cost about $0.075 per page.
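
To measure the waste on pages you actually process, compare token counts before and after extraction. A minimal sketch using the js-tiktoken package (rawHtml and cleanedContent are assumed to be strings you already have):

import { encodingForModel } from "js-tiktoken";

// Count tokens the same way the model will
const enc = encodingForModel("gpt-4");
const rawTokens = enc.encode(rawHtml).length;
const cleanTokens = enc.encode(cleanedContent).length;
console.log(`Content ratio: ${((cleanTokens / rawTokens) * 100).toFixed(1)}%`);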

Content Extraction Strategies

Strategy 1: DOM-Based Extraction

Simple but unreliable. Look for content in <article>, <main>, or specific CSS classes:

const content = document.querySelector('article')?.textContent;

Problem: Different sites use different structures. Maintenance nightmare.
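
If you go this route anyway, you end up maintaining an ever-growing priority list of selectors; a minimal sketch of that fallback approach (the selector list is illustrative):

// Try progressively more generic selectors until one yields text
const SELECTORS = ["article", "main", '[role="main"]', "#content"];

function extractBySelector(doc: Document): string | null {
  for (const selector of SELECTORS) {
    const text = doc.querySelector(selector)?.textContent?.trim();
    if (text) return text;
  }
  return null; // unknown structure: yet another selector to add
}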

Strategy 2: Readability Algorithms

Mozilla's Readability and similar tools score DOM elements by text density:

import { Readability } from "@mozilla/readability";
const article = new Readability(document).parse();
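
Server-side, where most scraping pipelines run, there is no global document, so Readability is typically paired with jsdom; a sketch assuming you have already fetched the page HTML:

import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";

// Readability mutates the DOM it parses, so give it a throwaway jsdom document
const dom = new JSDOM(html, { url }); // passing url lets relative links resolve
const article = new Readability(dom.window.document).parse();
console.log(article?.title, article?.textContent?.slice(0, 200));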

Problem: Still includes some junk and doesn't understand semantic importance.

Strategy 3: AI-Powered Extraction (Tryb)

Use an LLM to identify and extract only the main content:

const { markdown } = await tryb.read(url, { clean_with_ai: true });

Benefit: 95%+ accuracy, understands context, outputs clean Markdown.
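
The cleaned Markdown drops straight into a prompt. A sketch using the OpenAI SDK (the model name and system prompt are placeholders; tryb.read also returns the page title, as used in the pipeline below):

const { markdown, title } = await tryb.read(url, { clean_with_ai: true });

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "Answer using only the provided article." },
    { role: "user", content: `# ${title}\n\n${markdown}` },
  ],
});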

Before/After Comparison

Metric           | Raw HTML                | Tryb Cleaned
Token count      | 12,450                  | 2,890
Content ratio    | 23%                     | 98%
GPT-4 cost       | $0.12                   | $0.03
Response quality | Poor (confused by junk) | Excellent

RAG Pipeline Integration

For RAG systems, clean extraction is even more critical. Junk content pollutes your vector database and degrades retrieval quality.

// Optimal RAG ingestion pipeline
async function ingestUrl(url: string) {
  // 1. Extract clean content
  const { markdown, title } = await tryb.read(url);

  // 2. Chunk intelligently (by section)
  const chunks = chunkByHeadings(markdown);

  // 3. Embed and store
  for (const chunk of chunks) {
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: chunk,
    });
    await vectorDb.insert({
      url,
      title,
      chunk,
      embedding: response.data[0].embedding,
    });
  }
}
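
chunkByHeadings above is left undefined; one simple implementation splits the Markdown at each heading so chunks line up with the document's own sections (a sketch):

// Hypothetical helper: one chunk per heading-delimited section
function chunkByHeadings(markdown: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n").trim());
  return chunks.filter((c) => c.length > 0);
}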

Start Optimizing Today

Try the Tryb Playground to see the difference clean extraction makes for your LLM applications.

Tags: LLM, RAG, Optimization, Tokens
Alex Rivera

ML Engineer at Tryb

Alex specializes in LLM optimization and RAG systems. Former research engineer at Anthropic.

