LLM & RAG · Oct 18, 2024 · 10 min read

Pinecone + Web Content: Build a Knowledge Base

Create a vector database of web content for semantic search. Step-by-step Pinecone integration guide.

Alex Rivera, ML Engineer

Pinecone is a popular managed vector database for AI applications. This tutorial shows you how to build a searchable knowledge base from web content.

Architecture Overview

URLs → Tryb Read → Chunk → OpenAI Embed → Pinecone Store → Query

Setup

npm install @pinecone-database/pinecone openai
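
The ingestion code below uses the global fetch API, so you'll want Node 18 or newer. You'll also need three keys in your environment: PINECONE_API_KEY, OPENAI_API_KEY, and TRYB_API_KEY.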

Initialize Clients

import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

// Both clients read their API keys from the environment
// (PINECONE_API_KEY and OPENAI_API_KEY)
const pinecone = new Pinecone();
const openai = new OpenAI();

// Target an existing index named 'web-content'
const index = pinecone.index('web-content');
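
If the index doesn't exist yet, create it once before ingesting. The dimension must match your embedding model: text-embedding-3-small produces 1536-dimensional vectors by default. A minimal sketch, assuming a serverless index on AWS (adjust cloud and region for your account):

await pinecone.createIndex({
  name: 'web-content',
  dimension: 1536, // must match text-embedding-3-small
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }
});

New indexes take a moment to initialize before they accept upserts.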

Ingest Web Content

async function ingestUrl(url: string) {
  // 1. Fetch clean markdown for the page
  const response = await fetch('https://api.tryb.dev/v1/read', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.TRYB_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url })
  });
  if (!response.ok) {
    throw new Error(`Read failed for ${url}: ${response.status}`);
  }
  const { data } = await response.json();
  
  // 2. Chunk by paragraphs, dropping fragments under 50 characters
  const chunks: string[] = data.markdown
    .split('\n\n')
    .filter((c: string) => c.length > 50);
  
  // 3. Generate one embedding per chunk in a single API call
  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: chunks
  });
  
  // 4. Upsert into Pinecone, keyed by URL + chunk position
  const vectors = chunks.map((chunk, i) => ({
    id: `${url}-${i}`,
    values: embeddings.data[i].embedding,
    metadata: { url, title: data.title, content: chunk }
  }));
  
  await index.upsert(vectors);
}
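
To populate the knowledge base, run ingestUrl over every page you care about. A simple sequential loop (the URLs here are placeholders) keeps you well clear of rate limits:

const urls = [
  'https://example.com/docs/quickstart',
  'https://example.com/docs/authentication'
];

for (const url of urls) {
  await ingestUrl(url);
  console.log(`Ingested ${url}`);
}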

Query the Knowledge Base

async function search(query: string) {
  // Embed the query with the same model used at ingest time
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  
  // Retrieve the 5 closest chunks, with their stored metadata
  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 5,
    includeMetadata: true
  });
  
  return results.matches.map(m => ({
    content: m.metadata?.content,
    url: m.metadata?.url,
    score: m.score
  }));
}
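
Because each vector carries its source url in metadata, you can also scope a search to a single page with Pinecone's query-time metadata filters. A sketch reusing the clients defined above (searchWithinUrl is a name introduced here for illustration):

async function searchWithinUrl(query: string, sourceUrl: string) {
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });

  // Only match chunks whose stored url equals sourceUrl
  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 5,
    includeMetadata: true,
    filter: { url: { $eq: sourceUrl } }
  });

  return results.matches;
}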

Use with LLM

async function answerQuestion(question: string) {
  // Retrieve the most relevant chunks for the question
  const context = await search(question);
  
  // Ground the model in the retrieved content via the system prompt
  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      { 
        role: 'system', 
        content: `Answer based on this context:\n${context.map(c => c.content).join('\n')}`
      },
      { role: 'user', content: question }
    ]
  });
  
  return response.choices[0].message.content;
}
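
One refinement worth making: include each chunk's source URL in the context so the model can cite where an answer came from. A sketch of the adjusted prompt assembly (answerWithSources is a name introduced here):

async function answerWithSources(question: string) {
  const context = await search(question);

  // Prefix each chunk with its source so the model can cite it
  const contextBlock = context
    .map(c => `Source: ${c.url}\n${c.content}`)
    .join('\n\n');

  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      {
        role: 'system',
        content: `Answer based on this context and cite the source URLs you used:\n${contextBlock}`
      },
      { role: 'user', content: question }
    ]
  });

  return response.choices[0].message.content;
}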

Related Guides

  • RAG Pipeline Guide
  • LLM Context Optimization
Tags: Pinecone, Vector Database, RAG, Tutorial
Alex Rivera

ML Engineer at Tryb. Alex builds vector search systems.

