LLM & RAG · Oct 18, 2024 · 10 min read

Pinecone + Web Content: Build a Knowledge Base

Create a vector database of web content for semantic search. Step-by-step Pinecone integration guide.

Alex Rivera, ML Engineer

Pinecone is a popular managed vector database for AI applications. This tutorial shows you how to build a searchable knowledge base from web content.

Architecture Overview

URLs → Tryb Read → Chunk → OpenAI Embed → Pinecone Store → Query

Setup

npm install @pinecone-database/pinecone openai
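
The ingestion code below uses the global fetch API, so you'll want Node 18 or newer. You'll also need three keys in your environment: PINECONE_API_KEY, OPENAI_API_KEY, and TRYB_API_KEY.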

Initialize Clients

import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

// Both clients read their API keys from the environment
// (PINECONE_API_KEY and OPENAI_API_KEY)
const pinecone = new Pinecone();
const openai = new OpenAI();

// Target an existing index named 'web-content'
const index = pinecone.index('web-content');
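
If the index doesn't exist yet, create it once before ingesting. The dimension must match your embedding model: text-embedding-3-small produces 1536-dimensional vectors by default. A minimal sketch, assuming a serverless index on AWS (adjust cloud and region for your account):

await pinecone.createIndex({
  name: 'web-content',
  dimension: 1536, // must match text-embedding-3-small
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }
});

New indexes take a moment to initialize before they accept upserts.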

Ingest Web Content

async function ingestUrl(url: string) {
  // 1. Fetch clean markdown for the page
  const response = await fetch('https://api.tryb.dev/v1/read', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.TRYB_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url })
  });
  if (!response.ok) {
    throw new Error(`Read failed for ${url}: ${response.status}`);
  }
  const { data } = await response.json();
  
  // 2. Chunk by paragraphs, dropping fragments under 50 characters
  const chunks: string[] = data.markdown
    .split('\n\n')
    .filter((c: string) => c.length > 50);
  
  // 3. Generate one embedding per chunk in a single API call
  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: chunks
  });
  
  // 4. Upsert into Pinecone, keyed by URL + chunk position
  const vectors = chunks.map((chunk, i) => ({
    id: `${url}-${i}`,
    values: embeddings.data[i].embedding,
    metadata: { url, title: data.title, content: chunk }
  }));
  
  await index.upsert(vectors);
}
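
To populate the knowledge base, run ingestUrl over every page you care about. A simple sequential loop (the URLs here are placeholders) keeps you well clear of rate limits:

const urls = [
  'https://example.com/docs/quickstart',
  'https://example.com/docs/authentication'
];

for (const url of urls) {
  await ingestUrl(url);
  console.log(`Ingested ${url}`);
}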

Query the Knowledge Base

async function search(query: string) {
  // Embed the query with the same model used at ingest time
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  
  // Retrieve the 5 closest chunks, with their stored metadata
  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 5,
    includeMetadata: true
  });
  
  return results.matches.map(m => ({
    content: m.metadata?.content,
    url: m.metadata?.url,
    score: m.score
  }));
}
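
Because each vector carries its source url in metadata, you can also scope a search to a single page with Pinecone's query-time metadata filters. A sketch reusing the clients defined above (searchWithinUrl is a name introduced here for illustration):

async function searchWithinUrl(query: string, sourceUrl: string) {
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });

  // Only match chunks whose stored url equals sourceUrl
  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 5,
    includeMetadata: true,
    filter: { url: { $eq: sourceUrl } }
  });

  return results.matches;
}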

Use with LLM

async function answerQuestion(question: string) {
  // Retrieve the most relevant chunks for the question
  const context = await search(question);
  
  // Ground the model in the retrieved content via the system prompt
  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      { 
        role: 'system', 
        content: `Answer based on this context:\n${context.map(c => c.content).join('\n')}`
      },
      { role: 'user', content: question }
    ]
  });
  
  return response.choices[0].message.content;
}
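
One refinement worth making: include each chunk's source URL in the context so the model can cite where an answer came from. A sketch of the adjusted prompt assembly (answerWithSources is a name introduced here):

async function answerWithSources(question: string) {
  const context = await search(question);

  // Prefix each chunk with its source so the model can cite it
  const contextBlock = context
    .map(c => `Source: ${c.url}\n${c.content}`)
    .join('\n\n');

  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      {
        role: 'system',
        content: `Answer based on this context and cite the source URLs you used:\n${contextBlock}`
      },
      { role: 'user', content: question }
    ]
  });

  return response.choices[0].message.content;
}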

Related Guides

  • RAG Pipeline Guide
  • LLM Context Optimization
Tags: Pinecone, Vector Database, RAG, Tutorial
Alex Rivera

ML Engineer at Tryb. Alex builds vector search systems.

