Pinecone + Web Content: Build a Knowledge Base
Create a vector database of web content for semantic search. Step-by-step Pinecone integration guide.
Alex Rivera
ML Engineer

Pinecone is a fully managed vector database widely used in AI applications. This tutorial shows you how to build a searchable knowledge base from web content.
Architecture Overview
URLs → Tryb Read → Chunk → OpenAI Embed → Pinecone Store → Query
Setup
npm install @pinecone-database/pinecone openai
Initialize Clients
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';
const pinecone = new Pinecone();
const openai = new OpenAI();
const index = pinecone.Index('web-content');
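Both clients read their API keys from the PINECONE_API_KEY and OPENAI_API_KEY environment variables. The 'web-content' index must already exist with a dimension that matches your embedding model (text-embedding-3-small produces 1536-dimensional vectors by default). If you haven't created it yet, the sketch below shows one way to do so from code; the exact spec options (serverless vs. pod, cloud, region) depend on your Pinecone plan and SDK version, so treat these values as assumptions to adjust.

async function ensureIndex() {
  await pinecone.createIndex({
    name: 'web-content',
    dimension: 1536,        // matches text-embedding-3-small
    metric: 'cosine',
    // Assumption: a serverless index on AWS; change cloud/region (or use a pod spec) for your account
    spec: { serverless: { cloud: 'aws', region: 'us-east-1' } },
    suppressConflicts: true, // don't fail if the index already exists
    waitUntilReady: true
  });
}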
Ingest Web Content
async function ingestUrl(url: string) {
  // 1. Fetch clean, LLM-ready content for the page
  const response = await fetch('https://api.tryb.dev/v1/read', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.TRYB_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url })
  });
  const { data } = await response.json();

  // 2. Chunk by paragraphs, dropping fragments too short to be useful
  const chunks: string[] = data.markdown
    .split('\n\n')
    .filter((c: string) => c.length > 50);

  // 3. Generate one embedding per chunk in a single API call
  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: chunks
  });

  // 4. Store the vectors in Pinecone, keeping the source URL and text as metadata
  const vectors = chunks.map((chunk, i) => ({
    id: `${url}-${i}`,
    values: embeddings.data[i].embedding,
    metadata: { url, title: data.title, content: chunk }
  }));
  await index.upsert(vectors);
}
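With ingestUrl in place, building the knowledge base is just a loop over your source pages. The URLs below are placeholders; ingesting sequentially keeps you comfortably within typical API rate limits.

async function buildKnowledgeBase() {
  // Placeholder source list; replace with your own URLs
  const urls = [
    'https://example.com/docs/getting-started',
    'https://example.com/docs/api-reference'
  ];
  for (const url of urls) {
    await ingestUrl(url);
    console.log(`Ingested ${url}`);
  }
}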
Query the Knowledge Base
async function search(query: string) {
  // Embed the query with the same model used at ingest time
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  // Retrieve the five nearest chunks along with their metadata
  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 5,
    includeMetadata: true
  });
  return results.matches.map(m => ({
    content: m.metadata?.content,
    url: m.metadata?.url,
    score: m.score
  }));
}
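Because every vector stores its source url in metadata, you can also scope a search to a single page using Pinecone's metadata filtering. A minimal sketch, assuming the same index and embedding setup as above:

async function searchInUrl(query: string, url: string) {
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 5,
    includeMetadata: true,
    // Only return chunks whose metadata.url matches the given page
    filter: { url: { $eq: url } }
  });
  return results.matches.map(m => m.metadata?.content);
}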
Use with LLM
async function answerQuestion(question: string) {
  // Retrieve the most relevant chunks and pass them to the model as context
  const context = await search(question);
  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      {
        role: 'system',
        content: `Answer based on this context:\n${context.map(c => c.content).join('\n')}`
      },
      { role: 'user', content: question }
    ]
  });
  return response.choices[0].message.content;
}
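Putting it all together: ingest a page once, then answer questions against the stored vectors on demand. The URL and question below are placeholders.

async function main() {
  await ingestUrl('https://example.com/docs/getting-started'); // placeholder URL
  const answer = await answerQuestion('How do I get started?');
  console.log(answer);
}

main().catch(console.error);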
Alex Rivera
ML Engineer at Tryb
Alex builds vector search systems.