Code context API for AI agents.
Embedding-quality retrieval from any repo — no vector DB, no indexing. Just call the API.
Built for AI coding agents · IDE extensions · PR review bots · documentation generators
Give an agent the right files and it answers correctly. Give it the wrong files and it hallucinates. Compare three ways to pick those files.
You control access
We fetch your files over HTTPS (same as your IDE), read them, and delete them. Nothing is stored. Use a read-only PAT that expires in 1 hour — revoke it after testing.
```bash
curl https://contextpacker.com/v1/packs \
  -H "X-API-Key: $CP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/your-org/private",
    "query": "How does auth work?",
    "vcs": {
      "provider": "github",
      "token": "'"$GITHUB_PAT"'"
    }
  }'
```
Benchmarked on real repos
177 questions across 14 repos, including private codebases, all under the same ~6,000-token budget.
| Approach | Hit@10 | NDCG | Answer Quality | Setup |
|---|---|---|---|---|
| No context | — | — | 4.9/10 | None |
| Embeddings RAG | 98% | 0.79 | 8.6/10 | Vector DB |
| ContextPacker | 98% | 0.92 | 8.5/10 | None |
Tested on private repos (zero LLM prior). Same answer quality as embeddings, 13 points higher NDCG (0.92 vs. 0.79), and zero infrastructure to run.
Why not just use embeddings?
You could. But here's what you'd need to build and maintain (sketched below):
No pre-indexing
Embeddings need you to index first, then wait. We work on first call — just pass the repo URL.
No vector DB
Pinecone, Weaviate, Chroma — pick one, pay for it, keep it in sync. Or just call our API.
Structural awareness
We read the file tree. We know src/ from tests/, entrypoints from utils.
We're a building block, not a platform. Skip the RAG pipeline — just call the API.
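For a sense of what that means in practice, here's a minimal sketch of the indexing pass a typical embeddings pipeline needs before it can answer its first question. The libraries and names are stand-ins (chromadb as the vector store, OpenAI for embeddings, a local checkout at `./repo`), not a prescription:

```python
# A typical embeddings-RAG indexing pass: chunk, embed, upsert.
# (Illustrative only; chromadb and OpenAI embeddings are stand-ins.)
import pathlib

import chromadb
from openai import OpenAI

client = OpenAI()
collection = chromadb.Client().create_collection("repo")

for i, path in enumerate(pathlib.Path("repo").rglob("*.py")):
    text = path.read_text(errors="ignore")[:8000]  # naive fixed-size chunking
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding
    collection.add(
        ids=[str(i)],
        embeddings=[emb],
        documents=[text],
        metadatas=[{"path": str(path)}],
    )
# ...and this index has to be rebuilt or kept in sync on every commit.
```

None of that exists with ContextPacker: there is no index to build or keep fresh, and the single HTTPS call shown above is the whole integration.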
Under the hood
Clone + index on-the-fly
We shallow-clone your repo, build a file tree with symbol extraction, delete it after. No persistence.
LLM picks the files
A fast model (Gemini Flash) reads the tree + your query and selects ~8 files. Smarter than cosine similarity.
Pack to your token budget
We fit files into your limit. Large files get truncated intelligently, not dropped entirely.
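Conceptually, the flow looks like the sketch below. This is illustrative only, not ContextPacker's actual code: `file_picker` is a stand-in for the LLM selection step, and the 4-characters-per-token packing is a rough approximation.

```python
# Illustrative sketch of the clone → select → pack flow (not ContextPacker's code).
import pathlib
import subprocess
import tempfile

def build_context(repo_url: str, query: str, file_picker, max_tokens: int = 6000) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        # 1. Shallow clone; the checkout vanishes when the `with` block exits.
        subprocess.run(["git", "clone", "--depth", "1", repo_url, tmp], check=True)
        root = pathlib.Path(tmp)
        tree = [str(p.relative_to(root)) for p in root.rglob("*")
                if p.is_file() and ".git" not in p.parts]

        # 2. A fast LLM (stubbed here as `file_picker`) reads the tree + query
        #    and returns the handful of paths most likely to answer it.
        selected = file_picker(tree, query)

        # 3. Pack the picked files into the budget (~4 chars per token),
        #    truncating large files instead of dropping them.
        budget, parts = max_tokens * 4, []
        for path in selected:
            text = (root / path).read_text(errors="ignore")[:budget]
            budget -= len(text)
            parts.append(f"## {path}\n\n{text}")
            if budget <= 0:
                break
        return "\n\n".join(parts)
```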
Zero indexing. One HTTP call.
Just call the API with a repo URL and a question. No onboarding, no sync, no CLI.
- ✓ Returns standard Markdown
- ✓ Auto-caching built in
- ✓ Works with any LLM
- ✓ Adjustable token budget
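For example, in Python (assuming the `httpx` and `openai` packages are installed, and `CP_KEY` and `OPENAI_API_KEY` are set in your environment):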
```python
import os

import httpx
import openai

KEY = os.environ["CP_KEY"]
query = "Where is session handling?"

resp = httpx.post(
    "https://contextpacker.com/v1/packs",
    headers={"X-API-Key": KEY},
    json={
        "repo_url": "https://github.com/pallets/flask",
        "query": query,
        "max_tokens": 6000,
    },
)
context = resp.json()["markdown"]

# Feed to your LLM
answer = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"{context}\n\nQuestion: {query}",
    }],
)
```
Simple pricing
No subscriptions. Just credits.
1 credit = 1 API call. Top up anytime.