Code context API for AI agents.
Embedding-quality retrieval from any repo — no vector DB, no indexing.
Just run npx contextpacker <repo> "query" or call the API.
Give an agent the right files, it answers correctly. Give it the wrong files, it hallucinates. Compare 3 ways to pick those files.
You control the access
We fetch your files over HTTPS (same as your IDE), read them, and delete them. Nothing is stored. Use a read-only PAT that expires in 1 hour — revoke it after testing.
curl https://contextpacker.com/v1/packs \
  -H "X-API-Key: $CP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "repo_url": "https://github.com/your-org/private",
    "query": "How does auth work?",
    "vcs": {
      "provider": "github",
      "token": "'"$GITHUB_PAT"'"
    }
  }'
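The same private-repo call from Python, if that is more convenient. This is a sketch using httpx; the request fields mirror the curl example above, and CP_KEY / GITHUB_PAT are assumed to be environment variables.

import os

import httpx

resp = httpx.post(
    "https://contextpacker.com/v1/packs",
    headers={"X-API-Key": os.environ["CP_KEY"]},
    json={
        "repo_url": "https://github.com/your-org/private",
        "query": "How does auth work?",
        "vcs": {
            "provider": "github",
            # Short-lived read-only PAT, sent per request and never stored
            "token": os.environ["GITHUB_PAT"],
        },
    },
)
resp.raise_for_status()
print(resp.json()["markdown"])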
Benchmarked on real repos
177 questions across 14 repos, including private codebases. Every approach got the same ~6,000-token budget.
| Approach | Hit@10 | NDCG | Answer Quality | Setup |
|---|---|---|---|---|
| No context | — | — | 4.9/10 | None |
| Embeddings RAG | 98% | 0.79 | 8.6/10 | Vector DB |
| ContextPacker | 98% | 0.92 | 8.5/10 | None |
Tested on private repos the model had never seen in training (zero LLM prior). Same answer quality as embeddings, +13 points of NDCG on file ranking (0.92 vs 0.79), and zero infrastructure.
Why not build your own RAG?
You could. Here's what that takes:
No pre-indexing
Embeddings need you to index first, then wait. We work on the first call: just pass the repo URL.
No vector DB
Pinecone, Weaviate, Chroma — pick one, pay for it, keep it in sync. Or just call our API.
Structural awareness
We read the file tree. We know src/ from tests/, and entry points from utils.
We're infrastructure, not a product. No dashboards, no onboarding. Just an API.
Under the hood
Clone + index on-the-fly
We shallow-clone your repo, build a file tree with symbol extraction, and delete the clone afterwards. Nothing is stored.
LLM picks the files
A fast model (Gemini Flash) reads the tree + your query and selects ~8 files. Smarter than cosine similarity.
Pack to your token budget
We fit files into your limit. Large files get truncated intelligently, not dropped entirely.
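If you want a mental model for those three steps, here is a rough, self-contained sketch. It is illustrative only, not the actual implementation: a naive keyword filter stands in for the LLM file selection, and character counts stand in for tokens.

import os
import shutil
import subprocess
import tempfile


def pack_repo(repo_url: str, query: str, max_file_chars: int = 4000) -> str:
    """Simplified sketch of the clone -> select -> pack flow; not the real service."""
    workdir = tempfile.mkdtemp()
    try:
        # 1. Clone + index on the fly: shallow clone, then walk the tree.
        subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, workdir], check=True
        )
        files = [
            os.path.join(root, name)
            for root, _, names in os.walk(workdir)
            if ".git" not in root
            for name in names
        ]

        # 2. File selection: the real service has a fast LLM read the tree +
        #    query and pick ~8 files; a naive keyword filter stands in here.
        terms = query.lower().split()
        selected = [f for f in files if any(t in f.lower() for t in terms)][:8]

        # 3. Pack to a budget: concatenate the picks, truncating large files
        #    instead of dropping them (characters stand in for tokens).
        parts = []
        for path in selected:
            with open(path, errors="ignore") as fh:
                rel = os.path.relpath(path, workdir)
                parts.append(f"## {rel}\n{fh.read()[:max_file_chars]}")
        return "\n\n".join(parts)
    finally:
        # Nothing stored: the clone is deleted as soon as the pack is built.
        shutil.rmtree(workdir, ignore_errors=True)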
Zero indexing. One HTTP call.
Just call the API with a repo URL and a question. No onboarding, no sync, no CLI required.
- ✓ Returns standard Markdown
- ✓ Auto-caching built in
- ✓ Works with any LLM
- ✓ Adjustable token budget
import os

import httpx
import openai  # openai>=1.0, reads OPENAI_API_KEY from the environment

KEY = os.environ["CP_KEY"]
query = "Where is session handling?"

resp = httpx.post(
    "https://contextpacker.com/v1/packs",
    headers={"X-API-Key": KEY},
    json={
        "repo_url": "https://github.com/pallets/flask",
        "query": query,
        "max_tokens": 6000,
    },
)
context = resp.json()["markdown"]

# Feed to your LLM
answer = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"{context}\n\nQuestion: {query}",
    }],
)
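For agent integrations, the two calls above collapse naturally into one helper. This sketch reuses only the endpoint and response fields shown above; the helper name and the CP_KEY / OPENAI_API_KEY environment variables are assumptions.

import os

import httpx
import openai


def ask_repo(repo_url: str, question: str, budget: int = 6000) -> str:
    # Fetch a context pack for the question, then answer with it.
    pack = httpx.post(
        "https://contextpacker.com/v1/packs",
        headers={"X-API-Key": os.environ["CP_KEY"]},
        json={"repo_url": repo_url, "query": question, "max_tokens": budget},
    ).json()["markdown"]
    reply = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{pack}\n\nQuestion: {question}"}],
    )
    return reply.choices[0].message.content


print(ask_repo("https://github.com/pallets/flask", "Where is session handling?"))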
Simple pricing
No subscriptions. Just credits.
1 credit = 1 API call. Top up anytime.