How Scope Analyzes a Codebase in 2 Minutes
When you connect a GitHub repo — or sync a local codebase via scope_sync through MCP — Scope takes about two minutes to build a complete structural model of your codebase: entities, relationships, endpoints, conventions, and domain architecture. Here's how the five-layer pipeline works under the hood.
The pipeline at a glance
Scope's codebase analyzer runs five layers, each building on the previous:
- Layer 1: Tree-sitter AST extraction (free, deterministic)
- Layer 2: Code graph + PageRank (free, deterministic)
- Layer 3: Schema-driven extraction (free, deterministic)
- Layer 4: LLM semantic interpretation (AI-powered)
- Layer 5: Domain intelligence (AI-powered, optional)
The first three layers are completely deterministic — no AI costs, no variability. The last two use Claude Sonnet for semantic understanding. This hybrid approach keeps costs low while producing rich, accurate results.
Layer 1: Tree-sitter AST extraction
Tree-sitter parses source files into abstract syntax trees without executing any code. Scope extracts:
- Classes and methods — with parameter types and return types
- Functions — standalone and exported
- Route definitions — framework-specific patterns (Express, Rails, FastAPI, etc.)
- Import graphs — which files depend on which
This works across languages: TypeScript, Python, Ruby, Go, Rust, Java, and more. The output is a structured map of every symbol in the codebase.
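As a toy illustration of this layer, the sketch below uses Python's stdlib ast module as a stand-in for tree-sitter (Scope's actual parser works across many languages; like tree-sitter, this never executes the code it parses). The sample source and the shape of the returned symbol map are invented for the example:

```python
import ast

def extract_symbols(source: str) -> dict:
    """Collect top-level classes (with methods), functions, and imports."""
    tree = ast.parse(source)
    out = {"classes": [], "functions": [], "imports": []}
    for node in tree.body:  # top-level declarations only
        if isinstance(node, ast.ClassDef):
            methods = [m.name for m in node.body
                       if isinstance(m, (ast.FunctionDef, ast.AsyncFunctionDef))]
            out["classes"].append({"name": node.name, "methods": methods})
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            out["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            out["imports"].append(node.module or "")
    return out

sample = """
import os
from typing import List

class UserService:
    def find(self, user_id): ...
    def create(self, attrs): ...

def health_check(): ...
"""
print(extract_symbols(sample))
```

Run this over every file and you have the raw material for the next layer: a symbol table per file plus the import edges between files.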
Layer 2: Code graph + PageRank
The import graph from Layer 1 becomes a dependency graph. Scope runs PageRank on this graph to identify the most important symbols — the "hubs" of your codebase.
A file that's imported by 20 other files ranks higher than a utility imported by 2. This ranking helps Layer 4 focus LLM analysis on the code that matters most. The top 50 ranked symbols get priority in the LLM prompt.
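To make the ranking concrete, here is a minimal pure-Python PageRank over a toy import graph. The file names are invented, and Scope's actual damping and iteration settings aren't public — this is just the standard algorithm:

```python
def pagerank(graph: dict, damping: float = 0.85, iters: int = 50) -> dict:
    """graph maps each file to the files it imports (outgoing edges)."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in nodes}
        for src in nodes:
            targets = graph.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling file: spread its rank evenly
                for node in nodes:
                    new[node] += damping * rank[src] / n
        rank = new
    return rank

imports = {
    "routes.py":   ["services.py", "models.py"],
    "services.py": ["models.py", "db.py"],
    "jobs.py":     ["services.py", "models.py"],
    "models.py":   ["db.py"],
    "db.py":       [],
}
ranks = pagerank(imports)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Heavily imported files like models.py and db.py float to the top, which is exactly the signal Layer 4 uses to decide which files deserve LLM attention.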
Layer 3: Schema-driven extraction
This is the ground truth layer. Scope has dedicated parsers for:
- Rails schema.rb — tables, columns, types, indices, foreign keys
- Prisma schemas — models, fields, relations, enums
- GraphQL schemas — types, queries, mutations, subscriptions
- SQL migrations — DDL statements for any framework
Schema-extracted entities are treated as ground truth. When the LLM in Layer 4 produces its interpretation, a post-merge step guarantees that every schema-derived entity and field is present in the final output. The LLM can add behavioral context, but it cannot override schema facts.
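As an illustration of what a dedicated parser does here, this toy extractor pulls models and fields out of a Prisma schema with a regular expression. The real parsers are far more complete — this sketch ignores attributes, enums, and relation metadata beyond the field type:

```python
import re

PRISMA_SAMPLE = """
model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  posts     Post[]
}

model Post {
  id        Int      @id
  title     String
  author    User     @relation(fields: [authorId], references: [id])
  authorId  Int
}
"""

def parse_prisma(schema: str) -> dict:
    """Return {model_name: [{"name": field, "type": type}, ...]}."""
    entities = {}
    for m in re.finditer(r"model\s+(\w+)\s*\{([^}]*)\}", schema):
        name, body = m.group(1), m.group(2)
        fields = []
        for line in body.strip().splitlines():
            parts = line.split()
            if len(parts) >= 2:
                fields.append({"name": parts[0], "type": parts[1]})
        entities[name] = fields
    return entities

print(parse_prisma(PRISMA_SAMPLE))
```

Because this output comes straight from the schema file, it needs no LLM interpretation — which is what makes it trustworthy as ground truth.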
Layer 4: LLM semantic interpretation
Layers 1–3 tell us what exists. Layer 4 tells us what it means. Claude Sonnet receives:
- The tree-sitter extraction (classes, functions, routes)
- The top-50 PageRank symbols
- The schema entities and associations
- File contents for the most important files
The LLM produces:
- Business logic descriptions — what each entity does in domain terms
- User flows — how users interact with the system
- Tech stack summary — framework choices with reasoning
- Conventions — naming patterns, file organization, API styles
After the LLM returns its output, merge_schema_entities() runs to ensure all schema-derived entities are preserved. The LLM enriches — it never overrides ground truth.
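A sketch of what that post-merge guarantee might look like — the data shapes are hypothetical, and Scope's actual merge_schema_entities() signature isn't public:

```python
def merge_schema_entities(llm_entities: list, schema_entities: list) -> list:
    """Schema facts win; the LLM may only add context on top of them."""
    merged = {e["name"]: dict(e) for e in llm_entities}
    for schema_entity in schema_entities:
        entity = merged.setdefault(schema_entity["name"],
                                   {"name": schema_entity["name"], "fields": []})
        known = {f["name"]: f for f in entity.get("fields", [])}
        for field in schema_entity["fields"]:
            if field["name"] in known:
                known[field["name"]]["type"] = field["type"]  # schema overrides
            else:
                entity.setdefault("fields", []).append(field)
    return list(merged.values())

# The LLM hallucinated a "text" type and missed the id field entirely;
# the schema supplies both, and a whole entity the LLM never mentioned.
llm_output = [{"name": "User", "description": "An account holder",
               "fields": [{"name": "email", "type": "text"}]}]
schema_truth = [{"name": "User", "fields": [{"name": "id", "type": "Int"},
                                            {"name": "email", "type": "String"}]},
                {"name": "Post", "fields": [{"name": "id", "type": "Int"}]}]
print(merge_schema_entities(llm_output, schema_truth))
```

The LLM's description survives, but every schema field and type comes out of the merge intact.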
Layer 5: Domain intelligence
When a codebase has 3 or more entities, Layer 5 runs an additional LLM pass to produce:
- Domain groupings — which entities belong to which bounded context
- Architectural patterns — repository pattern, service layer, event sourcing, etc.
- Key files — the files that matter most for understanding the codebase
This layer is optional and only runs when there's enough complexity to warrant it.
File fetching: what gets analyzed
Scope doesn't download your entire repo. It uses a priority-based file selection system:
- Priority 0: Manifests — package.json, Cargo.toml, schema.prisma
- Priority 1: Schema files — schema.rb, migrations
- Priority 2: Model files
- Priority 3: API and business logic — routes, controllers, services
- Priority 4: Frontend — pages, components
- Priority 5–7: Config, docs, everything else
Maximum 500 files, 100KB per file, 1.5MB total. Files are fetched in parallel (20 concurrent) and processed in memory — no code is persisted to Scope's database.
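The selection logic can be sketched roughly like this — the glob patterns are illustrative guesses at each tier, not Scope's actual rules:

```python
import fnmatch

# Illustrative patterns per tier; the real matcher is more thorough.
PRIORITY_RULES = [
    (0, ["package.json", "Cargo.toml", "*schema.prisma"]),
    (1, ["*schema.rb", "db/migrate/*"]),
    (2, ["*models*"]),
    (3, ["*routes*", "*controllers*", "*services*"]),
    (4, ["*pages*", "*components*"]),
]

def priority(path: str) -> int:
    for tier, patterns in PRIORITY_RULES:
        if any(fnmatch.fnmatch(path, p) for p in patterns):
            return tier
    return 7  # config, docs, everything else

def select_files(files, max_files=500, max_file_bytes=100_000, max_total=1_500_000):
    """files: iterable of (path, size_in_bytes). Returns paths to fetch."""
    selected, total = [], 0
    for path, size in sorted(files, key=lambda f: priority(f[0])):
        if size > max_file_bytes:
            continue  # skip oversized files entirely
        if len(selected) == max_files or total + size > max_total:
            break
        selected.append(path)
        total += size
    return selected

repo = [("src/routes.ts", 4_000), ("package.json", 1_200),
        ("db/migrate/001_init.rb", 3_000), ("assets/bundle.bin", 900_000_000),
        ("README.md", 2_000)]
print(select_files(repo))
```

Sorting by tier before applying the caps means that when the budget runs out, it's the low-priority files that get dropped, never the manifests and schemas.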
The output
After ~2 minutes, you get a structured model of your codebase:
- Every entity with fields, types, and relationships
- Every endpoint with methods, paths, and connected entities
- User flows with step-by-step breakdowns
- Tech stack with framework details
- Naming conventions, file organization, and patterns
This model is stored as vectors in Qdrant and served via MCP to any AI coding tool. When Claude Code calls get_context(scope: "entities"), it gets this pre-analyzed output — not raw file contents.
Try it
Connect a GitHub repo or sync your local codebase via MCP at within-scope.com and see what Scope finds. The analysis runs once, and the structured context is available to every AI tool in your workflow.