How Scope Analyzes a Codebase in 2 Minutes

10 min read

When you connect a GitHub repo (or sync a local codebase via scope_sync over MCP), Scope takes about two minutes to build a complete structural model of your codebase: entities, relationships, endpoints, conventions, and domain architecture. Here's how the five-layer pipeline works under the hood.

The pipeline at a glance

Scope's codebase analyzer runs five layers, each building on the previous:

  1. Tree-sitter AST extraction (free, deterministic)
  2. Code graph + PageRank (free, deterministic)
  3. Schema-driven extraction (free, deterministic)
  4. LLM semantic interpretation (AI-powered)
  5. Domain intelligence (AI-powered, optional)

The first three layers are completely deterministic — no AI costs, no variability. The last two use Claude Sonnet for semantic understanding. This hybrid approach keeps costs low while producing rich, accurate results.

Layer 1: Tree-sitter AST extraction

Tree-sitter parses source files into abstract syntax trees without executing any code. Scope extracts:

  • Classes and methods — with parameter types and return types
  • Functions — standalone and exported
  • Route definitions — framework-specific patterns (Express, Rails, FastAPI, etc.)
  • Import graphs — which files depend on which

This works across languages: TypeScript, Python, Ruby, Go, Rust, Java, and more. The output is a structured map of every symbol in the codebase.
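To make the idea concrete, here is a minimal sketch of AST-based symbol extraction. Scope uses tree-sitter across many languages; this example uses Python's standard-library `ast` module as a stand-in, since it shows the same principle: the source is parsed, never executed, and every class, function, and import is collected into a structured map.

```python
import ast

def extract_symbols(source: str) -> dict:
    """Walk a parsed AST and collect classes, functions, and imports
    without executing any code (the same idea as Scope's Layer 1)."""
    tree = ast.parse(source)
    symbols = {"classes": [], "functions": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            symbols["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            symbols["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            symbols["imports"].append(node.module or "")
    return symbols

sample = """
import os
from typing import List

class UserService:
    def find(self, user_id: int) -> dict: ...

def main() -> None: ...
"""
print(extract_symbols(sample))
```

A real tree-sitter pass additionally records parameter types, return types, and framework-specific route patterns, but the shape of the output is the same: a symbol map per file.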

Layer 2: Code graph + PageRank

The import graph from Layer 1 becomes a dependency graph. Scope runs PageRank on this graph to identify the most important symbols — the "hubs" of your codebase.

A file that's imported by 20 other files ranks higher than a utility imported by 2. This ranking helps Layer 4 focus LLM analysis on the code that matters most. The top 50 ranked symbols get priority in the LLM prompt.
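The ranking step can be sketched in a few lines. This is a textbook iterative PageRank over a hypothetical import graph (not Scope's internal implementation): rank flows from importer to imported, so a file that many others depend on accumulates a high score.

```python
def pagerank(graph: dict[str, list[str]],
             damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Iterative PageRank. `graph` maps each file to the files it
    imports; rank flows from importer to imported, so widely-imported
    files end up with the highest scores."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for src in nodes:
            targets = graph.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling node: spread its rank evenly across all nodes
                for v in nodes:
                    new[v] += damping * rank[src] / n
        rank = new
    return rank

# Illustrative import graph: two entry points funnel into a service,
# which depends on the models module.
imports = {
    "routes.py": ["services.py"],
    "jobs.py": ["services.py"],
    "services.py": ["models.py"],
    "models.py": [],
}
ranks = pagerank(imports)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top)
```

Sorting by score yields the hub ordering; in Scope's pipeline, the top 50 of this list is what gets prioritized in the Layer 4 prompt.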

Layer 3: Schema-driven extraction

This is the ground truth layer. Scope has dedicated parsers for:

  • Rails schema.rb — tables, columns, types, indices, foreign keys
  • Prisma schemas — models, fields, relations, enums
  • GraphQL schemas — types, queries, mutations, subscriptions
  • SQL migrations — DDL statements for any framework

Schema-extracted entities are treated as ground truth. When the LLM in Layer 4 produces its interpretation, a post-merge step guarantees that every schema-derived entity and field is present in the final output. The LLM can add behavioral context, but it cannot override schema facts.
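As a rough illustration of what a schema parser does, here is a deliberately minimal sketch that pulls model names and (field, type) pairs out of a Prisma-style schema with a regex. Scope's dedicated parsers are far more thorough (relations, enums, attributes, and the other schema formats listed above), but the output shape is the same: entities and fields that become ground truth.

```python
import re

def parse_prisma_models(schema: str) -> dict[str, list[tuple[str, str]]]:
    """Extract model names and (field, type) pairs from a Prisma-style
    schema. A minimal sketch: a real parser also handles relations,
    enums, and field attributes."""
    models = {}
    for match in re.finditer(r"model\s+(\w+)\s*\{([^}]*)\}", schema):
        name, body = match.group(1), match.group(2)
        fields = []
        for line in body.strip().splitlines():
            parts = line.split()
            if len(parts) >= 2 and not parts[0].startswith("@"):
                fields.append((parts[0], parts[1]))
        models[name] = fields
    return models

schema = """
model User {
  id    Int     @id @default(autoincrement())
  email String  @unique
  posts Post[]
}

model Post {
  id       Int    @id
  title    String
  authorId Int
}
"""
print(parse_prisma_models(schema))
```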

Layer 4: LLM semantic interpretation

Layers 1–3 tell us what exists. Layer 4 tells us what it means. Claude Sonnet receives:

  • The tree-sitter extraction (classes, functions, routes)
  • The top-50 PageRank symbols
  • The schema entities and associations
  • File contents for the most important files

The LLM produces:

  • Business logic descriptions — what each entity does in domain terms
  • User flows — how users interact with the system
  • Tech stack summary — framework choices with reasoning
  • Conventions — naming patterns, file organization, API styles

After LLM output, merge_schema_entities() runs to ensure all schema-derived entities are preserved. The LLM enriches — it never overrides ground truth.
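The guarantee behind that merge can be sketched as a union where schema facts always win. The function below is a simplified illustration of the idea, not Scope's actual merge_schema_entities() (the entity shapes here are hypothetical): the LLM may add descriptions and new entities, but every schema-derived field survives with its schema type.

```python
def merge_schema_entities(schema_entities: dict, llm_entities: dict) -> dict:
    """Schema output is ground truth: every schema entity and field
    survives the merge. The LLM may add descriptions or extra entities,
    but cannot remove or retype schema-derived fields.
    (A sketch of the idea; the real merge is more involved.)"""
    merged = {name: dict(entity) for name, entity in llm_entities.items()}
    for name, entity in schema_entities.items():
        base = merged.setdefault(name, {"fields": {}, "description": ""})
        base.setdefault("fields", {})
        # Schema fields are always present and win on type conflicts.
        base["fields"] = {**base["fields"], **entity["fields"]}
    return merged

schema = {"User": {"fields": {"id": "Int", "email": "String"}}}
llm = {
    "User": {"fields": {"email": "Text"},  # wrong type: schema overrides
             "description": "An account holder"},
    "Session": {"fields": {"token": "String"},
                "description": "A login session"},
}
out = merge_schema_entities(schema, llm)
print(out["User"])
```

Here the LLM's enrichment (the description, the extra Session entity) is kept, while its mistaken `email: Text` is corrected back to the schema's `String` and the schema-only `id` field is restored.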

Layer 5: Domain intelligence

When a codebase has 3 or more entities, Layer 5 runs an additional LLM pass to produce:

  • Domain groupings — which entities belong to which bounded context
  • Architectural patterns — repository pattern, service layer, event sourcing, etc.
  • Key files — the files that matter most for understanding the codebase

This layer is optional and only runs when there's enough complexity to warrant it.

File fetching: what gets analyzed

Scope doesn't download your entire repo. It uses a priority-based file selection system:

  • Priority 0: Manifests — package.json, Cargo.toml, schema.prisma
  • Priority 1: Schema files — schema.rb, migrations
  • Priority 2: Model files
  • Priority 3: API and business logic — routes, controllers, services
  • Priority 4: Frontend — pages, components
  • Priority 5–7: Config, docs, everything else

Maximum 500 files, 100KB per file, 1.5MB total. Files are fetched in parallel (20 concurrent) and processed in memory — no code is persisted to Scope's database.
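A selection pass like this can be sketched as a priority lookup followed by a budgeted walk. The glob patterns below are illustrative stand-ins for the tiers above, not Scope's actual rules; the caps match the stated limits.

```python
from fnmatch import fnmatch

# Hypothetical glob -> priority table mirroring the tiers above.
PRIORITIES = [
    (0, ["package.json", "Cargo.toml", "schema.prisma"]),
    (1, ["db/schema.rb", "*/migrations/*"]),
    (2, ["*/models/*"]),
    (3, ["*/routes/*", "*/controllers/*", "*/services/*"]),
    (4, ["*/pages/*", "*/components/*"]),
]

def priority_of(path: str) -> int:
    for prio, patterns in PRIORITIES:
        if any(fnmatch(path, p) for p in patterns):
            return prio
    return 5  # config, docs, everything else

def select_files(files: list[tuple[str, int]],
                 max_files: int = 500,
                 max_file_bytes: int = 100_000,
                 max_total_bytes: int = 1_500_000) -> list[str]:
    """Pick (path, size) entries in priority order under the hard caps."""
    selected, total = [], 0
    for path, size in sorted(files, key=lambda f: priority_of(f[0])):
        if len(selected) >= max_files:
            break
        if size > max_file_bytes or total + size > max_total_bytes:
            continue  # oversized or over budget: skip this file
        selected.append(path)
        total += size
    return selected

candidates = [
    ("README.md", 1_000),
    ("app/models/user.rb", 2_000),
    ("package.json", 500),
    ("assets/bundle.js", 200_000),  # exceeds the per-file cap
]
print(select_files(candidates))
```

Manifests land first, models next, and anything over the per-file cap is dropped regardless of priority. The parallel fetching and in-memory processing happen downstream of this selection.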

The output

After ~2 minutes, you get a structured model of your codebase:

  • Every entity with fields, types, and relationships
  • Every endpoint with methods, paths, and connected entities
  • User flows with step-by-step breakdowns
  • Tech stack with framework details
  • Naming conventions, file organization, and patterns

This model is stored as vectors in Qdrant and served via MCP to any AI coding tool. When Claude Code calls get_context(scope: "entities"), it gets this pre-analyzed output — not raw file contents.

Try it

Connect a GitHub repo or sync your local codebase via MCP at within-scope.com and see what Scope finds. The analysis runs once, and the structured context is available to every AI tool in your workflow.