AI-Readiness Score: How to Measure If Your Codebase Is Ready for AI to Operate


Powertrend Engineering Team · April 04, 2026 · 9 min read
AI & Machine Learning

Before putting AI agents to work on your system, you need to know if the codebase is ready for it. The AI-Readiness Score evaluates six dimensions and indicates where to focus restructuring efforts.

Why most codebases are not ready for AI

Powertrend developed the AI-Readiness Score as part of its AI-Native Engineering process, the methodology for designing systems that AI agents can build, test, and deploy autonomously. Powertrend applies this assessment to every architectural restructuring project it conducts.

There is a common assumption: any codebase becomes "AI-Native" just by adding a context file for Cursor or setting up Claude with repository access. In practice, this produces agents that can make reasonable isolated suggestions but fail at complex tasks: they implement features that break undocumented invariants, write tests that depend on external state, and create hidden couplings.

The problem is not the agent. It is the codebase. And the AI-Readiness Score measures exactly that.

The 6 dimensions of the AI-Readiness Score

Each of the six dimensions is scored from 0 to 100. The final score is the weighted average, with weights reflecting each dimension's impact on agent autonomy.
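The weighted average can be sketched in code. The dimension names and weights below come from the article; the scoring function itself is an illustration, not Powertrend's actual implementation.

```typescript
// Six dimensions, each scored 0–100 (per the article)
type DimensionScores = {
  explicitness: number;
  modularity: number;
  contracts: number;
  testDeterminism: number;
  selfDescribability: number;
  verificationPipeline: number;
};

// Weights from the article; they sum to 1.0
const WEIGHTS: Record<keyof DimensionScores, number> = {
  explicitness: 0.20,
  modularity: 0.20,
  contracts: 0.20,
  testDeterminism: 0.20,
  selfDescribability: 0.10,
  verificationPipeline: 0.10,
};

// Final score = weighted average of the six dimension scores
function aiReadinessScore(s: DimensionScores): number {
  const total = (Object.keys(WEIGHTS) as (keyof DimensionScores)[])
    .reduce((sum, k) => sum + s[k] * WEIGHTS[k], 0);
  return Math.round(total);
}
```

A codebase scoring 50 on every dimension lands at 50 overall; raising only the 10%-weight dimensions moves the final score far less than raising the 20%-weight ones.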

Dimension 1: Explicitness (weight 20%)

How much of the system knowledge is explicit in code vs. implicit in human minds? The assessment analyzes: type coverage (TypeScript or equivalent), presence of named and encapsulated business rules, documentation of critical invariants as assertions or types, absence of magic numbers and constants without semantics.
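As a sketch of what "explicit" means here, consider a business rule captured as a named constant with its invariant enforced by an assertion. The names (`MAX_DISCOUNT_PCT`, `applyDiscount`) are invented for illustration:

```typescript
// Named business rule instead of a magic number buried in a calculation
const MAX_DISCOUNT_PCT = 30;

function applyDiscount(priceCents: number, discountPct: number): number {
  // Critical invariant documented as an assertion, not tribal knowledge:
  // an agent (or a new hire) discovers the rule by reading or violating it
  if (discountPct < 0 || discountPct > MAX_DISCOUNT_PCT) {
    throw new RangeError(`discountPct must be in [0, ${MAX_DISCOUNT_PCT}]`);
  }
  return Math.round(priceCents * (1 - discountPct / 100));
}
```

An agent that tries to implement a "50% off" feature hits the assertion immediately instead of silently breaking a rule that lived only in someone's head.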

Dimension 2: Modularity (weight 20%)

How well-defined are the boundaries between modules? The assessment analyzes: average module size, coupling index (how many dependencies each module has on average), presence of explicit contract interfaces, absence of circular dependencies and implicit shared state.
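A minimal sketch of an explicit module boundary, with hypothetical names: the consuming module depends on an interface and receives it explicitly, rather than reaching into another module's internals or shared state.

```typescript
// Explicit contract between modules: billing depends on this interface,
// not on the payments module's implementation details
interface PaymentGateway {
  charge(customerId: string, amountCents: number): Promise<{ ok: boolean; txId: string }>;
}

// The dependency is injected, so there is no hidden coupling and tests
// can substitute a fake gateway trivially
async function settleInvoice(
  gateway: PaymentGateway,
  customerId: string,
  amountCents: number
): Promise<string> {
  const result = await gateway.charge(customerId, amountCents);
  return result.ok ? `settled:${result.txId}` : "failed";
}
```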

Dimension 3: Contracts (weight 20%)

Are interfaces between components documented and verified? The assessment analyzes: presence of explicit input/output types in all public functions and modules, contract test coverage between services, use of schemas (OpenAPI, Zod, etc.) for boundary validation, documentation of side effects.
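To keep this sketch dependency-free, here is a hand-rolled boundary validator; in practice the article points at schema libraries like Zod or OpenAPI for this job. The input shape (`CreateUserInput`) is invented:

```typescript
// Explicit input type for a public boundary
type CreateUserInput = { email: string; age: number };

// Validation at the boundary: unknown data in, typed data (or an error) out
function parseCreateUserInput(raw: unknown): CreateUserInput {
  const o = raw as Record<string, unknown> | null;
  if (typeof o?.email !== "string" || !o.email.includes("@")) {
    throw new TypeError("email must be a string containing '@'");
  }
  if (typeof o?.age !== "number" || o.age < 0) {
    throw new TypeError("age must be a non-negative number");
  }
  return { email: o.email, age: o.age };
}
```

The payoff for an agent: every boundary rejects malformed data loudly, so a wrong implementation fails at the contract instead of corrupting state downstream.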

Dimension 4: Test Determinism (weight 20%)

Is the test suite reliable enough for the agent to use as a correctness oracle? The assessment analyzes: flakiness rate (tests that fail without code changes), unit test coverage in domain code, correct use of mocks for external dependencies, suite execution time (slow suites reduce agent feedback cycles).
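One common source of flakiness is a hidden dependency on real time. A deterministic alternative, sketched with invented names, injects the clock so tests never depend on when they run:

```typescript
// Time-dependent logic with the clock injected; defaults to the real clock
function isTrialExpired(trialEndMs: number, now: () => number = Date.now): boolean {
  return now() >= trialEndMs;
}

// In tests, a fixed clock replaces real time, making the result deterministic
const fixedNow = () => 1_700_000_000_000;
```

The same pattern applies to randomness, network calls, and filesystem state: each becomes an injected dependency the test can pin down.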

Dimension 5: Self-Describability (weight 10%)

Does the system know how to describe itself to an agent? The assessment analyzes: presence and quality of AGENTS.md, architectural decision documentation (ADRs), code naming (names that carry domain semantics vs. generic names), quality of error messages.
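As an illustration only (the contents below are invented, not a prescribed template), a minimal AGENTS.md might look like this:

```markdown
# AGENTS.md

## Build & verify
- `npm run verify` runs lint + typecheck + tests; keep it under 3 minutes.

## Invariants
- Prices are stored as integer cents, never floats.
- Discounts never exceed the documented cap.

## Architecture
- Decision records live in /docs/adr; read the relevant ADR before changing a boundary.
```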

Dimension 6: Verification Pipeline (weight 10%)

Can the agent autonomously verify its own implementation? The assessment analyzes: presence of CI/CD pipeline with clear gates, ability to run lint + typecheck + test locally in < 3 minutes, absence of environment dependencies to run tests, clarity of errors reported by the pipeline.
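One way to expose such a gate locally is a single composite script; the sketch below assumes a Node.js project and common tools (eslint, tsc, vitest), none of which the article prescribes:

```json
{
  "scripts": {
    "lint": "eslint .",
    "typecheck": "tsc --noEmit",
    "test": "vitest run",
    "verify": "npm run lint && npm run typecheck && npm run test"
  }
}
```

A single `npm run verify` entry point means the agent needs no knowledge of the toolchain to check its own work, and CI can run the exact same command.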

Interpreting the score

  • 0–30: The codebase is not ready for agent autonomy. Agents can help with very isolated tasks but will fail at any complex work.
  • 31–60: Agents can operate in parts of the system with close supervision. There is value, but the risk of undetected errors is high.
  • 61–80: Agents can implement medium-complexity features with human review focused on business logic. This is the economic viability threshold.
  • 81–100: Agents can operate with high autonomy. Human review is strategic, not operational. This is the state Powertrend achieves in the systems it delivers.
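The four bands above can be expressed as a small helper; the band labels are paraphrases of the article's descriptions, not official terminology:

```typescript
// Maps a 0–100 AI-Readiness Score to the article's four interpretation bands
function readinessBand(score: number): string {
  if (score <= 30) return "not ready for agent autonomy";
  if (score <= 60) return "supervised operation only";
  if (score <= 80) return "economically viable";
  return "high autonomy";
}
```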

How to improve your score

The good news: you do not need to rewrite the system. Most projects can go from an average score (35–50) to economically viable (60+) in 4 to 8 weeks by focusing on the highest-impact dimensions and highest-change-volume modules. Powertrend conducts this process as part of the AI-Native Engineering service.

  • See also: The 6 Principles of AI-Native Architecture → /blog/ai-native-architecture-principles
  • See also: AGENTS.md — the most important file in an AI-Native codebase → /blog/agents-md-codebase-ai-native
  • See also: From 50 to 5 engineers with AI-Native Architecture → /blog/team-shrunk-delivered-faster
  • Learn about our AI-Native Engineering service → /software-engineering

Tags

AI-Readiness, AI-Native Architecture, AI Agents, Code Quality, Software Engineering

Categories

AI & Machine Learning
