
The 5 Reasons AI Tools Slowed Developers Down. All 5 Hit Harder in .NET.

Last week I spent 40 minutes reviewing a single service class that Claude Code had generated in about 90 seconds. The code compiled. The tests passed. I found three things that would have caused production issues. By the time I was done, I had spent more time reviewing that class than I would have spent writing it without AI.

That ratio felt wrong. But it’s not random.

In July 2025, METR published a randomized controlled trial on AI’s impact on developer productivity. Not a survey, not self-reported estimates. An actual RCT, the same methodology used in clinical drug trials. 16 experienced developers. 246 tasks in mature open-source repositories. Each task randomly assigned to either allow or disallow AI tools.

The result: developers using AI took 19% longer to complete their tasks. And after the study ended, those developers still believed AI had helped them. They estimated a 20% speedup. They had actually gotten a 19% slowdown.

The researchers didn’t just publish the number. They identified 5 specific factors that likely caused it. I’ve been thinking about those 5 factors for months, because every single one of them applies with extra force to .NET. This is my attempt to explain why.

What METR actually found

The study included developers averaging 5 years of experience on their specific repositories. They worked on projects with over 22,000 GitHub stars and more than a million lines of code. When AI was allowed, they primarily used Cursor Pro with Claude 3.5/3.7 Sonnet.

The researchers investigated 20 potential causes for the slowdown. They found evidence that 5 likely contributed:

  1. Over-optimism about AI usefulness
  2. High developer familiarity with their own repositories
  3. Large and complex repositories
  4. Low AI reliability (developers accepted less than 44% of generated suggestions)
  5. Implicit repository context (AI didn’t understand the environment it was operating in)

Each of these has a .NET-specific amplifier. Let me go through them.

Factor 1: Over-optimism, and why .NET makes it structural

METR found developers predicted a 24% speedup, then after completing the study still estimated a 20% speedup, despite actually being slower. That gap between perceived and actual productivity is the most interesting finding in the paper.

In most ecosystems, over-optimism about AI output eventually corrects. You run the Python code, the runtime throws, you find the bug. You run the TypeScript, the test fails, you fix it.

In C#, the compiler corrects you first. And when the compiler says green, your brain says done.

That’s the problem. The C# compiler is genuinely excellent at catching errors that would be silent failures in dynamically typed languages. So .NET developers have trained themselves to trust compilation as validation. It’s a reasonable habit. The compiler catches a lot.

But there’s an entire class of .NET bugs the compiler cannot see:

  • A scoped DbContext injected into a singleton (sketched after this list): compiles, works in dev, throws in production
  • A LINQ query that EF Core 10 translates to a full table scan: compiles, returns correct results, scans 2 million rows
  • A CancellationToken declared in every method signature but never passed to FirstOrDefaultAsync: compiles, tests pass, orphaned queries under load
  • Middleware registered in the wrong order: compiles, requests succeed, audit log writes “anonymous” for authenticated users
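
A minimal sketch of the first item, using hypothetical names (AppDbContext, OrderCache, Order): the captive dependency compiles cleanly, and nothing in the type system flags it.

// AppDbContext is registered as scoped, the EF Core default for AddDbContext.
builder.Services.AddDbContext<AppDbContext>();

// OrderCache is a singleton that captures the scoped context in its constructor.
// This compiles; whether it blows up at startup, at first use, or only under
// concurrent load depends on scope validation and traffic, not on the compiler.
builder.Services.AddSingleton<OrderCache>();

public class OrderCache
{
    private readonly AppDbContext _db;   // captive scoped dependency

    public OrderCache(AppDbContext db) => _db = db;

    public ValueTask<Order?> FindAsync(int id) => _db.Orders.FindAsync(id);
}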

When AI generates this code, the compiler reports no errors. The developer who has trained themselves to trust the compiler is the last person who will catch these bugs before production.

Over-optimism is a known human bias. .NET’s strong typing turns it into a structural problem.

Factor 2: High developer familiarity

METR found that the more experienced a developer was with their specific repository, the less useful AI suggestions were and the more time it took to evaluate and discard them.

This one has a particular flavor in .NET enterprise codebases. In a Python or TypeScript open-source project, the patterns are usually visible in the code itself. You can read a file and understand the conventions.

In ASP.NET Core applications, a significant portion of the architectural decisions live outside the files AI is looking at:

  • The DI registrations in Program.cs define what can be injected and at what lifetime. If AI generates a service that expects a scoped dependency, whether that's valid depends entirely on where and how that service itself is registered, which is in a file the AI may never see during generation (see the sketch after this list).
  • The EF Core model configuration in DbContext defines which navigation properties load eagerly, which relationships have cascade delete, which properties have value converters. AI generates queries against entity classes without seeing these configurations.
  • Middleware order in Program.cs determines what each component in the pipeline can see from the components that ran before it. AI adds custom middleware without knowing the full pipeline.
  • Appsettings configuration binding: AI generates IOptions<T> injection patterns that may or may not match what's actually in the configuration hierarchy.
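
To make that concrete, here is a rough sketch of the Program.cs wiring those decisions tend to live in, with hypothetical names; none of it is visible from the service class the AI is asked to generate.

var builder = WebApplication.CreateBuilder(args);

// Lifetimes: the repository is per-request, the pricing engine is process-wide,
// so the pricing engine must never take a scoped dependency.
builder.Services.AddDbContext<AppDbContext>(o =>
    o.UseSqlServer(builder.Configuration.GetConnectionString("Default")));
builder.Services.AddScoped<IOrderRepository, OrderRepository>();
builder.Services.AddSingleton<IPricingEngine, PricingEngine>();

// Options binding: IOptions<PaymentOptions> only works if this section exists.
builder.Services.Configure<PaymentOptions>(builder.Configuration.GetSection("Payments"));

var app = builder.Build();

// Pipeline order: anything that runs before UseAuthentication sees an anonymous user.
app.UseAuthentication();
app.UseAuthorization();
app.UseMiddleware<AuditLoggingMiddleware>();

app.Run();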

An experienced .NET developer holds all of this context in their head. When AI generates something that looks reasonable but conflicts with an implicit architectural decision they made three months ago, they catch it during review. That review takes time. Usually more than the 90-second generation did.

Factor 3: Large and complex repositories

METR explicitly identified large codebases as a contributing factor. The open-source projects in the study averaged over a million lines of code.

Enterprise .NET solutions regularly reach that scale. Our current solution is somewhere around 800,000 lines across 40+ projects with shared domain libraries, cross-cutting infrastructure concerns, and six years of accumulated architectural decisions that don’t fully live in any single file.

There’s a specific .NET complication here beyond just size. Visual Studio solution files can include dozens of projects. Each project has its own configuration, its own test project, its own namespace conventions. Claude Code sees the files you’re working in. It doesn’t see the full dependency graph unless you explicitly provide it.

So AI generates a service class in OrderService.Domain that takes a dependency on something in OrderService.Infrastructure. That dependency direction might violate the layering architecture you've been maintaining for three years. The code compiles. The new class works. You find the violation six months later when someone tries to add a new feature and the dependency graph has become a mess.
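
A hypothetical sketch of what that violation looks like on the ground; the compiler is perfectly happy with it.

// OrderService.Domain/PendingOrderPolicy.cs
using OrderService.Infrastructure.Persistence;   // Domain now points at Infrastructure

namespace OrderService.Domain;

public class PendingOrderPolicy
{
    // An Infrastructure type has leaked into the domain layer. It compiles,
    // it works, and nothing complains until the dependency graph has to change.
    private readonly OrderDbContext _db;

    public PendingOrderPolicy(OrderDbContext db) => _db = db;
}

Some teams catch this with architecture tests or analyzers, but only if the layering rule is written down somewhere the tooling can enforce it.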

The bigger the .NET solution, the more implicit context exists outside the files the AI has seen.

Factor 4: Low AI reliability (under 44% acceptance rate)

METR found developers in the study accepted less than 44% of AI-generated suggestions, then spent time reviewing and cleaning up the rest.

In a dynamically typed language, bad suggestions are usually obvious quickly. You reject them fast. The feedback loop is short.

In C#, a suggestion can be wrong in ways that require more than a glance to confirm. Consider an AI-generated LINQ query:

var orders = await _db.Orders
    .Include(o => o.Items)
    .Include(o => o.Customer)
    .Where(o => o.Status == OrderStatus.Pending)
    .OrderByDescending(o => o.CreatedAt)
    .ToListAsync(ct);

Is this right? It depends. Does the Items relationship have a filter on the include? Does Customer need to be included or would a projected DTO be better? Does this query hit an index on Status and CreatedAt in combination? Is there a composite index, and if so in what order? Does EF Core 10's query translation actually use it?

You can’t answer any of those questions by reading the generated code. You need to know the EF Core model configuration, the database schema, and ideally the query execution plan. Evaluating one suggestion correctly might take longer than writing the alternative yourself would have.
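
For contrast, here is the projected-DTO shape the review often ends up asking for, assuming only a few fields are actually needed (the DTO and its properties are hypothetical):

var orders = await _db.Orders
    .Where(o => o.Status == OrderStatus.Pending)
    .OrderByDescending(o => o.CreatedAt)
    .Select(o => new PendingOrderDto(
        o.Id,
        o.Customer.Name,                          // no Include needed in a projection
        o.Items.Sum(i => i.Quantity * i.UnitPrice)))
    .ToListAsync(ct);

Whether this version is actually better still depends on the model configuration and the indexes. That’s the point: both versions compile, and only one of them is right for a given schema and workload.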

Multiply that across a full PR and the 44% acceptance rate starts to look optimistic.

Factor 5: Implicit repository context

This is the most structural one. METR found AI was frequently unaware of the conventions, patterns, and requirements specific to each repository. Things developers take for granted because they’ve been working there for years.

In .NET, implicit context is everywhere:

The DI container is the most obvious example. Program.cs wires everything together, but AI generating a new service class sees the class in isolation, not the registration that will follow. Whether to inject ILogger<T> or a named logger, whether a background service should use IServiceScopeFactory or receive dependencies directly, whether to implement IHostedService or inherit BackgroundService: these are often decided by existing patterns in the codebase that AI may or may not have access to.
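
The background-service case is a good illustration, because the correct shape depends on lifetimes the AI can’t see from the class itself. A common pattern, sketched with hypothetical names:

public class OutboxProcessor : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;

    public OutboxProcessor(IServiceScopeFactory scopeFactory) => _scopeFactory = scopeFactory;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // A hosted service is effectively a singleton, so it can't take
            // AppDbContext in its constructor; it creates a scope per iteration instead.
            using var scope = _scopeFactory.CreateScope();
            var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();

            // ... process pending rows ...

            await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
        }
    }
}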

EF Core model configuration is another. If the project uses the Fluent API in IEntityTypeConfiguration<T> classes, AI might generate data annotations instead. If the project applies a global query filter that excludes soft-deleted (IsDeleted) rows on every entity, AI-generated queries that don't account for that filter will return different results than expected.
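
A global soft-delete filter is one line of configuration that quietly changes the meaning of every query in the solution (a sketch, assuming a hypothetical IsDeleted flag):

// In OnModelCreating, or in an IEntityTypeConfiguration<Order> class
modelBuilder.Entity<Order>().HasQueryFilter(o => !o.IsDeleted);

// Every query now excludes soft-deleted rows by default...
var active = await _db.Orders.ToListAsync(ct);

// ...and only an explicit opt-out brings them back.
var everything = await _db.Orders.IgnoreQueryFilters().ToListAsync(ct);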

The result is that even when AI generates syntactically correct, logically reasonable code, it frequently needs revision to align with the specific architectural decisions of the repository. And in a codebase with 40 projects and six years of history, there are a lot of those decisions.

What this means now

The METR study used early-2025 tools. Cursor Pro with Claude 3.5/3.7 Sonnet. In February 2026, METR tried to run a follow-up study with newer tools, including Claude Code and newer models. The study fell apart because developers refused to participate in the no-AI conditions. Too many developers said they wouldn’t work without AI even for $150/hour.

METR concluded that the current tools are probably faster than early-2025 tools, but they couldn’t measure it cleanly because the selection effects were too severe. The follow-up data showed something directionally positive, but they couldn’t trust it.

That’s actually useful information. The slowdown is real, the tools are improving, and the 5 factors METR identified are things you can specifically work against. You can provide the DI registration context. You can include the EF Core model configuration when generating queries. You can give the AI the full middleware pipeline when adding to it. You can stay aware that compilation success means nothing for the class of bugs above.

The developers in the METR study who got faster results with AI were the ones who had the most prior experience with the tools, specifically Cursor. One developer with 50+ hours of Cursor experience showed positive speedup while most of the others didn’t.

I don’t think it’s mainly tool familiarity. I think it’s knowing where the verification overhead lives.

Once you know that strong typing covers syntax but not DI lifetime rules, EF Core query translation, or async context behavior, you stop reviewing AI-generated .NET code like you’d review any other code. You review it like you’d review code from a developer who’s been at the company for three weeks. Technically capable, but missing four years of context you’ve accumulated about how this specific system actually behaves.

That framing changes what you look for and where you spend your review time.

The 40 minutes I spent on that service class last week: not wasted. Just different from what I expected. I know what I’m looking for now.