Software development has constantly evolved alongside tools that reduce friction. From compilers to IDEs to version control, every generation of developers has gained leverage. The last few years have brought what I believe is the most significant leap yet — large language models (LLMs) and coding agents.
Tools like Copilot, Cursor, and numerous other “agents” like Claude Code & co. can generate boilerplate in seconds, scaffold a REST API with a single prompt, or even summarise a 5,000-line pull request. I have personally used them to port logic out of a language I don’t want to deal with — no hate to C++, but when working with embedded devices, MicroPython is such a time saver. With the power of LLMs, I can condense miscellaneous work that would otherwise eat hours of my focus time into minutes.
I have watched these tools evolve over the last two years, and I can confidently say they are here to stay. They are not perfect, but they are getting better every day. I have been daily-driving Copilot and Claude Code since they were released, and I’d claim I’m somewhat of a power user myself. Someday I’ll write a blog post about my experience with these tools, but for now, I want to focus on a different aspect of this evolution.
The effectiveness of AI coding assistance depends not only on the model’s intelligence, but also on the quality of the codebase it interacts with.
Pretty straightforward, right? But let me explain why I think this is a big deal, and also try to address some of the FUD about these tools that does the rounds on Twitter every day.
The Rise of Coding Automation
Early “AI pair programmers” were mostly autocomplete on steroids. They could spit out snippets, but context was limited. Today’s LLM-based tools ingest entire projects, understand inter-dependencies, and adapt to your coding patterns. This means I can completely offload repetitive work like DTO generation, CRUD endpoints, and unit test scaffolding to the machines. I spend less time grinding through boilerplate and more time focusing on domain logic and architecture.
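To make “repetitive work” concrete, here’s the kind of glue code I mean. The `UserDTO` and `user_to_dto` below are hypothetical names, a minimal sketch of the boilerplate I now happily hand off:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical DTO of the sort I offload to an LLM without a second thought.
@dataclass
class UserDTO:
    id: int
    email: str
    created_at: datetime

def user_to_dto(row: dict) -> UserDTO:
    """Map a raw DB row to the transport object: pure, boring glue."""
    return UserDTO(id=row["id"], email=row["email"], created_at=row["created_at"])
```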
But this doesn’t mean developers are “done for.” It means our jobs have shifted from writing every line ourselves toward designing, curating, and maintaining code that humans and machines can reason about.
Cutting Through the Noise
“AI will replace all developers.”
Just describe features in English and boom—software ships itself.
This is fantasy. Software is more than code; it’s architecture, systems thinking, tradeoffs, and a touch of empathy for future users.
As someone writing software for a living, I wish it were that easy. I would love to run a bunch of Claudes and mint money selling software to the world.
I write code that makes LLMs go brrr, so trust me when I say that they are utterly useless for some of the things I do. I routinely turn off Copilot and force-kill Claude for about 20-30% of my work, whenever they start generating garbage on novel parts of the codebase — for example, OpenAI’s Responses API. It had been released just the night before, and Claude had no clue. It generated wonderfully consistent garbage, but garbage nonetheless. It simply hadn’t seen the API yet, and no amount of prompting or web searching could fix that.
“LLMs are useless.”
Because they sometimes hallucinate or generate brittle code, critics just dismiss them outright.
That ignores the productivity gains already proven in industry settings. I have bootstrapped complete features on in-flight WiFi with Claude. The truth is in between: LLMs are tools, not magic wands. Nor do they make you Hermione, opening locked doors with “Alohomora”. Their usefulness scales with the structure and readability of the code they touch.
The Sweet Spot for Humans and LLMs
Uncle Bob didn’t write “Clean Code” with LLMs in mind, but he might as well have. (Let’s ignore the politics and the Clean Code dogmatists for now. It’s a nice book; read it if you haven’t.)
Consider a few principles (a small before-and-after sketch follows the list):
Small, single-responsibility functions are easier for humans to test, and easier for models to “understand” in scope.
Use meaningful variable and class names that act as semantic breadcrumbs for LLM reasoning.
Have a consistent structure. Predictable patterns make it simpler for models to complete code correctly.
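Here’s that before-and-after sketch of these principles in action (the functions, field names, and data shapes are hypothetical):

```python
from datetime import date

# Before: cryptic names and unclear scope. A reviewer (or a model)
# has to guess what "d", "f", and "st" mean.
def proc(d, f):
    return [x for x in d if x["st"] == "A" and x["dt"] > f]

# After: one responsibility, and names that double as documentation.
def active_orders_since(orders: list[dict], cutoff: date) -> list[dict]:
    """Return active orders placed after `cutoff`."""
    return [o for o in orders if o["status"] == "active" and o["placed_on"] > cutoff]
```

Both versions do the same thing, but the second one tells you (and the model) what it does without any surrounding context.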
Now flip the scenario: take a file from a poorly written legacy monster with 30,000-line files, cryptic variable names, and spaghetti logic. Working on such code is brutal for humans and nearly impossible for even the smartest LLM to navigate effectively within its context window.
Good code isn’t just a human courtesy anymore.
A Peek Under the Hood — How LLM Tools Parse Your Code
LLMs don’t know words; they know tokens. Each token is a piece of the input text, and the model processes these tokens to understand context and predict responses. With enough training data, LLMs have learned to recognise patterns and make educated guesses about what comes next. So models, in essence, take X tokens in and generate Y tokens out. The more context (X) you can provide, the better the output (Y) tends to be. However, each model has its own context-window limits. Today’s models advertise windows of up to a few million tokens, but in practice, only a fraction of that is usable for coding tasks. Still, it’s a far cry from GPT-3.5’s 16K limit.
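If you want to see tokenisation in action, OpenAI’s tiktoken library makes it tangible. A minimal sketch, assuming `pip install tiktoken`:

```python
import tiktoken

# cl100k_base is the tokeniser used by GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

snippet = "def active_orders_since(orders, cutoff): ..."
tokens = enc.encode(snippet)

print(len(snippet), "characters ->", len(tokens), "tokens")
# Rule of thumb for English and code: roughly 4 characters per token.
```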
But our massive projects won’t fit in a single context window. Even if they did, you wouldn’t want to do that, because computing each token costs time and money. It’s not just one call to an API — agents run in a loop, making multiple calls to refine outputs, and all of their memory (including what they read and what they wrote) becomes part of the context window, so you run out of it pretty quickly.
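Some back-of-the-envelope arithmetic shows how quickly a loop like that burns through a window. The numbers below are illustrative assumptions, not any vendor’s actual limits or pricing:

```python
# Illustrative assumptions: a 200K-token window and an agent that keeps
# everything it reads and writes in context on every iteration.
CONTEXT_WINDOW = 200_000
SYSTEM_PROMPT = 2_000
PER_STEP_READS = 15_000   # tokens of source the agent pulls in per step
PER_STEP_OUTPUT = 3_000   # tokens the agent writes per step

used, steps = SYSTEM_PROMPT, 0
while used + PER_STEP_READS + PER_STEP_OUTPUT <= CONTEXT_WINDOW:
    used += PER_STEP_READS + PER_STEP_OUTPUT  # nothing is ever evicted
    steps += 1

print(f"Window exhausted after {steps} steps ({used:,} tokens used)")
# With these numbers: 11 steps. Not many, which is why agents have to be
# picky about what they load.
```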
So coding agents have to be smart about which parts of your codebase to load into the model at any given time. Much of this heavy lifting is done by the LLMs themselves, which, like any competent developer, try to guess the structure of the code they’re working with.
Next time you kick off “Agent mode”, look carefully at the tool calls Cursor makes. You will find a lot of searches across files with patterns matching file names, classes, and functions. If I gave you a codebase and no documentation, you would do the same thing. There are optimisations like vector search and embeddings, but fundamentally, it’s the same process.
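To demystify this, here’s roughly what one of those search tool calls boils down to: a plain pattern scan over the repo. A simplified sketch, not Cursor’s actual implementation:

```python
import re
from pathlib import Path

def find_symbol(repo_root: str, symbol: str) -> list[tuple[str, int, str]]:
    """Scan every Python file for a definition of `symbol`: the same grep
    an agent (or a new teammate) runs on an undocumented codebase."""
    pattern = re.compile(rf"^\s*(def|class)\s+{re.escape(symbol)}\b")
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if pattern.match(line):
                hits.append((str(path), lineno, line.strip()))
    return hits

# e.g. find_symbol(".", "active_orders_since")
```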
If your project has massive, tangled files, the assistant has to make hard tradeoffs about what to load. That means it may drop critical logic, or worse, misinterpret and hallucinate stuff. But if your code is modular, well-named, and neatly segmented, the assistant can pull just the right piece and you get faster, cheaper, and more accurate results.
Think of it as coding with an additional teammate: one who doesn’t complain, but has strict memory limits.
Coding for Humans and Machines
The lesson is simple: clean code has never mattered more. I’m not saying follow “Clean Code” dogmatically, but strive for clarity, modularity, and good naming. Readable, well-structured code isn’t just about making your future self (or a junior dev) grateful. It’s also about enabling the growing ecosystem of AI coding tools to work effectively. The better your codebase, the better these tools perform, and the more they boost your productivity and reduce friction.
So, in 2025 and beyond, don’t just code for humans. Code for humans and machines. That’s how you’ll get the best of both worlds.