I have been using LLMs (Large Language Models, commonly referred to as AI) to help with software development work (and other things, like “how do I clean my classic Gore-Tex jacket”) for a couple of years now. I’ve tried a few different approaches and have finally, at least for the moment, settled on a workflow that works for me. A lot of this learning process has revolved around trying to get a handle on how to manage the technology itself which, honestly, can be a bit unwieldy.
My goal for this blog post is to discuss what has worked well and where I have run into issues. Interestingly, while I’m looking at LLMs through the lens of a software engineer, these observations apply to other professions as well.
Introduction#
I prefer to use the term Large Language Models or LLMs to Artificial Intelligence or AI. Artificial Intelligence implies intelligence and while these systems can certainly come across as smart, they are still fundamentally context driven systems. They are not thinking in the same way that people think. They do not understand intent, responsibility, tradeoffs or consequences in the way that a human does. They are exceptionally good at synthesizing information in useful ways which can sometimes look remarkably creative, but it is still very different from human understanding, intent or imagination.
At the most fundamental level, the LLM is trying to predict what text should come next. For example, if I say “The best …”, most people would probably guess that the next logical word is “thing”, then maybe “about” and so on. Modern LLMs are doing quite a bit more than this simple example, but at the end of the day, this next-token prediction process is still the foundation of how they generate answers.
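To make the idea concrete, here is a toy sketch of next-word prediction in Python. It is only an illustration: it counts which word follows which in a tiny made-up corpus and picks the most frequent follower. Real LLMs work on tokens with a neural network rather than a table of counts, and the corpus, `train_bigrams` and `predict_next` are hypothetical names invented for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration.
CORPUS = [
    "the best thing about coffee is the smell",
    "the best thing about winter is the snow",
    "the best part of the day is the morning",
]


def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]


model = train_bigrams(CORPUS)
print(predict_next(model, "best"))   # thing
print(predict_next(model, "thing"))  # about
```

The prediction is only as good as the text the counts came from, which is the same dependence on training data and context described above, just at a vastly smaller scale.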
Underneath all of the complexity, this is fundamentally a very large probability and statistics problem operating at enormous scale.
If you treat an LLM as an authority, you are going to run into problems.
This matters a lot. If you instead treat an LLM as a tool that can help you explore ideas, generate options, challenge assumptions, and accelerate parts of the work, then LLMs can become incredibly useful. The quality of the output you get from an LLM depends heavily on the context you provide, the data the model was trained on, and your ability to think critically and evaluate what comes back. This is the single most important thing to remember when using these systems.
My Use Case#
I’m using LLMs to write code that builds and manages infrastructure on Amazon Web Services (AWS). Generally speaking, infrastructure code needs to be very precise, which drives my requirements for both the code the LLM provides and the architecture discussions around it.
On the other hand, not all code deserves the same level of precision and, in my opinion, some software projects are well suited to a black box LLM approach. In these cases, you let the LLM write and own the code. If you need to fix it, you just throw it back at the LLM and let it deal with it. You don’t know or care what it did as long as it works. Of course, the tradeoff here is that you have no clue what the code is doing and, at times, might not understand it even if you looked. You are trading your understanding for speed. This is probably fine for some types of frontend work, like code that formats webpages. For example, I use this approach to write plugins on my website that display images along with information about the images. But a black box approach becomes much more problematic as the importance and/or complexity of the system increases. In fact, even parts that seem appropriate for a black box approach may suffer from performance and security issues if you are not careful.
If you are interested, I wrote an article on Dependency Gravity Models which goes into more detail about how I think about where/how development effort should be focused and how LLMs can participate in the code writing process.
What I have found is that LLMs don’t really replace the need for good engineering. If anything, they expose just how important good human engineers really are.
Observations#
These are in no particular order. Some of them are hardly surprising while others might even be things you have already built into your code without realizing it.
Velocity#
One of the biggest claims around LLMs is that they make the software development process dramatically faster. You will hear predictions of massive productivity gains while reducing the size of engineering teams or AI replacing large portions of the workforce.
This is sort of true and, as I mentioned earlier, it really depends on the type of work that you are doing. In my experience, the total time often stays about the same. What changes is where that time goes. I spend less effort translating ideas into code, and more effort thinking about interfaces, planning, and overall architectural fit. In the end I write less code by hand and wind up with a more refined design process, which ultimately results in a more complete product. Instead of ending up with more free time, I reinvest any time savings into the project I’m working on. For example, instead of writing code, I work on better documentation, which the LLM can then use when fixing bugs and refactoring.
Accuracy and Trust - Where things go wrong#
The accuracy of the generated code, and how far you can blindly trust it, depends on what you are working on. I have used LLMs to write simple programs without really paying much attention to the code the LLM is producing, basically treating it as a black box, or as code that the LLM “owns” in the project. If it does not work, I tell the LLM what does not work and have it try again. Oftentimes, I find that it has completely rewritten the code rather than making the targeted bug fix a developer might make.
On the other hand, if I’m working on code that needs to be precise, readable and maintainable by humans (cloud infrastructure code, for example), I find that I need to spend more time reviewing, testing and documenting the code. The LLM still writes a lot of the code, but I often find small mistakes.
One problem I used to run into quite a bit was that the LLM would not use existing code or modules and would just rewrite everything. This creates a real mess, as there are now multiple versions of the code that all do similar things. For example, we would build a module that knows how to create buckets (storage) on AWS and then a second module that builds the actual site and uses the bucket module. When creating the site module, the LLM ignores our bucket module and happily rewrites the bucket creation code inside the site module. If you are not paying attention, this hidden duplication can make debugging the system very complex, as you are looking at the bucket module while the problem is actually in the duplicated code in the site module, which should never have been there.
Another great example is having the LLM write tests. The tests will pass. You think everything is great until you notice that the LLM is completely bypassing the code you are trying to test and instead basically rewriting it inside the test.
LLMs optimize for passing tests, not testing correctness.
So, instead of the test calling the make_bucket function in the module, it goes directly to AWS, creates a bucket and confirms it’s there. Obviously this is a less than adequate test strategy. In fact, it’s not testing your code at all!
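The difference is easier to see side by side. The sketch below is a minimal illustration, not my actual module: `FakeS3Client` is a hypothetical in-memory stand-in for the AWS client, and the rule that `make_bucket` lowercases the name and turns on versioning is an assumption made up so the function has some behaviour worth testing.

```python
class FakeS3Client:
    """Hypothetical in-memory stand-in for an S3 client (not the real SDK)."""

    def __init__(self):
        self.buckets = {}

    def create_bucket(self, Bucket, **kwargs):
        self.buckets[Bucket] = kwargs

    def put_bucket_versioning(self, Bucket, VersioningConfiguration):
        self.buckets[Bucket]["versioning"] = VersioningConfiguration["Status"]


def make_bucket(client, name):
    """Code under test. Assumed rule: lowercase names, versioning enabled."""
    name = name.lower()
    client.create_bucket(Bucket=name)
    client.put_bucket_versioning(
        Bucket=name, VersioningConfiguration={"Status": "Enabled"}
    )
    return name


def test_bypasses_module():
    # Anti-pattern: never calls make_bucket, so it passes even if
    # make_bucket is completely broken.
    client = FakeS3Client()
    client.create_bucket(Bucket="site-assets")
    assert "site-assets" in client.buckets


def test_exercises_module():
    # Better: goes through make_bucket and checks the behaviour we care about.
    client = FakeS3Client()
    name = make_bucket(client, "Site-Assets")
    assert name == "site-assets"
    assert client.buckets[name]["versioning"] == "Enabled"


test_bypasses_module()
test_exercises_module()
```

Both tests pass, but only the second one would fail if someone broke `make_bucket`. A quick check when reviewing LLM-written tests is to search each test for a call to the function it claims to cover.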
LLMs are good at writing code, not so good at deleting it#
Often the LLM will fix bugs by working around code that has an error in it. So, if you have a function that returns some kind of data structure that is not formatted correctly, instead of fixing the function, it wraps it with code to manipulate the data structure. Yes, that approach works, but it fixes the symptoms, not the cause. This is obviously wrong unless the function is already in production and being used by other code.
Unfortunately, this highlights a common pattern I’ve seen when working with LLMs: they tend to avoid removing or changing existing code, and instead build around it.
LLMs struggle with subtraction. They’re additive by default.
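Here is a small sketch of the wrap-versus-fix pattern. Everything in it is hypothetical (the `RAW` record, the field names and the three `summarize` functions are made up for illustration), but the shape matches what I typically see.

```python
RAW = {"Name": "site-assets", "SizeMB": "12"}  # made-up input record


def summarize_buggy(raw):
    # Bug: the size is passed through as a string instead of a number.
    return {"name": raw["Name"], "size_mb": raw["SizeMB"]}


def summarize_wrapped(raw):
    # Additive "fix": the bug stays in place and a wrapper patches the result.
    # Every caller now has to remember to use the wrapper.
    summary = summarize_buggy(raw)
    summary["size_mb"] = int(summary["size_mb"])
    return summary


def summarize(raw):
    # Root-cause fix: correct the function itself, so no wrapper is needed.
    return {"name": raw["Name"], "size_mb": int(raw["SizeMB"])}


print(summarize_wrapped(RAW) == summarize(RAW))  # True
```

Both versions return the same result, which is exactly why the wrapper is tempting. The cost shows up later, when a new caller uses the unwrapped function and the original bug comes back.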
I can do “the ideal” solution - lowering the cost of ambition#
By this I mean that I have been able to build solutions that would have traditionally looked too ambitious, too complex or too expensive. A great example is building infrastructure on AWS with the Python SDK as opposed to using the Infrastructure as Code (IaC) tools provided by AWS like CloudFormation or the CDK. In the past, this type of approach would be seen as too ambitious but the LLM handles a lot of the implementation details and makes a previously unattainable solution attainable. In my case, going direct to the SDK gives my application more of an infrastructure management tool feel as opposed to trying to work around the assumptions and constraints of an IaC tool which is a big bonus.
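As a rough sketch of what “going direct to the SDK” can look like, the function below creates a bucket only if it does not already exist. It assumes a client object that exposes `list_buckets` and `create_bucket` in the shape the boto3 S3 client uses. `FakeS3Client` and `ensure_bucket` are hypothetical names for this example, and the fake exists only so the snippet runs without AWS credentials.

```python
class FakeS3Client:
    """Hypothetical in-memory stand-in that mimics two S3 client calls."""

    def __init__(self):
        self._names = []

    def list_buckets(self):
        return {"Buckets": [{"Name": name} for name in self._names]}

    def create_bucket(self, Bucket):
        self._names.append(Bucket)


def ensure_bucket(client, name):
    """Create the bucket only if it is missing. Returns True if created."""
    existing = {bucket["Name"] for bucket in client.list_buckets()["Buckets"]}
    if name in existing:
        return False
    client.create_bucket(Bucket=name)
    return True


client = FakeS3Client()
print(ensure_bucket(client, "site-assets"))  # True
print(ensure_bucket(client, "site-assets"))  # False
```

Against real AWS you would pass in `boto3.client("s3")` instead of the fake, and outside `us-east-1` the `create_bucket` call also needs a `CreateBucketConfiguration` with a `LocationConstraint`. Handling details like that is the kind of work the LLM takes off my plate.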
If you can have anything you want… what do you want?
The nice part about this, at least for the way I approach development, is the question “If you can have anything you want…what do you want?” A lot of people have a hard time thinking about intent first, before getting bogged down in implementation details, but I find this to be a mind clearing exercise…like “you know what would be really cool…” Forget about compute limits, databases, networks, cost, everything. What do you want this system to do? This is important as it helps define the problem you are trying to solve at a very high level.
Once you have established what it is that you are trying to achieve, you can use the LLM to help figure out the details. In this way, the LLM helps reduce the cost of exploring “ideal” designs so we can start the design process from intent, not constraints. Of course, eventually reality sets in and we have to deal with real constraints, but we are coming at them from a pure ideal standpoint, which can have a major impact on what our constraints will be. What’s the best database to use? How does the cost of compute vary across different services? What are our storage requirements and what should our backup strategy be? LLMs are great for this type of research and can often come up with solutions you are not familiar with, which can then be explored further. The speed-up in research time also gives us more time to think, not only about how to solve the problem but also about better designs.
The tradeoff here, though, is that you need to have a pretty good idea of what you are doing. A junior engineer is going to really struggle to ask the right questions, and the LLM will happily lead them down a path that sounds completely plausible but is also completely fictional.
Consensus vs Correctness#
This is a challenging one. Many people will point out that one of the major flaws of crowdsourcing development questions on Stack Overflow or Reddit is that the consensus is not always correct. The consensus tells you that there is a certain way to do it, that everybody does it that way, and that you are wrong if you don’t do it that way. Then, when you look closely, you start to realize that everyone is doing it wrong! How many times have you gone to Stack Overflow only to realize that the correct answer to a problem is fairly far down the list and has no upvotes?
Consensus produces working solutions. Engineering requires correct ones.
The LLM comes along, scrapes those sites along with all of their biases, and now implements the wrong solutions in your code. It works. By that I mean that the consensus solution will usually work (that’s why everyone upvoted it), but it’s not necessarily correct. For example, the consensus solution may be insecure, inefficient or rely on a software package that is no longer maintained.
The LLMs inherit the distribution of answers, not the correctness of them.
LLMs amplify dominant patterns, not optimal ones.
I had a bit of a hard time getting the LLM to write proper unit tests, and I think part of the problem is that, generally speaking, most people are not very diligent when they write tests. The LLM sees this behaviour and builds it into the solutions it produces. Of course, if you are just using the LLM to write and manage code, you may not actually need any/many tests. If you are building a complex system with very exacting requirements, you probably want to make sure your tests are extensive and bulletproof. The problem, though, is not that the LLM invented bad habits; it learned them from its training data.
Tools not Tribes#
Following the same thread as “the ideal” solution, another thing people have trouble with is loyalty to particular tools or brands. It honestly astonishes me how many people waste tons of time trying to convince other people that the tools or brands they use are the best and that everyone else should be doing it their way.
Think of “Python vs JavaScript”, “Microsoft vs Apple”, or, back in the day, “Emacs vs Vim(vi)”. Forget about loyalty! Use the tools that work for you. I don’t use a lot of Microsoft products because most of the work I do is with Open Source software. It’s not that I purposely avoid them, and if I ever needed to use one, I would.
Of course, we are now seeing this with LLMs. Many, many people I talk to about AI say stuff like “ok, but have you used Claude” or some other LLM. Yes, I have and do use them all. I find that different LLMs are good at different types of tasks. Tool selection is also complicated by the fact that LLM tech is still changing rapidly. So a tool like ChatGPT is very different now than it was a few months ago. Pick the tools that work the best for you and stop worrying about what everyone else is using.
Don’t trust one voice. Stay in control!
In fact, I would highly recommend that you use more than one LLM. Using a single LLM means that you start to trust it too much and inherit its biases without realizing it. Using multiple LLMs allows you to question, compare and contrast their output and, most importantly, means that you are in control of the conversation.
Discipline#
The interesting thing about using an LLM is that it doesn’t let me hide behind implementation effort anymore. I end up being more thorough, more disciplined, because I have the time and space to do it. So where in the past you might skip over some “nice to haves” because you don’t have time to build them, now you can have them all.
The hard parts are harder to ignore
The LLM will open up options in the architecture that you may not have considered. Of course, because it is writing the code anyway, adding some interesting implementation details is much easier, and much harder to ignore.
People talk about rapid prototyping - this is rapid prototyping on steroids!
The benefit is that the resulting code is more robust or feature rich than it would have been when writing by hand.
Ultimately, it’s about being forced to make better decisions because the implementation cost is lower.
Naming Things#
There is an old, somewhat overused saying: “The two hardest things about software engineering are cache invalidation, naming things and off-by-one errors”. The LLM’s ability to name things is probably one of the bigger time savers for me. I often struggle with naming variables, classes and functions/methods. The LLM usually gets these right, or right enough that I don’t spend a lot of time thinking about them.
A different way they help with names, though, is taking an idea or a concept and attaching a name to it. So, I may be working through a problem and the LLM comes back at me and says “what you are really talking about is effort displacement” (which is actually some of the feedback my LLM gave me about this article). I don’t think I would ever have come up with that by myself, but the term is effective in getting the main point of this article across.
So, straight from the LLM, the effort displacement that has been described above looks like:
- coding effort moves into planning,
- implementation effort moves into design,
- lower implementation cost increases ambition,
- faster iteration increases discipline requirements,
- the hard parts become harder to ignore.
This sounds right to me, and it’s a good way to sum up how I have integrated LLMs into my workflow.
Conclusion#
LLMs don’t remove the need for good engineering, they actually make the need for good engineering harder to ignore. Good software engineering is not about writing code, it’s about building solutions. LLMs make that distinction clearer. They reduce the amount of time that needs to be dedicated to coding and give the engineer a sounding board for exploring new ideas, challenging assumptions, and refining designs.
Ultimately, the work you produce is your responsibility. While the coding time can be reduced, it just means that the effort is shifted to other areas that traditionally get neglected like testing, documentation or even things like creating more robust solutions by adding more edge cases. It’s important to keep in mind though that LLMs can be biased and reflect consensus as opposed to being correct. Their usage requires discipline and not blind trust. They are tools to help you get your work done and, with that said, I would not go back to working without them.
The next post in this series will cover how LLMs were used in the development and implementation of dxaws.
Note: I used ChatGPT 5.5 as an editor to help critique the flow of this piece and refine the pull quotes. Everything else—the ideas and writing—is mine. (Except for this sentence 🫠)