This essay is the result of months of thought, about three years of work with AI (namely, LLMs), and 15 years in the MSP industry. It is an attempt to sketch out the future as I see it coming.
This will dive into some of the technical details, but at as high a level as I can and in plain English. You need some grasp of the technical to understand where things are going.
I am not certain about everything, and I’ve tried to be clear about where I think there is the most room for things to go another way.
The Briefest of Primers
If you are reading this, I assume you have a working knowledge of AI, but let’s cover a couple of bases briefly.
AI is a vaguely defined term. Here, I use the term primarily to mean generative AI (GenAI), AI that “creates” things, such as images and text. Second, I primarily mean AI as we see it in large language models (LLMs), the type of GenAI that can output text based on input, like ChatGPT.
A model is a specific LLM by a specific vendor. Here, I write about Google, OpenAI and Anthropic, as they make the top models today. Their models are named confusingly and often out of order. GPT 3 came before o1, for instance.
It is enough to know that different models are essentially different versions of LLMs. They have different strengths and weaknesses, but with few exceptions, they are getting better and cheaper.
Things are Moving Fast
Whenever I say this, people nod assent because they think they know this, but they don’t. Unless you are following specific media1 and social media that regularly surfaces advancements early, and unless you are using the most advanced models2, you don’t understand where things really stand.
When ChatGPT hit the mainstream in 2022, it had a context limit of 4,096 tokens3. This means that after about 3,000 words you couldn’t chat with it anymore. 3,000 words! And that seemed cutting edge, if somewhat quirky. Facts were frequently wrong, and the chats had no knowledge of current events. A few months later, in March of 2023, this increased to 12,000 words4, and the newest model5 from OpenAI performed far better on a wide range of tasks.
At this point, if you were using AI for coding it was great on some things and horrible at others. Sometimes it nailed a bug, sometimes it made up libraries you could use that sounded real but didn’t exist. It was decent at things like brainstorming and copy-editing.
Then in early 2024, OpenAI announced 4o and in the summer Anthropic released Sonnet 3.5. Both were (and are) highly capable models at affordable prices, and both could search the web, although OpenAI’s was easily better for consumers.
All of a sudden facts were mostly right, if you weren’t trying to throw them off or ask about things too recent. If you had access to a program that used AI and combined it with web searches the quality went up significantly.
4o’s context limit is 90,000 words, and Sonnet’s is 150,000, the length of The Grapes of Wrath and longer than Pride and Prejudice.
October 2022: 3,000 words
June 2024: 150,000 words
April 2025: 750,000 words. Three weeks after the first draft of this essay, Google and OpenAI are both at 750,000 words. I can’t finish this essay without falling behind.
Then OpenAI released o1, which was even better at coding, and then o1-pro, which was even better than that! By a few weeks ago, “vibe coding” had become popular: telling a model the program that you want it to build, and then letting it build it without checking the code yourself.
There isn’t an easy way to quantify the progress, but I’d say, at least for technical work, the value of AI has increased nearly 100-fold over two years. I have friends that I’m sure would say even more.
That is today. What is tomorrow going to bring?
The one thing to note regarding the speed is that the basic underlying technology of these LLMs hasn’t changed in the last three years6. Instead of fundamental breakthroughs, the progress has come from throwing more computing power at the models, both while training them and while running them (inference).
We’ve also found that we can train models using models: if the “instructor” model has high-quality output, it can train a model to at least its own capability.
What we haven’t seen is much in the way of optimization7. While we’ve scaled these models to be larger, with a couple of exceptions we haven’t made them more efficient to run, in terms of resources, or rewritten our software to make better use of them, which is discussed at the end.
The Holy Grail
That is how fast AI is moving and where it is going in the short term, but what about the long term?
And what is the long term?
The Holy Grail of AI is AGI, Artificial General Intelligence (not the same thing as superintelligence).
And what is AGI? Well, that depends on who you ask. OpenAI says AGI is “A highly autonomous system that outperforms humans at most economically valuable work.”8
In other words, AGI will be AI that can do most of the work of any knowledge worker, better than that same person. AGI is the computer that can take your job.
Superintelligence is god-level AI, more like The Matrix, where it can “think” about things well beyond what any human can. I won’t dig into that at all here.
What do we mean by intelligence? Does a system need to demonstrate the correct “thoughts” or are the results enough? How do we measure the “thought process”? This could be an entire series of posts, but I suggest that any system that can mimic thoughts and reasoning successfully in new domains and on new problems be considered intelligent, regardless of any philosophical arguments to the contrary.
That means that if you give it a question that is unlike what it has seen before it is able to reason through it to produce a coherent answer.
Is AGI possible? Most of the leaders in the space say yes. OpenAI CEO Sam Altman, Anthropic CEO Dario Amodei, Meta’s lead AI researcher Yann LeCun, both Google co-founders Larry Page and Sergey Brin, and many, many more believe that AGI is possible, and coming soon.
I for one am in the probably camp, with low uncertainty. As I mentioned at the outset of my first piece, the underlying tech of AI hasn’t changed in the last three years. However, we have likely come close to hitting its limit, which is why exploiting its constraints matters so much. It’s also why we’ve seen a shift to using more computing and better (not just more!) training data to produce higher-quality models, because we’re plateauing on what the basic tech can do9, but not how far we can take it.
It seems likely that we’ll need a new breakthrough in the basic technology – and possibly math – before we see models that 100x the usefulness of the current models, the way we saw the usefulness of the current tech increase 100-fold from 2022 to 2025.
But whether or not we need new tech is an area of high uncertainty for me. It might be that Sam Altman figures out that throwing 5-10 megawatts at GPT-5 or GPT-6 effectively reaches AGI.
How long it would take to reach that next technology is also a big unknown. Maybe someone has it and is testing it, or producing the product but managed to keep it under wraps. It might be one post-graduate paper away.
However, more than anything I believe we can reach AGI because of how humans think. We think, mostly, in words10.
We use words as symbols that represent things in reality (and fiction!). Using words, I am communicating concepts to you right now. If I were to write about a huge canyon in the western United States, you’d know what I meant and probably be able to picture it. If I wrote about last night’s ballgame, you’d be able to follow along in your head at roughly the same level of detail I used to describe it.
Words are not themselves reality, but we’ve gotten so good with them that they can closely represent reality.
Consider this: Einstein didn’t experiment his way to the theory of relativity; he thought his way to it. And presumably, he thought his way to it using words.
If words can get us to the theory of relativity, then I imagine that they can do a lot of other things, too. When we give AI words, we give it a representation of reality good enough that it can (probably) think its way to a lot of solutions on both existing and new problems.
And AI today doesn’t even need the words, exactly. After translating our words to numbers (vectors), it thinks in numbers, and then eventually translates its thoughts back into words for our sake, not its own.
To go back to our definition of intelligence, if mimicking intelligence is enough to be considered intelligent, and if AI can understand the real world via words, or at least associate them in a nearly infinitely-complex statistical model, then AGI is possible.
If AGI is possible, what is the timeline for it? This is an area of high uncertainty for me, but 2-3 years off is possible, and that is the timeline I plan on. As I wrote about a few weeks ago, readers of The Information converged around 3 years out.
Others in the industry are all over the board. No one can commit to a timeline, but the majority of researchers believe it is coming, soon. (Their unwillingness to put a time on it suggests that they are also waiting for the next breakthrough, and not simply more powerful chips.)
AGI — defined as “a system that’s capable of exhibiting all the cognitive capabilities humans can” — is “probably a handful of years away,”
Google DeepMind CEO Demis Hassabis, Axios
“Over the next two or three years, I am relatively confident that we are indeed going to see models that show up in the workplace, that consumers use — that are, yes, assistants to humans but that gradually get better than us at almost everything,”
Anthropic CEO Dario Amodei, Axios
“We are now confident we know how to build AGI as we have traditionally understood it.”
Sam Altman, CEO OpenAI, Axios
“…the Biden administration identified that the emergency is that AGI is just a few years away.”
Gregory Allen, director of the Wadhwani AI Center at the Center for Strategic and International Studies, Axios
“Today’s [AI] systems, they’re very passive, but there’s still a lot of things they can’t do,” he illustrated, “but over the next five to ten years, a lot of those capabilities will start coming to the fore and we’ll start moving towards what we call artificial general intelligence.”
Demis Hassabis, Uniladtech
“I now predict [AGI] in 5 to 20 years but without much confidence. We live in very uncertain times. It’s possible that I am totally wrong about digital intelligence overtaking us. Nobody really knows which is why we should worry now.”
Geoffrey Hinton, Futurism.com
The Impact on Business
I believe that AI is likely to decimate the white-collar workforce. In short, if your job is done primarily at a computer, then odds are most of it can be automated. Maybe not all of it, but enough to have a major impact on jobs.
Imagine that half of all jobs at your company can be automated tomorrow. Do you expect your company’s headcount to stay the same? No, you can expect mass layoffs, and the people remaining to be those best able to manage AI agents to do the automated work and to personally handle the things that cannot yet be automated.
A lot of the tools we use in any industry already have APIs in place, the interfaces that computers use to talk to each other. And LLMs can easily plug into these (and do). We’re in the early stages, but there are already hundreds of prebuilt tools to connect AI with APIs available for free, online11. What this means, practically, is that software can be written with minimal to moderate effort to connect an AI model to the software your business runs on day-to-day, allowing the AI to read data, write data, and take the actions you already take.
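To make that concrete, here is a minimal sketch of the pattern, written in the spirit of common function-calling / MCP conventions. The tool name, fields, and ticketing endpoint are all illustrative assumptions, not any real product’s API:

```python
import json
import requests

# A tool definition the model can "call". The schema style mirrors common
# function-calling / MCP conventions; the names are illustrative only.
CREATE_TICKET_TOOL = {
    "name": "create_ticket",
    "description": "Open a ticket in the PSA for a reported issue.",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "normal", "high"]},
        },
        "required": ["summary"],
    },
}

def create_ticket(summary: str, priority: str = "normal") -> dict:
    """Call the (hypothetical) PSA REST API on the model's behalf."""
    resp = requests.post(
        "https://psa.example.com/api/tickets",  # placeholder endpoint
        json={"summary": summary, "priority": priority},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def handle_tool_call(tool_call: dict) -> dict:
    """Route a tool call emitted by the model to the real software."""
    if tool_call["name"] == "create_ticket":
        return create_ticket(**json.loads(tool_call["arguments"]))
    raise ValueError(f"Unknown tool: {tool_call['name']}")
```

The model never touches the API directly; it emits a structured request, and a thin piece of glue code like this carries it out.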
For areas where APIs don’t exist, they will. And for areas where APIs can’t easily be written, AI will be able to use the mouse and keyboard well enough to work through the graphical interface, albeit more slowly.
In addition, any mature industry has most of what they do documented. Maybe not every company, but whether you are in IT, legal, finance, accounting, insurance, etc., the principles and rules of the road are written down in textbooks, industry websites, or individual companies whose knowledge will soon be at a premium for AI to have access to.
Most of what we do in most white-collar industries is deterministic to some degree, meaning, not random. There is nothing new under the sun. There are exceptions, sure, but most of the legal work on business filings is within moderately narrow parameters. The same is true for accounting, or insurance risk tables, and so on.
Between deterministic scenarios and written documentation an AI system will be able to solve most problems most of the time on its own soon.12
As AI takes over, QA will go up across industries. Humans don’t always follow SOPs, but machines will be able to. Humans make mistakes and it takes time to correct them. AI makes mistakes, but can check its own work within seconds.
The cost of labor will decrease across the board, putting further downward pressure on prices as budget-friendly competition crops up everywhere, competing at the same – or higher – level of quality that people are used to today but at a dramatically lower internal cost, allowing them to pass on those savings to gain market share. This in turn will create downward pricing pressure on the industry as a whole.
The solutions that are premium solutions today will be table stakes tomorrow. Tax prep? AI will handle 90% of it and CPA review the last 10%. For simple returns, no CPA will be needed at all.
Filing an LLC with the state? No need for a lawyer, the filings are all the same anyway.13
Lower prices and a higher labor supply – more people seeking employment – will lead to lower wages, or stagnant at best. The highest performers in any industry may see a dramatic increase in pay if they can manage AI to 10x their performance compared to the future baseline – not today’s.
Ok, great, what about industries that are not white collar? Churches and other non-profits, manufacturers, medical?
The secondary and tertiary effects will be crucial to them and may create a cyclical effect. Imagine you are a church with several pastors. We’ll assume that most people don’t want their pastor replaced by FatherGPT, so those jobs won’t be automated away. However, the jobs of the church members whose donations pay the bills will be automated away, and the downward pressure on their wages will result in lower donations to the church, which may easily culminate in staff cuts.
What about medical? With a shortage of doctors and nurses today, automation will increase productivity without hurting jobs. BUT, what happens when patients can’t afford to come in because they lost their insurance when they lost their job? Can the medical automations decrease costs for the doctors’ offices enough that they can lower prices to make care affordable?
What about construction? Those are jobs that are not automatable, but what happens when 20%-30% of the population are newly unable to afford houses?
What happens to downtowns as companies downsize?
If white-collar workers out of work turn to the trades, what happens to the prices and wages of those jobs as many new workers enter the labor force? Does the demand for the trades services decrease because people can’t afford them while more people enter the industry? What is the net effect?
It isn’t all bad news. I expect that, mid-term, AI will be a net positive for some industries, at least in its primary effects. AI will free up doctors to see patients and not work on paperwork. Dr. GPT might even be able to answer low-level questions, and order blood work and non-invasive labs.
Accounting doesn’t have nearly enough new accountants entering the labor force today; a major increase in automation will allow the remaining accountants to focus on higher-level, value-added services, not filling in the blanks on tax forms.
Lawyers will be able to delegate research to ChatGPT instead of paralegals and have the results in 15 minutes instead of 2-3 days. (This might be bad news for the paralegals!)
And government: I can imagine a world where AI decreases the timeline of bureaucracy by 10x, or 100x.14
To be sure, there are others that believe AI will be a net gain for jobs and workers, but I’ll cover that in the final section.
Even with these bright spots, I think that the net change after considering secondary and tertiary effects is unknown.
To sum it up: I think AI will have an incredibly positive impact on productivity, and a negative impact on workers. On balance, I don’t know where the economy as a whole lands.
Is It A Continuum?
I’ve laid this out like it is an all-or-nothing approach, but is it?
Of course not and yes, maybe.
On the one hand, getting more out of what we have today and overcoming current constraints is a continuum where we will see a high volume of incremental progress. It will be industry-shifting but not life-changing.
But a model that brings us significantly closer to AGI, or takes us all the way there, can drop literally overnight. At that point, it isn’t a continuum. By then, we’ll have written most of the software and APIs needed to automate most white-collar work. Trading out the good-but-not-AGI model for AGI-1.0 might be as little work as changing a single line of code.
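To illustrate what that single line looks like, here is a minimal sketch with hypothetical model identifiers and a stand-in call_llm helper, not any vendor’s real SDK:

```python
# Today's best model, held in one configuration constant. When a dramatically
# better model ships, this is the only line that has to change; everything
# built on call_llm() inherits the new capability immediately.
MODEL_NAME = "vendor-model-v4"        # hypothetical identifier
# MODEL_NAME = "vendor-model-agi-1"   # tomorrow's one-line upgrade

def call_llm(prompt: str, model: str = MODEL_NAME) -> str:
    """Stand-in for a chat-completion call to whichever vendor SDK you use."""
    raise NotImplementedError
```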
Our world can change overnight.
A Special Note for IT and MSPs
While I’ve focused on white-collar jobs in general, the effects on MSPs will be even more profound. First, the things we troubleshoot are, in general, very well documented. Those that are not can be, and AI can document them by observing the troubleshooting process only once or twice.
Our work is highly deterministic in nature, and the majority of our troubleshooting occurs in narrow scopes. Granted, it doesn’t always feel that way, but a high-level analysis of tickets will find patterns that lend themselves to standard solutions.
Our tools also all have APIs: our PSA, our RMM, our documentation systems15, M365, G Suite, and the list goes on. The operating systems have their internal APIs – especially PowerShell – that allow an AI to easily control them.
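As a rough illustration of what “an AI controlling the OS through PowerShell” could look like, here is a small sketch of a tool an agent might be allowed to call. The wrapper and its allow-list are my own assumptions, not any vendor’s implementation, and a real deployment would need far stricter validation:

```python
import subprocess

# Read-only cmdlets the agent is allowed to run; the allow-list itself
# is an assumption made for this sketch.
ALLOWED_CMDLETS = {"Get-Service", "Get-Process", "Get-EventLog", "Test-NetConnection"}

def run_powershell(command: str) -> str:
    """Run a PowerShell command on behalf of an AI agent and return its output."""
    cmdlet = command.split()[0]
    if cmdlet not in ALLOWED_CMDLETS:
        raise PermissionError(f"{cmdlet} is not on the allow-list")
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout if result.returncode == 0 else result.stderr

# Example: the agent checks whether the print spooler is running on a problem PC.
# print(run_powershell("Get-Service -Name Spooler"))
```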
I think there will be several significant stages of AI in MSPs.
Stage 1: Now
MSPs use AI to speed up certain specific tasks and to decrease the amount of time it takes to troubleshoot issues: using ChatGPT to write scripts and small, minimally complex pieces of software, and to search the web for articles related to the ticket we are working on.
This is where we are today, and I don’t think it makes a difference to employment; it is a minor bump in productivity for any MSP making use of AI.
Stage 2: Coming Soon
Stage 2 is where we integrate AI into our core tools, especially ticketing, documentation, security alerts and major cloud vendors such as Microsoft and Google. I do not mean Copilot or Cooper Copilot. I mean real integration that does more than search, read and summarize documents.
At this stage, AI will be able to handle most first-line support issues, including taking limited action and giving techs specific steps to act where the AI cannot act directly. It will review and triage security alerts in real-time, and escalate to humans as needed.
This stage will see a huge jump in productivity for MSPs that can make the leap. There will be a massive discrepancy between the top and bottom performing MSPs. Top performers will be taking on more clients and investing in new AI tools, or writing their own applications, without increasing labor costs, while the bottom performers will be struggling to keep up with the speed and quality.
Top MSPs will be able to drop prices without sacrificing speed or quality for the first time, further eroding the customer base for small MSPs who compete on price today.
Valuations for these top MSPs will skyrocket, possibly hitting 10-20x multiples – or more – regularly, as they do more with less.
But, this increase in productivity and increase in client counts will mask an underlying issue: our clients will reduce their seat counts as they automate.
Stage 3: Newspapers
This is where the vast majority of what we do is automated. I don’t mean tier 1 tickets; I mean tier 3 and below. Most security configurations, all firewall configurations: there isn’t much that will need a human.
To top it off, most of our white-collar clients will be in the same boat: most of their work will be automated. The result will be a seat count that rockets towards the basement at the same time AI will be able to fulfill most of our jobs.
The industry will collapse in the same way newspapers did: they aren’t gone, but for anyone that remembers the 1990s or early 2000s, they are gone. A shadow of their former selves, a handful of national-level outlets are what is left of the glory. The regional papers are gone, or bought up and exist in name only.
Stage 2.5: Goodbye, Valuations
Stage 2.5 is where savvy investors will clearly see that stage 3 is coming. This is when valuations will drop. Not from 20x to 5x, but below that for most remaining MSPs. The value that MSPs bring to investors will be caught up in how they’ve adapted to selling AI services and being integral to their clients’ businesses in that way.
What About The Robots?
I’m choosing to focus on information work in this essay so I won’t say much on this topic.
The robotic world is going to need significantly different training, and a training corpus does not naturally exist for it. Libraries and Reddit fueled LLM training, but data from physical-world movement is needed for true AI robotics.
Eventually, however, we can expect that robots will be able to handle even most of the physical work that humans do today.
….But: AI Can’t Do All of This Today!
No, it can’t! But!
First, consider how far we can come in a short time with a determined effort to exploit the current bottlenecks.
Second, the release of models over the last three years has shown remarkable increases in capability. If the software is written today to connect AI and the tools we use in business, then tomorrow when OpenAI, Google, Anthropic or somebody else releases their new top tier model, only a single line of code will need to change to give a whole new level of capability and cognition to those tools.
The change will happen quite literally overnight.
We Are Not Ready
We are not ready to deal with this in any developed nation. And no one is going to rein it in, because the national security implications of missing out on AGI when your enemies and frenemies have it are second only to nuclear – possibly tied, because no one wants to use nuclear, but everyone will use AI.
The current administration shows no interest in regulating AI or in increasing social safety nets. On the contrary, the cuts in government are disproportionately on jobs that will be automated in the future. If the administration successfully brings back some manufacturing, that will only soften the blow a little; there just are not enough manufacturing jobs worldwide to make up for half of the white-collar workforce needing a new career.
I believe that far too little heed is being paid to the societal implications that AI is likely to have, within years, and again, a model release away.
Nor are the net effects easy to understand. In the Great Depression we had massive unemployment and no rise in productivity. What happens if there is a rise in unemployment, but a significantly more massive uptick in productivity? 10x – 1000x, measured in years not centuries?
I don’t believe in a happy future where humanity has leisure to do whatever it wants, with free time limited only by years of life. Even if it were technologically possible, man seems unable to stand contentment for long.
What Do We Do Next?
I have no idea, but when you write an essay like this you are supposed to present solutions.
But I present problems here; I don’t believe there are simple solutions.
Here are a few ideas I can leave you with.
If you are a worker, start learning how you can use AI as fast as you can. Get an idea of what it’s good at and what it isn’t. See where it can make you more efficient. Think through how you might manage an AI agent – or department of agents – and not just work alongside one.
If you cannot make that transition, then you must seriously consider your own retirement plans, or alternative careers that will resist AI-based automation, at least for longer.
Everyone ought to be thinking about how their savings need to change if future employment opportunities are going to be inconsistent, or at least, not guaranteed.
If you are a business owner, then you need to be thinking through the impacts on your business in a couple of different time horizons. In the near term, how will your industry adopt AI and what do you need to do to stay competitive? Are the changes going to be in services cost, quality, capabilities or a combination of the above? How will your input costs change? How will AI change who you need to hire and what you train for?
Next, are there things you can use AI for that your competition can’t easily copy? Or, are there places where you think you can implement AI faster than they can and gain a temporal edge?
Mid-term, you need to consider the primary, secondary and tertiary effects on your industry. What happens when workers are automated? How are your clients affected when their employees (or their jobs, if you are B2C) are automated? And what about their customers?
What are the economic impacts that will be felt by your geography or niche?
How will your billing need to change if you bill by the man-hour but now AI is doing a lot of the work? What about if you bill by seat count (i.e., the number of employees your clients have) and their seat count will drop but your value proposition will increase?
How will the long-term valuations of your industry change and how does that affect your retirement and exit plans?
The Arguments Against Doom and Gloom
Here are the main arguments people make against the various things I’ve articulated.
AGI Is Far Off
The first argument is simply that AGI is a long way off, and we will have time to prepare. Nothing will happen overnight. This is the most plausible argument you can make against anything else I’ve said. But all either side can say here is: we’ll see. AI is moving much too rapidly to really know when we’ve hit a brick wall, and whether the next development will be massive or incremental.
AGI Is Impossible
I think it is debatable if we will ever get to super-intelligence. I think you need to plan for AGI as likely. At the very least, no one can prove we cannot do it, and there are very good reasons to believe that we can do it.
But we are already on the cusp of AI that can automate huge amounts of what we do, even before reaching AGI.
Indeed, if AI were simply more reliable today, we’d see much more automation. That might not be “AGI,” but it is still a computer that can take over a whole lot of what you do.
Jobs Are A Net Gain
There are studies that predict that AI will have a net job gain, and that pain will be felt for some – like automation in manufacturing – but that the economy will be stronger and more resilient.
Maybe.
To be sure, the car replaced the horse but didn’t cause mass unemployment. Automation did create unemployment for many, but the increased productivity and cheaper goods made life better for more. ATMs didn’t kill bankers.
And so on.
A major study by PwC predicts that AI will add $15.7 trillion to the global economy by 2030. McKinsey throws out a smaller number, but still in the trillions. Accenture and IDC also see the positives.
The World Economic Forum is also positive as a whole, but slightly more nuanced, in its report titled Future of Jobs Report 2025: 78 Million New Job Opportunities by 2030 but Urgent Upskilling Needed to Prepare Workforces. That’s 170 million new jobs (good), while displacing 92 million other jobs (ouch).
I suspect that each of these reports sees AI as moving more slowly than the picture I’ve painted above. Indeed, if AGI is 30 years out, then, yes, we’ll see more productivity gains than job losses over the next 5-10 years.
Also: the reports seem to be missing the headlines from Shopify, Duolingo, IBM, Turnitin and others that have slowed or frozen headcount specifically because of increased AI productivity.
These reports all point to something closer to a panacea than a negative disruption, and it is possible that I’ll eventually come around to seeing it this way.
Humans Want to Work With Humans
This argument says that jobs won’t go away even if we reach AGI because people like to deal with people.
First, have you met anyone from Gen Z? Smartphones are better than people for interactions.
Second, the economics will triumph. The premium for working with, say, an insurance firm or an IT company that primarily uses humans at 10x the cost (or more) will be far too high for most consumers to swallow. This argument is a version of “but people like their horses, so cars won’t replace them.”
There are still horses, but only for things where cars cannot replace them.
Getting Past Constraints
I took this out of the main body of the essay because it interrupts the flow, but I think it is important because so many people have had an experience with AI that made them want to write it off. And many of those experiences came down to one of a few flaws.
We’re going to look at the constraints of using the models today.
Those constraints are:
- Lack of recent knowledge
- A context window that is very large, but still limited to a very important degree
- They are not reliable
- There are some things LLMs just don’t seem to get, and there isn’t a clear pattern.
Each of these can be, and is being, addressed.
Lack of Recent Knowledge
The foundations of every model have some cutoff date, after which the model knows nothing else. There are at least three ways that we can and will exploit this constraint. First, and most obviously, if you give a model current knowledge through a web browser or database, it then “has” that knowledge, at least during the conversation. Because each conversation is ephemeral, it will need to re-learn every time, but those processes can be automated.
Second, I suspect that we will be able to take one of the giant models we have trained already and then do post-training on newer data. My suspicion is that the model will intake enough of this new data to be able to use it, even without re-inputting it into every conversation.
Third, we can layer some thin models on top of the large models to give them new or specialized capabilities. This has been most clearly demonstrated on image generation, especially Stable Diffusion16.
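A minimal sketch of the first approach – handing the model current knowledge at conversation time – with call_llm standing in for whichever vendor SDK you use and an arbitrary source URL:

```python
import requests  # real HTTP library; the URL and call_llm below are stand-ins

def call_llm(prompt: str) -> str:
    """Stand-in for whichever vendor SDK you use (OpenAI, Anthropic, Gemini, ...)."""
    raise NotImplementedError

def answer_with_fresh_context(question: str, source_url: str) -> str:
    # Fetch something published after the model's training cutoff...
    page_text = requests.get(source_url, timeout=30).text[:20_000]  # trim to protect the context budget
    # ...and hand it to the model inside the conversation. The model now "has"
    # this knowledge, but only for the life of this conversation.
    prompt = (
        "Using the reference material below, answer the question.\n\n"
        f"Reference material:\n{page_text}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```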
A Short Context Window
This is the piece that I believe has been most limiting to most hands-on applications of AI. While the most capable models can get to 150,000 words, it’s shocking how fast that fills up17. Those 150,000 words are the limit on the entire conversation, not on any given input or output. And while you won’t take up all of those words re-writing Pride and Prejudice, you can fill up the context window very rapidly with outside information such as websites, code bases and databases. The most advanced reasoning models also have internal thoughts18 that count against the token window.
What’s worse is that you don’t need many of those tokens again. Think of the context window as a budget: you have a fixed amount you can spend to get something done. You can spend tens of thousands of tokens on a web search looking for a single web page. You may find it, but all the tokens you “spent” reading through the irrelevant results still count against your budget.
The way around this is active management of the context window. And there are a couple of ways to do that.
First is… whatever magic OpenAI has in ChatGPT. Seriously, they have managed that context window marvelously and haven’t shared how. But this is only available in their web interface right now, not their API, so independent software vendors can’t take advantage of it.
The second way is what I call delegation: you let your model delegate specific tasks to models in a new conversation. Because each conversation has its own budget, you can spend a lot more tokens overall.
Think of a workflow like this. You, as a user, want to know how many RBIs Kirby Puckett batted in a year in his prime, and if that would still be a lot today. Right now, if you posed that to a typical AI agent the flow would go like this:
Question -> AI: searches web “Who is Kirby Puckett?” -> AI: reads the first page -> AI: searches the web for “Kirby Puckett RBIs” -> AI: reads through the first page -> AI: reads through the second page -> AI: reads through the third page -> AI figures out that Kirby Puckett batted 121 RBIs in 1988 -> AI: searches the web for “Most RBIs in 2024” -> AI: reads the first page -> AI: searches the web for “Most RBIs in 2023” -> AI: reads the first page -> AI: searches the web for “Most RBIs in 2022” -> AI answers: “Kirby Puckett batted 121 RBIs in 1988. That is a very good season, and would still be a good season in 2024, although it wouldn’t put him at the top of the leaderboards.”
That would be a lot of words reading through each page! Even worse than you think, because each page has hundreds or thousands of words of irrelevant content and code that you don’t even see as a human using a web browser.
A delegated flow would look more like:
- Question
- Main AI to Delegate AI-1: “Please search the web to find information on Kirby Puckett”
- Delegate AI-1:
  - Searches the web for “Who is Kirby Puckett?”
  - Reads page 1
  - Reads page 2
  - Reads the Wikipedia page
  - Answers: “Kirby Puckett was an outstanding baseball player who spent his entire career with the Minnesota Twins.”
- Main AI to Delegate AI-2: “Please search the web to find out how many RBIs Kirby Puckett batted, and which year was his highest. Only tell me that data.”
- Delegate AI-2:
  - Searches the web for “Baseball statistics” and gets results for the top websites with baseball stats
  - Searches the web for “www.baseball-reference.com kirby puckett” and reads the page
  - Searches the web for “fangraphs.com Kirby puckett”
  - Answers: “Kirby Puckett hit 121 RBIs in 1988”
- Main AI to Delegate AI-3: “How many RBIs did the best baseball players hit in 2022, 2023 and 2024?”
- Delegate AI-3:
  - Searches the web for “most RBIs in 2022”
  - Reads the first result
  - Reads the second result
  - Searches the web for “most RBIs in 2023”
  - Reads the first result
  - Reads the second result
  - Searches the web for “most RBIs in 2024”
  - Reads the first result
  - Reads the second result
  - Answers: “Based on my research, the top baseball players hit 131-144 RBIs in a year. Out of the top ten players per year, the lowest number of RBIs was 94 in 2024, with a three-way tie: Marcell Ozuna (ATL) – 94 RBIs, Salvador Perez (KC) – 94 RBIs, Kyle Schwarber”
- Main AI answers the user: “Kirby Puckett batted 121 RBIs in 1988. That is a very good season, and would still be a good season in 2024, although it wouldn’t put him at the top of the leaderboards.”
But in the first example we spent at least tens of thousands of words from our conversation budget; in our second example the total was only 175 words!
For complex workflows the savings would be even greater. For example, a user may ask AI to work on a project with many steps that requires a lot of information. The main AI agent might ask a delegate AI agent to accomplish a task, and the delegate might first use sub-delegates to gather information before starting on the given task; the main AI agent becomes a kind of project manager of other agents in this scenario.
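Here is a rough sketch of that delegation pattern; call_llm is a stand-in for whichever vendor’s chat API you use, and the prompts are illustrative only:

```python
def call_llm(messages: list[dict]) -> str:
    """Stand-in for a chat-completion call to whichever model vendor you use."""
    raise NotImplementedError

def delegate(task: str) -> str:
    """Run a task in a brand-new conversation so all the pages it reads
    count against its own context budget, not the main conversation's."""
    sub_conversation = [
        {"role": "system", "content": "Do the research, then reply with only the final answer."},
        {"role": "user", "content": task},
    ]
    return call_llm(sub_conversation)

# The main conversation only ever sees the delegates' short answers,
# not the thousands of words they read to produce them.
facts = [
    delegate("Find Kirby Puckett's highest single-season RBI total and the year."),
    delegate("Find the league-leading RBI totals for 2022, 2023 and 2024."),
]
main_conversation = [
    {"role": "user", "content": "How good was Kirby Puckett's best RBI season by today's standards?"},
    {"role": "user", "content": "Research notes from delegate agents:\n" + "\n".join(facts)},
]
answer = call_llm(main_conversation)
```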
I’ve written about this privately and talked about it for months, but it wasn’t until Anthropic’s Claude Code tool came out that I saw it in action. This tool is the best I’ve seen yet for programming and follows this type of workflow exactly. The primary conversation is with the user, and then it delegates tasks like “find the file that does XYZ” or, “Please read the documentation in the project for how ABC is accomplished and give me code snippets,” and so on. The result is that the main agent is able to work on a task for a very, very long time before its context window is used up.
The third way to exploit the context window constraint is to throw out portions of the conversation that are no longer needed. While I suspect ChatGPT does this under the hood, I don’t have direct evidence. It seems like the trickiest route. Claude Code accomplishes a similar (but not identical) result and “compacts” the conversation occasionally by asking for a concise summary.
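A minimal sketch of that compaction idea, under my own assumptions about the budget size and with a stand-in call_llm helper (real code would use the vendor’s tokenizer rather than a character-count estimate):

```python
def call_llm(messages: list[dict]) -> str:
    """Stand-in for whichever vendor's chat API you use."""
    raise NotImplementedError

def rough_token_count(messages: list[dict]) -> int:
    # Crude estimate of roughly four characters per token; real code would
    # use the vendor's own tokenizer.
    return sum(len(m["content"]) for m in messages) // 4

def maybe_compact(messages: list[dict], budget: int = 150_000) -> list[dict]:
    """When the conversation nears its budget, fold the old turns into a summary."""
    if rough_token_count(messages) < int(budget * 0.8):
        return messages
    summary = call_llm(messages + [{
        "role": "user",
        "content": "Summarize everything above that is still needed to finish the task.",
    }])
    # Keep the summary plus the last few turns; throw the rest away.
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + messages[-4:]
```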
LLMs Are Not Reliable
Reliability is an issue that is likely to plague us for a while, and there is no single solution that I am aware of that resolves it; however, there are multiple strategies.
Google just released a paper focused on AI-security, demonstrating how you can reduce portions of conversation to code, and then ensure that the code runs safely19. Today’s models also do a better job evaluating code than they do evaluating words alone. In addition, code is formally verifiable. This means that it is theoretically possible to convert many conversations to code, evaluate the answers from that perspective, and then decide if an answer is correct or not.
A statistical approach is also possible with time: for a given question or type of question create a catalog of results. When the question is asked again, see how close the new AI-generated answers are to known correct answers. This is already part of the training process for most models, but can be built into the software around a model.
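As a sketch of that statistical approach, assuming a tiny hand-verified catalog and plain string similarity (a real system would more likely use a database and embeddings):

```python
from difflib import SequenceMatcher

# A tiny catalog of questions whose answers a human has already verified.
# This only shows the shape of the idea.
KNOWN_GOOD = {
    "What port does HTTPS use by default?": "443",
    "What is the default DNS port?": "53",
}

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matches_catalog(question: str, model_answer: str, threshold: float = 0.8) -> bool:
    """Return True if the model's answer agrees with a previously verified one."""
    for known_q, known_a in KNOWN_GOOD.items():
        if similarity(question, known_q) >= threshold:
            return similarity(model_answer, known_a) >= threshold
    return True  # no precedent to compare against; fall back to other checks
```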
Model specialization, I believe, will go a long, long way. Most of the models we use today are generalist models. The same model you use to write code also answers trivia questions or review legal contracts. Taking a base generalist model and running it through post-training for specific tasks will produce more consistent results. Even better: training a smaller model on a narrow set of tasks – coding, legal, health, etc. – will also produce more consistent results.
You can also use a different model to evaluate the response of the first model – although I like this approach the least.
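And a sketch of the evaluate-with-another-model approach, again with a hypothetical call_llm stand-in for the second model:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a call to a second, independent model."""
    raise NotImplementedError

def judge(question: str, proposed_answer: str) -> bool:
    """Ask a different model to grade the first model's answer."""
    verdict = call_llm(
        f"Question: {question}\n"
        f"Proposed answer: {proposed_answer}\n"
        "Is the proposed answer correct and complete? Reply with only YES or NO."
    )
    return verdict.strip().upper().startswith("YES")
```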
LLMs Just Don’t Get It
This constraint I don’t have a way around at present, other than to try another model, which may prove to be an effective strategy. Anyone who’s worked with LLMs for at least a few weeks has found that some topics just seem out of reach. It’s like talking calculus with a 5th grader: they know what math and charts are but can’t take the conversation further. Except that the LLM, bless its silicon heart, won’t stop trying.
The 5th grader is smarter in this regard and can just tell you she doesn’t know the answer.
We’ll Get Past the Constraints
My point in highlighting the problems and solutions is to show they have solutions. Today.
The main point: we’ve hardly optimized for these constraints; but we can, and the usefulness of AI will balloon when we do.
I also believe you will see the tech community turn its collective attention to these strategies very, very soon. Tools that take advantage of them will be everywhere before long, and the average techy person’s ability to use AI for useful things will take a giant leap forward. Soon as in weeks or months. This year. Q2, maybe Q320.
- You probably can’t do better than Simon Willison. Others include Tl;dr’s AI newsletter, which has great links to things but poor summaries. Hacker News surfaces technical news fast, with an interesting crowd-based analysis to follow. The TWIL-AI podcast has interesting news each week; I tune in. The Information is the top single news source in tech you can find. Axios has good mainstream coverage of AI. ↩︎
- Until a couple weeks ago, I’d say OpenAI’s o1-pro was at the top, and behind a $200/mo paywall. Today, Google Gemini 2.5, Grok 3 and Sonnet 3.7 are all contenders. ↩︎
- GPT 3.5 ↩︎
- GPT-3.5 Turbo-16k, and GPT-4-32k, which I don’t think I ever got access to through the API and, as far as I know, was never released on ChatGPT. ↩︎
- GPT-4 ↩︎
- Transformer based LLMs that take in a huge but finite amount of training data, even including synthetic data. ↩︎
- Some might take issue with this statement. Compared to mature technology, such as compressing images or audio, we’ve done very, very little. We’ve just focused on moving fast. ↩︎
- This definition is hated by many but will work for our discussions here. ↩︎
- Note the major model releases in 2025: DeepSeek R1, Sonnet 3.7/4, GPT 4.1, GPT 4.5 and Gemini 2.5 are all iterative approaches on existing technology. Incredibly important and valuable, but they are not fundamentally different, and they show some level of plateauing – which also may be why the major vendors are so slow to increment their model numbers. ↩︎
- With exceptions for spatial reasoning and sensory-specific thoughts, such as hot/cold, loud, and many emotions. ↩︎
- Mostly in the form of MCP servers, which I expect to be the dominant form of API connection in the short to mid term. ↩︎
- If the software were written today to allow LLMs efficient, on-demand access to the documentation they need, they’d already be able to do a lot of this work. The issue is reliability: they don’t necessarily produce the same output enough of the time to be truly autonomous. ↩︎
- Minimal exaggeration here, except for large companies. ↩︎
- Although I can also imagine a world where government refuses to work with AI and doesn’t see major efficiency gains at all. ↩︎
- IT Glue comes to mind, but SharePoint, OneDrive, and Google Drive all have APIs that an AI can use easily. ↩︎
- LoRAs, but for LLMs, is the concept here. ↩︎
- Google has models with windows up to 1,500,000 words. However, in my experience and others’, their ability to work coherently with content remotely approaching that length is limited to non-existent. ↩︎
- Reasoning tokens, such as what we see in o1, o1-pro, R1 and Sonnet 3.7. ↩︎
- https://simonwillison.net/2025/Apr/11/camel/ ↩︎
- Claude Code, Cline, Roo and others are already using the delegation strategy I described, as well as learning how to manage the context window. ↩︎