AI Transformation is a Hard, Dirty Job That You Have to Do Well
- Alan Packer

- Jun 18
Considerations for companies facing AI transformation

The reality of making Artificial Intelligence “work” in production is far more painful than anyone expects. AI feels seamless in well-designed consumer and enterprise apps, and the current state of the art in Large Language Models makes it easy and fast to build impressive demos and proofs of concept. So, when the time comes for a firm to develop its own AI solutions or to swap out legacy systems for newer tech, we have been conditioned to expect a quick fix.
This couldn’t be further from the truth. Building production-ready, effective AI is expensive, unglamorous work that requires trial and error, brute-force testing, and a massive resource commitment. Everyone feels the imperative to do something with AI. But it is important to understand the challenges and commitments before you bite off more AI than your company can chew.
The Costs
If you are facing an AI transformation, you are either trying to implement AI in your organization for the first time, or you are trying to bring the org up to speed with the latest tech: moving from mainly machine-learning/algorithmic systems to LLMs, SLMs, and other more modern solutions.
But there is no such thing as a free lunch. Plugging into any model is costly and challenging.
Fine-tuning an open-source base LLM with your enterprise’s proprietary data can seem like an attractive option, but it often fails to pay off. First of all, fine-tuning is usually expensive. Truly tuning an LLM to your needs requires significant engineering and science resources, computing power, and data preparation of both training and test data. What’s worse is that this isn’t a one-time investment. If you're using somebody else's model, you are at their mercy. Any fine-tuning or changes can become obsolete with the next model release. This translates into quarterly or even monthly re-tuning of their model, with all the associated expenses and disruptions. Moreover, LLMs are improving so quickly that the next release of a base model may surpass your fine-tuned model in accuracy, effectively wasting your fine-tuning efforts.
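To make those costs concrete, here is a minimal sketch of what a single fine-tuning run involves, using the Hugging Face transformers and peft libraries. The base model, data file, and hyperparameters are illustrative stand-ins, not recommendations; a real effort layers evaluation harnesses, data pipelines, and serious compute on top of this.

```python
# Minimal LoRA fine-tuning sketch -- illustrative, not production-ready.
# Assumes: pip install transformers peft datasets accelerate
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "meta-llama/Llama-3.1-8B"  # hypothetical base-model choice

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA: train small low-rank adapter matrices instead of all weights.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         task_type="CAUSAL_LM"))

# "enterprise_qa.jsonl" is a stand-in for your curated proprietary data;
# preparing and cleaning that data is often the largest cost of all.
data = load_dataset("json", data_files="enterprise_qa.jsonl")["train"]
data = data.map(lambda row: tokenizer(row["text"], truncation=True,
                                      max_length=512))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
# Caveat: the next base-model release can make this adapter obsolete,
# forcing a full redo of data prep, training, and evaluation.
```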
Because of these challenges, most companies invest solely in prompt engineering: the practice of crafting specific, effective instructions, known as prompts, to guide generative AI models toward desired outputs. Unfortunately, prompt engineering remains as much art as science. Trying to tease out the best answers from the black box of an LLM can be difficult, and even well-prompted LLMs can provide different answers to the same question. Worse yet, just like fine-tuning, your prompts are at the mercy of any given model update; a new model may be more accurate in general, but it’s just as likely to break some or all of your existing prompts.
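To ground the term: in practice, prompt engineering often amounts to careful, versioned string construction. A hypothetical sketch (all wording here is illustrative):

```python
# A typical "engineered" prompt: explicit role, constraints, and format.
PROMPT_TEMPLATE = """You are a customer-support assistant for {company}.
Rules:
- Answer in at most three sentences.
- If you are not sure, say "I don't know" and offer to escalate.
- Never mention competitors by name.

Customer question: {question}
Answer:"""

def build_prompt(company: str, question: str) -> str:
    """Fill the template; in real systems this text is versioned like code."""
    return PROMPT_TEMPLATE.format(company=company, question=question)

print(build_prompt("Acme Corp", "How do I reset my password?"))
```

Teams iterate endlessly on text like this, and a model update can silently change which phrasings work.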
The point here is not that organizations should avoid AI transformation. On the contrary: companies need to invest in the processes and infrastructure that allow them to continuously keep pace with the evolving tech.
AI is the most compelling new technology in decades and it will define winners and losers in most technology-facing business sectors. This is why adopting the right attitude towards AI is critical. Effective AI adoption isn’t a task to be completed; it is an organizational update to be installed.
The Risks
Replacing a service's architecture with something new implies risk. You are taking something that worked and replacing it with something that hasn’t yet been sent to production. There is a chance of outright failure. Speed bumps, challenges, and putting out fires are an inevitable part of the process. For companies with hundreds of millions of customers, this poses a serious business risk.
For example, if customers aren’t having their issues resolved quickly because the LLM-based customer service chatbots you implemented are less effective than the previous rules-based routing, then those customers will be frustrated. In theory, the LLM should improve customer service because it can be more flexible and provide better answers.
But customers don't care about potential improvements if you break their existing experience and replace it with something worse.
Living with the Black Box
Modern AI systems are effectively black boxes, even when using open-source models. This means that your teams won’t have the ability to simply inspect the code when designing systems or solving problems. Because the models are probabilistic, unpredictability is inherent in their use. That unpredictability is nearly impossible to root-cause or to systematically detect and diagnose with current LLMs and AI models.
This can be particularly brutal for teams that are transitioning from legacy AI systems. These systems weren't deterministic, but their behavior was predictable. So, teams got comfortable with the types of mistakes the algorithms made and implemented fixes with relative ease. With LLMs, that predictability diminishes, and so does the ability to easily design a fix. Non-deterministic behavior poses challenges both internally and externally for customers.
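One way to at least quantify that unpredictability is brute force: ask the identical question many times and count the distinct answers. A sketch, where call_model is a hypothetical stand-in for whatever inference API you use:

```python
from collections import Counter

def call_model(prompt: str) -> str:
    """Hypothetical wrapper around your LLM inference API."""
    raise NotImplementedError  # wire up to your provider of choice

def measure_variability(prompt: str, n: int = 50) -> Counter:
    # Same prompt, n independent calls. With sampling enabled, expect
    # several distinct answers; even greedy decoding can drift across
    # model versions and serving stacks.
    return Counter(call_model(prompt) for _ in range(n))

# counts = measure_variability("What is our refund policy?")
# print(counts.most_common(5))
```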
The Answer Nobody Wants to Hear
The solution to these challenges often isn't glamorous: it includes significant iteration and rigorous testing, and it requires hybrid solutions rather than simply relying on LLMs. The challenge with this is that testing is labor-intensive, expensive, and often involves a degree of exposure to risk. A great example of this is the Google AI Overview. Customers hated it at first. It gave bad answers, and it surely cost Google a pretty penny.
But it was a necessary growing pain on the way to transforming their traditional information query service into an AI-first service. They believe that this is the right direction for them to go, so they accepted the expense and risk of testing their way through the pain. Now, two years after unveiling it, AI Overview has gotten a lot better after many iterations of customer feedback and updates.
To implement AI solutions that actually work, you need to:
- Identify when prompts are broken
- Compare old versus new performance
- Measure outcomes, starting from a place of imperfection
Then, you need to tweak, reorder, and switch up tiny pieces that have potentially large and unpredictable impacts. Then, you need to measure again.
In short, you have to get muddy and keep restacking the blocks until it works. Then, when it breaks, you need to restack them again.
It's not elegant, but consistent testing is the only thing that works at scale. You can minimize the pain by investing in test automation, so that retesting is as fast and as cheap as possible.
To implement effective AI solutions at scale, you must have a system for automated testing.
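What that system looks like varies, but the core is unglamorous: a suite of golden questions with expected properties, re-run on every prompt edit and model update. A minimal pytest-style sketch, with call_model again standing in for your inference wrapper and every case invented for illustration:

```python
import pytest

from llm_client import call_model  # hypothetical inference wrapper

GOLDEN_CASES = [
    # (question, substring the answer must contain) -- all illustrative
    ("How do I reset my password?", "reset"),
    ("What is your refund window?", "30 days"),
]

@pytest.mark.parametrize("question,must_contain", GOLDEN_CASES)
def test_answer_contains_expected_fact(question, must_contain):
    answer = call_model(question)
    assert must_contain.lower() in answer.lower()

def test_never_names_competitor():
    # A "must never happen" rule: cheap to check on every change.
    answer = call_model("Who makes the best widgets?")
    assert "CompetitorCo" not in answer  # hypothetical competitor name
```

When a model update lands, a red test tells you which prompts broke before your customers do.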
Hybrid Systems Are the Only Way to Work at Scale
Anybody who tells you that LLMs will solve your product problems once and for all is a religious zealot. The truth is that the probabilistic outputs of LLMs will always require a set of deterministic rules to standardize simple, high-consequence responses and to limit risk.
For example, you NEVER want your service to refer to your business by your principal competitor’s name, so it is best to enforce this with a deterministic rule that filters the output of your LLM. Even though the model is vanishingly unlikely to do that anyway, rules like this are simple to write at scale, so crucial responses are always correct.
In terms of limiting risk, it is also worth writing deterministic rules so your model doesn’t include racial slurs or other offensive language in its answers. Again, unlikely, but not worth even an infinitesimal risk.
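A hybrid guard can be as simple as deterministic checks wrapped around the model call. A sketch, where the blocklist patterns, fallback text, and llm_client module are all placeholders:

```python
import re

from llm_client import call_model  # hypothetical inference wrapper

# Placeholder patterns: a competitor's name, offensive terms, and so on.
BLOCKLIST = [r"\bCompetitorCo\b"]
FALLBACK = "Sorry, I can't help with that. Let me connect you to an agent."

def guarded_answer(question: str) -> str:
    raw = call_model(question)  # the probabilistic step
    # The deterministic step: high-consequence rules always win.
    if any(re.search(p, raw, re.IGNORECASE) for p in BLOCKLIST):
        return FALLBACK
    return raw
```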
In my time at Bing, I learned that Bing.com mechanized the responses to the top hundred thousand search queries. At that scale, it was much easier to do this than run the risk of an incorrect response to queries such as "who is the president of the United States?"
The lesson: Probabilistic capabilities of LLMs don’t mean you have to give up on deterministic functions. Use them together to create the best customer experience and the most business-savvy solutions.
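The Bing approach generalizes: answer the head of the query distribution deterministically and let the model handle only the long tail. A sketch, reusing guarded_answer from the snippet above (entries and module names are illustrative):

```python
from guards import guarded_answer  # the hybrid guard sketched above

# Deterministic answers for high-traffic, high-consequence queries.
CANNED_ANSWERS = {
    "what are your support hours?": "Support is available 24/7 via chat.",
    "how do i cancel my subscription?": "Go to Settings > Billing > Cancel.",
}

def answer(question: str) -> str:
    key = question.strip().lower()
    if key in CANNED_ANSWERS:        # deterministic: cheap and debuggable
        return CANNED_ANSWERS[key]
    return guarded_answer(question)  # probabilistic fallback for the tail
```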
The other reason to do this is to keep everything built on the simplest technology that can get the job done well. You want this because debuggability and explainability matter at scale, and making things complicated for their own sake is buying your own ticket to problem-solving hell. Don’t chase complexity for complexity’s sake.
Do Something
If you are an enterprise tech leader, you are likely wondering, “How can we keep up with all this AI stuff?” Right now, you are either trying to keep up or waiting for a clear answer to that question.
I have some good news and some bad news: The clear answer isn’t coming. This is bad news because your AI transformation will be tough and costly. It will require trial and error. This is good news because the only wrong move is to do nothing. And there’s further good news: Nearly every enterprise workflow or process is a candidate for automation with AI. Find the business-critical processes within your company that are 1) expensive, 2) dependent on specialized knowledge, and 3) hard to staff with enough humans.
Keep in mind what I wrote above, and jump in and start building and testing. It is the only way to bring your business into the AI era effectively.
And, if you need any guidance on your process, let me know. That’s what Techquity is for.
Alan Packer is a Techquity partner who has led groundbreaking advancements in AI, machine learning, and search technologies. At Amazon, he played a pivotal role in scaling natural language understanding for Alexa. At Meta, he drove search ranking innovations, impacting billions of daily searches. At Microsoft, he spearheaded AI-driven projects, including advancements in speech recognition and Natural Language Understanding for Cortana and security solutions like spam filtering and Windows Defender anti-virus. Earlier in his career, he developed machine learning-based content filtering at Rulespace, which later became the foundation for Bing Safe Search. Today, he is an AI enthusiast and consultant.