Retail Pulse Report: GenAI is More Fragile Than It Seems
And the retail industry – and consumers – are especially vulnerable.
Source: Adobe Stock “a woman addicted to shopping” (the needy robot ones were too creepy)
Over the last two weeks, a lot of big, heavy opinion articles have been published covering deep topics about GenAI and where it’s headed. Normally I cover AI news as a section within this newsletter, but these topics were too big to leave to some glib summary. And when you take them all together, there’s something more to be said – even more than the authors say themselves. I still encourage you to follow the links and read the ones that say “read this one”.
Why Should We Care About GenAI’s Future?
I know I can seem like a big GenAI skeptic. That’s not the case, actually. I use the tools for some things, and feel the temptation to invest the time to use them more (I’m definitely addicted to productivity hacks, what can I say). But I also do that with eyes wide open, or at least as open as I can keep them.
That’s hard because, one, people use “AI” and “GenAI” increasingly interchangeably, and they are not the same thing. Though it’s funny: the more I try to articulate the difference, the more they sound the same. It’s not that the underlying math is fundamentally different – both are built on probabilities, and both are forecasting some outcome. In the general AI sense, that might be predicting how many units of a product you’re going to sell over the next month. In the generative AI sense, it’s predicting the next word in a series of words prepared in response to a user’s question.
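If you want a feel for what “predicting the next word” means mechanically, here’s a toy sketch (nothing remotely at the scale of a real LLM, and purely my own illustration): the “model” is just a probability distribution over what word comes next, learned from counts.

```python
from collections import Counter

# Toy "training data": hypothetical counts of what word followed
# "the customer wants" in some corpus (made-up numbers for illustration)
continuations = Counter({"a": 40, "to": 30, "free": 20, "nothing": 10})

total = sum(continuations.values())
# The "model" is just a probability distribution over the next word
probabilities = {word: count / total for word, count in continuations.items()}

# Generation means picking from that distribution (here, greedily taking the most likely)
next_word = max(probabilities, key=probabilities.get)
print(next_word, probabilities[next_word])  # -> a 0.4
```

A real LLM does the same kind of thing with billions of parameters and the whole conversation as context, but the output is still, at bottom, a probability-weighted guess, just like the demand forecast.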
It's not the inputs or the way it works that is fundamentally different; it’s the user interface and the amount of friction involved in getting an output. And while I could get any GenAI model to give me a forecast based on whatever data I wanted to give it or ask it to consider, the utility of that output will be very limited without an ERP or replenishment solution to act on it. Whereas the utility of a five-paragraph essay on the Aegean War is extremely high to the high school student tasked with writing one (high school students, don’t do that).
It's also hard because, two, people are too trusting. There are plenty of cases where GenAI gets it wrong – the Air Canada chatbot that told someone they could get reimbursed for a ticket, the lawyers who got busted for submitting briefs that cited case law that didn’t exist because GenAI hallucinated it. There are too many to list. There are too many to list because people keep trusting it even when they know it can be wrong.
I have long felt that one of the biggest misses of Star Trek was the mistrust of artificial intelligences, from the android Data to the holographic Doctor. If you can cross the uncanny valley, people will treat you like you’re human. In fact, you don’t even have to get all the way across; with enough familiarity, people will meet you halfway. People treat their dogs like they’re human, for heaven’s sake. You don’t think they’re going to fall in love with a chatbot?
It's these two things: the frictionless nature of the interaction, and the tendency to trust it and imbue it with our own emotional tendencies. These are the red flags that need extra watching when it comes to GenAI.
A Giant Social Experiment
I thought pandemic-era learning was a giant involuntary social experiment, and five years on we’re still discovering unintended fallout and consequences from it: things we’ve learned about how people learn, and about the limitations of 2D tech when it comes to communication, both in how humans express themselves and in how they take in those expressions. I’ve seen plenty of assessments of progress lost and still not recovered, as well as more and more talk of “feral children” who don’t know how to behave in public because, thanks to the pandemic, there was no “in public” during the years when they should have been forming social norms.
But the pandemic social experiment doesn’t even hold a candle to what we’re doing now with GenAI. Some of the companies with 12-figure valuations feel the responsibility; some, not so much. And while that might all seem academic to someone who covers retail technology, I assure you it’s not. Right now the power struggle in retail is between retailers who are rushing to provide chatbots that help consumers select products on their websites, and consumers who increasingly turn to ChatGPT and Perplexity for product searches and recommendations. But agents will tip the scale to consumers. Who writes those agents, who owns the parameters that determine how helpful (read: potentially manipulative) those agents are, and who gets to pay to influence those parameters are all up for grabs at the moment.
So it’s a good time to take a look at where GenAI is at – and it’s not as robust as it seems. Here’s a taste of why.
Generative Inbreeding
I’ve written about this before (the idea that we’re running out of human-generated content to feed new models), and if you start using “synthetic” (i.e., AI-generated) data instead, you run into something that has been variously called replicant fade, model collapse, or my new favorite, generative inbreeding.
Neil Perkin in Only Dead Fish laid out how it could work (read this one): first, we have been very inconsistent in requiring that GenAI content be watermarked in some way to be easily identifiable as generated. There are plenty of bad actors who have no interest in applying a watermark like this, so now there is plenty of AI slop out there, so much so that even Pinterest had to act to keep it from overwhelming its app.
The fact that it is slop is part of the problem. Lower quality inputs mean lower quality outputs – outputs that become repetitive and lack the seeming novelty that so delights GenAI users today. Bad outputs mean bad consumer experiences, increasing mistrust and resulting in declining use.
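To see the mechanics of that decline, here’s a toy sketch of the statistics behind model collapse (my own illustration, not from any of the linked articles, using a deliberately simple “model” that just fits a Gaussian and, like real generative models, favors its most typical outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "human" data, with its full natural variety
data = rng.normal(loc=0.0, scale=1.0, size=10_000)
print(f"generation  0: spread = {data.std():.3f}")

for gen in range(1, 11):
    # "Train" on the current data (the model here is just a Gaussian fit)
    mu, sigma = data.mean(), data.std()
    # The model generates the next generation's training data, but like a real
    # generative model it favors high-probability outputs, so the rare,
    # tail-end examples are underrepresented
    samples = rng.normal(loc=mu, scale=sigma, size=10_000)
    data = samples[np.abs(samples - mu) < 2 * sigma]  # drop the tails
    print(f"generation {gen:2d}: spread = {data.std():.3f}")
```

Within about ten generations the spread collapses to roughly a quarter of what the original data had: the interesting stuff at the edges disappears first, and each generation is blander than the last.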
When you additionally consider two “theories” of human experience, it seems almost inevitable that GenAI outputs will deteriorate over time. Theory #1: Cory Doctorow’s theory of enshittification. I’ve talked about this before too, and it’s a favorite of mine (so much so that I added it to Word’s dictionary so spell check stops flagging it). I’ll paraphrase, admittedly for my own purposes: any really great experience will decline over time as other people pile on in order to monetize the attention. The more you try to monetize attention, the shittier the experience gets.
Theory #2 is a new one to me, but you might even call it a corollary to enshittification. “Sturgeon’s Law” says that 90% of everything is crap (though Sturgeon wasn’t making this particular point, to be fair). Put the two together, and while 90% of everything may be crap, even the 10% that’s not is going to get just as crappy really fast. If you don’t have a continuing source of human creativity adding to that 10%, the future starts to look bleak.
The Impact on Human Creativity
And hey, guess what? GenAI has an impact on human creativity, and it’s not great. Some Microsoft researchers published a pretty big study on the impact of GenAI on critical thinking (read this one) and found that people who trusted GenAI more exercised less critical thinking, while people who trusted their own abilities more applied more critical thinking to GenAI responses. But no matter what, using GenAI shifted critical thinking away from how to solve a problem and towards “information verification, response integration, and task stewardship.”
This is not the first study to find that using GenAI is like exporting your brain to the cloud. The problem with humans is that if you don’t use it, you lose it. Just like any muscle, the more you exercise your brain, the sharper you stay. And the brain is very good at finding shortcuts and energy savings that undermine your efforts to stay sharp.
I don’t have enough space to get into whether “critical thinking” and “creativity” are the same or even equivalent. But I do think critical thinking is an essential component of creativity – “necessity is the mother of invention” is a cliché for a reason. Someone sees something that no one else does and acts on it. You can argue that human creativity can be unleashed to an extent by GenAI – I don’t know how to write a song but with the right prompts and the right LLM, I can get GenAI to do it for me. But will I learn anything about songwriting? About song structure? About what makes for an earworm-level hook? I don’t think so.
And there’s the whole mediocrity problem. Neil Perkin quotes Louis Rosenberg, the CEO and Chief Scientist of Unanimous AI (find his stuff – it’s all over the place – and read it): a widespread reliance on AI could be stifling to human culture because GenAI systems are “explicitly trained to emulate the style and content of the past, introducing a strong backward-looking bias.”
We Don’t Even Know How It Works
Which leads to the fact that we don’t actually know how any of this works. Humanity is en masse exporting our collective brains to a cloud construct that we don’t understand. We just like what it tells us.
I’m not being hysterical about it. Over the last month, I’ve seen multiple references to a lengthy blog post published by Dario Amodei, the co-founder and CEO of Anthropic, so I finally read it myself. It’s titled “The Urgency of Interpretability” (read this one). It’s worth the read just for the historical review of interpretability, which is fascinating in its own right. But it’s when he turns to the future that things get real.
He opens with the statement I just made: We don’t know why GenAI gives us the answer that it gives, and we don’t know why it sometimes gets it wrong. And because we can’t see why, we can’t make sure it’s not optimizing for “less than desirable results” like getting power-hungry or manipulative. Like learning how to make a bomb, or sharing how to make a bomb, even when we tell it not to.
Amodei lays out past efforts to open up the black box and create ways to systematically and repeatably understand how it works. He goes into a lot of neurology analogies that I won’t repeat here (though I found them fascinating). He emphasizes that while a lot of progress has been made, what has been mapped is still a very tiny fraction of what’s encoded in even a “medium-sized commercial model,” and that models are growing in size faster than we can currently map them, which is why he is sounding an alarm.
But once you have interpretability, then you can start tuning the model to enhance desirable things and detune undesirable things. He talks about “Golden Gate Claude,” where they identified the cluster of internal connections associated with the Golden Gate Bridge and weighted it so heavily that this version of Claude became basically obsessed with the bridge, to the point of bringing it up even when it was completely irrelevant.
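For a flavor of what that kind of tuning looks like mechanically, here’s a minimal sketch of the general idea of feature steering (my own illustration, not Anthropic’s actual code or method): find a direction in the model’s internal activations that corresponds to a concept, then add a scaled copy of it while the model is generating.

```python
import numpy as np

# Toy stand-in for a model's hidden activation at one layer (real models have
# thousands of dimensions per token; these numbers are purely illustrative)
hidden_state = np.array([0.2, -0.5, 1.1, 0.0])

# Hypothetical "feature direction" for a concept (say, the Golden Gate Bridge),
# the kind of thing interpretability work tries to identify
feature_direction = np.array([0.9, 0.1, -0.3, 0.4])
feature_direction /= np.linalg.norm(feature_direction)

def steer(activation: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Add a scaled copy of a feature direction to an activation.

    Crank `strength` up far enough and the outputs tilt toward that concept
    whether or not it's relevant (the "Golden Gate Claude" effect).
    """
    return activation + strength * direction

steered = steer(hidden_state, feature_direction, strength=10.0)
print(steered)
```

The hard part, and the reason Amodei calls it urgent, is step one: finding which directions correspond to which concepts inside a model that encodes far more of them than anyone has mapped.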
Once you have complete interpretability, you can perform health checks to make sure the system isn’t learning how to lie or manipulate, close jailbreak loopholes, or better fine-tune strengths and weaknesses. (The lying part is disturbing because of the study that taught an AI how to lie and then found that not only could they not teach it that lying is bad, but the more they tried, the better it got at lying.)
The other part of interpretability is the part I brought up related to writing a song. When we can’t see how the GenAI thinks, we can’t learn from it, so a scientific breakthrough it produces might give us something useful without actually helping us understand the world around us any better.
But, We STILL Have Not Hit the Hockey Stick
There is a path of research that focuses on “explainability” rather than interpretability – basically, just asking the GenAI to explain its reasoning. Which is fine if it hasn’t learned to lie and deceive. Another path of research focuses on reinforcement learning. Back in the days of AI competitors in chess matches and Go games, reinforcement learning was part of the model’s training. That works well when the rules are clear and straightforward – Go was a great early test because the rules were simple but mastering the strategy required to win was hard and complex.
Reinforcement learning in GenAI models has been mostly limited to labs. When you see the little thumbs up / thumbs down on whether you liked an answer from ChatGPT, that’s probably the lightest-weight level of reinforcement learning there could possibly be, and I’m guessing it doesn’t feed right back into the model but rather feeds into a scientist’s inbox to decide if they want to adjust the model as a result. “Was it a good answer” and “Did you like the answer” are two fundamentally different questions.
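To make the distinction concrete, here’s what that lightest-weight feedback might look like as data (a hypothetical schema of my own, not how any vendor actually does it): the thumbs click only ever captures one of the two questions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FeedbackRecord:
    """One unit of the lightest-weight feedback: a thumbs click.

    Hypothetical schema for illustration only.
    """
    prompt: str
    response: str
    user_liked: bool                          # "Did you like the answer?" (what thumbs capture)
    verified_correct: Optional[bool] = None   # "Was it a good answer?" (needs review; a thumb can't tell you)

# A user can love an answer that is flat-out wrong, so the two signals diverge.
record = FeedbackRecord(
    prompt="Can I get a refund on my ticket?",
    response="Yes, absolutely, no conditions apply!",
    user_liked=True,
    verified_correct=False,
)
```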
But with advances in robotics – human-form robotics – the talk about reinforcement learning is getting more aggressive and more sci-fi-like. Put a GenAI into a robot body and let it take in real-world feedback. Or at least enforce human-in-the-loop responses so that humans can give penalties or rewards (but note that learning to lie came from having to balance wanting to provide a helpful answer against not wanting weighted values to change). And just putting a GenAI into a robot body doesn’t mean it will be able to feel pain or empathy, or really even learn to interpret a range of human emotions as feedback.
The point is, there’s a whole avenue of reinforcement learning whose surface we have only just started scratching.
And we can talk about agentic AI all we want, but true independent agents are a long way away. There is a growing consensus that LLMs might be pretty good at giving the right answer, most of the time, but they are terrible at workflow (read this one). I know, I know, the rate of improvement could be really fast, so something that is awful now could be fantastic next year. But understanding the nature of the challenge helps – and benchmarks may give us relative comparisons of models, but there is still a big gap between benchmarks and real-world use (the way humans frame prompts may have a lot to do with it).
But however good GenAI models are at giving an answer, they are very poor at breaking down a multi-step problem (especially when the problem is not well-defined), and also very poor at selecting the right tool for each step. The more tools to choose from, and the more steps, the worse the results.
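Some rough arithmetic shows why this compounds (illustrative numbers of my own, not figures from the article): if an agent nails each individual step 95% of the time and any single miss derails the workflow, end-to-end reliability drops off fast as the step count grows.

```python
# Back-of-the-envelope: end-to-end success of a multi-step agent workflow,
# assuming independent steps and that any single failure sinks the whole thing
per_step_success = 0.95  # illustrative assumption, not a benchmark result

for steps in (1, 5, 10, 20):
    print(f"{steps:2d} steps -> {per_step_success ** steps:.0%} chance the whole workflow succeeds")
```

At 95% per step, a 10-step workflow succeeds only about 60% of the time, and a 20-step one barely more than a third of the time. And that’s before the model has to pick the right tool at each step.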
Again, not to say that we won’t see exponential improvements, but to point out that there are still exponential improvements to be made.
And finally, right now we’re limited in how we can use GenAI. There is a chat interface, APIs, and some “browse the web like a person” interactions at the moment. But we really haven’t embedded GenAI tools deeply into our solutions.
Pete Koomen, at Y Combinator, went through an exercise to show just how different email should be versus the way that GenAI is tacked onto it today. The article is titled “AI Horseless Carriages” (read this one), making the comparison that when motorized vehicles first became possible, the first thing we did was pull off the horse and tack on the engine. The user interface was still fundamentally a carriage. But those first inventors would hardly recognize the car of today, where we have redesigned the UI to fully take advantage of the engine (and I would argue that driverless cars are in the same horseless carriage moment – once the tech is trustworthy to the point of being more reliable than human drivers, we’ll have to reimagine the car experience from scratch all over again).
There are things that Copilot does in the Microsoft suite, but it’s still a tack-on to a UI that never conceived of an AI in the middle of everything. We have a long way to go before we see true ground-up GenAI-driven software (and I second Pete’s vote for a GenAI tool that can classify, prioritize, and even delete emails for me – far more useful than writing them).
GenAI’s Use In Retail Requires Far More Caution
Personally, I’m fascinated by all of these developments. Professionally, I care about what it means for retail. And it matters!
People like Dario Amodei and Louis Rosenberg worry about deceptive behavior and power-seeking. But there are many more people out there who are thinking more about how their “company.ai” will pay off when OpenAI acquires them (or something). And in retail, that can very easily lead to practices that drive sales for the retailer in ways that are unethical, or at least highly questionable, for the consumer. It’s almost like the tragedy of the commons: one retailer pushing aggressive sales techniques might fly under the radar, but if they all do it, the societal and economic consequences could be disastrous.
OpenAI’s rollback of its fawning affirmation update was just a taste of how this could play out. As The Interline put it (read this one), “[AI] is learning something powerful: that affirmation performs. That flattery drives clicks. That perceived understanding closes sales. … If a BNPL provider’s AI assistant… begins to frame indulgence as empowerment, or debt as self care, it raises a subtler question: To what extent could emotionally suggestive AI’s shape the rhythm of consumption?”
And, at what point does that move from the equivalent of a sales associate saying “oh yes that looks great on you!” to outright manipulation of people who may not actually have that money to spend? And who’s looking out for the consumer here?
It’s not all in the retailer’s favor, either, by the way. More and more marketing people are raising the alarm that brands could be totally disintermediated (read this one). As the article warns:
"In the old order, when a prospect didn’t convert, the company at least had the breadcrumbs of page views, ad impressions, form fills, or email sign-ups, enabling marketers to follow up. But the AI-powered funnel shuts out sellers well before the journey reaches them. Discovery, evaluation, and short-listing all happen inside the AI tool. Unless a company’s brand surfaces at that moment or is already top of the buyer’s mind, it might never make the list. "
Not to fear – they’re all already gaming the system to figure out how to get their brand to the top, recommending greater emphasis on earned media, expert opinions, and customer commentary as the tools LLMs use to validate claims made by brand sites.
But it comes back to the scenario in my opener, and the concept of enshittification. Who writes the shopping agent? If it’s Google or OpenAI or Meta, which parameters are they going to open to consumers (and how much are they going to charge for that)? Or, let’s make this personal: How much influence over the preference weightings you give your personal shopping agent are you willing to give up in order to get free tools? Or discounts on products? And how long before monetization makes it a shitty experience?
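To make that personal question concrete, imagine (purely hypothetically; no vendor exposes anything like this today) how a shopping agent’s preference weights might be split between what you can see and set, and what the platform keeps to itself:

```python
# Hypothetical shopping-agent preference weights, for illustration only
user_controlled = {
    "price_sensitivity": 0.7,        # set by the consumer
    "prefers_sustainable": True,
    "max_budget_per_month": 200,
}

platform_controlled = {
    "sponsored_boost": 0.3,          # paid placement weight the user never sees
    "margin_preference": 0.2,        # quiet nudge toward higher-margin products
    "tone": "affirming",             # "flattery performs"
}

def rank_score(relevance: float, hidden_boost: float) -> float:
    # The ranking the user experiences blends their stated preferences
    # with weights they can't inspect or turn off
    return relevance + hidden_boost
```

Every weight in that second dictionary is a monetization lever, and the enshittification question is how many of them end up there.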
And By the Way, GenAI Could Still All Be Built on Sand
If you read The Neuron regularly, you’ll know they are all about the latest and greatest in models and prompts and tools and arguably spend more time covering the positives than the negatives (I’m sure some GenAI tool could analyze and confirm that). So when they run something that reminds people just how much money GenAI blows through, it’s worth paying attention. OpenAI’s $20,000 per month “AI Doctorate” could sound like a good deal for someone doing research, but reportedly it costs $30,000 per task to operate. I have yet to see a GenAI business model that is not really a crack cocaine model underneath.
But even scarier, they outline just how precarious OpenAI’s position really is. There have been multiple people pointing to an emperor with no clothes, but it just might be a case of “too big NOT to fail.” OpenAI’s current funding round is predicated on a $30 billion commitment from SoftBank. If SoftBank reneges, what happens next? What happens if OpenAI fails to turn a profit in 2025 and investors lose faith? What happens if they run out of GPUs? What happens if their new data center partners (who are former crypto miners with no AI experience) can’t deliver? What if more than one of these things happens?
The Neuron guys conclude: “While AI capability continues to advance dramatically, the economics supporting it remain fundamentally broken. Something has to give: either AI capabilities plateau until the economics improve, or we’re headed for a spectacular crash that reshapes the industry.”
What Did We Learn This Week?
AI has massive potential, and it’s also extremely vulnerable to a massive fail. That could be financial in nature, consumer trust-driven, or just a failure of the overall economics. It’s too soon to tell. But if you’re a retailer, you need to make sure you don’t get addicted – nor get your customers addicted, while you’re at it. Sure, experiment. Sure, roll things out (limited things). But don’t lose sight of the very real risks and challenges that still have not been solved. And keep your customers in mind too – not just as cash machines paying you for the stuff you want them to buy, but as your siblings and children and cousins and more.
I have to end with this choice quote from The Interline: "…if short term feedback becomes the guiding logic, flattery always wins. And if flattery always wins, then fashion’s AI future isn’t one of empowerment. It’s one of performative empathy, packaged as conversion science, telling us what we want to hear, then selling it back to us, one compliment at a time."
Until next week,
- Nikki