Language Model UXes in 2027

Recently, I started working on a new LLM-oriented project. It’s currently far from feature-complete and probably filled with bugs, but I’m hoping to use it as a platform for exploring a question I’ve been considering a lot lately: what will LLM-powered software look like in the medium-term future? It seems undeniable that LLMs are bringing about a significant shift in how we interact with computers, but the current state of those LLM-mediated interactions seems distinctly rudimentary. Using an LLM today (via a chatbot, a copilot, or whatever other UX) still feels rife with kludges, inconsistencies, frustrations, and limitations. Natural language is a fascinating and liberating interface, but it seems increasingly clear that it can’t quite replace all the other innovations that’ve occurred in digital UX over the last several decades. I’ve read several interesting articles recently (Maggie Appleton, Nick Arner, and Linus Lee, among others) that imagine ways we might improve on the LLM UXes we have today. With this post, I’m hoping to approach the question from the opposite direction: if one were to imagine the LLM-oriented UXes of, say, 2027, what properties would those UXes have? I’m writing it primarily for my own benefit, because writing is thinking, but I’m also hoping it’ll inspire others on the internet to fill in the gaps and let me know where I’m wildly off-base. Some of these ideas are partially implemented (for a developer-centric audience) in Lightrail, some of them are on my roadmap, and some of them I have no real idea how to begin implementing effectively; nevertheless, I’m hoping that defining these features is the first step towards a world where they exist.

1. Chat UX Consolidation

Lately, I’ve heard the opinion that LLMs are a harbinger of the end of distinct application-specific UX. Basically, the thinking goes, once we’re able to express our desires to computers in natural language, and once natural-language agents can carry out actions on our behalf, we’d no longer need the myriad of individual task-specific applications that exist today. Personally, I neither believe in nor desire this outcome. Along the lines of Dijkstra’s repudiation of natural language programming, I don’t believe that natural language is an adequate medium for conveying instructions with the precision required for many applications. Moreover, the “inefficiency” involved in carrying out many digital tasks “manually” is a core driver of the joy that comes with using computers (for me and, I assume, for at least some subset of others). Even if an LLM could provide me with a recipe that perfectly suits what I’m looking for, I wouldn’t want to give up the experience of using a recipe search engine and browsing through a large collection of recipes. Even if an LLM could provide me with restaurant recommendations based on my preferences, I’d still seek out a map-based UX for exploring the variety present in my vicinity. The desire to replace all UX with LLMs seems like a desire to replace all serendipity with efficiency, and I think (or hope) that such a transition is much more appealing in theory than it would be in practice.

With that being said, I do think that chat and chat-adjacent UXes specifically will see broad consolidation into a single point-of-contact. The current state of having a chat-ish UX that’s specific to each tool or website (e.g. a Documentation Chat on a library documentation page, a VSCode extension, a Google Bard integration, a really badly implemented chatbot in my banking app) doesn’t make any single one of those experiences more enjoyable, effective, or entertaining; it just makes all of them slightly idiosyncratic. If we’re moving towards a world where most software and digital experiences will be able to react to natural language commands, any steps that currently stand between me and the interface where I can deliver those commands (i.e. opening a functionality-specific app or website) seem like they’ll ideally disappear in the long run. LLMs will probably be used under the hood in all sorts of contexts, but the ones that are obviously/outwardly language-model-driven will collapse into each other. Beyond just leveraging the power of LLMs most effectively, part of why I’m betting on this is that having separate chat interfaces makes it much harder to deliver most of the other items on this list, like…

2. Persistence Across Uses

The role of persistence in LLMs currently is an interesting one. In a sense, LLMs and natural language UX make it especially easy to deliver short-term persistence—the “conversation” is the basic unit of interaction, so the context of previous requests/actions/statements is automatically included in the current one. It’s easy for LLMs to contextualize my current intentions within the scope of my previous ones, and interpret them accordingly. The idealized role of persistence in LLM UX is also fairly obvious: it’s easy to imagine an LLM-powered experience that remembers and “understands” all my previous interactions with it, and uses that information to better help me with whatever my current task is. Given the strengths that computers have had historically, it’s almost ironic that the understanding part of that vision is fairly robust today, while the memory aspect is what’s lacking. Context length limitations and the static nature of LLM weights make it difficult to genuinely implement long-term memory, and approaches like retrieval-augmented generation (RAG) show promise but seem to still have quite a lot of kinks left to work out before they can serve at scale as a persistent memory bank.

However, since this post is (thankfully) speculative rather than prescriptive, I’ll just stick to the claim that by the 2027-ish timescale, we’ll have figured out a way to implement this sort of long-term persistence such that LLM UXes can reliably incorporate information from any point in my previous usage. Perhaps that’ll be via more sophisticated RAG, or just incredibly long context lengths (maybe via a different architecture, like RWKV), or maybe it’ll be some exciting advance in our understanding of LLM training. I imagine this capability will be a significant catalyst for bringing about the previous feature, Chat UX Consolidation—if I’m interacting with UXes that can remember and utilize my previously expressed preferences, desires, goals, and information, using that underlying memory across different use cases seems like low-hanging fruit for drastically reducing friction.
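
To make the RAG path a bit more concrete, here’s a minimal sketch of what retrieval-backed memory might look like: past interactions get embedded and stored, the most relevant ones get recalled for each new request, and they’re prepended to the prompt. Everything here is illustrative—the bag-of-words `embed` is just a stand-in for a real embedding model, and none of the names refer to an existing product or API (Lightrail’s included).

```python
# Illustrative sketch of retrieval-backed long-term memory for an assistant.
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Placeholder embedding: a bag-of-words vector. A real system would use a
    # learned embedding model here instead.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Stores past interactions and recalls the ones relevant to a new query."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def remember(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Retrieved memories become extra context for the next model call.
memory = MemoryStore()
memory.remember("User prefers vegetarian recipes.")
memory.remember("User is learning Rust this month.")
recalled = memory.recall("vegetarian dinner ideas")
prompt = (
    "Relevant past interactions:\n"
    + "\n".join(recalled)
    + "\n\nUser: what should I cook tonight?"
)
```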

3. Universal Access (& Multimodality)

In a similar vein, I imagine we’ll soon give LLMs access to not just the information we paste into a chatbox (or have open in an application with LLM-powered features), but a broad swath of content across our digital existence. A small-scale and developer-centric example: I use GitHub Copilot in VSCode, and I was recently working with a library whose documentation featured LLM-powered Q&A, and it felt bizarre to have two LLM-mediated experiences open that each had exactly half the info needed to solve my problem. If LLM UXes consolidate into a single, long-lived assistant, I’d expect that assistant to be pulling information from my browsing history, the files on my laptop, files in the cloud, and any other textual source of information. The fundamentally incredible thing about the modern LLM is that it turns text into actionable meaning, and it just so happens that the vast majority of the content I create or consume on my laptop is textual in some form or another. A relatively small minority is in image or video format, but I don’t expect that to be an impediment, since multimodal LLMs are already imminent.

In addition to the content itself, I’d expect LLM UXes to have a much better understanding of my current and past interactions with the other, non-LLM UXes. Pronouns have long posed a challenge in natural language processing, and the latest iteration of this problem seems to be using them to refer to other things on my screen. I expect to never need to copy-paste anything into a chat box; using “this” or “that” (or some version thereof) to refer to windows, selections of text, or things that I’m hovering over feels like functionality that’s necessary for a smooth interactive experience. I won’t belabor this point too much more since Linus Lee discussed it at length, and I’m sure I can’t explain it better, but it does seem like an essential and logical UX progression.
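
As a rough illustration of what that could look like mechanically: a system might snapshot the active window, the current selection, and whatever is under the cursor, and pass that snapshot along with the message so the model can ground “this” and “that” against it. The structure and field names below are my own invention—a sketch of the idea, not anyone’s actual implementation.

```python
# Sketch: resolving deictic references by packaging screen context with the request.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreenContext:
    active_window: str                     # e.g. the focused app or document title
    selected_text: Optional[str] = None    # whatever the user has highlighted
    hovered_element: Optional[str] = None  # description of the element under the cursor


def build_prompt(user_message: str, ctx: ScreenContext) -> str:
    parts = [f"Active window: {ctx.active_window}"]
    if ctx.selected_text:
        parts.append(f"Current selection:\n{ctx.selected_text}")
    if ctx.hovered_element:
        parts.append(f"Element under cursor: {ctx.hovered_element}")
    # The model can now map "this" or "that" onto the context block above,
    # instead of requiring the user to copy-paste.
    return "\n".join(parts) + f"\n\nUser: {user_message}"


print(build_prompt(
    "Summarize this and add it to my notes.",
    ScreenContext(
        active_window="quarterly-report.pdf",
        selected_text="Revenue grew 14% quarter-over-quarter...",
    ),
))
```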

3b. (Hopefully) Commodified, Local LLMs

The more privacy-concerned reader might currently be a bit alarmed by the last two points. To be clear, these are predictions first and foremost; while I don’t necessarily love the idea of {OpenAI/Microsoft/Google/Whoever} hoovering up enough of my data to deliver that sort of functionality, I do think that historically, users have been fairly willing to sacrifice data privacy for a convenient and seamless experience, so that’s what I’m expecting will happen here as well. I’m hoping, however, that these developments occur in tandem with the commodification of effective/high-performance LLMs and the hardware needed to run inference with them. In an ideal world, I’d like to have the underlying model be a swappable implementation detail. Llama 2 and similar developments make me optimistic. My (potentially naive) hope is that if the LLMs and their memory banks can be run and maintained locally, they could even reduce the need for large-scale data collection by domain-specific businesses: interacting with my LLM (in a way I permission) could give third-party services exactly the information they need to personalize my experience, without relying on heuristics and models built on broad, non-specific personal data. However, this is all more of a wish than a conviction; currently, locally runnable and open models are no match for GPT-4 (etc.), and I have no real reason to believe that the winning players in the LLM UX space will develop with these concerns in mind. 🤞

4. Dynamically Generated UI & Higher Level Prompting

Today’s LLM UXes lean heavily on chat & manually written prompts, but I expect that to change rapidly. Prompting often seems more like art than science, and it’s full of unexpected nuances that can make it much less accessible than it initially seems. Text is a broadly intuitive interface, but in many domains, it’s much worse (in terms of discoverability & accuracy) than domain-specific controls. Similarly, the text outputs of LLMs are often not the optimal presentation of the information or task that was requested. Within a few years, I expect to see LLM UXes leaning much more heavily on (a) structured or semi-structured input and (b) dynamically generated UI elements based on or as part of responses. I expect a lot more suggestions for possible actions or ways of accomplishing tasks, a lot more behind-the-scenes prompt construction that simply interpolates user inputs, and a lot more incorporation of UI elements like toggles, sliders, and buttons that generate specific prompts upon interaction. On the output side, I expect interactivity as part of the LLM’s response, such that instead of sending me text, the system could present me directly with the UX I need to carry out any parts of the task that it couldn’t handle internally. I also expect LLMs’ ability to use tools and interact with other software to improve rapidly (see Adept, Meta’s Toolformer, or even ChatGPT Plugins). Conversely (and more speculatively), other applications and websites that don’t rely on LLMs themselves might still integrate functionality that reacts to LLMs, if the idea of a consolidated touchpoint for LLM interaction comes to pass. For example, assuming I have a single LLM “assistant” with long-term memory and a far-reaching context, I could imagine websites I visit using a permissioned API to make requests to that LLM and using the responses to change the information or UI that I’m shown—I’m not certain exactly what that’d look like yet, but it’s an idea I’ve been coming back to a lot lately.
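
For the input side, here’s a toy sketch of what I mean by prompt construction that simply interpolates user inputs: the controls are ordinary UI widgets, and the prompt is assembled behind the scenes. The template, parameter names, and “summarize” task are all invented for illustration, not a description of any existing tool.

```python
# Sketch: UI controls (slider, dropdown, toggle) feed a prompt template,
# so the user never writes or sees the prompt itself.
from dataclasses import dataclass


@dataclass
class SummarizeControls:
    length: int = 3               # slider: number of bullet points
    tone: str = "neutral"         # dropdown: "neutral", "casual", "formal"
    include_quotes: bool = False  # toggle


def build_summarize_prompt(document: str, controls: SummarizeControls) -> str:
    quote_clause = (
        "Include one short direct quote per point."
        if controls.include_quotes
        else "Do not quote the document directly."
    )
    return (
        f"Summarize the document below in {controls.length} bullet points, "
        f"using a {controls.tone} tone. {quote_clause}\n\n"
        f"Document:\n{document}"
    )


# The user only moves a slider and flips a toggle; the prompt is assembled for them.
print(build_summarize_prompt("…full document text…", SummarizeControls(length=5, include_quotes=True)))
```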

5. Proactive Interactions

Finally, if we can build centralized LLM UXes that have access to this wealth of information, I’d expect them to make proactive suggestions rather than always requiring me to request actions. Of course, anyone (of above a certain age) reading that sentence is probably thinking of the horrors of Clippy. And, yes, there are a million ways in which a proactive UX could go horribly wrong. However, just like ChatGPT delivered a genuinely useful product standing on the trash heap of pre-LLM chatbots, I’m hoping that future attempts at this sort of proactivity hew much closer to the ideal of “the right thing at the right time.” Maggie Appleton’s concept of LLM Daemons is an interesting potential implementation whereby the LLM provides proactive (but unobtrusive) suggestions only within a certain context and with clearly defined goals. GitHub Copilot is also a pretty good implementation of a proactive UX, albeit one with strict limitations. On average, I expect that the balance of me-going-to-the-LLM vs. the-LLM-coming-to-me will shift in favor of the latter as time progresses.
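
To gesture at how that kind of scoping might work in practice, here’s a tiny sketch of a single-purpose daemon that only speaks up when it clears a confidence threshold. The `ask_llm` placeholder, the confidence mechanism, and the “unsupported claim” task are assumptions of mine, not a description of Appleton’s proposal or of any shipping product.

```python
# Sketch: a narrowly scoped, proactive daemon that stays silent by default.
from typing import Optional, Tuple


def ask_llm(prompt: str) -> Tuple[str, float]:
    # Placeholder: a real implementation would call a model and return
    # (suggestion, confidence).
    return "", 0.0


def unsupported_claim_daemon(paragraph: str, threshold: float = 0.8) -> Optional[str]:
    suggestion, confidence = ask_llm(
        "You are a writing daemon with one job: flag unsupported claims.\n"
        f"Paragraph:\n{paragraph}\n"
        "If you find an unsupported claim, suggest a fix; otherwise reply with nothing."
    )
    # Only surface the suggestion when it clears the bar, rather than
    # interrupting by default the way Clippy did.
    return suggestion if confidence >= threshold else None
```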

In Summary

I’m sure those five features are far from a comprehensive picture of LLM UX, but taken in aggregate, I think they’re a pretty good indication of where things are headed. A single entry-point for most explicitly LLM-driven interactions, one that has access to a memory bank of all my past interactions with it as well as all the consumable data that makes up my digital life, that can use current UX best practices as layers on top of both prompt-construction and LLM outputs, and that can take helpful actions proactively based on my current task — I’d wager that’s a future we’ll start to see very soon. Of course, that’s a pretty high-level description, with plenty of details left to flesh out, something that I hope to do in future posts and my own experiments. Still, I’m hoping it’s a concrete enough prediction to be contestable, and I’d love to hear objections!

~ Vishnu