Generative AI and the Burden of Interface Complexity
Who should do the heavy lifting of interface complexity? Definitely not the users!
Most AI applications today are either a bare AI chat interface or the original application with an AI chat window glued on top. Apparently, all it takes today is to take the chat window, slap it over any application, and we're done - now it's time to put the AI-powered, AI-driven, or AI-whatever label on the company website. Profit!
One gets almost dizzy from this inflation of chat interfaces. Of course, chat, conversational, or textual interfaces are valid and robust design concepts in specific use cases - talking to other human beings online or working in a command-line interface, sure. But trying to squeeze all the commands into that tiny text field? No, please! We've been there - remember MS-DOS?
Author and designer Golden Krishna wrote a whole book around the statement that the best interface is no interface, and the engineers and designers of current AI products apparently took notice. It's a valid concept, but it has been taken too far in recent AI tool development - mainly because AI systems, like any other systems, have a certain level of complexity that cannot be designed away into the form of a simple chat interface.
As stated by Tesler's law, for any system, there is a certain amount of complexity that cannot be reduced. It implies three main takeaways:
The complexity needs to be assumed by either the system or the user.
As much as possible, the burden should be lifted from users.
We should not simplify the interface to the point of abstraction.
I argue that most of today's AI systems break this law by shifting most of the complexity burden onto users. All the heavy lifting of controlling the application is left to them - the UI controls are gone, and all that remains is the chat interface. That demands unnecessary effort from users, who have to remember various commands to control the results of generative AI.
We need to strike a balance between the chat interface and traditional UI controls - not to design away so much of the interface that it becomes detrimental to the user experience.
I think that the commands sent in the prompt to an AI system can be divided into two main categories. Let's look at generating images as an example, but the distinction works for other types of tools, too:
Creative commands - what should be generated - the subject and its properties
Technical commands - how it should be generated - artwork or photo, desired look, image aspect ratio, creative style, artist simulation, colors used, etc.
The distinction might be a little arbitrary, with a grey zone in the middle. But I still believe it serves a point applicable to the UX of such tools:
Technical commands should be lifted out of the textual prompt, and users should be able to specify them with traditional UI elements like buttons, selectors, or sliders.
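To make the split concrete, here is a minimal sketch of how such a hybrid tool could separate the two categories in its request payload. The type names, fields, and values are hypothetical - a real image-generation API will differ - but the idea is that the creative command stays in free text while the technical commands become structured fields that buttons, selectors, and sliders can drive:

```typescript
// Hypothetical request shape for an image-generation tool (illustrative only).
// The creative command stays in free text; the technical commands become
// structured fields that buttons, selectors, and sliders can set.

type AspectRatio = "1:1" | "4:3" | "16:9" | "9:16";

interface TechnicalSettings {
  contentType: "photo" | "art";   // artwork vs. photo
  aspectRatio: AspectRatio;       // picked with a selector, not typed
  style?: string;                 // e.g. "watercolor", chosen from a style list
  colors?: string[];              // palette picked with a color control
}

interface GenerationRequest {
  prompt: string;                 // creative command: the subject and its properties
  settings: TechnicalSettings;    // technical commands, owned by the GUI
}

const request: GenerationRequest = {
  prompt: "a lighthouse on a rocky coast at dawn",
  settings: {
    contentType: "art",
    aspectRatio: "16:9",
    style: "watercolor",
    colors: ["#1b3a5c", "#f2a65a"],
  },
};
```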
These hybrid user interfaces were already mentioned by Jakob Nielsen in the article AI Is First New UI Paradigm in 60 Years: "Future AI systems will likely have a hybrid user interface that combines elements of both intent-based and command-based interfaces while still retaining many GUI elements." I could not agree more - I strongly believe this is the way forward to bring AI capabilities to wider audiences.
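As a rough illustration of what that hybrid loop could look like, the sketch below reuses the request type from the previous example: the free-text prompt carries the creative intent, and a change in a GUI control simply re-issues the generation with updated settings. Both generateImage and onAspectRatioChanged are assumed names for illustration, not any real product's API:

```typescript
// Sketch of the hybrid loop (reuses AspectRatio and GenerationRequest from above).
// `generateImage` stands in for a hypothetical API client - not a real library call.

async function generateImage(req: GenerationRequest): Promise<string> {
  // Placeholder: a real client would POST `req` to the model's endpoint
  // and return the URL of the rendered image.
  return `https://example.com/images/${encodeURIComponent(req.prompt)}.png`;
}

// The user flips the aspect-ratio selector; the prompt is reused untouched,
// so nothing has to be retyped into the chat box.
async function onAspectRatioChanged(
  current: GenerationRequest,
  ratio: AspectRatio
): Promise<GenerationRequest> {
  const next: GenerationRequest = {
    ...current,
    settings: { ...current.settings, aspectRatio: ratio },
  };
  const imageUrl = await generateImage(next);
  console.log("Re-rendered with aspect ratio", ratio, "->", imageUrl);
  return next;
}
```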
Let's now look at how Adobe Firefly implements this hybrid interface. Firefly starts with a standard AI chat interface that already contains animated prompt suggestions together with images in the background. These hints are great for overcoming the user's initial fear of an empty chatbox:
After the user enters a prompt and a few seconds of processing, the following interface is shown:
The interface combines the conversational input for creative commands with traditional UI controls for technical commands - this combination offers a wide range of possibilities for manipulating the image and creating different artistic styles without typing all the commands manually. The difference in user interface and experience is striking when compared with other image-generating tools like Midjourney or DALL-E via ChatGPT:
With Firefly, Adobe did a great job on the hybrid user interface - it nicely combines creative and technical controls into one cohesive whole, allowing even less experienced users to create a wide range of image styles. You might argue that the comparison is unfair since ChatGPT is a universal tool while Firefly specializes in image generation. I still think the bare chat window is a lazy solution that is hostile to users. Since ChatGPT invokes a specific image-generating model (DALL-E), it should also invoke image-specific contextual controls to accompany the bare chat interface. But that is a topic for a different article.