The Rise Of The Multimodal LLM
https://www.forbes.com/sites/johnwerner/2026/05/22/the-rise-of-the-multimodal-llm/
Publish Date: 2026-05-22 15:53:00
Source Domain: www.forbes.com
There’s a new bit of jargon in the AI world, but it’s more than just a detail. It involves adding a familiar letter to a familiar acronym, and although that may sound like a small change, catching up on it might feel a little like déjà vu.
Do a quick conventional search for “LLMM.” You won’t come up with much, unless you check out the AI overviews, where Gemini in Google or Copilot in Bing tells you what this is.
“MLLM” does a bit better – you might find a result from IBM, some academic papers, and a page from GitHub. But the idea of the Multimodal Large Language Model, or to some, the Large Language Multimodal Model, hasn’t really made it into the mainstream, to places like CNBC or Newsweek. It’s still sort of the province of the true tech geek – for now.
What is a Multimodal Large Language Model?
The essential concept of a Multimodal Large Language Model is that it works on different kinds of data, with the implication that it does so through specific design choices. PhD researcher and engineer Sebastian Raschka defines the MLLM this way on a self-published platform:
“Multimodal LLMs are large language models capable of processing multiple types of inputs, where each ‘modality’ refers to a specific type of data—such as text (like in traditional LLMs), sound, images, videos, and more.”
If you assume that the machines do this by attaining something like a sophisticated form of distillation, you’d be right. But there’s another component to this, too. In some ways, it sounds like engineers are going back to the well of using classical ML techniques to enhance what an LLM, as a central “brain,” can do.
This starts with attaching sensor tools to the LLM itself, to bring that multimodal data in.
“Recent research shows that Multimodal Large Language Models (MLLMs) can…