AI21 Labs is a bit like an Israeli rejoinder to U.S.-based OpenAI. It is both a research lab, doing cutting-edge work on natural language processing (NLP), and also a commercial business, hoping to quickly push those state-of-the-art developments into products that real businesses can use—and pay for.
AI21 Labs was founded by Yoav Shoham, an emeritus professor of artificial intelligence at Stanford University; Amnon Shashua, a founder of autonomous driving software company Mobileye, which was acquired by Intel; and Ori Goshen, a founder of crowdfunding platform CrowdX. The company’s lofty goal is “reimagining the way people read and write, for the better.”
The lab has built a new system that it somewhat cheekily calls “Miracle,” a friendlier version of MRKL, an acronym for Modular Reasoning, Knowledge and Language system. MRKL is important because of what it says about four key trends in how businesses will use A.I. going forward.
First, MRKL is designed to handle all kinds of natural language tasks, not just one specific job, as most such systems did until recently. For instance, the A.I. behind a customer service chatbot could not also help analyze the sentiment of CEO earnings calls. But now a single NLP engine can handle both tasks. It is another example of the genuine revolution in NLP and the impact it is starting to have on business.
The second, and closely related, trend to note is that these general-purpose NLP systems will increasingly be built upon “ultra-large language models,” single algorithms that learn billions of statistical relationships between words. They are trained on vast amounts of text scraped from the internet, including books written in English and other languages, as well as public sources like Wikipedia and Reddit threads. Most of these systems are trained either to predict a missing word in a sentence or the next word in a sentence. But it turns out, when you build an A.I. system that big and train it to do one thing, it’s also able to do a lot of other things with little to no additional training: translation, answering questions, and writing original passages of text.
What’s more, with just a little more training on a relatively small number of examples, these large language models can often outperform smaller A.I. systems that were trained on big data sets—often curated at great expense—to accomplish just one narrow task. It is this ability to perform with “little data” that makes ultra-large language models so potentially attractive to business because using them could be faster and cheaper.
Perhaps the best-known example of an ultra-large language model available for commercial use is OpenAI’s GPT-3. OpenAI has a close relationship with Microsoft, which invested more than $1 billion in the company, and, unsurprisingly, Microsoft has incorporated GPT-3 into a product that automatically writes computer code. It also makes the technology available to its Azure cloud customers.
AI21 Labs has its own ultra-large language model called Jurassic-1 that it released commercially last year and that it claims is superior to GPT-3, partly because it has a larger “token vocabulary.” That refers to the number of words and parts of words it knows. Jurassic has a token vocabulary of more than 250,000, five times GPT-3’s.
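To make "token vocabulary" concrete, here is a minimal Python sketch. It assumes a toy greedy longest-match tokenizer and two made-up vocabularies; production models use more elaborate schemes, and nothing here reflects the actual vocabularies of Jurassic-1 or GPT-3. It illustrates one commonly cited effect of a larger vocabulary: the same text tends to split into fewer pieces.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of text into vocabulary entries (toy scheme)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical vocabularies, invented for this illustration only.
SMALL_VOCAB = {"lang", "uage", " ", "model", "s"}
LARGE_VOCAB = SMALL_VOCAB | {"language", " models"}

small = tokenize("language models", SMALL_VOCAB)
large = tokenize("language models", LARGE_VOCAB)
print(len(small), small)
print(len(large), large)
```

With these made-up vocabularies, the smaller one needs five pieces for "language models" and the larger one needs two.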
There are some well-documented problems with these ultra-large language models, including that they can be prompted to spit out toxic language. But another giant flaw is that they have a tendency to produce inaccurate information in response to factual questions.
For instance, ask GPT-3 to add two plus two, and it will confidently tell you four, but ask it to add several four- and five-digit numbers, and chances are that it will just as confidently spit out the wrong answer. Ask it what the weather is like in New York currently, and it will tell you, but the answer will likely be the temperature in New York whenever data from AccuWeather was scraped into its training set, not today’s weather. The same problem applies to questions about current events or even science. And because these large language models are so big, they are extremely expensive to train, costing millions of dollars, so it is not practical to retrain them constantly to keep their data up-to-the-minute.
This is the problem AI21 Labs set out to solve with MRKL (I have written before about one of the lab’s previous innovations). Which brings us to the third big trend that MRKL represents: MRKL is a hybrid system. It doesn’t only use deep learning, the A.I. method that is responsible for most of the big leaps forward in the technology over the past decade. Instead it combines different modules, some of which use deep learning, and some of which use an older form of A.I., symbolic reasoning, to provide accurate, up-to-date responses to factual questions.
The clever thing about MRKL is a module called a router that takes a question from a user and figures out what kind of information the user is seeking. If the question involves mathematics, it sends that question to a plain, old-fashioned scientific calculator. If it involves exchange rates, it routes it to a currency converter. If it is about weather, it sends it to a forecasting website. There are 55 of these task-specific modules that MRKL currently supports, according to Shoham. If the router is unsure which module is best, it calls on Jurassic-1. Jurassic also helps compose the contextual language around MRKL’s response.
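The routing idea described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the hand-written pattern rules, the module names, the exchange rates, and the `language_model` stub are all hypothetical, and the article does not say how AI21 Labs' router actually makes its decision.

```python
import ast
import operator
import re

# Binary operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Made-up rates for illustration; a real module would query a live service.
_RATES_TO_USD = {"USD": 1.0, "EUR": 1.1, "ILS": 0.3}


def _eval(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def calculator(query):
    """Symbolic module: exact arithmetic on the expression found in the query."""
    expr = re.search(r"\d[\d\s.+\-*/()]*", query).group().strip()
    return str(_eval(ast.parse(expr, mode="eval").body))


def currency_converter(query):
    """Symbolic module: convert an amount using the hypothetical rate table."""
    m = re.search(r"([\d.]+)\s*([A-Z]{3})\s+(?:to|in)\s+([A-Z]{3})", query)
    amount, src, dst = float(m.group(1)), m.group(2), m.group(3)
    converted = amount * _RATES_TO_USD[src] / _RATES_TO_USD[dst]
    return f"{converted:.2f} {dst}"


def language_model(query):
    """Stand-in for the general-purpose model used when no module fits."""
    return f"[language model answer to: {query}]"


MODULES = {
    "calculator": calculator,
    "currency": currency_converter,
    "language_model": language_model,
}


def route(query):
    """Pick a module name with simple pattern rules (an assumption, not AI21's method)."""
    if re.search(r"\b[A-Z]{3}\s+(?:to|in)\s+[A-Z]{3}\b", query):
        return "currency"
    if re.search(r"\d\s*[-+*/]\s*\d", query):
        return "calculator"
    return "language_model"


def answer(query):
    """Send the query to whichever module the router selects."""
    return MODULES[route(query)](query)


print(answer("What is 12345 + 67890 + 4321?"))
print(answer("Convert 100 EUR to USD"))
print(answer("Who founded AI21 Labs?"))
```

The point of the design is that the multi-digit sum is computed exactly by ordinary arithmetic rather than guessed by a statistical model, while anything the rules do not recognize falls through to the language model.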
Another clever innovation here is how AI21 Labs is able to elicit the right kind of response from Jurassic. It does this with a method called “prompt tuning,” in which the way an initial question or fragment of text is fed to the ultra-large language model helps determine the nature of the output. It’s one way to adjust the A.I. for a particular kind of task without having to fine-tune it with additional training data. The problem with additional training is that as the system gets better at one narrow task, it actually gets worse at others. Researchers call this problem “catastrophic forgetting.”
Some A.I. researchers overcome catastrophic forgetting by training the model for a variety of disparate tasks at the same time, but that takes a lot of computer power, time, and money. Prompt tuning avoids this. AI21 Labs’ innovation with MRKL is to create small deep learning modules that can automatically prompt tune Jurassic on the fly, taking a user’s query and composing the best set of prompts to nudge Jurassic into coughing up answers in the correct style and format.
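The general idea of steering a frozen model through its input, rather than retraining it, can be illustrated with a short Python sketch. Note the substitution: the article describes small deep learning modules that compose prompts automatically, whereas this sketch uses fixed, hand-written templates as a much simpler stand-in. The task names, instructions, and worked examples are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example shown to the model inside the prompt."""
    query: str
    answer: str


# Hypothetical instructions and examples; none come from AI21 Labs.
INSTRUCTIONS = {
    "sentiment": "Label the sentiment of each statement as positive or negative.",
    "translate": "Translate each sentence from English to French.",
}

FEW_SHOT = {
    "sentiment": [
        Example("The quarter exceeded every target.", "positive"),
        Example("We expect further losses next year.", "negative"),
    ],
    "translate": [
        Example("Good morning.", "Bonjour."),
        Example("Thank you very much.", "Merci beaucoup."),
    ],
}


def compose_prompt(task, query):
    """Wrap the user's query in task instructions and worked examples.

    The model's weights are never touched; only the text it receives changes.
    """
    lines = [INSTRUCTIONS[task], ""]
    for ex in FEW_SHOT[task]:
        lines += [f"Input: {ex.query}", f"Output: {ex.answer}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


print(compose_prompt("sentiment", "Margins improved for the third straight quarter."))
```

Because the same underlying model serves every task and only the prompt differs, getting better at one task in this way does not cost the model its ability on others.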
Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com
This story was originally featured on Fortune.com


