OpenAI announces multimodal GPT-4 promising “human-level performance”

OpenAI has announced the release of GPT-4, the successor to its popular GPT-3 and 3.5 models, and has promised “human-level performance” in a more creative and stable package than ever before.

The new multimodal model accepts both text and images as input, and OpenAI says it is more creative, reliable, and nuanced than its predecessors. It has been shown to process documents, photos, and charts as readily as plain text, and to unpick complex context and tone from user inputs.

OpenAI demonstrated a number of examples in which GPT-4 reliably identifies and captions objects in images and uses them as input context, including extracting information from a chart, translating and solving a French exam question, and identifying what is wrong or humorous in a given image.

In a livestreamed demonstration of GPT-4, OpenAI president and co-founder Greg Brockman used the model to translate a photo of a sketch he had made of a website into working HTML code.

GPT-4 also offers major accuracy and stability improvements over GPT-3 and GPT-3.5, having scored in the top 10% of test-takers on a simulated bar exam where GPT-3.5 scored in the bottom 10%. In its blog post on the release, OpenAI stated the model has shown “human-level performance on various professional and academic benchmarks”.

https://twitter.com/DrJimFan/status/1635694095460102145

The new model can also process far longer documents than its predecessors, accepting inputs of more than 25,000 words, enabling long-form analysis and the aggregation of entire web pages.
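
In practice those limits are measured in tokens rather than words. As a rough sketch, a document can be sized before submission using tiktoken, OpenAI’s open source tokeniser; the file name here is a placeholder:

```python
import tiktoken  # pip install tiktoken

# GPT-4's context limits are counted in tokens rather than words, so a
# document can be measured before it is sent to the model.
encoding = tiktoken.encoding_for_model("gpt-4")

with open("long_report.txt") as f:  # hypothetical input document
    token_count = len(encoding.encode(f.read()))

# Compare against the 8,192- or 32,768-token context windows.
print(f"Document is {token_count} tokens long")
```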

In controlled tests, it also completed extensive multiple-choice questions in 26 languages; in 24 of them, its comprehension and accuracy exceeded that of GPT-3.5’s English-language output.

This could dramatically improve document automation and customer-facing interactions, as well as the aggregation and translation of foreign-language dark web posts by security companies.

The road to GPT-4

Achieving these results required OpenAI to redesign its deep learning stack from scratch, while Microsoft’s partnership and $10 billion investment helped the firm build a supercomputer to stably train the vast model.

The work has also helped set GPT-4 apart from the models that came before it, both in complexity and reliability.

“On first review, GPT-4 does seem like an important advancement over GPT-3,” Bern Elliot, research vice president and distinguished analyst at Gartner, told IT Pro.

“Much of it is very new and so it will take some time to really understand where and how its improvements over GPT-3 will be realised. The first feature that leapt out to me was the multimodal capabilities of GPT-4, where the inputs can be both text and images. This offers a range of new use cases, for instance in visualising information or in identifying and describing content.

“A second capability that stood out was the more advanced multilingual abilities – its improved ability to handle inputs in languages beyond English. In the press release, they state that in 24 of 26 languages tested, GPT-4 outperforms the English-language performance of GPT-3.5. I spoke with a bank in Thailand this week that was struggling with how to leverage ChatGPT in Thai; this capability may allow them to do so.”

Elliot also highlighted the improvements to model steerability, which could help firms customise user experiences, and the reduction in overconfident lies, or ‘hallucinations’, in GPT-4 compared with previous models.

Hallucinations are a growing issue with generative AI, with firms fighting to maintain creativity in models without encouraging them to invent information when real facts are necessary.

“As stated above, it will take time to fully understand the use cases where GPT-4 will excel," Elliot said.

"However, as indicated in the above functions, there are clearly some business uses that this version can address that were not addressable by GPT-3. There is also the overall improved capabilities of this more advanced model. While that may not make a difference for all, for demanding use cases this may be important.”

OpenAI has been clear that the model is not without its limitations. Like its predecessor, GPT-4 was largely trained on data from before September 2021, limiting the scope of its knowledge, and it cannot learn from repeated exposure to new information.

The developers enlisted the help of 50 IT experts to adversarially test the model and report its weaknesses in order to reduce unwanted outputs seen in GPT-3.5 such as lies and potentially harmful content.

“We spent six months making GPT-4 safer and more aligned,” OpenAI stated on the GPT-4 product page.

“GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.”

Subscribers to OpenAI’s paid tier, ChatGPT Plus, now have access to GPT-4, while developers seeking API access to GPT-4 have been asked to join a waitlist. OpenAI stated that select developers are being given access to the API as the firm scales capacity for GPT-4.
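
For those granted access, GPT-4 is served through the same chat completions endpoint as GPT-3.5. A minimal sketch using the openai Python package, assuming an API key is set in the environment and using a placeholder prompt:

```python
import os
import openai  # pip install openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# GPT-4 uses the same chat completions interface as GPT-3.5;
# only the model name changes.
response = openai.ChatCompletion.create(
    model="gpt-4",  # or "gpt-4-32k" for the larger context window
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise the key points of this document: ..."},
    ],
    temperature=0.2,
)

print(response["choices"][0]["message"]["content"])
```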

OpenAI also revealed that it is making Evals, the software framework used to evaluate large models such as GPT-4, open source. Those who submit “high quality” evaluations of OpenAI models through Evals stand a chance of receiving GPT-4 API access, an incentive that will help OpenAI rapidly collect data on the performance and quality of its products.
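
Evals describes test cases in JSONL files. As an illustrative sketch of the sample format used by the open source repository for a basic exact-match eval, where the question, answer, and file path are invented placeholders:

```python
import json

# One sample for a basic match eval, in the JSONL format used by
# OpenAI's Evals repository: "input" holds chat-style messages and
# "ideal" holds the expected answer.
sample = {
    "input": [
        {"role": "system", "content": "Answer with a single word."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "ideal": "Paris",
}

# Hypothetical output path; real evals register their sample files
# under the repository's registry directory.
with open("samples.jsonl", "a") as f:
    f.write(json.dumps(sample) + "\n")
```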

GPT-4 will cost $0.03 (£0.025) per thousand prompt tokens for the model with an 8,000-token context window, and $0.05 (£0.049) per thousand for the 32,000-token version.
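
To put those figures in context, a minimal sketch of estimating the prompt cost of a request from the per-thousand-token prices quoted above (completion tokens are billed separately and omitted here):

```python
# Prompt prices per thousand tokens, as quoted above.
PROMPT_PRICE_PER_1K = {
    "gpt-4": 0.03,      # 8,000-token context window
    "gpt-4-32k": 0.05,  # 32,000-token context window
}

def prompt_cost(model: str, prompt_tokens: int) -> float:
    """Estimated US dollar cost of the prompt portion of one request."""
    return PROMPT_PRICE_PER_1K[model] * prompt_tokens / 1000

# For example, a 6,000-token prompt to the base model costs $0.18.
print(f"${prompt_cost('gpt-4', 6000):.2f}")
```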

In response to the announcement, Microsoft revealed that its Bing chatbot, which has helped drive the search engine’s daily active users above 100 million for the first time, has been running on GPT-4 all along.

“If you’ve used the new Bing preview at any time in the last five weeks, you’ve already experienced an early version of this powerful model,” wrote Yusuf Mehdi, corporate vice president and consumer chief marketing officer at Microsoft.

OpenAI has also revealed a number of partner companies that have already adopted GPT-4 into their stacks.

Financial services firm Stripe has used GPT-4 to summarise websites of potential clients, read and explain complex documentation, and detect fraudsters through syntactic analysis.

Morgan Stanley has found GPT-4 an ideal foundation for an internal chatbot that draws together the firm’s collective knowledge from across its content library, including PDFs. The technology has also been implemented as a virtual assistant in the Danish app Be My Eyes, which normally pairs blind or low-vision users with volunteers over video calls.

From describing the contents of a fridge and recommending a suitable recipe, to guiding a user through a railway journey step by step, the model has demonstrated a clear ability to process visual input. Be My Eyes is currently OpenAI’s only partner for image inputs, with a wider release on the horizon.

Rory Bathgate
Features and Multimedia Editor

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.

In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.