OpenAI Unveils Groundbreaking Model GPT-4o
Just a day before Google I/O, OpenAI stole the spotlight by introducing its latest model, GPT-4o. Matching the intelligence of GPT-4, GPT-4o goes a step further by adding powerful audio and video capabilities, giving users an experience close to talking with a real person.
The uniqueness of GPT-4o can be inferred from its name: the “o” stands for “omni,” a prefix meaning “all” or “every,” reflecting the new model’s ability to reason across text, audio, and video. “We are announcing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time,” OpenAI stated in its announcement.
Approaching human response times, “like AI in the movies”
While GPT-4 can also recognize images and convert text to speech, OpenAI previously split these functions across separate models, resulting in longer response times. GPT-4o integrates them all into a single model, a so-called “omnimodel.” Compared with its predecessor, the flagship GPT-4 Turbo, GPT-4o performs comparably on English text and code but improves significantly on non-English languages. The API is also faster, and costs have been cut by up to 50%.
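For developers, switching to the cheaper, faster model is mostly a matter of changing the model name in an API request. A minimal sketch using the OpenAI Python SDK (assuming SDK version 1.x and an `OPENAI_API_KEY` in the environment; the prompt text is illustrative):

```python
import os

# Request payload targeting the new model; only the "model" field
# differs from an equivalent GPT-4 Turbo request.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Summarize GPT-4o in one sentence."}
    ],
}

def send(payload):
    # Imported lazily so the sketch runs even without the SDK installed.
    from openai import OpenAI  # pip install openai
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.chat.completions.create(**payload)

if os.environ.get("OPENAI_API_KEY"):
    response = send(payload)
    print(response.choices[0].message.content)
```

The real-time voice and video features demonstrated on stage go beyond this text endpoint and, per OpenAI, are rolling out to API partners separately.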
OpenAI highlights that GPT-4o’s response times approach those of humans, making conversations feel more natural. It can respond to audio in as little as 232 milliseconds, with an average of 320 milliseconds. By comparison, voice mode with GPT-3.5 and GPT-4 had average latencies of 2.8 seconds and 5.4 seconds, respectively.
During OpenAI’s demonstration, GPT-4o showcased its ability to provide real-time interpretation, enabling smooth conversations between individuals speaking different languages.
According to OpenAI, GPT-4o can instantly “read” a user’s expression and tone, knowing how to respond and switching quickly between registers: it can adopt a flat robotic voice one moment and sing with liveliness the next. Mira Murati, OpenAI’s CTO, said GPT-4o’s development was inspired by how human conversation works: “When you stop talking, it’s my turn to speak. I can understand your tone and respond. It’s so natural, rich, and interactive.”
OpenAI CEO Sam Altman shared his enthusiasm on his blog: “The new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change.”
The demonstration was not flawless. As MIT Technology Review noted, GPT-4o occasionally interrupted and made unsolicited comments about a presenter’s attire, though it quickly recovered once corrected.
Murati said that with the power of the omnimodel, future GPT products will go further, for example watching a live sports broadcast and explaining the rules of the game to the user, rather than only handling simpler tasks such as translating text in an image.
OpenAI announced that GPT-4o will be available to free users, while paid subscribers will get message limits up to five times higher. Voice features built on GPT-4o are expected to begin testing with paid subscribers next month. That GPT-4o can be offered to free users at all reflects OpenAI’s success in driving down costs.
However, citing concerns about potential abuse, OpenAI said the new voice functionality will not be immediately available to all API users; it will first roll out to a small group of trusted partners in the coming weeks.
ChatGPT Desktop App Launch and Free Opening of GPT Store
Alongside GPT-4o’s major audio and video upgrades, OpenAI also announced a refreshed ChatGPT web UI, with a more conversational main interface and message presentation. Even as the models grow more complex, Murati emphasized her desire for a simpler, more user-friendly interaction experience, letting users focus on collaborating with ChatGPT rather than worrying about the UI.
OpenAI also unveiled a ChatGPT desktop app, launching first on macOS with a Windows version to follow later this year. Notably, reports had just emerged that OpenAI and Apple were nearing a deal on AI technology cooperation, so releasing the macOS app at this moment has fueled speculation.
OpenAI announces the launch of the macOS version of the ChatGPT application.
In addition, OpenAI has opened the GPT Store, introduced earlier this year, to all users for free. The platform lets developers build custom chatbots and share them with other users in the store. Free users will also gain access to certain features previously exclusive to paid subscribers.
Sources:
OpenAI, TechCrunch, MIT Technology Review