OpenAI has done something nobody would have expected: it slowed down the process of giving you an answer in the hopes that it gets it right.
The new OpenAI o1-preview models are designed for what OpenAI calls hard problems: complex tasks in subjects like science, coding, and math. The models are available through the ChatGPT service and through OpenAI's API, and while they are still in development, the idea behind them is promising.
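For anyone curious what that API access looks like in practice, here is a minimal sketch. It assumes the official OpenAI Python SDK and an account that has been granted access to the o1-preview model; the prompt and the reasoning-token readout at the end are purely illustrative.

```python
# Minimal sketch: calling o1-preview through the OpenAI API.
# Assumes the official OpenAI Python SDK (`pip install openai`) and an
# OPENAI_API_KEY in the environment with access to the o1-preview model.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

# o1-preview spends time "thinking" before it answers, so requests take
# noticeably longer than GPT-4o. The preview also does not accept system
# messages or a temperature setting, so we send a single user message.
response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)

# The visible answer, produced after the hidden reasoning step.
print(response.choices[0].message.content)

# Newer SDK versions also report how many hidden reasoning tokens were
# spent before the answer was written (they are billed as output tokens).
details = getattr(response.usage, "completion_tokens_details", None)
if details is not None:
    print("Reasoning tokens used:", details.reasoning_tokens)
```

The slower response is the whole point: the model burns extra tokens working through the problem before committing to an answer, which is exactly the trade-off the rest of this piece is about.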
I love the idea that one of the companies that made AI so bad is actually doing something to improve it. People think of AI as some sort of scientific mystery, but at its core it is the same as any other complex computer software. There is no magic; a computer program accepts input and produces output based on the way the software is written.
It seems like magic to us because we’re used to seeing software output in a different way. When it acts human-like, it seems strange and futuristic, and that’s really cool. Everyone wants to be Tony Stark and have conversations with their computer.
Unfortunately, the rush to release the cool, conversational type of AI has highlighted how bad it can be. Companies call these failures hallucinations (not the fun kind, unfortunately), but no matter what label is placed on them, the answers we get from AI are often hilariously wrong, and sometimes wrong in far more concerning ways.
OpenAI says that its GPT-4o model was only able to solve 13% of the problems on a qualifying exam for the International Mathematics Olympiad. That's probably better than most people would score, but a computer should do better when it comes to mathematics. The new OpenAI o1-preview solved 83% of the problems correctly. That is a dramatic leap, and it highlights the effectiveness of the new models.
Thankfully, OpenAI is true to its name and has shared how these models "think." In its article about the new model's reasoning capabilities, you can scroll to the "Chain-of-Thought" section to get a glimpse into the process. I found the safety section particularly interesting: the model uses guardrails to make sure it isn't telling you how to make homemade arsenic the way the GPT-4 model will (don't try to make homemade arsenic). Once complete, this approach should also defeat the current tricks used to get conversational AI models to break their own rules.
Overall, the industry needed this. My colleague and Android Central managing editor Derrek Lee pointed out that it's interesting that, at a time when we want information instantly, OpenAI is willing to slow things down a bit, letting AI "think" in order to give us better answers. He's absolutely right. This feels like a case of a tech company doing the right thing even if the results aren't optimal.
I don't think this will have any effect overnight, and I'm not convinced there is a purely altruistic goal at work. OpenAI wants its new LLM to be better at the tasks the current model does poorly. A side effect is a safer and better conversational AI that gets it right more often. I'll take that trade, and I expect Google to do something similar to show that it also understands that AI needs to get better.
AI isn’t going away until someone dreams up something newer and more profitable. Companies might as well work on making it as great as it can be.