OpenAI release GPT-4.5 or GPT-5 by what month?
resolved May 14
Before April: Resolved NO
Before May: Resolved NO
Before June: Resolved YES
Before July: Resolved YES
Before August: Resolved YES
Before September: Resolved YES
Before October: Resolved YES
Before November: Resolved YES
Before December: Resolved YES
Before January 2025: Resolved YES

Any model which is clearly the next major, canonical form of GPT will count for this market regardless of what it is called. I am assuming this will be called GPT-4.5 or GPT-5, but it still counts if it has another name. A larger context window does not count; a jump like GPT-3 to GPT-3.5 does count.

Each option resolves NO when its month arrives, until the next model is broadly released, at which point all remaining options resolve YES.
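The resolution rule above can be sketched in a few lines of Python. This is only an illustration, not the market's actual mechanism: the names `CUTOFFS` and `resolve` are hypothetical, the year 2024 is inferred from the "Before January 2025" option, and the release date used in the example (May 13, 2024) is an assumption chosen for illustration.

```python
from datetime import date
from typing import Optional

# Cutoff dates inferred from the option labels (year 2024 is an assumption
# based on the final option, "Before January 2025").
CUTOFFS = {
    "Before April": date(2024, 4, 1),
    "Before May": date(2024, 5, 1),
    "Before June": date(2024, 6, 1),
    "Before July": date(2024, 7, 1),
    "Before August": date(2024, 8, 1),
    "Before September": date(2024, 9, 1),
    "Before October": date(2024, 10, 1),
    "Before November": date(2024, 11, 1),
    "Before December": date(2024, 12, 1),
    "Before January 2025": date(2025, 1, 1),
}


def resolve(cutoff: date, today: date, release: Optional[date]) -> Optional[str]:
    """Return "YES", "NO", or None (still open) for a single option.

    `release` is the date the qualifying model was broadly released,
    or None if no such release has happened as of `today`.
    """
    if release is not None and release < cutoff:
        return "YES"  # released before this option's month arrived
    if today >= cutoff:
        return "NO"   # the month arrived with no qualifying release
    return None       # neither condition met yet


# Illustrative run: hypothetical release on May 13, 2024, checked the next day.
today = date(2024, 5, 14)
release = date(2024, 5, 13)
for label, cutoff in CUTOFFS.items():
    print(f"{label}: {resolve(cutoff, today, release)}")
```

Under these assumptions, the first two options come out NO and every later option comes out YES, which matches the pattern of resolutions listed above.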


🏅 Top traders

#  Name  Total profit
1        Ṁ3,728
2        Ṁ2,161
3        Ṁ938
4        Ṁ828
5        Ṁ776

Kalshi's market on a GPT-4 successor/GPT-4.5 resolved YES today, counting GPT-4o. I was already leaning towards a YES resolution, and was mainly waiting to see if Kalshi would disagree with me. Mira's market about 4.5 is also counting GPT-4o. Jim's market on whether GPT2-Chatbot is GPT-4's successor has resolved NO. Mikhail's market on "a more capable llm" is still unresolved.

My markets don't have exactly the same criteria as those markets, but this is a tricky situation and I wanted to see how other creators handled it and hear arguments from traders. At this point, I'm confident in resolving all my markets to count GPT-4o as if it were GPT-4.5. I think that this is how OpenAI is presenting 4o, and I think that the improvements to speed, cost, and modality are impressive enough to justify that presentation as the latest and greatest flagship model.

On OpenAI's website, they now list GPT-4 and GPT-4 Turbo together as the "previous set" of models:

I think that it's disappointing that 4o isn't significantly smarter than GPT-4, but my markets never required OpenAI's next model to be significantly smarter. This system seems to be what all the rumors about a multimodal 4.5 model were referring to, and it was those rumors that kicked off my creation of these markets.

OpenAI's presentation of 4o is clearly intended to frame it as a jump like 3 to 3.5 or 3.5 to 4, but they are saving the impact of a numerical name increase for the full jump to GPT-5.

The next version of these markets will be run by the canonical Manifold AI account, and I will not trade in them:

@Joshua it's so odd to count a model that isn't better on the main GPT metrics as a successor (or is at most as much better as other GPT-4 updates so far), when you've said that you won't count stuff like context increases.

Bad resolution imo.

@Tenoke how is it not better on "the main GPT metrics" when it's significantly better across every metric category?

To OpenAI's credit, they've claimed that this is the best model in the world right now and I think they're right. It's only marginally better than Turbo in raw intelligence, but the fact that it's 2x faster and natively multimodal is genuinely impressive and I think OpenAI is being reasonable in how they are framing it as the next big jump.

But even if they were massively overhyping it, if they had named it GPT-4.5 it still would have counted per the description. Altman clearly doesn't like the numerical naming system and has said many times he's not sure they'll ever name anything GPT-5, and I think this is a step in that direction. But it's still the multimodal model that all the rumors have been about since last year, which inspired the creation of this market.

@StephenMWalkerII I'd at least wait for actual 4o to show up on arena before showing the graph from there. There's many reasons to think the gpt2 isn't representative.

In fact, I'm willing to bet you that the difference won't be this large once it's in.

@Tenoke "There's many reasons to think the gpt2 isn't representative. In fact, I'm willing to bet you that the difference won't be this large once it's in." bold claims, but not citing any reasons or evidence?

@StephenMWalkerII for a start, new models have often come out looking great on initial benchmarks and then leveled off after real usage; scoring and usage during the test differ from normal usage because it was a test; the test was shorter, so variance is higher; and they've called it 'a version of gpt4o', so it's possible it's not the same version (e.g. before pruning, or with less RLHF, or a bunch of other stuff that can lower the score), etc.

Why not just wait a few days to see the actual score difference?

@Tenoke From my perspective, the exact intelligence of the model is less important than the fact that OpenAI are declaring it their frontier model and separating it from 4/4T in their documentation.

As far as capabilities do matter, the speed and multimodality are what they are emphasizing and not the benchmarks. Mikhail's market might well be best settled a week or two from now, but I am confident that this resolution would be the same no matter what gpt-4o's final arena Elo is.

@Tenoke where do I look?

bought Ṁ100 Before June YES

I think GPT-4o counts.

William Fedus on X: "GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing. https://t.co/xEE2bYQbRk"

+100 Elo points on coding is pretty big. "new state-of-the-art frontier model". And being multimodal with a persistent connection is a notable architecture change.
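For a sense of scale, a rating gap can be converted into an expected head-to-head win rate. The sketch below assumes the arena's ratings follow the standard Elo logistic model on the usual 400-point scale; the function name `expected_score` is hypothetical. Under that assumption, a 100-point gap works out to roughly a 64% expected win rate for the higher-rated model.

```python
def expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated side under the standard Elo model.

    Assumes the conventional logistic curve with a 400-point scale; whether
    a given leaderboard uses exactly this scaling is an assumption here.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Print the expected win rate for a few illustrative rating gaps.
for gap in (0, 50, 100, 200):
    print(f"{gap:>3} Elo gap -> {expected_score(gap):.1%} expected win rate")
```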

@Mira I think it certainly could count, but I want to wait a bit and see more detailed information about how capable it is and how OpenAI present it. I'm going to stop trading in all these markets myself for now I think, because I foresee much debate over this.

@Joshua Maybe you want to close the market. That's what I do when I have "information sufficient to resolve but it needs judgment" because nobody should be betting on my judgment.

(and of course I bet on your judgment - but only up to 70% so I won't be annoyed if you resolve against.)

@Mira Yeah, I think I'm going to do that.

+100 Elo is more or less the difference between gpt-4-0613 and GPT-4 turbo; "major upgrade" seems overstated tbh.

@gramophone I'm not terribly impressed with its intelligence, but this market doesn't require it to be some specific level of intelligent. Right now, OAI definitely seems to be presenting it as "the next major, canonical form of GPT"

@Joshua And to be clear, 100 Elo points is ~the difference between 3.5-turbo-0613 and 4.0-turbo-0613.

@Joshua "Major"?? Was GPT-4 turbo a major release? I don't think so.
They WILL release a major release this year, I think - this voice tool ain't it.

@gramophone And I say all this despite being set to win mana if you resolve yes!

Before May

Resolves NO @Joshua

bought Ṁ11 Before July YES

I think it’ll be released with the new Sora model right before July

bought Ṁ30 Before May YES

April 2

All of these markets in this comment should be arbable, as I believe they count either 4.5 or 5.

Beware of the various other markets, which have criteria requiring it to be specifically named one or the other.

These markets are MAYBE arb, depending on the exact name:

@Joshua But wait, there's more!

@Joshua Thanks for posting all of these. In addition to these 243 markets, I've identified an additional 82 that may be arbable.
