By what month will OpenAI release GPT-4.5 or GPT-5?

GPT-5 #AI #OpenAI #GPT-5 Speculation #GPT-4 speculation

resolved May 14

Option	Resolution
Before April	NO
Before May	NO
Before June	YES
Before July	YES
Before August	YES
Before September	YES
Before October	YES
Before November	YES
Before December	YES
Before January 2025	YES

Any model that is clearly the next major, canonical form of GPT will count for this market, regardless of what it is called. I am assuming it will be called GPT-4.5 or GPT-5, but it still counts if it has another name. A larger context window does not count; a jump like GPT-3 to GPT-3.5 does.

Each option resolves NO when its month arrives, until the next model is broadly released, at which point all remaining options resolve YES.

🏅 Top traders

#	Name	Total profit
1		Ṁ3,728
2		Ṁ2,161
3		Ṁ938
4		Ṁ828
5		Ṁ776

25 Comments

154 Holders

745 Trades

Kalshi's market on a GPT-4 successor/GPT-4.5 resolved YES today, counting GPT-4o. I was already leaning towards a YES resolution and was mainly waiting to see whether Kalshi would disagree with me. Mira's market about 4.5 is also counting GPT-4o. Jim's market on whether gpt2-chatbot is GPT-4's successor has resolved NO. Mikhail's market on "a more capable LLM" is still unresolved.

My markets don't have exactly the same criteria as those markets, but this is a tricky situation and I wanted to see how other creators handled it and hear arguments from traders. At this point, I'm confident in resolving all my markets to count GPT-4o as if it were GPT-4.5. I think this is how OpenAI is presenting 4o, and I think the improvements to speed, cost, and modality are impressive enough to justify presenting it as the latest and greatest flagship model.

On OpenAI's website, they now list GPT-4 and GPT-4 Turbo together as the "previous set" of models.

I think that it's disappointing that 4o isn't significantly smarter than GPT-4, but my markets never required OpenAI's next model to be significantly smarter. This system seems to be what all the rumors about a multimodal 4.5 model were referring to, and it was those rumors that kicked off my creation of these markets.

OpenAI's presentation of 4o is clearly intended to frame it as a jump like 3 to 3.5 or 3.5 to 4, but they are saving the impact of a numerical name increase for the full jump to GPT-5.

The next version of these markets will be run by the canonical Manifold AI account, and I will not trade in them.

@Joshua It's so odd to count a model that isn't better on the main GPT metrics (or is at most as much better as other GPT-4 updates so far) as a successor, when you've said you won't count things like context-window increases.

Bad resolution imo.

@Tenoke How is it not better on "the main GPT metrics" when it's significantly better across every metric category?

To OpenAI's credit, they've claimed that this is the best model in the world right now and I think they're right. It's only marginally better than Turbo in raw intelligence, but the fact that it's 2x faster and natively multimodal is genuinely impressive and I think OpenAI is being reasonable in how they are framing it as the next big jump.

But even if they were massively overhyping it, if they had named it GPT-4.5 it still would have counted per the description. Altman clearly doesn't like the numerical naming system and has said many times that he's not sure they'll ever name anything GPT-5, and I think this is a step in that direction. But it's still the multimodal model that all the rumors have been about since last year, and those rumors inspired the creation of this market.

@StephenMWalkerII I'd at least wait for the actual 4o to show up on the Arena before showing the graph from there. There are many reasons to think the gpt2-chatbot result isn't representative.

In fact, I'm willing to bet you that the difference won't be this large once it's in.

@Tenoke "There's many reasons to think the gpt2 isn't representative. In fact, I'm willing to bet you that the difference won't be this large once it's in." Bold claims, but you're not citing any reasons or evidence?

@StephenMWalkerII For a start: new models have often come out looking great on initial benchmarks and then leveled off after real usage; scoring and usage during the test differ from normal usage because it's a test; the test was shorter, so the variance is higher; and they've called it "a version of gpt4o", so it may not be the same version (e.g. before pruning, or with less RLHF, or any number of things that can lower the score).

Why not just wait a few days to see the actual score difference?

@Tenoke From my perspective, the exact intelligence of the model is less important than the fact that OpenAI are declaring it their frontier model and separating it from 4/4T in their documentation.

As far as capabilities do matter, OpenAI is emphasizing the speed and multimodality, not the benchmarks. Mikhail's market might well be best settled a week or two from now, but I am confident that this resolution would be the same no matter what GPT-4o's final Arena Elo is.

@Tenoke where do I look?

bought Ṁ100 Before June YES

I think GPT-4o counts.

William Fedus on X: "GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing." https://t.co/xEE2bYQbRk

+100 Elo points on coding is pretty big. "New state-of-the-art frontier model." And being multimodal with a persistent connection is a notable architecture change.

@Mira I think it certainly could count, but I want to wait a bit and see more detailed information about how capable it is and how OpenAI presents it. I think I'm going to stop trading in all these markets myself for now, because I foresee much debate over this.

@Joshua Maybe you want to close the market. That's what I do when I have "information sufficient to resolve but it needs judgment," because nobody should be betting on my judgment.

(and of course I bet on your judgment - but only up to 70% so I won't be annoyed if you resolve against.)

@Mira Yeah, I think I'm going to do that.

+100 Elo is more or less the difference between gpt-4-0613 and GPT-4 Turbo; "major upgrade" seems overstated tbh.

@gramophone I'm not terribly impressed with its intelligence, but this market doesn't require it to be some specific level of intelligent. Right now, OAI definitely seems to be presenting it as "the next major, canonical form of GPT"