What will be true of OpenAI's next major LLM release (GPT-4.5 or GPT-5)?
94%
achieve the highest ELO rating on LMSYS (ChatBot Arena)
89%
released in 2024
71%
is released within 1 week of being announced
57%
support long-term memory
42%
support video input
37%
support more agentic, time-consuming tasks with minimal input
28%
It is the gpt2-chatbot released earlier on Lmsys Arena Leaderboard
14%
only available through ChatGPT interface at the beginning
13%
it is named GPT-5
11%
context window >= 500k tokens

This question will be resolved when the new model is released. Variations of GPT-4 won't count; only major new models will qualify. The deadline will be postponed if neither model is released by the end of the month.


I just want to note my scepticism regarding the video input capabilities of GPT-4o (I am aware that it is unclear whether GPT-4o will be the relevant model for the resolution of this market).

From both the live demo and some of the example videos shared on X, it seems to me that the app only samples the video feed at a very low rate (several seconds between samples) and passes those still images to the model. For example, the model still saw the table in the facial-expression example, and in the tic-tac-toe example on X they keep holding their hands in the final position for several seconds so the model can pick up on the result. I could be wrong, and these may just be examples of current inefficiencies in video input.

Even if the actual model only takes in stills, I could see an argument for resolving "supports video input" (as in, the app and maybe even the API take in video, even if the model itself doesn't), but I would at least ask for care in resolving this question (if GPT-4o leads to resolution).
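
For concreteness, here is a minimal sketch of the frame-sampling approach described in the comment above. This is an assumption about how such an app could work, not OpenAI's actual pipeline: the video path, the 3-second sampling interval, and the prompt are all illustrative, and the script simply sends sampled stills to a vision-capable model through the OpenAI chat completions API.

```python
# Minimal sketch of the frame-sampling hypothesis (an assumption, not OpenAI's
# actual implementation): grab a still every few seconds and send the stills to
# a vision-capable chat model.
import base64

import cv2  # pip install opencv-python
from openai import OpenAI  # pip install openai

SAMPLE_EVERY_SECONDS = 3      # assumed low sampling rate
VIDEO_PATH = "demo_clip.mp4"  # hypothetical input file

cap = cv2.VideoCapture(VIDEO_PATH)
fps = cap.get(cv2.CAP_PROP_FPS) or 30
frames_b64 = []
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Keep roughly one frame per SAMPLE_EVERY_SECONDS of video.
    if frame_idx % int(fps * SAMPLE_EVERY_SECONDS) == 0:
        ok_enc, jpg = cv2.imencode(".jpg", frame)
        if ok_enc:
            frames_b64.append(base64.b64encode(jpg.tobytes()).decode("utf-8"))
    frame_idx += 1
cap.release()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "Who won this tic-tac-toe game?"}]
                  + [{"type": "image_url",
                      "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
                     for b64 in frames_b64],
    }],
)
print(response.choices[0].message.content)
```

Whether something along these lines counts as "supporting video input" is exactly the resolution question raised in this thread.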

sold Ṁ44 support video input NO

on second thought, I think I overestimated the likelihood of GPT-4o leading to resolution. I remain pretty convinced GPT-4o video input is not real, but sold my position here.

bought Ṁ30 context window >= 50... NO

Free mana in this market

Is GPT-4o a "major" release?

sold Ṁ3 It is the gpt2-chatb... YES

@Sss19971997 That's a great question. Would wreak havoc in this market if so.

@ErikBjareholt I would say no.

bought Ṁ30 it is named GPT-5 NO

@Sss19971997 Mira called it their "flagship model". The blog also gives me clear "next major model" vibes: https://openai.com/index/hello-gpt-4o/

@Sss19971997 I would say yes. I think they clearly could have called this "GPT-4.5" if they wanted to.

@Sss19971997 Why would you say no? This model is OpenAI's next-generation model. They created a whole event + press release around it. They also call it their flagship new model.

@Soli closed prematurely by the looks of it

I really hope that GPT-5 is better at following style instructions instead of being so stubborn about its built-in personality, so that making custom GPTs is actually useful (beyond accessing custom APIs and data).

@RobertoGomez It might be a little better, but I doubt it will be as good as you want. This is not really a scale issue. Base models are extremely good at imitating any style, and OpenAI is intentionally training the personality in afterwards to prevent the model from being used for harm.

Can you add an option to the list of predictions: "has minimal (<5%) hallucinations at <=128k token length"

@PaulJones2733 nice suggestion - when/where are such results normally shared?

@PaulJones2733 You mean, if given a representative sample of prompts of that length that it gets from users, it will hallucinate less than 5% of the time? What are the kinds of prompts, and what defines a hallucination?

@traders I created a similar question for LLAMA 3 and added 1k subsidy -> /Soli/what-will-be-true-of-llama-3-in-the

support video input

For this and all the options about “supporting” different capabilities, how do we interpret the situation where the model is claimed to support a capability in an announcement, but it isn't available in the first version users are given access to? For example, GPT-4 was announced as supporting image input, but ChatGPT didn't get image input until some time after.

Also, specifically for video input, does slicing the video into frames like Gemini 1.5 count as supporting video input or does it require some richer form of support?

@GradySimon We would have to wait a reasonable amount of time until we are able to test the specific capability to resolve the market, or rely on reports from people who got beta/early access.

Regarding video input, I think the model just needs to be able to discuss any video file uploaded through the UI.

What counts as “long-term memory”?

@GradySimon I guess this would be more on the application layer (ChatGPT) and would require the AI assistant to be able to recall information from previous conversations.
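
For illustration, here is a minimal sketch of what application-layer "long-term memory" could look like under that reading. It is a hypothetical design, not how ChatGPT actually implements memory; MEMORY_FILE, remember, and build_system_prompt are made-up names used only for the example.

```python
# Minimal sketch of application-layer "long-term memory" (a hypothetical design):
# persist facts between sessions and prepend them to the system prompt of the
# next conversation so the assistant can recall them later.
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # hypothetical storage location


def load_memories() -> list[str]:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []


def remember(fact: str) -> None:
    memories = load_memories()
    if fact not in memories:
        memories.append(fact)
        MEMORY_FILE.write_text(json.dumps(memories, indent=2))


def build_system_prompt() -> str:
    memories = load_memories()
    if not memories:
        return "You are a helpful assistant."
    recalled = "\n".join(f"- {m}" for m in memories)
    return ("You are a helpful assistant. "
            f"Facts recalled from earlier conversations:\n{recalled}")


# Usage: a later, separate conversation still "remembers" earlier facts.
remember("The user prefers answers in bullet points.")
print(build_system_prompt())
```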

@Soli This is great! I was happy to find an unlinked market that didn't differentiate on what the model is called, and so I put this up on the dashboard.

Would you consider opening this to submissions from other people?

@Joshua I was worried it would get a bit too messy and neither of the options listed would get any significant trading volume, which is why I did not allow submissions from other users. Happy to change that though if you tell me how, haha. I tried to do it now for 5 minutes and failed.

@Soli There's a toggle under the menu at the top right, I flipped it for ya. You can always flip it back of course.

@Joshua 🙏
