ChatGPT with Search, Altman AMA

@thesobercoder November 3, 2024 at 1:37 pm

Love the EPL test.

@ntelas46 November 3, 2024 at 2:34 pm

It’s also available to all SearchGPT waitlist users like me, even on the free plan.

@ChosenFate_ November 3, 2024 at 9:42 pm

SearchGPT isn't just for paid users… everyone who was on the waitlist also got it (including free users).

@gubzs November 3, 2024 at 10:33 pm

Are you concerned that putting simple bench online can allow "training for the test"?

@cheslerpark7223 November 4, 2024 at 1:02 am

Hi, I enjoy your YT vids and decided to check out Simple Bench. I happen to have a concern with the first question that came up, and I think you should give it another look and maybe revise it.
The question: "Beth places four whole ice cubes in a frying pan at the start of the first minute, then five at the start of the second minute and some more at the start of the third minute, but none in the fourth minute. If the average number of ice cubes per minute placed in the pan while it was frying a crispy egg was five, how many whole ice cubes can be found in the pan at the end of the third minute?"

Ok, why I think it's problematic is because it is a complete assumption that the "frying pan" is sufficiently heated to melt the ice cubes in less than a minute. Pans are often referred to as "frying pans" as a descriptor of the type of pan, not because they are currently frying something at high heat. "While it was frying a crispy egg" could be at any time, and doesn't necessarily refer to the first three minutes. So because there is no direct piece of evidence that the frying pan is hot and could quickly melt an ice cube, then the reason an intelligence would select the correct answer "0" is because it's inferring that the question is quite silly and nonsensical, that there isn't sufficient evidence to give a correct answer, and that it thinks it's likely it is a trick question. If that is your intention with the question, then my apologies. However, I get the feeling you want the question to be more accessible and test if the intelligence can figure out the pan is indeed hot enough to melt the ice through real world context clues. To do that, it needs to be clear, not assumed or inferred, that the pan is in use and at high heat from the start.

@omarnomad November 4, 2024 at 1:17 am

fine-tuning YT algorithm.

@JohnnyMoonshine7 November 4, 2024 at 3:15 am

Nobody has heard of preemire league. Not even gpt

@brandonhamaguchi November 4, 2024 at 4:42 am

Test Simple Bench on mobile; you may want to fix the leaderboard table (horizontal scroll) and the title, which reads
SimpleBenc
h

@DylanKane-u2j November 4, 2024 at 8:54 am

You have such a talent for summarizing this frontier of info and sharing it here. Thank you.

@lako2023 November 4, 2024 at 11:39 am

The problem I see with the various AI benchmarks (or human benchmarks used with an AI): the training data of the foundation models will of course also include descriptions/questions for all kinds of AI benchmarks, as well as all the discussions about them. So if an AI can use parts of its knowledge when being benchmarked, this will already change the result. We'd need offline benchmarks on systems not connected to the internet, vetted by professionals who are under NDA (= won't write about the details), to really know what they are capable of, right?

@lordfieldsworth595 November 4, 2024 at 11:56 am

Nice

@kaikapioka9711 November 4, 2024 at 2:42 pm

"AMA", then proceeds to answer nothing. Well, that's on us: it's "ask me anything", not "answer anything" (at all).

@SiddhantGautam-o3x November 4, 2024 at 8:17 pm

Hey man, nice video. Just some appreciation from a subscriber for how you reaffirm or debunk the hype and breakthroughs of this AI age. A small request: in future videos it would be great if you could cover how your Simple Bench works, or the models launched, in the YT videos, and also some tutorials on how your website works.
Anyway, great work, man 👍

@WilliamBoothClibborn November 4, 2024 at 8:56 pm

Woo, new simple bench

@geoffdavids7647 November 5, 2024 at 2:37 am

Daaamn those simple bench questions are really tugging at the edge of what a human can reliably figure out without properly sitting down and studying them. I am embarrassed to say I scored only 80% on the try-it-yourself – I was fully out-foxed by the man seeing the light fall in the bathroom question, and the glove falling out of the car question. The glove I might have missed by doing it too quickly, but the light in the bathroom one I sat and thought on for several minutes and only figured it out once I knew my first guess was wrong 😩

I fed both questions I got wrong and their multi-choice answers into o1 preview, prefaced by only "think carefully, the following might be a trick question". It got both of them absolutely right in one go, and perfectly figured out the trick in each. I am mortified.

I feel like these questions really rely on visualisation and a robust world-model. I might have gotten them all right if I'd really tried to build a visual mental image of the situations a bit more carefully, or even drawn them out. If these LLMs were trained more extensively on spatial understanding using physical 3D model simulations, I wouldn't be surprised if they were able to smash through most of these.

@carloslfu November 5, 2024 at 3:38 pm

lol! Loved Perplexity result about Simple Bench!

@jaxonterrill8416 November 6, 2024 at 11:34 pm

I love AI Explained. Funny, short, concise, and extremely informative.

@joema985 November 7, 2024 at 3:51 am

I've wasted two years of my life following this AI con. What's changed since then? Chatbots that are stumbling over the same questions. AGI still "5-10 years away." Oh, but now it has a new search frontend! All these moving goal posts. I'm done.

@kimcolpo November 7, 2024 at 12:59 pm

Cool

@ArnaudMEURET November 7, 2024 at 11:45 pm

I use Brave Search by default and it’s often insufficient, but since they added the AI summary last year, it has proven extremely helpful in most of my daily queries (mostly development and general tech queries).

@InternetStranger10101 November 9, 2024 at 2:48 pm

Love this video, can’t wait to see more

@harsh_hydra12345 November 9, 2024 at 4:29 pm

Waiting for the new video. ❤

@linus8490 November 10, 2024 at 12:36 pm

I think the SimpleBench evaluation is unfair because the human evaluators could remember the previous questions, while the LLMs don't know the other questions.
Please eval again with history for the LLMs.

@mrpicky1868 November 10, 2024 at 2:31 pm

I disagree on reliability being a "must". The average human is a dumb mess and we call it the norm.

@maciejbala477 November 11, 2024 at 12:04 am

The hallucination thing is worrying. It's the thing that I'd most want to see properly addressed. I think it makes so much difference, not just for economic growth, but for everyday use as well. It is tiring to have to verify everything and deal with annoying mistakes which crop up out of nowhere whenever you're doing something. But it is what I suspected, to be honest: AI companies do not have a good answer for it. And until they do, I don't think we can ever claim they're more intelligent than humans, because reliability is implied in that claim, in my view.

@luizpereira7165 November 11, 2024 at 2:21 am

Are you the creator of SimpleBench? Congratulations, it's a great benchmark. You could run an interesting little experiment with it: change some details of the questions (without making any difference to the reasoning), like the order in which names appear, the names of people, or the shapes and colors of objects, to see if the models' performance decreases.
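
A rough sketch of what that experiment might look like, purely as an illustration: the helper names (`swap_names`, `accuracy_gap`), the sample sentence, and the scoring results are all hypothetical, and nothing here reflects how SimpleBench is actually run. It swaps surface details that should not affect the reasoning, then compares accuracy on the original and perturbed question sets.

```python
import re


def swap_names(question: str, mapping: dict[str, str]) -> str:
    """Replace whole-word names in one pass, so A->B and B->A can swap safely."""
    if not mapping:
        return question
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], question)


def accuracy_gap(original: list[bool], perturbed: list[bool]) -> float:
    """Accuracy on the original questions minus accuracy on the perturbed ones."""
    if not original or not perturbed:
        raise ValueError("both result lists must be non-empty")
    return sum(original) / len(original) - sum(perturbed) / len(perturbed)


if __name__ == "__main__":
    # Hypothetical question text, not taken from the benchmark.
    q = "Beth gives Anna a red cube. Anna thanks Beth."
    print(swap_names(q, {"Beth": "Anna", "Anna": "Beth"}))

    # Hypothetical per-question correctness for one model.
    print(accuracy_gap([True, True, False, True], [True, False, False, True]))
```

A positive gap would hint that the model leans on surface details, though a real comparison would need many questions and repeated runs before reading much into it.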

@DC-uc4sh November 23, 2024 at 5:01 am

Yes, reliability, but does it have to be zero hallucinations? We have sufficient reliability in the modern world despite human hallucinations. I sense goalpost-moving on this platform. Give us an operational definition of reliability!

27 thoughts on “ChatGPT with Search, Altman AMA”