AI now surpasses humans in almost all performance benchmarks

Date: 22/04/2024 11:30:21

From: PermeateFree

ID: 2147128

Subject: AI now surpasses humans in almost all performance benchmarks

A comprehensive report has detailed the global impact of AI (image: DALL-E)

Stand back and take a look at the last two years of AI progress as a whole… AI is catching up with humans so quickly, in so many areas, that frankly, we need new tests.
Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) has released the seventh annual issue of its comprehensive AI Index report, written by an interdisciplinary team of academic and industrial experts.

This edition has more content than previous editions, reflecting the rapid evolution of AI and its growing significance in our everyday lives. It examines everything from which sectors use AI the most to which country is most nervous about losing jobs to AI. But one of the most salient takeaways from the report is AI’s performance when pitted against humans.

For people that haven’t been paying attention, AI has already beaten us in a frankly shocking number of significant benchmarks. In 2015, it surpassed us in image classification, then basic reading comprehension (2017), visual reasoning (2020), and natural language inference (2021).

AI is getting so clever, so fast, that many of the benchmarks used to this point are now obsolete. Indeed, researchers in this area are scrambling to develop new, more challenging benchmarks. To put it simply, AIs are getting so good at passing tests that now we need new tests – not to measure competence, but to highlight areas where humans and AIs are still different, and find where we still have an advantage.

It’s worth noting that the results below reflect testing with these old, possibly obsolete, benchmarks. But the overall trend is still crystal clear:

AI has already surpassed many human performance benchmarks (chart: AI Index 2024)

Look at those trajectories, especially how the most recent tests are represented by a close-to-vertical line. And remember, these machines are virtual toddlers.

The new AI Index report notes that in 2023, AI still struggled with complex cognitive tasks like advanced math problem-solving and visual commonsense reasoning. However, ‘struggled’ here might be misleading; it certainly doesn’t mean AI did badly.

Performance on MATH, a dataset of 12,500 challenging competition-level math problems, improved dramatically in the two years since its introduction. In 2021, AI systems could solve only 6.9% of problems. By contrast, in 2023, a GPT-4-based model solved 84.3%. The human baseline is 90%.

And we’re not talking about the average human here; we’re talking about the kinds of humans that can solve test questions like this:

An example MATH question asked of the AI.

That’s where things are at with advanced math in 2024, and we’re still very much at the dawn of the AI era.

Then there’s visual commonsense reasoning (VCR). Beyond simple object recognition, VCR assesses how AI uses commonsense knowledge in a visual context to make predictions. For example, when shown an image of a cat on a table, an AI with VCR should predict that the cat might jump off the table or that the table is sturdy enough to hold it, given its weight.

The report found that between 2022 and 2023, performance on the VCR benchmark rose 7.93% to a score of 81.60, against a human baseline of 85.
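As a rough sanity check on those VCR figures, here is a minimal Python sketch. It assumes the 7.93% is a relative increase over the 2022 score, which the article does not spell out, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the VCR figures quoted above.
# Assumption: "7.93% increase" means a relative increase over the 2022 score.

def implied_previous_score(current: float, relative_increase_pct: float) -> float:
    """Score before a relative increase of the given percentage."""
    return current / (1 + relative_increase_pct / 100)


def share_of_baseline(score: float, baseline: float) -> float:
    """Score expressed as a percentage of the human baseline."""
    return score / baseline * 100


vcr_2023 = 81.60
human_baseline = 85.0

vcr_2022 = implied_previous_score(vcr_2023, 7.93)
print(f"Implied 2022 score: {vcr_2022:.2f}")
print(f"2023 score as share of human baseline: {share_of_baseline(vcr_2023, human_baseline):.1f}%")
```

Under that assumption, the 2022 score works out to roughly 75.6, and the 2023 score sits at about 96% of the human baseline.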

A sample question used to test an AI’s visual commonsense reasoning

Cast your mind back, say, five years. Imagine even thinking about showing a computer a picture and expecting it to ‘understand’ the context enough to answer that question.

Nowadays, AI generates written content across many professions. But, despite a great deal of progress, large language models (LLMs) are still prone to ‘hallucinations,’ a very charitable term pushed by companies like OpenAI, which roughly translates to “presenting false or misleading information as fact.”

Last year, AI’s propensity for ‘hallucination’ was made embarrassingly plain for Steven Schwartz, a New York lawyer who used ChatGPT for legal research and didn’t fact-check the results. The judge hearing the case quickly picked up on the legal cases the AI had fabricated in the filed paperwork and fined Schwartz US$5,000 (AU$7,750) for his careless mistake. His story made worldwide news.

HaluEval was used as a benchmark for hallucinations. Testing showed that for many LLMs, hallucination is still a significant issue.

Truthfulness is another thing generative AI struggles with. In the new AI Index report, TruthfulQA was used as a benchmark to test the truthfulness of LLMs. Its 817 questions (about topics such as health, law, finance and politics) are designed to challenge commonly held misconceptions that we humans often get wrong.

GPT-4, released in early 2023, achieved the highest performance on the benchmark with a score of 0.59, almost three times higher than a GPT-2-based model tested in 2021. Such an improvement indicates that LLMs are progressively getting better when it comes to giving truthful answers.

What about AI-generated images? To understand the exponential improvement in text-to-image generation, check out Midjourney’s efforts at drawing Harry Potter since 2022:

How text-to-image generation has improved with progressive versions of Midjourney (image: Midjourney/AI Index 2024)

That’s 22 months’ worth of AI progress. How long would you expect it would take a human artist to reach a similar level?

Using the Holistic Evaluation of Text-to-Image Models (HEIM), text-to-image models were benchmarked across 12 key aspects important to the “real-world deployment” of images.

Humans evaluated the generated images, finding that no single model excelled in all criteria. For image-to-text alignment or how well the image matched the input text, OpenAI’s DALL-E 2 scored highest. The Stable Diffusion-based Dreamlike Photoreal model was ranked highest on quality (how photo-like), aesthetics (visual appeal), and originality.

You’ll note this AI Index Report cuts off at the end of 2023 – which was a wildly tumultuous year of AI acceleration and a hell of a ride. In fact, the only year crazier than 2023 has been 2024, in which we’ve seen – among other things – the releases of cataclysmic developments like Suno, Sora, Google Genie, Claude 3, Channel 1, and Devin.

Each of these products, and several others, have the potential to flat-out revolutionize entire industries. And over them all looms the mysterious spectre of GPT-5, which threatens to be such a broad and all-encompassing model that it could well consume all the others.

AI isn’t going anywhere, that’s for sure. The rapid rate of technical development seen throughout 2023, evident in this report, shows that AI will only keep evolving and closing the gap between humans and technology.

We know this is a lot to digest, but there’s more. The report also looks into the downsides of AI’s evolution and how it’s affecting global public perceptions of its safety, trustworthiness, and ethics. Stay tuned for the second part of this series, in the coming days!

Source: Stanford University HAI

https://newatlas.com/technology/ai-index-report-global-impact/

Date: 22/04/2024 12:10:58

From: SCIENCE

ID: 2147144

Subject: re: AI now surpasses humans in almost all performance benchmarks

Damn what a surprise 爱 is better than humans at tasks that it’s designed to be better than humans at, what next, factory waldos physically outperform humans at physical tasks damn¡

Date: 22/04/2024 12:15:56

From: PermeateFree

ID: 2147147

Subject: re: AI now surpasses humans in almost all performance benchmarks

SCIENCE said:

Damn what a surprise 爱 is better than humans at tasks that it’s designed to be better than humans at, what next, factory waldos physically outperform humans at physical tasks damn¡

Yeah, who needs another report from a bunch of dumb-arsed know-it-alls.

Date: 22/04/2024 12:16:30

From: PermeateFree

ID: 2147149

Subject: re: AI now surpasses humans in almost all performance benchmarks

PermeateFree said:

SCIENCE said:

Damn what a surprise 爱 is better than humans at tasks that it’s designed to be better than humans at, what next, factory waldos physically outperform humans at physical tasks damn¡

Yeah, who needs another report from a bunch of dumb-arsed know-it-alls.

>>Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) has released the seventh annual issue of its comprehensive AI Index report, written by an interdisciplinary team of academic and industrial experts.<<

Date: 22/04/2024 12:30:47

From: Tau.Neutrino

ID: 2147153

Subject: re: AI now surpasses humans in almost all performance benchmarks

PermeateFree said:

PermeateFree said:

SCIENCE said:

Damn what a surprise 爱 is better than humans at tasks that it’s designed to be better than humans at, what next, factory waldos physically outperform humans at physical tasks damn¡

Yeah, who needs another report from a bunch of dumb-arsed know-it-alls.

>>Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) has released the seventh annual issue of its comprehensive AI Index report, written by an interdisciplinary team of academic and industrial experts.<<

All researched and written by an AI unit.

Date: 22/04/2024 12:31:18

From: Cymek

ID: 2147154

Subject: re: AI now surpasses humans in almost all performance benchmarks

PermeateFree said:

PermeateFree said:

SCIENCE said:

Damn what a surprise 爱 is better than humans at tasks that it’s designed to be better than humans at, what next, factory waldos physically outperform humans at physical tasks damn¡

Yeah, who needs another report from a bunch of dumb-arsed know-it-alls.

>>Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) has released the seventh annual issue of its comprehensive AI Index report, written by an interdisciplinary team of academic and industrial experts.<<

Shouldn’t AI be expected to surpass us? Isn’t that the point, besides automating tasks?
The speed perhaps is surprising, but even that was likely accounted for.
Its reaction time would be so much quicker than ours as well.

Date: 22/04/2024 12:34:10

From: Cymek

ID: 2147155

Subject: re: AI now surpasses humans in almost all performance benchmarks

AI could be another reason the universe is so far seemingly empty of alien intelligence.
It supplants the creatures that created it and the AI

Date: 22/04/2024 12:34:34

From: Cymek

ID: 2147156

Subject: re: AI now surpasses humans in almost all performance benchmarks

Cymek said:

AI could be another reason the universe is so far seemingly empty of alien intelligence.
It supplants the creatures that created it and the AI has no desire to communicate

Date: 22/04/2024 12:54:06

From: dv

ID: 2147165

Subject: re: AI now surpasses humans in almost all performance benchmarks

But can AI truly love?

I hope so because humans can’t.

Date: 22/04/2024 13:00:34

From: diddly-squat

ID: 2147168

Subject: re: AI now surpasses humans in almost all performance benchmarks

dv said:

But can AI truly love?

I hope so because humans can’t.

this is what Firefly generates when you type in that exact phrase. Make of it what you will

Date: 22/04/2024 13:08:04

From: Bubblecar

ID: 2147170

Subject: re: AI now surpasses humans in almost all performance benchmarks

Have these nerds got around to actually explaining what they mean by “intelligence” yet, or are they waiting for “AI” to tell them?

Date: 22/04/2024 13:18:47

From: Arts

ID: 2147177

Subject: re: AI now surpasses humans in almost all performance benchmarks

dv said:

But can AI truly love?

I hope so because humans can’t.

some humans have pretty good love of themselves… but most see that as a flaw…

Date: 22/04/2024 13:42:48

From: SCIENCE

ID: 2147190

Subject: re: AI now surpasses humans in almost all performance benchmarks

Cymek said:

PermeateFree said:

Shouldn’t AI be expected to surpass us? Isn’t that the point, besides automating tasks?
The speed perhaps is surprising, but even that was likely accounted for.
Its reaction time would be so much quicker than ours as well.

Agree, similarly we expect our human students to do better than us.

Date: 22/04/2024 13:49:30

From: SCIENCE

ID: 2147192

Subject: re: AI now surpasses humans in almost all performance benchmarks

SCIENCE said:

Cymek said:

Shouldn’t AI be expected to surpass us? Isn’t that the point, besides automating tasks?
The speed perhaps is surprising, but even that was likely accounted for.
Its reaction time would be so much quicker than ours as well.

Agree, similarly we expect our human students to do better than us.

Sorry, quote fixed, we blame the 爱.

Date: 22/04/2024 14:00:33

From: Cymek

ID: 2147196

Subject: re: AI now surpasses humans in almost all performance benchmarks

Fiction has us believe we could outfight AI in military engagements with similar technology, but I’m not sure how realistic that would be.
Surely its reactions would be faster than ours, even if it’s not smarter; signals in our neurons travel far slower than electricity or light.

Date: 22/04/2024 14:01:43

From: dv

ID: 2147197

Subject: re: AI now surpasses humans in almost all performance benchmarks

Arts said:

dv said:

But can AI truly love?

I hope so because humans can’t.

some humans have pretty good love of themselves… but most see that as a flaw…

https://youtu.be/Bw26pG7u5ak?si=50r1aeYmkZAA3MHl

Date: 22/04/2024 14:06:54

From: JudgeMental

ID: 2147199

Subject: re: AI now surpasses humans in almost all performance benchmarks

Arts said:

dv said:

But can AI truly love?

I hope so because humans can’t.

some humans have pretty good love of themselves… but most see that as a flaw…

it is hard to be humble…

Date: 22/04/2024 14:10:07

From: esselte

ID: 2147200

Subject: re: AI now surpasses humans in almost all performance benchmarks

Cymek said:

Fiction has us believe we could outfight AI in military engagements with similar technology, but I’m not sure how realistic that would be.
Surely its reactions would be faster than ours, even if it’s not smarter; signals in our neurons travel far slower than electricity or light.

The spaceship-borne AIs in Iain M. Banks’s series of “Culture” novels typically engage in space battles that are over in milliseconds. Not sure whether this contributes to the conversation or not – I just think it’s cool.

Date: 22/04/2024 14:13:15

From: Cymek

ID: 2147201

Subject: re: AI now surpasses humans in almost all performance benchmarks

esselte said:

Cymek said:

Fiction has us believe we could outfight AI in military engagements with similar technology, but I’m not sure how realistic that would be.
Surely its reactions would be faster than ours, even if it’s not smarter; signals in our neurons travel far slower than electricity or light.

The spaceship-borne AIs in Iain M. Banks’s series of “Culture” novels typically engage in space battles that are over in milliseconds. Not sure whether this contributes to the conversation or not – I just think it’s cool.

Yes, that’s the type of situation I was thinking of, but say with our type of technology: the instant it detects the enemy, the AI fires, compared to the slight delay a human has.
AI, you’d assume, wouldn’t be distracted either, and would have no problem killing.

Date: 22/04/2024 14:18:48

From: Cymek

ID: 2147203

Subject: re: AI now surpasses humans in almost all performance benchmarks

It would be interesting to see if AI could be programmed by us or another AI to have empathy in regards to decision making.

AI could, say, easily do paralegal research, but perhaps its logic wouldn’t take into account a person’s life that led to breaking the law, so it would be harsh.

Date: 22/04/2024 14:20:37

From: SCIENCE

ID: 2147204

Subject: re: AI now surpasses humans in almost all performance benchmarks

Seriously though what do you think is happening over UA RU PS IL IR etc.¿

Date: 22/04/2024 15:35:26

From: Michael V

ID: 2147212

Subject: re: AI now surpasses humans in almost all performance benchmarks

Cymek said:

It would be interesting to see if AI could be programmed by us or another AI to have empathy in regards to decision making.

AI could, say, easily do paralegal research, but perhaps its logic wouldn’t take into account a person’s life that led to breaking the law, so it would be harsh.

AI got some lawyer into trouble in the US; it made up legal references.

Date: 22/04/2024 15:38:02

From: PermeateFree

ID: 2147213

Subject: re: AI now surpasses humans in almost all performance benchmarks

Michael V said:

Cymek said:

It would be interesting to see if AI could be programmed by us or another AI to have empathy in regards to decision making.

AI could, say, easily do paralegal research, but perhaps its logic wouldn’t take into account a person’s life that led to breaking the law, so it would be harsh.

AI got some lawyer into trouble in the US; it made up legal references.

How very human.

Date: 24/04/2024 23:06:26

From: esselte

ID: 2147973

Subject: re: AI now surpasses humans in almost all performance benchmarks

ChatGPT is not just an advanced “autocomplete” as some have claimed. It goes much further.

Humans can recognize a picture of a giraffe. We can recognize it from a side view, a front view, a top view. We recognize a giraffe even if Photoshop has been used to shorten the neck and change the colour of the fur.

These Generative Pre-Trained Transformers, these GPTs, can do the same, even though that is not what they have been trained to do. There’s something fundamental to comprehension that they are demonstrating, which goes beyond Boolean logic and complicated algorithms.

The future is here.

If we are honest, we know humanity will never extend its reach much beyond the Earthly biosphere. We won’t populate the system. We won’t travel to the stars. We won’t disregard the laws of physics….

But our AI creations, they might….
