Inbred, gibberish or just MAD? Warnings rise about AI models

Dubai Telegraph - Inbred, gibberish or just MAD? Warnings rise about AI models

Inbred, gibberish or just MAD? Warnings rise about AI models / Photo: Fabrice COFFRINI - AFP/File

TECHNOLOGY 05.08.2024

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training the AI programs known in the industry as large language models (LLMs) involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
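
The idea of assembling "the most likely sequence" can be illustrated with a toy sketch. It is not how ChatGPT works internally: it assumes a crude whitespace tokenizer and a simple count of which token follows which, and the function names and the sample corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count how often each token follows another in the training text."""
    tokens = text.lower().split()  # crude whitespace "tokenizer" (an assumption)
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def most_likely_continuation(counts, start, length):
    """Greedily append the most frequent next token, one step at a time."""
    sequence = [start]
    for _ in range(length):
        followers = counts.get(sequence[-1])
        if not followers:
            break  # the last token was never followed by anything in training
        sequence.append(followers.most_common(1)[0][0])
    return sequence


# Hypothetical miniature "training data"
corpus = "the cat sat on the mat and the cat sat by the door"
counts = train_bigram_counts(corpus)
print(most_likely_continuation(counts, "the", 2))
```

Real systems replace the raw counts with a neural network trained on trillions of tokens, but the principle of picking a probable continuation based on training data is the same.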

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model was fed on its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
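
A rough sense of why rarer elements disappear can be had from a toy simulation. This is a sketch under simplifying assumptions, not the experiment from the Nature paper: the "model" here is just a table of token frequencies, each generation is refitted only on samples drawn from the previous one, and the vocabulary, sample size and function names are invented for illustration.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """Sample from the current model, then refit a new model on those samples only."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    samples = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(samples)
    # Tokens that were never sampled get no entry, so they are lost for good.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(initial_probs, generations, sample_size, seed=0):
    """Return the list of models, starting with the original distribution."""
    rng = random.Random(seed)
    history = [dict(initial_probs)]
    for _ in range(generations):
        history.append(next_generation(history[-1], sample_size, rng))
    return history


# Hypothetical vocabulary: one common token and ten rare ones
initial = {"common": 0.90}
initial.update({f"rare_{i}": 0.01 for i in range(10)})

history = simulate(initial, generations=30, sample_size=50)
for gen in (0, 10, 20, 30):
    print(gen, len(history[gen]), "distinct tokens survive")
```

In this setup a token that fails to appear in one generation's samples can never return, so the number of surviving tokens can only stay the same or shrink from one generation to the next.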

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as they added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field who pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

A.Ansari--DT
