Dubai Telegraph - Inbred, gibberish or just MAD? Warnings rise about AI models

Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content, but also more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
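As a rough illustration of "most likely sequence", the sketch below counts which word follows which in a tiny made-up corpus and then picks the most frequent follower. It is a toy under loud assumptions: the corpus and the function names `train_bigram` and `most_likely_next` are invented for this example, whole words stand in for tokens, and real systems such as ChatGPT use neural networks over far larger datasets rather than simple counts.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, how often each other token follows it."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


# Hypothetical training text, split on spaces so each word acts as one token.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

# "the" is followed by "cat" twice and "mat" once, so "cat" is chosen.
print(most_likely_next(model, "the"))
```

The point of the toy is only that the answer is driven entirely by what appeared in the training text, which is why the quality of that text matters so much.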

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed on its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
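The loss of rarer elements can be imitated with a small simulation. This is not the method used in the Nature paper; it is a simplified sketch in which a "model" is just a table of token probabilities, and each generation is re-estimated from a finite sample drawn from the previous one. The token names, probabilities, sample size and the functions `next_generation` and `simulate` are all assumptions made up for illustration.

```python
import random
from collections import Counter


def next_generation(dist, sample_size, rng):
    """Sample tokens from `dist`, then re-estimate probabilities from that sample."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    sample = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(sample)
    # Tokens that were never sampled get no entry, so they cannot reappear later.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(dist, generations, sample_size, seed=0):
    """Return the list of distributions, starting with the original one."""
    rng = random.Random(seed)
    history = [dist]
    for _ in range(generations):
        dist = next_generation(dist, sample_size, rng)
        history.append(dist)
    return history


# Hypothetical starting distribution standing in for human-made data.
original = {"common": 0.90, "uncommon": 0.07, "rare": 0.02, "very_rare": 0.01}
history = simulate(original, generations=50, sample_size=100)

for gen in (0, 10, 50):
    print(gen, sorted(history[gen]))
```

Because a token that misses one sample has zero probability from then on, the set of surviving tokens can only shrink from one generation to the next. That one-way loss is the toy counterpart of the discarded "rarer elements" the authors describe.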

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and strafed with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field who pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

A.Ansari--DT