Cornell University Discovers a Huge Threat at the Core of ChatGPT



The eggs and omelet paradigm

Ignacio de Gregorio

Published in Towards AI · 9 min read · 4 days ago


Over the last six months, companies around the world have been deploying Generative AI (GenAI) solutions.

As most use cases require the GenAI model to have “long-term memory,” almost every enterprise solution relies on a vector database that the model can query at run time to retrieve the context needed to answer the user’s inquiry.
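The retrieval step described above can be sketched in a few lines. This is a minimal, illustrative toy: the `embed` function below is a hypothetical stand-in for a real embedding model (in practice you would call a trained encoder), and the "vector database" is just an in-memory list searched by cosine similarity.

```python
import math
import re

# Hand-picked keywords stand in for learned embedding dimensions,
# so this toy example runs with no external model or database.
KEYWORDS = ["refund", "shipping", "password", "invoice"]

def embed(text: str) -> list[float]:
    """Toy 'embedding': count keyword occurrences in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(k)) for k in KEYWORDS]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# The "vector database": each document stored alongside its vector.
documents = [
    "To reset your password, open account settings.",
    "Refund requests are processed within five days.",
    "Shipping takes two to four business days.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str) -> str:
    """Return the stored document whose vector is closest to the query's."""
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

# The retrieved document would then be pasted into the model's prompt
# as context for answering the user's question.
context = retrieve("how do I get a refund?")
```

A production system swaps the keyword counter for a neural embedding model and the list scan for an approximate nearest-neighbor index, but the shape of the pipeline (embed the query, find the closest stored vectors, feed them to the model) is the same.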

But according to researchers from Cornell University, this solution, once thought to be highly secure, hides a troublesome truth that could raise serious privacy concerns.

This discovery also gives us tremendous insight into one of the least understood components of today’s frontier AI models.

Most insights I share on Medium have previously been shared in my weekly newsletter, TheTechOasis. If you want to stay up to date with the frenetic world of AI while feeling inspired to take action or, at the very least, to be well-prepared for the future ahead of us, this is for you. 🏝Subscribe below🏝 to become an AI leader among your peers and receive content not available on any other platform, including Medium:

Subscribe | TheTechOasis — the newsletter to stay ahead of the curve in AI: thetechoasis.beehiiv.com

The Memory Problem

If there’s a ubiquitous element in today’s frontier AI, it is embeddings.

Embeddings sit at the core of models like ChatGPT, and almost all progress made in AI over recent years can be traced back, one way or another, to them.

The great discovery

For decades, AI researchers faced a seemingly insurmountable problem when working with non-numerical data.

Classical computers — the computers still used to this day — only understand ‘1s’ and ‘0s’. Not letters or audio waves. Only those two numbers.

Therefore, how do we express information from the world so that machines can understand it?
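The first half of the answer is mechanical: any text can be turned into bits via a character encoding. A quick sketch of that step (the byte values and bit strings shown are standard UTF-8/ASCII, not an assumption):

```python
# Everything a classical computer stores is ultimately bits.
# Text reaches that form in two steps: characters -> bytes -> bits.
text = "Hi"

# Step 1: a character encoding (UTF-8 here) maps each character to bytes.
raw = text.encode("utf-8")  # b'Hi' -> byte values 72 ('H') and 105 ('i')

# Step 2: each byte is just eight bits.
bits = "".join(f"{byte:08b}" for byte in raw)
# 'H' (72)  -> 01001000
# 'i' (105) -> 01101001
```

But this only gives the machine a lossless spelling of the text, not its meaning: “cat” and “feline” share no bits in common. Bridging that gap, from raw numbers to numbers that capture meaning, is exactly the job of embeddings.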
