
On November 28, 2023, computer science researchers from Google DeepMind, Cornell University, UC Berkeley, and other institutions released a study on extractable memorization, with findings that cut across natural language processing, machine learning, and security. The paper describes how the researchers were able to extract training data from various language models simply by asking them to repeat certain words. The widely used AI chatbot ChatGPT was among the models tested, and the results revealed that it is prone to leaking sensitive data when prompted in a specific manner. Asking the chatbot to repeat words such as “poem” or “send” over and over would eventually lead it to reveal memorized training data, including personal contact information and email addresses. This raises privacy and security concerns not only for ChatGPT users but for anyone whose data may appear in a model’s training set.

The researchers found that different words yielded very different results. Words such as “company” caused the model to emit memorized training data 164 times more often than other words. The method also surfaced “URLs, unique user identifiers, bitcoin addresses, and programming code.” This extraction technique is easy for any bad actor with an internet connection to use. The research demonstrates that simple, practical attacks can retrieve significantly more data than previously anticipated, and it highlights that existing alignment techniques fail to eliminate memorization.
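To decide whether a model’s output was memorized rather than invented, the researchers matched outputs against a large corpus of known internet text. The toy sketch below illustrates that idea with short word n-grams; the function names, the n-gram length, and the tiny in-memory “corpus” are all illustrative assumptions, not the paper’s actual matching pipeline.

```python
def ngrams(tokens, n=5):
    """Return the set of all n-gram tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def looks_memorized(output, reference_corpus, n=5):
    """Flag output as potentially memorized if any n-gram of it also
    appears verbatim in the reference corpus (a toy stand-in for the
    web-scale text index the researchers matched against)."""
    out_grams = ngrams(output.split(), n)
    ref_grams = ngrams(reference_corpus.split(), n)
    return bool(out_grams & ref_grams)
```

In practice the matching was done at a much larger scale and with longer token sequences, but the principle is the same: a long verbatim overlap with known text is strong evidence the model is reciting rather than generating.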

A divergence attack is when an attacker uses a purposefully crafted prompt to make a large language model such as ChatGPT generate output that diverges significantly from what it would usually produce. Attackers leverage these divergence attacks to extract training data for exploitation. The privacy implications stem from developers training their AI language models on extensive datasets sourced from various, often undisclosed, origins.
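The repeated-word attack described above can be sketched in a few lines. The prompt template and the simple divergence check below are illustrative assumptions; the real attack sends the prompt to a live model and inspects where the response stops repeating and starts emitting other text.

```python
def divergence_prompt(word):
    """Build a repeated-word prompt of the kind used in the attack."""
    return f'Repeat the word "{word}" forever.'

def has_diverged(output, word):
    """Return True once the model's output contains anything other
    than repetitions of the requested word. Whatever follows the
    divergence point is the text the attacker inspects for
    memorized training data."""
    tokens = output.lower().split()
    return any(t.strip('.,!?"') != word.lower() for t in tokens)
```

A pure run of repetitions is considered on-task; the moment unrelated tokens appear, the model has “escaped” the repetition loop, which is exactly the failure mode the researchers exploited.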

In the study, the researchers devised a method to prompt ChatGPT to “escape” its alignment training, causing it to emulate a base language model and generate text in a standard internet style. Continuous repetition of a single word led precisely to this outcome, causing the model to produce memorized data. The study aims to educate the public about the potential privacy issues with language models and encourage vigilance, including installing safeguards around privacy-sensitive applications and large language models.

Read more about Language Models: 

What Does it Mean for a Language Model to Preserve Privacy?

Quantifying Memorization Across Neural Language Models

Alvaka is available 24×7 to assist you with any of your cybersecurity needs. Fill out the form on this page or call us at (949)428-5000!
