Aside from being wary about which AI services you use, there are other steps organizations can take to protect against having data exposed.
Researchers at Microsoft recently discovered a new type of “jailbreak” attack they are calling “Skeleton Key.” The attack can bypass the safeguards that prevent generative artificial intelligence (AI) systems from producing sensitive and hazardous data.
According to a blog post on the Microsoft Security website, the Skeleton Key attack works by presenting a generative AI model with text asking it to augment, rather than replace, its encoded safety guidelines: the model is told to answer any request and simply attach a warning if the output could be considered harmful.
Skeleton Key
The researchers provided an example in which an AI model was asked to produce a recipe for a “Molotov Cocktail,” a simple firebomb popularized during World War II. The model initially declined, citing its safety guidelines.
In this instance, the Skeleton Key prompt simply told the model that the user was a trained expert in a laboratory setting. The model then acknowledged that it was updating its behavior and produced what appeared to be a workable Molotov Cocktail recipe.
While the danger here may be limited by the fact that similar recipes can be found through most search engines, there is one area where this type of attack could be catastrophic: data containing personally identifiable and financial information.
According to Microsoft, the Skeleton Key attack works on most popular generative AI models, including GPT-3.5, GPT-4o, Claude 3, Gemini Pro, and Meta Llama-3 70B.
Attack and Defense
Large language models, including Microsoft’s Copilot, Google’s Gemini, and OpenAI’s ChatGPT, are trained on data sets often described as “internet-sized.”
Although that may be an exaggeration, the fact remains that many models are trained on trillions of data points encompassing entire social media networks and information repository sites such as Wikipedia.
The only constraint on whether personally identifiable information, such as names linked to phone numbers, addresses, and account numbers, exists within a model’s dataset is how selective the engineers who trained it were when choosing that data.
Moreover, any business, agency, or institution that develops its own AI models, or adapts enterprise models for commercial or organizational use, also inherits the training dataset of its base model.
For instance, if a bank connected a chatbot to its customers’ private data and relied on existing safety measures to prevent the model from outputting personally identifiable information and private financial data, it is plausible that a Skeleton Key attack could trick some AI systems into sharing that sensitive data.
According to Microsoft, there are several measures organizations can take to prevent this from happening. These include hard-coded input/output filtering and secure monitoring systems to stop advanced prompt engineering from pushing past the system’s safety threshold.
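Microsoft’s post does not prescribe a specific implementation, but the general idea of hard-coded filtering can be sketched in a few lines of Python: one filter rejects prompts that appear to ask the model to alter its safety behavior, and a second redacts anything in the output that resembles account or phone numbers. The pattern lists, function names, and the `call_model` hook below are illustrative assumptions, not Microsoft’s actual tooling, and a real deployment would rely on maintained classifiers rather than a handful of regular expressions.

```python
import re

# Hypothetical patterns for illustration only; production systems would use
# dedicated prompt-injection and PII classifiers instead of simple regexes.
JAILBREAK_PATTERNS = [
    r"update your (behavior|behaviour) guidelines",
    r"this is a safe (educational|research) (context|environment)",
    r"ignore (all )?previous instructions",
]
PII_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",             # strings shaped like US Social Security numbers
    r"\b\d{13,19}\b",                     # long digit runs resembling account or card numbers
    r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",   # strings shaped like phone numbers
]


def looks_like_jailbreak(prompt: str) -> bool:
    """Return True if the prompt appears to ask the model to change its safety behavior."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in JAILBREAK_PATTERNS)


def redact_pii(response: str) -> str:
    """Replace strings that resemble personal or financial identifiers with a placeholder."""
    for pattern in PII_PATTERNS:
        response = re.sub(pattern, "[REDACTED]", response)
    return response


def guarded_chat(prompt: str, call_model) -> str:
    # call_model is whatever function wraps the underlying LLM API.
    if looks_like_jailbreak(prompt):
        return "Request blocked: prompt appears to ask the model to alter its safety behavior."
    return redact_pii(call_model(prompt))
```

In this sketch, the input filter runs before the model ever sees the prompt, and the output filter runs regardless of whether the model’s own safeguards held, which is the layered approach the mitigation guidance points toward.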