Anthropic Reveals Claude’s System Prompts

James Emmanuel

1 year ago

Anthropic reveals Claude’s system prompts, showing generative AI models as statistical systems that follow specific instructions without true humanlike intelligence

Generative AI models are not genuinely humanlike. They are merely statistical systems that anticipate a sentence’s most probable subsequent words; they lack intelligence and personality.

However, similar to apprentices in an oppressive environment, they comply with instructions without protest, including the initial “system prompts” that familiarize the models with their fundamental attributes and the appropriate and inappropriate behaviors.

To prevent (or at least attempt to prevent) models from behaving poorly and to guide the general tone and sentiment of the models’ replies, all generative AI vendors, from OpenAI to Anthropic, employ system prompts. For example, a prompt may instruct a model to be courteous but never apologize or to be candid about the fact that it cannot know everything.

However, vendors typically maintain the confidentiality of system prompts, possibly due to competitive considerations or because knowledge of the prompt may propose methods for circumventing it. For instance, the sole method of revealing the system prompt of GPT-4o is through a prompt injection attack. And even then, the system’s output cannot be entirely trusted.

However, in an ongoing effort to establish itself as a more ethical and transparent AI vendor, Anthropic has made the system prompts for its most recent models (Claude 3 Opus, Claude 3.5 Sonnet, and Claude 3.5 Haiku) available on the web and in the Claude iOS and Android apps.

In a post on X, Alex Albert, the director of Anthropic’s developer relations, stated that the company intends to implement this type of disclosure consistently as it updates and refines its system prompts.

We've added a new system prompts release notes section to our docs. We're going to log changes we make to the default system prompts on Claude dot ai and our mobile apps. (The system prompt does not affect the API.) pic.twitter.com/9mBwv2SgB1
— Alex Albert (@alexalbert__) August 26, 2024

The most recent prompts, dated July 12, provide a comprehensive explanation of the limitations of the Claude models, such as the inability to access URLs, links, or videos. Facial recognition is strictly prohibited; the system prompts Claude Opus to instruct the model to “respond as if it is completely face blind” and to “avoid identifying or naming any humans in [images].”

However, the prompts also characterize specific personality traits and characteristics—traits and characteristics that Anthropic would like the Claude models to exemplify.

For example, the prompt for Claude 3 Opus specifies that Claude is to present itself as “intellectually curious and highly intelligent” and “delights in engaging in discussions on a diverse range of topics and hearing the perspectives of humans.”

It also instructs Claude to approach controversial topics impartially and objectively, offering “careful thoughts” and “clear information.” Additionally, he is advised to refrain from beginning responses with the words “certainly” or “absolutely.”

This human finds the system prompts somewhat peculiar, as they are composed like an actor would compose a character analysis page for a stage play. The prompt for Opus concludes with the statement, “Claude is now being connected with a human,” which suggests that Claude is a consciousness on the other side of the screen and exists solely to satisfy the desires of its human conversation companions.

However, that is undoubtedly an illusion. If the prompts for Claude demonstrate anything, these models are alarmingly undeveloped without human guidance and assistance.

Anthropic is imposing pressure on its competitors to publish these new system prompt changelogs, which are the first of their kind from a significant AI vendor. It will be necessary to determine whether the gambit is effective.