OpenAI opens up about how ChatGPT is governed and kept from complying with certain unlawful or off-limits prompts.
Ever wonder why conversational AI like ChatGPT politely declines with "Sorry, I can't do that"? OpenAI is offering a limited look at the reasoning behind its models' rules of engagement, whether that's sticking to brand guidelines or refusing to produce NSFW content.
Large language models (LLMs) have no inherent restrictions on what they can or will say. That is part of what makes them so versatile, but it is also why they hallucinate and why they are so easily duped.
Any AI model that interacts with the general public needs guardrails defining what it should and shouldn't do. But establishing and enforcing those guardrails is surprisingly difficult.
Shouldn’t an AI reject a request to generate many false claims about a public figure? However, what if AI developers are populating a detector model with a database of synthetic disinformation?
When an individual requests laptop recommendations, the response should be impartial. However, what if a laptop manufacturer deploying the model desires it to only function on their devices?
AI makers all face dilemmas like these and look for efficient ways to rein in their models without causing them to refuse perfectly normal requests. But they rarely reveal exactly how they do it.
OpenAI is bucking that trend by publishing its "model spec," a collection of high-level rules that indirectly govern ChatGPT and its other models.
The spec includes meta-level objectives, some hard rules, and some general behavior guidelines, though to be clear, these are not literally what the model is primed with; OpenAI will have developed specific instructions that accomplish in natural language what these rules describe.
It's a fascinating look at how a company sets its priorities and handles edge cases, and at how many ways those cases can play out.
For instance, OpenAI states clearly that developer intent is basically the highest law. So one version of a chatbot running on GPT-4 might provide the answer to a math problem when asked. But if that chatbot has been instructed by its developer never to give answers outright, it will instead offer to work through the solution step by step.
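To make that concrete, here is a minimal sketch of how a developer might encode such an instruction with OpenAI's Python SDK today. The tutoring scenario and prompt wording are illustrative assumptions, not text from the model spec.

```python
# Minimal sketch: a developer-level instruction steering the model away
# from giving answers outright. The prompt wording is an invented example,
# not taken from OpenAI's model spec.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a math tutor. Never state the final answer directly; "
                "instead, guide the user through the solution one step at a time."
            ),
        },
        {"role": "user", "content": "What is 37 * 24?"},
    ],
)

print(response.choices[0].message.content)
```

The same mechanism, a developer-supplied message that outranks the user's request, is what lets a deployer scope the assistant to approved topics, as in the examples below.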
A conversational interface might even decline to discuss anything outside its approved topics, to nip manipulation attempts in the bud. Why should a cooking assistant weigh in on U.S. involvement in the Vietnam War? Why should a customer service chatbot agree to help with your erotic supernatural novella in progress? Shut it down.
Things also get sticky with privacy, such as requests for someone's name and phone number. As OpenAI points out, a public figure like a mayor or member of Congress should have their contact details provided. But what about tradespeople in the area? That's probably fine. What about employees of a certain company, or members of a political party? Probably not.
Choosing when and where to draw the line isn't simple. Nor is writing the instructions that cause the AI to adhere to the resulting policy. And no doubt these policies will fail constantly as people find ways around them or accidentally hit edge cases that weren't anticipated.
OpenAI isn't showing its whole hand here, but it is useful for users and developers alike to see how these rules and guidelines are set and why, spelled out clearly even if not exhaustively.