Anthropic has hired Jan Leike, a former safety lead at OpenAI, to head the company's new "superalignment" team.
Leading AI researcher Jan Leike, who quit OpenAI earlier this month after openly criticizing the company's approach to AI safety, has joined rival Anthropic to lead a newly formed "superalignment" team.
Leike stated in a post on X that his team at Anthropic will concentrate on “scalable oversight,” “weak-to-strong generalization,” and automated alignment research, among other areas of AI safety and security.
As Leike’s team grows, researchers at Anthropic working on scalable oversight—methods to regulate the behavior of large-scale AI in predictable and desirable ways—will report directly to Leike, according to a source familiar with the situation who spoke with TechCrunch.
The mission of Leike's team sounds much like that of OpenAI's recently disbanded Superalignment team, which Leike co-led and which was tasked with solving the core technical challenges of controlling superintelligent AI within four years. The team, however, was frequently hamstrung by OpenAI's leadership.
Anthropic has repeatedly attempted to portray itself as more safety-conscious than OpenAI.
Dario Amodei, the CEO of Anthropic, was previously the vice president of research at OpenAI and is said to have split with the company over disagreements about its direction, namely its increasing commercial emphasis. To create Anthropic, Amodei brought along several former OpenAI staff members, including the company's onetime policy lead, Jack Clark.