OpenAI was founded with a commitment to developing artificial intelligence (AI) that serves the greater good of humanity, even if that AI eventually surpasses human intelligence. Despite its recent commercial focus, especially since the launch of ChatGPT, the company says it remains dedicated to addressing the challenges posed by increasingly powerful AI. Its Superalignment research team, formed in July 2023, is working on strategies for managing future superhuman AI systems, which are anticipated to possess significant capabilities as well as significant risks.
Leopold Aschenbrenner, an OpenAI researcher on the Superalignment team, argues that artificial general intelligence (AGI) is approaching quickly and that effective methods for controlling it are needed. OpenAI has committed a substantial share of its computing power to this research effort.
In a newly released research paper, OpenAI describes experiments in which a less capable AI model guides the behavior of a more capable one without significantly degrading the stronger model's abilities. The study focuses on the supervision step that today relies on human feedback to refine models such as GPT-4. As AI progresses, researchers want ways to automate or extend that feedback loop, since human input may become a limiting factor once AI surpasses human intelligence.
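To make the setup concrete, here is a minimal sketch of weak-to-strong supervision in PyTorch: a small model produces imperfect labels, and a larger model is fine-tuned on those labels in place of human feedback. The toy classifiers, data, and training loop are illustrative stand-ins under assumption, not OpenAI's actual models or pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: a small "weak" supervisor and a larger "strong" student.
# In the paper's setting these roles were played by GPT-2- and GPT-4-scale models.
weak_model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
strong_model = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 2))

# Unlabeled inputs on which the weak model acts as the supervisor.
inputs = torch.randn(512, 32)

# Step 1: the weak model produces (imperfect) labels for the data.
with torch.no_grad():
    weak_labels = weak_model(inputs).argmax(dim=-1)

# Step 2: fine-tune the strong model on those weak labels,
# standing in for human feedback that may no longer scale.
optimizer = torch.optim.Adam(strong_model.parameters(), lr=1e-3)
for epoch in range(5):
    logits = strong_model(inputs)
    loss = F.cross_entropy(logits, weak_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss on weak labels = {loss.item():.3f}")
```

The open question the experiments probe is whether the strong model ends up merely imitating the weak supervisor's errors or generalizes beyond them.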
In a control experiment, the researchers used GPT-2 to train GPT-4, which at first degraded the stronger model's capabilities. They tested two remedies: training progressively larger intermediate models to reduce the performance lost at each step, and an algorithmic adjustment to GPT-4's training that lets it follow the weaker model's guidance without sacrificing as much performance. The latter proved more effective, although the researchers acknowledge that neither method guarantees flawless behavior from the stronger model; they describe the work as a starting point for further research.
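One way to picture such an algorithmic adjustment is a training loss that blends imitation of the weak supervisor with the strong model's own confident predictions, so the stronger model is not forced to copy the weaker one's mistakes outright. The sketch below illustrates that idea under assumptions: the mixing weight `alpha` and the hard-labeling scheme are illustrative choices, not the exact objective reported in the paper.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_loss(strong_logits, weak_labels, alpha=0.5):
    """Blend imitation of the weak supervisor's labels with reinforcement
    of the strong model's own hardened predictions. `alpha` controls how
    much the strong model is allowed to trust itself (illustrative value)."""
    # Standard term: imitate the weak supervisor's labels.
    imitation_loss = F.cross_entropy(strong_logits, weak_labels)

    # Auxiliary term: reinforce the strong model's own predicted classes.
    own_labels = strong_logits.detach().argmax(dim=-1)
    self_loss = F.cross_entropy(strong_logits, own_labels)

    return (1 - alpha) * imitation_loss + alpha * self_loss

# Usage example: 8 examples, 2 classes.
logits = torch.randn(8, 2, requires_grad=True)
weak_labels = torch.randint(0, 2, (8,))
loss = confidence_weighted_loss(logits, weak_labels, alpha=0.5)
loss.backward()
print(f"combined loss = {loss.item():.3f}")
```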
Dan Hendrycks, director of the Center for AI Safety, applauds OpenAI's proactive approach to the challenge of controlling superhuman AI and emphasizes that successfully addressing this complex problem will require sustained, dedicated effort over many years.