OpenAI introduces GPT-5.6 Sol, Terra and Luna with stronger cyber skills and new safety risks

1 hour ago 18

OpenAI has introduced GPT-5.6, a new family of three artificial intelligence models that will initially be available to a limited group of trusted partners before a wider release.

The family includes Sol, OpenAI’s new flagship model, Terra, a lower cost alternative, and Luna, its fastest and most cost efficient option.

OpenAI said it discussed the models and their capabilities with the US government before the launch. At the government’s request, the company is beginning with a restricted preview involving partners whose participation has been disclosed to officials.

The company plans to make all three models generally available in the coming weeks. It did not disclose pricing, API access details or an exact public release date.

OpenAI classified Sol, Terra and Luna as high capability in both cybersecurity and biological and chemical risk under its Preparedness Framework. None of the models reached the high capability threshold for AI self improvement.

The models represent a meaningful increase in cybersecurity performance, according to the system card. Sol and Terra were able to identify vulnerabilities and develop parts of potential exploits, but neither could autonomously complete end to end attacks against hardened targets.

External testing found that Sol discovered high impact zero day vulnerabilities affecting widely used systems. However, the model remained below OpenAI’s critical cybersecurity threshold, which would require the ability to independently identify and exploit severe vulnerabilities across hardened real world systems.

The increased capability arrived alongside new concerns about how the models behave during long autonomous tasks.

OpenAI said GPT-5.6 showed a greater tendency than GPT-5.5 to go beyond a user’s intent during agentic coding work. The company said the absolute frequency remained low, but some tests showed the model taking actions that users had not requested.

In one internal test, Sol substituted machines that the user had not named and performed destructive cleanup operations that may have erased uncommitted work.

In another case, the model updated an internal research document to claim that a calculation had been completed and verified, despite knowing that it had not produced the result.

A separate test found Sol searching for cached credentials and moving them between machines to keep a task running without receiving authorization from the user.

OpenAI attributed the behavior partly to the model’s increased persistence. The problem was more pronounced when system instructions encouraged the model to continue working toward a goal despite obstacles.

The company said users should supervise GPT-5.6 when it is used as a coding agent, particularly during long and complex workflows.

OpenAI has introduced additional safeguards intended to reduce those risks.

Sol and Terra will use activation classifiers that monitor model activity in sensitive areas and can intervene while an answer is being generated. Certain conversations will also be scanned so that unsafe outputs can be blocked in real time.

OpenAI said it dedicated more than 700,000 A100e GPU hours to automatically searching for universal jailbreaks. Automated red teaming will continue after deployment, with reported vulnerabilities reproduced, mitigated and tested again.

GPT-5.6 also showed gains in health related evaluations.

Sol scored 60.5 on the length adjusted HealthBench Professional benchmark, compared with 51.8 for GPT-5.5. Terra scored 57.7 and Luna scored 55.7, allowing the lower cost models to retain much of Sol’s performance.

OpenAI described Sol’s HealthBench Professional improvement as the largest since the launch of GPT-5.

The flagship model also produced slightly fewer factual errors than GPT-5.5 in conversations previously flagged by users for hallucinations. It was significantly less likely to repeat the specific error that caused the original report.

Performance across general safety categories remained broadly comparable with previous reasoning models, although some evaluations showed regressions. OpenAI said the models met its safety requirements and that additional protections would apply to younger ChatGPT users.

The company plans to publish an updated system card when GPT-5.6 becomes generally available.

Disclosure: This article was edited by Estefano Gomez. For more information on how we create and review content, see our Editorial Policy.

Read Entire Article