OpenAI unveils GPT-5.6 Sol, Terra and Luna models — but only accessible to limited preview partners for now, per US Gov

Read full story on VentureBeat


<p>OpenAI is <a href="https://openai.com/index/previewing-gpt-5-6-sol/">announcing</a> a limited preview of GPT-5.6, its newest frontier AI model family, which comes in three variants: <b>Sol, Terra, and Luna.</b> </p><p><b>Sol</b> is for the hardest problems, such as complex coding and security research; <b>Terra</b> is for high-volume business tasks like customer support, internal tools and document analysis; and <b>Luna</b> is for faster, lower-cost everyday work like summarization, drafting and routine automation. Sol and Terra set new high benchmark scores, while Luna performs near GPT-5.5 levels on several tests despite being positioned as the fastest and lowest-cost model in the GPT-5.6 family.</p><p>However, the models are initially being made available to a narrow set of approximately 20 organizations, after OpenAI shared the models and release plans with the U.S. government. A general release is planned for &quot;the coming weeks.&quot;</p><p>The staggered release follows an <a href="https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/">executive order issued by President Donald J. Trump earlier this month, on June 2, 2026</a>, which calls upon various federal agencies to collaborate on a process for benchmarking and assessing the capabilities of new AI models to ensure they are safe and appropriate for wide release. </p><p>While this process remains underway (the order said it would take 30 days, which points to July 2), OpenAI says in its release blog post that it &quot;previewed our plans and the models’ capabilities ahead of today’s launch. At [the U.S. 
government&#x27;s] request, we are starting with a limited preview for a small group of trusted partners.&quot; </p><p>OpenAI&#x27;s limited preview release strategy also follows the drastic step taken by the <a href="https://venturebeat.com/technology/anthropic-blocks-all-public-access-to-claude-fable-5-mythos-5-following-us-government-order-what-enterprises-should-do">U.S. government to issue an export control order against Anthropic</a>, OpenAI&#x27;s top U.S. competitor, over jailbreaks found in its most powerful generally released model, Claude Fable 5. Anthropic responded by removing all access to the model, and to its cybersecurity-focused counterpart Claude Mythos 5, for both public and private parties. (Anthropic had earlier previewed a prior version of the model as &quot;Claude Mythos Preview&quot; to a small, selected group of external participants in its cybersecurity research program &quot;<a href="https://venturebeat.com/technology/anthropic-says-its-most-powerful-ai-cyber-model-is-too-dangerous-to-release">Project Glasswing</a>,&quot; dating back to April.)</p><p>Because OpenAI is coordinating its release framework with the White House ahead of a broader public launch, enterprise buyers must navigate a novel landscape of real-time safety interventions, mandatory compliance parameters, and structured token caching systems. </p><h2><b>How the 3 new GPT-5.6 models differ: Sol vs. Terra vs. Luna</b></h2><p>The three GPT-5.6 models are designed to address different enterprise needs and performance profiles. </p><p><b>Sol</b> is the top-tier option, built for the most demanding tasks such as complex reasoning, extended coding sessions, advanced agent-driven workflows, and security-focused applications. 
</p><p>Sol delivers the highest level of capability but comes at the highest price: $5.00 per million input tokens and $30.00 per million output tokens, the same as GPT-5.5. OpenAI says it delivers a major performance gain for long-running coding, cybersecurity and agentic tasks. </p><p><b>Terra</b> balances strong performance with efficiency. It is intended for large-scale production environments where organizations need reliable results across high volumes of work without the overhead of the most advanced model. It is priced at $2.50 per million input tokens and $15.00 per million output tokens. </p><p><b>Luna</b> is the most lightweight and cost-efficient option, optimized for speed and everyday use cases. It is well suited for simpler tasks, routine workflows, and applications where responsiveness and scalability are more important than maximum depth of reasoning, and is the most affordably priced, at $1.00 per million input tokens and $6.00 per million output tokens. </p><p>Sources with knowledge of OpenAI&#x27;s inner workings told VentureBeat that the new naming scheme was designed to move away from <a href="https://venturebeat.com/ai/openai-launches-gpt-5-not-agi-but-capable-of-generating-software-on-demand">the &quot;nano&quot; and &quot;mini&quot; variants of GPT-5</a>, because these models are not so different in size or raw intelligence, but are instead designed for distinct use cases. </p><p>As OpenAI states in its blog post about the new naming scheme: &quot;In this new naming system introduced with GPT‑5.6, the number identifies a model’s generation, while Sol, Terra, and Luna identify durable capability tiers that can advance on their own cadence. Together, the family gives people and developers clearer choices across intelligence, speed, and cost.&quot; </p><p>Also, sources said OpenAI sought to evoke a sense of inspiration by looking to the cosmos and names associated with it. 
</p><p>Further, Sol fits well alongside OpenAI&#x27;s <a href="https://openai.com/daybreak/">Daybreak</a> opt-in program for organizations interested in using OpenAI models to bolster cyber defense, which is an added bonus. The &quot;Sol&quot; voice style for OpenAI&#x27;s voice mode on ChatGPT is unrelated, and will likely be renamed. </p><p>The <a href="https://deploymentsafety.openai.com/gpt-5-6-preview">new GPT-5.6 system card</a> adds another important point for businesses: OpenAI is classifying all three GPT-5.6 models (not just Sol) at its “High” risk level for both cyber and biological/chemical capability, while rating them below that level for AI self-improvement. That means even the cheaper Terra and Luna tiers may carry new governance obligations for companies using them in security, life sciences or other sensitive workflows.</p><p>Here&#x27;s how they stack up on price against the rest of the current leading LLM field. Note that OpenAI&#x27;s cheapest option is a mid-priced model overall, and still more expensive than the frontier-level GLM-5.2.</p><h1><b>VentureBeat Frontier AI Model API Pricing Snapshot</b></h1><table><tbody><tr><td><p><b>Model</b></p></td><td><p><b>Input (per 1M tokens)</b></p></td><td><p><b>Output (per 1M tokens)</b></p></td><td><p><b>Total (input + output)</b></p></td><td><p><b>Source</b></p></td></tr><tr><td><p>MiMo-V2.5 Flash</p></td><td><p>$0.10</p></td><td><p>$0.30</p></td><td><p>$0.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>deepseek-v4-flash</p></td><td><p>$0.14</p></td><td><p>$0.28</p></td><td><p>$0.42</p></td><td><p><a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek</a></p></td></tr><tr><td><p>deepseek-v4-pro</p></td><td><p>$0.435</p></td><td><p>$0.87</p></td><td><p>$1.305</p></td><td><p><a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek</a></p></td></tr><tr><td><p>MiniMax-M3</p></td><td><p>$0.30</p></td><td><p>$1.20</p></td><td><p>$1.50</p></td><td><p><a 
href="https://platform.minimax.io/subscribe/token-plan?tab=api-enterprise">MiniMax</a></p></td></tr><tr><td><p>Gemini 3.1 Flash-Lite</p></td><td><p>$0.25</p></td><td><p>$1.50</p></td><td><p>$1.75</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Qwen3.7-Plus</p></td><td><p>$0.40</p></td><td><p>$1.60</p></td><td><p>$2.00</p></td><td><p><a href="https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&amp;url=2840914_2&amp;modelId=qwen3.7-plus&amp;serviceSite=international">Alibaba Cloud</a></p></td></tr><tr><td><p>MiMo-V2.5</p></td><td><p>$0.40</p></td><td><p>$2.00</p></td><td><p>$2.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>Grok 4.3 (low context)</p></td><td><p>$1.25</p></td><td><p>$2.50</p></td><td><p>$3.75</p></td><td><p><a href="https://docs.x.ai/developers/models/grok-4.3">xAI</a></p></td></tr><tr><td><p>MiMo-V2.5 Pro (≤256K)</p></td><td><p>$1.00</p></td><td><p>$3.00</p></td><td><p>$4.00</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>Kimi-K2.6</p></td><td><p>$0.95</p></td><td><p>$4.00</p></td><td><p>$4.95</p></td><td><p><a href="https://platform.kimi.ai/docs/pricing/chat-k26">Moonshot/Kimi</a></p></td></tr><tr><td><p>GLM-5.2</p></td><td><p>$1.40</p></td><td><p>$4.40</p></td><td><p>$5.80</p></td><td><p><a href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p><b>GPT-5.6 Luna</b></p></td><td><p><b>$1.00</b></p></td><td><p><b>$6.00</b></p></td><td><p><b>$7.00</b></p></td><td><p><b></b><a href="https://openai.com/index/previewing-gpt-5-6-sol/"><b>OpenAI</b></a></p></td></tr><tr><td><p>Grok 4.3 (high context)</p></td><td><p>$2.50</p></td><td><p>$5.00</p></td><td><p>$7.50</p></td><td><p><a href="https://docs.x.ai/developers/models/grok-4.3">xAI</a></p></td></tr><tr><td><p>MiMo-V2.5 Pro 
(&gt;256K)</p></td><td><p>$2.00</p></td><td><p>$6.00</p></td><td><p>$8.00</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>Qwen3.7-Max</p></td><td><p>$2.50</p></td><td><p>$7.50</p></td><td><p>$10.00</p></td><td><p><a href="https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&amp;url=2840914_2&amp;modelId=qwen3.7-max&amp;serviceSite=international">Alibaba Cloud</a></p></td></tr><tr><td><p>Gemini 3.5 Flash</p></td><td><p>$1.50</p></td><td><p>$9.00</p></td><td><p>$10.50</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Gemini 3.1 Pro Preview (≤200K)</p></td><td><p>$2.00</p></td><td><p>$12.00</p></td><td><p>$14.00</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p><b>GPT-5.6 Terra</b></p></td><td><p><b>$2.50</b></p></td><td><p><b>$15.00</b></p></td><td><p><b>$17.50</b></p></td><td><p><b></b><a href="https://openai.com/index/previewing-gpt-5-6-sol/"><b>OpenAI</b></a></p></td></tr><tr><td><p>GPT-5.4</p></td><td><p>$2.50</p></td><td><p>$15.00</p></td><td><p>$17.50</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr><tr><td><p>Gemini 3.1 Pro Preview (&gt;200K)</p></td><td><p>$4.00</p></td><td><p>$18.00</p></td><td><p>$22.00</p></td><td><p><a href="https://ai.google.dev/gemini-api/docs/pricing">Google</a></p></td></tr><tr><td><p>Claude Opus 4.8</p></td><td><p>$5.00</p></td><td><p>$25.00</p></td><td><p>$30.00</p></td><td><p><a href="https://platform.claude.com/docs/en/about-claude/pricing">Anthropic</a></p></td></tr><tr><td><p>GPT-5.5</p></td><td><p>$5.00</p></td><td><p>$30.00</p></td><td><p>$35.00</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr><tr><td><p>GPT-5.5 Instant (<code>chat-latest</code>)</p></td><td><p>$5.00</p></td><td><p>$30.00</p></td><td><p>$35.00</p></td><td><p><a 
href="https://developers.openai.com/api/docs/models/chat-latest">OpenAI</a></p></td></tr><tr><td><p>Sakana Fugu Ultra (≤272K)</p></td><td><p>$5.00</p></td><td><p>$30.00</p></td><td><p>$35.00</p></td><td><p><a href="https://console.sakana.ai/pricing#subscription-plan">Sakana AI</a></p></td></tr><tr><td><p><b>GPT-5.6 Sol</b></p></td><td><p><b>$5.00</b></p></td><td><p><b>$30.00</b></p></td><td><p><b>$35.00</b></p></td><td><p><b></b><a href="https://openai.com/index/previewing-gpt-5-6-sol/"><b>OpenAI</b></a></p></td></tr><tr><td><p>Claude Fable 5 / Claude Mythos 5</p></td><td><p>$10.00</p></td><td><p>$50.00</p></td><td><p>$60.00</p></td><td><p><a href="https://platform.claude.com/docs/en/about-claude/models/overview">Anthropic</a></p></td></tr></tbody></table><h2><b>Technology: deeper reasoning and subagent-based work</b></h2><p>The main technical change in GPT-5.6 centers on giving the model more time and structure for hard tasks during inference. </p><p>OpenAI is adding a new <code>max</code> reasoning setting for GPT-5.6 Sol, aimed at problems that require more extended deliberation.</p><p>OpenAI is also introducing <code>ultra</code> mode, which brings in subagents that can split up and accelerate complex projects, rather than keeping the work inside a single-agent flow.</p><p>The company’s launch evaluations suggest this approach improves performance on several agent-style tasks.</p><h2><b>Benchmarks show measurable improvement from GPT-5.5, and new state-of-the-art on TerminalBench 2.1 command-line tasks </b></h2><p>The GPT-5.6 series demonstrates a clear performance leap over its predecessors across complex reasoning and long-horizon tasks. 
</p><p>In command-line automation evaluated on TerminalBench 2.1, both the flagship Sol model and the mid-tier Terra outpace the previous GPT-5.5 score. Sol, using the new ultra mode, achieved a record-high 91.91% on the benchmark, and in max mode it achieved 88.76%, ahead of both GPT-5.5&#x27;s 83.4% and Claude Mythos 5&#x27;s 88%. </p><p>This superiority extends into professional workflows on Agent&#x27;s Last Exam, where Sol is the sole model to clear the halfway mark for task completion, at 50.9% in &quot;code mode,&quot; while the everyday Luna tier also narrowly edges out the prior generation&#x27;s flagship. </p><p>In quantitative biology and genomics testing, Sol and Terra achieve higher accuracy rates than both GPT-5.5 and GPT-5.4, with Sol achieving these stronger results while consuming fewer tokens. </p><p>Finally, across cybersecurity evaluations measuring vulnerability research and exploitation, the new models push past prior performance ceilings; Sol reaches significantly higher intended exploit rates as reasoning time scales up and achieves competitive capability caps using a fraction of the output tokens required by older models.</p><p>On ExploitBench, OpenAI says Sol performs near Mythos Preview while generating roughly one-third as many output tokens.</p><h2><b>Predictable prompt caching mechanics and a Cerebras speed boost</b></h2><p>To help enterprises control the unpredictable cost curves of running agentic loops, the GPT-5.6 API introduces a revamped prompt caching protocol. </p><p>Developers can now implement explicit cache breakpoints, backed by a guaranteed 30-minute minimum cache lifetime. </p><p>Under this framework, <b>initial cache writes cost 1.25x</b> the model’s standard uncached input rate, while <b>later cache reads receive a 90% discount. 
</b></p><p>In practice, businesses running repeated or similar operations pay more to establish the cache, then much less each time they reuse that cached context during at least the 30-minute minimum cache window.</p><p>For systems that routinely pass massive context windows or codebase definitions back into the model, this predictability is a critical financial guardrail. </p><p>Furthermore, for enterprise applications where latency is the primary barrier to adoption, OpenAI is launching GPT-5.6 Sol on Cerebras hardware this July. </p><p>This infrastructure partnership claims processing speeds of up to <b>750 tokens per second</b>, targeting specialized enterprise applications requiring real-time, frontier-grade reasoning. </p><h2><b>Enterprise implications: High security and algorithmic friction</b></h2><p>For corporate engineering, information security, and compliance teams, the deployment of GPT-5.6 requires a meticulous look at its security architecture. </p><p>To achieve clearance for release, OpenAI dedicated roughly <b>700,000 A100e GPU hours</b> solely to automated red-teaming GPT-5.6. This compute was allocated to discovering &quot;universal jailbreaks&quot;—systemic attack vectors designed to bypass safeguards across varied contexts, rather than single-prompt workarounds.</p><p>OpenAI says it has implemented a multi-layered safeguard stack that operates in real time, putting up intentional operational hurdles for enterprise security teams. </p><ul><li><p><b>Model-level refusals:</b> GPT-5.6 is tuned to reject banned cyber help, including requests that mask malicious intent or attempt jailbreak-style workarounds.</p></li><li><p><b>Live misuse screening:</b> Separate cyber and biology detectors review generations while they are being produced.</p></li><li><p><b>Activation-based screening: </b>For Sol and Terra, OpenAI says it is adding activation classifiers that monitor internal model signals during inference. 
If those systems detect a risky pattern, output streaming can pause while another safety check reviews the content. Luna does not appear to receive that same activation-classifier layer, though it is still covered by other monitoring systems.</p></li><li><p><b>Reasoning review pauses: </b>When risk appears elevated, generation can stop while a larger reasoning system examines the exchange and surrounding context. If the system classifies the output as disallowed, the answer is blocked before it reaches the endpoint.</p></li></ul><p>Because legitimate defensive work—such as code reviews, vulnerability discovery, patch engineering, and defensive testing—frequently utilizes the exact same code primitives as offensive exploits, OpenAI admits that its classifiers may regularly trigger false positives. </p><p>The system card says OpenAI’s monitoring stack posted 94.8% overall recall on its biology evaluation set and 81.6% overall recall on its cybersecurity evaluation set. Those figures give enterprises a rare quantitative look at the safeguards, but they also show the system is not perfect and may miss some risky cases or block some legitimate work.</p><p>Persistent flagging can trigger automated account-level reviews across historical conversations to evaluate if an enterprise client is engaging in malicious behavior or standard security research. OpenAI is currently negotiating longer-term enterprise safety compliance controls, including customer-operated safety overrides and privacy-preserving detection mechanisms, to insulate corporate data from manual review pipelines. </p><p>Importantly, OpenAI notes that under testing, Sol remains optimized for defensive containment rather than offensive deployment. 
In evaluations run against the Chromium and Firefox codebases, the model successfully isolated bugs and exploitation primitives but was unable to autonomously engineer a functional, full-chain exploit, keeping it safely below the organization&#x27;s &quot;Cyber Critical&quot; alert threshold. </p><p>But all three GPT-5.6 models crossed OpenAI&#x27;s “High” cyber threshold on internal capture-the-flag testing, with Sol reaching 96.7%, Terra reaching 91.84% and Luna reaching 85.19%.</p><p>That distinction matters for enterprise security buyers: OpenAI is presenting GPT-5.6 as powerful enough to help automate parts of vulnerability research and exploit analysis, but not yet as a system that can reliably run a complete advanced attack campaign without human direction under the company’s test conditions.</p><h2><b>The geopolitics of the phased release</b></h2><p>The broader rollout of the GPT-5.6 series reflects an escalating entanglement between frontier AI labs and national security protocols. </p><p>The decision to limit initial access to a small circle of vetted partners whose details are shared with the U.S. government stems from direct coordination regarding the developing cyber Executive Order framework. OpenAI has taken the unusual step of publicly critiquing this sovereign gatekeeping within its official product announcement documentation. The company states plainly: </p><blockquote><p>&quot;We don’t believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them.&quot; </p></blockquote><p>This tension highlights the precarious position of modern tech enterprises. While organizations can leverage unprecedented agentic efficiency and robust defensive patching capabilities, as reflected in benchmarks like ExploitGym and ExploitBench, they must also accept that access to premier tools remains subject to diplomatic and regulatory authorization. </p>
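<p>For readers who want to sanity-check the prompt caching economics described earlier, here is a rough sketch of the arithmetic. The per-million-token prices, the 1.25x cache write multiplier and the 90% read discount are taken from the article; the function names, the 100,000-token prefix and the call counts are hypothetical, and the sketch assumes that cache reads are billed at 10% of the standard uncached input rate, that every call lands inside the cache lifetime, and that output tokens are ignored. It is not OpenAI's billing logic.</p>

```python
# Illustrative arithmetic only. Prices and multipliers are the figures reported
# in the article; function names and workload sizes are hypothetical.

# (input, output) list prices in USD per 1M tokens.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}

CACHE_WRITE_MULTIPLIER = 1.25  # initial cache write: 1.25x the uncached input rate
CACHE_READ_MULTIPLIER = 0.10   # assumption: a 90% discount means 10% of the input rate


def uncached_input_cost(model, prefix_tokens, calls):
    """Input cost of sending the same prefix on every call with no caching."""
    input_price, _ = PRICES[model]
    return calls * prefix_tokens / 1_000_000 * input_price


def cached_input_cost(model, prefix_tokens, calls):
    """Input cost with one cache write followed by (calls - 1) cache reads.

    Assumes every call arrives while the cached prefix is still alive.
    """
    if calls <= 0:
        return 0.0
    input_price, _ = PRICES[model]
    per_call = prefix_tokens / 1_000_000 * input_price
    return per_call * (CACHE_WRITE_MULTIPLIER + (calls - 1) * CACHE_READ_MULTIPLIER)


if __name__ == "__main__":
    prefix = 100_000  # hypothetical shared context, in tokens
    for calls in (1, 2, 10):
        plain = uncached_input_cost("sol", prefix, calls)
        cached = cached_input_cost("sol", prefix, calls)
        print(f"sol, {calls} call(s): uncached ${plain:.3f}, cached ${cached:.3f}")
```

<p>Under these assumptions a single call costs more with caching than without, because of the write premium, and the cached path becomes cheaper from the second call onward.</p>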
