Bittensor subnet achieves state-of-the-art AI safety with HaloGuard 1.0

1 hour ago 26

A Bittensor subnet called Trishool just dropped a safety model that beats every other open-weight guard model on the market. HaloGuard 1.0, released on July 2 by Astroware Labs, achieved top marks across seven established prompt-safety benchmarks, and it did so with a model small enough to run as a lightweight filter.

HaloGuard 1.0 comes in two sizes: a 0.8B parameter version and a 4B parameter version. The 4B variant is the headline grabber, securing first place across all seven benchmarks it was tested against.

Both models function as runtime guards, meaning they operate in real time to screen prompts before they hit the LLM or agent handling user requests. This is a fundamentally different approach from post-generation filtering, which tries to catch harmful outputs after the damage is already done.

The earlier Alpha version of HaloGuard was integrated into the Chutes subnet on May 19 for live AI chat applications. That deployment hit an 87% F1 score on safety benchmarks including Aegis and HarmBench, which gave the team real-world validation before pushing to version 1.0.

Trishool, designated SN23 on the Bittensor network, operates as a decentralized adversarial red-teaming network. Miners on the subnet are incentivized to continuously attack and stress-test safety models, finding vulnerabilities so they can be patched. The more effectively a miner breaks the model, the more they earn.

The subnet was relaunched roughly seven months before the HaloGuard 1.0 announcement. Astroware Labs, which operates the subnet, has positioned its work as building “production-grade safety layers” for AI applications.

A full arXiv paper detailing HaloGuard 1.0’s architecture and benchmark results is expected soon.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.

Read Entire Article