Kimi K2.5 runs on RTX 3060 with 768GB Intel Optane memory at 4 tokens per second


A trillion-parameter AI model just ran on a graphics card that most gamers would consider mid-range.

A Chinese AI enthusiast known as APFrisco demonstrated Moonshot AI’s Kimi K2.5 model, a Mixture-of-Experts (MoE) large language model with 1 trillion total parameters, running on a single Nvidia RTX 3060 GPU paired with 768 GB of Intel Optane Persistent Memory. The setup achieved roughly four tokens per second, which is slow by production standards but remarkable given the hardware involved.

How a mid-tier GPU handles a trillion parameters

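Kimi K2.5 doesn’t actually fire up all 1 trillion parameters at once. As an MoE model, it routes each token through only a small subset of its expert sub-networks, so just 32 billion parameters (about 3.2% of the total) are activated per token. The rest sit idle, waiting their turn.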
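To illustrate the idea, here is a minimal sketch of generic top-k expert routing. It is not Moonshot AI's implementation: the expert count, top-k value, vector size, and the toy "experts" (which just scale their input) are all hypothetical, chosen only to show that experts the router does not pick are never evaluated.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Toy "expert" that scales its input; real experts are feed-forward networks.
    return lambda x: [scale * v for v in x]

def moe_forward(token, experts, router, k):
    """Send one token through only the top-k experts chosen by the router."""
    # Router logit per expert: dot product of the token with that expert's router row.
    scores = [sum(t * w for t, w in zip(token, row)) for row in router]
    # Keep the k best-scoring experts; every other expert is skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen

rng = random.Random(0)
NUM_EXPERTS, TOP_K, DIM = 16, 2, 4   # hypothetical toy sizes, not Kimi K2.5's
experts = [make_expert(s + 1) for s in range(NUM_EXPERTS)]
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

output, chosen = moe_forward([0.5, -1.0, 0.25, 2.0], experts, router, TOP_K)
print("experts used:", chosen)

# Ratio quoted for Kimi K2.5: 32B active out of 1T total parameters.
print(f"active share: {32e9 / 1e12:.1%}")
```

Only 2 of the 16 toy experts do any work for this token; the same principle is what keeps per-token compute for a trillion-parameter MoE model closer to that of a 32-billion-parameter dense model.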

Even with that efficiency trick, the model is enormous. The full Kimi K2.5 weighs in at approximately 630 GB. Quantized versions, which compress the model’s precision to reduce memory requirements, still clock in around 381 GB. That’s why APFrisco needed 768 GB of Intel Optane Persistent Memory: mainstream consumer desktop platforms support far less system RAM than even the 381 GB quantized build requires.
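The sizes above can be turned into an average storage cost per weight with simple arithmetic. This sketch assumes decimal gigabytes and counts model weights only (no KV cache, runtime buffers, or operating-system overhead); the BF16 line is a hypothetical reference point, not a file the article mentions.

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion parameters, per the article
GB = 1_000_000_000                 # assumes decimal gigabytes

def bits_per_param(size_gb):
    """Average bits stored per weight implied by a checkpoint size."""
    return size_gb * GB * 8 / TOTAL_PARAMS

def weight_size_gb(bits):
    """Weight footprint in GB if every parameter used `bits` bits."""
    return TOTAL_PARAMS * bits / 8 / GB

VRAM_GB, PMEM_GB = 12, 768   # RTX 3060 VRAM and the Optane capacity in the demo

for label, size in [("full release", 630), ("quantized", 381)]:
    print(f"{label}: {size} GB ~ {bits_per_param(size):.2f} bits/param, "
          f"fits in VRAM: {size <= VRAM_GB}, fits in PMem: {size <= PMEM_GB}")

# Hypothetical comparison: every weight stored as 16-bit BF16.
print(f"BF16 copy: {weight_size_gb(16):.0f} GB")
```

Roughly 5 bits per parameter for the 630 GB release and about 3 bits for the 381 GB build: both far exceed 12 GB of VRAM, while both fit within 768 GB.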

Optane PMem DIMMs are an interesting choice. Intel discontinued its Optane line, which means these modules are now essentially legacy hardware floating around the second-hand market. They’re slower than traditional DRAM but vastly cheaper per gigabyte, making them an unconventional but surprisingly practical solution for loading massive models that would otherwise require enterprise-grade infrastructure.
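Why slower memory translates into fewer tokens per second can be estimated with a common back-of-envelope model: when decoding is limited by memory bandwidth, each token requires reading every active weight once, so speed is at most bandwidth divided by bytes read per token. The sketch below assumes the 381 GB quantized build was the one used (the article does not say) and that active weights average the same ~3.05 bits as the whole file. It ignores compute, the KV cache, and any weights held in the GPU's VRAM, and the 100 GB/s figure is purely hypothetical.

```python
def bytes_per_token(active_params, bits_per_param):
    """Weight bytes that must be streamed from memory to generate one token."""
    return active_params * bits_per_param / 8

def max_tokens_per_second(bandwidth_gb_s, active_params, bits_per_param):
    """Upper bound on decode speed if reading weights is the only bottleneck."""
    return bandwidth_gb_s * 1e9 / bytes_per_token(active_params, bits_per_param)

ACTIVE = 32e9                # active parameters per token (from the article)
BITS = 381e9 * 8 / 1e12      # ~3.05 bits/param implied by the 381 GB build (assumption)

per_token_gb = bytes_per_token(ACTIVE, BITS) / 1e9
print(f"weights read per token: ~{per_token_gb:.1f} GB")

# Working backwards from the reported ~4 tokens per second:
print(f"implied effective bandwidth: ~{4 * per_token_gb:.0f} GB/s")

# Hypothetical bandwidth, for illustration only:
print(f"at 100 GB/s: <= {max_tokens_per_second(100, ACTIVE, BITS):.1f} tokens/s")
```

Under these assumptions, every token pulls on the order of 12 GB of weights, so the memory tier holding the experts, rather than the GPU's compute, largely sets the ceiling on speed.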

The RTX 3060 launched in early 2021 with 12 GB of VRAM. It was designed for 1080p gaming and light creative workloads, not running frontier AI models.

What typical Kimi K2.5 deployments look like

High-performance inference for Kimi K2.5 typically targets configurations with up to 8 high-end GPUs. Those setups deliver speeds between 10 and 300-plus tokens per second.
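To put those throughput figures in perspective, a quick sketch of wall-clock time for a single reply. The 500-token answer length is a hypothetical example, and the calculation assumes a steady decode rate with no prompt-processing time.

```python
def seconds_for(tokens, tokens_per_second):
    """Wall-clock time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_second

RESPONSE_TOKENS = 500  # hypothetical answer length

for label, rate in [("RTX 3060 + Optane demo", 4),
                    ("multi-GPU, low end", 10),
                    ("multi-GPU, high end", 300)]:
    print(f"{label}: {seconds_for(RESPONSE_TOKENS, rate):.1f} s")
```

At 4 tokens per second, a 500-token reply takes about two minutes, versus under two seconds at the top of the multi-GPU range.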

The demonstration was shared on Reddit’s r/LocalLLaMA community and subsequently covered by Tom’s Hardware.

Kimi K2.5 itself was released on January 27, 2026, by Moonshot AI. It features multimodal capabilities and was trained on roughly 15 trillion mixed visual and text tokens. It’s an open-weight model, meaning anyone can download and run it, which is precisely what made APFrisco’s experiment possible in the first place.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
