Create a REST API for the microsoft/BitNet b1.58 2B4T model and integrate it with Open WebUI

I’m writing this as a bit of a vent. I typically use Ollama models, but I came across a post on X (formerly Twitter) about a Microsoft model that supposedly runs well on CPU alone, with even better performance on systems like the Apple M2 chip:

“Microsoft just [released] a 1-bit LLM with 2B parameters that can run on CPUs like Apple M2. BitNet b1.58 2B4T outperforms fp LLaMA 3.2 1B while using only 0.4GB memory versus 2GB and processes tokens 40% faster. 100% opensource. pic.twitter.com/kTeqTs6PHd”
Shubham Saboo (@Saboo_Shubham_), April 18, 2025

That model is microsoft/BitNet b1.58 2B4T. After seeing…








