
What's done, and what isn't

A new engine is easy to oversell. Here is the real state of Smedjan — the same map you will find in the docs — so you can decide what to trust it with.

What trains today: the full pipeline — tokenizer, data prep, pretraining, knowledge distillation, supervised fine-tuning, DPO alignment, quantization, and export — on both Metal and CUDA, with a portable checkpoint format. Every mixer trains: softmax attention, selective state-space (Mamba-2/SSD), linear attention, RWKV (numerically-stable WKV), and block-sparse. AdamW, Muon, and the hybrid optimizer all work.

Interop that works: safetensors import reads F32/BF16/F16, a HuggingFace config.json maps straight to a model (import-hf), GGUF export emits real GGML f32/q8_0/q4_0 blocks for llama.cpp, and the long-context suite (NIAH / RULER) runs with smedjan eval --longctx.
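For readers who have not looked inside a GGUF file, here is a minimal sketch of what one q8_0 block holds, following the GGML layout: 32 weights share a single fp16 scale, and each weight is stored as one signed byte. The function names are hypothetical and this is not Smedjan's exporter code, only an illustration of the block format.

```python
import math
import struct

QK8_0 = 32  # number of weights per q8_0 block in GGML


def _round_half_away(x):
    # C's roundf rounds halves away from zero; Python's round() does not.
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0_block(values):
    """Pack 32 floats into a 34-byte block: fp16 scale, then 32 int8 quants.

    Hypothetical helper for illustration only.
    """
    if len(values) != QK8_0:
        raise ValueError("a q8_0 block holds exactly 32 values")
    amax = max(abs(v) for v in values)
    d = amax / 127.0                 # scale so the largest value maps to +/-127
    inv = 1.0 / d if d else 0.0      # an all-zero block keeps a zero scale
    quants = [_round_half_away(v * inv) for v in values]
    return struct.pack("<e32b", d, *quants)


def dequantize_q8_0_block(block):
    """Recover approximate floats from one 34-byte q8_0 block."""
    d, *quants = struct.unpack("<e32b", block)
    return [d * q for q in quants]
```

Each block costs 34 bytes for 32 weights, or 8.5 bits per weight. The round trip is lossy: every value is off by at most half a quantization step, plus the rounding of the scale itself to fp16.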

What is on the roadmap: bit-exact HuggingFace inference parity (half-split RoPE, fixed QK-norm), and CUDA backward parity for a few specialized kernels. Until the first lands, treat the import path as a starting point for continued training, not as a way to reproduce inference. Until the second lands, the Metal path remains the most exercised one for those kernels.
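To see why the RoPE layout matters for parity, here is a minimal sketch of the two common conventions. The half-split form pairs element i with element i + d/2, which is what HuggingFace's rotate_half does; the interleaved form pairs adjacent elements. The function names are hypothetical, and this post does not say which layout Smedjan uses internally, so read this as an illustration of the mismatch only.

```python
import math


def rope_angles(pos, head_dim, base=10000.0):
    # One rotation angle per pair of dimensions.
    return [pos * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def rope_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1])."""
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_half_split(x, pos, base=10000.0):
    """Rotate pairs (x[i], x[i + d/2]), the rotate_half layout."""
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out
```

Both are valid rotary embeddings, and they agree once the head dimensions are permuted (even indices first, then odd). Applied to the same unpermuted weights, though, they rotate different pairs, so imported weights can load cleanly and train well while still producing different logits from the reference implementation.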

I would rather you find this list here than in a failed run. The full, always-current version lives on the troubleshooting & FAQ page. If an honest list of gaps excites you rather than worries you, the repository is open.

— Andrei

Watch the repository for releases, or come help forge it.