Overview
- Reporting describes GLM-4.6 as a 355B-parameter Mixture-of-Experts model with open weights on Hugging Face that can run locally, with guidance for multi-GPU or quantized setups using tools like vLLM (a serving sketch follows this list).
- Comparative results place GLM-4.6 near Claude Sonnet 4 on some coding tasks but behind Claude Sonnet 4.5 on advanced benchmarks such as SWE-bench.
- Practical evaluations in coding environments like Claude Code cite head-to-head wins across a 74-scenario test set and note roughly 30% fewer tokens used for comparable work.
- Articles cite per-token pricing of about $0.60 per million input tokens and $2.20 per million output tokens, with examples claiming lower monthly costs than Claude; these figures are reported, not independently verified (a worked cost example also follows this list).
- Dataconomy notes availability in popular coding tools and inclusion in a GLM Coding Plan starting at $3 per month, with technical specs listing text-only I/O and a maximum output of 128,000 tokens.
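
For the local-serving claim above, here is a minimal sketch using vLLM's offline inference API. The Hugging Face repo ID, GPU count, and sampling settings are illustrative assumptions, not values confirmed by the articles; a 355B-parameter MoE generally needs several data-center GPUs, or a quantized checkpoint to shrink the footprint.

```python
from vllm import LLM, SamplingParams

# Assumed Hugging Face repo ID -- check the actual model card before use.
llm = LLM(
    model="zai-org/GLM-4.6",
    tensor_parallel_size=8,   # shard weights across 8 GPUs (hypothetical setup)
    # quantization="awq",     # optional: point at a quantized checkpoint instead
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Write a Python function that reverses a string."], params)
print(outputs[0].outputs[0].text)
```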
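And for the pricing bullets, a small worked example of the reported rates. The monthly token volumes are hypothetical, and the roughly 30% token savings is applied as reported, purely to show how the arithmetic plays out.

```python
INPUT_PRICE = 0.60 / 1_000_000   # USD per input token (reported rate)
OUTPUT_PRICE = 2.20 / 1_000_000  # USD per output token (reported rate)

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend in USD for a given token volume."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical workload: 50M input + 10M output tokens per month.
baseline = monthly_cost(50_000_000, 10_000_000)           # $52.00
# The reported ~30% token savings for comparable work scales both sides.
leaner = monthly_cost(35_000_000, 7_000_000)              # $36.40
print(f"baseline: ${baseline:.2f}, with ~30% fewer tokens: ${leaner:.2f}")
```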