Overview
- Reporting describes GLM-4.6 as a 355B-parameter Mixture-of-Experts model with open weights on Hugging Face that can run locally, with guidance for multi-GPU or quantized setups using tools like vLLM (a serving sketch follows this list).
- Comparative results place GLM-4.6 near Claude Sonnet 4 on some coding tasks but behind Claude Sonnet 4.5 on advanced benchmarks such as SWE-bench.
- Practical evaluations in coding environments like Claude Code cite head-to-head wins across a 74-scenario test set and note roughly 30% fewer tokens used for comparable work.
- Articles cite per-token pricing of about $0.60 per million input tokens and $2.20 per million output tokens, with examples claiming lower monthly costs than Claude; these figures are reported, not independently verified (a worked cost example also follows this list).
- Dataconomy notes availability in popular coding tools and inclusion in a GLM Coding Plan starting at $3 per month, with technical specs listing text-only I/O and a maximum output of 128,000 tokens.
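
For the local-serving claim above, here is a minimal sketch using vLLM's offline inference API. The Hugging Face repo ID, GPU count, and sampling settings are illustrative assumptions, not values confirmed by the articles; a 355B-parameter MoE generally needs several data-center GPUs, or a quantized checkpoint to shrink the footprint.

```python
from vllm import LLM, SamplingParams

# Assumed Hugging Face repo ID -- check the actual model card before use.
llm = LLM(
    model="zai-org/GLM-4.6",
    tensor_parallel_size=8,   # shard weights across 8 GPUs (hypothetical setup)
    # quantization="awq",     # optional: point at a quantized checkpoint instead
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Write a Python function that reverses a string."], params)
print(outputs[0].outputs[0].text)
```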
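And for the pricing bullets, a small worked example of the reported rates. The monthly token volumes are hypothetical, and the roughly 30% token savings is applied as reported, purely to show how the arithmetic plays out.

```python
INPUT_PRICE = 0.60 / 1_000_000   # USD per input token (reported rate)
OUTPUT_PRICE = 2.20 / 1_000_000  # USD per output token (reported rate)

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend in USD for a given token volume."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical workload: 50M input + 10M output tokens per month.
baseline = monthly_cost(50_000_000, 10_000_000)           # $52.00
# The reported ~30% token savings for comparable work scales both sides.
leaner = monthly_cost(35_000_000, 7_000_000)              # $36.40
print(f"baseline: ${baseline:.2f}, with ~30% fewer tokens: ${leaner:.2f}")
```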