Mixture-of-Experts LLM Applications
Researchers validated a metric for predicting the compute efficiency of sparse models, and developed Hessian-aware low-bit inference with expert offloading that reduces on-device memory use by roughly 60%
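The summary does not specify the offloading scheme, but the general idea behind expert offloading is to keep only a small working set of experts resident in fast memory and reload the rest on demand. The sketch below is a hypothetical, minimal illustration (toy scalar "experts", an LRU residency cache, and a top-k gate), not the researchers' actual method.

```python
from collections import OrderedDict

class ExpertCache:
    """Toy LRU cache: keep only `capacity` experts resident in fast memory;
    the rest are 'offloaded' and reloaded on demand (hypothetical sketch)."""
    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # callable: expert_id -> weights
        self.cache = OrderedDict()    # expert_id -> resident weights
        self.loads = 0                # number of on-demand reloads

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)   # mark as recently used
        else:
            self.loads += 1
            self.cache[expert_id] = self.loader(expert_id)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[expert_id]

def moe_forward(x, gate_scores, cache, top_k=2):
    """Route input x to the top_k experts by gate score; each 'expert' is
    just a scalar weight here to keep the example self-contained."""
    top = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:top_k]
    total = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / total * cache.get(i) * x for i in top)

# 8 experts on slow storage, only 2 resident at a time (~75% of expert
# memory stays offloaded); only the top-k experts are ever loaded per token.
weights = {i: 0.5 + 0.1 * i for i in range(8)}
cache = ExpertCache(capacity=2, loader=lambda i: weights[i])
y = moe_forward(1.0, [0.1, 0.4, 0.05, 0.3, 0.02, 0.05, 0.05, 0.03], cache)
```

Because only the gated top-k experts are touched per token, memory savings scale with the ratio of resident to total experts, at the cost of reload latency on cache misses.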