New Algorithm Compresses Large Language Models for Local Device Use

Engineers from Princeton and Stanford develop a technique to run LLMs on phones and laptops, enhancing privacy and efficiency.

  • The CALDERA algorithm trims redundancy and reduces numerical precision in an LLM's weights, shrinking the model enough to store on a local device.
  • By compressing the model itself, the technique lets LLMs run on consumer-grade GPUs, lowering costs and energy use.
  • The method combines low-precision quantization with low-rank approximation, compressing more effectively than either approach alone (a toy sketch of the idea follows this list).
  • Tests with Meta AI's Llama models showed up to a 5% performance improvement on certain tasks.
  • Local processing of LLMs enhances privacy by eliminating the need to send data to centralized servers.
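To make the low-precision-plus-low-rank idea concrete, here is a minimal NumPy sketch: quantize the weight matrix coarsely, then fit a low-rank, low-precision correction to the quantization residual. This is an illustrative toy, not CALDERA's actual algorithm; the function names and bit-widths are hypothetical, and the real method refines the decomposition iteratively rather than in a single pass.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization to a given bit-width (illustrative only)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

def low_rank_low_precision(W, rank=16, q_bits=2, factor_bits=4):
    """Toy decomposition W ~= Q + L @ R.

    Q is a coarse low-precision backbone; a low-rank factorization of the
    residual (W - Q), itself stored at low precision, recovers much of the
    accuracy lost to quantization. Hypothetical one-shot sketch, not CALDERA.
    """
    Q = quantize(W, q_bits)                             # low-precision backbone
    U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
    L = quantize(U[:, :rank] * S[:rank], factor_bits)   # left factor, low precision
    R = quantize(Vt[:rank, :], factor_bits)             # right factor, low precision
    return Q, L, R

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))                     # stand-in for a weight matrix
Q, L, R = low_rank_low_precision(W)
err = np.linalg.norm(W - (Q + L @ R)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

Even in this one-shot form, the low-rank correction absorbs the largest components of the quantization error, which is the intuition behind combining the two techniques.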