Apple's M5 Silicon Sees Significant Improvement Over M4 in Local Large Language Model Execution
In a recent post on its Machine Learning Research blog, Apple showcased how much faster its latest M5 silicon runs local large language models (LLMs) than its predecessor. The comparison between the M5 and the M4 highlights the progress Apple has made in optimizing its chip architecture for machine learning workloads.
Context
For those unfamiliar with Apple's M-series chips: they are system-on-a-chip (SoC) designs that integrate the CPU, GPU, and Neural Engine onto a single die, built to deliver efficient performance across a range of applications, including artificial intelligence and machine learning.
The M4 chip, announced in 2024, marked an important milestone in Apple's efforts to bring AI capabilities to its devices. But as with any significant technological advancement, there was room for improvement.
M5 Chip Architecture
The M5 chip represents a substantial evolution over its predecessor, the M4, with several key improvements:
- Faster Neural Engine: The M5's 16-core Neural Engine is quicker than the M4's, allowing it to process complex machine learning tasks with greater efficiency.
- GPU Neural Accelerators: Each of the M5's GPU cores includes a dedicated Neural Accelerator, which speeds up the matrix operations at the heart of many machine learning algorithms (a minimal illustration follows this list).
- Improved Memory Bandwidth: The M5 offers 153 GB/s of unified memory bandwidth, nearly 30% more than the M4's 120 GB/s, enabling faster data transfer between the compute units and system memory.
These upgrades collectively enable the M5 to tackle more demanding AI workloads with greater ease.
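To make the matrix-operation point concrete, here is a minimal sketch using Apple's open-source MLX framework, which targets Apple silicon's unified memory and GPU. The framework choice is our illustration; the post summarized here does not include code.

```python
# Illustrative only: a large matrix multiplication in MLX, the kind of
# operation that dedicated matrix hardware on the GPU accelerates.
import mlx.core as mx

a = mx.random.normal((2048, 2048))
b = mx.random.normal((2048, 2048))

c = a @ b    # queued for the GPU; MLX evaluates arrays lazily
mx.eval(c)   # force the computation to actually run
print(c.shape)
```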
Local Large Language Model Execution
The article focuses on executing local LLMs on the M5 and M4 chips. Local LLMs are machine learning models that run entirely on-device, without relying on cloud-based services. This approach is particularly useful for applications like Siri, Apple's virtual assistant.
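To ground this, here is a minimal sketch of what running a local LLM on Apple silicon can look like, using the open-source mlx-lm package. Both the package and the model name are assumptions for illustration; Apple's post is about chip performance, not this specific tooling.

```python
# Minimal sketch: running a local LLM on Apple silicon with mlx-lm.
# Assumes `pip install mlx-lm`; the model below is a hypothetical
# example of a quantized community model, not one named in the post.
from mlx_lm import load, generate

# Downloads on first run, then loads the weights into unified memory.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Inference happens entirely on-device; no network call is made here.
response = generate(
    model,
    tokenizer,
    prompt="Explain unified memory in one sentence.",
    max_tokens=100,
)
print(response)
```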
For the purposes of this comparison, assume both the M5 and the M4 are running identical local LLMs; the challenge lies in measuring their performance differences.
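One plausible way to quantify the difference is a wall-clock throughput test run identically on each machine. A hedged sketch, under the same mlx-lm and model assumptions as above:

```python
# Hypothetical micro-benchmark: generation throughput in tokens/second.
# Run the same script on an M4 and an M5 machine and compare the output.
import time
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
prompt = "Summarize the benefits of on-device inference."

start = time.perf_counter()
output = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(output))
print(f"{n_tokens} tokens in {elapsed:.2f} s -> {n_tokens / elapsed:.1f} tok/s")
```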
Benchmarking Results
Apple presented a series of benchmarking results illustrating the improved performance of the M5 over the M4:
- Training Time: The M5 chip takes approximately 50% less time to train a large language model compared to the M4.
- Inference Time: For inference tasks, such as generating text or answering questions, the M5 outperforms the M4 by around 30%.
- Memory Usage: Though not directly comparable across workloads, the post suggests that the M5's improved memory bandwidth and Neural Engine efficiency reduce memory pressure when running local LLMs (a sketch for checking this follows).
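For readers who want to sanity-check memory behavior on their own hardware, the following sketch reports peak unified-memory use after a generation run. It assumes a recent MLX release that exposes mx.get_peak_memory(); older versions placed this under mx.metal, so the call may need adjusting.

```python
# Hedged sketch: peak unified-memory use after one generation pass.
# Assumes a recent MLX version with top-level mx.get_peak_memory().
import mlx.core as mx
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
generate(model, tokenizer, prompt="Hello", max_tokens=64)

peak_gb = mx.get_peak_memory() / 1e9
print(f"Peak unified memory: {peak_gb:.2f} GB")
```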
Conclusion
Apple's Machine Learning Research blog post demonstrates significant progress in optimizing the M5 chip architecture for efficient execution of local large language models. The results show substantial improvements over the M4, highlighting Apple's continued commitment to advancing on-device AI.
As the field of machine learning continues to evolve, it is clear that advancements in chip design and optimization will play an increasingly important role in enabling faster and more efficient AI workloads. Apple's efforts in this area position the company well for future innovation and leadership in the tech industry.