LLM Inference Optimization

Micron Sets New Benchmark With the World's First High-Capacity 256GB LPDRAM SOCAMM2 for Data Center Infrastructure

News highlights: 1/3 the power consumption and 1/3 smaller footprint versus standard RDIMMs — enabled by the industry's first monolithic 32Gb ...

VentureBeat

New LLM optimization technique slashes memory costs up to 75%

Researchers at the Tokyo-based startup Sakana AI have developed a new technique that enables language models to use memory more efficiently, helping enterprises cut the costs of building applications ...

TMCnet

Inception Launches Mercury 2, the Fastest Reasoning LLM - 5x Faster Than Leading Speed-Optimized LLMs, with Dramatically Lower Inference Cost

Inception, the company behind the first commercial diffusion large language models (dLLMs), today announced the launch of Mercury 2, the fastest reasoning LLM and first reasoning dLLM. Mercury 2 ...

ElastixAI Emerges From Stealth to Redefine Generative AI Economics via FPGA-Based Supercomputers

ASC24 Finals Set for April in Shanghai: Focus on Cutting-Edge Large Language Model Inference and Seepage Simulation!

Yehey.com - AI Inference Market Forecast to Reach $255B by 2030

HW-SW Co-Designed System With 3 Core Optimization Pathways For Long-Context Agentic LLM Inference (Cambridge, ICL)

I served a 200 billion parameter LLM from a Lenovo workstation the size of a Mac Mini

Taalas Launches Hardcore Chip With ‘Insane’ AI Inference Performance

Detailed Study of Performance Modeling For LLM Implementations At Scale (imec)