#llm

Articles tagged with #llm

Implementing TurboQuant in llama.cpp: CUDA Scars and What Actually Ships
3 weeks of porting TurboQuant to CUDA, 5 scars, and what actually ships for document processing on T4 GPUs
Apr 6, 2026 · 11 min read