Optimize Your AI Workload with Multi-Model CPU Clusters
Deployed a multi-model CPU inference cluster on 2× HP ProLiant DL360 Gen9 servers (376GB RAM total). Running 5 concurrent LLM endpoints, including Qwen3-Coder-30B, Qwen3-Next-80B, GLM-4.7-Flash, and Granite-4.0-Tiny, all quantized to GGUF Q4_K_M. Optimized with ik_llama + MKL for a 62% throughput gain. Served through an Ollama-compatible API, with an Open WebUI frontend and Opik for observability.
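A client call against a cluster like this could be sketched as follows. The hostnames, ports, and model tags below are illustrative assumptions (they are not taken from the actual deployment); the request shape follows Ollama's standard `/api/generate` endpoint, which an Ollama-compatible server exposes.

```python
import json

# Hypothetical endpoint map: each quantized model is served on its own port
# across the two ProLiant nodes. Addresses and ports are placeholders.
ENDPOINTS = {
    "qwen3-coder-30b":  "http://10.0.0.1:11434",
    "qwen3-next-80b":   "http://10.0.0.1:11435",
    "glm-4.7-flash":    "http://10.0.0.2:11434",
    "granite-4.0-tiny": "http://10.0.0.2:11435",
}

def build_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Return (url, body) for an Ollama-style /api/generate call."""
    base = ENDPOINTS[model]
    payload = {"model": model, "prompt": prompt, "stream": False}
    return f"{base}/api/generate", json.dumps(payload).encode("utf-8")

# Build a request for the smallest model in the map.
url, body = build_request("granite-4.0-tiny", "Summarize this log file.")
```

The returned `url` and `body` can then be sent with any HTTP client (e.g. `urllib.request.urlopen`); keeping endpoint selection in one map makes it easy to move models between nodes without touching client code.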