๐Ÿง  ๐—ง๐—ต๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐—ผ๐—ณ ๐—ฟ๐˜‚๐—ป๐—ป๐—ถ๐—ป๐—ด ๐—ฎ๐—ป ๐—Ÿ๐—Ÿ๐—  ๐—ผ๐—ป ๐—ž๐˜‚๐—ฏ๐—ฒ๐—ฟ๐—ป๐—ฒ๐˜๐—ฒ๐˜€? Hereโ€™s a rough out...๐Ÿง  ๐—ง๐—ต๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐—ผ๐—ณ ๐—ฟ๐˜‚๐—ป๐—ป๐—ถ๐—ป๐—ด ๐—ฎ๐—ป ๐—Ÿ๐—Ÿ๐—  ๐—ผ๐—ป ๐—ž๐˜‚๐—ฏ๐—ฒ๐—ฟ๐—ป๐—ฒ๐˜๐—ฒ๐˜€? Hereโ€™s a rough out...
๐Ÿง  ๐—ง๐—ต๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐—ผ๐—ณ ๐—ฟ๐˜‚๐—ป๐—ป๐—ถ๐—ป๐—ด ๐—ฎ๐—ป ๐—Ÿ๐—Ÿ๐—  ๐—ผ๐—ป ๐—ž๐˜‚๐—ฏ๐—ฒ๐—ฟ๐—ป๐—ฒ๐˜๐—ฒ๐˜€?
Hereโ€™s a rough outline of what it takes
๐—–๐—ผ๐—ป๐˜๐—ฎ๐—ถ๐—ป๐—ฒ๐—ฟ๐—ถ๐˜‡๐—ฒ ๐˜๐—ต๐—ฒ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น โ€“ Start with something like a quantized Llama2, Mistral, or a custom fine-tuned model.
Use a lightweight serving framework (like text-generation-inference, vLLM, or TGI) and wrap it in a Docker container.
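As a sketch, a minimal Deployment for that container might look like this (the vllm/vllm-openai image is vLLM's published serving image; the Deployment name and model are illustrative assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-server                       # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-server
  template:
    metadata:
      labels:
        app: llm-server
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest # vLLM's OpenAI-compatible server
          args: ["--model", "mistralai/Mistral-7B-Instruct-v0.2"]
          ports:
            - containerPort: 8000        # default vLLM API port
          resources:
            limits:
              nvidia.com/gpu: 1          # requires the NVIDIA device plugin
```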
๐—š๐—ฃ๐—จ ๐˜€๐—ฐ๐—ต๐—ฒ๐—ฑ๐˜‚๐—น๐—ถ๐—ป๐—ด โ€“ Use node selectors or taints/tolerations to schedule pods on GPU-enabled nodes
๐—”๐˜‚๐˜๐—ผ๐˜€๐—ฐ๐—ฎ๐—น๐—น๐—ถ๐—ป๐—ด โ€“ Use KEDA or HPA to scale pods based on requests per second or GPU utilization. LLM workloads are spiky, so dynamic scaling saves $$.
API Gateway / Load Balancer – Expose your model via a gateway (like Istio, NGINX, or even a managed API Gateway in hybrid setups).
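For the gateway step, a plain NGINX Ingress is often enough to start (hostname and Service name are illustrative; the timeout annotation helps because LLM responses can stream for a long time):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"  # long-running generations
spec:
  ingressClassName: nginx
  rules:
    - host: llm.example.com          # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-server     # Service in front of the Deployment
                port:
                  number: 8000
```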
#LLM #Kubernetes #DevOps #MLOps #CloudNative #K8s #OpenSource