Edge deployments bring machine learning models closer to where data is generated, such as IoT devices or local servers. By running inference directly on sensors, cameras, or smartphones, we achieve very low latency and reduce reliance on centralized infrastructure. Edge computing is particularly valuable in scenarios like autonomous vehicles, real-time analytics, and industrial automation, where speed and reliability are critical. The trade-off is that edge devices impose tight constraints on storage, compute power, and model size, so models typically must be compressed or otherwise adapted before deployment.
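One common way to fit a model within these constraints is post-training quantization, which stores weights as 8-bit integers instead of 32-bit floats. The sketch below (illustrative only, not tied to any specific framework; the random tensor stands in for a trained layer's weights) shows symmetric int8 quantization and the resulting 4x storage reduction:

```python
import numpy as np

# Stand-in for a trained layer's float32 weight tensor.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

# Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.max(np.abs(weights)) / 127.0
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to estimate the accuracy cost of the compression.
dequantized = q_weights.astype(np.float32) * scale
max_err = np.max(np.abs(weights - dequantized))

print(f"storage: {weights.nbytes} B -> {q_weights.nbytes} B (4x smaller)")
print(f"max reconstruction error: {max_err:.5f} (bounded by scale/2 = {scale / 2:.5f})")
```

In practice, edge toolchains apply the same idea per-layer or per-channel and calibrate activation ranges on sample data, trading a small accuracy loss for a model that fits the device's memory and runs on integer hardware.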