Developed an AI threat detection system designed to identify and mitigate risks in Large Language Model (LLM) environments. The platform analyzes user inputs in real time to detect prompt injection, jailbreak attempts, and malicious patterns using rule-based and heuristic techniques. It integrates automated testing workflows to simulate adversarial scenarios and continuously evaluate model behavior. The system includes logging and monitoring capabilities to track suspicious activity and measure performance metrics such as detection accuracy and latency. This solution demonstrates practical implementation of AI security principles, enabling safer deployment of AI systems by proactively identifying vulnerabilities and enforcing guardrails against misuse.
1
14
Guardrail Service is a production-grade LLM input validation system built to detect prompt injection and jailbreak attacks before they reach a language model. It uses a two-tier pipeline — a sub-millisecond rule engine powered by Aho-Corasick pattern matching, followed by a quantized DeBERTa transformer exported to ONNX for semantic attack detection at ~20ms. The service runs as an internal HTTP API built with FastAPI, containerized with Docker, and monitored via Prometheus and Grafana. It includes an adversarial test suite of 100+ prompts achieving 95%+ detection accuracy with under 5% false positives, and a live dashboard showing real-time validation, pipeline flowchart, and benchmark results.