Grand Finale Finalist — Meta by Nishtha Sharma

Grand Finale Finalist — Meta

Nishtha Sharma

Grand Finale Finalist at the Meta PyTorch OpenEnv Hackathon 2026, selected out of hundreds of submissions.
A multi-agent RL environment for enterprise email triage. Agents learn to classify, prioritize, and route emails and to flag phishing through a 3-tier task curriculum with a dense reward structure and a symbolic safety layer that hard-blocks responses to phishing emails regardless of agent policy.
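To make the safety-layer idea concrete, here is a minimal sketch of how a symbolic check could override an agent's proposed action and feed into a dense reward. It is an illustration only, not the project's code: the names `Email`, `Action`, `apply_safety_layer`, `dense_reward`, the action labels, and all reward values are assumptions, and plain dataclasses stand in for Pydantic models so the snippet has no dependencies.

```python
from dataclasses import dataclass

# Hypothetical action labels; the project's real action space is not described here.
BLOCKED_ON_PHISHING = {"reply", "forward"}


@dataclass(frozen=True)
class Email:
    subject: str
    flagged_by_rules: bool  # assumed output of a symbolic (rule-based) phishing check


@dataclass(frozen=True)
class Action:
    kind: str      # e.g. "reply", "forward", "flag_phishing"
    priority: int  # 0 (low) to 2 (high)


def apply_safety_layer(email: Email, proposed: Action) -> tuple[Action, bool]:
    """Return the action actually executed and whether the proposal was overridden."""
    if email.flagged_by_rules and proposed.kind in BLOCKED_ON_PHISHING:
        # Hard block: the policy's choice is replaced no matter what it preferred.
        return Action(kind="flag_phishing", priority=proposed.priority), True
    return proposed, False


def dense_reward(email: Email, proposed: Action, true_priority: int) -> float:
    """Illustrative dense reward: partial credit per sub-task, penalty when blocked."""
    executed, blocked = apply_safety_layer(email, proposed)
    reward = 0.0
    if blocked:
        reward -= 1.0   # the agent tried to respond to a phishing email
    elif email.flagged_by_rules and executed.kind == "flag_phishing":
        reward += 1.0   # the agent flagged phishing on its own
    elif not email.flagged_by_rules and executed.kind != "flag_phishing":
        reward += 0.5   # a legitimate email was handled normally
    if executed.priority == true_priority:
        reward += 0.25  # prioritization is scored independently
    return reward
```

In this sketch the block happens outside the policy, so an unsafe action is never executed, while the penalty still gives the agent a learning signal to stop proposing it.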
Training used GRPO (Group Relative Policy Optimization). The environment ships as a full OpenEnv-spec RL gym — with live stats, a playable UI, and an API endpoint — deployed open-source on Hugging Face Spaces.
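For readers unfamiliar with GRPO, the core idea is that each sampled response is scored relative to the other responses in its group rather than against a learned value baseline. The sketch below shows only that normalization step, under assumptions: the function name `group_relative_advantages` is hypothetical, the population standard deviation is one possible choice, and nothing here reflects the project's actual training code.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward by the mean and standard deviation of its own group."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption made for this sketch
    # eps keeps the division defined when every reward in the group is identical.
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical completions for the same email, scored by the environment.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat their group's average get a positive advantage and the rest get a negative one, so the policy update favors the better triage decisions within each group.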
Stack: Python · PyTorch · GRPO · Pydantic · Docker · Hugging Face Spaces
This isn't a demo — it's a complete RL training environment with phishing-aware safety constraints built into the reward design itself.
Posted Jun 4, 2026
