Grand Finale Finalist — Meta PyTorch OpenEnv Hackathon 2026, selected out of hundreds of submissions.
A multi-agent RL environment for enterprise email triage. Agents learn to classify, prioritize, and route emails, and to flag phishing, through a three-tier task curriculum with a dense reward structure. A symbolic safety layer hard-blocks responses to phishing emails regardless of agent policy.
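As a rough illustration of a safety layer that overrides the policy, here is a minimal sketch. The action names, the `Email` type, and the keyword rule are all hypothetical; the environment's real action space and detection rules are not shown here.

```python
from dataclasses import dataclass

# Hypothetical action names, assumed for illustration only.
REPLY = "reply"
FLAG_PHISHING = "flag_phishing"


@dataclass
class Email:
    sender: str
    body: str


def looks_like_phishing(email: Email) -> bool:
    """Toy symbolic rule (an assumption, not the project's actual check)."""
    suspicious = ("verify your password", "urgent wire transfer")
    text = email.body.lower()
    return any(phrase in text for phrase in suspicious)


def safe_action(email: Email, proposed: str) -> str:
    """Replace a reply to a suspected phishing email with a flag.

    The check runs after the policy has chosen, so it applies
    no matter which action the agent proposed.
    """
    if proposed == REPLY and looks_like_phishing(email):
        return FLAG_PHISHING
    return proposed
```

Placing the rule outside the learned policy means a poorly trained agent still cannot reply to a flagged message; the trade-off is that the rule's false positives also cannot be overridden.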
Training used GRPO (Group Relative Policy Optimization). The environment ships as a full OpenEnv-spec RL gym — with live stats, a playable UI, and an API endpoint — deployed open-source on Hugging Face Spaces.
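The core idea of GRPO is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a learned value baseline. The sketch below shows only that advantage normalization step. The function name, the use of the population standard deviation, and the `eps` value are assumptions, not details of this project's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is an assumed small constant that avoids division by zero
    when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical triage attempts on the same email, scored by the environment.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that beat the group average get a positive advantage and those below it get a negative one, so a dense reward gives the optimizer a usable signal even when no attempt is fully correct.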