This project implements a multi-agent reinforcement learning system in a transport world, where three agents navigate a grid environment to reach a target location while avoiding obstacles. The agents are trained with Q-learning, using epsilon-greedy exploration to balance trying new actions against exploiting the best-known ones, so that each agent learns a policy that maximizes its cumulative reward. Four experiments evaluate how the agents perform under different learning rates, exploration strategies, and reward structures.
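As an illustration only, the sketch below shows tabular Q-learning with epsilon-greedy action selection on a small grid. It is not the project's code. The grid size, obstacle cells, target cell, start positions, reward values (+10 at the target, -0.1 per step, -1 for a blocked move) and hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical. It also assumes each agent keeps its own independent Q-table and that agents do not interact. Each step applies the standard one-step update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).

```python
import random

# Hypothetical 5x5 world; obstacles, target and rewards are illustrative assumptions.
ROWS, COLS = 5, 5
OBSTACLES = {(1, 1), (2, 3), (3, 1)}
TARGET = (4, 4)
ACTIONS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in OBSTACLES:
        return state, -1.0, False  # blocked: stay in place with a penalty
    if (r, c) == TARGET:
        return (r, c), 10.0, True  # reached the target: episode ends
    return (r, c), -0.1, False  # ordinary move: small step cost


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else a best-valued one."""
    if rng.random() < epsilon:
        return rng.choice(list(ACTIONS))
    values = {a: q.get((state, a), 0.0) for a in ACTIONS}
    best = max(values.values())
    return rng.choice([a for a, v in values.items() if v == best])  # random tie-break


def q_update(q, state, action, reward, next_state, done, alpha, gamma):
    """One-step Q-learning update; terminal states contribute no future value."""
    old = q.get((state, action), 0.0)
    future = 0.0 if done else max(q.get((next_state, a), 0.0) for a in ACTIONS)
    q[(state, action)] = old + alpha * (reward + gamma * future - old)


def train(starts, episodes=500, alpha=0.3, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Train one independent Q-table per agent (agents do not interact in this sketch)."""
    rng = random.Random(seed)
    tables = [dict() for _ in starts]
    for _ in range(episodes):
        for q, start in zip(tables, starts):
            state = start
            for _ in range(max_steps):
                action = choose_action(q, state, epsilon, rng)
                nxt, reward, done = step(state, action)
                q_update(q, state, action, reward, nxt, done, alpha, gamma)
                state = nxt
                if done:
                    break
    return tables


def greedy_path(q, start, max_steps=50):
    """Follow the learned policy with no exploration (epsilon = 0) from start."""
    path, state = [start], start
    for _ in range(max_steps):
        values = {a: q.get((state, a), 0.0) for a in ACTIONS}
        action = max(values, key=values.get)
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    starts = [(0, 0), (0, 4), (4, 0)]  # hypothetical start cells for three agents
    for i, (q, s) in enumerate(zip(train(starts), starts)):
        print(f"agent {i}: {greedy_path(q, s)}")
```

Varying `alpha`, `epsilon`, or the reward values returned by `step` in a sketch like this corresponds loosely to the kinds of settings the experiments compare. The project's actual world, rewards, and coordination between agents may differ.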