Games have long been a part of childhood learning. When playing, we learn through trial and error how to get better at something. In hide-and-seek, a toddler will often “hide” somewhere obvious when they first play, but as they learn, they pick better hiding places. With the help of a U.S. National Science Foundation ACCESS allocation on NCSA’s DeltaAI, researchers from the University of Pennsylvania (UPenn) took this basic concept, learning through “play,” and used it to train tiny, palm-sized autonomous flying robots to race to a finish line against each other.
Incentives for Efficient AI Training

Antonio Loquercio, assistant professor of electrical and systems engineering at UPenn, and his team recently had their paper accepted for presentation at the prestigious IEEE International Conference on Robotics & Automation (ICRA). Loquercio and Vineet Pasumarti, his student and co-author on the paper, explained how their research uses the mechanics of play to help train AI.
“We study a branch of machine learning called reinforcement learning,” said Loquercio. “In particular, we are interested in leveraging multi-agent reinforcement learning algorithms that allow for competitive and cooperative tactics to emerge. In this project, two autonomous racing drones compete with each other to accumulate rewards during training, which takes place in a high-fidelity simulation software, and these rewards facilitate the learning of behaviors that lead to good outcomes in the real world, i.e., the winning of races.”
Instead of giving the AI agents step-by-step instructions on how to complete the course and win, Loquercio and his team let them compete against each other to learn the best strategy. However, there was a twist in how the race was set up. Instead of rewarding the AI agents when they stayed on a certain path or followed a specific racing line at a specific speed, Loquercio’s team rewarded the AI when the drone it was piloting passed a gate or crossed the finish line before the opponent.
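To make the reward setup concrete, here is a minimal Python sketch of an outcome-based reward of the kind described above. It is not the team's implementation: the `RaceState` structure, the `competitive_reward` function, and the reward values are all hypothetical, and the sketch assumes that a gate only pays out when the drone reaches it ahead of its opponent.

```python
from dataclasses import dataclass


@dataclass
class RaceState:
    """Race progress of both drones at one simulation step (hypothetical structure)."""
    ego_gates_passed: int        # gates cleared so far by the drone being trained
    opp_gates_passed: int        # gates cleared so far by the opponent
    ego_finished: bool = False   # True once the trained drone crosses the finish line
    opp_finished: bool = False   # True once the opponent crosses the finish line


GATE_REWARD = 1.0   # assumed value, not taken from the paper
WIN_REWARD = 10.0   # assumed value, not taken from the paper


def competitive_reward(prev: RaceState, curr: RaceState) -> float:
    """Reward outcomes only: gates reached first and finishing first.

    Nothing here rewards a particular path, racing line, or speed.
    """
    reward = 0.0
    # Pay out for each gate newly passed this step, but only if the
    # opponent has not reached that gate yet (ties earn nothing).
    for gate in range(prev.ego_gates_passed + 1, curr.ego_gates_passed + 1):
        if gate > curr.opp_gates_passed:
            reward += GATE_REWARD
    # Pay out once for crossing the finish line before the opponent.
    if curr.ego_finished and not prev.ego_finished and not curr.opp_finished:
        reward += WIN_REWARD
    return reward
```

Because the function only compares race progress, the agent is free to discover its own route through the course; the opponent's progress is the only thing that changes what a gate is worth.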

Something fascinating began to happen when the AI-piloted drones were left to their own devices to plot the optimal path through the race course – they developed “human-like” strategies to win, including maneuvers familiar to anyone who has seen professionals race cars in real life.
“These maneuvers include smooth overtaking and defensive blocking to push the opponent towards subpar racelines,” said Loquercio. “The drones learned to overtake, block, and modulate risk. When approaching a slower opponent, the drone often passes the opponent without collision. Additionally, the drones learned to block/defend a raceline.”
“This is a very compute-heavy project. Without ACCESS, it would have taken much longer to get these results.”
– Antonio Loquercio, University of Pennsylvania
While all this work was simulated using high-performance computing resources like DeltaAI, the goal is often to carry what was learned in simulation over to the real world. In this case, Loquercio’s team found their methodology offered another surprising outcome. “Even more interestingly, we also found that our method allows for drone racing policies to transfer from simulation to reality (broadly referred to as the ‘sim-to-real gap’) far more robustly than existing methods, and allows for successful obstacle avoidance to occur.”
When Training Through Play Makes Sense
Watching these micro-flyers race isn’t the end goal of this research. The aim is to help AI learn to navigate worlds where other independent agents are involved. Winning races is simply the objective used in this particular study; competitive training could also lead to robots that protect, rather than just compete.
Take, for example, the global decline of the honeybee. In nature, bees face constant threats from invasive predators like the “murder wasp.” While a standard micro drone might struggle to keep up with the erratic movements of a wasp, a micro-flyer trained through this play-based competition could be different.
By learning the complex “game” of intercepting an opponent, these palm-sized protectors could be trained to identify a wasp’s approach and gently steer it away from a hive. Because the AI has practiced against thousands of different simulated opponents, it wouldn’t need a human to tell it how to react; it would have already discovered the most efficient way to guard the hive through millions of rounds of practice in the safety of a supercomputer.

“The biggest reward from this direction is the potential for robots to successfully handle complex competitor dynamics,” said Loquercio.
From Simulation to Real World
Simulations on machines like DeltaAI help researchers get answers quickly. In this case, letting two AI agents race and learn from each other by trial and error means many failures along the way, and each simulated race is completed in a fraction of the time it would take to set that race up in the real world.
“Leveraging high-performance computing (HPC) is critical to the success of this work,” said Loquercio. “Our multi-agent reinforcement learning approach relies on 10240 parallel environments for agents to learn good behaviors through trial-and-error, especially when we forgo dense, behavior-prescribing rewards.”
Loquercio’s team secured time on DeltaAI through the U.S. National Science Foundation’s ACCESS program. “ACCESS resources allowed us to train drones across more than 10,000 parallel environments,” he said.
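As a rough illustration of why parallel environments matter for trial-and-error learning, the Python sketch below steps many independent toy "races" in lockstep and counts how much experience is gathered. It is a simplified stand-in, not the team's simulator: the one-dimensional track, the random speeds used in place of a learned policy, and the names `step_all` and `collect` are all assumptions made for illustration. Only the figure of 10,240 environments comes from the article.

```python
import random

NUM_ENVS = 10240       # number of parallel environments quoted in the article
TRACK_LENGTH = 100.0   # hypothetical length of a toy one-dimensional track


def step_all(positions, speeds, dt=0.1):
    """Advance every environment by one time step.

    Returns the new positions and a per-environment flag that is True
    when that environment's race has reached the end of the track.
    """
    new_positions = [min(p + s * dt, TRACK_LENGTH) for p, s in zip(positions, speeds)]
    dones = [p >= TRACK_LENGTH for p in new_positions]
    return new_positions, dones


def collect(num_envs=NUM_ENVS, num_steps=50, seed=0):
    """Run all environments in lockstep and count the transitions gathered."""
    rng = random.Random(seed)
    positions = [0.0] * num_envs
    transitions = 0
    for _ in range(num_steps):
        # Random speeds stand in for the actions a learned policy would choose.
        speeds = [rng.uniform(0.0, 20.0) for _ in range(num_envs)]
        positions, dones = step_all(positions, speeds)
        transitions += num_envs  # one new transition per environment per step
        # Finished races restart so every environment keeps producing data.
        positions = [0.0 if done else p for p, done in zip(positions, dones)]
    return transitions
```

With the defaults, `collect()` gathers 10,240 × 50 = 512,000 transitions in 50 lockstep steps, which is the basic reason a large batch of simulated races can supply the volume of trial and error that sparse, outcome-based rewards require.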
If you’d like to read more about this research, you can find the original story here: Mastering the Art of the Game. If you have a project that could benefit from HPC resources, it’s easy to get started with ACCESS here.
Resource Provider Institution(s): National Center for Supercomputing Applications (NCSA)
Resources Used: DeltaAI
Affiliations: UPenn
Funding Agency: NSF
Grant or Allocation Number(s): CIS250236
The science story featured here was enabled by the U.S. National Science Foundation’s ACCESS program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
