Learning Through Play

By Megan Johnson, NCSA

Games have long been a part of childhood learning. When playing, we learn through trial and error how to get better at something. In hide-and-seek, a toddler will often “hide” somewhere obvious when they first play, but as they learn, they pick better hiding places. With the help of a U.S. National Science Foundation ACCESS allocation on NCSA’s DeltaAI, researchers from the University of Pennsylvania (UPenn) took this basic concept, learning through “play,” and used it to train tiny, palm-sized autonomous flying robots to race to a finish line against each other.

Incentives for Efficient AI Training

Antonio Loquercio, University of Pennsylvania

Antonio Loquercio, assistant professor of electrical and systems engineering at UPenn, and his team recently had their paper accepted for presentation at the prestigious IEEE International Conference on Robotics & Automation (ICRA). He and his student and fellow author on the paper, Vineet Pasumarti, explained how their research uses the mechanics of play to help train AI.

“We study a branch of machine learning called reinforcement learning,” said Loquercio. “In particular, we are interested in leveraging multi-agent reinforcement learning algorithms that allow for competitive and cooperative tactics to emerge. In this project, two autonomous racing drones compete with each other to accumulate rewards during training, which takes place in a high-fidelity simulation software, and these rewards facilitate the learning of behaviors that lead to good outcomes in the real world, i.e., the winning of races.”

Instead of giving the AI agents step-by-step instructions on how to complete the course and win, Loquercio and his team let them compete against each other to learn the best strategy. However, there was a twist in how the race was set up. Instead of rewarding the AI agents when they stayed on a certain path or followed a specific racing line at a specific speed, Loquercio’s team rewarded the AI when the drone it was piloting passed a gate or crossed the finish line before the opponent.
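The reward scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the reward magnitudes, the `StepEvents` fields, and the reading that a gate only counts when the drone is ahead of its opponent are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical reward magnitudes; the article does not state the actual values.
GATE_REWARD = 1.0
WIN_REWARD = 10.0


@dataclass
class StepEvents:
    """What happened to one drone during a single simulation step (illustrative)."""
    gates_passed_self: int      # total gates this drone has passed so far
    gates_passed_opponent: int  # total gates the opponent has passed so far
    passed_gate_now: bool       # True if this drone passed a gate on this step
    finished_now: bool          # True if this drone crossed the finish line on this step
    opponent_finished: bool     # True if the opponent had already finished


def competitive_reward(ev: StepEvents) -> float:
    """Reward only race outcomes: no term for speed, racing line, or path tracking."""
    reward = 0.0
    # Assumption: a gate counts only if this drone reaches it before the opponent.
    if ev.passed_gate_now and ev.gates_passed_self > ev.gates_passed_opponent:
        reward += GATE_REWARD
    # Crossing the finish line counts only if the opponent has not finished yet.
    if ev.finished_now and not ev.opponent_finished:
        reward += WIN_REWARD
    return reward
```

Nothing in the function tells the drone how to fly. Any overtaking or blocking behavior has to be discovered by the learning algorithm because it leads to more of these outcome-based rewards.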

Tracks used for both training and evaluation, with 1 m × 1 m gates shown in red and obstacles in transparent blue. Green arrows indicate the gate-passing directions, and the orange curves show the trajectories followed by the drone over multiple laps in simulation. On the left, the Complex Track (CT) spans 8 m × 7 m and features six gates, including a split-S, and optionally four obstacles. On the right, the Lemniscate Track (LT) measures 5 m × 5 m, with five gates (one of which is passed twice in a single lap) and optionally two overlapping obstacles. Credit: Pasumarti et al.

Something fascinating began to happen when the AI-piloted drones were left to their own devices to plot the optimal path through the race course – they developed “human-like” strategies to win, including maneuvers familiar to anyone who has seen professionals race cars in real life.

“These maneuvers include smooth overtaking and defensive blocking to push the opponent towards subpar racelines,” said Loquercio. “The drones learned to overtake, block, and modulate risk. When approaching a slower opponent, the drone often passes the opponent without collision. Additionally, the drones learned to block/defend a raceline.”

“This is a very compute-heavy project. Without ACCESS, it would have taken much longer to get these results.”

– Antonio Loquercio, University of Pennsylvania

While all this work was simulated using high-performance computing resources like DeltaAI, moving from simulation to the real world is often the goal. In this case, Loquercio’s team found their methodology offered another surprising outcome. “Even more interestingly, we also found that our method allows for drone racing policies to transfer from simulation to reality (broadly referred to as the ‘sim-to-real gap’) far more robustly than existing methods, and allows for successful obstacle avoidance to occur.”

When Training Through Play Makes Sense

Watching these micro-flyers race isn’t the end goal of this research. The aim is to help AI learn to navigate environments that contain other independent agents, and winning a race is simply the objective used in this particular study. Competitive training could lead to robots that protect, rather than just compete.

Take, for example, the global decline of the honeybee. In nature, bees face constant threats from invasive predators like the “murder wasp.” While a standard micro drone might struggle to keep up with the erratic movements of a wasp, a micro-flyer trained through this play-based competition could be different.

By learning the complex “game” of intercepting an opponent, these palm-sized protectors could be trained to identify a wasp’s approach and gently steer it away from a hive. Because the AI has practiced against thousands of different simulated opponents, it wouldn’t need a human to tell it how to react; it would have already discovered the most efficient way to guard the hive through millions of rounds of practice in the safety of a supercomputer.

Two opponent-aware quadrotors performing head-to-head autonomous racing. The multi-agent policies are trained in simulation with a competitive, sparse, task-level reward (i.e., winning the race), without any specific behavior-based reward (e.g., flying fast), and are transferred zero-shot on the real-world drones. Credit: Pasumarti et al.

“The biggest reward from this direction is the potential for robots to successfully handle complex competitor dynamics,” said Loquercio.

From Simulation to Real World

Simulations on machines like DeltaAI help researchers get answers quickly. In this case, letting two AI agents learn from each other by trial and error in the real world would have meant many failed races along the way. A simulated race is also completed in a fraction of the time it would take to set up each race in the real world.

“Leveraging high-performance computing (HPC) is critical to the success of this work,” said Loquercio. “Our multi-agent reinforcement learning approach relies on 10240 parallel environments for agents to learn good behaviors through trial-and-error, especially when we forgo dense, behavior-prescribing rewards.”
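To give a sense of what running 10240 environments in parallel means, here is a toy sketch in plain Python. It is not the team's simulator: the one-number "environment", the `step_all` helper, and the `pass_prob` stand-in for a policy are all hypothetical. Only the environment count and the six gates of the Complex Track come from the article.

```python
import random

NUM_ENVS = 10240      # figure quoted in the article
TRACK_GATES = 6       # the Complex Track has six gates


def step_all(progress, rng, pass_prob=0.1):
    """Advance every environment by one step and return (rewards, dones).

    `progress` holds the number of gates passed in each environment.
    `pass_prob` is a made-up stand-in for a policy that sometimes passes a gate.
    """
    rewards = [0.0] * len(progress)
    dones = [False] * len(progress)
    for i in range(len(progress)):
        if rng.random() < pass_prob:
            progress[i] += 1
            rewards[i] = 1.0          # sparse: reward only when a gate is passed
        if progress[i] >= TRACK_GATES:
            dones[i] = True
            progress[i] = 0           # auto-reset so the batch stays full
    return rewards, dones


rng = random.Random(0)
progress = [0] * NUM_ENVS
total_steps = 0
total_reward = 0.0
for _ in range(50):                   # 50 lockstep steps across the whole batch
    rewards, dones = step_all(progress, rng)
    total_steps += NUM_ENVS
    total_reward += sum(rewards)

print(total_steps)  # 512000 environment steps from only 50 batch steps
```

Because rewards are sparse, most individual steps teach the agent nothing. Stepping thousands of environments at once is one way to collect enough rewarded experience per update, which is where GPU-equipped systems such as DeltaAI come in.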

Loquercio’s team secured time on DeltaAI through the U.S. National Science Foundation’s ACCESS program. “ACCESS resources allowed us to train drones across more than 10,000 parallel environments,” he said.

If you’d like to read more about this research, you can find the original story here: Mastering the Art of the Game. If you have a project that could benefit from HPC resources, it’s easy to get started with ACCESS here.


Resource Provider Institution(s): National Center for Supercomputing Applications (NCSA)
Resources Used: DeltaAI
Affiliations: UPenn
Funding Agency: NSF
Grant or Allocation Number(s): CIS250236

The science story featured here was enabled by the U.S. National Science Foundation’s ACCESS program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
