Optimization of Butterfly-Inspired Drones via Reinforcement Learning

In this work, we present a combined experimental and computational study aimed at enhancing the takeoff lift of butterfly-inspired drones. By coupling an experimental platform with a reinforcement learning algorithm, we increased the mean lift from 0.044 N to 0.861 N. We then explain the underlying aerodynamic mechanisms through flow visualization and numerical simulations. We report the investigation in the first person, reflecting our direct involvement in the design, training, and analysis of the butterfly drone system.

Butterfly-inspired drones, or simply butterfly drones, have attracted significant attention due to their unique flapping-wing kinematics, which offer high maneuverability and potential for low-noise operations. However, achieving sufficient lift during takeoff remains a critical challenge. Traditional control approaches often fail to fully exploit the complex fluid-structure interactions inherent in flapping flight. To address this, we adopted a hardware-in-the-loop reinforcement learning framework that enables the butterfly drone to autonomously optimize its wing motion in real time. Our goal is not only to boost lift but also to provide physical insights into the vortical structures responsible for lift enhancement.

Experimental Platform and Methodology

The core of our experimental setup is a flapping-wing butterfly drone with a wingspan of 81 cm. The wings are actuated by two servomotors that allow independent control of the left and right wings. The drone is mounted on a support structure equipped with force sensors that continuously record the forces generated during flapping. A motion capture system with multiple high-speed cameras tracks reflective markers placed on the wing surfaces, enabling simultaneous measurement of wing kinematics and deformation, from which the inertial forces are estimated. All data are streamed in real time to a central computer that runs the reinforcement learning algorithm.

We employed the Proximal Policy Optimization (PPO) algorithm, a widely used deep reinforcement learning method, to train the butterfly drone directly on the physical hardware. The training objective was to maximize the time-averaged lift over a flapping period. At each step, the algorithm observed the current force measurements and wing positions and then output motor commands. The reward was defined as the instantaneous lift force normalized by a baseline. Training continued until the lift performance plateaued.
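As a minimal sketch of the reward and objective described above, the following Python helpers compute a baseline-normalized reward and a period-averaged lift. The function names are hypothetical, the lift samples are made up for illustration, and the choice of the pre-training mean lift (0.044 N) as the baseline is an assumption, since the normalization constant is not specified here.

```python
from statistics import fmean


def normalized_reward(lift_n: float, baseline_n: float) -> float:
    """Instantaneous lift divided by a baseline lift (both in newtons)."""
    if baseline_n <= 0:
        raise ValueError("baseline_n must be positive")
    return lift_n / baseline_n


def period_mean_lift(lift_samples_n) -> float:
    """Time-averaged lift over one flapping period (uniform sampling assumed)."""
    return fmean(lift_samples_n)


# Assumption: baseline = pre-training mean lift of 0.044 N.
ASSUMED_BASELINE_N = 0.044

# Hypothetical lift samples over one period, not measured data.
samples_n = [0.20, 0.80, 1.50, 0.90]
rewards = [normalized_reward(x, ASSUMED_BASELINE_N) for x in samples_n]
print(period_mean_lift(samples_n), rewards)
```

Normalizing by a fixed baseline keeps the reward dimensionless, so its scale does not depend on the force units used by the sensor.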

To characterize the aerodynamic conditions, we define the Reynolds number based on the mean wing chord $c$ and flapping tip velocity $U$:

$$
Re = \frac{\rho U c}{\mu}
$$

where $\rho$ is air density (1.225 kg/m³) and $\mu$ is dynamic viscosity (1.81×10⁻⁵ Pa·s). For our drone, the typical Reynolds number is on the order of 10⁴, indicating a regime dominated by laminar to transitional flow with strong vortex dynamics.
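The order of magnitude quoted above can be checked with a short calculation. The sketch below uses the air properties given in the text and the 15 cm mean chord from Table 1; the tip velocity of 1.0 m/s is an assumed illustrative value, because $U$ is not reported here.

```python
RHO_AIR = 1.225    # air density in kg/m^3, from the text
MU_AIR = 1.81e-5   # dynamic viscosity in Pa*s, from the text


def reynolds_number(tip_speed_m_s: float, chord_m: float,
                    rho: float = RHO_AIR, mu: float = MU_AIR) -> float:
    """Re = rho * U * c / mu."""
    return rho * tip_speed_m_s * chord_m / mu


# Assumption: U = 1.0 m/s is a placeholder tip velocity, not a measured value.
re = reynolds_number(tip_speed_m_s=1.0, chord_m=0.15)
print(f"Re = {re:.3g}")
```

With these inputs the result is close to 10⁴, matching the stated regime; since Re scales linearly with $U$, a different tip velocity shifts it proportionally.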

The lift coefficient is defined as:

$$
C_L = \frac{L}{\frac{1}{2}\rho U^2 S}
$$

where $L$ is the measured lift and $S$ is the total wing area. In our experiments, the combined area of both wings is approximately 0.12 m².
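The definition above translates directly into code. This is a sketch only: the function name is hypothetical, and the tip velocity passed in the usage line is an assumed placeholder, since $U$ is not stated in the text.

```python
def lift_coefficient(lift_n: float, tip_speed_m_s: float,
                     wing_area_m2: float, rho: float = 1.225) -> float:
    """C_L = L / (0.5 * rho * U^2 * S)."""
    if tip_speed_m_s <= 0 or wing_area_m2 <= 0:
        raise ValueError("tip speed and wing area must be positive")
    dynamic_pressure = 0.5 * rho * tip_speed_m_s ** 2
    return lift_n / (dynamic_pressure * wing_area_m2)


# Assumption: U = 1.0 m/s is illustrative; L and S are the values given in the text.
cl_trained = lift_coefficient(lift_n=0.861, tip_speed_m_s=1.0, wing_area_m2=0.12)
print(cl_trained)
```

Because $U$ enters squared, the computed coefficient is very sensitive to the reference velocity, so values are only comparable when the same definition of $U$ is used.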

Table 1 summarizes the key parameters of our butterfly drone and the experimental conditions.

Table 1: Parameters of the butterfly drone and experimental conditions.
Parameter	Value
Wingspan	81 cm
Mean chord	15 cm
Wing area (both wings)	0.12 m²
Flapping frequency	2–4 Hz (adaptive)
Servo torque	5 kgf·cm (≈0.49 N·m)
Force sensor resolution	0.01 N
Motion capture sampling rate	200 Hz
Reinforcement learning algorithm	PPO
Training episodes	1500

Reinforcement Learning Results

Before training, the butterfly drone generated a mean lift of only 0.044 N when using a simple sinusoidal flapping pattern. After PPO-based hardware-in-the-loop training, the mean lift increased to 0.861 N, a 19.6-fold improvement. The trained motion exhibited an asymmetric flapping pattern: the downstroke was relatively fast and strong, while the upstroke included a deliberate deceleration and an additional wing pitch adjustment that produced a pronounced lift peak around mid-upstroke. Figure 1 shows the evolution of the average lift over training episodes, illustrating the convergence and the final performance gain.

We note that the lift peak observed during the upstroke is particularly significant. At the instant of peak lift, the instantaneous lift force reached 1.52 N, which is about 77% higher than the mean value. This phenomenon is reminiscent of wake capture mechanisms observed in insect flight, where the wing interacts with the vortical wake left by the previous stroke. To confirm this, we conducted complementary measurements of inertial forces and flow field analyses.
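The two ratios quoted above follow directly from the reported forces. A short check, with hypothetical helper names:

```python
def fold_improvement(before: float, after: float) -> float:
    """Ratio of the value after training to the value before training."""
    return after / before


def percent_above(value: float, reference: float) -> float:
    """How far `value` exceeds `reference`, in percent of `reference`."""
    return (value / reference - 1.0) * 100.0


BASELINE_MEAN_N = 0.044   # mean lift before training, from the text
TRAINED_MEAN_N = 0.861    # mean lift after training, from the text
PEAK_N = 1.52             # peak instantaneous lift, from the text

print(round(fold_improvement(BASELINE_MEAN_N, TRAINED_MEAN_N), 1))  # 19.6
print(round(percent_above(PEAK_N, TRAINED_MEAN_N)))                 # 77
```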

Inertial Force Decomposition

To separate aerodynamic lift from inertial contributions, we used the motion capture system to compute the acceleration of each wing segment and then estimated the inertial force via Newton’s second law. The net lift measured by the force sensor is the sum of the aerodynamic lift and the vertical component of the inertial forces. By comparing the measured total lift with the computed inertial force, we isolated the pure aerodynamic lift. Table 2 presents the peak values and phase angles of these components over one typical flapping cycle after training.

Table 2: Peak forces and phase angles in the trained flapping cycle.
Force Component	Peak Magnitude (N)	Phase Angle (°) relative to start of downstroke
Total measured lift	1.52	210° (mid-upstroke)
Aerodynamic lift (computed)	1.48	215°
Inertial force (vertical)	0.05	45° (early downstroke)

The inertial force is small compared to the aerodynamic lift: its peak of 0.05 N is less than 4% of the 1.52 N peak in total lift. The peak inertial force (45°, early downstroke) also does not coincide with the lift peak (210°), which indicates that the observed upstroke lift enhancement is predominantly aerodynamic in origin.
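The decomposition described in this section can be sketched as follows, assuming the wing is discretized into segments with known masses and that the motion capture data provide the vertical position of each segment at a fixed sampling interval. The function names, the lumped-mass discretization, and the sign convention (measured = aerodynamic + sum of mass times acceleration) are assumptions for illustration, not the exact processing pipeline.

```python
import numpy as np


def vertical_inertial_force(z_m, masses_kg, dt_s):
    """Sum of m_i * a_i(t) over wing segments.

    z_m: array of shape (n_frames, n_segments), vertical positions in metres.
    masses_kg: array of shape (n_segments,), assumed lumped segment masses.
    dt_s: sampling interval in seconds (0.005 s at 200 Hz).
    """
    z = np.asarray(z_m, dtype=float)
    velocity = np.gradient(z, dt_s, axis=0)          # central differences
    acceleration = np.gradient(velocity, dt_s, axis=0)
    return acceleration @ np.asarray(masses_kg, dtype=float)


def aerodynamic_lift(measured_lift_n, inertial_force_n):
    """Aerodynamic lift = measured total lift minus vertical inertial force."""
    return np.asarray(measured_lift_n, dtype=float) - np.asarray(inertial_force_n, dtype=float)


# Synthetic example: one segment moving as z = t^2, so a = 2 m/s^2.
t = np.arange(0.0, 1.0, 0.005)
z = (t ** 2)[:, None]
f_inertial = vertical_inertial_force(z, masses_kg=[0.01], dt_s=0.005)
print(f_inertial[5:10])
```

Double differentiation amplifies marker noise, so in practice the position data would typically be low-pass filtered before this step; the filter choice is outside the scope of this sketch.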

Flow Visualization and Numerical Simulation

We performed smoke-wire flow visualization experiments in a low-speed wind tunnel to reveal the large-scale vortical structures around the butterfly drone. In parallel, we conducted unsteady Reynolds-averaged Navier–Stokes (URANS) simulations using ANSYS Fluent to capture the detailed flow field. The simulation was validated against the experimental data for the baseline case before being applied to the optimized flapping kinematics.

The flow visualization showed that during the downstroke, a strong leading-edge vortex (LEV) forms on the upper surface of the wing, and a vortex ring is shed as the wing reaches the bottom of the stroke. This vortex ring continues to propagate downward and forward. During the subsequent upstroke, the wing moves upward through the residual vortex ring, re-energizing the flow over the wing and generating a transient low-pressure region. This wake capture effect is responsible for the pronounced lift peak in the upstroke.

In the numerical simulation, we quantified the circulation of the shed vortex ring:

$$
\Gamma = \oint \mathbf{v} \cdot d\mathbf{l}
$$

where the integration contour is taken around the vortex core in a cross-sectional plane. The peak circulation reached approximately 0.012 m²/s. The interaction between the wing and the vortex ring can be modeled by considering the induced velocity from the vortex ring onto the wing. The unsteady lift augmentation can be expressed using the Kutta–Joukowski theorem adapted for unsteady flows:

$$
\Delta L = \rho U \Gamma_{\mathrm{eff}} b
$$

where $\Gamma_{\mathrm{eff}}$ is the effective circulation induced by the wake capture and $b$ is the effective span length of the interacting portion of the wing. Our simulation estimates that $\Gamma_{\mathrm{eff}}$ during the upstroke peak is about 0.008 m²/s, leading to a lift increment of approximately 0.9 N, consistent with the observed peak.
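The contour integral defining $\Gamma$ can be approximated numerically by sampling the velocity on a closed path around the vortex core. The sketch below uses a circular contour and an idealized point vortex as a test field; both are assumptions chosen so that the result can be checked analytically, and they do not represent the actual post-processing of the simulation data.

```python
import numpy as np


def circulation(velocity_fn, center, radius, n_points=720):
    """Approximate Gamma = closed integral of v . dl on a counter-clockwise circle.

    velocity_fn(x, y) must return the velocity components (u, v) at the given points.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    x = center[0] + radius * np.cos(theta)
    y = center[1] + radius * np.sin(theta)
    u, v = velocity_fn(x, y)
    # Line element on the circle: dl = (-sin(theta), cos(theta)) * radius * dtheta
    dtheta = 2.0 * np.pi / n_points
    tangential = -u * np.sin(theta) + v * np.cos(theta)
    return float(np.sum(tangential * radius * dtheta))


def point_vortex(gamma, x0=0.0, y0=0.0):
    """Velocity field of an ideal 2D point vortex with circulation gamma."""
    def velocity(x, y):
        dx, dy = x - x0, y - y0
        r2 = dx ** 2 + dy ** 2
        return -gamma * dy / (2.0 * np.pi * r2), gamma * dx / (2.0 * np.pi * r2)
    return velocity


# Test field with the peak circulation quoted in the text (0.012 m^2/s).
gamma = circulation(point_vortex(0.012), center=(0.0, 0.0), radius=0.05)
print(gamma)
```

For gridded simulation output, `velocity_fn` would be replaced by an interpolator over the velocity field in the chosen cross-sectional plane, and the contour radius should be large enough to enclose the vortex core.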

To further validate, we compared the time history of the aerodynamic lift from the simulation and the experiment for the optimized motion. Table 3 lists the lift values at several key instants.

Table 3: Comparison of lift from experiment and simulation for the optimized flapping motion.
Instant	Experimental Lift (N)	Simulated Lift (N)
End of downstroke	0.78	0.74
Mid-upstroke (peak)	1.52	1.46
End of upstroke	0.23	0.21

The simulated lift is within about 9% of the measured value at all three instants (5.1%, 3.9%, and 8.7%, respectively), supporting the fidelity of our simulation and the physical explanation.
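The level of agreement in Table 3 can be quantified as a relative deviation of the simulated lift from the measured lift at each instant. A short sketch, with a hypothetical helper name:

```python
def relative_deviation_pct(experiment: float, simulation: float) -> float:
    """Absolute deviation of the simulation from the experiment, in percent."""
    return abs(simulation - experiment) / abs(experiment) * 100.0


# (instant, experimental lift in N, simulated lift in N), from Table 3
TABLE_3 = [
    ("End of downstroke", 0.78, 0.74),
    ("Mid-upstroke (peak)", 1.52, 1.46),
    ("End of upstroke", 0.23, 0.21),
]

for instant, measured, simulated in TABLE_3:
    print(f"{instant}: {relative_deviation_pct(measured, simulated):.1f}%")
```

The largest relative deviation occurs at the end of the upstroke, where the lift itself is smallest, so a similar absolute difference weighs more heavily.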

Discussion

The integration of reinforcement learning with a butterfly drone platform allowed us to discover a non-intuitive flapping pattern that substantially improves lift. The key mechanism, wake capture in which the wing interacts on the upstroke with the vortex ring shed at the end of the downstroke, is consistent with observations in insects, and our work demonstrates that this phenomenon can be systematically exploited through machine learning optimization. The butterfly drone, as a platform, is particularly well suited for such studies due to its highly deformable wings and low flapping frequency, which make fluid-structure interactions more accessible to both measurement and simulation.

Our results open new avenues for designing high-lift flapping-wing micro air vehicles. Future work will focus on scaling the butterfly drone to smaller sizes, incorporating real-time flow sensing, and extending the training to include forward flight and maneuvering. Next-generation butterfly drones could leverage similar reinforcement learning frameworks to adapt to changing aerodynamic conditions, enabling robust outdoor flight.

In summary, we have shown that butterfly drones can achieve a significant lift enhancement through reinforcement learning-driven optimization, with the physical origin rooted in wake capture. This work not only advances the state of the art in flapping-wing robotics but also provides a deeper understanding of the complex flow physics that govern insect flight.