In the pursuit of enhancing aerial agility and efficiency in micro-aerial vehicles, we turned to nature’s master fliers: butterflies. Their complex flapping patterns, characterized by large deformations and unsteady aerodynamics, offer a rich source of inspiration for designing next-generation drones. This work presents a comprehensive exploration into the intelligent optimization of a bio-inspired flying butterfly drone, specifically targeting the critical phase of takeoff where sufficient lift generation is paramount. Our approach integrates a physical experimental platform with a reinforcement learning framework, followed by detailed fluid dynamic analyses to interpret the optimized performance. The core objective is to significantly augment the takeoff lift force of our flying butterfly drone, moving beyond traditional control strategies toward an AI-driven, adaptive optimization process.
The design and construction of our flying butterfly drone form the foundation of this study. The drone mimics the morphology and kinematics of a typical butterfly, with a wingspan of 81 cm. The wings are constructed from a flexible membrane supported by a lightweight carbon fiber frame, allowing for passive deformation during flapping. Actuation is achieved through two high-torque servo motors located at the wing roots, enabling independent control of each wing’s flapping motion. The entire flying butterfly drone assembly is mounted on a rigid stand equipped with a six-axis force/torque sensor, which provides real-time, high-fidelity measurements of the aerodynamic forces and moments generated during flapping cycles. This hardware-in-the-loop setup is crucial for our optimization process. Key parameters of the flying butterfly drone are summarized in Table 1.
| Parameter | Value | Description |
|---|---|---|
| Wingspan | 0.81 m | Total tip-to-tip wingspan |
| Wing Chord (Avg.) | 0.18 m | Average wing chord length |
| Total Mass | 0.055 kg | Mass of the drone structure |
| Flapping Frequency Range | 0.5 – 5.0 Hz | Controllable flapping frequency |
| Stroke Amplitude Range | ±60° | Flapping angle about the mid-stroke position (120° peak-to-peak) |
| Servo Motor Resolution | 0.09° | Angular resolution of actuators |
| Force Sensor Range | ±10 N | Measurement range for lift force |
The flapping kinematics of the flying butterfly drone can be described by a set of Euler angles. We define the wing pitch (feathering), flap (stroke), and deviation angles. The primary flapping motion is governed by the stroke angle $\phi(t)$. The reinforcement learning agent will modulate the time history of these angles. The instantaneous lift force $L(t)$ measured by the sensor is the primary performance metric. The average lift over a cycle is calculated as:
$$ \bar{L} = \frac{1}{T} \int_{0}^{T} L(t) \, dt $$
where $T$ is the period of the flapping cycle. Our initial, hand-tuned flapping pattern for the flying butterfly drone yielded only $\bar{L}_{\text{initial}} = 0.044 \, \text{N}$, insufficient for sustained takeoff.
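As a minimal sketch of how the cycle-averaged lift above could be evaluated from sampled sensor data, the snippet below applies the trapezoidal rule to a lift time series. The function name `cycle_average_lift`, the 0.47 s period, and the synthetic sinusoidal signal are illustrative assumptions, not measured data or the processing code used in this study.

```python
import math


def cycle_average_lift(t, lift):
    """Approximate (1/T) * integral of L(t) dt with the trapezoidal rule.

    t    -- sample times in seconds, spanning exactly one flapping cycle
    lift -- lift samples in newtons, same length as t
    """
    integral = 0.0
    for k in range(len(t) - 1):
        integral += 0.5 * (lift[k] + lift[k + 1]) * (t[k + 1] - t[k])
    return integral / (t[-1] - t[0])


# Synthetic signal (assumption): 0.5 N mean plus one sinusoidal cycle.
T = 0.47  # s, illustrative flapping period
n = 470   # number of sampling intervals in the cycle
t = [T * k / n for k in range(n + 1)]
lift = [0.5 + 1.2 * math.sin(2.0 * math.pi * tk / T) for tk in t]

mean_lift = cycle_average_lift(t, lift)
print(f"cycle-averaged lift: {mean_lift:.3f} N")
```

Because the sinusoidal part integrates to zero over a full period, the result recovers the 0.5 N offset, which is a convenient sanity check before applying the same routine to real sensor logs.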
To transcend the limitations of manual tuning, we implemented a reinforcement learning (RL) algorithm to discover optimal flapping strategies. We employed the Proximal Policy Optimization (PPO) algorithm, a state-of-the-art policy gradient method known for its stability and sample efficiency. In our hardware-in-the-loop setup, the RL agent interacts directly with the physical flying butterfly drone. The state $s_t$ observed by the agent includes the real-time force sensor readings (lift and drag), the current servo motor positions (angles), and their time derivatives. The action $a_t$ output by the agent is a vector of desired changes to the servo motor control signals, effectively defining the next segment of the flapping trajectory. The reward function $r_t$ is carefully designed to maximize average lift while penalizing excessive power consumption and unstable motions:
$$ r_t = \alpha L(t) - \beta |\dot{\phi}(t)|^2 - \gamma \sigma_{L} $$
where $\alpha, \beta, \gamma$ are weighting coefficients, $L(t)$ is the instantaneous lift, $\dot{\phi}(t)$ is the flapping angular velocity, and $\sigma_{L}$ is the standard deviation of lift over a short time window, promoting smooth force generation. The PPO algorithm aims to maximize the expected cumulative reward $J(\theta)$:
$$ J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}} \left[ \sum_{t=0}^{H} \gamma_d^{t} r_t \right] $$
where $\pi_{\theta}$ is the policy parameterized by $\theta$, $\tau$ is a trajectory with horizon $H$, and $\gamma_d \in (0, 1]$ is the discount factor, distinct from the reward weight $\gamma$. The policy update in PPO involves a clipped objective function to prevent excessively large updates:
$$ L^{CLIP}(\theta) = \mathbb{E}_t \left[ \min\left( \frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)} A_t, \,\, \text{clip}\left(\frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}, 1-\epsilon, 1+\epsilon\right) A_t \right) \right] $$
where $A_t$ is the advantage estimate. Training involved millions of interactions with the physical flying butterfly drone platform, gradually exploring the high-dimensional action space to find efficient flapping patterns.
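The reward and the clipped surrogate objective described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the training code used on the platform: the function names, the default weights (`alpha`, `beta`, `sigma_weight`), and the clip range `eps=0.2` are hypothetical placeholders, and the population standard deviation is used for $\sigma_L$.

```python
import math
import statistics


def step_reward(lift, phi_dot, lift_window, alpha=1.0, beta=1e-3, sigma_weight=0.1):
    """Reward = alpha*L - beta*|phi_dot|^2 - (weight on sigma_L)*sigma_L.

    lift_window holds recent lift samples used to estimate sigma_L.
    All weights are placeholder values, not the tuned coefficients.
    """
    sigma_l = statistics.pstdev(lift_window)
    return alpha * lift - beta * phi_dot ** 2 - sigma_weight * sigma_l


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Sample mean of min(ratio*A, clip(ratio, 1-eps, 1+eps)*A)."""
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_theta / pi_theta_old
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


# Toy numbers (assumptions) to exercise both functions.
r = step_reward(lift=1.0, phi_dot=10.0, lift_window=[1.0, 1.0, 1.0])
objective = ppo_clip_objective(
    logp_new=[math.log(1.5), math.log(0.5)],
    logp_old=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(r, objective)
```

In the toy call, the first sample has a ratio of 1.5 with positive advantage, so its contribution is capped at 1.2; the second has a ratio of 0.5 with negative advantage, so the clipped term (-0.8) is the smaller one and is selected. This is the mechanism that limits how far a single update can move the policy.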
The optimization results were striking. The reinforcement learning agent successfully discovered a flapping strategy that dramatically increased the lift output of the flying butterfly drone. The average lift force $\bar{L}$ rose from the baseline of 0.044 N to an optimized value of 0.861 N, representing nearly a 20-fold improvement. More intriguingly, the time history of the lift force revealed a distinct, strong peak during the upstroke phase of the flapping cycle, a feature not prominent in the initial pattern. This peak is critical for achieving net positive lift during takeoff. A comparison of key metrics before and after optimization is presented in Table 2.
| Metric | Baseline Performance | Optimized Performance | Improvement Factor |
|---|---|---|---|
| Average Lift Force ($\bar{L}$) | 0.044 N | 0.861 N | 19.57 |
| Peak Lift Force (Upstroke) | 0.12 N | 2.35 N | 19.58 |
| Lift-to-Power Ratio | 0.18 N/W | 1.67 N/W | 9.28 |
| Takeoff Feasibility Index* | 0.15 | 1.42 | 9.47 |
*A dimensionless index comparing average lift to weight, where >1 indicates potential for takeoff.
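The improvement factors in Table 2 are simply the ratio of optimized to baseline values. The short check below recomputes them from the tabulated numbers; the dictionary keys are labels chosen here for illustration.

```python
# (baseline, optimized) pairs copied from Table 2
metrics = {
    "average_lift_N": (0.044, 0.861),
    "peak_upstroke_lift_N": (0.12, 2.35),
    "lift_to_power_N_per_W": (0.18, 1.67),
    "takeoff_feasibility_index": (0.15, 1.42),
}

# Improvement factor = optimized / baseline, rounded to two decimals
improvement = {
    name: round(optimized / baseline, 2)
    for name, (baseline, optimized) in metrics.items()
}

for name, factor in improvement.items():
    print(f"{name}: x{factor}")
```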
To dissect the origins of this performance leap, especially the upstroke lift peak, we needed to separate aerodynamic forces from inertial forces. The wings of the flying butterfly drone have mass, and their acceleration during flapping generates inertial forces that are measured by the base-mounted force sensor. We integrated a high-speed motion capture system with 12 cameras to track the precise 3D motion of multiple markers placed on the wing surfaces. This allowed us to reconstruct the full kinematic field (including wing bending, twisting, and acceleration) at each time instant during the flapping cycle. The inertial force contribution $\mathbf{F}_{inertial}(t)$ can be estimated by applying Newton’s second law to the discretized wing segments:
$$ \mathbf{F}_{inertial}(t) = \sum_{i=1}^{N} m_i \ddot{\mathbf{r}}_{i,CM}(t) $$
where $m_i$ is the mass of the i-th wing segment, $\ddot{\mathbf{r}}_{i,CM}(t)$ is the acceleration of its center of mass derived from motion capture data, and $N$ is the total number of segments. The net aerodynamic force $F_{aero}(t)$ in the lift direction is then:
$$ F_{aero}(t) = L_{measured}(t) - \mathbf{F}_{inertial}(t) \cdot \mathbf{\hat{z}} $$
where $\mathbf{\hat{z}}$ is the unit vector in the vertical (lift) direction. Our analysis, summarized over multiple cycles in Table 3, revealed that the inertial force component was negligible during the critical upstroke peak period and contributed minimally to the net average lift. This confirmed that the dramatic improvement was primarily aerodynamic in nature.
| Flapping Phase | Duration (ms) | Average Measured Lift (N) | Average Inertial Force (N) | Average Aerodynamic Force (N) | Aerodynamic Contribution (%) |
|---|---|---|---|---|---|
| Downstroke | 210 | 1.12 | 0.08 | 1.04 | 92.9% |
| Stroke Reversal (Bottom) | 40 | -0.21 | -0.15 | -0.06 | 28.6% |
| Upstroke | 190 | 1.89 | 0.05 | 1.84 | 97.4% |
| Stroke Reversal (Top) | 30 | -0.31 | -0.18 | -0.13 | 41.9% |
| Full Cycle | 470 | 0.861 | 0.002 | 0.859 | 99.8% |
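The force decomposition above can be sketched as follows, assuming the vertical positions of the segment centers of mass are available at a fixed frame interval. Accelerations are estimated with a central second difference, which is one possible choice and not necessarily the differentiation scheme used in this study. The function names, segment masses, frame rate, and synthetic trajectory are all illustrative assumptions, and the sign convention follows the equations in the text.

```python
def inertial_force_z(z_positions, masses, dt):
    """Vertical inertial force sum_i m_i * z_ddot_i for interior frames.

    z_positions[k][i] -- vertical position (m) of segment i at frame k
    masses[i]         -- mass (kg) of segment i
    dt                -- frame interval (s)
    Returns one value per interior frame (first and last frames are dropped).
    """
    forces = []
    for k in range(1, len(z_positions) - 1):
        total = 0.0
        for i, m in enumerate(masses):
            acc = (z_positions[k + 1][i]
                   - 2.0 * z_positions[k][i]
                   + z_positions[k - 1][i]) / dt ** 2
            total += m * acc
        forces.append(total)
    return forces


def aerodynamic_lift(measured_lift, inertial_z):
    """F_aero = L_measured - F_inertial . z_hat, frame by frame."""
    return [lm - fi for lm, fi in zip(measured_lift, inertial_z)]


# Synthetic check (assumption): two segments with constant 2 m/s^2 acceleration.
dt = 0.001
masses = [0.01, 0.02]  # kg, placeholder segment masses
frames = [[0.5 * 2.0 * (k * dt) ** 2, 0.1 + 0.5 * 2.0 * (k * dt) ** 2]
          for k in range(51)]
f_inertial = inertial_force_z(frames, masses, dt)

measured = [1.0] * len(f_inertial)  # N, placeholder sensor trace on interior frames
f_aero = aerodynamic_lift(measured, f_inertial)
print(f_inertial[0], f_aero[0])
```

With a total mass of 0.03 kg accelerating at 2 m/s², the inertial term is 0.06 N and the aerodynamic remainder is 0.94 N. In practice, motion capture data would likely need smoothing before double differentiation, since finite differences amplify measurement noise.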
With inertial forces ruled out as the main cause, we turned to fluid dynamics to identify the source of the enhanced aerodynamics, particularly the upstroke lift peak. We conducted flow visualization experiments using a laser sheet and smoke particles seeded in the air around the flying butterfly drone, recorded at high speed. In parallel, we performed computational fluid dynamics (CFD) simulations using ANSYS Fluent to obtain a quantitative, three-dimensional view of the flow field. The CFD model solved the unsteady, incompressible Navier-Stokes equations:
$$ \nabla \cdot \mathbf{u} = 0 $$
$$ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla) \mathbf{u} = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{u} $$
where $\mathbf{u}$ is the velocity vector, $p$ is pressure, $\rho$ is density, and $\nu$ is the kinematic viscosity. The geometry of the flying butterfly drone’s wings was imported and meshed with a dynamic mesh technique to accommodate the complex flapping and deformation kinematics captured by the motion system.

Combining flow visualization with CFD clarified the mechanism. During the downstroke phase, the flying butterfly drone’s wings, moving downward and with a specific pitch angle dictated by the RL-optimized policy, generate a strong starting vortex and subsequently form a coherent vortex ring structure beneath the wing. This vortex ring, associated with high rotational kinetic energy, is shed into the wake as the wing approaches the end of the downstroke. The critical phenomenon occurs during the subsequent upstroke. As the wing reverses direction and moves upward, its trajectory and altered angle of attack cause it to interact with the vortex ring shed during the previous downstroke. Specifically, the trailing-edge vortex (TEV) from the downstroke, which is part of this vortex ring, remains in close proximity to the wing’s upper surface during the early upstroke. This interaction is akin to a “wake capture” mechanism, where the wing effectively re-encounters and gains energy from its own wake. The presence of this vortex near the wing surface induces a region of low pressure on the upper wing, consistent with Bernoulli’s equation, while the Kelvin circulation theorem links the circulation of the shed vortex to the circulation bound to the wing. The pressure difference $\Delta p$ between the upper and lower surfaces integrates to produce a strong lift force, manifesting as the observed upstroke peak. This can be conceptually linked to the lift formula:
$$ L \approx \rho \int_{V} (\mathbf{u} \times \boldsymbol{\omega}) \cdot \mathbf{\hat{z}} \, dV \,\, + \,\, \text{pressure integration terms} $$
where $V$ is a control volume enclosing the wing and the nearby vortex, and the circulation $\Gamma = \oint_{C} \mathbf{u} \cdot d\mathbf{l}$ around the wing plays a key role, since it measures the vorticity enclosed by the contour $C$. The wake capture enhances the effective circulation during the upstroke. The optimized kinematics discovered by the RL agent for the flying butterfly drone appear to time the wing motion so as to maximize this beneficial vortex-wing interaction, a feat difficult to achieve through intuition or conventional parametric studies.
To further quantify the flow structures, we analyzed the vorticity field $\boldsymbol{\omega} = \nabla \times \mathbf{u}$ from the CFD results. The strength of the shed vortex ring and its spatial relationship with the wing during upstroke were critical. We defined a dimensionless interaction parameter $\zeta$ for the upstroke phase:
$$ \zeta(t) = \frac{ \| \int_{V_{vortex}} \boldsymbol{\omega} \, dV \| }{ A_{wing} \, \| \mathbf{U}_{wing}(t) \| } $$
where the numerator is the magnitude of the volume-integrated vorticity in the captured vortex core volume $V_{vortex}$, $A_{wing}$ is the wing area, and $\mathbf{U}_{wing}(t)$ is the wing’s velocity vector. A time series of $\zeta(t)$ correlated strongly with the lift coefficient $C_L(t)$ during upstroke, supporting the proposed link between vortex capture and the lift peak. Table 4 summarizes key vortex parameters extracted from the CFD simulation for the optimized flying butterfly drone case.
| Parameter | Downstroke Vortex Ring | Upstroke Trailing-Edge Vortex (TEV) | Interaction Phase |
|---|---|---|---|
| Core Circulation $\Gamma$ (m²/s) | 0.45 | 0.38 | Early Upstroke |
| Core Diameter (mm) | 35 | 28 | Early Upstroke |
| Distance from Wing TE (mm) | N/A (shed) | 8 – 15 | Early Upstroke |
| Induced Velocity at Wing (m/s) | N/A | 1.2 – 2.0 | Peak Upstroke Lift |
| Estimated Pressure Drop (Pa) | N/A | 45 – 120 | Peak Upstroke Lift |
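To illustrate how the vorticity field and the interaction parameter $\zeta$ might be evaluated from gridded velocity data, the sketch below works on a single two-dimensional slice and uses only the out-of-plane vorticity component $\omega_z = \partial v/\partial x - \partial u/\partial y$. The function names, the slab thickness, the wing speed, and the wing area estimate (wingspan times average chord from Table 1) are assumptions made here for illustration. The actual analysis used the full three-dimensional CFD field.

```python
def vorticity_z(u, v, dx, dy):
    """Central-difference omega_z on the interior nodes of a grid indexed [iy][ix]."""
    ny, nx = len(u), len(u[0])
    omega = [[0.0] * (nx - 2) for _ in range(ny - 2)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2.0 * dx)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2.0 * dy)
            omega[j - 1][i - 1] = dv_dx - du_dy
    return omega


def interaction_parameter(omega, cell_volume, wing_area, wing_speed):
    """zeta = |sum(omega) * cell_volume| / (A_wing * |U_wing|), a cell-sum approximation."""
    total = sum(sum(row) for row in omega) * cell_volume
    return abs(total) / (wing_area * wing_speed)


# Synthetic test field (assumption): solid-body rotation, for which omega_z = 2 * rate.
rate = 5.0                 # rad/s
dx = dy = 0.01             # m
n = 21
u = [[-rate * (j * dy) for i in range(n)] for j in range(n)]
v = [[rate * (i * dx) for i in range(n)] for j in range(n)]

omega = vorticity_z(u, v, dx, dy)
cell_volume = dx * dy * 0.1        # m^3, assuming a 0.1 m thick slab
wing_area = 0.81 * 0.18            # m^2, wingspan x average chord (rough estimate)
zeta = interaction_parameter(omega, cell_volume, wing_area, wing_speed=1.0)
print(omega[0][0], zeta)
```

Solid-body rotation is a useful test case because its vorticity is uniform and known analytically, so any indexing or sign error in the finite-difference stencil shows up immediately.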
The success of the reinforcement learning framework in optimizing the flying butterfly drone’s performance opens new avenues for control strategy development. The policy learned by the agent encodes a complex, time-varying actuation pattern that efficiently orchestrates vortex generation and capture. We analyzed the learned policy by examining the relationship between the state variables and the action outputs. A simplified representation of the policy for the stroke angle $\phi(t)$ can be approximated as a nonlinear function of the phase within the flapping cycle $\psi$ and the recent history of lift force:
$$ \phi(t) \approx f_{\theta}(\psi, L(t-1), L(t-2), \dot{\phi}(t-1)) $$
where $f_{\theta}$ is the neural network parameterized by $\theta$, and $t-1$ and $t-2$ denote the two preceding control steps. This indicates that the flying butterfly drone’s control system uses feedback from its immediate aerodynamic state to adjust its kinematics, creating an adaptive, closed-loop flapping strategy that outperformed our hand-tuned open-loop baseline.
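The closed-loop structure of this approximation can be sketched with a small feed-forward network. Everything below is hypothetical: the layer sizes, the random (untrained) weights, the sine/cosine encoding of the phase $\psi$, and the scaling of the output to the ±60° stroke range from Table 1 are assumptions for illustration and do not reproduce the learned policy.

```python
import math
import random

STROKE_LIMIT_DEG = 60.0  # stroke range from Table 1, used here to bound the output


def init_params(seed=0, n_in=5, n_hidden=8):
    """Random placeholder weights; a trained policy would supply these instead."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.gauss(0.0, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def stroke_command(params, psi, lift_prev1, lift_prev2, phi_dot_prev):
    """Map (psi, L(t-1), L(t-2), phi_dot(t-1)) to a stroke angle in degrees."""
    # Encode the cycle phase as sin/cos so the input is periodic in psi.
    x = [math.sin(psi), math.cos(psi), lift_prev1, lift_prev2, phi_dot_prev]
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    out = sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"]
    return STROKE_LIMIT_DEG * math.tanh(out)


params = init_params()
phi_cmd = stroke_command(params, psi=math.pi / 2, lift_prev1=0.8,
                         lift_prev2=0.7, phi_dot_prev=1.5)
print(f"commanded stroke angle: {phi_cmd:.2f} deg")
```

The final `tanh` keeps the command inside the actuator range regardless of the inputs, which is one simple way to respect hardware limits when a learned policy drives physical servos.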
In conclusion, this work demonstrates a powerful methodology for enhancing the performance of bio-inspired aerial robots. By embedding a physical prototype of a flying butterfly drone within a reinforcement learning loop, we achieved a nearly 20-fold increase in takeoff lift force. The optimization process, free from human preconceptions about efficient flapping, discovered a strategy that leverages complex unsteady aerodynamic mechanisms, specifically a vortex wake capture phenomenon during the upstroke. Detailed experimental decomposition and high-fidelity CFD simulations provided a physical explanation for the learned behavior, showing that inertial contributions were minimal and that the lift boost was aerodynamically driven by the timing of wing motion relative to shed vortex structures. The implications are significant for the design and control of future flying butterfly drones and other flapping-wing micro-air vehicles, suggesting that AI-driven hardware-in-the-loop optimization can unlock high-performance flight regimes inspired by, but not strictly limited to, biological paradigms. Future work will focus on implementing the learned policy on a free-flying version of the flying butterfly drone and exploring multi-objective optimization for tasks beyond takeoff, such as maneuvering and hover.
