In the rapidly evolving low-altitude economy, drone technology has become indispensable for applications such as emergency rescue, disaster monitoring, and urban security. A critical requirement across these scenarios is real-time transmission of high-definition video from unmanned aerial vehicles (UAVs) to ground control stations, which enables timely situational awareness and decision-making. Multipath transport protocols such as Multipath TCP (MPTCP) and Multipath QUIC (MPQUIC) offer a compelling solution by aggregating bandwidth across heterogeneous wireless links such as LTE and 5G. However, their performance depends heavily on the scheduling algorithm that distributes traffic across paths. Drone networks are dynamic and heterogeneous, with fluctuating latency, variable bandwidth, and sudden signal degradation, and these conditions pose significant challenges for traditional schedulers.
To address these challenges, we introduce NeuroFly, a novel multipath scheduling framework designed specifically for drone video streaming. Our framework models scheduling as a Contextual Multi-Armed Bandit (CMAB) problem and employs the Neural Upper Confidence Bound (NeuralUCB) algorithm to learn an adaptive policy online. This paper details the design, implementation, and evaluation of NeuroFly, and shows that it reduces latency and improves video Quality of Experience (QoE) relative to existing state-of-the-art schedulers and standard transport protocols.

1. Problem Formulation as a CMAB
The core of our approach is to view the multipath scheduling decision at each interval as an action selection problem within a CMAB framework. At the start of every scheduling period, the scheduler (the agent) observes a context describing the current state of the network, the video content, and the drone’s flight dynamics. Based on this context, it selects a scheduling action (e.g., a specific redundancy rate for different frame types) and receives a reward reflecting the transmission performance. The goal is to maximize the cumulative reward over time by learning a policy that maps contexts to actions. This formulation makes the exploration-exploitation trade-off explicit, which is crucial in the volatile environments typical of drone operations.
2. The NeuroFly Framework
The architecture of NeuroFly is built on four key design pillars: a rich context space, a priority-driven action space, a multi-objective reward function, and the NeuralUCB learning algorithm enhanced by a context monitoring mechanism.
2.1 Context Space Design
To make informed decisions, the context vector $x_t$ at time $t$ must accurately capture the current environment. We construct this context from three distinct feature groups:
- Path State ($s^{(t)}_{all}$): For each of the $n$ paths, we collect five metrics: smoothed round-trip time (srtt), available bandwidth (bw), packet loss rate (plr), average congestion window (cwnd), and average signal strength (rss). The smoothed RTT is calculated as:
$$ srtt^{(t)}_p = \begin{cases} rtt^{(t)}_p, & t = 0 \\ \gamma \cdot rtt^{(t)}_p + (1-\gamma) \cdot srtt^{(t-1)}_p, & t > 0 \end{cases} $$
where the smoothing factor γ is set to 0.125. The state vector for a single path p is:
$$ s^{(t)}_p = [srtt^{(t)}_p, bw^{(t)}_p, plr^{(t)}_p, cwnd^{(t)}_p, rss^{(t)}_p]^T \in \mathbb{R}^5 $$
The global path state concatenates all paths: $s^{(t)}_{all} \in \mathbb{R}^{5n}$.
- Video Encoding Features ($v^{(t)}$): We track the average sizes of I-frames, P-frames, and B-frames ($l_I$, $l_P$, $l_B$) from the last Group of Pictures (GOP). This informs the scheduler about the bandwidth requirements of upcoming frames.
$$ v^{(t)} = [l^{(t)}_I, l^{(t)}_P, l^{(t)}_B]^T \in \mathbb{R}^3 $$
- Drone Flight Parameters ($e^{(t)}$): The drone’s altitude ($h$), vertical speed ($v_v$), and horizontal speed ($v_h$) are included to sense mobility-induced channel variations.
$$ e^{(t)} = [h^{(t)}, v^{(t)}_v, v^{(t)}_h]^T \in \mathbb{R}^3 $$
The complete context vector is the fusion of all features:
$$ x_t = [(s^{(t)}_{all})^T, (v^{(t)})^T, (e^{(t)})^T]^T \in \mathbb{R}^{5n+6} $$
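The context construction above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `smooth_rtt` and `build_context` are hypothetical, and the feature values in the usage lines are made up. Only the smoothing factor 0.125 and the 5n+6 layout come from the text.

```python
from typing import List, Optional, Sequence

GAMMA = 0.125  # smoothing factor given in the text


def smooth_rtt(prev_srtt: Optional[float], rtt: float, gamma: float = GAMMA) -> float:
    """EWMA update of the smoothed RTT; prev_srtt is None at t = 0."""
    if prev_srtt is None:
        return rtt
    return gamma * rtt + (1.0 - gamma) * prev_srtt


def build_context(path_states: Sequence[Sequence[float]],
                  video: Sequence[float],
                  flight: Sequence[float]) -> List[float]:
    """Concatenate n path states (5 values each), 3 video features and
    3 flight features into one context vector of length 5n + 6."""
    context: List[float] = []
    for state in path_states:
        if len(state) != 5:
            raise ValueError("each path state needs [srtt, bw, plr, cwnd, rss]")
        context.extend(state)
    if len(video) != 3 or len(flight) != 3:
        raise ValueError("video and flight feature groups need 3 values each")
    context.extend(video)
    context.extend(flight)
    return context


# Illustrative values only (units and magnitudes are assumptions).
srtt0 = smooth_rtt(None, 40.0)    # first sample is taken as-is
srtt1 = smooth_rtt(srtt0, 80.0)   # 0.125 * 80 + 0.875 * 40
paths = [[srtt1, 25.0, 0.01, 40.0, -85.0],
         [70.0, 22.0, 0.02, 30.0, -95.0]]
x_t = build_context(paths, video=[60.0, 30.0, 10.0], flight=[50.0, 1.5, 8.0])
```

With two paths the resulting vector has 5·2 + 6 = 16 entries. In practice the features would likely need normalization before being fed to a neural network, which this sketch leaves out.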
2.2 Action Space Design
Our action space incorporates a frame-priority-driven redundant transmission mechanism. We define $K+1$ candidate base redundancy rates, $RE_{init} \in \{0, 1/K, \ldots, 1\}$. The scheduler first selects $RE_{init}$ and then adjusts it for each frame type based on its relative size and criticality:
$$ RE_{type} = RE_{init} \cdot \frac{l_{type}}{l_{sum}}, \quad type \in \{I, P, B\} $$
Here, $l_{sum} = l_I + l_P + l_B$. Original video data is always sent over the fastest path (lowest RTT), while redundant copies are transmitted over the second-fastest path. Concentrating redundancy on the most critical frames (I > P > B) improves delivery reliability without excessive bandwidth overhead.
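As a worked example of the redundancy formula, the sketch below enumerates the candidate base rates and splits a chosen rate across frame types. The function names and the frame sizes are illustrative assumptions; only the formula itself is taken from the text.

```python
from typing import Dict, List


def candidate_rates(K: int) -> List[float]:
    """The K + 1 base redundancy rates {0, 1/K, ..., 1}."""
    if K < 1:
        raise ValueError("K must be at least 1")
    return [k / K for k in range(K + 1)]


def per_type_redundancy(re_init: float, l_i: float, l_p: float, l_b: float) -> Dict[str, float]:
    """RE_type = RE_init * l_type / l_sum for type in {I, P, B}."""
    l_sum = l_i + l_p + l_b
    if l_sum <= 0:
        raise ValueError("frame sizes must sum to a positive value")
    return {
        "I": re_init * l_i / l_sum,
        "P": re_init * l_p / l_sum,
        "B": re_init * l_b / l_sum,
    }


rates = candidate_rates(4)  # [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical average frame sizes (same unit for all three).
split = per_type_redundancy(re_init=rates[2], l_i=60.0, l_p=30.0, l_b=10.0)
```

One consequence of the formula is that the three per-type rates always sum to the selected base rate: here 0.5 is divided into 0.3 for I-frames, 0.15 for P-frames, and 0.05 for B-frames.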
2.3 Multi-Objective Reward Function
At the end of each scheduling period, the agent receives a reward $r_t$ that encourages low latency, minimal packet loss, and efficient bandwidth usage:
$$ r_t = \left(1 - \frac{\min(srtt^{(t)}, D_{max})}{D_{max}}\right) + (1 - lr^{(t)}) - \lambda \cdot RE^{(t)} $$
where $D_{max}$ is the maximum tolerable delay (e.g., 150 ms), $lr^{(t)}$ is the packet loss rate during the period, and $RE^{(t)}$ is the normalized redundancy overhead. The coefficient $\lambda = \min(lr^{(t)} + \alpha, 1)$, with $\alpha$ a constant offset, dynamically balances the cost of redundancy against the benefit of reduced loss, ensuring the scheduler learns to use redundancy efficiently.
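The reward can be written as a short function. This is a sketch under stated assumptions: the name `reward` is hypothetical, and the default `alpha=0.1` is a placeholder because the text does not give a value for α. The 150 ms default follows the example in the text.

```python
def reward(srtt_ms: float, loss_rate: float, redundancy: float,
           d_max_ms: float = 150.0, alpha: float = 0.1) -> float:
    """Multi-objective reward: delay term + delivery term - redundancy cost.

    alpha is a placeholder value (assumption); the source does not specify it.
    """
    lam = min(loss_rate + alpha, 1.0)                    # lambda = min(lr + alpha, 1)
    delay_term = 1.0 - min(srtt_ms, d_max_ms) / d_max_ms  # 0 once srtt >= D_max
    return delay_term + (1.0 - loss_rate) - lam * redundancy


r_good = reward(srtt_ms=75.0, loss_rate=0.0, redundancy=0.0)
r_late = reward(srtt_ms=300.0, loss_rate=0.02, redundancy=0.5)
```

With these inputs, a period at half the delay budget with no loss and no redundancy scores 1.5, while a period that exceeds the delay budget earns nothing from the delay term and is further charged for the redundancy it used. For inputs in [0, 1] the reward lies between -1 and 2.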
2.4 NeuralUCB Algorithm and Context Monitoring
We selected the NeuralUCB algorithm as the core learning engine due to its ability to model complex, non-linear relationships between the high-dimensional context and the expected reward. NeuralUCB uses a deep neural network f(x; θ) to predict the reward for a context and constructs an upper confidence bound (UCB) for each action to guide exploration, achieving a theoretical regret bound of Õ(√(d̃T)).
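For reference, the standard NeuralUCB construction scores each action with the network's prediction plus a gradient-based exploration bonus. The symbols below follow that standard formulation and are assumptions with respect to this paper: $g(x; \theta) = \nabla_\theta f(x; \theta)$ is the network gradient, $m$ is the network width, $\nu$ is an exploration coefficient (written $\nu$ here to avoid a clash with the RTT smoothing factor $\gamma$), and $Z$ is a gradient covariance matrix with regularization constant $\lambda_0$ (unrelated to the reward coefficient $\lambda$).
$$ U^{t}_{a} = f(x_{t,a}; \theta_{t-1}) + \nu \sqrt{ g(x_{t,a}; \theta_{t-1})^T Z_{t-1}^{-1} g(x_{t,a}; \theta_{t-1}) / m } $$
$$ Z_0 = \lambda_0 I, \qquad Z_t = Z_{t-1} + g(x_{t,a_t}; \theta_{t-1}) \, g(x_{t,a_t}; \theta_{t-1})^T / m $$
The bonus shrinks for contexts whose gradient directions have already been observed often, so exploration concentrates on unfamiliar network conditions. The values NeuroFly uses for $\nu$ and $\lambda_0$ are not given in this section.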
To handle abrupt environmental changes common in drone operations (e.g., sudden signal blockage or a drone passing through a tunnel), we designed a context monitoring mechanism based on the Adaptive Windowing (ADWIN) algorithm extended to multivariate drift detection. For each of the $d$ context dimensions, ADWIN maintains two sub-windows and computes the mean difference:
$$ \Delta^{(i)} = |\mu^{(i)}_1 - \mu^{(i)}_0| $$
where $\mu^{(i)}_0$ and $\mu^{(i)}_1$ are the means of dimension $i$ over the two sub-windows. If the fraction of dimensions where $\Delta^{(i)}$ exceeds a threshold $\varepsilon$ surpasses a predefined level, a restart is triggered:
- Soft Restart: If more than 1/4 of dimensions drift, the experience replay buffer is partially purged (keeping the most recent samples) to adapt to gradual shifts.
- Hard Restart: If more than 1/2 of dimensions drift, the buffer is cleared entirely, and the neural network parameters θ are reinitialized to restart learning under a new distribution.
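The restart rule can be illustrated with a simplified drift check. This sketch is not full ADWIN: it assumes the two sub-windows are already given and uses one fixed threshold `eps` for every dimension (which presumes normalized features), whereas ADWIN chooses the split point and the threshold adaptively. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def drift_fraction(old_window: Sequence[Sequence[float]],
                   new_window: Sequence[Sequence[float]],
                   eps: float) -> float:
    """Fraction of context dimensions whose sub-window means differ by more than eps."""
    d = len(old_window[0])
    drifted = 0
    for i in range(d):
        mu0 = mean(row[i] for row in old_window)
        mu1 = mean(row[i] for row in new_window)
        if abs(mu1 - mu0) > eps:
            drifted += 1
    return drifted / d


def restart_decision(fraction: float) -> str:
    """Map the drifted fraction to the soft/hard restart rule from the text."""
    if fraction > 0.5:
        return "hard"   # clear the buffer and reinitialize the network
    if fraction > 0.25:
        return "soft"   # keep only the most recent samples
    return "none"


old = [[0.0, 0.0, 0.0, 0.0]] * 3
shifted_two = [[1.0, 1.0, 0.0, 0.0]] * 3    # 2 of 4 dimensions moved
shifted_three = [[1.0, 1.0, 1.0, 0.0]] * 3  # 3 of 4 dimensions moved
decision_two = restart_decision(drift_fraction(old, shifted_two, eps=0.5))
decision_three = restart_decision(drift_fraction(old, shifted_three, eps=0.5))
```

Because both rules use strict inequalities, a drift in exactly half of the dimensions triggers only a soft restart; the hard restart needs more than half.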
2.5 Algorithm Implementation
The core logic of NeuroFly is summarized below.
| Algorithm 1: NeuroFly Multipath Scheduler |
|---|
| Input: Scheduling period duration $T_S$, redundancy granularity $K$ |
| Output: Redundancy rate decision |
| 1: Initialize: Neural network parameters $\theta_0$, experience replay buffer $M$ |
| 2: for time step $t = 1, 2, \ldots, T$ do |
| 3: Obtain current context observations $\{x_{t,a_s}\}_{s=0}^{K}$ |
| 4: for each $a \in A = \{a_0, a_1, \ldots, a_K\}$ do |
| 5: Compute UCB $U^{t}_{a}$ for action $a$ |
| 6: end for |
| 7: Select action $a_t = \arg\max_{a \in A} U^{t}_{a}$ |
| 8: Execute action $a_t$ and start redundant transmission |
| 9: Compute reward $r_t$ at the end of the scheduling period |
| 10: Store sample $\langle x_t, a_t, r_t \rangle$ in replay buffer $M$ |
| 11: Sample a random batch from $M$ for training |
| 12: Update network parameters $\theta_t$ via SGD |
| 13: end for |
Context monitoring runs in an asynchronous thread, independent of the main scheduling loop, to avoid impacting real-time performance.
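The select, store, and train cycle of the scheduling loop can be sketched as a small skeleton. Everything named here is hypothetical: `SchedulerLoop`, the `ucb_fn` and `train_fn` callbacks, and the buffer and batch sizes are illustrative assumptions. The UCB computation and the SGD update are deliberately left to the callbacks, so this shows only the control flow, not NeuralUCB itself.

```python
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int, float]


class SchedulerLoop:
    """Control-flow skeleton: pick the action with the highest UCB,
    store the outcome, then train on a random batch from the buffer."""

    def __init__(self, K: int,
                 ucb_fn: Callable[[Sequence[float], int], float],
                 train_fn: Callable[[List[Sample]], None],
                 buffer_size: int = 1000, batch_size: int = 32, seed: int = 0):
        self.actions = list(range(K + 1))        # index k stands for RE_init = k / K
        self.ucb_fn = ucb_fn                     # assumed to return the UCB of (context, action)
        self.train_fn = train_fn                 # assumed to run one SGD update on a batch
        self.buffer: deque = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def select(self, context: Sequence[float]) -> int:
        """Evaluate the UCB of every action and return the argmax."""
        return max(self.actions, key=lambda a: self.ucb_fn(context, a))

    def observe(self, context: Sequence[float], action: int, reward: float) -> None:
        """Store the sample, then train on a random batch."""
        self.buffer.append((tuple(context), action, reward))
        k = min(self.batch_size, len(self.buffer))
        self.train_fn(self.rng.sample(list(self.buffer), k))


# Toy callbacks, for illustration only: the "UCB" peaks at action index 2.
batch_sizes: List[int] = []
loop = SchedulerLoop(K=4,
                     ucb_fn=lambda x, a: -abs(a - 2),
                     train_fn=lambda batch: batch_sizes.append(len(batch)),
                     batch_size=2)
chosen = loop.select([0.0])
for r in (0.1, 0.2, 0.3):
    loop.observe([0.0], chosen, r)
```

A bounded `deque` is one simple way to make a soft restart cheap (drop older entries) and a hard restart trivial (clear the buffer), though the text does not say how the buffer is implemented.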
3. Experimental Evaluation
We evaluated NeuroFly in both simulated and field environments to validate its performance for drone video streaming. We compared it against traditional heuristic schedulers (MinRTT, RR, ECF, BLEST), learning-based schedulers (Peekaboo, QC-MAB, LinFly), and standard single-path and multipath transport protocols (TCP, QUIC, MPTCP, MPQUIC). The QoE metrics were video frame rate, structural similarity index (SSIM), and buffering time ratio.
3.1 Simulation Results
Simulations were conducted using Mininet-WiFi with two heterogeneous paths. For each of 100 trials, the path parameters were randomly sampled from the ranges in the table below.
| Parameter | Path 1 (e.g., 5G-like) | Path 2 (e.g., LTE-like) |
|---|---|---|
| RTT (ms) | 25 – 50 | 50 – 100 |
| Jitter (ms) | 0 – 10 | 0 – 20 |
| Packet Loss (%) | 0 – 3 | 0 – 3 |
| Bandwidth (Mbps) | 20 – 30 | 20 – 30 |
The results for all tested schedulers are summarized in the following table, highlighting the superior performance of NeuroFly in improving video QoE.
| Scheduler | Avg. Frame Rate (fps) | Avg. SSIM | Buffering Time Ratio (%) | 99th-Percentile Delay (ms) |
|---|---|---|---|---|
| RR | 23.4 | 0.64 | 15.2 | 269.5 |
| MinRTT | 25.7 | 0.69 | 12.7 | 201.3 |
| ECF | 26.5 | 0.73 | 10.5 | 183.6 |
| BLEST | 26.8 | 0.74 | 9.8 | 174.2 |
| Peekaboo | 27.6 | 0.79 | 7.1 | 165.0 |
| QC-MAB | 28.2 | 0.94 | 4.2 | 154.8 |
| LinFly | 27.8 | 0.81 | 5.9 | 160.4 |
| NeuroFly | 29.2 | 0.94 | 3.4 | 132.1 |
The simulation results show that NeuroFly achieves the highest frame rate (29.2 fps), the lowest buffering time ratio (3.4%), and the lowest tail latency (132.1 ms). While QC-MAB matches NeuroFly in SSIM (0.94) due to its FEC mechanism, NeuroFly’s 99th-percentile delay is 22.7 ms lower (132.1 ms versus 154.8 ms). The improvement over the simpler LinUCB-based scheduler (LinFly) supports the use of a neural network for modeling the non-linear reward function in dynamic drone network conditions.
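The relative improvements implied by the simulation table can be checked with a one-line helper. The function name is hypothetical; the inputs are copied from the table rows for RR, QC-MAB, and NeuroFly.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# 99th-percentile delay (ms) and buffering time ratio (%) from the simulation table.
delay_vs_rr = round(relative_reduction(269.5, 132.1), 1)      # 51.0
buffering_vs_rr = round(relative_reduction(15.2, 3.4), 1)     # 77.6
delay_vs_qcmab = round(relative_reduction(154.8, 132.1), 1)   # 14.7
```

Relative to RR, the weakest baseline, tail delay drops by about 51% and buffering by about 77.6%; relative to QC-MAB, the closest competitor, the tail-delay reduction is about 14.7%.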
3.2 Field Experiment Results
We validated NeuroFly in a real-world flight environment using a Holybro-X650 drone equipped with LTE and 5G modules, transmitting a live video stream to a cloud server. We benchmarked NeuroFly against standard, production-deployed transport protocols.
| Transport Protocol | Avg. Frame Rate (fps) | Avg. SSIM | Buffering Time Ratio (%) | 99th-Percentile Delay (ms) |
|---|---|---|---|---|
| TCP-LTE | 12.7 | 0.61 | 21.5 | 231.2 |
| QUIC-LTE | 13.6 | 0.65 | 18.9 | 213.5 |
| TCP-5G | 14.8 | 0.70 | 15.7 | 197.8 |
| QUIC-5G | 15.9 | 0.73 | 12.4 | 181.4 |
| MPTCP (MinRTT) | 16.4 | 0.78 | 9.8 | 155.2 |
| MPQUIC (MinRTT) | 17.1 | 0.81 | 7.3 | 146.8 |
| NeuroFly | 18.2 | 0.92 | 1.7 | 140.5 |
In the field experiment, NeuroFly again outperformed all baselines. It achieved the highest frame rate (18.2 fps), the highest SSIM (0.92), and the lowest buffering time ratio (1.7%), a 92.1% reduction relative to TCP-LTE (21.5%) and a 76.7% reduction relative to the strongest baseline, MPQUIC with MinRTT (7.3%). Its 99th-percentile delay (140.5 ms) was also the lowest among all tested schemes. These results indicate that NeuroFly can deliver robust, high-quality real-time video streaming in the demanding and unpredictable conditions of a real drone flight.
4. Conclusions and Future Work
This paper introduced NeuroFly, a novel multipath scheduling framework designed specifically for real-time video streaming over drone networks. By modeling the scheduling problem as a CMAB, leveraging the non-linear learning capabilities of the NeuralUCB algorithm, and incorporating a context monitoring mechanism for environmental adaptability, NeuroFly addresses the dynamic and heterogeneous challenges inherent to drone communications.
Our extensive evaluation in both simulated and real-world field experiments demonstrates that NeuroFly significantly outperforms existing state-of-the-art schedulers and standard transport protocols. It achieves substantial reductions in latency (up to 51%) and buffering time (up to 77.6%), while simultaneously improving video frame rate (up to 24.6%) and image structural similarity (up to 49.2%). The successful deployment and testing on a real drone platform validate its practical applicability and robustness.
In future work, we plan to integrate adaptive video encoding techniques with NeuroFly to create a joint source-channel coding optimization framework. Furthermore, extending NeuroFly to multi-drone scenarios with shared network infrastructure is a promising direction for supporting large-scale drone operations in applications such as search and rescue, precision agriculture, and infrastructure inspection.
