NeuroFly: A NeuralUCB-Driven Multipath Scheduler for Drone Video Streaming

In the rapidly growing low-altitude economy, drones have become indispensable for applications such as emergency rescue, disaster monitoring, and urban security. A critical requirement across these scenarios is real-time transmission of high-definition video from the unmanned aerial vehicle to a ground control station, enabling timely situational awareness and decision-making. Multipath transport protocols such as Multipath TCP (MPTCP) and Multipath QUIC (MPQUIC) offer a compelling solution by aggregating bandwidth across heterogeneous wireless links such as LTE and 5G. However, their performance depends heavily on the scheduling algorithm that distributes traffic across paths. Drone networks are dynamic and heterogeneous, with fluctuating latency, variable bandwidth, and sudden signal degradation, and these conditions pose significant challenges for traditional schedulers.

To address these challenges, we introduce NeuroFly, a novel multipath scheduling framework designed specifically for drone video streaming. Our framework models scheduling as a Contextual Multi-Armed Bandit (CMAB) problem and employs the Neural Upper Confidence Bound (NeuralUCB) algorithm to learn an adaptive policy online. This paper details the design, implementation, and evaluation of NeuroFly, and shows that it reduces latency and improves video Quality of Experience (QoE) compared to existing state-of-the-art schedulers and standard transport protocols.

1. Problem Formulation as a CMAB

The core of our approach is to treat the multipath scheduling decision at each interval as an action selection problem within a CMAB framework. At the start of every scheduling period, the scheduler (the agent) observes a context describing the current state of the network, the video content, and the drone’s flight dynamics. Based on this context, it selects a scheduling action (e.g., a specific redundancy rate for different frame types) and receives a reward reflecting the transmission performance. The goal is to maximize the cumulative reward over time by learning a policy that maps contexts to actions. This formulation makes the exploration-exploitation trade-off explicit: the agent must occasionally try uncertain actions to learn their reward while mostly choosing the action it currently believes is best, which is crucial in the volatile environments typical of drone operations.

2. The NeuroFly Framework

The architecture of NeuroFly is built on four key design pillars: a rich context space, a priority-driven action space, a multi-objective reward function, and the NeuralUCB learning algorithm enhanced by a context monitoring mechanism.

2.1 Context Space Design

To make informed decisions, the context vector x_t at time t must accurately capture the current environment. We construct this context from three distinct feature groups:

Path State (s^(t)_all): For each of the n paths, we collect five metrics: smoothed round-trip time (srtt), available bandwidth (bw), packet loss rate (plr), average congestion window (cwnd), and average signal strength (rss). The smoothed RTT is calculated as:

$$ srtt^{(t)}_p = \begin{cases} rtt^{(t)}_p, & t = 0 \\ \gamma \cdot rtt^{(t)}_p + (1-\gamma) \cdot srtt^{(t-1)}_p, & t > 0 \end{cases} $$

where the smoothing factor γ is set to 0.125. The state vector for a single path p is:

$$ s^{(t)}_p = [srtt^{(t)}_p, bw^{(t)}_p, plr^{(t)}_p, cwnd^{(t)}_p, rss^{(t)}_p]^T \in \mathbb{R}^5 $$

The global path state concatenates all paths: s^(t)_all ∈ ℝ⁵ⁿ.
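The smoothing rule and the per-path state vector can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and only γ = 0.125 and the ordering of the five metrics come from the text.

```python
from typing import Optional, Sequence

import numpy as np

GAMMA = 0.125  # smoothing factor stated in the text


def update_srtt(prev_srtt: Optional[float], rtt: float, gamma: float = GAMMA) -> float:
    """Exponentially weighted RTT: the first sample initialises the estimate."""
    if prev_srtt is None:  # t = 0 case
        return rtt
    return gamma * rtt + (1.0 - gamma) * prev_srtt


def path_state(srtt: float, bw: float, plr: float, cwnd: float, rss: float) -> np.ndarray:
    """Per-path state s_p in R^5, in the order used by the paper."""
    return np.array([srtt, bw, plr, cwnd, rss], dtype=float)


def global_path_state(states: Sequence[np.ndarray]) -> np.ndarray:
    """Concatenate the n per-path states into s_all in R^(5n)."""
    return np.concatenate(states)
```

With γ = 0.125, a previous estimate of 40 ms and a new sample of 80 ms give 0.125 · 80 + 0.875 · 40 = 45 ms, so a single outlier moves the estimate only slightly.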

Video Encoding Features (v^(t)): We track the average sizes of I-frames, P-frames, and B-frames (l_I, l_P, l_B) from the last Group of Pictures (GOP). This informs the scheduler about the bandwidth requirements of upcoming frames.

$$ v^{(t)} = [l^{(t)}_I, l^{(t)}_P, l^{(t)}_B]^T \in \mathbb{R}^3 $$

Drone Flight Parameters (e^(t)): The drone’s altitude (h), vertical speed (v_v), and horizontal speed (v_h) are included to sense mobility-induced channel variations.

$$ e^{(t)} = [h^{(t)}, v^{(t)}_v, v^{(t)}_h]^T \in \mathbb{R}^3 $$

The complete context vector is the fusion of all features:

$$ x_t = [(s^{(t)}_{all})^T, (v^{(t)})^T, (e^{(t)})^T]^T \in \mathbb{R}^{5n+6} $$
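A short sketch of how the three feature groups might be fused into x_t is shown below. The helper name `build_context` is hypothetical. The text does not say whether features are normalised before being fed to the network, so this sketch leaves them unscaled.

```python
from typing import Sequence

import numpy as np


def build_context(
    path_states: Sequence[Sequence[float]],  # n entries of [srtt, bw, plr, cwnd, rss]
    frame_sizes: Sequence[float],            # [l_I, l_P, l_B] from the last GOP
    flight: Sequence[float],                 # [h, v_v, v_h]
) -> np.ndarray:
    """Return the context vector x_t in R^(5n + 6)."""
    if any(len(s) != 5 for s in path_states):
        raise ValueError("each path state needs exactly 5 metrics")
    if len(frame_sizes) != 3 or len(flight) != 3:
        raise ValueError("frame_sizes and flight must each have 3 entries")
    s_all = np.concatenate([np.asarray(s, dtype=float) for s in path_states])
    v = np.asarray(frame_sizes, dtype=float)
    e = np.asarray(flight, dtype=float)
    return np.concatenate([s_all, v, e])
```

For the two-path setup used in the evaluation (n = 2), the resulting vector has 5 · 2 + 6 = 16 entries.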

2.2 Action Space Design

Our action space incorporates a frame-priority-driven redundant transmission mechanism. We define K+1 candidate base redundancy rates, RE_init ∈ {0, 1/K, …, 1}. The scheduler first selects RE_init and then adjusts it for each frame type based on its relative size and criticality:

$$ RE_{type} = RE_{init} \cdot \frac{l_{type}}{l_{sum}}, \quad type \in \{I, P, B\} $$

Here, l_sum = l_I + l_P + l_B. Original video data is always sent over the fastest path (lowest RTT), while redundant copies are transmitted over the second fastest path. This targeting of redundancy to the most critical frames (I > P > B) efficiently improves delivery reliability without excessive bandwidth waste.
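The action space and the per-frame adjustment can be sketched as below. The function names are hypothetical, and ties between paths with equal RTT are broken by index, which is an assumption the text does not address.

```python
from typing import Dict, List, Sequence, Tuple


def candidate_rates(K: int) -> List[float]:
    """The K+1 base redundancy rates {0, 1/K, ..., 1}."""
    return [k / K for k in range(K + 1)]


def per_frame_redundancy(re_init: float, l_i: float, l_p: float, l_b: float) -> Dict[str, float]:
    """Scale the base rate by each frame type's share of the total size."""
    l_sum = l_i + l_p + l_b
    if l_sum <= 0:
        raise ValueError("frame sizes must sum to a positive value")
    return {
        "I": re_init * l_i / l_sum,
        "P": re_init * l_p / l_sum,
        "B": re_init * l_b / l_sum,
    }


def pick_paths(srtts: Sequence[float]) -> Tuple[int, int]:
    """Return (primary, redundant): the lowest-RTT and second-lowest-RTT path indices."""
    if len(srtts) < 2:
        raise ValueError("need at least two paths")
    order = sorted(range(len(srtts)), key=lambda p: srtts[p])
    return order[0], order[1]
```

For example, with RE_init = 0.5 and average frame sizes in the ratio 60:30:10, the rates become 0.30 for I-frames, 0.15 for P-frames, and 0.05 for B-frames. The ordering I > P > B therefore follows from I-frames being the largest.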

2.3 Multi-Objective Reward Function

At the end of each scheduling period, the agent receives a reward r_t that encourages low latency, minimal packet loss, and efficient bandwidth usage:

$$ r_t = \left(1 - \frac{\min(srtt^{(t)}, D_{max})}{D_{max}}\right) + (1 - lr^{(t)}) - \lambda \cdot RE^{(t)} $$

where D_max is the maximum tolerable delay (e.g., 150 ms), lr^(t) is the packet loss rate during the period, and RE^(t) is the normalized redundancy overhead. The coefficient λ = min(lr^(t) + α, 1) dynamically balances the cost of redundancy against the benefit of reduced loss, ensuring the scheduler learns to use redundancy efficiently.
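The reward can be computed directly from its definition, as in the sketch below. The default `alpha=0.1` is a placeholder chosen for illustration because the text does not state the value of α. The 150 ms default for D_max is the example value given above.

```python
def reward(
    srtt_ms: float,
    loss_rate: float,
    redundancy: float,
    d_max_ms: float = 150.0,  # example value from the text
    alpha: float = 0.1,       # assumed placeholder; the paper's value is not given
) -> float:
    """Multi-objective reward: delay term + delivery term - redundancy cost."""
    delay_term = 1.0 - min(srtt_ms, d_max_ms) / d_max_ms
    delivery_term = 1.0 - loss_rate
    lam = min(loss_rate + alpha, 1.0)  # redundancy cost weight
    return delay_term + delivery_term - lam * redundancy
```

Under these assumptions, srtt = 75 ms, a loss rate of 0.02, and RE = 0.5 give λ = 0.12 and r_t = 0.5 + 0.98 − 0.06 = 1.42. Any delay at or above D_max contributes zero to the delay term.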

2.4 NeuralUCB Algorithm and Context Monitoring

We selected NeuralUCB as the core learning engine because it can model complex, non-linear relationships between the high-dimensional context and the expected reward. NeuralUCB uses a neural network f(x; θ) to predict the reward for a context and constructs an upper confidence bound (UCB) for each action to guide exploration. Its regret grows as Õ(√T) in the number of rounds T, up to a factor that depends on the effective dimension d̃ of the neural tangent kernel.
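The sketch below illustrates the UCB computation in a simplified NeuralUCB-style agent. It is an illustration under several assumptions, not NeuroFly's implementation. It uses a one-hidden-layer ReLU network, keeps the full inverse design matrix and updates it with the Sherman-Morrison formula, and omits the SGD training step. The class name `TinyNeuralUCB` and how each arm's input is built are hypothetical.

```python
import numpy as np


class TinyNeuralUCB:
    """One-hidden-layer network with a gradient-based exploration bonus."""

    def __init__(self, dim: int, hidden: int = 16, lam: float = 1.0,
                 gamma: float = 1.0, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(hidden, dim))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
        self.m = hidden      # network width
        self.gamma = gamma   # exploration coefficient
        n_params = hidden * dim + hidden
        self.Z_inv = np.eye(n_params) / lam  # inverse of Z = lam * I

    def _forward_grad(self, x: np.ndarray):
        """Return f(x; theta) and the gradient of f with respect to all parameters."""
        pre = self.W1 @ x
        h = np.maximum(pre, 0.0)
        f = float(self.w2 @ h)
        grad_W1 = np.outer(self.w2 * (pre > 0), x)  # df/dW1 through the ReLU
        grad_w2 = h                                 # df/dw2
        return f, np.concatenate([grad_W1.ravel(), grad_w2])

    def ucb(self, x: np.ndarray) -> float:
        """Predicted reward plus gamma * sqrt(g^T Z^-1 g / m)."""
        f, g = self._forward_grad(x)
        return f + self.gamma * float(np.sqrt(g @ self.Z_inv @ g / self.m))

    def select(self, arm_inputs) -> int:
        """Index of the arm with the largest upper confidence bound."""
        return int(np.argmax([self.ucb(x) for x in arm_inputs]))

    def observe(self, x: np.ndarray) -> None:
        """Rank-one update Z <- Z + g g^T / m, applied to Z^-1 (Sherman-Morrison)."""
        _, g = self._forward_grad(x)
        u = g / np.sqrt(self.m)
        Zu = self.Z_inv @ u
        self.Z_inv -= np.outer(Zu, Zu) / (1.0 + u @ Zu)
```

One way to use such an agent here would be to append each candidate redundancy rate to the context, giving one input per arm. Each call to `observe` shrinks the bonus for inputs similar to those already tried, which shifts later choices from exploration toward exploitation.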

To handle the abrupt environmental changes common in drone flights (e.g., sudden signal blockage or a drone passing through a tunnel), we designed a context monitoring mechanism based on the Adaptive Windowing (ADWIN) algorithm, extended to multivariate drift detection. For each of the d context dimensions, ADWIN maintains two sub-windows and computes the absolute difference between their means:

$$ \Delta^{(i)} = |\mu^{(i)}_1 - \mu^{(i)}_0| $$

If the fraction of dimensions in which Δ⁽ⁱ⁾ exceeds a threshold ε surpasses a predefined level, a restart is triggered:

Soft Restart: If more than 1/4 of dimensions drift, the experience replay buffer is partially purged (keeping the most recent samples) to adapt to gradual shifts.
Hard Restart: If more than 1/2 of dimensions drift, the buffer is cleared entirely, and the neural network parameters θ are reinitialized to restart learning under a new distribution.
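The restart rule can be illustrated with the simplified sketch below. It is not full ADWIN: ADWIN chooses the split point and its threshold adaptively, whereas this sketch splits a window of recent contexts in half and uses a fixed ε. The function names and the `keep_recent` parameter are hypothetical, and a single ε for all dimensions assumes the features are on comparable scales.

```python
from typing import List

import numpy as np


def drift_fraction(window, eps: float) -> float:
    """Fraction of context dimensions whose sub-window means differ by more than eps."""
    w = np.asarray(window, dtype=float)  # shape (samples, d)
    if w.ndim != 2 or len(w) < 2:
        raise ValueError("window must be 2-D with at least two samples")
    half = len(w) // 2
    mu0 = w[:half].mean(axis=0)  # older sub-window
    mu1 = w[half:].mean(axis=0)  # newer sub-window
    return float(np.mean(np.abs(mu1 - mu0) > eps))


def restart_decision(window, eps: float, soft: float = 0.25, hard: float = 0.5) -> str:
    """Map the drift fraction to 'hard', 'soft', or 'none' using the 1/4 and 1/2 levels."""
    frac = drift_fraction(window, eps)
    if frac > hard:
        return "hard"
    if frac > soft:
        return "soft"
    return "none"


def apply_restart(decision: str, buffer: List, keep_recent: int = 32) -> List:
    """Return the replay buffer after the restart; on 'hard' the caller also resets theta."""
    if decision == "hard":
        return []
    if decision == "soft":
        return buffer[-keep_recent:]
    return buffer
```

Both comparisons are strict, so exactly half of the dimensions drifting triggers a soft restart rather than a hard one.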

2.5 Algorithm Implementation

The core logic of NeuroFly is summarized below.

Algorithm 1: NeuroFly Multipath Scheduler
Input: Scheduling period duration T_S, redundancy granularity K
Output: Redundancy rate decision
1: Initialize: neural network parameters θ₀, experience replay buffer M
2: for time step t = 1, 2, …, T do
3:     Obtain current context observations {x_{t,a_s}}_{s=0}^{K}
4:     for each a ∈ A = {a₀, a₁, …, a_K} do
5:         Compute UCB U^a_t for action a
6:     end for
7:     Select action a_t = argmax_{a∈A} U^a_t
8:     Execute action a_t and start redundant transmission
9:     Compute reward r_t at the end of the scheduling period
10:    Store sample ⟨x_t, a_t, r_t⟩ in replay buffer M
11:    Sample a random batch from M for training
12:    Update network parameters θ_t via SGD
13: end for
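The control flow of Algorithm 1 can be sketched in Python as follows. The `agent` interface (`ucb` and `train`), the `observe` and `execute` callbacks, and the bounded buffer size are all hypothetical stand-ins for components the paper describes only at a high level.

```python
import random
from collections import deque
from typing import Callable, List, Tuple


def run_scheduler(agent, observe: Callable[[], object],
                  execute: Callable[[float], float], steps: int, K: int,
                  batch_size: int = 8, buffer_size: int = 1000,
                  seed: int = 0) -> List[Tuple[float, float]]:
    """Run `steps` scheduling periods and return the (action, reward) history."""
    rng = random.Random(seed)
    replay = deque(maxlen=buffer_size)          # experience replay buffer M
    actions = [k / K for k in range(K + 1)]     # candidate redundancy rates
    history = []
    for _ in range(steps):
        x_t = observe()                                          # step 3
        scores = [agent.ucb(x_t, a) for a in actions]            # steps 4-6
        best = max(range(len(actions)), key=scores.__getitem__)  # step 7
        a_t = actions[best]
        r_t = execute(a_t)                                       # steps 8-9
        replay.append((x_t, a_t, r_t))                           # step 10
        batch = rng.sample(list(replay), min(batch_size, len(replay)))  # step 11
        agent.train(batch)                                       # step 12
        history.append((a_t, r_t))
    return history
```

Here `execute` is assumed to block for one scheduling period of length T_S and return the reward measured at its end.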

Context monitoring runs in an asynchronous thread, independent of the main scheduling loop, to avoid impacting real-time performance.

3. Experimental Evaluation

We evaluated NeuroFly in both simulated and field environments to validate its performance for drone video streaming. We compared it against traditional heuristic schedulers (MinRTT, RR, ECF, BLEST), learning-based schedulers (Peekaboo, QC-MAB, LinFly), and standard single- and multi-path transport protocols (TCP, QUIC, MPTCP, MPQUIC). The QoE metrics were video frame rate, image structural similarity (SSIM), and buffering time ratio, reported alongside the 99th-percentile delay.

3.1 Simulation Results

Simulations were conducted using Mininet-WiFi with two heterogeneous paths whose parameters were randomly sampled from the space defined in the table below over 100 trials.

Parameter	Path 1 (e.g., 5G-like)	Path 2 (e.g., LTE-like)
RTT (ms)	25 – 50	50 – 100
Jitter (ms)	0 – 10	0 – 20
Packet Loss (%)	0 – 3	0 – 3
Bandwidth (Mbps)	20 – 30	20 – 30
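Drawing per-trial path parameters from this space could look like the sketch below. Independent uniform sampling within each range is an assumption, since the text says only that parameters were randomly sampled. The dictionary keys and function names are hypothetical.

```python
import random
from typing import Dict, List

# Ranges copied from the table above: (low, high) per parameter.
PARAM_SPACE = {
    "path1": {"rtt_ms": (25, 50), "jitter_ms": (0, 10), "loss_pct": (0, 3), "bw_mbps": (20, 30)},
    "path2": {"rtt_ms": (50, 100), "jitter_ms": (0, 20), "loss_pct": (0, 3), "bw_mbps": (20, 30)},
}


def sample_trial(rng: random.Random) -> Dict[str, Dict[str, float]]:
    """Draw one configuration, sampling each parameter uniformly in its range."""
    return {
        path: {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for path, ranges in PARAM_SPACE.items()
    }


def sample_trials(n: int = 100, seed: int = 0) -> List[Dict[str, Dict[str, float]]]:
    """Draw n configurations reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [sample_trial(rng) for _ in range(n)]
```

Fixing the seed lets every scheduler be tested against the same 100 configurations.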

The results for all tested schedulers are summarized in the following table, highlighting the superior performance of NeuroFly in improving video QoE.

Scheduler	Avg. Frame Rate (fps)	Avg. SSIM	Buffering Time Ratio (%)	99th %ile Delay (ms)
RR	23.4	0.64	15.2	269.5
MinRTT	25.7	0.69	12.7	201.3
ECF	26.5	0.73	10.5	183.6
BLEST	26.8	0.74	9.8	174.2
Peekaboo	27.6	0.79	7.1	165.0
QC-MAB	28.2	0.94	4.2	154.8
LinFly	27.8	0.81	5.9	160.4
NeuroFly	29.2	0.94	3.4	132.1

The simulation results show that NeuroFly achieves the highest frame rate (29.2 fps), the lowest buffering time ratio (3.4%), and the lowest tail latency (132.1 ms). QC-MAB matches NeuroFly in SSIM (0.94) thanks to its FEC mechanism, but NeuroFly’s 99th-percentile delay is 22.7 ms lower (132.1 ms versus 154.8 ms). The improvement over the simpler LinUCB-based scheduler (LinFly) supports the use of a neural network to model the non-linear reward function under dynamic drone network conditions.
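As a quick check on the size of these gains, the relative reductions against the weakest baseline (RR) can be recomputed from the table. The helper below is illustrative, and its inputs are copied from the RR and NeuroFly rows.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline * 100.0


# Values taken from the simulation results table.
rr = {"buffering_pct": 15.2, "p99_delay_ms": 269.5}
neurofly = {"buffering_pct": 3.4, "p99_delay_ms": 132.1}

buffering_gain = relative_reduction(rr["buffering_pct"], neurofly["buffering_pct"])
delay_gain = relative_reduction(rr["p99_delay_ms"], neurofly["p99_delay_ms"])
print(f"buffering: {buffering_gain:.1f}%  p99 delay: {delay_gain:.1f}%")
# prints "buffering: 77.6%  p99 delay: 51.0%"
```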

3.2 Field Experiment Results

We validated NeuroFly in real-world flights using a Holybro-X650 drone equipped with LTE and 5G modules, transmitting a live video stream to a cloud server. We benchmarked NeuroFly against standard, production-deployed transport protocols.

Transport Protocol	Avg. Frame Rate (fps)	Avg. SSIM	Buffering Time Ratio (%)	99th %ile Delay (ms)
TCP-LTE	12.7	0.61	21.5	231.2
QUIC-LTE	13.6	0.65	18.9	213.5
TCP-5G	14.8	0.70	15.7	197.8
QUIC-5G	15.9	0.73	12.4	181.4
MPTCP (MinRTT)	16.4	0.78	9.8	155.2
MPQUIC (MinRTT)	17.1	0.81	7.3	146.8
NeuroFly	18.2	0.92	1.7	140.5

In the field experiment, NeuroFly again outperformed every baseline. It achieved the highest frame rate (18.2 fps), the highest SSIM (0.92), and the lowest buffering time ratio (1.7%), a 76.7% reduction relative to the strongest baseline, MPQUIC (7.3%). The 99th-percentile delay (140.5 ms) was also the lowest among all tested schemes. These results confirm that NeuroFly can deliver robust, high-quality real-time video streaming in the demanding and unpredictable conditions of a real drone flight.

4. Conclusions and Future Work

This paper introduced NeuroFly, a novel multipath scheduling framework designed specifically for real-time video streaming over drone networks. By modeling the scheduling problem as a CMAB, leveraging the non-linear learning capabilities of the NeuralUCB algorithm, and incorporating a context monitoring mechanism for environmental adaptability, NeuroFly addresses the dynamic and heterogeneous conditions inherent to drone communications.

Our evaluation in both simulated and real-world field experiments demonstrates that NeuroFly significantly outperforms existing state-of-the-art schedulers and standard transport protocols. It achieves substantial reductions in latency (up to 51%) and buffering time (up to 77.6%), while simultaneously improving video frame rate (up to 24.6%) and image structural similarity (up to 49.2%). The successful deployment and testing on a real drone platform validate its practical applicability and robustness.

In future work, we plan to explore the integration of adaptive video encoding techniques with NeuroFly to create a joint source-channel coding optimization framework. Furthermore, extending NeuroFly to multi-drone scenarios with shared network infrastructure is a promising direction for supporting large-scale drone operations in applications such as search and rescue, precision agriculture, and infrastructure inspection.