Research on Web-Based UAV Flight Operation Platform Using DJI SDK

The rapid evolution of Unmanned Aerial Vehicle (UAV) technology has led to its widespread application across numerous sectors. In the context of China’s infrastructure development, UAVs, or drones, have become instrumental in sectors such as railway maintenance, facilitating tasks like oblique photography for 3D modeling, facility inspection at stations and along tracks, and emergency response during floods. This adoption signifies a shift towards more intelligent and information-driven operational paradigms across Chinese industries. However, the predominant mode of operation for these UAVs still relies heavily on on-site pilots, leading to a lack of centralized management and technical oversight. Flight route planning often occurs without sufficient supervision, and real-time risks during missions are difficult to assess and control. In cases of geofence breaches or emergencies, timely intervention is challenging, which poses significant safety hazards and reduces operational efficiency.

Current UAV management solutions within the industry predominantly utilize Mobile SDK (MSDK), which confines the control and monitoring applications to Android-based handheld devices. While these platforms enable basic data acquisition and monitoring, they are inherently limited by the mobile ecosystem. This limitation hinders deep integration with enterprise-level information systems and lacks the flexibility and cross-platform scalability required for comprehensive fleet management. To address these challenges, there is a clear necessity to develop a web-based UAV operation management and control platform. Such a platform, leveraging DJI’s Cloud SDK and Payload SDK (PSDK), can facilitate the connection of drones and their remote controllers to the cloud, establishing a new technical pathway for the remote supervision and control of UAV flight operations.

1. UAV SDK Ecosystem Analysis

The DJI SDK ecosystem provides developers with a suite of tools tailored for different integration levels. The key SDKs are analyzed below:

| SDK Type | Core Function | Target Platform/Scenario | Platform Selection Rationale |
| --- | --- | --- | --- |
| Cloud SDK | Cloud service integration for device management, data storage/distribution, flight log sync. | Multi-device collaborative management, web platforms. | Essential for enabling UAV/controller cloud connectivity and remote, fine-grained payload control, fitting dispersed and complex inspection scenarios in China’s railway and other sectors. |
| Payload SDK (PSDK) | Development for onboard payloads (cameras, LiDAR), enabling communication and control. | Custom payload control, task automation. | Selected together with the Cloud SDK; the rationale in the row above covers both. |
| Mobile SDK (MSDK) | Basic control, real-time data (FPV, telemetry) on iOS/Android. | Mobile app development for pilots. | Limited by mobile OS, not suitable for deep enterprise web integration. |
| Onboard SDK (OSDK) | Access to onboard computing for advanced autonomous behaviors and vision models. | Edge computing, advanced autonomy. | Not selected for the core Web control platform but noted for future AI integration. |

For operational scenarios prevalent in China, such as precise oblique photography collection and intelligent inspection of railway lines involving tunnels and bridges, the platform requires both reliable cloud connectivity and deep control over payloads. Therefore, the combination of Cloud SDK and Payload SDK was selected. Both SDKs support Web system development. The Cloud SDK provides the foundational capability for device-to-cloud connectivity, enabling real-time networking for drones and remote controllers. The PSDK allows for granular control of mounted payloads. When integrated, they enable remote, precise execution of complex flight tasks via a web interface, which suits the distributed and challenging environments encountered in these UAV operations.

2. Overall Platform Architecture

The platform’s functionality is decomposed into several core modules: UAV cloud connectivity, flight data transmission, media streaming, route planning synchronization, and flight command control. The cornerstone of this system is establishing cloud connectivity for the drone. The remote controller (e.g., DJI RC Plus) acts as a relay gateway, connecting the UAV to a custom third-party cloud service to facilitate bidirectional data flow. The overall architecture is depicted conceptually below, illustrating the integration of these modules within an enterprise cloud environment.

2.1 MQTT-based Flight Data Transmission

The DJI Pilot 2 application allows for the loading of custom third-party cloud service modules. By developing such a module using the Cloud SDK and loading it into Pilot 2, we gain the ability to access the UAV’s flight state data. This data, including attitude, GPS coordinates, battery level, and mission status, is relatively small in volume but demands high real-time performance and reliability. The platform employs the MQTT (Message Queuing Telemetry Transport) protocol for this purpose. MQTT is a lightweight, publish-subscribe-based messaging protocol known for its low overhead, high real-time capability, and robustness in unstable network conditions, which makes it well suited to IoT and UAV applications. Its Quality of Service (QoS) mechanisms ensure reliable message delivery. An MQTT gateway deployed on the cloud platform acts as a message broker, facilitating bidirectional communication between the web backend and the remote controller.

2.2 RTMP-based Media Streaming

The requirements for video data transmission differ significantly from those of telemetry data. Video streams require sustained high frame rates and continuity. Therefore, the platform utilizes the RTMP (Real-Time Messaging Protocol) for video pushing. RTMP is a mature protocol for real-time video streaming, ensuring low-latency and continuous video feed. The DJI Pilot application supports RTMP output. A streaming media server (e.g., SRS) is deployed on the enterprise cloud to receive the RTMP stream from the drone/controller and simultaneously serve it to web clients, enabling live viewing of the First-Person View (FPV) or gimbal camera feed.

2.3 Real-time Frontend-Backend Communication

The backend server subscribes to the UAV’s flight data topics via MQTT. It then establishes a persistent WebSocket connection with the frontend web page. This creates a continuous data pipeline: MQTT → WebSocket → Web Frontend. This pipeline allows for the real-time presentation of flight status data (position, speed, altitude, etc.) on the user’s web dashboard, providing immediate situational awareness.
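The fan-out step in the middle of this pipeline can be sketched with the Python standard library alone. This is a minimal illustration, not the platform's backend: the MQTT client and the WebSocket server are deliberately left out, and `TelemetryHub`, `connect`, `disconnect`, and `publish` are hypothetical names. It assumes the MQTT callback hands over raw JSON bytes and that each WebSocket handler drains its own queue.

```python
import asyncio
import json


class TelemetryHub:
    """Fans out each telemetry message to every connected dashboard client."""

    def __init__(self, max_pending: int = 10) -> None:
        self._max_pending = max_pending
        self._clients = set()

    def connect(self) -> asyncio.Queue:
        """Register one client (one per WebSocket connection) and return its queue."""
        queue = asyncio.Queue(maxsize=self._max_pending)
        self._clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        """Forget a client whose WebSocket has closed."""
        self._clients.discard(queue)

    def publish(self, mqtt_payload: bytes) -> int:
        """Decode one MQTT payload and queue it for all clients.

        Returns the number of clients the message was queued for.
        """
        message = json.loads(mqtt_payload)
        for queue in self._clients:
            if queue.full():
                # Drop the oldest sample: stale telemetry is of little use to a live view.
                queue.get_nowait()
            queue.put_nowait(message)
        return len(self._clients)
```

Bounding each queue and discarding the oldest sample keeps one slow browser from delaying the others, which seems a reasonable trade-off for status data that is refreshed continuously.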

3. Key Technology Implementation

3.1 Third-party Cloud Service Development

While DJI’s native cloud service offers basic functions, it often falls short for enterprise needs regarding data access and business flow integration. Its APIs are often closed or restrictive, making direct data extraction and integration with custom information systems difficult. Developing a custom third-party cloud service overcomes these limitations, enabling flexible functional expansion and seamless integration with the enterprise’s cloud platform.

3.1.1 JSBridge Interface Invocation
JSBridge (JavaScript Bridge) is a communication mechanism that establishes bidirectional communication between a Web frontend (H5 page) and a native application. In our implementation, the DJI Pilot app (an Android application) loads our custom cloud service’s H5 page. Through JSBridge, the H5 page can invoke native functions provided by the Cloud SDK. The WebView container within Pilot injects a `window.djiBridge` object, exposing native capabilities to the JavaScript environment. This allows the H5 page to request device information, flight data, and more. The core code pattern involves calling methods on this bridge object to retrieve critical identifiers like the Serial Numbers (SN) of the remote controller and the aircraft, which are then used as unique keys for MQTT topics.

3.1.2 EMQX Broker Setup
Once flight state data is acquired in the H5 page, an MQTT client SDK is used to publish this data to a message broker. We deployed EMQX as our MQTT gateway. It supports a rules engine to forward messages from specific topics to enterprise HTTP APIs. The publishing topic format uses the drone’s SN code: `thing/product/{SN}/osd`. The web backend, acting as another MQTT client, subscribes to this topic to receive the data.
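The topic convention can be captured in two small helpers. This is a sketch under the assumption that the topic format quoted above is used unchanged; the function names and the wildcard constant are illustrative, and the sample serial number in the usage note is made up.

```python
from typing import Optional

# Topic format quoted in the text; {sn} is the device serial number.
OSD_TOPIC_TEMPLATE = "thing/product/{sn}/osd"

# MQTT single-level wildcard: matches the OSD topic of every device.
OSD_TOPIC_WILDCARD = OSD_TOPIC_TEMPLATE.format(sn="+")


def osd_topic(sn: str) -> str:
    """Build the publish topic for one device."""
    if not sn or any(ch in sn for ch in "/+#"):
        raise ValueError(f"invalid serial number for a topic level: {sn!r}")
    return OSD_TOPIC_TEMPLATE.format(sn=sn)


def sn_from_osd_topic(topic: str) -> Optional[str]:
    """Recover the serial number from an OSD topic, or None if it is another topic."""
    parts = topic.split("/")
    if len(parts) == 4 and parts[:2] == ["thing", "product"] and parts[3] == "osd" and parts[2]:
        return parts[2]
    return None
```

A backend that subscribes once to `OSD_TOPIC_WILDCARD` can call `sn_from_osd_topic` on each incoming message to route it to the right device record, for example `sn_from_osd_topic("thing/product/SN001/osd")`.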

3.1.3 Cloud Service Loading and Data Acquisition
The complete third-party service comprises frontend and backend components. The frontend provides UI for authentication, device binding, route sync, and status management, integrating the H5-JSBridge interfaces. Dynamic flight data is published to EMQX, while static data (like SN codes) is sent via HTTP to the backend for storage. The backend service manages device info, routes, logs, and most importantly, subscribes to flight data via MQTT. After loading the service, the Pilot interface allows for API debugging. The MQTT message structure for flight data is complex; key attributes are summarized below:

| Column Name | Type | Description |
| --- | --- | --- |
| longitude | float | Aircraft current longitude. |
| latitude | float | Aircraft current latitude. |
| horizontal_speed | float | Aircraft horizontal velocity. |
| vertical_speed | float | Aircraft vertical velocity. |
| elevation | float | Altitude relative to takeoff point. |
| gimbal_pitch | double | Gimbal pitch angle rotation. |
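A hedged sketch of how the backend might decode these attributes into a typed record. The envelope is an assumption: the example supposes a JSON object whose `data` member holds the six attributes as flat keys, which may not match the real message layout (the text notes that the actual structure is more complex). `FlightState` and `parse_flight_state` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightState:
    """The six attributes listed in the table above."""
    longitude: float
    latitude: float
    horizontal_speed: float
    vertical_speed: float
    elevation: float      # relative to the takeoff point
    gimbal_pitch: float


def parse_flight_state(payload: bytes) -> FlightState:
    """Decode one telemetry message.

    Assumes {"data": {...}} with flat keys; raises KeyError if a field is missing.
    """
    data = json.loads(payload)["data"]
    return FlightState(
        longitude=float(data["longitude"]),
        latitude=float(data["latitude"]),
        horizontal_speed=float(data["horizontal_speed"]),
        vertical_speed=float(data["vertical_speed"]),
        elevation=float(data["elevation"]),
        gimbal_pitch=float(data["gimbal_pitch"]),
    )
```

Failing loudly on a missing field, rather than defaulting it to zero, avoids showing a plausible but wrong position on the dashboard.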

3.2 Streaming Media Live Broadcast Development

3.2.1 SRS Streaming Server Deployment
To relay and distribute the real-time video, a streaming media server is essential. While Nginx-RTMP is a common choice, we selected SRS (Simple Realtime Server) for its better performance in high-concurrency scenarios and lower latency, which is critical for real-time drone video feeds.

| Media Server | Supported Protocols | Performance | Latency |
| --- | --- | --- | --- |
| Nginx-RTMP | Mainly RTMP, some HLS | Can bottleneck under high concurrency | Higher, typically 3-5 s or more |
| SRS | RTMP, HLS, HTTP-FLV, etc. | Optimized for high concurrency | Lower, typically ~2 s |

3.2.2 HTTP-FLV Format Conversion
Modern browsers do not natively support the RTMP protocol. Therefore, the SRS server performs protocol conversion, transmuxing the incoming RTMP stream into an HTTP-FLV stream. HTTP-FLV uses a persistent connection to deliver FLV-encapsulated audio/video data, offering lower latency compared to HLS. Frontend web players like flv.js can then play this HTTP-FLV stream directly within a standard HTML5 video element.
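To make the relationship between the two stream addresses concrete, the sketch below derives a playback URL from a publish URL. It assumes a common SRS setup in which the HTTP server listens on port 8080 and serves a stream at the same app/stream path with a `.flv` suffix; both are configuration-dependent assumptions, and `http_flv_url` and the host name in the usage note are hypothetical.

```python
from urllib.parse import urlsplit


def http_flv_url(rtmp_url: str, http_port: int = 8080, scheme: str = "http") -> str:
    """Map an RTMP publish URL to the HTTP-FLV URL a browser player would request.

    Assumes the media server exposes /<app>/<stream>.flv on `http_port`.
    """
    parts = urlsplit(rtmp_url)
    path = parts.path.rstrip("/")
    if parts.scheme != "rtmp" or not parts.hostname or not path:
        raise ValueError(f"not a usable RTMP stream URL: {rtmp_url!r}")
    return f"{scheme}://{parts.hostname}:{http_port}{path}.flv"
```

For instance, `http_flv_url("rtmp://media.example.com/live/m350")` yields an address that a player such as flv.js could be pointed at, provided the server is configured as assumed.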

3.3 Flight Route Planning Development

This functionality enables the synchronization of flight route files (in .kmz format) between the web platform and the DJI Pilot 2 application within a shared workspace. KMZ is a compressed package containing KML (Keyhole Markup Language) files, which are XML-based descriptions of geographic features.

3.3.1 Route File Synchronization
Route files are stored on an Object Storage Service (OSS). The web backend maintains a database index for these files. When the mission module loads in Pilot 2, it fetches the list of available route files via HTTP. Similarly, the web frontend can download these files, extract and convert the KML data to GeoJSON for editing, and upload modified versions back to the server. A temporary upload credential from DJI’s cloud service is required for Pilot 2 to upload files directly to the OSS, ensuring security.

3.3.2 Web-based Route File Modification
After retrieving a KMZ file, the web platform extracts the KML, converts it to GeoJSON, and visualizes it on an interactive map built with the Leaflet library. The base map utilizes OpenStreetMap, which provides street-level detail under an open license. Users can then directly modify the flight path waypoints or polygon boundaries on the web map. The modified GeoJSON is converted back to KML/KMZ and synchronized to the server, making it available for the Pilot 2 application. This allows ground controllers to plan and adjust complex survey or inspection missions from a centralized web console.
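The extract-and-convert step can be illustrated with the standard library. This is a simplified sketch, not the platform's converter: it assumes the archive contains at least one `.kml` file in the standard KML 2.2 namespace and only handles `Placemark` elements that carry a `Point`; polygons, route metadata, and the reverse conversion are omitted. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kmz_waypoints_to_geojson(kmz_bytes: bytes) -> dict:
    """Return the Point placemarks of the first KML file in a KMZ as GeoJSON."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        kml_name = next((n for n in archive.namelist() if n.lower().endswith(".kml")), None)
        if kml_name is None:
            raise ValueError("no .kml file found in the archive")
        root = ET.fromstring(archive.read(kml_name))

    features = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        text = placemark.findtext("kml:Point/kml:coordinates", default="", namespaces=KML_NS)
        values = [float(v) for v in text.strip().split(",") if v.strip()]
        if len(values) < 2:
            continue  # not a point placemark, or malformed coordinates
        features.append({
            "type": "Feature",
            # KML and GeoJSON both order coordinates as longitude, latitude[, altitude].
            "geometry": {"type": "Point", "coordinates": values},
            "properties": {"index": len(features)},
        })
    return {"type": "FeatureCollection", "features": features}
```

The resulting `FeatureCollection` is the kind of object a Leaflet GeoJSON layer accepts, so the waypoints can be drawn and dragged before being written back.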

3.4 Virtual Joystick Control Development

Leveraging DJI’s PSDK, the platform implements remote, intelligent control capabilities. It’s important to note that not all DJI drones support PSDK, and versions vary. This research is based on models like the Matrice 350 RTK and Mavic 3 Enterprise series.

3.4.1 Command Channel Establishment
Control commands are categorized into Flight Control commands (Direct Remote Control – DRC) and Payload Control commands. Flight commands govern movement along the Pitch, Roll, and Yaw axes, as well as take-off and Return-to-Home. Payload commands control gimbal movement and camera parameters. A dedicated MQTT channel is established for command transmission, with separate topics for uplink (cloud to device) and downlink (device to cloud). This dedicated channel, allocated after a successful MQTT connection, ensures faster command transmission and response for critical flight control.

3.4.2 Web-based Virtual Joystick Interface
A virtual joystick interface is constructed on the web frontend, mimicking the physical controls of a standard remote controller. Keyboard and mouse events are bound to this interface. User inputs from these virtual controls are translated into standardized PSDK commands, which are then sent via the dedicated MQTT command channel to control the drone’s flight and its gimbal payload in real-time. This provides a familiar yet powerful control scheme accessible from any web browser.
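The translation from key presses to a command message might look like the following sketch. Everything here is illustrative: the key bindings, the normalized axis range of -1 to 1, the `gain` parameter, and the JSON shape with `seq` and `axes` are assumptions for the example and do not reproduce the actual command schema expected by the SDK.

```python
import json

# Hypothetical key bindings: key -> (axis, direction).
KEY_BINDINGS = {
    "w": ("pitch", 1.0), "s": ("pitch", -1.0),
    "d": ("roll", 1.0), "a": ("roll", -1.0),
    "e": ("yaw", 1.0), "q": ("yaw", -1.0),
}


def axes_from_keys(pressed, gain: float = 0.5) -> dict:
    """Combine the currently pressed keys into normalized axis values in [-1, 1]."""
    axes = {"pitch": 0.0, "roll": 0.0, "yaw": 0.0}
    for key in {k.lower() for k in pressed}:
        binding = KEY_BINDINGS.get(key)
        if binding is not None:
            axis, direction = binding
            axes[axis] += direction * gain
    return {axis: max(-1.0, min(1.0, value)) for axis, value in axes.items()}


def build_command(pressed, seq: int, gain: float = 0.5) -> str:
    """Serialize one control sample; `seq` lets the receiver discard out-of-order samples."""
    return json.dumps({"seq": seq, "axes": axes_from_keys(pressed, gain)})
```

Opposing keys cancel out and unknown keys are ignored, so releasing all keys produces an all-zero sample that can be sent as an explicit "sticks centered" command.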

4. Experiment and Application Analysis

4.1 Data Transmission Testing and Latency Analysis

4.1.1 Media Streaming Test
Tests were conducted using a DJI Matrice 350 RTK drone and DJI RC Plus controller under different 5G network conditions. The SRS server’s performance was evaluated for latency and stability.

| Resolution | Obstruction | Distance | Latency | Latency Jitter |
| --- | --- | --- | --- | --- |
| 1080P | Yes | 2 km | 2.0 – 2.5 s | High |
| 1080P | No | 2 km | 1.8 – 2.3 s | Low |
| 720P | No | 500 m | 1.0 – 1.5 s | Low |

The results indicate that the primary bottleneck for live video latency remains the private video transmission link between the drone and the controller. Distance and physical obstructions significantly increase latency and jitter.

4.1.2 Flight Data Real-time Transmission
The UAV publishes telemetry data at a fixed frequency of 2 Hz. Under good network conditions, the MQTT transmission via EMQX demonstrated high real-time performance. The end-to-end delay for telemetry data was primarily concentrated between 50-120 ms, with an average of approximately 80 ms. This low and stable latency is sufficient for real-time status monitoring.

4.1.3 Virtual Joystick Control Transmission
Testing the dedicated MQTT command channel showed low latency for remote control. Commands for basic aircraft movement (pitch, roll, yaw) showed an average transmission latency of 50-80 ms. Commands for gimbal control exhibited slightly higher latency, averaging 100-150 ms. Both results support the feasibility of real-time web-based UAV control.

The overall control loop latency $L_{total}$ can be modeled as the sum of several components:

$$L_{total} = L_{web\_proc} + L_{mqtt\_up} + L_{rc\_proc} + L_{air\_proc} + L_{video\_down}$$

Where $L_{web\_proc}$ is web interface processing delay, $L_{mqtt\_up}$ is uplink MQTT delay (~80ms), $L_{rc\_proc}$ is remote controller processing delay, $L_{air\_proc}$ is aircraft flight controller processing delay, and $L_{video\_down}$ is the downlink video latency (~2000ms). The mismatch between low $L_{mqtt\_up}$ and high $L_{video\_down}$ is a key challenge.
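As a rough worked instance of this model, substituting only the two figures quoted above and leaving the three processing terms symbolic (the text does not report them) gives a lower bound:

$$L_{total} \geq L_{mqtt\_up} + L_{video\_down} \approx 80\,\text{ms} + 2000\,\text{ms} = 2080\,\text{ms}$$

Under these figures the video downlink accounts for about $2000 / 2080 \approx 96\%$ of the bound, which suggests that reducing $L_{video\_down}$ would shorten the perceived control loop far more than any further gain on the command channel.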

4.2 Flight Safety Analysis

For UAV operations like surveying and inspection, safety is paramount. The platform enhances safety through geofencing and terrain awareness.

4.2.1 Boundary Breach and Intrusion Detection
Electronic geofences are defined as GeoJSON polygons representing approved flight areas. The backend continuously monitors the drone’s live latitude and longitude from the MQTT stream. A simple point-in-polygon algorithm is used for detection. If the drone’s position $P(x_{lon}, y_{lat})$ falls outside the defined polygon $G$, a breach is detected:

$$\text{Breach} = \begin{cases}
\text{True}, & \text{if } P \notin G \\
\text{False}, & \text{if } P \in G
\end{cases}$$
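The text does not say which point-in-polygon algorithm is used; one common choice is ray casting, sketched below. The sketch treats longitude and latitude as planar coordinates, which is an assumption that is only reasonable for small operating areas, and it checks the exterior ring of the GeoJSON polygon only (holes are ignored). The function names are hypothetical.

```python
def point_in_polygon(lon: float, lat: float, ring) -> bool:
    """Ray casting: count how many polygon edges a ray going east from the point crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def is_breach(lon: float, lat: float, geofence: dict) -> bool:
    """Breach is True when P lies outside G, with G given as a GeoJSON Polygon geometry."""
    exterior_ring = geofence["coordinates"][0]
    return not point_in_polygon(lon, lat, exterior_ring)
```

Points lying exactly on the boundary may be classified either way by this method, so an operational fence would typically be drawn slightly inside the true limit.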

Furthermore, a Digital Surface Model (DSM) of the operational area is integrated. The DSM provides elevation data for surface features (buildings, trees). By comparing the drone’s current altitude $A_{rel}$ (relative to takeoff) with the surface elevation $E_{terrain}$ along its projected path, with both values expressed relative to the same reference (for example, the takeoff point), potential collisions can be predicted. If $E_{terrain} > A_{rel}$ for an upcoming position, a collision warning is triggered.
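A minimal sketch of this altitude comparison follows. It assumes a caller-supplied lookup function that returns the DSM surface elevation at a position, and it converts $A_{rel}$ to the DSM's reference by adding the takeoff elevation; with the default `takeoff_elevation_m=0` and `margin_m=0` the test reduces to the condition $E_{terrain} > A_{rel}$. The function name, the lookup interface, and the safety margin are hypothetical additions.

```python
from typing import Callable, Iterable, List, Tuple

Position = Tuple[float, float]  # (longitude, latitude)


def collision_warnings(
    projected_path: Iterable[Position],
    a_rel_m: float,
    surface_elevation: Callable[[float, float], float],
    takeoff_elevation_m: float = 0.0,
    margin_m: float = 0.0,
) -> List[Position]:
    """Return the upcoming positions where the surface rises above the drone.

    `surface_elevation(lon, lat)` stands in for a DSM raster lookup.
    """
    drone_elevation_m = takeoff_elevation_m + a_rel_m
    return [
        (lon, lat)
        for lon, lat in projected_path
        if surface_elevation(lon, lat) + margin_m > drone_elevation_m
    ]
```

A non-empty result would correspond to triggering the collision warning; the optional margin simply raises the warning earlier than the bare inequality would.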

4.2.2 UAV Braking Verification
In field tests, upon breach detection, a stop command was sent via the control channel. The total latency from breach detection to the drone executing the hover command was 1 to 1.5 s (500 ms to 1 s for command transmission plus roughly 500 ms for the UAV response). The drone successfully achieved a stable hover near the boundary, validating the safety intervention mechanism.
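The practical meaning of this latency can be estimated with a simple relation, in which $d_{overshoot}$, $v$, and $L_{stop}$ are notation introduced here for illustration: the distance travelled past the detection point before the hover command takes effect is approximately the ground speed multiplied by the stop latency, ignoring the additional distance needed to decelerate.

$$d_{overshoot} \approx v \cdot L_{stop}, \qquad 1\,\text{s} \leq L_{stop} \leq 1.5\,\text{s}$$

For an assumed ground speed of $v = 10\,\text{m/s}$ (an illustrative value, not one reported in the tests), this gives $d_{overshoot}$ between 10 m and 15 m, which indicates the order of magnitude by which a geofence might be drawn inside the true operating limit.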

4.3 Flight Log Storage

Using the H5-JSBridge interface, comprehensive flight logs containing detailed attitude parameters, system states, and mission records are retrieved and stored in a structured database. This creates a traceable audit trail. The logs are indexed against specific flight missions and telemetry data, enabling post-mission analysis, reconstruction of flight events, and identification of operational anomalies or risk patterns for future optimization of UAV operations.
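The text does not name the database or its schema. As one possible shape, the sketch below uses SQLite from the Python standard library as a stand-in, with a hypothetical `flight_log` table indexed by mission and timestamp; all table, column, and function names are assumptions.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS flight_log (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    mission_id     TEXT    NOT NULL,
    recorded_at_ms INTEGER NOT NULL,
    record         TEXT    NOT NULL   -- JSON-encoded log entry
);
CREATE INDEX IF NOT EXISTS idx_flight_log_mission
    ON flight_log (mission_id, recorded_at_ms);
"""


def open_log_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_record(conn: sqlite3.Connection, mission_id: str, recorded_at_ms: int, record: dict) -> None:
    """Store one log entry; the `with` block commits on success and rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO flight_log (mission_id, recorded_at_ms, record) VALUES (?, ?, ?)",
            (mission_id, recorded_at_ms, json.dumps(record)),
        )


def mission_records(conn: sqlite3.Connection, mission_id: str) -> list:
    """Return one mission's entries in chronological order, for replay or review."""
    rows = conn.execute(
        "SELECT record FROM flight_log WHERE mission_id = ? ORDER BY recorded_at_ms",
        (mission_id,),
    )
    return [json.loads(row[0]) for row in rows]
```

Indexing on the mission identifier and timestamp together supports the reconstruction use case described above, since a mission's entries can be read back in time order with a single query.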

5. Discussion and Future Outlook

5.1 Existing Limitations

While the platform demonstrates core functionalities, several limitations are acknowledged:
1. Command-Video Latency Mismatch: The disparity between low-latency command channels (~100ms) and higher-latency video streams (~2s) can lead to a disconnect between control input and visual feedback, complicating precise real-time maneuvers.
2. Limited Virtual Control Dimensions: The current virtual joystick offers basic 3-axis flight control but lacks the nuanced control required for highly complex or delicate tasks.
3. Simplistic Safety Algorithms: Safety relies on static geofences and DSM data. It cannot dynamically identify and react to unforeseen moving obstacles or complex environmental hazards.
4. Underutilized Flight Logs: Logs are primarily used for storage and basic review, not yet leveraged for advanced analytics like mission pattern recognition, predictive risk modeling, or automated decision support.

5.2 Research Prospects

5.2.1 Intelligent Route Planning: For applications like railway inspection in China, future work will integrate deep learning to enable autonomous route optimization. The UAV would automatically identify key infrastructure (bridges, tunnels, signals) and adjust its flight path and inspection focus based on priority and detected anomalies.
5.2.2 Enhanced Virtual Control: Future interfaces could incorporate multi-modal interaction, such as voice commands, gesture recognition, or haptic feedback, to overcome the limitations of 2D screen-based joysticks and provide richer, more intuitive control.
5.2.3 AI-Integrated Flight Safety: Moving beyond static geofences, future systems will integrate computer vision models, potentially deployed via DJI’s Onboard SDK (OSDK) to leverage the drone’s own computational resources. This would enable real-time, on-board detection of dynamic obstacles (e.g., birds, other aircraft, unauthorized personnel), so that autonomous avoidance maneuvers would not depend on the latency of a round trip to the cloud.
5.2.4 Flight Log Observability: Adopting observability principles, future research will treat flight logs, metrics, and traces as a unified dataset. Advanced analytics and machine learning can be applied to this data to gain deep insights into system health, predict failures, and optimize overall fleet performance for large-scale UAV operations.

6. Conclusion

This research designed and implemented a web-based UAV flight operation monitoring and control platform utilizing DJI’s Cloud SDK and Payload SDK. By analyzing existing solutions and the SDK ecosystem, we developed a platform tailored for operational scenarios like oblique photography and infrastructure inspection in China. The platform provides four core capabilities: cloud-based telemetry data acquisition, real-time video streaming, synchronized route planning, and web-based virtual joystick control. Experimental tests confirmed that the latency for media streaming, MQTT data transmission, and command control is within acceptable and controllable ranges for effective remote supervision and operation. Furthermore, the platform implements a basic safety framework using geofencing and terrain data to detect boundary breaches and trigger automated hovering. The system demonstrates the feasibility of this approach and provides a reference model for enterprises seeking to develop their own customized, web-centric management and control systems for UAV fleets. Future work will focus on integrating AI technologies to enhance the platform’s intelligence, adaptability, and safety in complex operational environments.
