An Improved ORB-Based Image Registration Method for Police UAVs

In recent years, police UAVs (unmanned aerial vehicles) have become increasingly prevalent in surveillance, disaster response, and law enforcement operations. These UAVs capture aerial images that are large, high in resolution, and rich in information. However, because of limits on flight altitude and camera focal length, a single image often covers only a small field of view, which makes it difficult to form a comprehensive understanding of the scene. Stitching multiple aerial images into a panoramic view is therefore essential. Image registration is the core step in image stitching and directly affects the quality of the final output, so fast and accurate registration is a critical research focus. In this paper, we propose an image registration method based on an improved ORB (Oriented FAST and Rotated BRIEF) algorithm, tailored to police UAV applications. Our approach addresses the uneven distribution and clustering of feature points in traditional ORB, making it more suitable for the high-resolution images captured by police UAVs.

We begin by reviewing existing methods. For instance, some researchers have applied enhanced SIFT algorithms to drone remote sensing image matching, reducing complexity but often yielding fewer matches. Others have used SURF algorithms combined with RANSAC to eliminate mismatches for rapid UAV image stitching. While ORB has been noted for its speed and performance in real-time systems, it suffers from uneven feature point distribution, which is problematic for large police UAV images. To overcome this, we introduce a novel method that constructs a mask in the image to be registered, moves it gradually to detect ORB feature points, applies non-maximal suppression to remove clustered points, and uses PROSAC for match purification and transformation matrix calculation. This ensures uniform feature distribution and higher registration accuracy.

The ORB feature extraction algorithm consists of two main parts: feature detection and feature description. For feature detection, ORB employs the FAST corner detector, extended to provide scale and rotation invariance. It builds an image pyramid to handle scale changes. At each pyramid level, FAST detects corners by comparing pixel intensities on a circle of 16 points around a candidate point $p$. If $n$ contiguous points on the circle are all brighter than $I_p + t$ or all darker than $I_p - t$, where $I_p$ is the intensity of $p$ and $t$ is a threshold, then $p$ is considered a corner. Typically, $n$ is set to 12 or 9. ORB then computes the Harris corner response to select the top $N$ points and assigns an orientation using the intensity centroid method. The moment $m_{pq}$ for a neighborhood $S$ around a feature point is defined as:

$$ m_{pq} = \sum_{x,y} x^p y^q I(x,y) $$

The centroid $C$ is:

$$ C = \left( \frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}} \right) $$

And the orientation $\theta$ is:

$$ \theta = \text{atan2}(m_{01}, m_{10}) $$
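The three formulas above can be illustrated with a minimal sketch. It assumes a small square patch stored as a list of rows and measures coordinates from the patch center; the function name `intensity_centroid_orientation` and the sample patch are hypothetical, and a square neighborhood is used only to keep the example short.

```python
import math

def intensity_centroid_orientation(patch):
    """Return (centroid, theta) for a square patch given as a list of rows.

    Coordinates are measured from the patch center, so theta is the angle
    of the vector from the center to the intensity centroid.
    """
    half = len(patch) // 2
    m00 = m10 = m01 = 0.0
    for row_idx, row in enumerate(patch):
        y = row_idx - half
        for col_idx, value in enumerate(row):
            x = col_idx - half
            m00 += value          # zeroth-order moment
            m10 += x * value      # first-order moment in x
            m01 += y * value      # first-order moment in y
    centroid = (m10 / m00, m01 / m00)
    theta = math.atan2(m01, m10)
    return centroid, theta

# Hypothetical 5x5 patch that is brighter on its right-hand side.
patch = [[0, 0, 0, 10, 10] for _ in range(5)]
centroid, theta = intensity_centroid_orientation(patch)
```

For this patch the centroid lies at (1.5, 0.0) relative to the center, so the orientation is 0 radians, pointing toward the brighter side.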

For feature description, ORB uses the rBRIEF descriptor, which is a binary string obtained by comparing intensities of point pairs. Given a neighborhood $p$ of size 31×31, the binary test function is:

$$ \tau(p; x, y) = \begin{cases} 1, & \text{if } p(x) < p(y) \\ 0, & \text{otherwise} \end{cases} $$

For $n$ point pairs, the descriptor is:

$$ f_n(p) = \sum_{1 \leq i \leq n} 2^{i-1} \tau(p; x_i, y_i) $$

To achieve rotation invariance, ORB applies a rotation matrix $R_\theta$ to the point pair matrix $S$, resulting in $S_\theta = R_\theta S$. The oriented descriptor is:

$$ g_n(p, \theta) = f_n(p) | (x_i, y_i) \in S_\theta $$
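As a rough sketch of how the binary tests and the rotation fit together, the following example rotates each point pair by $\theta$ and packs the test results into an integer. The function name `steered_brief`, the three point pairs, and the tiny 5×5 patch are all hypothetical; the sketch assumes every rotated offset stays inside the patch and rounds to the nearest pixel.

```python
import math

def steered_brief(patch, pairs, theta):
    """Pack rotated binary tests into an integer descriptor.

    patch: square list of rows; pairs: list of ((x1, y1), (x2, y2)) offsets
    from the patch center; theta: keypoint orientation in radians.
    """
    half = len(patch) // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def sample(offset):
        x, y = offset
        # Rotate the offset by theta, then round to the nearest pixel.
        xr = int(round(cos_t * x - sin_t * y))
        yr = int(round(sin_t * x + cos_t * y))
        return patch[half + yr][half + xr]

    descriptor = 0
    for i, (a, b) in enumerate(pairs):
        if sample(a) < sample(b):      # binary test tau equals 1
            descriptor |= 1 << i       # weight 2^(i-1) for 1-based index i
    return descriptor

# Hypothetical patch whose intensity increases from left to right.
patch = [[c for c in range(5)] for _ in range(5)]
pairs = [((-1, 0), (1, 0)), ((1, 0), (-1, 0)), ((0, -1), (0, 1))]
d0 = steered_brief(patch, pairs, 0.0)
d180 = steered_brief(patch, pairs, math.pi)
```

With this patch, rotating the sampling pattern by 180 degrees changes the descriptor from `0b001` to `0b010`, which shows why the pattern has to be aligned with the keypoint orientation before the tests are evaluated.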

ORB uses greedy search to select 256 point pairs with high variance and low correlation, optimizing discriminability. This makes ORB fast and efficient, but as noted, feature points tend to cluster in central regions of images, which is suboptimal for police UAV images that require uniform coverage.

To address this, we propose an improved ORB method. The key idea is to ensure uniform feature point distribution across the entire image, which is crucial for accurate registration in police UAV applications. Our method involves three main steps: mask-based feature detection, non-maximal suppression, and feature description. First, we construct a mask of size $m \times n$ in the image to be registered, where $m = M/4$ and $n = N/4$ for an image of size $M \times N$. This mask is moved gradually across the image with a step size of $3m/4$ horizontally and $3n/4$ vertically. At each position, we apply the ORB algorithm to detect feature points within the mask. This ensures that feature points are extracted from all regions, including edges, which are often neglected in standard ORB. For police UAV images, this is vital because scenes may contain critical details in peripheral areas.
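The mask traversal can be sketched as follows. This is a minimal illustration under stated assumptions: integer division is used for the mask size and stride, and a final clamped position is appended when the strides do not land exactly on the image border (the text does not specify the border handling). The names `mask_origins` and `mask_windows` are hypothetical.

```python
def mask_origins(length, mask, stride):
    """Top-left coordinates of a sliding mask along one image axis."""
    origins = list(range(0, length - mask + 1, stride))
    # Assumption: add a clamped last position if the border is not covered.
    if origins[-1] + mask < length:
        origins.append(length - mask)
    return origins

def mask_windows(M, N):
    """Return (u, v, m, n) windows with m = M/4, n = N/4 and 3/4 strides."""
    m, n = M // 4, N // 4
    s_x, s_y = (3 * m) // 4, (3 * n) // 4
    return [(u, v, m, n)
            for u in mask_origins(M, m, s_x)
            for v in mask_origins(N, n, s_y)]

windows = mask_windows(3840, 2160)
```

For a 3840×2160 image this yields 25 overlapping windows of 960×540 pixels. The last window starts at (2880, 1620), so the right and bottom borders are covered.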

However, even with mask-based detection, feature points can still cluster. Therefore, we apply non-maximal suppression to remove redundant points. For each feature point, we identify all other points within its neighborhood and sort them based on their Harris corner response values, which are already computed during ORB detection. We retain only the point with the highest response in each neighborhood. This process reduces clustering without adding significant computational overhead, preserving the speed advantages of ORB. The algorithm can be summarized as follows:

Initialize an empty list for final feature points.
For each feature point $p_i$ in the detected set:
- Find all points $p_j$ within a specified radius (e.g., based on image dimensions).
- Sort $p_j$ by Harris response $H(p_j)$.
- If $H(p_i)$ is the maximum, add $p_i$ to the final list; otherwise, discard it.

Mathematically, for a point $p$ with neighborhood $N(p)$, we keep $p$ if:

$$ H(p) = \max_{q \in N(p)} H(q) $$
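A minimal sketch of this suppression rule is given below. It assumes feature points are stored as `(x, y, response)` tuples and uses a brute-force neighbor search for brevity; the function name `suppress_non_maxima`, the radius, and the sample points are hypothetical. Points with tied responses are all kept, matching the non-strict maximum in the condition above.

```python
def suppress_non_maxima(points, radius):
    """Keep points whose Harris response is the maximum within `radius`.

    points: list of (x, y, response) tuples. Brute-force O(n^2) search,
    which is fine for a sketch; a spatial index would scale better.
    """
    r2 = radius * radius
    kept = []
    for i, (xi, yi, hi) in enumerate(points):
        is_max = True
        for j, (xj, yj, hj) in enumerate(points):
            if i == j:
                continue
            # A stronger neighbor inside the radius suppresses point i.
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2 and hj > hi:
                is_max = False
                break
        if is_max:
            kept.append((xi, yi, hi))
    return kept

# Hypothetical cluster of three points plus one isolated point.
points = [(10, 10, 0.9), (12, 11, 0.5), (11, 9, 0.7), (100, 100, 0.2)]
kept = suppress_non_maxima(points, radius=5)
```

Only the strongest point of the cluster survives. The isolated point is also kept, even though its response is low, because it has no stronger neighbor.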

After suppression, we compute the orientation and descriptor for the remaining points using the standard ORB descriptor, as described earlier. This yields a set of well-distributed feature points suitable for police UAV image registration.

Next, we perform feature matching using Hamming distance. For each feature point in the reference image, we find the nearest and second-nearest neighbors in the target image based on Hamming distance between their binary descriptors. Let $d_1$ and $d_2$ be the shortest and second-shortest Hamming distances, respectively. We consider a match valid if $d_1 < 50$ and the ratio $d_1 / d_2 < 0.7$. This ratio test helps reduce false matches. The Hamming distance between two binary strings $A$ and $B$ of length $L$ is defined as:

$$ D_H(A, B) = \sum_{i=1}^{L} (A_i \oplus B_i) $$

where $\oplus$ denotes the XOR operation. For police UAV images, which may have repetitive textures, this matching strategy enhances reliability.
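The matching rule can be sketched as follows, assuming descriptors are stored as Python integers. The function names and the 8-bit toy descriptors are hypothetical; real ORB descriptors are 256 bits long, and the thresholds 50 and 0.7 are the ones stated above.

```python
def hamming(a, b):
    """Hamming distance between two descriptors stored as Python ints."""
    return bin(a ^ b).count("1")

def match_descriptors(ref, tgt, max_dist=50, max_ratio=0.7):
    """Return (ref_index, tgt_index, d1, d2) for matches passing both tests."""
    matches = []
    for i, d_ref in enumerate(ref):
        # Rank target descriptors by distance and take the two nearest.
        ranked = sorted((hamming(d_ref, d_tgt), j) for j, d_tgt in enumerate(tgt))
        if len(ranked) < 2:
            continue
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        # Guard against division by zero when the second distance is 0.
        if d1 < max_dist and d2 > 0 and d1 / d2 < max_ratio:
            matches.append((i, j1, d1, d2))
    return matches

# Hypothetical 8-bit descriptors, used only to keep the example readable.
ref = [0b10110010, 0b00000000]
tgt = [0b10110011, 0b01001101, 0b11110000]
matches = match_descriptors(ref, tgt)
```

The first reference descriptor is accepted with $d_1 = 1$ and $d_2 = 2$ (ratio 0.5). The second is rejected because its two nearest neighbors are equally distant, which is the ambiguous case the ratio test is meant to filter out.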

To further purify matches, we use the PROSAC (Progressive Sample Consensus) algorithm, an improvement over RANSAC. PROSAC sorts matches by quality, defined here by the ratio $d_1 / d_2$ (a smaller ratio indicates a better match), and progressively samples from the higher-quality data to estimate the transformation matrix. This increases robustness and computational efficiency. The transformation between two images is represented by a homography matrix $H$, which accounts for translation, rotation, scaling, and perspective distortion. For a pair of matching points $p_1(x, y)$ and $p_2(x', y')$, the relationship is:

$$ \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} a_0 & a_1 & a_2 \\ a_3 & a_4 & a_5 \\ a_6 & a_7 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$

where $\sim$ denotes equality up to a nonzero scale factor in homogeneous coordinates.

We randomly select 4 matches from the current top-ranked subset of quality-sorted pairs to compute $H$, then identify inliers using a reprojection error threshold. The process iterates until a maximum number of iterations is reached and returns the model with the most inliers. This supports accurate registration even for police UAV images with geometric distortions.
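The two ingredients of this loop, drawing samples from a growing pool of top-ranked matches and counting inliers by reprojection error, can be sketched as below. This is a simplified illustration, not the full PROSAC procedure: the pool here grows by one match per iteration, which is an assumption standing in for PROSAC's actual growth schedule, and all function names are hypothetical.

```python
import random

def project(H, point):
    """Apply a 3x3 homography (list of rows) to a 2D point."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def inliers(H, matches, threshold):
    """Indices of matches whose reprojection error is below `threshold`."""
    good = []
    for idx, (p1, p2) in enumerate(matches):
        px, py = project(H, p1)
        err = ((px - p2[0]) ** 2 + (py - p2[1]) ** 2) ** 0.5
        if err < threshold:
            good.append(idx)
    return good

def progressive_samples(num_matches, iterations, sample_size=4, seed=0):
    """Yield index samples from a pool of top-ranked matches.

    Assumption: the pool grows linearly, one match per iteration.
    Matches are assumed to be sorted best-first before sampling.
    """
    rng = random.Random(seed)
    pool = sample_size
    for _ in range(iterations):
        yield rng.sample(range(pool), sample_size)
        pool = min(pool + 1, num_matches)

# Hypothetical data: a pure translation and three matches, one of them wrong.
H_shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
matches = [((0, 0), (5, -2)), ((10, 10), (15, 8)), ((3, 4), (40, 40))]
good = inliers(H_shift, matches, threshold=3.0)
samples = list(progressive_samples(num_matches=10, iterations=5))
```

In a full loop, each sample would be passed to a homography solver and the resulting model scored with `inliers`. Early samples come only from the best-ranked matches, which is what lets a good model appear after few iterations.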

We conducted experiments to evaluate our method using standard datasets from the Oxford Visual Geometry Group, which include images with scale and rotation, blur, viewpoint, and illumination changes. We compared our improved ORB with SIFT, SURF, and standard ORB algorithms in terms of matching accuracy and registration time. Matching accuracy is defined as the ratio of correct matches to total matches, and registration time includes feature extraction, matching, and purification. For police UAV applications, both accuracy and speed are critical.

For scale and rotation changes, our method achieved a matching rate of 0.9608, outperforming ORB (0.8824), SURF (0.7209), and SIFT (0.8475). The results are summarized in Table 1.

| Algorithm | Total Matches | Correct Matches | Matching Rate |
| --- | --- | --- | --- |
| SIFT | 2827 | 2396 | 0.8475 |
| SURF | 1218 | 878 | 0.7209 |
| ORB | 255 | 225 | 0.8824 |
| Improved ORB | 637 | 612 | 0.9608 |

For blur changes, our method maintained a high accuracy of 0.9643, while ORB scored 0.9309, SURF 0.8486, and SIFT 0.7120. This demonstrates the robustness of our approach to image degradation common in police UAV footage due to motion or environmental factors.

| Algorithm | Total Matches | Correct Matches | Matching Rate |
| --- | --- | --- | --- |
| SIFT | 1052 | 749 | 0.7120 |
| SURF | 865 | 734 | 0.8486 |
| ORB | 333 | 310 | 0.9309 |
| Improved ORB | 729 | 703 | 0.9643 |

For viewpoint changes, our method achieved a matching rate of 0.9131, compared to ORB’s 0.8835, SURF’s 0.6694, and SIFT’s 0.8150. This highlights its adaptability to perspective variations often encountered by police UAVs flying at different angles.

| Algorithm | Total Matches | Correct Matches | Matching Rate |
| --- | --- | --- | --- |
| SIFT | 1281 | 1044 | 0.8150 |
| SURF | 735 | 492 | 0.6694 |
| ORB | 266 | 235 | 0.8835 |
| Improved ORB | 495 | 452 | 0.9131 |

For illumination changes, our method scored 0.9527, a significant improvement over ORB’s 0.8436, SURF’s 0.8300, and SIFT’s 0.8801. Police UAVs often operate in varying lighting conditions, making this enhancement particularly valuable.

| Algorithm | Total Matches | Correct Matches | Matching Rate |
| --- | --- | --- | --- |
| SIFT | 1309 | 1152 | 0.8801 |
| SURF | 853 | 708 | 0.8300 |
| ORB | 275 | 232 | 0.8436 |
| Improved ORB | 930 | 886 | 0.9527 |

In terms of registration time, we averaged the times across all four change types. ORB was the fastest at 1.139 seconds, followed by our improved ORB at 3.141 seconds, SURF at 5.746 seconds, and SIFT at 15.636 seconds. While our method is slower than ORB, it remains faster than SIFT and SURF, and the trade-off is justified by the higher accuracy for police UAV images.

| Algorithm | Average Time (seconds) |
| --- | --- |
| SIFT | 15.636 |
| SURF | 5.746 |
| ORB | 1.139 |
| Improved ORB | 3.141 |

To further validate our method, we applied it to real police UAV aerial images captured during a surveillance operation. The images had a resolution of 3840×2160 pixels. Compared to standard ORB, our method produced more uniformly distributed feature points and a greater number of correct matches, leading to superior registration results. This is essential for stitching high-resolution police UAV imagery into cohesive panoramas for situational awareness.

The improved ORB procedure can be stated more formally as follows. Let $I$ be the input image of size $M \times N$. We define the mask $W$ of size $m \times n$ where $m = \lfloor M/4 \rfloor$ and $n = \lfloor N/4 \rfloor$. The mask moves across $I$ with strides $s_x = \lfloor 3m/4 \rfloor$ and $s_y = \lfloor 3n/4 \rfloor$. At each position $(u,v)$, the sub-image $I_{u,v}$ is extracted as:

$$ I_{u,v}(x,y) = I(u + x, v + y) \quad \text{for } 0 \leq x < m, 0 \leq y < n $$

Feature detection is applied to $I_{u,v}$ using ORB, yielding a set of points $P_{u,v}$. After processing all positions, the union set $P = \bigcup_{u,v} P_{u,v}$ is obtained. Non-maximal suppression is then applied globally. For each point $p \in P$, we define a circular neighborhood of radius $r$, typically set based on image scale. The suppression condition is:

$$ p \text{ is retained iff } H(p) > H(q) \quad \forall q \in (N(p) \cap P) \setminus \{p\} $$

where $H(\cdot)$ is the Harris response. This ensures sparsity and uniformity.

For feature matching, the Hamming distance ratio test can be analyzed probabilistically. Let $D$ be the random variable representing the Hamming distance between two descriptors. Assuming the bits differ independently and with the same probability, $D$ follows a binomial distribution. The ratio test threshold of 0.7 is derived from empirical studies to balance precision and recall. For police UAV images, this threshold can be adjusted based on image content, but in our experiments the default value sufficed.
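As a rough illustration of the binomial model, suppose (as an assumption, not a result from our experiments) that two unrelated 256-bit descriptors differ in each bit independently with probability $1/2$. Then:

$$ D \sim \mathrm{Bin}\left(256, \tfrac{1}{2}\right), \quad E[D] = 256 \cdot \tfrac{1}{2} = 128, \quad \sigma_D = \sqrt{256 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}} = 8 $$

Under this idealized model, the absolute threshold $d_1 < 50$ lies $(128 - 50)/8 = 9.75$ standard deviations below the mean distance between unrelated descriptors, so a chance match passing it would be very unlikely. Real descriptor bits are not perfectly independent, so this should be read as an indication only.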

The PROSAC algorithm improves upon RANSAC by prioritizing high-quality matches. Let $\mathcal{M}$ be the set of matches sorted by increasing ratio $r_i = d_{1,i} / d_{2,i}$. PROSAC initially samples from the top $t$ matches, where $t$ grows progressively. Because a match drawn from the top $t$ is more likely to be an inlier than one drawn from the whole set, a good model tends to be found in fewer iterations. The homography estimation involves solving a linear system. For a match $(p_1, p_2)$, the equations are:

$$ x' = \frac{a_0 x + a_1 y + a_2}{a_6 x + a_7 y + 1}, \quad y' = \frac{a_3 x + a_4 y + a_5}{a_6 x + a_7 y + 1} $$

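Multiplying through by the denominator linearizes these equations into the form $Ah = 0$, where $h$ is the vector of homography parameters. Solving this system in the least-squares sense, with the point coordinates normalized beforehand, improves numerical stability.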
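For the minimal case of four matches, the linearization can be sketched as below. This sketch makes a simplifying assumption: because the bottom-right entry of $H$ is fixed to 1, the system is written in the equivalent inhomogeneous form with eight unknowns $a_0, \dots, a_7$ and solved directly by Gaussian elimination, without coordinate normalization. The function names `solve_linear` and `homography_from_4` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def homography_from_4(matches):
    """Estimate H (bottom-right entry fixed to 1) from four matches."""
    A, b = [], []
    for (x, y), (xp, yp) in matches:
        # x' * (a6 x + a7 y + 1) = a0 x + a1 y + a2, rearranged for a0..a7
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        # y' * (a6 x + a7 y + 1) = a3 x + a4 y + a5
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    a = solve_linear(A, b)
    return [a[0:3], a[3:6], [a[6], a[7], 1.0]]

# Hypothetical matches generated by x' = 2x + 5, y' = 2y - 2.
matches = [((0, 0), (5, -2)), ((100, 0), (205, -2)),
           ((100, 100), (205, 198)), ((0, 100), (5, 198))]
H_est = homography_from_4(matches)
```

With more than four matches, the same rows would be stacked into an overdetermined system and solved in the least-squares sense.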

In practice, implementing our method for police UAVs requires consideration of computational resources. Police UAVs often have onboard processors with limited power, so efficiency is key. Our improved ORB maintains low computational complexity. The mask-based detection adds a factor related to the number of mask positions, which is approximately $\lceil (M - m)/s_x + 1 \rceil \times \lceil (N - n)/s_y + 1 \rceil$. For typical police UAV images, this is manageable. Moreover, the non-maximal suppression step has complexity $O(|P| \log |P|)$ if implemented with spatial data structures like kd-trees, but since $|P|$ is controlled, it is efficient.
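For example, taking the 3840×2160 resolution of the aerial images used in our experiments, with $M = 3840$ and $N = 2160$ (this assignment of axes is an assumption made for illustration), the count works out as:

$$ m = 960, \quad n = 540, \quad s_x = 720, \quad s_y = 405 $$

$$ \left\lceil \frac{3840 - 960}{720} + 1 \right\rceil \times \left\lceil \frac{2160 - 540}{405} + 1 \right\rceil = 5 \times 5 = 25 $$

Each mask covers $1/16$ of the image, so the 25 positions process about $25/16 \approx 1.56$ times the image area in total. The extra 56% is the overhead introduced by the overlap between adjacent masks.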

We also explored the impact of different mask sizes on performance. Through experiments, we found that $m = M/4$ and $n = N/4$ provide a good balance between coverage and computational cost. Smaller masks increase the number of positions but may detect fewer points per mask, while larger masks reduce positions but risk missing details. For police UAV images, which often have wide fields of view, this choice ensures thorough sampling.

Another aspect is the integration of temporal information. Police UAVs frequently capture video sequences, so registration can benefit from frame-to-frame consistency. Our method can be extended to video by using previous frames to predict feature locations, reducing search space. However, in this paper, we focus on single image pairs, as the foundation for more complex workflows.

The robustness of our method to noise is also noteworthy. Police UAV images may suffer from compression artifacts, sensor noise, or weather effects. The ORB descriptor, being binary, is somewhat resilient to noise, and our improvements enhance this further. The non-maximal suppression helps eliminate spurious points caused by noise, as they tend to have lower Harris responses.

For large-scale deployment, our algorithm can be parallelized. The mask-based detection is inherently parallel, as each mask position can be processed independently. This aligns well with GPU architectures, potentially speeding up registration for real-time police UAV applications. We envision integration into ground control stations where multiple images are stitched on-the-fly.

In conclusion, our improved ORB-based image registration method offers significant advantages for police UAVs. By ensuring uniform feature distribution through mask-based detection and non-maximal suppression, and by using PROSAC for robust matching, we achieve higher matching accuracy under all four tested conditions: scale and rotation, blur, viewpoint, and illumination changes. Although it is roughly 2.8 times slower than standard ORB (3.141 s versus 1.139 s on average), it remains faster than SIFT and SURF, making it a practical choice for high-resolution police UAV imagery. Future work may involve adapting the method for real-time video stitching and incorporating deep learning techniques for further enhancement. The versatility of this approach supports the growing reliance on police UAVs for critical operations, enabling better situational awareness through seamless image integration.