International Journal of Soft Computing

Year: 2011
Volume: 6
Issue: 3
Page No. 40 - 45

Fast Inter Mode Decision Algorithm for H.264/AVC Video Encoding System Based on Correlation of Macroblock

Authors : Byung-Gyu Kim, Badrul Hilmi and Hyo-Sung Kim

Abstract: To more effectively reduce temporal and spatial redundancy in the MPEG-4 Part-10 AVC/H.264, motion compensation uses variable block sizes and directional inter prediction investigates all available coding modes to determine the best one. However, because these functions are performed for all variable block sizes, the large number of combinations of coding modes results in high complexity. The researchers propose a fast inter mode decision to reduce the number of combinations of candidate modes using predictive data from macroblock correlations of adjacent frames. An adaptive thresholding scheme using the rate-distortion costs of neighboring blocks is applied for an early decision of the best macroblock in the inter-mode search process.

How to cite this article:

Byung-Gyu Kim, Badrul Hilmi and Hyo-Sung Kim, 2011. Fast Inter Mode Decision Algorithm for H.264/AVC Video Encoding System Based on Correlation of Macroblock. International Journal of Soft Computing, 6: 40-45.


In H.264/AVC, there are a total of 7 different block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4) that can be used for inter frame motion estimation/compensation (Wiegand et al., 2003). These different block sizes actually form a two-level hierarchy inside the MB. The first level comprises block sizes of 16x16, 16x8 or 8x16 and is called the Large Macroblock type. In the second level, the MB is specified as the P8x8 type where each 8x8 block can be one of the subtypes 8x8, 8x4, 4x8 or 4x4. These are referred to as the Sub-Macroblock type. The relationship between these different block sizes is shown in Fig. 1.

In addition, many natural video sequences contain stationary regions. In the H.264/AVC encoder, the RD cost to decide the best inter-prediction mode is computed as follows (Sullivan and Wiegand, 1998):

$$J(s, c, \mathrm{MODE} \mid QP, \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{MODE} \mid QP) + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{MODE} \mid QP)$$

where QP is the quantization parameter, λMODE is the Lagrangian multiplier, SSD is the sum of the squared differences between the original block s and its reconstruction c, and R(s, c, MODE | QP) is the number of bits associated with the mode currently selected for the MB.

Fig. 1: Different partitions in the MB; (a) Macroblock partition, (b) Sub-Macroblock partition

To find the best coding parameters for each macroblock, H.264/AVC reference software encodes all possible combinations of parameters and calculates the rate and distortion of a given macroblock for each combination. Thus, the encoder computes the RD costs of all possible coding options and chooses the coding mode of a given macroblock that has the minimum RD cost. This optimal decision causes the problem of a high degree of computational complexity because for each macroblock, the encoder must have information about the required bits and the resulting distortion of the current coding mode to choose the best coding mode. This information is available only after finishing the encoding process. Therefore, the current H.264/AVC reference software performs this complex process to find the best coding mode.
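The exhaustive decision described above can be illustrated with a minimal Python sketch. It assumes each candidate mode has already been trial-encoded, so its distortion (SSD) and bit count are known. The `ModeTrial` class and the function names are hypothetical, and the multiplier formula 0.85 · 2^((QP − 12)/3) is a commonly used choice for H.264 encoders that is assumed here, since this article does not state it.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModeTrial:
    """Result of trial-encoding one macroblock with one candidate mode."""
    mode: str    # hypothetical label, e.g. "SKIP" or "16x16"
    ssd: float   # sum of squared differences, original vs. reconstruction
    bits: int    # bits spent on the macroblock in this mode


def lagrange_multiplier(qp: int) -> float:
    # Assumption (not stated in this article): lambda = 0.85 * 2^((QP - 12) / 3).
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def rd_cost(trial: ModeTrial, qp: int) -> float:
    # Lagrangian cost: distortion plus lambda-weighted rate.
    return trial.ssd + lagrange_multiplier(qp) * trial.bits


def best_mode(trials: Iterable[ModeTrial], qp: int) -> ModeTrial:
    # Exhaustive decision: every candidate must already have been encoded.
    return min(trials, key=lambda t: rd_cost(t, qp))


# Made-up numbers for illustration only.
trials = [ModeTrial("SKIP", 1000.0, 1), ModeTrial("16x16", 400.0, 40)]
print(best_mode(trials, qp=24).mode)  # 16x16
print(best_mode(trials, qp=36).mode)  # SKIP
```

With these made-up numbers, the larger multiplier at QP = 36 penalizes the 40 bits of the 16x16 trial heavily enough that SKIP wins. The sketch also makes the cost of the exhaustive approach visible: `best_mode` can only run after every candidate has been fully encoded.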

H.264/AVC also uses a 4x4 integer transform in which the RD cost of a macroblock is computed 16 times under the assumption that the RD cost is computed on 4x4 block units. For 8x8 sub-partitions, each sub-block can be motion-compensated independently with the variable block sizes of 8x8, 8x4, 4x8 and 4x4. Thus, for each 8x8 sub-block the RD cost is computed to determine the best block mode.

To choose the minimum RD cost of an 8x8 sub-partition, the RD cost of each sub-partition is calculated 16 times. Thus, the RD cost is computed 64 times for an 8x8 sub-partition mode for a given macroblock. In total, the RD cost is calculated 128 times for all variable block size modes of inter macroblocks. A crucial consideration for application of mode prediction for a fast mode decision is to make sure that the predicted mode has the smallest RD cost for a given MB. However, during experiments on various video sequences using H.264/AVC video coding, we observed that homogeneous regions prevail in video sequences. Thus, many parts of a video sequence only require a search with a large block type, or even only the SKIP mode, so using the sub-macroblock type in the search process is not necessary. In such regions, use of the sub-macroblock type wastes computational time and effort.

Many fast mode decision algorithms have been proposed to reduce the computational complexity without any degradation in image quality. Jeon’s (Choi et al., 2006; Hilmi et al., 2010) selective intra-coding and early SKIP is based on the Average sum Boundary Error (ABE) and the Average Rate (AR). The fast inter-mode decision scheme of Jing and Chau (2004) uses both the frame difference and the MB difference. The method of Wu et al. (2005) uses the spatial homogeneity of a video object’s textures and the temporal stationary characteristics in video sequences. Salgado’s (Sullivan and Wiegand, 1998) method uses temporal correlation and a sequential mode search, achieving a high speed-up ratio in encoding time for the baseline profile. The efficient thresholding scheme of Kim (Zeng et al., 2009; Badrul Hilmi et al., 2010) for early termination of the mode search is based on the RD cost of the most correlated MB. The cost can be found using a simple MB tracking scheme with a P-16x16 block type in the previous frame. Chia’s (Grecos and Yang, 2007) hierarchical decision method reduces the number of search points in the transform process.

Kim and Kim (2008) and Kim et al. (2010) proposed two algorithms for a fast mode decision. The first algorithm reduces the inter-mode decision complexity using a direct prediction based on block correlation. The second algorithm, intended for B-picture coding, groups candidate modes that have similar average SAD values. The algorithm of Grecos and Yang (2007) uses two heuristics to predict a small set of decidable modes and then applies a set of skip mode conditions for P and B slices.

Zeng et al. (2009) proposed an algorithm based on the motion activity of macroblocks, which were divided into classes ranging from no motion to highly textured regions with fast motion or a scene change. Liu et al. (2009) proposed an efficient inter-mode decision based on motion homogeneity, evaluated on a normalized motion vector field obtained from motion estimation with a 4x4 block size. Using three directional motion homogeneity measures derived from the normalized motion vector field, candidate inter-modes for each macroblock are determined, so the RD cost computation for the other modes becomes unnecessary, thus reducing calculations.

In this study, we propose a fast inter-mode decision to reduce the number of combinations of candidate modes using predictive data from macroblock correlations of the adjacent frame.

An adaptive thresholding scheme using the rate-distortion costs of neighboring blocks is applied for an early decision of the best macroblock in the inter-mode search process.


Early termination of candidate modes using a co-located macroblock: Strong correlations between adjacent inter-frames exist in video sequences. Some algorithms were proposed based on information from a co-located macroblock. Experiments have shown that a co-located macroblock is highly correlated with the current macroblock (Table 1).

Table 1: The conditional probabilities of the current MB with the previous MB (%)

Thus, correlations are especially strong for the SKIP (mode 0), 16x16 (mode 1) and P8x8 (mode 8) modes. To determine the initial search mode, the best mode information of a correlated MB in a time-successive frame is used. This direct prediction algorithm only uses a large size block mode (Modes 0, 1, 2, 3). Otherwise in the case of a sub-macroblock, a full mode search is used. This direct prediction algorithm for the initial search can be described as (Initial mode search):

Case 0: If MBcorrelated is SKIP (mode 0), set the initial MBcurrent to SKIP (mode 0)
Case 1: If MBcorrelated is 16x16 (mode 1), set the initial MBcurrent to SKIP (mode 0), 16x16 (mode 1)
Case 2: If MBcorrelated is 16x8 (mode 2), set the initial MBcurrent to SKIP (mode 0), 16x8 (mode 2)
Case 3: If MBcorrelated is 8x16 (mode 3), set the initial MBcurrent to SKIP (mode 0), 8x16 (mode 3)
Case 4: If MBcorrelated is a sub-block type, set the initial MBcurrent to a full mode search
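The five cases above amount to a small lookup from the best mode of the correlated MB to the initial candidate set of the current MB. The following Python sketch illustrates this. The mode numbers follow the text (SKIP = 0, 16x16 = 1, 16x8 = 2, 8x16 = 3, P8x8 = 8), while the function name and the representation of a full mode search as a tuple of all inter modes are assumptions made for illustration.

```python
from typing import Tuple

# Mode numbers as used in the text.
SKIP, P16X16, P16X8, P8X16, P8X8 = 0, 1, 2, 3, 8

# Assumption: a "full mode search" is represented as all inter modes,
# with P8x8 standing for the whole sub-macroblock search.
FULL_SEARCH: Tuple[int, ...] = (SKIP, P16X16, P16X8, P8X16, P8X8)


def initial_candidates(correlated_mode: int) -> Tuple[int, ...]:
    """Return the initial candidate modes for the current MB."""
    if correlated_mode == SKIP:
        return (SKIP,)                    # Case 0
    if correlated_mode in (P16X16, P16X8, P8X16):
        return (SKIP, correlated_mode)    # Cases 1 to 3
    return FULL_SEARCH                    # Case 4: sub-block type


print(initial_candidates(P16X8))  # (0, 2)
```

Only large block types lead to a reduced candidate set. Any sub-macroblock type in the correlated MB falls back to the full search, so the prediction never removes modes in highly textured or fast-moving regions.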

The early termination method to determine the initial search mode is based on the experimental results in Table 1. In Case 0, if the correlated MB is encoded in SKIP mode, the probability that the current MB will be encoded in SKIP mode is 70.15%. Based on this high probability we can predict the candidate search mode.

Macroblock correlation and adaptive threshold of error correction: Early termination of the candidate mode decision process will cause a significant speed up but when we only use this process the image quality will decrease. To maintain good image quality we need an additional process to determine whether an additional search mode is needed. The researchers use an adaptive thresholding scheme based on the rate-distortion cost of the correlated MB.

Generally, if an object’s motion increases between successive macroblocks, the RD cost also increases and vice versa. The RD cost thus reflects the motion characteristics of the macroblock. For example, if the RD cost of the current macroblock is less than the RD cost of the co-located macroblock, the researchers can assume that the current macroblock contains less detail than the co-located macroblock and that the probability that the current MB is encoded in the same mode as the co-located macroblock is high.

Fig. 2: Correlations with the current macroblock

Therefore, in this process, if the rate-distortion cost of the current macroblock is higher than the RD cost of the co-located macroblock, the current macroblock requires an additional search mode. This process, which uses properties of the neighboring macroblocks, is shown in Fig. 2.

This additional search mode is divided into two groups. The first is the SKIP and 16x16 modes and the second is the 16x8 and 8x16 modes. For SKIP and 16x16, if the left MB and the top MB are a large block size then the additional search mode will be a Large Block type. Otherwise, the Fullsearch mode is used. In the next case for 16x8 and 8x16, we use the average of the RD costs of the co-located macroblock. The researchers check whether the left macroblock (current frame) and top macroblock (current frame) require an additional process. Then, for the 16x8 mode, if the top and left MBs are 16x8, then the additional search mode is SKIP, 16x16 or 16x8. The current RD cost is compared with the average RD cost and if it is still higher then the fullsearch mode is applied. This process is also used for the 8x16 mode. This additional search mode can be described as follows:
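A minimal Python sketch of the threshold check and the additional search rule, based only on the prose above, is given here for illustration. Several points are assumptions rather than statements from the article: the function names are hypothetical, the 8x16 case is taken to mirror the 16x8 case, a full search is assumed when the neighbors do not match the 16x8 or 8x16 prediction, and the average RD cost is passed in as a value because its exact computation is not spelled out.

```python
from typing import Tuple

SKIP, P16X16, P16X8, P8X16, P8X8 = 0, 1, 2, 3, 8
LARGE_MODES: Tuple[int, ...] = (SKIP, P16X16, P16X8, P8X16)
FULL_SEARCH: Tuple[int, ...] = LARGE_MODES + (P8X8,)


def needs_additional_search(current_rd: float, colocated_rd: float) -> bool:
    # Adaptive threshold: the co-located MB's RD cost acts as the threshold.
    return current_rd > colocated_rd


def additional_candidates(predicted: int, left: int, top: int) -> Tuple[int, ...]:
    """Candidate modes for the additional search (first stage)."""
    if predicted in (SKIP, P16X16):
        # Group 1: widen to large block types if both neighbors are large.
        if left in LARGE_MODES and top in LARGE_MODES:
            return LARGE_MODES
        return FULL_SEARCH
    if predicted in (P16X8, P8X16):
        # Group 2: both neighbors share the predicted partition shape.
        if left == predicted and top == predicted:
            return (SKIP, P16X16, predicted)
        return FULL_SEARCH  # assumption: this case is not specified in the text
    return FULL_SEARCH


def escalate_to_full_search(current_rd: float, average_rd: float) -> bool:
    # Second stage for group 2: still above the average RD cost.
    return current_rd > average_rd


if needs_additional_search(current_rd=1200.0, colocated_rd=1000.0):
    print(additional_candidates(P16X8, left=P16X8, top=P16X8))  # (0, 1, 2)
```

Separating the first-stage candidate set from the escalation test keeps the two thresholds (the co-located RD cost and the average RD cost) visible as distinct decisions.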


To verify the proposed scheme, a comprehensive set of experiments was performed on a variety of video sequences with different motion characteristics. The researchers used the methods of Jeon (Choi et al., 2006) and Kim (Salgado and Nieto, 2006; Grecos and Yang, 2007) for an objective comparison of the encoding performance. These two methods provide good quality performance for a fast mode decision.

All test video sequences used were well known in video compression testing. They included various MPEG standard sequences with CIF and QCIF sizes. Analyses were performed with encoding frames = 100, RD optimization enabled, QP = 24, 28 and 32, IPPP sequence types in the main profile using CAVLC with a search range of MV = ±16 and the number of reference frames = 1. FME was used as a default and Hadamard transform was enabled.

JM 11.0 reference software of the JVT (Joint Video Team) was used as a reference code for evaluation of the encoding performance. We defined three measures for evaluating the encoding performance including average ΔPSNR, average ΔBits and an encoding-time saving factor ΔT. The average ΔPSNR was defined as the difference in decibels between the average PSNR of the proposed method and the corresponding value of another method. As performance improved, this criterion became smaller:

The average ΔBits was defined as the bit-rate difference as a percentage between the compared methods and the encoding-time saving factor was defined for a complexity comparison as:
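The three measures can be sketched in Python as follows. The function and parameter names are hypothetical, and the sign conventions are assumptions: ΔPSNR and ΔBits are taken as the evaluated method minus the method it is compared with, and ΔT is taken as the fraction of reference encoding time that is saved, so that positive values mean a faster encoder, in line with the positive time savings reported in this article.

```python
def delta_psnr(avg_psnr_method: float, avg_psnr_other: float) -> float:
    """Difference of average PSNR values in dB (negative means quality loss)."""
    return avg_psnr_method - avg_psnr_other


def delta_bits(bits_method: float, bits_other: float) -> float:
    """Bit-rate difference as a percentage of the compared method's bit rate."""
    return 100.0 * (bits_method - bits_other) / bits_other


def time_saving(time_reference: float, time_method: float) -> float:
    """Encoding-time saving in percent (assumed sign: positive means faster)."""
    return 100.0 * (time_reference - time_method) / time_reference


# Made-up numbers for illustration only.
print(round(delta_psnr(36.933, 37.000), 3))   # -0.067
print(round(delta_bits(1004.48, 1000.0), 3))  # 0.448
print(round(time_saving(100.0, 20.93), 2))    # 79.07
```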

Figure 3 shows RD curves for several sequences. Two algorithms were used to verify the performance of the fast mode decision schemes. The RDO performance of the proposed method and that of the JM reference software using a full inter-mode search were similar. Kim’s method also showed performance similar to that of a full inter-mode search and had a large speed-up factor for the encoding time. Grecos’s algorithm provided worse image quality than both the proposed method and Kim’s method, with a smaller speed-up factor.

Fig. 3: RDO performance for IPPP sequences: (a) Foreman, (b) Mobile, (c) Paris and (d) Salesman sequences

Table 2 shows a comparison between all tested algorithms for IPPP sequences. For image quality, the average PSNR loss of the proposed algorithm was 0.067 dB. The PSNR loss was also stable in all video sequence types. Despite Kim’s and Grecos’s algorithms providing better image quality performance, this result for the algorithm is good considering the speed-up factor we achieved.

Table 2: Comparison results for IPPP sequences

The proposed algorithm caused a negligible bitrate increment in low-motion video sequences, while high-motion sequences had a bit increment of >1%. The proposed algorithm had a 0.448% bit increment compared with Kim’s and Grecos’s algorithms, and Grecos’s method exhibited a small bitrate saving for all video sequence types.

The last performance evaluation in the fast mode decision is the speed-up factor or time saving for the encoding process. Good speed up performance should not affect the other quality factors, meaning that a method can achieve a minimum image quality loss and a negligible bitrate increment with a good speed-up factor. The proposed method achieved an average time saving of 79.07% with little image quality loss and a negligible bitrate increment. The proposed method had better speed-up performance than both Kim’s method (73.58% on average) and Grecos’s method (52.128% on average).

The proposed algorithm exhibited a small quality loss (-0.067 dB) and a small bit increment (0.448%) with an average speed-up factor of 79.077%. Degradation in both PSNR and the bit rate was slightly greater than with Kim’s method. The proposed algorithm achieved a speed-up factor of >6% compared to Kim’s method. The proposed method reduced the encoding time with a negligible loss in image quality and bit-rate performance. Future research will focus on an adaptive search range control mechanism to obtain a larger speed-up factor while maintaining a small quality loss. The magnitude of the motion vector tends to be larger for the sub-block type (P8x8) because of detailed texture and partial motion components. Thus, the search range can be controlled using an adaptive mathematical model. Use of this characteristic can increase the speed of the encoding system.


The researchers proposed a fast block mode decision algorithm for the H.264/AVC video standard based on two-step processing. First, the proposed fast inter-mode decision algorithm terminates the search process early based on correlations of the macroblock. Second, adaptive thresholding is used to provide an additional search process when necessary to achieve better image quality. Based on the rate-distortion curve, the proposed method shows performance similar to the full search method while reducing the total encoding time, with little image quality degradation and a small bit increment. A 79.07% time saving factor was achieved with a 0.448% bit increment and a 0.067 dB PSNR loss in image quality.


This research was supported by the Sun Moon University Research Grant of 2009.
