Audio Engineering Society
Convention Paper Presented at the 117th Convention 2004 October 28–31 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
Audio Patch Method in MPEG-4 HE-AAC Decoder
Han-Wen Hsu (1), Chi-Min Liu (1), Wen-Chieh Lee (1), and (2)
1. PSPLab, Computer Science and Information Engineering, National Chiao-Tung University, Hsin-Chu, 33050, Taiwan
2. InterVideo Digital technology (Shanghai) Co., Ltd., 6F, Caohejing Software Mansion, No. 461 Hongcao Rd., Shanghai, PRC (200233)

ABSTRACT
This paper extends the previous work on AAC to HE-AAC. The audio patch method consists of two separate parts: zero-band dithering and high frequency reconstruction. Zero-band dithering conceals the fishy artifact in the low frequency part encoded by a conventional AAC encoder. High frequency reconstruction extends the audio obtained from SBR to a full bandwidth signal. Extensive experiments have been conducted on various audio tracks to check both the quality improvement and the risk of degrading quality. The objective measure used is the system recommended by ITU-R Task Group 10/4.
1. INTRODUCTION
An audio patch method for audio decoders that needs no prior information has been proposed and successfully enhances MP3 and AAC tracks [1]. The audio patch method consists of two separate parts: zero-band dithering and high frequency reconstruction. This paper extends the two modules to HE-AAC decoders. At limited bit rates, almost all audio codecs sacrifice the high frequency components of signals to obtain the best perceptual quality. They spend all available bits on the low frequency components, which are more important to human hearing. However, as the audio bandwidth decreases, the sound becomes increasingly muffled. To address this bandwidth trade-off, an advanced scheme called "Spectral Band Replication (SBR)" [2]-[4] has been proposed. It codes high frequency content with little overhead, commonly about 1 to 3 kbit/s per channel. With an SBR module handling the high frequency content, the AAC encoder can concentrate on compressing the low frequency part with a more sufficient bit rate. The resulting scheme is referred to as MPEG-4 High Efficiency (HE) AAC, or AACplus. Figure 1 illustrates the spectrum of an HE-AAC frame, where SBR covers the range from 8 kHz to 16 kHz.
Liu et al.
Audio Patch Method in MPEG-4 HE AAC Decoder
Although the SBR module can extend the bandwidth of the narrowband signal decoded by the AAC decoder, the resulting frequency range usually still lies below 16 kHz. Two factors determine the maximum frequency of the SBR range.

The first factor is the number of bits available to the SBR module. When the SBR range is wider, the SBR data needs more bits for the energy information and for the tonal-to-noise ratios of the time-frequency grids [2]. Hence, the bit allocation policy between the conventional AAC encoder and the SBR module constrains the bandwidth extension range.

The second factor is the spectral band duplication policy commonly used in the SBR algorithm: SBR usually doubles the bandwidth of the signal decoded by the conventional AAC decoder. Hence, if the original audio is harder to compress, the AAC encoder sets a lower cut-off frequency, and the final bandwidth of the decoded audio becomes limited. To enhance the audio quality, this paper extends the SBR-enhanced signal to a full bandwidth signal.

The conventional AAC encoder also still largely determines the audio quality. Zero bands are a frequent artifact of audio encoders at very low bit rates. As illustrated in Figure 1, many zero bands break up the spectrum of the low frequency part. Therefore, a zero-band dithering method is required to conceal this artifact.
2. FUNDAMENTAL CONCEPT OF SBR
The HE-AAC codec extends the conventional AAC codec with an SBR encoder. The basic principle of SBR has two steps, illustrated in Figure 2. First, it reconstructs the high frequency spectral bands by replicating the low frequency spectral bands. Second, it rescales the spectral envelope of the reconstructed high frequency component to closely match the original signal, using a priori information extracted by the SBR encoder. The conventional AAC encoder only needs to compress the low frequency part of the audio signal (below half of the original bandwidth). By the Nyquist theorem, half of the original sampling rate therefore suffices to preserve the signal information. Hence, the signal is down-sampled by a factor of two before the conventional AAC encoder compresses it. In other words, the HE-AAC codec is a dual-rate system.
Figure 2: Block diagram of the HE-AAC encoder.
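A textbook 2:1 decimator can illustrate the dual-rate idea: a low-pass filter with its edge at a quarter of the input sampling rate, followed by dropping every second sample. The sketch below is a minimal illustration built on our own assumptions. It uses a 63-tap Hamming-windowed sinc filter and the hypothetical helper names `lowpass_taps` and `downsample_by_two`. It is not the down-sampler of any particular HE-AAC encoder.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.25):
    """Hamming-windowed sinc low-pass FIR.

    cutoff is a fraction of the input sampling rate; 0.25 places the edge at
    the Nyquist frequency of the half-rate output.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]  # normalize to unity gain at DC


def downsample_by_two(signal, taps):
    """Low-pass filter the signal and keep every second sample."""
    out = []
    for i in range(0, len(signal), 2):  # evaluate only the samples that are kept
        acc = 0.0
        for j, h in enumerate(taps):
            if 0 <= i - j < len(signal):
                acc += h * signal[i - j]
        out.append(acc)
    return out


def rms(xs):
    return math.sqrt(sum(v * v for v in xs) / len(xs))


fs = 44100.0
taps = lowpass_taps()
low = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(2000)]
high = [math.sin(2 * math.pi * 15000 * n / fs) for n in range(2000)]
print(round(rms(downsample_by_two(low, taps)[100:]), 3))   # close to 0.707: 1 kHz passes
print(round(rms(downsample_by_two(high, taps)[100:]), 3))  # close to 0: 15 kHz is removed
```

With a 44.1 kHz input, the core AAC encoder then runs at 22.05 kHz and can represent content only up to 11.025 kHz. Anything above that frequency must be regenerated by SBR. Computing only the kept outputs halves the filtering work compared with filtering first and decimating afterwards.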
Figure 1: A spectrum of an HE-AAC audio frame.

Experiments have been conducted on extensive audio tracks to verify the quality improvement. Through both subjective and objective measures, the method is verified to improve the perceptual quality of HE-AAC encoded audio signals to approach that of the original AAC at 65% of the bit rate. In particular, objective measurement with the perceptual evaluation of audio quality system recommended by ITU-R Task Group 10/4 [5] shows a significant quality improvement.
The SBR encoder extracts the high frequency information: the spectral envelope representation, the tonal-to-noise ratio, and other control parameters. To extract this information, a complex-valued QMF (quadrature mirror filter) bank splits the original full bandwidth signal into 64 subbands. A time-frequency grid then groups the subband signals within the SBR range. The SBR encoder encodes the signal energy of each grid unit, which lets the SBR decoder scale the spectral envelope of the reconstructed high bands to closely match the original signal. In time, the SBR range is divided into several envelopes. Time borders, transmitted as control parameters, delimit these envelopes and are chosen according to how stationary the signal content is. In frequency, several subbands are combined into non-uniform bands whose boundaries are given by the frequency resolution table. Segmenting along both dimensions constructs the time-frequency grid. As the segmentation becomes finer, the number of
AES 117th Convention, San Francisco, CA, USA, 2004 October 28–31 Page 2 of 11
units grows and more bits are required for the high frequency component. Hence, the quality of HE-AAC tracks depends largely on the choice of frequency resolution table and on the number of envelopes. The tonal-to-noise ratios of the original and the replicated spectral bands may also differ; to handle this mismatch, noise or sinusoids with suitable energy can be added. The tonal-to-noise ratio information is likewise extracted on units defined by a time-frequency grid, at different resolutions.

Figure 3 illustrates the block diagram of the HE-AAC decoder. The AAC core decoder first decodes the low frequency signal, and a 32-subband analysis QMF bank splits it into subbands. Next, the HF generator reconstructs the high-band signals by duplicating the low-band signals, which are further processed by inverse filtering. The envelope adjuster then shapes the spectral envelope in the high bands. It also adds components such as noise and tones according to the control information extracted by the SBR encoder. Finally, a 64-subband synthesis QMF bank synthesizes the subbands into a time-domain signal.
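The following Python sketch gives a concrete picture of the time-frequency grid by averaging the energy of complex QMF samples over each grid unit. It is a simplified illustration only. The function name, the border lists and the test signal are hypothetical. A real SBR encoder derives the borders from the signal and the frequency resolution table and then quantizes the energies; neither step is modeled here.

```python
def grid_energies(qmf, time_borders, freq_borders):
    """Mean energy of complex QMF samples inside each time-frequency grid unit.

    qmf[k][t] is the complex sample of subband k at time slot t.
    Consecutive entries of time_borders delimit one envelope; consecutive
    entries of freq_borders delimit one (possibly non-uniform) frequency band.
    Returns energies[e][f] for envelope e and frequency band f.
    """
    energies = []
    for t0, t1 in zip(time_borders, time_borders[1:]):
        row = []
        for k0, k1 in zip(freq_borders, freq_borders[1:]):
            total = sum(abs(qmf[k][t]) ** 2 for k in range(k0, k1) for t in range(t0, t1))
            row.append(total / ((k1 - k0) * (t1 - t0)))
        energies.append(row)
    return energies


# 64 subbands x 32 time slots; subband k has amplitude k, so its energy is k**2.
qmf = [[complex(k, 0.0)] * 32 for k in range(64)]
grid = grid_energies(qmf, time_borders=[0, 16, 32], freq_borders=[32, 36, 44, 64])
print(len(grid), len(grid[0]))  # 2 3
print(grid[0][0])               # 1123.5, the mean of 32**2 .. 35**2
```

Two envelopes and three non-uniform bands give six grid units. The encoder transmits one energy per unit, so finer segmentation in either dimension directly increases the SBR side information.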
[Figure 3: AAC-SBR bitstream → AAC core decoder (bitstream parser, bitstream demultiplexer, Huffman decoding & dequantization) → analysis QMF bank (32 channels) → HF generator → envelope adjuster → synthesis QMF bank (64 channels) → output PCM samples]
Figure 3: Block diagram of the HE-AAC decoder.

3. ZERO BAND DITHERING METHOD

The dithering method proposed in [1] patches spectral nulls, illustrated in Figure 4, to ease the annoying fishy noise. A zero band is defined as a spectral band containing zero energy in the spectrum. The method uses random noise to dither zero bands. It exploits the quantization information to determine the amplitude range of the dithering noise. This section reviews the zero-band dithering algorithm.

Figure 4: An audio spectrum containing several spectral nulls. This is also an example of an HE-AAC track decoded only by the conventional AAC decoder, without SBR.

3.1. Quantization Model in AAC

The AAC encoder uses a non-uniform quantizer to weight the distortion effectively. Each quantization band has its own quantization step size $\Delta_q$, matched to the perceptually tolerable distortion allowed by the psychoacoustic model. More specifically, the quantization model introduced in the MPEG-2/4 AAC standard [6] [7] is given as follows:

$$ S[k] = \mathrm{int}\left( \left( \frac{|X[k]|}{\Delta_q} \right)^{3/4} \right), \qquad (1) $$

where $X[k]$ is a frequency line (its sign is handled separately), $S[k]$ is the quantized value, and $\mathrm{int}(\cdot)$ denotes rounding to the nearest integer.

3.2. Zero-Band Occurrence Condition

In the decoder, the encoded frequency line is inversely quantized to $\tilde{X}[k]$ by (2):

$$ \tilde{X}[k] = S[k]^{4/3} \cdot \tilde{\Delta}_q , \qquad (2) $$

where $\tilde{\Delta}_q$ is defined as $\Delta_q$. The original value can in fact be written as

$$ |X[k]| = R[k]^{4/3} \cdot \tilde{\Delta}_q , \qquad (3) $$

where $R[k]$ is a non-negative real number and

$$ S[k] = \mathrm{int}(R[k]) . \qquad (4) $$

By the definition of zero bands, the requantized $\tilde{X}[k]$ in a zero band must be zero. From (2), the corresponding $S[k]$ must then also be zero. Hence, from (4), $R[k]$ must be less than 1/2. Substituting this result into (3) shows that zero bands occur when

$$ |X[k]| < \left( \tfrac{1}{2} \right)^{4/3} \cdot \tilde{\Delta}_q . \qquad (5) $$
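Condition (5) can be checked numerically. The sketch below implements the simplified quantizer of (1), (2) and (4) in Python, with nearest-integer rounding as in the text. It ignores how AAC derives the step size from the scalefactor, and it ignores any different rounding offset a practical encoder may use. The function names are hypothetical.

```python
def quantize(x, step):
    """Eq. (1): S[k] = int((|X[k]| / step) ** (3/4)), with the sign of X[k] reapplied."""
    r = (abs(x) / step) ** 0.75     # R[k] of Eq. (3)
    s = int(r + 0.5)                # nearest integer, Eq. (4); r is never negative
    return s if x >= 0 else -s


def dequantize(s, step):
    """Eq. (2): X~[k] = |S[k]| ** (4/3) * step, with the sign of S[k] reapplied."""
    mag = abs(s) ** (4.0 / 3.0) * step
    return mag if s >= 0 else -mag


def zero_threshold(step):
    """Eq. (5): every line with |X[k]| below this value is quantized to zero."""
    return 0.5 ** (4.0 / 3.0) * step


step = 10.0
print(round(zero_threshold(step), 3))                      # 3.969
print([quantize(x, step) for x in (-3.9, 0.0, 3.9, 4.0)])  # [0, 0, 0, 1]
print(dequantize(quantize(4.0, step), step))               # 10.0
```

Every line below about 0.397 times the step size (3.969 for a step of 10) dequantizes to zero. This is why an excessively large step size wipes out whole bands of low-level content.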
3.3. Dithering Model

According to (5), the original frequency lines $X[k]$ in a zero band can be expressed as

$$ X[k] = r \cdot \left( \tfrac{1}{2} \right)^{4/3} \cdot \tilde{\Delta}_q , \qquad (6) $$

where $-1 < r < 1$. Let $X_d[k]$ be the dithering frequency lines. Equation (6) suggests the following dithering model:

$$ X_d[k] = \tilde{r} \cdot \left( \tfrac{1}{2} \right)^{4/3} \cdot \tilde{\Delta}_q . \qquad (7) $$

Substituting a random number $\tilde{r}$, uniformly distributed between -1 and 1, for $r$ simulates $X[k]$ effectively. However, zero bands are mainly caused by an excessively large step size $\tilde{\Delta}_q$, which makes the values simulated by (7) too large. To handle this risk, a gain $g$ is introduced to constrain the magnitude range of the random noise. The modified model is

$$ X_d[k] = \tilde{r} \cdot \left( \tfrac{1}{2} \right)^{4/3} \cdot \tilde{\Delta}_q \cdot g . \qquad (8) $$

For simplicity, $g$ and $(1/2)^{4/3}$ are combined into a single parameter $g_p$, called the patch gain. The dithering model for zero bands is then defined as

$$ X_d[k] = \tilde{r} \cdot \tilde{\Delta}_q \cdot g_p . \qquad (9) $$

3.4. Dithering Algorithm

This subsection presents the zero-band dithering algorithm based on (9). The algorithm consists of three components: a patch gain determining module, a zero band searching module, and a zero band dithering module. First, the patch gain determining module adaptively chooses a suitable patch gain according to the content of the spectrum. Next, the searching module detects where zero bands exist in the spectrum. Finally, the dithering module patches the zero bands according to the dithering model (9).

[Figure 5: fracture spectrum → patch gain determining module → zero band searching module → zero band dithering module ($X_d[k]$) → patched spectrum]
Figure 5: A block diagram of the zero-band dithering algorithm.

3.4.1. Patch Gain Determining Module

The spectral content of audio signals varies constantly, both over time and across genres. A fixed patch gain therefore cannot keep the range of the random noise feasible. In particular, for a signal with strong tonal components, an unsuitable patch gain is very likely to degrade the original quality. Figures 6 to 9 illustrate this phenomenon. Comparing Figure 6 and Figure 8 shows that, with an excessive patch gain, the added random noise seriously distorts the energy ratio between the tonal and noise components. Decreasing the patch gain alleviates the problem, as illustrated in Figure 9. Therefore, an adaptive mechanism for changing the patch gain is required.

Figure 6: The spectrum of the original audio signal.
Figure 7: The spectrum of the compressed audio signal.
Figure 8: The spectrum of the compressed audio signal with zero-band dithering, with the patch gain set to $(1/2)^{4/3}$.
Figure 9: The spectrum of the compressed audio signal with zero-band dithering, with the patch gain set to 1/32.

To measure the ratio of tonal and noise components in a spectrum, this paper computes a flatness degree. It is the ratio of the geometric mean to the arithmetic mean of the magnitude means of successive spectral bands. Assume that a spectrum is divided into M uniform spectral bands, each with m frequency lines. The flatness degree is calculated by (10).
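$$ F = \frac{\left( \prod_{b=i}^{i+N-1} S_b \right)^{1/N}}{\frac{1}{N} \sum_{b=i}^{i+N-1} S_b} , \qquad (10) $$

where $S_b$, the mean magnitude of band $b$, is

$$ S_b = \frac{1}{m} \sum_{j=0}^{m-1} \left| X[j + m \cdot b] \right| , \qquad (11) $$

and the product and sum in (10) run over $N$ successive bands starting at band $i$. The patch gain is changed dynamically according to the flatness degree $F$, as in (12):

$$ g_p = \begin{cases} 1/4, & 0.9 \le F < 1 \\ 1/(4\sqrt{2}), & 0.0025 \le F < 0.9 \\ 1/8, & 0.001 \le F < 0.0025 \\ 1/(8\sqrt{2}), & F < 0.001 \end{cases} \qquad (12) $$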
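The patch gain selection of (10) to (12) is compact enough to sketch directly. The Python below is an illustration, not the authors' implementation. It reads the gain values of (12) as 1/4, 1/(4√2), 1/8 and 1/(8√2) and treats F = 1 like the top interval. Which N bands enter (10) is left open in the text, so the sketch takes i and N as parameters. All function names are hypothetical.

```python
import math


def band_means(spectrum, m):
    """Eq. (11): S_b, the mean magnitude of the m lines in uniform band b."""
    return [sum(abs(v) for v in spectrum[b * m:(b + 1) * m]) / m
            for b in range(len(spectrum) // m)]


def flatness(means, i, n):
    """Eq. (10): geometric mean over arithmetic mean of S_i .. S_{i+N-1}."""
    window = means[i:i + n]
    arith = sum(window) / n
    if arith == 0.0 or min(window) == 0.0:
        return 0.0  # one empty band makes the geometric mean, and hence F, zero
    geo = math.exp(sum(math.log(s) for s in window) / n)
    return geo / arith


def patch_gain(f):
    """Eq. (12): a lower patch gain for less flat (more tonal) spectra."""
    if f >= 0.9:
        return 1 / 4
    if f >= 0.0025:
        return 1 / (4 * math.sqrt(2))
    if f >= 0.001:
        return 1 / 8
    return 1 / (8 * math.sqrt(2))


noise_like = band_means([1.0, -1.0] * 32, 8)   # eight equal band means
tonal = [1000.0] + [0.001] * 7                 # one strong band, seven weak ones
print(patch_gain(flatness(noise_like, 0, 8)))  # 0.25
print(flatness(tonal, 0, 8) < 0.001)           # True
```

Computing the geometric mean through logarithms avoids overflowing the product in (10) when many bands are combined.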
3.4.2. Zero Band Searching Module

A zero band usually spans two or more quantization bands. Besides locating the zero bands, the searching module must therefore find the indexes q of the corresponding quantization bands in order to compute the respective $\tilde{\Delta}_q$. Figure 10 illustrates the block diagram of the zero band searching module.

[Figure 10: flowchart of the search for $Q_c$, then $Z_s$ and $Q_s$, then $Z_t$ and $Q_t$; outputs $Z_s$, $Z_t$, $Q_s$, $Q_t$]
Figure 10: The block diagram of the zero band searching module.

The searching module generates four indexes: $Z_s$, $Z_t$, $Q_s$ and $Q_t$. $Z_s$ and $Z_t$ are the starting and terminal points of a zero band. $Q_s$ and $Q_t$ are the indexes of the quantization bands containing $Z_s$ and $Z_t$, respectively. Before these four indexes are searched, the index $Q_c$ of the cut-off quantization band must be found. The cut-off quantization band is the last quantization band containing nonzero energy. If $Q_c$ does not exist, the current time frame is silent and the dithering processing is skipped.

[Figure 11: the last zero band, ending just before $Z'_s$ in quantization band $Q'_s$; the current zero band from $Z_s$ (band $Q_s$) to $Z_t$ (band $Q_t$); the cut-off band $Q_c$ ending at $C_t$; frequency index $k$]
Figure 11: The relation between the last zero band and the current zero band.

Let $Z'_s$ be the point immediately after the terminal point of the last zero band. Let $Q'_s$ be the index of the quantization band containing that terminal point. Figure 11 illustrates the relation between the last and the current zero band.

To find $Z_s$, the module searches between $Z'_s$ and $C_t$ for the first frequency $k$ where $X[k]$ is zero; $C_t$ denotes the terminal point of the cut-off quantization band. If no such $k$ exists, the whole range has been searched and the dithering processing is complete. To find $Z_t$, the module searches between $Z_s$ and $C_t$ for the first frequency $k$ where $X[k+1]$ is not zero. If no such $k$ exists, $Z_t$ is set to $C_t$. Similarly, $Q_s$ is found between $Q'_s$ and $Q_c$, and finally $Q_t$ is found between $Q_s$ and $Q_c$.

3.4.3. Zero Band Dithering Module

A zero band containing only a few frequency lines does not need to be dithered. Such a gap is likely a normal situation rather than an artifact. This paper uses two conditions:
1. The bandwidth $BW_Z$ of the zero band exceeds 1/4 of the bandwidth of the first quantization band associated with the zero band.
2. The zero band contains at least six frequency lines.
If neither condition holds, the dithering processing is skipped.

Pseudo Quantization Step Size in AAC Decoder

The AAC encoder uses a special Huffman codebook, "ZERO_HCB" [6] [7], for null quantization bands (scalefactor bands). To save bits, the scalefactor is not transmitted for quantization bands coded with ZERO_HCB. The AAC decoder therefore has no information about the quantization step size of a null quantization band, and the dithering processing needs a pseudo step size for it. Equation (13) gives a suitable pseudo step size. It is a linear combination of the step sizes of the two neighboring quantization bands, weighted inversely by their distances to the null quantization band:

$$ \tilde{\Delta}_q = \frac{d_2}{d_1 + d_2} \cdot \tilde{\Delta}_{q_1} + \frac{d_1}{d_1 + d_2} \cdot \tilde{\Delta}_{q_2} , \qquad (13) $$

where $q_1$ and $q_2$ are the indexes of the left and right neighboring quantization bands, respectively, and $d_i = |q - q_i| + 1$ for $i = 1, 2$.

The original content of a null quantization band is usually a noise-like component whose envelope is much lower than that of the other bands. Hence, the patch gain for a null quantization band can be decreased to lower the risk. Figure 12 shows the flow chart of the algorithm.
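The searching and dithering steps above can be condensed into one routine. This Python sketch assumes a simplified representation: a flat list of dequantized lines, a list of band borders, and `None` as the step size of a ZERO_HCB band. The names `pseudo_step` and `dither_zero_bands` are hypothetical, and so are the fallback when a null band has only one coded neighbor and the attenuation factor `null_scale`. The text gives no value for the reduced patch gain of null bands. A single left-to-right scan finds $Z_s$ and $Z_t$, and a per-line band lookup replaces the separate search for $Q_s$ and $Q_t$.

```python
import random


def pseudo_step(q, steps):
    """Eq. (13): distance-weighted step size for a ZERO_HCB band q (steps[q] is None)."""
    left = next((i for i in range(q - 1, -1, -1) if steps[i] is not None), None)
    right = next((i for i in range(q + 1, len(steps)) if steps[i] is not None), None)
    if left is None or right is None:
        # Assumption: with one coded neighbor, reuse its step; with none, add no noise.
        side = right if left is None else left
        return 0.0 if side is None else steps[side]
    d1, d2 = abs(q - left) + 1, abs(q - right) + 1
    return (d2 * steps[left] + d1 * steps[right]) / (d1 + d2)


def dither_zero_bands(x, offsets, steps, gain, null_scale=0.5, rng=random):
    """Fill qualifying zero runs below the cut-off band with r * step * gain, Eq. (9).

    x          -- dequantized spectral lines
    offsets    -- quantization band borders, starting at 0 and ending at len(x)
    steps      -- step size per band, or None for bands coded with ZERO_HCB
    gain       -- patch gain g_p, e.g. from Eq. (12)
    null_scale -- hypothetical extra attenuation for ZERO_HCB bands
    """
    y = list(x)
    band_of = [q for q in range(len(offsets) - 1) for _ in range(offsets[q], offsets[q + 1])]
    nonzero = [k for k, v in enumerate(x) if v != 0]
    if not nonzero:
        return y                                  # Q_c does not exist: silent frame
    c_t = offsets[band_of[nonzero[-1]] + 1]       # end of cut-off band Q_c (exclusive)
    k = 0
    while k < c_t:
        if x[k] != 0:
            k += 1
            continue
        z_s = k                                   # starting point of a zero band
        while k < c_t and x[k] == 0:
            k += 1
        z_t = k - 1                               # terminal point of the zero band
        width = z_t - z_s + 1
        first_band = offsets[band_of[z_s] + 1] - offsets[band_of[z_s]]
        if 4 * width <= first_band and width < 6:
            continue                              # neither condition holds: skip
        for j in range(z_s, z_t + 1):
            q = band_of[j]
            step, g = steps[q], gain
            if step is None:
                step, g = pseudo_step(q, steps), gain * null_scale
            y[j] = rng.uniform(-1.0, 1.0) * step * g
    return y


x = [1.0] * 8 + [0.0] * 8 + [2.0] * 8 + [0.0] * 2 + [1.0] * 6 + [0.0] * 8
offsets = [0, 8, 16, 24, 32, 40]
steps = [1.0, None, 4.0, 2.0, None]
y = dither_zero_bands(x, offsets, steps, gain=0.25, rng=random.Random(7))
print(all(v != 0 for v in y[8:16]))  # True: the wide ZERO_HCB band is filled
print(y[24:26], y[32:])              # the short gap and the bands above Q_c stay zero
```

In this example, the null band 1 receives the pseudo step 2.5 from its neighbors with steps 1 and 4. The two-line gap in band 3 fails both conditions and is left untouched.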
Figure 12: The flow chart of zero band dithering.
Figure 13: The spectrum of the audio signal of Figure 4 after zero band dithering.

4. FILTERBANK-BASED HIGH FREQUENCY RECONSTRUCTION METHOD

The reconstruction method proposed in [1] and [8] is designed to operate in a frequency transform domain, for example on MDCT coefficients. However, the SBR algorithm operates on 64 complex-valued subband signals. A modified, subband-based method is therefore needed to handle this inconsistency. Figure 14 illustrates the synthesis procedure of the synthesis QMF bank in the HE-AAC decoder [2], in which every 64 subband samples are synthesized into 64 reconstructed PCM samples. The modified high frequency reconstruction method, shown in Figure 15, reconstructs the high bands above the SBR range to extend the audio bandwidth.

[Figure 14: for each of the 32 time slots of an SBR frame, the 64 complex subband samples (real and imaginary parts) are multiplied by the 128×64 matrix $E$ with $E[n][k] = \exp\left( \frac{i\pi}{128} \left( k + \frac{1}{2} \right) (2n - 255) \right)$; the result enters the V FIFO buffer, the 640-sample G vector is windowed by the 640-sample C window into the W vector, and the ten 64-sample blocks of W are summed into 64 reconstructed samples]
Figure 14: Polyphase implementation of the synthesis QMF bank in the HE-AAC decoder.

[Figure 15: subband signals $X[s][t]$ (subbands 0 to 63, time slots 0 to 31), null above the cut-off subband $k_c$, are processed by HFR into $Y[s][t]$, whose subbands from $k_c$ upward are reconstructed high frequency subbands]
Figure 15: Block diagram of high frequency reconstruction based on subband signals.

Let $X[b][t]$ be the $b$th subband signal in a time frame. To describe the envelope of a spectrum from the 64 subband signals, we define an envelope factor as the energy of the 32 time samples of a subband. More specifically, the $b$th envelope factor is defined as

$$ E[b] = \sum_{t=0}^{31} \left| X[b][t] \right|^2 = \sum_{t=0}^{31} \left\{ \mathrm{Re}^2(X[b][t]) + \mathrm{Im}^2(X[b][t]) \right\} . \qquad (14) $$
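Equation (14) is a per-subband sum of squared magnitudes. A minimal Python sketch follows, assuming each subband is given as a list of 32 complex samples. The name `envelope_factors` is hypothetical.

```python
def envelope_factors(subbands):
    """Eq. (14): E[b] = sum over t of Re^2 + Im^2 of X[b][t], one value per subband."""
    return [sum(v.real ** 2 + v.imag ** 2 for v in row) for row in subbands]


# Two subbands of 32 time slots each: a constant 3+4j sample, and silence.
print(envelope_factors([[3 + 4j] * 32, [0j] * 32]))  # [800.0, 0.0]
```

Summing `v.real ** 2 + v.imag ** 2` mirrors the second form of (14) and avoids the square root hidden in `abs(v)`.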
The modified method estimates the high frequency envelope by linear extrapolation, on a logarithmic scale, of the envelope factors below the reconstructed band, which starts at subband $k_c$. It then replicates low frequency subbands to the high frequencies to fit the estimated envelope. In other words, the energies of the replicated subbands are calibrated according to the extrapolated envelope factors. Figure 16 illustrates the concept.

[Figure 16: $\log(E[b])$ versus subband index $b$; a line fitted to the $N$ envelope factors below the cut-off subband $k_c$ gives the reconstructed envelope factors above it]
Figure 16: Linear extrapolation of the envelope factors on a logarithmic scale.

4.1. Least Squares Method for the Linear Model

The spectral envelope is evaluated by the following theorem.

Theorem. Let $M$ be a set of $N$ envelope factors on a logarithmic scale, that is,

$$ M = \left\{ \ln(E[k_c - N]), \ln(E[k_c - N + 1]), \ldots, \ln(E[k_c - 1]) \right\} . \qquad (15) $$

Assume that $L: \ln(E[k]) = a_{opt} \cdot k + b_{opt}$ is the least-squares linear approximation of the $N$ envelope factors. Then

$$ a_{opt} = \frac{12}{(N-1)N(N+1)} \ln \prod_{i=1}^{(N-1)/2} \left( \frac{E[k_c - i]}{E[k_c - (N+1-i)]} \right)^{\frac{N+1}{2} - i} \qquad (16) $$

and

$$ b_{opt} = \frac{1}{N} \ln \prod_{i=1}^{N} E[k_c - i] - \left( k_c - \frac{N+1}{2} \right) a_{opt} . \qquad (17) $$

Furthermore, computing $a_{opt}$ has complexity $O(N^2)$ and computing $b_{opt}$ has complexity $O(N)$.

Equation (16) is the usual least-squares slope. The abscissas $k_c - N, \ldots, k_c - 1$ are $N$ consecutive integers with mean $k_c - \frac{N+1}{2}$, and $\sum (k - \bar{k})^2 = \frac{(N-1)N(N+1)}{12}$. Pairing the points symmetric about the mean gives the weights $\frac{N+1}{2} - i$. Equation (17) is the intercept $\bar{y} - a_{opt} \bar{k}$.

4.2. Fast Computing Method

Assume that $N$ is an odd integer and $N > 1$. We recursively define $Z_i$ as in (18), where $Z_0 = 1$:

$$ Z_i = Z_{i-1} \cdot E[k_c - i], \quad i = 1, 2, \ldots, \tfrac{N-1}{2} . \qquad (18) $$

Similarly, we recursively define $V_i$ as in (19), where $V_0 = 1$:

$$ V_i = V_{i-1} \cdot E[k_c - (N+1-i)], \quad i = 1, 2, \ldots, \tfrac{N-1}{2} . \qquad (19) $$

The recursive forms (18) and (19) give

$$ \prod_{i=1}^{(N-1)/2} E[k_c - i]^{\frac{N+1}{2} - i} = \prod_{i=1}^{(N-1)/2} Z_i \qquad (20) $$

and

$$ \prod_{i=1}^{(N-1)/2} E[k_c - (N+1-i)]^{\frac{N+1}{2} - i} = \prod_{i=1}^{(N-1)/2} V_i . \qquad (21) $$

Substituting (20) and (21) into (16) yields

$$ a_{opt} = \frac{12}{(N-1)N(N+1)} \ln \frac{\prod_{i=1}^{(N-1)/2} Z_i}{\prod_{i=1}^{(N-1)/2} V_i} . \qquad (22) $$

Computing $a_{opt}$ with (22) requires only $2N - 6$ multiplications in total, plus one logarithm and one division. Computing $b_{opt}$ then has constant complexity, because

$$ Z_{(N-1)/2} \cdot V_{(N-1)/2} \cdot E\left[ k_c - \tfrac{N+1}{2} \right] = \prod_{i=1}^{N} E[k_c - i] . \qquad (23) $$

[Figure 17: the envelope factors $E[k_c - 1], \ldots, E[k_c - N]$ feed two chains of running products; a division, a logarithm and the factor $12/((N-1)N(N+1))$ yield $a_{opt}$, and a logarithm, the factor $1/N$ and the subtraction of $(k_c - \frac{N+1}{2}) a_{opt}$ yield $b_{opt}$]
Figure 17: Signal flow diagram of the fast computing method.
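The following Python sketch checks that the closed forms and the fast recursion agree. It implements (16) and (17) directly, implements (18) to (23) with running products, and adds an ordinary least-squares fit for reference. The function names and the synthetic envelope are illustrative assumptions. As in the text, N must be odd.

```python
import math


def fit_direct(E, kc, N):
    """Eqs. (16)-(17): least-squares line ln E[k] = a*k + b over k = kc-N .. kc-1 (N odd)."""
    half = (N - 1) // 2
    # ln of the product in Eq. (16), written as a sum of weighted log ratios
    log_prod = sum(((N + 1) / 2 - i) * (math.log(E[kc - i]) - math.log(E[kc - (N + 1 - i)]))
                   for i in range(1, half + 1))
    a = 12.0 * log_prod / ((N - 1) * N * (N + 1))
    b = sum(math.log(E[kc - i]) for i in range(1, N + 1)) / N - (kc - (N + 1) / 2) * a
    return a, b


def fit_fast(E, kc, N):
    """Eqs. (18)-(23): the same fit from running products, with one log per coefficient."""
    half = (N - 1) // 2
    z = v = 1.0                 # Z_0 = V_0 = 1
    prod_z = prod_v = 1.0       # products of the Z_i and V_i, Eqs. (20)-(21)
    for i in range(1, half + 1):
        z *= E[kc - i]                  # Z_i, Eq. (18)
        v *= E[kc - (N + 1 - i)]        # V_i, Eq. (19)
        prod_z *= z
        prod_v *= v
    a = 12.0 * math.log(prod_z / prod_v) / ((N - 1) * N * (N + 1))   # Eq. (22)
    total = z * v * E[kc - (N + 1) // 2]                              # Eq. (23)
    b = math.log(total) / N - (kc - (N + 1) / 2) * a                  # Eq. (17)
    return a, b


def fit_reference(E, kc, N):
    """Ordinary least squares on the points (k, ln E[k]), for comparison."""
    ks = range(kc - N, kc)
    ys = [math.log(E[k]) for k in ks]
    k_mean, y_mean = sum(ks) / N, sum(ys) / N
    a = (sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
         / sum((k - k_mean) ** 2 for k in ks))
    return a, y_mean - a * k_mean


E = [math.exp(2.0 - 0.1 * k) * (1.0 + 0.2 * ((3 * k) % 5)) for k in range(64)]
for fit in (fit_direct, fit_fast, fit_reference):
    a, b = fit(E, 32, 9)
    print(fit.__name__, round(a, 6), round(b, 6))  # the three fits should agree
```

The fast form trades logarithms for multiplications, which suits a decoder. However, the running products can underflow or overflow in floating point when N is large or the energies are extreme. A practical implementation would probably need to rescale them, which this sketch does not do.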
4.3. Reconstruction Algorithm

This subsection presents the high frequency reconstruction algorithm based on the linear model above. The algorithm consists of three components: an envelope extractor module, a subband duplication module, and an envelope adjustment module. Figure 18 illustrates the block diagram of the reconstruction algorithm.
[Figure 18: band-limited spectrum $X[k]$ → envelope extractor module → subband duplication module → envelope adjustment module → full spectrum $Y[k]$]
Figure 18: The block diagram of the high frequency reconstruction method.

First, the envelope extractor module calculates $a_{opt}$ and $b_{opt}$ from the low bands with the fast computing method. Next, the subband duplication module generates the high frequencies by duplicating low-band signals. Finally, the envelope adjustment module adjusts the high frequencies to fit the estimated spectral envelope. The correct cut-off subband, which is the last subband of the SBR range, can be obtained from the HF generator module of the SBR decoder.

The subband duplication module works as follows. Let $k_e$ be the last subband to be reconstructed. The module copies a block of low bands spanning $bw$ subbands to the high bands, where $bw = \min\{N, k_e - k_c + 1\}$. More precisely,

$$ X[b][t] = X[b - bw][t], \quad b = k_c, \ldots, k_e . \qquad (24) $$

Figure 19 illustrates the output of this module. The reconstructed high frequencies carry excessive energy, so envelope adjustment is necessary.

Figure 19: A spectrum with an unsuitable envelope at high frequencies after the subband duplication module.

The envelope adjustment module then adjusts the energy of the duplicated high frequencies to fit the estimated envelope. Three subbands are grouped into one envelope adjustment unit. For each adjustment unit, the adjustment ratio $\alpha_u$ is defined as

$$ \alpha_u = \sqrt{\frac{P_u}{S_u}} , \quad u = 0, \ldots, M - 1 , \qquad (25) $$

where $S_u$ is the energy of the replicated subbands in the $u$th unit, $P_u$ is the corresponding pseudo energy, and $M$ is the number of adjustment units. $S_u$ and $P_u$ are defined in (26) and (27), respectively:

$$ S_u = \sum_{b=k_c+3u}^{k_c+3u+2} E[b] , \quad u = 0, \ldots, M - 1 , \qquad (26) $$

$$ P_u = \sum_{k=k_c+3u}^{k_c+3u+2} \exp\left( b_{opt} + a_{opt} k \right) , \quad u = 0, \ldots, M - 1 , \qquad (27) $$

where $E[b]$ is the envelope factor (14) of the replicated subband $b$. Furthermore, (27) reduces to (28):

$$ P_u = \begin{cases} \exp(b_{opt} + a_{opt} k_c) \cdot \left( 1 + e^{a_{opt}} + e^{2 a_{opt}} \right) \cdot \left( e^{3 a_{opt}} \right)^u , & a_{opt} \ne 0 \\ 3 \cdot \exp(b_{opt}) , & a_{opt} = 0 . \end{cases} \qquad (28) $$

Equation (28) suggests an efficient recursive computation of $P_u$, with constant complexity for any $u > 0$:

$$ P_u = P_{u-1} \cdot e^{3 a_{opt}} = P_{u-1} \cdot \rho , \quad u = 1, \ldots, M - 1 . \qquad (29) $$

Here the ratio $\rho = e^{3 a_{opt}}$ is called the unit decay ratio, and $P_0$ is defined in (30):

$$ P_0 = \begin{cases} \exp(b_{opt} + a_{opt} k_c) \cdot \left( 1 + e^{a_{opt}} + e^{2 a_{opt}} \right) , & a_{opt} \ne 0 \\ 3 \cdot \exp(b_{opt}) , & a_{opt} = 0 . \end{cases} \qquad (30) $$

The final adjusted high-band signal $X'[b][t]$ is given by (31):

$$ X'[b][t] = X[b][t] \cdot \alpha_u , \quad b = k_c + 3u, \ldots, k_c + 3u + 2 ,\; t = 0, \ldots, 31 ,\; u = 0, \ldots, M - 1 . \qquad (31) $$
This module completes the reconstruction algorithm.
Figure 20: The spectrum of Figure 19 after the envelope adjustment module.
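The duplication and envelope adjustment steps can be combined into a short routine. This Python sketch illustrates the approach under stated assumptions and is not the authors' code:
- subbands are Python lists of complex samples, and the function name is hypothetical;
- subbands left over when $k_e - k_c + 1$ is not a multiple of three are not adjusted;
- $\alpha_u$ is taken as the square root of $P_u / S_u$, so that the rescaled unit energy equals $P_u$.

```python
import math


def reconstruct_high_bands(X, kc, ke, N, a, b):
    """Duplicate low subbands into kc..ke (Eq. 24), then rescale them three subbands
    at a time so each unit's energy matches the extrapolated envelope (Eqs. 25-31).

    X      -- list of subbands, each a list of complex time samples (modified in place)
    kc, ke -- first and last subband to reconstruct
    N      -- number of envelope factors used by the fit
    a, b   -- slope a_opt and intercept b_opt of ln E[k], e.g. from Eqs. (16)-(17)
    """
    bw = min(N, ke - kc + 1)
    for band in range(kc, ke + 1):                   # Eq. (24), in increasing order
        X[band] = list(X[band - bw])
    rho = math.exp(3 * a)                            # unit decay ratio, Eq. (29)
    p = math.exp(b + a * kc) * (1 + math.exp(a) + math.exp(2 * a))   # P_0, Eq. (30)
    for u in range((ke - kc + 1) // 3):              # whole units only (assumption)
        lo = kc + 3 * u
        s = sum(abs(v) ** 2 for band in range(lo, lo + 3) for v in X[band])  # S_u, Eq. (26)
        if s > 0:
            alpha = math.sqrt(p / s)                 # Eq. (25)
            for band in range(lo, lo + 3):
                X[band] = [v * alpha for v in X[band]]   # Eq. (31)
        p *= rho                                     # P_u = P_{u-1} * rho, Eq. (29)
    return X


X = [[complex(k + 1, 0.0)] * 32 for k in range(40)] + [[0j] * 32 for _ in range(24)]
reconstruct_high_bands(X, kc=40, ke=63, N=9, a=-0.1, b=5.0)
unit0 = sum(abs(v) ** 2 for band in range(40, 43) for v in X[band])
target = sum(math.exp(5.0 - 0.1 * k) for k in range(40, 43))
print(round(unit0, 6), round(target, 6))  # the two printed values agree
```

When $a_{opt} = 0$, the factor $1 + e^{a} + e^{2a}$ equals 3, so a single expression for $P_0$ covers both cases of (30). The recursion (29) then costs one multiplication per unit.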
5. EXPERIMENT
This paper verifies the perceptual quality improvement by comparing the patched audio with the original CD-quality audio. Perceptual quality is measured with the PEAQ (Perceptual Evaluation of Audio Quality) system [5], which includes a sophisticated perceptual model to measure the difference between two tracks. The objective difference grade (ODG) is the output variable of this objective measurement method. ODG values ideally range from 0 to -4. A value of 0 corresponds to an imperceptible impairment and -4 to an impairment judged as very annoying. An ODG improvement of 0.1 is usually audible. PEAQ has been widely used to evaluate compression techniques because it can detect perceptual differences audible to the human hearing system.

The HE-AAC tracks are prepared at bit rates of 48 kbps, 64 kbps and 80 kbps, with a sampling rate of 44.1 kHz. The twelve test tracks recommended by MPEG, listed in Table 1, include critical material with percussion, string and wind instruments and human vocals. The tracks are encoded with the Nero 6.3 HE-AAC encoder [9]. Because of the SBR policy designed into Nero 6.3, the 48 kbps HE-AAC tracks always sacrifice the signal content above 16 kHz. At bit rates above 48 kbps, the Nero 6.3 HE-AAC tracks have a 21 kHz bandwidth. However, spectral nulls occur very frequently in their low frequency part, so the tracks are well suited to verifying the effectiveness of the zero band dithering method. As illustrated in Figure 22, the two patch algorithms can be implemented directly on the spectral lines or on the subbands during reconstruction in HE-AAC decoders.

Figure 21 compares the ODG ranges of three sets of tracks at several bit rates: the Nero 6.3 AAC tracks, the Nero 6.3 HE-AAC tracks, and the Nero 6.3 HE-AAC tracks enhanced by the audio patch method. For each statistics line, the top arrow marks the maximum ODG, the lower cross marks the minimum ODG, and the middle square marks the average ODG over the 12 test tracks. The statistics lines are sorted in ascending order of average ODG. Figures 23 to 25 show the ODGs of the twelve tracks for the different decoding processes and bit rates.

Figure 21: The ODG range comparison of the Nero 6.3 AAC tracks, the Nero 6.3 HE-AAC tracks, and the Nero 6.3 HE-AAC tracks with audio patch at several bit rates.

[Figure 22: AAC-SBR bitstream → AAC kernel decoder (bitstream parser, bitstream demultiplexer, frequency line decoder with Huffman decoding & dequantization and zero band dithering, other processing, filter bank) → SBR decoder (analysis QMF bank, HF generator, envelope adjuster, high frequency reconstruction, synthesis QMF bank) → output PCM samples]
Figure 22: The diagram of the audio patch method incorporated into the HE-AAC decoder.
Across the 12 test tracks, no HE-AAC track loses quality in ODG after audio patch processing. The improvement reaches up to 1.59, 1.03 and 0.85 ODG at 48, 64 and 80 kbps, respectively, and the average ODG gains are 0.83, 0.4 and 0.32. These results indicate that the audio patch technique improves quality with almost no risk at any of the tested bit rates. The comparison also shows the following:
- Nero 6.3 HE-AAC at 48 kbps with audio patch enhancement approaches the objective quality of Nero 6.3 AAC at 80 kbps.
- Nero 6.3 HE-AAC at 64 kbps with enhancement approaches both Nero 6.3 AAC at 96 kbps and Nero 6.3 HE-AAC at 80 kbps.
- Nero 6.3 HE-AAC at 80 kbps with enhancement approaches Nero 6.3 AAC at 112 kbps.
In other words, the method improves the perceptual quality of HE-AAC encoded audio to approach that of the original AAC at about 65% of the bit rate.
6. CONCLUSION
This paper has extended the previous work on AAC in [1] to HE-AAC. It proposed a patch method that needs no prior information to patch HE-AAC audio signals and conceal compression artifacts. The method consists of two separate parts. The first is zero-band dithering, which patches spectral valleys to conceal the fishy artifact in the low frequency part encoded by a conventional AAC encoder. The second is high frequency reconstruction, which extends the audio obtained from SBR to a full bandwidth signal. Experiments on extensive audio tracks show improved quality with almost no risk of degradation. Through both subjective and objective measures, the method is verified to improve the perceptual quality of HE-AAC encoded audio signals to approach that of the original AAC at 65% of the bit rate. In particular, objective measurement with the perceptual evaluation of audio quality system recommended by ITU-R Task Group 10/4 shows a significant quality improvement.
7. REFERENCES
[1] H. W. Hsu, C. M. Liu, and W. C. Lee, “Audio Patch Method in Audio Decoders: MP3 and AAC,” presented at the 116th AES Convention, Berlin, Germany, May 8–11, 2004.
[2] ISO/IEC, “Text of ISO/IEC 14496-3:2001/FPDAM 1, Bandwidth Extensions,” ISO/IEC JTC1/SC29/WG11/N5203, Shanghai, China, October 2002.
[3] M. Dietz, L. Liljeryd, K. Kjörling, and O. Kunz, “Spectral Band Replication, a Novel Approach in Audio Coding,” presented at the 112th AES Convention, Munich, Germany, May 10–13, 2002.
[4] M. Wolters, K. Kjörling, D. Homm, and H. Purnhagen, “A Closer Look into MPEG-4 High Efficiency AAC,” presented at the 115th AES Convention, New York, USA, October 10–13, 2003.
[5] ITU Radiocommunication Study Group 6, “Draft Revision to Recommendation ITU-R BS.1387: Method for Objective Measurements of Perceived Audio Quality.”
[6] ISO/IEC, “Coding of Moving Pictures and Audio: IS 13818-7 (MPEG-2 Advanced Audio Coding, AAC),” Doc. ISO/IEC JTC1/SC29/WG11 N1650, April 1997.
[7] ISO/IEC, “Information Technology: Coding of Audio-Visual Objects,” ISO/IEC 14496 (Part 3, Audio), 1999.
[8] C. M. Liu, W. C. Lee, and H. W. Hsu, “High Frequency Reconstruction by Linear Extrapolation,” presented at the 115th AES Convention, New York, USA, October 10–13, 2003.
[9] Nero, http://www.nero.com.

8. ACKNOWLEDGEMENTS
This work was supported by the National Science Council under NSC91-2622-E009-003 and by InterVideo Digital Tech. under 792171.
Table 1: The twelve test tracks recommended by MPEG.

Track | Signal | Description | Mode | Time (sec) | Remark
1 | Es01 | Vocal (Suzanne Vega) | Stereo | 10 | (c)
2 | Es02 | German speech | Stereo | 8 | (c)
3 | Es03 | English speech | Stereo | 7 | (c)
4 | Sc01 | Trumpet solo and orchestra | Stereo | 10 | (d)
5 | Sc02 | Orchestral piece | Stereo | 12 | (d)
6 | Sc03 | Contemporary pop music | Stereo | 11 | (d)
7 | Si01 | Harpsichord | Stereo | 7 | -
8 | Si02 | Castanets | Stereo | 7 | (a)
9 | Si03 | Pitch pipe | Stereo | 27 | (b)
10 | Sm01 | Bagpipes | Stereo | 11 | (b)
11 | Sm02 | Glockenspiel | Stereo | 10 | (a)
12 | Sm03 | Plucked strings | Stereo | 13 | -

Remarks:
(a) Transients: pre-echo sensitive, smearing of noise in the temporal domain.
(b) Tonal/harmonic structure: noise sensitive, roughness.
(c) Natural vocal (critical combination of tonal parts and attacks): distortion sensitive, smearing of attacks.
(d) Complex sound: stresses the device under test.
(e) High bandwidth: stresses the device under test, loss of high frequencies, program-modulated high frequency noise.
(f) Low volume testing.

Figure 23: The ODG of the HE-AAC tracks and the APM audio at 48 kbps.
Figure 24: The ODG of the HE-AAC tracks and the APM audio at 64 kbps.
Figure 25: The ODG of the HE-AAC tracks and the APM audio at 80 kbps.