An H.264/AVC High422 Profile and MPEG-2 422 Profile Encoder LSI for HDTV Broadcasting Infrastructures

Koyo Nitta, Mitsuo Ikeda, Hiroe Iwasaki, Takayuki Onishi, Takashi Sano, Atsushi Sagata, Yasuyuki Nakajima, Minoru Inamori, Takeshi Yoshitome, Hiroaki Matsuda, Ryuichi Tanida, Atsushi Shimizu, Ken Nakamura, and Jiro Naganuma
NTT Cyber Space Laboratories, Nippon Telegraph and Telephone Corporation
1-1, Hikarinooka, Yokosuka, Kanagawa, 239-0847, Japan
phone: +81-46-859-4201, fax: +81-46-859-4205, email: [email protected]

Abstract
An H.264/AVC encoder LSI (named SARA/E) that supports the High422 profile, as well as the 422 profile of MPEG-2, has been developed for HDTV broadcasting infrastructures. It contains 257GOPS motion estimation and compensation (ME/MC) engines with search ranges of -271.75 to +199.75 (H) / -109.75 to +145.75 (V), which can utilize almost all H.264/AVC ME/MC tools: multiple reference frames, variable block size, 1/4-pel prediction, macroblock adaptive field/frame prediction, temporal/spatial direct mode, and weighted prediction. Our evaluations show that it can encode fast moving scenes with 1.2 to 1.7dB higher image quality (PSNR) than the JM. It was successfully fabricated in a 90nm 9-level metal CMOS technology and integrates 140 million transistors.

Keywords: H.264/AVC, encoder, MPSoC, ME/MC, and HDTV
[Block diagram residue. External connections: host processor (communication data), video data, audio/user data, TS, from/to upper chip, from/to lower chip, Mobile DDR. On-chip blocks: TRISC; VIF; IR, MBP, RIT; M-CORE (MRISC, TME, FME, SME, IPD, TQ); C-CORE (CRISC, EC, LF); MUX; MDT; MIF; eDRAM.]

Abbreviations:
TRISC: Top-level RISC
VIF: Video interface
IR: Image reduction
MBP: Macroblock-based preprocess
RIT: Recording information transform
MRISC: M-CORE RISC
TME: Two-pel Motion Estimation
FME: Full-pel Motion Estimation
SME: Sub-pel Motion Estimation
CRISC: C-CORE RISC
EC: Entropy Coding
LF: Loop Filter
MUX: Multiplexer
MDT: Multi-chip Data Transfer
MIF: Memory Interface
Fig. 1. The block diagram of the SARA/E.

Introduction
H.264/AVC [1] will play an important role in the field of HDTV broadcasting infrastructures, such as DVB-H in Europe, ISDB-T in Japan, and US-ATSC. There are many professional applications, such as interruption, contribution, and distribution. For encoder LSIs used in these systems, 1) 4:2:2 chroma format support for material, 2) repetitive transcoding for editing, and 3) tandem (two-pass) encoding for high image quality are indispensable. Moreover, a wide ME/MC with high precision and advanced mode decisions are needed to encode a variety of scenes efficiently. Although several consumer LSIs have already been developed [2-5], it is hard to implement a professional HDTV H.264/AVC encoder in a single chip even with a 90nm technology, because broadcasting image quality for HDTV (1920x1080, 30fps) with the high-end functions cannot be achieved without 1.5TOPS ME/MC and 268GB/s bandwidth. Therefore, we have developed a professional H.264/AVC video encoder LSI, SARA/E, that can be used in a multi-chip configuration for HDTV, and that has 257GOPS ME/MC engines with wide search ranges and 72Mbit eDRAMs for bandwidth reduction. It is the successor to our previous MPEG-2 422P@HL CODEC chip (VASA) [6].

Architecture
A. System Architecture
Fig. 1 shows the block diagram of the SARA/E. The MPSoC chip consists of a 64bit RISC processor (TRISC), two video coding cores (M-CORE and C-CORE), a video interface (VIF), pre-analysis engines (IR, MBP, and RIT), a multiplexer (MUX) that can concatenate bitstreams from other chips, a multi-chip data transfer unit (MDT) that can send/receive image data to/from other chips, a memory interface (MIF), and eDRAMs that reduce the bandwidth drastically (by 80%). Each of the M-CORE and the C-CORE has a 32bit RISC processor (MRISC and CRISC, respectively). The M-CORE has three ME/MC engines (TME, FME, and SME), an intra prediction unit (IPD), and a transform and quantization unit (TQ) as application-specific hardware modules. An entropy coding unit (EC) and a loop filter (LF) are in the C-CORE.

B. ME/MC Architecture
The architecture of the TME, which can execute a telescopic search (TS) [7], is shown in Fig. 2. Four PE Array Groups (PAGs), corresponding to the four 8x8 blocks in an MB, work in parallel. Two further types of parallelism are introduced in each PAG to realize a wide search range. First, each PAG has twin PE arrays that search the left and right halves of the search range. Second, the 4x4 systolic array (SA) in a PE array is divided into two 4x2 SAs. A 4x4 SA takes 16 cycles from the start of a one-step search to output the first SAD, and in the TS the next step search cannot start until the results of the previous search are fixed. Splitting the SA into two 4x2 SAs halves these start-up cycles.
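The partitioning described above (one search per 8x8 block, with the horizontal search range split into a left and a right half) can be illustrated in software. The following is only a minimal sketch under stated assumptions: it uses an exhaustive integer-pel SAD search on a small synthetic frame, and all function names, the frame size, and the ±4 search range are hypothetical. It does not model the systolic arrays, the telescopic search, or the image reduction used by the actual TME.

```python
import random


def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at (cx, cy)
    and the n x n block of ref at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def search_half(cur, ref, bx, by, n, dx_range, dy_range):
    """Exhaustively search one half of the window.
    Returns (best_sad, (dx, dy)) or None if no candidate is inside the frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in dy_range:
        for dx in dx_range:
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block lies outside the reference frame
            cand = (sad(cur, ref, bx, by, rx, ry, n), (dx, dy))
            if best is None or cand[0] < best[0]:
                best = cand
    return best


def search_block(cur, ref, bx, by, n, max_dx, max_dy):
    """Search the left (dx < 0) and right (dx >= 0) halves separately,
    then keep the better of the two results."""
    dy_range = range(-max_dy, max_dy + 1)
    left = search_half(cur, ref, bx, by, n, range(-max_dx, 0), dy_range)
    right = search_half(cur, ref, bx, by, n, range(0, max_dx + 1), dy_range)
    candidates = [c for c in (left, right) if c is not None]
    return min(candidates, key=lambda c: c[0])


def search_macroblock(cur, ref, mbx, mby, max_dx, max_dy):
    """Search the four 8x8 blocks of a 16x16 MB independently."""
    return [
        search_block(cur, ref, mbx + ox, mby + oy, 8, max_dx, max_dy)
        for oy in (0, 8)
        for ox in (0, 8)
    ]


def make_test_frames(size=32, motion=(3, -2), seed=0):
    """Random reference frame plus a current frame displaced by `motion`."""
    rng = random.Random(seed)
    ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    dx, dy = motion
    cur = [
        [ref[y + dy][x + dx] if 0 <= x + dx < size and 0 <= y + dy < size else 0
         for x in range(size)]
        for y in range(size)
    ]
    return cur, ref


cur, ref = make_test_frames()
results = search_macroblock(cur, ref, 8, 8, max_dx=4, max_dy=4)
print(results)  # one (SAD, motion vector) pair per 8x8 block
```

In this sketch the two half-range searches are independent of each other, so they could run concurrently, which is the property the twin PE arrays exploit in hardware.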
978-1-4244-1805-3/08/$25.00 © 2008 IEEE
2008 Symposium on VLSI Circuits Digest of Technical Papers
Fig. 2. The architecture of the TME.
[Figure residue; recoverable labels: PAG #0 to PAG #3 (PAG: PE Array Group); 8x8 block #0 (reduced to 4x4); current MB; search range with left half and right half; one 4x4 PE array for the left half and one for the right half; eight 8x8 block SADs for every cycle; controller (sequencer); MRISC I/F; encoding parameters and neighbor MVs from MRISC; MVs and statistics to MRISC; PMV calculation; MV evaluation; memory read (bank and address); pixel arrangement; SRAM, 593Kb for search area, current MB, and MVs; MIF I/F; pixels of current MB and search area from MIF. Annotations: "With 32b x 128w inst. memory", "With lambda func.", "With on-chip padding". Timing inset: one 4x4 systolic array and two 4x2 systolic arrays, labeled with 16 cycles and 58 cycles.]

Fig. 4. Image quality evaluations.
[Plot residue: image quality comparison between SARA/E and JM12.4. Vertical axis: PSNR (dB), ticks from 20 to 40. Horizontal axis: scene id. 1 to 8 and the average (ave.). Annotations: for fast moving scenes, 1.2 to 1.7dB gains from the ME/MC engines of the SARA/E; the SARA/E also has a 0.3dB advantage in average image quality, which means that it can encode a variety of scenes effectively.]
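Fig. 4 reports image quality as PSNR in dB. The paper does not state exactly how PSNR was measured (for example, luma only or averaged over frames), so the following is only a generic sketch of the usual definition for 8-bit samples. The function names are hypothetical, and the second helper simply converts a PSNR gain into the equivalent mean-squared-error ratio.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally long sequences of samples."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak * peak / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10 ** (gain_db / 10)


# A 1.2 to 1.7 dB gain corresponds to roughly a 1.3x to 1.5x lower MSE.
print(round(mse_ratio_for_gain(1.2), 2), round(mse_ratio_for_gain(1.7), 2))
```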
Fig. 3. The architecture of the SME.
[Figure residue; recoverable labels of the block diagram: SME; controller; modes; neighbor MVs from MRISC; MVs and MB mode; ME-SIMD with 16-PA and 8-PA blocks (PA: PE Array); MC-SIMD with a 16-PA block and SATD, annotated "Flexible datapath for various modes"; SRAM, 514Kb; memory I/F; current and reference pixels and local decoded image from/to MIF; local decoded image to IPD; pre- and post-TQ images from/to TQ. Recoverable labels of the SME pipeline schedule: rows for ENG16A to ENG16D, ENG8A to ENG8D, ME-SIMD, and MC-SIMD over time t; steps marked H and Q for the L0 and L1 references over the 16x16, 16x8, 8x16, and 8x8 partitions (indices 0 to 3); SATD; entries marked Di; InterBest Y and InterBest C. Annotation: "Settling complex data dependence and realize various modes supported".]

Fig. 5. A micro photograph of the SARA/E.

Table 1. The chip specifications
Technology: 90-nm 9-level metal CMOS
Number of transistors: 140 million
Die size: 11.85mm x 11.85mm
Clock frequency: 200MHz (max)
Supply voltage: Core 1.2V / Mobile DDR 1.8V / eDRAM 2.5V / I/O 3.3V
Power consumption: 3.0W
Package: 625-pin FCBGA (21mm x 21mm)
Memory: (listed with the remaining rows of Table 1 below)
Fig. 3 shows the SME, which performs the 1/2- and 1/4-pel ME and MC. It consists of two kinds of SIMD processors: one for ME and the other for MC. The ME-SIMD has four 16-PE arrays and four 8-PE arrays to calculate SADs for the variable block sizes 8x8, 8x16, 16x8, and 16x16. The MC-SIMD consists of a 16-PE array with a flexible datapath and executes various operations, such as bi-directional prediction, temporal/spatial direct modes, SAD/SATD calculations, and inter/intra decisions. 4:2:2 formats can be supported by changing the MC-SIMD's programs. All SME operations for an MB are carefully scheduled in consideration of the complex data dependencies.

Evaluations and Implementation
Fig. 4 shows the image quality evaluations. For fast moving scenes, our chip has a 1.2 to 1.7dB gain compared to the JM12.4. This is because the ME/MC engines can find better MVs, and thus the SARA/E can encode a variety of scenes efficiently. A micro photograph of the SARA/E is shown in Fig. 5, and the chip specifications are summarized in Table 1. The chip was successfully fabricated in a 90nm 9-level metal CMOS technology and integrates 140 million transistors. A single chip can encode D1 (720x480, 30fps) in real time. With multi-chip configurations on a postcard-size board, it can encode full HDTV (1920x1080, 30fps). The SARA/E will be a key device for implementing various professional H.264/MPEG-2 applications for future broadcasting infrastructures.
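The variable-block-size SAD calculation attributed to the ME-SIMD can be illustrated with a common software technique: measure the four 8x8 SADs once per candidate MV and obtain the 16x8, 8x16, and 16x16 SADs by addition. The paper does not say that the chip merges SADs this way, so this is an assumption made for illustration only. The names `PARTITIONS`, `best_per_partition`, `choose_mode`, and the fixed `penalty_per_partition` (a crude stand-in for a rate term) are hypothetical.

```python
# Indices of the 8x8 blocks inside a 16x16 macroblock (raster order):
#   0 1
#   2 3
PARTITIONS = {
    "16x16": [(0, 1, 2, 3)],
    "16x8": [(0, 1), (2, 3)],          # top and bottom halves
    "8x16": [(0, 2), (1, 3)],          # left and right halves
    "8x8": [(0,), (1,), (2,), (3,)],
}


def best_per_partition(sad8_by_mv):
    """For every partition of every mode, pick the MV with the lowest SAD.

    sad8_by_mv maps a candidate MV (dx, dy) to the four 8x8 SADs measured
    at that MV. A larger block's SAD is the sum of its 8x8 SADs at the
    same MV.
    """
    result = {}
    for mode, parts in PARTITIONS.items():
        chosen = []
        for blocks in parts:
            cost, mv = min(
                (sum(sad8[b] for b in blocks), mv)
                for mv, sad8 in sad8_by_mv.items()
            )
            chosen.append((mv, cost))
        result[mode] = chosen
    return result


def choose_mode(sad8_by_mv, penalty_per_partition=0):
    """Return (mode, total_cost, [(mv, sad), ...]) with the lowest total cost."""
    best = best_per_partition(sad8_by_mv)
    costs = {
        mode: sum(cost for _, cost in parts) + penalty_per_partition * len(parts)
        for mode, parts in best.items()
    }
    mode = min(costs, key=costs.get)
    return mode, costs[mode], best[mode]


# Two candidate MVs with made-up 8x8 SAD values.
example = {(0, 0): [10, 20, 30, 40], (1, 0): [15, 5, 35, 25]}
print(choose_mode(example))
print(choose_mode(example, penalty_per_partition=20))
```

Without a penalty, finer partitions can never have a higher total SAD than coarser ones, which is why a cost term per partition is needed before a larger block size can be selected.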
Table 1 (continued)
Memory: eDRAM 72Mbit; external 512Mbit (32-bit width) Mobile DDR
Video:
  Profile: H.264: Main / High / High422 (8bit only); MPEG2: Main / 422P
  Level: H.264: 3.0 / 4.0 / 4.1; MPEG2: ML / H14L / HL
  Resolution and video rate: single chip: 720 x 480 at up to 30fps; multiple chip: 1920 x 1080 at up to 30fps
  Coding structure: Field, Frame, PAFF, MBAFF
  Motion estimation:
    Search range: -271.75 / +199.75 (H), -109.75 / +145.75 (V)
    Max. number of reference frames: 4
    Supported block size: 8x8, 8x16, 16x8, 16x16
    Weighted prediction mode: explicit
    Direct mode: spatial, temporal
  Pre-processing: adaptive temporal and spatial filter; macroblock-based feature extraction
  Transcoding: using mole information or our original information
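A few figures follow directly from the numbers in Table 1. The sketch below is a back-of-the-envelope calculation, not a statement about the chip's internal scheduling: it assumes 16x16 macroblocks, the maximum 200MHz clock, and a quarter-pel search grid, and the function names are hypothetical. The HD-to-D1 ratio only compares macroblock workloads; the paper does not state how many chips the HDTV configuration uses.

```python
import math


def macroblocks_per_frame(width, height, mb_size=16):
    """Number of macroblocks needed to cover a frame (partial MBs rounded up)."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def cycles_per_macroblock(clock_hz, width, height, fps):
    """Average clock cycles available per macroblock for real-time encoding."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


def quarter_pel_positions(low, high):
    """Number of 1/4-pel positions from low to high, inclusive."""
    return round((high - low) * 4) + 1


d1_mbs = macroblocks_per_frame(720, 480)
hd_mbs = macroblocks_per_frame(1920, 1080)
print("D1 macroblocks per frame:", d1_mbs)
print("HD macroblocks per frame:", hd_mbs)
print("HD / D1 workload ratio:", round(hd_mbs / d1_mbs, 2))
print("cycles per MB, D1 at 30fps and 200MHz:",
      int(cycles_per_macroblock(200e6, 720, 480, 30)))
print("horizontal 1/4-pel positions:", quarter_pel_positions(-271.75, 199.75))
print("vertical 1/4-pel positions:", quarter_pel_positions(-109.75, 145.75))
```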
References
[1] ISO/IEC 14496-10:2003, “Information technology – Coding of audio-visual objects – Part 10: Advanced Video Coding,” Dec. 2003.
[2] Y. W. Huang, et al., “A 1.3TOPS H.264/AVC Single-chip Encoder for HDTV Applications,” ISSCC Dig. Tech. Papers, pp. 128-129, Feb. 2005.
[3] H. C. Chang, et al., “A 7mW-to-183mW Dynamic Quality-Scalable H.264 Video Encoder Chip,” ISSCC Dig. Tech. Papers, pp. 280-281, Feb. 2007.
[4] H. Mizosoe, et al., “A Single Chip H.264/AVC HDTV Encoder/Decoder/Transcoder System LSI,” ICCE Dig. Tech. Papers, pp. 1-2, Jan. 2007.
[5] Z. Liu, et al., “A 1.41W H.264/AVC real-time encoder SoC for HDTV1080P,” IEEE Symp. VLSI Circuits, pp. 12-13, Jun. 2007.
[6] H. Iwasaki, et al., “Single-Chip MPEG-2 422P@HL CODEC LSI With Multichip Configuration for Large Scale Processing Beyond HDTV Level,” IEEE Trans. VLSI Systems, vol. 15, pp. 1055-1059, Sep. 2007.
[7] K. Suguri, et al., “A real-time motion estimation and compensation LSI with wide-search range for MPEG-2 video encoding,” J. Solid-State Circuits, vol. 31, pp. 1733-1741, Nov. 1996.