MINISTÉRIO DA EDUCAÇÃO UNIVERSIDADE FEDERAL DE PELOTAS PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO
Application-driven temperature-aware solutions for video coding Professor: Daniel Palomino
[email protected]
Prof. Daniel Palomino
Who am I? • Education – Bachelor in Computer Science (UFPel) – Master and Doctor in Computer Science (UFRGS) • Sandwich internship in Karlsruhe, Germany (KIT)
• Currently – Professor at the Centro de Engenharias (UFPel) – Researcher at the Video Technology Research Group (ViTech)
Outline • Introduction • State-of-the-art • Video coding overview • Temperature methodology • Application-driven temperature-aware solutions • Conclusions and future works • Publications
Introduction • Video services have spread widely in recent years • According to [CISCO, 2016], mobile video services will consume 75% of global internet traffic by 2020
• Video coding plays an important role • New standards have been released to provide high coding efficiency • Like HEVC (High Efficiency Video Coding) • However, the computational complexity of the video coding process increases
Introduction • The high complexity of HEVC is well complemented by continuous technology scaling and microarchitecture advancements – Provided high performance computing systems – Many processing cores running in parallel – High frequencies • However, the expected voltage scaling did not follow the transistor shrinking at the same rate • This resulted in high power densities and consequently high on-chip temperatures (hotspots)
Introduction • High temperature effects – Higher cooling costs • Area and power
– Chip reliability and lifetime • Most aging effects are aggravated at high temperatures • For instance, NBTI and HCI cause threshold voltage elevation • Electromigration, stress migration and dielectric breakdown – Spatial/temporal variations in temperature also have a negative impact on reliability and performance
• Therefore, temperature management/reduction is one of the primary design objectives, especially for embedded systems that have limited power and area to employ complex thermal solutions.
Introduction • Video processing is an important application domain with respect to temperature – Very data and compute intensive – Computational complexity drastically increases with new video coding standards (like HEVC) and trends of Multiview and 3D videos – High workload variations translate to temperature variations • Spatial/temporal temperature gradients
• We present a set of designed thermal solutions targeting video coding using • Application-specific characteristics • Video properties
State-of-the-art • Many works in the literature deal with temperature issues at different levels of the computing stack – Hardware-level – System-level
• Traditional techniques to reduce temperature – Handled at package level by heat sinks and coolers – Design-time techniques (HUNG, ADDO-QUAYE, et al., 2004) (ZHANG, OGRENCI-MEMIK, et al., 2015)
State-of-the-art • In this thesis, the main focus is Dynamic Temperature Management (DTM) – Employed in most works in the literature to deal with temperature problems at run-time – DTM uses diverse low power techniques to control temperature • Task Migration • Dynamic Voltage and Frequency Scaling (DVFS) • Clock/power gating
State-of-the-art • At system-level – Scheduling using task migration (Multi-core scenario) • Temperature history-based prediction • Hardware performance counter prediction
– Works • (YEO, LIU and KIM @ ISLPED’08) • (COSKUN, ROSING and GROSS @ ICCAD’08) • (COSKUN, ROSING, et al. @ TVLSI’08) • (COSKUN, ROSING and GROSS @ TCAD’09) • (KUMAR, SHANG, et al. @ TCAD’08) • (CHO, KERSEY, et al. @ TCPMT’13)
State-of-the-art • At hardware-level – Run-time hardware configuration • Model predictive controllers • Dynamic Voltage and Frequency Scaling (DVFS) • Clock gating
– Works • (COSKUN, ROSING and GROSS @ ICCAD’08) • (COSKUN, ROSING, et al. @ TVLSI’08) • (COSKUN, ROSING and GROSS @ TCAD’09) • (KUMAR, SHANG, et al. @ TCAD’08) • (FISHER, CHEN, et al. @ JSA’11) • (EBI, FARUQUE and HENKEL @ ICCAD’09) • (EBI, KRAMER, et al. @ CODES+ISSS’11) • (ZANINI, ATIENZA, et al. @ ECCTD’09) • (BARTOLINI, CACCIARI, et al. @ DATE’11) • (BARTOLINI, CACCIARI, et al. @ TPDS’13)
State-of-the-art [Figure: classification of the DTM works along two axes: Single-core / Multi-core and Hardware-level / System-level / Application-level]
Works placed in the figure: (FISHER, CHEN, et al., 2009) (EBI, FARUQUE and HENKEL, 2009) (EBI, KRAMER, et al., 2011) (ZANINI, ATIENZA, et al., 2009) (BARTOLINI, CACCIARI, et al., 2011) (BARTOLINI, CACCIARI, et al., 2013) (YEO, LIU and KIM, 2008) (COSKUN, ROSING and GROSS, 2008) (COSKUN, ROSING, et al., 2008) (COSKUN, ROSING and GROSS, 2009) (KUMAR, SHANG, et al., 2008) (CHO, KERSEY, et al., 2013)
State-of-the-art • Summary of DTM techniques – All the mentioned works sacrifice system performance in exchange for better thermal profiles – None of these works consider application characteristics when designing the thermal solutions – The performance degradation may be intolerable for some applications (real-time applications) – The system-level schedulers assume that the workloads of different threads are always the same • Video coding is an example where workloads can vary abruptly between threads and over time • Temperature history based schedulers may fail to predict future application behavior
State-of-the-art • DTM techniques for video coding/decoding – Take multimedia application general characteristics to perform better temperature management – (SRINIVASAN and ADVE @ ICS’03) – (LEE, PATEL and PEDRAM @ ISLPED’06) – (LEE, PATEL and PEDRAM @ TVLSI’2008) – (YEO and KIM, 2008) – (MARCU, MILOS and TUDOR, 2010) – (FORTE and SRIVASTAVA, 2010) – (MIRTAR, DEY and RAGHUNATHAN, 2012)
State-of-the-art • Summary of DTM for video coding – They do not account for the application-specific characteristics and video content properties • Which may provide a potential for more efficient temperature management for video coding
– All these works target old video coding standards (such as MPEG-2 and H.264/AVC) – Therefore, they may not be efficiently applied to highly complex new coding standards (like HEVC)
• In order to address the thermal related challenges, there is a need for application-driven thermal management solutions for efficient thermal control of video coding systems while providing high video quality • Also, it is important to consider the recent and more complex video coding tools • Finally, application-specific characteristics and video content properties can be used to improve temperature profiles of video coding systems
Goals • We address these goals with the following contributions 1. Thermal analysis and workload distribution on video coding 2. Relationships between video coding characteristics and video content properties with temperature 3. Application-driven dynamic thermal management for video encoders 4. Thermal optimization using adaptive approximate computing for video coding 5. Application-driven temperature-aware scheduling of multithreaded workloads on multi-core systems
Video coding overview • The main goal of video coding is to represent the large amount of data in digital videos with as little information as possible – Reducing data redundancies (spatial, temporal, entropic)
[Figure: encoder block diagram: Input Video → Prediction (Spatial, Temporal, Inter-view) → Residual (Transforms, Quantization) → Entropy (CAVLC, CABAC) → Coded Video]
Video coding overview • Motion estimation – Find the best representation of the current block being encoded in the temporally neighboring frames
[Figure: current block in the current frame, and candidate blocks inside the search area of a reference frame taken from the reference frames list]
– Similarity criterion: $\text{Similarity value} = \min_{i=1}^{N} \left( CurrentBlock - CandidateBlock_i \right)$
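The block-matching search described above can be sketched as follows. This is a minimal illustration, assuming the sum of absolute differences (SAD) as the similarity criterion and an exhaustive search over the search area; the names `sad` and `full_search` and the toy frame are hypothetical and are not taken from the HM software.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def full_search(current_block, reference_frame, top, left, search_range):
    """Return (best_cost, (dy, dx)) over all candidate blocks in the search area.

    (top, left) is the position of the current block; candidates are displaced
    by at most search_range samples in each direction (assumed search shape).
    """
    h, w = current_block.shape
    rows, cols = reference_frame.shape
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > rows or x + w > cols:
                continue  # candidate falls outside the reference frame
            cost = sad(current_block, reference_frame[y:y + h, x:x + w])
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best

# Toy data: the current block is an exact copy of the reference block at (3, 4).
reference = np.arange(64, dtype=np.uint8).reshape(8, 8)
current = reference[3:5, 4:6].copy()
best_cost, motion_vector = full_search(current, reference, top=2, left=2, search_range=2)
```

The cost of this search grows with the search area and with the number of reference frames, which is why these parameters appear later as knobs for workload and temperature.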
Video coding overview • HEVC standard – State-of-the-art video coding standard – New coding tools – Coding Tree Unit (CTU) based structure • Can be recursively divided into smaller Coding Units (CU) • This high number of possibilities to encode one CTU greatly improves the HEVC coding efficiency in terms of bit-rate and visual quality when compared to older standards • However, the computational complexity associated with processing all CTUs considering all CU sizes increases significantly
Video coding overview • Opportunities of temperature optimization for video coding – Application natural resilience to error – Application workload dependent on content – High configurability – Trade-off between temperature and compression efficiency – Video coding parallel tools are not efficient at distributing workload
Temperature Methodology • In this work we used three different methodologies to collect temperature results – IR-Camera setup – Tool chain setup – DTS setup
Temperature Methodology • Infrared camera setup (CES-KIT, 2013) – IR camera: accuracy of ±1 °C, spatial resolution of 50 µm per pixel, 50 Hz – CPU chip: 1.8 GHz Intel Atom 45 nm dual-core processor, running Ubuntu Linux – Water-cooling unit to cool down the thermoelectric device
[Figure: setup with voltage supply, IR camera and CPU chip, and the resulting thermal map]
Temperature Methodology • Tool chain setup
[Figure: temperature methodology using tool chain: Application → GEM 5 simulator (cores) → cores usage statistics → McPAT power simulator → power traces of cores → HotSpot thermal modeling tool → thermal profiles]
Temperature Methodology • Digital thermal sensor (DTS) – Present in most modern processors – Independent real-time measurement of each core's temperature • Intel i7 processor • Linux sensor monitoring software • The video coding application is restricted to one core • Temperature readings are performed every 500 milliseconds
Temperature Methodology • Video coding experiments methodology – All experiments used the HEVC standard – All sequences are encoded using the HEVC test model (HM) software – Recommendations of standardization committee (BOSSEN, 2012)
Sequence          Resolution (pixels)
BQMall            832x480
BasketballDrill   832x480
RaceHorses        832x480
Keiba             832x480
PartyScene        832x480
BQTerrace         1920x1080
BasketballDrive   1920x1080
Cactus            1920x1080
Temperature-aware solutions • Application-driven dynamic thermal management for HEVC • Thermal optimization using approximate computing • Application-driven thermal-aware scheduling
App-driven DTM for video coding • Application-driven dynamic thermal management for HEVC (using infra-red camera) – Thermal analysis of HEVC • Different sequences (content properties) • Different coding parameters
– Application-driven DTM for video coding • Application-level temperature prediction • Application-level thermal management
App-driven DTM for video coding • Thermal analysis for different sequences
[Figure: temperature (°C) over time (sec) and thermal maps at peak1 and peak2 for RaceHorses (1.8 GHz) and BQMall (1.35 GHz); annotated values: 53.9 ºC, 56.4 ºC, "5 ºC higher"]
App-driven DTM for video coding • Thermal analysis of different sequences – Efficiency and complexity of video coding are driven by video properties – Complexity classification (SHAFIQUE @ DAC’12) • Texture intensity using variance of luminance samples
$v_f = \frac{1}{n \times m} \sum_{i=0}^{n \times m} (\rho_i - \rho_{avg})^2$
$C_f = \begin{cases} low & \text{if } v_f \le Th_{v1} \\ medium & \text{if } Th_{v1} < v_f \le Th_{v2} \\ high & \text{if } v_f > Th_{v2} \end{cases}$
[Figure: example panels labeled Low, Medium and High]
• Temperature difference up to 10 ºC
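The variance-based complexity classification can be sketched as below. This is only an illustration of the two formulas: the threshold values passed in the example are made up for the toy frames (the slides do not give $Th_{v1}$ and $Th_{v2}$ for this classification), and the function names are hypothetical.

```python
import numpy as np

def frame_variance(luma):
    """Variance v_f of the luminance samples of one n x m frame."""
    samples = np.asarray(luma, dtype=np.float64)
    return float(np.mean((samples - samples.mean()) ** 2))

def classify_complexity(luma, th_v1, th_v2):
    """Map the frame variance to 'low', 'medium' or 'high' (C_f)."""
    v_f = frame_variance(luma)
    if v_f <= th_v1:
        return "low"
    if v_f <= th_v2:
        return "medium"
    return "high"

# Toy frames: a flat frame (no texture) and a checkerboard (strong texture).
flat = np.full((4, 4), 128)
checker = (np.indices((4, 4)).sum(axis=0) % 2) * 255

# Hypothetical thresholds, chosen only for this example.
flat_class = classify_complexity(flat, th_v1=100.0, th_v2=1000.0)
checker_class = classify_complexity(checker, th_v1=100.0, th_v2=1000.0)
```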
App-driven DTM for video coding • Summary – Video properties directly influence temperature generation of HEVC encoding – There is a potential of applying DTM techniques to have safe operational temperature • Core idling • Frequency scaling
• Therefore, one of the key challenges is to use the video content properties to enable application-level temperature management during HEVC encoding
App-driven DTM for video coding • Thermal analysis of different HEVC parameters – Does encoder configuration really matter?
RaceHorses @ QP 22: Temp max.: 55.0 °C, Temp min.: 36.0 °C, Temp avg.: 53.0 °C
RaceHorses @ QP 37: Temp max.: 53.0 °C, Temp min.: 35.0 °C, Temp avg.: 49.0 °C
App-driven DTM for video coding • Thermal analysis of different HEVC parameters
[Figure: PSNR (dB), bit rate (100x kbps) and average temperature (°C) for QP (22, 27, 32, 37), CTU size (64, 32, 16), number of reference frames (4, 2, 1) and search area (128, 64, 32)]
– Changes in QP, CTU size and number of reference frames impact the resulting temperature – QP also strongly impacts bit-rate and PSNR – Search area does not have a significant impact on temperature*
App-driven DTM for video coding • Application-driven DTM for video coding – Problem formulation • T as temperature in °C; • Q as video quality in terms of PSNR; • B as the resulting bit rate; • Goal: $T_{current} < T_{th}$, $\max\{Q\}$, $\min\{B\}$ • To achieve this goal, the temperature management at the application-level is performed by appropriate selection of the encoding parameters that determine the workload and affect the resulting temperature
App-driven DTM for video coding • Application-level temperature prediction
$T_p = T_{current} + \Delta T$
$\Delta T = T_v + \frac{\sum_{i=0}^{w} e_i}{w}$, where $T_v$ is dependent on frame complexity (transitions: High to Medium, Medium to High, Low to Medium, Medium to Low)
$e = T_m - T_p$
[Figure: measured vs. predicted average temperature (°C) per frame for PartyScene and Keiba, with a warming-up phase before prediction starts]
• Prediction error of 1.1% on average
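A minimal sketch of this predictor is shown below, assuming that $T_v$ is looked up in a table indexed by the (previous, current) complexity pair and that the error term is averaged over the last w errors actually collected. The class name, the table contents and the window length are hypothetical.

```python
from collections import deque

class TemperaturePredictor:
    """T_p = T_current + T_v + mean of the last w prediction errors."""

    def __init__(self, transition_deltas, window):
        # T_v for each (previous complexity, current complexity) transition.
        self.transition_deltas = transition_deltas
        # Sliding window holding at most w errors e = T_m - T_p.
        self.errors = deque(maxlen=window)

    def predict(self, t_current, previous, current):
        t_v = self.transition_deltas.get((previous, current), 0.0)
        # Before any error is collected, the correction term is zero.
        correction = sum(self.errors) / len(self.errors) if self.errors else 0.0
        return t_current + t_v + correction

    def update(self, t_measured, t_predicted):
        self.errors.append(t_measured - t_predicted)

# Made-up transition table and temperatures, for illustration only.
predictor = TemperaturePredictor({("low", "high"): 2.0}, window=4)
first = predictor.predict(50.0, "low", "high")    # 50 + 2 + 0
predictor.update(t_measured=53.0, t_predicted=first)
second = predictor.predict(53.0, "high", "high")  # 53 + 0 + mean([1.0])
```

Feeding the measured error back into the next prediction is what lets the estimate track the real temperature after the warming-up phase.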
App-driven DTM for video coding • Application-level thermal management – Design-time Pareto analysis of different configurations • Set of Pareto-optimal configuration points
– Run-time configuration selection • Select optimal configuration targeting – Temperature – PSNR – Bit-rate
App-driven DTM for video coding • Design-time Pareto analysis
Algorithm I Extraction of Pareto optimal curve
Input: Configuration points C, Video Sequences V;
1: let T be all temperature points;
2: for each v ∈ V and each c ∈ C do:
3:   encode vi with configuration ci and get temperature ti;
4:   get biti and psnri;
5:   update point ti in T with biti and psnri;
6: let D be the desired temperature points;
7: for each v ∈ V and each d ∈ D do:
8:   select ci while maximizing{psnri} and minimizing{biti} to satisfy di;
Parameters   Values
QP           22, 27, 32, 37
RF           1, 2, 4
SA           128, 64, 32
CU           64, 32, 16
[Figure: PSNR loss (dB) and BR increase (kbps) versus temperature reduction (°C), showing the configuration points and the optimal curve]
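The extraction of the Pareto-optimal points can be sketched as follows, assuming each configuration has already been encoded and summarized as a tuple (temperature reduction, PSNR loss, bit-rate increase). The configuration names and all numbers are made up, and breaking ties by PSNR loss first and bit rate second is an assumption of this sketch, not something stated in Algorithm I.

```python
def dominates(a, b):
    """True if a is at least as good as b in every metric and better in one.

    Points are (temperature_reduction, psnr_loss, bitrate_increase):
    higher reduction is better, lower loss and lower increase are better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better

def pareto_front(points):
    """Keep only the configurations that no other configuration dominates."""
    return {name: p for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())}

def select_configuration(front, desired_reduction):
    """Pick the point meeting the desired reduction with the lowest penalty."""
    feasible = {n: p for n, p in front.items() if p[0] >= desired_reduction}
    if not feasible:
        return None
    return min(feasible, key=lambda n: (feasible[n][1], feasible[n][2]))

# Made-up measurements for four hypothetical configurations.
points = {
    "A": (2.0, 0.1, 10.0),
    "B": (6.0, 0.5, 40.0),
    "C": (5.0, 0.9, 80.0),   # dominated by B
    "D": (10.0, 1.5, 120.0),
}
front = pareto_front(points)
chosen = select_configuration(front, desired_reduction=5.0)
```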
App-driven DTM for video coding • Run-time adaptive temperature management
Algorithm II Run Time Adaptive Temperature Optimization
Input: Pareto points P, video V, Temperature Threshold Tth;
1: error_list = [];
2: c = initial configuration;
3: Tcurrent = measure_temperature();
4: for each frame f ∈ V do:
5:   Ccurrent = classify_complexity(f);
6:   ∆T = Tv(Ccurrent, Cprevious) + mean(error_list); //prediction
7:   Tp = Tcurrent + ∆T;
8:   if Tp > Tth do:
9:     c = pareto_selection(P, Tth); //reaction: configuration selection
10:  encode(f, c);
11:  Tcurrent = measure_temperature();
12:  error = Tcurrent – Tp; //prediction error update
13:  update_error_list(error);
14:  Cprevious = Ccurrent;
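Algorithm II can be turned into a small runnable loop once the platform-specific steps are passed in as callables. The sketch below follows the pseudocode line by line; the callables `measure`, `classify`, `select` and `encode` are stand-ins supplied by the caller (here simple stubs with made-up temperatures), not real encoder or sensor APIs.

```python
def run_time_dtm(frames, t_th, t_v, measure, classify, select, encode, initial_config):
    """Follow Algorithm II and return the (configuration, T_p) used per frame."""
    errors = []
    config = initial_config
    t_current = measure()
    previous = None
    log = []
    for frame in frames:
        current = classify(frame)
        mean_error = sum(errors) / len(errors) if errors else 0.0
        # Prediction: T_p = T_current + T_v(C_current, C_previous) + mean(error_list)
        t_p = t_current + t_v.get((previous, current), 0.0) + mean_error
        if t_p > t_th:
            config = select(t_th)  # reaction: pick a cooler Pareto configuration
        encode(frame, config)
        t_current = measure()
        errors.append(t_current - t_p)  # prediction error update
        previous = current
        log.append((config, t_p))
    return log

# Stubs with made-up values, only to exercise the control flow.
temperatures = iter([50.0, 52.0, 55.0, 53.0])
log = run_time_dtm(
    frames=["f0", "f1", "f2"],
    t_th=54.0,
    t_v={("high", "high"): 1.0},
    measure=lambda: next(temperatures),
    classify=lambda frame: "high",
    select=lambda t_th: "cool",
    encode=lambda frame, config: None,
    initial_config="default",
)
```

In this toy run the default configuration is kept for the first frame and replaced as soon as the predicted temperature exceeds the 54 °C threshold.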
App-driven DTM for video coding • Experimental results (temperature)
[Figure: share of Low, Medium and High complexity frames for Party, R.Horses, Keiba, Basket and BQMall]
[Figure: peak temperature (ºC) per frame for Keiba, PartyScene and Basketball, with no optimization and with temperature thresholds of 54 °C, 50 °C and 46 °C; thermal maps with a color scale from 38 ºC to 56 ºC]
App-driven DTM for video coding • Experimental results (quality)
[Figure: bit rate (10x kbps) and PSNR (dB) for R. Horses, Keiba, Party, B. Drill and BQMall, with no optimization and with temperature thresholds of 54 °C, 50 °C and 46 °C]
• Bit-rate increase of 0.99% on average • At 46 ºC, PSNR loss not higher than 1.81 dB
[Figure: PSNR (dB) comparison with (Lee @ TVLSI’08) for R. Horses, Keiba, Party, B. Drill and BQMall; legend: Our, 10%, 20%, 50%; annotation: PSNR loss up to 20 dB]
Temperature-aware solutions
• Thermal optimization using approximate computing
Thermal optimization for video • Approximate computing (AC) – Way to improve power efficiency – Compromise quality within tolerable ranges
• AC at different levels of the computing stack – Circuit (Ramasubramanian @ DAC’13), (Gupta @ TVLSI’14) – Architectures (Chippa @ ISLPED’14), (Chippa @ ISLPED’14) – Application (Venkataramani @ ISLPED’14), (Chakradhar @ DAC’10) • Not yet explored for thermal profile optimization • Video coding is a well-suited application for approximate computing • Inherent resilience of various functional blocks • Can tolerate varying degrees of error in the output quality
Thermal optimization for video • Thermal optimization using adaptive approximate computing for HEVC – Error tolerance analysis for video coding – Thermal optimization through adaptive approximate computing
Thermal optimization for video • Error tolerance analysis for video coding – Lots of possibilities to encode one CTU (Coding Tree Unit) • Pixel level functions using SAD (Sum of Absolute Differences)
– All operations can be approximated at different levels • Circuit (Gupta@ISLPED’11) • Loop perforation (Duskos@ESEC/FSW’11) • Data approximations (Shafique@DAC’12)
[Figure: quad-tree coding structure (64x64, 32x32, 16x16, 8x8) and example of data approximations (1:2, 1:4, 1:8)]
Thermal optimization for video • Analyzing the error tolerance of HEVC under different application-level approximations – Loop perforation to prune the quad-tree coding structure – Data approximations through pixel sub-sampling during the prediction process
Approximate Mode   Maximum Quad-Tree Depth   Data Sub-Sampling
AM-0               4 (until 8x8)             1:1
AM-1               3 (until 16x16)           1:2
AM-2               2 (until 32x32)           1:4
AM-3               1 (until 64x64)           1:8
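The data sub-sampling column of the table can be illustrated with an approximate SAD that only visits a fraction of the samples. This is a sketch under a stated assumption: the slides do not say which samples are kept, so taking every k-th sample in raster order is a choice made here for illustration, and `approximate_sad` is a hypothetical name.

```python
import numpy as np

# Sub-sampling ratio 1:k per approximate mode, as listed in the table.
SUBSAMPLING_STEP = {"AM-0": 1, "AM-1": 2, "AM-2": 4, "AM-3": 8}

def approximate_sad(current, candidate, mode):
    """SAD computed on every k-th sample (raster order) of the two blocks."""
    step = SUBSAMPLING_STEP[mode]
    a = current.astype(np.int64).ravel()[::step]
    b = candidate.astype(np.int64).ravel()[::step]
    return int(np.abs(a - b).sum())

# Toy 8x8 blocks that differ by 1 in every sample.
current = np.zeros((8, 8), dtype=np.uint8)
candidate = np.ones((8, 8), dtype=np.uint8)
costs = {mode: approximate_sad(current, candidate, mode) for mode in SUBSAMPLING_STEP}
```

The number of visited samples, and therefore the work per comparison, drops by the sub-sampling factor, at the price of a less precise similarity value.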
Thermal optimization for video • Analyzing the error tolerance of HEVC under different application-level approximations – Approximation modes are applied CU by CU for all sequences – Metrics to evaluate • BD-PSNR visual quality (dB) to illustrate the error tolerance of a given sequence • Computational complexity reduction (sec) is used as the abstract metric to show the potential for improving the temperature profile
Thermal optimization for video • Effects of approximation modes
[Figure: BD-PSNR loss (dB) of AM-1, AM-2 and AM-3 for BQMall, BQTerrace, BasketDrive, BasketDrill, RaceHorses and Cactus]
• Quality loss increases as the AM modes are stronger • High motion/texture regions are encoded with bigger blocks
[Figure: normalized time for each CU* under AM-0, AM-1, AM-2 and AM-3]
• Workload reduces more as the AM modes are stronger *Color maps for BasketballDrive sequence
Thermal optimization for video • AM modes selectively applied
[Figure: low detailed and high detailed regions of BasketballDrive and BQTerrace; BD-PSNR loss (dB) and normalized workload of AM-1, AM-2 and AM-3 applied to all regions and to low detailed regions only]
• Very low quality losses for all AM modes (less than 0.1 dB) • Still good workload reduction for all AM modes
Thermal optimization for video • Summary of analysis – Approximate computing can provide significant workload reductions with quality penalties – Tradeoff (workload reduction vs. quality loss) can be improved • Adaptive approximate computing • Error tolerance/resilience properties of different regions • Error resilience as a function of texture/motion properties • We propose a temperature optimization technique that adaptively employs different approximate computing modes to reduce the temperature associated with the HEVC video coding process with low quality degradation
Thermal optimization for video • Thermal optimization through adaptive approximate computing – Error resilience classification • As a function of content properties
– Content-driven adaptive approximation management • Approximation modes adaptively applied during encoding
Thermal optimization for video • Error resilience classification
$\upsilon_{CU} = \frac{1}{4096} \sum_{i=1}^{4096} (\rho_i - \rho_{avg})^2$ (SHAFIQUE @ DAC’12)
[Figure: normalized CU variance maps (scale 0 to 1) for BQTerrace and BasketballDrive]
Thermal optimization for video • Error resilience classification – Four resilience levels to evaluate the proposed concepts
$\Gamma = \begin{cases} Resilient & \text{if } norm\_\upsilon_{CU} < Th_{v1} \\ Medium\ resilient & \text{if } Th_{v1} \le norm\_\upsilon_{CU} < Th_{v2} \\ Medium\ sensitive & \text{if } Th_{v2} \le norm\_\upsilon_{CU} < Th_{v3} \\ Sensitive & \text{if } norm\_\upsilon_{CU} \ge Th_{v3} \end{cases}$
– Threshold values obtained through regression analysis – Step size of 0.05 – Small subset of test video sequences with diverse motion/texture properties
Thresholds   $Th_{v1}$   $Th_{v2}$   $Th_{v3}$
Values       0.1         0.2         0.3
Thermal optimization for video • Content-driven adaptive approximation management
Algorithm I Approximate mode selection heuristic.
Input: sequence S;
1: v_list = [ ];
2: for each frame f ∈ S do:
3:   for each CU cu ∈ f do: //variance extraction for each CU
4:     vCU = extract_variance(cu);
5:     update_v_list(vCU);
6:   for each CU cu ∈ f do: //AM mode selection for each CU
7:     norm_vCU = [vCU – min(v_list)] / [max(v_list) – min(v_list)];
8:     case norm_vCU:
9:       resilient: encode(cu, AM-3);
10:      medium resilient: encode(cu, AM-2);
11:      medium sensitive: encode(cu, AM-1);
12:      sensitive: encode(cu, AM-0);
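The mode selection heuristic can be sketched for one frame as below, using the thresholds 0.1, 0.2 and 0.3 given in the slides. The function name `select_modes`, the handling of a frame whose CUs all have the same variance, and the input variances are assumptions of this sketch.

```python
import numpy as np

THRESHOLDS = (0.1, 0.2, 0.3)              # Th_v1, Th_v2, Th_v3 from the slides
MODES = ("AM-3", "AM-2", "AM-1", "AM-0")  # resilient ... sensitive

def select_modes(cu_variances, thresholds=THRESHOLDS):
    """Return the approximate mode chosen for each CU of one frame."""
    v = np.asarray(cu_variances, dtype=np.float64)
    span = v.max() - v.min()
    # Min-max normalization; a frame with identical CUs is treated as all
    # resilient (assumption, the slides do not cover this case).
    norm = (v - v.min()) / span if span > 0 else np.zeros_like(v)
    # side="right" makes a value equal to a threshold fall in the upper class,
    # matching Th_v1 <= norm < Th_v2 and so on.
    levels = np.searchsorted(thresholds, norm, side="right")
    return [MODES[i] for i in levels]

# Made-up CU variances: normalized values are 0.0, 0.15, 0.25 and 1.0.
modes = select_modes([0.0, 15.0, 25.0, 100.0])
```

Smooth CUs get the strongest approximation (AM-3) and highly textured CUs are encoded without approximation (AM-0).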
Thermal optimization for video • Experimental setup – Digital Thermal Sensor (DTS) setup – Tool chain setup – All results collected with HEVC test Model 16.0 – Thresholds obtained from BQTerrace and BasketballDrive – Videos recommended by video community
Thermal optimization for video • Experimental results (Temperature) – DTS
[Figure: temperature (°C) per frame with AC OFF and AC ON for BasketballDrill, BasketballDrive, BQTerrace and Cactus]
– Thermal maps from tool chain
[Figure: thermal maps of the cores and L2 for BQMall and BasketDrill with AC off and AC on; color scale from 46 °C to 60 °C]
• Our technique successfully improves temperature profile of video coding systems • Average temperature reduction of 10 ºC
Thermal optimization for video • Quality results and comparison
[Figure: BD-PSNR loss (dB) of AC ON and static AM-3 for BQMall, BQTerrace, BasketDrive, BasketDrill, RaceHorses, Cactus and Keiba; average loss of 0.66 dB]
• Low quality degradation when using our approach • Better quality results when compared to static AM-3
[Figure: PSNR (dB) of Our AC, (PALOMINO, SHAFIQUE, et al., 2014) and 20% (LEE, PATEL and PEDRAM, 2008) for RaceHorses, Keiba, BasketDrill and BQMall; annotation: better or similar results]
• Our technique outperforms quality of previous works
Temperature-aware solutions
• Application-driven thermal-aware scheduling
Thermal-aware scheduling • Thread’s workload in video coding systems
[Figure: normalized workload maps (scale 0 to 1) for PartyScene frame 2 and BasketballDrive frame 3; workload (%) of threads t0 to t3 over 50 frames for PartyScene 832x480 and BasketballDrive 1920x1080]
Thermal-aware scheduling • Thread’s workload in video coding systems
[Figure: workload variation (%) of threads t0 to t3 for PartyScene 832x480 and BasketballDrive 1920x1080]
• PartyScene 832x480: up to 40% computational complexity difference between threads • BasketballDrive 1920x1080: up to 60% computational complexity difference between threads
Thermal-aware scheduling
[Figure: normalized workload per thread (scale 0 to 1) and the corresponding thermal maps for 2 threads (Thread 0 to 1), 4 threads (Thread 0 to 3) and 8 threads (Thread 0 to 7); color scales from 47 °C to 79 °C, 49 °C to 81 °C and 51 °C to 83 °C]
Thermal-aware scheduling • Summary of analysis – The unbalanced nature of multithreaded HEVC directly impacts the temperature profile – Threads with different workload demands will generate unbalanced thermal profiles • High spatial temperature gradients degrade reliability
– However, workload distribution is mainly affected by video properties • This can be used to schedule threads appropriately • Towards reducing spatial temperature gradients
Thermal-aware scheduling • Application-driven scheduling scheme – Application-level thread workload prediction – Temperature-aware scheduler
[Figure: the HEVC encoder application (Thread 0 to Thread 3) provides the threads' workload and the multicore platform provides the current thermal status to the application-driven temperature-aware scheduler; threads are shown before and after scheduling onto the cores]
Thermal-aware scheduling • Problem formulation – Given a set of n cores $C = \{c_0, c_1, \dots, c_n\}$ – Given a set of m threads $T = \{t_0, t_1, \dots, t_m\}$ – The main goal is to assign all threads in T to the cores in C • Minimize spatial temperature gradients
– Goal of temperature-aware scheduling • Assign each thread to a core • Considering current temperature status $TS = \{ts_{c_0}, ts_{c_1}, \dots, ts_{c_n}\}$ • Considering future workload of threads $W = \{w_{t_0}, w_{t_1}, \dots, w_{t_m}\}$
Thermal-aware scheduling • Application-level thread workload prediction – Workloads of neighboring frames are similar • Temporal content correlation
[Figure: workload of the previous frame versus workload of the current frame for PartyScene 832x480 and BasketballDrive 1920x1080; correlation coefficients ρ=0.88 and ρ=0.99]
Thermal-aware scheduling • Workload distribution variation – Video properties can guide possible workload variation between frames • Minimizing workload prediction errors
$wp_{t_i} = f(WM, TLC, TLP) = wm_{t_i} + \Delta tl_{t_i}$
$\Delta tl_{t_i} = tlc_{t_i} - tlp_{t_i}$
• $wp_{t_i}$: workload prediction of the i-th thread • $WM = \{wm_{t_0}, wm_{t_1}, \dots, wm_{t_m}\}$: workload from content-temporal correlation • $TLC = \{tlc_{t_0}, tlc_{t_1}, \dots, tlc_{t_m}\}$: texture levels of current frame • $TLP = \{tlp_{t_0}, tlp_{t_1}, \dots, tlp_{t_m}\}$: texture levels of previous frame
Thermal-aware scheduling • Prediction error – Less than 0.1%
[Figure: histograms of prediction error (%) for BQMall 832x480 and RaceHorses 832x480]
Thermal-aware scheduling • Temperature-aware scheduler
Algorithm 7.1 Application-driven temperature-aware scheduler scheme
Input: threads T, cores C, Spatial Temperature Gradient Threshold STGth;
1: for each frame f ∈ V do:
2:   WP = [ ]
3:   TS = [ ]
4:   for each thread t ∈ T do: //monitoring variables
5:     WM.extract()
6:     TLC.extract()
7:     TLP.extract()
8:     wpti = f(WM,TLC,TLP)
9:     WP.append(wpti)
10:  end for
11:  for each core c ∈ C do:
12:    tsi = get_temperature(ci)
13:    TS.append(tsi)
14:  end for
15:  current_STG = max(TS) – min(TS)
16:  if current_STG > STGth do: //reacting to large spatial gradients
17:    WP.sort()
18:    TS.reverse_sort()
19:    schedule(T,WP,TS) //reaction
20: end for
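The workload prediction and the reaction step of the scheduler can be sketched together as follows. The sketch assumes one thread per core, texture levels expressed on the same scale as the workload, and a pairing in which the lightest predicted thread goes to the hottest core; the function names and all numbers are made up for illustration.

```python
def predict_workloads(wm, tlc, tlp):
    """wp_ti = wm_ti + (tlc_ti - tlp_ti) for every thread."""
    return [w + (c - p) for w, c, p in zip(wm, tlc, tlp)]

def schedule(predicted_workloads, core_temperatures, stg_threshold, current_mapping):
    """Return a thread -> core mapping.

    The mapping changes only when the spatial temperature gradient
    max(TS) - min(TS) exceeds the threshold.
    """
    gradient = max(core_temperatures) - min(core_temperatures)
    if gradient <= stg_threshold:
        return dict(current_mapping)
    # Threads by ascending predicted workload, cores by descending temperature,
    # so the lightest thread lands on the hottest core.
    threads = sorted(range(len(predicted_workloads)),
                     key=lambda t: predicted_workloads[t])
    cores = sorted(range(len(core_temperatures)),
                   key=lambda c: core_temperatures[c], reverse=True)
    return dict(zip(threads, cores))

# Made-up monitoring data for 4 threads on 4 cores.
wp = predict_workloads(wm=[0.25, 0.25, 0.25, 0.25],
                       tlc=[0.5, 0.25, 0.5, 0.25],
                       tlp=[0.375, 0.25, 0.5, 0.375])
identity = {0: 0, 1: 1, 2: 2, 3: 3}
mapping = schedule(wp, [70.0, 62.0, 66.0, 60.0], stg_threshold=5.0,
                   current_mapping=identity)
```

With these values the heaviest thread (thread 0) moves to the coolest core (core 3), which is the behavior intended to flatten spatial temperature gradients.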
Thermal-aware scheduling • Experimental results
[Figure: per-core temperatures (60 to 90) with the unaware scheduler and with our scheduler, for 2 cores (Core_0, Core_1), 4 cores (core_0 to core_3) and 8 cores (core_0 to core_7)]
Thermal-aware scheduling • Experimental results
                  unaware scheduler        Our scheduler (STGth = 5 ºC)
#cores            2       4       8        2       4       8
Max temp (°C)     83.2    84.3    83.2     78.8    79.8    76.9
%time > 5 °C      43.7    44.3    43.7     0.0     0.0     0.0
%time > 10 °C     4.6     4.6     5.6      0.0     0.0     0.0
Thermal-aware scheduling
[Figure: thermal maps; color scales from 47 °C to 79 °C, 49 °C to 81 °C and 51 °C to 83 °C]
Conclusions • We presented different solutions for thermal optimization of video coding systems – Application-driven DTM for HEVC • Run-time encoder configuration selection
– Temperature optimization using adaptive approximate computing • Approximate computing levels adaptively applied
– Application-driven thermal-aware scheduling • Scheduling scheme to reduce spatial temperature gradients of multicore systems
Conclusions • Challenge – Improve thermal profiles of video coding systems
• Main idea – Raise the abstraction of temperature management to the application-level – Use video properties and application characteristics to drive the temperature solutions
• Thesis results demonstrated that thermal management can be performed at the application level successfully
Thank you! Questions?