MINISTÉRIO DA EDUCAÇÃO UNIVERSIDADE FEDERAL DE PELOTAS PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO

Application-driven temperature-aware solutions for video coding Professor: Daniel Palomino

Prof. Daniel Palomino

Who am I? • Education – B.Sc. in Computer Science (UFPel) – M.Sc. and Ph.D. in Computer Science (UFRGS) • Sandwich (visiting) research stay in Karlsruhe, Germany (KIT)

• Currently – Professor at the Centro de Engenharias (UFPel) – Researcher at the Video Technology Research Group (ViTech)

Outline • Introduction • State-of-the-art • Video coding overview • Temperature methodology • Application-driven temperature-aware solutions • Conclusions and future work • Publications

Introduction • Video services have spread widely in recent years • According to [CISCO, 2016], mobile video services will consume 75% of global internet traffic by 2020

• Video coding plays an important role • New standards, such as HEVC (High Efficiency Video Coding), have been released to provide high coding efficiency • However, the computational complexity of the video coding process increases

Introduction • The high complexity of HEVC is well complemented by continuous technology scaling and microarchitecture advancements – These provided high-performance computing systems – Many processing cores running in parallel – High frequencies • However, the expected voltage scaling did not follow transistor shrinking at the same rate • This resulted in high power densities and, consequently, high on-chip temperatures (hotspots)

Introduction • High temperature effects – Higher cooling costs • Area and power

– Chip reliability and lifetime • Most aging effects are aggravated at high temperatures • For instance, NBTI and HCI cause threshold voltage elevation • Electromigration, stress migration and dielectric breakdown

– Spatial/temporal variations in temperature also have a negative impact on reliability and performance

• Therefore, temperature management/reduction is one of the primary design objectives, especially for embedded systems that have limited power and area to employ complex thermal solutions.

Introduction • Video processing is an important application domain with respect to temperature – Very data and compute intensive – Computational complexity drastically increases with new video coding standards (like HEVC) and trends of Multiview and 3D videos – High workload variations translate to temperature variations • Spatial/temporal temperature gradients

• We present a set of designed thermal solutions targeting video coding using • Application-specific characteristics • Video properties


State-of-the-art • Many works in the literature deal with temperature issues at different levels of the computing stack – Hardware-level – System-level

• Traditional techniques to reduce temperature – Handled at package level by heat sinks and coolers – Design-time techniques (HUNG, ADDO-QUAYE, et al., 2004) (ZHANG, OGRENCI-MEMIK, et al., 2015)

State-of-the-art • In this thesis, the main focus is Dynamic Temperature Management (DTM) – Employed in most works in the literature to deal with temperature problems at run-time – DTM uses diverse low-power techniques to control temperature • Task migration • Dynamic Voltage and Frequency Scaling (DVFS) • Clock/power gating

State-of-the-art • At system-level – Scheduling using task migration (multi-core scenario) • Temperature history-based prediction • Hardware performance counter prediction

– Works • (YEO, LIU and KIM @ ISLPED’08) • (COSKUN, ROSING and GROSS @ ICCAD’08) • (COSKUN, ROSING, et al. @ TVLSI’08) • (COSKUN, ROSING and GROSS @ TCAD’09) • (KUMAR, SHANG, et al. @ TCAD’08) • (CHO, KERSEY, et al. @ TCPMT’13)

State-of-the-art • At hardware-level – Run-time hardware configuration • Model predictive controllers • Dynamic Voltage and Frequency Scaling (DVFS) • Clock gating

– Works • (COSKUN, ROSING and GROSS @ ICCAD’08) • (COSKUN, ROSING, et al. @ TVLSI’08) • (COSKUN, ROSING and GROSS @ TCAD’09) • (KUMAR, SHANG, et al. @ TCAD’08) • (FISHER, CHEN, et al. @ JSA’11) • (EBI, FARUQUE and HENKEL @ ICCAD’09) • (EBI, KRAMER, et al. @ CODES+ISSS’11) • (ZANINI, ATIENZA, et al. @ ECCTD’09) • (BARTOLINI, CACCIARI, et al. @ DATE’11) • (BARTOLINI, CACCIARI, et al. @ TPDS’13)

State-of-the-art [Figure: classification of the DTM works by computing-stack level (hardware-level, system-level, application-level) and by target platform (single-core, multi-core)]

• Works placed in the figure: (FISHER, CHEN, et al., 2009) (EBI, FARUQUE and HENKEL, 2009) (EBI, KRAMER, et al., 2011) (ZANINI, ATIENZA, et al., 2009) (BARTOLINI, CACCIARI, et al., 2011) (BARTOLINI, CACCIARI, et al., 2013) (YEO, LIU and KIM, 2008) (COSKUN, ROSING and GROSS, 2008) (COSKUN, ROSING, et al., 2008) (COSKUN, ROSING and GROSS, 2009) (KUMAR, SHANG, et al., 2008) (CHO, KERSEY, et al., 2013)

State-of-the-art • Summary of DTM techniques – All the mentioned works sacrifice system performance in exchange for better thermal profiles – None of these works considers application characteristics when designing the thermal solutions – The performance degradation may be intolerable for some applications (real-time applications) – The system-level schedulers assume that the workloads of different threads are always the same • Video coding is an example where workloads can vary abruptly between threads and over time • Temperature history-based schedulers may fail to predict future application behavior

State-of-the-art • DTM techniques for video coding/decoding – Take multimedia application general characteristics to perform better temperature management – (SRINIVASAN and ADVE @ ICS’03) – (LEE, PATEL and PEDRAM @ ISLPED’06) – (LEE, PATEL and PEDRAM @ TVLSI’2008) – (YEO and KIM, 2008) – (MARCU, MILOS and TUDOR, 2010) – (FORTE and SRIVASTAVA, 2010) – (MIRTAR, DEY and RAGHUNATHAN, 2012)


State-of-the-art • Summary of DTM for video coding – They do not account for the application-specific characteristics and video content properties • Which may provide a potential for more efficient temperature management for video coding

– All these works target old video coding standards (such as MPEG-2 and H.264/AVC) – Therefore, they may not be efficiently applied to highly complex new coding standards (like HEVC)

• In order to address the thermal-related challenges, there is a need for application-driven thermal management solutions for efficient thermal control of video coding systems while providing high video quality • Also, it is important to consider the recent and more complex video coding tools • Finally, application-specific characteristics and video content properties can be used to improve temperature profiles of video coding systems

Goals • We address these goals with the following contributions
1. Thermal analysis and workload distribution on video coding
2. Relationships between video coding characteristics and video content properties with temperature
3. Application-driven dynamic thermal management for video encoders
4. Thermal optimization using adaptive approximate computing for video coding
5. Application-driven temperature-aware scheduling of multithreaded workloads on multi-core systems

Video coding overview • The main goal of video coding is to represent the large amount of data in digital videos with as little information as possible – Reducing data redundancies (spatial, temporal, entropic)

[Figure: encoder block diagram: Input Video → Prediction (Spatial, Temporal, Inter-view) → Residual (Transforms, Quantization) → Entropy (CAVLC, CABAC) → Coded Video]

Video coding overview • Motion estimation – Find the best representation of the current block being encoded in the temporally neighboring frames

[Figure: the current block of the current frame is compared, using a similarity criterion, against candidate blocks inside a search area of a reference frame taken from the reference frames list]

$\text{Similarity value} = \min_{i=1}^{N} \left( CurrentBlock - CandidateBlock_i \right)$
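A minimal Python sketch of the block matching described on this slide. It assumes the similarity criterion is the Sum of Absolute Differences (SAD, which the deck mentions later for pixel-level functions) and an exhaustive search over a square window. The names `sad`, `crop` and `full_search` and the `search_range` parameter are illustrative and are not taken from the HM software.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(current_block, reference_frame, top, left, search_range):
    """Test every candidate block inside the search area and keep the best one.

    (top, left) is the position of the current block in its own frame.
    Returns the motion vector (dy, dx) and the SAD of the best candidate.
    """
    size = len(current_block)
    height, width = len(reference_frame), len(reference_frame[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(current_block, crop(reference_frame, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

The number of candidates grows with the square of `search_range`, which is one reason the search area appears among the encoder parameters analyzed later in the deck.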

Video coding overview • HEVC standard – State-of-the-art video coding standard – New coding tools – Coding Tree Unit (CTU) based structure • Each CTU can be recursively divided into smaller Coding Units (CU) • This high number of possibilities to encode one CTU greatly improves the HEVC coding efficiency in terms of bit-rate and visual quality when compared to older standards • However, the computational complexity of processing all CTUs considering all CU sizes increases substantially

Video coding overview • Opportunities for temperature optimization in video coding – Natural resilience of the application to errors – Application workload depends on content – High configurability – Trade-off between temperature and compression efficiency – Parallel video coding tools do not distribute workload efficiently

Temperature Methodology • In this work we used three different methodologies to collect temperature results – IR-Camera setup – Tool chain setup – DTS setup


Temperature Methodology • Infrared camera setup (CES-KIT, 2013) – IR camera: accuracy of ±1 °C, spatial resolution of 50 µm per pixel, 50 Hz – Water-cooling unit to cool down the thermoelectric device – CPU chip: 1.8 GHz Intel Atom 45 nm dual-core processor – Linux Ubuntu kernel

[Figure: photo of the setup showing the voltage supply, the IR camera, the CPU chip and the resulting thermal map]

Temperature Methodology • Tool chain setup

[Figure: temperature methodology using the tool chain: the application runs on the cores of the gem5 simulator, which produces core usage statistics; the McPAT power simulator turns them into power traces of the cores; the HotSpot thermal modeling tool turns those into thermal profiles]

Temperature Methodology • Digital thermal sensor (DTS) – Present in most modern processors – Independent real-time measurement of core temperatures • Intel i7 processor • Linux monitoring sensors software • The video coding application is restricted to one core • Temperature readings are performed every 500 milliseconds
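A hedged sketch of how such periodic readings could be scripted on Linux. The slide only says "Linux monitoring sensors software"; the code below is a swapped-in alternative, not the author's tool. It reads the kernel's sysfs thermal interface, where a `temp` file reports millidegrees Celsius. The path `/sys/class/thermal/thermal_zone0/temp` is an assumption, since which zone corresponds to a CPU core differs between machines, and the helper names are illustrative.

```python
import os
import time


def millidegrees_to_celsius(raw):
    """Convert the text of a sysfs 'temp' file (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def pin_to_core(core_id):
    """Restrict the calling process to one core (Linux-only call)."""
    os.sched_setaffinity(0, {core_id})


def sample_temperature(path, samples, period_s=0.5):
    """Read the sensor file `samples` times, waiting `period_s` between reads."""
    readings = []
    for _ in range(samples):
        with open(path) as sensor:
            readings.append(millidegrees_to_celsius(sensor.read()))
        time.sleep(period_s)
    return readings
```

A call such as `sample_temperature("/sys/class/thermal/thermal_zone0/temp", samples=10)` would then mirror the 500 ms sampling period stated on the slide, provided that zone exists on the machine.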

Temperature Methodology • Video coding experiments methodology – All experiments used the HEVC standard – All sequences are encoded using the HEVC test model (HM) software – Recommendations of the standardization committee (BOSSEN, 2012)

Sequence          Resolution (pixels)
BQMall            832x480
BasketballDrill   832x480
RaceHorses        832x480
Keiba             832x480
PartyScene        832x480
BQTerrace         1920x1080
BasketballDrive   1920x1080
Cactus            1920x1080

Temperature-aware solutions • Application-driven dynamic thermal management for HEVC • Thermal optimization using approximate computing • Application-driven thermal-aware scheduling

App-driven DTM for video coding • Application-driven dynamic thermal management for HEVC (using infra-red camera) – Thermal analysis of HEVC • Different sequences (content properties) • Different coding parameters

– Application-driven DTM for video coding • Application-level temperature prediction • Application-level thermal management


App-driven DTM for video coding • Thermal analysis for different sequences

[Figure: temperature (°C) over time (sec) for RaceHorses and BQMall, with peak1 and peak2 marked and the corresponding thermal maps (44 to 62 °C scale); panel labels: RaceHorses (1.8 GHz), BQMall (1.35 GHz); annotated values: 53.9 ºC, 56.4 ºC and "5 ºC higher"]

App-driven DTM for video coding • Thermal analysis of different sequences – Efficiency and complexity of video coding are driven by video properties – Complexity classification (SHAFIQUE @ DAC’12) • Texture intensity using the variance of luminance samples

$v_f = \frac{1}{n \times m} \sum_{i=0}^{n \times m} (\rho_i - \rho_{avg})^2$

$C_f = \begin{cases} low & \text{if } v_f \le Th_{v1} \\ medium & \text{if } Th_{v1} < v_f \le Th_{v2} \\ high & \text{if } v_f > Th_{v2} \end{cases}$

[Figure: thermal maps for Low, Medium and High complexity]

• Temperature difference up to 10 ºC
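The classification on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame is passed as a flat list of luminance samples, and the threshold arguments `th_v1` and `th_v2` are left to the caller because the slide does not give their values.

```python
def luminance_variance(samples):
    """Variance of the luminance samples of one frame (v_f on the slide)."""
    count = len(samples)
    average = sum(samples) / count
    return sum((sample - average) ** 2 for sample in samples) / count


def classify_complexity(variance, th_v1, th_v2):
    """Map a frame variance to the low / medium / high complexity class (C_f)."""
    if variance <= th_v1:
        return "low"
    if variance <= th_v2:
        return "medium"
    return "high"
```

For example, `classify_complexity(luminance_variance(frame), th_v1, th_v2)` would return the class of one frame once suitable thresholds have been chosen.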

App-driven DTM for video coding • Summary – Video properties directly influence the temperature generated by HEVC encoding – There is potential for applying DTM techniques to keep a safe operating temperature • Core idling • Frequency scaling

• Therefore, one of the key challenges is to use the video content properties to enable application-level temperature management during HEVC encoding

App-driven DTM for video coding • Thermal analysis of different HEVC parameters – Does encoder configuration really matter?

[Figure: thermal maps of RaceHorses at two QP values]
– RaceHorses @ QP 22: Temp max. 55.0 °C, Temp min. 36.0 °C, Temp avg. 53.0 °C
– RaceHorses @ QP 37: Temp max. 53.0 °C, Temp min. 35.0 °C, Temp avg. 49.0 °C

App-driven DTM for video coding • Thermal analysis of different HEVC parameters

[Figure: four panels showing PSNR (dB), bit rate (100x kbps) and average temperature (°C) as a function of QP (22, 27, 32, 37), CTU size (64, 32, 16), number of reference frames (4, 2, 1) and search area (128, 64, 32)]

– Changes in QP, CTU size and number of reference frames affect the resulting temperature – QP also strongly affects bit-rate and PSNR – Search area does not have a significant impact on temperature*

App-driven DTM for video coding • Application-driven DTM for video coding – Problem formulation • T as temperature in °C; • Q as video quality in terms of PSNR; • B as the resulting bit rate; • Goal: keep $T_{current} < T_{th}$ while achieving $\max\{Q\}$ and $\min\{B\}$ • To achieve this goal, temperature management at the application level is performed by appropriately selecting the encoding parameters that determine the workload and affect the resulting temperature

App-driven DTM for video coding • Application-level temperature prediction

$T_p = T_{current} + \Delta T$

$\Delta T = T_v + \frac{\sum_{i=0}^{w} e_i}{w}$, where $T_v$ depends on the frames' complexity (transitions: High to Medium, Medium to High, Low to Medium, Medium to Low)

$e = T_m - T_p$

[Figure: measured versus predicted average temperature (°C) over frames 0 to 30 for PartyScene and Keiba; the prediction starts after a warming-up phase]

• Prediction error of 1.1% on average

App-driven DTM for video coding • Application-level thermal management – Design-time Pareto analysis of different configurations • Set of Pareto-optimal configuration points

– Run-time configuration selection • Select the optimal configuration targeting – Temperature – PSNR – Bit-rate

App-driven DTM for video coding • Design-time Pareto analysis

Algorithm I: Extraction of Pareto-optimal curve
Input: configuration points C, video sequences V
let T be all temperature points;
for each v ∈ V and each c ∈ C do:
    encode v_i with configuration c_i and get temperature t_i;
    get bit_i and psnr_i;
    update point t_i in T with bit_i and psnr_i;
let D be the desired temperature points;
for each v ∈ V and each d ∈ D do:
    select c_i while maximizing{psnr_i} and minimizing{bit_i} to satisfy d_i;

Parameters   Values
QP           22, 27, 32, 37
RF           1, 2, 4
SA           128, 64, 32
CU           64, 32, 16

[Figure: PSNR loss (dB) and bit-rate increase (kbps) versus temperature reduction (°C), showing the configuration points and the optimal curve]
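One possible way to express the Pareto extraction and the later run-time lookup in Python is sketched below. It assumes each configuration point is a dictionary with the hypothetical keys `temp`, `psnr` and `bitrate`, and it uses a plain pairwise dominance test; the slide does not say how the optimal curve was actually computed. The function name `pareto_selection` is borrowed from Algorithm II, but its body here is an assumption.

```python
def dominates(a, b):
    """True if point a is at least as good as b everywhere and better somewhere.

    Lower temperature, higher PSNR and lower bit-rate are considered better.
    """
    no_worse = (a["temp"] <= b["temp"] and a["psnr"] >= b["psnr"]
                and a["bitrate"] <= b["bitrate"])
    better = (a["temp"] < b["temp"] or a["psnr"] > b["psnr"]
              or a["bitrate"] < b["bitrate"])
    return no_worse and better


def pareto_front(points):
    """Keep only the configuration points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def pareto_selection(front, target_temp):
    """Pick the best-quality point whose temperature meets the target.

    Falls back to the coolest point when no point satisfies the target.
    """
    feasible = [p for p in front if p["temp"] <= target_temp]
    if not feasible:
        return min(front, key=lambda p: p["temp"])
    return max(feasible, key=lambda p: (p["psnr"], -p["bitrate"]))
```

The pairwise test is quadratic in the number of points, which should be acceptable for the 4 x 3 x 3 x 3 = 108 parameter combinations listed in the table.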

App-driven DTM for video coding • Run-time adaptive temperature management

Algorithm II: Run-time adaptive temperature optimization
Input: Pareto points P, video V, temperature threshold Tth
error_list = [];
c = initial configuration;
Tcurrent = measure_temperature();
for each frame f ∈ V do:
    // prediction
    Ccurrent = classify_complexity(f);
    ∆T = Tv(Ccurrent, Cprevious) + mean(error_list);
    Tp = Tcurrent + ∆T;
    // configuration selection
    if Tp > Tth do:
        c = pareto_selection(P, Tth); // reaction
    encode(f, c);
    // prediction error update
    Tcurrent = measure_temperature();
    error = Tcurrent – Tp;
    update_error_list(error);
    Cprevious = Ccurrent;
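The prediction and error-update steps of Algorithm II could look roughly like the Python sketch below. It is a sketch under assumptions: the class name `TemperaturePredictor`, the `transition_delta` lookup table (standing in for Tv) and the fixed-size error window are illustrative, and any values placed in the table would have to come from measurements that the slides do not list.

```python
from collections import deque


class TemperaturePredictor:
    """Predicts Tp = Tcurrent + Tv + mean(recent prediction errors)."""

    def __init__(self, transition_delta, window=8):
        # transition_delta maps (previous_class, current_class) to Tv in °C.
        self.transition_delta = transition_delta
        # Only the last `window` errors are kept (w in the formula).
        self.errors = deque(maxlen=window)

    def predict(self, t_current, c_previous, c_current):
        """Return the predicted temperature for the next frame."""
        tv = self.transition_delta.get((c_previous, c_current), 0.0)
        mean_error = sum(self.errors) / len(self.errors) if self.errors else 0.0
        return t_current + tv + mean_error

    def update(self, t_measured, t_predicted):
        """Store the error e = Tm - Tp observed after encoding the frame."""
        self.errors.append(t_measured - t_predicted)
```

In the loop of Algorithm II, `predict` would be called before encoding a frame, the result compared against Tth to decide whether a cooler configuration is needed, and `update` called with the temperature measured afterwards. Unknown transitions (including the first frame, where no previous class exists) default to a Tv of zero in this sketch.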

App-driven DTM for video coding • Experimental results (temperature)

[Figure: share of Low, Medium and High complexity frames for Party, R.Horses, Keiba, Basket and BQMall; peak temperature (ºC) over frames 0 to 20 for Keiba, PartyScene and Basketball without optimization and with thresholds of 54 °C, 50 °C and 46 °C; thermal maps on a 38 to 56 ºC scale]

App-driven DTM for video coding • Experimental results (quality)

[Figure: PSNR (dB) and bit rate (10x kbps) for R. Horses, Keiba, Party, B. Drill and BQMall, without optimization and with thresholds of 54 °C, 50 °C and 46 °C]

– Bit-rate increase of 0.99% on average – At 46 ºC, PSNR loss not higher than 1.81 dB

[Figure: PSNR (dB) comparison with (Lee @ TVLSI’08) for R. Horses, Keiba, Party, B. Drill and BQMall; legend: Our, 10%, 20%, 50%; annotation: PSNR loss up to 20 dB]

Temperature-aware solutions

• Thermal optimization using approximate computing


Thermal optimization for video • Approximate computing (AC) – Way to improve power efficiency – Compromise quality within tolerable ranges

• AC at different levels of the computing stack – Circuit (Ramasubramanian @ DAC’13), (Gupta@TVLSI’14) – Architectures (Chippa@ISLPED’14), (Chippa@ISLPED’14) – Application (Venkataramani@ISLPED’14), (Chakradhar@DAC’10) • Not yet explored for thermal profile optimization • Video coding is a well-suited application for approximate computing • Inherent resilience of various functional blocks • Can tolerate varying degrees of error in the output quality

Thermal optimization for video • Thermal optimization using adaptive approximate computing for HEVC – Error tolerance analysis for video coding – Thermal optimization through adaptive approximate computing


Thermal optimization for video • Error tolerance analysis for video coding – Many possibilities to encode one CTU (Coding Tree Unit) • Pixel-level functions using SAD (Sum of Absolute Differences)

– All operations can be approximated at different levels • Circuit (Gupta@ISLPED’11) • Loop perforation (Duskos@ESEC/FSE’11) • Data approximations (Shafique@DAC’12)

[Figure: quad-tree coding structure (64x64, 32x32, 16x16, 8x8) and example of data approximations (sub-sampling 1:2, 1:4, 1:8)]

Thermal optimization for video • Analyzing the error tolerance of HEVC under different application-level approximations – Loop perforation to prune the quad-tree coding structure – Data approximations through pixel sub-sampling during the prediction process

Approximate mode   Maximum quad-tree depth   Data sub-sampling
AM-0               4 (until 8x8)             1:1
AM-1               3 (until 16x16)           1:2
AM-2               2 (until 32x32)           1:4
AM-3               1 (until 64x64)           1:8
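The table of approximate modes can be captured as a small lookup, and the data sub-sampling can be illustrated with a SAD that skips pixels. The sketch below is illustrative only: the slide does not state which pixels are kept, so taking every `step`-th sample in raster order is an assumption, and the names `APPROXIMATE_MODES`, `min_cu_size` and `subsampled_sad` are hypothetical.

```python
# Settings copied from the table: maximum quad-tree depth and 1:N sub-sampling.
APPROXIMATE_MODES = {
    "AM-0": {"max_depth": 4, "subsampling": 1},
    "AM-1": {"max_depth": 3, "subsampling": 2},
    "AM-2": {"max_depth": 2, "subsampling": 4},
    "AM-3": {"max_depth": 1, "subsampling": 8},
}


def min_cu_size(mode, ctu_size=64):
    """Smallest CU edge reachable when the quad-tree is pruned at max_depth."""
    return ctu_size >> (APPROXIMATE_MODES[mode]["max_depth"] - 1)


def subsampled_sad(block_a, block_b, step):
    """SAD computed on every step-th pixel (raster order, assumed pattern)."""
    flat_a = [pixel for row in block_a for pixel in row]
    flat_b = [pixel for row in block_b for pixel in row]
    return sum(abs(a - b) for a, b in zip(flat_a[::step], flat_b[::step]))
```

With `step` set to the mode's `subsampling` value, AM-3 touches one eighth of the pixels that AM-0 does, which is where the workload reduction in this sketch comes from.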

Thermal optimization for video • Analyzing the error tolerance of HEVC under different application-level approximations – Approximation modes are applied CU by CU for all sequences – Evaluation metrics • BD-PSNR visual quality (dB), to illustrate the error tolerance of a given sequence • Computational complexity reduction (sec), used as an abstract metric to show the potential for improving the temperature profile

Thermal optimization for video • Effects of approximation modes

[Figure: BD-PSNR loss (dB) of AM-1, AM-2 and AM-3 for BQMall, BQTerrace, BasketDrive, BasketDrill, RaceHorses and Cactus]

• Quality loss increases as the AM modes get stronger • High motion/texture regions are encoded with bigger blocks

[Figure: color maps of the normalized time for each CU* under AM-0, AM-1, AM-2 and AM-3]

• Workload decreases further as the AM modes get stronger

*Color maps for the BasketballDrive sequence

Thermal optimization for video • AM modes selectively applied

[Figure: BD-PSNR loss (dB) and normalized workload of AM-1, AM-2 and AM-3 for BQTerrace and BasketballDrive, applied to all regions, to low-detail regions and to high-detail regions]

• Low-detail regions: very low quality losses for all AM modes (less than 0.1 dB) • Low-detail regions: still good workload reduction for all AM modes

Thermal optimization for video • Summary of analysis – Approximate computing can provide significant workload reductions with quality penalties – Tradeoff (workload reduction vs. quality loss) can be improved • Adaptive approximate computing • Error tolerance/resilience properties of different regions • Error resilience as a function of texture/motion properties

• We propose a temperature optimization technique that adaptively employs different approximate computing modes to reduce the temperature associated with the HEVC video coding process with low quality degradation

Thermal optimization for video • Thermal optimization through adaptive approximate computing – Error resilience classification • As a function of content properties

– Content-driven adaptive approximation management • Approximation modes adaptively applied during encoding


Thermal optimization for video • Error resilience classification

$\upsilon_{CU} = \frac{1}{4096} \sum_{i=1}^{4096} (\rho_i - \rho_{avg})^2$ (SHAFIQUE @ DAC’12)

[Figure: maps of normalized CU variance (0 to 1 scale) for BQTerrace and BasketballDrive]

Thermal optimization for video • Error resilience classification – Four resilience levels to evaluate the proposed concepts

$\Gamma = \begin{cases} Resilient & \text{if } norm\_\upsilon_{CU} < Th_{v1} \\ Medium\ resilient & \text{if } Th_{v1} \le norm\_\upsilon_{CU} < Th_{v2} \\ Medium\ sensitive & \text{if } Th_{v2} \le norm\_\upsilon_{CU} < Th_{v3} \\ Sensitive & \text{if } norm\_\upsilon_{CU} \ge Th_{v3} \end{cases}$

– Threshold values obtained through regression analysis – Step size of 0.05 – Small subset of test video sequences with diverse motion/texture properties

Thresholds   Th_v1   Th_v2   Th_v3
Values       0.1     0.2     0.3

Thermal optimization for video • Content-driven adaptive approximation management

Algorithm I: Approximate mode selection heuristic
Input: sequence S
v_list = [ ];
for each frame f ∈ S do:
    // variance extraction for each CU
    for each CU cu ∈ f do:
        vCU = extract_variance(cu);
        update_v_list(vCU);
    // AM mode selection for each CU
    for each CU cu ∈ f do:
        norm_vCU = [vCU – min(v_list)] / [max(v_list) – min(v_list)];
        case norm_vCU:
            resilient: encode(cu, AM-3);
            medium resilient: encode(cu, AM-2);
            medium sensitive: encode(cu, AM-1);
            sensitive: encode(cu, AM-0);
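The selection heuristic can be sketched in Python as follows, using the threshold values from the previous slide (0.1, 0.2, 0.3). This is a simplified illustration: it normalizes one list of CU variances at a time, whereas the algorithm keeps accumulating `v_list` across frames, and the function names are hypothetical.

```python
TH_V1, TH_V2, TH_V3 = 0.1, 0.2, 0.3  # thresholds from the classification slide


def normalize(variances):
    """Min-max normalize CU variances to the range [0, 1]."""
    low, high = min(variances), max(variances)
    if high == low:
        # All CUs have the same variance; treat them as the lowest level.
        return [0.0 for _ in variances]
    return [(v - low) / (high - low) for v in variances]


def select_mode(norm_variance):
    """Map a normalized CU variance to an approximate mode."""
    if norm_variance < TH_V1:
        return "AM-3"  # resilient: strongest approximation
    if norm_variance < TH_V2:
        return "AM-2"  # medium resilient
    if norm_variance < TH_V3:
        return "AM-1"  # medium sensitive
    return "AM-0"      # sensitive: no approximation
```

A frame would then be handled as `[select_mode(v) for v in normalize(cu_variances)]`, giving one mode per CU before encoding.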

Thermal optimization for video • Experimental setup – Digital Thermal Sensor (DTS) setup – Tool chain setup – All results collected with the HEVC test Model 16.0 – Thresholds obtained from BQTerrace and BasketballDrive – Videos recommended by the video community

Thermal optimization for video • Experimental results (temperature)

– DTS [Figure: temperature (°C) over frames 0 to 50 with AC off and AC on for BasketballDrill, BasketballDrive, BQTerrace and Cactus]

– Thermal maps from tool chain [Figure: thermal maps of the cores and L2 (46 to 60 °C scale) for BQMall and BasketDrill with AC off and AC on]

• Our technique successfully improves the temperature profile of video coding systems • Average temperature reduction of 10 ºC

Thermal optimization for video • Quality results and comparison

[Figure: BD-PSNR loss (dB) of AC ON versus static AM-3 for BQMall, BQTerrace, BasketDrive, BasketDrill, RaceHorses, Cactus and Keiba; annotation: average loss 0.66 dB]

• Low quality degradation when using our approach • Better quality results when compared to static AM-3

[Figure: PSNR (dB) for RaceHorses, Keiba, BasketDrill and BQMall; legend: Our AC, (PALOMINO, SHAFIQUE, et al., 2014), 20% (LEE, PATEL and PEDRAM, 2008); annotation: better or similar results]

• Our technique outperforms the quality of previous works

Temperature-aware solutions

• Application-driven thermal-aware scheduling


Thermal-aware scheduling • Thread workload in video coding systems

[Figure: normalized workload maps (0 to 1 scale) for PartyScene frame 2 and BasketballDrive frame 3; workload (%) of threads t0 to t3 over frames 0 to 50 for PartyScene 832x480 and BasketballDrive 1920x1080]

Thermal-aware scheduling • Thread workload in video coding systems

[Figure: workload variation (%) of threads t0 to t3 for PartyScene 832x480 and BasketballDrive 1920x1080]

– PartyScene 832x480: up to 40% computational complexity difference between threads – BasketballDrive 1920x1080: up to 60% computational complexity difference between threads

Thermal-aware scheduling

[Figure: normalized workload map and thermal maps of the platforms running threads 0 to 1, threads 0 to 3 and threads 0 to 7, with temperature scales of 47 to 79 °C, 49 to 81 °C and 51 to 83 °C]

Thermal-aware scheduling • Summary of analysis – The unbalanced nature of multithreaded HEVC directly impacts the temperature profile – Threads with different workload demands generate unbalanced thermal profiles • High spatial temperature gradients degrade reliability

– However, workload distribution is mainly affected by video properties • This can be used to schedule threads appropriately • Towards reducing spatial temperature gradients


Thermal-aware scheduling • Application-driven scheduling scheme – Application-level thread workload prediction – Temperature-aware scheduler

[Figure: the HEVC encoder application (threads 0 to 3) passes the thread workloads to the application-driven temperature-aware scheduler, which also receives the current thermal status of the multicore platform and maps the threads to its cores; the threads are shown before and after scheduling]

Thermal-aware scheduling • Problem formulation – Given a set of n cores $C = \{c_0, c_1, \ldots, c_n\}$ – Given a set of m threads $T = \{t_0, t_1, \ldots, t_m\}$ – The main goal is to assign all threads in T to the cores in C • Minimize spatial temperature gradients

– Goal of temperature-aware scheduling • Assign each thread to a core • Considering the current temperature status $TS = \{ts_{c_0}, ts_{c_1}, \ldots, ts_{c_n}\}$ • Considering the future workload of threads $W = \{w_{t_0}, w_{t_1}, \ldots, w_{t_m}\}$

Thermal-aware scheduling • Application-level thread workload prediction – Workloads of neighboring frames are similar • Temporal content correlation

[Figure: scatter plots of the workload of the previous frame versus the workload of the current frame (both 0.1 to 0.4); panels: PartyScene 832x480 and BasketballDrive 1920x1080; correlation labels: ρ=0.88 and ρ=0.99]

Thermal-aware scheduling • Workload distribution variation – Video properties can guide possible workload variation between frames • Minimizing workload prediction errors

$wp_{t_i} = f(WM, TLC, TLP) = wm_{t_i} + \Delta tl_{t_i}$

$\Delta tl_{t_i} = tlc_{t_i} - tlp_{t_i}$

– $wp_{t_i}$: workload prediction of the i-th thread
– $WM = \{wm_{t_0}, wm_{t_1}, \ldots, wm_{t_m}\}$: workload from content-temporal correlation
– $TLC = \{tlc_{t_0}, tlc_{t_1}, \ldots, tlc_{t_m}\}$: texture levels of the current frame
– $TLP = \{tlp_{t_0}, tlp_{t_1}, \ldots, tlp_{t_m}\}$: texture levels of the previous frame
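Read literally, the prediction adds the change in texture level to the workload carried over from the previous frame. A minimal Python sketch follows; it assumes the three inputs are per-thread lists of equal length and that texture levels are already expressed on the same scale as the workload, which the slide does not specify. The function name is illustrative.

```python
def predict_workloads(wm, tlc, tlp):
    """Per-thread prediction wp = wm + (tlc - tlp).

    wm:  workload of each thread taken from the previous frame
    tlc: texture level of each thread's region in the current frame
    tlp: texture level of each thread's region in the previous frame
    """
    if not (len(wm) == len(tlc) == len(tlp)):
        raise ValueError("wm, tlc and tlp must have one entry per thread")
    return [w + (current - previous)
            for w, current, previous in zip(wm, tlc, tlp)]
```

When the texture of a region does not change between frames, the correction term is zero and the prediction reduces to the previous frame's workload.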

Thermal-aware scheduling • Prediction error – Less than 0.1%

[Figure: histograms of prediction error (%) versus frequency for BQMall 832x480 and RaceHorses 832x480]

Thermal-aware scheduling • Temperature-aware scheduler

Algorithm 7.1: Application-driven temperature-aware scheduler scheme
Input: threads T, cores C, spatial temperature gradient threshold STGth
for each frame f ∈ V do:
    WP = [ ]
    TS = [ ]
    // monitoring variables
    for each thread t ∈ T do:
        WM.extract()
        TLC.extract()
        TLP.extract()
        wpti = f(WM, TLC, TLP)
        WP.append(wpti)
    end for
    for each core c ∈ C do:
        tsi = get_temperature(ci)
        TS.append(tsi)
    end for
    current_STG = max(TS) – min(TS)
    // reacting to large spatial gradients
    if current_STG > STGth do:
        WP.sort()
        TS.reverse_sort()
        schedule(T, WP, TS) // reaction
end for
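The reaction step of Algorithm 7.1 sorts the predicted workloads in ascending order and the core temperatures in descending order, which pairs the lightest thread with the hottest core and the heaviest thread with the coolest one. The Python sketch below shows that pairing under simplifying assumptions: one thread per core, dictionaries keyed by hypothetical thread and core names, and a `schedule` function that only returns the new mapping instead of migrating real threads.

```python
def spatial_gradient(core_temps):
    """Difference between the hottest and the coolest core (current_STG)."""
    return max(core_temps.values()) - min(core_temps.values())


def schedule(predicted_workload, core_temps, stg_threshold, current_mapping):
    """Return a thread-to-core mapping for the next frame.

    The mapping is changed only when the spatial temperature gradient
    exceeds the threshold; otherwise the current mapping is kept.
    """
    if spatial_gradient(core_temps) <= stg_threshold:
        return dict(current_mapping)
    # Lightest predicted workload first.
    threads = sorted(predicted_workload, key=predicted_workload.get)
    # Hottest core first.
    cores = sorted(core_temps, key=core_temps.get, reverse=True)
    return dict(zip(threads, cores))
```

With a threshold of 5 °C, as used in the experiments reported on the following slides, the mapping would be left alone as long as the cores stay within 5 °C of each other.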

Thermal-aware scheduling • Experimental results

[Figure: temperature (60 to 90 scale) of each core with the unaware scheduler and with our scheduler, for the platforms with cores 0 to 1, cores 0 to 3 and cores 0 to 7]

Thermal-aware scheduling • Experimental results

                 Unaware scheduler        Our scheduler (STGth = 5 ºC)
#cores           2      4      8          2      4      8
Max temp (°C)    83.2   84.3   83.2       78.8   79.8   76.9
%time > 5 °C     43.7   44.3   43.7       0.0    0.0    0.0
%time > 10 °C    4.6    4.6    5.6        0.0    0.0    0.0

Thermal-aware scheduling

[Figure: thermal maps with temperature scales of 47 to 79 °C, 49 to 81 °C and 51 to 83 °C]

Conclusions • We presented different solutions for thermal optimization of video coding systems – Application-driven DTM for HEVC • Run-time encoder configuration selection

– Temperature optimization using adaptive approximate computing • Approximate computing levels adaptively applied

– Application-driven thermal-aware scheduling • Scheduling scheme to reduce spatial temperature gradients of multicore systems


Conclusions • Challenge – Improve thermal profiles of video coding systems

• Main idea – Raise the abstraction of temperature management to the application-level – Use video properties and application characteristics to drive the temperature solutions

• Thesis results demonstrated that thermal management can be performed at the application level successfully



Thank you! Questions?
