Evaluation of the Performance/Energy Overhead in DSP Video Decoding and its Implications

Yahia Benmoussa†§, Jalil Boukhobza†, Eric Senn† and Djamel Benazzouz§

† Université Européenne de Bretagne, CNRS, UMR 6285 Lab-STICC, France
§ Université M’hamed Bougara, Boumerdes, Algeria

Abstract—Video decoding is considered one of the most compute- and energy-intensive applications in energy-constrained mobile devices. Specific processing units, such as DSPs, are added to those devices in order to optimize performance and energy consumption. However, in DSP video decoding, the inter-processor communication overhead may have a considerable impact on performance and energy consumption. In this paper, we evaluate this overhead and analyse its impact on performance and energy consumption as compared to GPP decoding. Our work reveals that the GPP can be the best choice in many cases due to a significant overhead in DSP decoding, which may represent 30% of the total decoding energy.

Keywords—Video decoding, Performance, Energy, GPP, DSP, H264/AVC, OMAP, Gstreamer.

I. INTRODUCTION

Energy saving has become central to hardware and application design in mobile devices such as smart-phones and tablets. Lithium battery technologies are not evolving fast enough, which negatively impacts battery life. This is becoming a critical issue, especially when using processor-intensive applications such as video playback. In [1], it is shown that video playback is the most energy-consuming application used in mobile devices. This is due to its heavy use of the processing resources, which are responsible for more than 60% of the consumed energy [1].

Furthermore, to allow high-quality video decoding, the processors equipping mobile devices are more and more powerful. Hardware configurations including a processor clocked at more than 1 GHz have become common. The main drawback of using high frequencies is that they require higher voltage levels. This leads to a considerable increase in energy consumption due to the quadratic relation between the dynamic power and the supply voltage in CMOS circuits. To overcome this issue, Digital Signal Processors (DSP) are used to provide better performance-energy properties. Indeed, the use of parallelism in data processing increases the performance without the need for higher voltages and frequencies [2].

In the case of DSP decoding, in addition to the clock frequency and the decoded video quality parameters, the overhead due to inter-processor communication should be considered. This issue was addressed from a performance point of view in studies such as [3], [4]. However, its impact on energy consumption as compared to GPP decoding has not been studied before. In this paper, we evaluate the performance and energy overhead in DSP decoding and analyse its impact on performance and energy consumption as compared to GPP video decoding. For this purpose, we conduct experimental measurements, which are described in section II. The obtained results and the conclusion are discussed in sections III and IV respectively.

II. EXPERIMENTAL METHODOLOGY AND SETUP

In the experiments, we followed two steps:

1) A video frame level performance and energy characterization, where the DSP performance and energy overhead is evaluated in a frame decoding cycle. We define the overhead as all the processing which is not related to the actual frame decoding, such as GPP-DSP communication and cache memory maintenance operations.
2) A video sequence level evaluation, where the performance and energy consumption of DSP decoding are evaluated and compared to those of the GPP.

Power measurements in this study were performed using the Open-PEOPLE framework [5], a multi-user and multi-target power and energy optimization platform and estimator. The target platform is the OMAP3530EVM board, which includes a Cortex-A8 ARM processor and a TMS320C64x DSP. The power consumptions of the DSP and the ARM processors are measured using . On this hardware platform, Linux version 2.6.32 was used. Video decoding was performed using Gstreamer, a multimedia development framework. ARM decoding was performed using ffdec_h264, an open-source plug-in based on the ffmpeg/libavcodec library. For DSP decoding, we used TIViddec2, a proprietary Gstreamer H264/AVC baseline profile plug-in provided by Texas Instruments.

The video sequences used in the tests are Harbour and Soccer. Each video is coded at different bit-rates (64 Kb/s, . . . 5120 Kb/s) and in qcif, cif and 4cif resolutions. Each video is then decoded at different clock frequencies ranging from 125 MHz to 720 MHz. The performance (Frame/s) and the energy consumption (mJ/frame) are measured for each (bit-rate, resolution, frequency) combination.
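As an illustration of how the two reported metrics relate to raw measurements, the following minimal sketch derives Frame/s and mJ/frame from a sampled power trace. It is not the Open-PEOPLE tooling: the function names, the trapezoidal integration and the sample trace are assumptions made for the example only.

```python
from typing import Sequence, Tuple


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time/power samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def per_frame_metrics(
    times_s: Sequence[float], power_w: Sequence[float], frames_decoded: int
) -> Tuple[float, float]:
    """Return (frames per second, energy in mJ per frame) for one decoding run."""
    if frames_decoded <= 0:
        raise ValueError("frames_decoded must be positive")
    duration_s = times_s[-1] - times_s[0]
    fps = frames_decoded / duration_s
    mj_per_frame = 1000.0 * energy_joules(times_s, power_w) / frames_decoded
    return fps, mj_per_frame


# Hypothetical trace (not measured data): a constant 1.0 W sampled every
# 10 ms during 1 s, while 30 frames are decoded.
times = [i * 0.01 for i in range(101)]
power = [1.0] * 101
fps, mj = per_frame_metrics(times, power, 30)
print(f"{fps:.1f} frames/s, {mj:.2f} mJ/frame")
```

One such pair of values would be produced for each (bit-rate, resolution, frequency) combination.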

III. EXPERIMENTAL RESULTS & DISCUSSIONS

A. Frame Level Performance and Energy Characterization

Fig. 3 shows the power consumption level of 4cif and qcif DSP video decoding. The DSP frame decoding phase is represented by the values varying between 0.7 W and 1.1 W, corresponding to the [32 ms, 62 ms] and [6.2 ms, 7.5 ms] intervals for 4cif and qcif respectively. This phase is terminated by a burst of DMA transfers of the decoded frame macro-blocks from the DSP cache to the shared memory, which corresponds to the intervals [56 ms, 62 ms] and [7.2 ms, 7.5 ms] and is illustrated by an increase in memory power consumption. The ARM wake-up latency is represented by the power level 0.66 W. The ARM wake-up is represented by the power transition to 0.83 W.

Fig. 1: ARM and DSP decoding performance of the Harbour video (panels: qcif, cif and 4cif ARM and DSP decoding; axes: Frequency, Bitrate (Kb/s), Frames/s).

Fig. 2: ARM vs DSP decoding energy consumption of H264/AVC video (panels: qcif, cif and 4cif decoding energy consumption, Harbour; axes: Frequency, Bitrate (Kb/s), mJ/Frame).

Table I shows the obtained time and energy overhead values for qcif, cif and 4cif videos. One can notice that the overhead can reach 50% and 30% for energy and performance respectively in the case of qcif resolution.
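The overhead percentages in Table I can be approximately recomputed from the processing and total columns. The sketch below assumes the overhead is expressed as the share of the total that is not spent on actual frame decoding, i.e. (total - processing) / total; this formula is an assumption, not one stated by the authors. Most recomputed values agree with the reported column to within rounding, but not all do (the qcif time entry comes out near 26.6% rather than the reported 30.48%), so the exact formula or the unrounded inputs may differ.

```python
def overhead_percent(processing: float, total: float) -> float:
    """Share (in %) of the total cost not spent on actual frame decoding."""
    if total <= 0 or processing < 0 or processing > total:
        raise ValueError("expected 0 <= processing <= total and total > 0")
    return 100.0 * (total - processing) / total


# (processing, total) pairs copied from Table I.
ENERGY_MJ_PER_FRAME = {
    "qcif": (1.97, 4.16),
    "cif": (6.016, 8.36),
    "4cif": (23.73, 25.93),
}
TIME_MS_PER_FRAME = {
    "qcif": (1.71, 2.33),
    "cif": (5.35, 6.72),
    "4cif": (21.59, 22.16),
}

for resolution in ENERGY_MJ_PER_FRAME:
    energy = overhead_percent(*ENERGY_MJ_PER_FRAME[resolution])
    time = overhead_percent(*TIME_MS_PER_FRAME[resolution])
    print(f"{resolution}: energy overhead {energy:.2f}%, time overhead {time:.2f}%")
```

The trend is the same under this definition: the absolute overhead per frame stays in a similar range while the processing cost grows with resolution, so the relative overhead shrinks from qcif to 4cif.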

Fig. 3: ARM and DSP frames decoding. (a) DSP frame decoding (4cif) power consumption; (b) DSP frame decoding (qcif) power consumption (axes: Time (ms), Power (W); curves: Memory, DSP + ARM; annotated phases: frame decoding period, DSP decoding, decoded frame transfer using DMA with memory power increase due to frame copy, overhead; states: DSP idle/ARM idle, DSP active/ARM idle, DSP idle/ARM active).

TABLE I: DSP video decoding time and energy overhead

                    DSP decoding energy (mJ/frame)         DSP decoding time (ms/frame)
Resolution          Processing  Total   Overhead (%)       Processing  Total   Overhead (%)
qcif (128 Kb/s)     1.97        4.16    52.64              1.71        2.33    30.48
cif (1024 Kb/s)     6.016       8.36    28.11              5.35        6.72    20.38
4cif (5120 Kb/s)    23.73       25.93   8.48               21.59       22.16   2.5

B. Video Stream Performance and Energy Evaluation

1) Decoding Performance Results: Fig. 1 shows a comparison between ARM and DSP video decoding performance in the case of 4cif, cif and qcif resolutions for the Harbour video sequence. The flat surface represents the reference acceptable video display rate (30 Frames/s). One can observe that the performances of the ARM processor and of the DSP are almost equivalent in the case of qcif resolution. However, the ARM decoding speed is 43% higher than that of the DSP at the 64 Kb/s bit-rate, while the DSP decoding speed is 14% higher than that of the ARM at the 5120 Kb/s bit-rate. For cif and 4cif resolutions, the DSP decoding is almost 50% faster than that of the ARM in the case of cif resolution and 100% faster in the case of 4cif. This ratio decreases drastically for low bit-rates, where the ARM performance increases faster than that of the DSP.

2) Energy Consumption Results: Fig. 2 shows a comparison between the ARM and DSP video decoding energy consumption (mJ/Frame) in the case of 4cif, cif and qcif resolutions. The DSP qcif video decoding consumes 100% more energy than the ARM at low bit-rates and 20% more at high bit-rates. On the other hand, the DSP 4cif video decoding consumes less energy than the ARM. In the case of cif resolution, we noticed a crossing between the ARM and the DSP energy consumption levels. In fact, for low bit-rates (below 1 Mb/s), the ARM consumes less energy than the DSP.

IV. CONCLUSION

The analysis of the obtained results shows that the overall performance and energy efficiency of the DSP as compared to the ARM processor depend mainly on the required video coding quality (bit-rate and resolution). In fact, DSP video decoding is the best choice in terms of performance and energy efficiency in the case of 4cif resolution, while ARM decoding is better in the case of qcif resolution and of cif resolution with a bit-rate of less than 1 Mb/s. The degradation of the performance and energy consumption properties of DSP video decoding is due to a significant inter-processor communication overhead.

REFERENCES

[1] A. Carroll and G. Heiser, “An analysis of power consumption in a smartphone,” Proceedings of the 2010 USENIX conference on USENIX annual technical conference, pp. 21–21, 2010.
[2] D. Markovic, V. Stojanovic, B. Nikolic, M. Horowitz, and R. Brodersen, “Methods for true energy-performance optimization,” Solid-State Circuits, IEEE Journal of, vol. 39, no. 8, pp. 1282–1293, 2004.
[3] P. Ramachandra and M. R. Satish, “H.264 main profile video decoding implementation techniques on OMAP3430IVA,” Signal Processing (ICSP), 2010 IEEE 10th International Conference on, pp. 271–274, 2010.
[4] S. Kant, U. Mithun, and P. Gupta, “Real time H.264 video encoder implementation on a programmable dsp processor for videophone applications,” Consumer Electronics, 2006. ICCE ’06. 2006 Digest of Technical Papers. International Conference on, pp. 93–94, 2006.
[5] E. Senn, D. Chillet, O. Zendra, C. Belleudy, S. Bilavarn, R. Atitallah, C. Samoyeau, and A. Fritsch, “Open-people: Open power and energy optimization PLatform and estimator,” 2012 15th Euromicro Conference on Digital System Design (DSD), pp. 668–675, Sep. 2012.
