Green Metadata Based Adaptive DVFS for Energy Efficient Video Decoding

Yahia Benmoussa∗†, Eric Senn∗, Nicolas Derouineau‡, Nicolas Tizon‡, Jalil Boukhobza§
∗ University of South Brittany, UMR6285, Lab-STICC, France, {firstname.lastname}@univ-ubs.fr
† University of M'Hamed Bougara, LMSS, Algeria, [email protected]
‡ Vitec Multimedia, France, {firstname.lastname}@vitec.com
§ University of Western Brittany, UMR6285, Lab-STICC, France, [email protected]

Abstract—We present in this paper GM-DVFS, an adaptive DVFS scheme for energy efficient decoding of H.264 videos. GM-DVFS uses metadata (normalized by the MPEG Green Metadata standard) providing information about the upcoming workload. In the proposed solution, these metadata are processed by an adaptive filter to dynamically build an accurate video decoding complexity model. This model serves to calculate the minimal processor frequency required to decode a video frame while guaranteeing the real-time constraints. Our performance evaluations showed that the proposed algorithm converges to an accurate complexity model (4% prediction error) in less than 1 second in the worst case. Moreover, it is simple to implement (250 lines of C code) and induces a very low overhead (1400 cycles/frame). It achieves up to 46% energy saving as compared to the ondemand Linux DVFS governor.

I. INTRODUCTION

The new trends in video application usage (YouTube, IPTV, video conferencing) combined with the market explosion of multimedia consumer electronics raise new challenges for mobile device manufacturers. Indeed, to meet the important processing requirements and real-time constraints of video applications, the processing resources embedded in these devices tend to be more and more powerful and complex. As a result, there is a drastic increase in the power consumption of video decoding [1]. In fact, according to [2], when playing back a video, the processing resources consume more than 60% of the energy budget. This leads to a drastic decrease in mobile device autonomy, as battery technologies are not evolving fast enough to absorb the ever-growing energy requirements of such devices [3]. Dynamic Voltage and Frequency Scaling (DVFS) [4] is one of the most promising techniques used to reduce the power consumption of energy constrained mobile devices. In a processor supporting DVFS, the supply voltage and the frequency can be adapted dynamically to the workload. Considerable energy can then be saved when the processor operates at a low frequency, due to the quadratic relation between the supply voltage and the power consumption [4]. DVFS is particularly interesting in the case of video applications, which are characterized by a high workload variability. However, DVFS for video decoding is a challenging issue due to the difficulty of predicting the upcoming workload, a consequence of the increasing complexity of modern codecs. Indeed,

accurate workload prediction is essential to avoid deadline misses while displaying the decoded frames. A simple approach to predicting video decoding complexity is to use a workload averaging policy based on the previous decoding history [6]. While this strategy is easy to implement, it fails to make predictions on a frame basis, leading to deadline misses. To prevent this issue, the use of a post-decoding buffer allows decoupling the frame displaying and decoding processes [7], [8]. Thus, in case of a workload underestimation, some frames remain available for display. However, the use of a buffer induces an extra latency which is not acceptable for video applications like video conferencing. In such a case, the DVFS controller should predict the upcoming video workload proactively and accurately, and adjust the processor frequency accordingly. In that direction, the recently released Green Metadata MPEG standard [9] proposes a set of techniques which assist the video decoder in reducing its energy consumption. One suggested approach consists in embedding in the video stream some metadata providing information about the upcoming video complexity, allowing the decoder to react proactively. While, by definition, the proposed metadata depend exclusively on the video content, the DVFS controller should use them to build a complexity model for the used video decoder and the execution platform. In this paper, we propose an efficient DVFS algorithm which makes use of such metadata to save energy while decoding videos compressed with H.264, the most used video encoding standard. It is based on an adaptive filter [10] to build an accurate video decoding complexity model. The experimentations show that the learning process is fast and converges to an accurate complexity model for various processor architectures and video configurations. We also show that the proposed solution is easy to implement and induces a very low overhead on the video decoder.
The remainder of this paper is organized as follows: Section II provides an overview of the metadata proposed in the Green Metadata standard of MPEG. The proposed DVFS solution is detailed in Section III. Sections IV and V are dedicated to the experimental setup and the evaluation results respectively. Section VI presents some related works proposing metadata-assisted DVFS for video decoding. A conclusion and future works are provided in Section VII.

[Fig. 1: scatter plots with linear fits — entropy decoding complexity vs frame size; intra prediction complexity vs number of intra-coded MBs; IDCT complexity vs number of non-zero 8x8 blocks; inter prediction complexity vs number of six-tap filterings; deblocking filter complexity vs number of α-point deblocking instances.]
Fig. 1. Linear complexity models of H.264/AVC decoding modules [5]

II. GREEN METADATA STANDARD

Green Metadata is an MPEG standard released in 2015 [11], [9]. It aims to enhance the energy efficiency of H.264 video decoding by specifying three types of metadata:
• display metadata: related to screen power management approaches.
• quality metadata: related to energy efficient media selection strategies used in dynamic adaptive streaming.
• complexity metadata: related to complexity estimation approaches.
In this study, we focus on the last type of metadata, which is used to build a video decoding complexity model for predicting the upcoming workload. This complexity model is then used to drive the processor frequency so as to reduce the consumed energy without impacting the real-time constraints. To achieve this objective, the Green Metadata standard specifies Complexity Metrics (CM) metadata to be sent to the decoder. Each CM is specific to a particular H.264 decoding module (DM). There are five metrics, for the entropy decoding, intra-prediction, IDCT-dequantization, inter-prediction and deblocking filter modules (see Table I). Then, based on the linear relationship (see Fig. 1) observed between the required processor cycles for executing each H.264 decoding module and the associated CM [12], [5], the total complexity can be defined as:

CM      Decoding module        Definition
Nbit    Entropy decoding       bitstream size
NMB     Intra-prediction       number of intra-coded macroblocks
Nnz     IDCT-dequantization    number of non-zero macroblocks
Nsix    Inter-prediction       number of six-tap filterings
Nα      Deblocking filter      number of α-point deblocking instances

TABLE I
COMPLEXITY METRICS DEFINED IN THE GREEN METADATA STANDARD
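As a sketch, once parsed from the bitstream, the five metrics of Table I could be carried per frame in a small C struct. The field names are illustrative, not the syntax defined by the standard:

```c
/* Per-frame complexity metrics of Table I, as they could be stored
 * after parsing the Green Metadata SEI message. Field names are
 * illustrative, not taken from the standard's syntax. */
struct complexity_metrics {
    unsigned long n_bit;   /* bitstream size of the frame, in bits  */
    unsigned long n_mb;    /* intra-coded macroblocks               */
    unsigned long n_nz;    /* non-zero macroblocks                  */
    unsigned long n_six;   /* six-tap filterings (inter-prediction) */
    unsigned long n_alpha; /* alpha-point deblocking instances      */
};
```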

C = kinit + kbit·Nbit + kMB·NMB + knz·Nnz + ksix·Nsix + kα·Nα   (I)

where kbit, kMB, knz, ksix and kα are coefficients which depend on the video content and on the execution platform. On the other hand, kinit is a constant which is not associated to any metadata. It represents the initialization processing which does not depend on the video content [12]. An enhancement of the accuracy of this complexity model was proposed in [13]. In this study, the authors suggested using the number of non-zero 8x8 blocks (Nnz8) instead of Nnz as the CM of the "IDCT-dequantization" module. Thus, we assume in this paper the complexity model defined as:

C = kinit + kbit·Nbit + kMB·NMB + knz8·Nnz8 + ksix·Nsix + kα·Nα   (II)

Notice that Green Metadata proposes to explicitly send the normalized CM within Supplemental Enhancement Information (SEI) messages, except for Nbit, which can be extracted by calculating the size of the video frames.

III. GREEN METADATA BASED DVFS

In this section, we describe a Green Metadata based DVFS (GM-DVFS) we designed to reduce the energy consumption of video decoding. First, GM-DVFS has to build the complexity model described in Eq. II to proactively calculate the appropriate processor frequency for each video frame. In practice, while the CM can be extracted from the SEI messages in the video stream, the coefficients of the complexity model (kinit, kbit, kMB, knz, ksix, kα) should be estimated dynamically according to the execution platform. One approach to estimate the complexity model coefficients is to build separately the linear models of the different decoding modules as proposed in [12]. In this case, for each macroblock (16 x 16 pixels), the required number of cycles for executing the entropy decoding, intra prediction, dequantization, inter-prediction and deblocking filter modules is measured. Then, simple moving average filters can be used to estimate the kbit, kMB, knz, ksix and kα coefficients based on the measured number of cycles for executing each of the decoding modules of the previous macroblocks. This approach has two main drawbacks. First, it requires inserting cycle-profiling code inside macroblock decoding functions, which is not possible in case of closed-source and proprietary codecs∗. Second, the complexity of the profiling instructions may induce a non-negligible overhead even if low level assembly code is used for cycle counting. In fact, cycle counting should be performed for each decoding module and for each macroblock. Our experimentations showed that the overhead is 200 cycles per macroblock. In case of 4CIF (704 x 576 pixels) video resolution, this represents around 316800 cycles per frame.

∗ Codec vendors usually provide frame level decoding APIs.
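Eq. (II) is simply a dot product between the coefficient vector and the metrics vector. A minimal sketch in C, with an illustrative array layout where index 0 holds the constant term (the corresponding metric is fixed to 1):

```c
#include <stddef.h>

/* Number of terms in the model of Eq. (II): kinit plus five metrics. */
#define NUM_TERMS 6

/* Predicted decoding cycles as a linear combination of the complexity
 * metrics (Eq. II). n[0] is fixed to 1.0 so that k[0] plays the role
 * of the constant term kinit. */
static double predict_complexity(const double k[NUM_TERMS],
                                 const double n[NUM_TERMS])
{
    double c = 0.0;
    for (size_t i = 0; i < NUM_TERMS; i++)
        c += k[i] * n[i];
    return c;
}
```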

[Fig. 2: block diagram — the Green Metadata vector Ni (input signal) is weighted by the complexity model coefficients k (weights) and summed to produce the predicted complexity CP (output signal), with CP = kinit + kbit·Nbit + kMB·NMB + knz8·Nnz8 + ksix·Nsix + kα·Nα; an LMS block compares CP with the profiled complexity CD (desired signal) and updates the weights.]
Fig. 2. Use of an Adaptive Linear Combiner for learning the video decoding complexity model

A. Complexity model learning
In GM-DVFS, we propose to achieve cycle profiling at the video frame level. Thus, based on the number of decoding cycles of each frame, the complexity model is learned without the need for detailed information about the complexity of each decoding module separately. Due to the linearity of the complexity model (see Eq. II), we propose to use an Adaptive Linear Combiner (ALC) [10] to learn the related coefficients. Indeed, the ALC is an adaptive filter whose output is a linear combination of its inputs. It receives an input signal vector and the desired output to effect learning. The components of the input vector are weighted by a set of coefficients. The sum of the weighted inputs is then computed, producing a linear output. During the training process, input patterns and corresponding desired responses are presented to a Least Mean Square (LMS, Widrow-Hoff) learning rule which adjusts the weights so that the output responses to the input patterns will be as close as possible to their respective desired outputs. The use of an ALC to learn the video decoding complexity model is illustrated in Fig. 2. Suppose that Ni = (Nbiti, NMBi, Nnz8i, Nsixi, Nαi, 1) is a vector containing the complexity metrics associated to the frame Fi. Then, for each Ni (input signal), and assuming the linear model described in Eq. II (the output is a linear combination of the inputs), the model coefficients (weights) represented by the vector k = (kinit, kbit, kMB, knz8, ksix, kα) are updated after decoding each video frame. The objective is to make the predicted complexity CPi (output signal) as close as possible to the profiled complexity CDi (desired signal). Notice that the element of Ni associated to kinit is set to 1 to represent the constant processing which is invariant for all the frames. Algorithm 1 lists the pseudo code of the procedure which updates the coefficients of the video decoding complexity model.
Initially, all the weights are set to zero. Then, the weights are updated using the LMS learning rule described in line 3 of Algorithm 1. The internal parameter lr is a real number between 0 and 1 representing the learning rate. Notice that it is well known that normalizing the LMS inputs enhances convergence [14]. Thus, Nbit is normalized by dividing the frame size (in bits) by Npix x 0.4, where Npix is the frame size in pixels and 0.4 is an approximation of the maximum bits/pixel obtained after H.264 video encoding. The other CM are already normalized in the Green Metadata standard [11]. We describe hereafter how this procedure is used within GM-DVFS, a metadata based DVFS for video decoding.

Algorithm 1 Complexity model learning
1: Initialize k ← (0, 0, 0, 0, 0, 0)
2: procedure UpdateWeights(CDi, CPi, Ni)
3:     k ← k + lr·(CDi − CPi)·Ni
4: end procedure

Algorithm 2 DVFS Control
1: Ep ← 1
2: for i ← 1 to NumFrames do
3:     Ni ← GetGreenMetadata(Fi)
4:     Predict CPi ← k·Ni^t
5:     if Ep > Eth then SetFrequency(fmax)
6:     else SetFrequency(CPi·D)
7:     end if
8:     CDi ← Decode(Fi)
9:     UpdateWeights(CDi, CPi, Ni)
10:    Ep ← α·Ep + (1 − α)·|CPi − CDi|/CDi
11: end for

B. DVFS Control
Algorithm 2 shows the pseudo code of GM-DVFS for decoding a video whose display rate is D frames per second. For each coded frame Fi, the associated Green Metadata are read from the SEI messages using the GetGreenMetadata function.
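Line 3 of Algorithm 1 is the standard Widrow-Hoff update. A minimal C sketch, where the six-element array mirrors the vector k and n[0] = 1 stands for the constant term kinit:

```c
#include <stddef.h>

#define NUM_TERMS 6

/* LMS (Widrow-Hoff) weight update of Algorithm 1:
 *   k <- k + lr * (cd - cp) * n
 * cd is the profiled (actual) cycle count, cp the predicted one and
 * n the normalized metadata vector, with n[0] == 1.0 so that k[0]
 * acts as the constant term kinit. */
static void update_weights(double k[NUM_TERMS], double lr,
                           double cd, double cp,
                           const double n[NUM_TERMS])
{
    double err = cd - cp;
    for (size_t i = 0; i < NUM_TERMS; i++)
        k[i] += lr * err * n[i];
}
```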

[Fig. 4: block diagram — (1) the green metadata of a video frame are fed to the Adaptive Linear Combiner (ALC); (2) dynamic voltage and frequency scaling is applied; (3) the frame is passed to the decoding thread running on processor core P0; (4) per-frame processor cycle profiling is fed back to the LMS; (5) the frame is decoded; (6) synchronization and display are handled by a displaying thread running on processor core P1.]
Fig. 4. GM-DVFS implementation within a video decoder

The predicted decoding complexity CPi of the ith frame is estimated using the current values of the model coefficients. If the prediction error Ep is higher than a given threshold Eth (the complexity model has not converged yet), then the processor frequency is set to the maximum value fmax to prevent deadline misses due to prediction inaccuracy. Note that, due to off-chip memory access delays, the required decoding cycles may depend on the used frequency [15]. In our case, this behavior can be neglected since we noticed an almost constant number of cycles when varying the frequency (see Fig. 3). This may be explained by the memory cache and pipelining performance of the used processors, which reduce the cycles wasted in memory stalls. Thus, assuming that the complexity model has converged, the clock frequency is set to CPi divided by 1/D†, i.e., CPi x D. We suppose that Decode(Fi), the function which decodes Fi, embeds some profiling instructions permitting to return CDi, the total number of cycles used during the decoding. Then, in addition to CDi and Ni, CPi is fed to the UpdateWeights procedure to update the model coefficients. Finally, the prediction error Ep is updated by means of an exponential moving average filter with a smoothing factor α. This allows detecting the model convergence based on the error history while remaining sensitive to a rapid rise in prediction error due to an eventual configuration change. In the next sections, we describe the implementation of this algorithm and the evaluation of its performance in terms of convergence, accuracy and power consumption saving.

† Frame Fi should be decoded in at most 1/D second after its availability to guarantee a low latency.

[Fig. 3: average cycles per frame vs processor frequency on Cortex A7 and Cortex A15, for QCIF, CIF, 4CIF and HD videos — the curves are almost flat across frequencies.]
Fig. 3. Decoding cycles vs processor frequency

IV. EXPERIMENTATIONS
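The per-frame decisions of Algorithm 2 — frequency selection (lines 4-7) and the exponential-moving-average error update (line 10) — can be sketched as two pure C functions. Names are illustrative:

```c
#include <math.h>

/* Frequency selection of Algorithm 2 (lines 4-7): while the smoothed
 * prediction error ep exceeds the threshold eth, run at fmax to avoid
 * deadline misses; once the model has converged, the cp predicted
 * cycles must fit in a 1/d second budget, hence f = cp * d. */
static double select_frequency(double ep, double eth,
                               double cp, double d, double fmax)
{
    return (ep > eth) ? fmax : cp * d;
}

/* Prediction-error update of Algorithm 2 (line 10): exponential moving
 * average of the relative error between predicted (cp) and profiled
 * (cd) cycles, with smoothing factor alpha. */
static double update_error(double ep, double alpha,
                           double cp, double cd)
{
    return alpha * ep + (1.0 - alpha) * fabs(cp - cd) / cd;
}
```

Keeping the error update as an EMA rather than the instantaneous error is what lets the controller both detect convergence and react to a sudden configuration change, as described above.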

A. Platform setup
We use the Odroid XU3 board as a test platform. This platform is based on the Exynos 5422 ARM SoC, which contains 4 energy efficient Cortex A7 in-order cores and 4 high performance Cortex A15 out-of-order cores. The A7 supports 12 frequencies (from 200 MHz to 1.4 GHz) and the A15 supports 19 frequencies (from 200 MHz to 2 GHz). We used the Linux BSP provided for the Odroid XU3. Its Linux kernel was reconfigured to include the driver for the userspace governor, a Linux DVFS governor allowing to set the processor frequency from userspace applications.

B. GM-DVFS implementation
To test the proposed DVFS algorithms, GM-DVFS and PGM-DVFS were implemented within mono-core and parallel multi-core video decoding respectively. Hereafter, we describe the internal details of their implementations. Figure 4 illustrates the internal implementation of GM-DVFS within a video decoder based on the open-source ffmpeg library. First, the green metadata are extracted from the SEI message associated to a video frame (1) and fed to the ALC to predict the required decoding cycles. Then, the adequate core frequency is calculated accordingly and set using the userspace governor, a policy allowing to set the processor

frequency from outside the Linux kernel (2). Notice that the set frequency is calculated with a 10% overestimation of the predicted cycles, to prevent deadline misses due to prediction errors, and is then rounded to the nearest higher available frequency value. The video frame is then retrieved (3) and decoded (5). To avoid any impact on the prediction accuracy, the decoded video frame is displayed (6) by a dedicated thread running on a separate processor core. The value of the cycle counter register is read before and after the frame decoding to calculate the actual number of required cycles, which is then fed to the ALC (4) to update the complexity model weights. The measured complexity of the above described algorithm is around 1400 and 800 cycles for each frame or slice decoding on Cortex A7 and A15 respectively. This overhead is a hundred times lower than that of the method proposed in [12], based on MB profiling. On the other hand, about 250 and 500 lines of C code were added to ffmpeg to implement the above described mono-core and multi-core video decoders respectively. All these modifications were done outside the H.264 decoding functions of the ffmpeg decoder.

C. Test videos
The video decoder was used to decode a set of representative video sequences (Soccer, Harbor, Bunny). Each of these videos was encoded into different resolutions: QCIF (176x144), CIF (352x288), 4CIF (704x576) and HD (1280x720) using the x264 encoder.

D. Power consumption measurement
The Odroid XU3 has on-board sensors to measure the power consumption of the little and big core clusters. However, the sampling rate of these sensors is less than 10 Hz, which is not sufficient to achieve an accurate power measurement. This is particularly true in case of DVFS for video decoding, where frequency switching may occur at a higher rate. To enhance the accuracy, we used an external digitizer (National Instruments PXI-4472) to measure the voltage on the on-board shunt resistors at a 10 kHz sampling rate. The consumed energy was then calculated accordingly as described in [16].

Fig. 5. Measurement of the power consumption of Odroid XU3 board using NI-PXI-4472 digitizer

V. RESULTS
A. Convergence and accuracy
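With the userspace governor active, setting a core's frequency from outside the kernel amounts to a sysfs write. A minimal sketch using the standard cpufreq layout (error handling kept minimal):

```c
#include <stdio.h>

/* Set a core's clock through the cpufreq userspace governor by writing
 * the target frequency (in kHz) to scaling_setspeed. This requires the
 * userspace governor to be active on that policy and sufficient
 * privileges; the path follows the usual cpufreq sysfs layout.
 * Returns 0 on success, -1 on failure. */
static int set_frequency_khz(int cpu, long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", khz) > 0;
    fclose(f);
    return ok ? 0 : -1;
}
```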

Figure 6-a illustrates the predicted video decoding complexity (in terms of processor cycles) for each video frame of the (Soccer, 4CIF) sequence. The plots in red, blue and green correspond to the predicted cycles using 0.05, 0.025 and 0.01 learning rates respectively, while the black plot corresponds to the profiled (actual) decoding cycles. One can observe that the lower the learning rate, the higher the convergence time of the complexity prediction, as illustrated in Fig. 6-b, which shows the evolution of the prediction errors. For example, to reach a prediction error of 4%, 30 frames need to be decoded when using lr = 0.05 and 150 frames when using lr = 0.01. Note that these convergence times represent the worst case, since the complexity model coefficients are initialized to zero. The learning rate seems to have an impact only on the convergence time. In fact, the zoom on the last 60 frames in Fig. 6-a shows that the prediction errors related to the different learning rates are almost the same. Notice that we used a 10 second video length in our experimentations. However, in real life scenarios, video durations are much longer. Thus, even in the worst case (e.g., 1 second in case of lr = 0.05), the convergence time is acceptable. Figure 6-c shows how the learning algorithm dynamically adapts the complexity model to different resolutions of the Soccer video (QCIF, CIF and 4CIF). Indeed, the prediction error is 100% at the beginning of the decoding (see Fig. 6-d) because the model coefficients are initialized to zero. The prediction error then starts to decrease during learning until it reaches an average value of 4%. When the video resolution changes, the prediction error rises rapidly because the previous coefficients do not fit the new video. A new learning phase is then required to rebuild the complexity model. Notice that we obtained the same results for the other video sequences (City, Bunny) on both Cortex A7 and A15 processors (see Table II).
We expect that GM-DVFS can easily be generalized to any video decoder, since the linear complexity model (Eq. II) was validated on a wide range of video/architecture configurations [5] and the LMS algorithm is known to converge when its inputs are normalized [14].

B. Power consumption
Figures 7-a and 7-b illustrate the frequencies selected by GM-DVFS for decoding QCIF, CIF, 4CIF and HD videos on Cortex A7 and A15 processors. One can observe that it makes sense to use DVFS only in case of CIF/4CIF video decoding on Cortex A7 and 4CIF/HD video decoding on Cortex A15. In the other configurations, the used processors are either under-dimensioned (e.g., HD decoding on Cortex A7) or over-dimensioned (e.g., QCIF decoding on Cortex A7 and A15). Thus, we measured the consumed energy only for the relevant configurations. Figure 7-c shows the variation of the power consumption resulting from using different frequencies while decoding 30 4CIF video frames (1 second) on Cortex A15. The power consumption related to each frame decoding

[Fig. 6: (a) convergence of the video complexity model vs learning rate (Cortex A15, SOCCER 4CIF): predicted decoding cycles for lr = 0.05, 0.025 and 0.01 against the actual decoding complexity, with zooms on the first and last 60 frames; (b) complexity model error vs learning rate; (c) complexity model adaptation while the video quality changes between QCIF, CIF and 4CIF resolutions (lr = 0.025, Cortex A7, SOCCER); (d) complexity model error over the frame index.]
Fig. 6. Complexity model learning

Processor    Video   Resolution  Error   Energy (mJ/frame)     Saving rate (%)
                                         OD-DVFS   GM-DVFS
Cortex A7    Soccer  CIF         2.96%   1.11      0.59        46.24
             Soccer  4CIF        5.70%   6.20      5.22        15.85
             City    CIF         3.44%   1.02      0.57        44.11
             City    4CIF        5.01%   5.88      5.07        13.77
Cortex A15   Soccer  4CIF        3.56%   16.35     12.58       23.03
             City    4CIF        4.98%   15.77     12.13       23.08
             Bunny   HD          4.57%   41.16     37.01       10.08

TABLE II
ENERGY CONSUMPTION OF VIDEO DECODING: GM-DVFS vs OD-DVFS

can be evaluated accurately thanks to the high sampling rate used in the experimentation (10 kHz). The power drops down to 0.2 W correspond to processor idle states caused by frames finishing their decoding early due to prediction overestimation. We compared GM-DVFS with the widely used Linux ondemand DVFS governor (OD-DVFS). Table II summarizes the measured millijoules per frame (mJ/frame) for each configuration. GM-DVFS allows up to 46% energy saving. The lower energy saving rate achieved on the Cortex A15 as compared to that of the A7 can be explained by its power consumption model. In fact, we expect the contribution of the static power (which cannot be saved using DVFS) to be more significant in case of the Cortex A15 because of its larger area.
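As a sketch of the energy evaluation, with equally spaced power samples the consumed energy is the sample sum divided by the sampling rate (a rectangle rule is enough at 10 kHz); the exact procedure of [16] may differ:

```c
#include <stddef.h>

/* Energy over a decoding run from n equally spaced power samples:
 * E = sum(P_i) * dt, with dt = 1/fs. At the 10 kHz sampling rate of
 * the experiments, dt = 100 us. Rectangle-rule integration; this is a
 * sketch, not necessarily the exact procedure of [16]. */
static double energy_joules(const double *power_w, size_t n, double fs_hz)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += power_w[i];
    return sum / fs_hz;
}
```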

VI. RELATED WORKS
To achieve a low latency DVFS for video decoding, the processor frequency should be adjusted on a frame basis, which requires an accurate frame-by-frame complexity model for predicting the decoding complexity. In [17], [18], to predict the MPEG2 frame decoding time, the authors use a linear relationship observed between the frame size and its decoding time [19]. They then propose to maintain statistics per frame type (I, P, B). The decoding time and frame length are correlated online to guide the selection of the appropriate processor frequency. However, the used linear complexity model is no longer valid in the case of newer video compression standards, which use aggressive techniques to reach high compression ratios. For example, starting from the MPEG4 standard, the distinction of frame types does not even

[Fig. 7: (a) frequency scaling on Cortex A7 (max frequency = 1.4 GHz) and (b) on Cortex A15 (max frequency = 2 GHz), per frame index, for the QCIF, CIF, 4CIF and HD resolutions; (c) power consumption on Cortex A15 sampled at 10 kHz over 1 second, showing the measured power and the idle state power consumption.]
Fig. 7. Frequency scaling using GM-DVFS

exist. Instead, every frame can include different types of macroblocks (I, B, or P), and each macroblock requires a different amount of processing. It is thus difficult to achieve an accurate estimation of the complexity of a given frame merely from its size. Thus, in [20], the authors propose an enhanced complexity model which takes into account, in addition to the frame size, the number of macroblocks of a given type (I, P or B). These parameters are assigned weights to fit the frame complexity model. The developed model was used in a frame-by-frame DVFS strategy for MPEG4 video decoding. The recent and widely used H.264/AVC standard [21] introduces more complexity and flexibility in the decoding process. In fact, depending on the decoder capabilities, multiple coding profiles (base, main and high) are supported and the encoding tools are highly customized. This results in a large and non-stationary workload variation which is difficult to predict based only on the standard information present in the frame headers. To overcome this issue, many studies proposed to use some information sent by the encoder (metadata) to assist the decoder in predicting the video workload. This information is the result of a complexity analysis performed at the encoding phase [22]. Complexity metadata were first used for video-aware DVFS in [23]. In this study, the authors propose to decode the video at the server side to collect frame-by-frame complexity information. Then, the decoding time of each frame, normalized to that of the first frame (reference frame), is sent to the decoder. The video decoder processes the first frame and can then deduce the decoding times of the next ones based on the normalized decoding times. The proposed solution has two main drawbacks. First, it is not feasible for live video because the video has to be decoded beforehand to collect the complexity information.
Second, this solution supposes that the differential complexity variation of adjacent frames remains constant across different architectures, which is not verified in practice. For

example, the modules of a video decoder (entropy, IDCT, inter-prediction) may scale differently depending on whether optimizations (such as SIMD) are used or not. In [24], the authors use a generic complexity model proposed in [25] where the decoding complexity (for a non-standard wavelet based decoder) is expressed in terms of generic basic operations for each of the video decoding modules. For example, the complexity of entropy decoding is estimated in terms of symbol lookups in the coding table. The complexity of the inverse transform is estimated in terms of the number of multiply-accumulate (MAC) operations with non-zero Fourier coefficients. This information synthesizing the workload is then transformed online into a real complexity model at the decoder side using a Least Mean Square (LMS) adaptive filter which calculates the regression coefficients. In [12], the authors use a similar approach to build an accurate complexity model for H.264/AVC, the most used compression standard. Note that this study was the basis of the Green Metadata MPEG standard [26]. In this work, we proposed GM-DVFS, an efficient DVFS scheme based on green metadata. As compared to the previous implementation [12], GM-DVFS is easy to implement, induces a very low overhead and allows a significant energy saving as compared to the state-of-the-art DVFS.

VII. CONCLUSION
We proposed in this paper GM-DVFS, a Green Metadata standard compliant DVFS for video decoding. In addition to an energy saving rate of up to 46%, the experimentations show the high learning capability of the proposed algorithm for different processor and video configurations. Moreover, GM-DVFS is easy to implement and induces a very low overhead.

As future works, we plan to enhance the convergence time of the learning algorithm by predicting the variation trends of the complexity model parameters when the video quality changes. We will explore the generalization of the GM-DVFS scheme to multi-core architectures to further enhance the energy efficiency using parallelism [27]. Finally, we plan to finalize the ongoing standardization of Green Metadata for HEVC [28] in order to extend GM-DVFS to HEVC decoders.

ACKNOWLEDGMENT

This work was supported by BPI France, Région Île-de-France, Région Bretagne and Rennes Métropole through the French project GreenVideo [29].

REFERENCES

[1] Y. Benmoussa, J. Boukhobza, E. Senn, Y. Hadjadj-Aoul, and D. Benazzouz, "A methodology for performance/energy consumption characterization and modeling of video decoding on heterogeneous SoC and its applications," Journal of Systems Architecture, vol. 61, no. 1, pp. 49–70, 2015.
[2] A. Carroll and G. Heiser, "The systems hacker's guide to the galaxy: energy usage in a modern smartphone," in Proceedings of the 4th Asia-Pacific Workshop on Systems (APSys '13). ACM, 2013, pp. 5:1–5:7.
[3] M. Broussely and G. Archdale, "Li-ion batteries and portable power source prospects for the next 5–10 years," Journal of Power Sources, vol. 136, no. 2, pp. 386–394, Oct. 2004.
[4] F. Yao, A. Demers, and S. Shenker, "A scheduling model for reduced CPU energy," in Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS '95), Oct. 1995, pp. 374–382.
[5] Y. Benmoussa, E. Senn, N. Tizon, and N. Derouineau, "Evaluation of the green metadata," in Green Metadata Ad Hoc Group, 112th MPEG Meeting, Warsaw, Poland, 2015.
[6] D. Son, C. Yu, and H.-N. Kim, "Dynamic voltage scaling on MPEG decoding," in Proceedings of the Eighth International Conference on Parallel and Distributed Systems (ICPADS 2001), 2001, pp. 633–640.
[7] V. Gutnik and A. Chandrakasan, "Embedded power supply for low-power DSP," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 4, pp. 425–435, 1997.
[8] E. Nogues, R. Berrada, M. Pelcat, D. Ménard, and E. Raffin, "A DVFS based HEVC decoder for energy-efficient software implementation on embedded processors," in Proceedings of the 2015 IEEE International Conference on Multimedia and Expo (ICME 2015), Turin, Italy, June 29–July 3, 2015, pp. 1–6.
[9] F. C. Fernandes, X. Ducloux, Z. Ma, E. Faramarzi, P. Gendron, and J. Wen, "The green metadata standard for energy-efficient video consumption," IEEE MultiMedia, vol. 22, no. 1, pp. 80–87, 2015.
[10] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[11] "Information technology – MPEG systems technologies – Part 11: Energy-efficient media consumption (green metadata)," 2015.
[12] Z. Ma, H. Hu, and Y. Wang, "On complexity modeling of H.264/AVC video decoding and its application for energy efficient decoding," IEEE Transactions on Multimedia, vol. 13, no. 6, pp. 1240–1255, Dec. 2011.
[13] Y. Benmoussa, E. Senn, and N. Tizon, "Enhancement of the dequantization and inverse transform metadata," in Green Metadata Ad Hoc Group, 112th MPEG Meeting, Warsaw, Poland, 2015.
[14] D. Slock, "On the convergence behavior of the LMS and the normalized LMS algorithms," IEEE Transactions on Signal Processing, vol. 41, no. 9, pp. 2811–2825, 1993.
[15] K. Choi, K. Dantu, W.-C. Cheng, and M. Pedram, "Frame-based dynamic voltage and frequency scaling for a MPEG decoder," in Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design (ICCAD '02). New York, NY, USA: ACM, 2002, pp. 732–737.
[16] Y. Benmoussa, E. Senn, J. Boukhobza, M. Lanoe, and D. Benazzouz, "Open-PEOPLE, a collaborative platform for remote & accurate measurement and evaluation of embedded systems power consumption," in Proceedings of the IEEE 22nd International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '14), 2014, pp. 273–282.
[17] J. Pouwelse, K. Langendoen, and H. Sips, "Dynamic voltage scaling on a low-power microprocessor," in Proceedings of the 7th Annual International Conference on Mobile Computing and Networking, 2001, pp. 251–259.
[18] J. Pouwelse, K. Langendoen, I. Lagendijk, and H. Sips, "Power-aware video decoding," in 22nd Picture Coding Symposium, Seoul, Korea, 2001, pp. 303–306.
[19] A. C. Bavier, A. B. Montz, and L. L. Peterson, "Predicting MPEG execution times," SIGMETRICS Performance Evaluation Review, vol. 26, no. 1, pp. 131–140, Jun. 1998.
[20] Y. Tan, P. Malani, Q. Qiu, and Q. Wu, "Workload prediction and dynamic voltage scaling for MPEG decoding," in Proceedings of the Asia and South Pacific Design Automation Conference, Jan. 2006.
[21] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, Sep. 2007.
[22] M. Mattavelli, S. Brunetton, and D. Mlynek, "Implementing real-time video decoding on multimedia processors by complexity prediction techniques," in International Conference on Consumer Electronics (ICCE), Digest of Technical Papers, Jun. 1998, pp. 264–265.
[23] E.-Y. Chung, L. Benini, and G. De Micheli, "Contents provider-assisted dynamic voltage scaling for low energy multimedia applications," in Proceedings of the 2002 International Symposium on Low Power Electronics and Design (ISLPED '02), 2002, pp. 42–47.
[24] E. Akyol and M. van der Schaar, "Complexity model based proactive dynamic voltage scaling for video decoding systems," IEEE Transactions on Multimedia, vol. 9, no. 7, pp. 1475–1492, Nov. 2007.
[25] M. van der Schaar and Y. Andreopoulos, "Rate-distortion-complexity modeling for network and receiver aware adaptation," IEEE Transactions on Multimedia, vol. 7, no. 3, pp. 471–479, Jun. 2005.
[26] The Moving Picture Experts Group, "MPEG systems technologies – Part 11: Energy-efficient media consumption (green metadata)," http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w14344-v2-w14344.zip, 2014.
[27] Y. Benmoussa, J. Boukhobza, E. Senn, and D. Benazzouz, "On the energy efficiency of parallel multi-core vs hardware accelerated HD video decoding," SIGBED Review, vol. 11, no. 4, Feb. 2014.
[28] N. Derouineau, N. Tizon, and Y. Benmoussa, "HEVC green metadata proposition," in Green Metadata Ad Hoc Group, 114th MPEG Meeting, San Diego, USA, 2016.
[29] "Green Video project," http://greenvideo.insa-rennes.fr, 2014.
