Proceedings of the 6th World Congress on Control and Automation, June 21 - 23, 2006, Dalian, China

Power Load Forecasting with Least Squares Support Vector Machines and Chaos Theory Haishan Wu

Xiaoling Chang

College of Information and Electronic Engineering

Department of Control Science and Engineering

China University of Mining & Technology

Huazhong University of Science and Technology

Xuzhou,Jiangsu , China, 221008

Wuhan, Hubei , China, 430074

[email protected]

[email protected]

Abstract- In this paper, a novel approach to power load forecasting based on least squares support vector machines (LS-SVM) and chaos theory is presented. First, with the data from EUNITE network, we find the chaotic characteristics of the daily peak load series by analyzing the largest Lyapunov exponent and power spectrum. Average Mutual Information (AMI) method is used to find the optimal time lag. Then the time series is decomposed by wavelet transform. Cao’s method is adopted to find the optimal embedding dimension of the decomposed series of each level. At last, with the optimal time lag and embedding dimension, LS-SVM is used to predict future load series of each level. The reconstruction of predicted time series is used as the final forecasting result. The mean absolute percentage error (MAPE) is 1.1013% and the maximum error is 25.1378 MW, which show that this approach is applicable for power load forecasting. Index Terms- Least squares support vector machines, wavelet transform, chaos theory, load forecasting

Ⅰ. INTRODUCTION Daily peak load forecasting plays an important role in all aspects of economic, reliable, and secure strategies for power system. But the time series of power load is affected by many factors such as economic, temperature, thus making it difficulty to be accurately predicted. Numerous researches are focused on accurate prediction of power load series, and many prediction models are constructed [1] [2] [3]. Among these models, artificial neural networks model is very popular [1]. However, this method has several inherent disadvantages. For example, it may fall into local optimal solution because of the adoption of empirical risk minimization (ERM) principle. Besides, it needs large quantity of training samples and learning speed is comparatively slow. Recently, support vector machines (SVM), proposed by Vapnik [4] in 1995 and based on statistical learning theory (SLT), is a promising approach to power load forecasting [5]. Its merits chiefly come from the adoption of structural risk minimization (SRM) principle instead of ERM principle, and therefore it can obtain global optimal solution by solving a quadratic problem. The adoption of kernel method avoids the curse of dimensional efficiently. Least squares support vector machines (LS-SVM) is a kind of SVM, but it

1-4244-0332-4/06/$20.00 ©2006 IEEE

possesses different constrains with regard to standard SVM. It has been successfully applied in many fields such as time series prediction [6]. When generating the input vectors and output vectors of LS-SVM predictor, the selections of time lag and embedding dimension are crucial. With further scrutiny, one can find that the power load time series has chaotic characteristics: its spectrum is broadband and has a broad peak; meanwhile its largest Lyapunov exponent exceeds zero[7]. This encourages us to apply the chaos theory to the power load forecasting. Average Mutual Information (AMI)[7] technique is used to choose the optimal time lag. Further detailed examination of curves of daily peak load series and its power spectrum, one can see that the load series possesses a daily, weekly and seasonal pattern. This characteristic also motivates us to decompose the original series by wavelet transform, because it can produce a good local representation of the signal in both time domain and frequency domain. Then the trend, periodicity and random parts are illustrated explicitly. Cao’s method [8] is used to select the optimal embedding dimension of decomposed series of each level, respectively. With the optimal time lag and embedding dimension, the input vectors and output vectors can be obtained. Finally, the future load of each level is predicted by separate LS-SVM predictor. The reconstruction of predicted series of each level is used as the final forecasting result. This paper is organized as follows: in next section, we review the basic principles of LS-SVM and chaos theory. In Section Ⅲ, the procedure of the experiment is presented. In Section Ⅳ, the final forecasting result is shown and the comparison is made. Finally, the conclusion is made and future work is discussed in the last section. Ⅱ.

A.

BACKGROUND

Least Squares Support Vector Machines

Suppose we have the independent uniformly distributed data {x1 , y1}"{x N , y N } , where each x i ∈ R n denotes the input space of the sample and has a corresponding target value y i ∈ R for i=1...N, where N

4369

corresponds to the size of the training data. The estimating function takes the form as follows: f ( x ) = ( w ⋅ Φ ( x )) + b (1) Where, Φ ( x ) denotes the high dimensional feature space which is nonlinearly mapped from the input space. This leads to the optimization problem for standard SVM: minimize

N 1 T w w + γ ∑ ξi 2 i =1

(2)

⎧⎪ y [ wT Φ ( x ) + b] ≥ 1 − ξ i i (3) subject to ⎨ i ⎪⎩ ξi ≥ 0, i = 1, …, N Where, ξ i is a slack variable and γ is a positive real

that is K ( x i , x j ) = Φ ( x i )Φ ( x j ) . The typical examples of kernel function are polynomial kernel and RBF kernel: polynomial: K ( xi , x j ) = (γ ( xi • x j ) + r ) d , γ > 0 ; 2 xt − x j RBF: K ( xi , x j ) = exp(− ) 2σ 2 The resulting LS-SVM model for regression can be expressed as follows: N

f ( x) =

∑ (α −α i

* i ) K ( xi , x ) + b

(10)

i =1

B.

Chaos Theory

Chaos is a nonlinear behavior that exists between the constant which determines penalties to estimation errors. realms of periodic and random. The exact system state can For LS-SVM, (3) has been modified as follows: be written: N 1 T X (t ) = ( x(t ), x(t − T ), x(t − 2T ),..., x(t − (k − 1)T )) (11) w w + γ ∑ ξi minimize (4) 2 i =1 Where t is a scalar index for the data series and T is the subject to the equality constrains: interval of observations. For a discrete time series, the next yi [ wT Φ ( xi ) + b] = 1 − ξi i = 1,..., N (5) value of the state can be obtained as a function of the current state; By constructing the Lagrange function and according (12) x(t + 1) = f ( X (t )) to KKT Conditions, the equation as follows can be obtained: Dynamic systems may evolve over time and generate N ⎧ w y x = Φ ( ) α time series with regular appearance, but it is possible that ∑ i i i ⎪ i =1 the system evolves to a chaotic attractor. The goal of a time ⎪ N ⎪ series predictor is to reconstruct the chaotic dynamics of the 0 y = α ∑ (6) i i ⎨ i =1 space state from the measurements of one component of the ⎪ α i = γξi state vector of a dynamic system, and to predict the ⎪ ⎪ y [ wT Φ ( x ) + b] − 1 + ξ = 0 evolution of the measured variable. Takens’ embedding i i ⎩ i theorem proved that if the dynamics occurs in a d Then we define: dimensional Euclidean space, a reconstruction of the system ⎧Z = [Φ ( x ) T y ;...; Φ ( xi ) T y i can be obtained in a 2d+1 dimensional space built using 1 1 ⎪ delay coordinates, which define the delay coordinate vectors T Y = [ y1 ;...; y i ] ⎪⎪ [9]. A delay coordinate is simply an observed variable with a G time τ. For example, if we have the time series x1,x2,x3…xN, 1 = [1;...;1] ⎨ the i-th delay coordinate vector with embedding d is: ⎪ ξ = [ξ1 ;...; ξ i ] yi(d)=[xi,xi+τ,xi+2τ,…,xi+(d-1) τ], i=1,2,...,N-(d-1) τ (13) ⎪ α = [α1 ;...; α i ] Where, τ is called time lag. The number of delay ⎪⎩ (7) coordinates necessary to reconstruct the state of a dynamic After substituting (7) into (6) and eliminating w and γ , system represents the minimum embedding dimension d. we can obtain: To reconstruct the phase space, good choices for time T lag τ and embedding dimension d are needed. We use the ⎡0 ⎤ ⎡ b ⎤ ⎡0 ⎤ Y (8) first minimum of the average mutual information (AMI) ⎢Y ZZ T + γ −1 I ⎥ ⎢α ⎥ = ⎢1G ⎥ ⎣ ⎦⎣ ⎦ ⎣ ⎦ function: n +τ T ⎡ P ( xn , xn +τ ) ⎤ By defining Ω = ZZ and applying Mercer’s Condition I (τ ) = P ( xn , xn +τ ) log 2 ⎢ ⎥ (14) within the Ω the matrix, each element of the matrix is in ⎣ P ( xn ) P ( xn +τ ) ⎦ n the form: Where P(xn) is the probability density of xn, while the T Ω i , j = y i y j Φ ( xi ) Φ ( x j ) = y i y j K ( xi , x j ) . (9) P(xn,xn+τ) is the probability density of xn and xn+τ . The false nearest neighbor (FNN)[10]method is a Where, K ( x i , x j ) is defined as kernel function. The value approach to find the optimal embedding dimension, of a kernel function equals to the inner product of two however, we prefer to Cao’s method because it: 1) does not contain any subjective parameters except for vectors x i and x j in the feature space Φ ( x i ) and Φ ( x j ) , the time delay for the embeddings



4370

2) does not strongly depend on how many data points are available 3) is computationally efficient Cao’s method can be expressed as follows: E (d + 1) (15) let E1(d ) = E (d ) with 1 N − dτ

N − dτ −1

∑ t =0

y d +1 (t ) − ydNN +1 (t ) yd (t ) − ydNN (t )

800 Maximal Daily Load

E (d ) =

900

(16)

700 600 500

and 400

y d (t ) − ydNN (t ) = max x(t + jτ ) − x NN (t + jτ ) (17) 0 ≤ j ≤ d −1

Where N is the length of the original date series, d refers to the embedding dimension, NN means the nearest neighbor to other vector. As d gets large enough, E1(d) will tend to one. The appropriate embedding dimension is given by the value of d where E1(d) stops changing

Fig.1

-50

ME = max Li − Lˆi

n

100 200 Frequency

300

6 5 4

n = 31;

i = 1,...,31

3 2

With the data available, the daily peak load from 1997 to 1998 is easy to be obtained. The result is as shown in Fig.1.Our method is to predict the daily peak load of January of 1999 by using those of 1997 and 1998. So the data of temperature and dates of holidays are not considered.

A.

0

Fig.2 Fourier power spectrum for maximal daily load

i

i =1

Maximal daily load from 1997 to 1998

0

LLE

MAPE = 100

800

50

n

i

600

day

100

In this section, we use the data provided by the EUNITE network during the EUNITE load competitions [11]. The data are as follows: z Electricity load demand recorded every half hour, from 1997 to 1998 z Average daily temperature, from 1995 to 1998 z Dates of holidays , from 1997 to 1999 The objective of our model as well as the aim of the competition is to forecast the daily peak load of January of 1999. Two indices are used to evaluate the performance of the forecasting model: mean absolute average error (MAPE) and maximal error (ME): i

400

150

LOAD FORCASTING MODEL

∑ ( L − Lˆ ) / L

200

db

Ⅲ.

0

1 0

100

200

300

400

500

Fig.3 Largest Lyapunov exponent of maximal daily load

Chaotic Characteristics Analysis

In analyzing the data, an important step is determining the presence of chaotic behavior. In this sub-section, we use two approaches to determine the chaotic characteristics of the daily peak load: Fourier power spectrum and largest Lyapunov exponent. All the calculations as follows are performed on TSTOOL add-on program for matlab[12]. The results are shown in Fig.2 and Fig3, respectively.

From Fig.2, one can see that daily peak load possessed a broad band spectrum of frequencies. And as shown in Fig.3, the largest Lyapunov exponent lies between 4 and 6. Both of them show that time series of daily peak load is chaotic.

B.

4371

Selecting Delay Time

5

D.

Selecting Embedding Dimension

With the selected time lag, embedding dimension of each level is produced by Cao’s method. The value of embedding dimension is shown in Table Ⅰ. From Table Ⅰ, one can see that the higher the decomposition level is, the larger embedding dimension is. That is because the information in high frequency parts is contaminated by the noise, and thus more data points are needed to predict the value of next day.

4.95

Bit

4.9 4.85 4.8

1

4.75

0.5 0

20 Fig.4

40 60 Time Lag

80

100

0 10

AMI function for maximal daily load

E1(d)

0 0 1

15

20

5

10

15

20

5

10

15

20

10 15 Embedding Dimension

20

0.5 0 0 1

Decomposed By Wavelet Transform

With further scrutiny, one can find that the daily peak load time series has a weekly, monthly and seasonal pattern. Besides, it also has random parts. This kind of characteristics inspires us to decompose the series by wavelet transform. The decomposed level is hard to choose. After trying different level, we find: if the level is too low, prediction error will increase; when it increases, prediction result will be improved greatly, but when the level is more than three, there is only tiny improvement. At last, db3 is chosen as the mother wavelet and decomposed level is three. The decomposed series is shown if Fig.5

10

0.5

From Fig.4, one can see that the first minimum of the AMI function is one, so the time lag is one. That means the time lag equals to the sampling interval and all the maximal daily load data are used to predict those of January of 1999. At last, we use the daily peak load of last three month of 1998 to predict those of January of1999.

C.

5

0.5 0

0

5

Fig.6 E1(d) graph of each level produced by Cao’s method. From top to bottom, they belong to: approximate parts, detail parts in level 3, 2, 1, respectively

E.

Load Forecasting Based On LS-SVM Predictors

By using the obtained time lag τ and embedding d and according to (13), the input vectors and output vectors for LS-SVM predictor are obtained. In this paper, we chose the RBF as the kernel function. We use the software LSSVM [13] which includes the implementation of solving (8). The parameters of LS-SVM predictor for each part are shown in Table Ⅰ. Here, γ and σ 2 are the parameters of LS-SVM and RBF kernel, respectively. TABLE Ⅰ PARAMETERS OF EACH LEVEL LS-SVM predictor of each part

γ

σ2

approximate parts detail parts in level 3 detail parts in level 2 detail parts in level 1

1250 565 100 5

20 78 10 17.5

Ⅳ.

Fig.5 The original series (at the top) and decomposed series of three month

Embedding Dimension 5 6 8 10

FINAL FORECASTING RESULT AND COMPARISON

The reconstruction of the predicted value of each part is used as the final forecasting result as shown in Fig.7. The comparison with the competition winners is also made and

4372

However, further research is still needed of this model. 1) In our experiment, we only use RBF as the kernel function, but other kernels may be also promising, such as the wavelet kernel proposed by Zhang et al [15]. 2) The parameter selection of LS-SVM predictor is tough. It seems there is no way but trying. Attention should also be paid on how to select the parameters efficiently. REFERENCES

the result is shown in Table Ⅱ. 850

Actual Maximal Load Forecasting Maximal Load

Maximal Load

800

750

700

650

[1]

0

5

10 Fig.7

15 day 20

25

[2]

30

The final predicted result [3]

TABLE Ⅱ COMPARISON OF OUR MODEL AND THE MODELS OF EUNITE COMPETITION WINNERS Prediction Model Our model Reference[5] Reference[14]

MAPE (%) 1.1013 1.9824 2.1489

[4]

Maximal Error(MW) 25.1378 51.4150 40.0000

[5] [6]

As we expected, the final forecasting result of our model has the minimal MAPE and ME, which indicate the applicability to load forecasting.

Ⅴ.

CONCLUSIONS AND FUTURE WORK

[7] [8] [9]

Our model demonstrates its success in load forecasting. The reasons can be explained: 1) Chaos theory is applied to our experiment. The selections of time lag by AMI function and of embedding dimension by Cao’s method are much more reliable than by experience. 2) Wavelet transform reveals the trend, periodicity, random parts of original series apparently. By this technique, the random part, which mainly results in the forecasting error, is a small part of the original series, thus decreasing forecasting error. 3) LS-SVM predictor has great generalization ability and guarantees global optimality, thus enhancing the forecasting accuracy.

[10] [11] [12] [13] [14] [15]

4373

H.S. Hipper, C.E. Pedreira, and R.C. Souza, “Neural networks for short-term load forecasting: A review and evaluation”, IEEE Trans. Power System., vol.16, pp.44-55, Feb,2001 Kyung-Bin Song Young-Sik Baek Dug Hun Hong Jang, G., “Short-term load forecasting for the holidays using fuzzy linear regression method”. IEEE Trans. Power System, vol.20, no.1, pp.96-101, 2005. Papalexopoulos, A.D.Hesterberg, T.C. “A regression-based approach to short-term system load forecasting”. IEEE Trans. Power System, vol.5, no.4, pp.1535-1547, 1990. V.vapnik, “The Nature of Statistical Learning Theory” New York: Springer-Verlag,1995 B.J. Chen, M.W. Chang, C.J. Lin “Load Forecasting Using Support Vector Machines: A Study on EUNITE Competition 2001”. IEEE Trans. Power System vol.19, no.4, pp.1821-1830, 2004 Van Gestel, T, et al. “Financial time series prediction using least squares support vector machines within the evidence framework”. IEEE Trans. Neural Networks, vol.12, no.4, pp.809-821, 2001. Abarbanel, Henry D.I., “Analysis of Observed Chaotic Data” New York: Springer-Verlag,1996 L.Y. Cao, “Practical Method for Determining the Minimum Embedding Dimension of a Scalor Time Series”, Physical D, vol.110, pp.43-50, 1997. F. Takens, “Detecting strange attractors in turbulence”, in Dynamic Systems and Turbulence. Warwick, 1980. Lecture Notes in Mathematics, D.A. Rand and L.S. Young (Eds.). Springer Verlag, Berlin. 1981, pp. 366 – 381. M.Kennel, R.Brown, H.Abarbanel. “Determining embedding dimension for phase--space reconstruction using a geometrical construction”, Physical Review .vol.A 45, no. 6, 3403-3411, 1992. World-wide competition within the EUNITE network, [Online]. Available: http://neuron.tuke.sk/competition, 2005 TSTOOL [Online]. Available: http://www.physik3.gwdg.de/tstool, 2005 LS-SVMlab[Online].Available:http://www.esat.kuleuven.ac.be/sista/ lssvmlab, 2005 Dalibor Zivcak. Electricity Load Forecasting using ANN [Online] Available:http://neuron.tuke.sk/competition/reports/DaliborZivcak.p df, 2005. Li Zhang, Wei-da Zhou,Li-cheng Jiao. “Wavelet Support Vector Machines.” IEEE Trans. System. Man and Cybernetics, vol.34, no.1, pp.34-39, 2004.

WCICA 2006 Paper

quantity of training samples and learning speed is comparatively slow. ... characteristics: its spectrum is broadband and has a broad peak ... comparison is made. Finally, the ... Where, ( )x. Φ denotes the high dimensional feature space.

519KB Sizes 3 Downloads 145 Views

Recommend Documents

WCICA 2006 Paper
As a brand new time-frequency domain analytical technique, it is independent with Fourier ... a powerful tool for adaptive multi-scale analysis of short time nonlinear and ... As mentioned above, EMD method decomposes a signal into a set of ...

WCICA 2006 Paper
China University of Mining & Technology ... First, the historical series is decomposed by wavelet, and thus ... and Automation, June 21 - 23, 2006, Dalian, China ...

Download Kcse 2006: Chemistry Paper 1 question paper - Kenyaplex
When the circuit was completed, it was noticed that a blue colour spread towards the right. (a) Explain ... process taking place between to and t1;. - - - - - - - n - - .

Download Kcse 2006: Chemistry Paper 1 question paper - Kenyaplex
2008. 2 hours. Candidate's Signature . ... Mathematical tables and silent electronic calculators may be used. All working ... III Butene Add a drop of bromine water.

IEEE ICMA 2006 Paper
Index Terms—Robot grouping, mobile actuator networks, ... sellation (CVT) in coverage control of mobile sensing ... pollution source is not a good strategy.

IEEE ICMA 2006 Paper
implicit communication through the environment. This paper investigates the algorithms inspired from robust biological system, and draws algorithmic profiles when these algorithms are applied to wireless sensor networks. The rest of this paper is org

CBSE 2012 Computer Science Question Paper set 3 2006.pdf ...
Page 1 of 6. Computer Science 2006 (Compartment Delhi). General Instructions: 3. All questions are compulsory. 4. Programming Language: C++. Q. 1. a.

DDSS 2006 Paper - The Robotics Institute Carnegie Mellon University
to spam email filters. ... experiment by building a software program that can identify a certain type of ..... Mining and Knowledge Discovery,2(2), p121-167. Blum ...

DDSS 2006 Paper - The Robotics Institute Carnegie Mellon University
Isenberg, A., 2004, Downtown America: a history of the place and the people who made it. Chicago, University of Chicago Press: xviii, 441 p., [2] p. of plates (col). Joachims, T., 1998, “Text Categorization with Support Vector Machines: Learning wi

b.com i paper ii financial accounting (admission of 2006 onwords ...
b.com i paper ii financial accounting (admission of 2006 onwords) 1451.pdf. b.com i paper ii financial accounting (admission of 2006 onwords) 1451.pdf. Open.

KCSE 2006 PHYSICS PAPER 1 Answer all questions ...
helium gas. The mass of the fabric making the balloon is 0.30kg. Determine the maximum tension on the string trying the balloon to the rigid support. 18. (a) Define specific latent heat of fusion of a substance. ( 1 mark). (b) Water of mass 200 g at

2006 CLRA Paper Soil Bioengineering for Riparian Restoration.pdf ...
Retrying... 2006 CLRA Paper Soil Bioengineering for Riparian Restoration.pdf. 2006 CLRA Paper Soil Bioengineering for Riparian Restoration.pdf. Open.

DDSS 2006 Paper - CMU Robotics Institute - Carnegie Mellon University
potential for the use of machine learning in the solving of urban planning ... of complex data analysis algorithms on a large amount of data available from.

CBSE 2012 Computer Science Question Paper set 1 2006.pdf ...
CBSE 2012 Computer Science Question Paper set 1 2006.pdf. CBSE 2012 Computer Science Question Paper set 1 2006.pdf. Open. Extract. Open with. Sign In.

january 2006 february 2006
Snow Creek. DAY. 8:30 AM. Scout meeting. Blue Elk Dist. Pizza form due. Dinner 6:30 PM. 26. 27. 28. Scout meeting. JANUARY 2006. FEBRUARY 2006 ... Lake Jacomo. 28. 29. 30. 31. MEMORIAL DAY. No Scout meeting. SUNDAY. MONDAY. TUESDAY. WEDNESDAY. THURSD

2006 PlantJournal.pdf
*For correspondence (fax +44 1865 281 696; e-mail [email protected]). †Present address: ... resistant host with an avirulent pathogen (Ross, 1961a,b), also require SA ... provide at best only limited information on the distribution of SA within ...

CBSE 12 Chemistry Question Paper set 3 2006.pdf
(2, 3). Page 3 of 4. CBSE 12 Chemistry Question Paper set 3 2006.pdf. CBSE 12 Chemistry Question Paper set 3 2006.pdf. Open. Extract. Open with. Sign In.

CBSE 12 Chemistry Question Paper set 2 2006.pdf
CBSE 12 Chemistry Question Paper set 2 2006.pdf. CBSE 12 Chemistry Question Paper set 2 2006.pdf. Open. Extract. Open with. Sign In. Main menu.

2006.Mobicom.pdf
an interesting question as to whether SMTP can use email. vocabulary based techniques to reduce the actual content. transferred between SMTP servers, or a ...

CORE 2006 6th grade.indd
mouth of the cold water jar and turn it upside down on top of the warm water. Carefully pull out the card. You should see warm, red water rising and cold water sinking. Radiation is the transfer of heat through space in the form of waves. The heat we

Winter 2006
Clinical Science is published as a service to the members of Section III of the Division of Clinical. Psychology of the ... of Clinical Psychology, or the American Psychological Association. .... was founded for the express purpose of promoting clini

2006 Friedland
of the Center for. Milf s Cotlege, Oak.land. Advi-sor. Contemporary Mus.rc efrnudena Konrad, Computer Science Advisor and Computer Science. Department of ..... to develop a more extensible system that could be customized through a GUI. .... to look a

28 March 2006 - Sites
But I can make my theory of Magical. Medicine completely consistent with the evidence (like a conspiracy theory). • e.g. I ascribe different maladies to the.

Rafael_Gonzalez_Propuesta_de_escudo_de_Bretocino_[2006].pdf
41-53. Page 3 of 13. Rafael_Gonzalez_Propuesta_de_escudo_de_Bretocino_[2006].pdf. Rafael_Gonzalez_Propuesta_de_escudo_de_Bretocino_[2006].pdf.