Proceedings of the 6th World Congress on Control and Automation, June 21 - 23, 2006, Dalian, China
Power Load Forecasting with Least Squares Support Vector Machines and Chaos Theory

Haishan Wu
College of Information and Electronic Engineering
China University of Mining & Technology
Xuzhou, Jiangsu, China, 221008
[email protected]

Xiaoling Chang
Department of Control Science and Engineering
Huazhong University of Science and Technology
Wuhan, Hubei, China, 430074
[email protected]
Abstract- In this paper, a novel approach to power load forecasting based on least squares support vector machines (LS-SVM) and chaos theory is presented. First, using data from the EUNITE network, we establish the chaotic characteristics of the daily peak load series by analyzing its largest Lyapunov exponent and power spectrum. The average mutual information (AMI) method is used to find the optimal time lag. The time series is then decomposed by wavelet transform, and Cao's method is adopted to find the optimal embedding dimension of the decomposed series at each level. Finally, with the optimal time lag and embedding dimension, an LS-SVM is used to predict the future load series at each level, and the reconstruction of the predicted time series is used as the final forecasting result. The mean absolute percentage error (MAPE) is 1.1013% and the maximum error is 25.1378 MW, which shows that this approach is applicable to power load forecasting.

Index Terms- Least squares support vector machines, wavelet transform, chaos theory, load forecasting
Ⅰ. INTRODUCTION

Daily peak load forecasting plays an important role in all aspects of economic, reliable, and secure operating strategies for power systems. But the time series of power load is affected by many factors, such as economic conditions and temperature, which makes it difficult to predict accurately. Numerous studies have focused on accurate prediction of power load series, and many prediction models have been constructed [1][2][3]. Among these models, the artificial neural network model is very popular [1]. However, this method has several inherent disadvantages. For example, it may fall into a local optimum because of its adoption of the empirical risk minimization (ERM) principle. Besides, it needs a large quantity of training samples, and its learning speed is comparatively slow. Recently, support vector machines (SVM), proposed by Vapnik [4] in 1995 and based on statistical learning theory (SLT), have become a promising approach to power load forecasting [5]. Their merits chiefly come from the adoption of the structural risk minimization (SRM) principle instead of the ERM principle, so a global optimal solution can be obtained by solving a quadratic problem. The adoption of the kernel method efficiently avoids the curse of dimensionality. The least squares support vector machine (LS-SVM) is a kind of SVM, but it
1-4244-0332-4/06/$20.00 ©2006 IEEE
possesses different constraints from the standard SVM. It has been successfully applied in many fields, such as time series prediction [6]. When generating the input and output vectors of an LS-SVM predictor, the selection of the time lag and the embedding dimension is crucial. With further scrutiny, one can find that the power load time series has chaotic characteristics: its spectrum is broadband with a broad peak, and its largest Lyapunov exponent exceeds zero [7]. This encourages us to apply chaos theory to power load forecasting. The average mutual information (AMI) technique [7] is used to choose the optimal time lag. From a further detailed examination of the curves of the daily peak load series and its power spectrum, one can see that the load series possesses daily, weekly, and seasonal patterns. This characteristic also motivates us to decompose the original series by wavelet transform, because it produces a good local representation of the signal in both the time domain and the frequency domain; the trend, periodic, and random parts are then illustrated explicitly. Cao's method [8] is used to select the optimal embedding dimension of the decomposed series at each level. With the optimal time lag and embedding dimension, the input and output vectors can be obtained. Finally, the future load of each level is predicted by a separate LS-SVM predictor, and the reconstruction of the predicted series of each level is used as the final forecasting result.

This paper is organized as follows: in the next section, we review the basic principles of LS-SVM and chaos theory. In Section Ⅲ, the procedure of the experiment is presented. In Section Ⅳ, the final forecasting result is shown and a comparison is made. Finally, conclusions are drawn and future work is discussed in the last section.

Ⅱ. BACKGROUND

A. Least Squares Support Vector Machines
Suppose we have independent, identically distributed data {(x_1, y_1), ..., (x_N, y_N)}, where each x_i ∈ R^n denotes an input sample and has a corresponding target value y_i ∈ R for i = 1, ..., N, where N
corresponds to the size of the training data. The estimating function takes the following form:

f(x) = (w · Φ(x)) + b   (1)

where Φ(x) denotes the high-dimensional feature space to which the input space is nonlinearly mapped. This leads to the optimization problem for the standard SVM:

minimize   (1/2) w^T w + γ Σ_{i=1}^{N} ξ_i   (2)

subject to   y_i [w^T Φ(x_i) + b] ≥ 1 − ξ_i,   ξ_i ≥ 0,   i = 1, ..., N   (3)

where ξ_i is a slack variable and γ is a positive real constant that determines the penalty on estimation errors. For the LS-SVM, the problem is modified as follows:

minimize   (1/2) w^T w + (γ/2) Σ_{i=1}^{N} ξ_i^2   (4)

subject to the equality constraints

y_i [w^T Φ(x_i) + b] = 1 − ξ_i,   i = 1, ..., N   (5)

By constructing the Lagrangian and applying the Karush-Kuhn-Tucker (KKT) conditions, the following equations are obtained:

w = Σ_{i=1}^{N} α_i y_i Φ(x_i),   Σ_{i=1}^{N} α_i y_i = 0,   α_i = γ ξ_i,   y_i [w^T Φ(x_i) + b] − 1 + ξ_i = 0   (6)

Then we define

Z = [Φ(x_1)^T y_1; ...; Φ(x_N)^T y_N],   Y = [y_1; ...; y_N],   1 = [1; ...; 1],   ξ = [ξ_1; ...; ξ_N],   α = [α_1; ...; α_N]   (7)

After substituting (7) into (6) and eliminating w and ξ, we obtain

[ 0        Y^T          ] [ b ]   [ 0 ]
[ Y   ZZ^T + γ^{-1} I   ] [ α ] = [ 1 ]   (8)

By defining Ω = ZZ^T and applying Mercer's condition within the matrix Ω, each element of the matrix takes the form

Ω_{i,j} = y_i y_j Φ(x_i)^T Φ(x_j) = y_i y_j K(x_i, x_j)   (9)

where K(x_i, x_j) is called the kernel function. The value of a kernel function equals the inner product of the vectors x_i and x_j in the feature space, that is, K(x_i, x_j) = Φ(x_i) · Φ(x_j). Typical examples of kernel functions are the polynomial kernel and the RBF kernel:

polynomial:   K(x_i, x_j) = (γ(x_i · x_j) + r)^d,   γ > 0
RBF:          K(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2))

The resulting LS-SVM model for regression can be expressed as follows:

f(x) = Σ_{i=1}^{N} (α_i − α_i^*) K(x_i, x) + b   (10)

B. Chaos Theory

Chaos is a nonlinear behavior that exists between the realms of the periodic and the random. The exact system state can be written as

X(t) = (x(t), x(t − T), x(t − 2T), ..., x(t − (k − 1)T))   (11)

where t is a scalar index for the data series and T is the interval between observations. For a discrete time series, the next value of the state can be obtained as a function of the current state:

x(t + 1) = f(X(t))   (12)

Dynamic systems may evolve over time and generate time series with regular appearance, but it is possible that the system evolves to a chaotic attractor. The goal of a time series predictor is to reconstruct the chaotic dynamics of the state space from measurements of one component of the state vector of a dynamic system, and to predict the evolution of the measured variable. Takens' embedding theorem proved that if the dynamics occur in a d-dimensional Euclidean space, a reconstruction of the system can be obtained in a (2d + 1)-dimensional space built using delay coordinates [9]. A delay coordinate is simply an observed variable with a time delay τ. For example, given the time series x_1, x_2, x_3, ..., x_N, the i-th delay coordinate vector with embedding dimension d is

y_i(d) = [x_i, x_{i+τ}, x_{i+2τ}, ..., x_{i+(d−1)τ}],   i = 1, 2, ..., N − (d − 1)τ   (13)

where τ is called the time lag. The number of delay coordinates necessary to reconstruct the state of a dynamic system defines the minimum embedding dimension d.

To reconstruct the phase space, good choices of the time lag τ and the embedding dimension d are needed. For the time lag, we use the first minimum of the average mutual information (AMI) function:

I(τ) = Σ_n P(x_n, x_{n+τ}) log_2 [ P(x_n, x_{n+τ}) / (P(x_n) P(x_{n+τ})) ]   (14)

where P(x_n) is the probability density of x_n and P(x_n, x_{n+τ}) is the joint probability density of x_n and x_{n+τ}. The false nearest neighbor (FNN) method [10] is another approach to finding the optimal embedding dimension; however, we prefer Cao's method because it:
1) does not contain any subjective parameters except the time delay for the embedding;
2) does not strongly depend on how many data points are available;
3) is computationally efficient.
Cao's method can be expressed as follows. Let

E1(d) = E(d + 1) / E(d)   (15)

with

E(d) = (1/(N − dτ)) Σ_{t=0}^{N−dτ−1} ||y_{d+1}(t) − y_{d+1}^{NN}(t)|| / ||y_d(t) − y_d^{NN}(t)||   (16)

and

||y_d(t) − y_d^{NN}(t)|| = max_{0≤j≤d−1} |x(t + jτ) − x^{NN}(t + jτ)|   (17)

where N is the length of the original data series, d is the embedding dimension, and NN denotes the nearest neighbor of a given vector. As d becomes large enough, E1(d) tends to one; the appropriate embedding dimension is given by the value of d at which E1(d) stops changing.
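The procedure above can be sketched in a few lines of numpy. This is an illustrative implementation, not the paper's code: the helper names (`delay_embed`, `E1`), the synthetic test series, and the lag/dimension range are all made up for demonstration.

```python
import numpy as np

def delay_embed(x, d, tau):
    """Build delay-coordinate vectors y_i(d) = [x_i, x_{i+tau}, ..., x_{i+(d-1)tau}], as in eq. (13)."""
    n = len(x) - (d - 1) * tau
    return np.array([x[i:i + (d - 1) * tau + 1:tau] for i in range(n)])

def E1(x, d, tau):
    """Cao's E1(d) = E(d+1)/E(d), eqs. (15)-(16), with the max-norm distance of eq. (17)."""
    def E(dim):
        Y = delay_embed(x, dim, tau)        # vectors in dimension dim
        Y1 = delay_embed(x, dim + 1, tau)   # the same vectors extended to dim+1
        m = len(Y1)                         # only indices usable in dimension dim+1
        ratios = []
        for i in range(m):
            # Chebyshev (max-norm) distances to all candidate neighbors, eq. (17)
            dist = np.max(np.abs(Y[:m] - Y[i]), axis=1)
            dist[i] = np.inf                # exclude the point itself
            j = np.argmin(dist)             # index of the nearest neighbor (NN)
            num = np.max(np.abs(Y1[j] - Y1[i]))
            ratios.append(num / dist[j])
        return np.mean(ratios)
    return E(d + 1) / E(d)

# E1(d) saturates near 1 once d exceeds the minimum embedding dimension
rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(500)) + 0.01 * rng.standard_normal(500)
curve = [E1(x, d, tau=1) for d in range(1, 8)]
```

In practice one plots `curve` against d (as in Fig. 6) and reads off the dimension where the curve stops changing.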
Ⅲ. LOAD FORECASTING MODEL

In this section, we use the data provided by the EUNITE network during the EUNITE load competition [11]. The data are as follows:
- electricity load demand recorded every half hour, from 1997 to 1998;
- average daily temperature, from 1995 to 1998;
- dates of holidays, from 1997 to 1999.
The objective of our model, as well as the aim of the competition, is to forecast the daily peak load of January 1999. Two indices are used to evaluate the performance of the forecasting model: the mean absolute percentage error (MAPE) and the maximal error (ME):

MAPE = (100/n) Σ_{i=1}^{n} |(L_i − L̂_i) / L_i|,   ME = max_i |L_i − L̂_i|,   n = 31,   i = 1, ..., 31

where L_i and L̂_i denote the actual and predicted daily peak loads, respectively.

With the data available, the daily peak load from 1997 to 1998 is easily obtained; the result is shown in Fig. 1. Our method is to predict the daily peak load of January 1999 using the loads of 1997 and 1998, so the temperature data and the dates of holidays are not considered.

[Fig. 1: Maximal daily load from 1997 to 1998]

A. Chaotic Characteristics Analysis

In analyzing the data, an important step is determining the presence of chaotic behavior. In this subsection, we use two approaches to determine the chaotic characteristics of the daily peak load: the Fourier power spectrum and the largest Lyapunov exponent. All the following calculations are performed with the TSTOOL add-on for MATLAB [12]. The results are shown in Fig. 2 and Fig. 3, respectively.

[Fig. 2: Fourier power spectrum for maximal daily load]
[Fig. 3: Largest Lyapunov exponent of maximal daily load]

From Fig. 2, one can see that the daily peak load possesses a broadband spectrum of frequencies, and as shown in Fig. 3, the largest Lyapunov exponent lies between 4 and 6. Both results show that the daily peak load time series is chaotic.
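The broadband-spectrum test can be illustrated with numpy's FFT. This is only a sketch under made-up data (the paper's actual computation uses TSTOOL, and `spectral_spread` is a hypothetical helper, not a standard diagnostic): a broadband series spreads its power over many frequency bins, while a purely periodic one concentrates it in a few sharp lines.

```python
import numpy as np

def power_spectrum(x):
    """One-sided Fourier power spectrum of a mean-removed series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))   # unit sample spacing assumed
    return freqs, spec

def spectral_spread(x):
    """Fraction of frequency bins needed to hold 95% of total power (higher = broader band)."""
    _, spec = power_spectrum(x)
    p = np.sort(spec)[::-1]           # strongest bins first
    csum = np.cumsum(p) / p.sum()
    return np.searchsorted(csum, 0.95) / len(p)

# two years of synthetic daily samples: a weekly cycle, alone and with broadband noise
t = np.arange(730)
periodic = np.sin(2 * np.pi * t / 7)
rng = np.random.default_rng(1)
noisy = periodic + 0.5 * rng.standard_normal(len(t))
# spectral_spread(noisy) is much larger than spectral_spread(periodic)
```

For the real load series, the analogous picture is Fig. 2: power spread over a broad band rather than isolated spectral lines.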
B. Selecting the Delay Time

From Fig. 4, one can see that the first minimum of the AMI function occurs at a lag of one, so the time lag is one. This means that the time lag equals the sampling interval, and all the maximal daily load data can be used to predict those of January 1999. In the end, we use the daily peak load of the last three months of 1998 to predict that of January 1999.

[Fig. 4: AMI function for maximal daily load]

C. Decomposition by Wavelet Transform

With further scrutiny, one can find that the daily peak load time series has weekly, monthly, and seasonal patterns, as well as random parts. This characteristic inspires us to decompose the series by wavelet transform. The decomposition level is hard to choose. After trying different levels, we find that if the level is too low, the prediction error increases; as the level increases, the prediction result improves greatly, but beyond level three there is only a tiny improvement. In the end, db3 is chosen as the mother wavelet and the decomposition level is three. The decomposed series is shown in Fig. 5.

[Fig. 5: The original series (at the top) and the decomposed series of three months]

D. Selecting the Embedding Dimension

With the selected time lag, the embedding dimension of each level is produced by Cao's method. The resulting values are shown in Table Ⅰ. From Table Ⅰ, one can see that the higher the decomposition level, the larger the embedding dimension. This is because the information in the high-frequency parts is contaminated by noise, and thus more data points are needed to predict the value of the next day.

[Fig. 6: E1(d) graph of each level produced by Cao's method. From top to bottom: approximate parts, then detail parts in levels 3, 2, and 1, respectively]

E. Load Forecasting Based on LS-SVM Predictors

Using the obtained time lag τ and embedding dimension d, and according to (13), the input vectors and output vectors for each LS-SVM predictor are obtained. In this paper, we choose the RBF kernel. We use the LS-SVMlab software [13], which includes an implementation for solving (8). The parameters of the LS-SVM predictor for each part are shown in Table Ⅰ, where γ and σ² are the parameters of the LS-SVM and the RBF kernel, respectively.

TABLE Ⅰ
PARAMETERS OF EACH LEVEL

LS-SVM predictor of each part | γ    | σ²   | Embedding Dimension
approximate parts             | 1250 | 20   | 5
detail parts in level 3       | 565  | 78   | 6
detail parts in level 2       | 100  | 10   | 8
detail parts in level 1       | 5    | 17.5 | 10
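The per-level training step can be sketched in numpy. This is a minimal illustration of the regression analogue of the linear system in (8), following the common LS-SVMlab formulation [[0, 1^T]; [1, K + I/γ]] [b; α] = [0; y] with prediction f(x) = Σ α_i K(x_i, x) + b; it is not the paper's code, and the toy data and parameters (γ = 100, σ² = 1) are illustrative, not the Table Ⅰ values.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma2))

def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    N = len(X)
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                       # top row: 0, 1^T
    A[1:, 0] = 1.0                       # left column: 1
    A[1:, 1:] = K + np.eye(N) / gamma    # regularized kernel block
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # b, alpha

def lssvm_predict(Xq, X, b, alpha, sigma2):
    """f(x) = sum_i alpha_i K(x_i, x) + b."""
    return rbf_kernel(Xq, X, sigma2) @ alpha + b

# toy usage: fit a smooth 1-D function
X = np.linspace(0, 6, 60).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma2=1.0)
pred = lssvm_predict(X, X, b, alpha, sigma2=1.0)
```

In the forecasting model, one such solve is performed per decomposition level, with the input vectors built from the delay embedding of that level's series.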
Ⅳ. FINAL FORECASTING RESULT AND COMPARISON
The reconstruction of the predicted values of each part is used as the final forecasting result, as shown in Fig. 7. A comparison with the competition winners is also made, and the result is shown in Table Ⅱ.

[Fig. 7: The final predicted result (actual vs. forecast maximal load, January 1999)]

TABLE Ⅱ
COMPARISON OF OUR MODEL AND THE MODELS OF EUNITE COMPETITION WINNERS

Prediction Model | MAPE (%) | Maximal Error (MW)
Our model        | 1.1013   | 25.1378
Reference [5]    | 1.9824   | 51.4150
Reference [14]   | 2.1489   | 40.0000

As expected, the final forecasting result of our model has the smallest MAPE and ME, which indicates its applicability to load forecasting.

Ⅴ. CONCLUSIONS AND FUTURE WORK

Our model demonstrates its success in load forecasting. The reasons can be explained as follows:
1) Chaos theory is applied in our experiment. Selecting the time lag by the AMI function and the embedding dimension by Cao's method is much more reliable than selection by experience.
2) The wavelet transform reveals the trend, periodic, and random parts of the original series clearly. With this technique, the random part, which is the main source of forecasting error, becomes only a small part of the original series, thus decreasing the forecasting error.
3) The LS-SVM predictor has strong generalization ability and guarantees global optimality, thus enhancing the forecasting accuracy.

However, further research on this model is still needed:
1) In our experiment, we only use the RBF kernel, but other kernels may also be promising, such as the wavelet kernel proposed by Zhang et al. [15].
2) The parameter selection of the LS-SVM predictor is difficult; it seems there is no way but trial and error. Attention should also be paid to how to select the parameters efficiently.

REFERENCES

[1] H.S. Hippert, C.E. Pedreira, and R.C. Souza, "Neural networks for short-term load forecasting: A review and evaluation", IEEE Trans. Power Systems, vol. 16, no. 1, pp. 44-55, Feb. 2001.
[2] K.-B. Song, Y.-S. Baek, D.H. Hong, and G. Jang, "Short-term load forecasting for the holidays using fuzzy linear regression method", IEEE Trans. Power Systems, vol. 20, no. 1, pp. 96-101, 2005.
[3] A.D. Papalexopoulos and T.C. Hesterberg, "A regression-based approach to short-term system load forecasting", IEEE Trans. Power Systems, vol. 5, no. 4, pp. 1535-1547, 1990.
[4] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[5] B.J. Chen, M.W. Chang, and C.J. Lin, "Load forecasting using support vector machines: A study on EUNITE Competition 2001", IEEE Trans. Power Systems, vol. 19, no. 4, pp. 1821-1830, 2004.
[6] T. Van Gestel et al., "Financial time series prediction using least squares support vector machines within the evidence framework", IEEE Trans. Neural Networks, vol. 12, no. 4, pp. 809-821, 2001.
[7] H.D.I. Abarbanel, Analysis of Observed Chaotic Data. New York: Springer-Verlag, 1996.
[8] L.Y. Cao, "Practical method for determining the minimum embedding dimension of a scalar time series", Physica D, vol. 110, pp. 43-50, 1997.
[9] F. Takens, "Detecting strange attractors in turbulence", in Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics, D.A. Rand and L.S. Young (Eds.). Berlin: Springer-Verlag, 1981, pp. 366-381.
[10] M. Kennel, R. Brown, and H. Abarbanel, "Determining embedding dimension for phase-space reconstruction using a geometrical construction", Physical Review A, vol. 45, no. 6, pp. 3403-3411, 1992.
[11] World-wide competition within the EUNITE network [Online]. Available: http://neuron.tuke.sk/competition, 2005.
[12] TSTOOL [Online]. Available: http://www.physik3.gwdg.de/tstool, 2005.
[13] LS-SVMlab [Online]. Available: http://www.esat.kuleuven.ac.be/sista/lssvmlab, 2005.
[14] D. Zivcak, Electricity Load Forecasting Using ANN [Online]. Available: http://neuron.tuke.sk/competition/reports/DaliborZivcak.pdf, 2005.
[15] L. Zhang, W.-D. Zhou, and L.-C. Jiao, "Wavelet support vector machines", IEEE Trans. Systems, Man and Cybernetics, Part B, vol. 34, no. 1, pp. 34-39, 2004.