Proceedings of the 6th World Congress on Control and Automation, June 21 - 23, 2006, Dalian, China
Combining Least Squares Support Vector Machines and Wavelet Transform to Predict Gas Emission Amount

Cunliang Jia, Haishan Wu
College of Information and Electronic Engineering
China University of Mining & Technology
Xuzhou, Jiangsu, China, 221008
Abstract- To improve the prediction accuracy of gas emission amount, a novel model based on least squares support vector machines (LS-SVM) and the wavelet transform (WT) is presented. First, the historical series is decomposed by the wavelet transform into an approximate part and several detail parts. Each part is then predicted by a separate LS-SVM predictor, and the sum of the predicted parts is used as the final prediction. The selection of the embedding dimension and the decomposition level is also discussed. The results show that this model has better generalization ability and higher accuracy than an autoregressive model and a pure LS-SVM model.

Index Terms- Least squares support vector machines, wavelet transform, gas emission prediction
Ⅰ. INTRODUCTION

Predicting the expected gas emission amount from the working area of a mine is needed to facilitate ventilation planning and to assess methane drainage requirements. Accurate prediction of gas emission amount is crucial to ensure the safety of workers and of coal production. Considerable attention has therefore been paid to this problem, and many models have been constructed. Among them, linear methods such as autoregressive (AR) models [1] have been used in practice. Meanwhile, with the development of machine learning theory, nonlinear methods have also been applied to time series prediction; of these, neural networks are very popular [2]. However, these models have disadvantages. Linear models are inadequate for nonstationary time series such as gas emission series, which are affected by several random factors and are therefore hard to predict. Neural network models are prone to overfitting because they adopt the empirical risk minimization (ERM) principle; moreover, they need a large number of training samples, and their learning speed is comparatively slow.

Support vector machines (SVM), proposed by Vapnik [3] in 1995, are based on statistical learning theory (SLT). SVM adopts the structural risk minimization (SRM) principle instead of the ERM principle, and because training reduces to a convex quadratic programming problem, it obtains a globally optimal solution. The use of the kernel method also lets SVM avoid the curse of dimensionality efficiently.
1-4244-0332-4/06/$20.00 ©2006 IEEE
Least squares support vector machines (LS-SVM) are a variant of SVM that replace the inequality constraints of the standard SVM with equality constraints and use a squared error term, so that training reduces to solving a set of linear equations. LS-SVM has been applied in many fields, such as time series prediction [4]. The wavelet transform (WT), which gives a good local representation of a signal in both the time and frequency domains, has also been applied successfully in data analysis and signal processing, and it has been combined with other models, such as neural networks, for time series prediction [5]. In this paper, we propose a model for gas emission amount prediction that combines LS-SVM and WT, referred to as the WT-LSSVM model. A simulation experiment is carried out to validate the applicability of the model.

This paper is organized as follows: Section Ⅱ reviews the basic principles of LS-SVM and WT. In Section Ⅲ, the prediction model based on LS-SVM and WT is constructed. In Section Ⅳ, the simulation experiment is carried out, and the selection of the embedding dimension and the decomposition level is discussed. Finally, conclusions are drawn in Section Ⅴ.

Ⅱ. BACKGROUND

A. Least Squares Support Vector Machines

Suppose we have independent and identically distributed training data $\{x_1, y_1\}, \ldots, \{x_l, y_l\}$, where each $x_i \in \mathbb{R}^n$ is an input sample with a corresponding target value $y_i \in \mathbb{R}$, $i = 1, \ldots, l$, and $l$ is the size of the training set. The estimating function takes the form

$$f(x) = w^T \Phi(x) + b \qquad (1)$$

where $\Phi(\cdot)$ nonlinearly maps the input space into a high-dimensional feature space. This leads to the optimization problem of the standard SVM:

$$\text{Minimize} \quad \frac{1}{2} w^T w + \gamma \sum_{i=1}^{l} \xi_i \qquad (2)$$

$$\text{Subject to} \quad \begin{cases} y_i\,[w^T \Phi(x_i) + b] \ge 1 - \xi_i \\ \xi_i \ge 0, \quad i = 1, \ldots, l \end{cases} \qquad (3)$$

where $\xi_i$ is a slack variable and $\gamma$ is a positive real constant that determines the penalty on estimation errors. For LS-SVM, (2) and (3) are modified as follows:

$$\text{Minimize} \quad \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{l} \xi_i^2 \qquad (4)$$

subject to the equality constraints

$$y_i\,[w^T \Phi(x_i) + b] = 1 - \xi_i, \quad i = 1, \ldots, l \qquad (5)$$

By constructing the Lagrange function and applying the Karush-Kuhn-Tucker (KKT) conditions, we obtain

$$\begin{cases} w = \sum_{i=1}^{l} \alpha_i y_i \Phi(x_i) \\ \sum_{i=1}^{l} \alpha_i y_i = 0 \\ \alpha_i = \gamma \xi_i \\ y_i\,[w^T \Phi(x_i) + b] - 1 + \xi_i = 0 \end{cases} \qquad (6)$$

Then we define

$$\begin{cases} Z = [\Phi(x_1)^T y_1; \ldots; \Phi(x_l)^T y_l] \\ Y = [y_1; \ldots; y_l] \\ \vec{1} = [1; \ldots; 1] \\ \xi = [\xi_1; \ldots; \xi_l] \\ \alpha = [\alpha_1; \ldots; \alpha_l] \end{cases} \qquad (7)$$

After substituting (7) into (6) and eliminating $w$ and $\xi$, we obtain

$$\begin{bmatrix} 0 & Y^T \\ Y & ZZ^T + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix} \qquad (8)$$

By defining $\Omega = ZZ^T$ and applying Mercer's condition [6] to the matrix $\Omega$, each element of the matrix takes the form

$$\Omega_{i,j} = y_i y_j \Phi(x_i)^T \Phi(x_j) = y_i y_j K(x_i, x_j) \qquad (9)$$

where $K(x_i, x_j)$ is the kernel function. The value of the kernel equals the inner product of the images of $x_i$ and $x_j$ in the feature space, that is, $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$. Any symmetric function satisfying Mercer's condition can be used as a kernel function. Typical examples are the polynomial kernel and the RBF kernel:

Polynomial: $K(x_i, x_j) = (\gamma\,(x_i \cdot x_j) + r)^d, \ \gamma > 0$;

RBF: $K(x_i, x_j) = \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right)$.

The resulting LS-SVM model for regression can be expressed as

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + b \qquad (10)$$

B. Wavelet Transform

Suppose the function $\psi(t) \in L^2(\mathbb{R})$ and its Fourier transform $\hat{\psi}(\omega)$ satisfies the admissibility condition

$$\int_{\mathbb{R}} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega < \infty \qquad (11)$$

Then $\psi(t)$ is called a mother wavelet. By dilating and translating the mother wavelet, a family of wavelet functions is obtained:

$$\psi_{a,d}(t) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t - d}{a}\right), \quad a \ne 0,\ d \in \mathbb{R} \qquad (12)$$

where $a$ is the dilation factor and $d$ is the translation factor. Letting $a = 2^j$ and $d = k\,2^j$ yields the discrete wavelet transform (DWT):

$$\psi_{j,k}(t) = 2^{-j/2}\, \psi(2^{-j} t - k) \qquad (13)$$

where $k$ is the shift parameter and $j$ is the resolution level. The larger the value of $j$, the lower the frequency. According to (13), a function $f(t)$ decomposed to level $j$ can be reconstructed as

$$f(t) = \sum_k c_{j,k}\, \varphi_{j,k}(t) + \sum_{i=1}^{j} \sum_k d_{i,k}\, \psi_{i,k}(t) = a_j(t) + \sum_{i=1}^{j} d_i(t) \qquad (14)$$

where $\varphi_{j,k}$ is the scaling function associated with $\psi$, $c_{j,k}$ and $d_{i,k}$ are the approximation and detail coefficients, and $a_j$ and $d_i$ are the approximate and detail parts of the original signal, respectively.

Ⅲ. PREDICTION MODEL

The prediction model based on WT and LS-SVM is built in the following stages.

A. Decomposition of the Time Series

The time series of gas emission amount $\{Q(1), \ldots, Q(l)\}$ is decomposed by the wavelet transform at level $j$, whose selection is discussed in Section Ⅳ. This yields the approximate part $a_j$ and the detail parts $d_i$ ($i = 1, \ldots, j$):

$$Q(t) = a_j(t) + \sum_{i=1}^{j} d_i(t) \qquad (15)$$

B. Prediction Model Based on LS-SVM

Suppose the current time is $t$. The gas emission amount $Q(t)$ can be predicted from the historical data $Q(t-1), Q(t-2), \ldots, Q(t-p)$, so the prediction function can be expressed as

$$Q(t) = \Phi[Q(t-1), \ldots, Q(t-p)] \qquad (16)$$

where $p$ is the embedding dimension, whose selection is also discussed in Section Ⅳ, and $\Phi[\cdot]$ denotes the nonlinear mapping learned by the predictor. Following the decomposition (15), the prediction function is applied to each part separately:

$$a_j(t) = \Phi[a_j(t-1), \ldots, a_j(t-p)] \qquad (17)$$

$$d_i(t) = \Phi[d_i(t-1), \ldots, d_i(t-p)], \quad i = 1, \ldots, j \qquad (18)$$
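Each mapping $\Phi$ in (17) and (18) is realized by an LS-SVM regressor. As a minimal sketch of how one such predictor can be trained, the code below solves the LS-SVM system directly with NumPy. It assumes the standard function-estimation form of LS-SVM, $\begin{bmatrix} 0 & \vec{1}^T \\ \vec{1} & K + \gamma^{-1} I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$ with $K_{i,j} = K(x_i, x_j)$ (the regression counterpart of (8), without the $y_i y_j$ factors), together with the RBF kernel; the function names are hypothetical, and this is not the LS-SVMlab [7] code used in the experiments.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix: K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma2))."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma2))


def lssvm_fit(X, y, gamma, sigma2):
    """Train an LS-SVM regressor by solving one (l+1) x (l+1) linear system.

    [[0, 1^T], [1, K + I/gamma]] @ [b; alpha] = [0; y]
    """
    l = len(y)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0                      # first row enforces sum(alpha) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(l) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # b, alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    """Evaluate f(x) = sum_i alpha_i K(x_i, x) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


# Example: six lagged inputs -> next value; gamma and sigma^2 as for a_3 in TABLE II.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
y = X.sum(axis=1)
b, alpha = lssvm_fit(X, y, gamma=1250.0, sigma2=20.0)
y_fit = lssvm_predict(X, b, alpha, X, sigma2=20.0)
```

The last $l$ rows of the system give $f(x_i) = y_i - \alpha_i/\gamma$, consistent with $\alpha_i = \gamma \xi_i$ in (6): a larger $\gamma$ forces a closer fit to the training targets, which is the trade-off the $\gamma$ values of each predictor control.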
We construct a multi-input, single-output LS-SVM predictor for each part. According to (17) and (18), taking $a_j$ as an example, the input and output vectors of the LS-SVM predictor are constructed as shown in TABLE Ⅰ.

TABLE Ⅰ
STRUCTURE OF INPUT VECTORS AND OUTPUT VECTORS

| Input vectors | Output vectors |
|---|---|
| $a_j(1), \ldots, a_j(p-1), a_j(p)$ | $a_j(p+1)$ |
| ⋮ | ⋮ |
| $a_j(l-p-1), \ldots, a_j(l-3), a_j(l-2)$ | $a_j(l-1)$ |
| $a_j(l-p), \ldots, a_j(l-2), a_j(l-1)$ | $a_j(l)$ |

C. Reconstruction of the Predicted Value

Using the LS-SVM predictors, the predicted values of the approximate part and the detail parts of the future gas emission series are obtained. Let $\hat{a}_j$ and $\hat{d}_i$ ($i = 1, \ldots, j$) denote the predicted values of the approximate part and the detail parts, respectively. The sum of the predicted parts is used as the final prediction:

$$\hat{Q}(t) = \hat{a}_j(t) + \sum_{i=1}^{j} \hat{d}_i(t) \qquad (19)$$

where $Q(t)$ and $\hat{Q}(t)$ are the real and predicted values of the gas emission amount, respectively. Fig. 1 shows the structure of the prediction model.

Fig.1 Structure of the prediction model ($Q(t-p), \ldots, Q(t-2), Q(t-1)$ → Wavelet Transform → $a_j, d_j, \ldots, d_1$ → LS-SVM predictor of each part → $\hat{a}_j, \hat{d}_j, \ldots, \hat{d}_1$ → Reconstruction → $\hat{Q}(t)$)

Ⅳ. SIMULATION EXPERIMENT

To test the efficiency of our prediction model, we use the hourly gas emission amounts of four days to forecast the hourly amounts of the next day.

A. Experiment Procedure and Results

The db3 wavelet is selected as the wavelet function, and the decomposition level is set to 3. The original series and the decomposed series are shown in Fig. 2.

Fig.2 The original series (at the top) and decomposed series

Fig. 2 clearly shows the trend, periodic, and random parts of the original series. The decomposed series are used to predict those of the next day with LS-SVM predictors. In this section, we use the LS-SVMlab software [7], which implements the solution of (8). The RBF kernel is chosen as the kernel function, and the embedding dimension is set to 6. The parameters of the four LS-SVM predictors are shown in TABLE Ⅱ.

TABLE Ⅱ
PARAMETERS OF LS-SVM PREDICTORS

| LS-SVM predictor of each part | γ | σ² |
|---|---|---|
| Approximate part | 1250 | 20 |
| Detail part, level 3 | 1250 | 20 |
| Detail part, level 2 | 100 | 20 |
| Detail part, level 1 | 350 | 190 |

By using the LS-SVM predictors, the predicted values of the approximate and detail parts of the gas emission amount on the fifth day are obtained, as shown in Fig. 3.
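The whole procedure (decomposition, one windowed LS-SVM per part as in TABLE Ⅰ, the TABLE Ⅱ parameters, and reconstruction by (19)) can be sketched as follows. This is a hedged sketch, not the authors' code, and it makes several labeled assumptions: a Haar transform replaces db3 so that only NumPy is needed; the regression form of the LS-SVM system is solved directly instead of calling LS-SVMlab; the 24 next-day values are produced recursively by feeding each prediction back as an input, which the paper does not specify; and the input series is synthetic. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))


def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] (regression form)."""
    l = len(y)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(l) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # b, alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    """f(x) = sum_i alpha_i K(x_i, x) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


def haar_components(x, level):
    """Split x into full-length parts [a_level, d_level, ..., d_1] that sum to x.

    A Haar DWT stands in for db3 here; len(x) must be divisible by 2**level.
    """
    s = np.sqrt(2.0)
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(level):
        details.append((approx[0::2] - approx[1::2]) / s)
        approx = (approx[0::2] + approx[1::2]) / s
    details.reverse()  # coarsest first: d_level, ..., d_1

    def inverse(a, ds):
        # Inverse Haar DWT, applying detail coefficients from coarse to fine.
        for d in ds:
            out = np.empty(2 * a.size)
            out[0::2], out[1::2] = (a + d) / s, (a - d) / s
            a = out
        return a

    zeros = [np.zeros_like(d) for d in details]
    parts = [inverse(approx, zeros)]  # approximate part: all details zeroed
    for k in range(level):
        ds = list(zeros)
        ds[k] = details[k]  # keep only one level of detail coefficients
        parts.append(inverse(np.zeros_like(approx), ds))
    return parts


def make_windows(series, p):
    """TABLE I: input series[k .. k+p-1] -> output series[k+p]."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[k:k + p] for k in range(len(series) - p)])
    return X, series[p:]


def forecast_part(series, p, horizon, gamma, sigma2):
    """Fit one LS-SVM on a part, then forecast `horizon` steps recursively."""
    X, y = make_windows(series, p)
    b, alpha = lssvm_fit(X, y, gamma, sigma2)
    history = list(series[-p:])
    preds = []
    for _ in range(horizon):
        x_new = np.array(history[-p:])[None, :]
        preds.append(float(lssvm_predict(X, b, alpha, x_new, sigma2)[0]))
        history.append(preds[-1])  # feed the prediction back as an input
    return np.array(preds)


def wt_lssvm_forecast(q, level, p, horizon, params):
    """Decompose q, forecast each part with its own (gamma, sigma2), and sum, as in (19)."""
    parts = haar_components(q, level)
    return sum(forecast_part(part, p, horizon, g, s2)
               for part, (g, s2) in zip(parts, params))


# Synthetic placeholder for four days of hourly data (not the paper's data).
hours = np.arange(96)
q = 1.0 + 0.2 * np.sin(2 * np.pi * hours / 24) + 0.001 * hours
table2 = [(1250, 20), (1250, 20), (100, 20), (350, 190)]  # a_3, d_3, d_2, d_1
q_hat = wt_lssvm_forecast(q, level=3, p=6, horizon=24, params=table2)
```

Because the inverse transform is linear, the parts returned by `haar_components` add back to the input exactly, which is the property (15) and (19) rely on; a db3 decomposition from a wavelet library would play the same role.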
Fig.3 Predicted values of each part of the original series for the fifth day. In each panel, the solid line and the dotted line represent the actual and predicted values, respectively.

According to (19), the sum of the predicted parts is used as the final predicted result, which is shown in Fig. 4.

Fig.4 The final predicted result (actual and predicted values versus hour)

B. Performance and Comparison

To compare our model with the autoregressive model in [1] and a pure LS-SVM model, three indices are used to evaluate prediction performance:

$$E = \left| \frac{\operatorname{mean}(Q) - \operatorname{mean}(\hat{Q})}{\operatorname{mean}(Q)} \right|$$

$$Co = \frac{\operatorname{cov}(Q, \hat{Q})}{\sqrt{D(Q)\, D(\hat{Q})}}$$

$$F = 0.6\,(1 - E) + 0.4\, Co$$

where $D(\cdot)$ denotes the variance. E and Co measure the error and the correlation between the real and predicted values, respectively, and F, a comprehensive index combining the two, reflects the overall precision of the model. The comparison results are shown in TABLE Ⅲ.

TABLE Ⅲ
COMPARISON OF WT-LSSVM MODEL, LS-SVM MODEL, AND AR MODEL

| Indices | WT-LSSVM model | LS-SVM model | AR model |
|---|---|---|---|
| E | 0.00474 | 0.0093 | 0.012 |
| Co | 0.9486 | 0.5309 | 0.3360 |
| F | 0.9766 | 0.8068 | 0.7270 |

As expected, all three indices of our method are better than those of the other two models: E is lower, and Co and F are higher. The WT-LSSVM model thus demonstrates its effectiveness in predicting gas emission amount.

C. Discussion of Parameter Selection

The embedding dimension and the decomposition level are difficult to select. In our experiment, we vary the embedding dimension from 1 to 12 and the decomposition level from 1 to 3. Two indices are used to evaluate the model under each setting: F and MAPE (mean absolute percentage error),

$$\text{MAPE} = \frac{100}{l} \sum_{i=1}^{l} \left| \frac{Q(i) - \hat{Q}(i)}{Q(i)} \right|, \quad l = 24$$

The results are shown in TABLE Ⅳ and Fig. 5.

TABLE Ⅳ
PERFORMANCE WHEN DECOMPOSITION LEVEL VARIES

| Decomposition level | 1 | 2 | 3 |
|---|---|---|---|
| F | 0.9345 | 0.9705 | 0.9766 |
| MAPE (%) | 4.274 | 3.029 | 2.436 |

Fig.5 Performance when embedding dimension varies (F and MAPE (%) versus embedding dimension, 1 to 12)
From TABLE Ⅳ and Fig. 5, we note that when p exceeds five or the decomposition level exceeds two, the performance improves only slightly. This is because the gas emission amounts of the preceding five hours carry sufficient information to predict the value of the next hour, so when p is larger than five the additional inputs are redundant. Meanwhile, at decomposition level two the random parts are already clearly separated, and too large a decomposition level may lead to error propagation. In this paper, we set the decomposition level to three because the periodic, trend, and random parts are then clearly separated. The results in TABLE Ⅳ (F = 0.9766 and MAPE = 2.436% at level three) support this choice.
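The evaluation indices E, Co, F, and MAPE used in this section can be computed as sketched below. This is a minimal sketch that assumes NumPy and takes $D(\cdot)$ as the sample variance, so that Co is the Pearson correlation coefficient; the function names are hypothetical.

```python
import numpy as np


def e_index(q, q_hat):
    """E = |mean(Q) - mean(Q_hat)| / mean(Q): relative error of the mean level."""
    return abs(np.mean(q) - np.mean(q_hat)) / np.mean(q)


def co_index(q, q_hat):
    """Co = cov(Q, Q_hat) / sqrt(D(Q) * D(Q_hat)), with D as the sample variance."""
    cov = np.cov(q, q_hat)[0, 1]  # np.cov uses ddof=1 by default
    return cov / np.sqrt(np.var(q, ddof=1) * np.var(q_hat, ddof=1))


def f_index(e, co):
    """Comprehensive index F = 0.6 * (1 - E) + 0.4 * Co."""
    return 0.6 * (1.0 - e) + 0.4 * co


def mape(q, q_hat):
    """MAPE (%) = (100 / l) * sum_i |Q(i) - Q_hat(i)| / Q(i)."""
    q = np.asarray(q, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    return 100.0 * np.mean(np.abs(q - q_hat) / q)


# Recompute F for the WT-LSSVM column of TABLE III from its E and Co.
print(round(f_index(0.00474, 0.9486), 4))  # 0.9766
```

With the WT-LSSVM values from TABLE Ⅲ, `f_index` returns 0.976596, which rounds to the reported 0.9766; the LS-SVM column (E = 0.0093, Co = 0.5309) likewise gives 0.8068.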
Ⅴ. CONCLUSIONS

In this paper, we combine the wavelet transform and least squares support vector machines to predict the time series of gas emission amount. The results show that this model has better generalization ability and higher accuracy than the AR and pure LS-SVM models, which indicates that the method is applicable to predicting gas emission amount time series. However, further research is needed on models combining WT and LS-SVM. In our experiment, only the RBF kernel was used; other kernel functions, such as the wavelet kernel proposed by Zhang et al. [8], may also be promising. In addition, the prediction error results mainly from the random part of the original series, and other data preparation methods [9] may improve the accuracy.

REFERENCES

[1] Zhi-fang Xu, "Research on Integrated System of Gas Real-time Detecting Information Based on Intranet," Ph.D. thesis, China University of Mining & Technology, 2001 (in Chinese).
[2] Zhi-yi Yang, Ya-xuan Xiong, and Qian-lin Zhang, "Research on the prediction of gas emission in working face based on neural network," Coal Engineering, no. 10, pp. 73-75, 2004 (in Chinese).
[3] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[4] T. Van Gestel et al., "Financial time series prediction using least squares support vector machines within the evidence framework," IEEE Trans. Neural Networks, vol. 12, no. 4, pp. 809-821, 2001.
[5] Bai-ling Zhang et al., "Multi-resolution forecasting for futures trading using wavelet decompositions," IEEE Trans. Neural Networks, vol. 12, no. 4, pp. 765-774, 2001.
[6] S. K. Shevade et al., "Improvements to SMO algorithm for SVM regression," IEEE Trans. Neural Networks, vol. 11, no. 5, pp. 1188-1193, 2000.
[7] LS-SVMlab [Online]. Available: http://www.esat.kuleuven.ac.be/sista/lssvmlab
[8] Li Zhang, Wei-da Zhou, and Licheng Jiao, "Wavelet support vector machines," IEEE Trans. Systems, Man, and Cybernetics, vol. 34, no. 1, pp. 34-39, 2004.
[9] Bo-juen Chen, Ming-wei Chang, and Chih-jen Lin, "Load forecasting using support vector machines: a study on EUNITE competition 2001," IEEE Trans. Power Systems, vol. 19, no. 4, pp. 1821-1830, 2004.