STRATIFICATION & SAMPLING OF ELECTRICITY CONSUMER DATA FOR OUTAGE COST DETERMINATION
DANIEL ANDREW SEN
COLLEGE OF GRADUATE STUDIES UNIVERSITI TENAGA NASIONAL 2007
STRATIFICATION & SAMPLING OF ELECTRICITY CONSUMER DATA FOR OUTAGE COST DETERMINATION
by
DANIEL ANDREW SEN
Project Supervisor: Dr Amir Hisham Hashim
A Dissertation Submitted in Partial Fulfillment of the Requirement for the Degree of Master of Electrical Engineering
College of Graduate Studies
Universiti Tenaga Nasional
JULY 2007
COPYRIGHT © 2007 Attention is drawn to the fact that copyright of this dissertation rests with its author.
APPROVAL SHEET
This dissertation, entitled:
“STRATIFICATION & SAMPLING OF ELECTRICITY CONSUMER DATA FOR OUTAGE COST DETERMINATION”
Submitted by: Daniel Andrew Sen (SE 20240)
In partial fulfillment of the requirements for the degree of Master of Electrical Engineering, College of Graduate Studies, Universiti Tenaga Nasional, has been accepted.
Project Supervisor: Dr. Amir Hisham Hashim
1 July 2007
DECLARATION
I hereby declare that this dissertation, submitted to Universiti Tenaga Nasional in partial fulfillment of the requirements for the degree of Master of Electrical Engineering, has not been submitted as an exercise for a similar degree at any other university. I also certify that the work described here is my own except for excerpts and summaries whose sources are appropriately cited in the references. This dissertation may be made available within the university library and may be photocopied or loaned to other libraries for the purposes of consultation.
1 July 2007
Daniel Andrew Sen
DEDICATION
To my family for their love and support throughout the time I was writing this dissertation.
ACKNOWLEDGEMENT
Every day you may make progress. Every step may be fruitful. Yet there will stretch out before you an ever-lengthening, ever-ascending, ever-improving path. You know you will never get to the end of the journey. But this, so far from discouraging, only adds to the joy and glory of the climb.
– Winston Churchill –
Completing the project work and writing this dissertation was a unique and challenging task. The experience has brought out the true meaning of excellence, perseverance, and teamwork. Many people contributed their help, thoughts, ideas, and support during the preparation of this dissertation. Specifically, I would like to thank my supervisor, Dr. Amir Hisham Hashim, whose keen guidance led us through the design, implementation, analysis, and finally, the compilation of this dissertation. I am also thankful for my research team members’ dedication and creativity in accomplishing our research objectives. Finally, I wish to express my heartfelt love and gratitude to my family for the many sacrifices they made to allow me the opportunity to complete my Master’s degree. And above all else, I thank God for His continued wisdom and sustenance.
Daniel Andrew Sen
Universiti Tenaga Nasional
1 July 2007
ABSTRACT
In today’s demanding business environment, determining the Value of Lost Load (VoLL) is important for utilities to make the right decisions when they embark on any form of asset expansion or assess risks in power system operation. The VoLL is the aggregated or average value of outage costs across the whole range of consumers in the electricity supply industry (ESI).
This dissertation covers the initial process in determining the VoLL from consumer survey data. It discusses the data stratification method used to categorize consumers according to a set of criteria, followed by the normalization of consumer survey data. The questionnaire development process is then discussed with respect to the various consumer strata. Finally, the stratification method is demonstrated on consumers and evaluated. Consumers are first stratified into three major strata: Domestic, Commercial, and Industrial. Each major stratum is then stratified into minor strata.
The sample size is then calculated for each stratum for a confidence interval (CI) of 90% and a precision factor r of 0.1.
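As a rough illustration of the sample-size step, the sketch below solves a precision-factor relation of the form r = z·σ/(μ·√n) for n. It assumes that this is the form of the precision factor used in the dissertation, that z is the two-sided critical value of the standard normal distribution, and that the stratum mean and standard deviation are already known. The function name and the input figures are hypothetical and are not taken from the survey data.

```python
import math
from statistics import NormalDist


def required_sample_size(mean, std_dev, confidence=0.90, precision=0.1):
    """Smallest n for which z * std_dev / (mean * sqrt(n)) <= precision."""
    if mean <= 0 or std_dev < 0:
        raise ValueError("mean must be positive and std_dev non-negative")
    # Two-sided critical value, roughly 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Rearranged relation (assumed form): n = (z * sigma / (r * mu))^2
    n = (z * std_dev / (precision * mean)) ** 2
    return max(1, math.ceil(n))


if __name__ == "__main__":
    # Hypothetical stratum: mean 300 kWh, standard deviation 450 kWh.
    print(required_sample_size(mean=300.0, std_dev=450.0))
```

A stratum with a larger spread relative to its mean needs more samples for the same r, which is one reason for stratifying consumers before sampling.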
TABLE OF CONTENTS
Approval Sheet
Declaration
Dedication
Acknowledgement
Abstract
Table of Contents
List of Tables
List of Figures
List of Equations
Glossary of Terms
CHAPTER 1 Introduction
1.1 Preface
1.2 Background of Asset Optimization
1.3 Objectives
1.4 Brief Methodology
1.5 Scope of Work
1.6 Layout of Dissertation
CHAPTER 2 Determination of Value of Lost Load
2.1 Determination of Consumer Damage Function
2.2 Determination of Value of Lost Load
2.2.1 VoLL Formulation
2.3 Tenaga Nasional Berhad Business Code
2.4 TNB Consumers in Peninsular Malaysia
2.4.1 Domestic Consumers
2.4.2 Commercial Consumers
2.4.3 Industrial Consumers
CHAPTER 3 Statistical Analysis in Engineering
3.1 Statistical Analysis
3.1.1 Etymology
3.1.2 Origins in probability
3.1.3 Statistics today
3.1.4 Conceptual overview
3.1.5 Statistical methods
3.1.5.1 Experimental and observational studies
3.1.5.2 Levels of measurement
3.2 Statistical techniques
3.2.1 Specialized disciplines
3.2.2 Criticism
3.3 Normal Distribution
3.3.1 Overview
3.3.2 History
3.3.3 Characterization of the normal distribution
3.3.4 Probability Density Function
3.3.5 Cumulative Distribution Function
3.3.6 Generating functions
3.3.6.1 Moment generating function
3.3.6.2 Cumulant generating function
3.3.6.3 Characteristic function
3.3.7 Properties
3.3.8 Standardizing Normal Random Variables
3.3.9 Moments
3.3.10 Generating Values for Normal Random Variables
3.3.11 The Central Limit Theorem
3.3.12 Infinite divisibility
3.3.13 Stability
3.3.14 Standard deviation
3.3.15 Normality tests
3.3.16 Occurrence
3.3.17 Measurement errors
3.3.18 Physical characteristics of biological specimens
3.3.19 Financial variables
3.3.20 Distribution in testing and intelligence
3.3.21 Numerical approximations of the normal distribution
3.4 Confidence Interval
3.4.1 Practical Example
3.4.2 Theoretical Example
3.4.3 Interpretations of Confidence Intervals
3.4.4 Confidence Intervals in Measurement
3.4.5 Robust Confidence Intervals
3.4.6 Confidence Intervals for Proportions and Related Quantities
3.5 Boxplot
3.5.1 Construction
3.5.2 An Example
3.5.3 Visualization
3.6 Outliers
3.6.1 An Example
3.6.2 Mild outliers
3.6.3 Extreme outliers
3.6.4 Occurrence and causes
3.6.5 Non-normal distributions
3.7 Extreme Values
3.7.1 Extreme values in abstract spaces with order
3.8 Probability of a z value
3.8.1 Critical z for a given probability
3.9 Precision Factor
CHAPTER 4 Determination of Consumer Stratification & Sample Size
4.1 Introduction
4.2 Stratification
4.2.1 Stratified Sampling
4.3 Preprocessing of Population Data with SPSS Software
4.4 Sampling with SPSS Software
4.4.1 Domestic Consumers
4.4.2 Commercial Consumers
4.4.3 Industrial Consumers
4.5 Normalization of Consumer Survey Data
4.6 Preprocessing of Sample Data
4.6.1 Domestic Consumer Data
4.6.2 Commercial Consumer Data
4.6.3 Industrial Consumer Data
CHAPTER 5 Conclusion
References
Appendix 1 – TNB Business Codes
Appendix 2 – List of Commercial Consumers Interviewed
Appendix 3 – List of Industrial Consumers Interviewed
Appendix 4 – Domestic Questionnaire
Appendix 5 – Commercial Questionnaire
Appendix 6 – Industrial Questionnaire
Appendix 7 – SPSS Analysis of Domestic Consumer Population
Appendix 8 – SPSS Analysis of Commercial Consumer Population
Appendix 9 – SPSS Analysis of Industrial Consumer Population
Appendix 10 – SPSS Analysis of Domestic Consumer Samples
Appendix 11 – SPSS Analysis of Commercial Consumer Samples
Appendix 12 – SPSS Analysis of Industrial Consumer Samples
Appendix 13 – Standard Normal (Z) Table
Appendix 14 – List of Publications
LIST OF TABLES
Table 1: Major Differences between Probabilistic Risk and Deterministic Risk
Table 2: Probability Distributions
Table 3: Some of the first few moments of the normal distribution
Table 4: Number of Samples needed for each Major Strata (90% CI, r=0.1)
Table 5: List of Domestic Business Codes
Table 6: Number of Sample Selection for a Proportional Stratified Random Sample
Table 7: List of Commercial Business Codes
Table 8: Boundaries between cells
Table 9: Number of Sample Selection for a Proportional Stratified Random Sample
Table 10: Domestic Statistics Range
Table 11: Commercial Statistics Range
Table 12: Industrial Statistics Range
LIST OF FIGURES
Figure 1: Utility Asset Investment: Balancing Reliability Cost and Reliability Worth
Figure 2: Example Consumer Damage Functions
Figure 3: Example of determination of VoLL
Figure 4: A Graph of a Bell Curve in a Normal Distribution
Figure 5: PDF of Gaussian distribution (bell curve)
Figure 6: CDF of Gaussian distribution
Figure 7: Plot of the PDF of a normal distribution with $\mu = 12$ and $\sigma = 3$
Figure 8: Standard Deviation Segment Sizes in Standard Normal Distribution
Figure 9: Histogram with CIs
Figure 10: Figure shows 50 realizations of a confidence interval for $\mu$
Figure 11: Boxplot and PDF of a Normal $N(0, 1\sigma^2)$ Population
Figure 12: An example of an outlier in a histogram
Figure 13: An example of an outlier in a scatterplot
Figure 14: Classification of Consumers by kWh Consumption Year 2005
Figure 15: Negative skew (left diagram) and positive skew (right diagram)
Figure 16: Kurtosis Factor Impact on a Normal Distribution
Figure 17: Venn diagram for Tariff Classification
Figure 18: Industrial Stratification Method
Figure 19: Industrial Stratification Process Flow
Figure 20: Formation of Boundaries between Cells
Figure 21: Allocation scheme
Figure 22: Final sample
Figure 23: The bell curve that shows the normal distribution from sample data
Figure 24: Illustrates the normalization process of a survey VoLL value
Figure 25: SPSS Boxplot of Domestic Consumer Population (CI = 90%)
Figure 26: SPSS Boxplot of Commercial Consumer Population (CI = 90%)
Figure 27: SPSS Boxplot of Industrial Consumer Population (CI = 90%)
Figure 28: Outage Cost vs. Reliability Curve
LIST OF EQUATIONS
Equation 1: $\text{Loss (\$/kW)} = f(\text{duration}, \text{season}, \text{time of day}, \text{notice})$
Equation 2: $\text{Outage Costs (RM)} = \dfrac{\text{WTA} + \text{WTP}}{2}$
Equation 3: $\text{VoLL} = \dfrac{a_1 \cdot \text{Cust}_1 + a_2 \cdot \text{Cust}_2 + a_3 \cdot \text{Cust}_3 + \dots + a_n \cdot \text{Cust}_n}{n}$
Equation 4: $f(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) = \dfrac{1}{\sigma}\,\varphi\left(\dfrac{x-\mu}{\sigma}\right)$
Equation 5: $\varphi(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
Equation 6: $F(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\dfrac{(u-\mu)^2}{2\sigma^2}\right) du$
Equation 7: $\Phi(x) = F(x; 0, 1) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\dfrac{u^2}{2}\right) du$
Equation 8: $\Phi(z) = \dfrac{1}{2}\left[1 + \operatorname{erf}\left(\dfrac{z}{\sqrt{2}}\right)\right]$
Equation 9: $\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1)$
Equation 10: $M_X(t) = \mathrm{E}[\exp(tX)] = \int_{-\infty}^{\infty} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) \exp(tx)\, dx = \exp\left(\mu t + \dfrac{\sigma^2 t^2}{2}\right)$
Equation 11: $\chi_X(t; \mu, \sigma) = M_X(it; \mu, \sigma) = \mathrm{E}[\exp(itX)] = \int_{-\infty}^{\infty} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) \exp(itx)\, dx = \exp\left(i\mu t - \dfrac{\sigma^2 t^2}{2}\right)$
Equation 12: $Z = \dfrac{X - \mu}{\sigma}$
Equation 13: $\Pr(X \le x) = \Phi\left(\dfrac{x-\mu}{\sigma}\right) = \dfrac{1}{2}\left[1 + \operatorname{erf}\left(\dfrac{x-\mu}{\sigma\sqrt{2}}\right)\right]$
Equation 14: $X = \sigma Z + \mu$
Equation 15: $c = \sqrt{-2 \ln a}\,\cos(2\pi b)$
Equation 16: $\operatorname{erf}\left(\dfrac{n}{\sqrt{2}}\right)$
Equation 17: $\Pr(U < \theta < V \mid \theta) = x$
Equation 18: $\hat{\mu} = \bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
Equation 19: $\bar{x} = \dfrac{1}{25}\sum_{i=1}^{25} x_i = 250.2\ \text{(grams)}$
Equation 20: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{X} - \mu}{0.5}$
Equation 21: $P(-z \le Z \le z) = 1 - \alpha = 0.95$
Equation 22: $z = \Phi^{-1}(\Phi(z)) = \Phi^{-1}(0.975) = 1.96$
Equation 23: $0.95 = P(\bar{X} - 0.98 \le \mu \le \bar{X} + 0.98)$
Equation 24: $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
Equation 25: $\Pr\left(\bar{X} - \dfrac{cS}{\sqrt{n}} < \mu < \bar{X} + \dfrac{cS}{\sqrt{n}}\right) = 0.9$
Equation 26: $\left[\bar{x} - \dfrac{cS}{\sqrt{n}};\ \bar{x} + \dfrac{cS}{\sqrt{n}}\right]$
Equation 27: $\Pr(-1.645 < \bar{X} - \theta < 1.645) = 0.9$
Equation 28: $\Pr(\bar{X} - 1.645 < \theta < \bar{X} + 1.645) = 0.9$
Equation 29: $\Pr(82 - 1.645 < \theta < 82 + 1.645) = 0.9$
Equation 30: $< Q_1 - 1.5 \times \mathrm{IQR}$
Equation 31: $> Q_3 + 1.5 \times \mathrm{IQR}$
Equation 32: $< Q_1 - 3 \times \mathrm{IQR}$
Equation 33: $> Q_3 + 3 \times \mathrm{IQR}$
Equation 34: z-value = 1.96
Equation 35: z-value = 1.64
Equation 36: z-value = 1.881
Equation 37: $r = \dfrac{\sigma}{\mu} \cdot z \cdot \dfrac{1}{\sqrt{n}}$
Equation 38: $\sigma = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}$
Equation 39: $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
Equation 40: $\text{VoLL}_{\text{stratum normalized}} = \text{VoLL}_{\text{survey}} \times \dfrac{\text{kWh}_{\text{stratum mean}}}{\text{kWh}_{\text{survey}}}$
GLOSSARY OF TERMS
Adequacy
The ability of the electric system to supply the aggregate electrical demand and energy requirements of the consumers from various electric generation suppliers at all times, taking into account scheduled and reasonably expected unscheduled outages of system elements.
ASIFI
Average System Interruption Frequency Index [1]. It is an index based on the load interrupted rather than the number of consumers interrupted; it therefore measures the expected number of times load is interrupted during the specified interval of time. ASIFI for a system may be computed as:
$\text{ASIFI} = \dfrac{\sum L_i}{L_T}$
where $L_i$ is the load interrupted due to each outage and $L_T$ is the total load connected to the system under consideration.
CDF
Consumer Damage Function. The CDF relates the magnitude of consumer losses (RM/kWh interrupted) to the duration of a power outage.
CI
Confidence Interval. In statistics, a CI for a population parameter is an interval between two numbers with an associated probability p which is generated from a random sample of an underlying population, such that if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question. Confidence intervals are the most prevalent form of interval estimation.
[1] http://72.14.205.104/search?q=cache:FECoyC0oBjsJ:www.ee.iastate.edu/~jdm/ee653/DistributionReliabilityFundamentals.doc+asifi+index+definition&hl=en&ct=clnk&cd=7, Retrieved 20 December 2006.
Circuit
A conductor or system of conductors through which an electric current is intended to flow.
EENS
Expected Energy Not Supplied
EGAT
Electricity Generating Authority of Thailand. EGAT presently builds, owns and operates several types and sizes of power plants across Thailand with a combined installed capacity of 15,035.80 MW, accounting for about 59 percent of Thailand’s 25,646.99 MW generating capacity. EGAT also purchases electric power from private power companies and neighboring countries.
Energy Intensity
Ratio of energy consumption to economic or physical output. At the national level, energy intensity is the ratio of total domestic primary energy consumption or final energy consumption to gross domestic product or physical output [2].
EPRI
Electric Power Research Institute (of the United States of America). EPRI brings together members, participants, the Institute’s scientists and engineers, and other leading experts to work collaboratively on solutions to the challenges of electric power. These solutions span nearly every area of electricity generation, delivery, and use, including health, safety, and environment.
ESI
Electricity Supply Industry. The ESI consists of electricity utility companies involved in the generation, transmission, and distribution of electricity in a specific region. It includes state-owned companies, IPPs, and co-generators.
FMM
Federation of Malaysian Manufacturers. FMM was established in 1968, and strives to lead Malaysian manufacturers in spearheading the nation’s growth and modernization. Today, as the largest private sector economic organization in Malaysia representing over 2,000 manufacturing and industrial service companies of varying sizes, the FMM is the officially recognized and acknowledged voice of the industry in Malaysia.
[2] Intergovernmental Panel on Climate Change, http://glossary.eea.europa.eu/EEAGlossary/E/energy_intensity, Retrieved 8 January 2007.
IPP
Independent Power Producer.
IQR
InterQuartile Range. It is the difference between the third and first quartiles and is a measure of statistical dispersion. The interquartile range is a more stable statistic than the range, and is often preferred to that statistic.
kW
kilowatt. The watt (symbol: W) is the SI derived unit of power, equal to one joule per second. 1 kW is equal to one thousand ($10^3$) watts.
kWh
kilowatt-hour. The watt-hour (symbol W·h) is a unit of energy. It is commonly used in the form of the kilowatt-hour, which is 1,000 watt-hours. It is a commonly used unit, especially for measuring electric energy.
MITI
Ministry of International Trade and Industry (of Malaysia). MITI plans, formulates, and implements policies on industrial development, international trade, and investment in Malaysia. It encourages foreign and domestic investment and promotes Malaysia’s export of manufacturing products and services.
MMBTU
The British thermal unit (BTU or Btu) is a unit of energy used in North America. It is also still occasionally encountered in the United Kingdom, in the context of older heating and cooling systems. In most other areas, it has been replaced by the SI unit of energy, the joule (J). In the United States, the term “BTU” is used to describe the heat value (energy content) of fuels, and also to describe the power of heating and cooling systems, such as furnaces, stoves, barbecue grills, and air conditioners. When used as a unit of power, BTU per hour is understood, though this is often confusingly abbreviated to just “BTU”. The unit MBTU was defined as one thousand BTU, presumably from the Roman numeral system where “M” stands for one thousand (1,000). There is currently a social push to redefine MBTU as one million (1,000,000) BTU, thus making the unit more consistent with the metric system, which uses “M” to mean mega, or $10^6$. To avoid confusion, many companies and engineers use MMBTU to represent one million (1,000,000) BTU. In natural gas, by convention 1 MMBtu (1 million Btu, sometimes written “mmBTU”) = 1.054615 GJ. Conversely, 1 gigajoule is equivalent to 26.8 m³ of natural gas at defined temperature and pressure [3].
NAICS
North American Industry Classification System. The NAICS has replaced the SIC system. NAICS was developed jointly by the U.S., Canada, and Mexico to provide new comparability in statistics about business activity across North America [4].
NERC
North American Electric Reliability Council – An organization of regional reliability councils established to promote the reliability of the electricity supply for North America.
ORP
Optimal Reliability Point
PDF
Probability Density Function
POS
Point of Sale. This can mean a retail shop, a checkout counter in a shop, or a variable location where a transaction occurs in this type of environment.
Additionally, POS sometimes refers to the electronic cash register system being used in an establishment. POS systems are used in restaurants, hotels, stadiums, casinos, as well as retail environments; in short, if something can be sold, it can be sold where a point of sale system is in use [5].
PTM
Malaysian Energy Center. PTM is an independent and non-profit organization devoted to energy research in Malaysia administered by the Ministry of Energy, Water and Communications, Malaysia. PTM’s core activities are energy planning and research, renewable energy (RE), energy efficiency (EE) and related technological research development and demonstration (RD&D) undertaken in the energy sector.
PTM’s responsibilities also include data gathering and compilation, strategic and policy analysis, serving as a think-tank group for the government, and acting as a one-stop energy agency for linkages with universities, research institutions, industries, and other national and international organizations.
[3] http://en.wikipedia.org/wiki/MMBTU, Retrieved 17 January 2007.
[4] http://www.census.gov/epcd/www/naics.html, Retrieved 17 January 2007.
[5] http://en.wikipedia.org/wiki/Point_of_sale, Retrieved 17 January 2007.
Q1
Quartile 1 or first quartile or lower quartile. It cuts off the lowest 25% of data.
Q2
Quartile 2 or second quartile or median. It cuts the data set in half.
Q3
Quartile 3 or third quartile or upper quartile. It cuts off the highest 25% of data.
Reliability
The degree of performance of the elements of an electric system that results in electricity being delivered to consumers within accepted standards and in the desired amount, measured by the frequency, duration, and magnitude of adverse effects on the electric supply and by considering two basic and functional aspects of the electric system: adequacy and security.
Reliability indices
Service performance indicators which measure the frequency, duration, and magnitude of consumer interruptions, excluding outages associated with major events.
SAIDI
System Average Interruption Duration Index [6]. The average duration of sustained consumer interruptions per consumer occurring during the analysis period. It is the average time consumers were without power. It is determined by dividing the sum of all sustained consumer interruption durations, in minutes, by the total number of consumers served. This determination is made by using the following equation:
$$\mathrm{SAIDI} = \frac{\sum_i r_i N_i}{N_T}$$
where: r_i = duration, in minutes, of sustained interruption i; N_i = number of consumers affected by sustained interruption i; N_T = total number of consumers served for the area being indexed
SAIFI
System Average Interruption Frequency Index. The average frequency of sustained interruptions per consumer occurring during the analysis period. It is calculated by dividing the total number of sustained consumer interruptions by the total number of consumers served. This determination is made by using the following equation:
$$\mathrm{SAIFI} = \frac{\sum_i N_i}{N_T}$$
where: N_i = number of consumers affected by sustained interruption i; N_T = total number of consumers served for the area being indexed
[6] http://www.pacode.com/secure/data/052/chapter57/s57.192.html, Retrieved 6 November 2006.
Security
The ability of the electric system to withstand sudden disturbance such as electric short circuits or unanticipated loss of system elements.
SIC
Standard Industrial Classification. The SIC was a United States government system for classifying industries by a four-digit code. Established in the 1930s, it is being supplanted by the six-digit NAICS, which was released in 1997; however certain government departments and agencies, such as the U.S. Securities and Exchange Commission (SEC), still use the SIC codes [7].
TNB
Tenaga Nasional Berhad (of Malaysia). TNB’s core activities are in the generation, transmission, and distribution of electricity. The TNB Group has a complete power system, including the National Grid, Consumer Service Centers, Call Management Centers, and administration offices throughout Peninsular Malaysia and Sabah. It is the largest electricity utility company in Malaysia, with assets worth more than RM60 billion, serving over six million consumers throughout Peninsular Malaysia and Sabah.
VoLL
Value of Lost Load. The VoLL is the aggregated or average value of outage costs across the whole range of consumers in the ESI.
WTA
Willingness-to-Accept. An approach to determine how much the consumers would be willing to accept in compensation for an outage that has occurred.
WTP
Willingness-to-Pay. An approach to determine how much the consumers are willing to pay to avoid an outage.
[7] http://en.wikipedia.org/wiki/Standard_Industrial_Classification, Retrieved 17 January 2007.
CHAPTER 1 INTRODUCTION
1.1 Preface
Utilities are responsible for the generation, transmission, and distribution of electricity to consumers. Part of this responsibility is ensuring that system adequacy and security criteria are fulfilled. However, this must be balanced against the investment and operating costs, which are increasingly important factors to remain competitive. Utility planning has traditionally been based on the electricity load demand forecast. The demand for electricity initiates actions by the utilities to add or retire generation, transmission, or distribution assets [1].
Retiring assets can be done fairly quickly; however, a long lead time is required to plan and construct new utility equipment. Decisions on the need for a new utility plant may have to be made 2 to 10 years in advance [1]. These long lead times require that the utility planning horizon be at least 10 years. Since utility decisions involve an economic analysis of the operating and investment costs, the utility planning horizon may range from 15 to 30 years into the future. Forecasts over such long horizons are quite a challenge in light of the uncertainties in national, regional, and local economic growth, coupled with uncertainties in electricity usage patterns and conservation trends [1]. The ultimate goal, however, is to take all these uncertainties properly into account so that planning continues to deliver system adequacy and security in the long term. To this end, system reliability provides a good yardstick for determining the effectiveness of policies executed during the planning stage.
System reliability is a central criterion during the planning stage. Reliability comprises both system adequacy and security. Adequacy is the existence of sufficient facilities such as generators, lines, and control systems within a system to satisfy consumer demand, whereas system security is the ability of the system to respond to a disturbance such as the loss of a generator or a lightning strike. While these criteria have served the ESI well in the past, the present environment requires a balance between the planning and operation criteria, and the economic value consumers assign to reliability in establishing target reliability levels [2]. Consumers’ needs range from those who would not mind paying a premium for a highly reliable supply (because they would suffer a very large loss when there are disruptions in their power supply) to the vast majority of consumers who do not mind tolerating outages in exchange for lower prices. Often, there is a mix of consumers with various reliability needs located in close geographic proximity. This mix of consumers, who require various levels of reliability, complicates the process of utility planning. Conventional criteria, such as electricity load demand forecast, can lead to investments that are economically inefficient. Planners may inadvertently overinvest in generation, transmission, and distribution facilities that provide greater reliability than that required by consumers. This leads to higher prices, which consumers are unwilling to pay. Correspondingly, under-investment in facilities serving consumers requiring high reliability will lead to unsatisfied consumers who would be willing to pay more for a higher level of reliability. Therefore, it can be quite hard for the utility to decide on the optimum reliability level. One method of optimization is through value-based planning, which matches the level of investment in reliability with consumers’ reliability preferences [2]. Taking economics into account during the planning stage allows utilities to optimize their investment by investing in assets to boost reliability where consumers expect higher reliability levels than the status quo.
1.2 Background of Asset Optimization
Optimization is the discipline which is concerned with finding the maxima and minima of functions, possibly subject to constraints [3]. When applied to asset investment in the ESI, optimization is the process of trying to maximize the utility’s profit while at the same time minimizing the cost of adequacy and security. Planning engineers should constantly analyze and manage risks and the costs associated with those risks. However, in the Malaysian ESI, this activity is very limited [4]. Generally, planning engineers put more emphasis on the technical requirements of adequacy and security, with minimal consideration of the cost incurred for the added reliability. To enable accurate reflection of a particular type of risk in utility planning, it is necessary to first analyze and quantify that risk, and then associate it with a cost penalty. Two common methods of analyzing risks are the probabilistic approach and the deterministic approach [5].
Table 1 illustrates some of the major differences between deterministic and probabilistic estimates.
Table 1: Major Differences between Probabilistic Risk and Deterministic Risk
Probabilistic Risk Assessment
• Takes into account all available information and considers the probability of an occurrence.
• The risk estimate is expressed as a distribution of values, with a probability assigned to each value.
• The distribution reflects variability and uncertainty.
Deterministic Risk Assessment
• The risk estimate is expressed as a point value.
• The variability and uncertainty of this value are not reflected.
The probabilistic approach aims to determine the probability of an event occurrence using statistical data. In this process, a sampling method is selected. To avoid bias, researchers employ random sampling procedures [6]. Through statistical methods, for a given sample size, the mean, standard deviation, coefficient of variation, and the level of precision can be calculated. This method allows utilities to gauge damage perceived by consumers due to a power interruption. The deterministic approach, on the other hand, plans for contingency scenarios such as n-1 criteria and largest unit tripping. In utilizing this method, the utility plans for various potential contingencies and estimates the damage to the affected consumers. In this work, the deterministic approach was used to collect consumer data via the various scenarios in the questionnaires that were developed. However, it is foreseen that this work can be extended to cover the probabilistic analysis, as this would enable risk quantification. By relating the system reliability level to the outage costs, the optimum reliability point, as shown in Figure 1, could be calculated.
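As a rough illustration of the sample statistics named in the probabilistic approach, the Python sketch below computes the mean, standard deviation, coefficient of variation, and a precision figure for a small sample. The loss values are hypothetical. The precision measure used here (the half-width of a 95% confidence interval relative to the mean, with a z-value of 1.96) is an assumption for illustration and may differ from the definition adopted later in this dissertation.

```python
import math
import statistics


def sample_summary(values, z=1.96):
    """Summarize a sample of outage losses.

    The relative precision is an assumed definition: the half-width of
    the z-based confidence interval divided by the sample mean.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)      # sample standard deviation (n - 1)
    coeff_of_variation = std_dev / mean
    std_error = std_dev / math.sqrt(n)      # standard error of the mean
    relative_precision = z * std_error / mean
    return {
        "n": n,
        "mean": mean,
        "std_dev": std_dev,
        "coeff_of_variation": coeff_of_variation,
        "relative_precision": relative_precision,
    }


# Hypothetical outage losses (RM per kW) reported by eight respondents.
hypothetical_losses = [2, 4, 4, 4, 5, 5, 7, 9]
summary = sample_summary(hypothetical_losses)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A larger sample reduces the standard error and therefore tightens the precision figure, which is the trade-off that sample size determination has to balance against survey cost.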
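[Figure: Cost (vertical axis) versus Reliability (horizontal axis), showing the Total Cost, Reliability Cost (utility), and Reliability Worth (consumer) curves, with point A marked.]
Figure 1: Utility Asset Investment: Balancing Reliability Cost and Reliability Worth [7].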
Figure 1 above illustrates the costs utilities face in planning for asset investment. In the figure, there are two types of costs: the reliability cost and the reliability worth. The reliability cost is related to the cost of purchasing and installing new equipment to increase the utility’s reliability. This cost increases with increasing reliability. The reliability worth is the value that the consumer is willing to pay for increasing power supply reliability. This cost decreases with increasing reliability. The total cost to the consumer is the sum of the utility cost, which the consumer pays for through the electricity bill, and the consumer cost due to outages [8]. Reliability worth, according to Billinton [9], is the outage cost, which can be divided into three categories:
i) Outage Cost to Utility: this includes loss of revenue, loss of goodwill, loss of future potential sales, and increased expenditure for maintenance and repair.
ii) Outage Cost to Industry: this includes loss of production, damaged machinery and products, and corrective maintenance.
iii) Outage Cost to Domestic: this includes loss of frozen foods and alternative energy cost.
Often, the outage costs to industry and domestic consumers outweigh the outage cost to the utility. However, the outage cost can be reduced through measures such as the construction of parallel lines which connect a power source to the load. Utility asset investment centers on reducing risk by increasing system reliability to provide system adequacy and security. However, optimization would show that investment beyond point A is economically inefficient because it leads to higher total cost. At the same time, investment below point A would result in a lower than optimum reliability level, which also results in higher total cost.
In order to embark on this exercise, initial data had to be collected, conditioned, and understood. The initial data included the consumer groups in Peninsular Malaysia, types of manufacturing and/or commercial processes, consumption data, premise location, and TNB business code. This dissertation aims to document these processes and present the research findings.
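The idea of point A in Figure 1 can be sketched numerically. In the Python example below, both cost curves are hypothetical functions chosen only to have the shapes described in the text (utility cost rising with reliability, consumer outage cost falling with reliability); they are not derived from TNB data. The optimum is simply the reliability level at which their sum is smallest.

```python
def reliability_cost(r):
    """Hypothetical utility cost: rises steeply as reliability nears 1."""
    return 1.0 / (1.0 - r)


def reliability_worth(r):
    """Hypothetical consumer outage cost: falls as reliability rises."""
    return 400.0 * (1.0 - r)


def total_cost(r):
    """Total cost is the sum of the utility cost and the consumer cost."""
    return reliability_cost(r) + reliability_worth(r)


def optimum_reliability(levels):
    """Return the candidate reliability level with the lowest total cost."""
    return min(levels, key=total_cost)


# Candidate reliability levels from 0.800 to 0.999 in steps of 0.001.
candidate_levels = [0.80 + 0.001 * i for i in range(200)]
point_a = optimum_reliability(candidate_levels)
print(f"Optimum reliability (point A): {point_a:.3f}")
print(f"Total cost at point A: {total_cost(point_a):.2f}")
```

With these assumed curves, moving away from the optimum in either direction raises the total cost, which mirrors the argument that investment both beyond and below point A is economically inefficient.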
1.3 Objectives
The objectives of this research are:
i) To group electricity consumers based on their TNB business code, tariff structure, monthly energy consumption, or a combination of these groupings. This includes stratification of consumers to facilitate accurate data collection through sampling.
ii) To determine their weights in the final VoLL calculation. Weights are based on the percentage consumption of a particular stratum in comparison to the total consumption.
iii) To determine the individual outage cost of consumers of the identified strata. This was accomplished by collecting sample data through the questionnaires that were developed to gauge the consumers’ perception of the direct and indirect costs of various outage scenarios.
iv) To develop a formulation which enables the calculation of the VoLL. This allows the individual stratum VoLL values to be combined to form a composite VoLL value.
There is also a growing need to determine the worth of electricity for system security purposes. This can be seen in the proliferation of contracts for interruption or interruptible load schemes between electricity utilities and large consumers. In order for schemes like this to take off, the VoLL must first be found. Another use of the VoLL is in the cost of outage analysis. At the same time, VoLL can be used in cost-benefit analysis for justifying new projects given a particularly high cost of outage in specific areas. In the event that the utility is required to pay compensation to its consumers due to a loss of supply, the VoLL can provide a starting point for both parties to negotiate the quantum of compensation. In some deregulated markets, the VoLL is a component of the Pool Purchase Price and is a reflection of the price given an inadequate supply of electricity in the network. This again shows the use of VoLL in determining the cost of unsupplied electricity to consumers.
1.4 Brief Methodology
Initially, a literature review was carried out to assess existing methods of calculating VoLL in other utilities. Consumer data were collected and verified in order to classify consumer groups. This required the researchers to liaise with TNB Distribution Sdn Bhd in order to acquire the latest consumer data, which include the monthly energy consumption, peak demand, TNB business code, and tariff of the consumers. Based on these data, the research examined how best to group the consumers. This required a balance between adequate sampling and the time constraints of the research. The collected data also formed the basis of the weights of each consumer stratum.
Questionnaires were developed to suit the different consumer segments. These questionnaires and cost assessment templates formed the basis of outage cost assessment and were used when meeting with consumers to collect sample data.
A series of interviews was conducted to assess the cost of manufacturing, service processes, and inconvenience caused by various outage scenarios at selected industrial, commercial, and domestic consumers in the TNB network of Peninsular Malaysia. Further discussion of the above is covered in Chapters 2 through 5.
1.5 Scope of Work
This research will analyze data from TNB databases in order to understand the make-up of TNB consumers and consequently group them. Consumers will be divided into major strata, which will then be subdivided into minor strata. Given this stratification, the study will not assess all consumers but will rely on sampling to determine and calculate consumer losses for each minor stratum. A census of the entire population would not be feasible because of the tremendous cost and time needed. Compared to a census, the sampling process will provide adequate accuracy for a fraction of the cost and time. This will result in a reduced list of consumers to be interviewed and examined. The stratification will also allow a simplified form of consumer group weights. The weights will be based on consumer consumption and will be used to calculate the VoLL of a major stratum by combining individual VoLL values from the minor strata. Similarly, these weights will be used to calculate the composite VoLL for Peninsular Malaysia by combining the individual VoLL values of the major strata.
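The sampling step described above can be pictured with a short Python sketch of stratified random sampling: consumers are grouped by stratum, and a fixed number are drawn at random from each group instead of surveying everyone. The consumer records, stratum labels, and sample sizes below are hypothetical and are not taken from the TNB database; the function name `stratified_sample` is likewise introduced here only for illustration.

```python
import random


def stratified_sample(consumers, stratum_of, sample_sizes, seed=0):
    """Draw a simple random sample from each stratum.

    consumers    -- list of consumer records
    stratum_of   -- function mapping a record to its stratum label
    sample_sizes -- dict of stratum label -> number of records to draw
    seed         -- fixed seed so the draw is reproducible
    """
    rng = random.Random(seed)

    # Group the population by stratum.
    strata = {}
    for consumer in consumers:
        strata.setdefault(stratum_of(consumer), []).append(consumer)

    # Sample without replacement inside each stratum.
    sample = {}
    for label, members in strata.items():
        size = min(sample_sizes.get(label, 0), len(members))
        sample[label] = rng.sample(members, size)
    return sample


# Hypothetical population: 6 domestic, 3 commercial, 1 industrial consumer.
population = (
    [{"id": i, "stratum": "domestic"} for i in range(6)]
    + [{"id": i, "stratum": "commercial"} for i in range(6, 9)]
    + [{"id": 9, "stratum": "industrial"}]
)
sizes = {"domestic": 3, "commercial": 2, "industrial": 1}
chosen = stratified_sample(population, lambda c: c["stratum"], sizes)
for label, members in chosen.items():
    print(label, [m["id"] for m in members])
```

Drawing within each stratum guarantees that small but important groups, such as industrial consumers, are represented in the sample, which a single random draw over the whole population would not ensure.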
1.6 Layout of Dissertation
This dissertation will introduce the concept of outage cost analysis and compare the data collection techniques, results, and analysis from research carried out in other parts of the world. Data collection techniques will be discussed with particular emphasis on data collection through surveys. The application of statistics in engineering will then be introduced, including the concept of CIs, precision, outliers, extreme values, and box plots. Next, the determination of consumer stratification and sample size will be presented. This will include the differentiation and characteristics of domestic, commercial, and industrial consumers. In the following chapter, the results of our nationwide survey and its analysis are presented. This data analysis will discuss the CDFs for the various categories of domestic, commercial, and industrial consumers. The overall composite VoLL will also be calculated.
Chapter 2 will introduce the concept of outage cost and VoLL. It will discuss the various outage cost estimation methods and evaluation types. The CDF will then be introduced and discussed with respect to the VoLL formulation. It also lays the groundwork for the statistics theory and calculations that will be used in later chapters. It discusses the characteristics and importance of the TNB 5-digit business code. Lastly, the definition and characteristics of domestic, commercial, and industrial consumers in Peninsular Malaysia are discussed.
Chapter 3 presents the origins and evolution of statistics theory and their application in data processing and conditioning. The Normal distribution, confidence intervals, boxplots, outliers, extreme values, z-value, and their implications on a data set are also discussed.
Chapter 4 discusses consumer stratification and determination of the appropriate sample size for an adequate precision level. First the stratification process is documented. This includes a discussion of the three major consumer strata and their respective needs. The minor strata and the steps to determine the minor strata are discussed. Next, the normal distribution function and precision factor for the TNB database are discussed with reference to calculations presented in Chapter 4. Lastly, the normalization process and processing of the collected sample data are discussed.
Chapter 5 draws conclusions from the research and highlights future possible work in this area.
CHAPTER 2 DETERMINATION OF VALUE OF LOST LOAD
2.1 Determination of Consumer Damage Function
The economic losses consumers experience as a result of reliability and power quality problems may be described by what is called a Consumer Damage Function (CDF). In a CDF, the losses that consumers face are expressed as a function of the magnitude of load interrupted, the duration of the interruption, the season and time of day, and whether the utility gave the consumer advance notice of a planned outage [10]. The CDF can be defined as:
Loss ($/kW) = f(duration, season, time of day, notice) (Equation 1)
The CDF for each consumer’s category is called the individual consumer damage function (ICDF). All the ICDFs for a given category of consumers such as domestic, commercial and industrial can be combined to represent the function of cost for that category and can be named as the sector consumer damage function (SCDF). The SCDF for domestic, commercial and industrial can then be combined to produce the composite consumer damage function (CCDF). This CCDF is used to represent the CDF for a large area. Figure 2 below illustrates the incremental CDFs for domestic, large commercial and industrial, and small and medium commercial and industrial. The CDF relates the magnitude of consumer losses (per kW interrupted) for a given duration of a power outage. While the general shapes of all three curves are similar, the magnitude of loss varies dramatically depending on the consumer’s size.
[Figure: $/kW interrupted (vertical axis) versus Outage Duration (horizontal axis), with curves for Small & Medium C/I, Large C/I, and Domestic consumers.]
Figure 2: Example Consumer Damage Functions [11].
2.2 Determination of Value of Lost Load
VoLL is said to represent the value an average consumer puts on an unsupplied kWh [12]. The value of VoLL can be calculated by using data from the CDF.
[Figure: $/kW interrupted (vertical axis) versus Outage Duration (horizontal axis) for Small & Medium C&I consumers, with the VoLL marked on the curve.]
Figure 3: Example of determination of VoLL
Figure 3 illustrates how to determine VoLL for small and medium sized commercial and industrial consumers. Based on the VoLL data from an EGAT survey in March and April 2000 [13], it was estimated that the cost in the first hour for domestic consumers was Baht 11.45/kW. For large C/I and small & medium C/I consumers, the cost in the first hour was Baht 29.55/kW and Baht 89.50/kW respectively. Another study by EPRI [11] indicates that domestic consumers’ costs tend to peak at USD 1.50/kW in the first hour and fall off to USD 0.46/kW in subsequent hours. On the other hand, large C/I and small & medium C/I consumers suffer much higher losses of USD 10/kW and USD 38/kW respectively in the first hour. This falls to USD 4/kW and USD 9/kW respectively in the subsequent hours.
This general trend shows that consumers usually incur medium to heavy losses in the first few hours of outage and tend to have smaller incremental losses after 4 hours. Outages, especially forced outages, cause an interruption of the consumers’ usual activities. During the first few hours, consumers typically will be forced to deal with the outage and make the necessary arrangements for a resumption of their activities once supply is restored. After 4 hours, losses are reduced because consumers have been able to plan and cope with the outage. This is the reason notice is important: consumers who have been notified of planned outages in advance can plan around the interruption period, thereby reducing outage induced losses.
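To make the shape of an incremental CDF concrete, the Python sketch below accumulates the EPRI figures quoted above over the length of an outage. It rests on two assumptions that are not stated in the source: that the "subsequent hours" figure applies equally to every hour after the first, and that a partial hour is charged in proportion to its length. The category labels are introduced here only for illustration.

```python
# Incremental losses in USD per kW interrupted, from the EPRI figures
# quoted in the text: (first-hour cost, cost per subsequent hour).
INCREMENTAL_CDF = {
    "domestic": (1.50, 0.46),
    "large_ci": (10.0, 4.0),
    "small_medium_ci": (38.0, 9.0),
}


def cumulative_loss_per_kw(category, duration_hours):
    """Accumulate the incremental losses over an outage of given length.

    Assumes a constant per-hour cost after the first hour and
    proportional charging of partial hours (illustrative assumptions).
    """
    if duration_hours <= 0:
        return 0.0
    first_hour_cost, later_hour_cost = INCREMENTAL_CDF[category]
    first_part = first_hour_cost * min(duration_hours, 1.0)
    later_part = later_hour_cost * max(duration_hours - 1.0, 0.0)
    return first_part + later_part


for category in INCREMENTAL_CDF:
    loss = cumulative_loss_per_kw(category, 3)
    print(f"{category}: USD {loss:.2f}/kW for a 3-hour outage")
```

Under these assumptions the first hour dominates the total for every category, which is consistent with the observation that incremental losses become smaller as the outage continues.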
2.2.1 VoLL Formulation
In order to determine the VoLL, the information gathered from the survey is analyzed using the formulas given in this section. The outage cost is determined by taking the simple average of the WTA and WTP values. The calculation for outage cost is given by Equation 2.
$$\text{Outage Costs (RM)} = \frac{\text{WTA} + \text{WTP}}{2}$$ (Equation 2)
Therefore, the average value of WTA and WTP for respondents will be used to calculate the customer losses in the VoLL formula. The general equation used to calculate the VoLL is shown in Equation 3.
$$\text{VoLL} = \frac{a_1 \cdot \text{Cust}_1 + a_2 \cdot \text{Cust}_2 + a_3 \cdot \text{Cust}_3 + \dots + a_n \cdot \text{Cust}_n}{n}$$ (Equation 3)
where:
a is the consumer weight
n is the group identifier
Cust_n is the losses incurred by consumer n
The calculation of VoLL for domestic, commercial, and industrial is based on this general equation. For each category, the weight of the consumer is calculated first. Consumer weights are defined by the consumption of the individual minor strata in the domestic, commercial, or industrial strata divided by the total consumption of that major stratum.
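The formulation above can be traced step by step in a short Python sketch. It implements Equation 2 and Equation 3 exactly as they are written in this section, including the division by n, and computes the weights as each minor stratum's share of the major stratum's consumption. The stratum names, consumption figures, and loss values are hypothetical and serve only to show the arithmetic.

```python
def outage_cost(wta, wtp):
    """Equation 2: simple average of the WTA and WTP values (RM)."""
    return (wta + wtp) / 2.0


def stratum_weights(consumption_by_stratum):
    """Weight of each minor stratum: its consumption divided by the
    total consumption of the major stratum it belongs to."""
    total = sum(consumption_by_stratum.values())
    return {name: kwh / total for name, kwh in consumption_by_stratum.items()}


def voll(weights, losses):
    """Equation 3 as written in the text: the weighted sum of the
    losses divided by the number of groups n."""
    n = len(weights)
    weighted_sum = sum(weights[name] * losses[name] for name in weights)
    return weighted_sum / n


# Hypothetical monthly consumption (kWh) of three minor strata.
consumption = {"stratum_a": 600.0, "stratum_b": 300.0, "stratum_c": 100.0}

# Hypothetical (WTA, WTP) survey responses per stratum, in RM.
responses = {
    "stratum_a": (12.0, 8.0),
    "stratum_b": (25.0, 15.0),
    "stratum_c": (40.0, 20.0),
}

weights = stratum_weights(consumption)
losses = {name: outage_cost(wta, wtp) for name, (wta, wtp) in responses.items()}
print("weights:", weights)
print("losses:", losses)
print("VoLL:", voll(weights, losses))
```

Because the weights are consumption shares, they sum to one within a major stratum, so the strata with the largest consumption have the greatest influence on the resulting value.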
2.3 Tenaga Nasional Berhad Business Code
To simplify the stratification process, membership of the minor strata was defined by the TNB business codes. The TNB business codes are 5-digit codes created by Tenaga Nasional Berhad (TNB), a local utility company, to differentiate the types of residences, commercial businesses, and industries. Each code includes a detailed breakdown of the premise by type, business, activity, consumption size, and other criteria. All the business codes with their respective premise descriptions are listed in Appendix 1 (TNB Business Codes).
2.4 TNB Consumers in Peninsular Malaysia
TNB had a total of 6,582,374 consumers in 2005 [14]. A consumer, as defined by TNB [15], is any person and/or entity taking electricity supply from TNB’s supply lines at any one point of supply, provided that if a person and/or entity takes a supply at more than one point of supply, such person and/or entity shall be deemed to be a separate consumer for each such point of supply.
2.4.1 Domestic Consumers
There are 5,482,920 domestic consumers, which constitute 83.297 percent of the total number of consumers [16]. A domestic consumer as defined by TNB [15] is a consumer occupying a private dwelling, which is not used as a hotel, boarding house or used for the purpose of carrying out any form of business, trade, professional activities or services. A domestic consumer typically is the smallest power consumer amongst the three major strata. Consumption during weekdays is high during the morning period of 6am to 8am due to preparation for work or school, and during the evening and night period of 6pm to 11pm, which is when dinner is prepared and household chores are done. Weekends are marked by little consumption.
2.4.2 Commercial Consumers
Commerce means the buying and selling of goods, whereas commercial refers to an activity of commerce [17]. Hence, considering commercial consumers for analysis in this research is vital. A commercial consumer as defined by TNB [15] is, but not limited to, a consumer occupying or operating an office block, hotel, service apartment, boarding house, retail complex, shop-house, car-park, workshop, restaurant, estate, plantation, farm (except those categories defined in the Specific Agriculture Tariff), port, airport, railway installation, toll plaza, street lighting at tolled highways including their bridges and tunnels, telecommunications installation, broadcasting installation, entertainment/recreation/sports outlet, golf course, school/educational institution, religious and welfare organization, military and government installation, hospital, waste treatment plant, district cooling plant, cold storage, warehouse, and any other form of business or commercial activities which are not primarily involved in manufacturing, quarrying or mining activities. Commercial consumers are typically moderate to large consumers during weekday business hours, with some premises extending activities over the weekend. Their consumption is used mainly to run motor loads and computer equipment.
2.4.3 Industrial Consumers
Industrial consumers are low in number but contribute the most in terms of revenue; their total energy consumption is high at approximately 80% of total consumption in Peninsular Malaysia. TNB has a total of 26,689 industrial consumers [14]. An industrial consumer as defined by TNB [15] is a consumer engaging in manufacturing of goods and products.
Manufacturing means the conversion of raw material or components to finished products such as the making, altering, blending, ornamenting, finishing, or otherwise treating, or adapting any article with a view to use, sell, transport, deliver, or dispose; and includes the assembly of parts and food processing but shall not include any activity normally associated with the retail or wholesale trade. Consumers engaged in the quarrying of minerals, stone, and other natural resources, and in pumping for water treatment plants, are also termed industrial consumers. In addition, the total wattage of lamps and air conditioners installed for the purpose of office use shall not exceed 20% of the total wattage of all electrical equipment installed.
CHAPTER 3 STATISTICAL ANALYSIS IN ENGINEERING
3.1 Statistical Analysis
A textbook definition of statistics is “a logic and methodology for the measurement of uncertainty and for an examination of the consequences of that uncertainty in the planning and interpretation of experimentation or observation” [18]. Statistics is a mathematical science pertaining to the collection, analysis, interpretation, and presentation of data [19]. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used for making informed decisions in all areas of business and government. Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, to draw inferences about the process or population being studied; this is called inferential statistics. Both descriptive and inferential statistics can be considered part of applied statistics. There is also a discipline of mathematical statistics, which is concerned with the theoretical basis of the subject. The word statistics is also the plural of statistic (singular), which refers to the result of applying a statistical algorithm to a set of data, as in employment statistics, accident statistics, etc.
Figure 4: A Graph of a Bell Curve in a Normal Distribution [20]
3.1.1 Etymology
The word statistics ultimately derives from the modern Latin term statisticum collegium (“council of state”) and the Italian word statista (“statesman” or “politician”) [21]. The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the “science of state”, then called political arithmetic in English [22]. It acquired the meaning of the collection and classification of data generally in the early 19th century. It was introduced into English by Sir John Sinclair. Thus, the original principal purpose of Statistik was data to be used by governmental and (often centralized) administrative bodies. The collection of data about states and localities continues, largely through national and international statistical services. In particular, censuses provide regular information about the population.
During the 20th century, the creation of precise instruments for public health concerns (epidemiology, biostatistics, etc.) and economic and social purposes (unemployment rate, econometrics, etc.) necessitated substantial advances in statistical practices. This became a necessity for Western welfare states developed after World War I, which had to develop a specific knowledge of their “population”. Philosophers such as Michel Foucault have argued that this constituted a form of “biopower”, a term which has since been used by many other authors [23].
3.1.2
Origins in probability
The mathematical methods of statistics emerged from probability theory, which can be dated to the correspondence of Pierre de Fermat and Blaise Pascal (1654) [24]. Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli’s Ars Conjectandi (posthumous, 1713) and Abraham de Moivre’s Doctrine of Chances (1718) treated the subject as a branch of mathematics [25].
The theory of errors may be traced back to Roger Cotes’s Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation [18]. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given. Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve. He deduced a formula for the mean of three observations. He also gave (1781) a formula for the law of facility of error (a term due to Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors [18].
The method of least squares, which was used to minimize errors in data measurement, is due to Adrien-Marie Legendre (1805), who introduced it in his Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets). In ignorance of Legendre’s contribution, an Irish-American writer, Robert Adrain, editor of “The Analyst” (1808), first deduced the law of facility of error. He gave two proofs, the second being essentially the same as John Herschel’s (1850). Carl Gauss gave the first proof which seems to have been known in Europe (the third after Adrain’s) in 1809. Further proofs were given by Laplace (1810, 1812), Gauss (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W. F. Donkin (1844, 1856), and Morgan Crofton (1870). Other contributors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875). Peters’s (1856) formula for r, the probable error of a single observation, is well known.
In the nineteenth century authors on the general theory included Laplace, Sylvestre Lacroix (1816), Littrow (1833), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson. Augustus De Morgan and George Boole improved the exposition of the theory. Adolphe Quetelet (1796-1874), another important founder of statistics, introduced the notion of the “average man” (l’homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates or suicide rates.
3.1.3
Statistics today
Today the use of statistics has broadened far beyond its origins as a service to a state or government. Individuals and organizations use statistics to understand data and make informed decisions throughout the natural and social sciences, medicine, business, and other areas. Statistics is generally regarded not as a subfield of mathematics but as a distinct, albeit allied, field. Many universities maintain separate mathematics and statistics departments. Statistics is also taught in departments as diverse as psychology, education, and public health.
3.1.4
Conceptual overview
In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied. This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period. It may instead be a process observed at various times; data collected about this kind of “population” constitute what is called a time series. For practical reasons, rather than compiling data about an entire population, one usually instead studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or experimental setting. The data are then subjected to statistical analysis, which serves two related purposes: description and inference.
i) Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.
ii) Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), forecasting of future observations, descriptions of association (correlation), or modeling of relationships (regression). Other modeling techniques include ANOVA, time series, and data mining.
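As a brief illustration of the two purposes, the sketch below first describes a small sample and then makes one elementary inferential statement about the population mean. The consumption figures are hypothetical values chosen for illustration and are not taken from the TNB data; the variable names are likewise assumptions.

```python
import math
import statistics

# Hypothetical monthly consumption (kWh) for a sample of eight consumers.
# These figures are illustrative only and do not come from any real database.
sample_kwh = [210.0, 340.0, 185.0, 420.0, 295.0, 310.0, 260.0, 380.0]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(sample_kwh)
sample_sd = statistics.stdev(sample_kwh)  # sample standard deviation (n - 1)

# A first inferential step: the standard error of the mean indicates how
# precisely the sample mean estimates the (unknown) population mean.
standard_error = sample_sd / math.sqrt(len(sample_kwh))

print(f"mean = {sample_mean:.1f} kWh, sd = {sample_sd:.1f} kWh, "
      f"standard error = {standard_error:.1f} kWh")
```

The mean and standard deviation describe only these eight observations, whereas the standard error is a statement about how far the sample mean may lie from the population mean.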
The concept of correlation is particularly noteworthy. Statistical analysis of a data set may reveal that two variables (that is, two properties of the population under consideration) tend to vary together, as if they are connected. For example, a study of annual income and age of death among people might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated. However, one cannot immediately infer the existence of a causal relationship between the two variables [26]. If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole. A major problem lies in determining the extent to which the chosen sample is representative. Statistics offers
methods to estimate and correct for randomness in the sample and in the data collection procedure, as well as methods for designing robust experiments in the first place [27]. The fundamental mathematical concept employed in understanding such randomness is probability. Mathematical statistics (also called statistical theory) is the branch of applied mathematics that uses probability theory and analysis to examine the theoretical basis of statistics. The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method. Misuse of statistics can produce subtle but serious errors in description and interpretation — subtle in that even experienced professionals sometimes make such errors, and serious in that they may affect social policy, medical practice and the reliability of structures such as bridges and nuclear power plants. Even when statistics is correctly applied, the results can be difficult to interpret for a nonexpert. For example, the statistical significance of a trend in the data — which measures the extent to which the trend could be caused by random variation in the sample — may not agree with one’s intuitive sense of its significance. The set of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as statistical literacy.
3.1.5
Statistical methods
Some statistical methods are discussed in the following subsections. They include experimental and observational studies, levels of measurement, and statistical techniques.
3.1.5.1 Experimental and observational studies
A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on a response or dependent variable. There are two major types of causal statistical studies, experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types is in how the study is actually conducted. Each can be very effective.
An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation may have modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and the response are investigated.
An example of an experimental study is the famous Hawthorne studies, which attempted to test changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured productivity in the plant, then modified the illumination in an area of the plant to see if changes in illumination would affect productivity. Due to errors in experimental procedures, specifically the lack of a control group and of blinding, the researchers were unable to draw the conclusions they had planned; the confounding influence of being observed is now known as the Hawthorne effect [28].
An example of an observational study is a study which explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers and then look at the number of cases of lung cancer in each group.
The basic steps for an experiment are to:
i) plan the research including determining information sources, research subject selection, and ethical considerations for the proposed research and method
ii) design the experiment concentrating on the system model and the interaction of independent and dependent variables
iii) summarize a collection of observations to feature their commonality by suppressing details (descriptive statistics)
iv) reach consensus about what the observations tell about the world that is observed (statistical inference)
v) document and present the results of the study
3.1.5.2 Levels of measurement
There are four types of measurements or measurement scales used in statistics. The four types or levels of measurement, which are nominal, ordinal, interval, and ratio, have different degrees of usefulness in statistical research. Ratio measurements, where both a zero value and distances between different measurements are defined, provide the greatest flexibility in statistical methods that can be used for analyzing the data. Interval measurements have meaningful distances between measurements but no meaningful zero value (such as IQ measurements or temperature measurements in degrees Celsius). Ordinal measurements have imprecise differences between consecutive values but a meaningful order to those values. Nominal measurements have no meaningful rank order among values.
3.2
Statistical techniques
Some well known statistical tests and procedures for research observations are:
i) Student’s t-test
ii) chi-square
iii) analysis of variance (ANOVA)
iv) Mann-Whitney U
v) regression analysis
vi) correlation
a. Pearson product-moment correlation coefficient
b. Spearman’s rank correlation coefficient
vii) Fisher’s Least Significant Difference test
3.2.1
Specialized disciplines
Some sciences use applied statistics so extensively that they have specialized terminology. These disciplines include:
i) Actuarial science
ii) Biostatistics
iii) Business statistics
iv) Data mining (apply statistics & pattern recognition to obtain knowledge from data)
v) Economic statistics (Econometrics)
vi) Engineering statistics
vii) Statistical physics
viii) Demography
ix) Psychological statistics
x) Social statistics (for all the social sciences)
xi) Statistical literacy
xii) Statistical surveys
xiii) Process analysis & chemometrics (for analysis of data from analytical chemistry)
xiv) Reliability engineering
xv) Image processing
xvi) Statistics in various sports, particularly baseball and cricket
Statistics is a key tool in business and manufacturing as well. It is used to understand measurement system variability, to control processes (as in statistical process control or SPC), to summarize data, and to make data-driven decisions. In these roles it is a key tool, and perhaps the only reliable tool. In this research, Engineering Statistics and Econometrics played an important role in the stratification process and in deciding the sample size. These processes are discussed further in Chapter 4.
3.2.2
Criticism
There is a general perception that statistical knowledge is all too frequently intentionally misused, by finding ways to interpret the data that are favorable to the presenter. A famous quote, variously attributed, but thought to be from Benjamin Disraeli [29] is: “There are three types of lies - lies, damn lies, and statistics.” Indeed, the well-known book How to Lie with Statistics by Darrell Huff [30] discusses many cases of deceptive uses of statistics, focusing on misleading graphs. By choosing (or rejecting, or modifying) a certain sample, results can be manipulated; throwing out outliers is one means of doing so. This may be the result of outright fraud or of subtle and unintentional bias on the part of the researcher.
As further studies contradict previously announced results, people may become wary of trusting such studies. One might read a study that says (for example) “do X to reduce high blood pressure”, followed by a study that says “doing X does not affect high blood pressure”, followed by a study that says “doing X actually worsens high blood pressure”. Often the studies were conducted on different groups with different protocols, or a small-sample study that promised intriguing results has not held up to further scrutiny in a large-sample study. However, many readers may not have noticed these distinctions, or the media may have oversimplified this vital contextual information, and the public’s distrust of statistics is thereby increased.
Deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required by law or regulation, forces one hypothesis to be ‘favored’ (the null hypothesis), and can also seem to exaggerate the importance of minor differences in large studies. A difference that is highly statistically significant can still be of no practical significance.
In the fields of psychology and medicine, especially with regard to the approval of new drug treatments by the Food and Drug Administration, criticism of the hypothesis testing approach has increased in recent years. One response has been a greater emphasis on the p-value over simply reporting whether or not a hypothesis was rejected at the given level of significance α. Here again, however, this summarizes the evidence for an effect but not the size of the effect. One increasingly common approach is to report confidence intervals (CI) instead, since these indicate both the size of the effect and the uncertainty surrounding it. This aids in interpreting the results, as the CI for a given α simultaneously indicates both statistical significance and effect size. Note that both the p-value and CI approaches are based on the same fundamental calculations as those entering into the corresponding hypothesis test. The results are stated in a more detailed format, rather than the yes-or-no finality of the hypothesis test, but use the same underlying statistical methodology.
A truly different approach is to use Bayesian methods. This approach has been criticized as well, however. The strong desire to see good drugs approved and the desire to see harmful or useless ones restricted remain in tension (Type I and Type II errors in the language of hypothesis testing).
Abelson [31] makes the case that statistics serves as a standardized means of settling arguments between scientists who could otherwise each argue the merits of their own cases ad infinitum. Statistics is, in this view, a form of rhetoric. This can be viewed as a positive or a negative, but as with any means of settling a dispute, statistical methods can succeed only so long as both sides accept the approach and agree on the particular method to be used.
3.3
Normal Distribution
There are many probability distribution functions. Table 2 lists some of them according to discrete/continuous and univariate/multivariate classifications. For this research, however, the normal probability distribution is of particular interest, because the research deals with consumption data from TNB databases, which contain consumption data for all electricity consumers in Peninsular Malaysia. The consumption pattern is expected to follow a normal distribution because of the large number of consumers, each with a random consumption demand.

Table 2: Probability Distributions

Discrete, univariate: Benford • Bernoulli • binomial • Boltzmann • categorical • compound Poisson • discrete phase-type • degenerate • Gauss-Kuzmin • geometric • hypergeometric • logarithmic • negative binomial • parabolic fractal • Poisson • Rademacher • Skellam • uniform • Yule-Simon • zeta • Zipf • Zipf-Mandelbrot
Discrete, multivariate: Ewens • multinomial • multivariate Polya
Continuous, univariate: Beta • Beta prime • Cauchy • chi-square • Dirac delta function • Coxian • Erlang • exponential • exponential power • F • fading • Fisher’s z • Fisher-Tippett • Gamma • generalized extreme value • generalized hyperbolic • generalized inverse Gaussian • Half-Logistic • Hotelling’s T-square • hyperbolic secant • hyper-exponential • hypoexponential • inverse chi-square (scaled inverse chi-square) • inverse Gaussian • inverse gamma (scaled inverse gamma) • Kumaraswamy • Landau • Laplace • Lévy • Lévy skew alpha-stable • logistic • log-normal • Maxwell-Boltzmann • Maxwell speed • Nakagami • normal (Gaussian) • normal-gamma • normal inverse Gaussian • Pareto • Pearson • phase-type • polar • raised cosine • Rayleigh • relativistic Breit-Wigner • Rice • shifted Gompertz • Student’s t • triangular • type-1 Gumbel • type-2 Gumbel • uniform • Variance-Gamma • Voigt • von Mises • Weibull • Wigner semicircle • Wilks’ lambda
Continuous, multivariate: Dirichlet • inverse-Wishart • Kent • matrix normal • multivariate normal • multivariate Student • von Mises-Fisher • Wigner quasi • Wishart
Miscellaneous: Cantor • conditional • equilibrium • exponential family • infinitely divisible • location-scale family • marginal • maximum entropy • posterior • prior • quasi • sampling • singular
The normal distribution, also called Gaussian distribution (named after Carl Friedrich Gauss, a German mathematician, although Gauss was not the first to work with it), is an extremely important probability distribution in many fields. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean (“average”) and standard deviation (“variability”), respectively. The standard normal distribution is the normal distribution with a mean of zero and a variance of one, as illustrated in the green curve in the plots in Figure 5. It is often called the bell curve because the graph of its probability density resembles a bell.
Figure 5: PDF of Gaussian distribution (bell curve).
3.3.1
Overview
The fundamental importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due to the central limit theorem (see Section 3.3.11 for explanation). A variety of psychological test scores and physical phenomena like photon counts can be well approximated by a normal distribution. While the mechanisms underlying these phenomena are often unknown, the use of the normal model can be theoretically justified if one assumes many small (independent) effects contribute to each observation in an additive fashion.
The normal distribution also arises in many areas of statistics: for example, the sampling distribution of the mean is approximately normal, even if the distribution of the population the sample is taken from is not normal. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.
3.3.2
History
The normal distribution was first introduced by Abraham de Moivre in an article in 1733 (reprinted in the second edition of his The Doctrine of Chances, 1738) in the context of approximating certain binomial distributions for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre-Laplace. Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.
The name “bell curve” goes back to Jouffret, who first used the term “bell surface” in 1872 for a bivariate normal with independent components. The name “normal distribution” was coined independently by Charles S. Peirce, Francis Galton, and Wilhelm Lexis around 1875. This terminology is unfortunate, since it reflects and encourages the fallacy that many or all probability distributions are “normal” (see Section 3.3.16 for explanation). That the distribution is called the Gaussian distribution is an instance of Stigler’s law of eponymy: “No scientific discovery is named after its original discoverer.”
3.3.3
Characterization of the normal distribution
There are various ways to characterize a probability distribution. The most visual is the PDF, which represents how likely each value of the random variable is. The CDF is a conceptually cleaner and less cluttered way to specify the same information, but to the untrained eye its plot is much less informative. Equivalent ways to specify the normal distribution are [32]: the moments, the cumulants, the moment generating function (see Section 3.3.6.1 for explanation), the cumulant generating function (see Section 3.3.6.2 for explanation), the characteristic function (see Section 3.3.6.3 for explanation), and Maxwell’s theorem. Some of these are very useful for theoretical work, but not intuitive.
3.3.4
Probability Density Function
The probability density function of the normal distribution with mean $\mu$ and variance $\sigma^2$ (equivalently, standard deviation $\sigma$) is an example of a Gaussian function,

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)$$ (Equation 4)

where

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$ (Equation 5)

is the density function of the “standard” normal distribution, i.e., the normal distribution with $\mu = 0$ and $\sigma = 1$.
As a Gaussian function with the denominator of the exponent equal to two, the standard normal density function $\phi$ is an eigenfunction of the Fourier transform. To indicate that a random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$, we write

$$X \sim N(\mu, \sigma^2)$$
Some notable qualities of the normal distribution:
i) The density function is symmetric about its mean value.
ii) The mean is also its mode and median.
iii) 68.26894921371% of the area under the curve is within one standard deviation of the mean.
iv) 95.44997361036% of the area is within two standard deviations.
v) 99.73002039367% of the area is within three standard deviations.
vi) The inflection points of the curve occur at one standard deviation away from the mean.
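The density in Equations 4 and 5 can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names `phi` and `normal_pdf` are chosen here for illustration, and the parameter values (mean 12, standard deviation 3) are assumed example values.

```python
import math

def phi(x: float) -> float:
    """Standard normal density (Equation 5)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2), written in terms of phi as in Equation 4."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return phi((x - mu) / sigma) / sigma

# Illustrative parameters (assumed values, not from the consumer data).
mu, sigma = 12.0, 3.0
peak = normal_pdf(mu, mu, sigma)            # the density is highest at the mean
left = normal_pdf(mu - 1.5, mu, sigma)      # equal distances either side of
right = normal_pdf(mu + 1.5, mu, sigma)     # the mean give equal densities
print(peak, left, right)
```

The equal values of `left` and `right` reflect quality i) above, and the maximum at the mean reflects quality ii).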
3.3.5
Cumulative Distribution Function
Figure 6: CDF of Gaussian distribution
The CDF is defined as the probability that a variable X has a value less than or equal to x, and it is expressed in terms of the density function as

$$F(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(u-\mu)^2}{2\sigma^2}\right) du$$ (Equation 6)

The standard normal CDF, conventionally denoted $\Phi$, is just the general CDF evaluated with $\mu = 0$ and $\sigma = 1$,

$$\Phi(x) = F(x; 0, 1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2}\right) du$$ (Equation 7)

The standard normal CDF can be expressed in terms of a special function called the error function, as

$$\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right]$$ (Equation 8)

The inverse cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:

$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1)$$ (Equation 9)
This quantile function is sometimes called the probit function. There is no elementary primitive for the probit function. This is not to say merely that none is known, but rather that the non-existence of such a function has been proved.
Values of $\Phi(x)$ may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, or asymptotic series.
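As a sketch of how Equations 8 and 9 might be evaluated in practice, the code below computes the standard normal CDF from the error function and inverts it numerically. The bisection search is an illustrative choice made here because the Python `math` module provides `erf` but no inverse error function; it is not claimed to be the most efficient method. The standard library class `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function (Equation 8)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_quantile(p: float, tol: float = 1e-12) -> float:
    """Quantile (probit) function, found by bisection on the CDF.

    Illustrative only: it searches the interval [-10, 10], which covers
    all probabilities that can be distinguished from 0 and 1 here.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z_975 = std_normal_quantile(0.975)
print(z_975, NormalDist().inv_cdf(0.975))  # the two values should agree closely
```

The 97.5% quantile is the familiar value of about 1.96 used for two-sided 95% intervals.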
3.3.6
Generating functions
In mathematics, a generating function is a formal power series whose coefficients encode information about a sequence $a_n$ that is indexed by the natural numbers. There are various types of generating functions, including ordinary generating functions, exponential generating functions, Lambert series, Bell series, and Dirichlet series. The particular generating function that is most useful in a given context will depend upon the nature of the sequence and the details of the problem being addressed. Generating functions are often expressed in closed form as functions of a formal argument x. Sometimes a generating function is evaluated at a specific value of x. However, it must be remembered that generating functions are formal power series, and they will not necessarily converge for all values of x.
3.3.6.1 Moment generating function
The moment generating function is defined as the expected value of exp(tX). For a normal distribution, completing the square in the exponent, it can be shown that

$$M_X(t) = E[\exp(tX)] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \exp(tx)\,dx = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$$ (Equation 10)
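The completing-the-square step can be written out explicitly. The following is a sketch of that step, using only the symbols already defined ($\mu$, $\sigma$, and $t$):

```latex
\begin{align*}
tx - \frac{(x-\mu)^2}{2\sigma^2}
  &= -\frac{\bigl(x - (\mu + \sigma^2 t)\bigr)^2}{2\sigma^2}
     + \mu t + \frac{\sigma^2 t^2}{2}, \\
M_X(t)
  &= \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right)
     \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\!\left(-\frac{\bigl(x - (\mu + \sigma^2 t)\bigr)^2}{2\sigma^2}\right) dx
   = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right).
\end{align*}
```

The remaining integral equals one because its integrand is the density of a normal distribution with mean $\mu + \sigma^2 t$ and variance $\sigma^2$.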
3.3.6.2 Cumulant generating function
The cumulant generating function is the logarithm of the moment generating function:

$$g(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2$$

The derivative of the cumulant generating function is simply:

$$g'(t) = \mu + \sigma^2 t$$
3.3.6.3 Characteristic function
The characteristic function is defined as the expected value of exp(itX), where i is the imaginary unit. For a normal distribution, the characteristic function is

$$M_X(t; \mu, \sigma) = E[\exp(itX)] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \exp(itx)\,dx = \exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right)$$ (Equation 11)

The characteristic function is obtained by replacing t with it in the moment-generating function.
3.3.7
Properties
Some of the properties of the normal distribution:
1. If $X \sim N(\mu, \sigma^2)$ and a and b are real numbers, then $aX + b \sim N(a\mu + b, (a\sigma)^2)$.
2. If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent normal random variables, then:
• Their sum is normally distributed with $U = X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$
• Their difference is normally distributed with $V = X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$
• U and V are independent of each other when the two variances are equal ($\sigma_X^2 = \sigma_Y^2$)
3. If $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$ are independent normal random variables, then:
• Their product XY follows a distribution with density p given by $p(z) = \frac{1}{\pi\sigma_X\sigma_Y} K_0\left(\frac{|z|}{\sigma_X\sigma_Y}\right)$, where $K_0$ is a modified Bessel function of the second kind
• Their ratio follows a Cauchy distribution with $X/Y \sim \mathrm{Cauchy}(0, \sigma_X/\sigma_Y)$
4. If $X_1, \ldots, X_n$ are independent standard normal variables, then $X_1^2 + \ldots + X_n^2$ has a chi-square distribution with n degrees of freedom.
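Properties 1 and 2 can be illustrated with the `NormalDist` class of the Python standard library (Python 3.8 or later), which supports scaling and shifting by constants and the addition and subtraction of independent normal variables. The parameter values below are assumptions chosen to give round numbers; they do not describe any real consumer group.

```python
from statistics import NormalDist

# Two independent normal variables with assumed, illustrative parameters.
X = NormalDist(mu=300.0, sigma=80.0)
Y = NormalDist(mu=120.0, sigma=60.0)

# Property 1: aX + b is normal with mean a*mu + b and standard deviation |a|*sigma.
scaled = X * 2.0 + 10.0

# Property 2: the sum and the difference of independent normal variables are
# normal; the means add or subtract, while the variances add in both cases.
U = X + Y
V = X - Y

print(scaled.mean, scaled.stdev)  # mean 610, standard deviation 160
print(U.mean, U.stdev)            # mean 420, standard deviation 100
print(V.mean, V.stdev)            # mean 180, standard deviation 100
```

Note that the standard deviation of both the sum and the difference is the square root of 80² + 60², that is 100, rather than the sum or difference of the two standard deviations.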
3.3.8
Standardizing Normal Random Variables
As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal. If $X \sim N(\mu, \sigma^2)$, then

$$Z = \frac{X - \mu}{\sigma}$$ (Equation 12)

is a standard normal random variable: $Z \sim N(0, 1)$. An important consequence is that the CDF of a general normal distribution is therefore

$$\Pr(X \le x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$ (Equation 13)

Conversely, if $Z \sim N(0, 1)$, then

$$X = \sigma Z + \mu$$ (Equation 14)

is a normal random variable with mean $\mu$ and variance $\sigma^2$.
The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the CDF of the standard normal distribution to find values of the CDF of a general normal distribution.
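The standardization in Equations 12 to 14 can be sketched as follows. The mean of 300 and standard deviation of 80 are hypothetical values used only to make the arithmetic easy to follow, and the function names are chosen here for illustration.

```python
import math

def standardize(x: float, mu: float, sigma: float) -> float:
    """Equation 12: convert a value of X into a standard normal score z."""
    return (x - mu) / sigma

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Equation 13: Pr(X <= x) for X ~ N(mu, sigma^2)."""
    z = standardize(x, mu, sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def unstandardize(z: float, mu: float, sigma: float) -> float:
    """Equation 14: convert a standard normal score back to the scale of X."""
    return sigma * z + mu

# Hypothetical example: X ~ N(300, 80^2).
mu, sigma = 300.0, 80.0
z = standardize(380.0, mu, sigma)       # one standard deviation above the mean
prob = normal_cdf(380.0, mu, sigma)     # the same value a table gives for z = 1
print(z, prob, unstandardize(z, mu, sigma))
```

The probability obtained, about 0.8413, is the tabulated value of the standard normal CDF at z = 1, which is why a single table suffices for every normal distribution.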
3.3.9
Moments
Some of the first few moments of the normal distribution are shown in Table 3. All cumulants of the normal distribution beyond the second cumulant are zero.

Table 3: Some of the first few moments of the normal distribution

Number | Raw moment | Central moment | Cumulant
0 | $1$ | $1$ | -
1 | $\mu$ | $0$ | $\mu$
2 | $\mu^2 + \sigma^2$ | $\sigma^2$ | $\sigma^2$
3 | $\mu^3 + 3\mu\sigma^2$ | $0$ | $0$
4 | $\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$ | $3\sigma^4$ | $0$
3.3.10 Generating Values for Normal Random Variables
For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods and the most basic is to invert the standard normal CDF. More efficient methods are also known, one such method being the Box-Muller transform. An even faster algorithm is the ziggurat algorithm. The Box-Muller algorithm says that, if you have two independent numbers a and b uniformly distributed on (0, 1] (e.g. the output from a random number generator), then a standard normally distributed random variable is c, where

$$c = \sqrt{-2 \ln a}\,\cos(2\pi b)$$ (Equation 15)

This is a consequence of the fact that the chi-square distribution with two degrees of freedom (see property 4 above) is an easily-generated exponential random variable.
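A minimal sketch of the Box-Muller transform in Equation 15 is given below. The seed, the number of draws, and the function name are arbitrary choices made for this illustration. Because Python's `random()` returns values in [0, 1), the code uses 1 − random() for a so that the logarithm is never taken of zero.

```python
import math
import random

def box_muller(rng: random.Random) -> float:
    """Return one standard normal value from two uniform values (Equation 15)."""
    a = 1.0 - rng.random()  # uniform on (0, 1], so log(a) is always defined
    b = rng.random()        # uniform on [0, 1)
    return math.sqrt(-2.0 * math.log(a)) * math.cos(2.0 * math.pi * b)

# Generate a sample and check that it behaves like a standard normal sample.
rng = random.Random(2007)   # fixed seed so that the run is repeatable
draws = [box_muller(rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, sample_var)  # expected to be close to 0 and 1
```

A value with mean μ and standard deviation σ can then be obtained as σc + μ, following Equation 14.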
3.3.11 The Central Limit Theorem
The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the central limit theorem.
Figure 7: Plot of the PDF of a normal distribution with $\mu = 12$ and $\sigma = 3$ approximating the PDF of a binomial distribution with n = 48 and p = 1/4
The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions. This is shown in Figure 7.
• A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if np and n(1 − p) are both at least 5; in this case, a continuity correction should be applied). The approximating normal distribution has mean $\mu = np$ and variance $\sigma^2 = np(1 - p)$.
• A Poisson distribution with parameter $\lambda$ is approximately normal for large $\lambda$. The approximating normal distribution has mean $\mu = \lambda$ and variance $\sigma^2 = \lambda$.
Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
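The binomial case of Figure 7 (n = 48 and p = 1/4) can be checked numerically. The sketch below compares the exact binomial probability of at most 12 successes with the normal approximation using a continuity correction; the choice of 12 as the cut-off and the function names are assumptions made for illustration.

```python
import math

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) for a binomial distribution with parameters n and p."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Pr(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

n, p = 48, 0.25
mu = n * p                               # mean of the approximating normal: 12
sigma = math.sqrt(n * p * (1.0 - p))     # its standard deviation: 3

exact = binomial_cdf(12, n, p)
# Continuity correction: the integer 12 is treated as the interval up to 12.5.
approx = normal_cdf(12.5, mu, sigma)
print(exact, approx, abs(exact - approx))
```

Repeating the comparison for cut-offs far from the mean gives a direct impression of the loss of accuracy in the tails.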
3.3.12 Infinite divisibility
The normal distributions are infinitely divisible probability distributions. To say that a probability distribution F on the real line is infinitely divisible means that if X is any random variable whose distribution is F, then for every positive integer n there exist n independent identically distributed random variables X1, ..., Xn whose sum is equal in distribution to X. This simplifies the stratification process because it is possible to divide this distribution into any number of strata, which begin and end anywhere within the distribution range.
3.3.13 Stability
The normal distributions are strictly stable probability distributions. In statistics, the stability of a family of probability distributions is an important property which essentially states that if one has a number of random variates that are ‘in the family’, any linear combination of these variates will also be ‘in the family’. The importance of a stable family of probability distributions is that they serve as ‘attractors’ for linear combinations of non-stable random variates. By the classical central limit theorem the linear sum of a set of random variates, each with finite variance, will tend towards a normal distribution as the number of variates increases. Thus, in this research, although individual consumers may be of random consumption size, in aggregate, they tend to form a normal distribution.
3.3.14 Standard deviation
In Figure 8, the dark blue region lies within one standard deviation of the mean. For the normal distribution, this accounts for about 68% of the set, while two standard deviations from the mean (blue and brown) account for about 95% and three standard deviations (blue, brown, and green) account for about 99.7%.
Figure 8: Standard Deviation Segment Sizes in Standard Normal Distribution.
In practice, one often assumes that data are from an approximately normally distributed population. If that assumption is justified, then about 68% of the values are within 1 standard deviation of the mean, about 95% of the values are within two standard deviations, and about 99.7% lie within 3 standard deviations. This is known as the “68-95-99.7 rule” or the “Empirical Rule”. To be more precise, the area under the curve between μ − nσ and μ + nσ is
$$\operatorname{erf}\left(\frac{n}{\sqrt{2}}\right)$$ (Equation 16)
where erf(x) is the error function. To six decimal places the values of the 1, 2, and 3 sigma points are 0.682689..., 0.954500..., 0.997300... respectively.
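Equation 16 can be evaluated directly with the error function in the Python standard library. The short sketch below reproduces the three values quoted above; the function name is an assumption made for illustration.

```python
import math


def area_within(n_sigma):
    """Area under the normal curve within n_sigma standard deviations of the mean."""
    return math.erf(n_sigma / math.sqrt(2))


for n_sigma in (1, 2, 3):
    print(n_sigma, f"{area_within(n_sigma):.6f}")
```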
3.3.15 Normality tests Normality tests check a given set of data for similarity to the normal distribution. The null hypothesis is that the data set is similar to the normal distribution; therefore a sufficiently small P-value indicates non-normal data.
i) Kolmogorov-Smirnov test
ii) Lilliefors test
iii) Anderson-Darling test
iv) Ryan-Joiner test
v) Shapiro-Wilk test
vi) normal probability plot (rankit plot)
vii) Jarque-Bera test
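As an illustration of one test from the list, the Jarque-Bera statistic can be computed from the sample skewness and kurtosis. The sketch below is written in plain Python under stated assumptions: the P-value uses the asymptotic chi-square distribution with two degrees of freedom, which is only reliable for large samples, and the simulated data are hypothetical stand-ins rather than consumer records.

```python
import math
import random


def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic P-value."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2**1.5
    kurtosis = m4 / m2**2
    jb = n / 6 * (skewness**2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-jb / 2)
    return jb, p_value


rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

jb_normal, p_normal = jarque_bera(normal_sample)
jb_skewed, p_skewed = jarque_bera(skewed_sample)
print(f"normal sample: JB={jb_normal:.2f}, p={p_normal:.3f}")
print(f"skewed sample: JB={jb_skewed:.2f}, p={p_skewed:.3g}")
```

A very small P-value, as obtained for the skewed sample, is evidence against normality; a large P-value does not prove normality, it only fails to reject it.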
3.3.16 Occurrence Approximately normal distributions occur in many situations, as a result of the central limit theorem. When there is reason to suspect the presence of a large number of small effects acting additively and independently, it is reasonable to assume that observations will be normal.
This is why the consumption data for the electricity consumers of
Peninsular Malaysia are taken to be normally distributed: there are a very large number of consumers who, individually, each consume a very small amount of electricity as compared to the total amount of electricity sold to all consumers. There are statistical methods to empirically test that assumption, for example the Kolmogorov-Smirnov test. Effects can also act as multiplicative (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal. Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting marginal distributions are indeed normal. The full distribution will be a superposition of normal variables, which is not in general normal. This is related to the theory of errors (see Section 3.3.17 for explanation). To summarize, this is a list of situations where approximate normality is sometimes assumed.
• In counting problems (so the central limit theorem includes a discrete-to-continuum approximation) where reproductive random variables are involved, such as
  o Binomial random variables, associated to yes/no questions;
  o Poisson random variables, associated to rare events;
• In physiological measurements of biological specimens:
  o The logarithm of measures of size of living tissue (length, height, skin area, weight);
  o The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
  o Other physiological measures may be normally distributed, but there is no reason to expect that a priori;
• Measurement errors are often assumed to be normally distributed, and any deviation from normality is considered something which should be explained;
• Financial variables
  o Changes in the logarithm of exchange rates, price indices, and stock market indices; these variables behave like compound interest, not like simple interest, and so are multiplicative;
  o Other financial variables may be normally distributed, but there is no reason to expect that a priori;
• Light intensity
  o The intensity of laser light is normally distributed;
  o Thermal light has a Bose-Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.
Of relevance to biology and economics is the fact that complex systems tend to display power laws rather than normality.
3.3.17 Measurement errors Normality is the central assumption of the mathematical theory of errors. Similarly, in statistical model-fitting, an indicator of goodness of fit is that the residuals (as the errors are called in that setting) be independent and normally distributed. The assumption is that any deviation from normality needs to be explained. In that sense, both in model-fitting and in the theory of errors, normality is the only observation that need not be explained, being expected. However, if the original data are not normally distributed (for instance if they follow a Cauchy distribution), then the residuals will also not be normally distributed. This fact is usually ignored in practice.
Repeated measurements of the same quantity are expected to yield results which are clustered around a particular value. If all major sources of errors have been taken into account, it is assumed that the remaining error must be the result of a large number of very small additive effects, and hence normal. Deviations from normality are interpreted as indications of systematic errors which have not been taken into account. Whether this assumption is valid is debatable.
3.3.18 Physical characteristics of biological specimens The sizes of full-grown animals are approximately lognormal. The evidence and an explanation based on models of growth was first published in the 1932 book Problems of
Relative Growth by Julian Huxley. However, in the case of human height for example, there are people several standard deviations away from the average who would almost certainly not exist at all among the whole population of the world if height followed a true lognormal distribution. Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the distribution of sizes deviate from lognormality. The assumption that linear size of biological specimens is normal (rather than lognormal) leads to a non-normal distribution of weight (since weight or volume is roughly proportional to the 2nd or 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no a priori reason why one of length, or body mass, and not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers so the “problem” goes away if lognormality is assumed. On the other hand, there are some biological measures where normality is assumed, such as blood pressure of adult humans. This is supposed to be normally distributed, but only
after separating males and females into different populations (each of which is normally distributed).
3.3.19 Financial variables Because of the exponential nature of inflation, financial indicators such as stock values or commodity prices make good examples of multiplicative behavior. As such, periodic changes in them (for example, yearly changes) should not be expected to be normal, but perhaps lognormal. This was the theory proposed in 1900 by Louis Bachelier. However, Benoît Mandelbrot, the popularizer of fractals, showed that even the assumption of lognormality is flawed: the changes in logarithm over short periods (such as a day) are approximated well by distributions that do not have a finite variance, and therefore the central limit theorem does not apply. Rather, the sum of many such changes gives log-Lévy distributions.
3.3.20 Distribution in testing and intelligence A great deal of confusion exists over whether or not IQ test scores and intelligence are normally distributed. As a deliberate result of test construction, IQ scores are normally distributed for the majority of the population. But intelligence cannot be said to be normally distributed, simply because it is not a number. The difficulty and number of questions on an IQ test is decided based on which combinations will yield a normal distribution. This does not mean, however, that the information is in any way being misrepresented, or that there is any kind of “true” distribution that is being artificially forced into the shape of a normal curve. Intelligence tests can be constructed to yield any kind of score distribution desired.
3.3.21 Numerical approximations of the normal distribution The normal distribution is widely used in scientific and statistical computing. Therefore, it has been implemented in various ways. The GNU Scientific Library calculates values of the standard normal CDF using piecewise approximations by rational functions. Another approximation method uses third-degree polynomials on intervals.
3.4 Confidence Interval
In statistics, a CI helps describe how reliable survey results are. All other things being equal, a survey result with a small CI is more reliable than a result with a large CI. More precisely, a CI for a population parameter is an interval with an associated probability p that is generated from a random sample of an underlying population such that if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question. Confidence intervals are the most prevalent form of interval estimation. In Figure 9, the bars represent observation means and the red lines represent the CIs surrounding them. The difference between the two populations on the left is significant at the level corresponding to the confidence probability because the intervals do not overlap.
Figure 9: Histogram with CIs
It must be noted that a confidence interval is not in general equivalent to a (Bayesian) credible interval. The common error of equating the two is known as the prosecutor’s fallacy. If U and V are statistics (i.e., observable random variables) whose probability distribution depends on some unobservable parameter θ, and
Pr(U < θ < V | θ) = x (Equation 17)
where x is a number between 0 and 1, then the random interval (U, V) is a “(100·x)% confidence interval for θ”. The number x (or 100·x%) is called the confidence level or confidence coefficient. In modern applied practice, most confidence intervals are stated at the 95% level [33].
3.4.1 Practical Example
A machine fills cups with margarine, and is supposed to be adjusted so that the mean content of the cups is close to 250 grams of margarine. Of course it is not possible to fill every cup with exactly 250 grams of margarine. Hence the weight of the filling can be considered to be a random variable X. The distribution of X is assumed here to be a normal distribution with unknown expectation μ and (for the sake of simplicity) known standard deviation σ = 2.5 grams. To check if the machine is adequately adjusted, a sample of n = 25 cups of margarine is chosen at random and the cups weighed. The weights of margarine are X1,…, X25, a random sample from X. To get an impression of the expectation μ, it is sufficient to give an estimate. The appropriate estimator is the sample mean:
$$\hat{\mu} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$ (Equation 18)
The sample shows actual weights x1,…, x25, with mean:
$$\bar{x} = \frac{1}{25} \sum_{i=1}^{25} x_i = 250.2 \text{ (grams)}$$ (Equation 19)
If we take another sample of 25 cups, we could easily expect to find values like 250.4 or 251.1 grams. A sample mean value of 280 grams however would be extremely rare if the mean content of the cups is in fact close to 250g. There is a whole interval around the observed value 250.2 of the sample mean within which, if the whole population mean actually takes a value in this range, the observed data would not be considered particularly unusual. Such an interval is called a confidence interval for the parameter μ. How is such an interval calculated? The endpoints of the interval have to be calculated from the sample, so they are statistics, functions of the sample X1,…,X25 and hence random variables themselves. In this example, the endpoints may be determined by considering that the sample mean $\bar{X}$ from a normally distributed sample is also normally distributed, with the same expectation μ, but with standard deviation $\sigma/\sqrt{n}$ = 0.5 (grams). By standardizing, the random variable Z is obtained:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu}{0.5}$$ (Equation 20)
dependent on μ, but with a standard normal distribution independent of the parameter μ to be estimated. Hence it is possible to find numbers −z and z, independent of μ, where Z lies in between with probability 1 − α, a measure of how confident we want to be. Taking 1 − α = 0.95,
$$P(-z \le Z \le z) = 1 - \alpha = 0.95$$ (Equation 21)
The number z follows from:
$$\Phi(z) = P(Z \le z) = 1 - \frac{\alpha}{2} = 0.975, \qquad z = \Phi^{-1}(\Phi(z)) = \Phi^{-1}(0.975) = 1.96$$ (Equation 22)
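and:
$$\begin{aligned}
0.95 = 1 - \alpha &= P(-z \le Z \le z) = P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) \\
&= P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(\bar{X} - 1.96 \times 0.5 \le \mu \le \bar{X} + 1.96 \times 0.5\right) \\
&= P\left(\bar{X} - 0.98 \le \mu \le \bar{X} + 0.98\right)
\end{aligned}$$ (Equation 23)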
This might be interpreted as: with probability 0.95 one will find the parameter μ between the stochastic endpoints $\bar{X} - 0.98$ and $\bar{X} + 0.98$. Every time the measurements are repeated, there will be another value for the mean $\bar{X}$ of the sample. In 95% of the cases μ will be between the endpoints calculated from this mean, but in 5% of the cases it will not be. The actual CI is calculated by entering the measured weights in the formula. This 0.95 CI becomes:
$$(\bar{x} - 0.98 \; ; \; \bar{x} + 0.98) = (250.2 - 0.98 \; ; \; 250.2 + 0.98) = (249.22 \; ; \; 251.18)$$
This interval has fixed endpoints, where μ might be in between (or not). There is no probability of such an event. It cannot be said: “with probability 1 − α the parameter μ lies in the CI.” It is only known that by repetition in 100(1 − α) % of the cases μ will be in the calculated interval. In 100α % of the cases however it doesn’t. And unfortunately it is not known in which of the cases this happens. That is why it is said: “with confidence level 100(1 − α) % μ lies in the confidence interval.”
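The arithmetic of the margarine example can be reproduced in a few lines. The sketch below uses the figures given in the text (sample mean 250.2 grams, σ = 2.5 grams, n = 25); the function name is a hypothetical choice for illustration.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the standard deviation is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low, high = z_interval(sample_mean=250.2, sigma=2.5, n=25)
print(round(low, 2), round(high, 2))
```

Rounded to two decimals, the endpoints agree with the interval (249.22 ; 251.18) obtained above.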
Figure 10: Figure shows 50 realizations of a confidence interval for μ
Observing the sample amounts to selecting one interval from the population of all realizations. Beforehand, the probability is 95% that the interval chosen contains the parameter. After realization, all that is obtained is the chosen interval. As seen from Figure 10, there was a fair chance of choosing an interval containing μ; however, if unlucky, the wrong one may have been picked. In other words, if one were to make a large number of sets of measurements, and calculate the confidence interval each time, one would expect (on the average) such intervals to include the mean the selected percentage of times, say about 95 out of each 100 times for a 95% CI. Similarly, for a 90% CI, such intervals would include the mean about 90 out of each 100 times [34].
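The repeated-sampling interpretation can be checked by simulation. The following sketch assumes a true mean of 250 grams together with the σ = 2.5 grams and n = 25 of the margarine example; the true mean, the seed and the function name are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


coverage_95 = simulated_coverage(mu=250.0, sigma=2.5, n=25, confidence=0.95, trials=4000)
print(f"share of intervals containing the true mean: {coverage_95:.3f}")
```

The share is close to 0.95, although any single realized interval either contains the true mean or does not.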
3.4.2 Theoretical Example
Suppose X1, ..., Xn are an independent sample from a normally distributed population with mean μ and variance σ². Let
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$
Then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$ (Equation 24)
has a Student’s t-distribution with n − 1 degrees of freedom. Note that the distribution of T does not depend on the values of the unobservable parameters μ and σ²; i.e., it is a pivotal quantity. If c is the 95th percentile of this distribution, then Pr(−c < T < c) = 0.9 (Note: “95th” and “0.9” are correct in the preceding expressions. There is a 5% chance that T will be less than −c and a 5% chance that it will be larger than +c. Thus, the probability that T will be between −c and +c is 90%). Consequently,
$$\Pr\left(\bar{X} - \frac{cS}{\sqrt{n}} < \mu < \bar{X} + \frac{cS}{\sqrt{n}}\right) = 0.9$$ (Equation 25)
and a theoretical (stochastic) 90% CI for μ is obtained.
After observing the sample, the values $\bar{x}$ for $\bar{X}$ and s for S are found, from which the CI
$$\left(\bar{x} - \frac{cs}{\sqrt{n}} \; ; \; \bar{x} + \frac{cs}{\sqrt{n}}\right)$$ (Equation 26)
is computed. This is an interval with fixed numbers as endpoints, of which it can no longer be said that there is a certain probability that it contains the parameter μ. Either μ is in this interval or it is not.
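A t-based interval of this kind can be sketched without external libraries by obtaining the percentile c numerically. In the sketch below, the Student's t CDF is integrated with Simpson's rule and inverted by bisection; this numerical route, the function names and the ten sample weights are all assumptions made for illustration, not part of the original analysis.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, x], using the symmetry of the density."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Percentile of the t-distribution found by bisection on the CDF."""
    lo, hi = -100.0, 100.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(sample, confidence=0.90):
    """Confidence interval for the mean with unknown standard deviation."""
    n = len(sample)
    c = t_quantile(1 - (1 - confidence) / 2, n - 1)
    x_bar, s = fmean(sample), stdev(sample)
    half_width = c * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical weights in grams, for illustration only.
weights = [249.1, 251.3, 250.4, 248.7, 252.0, 250.9, 249.8, 250.2, 251.1, 249.5]
print(t_interval(weights, confidence=0.90))
```

With confidence = 0.90 the value c is the 95th percentile of the t-distribution with n − 1 degrees of freedom, matching the note on "95th" and "0.9" above.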
3.4.3 Interpretations of Confidence Intervals
Confidence levels are typically given alongside statistics resulting from sampling. In a statement: “we are 90% confident that between 35% and 45% of voters favor Candidate A”, 90% is the confidence level and 35%-45% is the confidence interval.
This statement is often misunderstood in the following way. Capital letters U and V are used for random variables; it is conventional to use lower-case letters u and v for their observed values in a particular instance. The misunderstanding is the conclusion that Pr (u < θ < v) = 0.9 so that after the data has been observed, a conditional probability distribution of θ, given the data, is inferred. For example, suppose X is normally distributed with expected value θ and variance 1. (It is grossly unrealistic to take the variance to be known while the expected value must be inferred from the data, but it makes the example simple.) The random variable X is observable. (The random variable X − θ is not observable, since its value depends on θ.) Then X − θ is normally distributed with expectation 0 and variance 1; therefore
Pr (−1.645 < X − θ < 1.645) = 0.9 (Equation 27)
Consequently
Pr (X − 1.645 < θ < X + 1.645) = 0.9 (Equation 28)
so the interval from X − 1.645 to X + 1.645 is a 90% CI for θ. But when X = 82 is observed, can it then be said that
Pr (82 − 1.645 < θ < 82 + 1.645) = 0.9 ? (Equation 29)
This conclusion does not follow from the laws of probability because θ is not a “random variable”; i.e., no probability distribution has been assigned to it. CIs are generally a frequentist method, i.e., employed by those who interpret “90% probability” as “occurring in 90% of all cases”. Suppose, for example, that θ is the mass of the planet Neptune, and the randomness in the measurement error means that 90% of the time the statement that the mass is between this number and that number will be correct. The mass is not what is random. Therefore, given that we have measured it to be 82 units, we cannot say that in 90% of all cases, the mass is between 82 − 1.645 and 82 + 1.645. There are no such cases; there is, after all, only one planet Neptune. But if probabilities are construed as degrees of belief rather than as relative frequencies of occurrence of random events, i.e., Bayesian probability rather than frequentism, can it then be said that one is 90% sure that the mass is between 82 − 1.645 and 82 + 1.645? Many answers to this question have been proposed, and are philosophically controversial. The
answer will not be a mathematical theorem, but a philosophical tenet. Less controversial are Bayesian credible intervals, in which one starts with a prior probability distribution of θ, and finds a posterior probability distribution, which is the conditional probability distribution of θ given the data. For users of frequentist methods, the explanation of a CI can amount to something like: “The CI represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level”.
Critics of frequentist methods suggest that this hides the real and, to the critics, incomprehensible frequentist interpretation which might be expressed as: “If the population parameter in fact lies within the CI, then the probability that the estimator either will be the estimate actually observed, or will be closer to the parameter, is less than or equal to 90%”. Users of Bayesian methods, if producing a CI, might by contrast
say “The degree of belief that the parameter is in fact in the CI is 90%”. Disagreements about these issues are not disagreements about solutions to mathematical problems. Rather they are disagreements about the ways in which mathematics is to be applied.
3.4.4 Confidence Intervals in Measurement
More concretely, the results of measurements are often accompanied by CIs. For instance, suppose a scale is known to yield the actual mass of an object plus a normally distributed random error with mean 0 and known standard deviation σ. If 100 objects of known mass are weighed on this scale and it reports the values ± σ, then it can be expected that around 68 of the reported ranges include the actual mass. If a smaller standard error value is required, then the measurement is repeated n times and the results are averaged. Then the 68.2% CI is ± σ/√n. For example, repeating the measurement 100 times reduces the confidence interval to 1/10 of the original width. Note that when a 68.2% CI (usually termed standard error) is reported as v ± σ, this does not mean that the true mass has a 68.2% chance of being in the reported range. In fact, the true mass is either in the range or not. How can a value outside the range be said to have
any chance of being in the range? Rather, this statement means that 68.2% of the ranges that are reported are likely to include the true mass. This is not just a trivial objection. Under the incorrect interpretation, each of the 100 measurements described above would be specifying a different range, and the true mass supposedly has a 68% chance of being in each and every range. Also, it supposedly has a 32% chance of being outside each and every range. If two of the ranges happen to be disjoint, the statements are obviously inconsistent. Say one range is 1 to 2, and the other is 2 to 3. Supposedly, the true mass has a 68% chance of being between 1 and 2, but only a 32% chance of being less than 2 or more than 3. The incorrect interpretation reads more into the statement than is meant. On the other hand, under the correct interpretation, each and every statement made is really true, because the statements are not about any specific range. It could report that one mass is 10.2 ± 0.1 grams, while really it is 10.6 grams, and not be lying. But if fewer than 1000 values are reported and more than two of them are that far off, there will have to be some explaining. It is also possible to estimate a CI without knowing the standard deviation of the random error. This is done using the T distribution, or by using non-parametric resampling methods such as the bootstrap, which do not require that the error have a normal distribution.
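The bootstrap mentioned above can be sketched as follows. This is a minimal percentile-bootstrap example using the standard library; the ten readings are hypothetical, and the percentile method is only one of several bootstrap variants.

```python
import random
from statistics import fmean


def bootstrap_ci(sample, confidence=0.95, resamples=5000, seed=0):
    """Percentile bootstrap confidence interval for the mean of a sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(sample, k=n)) for _ in range(resamples))
    alpha = 1 - confidence
    lower_index = int((alpha / 2) * resamples)
    upper_index = int((1 - alpha / 2) * resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical repeated readings of one object, in grams.
readings = [10.2, 10.1, 10.4, 10.3, 10.0, 10.2, 10.5, 10.1, 10.3, 10.2]
boot_low, boot_high = bootstrap_ci(readings)
print(boot_low, boot_high)
```

No normality assumption enters the calculation; the interval is derived entirely from the empirical distribution of the readings.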
3.4.5 Robust Confidence Intervals
In the process of weighing 1000 objects, under practical conditions, it is easy to believe that the operator might make a mistake in procedure and so report an incorrect mass (thereby making one type of systematic error). Suppose the operator has 100 objects and has weighed them all, one at a time, and repeated the whole process ten times. A sample standard deviation for each object can then be calculated, and outliers may be identified. Any object with an unusually large standard deviation probably has an outlier in its data. These can be removed by various non-parametric techniques. If the operator repeated the process only three times, he would simply take the median of the three measurements and use σ to give a confidence interval. The 200 extra weighings served only to detect and correct for operator error and did nothing to improve the CI. With more repetitions, the operator could use a truncated mean, discarding say the largest and smallest values and averaging the rest. Then a bootstrap calculation could be used to determine a CI that is narrower than that calculated from σ, and so obtain some benefit from a large amount of extra work.
These procedures are robust against procedural errors which are not modeled by the assumption that the balance has a fixed known standard deviation σ. In practical applications where the occasional operator error can occur, or the balance can malfunction, the assumptions behind simple statistical calculations cannot be taken for granted. Before trusting the results of 100 objects weighed just three times each to have CIs calculated from σ, it is necessary to test for and remove a reasonable number of outliers (testing the assumption that the operator is careful and correcting for the fact that he is not perfect), and to test the assumption that the data really have a normal distribution with standard deviation σ. The theoretical analysis of such an experiment is complicated, but it is easy to set up a spreadsheet which draws random numbers from a normal distribution with standard deviation σ to simulate the situation (use =norminv(rand(),0,σ)) [35].
After removing obvious outliers, one could subtract the median from the other two values for each object, and examine the distribution of the 200 resulting numbers. It should be normal with the mean near zero and standard deviation a little larger than σ. A simple Monte Carlo spreadsheet calculation would reveal typical values for the standard deviation (around 105% to 115% of σ). Or, one could subtract the mean of each triplet from the values, and examine the distribution of 300 values. The mean is identically zero, but the standard deviation should be somewhat smaller (around 75% to 85% of σ).
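The Monte Carlo calculation described for a spreadsheet can equally be sketched in Python. The example below assumes error-free normal measurements with σ = 1 and uses far more than 100 objects so that the simulated ratios are stable; the function name and the seed are illustrative assumptions.

```python
import random
from statistics import pstdev


def triplet_spreads(sigma=1.0, n_objects=20000, seed=0):
    """Simulate three weighings per object and return two spread ratios.

    The first ratio is the standard deviation of (value - median) for the two
    non-median values, divided by sigma; the second is the standard deviation
    of (value - triplet mean) for all three values, divided by sigma.
    """
    rng = random.Random(seed)
    from_median, from_mean = [], []
    for _ in range(n_objects):
        low, mid, high = sorted(rng.gauss(0.0, sigma) for _ in range(3))
        from_median += [low - mid, high - mid]
        mean = (low + mid + high) / 3
        from_mean += [low - mean, mid - mean, high - mean]
    return pstdev(from_median) / sigma, pstdev(from_mean) / sigma


ratio_median, ratio_mean = triplet_spreads()
print(f"relative to median: {ratio_median:.3f}, relative to mean: {ratio_mean:.3f}")
```

Both ratios fall inside the ranges quoted above. With only 100 objects, as in the text, individual runs scatter more widely around these values.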
3.4.6 Confidence Intervals for Proportions and Related Quantities
An approximate CI for a population mean can be constructed for random variables that are not normally distributed in the population, relying on the central limit theorem (see Section 3.3.11 for explanation), if the sample sizes and counts are big enough. The formulae are identical to the case above (where the sample mean is actually normally distributed about the population mean). The approximation will be quite good with only a few dozen observations in the sample if the probability distribution of the random variable is not too different from the normal distribution (e.g. its cumulative distribution function does not have any discontinuities and its skewness is moderate).
One type of sample mean is the mean of an indicator variable, which takes on the value 1 for true and the value 0 for false. (Statisticians often call indicator variables “dummy variables”, but that term is also frequently used by mathematicians for the concept of a bound variable.) The mean of such a variable is equal to the proportion that has the variable equal to one (both in the population and in any sample). Thus, the sample mean for a variable labeled MALE in data is just the proportion of sampled observations who have MALE = 1, i.e. the proportion who are male. This is a useful property of indicator variables, especially for hypothesis testing. To apply the central limit theorem, one must use a large enough sample. A rough rule of thumb is that one should see at least 5 cases in which the indicator is 1 and at least 5 in which it is 0.
Confidence intervals constructed using the above formulae may include negative numbers or numbers greater than 1, but proportions obviously cannot be negative or exceed 1. The probability assigned to negative numbers and numbers greater than 1 is usually small when the sample size is large and the proportion being estimated is not too close to 0 or 1. CIs for cases where the method above assigns a substantial probability to (−∞, 0) or to (1, ∞) may be constructed by inverting hypothesis tests. If hypothesis tests are conducted over the whole feasible range of parameter values, and the CI is taken to include every value for which a single hypothesis test would not reject the null hypothesis that the true value was that value, given the sample value, one obtains a CI based on the central limit theorem that does not violate the basic properties of proportions.
On the other hand, sample proportions can only take on a finite number of values, so the central limit theorem and the normal distribution are not the best tools for building a CI. A better method would rely on the binomial distribution or the beta distribution, and there are a number of better methods in widespread use. There are advantages and disadvantages of each [36].
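The difference between the simple normal-approximation interval and an interval obtained by inverting a hypothesis test can be seen in a small example. The sketch below uses the Wilson score interval as one such inverted-test interval; the source does not name a specific method, so this choice, the function names and the counts (1 case out of 20) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation interval; may extend below 0 or above 1."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval, obtained by inverting the score test for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical sample: the indicator equals 1 in only 1 of 20 observations.
wald_low, wald_high = wald_interval(1, 20)
wilson_low, wilson_high = wilson_interval(1, 20)
print(f"Wald:   ({wald_low:.4f}, {wald_high:.4f})")
print(f"Wilson: ({wilson_low:.4f}, {wilson_high:.4f})")
```

With so few cases the rule of thumb above is violated, and the normal-approximation interval reaches below zero, whereas the Wilson interval stays within the feasible range.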
3.5 Boxplot
The boxplot was invented in 1977 by American statistician John Tukey. In descriptive statistics, a boxplot (also known as a box-and-whisker diagram or candlestick chart) is a convenient way of graphically depicting the five-number summary, which consists of the smallest non-outlier observation, lower quartile (Q1), median, upper quartile (Q3) and largest non-outlier observation. In addition, the boxplot indicates which observations, if any, are considered outliers. Boxplots are able to visually show different types of populations, without any assumptions of the statistical distribution. The spacings between the different parts of the box can help indicate variance, skew and identify outliers. Boxplots can be drawn either horizontally or vertically.
3.5.1 Construction
For a data set, one constructs a boxplot in the following manner:
i) Calculate the Q1 (x.25), median (x.50), and Q3 (x.75)
ii) Calculate the IQR by subtracting Q1 from Q3 (x.75 − x.25)
iii) Construct a box above the number line bounded on the left by the first quartile (x.25) and on the right by the third quartile (x.75). The box may be as tall as one likes, although reasonably proportioned boxplots are customary
iv) Indicate where the median lies inside of the box with the presence of a symbol or a line dividing the box at the median value
v) Any data observation which lies more than 1.5*IQR lower than the first quartile or 1.5*IQR higher than the third quartile is considered an outlier. Indicate where the smallest value that is not an outlier is by a vertical tic mark or “whisker”, and connect the whisker to the box via a horizontal line. Likewise, indicate where the largest value that is not an outlier is by a “whisker”, and connect that whisker to the box via another horizontal line
vi) Indicate outliers by open and closed dots. “Extreme” outliers, or those which lie more than three times the IQR to the left and right from the first and third quartiles, respectively, are indicated by the presence of an open dot. “Mild” outliers – those observations which lie more than 1.5 times the IQR from the first and third quartile but are not also extreme outliers – are indicated by the presence of a closed dot
vii) Add an appropriate label to the number line and title the boxplot
viii) It is worth noting that a boxplot may be constructed in a similar manner vertically as opposed to horizontally by merely interchanging “left” for “bottom” and “right” for “top” in the above description
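The numerical part of the steps above (quartiles, IQR, whisker ends and outlier classes) can be sketched as follows. The data set is hypothetical, and the "inclusive" quartile convention of the Python standard library is an assumption; other software may use slightly different quartile definitions and so give slightly different fences.

```python
from statistics import quantiles


def boxplot_summary(data):
    """Compute the quantities needed to draw a boxplot for the given data."""
    values = sorted(data)
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # outlier fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # extreme-outlier fences
    non_outliers = [v for v in values if inner_low <= v <= inner_high]
    mild = [v for v in values
            if outer_low <= v < inner_low or inner_high < v <= outer_high]
    extreme = [v for v in values if v < outer_low or v > outer_high]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (non_outliers[0], non_outliers[-1]),
        "mild": mild,
        "extreme": extreme,
    }


# Hypothetical observations, for illustration only.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 30])
print(summary)
```

For this data set the whiskers end at 1 and 9, the value 18 is a mild outlier and the value 30 is an extreme outlier.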
3.5.2 An Example
A plain-text version might look like this:

                            +-----+-+
  *           o     |-------|   + | |---|
                            +-----+-+

+---+---+---+---+---+---+---+---+---+---+   number line
0   1   2   3   4   5   6   7   8   9  10
For this data set (values are approximate, based on the figure):
i) smallest observation (outliers excluded, minimum or min) = 5
ii) lower quartile (Q1) = 7
iii) median (Q2) (Med) = 8.5
iv) upper quartile (Q3) = 9
v) largest observation (outliers excluded, maximum or max) = 10
vi) mean = 8
vii) IQR = Q3 − Q1 = 2
viii) the value 3.5 is a “mild” outlier, between 1.5*(IQR) and 3*(IQR) below Q1
ix) the value 0.5 is an “extreme” outlier, more than 3*(IQR) below Q1
x) the smallest value that is not an outlier is 5
xi) the data are skewed to the left (negatively skewed)
The horizontal lines (the “whiskers”) extend to at most 1.5 times the box width (the IQR) from either or both ends of the box. They must end at an observed value, thus connecting all the values outside the box that are not more than 1.5 times the box width away from the box. Three times the box width marks the boundary between “mild” and “extreme” outliers. There are alternative implementations of this detail of the box plot in various software packages, such as the whiskers extending to at most the 5th and 95th (or some more extreme) percentiles. Such approaches do not conform to Tukey’s definition, with its emphasis on the median in particular and counting methods in general; and they tend to produce “outliers” for all data sets larger than ten, no matter what the shape of the distribution.
3.5.3
Visualization
The boxplot is a quick graphic approach for examining one or more sets of data. Boxplots may seem more primitive than a histogram or probability density function (PDF), but they do have their benefits. Besides saving space on paper, boxplots are quicker to generate by hand. Histograms and probability density functions require assumptions about the statistical distribution. This requirement can be a major barrier, because binning techniques can heavily influence the histogram and incorrect variance calculations will heavily affect the probability density function. However, looking at a statistical distribution is more intuitive than looking at a boxplot. Comparing the boxplot against the probability density function (theoretical histogram) for a Normal N(0, 1²) distribution, as in Figure 11, may therefore be a useful aid to understanding the boxplot.
Figure 11: Boxplot and PDF of a Normal N(0, 1²) Population
3.6
Outliers
An outlier is an extremely unrepresentative data point [37]: an observation that is numerically distant from the rest of the data [37] and lies outside the overall pattern of the distribution [38]. Statistics derived from data sets that include outliers will often be misleading. Outliers may be indicative of data points that belong to a different population than the rest of the sample set. In most samplings of data, some data points will be further away from their expected values than what is deemed reasonable. Outliers arise for two reasons [39]: i)
They are legitimate observations whose values are simply unusually large or unusually small, i.e. these observations happen to be a long way from the center of the data
ii)
They are the result of an error in measurement, poor experimental technique, or a mistake in recording or entering data. They can also be due to systematic error or to faults in the theory that generated the expected values.
Outlier points can therefore indicate faulty data, erroneous procedures, or areas where a certain theory might not be valid. However, a small number of outliers is expected in normal distributions. Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors; while mathematical criteria provide an objective and quantitative method for data rejection, they do not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known [37]. If the cause of an outlier cannot be found, a good strategy is to analyze the data both with and without the outlier. If the results are similar, then the outlier is having little effect. If the results are substantially different, then the presence of the outlier should be reported, and both analyses presented. Further, in order to decide which is the appropriate analysis, it may be necessary to make extra effort to identify a cause for the outlier, or to obtain more data.
3.6.1
An Example
Outliers are often easy to spot in histograms. For example, the point on the far left in Figure 12 is an outlier.
Figure 12: An example of an outlier in a histogram
Outliers can also occur when comparing relationships between two sets of data. Outliers of this type can be easily identified on a scatterplot, as shown in Figure 13. When performing least squares fitting to data, it is often best to discard outliers before computing the line of best fit. This is particularly true of outliers along the x direction, since these points may greatly influence the result [38] [40].
Figure 13: An example of an outlier in a scatterplot
3.6.2
Mild outliers
Defining Q1 and Q3 to be the first and third quartiles, and IQR to be the interquartile range (Q3 − Q1), one possible definition of being “far away” in this context is an observation x such that
$x < Q_1 - 1.5 \times \mathrm{IQR}$ (Equation 30)
or
$x > Q_3 + 1.5 \times \mathrm{IQR}$ (Equation 31)
These two limits define the so-called inner fences, beyond which an observation would be labeled a mild outlier.
3.6.3
Extreme outliers
Extreme outliers are observations x that are beyond the outer fences:
$x < Q_1 - 3 \times \mathrm{IQR}$ (Equation 32)
or
$x > Q_3 + 3 \times \mathrm{IQR}$ (Equation 33)
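The fence rules in Equations 30 to 33 can be expressed as a short routine. The following Python sketch is illustrative only: the function name `classify_outlier` is hypothetical, and the quartiles are passed in directly (here the approximate values from the example in Section 3.5.2) rather than computed, since software packages differ in how they estimate quartiles.

```python
def classify_outlier(x, q1, q3):
    """Label x using the inner (1.5 * IQR) and outer (3 * IQR) fences."""
    iqr = q3 - q1
    # Check the outer fences first, so extreme outliers are not reported as mild
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "none"

# Approximate quartiles from the boxplot example in Section 3.5.2 (Q1 = 7, Q3 = 9)
for value in (0.5, 3.5, 5, 10):
    print(value, classify_outlier(value, 7, 9))
```

With these quartiles the inner fences sit at 4 and 12 and the outer fences at 1 and 15, which is consistent with the labels given to 3.5 and 0.5 in the example.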
3.6.4
Occurrence and causes
In the case of normally distributed data, using the above definitions, only about 1 in 150 observations will be a mild outlier, and only about 1 in 425,000 an extreme outlier. Because of this, outliers usually demand special attention, since they may indicate problems in sampling or data collection or transcription. Alternatively, an outlier could be the result of a flaw in the assumed theory, calling for further investigation by the researcher.
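The two frequencies quoted above can be checked numerically. The sketch below assumes a standard normal population and uses only the Python standard library; the helper name `two_sided_tail` is introduced here for illustration.

```python
from math import erfc, sqrt
from statistics import NormalDist

def two_sided_tail(k):
    """P(observation lies below Q1 - k*IQR or above Q3 + k*IQR) for N(0, 1)."""
    q3 = NormalDist().inv_cdf(0.75)   # upper quartile of N(0, 1), about 0.6745
    iqr = 2 * q3                      # by symmetry Q1 = -Q3
    fence = q3 + k * iqr
    return erfc(fence / sqrt(2))      # equals 2 * P(Z > fence)

p_inner = two_sided_tail(1.5)   # beyond the inner fences
p_outer = two_sided_tail(3.0)   # beyond the outer fences
print(f"beyond inner fences: about 1 in {1 / p_inner:.0f}")
print(f"beyond outer fences: about 1 in {1 / p_outer:.0f}")
```

The results are of the same order as the figures quoted in the text; small differences arise from rounding of the quartile and fence positions.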
3.6.5
Non-normal distributions
Even when a normal model is appropriate to the data being analyzed, outliers are expected for large sample sizes and should not automatically be discarded if that is the case. Also, the possibility should be considered that the underlying distribution of the data is not approximately normal, having “fat tails”. For instance, when sampling from a Cauchy distribution, the sample variance increases with the sample size, the sample mean fails to converge as the sample size increases, and outliers are expected at far larger rates than for a normal distribution.
3.7
Extreme Values
The largest and the smallest element of a set are called extreme values, absolute extrema, or extreme records. For a differentiable function f, if f(x0) is an extreme value for the set of all values f(x), and if x0 is in the interior of the domain of f, then (x0, f(x0)) is a stationary point or critical point.
3.7.1
Extreme values in abstract spaces with order
In the case of a general partial order one should not confuse a least element (smaller than all others) and a minimal element (nothing is smaller). Likewise, a greatest element of a poset is an upper bound of the set which is contained within the set, whereas a maximal element m of a poset A is an element of A such that if m ≤ b (for any b in A) then m = b.
Any least element or greatest element of a poset will be unique, but a poset can have several minimal or maximal elements. If a poset has more than one maximal element, then these elements will not be mutually comparable. In a totally ordered set, or chain, all elements are mutually comparable, so such a set can have at most one minimal element and at most one maximal element. Then, due to mutual comparability, the minimal element will also be the least element and the maximal element will also be the greatest element. If a chain is finite then it will always have a maximum (maximal element, greatest element) and a minimum (minimal element, least element). If a chain is infinite then it need not have a maximum or a minimum. For example, the set of natural numbers has no maximum, though it has a minimum. If an infinite chain S is bounded, then the closure Cl(S) of the set may have a minimum and a maximum; in such a case they are called the greatest lower bound and the least upper bound of the set S, respectively. In general, if an ordered set S has a greatest element m, m is a maximal element. Furthermore, if S is a subset of an ordered set T and m is the greatest element of S with respect to the order induced by T, m is a least upper bound of S in T. A similar result holds for the least element, minimal element, and greatest lower bound.
3.8
Probability of a z value
When there is a z (standardized) value for a variable, the probability of that value can be determined by utilizing the table shown in Appendix 13 – Standard Normal (Z) Table. When a z value is looked up, the table gives the area under the normal curve between the mean and z. The area outside ±z is referred to as the rejection region. Its probability is also called a two-tailed probability because both tails of the distribution are excluded. A one-tailed probability is used when a research question is concerned with only half of the distribution. Its value is exactly half the two-tailed probability. Example 1 and Example 2 illustrate the z value calculation for the commonly used probability values of 0.05 and 0.10 respectively. Example 1
Two-tailed probability = 0.05
Area under graph = (1 − 0.05) / 2 = 0.4750
z-value = 1.96 (Equation 34)
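As a cross-check on the table lookups used in the examples of this section, the critical z for any two-tailed probability can be computed from the inverse of the standard normal distribution. This is a minimal sketch using Python's standard library; the function name `critical_z` is an assumption made for illustration and is not part of the original method, which relies on the printed table.

```python
from statistics import NormalDist

def critical_z(two_tailed_p):
    """Return z such that the two tails beyond -z and +z together hold two_tailed_p."""
    return NormalDist().inv_cdf(1 - two_tailed_p / 2)

for p in (0.05, 0.10, 0.06):
    area = (1 - p) / 2   # area between the mean and z, as read from the Z table
    print(f"p = {p:.2f}: area = {area:.4f}, z = {critical_z(p):.3f}")
```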
Example 2
Two-tailed probability = 0.10
Area under graph = (1 − 0.10) / 2 = 0.4500
z-value = 1.645 (Equation 35)
3.8.1
Critical z for a given probability
The critical z value for a given probability can also be determined as illustrated in Example 3. Example 3
A large company designed a pre-employment survey to be administered to prospective employees. Baseline data was established by administering the survey to all current employees. The company now wants to use the instrument to identify job applicants who have very high or very low scores. Management has decided to identify people who score in the upper and lower 3% when compared to the norm. How many standard deviations away from the mean are required to define the upper and lower 3% of the scores? The total area of rejection is 6%. This includes 3% who scored very high and 3% who scored very low. Thus, the two-tailed probability is 0.06. The z value required to reject 6% of the area under the curve is 1.881. Thus, new applicants who score more than 1.881 standard deviations above or below the mean are the people to be identified.
Two-tailed probability = 0.06
Area under graph = (1 − 0.06) / 2 = 0.4700
z-value = 1.881 (Equation 36)
3.9
Precision Factor
To determine the total number of samples (n), the concept of statistical precision as used in simple random sampling is applied [41]. The formula is given by:
$r = z \cdot \frac{\sigma}{\mu} \cdot \sqrt{\frac{1}{n}}$ (Equation 37)
where:
r is the precision factor
σ is the standard deviation of the total consumption (kWh)
µ is the mean of the total consumption (kWh)
z is the standard normal variable (i.e. z equals 1.960 for a 95% CI and 1.645 for a 90% CI [42])
n is the sample size
From the above equation, the total number of samples (n) can be determined for a given precision factor. It can be seen that the smaller the precision factor, the greater the number of samples required.
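Solving Equation 37 for n makes the dependence on the precision factor explicit. The rearrangement below follows directly from the definitions above; the numerical case uses an assumed ratio σ/µ = 1 purely for illustration and is not taken from the study data.

```latex
r = z\,\frac{\sigma}{\mu}\sqrt{\frac{1}{n}}
\quad\Longrightarrow\quad
n = \left(\frac{z\,\sigma}{r\,\mu}\right)^{2}

% Illustration with an assumed ratio \sigma/\mu = 1, z = 1.645 (90% CI), r = 0.1:
n = \left(\frac{1.645 \times 1}{0.1}\right)^{2} \approx 270.6
\;\Rightarrow\; n = 271 \text{ (rounded up)}
```

Because n varies with 1/r², halving the precision factor quadruples the required number of samples.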
To complete the calculations in Equation 37, the standard deviation and mean were determined as shown in Equations 38 and 39 respectively.
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}$ (Equation 38)
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (Equation 39)
where:
$\sigma$ is the standard deviation of the total consumption (kWh);
$n$ is the sample size;
$\bar{X}$ is the mean of the total consumption (kWh); and
$X_i$ is the current sample value of total consumption (kWh)
From Equation 37, the total number of samples n of each category can be determined for a given precision factor. For this study, the precision factor r is set to 0.1.
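The calculation chain from Equations 37 to 39 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually run in SPSS: the function name `required_sample_size` and the demonstration values are hypothetical, and the standard deviation divides by n, as in Equation 38.

```python
import math
from statistics import NormalDist, fmean, pstdev

def required_sample_size(consumption_kwh, r=0.1, confidence=0.90):
    """Sample size n from Equation 37, rounded up to a whole consumer."""
    mean = fmean(consumption_kwh)                     # Equation 39
    sigma = pstdev(consumption_kwh)                   # Equation 38 (divides by n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.645 for a 90% CI
    return math.ceil((z * sigma / (r * mean)) ** 2)

# Hypothetical monthly consumption figures (kWh), for illustration only
demo = [120, 250, 310, 95, 480, 1500, 220, 175, 640, 90]
print(required_sample_size(demo))
```

Calling the function with a smaller `r` on the same data returns a larger n, in line with the remark that a smaller precision factor requires more samples.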
CHAPTER 4 DETERMINATION OF CONSUMER STRATIFICATION & SAMPLE SIZE
4.1
Introduction
In order to calculate the VoLL for Peninsular Malaysia, it was imperative that proper and accurate techniques were used in the stratification and sampling of consumer data. Both of these stages would have a large impact on the analysis and final calculation of the VoLL. The stratification stage deals with the segregation and grouping of consumers into major strata, which are then broken down further into minor strata. This stage needs to identify consumers with similar needs and activities. This is done by profiling consumers’ consumption trends and common business practices. The sampling stage uses the consumer stratification data to calculate the necessary number of samples for each major stratum such that the required precision is satisfied. For this research, the precision factor r is set at 0.1 for a CI of 90% (see Section 3.4 for explanation). The consumer data from the TNB billing database yielded a large variance. Due to limitations of time and funding in this research project, it was necessary to accept a less stringent r and a lower CI in order to limit the sample size. Both the r and CI values, however, are statistically acceptable.
4.2
Stratification
Stratification and sampling are statistical techniques that are very useful in outage cost studies because they allow accurate research of a large population by taking samples from that population. This research is limited to the electricity consumers of Peninsular Malaysia, which comprised 6,582,374 consumers in 2005 [14]. For such a large population, a general census would prove infeasible. Thus, this research relies on sampling to collect consumer data. However, before sampling, the consumer population must first be stratified to ensure that the consumption data of each unique stratum is accurately represented. At the beginning of the study, the classification (by consumption) of electricity consumers and the size of each class were determined, as shown in Figure 14. Consumers are divided into three major strata: domestic, commercial, and industrial consumers.
Figure 14: Classification of Consumers by kWh Consumption Year 2005 [43] (Industrial 48.03%, Commercial 28.74%, Domestic 18.19%, Others 5.04%)
Once the major strata are established, the classification must be refined further to obtain samples that are realistic, so that they reflect actual consumption trends, and deterministic, so that they are predictable under all operating conditions [44]. Hence, the TNB business code is used to classify consumers into much smaller groups for stratification and sampling purposes. The business code identifies the individual minor strata within each major stratum. The details of the minor strata stratification are discussed in Section 4.4.1 for domestic consumers, Section 4.4.2 for commercial consumers, and Section 4.4.3 for industrial consumers.
4.2.1
Stratified Sampling
Stratified sampling is a method of sampling from a population. When subpopulations (strata) vary considerably, it is advantageous to sample each stratum independently. Stratification is the process of grouping members of the population into relatively homogeneous subgroups before sampling [45]. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded [46]. In this research, for example, the entire population is stratified into three major strata: Domestic, Commercial, and Industrial. Each stratum consists of consumers that belong only to that stratum. Furthermore, all consumers are assigned to a particular stratum. Less than 1% of the total consumers (by consumption) are excluded, which statistically can be taken to mean a collectively exhaustive stratification process. Generally, stratification is used for two main reasons: to reduce standard errors for survey estimates and to ensure that sample sizes for strata are of their expected size. Relative to taking a completely unstratified sample, taking a proportionate sample is either beneficial, in that it reduces standard errors, or neutral, if standard errors do not change. Proportionate stratification can never increase standard errors. This is because: i)
total sampling variance can be decomposed into two components: within-strata variation and between-strata variation (the split between the two depending on how the strata are defined)
ii)
with proportionate stratification the between-strata variance becomes zero. So, proportionate stratification is most efficient when the stratifiers that are used split the total variance in a way that maximizes the between-strata variance [47]
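The decomposition referred to in the two points above can be written out explicitly. The notation is introduced here for illustration and does not appear in the original text: W_h = N_h / N is the population share of stratum h, µ_h and σ_h² are its mean and variance, and the finite population correction is ignored.

```latex
\sigma^{2} \;=\; \underbrace{\sum_{h} W_h\,\sigma_h^{2}}_{\text{within strata}}
\;+\; \underbrace{\sum_{h} W_h\,(\mu_h - \mu)^{2}}_{\text{between strata}}

% Simple random sample of size n versus proportionate stratified sample of size n:
\operatorname{Var}(\bar{x}_{\text{SRS}}) \approx \frac{\sigma^{2}}{n},
\qquad
\operatorname{Var}(\bar{x}_{\text{prop}}) \approx \frac{1}{n}\sum_{h} W_h\,\sigma_h^{2}
```

The between-strata term does not appear in the second variance, so under these assumptions the proportionate design cannot do worse than simple random sampling, and it gains most when the stratum means differ widely.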
After stratification, random or systematic sampling was applied within each stratum. This is done to further improve the representativeness of the sample by reducing sampling error. It can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population. Proportionate stratified sampling almost always leads to an increase in survey precision (relative to a design with no stratification), although the increase will often be modest, depending upon the nature of the stratifiers. In this research, random sampling of the minor strata was carried out, with best efforts made to provide a number of samples for each minor stratum proportionate to its contribution to the total consumer consumption, i.e. its weight.
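A proportionate split of a fixed total across strata can be sketched as follows. The function name `proportional_allocation`, the stratum labels, and the weights are hypothetical; the largest-remainder rounding is one reasonable way to keep the allocations whole while preserving the total, and is not claimed to be the rounding used in this research.

```python
def proportional_allocation(total_samples, weights):
    """Split total_samples across strata in proportion to their weights."""
    total_weight = sum(weights.values())
    exact = {k: total_samples * w / total_weight for k, w in weights.items()}
    alloc = {k: int(v) for k, v in exact.items()}   # round down first
    leftover = total_samples - sum(alloc.values())
    # Give the remaining samples to the strata with the largest fractional parts
    by_fraction = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_fraction[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical consumption shares (not taken from the TNB data)
shares = {"A": 0.50, "B": 0.30, "C": 0.20}
print(proportional_allocation(101, shares))
```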
4.3
Preprocessing of Population Data with SPSS Software
The way the data is distributed around a central value is statistically important. While the values of individual data points for samples of a population may vary according to a number of patterns, measurement data often follow a simple distribution. The data points are characteristically distributed symmetrically about the mean. Small deviations from the average usually occur more often than large differences, and very large differences occur rarely. When a frequency distribution is constructed from a large sample of a general population, one often obtains the familiar bell-shaped curve, which was described mathematically by Gauss and is called the Normal distribution or Gaussian distribution. Variance, standard deviation, and mean are the terms most regularly used in sampling. Moreover, there are two special kinds of departure from the normal distribution: skewness, in which the data are asymmetric about the mean, and kurtosis (curvature), in which the data are abnormally compressed or more spread out than for a true normal distribution. For a perfect normal distribution, the skewness factor will be zero. A negative value indicates skewness toward lower values: more data in the left tail than would be expected in a normal distribution. This can be seen in the left diagram of Figure 15 by the elongated tail at the left. A positive value indicates excess higher values: more data in the right tail than would be expected in a normal distribution. This can be seen in the right diagram of Figure 15 by the elongated tail at the right.
Figure 15: Negative skew (left diagram) and positive skew (right diagram).
From Figure 16, with reference to a perfect normal distribution, the kurtosis factor (excess kurtosis) is expected to be zero (mesokurtic curve). A positive value indicates a sharper peak with heavier tails (leptokurtic curve), while a negative value indicates a flatter curve than normal (platykurtic curve). Hence, failing to account properly for the skewness and kurtosis of the data will result in poor sampling.
Figure 16: Kurtosis Factor Impact on a Normal Distribution
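The skewness and kurtosis factors discussed above can be computed directly from the sample moments. The sketch below uses the simple moment-based definitions, for which both factors are zero for a normal distribution; statistical packages such as SPSS typically apply small-sample corrections, so their reported values may differ slightly. The function name and the data are illustrative assumptions.

```python
from statistics import fmean

def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical consumption values (kWh) with one very large consumer
skew, kurt = skewness_and_excess_kurtosis([100, 120, 130, 150, 160, 900])
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```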
Under ordinary variability, whether a given difference is judged significant depends on the probability level chosen, and that level is to some extent arbitrary. In practice, a confidence level such as a 90% confidence interval is set for each category, which places a limit within which the sample estimates are expected to fall. Since the consumer population was very large and the distribution was asymmetric, a CI of 90% was chosen. The computed variance was also high due to the wide but sparse distribution of the population. Therefore, a precision factor of 0.1 was set for the sampling process.
Taking these factors into consideration, the statistical software SPSS was used to preprocess the raw data and check for proper normal distribution. The SPSS output for domestic consumers is listed in Appendix 7, commercial consumers in Appendix 8, and industrial consumers in Appendix 9.
4.4
Sampling with SPSS Software
Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference [48]. In particular, results from probability theory and statistical theory are employed to guide practice. The sampling process consists of five stages [49]: i)
Definition of population of concern
ii)
Specification of a sampling frame, a set of items or events that it is possible to measure
iii)
Specification of sampling method for selecting items or events from the frame
iv)
Sampling and data collecting
v)
Review of sampling process
It is important to create a sample that correctly reflects the makeup of the whole population [50]. For the VoLL calculations, there are two primary objectives in sample design [51]: i)
Ensure that the sample is representative of the population of interest so that the resulting outage cost estimates are not biased
ii)
Efficiency to produce the most statistically precise outage cost estimates possible given the resources available for the research
The objective in sampling is to achieve the greatest statistical precision possible in the outage cost estimates for a given sample size. The sampling procedure is a three-step approach. First, the domain is defined. Here, the consumer population is defined in terms of consumer economic activity. It is here that the business code for consumer classification is very useful, because it allows for quick and precise classification of consumers. Second, the preliminary data is analyzed. At this point it is desirable to employ a stratification scheme; this research relies heavily on the consumers’ business codes and, to a lesser extent, on tariff and consumption data. This is where the consumers for the minor strata are randomly selected and listed for future reference. Every effort was made to ensure proportionate random sampling of each minor stratum. Third, the final sample is compiled. The final sample includes an overall sample size for each major stratum. It lists the respective particulars of the consumers that this research would like to sample. At the onset of this research, it was clear that the major strata would have to be domestic, commercial, and industrial, mainly because TNB groups its consumers in this manner and has set different tariff rates for these three groups. Electricity tariffs are a major component to consider when making outage cost estimations. However, to leave the stratification process at three major strata would yield a very inefficient sampling of the population. Table 4 shows the number of samples needed for each major stratum. Calculations were made for a 90% CI with r = 0.1.
Table 4: Number of Samples needed for each Major Stratum (90% CI, r=0.1)
No   Category     Number of Samples (n)
1    Domestic     2048
2    Commercial   26962
3    Industrial   4487
The domestic stratum yields a reasonable sample size consistent with surveys in other countries such as the US, UK, Canada, Australia, and New Zealand. The commercial and industrial strata, however, yield sample sizes that are disproportionately large compared to their respective population sizes. There were two causes for this problem.
Firstly, the data extraction from the TNB billing database yielded data for only about 5.2 million of the 6.6 million consumers TNB had in 2005. The remaining data could not be extracted because the extraction assumed that all consumers were billed once every calendar month; after many months of investigation this was found to be untrue. There were consistently about 1 million consumers whose meter reading spilled over into the next calendar month and who were thus not captured in the data extraction. However, by the time this was discovered, it was too late to perform another data extraction, as the research had already progressed past the stratification stage and was in the data collection (sampling) stage. Secondly, the distribution of the consumer data yielded a very mesokurtic curve. This resulted in the σ/µ ratio being very large (typically >1), which in Equation 37 produced very large sample sizes. By comparison, the US research by EPRI consistently utilized a σ/µ ratio of <1.
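The size of the σ/µ ratio implied by the sample sizes in Table 4 can be backed out of Equation 37. The sketch below is a consistency check only: it assumes z for a two-sided 90% CI and r = 0.1 as stated in the text, and the resulting ratios are approximate because the published sample sizes are rounded.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.95)   # two-sided 90% CI, about 1.645
r = 0.1                          # precision factor used in the study

table_4 = {"Domestic": 2048, "Commercial": 26962, "Industrial": 4487}

# Equation 37 rearranged: sigma / mu = r * sqrt(n) / z
implied_ratio = {name: r * sqrt(n) / z for name, n in table_4.items()}
for name, ratio in implied_ratio.items():
    print(f"{name}: sigma/mu of roughly {ratio:.2f}")
```

All three implied ratios exceed 1, which agrees with the observation that the σ/µ ratio was typically greater than 1.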
4.4.1
Domestic Consumers
Business codes are split into four categories depending on the occupants’ total consumption and similarity in terms of building design. The four categories are listed below in Table 5.
Table 5: List of Domestic Business Codes
No   Minor Strata    Business Code
1    Kampong House   63253
2    Link House      63221, 63222
3    High Rise       63224, 63225, 63231 to 63238
4    Bungalow        63211, 63212, 63250 to 63252
For the purpose of stratification and sampling in this study, squatter houses [Business Code: 63240] and long house (temporary) [Business Code: 63241] are neglected because they have low power consumption and very low volume of data for sampling. Out of the total number of domestic consumers, the neglected business codes contribute less than 1 percent.
From these statistics it can be concluded that the stratification is done based on total consumption in terms of kWh and also on volume of data. The number of samples to work on is then calculated as shown in Table 6, where Equation 37 is applied to find the estimated proportional value for this research.
Table 6: Number of Sample Selection for a Proportional Stratified Random Sample
No   Minor Strata    Number of Samples (n)
1    Kampong House   576
2    Link House      1077
3    High Rise       275
4    Bungalow        120
     Total           2048
Meanwhile, in a non-proportional stratified sample, the number of items chosen in each stratum is disproportional to the respective numbers in the population. Regardless of whether a proportional or non-proportional sampling procedure is used, every item in the population has a chance of being selected for the sample.
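Drawing the sample within each stratum, once the per-stratum sizes are fixed, can be sketched as below. The sampling frames are hypothetical lists of consumer identifiers (real frames would come from the billing database), the function name `stratified_sample` is an assumption, and only two of the four domestic minor strata from Table 6 are shown to keep the example short.

```python
import random

def stratified_sample(frames, sizes, seed=None):
    """Draw a simple random sample without replacement from each stratum."""
    rng = random.Random(seed)   # a fixed seed makes the draw repeatable
    return {name: rng.sample(frames[name], sizes[name]) for name in frames}

# Hypothetical sampling frames of consumer IDs
frames = {
    "Kampong House": list(range(0, 5000)),
    "Bungalow": list(range(5000, 6000)),
}
sizes = {"Kampong House": 576, "Bungalow": 120}   # sample sizes from Table 6
sample = stratified_sample(frames, sizes, seed=1)
print({name: len(ids) for name, ids in sample.items()})
```

Because every identifier in a frame is equally likely to be drawn, every consumer in the frame has a chance of selection, whichever allocation rule fixed the sizes.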
4.4.2
Commercial Consumers
In this research, TNB’s existing consumer list based on business code classification is adapted. Table 7 shows the business class and business code as recorded in the TNB billing system.
Table 7: List of Commercial Business Codes
No.  Commercial Minor Strata           Business Code    Example
1    Accommodation                     63200 – 63206    Rest house, hotel, motel
2    Agriculture                       11111 – 11199
3    Communication                     72001 – 72009    Celcom, Digi, Maxis
4    Construction                      50011 – 50029    Plumbing, carpentry, contractor
5    Financial Institution             81011 – 81030    Banks, pawnshop
6    Insurance                         82001 – 82005    AIA, Great Eastern
7    Other Services                    95110 – 95990    Laundry, barber
8    Real Estate / Business Services   83101 – 83300    Law firm, advertising
9    Recreational                      94110 – 94901    Stadium, TGV
10   Retail                            62110 – 63100    Restaurant, café, 7-eleven
11   Shop House                        63223
12   Social Service                    93100 – 93400    Clinic, schools, kindergarten
13   Transport                         71110 – 71920    Taxi service, travel agency
14   Wholesale                         61110 – 61500    Mydin, Carrefour, Makro
In deciding among alternative capital investments and operating procedures that can affect electric power supply reliability and quality, it is important to take account of the interruption cost that consumers will experience as a result of an outage. In order to obtain such critical information from a consumer survey, appropriate and genuine consumers for each classification need to be considered. In order to determine the total sample size, required targets need to be set for each category. This is to ensure that the minimum number of samples is achieved for the analysis of the Value of Lost Load (VoLL). Based on the preliminary calculation of mean and standard deviation, it was observed that the standard deviation is larger than the mean; hence it was difficult to obtain a sample size using the same method as for domestic consumers. Therefore, a rather optimistic method is used to calculate the sample size for this category of consumers. The first step was to categorise the list of consumers by business code classification as shown in Table 7. As the TNB database has over six million entries, sorting was done by writing a query in Microsoft Access to split the database. Once the entries were sorted, they were further stratified by tariff, and a query was written to sort them into Tariff B, Tariff C and Tariff D. TNB classifies Tariff D as an industrial tariff, as listed below, but for the purpose of this research it was considered a cross set between commercial and industrial. Figure 17 illustrates the tariff classification. i)
Tariff B – Low Voltage Commercial Tariff
ii)
Tariff C1 – Medium Voltage General Commercial Tariff
iii)
Tariff C2 – Medium Voltage Peak/Off-Peak Commercial Tariff
iv)
Tariff D – Low Voltage Industrial Tariff
The reason for this cross set to exist was due to the fact that consumers listed under the business code of commercial consumers were billed with Tariff D rates.
Figure 17: Venn diagram for Tariff Classification (tariff groups shown: A; B, C1, C2; D; E1, E2, E3; F1, F2; F, G, G1; H, H1, H2)
In order to achieve high precision, a large number of consumers (sample size) needed to be considered. Since it would be time consuming and not feasible (see Section 4.4 for explanation) to obtain such a large sample, an optimistic sample size was manually fixed for each minor stratum. In the case of commercial consumers, a size of five samples is fixed for each minor stratum. Hence, with 14 categories, the target sample size is 70 samples from Peninsular Malaysia. These samples comprise all three tariff types. Since Tariff B plays a larger role in the classification than the other two tariffs, Tariff B consumers will make up a larger portion of the samples. This would help maintain a proportionate random sampling process.
4.4.3
Industrial Consumers
In the case of industrial consumer stratification, four steps were taken to determine the samples: i)
Stratification variables or set of variables
ii)
Boundaries between cells
iii)
Allocation Schemes
iv)
Final sample
Figure 18: Industrial Stratification Method (Stratification variables, Boundaries between cells, Allocation schemes, Final sample)
The lists of industries and their classifications were obtained from MITI [52], PTM [53], FMM [54] and TNB [55].
Figure 19: Industrial Stratification Process Flow
The lists of industries provided by other agencies such as MITI, MIDA and PTM, with references from TNB, were consolidated. As a result, a list of 18 industries that contribute heavily to the growth of the country was compiled. This list is shown below: i)
Electrical and Electronics
ii)
Paper, Printing And Publishing
iii)
Chemicals And Petrochemicals
iv)
Iron And Steel
v)
Non-Ferrous Metals And Their Products
vi)
Transport Equipment
vii)
Food
viii)
Wood And Wood Products
ix)
Textile And Textile Products
x)
Clay-Based And Other Non-Metallic Mineral
xi)
Plastic
xii)
Machinery And Machinery Components
xiii)
Rubber
xiv)
Beverage And Tobacco
xv)
Furniture And Fixture
xvi)
Palm And Palm Kernel Oil
xvii)
Pharmaceutical
xviii)
Miscellaneous
Figure 20: Formation of Boundaries between Cells
The break point between each minor stratum needs to be clarified at this point. For this purpose, the Dalenius–Hodges method was used to develop the sample cell boundaries. In order to split the respective strata, stratification was implemented based on the 12-month running peak consumption (kWh). Table 8 details the categorization that is used in this research.
Table 8: Boundaries between cells
No   Industry   Business Code   Range of Peak Consumption (kWh)
1    Small      31152           0 – 100000
2    Medium     35600           100001 – 1000000
3    Big        35300           1000000 – Infinity
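The details of how the Dalenius–Hodges rule was applied are not given here. As a rough illustration, the widely used cumulative square-root-of-frequency form of the rule can be sketched as follows: the stratification variable is binned into narrow classes, the square roots of the class frequencies are accumulated, and boundaries are placed where the cumulative total crosses equal steps. The function name, the number of bins, and the test data are assumptions made for this sketch.

```python
import math

def cum_sqrt_f_boundaries(values, n_strata, n_bins=50):
    """Approximate stratum boundaries by the cumulative sqrt(frequency) rule."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no boundaries can be formed")
    width = (hi - lo) / n_bins
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)   # clamp the maximum value
        freq[idx] += 1
    # Running total of sqrt(f) over the bins
    cum, running = [], 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)
    total = cum[-1]
    # Place a boundary at the upper edge of the bin where each equal step is crossed
    boundaries, next_cut = [], 1
    for i, c in enumerate(cum):
        if next_cut < n_strata and c >= next_cut * total / n_strata:
            boundaries.append(lo + (i + 1) * width)
            next_cut += 1
    return boundaries

# Hypothetical, evenly spread peak consumption values, for illustration only
print(cum_sqrt_f_boundaries(list(range(1000)), n_strata=3))
```

For strongly skewed consumption data the boundaries produced by this rule crowd toward the low end, which is why the resulting cells are far from equal in width.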
In this allocation scheme, random sampling needs to be implemented. In random sampling, there is a choice between the Neyman design and the proportional design. In Neyman sampling, the sample sizes are chosen proportional to the products of the standard deviations and the stratum sizes. For industrial consumers, Neyman sampling was chosen since it is better than proportionate sampling when the variability differs greatly between strata [56]. The additional work to determine cell boundaries and to use Neyman sampling instead of proportional random sampling was necessary because industrial consumers span a large range, from those with small consumption to those with very large consumption. These additional steps, which are not utilized in the domestic and commercial strata, are necessary for the industrial stratum to ensure that both large and small industrial consumers are fairly represented in the sample.
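The Neyman rule described above, with sample sizes proportional to the product of stratum size and stratum standard deviation, can be sketched in a few lines. The function name `neyman_allocation` and all stratum figures below are hypothetical and serve only to show the shape of the calculation.

```python
def neyman_allocation(total_samples, stratum_sizes, stratum_sds):
    """Allocate n_h = n * N_h * S_h / sum(N_k * S_k); results are not rounded."""
    products = [n_h * s_h for n_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [total_samples * p / total for p in products]

# Hypothetical sizes and standard deviations (kWh) for small, medium and big cells
sizes = [20000, 5000, 500]
sds = [10_000, 150_000, 2_000_000]
print([round(x, 1) for x in neyman_allocation(54, sizes, sds)])
```

In this made-up case the few big consumers receive the largest share of the sample because their consumption varies most, which is the behaviour that motivates the Neyman design for a population with such a wide range.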
After the cell boundaries are determined, the allocation scheme can proceed to determining the sample size given the precision factor r = 0.1. Figure 21 illustrates the complete process flow of the allocation process for determining the number of samples needed for industrial consumers.
Figure 21: Allocation scheme
From these calculations, it can be concluded that the stratification is done based on total consumption in terms of kWh and also volume of data. For reasons similar to those stated in Section 4.4.2, the number of samples is set at 3 samples per minor stratum, as shown in Table 9.
Table 9: Number of Sample Selection for a Proportional Stratified Random Sample
No  Minor Stratum                            Number of Consumers  Number of Samples (n)
1   Electrical & Electronic                  1,004                3
2   Paper, Printing and Publishing           1,188                3
3   Chemical & Petrochemical                 313                  3
4   Iron & Steel                             2,518                3
5   Nonferrous Metals                        823                  3
6   Transport Equipment                      1,118                3
7   Food                                     1,366                3
8   Wood & its products                      2,825                3
9   Textiles & its products                  2,663                3
10  Clay-based & other nonmetallic minerals  926                  3
11  Plastics                                 1,451                3
12  Rubber                                   790                  3
13  Beverage & Tobacco                       782                  3
14  Furniture & Fixtures                     0                    3
15  Palm & Palm Kernel Oil                   170                  3
16  Pharmaceutical                           155                  3
17  Machinery & Machinery Equipment          594                  3
18  Others                                   333                  3
    Total                                    26,689               54
The final list of samples can then be determined. However, for industrial consumers it was important to consider the geographical location of the consumer as certain industries could only be found at specific geographical locations. For example, the petrochemical industry would be concentrated near oil and gas fields, which are predominantly on the East Coast, especially in Kerteh, Terengganu. Figure 22 illustrates the overall process of determining the sample size and final sample list for industrial consumers.
Figure 22: Final sample
4.5 Normalization of Consumer Survey Data
Normalization is defined in relational database design as the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships [57].
Figure 23: The bell curve that shows the normal distribution from sample data
The normal distribution or Gaussian distribution is the plot of the PDF for any given large population. It is often known as the bell curve because the shape of the plot resembles a bell.
For this study, we will assume that the PDF of TNB consumers is Gaussian distributed. Therefore, all survey data will be normalized against the Gaussian PDF plot, such as the one shown in Figure 23. Normalization is the process of removing statistical error in repeated measured data [58]. In this study, a survey sample is taken and evaluated to determine its VoLL. This VoLL value will be normalized against the mean kWh consumption of the respondent’s stratum. Therefore, the steps necessary to calculate the VoLL for a particular stratum are:
i) Calculate the mean kWh consumption for each stratum. This is done by summing all elements and dividing that value by the number of elements, which is the same calculation as shown in Equation 39.
ii) Normalize each VoLL value obtained from survey sampling. To normalize a VoLL value, the following formula is used.
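\[
\mathrm{VoLL}_{\text{stratum normalized}} = \mathrm{VoLL}_{\text{survey}} \times \frac{\mathrm{kWh}_{\text{stratum mean}}}{\mathrm{kWh}_{\text{survey}}} \qquad \text{(Equation 40)}
\]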
This process is illustrated in Figure 24. The survey VoLL value may be for a respondent who consumes more (or less) than the mean kWh consumption for his stratum. Therefore, that value must be normalized with the stratum mean.
[Figure 24 labels: stratum mean (e.g. 5 kWh); survey respondent (e.g. 7 kWh); survey value normalized to the mean]
Figure 24: Illustrates the normalization process of a survey VoLL value
iii) Calculate the mean VoLL value for each stratum. This step once again uses Equation 39, \( \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \), this time to calculate the mean VoLL value.
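Steps i) to iii) above can be sketched in a few lines of code. This is only an illustration of Equation 39 and Equation 40 as described in this section. The function names are hypothetical, and the survey VoLL figure in the example is made up; the 5 kWh mean and 7 kWh respondent follow the example in Figure 24.

```python
def mean(values):
    """Equation 39: arithmetic mean of a list of values."""
    return sum(values) / len(values)


def normalize_voll(voll_survey, kwh_survey, kwh_stratum_mean):
    """Equation 40: scale a survey VoLL value to the stratum mean consumption."""
    return voll_survey * kwh_stratum_mean / kwh_survey


def stratum_mean_voll(responses, stratum_kwh):
    """Steps i) to iii) for one stratum.

    responses   -- list of (voll_survey, kwh_survey) pairs from the survey
    stratum_kwh -- kWh consumption of the consumers in the stratum
    """
    kwh_mean = mean(stratum_kwh)                      # step i
    normalized = [normalize_voll(v, k, kwh_mean)      # step ii
                  for v, k in responses]
    return mean(normalized)                           # step iii


if __name__ == "__main__":
    # A respondent consuming 7 kWh in a stratum whose mean is 5 kWh.
    # The survey VoLL value of 14.0 is a made-up figure.
    print(normalize_voll(14.0, kwh_survey=7.0, kwh_stratum_mean=5.0))
```

A respondent who consumes more than the stratum mean has the survey value scaled down, and one who consumes less has it scaled up.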
4.6 Preprocessing of Sample Data
Analysis of the consumer data suggests that the consumption of commercial and industrial consumers is skewed. This is because there is a minority of commercial and industrial consumers who have a much higher consumption than the average value of their respective categories.
Hence, the number of samples needed from commercial and industrial consumers, as calculated by Equation 37, is much higher than the number of samples needed from domestic consumers, although there are far fewer commercial and industrial consumers than domestic consumers. To solve this problem, the consumer database was pre-processed to exclude outliers and extreme values. Outliers and extreme values are discussed further in Section 3.6 and Section 3.7 respectively. The following sections discuss domestic, commercial, and industrial consumer data in greater detail.
4.6.1 Domestic Consumer Data
Table 10: Domestic Statistics Range
Type       Mean (kWh) (µ)  +90% (µ*1.90)  -90% (µ*0.10)
Kampong    181.8853        345.5821       18.18853
Terrace    307.9136        585.0358       30.79136
High-rise  234.0532        444.7011       23.40532
Bungalow   471.5546        895.9536       47.15546
Statistical parameters for domestic precision factor and number of samples calculation:
Mean (µ): 198.360642
Standard Deviation (σ): 120.670647
Number of samples (n): 2015
Standard Normal Variate (Z): 1.645 (for a 90% Confidence Interval)
From Equation 37, Precision (r): 0.0223
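The precision figure above can be checked with a short calculation. Equation 37 is defined earlier in this dissertation and is not reproduced here; the sketch below assumes it has the usual relative-precision form r = Zσ / (µ√n), which reproduces the domestic value when the listed parameters are substituted. The function names are hypothetical.

```python
import math


def precision(z, sigma, mu, n):
    """Relative precision r = Z * sigma / (mu * sqrt(n)) (assumed form of Equation 37)."""
    return z * sigma / (mu * math.sqrt(n))


def required_sample_size(z, sigma, mu, r):
    """Smallest n that achieves relative precision r under the same assumed form."""
    return math.ceil((z * sigma / (r * mu)) ** 2)


if __name__ == "__main__":
    # Domestic parameters listed above.
    r = precision(z=1.645, sigma=120.670647, mu=198.360642, n=2015)
    print(round(r, 4))
```

The second helper inverts the same relation, which is how a target precision such as r = 0.1 translates into a sample size under this assumed form.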
Before obtaining the precision factor, r, the mean consumption of each domestic stratum is multiplied by the factors 1.90 and 0.10 to obtain cutoff points at +90% and –90% of the mean. Table 10 shows the high and low cutoff points for each domestic stratum. The data falling in the bands 0.0 – 0.1 and 1.9 – 2.0 of the mean must then be cleared to complete the normalization process. This yields a normalization range of 0.1 – 1.9 of mean consumption. Figure 25 shows the boxplot representation of the preprocessed data.
Figure 25: SPSS Boxplot of Domestic Consumer Population (CI = 90%)
The detailed SPSS output is appended in Appendix 10 – SPSS Analysis of Domestic Consumer Samples.
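The cutoff rule described in this section can be sketched as a simple filter. This is a minimal illustration that assumes the cutoffs are taken relative to the stratum mean and that values exactly on a cutoff are kept; the function names and the consumption readings in the example are hypothetical.

```python
def cutoffs(mean_kwh, low=0.10, high=1.90):
    """Low and high cutoff points as multiples of the stratum mean."""
    return mean_kwh * low, mean_kwh * high


def trim(values, mean_kwh, low=0.10, high=1.90):
    """Keep only values inside the 0.1 to 1.9 range of the stratum mean."""
    lo, hi = cutoffs(mean_kwh, low, high)
    return [v for v in values if lo <= v <= hi]


if __name__ == "__main__":
    # Kampong stratum mean from Table 10; the readings are made up.
    kampong_mean = 181.8853
    print(cutoffs(kampong_mean))
    print(trim([10.0, 100.0, 300.0, 400.0], kampong_mean))
```

With the Kampong mean, the readings of 10 kWh and 400 kWh fall outside the range and are dropped, while 100 kWh and 300 kWh are retained.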
4.6.2 Commercial Consumer Data
Statistical parameters for commercial precision factor and number of samples calculation:
Mean (µ): 699.7677
Standard Deviation (σ): 1042.4832
Number of samples (n): 74
Standard Normal Variate (Z): 1.645 (for a 90% Confidence Interval)
From Equation 37, Precision (r): 0.2840
Table 11: Commercial Statistics Range
Type                             Mean (kWh) (µ)  +90% (µ*1.90)  -90% (µ*0.10)
Accommodation                    3336.6493       6339.63        333.66
Agriculture                      4968.7967       9440.71        496.88
Communication                    4427.6905       8412.61        442.77
Construction                     2258.4385       4291.03        225.84
Financial Institution            5976.8402       11355.9964     597.684
Insurance                        1800.8203       3421.5586      180.08
Real Estate / Business Services  1386.8678       2635.04        138.67
Recreation                       34293.395       65157.45       3429.34
Residential Shop House           716.94182       1362.19        71.69
Retail                           1165.1658       2213.81        116.52
Social Service                   2449.9959       4654.99        244.99
Transport                        3407.0479       6473.39        340.704
Wholesale                        2143.717        4073.06        214.37
Others                           934.29393       1775.15        93.43
Before obtaining the precision factor, r, the mean consumption of each commercial stratum is multiplied by the factors 1.90 and 0.10 to obtain cutoff points at +90% and –90% of the mean. Table 11 shows the high and low cutoff points for each commercial stratum. The data falling in the bands 0.0 – 0.1 and 1.9 – 2.0 of the mean must then be cleared to complete the normalization process. This yields a normalization range of 0.1 – 1.9 of mean consumption. Figure 26 shows the boxplot representation of the preprocessed data.
Figure 26: SPSS Boxplot of Commercial Consumer Population (CI = 90%)
The detailed SPSS output is appended in Appendix 11 – SPSS Analysis of Commercial Consumer Samples.
4.6.3 Industrial Consumer Data
Statistical parameters for industrial precision factor and number of samples calculation:
Mean (µ): 10540.72
Standard Deviation (σ): 12852.91
Number of samples (n): 63
Standard Normal Variate (Z): 1.645 (for a 90% Confidence Interval)
From Equation 37, Precision (r): 0.252713
Table 12: Industrial Statistics Range
Type             Mean (kWh) (µ)  +90% (µ*1.90)  -90% (µ*0.10)
E&E              36477.009       69306.3171     3647.7009
Paper, Printing  15966.551       30336.4469     1596.6551
Chemical         28071.11        30254.5607     1592.3453
Iron & Steel     12858.318       24430.8042     1285.8318
N. Metal         4442.61         8440.959       444.261
Transport Equip  6801.7306       12923.28814    680.17306
Food             21465.127       40783.7413     2146.5127
Wood & Prod      19312.481       36693.7139     1931.271
Textile          6813.8849       12946.38131    681.38849
Clay-based       18187.995       34557.1905     1818.7995
Plastics         43474.151       82600.8869     4347.4151
Rubber           32501.994       61753.7886     3250.1994
Beverage & T     15522.685       29493.1015     1552.2685
Furniture        17273.1         32818.89       3281.889
Palm Oil         18052.81        34300.339      1805.281
Pharmaceutical   17948.473       34102.0987     1794.8473
Machinery        16415.808       31190.0352     1641.5808
Others           11154.974       21194.4506     1115.4974
Before obtaining the precision factor, r, the mean consumption of each industrial stratum is multiplied by the factors 1.90 and 0.10 to obtain cutoff points at +90% and –90% of the mean. Table 12 shows the high and low cutoff points for each industrial stratum. The data falling in the bands 0.0 – 0.1 and 1.9 – 2.0 of the mean must then be cleared to complete the normalization process. This yields a normalization range of 0.1 – 1.9 of mean consumption. Figure 27 shows the boxplot representation of the preprocessed data.
Figure 27: SPSS Boxplot of Industrial Consumer Population (CI = 90%)
The detailed SPSS output is appended in Appendix 12 – SPSS Analysis of Industrial Consumer Samples
CHAPTER 5 CONCLUSION
This VoLL research was carried out with the objective of developing a formulation which enables the calculation of the composite VoLL for Peninsular Malaysia. To accomplish this, it was necessary first to group or stratify electricity consumers based on their consumption, second to determine each stratum’s weight, and third to determine the individual outage cost of consumers in the identified strata.
The stratification of consumers was done by taking into consideration stratification data from organizations such as FMM, MITI, MIDA, and PTM. TNB’s tariff structure and business codes were also used in the stratification process. The finalized stratification comprises 3 major strata: Domestic, Commercial, and Industrial. There are 4 minor domestic strata, 14 minor commercial strata, and 18 minor industrial strata. The number of minor strata depended on the level of diversity of activity, processes, and consumption within the respective major stratum.
To calculate the sample size, it was necessary to eliminate outliers and extreme values to increase precision. Next, the sample size of each major stratum was calculated in accordance with Equation 37, for r = 0.1 at a CI of 90%. However, for the commercial and industrial strata, this sample size proved impractical and had to be fixed at 5 and 3 samples per minor stratum respectively.
The weight of each stratum was determined based on the stratum’s consumption relative to the total consumption of all strata. Statistical analysis by SPSS showed that the stratification was adequate to provide the level of precision (r = 0.1, CI = 90%) that was required by this research.
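The consumption-based weighting described above can be sketched as follows. The weight calculation follows the description in this chapter; combining the stratum values as a weighted sum is an assumption made here for illustration, and the function names and all figures in the example are hypothetical.

```python
def stratum_weights(consumption):
    """Weight of each stratum = its consumption / total consumption of all strata."""
    total = sum(consumption.values())
    return {name: kwh / total for name, kwh in consumption.items()}


def composite_voll(voll_by_stratum, consumption):
    """Consumption-weighted sum of stratum VoLL values (assumed combination rule)."""
    weights = stratum_weights(consumption)
    return sum(weights[name] * voll_by_stratum[name] for name in voll_by_stratum)


if __name__ == "__main__":
    # Made-up consumption shares and stratum VoLL values, illustration only.
    consumption = {"Domestic": 20.0, "Commercial": 30.0, "Industrial": 50.0}
    voll = {"Domestic": 1.0, "Commercial": 2.0, "Industrial": 3.0}
    print(stratum_weights(consumption))
    print(composite_voll(voll, consumption))
```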
The VoLL research project concluded that the VoLL for the Malaysian ESI is RM22.72/kWh interrupted [59] for a one-hour outage. However, this research noted that the true VoLL is likely to be lower, because limited time and funds restricted the amount of data that could be gathered. Therefore, one more data extraction should be done to capture all consumers in TNB’s billing database, and more samples should be collected, especially for the commercial and industrial strata. While calculating the VoLL can be a tremendous effort, once calculated, it can be used in a variety of techno-economic analyses. The VoLL can be used to determine the optimal reliability and outage cost of a system. Figure 28 illustrates the reliability cost borne by the utility and the reliability worth as perceived by the consumer with reference to the system reliability. By developing the reliability cost and reliability worth curves, we can quantify the reliability cost and outage cost. Summing the two curves, we obtain the total cost curve.
[Figure 28 labels: vertical axis Outage Cost; horizontal axis Reliability; 1. Reliability Cost for TNB (RC, blue); 2. Present Reliability Level; 3. Reliability Worth to Consumer (RW, black); 4. Total Cost (TC, red) = Reliability Cost + Reliability Worth; 5. Optimal Reliability and Cost]
Figure 28: Outage Cost vs. Reliability Curve
The optimal reliability and cost point is at the minimum of the total cost curve. Operating the system at this point allows optimal investment of resources by the utility while at the same time providing an acceptable level of reliability to the consumers. Alternatively, the same techno-economic analysis can be performed on a specific geographical area.
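The search for the minimum of the total cost curve can be sketched numerically. The two curve shapes below are purely hypothetical stand-ins, a utility cost that rises as reliability approaches 1 and a consumer outage cost that falls with reliability; they are not curves derived in this research. Only the idea of summing the two curves and taking the minimum comes from the discussion above.

```python
def reliability_cost(r, a=1.0):
    """Hypothetical utility cost curve: grows as reliability r approaches 1."""
    return a / (1.0 - r)


def outage_cost(r, b=10000.0):
    """Hypothetical consumer outage cost curve: falls as reliability r rises."""
    return b * (1.0 - r)


def total_cost(r):
    """Total cost = reliability cost + outage cost."""
    return reliability_cost(r) + outage_cost(r)


def optimal_reliability(levels):
    """Reliability level with the lowest total cost among the candidates."""
    return min(levels, key=total_cost)


if __name__ == "__main__":
    # Candidate reliability levels from 0.9000 to 0.9999.
    levels = [0.9 + i * 0.0001 for i in range(1000)]
    best = optimal_reliability(levels)
    print(best, total_cost(best))
```

With real reliability cost and outage cost curves in place of the stand-ins, the same search would locate point 5 of Figure 28.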
REFERENCES
[1] H.G. Stoll, “Least-Cost Electric Utility Planning”, John Wiley & Sons, 1989, pp 167, ISBN 0-47163614-2.
[2] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 1-1.
[3] “Optimization - Wikipedia, the free encyclopedia”, http://en.wikipedia.org/wiki/Optimization, 27 October 2005.
[4] “Transmission Network Planning Manual”, Tenaga Nasional Network Planning Division, April 1998, pp 2-1 – 2-8.
[5] Scientific Advisory Panel (SAP), US EPA, Policy for Review of Monte Carlo Analyses for Dietary and Residential Exposure Scenarios Meeting, “Attachment 2: Probabilistic Risk Assessments and Monte-Carlo Methods: A Brief Introduction”, March 1998, pp A2-3, Available at http://www.epa.gov/scipoly/sap/meetings/1998/march/attach2.pdf, Retrieved 21 June 2007.
[6] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 6-1.
[7] A.H. Hashim, “MEE Course Notes”, Universiti Tenaga Nasional, 2005.
[8] H.G. Stoll, “Least-Cost Electric Utility Planning”, John Wiley & Sons, 1989, pp 363 – 365.
[9] R. Billinton & R.N. Allan, “Reliability Evaluation of Power Systems, 2nd Ed”, Plenum Press, New York and London, 1996, pp 302 – 326 and pp 443 – 473, ISBN 0-306-45259-6.
[10] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 1-4.
[11] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 8-13.
[12] K.K. Kariuki & R.N. Allan, “Evaluation of Reliability Worth and Value of Lost Load”, IEEE, Proc. Generation, Transmission, Distribution, Vol. 143, No. 2, March 1996, pp 176 – 177.
[13] Energy Research Institute, Chulalongkorn University, Thailand, “Electricity Outage Cost Study”, Available at http://www.eppo.go.th/power/ERI-study-E/ERI-ExeSummary-E.html, 19 November 2005.
[14] Tenaga Nasional Berhad Annual Report, 2005, pp 5.
[15] Tenaga Nasional Berhad Tariff Book, 2006, pp 1.
[16] Tenaga Nasional Berhad Annual Report, 2005, pp 62.
[17] A.S. Hornby, “Oxford Advanced Learner’s Dictionary, 4th Edition”, Oxford University Press, 1989, pp 230.
[18] S.M. Stigler, “The History of Statistics: The Measurement of Uncertainty before 1900”, Cambridge, MA, and London, Belknap Press of Harvard University Press, 1990, ISBN 0-674-40341-X.
[19] http://en.wikipedia.org/wiki/Statistics, Retrieved 21 September 2006.
[20] Inspired by Figure 4.3 of A.W. Ward & M. Murray-Ward, “Assessment in the Classroom”, Wadsworth Pub Co, Belmont, CA, United States, 1999, pp 74, ISBN 0-53-452704-3.
[21] http://dictionary.reference.com/browse/statistics, Retrieved 16 October 2006.
[22] The American Heritage® Dictionary, Fourth Edition, “Statistics”, ISBN 0-39-544895-6, Available at http://education.yahoo.com/reference/dictionary/entry/statistics, Retrieved 16 October 2006.
[23] M. Foucault & R. Hurley, “The Will to Knowledge”, Penguin Books Ltd, 1977, ISBN 0-14026868-5.
[24] http://en.wikipedia.org/wiki/Statistics, Retrieved 21 September 2006.
[25] I. Hacking, “The Emergence of Probability: A Philosophical Study of Early Ideas About Probability, Induction and Statistical Inference, 2nd Ed”, Cambridge University Press, 1984, ISBN 0-52-131803-3.
[26] K.L. Wuensch, Department of Psychology, East Carolina University, “When does correlation imply causation?”
[27] http://en.wikipedia.org/wiki/Experimental_design, Retrieved 7 November 2006.
[28] S.M. Jex, “Organizational Psychology: A Scientist Practitioner Approach”, John Wiley & Sons, New York, 2002, ISBN 0-47-137420-2.
[29] J. Best, “Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists”, University of California Press, 2001.
[30] D. Huff, “How to Lie with Statistics”, Penguin Books Ltd, 1991, ISBN 0-14-013629-0.
[31] R.P. Abelson, “Statistics As Principled Argument”, LEA, Inc, 1995, ISBN 0-80-580528-1.
[32] http://dictionary.laborlawtalk.com/Normal_distribution, Retrieved 12 November 2006.
[33] J.H. Zar, “Biostatistical Analysis”, Prentice Hall International, New Jersey, 1984, pp 43 – 45, ISBN 0-13-100846-3.
[34] J.K. Taylor, “Statistical Techniques for Data Analysis”, Lewis Publishers, New York, 1990, pp 68.
[35] J.W. Wittwer, “Monte Carlo Simulation in Excel: A Practical Guide”, http://vertex42.com/ExcelArticles/mc/, Retrieved June 1, 2004.
[36] L.D. Brown, T. Tony Cai, A. DasGupta, “Interval Estimation for a Binomial Proportion”, Statistical Science, Volume 16, Number 2 (May, 2001), pp 101 – 117.
[37] J.D. Petruccelli, B. Nandram, M. Chen, “Applied Statistics for Engineers and Scientists, 1st Ed”, Prentice Hall, New Jersey, 1999, pp 58 – 59, ISBN 0-13-565953-1.
[38] D.S. Moore & G.P. McCabe, “Introduction to the Practice of Statistics, 3rd Ed”, W. H. Freeman, New York, 1999.
[39] J.S. Milton & J.C. Arnold, “Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences, 3rd Ed”, McGraw-Hill Companies, 1994, pp 204 – 205, ISBN 0-07-042623-6.
[40] Renze, John, “Outlier”, From MathWorld – A Wolfram Web Resource, created by Eric W. Weisstein, available at http://mathworld.wolfram.com/Outlier.html, Retrieved 7 November 2006.
[41] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 6-2.
[42] http://www.statsoft.com/textbook/sttable.html#z, Retrieved 11 November 2006.
[43] Tenaga Nasional Berhad Annual Report, 2005, pp 5.
[44] http://en.wikipedia.org/wiki/Deterministic_computation, Retrieved 15 August 2006.
[45] Andrew F. Siegel, “Practical Business Statistics, 4th Edition”, Irwin McGraw-Hill, 2000, pp 279 – 282.
[46] http://en.wikipedia.org/wiki/Stratified_sampling, Retrieved 14 August 2006.
[47] http://www.dcs.napier.ac.uk/peas/sratheory.htm#tion, Retrieved 14 August 2006.
[48] http://en.wikipedia.org/wiki/Sampling_%28statistics%29, Retrieved 11 August 2006.
[49] Makerere University Institute of Statistics & Applied Economics (ISAE).
[50] http://www.statpac.com/surveys/sampling.html, Retrieved 14 August 2006.
[51] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 4-5.
[52] “List of promoted Activities and products”, Ministry of Industrial and Trade International, 2005.
[53] Energy Intensity pg 34, A report on Energy Efficiency in Malaysian Industries: Economic Analysis and Recommendations for an Institutional Framework, Final Report November 1994, Energy Conservation Study (ADB-TA No 1574-MAL), Ministry of Energy, Telecommunication and Post.
[54] Directory Federation of Manufacturing Malaysia 2005, Federation of Manufacturing Malaysia, 2005.
[55] TNBD Consumer Database printed on 21 February 2006.
[56] T.W. Anderson & Stanley L. Sclove, “Statistical Analysis of Data, 2nd Ed”, Scientific Press, Palo Alto, Cal., 1986, pp 580 – 581.
[57] http://www.webopedia.com/TERM/n/normalization.html, Retrieved 10 August 2006.
[58] Douglas A. Lind, Robert D. Mason, William G. Marchal, “Basic Statistics for Business and Economics, 3rd Edition”, Irwin McGraw-Hill, 2000, pp 197 – 211.
[59] TNB Research Sdn Bhd & Power Engineering Center, Universiti Tenaga Nasional, “Determining the Value of Lost Load in the Malaysian Electricity Supply Industry”, November 2006.
Appendix
APPENDIX 1 – TNB BUSINESS CODES
Business Code  Description  Type of consumer
11111  Agriculture:Padi  COM
11112  Agriculture:Tobacco  COM
11113  Agriculture:Tapioca  COM
11114  Agriculture:Sugar-cane  COM
11115  Agriculture:Market gardening  COM
11119  Agriculture:Other crops n.e.c  COM
11120  Agriculture services  COM
11121  Agriculture:Rubber estate  COM
11122  Agriculture:Rubber smallholdings  COM
11123  Agriculture:Oil palm estate  COM
11124  Agriculture:Oil palm smallholdings  COM
11125  Agriculture:Coconut estate  COM
11126  Agriculture:Coconut smallholdings  COM
11127  Agriculture:Tea estate  COM
11128  Agriculture:Coffee estate  COM
11129  Agriculture:Cocoa  COM
11131  Agriculture:Pepper  COM
11132  Agriculture:Pineapple  COM
11133  Agriculture:Banana  COM
11134  Agriculture:Other fruits growing  COM
11139  Agriculture:Other permanent crops  COM
11190  Agri-Livestock:Other n.e.c. 11119  COM
11191  Agri-Livestock:Pig rearing  COM
11192  Agri-Livestock:Cattle & dairy  COM
11193  Agri-Livestock:Poultry & hatching  COM
11199  Agri-Livestock:Other n.e.c.  COM
11300  Hunting, Trapping & Games propagation
12101  Forestry:Coll’n of attap + other
12102  Forestry:Charcoal burning
12109  Forestry:Other forestry industries
12200  Logging
13010  Fishing:Ocean & Coastal Fishing
13021  Fishing:Inland fishing
13029  Fishing:Not elsewhere classified
14001  Public Lighting:Telephone kiosk
14002  Public Lighting:Decorative lighting
14003  Public Lighting:Street lighting
14004  Public Lighting:Bus stand
14005  Public Lighting:Signboard;Advert
14009  Public Lighting:Not elsewhere classified
14010  Public Lighting:Village street light
21000  Mining:Coal
22000  Crude Petroleum & Natural gas prod.  IND
23010  Mining:Iron Ore
23021  Mining:Tin Dredging
23022  Mining:Tin mining & other dredging
23023  Mining:Dulang washing
23024  Mining:Amang treatment
23025  Mining:Bauxite
23026  Mining:Gold
23027  Mining:Copper
23028  Mining:Antimony
23029  Mining:Non-ferrous & metal ore nec
29011  Mining:Limestone quarrying
29012  Mining:Other stone quarrying
29013  Mining:Clay,sand&gravel pits
29021  Mining:Guano gathering
29029  Mining:Other chemical,fertiliser
29030  Mining:Salt
29090  Mining&quarrying:Others
31110  Manufac:Meat preparation  IND
31121  Manufac:Dairy prod. Ice cream  IND
31129  Manufac:Dairy prod. Others  IND
31131  Manufac:Canning pineapple  IND
31139  Manufac:Canning fruits,vegetables  IND
31140  Manufac:Canning fish, crustacea  IND
31151  Manufac:Coconut oil  IND
31152  Manufac:Palm oil  IND
31153  Manufac:Palm kernel oil  IND
31159  Manufac:Other vege. & animal oils  IND
31161  Manufac:Small rice mills  IND
31162  Manufac:Large rice mills  IND
31163  Manufac:Flour mills  IND
31164  Manufac:Sago & tapioca factories  IND
31169  Manufac:Other grain milling  IND
31171  Manufac:Biscuit factories  IND
31172  Manufac:Bakeries  IND
31180  Manufac:Sugar factories&refine  IND
31190  Manufac:Cocoa,choc. & sugar conf  IND
31211  Manufac:Ice factories  IND
31212  Manufac:Coffee factories  IND
31213  Manufac:Tea factories  IND
31214  Manufac:Meehoon,noodles,&related  IND
31215  Manufac:Spices&curry powder  IND
31216  Manufac:Starch  IND
31219  Manufac:Other food products  IND
31220  Manufac:Prepared animal feeds  IND
31310  Manufac:Distilling,blend’g spirit  IND
31320  Manufac:Wine industries  IND
31330  Manufac:Malt liquors & malt  IND
31340  Manufac:Soft drinks&Carbonated  IND
31400  Manufac:Tobacco  IND
32111  Manufac:Natural fibre spinning  IND
32112  Manufac:Non-batek Dyeing,bleach  IND
32113  Manufac:Handicraft spin’g,weav’g  IND
32114  Manufac:Batek making  IND
32115  Manufac:Synthetic textile mills  IND
32119  Manufac:Misc primary textiles  IND
32120  Manufac:Non-wearing apparel  IND
32190  Manufac:Textile (material)  IND
32201  Manufac:Clothing  IND
32202  Manufac:Custom tailoring  IND
32209  Manufac:Misc wearing apparel  IND
32310  Manufac:Tanneries & leather fin’g  IND
32320  Manufac:Fur dressing & dyeing ind  IND
32330  Manufac:Leather prod. Excp. F.w/w.a  IND
32400  Manufac:Footwear excp Vul,Rub,Pla  IND
33110  Manufac:Wood(33119)  IND
33111  Manufac:Sawmills  IND
33112  Manufac:Plywood,hardb’d & particl  IND
33113  Manufac:Plammg,window,door,join  IND
33114  Manufac:Prefab. Wooden house  IND
33119  Manufac:Other wood products  IND
33120  Manufac:Wooden & cane container  IND
33190  Manufac:Wood&cork products  IND
33200  Manufac:Non-metal furnit/fixture  IND
34100  Manufac:Paper  IND
34110  Manufac:Pulp,paper&paperboard  IND
34120  Manufac:Container,paper boxes  IND
34190  Manufac:Other pulp,paper,paperb’d  IND
34200  Manufac:Printing,Publishing,Allied  IND
35111  Manufac:Industrial gases  IND
35119  Manufac:Other industrial chemical  IND
35120  Manufac:Fertilizers & pesticides  IND
35130  Manufac:Synt.resins,plastic,fibr  IND
35210  Manufac:Paints,varnishes,lacquers  IND
35220  Manufac:Drugs & medicine  IND
35230  Manufac:Perfumes,cosmetics (35239)  IND
35231  Manufac:Soap,cleaning preparation  IND
35238  Manufac:Incense / Joss sticks  IND
35239  Manufac:Perfumes,cosmetics,toilet  IND
35290  Manufac:Other chemical products  IND
35300  Manufac:Petroleum refineries  IND
35400  Manufac:Misc.petroleum/local prod  IND
35510  Manufac:Tyre & tube industries  IND
35590  Manufac:Rubber products (35599)  IND
35591  Manufac:Rubber remilling, latex  IND
35599  Manufac:Other rubber products  IND
35600  Manufac:Other plastic products  IND
36100  Manufac:Pottery,China,earthware  IND
36200  Manufac:Glass, glass product  IND
36910  Manufac:Structural clay products  IND
36911  Manufac:Bricks  IND
36920  Manufac:Cement (36991)  IND
36921  Manufac:Hydraulic cement  IND
36922  Manufac:Lime and plaster  IND
36991  Manufac:Cement&concrete product  IND
36992  Manufac:Cut-stone&stone product  IND
26993  Manufac:Marble slabs  IND
36999  Manufac:Other non-metallic mineral  IND
37101  Manufac:Primary iron & steel basic  IND
37102  Manufac:Foundries  IND
37109  Manufac:Other iron&steel  IND
37201  Manufac:Tin smelting  IND
37202  Manufac:Tin recycling  IND
37209  Manufac:Other non-ferrous ind  IND
38111  Manufac:Cutlery,handtools,h/w  IND
38112  Manufac:Tinsmithing/Blacksmithing  IND
38120  Manufac:Metal furniture/fixture  IND
38130  Manufac:Metal structural products  IND
38191  Manufac:Cans & metal boxes  IND
38192  Manufac:Wire&wire products  IND
38193  Manufac:Brass,copper,pewter,alumi  IND
38199  Manufac:Other fabricated metal p.  IND
38210  Manufac:Engines & turbines  IND
38220  Manufac:Agri. Machineries/equip’t  IND
38230  Manufac:Metal&wood working mach  IND
38240  Manufac:Spec. ind. Mach.excp.metal  IND
38250  Manufac:Office,computing,accounting  IND
38291  Manufac:Refrig.exhaust,air-cond m  IND
38299  Manufac:Other machinery equipm’t  IND
38310  Manufac:Elec.ind.mach. & apparatus  IND
38321  Manufac:Radio,TV related equipm’t  IND
38322  Manufac:Gramophone rec.,tapes  IND
38329  Manufac:Semi-cond.,commu. Equipm’t  IND
38330  Manufac:Elec.appliances/houseware  IND
38390  Manufac:Battery,lamps,etc  IND
38391  Manufac:Cables&wires  IND
38392  Manufac:Dry cells & storage batteries  IND
38393  Manufac:Electric lamps & tubes  IND
38399  Manufac:Misc. elec. Apparatus/supp  IND
38410  Manufac:Shipbuilding&repairing  IND
38420  Manufac:Railroad equipment  IND
38431  Manufac:Motor vehicle bodies  IND
38432  Manufac:Motor vehicles, assembly  IND
38439  Manufac:Motor veh, parts/accessor  IND
38441  Manufac:Motor cycle, assembly  IND
38449  Manufac:Bi-,tri-cycles, trishaws  IND
38450  Manufac:Aircraft  IND
38490  Manufac:Other transport equipm’t  IND
38510  Manufac:Prof. & scient. Equip’t  IND
38520  Manufac:Photographic/optical good  IND
38530  Manufac:Watches and clocks  IND
39010  Manufac:Jewellery & related art  IND
39020  Manufac:Musical instruments  IND
39030  Manufac:Sporting&athletic goods  IND
39091  Manufac:Brooms, brushes and mops  IND
39092  Manufac:Pens,pencils,office supp.  IND
39093  Manufac:Toys  IND
39094  Manufac:Umbrella  IND
39099  Manufac:Other manufacturing ind  IND
41010  Utility:Electric light & power
41020  Utility:Gas manufacture & distr
41030  Utility:Steam & hot water supply
41040  Utility:Telecomm exchange
42000  Utility:Water works & supply
42001  Utility:Water reservoir
42002  Utility:Water treatment plant
43000  Utility:TNB sub station
50011  Construc:Residential construs  COM
50012  Construc:Non-residential construst’n  COM
50013  Construc:Civil engineering constr  COM
50021  Construc:Metal work contractors  COM
50022  Construc:Electrical contractor  COM
50023  Construc:Plumbing,sewage,sanitary  COM
50024  Construc:Air-cond, fridge contractor  COM
50025  Construc:Bricklaying contractor  COM
50026  Construc:Painting contractor  COM
50027  Construc:Carpentry contractor  COM
50028  Construc:Cement,concrete work cont  COM
50029  Construc:Special trade contractors  COM
61110  Wholesale:Trade:Meat,poultry  COM
61120  Wholesale:Fish-fresh/frozen/dried  COM
61130  Wholesale:Fruits and vegetables  COM
61140  Wholesale:Confectionery sweets etc  COM
61150  Wholesale:Bakery products  COM
61160  Wholesale:Rice,other grains  COM
61170  Wholesale:Other food stuffs  COM
61180  Wholesale:Tobacco products  COM
61190  Wholesale:Alcoholic beverages  COM
61211  Wholesale:Household hardware  COM
61212  Wholesale:Household goods/applian  COM
61213  Wholesale:Furniture,furnishing  COM
61214  Wholesale:Clothing,textiles,etc  COM
61215  Wholesale:Footwear  COM
61216  Wholesale:Chemists’ goods, cosmetics  COM
61217  Wholesale:Book,stationery, magazine  COM
61218  Wholesale:Jewellery,watches,silver  COM
61219  Wholesale:Bicycles&parts thereof  COM
61221  Wholesale:Photographic equip.  COM
61222  Wholesale:Other household/per. Itm  COM
61310  Wholesale:Motorcycles & parts  COM
61321  Wholesale:Passenger cars-New  COM
61329  Wholesale:Industrials,comm.veh-New  COM
61331  Wholesale:Passenger-Used  COM
61339  Wholesale:Industrials,comm.veh-Used  COM
61340  Wholesale:Motor parts & accessory  COM
61390  Wholesale:Petrol,lubricating oils  COM
61410  Wholesale:Tractors,farming eqp&prt  COM
61420  Wholesale:Business&Indust. Eqp&prt  COM
61430  Wholesale:Lumber & timber  COM
61440  Wholesale:Other building materials  COM
61450  Wholesale:Rubber  COM
61460  Wholesale:Palm oil  COM
61470  Wholesale:Livestock  COM
61480  Wholesale:Other agricultural prod.  COM
61490  Wholesale:Other(e.g. mineral)  COM
61500  Wholesale:Large general wholesaler  COM
62110  Retail:Meat&poultry (fresh/frozen)  COM
62120  Retail:Fish-fresh/frozen/dried  COM
62130  Retail:Fruits & vegetables  COM
62140  Retail:Confectionery (sweets etc)  COM
62150  Retail:Bakery products  COM
62160  Retail:Rice,other grains,flour  COM
62170  Retail:Other food stuffs  COM
62180  Retail:Provision store  COM
62190  Retail:Super market  COM
62210  Retail:Tobacco products  COM
62220  Retail:Alcoholic beverages  COM
62311  Retail:Household hardware  COM
62312  Retail:Household good/appliances  COM
62313  Retail:Furniture,furnishing  COM
62314
Retail:Clothing,textiles,etc
COM
62315
Retail:Footware
COM
62316
Retail:Chemists’goods, cosmetics
COM
62317
Retail:Books,stationery,magazines
COM
62318
Retail:Jewellery,watches,silverwar
COM
62319
Retail:Bicycles&part thereof
COM
62320
Retail:Telecommunications products
COM
62321
Retail:General merchandise
COM
62322
Retail:Photographic eqp./supp
COM
62323
Retail:Others
COM
62410
Retail:Motorcycles & parts
COM
62421
Retail:Passenger cars-New
COM
62429
Retail:Passenger-Used
COM
62430
Retail:Motor parts & accessory
COM
62490
Retail:Petrol,lubricatig oils
COM
62500
Retail:Misc.retail trades
COM
62501
Retail:Business Complex(<4 stor
COM
62502
Retail:Business Complex(>4 stor
COM
63100
Retail:Restaurants,cafes etc
COM
63200
Accommodate:Rest house
COM
63201
Accommodate:Hotel below 3 star
COM
63202
Accommodate:Hotel 3 star
COM
63203
Accommodate:Hotel 4 star
COM
63204
Accommodate:Hotel 5 star
COM
63205
Accommodate:Challet,motel
COM
63206
Accommodate:Hostel
COM
63211
Residence:Semi-D (1storey)
DOM
63212
Residence:Semi-D (>1storey)
DOM
63221
Residence:Link house (1 storey)
DOM
63222
Residence:Link house (>1 storey)
DOM
63223
Residence:Shop house
COM
63224
Residence:Luxury link house(>1st)
DOM
63225
Residence:Townhouse(>1 storey)
DOM
63231
Residence:Medium cost apartment
DOM
63232
Residence:Condominium
DOM
63233
Residence:Flat
DOM
63234
Residence:low cost Apt(<=3 rooms)
DOM
63235
Residence:Low cost Apt(>3 rooms)
DOM
63236
Residence:Apt/Condominium (<3 rooms)
DOM
63237
Residence:Apt/Condominium (>3 rooms)
DOM
63238
Residence:Luxury Apt/Condominium
DOM
63240
Residence:Squatter house
DOM
63241
Residence:Long house (temporary)
DOM
63250
Residence:Bungalow (LPC)
DOM
63251
Residence:Bungalow (1 storey)
DOM
63252
Residence:Bungalow (> 1 storey)
DOM
63253
Residence:Kampong house
DOM
66666
Residence:Palace (formal&informal)
DOM
71110
Transp:Railway transport
COM
71120
Transp:Bus & tramway transport
COM
71121
Transp:Plaza toll serv. & resthouse
COM
71131
Transp:Taxi service
COM
71139
Transp:Others
COM
71140
Transp:Freight transport by road
COM
71150
Transp:Pipeline transport
COM
71160
Transp:Supptng serv. To land trans
COM
71210
Transp:Ocean&coastal water transport
COM
71220
Transp:Inland water transport
COM
71230
Transp:Supptng serv. To water transport
COM
71310
Transp:Air transport carriers
COM
71320
Transp:Supptng serv. to air transport
COM
71911
Transp:Travel&tourist agencies
COM
71919
Transp:Other transport services
COM
71920
Transp:Storage & warehousing
COM
72001
Communication:Post
COM
72009
Communication:Telecommunication
COM
81011
FinInst:Central bank
COM
81012
FinInst:Commercial bank
COM
81021
FinInst:Pawnbrokers
COM
81022
FinInst:Other financial institu’n
COM
81029
FinInst:Govern.bank (BSN, B.Negara)
COM
81030
FinInst:Financial services
COM
82001
Insurance:Life insurance
COM
82002
Insurance:Provident&pension fund
COM
82003
Insurance:Casualty insurance
COM
82004
Insurance:Social security org’n
COM
82005
Insurance:Insurance services
COM
83101
RealEstate:Operations
COM
83102
RealEstate:Developments
COM
83103
RealEstate:Services
COM
83210
BusinessServ:Legal services
COM
83220
BusinessServ:Accountng,audtg etc
COM
83230
BusinessServ:Data proc.&tabulatng
COM
83240
BusinessServ:Engineering, architect.
COM
83250
BusinessServ:Advertising
COM
83291
BusinessServ:Estate agencies
COM
83292
BusinessServ:Labour contracting
COM
83299
BusinessServ:Other non-M&E rental
COM
83300
BusinessServ:M&E rental/leasing
COM
91110
Pub.Adm/Defence:General admin
91120
Pub.Adm/Defence:External affairs
91130
Pub.Adm/Defence:Justice/public
91140
Pub.Adm/Defence:Defence
91150
Pub.Adm/Defence:Educational admin
91160
Pub.Adm/Defence:Health admin
91170
Pub.Adm/Defence:Soc.sec.&welfare
91180
Pub.Adm/Defence:Housg&comm.dev
91190
Pub.Adm/Defence:Other com./soc.af
91210
Pub.Adm/Defence:Econ.&labour aff
91220
Pub.Adm/Defence:Agri,forestry,fis
91230
Pub.Adm/Defence:Mining,manuf.cons
91240
Pub.Adm/Defence:Elec.gas &water
91250
Pub.Adm/Defence:Roads&transport
91260
Pub.Adm/Defence:Water transport
91270
Pub.Adm/Defence:Other transport
91280
Pub.Adm/Defence:Communication
91290
Pub.Adm/Defence:Other services
92000
Sanitary & similar services
93100
SocServ:Educational services
COM
93102
SocServ:Sch with hostel
COM
93103
SocServ:university,college
COM
93104
SocServ:Primary sch
COM
93105
SocServ:Secondary sch
COM
93106
SocServ:Kindergarten,nursery
COM
93200
SocServ:Research &Science Inst.
COM
93311
SocServ:Hospital service
COM
93312
SocServ:Specialst(E&T,O&G,dental)
COM
93313
SocServ:Country clinic
COM
93314
SocServ:Private clinic
COM
93315
SocServ:District Health Centre
COM
93320
SocServ:Veterinary services
COM
93400
SocServ:Welfare institutions
COM
93500
SocServ:Busi,Prof,Labour assoc
93600
SocServ:Community Hall
93650
SocServ:Town Hall
93910
SocServ:religious organisations
93911
SocServ:Surau
93912
SocServ:Mosque
93913
SocServ:Church
93914
SocServ:Temple
93990
SocServ:Social& related com.serv
94110
Recreatn:Motion picture product’n
COM
94120
Recreatn:Motion picturedist.&project
COM
94130
Recreatn:Radio&TV broadcasting
COM
94140
Recreatn:Theatrical producers
COM
94150
Recreatn:Author,music composer
COM
94200
Recreatn:Library,museums,gardens
COM
94900
Recreatn:Other amusement services
COM
94901
Recreatn:Stadium
COM
95110
OtherServ:Footwear/leather repair
COM
95120
OtherServ:Electrical repair shops
COM
95130
OtherServ:Motor vehicle repair
COM
95140
OtherServ:Watch,jewellery repairs
COM
95151
OtherServ:Bicycle repairs
COM
95159
OtherServ:Other repair shops
COM
95200
OtherServ:Laundry,cleaning serv.
COM
95300
OtherServ:Domestic services
COM
95910
OtherServ:Barber&beauty shops
COM
95920
OtherServ:Photographic studios
COM
95990
OtherServ:Other personal services
COM
96000
International & extra-terr ser
99999
Reserved for interest
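The listing above pairs each business code with a description and, where one is given, a class label (COM or DOM, which appear to denote commercial and domestic consumers). As a minimal sketch of how such a listing might be grouped into strata by class, the following reproduces only a few rows from the table. The record layout, the function name `group_by_class` and the `UNSPECIFIED` label are illustrative assumptions and do not come from the dissertation.

```python
from collections import defaultdict

# A few rows copied from the table above: (code, description, class).
# None marks entries for which the table lists no class label.
ROWS = [
    ("62190", "Retail:Super market", "COM"),
    ("63232", "Residence:Condominium", "DOM"),
    ("63223", "Residence:Shop house", "COM"),
    ("91140", "Pub.Adm/Defence:Defence", None),
]


def group_by_class(rows):
    """Return {class label: [business codes]}.

    Rows without a class label are collected under 'UNSPECIFIED'
    (a hypothetical label chosen for this sketch).
    """
    strata = defaultdict(list)
    for code, _description, cls in rows:
        strata[cls or "UNSPECIFIED"].append(code)
    return dict(strata)


strata = group_by_class(ROWS)
print(strata)
```

Grouping on the class label keeps unclassified codes visible in their own bucket instead of silently dropping them, which matters if stratum sizes are later used to allocate a sample.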
APPENDIX 2 – LIST OF COMMERCIAL CONSUMERS INTERVIEWED
Company Name
Type
Enhance Track Sdn Bhd
Real Estate/Business Services
Mun Chuen Transport Sdn Bhd
Transport
K.T. Beach Resort
Accommodation
Sams Metal Trading (KT) Sdn Bhd
Wholesale
S.P. Chong (M) Sdn Bhd
Retail
Pustaka Seri Intan Sdn Bhd
Retail
Lee Electrical & Refrigeration Services
Real Estate/Business Services
Lam Soon Edible Oils Sdn Bhd
Wholesale
Lau & Partners Pharmacy Sdn Bhd
Retail
Terengganu Refrigeration & Elec Services
Real Estate/Business Services
Hotel KT Mutiara Sdn Bhd
Accommodation
Bumiputra-Commerce Bank Bhd
Financial Institution
Fibrecomm Network
Communication
Wanziehan Enterprise
Wholesale
Tadika Ezi
Social Services
Perniagaan Kinta Mewah
Wholesale
Boon Hoi (pemborong telur & plastik)
Wholesale
Seng Hing Confectionery Sdn Bhd
Wholesale
Risya&Anita Beauty Centre
Residence/Shop House
Perniagaan Shah Ain
Residence/Shop House
Serdang Bakery
Residence/Shop House
K.H.Chin Enterprise
Residence/Shop House
Syarikat Anita Marina Agencies Bumiputra-Commerce Bank Bhd Enviro Lift Sdn. Bhd
Residence/Shop House Real Estate/Business Services Financial Institution Agriculture
Time dot.com Berhad
Communication
Maxis Mobile
Communication
Uni Asia
Insurance
Unique Bubbles Sdn Bhd
Recreation
Tahan Insurance
Insurance
Freight Team (M) Sdn Bhd
Transport
Glomac Enterprise Sdn Bhd
Recreation
APL-NOL (Malaysia) Sdn Bhd
Transport
Viva Life Sdn Bhd
Insurance
Jambatan Merah Sdn Bhd
Transport
Seagull Logistic Sdn Bhd
Transport
Applewood Sdn Bhd
Transport
Berjaya Prudential Assurance Bhd
Insurance
Sunway Risk Management Sdn Bhd
Insurance
Pusat Rekreasi Pelangi
Recreation
Bonus Properties Sdn Bhd
Recreation
RJ Family Store Sdn Bhd
Retail
How Soon Hardware Trading
Construction
CIMB Bank Bhd (Seremban Branch)
Financial Institution
Maybank Berhad (Seremban)
Financial Institution
Allson Klana Resort
Accommodation
Aw & Lena Plumbing Construction
Construction
Eurotral Wheels Sdn Bhd
Transport
Day & Day Enterprise
Real Estate/Business Services
Maxis Communications Berhad (S’ban)
Communication
Golden Screen Cinemas (Seremban)
Recreation
Domino’s Pizza (Seremban)
Others
Avillion Port Dickson
Accommodation
Corus Paradise Resort Port Dickson
Accommodation
Billion Shopping Center Port Dickson
Retail
Royal Adelphi Seremban
Accommodation
Dreamz Car Wash
Others
Kimmark (M) Sdn. Bhd
Retail
Exact Automation Sdn. Bhd
Others
Sin Weng Soon Motor Workshop
Others
Klinik Raj & Rakan Rakan
Social Services
Klinik Raj & Rakan Rakan
Social Services
MAZ International School
Social Services
Super Racing (Workshop)
Others
Super Racing (Sales of Motorcycle)
Others
CALTEX Petrol Station
Wholesale
APPENDIX 3 – LIST OF INDUSTRIAL CONSUMERS INTERVIEWED
Company Name
Type
Texas Instruments
Electrical/Electronics
BH Bakery
Food
Exxon Mobil Production
Chemicals & Petrochemicals
Pertima Terengganu
Food
Atlas Edible Ice (Pantai Timur) Sdn Bhd
Food
KT Ice Sdn Bhd
Food
KY Food Industries Sdn Bhd
Food
F&N Coca Cola Sdn Bhd, KT
Beverage & Tobacco
MSET Ship Building Corp Sdn Bhd
Transport Equipment
My Socks Malaysia Sdn Bhd
Textiles & its products
Permint Suterasemai Sdn Bhd
Textiles & its products
Noor Arfa Batek Sdn Bhd
Textiles & its products
Utusan Melayu (M) Berhad, KT
Paper, Printing & Publishing
S.E.H. (Shah Alam) Sdn Bhd
Electrical/Electronics
Jernih Ais Sdn Bhd
Food
Freescale, PJ
Electrical/Electronics
F&N Coca Cola Sdn Bhd, KL
Beverage & Tobacco
Dutch Lady Malaysia, PJ
Beverage & Tobacco
Amsteel Mills Sdn Bhd
Iron & Steel
Aluminium Company of Malaysia Berhad
Nonferrous Metals & its products
Somerville (M) Sdn Bhd
Iron & Steel
Porite (Malaysia) Sdn Bhd
Others
Muda Paper Mills Sdn Bhd
Paper, Printing & Publishing
MetTube Sdn Bhd
Nonferrous Metals & its products
Antara Steel Mill Sdn Bhd
Iron & Steel
Malaysian Marine Heavy Engineering
Others
Flextronics Ind (Malaysia) Sdn Bhd
Electrical/Electronics
Lafarge Cement (Southern Cement)
Clay-based & other non-metallic minerals
URC Snack Foods (Malaysia) Sdn Bhd
Food
Malaysian Sheet Glass Sdn Bhd
Nonferrous Metals & its products
Yeo Hiap Seng
Beverage & Tobacco
Universal Nutribeverage Sdn. Bhd
Beverage & Tobacco
Paling Plastic Industry
Plastics
Ornasteel (M)
Iron & Steel
Iamko Metal Industry Sdn. Bhd
Iron & Steel
United Power Cable (M)
Transport Equipment
Beryl’s Confectionary
Food
World Prominence Sdn Bhd
Food
Penang Seagate
Electrical/Electronics
AMD Expansion
Electrical/Electronics
ON Semiconductor
Electrical/Electronics
Rohm-Wako Electronics (M) Sdn Bhd
Electrical/Electronics
S.M Biomed Sdn. Bhd
Pharmaceutical
Continental Sime Tyre PJ Sdn Bhd
Rubber
Hitachi Construction
Machinery & Machinery Equipment
United Bintang
Machinery & Machinery Equipment
CCM Chemicals Sdn Bhd
Chemicals & Petrochemicals
Idemitsu Chemicals (M) Sdn Bhd
Chemicals & Petrochemicals
Kiswire Sdn Bhd
Iron & Steel
CMKS (Malaysia) Sdn Bhd
Electrical/Electronics
Pharmaniaga
Pharmaceutical
Taisho Pharmaceutical (M) Sdn Bhd
Pharmaceutical
Takeuchi MDF Sdn Bhd
Wood & its products
Autoways Sdn. Bhd.
Rubber
Hopetech Sdn Bhd
Electrical/Electronics
Yamaha Electronics Manufacturing(M)
Electrical/Electronics
Aquila Sofa Industries Sdn Bhd
Furniture & Fixtures
WF Furniture & Renovation Sdn Bhd
Furniture & Fixtures
Sykt. Cahaya Muda Perak Sdn Bhd
Palm & Palm Kernel Oil
Unitata Berhad
Palm & Palm Kernel Oil
Goodyear Malaysia Berhad
Rubber
Proton (Tg Malim)
Transport Equipment
Palmaju Edible Oil Sdn Bhd
Palm & Palm Kernel Oil
APPENDIX 4 – DOMESTIC QUESTIONNAIRE
[Domestic questionnaire form, pages 1–6: the text did not survive extraction from the source.]
APPENDIX 5 – COMMERCIAL QUESTIONNAIRE
Commercial
Interviewer:
___________________
Date Interview Completed:
___________________
Interview Start Time:
___________________
Name of Respondent:
________________________________________
Title:
____________________________
Contact Number:
____________________________
Company name:
________________________________________
Company Address:
________________________________________ _________________________________________
Business Type (Select One):
Accommodation
Recreation
Agriculture
Residence / Shop House
Communication
Retail
Construction
Social Service
Financial Institution
Transport
Insurance
Wholesale
Real Estate / Business Services
Others (Specify) _______________
Please be informed that all the information you provide here is strictly confidential.
[English-language commercial questionnaire, pages 2–10: the question text did not survive extraction from the source. The legible elements are the five-point response scales, anchored "Dissatisfied" (1) to "Satisfied" (5) and "Not Disruptive" (1) to "Disruptive" (5), and the six outage scenarios listed below.]
Case 1: Complete outage. Warning: None. Start time: 2 p.m. Duration: 1 hour
Case 2: Complete outage. Warning: None. Start time: 2 p.m. Duration: 2 hours
Case 3: Complete outage. Warning: None. Start time: 2 p.m. Duration: 4 hours
Case 4: Complete outage. Warning: None. Start time: 2 p.m. Duration: 8 hours
Case 5: Complete outage. Warning: None. Start time: 2 p.m. Duration: 0–2 seconds
Case 6: Complete outage. Warning: 1 hour advance warning. Start time: 2 p.m. Duration: 1 hour
Commercial (translated from the Malay-language form)
Interviewer:
___________________
Date:
___________________
Time:
___________________
Name of Respondent:
________________________________________
Title:
____________________________
Contact Number:
____________________________
Company Name:
________________________________________
Address:
________________________________________ _________________________________________
Business Type (Select One):
Accommodation
Recreation
Agriculture
Residence / Shop House
Communication
Retail
Construction
Social Services
Financial Institution
Transport
Insurance
Wholesale
Real Estate / Business Services
Others (Specify) _______________
All information provided here is confidential.
COMMERCIAL/INDUSTRIAL ELECTRICITY SUPPLY INTERRUPTION SURVEY
BASIC QUESTIONS
1. Has your organisation experienced any power interruptions in the last 12 months? (SELECT ONE)
a. No (GO TO QUESTION 3)
b. Yes
2. In general, do these interruptions severely disrupt your company's operations? (SELECT ONE)
1   2   3   4   5
3. Does your company send employees home when a power interruption occurs? (SELECT ONE)
a. No
b. Yes
4. In general, how long can your company wait after the electricity supply is cut off before the interruption affects its operating costs? (FILL IN THE BLANKS)
__________ Hours and _______ Minutes
5. Which of the following equipment requires a continuous electricity supply? (SELECT ALL THAT APPLY)
a. Computers
b. Telephones
c. Security system
d. HVAC system (heating, ventilation, air conditioning)
e. Cash machines
f. Data communication system (LAN)
g. Others (Specify: _______________)
6. How much advance notice of a power interruption does your company need in order to reduce the impact of that interruption? (SELECT ONE)
a. Advance notice would not reduce the impact of the interruption
b. At least 1 hour
c. At least 4 hours
d. At least 8 hours
e. At least 24 hours
7. Are you satisfied with the frequency of power interruptions your company has experienced in the last 12 months? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)
8. Are you satisfied with the time taken for conditions to return to normal after a power interruption? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)
9. Are you satisfied with how TNB carries out its responsibilities when a power interruption occurs? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)
Numerical Questions
[Scenario description: text not recoverable from the source.]
1. Would this scenario disrupt your company? (SELECT ONE)
1 (Not disruptive)   2   3   4   5 (Disruptive)
2. Would production or sales stop or be disrupted as a result of this scenario? (SELECT ONE AND FILL IN THE BLANK IF APPLICABLE)
a. No (go to Question 5)
b. Yes: ______________ total hours of production or sales operations stopped or disrupted (including time during and after the interruption)
3. What is the estimated value of the production or sales that might be lost, at least temporarily, during the interruption and the slow period after it? (FILL IN THE BLANK)
RM ______________ value of lost production or sales
4. What percentage of this lost production or sales could be recovered? (SELECT ONE)
0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
5. Would any labour costs be incurred as a result of this interruption (such as salaries or wages for staff who are unable to work, or overtime pay to reduce lost production or sales)? (CIRCLE ONE NUMBER AND STATE THE COST IF APPLICABLE)
a. No
b. Yes:
RM ______ labour cost for staff unable to work during the power interruption
RM ______ labour cost for overtime work
6. Would any damage costs be caused by this interruption (such as damage to equipment or damage to products)? (SELECT ONE AND STATE THE COST IF APPLICABLE)
a. No
b. Yes:
RM _______ damage to equipment
RM _______ damage to materials or products
7. Would there be any additional tangible costs as a result of this interruption (such as overhead and depreciation, extra costs to restart operations, and costs to buy and/or rent backup equipment)? (SELECT ONE AND STATE THE COST IF APPLICABLE)
a. No
b. Yes: RM ________ additional costs
8. If you were asked to place a ringgit value on the hidden costs of this interruption (such as inconvenience and customer dissatisfaction), what would that value be?
a. RM0, no hidden costs
b. RM ______, hidden costs
9. In addition to the costs discussed above, some companies make savings because processes cannot run, such as unused materials or inventory, and savings on bills or wages that need not be paid. Would your company make such savings as a result of this interruption?
a. No
b. Yes: RM _______
10. Suppose your company were given one hour's advance notice of the power interruption. What percentage of your staff would hear this notice?
0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
None          50/50          Yes
11. Consider all the costs you might incur as a result of a power interruption and estimate your loss for each scenario given. (Enter zero if none)
Case | Scenario | Minimum total loss (best case), RM | Typical total loss (typical case), RM | Maximum total loss (worst case), RM
1 | Complete outage; Warning: None; Start time: 2 p.m.; Duration: 1 hour | | |
2 | Complete outage; Warning: None; Start time: 2 p.m.; Duration: 2 hours | | |
3 | Complete outage; Warning: None; Start time: 2 p.m.; Duration: 4 hours | | |
4 | Complete outage; Warning: None; Start time: 2 p.m.; Duration: 8 hours | | |
5 | Complete outage; Warning: None; Start time: 2 p.m.; Duration: 0–2 seconds | | |
6 | Complete outage; Warning: 1 hour in advance; Start time: 2 p.m.; Duration: 1 hour | | |
ADDITIONAL QUESTIONS
1. What category of activity does your company fall under? (Tick One)
[Answer options not recoverable from the source.]
2. What is the estimated annual value of your company's operations or services? (FILL IN THE BLANK)
RM __________ /year
3. What is your estimated annual expenditure (including labour costs, rent, materials and other overhead costs)? (FILL IN THE BLANK)
RM _____________ /year
4. What percentage of your annual budget is spent on energy (gas and electricity)? (FILL IN THE BLANK)
____________ %
5. Does your company have electrical equipment that is sensitive to the quality of the electricity supply?
a. No
b. Yes: What is the equipment? ____________________
6. Does your company have electrical equipment that requires a continuous electricity supply even during a power interruption?
No (go to Question 11)
Yes: What is the equipment? ________________
7. Does your company own or rent any of the equipment below to protect the equipment at your company?
a. Back-up generator(s)
b. Uninterruptible power supply
c. Line conditioning device(s)
d. Surge suppressor(s)
e. Isolation transformer(s)
8. Would you incur any indirect costs, such as lost business opportunities or loss of property?
a. Yes (Please explain)
b. None
9. Over the past 5 years, do you think the quality of TNB's service has improved?
a. Improved
b. Deteriorated
c. No change
10. What is your opinion of the electricity supply service provided by TNB?
a. Very satisfied
b. Satisfied
c. Average
d. Dissatisfied
e. Very dissatisfied
11. Has any incident in the past year left you dissatisfied with TNB?
12. Are there any additional services you would like TNB to provide in the future?
APPENDIX 6 – INDUSTRIAL QUESTIONNAIRE
Industrial
Interviewer:
___________________
Date Interview Completed:
___________________
Interview Start Time:
___________________
Name of Respondent:
________________________________________
Title:
____________________________
Contact Number:
____________________________
Company name:
________________________________________
Company Address:
________________________________________ _________________________________________
Business Type (Select One):
Electrical / Electronic
Rubber
Paper, Printing & Publishing
Beverage & Tobacco
Chemicals & Petrochemicals
Furniture & Fixtures
Iron & Steel
Scientific & Measuring Equipment
Nonferrous metal & its products
Leather & its products
Transport Equipment
Palm & Palm Kernel Oil
Food
Pharmaceutical
Wood & Wood products
Photographic, Cinematographic, Video & Optical
Textiles & Textile products
Machinery & Machinery Equipment
Clay-based & other nonmetallic minerals
Others (Specify)
Plastics
_______________
Please be informed that all the information you provide here is strictly confidential.
[English-language industrial questionnaire, pages 2–10: the question text did not survive extraction from the source. The legible elements are the five-point response scales, anchored "Dissatisfied" (1) to "Satisfied" (5) and "Not Disruptive" (1) to "Disruptive" (5), and the six outage scenarios listed below.]
Case 1: Complete outage. Warning: None. Start time: 2 p.m. Duration: 1 hour
Case 2: Complete outage. Warning: None. Start time: 2 p.m. Duration: 2 hours
Case 3: Complete outage. Warning: None. Start time: 2 p.m. Duration: 4 hours
Case 4: Complete outage. Warning: None. Start time: 2 p.m. Duration: 8 hours
Case 5: Complete outage. Warning: None. Start time: 2 p.m. Duration: 0–2 seconds
Case 6: Complete outage. Warning: 1 hour advance warning. Start time: 2 p.m. Duration: 1 hour
Industrial (translated from the Malay-language form)
Interviewer:
___________________
Date:
___________________
Time:
___________________
Name of Respondent:
________________________________________
Title:
____________________________
Contact Number:
____________________________
Company Name:
________________________________________
Address:
________________________________________ _________________________________________
Business Type (Select One):
Electrical / Electronic
Rubber
Paper, Printing & Publishing
Tobacco
Chemicals & Petrochemicals
Furniture
Iron & Steel
Scientific Equipment
Nonferrous metals & their products
Leather & leather products
Food
Palm Oil
Pharmaceutical
Wood & wood products
Photographic, Cinematographic, Video & Optical
Textiles & textile products
Machinery & its equipment
Pottery, clay products & non-metallic minerals, etc.
Transport equipment
Plastics
Others (Specify) _______________
All information provided here is confidential.
COMMERCIAL/INDUSTRIAL ELECTRICITY SUPPLY INTERRUPTION SURVEY
BASIC QUESTIONS
1. Has your organisation experienced any power interruptions in the last 12 months? (SELECT ONE)
a. No (GO TO QUESTION 3)
b. Yes
2. In general, do these interruptions severely disrupt your company's operations? (SELECT ONE)
1   2   3   4   5
3. Does your company send employees home when a power interruption occurs? (SELECT ONE)
a. No
b. Yes
4. In general, how long can your company wait after the electricity supply is cut off before the interruption affects its operating costs? (FILL IN THE BLANKS)
__________ Hours and _______ Minutes
5. Which of the following equipment requires a continuous electricity supply? (SELECT ALL THAT APPLY)
a. Computers
b. Telephones
c. Security system
d. HVAC system (heating, ventilation, air conditioning)
e. Cash machines
f. Data communication system (LAN)
g. Others (Specify: _______________)
6. How much advance notice of a power interruption does your company need in order to reduce the impact of that interruption? (SELECT ONE)
a. Advance notice would not reduce the impact of the interruption
b. At least 1 hour
c. At least 4 hours
d. At least 8 hours
e. At least 24 hours
7. Are you satisfied with the frequency of power interruptions your company has experienced in the last 12 months? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)
8. Are you satisfied with the time taken for conditions to return to normal after a power interruption? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)
9. Are you satisfied with how TNB carries out its responsibilities when a power interruption occurs? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)
Numerical Questions
[Scenario description: text not recoverable from the source.]
1. Would this scenario disrupt your company? (SELECT ONE)
1 (Not disruptive)   2   3   4   5 (Disruptive)
2. Would production or sales stop or be disrupted as a result of this scenario? (SELECT ONE AND FILL IN THE BLANK IF APPLICABLE)
a. No (go to Question 5)
b. Yes: ______________ total hours of production or sales operations stopped or disrupted (including time during and after the interruption)
3. What is the estimated value of the production or sales that might be lost, at least temporarily, during the interruption and the slow period after it? (FILL IN THE BLANK)
RM ______________ value of lost production or sales
Perindustrian
4.
Berapa peratuskah proses pengeluaran ataupun jualan ini boleh di peroleh semula? (PILIH SATU) ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----0%
5.
50%
60%
70%
80%
90%
100%
Tidak Ya --RM ______ kos buruh untuk kakitangan yang tidak dapat bekerja ketika gangguan bekalan elektrik RM ______ kos buruh untuk kerja lebih masa
Tidak Ya -----
RM _______ kerosakan peralatan RM _______ kerosakan kepada bahan ataupun produk
Tidak Ya ---
RM ________ kos tambahan
Sekiranya anda diminta untuk meletakkan nilai ringgit untuk kos tersembunyi akibat gangguan ini (seperti ketidakselesaan dan ketidakpuasan hati pelanggan), berapakah nilai tersebut? a. b.
9.
40%
Adakah terdapat sebarang tambahan kos nyata akibat gangguan ini (seperti ‘overhead’ dan susut nilai, kos tambahan untuk memulakan semula operasi dan kos untuk membeli dan/atau menyewa peralatan sokongan) (PILIH SATU DAN NYATAKAN KOS JIKA BERKAITAN) a. b.
8.
30%
Adakah terdapat kos kerosakan yang disebabkan oleh gangguan ini (seperti kerosakan pada peralatan ataupun kerosakan pada produk)? (PILIH SATU DAN NYATAKAN KOS JIKA BERKAITAN) a. b.
7.
20%
Adakah terdapat kos buruh yang ditanggung akibat gangguan ini (seperti gaji ataupun upah kepada kakitangan yang tidak dapat bekerja ataupun bayaran kerja lebih masa untuk mengurangkan kerugian pengeluaran ataupun jualan)? (BULATKAN SATU NOMBOR DAN NYATAKAN KOS JIKA BERKAITAN) a. b.
6.
10%
RM0, tiada kos tersembunyi RM ______ , kos tersembunyi
Sebagai tambahan kepada kos-kos yang dibincangkan di atas, ada syarikat yang dapat berjimat kerana proses yang tidak dapat dijalankan, seperti bahan-bahan yang tidak digunakan ataupun inventori, penjimatan daripada bil atapun upah yang tidak perlu dibayar. Adakah syarikat anda mendapat kos penjimatan ini akibat daripada gangguan tersebut? a. b.
Tidak Ya ---
RM _______
10. Suppose your company were given one hour's advance notice of a power interruption. What percentage of your staff would hear this notice?
    0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
    None                    50/50                     Yes
11. Consider all the costs you might incur because of a power interruption and estimate your loss for each scenario given. (Enter zero if none)

    Case  Scenario                                                                   Minimum total loss   Typical total loss   Maximum total loss
                                                                                     (best case), RM      (typical case), RM   (worst case), RM
    1     Total interruption; Warning: none; Time: 2 p.m.; Duration: 1 hour          ________             ________             ________
    2     Total interruption; Warning: none; Time: 2 p.m.; Duration: 2 hours         ________             ________             ________
    3     Total interruption; Warning: none; Time: 2 p.m.; Duration: 4 hours         ________             ________             ________
    4     Total interruption; Warning: none; Time: 2 p.m.; Duration: 8 hours         ________             ________             ________
    5     Total interruption; Warning: none; Time: 2 p.m.; Duration: 0 to 2 seconds  ________             ________             ________
    6     Total interruption; Warning: 1 hour ahead; Time: 2 p.m.; Duration: 1 hour  ________             ________             ________
ADDITIONAL QUESTIONS
1. What is your company's activity category? (Tick one)
   [List of activity categories not legible in the source]
2. What is the estimated annual value of your company's operations or services? (FILL IN THE BLANK)
   RM __________ /year
3. What is your estimated annual expenditure (including labour, rent, materials and other overhead costs)? (FILL IN THE BLANK)
   RM _____________ /year
4. What percentage of your annual budget is spent on energy (gas and electricity)? (FILL IN THE BLANK)
   ____________ %
5. Does your company have electrical equipment that is sensitive to the quality of the electricity supply?
   a. No
   b. Yes   What is the equipment? ____________________
6. Does your company have electrical equipment that needs a continuous electricity supply even during a power interruption?
   None   Go to question 11
   Yes    What is the equipment? ________________
7. Does your company own or rent any of the equipment below to protect its equipment?
   a. Back-up generator(s)
   b. Uninterruptible power supply
   c. Line conditioning device(s)
   d. Surge suppressor(s)
   e. Isolation transformer(s)
8. Would you incur any indirect costs, such as lost business opportunities or loss of property?
   a. Yes (please explain)
   b. None
9. Over the past 5 years, do you think the quality of TNB's service has improved?
   a. Improved
   b. Declined
   c. No change
10. What is your opinion of the electricity supply service provided by TNB?
   a. Very satisfied
   b. Satisfied
   c. Average
   d. Dissatisfied
   e. Very dissatisfied
11. Has any incident in the past year left you dissatisfied with TNB?
12. Are there any additional services you would like from TNB in the future?
APPENDIX 7 – SPSS ANALYSIS OF DOMESTIC CONSUMER POPULATION

Descriptive Statistics

Statistic                 Bungalow          High_Rise         Kampong            Terrace
N                         93182             390598            1250930            1274714
Range                     847.00000         422.00000         327.00000          555.10000
Minimum                   48.00000          23.00000          18.00000           30.00000
Maximum                   895.00000         445.00000         345.00000          585.10000
Sum                       29909118.89000    70991707.90000    187651495.50002    308398953.15000
Mean                      320.9752837       181.7513349       150.0095893        241.9358014
Std. Error of Mean        .64899961         .15853669         .06791648          .11703923
Std. Deviation            198.11184899      99.08200478       75.96117738        132.14107438
Variance                  39248.305         9817.244          5770.100           17461.264
Skewness                  .945              .486              .629               .644
Std. Error of Skewness    .008              .004              .002               .002
Kurtosis                  .151              -.307             -.438              -.365
Std. Error of Kurtosis    .016              .008              .004               .004
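The standard errors in the Descriptive Statistics table can be cross-checked from N and the standard deviation alone. The sketch below is an illustration rather than part of the original analysis: it assumes the usual textbook formulas for the standard error of the mean, skewness and kurtosis, and the function names are hypothetical. The inputs are the Bungalow figures from the table (N = 93182, Std. Deviation = 198.11184899).

```python
import math


def se_mean(std_dev: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)


def se_skewness(n: int) -> float:
    """Standard error of sample skewness (depends only on n)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis (depends only on n)."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# Bungalow column of the Descriptive Statistics table.
n_bungalow = 93182
sd_bungalow = 198.11184899

print(f"SE mean     : {se_mean(sd_bungalow, n_bungalow):.8f}")
print(f"SE skewness : {se_skewness(n_bungalow):.3f}")
print(f"SE kurtosis : {se_kurtosis(n_bungalow):.3f}")
```

Because the skewness and kurtosis standard errors depend only on N, a check like this is also a quick way to confirm which N a flattened output column belongs to.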
Descriptives

Statistic                          Bungalow         High-Rise        Kampong          Terrace
Mean                               320.9752837      325.4398532      307.7973976      525.4215279
Std. Error of Mean                 .64899961        .17987639        .06370672        .10462629
95% CI for Mean, Lower Bound       319.7032518      325.0872975      307.6725331      525.2164615
95% CI for Mean, Upper Bound       322.2473157      325.7924089      307.9222620      525.6265942
5% Trimmed Mean                    307.8238334      323.4196912      307.4545394      524.9973351
Median                             269.0000000      315.0000000      306.0000000      523.0000000
Variance                           39248.305        3014.952         378.183          1020.032
Std. Deviation                     198.11184899     54.90857467      19.44694013      31.93793622
Minimum                            48.00000         249.00000        277.00000        474.00000
Maximum                            895.00000        445.00000        345.00000        585.10000
Range                              847.00000        196.00000        68.00000         111.10000
Interquartile Range                260.00000        90.00000         33.00000         54.00000
Skewness                           .945             .470             .222             .170
Std. Error of Skewness             .008             .008             .008             .008
Kurtosis                           .151             -.918            -1.132           -1.163
Std. Error of Kurtosis             .016             .016             .016             .016
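The 95% confidence bounds reported under Descriptives follow from the mean and its standard error. The sketch below is a hedged illustration that uses a normal critical value from the Python standard library. SPSS bases the interval on Student's t, so this approximation is only close for large N such as the domestic data and would be too narrow for small samples. The function name is hypothetical, and the inputs are the Bungalow mean and standard error.

```python
from statistics import NormalDist


def normal_ci(mean: float, std_error: float, confidence: float = 0.95):
    """Approximate two-sided confidence interval for a mean.

    Uses the normal quantile as a large-sample stand-in for the t quantile.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std_error
    return mean - half_width, mean + half_width


# Bungalow: mean and standard error as reported under Descriptives.
lower, upper = normal_ci(320.9752837, 0.64899961)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```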
M-Estimators

             Huber's M-Estimator(a)   Tukey's Biweight(b)   Hampel's M-Estimator(c)   Andrews' Wave(d)
Bungalow     283.0957654              266.7803206           287.5231060               266.2687862
High-Rise    319.1433789              319.8449702           321.7710026               319.8801127
Kampong      306.5501089              306.7611529           307.0419735               306.7645718
Terrace      523.8345753              524.1387126           524.5207610               524.1416048

a The weighting constant is 1.339.
b The weighting constant is 4.685.
c The weighting constants are 1.700, 3.400, and 8.500.
d The weighting constant is 1.340*pi.
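As an illustration of what a Huber M-estimate of location does, the sketch below uses iteratively reweighted averaging with the weighting constant 1.339 quoted in the footnote. It is a minimal sketch under stated assumptions, not a reproduction of the SPSS procedure: the scale is taken as MAD/0.6745, the function name and the sample readings are hypothetical, and none of the values come from the consumer data.

```python
import statistics


def huber_location(values, k=1.339, tol=1e-9, max_iter=200):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Assumption: the scale is fixed at MAD / 0.6745 around the median.
    """
    xs = list(values)
    mu = statistics.median(xs)
    mad = statistics.median([abs(x - mu) for x in xs])
    scale = mad / 0.6745
    if scale == 0:
        return mu  # more than half the points coincide with the median
    cutoff = k * scale
    for _ in range(max_iter):
        # Points within the cutoff keep full weight; distant points are down-weighted.
        weights = [1.0 if abs(x - mu) <= cutoff else cutoff / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical readings with one extreme value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print("mean  :", statistics.mean(sample))
print("huber :", huber_location(sample))
```

The estimate stays near the bulk of the data while the arithmetic mean is pulled towards the extreme value, which is the pattern visible in the Bungalow row, where every M-estimate lies well below the mean of 320.98.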
Percentiles

Weighted Average (Definition 1)

             5         10        25        50        75        90        95
Bungalow     83.00     109.00    170.00    269.00    430.00    630.00    741.00
High_Rise    254.00    260.00    278.00    315.00    368.00    410.00    427.00
Kampong      280.00    283.00    291.00    306.00    324.00    336.00    341.00
Terrace      479.00    483.00    498.00    523.00    552.00    572.00    578.00

Tukey's Hinges

             25        50        75
Bungalow     170.00    269.00    430.00
High_Rise    278.00    315.00    368.00
Kampong      291.00    306.00    324.00
Terrace      498.00    523.00    552.00
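The two percentile methods reported above can give slightly different quartiles on small or uneven samples. The sketch below assumes that "Weighted Average (Definition 1)" interpolates at position (n + 1)p in the sorted data and that Tukey's hinges are the values at depth (floor((n + 1)/2) + 1)/2 from each end. The function names and the sample are hypothetical.

```python
import math


def percentile_weighted_average(values, p):
    """Percentile by linear interpolation at position (n + 1) * p (1-based)."""
    xs = sorted(values)
    n = len(xs)
    pos = (n + 1) * p
    if pos < 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    k = math.floor(pos)
    frac = pos - k
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


def tukey_hinges(values):
    """Lower and upper hinges: medians of the lower and upper halves by depth."""
    xs = sorted(values)
    n = len(xs)
    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    lo, hi = math.floor(hinge_depth), math.ceil(hinge_depth)
    lower = (xs[lo - 1] + xs[hi - 1]) / 2
    upper = (xs[n - lo] + xs[n - hi]) / 2
    return lower, upper


sample = list(range(1, 10))  # hypothetical data: 1, 2, ..., 9
print(percentile_weighted_average(sample, 0.25), percentile_weighted_average(sample, 0.75))
print(tukey_hinges(sample))
```

On large samples with many tied values, as in the domestic data, the two definitions usually coincide, which is why the 25th, 50th and 75th percentiles match the hinges here.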
Extreme Values

Variable    Extreme   Case numbers                          Values
Bungalow    Highest   1, 2, 3, 4, 5                         895.00000, 895.00000, 895.00000, 895.00000, 895.00000(a)
            Lowest    93182, 93181, 93180, 93179, 93178     48.00000, 48.00000, 48.00000, 48.00000, 48.00000(b)
High_Rise   Highest   1, 2, 3, 4, 5                         445.00000, 445.00000, 445.00000, 445.00000, 445.00000(c)
            Lowest    93182, 93181, 93180, 93179, 93178     249.00000, 249.00000, 249.00000, 249.00000, 249.00000(d)
Kampong     Highest   1, 2, 3, 4, 5                         345.00000, 345.00000, 345.00000, 345.00000, 345.00000(e)
            Lowest    93182, 93181, 93180, 93179, 93178     277.00000, 277.00000, 277.00000, 277.00000, 277.00000(f)
Terrace     Highest   1, 2, 3, 4, 5                         585.10000, 585.00000, 585.00000, 585.00000, 585.00000(g)
            Lowest    93182, 93181, 93180, 93179, 93178     474.00000, 474.00000, 474.00000, 474.00000, 474.00000(h)

a Only a partial list of cases with the value 895.00000 are shown in the table of upper extremes.
b Only a partial list of cases with the value 48.00000 are shown in the table of lower extremes.
c Only a partial list of cases with the value 445.00000 are shown in the table of upper extremes.
d Only a partial list of cases with the value 249.00000 are shown in the table of lower extremes.
e Only a partial list of cases with the value 345.00000 are shown in the table of upper extremes.
f Only a partial list of cases with the value 277.00000 are shown in the table of lower extremes.
g Only a partial list of cases with the value 585.00000 are shown in the table of upper extremes.
h Only a partial list of cases with the value 474.00000 are shown in the table of lower extremes.
[Boxplot: Bungalow; vertical axis from 0.00000 to 1000.00000; labelled cases 129, 664, 1,245 and 1,990]
[Boxplot: High_Rise; vertical axis from 250.00000 to 450.00000]
[Boxplot: Kampong; vertical axis from 280.00000 to 340.00000]
APPENDIX 8 – SPSS ANALYSIS OF COMMERCIAL CONSUMER POPULATION

Descriptive Statistics

Category              N         Range         Minimum      Maximum       Sum
Accomodation          4227      5986.00       334.00       6320.00       6580639.93
Agri                  2114      8941.00       497.00       9438.00       4998059.03
Comm                  3197      7962.42       444.00       8406.42       7267130.04
Constr                2530      4056.00       226.00       4282.00       3136966.04
FinInst               1992      10729.00      597.00       11326.00      8188935.37
Insurance             835       3224.00       180.00       3404.00       1018394.57
Others                21975     1682.00       93.00        1775.00       12059468.42
RealEst_BS            16954     2496.00       139.00       2635.00       14953611.83
Recreation            615       60965.00      3436.00      64401.00      8549437.75
SocialServ            14607     4403.00       245.00       4648.00       17860434.25
Transport             2197      6122.00       340.00       6462.00       3585598.70
Wholesale             6668      3858.00       214.00       4072.00       7058635.55
Retail                190643    2096.00000    117.00000    2213.00000    120357090.88000
ResidenceShopHouse    103757    1290.00000    72.00000     1362.00000    44916813.86000

Category              Mean           Std. Error    Std. Deviation    Variance
Accomodation          1556.8110      21.26297      1382.41998        1911085.000
Agri                  2364.2663      45.71755      2102.01321        4418459.555
Comm                  2273.1092      28.70555      1623.06973        2634355.334
Constr                1239.9075      19.74230      993.02010         986088.911
FinInst               4110.9113      64.84349      2894.08324        8375717.824
Insurance             1219.6342      28.70930      829.59432         688226.730
Others                548.7813       2.81377       417.11255         173982.879
RealEst_BS            882.0108       4.84175       630.43194         397444.434
Recreation            13901.5248     521.28169     12927.36547       167116778.004
SocialServ            1222.7312      8.44329       1020.45100        1041320.245
Transport             1632.0431      30.52458      1430.75338        2047055.229
Wholesale             1058.5836      10.94583      893.81245         798900.699
Retail                631.3218470    1.12565658    491.49187740      241564.266
ResidenceShopHouse    432.9039377    .98630187     317.70098291      100933.915

Category              Skewness    Std. Error    Kurtosis    Std. Error
Accomodation          1.493       .038          1.467       .075
Agri                  1.473       .053          1.332       .106
Comm                  1.403       .043          1.914       .087
Constr                1.192       .049          .569        .097
FinInst               .784        .055          -.488       .110
Insurance             .850        .085          -.121       .169
Others                1.086       .017          .304        .033
RealEst_BS            .906        .019          -.095       .038
Recreation            1.899       .099          3.231       .197
SocialServ            1.440       .020          1.370       .041
Transport             1.536       .052          1.652       .104
Wholesale             1.453       .030          1.447       .060
Retail                1.233       .006          .778        .011
ResidenceShopHouse    1.057       .008          .247        .015
Descriptives

Category              Mean            Std. Error    95% CI Lower    95% CI Upper    5% Trimmed Mean    Median
Accomodation          4384.1866       36.27177      4312.9548       4455.4183       4355.4422          4205.0000
Agri                  5144.4787       74.17249      4998.8162       5290.1412       5058.4483          4686.0000
Comm                  4958.1505       56.10256      4847.9743       5068.3267       4874.4358          4506.0000
Constr                2745.1382       28.32058      2689.5212       2800.7551       2718.4182          2588.0000
FinInst               7873.2278       69.29124      7737.1512       8009.3043       7830.5910          7662.0000
Insurance             1528.8676       30.35496      1469.2555       1588.4797       1483.3095          1336.0000
Others                1677.1555       2.28503       1672.6681       1681.6429       1676.8467          1675.0000
RealEst_BS            2461.6759       3.73706       2454.3370       2469.0149       2461.0528          2460.0000
Recreation            13901.5248      521.28169     12877.8135      14925.2361      12227.1128         8780.0400
SocialServ            4168.5767       10.40691      4148.1393       4189.0142       4166.6939          4155.0000
Transport             3561.2750       51.83195      3459.4856       3663.0644       3498.9004          3200.0000
Wholesale             3212.8681       19.05352      3175.4502       3250.2861       3206.0391          3207.0000
Retail                2192.7849593    .47869557     2191.8448802    2193.7250385    2192.7574526       2193.0000000
ResidenceShopHouse    1345.3430894    .40233360     1344.5529726    1346.1332063    1345.3148148       1345.0000000

Category              Variance         Std. Deviation    Minimum       Maximum       Range        Interquartile Range    Skewness    Kurtosis
Accomodation          809119.336       899.51061         3073.00       6320.00       3247.00      1475.00                .403        -.975
Agri                  3383458.106      1839.41787        2790.00       9438.00       6648.00      3086.00                .570        -.852
Comm                  1935710.994      1391.29831        3337.00       8406.42       5069.42      2120.00                .787        -.559
Constr                493264.041       702.32759         1797.00       4282.00       2485.00      1194.00                .500        -.963
FinInst               2952784.828      1718.36691        5279.14       11326.00      6046.86      2878.00                .333        -1.066
Insurance             566675.328       752.77841         571.00        3404.00       2833.00      1110.00                .812        -.298
Others                3211.132         56.66685          1582.00       1775.00       193.00       101.00                 .107        -1.233
RealEst_BS            8588.836         92.67597          2304.00       2635.00       331.00       159.00                 .057        -1.143
Recreation            167116778.004    12927.36547       3436.00       64401.00      60965.00     11776.00               1.899       3.231
SocialServ            66606.821        258.08297         3732.00       4648.00       916.00       416.50                 .092        -1.060
Transport             1652228.852      1285.39054        1944.00       6462.00       4518.00      2115.00                .644        -.778
Wholesale             223267.528       472.51193         2508.00       4072.00       1564.00      850.00                 .159        -1.249
Retail                140.927          11.87126409       2173.00000    2213.00000    40.00000     20.00000               -.004       -1.192
ResidenceShopHouse    99.551           9.97754875        1329.00000    1362.00000    33.00000     18.00000               .047        -1.224

The standard error of skewness is .099 and the standard error of kurtosis is .197 for every category.
M-Estimators

Category              Huber's M-Estimator(a)   Tukey's Biweight(b)   Hampel's M-Estimator(c)   Andrews' Wave(d)
Accomodation          4295.8960                4308.4887             4330.7159                 4308.9908
Agri                  4878.2555                4900.6929             4996.7142                 4902.1424
Comm                  4662.2349                4590.8779             4750.1928                 4586.8340
Constr                2657.2294                2666.0026             2692.6373                 2666.5333
FinInst               7734.1241                7762.3947             7797.5413                 7762.9264
Insurance             1394.2954                1358.1028             1426.8799                 1356.1655
Others                1675.5774                1675.9439             1676.5875                 1675.9432
RealEst_BS            2460.2526                2460.6420             2460.6574                 2460.6481
Recreation            9717.4700                8388.1238             9359.2572                 8373.0856
SocialServ            4161.0956                4161.5281             4163.2350                 4161.5554
Transport             3347.3957                3341.4444             3427.0310                 3342.7681
Wholesale             3197.2604                3203.2365             3207.4810                 3203.2567
Retail                2192.8732592             2192.8446759          2192.7912882              2192.8448297
ResidenceShopHouse    1345.1976033             1345.2203913          1345.2958213              1345.2201900

a The weighting constant is 1.339.
b The weighting constant is 4.685.
c The weighting constants are 1.700, 3.400, and 8.500.
d The weighting constant is 1.340*pi.
Percentiles

Weighted Average (Definition 1)

Category              5            10           25           50           75            90            95
Accomodation          3161.6000    3257.6000    3621.0000    4205.0000    5096.0000     5719.6000     6006.8000
Agri                  2911.8000    3032.0000    3536.0000    4686.0000    6622.0000     7914.8000     8594.4000
Comm                  3439.0000    3496.4000    3781.0000    4506.0000    5901.0000     7218.6000     7656.0000
Constr                1855.8000    1929.0000    2134.0000    2588.0000    3328.0000     3837.8000     4007.0000
FinInst               5518.8000    5718.0000    6414.0000    7662.0000    9292.0000     10513.6000    10902.6000
Insurance             644.6000     700.2000     907.0000     1336.0000    2017.0000     2723.4000     3140.0000
Others                1593.0000    1601.0000    1627.0000    1675.0000    1728.0000     1759.0000     1767.0000
RealEst_BS            2321.0000    2335.0000    2379.0000    2460.0000    2538.0000     2593.0000     2610.4000
Recreation            3685.8000    4036.2000    5209.0000    8780.0400    16985.0000    32710.0000    44083.2000
SocialServ            3762.0000    3806.6000    3964.5000    4155.0000    4381.0000     4539.4000     4598.0000
Transport             2043.4000    2124.0000    2445.0000    3200.0000    4560.0000     5652.8000     6066.9760
Wholesale             2549.6000    2588.6000    2763.0000    3207.0000    3613.0000     3906.8000     3977.6000
Retail                2174.0000    2176.0000    2183.0000    2193.0000    2203.0000     2209.0000     2211.2000
ResidenceShopHouse    1330.0000    1332.0000    1337.0000    1345.0000    1355.0000     1359.0000     1361.0000

Tukey's Hinges

Category              25           50           75
Accomodation          3622.0000    4205.0000    5094.5000
Agri                  3539.5000    4686.0000    6621.5000
Comm                  3781.0000    4506.0000    5892.0000
Constr                2136.0000    2588.0000    3326.0000
FinInst               6415.0000    7662.0000    9288.5650
Insurance             908.5000     1336.0000    2009.0000
Others                1627.0000    1675.0000    1728.0000
RealEst_BS            2380.0000    2460.0000    2537.5000
Recreation            5216.5000    8780.0400    16841.5000
SocialServ            3965.5000    4155.0000    4380.5000
Transport             2445.0000    3200.0000    4555.0000
Wholesale             2764.0000    3207.0000    3612.0000
Retail                2183.0000    2193.0000    2203.0000
ResidenceShopHouse    1337.0000    1345.0000    1355.0000
Extreme Values

Highest values (case numbers 1, 2, 3, 4, 5)

Category              1             2             3             4             5
Accomodation          6320.00       6296.00       6282.00       6279.00       6275.00
Agri                  9438.00       9430.00       9388.00       9316.00       9306.00
Comm                  8406.42       8374.42       8339.00       8332.00       8329.00(a)
Constr                4282.00       4280.00       4271.00       4264.00       4264.00
FinInst               11326.00      11306.00      11290.00      11281.00      11269.00
Insurance             3404.00       3386.00       3384.00       3380.00       3365.00
Others                1775.00       1775.00       1775.00       1775.00       1775.00(b)
RealEst_BS            2635.00       2634.00       2634.00       2633.00       2632.00(c)
Recreation            64401.00      64066.00      62920.00      61724.00      61286.00
SocialServ            4648.00       4646.00       4646.00       4645.00       4644.00(d)
Transport             6462.00       6430.00       6428.00       6365.00       6329.00
Wholesale             4072.00       4070.00       4067.00       4063.00       4052.00
Retail                2213.00000    2213.00000    2213.00000    2213.00000    2213.00000(f)
ResidenceShopHouse    1362.00000    1362.00000    1362.00000    1362.00000    1362.00000(h)

Lowest values (case numbers 615, 614, 613, 612, 611)

Category              1             2             3             4             5
Accomodation          3073.00       3077.00       3082.00       3084.00       3086.00
Agri                  2790.00       2793.00       2799.00       2806.00       2810.00
Comm                  3337.00       3346.00       3349.00       3352.00       3356.00
Constr                1797.00       1798.00       1799.00       1800.00       1801.00
FinInst               5279.14       5282.00       5291.00       5322.27       5328.00
Insurance             571.00        572.00        573.00        582.00        584.00
Others                1582.00       1583.00       1584.00       1584.00       1584.00
RealEst_BS            2304.00       2304.00       2304.00       2304.00       2305.00
Recreation            3436.00       3441.00       3451.00       3461.00       3474.00
SocialServ            3732.00       3732.00       3733.00       3737.00       3737.00(e)
Transport             1944.00       1949.00       1950.00       1960.00       1962.00
Wholesale             2508.00       2511.00       2513.00       2514.00       2515.00
Retail                2173.00000    2173.00000    2173.00000    2173.00000    2173.00000(g)
ResidenceShopHouse    1329.00000    1329.00000    1329.00000    1329.00000    1329.00000(i)

a Only a partial list of cases with the value 8329.00 are shown in the table of upper extremes.
b Only a partial list of cases with the value 1775.00 are shown in the table of upper extremes.
c Only a partial list of cases with the value 2632.00 are shown in the table of upper extremes.
d Only a partial list of cases with the value 4644.00 are shown in the table of upper extremes.
e Only a partial list of cases with the value 3737.00 are shown in the table of lower extremes.
f Only a partial list of cases with the value 2213.00000 are shown in the table of upper extremes.
g Only a partial list of cases with the value 2173.00000 are shown in the table of lower extremes.
h Only a partial list of cases with the value 1362.00000 are shown in the table of upper extremes.
i Only a partial list of cases with the value 1329.00000 are shown in the table of lower extremes.
APPENDIX 9 – SPSS ANALYSIS OF INDUSTRIAL CONSUMER POPULATION

Descriptive Statistics

Category     N       Range           Minimum       Maximum         Sum
Electrical   435     65251.00000     3675.00000    68926.00000     9533136.29000
Printing     495     28132.00000     1604.00000    29736.00000     3606587.33000
Chemical     499     16326.00000     1597.00000    17923.00000     3327276.87000
Iron         1122    11488.00000     1286.00000    12774.00000     5487880.61000
N.F.Metal    465     7899.00000      447.00000     8346.00000      939826.09000
Transport    297     12099.00000     689.00000     12788.00000     873070.91000
Food         760     38358.00000     2148.00000    40506.00000     8358868.36000
Wood         1135    34864.00000     1976.00000    36840.00000     13282012.93000
Textile      709     12203.00000     682.00000     12885.00000     2159340.44000
Clay         377     32820.00000     1836.00000    34656.00000     3925668.12000
Plastics     802     77599.00000     4389.00000    81988.00000     19836009.06000
Machinery    265     29444.00000     1660.00000    31104.00000     2303591.64000
Rubber       288     58110.00000     3279.00000    61389.00000     5905107.27000
Beverage     283     27596.50000     1554.00000    29150.50000     2236223.63000
Furniture    374     29389.00000     3339.00000    32728.00000     4666848.85000
Palm         63      31756.00000     1858.00000    33614.00000     669347.77000
Pharma       55      29951.00000     1804.00000    31755.00000     407322.31000
Miscel       150     197697.00000    1144.00000    198841.00000    2858011.43000

Category     Mean             Std. Error       Std. Deviation    Variance
Electrical   21915.2558391    845.02917418     17624.48078018    310622322.771
Printing     7286.0350101     297.93348705     6628.60162773     43938359.539
Chemical     6667.8895190     196.02012166     4378.75783288     19173520.159
Iron         4891.1591889     93.39788511      3128.48063216     9787391.066
N.F.Metal    2021.1313763     81.58325710      1759.24982463     3094959.945
Transport    2939.6326936     157.58567909     2715.78241727     7375474.138
Food         10998.5110000    345.06204619     9512.70413445     90491539.950
Wood         11702.2140352    273.76798705     9223.17806833     85067013.680
Textile      3045.6141608     103.59809326     2758.51201441     7609388.534
Clay         10412.9127851    449.59583749     8729.57211104     76205429.242
Plastics     24733.1783791    692.99132112     19625.24014797    385150050.866
Machinery    8692.7986415     458.21320267     7459.17052096     55639224.861
Rubber       20503.8446875    947.61004921     16081.47580117    258613863.944
Beverage     7901.8502827     418.91777422     7047.28775771     49664264.740
Furniture    12478.2054813    428.12998043     8279.63977316     68552434.773
Palm         10624.5677778    988.64873964     7847.15609725     61577858.815
Pharma       7405.8601818     1024.86684597    7600.61595253     57769362.858
Miscel       19053.4095333    2910.93538009    35651.53177718    1271031718.059

Category     Skewness    Std. Error    Kurtosis    Std. Error
Electrical   1.057       .117          .012        .234
Printing     1.531       .110          1.467       .219
Chemical     .853        .109          -.300       .218
Iron         .798        .073          -.471       .146
N.F.Metal    1.556       .113          1.773       .226
Transport    1.522       .141          1.514       .282
Food         1.294       .089          .719        .177
Wood         .992        .073          -.051       .145
Textile      1.709       .092          2.363       .183
Clay         1.230       .126          .591        .251
Plastics     1.106       .086          .314        .172
Machinery    1.350       .150          1.006       .298
Rubber       .878        .144          -.349       .286
Beverage     1.338       .145          .933        .289
Furniture    .903        .126          -.351       .252
Palm         1.092       .302          .477        .595
Pharma       1.827       .322          2.532       .634
Miscel       3.048       .198          9.673       .394
Descriptives

Category     Mean            Std. Error       95% CI Lower    95% CI Upper    5% Trimmed Mean    Median
Electrical   4451.8107273    67.97878881      4315.5214615    4588.0999930    4448.5725253       4428.0000000
Printing     1756.5818182    13.36542705      1729.7857503    1783.3778861    1755.0000000       1758.0000000
Chemical     1829.7341818    19.49354843      1790.6519703    1868.8163933    1827.7847475       1841.0000000
Iron         1355.9272727    6.44985081       1342.9961004    1368.8584450    1354.7121212       1352.0000000
N.F.Metal    509.2545455     4.82496441       499.5810742     518.9280167     508.8888889        511.0000000
Transport    761.9818182     6.68434479       748.5805138     775.3831226     760.6313131        762.0000000
Food         2342.1810909    15.09042663      2311.9266071    2372.4355747    2341.8577778       2342.0000000
Wood         2131.0534545    11.14788842      2108.7032839    2153.4036252    2132.4684848       2143.0000000
Textile      745.8541818     5.29111927       735.2461264     756.4622372     745.6662626        748.0000000
Clay         2253.8727273    39.83511305      2174.0081342    2333.7373204    2247.7020202       2226.0000000
Plastics     4940.9527273    46.34177303      4848.0430664    5033.8623882    4941.2656566       4888.2000000
Machinery    2141.3809091    38.71178173      2063.7684597    2218.9933585    2140.2969697       2093.0000000
Rubber       4355.2545455    106.43387760     4141.8674687    4568.6416222    4325.0808081       4278.0000000
Beverage     1872.3030909    27.26151011      1817.6470539    1926.9591279    1871.3822222       1838.0000000
Furniture    3964.2821818    44.55798626      3874.9487980    4053.6155656    3969.9970707       3904.0000000
Palm         8335.4503636    692.76339913     6946.5433731    9724.3573542    8087.4195960       7110.0000000
Pharma       7405.8601818    1024.86684597    5351.1258692    9460.5944944    6516.2941414       4247.0000000
Miscel       1828.6145455    72.15800253      1683.9464607    1973.2826302    1808.0262626       1687.5000000

Category     Variance        Std. Deviation    Minimum       Maximum        Range          Interquartile Range    Skewness    Kurtosis
Electrical   254161.365      504.14419075      3675.00000    5280.00000     1605.00000     885.25000              .144        -1.324
Printing     9824.905        99.12065990       1604.00000    1933.00000     329.00000      184.00000              .104        -1.094
Chemical     20899.914       144.56802437      1597.00000    2091.00000     494.00000      253.00000              .070        -1.206
Iron         2288.032        47.83337381       1286.00000    1454.00000     168.00000      81.00000               .292        -1.198
N.F.Metal    1280.415        35.78289379       447.00000     581.00000      134.00000      61.00000               .004        -.829
Transport    2457.426        49.57242771       689.00000     860.00000      171.00000      77.00000               .343        -.940
Food         12524.654       111.91359911      2148.00000    2529.00000     381.00000      193.00000              .120        -1.103
Wood         6835.148        82.67495326       1976.00000    2248.00000     272.00000      151.00000              -.208       -1.239
Textile      1539.777        39.23999069       682.00000     811.00000      129.00000      69.00000               .055        -1.338
Clay         87275.993       295.42510516      1836.00000    2806.00000     970.00000      511.00000              .187        -1.231
Plastics     118115.796      343.67978700      4389.00000    5485.00000     1096.00000     610.00000              .082        -1.264
Machinery    82423.112       287.09425709      1660.00000    2676.00000     1016.00000     463.00000              .078        -1.026
Rubber       623049.367      789.33476206      3279.00000    6034.00000     2755.00000     1149.51000             .491        -.818
Beverage     40875.446       202.17677007      1554.00000    2209.00000     655.00000      360.00000              .192        -1.331
Furniture    109197.778      330.45087028      3339.00000    4487.00000     1148.00000     511.50000              -.118       -1.001
Palm         26395661.995    5137.67087257     1858.00000    20110.50000    18252.50000    8366.68000             .708        -.595
Pharma       57769362.858    7600.61595253     1804.00000    31755.00000    29951.00000    6007.00000             1.827       2.532
Miscel       286372.753      535.13806921      1144.00000    2931.00000     1787.00000     1030.00000             .519        -1.130

The standard error of skewness is .322 and the standard error of kurtosis is .634 for every category.
M-Estimators

Category     Huber's M-Estimator(a)   Tukey's Biweight(b)   Hampel's M-Estimator(c)   Andrews' Wave(d)
Electrical   4428.2964566             4436.1070569          4443.2743218              4436.1369112
Printing     1754.2797199             1755.8006041          1754.7529632              1755.8159596
Chemical     1828.2020183             1829.0494460          1827.8467075              1829.0767207
Iron         1352.5585991             1353.2754826          1354.3229822              1353.2826850
N.F.Metal    510.2069135              509.8548444           509.3234726               509.8576015
Transport    759.3717933              759.9186433           760.4256415               759.9268954
Food         2340.1705927             2340.6625832          2341.9161810              2340.6590213
Wood         2135.9657888             2135.1716904          2132.8591910              2135.1763935
Textile      745.8058557              746.0489920           745.8698738               746.0514925
Clay         2236.4578584             2244.4400821          2247.4641730              2244.4773229
Plastics     4928.9916702             4934.3530854          4940.2790892              4934.3420474
Machinery    2135.7940633             2135.1375921          2138.3129542              2135.1399637
Rubber       4266.4385281             4270.7740465          4289.1588276              4271.3239424
Beverage     1858.8278434             1862.5326016          1867.6401563              1862.5528781
Furniture    3968.4588358             3966.0105344          3972.1060867              3965.9296687
Palm         7411.6162055             7327.4042668          7717.0144006              7329.5405159
Pharma       4580.9142866             3726.3139591          4226.3668047              3707.9715678
Miscel       1750.6516979             1764.2288663          1796.7642084              1764.5262269

a The weighting constant is 1.339.
b The weighting constant is 4.685.
c The weighting constants are 1.700, 3.400, and 8.500.
d The weighting constant is 1.340*pi.
Percentiles

Weighted Average (Definition 1)

Category     5            10           25           50           75            90            95
Electrical   3729.6000    3813.9000    4031.0000    4428.0000    4916.2500     5153.9520     5250.0000
Printing     1615.6000    1623.6000    1650.0000    1758.0000    1834.0000     1901.6000     1927.2000
Chemical     1611.4000    1637.8000    1695.0000    1841.0000    1948.0000     2024.8000     2088.9040
Iron         1293.0000    1294.8000    1312.0000    1352.0000    1393.0000     1425.0000     1436.4000
N.F.Metal    449.6000     457.8000     476.0000     511.0000     537.0000      556.8000      574.2000
Transport    691.8000     698.8000     719.0000     762.0000     796.0000      835.8000      854.2000
Food         2177.0000    2193.6000    2260.0000    2342.0000    2453.0000     2510.8760     2527.4000
Wood         1996.6000    2000.0000    2057.0000    2143.0000    2208.0000     2243.1760     2248.0000
Textile      690.6000     694.2000     710.0000     748.0000     779.0000      801.4000      806.8000
Clay         1849.2000    1864.2000    1968.0000    2226.0000    2479.0000     2676.2000     2745.6000
Plastics     4417.0000    4490.2000    4623.0000    4888.2000    5233.0000     5444.8700     5468.0000
Machinery    1675.8000    1741.1000    1932.0000    2093.0000    2395.0000     2560.6000     2604.6000
Rubber       3312.4000    3432.2000    3705.0000    4278.0000    4854.5100     5636.8000     5838.6000
Beverage     1570.4000    1637.6000    1683.0000    1838.0000    2043.0000     2170.0000     2186.4000
Furniture    3362.8000    3454.0000    3721.0000    3904.0000    4232.5000     4422.8200     4460.5760
Palm         2126.8000    2565.2000    4133.3200    7110.0000    12500.0000    16711.6000    18618.8000
Pharma       1817.8000    1981.4000    2448.0000    4247.0000    8455.0000     23159.0000    25896.2000
Miscel       1193.2000    1219.6000    1358.0000    1687.5000    2388.0000     2587.8000     2792.8000

Tukey's Hinges

Category     25           50           75
Electrical   4031.5000    4428.0000    4878.1250
Printing     1658.0000    1758.0000    1831.5000
Chemical     1697.5000    1841.0000    1945.0000
Iron         1313.5000    1352.0000    1391.0000
N.F.Metal    481.5000     511.0000     536.5000
Transport    719.5000     762.0000     794.0000
Food
2260.000000 0
2342.000000 0
2447.7500000
Wood
2058.000000 0
2143.000000 0
2207.0000000
Textile
710.5000000
748.0000000
777.0000000
Clay
1982.000000 0
2226.000000 0
2471.7500000
Plastics
4626.500000 0
4888.200000 0
5218.0000000
Machiner y
1942.000000 0
2093.000000 0
2375.5000000
Rubber
3709.500000 0
4278.000000 0
4842.3750000
Beverage
1690.000000 0
1838.000000 0
2035.0000000
Furniture
3732.500000 0
3904.000000 0
4223.2500000
Palm
4151.660000 0
7110.000000 0
11751.000000 0
Pharma
2491.500000 0
4247.000000 0
8370.0000000
Miscel
1371.500000 0
1687.500000 0
2371.0000000
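
The two percentile definitions used above, Weighted Average (Definition 1) and Tukey's Hinges, can be sketched as follows. This is a minimal illustration of the commonly documented rules (interpolating at position (n + 1)p, and averaging the order statistics at the hinge depth); the function names are hypothetical and the code is not taken from the original analysis.

```python
import math


def weighted_average_percentile(sorted_values, p):
    """Percentile by interpolation at the 1-based position (n + 1) * p."""
    n = len(sorted_values)
    position = (n + 1) * p
    k = math.floor(position)   # order statistic just below the position
    g = position - k           # fractional part used as the weight
    if k < 1:
        return sorted_values[0]
    if k >= n:
        return sorted_values[-1]
    return (1 - g) * sorted_values[k - 1] + g * sorted_values[k]


def tukey_hinges(sorted_values):
    """Lower and upper hinges: medians of the lower and upper halves."""
    n = len(sorted_values)
    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    lo, hi = math.floor(hinge_depth), math.ceil(hinge_depth)
    lower = (sorted_values[lo - 1] + sorted_values[hi - 1]) / 2
    upper = (sorted_values[n - lo] + sorted_values[n - hi]) / 2
    return lower, upper


if __name__ == "__main__":
    demo = list(range(1, 56))  # placeholder data: 55 ordered values, not survey data
    print(weighted_average_percentile(demo, 0.05), tukey_hinges(demo))
```

As a check against the tables: with 55 cases the 5th percentile sits at position 56 × 0.05 = 2.8, so for the Electrical stratum, whose second and third lowest values are 3680 and 3742, the rule gives 0.2 × 3680 + 0.8 × 3742 = 3729.6, the tabulated figure. The hinges instead average the 14th and 15th ordered values from each end, which is why they can differ slightly from the 25th and 75th percentiles.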
Extreme Values

Each entry gives the value followed by its case number in parentheses; columns 1 to 5 are the rank within the five highest or five lowest cases.

Industry     Extreme   1                   2                   3                   4                   5
Electrical   Highest   5280.00000 (55)     5270.00000 (54)     5245.00000 (53)     5235.00000 (52)     5167.50000 (51)
             Lowest    3675.00000 (1)      3680.00000 (2)      3742.00000 (3)      3793.00000 (4)      3804.00000 (5)
Printing     Highest   1933.00000 (55)     1932.00000 (54)     1926.00000 (53)     1920.00000 (52)     1913.00000 (51)
             Lowest    1604.00000 (1)      1614.00000 (2)      1616.00000 (3)      1617.00000 (4)      1620.00000 (5)
Chemical     Highest   2091.00000 (54)     2091.00000 (55)     2088.38000 (53)     2036.00000 (52)     2026.00000 (51)
             Lowest    1597.00000 (1)      1605.00000 (2)      1613.00000 (3)      1617.00000 (4)      1630.00000 (5)
Iron         Highest   1454.00000 (55)     1438.00000 (54)     1436.00000 (53)     1435.00000 (52)     1425.00000(a) (50)
             Lowest    1286.00000 (1)      1293.00000 (5)      1293.00000 (4)      1293.00000 (3)      1293.00000 (2)
N.F.Metal    Highest   581.00000 (55)      575.00000 (54)      574.00000 (53)      566.00000 (52)      558.00000 (51)
             Lowest    447.00000 (1)       448.00000 (2)       450.00000 (3)       453.00000 (4)       456.00000 (5)
Transport    Highest   860.00000 (55)      859.00000 (54)      853.00000 (53)      839.00000 (52)      837.00000 (51)
             Lowest    689.00000 (1)       691.00000 (2)       692.00000 (3)       693.00000 (4)       694.00000 (5)
Food         Highest   2529.00000 (54)     2529.00000 (55)     2527.00000 (53)     2514.00000 (52)     2513.00000 (51)
             Lowest    2148.00000 (1)      2161.00000 (2)      2181.00000 (3)      2182.00000 (4)      2187.00000 (5)
Wood         Highest   2248.00000 (53)     2248.00000 (54)     2248.00000 (55)     2247.00000 (52)     2246.44000 (51)
             Lowest    1976.00000 (1)      1995.00000 (2)      1997.00000 (3)      1997.50000 (4)      2000.00000(b) (6)
Textile      Highest   811.00000 (55)      810.00000 (54)      806.00000 (53)      803.00000 (52)      802.00000 (51)
             Lowest    682.00000 (1)       685.00000 (2)       692.00000 (4)       692.00000 (3)       693.00000 (5)
Clay         Highest   2806.00000 (55)     2772.00000 (54)     2739.00000 (53)     2724.00000 (52)     2693.00000 (51)
             Lowest    1836.00000 (1)      1846.00000 (2)      1850.00000 (3)      1856.00000 (4)      1860.00000 (5)
Plastics     Highest   5485.00000 (55)     5468.00000 (53)     5468.00000 (54)     5467.00000 (52)     5450.00000 (51)
             Lowest    4389.00000 (1)      4401.00000 (2)      4421.00000 (3)      4461.00000 (4)      4483.00000 (5)
Machinery    Highest   2676.00000 (55)     2619.00000 (54)     2601.00000 (52)     2601.00000 (53)     2590.00000 (51)
             Lowest    1660.00000 (1)      1667.00000 (2)      1678.00000 (3)      1688.00000 (4)      1724.00000 (5)
Rubber       Highest   6034.00000 (55)     5989.00000 (54)     5801.00000 (53)     5706.00000 (52)     5692.00000 (51)
             Lowest    3279.00000 (1)      3310.00000 (2)      3313.00000 (3)      3328.00000 (4)      3374.00000 (5)
Beverage     Highest   2209.00000 (55)     2204.00000 (54)     2182.00000 (52)     2182.00000 (53)     2176.00000 (51)
             Lowest    1554.00000 (1)      1560.00000 (2)      1573.00000 (3)      1593.00000 (4)      1622.00000 (5)
Furniture    Highest   4487.00000 (55)     4470.00000 (54)     4458.22000 (53)     4433.00000 (52)     4425.55000 (51)
             Lowest    3339.00000 (1)      3358.00000 (2)      3364.00000 (3)      3432.00000 (4)      3439.00000 (5)
Palm         Highest   20110.50000 (55)    18634.00000 (54)    18615.00000 (53)    18047.00000 (52)    16714.00000 (51)
             Lowest    1858.00000 (1)      1926.00000 (2)      2177.00000 (3)      2369.00000 (4)      2480.00000 (5)
Pharma       Highest   31755.00000 (55)    29237.00000 (54)    25061.00000 (53)    24816.51000 (52)    23579.00000 (51)
             Lowest    1804.00000 (1)      1809.00000 (2)      1820.00000 (3)      1828.00000 (4)      1845.50000 (5)
Miscel       Highest   2931.00000 (55)     2828.00000 (54)     2784.00000 (53)     2721.50000 (52)     2592.00000 (51)
             Lowest    1144.00000 (1)      1190.00000 (2)      1194.00000 (3)      1198.00000 (4)      1208.50000 (5)

a Only a partial list of cases with the value 1425.00000 are shown in the table of upper extremes.
b Only a partial list of cases with the value 2000.00000 are shown in the table of lower extremes.
APPENDIX 10 – SPSS ANALYSIS OF DOMESTIC CONSUMER SAMPLES

Descriptives

Statistic                            Q6          Q8          Q12         Q14
Mean                                 16.0623     7.1957      15.6387     7.2745
Std. Error of Mean                   .48646      .35700      .47320      .26100
95% Confidence Interval for Mean
    Lower Bound                      15.1083     6.4956      14.7107     6.7626
    Upper Bound                      17.0163     7.8959      16.5667     7.7863
5% Trimmed Mean                      14.2467     5.1044      13.8662     5.4525
Median                               10.0000     3.0000      10.0000     3.0000
Variance                             481.098     259.106     455.219     138.494
Std. Deviation                       21.93395    16.09676    21.33586    11.76837
Minimum                              .00         .00         .00         .00
Maximum                              500.00      520.00      500.00      100.00
Range                                500.00      520.00      500.00      100.00
Interquartile Range                  18.00       9.90        18.00       9.90
Skewness                             6.973       16.983      7.194       3.145
Std. Error of Skewness               .054        .054        .054        .054
Kurtosis                             125.069     509.068     135.886     13.668
Std. Error of Kurtosis               .109        .109        .109        .109
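
The relationship between the standard deviation, the standard error and the confidence interval in these Descriptives tables can be checked with a short calculation. The sketch below assumes the standard error is the standard deviation divided by the square root of the sample size, and uses a normal approximation for the 95% interval; SPSS uses the t distribution, so the bounds agree only to about three decimal places. The function names are illustrative.

```python
from statistics import NormalDist


def implied_sample_size(std_dev, std_error):
    """Sample size implied by SE = SD / sqrt(n)."""
    return (std_dev / std_error) ** 2


def normal_ci(mean, std_error, level=0.95):
    """Confidence interval for the mean using a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_error, mean + z * std_error


if __name__ == "__main__":
    # Figures from the first block above: mean 16.0623, SE .48646, SD 21.93395.
    print(implied_sample_size(21.93395, 0.48646))
    print(normal_ci(16.0623, 0.48646))
```

For the first block the implied sample size is about 2033 respondents, and the approximate interval is within 0.001 of the reported bounds of 15.1083 and 17.0163.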
APPENDIX 11 – SPSS ANALYSIS OF COMMERCIAL CONSUMER SAMPLES

Descriptives

Statistic                            Avg_Case_1       Avg_Case_2        Avg_Case_3
Mean                                 2083.4080        3753.1095         6018.6070
Std. Error of Mean                   645.89899        1255.13498        1674.42030
95% Confidence Interval for Mean
    Lower Bound                      793.8290         1247.1516         2675.5190
    Upper Bound                      3372.9869        6259.0673         9361.6950
5% Trimmed Mean                      1261.8297        2184.6324         3752.0177
Median                               450.0000         900.0000          1860.0000
Variance                             27951428.824     105549376.591     187846783.215
Std. Deviation                       5286.91109       10273.72263       13705.72082
Minimum                              .00              .00               .00
Maximum                              40000.00         80000.00          100000.0
Range                                40000.00         80000.00          100000.00
Interquartile Range                  2450.00          4400.00           6033.33
Skewness                             5.907            6.469             5.394
Std. Error of Skewness               .293             .293              .293
Kurtosis                             41.035           47.359            34.561
Std. Error of Kurtosis               .578             .578              .578

Statistic                            Avg_Case_4       Avg_Case_5        Avg_Case_6
Mean                                 10816.8905       74.6518           1228.8060
Std. Error of Mean                   2510.23574       41.99967          258.94551
95% Confidence Interval for Mean
    Lower Bound                      5805.0432        -9.2033           711.8046
    Upper Bound                      15828.7379       158.5068          1745.8074
5% Trimmed Mean                      7248.1758        10.4755           918.9608
Median                               3500.0000        .0000             250.0000
Variance                             422185991.570    118186.156        4492536.011
Std. Deviation                       20547.16505      343.78213         2119.56033
Minimum                              .00              .00               .00
Maximum                              110000.0         2000.00           10000.00
Range                                110000.00        2000.00           10000.00
Interquartile Range                  11433.33         .00               1220.00
Skewness                             3.544            5.483             2.615
Std. Error of Skewness               .293             .293              .293
Kurtosis                             13.934           29.514            7.513
Std. Error of Kurtosis               .578             .578              .578
APPENDIX 12 – SPSS ANALYSIS OF INDUSTRIAL CONSUMER SAMPLES

Descriptives

Statistic                            Avg_Case_1          Avg_Case_2           Avg_Case_3
Mean                                 184607.6821         342365.1156          613499.5593
Std. Error of Mean                   68928.92954         136260.81998         271212.17388
95% Confidence Interval for Mean
    Lower Bound                      46820.7153          69983.6176           71354.1486
    Upper Bound                      322394.6490         614746.6136          1155644.9699
5% Trimmed Mean                      102809.2412         180535.1667          281923.0259
Median                               20000.0000          40000.0000           74175.5000
Variance                             299325431599.365    1169721696956.344    4634030725581.560
Std. Deviation                       547106.41707        1081536.72936        2152679.89390
Minimum                              .00                 .00                  .00
Maximum                              4200000             8400000              16800000
Range                                4200000.00          8400000.00           16800000.00
Interquartile Range                  170000.00           323333.33            484897.00
Skewness                             6.623               6.913                7.111
Std. Error of Skewness               .302                .302                 .302
Kurtosis                             48.468              51.637               53.698
Std. Error of Kurtosis               .595                .595                 .595

Statistic                            Avg_Case_4            Avg_Case_5         Avg_Case_6
Mean                                 1128597.1185          37370.4769         137347.5631
Std. Error of Mean                   541754.77599          12800.17200        68600.85324
95% Confidence Interval for Mean
    Lower Bound                      45644.7521            11783.2976         216.4114
    Upper Bound                      2211549.4849          62957.6561         274478.7148
5% Trimmed Mean                      468619.1676           21199.4129         48851.7897
Median                               144000.0000           66.6667            15000.0000
Variance                             18490388950329.600    10322197407.015    296482855145.357
Std. Deviation                       4300045.22654         101598.21557       544502.39223
Minimum                              .00                   .00                .00
Maximum                              33600000              720000.0           4200000
Range                                33600000.00           720000.00          4200000.00
Interquartile Range                  780000.00             25000.00           94200.00
Skewness                             7.203                 5.280              6.983
Std. Error of Skewness               .302                  .302               .302
Kurtosis                             54.675                33.494             52.008
Std. Error of Kurtosis               .595                  .595               .595
APPENDIX 13 – STANDARD NORMAL (Z) TABLE

Area between 0 and z

z      0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0    0.0000   0.0040   0.0080   0.0120   0.0160   0.0199   0.0239   0.0279   0.0319   0.0359
0.1    0.0398   0.0438   0.0478   0.0517   0.0557   0.0596   0.0636   0.0675   0.0714   0.0753
0.2    0.0793   0.0832   0.0871   0.0910   0.0948   0.0987   0.1026   0.1064   0.1103   0.1141
0.3    0.1179   0.1217   0.1255   0.1293   0.1331   0.1368   0.1406   0.1443   0.1480   0.1517
0.4    0.1554   0.1591   0.1628   0.1664   0.1700   0.1736   0.1772   0.1808   0.1844   0.1879
0.5    0.1915   0.1950   0.1985   0.2019   0.2054   0.2088   0.2123   0.2157   0.2190   0.2224
0.6    0.2257   0.2291   0.2324   0.2357   0.2389   0.2422   0.2454   0.2486   0.2517   0.2549
0.7    0.2580   0.2611   0.2642   0.2673   0.2704   0.2734   0.2764   0.2794   0.2823   0.2852
0.8    0.2881   0.2910   0.2939   0.2967   0.2995   0.3023   0.3051   0.3078   0.3106   0.3133
0.9    0.3159   0.3186   0.3212   0.3238   0.3264   0.3289   0.3315   0.3340   0.3365   0.3389
1.0    0.3413   0.3438   0.3461   0.3485   0.3508   0.3531   0.3554   0.3577   0.3599   0.3621
1.1    0.3643   0.3665   0.3686   0.3708   0.3729   0.3749   0.3770   0.3790   0.3810   0.3830
1.2    0.3849   0.3869   0.3888   0.3907   0.3925   0.3944   0.3962   0.3980   0.3997   0.4015
1.3    0.4032   0.4049   0.4066   0.4082   0.4099   0.4115   0.4131   0.4147   0.4162   0.4177
1.4    0.4192   0.4207   0.4222   0.4236   0.4251   0.4265   0.4279   0.4292   0.4306   0.4319
1.5    0.4332   0.4345   0.4357   0.4370   0.4382   0.4394   0.4406   0.4418   0.4429   0.4441
1.6    0.4452   0.4463   0.4474   0.4484   0.4495   0.4505   0.4515   0.4525   0.4535   0.4545
1.7    0.4554   0.4564   0.4573   0.4582   0.4591   0.4599   0.4608   0.4616   0.4625   0.4633
1.8    0.4641   0.4649   0.4656   0.4664   0.4671   0.4678   0.4686   0.4693   0.4699   0.4706
1.9    0.4713   0.4719   0.4726   0.4732   0.4738   0.4744   0.4750   0.4756   0.4761   0.4767
2.0    0.4772   0.4778   0.4783   0.4788   0.4793   0.4798   0.4803   0.4808   0.4812   0.4817
2.1    0.4821   0.4826   0.4830   0.4834   0.4838   0.4842   0.4846   0.4850   0.4854   0.4857
2.2    0.4861   0.4864   0.4868   0.4871   0.4875   0.4878   0.4881   0.4884   0.4887   0.4890
2.3    0.4893   0.4896   0.4898   0.4901   0.4904   0.4906   0.4909   0.4911   0.4913   0.4916
2.4    0.4918   0.4920   0.4922   0.4925   0.4927   0.4929   0.4931   0.4932   0.4934   0.4936
2.5    0.4938   0.4940   0.4941   0.4943   0.4945   0.4946   0.4948   0.4949   0.4951   0.4952
2.6    0.4953   0.4955   0.4956   0.4957   0.4959   0.4960   0.4961   0.4962   0.4963   0.4964
2.7    0.4965   0.4966   0.4967   0.4968   0.4969   0.4970   0.4971   0.4972   0.4973   0.4974
2.8    0.4974   0.4975   0.4976   0.4977   0.4977   0.4978   0.4979   0.4979   0.4980   0.4981
2.9    0.4981   0.4982   0.4982   0.4983   0.4984   0.4984   0.4985   0.4985   0.4986   0.4986
3.0    0.4987   0.4987   0.4987   0.4988   0.4988   0.4989   0.4989   0.4989   0.4990   0.4990

Source: Frank, H. & Althoen, S.C., “Statistics Concepts and Applications”, Cambridge University Press, 1994, pp. 724-725.
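
Entries of this table can be reproduced numerically, since the area under the standard normal curve between 0 and z equals one half of erf(z/√2). The sketch below is a minimal illustration using only the Python standard library; the function name is hypothetical.

```python
import math


def area_between_zero_and(z: float) -> float:
    """Area under the standard normal density between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    # z = 1.96 is the value that leaves 2.5% in each tail.
    print(round(area_between_zero_and(1.96), 4))
```

For example, z = 1.96 gives 0.4750 and z = 1.00 gives 0.3413, in agreement with the tabulated entries.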
APPENDIX 14 – LIST OF PUBLICATIONS

[1] A.H. Hashim, D.A. Sen, R.A. Rahman, “Value of Lost Load - A Critical Parameter for Optimum Utility Asset Investment”, in Proc. Uniten SCOReD 2005, December 2005.
[2] A.H. Hashim, Z.F. Hussein, D.A. Sen, et al., “Stratification & Sampling of Electricity Supply Customers for Outage Costs Survey”, in Proc. IEEE PECon 2006, November 2006.