
STRATIFICATION & SAMPLING OF ELECTRICITY CONSUMER DATA FOR OUTAGE COST DETERMINATION

DANIEL ANDREW SEN

COLLEGE OF GRADUATE STUDIES UNIVERSITI TENAGA NASIONAL 2007

STRATIFICATION & SAMPLING OF ELECTRICITY CONSUMER DATA FOR OUTAGE COST DETERMINATION

by

DANIEL ANDREW SEN

Project Supervisor: Dr Amir Hisham Hashim

A Dissertation Submitted in Partial Fulfillment of the Requirement for the Degree of Master of Electrical Engineering, College of Graduate Studies, Universiti Tenaga Nasional

JULY 2007

COPYRIGHT © 2007 Attention is drawn to the fact that copyright of this dissertation rests with its author.


APPROVAL SHEET

This dissertation, entitled:

“STRATIFICATION & SAMPLING OF ELECTRICITY CONSUMER DATA FOR OUTAGE COST DETERMINATION”

Submitted by: Daniel Andrew Sen (SE 20240)

in partial fulfillment of the requirement for the degree of Master of Electrical Engineering, College of Graduate Studies, Universiti Tenaga Nasional, has been accepted.

Project Supervisor: Dr. Amir Hisham Hashim

1 July 2007


DECLARATION

I hereby declare that this dissertation, submitted to Universiti Tenaga Nasional as a partial fulfillment of the requirements for the degree of Master of Electrical Engineering has not been submitted as an exercise for a similar degree at any other university. I also certify that the work described here is my own except for excerpts and summaries whose sources are appropriately cited in the references. This dissertation may be made available within the university library and may be photocopied or loaned to other libraries for the purposes of consultation.

1 July 2007

Daniel Andrew Sen


DEDICATION

To my family for their love and support throughout the time I was writing this dissertation.


ACKNOWLEDGEMENT

Every day you may make progress. Every step may be fruitful. Yet there will stretch out before you an ever-lengthening, ever-ascending, ever-improving path. You know you will never get to the end of the journey. But this, so far from discouraging, only adds to the joy and glory of the climb.

– Winston Churchill –

Completing the project work and writing this dissertation was a unique and challenging task. The experience has brought out the true meaning of excellence, perseverance, and teamwork. Many people contributed their help, thoughts, ideas, and support during the preparation of this dissertation.

Specifically, I would like to thank my supervisor, Dr. Amir Hisham Hashim, whose keen guidance led us through the design, implementation, analysis, and finally, the compilation of this dissertation. I am also thankful for my research team members’ dedication and creativity in accomplishing our research objectives.

Finally, I wish to express my heartfelt love and gratitude to my family for the many sacrifices they made to allow me the opportunity to complete my Master’s degree. And above all else, I thank God for His continued wisdom and sustenance.

Daniel Andrew Sen
Universiti Tenaga Nasional
1 July 2007


ABSTRACT

In today’s demanding business environment, determining the Value of Lost Load (VoLL) is important for utilities to make the right decisions when they embark on any form of asset expansion or assess risks in power system operation. The VoLL is the aggregated or average value of outage costs across the whole range of consumers in the electricity supply industry (ESI).

This dissertation covers the initial process in determining the VoLL from consumer survey data. It discusses the data stratification method used to categorize consumers according to a set of criteria, followed by the normalization process of consumer survey data. The questionnaire development process is then discussed with respect to the various consumer strata. Finally, a demonstration of the stratification method on consumers is shown and evaluated. Consumers are first stratified into three major strata: Domestic, Commercial, and Industrial. Each major stratum is then stratified into minor strata.

The sample size is then calculated for each stratum for a confidence interval (CI) of 90% and a precision factor r of 0.1.


TABLE OF CONTENTS

Approval Sheet
Declaration
Dedication
Acknowledgement
Abstract
Table of Contents
List of Tables
List of Figures
List of Equations
Glossary of Terms

CHAPTER 1 Introduction
1.1 Preface
1.2 Background of Asset Optimization
1.3 Objectives
1.4 Brief Methodology
1.5 Scope of Work
1.6 Layout of Dissertation

CHAPTER 2 Determination of Value of Lost Load
2.1 Determination of Consumer Damage Function
2.2 Determination of Value of Loss Load
2.2.1 VoLL Formulation
2.3 Tenaga Nasional Berhad Business Code
2.4 TNB Consumers in Peninsular Malaysia
2.4.1 Domestic Consumers
2.4.2 Commercial Consumers
2.4.3 Industrial Consumers

CHAPTER 3 Statistical Analysis in Engineering
3.1 Statistical Analysis
3.1.1 Etymology
3.1.2 Origins in probability
3.1.3 Statistics today
3.1.4 Conceptual overview
3.1.5 Statistical methods
3.1.5.1 Experimental and observational studies
3.1.5.2 Levels of measurement
3.2 Statistical techniques
3.2.1 Specialized disciplines
3.2.2 Criticism
3.3 Normal Distribution
3.3.1 Overview
3.3.2 History
3.3.3 Characterization of the normal distribution
3.3.4 Probability Density Function
3.3.5 Cumulative Distribution Function
3.3.6 Generating functions
3.3.6.1 Moment generating function
3.3.6.2 Cumulant generating function
3.3.6.3 Characteristic function
3.3.7 Properties
3.3.8 Standardizing Normal Random Variables
3.3.9 Moments
3.3.10 Generating Values for Normal Random Variables
3.3.11 The Central Limit Theorem
3.3.12 Infinite divisibility
3.3.13 Stability
3.3.14 Standard deviation
3.3.15 Normality tests
3.3.16 Occurrence
3.3.17 Measurement errors
3.3.18 Physical characteristics of biological specimens
3.3.19 Financial variables
3.3.20 Distribution in testing and intelligence
3.3.21 Numerical approximations of the normal distribution
3.4 Confidence Interval
3.4.1 Practical Example
3.4.2 Theoretical Example
3.4.3 Interpretations of Confidence Intervals
3.4.4 Confidence Intervals in Measurement
3.4.5 Robust Confidence Intervals
3.4.6 Confidence Intervals for Proportions and Related Quantities
3.5 Boxplot
3.5.1 Construction
3.5.2 An Example
3.5.3 Visualization
3.6 Outliers
3.6.1 An Example
3.6.2 Mild outliers
3.6.3 Extreme outliers
3.6.4 Occurrence and causes
3.6.5 Non-normal distributions
3.7 Extreme Values
3.7.1 Extreme values in abstract spaces with order
3.8 Probability of a z value
3.8.1 Critical z for a given probability
3.9 Precision Factor

CHAPTER 4 Determination of Consumer Stratification & Sample Size
4.1 Introduction
4.2 Stratification
4.2.1 Stratified Sampling
4.3 Preprocessing of Population Data with SPSS Software
4.4 Sampling with SPSS Software
4.4.1 Domestic Consumers
4.4.2 Commercial Consumers
4.4.3 Industrial Consumers
4.5 Normalization of Consumer Survey Data
4.6 Preprocessing of Sample Data
4.6.1 Domestic Consumer Data
4.6.2 Commercial Consumer Data
4.6.3 Industrial Consumer Data

CHAPTER 5 Conclusion

References
Appendix 1 – TNB Business Codes
Appendix 2 – List of Commercial Consumers Interviewed
Appendix 3 – List of Industrial Consumers Interviewed
Appendix 4 – Domestic Questionnaire
Appendix 5 – Commercial Questionnaire
Appendix 6 – Industrial Questionnaire
Appendix 7 – SPSS Analysis of Domestic Consumer Population
Appendix 8 – SPSS Analysis of Commercial Consumer Population
Appendix 9 – SPSS Analysis of Industrial Consumer Population
Appendix 10 – SPSS Analysis of Domestic Consumer Samples
Appendix 11 – SPSS Analysis of Commercial Consumer Samples
Appendix 12 – SPSS Analysis of Industrial Consumer Samples
Appendix 13 – Standard Normal (Z) Table
Appendix 14 – List of Publications


LIST OF TABLES

Table 1: Major Differences between Probabilistic Risk and Deterministic Risk
Table 2: Probability Distributions
Table 3: Some of the first few moments of the normal distribution
Table 4: Number of Samples needed for each Major Strata (90% CI, r=0.1)
Table 5: List of Domestic Business Codes
Table 6: Number of Sample Selection for a Proportional Stratified Random Sample
Table 7: List of Commercial Business Codes
Table 8: Boundaries between cells
Table 9: Number of Sample Selection for a Proportional Stratified Random Sample
Table 10: Domestic Statistics Range
Table 11: Commercial Statistics Range
Table 12: Industrial Statistics Range


LIST OF FIGURES

Figure 1: Utility Asset Investment: Balancing Reliability Cost and Reliability Worth
Figure 2: Example Consumer Damage Functions
Figure 3: Example of determination of VoLL
Figure 4: A Graph of a Bell Curve in a Normal Distribution
Figure 5: PDF of Gaussian distribution (bell curve)
Figure 6: CDF of Gaussian distribution
Figure 7: Plot of the PDF of a normal distribution with μ = 12 and σ = 3
Figure 8: Standard Deviation Segment Sizes in Standard Normal Distribution
Figure 9: Histogram with CIs
Figure 10: Figure shows 50 realizations of a confidence interval for μ
Figure 11: Boxplot and PDF of a Normal N(0, 1σ²) Population
Figure 12: An example of an outlier in a histogram
Figure 13: An example of an outlier in a scatterplot
Figure 14: Classification of Consumers by kWh Consumption Year 2005
Figure 15: Negative skew (left diagram) and positive skew (right diagram)
Figure 16: Kurtosis Factor Impact on a Normal Distribution
Figure 17: Venn diagram for Tariff Classification
Figure 18: Industrial Stratification Method
Figure 19: Industrial Stratification Process Flow
Figure 20: Formation of Boundaries between Cells
Figure 21: Allocation scheme
Figure 22: Final sample
Figure 23: The bell curve that shows the normal distribution from sample data
Figure 24: Illustrates the normalization process of a survey VoLL value
Figure 25: SPSS Boxplot of Domestic Consumer Population (CI = 90%)
Figure 26: SPSS Boxplot of Commercial Consumer Population (CI = 90%)
Figure 27: SPSS Boxplot of Industrial Consumer Population (CI = 90%)
Figure 28: Outage Cost vs. Reliability Curve


LIST OF EQUATIONS

Equation 1: $\text{Loss (\$/kW)} = f(\text{duration}, \text{season}, \text{time of day}, \text{notice})$

Equation 2: $\text{Outage Costs (RM)} = \dfrac{\text{WTA} + \text{WTP}}{2}$

Equation 3: $\text{VoLL} = \dfrac{a_1 \cdot \text{Cust}_1 + a_2 \cdot \text{Cust}_2 + a_3 \cdot \text{Cust}_3 + \dots + a_n \cdot \text{Cust}_n}{n}$

Equation 4: $f(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) = \dfrac{1}{\sigma}\,\varphi\left(\dfrac{x-\mu}{\sigma}\right)$

Equation 5: $\varphi(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$

Equation 6: $F(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{-\infty}^{x} \exp\left(-\dfrac{(u-\mu)^2}{2\sigma^2}\right) du$

Equation 7: $\Phi(x) = F(x; 0, 1) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{x} \exp\left(-\dfrac{u^2}{2}\right) du$

Equation 8: $\Phi(z) = \dfrac{1}{2}\left[1 + \operatorname{erf}\left(\dfrac{z}{\sqrt{2}}\right)\right]$

Equation 9: $\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1)$

Equation 10: $M_X(t) = E\left[\exp(tX)\right] = \displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) \exp(tx)\, dx = \exp\left(\mu t + \dfrac{\sigma^2 t^2}{2}\right)$

Equation 11: $\chi_X(t; \mu, \sigma) = M_X(it; \mu, \sigma) = E\left[\exp(itX)\right] = \displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) \exp(itx)\, dx = \exp\left(i\mu t - \dfrac{\sigma^2 t^2}{2}\right)$

Equation 12: $Z = \dfrac{X - \mu}{\sigma}$

Equation 13: $\Pr(X \le x) = \Phi\left(\dfrac{x-\mu}{\sigma}\right) = \dfrac{1}{2}\left[1 + \operatorname{erf}\left(\dfrac{x-\mu}{\sigma\sqrt{2}}\right)\right]$

Equation 14: $X = \sigma Z + \mu$

Equation 15: $c = \sqrt{-2 \ln a} \cdot \cos(2\pi b)$

Equation 16: $\operatorname{erf}\left(\dfrac{n}{\sqrt{2}}\right)$

Equation 17: $\Pr(U < \theta < V \mid \theta) = x$

Equation 18: $\hat{\mu} = \bar{X} = \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} X_i$

Equation 19: $\bar{x} = \dfrac{1}{25} \displaystyle\sum_{i=1}^{25} x_i = 250.2 \text{ (grams)}$

Equation 20: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{X} - \mu}{0.5}$

Equation 21: $P(-z \le Z \le z) = 1 - \alpha = 0.95$

Equation 22: $z = \Phi^{-1}(\Phi(z)) = \Phi^{-1}(0.975) = 1.96$

Equation 23: $\cdots = P(\bar{X} - 0.98 \le \mu \le \bar{X} + 0.98)$

Equation 24: $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

Equation 25: $\Pr\left(\bar{X} - \dfrac{cS}{\sqrt{n}} < \mu < \bar{X} + \dfrac{cS}{\sqrt{n}}\right) = 0.9$

Equation 26: $\left(\bar{x} - \dfrac{cS}{\sqrt{n}};\ \bar{x} + \dfrac{cS}{\sqrt{n}}\right)$

Equation 27: $\Pr(-1.645 < X - \theta < 1.645) = 0.9$

Equation 28: $\Pr(X - 1.645 < \theta < X + 1.645) = 0.9$

Equation 29: $\Pr(82 - 1.645 < \theta < 82 + 1.645) = 0.9$

Equation 30: $< Q1 - 1.5 \times IQR$

Equation 31: $> Q3 + 1.5 \times IQR$

Equation 32: $< Q1 - 3 \times IQR$

Equation 33: $> Q3 + 3 \times IQR$

Equation 34: z-value = 1.96

Equation 35: z-value = 1.64

Equation 36: z-value = 1.881

Equation 37: $r = \dfrac{\sigma \cdot z \cdot \dfrac{1}{\sqrt{n}}}{\mu}$

Equation 38: $\sigma = \sqrt{\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} (X_i - \bar{X})^2}$

Equation 39: $\bar{X} = \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} X_i$

Equation 40: $\text{VoLL}_{\text{stratum normalized}} = \text{VoLL}_{\text{survey}} \times \dfrac{\text{kWh}_{\text{stratum mean}}}{\text{kWh}_{\text{survey}}}$
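
To illustrate how Equation 37 (precision factor) and Equation 40 (normalization) might be applied, the following is a minimal Python sketch. It assumes that Equation 37 reads r = σ·z/(μ·√n), so that the required sample size is n = (z·σ/(r·μ))², and it obtains z from the standard normal distribution rather than from a printed table. The function names and the numerical inputs (a stratum mean of 500 kWh with a standard deviation of 400 kWh) are hypothetical and are not taken from the survey data.

```python
import math
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """Two-sided critical z value for a confidence level such as 0.90."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def required_sample_size(mean: float, std_dev: float,
                         confidence: float = 0.90, r: float = 0.1) -> int:
    """Smallest n satisfying z * std_dev / (mean * sqrt(n)) <= r.

    Assumes Equation 37 has the form r = sigma * z / (mu * sqrt(n)).
    """
    z = critical_z(confidence)
    return math.ceil((z * std_dev / (r * mean)) ** 2)


def normalize_voll(voll_survey: float, kwh_stratum_mean: float,
                   kwh_survey: float) -> float:
    """Equation 40: scale a surveyed VoLL to the stratum mean consumption."""
    return voll_survey * kwh_stratum_mean / kwh_survey


if __name__ == "__main__":
    # Hypothetical stratum: mean 500 kWh, standard deviation 400 kWh.
    print(required_sample_size(mean=500.0, std_dev=400.0))  # 174
    # Hypothetical survey response from a consumer using 300 kWh.
    print(normalize_voll(10.0, kwh_stratum_mean=600.0, kwh_survey=300.0))  # 20.0
```

Rounding up with `math.ceil` keeps the achieved precision at or below the target r; a more variable stratum (larger σ relative to μ) requires proportionally more samples.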


GLOSSARY OF TERMS

Adequacy

The ability of the electric system to supply the aggregate electrical demand and energy requirements of the consumers from various electric generation suppliers at all times, taking into account scheduled and reasonably expected unscheduled outages of system elements.

ASIFI

Average System Interruption Frequency Index [1]. It is an index that uses the load interrupted rather than the number of consumers interrupted. It is therefore a measure of the expected number of times load is interrupted during the specified interval of time. ASIFI for a system may be computed as:

$\text{ASIFI} = \dfrac{\sum L_i}{L_T}$

where $L_i$ is the load interrupted due to each outage and $L_T$ is the total load connected to the system under consideration.

CDF

Consumer Damage Function. The CDF relates the magnitude of consumer losses (RM/kWh interrupted) to the duration of a power outage.
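
As an illustration of the ASIFI entry above, the ratio of total interrupted load to total connected load can be sketched in Python as follows. The function name, the use of kVA as the load unit, and the outage figures are assumptions made for this example only.

```python
from typing import Iterable


def asifi(interrupted_loads_kva: Iterable[float],
          total_connected_load_kva: float) -> float:
    """ASIFI = (sum of load interrupted per outage) / (total connected load)."""
    if total_connected_load_kva <= 0:
        raise ValueError("total connected load must be positive")
    return sum(interrupted_loads_kva) / total_connected_load_kva


if __name__ == "__main__":
    # Hypothetical reporting period: three outages on a 10,000 kVA system.
    print(asifi([500.0, 1250.0, 750.0], 10_000.0))  # 0.25
```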

CI

Confidence Interval. In statistics, a CI for a population parameter is an interval between two numbers with an associated probability p which is generated from a random sample of an underlying population, such that if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question. Confidence intervals are the most prevalent form of interval estimation.
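
The repeated-sampling interpretation in the CI entry above can be checked numerically. The sketch below, in Python, draws many samples from a normal population with a known standard deviation, builds a 90% interval around each sample mean, and reports the proportion of intervals that contain the true mean. The population parameters (mean 250, standard deviation 2.5, sample size 25), the number of trials, and the seed are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean: float, sigma: float, n: int,
             confidence: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * sigma / n ** 0.5  # known-sigma interval half-width
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The printed proportion should be close to the nominal 0.90.
    print(coverage(true_mean=250.0, sigma=2.5, n=25,
                   confidence=0.90, trials=2000))
```

The simulated proportion fluctuates around the nominal level and approaches it as the number of trials grows, which is the sense in which "a proportion p of the confidence intervals would contain the population parameter".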

[1] http://72.14.205.104/search?q=cache:FECoyC0oBjsJ:www.ee.iastate.edu/~jdm/ee653/DistributionReliabilityFundamentals.doc+asifi+index+definition&hl=en&ct=clnk&cd=7, Retrieved 20 December 2006


Circuit

A conductor or system of conductors through which an electric current is intended to flow.

EENS

Expected Energy Not Supplied

EGAT

Electricity Generating Authority of Thailand. EGAT presently builds, owns and operates several types and sizes of power plants across Thailand with a combined installed capacity of 15,035.80 MW, accounting for about 59 percent of Thailand’s 25,646.99 MW generating capacity. EGAT also purchases electric power from private power companies and neighboring countries.

Energy Intensity

Ratio of energy consumption to economic or physical output. At the national level, energy intensity is the ratio of total domestic primary energy consumption or final energy consumption to gross domestic product or physical output [2].

EPRI

Electric Power Research Institute (of the United States of America). EPRI brings together members, participants, the Institute’s scientists and engineers, and other leading experts to work collaboratively on solutions to the challenges of electric power. These solutions span nearly every area of electricity generation, delivery, and use, including health, safety, and environment.

ESI

Electricity Supply Industry. The ESI consists of electricity utility companies involved in the generation, transmission, and distribution of electricity in a specific region. It includes state-owned companies, IPPs, and co-generators.

FMM

Federation of Malaysian Manufacturers. FMM was established in 1968, and strives to lead Malaysian manufacturers in spearheading the nation’s growth and modernization. Today, as the largest private sector economic organization in Malaysia representing over 2,000 manufacturing and industrial service companies of varying sizes, the FMM is the officially recognized and acknowledged voice of the industry in Malaysia.

[2] Intergovernmental Panel on Climate Change, http://glossary.eea.europa.eu/EEAGlossary/E/energy_intensity, Retrieved 8 January 2007.


IPP

Independent Power Producer.

IQR

InterQuartile Range. It is the difference between the third and first quartiles and is a measure of statistical dispersion. The interquartile range is a more stable statistic than the range, and is often preferred to that statistic.
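
As a brief illustration of the IQR entry above, the Python sketch below computes the quartiles and IQR of a small data set and derives the mild (1.5 × IQR) and extreme (3 × IQR) outlier fences listed in Equations 30 to 33. The data values are hypothetical, and the quartile convention used (the "inclusive" method of the Python standard library) is an assumption; other packages, including SPSS, may use a different convention and return slightly different quartiles.

```python
from statistics import quantiles


def iqr_fences(values):
    """Return Q1, Q3, IQR and the mild/extreme outlier fences."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "mild": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "extreme": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(value, fences):
    """Label a value as an 'extreme' outlier, a 'mild' outlier, or 'none'."""
    lo_ext, hi_ext = fences["extreme"]
    lo_mild, hi_mild = fences["mild"]
    if value < lo_ext or value > hi_ext:
        return "extreme"
    if value < lo_mild or value > hi_mild:
        return "mild"
    return "none"


if __name__ == "__main__":
    # Hypothetical consumption readings (units arbitrary).
    data = [1, 3, 4, 4, 5, 6, 6, 7, 8, 8, 9, 16, 30]
    fences = iqr_fences(data)
    print(fences["iqr"])  # 4.0
    print([(x, classify(x, fences)) for x in (6, 16, 30)])
```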

kW

kilowatt. The watt (symbol: W) is the SI derived unit of power, equal to one joule per second. 1 kW is equal to one thousand (10³) watts.

kWh

kilowatt-hour. The watt-hour (symbol W·h) is a unit of energy. It is commonly used in the form of the kilowatt-hour, which is 1,000 watt-hours. It is a commonly used unit, especially for measuring electric energy.

MITI

Ministry of International Trade and Industry (of Malaysia). MITI plans, formulates, and implements policies on industrial development, international trade, and investment in Malaysia. It encourages foreign and domestic investment and promotes Malaysia’s export of manufacturing products and services.

MMBTU

The British thermal unit (BTU or Btu) is a unit of energy used in North America. It is also still occasionally encountered in the United Kingdom, in the context of older heating and cooling systems. In most other areas, it has been replaced by the SI unit of energy, the joule (J). In the United States, the term “BTU” is used to describe the heat value (energy content) of fuels, and also to describe the power of heating and cooling systems, such as furnaces, stoves, barbecue grills, and air conditioners. When used as a unit of power, BTU per hour is understood, though this is often confusingly abbreviated to just “BTU”. The unit MBTU was defined as one thousand BTU presumably from the Roman numeral system where “M” stands for one thousand (1,000). There is currently a social push to redefine MBTU as one million (1,000,000) BTU, thus making the unit more intuitive with metric system that uses “M” to mean mega, or 10⁶. To avoid confusion many companies and engineers use MMBTU to represent one million (1,000,000) BTU. In natural gas, by convention 1 MMBtu (1 million Btu, sometimes written “mmBTU”) = 1.054615 GJ. Conversely, 1 gigajoule is equivalent to 26.8 m³ of natural gas at defined temperature and pressure [3].

NAICS

North American Industry Classification System. The NAICS has replaced the SIC system. NAICS was developed jointly by the U.S., Canada, and Mexico to provide new comparability in statistics about business activity across North America [4].

NERC

North American Electric Reliability Council – An organization of regional reliability councils established to promote the reliability of the electricity supply for North America.

ORP

Optimal Reliability Point

PDF

Probability Density Function

POS

Point of Sale. This can mean a retail shop, a checkout counter in a shop, or a variable location where a transaction occurs in this type of environment. Additionally, POS sometimes refers to the electronic cash register system being used in an establishment. POS systems are used in restaurants, hotels, stadiums, casinos, as well as retail environments – in short, if something can be sold, it can be sold where a point of sale system is in use [5].

PTM

Malaysian Energy Center. PTM is an independent and non-profit organization devoted to energy research in Malaysia administered by the Ministry of Energy, Water and Communications, Malaysia. PTM’s core activities are energy planning and research, renewable energy (RE), energy efficiency (EE) and related technological research development and demonstration (RD&D) undertaken in the energy sector. The responsibilities also include data gathering, compilation and strategic/policy analysis, a think-tank group for the government as well as becoming a one-stop energy agency for linkages with the universities, research institutions, industries and other various national and international organizations.

[3] http://en.wikipedia.org/wiki/MMBTU, Retrieved 17 January 2007.

[4] http://www.census.gov/epcd/www/naics.html, Retrieved 17 January 2007.

[5] http://en.wikipedia.org/wiki/Point_of_sale, Retrieved 17 January 2007.

Q1

Quartile 1 or first quartile or lower quartile. It cuts off the lowest 25% of data.

Q2

Quartile 2 or second quartile or median. It cuts the data set in half.

Q3

Quartile 3 or third quartile or upper quartile. It cuts off the highest 25% of data.

Reliability

The degree of performance of the elements of an electric system that results in electricity being delivered to consumers within accepted standards and in the desired amount, measured by the frequency, duration, and magnitude of adverse effects on the electric supply and by considering two basic and functional aspects of the electric system: adequacy and security.

Reliability indices

Service performance indicators which measure the frequency, duration, and magnitude of consumer interruptions, excluding outages associated with major events.

SAIDI

System Average Interruption Duration Index [6]. The average duration of sustained consumer interruptions per consumer occurring during the analysis period. It is the average time consumers were without power. It is determined by dividing the sum of all sustained consumer interruption durations, in minutes, by the total number of consumers served. This determination is made by using the following equation:

$$\mathrm{SAIDI} = \frac{\sum r_i N_i}{N_T}$$

where: r_i = restoration time, in minutes, for interruption event i; N_i = number of consumers interrupted by event i; N_T = total number of consumers served for the area being indexed

SAIFI

System Average Interruption Frequency Index. The average frequency of sustained interruptions per consumer occurring during the analysis period. It is calculated by dividing the total number of sustained consumer interruptions by the total number of consumers served. This determination is made by using the following equation:

$$\mathrm{SAIFI} = \frac{\sum N_i}{N_T}$$

where: N_i = number of consumers interrupted by event i; N_T = total number of consumers served for the area being indexed

[6] http://www.pacode.com/secure/data/052/chapter57/s57.192.html, Retrieved 6 November 2006.

Security

The ability of the electric system to withstand sudden disturbance such as electric short circuits or unanticipated loss of system elements.

SIC

Standard Industrial Classification. The SIC was a United States government system for classifying industries by a four-digit code. Established in the 1930s, it is being supplanted by the six-digit NAICS, which was released in 1997; however certain government departments and agencies, such as the U.S. Securities and Exchange Commission (SEC), still use the SIC codes [7].

TNB

Tenaga Nasional Berhad (of Malaysia). TNB’s core activities are in the generation, transmission, and distribution of electricity. The TNB Group has a complete power system, including the National Grid, Consumer Service Centers, Call Management Centers, and administration offices throughout Peninsular Malaysia and Sabah. It is the largest electricity utility company in Malaysia, with assets worth more than RM60 billion, serving over six million consumers throughout Peninsular Malaysia and Sabah.

VoLL

Value of Lost Load. The VoLL is the aggregated or average value of outage costs across the whole range of consumers in the ESI.

WTA

Willingness-to-Accept. An approach to determine how much the consumers would be willing to accept in compensation for an outage that has occurred.

WTP

Willingness-to-Pay. An approach to determine how much the consumers are willing to pay to avoid an outage.

[7] http://en.wikipedia.org/wiki/Standard_Industrial_Classification, Retrieved 17 January 2007.

CHAPTER 1 INTRODUCTION

1.1 Preface

Utilities are responsible for the generation, transmission, and distribution of electricity to consumers. Part of this responsibility is ensuring that system adequacy and security criteria are fulfilled. However, this must be balanced against investment and operating costs, which are increasingly important for remaining competitive. Utility planning has traditionally been based on the electricity load demand forecast. The demand for electricity initiates actions by the utilities to add or retire generation, transmission, or distribution assets [1]. Retiring assets can be done fairly quickly; however, a long lead time is required to plan and construct new utility equipment. Decisions on the need for a new utility plant may have to be made 2 to 10 years in advance [1]. These long lead times require that the utility planning horizon be at least 10 years. Since utility decisions involve an economic analysis of the operating and investment costs, the utility planning horizon may range from 15 to 30 years into the future. Forecasts over such long horizons are quite a challenge in light of the uncertainties in national, regional, and local economic growth, coupled with uncertainties in electricity usage patterns and conservation trends [1]. The ultimate goal, however, is to take all these uncertainties properly into account so that planning continues to deliver system adequacy and security in the long term. To this end, system reliability provides a good yardstick for determining the effectiveness of policies executed during the planning stage.


System reliability is a central criterion during the planning stage. Reliability is the need to provide both system adequacy and security. Adequacy is the existence of sufficient facilities such as generators, lines, and control systems within a system to satisfy consumer demand, whereas system security is the ability of the system to withstand a disturbance such as the loss of a generator or a lightning strike. While these criteria have served the ESI well in the past, the present environment requires a balance between the planning and operation criteria, and the economic value consumers assign to reliability in establishing target reliability levels [2].

Consumers’ needs range from those who would not mind paying a premium for a highly reliable supply (because they would suffer a very large loss when there are disruptions in their power supply) to the vast majority of consumers who do not mind tolerating outages in exchange for lower prices. Often, there is a mix of consumers with various reliability needs located in close geographic proximity. This mix of consumers, who require various levels of reliability, complicates the process of utility planning. Conventional criteria, such as electricity load demand forecast, can lead to investments that are economically inefficient. Planners may inadvertently overinvest in generation, transmission, and distribution facilities that provide greater reliability than that required by consumers. This leads to higher prices, which consumers are unwilling to pay. Correspondingly, under-investment in facilities serving consumers requiring high reliability will lead to unsatisfied consumers who would be willing to pay more for a higher level of reliability.

Therefore, it can be quite hard for the utility to decide on the optimum reliability level. One method of optimization is value-based planning, which matches the level of investment in reliability with consumers’ reliability preferences [2]. Taking the economics into account during the planning stage allows utilities to optimize their investment by investing in assets to boost reliability where consumers expect higher reliability levels than the status quo.

1.2 Background of Asset Optimization

Optimization is the discipline which is concerned with finding the maxima and minima of functions, possibly subject to constraints [3]. When applied to asset investment in the ESI, optimization is the process of trying to maximize the utility’s profit while, at the same time, minimizing the cost of adequacy and security. Planning engineers should constantly analyze and manage risks and the costs associated with those risks. However, in the Malaysian ESI, this activity is very limited [4]. Generally, planning engineers put more emphasis on the technical requirements of adequacy and security with minimal consideration of the cost incurred for the added reliability. To enable accurate reflection of a particular type of risk in utility planning, it is necessary to first analyze and quantify that risk, then associate it with a cost penalty. Two common methods of analyzing risks are the probabilistic approach and the deterministic approach [5]. Table 1 illustrates some of the major differences between deterministic and probabilistic estimates.

Table 1: Major Differences between Probabilistic Risk and Deterministic Risk

Probabilistic Risk Assessment
• Takes into account all available information and considers the probability of an occurrence.
• The risk estimate is expressed as a distribution of values, with a probability assigned to each value.
• The distribution reflects variability and uncertainty.

Deterministic Risk Assessment
• This risk estimate is expressed as a point value.
• The variability and uncertainty of this value are not reflected.
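The contrast summarized in Table 1 can be illustrated with a minimal Python sketch. The scenario costs and probabilities below are hypothetical, and taking the worst-case contingency as the deterministic point value is an assumption made for illustration rather than a method stated in this work.

```python
# Hypothetical outage-cost scenarios for one consumer: (cost in RM, probability).
# These numbers are illustrative only and do not come from the survey.
scenarios = [(500.0, 0.70), (2_000.0, 0.25), (10_000.0, 0.05)]


def deterministic_estimate(scenarios):
    """Point value: here assumed to be the cost of the worst-case contingency."""
    return max(cost for cost, _ in scenarios)


def probabilistic_estimate(scenarios):
    """Expected cost and standard deviation of the cost distribution."""
    total_p = sum(p for _, p in scenarios)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(cost * p for cost, p in scenarios)
    variance = sum(p * (cost - mean) ** 2 for cost, p in scenarios)
    return mean, variance ** 0.5


point_value = deterministic_estimate(scenarios)
expected_cost, spread = probabilistic_estimate(scenarios)
print(f"Deterministic point value: RM {point_value:,.2f}")
print(f"Probabilistic estimate: RM {expected_cost:,.2f} +/- RM {spread:,.2f}")
```

The deterministic result carries no information about how likely the scenario is, whereas the probabilistic result reports both a central value and its spread.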

The probabilistic approach aims to determine the probability of an event occurrence using statistical data. In this process, a sampling method is selected. To avoid bias, researchers employ random sampling procedures [6]. Through statistical methods, for a given sample size, the mean, standard deviation, coefficient of variation, and the level of precision can be calculated. This method allows utilities to gauge the damage perceived by consumers due to a power interruption. The deterministic approach, on the other hand, plans for contingency scenarios such as the n-1 criterion and largest unit tripping. In utilizing this method, the utility plans for various potential contingencies and estimates the damage to the affected consumers. In this work, the deterministic approach was used to collect consumer data via the various scenarios in the questionnaires that were developed. However, it is foreseen that this work can be extended to cover the probabilistic analysis, as this would enable risk quantification. By relating the system reliability level to the outage costs, the optimum reliability point, as shown in Figure 1, could be calculated.

[Figure 1 plot: cost versus reliability, showing the Total Cost, Reliability Cost (utility), and Reliability Worth (consumer) curves, with point A marked]

Figure 1: Utility Asset Investment: Balancing Reliability Cost and Reliability Worth [7].

Figure 1 above illustrates the costs utilities face in planning for asset investment. In the figure, there are two types of costs: the reliability cost and the reliability worth. The reliability cost is related to the cost of purchasing and installing new equipment to increase the utility’s reliability. This cost increases with increasing reliability. The reliability worth is the value that the consumer is willing to pay for increasing power supply reliability. This cost decreases with increasing reliability. The total cost to the consumer is the sum of the utility cost, which the consumer pays for through the electricity bill, and the consumer cost due to outages [8]. Reliability worth according to Billinton [9] is the outage cost, which can be divided into three categories:

i) Outage Cost to Utility – this includes loss of revenue, loss of goodwill, loss of future potential sales and increased expenditure for maintenance and repair.

ii) Outage Cost to Industry – this includes loss of production, damaged machinery and products, and corrective maintenance.

iii) Outage Cost to Domestic – this includes loss of frozen foods and alternative energy cost.

Often the outage costs to industrial and domestic consumers outweigh the outage cost to the utility. However, the outage cost can be reduced through measures such as the construction of parallel lines which connect a power source to the load. Utility asset investment centers on reducing risk by increasing system reliability to provide system adequacy and security. However, optimization would show that investment beyond point A is economically inefficient because it leads to higher total cost. At the same time, investment below point A would result in a lower than optimum reliability level, which also results in higher total cost. In order to embark on this exercise, initial data had to be collected, conditioned, and understood. The initial data included the consumer groups in Peninsular Malaysia, types of manufacturing and/or commercial processes, consumption data, premise location, and TNB business code. This dissertation aims to document these processes and present the research findings.
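The role of point A in Figure 1 can be sketched numerically as the reliability level that minimizes the sum of the two cost curves. The curve shapes and constants below are hypothetical and chosen only so that one cost rises and the other falls with reliability; they are not taken from this research.

```python
# Hypothetical cost curves (arbitrary currency units); reliability is a
# fraction between 0 and 1. Neither curve comes from the dissertation.
def reliability_cost(r):
    """Utility investment cost: rises steeply as reliability nears 1."""
    return 10.0 / (1.0 - r)


def reliability_worth(r):
    """Consumer outage cost: falls as reliability improves."""
    return 4000.0 * (1.0 - r)


def total_cost(r):
    """Sum of the utility cost and the consumer outage cost."""
    return reliability_cost(r) + reliability_worth(r)


# Grid search over reliability levels from 0.5000 to 0.9999.
levels = [0.5 + i * 0.0001 for i in range(5000)]
point_a = min(levels, key=total_cost)

print(f"Optimal reliability (point A): {point_a:.4f}")
print(f"Total cost at point A: {total_cost(point_a):.2f}")
```

With these assumed curves, moving away from the minimum in either direction raises the total cost, which is the behavior described for investment beyond or below point A.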

1.3 Objectives

The objectives of this research are:

i) To group electricity consumers based on their TNB business code, tariff structure, monthly energy consumption, or a combination of these groupings. This includes stratification of consumers to facilitate accurate data collection through sampling.

ii) To determine their weights in the final VoLL calculation. Weights are based on the percentage consumption of a particular stratum in comparison to the total consumption.

iii) To determine the individual outage cost of consumers of the identified strata. This was accomplished by collecting sample data through the questionnaires that we developed to gauge the consumers’ perception of the direct and indirect costs of various outage scenarios.

iv) To develop a formulation which enables the calculation of the VoLL. This allows the individual stratum VoLL values to be combined to form a composite VoLL value.

There is also a growing need to determine the worth of electricity for system security purposes. This can be seen in the proliferation of contracts for interruption or interruptible load schemes between electricity utilities and large consumers. In order for schemes like this to take off, the VoLL must first be found. Another use of the VoLL is in the cost of outage analysis. At the same time, VoLL can be used in cost-benefit analysis for justifying new projects given a particularly high cost of outage in specific areas. In the event that the utility is required to pay compensation to its consumers due to a loss of supply, the VoLL can provide a starting point for both parties to negotiate the quantum of compensation. In some deregulated markets, the VoLL is a component of the Pool Purchase Price and is a reflection of the price given an inadequate supply of electricity in the network. This again shows the use of VoLL in determining the cost of unsupplied electricity to consumers.

1.4 Brief Methodology

Initially, a literature review was carried out to assess existing methods of calculating VoLL in other utilities. Consumer data were collected and verified in order to classify consumer groups. This required the researchers to liaise with TNB Distribution Sdn Bhd in order to acquire the latest consumer data, which includes the monthly energy consumption, peak demand, TNB business code, and also the tariff of the consumers. Based on the above data, the research examined how best to group the consumers. This required a balance between adequate sampling and the time constraints of the research. The collected data also formed the basis of the weights of each consumer stratum. Questionnaires were developed to suit the different consumer segments. These questionnaires and cost assessment templates formed the basis of outage cost assessment and were used when meeting with consumers to collect sample data.

A series of interviews were conducted to assess the cost of manufacturing, service processes, and inconvenience caused by various outage scenarios at selected industrial, commercial, and domestic consumers in the TNB Network of Peninsular Malaysia. Further discussion of the above will be covered in Chapters 2 through 5.

1.5 Scope of Work

This research will analyze data from TNB databases in order to understand the make-up of TNB consumers and consequently group them. Consumers will be divided into major strata, which will then be subdivided into minor strata. Given this stratification, the study will not assess all consumers but will rely on sampling to determine and calculate consumer losses for each minor stratum. A census of the entire population would not be feasible because of the tremendous cost and time needed. Compared to a census, the sampling process will provide adequate accuracy for a fraction of the cost and time. This will result in a reduced list of consumers to be interviewed and examined. The stratification will also allow a simplified form of consumer group weights. The weights will be based on consumer consumption and will be used to calculate the VoLL of a major stratum by combining individual VoLL values from the minor strata. Similarly, these weights will be used to calculate the composite VoLL for Peninsular Malaysia by combining the individual VoLL values of the major strata.

1.6 Layout of Dissertation

This dissertation will introduce the concept of outage cost analysis and compare the data collection techniques, results, and analysis from research carried out in other parts of the world. Data collection techniques will be discussed with particular emphasis on data collection through surveys.

The application of statistics in engineering will then be introduced, including the concept of CIs, precision, outliers, extreme values, and box plots. Next, the determination of consumer stratification and sample size will be presented. This will include the differentiation and characteristics of domestic, commercial, and industrial consumers. In the following chapter, the results of our nationwide survey and its analysis are presented. This data analysis will discuss the CDFs for the various categories of domestic, commercial, and industrial consumers. The overall composite VoLL will also be calculated.

Chapter 2 will introduce the concept of outage cost and VoLL. It will discuss the various outage cost estimation methods and evaluation types. The CDF will then be introduced and discussed with respect to the VoLL formulation. It also lays the groundwork for the statistics theory and calculations that will be used in later chapters. It discusses the characteristics and importance of the TNB 5-digit business code. Lastly, the definition and characteristics of domestic, commercial, and industrial consumers in Peninsular Malaysia are discussed.

Chapter 3 presents the origins and evolution of statistics theory and their application in data processing and conditioning. The Normal distribution, confidence intervals, boxplots, outliers, extreme values, z-value, and their implications on a data set are also discussed.

Chapter 4 discusses consumer stratification and determination of the appropriate sample size for an adequate precision level. First the stratification process is documented. This includes a discussion of the three major consumer strata and their respective needs. The minor strata and the steps to determine the minor strata are discussed. Next, the normal distribution function and precision factor for the TNB database are discussed with reference to calculations presented in Chapter 4. Lastly, the normalization process and processing of the collected sample data are discussed.

Chapter 5 draws conclusions from the research and highlights future possible work in this area.

CHAPTER 2 DETERMINATION OF VALUE OF LOST LOAD

2.1 Determination of Consumer Damage Function

The economic losses customers experience as a result of reliability and power quality problems may be described by what is called a Consumer Damage Function (CDF). In a CDF, the losses that consumers face are expressed as a function of the magnitude of load interrupted, the duration of the interruption, the season and time of day, and whether notice was given by the utility notifying the consumer of a planned outage [10]. The CDF can be defined as:

Loss ($/kW) = f(duration, season, time of day, notice)    (Equation 1)

The CDF for each consumer’s category is called the individual consumer damage function (ICDF). All the ICDFs for a given category of consumers such as domestic, commercial and industrial can be combined to represent the function of cost for that category and can be named as the sector consumer damage function (SCDF). The SCDF for domestic, commercial and industrial can then be combined to produce the composite consumer damage function (CCDF). This CCDF is used to represent the CDF for a large area. Figure 2 below illustrates the incremental CDFs for domestic, large commercial and industrial, and small and medium commercial and industrial. The CDF relates the magnitude of consumer losses (per kW interrupted) for a given duration of a power outage. While the general shapes of all three curves are similar, the magnitude of loss varies dramatically depending on the consumer’s size.

[Figure 2 plot: $/kW interrupted versus outage duration, with curves for Small & Medium C/I, Large C/I, and Domestic consumers]

Figure 2: Example Consumer Damage Functions [11].
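The combination of sector functions into a composite function can be sketched as follows. The text does not state how the SCDFs are combined, so the consumption-weighted sum used here is an assumption (chosen to mirror the consumption-based weights used elsewhere in this work), and all curve values and shares are hypothetical.

```python
# Hypothetical sector damage functions: cost in RM per kW interrupted,
# tabulated at outage durations in hours. Values are illustrative only.
durations_h = [1, 2, 4, 8]
scdf = {
    "domestic": [1.0, 1.5, 2.5, 4.0],
    "commercial": [20.0, 35.0, 60.0, 100.0],
    "industrial": [10.0, 16.0, 26.0, 40.0],
}
# Assumed share of total energy consumption per sector (sums to 1).
consumption_share = {"domestic": 0.2, "commercial": 0.3, "industrial": 0.5}


def composite_cdf(scdf, share):
    """Consumption-weighted sum of the sector curves at each duration."""
    n_points = len(next(iter(scdf.values())))
    return [
        sum(share[sector] * curve[k] for sector, curve in scdf.items())
        for k in range(n_points)
    ]


ccdf = composite_cdf(scdf, consumption_share)
for hours, cost in zip(durations_h, ccdf):
    print(f"{hours} h outage: RM {cost:.2f} per kW interrupted")
```

The same weighting pattern could be applied one level down, combining ICDFs into an SCDF, if per-category consumption shares are available.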

2.2 Determination of Value of Lost Load

VoLL is said to represent the value an average consumer puts on an unsupplied kWh [12]. The value of VoLL can be calculated by using data from the CDF.

[Figure 3 plot: $/kW interrupted versus outage duration for Small & Medium C&I consumers, with the VoLL indicated]

Figure 3: Example of determination of VoLL

Figure 3 illustrates how to determine VoLL for small and medium sized commercial and industrial consumers. Based on the VoLL data from an EGAT survey in March and April 2000 [13], the cost in the first hour for domestic consumers was estimated at Baht 11.45/kW. For large C/I and small & medium C/I consumers, the cost in the first hour was Baht 29.55/kW and Baht 89.50/kW respectively. Another study by EPRI [11] indicates that domestic consumers’ costs tend to peak at USD 1.50/kW in the first hour and fall off to USD 0.46/kW in subsequent hours. On the other hand, large C/I and small & medium C/I consumers suffer much higher losses of USD 10/kW and USD 38/kW respectively in the first hour. This falls to USD 4/kW and USD 9/kW respectively in the subsequent hours.

This general trend shows that consumers usually incur medium to heavy losses in the first few hours of outage and tend to have smaller incremental losses after 4 hours. Outages, especially forced outages, cause an interruption of the consumers’ usual activities. During the first few hours, consumers typically will be forced to deal with the outage and make the necessary arrangements for a resumption of their activities once supply is restored. After 4 hours, losses are reduced because consumers have been able to plan and cope with the outage. This is the reason notice is important: consumers who have been notified of planned outages in advance can plan around the interruption period, thereby reducing outage induced losses.
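As a rough illustration of this trend, the EPRI figures quoted above can be accumulated over an outage. This sketch assumes the quoted values are incremental hourly costs and that the subsequent-hour rate applies unchanged to every hour after the first; neither assumption is confirmed by the source, and the function name is hypothetical.

```python
# First-hour and subsequent-hour costs in USD per kW interrupted, as quoted
# above from the EPRI study [11].
EPRI_COST = {
    "domestic": (1.50, 0.46),
    "large C/I": (10.0, 4.0),
    "small & medium C/I": (38.0, 9.0),
}


def cumulative_loss(category, hours):
    """Cumulative loss per kW for an outage lasting a whole number of hours.

    Assumes the quoted figures are incremental hourly costs and that the
    subsequent-hour rate applies unchanged to every hour after the first.
    """
    if hours < 1:
        raise ValueError("hours must be at least 1")
    first_hour, later_hours = EPRI_COST[category]
    return first_hour + (hours - 1) * later_hours


for category in EPRI_COST:
    loss = cumulative_loss(category, 4)
    print(f"{category}: USD {loss:.2f}/kW for a 4-hour outage")
```

Under these assumptions the first hour accounts for a disproportionate share of the total, which is consistent with the observation that incremental losses shrink once consumers have adapted to the outage.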

2.2.1 VoLL Formulation

To determine the VoLL, the information gathered from the survey is analyzed using the formulas below. The outage cost is the simple average of the WTA and WTP values, as given by Equation 2.

$$\text{Outage Cost (RM)} = \frac{\mathrm{WTA} + \mathrm{WTP}}{2} \qquad \text{(Equation 2)}$$

Therefore, the average value of WTA and WTP for respondents will be used to calculate the customer losses in the VoLL formula. The general equation used to calculate the VoLL is shown in Equation 3.

$$\mathrm{VoLL} = \frac{a_1 \cdot \mathrm{Cust}_1 + a_2 \cdot \mathrm{Cust}_2 + a_3 \cdot \mathrm{Cust}_3 + \dots + a_n \cdot \mathrm{Cust}_n}{n} \qquad \text{(Equation 3)}$$

where:
a is the consumer weight
n is the group identifier
Cust_n is the losses incurred by consumer n

The calculation of VoLL for domestic, commercial, and industrial consumers is based on this general equation. For each category, the weight of the consumer is calculated first. Consumer weights are defined by the consumption of the individual minor strata in the domestic, commercial, or industrial strata divided by the total consumption of that major stratum.
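A minimal Python sketch of Equation 2, the consumption-based weights, and Equation 3 follows. The stratum names, consumption figures, and WTA/WTP values are hypothetical, and the function names are introduced here only for illustration. The `voll` function follows Equation 3 as printed, including the division by n.

```python
def outage_cost(wta, wtp):
    """Equation 2: simple average of the WTA and WTP values (RM)."""
    return (wta + wtp) / 2


def consumption_weights(consumption_kwh):
    """Weight of each minor stratum: its consumption over the major-stratum total."""
    total = sum(consumption_kwh.values())
    return {name: kwh / total for name, kwh in consumption_kwh.items()}


def voll(weights, losses):
    """Equation 3 as printed: weighted sum of group losses divided by n."""
    n = len(weights)
    return sum(weights[name] * losses[name] for name in weights) / n


# Hypothetical minor strata of one major stratum (illustrative values only).
consumption_kwh = {"stratum_1": 600_000, "stratum_2": 300_000, "stratum_3": 100_000}
survey = {  # (WTA, WTP) in RM per stratum, hypothetical
    "stratum_1": (120.0, 80.0),
    "stratum_2": (60.0, 40.0),
    "stratum_3": (30.0, 10.0),
}

weights = consumption_weights(consumption_kwh)
losses = {name: outage_cost(wta, wtp) for name, (wta, wtp) in survey.items()}
print(f"Weights: {weights}")
print(f"VoLL (same units as the losses): {voll(weights, losses):.2f}")
```

The same three functions could be reused at the next level up, treating the major strata as the groups, to form a composite value.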

2.3 Tenaga Nasional Berhad Business Code

To simplify the stratification process, membership of the minor strata was defined by the TNB business codes. The TNB business codes are 5-digit codes created by Tenaga Nasional Berhad (TNB), a local utility company, to differentiate the types of residences, commercial businesses, and industries. Each code includes a detailed breakdown of the premise by type, business, activity, consumption size, and other criteria. All the business codes with their respective premise descriptions are listed in Appendix 1 – TNB Business Codes.

2.4 TNB Consumers in Peninsular Malaysia

TNB had a total of 6,582,374 consumers in 2005 [14]. A consumer as defined by TNB [15] is any person and/or entity taking electricity supply from TNB’s supply lines at any one point of supply, provided that if a person and/or entity takes a supply at more than one point of supply, such person and/or entity shall be deemed to be a separate consumer for each such point of supply.

2.4.1 Domestic Consumers

There are 5,482,920 domestic consumers, which constitute 83.297 percent of the total number of consumers [16]. A domestic consumer as defined by TNB [15] is a consumer occupying a private dwelling, which is not used as a hotel, boarding house or used for the purpose of carrying out any form of business, trade, professional activities or services. A domestic consumer typically is the smallest power consumer amongst the three major strata. Consumption during weekdays is high during the morning period of 6am – 8am due to preparation for work or school, and during the evening and night period of 6pm – 11pm, which is when dinner is prepared and household chores are done. Weekends are marked by little consumption.

2.4.2 Commercial Consumers

Commerce means the buying and selling of goods, whereas commercial refers to an activity of commerce [17]. Hence, considering commercial consumers for analysis in this research is vital. A commercial consumer as defined by TNB [15] is, but not limited to, a consumer occupying or operating an office block, hotel, service apartment, boarding house, retail complex, shop-house, car-park, workshop, restaurant, estate, plantation, farm (except those categories defined in the Specific Agriculture Tariff), port, airport, railway installation, toll plaza, street lightings at tolled highway including its bridges and tunnels, telecommunications installation, broadcasting installation, entertainment/recreation/sports outlet, golf course, school/educational institution, religious and welfare organization, military and government installation, hospital, waste treatment plant, district cooling plant, cold storage, warehouse, and any other form of business or commercial activities which are not primarily involved in manufacturing, quarrying or mining activities. Commercial consumers are typically moderate to large consumers during weekday business hours, with some premises extending activities over the weekend. Their consumption is used mainly to run motor loads and computer equipment.

2.4.3 Industrial Consumers

Industrial consumers are low in number but contribute the most in terms of revenue; their total energy consumption is high at approximately 80% of total consumption in Peninsular Malaysia. TNB has a total of 26,689 industrial consumers [14]. An industrial consumer as defined by TNB [15] is a consumer engaging in manufacturing of goods and products. Manufacturing means the conversion of raw material or components to finished products such as the making, altering, blending, ornamenting, finishing, or otherwise treating, or adapting any article with a view to use, sell, transport, deliver, or dispose; and includes the assembly of parts and food processing but shall not include any activity normally associated with the retail or wholesale trade. Quarrying of minerals, stone, and other natural resources and pumping for water treatment plants are also classed as industrial consumption. In addition, the total wattage of lamps and air conditioning installed for the purpose of office use shall not exceed 20% of the total wattage of all electrical equipment installed.
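The 20% office-load condition in the definition above can be expressed as a simple check. This is only a sketch of the arithmetic; the function name and the example wattages are hypothetical and are not part of the TNB definition.

```python
def office_load_within_limit(lamp_and_aircond_watts, total_installed_watts, limit=0.20):
    """Return True if office lamps and air conditioning do not exceed the
    given fraction (20% by default) of the total installed wattage."""
    if total_installed_watts <= 0:
        raise ValueError("total installed wattage must be positive")
    return lamp_and_aircond_watts / total_installed_watts <= limit


# Hypothetical premises with 500 kW of installed equipment in total.
print(office_load_within_limit(80_000, 500_000))   # office load is 16% of total
print(office_load_within_limit(120_000, 500_000))  # office load is 24% of total
```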


CHAPTER 3 STATISTICAL ANALYSIS IN ENGINEERING

3.1 Statistical Analysis

A textbook definition of statistics is “a logic and methodology for the measurement of uncertainty and for an examination of the consequences of that uncertainty in the planning and interpretation of experimentation or observation” [18]. Statistics is a mathematical science pertaining to the collection, analysis, interpretation, and presentation of data [19]. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used for making informed decisions in all areas of business and government. Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, to draw inferences about the process or population being studied; this is called inferential statistics. Both descriptive and inferential statistics can be considered part of applied statistics. There is also a discipline of mathematical statistics, which is concerned with the theoretical basis of the subject. The word statistics is also the plural of statistic (singular), which refers to the result of applying a statistical algorithm to a set of data, as in employment statistics, accident statistics, etc.
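As a small illustration of descriptive statistics, the sketch below summarizes a hypothetical sample using Python's standard `statistics` module. The consumption values are invented for the example, and the quartiles use the module's default ("exclusive") method, which may differ slightly from other quartile conventions.

```python
import statistics

# Hypothetical monthly energy consumption (kWh) of ten sampled consumers.
sample_kwh = [210, 250, 265, 280, 300, 310, 335, 360, 420, 900]

mean = statistics.mean(sample_kwh)
stdev = statistics.stdev(sample_kwh)  # sample standard deviation
cv = stdev / mean  # coefficient of variation
q1, q2, q3 = statistics.quantiles(sample_kwh, n=4)  # Q1, median, Q3
iqr = q3 - q1  # interquartile range

print(f"mean = {mean:.1f} kWh, standard deviation = {stdev:.1f} kWh")
print(f"coefficient of variation = {cv:.2f}")
print(f"Q1 = {q1:.1f}, Q2 = {q2:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
```

Because the IQR depends only on the middle half of the data, the single large value of 900 kWh affects it far less than it affects the standard deviation.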


Figure 4: A Graph of a Bell Curve in a Normal Distribution [20]

3.1.1 Etymology

The word statistics ultimately derives from the modern Latin term statisticum collegium (“council of state”) and the Italian word statista (“statesman” or “politician”) [21]. The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the “science of state”; this was then called political arithmetic in English [22]. It acquired the meaning of the collection and classification of data generally in the early 19th century. It was introduced into English by Sir John Sinclair. Thus, the original principal purpose of Statistik was data to be used by governmental and (often centralized) administrative bodies. The collection of data about states and localities continues, largely through national and international statistical services. In particular, censuses provide regular information about the population. During the 20th century, the creation of precise instruments for public health concerns (epidemiology, biostatistics, etc.) and economic and social purposes (unemployment rate, econometrics, etc.) necessitated substantial advances in statistical practices. This became a necessity for Western welfare states developed after World War I, which had to develop a specific knowledge of their “population”. Philosophers such as Michel Foucault have argued that this constituted a form of “biopower”, a term which has since been used by many other authors [23].

3.1.2 Origins in probability

The mathematical methods of statistics emerged from probability theory, which can be dated to the correspondence of Pierre de Fermat and Blaise Pascal (1654) [24]. Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli’s Ars Conjectandi (posthumous, 1713) and Abraham de Moivre’s Doctrine of Chances (1718) treated the subject as a branch of mathematics [25].

The theory of errors may be traced back to Roger Cotes’s Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation [18]. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given. Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve. He deduced a formula for the mean of three observations. He also gave (1781) a formula for the law of facility of error (a term due to Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors [18]. The method of least squares, which was used to minimize errors in data measurement, is due to Adrien-Marie Legendre (1805), who introduced it in his Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets).

In ignorance of Legendre’s contribution, an Irish-American writer, Robert Adrain, editor of “The Analyst” (1808), first deduced the law of facility of error. He gave two proofs, the second being essentially the same as John Herschel’s (1850). Carl Gauss gave the first proof which seems to have been known in Europe (the third after Adrain’s) in 1809. Further proofs were given by Laplace (1810, 1812), Gauss (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W. F. Donkin (1844, 1856), and Morgan Crofton (1870). Other contributors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875). Peters’s (1856) formula for r, the probable error of a single observation, is well known.

In the nineteenth century, authors on the general theory included Laplace, Sylvestre Lacroix (1816), Littrow (1833), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson. Augustus De Morgan and George Boole improved the exposition of the theory. Adolphe Quetelet (1796-1874), another important founder of statistics, introduced the notion of the “average man” (l’homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates or suicide rates.

3.1.3 Statistics today

Today the use of statistics has broadened far beyond its origins as a service to a state or government. Individuals and organizations use statistics to understand data and make informed decisions throughout the natural and social sciences, medicine, business, and other areas. Statistics is generally regarded not as a subfield of mathematics but as a distinct, albeit allied, field. Many universities maintain separate mathematics and statistics departments. Statistics is also taught in departments as diverse as psychology, education, and public health.


3.1.4 Conceptual overview

In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied. This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period. It may instead be a process observed at various times; data collected about this kind of “population” constitute what is called a time series. For practical reasons, rather than compiling data about an entire population, one usually instead studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or experimental setting. The data are then subjected to statistical analysis, which serves two related purposes: description and inference.

i) Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.

ii) Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), forecasting of future observations, descriptions of association (correlation), or modeling of relationships (regression). Other modeling techniques include ANOVA, time series, and data mining.

The concept of correlation is particularly noteworthy. Statistical analysis of a data set may reveal that two variables (that is, two properties of the population under consideration) tend to vary together, as if they are connected. For example, a study of annual income and age of death among people might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated. However, one cannot immediately infer the existence of a causal relationship between the two variables [26].

If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole. A major problem lies in determining the extent to which the chosen sample is representative. Statistics offers methods to estimate and correct for randomness in the sample and in the data collection procedure, as well as methods for designing robust experiments in the first place [27]. The fundamental mathematical concept employed in understanding such randomness is probability. Mathematical statistics (also called statistical theory) is the branch of applied mathematics that uses probability theory and analysis to examine the theoretical basis of statistics.

The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method. Misuse of statistics can produce subtle but serious errors in description and interpretation: subtle in that even experienced professionals sometimes make such errors, and serious in that they may affect social policy, medical practice and the reliability of structures such as bridges and nuclear power plants. Even when statistics is correctly applied, the results can be difficult to interpret for a nonexpert. For example, the statistical significance of a trend in the data (which measures the extent to which the trend could be caused by random variation in the sample) may not agree with one’s intuitive sense of its significance. The set of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as statistical literacy.

3.1.5 Statistical methods

Some statistical methods are discussed in the following subsections. They include experimental and observational studies, levels of measurement, and statistical techniques.


3.1.5.1 Experimental and observational studies

A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on a response or dependent variable. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences in an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types is in how the study is actually conducted. Each can be very effective.

An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation may have modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and the response are investigated.

An example of an experimental study is the famous Hawthorne studies, which attempted to test changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured productivity in the plant, then modified the illumination in an area of the plant to see if changes in illumination would affect productivity. Due to errors in experimental procedures, specifically the lack of a control group and of blinding, the researchers were unable to do what they planned, in what is known as the Hawthorne effect [28].

An example of an observational study is a study which explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers and then look at the number of cases of lung cancer in each group.


The basic steps for an experiment are to:

i) plan the research, including determining information sources, research subject selection, and ethical considerations for the proposed research and method

ii) design the experiment, concentrating on the system model and the interaction of independent and dependent variables

iii) summarize a collection of observations to feature their commonality by suppressing details (descriptive statistics)

iv) reach consensus about what the observations tell about the world that is observed (statistical inference)

v) document and present the results of the study

3.1.5.2 Levels of measurement

There are four types of measurements or measurement scales used in statistics. The four types or levels of measurement, which are nominal, ordinal, interval, and ratio, have different degrees of usefulness in statistical research. Ratio measurements, where both a zero value and distances between different measurements are defined, provide the greatest flexibility in statistical methods that can be used for analyzing the data. Interval measurements have meaningful distances between measurements but no meaningful zero value (such as IQ measurements or temperature measurements in degrees Celsius). Ordinal measurements have imprecise differences between consecutive values but a meaningful order to those values. Nominal measurements have no meaningful rank order among values.


3.2 Statistical techniques

Some well-known statistical tests and procedures for research observations are:

i) Student’s t-test

ii) chi-square

iii) analysis of variance (ANOVA)

iv) Mann-Whitney U

v) regression analysis

vi) correlation

a. Pearson product-moment correlation coefficient

b. Spearman’s rank correlation coefficient

vii) Fisher’s Least Significant Difference test

3.2.1 Specialized disciplines

Some sciences use applied statistics so extensively that they have specialized terminology. These disciplines include:

i) Actuarial science

ii) Biostatistics

iii) Business statistics

iv) Data mining (applying statistics and pattern recognition to obtain knowledge from data)

v) Economic statistics (Econometrics)

vi) Engineering statistics

vii) Statistical physics

viii) Demography

ix) Psychological statistics

x) Social statistics (for all the social sciences)

xi) Statistical literacy

xii) Statistical surveys

xiii) Process analysis and chemometrics (for analysis of data from analytical chemistry)

xiv) Reliability engineering

xv) Image processing

xvi) Statistics in various sports, particularly baseball and cricket

Statistics forms a key tool in business and manufacturing as well. It is used to understand measurement system variability, to control processes (as in statistical process control or SPC), to summarize data, and to make data-driven decisions. In these roles it is a key tool, and perhaps the only reliable tool. In this research, Engineering Statistics and Econometrics played an important role in the stratification process and in deciding the sample size. These processes are discussed further in Chapter 4.

3.2.2 Criticism

There is a general perception that statistical knowledge is all too frequently intentionally misused, by finding ways to interpret the data that are favorable to the presenter. A famous quote, variously attributed but thought to be from Benjamin Disraeli [29], is: “There are three types of lies: lies, damn lies, and statistics.” Indeed, the well-known book How to Lie with Statistics by Darrell Huff [30] discusses many cases of deceptive uses of statistics, focusing on misleading graphs. By choosing (or rejecting, or modifying) a certain sample, results can be manipulated; throwing out outliers is one means of doing so. This may be the result of outright fraud or of subtle and unintentional bias on the part of the researcher.

As further studies contradict previously announced results, people may become wary of trusting such studies. One might read a study that says (for example) “do X to reduce high blood pressure”, followed by a study that says “doing X does not affect high blood pressure”, followed by a study that says “doing X actually worsens high blood pressure”. Often the studies were conducted on different groups with different protocols, or a small-sample study that promised intriguing results has not held up to further scrutiny in a large-sample study. Many readers may not have noticed these distinctions, or the media may have oversimplified this vital contextual information, and the public’s distrust of statistics is thereby increased.

Deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required by law or regulation, forces one hypothesis to be ‘favored’ (the null hypothesis), and can also seem to exaggerate the importance of minor differences in large studies. A difference that is highly statistically significant can still be of no practical significance.

In the fields of psychology and medicine, especially with regard to the approval of new drug treatments by the Food and Drug Administration, criticism of the hypothesis testing approach has increased in recent years. One response has been a greater emphasis on the p-value over simply reporting whether or not a hypothesis was rejected at the given level of significance $\alpha$. Here again, however, this summarizes the evidence for an effect but not the size of the effect. One increasingly common approach is to report confidence intervals (CI) instead, since these indicate both the size of the effect and the uncertainty surrounding it. This aids in interpreting the results, as the CI for a given $\alpha$ simultaneously indicates both statistical significance and effect size.

Note that both the p-value and CI approaches are based on the same fundamental calculations as those entering into the corresponding hypothesis test. The results are stated in a more detailed format, rather than the yes-or-no finality of the hypothesis test, but use the same underlying statistical methodology. A truly different approach is to use Bayesian methods. This approach has been criticized as well, however. The strong desire to see good drugs approved and the desire to see harmful or useless ones restricted remain conflicting tensions (Type I and Type II errors in the language of hypothesis testing).

Abelson [31] makes the case that statistics serves as a standardized means of settling arguments between scientists who could otherwise each argue the merits of their own cases ad infinitum. Statistics is, in this view, a form of rhetoric. This can be viewed as a positive or a negative, but as with any means of settling a dispute, statistical methods can succeed only so long as both sides accept the approach and agree on the particular method to be used.


3.3 Normal Distribution

There are many probability distribution functions. Table 2 lists some functions according to discrete/continuous and univariate/multivariate classifications. However, for this research, the normal probability distribution is of particular interest because this research deals with consumer consumption data from TNB databases, which contain consumption data for all electricity consumers in Peninsular Malaysia. Their consumption pattern is expected to follow a normal distribution because of the large number of consumers with random consumption demand.

Table 2: Probability Distributions

Discrete, univariate: Benford • Bernoulli • binomial • Boltzmann • categorical • compound Poisson • discrete phase-type • degenerate • Gauss-Kuzmin • geometric • hypergeometric • logarithmic • negative binomial • parabolic fractal • Poisson • Rademacher • Skellam • uniform • Yule-Simon • zeta • Zipf • Zipf-Mandelbrot

Discrete, multivariate: Ewens • multinomial • multivariate Polya

Continuous, univariate: Beta • Beta prime • Cauchy • chi-square • Dirac delta function • Coxian • Erlang • exponential • exponential power • F • fading • Fisher’s z • Fisher-Tippett • Gamma • generalized extreme value • generalized hyperbolic • generalized inverse Gaussian • Half-Logistic • Hotelling’s T-square • hyperbolic secant • hyper-exponential • hypoexponential • inverse chi-square (scaled inverse chi-square) • inverse Gaussian • inverse gamma (scaled inverse gamma) • Kumaraswamy • Landau • Laplace • Lévy • Lévy skew alpha-stable • logistic • log-normal • Maxwell-Boltzmann • Maxwell speed • Nakagami • normal (Gaussian) • normal-gamma • normal inverse Gaussian • Pareto • Pearson • phase-type • polar • raised cosine • Rayleigh • relativistic Breit-Wigner • Rice • shifted Gompertz • Student’s t • triangular • type-1 Gumbel • type-2 Gumbel • uniform • Variance-Gamma • Voigt • von Mises • Weibull • Wigner semicircle • Wilks’ lambda

Continuous, multivariate: Dirichlet • inverse-Wishart • Kent • matrix normal • multivariate normal • multivariate Student • von Mises-Fisher • Wigner quasi • Wishart

Miscellaneous: Cantor • conditional • equilibrium • exponential family • infinitely divisible • location-scale family • marginal • maximum entropy • posterior • prior • quasi • sampling • singular

The normal distribution, also called Gaussian distribution (named after Carl Friedrich Gauss, a German mathematician, although Gauss was not the first to work with it), is an extremely important probability distribution in many fields. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean (“average”) and standard deviation (“variability”), respectively. The standard normal distribution is the normal distribution with a mean of zero and a variance of one, as illustrated in the green curve in the plots in Figure 5. It is often called the bell curve because the graph of its probability density resembles a bell.


Figure 5: PDF of Gaussian distribution (bell curve).

3.3.1 Overview

The fundamental importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due to the central limit theorem (see Section 3.3.11 for explanation). A variety of psychological test scores and physical phenomena like photon counts can be well approximated by a normal distribution. While the mechanisms underlying these phenomena are often unknown, the use of the normal model can be theoretically justified if one assumes many small (independent) effects contribute to each observation in an additive fashion.

The normal distribution also arises in many areas of statistics: for example, the sampling distribution of the mean is approximately normal, even if the distribution of the population the sample is taken from is not normal. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.


3.3.2 History

The normal distribution was first introduced by Abraham de Moivre in an article in 1734 (reprinted in the second edition of his The Doctrine of Chances, 1738) in the context of approximating certain binomial distributions for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre-Laplace. Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.

The name “bell curve” goes back to Jouffret, who first used the term “bell surface” in 1872 for a bivariate normal with independent components. The name “normal distribution” was coined independently by Charles S. Peirce, Francis Galton, and Wilhelm Lexis around 1875. This terminology is unfortunate, since it reflects and encourages the fallacy that many or all probability distributions are “normal” (see Section 3.3.16 for explanation). That the distribution is called the Gaussian distribution is an instance of Stigler’s law of eponymy: “No scientific discovery is named after its original discoverer.”

3.3.3 Characterization of the normal distribution

There are various ways to characterize a probability distribution. The most visual is the probability density function (PDF), which represents how likely each value of the random variable is. The cumulative distribution function (CDF) is a conceptually cleaner and less cluttered way to specify the same information, but to the untrained eye its plot is much less informative. Equivalent ways to specify the normal distribution are [32]: the moments, the cumulants, the moment generating function (see Section 3.3.6.1 for explanation), the cumulant generating function (see Section 3.3.6.2 for explanation), the characteristic function (see Section 3.3.6.3 for explanation), and Maxwell’s theorem. Some of these are very useful for theoretical work, but not intuitive.


3.3.4 Probability Density Function

The probability density function of the normal distribution with mean µ and variance σ² (equivalently, standard deviation σ) is an example of a Gaussian function,

$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right) \qquad \text{(Equation 4)}$$

where

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \qquad \text{(Equation 5)}$$

is the density function of the “standard” normal distribution, i.e., the normal distribution with µ = 0 and σ = 1.

As a Gaussian function with the denominator of the exponent equal to two, the standard normal density function ϕ is an eigenfunction of the Fourier transform. To indicate that a random variable X is normally distributed with mean µ and variance σ², we write

$$X \sim N(\mu, \sigma^2)$$

Some notable qualities of the normal distribution:

i) The density function is symmetric about its mean value.

ii) The mean is also its mode and median.

iii) 68.26894921371% of the area under the curve is within one standard deviation of the mean.

iv) 95.44997361036% of the area is within two standard deviations.

v) 99.73002039367% of the area is within three standard deviations.

vi) The inflection points of the curve occur at one standard deviation away from the mean.
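Several of the qualities listed above can be checked numerically. The following is a minimal Python sketch, not part of the original study: the function names and the values µ = 12 and σ = 3 are chosen purely for illustration. It evaluates the density through Equations 4 and 5, then tests symmetry, the location of the peak, and the change of curvature at one standard deviation from the mean using a finite-difference second derivative.

```python
import math

def phi(x):
    """Standard normal density (Equation 5)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def normal_pdf(x, mu, sigma):
    """General normal density written in terms of phi (Equation 4)."""
    return phi((x - mu) / sigma) / sigma

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

mu, sigma = 12.0, 3.0  # illustrative values only (assumption)
f = lambda x: normal_pdf(x, mu, sigma)

# Quality i: the density is symmetric about the mean.
symmetric = all(math.isclose(f(mu - d), f(mu + d)) for d in (0.5, 1.0, 4.2))

# Quality ii: the density peaks at the mean (mode equals mean).
peak_at_mean = f(mu) > f(mu - 0.01) and f(mu) > f(mu + 0.01)

# Quality vi: the curvature changes sign at mu + sigma (an inflection point).
curvature_inside = second_derivative(f, mu + 0.9 * sigma)   # concave, negative
curvature_outside = second_derivative(f, mu + 1.1 * sigma)  # convex, positive

print(symmetric, peak_at_mean, curvature_inside < 0 < curvature_outside)
```

The finite-difference step h is a pragmatic choice; any small value that avoids round-off problems would serve equally well here.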


3.3.5 Cumulative Distribution Function

Figure 6: CDF of Gaussian distribution

The CDF is defined as the probability that a variable X has a value less than or equal to x, and it is expressed in terms of the density function as

$$F(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(u-\mu)^2}{2\sigma^2}\right) du \qquad \text{(Equation 6)}$$

The standard normal CDF, conventionally denoted Φ, is just the general CDF evaluated with µ = 0 and σ = 1,

$$\Phi(x) = F(x;0,1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2}\right) du \qquad \text{(Equation 7)}$$

The standard normal CDF can be expressed in terms of a special function called the error function, as

$$\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right] \qquad \text{(Equation 8)}$$

The inverse cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:

$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1) \qquad \text{(Equation 9)}$$


This quantile function is sometimes called the probit function. There is no elementary primitive for the probit function. This is not to say merely that none is known, but rather that the non-existence of such a function has been proved.

Values of Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, or asymptotic series.
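As an illustration of how Equation 8 and the quantile function can be evaluated in practice, the sketch below uses Python's standard library. It is an assumption-laden example rather than the method used in this research: the function names are hypothetical, and because the standard library has no inverse error function, the quantile is found by bisection on Φ instead of by Equation 9 directly. The result is compared with `statistics.NormalDist`, which provides its own `cdf` and `inv_cdf`.

```python
import math
from statistics import NormalDist

def std_normal_cdf(z):
    """Phi(z) computed from the error function (Equation 8)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit(p, tol=1e-12):
    """Quantile function found by bisection on Phi.

    A simple stand-in for Equation 9, valid for 0 < p < 1.
    """
    lo, hi = -40.0, 40.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z975 = probit(0.975)                      # roughly 1.96
reference = NormalDist().inv_cdf(0.975)   # library value for comparison
print(z975, reference)
```

Bisection is slow compared with dedicated rational approximations, but it needs nothing beyond the monotonicity of Φ.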

3.3.6 Generating functions

In mathematics, a generating function is a formal power series whose coefficients encode information about a sequence $a_n$ that is indexed by the natural numbers. There are various types of generating functions, including ordinary generating functions, exponential generating functions, Lambert series, Bell series, and Dirichlet series. The particular generating function that is most useful in a given context will depend upon the nature of the sequence and the details of the problem being addressed. Generating functions are often expressed in closed form as functions of a formal argument x. Sometimes a generating function is evaluated at a specific value of x. However, it must be remembered that generating functions are formal power series, and they will not necessarily converge for all values of x.

3.3.6.1 Moment generating function

The moment generating function is defined as the expected value of exp(tX). For a normal distribution, completing the square in the exponent, it can be shown that

$$M_X(t) = E\left[\exp(tX)\right] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \exp(tx)\, dx = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right) \qquad \text{(Equation 10)}$$
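The completing-the-square step behind Equation 10 can be written out explicitly. The following is a sketch of the standard derivation, using only the symbols already defined in this section (µ, σ, and t):

```latex
\begin{align*}
-\frac{(x-\mu)^2}{2\sigma^2} + tx
  &= -\frac{\bigl(x-(\mu+\sigma^2 t)\bigr)^2}{2\sigma^2}
     + \mu t + \frac{\sigma^2 t^2}{2}, \\
M_X(t)
  &= \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right)
     \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\!\left(-\frac{\bigl(x-(\mu+\sigma^2 t)\bigr)^2}{2\sigma^2}\right) dx \\
  &= \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right).
\end{align*}
```

The remaining integral equals one because its integrand is the density of a normal distribution with mean µ + σ²t and variance σ².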


3.3.6.2 Cumulant generating function

The cumulant generating function is the logarithm of the moment generating function:

$$g(t) = \mu t + \frac{1}{2}\sigma^2 t^2$$

The derivative of the cumulant generating function is simply:

$$g'(t) = \mu + \sigma^2 t$$

3.3.6.3 Characteristic function

The characteristic function is defined as the expected value of exp(itX), where i is the imaginary unit. For a normal distribution, the characteristic function is

$$M_X(it;\mu,\sigma) = E\left[\exp(itX)\right] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \exp(itx)\, dx = \exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right) \qquad \text{(Equation 11)}$$

The characteristic function is obtained by replacing t with it in the moment-generating function.

3.3.7 Properties

Some of the properties of the normal distribution:

1. If $X \sim N(\mu, \sigma^2)$ and a and b are real numbers, then $aX + b \sim N(a\mu + b, (a\sigma)^2)$.

2. If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent normal random variables, then:

• Their sum is normally distributed with $U = X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$

• Their difference is normally distributed with $V = X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$

• U and V are independent of each other when the two variances are equal, $\sigma_X^2 = \sigma_Y^2$

3. If $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$ are independent normal random variables, then:

• Their product XY follows a distribution with density p given by

$$p(z) = \frac{1}{\pi\,\sigma_X\,\sigma_Y}\, K_0\left(\frac{|z|}{\sigma_X\,\sigma_Y}\right)$$

where $K_0$ is a modified Bessel function of the second kind

• Their ratio follows a Cauchy distribution with $X/Y \sim \mathrm{Cauchy}(0, \sigma_X/\sigma_Y)$

4. If $X_1, \ldots, X_n$ are independent standard normal variables, then $X_1^2 + \cdots + X_n^2$ has a chi-square distribution with n degrees of freedom.

3.3.8 Standardizing Normal Random Variables

As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal. If X ~ N(µ, σ²), then

$$Z = \frac{X - \mu}{\sigma} \qquad \text{(Equation 12)}$$

is a standard normal random variable: Z ~ N(0, 1). An important consequence is that the CDF of a general normal distribution is therefore

$$\Pr(X \le x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \qquad \text{(Equation 13)}$$

Conversely, if Z ~ N(0, 1), then

$$X = \sigma Z + \mu \qquad \text{(Equation 14)}$$

is a normal random variable with mean µ and variance σ².

The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the CDF of the standard normal distribution to find values of the CDF of a general normal distribution.
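A short worked example may make the standardization concrete. The Python sketch below uses hypothetical figures (a mean monthly consumption of 300 kWh with a standard deviation of 50 kWh); these numbers are assumptions for illustration and are not taken from the TNB data. It standardizes a value with Equation 12, evaluates the CDF with Equation 13, and maps a standard score back with Equation 14.

```python
import math

def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Pr(X <= x) for X ~ N(mu, sigma^2), using Equations 12 and 13."""
    z = (x - mu) / sigma          # Equation 12: standardize
    return std_normal_cdf(z)      # Equation 13: look up the standard CDF

# Hypothetical consumption parameters (assumption, for illustration only).
MU_KWH = 300.0
SIGMA_KWH = 50.0

# Probability that a consumer uses at most 400 kWh (z = 2).
p_below_400 = normal_cdf(400.0, MU_KWH, SIGMA_KWH)

# Probability of falling within one standard deviation of the mean.
p_between = (normal_cdf(350.0, MU_KWH, SIGMA_KWH)
             - normal_cdf(250.0, MU_KWH, SIGMA_KWH))

# Equation 14: convert the standard score z = 2 back to kWh.
x_back = SIGMA_KWH * 2.0 + MU_KWH

print(round(p_below_400, 4), round(p_between, 4), x_back)
```

This mirrors the use of printed tables: only the standard normal CDF is needed, whatever the mean and standard deviation of the variable of interest.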


3.3.9 Moments

Some of the first few moments of the normal distribution are shown in Table 3. All cumulants of the normal distribution beyond the second cumulant are zero.

Table 3: Some of the first few moments of the normal distribution

Number | Raw moment | Central moment | Cumulant
0 | 1 | 1 |
1 | $\mu$ | 0 | $\mu$
2 | $\mu^2 + \sigma^2$ | $\sigma^2$ | $\sigma^2$
3 | $\mu^3 + 3\mu\sigma^2$ | 0 | 0
4 | $\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$ | $3\sigma^4$ | 0

3.3.10 Generating Values for Normal Random Variables

For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods, and the most basic is to invert the standard normal CDF. More efficient methods are also known, one such method being the Box-Muller transform. An even faster algorithm is the ziggurat algorithm. The Box-Muller algorithm says that, if a and b are two numbers uniformly distributed on (0, 1] (e.g. the output from a random number generator), then a standard normally distributed random variable is c, where

$$c = \sqrt{-2 \ln a}\, \cos(2\pi b) \qquad \text{(Equation 15)}$$

This is a consequence of the fact that the chi-square distribution with two degrees of freedom (see property 4 above) is an easily-generated exponential random variable.
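Equation 15 translates almost directly into code. The sketch below is one possible Python rendering, with a function name, seed, and sample size chosen only for illustration. Python's `random()` returns values in [0, 1), so the first uniform is taken as `1 - random()` to keep it in (0, 1] and avoid the logarithm of zero.

```python
import math
import random

def box_muller(rng):
    """Return one standard normal value using Equation 15."""
    a = 1.0 - rng.random()   # uniform on (0, 1], so log(a) is finite
    b = rng.random()         # uniform on [0, 1); the cosine is periodic
    return math.sqrt(-2.0 * math.log(a)) * math.cos(2.0 * math.pi * b)

rng = random.Random(2007)    # fixed seed so the run is repeatable
samples = [box_muller(rng) for _ in range(100_000)]

# Sample mean and variance should be close to 0 and 1 respectively.
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(mean, variance)
```

The transform also yields a second independent normal value (using the sine in place of the cosine), which this sketch discards for simplicity.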

3.3.11 The Central Limit Theorem

The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the central limit theorem.

Figure 7: Plot of the PDF of a normal distribution with µ = 12 and σ = 3 approximating the PDF of a binomial distribution with n = 48 and p = 1/4

The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions. This is shown in Figure 7.

• A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if np and n(1 − p) are both at least 5; in this case, a continuity correction should be applied). The approximating normal distribution has mean µ = np and variance σ² = np(1 − p).

• A Poisson distribution with parameter λ is approximately normal for large λ. The approximating normal distribution has mean µ = λ and variance σ² = λ.

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
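The quality of the binomial approximation in Figure 7 can be examined numerically. The following Python sketch is illustrative only (the function names are assumptions): it compares the exact binomial probabilities for n = 48 and p = 1/4 with the normal approximation using a continuity correction, and records both the absolute and the relative error at each point.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability Pr(K = k)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 48, 0.25
mu = n * p                               # mean np = 12
sigma = math.sqrt(n * p * (1.0 - p))     # standard deviation = 3

def normal_approx(k):
    """Normal area over [k - 0.5, k + 0.5] (continuity correction)."""
    upper = std_normal_cdf((k + 0.5 - mu) / sigma)
    lower = std_normal_cdf((k - 0.5 - mu) / sigma)
    return upper - lower

abs_errors = {k: abs(binom_pmf(k, n, p) - normal_approx(k)) for k in range(n + 1)}
rel_errors = {k: abs_errors[k] / binom_pmf(k, n, p) for k in range(n + 1)}

print(max(abs_errors.values()))          # largest absolute error
print(rel_errors[12], rel_errors[0])     # centre versus lower tail
```

Comparing the relative error near the centre (k = 12) with that in the lower tail (k = 0) shows the loss of accuracy in the tails that the text describes.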


3.3.12 Infinite divisibility

The normal distributions are infinitely divisible probability distributions. To say that a probability distribution F on the real line is infinitely divisible means that if X is any random variable whose distribution is F, then for every positive integer n there exist n independent identically distributed random variables X1, ..., Xn whose sum is equal in distribution to X. This simplifies the stratification process because it is possible to divide this distribution into any number of strata, which begin and end anywhere within the distribution range.

3.3.13 Stability

The normal distributions are strictly stable probability distributions. In statistics, the stability of a family of probability distributions is an important property which essentially states that if one has a number of random variates that are ‘in the family’, any linear combination of these variates will also be ‘in the family’. The importance of a stable family of probability distributions is that they serve as ‘attractors’ for linear combinations of non-stable random variates. By the classical central limit theorem the linear sum of a set of random variates, each with finite variance, will tend towards a normal distribution as the number of variates increases. Thus, in this research, although individual consumers may be of random consumption size, in aggregate, they tend to form a normal distribution.

3.3.14 Standard deviation

In Figure 8, the dark blue region lies within one standard deviation of the mean. For the normal distribution, this accounts for about 68% of the set, while two standard deviations from the mean (blue and brown) account for about 95% and three standard deviations (blue, brown, and green) account for about 99.7%.


Figure 8: Standard Deviation Segment Sizes in Standard Normal Distribution.

In practice, one often assumes that data are from an approximately normally distributed population. If that assumption is justified, then about 68% of the values are within one standard deviation of the mean, about 95% of the values are within two standard deviations, and about 99.7% lie within three standard deviations. This is known as the “68-95-99.7 rule” or the “Empirical Rule”. To be more precise, the area under the curve between μ − nσ and μ + nσ is

$$\operatorname{erf}\left(\frac{n}{\sqrt{2}}\right)$$ (Equation 16)

where erf(x) is the error function. To six decimal places the values of the 1, 2, and 3 sigma points are 0.682689..., 0.954500..., 0.997300... respectively.
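The values quoted above can be reproduced directly from Equation 16. The short Python sketch below uses the error function from the standard library; the helper name `within_n_sigma` is illustrative only.

```python
import math


def within_n_sigma(n: float) -> float:
    """Probability that a normal variate lies within n standard deviations of its mean."""
    return math.erf(n / math.sqrt(2))


for n in (1, 2, 3):
    print(f"{n} sigma: {within_n_sigma(n):.6f}")
# 1 sigma: 0.682689
# 2 sigma: 0.954500
# 3 sigma: 0.997300
```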

3.3.15 Normality tests

Normality tests check a given set of data for similarity to the normal distribution. The null hypothesis is that the data set is similar to the normal distribution; therefore a sufficiently small P-value indicates non-normal data.

i) Kolmogorov-Smirnov test
ii) Lilliefors test
iii) Anderson-Darling test
iv) Ryan-Joiner test
v) Shapiro-Wilk test
vi) normal probability plot (rankit plot)
vii) Jarque-Bera test
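To illustrate the idea behind the first two tests in the list, the sketch below computes the Kolmogorov-Smirnov distance between a sample's empirical CDF and a normal distribution fitted to that sample. It is only a sketch: the two samples are synthetic, the function name is hypothetical, and no P-value is computed, because the reference distribution for the statistic changes when the mean and standard deviation are estimated from the data (which is the situation the Lilliefors test addresses).

```python
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(data, dist) -> float:
    """Largest vertical gap between the empirical CDF of `data` and the CDF of `dist`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


rng = random.Random(1)  # fixed seed so the run is repeatable
samples = {
    "normal": [rng.gauss(100.0, 15.0) for _ in range(500)],           # synthetic data
    "exponential": [rng.expovariate(1 / 100.0) for _ in range(500)],  # strongly skewed
}

results = {}
for name, sample in samples.items():
    fitted = NormalDist(fmean(sample), stdev(sample))  # normal with matching mean and sd
    results[name] = ks_statistic(sample, fitted)
    print(f"{name:12s} D = {results[name]:.4f}")
```

A small distance is consistent with normality, while the skewed sample should produce a clearly larger one.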


3.3.16 Occurrence

Approximately normal distributions occur in many situations, as a result of the central limit theorem. When there is reason to suspect the presence of a large number of small effects acting additively and independently, it is reasonable to assume that observations will be normal.

This is why the consumption data for the electricity consumers of Peninsular Malaysia are taken to be normally distributed: there are a very large number of consumers who, individually, each consume a very small amount of electricity as compared to the total amount of electricity sold to all consumers. There are statistical methods to empirically test that assumption, for example the Kolmogorov-Smirnov test.

Effects can also act as multiplicative (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal.

Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting marginal distributions are indeed normal. The full distribution will be a superposition of normal variables, which is not in general normal. This is related to the theory of errors (see Section 3.3.17 for explanation).

To summarize, this is a list of situations where approximate normality is sometimes assumed.

• In counting problems (so the central limit theorem includes a discrete-to-continuum approximation) where reproductive random variables are involved, such as
   o Binomial random variables, associated to yes/no questions;
   o Poisson random variables, associated to rare events;

• In physiological measurements of biological specimens:
   o The logarithm of measures of size of living tissue (length, height, skin area, weight);
   o The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
   o Other physiological measures may be normally distributed, but there is no reason to expect that a priori;

• Measurement errors are often assumed to be normally distributed, and any deviation from normality is considered something which should be explained;

• Financial variables
   o Changes in the logarithm of exchange rates, price indices, and stock market indices; these variables behave like compound interest, not like simple interest, and so are multiplicative;
   o Other financial variables may be normally distributed, but there is no reason to expect that a priori;

• Light intensity
   o The intensity of laser light is normally distributed;
   o Thermal light has a Bose-Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.

Of relevance to biology and economics is the fact that complex systems tend to display power laws rather than normality.

3.3.17 Measurement errors

Normality is the central assumption of the mathematical theory of errors. Similarly, in statistical model-fitting, an indicator of goodness of fit is that the residuals (as the errors are called in that setting) be independent and normally distributed. The assumption is that any deviation from normality needs to be explained. In that sense, both in model-fitting and in the theory of errors, normality is the only observation that need not be explained, being expected. However, if the original data are not normally distributed (for instance if they follow a Cauchy distribution), then the residuals will also not be normally distributed. This fact is usually ignored in practice.


Repeated measurements of the same quantity are expected to yield results which are clustered around a particular value. If all major sources of errors have been taken into account, it is assumed that the remaining error must be the result of a large number of very small additive effects, and hence normal. Deviations from normality are interpreted as indications of systematic errors which have not been taken into account. Whether this assumption is valid is debatable.

3.3.18 Physical characteristics of biological specimens

The sizes of full-grown animals are approximately lognormal. The evidence and an explanation based on models of growth were first published in the 1932 book Problems of Relative Growth by Julian Huxley. However, in the case of human height for example, there are people several standard deviations away from the average who would almost certainly not exist at all among the whole population of the world if height followed a true lognormal distribution. Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the distribution of sizes deviate from lognormality.

The assumption that linear size of biological specimens is normal (rather than lognormal) leads to a non-normal distribution of weight (since weight or volume is roughly proportional to the 2nd or 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no a priori reason why one of length, or body mass, and not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers so the “problem” goes away if lognormality is assumed.

On the other hand, there are some biological measures where normality is assumed, such as blood pressure of adult humans. This is supposed to be normally distributed, but only after separating males and females into different populations (each of which is normally distributed).

3.3.19 Financial variables

Because of the exponential nature of inflation, financial indicators such as stock values or commodity prices make good examples of multiplicative behavior. As such, periodic changes in them (for example, yearly changes) should not be expected to be normal, but perhaps lognormal. This was the theory proposed in 1900 by Louis Bachelier. However, Benoît Mandelbrot, the popularizer of fractals, showed that even the assumption of lognormality is flawed: the changes in logarithm over short periods (such as a day) are approximated well by distributions that do not have a finite variance, and therefore the central limit theorem does not apply. Rather, the sum of many such changes gives log-Lévy distributions.

3.3.20 Distribution in testing and intelligence

A great deal of confusion exists over whether or not IQ test scores and intelligence are normally distributed. As a deliberate result of test construction, IQ scores are normally distributed for the majority of the population. But intelligence cannot be said to be normally distributed, simply because it is not a number. The difficulty and number of questions on an IQ test are decided based on which combinations will yield a normal distribution. This does not mean, however, that the information is in any way being misrepresented, or that there is any kind of “true” distribution that is being artificially forced into the shape of a normal curve. Intelligence tests can be constructed to yield any kind of score distribution desired.


3.3.21 Numerical approximations of the normal distribution

The normal distribution is widely used in scientific and statistical computing. Therefore, it has been implemented in various ways. The GNU Scientific Library calculates values of the standard normal CDF using piecewise approximations by rational functions. Another approximation method uses third-degree polynomials on intervals.

3.4 Confidence Interval

In statistics, a confidence interval (CI) helps describe how reliable survey results are. All other things being equal, a survey result with a small CI is more reliable than a result with a large CI. More precisely, a CI for a population parameter is an interval with an associated probability p that is generated from a random sample of an underlying population such that, if the sampling were repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question. Confidence intervals are the most prevalent form of interval estimation. In Figure 9, the bars represent observation means and the red lines represent the CIs surrounding them. The difference between the two populations on the left is significant at the level corresponding to the confidence probability because the intervals do not overlap.

Figure 9: Histogram with CIs


It must be noted that a confidence interval is not in general equivalent to a (Bayesian) credible interval. The common error of equating the two is known as the prosecutor’s fallacy. If U and V are statistics (i.e., observable random variables) whose probability distribution depends on some unobservable parameter θ, and

Pr(U < θ < V | θ) = x (Equation 17)

where x is a number between 0 and 1, then the random interval (U, V) is a “(100·x)% confidence interval for θ”. The number x (or 100·x%) is called the confidence level or confidence coefficient. In modern applied practice, most confidence intervals are stated at the 95% level [33].

3.4.1 Practical Example

A machine fills cups with margarine, and is supposed to be adjusted so that the mean content of the cups is close to 250 grams of margarine. Of course it is not possible to fill every cup with exactly 250 grams of margarine. Hence the weight of the filling can be considered to be a random variable X. The distribution of X is assumed here to be a normal distribution with unknown expectation μ and (for the sake of simplicity) known standard deviation σ = 2.5 grams. To check if the machine is adequately adjusted, a sample of n = 25 cups of margarine is chosen at random and the cups weighed. The weights of margarine are X1,…, X25, a random sample from X. To get an impression of the expectation μ, it is sufficient to give an estimate. The appropriate estimator is the sample mean:

$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$ (Equation 18)

The sample shows actual weights x1,…, x25, with mean:

$$\bar{x} = \frac{1}{25}\sum_{i=1}^{25} x_i = 250.2 \text{ (grams)}$$ (Equation 19)


If we take another sample of 25 cups, we could easily expect to find values like 250.4 or 251.1 grams. A sample mean value of 280 grams however would be extremely rare if the mean content of the cups is in fact close to 250g. There is a whole interval around the observed value 250.2 of the sample mean within which, if the whole population mean actually takes a value in this range, the observed data would not be considered particularly unusual. Such an interval is called a confidence interval for the parameter μ. How is such an interval calculated? The endpoints of the interval have to be calculated from the sample, so they are statistics, functions of the sample X1,…,X25 and hence random variables themselves. In this example, the endpoints may be determined by considering that the sample mean X̄ from a normally distributed sample is also normally distributed, with the same expectation μ, but with standard deviation σ/√n = 0.5 (grams). By standardizing, the random variable Z is obtained:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu}{0.5}$$ (Equation 20)

dependent on μ, but with a standard normal distribution independent of the parameter μ to be estimated. Hence it is possible to find numbers −z and z, independent of μ, where Z lies in between with probability 1 − α, a measure of how confident we want to be. Taking 1 − α = 0.95,

$$P(-z \le Z \le z) = 1 - \alpha = 0.95$$ (Equation 21)

The number z follows from:

$$\Phi(z) = P(Z \le z) = 1 - \frac{\alpha}{2} = 0.975, \qquad z = \Phi^{-1}(\Phi(z)) = \Phi^{-1}(0.975) = 1.96$$ (Equation 22)


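and:

$$\begin{aligned}
0.95 = 1 - \alpha &= P(-z \le Z \le z) = P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) \\
&= P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(\bar{X} - 1.96 \times 0.5 \le \mu \le \bar{X} + 1.96 \times 0.5\right) \\
&= P\left(\bar{X} - 0.98 \le \mu \le \bar{X} + 0.98\right)
\end{aligned}$$ (Equation 23)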

This might be interpreted as: with probability 0.95 one will find the parameter μ between the stochastic endpoints X̄ − 0.98 and X̄ + 0.98. Every time the measurements are repeated, there will be another value for the mean X̄ of the sample. In 95% of the cases μ will be between the endpoints calculated from this mean, but in 5% of the cases it will not be. The actual CI is calculated by entering the measured weights in the formula. This 0.95 CI becomes:

( x̄ − 0.98 ; x̄ + 0.98 ) = ( 250.2 − 0.98 ; 250.2 + 0.98 ) = ( 249.22 ; 251.18 )

This interval has fixed endpoints, where μ might be in between (or not). There is no probability of such an event. It cannot be said: “with probability 1 − α the parameter μ lies in the CI.” It is only known that by repetition in 100(1 − α)% of the cases μ will be in the calculated interval. In 100α% of the cases however it doesn’t. And unfortunately it is not known in which of the cases this happens. That is why it is said: “with confidence level 100(1 − α)% μ lies in the confidence interval.”
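The arithmetic of this example can be checked with a few lines of Python. The sketch below assumes, as the example does, that σ is known; the function name `z_confidence_interval` is illustrative only.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided CI for the mean of a normal population with known sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for a 95% level
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Values from the margarine example: mean 250.2 g, sigma = 2.5 g, n = 25 cups.
low, high = z_confidence_interval(250.2, 2.5, 25)
print(f"( {low:.2f} ; {high:.2f} )")   # ( 249.22 ; 251.18 )
```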


Figure 10: 50 realizations of a confidence interval for μ

Observing a sample amounts to selecting one interval from the population of all possible realizations. Before the sample is drawn, the probability is 95% that the interval chosen will contain the parameter. After realization, all that is obtained is the chosen interval. As seen from Figure 10, there was a fair chance of choosing an interval containing μ; however, if unlucky, the wrong one may have been picked. In other words, if one were to make a large number of sets of measurements, and calculate the confidence interval each time, one would expect (on the average) such intervals to include the mean the selected percentage of times, say about 95 out of each 100 times for a 95% CI. Likewise, for a 90% CI, the intervals would include the mean about 90 out of each 100 times [34].
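The repeated-sampling interpretation can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the study: it sets the true mean to 250 g (unknown in practice), reuses σ = 2.5 g and n = 25 from the margarine example, and counts how often the computed interval covers the true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(true_mu, sigma, n, confidence, repetitions, seed=0):
    """Fraction of simulated known-sigma intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of n cups and compute its mean.
        xbar = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        if xbar - half_width <= true_mu <= xbar + half_width:
            hits += 1
    return hits / repetitions


cov95 = coverage(250.0, 2.5, 25, confidence=0.95, repetitions=5000)
cov90 = coverage(250.0, 2.5, 25, confidence=0.90, repetitions=5000)
print(f"95% intervals covering the true mean: {cov95:.3f}")
print(f"90% intervals covering the true mean: {cov90:.3f}")
```

The observed fractions should lie close to the nominal levels, but any single interval either contains the true mean or does not.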

3.4.2 Theoretical Example

Suppose X1, ..., Xn are an independent sample from a normally distributed population with mean μ and variance σ². Let

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

Then

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$ (Equation 24)

has a Student’s t-distribution with n − 1 degrees of freedom. Note that the distribution of T does not depend on the values of the unobservable parameters μ and σ²; i.e., it is a pivotal quantity. If c is the 95th percentile of this distribution, then

Pr(−c < T < c) = 0.9

(Note: “95th” and “0.9” are correct in the preceding expressions. There is a 5% chance that T will be less than −c and a 5% chance that it will be larger than +c. Thus, the probability that T will be between −c and +c is 90%). Consequently,

$$\Pr\left(\bar{X} - \frac{cS}{\sqrt{n}} < \mu < \bar{X} + \frac{cS}{\sqrt{n}}\right) = 0.9$$ (Equation 25)

and a theoretical (stochastic) 90% CI for μ is obtained.

After observing the sample, the values x̄ for X̄ and s for S are found, from which the CI

$$\left[\bar{x} - \frac{cs}{\sqrt{n}},\ \bar{x} + \frac{cs}{\sqrt{n}}\right]$$ (Equation 26)

is computed. This is an interval with fixed numbers as endpoints, of which it can no longer be said that there is a certain probability that it contains the parameter μ. Either μ is in this interval or it isn’t.
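Equation 26 can be evaluated as in the sketch below. The ten weights are hypothetical, and the critical value c = 1.833 (the 95th percentile of Student's t with 9 degrees of freedom) is taken from standard tables, because the Python standard library does not provide t quantiles.

```python
import math
from statistics import fmean, stdev


def t_confidence_interval(sample, c):
    """CI of Equation 26: sample mean +/- c * s / sqrt(n), with c supplied by the caller."""
    n = len(sample)
    xbar = fmean(sample)
    s = stdev(sample)                 # sample standard deviation (n - 1 denominator)
    half_width = c * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample of n = 10 cup weights in grams (not data from this study).
weights = [249.1, 251.3, 250.4, 248.7, 250.9, 251.8, 249.6, 250.2, 250.0, 252.0]
T_95_DF9 = 1.833                      # table value: 95th percentile of t with 9 d.o.f.

low, high = t_confidence_interval(weights, T_95_DF9)
print(f"90% CI for the mean: ( {low:.2f} ; {high:.2f} )")
```

For other sample sizes the critical value must be looked up again, since it depends on n − 1.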

3.4.3 Interpretations of Confidence Intervals

Confidence levels are typically given alongside statistics resulting from sampling. In a statement: “we are 90% confident that between 35% and 45% of voters favor Candidate A”, 90% is the confidence level and 35%-45% is the confidence interval.


This statement is often misunderstood in the following way. Capital letters U and V are used for random variables; it is conventional to use lower-case letters u and v for their observed values in a particular instance. The misunderstanding is the conclusion that

Pr(u < θ < v) = 0.9

so that after the data have been observed, a conditional probability distribution of θ, given the data, is inferred. For example, suppose X is normally distributed with expected value θ and variance 1. (It is grossly unrealistic to take the variance to be known while the expected value must be inferred from the data, but it makes the example simple.) The random variable X is observable. (The random variable X − θ is not observable, since its value depends on θ.) Then X − θ is normally distributed with expectation 0 and variance 1; therefore

Pr(−1.645 < X − θ < 1.645) = 0.9 (Equation 27)

Consequently

Pr(X − 1.645 < θ < X + 1.645) = 0.9 (Equation 28)

so the interval from X − 1.645 to X + 1.645 is a 90% CI for θ. But when X = 82 is observed, can it then be said that

Pr(82 − 1.645 < θ < 82 + 1.645) = 0.9 ? (Equation 29)

This conclusion does not follow from the laws of probability because θ is not a “random variable”; i.e., no probability distribution has been assigned to it. CIs are generally a frequentist method, i.e., employed by those who interpret “90% probability” as “occurring in 90% of all cases”. Suppose, for example, that θ is the mass of the planet Neptune, and the randomness in the measurement error means that 90% of the time the statement that the mass is between this number and that number will be correct. The mass is not what is random. Therefore, given that we have measured it to be 82 units, we cannot say that in 90% of all cases, the mass is between 82 − 1.645 and 82 + 1.645. There are no such cases; there is, after all, only one planet Neptune.

But if probabilities are construed as degrees of belief rather than as relative frequencies of occurrence of random events, i.e., Bayesian probability rather than frequentism, can it then be said that one is 90% sure that the mass is between 82 − 1.645 and 82 + 1.645? Many answers to this question have been proposed, and are philosophically controversial. The answer will not be a mathematical theorem, but a philosophical tenet. Less controversial are Bayesian credible intervals, in which one starts with a prior probability distribution of θ, and finds a posterior probability distribution, which is the conditional probability distribution of θ given the data.

For users of frequentist methods, the explanation of a CI can amount to something like: “The CI represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level”.

Critics of frequentist methods suggest that this hides the real and, to the critics, incomprehensible frequentist interpretation which might be expressed as: “If the population parameter in fact lies within the CI, then the probability that the estimator either will be the estimate actually observed, or will be closer to the parameter, is less than or equal to 90%”. Users of Bayesian methods, if producing a CI, might by contrast say “The degree of belief that the parameter is in fact in the CI is 90%”. Disagreements about these issues are not disagreements about solutions to mathematical problems. Rather they are disagreements about the ways in which mathematics is to be applied.

3.4.4 Confidence Intervals in Measurement

More concretely, the results of measurements are often accompanied by CIs. For instance, suppose a scale is known to yield the actual mass of an object plus a normally distributed random error with mean 0 and known standard deviation σ. If 100 objects of known mass are weighed on this scale and the values are reported as v ± σ, then it can be expected that around 68 of the reported ranges include the actual mass. If a smaller standard error value is required, then the measurement is repeated n times and the results are averaged. Then the 68.2% CI is ± σ/√n. For example, repeating the measurement 100 times reduces the confidence interval to 1/10 of the original width.

Note that when a 68.2% CI (usually termed standard error) is reported as v ± σ, this does not mean that the true mass has a 68.2% chance of being in the reported range. In fact, the true mass is either in the range or not. How can a value outside the range be said to have any chance of being in the range? Rather, this statement means that 68.2% of the ranges that are reported are likely to include the true mass. This is not just a trivial objection. Under the incorrect interpretation, each of the 100 measurements described above would be specifying a different range, and the true mass supposedly has a 68% chance of being in each and every range. Also, it supposedly has a 32% chance of being outside each and every range. If two of the ranges happen to be disjoint, the statements are obviously inconsistent. Say one range is 1 to 2, and the other is 2 to 3. Supposedly, the true mass has a 68% chance of being between 1 and 2, but only a 32% chance of being less than 2 or more than 3. The incorrect interpretation reads more into the statement than is meant.

On the other hand, under the correct interpretation, each and every statement made is really true, because the statements are not about any specific range. It could report that one mass is 10.2 ± 0.1 grams, while really it is 10.6 grams, and not be lying. But if fewer than 1000 values are reported and more than two of them are that far off, there will have to be some explaining.

It is also possible to estimate a CI without knowing the standard deviation of the random error. This is done using Student’s t-distribution, or by using non-parametric resampling methods such as the bootstrap, which do not require that the error have a normal distribution.
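The σ/√n rule and the bootstrap alternative mentioned above can be compared in a small experiment. In the sketch below the readings are simulated (true mass 10.0 g, σ = 0.1 g, n = 100, all chosen purely for illustration), and the percentile bootstrap shown is only one of several bootstrap variants.

```python
import math
import random
from statistics import fmean, stdev


def bootstrap_ci(data, confidence=0.682, resamples=2000, seed=0):
    """Percentile bootstrap CI for the mean; no normality assumption is made."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(data, k=n)) for _ in range(resamples))
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * resamples)]
    upper = means[int((1.0 - tail) * resamples) - 1]
    return lower, upper


rng = random.Random(42)
TRUE_MASS, SIGMA, N = 10.0, 0.1, 100          # illustrative values only
readings = [rng.gauss(TRUE_MASS, SIGMA) for _ in range(N)]

mean = fmean(readings)
known_sigma_half_width = SIGMA / math.sqrt(N)            # sigma / sqrt(n)
estimated_half_width = stdev(readings) / math.sqrt(N)    # sigma replaced by sample sd
boot_low, boot_high = bootstrap_ci(readings)

print(f"mean of readings           : {mean:.4f}")
print(f"68.2% CI, known sigma      : +/- {known_sigma_half_width:.4f}")
print(f"68.2% CI, estimated sigma  : +/- {estimated_half_width:.4f}")
print(f"68.2% CI, bootstrap        : ( {boot_low:.4f} ; {boot_high:.4f} )")
```

With well-behaved data the three widths should be similar; the bootstrap becomes more attractive when the error distribution is not normal.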

3.4.5 Robust Confidence Intervals

In the process of weighing 1000 objects, under practical conditions, it is easy to believe that the operator might make a mistake in procedure and so report an incorrect mass (thereby making one type of systematic error). Suppose the operator has 100 objects and has weighed them all, one at a time, and repeated the whole process ten times. A sample standard deviation for each object can then be calculated, and outliers may be identified. Any object with an unusually large standard deviation probably has an outlier in its data. These can be removed by various non-parametric techniques. If the operator repeated the process only three times, he would simply take the median of the three measurements and use σ to give a confidence interval. The 200 extra weighings served only to detect and correct for operator error and did nothing to improve the CI. With more repetitions, the operator could use a truncated mean, discarding say the largest and smallest values and averaging the rest. Then a bootstrap calculation could be used to determine a CI that is narrower than that calculated from σ, and so obtain some benefit from a large amount of extra work.

These procedures are robust against procedural errors which are not modeled by the assumption that the balance has a fixed known standard deviation σ. In practical applications where the occasional operator error can occur, or the balance can malfunction, the assumptions behind simple statistical calculations cannot be taken for granted. Before trusting the results of 100 objects weighed just three times each to have CIs calculated from σ, it is necessary to test for and remove a reasonable number of outliers (testing the assumption that the operator is careful and correcting for the fact that he is not perfect), and to test the assumption that the data really have a normal distribution with standard deviation σ.

The theoretical analysis of such an experiment is complicated, but it is easy to set up a spreadsheet which draws random numbers from a normal distribution with standard deviation σ to simulate the situation (use =norminv(rand(),0,σ)) [35].

After removing obvious outliers, one could subtract the median from the other two values for each object, and examine the distribution of the 200 resulting numbers. It should be normal with the mean near zero and standard deviation a little larger than σ. A simple Monte Carlo spreadsheet calculation would reveal typical values for the standard deviation (around 105% to 115% of σ). Or, one could subtract the mean of each triplet from the values, and examine the distribution of 300 values. The mean is identically zero, but the standard deviation should be somewhat smaller (around 75% to 85% of σ).
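The Monte Carlo calculation described above does not need a spreadsheet. The Python sketch below is an assumed equivalent: it simulates many triplets of readings with σ = 1 (far more than the 100 objects of the text, so that the ratios are stable) and reports both standard deviations as fractions of σ.

```python
import random
from statistics import fmean, pstdev


def triplet_residual_ratios(sigma=1.0, objects=20000, seed=0):
    """Simulate objects weighed three times; return (sd of median residuals / sigma,
    sd of mean residuals / sigma)."""
    rng = random.Random(seed)
    median_residuals = []   # the other two readings minus the median of the triplet
    mean_residuals = []     # all three readings minus the triplet mean
    for _ in range(objects):
        triplet = sorted(rng.gauss(0.0, sigma) for _ in range(3))
        middle = triplet[1]
        median_residuals += [triplet[0] - middle, triplet[2] - middle]
        centre = fmean(triplet)
        mean_residuals += [x - centre for x in triplet]
    return pstdev(median_residuals) / sigma, pstdev(mean_residuals) / sigma


median_ratio, mean_ratio = triplet_residual_ratios()
print(f"sd after subtracting the median : {100 * median_ratio:.1f}% of sigma")
print(f"sd after subtracting the mean   : {100 * mean_ratio:.1f}% of sigma")
```

With only 100 objects, as in the text, the two ratios scatter more widely from run to run, which is consistent with the ranges quoted above.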


3.4.6 Confidence Intervals for Proportions and Related Quantities

An approximate CI for a population mean can be constructed for random variables that are not normally distributed in the population, relying on the central limit theorem (see Section 3.3.11 for explanation), if the sample sizes and counts are big enough. The formulae are identical to the case above (where the sample mean is actually normally distributed about the population mean). The approximation will be quite good with only a few dozen observations in the sample if the probability distribution of the random variable is not too different from the normal distribution (e.g. its cumulative distribution function does not have any discontinuities and its skewness is moderate).

One type of sample mean is the mean of an indicator variable, which takes on the value 1 for true and the value 0 for false. (Statisticians often call indicator variables “dummy variables”, but that term is also frequently used by mathematicians for the concept of a bound variable.) The mean of such a variable is equal to the proportion that has the variable equal to one (both in the population and in any sample). Thus, the sample mean for a variable labeled MALE in data is just the proportion of sampled observations who have MALE = 1, i.e. the proportion who are male. This is a useful property of indicator variables, especially for hypothesis testing. To apply the central limit theorem, one must use a large enough sample. A rough rule of thumb is that one should see at least 5 cases in which the indicator is 1 and at least 5 in which it is 0.

Confidence intervals constructed using the above formulae may include negative numbers or numbers greater than 1, but proportions obviously cannot be negative or exceed 1. The probability assigned to negative numbers and numbers greater than 1 is usually small when the sample size is large and the proportion being estimated is not too close to 0 or 1. CIs for cases where the method above assigns a substantial probability to (−∞, 0) or to (1, ∞) may be constructed by inverting hypothesis tests. If one considers conducting hypothesis tests over the whole feasible range of parameter values, and includes in the interval every value for which a single hypothesis test would not reject the null hypothesis that the true value was that value, given the sample value, one obtains a CI based on the central limit theorem that does not violate the basic properties of proportions.

On the other hand, sample proportions can only take on a finite number of values, so the central limit theorem and the normal distribution are not the best tools for building a CI. A better method would rely on the binomial distribution or the beta distribution, and there are a number of better methods in widespread use. There are advantages and disadvantages of each [36].
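The difference between the simple central-limit interval and an interval obtained by inverting a test can be seen in a small example. The text does not name a particular method; the sketch below uses the Wilson score interval as one widely used inverted-test construction, and the counts (2 positive responses out of 30) are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Central-limit ('Wald') interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval, obtained by inverting the normal-approximation test."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical survey: 2 of 30 sampled consumers answer "yes".
wald = wald_interval(2, 30)
wilson = wilson_interval(2, 30)
print(f"Wald   : ( {wald[0]:+.4f} ; {wald[1]:.4f} )")
print(f"Wilson : ( {wilson[0]:+.4f} ; {wilson[1]:.4f} )")
```

With so few positive responses the Wald interval extends below zero, whereas the Wilson interval stays inside the feasible range of a proportion.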

3.5 Boxplot

The boxplot was invented in 1977 by American statistician John Tukey. In descriptive statistics, a boxplot (also known as a box-and-whisker diagram or candlestick chart) is a convenient way of graphically depicting the five-number summary, which consists of the smallest non-outlier observation, lower quartile (Q1), median, upper quartile (Q3) and largest non-outlier observation. In addition, the boxplot indicates which observations, if any, are considered outliers. Boxplots are able to visually show different types of populations, without any assumptions of the statistical distribution. The spacings between the different parts of the box can help indicate variance, skew and identify outliers. Boxplots can be drawn either horizontally or vertically.

3.5.1 Construction

For a data set, one constructs a boxplot in the following manner:

i) Calculate the Q1 (x.25), median (x.50), and Q3 (x.75)

ii) Calculate the IQR by subtracting Q1 from Q3 (x.75 − x.25)

iii) Construct a box above the number line bounded on the left by the first quartile (x.25) and on the right by the third quartile (x.75). The box may be as tall as one likes, although reasonably proportioned boxplots are customary

iv) Indicate where the median lies inside of the box with the presence of a symbol or a line dividing the box at the median value

v) Any data observation which lies more than 1.5*IQR lower than the first quartile or 1.5*IQR higher than the third quartile is considered an outlier. Indicate where the smallest value that is not an outlier is by a vertical tic mark or “whisker”, and connect the whisker to the box via a horizontal line. Likewise, indicate where the largest value that is not an outlier is by a “whisker”, and connect that whisker to the box via another horizontal line

vi) Indicate outliers by open and closed dots. “Extreme” outliers, or those which lie more than three times the IQR to the left and right from the first and third quartiles, respectively, are indicated by the presence of an open dot. “Mild” outliers, those observations which lie more than 1.5 times the IQR from the first and third quartile but are not also extreme outliers, are indicated by the presence of a closed dot

vii) Add an appropriate label to the number line and title the boxplot

viii) It is worth noting that a boxplot may be constructed in a similar manner vertically as opposed to horizontally by merely interchanging “left” for “bottom” and “right” for “top” in the above description
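The numerical part of steps i), ii), v) and vi) can be sketched in Python as follows. The data set is hypothetical, and the quartiles are computed with the "inclusive" method of the standard library; other quartile conventions exist and can give slightly different values.

```python
from statistics import quantiles


def boxplot_summary(data):
    """Quartiles, IQR, whisker ends and outliers following the construction steps above."""
    xs = sorted(data)
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # beyond these: outlier
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # beyond these: extreme outlier
    inside = [x for x in xs if inner_lo <= x <= inner_hi]
    mild = [x for x in xs if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi]
    extreme = [x for x in xs if x < outer_lo or x > outer_hi]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),   # smallest and largest non-outliers
        "mild": mild, "extreme": extreme,
    }


# Hypothetical data set with one mild and one extreme low outlier.
data = [0.5, 3.5, 5, 7, 7.5, 8, 8.5, 8.5, 9, 9, 9.5, 10, 10]
summary = boxplot_summary(data)
for key, value in summary.items():
    print(f"{key:9s}: {value}")
```

Only the drawing steps iii), iv), vii) and viii) are left out; they depend on the plotting tool used.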

3.5.2 An Example

A plain-text version might look like this:

                            +-----+-+
  *           o     |-------|   + | |---|
                            +-----+-+

+---+---+---+---+---+---+---+---+---+---+   number line
0   1   2   3   4   5   6   7   8   9   10

For this data set (values are approximate, based on the figure):

i) smallest observation (outliers excluded, minimum or min) = 5
ii) lower quartile (Q1) = 7
iii) median (Q2) (Med) = 8.5
iv) upper quartile (Q3) = 9
v) largest observation (outliers excluded, maximum or max) = 10
vi) mean = 8
vii) IQR = Q3 − Q1 = 2
viii) the value 3.5 is a “mild” outlier, between 1.5*(IQR) and 3*(IQR) below Q1
ix) the value 0.5 is an “extreme” outlier, more than 3*(IQR) below Q1
x) the smallest value that is not an outlier is 5
xi) the data are skewed to the left (negatively skewed)

The horizontal lines (the “whiskers”) extend to at most 1.5 times the box width (the IQR) from either or both ends of the box. They must end at an observed value, thus connecting all the values outside the box that are not more than 1.5 times the box width away from the box. Three times the box width marks the boundary between “mild” and “extreme” outliers. There are alternative implementations of this detail of the box plot in various software packages, such as the whiskers extending to at most the 5th and 95th (or some more extreme) percentiles. Such approaches do not conform to Tukey’s definition, with its emphasis on the median in particular and counting methods in general; and they tend to produce “outliers” for all data sets larger than ten, no matter what the shape of the distribution.

3.5.3

Visualization

The boxplot is a quick graphic approach for examining one or more sets of data. Boxplots may seem more primitive than a histogram or PDF, but they do have their benefits. Besides saving space on paper, boxplots are quicker to generate by hand. Histograms and probability density functions require assumptions about the statistical distribution. These assumptions can be a major barrier because binning techniques can heavily influence the histogram and incorrect variance calculations will heavily affect the probability density function. However, looking at a statistical distribution is more intuitive than looking at a boxplot, so comparing the boxplot against the probability density function (theoretical histogram) for a Normal N(0, 1²) distribution, as in Figure 11, may be a useful tool for understanding the boxplot.

Figure 11: Boxplot and PDF of a Normal N(0, 1²) Population

3.6

Outliers

An outlier is an extremely unrepresentative data point [37]. An outlier is an observation that lies outside the overall pattern of a distribution [38]. In statistics, an outlier is an observation that is numerically distant from the rest of the data [37]. Statistics derived from data sets that include outliers will often be misleading. Outliers may be indicative of data points that belong to a different population than the rest of the sample set. In most samplings of data, some data points will be further away from their expected values than what is deemed reasonable. Outliers arise for two reasons [39]:

i) They are legitimate observations whose values are simply unusually large or unusually small, i.e. these observations happen to be a long way from the center of the data

ii) They are the result of an error in measurement, poor experimental technique, or a mistake in recording or entering data. They can also be due to systematic error or faults in the theory that generated the expected values.

Outlier points can therefore indicate faulty data, erroneous procedures, or areas where a certain theory might not be valid. However, a small number of outliers is expected in normal distributions. Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors; while mathematical criteria provide an objective and quantitative method for data rejection, they do not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known [37]. If the cause of an outlier cannot be found, a good strategy is to analyze the data both with and without the outlier. If the results are similar, then the outlier is having little effect. If the results are substantially different, then the presence of the outlier should be reported, and both analyses presented. Further, in order to decide which is the appropriate analysis, it may be necessary to make extra effort to identify a cause for the outlier, or to obtain more data.
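The "with and without" comparison described above can be illustrated as follows. This is a minimal sketch using hypothetical readings; the helper name summarize and the choice of summary statistics are assumptions made for the example.

```python
from statistics import fmean, median, pstdev


def summarize(values):
    """Summary statistics used to compare the two analyses."""
    return {"mean": fmean(values), "median": median(values), "sd": pstdev(values)}


# Hypothetical readings with one suspect value (95), for illustration only.
readings = [12, 14, 15, 15, 16, 17, 18, 95]
suspect = 95

with_outlier = summarize(readings)
without_outlier = summarize([v for v in readings if v != suspect])

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

Here the mean and standard deviation shift considerably while the median barely moves, which is the kind of difference that would call for reporting both analyses.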

3.6.1

An Example

Outliers are often easy to spot in histograms. For example, the point on the far left in Figure 12 is an outlier.

Figure 12: An example of an outlier in a histogram


Outliers can also occur when comparing relationships between two sets of data. Outliers of this type can be easily identified on a scatterplot, as shown in Figure 13. When performing least squares fitting to data, it is often best to discard outliers before computing the line of best fit. This is particularly true of outliers along the x direction, since these points may greatly influence the result [38] [40].

Figure 13: An example of an outlier in a scatterplot

3.6.2

Mild outliers

Defining Q1 and Q3 to be the first and third quartiles, and IQR to be the interquartile range (Q3 − Q1), one possible definition of being “far away” in this context is an observation x for which

$$x < Q_1 - 1.5 \times \mathrm{IQR}$$ (Equation 30)

or

$$x > Q_3 + 1.5 \times \mathrm{IQR}$$ (Equation 31)

The limits Q1 − 1.5 × IQR and Q3 + 1.5 × IQR define the so-called inner fences, beyond which an observation would be labeled a mild outlier.

3.6.3

Extreme outliers

Extreme outliers are observations x that are beyond the outer fences:

$$x < Q_1 - 3 \times \mathrm{IQR}$$ (Equation 32)

or

$$x > Q_3 + 3 \times \mathrm{IQR}$$ (Equation 33)
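As a hedged illustration of the inner and outer fences, the sketch below classifies a few values using the quartiles read from the example in Section 3.5.2 (Q1 = 7, Q3 = 9). The function name classify is hypothetical.

```python
def classify(value, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    iqr = q3 - q1
    # Outer fences: 3 * IQR beyond the quartiles.
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme"
    # Inner fences: 1.5 * IQR beyond the quartiles.
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "mild"
    return "not an outlier"


# Quartiles taken from the boxplot example in Section 3.5.2.
for v in (0.5, 3.5, 5, 10):
    print(v, classify(v, q1=7, q3=9))
```

With these quartiles the inner fences are 4 and 12 and the outer fences are 1 and 15, so 0.5 falls beyond the outer fence and 3.5 between the two lower fences.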

3.6.4

Occurrence and causes

In the case of normally distributed data, using the above definitions, only about 1 in 150 observations will be a mild outlier, and only about 1 in 425,000 an extreme outlier. Because of this, outliers usually demand special attention, since they may indicate problems in sampling or data collection or transcription. Alternatively, an outlier could be the result of a flaw in the assumed theory, calling for further investigation by the researcher.
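The quoted rates can be checked numerically. The sketch below assumes a standard normal population and uses Python's standard-library statistics.NormalDist; the helper name two_sided_tail is illustrative. It computes the probability of falling beyond the inner and the outer fences.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Quartiles of the standard normal are symmetric about zero.
q3 = std_normal.inv_cdf(0.75)
iqr = 2 * q3


def two_sided_tail(k):
    """Probability of lying more than k * IQR beyond either quartile."""
    fence = q3 + k * iqr
    return 2 * (1 - std_normal.cdf(fence))


p_beyond_inner = two_sided_tail(1.5)
p_beyond_outer = two_sided_tail(3.0)

print("about 1 in", round(1 / p_beyond_inner), "beyond the inner fences")
print("about 1 in", round(1 / p_beyond_outer), "beyond the outer fences")
```

The results are of the same order as the figures quoted above; small differences arise from rounding and from counting extreme outliers among those beyond the inner fences.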

3.6.5

Non-normal distributions

Even when a normal model is appropriate to the data being analyzed, outliers are expected for large sample sizes and should not automatically be discarded if that is the case. Also, the possibility should be considered that the underlying distribution of the data is not approximately normal, having “fat tails”. For instance, when sampling from a Cauchy distribution, the sample variance increases with the sample size, the sample mean fails to converge as the sample size increases, and outliers are expected at far larger rates than for a normal distribution.

3.7

Extreme Values

The largest and the smallest element of a set are called extreme values, absolute extrema, or extreme records. For a differentiable function f, if f(x0) is an extreme value for the set of all values f(x), and if x0 is in the interior of the domain of f, then (x0, f(x0)) is a stationary point or critical point.


3.7.1

Extreme values in abstract spaces with order

In the case of a general partial order one should not confuse a least element (smaller than all others) and a minimal element (nothing is smaller). Likewise, a greatest element of a poset is an upper bound of the set which is contained within the set, whereas a maximal element m of a poset A is an element of A such that if m ≤ b (for any b in A) then m = b.

Any least element or greatest element of a poset will be unique, but a poset can have several minimal or maximal elements. If a poset has more than one maximal element, then these elements will not be mutually comparable. In a totally ordered set, or chain, all elements are mutually comparable, so such a set can have at most one minimal element and at most one maximal element. Then, due to mutual comparability, the minimal element will also be the least element and the maximal element will also be the greatest element. If a chain is finite then it will always have a maximum (maximal element, greatest element) and a minimum (minimal element, least element). If a chain is infinite then it need not have a maximum or a minimum. For example, the set of natural numbers has no maximum, though it has a minimum. If an infinite chain S is bounded, then the closure Cl(S) of the set occasionally has a minimum and a maximum; in such a case they are called the greatest lower bound and the least upper bound of the set S, respectively. In general, if an ordered set S has a greatest element m, m is a maximal element. Furthermore, if S is a subset of an ordered set T and m is the greatest element of S with respect to the order induced by T, m is a least upper bound of S in T. A similar result holds for the least element, minimal element, and greatest lower bound.


3.8

Probability of a z value

When there is a z (standardized) value for a variable, the probability of that value can be determined by using the table shown in Appendix 13 – Standard Normal (Z) Table. When a z value is looked up, the table gives the area under the normal curve between the mean and z. The area lying beyond ±z is referred to as the rejection region. It is also called a two-tailed probability because both tails of the distribution are excluded. A one-tailed probability is used when a research question is concerned with only half of the distribution. Its value is exactly half the two-tailed probability. Example 1 and Example 2 illustrate the z value calculation for the commonly used probability values of 0.05 and 0.10 respectively.

Example 1

Two-tailed probability = 0.05
Area under graph = $\frac{1 - 0.05}{2} = 0.4750$
z-value = 1.96
(Equation 34)

Example 2

Two-tailed probability = 0.10
Area under graph = $\frac{1 - 0.10}{2} = 0.4500$
z-value = 1.645
(Equation 35)

3.8.1

Critical z for a given probability

The critical z value for a given probability can also be determined as illustrated in Example 3.

Example 3

A large company designed a pre-employment survey to be administered to prospective employees. Baseline data was established by administering the survey to all current employees. They now want to use the instrument to identify job applicants who have very high or very low scores. Management has decided they want to identify people who score in the upper and lower 3% when compared to the norm. How many standard deviations away from the mean are required to define the upper and lower 3% of the scores? The total area of rejection is 6%. This includes 3% who scored very high and 3% who scored very low. Thus, the two-tailed probability is 0.06. The z value required to reject 6% of the area under the curve is 1.881. Thus, new applicants who score more than 1.881 standard deviations above or below the mean are the people to be identified.

Two-tailed probability = 0.06
Area under graph = $\frac{1 - 0.06}{2} = 0.4700$
z-value = 1.881
(Equation 36)
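The critical values in Examples 1 to 3 can be cross-checked without the printed table. The sketch below assumes Python's standard-library statistics.NormalDist as a stand-in for the Z table; the function name critical_z is illustrative only.

```python
from statistics import NormalDist


def critical_z(two_tailed_probability):
    """z such that the two tails beyond +/- z together hold the given probability."""
    return NormalDist().inv_cdf(1 - two_tailed_probability / 2)


for p in (0.05, 0.10, 0.06):
    area = (1 - p) / 2  # area between the mean and z, as in the examples
    print(f"two-tailed p = {p:.2f}  area = {area:.4f}  z = {critical_z(p):.3f}")
```

To three decimal places this gives 1.960, 1.645 and 1.881 for two-tailed probabilities of 0.05, 0.10 and 0.06.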

3.9

Precision Factor

To determine the total number of samples (n), the concept of statistical precision used in simple random sampling is applied [41]. The formula is given by:

$$r = \frac{\sigma}{\mu} \cdot z \cdot \sqrt{\frac{1}{n}}$$ (Equation 37)

where:
r is the precision factor
σ is the standard deviation of the total consumption (kWh)
µ is the mean of the total consumption (kWh)
z is the standard normal variable (i.e. z equals 1.960 for a 95% CI and 1.645 for a 90% CI [42])
n is the sample size

From the above equation, the total number of samples (n) can be determined for a given precision factor; rearranging gives $n = \left(\frac{z\,\sigma}{r\,\mu}\right)^2$. It can be seen that the smaller the precision factor, the greater the number of samples required.


To complete the calculations in Equation 37, the standard deviation and mean were determined as shown in Equations 38 and 39 respectively.

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}$$ (Equation 38)

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$ (Equation 39)

where:

σ is the standard deviation of the total consumption (kWh); n is the sample size; $\bar{X}$ is the mean of the total consumption (kWh); and $X_i$ is the current (i-th) sample value of total consumption (kWh)

From Equation 37, the total number of samples n of each category can be determined for a given precision factor. For this study, the precision factor r is set to 0.1.
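A minimal sketch of this calculation is given below. It solves Equation 37 for n, using a standard deviation that divides by n as in Equation 38. The function name required_sample_size and the consumption values are hypothetical and are not taken from the TNB data.

```python
import math
from statistics import NormalDist, fmean, pstdev


def required_sample_size(consumption_kwh, r=0.1, confidence=0.90):
    """Solve Equation 37 for n: n = (z * sigma / (r * mu)) ** 2, rounded up."""
    mu = fmean(consumption_kwh)       # mean, Equation 39
    sigma = pstdev(consumption_kwh)   # divides by n, as in Equation 38
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for a 90% CI
    return math.ceil((z * sigma / (r * mu)) ** 2)


# Hypothetical monthly consumption values (kWh), for illustration only.
consumption = [120, 250, 310, 95, 480, 1500, 220, 175, 640, 60]

n = required_sample_size(consumption)
print("samples needed at r = 0.1, 90% CI:", n)
print("samples needed at r = 0.05, 90% CI:", required_sample_size(consumption, r=0.05))
```

Halving the precision factor roughly quadruples the required sample size, since n varies with 1/r².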


CHAPTER 4 DETERMINATION OF CONSUMER STRATIFICATION & SAMPLE SIZE

4.1

Introduction

In order to calculate the VoLL for Peninsular Malaysia, it was imperative that proper and accurate techniques were used in the stratification and sampling of consumer data. Both of these stages would have a large impact on the analysis and final calculation of the VoLL. The stratification stage deals with the segregation and grouping of consumers into major strata, which are then broken down further into minor strata. This stage needs to identify consumers with similar needs and activities. This is done by profiling consumers’ consumption trends and common business practices. The sampling stage uses the various consumer stratification data to calculate the necessary number of samples for each major stratum such that the required precision is satisfied. For this research, the precision factor r is set at 0.1 for a CI of 90% (see Section 3.4 for explanation). The consumer data from the TNB billing database yielded a large variance. Due to limitations of time and funding in this research project, it was necessary to choose a more relaxed precision factor r and a lower CI in order to limit the sample size. Both the r and CI values, however, are statistically acceptable.


4.2

Stratification

Stratification and sampling is a field of statistics that is very useful for outage cost studies because it allows accurate research of a large population by taking samples from that population. This research is limited to the electricity consumers of Peninsular Malaysia, which comprised 6,582,374 consumers in 2005 [14]. For such a large population, a general census would prove infeasible. Thus, this research will rely on sampling to collect consumer data. However, before sampling, the consumer population must first be stratified to ensure that the consumption data of each unique stratum is accurately represented. At the beginning of the study, the classification (by consumption) of consumers of electricity and their size is determined, as shown in Figure 14. Consumers are divided into three major strata: domestic, commercial, and industrial consumers.

Figure 14: Classification of Consumers by kWh Consumption Year 2005 [43] (Industrial 48.03%, Commercial 28.74%, Domestic 18.19%, Others 5.04%)

Once the strata are justified in general, the classification must be refined further to obtain samples that are realistic, so that they reflect actual consumption trends, and deterministic, so that they are predictable under all operating conditions [44]. Hence, the TNB business code is used to classify consumers into much smaller groups for stratification and sampling purposes. The business code is used to identify the individual minor strata for each major stratum. The details of the individual minor strata stratification are discussed in Section 4.4.1 for domestic consumers, Section 4.4.2 for commercial consumers, and Section 4.4.3 for industrial consumers.

4.2.1

Stratified Sampling

Stratified sampling is a method of sampling from a population. When subpopulations (strata) vary considerably, it is advantageous to sample each stratum independently. Stratification is the process of grouping members of the population into relatively homogeneous subgroups before sampling [45]. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded [46]. In this research for example, the entire population is stratified into 3 major strata: Domestic, Commercial, and Industrial. Each stratum consists of consumers that only belong to that stratum (and no other stratum). Furthermore, all consumers are assigned to a particular stratum. Less than 1% of the total consumers (by consumption) are excluded, which statistically can be taken to mean a collectively exhaustive stratification process. Generally, stratification is used for two main reasons: to reduce standard errors for survey estimates and to ensure that sample sizes for strata are of their expected size. Relative to taking a completely unstratified sample, taking a proportionate sample is either a good thing, in that it reduces standard errors, or a neutral thing, if standard errors don’t change. Proportionate stratification can never increase standard errors. This is because:

i) total sampling variance can be decomposed into two components: within-strata variation and between-strata variation (the split between the two depending on how the strata are defined)

ii) with proportionate stratification the between-strata variance becomes zero. So, proportionate stratification is most efficient when the stratifiers that are used split the total variance in a way that maximizes the between-strata variance [47]
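The decomposition in point i) can be illustrated numerically. The sketch below uses hypothetical consumption values for three strata (not figures from this study) and shows that the population variance splits into a within-strata part and a between-strata part; the second part is the one that proportionate stratification removes from the sampling variance.

```python
from statistics import fmean, pvariance

# Hypothetical consumption values (kWh) for three strata, illustration only.
strata = {
    "domestic": [200, 250, 300, 350],
    "commercial": [900, 1100, 1300],
    "industrial": [9000, 12000, 15000],
}

all_values = [v for values in strata.values() for v in values]
population_size = len(all_values)
grand_mean = fmean(all_values)

# Size-weighted average of the variances inside each stratum.
within = sum(len(v) * pvariance(v) for v in strata.values()) / population_size

# Size-weighted spread of the stratum means around the overall mean.
between = sum(
    len(v) * (fmean(v) - grand_mean) ** 2 for v in strata.values()
) / population_size

total = pvariance(all_values)

print("within-strata variance: ", within)
print("between-strata variance:", between)
print("total variance:         ", total)
```

In this made-up example most of the total variance lies between the strata, which is the situation in which stratifying pays off most.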


After stratification, random or systematic sampling was applied within each stratum. This is done to further improve the representativeness of the sample by reducing sampling error. It can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population. Proportionate stratified sampling almost always leads to an increase in survey precision (relative to a design with no stratification), although the increase will often be modest, depending upon the nature of the stratifiers. In this research, random sampling of the minor strata was carried out, and every effort was made to provide a number of samples for each minor stratum proportionate to its contribution to the total consumer consumption, i.e. its weight.
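One way to allocate a fixed number of samples in proportion to stratum weights is sketched below. The weights, stratum labels and the function name proportional_allocation are hypothetical; the rounding rule (largest fractional part first) is an assumption made so that the allocations add up exactly to the total.

```python
def proportional_allocation(weights, total_samples):
    """Split total_samples across strata in proportion to their weights."""
    total_weight = sum(weights.values())
    exact = {k: total_samples * w / total_weight for k, w in weights.items()}
    allocation = {k: int(x) for k, x in exact.items()}
    # Hand out the samples lost to rounding down, largest fractional part first.
    leftover = total_samples - sum(allocation.values())
    by_fraction = sorted(exact, key=lambda k: exact[k] - allocation[k], reverse=True)
    for k in by_fraction[:leftover]:
        allocation[k] += 1
    return allocation


# Hypothetical consumption weights (percent) for four minor strata.
weights = {"A": 30, "B": 50, "C": 13, "D": 7}
allocation = proportional_allocation(weights, 2048)
print(allocation)
```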

4.3

Preprocessing of Population Data with SPSS Software

The way the data is distributed around a central value is statistically important. While the values of individual data points for samples of a population may vary according to a number of patterns, measurement data often follow a simple distribution. The data points are characteristically distributed symmetrically about the mean. Small deviations from the average usually occur more often than large differences, and very large differences occur rarely. When a frequency distribution is constructed from a large sample of a general population, one often obtains the familiar bell-shaped curve which was described mathematically by Gauss and is called the Normal distribution or Gaussian distribution. Statistically, there are many terms related to the roots of sampling. Generally, variance, standard deviation, and mean are used regularly. Moreover, there are two special kinds of departures from the normal distribution: skewness, and kurtosis (curvature), in which the data are abnormally compressed or are more spread out than for a true normal distribution. For a perfect normal distribution, the skewness factor will be zero. A negative value is due to skewness toward lower values: more data in the left tail than would be expected in a normal distribution. This can be seen in the left diagram of Figure 15 by the elongated tail at the left. A positive value indicates excess higher values: more data in the right tail than would be expected in a normal distribution. This can be seen in the right diagram of Figure 15 by the elongated tail at the right.

Figure 15: Negative skew (left diagram) and positive skew (right diagram).

From Figure 16, for a perfect normal distribution the kurtosis factor (excess kurtosis) is expected to be zero (mesokurtic curve). A positive value indicates a sharper peak with heavier tails (leptokurtic curve), while a negative value indicates a flatter curve than normal (platykurtic curve). Hence, strong skewness or kurtosis in the population data, if not taken into account, will result in poor sampling.
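The two shape factors can be computed directly from the data. The sketch below uses plain population-moment formulas, which is an assumption: statistical packages such as SPSS apply small-sample corrections, so their reported values will differ slightly. The function name and the data are hypothetical.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (both zero for a normal curve)."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis


# Hypothetical data with a long right tail, for illustration only.
right_skewed = [1, 1, 2, 2, 3, 3, 4, 20]
skew, kurt = skewness_and_excess_kurtosis(right_skewed)
print("skewness:", skew)
print("excess kurtosis:", kurt)
```

A positive skewness here reflects the elongated right tail produced by the single large value.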

Figure 16: Kurtosis Factor Impact on a Normal Distribution

Based on ordinary variability, there will be some level of probability, chosen arbitrarily, at which any chosen difference is significant. In practice, one sets the probability, such as a 90% confidence interval, for a category. In this way, a limit is set so that the samples obtained fall within that confidence interval. Since the consumer population was very large and the distribution was asymmetric, a CI of 90% was chosen. The computed variance was also significantly high due to the large but not dense distribution of the population. Therefore, a precision factor of 0.1 was set for the sampling process.

Taking these factors into consideration, the statistical software SPSS was used to preprocess the raw data and check for proper normal distribution. The SPSS output for domestic consumers is listed in Appendix 7, commercial consumers in Appendix 8, and industrial consumers in Appendix 9.

4.4

Sampling with SPSS Software

Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference [48]. In particular, results from probability theory and statistical theory are employed to guide practice. The sampling process consists of five stages [49]:

i) Definition of population of concern
ii) Specification of a sampling frame, a set of items or events that it is possible to measure
iii) Specification of sampling method for selecting items or events from the frame
iv) Sampling and data collecting
v) Review of sampling process

It is important to create a sample that correctly reflects the makeup of the whole population [50]. For the VoLL calculations, there are two primary objectives in sample design [51]:

i) Ensure that the sample is representative of the population of interest so that the resulting outage cost estimates are not biased
ii) Efficiency to produce the most statistically precise outage cost estimates possible given the resources available for the research

The objective in sampling is to achieve the greatest statistical precision possible in the outage cost estimates for a given sample size. The procedure in sampling is a three-step approach.

First, the domain is defined. Here, the consumer population is defined in terms of consumer economic activity. It is here that the business code for consumer classification is very useful, because it allows for quick and precise classification of consumers.

Second, the preliminary data is analyzed. At this point it is desirable to employ a stratification scheme; this research relies heavily on the consumers’ business codes and, to a lesser extent, on tariff and consumption data. This is where the consumers for the minor strata are randomly selected and listed for future reference. Every effort was made to ensure proportionate random sampling of each minor stratum.

Third, the final sample is compiled. The final sample includes an overall sample size for each major stratum. It lists the respective particulars of the consumers that this research would like to sample.

At the onset of this research, it was clear that stratification into major strata would have to follow the domestic, commercial, and industrial grouping. The main reason is that TNB grouped its consumers in this manner and had therefore set different tariff rates for these three major strata. Electricity tariffs are a major component to consider when making outage cost estimations. However, to leave the stratification process at three major strata would yield a very inefficient sampling of the population. Table 4 shows the number of samples needed for each major stratum. Calculations were made for a 90% CI with r = 0.1. The domestic stratum yields a reasonable sample size consistent with surveys in other countries such as the US, UK, Canada, Australia, and New Zealand. The commercial and industrial strata, however, yield sample sizes that are disproportionately large compared to their respective population sizes. There were two causes for this problem.

Table 4: Number of Samples needed for each Major Stratum (90% CI, r = 0.1)

No   Category     Number of Samples (n)
1    Domestic     2048
2    Commercial   26962
3    Industrial   4487

Firstly, the data extraction from the TNB billing database yielded data for only about 5.2 million of the 6.6 million consumers TNB had in 2005. The remaining data could not be extracted because the extraction assumed that all consumers were billed once every calendar month; after many months of investigation this was found to be untrue. There were consistently about 1 million consumers whose meter reading spilled over into the next calendar month and who were thus not captured in the data extraction. However, by the time this was discovered, it was too late to perform another data extraction: the research had already progressed past the stratification stage and was in the data collection (sampling) stage. Secondly, the distribution of the consumer data was far from the normal (mesokurtic) curve and very widely spread. This resulted in the σ/µ ratio being very large (typically >1), which in Equation 37 produced very large sample sizes. By comparison, the US research by EPRI consistently used σ/µ ratios of <1.
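As a rough cross-check, Equation 37 can be inverted to show what σ/µ ratio each sample size in Table 4 implies at r = 0.1 and a 90% CI. This is a back-calculation for illustration only, assuming the table values were obtained directly from Equation 37; the variable names are hypothetical.

```python
import math
from statistics import NormalDist

R = 0.1                          # precision factor used in this study
Z = NormalDist().inv_cdf(0.95)   # about 1.645 for a two-sided 90% CI

# Sample sizes as listed in Table 4.
table_4 = {"Domestic": 2048, "Commercial": 26962, "Industrial": 4487}

# Rearranging Equation 37: sigma / mu = r * sqrt(n) / z
implied_ratio = {name: R * math.sqrt(n) / Z for name, n in table_4.items()}

for name, ratio in implied_ratio.items():
    print(f"{name}: implied sigma/mu of about {ratio:.2f}")
```

All three implied ratios come out well above 1, consistent with the observation that the wide spread of the consumption data drives the large sample sizes.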

4.4.1

Domestic Consumers

Business codes are split into four categories depending on the occupants’ total consumption and similarity in terms of building design. The four categories are listed below in Table 5.

Table 5: List of Domestic Business Codes

No   Minor Strata     Business Code
1    Kampong House    63253
2    Link House       63221, 63222
3    High Rise        63224, 63225, 63231 to 63238
4    Bungalow         63211, 63212, 63250 to 63252

For the purpose of stratification and sampling in this study, squatter houses [Business Code: 63240] and long house (temporary) [Business Code: 63241] are neglected because they have low power consumption and very low volume of data for sampling. Out of the total number of domestic consumers, the neglected business codes contribute less than 1 percent.


From these statistics it can be concluded that the stratification is done based on total consumption in terms of kWh and also volume of data. The number of samples to work on is then calculated as shown in Table 6, where Equation 37 is applied to find the estimated proportional value for this research.

Table 6: Number of Sample Selection for a Proportional Stratified Random Sample

No   Minor Strata     Number of Samples (n)
1    Kampong House    576
2    Link House       1077
3    High Rise        275
4    Bungalow         120
     Total            2048

Meanwhile, in a non-proportional stratified sample, the number of items chosen in each stratum is disproportional to the respective numbers in the population. Regardless of whether a proportional or non-proportional sampling procedure is used, every item in the population has a chance of being selected for the sample.

4.4.2

Commercial Consumers

In this research, TNB’s existing consumer list based on business code classification is adapted. Table 7 shows the business class and business code as mentioned in the TNB billing system.

Table 7: List of Commercial Business Codes

No.  Commercial Minor Strata           Business Code    Example
1    Accommodation                     63200 – 63206    Rest house, hotel, motel
2    Agriculture                       11111 – 11199    -
3    Communication                     72001 – 72009    Celcom, Digi, Maxis
4    Construction                      50011 – 50029    Plumbing, carpentry, contractor
5    Financial Institution             81011 – 81030    Banks, pawnshop
6    Insurance                         82001 – 82005    AIA, Great Eastern
7    Other Services                    95110 – 95990    Laundry, barber
8    Real Estate / Business Services   83101 – 83300    Law firm, advertising
9    Recreational                      94110 – 94901    Stadium, TGV
10   Retail                            62110 – 63100    Restaurant, café, 7-eleven
11   Shop House                        63223            -
12   Social Service                    93100 – 93400    Clinic, schools, kindergarten
13   Transport                         71110 – 71920    Taxi service, travel agency
14   Wholesale                         61110 – 61500    Mydin, Carrefour, Makro

In deciding among alternative capital investments and operating procedures that can affect electric power supply reliability and quality, it is important to take account of the interruption cost that consumers will experience as a result of an outage. In order to obtain such critical information from a consumer survey, appropriate and genuine consumers for each classification need to be considered. In order to determine the total sample size, required targets need to be set for each category. This is to ensure that the minimum number of samples is achieved for the analysis of the Value of Lost Load (VoLL). Based on the preliminary calculation of mean and standard deviation, it was observed that the standard deviation is larger than the mean; hence it was difficult to obtain a sample size using the same method as for domestic consumers. Therefore, a rather optimistic method was used to calculate the sample size for this category of consumers.

The first step was to categorise the list of consumers by business code classification as shown in Table 7. As the TNB database has over six million entries, sorting was done by writing a query in Microsoft Access to split the database. Once the entries were sorted, they were further stratified with reference to the tariffs, and a query was written to sort them into Tariff B, Tariff C and Tariff D. In the tariff classification listed below, TNB has classified Tariff D as an industrial tariff, but for the purpose of this research it was considered a cross set between commercial and industrial. Figure 17 illustrates the tariff classification.

i) Tariff B – Low Voltage Commercial Tariff
ii) Tariff C1 – Medium Voltage General Commercial Tariff
iii) Tariff C2 – Medium Voltage Peak/Off-Peak Commercial Tariff
iv) Tariff D – Low Voltage Industrial Tariff

The reason for this cross set to exist was due to the fact that consumers listed under the business code of commercial consumers were billed with Tariff D rates.


[Figure: Venn diagram with regions labelled Tariff A; Tariff B, C1, C2; Tariff D; Tariff E, E1, E2, E3; Tariff F, G, G1; Tariff F1, F2; Tariff H, H1, H2]

Figure 17: Venn diagram for Tariff Classification

In order to achieve high precision, a large number of consumers (sample size) needed to be considered. Since it would be time consuming and not feasible (see Section 4.4 for explanation) to obtain such a large sample size, an optimistic sample size was manually fixed for each minor stratum. In the case of commercial consumers, a size of five samples is fixed for each minor stratum. Hence, with 14 categories, the target sample size is 70 samples from Peninsular Malaysia. These samples comprise all three tariff types. Since Tariff B plays a major role in classification compared to the other two tariffs, Tariff B will have a larger portion of the samples during distribution. This would help maintain a proportionate random sampling process.

4.4.3

Industrial Consumers

In the case of industrial consumer stratification, four steps were taken to determine the samples:

i) Stratification variables or set of variables
ii) Boundaries between cells
iii) Allocation schemes
iv) Final sample

Figure 18: Industrial Stratification Method


The lists of industries and their classifications from MITI [52], PTM [53], FMM [54] and TNB [55] were obtained.

Figure 19: Industrial Stratification Process Flow

The lists of industries from the data provided by other agencies such as MITI, MIDA and PTM, with references from TNB, were consolidated. As a result, a list of 18 industries that contribute heavily to the growth of the country was compiled. This list is shown below:

i) Electrical and Electronics
ii) Paper, Printing And Publishing
iii) Chemicals And Petrochemicals
iv) Iron And Steel
v) Non-Ferrous Metals And Their Products
vi) Transport Equipment
vii) Food
viii) Wood And Wood Products
ix) Textile And Textile Products
x) Clay-Based And Other Non-Metallic Mineral
xi) Plastic
xii) Machinery And Machinery Components
xiii) Rubber
xiv) Beverage And Tobacco
xv) Furniture And Fixture
xvi) Palm And Palm Kernel Oil
xvii) Pharmaceutical
xviii) Miscellaneous

Figure 20: Formation of Boundaries between Cells

The break point between each minor stratum needs to be clarified at this point. For this purpose, we used the Dalenius–Hodges method to develop the sample cell boundaries. To split each stratum, stratification was based on the 12-month running peak consumption (kWh). Table 8 details the categorization that is used in this research.

Table 8: Boundaries between cells

No  Industry  Business Code  Range of Peak Consumption (kWh)
1   Small     31152          0 – 100000
2   Medium    35600          100001 – 1000000
3   Big       35300          1000000 – Infinity
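The dissertation does not show the boundary calculation itself, so the following is only a minimal sketch of the textbook Dalenius–Hodges cumulative square-root-of-frequency rule, not the author's actual computation. The function name, the number of histogram bins, and the sample consumption figures are all hypothetical.

```python
import math


def dalenius_hodges_boundaries(values, n_strata, n_bins=50):
    """Return n_strata - 1 cut points using the cumulative sqrt(f) rule.

    values   : stratification variable (e.g. 12-month peak kWh per consumer)
    n_strata : number of cells wanted (3 for Small / Medium / Big)
    n_bins   : number of equal-width histogram bins (an assumed tuning choice)
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no boundaries can be formed")
    width = (hi - lo) / n_bins

    # Frequency of each equal-width bin.
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        freq[idx] += 1

    # Running total of sqrt(frequency).
    cum = []
    total = 0.0
    for f in freq:
        total += math.sqrt(f)
        cum.append(total)

    # Place a boundary at the upper edge of the first bin whose cumulative
    # sqrt(f) reaches each equal share of the total.
    boundaries = []
    for k in range(1, n_strata):
        target = total * k / n_strata
        for i, c in enumerate(cum):
            if c >= target:
                boundaries.append(lo + (i + 1) * width)
                break
    return boundaries


# Hypothetical, strongly skewed peak-consumption data (kWh).
sample_kwh = [500 * (1.08 ** i) for i in range(120)]
print(dalenius_hodges_boundaries(sample_kwh, n_strata=3, n_bins=40))
```

In practice the computed cut points would usually be rounded to convenient values, as the round boundaries in Table 8 suggest.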

In this allocation scheme, random sampling needs to be implemented. In random sampling, there is a choice between the Neyman design and the proportional design. In Neyman sampling, the sample sizes are chosen proportional to the products of the standard deviations and the stratum sizes. For industrial consumers, Neyman sampling was chosen since it is better than proportionate sampling [56] when the variability differs widely between strata. The additional work to determine cell boundaries and to use Neyman sampling instead of proportional random sampling was necessary because industrial consumers span a large range, from those with small consumption to those with very large consumption. These additional steps, which are not used in the domestic and commercial strata, are necessary for the industrial strata to ensure that both large and small industrial consumers are fairly represented in the sample.
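The difference between the two designs can be sketched as follows. This is an illustration only: the cell sizes, standard deviations and total sample size are hypothetical and are not taken from the TNB database.

```python
def proportional_allocation(sizes, n_total):
    """Allocate n_total samples in proportion to stratum size N_h."""
    total = sum(sizes)
    return [n_total * n_h / total for n_h in sizes]


def neyman_allocation(sizes, std_devs, n_total):
    """Allocate n_total samples in proportion to N_h * S_h (Neyman design)."""
    weights = [n_h * s_h for n_h, s_h in zip(sizes, std_devs)]
    total = sum(weights)
    return [n_total * w / total for w in weights]


# Hypothetical cells: many small consumers, few very large ones.
cells = ["Small", "Medium", "Big"]
sizes = [900, 90, 10]                        # number of consumers per cell
std_devs = [20000.0, 200000.0, 2000000.0]    # kWh spread per cell

prop = proportional_allocation(sizes, 56)
neyman = neyman_allocation(sizes, std_devs, 56)
for name, p, n in zip(cells, prop, neyman):
    print(f"{name:<7} proportional={p:6.2f}  Neyman={n:6.2f}")
```

With these assumed figures, proportional allocation gives the Big cell less than one sample, while the Neyman design gives it the largest share because of its spread. The fractional results would still need to be rounded to whole consumers.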

After the cell boundaries are determined, the allocation scheme can proceed to determining the sample size given the precision factor r = 0.1. Figure 21 illustrates the complete process flow of the allocation process for determining the number of samples needed for industrial consumers.

Figure 21: Allocation scheme

From these calculations, it can be concluded that the stratification is done based on total consumption in terms of kWh and also volume of data. For reasons similar to those stated in Section 4.4.2, the number of samples is set at 3 samples per minor stratum, as shown in Table 9.

Table 9: Number of Sample Selection for a Proportional Stratified Random Sample

No  Minor Stratum                             Number of Consumers  Number of Samples (n)
1   Electrical & Electronic                   1,004                3
2   Paper, Printing and Publishing            1,188                3
3   Chemical & Petrochemical                  313                  3
4   Iron & Steel                              2,518                3
5   Nonferrous Metals                         823                  3
6   Transport Equipment                       1,118                3
7   Food                                      1,366                3
8   Wood & its products                       2,825                3
9   Textiles & its products                   2,663                3
10  Clay-based & other nonmetallic minerals   926                  3
11  Plastics                                  1,451                3
12  Rubber                                    790                  3
13  Beverage & Tobacco                        782                  3
14  Furniture & Fixtures                      0                    3
15  Palm & Palm Kernel Oil                    170                  3
16  Pharmaceutical                            155                  3
17  Machinery & Machinery Equipment           594                  3
18  Others                                    333                  3
    Total                                     26,689               54

The final list of samples can then be determined. However, for industrial consumers it was important to consider the geographical location of the consumer as certain industries could only be found at specific geographical locations. For example, the petrochemical industry would be concentrated near oil and gas fields, which are predominantly on the East Coast, especially in Kerteh, Terengganu. Figure 22 illustrates the overall process of determining the sample size and final sample list for industrial consumers.

Figure 22: Final sample

4.5 Normalization of Consumer Survey Data

Normalization is defined in relational database design as the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships [57].

Figure 23: The bell curve that shows the normal distribution from sample data

The normal distribution, or Gaussian distribution, describes the PDF of many large populations. It is often known as the bell curve because the shape of the plot resembles a bell.

For this study, we will assume that the PDF of TNB consumers is Gaussian distributed. Therefore, all survey data will be normalized against the Gaussian PDF plot, such as the one shown in Figure 23. Normalization is the process of removing statistical error in repeated measured data [58]. In this study, a survey sample is taken and evaluated to determine its VoLL. This VoLL value will be normalized against the mean kWh consumption of the respondent’s stratum. Therefore, the steps necessary to calculate the VoLL for a particular stratum are:

i) Calculate the mean kWh consumption for each stratum. This can be done by simply summing all elements and dividing that value by the number of elements. This is the same calculation as shown in Equation 39.

ii) Normalize each VoLL value obtained from survey sampling. To normalize a VoLL value, the following formula is used.

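$$\mathrm{VoLL}_{\text{stratum normalized}} = \mathrm{VoLL}_{\text{survey}} \times \frac{\mathrm{kWh}_{\text{stratum mean}}}{\mathrm{kWh}_{\text{survey}}} \qquad \text{(Equation 40)}$$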

This process is illustrated in Figure 24. The survey VoLL value may be for a respondent who consumes more (or less) than the mean kWh consumption for his stratum. Therefore, that value must be normalized with the stratum mean.

Figure 24: Illustrates the normalization process of a survey VoLL value (labels: Survey, e.g. 7 kWh; Mean, e.g. 5 kWh; Survey normalized)

iii) Calculate the mean VoLL value for each stratum. This step once again requires the use of Equation 39, this time to calculate the mean VoLL value:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \qquad \text{(Equation 39)}$$
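The three steps above can be sketched in a few lines. The function names and the survey figures are hypothetical; only the 5 kWh stratum mean and the 7 kWh respondent echo the example values in Figure 24.

```python
def mean(values):
    """Equation 39: arithmetic mean of a list of values."""
    return sum(values) / len(values)


def normalize_voll(voll_survey, kwh_survey, kwh_stratum_mean):
    """Equation 40: scale a survey VoLL to the stratum mean consumption."""
    if kwh_survey <= 0:
        raise ValueError("survey consumption must be positive")
    return voll_survey * kwh_stratum_mean / kwh_survey


# Step i: stratum mean consumption (hypothetical stratum of five consumers).
stratum_kwh = [3.0, 4.0, 5.0, 6.0, 7.0]
kwh_mean = mean(stratum_kwh)                      # 5.0 kWh

# Step ii: normalize each surveyed VoLL (hypothetical survey results).
survey = [(14.0, 7.0), (9.0, 3.0)]                # (VoLL, respondent kWh)
normalized = [normalize_voll(v, kwh, kwh_mean) for v, kwh in survey]

# Step iii: mean normalized VoLL for the stratum.
print(normalized, mean(normalized))
```

A respondent who consumes more than the stratum mean has the VoLL scaled down, and one who consumes less has it scaled up.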

4.6 Preprocessing of Sample Data

Analysis of the consumer data suggests that the consumption of commercial and industrial consumers is skewed. This is because a minority of commercial and industrial consumers have a much higher consumption than the average of their respective categories. Hence, the number of samples needed from commercial and industrial consumers, as calculated by Equation 37, is much higher than the number needed from domestic consumers, even though there are far fewer commercial and industrial consumers than domestic consumers. To solve this problem, the consumer database was pre-processed to exclude outliers and extreme values. Outliers and extreme values are discussed further in Section 3.6 and Section 3.7 respectively. The following sections discuss domestic, commercial, and industrial consumer data in greater detail.

4.6.1 Domestic Consumer Data

Table 10: Domestic Statistics Range

Type       Mean (kWh) (µ)  +90% (µ*1.90)  -90% (µ*0.10)
Kampong    181.8853        345.5821       18.18853
Terrace    307.9136        585.0358       30.79136
High-rise  234.0532        444.7011       23.40532
Bungalow   471.5546        895.9536       47.15546

Statistical parameters for domestic precision factor and number of samples calculation:

Mean (µ): 198.360642
Standard Deviation (σ): 120.670647
Number of samples (n): 2015
Standard Normal Variate (Z): 1.645 (for a 90% Confidence Interval)
From Equation 37, Precision (r): 0.0223

Before obtaining the precision factor, r, the mean consumption of each domestic stratum must be multiplied by factors of 1.90 and 0.10 to obtain cutoff points at +90% and –90% of the mean. Table 10 shows the high and low cutoff points for each domestic stratum. The data falling between 0.0 – 0.1 and 1.9 – 2.0 of the mean must then be cleared to complete the normalization process. This yields a normalization range of 0.1 – 1.9 of mean consumption. Figure 25 shows the boxplot representation of the preprocessed data.
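The cutoff step can be illustrated with a small filter. This is a sketch that assumes the stratum mean is fixed beforehand (here the Kampong mean from Table 10) and that readings outside the 0.1 to 1.9 band are simply dropped; the function names and the individual readings are hypothetical.

```python
def cutoff_points(mean_kwh, low_factor=0.10, high_factor=1.90):
    """Return the (low, high) cutoffs at -90% and +90% of the stratum mean."""
    return mean_kwh * low_factor, mean_kwh * high_factor


def trim_to_band(consumptions, mean_kwh):
    """Keep only readings inside the 0.1 - 1.9 band around the stratum mean."""
    low, high = cutoff_points(mean_kwh)
    return [c for c in consumptions if low <= c <= high]


kampong_mean = 181.8853                            # kWh, from Table 10
low, high = cutoff_points(kampong_mean)
print(round(low, 5), round(high, 4))

readings = [5.0, 20.0, 181.9, 340.0, 400.0]        # hypothetical kWh readings
print(trim_to_band(readings, kampong_mean))
```

The same two functions apply unchanged to the commercial and industrial strata, using the means listed in their respective tables.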

Figure 25: SPSS Boxplot of Domestic Consumer Population (CI = 90%)

The detailed SPSS output is appended in Appendix 10 – SPSS Analysis of Domestic Consumer Samples.

4.6.2 Commercial Consumer Data

Statistical parameters for commercial precision factor and number of samples calculation:

Mean (µ): 699.7677
Standard Deviation (σ): 1042.4832
Number of samples (n): 74
Standard Normal Variate (Z): 1.645 (for a 90% Confidence Interval)
From Equation 37, Precision (r): 0.2840

Table 11: Commercial Statistics Range

Type                             Mean (kWh) (µ)  +90% (µ*1.90)  -90% (µ*0.10)
Accommodation                    3336.6493       6339.63        333.66
Agriculture                      4968.7967       9440.71        496.88
Communication                    4427.6905       8412.61        442.77
Construction                     2258.4385       4291.03        225.84
Financial Institution            5976.8402       11355.9964     597.684
Insurance                        1800.8203       3421.5586      180.08
Real Estate / Business Services  1386.8678       2635.04        138.67
Recreation                       34293.395       65157.45       3429.34
Residential Shop House           716.94182       1362.19        71.69
Retail                           1165.1658       2213.81        116.52
Social Service                   2449.9959       4654.99        244.99
Transport                        3407.0479       6473.39        340.704
Wholesale                        2143.717        4073.06        214.37
Others                           934.29393       1775.15        93.43

Before obtaining the precision factor, r, the mean consumption of each commercial stratum must be multiplied by factors of 1.90 and 0.10 to obtain cutoff points at +90% and –90% of the mean. Table 11 shows the high and low cutoff points for each commercial stratum. The data falling between 0.0 – 0.1 and 1.9 – 2.0 of the mean must then be cleared to complete the normalization process. This yields a normalization range of 0.1 – 1.9 of mean consumption. Figure 26 shows the boxplot representation of the preprocessed data.

Figure 26: SPSS Boxplot of Commercial Consumer Population (CI = 90%)

The detailed SPSS output is appended in Appendix 11 – SPSS Analysis of Commercial Consumer Samples.

4.6.3 Industrial Consumer Data

Statistical parameters for industrial precision factor and number of samples calculation:

Mean (µ): 10540.72
Standard Deviation (σ): 12852.91
Number of samples (n): 63
Standard Normal Variate (Z): 1.645 (for a 90% Confidence Interval)
From Equation 37, Precision (r): 0.252713
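Equation 37 is not reproduced in this section. The sketch below assumes it has the common relative-precision form r = Zσ / (µ√n), which rearranges to n = (Zσ / (rµ))². Under that assumption the code gives about 0.0223 for the domestic stratum and about 0.2527 for the industrial stratum, in line with the values reported above; for the commercial stratum it gives about 0.285, slightly above the reported 0.2840. The function names are hypothetical.

```python
import math

Z_90 = 1.645  # standard normal variate for a 90% confidence interval


def precision(mean, std_dev, n, z=Z_90):
    """Assumed form of Equation 37: r = Z * sigma / (mu * sqrt(n))."""
    return z * std_dev / (mean * math.sqrt(n))


def required_samples(mean, std_dev, r, z=Z_90):
    """Same relation solved for n, rounded up to a whole consumer."""
    return math.ceil((z * std_dev / (r * mean)) ** 2)


# (mean, standard deviation, n) as reported in Sections 4.6.1 to 4.6.3.
strata = {
    "domestic": (198.360642, 120.670647, 2015),
    "commercial": (699.7677, 1042.4832, 74),
    "industrial": (10540.72, 12852.91, 63),
}

for name, (mu, sigma, n) in strata.items():
    r = precision(mu, sigma, n)
    n_needed = required_samples(mu, sigma, r=0.1)
    print(f"{name:<11} r={r:.4f}  n needed for r=0.1: {n_needed}")
```

With the reported parameters, this relation asks for roughly 100 domestic samples but roughly 600 commercial and 400 industrial samples to reach r = 0.1, which illustrates why the skewed strata are the expensive ones to survey.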

Table 12: Industrial Statistics Range

Type             Mean (kWh)  +90% (*1.90)  -90% (*0.1)
E&E              36477.009   69306.3171    3647.7009
Paper, Printing  15966.551   30336.4469    1596.6551
Chemical         28071.11    30254.5607    1592.3453
Iron & Steel     12858.318   24430.8042    1285.8318
N. Metal         4442.61     8440.959      444.261
Transport Equip  6801.7306   12923.28814   680.17306
Food             21465.127   40783.7413    2146.5127
Wood & Prod      19312.481   36693.7139    1931.271
Textile          6813.8849   12946.38131   681.38849
Clay-based       18187.995   34557.1905    1818.7995
Plastics         43474.151   82600.8869    4347.4151
Rubber           32501.994   61753.7886    3250.1994
Beverage & T     15522.685   29493.1015    1552.2685
Furniture        17273.1     32818.89      3281.889
Palm Oil         18052.81    34300.339     1805.281
Pharmaceutical   17948.473   34102.0987    1794.8473
Machinery        16415.808   31190.0352    1641.5808
Others           11154.974   21194.4506    1115.4974

Before obtaining the precision factor, r, the mean consumption of each industrial stratum must be multiplied by factors of 1.90 and 0.10 to obtain cutoff points at +90% and –90% of the mean. Table 12 shows the high and low cutoff points for each industrial stratum. The data falling between 0.0 – 0.1 and 1.9 – 2.0 of the mean must then be cleared to complete the normalization process. This yields a normalization range of 0.1 – 1.9 of mean consumption. Figure 27 shows the boxplot representation of the preprocessed data.

Figure 27: SPSS Boxplot of Industrial Consumer Population (CI = 90%)

The detailed SPSS output is appended in Appendix 12 – SPSS Analysis of Industrial Consumer Samples

CHAPTER 5 CONCLUSION

This VoLL research was carried out with the objective of developing a formulation which enables the calculation of the composite VoLL for Peninsular Malaysia. To accomplish this, it was necessary first to group or stratify electricity consumers based on their consumption, second to determine each stratum’s weight, and third to determine the individual outage cost of consumers in the identified strata. The stratification of consumers was done by taking into consideration stratification data from organizations such as FMM, MITI, MIDA, and PTM. TNB’s tariff structure and business codes were also used in the stratification process. The finalized stratification comprises 3 major strata: Domestic, Commercial, and Industrial. There are 4 minor domestic strata, 14 minor commercial strata, and 18 minor industrial strata. The number of minor strata depended on the diversity of activity, processes, and consumption within the respective major stratum. To calculate the sample size, it was necessary to eliminate outliers and extreme values to increase precision. Next, the sample size of each major stratum was calculated in accordance with Equation 37, for r = 0.1 at a CI of 90%. However, for the commercial and industrial strata, this sample size proved impractical and had to be fixed at 5 and 3 samples per minor stratum respectively. The weight of each stratum was determined based on the stratum’s consumption versus the total consumption of all strata. Statistical analysis by SPSS showed that the stratification was adequate to provide the level of precision (r = 0.1, CI = 90%) that was required by this research.
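The weighting described above can be sketched as follows. The composite formula is not given in this chapter, so the sketch assumes a simple consumption-weighted sum of the stratum VoLL values; every number below is hypothetical and the function names are illustrative only.

```python
def stratum_weights(consumption_kwh):
    """Weight of each stratum = its consumption / total consumption."""
    total = sum(consumption_kwh.values())
    return {name: kwh / total for name, kwh in consumption_kwh.items()}


def composite_voll(weights, voll_by_stratum):
    """Assumed composite: sum of weight * VoLL over all strata."""
    return sum(weights[name] * voll_by_stratum[name] for name in weights)


# Hypothetical consumption shares (GWh) and stratum VoLL values (RM/kWh).
consumption = {"Domestic": 20.0, "Commercial": 30.0, "Industrial": 50.0}
voll = {"Domestic": 5.0, "Commercial": 20.0, "Industrial": 30.0}

weights = stratum_weights(consumption)
print(weights)
print(composite_voll(weights, voll))
```

The same two functions could be applied one level down, weighting the minor strata inside each major stratum before the major strata are combined.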

The VoLL research project concluded that the VoLL for the Malaysian ESI is RM22.72/kWh interrupted [59] for a one-hour outage. However, this research noted that the VoLL should be lower; the estimate was constrained by the limited time and funds available to gather more data. Therefore, one more data extraction should be done to capture all consumers in TNB’s billing database, and more samples should be collected, especially for the commercial and industrial strata. While calculating the VoLL can be a tremendous effort, once calculated, it can be used in a variety of techno-economic analyses. The VoLL can be used to determine the optimal reliability and outage cost of a system. Figure 28 illustrates the reliability cost borne by the utility and the reliability worth as perceived by the consumer with reference to the system reliability. By developing the reliability cost and reliability worth curves, we can quantify the reliability cost and outage cost. Summing the two curves, we obtain the total cost curve.

Figure 28: Outage Cost vs. Reliability Curve (vertical axis: Outage Cost; horizontal axis: Reliability; annotations: 1. Reliability Cost for TNB (RC, blue); 2. Present Reliability Level; 3. Reliability Worth to Consumer (RW, black); 4. Total Cost (TC, red) = Reliability Cost + Reliability Worth; 5. Optimal Reliability and Cost)

The optimal reliability and cost point is at the minimum of the total cost curve. Operating the system at this point allows optimal investment of resources by the utility while at the same time providing an acceptable level of reliability to the consumers. Alternatively, the same techno-economic analysis can be performed on a specific geographical area.
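Locating the minimum of the total cost curve can be sketched numerically. The reliability levels and both cost curves below are hypothetical points chosen only to show the shape of the calculation; they are not results from this research.

```python
def optimal_reliability(levels, reliability_cost, outage_cost):
    """Return (level, total cost) at the minimum of the summed cost curves."""
    totals = [rc + oc for rc, oc in zip(reliability_cost, outage_cost)]
    best = min(range(len(levels)), key=totals.__getitem__)
    return levels[best], totals[best]


# Hypothetical curves: utility cost rises with reliability,
# consumer outage cost falls with reliability.
levels = [0.90, 0.95, 0.99, 0.999]
reliability_cost = [10, 20, 45, 120]   # cost borne by the utility
outage_cost = [100, 50, 12, 2]         # cost borne by consumers (from VoLL)

print(optimal_reliability(levels, reliability_cost, outage_cost))
```

In a real study the outage cost curve would come from the VoLL multiplied by the expected energy not supplied at each reliability level.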

REFERENCES

[1] H.G. Stoll, “Least-Cost Electric Utility Planning”, John Wiley & Sons, 1989, pp 167, ISBN 0-47163614-2.
[2] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 1-1.
[3] “Optimization - Wikipedia, the free encyclopedia”, http://en.wikipedia.org/wiki/Optimization, 27 October 2005.
[4] “Transmission Network Planning Manual”, Tenaga Nasional Network Planning Division, April 1998, pp 2-1 – 2-8.
[5] Scientific Advisory Panel (SAP), US EPA, Policy for Review of Monte Carlo Analyses for Dietary and Residential Exposure Scenarios Meeting, “Attachment 2: Probabilistic Risk Assessments and Monte-Carlo Methods: A Brief Introduction”, March 1998, pp A2-3, Available at http://www.epa.gov/scipoly/sap/meetings/1998/march/attach2.pdf, Retrieved 21 June 2007.
[6] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 6-1.
[7] A.H. Hashim, “MEE Course Notes”, Universiti Tenaga Nasional, 2005.
[8] H.G. Stoll, “Least-Cost Electric Utility Planning”, John Wiley & Sons, 1989, pp 363 – 365.
[9] R. Billinton & R.N. Allan, “Reliability Evaluation of Power Systems, 2nd Ed”, Plenum Press, New York and London, 1996, pp 302 – 326 and pp 443 – 473, ISBN 0-306-45259-6.
[10] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 1-4.
[11] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 8-13.
[12] K.K. Kariuki & R.N. Allan, “Evaluation of Reliability Worth and Value of Lost Load”, IEE Proc.-Generation, Transmission and Distribution, Vol. 143, No. 2, March 1996, pp 176 – 177.
[13] Energy Research Institute, Chulalongkorn University, Thailand, “Electricity Outage Cost Study”, Available at http://www.eppo.go.th/power/ERI-study-E/ERI-ExeSummary-E.html, 19 November 2005.
[14] Tenaga Nasional Berhad Annual Report, 2005, pp 5.
[15] Tenaga Nasional Berhad Tariff Book, 2006, pp 1.
[16] Tenaga Nasional Berhad Annual Report, 2005, pp 62.
[17] A.S. Hornby, “Oxford Advanced Learner’s Dictionary, 4th Edition”, Oxford University Press, 1989, pp 230.
[18] S.M. Stigler, “The History of Statistics: The Measurement of Uncertainty before 1900”, Cambridge, MA, and London, Belknap Press of Harvard University Press, 1990, ISBN 0-674-40341-X.
[19] http://en.wikipedia.org/wiki/Statistics, Retrieved 21 September 2006.

[20] Inspired by Figure 4.3 of A.W. Ward & M. Murray-Ward, “Assessment in the Classroom”, Wadsworth Pub Co, Belmont, CA, United States, 1999, pp 74, ISBN 0-53-452704-3.
[21] http://dictionary.reference.com/browse/statistics, Retrieved 16 October 2006.
[22] The American Heritage® Dictionary, Fourth Edition, “Statistics”, ISBN 0-39-544895-6, Available at http://education.yahoo.com/reference/dictionary/entry/statistics, Retrieved 16 October 2006.
[23] M. Foucault & R. Hurley, “The Will to Knowledge”, Penguin Books Ltd, 1977, ISBN 0-14026868-5.
[24] http://en.wikipedia.org/wiki/Statistics, Retrieved 21 September 2006.
[25] I. Hacking, “The Emergence of Probability: A Philosophical Study of Early Ideas About Probability, Induction and Statistical Inference, 2nd Ed”, Cambridge University Press, 1984, ISBN 0-52-131803-3.
[26] K.L. Wuensch, Department of Psychology, East Carolina University, “When does correlation imply causation?”
[27] http://en.wikipedia.org/wiki/Experimental_design, Retrieved 7 November 2006.
[28] S.M. Jex, “Organizational Psychology: A Scientist Practitioner Approach”, John Wiley & Sons, New York, 2002, ISBN 0-47-137420-2.
[29] J. Best, “Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists”, University of California Press, 2001.
[30] D. Huff, “How to Lie with Statistics”, Penguin Books Ltd, 1991, ISBN 0-14-013629-0.
[31] R.P. Abelson, “Statistics As Principled Argument”, LEA, Inc, 1995, ISBN 0-80-580528-1.
[32] http://dictionary.laborlawtalk.com/Normal_distribution, Retrieved 12 November 2006.
[33] J.H. Zar, “Biostatistical Analysis”, Prentice Hall International, New Jersey, 1984, pp 43 – 45, ISBN 0-13-100846-3.
[34] J.K. Taylor, “Statistical Techniques for Data Analysis”, Lewis Publishers, New York, 1990, pp 68.
[35] J.W. Wittwer, “Monte Carlo Simulation in Excel: A Practical Guide”, http://vertex42.com/ExcelArticles/mc/, Retrieved June 1, 2004.
[36] L.D. Brown, T. Tony Cai, A. DasGupta, “Interval Estimation for a Binomial Proportion”, Statistical Science, Volume 16, Number 2 (May, 2001), pp 101 – 117.
[37] J.D. Petruccelli, B. Nandram, M. Chen, “Applied Statistics for Engineers and Scientists, 1st Ed”, Prentice Hall, New Jersey, 1999, pp 58 – 59, ISBN 0-13-565953-1.
[38] D.S. Moore & G.P. McCabe, “Introduction to the Practice of Statistics, 3rd Ed”, W. H. Freeman, New York, 1999.
[39] J.S. Milton & J.C. Arnold, “Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences, 3rd Ed”, McGraw-Hill Companies, 1994, pp 204 – 205, ISBN 0-07-042623-6.
[40] John Renze, “Outlier”, from MathWorld – A Wolfram Web Resource, created by Eric W. Weisstein, Available at http://mathworld.wolfram.com/Outlier.html, Retrieved 7 November 2006.

[41] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 6-2.
[42] http://www.statsoft.com/textbook/sttable.html#z, Retrieved 11 November 2006.
[43] Tenaga Nasional Berhad Annual Report, 2005, pp 5.
[44] http://en.wikipedia.org/wiki/Deterministic_computation, Retrieved 15 August 2006.
[45] Andrew F. Siegel, “Practical Business Statistics, 4th Edition”, Irwin McGraw-Hill, 2000, pp 279 – 282.
[46] http://en.wikipedia.org/wiki/Stratified_sampling, Retrieved 14 August 2006.
[47] http://www.dcs.napier.ac.uk/peas/sratheory.htm#tion, Retrieved 14 August 2006.
[48] http://en.wikipedia.org/wiki/Sampling_%28statistics%29, Retrieved 11 August 2006.
[49] Makerere University Institute of Statistics & Applied Economics (ISAE).
[50] http://www.statpac.com/surveys/sampling.html, Retrieved 14 August 2006.
[51] M.J. Sullivan & D.M. Keane, “Outage Cost Estimation Guidebook”, EPRI Research Project 287804 Final Report, December 1995, pp 4-5.
[52] “List of Promoted Activities and Products”, Ministry of International Trade and Industry (MITI), 2005.
[53] Energy Intensity, pp 34, A Report on Energy Efficiency in Malaysian Industries: Economic Analysis and Recommendations for an Institutional Framework, Final Report, November 1994, Energy Conservation Study (ADB-TA No 1574-MAL), Ministry of Energy, Telecommunication and Post.
[54] Federation of Malaysian Manufacturers (FMM) Directory 2005, Federation of Malaysian Manufacturers, 2005.
[55] TNBD Consumer Database, printed on 21 February 2006.
[56] T.W. Anderson & Stanley L. Sclove, “Statistical Analysis of Data, 2nd Ed”, Scientific Press, Palo Alto, Cal., 1986, pp 580 – 581.
[57] http://www.webopedia.com/TERM/n/normalization.html, Retrieved 10 August 2006.
[58] Douglas A. Lind, Robert D. Mason, William G. Marchal, “Basic Statistics for Business and Economics, 3rd Edition”, Irwin McGraw-Hill, 2000, pp 197 – 211.
[59] TNB Research Sdn Bhd & Power Engineering Center, Universiti Tenaga Nasional, “Determining the Value of Lost Load in the Malaysian Electricity Supply Industry”, November 2006.

Appendix

APPENDIX 1 – TNB BUSINESS CODES

Business Code  Description                               Type of consumer
11111          Agriculture:Padi                          COM
11112          Agriculture:Tobacco                       COM
11113          Agriculture:Tapioca                       COM
11114          Agriculture:Sugar-cane                    COM
11115          Agriculture:Market gardening              COM
11119          Agriculture:Other crops n.e.c             COM
11120          Agriculture services                      COM
11121          Agriculture:Rubber estate                 COM
11122          Agriculture:Rubber smallholdings          COM
11123          Agriculture:Oil palm estate               COM
11124          Agriculture:Oil palm smallholdings        COM
11125          Agriculture:Coconut estate                COM
11126          Agriculture:Coconut smallholdings         COM
11127          Agriculture:Tea estate                    COM
11128          Agriculture:Coffee estate                 COM
11129          Agriculture:Cocoa                         COM
11131          Agriculture:Pepper                        COM
11132          Agriculture:Pineapple                     COM
11133          Agriculture:Banana                        COM
11134          Agriculture:Other fruits growing          COM
11139          Agriculture:Other permanent crops         COM
11190          Agri-Livestock:Other n.e.c. 11119         COM
11191          Agri-Livestock:Pig rearing                COM
11192          Agri-Livestock:Cattle & dairy             COM
11193          Agri-Livestock:Poultry & hatching         COM
11199          Agri-Livestock:Other n.e.c.               COM
11300          Hunting, Trapping & Games propagation
12101          Forestry:Coll’n of attap + other
12102          Forestry:Charcoal burning
12109          Forestry:Other forestry industries
12200          Logging
13010          Fishing:Ocean & Coastal Fishing
13021          Fishing:Inland fishing
13029          Fishing:Not elsewhere classified
14001          Public Lighting:Telephone kiosk
14002          Public Lighting:Decorative lighting
14003          Public Lighting:Street lighting
14004          Public Lighting:Bus stand
14005          Public Lighting:Signboard;Advert
14009          Public Lighting:Not elsewhere classified
14010          Public Lighting:Village street light
21000          Mining:Coal

22000          Crude Petroleum & Natural gas prod.       IND
23010          Mining:Iron Ore
23021          Mining:Tin Dredging
23022          Mining:Tin mining &other dredging
23023          Mining:Dulang washing
23024          Mining:Amang treatment
23025          Mining:Bauxite
23026          Mining:Gold
23027          Mining:Copper
23028          Mining:antimony
23029          Mining:Non-ferrous & metal ore nec
29011          Mining:Limestone quarrying
29012          Mining:Other stone quarrying
29013          Mining:Clay,sand&gravel pits
29021          Mining:Guano gathering
29029          Mining:Other chemical,fertiliser
29030          Mining:Salt
29090          Mining&quarrying:Others
31110          Manufac:Meat preparation                  IND
31121          Manufac:Dairy prod. Ice cream             IND
31129          Manufac:Dairy prod. Others                IND
31131          Manufac:Canning pineapple                 IND
31139          Manufac:Canning fruits,vegetables         IND
31140          Manufac: Canning fish, crustacea          IND
31151          Manufac:Coconut oil                       IND
31152          Manufac:Palm oil                          IND
31153          Manufac:Palm kernel oil                   IND
31159          Manufac:Other vege. & animal oils         IND
31161          Manufac:Small rice mills                  IND
31162          Manufac:Large rice mills                  IND
31163          Manufac:Flour mills                       IND
31164          Manufac:Sago & tapioca factories          IND
31169          Manufac:Other grain milling               IND
31171          Manufac:Biscuit factories                 IND
31172          Manufac:Bakeries                          IND
31180          Manufac:Sugar factories&refine            IND
31190          Manufac:Cocoa,choc. & sugar conf          IND
31211          Manufac:Ice factories                     IND
31212          Manufac:Coffee factories                  IND
31213          Manufac:Tea factories                     IND
31214          Manufac:Meehoon,noodles,&related          IND
31215          Manufac:Spices&curry powder               IND
31216          Manufac:Starch                            IND
31219          Manufac:Other food products               IND
31220          Manufac:Prepared animal feeds             IND
31310          Manufac:Distilling,blend’g spirit         IND

31320          Manufac:Wine industries                   IND
31330          Manufac:Malt liquors & malt               IND
31340          Manufac:Soft drinks&Carbonated            IND
31400          Manufac:Tobacco                           IND
32111          Manufac:Natural fibre spinning            IND
32112          Manufac:Non-batek Dyeing,bleach           IND
32113          Manufac:Handicraft spin’g,weav’g          IND
32114          Manufac:Batek making                      IND
32115          Manufac:Synthetic textile mills           IND
32119          Manufac:Misc primary textiles             IND
32120          Manufac:Non-wearing apparel               IND
32190          Manufac:Textile (material)                IND
32201          Manufac:Clothing                          IND
32202          Manufac:Custom tailoring                  IND
32209          Manufac:Misc wearing apparel              IND
32310          Manufac:Tanneries & leather fin’g         IND
32320          Manufac:fur dressing &dyeing ind          IND
32330          Manufac:leather prod. Excp. F.w/w.a       IND
32400          Manufac:Footwear excp Vul,Rub,Pla         IND
33110          Manufac:Wood(33119)                       IND
33111          Manufac:Sawmills                          IND
33112          Manufac:Plywood,hardb’d & particl         IND
33113          Manufac:Planing,window,door,join          IND
33114          Manufac:Prefab. Wooden house              IND
33119          Manufac:Other wood products               IND
33120          Manufac:Wooden & cane container           IND
33190          Manufac:Wood&cork products                IND
33200          Manufac:Non-metal furnit/fixture          IND
34100          Manufac:Paper                             IND
34110          Manufac:Pulp,paper&paperboard             IND
34120          Manufac:Container,paper boxes             IND
34190          Manufac:Other pulp,paper,paperb’d         IND
34200          Manufac:Printing,Publishing,Allied        IND
35111          Manufac:Industrial gases                  IND
35119          Manufac:Other industrial chemical         IND
35120          Manufac:Fertilizers & pesticides          IND
35130          Manufac:Synt.resins,plastic,fibr          IND
35210          Manufac:Paints,varnishes,lacquers         IND
35220          Manufac:Drugs & medicine                  IND
35230          Manufac:Perfumes,cosmetics (35239)        IND
35231          Manufac:Soap,cleaning preparation         IND
35238          Manufac:Incense / Joss sticks             IND
35239          Manufac:Perfumes,cosmetics,toilet         IND
35290          Manufac:Other chemical products           IND
35300          Manufac:Petroleum refineries              IND
35400          Manufac:Misc.petroleum/local prod         IND

35510          Manufac:Tyre & tube industries            IND
35590          Manufac:Rubber products (35599)           IND
35591          Manufac:Rubber remilling, latex           IND
35599          Manufac:Other rubber products             IND
35600          Manufac:Other plastic products            IND
36100          Manufac:Pottery,China,earthware           IND
36200          Manufac:Glass, glass product              IND
36910          Manufac:Structural clay products          IND
36911          Manufac:Bricks                            IND
36920          Manufac:Cement (36991)                    IND
36921          Manufac:Hydraulic cement                  IND
36922          Manufac:Lime and plaster                  IND
36991          Manufac:Cement&concrete product           IND
36992          Manufac:Cut-stone&stone product           IND
26993          Manufac:Marble slabs                      IND
36999          Manufac:Other non-metallic mineral        IND
37101          Manufac:Primary iron & steel basic        IND
37102          Manufac:Foundries                         IND
37109          Manufac:Other iron&steel                  IND
37201          Manufac:Tin smelting                      IND
37202          Manufac:Tin recycling                     IND
37209          Manufac:Other non-ferrous ind             IND
38111          Manufac:Cutlery,handtools,h/w             IND
38112          Manufac:Tinsmithing/Blacksmithing         IND
38120          Manufac:Metal furniture/fixture           IND
38130          Manufac:Metal structural products         IND
38191          Manufac:Cans &metal boxes                 IND
38192          Manufac:Wire&wire products                IND
38193          Manufac:Brass,copper,pewter,alumi         IND
38199          Manufac:Other fabricated metal p.         IND
38210          Manufac:Engines & turbines                IND
38220          Manufac:Agri. Machineries/equip’t         IND
38230          Manufac:Metal&wood working mach           IND
38240          Manufac:Spec. ind. Mach.excp.metal        IND
38250          Manufac:Office,computing,accounting       IND
38291          Manufac:Refrig.exhaust,air-cond m         IND
38299          Manufac:Other machinery equipm’t          IND
38310          Manufac:Elec.ind.mach. & apparatus        IND
38321          Manufac:Radio,TV related equipm’t         IND
38322          Manufac:Gramophone rec.,tapes             IND
38329          Manufac:Semi-cond.,commu. Equipm’t        IND
38330          Manufac:Elec.appliances/housware          IND
38390          Manufac:Battery,lamps,etc                 IND
38391          Manufac:Cables&wires                      IND
38392          Manufac:Dry cells & storage batteries     IND
38393          Manufac:Electric lamps &tubes             IND

38399          Manufac:Misc. elec. Apparatus/supp        IND
38410          Manufac:Shipbuilding&repairing            IND
38420          Manufac:Railroad equipment                IND
38431          Manufac:Motor vehicle bodies              IND
38432          Manufac:Motor vehicles, assembly          IND
38439          Manufac:Motor veh, parts/accessor         IND
38441          Manufac:motor cycle, assembly             IND
38449          Manufac:Bi-,tri-cycles, trishaws          IND
38450          Manufac:Aircraft                          IND
38490          Manufac:Other transport equipm’t          IND
38510          Manufac:Prof. & scient. Equip’t           IND
38520          Manufac:Photographic/optical good         IND
38530          Manufac:Watches and clocks                IND
39010          Manufac:Jewellery & related art           IND
39020          Manufac:Musical instruments               IND
39030          Manufac:Sporting&athletic goods           IND
39091          Manufac:Brooms, brushes and mops          IND
39092          Manufac:pens,pencils,office supp.         IND
39093          Manufac:toys                              IND
39094          Manufac:Umbrella                          IND
39099          Manufac:Other manufacturing ind           IND
41010          Utility:Electric light &power
41020          Utility:Gas manufacture & distr
41030          Utility:Steam &hot water supply
41040          Utility:Telecomm exchange
42000          Utility:Water works &supply
42001          Utility:Water reservoir
42002          Utility:Water treatment plant
43000          utility:TNB sub station
50011          Construc:Residential construs             COM
50012          Construc:Non-residential construst’n      COM
50013          Construc:Civil engineering constr         COM
50021          Construc:Metal work contractors           COM
50022          Construc:Electrical contractor            COM
50023          Construc:Plumbing,sewage,sanitary         COM
50024          Construc:Air-cond, fridge contractor      COM
50025          Construc:Bricklaying contractor           COM
50026          Construc:Painting contractor              COM
50027          Construc:Carpentry contractor             COM
50028          Construc:Cement,concrete work cont        COM
50029          Construc:Special trade contractors        COM
61110          Wholesale:Trade:Meat,poultry              COM
61120          Wholesale:Fish-fresh/frozen/dried         COM
61130          Wholesale:Fruits and vegetables           COM
61140          Wholesale:Confectionery sweets etc        COM
61150          Wholesale:Bakery products                 COM

61160

Wholesale:Rice,other grains

COM

61170

Wholesale:Other food stuffs

COM

61180

Wholesale:Tobacco products

COM

61190

Wholesale:Alcoholic beverages

COM

61211

Wholesale:Household hardware

COM

61212

Wholesale:Household goods/applian

COM

61213

Wholesale:Furniture,furnishing

COM

61214

Wholesale:Clothing,textiles,etc

COM

61215

Wholesale:Footwear

COM

61216

Wholesale:Chemists’ goods, cosmetics

COM

61217

Wholesale:Book,stationery, magazine

COM

61218

Wholesale:Jewellery,watches,silver

COM

61219

Wholesale:Bicycles&parts thereof

COM

61221

Wholesale:Photographic equip.

COM

61222

Wholesale:Other household/per. Itm

COM

61310

Wholesale:Motorcycles & parts

COM

61321

Wholesale:Passenger cars-New

COM

61329

Wholesale:Industrials,comm.veh-New

COM

61331

Wholesale:Passenger-Used

COM

61339

Wholesale:Industrials,comm.veh-Used

COM

61340

Wholesale:Motor parts & accessory

COM

61390

Wholesale:Petrol,lubricating oils

COM

61410

Wholesale:Tractors,farming eqp&prt

COM

61420

Wholesale:Business&Indust. Eqp&prt

COM

61430

Wholesale:Lumber & timber

COM

61440

Wholesale:Other building materials

COM

61450

Wholesale:Rubber

COM

61460

Wholesale:palm oil

COM

61470

Wholesale:Livestock

COM

61480

Wholesale:Other agricultural prod.

COM

61490

Wholesale:Other(e.g. mineral)

COM

61500

Wholesale:Large general wholesaler

COM

62110

Retail:Meat&poultry (fresh/frozen)

COM

62120

Retail:Fish-fresh/frozen/dried

COM

62130

Retail:Fruits & vegetables

COM

62140

Retail:Confectionery (sweets etc)

COM

62150

Retail:Bakery products

COM

62160

Retail:Rice,other grains,flour

COM

62170

Retail:Other food stuffs

COM

62180

Retail:Provision store

COM

62190

Retail:Super market

COM

62210

Retail:Tobacco products

COM

62220

Retail:Alcoholic beverages

COM

62311

Retail:Household hardware

COM

62312

Retail:Household good/appliances

COM

62313

Retail:furniture,furnishing

COM

62314

Retail:Clothing,textiles,etc

COM

62315

Retail:Footwear

COM

62316

Retail:Chemists’goods, cosmetics

COM

62317

Retail:Books,stationery,magazines

COM

62318

Retail:Jewellery,watches,silverwar

COM

62319

Retail:Bicycles&part thereof

COM

62320

Retail:Telecommunications products

COM

62321

Retail:General merchandise

COM

62322

Retail:Photographic eqp./supp

COM

62323

Retail:Others

COM

62410

Retail:Motorcycles & parts

COM

62421

Retail:Passenger cars-New

COM

62429

Retail:Passenger-Used

COM

62430

Retail:Motor parts & accessory

COM

62490

Retail:Petrol,lubricating oils

COM

62500

Retail:Misc.retail trades

COM

62501

Retail:Business Complex(<4 stor

COM

62502

Retail:Business Complex(>4 stor

COM

63100

Retail:Restaurants,cafes etc

COM

63200

Accommodate:Rest house

COM

63201

Accommodate:Hotel below 3 star

COM

63202

Accommodate:Hotel 3 star

COM

63203

Accommodate:Hotel 4 star

COM

63204

Accommodate:Hotel 5 star

COM

63205

Accommodate:Chalet,motel

COM

63206

Accommodate:Hostel

COM

63211

Residence:Semi-D (1storey)

DOM

63212

Residence:Semi-D (>1storey)

DOM

63221

Residence:Link house (1 storey)

DOM

63222

Residence:Link house (>1 storey)

DOM

63223

Residence:Shop house

COM

63224

Residence:Luxury link house(>1st)

DOM

63225

Residence:Townhouse(>1 storey)

DOM

63231

Residence:Medium cost apartment

DOM

63232

Residence:Condominium

DOM

63233

Residence:Flat

DOM

62334

Residence:low cost Apt(<=3 rooms)

DOM

63235

Residence:Low cost Apt(>3 rooms)

DOM

63236

Residence:Apt/Condominium (<3 rooms)

DOM

63237

Residence:Apt/Condominium (>3 rooms)

DOM

63238

Residence:Luxury Apt/Condominium

DOM

63240

Residence:Squatter house

DOM

63241

Residence:Long house (temporary)

DOM

63250

Residence:Bungalow (LPC)

DOM

63251

Residence:Bungalow (1 storey)

DOM

63252

Residence:Bungalow (> 1 storey)

DOM

63253

Residence:Kampong house

DOM

66666

Residence:Palace (formal&informal)

DOM

71110

Transp:Railway transport

COM

71120

Transp:Bus & tramway transport

COM

71121

Transp:Plaza toll serv. & resthouse

COM

71131

Transp:Taxi service

COM

71139

Transp:Others

COM

71140

Transp:Freight transport by road

COM

71150

Transp:Pipeline transport

COM

71160

Transp:Supptng serv. To land trans

COM

71210

Transp:Ocean&coastal water transport

COM

71220

Transp:Inland water transport

COM

71230

Transp:Supptng serv. To water transport

COM

71310

Transp:Air transport carriers

COM

71320

Transp:Supptng serv. To air transport

COM

71911

Transp:Travel&tourist agencies

COM

71919

Transp:Other transport services

COM

71920

Transp:Storage & warehousing

COM

72001

Communication:Post

COM

72009

Communication:Telecommunication

COM

81011

FinInst:Central bank

COM

81012

FinInst:Commercial bank

COM

81021

FinInst:Pawnbrokers

COM

81022

FinInst:Other financial institu’n

COM

81029

FinInst:Govern.bank (BSN, B.Negara)

COM

81030

FinInst:Financial services

COM

82001

Insurance:Life insurance

COM

82002

Insurance:Provident&pension fund

COM

82003

Insurance:Casualty insurance

COM

82004

Insurance:Social security org’n

COM

82005

Insurance:Insurance services

COM

83101

RealEstate:Operations

COM

83102

RealEstate:Developments

COM

83103

RealEstate:Services

COM

83210

BusinessServ:Legal services

COM

83220

BusinessServ:Accountng,audtg etc

COM

83230

BusinessServ:Data proc.&tabulatng

COM

83240

BusinessServ:Engineering, architect.

COM

83250

BusinessServ:Advertising

COM

83291

BusinessServ:Estate agencies

COM

83292

BusinessServ:Labour contracting

COM

83299

BusinessServ:Other non-M&E rental

COM

83300

BusinessServ:M&E rental/leasing

COM

91110

Pub.Adm/Defence:General admin

91120

Pub.Adm/Defence:External affairs

91130

Pub.Adm/Defence:Justice/public

91140

Pub.Adm/Defence:Defence

91150

Pub.Adm/Defence:Educational admin

91160

Pub.Adm/Defence:Health admin

91170

Pub.Adm/Defence:Soc.sec.&welfare

91180

Pub.Adm/Defence:Housg&comm.dev

91190

Pub.Adm/Defence:Other com./soc.af

91210

Pub.Adm/Defence:Econ.&labour aff

91220

Pub.Adm/Defence:Agri,forestry,fis

91230

Pub.Adm/Defence:Mining,manuf.cons

91240

Pub.Adm/Defence:Elec.gas &water

91250

Pub.Adm/Defence:Roads&transport

91260

Pub.Adm/Defence:Water transport

91270

Pub.Adm/Defence:Other transport

91280

Pub.Adm/Defence:Communication

91290

Pub.Adm/Defence:Other services

92000

Sanitary & similar services

93100

SocServ:Educational services

COM

93102

SocServ:Sch with hostel

COM

93103

SocServ:university,college

COM

93104

SocServ:Primary sch

COM

93105

SocServ:Secondary sch

COM

93106

SocServ:Kindergarten,nursery

COM

93200

SocServ:Research &Science Inst.

COM

93311

SocServ:Hospital service

COM

93312

SocServ:Specialst(E&T,O&G,dental)

COM

93313

SocServ:Country clinic

COM

93314

SocServ:Private clinic

COM

93315

SocServ:District Health Centre

COM

93320

SocServ:Veterinary services

COM

93400

SocServ:Welfare institutions

COM

93500

SocServ:Busi,Prof,Labour assoc

93600

SocServ:Community Hall

93650

SocServ:Town Hall

93910

SocServ:religious organisations

93911

SocServ:Surau

93912

SocServ:Mosque

93913

SocServ:Church

93914

SocServ:Temple

93990

SocServ:Social& related com.serv

94110

Recreatn:Motion picture product’n

COM

94120

Recreatn:Motion picture dist.&project

COM

94130

Recreatn:Radio&TV broadcasting

COM

94140

Recreatn:Theatrical producers

COM

94150

Recreatn:Author,music composer

COM

94200

Recreatn:Library,museums,gardens

COM

94900

Recreatn:Other amusement services

COM

94901

Recreatn:Stadium

COM

95110

OtherServ:Footwear/leather repair

COM

95120

OtherServ:Electrical repair shops

COM

95130

OtherServ:Motor vehicle repair

COM

95140

OtherServ:Watch,jewellery repairs

COM

95151

OtherServ:Bicycle repairs

COM

95159

OtherServ:Other repair shops

COM

95200

OtherServ:Laundry,cleaning serv.

COM

95300

OtherServ:Domestic services

COM

95910

OtherServ:Barber&beauty shops

COM

95920

OtherServ:Photographic studios

COM

95990

OtherServ:Other personal services

COM

96000

International & extra-terr ser

99999

Reserved for interest
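
The code table above attaches a sector label (IND, COM or DOM) to most business codes, and leaves some codes (utilities, public administration and a few social services) without one. As a rough illustration of how such a lookup could feed a stratification step, the sketch below maps a handful of codes copied from the table to their sector and counts consumers per sector. The function names, the `UNCLASSIFIED` label and the sample consumer list are assumptions made for this example and are not taken from the study.

```python
from collections import Counter

# Small excerpt of the code table above: business code -> (description, sector).
# Codes that appear in the table without a sector label are stored with None.
BUSINESS_CODES = {
    "38450": ("Manufac:Aircraft", "IND"),
    "41010": ("Utility:Electric light &power", None),
    "50011": ("Construc:Residential construs", "COM"),
    "62190": ("Retail:Super market", "COM"),
    "63223": ("Residence:Shop house", "COM"),
    "63232": ("Residence:Condominium", "DOM"),
}


def sector_of(code):
    """Return the sector for a business code, or 'UNCLASSIFIED' (assumed label)."""
    entry = BUSINESS_CODES.get(code)
    if entry is None or entry[1] is None:
        return "UNCLASSIFIED"
    return entry[1]


def count_by_sector(consumer_codes):
    """Count how many consumers fall into each sector stratum."""
    return Counter(sector_of(code) for code in consumer_codes)


# Hypothetical consumer records, for illustration only (not study data).
sample_codes = ["38450", "62190", "63232", "63232", "41010", "99998"]
counts = count_by_sector(sample_codes)
print(dict(counts))
```

Keeping unlabelled and unknown codes in a separate bucket, rather than dropping them, makes it visible how many consumers would fall outside the three main strata.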

APPENDIX 2 – LIST OF COMMERCIAL CONSUMERS INTERVIEWED

Company Name

Type

Enhance Track Sdn Bhd

Real Estate/Business Services

Mun Chuen Transport Sdn Bhd

Transport

K.T. Beach Resort

Accommodation

Sams Metal Trading (KT) Sdn Bhd

Wholesale

S.P. Chong (M) Sdn Bhd

Retail

Pustaka Seri Intan Sdn Bhd

Retail

Lee Electrical & Refrigeration Services

Real Estate/Business Services

Lam Soon Edible Oils Sdn Bhd

Wholesale

Lau & Partners Pharmacy Sdn Bhd

Retail

Terengganu Refrigeration & Elec Services

Real Estate/Business Services

Hotel KT Mutiara Sdn Bhd

Accommodation

Bumiputra-Commerce Bank Bhd

Financial Institution

Fibrecomm Network

Communication

Wanziehan Enterprise

Wholesale

Tadika Ezi

Social Services

Perniagaan Kinta Mewah

Wholesale

Boon Hoi (pemborong telur & plastik)

Wholesale

Seng Hing Confectionery Sdn Bhd

Wholesale

Risya&Anita Beauty Centre

Residence/Shop House

Perniagaan Shah Ain

Residence/Shop House

Serdang Bakery

Residence/Shop House

K.H.Chin Enterprise

Residence/Shop House

Syarikat Anita

Residence/Shop House

Marina Agencies

Real Estate/Business Services

Bumiputra-Commerce Bank Bhd

Financial Institution

Enviro Lift Sdn. Bhd

Agriculture

Time dot.com Berhad

Communication

Maxis Mobile

Communication

Uni Asia

Insurance

Unique Bubbles Sdn Bhd

Recreation

Tahan Insurance

Insurance

Freight Team (M) Sdn Bhd

Transport

Glomac Enterprise Sdn Bhd

Recreation

APL-NOL (Malaysia) Sdn Bhd

Transport

Viva Life Sdn Bhd

Insurance

Jambatan Merah Sdn Bhd

Transport

Seagull Logistic Sdn Bhd

Transport

Applewood Sdn Bhd

Transport

Berjaya Prudential Assurance Bhd

Insurance

Sunway Risk Management Sdn Bhd

Insurance

Pusat Rekreasi Pelangi

Recreation

Bonus Properties Sdn Bhd

Recreation

RJ Family Store Sdn Bhd

Retail

How Soon Hardware Trading

Construction

CIMB Bank Bhd (Seremban Branch)

Financial Institution

Maybank Berhad (Seremban)

Financial Institution

Allson Klana Resort

Accommodation

Aw & Lena Plumbing Construction

Construction

Eurotral Wheels Sdn Bhd

Transport

Day & Day Enterprise

Real Estate/Business Services

Maxis Communications Berhad (S’ban)

Communication

Golden Screen Cinemas (Seremban)

Recreation

Domino’s Pizza (Seremban)

Others

Avillion Port Dickson

Accommodation

Corus Paradise Resort Port Dickson

Accommodation

Billion Shopping Center Port Dickson

Retail

Royal Adelphi Seremban

Accommodation

Dreamz Car Wash

Others

Kimmark (M) Sdn. Bhd

Retail

Exact Automation Sdn. Bhd

Others

Sin Weng Soon Motor Workshop

Others

Klinik Raj & Rakan Rakan

Social Services

Klinik Raj & Rakan Rakan

Social Services

MAZ International School

Social Services

Super Racing (Workshop)

Others

Super Racing (Sales of Motorcycle)

Others

CALTEX Petrol Station

Wholesale

APPENDIX 3 – LIST OF INDUSTRIAL CONSUMERS INTERVIEWED

Company Name

Type

Texas Instruments

Electrical/Electronics

BH Bakery

Food

Exxon Mobil Production

Chemicals & Petrochemicals

Pertima Terengganu

Food

Atlas Edible Ice (Pantai Timur) Sdn Bhd

Food

KT Ice Sdn Bhd

Food

KY Food Industries Sdn Bhd

Food

F&N Coca Cola Sdn Bhd, KT

Beverage & Tobacco

MSET Ship Building Corp Sdn Bhd

Transport Equipment

My Socks Malaysia Sdn Bhd

Textiles & its products

Permint Suterasemai Sdn Bhd

Textiles & its products

Noor Arfa Batek Sdn Bhd

Textiles & its products

Utusan Melayu (M) Berhad, KT

Paper, Printing & Publishing

S.E.H. (Shah Alam) Sdn Bhd

Electrical/Electronics

Jernih Ais Sdn Bhd

Food

Freescale, PJ

Electrical/Electronics

F&N Coca Cola Sdn Bhd, KL

Beverage & Tobacco

Dutch Lady Malaysia, PJ

Beverage & Tobacco

Amsteel Mills Sdn Bhd

Iron & Steel

Aluminium Company of Malaysia Berhad

Nonferrous Metals & its products

Somerville (M) Sdn Bhd

Iron & Steel

Porite (Malaysia) Sdn Bhd

Others

Muda Paper Mills Sdn Bhd

Paper, Printing & Publishing

MetTube Sdn Bhd

Nonferrous Metals & its products

Antara Steel Mill Sdn Bhd

Iron & Steel

Malaysian Marine Heavy Engineering

Others

Flextronics Ind (Malaysia) Sdn Bhd

Electrical/Electronics

Lafarge Cement (Southern Cement)

Clay-based & other non-metallic minerals

URC Snack Foods (Malaysia) Sdn Bhd

Food

Malaysian Sheet Glass Sdn Bhd

Nonferrous Metals & its products

Yeo Hiap Seng

Beverage & Tobacco

Universal Nutribeverage Sdn. Bhd

Beverage & Tobacco

Paling Plastic Industry

Plastics

Ornasteel (M)

Iron & Steel

Iamko Metal Industry Sdn. Bhd

Iron & Steel

United Power Cable (M)

Transport Equipment

Beryl’s Confectionary

Food

World Prominence Sdn Bhd

Food

Penang Seagate

Electrical/Electronics

AMD Expansion

Electrical/Electronics

ON Semiconductor

Electrical/Electronics

Rohm-Wako Electronics (M) Sdn Bhd

Electrical/Electronics

S.M Biomed Sdn. Bhd

Pharmaceutical

Continental Sime Tyre PJ Sdn Bhd

Rubber

Hitachi Construction

Machinery & Machinery Equipment

United Bintang

Machinery & Machinery Equipment

CCM Chemicals Sdn Bhd

Chemicals & Petrochemicals

Idemitsu Chemicals (M) Sdn Bhd

Chemicals & Petrochemicals

Kiswire Sdn Bhd

Iron & Steel

CMKS (Malaysia) Sdn Bhd

Electrical/Electronics

Pharmaniaga

Pharmaceutical

Taisho Pharmaceutical (M) Sdn Bhd

Pharmaceutical

Takeuchi MDF Sdn Bhd

Wood & its products

Autoways Sdn. Bhd.

Rubber

Hopetech Sdn Bhd

Electrical/Electronics

Yamaha Electronics Manufacturing(M)

Electrical/Electronics

Aquila Sofa Industries Sdn Bhd

Furniture & Fixtures

WF Furniture & Renovation Sdn Bhd

Furniture & Fixtures

Sykt. Cahaya Muda Perak Sdn Bhd

Palm & Palm Kernel Oil

Unitata Berhad

Palm & Palm Kernel Oil

Goodyear Malaysia Berhad

Rubber

Proton (Tg Malim)

Transport Equipment

Palmaju Edible Oil Sdn Bhd

Palm & Palm Kernel Oil

APPENDIX 4 – DOMESTIC QUESTIONNAIRE

(Domestic questionnaire form, six pages. The form text is illegible in this copy.)

APPENDIX 5 – COMMERCIAL QUESTIONNAIRE

Commercial

Interviewer:

___________________

Date Interview Completed:

___________________

Interview Start Time:

___________________

Name of Respondent:

________________________________________

Title:

____________________________

Contact Number:

____________________________

Company name:

________________________________________

Company Address:

________________________________________ _________________________________________

Business Type (Select One):

Accommodation

Recreation

Agriculture

Residence / Shop House

Communication

Retail

Construction

Social Service

Financial Institution

Transport

Insurance

Wholesale

Real Estate / Business Services

Others (Specify) _______________

Please be informed that all the information you provide here is strictly confidential.

(Pages 2 to 10 of the English form are illegible in this copy, apart from the rating-scale labels and the outage scenarios listed below.)

Rating scales run from 1 to 5: 1 = Dissatisfied, 5 = Satisfied; 1 = Not Disruptive, 5 = Disruptive.

Outage scenarios presented to the respondent:

Case 1: Complete outage. Warning: None. Start time: 2 p.m. Duration: 1 hour
Case 2: Complete outage. Warning: None. Start time: 2 p.m. Duration: 2 hours
Case 3: Complete outage. Warning: None. Start time: 2 p.m. Duration: 4 hours
Case 4: Complete outage. Warning: None. Start time: 2 p.m. Duration: 8 hours
Case 5: Complete outage. Warning: None. Start time: 2 p.m. Duration: 0 – 2 seconds
Case 6: Complete outage. Warning: 1 hour advance warning. Start time: 2 p.m. Duration: 1 hour

Commercial (translated from the Bahasa Malaysia version of the form)

Interviewer:

___________________

Date:

___________________

Time:

___________________

Name of Respondent:

________________________________________

Title:

____________________________

Contact Number:

____________________________

Company Name:

________________________________________

Address:

________________________________________ _________________________________________

Business Type (Select One):

Accommodation

Recreation

Agriculture

Residence / Shop House

Communication

Retail

Construction

Social Service

Financial Institution

Transport

Insurance

Wholesale

Real Estate / Business Services

Others (Specify) _______________

All information provided here is confidential.

COMMERCIAL/INDUSTRIAL ELECTRICITY SUPPLY INTERRUPTION SURVEY

BASIC QUESTIONS

1. Has your organisation experienced any electricity interruptions in the past 12 months? (SELECT ONE)
a. No -------- (GO TO QUESTION 3)
b. Yes

2. In general, do such interruptions seriously disrupt your company's operations? (SELECT ONE)
1   2   3   4   5

3. Does your company send employees home when an electricity interruption occurs? (SELECT ONE)
a. No
b. Yes

4. In general, how long can your company wait during an electricity supply interruption before it starts to affect the company's operating costs? (FILL IN THE BLANKS)
__________ hours and _______ minutes

5. Which of the following equipment requires a continuous electricity supply? (SELECT ALL THAT APPLY)
a. Computers
b. Telephones
c. Security system
d. HVAC system (heating, ventilation, air conditioning)
e. Cash registers
f. Data communication system (LAN)
g. Others (Specify: _______________)

6. How much advance notice of an electricity supply interruption does your company need in order to reduce the impact of the interruption? (SELECT ONE)
a. Advance notice does not reduce the impact of the interruption
b. At least 1 hour
c. At least 4 hours
d. At least 8 hours
e. At least 24 hours

7. Are you satisfied with the rate of electricity interruptions your company has experienced in the past 12 months? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)

8. Are you satisfied with the time taken for conditions to return to normal after an electricity interruption? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)

9. Are you satisfied with the way TNB carries out its responsibilities when an electricity interruption occurs? (SELECT ONE)
1 (Dissatisfied)   2   3   4   5 (Satisfied)

NUMERICAL QUESTIONS

(The description of the outage scenario that introduces this section is illegible in this copy.)

1. Would this scenario disrupt your company? (SELECT ONE)
1 (Not disruptive)   2   3   4   5 (Disruptive)

2. Would production or sales stop or be disrupted because of this scenario? (SELECT ONE AND FILL IN THE BLANK IF APPLICABLE)
a. No -------- Go to question 5
b. Yes ---- ______________ Total hours of production or sales operations stopped or disrupted (including time during and after the interruption)

3. What is the estimated value of the production or sales that would be lost, at least temporarily, during the interruption and the slow period after it? (FILL IN THE BLANK)
RM ______________ Value of lost production or sales

4. What percentage of this production or sales could be recovered? (SELECT ONE)
0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%

5. Would any labour costs be incurred because of this interruption (such as salaries or wages for staff who are unable to work, or overtime pay to reduce the loss of production or sales)? (CIRCLE ONE NUMBER AND STATE THE COST IF APPLICABLE)
a. No
b. Yes -- RM ______ labour cost for staff unable to work during the interruption
RM ______ labour cost for overtime work

6. Would any damage costs be caused by this interruption (such as damage to equipment or to products)? (SELECT ONE AND STATE THE COST IF APPLICABLE)
a. No
b. Yes ----- RM _______ equipment damage
RM _______ damage to materials or products

7. Would there be any additional tangible costs because of this interruption (such as overhead and depreciation, extra costs to restart operations, and the cost of buying and/or renting backup equipment)? (SELECT ONE AND STATE THE COST IF APPLICABLE)
a. No
b. Yes --- RM ________ additional costs

8. If you were asked to put a ringgit value on the hidden costs of this interruption (such as inconvenience and customer dissatisfaction), what would that value be?
a. RM0, no hidden costs
b. RM ______ , hidden costs

9. In addition to the costs discussed above, some companies make savings because processes cannot be run, such as unused materials or inventory, and savings on bills or wages that do not have to be paid. Would your company make such savings as a result of the interruption?
a. No
b. Yes --- RM _______
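
Questions 3 to 9 above collect the individual cost components of one outage scenario. As a minimal sketch, and assuming (the questionnaire does not say how the answers are combined) that the components are simply added, with recoverable production netted off and savings subtracted, the answers could be turned into a single net figure as follows. The class name, field names and example amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResponse:
    """Answers to Questions 3-9 for one outage scenario (amounts in RM)."""
    lost_output: float              # Q3: value of lost production or sales
    recovered_pct: float            # Q4: share of that loss later recovered (0-100)
    idle_labour: float = 0.0        # Q5: wages for staff unable to work
    overtime: float = 0.0           # Q5: overtime paid to catch up
    equipment_damage: float = 0.0   # Q6
    product_damage: float = 0.0     # Q6
    extra_tangible: float = 0.0     # Q7: restart, rental and similar costs
    hidden: float = 0.0             # Q8: respondent's value for hidden costs
    savings: float = 0.0            # Q9: unused materials, unpaid bills or wages


def net_outage_cost(r: ScenarioResponse) -> float:
    """Assumed aggregation: unrecovered output + other costs - savings."""
    if not 0.0 <= r.recovered_pct <= 100.0:
        raise ValueError("recovered_pct must be between 0 and 100")
    unrecovered = r.lost_output * (100.0 - r.recovered_pct) / 100.0
    costs = (unrecovered + r.idle_labour + r.overtime + r.equipment_damage
             + r.product_damage + r.extra_tangible + r.hidden)
    return costs - r.savings


# Hypothetical respondent, for illustration only.
example = ScenarioResponse(
    lost_output=10000.0, recovered_pct=40.0, idle_labour=500.0,
    overtime=300.0, equipment_damage=1000.0, product_damage=200.0,
    extra_tangible=150.0, hidden=250.0, savings=400.0,
)
example_cost = net_outage_cost(example)
print(example_cost)
```

Other aggregation rules are possible, for instance leaving the hidden costs of Question 8 out of the total and reporting them separately.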

10. Suppose your company were given one hour's advance notice of an electricity supply interruption. What percentage of your staff would hear this notice?
0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
None                        50/50                        Yes

11. Consider all the costs you might incur because of an electricity supply interruption and estimate your loss for each of the scenarios given. (Enter zero if none)

For each case, state three amounts in RM: Minimum Total Loss (Best Case), Mid-range Total Loss (Typical Case) and Maximum Total Loss (Worst Case).

Case 1: Complete outage. Warning: None. Start time: 2 p.m. Duration: 1 hour
Case 2: Complete outage. Warning: None. Start time: 2 p.m. Duration: 2 hours
Case 3: Complete outage. Warning: None. Start time: 2 p.m. Duration: 4 hours
Case 4: Complete outage. Warning: None. Start time: 2 p.m. Duration: 8 hours
Case 5: Complete outage. Warning: None. Start time: 2 p.m. Duration: 0 – 2 seconds
Case 6: Complete outage. Warning: 1 hour in advance. Start time: 2 p.m. Duration: 1 hour
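
The six cases above vary two things: the outage duration (Cases 1 to 5) and the presence of a one-hour advance warning (Case 6 against Case 1). A minimal sketch of how one respondent's typical-case answers might be summarised is given below. The durations and warning times are taken from the scenario list; the per-hour normalisation, the warning-reduction ratio and the example amounts are assumptions for illustration and are not the study's stated method.

```python
# Scenario definitions taken from the case list above.
# Case 5 is a momentary outage (0-2 seconds), so no hourly duration is assigned.
SCENARIOS = {
    1: {"duration_h": 1.0, "warning_h": 0.0},
    2: {"duration_h": 2.0, "warning_h": 0.0},
    3: {"duration_h": 4.0, "warning_h": 0.0},
    4: {"duration_h": 8.0, "warning_h": 0.0},
    5: {"duration_h": None, "warning_h": 0.0},
    6: {"duration_h": 1.0, "warning_h": 1.0},
}


def cost_per_hour(case, cost_rm):
    """Cost divided by outage duration; None for the momentary case."""
    duration = SCENARIOS[case]["duration_h"]
    if duration is None:
        return None
    return cost_rm / duration


def warning_reduction(cost_no_warning, cost_with_warning):
    """Fractional cost reduction attributed to advance warning (Case 1 vs Case 6)."""
    if cost_no_warning <= 0:
        return 0.0
    return (cost_no_warning - cost_with_warning) / cost_no_warning


# Hypothetical typical-case answers in RM for one respondent (not study data).
typical = {1: 2000.0, 2: 3500.0, 3: 6000.0, 4: 10000.0, 5: 300.0, 6: 1500.0}

per_hour = {case: cost_per_hour(case, cost) for case, cost in typical.items()}
reduction = warning_reduction(typical[1], typical[6])
print(per_hour)
print(reduction)
```

The same two functions can be applied to the best-case and worst-case columns to obtain a range rather than a single value.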

ADDITIONAL QUESTIONS

1. What is the category of your company's activity? (Tick one)
(The category options are illegible in this copy.)

2. What is the approximate annual value of your company's operations or services? (FILL IN THE BLANK)
RM __________ /year

3. What is your approximate annual expenditure (including labour costs, rent, materials and other overhead costs)? (FILL IN THE BLANK)
RM_____________ /year

4. What percentage of your annual budget is spent on energy (gas and electricity)? (FILL IN THE BLANK)
____________ %

5. Does your company have electrical equipment that is sensitive to the quality of the electricity supply?
a. No
b. Yes. What is the equipment? ____________________

6. Does your company have electrical equipment that requires a continuous electricity supply even when there is a supply interruption?
a. None. Go to question 11
b. Yes. What is the equipment? ________________

7. Does your company own or rent any of the equipment below to protect the equipment in your company?
a. Back-up generator(s)
b. Uninterruptible power supply
c. Line conditioning device(s)
d. Surge suppressor(s)
e. Isolation transformer(s)

8. Would you incur any indirect costs, such as loss of business opportunities or loss of property?
a. Yes (Please explain)
b. None

9. Over the past 5 years, do you think the quality of TNB's service has improved?
a. Improved
b. Deteriorated
c. No change

10. What is your opinion of the electricity supply service provided by TNB?
a. Very satisfied
b. Satisfied
c. Average
d. Dissatisfied
e. Very dissatisfied

11. Has any incident in the past year caused you to be dissatisfied with TNB?

12. Are there any additional services you would like from TNB in the future?

APPENDIX 6 – INDUSTRIAL QUESTIONNAIRE

Industrial

Interviewer:

___________________

Date Interview Completed:

___________________

Interview Start Time:

___________________

Name of Respondent:

________________________________________

Title:

____________________________

Contact Number:

____________________________

Company name:

________________________________________

Company Address:

________________________________________ _________________________________________

Business Type (Select One):

Electrical / Electronic

Rubber

Paper, Printing & Publishing

Beverage & Tobacco

Chemicals & Petrochemicals

Furniture & Fixtures

Iron & Steel

Scientific & Measuring Equipment

Nonferrous metal & its products

Leather & its products

Transport Equipment

Palm & Palm Kernel Oil

Food

Pharmaceutical

Wood & Wood products

Photographs, Cinemagraphics, Video & Optical

Textiles & Textile products

Machinery & Machinery Equipment

Clay-based & other nonmetallic minerals

Plastics

Others (Specify) _______________

Please be informed that all the information you provide here is strictly confidential.

(Pages 2 to 10 of the English form are illegible in this copy, apart from the rating-scale labels and the outage scenarios listed below.)

Rating scales run from 1 to 5: 1 = Dissatisfied, 5 = Satisfied; 1 = Not Disruptive, 5 = Disruptive.

Outage scenarios presented to the respondent:

Case 1: Complete outage. Warning: None. Start time: 2 p.m. Duration: 1 hour
Case 2: Complete outage. Warning: None. Start time: 2 p.m. Duration: 2 hours
Case 3: Complete outage. Warning: None. Start time: 2 p.m. Duration: 4 hours
Case 4: Complete outage. Warning: None. Start time: 2 p.m. Duration: 8 hours
Case 5: Complete outage. Warning: None. Start time: 2 p.m. Duration: 0 – 2 seconds
Case 6: Complete outage. Warning: 1 hour advance warning. Start time: 2 p.m. Duration: 1 hour

Industrial

Interviewer: ___________________

Date: ___________________

Time: ___________________

Respondent's Name: ________________________________________

Position: ____________________________

Telephone Number: ____________________________

Company Name: ________________________________________

Address: ________________________________________ _________________________________________

Type of Business (Choose One):

Electrical/Electronics

Rubber

Paper, printing & publishing

Tobacco

Chemicals & petrochemicals

Furniture

Iron & steel

Scientific equipment

Non-ferrous metals & their products

Leather & leather products

Food

Palm oil

Pharmaceuticals

Wood & wood products

Photographic, cinematographic, video & optical

Textiles & textile products

Machinery and equipment

Pottery, clay products & non-metallic minerals, etc.

Transport equipment

Plastics

Others (specify) _______________

All information provided here is confidential.

COMMERCIAL/INDUSTRIAL ELECTRICITY SUPPLY INTERRUPTION SURVEY

BASIC QUESTIONS

1. Has your organisation experienced any electricity supply interruptions in the past 12 months? (CHOOSE ONE)
   a. No (GO TO QUESTION 3)
   b. Yes

2. In general, did these interruptions seriously disrupt your company's operations? (CHOOSE ONE)
   1 (No)   2   3   4   5 (Yes)

3. In general, how long can your company wait after the electricity supply is cut off before the interruption affects the company's operating costs? (FILL IN THE BLANKS)
   __________ hours and _______ minutes

4. Does your company send employees home when an electricity interruption occurs? (CHOOSE ONE)
   a.   b.

5. Which of the following equipment requires a continuous electricity supply? (CHOOSE ALL THAT APPLY)
   a. Computers
   b. Telephones
   c. Security system
   d. HVAC system (heating, ventilation, air conditioning)
   e. Cash registers
   f. Data communication system (LAN)
   g. Others (specify: _______________)

6. How much advance notice of an electricity supply interruption does your company need in order to reduce the impact of the interruption? (CHOOSE ONE)
   a. Advance notice does not reduce the impact of the interruption
   b. At least 1 hour
   c. At least 4 hours
   d. At least 8 hours
   e. At least 24 hours

7. Are you satisfied with the rate of electricity interruptions experienced by your company in the past 12 months? (CHOOSE ONE)
   1 (Not satisfied)   2   3   4   5 (Satisfied)

8. Are you satisfied with the time taken for conditions to return to normal after an electricity interruption? (CHOOSE ONE)
   1 (Not satisfied)   2   3   4   5 (Satisfied)

9. Are you satisfied with how TNB carries out its responsibilities when an electricity interruption occurs? (CHOOSE ONE)
   1 (Not satisfied)   2   3   4   5 (Satisfied)

Numerical Questions

1. Would this scenario disrupt your company? (CHOOSE ONE)
   1 (Not disruptive)   2   3   4   5 (Disruptive)

2. Would production or sales stop or be disrupted because of this scenario? (CHOOSE ONE AND FILL IN THE BLANK IF APPLICABLE)
   a. No (go to question 5)
   b. Yes: ______________ total hours of production or sales operations stopped or disrupted (including time during and after the interruption)

3. What is the estimated value of the production or sales that would be lost, at least temporarily, during the interruption and the slow period after it? (FILL IN THE BLANK)
   RM ______________ value of lost production or sales

4. What percentage of this lost production or sales could be recovered? (CHOOSE ONE)
   0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%

5. Would any labour costs be incurred because of this interruption (such as salaries or wages for staff who are unable to work, or overtime pay to reduce the loss of production or sales)? (CIRCLE ONE NUMBER AND STATE THE COST IF APPLICABLE)
   a. No
   b. Yes: RM ______ labour cost for staff unable to work during the electricity supply interruption
           RM ______ labour cost for overtime work

6. Would any damage costs be caused by this interruption (such as damage to equipment or damage to products)? (CHOOSE ONE AND STATE THE COST IF APPLICABLE)
   a. No
   b. Yes: RM _______ damage to equipment
           RM _______ damage to materials or products

7. Would there be any additional tangible costs because of this interruption (such as overhead and depreciation, extra costs to restart operations, and costs to buy and/or rent back-up equipment)? (CHOOSE ONE AND STATE THE COST IF APPLICABLE)
   a. No
   b. Yes: RM ________ additional costs

8. If you were asked to place a ringgit value on the hidden costs of this interruption (such as inconvenience and customer dissatisfaction), what would that value be?
   a. RM0, no hidden costs
   b. RM ______, hidden costs

9. In addition to the costs discussed above, some companies make savings because processes cannot be carried out, such as unused materials or inventory, or savings on bills or wages that need not be paid. Would your company make such savings as a result of this interruption?
   a. No
   b. Yes: RM _______

10. Suppose your company were given one hour's advance notice of the electricity supply interruption. What percentage of your staff would hear this notice?
   0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
   (None)                      (50/50)                      (All)

11. Consider all the costs you might incur as a result of an electricity supply interruption and estimate your losses for each scenario given. (Enter zero if none.)

   Case  Scenario                                                                 Minimum total loss   Typical total loss    Maximum total loss
                                                                                  (best case), RM      (typical case), RM    (worst case), RM
   1     Complete outage; Warning: None; Start time: 2 p.m.; Duration: 1 hour     ________             ________              ________
   2     Complete outage; Warning: None; Start time: 2 p.m.; Duration: 2 hours    ________             ________              ________
   3     Complete outage; Warning: None; Start time: 2 p.m.; Duration: 4 hours    ________             ________              ________
   4     Complete outage; Warning: None; Start time: 2 p.m.; Duration: 8 hours    ________             ________              ________
   5     Complete outage; Warning: None; Start time: 2 p.m.; Duration: 0 – 2 seconds  ________         ________              ________
   6     Complete outage; Warning: 1 hour in advance; Start time: 2 p.m.; Duration: 1 hour  ________   ________              ________

ADDITIONAL QUESTIONS

1. What is your company's activity category? (Tick one)

2. What is the estimated annual value of your company's operations or services? (FILL IN THE BLANK)
   RM __________ /year

3. What is your estimated annual expenditure (including labour, rent, materials and other overhead costs)? (FILL IN THE BLANK)
   RM _____________ /year

4. What percentage of your annual budget is spent on energy (gas and electricity)? (FILL IN THE BLANK)
   ____________ %

5. Does your company have electrical equipment that is sensitive to the quality of the electricity supply?
   a. No
   b. Yes: What is the equipment? ____________________

6. Does your company have electrical equipment that requires a continuous electricity supply even during a supply interruption?
   No: go to question 11
   Yes: What is the equipment? ________________

7. Does your company own or rent any of the equipment below to protect the equipment in your company?
   a. Back-up generator(s)
   b. Uninterruptible power supply
   c. Line conditioning device(s)
   d. Surge suppressor(s)
   e. Isolation transformer(s)

8. Would you incur any indirect costs, such as loss of business opportunities or loss of property?
   a. Yes (please explain)
   b. No

9. Over the past 5 years, do you think the quality of TNB's service has improved?
   a. Improved
   b. Deteriorated
   c. No change

10. What is your opinion of the electricity supply service provided by TNB?
   a. Very satisfied
   b. Satisfied
   c. Average
   d. Dissatisfied
   e. Very dissatisfied

11. Has any incident in the past year caused you to be dissatisfied with TNB?

12. Are there any additional services you would like from TNB in the future?

APPENDIX 7 – SPSS ANALYSIS OF DOMESTIC CONSUMER POPULATION

Descriptive Statistics

Statistic               Bungalow         High_Rise        Kampong           Terrace
N                       93182            390598           1250930           1274714
Range                   847.00000        422.00000        327.00000         555.10000
Minimum                 48.00000         23.00000         18.00000          30.00000
Maximum                 895.00000        445.00000        345.00000         585.10000
Sum                     29909118.89000   70991707.90000   187651495.50002   308398953.15000
Mean                    320.9752837      181.7513349      150.0095893       241.9358014
Mean Std. Error         .64899961        .15853669        .06791648         .11703923
Std. Deviation          198.11184899     99.08200478      75.96117738       132.14107438
Variance                39248.305        9817.244         5770.100          17461.264
Skewness                .945             .486             .629              .644
Skewness Std. Error     .008             .004             .002              .002
Kurtosis                .151             -.307            -.438             -.365
Kurtosis Std. Error     .016             .008             .004              .004

Descriptives

Statistic                         Bungalow       High-Rise      Kampong        Terrace
Mean                              320.9752837    325.4398532    307.7973976    525.4215279
Mean Std. Error                   .64899961      .17987639      .06370672      .10462629
95% CI for Mean, Lower Bound      319.7032518    325.0872975    307.6725331    525.2164615
95% CI for Mean, Upper Bound      322.2473157    325.7924089    307.9222620    525.6265942
5% Trimmed Mean                   307.8238334    323.4196912    307.4545394    524.9973351
Median                            269.0000000    315.0000000    306.0000000    523.0000000
Variance                          39248.305      3014.952       378.183        1020.032
Std. Deviation                    198.11184899   54.90857467    19.44694013    31.93793622
Minimum                           48.00000       249.00000      277.00000      474.00000
Maximum                           895.00000      445.00000      345.00000      585.10000
Range                             847.00000      196.00000      68.00000       111.10000
Interquartile Range               260.00000      90.00000       33.00000       54.00000
Skewness                          .945           .470           .222           .170
Skewness Std. Error               .008           .008           .008           .008
Kurtosis                          .151           -.918          -1.132         -1.163
Kurtosis Std. Error               .016           .016           .016           .016
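The standard error and confidence bounds reported for Bungalow can be cross-checked from the printed mean, standard deviation and N. The sketch below is only an illustration: the function name is hypothetical, and it assumes a normal approximation (z = 1.96), which is effectively the same as the t-based interval for a sample of this size. N = 93182 is taken from the Descriptive Statistics output.

```python
import math


def mean_confidence_interval(mean, std_dev, n, z=1.96):
    """Return (std_error, lower, upper) using a normal approximation.

    Hypothetical helper for illustration; z = 1.96 is an assumption that
    is adequate only because n is very large here.
    """
    std_error = std_dev / math.sqrt(n)
    half_width = z * std_error
    return std_error, mean - half_width, mean + half_width


# Bungalow figures as printed in the SPSS output (N = 93182).
se, lower, upper = mean_confidence_interval(320.9752837, 198.11184899, 93182)
print(f"std_error={se:.8f} lower={lower:.4f} upper={upper:.4f}")
```

The computed values agree with the printed Std. Error (.64899961) and the 95% bounds (319.7032518 and 322.2473157) to within rounding.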

M-Estimators

             Huber's M-Estimator(a)   Tukey's Biweight(b)   Hampel's M-Estimator(c)   Andrews' Wave(d)
Bungalow     283.0957654              266.7803206           287.5231060               266.2687862
High-Rise    319.1433789              319.8449702           321.7710026               319.8801127
Kampong      306.5501089              306.7611529           307.0419735               306.7645718
Terrace      523.8345753              524.1387126           524.5207610               524.1416048

a The weighting constant is 1.339.
b The weighting constant is 4.685.
c The weighting constants are 1.700, 3.400, and 8.500.
d The weighting constant is 1.340*pi.
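For readers unfamiliar with M-estimators, the following is a minimal sketch of a Huber-type location estimate using the weighting constant 1.339 quoted in footnote a. It is an assumption-laden illustration, not SPSS's exact procedure: the function name is hypothetical, the scale is fixed at MAD/0.6745, and the iteration is a plain reweighted mean.

```python
import statistics


def huber_location(values, k=1.339, tol=1e-10, max_iter=500):
    """Iteratively reweighted Huber location estimate (illustrative only)."""
    data = [float(v) for v in values]
    mu = statistics.median(data)
    mad = statistics.median([abs(v - mu) for v in data])
    scale = mad / 0.6745  # MAD rescaled to be comparable to a standard deviation
    if scale == 0.0:
        return mu
    for _ in range(max_iter):
        weights = []
        for v in data:
            u = abs(v - mu) / scale
            # Full weight inside the threshold, down-weighted beyond it.
            weights.append(1.0 if u <= k else k / u)
        new_mu = sum(w * v for w, v in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# A single large value pulls the mean far more than the Huber estimate.
sample = [1, 2, 3, 4, 100]
print(statistics.mean(sample), huber_location(sample))
```

The design point is that observations far from the centre receive weights below 1, which is why the M-estimates for the skewed Bungalow data sit below its arithmetic mean.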

Percentiles

Weighted Average (Definition 1)

             5             10            25            50            75            90            95
Bungalow     83.0000000    109.0000000   170.0000000   269.0000000   430.0000000   630.0000000   741.0000000
High_Rise    254.0000000   260.0000000   278.0000000   315.0000000   368.0000000   410.0000000   427.0000000
Kampong      280.0000000   283.0000000   291.0000000   306.0000000   324.0000000   336.0000000   341.0000000
Terrace      479.0000000   483.0000000   498.0000000   523.0000000   552.0000000   572.0000000   578.0000000

Tukey's Hinges

             25            50            75
Bungalow     170.0000000   269.0000000   430.0000000
High_Rise    278.0000000   315.0000000   368.0000000
Kampong      291.0000000   306.0000000   324.0000000
Terrace      498.0000000   523.0000000   552.0000000

Extreme Values

                       Case Numbers (rank 1 to 5)          Values (rank 1 to 5)
Bungalow    Highest    1, 2, 3, 4, 5                       895.00000, 895.00000, 895.00000, 895.00000, 895.00000(a)
            Lowest     93182, 93181, 93180, 93179, 93178   48.00000, 48.00000, 48.00000, 48.00000, 48.00000(b)
High_Rise   Highest    1, 2, 3, 4, 5                       445.00000, 445.00000, 445.00000, 445.00000, 445.00000(c)
            Lowest     93182, 93181, 93180, 93179, 93178   249.00000, 249.00000, 249.00000, 249.00000, 249.00000(d)
Kampong     Highest    1, 2, 3, 4, 5                       345.00000, 345.00000, 345.00000, 345.00000, 345.00000(e)
            Lowest     93182, 93181, 93180, 93179, 93178   277.00000, 277.00000, 277.00000, 277.00000, 277.00000(f)
Terrace     Highest    1, 2, 3, 4, 5                       585.10000, 585.00000, 585.00000, 585.00000, 585.00000(g)
            Lowest     93182, 93181, 93180, 93179, 93178   474.00000, 474.00000, 474.00000, 474.00000, 474.00000(h)

a Only a partial list of cases with the value 895.00000 are shown in the table of upper extremes.
b Only a partial list of cases with the value 48.00000 are shown in the table of lower extremes.
c Only a partial list of cases with the value 445.00000 are shown in the table of upper extremes.
d Only a partial list of cases with the value 249.00000 are shown in the table of lower extremes.
e Only a partial list of cases with the value 345.00000 are shown in the table of upper extremes.
f Only a partial list of cases with the value 277.00000 are shown in the table of lower extremes.
g Only a partial list of cases with the value 585.00000 are shown in the table of upper extremes.
h Only a partial list of cases with the value 474.00000 are shown in the table of lower extremes.
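Tukey's hinges are the quartile values conventionally used when screening for outlying observations. As a hedged illustration, the sketch below applies the common 1.5 × spread rule to the hinges and extreme values printed above. The rule, the multiplier and the function name are assumptions introduced here; the source does not state which outlier criterion, if any, was used.

```python
def tukey_fences(lower_hinge, upper_hinge, multiplier=1.5):
    """Return (lower_fence, upper_fence) from the hinges (illustrative)."""
    spread = upper_hinge - lower_hinge
    return lower_hinge - multiplier * spread, upper_hinge + multiplier * spread


# Hinges (25th, 75th) and extremes (min, max) copied from the SPSS output.
hinges = {
    "Bungalow": (170.0, 430.0),
    "High_Rise": (278.0, 368.0),
    "Kampong": (291.0, 324.0),
    "Terrace": (498.0, 552.0),
}
extremes = {
    "Bungalow": (48.0, 895.0),
    "High_Rise": (249.0, 445.0),
    "Kampong": (277.0, 345.0),
    "Terrace": (474.0, 585.1),
}

has_outliers = {}
for name, (q1, q3) in hinges.items():
    low_fence, high_fence = tukey_fences(q1, q3)
    minimum, maximum = extremes[name]
    # A group is flagged when its extremes fall outside the fences.
    has_outliers[name] = minimum < low_fence or maximum > high_fence
    print(name, low_fence, high_fence, has_outliers[name])
```

Under this rule only Bungalow has values beyond the fences (its maximum of 895 exceeds the upper fence of 820), which is in line with its much larger skewness.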

[Figure: plot for Bungalow; vertical axis 0 to 1000]

[Figure: plot for High_Rise; vertical axis 250 to 450]

[Figure: plot for Kampong; vertical axis 280 to 340]

APPENDIX 8 – SPSS ANALYSIS OF COMMERCIAL CONSUMER POPULATION

Descriptive Statistics

                      N        Range        Minimum     Maximum      Sum
Accomodation          4227     5986.00      334.00      6320.00      6580639.93
Agri                  2114     8941.00      497.00      9438.00      4998059.03
Comm                  3197     7962.42      444.00      8406.42      7267130.04
Constr                2530     4056.00      226.00      4282.00      3136966.04
FinInst               1992     10729.00     597.00      11326.00     8188935.37
Insurance             835      3224.00      180.00      3404.00      1018394.57
Others                21975    1682.00      93.00       1775.00      12059468.42
RealEst_BS            16954    2496.00      139.00      2635.00      14953611.83
Recreation            615      60965.00     3436.00     64401.00     8549437.75
SocialServ            14607    4403.00      245.00      4648.00      17860434.25
Transport             2197     6122.00      340.00      6462.00      3585598.70
Wholesale             6668     3858.00      214.00      4072.00      7058635.55
Retail                190643   2096.00000   117.00000   2213.00000   120357090.88000
ResidenceShopHouse    103757   1290.00000   72.00000    1362.00000   44916813.86000

                      Mean          Mean Std. Error   Std. Deviation   Variance
Accomodation          1556.8110     21.26297          1382.41998       1911085.000
Agri                  2364.2663     45.71755          2102.01321       4418459.555
Comm                  2273.1092     28.70555          1623.06973       2634355.334
Constr                1239.9075     19.74230          993.02010        986088.911
FinInst               4110.9113     64.84349          2894.08324       8375717.824
Insurance             1219.6342     28.70930          829.59432        688226.730
Others                548.7813      2.81377           417.11255        173982.879
RealEst_BS            882.0108      4.84175           630.43194        397444.434
Recreation            13901.5248    521.28169         12927.36547      167116778.004
SocialServ            1222.7312     8.44329           1020.45100       1041320.245
Transport             1632.0431     30.52458          1430.75338       2047055.229
Wholesale             1058.5836     10.94583          893.81245        798900.699
Retail                631.3218470   1.12565658        491.49187740     241564.266
ResidenceShopHouse    432.9039377   .98630187         317.70098291     100933.915

                      Skewness   Skewness Std. Error   Kurtosis   Kurtosis Std. Error
Accomodation          1.493      .038                  1.467      .075
Agri                  1.473      .053                  1.332      .106
Comm                  1.403      .043                  1.914      .087
Constr                1.192      .049                  .569       .097
FinInst               .784       .055                  -.488      .110
Insurance             .850       .085                  -.121      .169
Others                1.086      .017                  .304       .033
RealEst_BS            .906       .019                  -.095      .038
Recreation            1.899      .099                  3.231      .197
SocialServ            1.440      .020                  1.370      .041
Transport             1.536      .052                  1.652      .104
Wholesale             1.453      .030                  1.447      .060
Retail                1.233      .006                  .778       .011
ResidenceShopHouse    1.057      .008                  .247       .015

Descriptives

                      Mean           Std. Error   95% CI Lower   95% CI Upper   5% Trimmed Mean   Median
Accomodation          4384.1866      36.27177     4312.9548      4455.4183      4355.4422         4205.0000
Agri                  5144.4787      74.17249     4998.8162      5290.1412      5058.4483         4686.0000
Comm                  4958.1505      56.10256     4847.9743      5068.3267      4874.4358         4506.0000
Constr                2745.1382      28.32058     2689.5212      2800.7551      2718.4182         2588.0000
FinInst               7873.2278      69.29124     7737.1512      8009.3043      7830.5910         7662.0000
Insurance             1528.8676      30.35496     1469.2555      1588.4797      1483.3095         1336.0000
Others                1677.1555      2.28503      1672.6681      1681.6429      1676.8467         1675.0000
RealEst_BS            2461.6759      3.73706      2454.3370      2469.0149      2461.0528         2460.0000
Recreation            13901.5248     521.28169    12877.8135     14925.2361     12227.1128        8780.0400
SocialServ            4168.5767      10.40691     4148.1393      4189.0142      4166.6939         4155.0000
Transport             3561.2750      51.83195     3459.4856      3663.0644      3498.9004         3200.0000
Wholesale             3212.8681      19.05352     3175.4502      3250.2861      3206.0391         3207.0000
Retail                2192.7849593   .47869557    2191.8448802   2193.7250385   2192.7574526      2193.0000000
ResidenceShopHouse    1345.3430894   .40233360    1344.5529726   1346.1332063   1345.3148148      1345.0000000

                      Variance        Std. Deviation   Minimum      Maximum      Range      Interquartile Range   Skewness   Kurtosis
Accomodation          809119.336      899.51061        3073.00      6320.00      3247.00    1475.00               .403       -.975
Agri                  3383458.106     1839.41787       2790.00      9438.00      6648.00    3086.00               .570       -.852
Comm                  1935710.994     1391.29831       3337.00      8406.42      5069.42    2120.00               .787       -.559
Constr                493264.041      702.32759        1797.00      4282.00      2485.00    1194.00               .500       -.963
FinInst               2952784.828     1718.36691       5279.14      11326.00     6046.86    2878.00               .333       -1.066
Insurance             566675.328      752.77841        571.00       3404.00      2833.00    1110.00               .812       -.298
Others                3211.132        56.66685         1582.00      1775.00      193.00     101.00                .107       -1.233
RealEst_BS            8588.836        92.67597         2304.00      2635.00      331.00     159.00                .057       -1.143
Recreation            167116778.004   12927.36547      3436.00      64401.00     60965.00   11776.00              1.899      3.231
SocialServ            66606.821       258.08297        3732.00      4648.00      916.00     416.50                .092       -1.060
Transport             1652228.852     1285.39054       1944.00      6462.00      4518.00    2115.00               .644       -.778
Wholesale             223267.528      472.51193        2508.00      4072.00      1564.00    850.00                .159       -1.249
Retail                140.927         11.87126409      2173.00000   2213.00000   40.00000   20.00000              -.004      -1.192
ResidenceShopHouse    99.551          9.97754875       1329.00000   1362.00000   33.00000   18.00000              .047       -1.224

For every group, the Std. Error of Skewness is .099 and the Std. Error of Kurtosis is .197.
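The standard errors of skewness (.099) and kurtosis (.197) are identical for every commercial group in the Descriptives output, which is what one would expect if each group was analysed with the same number of cases. The sketch below checks this against a case count of 615 (the Recreation N and the largest case number in the Extreme Values listing). It assumes the usual sample-size-only formulas for these standard errors; the function names are hypothetical.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness as a function of n only."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis as a function of n only."""
    return math.sqrt(
        4.0 * (n * n - 1) * se_skewness(n) ** 2 / ((n - 3) * (n + 5))
    )


# 615 cases (commercial groups) and 93182 cases (domestic groups).
for n in (615, 93182):
    print(n, round(se_skewness(n), 3), round(se_kurtosis(n), 3))
```

With 615 cases the formulas reproduce .099 and .197; with 93182 cases they reproduce the .008 and .016 printed in the domestic Descriptives output.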

M-Estimators

                      Huber's M-Estimator(a)   Tukey's Biweight(b)   Hampel's M-Estimator(c)   Andrews' Wave(d)
Accomodation          4295.8960                4308.4887             4330.7159                 4308.9908
Agri                  4878.2555                4900.6929             4996.7142                 4902.1424
Comm                  4662.2349                4590.8779             4750.1928                 4586.8340
Constr                2657.2294                2666.0026             2692.6373                 2666.5333
FinInst               7734.1241                7762.3947             7797.5413                 7762.9264
Insurance             1394.2954                1358.1028             1426.8799                 1356.1655
Others                1675.5774                1675.9439             1676.5875                 1675.9432
RealEst_BS            2460.2526                2460.6420             2460.6574                 2460.6481
Recreation            9717.4700                8388.1238             9359.2572                 8373.0856
SocialServ            4161.0956                4161.5281             4163.2350                 4161.5554
Transport             3347.3957                3341.4444             3427.0310                 3342.7681
Wholesale             3197.2604                3203.2365             3207.4810                 3203.2567
Retail                2192.8732592             2192.8446759          2192.7912882              2192.8448297
ResidenceShopHouse    1345.1976033             1345.2203913          1345.2958213              1345.2201900

a The weighting constant is 1.339.
b The weighting constant is 4.685.
c The weighting constants are 1.700, 3.400, and 8.500.
d The weighting constant is 1.340*pi.

Percentiles

Weighted Average (Definition 1)

                      5              10             25             50             75             90             95
Accomodation          3161.6000      3257.6000      3621.0000      4205.0000      5096.0000      5719.6000      6006.8000
Agri                  2911.8000      3032.0000      3536.0000      4686.0000      6622.0000      7914.8000      8594.4000
Comm                  3439.0000      3496.4000      3781.0000      4506.0000      5901.0000      7218.6000      7656.0000
Constr                1855.8000      1929.0000      2134.0000      2588.0000      3328.0000      3837.8000      4007.0000
FinInst               5518.8000      5718.0000      6414.0000      7662.0000      9292.0000      10513.6000     10902.6000
Insurance             644.6000       700.2000       907.0000       1336.0000      2017.0000      2723.4000      3140.0000
Others                1593.0000      1601.0000      1627.0000      1675.0000      1728.0000      1759.0000      1767.0000
RealEst_BS            2321.0000      2335.0000      2379.0000      2460.0000      2538.0000      2593.0000      2610.4000
Recreation            3685.8000      4036.2000      5209.0000      8780.0400      16985.0000     32710.0000     44083.2000
SocialServ            3762.0000      3806.6000      3964.5000      4155.0000      4381.0000      4539.4000      4598.0000
Transport             2043.4000      2124.0000      2445.0000      3200.0000      4560.0000      5652.8000      6066.9760
Wholesale             2549.6000      2588.6000      2763.0000      3207.0000      3613.0000      3906.8000      3977.6000
Retail                2174.0000000   2176.0000000   2183.0000000   2193.0000000   2203.0000000   2209.0000000   2211.2000000
ResidenceShopHouse    1330.0000000   1332.0000000   1337.0000000   1345.0000000   1355.0000000   1359.0000000   1361.0000000

Tukey's Hinges

                      25             50             75
Accomodation          3622.0000      4205.0000      5094.5000
Agri                  3539.5000      4686.0000      6621.5000
Comm                  3781.0000      4506.0000      5892.0000
Constr                2136.0000      2588.0000      3326.0000
FinInst               6415.0000      7662.0000      9288.5650
Insurance             908.5000       1336.0000      2009.0000
Others                1627.0000      1675.0000      1728.0000
RealEst_BS            2380.0000      2460.0000      2537.5000
Recreation            5216.5000      8780.0400      16841.5000
SocialServ            3965.5000      4155.0000      4380.5000
Transport             2445.0000      3200.0000      4555.0000
Wholesale             2764.0000      3207.0000      3612.0000
Retail                2183.0000000   2193.0000000   2203.0000000
ResidenceShopHouse    1337.0000000   1345.0000000   1355.0000000

Extreme Values

For every group, the Highest values (rank 1 to 5) are case numbers 1, 2, 3, 4, 5 and the Lowest values (rank 1 to 5) are case numbers 615, 614, 613, 612, 611.

                                 Rank 1       Rank 2       Rank 3       Rank 4       Rank 5
Accomodation          Highest    6320.00      6296.00      6282.00      6279.00      6275.00
                      Lowest     3073.00      3077.00      3082.00      3084.00      3086.00
Agri                  Highest    9438.00      9430.00      9388.00      9316.00      9306.00
                      Lowest     2790.00      2793.00      2799.00      2806.00      2810.00
Comm                  Highest    8406.42      8374.42      8339.00      8332.00      8329.00(a)
                      Lowest     3337.00      3346.00      3349.00      3352.00      3356.00
Constr                Highest    4282.00      4280.00      4271.00      4264.00      4264.00
                      Lowest     1797.00      1798.00      1799.00      1800.00      1801.00
FinInst               Highest    11326.00     11306.00     11290.00     11281.00     11269.00
                      Lowest     5279.14      5282.00      5291.00      5322.27      5328.00
Insurance             Highest    3404.00      3386.00      3384.00      3380.00      3365.00
                      Lowest     571.00       572.00       573.00       582.00       584.00
Others                Highest    1775.00      1775.00      1775.00      1775.00      1775.00(b)
                      Lowest     1582.00      1583.00      1584.00      1584.00      1584.00
RealEst_BS            Highest    2635.00      2634.00      2634.00      2633.00      2632.00(c)
                      Lowest     2304.00      2304.00      2304.00      2304.00      2305.00
Recreation            Highest    64401.00     64066.00     62920.00     61724.00     61286.00
                      Lowest     3436.00      3441.00      3451.00      3461.00      3474.00
SocialServ            Highest    4648.00      4646.00      4646.00      4645.00      4644.00(d)
                      Lowest     3732.00      3732.00      3733.00      3737.00      3737.00(e)
Transport             Highest    6462.00      6430.00      6428.00      6365.00      6329.00
                      Lowest     1944.00      1949.00      1950.00      1960.00      1962.00
Wholesale             Highest    4072.00      4070.00      4067.00      4063.00      4052.00
                      Lowest     2508.00      2511.00      2513.00      2514.00      2515.00
Retail                Highest    2213.00000   2213.00000   2213.00000   2213.00000   2213.00000(f)
                      Lowest     2173.00000   2173.00000   2173.00000   2173.00000   2173.00000(g)
ResidenceShopHouse    Highest    1362.00000   1362.00000   1362.00000   1362.00000   1362.00000(h)
                      Lowest     1329.00000   1329.00000   1329.00000   1329.00000   1329.00000(i)

a Only a partial list of cases with the value 8329.00 are shown in the table of upper extremes.
b Only a partial list of cases with the value 1775.00 are shown in the table of upper extremes.
c Only a partial list of cases with the value 2632.00 are shown in the table of upper extremes.
d Only a partial list of cases with the value 4644.00 are shown in the table of upper extremes.
e Only a partial list of cases with the value 3737.00 are shown in the table of lower extremes.
f Only a partial list of cases with the value 2213.00000 are shown in the table of upper extremes.
g Only a partial list of cases with the value 2173.00000 are shown in the table of lower extremes.
h Only a partial list of cases with the value 1362.00000 are shown in the table of upper extremes.
i Only a partial list of cases with the value 1329.00000 are shown in the table of lower extremes.


77599.00000

29444.00000

58110.00000

499

1122

465

297

760

1135

709

377

802

283

374

63

55

150

Iron

N.F.Metal

Transport

Food

Wood

Textile

Clay

Plastics

Machinery 265

288

Chemical

Rubber

Beverage

Furniture

Palm

Pharma

Miscel

Statistic

Maximum

12788.00000

8346.00000

12885.00000

1804.00000 31755.00000

1858.00000 33614.00000

3339.00000 32728.00000

1554.00000 29150.50000

3279.00000 61389.00000

1660.00000 31104.00000

4389.00000 81988.00000

1836.00000 34656.00000

682.00000

1976.00000 36840.00000

2148.00000 40506.00000

689.00000

447.00000

1286.00000 12774.00000

1597.00000 17923.00000

1604.00000 29736.00000

3675.00000 68926.00000

Statistic

Minimum Std. Error

157.58567909

81.58325710

93.39788511

196.02012166

297.93348705

10998.5110000 345.06204619

2939.6326936

2021.1313763

4891.1591889

6667.8895190

7286.0350101

21915.2558391 845.02917418

Statistic

Mean

Descriptive Statistics

103.59809326

10412.9127851 449.59583749

3045.6141608

407322.31000

669347.77000

4666848.85000

2236223.63000

5905107.27000

2303591.64000

458.21320267 418.91777422

Statistic

Variance

76205429.242

7609388.534

85067013.680

90491539.950

7375474.138

3094959.945

9787391.066

19173520.159

43938359.539

55639224.861

1024.86684597 7600.61595253

7847.15609725

8279.63977316

7047.28775771

57769362.858

61577858.815

68552434.773

49664264.740

16081.47580117 258613863.944

7459.17052096

19625.24014797 385150050.866

8729.57211104

2758.51201441

9223.17806833

9512.70413445

2715.78241727

1759.24982463

3128.48063216

4378.75783288

6628.60162773

17624.48078018 310622322.771

Statistic

Std. Deviation

Kurtosis

1.827

1.092

.903

1.338

.878

1.350

1.106

1.230

1.709

.992

1.294

1.522

1.556

.798

.853

1.531

1.057

.198

.322

.302

.126

.145

.144

.150

.086

.126

.092

.073

.089

.141

.113

.073

.109

.110

.117

9.673

2.532

.477

-.351

.933

-.349

1.006

.314

.591

2.363

-.051

.719

1.514

1.773

-.471

-.300

1.467

.012

.394

.634

.595

.252

.289

.286

.298

.172

.251

.183

.145

.177

.282

.226

.146

.218

.219

.234

Statistic Std. Error Statistic Std. Error

Skewness

19053.4095333 2910.93538009 35651.53177718 1271031718.059 3.048

7405.8601818

10624.5677778 988.64873964

12478.2054813 428.12998043

7901.8502827

20503.8446875 947.61004921

8692.7986415

19836009.06000 24733.1783791 692.99132112

3925668.12000

2159340.44000

13282012.93000 11702.2140352 273.76798705

8358868.36000

873070.91000

939826.09000

5487880.61000

3327276.87000

3606587.33000

9533136.29000

Statistic

Sum

197697.00000 1144.00000 198841.00000 2858011.43000

29951.00000

31756.00000

29389.00000

27596.50000

32820.00000

12203.00000

34864.00000

38358.00000

12099.00000

7899.00000

11488.00000

16326.00000

28132.00000

495

65251.00000

435

Printing

Statistic

Statistic

Electrical

Range

N

APPENDIX 9 – SPSS ANALYSIS OF INDUSTRIAL CONSUMER POPULATION

Appendix

140

Appendix

Descriptives

Location statistics (the lower and upper bounds are the 95% Confidence Interval for Mean)

Industry    Mean          Std. Error     Lower Bound   Upper Bound   5% Trimmed Mean  Median
Electrical  4451.8107273  67.97878881    4315.5214615  4588.0999930  4448.5725253     4428.0000000
Printing    1756.5818182  13.36542705    1729.7857503  1783.3778861  1755.0000000     1758.0000000
Chemical    1829.7341818  19.49354843    1790.6519703  1868.8163933  1827.7847475     1841.0000000
Iron        1355.9272727  6.44985081     1342.9961004  1368.8584450  1354.7121212     1352.0000000
N.F.Metal   509.2545455   4.82496441     499.5810742   518.9280167   508.8888889      511.0000000
Transport   761.9818182   6.68434479     748.5805138   775.3831226   760.6313131      762.0000000
Food        2342.1810909  15.09042663    2311.9266071  2372.4355747  2341.8577778     2342.0000000
Wood        2131.0534545  11.14788842    2108.7032839  2153.4036252  2132.4684848     2143.0000000
Textile     745.8541818   5.29111927     735.2461264   756.4622372   745.6662626      748.0000000
Clay        2253.8727273  39.83511305    2174.0081342  2333.7373204  2247.7020202     2226.0000000
Plastics    4940.9527273  46.34177303    4848.0430664  5033.8623882  4941.2656566     4888.2000000
Machinery   2141.3809091  38.71178173    2063.7684597  2218.9933585  2140.2969697     2093.0000000
Rubber      4355.2545455  106.43387760   4141.8674687  4568.6416222  4325.0808081     4278.0000000
Beverage    1872.3030909  27.26151011    1817.6470539  1926.9591279  1871.3822222     1838.0000000
Furniture   3964.2821818  44.55798626    3874.9487980  4053.6155656  3969.9970707     3904.0000000
Palm        8335.4503636  692.76339913   6946.5433731  9724.3573542  8087.4195960     7110.0000000
Pharma      7405.8601818  1024.86684597  5351.1258692  9460.5944944  6516.2941414     4247.0000000
Miscel      1828.6145455  72.15800253    1683.9464607  1973.2826302  1808.0262626     1687.5000000

Spread and shape statistics (Std. Error of Skewness = .322 and Std. Error of Kurtosis = .634 for every industry)

Industry    Variance      Std. Deviation  Minimum     Maximum      Range        Interquartile Range  Skewness  Kurtosis
Electrical  254161.365    504.14419075    3675.00000  5280.00000   1605.00000   885.25000            .144      -1.324
Printing    9824.905      99.12065990     1604.00000  1933.00000   329.00000    184.00000            .104      -1.094
Chemical    20899.914     144.56802437    1597.00000  2091.00000   494.00000    253.00000            .070      -1.206
Iron        2288.032      47.83337381     1286.00000  1454.00000   168.00000    81.00000             .292      -1.198
N.F.Metal   1280.415      35.78289379     447.00000   581.00000    134.00000    61.00000             .004      -.829
Transport   2457.426      49.57242771     689.00000   860.00000    171.00000    77.00000             .343      -.940
Food        12524.654     111.91359911    2148.00000  2529.00000   381.00000    193.00000            .120      -1.103
Wood        6835.148      82.67495326     1976.00000  2248.00000   272.00000    151.00000            -.208     -1.239
Textile     1539.777      39.23999069     682.00000   811.00000    129.00000    69.00000             .055      -1.338
Clay        87275.993     295.42510516    1836.00000  2806.00000   970.00000    511.00000            .187      -1.231
Plastics    118115.796    343.67978700    4389.00000  5485.00000   1096.00000   610.00000            .082      -1.264
Machinery   82423.112     287.09425709    1660.00000  2676.00000   1016.00000   463.00000            .078      -1.026
Rubber      623049.367    789.33476206    3279.00000  6034.00000   2755.00000   1149.51000           .491      -.818
Beverage    40875.446     202.17677007    1554.00000  2209.00000   655.00000    360.00000            .192      -1.331
Furniture   109197.778    330.45087028    3339.00000  4487.00000   1148.00000   511.50000            -.118     -1.001
Palm        26395661.995  5137.67087257   1858.00000  20110.50000  18252.50000  8366.68000           .708      -.595
Pharma      57769362.858  7600.61595253   1804.00000  31755.00000  29951.00000  6007.00000           1.827     2.532
Miscel      286372.753    535.13806921    1144.00000  2931.00000   1787.00000   1030.00000           .519      -1.130
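The standard errors in the Descriptives output can be cross-checked from the sample size alone. The sketch below is illustrative rather than a reproduction of the SPSS internals: it uses the usual textbook formulas for the standard error of the mean, of skewness, and of kurtosis, and it assumes n = 55 cases per industry, a value inferred from the case numbers (1 to 55) in the Extreme Values table rather than stated in the output. The function names are hypothetical.

```python
import math


def se_mean(std_dev, n):
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))


# Assumption: 55 cases per industry, inferred from the Extreme Values case numbers.
N_CASES = 55

# Electrical industry: Std. Deviation taken from the Descriptives output.
electrical_se_mean = se_mean(504.14419075, N_CASES)
skew_se = se_skewness(N_CASES)
kurt_se = se_kurtosis(N_CASES)

print(f"SE of mean (Electrical): {electrical_se_mean:.8f}")
print(f"SE of skewness: {skew_se:.3f}")
print(f"SE of kurtosis: {kurt_se:.3f}")
```

Under that assumption the three results agree with the printed values (67.97878881, .322 and .634) to the precision shown, which supports reading every industry block as a 55-case group.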

M-Estimators

Industry    Huber’s M-Estimator(a)  Tukey’s Biweight(b)  Hampel’s M-Estimator(c)  Andrews’ Wave(d)
Electrical  4428.2964566            4436.1070569         4443.2743218             4436.1369112
Printing    1754.2797199            1755.8006041         1754.7529632             1755.8159596
Chemical    1828.2020183            1829.0494460         1827.8467075             1829.0767207
Iron        1352.5585991            1353.2754826         1354.3229822             1353.2826850
N.F.Metal   510.2069135             509.8548444          509.3234726              509.8576015
Transport   759.3717933             759.9186433          760.4256415              759.9268954
Food        2340.1705927            2340.6625832         2341.9161810             2340.6590213
Wood        2135.9657888            2135.1716904         2132.8591910             2135.1763935
Textile     745.8058557             746.0489920          745.8698738              746.0514925
Clay        2236.4578584            2244.4400821         2247.4641730             2244.4773229
Plastics    4928.9916702            4934.3530854         4940.2790892             4934.3420474
Machinery   2135.7940633            2135.1375921         2138.3129542             2135.1399637
Rubber      4266.4385281            4270.7740465         4289.1588276             4271.3239424
Beverage    1858.8278434            1862.5326016         1867.6401563             1862.5528781
Furniture   3968.4588358            3966.0105344         3972.1060867             3965.9296687
Palm        7411.6162055            7327.4042668         7717.0144006             7329.5405159
Pharma      4580.9142866            3726.3139591         4226.3668047             3707.9715678
Miscel      1750.6516979            1764.2288663         1796.7642084             1764.5262269

a The weighting constant is 1.339.
b The weighting constant is 4.685.
c The weighting constants are 1.700, 3.400, and 8.500.
d The weighting constant is 1.340*pi.
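To illustrate how a weighting constant such as 1.339 enters a Huber-type M-estimator of location, the following minimal sketch uses iteratively reweighted averaging. It is not the SPSS procedure: the scale estimate (median absolute deviation divided by 0.6745), the starting value, the stopping rule, the function name and the toy data are all assumptions made for illustration, and the industrial consumer data are not reproduced here.

```python
from statistics import mean, median


def huber_location(values, k=1.339, tol=1e-9, max_iter=200):
    """Huber-type M-estimate of location by iteratively reweighted averaging.

    Assumption: the scale is fixed at MAD / 0.6745; SPSS may scale differently.
    """
    values = list(values)
    estimate = median(values)
    mad = median([abs(v - estimate) for v in values])
    scale = mad / 0.6745
    if scale == 0:
        return estimate  # no spread around the median, nothing to reweight
    cutoff = k * scale
    for _ in range(max_iter):
        # Full weight inside the cutoff, down-weighted in proportion outside it.
        weights = [1.0 if abs(v - estimate) <= cutoff else cutoff / abs(v - estimate)
                   for v in values]
        updated = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate


# Hypothetical toy data with one large outlier (not taken from the dissertation).
toy_data = [1.0, 2.0, 3.0, 4.0, 100.0]
toy_estimate = huber_location(toy_data)
toy_mean = mean(toy_data)
print(toy_estimate, toy_mean)
```

On the toy data the estimate stays close to the bulk of the observations while the arithmetic mean is pulled towards the outlier, which is the same pattern seen in the Pharma row, where all four M-estimators lie well below the mean of 7405.86.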

Percentiles

Weighted Average (Definition 1)

Industry    5             10            25            50            75             90             95
Electrical  3729.6000000  3813.9000000  4031.0000000  4428.0000000  4916.2500000   5153.9520000   5250.0000000
Printing    1615.6000000  1623.6000000  1650.0000000  1758.0000000  1834.0000000   1901.6000000   1927.2000000
Chemical    1611.4000000  1637.8000000  1695.0000000  1841.0000000  1948.0000000   2024.8000000   2088.9040000
Iron        1293.0000000  1294.8000000  1312.0000000  1352.0000000  1393.0000000   1425.0000000   1436.4000000
N.F.Metal   449.6000000   457.8000000   476.0000000   511.0000000   537.0000000    556.8000000    574.2000000
Transport   691.8000000   698.8000000   719.0000000   762.0000000   796.0000000    835.8000000    854.2000000
Food        2177.0000000  2193.6000000  2260.0000000  2342.0000000  2453.0000000   2510.8760000   2527.4000000
Wood        1996.6000000  2000.0000000  2057.0000000  2143.0000000  2208.0000000   2243.1760000   2248.0000000
Textile     690.6000000   694.2000000   710.0000000   748.0000000   779.0000000    801.4000000    806.8000000
Clay        1849.2000000  1864.2000000  1968.0000000  2226.0000000  2479.0000000   2676.2000000   2745.6000000
Plastics    4417.0000000  4490.2000000  4623.0000000  4888.2000000  5233.0000000   5444.8700000   5468.0000000
Machinery   1675.8000000  1741.1000000  1932.0000000  2093.0000000  2395.0000000   2560.6000000   2604.6000000
Rubber      3312.4000000  3432.2000000  3705.0000000  4278.0000000  4854.5100000   5636.8000000   5838.6000000
Beverage    1570.4000000  1637.6000000  1683.0000000  1838.0000000  2043.0000000   2170.0000000   2186.4000000
Furniture   3362.8000000  3454.0000000  3721.0000000  3904.0000000  4232.5000000   4422.8200000   4460.5760000
Palm        2126.8000000  2565.2000000  4133.3200000  7110.0000000  12500.0000000  16711.6000000  18618.8000000
Pharma      1817.8000000  1981.4000000  2448.0000000  4247.0000000  8455.0000000   23159.0000000  25896.2000000
Miscel      1193.2000000  1219.6000000  1358.0000000  1687.5000000  2388.0000000   2587.8000000   2792.8000000

Tukey’s Hinges

Industry    25            50            75
Electrical  4031.5000000  4428.0000000  4878.1250000
Printing    1658.0000000  1758.0000000  1831.5000000
Chemical    1697.5000000  1841.0000000  1945.0000000
Iron        1313.5000000  1352.0000000  1391.0000000
N.F.Metal   481.5000000   511.0000000   536.5000000
Transport   719.5000000   762.0000000   794.0000000
Food        2260.0000000  2342.0000000  2447.7500000
Wood        2058.0000000  2143.0000000  2207.0000000
Textile     710.5000000   748.0000000   777.0000000
Clay        1982.0000000  2226.0000000  2471.7500000
Plastics    4626.5000000  4888.2000000  5218.0000000
Machinery   1942.0000000  2093.0000000  2375.5000000
Rubber      3709.5000000  4278.0000000  4842.3750000
Beverage    1690.0000000  1838.0000000  2035.0000000
Furniture   3732.5000000  3904.0000000  4223.2500000
Palm        4151.6600000  7110.0000000  11751.0000000
Pharma      2491.5000000  4247.0000000  8370.0000000
Miscel      1371.5000000  1687.5000000  2371.0000000

Extreme Values

Each entry is given as case number: value, listed from rank 1 to rank 5.

Electrical  Highest  55: 5280.00000   54: 5270.00000   53: 5245.00000   52: 5235.00000   51: 5167.50000
            Lowest   1: 3675.00000    2: 3680.00000    3: 3742.00000    4: 3793.00000    5: 3804.00000
Printing    Highest  55: 1933.00000   54: 1932.00000   53: 1926.00000   52: 1920.00000   51: 1913.00000
            Lowest   1: 1604.00000    2: 1614.00000    3: 1616.00000    4: 1617.00000    5: 1620.00000
Chemical    Highest  54: 2091.00000   55: 2091.00000   53: 2088.38000   52: 2036.00000   51: 2026.00000
            Lowest   1: 1597.00000    2: 1605.00000    3: 1613.00000    4: 1617.00000    5: 1630.00000
Iron        Highest  55: 1454.00000   54: 1438.00000   53: 1436.00000   52: 1435.00000   50: 1425.00000(a)
            Lowest   1: 1286.00000    5: 1293.00000    4: 1293.00000    3: 1293.00000    2: 1293.00000
N.F.Metal   Highest  55: 581.00000    54: 575.00000    53: 574.00000    52: 566.00000    51: 558.00000
            Lowest   1: 447.00000     2: 448.00000     3: 450.00000     4: 453.00000     5: 456.00000
Transport   Highest  55: 860.00000    54: 859.00000    53: 853.00000    52: 839.00000    51: 837.00000
            Lowest   1: 689.00000     2: 691.00000     3: 692.00000     4: 693.00000     5: 694.00000
Food        Highest  54: 2529.00000   55: 2529.00000   53: 2527.00000   52: 2514.00000   51: 2513.00000
            Lowest   1: 2148.00000    2: 2161.00000    3: 2181.00000    4: 2182.00000    5: 2187.00000
Wood        Highest  53: 2248.00000   54: 2248.00000   55: 2248.00000   52: 2247.00000   51: 2246.44000
            Lowest   1: 1976.00000    2: 1995.00000    3: 1997.00000    4: 1997.50000    6: 2000.00000(b)
Textile     Highest  55: 811.00000    54: 810.00000    53: 806.00000    52: 803.00000    51: 802.00000
            Lowest   1: 682.00000     2: 685.00000     4: 692.00000     3: 692.00000     5: 693.00000
Clay        Highest  55: 2806.00000   54: 2772.00000   53: 2739.00000   52: 2724.00000   51: 2693.00000
            Lowest   1: 1836.00000    2: 1846.00000    3: 1850.00000    4: 1856.00000    5: 1860.00000
Plastics    Highest  55: 5485.00000   53: 5468.00000   54: 5468.00000   52: 5467.00000   51: 5450.00000
            Lowest   1: 4389.00000    2: 4401.00000    3: 4421.00000    4: 4461.00000    5: 4483.00000
Machinery   Highest  55: 2676.00000   54: 2619.00000   52: 2601.00000   53: 2601.00000   51: 2590.00000
            Lowest   1: 1660.00000    2: 1667.00000    3: 1678.00000    4: 1688.00000    5: 1724.00000
Rubber      Highest  55: 6034.00000   54: 5989.00000   53: 5801.00000   52: 5706.00000   51: 5692.00000
            Lowest   1: 3279.00000    2: 3310.00000    3: 3313.00000    4: 3328.00000    5: 3374.00000
Beverage    Highest  55: 2209.00000   54: 2204.00000   52: 2182.00000   53: 2182.00000   51: 2176.00000
            Lowest   1: 1554.00000    2: 1560.00000    3: 1573.00000    4: 1593.00000    5: 1622.00000
Furniture   Highest  55: 4487.00000   54: 4470.00000   53: 4458.22000   52: 4433.00000   51: 4425.55000
            Lowest   1: 3339.00000    2: 3358.00000    3: 3364.00000    4: 3432.00000    5: 3439.00000
Palm        Highest  55: 20110.50000  54: 18634.00000  53: 18615.00000  52: 18047.00000  51: 16714.00000
            Lowest   1: 1858.00000    2: 1926.00000    3: 2177.00000    4: 2369.00000    5: 2480.00000
Pharma      Highest  55: 31755.00000  54: 29237.00000  53: 25061.00000  52: 24816.51000  51: 23579.00000
            Lowest   1: 1804.00000    2: 1809.00000    3: 1820.00000    4: 1828.00000    5: 1845.50000
Miscel      Highest  55: 2931.00000   54: 2828.00000   53: 2784.00000   52: 2721.50000   51: 2592.00000
            Lowest   1: 1144.00000    2: 1190.00000    3: 1194.00000    4: 1198.00000    5: 1208.50000

a Only a partial list of cases with the value 1425.00000 is shown in the table of upper extremes.
b Only a partial list of cases with the value 2000.00000 is shown in the table of lower extremes.
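The Weighted Average (Definition 1) percentiles reported earlier in this appendix can be cross-checked against the extreme values. The sketch below assumes the common definition in which the p-th percentile sits at position (n + 1)p/100 in the sorted data, with linear interpolation between neighbouring cases. Only the five lowest and five highest Electrical values are known from the Extreme Values table, so the 45 middle cases are filled with a placeholder; this is harmless here because the 5th and 95th percentiles depend only on the known tail values. The function name and the filler are illustrative assumptions.

```python
def weighted_average_percentile(sorted_values, p):
    """Percentile at position (n + 1) * p / 100 with linear interpolation."""
    n = len(sorted_values)
    position = (n + 1) * p / 100.0
    k = int(position)          # 1-based rank of the lower neighbour
    fraction = position - k
    if k < 1:
        return sorted_values[0]
    if k >= n:
        return sorted_values[-1]
    lower = sorted_values[k - 1]
    upper = sorted_values[k]
    return lower + fraction * (upper - lower)


# Electrical industry: five lowest and five highest cases from Extreme Values.
lowest = [3675.0, 3680.0, 3742.0, 3793.0, 3804.0]
highest = [5167.5, 5235.0, 5245.0, 5270.0, 5280.0]

# Placeholder for the 45 unknown middle cases (assumption: any values between the tails).
filler = [4428.0] * 45

electrical = lowest + filler + highest  # 55 cases, already sorted

p5 = weighted_average_percentile(electrical, 5)
p95 = weighted_average_percentile(electrical, 95)
print(p5, p95)
```

Under these assumptions the 5th and 95th percentiles come out at 3729.6 and 5250.0, matching the Electrical row of the percentile table.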


APPENDIX 10 – SPSS ANALYSIS OF DOMESTIC CONSUMER SAMPLES

Descriptives

Statistic                      Q6        Q8        Q12       Q14
Mean                           16.0623   7.1957    15.6387   7.2745
Std. Error of Mean             .48646    .35700    .47320    .26100
95% CI for Mean, Lower Bound   15.1083   6.4956    14.7107   6.7626
95% CI for Mean, Upper Bound   17.0163   7.8959    16.5667   7.7863
5% Trimmed Mean                14.2467   5.1044    13.8662   5.4525
Median                         10.0000   3.0000    10.0000   3.0000
Variance                       481.098   259.106   455.219   138.494
Std. Deviation                 21.93395  16.09676  21.33586  11.76837
Minimum                        .00       .00       .00       .00
Maximum                        500.00    520.00    500.00    100.00
Range                          500.00    520.00    500.00    100.00
Interquartile Range            18.00     9.90      18.00     9.90
Skewness                       6.973     16.983    7.194     3.145
Std. Error of Skewness         .054      .054      .054      .054
Kurtosis                       125.069   509.068   135.886   13.668
Std. Error of Kurtosis         .109      .109      .109      .109
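The confidence bounds in these Descriptives can be approximated from the mean and its standard error. The sketch below uses a normal (z) approximation, which is an assumption: the printed bounds are slightly wider than the z-based ones, which is what a t-based interval would give. With a sample this large the two are very close. The function name is hypothetical, and the inputs are the first variable block of the output (mean 16.0623, standard error .48646).

```python
from statistics import NormalDist


def normal_confidence_interval(mean, std_error, level=0.95):
    """Two-sided confidence interval for a mean using the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    half_width = z * std_error
    return mean - half_width, mean + half_width


# First variable block of the domestic consumer Descriptives output.
ci_lower, ci_upper = normal_confidence_interval(16.0623, 0.48646)
print(f"{ci_lower:.4f} to {ci_upper:.4f}")
```

The approximate interval agrees with the printed bounds (15.1083 and 17.0163) to about three significant figures. The small remaining gap is consistent with SPSS having used a t critical value.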


APPENDIX 11 – SPSS ANALYSIS OF COMMERCIAL CONSUMER SAMPLES

Descriptives

Statistic                      Avg_Case_1    Avg_Case_2     Avg_Case_3
Mean                           2083.4080     3753.1095      6018.6070
Std. Error of Mean             645.89899     1255.13498     1674.42030
95% CI for Mean, Lower Bound   793.8290      1247.1516      2675.5190
95% CI for Mean, Upper Bound   3372.9869     6259.0673      9361.6950
5% Trimmed Mean                1261.8297     2184.6324      3752.0177
Median                         450.0000      900.0000       1860.0000
Variance                       27951428.824  105549376.591  187846783.215
Std. Deviation                 5286.91109    10273.72263    13705.72082
Minimum                        .00           .00            .00
Maximum                        40000.00      80000.00       100000.0
Range                          40000.00      80000.00       100000.00
Interquartile Range            2450.00       4400.00        6033.33
Skewness                       5.907         6.469          5.394
Std. Error of Skewness         .293          .293           .293
Kurtosis                       41.035        47.359         34.561
Std. Error of Kurtosis         .578          .578           .578

Statistic                      Avg_Case_4     Avg_Case_5  Avg_Case_6
Mean                           10816.8905     74.6518     1228.8060
Std. Error of Mean             2510.23574     41.99967    258.94551
95% CI for Mean, Lower Bound   5805.0432      -9.2033     711.8046
95% CI for Mean, Upper Bound   15828.7379     158.5068    1745.8074
5% Trimmed Mean                7248.1758      10.4755     918.9608
Median                         3500.0000      .0000       250.0000
Variance                       422185991.570  118186.156  4492536.011
Std. Deviation                 20547.16505    343.78213   2119.56033
Minimum                        .00            .00         .00
Maximum                        110000.0       2000.00     10000.00
Range                          110000.00      2000.00     10000.00
Interquartile Range            11433.33       .00         1220.00
Skewness                       3.544          5.483       2.615
Std. Error of Skewness         .293           .293        .293
Kurtosis                       13.934         29.514      7.513
Std. Error of Kurtosis         .578           .578        .578


APPENDIX 12 – SPSS ANALYSIS OF INDUSTRIAL CONSUMER SAMPLES

Descriptives

Statistic                      Avg_Case_1        Avg_Case_2         Avg_Case_3
Mean                           184607.6821       342365.1156        613499.5593
Std. Error of Mean             68928.92954       136260.81998       271212.17388
95% CI for Mean, Lower Bound   46820.7153        69983.6176         71354.1486
95% CI for Mean, Upper Bound   322394.6490       614746.6136        1155644.9699
5% Trimmed Mean                102809.2412       180535.1667        281923.0259
Median                         20000.0000        40000.0000         74175.5000
Variance                       299325431599.365  1169721696956.344  4634030725581.560
Std. Deviation                 547106.41707      1081536.72936      2152679.89390
Minimum                        .00               .00                .00
Maximum                        4200000           8400000            16800000
Range                          4200000.00        8400000.00         16800000.00
Interquartile Range            170000.00         323333.33          484897.00
Skewness                       6.623             6.913              7.111
Std. Error of Skewness         .302              .302               .302
Kurtosis                       48.468            51.637             53.698
Std. Error of Kurtosis         .595              .595               .595

Statistic                      Avg_Case_4          Avg_Case_5       Avg_Case_6
Mean                           1128597.1185        37370.4769       137347.5631
Std. Error of Mean             541754.77599        12800.17200      68600.85324
95% CI for Mean, Lower Bound   45644.7521          11783.2976       216.4114
95% CI for Mean, Upper Bound   2211549.4849        62957.6561       274478.7148
5% Trimmed Mean                468619.1676         21199.4129       48851.7897
Median                         144000.0000         66.6667          15000.0000
Variance                       18490388950329.600  10322197407.015  296482855145.357
Std. Deviation                 4300045.22654       101598.21557     544502.39223
Minimum                        .00                 .00              .00
Maximum                        33600000            720000.0         4200000
Range                          33600000.00         720000.00        4200000.00
Interquartile Range            780000.00           25000.00         94200.00
Skewness                       7.203               5.280            6.983
Std. Error of Skewness         .302                .302             .302
Kurtosis                       54.675              33.494           52.008
Std. Error of Kurtosis         .595                .595             .595


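APPENDIX 13 – STANDARD NORMAL (Z) TABLE

Area between 0 and z. The columns give z to one decimal place; the first entry of each row is the second decimal place of z.

z 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0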

0.00 0.0000 0.0398 0.0793 0.1179 0.1554 0.1915 0.2257 0.2580 0.2881 0.3159 0.3413 0.3643 0.3849 0.4032 0.4192 0.4332 0.4452 0.4554 0.4641 0.4713 0.4772 0.4821 0.4861 0.4893 0.4918 0.4938 0.4953 0.4965 0.4974 0.4981 0.4987

0.01 0.0040 0.0438 0.0832 0.1217 0.1591 0.1950 0.2291 0.2611 0.2910 0.3186 0.3438 0.3665 0.3869 0.4049 0.4207 0.4345 0.4463 0.4564 0.4649 0.4719 0.4778 0.4826 0.4864 0.4896 0.4920 0.4940 0.4955 0.4966 0.4975 0.4982 0.4987

0.02 0.0080 0.0478 0.0871 0.1255 0.1628 0.1985 0.2324 0.2642 0.2939 0.3212 0.3461 0.3686 0.3888 0.4066 0.4222 0.4357 0.4474 0.4573 0.4656 0.4726 0.4783 0.4830 0.4868 0.4898 0.4922 0.4941 0.4956 0.4967 0.4976 0.4982 0.4987

0.03 0.0120 0.0517 0.0910 0.1293 0.1664 0.2019 0.2357 0.2673 0.2967 0.3238 0.3485 0.3708 0.3907 0.4082 0.4236 0.4370 0.4484 0.4582 0.4664 0.4732 0.4788 0.4834 0.4871 0.4901 0.4925 0.4943 0.4957 0.4968 0.4977 0.4983 0.4988

0.04 0.0160 0.0557 0.0948 0.1331 0.1700 0.2054 0.2389 0.2704 0.2995 0.3264 0.3508 0.3729 0.3925 0.4099 0.4251 0.4382 0.4495 0.4591 0.4671 0.4738 0.4793 0.4838 0.4875 0.4904 0.4927 0.4945 0.4959 0.4969 0.4977 0.4984 0.4988

0.05 0.0199 0.0596 0.0987 0.1368 0.1736 0.2088 0.2422 0.2734 0.3023 0.3289 0.3531 0.3749 0.3944 0.4115 0.4265 0.4394 0.4505 0.4599 0.4678 0.4744 0.4798 0.4842 0.4878 0.4906 0.4929 0.4946 0.4960 0.4970 0.4978 0.4984 0.4989

0.06 0.0239 0.0636 0.1026 0.1406 0.1772 0.2123 0.2454 0.2764 0.3051 0.3315 0.3554 0.3770 0.3962 0.4131 0.4279 0.4406 0.4515 0.4608 0.4686 0.4750 0.4803 0.4846 0.4881 0.4909 0.4931 0.4948 0.4961 0.4971 0.4979 0.4985 0.4989

0.07 0.0279 0.0675 0.1064 0.1443 0.1808 0.2157 0.2486 0.2794 0.3078 0.3340 0.3577 0.3790 0.3980 0.4147 0.4292 0.4418 0.4525 0.4616 0.4693 0.4756 0.4808 0.4850 0.4884 0.4911 0.4932 0.4949 0.4962 0.4972 0.4979 0.4985 0.4989

0.08 0.0319 0.0714 0.1103 0.1480 0.1844 0.2190 0.2517 0.2823 0.3106 0.3365 0.3599 0.3810 0.3997 0.4162 0.4306 0.4429 0.4535 0.4625 0.4699 0.4761 0.4812 0.4854 0.4887 0.4913 0.4934 0.4951 0.4963 0.4973 0.4980 0.4986 0.4990

0.09 0.0359 0.0753 0.1141 0.1517 0.1879 0.2224 0.2549 0.2852 0.3133 0.3389 0.3621 0.3830 0.4015 0.4177 0.4319 0.4441 0.4545 0.4633 0.4706 0.4767 0.4817 0.4857 0.4890 0.4916 0.4936 0.4952 0.4964 0.4974 0.4981 0.4986 0.4990

Source: Frank, H. & Althoen, S.C., “Statistics Concepts and Applications”, Cambridge University Press, 1994, pp. 724-725.
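The tabulated areas can be reproduced numerically. As a minimal sketch, the area under the standard normal curve between 0 and z equals 0.5·erf(z/√2); the function name below is illustrative and not part of the cited source.

```python
import math


def area_between_0_and_z(z):
    """Area under the standard normal density between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


# Spot checks against the table: z = 1.00, 1.96 and 2.58.
area_100 = area_between_0_and_z(1.00)
area_196 = area_between_0_and_z(1.96)
area_258 = area_between_0_and_z(2.58)
print(round(area_100, 4), round(area_196, 4), round(area_258, 4))
```

To read the same values from the table, take the column for the first decimal of z (for example 1.9) and the row for the second decimal (0.06), which gives 0.4750 for z = 1.96, the critical value behind a two-sided 95% interval.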


APPENDIX 14 – LIST OF PUBLICATIONS

[1] A.H. Hashim, D.A. Sen, R.A. Rahman, “Value of Lost Load - A Critical Parameter for Optimum Utility Asset Investment”, in Proc. Uniten SCOReD 2005, December 2005.

[2] A.H. Hashim, Z.F. Hussein, D.A. Sen, et al., “Stratification & Sampling of Electricity Supply Customers for Outage Costs Survey”, in Proc. IEEE PECon 2006, November 2006.
