www.globalbigdataconference.com Twitter : @bigdataconf

By Dr. Shyam Sundar Sarkar and Ayush Sarkar (Company: AyushNet)


CALSTATDN Model


Source: Aggarwal, C: Managing and Mining Sensor Data. Springer Science and Business Media, New York, 2013.


• Broadly, there are two major approaches to data acquisition: pull-based and push-based (figure). In the pull-based approach, the user defines the interval and frequency of data acquisition. Pull-based systems follow only the user’s requirements and pull sensor values as defined by the queries. For example, using the SAMPLE INTERVAL clause of the query (in figure), users can specify the number of samples and the frequency at which the samples should be acquired.

• In push-based approaches, by contrast, the sensors autonomously decide when to communicate sensor values to the base station (figure). Here, the base station and the sensors agree on the expected behavior of the sensor values, which is expressed as a model. If the sensor values deviate from this expected behavior, the sensors communicate only the deviating values to the base station.
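The two acquisition styles can be sketched in a few lines of Python. This is only an illustration: the function names, the constant-temperature model, the tolerance, and the sample values are all hypothetical and do not come from the slides or the cited book.

```python
def pull_based_readings(read_sensor, interval, count):
    """Pull-based: the user fixes the sampling interval and the number of samples."""
    return [(i * interval, read_sensor(i * interval)) for i in range(count)]


def push_based_readings(samples, model, tolerance):
    """Push-based: the sensor reports a value only when it deviates from the
    model agreed with the base station by more than `tolerance`."""
    return [(t, v) for t, v in samples if abs(v - model(t)) > tolerance]


# Hypothetical (time, temperature) samples observed at the sensor.
samples = [(0, 20.1), (1, 20.3), (2, 23.0), (3, 19.9), (4, 17.5)]

# Hypothetical shared model: the temperature is expected to stay at 20.0.
sent = push_based_readings(samples, model=lambda t: 20.0, tolerance=1.0)

# Pull-based acquisition of three samples, one every two time units.
pulled = pull_based_readings(lambda t: dict(samples)[t], interval=2, count=3)
```

In this sketch the push-based sensor transmits two of the five values, while the pull-based query transmits every value it asks for, whether or not it is informative.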

Intel Berkeley Lab With 54 Sensors

Source: http://db.csail.mit.edu/labdata/labdata.html


Download Data Files for Analysis

The x and y coordinates of the sensors (in meters relative to the upper right corner of the lab) are given in a location file. Its three columns correspond to mote id, x location, and y location.

The main file contains the original log of about 2.3 million readings collected from these sensors. The file is 34MB gzipped, 150MB uncompressed. The schema is as follows:

date : yyyy-mm-dd
time : hh:mm:ss.xxx
epoch : int
moteid : int
temperature : real
humidity : real
light : real
voltage : real

In this case, epoch is a monotonically increasing sequence number from each mote. Two readings from the same epoch number were produced from different motes at the same time. There are some missing epochs in this data set. Moteids range from 1-54; data from some motes may be missing or truncated. Temperature is in degrees Celsius. Humidity is temperature-corrected relative humidity, ranging from 0-100%. Light is in Lux (a value of 1 Lux corresponds to moonlight, 400 Lux to a bright office, and 100,000 Lux to full sunlight). Voltage is expressed in volts, ranging from 2-3; the batteries in this case were lithium ion cells, which maintain a fairly constant voltage over their lifetime.
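A minimal sketch of reading one record with this schema follows. It assumes that the fields are whitespace-separated and appear in the order date, time, epoch, moteid, temperature, humidity, light, voltage; the `Reading` class, the `parse_reading` helper, and the sample line are hypothetical and only illustrate the layout.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Reading:
    timestamp: datetime
    epoch: int
    moteid: int
    temperature: float  # degrees Celsius
    humidity: float     # relative humidity, 0-100%
    light: float        # Lux
    voltage: float      # volts


def parse_reading(line: str) -> Optional[Reading]:
    """Parse one log line; return None for a truncated record."""
    fields = line.split()
    if len(fields) != 8:
        return None  # the description notes that some data is missing or truncated
    date, time, epoch, moteid, temp, hum, light, volt = fields
    # Assumes the time always carries a fractional part (hh:mm:ss.xxx).
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
    return Reading(timestamp, int(epoch), int(moteid),
                   float(temp), float(hum), float(light), float(volt))


# Made-up line in the assumed layout, not an actual record from the dataset.
sample = "2004-02-28 00:59:16.027 3 1 19.98 37.09 45.08 2.69"
reading = parse_reading(sample)
```

Returning `None` for short lines keeps truncated records out of later steps without stopping the scan of a multi-million-line file.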


Example of Sensor Data from Lab (2.3 million records from 54 sensors)


Consider Heat Equation (Calculus)

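For a function u(x,y,z,t) of three spatial variables (x,y,z) and the time variable t, the heat equation is

$$\frac{\partial u}{\partial t} = \alpha \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right)$$

where α is a positive constant (the thermal diffusivity) and u is an arbitrary function being considered; often it is temperature.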
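To make the relation between the spatial second derivatives and the time derivative concrete, here is a small finite-difference sketch in two dimensions (x and y only). It assumes temperatures sampled on a regular grid, which the lab sensors are not; the function names, the grid, and the value of alpha are hypothetical.

```python
def laplacian_2d(u, dx, dy):
    """Central-difference estimate of u_xx + u_yy at the interior grid points.

    u[i][j] is the value at row i (y direction) and column j (x direction).
    Boundary entries are left at 0.0 because they lack a neighbour on one side.
    """
    rows, cols = len(u), len(u[0])
    lap = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            u_xx = (u[i][j + 1] - 2 * u[i][j] + u[i][j - 1]) / dx ** 2
            u_yy = (u[i + 1][j] - 2 * u[i][j] + u[i - 1][j]) / dy ** 2
            lap[i][j] = u_xx + u_yy
    return lap


def du_dt(u, alpha, dx, dy):
    """Heat equation in 2D: the time derivative is alpha times the Laplacian."""
    return [[alpha * value for value in row] for row in laplacian_2d(u, dx, dy)]


# Test field u = x^2 + y^2 on a 3x3 grid with unit spacing; its Laplacian is 4.
grid = [[float(j ** 2 + i ** 2) for j in range(3)] for i in range(3)]
rate = du_dt(grid, alpha=0.5, dx=1.0, dy=1.0)
```

For the quadratic test field the central differences are exact, so the centre point gets a Laplacian of 4 and a time derivative of 0.5 × 4 = 2.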

Consider K-means Clustering (Statistics)

K-means clustering is a popular method for cluster analysis in machine learning and data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.



Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find (where μi is the mean of points in Si):

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

K-means Algorithm

Given an initial set of k means m1(1),…,mk(1), the algorithm proceeds by alternating between two steps:

(1) Assignment step: Assign each observation to the cluster whose mean yields the least within-cluster sum of squares (WCSS). Since the sum of squares is the squared Euclidean distance, this is intuitively the "nearest" mean. (Mathematically, this means partitioning the observations according to the Voronoi diagram generated by the means.)

$$S_i^{(t)} = \left\{ x_p : \lVert x_p - m_i^{(t)} \rVert^2 \le \lVert x_p - m_j^{(t)} \rVert^2 \ \ \forall j,\ 1 \le j \le k \right\}$$

where each x_p is assigned to exactly one S^{(t)}, even if it could be assigned to two or more of them.

(2) Update step: Calculate the new means to be the centroids of the observations in the new clusters.

$$m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x_j \in S_i^{(t)}} x_j$$

Since the arithmetic mean is a least-squares estimator, this also minimizes the within-cluster sum of squares (WCSS) objective.
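The two alternating steps can be sketched in plain Python as follows. This is a generic illustration of the assignment/update iteration, not the parallel implementation used in the project; the function names, the handling of empty clusters, and the four sample points are assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, means):
    """Assignment step: put each point in the cluster of its nearest mean.
    Ties go to the lowest cluster index, so each point lands in exactly one cluster."""
    clusters = [[] for _ in means]
    for p in points:
        nearest = min(range(len(means)), key=lambda i: squared_distance(p, means[i]))
        clusters[nearest].append(p)
    return clusters


def update(clusters, old_means):
    """Update step: each new mean is the centroid of its cluster.
    An empty cluster keeps its previous mean (a choice made for this sketch)."""
    new_means = []
    for cluster, old in zip(clusters, old_means):
        if cluster:
            dim = len(cluster[0])
            new_means.append(tuple(sum(p[d] for p in cluster) / len(cluster)
                                   for d in range(dim)))
        else:
            new_means.append(old)
    return new_means


def wcss(clusters, means):
    """Within-cluster sum of squares for a given partition and its means."""
    return sum(squared_distance(p, m) for c, m in zip(clusters, means) for p in c)


def k_means(points, initial_means, max_iter=100):
    """Alternate the two steps until the means stop changing."""
    means = list(initial_means)
    for _ in range(max_iter):
        new_means = update(assign(points, means), means)
        if new_means == means:
            break
        means = new_means
    return means, assign(points, means)


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
means, clusters = k_means(points, initial_means=[(0.0, 0.0), (10.0, 0.0)])
```

With these points the means settle at (0, 1) and (10, 1) after one update, and every point lies at squared distance 1 from its mean, so the WCSS is 4.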


Computational Complexity of K-means Algorithm

If k and d (the dimension) are fixed, the problem can be solved exactly in time $O(n^{dk+1} \log n)$, where n is the number of entities to be clustered.
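A short calculation shows how strongly this bound depends on n. The sketch below evaluates the base-10 logarithm of n^(dk+1) log n (constant factors ignored) for one large dataset and for the same data split into equal partitions that are each clustered separately. The values of n, d, k and the number of partitions are hypothetical, and the bound applies to exact solution of the problem, so this only illustrates the scaling and is not a measurement.

```python
import math


def log10_bound(n, d, k):
    """log10 of n^(d*k + 1) * ln(n), the growth term of the bound above."""
    return (d * k + 1) * math.log10(n) + math.log10(math.log(n))


# Hypothetical sizes: one million points, 2 dimensions, 5 clusters.
n, d, k = 1_000_000, 2, 5
whole = log10_bound(n, d, k)

# Same data split into 5 equal partitions; the total is the sum over partitions,
# which adds log10(parts) to the logarithm of the per-partition bound.
parts = 5
partitioned = math.log10(parts) + log10_bound(n // parts, d, k)

orders_of_magnitude_saved = whole - partitioned
```

Because n enters with the exponent dk+1, reducing n per partition shrinks the bound far more than the extra factor for running several partitions adds back.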


Consider Database Normalization (Data Normalization)

In this project, the objective is to normalize the sensor datasets to reduce redundancy and improve partitioning, and then to apply a parallel machine learning algorithm efficiently for analysis.
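As a generic illustration of what normalization and partitioning mean for such data (not the CALSTATDN procedure itself, which the slides do not spell out), the sketch below splits denormalized rows into a location table and a readings table, then groups the readings by a sensor-to-cluster mapping. All names, rows, and cluster labels are hypothetical.

```python
def normalize(rows):
    """Split denormalized rows into a per-mote location table and a readings table,
    so the x/y coordinates are stored once per mote instead of once per reading."""
    locations = {}
    readings = []
    for row in rows:
        locations[row["moteid"]] = (row["x"], row["y"])
        readings.append({"moteid": row["moteid"],
                         "epoch": row["epoch"],
                         "temperature": row["temperature"]})
    return locations, readings


def partition_by_cluster(readings, cluster_of):
    """Group readings by the cluster of their mote, so that each partition
    can be handed to a separate worker."""
    partitions = {}
    for reading in readings:
        partitions.setdefault(cluster_of[reading["moteid"]], []).append(reading)
    return partitions


# Made-up denormalized rows: the coordinates repeat on every reading.
rows = [
    {"moteid": 1, "x": 21.5, "y": 23.0, "epoch": 1, "temperature": 19.9},
    {"moteid": 1, "x": 21.5, "y": 23.0, "epoch": 2, "temperature": 20.0},
    {"moteid": 2, "x": 24.5, "y": 20.0, "epoch": 1, "temperature": 19.4},
]
locations, readings = normalize(rows)
partitions = partition_by_cluster(readings, cluster_of={1: 0, 2: 1})
```

Each partition is smaller than the original table and carries no repeated coordinates, which is the property a parallel clustering step can exploit.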


Complete Workflow of Sensor Data Analytics System


Five sensor clusters computed by applying K-means clustering to the x and y coordinates of the fifty-four sensors in the Intel Berkeley lab


Point graphs of the second derivatives of temperature with respect to the x and y coordinates, and their K-means values, for sensors located in physical clusters 0 and 1


Point graphs of the second derivatives of temperature with respect to the x and y coordinates, and their K-means values, for sensors located in physical clusters 2 and 3


Point graphs of the second derivatives of temperature with respect to the x and y coordinates, and their K-means values, for sensors located in physical cluster 4


Point graphs show the first derivatives of temperature with respect to time, obtained from the heat equation, for physical clusters 0, 1, 2 and 3. The continuous graphs of temperature against time for these clusters are obtained by Runge-Kutta integration of the derivatives (using the Berkeley Madonna tool).


Point graphs show the first derivatives of temperature with respect to time, obtained from the heat equation, for physical cluster 4. The continuous graph of temperature against time for this cluster is obtained by Runge-Kutta integration of the derivatives (using the Berkeley Madonna tool).
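The integration step named in these captions can be illustrated with the classical fourth-order Runge-Kutta method. The sketch below is a generic textbook version and makes no claim about how the Berkeley Madonna tool implements it; the function name and the test equation du/dt = -u are assumptions chosen because the exact solution is known.

```python
import math


def rk4_integrate(f, t0, u0, h, steps):
    """Integrate du/dt = f(t, u) from (t0, u0) with the classical
    fourth-order Runge-Kutta method, using a fixed step size h."""
    t, u = t0, u0
    trajectory = [(t, u)]
    for _ in range(steps):
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h * k1 / 2)
        k3 = f(t + h / 2, u + h * k2 / 2)
        k4 = f(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        trajectory.append((t, u))
    return trajectory


# Test problem with a known solution: du/dt = -u, u(0) = 1, so u(t) = exp(-t).
trajectory = rk4_integrate(lambda t, u: -u, t0=0.0, u0=1.0, h=0.1, steps=10)
final_time, final_value = trajectory[-1]
error = abs(final_value - math.exp(-1.0))
```

In the setting of the slides, `f` would return the time derivative of temperature estimated from the heat equation, and the returned trajectory would correspond to the continuous temperature curve.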


Performance gain of CALSTATDN Model over the Control Variable


Conclusions

A new model (CALSTATDN) for data normalization (DN) based on calculus (CAL) and statistics (STAT) allows for:
• Data normalization leading to efficient data partitioning of very large sensor datasets;
• Applying parallel, distributed, statistical machine learning algorithms to the normalized and partitioned sensor datasets;
• Improving performance by a factor of 7.2 over the control variable, which is the “raw” (denormalized) single sensor dataset stored in a big table.

The computational complexity $O(n^{dk+1} \log n)$ of the K-means algorithm played a major role in demonstrating the reduction of execution time using the CALSTATDN model. The CALSTATDN model partitions the dataset so that n, d and k are reduced for each parallel execution of the model, thus enabling a massive reduction in the total computational complexity compared to the original control variable.

Thank You! E-mail of Shyam Sarkar: [email protected] E-mail of Ayush Sarkar: [email protected]



CALSTATDN_Model_gbdc-3rd Annual Global Big Data Conference - Santaclara-Sep-1-3-2015.pdf
