Where is the Goldmine? Finding Promising Business Locations through Facebook Data Analytics
Jovian Lin, Richard Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus Kwee

Part I

Motivation

Location, Location, Location …
•  Location is a vital aspect of retail success – 94% of retail sales are still transacted in physical stores.
•  To increase the chance of success, business owners traditionally conduct ground surveys to gather relevant data and evaluate the location of interest.
•  But … this is a herculean task.
Ø  Time-consuming, costly, and not scalable
Ø  Cannot cope with fast-changing environments (e.g., neighborhood rental, local population size)

How Facebook Data Can Help
•  Fortunately, we can use Facebook to capture the activities of users.
•  An important type of activity is the location check-in.

Our Research
Research Questions
•  Where should a retail store be set up to optimize its popularity?
•  What are the important factors affecting a store’s popularity?
•  Can new businesses benefit from more established businesses?

Task Formulation
•  Given a target location, how can we extract the relevant data of businesses within its vicinity and use them to estimate the popularity of the target location?

Our Key Contributions
1.  New study on business location analytics using Facebook data
Ø  Study of 20,877 Facebook Pages of food-related businesses in Singapore
Ø  Detailed analysis of key features affecting business popularity, at both chunk (feature group) and individual feature levels
2.  Location analytics framework that includes a rich feature extraction module and an accurate prediction model
Ø  Our model can estimate on the fly the popularity of an arbitrary point on the map, unlike previous work that relies on discretized areas
3.  Interactive web application
Ø  The user may select a point on a map and get an estimated popularity score for that location

Part II

Facebook Data

What Do the Data Look Like?
Example: Wimbly Lu Chocolates

Key Attributes
i.  Business ID
ii.  Categories
iii.  Check-in counts
iv.  Location (lat-long)

Data Collection
We study 20,877 food-related businesses in Singapore, collected using a manually curated list of 133 food-related business categories.

Exploration of Facebook Data
1. Categorical data
–  There are 357 unique category labels across all food-related businesses in Singapore.
–  Example: A Starbucks outlet in Changi Airport may have both food and non-food labels, such as “airport”, “café”, “coffee shop”, and “train station”.
–  Categories are important features because they let us examine the relationships between the categories of neighboring businesses in a local area.

Exploration of Facebook Data
1. Categorical data
Top 25 categories extracted from our dataset
[Figure: bar chart of top categories by number of businesses. Axes: Top Categories vs. Number of Businesses (0 to 3,600). Category labels recoverable from the chart, in listed order: Food & Restaurant, Restaurant, Cafe, Shopping Mall, Coffee Shop, Bakery, Chinese Restaurant, Fast Food Restaurant, Food & Grocery, Bar, Japanese Restaurant, Train Station, Food Stand, Seafood Restaurant, Movie Theatre.]

Exploration of Facebook Data
2. Location data
–  We want to analyze a target business’s neighbors.
–  For a target location l, we define its neighborhood as the set of places p in P within radius r of l, i.e., those with dist(p, l) ≤ r.
•  dist(p, l) = Haversine distance between two places p and l.
•  P = Set of food-related businesses in Singapore.
–  This allows us to retrieve the k-nearest neighbors of a target business location.
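The neighborhood definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`haversine_km`, `neighborhood`), the `latlon` and `id` dictionary keys, and the Earth-radius constant are hypothetical choices, not details taken from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed approximation)

def haversine_km(p, l):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, l)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def neighborhood(l, places, r_km):
    """Return (distance_km, place) pairs with dist(p, l) <= r_km, nearest first."""
    hits = [(haversine_km(p["latlon"], l), p) for p in places]
    return sorted(((d, p) for d, p in hits if d <= r_km), key=lambda t: t[0])

# Toy data (made up for illustration, not from the dataset).
places = [
    {"id": "a", "latlon": (1.3000, 103.8000)},
    {"id": "b", "latlon": (1.3050, 103.8000)},
    {"id": "c", "latlon": (1.3500, 103.8000)},
]
nearby = neighborhood((1.3000, 103.8000), places, r_km=1.0)
nearby_ids = [p["id"] for _, p in nearby]
```

Because the result is sorted by distance, taking the first k entries gives the k nearest neighbors within the radius. Note that a target that is itself in `places` appears at distance 0 and would need to be filtered out.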

Exploration of Facebook Data
3. Popularity Indicator
–  We use “check-ins” instead of “likes” to measure a store’s popularity, because “check-ins” indicate physical presence.
–  Check-ins can be repeated: a user could check in to a place on Monday and do so again on Tuesday.
–  Check-ins allow us to track how many times users visit a store.

Part III

Methodology

Location Analytics Framework

Location Analytics Framework
Step 1: Neighbors Extraction
•  Extract the location (i.e., lat-long) of a target business.
•  This location is used to extract all the neighbors within 1 km of the target.
•  For each neighbor, extract:
Ø  its categories
Ø  its check-ins data (a.k.a. “hotspot”)

Location Analytics Framework
Step 2: Feature Engineering
•  Based on the neighbors data, we construct a feature vector representing the target business’s profile.
•  The constructed features consist of six different groups called “chunks” (to be described shortly).
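As a rough illustration of Step 2, the sketch below turns a list of neighbors into a feature dictionary. The exact chunk definitions are not given in this transcript, so everything here is an assumption: the feature names, the cumulative distance bands in `BANDS_KM`, and the `(distance_km, categories, checkins)` tuple layout are hypothetical. It only mirrors the kinds of features the slides mention (target categories, neighbor categories, and total and average check-ins).

```python
from collections import Counter

# Hypothetical distance bands in km; the slides only state a 1 km radius.
BANDS_KM = (0.25, 0.5, 0.75, 1.0)

def build_features(target_categories, neighbors):
    """Build a feature dict from neighbors given as
    (distance_km, categories, checkins) tuples."""
    features = {}
    # Categories of the target business.
    for c in target_categories:
        features[f"target_cat:{c}"] = 1
    # How often each category occurs among the neighbors.
    counts = Counter(c for _, cats, _ in neighbors for c in cats)
    for c, n in counts.items():
        features[f"neighbor_cat:{c}"] = n
    # Total and average neighbor check-ins within each distance band.
    for band in BANDS_KM:
        checkins = [k for d, _, k in neighbors if d <= band]
        total = sum(checkins)
        features[f"total_checkins<={band}km"] = total
        features[f"avg_checkins<={band}km"] = (
            total / len(checkins) if checkins else 0.0
        )
    return features

# Toy neighbors (made up for illustration).
toy_neighbors = [
    (0.1, ["cafe"], 100),
    (0.4, ["cafe", "bakery"], 50),
    (0.9, ["bar"], 10),
]
toy_features = build_features(["cafe"], toy_neighbors)
```

A dictionary keeps the sketch readable; a real pipeline would map these keys onto fixed vector positions so every business shares the same feature layout.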

Business Location Analytics Framework

Location Analytics Framework
Step 3: Regression
•  We use a supervised regression model to learn the association between (i) the features and (ii) the actual check-ins score.
•  The trained model is then used to predict the number of check-ins for a new/unseen profile.
•  We tested several models and settled on the gradient boosting machine (GBM).
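To make the regression step concrete, here is a minimal gradient boosting sketch in plain Python. It is not the authors' model or configuration: it assumes squared-error loss, one-split decision stumps as weak learners, and hypothetical names (`fit_stump`, `fit_gbm`, `predict_gbm`) and hyperparameters (`n_rounds`, `learning_rate`). Each round fits a stump to the current residuals and adds a damped copy of it to the ensemble.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split with the lowest squared error.
    Returns (feature_index, threshold, left_mean, right_mean) or None."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None

def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals, starting from the mean of y."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no valid split (e.g. all rows identical)
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]
    return base, learning_rate, stumps

def predict_gbm(model, row):
    """Sum the base value and every stump's damped contribution."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= t else rm)
                      for j, t, lm, rm in stumps)

# Toy one-feature data (made up): check-ins jump once the feature exceeds 1.
toy_X = [[0], [1], [2], [3]]
toy_y = [0.0, 0.0, 10.0, 10.0]
toy_model = fit_gbm(toy_X, toy_y)
```

In practice one would more likely use an existing library implementation, such as scikit-learn's `GradientBoostingRegressor`, rather than hand-rolled stumps; the sketch only shows the residual-fitting idea.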

Feature Engineering

Part IV

Experiments

Experiment Setup
Evaluation metrics
•  Averaged over 10-fold cross-validation

Predictive models
1.  Distance-based nearest neighbors (DNN)
2.  Linear support vector regression (SVR-Linear)
3.  Radial basis support vector regression (SVR-RBF)
4.  Gradient boosting machine (GBM)
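The evaluation protocol above can be sketched as a small k-fold cross-validation loop. The specific metrics are not listed in this transcript, so root mean squared error (RMSE) is used here purely as an assumed stand-in; the helper names (`k_fold_indices`, `cross_validate`) and the `fit`/`predict` callback interface are likewise hypothetical. The sketch assumes at least k samples so that no fold is empty.

```python
import math
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Average RMSE over k folds.
    fit(X, y) -> model; predict(model, row) -> float. Requires len(X) >= k."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_err = [(predict(model, X[i]) - y[i]) ** 2 for i in fold]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / len(scores)

# Toy check with a mean-predicting baseline (made-up data).
cv_X = [[i] for i in range(20)]
cv_y = [5.0] * 20
mean_fit = lambda X, y: sum(y) / len(y)
mean_predict = lambda model, row: model
cv_score = cross_validate(cv_X, cv_y, mean_fit, mean_predict, k=10)
```

Any of the four models listed above could be plugged in through the `fit` and `predict` callbacks, which keeps the comparison on identical folds.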

Performance Assessment

•  As expected, SVR-RBF is better than SVR-Linear → the RBF kernel maps the original features into a high-dimensional space, giving more discriminative power
•  GBM outperforms all the other methods → GBM combines weak learners into a strong learner whose aggregate prediction is better than that of its constituents

Chunk Contribution
Observations
•  GBM is robust to chunk variations
•  Categories of the target business (chunk C1) appear in the top 10 GBM variants
•  Total “check-in” chunks (C3 and C5) are ranked higher than average “check-in” chunks (C4 and C6)
•  No substantial difference between food-related hotspots and all (food + non-food) hotspots

Feature Importance
•  The more “check-ins” in the neighborhood, the more popular the target location
•  Nearer “check-ins” are stronger → 14/20 hotspot features are within 500 meters
•  Total “check-ins” are more important than average “check-ins”
•  Categories of neighbors (C2) are more crucial than those of the target business (C1)
•  Food-related categories of neighbors are more crucial than non-food categories

Part V

Application Prototype

Web Application Demo

Website: http://research.larc.smu.edu.sg/bizanalytics/

THANK YOU!
