Where is the Goldmine? Finding Promising Business Locations through Facebook Data Analytics Jovian Lin, Richard Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus Kwee
Part I:
Motivation
Location, Location, Location … • Location is a vital aspect of retail success – 94% of retail sales are still transacted in physical stores. • To increase the chance of success, business owners traditionally conduct ground surveys to gather relevant data and evaluate the location of interest. • But this is a herculean task. Ø Time-consuming, costly, and not scalable Ø Cannot cope with fast-changing conditions (e.g., neighborhood rental rates, local population size).
How Facebook Data Can Help • Fortunately, we can use Facebook to capture the activities of users. • An important type of activity is location check-ins.
Our Research Research Questions • Where should a retail store be set up to optimize its popularity? • What are the important factors affecting a store’s popularity? • Can new businesses benefit from more established businesses?
Task Formulation • Given a target location, how can we extract the relevant data of businesses within its vicinity, and use them to estimate the popularity of the target location?
Our Key Contributions 1. New study on business location analytics using Facebook data Ø Study of 20,877 Facebook Pages of food-related businesses in SG. Ø Detailed analysis of key features affecting business popularity, at both chunk (feature group) and individual feature levels
2. Location analytics framework that includes rich feature extraction module and accurate prediction model Ø Our model can estimate on the fly the popularity of an arbitrary point on the map, unlike previous work that relies on discretized areas
3. Interactive web application – User may select a point on a map and get an estimated popularity score of that location
Part II:
Facebook Data
What Do the Data Look Like? Example: Wimbly Lu Chocolates
Key Attributes i. Business ID ii. Categories iii. Check-in counts iv. Location (lat-long)
Data Collection We study 20,877 food-related businesses in SG, collected using a manually curated list of 133 food-related business categories
Exploration of Facebook Data 1. Categorical data – There are 357 unique category labels for all food-related businesses in Singapore – Example: A Starbucks outlet in Changi Airport may have both food and non-food labels such as “airport”, “café”, “coffee shop”, “train station” – Categories are important features because they let us examine the relationship between the categories of neighboring businesses in a local area.
Exploration of Facebook Data 1. Categorical data Top 25 categories extracted from our dataset
[Figure: horizontal bar chart. Vertical axis: Top Categories. Horizontal axis: Number of Businesses (0 to 3600). Category labels recoverable from the chart: Food & Restaurant, Restaurant, Cafe, Shopping Mall, Coffee Shop, Bakery, Chinese Restaurant, Fast Food Restaurant, Food & Grocery, Bar, Japanese Restaurant, Train Station, Food Stand, Seafood Restaurant, Movie Theatre.]
Exploration of Facebook Data 2. Location data – We want to analyze a target business’s neighbors. – For a target location l, we define its neighborhood as the set of places p within radius r around l. • dist(p, l) = Haversine distance between two places p and l. • P = Set of food-related businesses in Singapore.
– This allows us to retrieve the k-nearest neighbors of a target business location.
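The neighborhood definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a mean Earth radius of 6371 km, and the place records, field names (`id`, `lat`, `lon`), and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (Haversine) distance in kilometers between two lat-long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def neighborhood(target, places, radius_km):
    """Return (distance_km, place) pairs with dist(p, l) <= radius_km, nearest first."""
    lat, lon = target
    hits = []
    for p in places:
        d = haversine_km(lat, lon, p["lat"], p["lon"])
        if d <= radius_km:
            hits.append((d, p))
    # Sort on distance only, so ties never compare the place dicts.
    hits.sort(key=lambda pair: pair[0])
    return hits

# Hypothetical places; the coordinates are illustrative, not from the dataset.
places = [
    {"id": "a", "lat": 1.3000, "lon": 103.8000},
    {"id": "b", "lat": 1.3050, "lon": 103.8000},
    {"id": "c", "lat": 1.3500, "lon": 103.9000},
]
nearby = neighborhood((1.3000, 103.8000), places, radius_km=1.0)
nearest_ids = [p["id"] for _, p in nearby]
```

Because the result is sorted by distance, the k nearest neighbors inside the radius are simply `nearby[:k]`. A linear scan is adequate for roughly twenty thousand places; a spatial index would be the usual choice for larger sets.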
Exploration of Facebook Data 3. Popularity Indicator – We use “check-ins” instead of “likes” to measure a store’s popularity, because “check-ins” indicate physical presence. – Check-ins can be repeated: a user could check in at a place on Monday and again on Tuesday. – Check-ins allow us to track how many times users visit a store.
Part III:
Methodology
Location Analytics Framework
Location Analytics Framework Step 1: Neighbors Extraction • Extract the location (i.e., lat-long) of a target business. • This location is used to extract all the neighbors within 1 km of the target. • For each neighbor, extract: Ø its categories Ø its check-ins data (a.k.a. “hotspot”)
Location Analytics Framework Step 2: Feature Engineering • Based on the neighbors’ data, we construct a feature vector representing the target business’s profile. • The constructed features consist of six different groups called “chunks” (to be described shortly)
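As a rough illustration of Step 2, the sketch below turns a list of neighbors into a flat feature dictionary. It is not the authors' feature set: the distance bands, the feature names, and the tuple layout `(distance_km, categories, checkins)` are assumptions made for the example. Only the 1 km outer radius, the use of categories, and the use of total and average check-ins come from the slides.

```python
from collections import Counter

# Hypothetical distance bands in km; the slides fix only the 1 km outer radius.
BANDS_KM = (0.25, 0.5, 1.0)

def build_features(target_categories, neighbors):
    """Build a feature dict from (distance_km, categories, checkins) neighbor tuples."""
    features = {}
    # Categories of the target business: one indicator per label.
    for cat in target_categories:
        features["target_cat=" + cat] = 1.0
    # Categories of the neighbors: how many neighbors carry each label.
    counts = Counter(cat for _, cats, _ in neighbors for cat in cats)
    for cat, n in counts.items():
        features["neighbor_cat=" + cat] = float(n)
    # Total and average check-ins of the neighbors inside each distance band.
    for band in BANDS_KM:
        inside = [c for d, _, c in neighbors if d <= band]
        total = float(sum(inside))
        features[f"total_checkins<={band}km"] = total
        features[f"avg_checkins<={band}km"] = total / len(inside) if inside else 0.0
    return features

# Illustrative neighbors, not real data.
neighbors = [
    (0.1, ["cafe"], 200),
    (0.4, ["bakery", "cafe"], 100),
    (0.9, ["train station"], 900),
]
features = build_features(["cafe", "coffee shop"], neighbors)
```

A dictionary keeps the example readable; before training, the keys would be mapped to fixed column positions so that every business yields a vector of the same length.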
Location Analytics Framework Step 3: Regression • We use a supervised regression model to learn the association between (i) the features and (ii) the actual check-ins score. • The trained model is then used to predict #check-ins for a new/unseen profile. • We tested several models and settled on the gradient boosting machine (GBM).
Feature Engineering
Part IV:
Experiments
Experiment Setup Evaluation metrics Averaged over 10-fold cross-validation
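The averaging over 10 folds can be sketched as below. The metric used here, root-mean-square error, is an assumption chosen for illustration, since the slide text does not name the metrics. The `fit` and `predict` callables and the mean-predicting baseline are likewise hypothetical stand-ins for a real model.

```python
import math
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=10):
    """Average RMSE over k folds; fit(X, y) -> model, predict(model, X) -> list."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in fold])
        mse = sum((p - y[i]) ** 2 for p, i in zip(preds, fold)) / len(fold)
        scores.append(math.sqrt(mse))
    return sum(scores) / len(scores)

# Trivial baseline for illustration: always predict the training mean.
def fit_mean(X, y):
    return sum(y) / len(y)

def predict_mean(model, X):
    return [model for _ in X]

X = [[float(i)] for i in range(50)]
y = [5.0 for _ in range(50)]
avg_rmse = cross_validate(X, y, fit_mean, predict_mean, k=10)
```

Each sample is held out exactly once, so every score is computed on data the model did not see during fitting.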
Predictive models 1. Distance-based nearest neighbors (DNN) 2. Linear support vector regression (SVR-Linear) 3. Radial basis support vector regression (SVR-RBF) 4. Gradient boosting machine (GBM)
Performance Assessment
• As expected, SVR-RBF is better than SVR-Linear → the RBF kernel maps the original features into a high-dimensional space, giving more discriminative power • GBM outperforms all the other methods → GBM combines weak learners into a strong learner whose aggregate prediction is better than that of its constituents
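To make the "weak learners into a strong learner" idea concrete, here is a toy gradient boosting regressor for squared loss, written in plain Python. It is a teaching sketch under stated assumptions, not the GBM implementation used in the experiments: the weak learner is a one-split decision stump, and the data, number of rounds, and learning rate are illustrative.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lmean, rmean)
    # None means no feature had two distinct values, so no split exists.
    return best[1:] if best is not None else None

def stump_predict(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean

def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_gbm(model, X):
    base, lr, stumps = model
    return [base + lr * sum(stump_predict(s, row) for s in stumps) for row in X]

def mse(y, preds):
    return sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)

# Illustrative data: a curve that no single stump can fit well.
X = [[float(i)] for i in range(20)]
y = [float(i * i) for i in range(20)]
mse_one = mse(y, predict_gbm(fit_gbm(X, y, n_rounds=1, learning_rate=0.5), X))
mse_many = mse(y, predict_gbm(fit_gbm(X, y, n_rounds=100, learning_rate=0.5), X))
```

On this toy curve the training error after 100 rounds is lower than after one round, which is the aggregate effect described above. Training error alone says nothing about generalization, which is why the slides report cross-validated scores.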
Chunk Contribution Observations • GBM is robust to chunk variations • Categories of the target business (chunk C1) appear in the top 10 GBM variants • Total “check-in” chunks (C3 and C5) are ranked higher than average “check-in” chunks (C4 and C6) • No substantial difference between food-related hotspots and all (food + non-food) hotspots
Feature Importance
• The more “check-ins” in the neighborhood, the more popular the target location • Nearer “check-ins” are stronger → 14/20 hotspot features are within 500 meters • Total “check-ins” are more important than average “check-ins” • Categories of neighbors (C2) are more crucial than those of the target business (C1) • Food-related categories of neighbors are more crucial than non-food categories
Part V:
Application Prototype
Web Application Demo
Website: http://research.larc.smu.edu.sg/bizanalytics/
THANK YOU!