I Have Some Data, Now What? A Guide To Selecting Research Tools for Learning Analytics and Educational Data Mining
Stefan Slater
[email protected] 3700 Walnut St., Philadelphia PA 19104
What Is Educational Data Mining Research?
• Prediction
• Clustering
• Relationship Mining
• Data Distillation for Human Judgment
• Discovery With Models
(Baker and Yacef, 2009)
How Are EDM and LA Different From A/B Testing?
• EDM and LA draw on fundamentally different data sources than A/B testing does
• A/B testing is experimental; EDM and LA are generally observational
• A/B testing draws samples from populations and uses statistics to make inferences
• EDM and LA collect data on whole populations and explore relationships within the data (not inferential)
Different Tools For Different Data
• EDM and LA are extremely diverse fields
• Specialized tools have evolved around specialized methods of analysis
• There is no field-unifying tool like SPSS (and even SPSS is becoming outdated)
The Cycle of Learning Analytics Research
Acquire Data → Data Cleaning and Processing → Feature Engineering and Variable Construction → Analyses and Modeling → Visualization and Publication
The Cycle of Learning Analytics Research
Data Cleaning and Processing → Feature Engineering and Variable Construction → Analysis and Modeling → Visualization and Writing
The Cycle of Learning Analytics Research: Why?
• Data is big
• Documentation and management can be poor
• Unforeseen/lurking problems are frequent
Tools of the Trade – Acquiring Data
Tools:
• SQL/NoSQL (Microsoft SQL Server Management Studio; Sequel Pro)
Sources:
• NCES (National Center for Education Statistics)
• PSLC DataShop
• Google Takeout
• reddit.com/r/datasets
Tools of the Trade – Acquiring Data
SQL? So I need to know how to code?
• No.
• “select * from table_name” (returns every row and column of a table)
• That’s it. Preprocess later.
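Here is a minimal sketch of "select everything, preprocess later" in Python. It assumes an in-memory SQLite database from Python's built-in `sqlite3` module as a stand-in for a real research server; the table name `table_name` comes from the slide, while its columns and rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a real research database. In practice you would
# connect to your own server with the appropriate driver instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (student_id TEXT, problem_id TEXT, correct INTEGER, hints INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [("s1", "p1", 1, 0), ("s1", "p2", 0, 2), ("s2", "p1", 1, 1)],
)

# The one query you need: pull everything and clean it later.
cursor = conn.execute("SELECT * FROM table_name")
header = [col[0] for col in cursor.description]  # column names from the result set
rows = cursor.fetchall()

# Write to CSV text so the data can be opened in Excel, Google Sheets, Python, or R.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
conn.close()
```

To save to disk instead, open a file with `open("export.csv", "w", newline="")` and pass it to `csv.writer` in place of the `StringIO` buffer.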
Tools of the Trade – Cleaning and Processing Data
Associated Tasks:
• Removing duplicate data
• Dealing with missing data
• Removing or recoding meaningless data (such as negative hint counts)
• Restructuring data (to the student or problem level)
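The four cleaning tasks above can be sketched in plain Python as follows. The log format and field names (`student`, `problem`, `correct`, `hints`) are hypothetical. Dropping rows with a missing outcome is only one of several reasonable ways to handle missing data; imputation is another.

```python
from collections import defaultdict

# Hypothetical raw action log: one dict per logged student action.
raw_log = [
    {"student": "s1", "problem": "p1", "correct": 1, "hints": 0},
    {"student": "s1", "problem": "p1", "correct": 1, "hints": 0},     # exact duplicate
    {"student": "s1", "problem": "p2", "correct": 0, "hints": -3},    # impossible hint count
    {"student": "s2", "problem": "p1", "correct": None, "hints": 1},  # missing outcome
    {"student": "s2", "problem": "p2", "correct": 1, "hints": 2},
]

def clean(log):
    seen = set()
    cleaned = []
    for row in log:
        key = tuple(sorted(row.items()))
        if key in seen:                # 1. remove duplicate rows
            continue
        seen.add(key)
        if row["correct"] is None:     # 2. missing data: here we simply drop the row
            continue
        row = dict(row)                # copy so the raw log is left untouched
        if row["hints"] < 0:           # 3. recode meaningless values as missing
            row["hints"] = None
        cleaned.append(row)
    return cleaned

def to_student_level(rows):
    """4. Restructure from one row per action to one summary row per student."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["student"]].append(row)
    summary = {}
    for student, actions in grouped.items():
        valid_hints = [a["hints"] for a in actions if a["hints"] is not None]
        summary[student] = {
            "attempts": len(actions),
            "pct_correct": sum(a["correct"] for a in actions) / len(actions),
            "total_hints": sum(valid_hints),
        }
    return summary

cleaned = clean(raw_log)
summary = to_student_level(cleaned)
```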
Tools of the Trade – Cleaning and Processing Data
Really Good Ideas:
• Make scatterplots, heatmaps, and boxplots
• Identify potentially problematic outliers
• Count things
• Compute averages and standard deviations
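Counting, averaging, and outlier checks take only a few lines with Python's `statistics` and `collections` modules. The durations and problem-set labels below are hypothetical. The outlier rule used here, Tukey's fences at 1.5 times the interquartile range, is one common convention chosen for illustration, not a rule from this talk.

```python
import statistics
from collections import Counter

# Hypothetical per-student time on task (seconds) and the problem set each worked on.
durations = [41, 38, 55, 47, 44, 39, 52, 900, 46, 43]
problem_sets = ["A", "A", "B", "A", "C", "B", "A", "A", "B", "C"]

# Count things: how often does each value occur?
counts = Counter(problem_sets)

# Averages and deviations. The single extreme value pulls the mean far above the median.
mean_duration = statistics.mean(durations)
sd_duration = statistics.stdev(durations)
median_duration = statistics.median(durations)

# Flag potential outliers with Tukey's fences (1.5 x IQR beyond the quartiles).
q1, _, q3 = statistics.quantiles(durations, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [d for d in durations if d < low or d > high]
```

A flagged value is a prompt to investigate (a student who left the tab open? a logging bug?), not an automatic deletion.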
Tools of the Trade – Cleaning and Processing Data
Tools:
• Microsoft Excel
• Google Sheets
• Python/Java/R (for programming)
Tools of the Trade – Feature Engineering and Making Variables
Associated Tasks:
• Calculating durations
• Fitting learning models such as Bayesian Knowledge Tracing (BKT) and Performance Factors Analysis (PFA)
• Flagging specific behaviors or action patterns
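Two of these tasks, calculating durations and flagging a behavior, can be sketched together. This minimal Python example assumes a hypothetical log with ISO-8601 timestamps sorted by time. It defines an action's duration as the time elapsed since the same student's previous action, and it flags "fast attempts" using a hypothetical 5-second cutoff. Both definitions are assumptions you would need to choose and justify for your own data.

```python
from datetime import datetime

FAST_THRESHOLD_S = 5.0  # hypothetical cutoff for a "fast attempt"; choose your own

# Hypothetical action log, already sorted by time.
log = [
    {"student": "s1", "time": "2024-03-01T10:00:00", "action": "attempt"},
    {"student": "s1", "time": "2024-03-01T10:00:02", "action": "attempt"},
    {"student": "s1", "time": "2024-03-01T10:00:30", "action": "hint"},
    {"student": "s1", "time": "2024-03-01T10:01:10", "action": "attempt"},
]

def add_features(rows):
    last_time = {}  # most recent timestamp seen for each student
    out = []
    for row in rows:
        t = datetime.fromisoformat(row["time"])
        prev = last_time.get(row["student"])
        # Duration = seconds since this student's previous action (None for the first).
        duration = (t - prev).total_seconds() if prev is not None else None
        last_time[row["student"]] = t
        # Flag: an attempt made unusually soon after the previous action.
        fast = (row["action"] == "attempt"
                and duration is not None
                and duration < FAST_THRESHOLD_S)
        out.append({**row, "duration_s": duration, "fast_attempt": fast})
    return out

features = add_features(log)
```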
Tools of the Trade – Feature Engineering and Making Variables
Tools:
• Excel and Google Sheets
• Python/Java/R
• EDM Workbench (Rodrigo et al., 2012)
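As a Python illustration of fitting a learning model, here is a minimal sketch of Bayesian Knowledge Tracing (BKT) with its four standard parameters: prior knowledge P(L0), learning P(T), guess P(G), and slip P(S). The fitting method shown is a brute-force grid search minimizing squared prediction error, one common approach. The grid spacing and the data are hypothetical. PFA is not shown. Dedicated packages such as pyBKT exist for real use.

```python
import itertools

def predict_sequence(responses, p_init, p_learn, p_guess, p_slip):
    """Run standard BKT over one student's 0/1 responses on a single skill.

    Returns P(correct) predicted *before* each response is observed.
    """
    p_know = p_init
    predictions = []
    for correct in responses:
        p_correct = p_know * (1 - p_slip) + (1 - p_know) * p_guess
        predictions.append(p_correct)
        # Bayesian update of P(skill known) given the observed response.
        if correct:
            posterior = p_know * (1 - p_slip) / p_correct
        else:
            posterior = p_know * p_slip / (1 - p_correct)
        # Transition: the student may learn the skill at this opportunity.
        p_know = posterior + (1 - posterior) * p_learn
    return predictions

# Coarse parameter grid: 0.05, 0.15, ..., 0.95 (a hypothetical choice).
GRID = [round(0.05 + 0.1 * i, 2) for i in range(10)]

def fit_bkt(sequences):
    """Brute-force grid search for the parameters with the lowest squared error."""
    best_params, best_sse = None, float("inf")
    for p_init, p_learn, p_guess, p_slip in itertools.product(GRID, repeat=4):
        # Common constraint against degenerate fits: guess and slip below 0.5.
        if p_guess >= 0.5 or p_slip >= 0.5:
            continue
        sse = 0.0
        for seq in sequences:
            preds = predict_sequence(seq, p_init, p_learn, p_guess, p_slip)
            sse += sum((obs - p) ** 2 for obs, p in zip(seq, preds))
        if sse < best_sse:
            best_params, best_sse = (p_init, p_learn, p_guess, p_slip), sse
    return best_params, best_sse

# Hypothetical correctness sequences (one list per student, same skill).
sequences = [[0, 0, 1, 1, 1], [0, 1, 1, 1, 1], [1, 1, 1, 1, 1], [0, 0, 0, 1, 1]]
params, sse = fit_bkt(sequences)
print("P(L0), P(T), P(G), P(S) =", params, "SSE =", round(sse, 3))
```

Once fitted, the per-opportunity P(known) estimates can serve as engineered features for later models.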
Tools of the Trade – Analysis and Modeling
Associated Tasks:
• Developing models to draw conclusions about data
• Generating predictive models
• Examining clusters and relationships within data
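For the clustering task, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The student features, hints per problem and fraction correct, are hypothetical. In real analyses you would usually standardize features first, because k-means is distance-based. You would also typically run a tool such as Orange, RapidMiner, or scikit-learn's `KMeans` rather than hand-rolled code.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical student-level features: (hints per problem, fraction correct).
students = [(0.1, 0.9), (0.2, 0.85), (0.15, 0.95), (2.5, 0.4), (2.8, 0.35), (3.0, 0.45)]
centroids, clusters = kmeans(students, k=2)
```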
Tools of the Trade – Analysis and Modeling
Tools:
• RapidMiner
• Orange
• SPSS
• Python/Java/R
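To show what "generating a predictive model" looks like in Python, here is a minimal sketch of logistic regression trained by batch gradient descent and checked on held-out students. The features, outcome, learning rate, and epoch count are all hypothetical choices. A real study would use more data, cross-validation, and a library implementation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # d(log-loss)/d(score)
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Hypothetical student-level features: (hints per problem, fraction correct),
# with a binary outcome (1 = passed the post-test).
X_train = [(0.1, 0.9), (0.3, 0.8), (0.2, 0.95), (2.0, 0.4), (2.5, 0.3), (1.8, 0.5)]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [(0.25, 0.85), (2.1, 0.4)]  # held-out students
y_test = [1, 0]

w, b = train_logistic(X_train, y_train)
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```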
Tools of the Trade – Analysis and Modeling (RapidMiner)
Tools of the Trade – Analysis and Modeling (Orange)
Tools of the Trade – Visualization
Associated Tasks:
• Consolidate and present information in a meaningful and interpretable way
Tools of the Trade – Visualization
Tools:
• Tableau
• Python/Java/R
A Brief Note On Text Analysis
Associated Tasks:
• Analyze the structure of words and sentences
• Measure word co-occurrence
• Model latent topics in text
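Word co-occurrence, the simplest of these tasks, can be sketched in a few lines of Python. This minimal example counts how many documents contain each unordered pair of distinct words. The forum posts are hypothetical, and the tokenizer is a deliberately crude regular expression with no stop-word removal. Tools such as Stanford CoreNLP handle tokenization properly, and topic modeling is better left to tools such as MALLET.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def cooccurrence(documents):
    """Count how many documents contain each unordered pair of distinct words."""
    pair_counts = Counter()
    for doc in documents:
        vocab = sorted(set(tokenize(doc)))  # sorted so each pair has one canonical order
        for pair in combinations(vocab, 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical student forum posts.
posts = [
    "I need a hint for this fraction problem",
    "The hint helped with the fraction problem",
    "Stuck on the graph problem",
]
counts = cooccurrence(posts)
print(counts.most_common(3))
```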
A Brief Note On Text Analysis
Tools:
• TAALES
• MALLET
• Stanford CoreNLP
• Seriously, so much else
A Brief Note On Programming
Do I need to know how to program to be a data miner/learning analyst?
No! …but it really helps.
Questions and Thanks!
[email protected]