
1. Introductory Videos 1.1. 1.2. What is “appliedaicourse.com” all about? 1.2.1. Parts of the Applied AI Course 1.2.2. It has two online courses (AI Course and AI Projects) 1.2.3. You can take the course without any prerequisites 1.2.4. Customer service, team 1.2.5. AI requires a fair amount of mathematical knowledge, but it is our job to simplify the math and explain it using geometry 1.2.6. Balance between theory and practice 1.2.7. Brief introduction to the AI Course (everything explained from scratch using real-world examples) 1.2.8. We use Amazon, Facebook, Quora and Netflix datasets to solve real-world problems end to end. 1.2.9. You will be able to build productionizable models by the end of the AI Course. 1.2.10. Build a portfolio to strengthen your resume for AI engineer positions 1.2.11. Brief introduction to the AI Projects (everything explained from scratch using real-world examples, with a real-world problem solved end to end) 1.2.12. Problem statements of a few projects, e.g. Facebook recommendation, Quora, Netflix 1.2.13. 20 hours of free content to get a feel for the teaching style 1.2.14. At least 10 hours of new content uploaded every week 1.3. What is AI? 1.4. Why learn it? 1.4.1. One of the most important areas of science, with huge implications in everything from buying clothes online to cancer treatment. AI has changed massively in the last 5 years, with massive impact. Examples: 20% of Google mobile searches are voice driven; 35% of Amazon sales are generated by product recommendations; breast cancer diagnosis is almost as good as the best doctors, and better when AI algorithms and doctors work together.

From my experience, there is a massive gap between the supply of and demand for engineers who can build AI systems.

1.5. Who is teaching this course? 1.5.1. Srikanth Varma Chekuri: instructor, with extensive experience building cutting-edge AI models at top-tier companies like Amazon and Yahoo. 1.5.2. Most recently: Sr. Scientist at Amazon.com in Silicon Valley. 1.5.3. Co-founder of Matherix Labs. 1.5.4. Research Engineer at Yahoo Labs, Bangalore. 1.5.5. Indian Institute of Science, Bangalore. 1.5.6. SVPCE, a small engineering college in Visakhapatnam; GATE Rank 2. 1.5.7. Teaching experience: undergrad friends, offline AI course (last 3 months). 1.5.8. Team: 5 engineers from top companies and institutions like IITs, IIITs and NITs.

1.6. What is the Applied AI Course? 1.6.1. No prerequisites 1.6.2. 140+ hours of content (including basic probability, stats and linear algebra) 1.6.3. The content is highly adaptive: videos are updated based on customer feedback. 1.6.4. Right balance between theory and practice 1.6.5. 10+ case studies on real-world datasets and grading for 5 projects 1.6.6. Brief explanation of the datasets (Facebook, Quora, Netflix, Amazon, ...) 1.6.7. Math explained intuitively using geometry 1.6.8. Deep learning concepts 1.6.9. You get IPython notebooks and relevant code snippets. 1.6.10. Our team of experts will answer your queries. 1.6.11. You will be able to build productionizable models by the end of the AI Course. 1.6.12. Build a portfolio.

1.7. What is AI Projects? 1.7.1. No prerequisites 1.7.2. 100+ hours of content (including basic probability, stats and linear algebra) 1.7.3. The content is highly adaptive: videos are updated based on customer feedback. 1.7.4. Right balance between theory and practice 1.7.5. Math explained intuitively using geometry 1.7.6. Brief introduction to the AI Projects (everything explained from scratch using real-world examples, with a real-world problem solved end to end)

1.7.7. Problem statements of projects, e.g. Facebook recommendation, Quora, Netflix. 1.7.8. You get IPython notebooks and relevant code snippets. 1.7.9. Our team of experts will answer your queries.

1.8. What are the job opportunities if I learn AI?

1.9. How is this different from other online AI/ML courses or university courses? 1.9.1. Fine balance between theory and practice; check out our sample videos. 1.9.2. Students tend to give up when the math gets dense; we do not water it down. 1.9.3. 10+ real-world case studies, end to end from raw data to a deployable model. 1.9.4. A portfolio for each course participant. 1.9.5. Obsessive customer support.

1.10. What are the expected outcomes for a course participant? 1.10.1. Working professionals: lateral moves. 1.10.2. Students: entry-level jobs in data science, machine learning and AI, with salaries ranging from 6 LPA to 25 LPA.

1.11. Who is this course designed for? 1.11.1. Willingness to learn AI. 1.11.2. 8-10 hours per week over a 3-6 month period. 1.11.3. No prerequisites except familiarity with any programming language.

1.12. Will you give me a certificate and grade my work? 1.12.1. 5 real-world case studies will be graded. 1.12.2. A+: top-tier product-based companies 1.12.3. A: product-based startups 1.12.4. B+: top-tier services companies 1.12.5. B: other services companies/startups 1.12.6. No grade below B 1.12.7. Strict grading policy; no grade dilution. 1.13. Will I get a job at the end of this course and will you help me get it? 1.13.1. We are working on collaborations with recruiting companies; no guarantees. 1.13.2. We will connect and refer you to various recruiting companies. 1.13.3. It depends on how well you build your portfolio.

1.14. What is your teaching methodology? 1.14.1. Very informal: like explaining to a friend 1.14.2. Seven-step methodology: 1. Why 2. What problems can be solved 3. Concept [toy example, basic math, notation, intuition, real-world example] 4. Diagram 5. Limitations 6. Workarounds 7. Code 1.14.3. Balance between theory and practice. 1.14.4. Google search, Wikipedia and other web sources (references will be provided). 1.15. What are the course contents? 1.16. How is AAIC different from other online courses?

2. Python for ML/AI 2.1. Why Python? 2.2. Setup 2.2.1. Install Python. 2.2.2. Installing packages (numpy, pandas, scipy, matplotlib, seaborn, sklearn) 2.2.3. IPython setup. 2.3. Introduction 2.3.1. Keywords and Identifiers 2.3.2. Statements, Indentation and Comments 2.3.3. Variables and Datatypes 2.3.4. Input and Output 2.3.5. Operators 2.4. Flow Control 2.4.1. If...else 2.4.2. while loop 2.4.3. for loop 2.4.4. break and continue 2.5. Data Structures 2.5.1. Lists 2.5.2. Tuples 2.5.3. Dictionary 2.5.4. Strings 2.5.5. Sets

2.6. Functions 2.6.1. Introduction 2.6.2. Types of functions 2.6.3. Function Arguments 2.6.4. Recursive Functions 2.6.5. Lambda Functions 2.6.6. Modules 2.6.7. Packages 2.7. File Handling 2.8. Exception Handling 2.9. Debugging Python 2.10. NumPy 2.10.1. Introduction to NumPy. 2.10.2. Numerical operations. 2.11. Matplotlib 2.12. Pandas 2.12.1. Getting started with pandas 2.12.2. Data Frame Basics 2.12.3. Key Operations on Data Frames. 2.13. Computational Complexity: an Introduction 2.13.1. Space and Time Complexity: Find the largest number in a list 2.13.2. Binary search (see the sketch after this block) 2.13.3. Find elements common to two lists. 2.13.4. Find elements common to two lists using a Hashtable/Dict 2.13.5. Further reading about Computational Complexity [Please add a section with these links for reference] 2.13.5.1. https://medium.com/omarelgabrys-blog/the-big-scary-o-notation-ce9352d827ce 2.13.5.2. https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/ 2.13.5.3. https://medium.freecodecamp.org/time-is-complex-but-priceless-f0abd015063c
3. Plotting for exploratory data analysis (EDA) 3.1. Iris dataset 3.1.1. Data-point, vector, observation 3.1.2. Dataset 3.1.3. Input variables/features/dimensions/independent variable 3.1.4. Output variable/class label/response label/dependent variable 3.1.5. Objective: Classification. 3.2. Scatter-plot: 2D, 3D. 3.3. Pair plots. 3.4. PDF, CDF, Univariate analysis. 3.4.1. Histogram and PDF 3.4.2. Univariate analysis using PDFs.
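As a quick illustration of the complexity topics in 2.13.2-2.13.4 above, here is a minimal, self-contained Python sketch (function names are my own, not course material): binary search is O(log n) on a sorted list, and finding common elements drops from O(n*m) with nested loops to roughly O(n+m) with a set/hashtable.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent. O(log n)."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def common_elements(a, b):
    """Elements present in both lists, via a set (hashtable): O(len(a) + len(b))."""
    seen = set(a)                       # O(len(a)) extra space
    return [x for x in b if x in seen]  # average O(1) membership test per element

print(binary_search([1, 3, 5, 7, 9], 7))          # 3
print(common_elements([1, 2, 3, 4], [3, 4, 5]))   # [3, 4]
```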

4.

3.4.3. Cumulative distribution function (CDF) 3.5. Mean , Variance, Std-dev 3.6. Median, Percentiles, Quantiles, IQR, MAD and Outliers. 3.7. Box-plot with whiskers 3.8. Violin plots. 3.9. Summarizing plots. 3.10. Univariate, Bivariate and Multivariate analysis. 3.11. Multivariate probability density, contour plot. 3.12. Exercise: Perform EDA on Haberman dataset. Probability and Statistics 4.1. Introduction to Probability and Stats 4.1.1. Why learn it? 4.1.2. P(X=x1) , Dice and coin example 4.1.3. Random variables: discrete and continuous. 4.1.4. Outliers (or) extreme points. 4.1.5. Population & Sample. 4.2. Gaussian/Normal Distribution 4.2.1. Examples: Heights and weights. 4.2.2. Why learn about distributions. 4.2.3. Mu, sigma: Parameters 4.2.4. PDF (iris dataset) 4.2.5. CDF 4.2.6. 1-std-dev, 2-std-dev, 3-std-dev range. 4.2.7. Symmetric distribution, Skewness and Kurtosis 4.2.8. Standard normal variate (z) and standardization. 4.2.9. Kernel density estimation. 4.2.10. Sampling distribution & Central Limit theorem. 4.2.11. Q-Q Plot: Is a given random variable Gaussian distributed? 4.3. Uniform Distribution and random number generators 4.3.1. Discrete and Continuous Uniform distributions. 4.3.2. How to randomly sample data points. [UniformDisb.ipynb] 4.4. Bernoulli and Binomial distribution 4.5. Log-normal and power law distribution: 4.5.1. Log-normal: CDF, PDF, Examples. 4.5.2. Power-law & Pareto distributions: PDF, examples 4.5.3. Converting power law distributions to normal: Box-Cox/Power transform. 4.6. Correlation 4.6.1. Co-variance 4.6.2. Pearson Correlation Coefficient 4.6.3. Spearman Rank Correlation Coefficient 4.6.4. Correlation vs Causation 4.7. Confidence Intervals 4.7.1. Confidence Interval vs Point estimate.
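A minimal sketch of the univariate analysis in 3.4 above (histogram-based PDF and CDF); it assumes seaborn's bundled iris dataset and its column names, which are illustrative stand-ins for the course's own notebooks.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")              # columns: petal_length, species, ...
x = iris["petal_length"].values

# Histogram as an estimate of the PDF, then cumulative sum for the CDF.
counts, bin_edges = np.histogram(x, bins=20, density=True)
pdf = counts / counts.sum()
cdf = np.cumsum(pdf)

plt.plot(bin_edges[1:], pdf, label="PDF (binned)")
plt.plot(bin_edges[1:], cdf, label="CDF")
plt.xlabel("petal_length")
plt.legend()
plt.show()
```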

5.

6.

4.7.2. Computing confidence-interval given a distribution. 4.7.3. For mean of a random variable 4.7.3.1. Known Standard-deviation: using CLT 4.7.3.2. Unknown Standard-deviation: using t-distribution 4.7.4. Confidence Interval using empirical bootstrap [BootstrapCI.ipynb] 4.8. Hypothesis testing 4.8.1. Hypothesis Testing methodology, Null-hypothesis, test-statistic, p-value. 4.8.2. Resampling and permutation test. 4.8.3. K-S Test for similarity of two distributions. 4.8.4. Code Snippet [KSTest.ipynb] Linear Algebra 5.1. Why learn it ? 5.2. Fundamentals 5.2.1. Point/Vector (2-D, 3-D, n-D) 5.2.2. Dot product and angle between 2 vectors. 5.2.3. Projection, unit vector 5.2.4. Equation of a line (2-D), plane(3-D) and hyperplane (n-D) 5.2.5. Distance of a point from a plane/hyperplane, half-spaces 5.2.6. Equation of a circle (2-D), sphere (3-D) and hypersphere (n-D) 5.2.7. Equation of an ellipse (2-D), ellipsoid (3-D) and hyperellipsoid (n-D) 5.2.8. Square, Rectangle, Hyper-cube and Hyper-cuboid.. Dimensionality reduction and Visualization: 6.1. What is dimensionality reduction? 6.2. Data representation and pre-processing 6.2.1. Row vector, Column vector: Iris dataset example. 6.2.2. Represent a dataset: D= {x_i, y_i} 6.2.3. Represent a dataset as a Matrix. 6.2.4. Data preprocessing: Column Normalization 6.2.5. Mean of a data matrix. 6.2.6. Data preprocessing: Column Standardization 6.2.7. Co-variance of a Data Matrix. 6.3. MNIST dataset (784 dimensional) 6.3.1. Explanation of the dataset. 6.3.2. Code to load this dataset. 6.4. Principal Component Analysis. 6.4.1. Why learn it. 6.4.2. Geometric intuition. 6.4.3. Mathematical objective function. 6.4.4. Alternative formulation of PCA: distance minimization 6.4.5. Eigenvalues and eigenvectors. 6.4.6. PCA for dimensionality reduction and visualization. 6.4.7. Visualize MNIST dataset. 6.4.8. Limitations of PCA
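A small sketch of the empirical-bootstrap confidence interval in 4.7.4 above for the mean of a sample, using only NumPy; the 95% level, 10,000 resamples, and the synthetic "heights" data are illustrative choices, not from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=170, scale=10, size=200)   # stand-in data, e.g. heights in cm

# Resample with replacement and recompute the mean many times.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])

# 95% bootstrap CI = 2.5th and 97.5th percentiles of the bootstrap distribution.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {sample.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```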

7.

8.

6.4.9. Code example. 6.4.10. PCA for dimensionality reduction (not-visualization) 6.5. T-distributed stochastic neighborhood embedding (t-SNE) 6.5.1. What is t-SNE? 6.5.2. Neighborhood of a point, Embedding. 6.5.3. Geometric intuition. 6.5.4. Crowding problem. 6.5.5. How to apply t-SNE and interpret its output (distill.pub) 6.5.6. t-SNE on MNIST. 6.5.7. Code example. Real world problem: Predict sentiment polarity given product reviews on Amazon. 7.1. Exploratory Data Analysis. 7.1.1. Dataset overview: Amazon Fine Food reviews 7.1.2. Data Cleaning: Deduplication. 7.2. Featurizations: convert text to numeric vectors. 7.2.1. Why convert text to a vector? 7.2.2. Bag of Words (BoW) 7.2.3. Text Preprocessing: Stemming, Stop-word removal, Tokenization, Lemmatization. 7.2.4. uni-gram, bi-gram, n-grams. 7.2.5. tf-idf (term frequency- inverse document frequency) [7.2.5 a] [New Video] Why use log in IDF? 7.2.6. Word2Vec. 7.2.7. Avg-Word2Vec, tf-idf weighted Word2Vec 7.3. Code samples 7.3.1. Bag of Words. 7.3.2. Text Preprocessing 7.3.3. Bi-Grams and n-grams. 7.3.4. TF-IDF 7.3.5. Word2Vec 7.3.6. Avg-Word2Vec and TFIDF-Word2Vec 7.4. Exercise: t-SNE visualization of Amazon reviews with polarity based color-coding Classification and Regression Models: K-Nearest Neighbors 8.1. Foundations 8.1.1. How “Classification” works? 8.1.2. Data matrix notation. 8.1.3. Classification vs Regression (examples) 8.2. K-Nearest Neighbors 8.2.1. Geometric intuition with a toy example. 8.2.2. Failure cases. 8.2.3. Distance measures: Euclidean(L2) , Manhattan(L1), Minkowski, Hamming
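A compact sketch of the Bag-of-Words and tf-idf featurizations in 7.2.2/7.2.4/7.2.5 above using scikit-learn; the toy reviews are made up for illustration and the course's own notebooks (7.3.x) cover the full pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

reviews = [
    "good taste and great price",
    "taste was bad, price too high",
    "great product, will buy again",
]

bow = CountVectorizer(ngram_range=(1, 2))   # uni-grams + bi-grams (7.2.4)
X_bow = bow.fit_transform(reviews)          # sparse document-term count matrix

tfidf = TfidfVectorizer()                   # tf * idf weighting (7.2.5)
X_tfidf = tfidf.fit_transform(reviews)

print(X_bow.shape, X_tfidf.shape)
print(sorted(bow.vocabulary_)[:5])          # a few learned uni-/bi-gram features
```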

9.

8.2.4. Cosine Distance & Cosine Similarity 8.2.5. How to measure the effectiveness of k-NN? 8.2.6. Simple implementation: 8.2.6.1. Test/Evaluation time and space complexity. 8.2.6.2. Limitations. 8.2.7. Determining the right “k” 8.2.7.1. Decision surface for K-NN as K changes. 8.2.7.2. Overfitting and Underfitting. 8.2.7.3. Need for Cross validation. 8.2.7.4. K-fold cross validation. [NEW]8.2.7.4 a Visualizing train, validation and test datasets 8.2.7.5. How to determine overfitting and underfitting? 8.2.7.6. Time based splitting 8.2.8. k-NN for regression. 8.2.9. Weighted k-NN 8.2.10. Voronoi diagram. 8.2.11. kd-tree based k-NN: 8.2.11.1. Binary search tree 8.2.11.2. How to build a kd-tree. 8.2.11.3. Find nearest neighbors using kd-tree 8.2.11.4. Limitations. 8.2.11.5. Extensions. 8.2.12. Locality sensitive Hashing (LSH) 8.2.12.1. Hashing vs LSH. 8.2.12.2. LSH for cosine similarity 8.2.12.3. LSH for euclidean distance. 8.2.13. Probabilistic class label 8.2.14. Code Samples for K-NN 8.2.14.1. Decision boundary. [./knn/knn.ipynb and knn folder] 8.2.14.2. Cross Validation.[./knn/kfold.ipynb and knn folder] 8.2.15. Exercise: Apply k-NN on Amazon reviews dataset. Classification algorithms in various situations: 9.1. Introduction 9.2. Imbalanced vs balanced dataset. 9.3. Multi-class classification. 9.4. k-NN, given a distance or similarity matrix 9.5. Train and test set differences. 9.6. Impact of Outliers 9.7. Local Outlier Factor. 9.7.1. Simple solution: mean dist to k-NN. 9.7.2. k-distance (A), N(A) 9.7.3. reachability-distance(A, B) 9.7.4. Local-reachability-density(A)
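A minimal scikit-learn sketch of the k-NN workflow in 8.2.5-8.2.7 above, choosing k by k-fold cross-validation; the iris dataset and the candidate k values are illustrative stand-ins for the Amazon-reviews exercise.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try a few odd values of k and keep the one with the best CV accuracy (8.2.7.4).
for k in [1, 3, 5, 7, 9, 11]:
    clf = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(clf, X, y, cv=5)     # 5-fold cross validation
    print(f"k={k:2d}  mean CV accuracy = {scores.mean():.3f}")
```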

9.7.5. LOF(A) 9.8. Impact of Scale & Column standardization. 9.9. Interpretability 9.10. Feature importance & Forward Feature Selection 9.11. Handling categorical and numerical features. 9.12. Handling missing values by imputation. 9.13. Curse of dimensionality. [26:00] 9.14. Bias-Variance tradeoff. [23:30] 9.14a Intuitive understanding of bias-variance. [6:00] 9.15. Best and worst cases for an algorithm. [6:00] 10. Performance measurement of models: 10.1. Accuracy [14:15] 10.2. Confusion matrix, TPR, FPR, FNR, TNR [24:00] 10.3. Precision & recall, F1-score. [9:00] 10.4. Receiver Operating Characteristic Curve (ROC) curve and AUC. [18:30] 10.5. Log-loss. [11:15] 10.6. R-Squared/ Coefficient of determination. [13:30] 10.7. Median absolute deviation (MAD) [5:00] 10.8. Distribution of errors. [6:30] 11. Naive Bayes 11.1. Conditional probability. [12:30] 11.2. Independent vs Mutually exclusive events. [6:00] 11.3. Bayes Theorem with examples. [16:30] 11.4. Exercise problems on Bayes Theorem. [Take from Bramha] 11.5. Naive Bayes algorithm. [26:00] 11.6. Toy example: Train and test stages. [25:30] 11.7. Naive Bayes on Text data. [15:00] 11.8. Laplace/Additive Smoothing. [23:30] 11.9. Log-probabilities for numerical stability. [11:00] 11.10. Cases: 11.10.1. Bias and Variance tradeoff. [13:30] 11.10.2. Feature importance and interpretability. [10:00] 11.10.3. Imbalanced data. [13:30] 11.10.4. Outliers. [5:00] 11.10.5. Missing values. [3:00] 11.10.6. Handling Numerical features (Gaussian NB) [13:00] 11.10.7. Multiclass classification. [2:00] 11.10.8. Similarity or Distance matrix. [2:30] 11.10.9. Large dimensionality. [2:00] 11.10.10. Best and worst cases. [7:30] 11.11. Code example [7:00] 11.12. Exercise: Apply Naive Bayes to Amazon reviews. [5:30] 12. Logistic Regression:

12.1. Geometric intuition. [31:00] 12.2. Sigmoid function & Squashing [36:30] 12.3. Optimization problem. [23:30] 12.4. Weight vector. [10:00] 12.5. L2 Regularization: Overfitting and Underfitting. [25:30] 12.6. L1 regularization and sparsity. [10:30] 12.7. Probabilistic Interpretation: Gaussian Naive Bayes [19:00] Description Link: Refer to section 3.1 of https://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf 12.8. Loss minimization interpretation [18:30] 12.9. Hyperparameter search: Grid Search and Random Search [16:00] 12.10. Column Standardization. [4:30] 12.11. Feature importance and model interpretability. [13:30] 12.12. Collinearity of features. [14:00] 12.13. Train & Run time space and time complexity. [10:00] 12.14. Real world cases. [10:30] 12.15. Non-linearly separable data & feature engineering. [27:30] 12.16. Code sample: Logistic regression, GridSearchCV, RandomSearchCV [Code link in description: LogisticRegression.ipynb] [23:00] 12.17. Exercise: Apply Logistic regression to the Amazon reviews dataset. [5:30] 12.18. Extensions to Logistic Regression: Generalized linear models (GLM) [8:00] Description link: Refer to Part III of http://cs229.stanford.edu/notes/cs229-notes1.pdf Total hrs till here: [52 hours approximately: total 3117 minutes]
13. Linear Regression and Optimization. 13.1. Geometric intuition. [13:00] 13.2. Mathematical formulation. [13:30] 13.3. Cases. [8:00] 13.4. Code sample. [12:30] CODE: Description Link to Linear Regression.ipynb 13.5. Solving optimization problems 13.5.1. Differentiation. [28:00] 13.5.1a Online differentiation tools [8:00] 13.5.2. Maxima and Minima [12:00] 13.5.3. Vector calculus: Grad [9:30] 13.5.4. Gradient descent: geometric intuition. [18:00] 13.5.5. Learning rate. [7:30] 13.5.6. Gradient descent for linear regression. [7:30] 13.5.7. SGD algorithm. [9:00] 13.5.8. Constrained optimization & PCA [14:00] 13.5.9. Logistic regression formulation revisited. [5:30] 13.5.10. Why L1 regularization creates sparsity? [17:00]

13.5.11. Exercise: Implement SGD for linear regression. [6:00]
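A bare-bones NumPy sketch of the exercise in 13.5.11 and the SGD idea in 13.5.6-13.5.7: stochastic gradient descent for linear regression with squared loss. The synthetic data, learning rate and epoch count are arbitrary illustrative choices, not a reference solution.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_b + 0.1 * rng.normal(size=500)   # noisy linear data

w, b, lr = np.zeros(3), 0.0, 0.01
for epoch in range(20):
    for i in rng.permutation(len(X)):    # one point at a time: stochastic GD
        err = (X[i] @ w + b) - y[i]      # gradient of 0.5 * err**2 w.r.t. (w, b)
        w -= lr * err * X[i]
        b -= lr * err

print("learned w:", np.round(w, 2), " b:", round(b, 2))   # close to true_w, true_b
```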

Chapter content: 189 mins Number of mins of content till here: 3306 mins ~ 55.1 hrs

14. Support Vector Machines (SVM) 14.1. Geometric intuition. [19:00] 14.2. Mathematical derivation. [31:00] 14.3. Loss minimization: Hinge Loss. [14:00] 14.4. Dual form of SVM formulation. [15:00] Reference for Primal and Dual equivalence: http://cs229.stanford.edu/notes/cs229-notes3.pdf 14.5. Kernel trick. [10:00] 14.6. Polynomial kernel. [11:00] 14.7. RBF-Kernel. [21:00] 14.8. Domain specific Kernels. [6:00] 14.9. Train and run time complexities. [7:30] 14.10. nu-SVM: control errors and support vectors. [6:00] 14.11. SVM Regression. [7:30] 14.12. Cases. [9:00] 14.13. Code Sample. [13:30] 14.14. Exercise: Apply SVM to the Amazon reviews dataset. [3:30] Chapter contents: 174 mins Total time till here: 3480 mins (58 hrs)
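A short scikit-learn sketch of the kernel ideas in 14.5-14.7 above: the same SVC fit with a polynomial and an RBF kernel on non-linearly separable toy data. The C, degree and gamma values are illustrative, not tuned.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel, params in [("poly", {"degree": 3}), ("rbf", {"gamma": 1.0})]:
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X_tr, y_tr)
    print(kernel, "test accuracy:", round(clf.score(X_te, y_te), 3))
```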

15. Decision Trees 15.1. Geometric Intuition: Axis parallel hyperplanes. [16:30] 15.2. Sample Decision tree. [7:30] Refer in Description: http://homepage.cs.uri.edu/faculty/hamel/courses/2016/spring2016/csc581/lecture-notes/32-decision-trees.pdf 15.3. Building a decision Tree: 15.3.1. Entropy [18:00] 15.3.1a Intuition behind entropy 15.3.2. Information Gain [9:30] 15.3.3. Gini Impurity. [7:00] 15.3.4. Constructing a DT. [20:30] 15.3.5. Splitting numerical features. [7:30] 15.3.5a Feature standardization. [4:00] 15.3.6. Categorical features with many possible values. [6:30] 15.4. Overfitting and Underfitting. [7:00]
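To make 15.3.1-15.3.2 above concrete, a tiny NumPy sketch that computes entropy and the information gain of a candidate split; the toy labels and the perfect split are made up for illustration.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(Y) = -sum(p * log2(p)) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, left, right):
    """IG = H(parent) - weighted average entropy of the two child nodes."""
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]           # a perfect split of the labels
print(entropy(parent), information_gain(parent, left, right))   # 1.0 1.0
```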

15.5. Train and Run time complexity. [6:30] 15.6. Regression using Decision Trees. [9:00] 15.7. Cases [12:00] 15.8. Code Samples. [8:30] 15.9. Exercise: Decision Trees on Amazon reviews dataset. [2:00] Chapter contents: 142 mins Total time till here: 3622 mins (60 hrs 22 mins) 16.

Ensemble Models: 16.1. What are ensembles? [5:30] 16.2. Bootstrapped Aggregation (Bagging) 16.2.1. Intuition [17:00] 16.2.2. Random Forest and their construction. [14:30] 16.2.3. Bias-Variance tradeoff. [6:30] 16.2.4. Train and Run-time Complexity.[8:30] 16.2.5. Code Sample. [3:30] 16.2.6. Extremely randomized trees.[8:00] 16.2.7. Cases [5:30] 16.3. Boosting: 16.3.1. Intuition [16:30] 16.3.2. Residuals, Loss functions and gradients. [12:30] 16.3.3. Gradient Boosting [10:00] 16.3.4. Regularization by Shrinkage. [7:00] 16.3.5. Train and Run time complexity. [6:00] 16.3.6. XGBoost: Boosting + Randomization [13:30] 16.3.7. AdaBoost: geometric intuition.[7:00] 16.4. Stacking models.[21:30] Description Link Refer: https://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/ 16.5. Cascading classifiers. [14:30] 16.6. Kaggle competitions vs Real world. [9:00] 16.7. Exercise: Apply GBDT and RF to Amazon reviews dataset. [3:30] Chapter Contents: 190 minutes Time till here: 3812 minutes (63 hrs 22 mins)
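A minimal scikit-learn sketch contrasting bagging (16.2.2 above) with boosting (16.3.3): a Random Forest and a gradient-boosted model evaluated on the same data. The dataset and hyperparameters are illustrative defaults, not the course's settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging + per-split feature randomness vs. sequential boosting with shrinkage (16.3.4).
rf = RandomForestClassifier(n_estimators=200, random_state=0)
gbdt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=0)

for name, model in [("Random Forest", rf), ("GBDT", gbdt)]:
    print(name, "CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```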

17. Featurizations and Feature engineering. 17.1. Introduction. [14:30] 17.2. Time-series data. 17.2.1. Moving window. [14:00] 17.2.2. Fourier decomposition. [22:00] 17.2.3. Deep learning features: LSTM [7:00] 17.3. Image data.

17.3.1. Image histogram. [15:00] 17.3.2. Keypoints: SIFT. [9:00] 17.3.3. Deep learning features: CNN [4:00] 17.4. Relational data. [9:30] 17.5. Graph data. [11:30] 17.6. Feature Engineering. 17.6.1. Indicator variables. [6:30] 17.6.2. Feature binning. [14:00] 17.6.3. Interaction variables. [8:00] 17.6.4. Mathematical transforms. [4:00] 17.7. Model specific featurizations. [8:30] 17.8. Feature orthogonality. [11:00] 17.9. Domain specific featurizations. [3:30] 17.10. Feature slicing. [9:00] 17.11. Kaggle Winners solutions. [7:00] Chapter Contents: 178 minutes Time till here: 3990 minutes (66 hrs 30 mins) 17a. Miscellaneous Topics 17a.1 Calibration of Models. 17a.1.1 Need for calibration. 17a.1.2 Calibration Plots. http://scikit-learn.org/stable/modules/calibration.html 17a.1.3 Platt’s Calibration/Scaling. https://en.wikipedia.org/wiki/Platt_scaling 17a.1.4 Isotonic Regression http://scikit-learn.org/stable/modules/isotonic.html 17a.1.5 Code Samples http://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html http://scikit-learn.org/stable/auto_examples/calibration/plot_calibration.html#sphx-glr-auto-examples-calibration-plot-calibration-py 17a.1.6 Exercise: Calibration + Naive Bayes. 17.12. Modeling in the presence of outliers: RANSAC 17.13. Productionizing models. 17.14. Retraining models periodically as needed. 17.15. A/B testing. 17.16. VC Dimensions.
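A small sketch of the calibration topics in 17a.1.3-17a.1.5 above: wrapping a Naive Bayes model in scikit-learn's CalibratedClassifierCV with Platt (sigmoid) and isotonic calibration. The synthetic dataset, cv=3, and the Brier score as the comparison metric are illustrative choices.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for method in ["sigmoid", "isotonic"]:   # Platt scaling vs. isotonic regression
    cal = CalibratedClassifierCV(GaussianNB(), method=method, cv=3).fit(X_tr, y_tr)
    prob = cal.predict_proba(X_te)[:, 1]
    print(method, "Brier score:", round(brier_score_loss(y_te, prob), 4))
```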

18. Unsupervised learning/Clustering: K-Means (2) 18.1. What is Clustering? [9:30]

18.1a Unsupervised learning [3:30] 18.2. Applications. [15:00] 18.3. Metrics for Clustering. [12:30] 18.4. K-Means 18.4.1. Geometric intuition, Centroids [8:00] 18.4.2. Mathematical formulation: Objective function [10:30] 18.4.3. K-Means Algorithm. [10:30] 18.4.4. How to initialize: K-Means++ [24:00] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 18.4.5. Failure cases/Limitations. [11:00] 18.4.6. K-Medoids [18:30] 18.4.7. Determining the right K. [4:30] 18.4.8. Time and space complexity. [3:30] 18.4.9. Code Samples [6:30] 18.4.10. Exercise: Cluster Amazon reviews. [5:00] Chapter Contents: 142.5 minutes Time till here: 4132.5 minutes (~ 69 hrs)
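A minimal scikit-learn sketch of 18.4 above: fit K-Means for a few values of K and inspect the inertia (the K-Means objective) to pick a reasonable K, as in 18.4.7. The synthetic blob data and the K range are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)

# K-Means++ initialization (18.4.4) is scikit-learn's default init.
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}  inertia={km.inertia_:.1f}")   # inertia drops sharply up to the true k
```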

19. Hierarchical clustering 19.1. Agglomerative & Divisive, Dendrograms [13:30] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 19.2. Agglomerative Clustering. [8:00] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 19.3. Proximity methods: Advantages and Limitations. [24:00] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 19.4. Time and Space Complexity. [4:00] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 19.5. Limitations of Hierarchical Clustering. [5:00] 19.6. Code sample. [2:30] Refer: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering Refer: http://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_clustering.html#sphx-glr-auto-examples-cluster-plot-agglomerative-clustering-py 19.7. Exercise: Amazon food reviews. [2:30] Chapter Contents: 59.5 minutes Time till here: 4192 minutes (~ 70 hrs)

20. DBSCAN (Density based clustering) 20.1. Density based clustering [4:30] 20.2. MinPts and Eps: Density [5:30] 20.3. Core, Border and Noise points. [6:30] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 20.4. Density edge and Density connected points. [5:00] 20.5. DBSCAN Algorithm. [11:00] 20.6. Hyper Parameters: MinPts and Eps. [9:30] 20.7. Advantages and Limitations of DBSCAN. [8:30] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf Refer: https://en.wikipedia.org/wiki/DBSCAN#Advantages 20.8. Time and Space Complexity. [3:00] 20.9. Code samples. [2:30] Refer: http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py Refer: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html 20.10. Exercise: Amazon Food reviews. [3:00] Chapter Contents: 59 minutes Time till here: 4251 minutes (~ 71 hrs)
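A short scikit-learn sketch of 20.5-20.6 above: DBSCAN labels dense clusters and marks noise points as -1, controlled by the eps and min_samples hyperparameters. The two-moons data and the parameter values are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.08, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)   # exclude the noise label
print("clusters found:", n_clusters, "| noise points:", int(np.sum(labels == -1)))
```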

21. Recommender Systems and Matrix Factorization. (3) 21.1. Problem formulation: Movie reviews. [23:00] 21.2. Content based vs Collaborative Filtering. [10:30] 21.3. Similarity based Algorithms. [15:30] 21.4. Matrix Factorization: 21.4.1. PCA, SVD [22:30] 21.4.2. NMF [3:00] 21.4.3. MF for Collaborative filtering [22:30] 21.4.4. MF for feature engineering. [8:30] 21.4.5. Clustering as MF [20:30] 21.5. Hyperparameter tuning. [10:00] 21.6. Matrix Factorization for recommender systems: Netflix Prize Solution [30:00] Refer: https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf 21.7. Cold Start problem. [5:30] 21.8. Word Vectors using MF. [20:00] 21.9. Eigen-Faces. [14:00] Refer: https://bugra.github.io/work/notes/2014-11-16/an-introduction-to-unsupervised-learning-scikit-learn/ Refer: http://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html#sphx-glr-auto-examples-decomposition-plot-faces-decomposition-py 21.10. Code example. [11:00] Refer: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html Refer: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html Refer: http://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html#sphx-glr-auto-examples-decomposition-plot-faces-decomposition-py 21.11. Exercise: Word Vectors using Truncated SVD. [6:00] Chapter Contents: 192.5 minutes Time till here: 4443.5 minutes (74 hrs 3.5 mins)
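A compact sketch of the matrix-factorization idea in 21.4/21.10 above: TruncatedSVD factorizes a sparse user-item ratings matrix into low-rank user and item factors whose product approximates the original matrix. The toy ratings matrix and the choice of 2 components are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Rows = users, columns = items, values = ratings (0 means "not rated").
ratings = csr_matrix(np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 5],
], dtype=float))

svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(ratings)   # shape (n_users, 2)
item_factors = svd.components_.T            # shape (n_items, 2)

# The low-rank reconstruction approximates known ratings and scores the zeros.
print(np.round(user_factors @ item_factors.T, 1))
```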

22. Neural Networks. 22.1. History of Neural networks and Deep Learning. 22.2. Diagrammatic representation: Logistic Regression and Perceptron 22.3. Multi-Layered Perceptron (MLP). 22.4. Training an MLP. 22.5. Backpropagation. 22.6. Weight initialization. 22.7. Vanishing Gradient problem. 22.8. Bias-Variance tradeoff. 22.9. Decision surfaces: Playground Refer: playground.tensorflow.org 22.10. TensorFlow and Keras. 22.10.1. Introduction to TensorFlow 22.10.2. Introduction to Keras. 22.10.3. Computational Graph. 22.10.4. Building an MLP from scratch. 22.10.5. MLP in Keras. 22.10.6. GPU vs CPU. 22.10.7. GPUs for Deep Learning. 22.11. Assignment: MNIST using MLP. 23. Deep Multi-layer perceptrons 23.1. Dropout layers & Regularization. 23.2. Rectified Linear Units (ReLU). 23.3. Batch Normalization. 23.4. Optimizers: 23.4.1. Local and Global Optima. 23.4.2. ADAM 23.4.3. RMSProp 23.4.4. AdaGrad 23.5. Gradient Checking. 23.6. Initialization of models. 23.7. Softmax and Cross-entropy for multi-class classification. 23.8. Code Sample: MNIST 23.9. Auto Encoders. 23.10. Word2Vec. 24. Convolutional Neural Nets. 24.1. MLPs on Image data

24.2. Convolution operator. 24.3. Edge Detection on images. 24.4. Convolutional layer. 24.5. Max Pooling & Padding. 24.6. Imagenet dataset. 24.7. AlexNet. 24.8. Residual Network. 24.9. Inception Network. 24.10. Transfer Learning: Reusing existing models. 24.10.1. How to reuse state of the art models for your problem. 24.10.2. Data Augmentation. 24.10.3. Code example: Cats vs Dogs. 25. Long Short-term memory (LSTMs) 25.1. Recurrent Neural Network. 25.2. Backpropagation for RNNs. 25.3. Memory units. 25.4. LSTM. 25.5. GRUs. 25.6. Bidirectional RNN. 25.7. Code example: Predict a stock price using LSTM. 25.8. Sequence to Sequence Models. 26. Case Study: Personalized Cancer Diagnosis. 26.1. Business/Real world problem 26.1.1. Overview. [12:30] Refer: ​LINK TO THE IPYTHON NOTEBOOK IN THE FOLDER 26.1.2. Business objectives and constraints. [11:00] 26.2. ML problem formulation 26.2.1. Data [4:30] 26.2.2. Mapping real world to ML problem. [18:00] 26.2.3. Train, CV and Test data construction.[3:30] 26.3. Exploratory Data Analysis 26.3.1. Reading data & preprocessing [7:00] 26.3.2. Distribution of Class-labels. [6:30] 26.3.3. “Random” Model. [19:00] 26.3.4. Univariate Analysis 26.3.4.1. Gene feature.[34:00] 26.3.4.2. Variation Feature. [18:30] 26.3.4.3. Text feature. [15:00] 26.4. Machine Learning Models 26.4.1. Data preparation. [8:00] 26.4.2. Baseline Model: Naive Bayes[23:00] 26.4.3. K-Nearest Neighbors Classification. [9:00] 26.4.4. Logistic Regression with class balancing [9:30]

26.4.5. Logistic Regression without class balancing [3:00] 26.4.6. Linear-SVM. [6:00] 26.4.7. Random-Forest with one-hot encoded features [6:30] 26.4.8. Random-Forest with response-coded features [5:00] 26.4.9. Stacking Classifier [7:00] 26.4.10. Majority Voting classifier. [4:30] 26.5. Assignments. [4:30]

Chapter Duration: 235.5 Mins (~ 4 hrs)
27. Case studies/Projects: (2*10 =20) 27.1. Amazon fashion discovery engine. 27.2. Malware Detection on Windows OS. 27.3. Song Similarity engine. 27.4. Predict customer propensity to purchase using CRM data. 27.5. Suggest me a movie to watch: Netflix Prize. 27.6. Human Activity Recognition using mobile phone’s accelerometer and gyroscope data. 27.7. Which ad to show to which user: Ad Click prediction. 27.8. … 27.9. …. 27.10.

Introductory Videos 1.1. 1.2. What is “appliedaicourse.com” all about? 1.2.1. Parts of applied AI Course 1.2.2. It has two online courses (AI course and AI Project) 1.2.3. You can learn the course without having any prerequisites 1.2.4. Customer service , team 1.2.5. AI is required more mathematical knowledge but it’s our job to simplify the math part. Explaining the math part using geometry 1.2.6. Balance between theory and practice

1.2.7.

Brief introduction about the AI Course(explained everything from the scratch by using the real world examples ) 1.2.8. We use amazon,facebook,quora,netflix data sets to solve the real world problems end to end. 1.2.9. You will be able to build the productionizable models by end of the AI course. 1.2.10. Build a portfolio ​to strengthen their resume for AI engineer positions 1.2.11. Brief introduction about the AI projects(explained everything from the scratch by using the real world examples and solved a real world problem end to end ) 1.2.12. Problem statements of few projects like fb recommendation, quora, netflix etc. 1.2.13. 20 hours of free content to understand the way of teaching 1.2.14. Will upload at least 10 hours of content in a week 1.3. What is AI? 1.4. why learn it? 1.4.1. One of the most important areas of Science with huge implications in everything from buying clothes online to cancer treatments. AI has massively changed in the last 5 years with massive impact. Examples: Google 20% of mobile phone searches are voice driven. 35% of Amazon sales are generated by product recommendations. Breast cancer diagnosis almost as good as the best doctors. Better if AI algos and doctors work together.

From my experience, there is a massive gap between the supply of and demand for engineers who can build AI systems.

1.5. Who is teaching this course? 1.5.1. Srikanth Varma Chekuri: instructor, with extensive experience building cutting-edge AI models at top-tier companies like Amazon and Yahoo. 1.5.2. Most recently: Sr. Scientist at Amazon.com in Silicon Valley. 1.5.3. Co-founder of Matherix Labs. 1.5.4. Research Engineer at Yahoo Labs, Bangalore. 1.5.5. Indian Institute of Science, Bangalore. 1.5.6. SVPCE, a small engineering college in Visakhapatnam; GATE Rank 2. 1.5.7. Teaching experience: undergrad friends, offline AI course (last 3 months). 1.5.8. Team: 5 engineers from top companies and institutions like IITs, IIITs and NITs.

1.6. What is the Applied AI Course? 1.6.1. No prerequisites 1.6.2. 140+ hours of content (including basic probability, stats and linear algebra) 1.6.3. The content is highly adaptive: we will update the videos based on customer feedback. 1.6.4. Right balance between theory and practice 1.6.5. 10+ case studies on real-world datasets and grading for 5 projects 1.6.6. Brief explanation of the datasets (FB, Quora, Netflix, Amazon, ...) 1.6.7. Explaining math intuitively using geometry 1.6.8. Deep learning concepts 1.6.9. You get iPython notebooks and relevant code snippets. 1.6.10. Our team of experts will answer your queries. 1.6.11. You will be able to build productionizable models by the end of the AI Course. 1.6.12. Build a portfolio.

1.7. What is the AI Projects course? 1.7.1. No prerequisites 1.7.2. 100+ hours of content (including basic probability, stats and linear algebra) 1.7.3. The content is highly adaptive: we will update the videos based on customer feedback. 1.7.4. Right balance between theory and practice 1.7.5. Explaining math intuitively using geometry 1.7.6. Brief introduction to the AI Projects (everything explained from scratch using real-world examples, solving a real-world problem end to end) 1.7.7. Problem statements of projects such as the FB recommendation, Quora and Netflix problems 1.7.8. You get iPython notebooks and relevant code snippets. 1.7.9. Our team of experts will answer your queries.

1.8. What are the job opportunities if I learn AI? 1.8.1.

1.9. How is this different from other AI/ML courses online or university courses? 1.9.1. Fine balance between theory and practice; check out our sample videos. 1.9.2. Students give up when the math gets dense or deep; we do not water it down. 1.9.3. 10+ real-world case studies, end to end, from raw data to a deployable model. 1.9.4. A portfolio for each course participant. 1.9.5. Obsessive customer support.

1.10. What are the expected outcomes for a course participant? 1.10.1. Working professionals: lateral moves. 1.10.2. Students: entry-level jobs in data science, machine learning and AI, with salaries ranging from 6 LPA to 25 LPA.

1.11. Who is this course designed for? 1.11.1. Willingness to learn AI. 1.11.2. 8-10 hours per week over a 3-6 month period. 1.11.3. No prerequisites except familiarity with any programming language.

1.12. Will you give me a certificate and grade my work? 1.12.1. 5 real-world case studies will be graded. 1.12.2. A+: top-tier product-based companies 1.12.3. A: product-based startups 1.12.4. B+: top-tier services companies 1.12.5. B: other services companies/startups 1.12.6. No grade below B 1.12.7. Strict grading policy; no grade dilution. 1.13. Will I get a job at the end of this course and will you help me get it? 1.13.1. We are working on collaborating with recruiting companies; no guarantees. 1.13.2. We will connect you to recruiting companies and refer you to various companies. 1.13.3. It depends on how well you build your portfolio. 1.14. What is your teaching methodology? 1.14.1. Very informal: like explaining to a friend. 1.14.2. Seven-step methodology: 1. Why 2. What problems can be solved 3. Concept (toy example, basic math, notation, intuition, real-world example) 4. Diagram 5. Limitations 6. Workarounds 7. Code 1.14.3. Balance between theory and practice. 1.14.4. Google search, Wikipedia and web sources (references will also be provided). 1.15. What are the course contents? 1.16. How is AAIC different from other online courses?


Introductory Videos 1.1. 1.2. What is “appliedaicourse.com” all about? 1.2.1. Parts of applied AI Course 1.2.2. It has two online courses (AI course and AI Project) 1.2.3. You can learn the course without having any prerequisites 1.2.4. Customer service , team 1.2.5. AI is required more mathematical knowledge but it’s our job to simplify the math part. Explaining the math part using geometry 1.2.6. Balance between theory and practice 1.2.7. Brief introduction about the AI Course(explained everything from the scratch by using the real world examples ) 1.2.8. We use amazon,facebook,quora,netflix data sets to solve the real world problems end to end. 1.2.9. You will be able to build the productionizable models by end of the AI course. 1.2.10. Build a portfolio ​to strengthen their resume for AI engineer positions 1.2.11. Brief introduction about the AI projects(explained everything from the scratch by using the real world examples and solved a real world problem end to end )

1.2.12.

Problem statements of few projects like fb recommendation, quora, netflix etc. 1.2.13. 20 hours of free content to understand the way of teaching 1.2.14. Will upload at least 10 hours of content in a week 1.3. What is AI? 1.4. why learn it? 1.4.1. One of the most important areas of Science with huge implications in everything from buying clothes online to cancer treatments. AI has massively changed in the last 5 years with massive impact. Examples: Google 20% of mobile phone searches are voice driven. 35% of Amazon sales are generated by product recommendations. Breast cancer diagnosis almost as good as the best doctors. Better if AI algos and doctors work together.

From my experience, massive amount of gap in supply and demand of engineers who can build AI systems.

1.5.

Who is teaching this course? 1.5.1. Srikanth Varma Chekuri: instructor, extensive experience in building cutting edge AI models at top-tier companies like Amazon, Yahoo. 1.5.2. Most recently: Sr Scientist at Amazon.com in Silicon Valley. 1.5.3. Co-founder of Matherix Labs 1.5.4. Research Engineer at Yahoo Labs Bangalore, 1.5.5. Indian Institute of Science, Bangalore. 1.5.6. SVPCE, small engineering college in Visakhapatnam. GATE Rank 2. 1.5.7. Teaching Experience: Undergrad friends, Offline AI Course (last 3 months) 1.5.8. Team: 5 engineers from top companies and institutions like IITs, IIITs and NITs.

1.6.

What is Applied AI course ? 1.6.1. No prerequisites 1.6.2. 140+ hours of content (basic probability, stats, linear algebra) 1.6.3. This content is highly adaptive. Based on the customer feedback will update the videos. 1.6.4. Right balance between theory and practical 1.6.5. 10+case studies on real word data sets and grading for 5 projects 1.6.6. Brief explanation about data sets (fb,quora,netflix,amazon..) 1.6.7. Explaining math intuitively using geometry

1.6.8. 1.6.9. 1.6.10. 1.6.11. 1.6.12.

Deep learning concepts Gets ipython notebook and relevant code snippets. Will answer your queries by our experts team. You will be able to build the productionizable models by end of the AI course. Build a portfolio.

1.7.

What is AI Projects? 1.7.1. No prerequisites 1.7.2. 100+ hours of content (basic probability, stats, linear algebra) 1.7.3. This content is highly adaptive. Based on the customer feedback will update the videos. 1.7.4. Right balance between theory and practicalNo prerequisites 1.7.5. Explaining math intuitively using geometry 1.7.6. Brief introduction about the AI projects(explained everything from the scratch by using the real world examples and solved a real world problem end to end ) 1.7.7. Problem statements of projects like fb recommendation, quora, netflix etc. 1.7.8. Gets ipython notebook and relevant code snippets. 1.7.9. Will answer your queries by our experts team.

1.8.

What are the job opportunities if I learn AI? 1.8.1.  

1.9.

How is this different from other AI/ML Courses online (or) University courses ? 1.9.1. Fine balance between theory and practice. Check out our sample videos. 1.9.2. Students give up when math dense deep. Do not water down. 1.9.3. 10+ real world case studies end to end from raw data to a deployable model. 1.9.4. Portfolio for each course partcipant. 1.9.5. Customer support and obsessiveness

1.10.

1.11.

What are the expected outcomes for a course-participant? 1.10.1. Working professional: Lateral moves. 1.10.2. Students: entry level jobs in data-science, machine learning and AI. Salaries ranging from 6LPA to 25 LPS. Who is this course designed for?

1.11.1. 1.11.2. 1.11.3.

Willingness to learn AI. 8-10 hours per week over a 3-6 month period. No prerequisites except familiarity with any programming language.

1.12.

Will you give me a certificate and grade my work? 1.12.1. 5 real world cases will be graded. 1.12.2. A+ : top tier product based companies 1.12.3. A: product based startups 1.12.4. B+: top-tier services companies 1.12.5. B: other services companies/startups. 1.12.6. No grade below B 1.12.7. Strict grading policy. No grade dilution. 1.13. Will I get a job at the end of this course and will you help me get it? 1.13.1. Working on collaborating with recruiting companies. No guarantees. 1.13.2. We will connect you to recruiting companies and will refer you to various companies. 1.13.3. Depends on how well you build your portfolio. 1.14. What’s is your teaching methodology? 1.14.1. Very informal: explaining a friend 1.14.2. Seven step methodology 1. Why 2. What problems can be solved 3. Concept [Toy example,basic math, notations , intuition, real world example) 4. Diagram 5. Limitations 6. Workarounds 7. Code 1.14.3. Balance between theory and practice. 1.14.4. Google search, Wikipedia and web sources(will provide references also). 1.15. What are the course contents? 1.16. How is AAIC different from other online courses?

2.

Python for ML/AI 2.1. Why Python? 2.2. Setup 2.2.1. Install Python. 2.2.2. Installing packages: numpy, pandas, scipy, matplotlib, seaborn, sklearn) 2.2.3. iPython setup.

2.3.

Introduction 2.3.1. Keywords and Identifiers 2.3.2. Statements, Indentation and Comments 2.3.3. Variables and Datatypes 2.3.4. Input and Output 2.3.5. Operators 2.4. Flow Control 2.4.1. If...else 2.4.2. while loop 2.4.3. for loop 2.4.4. break and continue 2.5. Data Structures 2.5.1. Lists 2.5.2. Tuples 2.5.3. Dictionary 2.5.4. Strings 2.5.5. Sets 2.6. Functions 2.6.1. Introduction 2.6.2. Types of functions 2.6.3. Function Arguments 2.6.4. Recursive Functions 2.6.5. Lambda Functions 2.6.6. Modules 2.6.7. Packages 2.7. File Handling 2.8. Exception Handling 2.9. Debugging Python 2.10. NumPy 2.10.1. Introduction to NumPy. 2.10.2. Numerical operations. 2.11. Matplotlib 2.12. Pandas 2.12.1. Getting started with pandas 2.12.2. Data Frame Basics 2.12.3. Key Operations on Data Frames. 2.13. Computational Complexity: an Introduction 2.13.1. Space and Time Complexity: Find largest number in a list 2.13.2. Binary search 2.13.3. Find elements common in two lists. 2.13.4. Find elements common in two lists using a Hashtable/Dict 2.13.5. Further reading about Computational Complexity [Please add a section with these links for reference]

2.13.5.1.

3.

4.

https://medium.com/omarelgabrys-blog/the-big-scary-o-notation-c e9352d827ce 2.13.5.2. https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/ 2.13.5.3. https://medium.freecodecamp.org/time-is-complex-but-priceless-f0 abd015063c Plotting for exploratory data analysis (EDA) 3.1. Iris dataset 3.1.1. Data-point, vector, observation 3.1.2. Dataset 3.1.3. Input variables/features/dimensions/independent variable 3.1.4. Output Variable/Class Label/ Response Label/ dependent variable 3.1.5. Objective: Classification. 3.2. Scatter-plot: 2D, 3D. 3.3. Pair plots. 3.4. PDF, CDF, Univariate analysis. 3.4.1. Histogram and PDF 3.4.2. Univariate analysis using PDFs. 3.4.3. Cumulative distribution function (CDF) 3.5. Mean , Variance, Std-dev 3.6. Median, Percentiles, Quantiles, IQR, MAD and Outliers. 3.7. Box-plot with whiskers 3.8. Violin plots. 3.9. Summarizing plots. 3.10. Univariate, Bivariate and Multivariate analysis. 3.11. Multivariate probability density, contour plot. 3.12. Exercise: Perform EDA on Haberman dataset. Probability and Statistics 4.1. Introduction to Probability and Stats 4.1.1. Why learn it? 4.1.2. P(X=x1) , Dice and coin example 4.1.3. Random variables: discrete and continuous. 4.1.4. Outliers (or) extreme points. 4.1.5. Population & Sample. 4.2. Gaussian/Normal Distribution 4.2.1. Examples: Heights and weights. 4.2.2. Why learn about distributions. 4.2.3. Mu, sigma: Parameters 4.2.4. PDF (iris dataset) 4.2.5. CDF 4.2.6. 1-std-dev, 2-std-dev, 3-std-dev range. 4.2.7. Symmetric distribution, Skewness and Kurtosis 4.2.8. Standard normal variate (z) and standardization. 4.2.9. Kernel density estimation.

5.

6.

4.2.10. Sampling distribution & Central Limit theorem. 4.2.11. Q-Q Plot: Is a given random variable Gaussian distributed? 4.3. Uniform Distribution and random number generators 4.3.1. Discrete and Continuous Uniform distributions. 4.3.2. How to randomly sample data points. [UniformDisb.ipynb] 4.4. Bernoulli and Binomial distribution 4.5. Log-normal and power law distribution: 4.5.1. Log-normal: CDF, PDF, Examples. 4.5.2. Power-law & Pareto distributions: PDF, examples 4.5.3. Converting power law distributions to normal: Box-Cox/Power transform. 4.6. Correlation 4.6.1. Co-variance 4.6.2. Pearson Correlation Coefficient 4.6.3. Spearman Rank Correlation Coefficient 4.6.4. Correlation vs Causation 4.7. Confidence Intervals 4.7.1. Confidence Interval vs Point estimate. 4.7.2. Computing confidence-interval given a distribution. 4.7.3. For mean of a random variable 4.7.3.1. Known Standard-deviation: using CLT 4.7.3.2. Unknown Standard-deviation: using t-distribution 4.7.4. Confidence Interval using empirical bootstrap [BootstrapCI.ipynb] 4.8. Hypothesis testing 4.8.1. Hypothesis Testing methodology, Null-hypothesis, test-statistic, p-value. 4.8.2. Resampling and permutation test. 4.8.3. K-S Test for similarity of two distributions. 4.8.4. Code Snippet [KSTest.ipynb] Linear Algebra 5.1. Why learn it ? 5.2. Fundamentals 5.2.1. Point/Vector (2-D, 3-D, n-D) 5.2.2. Dot product and angle between 2 vectors. 5.2.3. Projection, unit vector 5.2.4. Equation of a line (2-D), plane(3-D) and hyperplane (n-D) 5.2.5. Distance of a point from a plane/hyperplane, half-spaces 5.2.6. Equation of a circle (2-D), sphere (3-D) and hypersphere (n-D) 5.2.7. Equation of an ellipse (2-D), ellipsoid (3-D) and hyperellipsoid (n-D) 5.2.8. Square, Rectangle, Hyper-cube and Hyper-cuboid.. Dimensionality reduction and Visualization: 6.1. What is dimensionality reduction? 6.2. Data representation and pre-processing 6.2.1. Row vector, Column vector: Iris dataset example. 6.2.2. Represent a dataset: D= {x_i, y_i}

7.

6.2.3. Represent a dataset as a Matrix. 6.2.4. Data preprocessing: Column Normalization 6.2.5. Mean of a data matrix. 6.2.6. Data preprocessing: Column Standardization 6.2.7. Co-variance of a Data Matrix. 6.3. MNIST dataset (784 dimensional) 6.3.1. Explanation of the dataset. 6.3.2. Code to load this dataset. 6.4. Principal Component Analysis. 6.4.1. Why learn it. 6.4.2. Geometric intuition. 6.4.3. Mathematical objective function. 6.4.4. Alternative formulation of PCA: distance minimization 6.4.5. Eigenvalues and eigenvectors. 6.4.6. PCA for dimensionality reduction and visualization. 6.4.7. Visualize MNIST dataset. 6.4.8. Limitations of PCA 6.4.9. Code example. 6.4.10. PCA for dimensionality reduction (not-visualization) 6.5. T-distributed stochastic neighborhood embedding (t-SNE) 6.5.1. What is t-SNE? 6.5.2. Neighborhood of a point, Embedding. 6.5.3. Geometric intuition. 6.5.4. Crowding problem. 6.5.5. How to apply t-SNE and interpret its output (distill.pub) 6.5.6. t-SNE on MNIST. 6.5.7. Code example. Real world problem: Predict sentiment polarity given product reviews on Amazon. 7.1. Exploratory Data Analysis. 7.1.1. Dataset overview: Amazon Fine Food reviews 7.1.2. Data Cleaning: Deduplication. 7.2. Featurizations: convert text to numeric vectors. 7.2.1. Why convert text to a vector? 7.2.2. Bag of Words (BoW) 7.2.3. Text Preprocessing: Stemming, Stop-word removal, Tokenization, Lemmatization. 7.2.4. uni-gram, bi-gram, n-grams. 7.2.5. tf-idf (term frequency- inverse document frequency) [7.2.5 a] [New Video] Why use log in IDF? 7.2.6. Word2Vec. 7.2.7. Avg-Word2Vec, tf-idf weighted Word2Vec 7.3. Code samples 7.3.1. Bag of Words.

7.3.2. Text Preprocessing 7.3.3. Bi-Grams and n-grams. 7.3.4. TF-IDF 7.3.5. Word2Vec 7.3.6. Avg-Word2Vec and TFIDF-Word2Vec 7.4. Exercise: t-SNE visualization of Amazon reviews with polarity based color-coding 8.

Classification and Regression Models: K-Nearest Neighbors 8.1. Foundations 8.1.1. How “Classification” works? 8.1.2. Data matrix notation. 8.1.3. Classification vs Regression (examples) 8.2. K-Nearest Neighbors 8.2.1. Geometric intuition with a toy example. 8.2.2. Failure cases. 8.2.3. Distance measures: Euclidean(L2) , Manhattan(L1), Minkowski, Hamming 8.2.4. Cosine Distance & Cosine Similarity 8.2.5. How to measure the effectiveness of k-NN? 8.2.6. Simple implementation: 8.2.6.1. Test/Evaluation time and space complexity. 8.2.6.2. Limitations. 8.2.7. Determining the right “k” 8.2.7.1. Decision surface for K-NN as K changes. 8.2.7.2. Overfitting and Underfitting. 8.2.7.3. Need for Cross validation. 8.2.7.4. K-fold cross validation. [NEW]8.2.7.4 a Visualizing train, validation and test datasets 8.2.7.5. How to determine overfitting and underfitting? 8.2.7.6. Time based splitting 8.2.8. k-NN for regression. 8.2.9. Weighted k-NN 8.2.10. Voronoi diagram. 8.2.11. kd-tree based k-NN: 8.2.11.1. Binary search tree 8.2.11.2. How to build a kd-tree. 8.2.11.3. Find nearest neighbors using kd-tree 8.2.11.4. Limitations. 8.2.11.5. Extensions. 8.2.12. Locality sensitive Hashing (LSH) 8.2.12.1. Hashing vs LSH. 8.2.12.2. LSH for cosine similarity 8.2.12.3. LSH for euclidean distance.

9.

10.

11.

8.2.13. Probabilistic class label 8.2.14. Code Samples for K-NN 8.2.14.1. Decision boundary. [./knn/knn.ipynb and knn folder] 8.2.14.2. Cross Validation.[./knn/kfold.ipynb and knn folder] 8.2.15. Exercise: Apply k-NN on Amazon reviews dataset. Classification algorithms in various situations: 9.1. Introduction 9.2. Imbalanced vs balanced dataset. 9.3. Multi-class classification. 9.4. k-NN, given a distance or similarity matrix 9.5. Train and test set differences. 9.6. Impact of Outliers 9.7. Local Outlier Factor. 9.7.1. Simple solution: mean dist to k-NN. 9.7.2. k-distance (A), N(A) 9.7.3. reachability-distance(A, B) 9.7.4. Local-reachability-density(A) 9.7.5. LOF(A) 9.8. Impact of Scale & Column standardization. 9.9. Interpretability 9.10. Feature importance & Forward Feature Selection 9.11. Handling categorical and numerical features. 9.12. Handling missing values by imputation. 9.13. Curse of dimensionality. [26:00] 9.14. Bias-Variance tradeoff. [23:30] 9.14a Intuitive understanding of bias-variance. [6:00] 9.15. Best and worst cases for an algorithm. [6:00] Performance measurement of models: 10.1. Accuracy [14:15] 10.2. Confusion matrix, TPR, FPR, FNR, TNR [24:00] 10.3. Precision & recall, F1-score. [9:00] 10.4. Receiver Operating Characteristic Curve (ROC) curve and AUC. [18:30] 10.5. Log-loss. [11:15] 10.6. R-Squared/ Coefficient of determination. [13:30] 10.7. Median absolute deviation (MAD) [5:00] 10.8. Distribution of errors. [6:30] Naive Bayes 11.1. Conditional probability. [12:30] 11.2. Independent vs Mutually exclusive events. [6:00] 11.3. Bayes Theorem with examples. [16:30] 11.4. Exercise problems on Bayes Theorem. [Take from Bramha] 11.5. Naive Bayes algorithm. [26:00] 11.6. Toy example: Train and test stages. [25:30]

11.7. Naive Bayes on Text data. [15:00] 11.8. Laplace/Additive Smoothing. [23:30] 11.9. Log-probabilities for numerical stability. [11:00] 11.10. Cases: 11.10.1. Bias and Variance tradeoff. [13:30] 11.10.2. Feature importance and interpretability. [10:00] 11.10.3. Imbalanced data. [13:30] 11.10.4. Outliers. [5:00] 11.10.5. Missing values. [3:00] 11.10.6. Handling Numerical features (Gaussian NB) [13:00] 11.10.7. Multiclass classification. [2:00] 11.10.8. Similarity or Distance matrix. [2:30] 11.10.9. Large dimensionality. [2:00] 11.10.10. Best and worst cases. [7:30] 11.11. Code example [7:00] 11.12. Exercise: Apply Naive Bayes to Amazon reviews. [5:30] 12. Logistic Regression: 12.1. Geometric intuition.[31:00] 12.2. Sigmoid function & Squashing [36:30] 12.3. Optimization problem. [23:30] 12.4. Weight vector. [10:00] 12.5. L2 Regularization: Overfitting and Underfitting. [25:30] 12.6. L1 regularization and sparsity. [10:30] 12.7. Probabilistic Interpretation: GaussianNaiveBayes [19:00] Description Link: Refer section 3.1 of https://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf 12.8. Loss minimization interpretation [18:30] 12.9. Hyperparameter search: Grid Search and Random Search [16:00] 12.10. Column Standardization. [4:30] 12.11. Feature importance and model interpretability. [13:30] 12.12. Collinearity of features. [14:00] 12.13. Train & Run time space and time complexity. [10:00] 12.14. Real world cases.[10:30] 12.15. Non-linearly separable data & feature engineering. [27:30] 12.16. Code sample: Logistic regression, GridSearchCV, RandomSearchCV [Code link in description: LogisticRegression.ipynb] [23:00] 12.17. Exercise: Apply Logistic regression to Amazon reviews dataset. [5:30] 12.18. Extensions to Logistic Regression: Generalized linear models (GLM) [8:00] Description link: Refer Part III of http://cs229.stanford.edu/notes/cs229-notes1.pdf Total hrs tille here: [52 hours approximately : total 3117 minutes] 13.

Linear Regression and Optimization.

13.1. 13.2. 13.3. 13.4.

Geometric intuition. [13:00] Mathematical formulation.[13:30] Cases.[8:00] Code sample. [12:30] CODE: Description Link to Linear Regression.ipynb 13.5. Solving optimization problems 13.5.1. Differentiation. [28:00] 13.5.1_a Online differentiation tools [8:00] 13.5.2. Maxima and Minima [12:00] 13.5.3. Vector calculus: Grad [9:30] 13.5.4. Gradient descent: geometric intuition. [18:00] 13.5.5. Learning rate. [7:30] 13.5.6. Gradient descent for linear regression. [7:30] 13.5.7. SGD algorithm.[9:00] 13.5.8. Constrained optimization & PCA [14:00] 13.5.9. Logistic regression formulation revisited. [5:30] 13.5.10. Why L1 regularization creates sparsity? [17:00] 13.5.11. Exercise: Implement SGD for linear regression. [6:00] Chapter content: 189 mins Number of mins of content till here: 3306 mins ~ 55.1 hrs

14.

Support Vector Machines (SVM) 14.1. Geometric intuition. [19:00] 14.2. Mathematical derivation.[31:00] 14.3. Loss minimization: Hinge Loss.[14:00] 14.4. Dual form of SVM formulation. [15:00] Reference for Primal and Dual equivalence: http://cs229.stanford.edu/notes/cs229-notes3.pdf 14.5. Kernel trick. [10:00] 14.6. Polynomial kernel.[11:00] 14.7. RBF-Kernel.[21:00] 14.8. Domain specific Kernels. [6:00] 14.9. Train and run time complexities.[7:30] 14.10. nu-SVM: control errors and support vectors. [6:00] 14.11. SVM Regression. [7:30] 14.12. Cases. [9:00] 14.13. Code Sample. [13:30] 14.14. Exercise: Apply SVM to Amazon reviews dataset. [3:30] Chapter contents: 174 mins Total time till here: 3480 mins (58 hrs)

15.

Decision Trees 15.1. Geometric Intuition: Axis parallel hyperplanes. [16:30] 15.2. Sample Decision tree. [7:30] Refer in Description: http://homepage.cs.uri.edu/faculty/hamel/courses/2016/spring2016/csc581/lecture-notes /32-decision-trees.pdf 15.3. Building a decision Tree: 15.3.1. Entropy [18:00] 15.3.1.a Intuition behind entropy 15.3.2. Information Gain [9:30] 15.3.3. Gini Impurity.[7:00] 15.3.4. Constructing a DT. [20:30] 15.3.5. Splitting numerical features. [7:30] 15.3.5a Feature standardization. [4:00] 15.3.6. Categorical features with many possible values. [6:30] 15.4. Overfitting and Underfitting. [7:00] 15.5. Train and Run time complexity. [6:30] 15.6. Regression using Decision Trees. [9:00] 15.7. Cases [12:00] 15.8. Code Samples. [8:30] 15.9. Exercise: Decision Trees on Amazon reviews dataset. [2:00] Chapter contents: 142 mins Total time till here: 3622 mins (60 hrs 22 mins)

16.

Ensemble Models: 16.1. What are ensembles? [5:30] 16.2. Bootstrapped Aggregation (Bagging) 16.2.1. Intuition [17:00] 16.2.2. Random Forest and their construction. [14:30] 16.2.3. Bias-Variance tradeoff. [6:30] 16.2.4. Train and Run-time Complexity.[8:30] 16.2.5. Code Sample. [3:30] 16.2.6. Extremely randomized trees.[8:00] 16.2.7. Cases [5:30] 16.3. Boosting: 16.3.1. Intuition [16:30] 16.3.2. Residuals, Loss functions and gradients. [12:30] 16.3.3. Gradient Boosting [10:00] 16.3.4. Regularization by Shrinkage. [7:00] 16.3.5. Train and Run time complexity. [6:00] 16.3.6. XGBoost: Boosting + Randomization [13:30] 16.3.7. AdaBoost: geometric intuition.[7:00]

16.4. Stacking models.[21:30] Description Link Refer: https://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/ 16.5. Cascading classifiers. [14:30] 16.6. Kaggle competitions vs Real world. [9:00] 16.7. Exercise: Apply GBDT and RF to Amazon reviews dataset. [3:30] Chapter Contents: 190 minutes Time till here: 3812 minutes (63 hrs 22 mins) 17.

Featurizations and Feature engineering. 17.1. Introduction. [14:30] 17.2. Time-series data. 17.2.1. Moving window. [14:00] 17.2.2. Fourier decomposition. [22:00] 17.2.3. Deep learning features: LSTM [7:00] 17.3. Image data. 17.3.1. Image histogram.[15:00] 17.3.2. Keypoints: SIFT. [9:00] 17.3.3. Deep learning features: CNN [4:00] 17.4. Relational data.[9:30] 17.5. Graph data. [11:30] 17.6. Feature Engineering. 17.6.1. Indicator variables. [6:30] 17.6.2. Feature binning.[14:00] 17.6.3. Interaction variables.[8:00] 17.6.4. Mathematical transforms. [4:00] 17.7. Model specific featurizations. [8:30] 17.8. Feature orthogonality.[11:00] 17.9. Domain specific featurizations. [3:30] 17.10. Feature slicing.[9:00] 17.11. Kaggle Winners solutions.[7:00] Chapter Contents: 178 minutes Time till here: 3990 minutes (66 hrs 30 mins)

17a. Miscellaneous Topics 17a.1 Calibration of Models. 17a.1.1 Need for calibration. 17a.1.2 Calibration Plots. http://scikit-learn.org/stable/modules/calibration.html 17a.1.3 Platt’s Calibration/Scaling. https://en.wikipedia.org/wiki/Platt_scaling

17a.1.4 Isotonic Regression http://scikit-learn.org/stable/modules/isotonic.html 17a.1.5 Code Samples http://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClas sifierCV.html http://scikit-learn.org/stable/auto_examples/calibration/plot_calibration.html#sphxglr-auto-examples-calibration-plot-calibration-py 17a.1.6 Exercise: Calibration + Naive Bayes. 17.12. Modeling in the presence of outliers: RANSAC 17.13. Productionizing models. 17.14. Retraining models periodically as needed. 17.15. A/B testing. 17.16. VC Dimensions.

18.

Unsupervised learning/Clustering: K-Means (2) 18.1. What is Clustering? [9:30] 18.1.a Unsupervised learning [3:30] 18.2. Applications.[15:00] 18.3. Metrics for Clustering.[12:30] 18.4. K-Means 18.4.1. Geometric intuition, Centroids [8:00] 18.4.2. Mathematical formulation: Objective function [10:30] 18.4.3. K-Means Algorithm.[10:30] 18.4.4. How to initialize: K-Means++ [24:00] Refer:https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalys is.pdf 18.4.5. Failure cases/Limitations.[11:00] 18.4.6. K-Medoids [18:30] 18.4.7. Determining the right K. [4:30] 18.4.8. Time and space complexity.[3:30] 18.4.9. Code Samples[6:30] 18.4.10. Exercise: Cluster Amazon reviews.[5:00] Chapter Contents: 142.5 minutes Time till here: 4132.5 minutes (~ 69 hrs)

19.

Hierarchical clustering 19.1. Agglomerative & Divisive, Dendrograms [13:30] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 19.2. Agglomerative Clustering.[8:00] Refer:https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 19.3. Proximity methods: Advantages and Limitations. [24:00] Refer:https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf

19.4. Time and Space Complexity. [4:00] Refer: https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 19.5. Limitations of Hierarchical Clustering.[5:00] 19.6. Code sample. [2:30] Refer:​http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClust ering.html#sklearn.cluster.AgglomerativeClustering Refer:http://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_clustering. html#sphx-glr-auto-examples-cluster-plot-agglomerative-clustering-py 19.7. Exercise: Amazon food reviews. [2:30] Chapter Contents: 59.5 minutes Time till here: 4192 minutes (~ 70 hrs) 20.

DBSCAN (Density based clustering) 20.1. Density based clustering [4:30] 20.2. MinPts and Eps: Density [5:30] 20.3. Core, Border and Noise points. [6:30] Refer:https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf 20.4. Density edge and Density connected points. [5:00] 20.5. DBSCAN Algorithm.[11:00] 20.6. Hyper Parameters: MinPts and Eps.[9:30] 20.7. Advantages and Limitations of DBSCAN.[8:30] Refer: ​https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf Refer:https://en.wikipedia.org/wiki/DBSCAN#Advantages 20.8. Time and Space Complexity.[3:00] 20.9. Code samples. [2:30] Refer:​http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto -examples-cluster-plot-dbscan-py Refer: ​http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html 20.10. Exercise: Amazon Food reviews.[3:00] Chapter Contents: 59 minutes Time till here: 4251 minutes (~ 71 hrs)

21.

Recommender Systems and Matrix Factorization. (3) 21.1. Problem formulation: Movie reviews.[23:00] 21.2. Content based vs Collaborative Filtering.[10:30] 21.3. Similarity based Algorithms.[15:30] 21.4. Matrix Factorization: 21.4.1. PCA, SVD [22:30] 21.4.2. NMF[3:00] 21.4.3. MF for Collaborative filtering [22:30]

21.4.4. MF for feature engineering.[8:30] 21.4.5. Clustering as MF[20:30] 21.5. Hyperparameter tuning. [10:00] 21.6. Matrix Factorization for recommender systems: Netflix Prize Solution [30:00] Refer:https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf 21.7. Cold Start problem.[5:30] 21.8. Word Vectors using MF. [20:00] 21.9. Eigen-Faces. [14:00] Refer:​https://bugra.github.io/work/notes/2014-11-16/an-introduction-to-unsupervised-lear ning-scikit-learn/ Refer:http://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decompositi on.html#sphx-glr-auto-examples-decomposition-plot-faces-decomposition-py 21.10. Code example. [11:00] Refer:​http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html Refer:​http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedS VD.html Refer:http://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decompositi on.html#sphx-glr-auto-examples-decomposition-plot-faces-decomposition-py 21.11. Exercise: Word Vectors using Truncated SVD.[6:00] Chapter Contents: 192.5 minutes Time till here: 4443.5 minutes ( 74 hrs 3.5 mins) 22.

Neural Networks. 22.1. History of Neural networks and Deep Learning. 22.2. Diagrammatic representation: Logistic Regression and Perceptron 22.3. Multi-Layered Perceptron (MLP). 22.4. Training an MLP. 22.5. Backpropagation. 22.6. Weight initialization. 22.7. Vanishing Gradient problem. 22.8. Bias-Variance tradeoff. 22.9. Decision surfaces: Playground Refer: playground.tensorflow.org 22.10. Tensorflow and Keras. 22.10.1. Introduction to tensorflow 22.10.2. Introduction to Keras. 22.10.3. Computational Graph. 22.10.4. Building and MLP from scratch. 22.10.5. MLP in Keras. 22.10.6. GPU vs CPU. 22.10.7. GPUs for Deep Learning. 22.11. Assignment: MNIST using MLP. 23. Deep Multi-layer perceptrons

23.1. 23.2. 23.3. 23.4.

Dropout layers & Regularization. Rectified Linear Units (ReLU). Batch Normalization. Optimizers: 23.4.1. Local and Global Optima. 23.4.2. ADAM 23.4.3. RMSProp 23.4.4. AdaGrad 23.5. Gradient Checking. 23.6. Initialization of models. 23.7. Softmax and Cross-entropy for multi-class classification. 23.8. Code Sample: MNIST 23.9. Auto Encoders. 23.10. Word2Vec. 24. Convolutional Neural Nets. 24.1. MLPs on Image data 24.2. Convolution operator. 24.3. Edge Detection on images. 24.4. Convolutional layer. 24.5. Max Pooling & Padding. 24.6. Imagenet dataset. 24.7. AlexNet. 24.8. Residual Network. 24.9. Inception Network. 24.10. Transfer Learning: Reusing existing models. 24.10.1. How to reuse state of the art models for your problem. 24.10.2. Data Augmentation. 24.10.3. Code example: Cats vs Dogs. 25. Long Short-term memory (LSTMs) 25.1. Recurrent Neural Network. 25.2. Backpropagation for RNNs. 25.3. Memory units. 25.4. LSTM. 25.5. GRUs. 25.6. Bidirectional RNN. 25.7. Code example: Predict a stock price using LSTM. 25.8. Sequence to Sequence Models. 26. Case Study: Personalized Cancer Diagnosis. 26.1. Business/Real world problem 26.1.1. Overview. [12:30] Refer: ​LINK TO THE IPYTHON NOTEBOOK IN THE FOLDER 26.1.2. Business objectives and constraints. [11:00] 26.2. ML problem formulation

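A sketch of the kind of LSTM regression model 25.7 refers to: a univariate series is cut into fixed-length windows and the network predicts the next value. The synthetic sine-wave data, window length, and unit count are assumptions standing in for a real price series.

# Sketch: LSTM next-step prediction on a windowed series (synthetic data; values are assumptions).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

series = np.sin(np.linspace(0, 50, 1000))           # stand-in for a price series
window = 30

# Build (samples, timesteps, features) windows and next-step targets.
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]                               # one feature per timestep

model = Sequential([
    LSTM(32, input_shape=(window, 1)),
    Dense(1),                                        # predict the next value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.predict(X[-1:]).ravel())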
26. Case Study: Personalized Cancer Diagnosis.
26.1. Business/Real world problem
26.1.1. Overview. [12:30]
Refer: LINK TO THE IPYTHON NOTEBOOK IN THE FOLDER
26.1.2. Business objectives and constraints. [11:00]
26.2. ML problem formulation
26.2.1. Data [4:30]
26.2.2. Mapping real world to ML problem. [18:00]
26.2.3. Train, CV and Test data construction. [3:30]
26.3. Exploratory Data Analysis
26.3.1. Reading data & preprocessing [7:00]
26.3.2. Distribution of Class-labels. [6:30]
26.3.3. “Random” Model. [19:00]
26.3.4. Univariate Analysis
26.3.4.1. Gene feature. [34:00]
26.3.4.2. Variation Feature. [18:30]
26.3.4.3. Text feature. [15:00]
26.4. Machine Learning Models
26.4.1. Data preparation. [8:00]
26.4.2. Baseline Model: Naive Bayes [23:00]
26.4.3. K-Nearest Neighbors Classification. [9:00]
26.4.4. Logistic Regression with class balancing [9:30] (a class-balanced logistic-regression sketch follows this chapter's outline)
26.4.5. Logistic Regression without class balancing [3:00]
26.4.6. Linear-SVM. [6:00]
26.4.7. Random-Forest with one-hot encoded features [6:30]
26.4.8. Random-Forest with response-coded features [5:00]
26.4.9. Stacking Classifier [7:00]
26.4.10. Majority Voting classifier. [4:30]
26.5. Assignments. [4:30]

Chapter Duration: 235.5 Mins (~ 4 hrs)
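A minimal sketch of the class-balanced logistic-regression pattern from 26.4.4, scored with multi-class log-loss; the synthetic 9-class data here is an assumption, whereas the actual case study builds its features from the gene, variation, and text columns in the course notebook.

# Sketch: class-balanced logistic regression with multi-class log-loss (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           n_classes=9, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.2,
                                          random_state=42)

# class_weight="balanced" reweights rare classes, as in the class-balanced variant.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)
print("test log-loss:", log_loss(y_te, clf.predict_proba(X_te)))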

27. Case studies/Projects: (2*10 = 20)
27.1. Amazon fashion discovery engine.
27.2. Malware Detection on Windows OS.
27.3. Song Similarity engine.
27.4. Predict customer propensity to purchase using CRM data.
27.5. Suggest me a movie to watch: Netflix Prize.
27.6. Human Activity Recognition using mobile phone’s accelerometer and gyroscope data.
27.7. Which ad to show to which user: Ad Click prediction.
27.8. …
27.9. ….
27.10.
