Machine Learning in Python builds upon the statistical knowledge you have gained earlier in the program. This course focuses on predictive modelling and moves into multidimensional spaces, which require an understanding of mathematical methods, transformations, and distributions. The course introduces these concepts as well as more complex means of analysis such as clustering, factoring, Bayesian inference, and decision theory, while also allowing you to exercise your Python programming skills.

In this part of the course, we discuss what the course covers, why you need to learn advanced statistics, how it differs from machine learning, and how to get the most out of this training. In this section, you will also expand on what you learned in our statistics training with additional concepts and apply all the theory in Python. This section serves two purposes: it is a useful refresher of regression, and it reinforces what you have learned by applying it in practice while coding.

Welcome to Advanced Statistics!

Introduction to Regression Analysis

The Linear Regression Model

Correlation vs Regression

Geometrical Representation of the Linear Regression Model

First Regression in Python

Using Seaborn for Graphs

How to Interpret the Regression Table

Decomposition of Variability

What is the OLS?

R-Squared
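
As a rough illustration of the lessons above, the sketch below fits a simple linear regression with the closed-form OLS formulas and checks the decomposition of variability. The toy data and all variable names are made up for this example and are not taken from the course. Here SSR denotes the variability explained by the regression and SSE the residual variability.

```python
import numpy as np

# Hypothetical toy data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS estimates for y = b0 + b1 * x, using the closed-form formulas.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# Decomposition of variability: SST = SSR + SSE.
sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # variability explained by the regression
sse = np.sum((y - y_hat) ** 2)         # unexplained (residual) variability

# R-squared is the share of total variability explained by the model.
r_squared = ssr / sst
print(f"b0={b0:.3f}, b1={b1:.3f}, R-squared={r_squared:.4f}")
```

An R-squared close to 1 means the fitted line explains most of the variability in `y`, which is expected here because the toy data were chosen to be almost perfectly linear.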

After covering simple linear regression, we build on that knowledge with multiple linear regression. In this part, we explore models with many input variables, whether numerical or categorical, and learn how to make predictions with them.

Multiple Linear Regression

Adjusted R-Squared

Test for Significance of the Model (F-Test)

OLS Assumptions

A1: Linearity

A2: No Endogeneity

A3: Normality and Homoscedasticity

A4: No Autocorrelation

A5: No Multicollinearity

Dealing with Categorical Data - Dummy Variables

Making Predictions with the Linear Regression
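
The topics in this section can be sketched in a few lines: a multiple regression with one numerical input and one categorical input encoded as a dummy variable, followed by the adjusted R-squared and a prediction. The data, the variable names (`size`, `has_view`, `price`), and the use of plain NumPy least squares are assumptions made for this illustration, not the course's own material.

```python
import numpy as np

# Hypothetical data: one numerical input and one yes/no category.
size = np.array([50.0, 60.0, 70.0, 80.0, 90.0, 100.0])
has_view = np.array(["no", "yes", "no", "yes", "no", "yes"])
price = np.array([150.0, 200.0, 205.0, 262.0, 268.0, 315.0])

# Dummy variable: map the category to 0/1 so it can enter the regression.
view_dummy = (has_view == "yes").astype(float)

# Design matrix with an intercept column, then OLS via least squares.
X = np.column_stack([np.ones_like(size), size, view_dummy])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# R-squared and adjusted R-squared (n observations, p predictors).
y_hat = X @ coef
n, p = X.shape[0], X.shape[1] - 1
r2 = 1 - np.sum((price - y_hat) ** 2) / np.sum((price - price.mean()) ** 2)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Prediction for a new observation: size 75, with a view.
new_pred = np.array([1.0, 75.0, 1.0]) @ coef
print(coef, r2, adj_r2, new_pred)
```

The adjusted R-squared penalizes each extra predictor, so it is never larger than the plain R-squared for the same model.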

While many libraries can compute a regression model, sklearn (scikit-learn) is one of the most widely used and is the preferred choice of many machine learning professionals. In this section, we implement everything we know about regressions in this library.

What is sklearn?

Game Plan for sklearn

Simple Linear Regression with sklearn

Multiple Linear Regression with sklearn

Adjusted R-Squared

Creating a Summary Table with the p-values

Feature Scaling

Feature Selection through Standardization

Making Predictions with Standardized Coefficients

Underfitting and Overfitting

Training and Testing

Linear Regression with sklearn - Practical Example
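
A minimal sketch of the workflow this section covers (feature scaling, a train/test split, fitting, and prediction) might look like the following. The synthetic data and variable names are assumptions for illustration; only the scikit-learn classes themselves come from the library.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): two inputs on very different scales.
rng = np.random.default_rng(42)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 100, 200)])
y = 3.0 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 0.1, 200)

# Hold out 20% of the data to check how the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

reg = LinearRegression().fit(X_train_scaled, y_train)

# score() returns R-squared; comparing train and test helps spot overfitting.
train_r2 = reg.score(X_train_scaled, y_train)
test_r2 = reg.score(X_test_scaled, y_test)
predictions = reg.predict(X_test_scaled)
print(reg.coef_, train_r2, test_r2)
```

After standardization the coefficients are on a comparable scale, which is what makes them usable as a rough guide to feature importance.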

Data scientists use logistic regression when the dependent variable is binary (0 and 1, true and false, etc.). Data of this type comes up daily in data science work. Here, you will learn how to build a logistic regression, read its summary tables, interpret its coefficients, and calculate the accuracy of the model. We also introduce underfitting and overfitting and teach you how to test your models.

Introduction to Logistic Regression

A Simple Example in Python

Logistic vs Logit Function

Building a Logistic Regression

Understanding Logistic Regression Tables

What do the Odds Actually Mean

Binary Predictors in a Logistic Regression

Calculating the Accuracy of the Model

Underfitting and Overfitting

Testing the Model
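
The ideas in this section can be sketched as follows: fit a logistic regression on a binary outcome, read the coefficient as an odds ratio, and compare training and test accuracy. The synthetic data are an assumption for illustration, and scikit-learn is used here as one possible library, which may differ from the tooling in the lessons. Note that scikit-learn's `LogisticRegression` applies L2 regularization by default, so its coefficients can differ slightly from an unregularized fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumed): the probability of y = 1 rises with x.
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 400)
prob = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # logistic function
y = (rng.uniform(0, 1, 400) < prob).astype(int)

X = x.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression().fit(X_train, y_train)

# exp(coefficient) is the multiplicative change in the odds per unit of x.
odds_ratio = np.exp(clf.coef_[0, 0])

# A large gap between the two accuracies would suggest overfitting.
train_accuracy = accuracy_score(y_train, clf.predict(X_train))
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(odds_ratio, train_accuracy, test_accuracy)
```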

Cluster analysis is one of the most intuitive and important examples of unsupervised learning. However, to understand cluster analysis, we must first explore the mathematics behind it.

Introduction to Cluster Analysis. Some Examples of Clustering

Difference between Classification and Clustering

Math Prerequisites

In this section, you will learn how to perform cluster analysis. Cluster analysis consists of dividing your data into separate groups based on an algorithm. It is a technique often employed in data science, because it frequently makes more sense to study patterns observed in a particular group than to look for patterns in the entire dataset. We provide several practical examples that will help you understand how to carry out cluster analysis and the difference between classification and clustering.

K-Means Clustering

Clustering Categorical Data

How to Choose the Number of Clusters

Pros and Cons of K-Means Clustering

Relationship between Clustering and Regression

Market Segmentation with Cluster Analysis (Part 1)

How is Clustering Useful?
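
To make the k-means lessons concrete, here is a small sketch that clusters synthetic points and uses the within-cluster sum of squares (exposed by scikit-learn as `inertia_`) to pick the number of clusters with the elbow method. The three synthetic blobs and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumed): three well-separated groups of 50 points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(50, 2)) for c in centers])

# Elbow method: record the within-cluster sum of squares for k = 1..6.
wcss = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    wcss.append(km.inertia_)

# The drop in WCSS flattens after k = 3 for this data, so we keep 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
labels = kmeans.labels_
print([round(w, 1) for w in wcss])
```

In practice the elbow is rarely this clean, which is one of the known drawbacks of k-means: the number of clusters has to be chosen by the analyst.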

In the previous section, we focused extensively on k-means clustering, as it is a fast and efficient clustering method. In this section, we explore other, less common approaches.

Types of Clustering

Dendrograms and Heatmaps
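
As one example of an alternative to k-means, the sketch below runs agglomerative (hierarchical) clustering with SciPy. The choice of SciPy, the Ward linkage method, and the synthetic data are all assumptions for illustration and may differ from the tools used in the lessons.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data (assumed): two tight groups of 10 points each.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(10, 2)),
    rng.normal(5.0, 0.3, size=(10, 2)),
])

# Build the merge tree bottom-up; each row of Z records one merge.
Z = linkage(X, method="ward")

# Cut the tree so that at most two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` is what a dendrogram visualizes; with matplotlib installed, `scipy.cluster.hierarchy.dendrogram(Z)` draws it.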

MODULE 3

This course is part of Module 3 of the 365 Data Science Program. The complete training consists of four modules, each building upon your knowledge from the previous one. Expanding on your statistical and programming skills from Modules 1 and 2, Module 3 is designed to improve your programming skills and develop your advanced statistical thinking. You will learn how to build complete linear and logistic regression models, how to cluster data, and how to build deep learning models with TensorFlow 2.0.

Real-life projects and data. Solve them on your own computer as you would in the office.

Our expert instructors are happy to help. Post a question and get a personal answer from one of our instructors.

Earn a verifiable certificate after each completed course. Celebrate your successes and share your progress with your professional network!

Whether you want to scale your career or transition into a new field, data science is the number one skillset employers look for. Grow your analytics expertise and get hired as a data scientist!