Machine Learning for User Classification

Using Machine Learning to Classify Students and Predict Student Purchases

Level: Intermediate

With Hristina Hristova

Project type: Skill Track

Duration: 10 Hours

Case Description

Background: In a machine learning classification problem, an algorithm assigns labels to instances based on their features. In this Machine Learning for User Classification project, you will apply this technique to an excerpt of our own data, stripped of personally identifiable information. You will examine student engagement metrics, such as the number of days students have spent on the platform, the minutes of content they have watched, and the number of courses they’ve started. You’ll then use this data to train several machine learning models, including logistic regression, k-nearest neighbors, support vector machines, decision trees, and random forests. The aim is to predict whether a student will upgrade from the free plan to a paid one.
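As a rough illustration of the workflow described above, the sketch below trains the five model families with scikit-learn and compares their test accuracy. It is not the project's solution: the data is synthetic, and the column names (days_on_platform, minutes_watched, courses_started, purchased), the toy labeling rule, and all hyperparameters are assumptions made only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the project data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "days_on_platform": rng.integers(1, 200, n),
    "minutes_watched": rng.gamma(2.0, 30.0, n),
    "courses_started": rng.integers(0, 10, n),
})
# Toy labeling rule (an assumption): more minutes watched makes a purchase more likely.
logit = 0.02 * data["minutes_watched"] - 4.0
data["purchased"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = data.drop(columns="purchased")
y = data["purchased"]

# Stratify so the rare positive class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Distance- and margin-based models get feature scaling; tree models do not need it.
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Accuracy alone can look high on imbalanced data simply because most students do not purchase, so precision, recall, or a confusion matrix are usually worth inspecting as well.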

Business Objective: This kind of analysis is valuable not only for 365 but for any online company. Predictions of which users are likely to purchase can guide advertisement targeting or outreach with exclusive offers. This helps the company direct its budget toward users likely to benefit from the product, with the aim of increasing revenue.

Note: This classification problem deals with a heavily imbalanced dataset: students who keep their free plan far outnumber those who purchase. You’re encouraged to research data resampling methods such as oversampling, undersampling, and SMOTE. However, handling the data imbalance is not required to complete the project successfully.
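One of the resampling methods mentioned in the note, random oversampling, can be sketched with sklearn.utils.resample, which ships with scikit-learn. The data frame below is synthetic, and the column names and the 450-to-50 class split are assumptions chosen only to make the effect visible. SMOTE is not part of scikit-learn; it is provided by the separate imbalanced-learn package, which is not among the libraries listed for this project.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

# Synthetic, imbalanced training set; column names are hypothetical.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "minutes_watched": rng.gamma(2.0, 30.0, 500),
    "purchased": np.r_[np.ones(50, dtype=int), np.zeros(450, dtype=int)],
})

majority = train[train["purchased"] == 0]
minority = train[train["purchased"] == 1]

# Random oversampling: draw minority rows with replacement
# until the minority class is as large as the majority class.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)

# Recombine and shuffle so the classes are not grouped together.
balanced = (
    pd.concat([majority, minority_upsampled])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)
print(balanced["purchased"].value_counts())
```

Resampling is normally applied to the training split only, so that the test set keeps the original class distribution and duplicated rows cannot leak into the evaluation.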

Project requirements

You’ll work with Python 3 in this Machine Learning for User Classification project, and you’ll need to install the following libraries:

  • pandas
  • matplotlib
  • statsmodels
  • scikit-learn
  • numpy
  • seaborn

Project files

  • ml_datasource.csv – the file contains the dataset for the project.
  • Machine Learning Project.ipynb – the notebook contains a project skeleton with a section for each task.
Project content
  • 2 Project files
  • Guided and unguided instructions
  • Part 1: Data Preprocessing
  • Part 2: Creating a Logistic Regression Model
  • Part 3: Creating a K-Nearest Neighbors Model
  • Part 4: Creating a Support Vector Machines Model
  • Part 5: Creating a Decision Trees Model
  • Part 6: Creating a Random Forests Model
  • Part 7: Results Interpretation
  • Quiz
Topics covered
  • Data processing
  • Programming
  • Machine learning