Data Scientist Track
As a data scientist, you will be at the top of the data science ladder. Apart from theoretical knowledge and technical skills, you will need outstanding communication skills. You will excel in data manipulation, statistics, visualization, and machine learning. Your data science competence will be extremely broad, as providing extraordinary insights requires an exceptional understanding of all aspects of data, programming, and business.
What’s included and why
Intro to data and data science
Working with data is an essential part of maintaining a healthy business. This course will introduce you to the field of data science and help you understand the various processes and distinguish between terms such as ‘traditional data’, ‘big data’, ‘business intelligence’, ‘business analytics’, ‘data analytics’, ‘data science’, and ‘machine learning’.
Section 1: Introduction
We will start with an introductory lecture about the 365 Data Science program. We will discuss the best way to approach our trainings and how to take our courses in a way that will position you well for a data scientist career.
Section 2: The different data science fields
For a novice, the data science field can be rather confusing. It takes a while to make sense of all the buzzwords and different areas of data science. Do not worry, as we will make this process easier and much faster for you. In some of our first lessons, you will learn how to distinguish between business analytics, data analytics, business intelligence, machine learning, and artificial intelligence. With this knowledge, we will point out where data science stands today. The specially designed infographic we discuss in the lessons makes everything clearer.
Section 3: The relationship between different data science fields
In this chapter, you will learn how the data science fields relate to each other and which ones leverage:
- traditional and big data
- business intelligence
- traditional data science methods and machine learning
Section 4: What is the purpose of each data science field
It is one thing to learn what the various data science disciplines are, but a whole different story to be able to tell what each discipline is used for in practice. This is really valuable, as it will give you an idea of the practical application of the different methods you will learn later in our program.
Section 5: Common data science techniques
There are different ways to approach Traditional data, Big data, Business Intelligence, Traditional data science methods, and Machine learning. In this part of the course, we will introduce you to some of the most common techniques to do that, and we will provide several practical examples that will make things easier and more relatable.
Section 6: Common data science tools
Before we dive into studying the different types of tools used in data science, we will provide a quick overview, so you have a good idea of why we are studying different tools and how they relate to each other. This will greatly facilitate your learning process, as you will already know what to expect and exactly which tasks each tool is used for.
Section 7: Data science career paths
As with most professions, there are different career paths you can embark upon. In this chapter, we will discuss several job positions related to the fields of data and data science.
Section 8: Dispelling common misconceptions
Finally, we will conclude our Intro to Data and Data Science training with a few lessons dispelling the most common misconceptions about the data science field.
Microsoft Excel
Microsoft Excel is the #1 productivity software in the world. A huge amount of data comes in spreadsheet format, so an analyst needs Excel in their arsenal. This course will teach you all the Excel skills you need to perform multi-layered calculations, create charts, manipulate data, use lookup functions, and more!
Section 1: Course Introduction
In this introductory part of the course, we will discuss why you need to learn Excel, and which key skills you will acquire by taking the course.
Section 2: A quick introduction to the basics of Excel
This section is fundamental for those of you who have never used Excel. We will start from the very basics: introducing the Excel ribbon, learning how to insert (and delete) rows and columns, how to perform data entry tasks, and how to format worksheets professionally. In addition, you will create your first formulas and functions, and cut, copy, and paste values for the first time.
Section 3: Excel useful tips & tools
Once you are familiar with the basic operations in Excel, it will be time to learn Excel best practices and learn how to navigate spreadsheets professionally. In no time you will know how to apply fast scrolling, use keyboard shortcuts, format sheets professionally, fix cell references, use named ranges, apply custom cell formats, and much more.
Section 4: Excel functions
Excel is one of the most popular productivity tools the business world has ever seen. The main reason for this is Excel functions. It is time for you to learn how to use Excel functions like a true professional. We will start with some easier examples (SUM, COUNT, AVERAGE, IF, MAX, MIN, VLOOKUP, HLOOKUP), and gradually introduce more advanced (and more powerful) functions such as SUMIF, SUMIFS, COUNTIF, COUNTIFS, INDEX, MATCH, INDEX & MATCH, etc.
Section 5: Excel charts
One of the strongest features of Microsoft Excel, besides multi-layered calculations, is that it allows you to visualize data. Here you will learn how to insert and format different types of charts that will help you make sense of numbers and figure out their trend.
Section 6: Practical exercise – Build a P&L from scratch
It is one thing to learn how to work with Excel’s most important tools, but it is even better to apply these techniques in a practical exercise. This is what we will do here. The “Build a P&L from scratch” exercise allows you to see how everything you have learned so far can be put into practice.
Probability
Data science is based on statistics, and statistics builds on the foundations laid by probability. This course will help you master the probability theory necessary to think like a data scientist. You will learn about expected values, combinatorics, and Bayesian notation, as well as probability distributions.
Section 1: Fundamentals of Probability
In this part we explore why probability is fundamental to becoming a data scientist. We introduce you to the key terms and ideas concerning probabilities and events, including theoretical and experimental probabilities, preferred outcomes, sample space, expected value, and complements.
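As a quick illustration of these terms, here is a short Python sketch; the die-roll example is our own, not taken from the course:

```python
from fractions import Fraction

# Theoretical probability: preferred outcomes over the size of the sample space.
# Example: rolling an even number with a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]
preferred = [x for x in sample_space if x % 2 == 0]
p_even = Fraction(len(preferred), len(sample_space))   # 1/2

# The complement of an event: P(not A) = 1 - P(A).
p_odd = 1 - p_even                                     # 1/2

# Expected value of a single roll: each outcome times its probability.
expected_value = sum(Fraction(x, 6) for x in sample_space)  # 7/2 = 3.5
```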
Section 2: Combinatorics
This section is designed to teach you what combinatorics is and where we encounter it in life. We will consider the 3 central concepts in combinatorics – permutations, variations, and combinations – and you’ll learn how to calculate each of these with the correct formulas.
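Python's standard math module can compute all three directly; a small sketch with assumed values n = 10 and k = 3:

```python
from math import factorial, comb, perm

n, k = 10, 3
permutations = factorial(n)   # orderings of all n elements: n!
variations = perm(n, k)       # ordered selections of k out of n: n! / (n - k)!
combinations = comb(n, k)     # unordered selections of k out of n
```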
Section 3: Bayesian Inference
In this section we learn how to describe events and the ways they interact with one another. We introduce important concepts like intersections, unions, and conditional probability. Then we focus on Bayes’ Law and how to use it to interpret the relationships between the possible outcomes of various events.
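A minimal Python sketch of Bayes' Law, using a diagnostic-test example; the figures below are illustrative assumptions, not course data:

```python
# Bayes' Law: P(A|B) = P(B|A) * P(A) / P(B), with made-up numbers.
p_condition = 0.01           # prior P(A): probability of having the condition
p_pos_given_cond = 0.95      # P(B|A): test sensitivity
p_pos_given_healthy = 0.05   # false-positive rate

# Total probability of a positive result, P(B):
p_pos = (p_pos_given_cond * p_condition
         + p_pos_given_healthy * (1 - p_condition))

# Posterior P(A|B) via Bayes' Law:
p_cond_given_pos = p_pos_given_cond * p_condition / p_pos  # ~0.161
```

Even with a 95% sensitive test, the low prior keeps the posterior around 16%, which is exactly the kind of result Bayes' Law helps interpret.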
Section 4: Distributions
We wrap up the course with a section on distributions. Being able to determine what kind of distribution a dataset follows is crucial in making accurate predictions about the future. We talk about the possible values a random variable can take and how frequently they occur. We introduce well-known distributions and events that follow them and proceed to discuss each common distribution in greater detail.
Section 5: Tie-ins
Once comfortable with the fundamentals of probability, we spend a minute exploring the tie-ins between this field and several others, such as finance, statistics, and data science.
Statistics
Statistics is the driving force in any quantitative career. It is the fundamental skill data scientists need to understand and design the statistical tests and analyses performed by modern software packages and programming languages. We will start from the very basics and gradually build up your skills, allowing you to understand the more complex analyses carried out later.
Section 1: Introduction
In this introductory part of the course, we will discuss why you need to learn statistics and which key skills you will acquire by taking the course.
Section 2: Fundamentals of descriptive statistics
In this section, you will come to understand the basic features of data. There are different types of data and levels of measurement. After you complete this section, you will be able to distinguish between them and will know the difference between categorical and numerical values. All of this will help you when calculating measures of central tendency (mean, median, and mode), dispersion indicators such as variance and standard deviation, and measures of the relationship between variables, like covariance and correlation. To reinforce what you have learned, we will wrap up this section with an easy-to-understand practical example.
Section 3: Fundamentals of inferential statistics
One of the core topics you will find in every statistics textbook is distributions. In this part of the course, you will learn what a distribution is and what characterizes the normal distribution. We will introduce you to the central limit theorem and the concept of standard error. Pretty soon you will be able to calculate confidence intervals with a known population variance. And once we introduce the Student's t distribution, you will learn how to work with smaller samples, as well as with differences between two means (with dependent and independent samples). All of these tools will be fundamental later on, when we start applying each of these concepts to large datasets using coding languages like Python and R. To reinforce what you have learned, we will once again wrap up this section with an easy-to-understand practical example.
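As a sketch of a confidence interval with known population variance, here is a Python example with illustrative numbers and the familiar 95% critical value z = 1.96:

```python
from math import sqrt

# 95% confidence interval for the mean with known population variance.
# All figures below are made-up illustrative values.
sample_mean = 100.0
population_std = 15.0
n = 36

standard_error = population_std / sqrt(n)          # 2.5
margin = 1.96 * standard_error                     # 4.9
ci = (sample_mean - margin, sample_mean + margin)  # (95.1, 104.9)
```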
Section 4: Hypothesis testing
Confirming and rejecting hypotheses with a reasonable degree of certainty is a practical and easy-to-apply method when dealing with uncertainty. In this section, you will learn how to perform hypothesis testing and what the difference is between a null and an alternative hypothesis, as well as about the rejection region and significance level, and type I and type II errors. The lessons will teach you how to test for the mean when the population variance is known and when it is unknown, and how to test for the mean when you are dealing with dependent and independent samples. We should also mention that this is the part of the course where you will become familiar with the p-value, a key measure when dealing with advanced models. As in previous sections, we will conclude with a practical example that makes use of our new knowledge.
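A minimal Python sketch of a two-sided z-test for the mean, assuming a known population standard deviation and made-up sample figures:

```python
from math import sqrt, erf

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a z-test of the mean with known population std."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Standard normal CDF evaluated via the error function.
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - phi)

z, p = z_test_p_value(sample_mean=103, mu0=100, sigma=15, n=100)
# z = 2.0, p is about 0.0455, so H0 is rejected at the 5% significance level.
```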
Tableau
Tableau is now one of the most popular business intelligence tools in the world. It allows non-technical users to visualize data and work with it immediately. This course shows you how you can use Tableau to create state-of-the-art visualizations and powerful dashboards, precisely what corporate executives need when making decisions.
Section 1: Why Tableau? Introduction to Tableau and how to get started
Tableau is an indispensable tool in the arsenal of most corporate business intelligence analysts, data analysts, and data scientists. Many people are uncertain about the difference between Tableau and spreadsheet tools like Excel, and that is a reasonable question. In this part of the course, we will explain when and why you need Tableau and how it differs from spreadsheet tools. We will also teach you how to install Tableau Public (Tableau’s free version).
Section 2: Tableau’s interface and connecting data
Once we have installed Tableau Public, we will be ready to go through Tableau’s interface and describe its different parts succinctly. You will also learn how to connect Tableau to Excel, csv, and other types of files or environments containing data.
Section 3: Basic operations in Tableau: creating a chart, creating a table
It is time to create our first Tableau charts and tables. You will learn how to create basic charts and adjust parts of their appearance.
Section 4: Additional Tableau functionalities
Some of the additional Tableau functionalities we will cover here are: creating custom fields, adding calculations to a table, adding totals and subtotals, and adding a filter.
Section 5: Practical exercise (part 1): Connecting data
The final part of our Tableau training is a complete practical example. The exercise is divided into two parts. In the first part, we will test several ways to connect our data. You have already studied joins in SQL; here, you will see how to apply joins in a Tableau context. Moreover, we will teach you about data blending and discuss best practices related to joining and blending data.
Section 6: Practical exercise (part 2): Creating a dashboard
Once we have connected our data and verified it is ready, we will create the three charts planned at the beginning of the exercise. Each of these charts analyzes a different aspect of a real-life dataset. We will group the charts in a dashboard and add a filter, which is applied to all three charts at the same time.
SQL
SQL is one of the fundamental programming languages you need to learn to work with databases. When you are a data scientist in a company and you need data to perform your analysis, you usually have two options: extract it on your own or contact the IT team. Of course, the first is an extremely valuable skill to have. In this course, we will teach you everything you need to know in terms of database management and creating SQL queries.
Section 1: Introduction to databases, SQL, and MySQL
Whether you are working in business intelligence (BI), data science, database administration, or back-end development, you will have to retrieve information from a server storing large amounts of data. To achieve this, you need SQL. The relational database management system we chose for this course is MySQL. We did that because MySQL is open-source, reliable, and mature. In one of the videos of this section, we will provide you with step-by-step guidance when you install MySQL Server and MySQL Workbench. The introductory part of this course pays significant attention to database theory. You will learn the meaning of terms like database, data table, data entity, record, field, relation, and more.
Section 2: First steps in SQL
It is time to create your first database and make your first steps in SQL. In this section, we will introduce you to string, fixed- and floating-point, and other useful data types. You will learn how to create a database table and how to use such a table. Not only that, but we will also introduce the different types of constraints that can be assigned to tables (primary key, foreign key, unique key, default, not null, and other types of constraints).
Section 3: SQL best practices
There are many ways you can write your SQL code, but there are only a few that are considered professional. In this part of the course, we will teach you how to write professional code and how to adhere to professional best practices. To reinforce what you have learned, we will wrap up this section with an easy to understand practical example.
Section 4: Loading the ‘employees’ database
One of the best features of our SQL training is that it uses a real-life database – the “Employees” database. We will use it to manipulate data in MySQL in all lessons. In this chapter, you will download the SQL file and will run it in Workbench.
Section 5: Data manipulation in SQL: SELECT, INSERT, UPDATE, DELETE
Are you ready to learn some of the most frequently used tools in SQL? These are the SELECT, INSERT, UPDATE, and DELETE statements. We use these statements to extract, insert, update, and delete data from a database.
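The course works in MySQL; as an illustration, the same four statements can be tried from Python with its built-in SQLite module, which shares the core SQL syntax (the table and names below are made up):

```python
import sqlite3

# An in-memory SQLite database, used here only to demonstrate the statements.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, first_name TEXT)")
cur.execute("INSERT INTO employees VALUES (1, 'Georgi'), (2, 'Bezalel')")
cur.execute("UPDATE employees SET first_name = 'Saniya' WHERE emp_no = 2")
cur.execute("DELETE FROM employees WHERE emp_no = 1")

rows = cur.execute("SELECT emp_no, first_name FROM employees").fetchall()
# rows == [(2, 'Saniya')]
```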
Section 6: MySQL Aggregate functions
Aggregate functions come in handy when we want to perform some arithmetic operations with the data in our database. The most commonly used aggregate functions in SQL are COUNT(), SUM(), MIN(), MAX(), and AVG().
Section 7: SQL Joins, subqueries, self joins, and views
Joins are one of the most powerful and frequently used tools in SQL. This is a tool you will need when combining the information from two or more tables. After completing this section, you will be able to use inner, left, right, and self joins. You will also learn how to write subqueries and views. The section includes a number of useful tips and tricks and aims to take your SQL skills to the next level.
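To sketch the difference between an inner and a left join, here is a small SQLite example run from Python; the tables and values are illustrative, not the actual ‘employees’ database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE employees (emp_no INTEGER, name TEXT);
    CREATE TABLE salaries  (emp_no INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Georgi'), (2, 'Bezalel'), (3, 'Parto');
    INSERT INTO salaries  VALUES (1, 60000), (2, 65000);
""")

# INNER JOIN keeps only employees that have a matching salary record.
inner = cur.execute("""
    SELECT e.name, s.salary
    FROM employees e
    INNER JOIN salaries s ON e.emp_no = s.emp_no
    ORDER BY e.emp_no
""").fetchall()

# LEFT JOIN keeps every employee; missing salaries come back as NULL (None).
left = cur.execute("""
    SELECT e.name, s.salary
    FROM employees e
    LEFT JOIN salaries s ON e.emp_no = s.emp_no
    ORDER BY e.emp_no
""").fetchall()
```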
Section 8: Stored routines
Stored routines are a set of SQL statements that have been pre-written and stored on a server allowing users to re-run them at a later stage. You will learn how to create your own stored procedures and functions.
Section 9: Advanced SQL Topics
In the last part of the training, you will learn about advanced SQL topics like local variables, session variables, global variables, MySQL triggers, and MySQL indexes.
SQL + Tableau
SQL alone is not well suited to visual data analysis. This course will show you how to use SQL and Tableau together, helping you find hidden value in your data faster than you could using them separately.
Section 1: Introduction
In the introductory part of the course, we will talk about the lessons you will see next and discuss why a data scientist would want to be able to connect tools like SQL and Tableau.
Section 2: How to connect SQL and Tableau
The integration between the two tools is not that difficult, especially if you work with Tableau Desktop. In this section, we will show you how to do that, and we will provide a workaround in case you use Tableau Public.
Section 3: Practical exercise – Part 1
We will continue to use the ‘employees’ database we worked with in a large part of our SQL training. By connecting Tableau and SQL, you will be able to visualize several trends related to the company’s gender gap situation.
Section 4: Practical exercise – Part 2
Here we will continue to explore the ‘employees’ database and will run several sophisticated SQL queries to answer the remaining questions in our task. The topic under investigation continues to be ‘gender gap’ policies. We will gather our data in a well-organized dashboard and will show you how to add an interactive filter, which makes analysis much quicker and more intuitive.
Python
Python is one of the most widely used programming languages among data scientists. This course will show you the technical advantages it has over other programming languages, as well as its modules for scientific computing, which make it a preferred choice in the fields of finance, econometrics, economics, data science, and machine learning.
Section 1: Course Introduction
In this introductory part of the course, we will discuss what the course covers, why you need to learn Python, and the best way to approach this training.
Section 2: Introduction to programming with Python
In this section, we will introduce you to the concept of programming and talk about some of Python’s key features (it is an open-source, general-purpose, high-level language). We will show you how to install the Jupyter Notebook (the environment we will use to code in Python) and will introduce you to its interface and dashboard.
Section 3: Python variables and data types
This is where you will start coding and learn one of the most fundamental concepts in programming – working with variables.
Section 4: Basic Python syntax
If you want to master Python programming, there is no way around learning basic Python syntax and operators first. In this section, we will cover the double equality sign, reassigning of values, adding comments, line continuation, indexing elements, arithmetic operators, comparison operators, logical operators, and identity operators.
Section 5: Conditional statements
Conditional statements are the bread and butter of programming. Here you will start creating your own IF, ELSE, and ELIF statements.
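For instance, a minimal sketch of the three branch types; the grading thresholds are our own example:

```python
def grade(score):
    # if / elif / else choose exactly one branch, checked top to bottom.
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"

grade(95)  # 'A'
grade(85)  # 'B'
grade(72)  # 'C'
```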
Section 6: Python functions
Python functions are another invaluable tool for programmers. They allow you to carry out pre-defined or specifically designed operations that manipulate the data you are working with and bring it one step closer to representing a meaningful output.
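A small sketch of a user-defined function with a default parameter; the conversion example is our own, not from the course:

```python
def celsius_to_fahrenheit(celsius, ndigits=1):
    """Convert a temperature and round the result to ndigits decimal places."""
    return round(celsius * 9 / 5 + 32, ndigits)

celsius_to_fahrenheit(100)   # 212.0
celsius_to_fahrenheit(36.6)  # 97.9
```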
Section 7: Python sequences
Sequences are one of the main building blocks of computer programming. A sequence helps you store and organize different values you are working with. We will teach you how to work with lists, list slicing, tuples, and dictionaries.
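A brief sketch of the sequence types mentioned above, with illustrative values:

```python
prices = [10, 20, 30, 40]         # list: ordered and mutable
prices[0] = 15                    # item assignment
first_two = prices[:2]            # slicing -> [15, 20]

point = (3, 4)                    # tuple: ordered and immutable
x, y = point                      # tuple unpacking

ages = {"Anna": 29, "Boris": 34}  # dictionary: key-value pairs
ages["Clara"] = 41                # adding a new entry
```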
Section 8: Using iterations in Python
Iterations are a programming technique that allows you to execute certain code repeatedly. This is one of the instruments that lets you automate repetitive tasks – one of programming’s main strengths.
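For example, a for loop and a while loop side by side; both examples are our own toy illustrations:

```python
# A for loop iterates over a sequence:
squares = []
for n in range(1, 6):
    squares.append(n ** 2)   # [1, 4, 9, 16, 25]

# A while loop repeats until its condition becomes False:
total, n = 0, 1
while n <= 100:
    total += n
    n += 1
# total == 5050, the sum of the numbers from 1 to 100
```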
Section 9: Advanced Python tools
In this part of our training, you will learn about object-oriented programming, different modules and packages, the standard library, how to import modules in Python, and how to work with arrays and organize data in Python. All of these lessons will significantly enhance the Python knowledge you have acquired up to this point. Once you complete this section, you’ll be ready to move ahead with our program and see how Python can be used in combination with SQL and Tableau.
Mathematics
Mathematics is a broad subject, but there are specific subfields that are heavily employed in data science: calculus and linear algebra – and this is what the program covers. However, in order to thrive in data science, you must have all the numerical tools so you can eventually understand the most complicated of machine learning algorithms.
Section 1: Course Introduction
In this introductory part of the course, we will discuss what the course covers, why you need to learn mathematics, and the best way to approach this training.
Section 2: Linear Algebra
In this section, we will discuss the basics of linear algebra – scalars, vectors, matrices, and tensors. We will dive into the terminology and the different operations one can perform, like transposition, addition, subtraction, multiplication, etc. We will look into types of matrices, like the identity matrix and the inverse matrix. We finish off this part with eigenvalues and eigenvectors.
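These operations map directly onto NumPy; a short sketch, where the matrix is a made-up diagonal example chosen so the eigenvalues are easy to see:

```python
import numpy as np

A = np.array([[2, 0],
              [0, 3]])
v = np.array([1, 4])

transpose = A.T               # swap rows and columns
product = A @ v               # matrix-vector multiplication -> [2, 12]
inverse = np.linalg.inv(A)    # inverse matrix: A @ inverse == identity
identity = np.eye(2)          # 2x2 identity matrix

eigenvalues, eigenvectors = np.linalg.eig(A)  # eigenvalues 2 and 3
```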
Section 3*: Differentiation
While classical differentiation itself is rarely used in data science, differentiation of matrices is mandatory for truly understanding deep learning. In this section, we look into the basics of differentiation so we can later build up to that in linear algebraic terms.
Section 4*: Differentiation in Linear Algebra
One of the central mathematical concepts in machine and deep learning is gradient descent. The mechanics of it are nothing more than a combination of differentiation and linear algebra. That’s what this part is about.
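As a sketch of the mechanics, here is gradient descent on the one-dimensional function f(x) = (x - 3)^2, whose minimum is at x = 3; the function and learning rate are our own illustrative choices:

```python
# Gradient descent on f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
# Each step moves x against the gradient, scaled by the learning rate.
def gradient_descent(x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        gradient = 2 * (x - 3)
        x -= learning_rate * gradient
    return x

x_min = gradient_descent(x0=10.0)  # converges toward the minimum at x = 3
```

In machine and deep learning the same loop runs over vectors and matrices of parameters, which is why the linear-algebraic form of differentiation matters.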
*Sections under development
R Programming
R is a programming language that has been specifically designed for statistics and graphics. Programming in R is a fast and effective way to perform advanced data analyses. This course will show you how to use R and apply the statistical functions you will need as a data scientist.
Section 1: Introduction to R and how to get started
In this introductory part of the course, we will go for a walk in the R environment. First, we are going to install R and RStudio together. Then, we’ll dive straight into RStudio and learn about its interface, and how to make use of the main windows and tabs there. We will also talk about setting your working directory and getting additional help.
Section 2: The building blocks of R
In this section we will learn about:
- Objects and coercion rules in R
- Functions in R
- How to use R’s console
Not only that, by the end of the section you will have built your first very own function; it will be able to draw cards from a deck, so you can play your favourite board game even if you don’t have the physical cards in front of you.
Section 3: Vectors and vector operations
Now that we have covered the basics, in this section we are about to drill deeper into R’s most widely used object type – the vector. You will learn how to create vectors and how to perform vector arithmetic operations. You will also see how to index and access elements from a vector, and how vectors recycle. Then, you will see how to change the dimensions of a vector and create a two-dimensional object from it. That will be our nice little segue into matrices.
Section 4: Matrices
It is time to talk about matrices. You will learn how to create and rename matrices, and how to index and slice matrices. All of this will lay a super solid foundation for the big star of data analysis: the data frame. Not only that, but we will also talk about factors, which is related to the statistics part of the course. Finally, we will cover lists: R’s way of storing hierarchical data.
Section 5: Fundamentals of programming with R
In this section of the course, we will go through some of the fundamental tools you need to learn when programming with R (and many other programming languages). We will cover relational operators, logical operators, vectors, IF, ELSE, and different types of loops (for, while, and repeat) in R. Some of these topics will have already been introduced to you in our Python training, but here you will have the chance to reinforce what you have learned and see things with R in mind.
Section 6: Data frames
In this section, we will focus our attention on how to create and import data frames in R, and how to quickly get a sense of your data frame by using the str() and summary() functions, column and row names, and so on. We’ll learn about accessing individual elements of your data frame for further use, and about extending a data frame with either new observations or new variables (that is, rows and columns). Furthermore, we will talk about dealing with missing data, because in real life that happens more often than we’d like. And we’ll discuss exporting data frames once we’re happy with their general state and ready to share them with the world.
Section 7: Manipulating data
At this point in our training, it is time to learn about some heavy-duty data manipulation techniques that will, without a doubt, become indispensable companions in your daily work with data. We will be talking about data transformation with the famous dplyr package – more specifically, how to filter(), arrange(), mutate(), and transmute() your data, as well as how to sample() fractions or fixed numbers of elements from it. You will also learn what tidy data is, why it is extremely important for the efficiency of your work to tidy your data sets in the most meaningful way, and how to achieve this by using the tidyr package. You will be tidying several messy real-life data sets by using the gather(), spread(), separate(), and unite() functions. Finally, the big surprise for this section… you will learn how to combine multiple operations in an intuitive way by using the pipe operator.
Section 8: Visualizing data
Plotting and graphing data is the most elegant way to understand your data and present your findings to others. In this section we are going to learn about the grammar of graphics and the seven layers that comprise a visualization. Then, we will jump straight into creating graphs and plots, with the ggplot2 package. Starting with the histogram, we will continue on to the bar chart, then onto the box and whiskers plot, and finally, the scatterplot. You will notice that with each new type of plot you will also be learning about a new layer or two, getting familiarized with ggplot2 and its inner workings in an incremental way.
Section 9: Exploratory data analysis
In this part of the course, we start applying R for statistical analysis. We are ready to discuss several exploratory data analysis topics:
- Population vs. sample
- Mean, median, and mode
- Variance, standard deviation, and the coefficient of variability
- Covariance and correlation
Section 10: Hypothesis testing in R
At this point, you are already familiar with hypothesis testing. We covered it in one of our earlier modules – Statistics. What we will do here is a natural continuation – you will learn how to carry out hypothesis testing in R.
Section 11: Regression analysis in R
Regression analysis is another topic we covered earlier in our program. As with hypothesis testing, this is a great opportunity to apply the theory you have learned previously in R.
Advanced Statistical Methods in Python
Advanced Statistical Methods builds upon the statistical knowledge you will already have gained by focusing on predictive modelling and entering multidimensional spaces which require an understanding of mathematical methods, transformations, and distributions. The course introduces these concepts as well as complex means of analysis such as clustering, factoring, Bayesian inference, and decision theory while also allowing you to exercise your Python programming skills.
Section 1: Introduction
In this introductory part of the course, we will discuss what the course covers, why you need to learn advanced statistics, how it differs from machine learning, and how to get the most out of this training.
Section 2: Regression analysis
Regression analysis is a topic you are already familiar with. However, here we will extend what you learned in our Statistics training with some additional concepts and will apply all the theory in Python. This section serves two purposes: 1) it is a useful refresher on regression, and 2) it is a great way to reinforce what you have learned by applying it in practice while coding.
Section 3: Logistic regression
Data scientists use logistic regression when the dependent variable is binary (0 and 1, true and false, etc.). This type of data is encountered on a daily basis when working as a data scientist, and here we will get you prepared. You will learn how to build a logistic regression, how to interpret regression tables, how to interpret the coefficients of a logistic regression, and how to calculate the accuracy of the model. We will introduce underfitting and overfitting and will teach you how to test your models.
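As a hedged sketch, here is a logistic regression built with scikit-learn on made-up study-hours data; both the library choice and the dataset are our own illustrative assumptions, not course material:

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Hours studied (feature) vs. passed the exam (binary target) - made-up data.
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

accuracy = model.score(X, y)       # share of correctly classified observations
prediction = model.predict([[7]])  # predicted class for 7 hours studied
```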
Section 4: Cluster analysis
In this chapter, we will introduce another essential technique you will definitely need in your data science arsenal: cluster analysis. It consists of dividing your data into separate groups based on an algorithm. Clustering is an amazing technique often employed in data science. What’s more, it often makes much more sense to study patterns observed in a particular group rather than trying to find patterns in the entire dataset. We will provide several practical examples that will help you understand how to carry out cluster analysis and the difference between classification and clustering.
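A minimal clustering sketch with scikit-learn's KMeans on made-up two-dimensional points; the library and data are illustrative assumptions:

```python
from sklearn.cluster import KMeans
import numpy as np

# Two visually obvious groups of points - made-up data for illustration.
points = np.array([[1, 1], [1.5, 2], [2, 1.5],
                   [8, 8], [8.5, 9], [9, 8.5]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)
# Points in the same group receive the same cluster label.
```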
Section 5: Factor analysis
There is a difference between the variables and the factors that have an impact on a dependent variable. In this part of our training, we will teach you how to isolate a few factors from a set of variables and use them to explain the dependent variable. We will learn how to reduce the dimensionality of problems in order to apply the methods we learned before. We will go through different techniques used for factor analysis, while finding its place in machine learning.
Python + SQL + Tableau
While Python is the leading programming language for data science, SQL is unmatched when it comes to relational database management. Tableau, on the other hand, is a leading business intelligence software, providing tools for quick computations and rich visualizations.
This course will show you how to combine these software products to solve real-life business problems.
More info goes here
Section 1: Software Integration
We begin by introducing key terms such as data, servers, clients, requests, responses, data connectivity, APIs, and endpoints. Understanding all of these terms and how they are used is crucial for grasping the concept of software integration.
Section 2: What’s next in the course?
In this short section, we introduce the business problem to be solved and outline the task we’ll tackle in the lessons to come: predicting the probability that an individual will be absent from work on a specific day.
Section 3: Preprocessing the ‘Absenteeism_data’
If you are already a Python guru and cleaning datasets comes as second nature, you may wish to skip this section. But if your Python mastery has gaps here and there, it is essential that you go through every lecture. We code throughout this section, so you’ll quite likely end up having a lot of fun. By the end of the section, you will have preprocessed an entire dataset.
Section 4: Applying Machine Learning to the Preprocessed Data
This section is at the core of this Absenteeism Exercise. Here, we discuss modern Machine Learning tools that can be used to solve problems like the one we’re looking at. Every step requires you to use Python, so stretch your coding fingers and let’s get to it!
Section 5: Connecting Python and SQL
In this section, you will see software integration applied in practice. You will not only experience first-hand how data can be transferred from Python to SQL, but you will also learn about the structure necessary for connecting two compatible software tools. Finally, we will export the dataset as a *.csv file that’s ready to be used in Tableau.
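As a rough sketch of the workflow (using Python's built-in sqlite3 module rather than the specific database driver the course uses, and with hypothetical table and column names), transferring predictions into SQL and exporting a CSV for Tableau could look like this:

```python
import csv
import sqlite3

# Hypothetical model outputs: (id, age, predicted probability of absence)
rows = [(1, 26, 0.73), (2, 41, 0.28), (3, 33, 0.55)]

# Insert the Python data into a SQL table (in-memory database for the demo)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predicted_outputs (id INTEGER, age INTEGER, probability REAL)"
)
conn.executemany("INSERT INTO predicted_outputs VALUES (?, ?, ?)", rows)
conn.commit()

# Read the data back and export it as a *.csv file ready for Tableau
fetched = conn.execute("SELECT * FROM predicted_outputs").fetchall()
with open("Absenteeism_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "age", "probability"])
    writer.writerows(fetched)
print(len(fetched))
```

The same pattern (connect, execute, commit, fetch) applies with other database drivers; only the connection details change.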
Section 6: Analyzing the Obtained Data in Tableau
In the last section of this course, we focus on the analytical part of the absenteeism task. We will load, analyze, and visualize in Tableau the data obtained in the previous sections.
Machine and deep learning are among the quantitative analysis skills that differentiate the data scientist from the other members of the team. The field of machine learning is the driving force of artificial intelligence. This course will teach you how to leverage neural networks and deep learning, and how to apply this powerful toolset for the purposes of data science.
More info goes here
Section 1: Course Introduction
In this introductory part of the course, we will discuss why you will need machine learning when working as a data scientist, what you will see in the following chapters of this training, and what the best way to take the course is.
Section 2: Introduction to neural networks
The basic logic behind training an algorithm involves four ingredients:
- data
- a model
- an objective function
- an optimization algorithm
In this part of the course, we describe each of them and build a solid foundation that allows you to understand the idea behind using neural networks. By completing this chapter, you will know what the various types of machine learning are, how to train a machine learning model, and understand terms like objective function, L2-norm loss, cross-entropy loss, one-parameter gradient descent, and n-parameter gradient descent.
Section 3: Minimal example – your first machine learning algorithm
At this point it is time to build your first machine learning algorithm. We will show you how to import the relevant libraries, how to generate random input data for the model to train on, how to create the targets the model will aim at, and how to plot the training data. The mechanics of this model exemplify how all regressions you’ve run in different packages (scikit-learn) or software (Excel) work. This is an iterative method aiming to find the best-fitting line.
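The spirit of such a minimal example can be sketched as follows: generate random inputs, define targets from a known linear rule, and let gradient descent recover the rule. This is an illustrative sketch, not the course's exact code, and the chosen coefficients (2 and -3) are arbitrary.

```python
import numpy as np

# Generate random training inputs and targets following a known linear rule
rng = np.random.default_rng(3)
inputs = rng.uniform(-10, 10, size=(1000, 1))
noise = rng.uniform(-1, 1, size=(1000, 1))
targets = 2 * inputs - 3 + noise        # the relationship the model must find

# Iteratively adjust the weight and bias to minimize the mean squared error
weights = rng.uniform(-0.1, 0.1, size=(1, 1))
bias = 0.0
lr = 0.02
for _ in range(500):
    outputs = inputs @ weights + bias
    deltas = outputs - targets
    weights -= lr * (inputs.T @ deltas) / len(inputs)
    bias -= lr * deltas.mean()

print(round(float(weights[0, 0]), 2), round(float(bias), 2))
```

After a few hundred iterations the learned weight and bias land close to the true values 2 and -3, which is exactly the "best-fitting line" a regression package would find.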
Section 5: TensorFlow – An introduction
Here we introduce the TensorFlow framework – a deep learning library developed by Google. It allows you to construct fairly sophisticated models with little coding. This intro section teaches you what tensors are and why the TensorFlow framework is one of the preferred tools of data scientists in 2018.
Section 6: Going deeper: Introduction to deep neural networks
Let’s dig a little deeper. From this section on, we will explore deep neural networks. Most real-life dependencies cannot be modelled with a simple linear combination (as we have done so far). And because we want to be better forecasters, we need better models. Most of the time, this means working with a model that is more sophisticated than a linear model. In this section, we will talk about concepts like deep nets, non-linearities, activation functions, softmax activation, and backpropagation. Sounds a bit complex, but we have made it easy for you!
Section 7: Backpropagation. A peek into the mathematics of optimization
In order to get a truly deep understanding of deep neural networks, one must look at the mathematics of them. As backpropagation is at the core of the optimization process, we wanted to introduce you to it and prepared materials that will help you understand this topic better and have a better idea of what happens behind the curtain.
Section 8: Overfitting
Some of the most common pitfalls when creating predictive models, especially in deep learning, are underfitting and overfitting your data. Underfitting means taking less advantage of the machine learning algorithm than you could due to insufficient training; overfitting means creating a model that fits the training data so closely (over-training the model) that it is not suitable for a different sample.
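The pattern is easy to see with polynomial fits on synthetic data (an illustrative sketch, not course material): a straight line underfits a quadratic relationship, while a very high-degree polynomial memorizes the training noise and does worse on fresh validation points.

```python
import numpy as np

# True relationship is quadratic; both sets carry independent noise
rng = np.random.default_rng(4)
x_train = np.linspace(-3, 3, 20)
y_train = x_train**2 + rng.normal(0, 1, 20)
x_val = np.linspace(-2.85, 2.85, 20)
y_val = x_val**2 + rng.normal(0, 1, 20)

train_err, val_err = {}, {}
for degree in (1, 2, 15):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    val_err[degree] = np.mean((np.polyval(coefs, x_val) - y_val) ** 2)
    print(degree, round(train_err[degree], 2), round(val_err[degree], 2))
```

Training error always shrinks as the model gets more flexible, but validation error tells the real story: degree 1 underfits, degree 2 matches the truth, and degree 15 chases the noise.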
Section 9: Initialization
Initialization is the process in which we set the initial values of the weights. It is an important aspect of building a machine learning model. You will learn how to initialize the weights of your model and how to apply Xavier initialization.
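As a quick sketch of the Xavier (Glorot) uniform scheme: weights are drawn from a uniform range whose width depends on the layer's fan-in and fan-out, which keeps activation variance roughly stable across layers. The layer sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def xavier_uniform(fan_in, fan_out):
    # Xavier/Glorot uniform: draw from [-limit, limit],
    # with limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# e.g. a 784-input layer feeding 50 hidden units (hypothetical sizes)
weights = xavier_uniform(784, 50)
print(weights.shape, round(float(weights.std()), 3))
```

Note how the spread shrinks as the layers get wider: large layers sum many inputs, so each individual weight must start smaller.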
Section 10: Gradient descent and learning rates
Gradient descent iterates over the whole training set before updating the weights, and every iteration updates them in a relatively small way. You will learn about common pitfalls related to this method and how to overcome them using stochastic gradient descent, momentum, learning rate schedules, and adaptive learning rates.
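One of those pitfalls, the choice of learning rate, can be demonstrated on the simple quadratic loss L(w) = (w - 4)^2 (an illustrative sketch with arbitrary numbers): too small converges slowly, a sensible value converges, and too large diverges.

```python
# Gradient descent on L(w) = (w - 4)^2 with three learning rates.
# The gradient of (w - 4)^2 is 2 * (w - 4); the minimum is at w = 4.
results = {}
for lr in (0.01, 0.1, 1.1):
    w = 0.0
    for _ in range(100):
        w -= lr * 2 * (w - 4)
    results[lr] = w
    print(lr, round(w, 4))
```

After 100 identical updates, lr = 0.1 has essentially reached the minimum, lr = 0.01 is still far away, and lr = 1.1 has blown up, which is exactly why schedules and adaptive rates are worth learning.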
Section 11: Preprocessing
A large part of the effort data scientists make when creating a new model is related to preprocessing. This process refers to any manipulation we apply to the dataset before feeding it to the model for training. Learning how to preprocess data is fundamental for anyone who wants to be able to create machine learning models, as no meaningful framework can simply take raw data and provide an answer. In this part of the course, we will show you how to prepare your data for analysis and modeling.
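One of the most common preprocessing steps is standardization: rescaling each column to mean 0 and standard deviation 1 so features on very different scales contribute comparably during training. A minimal sketch with made-up salary and age columns:

```python
import numpy as np

# Two features on very different scales (hypothetical salary and age data)
raw = np.array([
    [50_000.0, 25.0],
    [64_000.0, 41.0],
    [58_000.0, 33.0],
    [71_000.0, 52.0],
])

# Standardize column-wise: subtract the mean, divide by the standard deviation
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)
print(standardized.mean(axis=0).round(6), standardized.std(axis=0).round(6))
```

After this transformation every column has mean 0 and standard deviation 1, so an optimizer no longer treats salary as thousands of times more important than age just because of its units.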
Section 12: The MNIST example
All the lessons so far will have given you solid preparation, and your patience will now pay off when we start coding. Everything will fall into place nicely. The problem we will solve here is the “Hello, world” of machine learning. It is called MNIST classification and is based on a dataset of 70,000 handwritten digits. Together, we will create an algorithm that takes an image as input and correctly determines which number is shown in that image.
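To give a flavor of how such a classifier produces its answer, here is a sketch of just the output step: a flattened 28×28 image is mapped to 10 scores, and softmax turns them into one probability per digit. The weights here are random stand-ins for a trained model, so the predicted digit is meaningless; only the mechanics are illustrated.

```python
import numpy as np

rng = np.random.default_rng(6)
image = rng.random(28 * 28)                 # flattened fake "image" in [0, 1)
weights = rng.normal(0, 0.05, (28 * 28, 10))  # stand-in for trained weights
biases = np.zeros(10)

# Linear scores, then softmax (shifted by the max for numerical stability)
logits = image @ weights + biases
exp_shifted = np.exp(logits - logits.max())
probs = exp_shifted / exp_shifted.sum()

predicted_digit = int(probs.argmax())       # the most probable of the 10 digits
print(predicted_digit, round(float(probs.sum()), 6))
```

The softmax output always sums to 1, which is what lets us read the 10 numbers as the model's probability for each digit from 0 to 9.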
Section 13: Solving a real-life business case
Solving the MNIST example will have shown you that machine learning is not that hard after all, right? In this section, we will solve a real-life business case, such as the ones data scientists solve on the job. You will build a model that determines how likely it is that a specific client will come back and buy another product from a company selling audiobooks. This is a great example of how machine learning can help a company optimize its marketing efforts and ultimately grow its bottom line.
Section 14: Next steps
At this point, we guide you on how to continue your specialization and data science journey. In this section, we discuss what else is out there in the machine learning world, how Google’s DeepMind uses machine learning, what RNNs are, and what non-NN approaches exist.