Different record count in course project
I am working on the Free-to-Paid conversion rate practice project. In your sanity check note, you mention that
"The number of records in the resulting set should be 20,255."
I joined the two tables with an inner join, took the minimum date_watched value per student, and grouped by student_id. I also used a WHERE clause requiring date_watched IS NOT NULL, so I am not collecting any null values. I am still getting 20,778 as the number of distinct students who engaged with content after registering.
Is it possible that the data set changed and the number of records was not updated?
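For reference, here is a minimal sketch of the kind of query I mean, run from Python against a toy in-memory SQLite database. The table names (student_info, student_engagement), the date_registered column, and the sample rows are placeholders I made up for illustration; they are not the actual project schema or data.

```python
import sqlite3

# Toy in-memory database; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_info (student_id INTEGER, date_registered TEXT);
CREATE TABLE student_engagement (student_id INTEGER, date_watched TEXT);
INSERT INTO student_info VALUES
  (1, '2022-01-01'), (2, '2022-01-05'), (3, '2022-01-10');
INSERT INTO student_engagement VALUES
  (1, '2022-01-03'), (1, '2022-01-02'), (2, '2022-01-06'), (2, NULL);
""")

# Inner join, drop NULL watch dates, keep the earliest watch date per student.
first_watch = conn.execute("""
SELECT i.student_id, MIN(e.date_watched) AS first_date_watched
FROM student_info AS i
INNER JOIN student_engagement AS e ON e.student_id = i.student_id
WHERE e.date_watched IS NOT NULL
GROUP BY i.student_id
ORDER BY i.student_id
""").fetchall()

print(first_watch)       # one row per student who watched something
print(len(first_watch))  # number of distinct engaged students
```

In this toy data, student 3 never watched anything and is dropped by the inner join, so the count reflects only students with at least one non-null watch date.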
I found where the 20,255 comes from.
When I calculated the difference between the purchase date and the watch date, several students had purchased their subscriptions before ever watching anything, which showed up as a negative datediff. Counting the students who purchased first (those with a negative date_diff_watch_purch) gave 523, which is exactly the difference between my 20,778 and the expected 20,255 (20,778 - 523 = 20,255). So I will exclude these 523 students from my next calculations, because they are not free-to-paid conversions.
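In case it helps anyone else, here is a rough sketch of that check on toy data, again using Python with an in-memory SQLite database. The first_watch and first_purchase tables and their rows are made up for illustration. SQLite has no DATEDIFF, so the sketch subtracts julianday() values instead; in MySQL the equivalent would be DATEDIFF(purchase date, watch date).

```python
import sqlite3

# Toy in-memory database; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_watch (student_id INTEGER, first_date_watched TEXT);
CREATE TABLE first_purchase (student_id INTEGER, first_date_purchased TEXT);
INSERT INTO first_watch VALUES
  (1, '2022-01-02'), (2, '2022-01-06'), (3, '2022-02-01'), (4, '2022-02-10');
INSERT INTO first_purchase VALUES
  (1, '2022-01-10'), (3, '2022-01-20');
""")

# Days from first watch to first purchase; NULL if the student never purchased.
rows = conn.execute("""
SELECT w.student_id,
       CAST(julianday(p.first_date_purchased)
            - julianday(w.first_date_watched) AS INTEGER) AS date_diff_watch_purch
FROM first_watch AS w
LEFT JOIN first_purchase AS p ON p.student_id = w.student_id
ORDER BY w.student_id
""").fetchall()

# Students who purchased before ever watching have a negative difference.
purchased_first = [sid for sid, diff in rows if diff is not None and diff < 0]
# Everyone else stays: non-purchasers (NULL) and students who watched first.
kept = [sid for sid, diff in rows if diff is None or diff >= 0]

print(len(rows), len(purchased_first), len(kept))
```

The total minus the "purchased first" count gives the number of remaining records, which is the same arithmetic as 20,778 - 523 = 20,255 on the real data.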