Different record count in course project
I am working on the conversion-rate analysis for the Free-to-Paid Conversion Rate practice project. Your sanity-check note says:
"The number of records in the resulting set should be 20,255."
I joined the two tables with an inner join, took MIN(date_watched_value), filtered out NULL values with a WHERE clause (date_watched IS NOT NULL), and grouped by student_id. I still get 20,778 distinct students who engaged with the content after registering.
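For reference, here is a minimal sketch of the query described above, run against SQLite on a few toy rows. The table and column names (student_info, student_engagement, date_registered, date_watched) are assumptions standing in for the project's real tables, and the data is made up purely to show the shape of the result: one row per student with their earliest watch date.

```python
import sqlite3

# Toy data; table and column names are hypothetical stand-ins for the project's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_info (student_id INTEGER PRIMARY KEY, date_registered TEXT);
CREATE TABLE student_engagement (student_id INTEGER, date_watched TEXT);
INSERT INTO student_info VALUES (1, '2022-01-01'), (2, '2022-01-02'), (3, '2022-01-03');
INSERT INTO student_engagement VALUES
    (1, '2022-01-05'), (1, '2022-01-03'),
    (2, '2022-01-04'),
    (3, NULL);
""")

# Inner join, drop NULL watch dates, then keep each student's earliest watch date.
query = """
SELECT i.student_id, MIN(e.date_watched) AS first_date_watched
FROM student_info AS i
INNER JOIN student_engagement AS e ON i.student_id = e.student_id
WHERE e.date_watched IS NOT NULL
GROUP BY i.student_id
ORDER BY i.student_id
"""
rows = conn.execute(query).fetchall()
print(len(rows))  # number of distinct engaged students
```

Student 3 drops out because the WHERE clause removes their only (NULL) watch row before grouping. Note that this query alone says nothing about when a student purchased, so it cannot by itself produce the expected count.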
Is it possible that the data set changed and the expected number of records was not updated?
I found where the 20,255 figure comes from.
When I calculated the difference between the purchase date and the watch date, several students had purchased their subscription before ever watching, which showed up as a negative value in the date-difference column (date_diff_watch_purch). Counting the students with a negative date_diff_watch_purch gave 523, which is exactly the gap between my 20,778 and the expected 20,255 (20,778 - 20,255 = 523). So I will exclude these 523 students from the next calculations, because they are not free-to-paid conversions.
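A minimal sketch of that filter, again on SQLite with made-up rows. The table and column names (student_engagement, student_purchases, date_watched, date_purchased) are assumptions, and SQLite's julianday() difference stands in for the DATEDIFF you would use in MySQL. The key detail is the LEFT JOIN: students who never purchased have a NULL purchase date and must be kept, while only those who bought before their first watch are dropped.

```python
import sqlite3

# Hypothetical tables and toy rows; not the project's real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_engagement (student_id INTEGER, date_watched TEXT);
CREATE TABLE student_purchases (student_id INTEGER, date_purchased TEXT);
INSERT INTO student_engagement VALUES
    (1, '2022-01-03'), (2, '2022-01-10'), (3, '2022-01-04');
INSERT INTO student_purchases VALUES
    (1, '2022-01-08'),  -- watched first, then bought: keep
    (2, '2022-01-06');  -- bought before first watch: exclude
-- student 3 never purchased: keep (NULL purchase date)
""")

query = """
WITH first_watch AS (
    SELECT student_id, MIN(date_watched) AS first_date_watched
    FROM student_engagement
    WHERE date_watched IS NOT NULL
    GROUP BY student_id
),
first_purchase AS (
    SELECT student_id, MIN(date_purchased) AS first_date_purchased
    FROM student_purchases
    GROUP BY student_id
)
SELECT w.student_id,
       w.first_date_watched,
       p.first_date_purchased,
       -- days from first watch to first purchase; NULL if never purchased
       julianday(p.first_date_purchased) - julianday(w.first_date_watched)
           AS date_diff_watch_purch
FROM first_watch AS w
LEFT JOIN first_purchase AS p ON w.student_id = p.student_id
-- drop only students whose first purchase precedes their first watch
WHERE p.first_date_purchased IS NULL
   OR p.first_date_purchased >= w.first_date_watched
ORDER BY w.student_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Comparing the ISO-formatted date strings directly works because they sort lexicographically in date order. A purchase on the same day as the first watch (difference 0) is kept, matching the rule of excluding only negative differences.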
Hey Sarah,
Thank you for reaching out.
That's precisely one of the points that need to be considered when solving the task. It's highlighted in the guided version of the project but not in the unguided one.
Kind regards,
365 Hristina