Questions about the two solutions
The second solution doesn't take into account days where there is more than one ride. Are all dates unique in this table?
I'm also unsure why Tina was averaging the distance per dollar for each date in a CTE in the first solution. I think I personally would use SUM on each variable and then divide.
Hey, I'm not sure about the second solution but as you mentioned about the first solution; wouldn't averaging the distance per dollar for each date be the same as using SUM and then dividing. Since, as you said, there could be multiple dates, so averaging would be the ideal solution to get it for each date ( and also while using group by, it is necessary to use an aggregate function). Also, thinking about the second solution, I think you're right Tina didn't take into account more than one ride but I guess the window function takes care of the unique values since we're not even using a group by statement in the second solution anyways.
Hi! Regarding this: "wouldn't averaging the distance per dollar for each date be the same as using SUM and then dividing." To clarify first, what I meant by using sum on each variable and then dividing was having this: SUM(distance_to_travel)/SUM(monetary_cost) for each date. Note that the question here was looking for "the distance per dollar for that date" and not "the average distance per dollar for that date."
Doing SUM(distance_to_travel)/SUM(monetary_cost) will provide a different output than AVG(distance_to_travel/monetary_cost), as the latter will run the calculation for each ride first and then average the results for the day. SUM(x) / SUM(y) is not equal to SUM(x/y) / n.
About the unique dates issue, I found the original interview question on StrataScratch and they've added in "Assume that all dates are unique in the dataset" so that answers my first question. :)
Hey, thanks for explaining. You clarified a lot of my doubts as well :)