Last answered:

31 May 2021

Posted on:

28 May 2021

1

question on pd.to_datetime

Hi,

How does pandas know which part of the string is the month and which is the day? E.g., for 04/07/2018, how does pandas know whether it is July or April?

Thank you

3 answers (0 marked as helpful)
Instructor
Posted on:

28 May 2021

3

Hi Jaycus!
Great to have you in the course and thanks for reaching out!
Pandas can recognize a datetime value automatically by analyzing the pattern of the given string. When the string starts with the year, as in
2018/07/04
pandas reads it as YEAR - MONTH - DAY. When the year comes last and the other two components could each be a month, as in
04/07/2018
pandas assumes MONTH - DAY - YEAR by default, so this value is read as April 7, 2018. Both strings are converted to datetimes and displayed in ISO 8601 format (YYYY-MM-DD).
A string such as 04/14/2018 is valid as well: 14 can only be a day, which matches the default MONTH - DAY - YEAR order, so pandas reads it as April 14, 2018 without raising an error.
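As a quick check of how the strings above are read, here is a minimal sketch. The variable names are only illustrative, and it assumes pandas is installed and that to_datetime() is called with its default settings.

```python
import pandas as pd

# Ambiguous string: both 04 and 07 could be a month.
# With the default dayfirst=False, pandas reads it month-first.
ambiguous = pd.to_datetime("04/07/2018")

# Year-first string: read as YEAR/MONTH/DAY.
year_first = pd.to_datetime("2018/07/04")

# 14 can only be a day, which fits the default month-first order,
# so this parses without an error.
month_first = pd.to_datetime("04/14/2018")

print(ambiguous.date(), year_first.date(), month_first.date())
```

The first value comes back as April 7, 2018 and the last as April 14, 2018, which is why a day-first column needs the order to be stated explicitly.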

Hope this helps.
Best,
Ivan

Posted on:

28 May 2021

2

Hi Ivan,

Thanks for the reply. I'm a bit confused about how pandas determines whether 07 or 04 is the month or the day.

E.g., in the video, in the StartDate column, it seems that 07 in 04/07/2018 is the month, judging by the 5th row, 28/10/2017.

However, after running pd.to_datetime(lending_co_data['StartDate']), it shows that 07 is now the day, as shown in the screenshots below.

[Screenshots: image.png, image.png]

Instructor
Posted on:

31 May 2021

4

Hi Jaycus!

Thank you for sharing this information with the Community! Your observation is a valuable one!

When string dates don't start with the year (2018/07/04) but end with it instead (04/07/2018), to_datetime() treats the first value as the month (MM/DD/YYYY) by default.
If we want pandas to treat the first value as the day instead of the month, we can set the dayfirst argument to True, like this:

pd.to_datetime(lending_co_data['StartDate'], dayfirst=True)

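However, if the string date is formatted like this: 23/07/2018, pandas will 'see' that 23 > 12 (which means it cannot be a month) and will fall back to the other order (DD/MM/YYYY), interpreting the first value as the day. This automatic fallback should not be relied on for ambiguous values such as 04/07/2018; for a day-first column, pass dayfirst=True (or an explicit format) so that every value is read the same way.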
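To illustrate the dayfirst option on a whole column, here is a small sketch. The Series start_dates is a hypothetical stand-in for lending_co_data['StartDate'], built from the two values mentioned in this thread; the explicit format string is an alternative I am assuming fits the data, not something shown in the course.

```python
import pandas as pd

# Hypothetical stand-in for lending_co_data['StartDate'].
start_dates = pd.Series(["04/07/2018", "28/10/2017"])

# Tell pandas that the first number in each string is the day.
day_first = pd.to_datetime(start_dates, dayfirst=True)

# Alternative: spell out the exact layout, which leaves nothing to guess.
explicit = pd.to_datetime(start_dates, format="%d/%m/%Y")

print(day_first)
print(explicit)
```

Both calls read 04/07/2018 as July 4, 2018. An explicit format is the stricter choice, since any value that does not match the layout raises an error instead of being reinterpreted.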

Thank you for pointing this out!

Best,
Ivan
