pandas Dates and Times
Basics
Visualization is one of the crucial foundations of practicing data science, and analyzing information on a timeline can be vital to understanding its behavior. However, our understanding and structure of time is difficult to translate into raw computing — we can recognize that the phrase "8/1/2022" is the day that follows "7/31/2022", but without help, a computer would just see those as two different strings.
To help, pandas introduced the to_datetime function to convert strings (like the ones above) to Timestamp objects, which have many attributes that help translate our human perception of dates and times for the computer’s sake.
|
Python already has a |
to_datetime
Let’s start with an example DataFrame that contains event types, attendance numbers, and the dates on which the events occur. We’ll check the data types of our columns to see what pandas starts out with.
import pandas as pd
list_1 = ['7/1/2021', '7/4/2021', '8/20/2021', '9/2/2021', '4/1/2050']
list_2 = ['Presentation', 'Class', 'Presentation', 'Event', 'Class']
list_3 = [25, 0, 50, 48, 1000000]
myDF = pd.DataFrame(zip(list_1, list_2, list_3), columns=['date', 'type', 'attendance'])
print(myDF.dtypes)
date object type object attendance int64
As we can see from the output, date is a general object — this doesn’t help us, as we need a Timestamp object and its host of useful functions and attributes. The simplest way to convert this only requires our to_datetime function, the column we want to change, and the format our dates come in:
myDF['date'] = pd.to_datetime(myDF['date'], format='%m/%d/%Y')
print(myDF.dtypes)
date datetime64[ns] type object attendance int64
Hold on — what’s with the datetime64[ns] object? In spite of this output, if we test the type of our individual dates, we find that they are indeed Timestamp objects. The implication is that all the Timestamp attributes are usable when working with date.
type(myDF.date[0])
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
format Argument
The date column is now saved with the data type that we want. If you recall, we had an odd-looking format argument. The idea is that we are telling pandas how to parse the date: month first, then day, then year, separated by forward slashes. If the dates had dashes or some other separator, we could indicate this in format to ensure pandas is reading it correctly.
format does pay attention to capitalization. For example, %y means the date excludes century, while %Y includes century (think 13 compared to 2013). There are helpful tables to explain the different formats, and we’ll include the main ones here:
-
Month:
-
%b— abbreviated month, "Oct";%B— full month name, "October" -
%m— month number with leading zero, "04";%-m— month number, no leading zero, "4"
-
-
Day:
-
%a— abbreviated weekday, "Mon";%A— full weekday name, "Monday" -
%d— day number with leading zero, "07";%-d— day number, no leading zero, "7"
-
-
Year:
-
%y— last two numbers of year, "14";%Y— full year, "1914"
-
|
|
Dates/Times Objects, Methods, and Attributes
Converting data using to_datetime is only useful if we can extract further information using Timestamp and other pandas packages.
Time Differences: Timedelta and DateOffset
The Timedelta and DateOffset objects are simple: they are time/date objects different from date times that can be supplemented a difference between dates (4 days from now, 4 months ago, etc) to find the desired future/past date. The key difference is that Timedelta works with absolute time (1 day = 24 hours), while DateOffset works with calendar time, accounting for things such as daylight savings time. DateOffset also lacks an array class corresponding to timedelta64[ns].
Timedelta
For Timedelta, you can use either keyword arguments or a pair including (value, unit) to indicate the duration of time.
import pandas as pd
day_example = pd.Timedelta(days=6)
week_example = pd.Timedelta(weeks=14)
print(day_example)
print(week_example)
6 days 00:00:00 98 days 00:00:00
As we can see here, Timedelta works in days, hours, minutes, and seconds, where the equivalency of 1 week = 7 days applies before output. Since months and years have differing numbers of days, you cannot use them when generating Timedeltas.
DateOffset
Imported from pandas.tseries.offsets, DateOffset is another method allowing for the changing of times. Since this category uses calendar-based logic, we can supplement years, months, or weeks as keywords.
|
A plural parameter (such as |
print(pd.Timestamp("2019-12-25") + pd.DateOffset(day=1))
print(pd.Timestamp("2019-12-25") + pd.DateOffset(days=1))
2019-12-01 00:00:00 2019-12-26 00:00:00
Knowing the different uses of the two data types is important.
-
DateOffsetis useful due to its broader acceptance of time parameters and use of calendar logic. We often want to know the dates of things weeks, months, and years in advance, and it’s inconvenient to translate those to number of days for use withTimedelta. Additionally, 4 months from January 15th is informally understood to be May 15th, andDateOffsetunderstands this whereTimedeltadoes not. -
Timedeltacan be interpreted byDataFramesandSeries, whileDateOffsetcannot and is cast as a simpleobject. Additionally, anyDateOffsettime measurements equal to or shorter than an hour function likeTimedelta.
It’s important to think of these two objects as cooperating tools to your success rather than mutually exclusive options.
dt operator
You’ll notice that Timestamp attributes include basic elements of date information — month, day, year, second, day_of_week, and so on. Let’s try and creating a month column from date in myDF:
myDF['month'] = myDF['date'].month
AttributeError: 'Series' object has no attribute 'month'
As the error message says, we can’t get month from the date column because it’s a Series, not a Timestamp. Instead of looping through each value or using apply, we have the dt accessor, allowing us to use Timestamp attributes and functions column-wide. Now we can create our month column:
myDF['month'] = myDF['date'].dt.month
print(myDF)
date type attendance month 0 2021-07-01 Presentation 25 7 1 2021-07-04 Class 0 7 2 2021-08-20 Presentation 50 8 3 2021-09-02 Event 48 9 4 2050-04-01 Class 1000000 4
Examples
Create month and year columns from our date column.
Click to see solution
myDF['month'] = myDF['date'].dt.month
myDF['year'] = myDF['date'].dt.year
print(myDF)
date type attendance month year 0 2021-07-01 Presentation 25 7 2021 1 2021-07-04 Class 0 7 2021 2 2021-08-20 Presentation 50 8 2021 3 2021-09-02 Event 48 9 2021 4 2050-04-01 Class 1000000 4 2050
Create the weekday column from date.
Click to see solution
myDF['weekday'] = myDF['date'].dt.day_name()
print(myDF)
date type attendance weekday 0 2021-07-01 Presentation 25 Thursday 1 2021-07-04 Class 0 Sunday 2 2021-08-20 Presentation 50 Friday 3 2021-09-02 Event 48 Thursday 4 2050-04-01 Class 1000000 Friday
Shift all the days in date forward by one week, replacing the old dates in the process.
Click to see solution
myDF['date'] = myDF['date'] + pd.Timedelta("7 days")
print(myDF)
date type attendance 0 2021-07-08 Presentation 25 1 2021-07-11 Class 0 2 2021-08-27 Presentation 50 3 2021-09-09 Event 48 4 2050-04-08 Class 1000000
Suppose myDF.date contains exclusively days from the first semester of an academic year, and each year ends on May 31st. Create the end_of_school column using the date column and DateOffset, which contains the last day of school for that academic year. Then create days_until_school_is_over, a column that contains the number of days between date and end_of_school.
Click to see solution
one_year_later = myDF['date'] + pd.offsets.DateOffset(years=1)
myDF['end_of_school'] = pd.to_datetime({'month': 5, 'day': 31, 'year':one_year_later.dt.year})
myDF['days_until_school_is_over'] = myDF['end_of_school'] - myDF['date']
print(myDF)
date type attendance end_of_school days_until_school_is_over 0 2021-07-01 Presentation 25 2022-05-31 334 days 1 2021-07-04 Class 0 2022-05-31 331 days 2 2021-08-20 Presentation 50 2022-05-31 284 days 3 2021-09-02 Event 48 2022-05-31 271 days 4 2050-11-01 Class 1000000 2051-05-31 211 days
Resources
As always, the information we include here is just a portion of all you can know about using dates and times in pandas. They have a great, extensive user guide that includes DatetimeIndex, indexing using dates, and much more.