How to deal with dates, dates with timezones and dates with particular formats with python and pandas (code snippets for different use cases)

Context

When dealing with dates, I’ve sometimes problems because the source is not clean, or not all the rows have the same format. Additionally, dates can be simple (year-month-day) or really complicate like a timestamp with timezone.

Here are some code snippets for several use cases.

How to read dates with python and pandas

# import pandas
import pandas as pd

# df is a dataframe with a column 'column_with_date' with a date like this '19.01.2023 16:45:46 +01:00'
# convert date string to datetime with pd.to_datetime
pd.to_datetime(df['column_with_date'])

# sometimes the format is a bit weird and pandas cannot recognize it. In this case we give the date format as argument.
# for this format 19.01.2023 16:45:46 +01:00 we can use:
pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z')

# if your string has timezone, use utr=True
pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z', utc=True)

# sometimes your columns are as objects (strings) and numbers (floats, ints) and to_datetime cannot process it. 
# You can force the type string to the whole column before giving it to to_datetime.

pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)

# pd.to_datetime does not modify the column values in place, so you have to assign it to the same column.

df['column_with_date'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)

# or save it to another (new) column if you want to save the original value
df['column_with_date_2'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)

How to save dates with UTC (tiemzone) to an Excel-File with Pandas

Saving datetime columns with timezones in pandas is not supported, and you will get the following error if you try:

ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.

To remove the timezone from a datefield (column dtype datetime64[ns, UTC]) you can use .dt.date()

# remove timezones for excel
df['column_with_date'] = df['column_with_date'].apply(lambda d: pd.to_datetime(d).date()) 

# save te file as usual
df.to_excel('filename.xlsx')

References

  • https://stackoverflow.com/a/63238008/624088
  • https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes -> format codes for datetime strings
  • try and error :)

Comments

Comments powered by Disqus