How to deal with dates, dates with timezones and dates with particular formats with python and pandas (code snippets for different use cases)
Context
When dealing with dates, I’ve sometimes problems because the source is not clean, or not all the rows have the same format. Additionally, dates can be simple (year-month-day) or really complicate like a timestamp with timezone.
Here are some code snippets for several use cases.
How to read dates with python and pandas
# import pandas import pandas as pd # df is a dataframe with a column 'column_with_date' with a date like this '19.01.2023 16:45:46 +01:00' # convert date string to datetime with pd.to_datetime pd.to_datetime(df['column_with_date']) # sometimes the format is a bit weird and pandas cannot recognize it. In this case we give the date format as argument. # for this format 19.01.2023 16:45:46 +01:00 we can use: pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z') # if your string has timezone, use utr=True pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z', utc=True) # sometimes your columns are as objects (strings) and numbers (floats, ints) and to_datetime cannot process it. # You can force the type string to the whole column before giving it to to_datetime. pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True) # pd.to_datetime does not modify the column values in place, so you have to assign it to the same column. df['column_with_date'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True) # or save it to another (new) column if you want to save the original value df['column_with_date_2'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)
How to save dates with UTC (tiemzone) to an Excel-File with Pandas
Saving datetime
columns with timezones in pandas is not supported, and you will get the following error if you try:
ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.
To remove the timezone from a datefield
(column dtype
datetime64[ns, UTC]
) you can use .dt.date()
# remove timezones for excel df['column_with_date'] = df['column_with_date'].apply(lambda d: pd.to_datetime(d).date()) # save te file as usual df.to_excel('filename.xlsx')
References
- https://stackoverflow.com/a/63238008/624088
- https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes -> format codes for datetime strings
- try and error :)
Comments
Comments powered by Disqus