Python datetime ate all my time
Context
We pass timestamps in many of our APIs. While writing an integration test for one of the new APIs, we were seeding data for the API call. One of the data points was a timestamp that had to be sent as a param in the request.
As a general practice, this is what we do across our API calls:
- On the caller side:
  - we convert the timestamp to isoformat() and then
  - encode it using urllib.parse.quote_plus()
- On the receiver side, we get the timestamp as a str, so we:
  - first decode it using urllib.parse.unquote_plus() and then
  - parse it using datetime.fromisoformat()
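The convention above can be sketched as two small helpers. The names encode_timestamp and decode_timestamp are hypothetical and do not come from our codebase. The sample values are taken from the REPL sessions later in this post. The sketch assumes the receiver gets exactly the string the caller produced, with nothing else decoding it along the way.

```python
import datetime
import urllib.parse


def encode_timestamp(ts: datetime.datetime) -> str:
    """Caller side (hypothetical helper): isoformat() and then quote_plus()."""
    return urllib.parse.quote_plus(ts.isoformat())


def decode_timestamp(raw: str) -> datetime.datetime:
    """Receiver side (hypothetical helper): unquote_plus() and then fromisoformat()."""
    return datetime.datetime.fromisoformat(urllib.parse.unquote_plus(raw))


naive = datetime.datetime(2025, 4, 14, 15, 42, 43, 313977)
aware = datetime.datetime(2025, 4, 14, 7, 47, 1, 641667, tzinfo=datetime.timezone.utc)

for ts in (naive, aware):
    encoded = encode_timestamp(ts)
    # Encoded once and decoded exactly once, both values round-trip unchanged.
    print(encoded, decode_timestamp(encoded) == ts)
```

Under that assumption, both a naive and a UTC-aware timestamp survive the round trip. quote_plus() turns + into %2B, and unquote_plus() turns %2B back into +.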
This looks logically correct and like the right way to do things, but there are nuances that led to us wasting time while writing integration tests. Understanding them should also make us wary of the data we actually store and send in API calls, whether as params or in the request body.
Understanding our problem
As usual, we took a timestamp in UTC, converted it to isoformat(), and then encoded it using urllib.parse.quote_plus(). We passed this as a param in our API calls. The code snippets below show exactly what we were doing while writing the tests.
On the receiver end, also as usual, we got the timestamp as a string. We decoded it and then built the datetime object from the decoded string using datetime.fromisoformat().
The problem was that the receiver always errored out on that second step with an invalid ISO format error (Invalid isoformat string).
Exploration walkthrough
The nuance most likely came from a small change in how we created the test data sent as the param. Because datetime.datetime.utcnow() is deprecated, we used datetime.datetime.now(tz=datetime.timezone.utc) instead, as the docs recommend.
Okay, so what differentiates the two? They look like they do exactly the same thing!
The datetime object created by datetime.datetime.utcnow() has no tzinfo attached. It is just a plain datetime object such as datetime.datetime(2025, 4, 14, 7, 52, 45, 307111).
If you call tzname() on that object, you won't see anything either, since tzinfo is None. To call it out explicitly, though: the time it holds is in fact UTC.
datetime.datetime.now() is similar in that it does not set tzinfo either, so its result is also naive. Unlike utcnow(), however, it returns the machine's local time, not UTC. In the sessions below, now() shows 15:42 while the UTC-aware call a few minutes later shows 07:47.
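A quick way to see the difference is to inspect tzinfo directly. This is a minimal sketch. utc_naive is built with now(timezone.utc).replace(tzinfo=None), a common non-deprecated way to get the same naive UTC value that utcnow() returns. That construction is our own illustration, not something the docs quoted here prescribe.

```python
import datetime

# Local wall-clock time of the machine; tzinfo is None (naive).
local_naive = datetime.datetime.now()

# Current UTC time with tzinfo attached (aware).
utc_aware = datetime.datetime.now(tz=datetime.timezone.utc)

# Same UTC wall-clock time that utcnow() would give, with tzinfo stripped (naive).
utc_naive = datetime.datetime.now(tz=datetime.timezone.utc).replace(tzinfo=None)

print(local_naive.tzinfo, utc_naive.tzinfo)  # None None
print(utc_aware.tzinfo, utc_aware.tzname())  # UTC UTC

# astimezone() on a naive value assumes it is local time and attaches the
# system zone, exposing the offset now() applied (e.g. 8:00:00 on a UTC+8 host).
print(local_naive.astimezone().utcoffset())
```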
However, datetime.datetime.now(tz=datetime.timezone.utc) does set tzinfo to UTC, and this makes all the difference when the datetime object is converted with isoformat(). As per the isoformat() docs:
"""
If self.tzinfo is not None, the UTC offset is also attached, giving
giving a full format of 'YYYY-MM-DD HH:MM:SS.mmmmmm+HH:MM'.
"""
So the isoformat() output is '2025-04-14T07:47:01.641667+00:00' (note the extra +00:00 at the end). quote_plus() then encodes it as '2025-04-14T07%3A47%3A01.641667%2B00%3A00', which is sent as the param.
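Let's follow this last value, the datetime with tzinfo set to UTC, through the steps on the receiver route:
- decode it using urllib.parse.unquote_plus(). This step does not fail, but it produces something like '2025-04-14T07:47:01.641667 00:00'. THE PLUS VANISHES! The unquote_plus() docstring explains why:
  """ Like unquote(), but also replace plus signs by spaces, as required for unquoting HTML form values. unquote_plus('%7e/abc+def') -> '~/abc def' """
  Strictly speaking, an encoded %2B decodes back to +. Only a literal + turns into a space. Judging by the service log further down, the timestamp already reaches our handler decoded ('...641667+00:00', with a literal +). That means our unquote_plus() call is effectively a second decode.
- Because of this extra space, datetime.fromisoformat() fails. We have found our RC (root cause)!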
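The failure can be reproduced with the standard library alone. This sketch rests on one assumption: urllib.parse.parse_qs() stands in for whatever our web framework does when it decodes query params before the handler runs. The param name timestamp matches the service log, and everything else is illustrative.

```python
import datetime
import urllib.parse

ts = datetime.datetime(2025, 4, 14, 7, 47, 1, 641667, tzinfo=datetime.timezone.utc)

# Caller: isoformat() + quote_plus(), placed straight into the query string.
query = "timestamp=" + urllib.parse.quote_plus(ts.isoformat())

# Assumption: parse_qs() plays the role of the framework, which decodes
# query params once before our handler sees them.
received = urllib.parse.parse_qs(query)["timestamp"][0]
print(received)  # 2025-04-14T07:47:01.641667+00:00 (literal '+')

# Our handler then decodes a second time, and unquote_plus() maps '+' to ' '.
decoded_again = urllib.parse.unquote_plus(received)
print(decoded_again)  # 2025-04-14T07:47:01.641667 00:00

print(datetime.datetime.fromisoformat(received) == ts)  # True
try:
    datetime.datetime.fromisoformat(decoded_again)
except ValueError as exc:
    print("ValueError:", exc)  # the same failure as in the service log
```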
Using datetime.datetime.now()
>>> now = datetime.datetime.now()
>>> now
datetime.datetime(2025, 4, 14, 15, 42, 43, 313977)
>>> now.isoformat()
'2025-04-14T15:42:43.313977'
>>> urllib.parse.quote_plus(now.isoformat())
'2025-04-14T15%3A42%3A43.313977'
Service log:
2025-04-14 15:44:54 Received request customer_id: customer123 timestamp: 2025-04-14T15:42:43.313977 lookahead_window_in_seconds: 3600
2025-04-14 15:44:54 unquoted_timestamp: 2025-04-14T15:42:43.313977 base_timestamp: 2025-04-14 15:42:43.313977
No TZ info in date time object:
>>> now = datetime.datetime.now() # works
>>> now.tzname()
>>>
# tzname() returns None, so the REPL prints nothing
Using datetime.datetime.now(tz=datetime.timezone.utc)
>>> nowt = datetime.datetime.now(tz=datetime.timezone.utc)
>>> nowt
datetime.datetime(2025, 4, 14, 7, 47, 1, 641667, tzinfo=datetime.timezone.utc)
>>> nowt.isoformat()
'2025-04-14T07:47:01.641667+00:00'
>>> urllib.parse.quote_plus(nowt.isoformat())
'2025-04-14T07%3A47%3A01.641667%2B00%3A00'
>>> urllib.parse.unquote_plus(nowt.isoformat())
'2025-04-14T07:47:01.641667 00:00' # THE TROUBLESOME GAP
Service log:
2025-04-14 15:50:25 Received request customer_id: customer123 timestamp: 2025-04-14T07:47:01.641667+00:00 lookahead_window_in_seconds: 3600
2025-04-14 15:50:25 Exception: Invalid isoformat string: '2025-04-14T07:47:01.641667 00:00'
TZ info is attached in date time object:
>>> nowt = datetime.datetime.now(tz=datetime.timezone.utc)
>>> nowt.tzname()
'UTC'
Using datetime.datetime.utcnow()
(the most commonly used option, but Python has deprecated it)
>>> nowx = datetime.datetime.utcnow()
DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
>>> nowx
datetime.datetime(2025, 4, 14, 7, 52, 45, 307111)
>>> nowx.isoformat()
'2025-04-14T07:52:45.307111'
>>> urllib.parse.quote_plus(nowx.isoformat())
'2025-04-14T07%3A52%3A45.307111'
No TZ info in date time object:
>>> nowx = datetime.datetime.utcnow() # works
>>> nowx.tzname()
>>>
# tzname() returns None, so the REPL prints nothing
Consolidating towards a solution
There are two ways to think about this:
- Having tzinfo attached to our datetime object is going to be an issue if that value needs to be sent as a param to our APIs.
- Or: should we stop using unquote_plus() to decode timestamps, since it leaves us unable to handle timezone-aware datetime strings?
I used unquote_plus() ONLY BECAUSE it was already, and always, being used in other routes that take a timestamp. So maybe the question to ask here is:
Why do we need unquote_plus() to decode the timestamp string?
- I can't think of + being used anywhere in a timestamp string other than to denote the timezone offset, which is valid ISO format.
- datetime.fromisoformat() can actually parse a string with a + or - offset. It's the SPACE that it has a problem with.
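Both bullets can be checked in a few lines. The sketch only uses the standard library. The -05:00 value is an illustrative example of the same instant, not something from our logs.

```python
import datetime
import urllib.parse

# unquote() still decodes %-escapes but leaves a literal '+' alone.
from_wire = urllib.parse.unquote("2025-04-14T07%3A47%3A01.641667%2B00%3A00")
already_decoded = urllib.parse.unquote("2025-04-14T07:47:01.641667+00:00")
print(from_wire == already_decoded == "2025-04-14T07:47:01.641667+00:00")  # True

# fromisoformat() accepts both '+' and '-' offsets; these are the same instant.
plus = datetime.datetime.fromisoformat("2025-04-14T07:47:01.641667+00:00")
minus = datetime.datetime.fromisoformat("2025-04-14T02:47:01.641667-05:00")
print(plus == minus)  # True
```

The trade-off is that unquote() no longer treats + as a space. That only matters if a caller form-encodes a value that legitimately contains spaces. A timestamp from isoformat() with the default 'T' separator never does.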
Action item:
- Check with Divyang and Vineesha, or engg, whether they have more context on why we use unquote_plus() and NOT simply unquote() when decoding timestamp strs in API requests.
How do we build resilience across tz-aware and tz-naive timestamps?
If we use unquote() instead of unquote_plus(), we effectively allow timezone-aware timestamps through. That may bring its own challenges. Somewhere in our code we may compare a tz-aware timestamp with a tz-naive one, which raises an error like this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't compare offset-naive and offset-aware datetimes
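One way to get that resilience is to normalize every parsed timestamp to an aware UTC datetime before comparing. The helper as_utc below is hypothetical, and it rests on a loud assumption: a naive datetime is treated as already being UTC. That holds for utcnow()-style values but NOT for datetime.now(), which is local time.

```python
import datetime


def as_utc(dt: datetime.datetime) -> datetime.datetime:
    """Hypothetical helper: return an aware datetime in UTC.

    ASSUMPTION: a naive datetime already holds UTC wall-clock time
    (true for utcnow()-style values, NOT for datetime.now()).
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(tzinfo=datetime.timezone.utc)
    return dt.astimezone(datetime.timezone.utc)


naive = datetime.datetime(2025, 4, 14, 7, 47, 1, 641667)
aware = datetime.datetime(2025, 4, 14, 7, 47, 1, 641667, tzinfo=datetime.timezone.utc)

try:
    _ = naive < aware  # mixing naive and aware in an ordering comparison
except TypeError as exc:
    print(exc)  # can't compare offset-naive and offset-aware datetimes

print(naive == aware)  # False: equality does not raise, it just never matches
print(as_utc(naive) == as_utc(aware))  # True once both are normalized
```

The silent False from == is arguably the more dangerous case, because nothing fails loudly. Normalizing at the point where the param is parsed avoids both problems.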