Counting request timeouts with TimeoutManager

https://github.com/Microsoft/TimeoutManager

Sometimes, things take longer than expected.

And that’s fine. Not everything is related to your code. Obviously, your code is perfect. Sometimes it’s just a dependency your service is using. Sometimes it’s networking issues. In any case, you probably want to give your service’s clients a consistent “timeout experience”. That is, two requests that use similar workflows should give a “timeout” response to the client after the same period of time.

There are many ways to implement such functionality, but one of my favorites is a utility class/library called “TimeoutManager” (I might be a little biased, as I implemented most of it myself).

TimeoutManager is a thread-safe component for counting request timeouts. Its API is small:

  • A method that starts counting a request’s timeout.
  • A method, TryCancelTimeout, that cancels the timeout counting of a request (usually called when processing of the request has completed or failed).
  • An event, ItemTimedOut, that fires every time a request times out. You will usually subscribe to this event from a class that knows how to send a “Timeout” response back to the original request.

The main advantage of the TimeoutManager is that you get one centralized place to manage the timeout counting of requests (or of anything else you want), and you can reach it from anywhere in your codebase through dependency injection.
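To make the usage pattern concrete, here is a minimal Python sketch of a class that receives a timeout manager through its constructor, starts and cancels timeouts, and answers “Timeout” when the event fires. Everything in it is an assumption made for illustration: FakeTimeoutManager, RequestProcessor, and all the method names are hypothetical stand-ins, not the library's actual API, and the fake fires timeouts on demand instead of counting time.

```python
class FakeTimeoutManager:
    """Hypothetical stand-in exposing the three API pieces described above."""

    def __init__(self):
        self._pending = set()
        self._handlers = []

    def start_timeout(self, item):
        # Begin tracking the item.
        self._pending.add(item)

    def try_cancel_timeout(self, item):
        # True only if the item was still being tracked.
        if item in self._pending:
            self._pending.remove(item)
            return True
        return False

    def subscribe_item_timed_out(self, handler):
        self._handlers.append(handler)

    def simulate_timeout(self, item):
        # Demo-only hook: pretend the item's timeout period has elapsed.
        if item in self._pending:
            self._pending.remove(item)
            for handler in self._handlers:
                handler(item)


class RequestProcessor:
    """Gets the manager injected and knows how to answer 'Timeout'."""

    def __init__(self, timeout_manager):
        self._timeouts = timeout_manager
        self.responses = {}
        timeout_manager.subscribe_item_timed_out(self._on_timed_out)

    def begin(self, request_id):
        self._timeouts.start_timeout(request_id)

    def complete(self, request_id, result):
        # Respond only if the timeout has not already fired for this request.
        if self._timeouts.try_cancel_timeout(request_id):
            self.responses[request_id] = result

    def _on_timed_out(self, request_id):
        self.responses[request_id] = "Timeout"


manager = FakeTimeoutManager()
processor = RequestProcessor(manager)
processor.begin("req-1")
processor.begin("req-2")
processor.complete("req-1", "OK")        # finished in time
manager.simulate_timeout("req-2")        # took too long
processor.complete("req-2", "too late")  # ignored, already timed out
print(processor.responses)  # {'req-1': 'OK', 'req-2': 'Timeout'}
```

Using the result of the cancel call to decide whether to respond is one way to make sure each request gets exactly one answer, either the real result or the “Timeout”.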

A TimeoutManager instance supports only a single timeout interval, meaning that all items counted on the same TimeoutManager time out after the same period. To support different timeout periods for different items, we implemented the MultiTimeoutManager class, which is basically a wrapper around multiple timeout managers, each working with a different timeout interval.
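The wrapper idea can be sketched in a few lines of Python. This is only my illustration of “one manager per interval”, not the real MultiTimeoutManager: the class names, the explicit `now` argument (used instead of a real clock so the example is deterministic), and the pop_expired method are all assumptions. The inner class is a deliberately naive stand-in for a single-interval manager.

```python
class SingleIntervalManager:
    """Naive stand-in for a manager with one fixed timeout (hypothetical)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._started_at = {}  # item -> time its counting started

    def start_timeout(self, item, now):
        self._started_at[item] = now

    def try_cancel_timeout(self, item):
        return self._started_at.pop(item, None) is not None

    def pop_expired(self, now):
        expired = [item for item, started in self._started_at.items()
                   if now - started >= self.timeout_seconds]
        for item in expired:
            del self._started_at[item]
        return expired


class MultiTimeoutManagerSketch:
    """Routes each item to the inner manager that owns its interval."""

    def __init__(self):
        self._managers = {}    # timeout interval -> SingleIntervalManager
        self._timeout_of = {}  # item -> interval it was registered with

    def start_timeout(self, item, timeout_seconds, now):
        manager = self._managers.get(timeout_seconds)
        if manager is None:
            manager = SingleIntervalManager(timeout_seconds)
            self._managers[timeout_seconds] = manager
        self._timeout_of[item] = timeout_seconds
        manager.start_timeout(item, now)

    def try_cancel_timeout(self, item):
        timeout_seconds = self._timeout_of.pop(item, None)
        if timeout_seconds is None:
            return False
        return self._managers[timeout_seconds].try_cancel_timeout(item)

    def pop_expired(self, now):
        expired = []
        for manager in self._managers.values():
            expired.extend(manager.pop_expired(now))
        for item in expired:
            del self._timeout_of[item]
        return expired


multi = MultiTimeoutManagerSketch()
multi.start_timeout("fast-request", timeout_seconds=2, now=0)
multi.start_timeout("slow-request", timeout_seconds=10, now=0)
first_sweep = multi.pop_expired(now=3)            # ['fast-request']
cancelled = multi.try_cancel_timeout("slow-request")  # True
second_sweep = multi.pop_expired(now=20)          # []
```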

Implementation:

Without diving into too much detail, let’s talk about the basics of the TimeoutManager implementation (feel free to look at the source code to learn more).

One of the basic data structures used here is a QueueWithRemove.

QueueWithRemove is a thread-safe data structure that offers a queue API plus the ability to remove an item from anywhere in the queue in O(1) time on average. It combines two structures: a linked list whose nodes keep the items (or requests) in order, with the oldest item in the first node, and a dictionary that maps each item to its linked list node. The Remove() function looks up the item in the dictionary, which yields the node holding the item targeted for removal, and then removes that node from the linked list and the entry from the dictionary.
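Here is a minimal Python sketch of that dictionary-plus-linked-list combination. It is my own illustration, not the library's code: the class and method names are assumptions, a single lock is the simplest possible way to get thread safety, and items must be hashable and unique.

```python
import threading


class _Node:
    __slots__ = ("item", "prev", "next")

    def __init__(self, item):
        self.item = item
        self.prev = None
        self.next = None


class QueueWithRemoveSketch:
    """FIFO queue with O(1) average removal of any item (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}   # item -> its linked list node
        self._head = None  # oldest item
        self._tail = None  # newest item

    def enqueue(self, item):
        with self._lock:
            if item in self._nodes:
                raise ValueError("item is already queued")
            node = _Node(item)
            node.prev = self._tail
            if self._tail is None:
                self._head = node
            else:
                self._tail.next = node
            self._tail = node
            self._nodes[item] = node

    def peek(self):
        # Returns the oldest item without removing it, or None if empty.
        with self._lock:
            return None if self._head is None else self._head.item

    def dequeue(self):
        # Removes and returns the oldest item, or None if empty.
        with self._lock:
            node = self._head
            if node is None:
                return None
            del self._nodes[node.item]
            self._unlink(node)
            return node.item

    def remove(self, item):
        # Dictionary lookup finds the node; unlinking it is O(1).
        with self._lock:
            node = self._nodes.pop(item, None)
            if node is None:
                return False
            self._unlink(node)
            return True

    def _unlink(self, node):
        # Caller must hold the lock.
        if node.prev is None:
            self._head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self._tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def __len__(self):
        with self._lock:
            return len(self._nodes)


queue = QueueWithRemoveSketch()
for request in ("a", "b", "c"):
    queue.enqueue(request)
removed = queue.remove("b")   # True: taken out of the middle
oldest = queue.peek()         # 'a'
```

The list has to be doubly linked for this to work: unlinking a node from the middle in constant time requires reaching both of its neighbors directly.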

Each item is kept in the QueueWithRemove together with the timestamp of when its timeout count started. Every time you ask the TimeoutManager to start counting the timeout for a new item, the item is enqueued at the end of the internal QueueWithRemove.

Items can always be added at the end of the QueueWithRemove because all the items counted in the same TimeoutManager instance share the same timeout period. That means every item already in the QueueWithRemove will time out no later than the newly added item, so the queue stays sorted by expiry time without any extra work.

The timeout manager uses an internal timer that triggers a callback function every second (customizable). The function then peeks at the first item in the QueueWithRemove and checks if the timeout period for the item has elapsed. If it has, the item is dequeued from the QueueWithRemove and the ItemTimedOut event is triggered with the timed out item. It continues to peek at the first item in the QueueWithRemove, until it is either empty, or the first item in it has not timed out.
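A single timer tick, as described above, can be sketched like this in Python. The function name, the explicit `now` argument, and the use of `collections.OrderedDict` as a stand-in for the QueueWithRemove are all my assumptions for the sake of a short, deterministic example. The sketch is also single-threaded, so it leaves out the locking the real class needs.

```python
from collections import OrderedDict


def sweep_timed_out(queue, timeout_seconds, now, on_item_timed_out):
    """Run one timer tick: dequeue and report every item whose timeout elapsed.

    `queue` maps item -> start timestamp, oldest first.
    """
    while queue:
        # Peek at the oldest item without removing it.
        item, started_at = next(iter(queue.items()))
        if now - started_at < timeout_seconds:
            # The oldest item has not timed out, so no newer item has either.
            break
        queue.popitem(last=False)  # dequeue the oldest item
        on_item_timed_out(item)


# Three requests started at t=0s, t=2s and t=4s, with a 5 second timeout.
queue = OrderedDict([("req-1", 0.0), ("req-2", 2.0), ("req-3", 4.0)])
timed_out = []

sweep_timed_out(queue, timeout_seconds=5.0, now=7.5,
                on_item_timed_out=timed_out.append)

print(timed_out)     # ['req-1', 'req-2']
print(list(queue))   # ['req-3']
```

Because the loop stops at the first item that has not expired, a tick costs time proportional to the number of items that actually timed out, not to the size of the queue.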

Whenever TryCancelTimeout is called with an item, the item is removed from the internal QueueWithRemove (if it’s there), and the call returns true if the item existed and was removed, or false otherwise.

Some of the complexity of the implementation comes from the fact that all of the above is achieved while keeping the class fully thread-safe.

I hope you will find it useful!

Nadav
