Friday, October 02, 2015

Rate limiting Shopify API using Cuttle

During the development of a Shopify app, it is required to respect the API rate limit set by Shopify. Typically, we can use sleep() statement to make pause between API calls. This simple method works great until there are multiple processes that make API calls concurrently.

There are quite a number of ways to solve the problem.

1. Serialize all API calls into a single process, though not all business logics can work in this way.
2. Host a RPC server / use a task queue to make API calls. The RPC server / queue manager has to rate limit the API calls. [http://product.reverb.com/2015/03/07/shopify-rate-limits-sidekiq-and-you/]
3. Centralize all API calls with a HTTP proxy where the proxy performs rate limiting.

Personally, I think the RPC server / task queue option is quite heavy weighted since that requires:

* A RPC / task framework, and
* A RPC server / task queue, and
* A rate limit system built around the RPC server / task queue.

In contrast, the HTTP proxy option only requires a HTTP proxy server plus a HTTP client. And, HTTP is well supported in many programming languages and systems. It sounds as a great starting point.

(BTW, HTTP can be considered as the underlying protocol of a RPC system.)

With the HTTP proxy option, there are quite a few options to get started.

1. Use Nginx reverse proxy to wrap the API, use its limit module to perform simple rate limit or write a Lua/JS plugin for more sophisticated control. [http://codetunes.com/2011/outbound-api-rate-limits-the-nginx-way/]
2. Use Squid forward proxy to perform simple rate limit by client info (e.g. IP address).

At the first glance, the Nginx reverse proxy option looks superior since we can have sophisticated rate limit control deployed. Though, using such approach would need to use the Nginx wrapped URL of Shopify API. Or, we have to modify DNS/host configuration to route the traffic.

Personally, I am not comfortable in modifying the URL to Shopify API since that may prevent a smooth upgrade of the Shopify API client in the future. For the DNS option, shall I modify the DNS config once per a new Shopify store install the app?

(We may also route all traffic to the default virtual host of Nginx and use Lua/JS plugin for the host routing. This does not require URL wrapping or DNS configuration. Though, I personally think this is kinda abusing Nginx.)

So, reverse proxy may not be a good way to go. Let's come to the forward proxy option. In this case, we do not need to do anything on the URL to Shopify API and just let the traffic goes through the proxy by configuring the HTTP client. A forward proxy with rate limit control sounds like a good way to go.

Here, we come to Cuttle proxy. [http://github.com/mrkschan/cuttle]

Cuttle proxy is a HTTP forward proxy solely designed for outbound traffic rate limit using goroutine. It would provide a set of rate limit controls for different scenarios. In case of Shopify API, we can use the following Cuttle settings to perform rate limiting.

addr: :3128
zones:
  - host: "*.myshopify.com"
    shared: false
    control: rps
    rate: 2
  - host: "*"
    shared: true
    control: noop

Then, set the HTTP proxy of the Shopify API client like below to route API calls through Cuttle.

# apiclient.py
import shopify

shop_url = 'https://{}:{}@{}/admin'.format(API_KEY, PASSWORD, SHOPIFY_DOMAIN)
shopify.ShopifyResource.set_site(shop_url)

print json.dumps(shopify.Shop.current().to_dict())

# Run
HTTPS_PROXY=127.0.0.1:3128 python apiclient.py

As long as all API clients are configured to use Cuttle, API calls will be rate limited at 2 requests per second per Shopify store. So, the rate limit bucket would rarely go empty.

Note: It is up to you to set the rate of API calls in Cuttle, using 3 requests per second per store would be another great option. You will receive HTTP 429 sent by Shopify roughly after 120 continouos API calls to the same store over 40 seconds.

Note: API calls will be forwarded by Cuttle using the first come first serve manner. If the concurrency level of API calls to the same Shopify store is high, some API calls will wait for a significant amount of time instead of receiving HTTP 429 sent by Shopify immediately. Remember to set a reasonable HTTP timeout in that case.

(FYI, the Shopify API rate limit only favors high concurrency level for a short duration. If you really need that in your case, Cuttle would not be a good option.)
 
© 2009 Emptiness Blogging. All Rights Reserved | Powered by Blogger
Design by psdvibe | Bloggerized By LawnyDesignz