When LLM Batch Processing is Faster than Synchronous


A project I am working on for JM Addington Technology Solutions and CyberSecureRIA involves summarizing > 500,000 emails and documents with an LLM. That’s a lot!

There will be ongoing work, but the first 500,000 are a bulk process. To save money, I used OpenAI’s batch processing1. The batch endpoint costs exactly 50% of the normal model price and promises a 24-hour turnaround, but I found that in reality it’s MUCH faster than that.

First, I’ll note that there are only 1,440 minutes in a day, so processing 500k entries in one day synchronously means sustaining about 350 requests per minute, roughly 6 per second, around the clock. That’s not happening on most hardware, and it’s definitely not happening with Python. But 1 batch request per minute carrying 350 entries is easy-peasy.
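The back-of-the-envelope math works out like this (the 350-entries-per-batch figure is the one from the paragraph above):

```python
# Throughput math for summarizing 500k entries in one day.
TOTAL_ENTRIES = 500_000
MINUTES_PER_DAY = 24 * 60  # 1,440

# Synchronous: one API call per entry, spread over a single day.
per_minute = TOTAL_ENTRIES / MINUTES_PER_DAY  # ~347 requests/minute
per_second = per_minute / 60                  # ~5.8 requests/second, sustained

# Batch: one request per minute carrying 350 entries covers the same volume.
ENTRIES_PER_BATCH = 350
batches_needed = -(-TOTAL_ENTRIES // ENTRIES_PER_BATCH)  # ceiling division

print(f"{per_minute:.0f} req/min, {per_second:.1f} req/sec, {batches_needed} batches")
```

Running this prints roughly 347 requests per minute (about 5.8 per second) versus 1,429 batches of 350.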

So, Lesson 1: the raw number of entries can make batch processing faster than synchronous processing.
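Preparing a batch boils down to writing one JSON request per entry into a JSONL file and uploading it. Here’s a minimal sketch; the model name and summarization prompt are placeholders, not the ones from this project, and the upload calls are shown in comments:

```python
import json

def build_batch_lines(documents, model="gpt-4o-mini"):
    """Turn (id, text) pairs into JSONL lines for the OpenAI Batch API.

    Each line is an independent request; custom_id lets you match results
    in the output file back to the original documents.
    """
    lines = []
    for doc_id, text in documents:
        request = {
            "custom_id": str(doc_id),
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    # Placeholder prompt, not the project's actual one.
                    {"role": "system", "content": "Summarize this document in two sentences."},
                    {"role": "user", "content": text},
                ],
            },
        }
        lines.append(json.dumps(request))
    return lines

# One batch of 350 documents, ready to write to a .jsonl file.
docs = [(i, f"Email body {i}") for i in range(350)]
jsonl = "\n".join(build_batch_lines(docs))

# With the OpenAI SDK, submission would then look like:
#   f = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
#   client.batches.create(input_file_id=f.id,
#                         endpoint="/v1/chat/completions",
#                         completion_window="24h")
```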

Second, the 24-hour SLA is the worst-case scenario. In actual usage, turnaround time seems primarily related to the batch’s size: smaller batches get processed faster. This makes intuitive sense, as I assume the batch API exists in part to let OpenAI make use of otherwise dead computational time, so batch requests get slotted in between other requests.

As I write this at 20:00 Eastern, batch jobs with 100 items each are completing in 3-4 minutes.

Lesson 2: small batches get processed faster. Much faster.
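Since small batches come back in minutes, a short polling loop is all the orchestration you need. This is a generic sketch with the status lookup injected as a callable; with the OpenAI SDK that callable would be something like `lambda bid: client.batches.retrieve(bid).status`, and the terminal-state names below are assumptions based on the Batch API’s documented lifecycle:

```python
import time

def wait_for_batch(fetch_status, batch_id, poll_seconds=30, timeout_seconds=24 * 3600):
    """Poll until a batch reaches a terminal state, or raise on timeout.

    fetch_status(batch_id) should return the batch's current status string.
    Small batches often finish in minutes, so a short poll interval is fine.
    """
    terminal = {"completed", "failed", "expired", "cancelled"}
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status(batch_id)
        if status in terminal:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} not done within {timeout_seconds}s")
```

Injecting the fetcher keeps the loop testable without touching the network, and the same loop works unchanged whether a batch takes 3 minutes or the full 24 hours.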

Finally, I had some large-ish batch jobs running on Thanksgiving. Those were also completing noticeably faster than they had the day before.

Lesson 3: holidays are faster for batch jobs.

At half off, I can easily wait an extra 3-4 minutes for what is essentially background processing, and I’ll play with even smaller batches in the future.

  1. I used Azure OpenAI initially, but they don’t support batch processing on all models. Maddeningly. ↩︎
