Porting the Problem
Make sure your rewrite/migration is fixing the problem, not simply moving it.
Perception
One of the teams I worked with was pouring a great deal of effort into moving slow background processing from Ruby/Sidekiq to JavaScript/Lambda, and Ruby was considered “legacy” by much of the team. The prevailing perception was that JavaScript was inherently faster than Ruby, Lambda was far superior to Sidekiq, and anything ported between them would automatically be faster.
Soon after joining the team, I made a database change that caused a Ruby unit test to fail. It was an easy fix, and I included it in the changes I submitted for code review. One of the reviewers commented that the code I fixed was obsolete and should be removed. Out of curiosity, I took another look at the Ruby code—and the JavaScript that replaced it.
I didn’t like what I found. Here’s the Ruby process in a nutshell:
- Query ~100,000 rows from table A in chunks of 1,000 using
LIMIT
andOFFSET
- Break each chunk of 1,000 into smaller chunks of 10 for insertion into table B
This didn’t happen in nested loops. Instead:
- The top-level routine read each chunk of 1,000 and queued it for level 2 using Sidekiq
- The level 2 routine broke it into chunks of 10 and queued those for level 3 using Sidekiq
- The level 3 (and final) routine inserted each chunk of 10 into table B using a single
INSERT
statement
The data was modified during the process, so it couldn’t be a direct table-to-table copy. But if you follow the convoluted process above and do the math, you’ll see:
- Table A is queried 100 times
- Table B is written to 10,000 times
- Sidekiq handles 10,101 invocations across 3 routines
As I was going to St Ives,
I met a man with seven wives,
Each wife had seven sacks,
Each sack had seven cats,
Each cat had seven kits:
Kits, cats, sacks, and wives,
How many were there going to St Ives?
The new JavaScript code did much the same thing, except:
- It substituted Lambda for Sidekiq
- It introduced a data error because the developer misunderstood a bit of Ruby logic
Reality
Ruby vs. JavaScript is out of scope for this post, and Lambda definitely had an edge here simply due to its higher concurrency.
HOWEVER, the real performance limit was the database access pattern.
Here’s a better approach:
- Create a temporary table matching table B
- Copy the 100,000 rows from table A to the temp table using
INSERT ... SELECT
- Modify the data using
UPDATE
- Insert the rows into table B using another
INSERT ... SELECT
- Drop the temporary table
The entire process only needed 5–7 queries in a single invocation. And even though the runtime of the single job was significant, it still ran ~3× faster than the JavaScript/Lambda implementation—largely because the data never left the database. As a bonus, the entire task was now atomic.
Oh, and we moved the job back to legacy Ruby/Sidekiq—because consolidating 10,101 invocations into one caused timeouts in Lambda.