Is your queue always backing up?
You upgraded Redis. You added more workers. But the jobs keep piling up, and customers aren't getting their emails.
The "More Hardware" Lie
When Sidekiq gets slow, the instinct is to throw more RAM and CPU at it. But if your jobs are fundamentally broken, scaling just means you fail faster and more expensively.
Bad Job Anatomy
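# Bad
class ExportJob
  include Sidekiq::Job

  def perform(user)
    # Passing the full object (job arguments are serialized to JSON)
    # 10k lines of processing
  end
end

# Better
class ExportJob
  include Sidekiq::Job

  def perform(user_id)
    # Pass the ID only and load the record inside the job
    # Idempotent & fast
  end
end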
Why they actually fail
Lack of Idempotency
Sidekiq *will* retry jobs. If your job isn't safe to run twice (e.g. charging a card), you're going to have a bad time.
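To make the retry problem concrete, here is a minimal sketch of one way to guard a side effect. It is plain Ruby, not Sidekiq code: `FakeGateway`, `ChargeOrderJob`, and the in-memory `Set` of processed keys are all hypothetical stand-ins. In a real app the key would more likely live in a unique database column or be passed to the payment provider as an idempotency key.

```ruby
require "set"

# Hypothetical stand-in for a payment gateway; records every charge it receives.
class FakeGateway
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charge(order_id, amount_cents)
    @charges << [order_id, amount_cents]
  end
end

# Hypothetical job class. A real one would include Sidekiq::Job and use a
# durable store instead of an in-memory Set.
class ChargeOrderJob
  def initialize(gateway, processed_keys)
    @gateway = gateway
    @processed_keys = processed_keys
  end

  def perform(order_id, amount_cents)
    key = "charge:#{order_id}"
    # Set#add? returns nil when the key is already present, so a retry stops here.
    return :skipped unless @processed_keys.add?(key)

    @gateway.charge(order_id, amount_cents)
    :charged
  end
end

gateway = FakeGateway.new
job = ChargeOrderJob.new(gateway, Set.new)
first = job.perform(42, 1999)
second = job.perform(42, 1999) # simulated retry of the same job
puts "first=#{first} second=#{second} charges=#{gateway.charges.length}"
```

Note the trade-off in this sketch: the key is recorded before the charge, so a charge that fails after the key is stored would be skipped on retry. Recording the key and the side effect atomically is the part that needs real design work.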
Memory Leaks
Sidekiq processes grow to 2GB+ and get killed by the OOM killer, dropping jobs in the middle of processing.
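One possible contributor to that growth, assuming the job loads an entire result set into memory at once, is unbatched processing. The sketch below is plain Ruby with a hypothetical `each_row` source standing in for a database cursor. It shows the general shape of working in fixed-size batches. In a Rails app, ActiveRecord's `find_each` plays a similar role.

```ruby
require "stringio"

# Hypothetical row source; yields rows one at a time instead of building an array.
def each_row(total)
  return enum_for(:each_row, total) unless block_given?

  total.times { |i| yield({ id: i, email: "user#{i}@example.com" }) }
end

# Writes rows in fixed-size batches so only one batch is held in memory at a time.
# Returns the number of batches written.
def export(rows, io, batch_size: 1000)
  batches = 0
  rows.each_slice(batch_size) do |batch|
    batch.each { |row| io.puts("#{row[:id]},#{row[:email]}") }
    batches += 1
  end
  batches
end

out = StringIO.new
batch_count = export(each_row(2500), out, batch_size: 1000)
puts "wrote #{out.string.lines.length} rows in #{batch_count} batches"
```

Batching will not fix a true leak, but it keeps peak memory tied to the batch size rather than to the size of the export.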
Connection Pool Exhaustion
Your web dynos and worker dynos are fighting for the same limited database connections.
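A quick way to see whether this is happening is to add up the connections each process can ask for. The sketch below is a rough budget, assuming every web thread and every Sidekiq thread may hold one database connection at the same time. The process counts, thread counts, and the `db_limit` of 40 are hypothetical numbers chosen for illustration.

```ruby
# Rough connection budget: one connection per thread, across all processes.
def connections_needed(web_processes:, web_threads:, worker_processes:, worker_concurrency:)
  (web_processes * web_threads) + (worker_processes * worker_concurrency)
end

needed = connections_needed(
  web_processes: 4,       # hypothetical web process count
  web_threads: 5,         # hypothetical threads per web process
  worker_processes: 3,    # hypothetical Sidekiq process count
  worker_concurrency: 10  # hypothetical Sidekiq concurrency per process
)
db_limit = 40 # hypothetical connection limit on the database plan
over_budget = needed > db_limit
puts "needed=#{needed} limit=#{db_limit} over_budget=#{over_budget}"
```

If the total exceeds the limit, the options are to lower concurrency, raise the limit, or put a connection pooler in front of the database.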
Clear the queue.
I help teams stabilize background processing and reduce infrastructure costs.