
Laravel Queue Horizontal & Vertical Scaling – Production Architecture

Shahroz Javed
Mar 30, 2026

Vertical Scaling – When More RAM/CPU Stops Helping

Vertical scaling — adding more CPU cores and RAM to a single server — is the first instinct for queue performance problems. It works, but only up to a point that most teams hit sooner than expected.

The fundamental constraint: each queue worker is a single PHP process. PHP is not multi-threaded. One worker uses one CPU core at a time. To use N cores, you need N worker processes. On a 32-core server, you need 32 concurrent workers to saturate the CPU — but those 32 workers each hold their own database connection, Redis connection, and memory allocation.

At 32 workers on one machine, your bottleneck shifts from CPU to:

  • Database connection pool exhaustion — if your jobs hit the database, 32 workers can rapidly saturate your MySQL max_connections.

  • Redis connection count — each worker holds a persistent connection. On a Redis server with maxclients set to 1000, 32 workers per machine * 10 machines = 320 connections are consumed before your web tier even connects.

  • Network I/O saturation — heavy external API calls or S3 operations are network-bound, not CPU-bound. More workers just amplify the network pressure.

  • Shared memory pressure — OPcache is shared across processes but each worker process has its own heap. 32 workers * 128MB = 4GB RAM just for worker heaps.

# Calculating the vertical scaling ceiling for your workload
#
# Formula: max_useful_workers = min(available_cores, db_max_connections / jobs_per_worker_db_connections)
#
# Example:
# Server: 16 cores, 32GB RAM
# MySQL: max_connections = 200, and 50 are used by the web tier
# Available for workers: 150 connections
# Each worker uses 1 DB connection → max 150 workers
# But 150 workers * 256MB = 38.4 GB → exceeds RAM
# RAM ceiling: 32,000 / 256 = 125 workers
# CPU ceiling: 16 workers (diminishing returns beyond this for CPU-bound jobs)
#
# Practical ceiling: ~16-20 workers on a 16-core server for mixed workloads
# Beyond this: add another server (horizontal scaling)
Vertical scaling ceiling rule: you hit diminishing returns when workers outnumber CPU cores (for CPU-bound jobs) or when shared resource contention (DB connections, Redis connections, network bandwidth) becomes the bottleneck. At that point, adding a second server is more effective than upgrading the first.
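To make the ceiling rule concrete, here is a minimal sketch of the formula from the comment block above as a plain PHP function. The function name and parameters are illustrative, not any Laravel API:

```php
<?php
// Sketch of the vertical-scaling ceiling: the smallest of the CPU,
// RAM, and DB-connection ceilings wins. Illustrative helper only.

function maxUsefulWorkers(
    int $cores,
    int $dbMaxConnections,
    int $dbConnectionsReservedForWeb,
    int $ramMb,
    int $workerHeapMb,
    bool $cpuBound = true
): int {
    $dbCeiling  = $dbMaxConnections - $dbConnectionsReservedForWeb; // 1 connection per worker
    $ramCeiling = intdiv($ramMb, $workerHeapMb);                    // heap per worker process
    $ceilings   = [$dbCeiling, $ramCeiling];
    if ($cpuBound) {
        $ceilings[] = $cores; // CPU-bound jobs see diminishing returns past core count
    }
    return max(1, min($ceilings));
}

// The 16-core / 32 GB example from above:
echo maxUsefulWorkers(16, 200, 50, 32000, 256); // 16 (CPU is the binding ceiling)
```

For I/O-bound workloads (pass `$cpuBound = false`), the same inputs give 125 workers — the RAM ceiling from the example — which is why knowing your job profile matters before buying a bigger box.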

Horizontal Scaling Across Multiple Servers

Horizontal scaling adds more servers running queue workers, all connected to a shared queue backend (Redis or SQS). Because each worker independently polls the queue, no coordination is needed between servers — they're naturally load-balanced by the queue backend itself.

The key architectural requirement: the queue backend must be external to all worker servers. A Redis instance on worker-server-1 is useless to worker-server-2. You need a separate Redis server (or cluster) that all workers connect to.

# Multi-server topology
#
# [App Server 1] ─┐
# [App Server 2] ─┤── dispatches jobs ──► [Redis Queue Server]
# [App Server 3] ─┘                               │
#                                                  │
# [Worker Server 1] (8 workers) ─┐                 │
# [Worker Server 2] (8 workers) ─┼── pops jobs ◄──┘
# [Worker Server 3] (8 workers) ─┘
#
# Total: 24 workers processing from one shared queue
# Redis handles the atomic pop — no two workers get the same job
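The "no two workers get the same job" guarantee comes from Redis's atomic pop (LPOP/BLPOP). A toy simulation in plain PHP — no Redis, just an array standing in for the shared list — shows why independent pollers need no coordination:

```php
<?php
// Toy simulation of the atomic-pop guarantee: 24 workers (as in the
// topology above) draining one shared queue, each job delivered once.

$queue   = range(1, 1000);          // 1000 queued job IDs
$workers = array_fill(0, 24, []);   // 24 workers across 3 servers

while ($queue !== []) {
    foreach ($workers as $i => $_) {
        // array_shift stands in for Redis LPOP, which is atomic:
        // two workers can never receive the same job.
        $job = array_shift($queue);
        if ($job === null) {
            break; // queue drained mid-round
        }
        $workers[$i][] = $job;
    }
}

$processed = array_merge(...$workers);
sort($processed);
// Every job processed exactly once — none duplicated, none lost.
var_dump($processed === range(1, 1000)); // bool(true)
```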

# .env on EACH worker server (identical)
QUEUE_CONNECTION=redis
REDIS_HOST=redis.internal.yourapp.com  # shared Redis, NOT localhost
REDIS_PORT=6379
REDIS_PASSWORD=your_secure_password

# Supervisor config on each worker server
# /etc/supervisor/conf.d/laravel-worker.conf
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/artisan queue:work redis \
    --queue=critical,high,default,low \
    --sleep=3 \
    --tries=3 \
    --max-time=3600 \
    --max-jobs=500
autostart=true
autorestart=true
numprocs=8                   # 8 workers per server
stopwaitsecs=300             # wait up to 5 min for current job to finish
redirect_stderr=true
stdout_logfile=/var/log/worker.log

# queue priority order: critical jobs processed before high, high before default, etc.
# All 8 workers on each server will share this priority ordering
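The priority ordering behaves like a left-to-right scan: on every poll, the worker takes the first job from the first non-empty queue in the list. A small stand-in sketch (plain arrays, not Laravel's worker internals):

```php
<?php
// Sketch of --queue=critical,high,default,low ordering: scan queues
// left to right, pop from the first non-empty one. Illustrative only.

function popNextJob(array &$queues): ?string
{
    foreach (['critical', 'high', 'default', 'low'] as $name) {
        if (!empty($queues[$name])) {
            return array_shift($queues[$name]);
        }
    }
    return null; // nothing queued — a real worker would sleep (--sleep=3)
}

$queues = [
    'critical' => ['c1'],
    'high'     => ['h1', 'h2'],
    'default'  => ['d1'],
    'low'      => ['l1'],
];

$order = [];
while (($job = popNextJob($queues)) !== null) {
    $order[] = $job;
}
echo implode(',', $order); // c1,h1,h2,d1,l1
```

Note the flip side of this ordering: a constantly full critical queue can starve low entirely, which is why heavy queues often get their own dedicated supervisor (shown later in the Horizon config).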

With Horizon managing horizontal workers, you configure supervisors per server role rather than managing Supervisor files manually. Horizon's process balancer handles distributing work across processes dynamically based on queue depth.

// config/horizon.php – multi-server configuration
'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection'      => 'redis',
            'queue'           => ['critical', 'high', 'default'],
            'balance'         => 'auto',          // dynamically shift workers to busiest queues
            'autoScalingStrategy' => 'size',      // scale based on queue depth
            'maxProcesses'    => 20,
            'minProcesses'    => 2,
            'balanceMaxShift' => 5,               // max processes to add/remove per rebalance cycle
            'balanceCooldown' => 3,               // seconds between rebalance operations
            'tries'           => 3,
            'timeout'         => 60,
            'memory'          => 256,
        ],

        // Dedicated supervisor for slow, heavy jobs
        'supervisor-heavy' => [
            'connection'   => 'redis',
            'queue'        => ['heavy-processing'],
            'balance'      => 'simple',
            'maxProcesses' => 4,
            'minProcesses' => 1,
            'timeout'      => 600,    // 10 min for video/image jobs
            'memory'       => 512,
        ],
    ],
],

Redis Cluster for Queue Backends

A single Redis instance tops out at around 100,000 operations/second and is limited to one machine's RAM. For extremely high-throughput queue systems (millions of jobs/day), Redis Cluster partitions data across multiple Redis nodes using consistent hashing.

The critical consideration for Laravel queues: Redis Cluster uses key-slot partitioning, meaning different keys may live on different nodes. Laravel's queue commands use multiple keys per queue (the list key, the delayed set, the reserved set). These keys must land on the same cluster node — which requires Redis hash tags.

// config/database.php – Redis Cluster for queues
'redis' => [
    'client' => 'phpredis',  // phpredis recommended for cluster support (predis also works)

    'clusters' => [
        'queue-cluster' => [
            [
                'host'     => env('REDIS_NODE_1_HOST', '10.0.1.1'),
                'password' => env('REDIS_PASSWORD'),
                'port'     => 6379,
                'database' => 0,
            ],
            [
                'host'     => env('REDIS_NODE_2_HOST', '10.0.1.2'),
                'password' => env('REDIS_PASSWORD'),
                'port'     => 6379,
                'database' => 0,
            ],
            [
                'host'     => env('REDIS_NODE_3_HOST', '10.0.1.3'),
                'password' => env('REDIS_PASSWORD'),
                'port'     => 6379,
                'database' => 0,
            ],
        ],
    ],

    'options' => [
        'cluster'   => 'redis',
        'prefix'    => '{queue}:',  // hash tag ensures all queue keys land on same slot
    ],
],

The hash tag {queue} in the prefix is the key detail. Redis Cluster routes based on the content inside {}. By prefixing all queue keys with {queue}:, you guarantee that {queue}:queues:default, {queue}:queues:default:delayed, and {queue}:queues:default:reserved all hash to the same slot, preventing CROSSSLOT errors when the queue worker executes Lua scripts that touch multiple keys.
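The hash-tag rule itself is simple enough to restate in code. This is a re-implementation of the slot-selection rule from the Redis Cluster spec (the part before the CRC16 hash), not anything Laravel exposes:

```php
<?php
// Redis Cluster's hash-tag rule: if the key contains a non-empty
// "{...}" section, only that substring is hashed to choose the slot;
// otherwise the whole key is hashed. Re-implementation for illustration.

function hashableKeyPart(string $key): string
{
    $open = strpos($key, '{');
    if ($open !== false) {
        $close = strpos($key, '}', $open + 1);
        if ($close !== false && $close > $open + 1) {
            return substr($key, $open + 1, $close - $open - 1);
        }
    }
    return $key; // no valid (non-empty) hash tag → whole key is hashed
}

// All three queue keys hash on the same substring "queue",
// so they land in the same slot — no CROSSSLOT errors:
echo hashableKeyPart('{queue}:queues:default'), "\n";          // queue
echo hashableKeyPart('{queue}:queues:default:delayed'), "\n";  // queue
echo hashableKeyPart('{queue}:queues:default:reserved'), "\n"; // queue
```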

// config/queue.php – pointing to the cluster connection
'redis' => [
    'driver'     => 'redis',
    'connection' => 'queue-cluster',  // the cluster connection defined above
    'queue'      => env('REDIS_QUEUE', 'default'),
    'retry_after' => 90,
    'block_for'   => 5,
],

// Verify cluster slot assignment from redis-cli:
// redis-cli -c -h 10.0.1.1 CLUSTER KEYSLOT "{queue}:queues:default"
// redis-cli -c -h 10.0.1.1 CLUSTER KEYSLOT "{queue}:queues:default:delayed"
// Both should return the same slot number
Redis Sentinel (for high availability with automatic failover) is different from Redis Cluster (for horizontal sharding). For most applications, a single Redis instance with Sentinel for failover is simpler and sufficient. Only move to Redis Cluster when a single Redis node's throughput is genuinely the bottleneck.

Handling Burst Traffic

Burst traffic is when your queue depth spikes suddenly — a marketing email campaign fires, a cron job dispatches thousands of jobs at once, or a viral event generates a flood of user-initiated tasks. The queue handles this naturally by buffering, but you need strategies to prevent the buffer from growing so large it introduces hours of lag.

Burst handling rests on two mechanisms: pre-warming workers ahead of a known burst, and auto-scaling for unexpected ones.

// Pre-warming strategy for known burst events (e.g., scheduled campaign sends)
// App\Console\Commands\PreWarmWorkersCommand.php
class PreWarmWorkersCommand extends Command
{
    protected $signature = 'workers:pre-warm {count=20}';

    public function handle(): void
    {
        $targetWorkers = (int) $this->argument('count');

        // Scale up Horizon processes via config override
        // In practice: trigger your cloud auto-scaling API or Kubernetes HPA
        $this->call('horizon:terminate');  // restarts with new config if you updated it

        // Or — dispatch a warmup signal to your orchestration layer
        Http::post(config('services.orchestrator.url') . '/scale', [
            'service'   => 'queue-workers',
            'replicas'  => $targetWorkers,
            'duration'  => 3600,  // hold this scale for 1 hour
        ]);

        $this->info("Scaling queue workers to {$targetWorkers} for burst period.");
    }
}

// Schedule it 5 minutes before the burst:
// In App\Console\Kernel.php:
$schedule->command('workers:pre-warm 30')->dailyAt('09:55');  // campaign fires at 10:00

// Rate-limiting dispatch to prevent queue overload from a single burst source
// Instead of dispatching 100,000 jobs instantly:

class DispatchBulkEmailCampaignJob implements ShouldQueue
{
    public function __construct(
        private readonly int $campaignId,
        private readonly int $chunkSize = 500,
    ) {}

    public function handle(): void
    {
        $campaign = Campaign::findOrFail($this->campaignId);

        $campaign->recipients()
            ->cursor()          // memory-efficient - yields one model at a time
            ->chunk($this->chunkSize)
            ->each(function ($chunk, $index) use ($campaign) {
                // Stagger dispatch over time to create a smooth queue fill rate
                // 500 jobs every 10 seconds = 3,000 jobs/minute steady inflow
                SendCampaignEmailBatch::dispatch($campaign->id, $chunk->pluck('id')->toArray())
                    ->delay(now()->addSeconds($index * 10));
            });
    }
}

Auto-Scaling with Horizon + Cloud

Horizon's auto balance strategy redistributes processes between queues dynamically, but it operates within a fixed pool of workers on existing servers. True auto-scaling means adding and removing entire servers (or containers) based on queue depth metrics.

// Auto-scaling trigger based on queue depth — AWS Lambda function approach
// This runs every minute via CloudWatch Events and adjusts ECS task count

// Lambda function (pseudocode for the scaling logic):
async function scaleQueueWorkers() {
    const queueDepth = await redis.llen('queues:default')
        + await redis.zcard('queues:default:delayed');

    const currentTasks = await ecs.describeService('queue-workers').desiredCount;

    // Scale formula: 1 worker per 100 queued jobs, min 2, max 50
    const targetTasks = Math.min(50, Math.max(2, Math.ceil(queueDepth / 100)));

    if (targetTasks !== currentTasks) {
        await ecs.updateService('queue-workers', { desiredCount: targetTasks });
        console.log(`Scaled workers: ${currentTasks} → ${targetTasks} (queue depth: ${queueDepth})`);
    }
}
# Kubernetes HPA (Horizontal Pod Autoscaler) configuration for queue workers
# k8s/queue-worker-hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # Scale based on a custom metric exported to Prometheus from Horizon
    - type: External
      external:
        metric:
          name: laravel_queue_depth
          selector:
            matchLabels:
              queue: default
        target:
          type: AverageValue
          averageValue: "100"   # target: 100 jobs per worker replica

// Exporting Horizon metrics to Prometheus for HPA
// App\Console\Commands\ExportHorizonMetricsCommand.php
class ExportHorizonMetricsCommand extends Command
{
    protected $signature = 'horizon:export-metrics';

    public function handle(): void
    {
        // Horizon's MetricsRepository offers throughput/wait-time metrics too;
        // here we only need raw queue depths straight from Redis
        $redis = Redis::connection('queue');

        // Queue depths per named queue
        foreach (['default', 'high', 'critical', 'heavy-processing'] as $queue) {
            $depth = $redis->llen("queues:{$queue}")
                   + $redis->zcard("queues:{$queue}:delayed");

            // Push to the Prometheus Pushgateway, which expects a plain-text
            // exposition-format body (not JSON), hence withBody()
            Http::withBody(
                'laravel_queue_depth{queue="' . $queue . '"} ' . $depth . "\n",
                'text/plain'
            )->post(config('services.prometheus.pushgateway') . '/metrics/job/horizon');
        }
    }
}

// Schedule every minute; for more responsive scaling, Laravel 10+'s
// sub-minute scheduling (e.g. everyFifteenSeconds()) can tighten the loop:
$schedule->command('horizon:export-metrics')->everyMinute()->runInBackground();

Graceful Scale-Down

Scaling down is more dangerous than scaling up. Terminating a worker that is mid-job will leave that job in a reserved state (it won't be re-queued until the retry_after timeout expires). In the worst case, you terminate a worker running a critical financial operation — the job has partially committed and re-running it could cause duplicates.

Laravel's queue:work responds to SIGTERM by completing the current job then exiting cleanly — but only if Supervisor (or your orchestrator) sends SIGTERM, not SIGKILL. Configure a stopwaitsecs long enough for your longest expected job.

# Supervisor graceful shutdown configuration
[program:laravel-worker]
command=php /var/www/artisan queue:work redis --queue=default --timeout=300
stopwaitsecs=360        # wait 6 minutes (timeout + 60s buffer)
stopsignal=TERM         # send SIGTERM (not SIGKILL) — allows graceful finish
killasgroup=true        # send SIGKILL to the whole process group if stopwaitsecs expires
stopasgroup=true

// Kubernetes graceful termination
// k8s/queue-worker-deployment.yaml (relevant section)
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 400  # must exceed the job timeout
      containers:
        - name: queue-worker
          lifecycle:
            preStop:
              exec:
                # Tell workers to finish their current job and exit
                # (with Horizon, use `php artisan horizon:terminate` instead)
                command: ["/bin/sh", "-c", "php artisan queue:restart && sleep 5"]
// For ECS — drain tasks before stopping:
// ECS Task Draining moves tasks to DRAINING state, which:
// 1. Stops accepting new work from the load balancer (not relevant for workers, but shows the pattern)
// 2. Lets the task finish current operations
// 3. Only terminates after all connections are closed

// For queue workers on ECS — intercept the SIGTERM in the job:
// App\Jobs\Concerns\HandlesGracefulShutdown.php
trait HandlesGracefulShutdown
{
    private bool $shouldStop = false;

    public function registerShutdownHandler(): void
    {
        pcntl_signal(SIGTERM, function () {
            $this->shouldStop = true;
        });
    }

    protected function checkShutdown(): void
    {
        pcntl_signal_dispatch();

        if ($this->shouldStop) {
            // Re-queue the current work item for another worker to pick up
            // (release() requires the InteractsWithQueue trait on the job)
            $this->release(0);
            throw new \RuntimeException('Worker received SIGTERM — releasing job for requeue');
        }
    }
}

// Use it in long-running batch jobs:
class ProcessLargeDatasetJob implements ShouldQueue
{
    use HandlesGracefulShutdown;

    public function handle(): void
    {
        $this->registerShutdownHandler();

        foreach ($this->getItems() as $item) {
            $this->processItem($item);
            $this->checkShutdown();  // check between items, not mid-item
        }
    }
}

The queue:restart command works by writing a timestamp to the cache. Workers check this value after each job and exit if the timestamp is newer than their start time. This means scale-down via queue:restart is always graceful — workers finish their current job before exiting, then Supervisor does not restart them if you've also reduced numprocs.
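The timestamp comparison described above is small enough to sketch directly. This is a plain-PHP stand-in for the check, not Laravel's internal implementation:

```php
<?php
// Sketch of the queue:restart mechanism: the command writes a timestamp
// to the cache; each worker compares it to its own start time after
// every job and exits gracefully if a restart was requested since boot.

function shouldWorkerExit(?int $restartTimestamp, int $workerStartedAt): bool
{
    // No restart ever requested → keep running.
    if ($restartTimestamp === null) {
        return false;
    }
    // Restart requested after this worker booted → finish job, then exit.
    return $restartTimestamp > $workerStartedAt;
}

$workerStartedAt = 1_700_000_000;
var_dump(shouldWorkerExit(null, $workerStartedAt));                  // bool(false)
var_dump(shouldWorkerExit($workerStartedAt - 60, $workerStartedAt)); // bool(false)
var_dump(shouldWorkerExit($workerStartedAt + 60, $workerStartedAt)); // bool(true)
```

Because the check only happens between jobs, the current job always runs to completion — which is exactly what makes this path safe for scale-down.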

Conclusion

Scaling queue systems requires understanding the resource model: workers are single-threaded PHP processes that hold connections and consume memory proportionally to their count. The practical scaling path for most applications follows this progression:

  • Start: Single server, 4–8 workers, Redis backend, Horizon for monitoring.

  • Growth: Add workers on the same server up to the vertical ceiling (RAM and connection limits).

  • Scale-out: Add worker servers, all connecting to a shared Redis instance. No code changes required.

  • High availability: Redis Sentinel for failover, multiple worker servers, health-check monitoring.

  • Extreme scale: Redis Cluster, containerized workers with HPA, Prometheus-driven auto-scaling.

Graceful scale-down is as important as scale-up. Never terminate workers with SIGKILL without a stopwaitsecs large enough for your longest job — and design long jobs to be interruptible and re-entrant so a forced restart doesn't leave broken state behind.
