
Laravel Queue Dashboard Best Practices – Grafana, Prometheus & Custom Metrics

Shahroz Javed
Mar 25, 2026

What Metrics Actually Matter

Most teams monitor queues reactively — they look at the dashboard after something breaks. The goal of a proper queue dashboard is proactive visibility: catch problems before they impact users. These are the five metrics worth tracking:

  • Queue Depth (Pending Jobs) — jobs waiting to be processed. Sustained growth = your workers can't keep up.

  • Throughput — jobs processed per minute. Drops indicate worker crashes or slow jobs blocking the queue.

  • Wait Time (Latency) — time from dispatch to execution start. High latency means underpowered workers for the load.

  • Failure Rate — percentage of jobs that fail. Spike = a code bug, dependency down, or config change.

  • Job Duration — average processing time per job class. Growing duration = performance regression or resource exhaustion.
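Before building any dashboard, note that Laravel ships a simple threshold check for the first metric out of the box: the `queue:monitor` Artisan command. A minimal sketch — the queue names and the `--max` threshold here are examples, not recommendations:

```shell
# Warn when either queue exceeds 100 pending jobs
# (queue names and threshold are placeholders — use your own)
php artisan queue:monitor redis:default,redis:emails --max=100
```

When a queue crosses the threshold, Laravel dispatches an `Illuminate\Queue\Events\QueueBusy` event you can listen for to send a notification. Schedule the command every minute for a zero-infrastructure early-warning system.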

Horizon Built-in Metrics

If you're on Redis + Horizon, you already have a built-in metrics system. Horizon's metrics are stored in Redis and displayed in the /horizon dashboard.

Take Regular Snapshots

Horizon needs a scheduled snapshot command to record metrics over time. Without it, the Metrics tab shows no historical data:

// app/Console/Kernel.php
$schedule->command('horizon:snapshot')->everyFiveMinutes();
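On Laravel 11 and later there is no console Kernel; the same schedule lives in routes/console.php instead:

```php
// routes/console.php (Laravel 11+)
use Illuminate\Support\Facades\Schedule;

Schedule::command('horizon:snapshot')->everyFiveMinutes();
```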

Horizon Metrics API

Horizon exposes its metrics via an internal API you can query programmatically:

use Laravel\Horizon\Contracts\MetricsRepository;

$metrics = app(MetricsRepository::class);

// Get throughput for a job class (jobs per minute)
$throughput = $metrics->throughputForJob(App\Jobs\SendEmail::class);

// Get average runtime in milliseconds
$runtime = $metrics->runtimeForJob(App\Jobs\SendEmail::class);

// Get queue throughput (jobs per minute)
$queueThroughput = $metrics->throughputForQueue('emails');

// Wait time comes from the WaitTimeCalculator, keyed as "connection:queue"
$waitTime = app(\Laravel\Horizon\WaitTimeCalculator::class)
    ->calculateFor('redis:emails');  // seconds
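Because these are plain container-resolved services, you can expose a subset of them over HTTP for an external status page or health check. A minimal sketch — the route path and `auth` middleware are placeholders, and `runtimeForQueue` returns the average runtime Horizon has recorded for that queue:

```php
// routes/web.php — hypothetical internal health endpoint
use Laravel\Horizon\Contracts\MetricsRepository;

Route::get('/internal/queue-health', function (MetricsRepository $metrics) {
    return response()->json([
        'emails_throughput' => $metrics->throughputForQueue('emails'),
        'emails_runtime_ms' => $metrics->runtimeForQueue('emails'),
    ]);
})->middleware('auth');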

Exporting Custom Metrics

For teams that need more than Horizon provides — metric retention beyond Horizon's short window, cross-service dashboards, or custom business metrics — export queue data to a time-series database.

Custom Metrics Collector (Scheduled Command)

// app/Console/Commands/CollectQueueMetrics.php
// Note: the jobs / failed_jobs queries below assume the database queue driver.
use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class CollectQueueMetrics extends Command
{
    protected $signature   = 'metrics:queue-collect';
    protected $description = 'Collect queue metrics and push to time-series DB';

    public function handle(): void
    {
        $queues = ['default', 'emails', 'payments', 'notifications'];

        foreach ($queues as $queue) {
            // Pending job count
            $pending = DB::table('jobs')
                ->where('queue', $queue)
                ->count();

            // Jobs reserved (currently being processed)
            $reserved = DB::table('jobs')
                ->where('queue', $queue)
                ->whereNotNull('reserved_at')
                ->count();

            // Failed in last 5 minutes
            $failed = DB::table('failed_jobs')
                ->where('queue', $queue)
                ->where('failed_at', '>=', now()->subMinutes(5))
                ->count();

            // Push to your metrics system
            Metrics::gauge("queue.pending.{$queue}", $pending);
            Metrics::gauge("queue.reserved.{$queue}", $reserved);
            Metrics::counter("queue.failed.{$queue}", $failed);
        }
    }
}

// Schedule every minute
$schedule->command('metrics:queue-collect')->everyMinute()->withoutOverlapping();
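The collector above reads the `jobs` table, which only exists with the database queue driver. On the Redis driver, pending jobs live in a Redis list and delayed/reserved jobs in sorted sets. A sketch of the equivalent counts — the key names are Laravel's defaults, so adjust them if you use a custom Redis prefix:

```php
use Illuminate\Support\Facades\Redis;

// Redis queue driver: pending jobs are a list; delayed and reserved
// jobs are sorted sets under the same key prefix.
$pending  = Redis::connection()->llen("queues:{$queue}");
$delayed  = Redis::connection()->zcard("queues:{$queue}:delayed");
$reserved = Redis::connection()->zcard("queues:{$queue}:reserved");
```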

Track Per-Job Duration in Middleware

// app/Jobs/Middleware/RecordMetrics.php
class RecordMetrics
{
    public function handle(object $job, \Closure $next): void
    {
        $jobClass = class_basename($job);
        $start    = microtime(true);
        $status   = 'success';

        try {
            $next($job);
        } catch (\Throwable $e) {
            $status = 'failure';
            throw $e;
        } finally {
            $duration = (microtime(true) - $start) * 1000;  // ms

            // Push to StatsD, Prometheus pushgateway, Datadog, etc.
            Metrics::histogram('queue.job.duration', $duration, [
                'job'    => $jobClass,
                'queue'  => $job->queue ?? 'default',
                'status' => $status,
            ]);
        }
    }
}
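Job middleware is opt-in per job class: return it from a `middleware()` method on any job you want measured.

```php
// app/Jobs/SendEmail.php — attach the metrics middleware to this job
public function middleware(): array
{
    return [new \App\Jobs\Middleware\RecordMetrics];
}
```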

Prometheus + Grafana Setup

For production-grade observability, combine Prometheus (metrics storage) with Grafana (visualization). Laravel metrics flow into Prometheus via a scrape endpoint or Pushgateway.

Option 1: Prometheus Pushgateway (Simplest)

composer require promphp/prometheus_client_php
composer require promphp/prometheus_push_gateway_php  # provides the PushGateway class used below

// In your metrics collector command:
use Prometheus\CollectorRegistry;
use Prometheus\Storage\Redis as PrometheusRedis;

$registry = new CollectorRegistry(new PrometheusRedis(['host' => 'redis']));

$pendingGauge = $registry->getOrRegisterGauge(
    'laravel',
    'queue_pending_jobs',
    'Number of pending jobs per queue',
    ['queue']
);
$pendingGauge->set($pending, [$queue]);

// Push to Prometheus Pushgateway
$pushGateway = new \PrometheusPushGateway\PushGateway('pushgateway:9091');
$pushGateway->pushAdd($registry, 'laravel_queue', ['instance' => gethostname()]);

Option 2: Scrape Endpoint (More Robust)

// routes/web.php — Prometheus scrape endpoint
Route::get('/metrics', function () {
    $registry = app(CollectorRegistry::class);
    $renderer = new \Prometheus\RenderTextFormat();

    return response($renderer->render($registry->getMetricFamilySamples()), 200, [
        'Content-Type' => \Prometheus\RenderTextFormat::MIME_TYPE,
    ]);
})->middleware('auth.prometheus');  // IP whitelist for Prometheus scraper only
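The `auth.prometheus` middleware above is not a Laravel built-in — it is a placeholder for whatever protects the endpoint. A minimal IP-allowlist sketch, assuming you register it under that alias (in bootstrap/app.php on Laravel 11, or app/Http/Kernel.php earlier), with placeholder addresses:

```php
// app/Http/Middleware/PrometheusOnly.php — hypothetical allowlist middleware
class PrometheusOnly
{
    public function handle($request, \Closure $next)
    {
        // Allow only the Prometheus scraper's IP (placeholder addresses)
        if (! in_array($request->ip(), ['10.0.0.5', '127.0.0.1'], true)) {
            abort(403);
        }

        return $next($request);
    }
}
```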

Essential Grafana Dashboard Panels

Here are the panels every Laravel queue Grafana dashboard should have, with their Prometheus queries:

Panel 1: Queue Depth by Queue (Stat/Gauge)

# PromQL
laravel_queue_pending_jobs{queue="emails"}
laravel_queue_pending_jobs{queue="payments"}

# Alert threshold: > 500 pending = warning, > 2000 = critical

Panel 2: Job Throughput (Time Series)

# Jobs processed per minute, grouped by job class
rate(laravel_queue_job_duration_count[5m]) * 60

# Color code: green > 100/min, yellow > 50/min, red < 10/min

Panel 3: Job Duration Percentiles (Heatmap)

# P50, P95, P99 job duration, aggregated across job/queue/status label series
histogram_quantile(0.50, sum by (le) (rate(laravel_queue_job_duration_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(laravel_queue_job_duration_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(laravel_queue_job_duration_bucket[5m])))

# P99 > 5s = investigate slow jobs
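These quantile queries only work if the duration metric is registered as a Prometheus histogram with explicit buckets. With the promphp client from the setup above, that looks roughly like this — the bucket boundaries (in ms) are illustrative, and `$duration`, `$jobClass`, `$queueName`, `$status` are the values produced by the middleware:

```php
// Register once; bucket boundaries in ms — tune to your real job durations
$histogram = $registry->getOrRegisterHistogram(
    'laravel',
    'queue_job_duration',
    'Job duration in milliseconds',
    ['job', 'queue', 'status'],
    [50, 100, 250, 500, 1000, 2500, 5000, 10000]
);

$histogram->observe($duration, [$jobClass, $queueName, $status]);
```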

Panel 4: Failure Rate (Stat)

# Failure rate as a percentage, summed across label series
  sum(rate(laravel_queue_job_duration_count{status="failure"}[5m]))
/
  sum(rate(laravel_queue_job_duration_count[5m]))
* 100

# Alert: > 5% failure rate = warning, > 20% = critical

Panel 5: Failed Jobs Table (Table)

# Not from Prometheus — pull from database in a separate panel
# Use Grafana MySQL/PostgreSQL datasource:
SELECT
  queue,
  SUBSTRING_INDEX(exception, '\n', 1) AS error,
  COUNT(*) AS count,
  DATE_FORMAT(MAX(failed_at), '%H:%i') AS last_seen
FROM failed_jobs
WHERE failed_at >= NOW() - INTERVAL 1 HOUR
GROUP BY queue, error
ORDER BY count DESC
LIMIT 20;

Intelligent Alerting Rules

Good alerts fire when something needs human attention. Bad alerts fire on every blip and cause alert fatigue — the team starts ignoring them. Here are battle-tested alert rules:

Rule 1: Sustained Queue Backlog

# Grafana Alert Rule — fires only if condition is true for 10+ minutes
# This prevents alerts from transient bursts that clear quickly
WHEN avg() OF query(laravel_queue_pending_jobs{queue="payments"}, 10m) > 200
FOR 10m
SEVERITY: critical
MESSAGE: "Payments queue backlog: {{ value }} jobs pending for 10+ minutes"
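The same rule in native Prometheus alerting syntax, if you run alerts through Alertmanager rather than Grafana — the group and alert names here are examples:

```yaml
# prometheus/alerts.yml — sustained-backlog rule in Prometheus syntax
groups:
  - name: laravel-queues
    rules:
      - alert: PaymentsQueueBacklog
        expr: laravel_queue_pending_jobs{queue="payments"} > 200
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Payments queue backlog: {{ $value }} jobs pending for 10+ minutes"
```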

Rule 2: Failure Rate Spike

# Alert when failure rate > 10% for 5 minutes
WHEN avg() OF (failure_rate_query) > 10
FOR 5m
SEVERITY: warning
MESSAGE: "Queue failure rate {{ value }}% on {{ queue }}"

Rule 3: Worker Throughput Drop

# Alert when throughput drops > 80% below 1-hour average
# This catches worker crashes even when the queue depth is still low
WHEN (current_throughput / avg_throughput_1h) < 0.2
FOR 3m
SEVERITY: critical
MESSAGE: "Queue throughput dropped to {{ value }} jobs/min — workers may be down"
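One way to approximate "current throughput vs. the 1-hour average" with the histogram counter from earlier — a sketch, since the exact baseline window is a judgment call:

```
# 5-minute throughput as a fraction of the trailing 1-hour rate;
# fires when the current rate falls below 20% of the hourly baseline
  sum(rate(laravel_queue_job_duration_count[5m]))
/
  sum(rate(laravel_queue_job_duration_count[1h]))
< 0.2
```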

Rule 4: Job Duration Regression

# Alert when P95 job duration is 3x higher than the daily average
# Catches memory leaks, slow queries, and dependency degradation
WHEN p95_duration > (daily_avg_p95 * 3)
FOR 5m
SEVERITY: warning
MESSAGE: "Job {{ job_class }} P95 duration {{ value }}ms — 3x above normal"

Conclusion

A great queue dashboard tells you the current health, recent trends, and failure details without requiring manual investigation. Build towards these goals:

  • Use Horizon's built-in metrics for Redis queues — start here before investing in Prometheus

  • Schedule horizon:snapshot every 5 minutes so historical data is available in the Metrics tab

  • Export metrics to Prometheus + Grafana for long-term retention and cross-service dashboards

  • Build 5 core panels: queue depth, throughput, duration percentiles, failure rate, failed job table

  • Write alerts with a "FOR N minutes" condition — never alert on transient blips

  • The throughput-drop alert is your most important one — it catches total worker failure even before the queue starts backing up
