I created this post because I couldn’t find any online resource that simply explained the foundational knowledge of RabbitMQ. The documentation on the RabbitMQ website is dense. This post is not an introductory tutorial on how to setup RabbitMQ with your x application, as many posts online are.
What is RabbitMQ?
It’s a message broker! What is a message broker? It takes messages from a message publisher, acts upon them, and then allows message consumers to consume them. Using a message broker allows one to decouple services so that they don’t have any awareness of eachothers implementation. This allows you to update one application freely without having to worry about breakages in another application.
What is it used for?
One should use RabbitMQ when they need to run background asyncronous tasks. What does this mean? You should run it when you want web servers to respond to requests quickly instead of being forced to perform resource-heavy procedures on the spot.
Imagine you run a web crawler SaaS. Users can request a page to be crawled. Page crawls can take up to 30 seconds. You have a Python Flask webserver that serves all traffic right now. If many users request a website crawl at once this could lock up all threads in the Flask webserver, because they’re now doing crawling work, and you would no longer be able to serve web traffic! Instead, everytime a customer requests a crawl you could send the task to a queue like RabbitMQ and process it using celery workers. This allows one separate long running asycronous tasks like crawling from time-sensitive sync tasks like serving a webpage.
How does it work?
RabbitMQ uses the AMQP 0-9-1 model, in which messages are published to exchanges (often compared to post offices or mailboxes). Exchanges then distribute message copies to queues using rules called bindings. Then the broker either deliver messages to consumers subscribed to queues, or consumers fetch/pull messages from queues on demand.
Publisher: A process that sends messages to the queue.
VHost: A namespace to segregate different applications or queues with different purposes. You can apply access controls to each vhost.
Channel: A virtual connection inside a “real” TCP connection. A channel allows consumers to open new virtual connections but re-use an existing TCP connection.
Exchange: The mail station for messages. The exchange decide which queues to route messages based on binding keys. The most common is a
direct exchange in which a message is routed in the exchange to only one queue. More complicated exchage types can be read about here.
Queues: A buffer that stores messages.
Consumer: A worker that acts upon messages in the queue.
I will briefly describe the RabbitMQ message lifecycle, from Publisher to Consumer.
Publish: A publisher send messages to RabbitMQ. Usually this is a application backend written in your language of choice that implements a RabbitMQ/AMQP library. You may setup a “publisher confirm” here so that a publisher knows for certain that a message was published to RabbitMQ – although this decreases throughput by about 250x.
Routing: The messages received are routed to a queue based on their routing tag. (e.g a message with the routing tag
mailer.sendgets send to the queue
Queues: The message has arrived in the queue. And here they wait!
Storage: Queues can have a property of
durable, so that they are saved to disk and on reboot will still exist. However, this is for queues only, if you would like your messages to persist on a reboot, they need to have the property of
persistent. Persistent messages routed to durable queues are persisted in batches or when a certain amount of time passes (fraction of a second). For transient queues, RabbitMQ will try to keep as much in memory, but under memory pressure will write them to disk.
Please note that creating persistent messages will also greatly affect performance.
Consumers: Typically with RabbitMQ people are using celery. There are two different methods for the consumer to get messages:
1) They can be delivered to consumer from RabbitMQ when the consumers are in
2) They can poll the queue for messages when they are in
The majority of RabbitMQ users select option 1 because latency for the push model is typically lower. For polling, each message get requires a request/response round-trip.
To ensure the delivery of important messages, you can make sure that a manual acknowledgement is enabled for the consumers. This means that a message will remain in queue until a consumer acklowledges that it has completed a task. If the consumer fails it the queue can redistribute the message again.
The other acknowledgement method is automatic acknowledgement, in which a message is considered completed as soon as it has been delivered over a TCP connection.
A queue mode can be set to determine the queues functionality when it receives messages.
Lazy: This will write messages to disk as soon as possible. It’s great if you have spikes of many messages and can’t store them all in memory, but it will definitely increase I/O.
Default: Messages are written to disk only under Memory pressure, otherwise they are kept in memory. This mode is ideal for performance assuming you don’t receive spikes in messages that consumers can’t keep up with. Paging a batch of messages to disk takes time and blocks the queue process, making it unable to receive new messages while it’s paging. Lazy queues are ideal here if you think you will need to page disk often.
Let’s discuss the RabbitMQ Dashboard. This is a screenshot from CloudAMQP, but yours should look the same.
Publish - The rate at which messages are getting published to RabbitMQ.
Publisher confirm: The rate of publisher confirms (RabbitMQ confirming that it has received a message).
Deliver (auto ack): When the consumer is in the mode
basic.consume and RabbitMQ ack’s once the task has been sent to TCP socket.
Deliver (manual ack): When the consumer is in the mode
basic.consume, the consumer will ack once the task has been completed.
Consumer ack: This the rate at which a consumer acks. This covers both acks in
basic.consume mode and
Redelivered: This is the rate at which messages are being redelivered to the queue because they have failed.
Get (manual ack): When the consumer is in the mode
basic.get and manual acks once a task has been completed. This is an acknowledgement that the consumer has received a message and processed it, and that RabbitMQ is free to delete it.
Get (auto ack): When the consumer is in the mode
basic.get the message is ack’d whenever a consumer pops a task from the queue. However, if the worker dies in the time from when it fetches the message to when it finishes the message work, the message will be lost.
Disk read: This is the rate that we’re reading messages from disk.
Disk write: This is the rate that we’re writing messages to disk. ```