Understand MongoDB's aggregation framework

If you are a “regular” SQL or MySQL user, MongoDB’s aggregation framework may be a little obscure.

In this article, I’ll show you the key to understand it.

I’ll suppose you already have MongoDB installed on your system, and that you’re using Mongo’s shell and understand MongoDB’s language (document, collection…).

Definition of the aggregation framework

The definition of the framework, according MongoDB’s documentation:

Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.

This means that the aggregation framework may be use like MySQL’s `GROUP BY` or `DISTINCT` (but not only).

Understand the framework

The most important thing to understand, and then, to be able to build your own queries, is how MongoDB read the query.

Here an example of a simple aggregation:

db.myCollection.aggregate([
  {
    "$match":
    {
      "event_id": 11
    }
  },
  {
    "$unwind" : "$tags"
  },
  {
    "$project":
    {
      "_id": 0, "pageId": 1, "duration": 1, "date": 1
    }
  },
  { "$sort": { "date": -1 } }
]).pretty()

First of all, this aggregate is absolutely useless. It’s the equivalent of a simple `find()` query:

db.myCollection.find(
  { "event_id" : 11 },
  { "_id" : 0 }
).pretty();

Or, in MySQL:

SELECT * FROM `myCollection` WHERE `event_id` = 11;

But, it’s not the point here. For now, let’s focus on how the aggregation’s query is read by MongoDB.

This aggregate returns all documents where `event_id` equal `11`. But not only.

(Actually, you certainly noticed that my MySQL transcript of this query is not totally accurate. Indeed, as we’ll see later, this aggregate returns all the document field except `_id` field. In MySQL, it’s not very simple to do that.)

Explanation

Let’s go back to our aggregate.

The first block is a `$match`

{
  "$match":
  {
    "event_id": 11
  }
},

This is the equivalent of a `WHERE` in MySQL. The this block will return all documents where `event_id` = `11`. The aggregate could stop here and be valid; but then, it wouldn’t be very interesting.

The second block is much more interesting and usefull:

{
  "$unwind" : "$tags"
},

The `$unwind` must be use on arrays, as it unfold them. But it will unfold only the document matching precedent blocks (in this case, the `$match`). And this is the “key” to understand aggregates. It’s that each block pass its result to the next one.

Then, in our example, the next block is a `$project` where we set which field from the document we keep or remove, but only for the documents sent by precedent blocks: `$match` and `$unwind`.

It’s very important to understand that way of working. You have to think of what you’ll pass on the next block. It’s like a waterfall, if you block or derive half of the content, you’ll be able to work at the end only to the content you let through. To help you understand why an aggregate does not work as it should, you can run your aggregate step-by-step, adding one block after another and see which one break your query.

This is why you may chain a `$group` after a `$match`, then another `$group`.

Understand MongoDB’s aggregation framework

Definition of the aggregation framework

Understand the framework

Explanation

Links