Tuesday, April 12, 2016

Data aggregation with Spring Data MongoDB and Spring Boot

MongoDB aggregation framework is designed for grouping documents and transforming them into an aggregated result. The aggregation query consists in defining several stages that will be executed in a pipeline. If you are interested in more in-depth details about the framework, then mongodb docs are a good point to start.

The point of this post is to write a web application for querying mongodb in order to get aggregated results from the database. We will do it in a very easy way thanks to Spring Boot and Spring Data. Actually it is really fast to implement the application, since Spring Boot will take care of all the necessary setup and Spring Data will help us configure the repositories.

The source code can be found on my Github repository.


1 The application


Before going through the code let’s see what we want to do with our application.

Our domain is a collection of products we have distributed across several warehouses:

Our target is to collect all the products within a price range, grouped by warehouse and collecting the total revenue and the average price of each grouping.

In this example, our warehouses are storing the following products:

The application will query for products with a price between 5.0 and 70.0. The required aggregation pipeline steps will be as follows:



We will end up with aggregated results grouped by warehouse. Each group will contain the list of products of each warehouse, the average product price and the total revenue, which actually is the sum of the prices.


2 Maven dependencies


As you can see, we have a short pom.xml with Spring Boot dependencies:

By defining spring-boot-starter-parent as our parent pom, we set the default settings of Spring Boot. Mainly it sets the versions of a bunch of libraries it may use, like Spring or Apache Commons. For example, Spring Boot 1.3.3, which is the one we are using, sets 4.2.5.RELEASE as the Spring framework version. Like stated in previous posts, it is not adding libraries to our application, it only sets versions.

Once the parent is defined, we only need to add three dependencies:

  • spring-boot-starter-web: Mainly includes Spring MVC libraries and an embedded Tomcat server.
  • spring-boot-starter-test: Includes testing libraries like JUnit, Mockito, Hamcrest and Spring Test.
  • spring-boot-starter-data-mongodb: This dependency includes the MongoDB Java driver, and the Spring Data Mongo libraries.



3 Application setup


Thanks to Spring Boot, the application setup is as simple as the dependencies setup:

When running the main method, we will start our web application listening to the 8080 port.


4 The repository


Now that we have the application properly configured, we implement the repository. This isn’t difficult neither since Spring Data takes care of all the wiring.

The following test proves that our application is correctly set up.

We didn’t implement save and findOne methods. They are already defined since our repository is extending MongoRepository.


5 The aggregation query


Finally, we set up the application and explained all the steps. Now we can focus on the aggregation query.

Since our aggregation query is not a basic query, we need to implement a custom repository. The steps are:

Create the custom repository with the method we need:

Modify the first repository in order to also extend our custom repository:

Create an implementation to write the aggregation query:

Now we are going to implement the stages of the mongodb pipeline as explained in the beginning of the post.
Our first operation is the match operation. We will filter out all product documents that are beyond our price range:

The next stage of the pipeline is the group operation. In addition to grouping documents by warehouse, in this stage we are also doing the following calculations:

  • last: Returns the warehouse of the last document in the group.
  • addToSet: Collects all the unique product Ids of all the grouped documents, resulting in an array.
  • avg: Calculates the average of all prices in the group.
  • sum: Sums all prices in the group.


The last stage of the pipeline is the project operation. Here we specify the resulting fields of the aggregation:

The query is built as follows:

In the aggregate method, we indicate the input class, which is our Product document. The next argument is the output class, which is a DTO to store the resulting aggregation:

We should end the post with a test proving that results are what we expect:


6 Conclusion


Spring Data has a good integration with MongoDB aggregation framework. Adding Spring Boot to configure the application let's us focus on building the query. For the building process, Aggregation class has several static methods that help us implement the different pipeline stages.

I'm publishing my new posts on Google plus and Twitter. Follow me if you want to be updated with new content.