
Mastering DynamoDB: Batch Operations Explained


TL;DR: This article covers the usage of the DynamoDB BatchWriteItem and BatchGetItem operations, and how implementing them can improve efficiency by reducing the number of requests needed in your workload.

Introduction

Have you ever developed any type of workload that interacts with DynamoDB?

If so, you probably have encountered the requirement of retrieving or inserting multiple specific records, be it from a single or various DynamoDB tables.

This article aims to help with exactly that by providing the resources and knowledge required to implement DynamoDB batch operations and, as a bonus, increase the efficiency of your current workloads.

What are Batch Operations?

Introduction

When talking about batch operations, or batch processing, we refer to aggregating a set of instructions into a single request so they are executed all at once. In terms of interacting with DynamoDB, we can see it as sending a single request that allows us to retrieve or insert multiple records at once.

Common Bad Practices

Continuing with the sample situation mentioned in the introduction, you may face the requirement of having to retrieve or store multiple records at once.

Code snippet with a "for" loop iterating over "items," using "await getItem(keys)" inside the loop.

For that scenario, most junior developers might rely on looping over a set of keys and sending the GetItem requests in sequence, while a mid-level developer might propose parallelizing those requests with, for example, a Promise.all. Both approaches are flawed and won’t scale well.

On one side, the for-loop will even be flagged by some linters (with rules like no-await-in-loop), as this implementation makes the total execution time grow linearly with the number of requests.

On the other side, the Promise.all approach will be a tad more efficient by parallelizing the requests, but under heavy workloads developers can end up facing issues like the maximum connection limit reached error.
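The two naive approaches can be sketched as follows. Note that `getItem` here is a hypothetical wrapper around the DynamoDB GetItem call, mocked so the structural difference is visible:

```javascript
// Stand-in for a real DynamoDB GetItem wrapper (hypothetical, for illustration).
const getItem = async (key) => ({ id: key });

// Approach 1: sequential awaits in a loop (flagged by no-await-in-loop).
// Each request waits for the previous one, so latency grows linearly.
async function getAllSequential(keys) {
  const items = [];
  for (const key of keys) {
    items.push(await getItem(key));
  }
  return items;
}

// Approach 2: Promise.all. Faster, but every request is opened at once,
// which can exhaust the connection pool under heavy workloads.
async function getAllParallel(keys) {
  return Promise.all(keys.map((key) => getItem(key)));
}
```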

Now that we have gone over some common bad practices, and you have probably thought of a few projects that could be improved, let’s dive into how to take full advantage of batch operations.

DynamoDB offers two different batch operation types, BatchGetItem and BatchWriteItem, which we will look into as part of this article.

There is also BatchExecuteStatement for those using PartiQL, but we will leave that one for a future article to cover PartiQL in detail.

BatchGetItem

This operation type will allow us to aggregate up to the equivalent of 100 GetItem requests in a single request.

Code snippet showing a `BatchGetCommand` function for fetching items from two tables, "Table 1" and "Table 2," using primary keys (PK) and sort keys (SK).

This means that with this operation we can retrieve up to 100 records or 16 MB of data from one or multiple tables at once.
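As a minimal sketch, the input to a `BatchGetCommand` (AWS SDK v3 DocumentClient) looks like the following. The table names and key attributes below are made up for illustration, not a real schema:

```javascript
// Shape of a BatchGetCommand input: one Keys array per table,
// with at most 100 keys across the whole request.
const batchGetParams = {
  RequestItems: {
    Table1: {
      Keys: [
        { PK: "PRODUCT#1", SK: "DETAILS" },
        { PK: "PRODUCT#2", SK: "DETAILS" },
      ],
    },
    Table2: {
      Keys: [{ PK: "PRODUCT#1", SK: "STOCK" }],
    },
  },
};

// It would then be sent with something like:
// const { Responses, UnprocessedKeys } =
//   await docClient.send(new BatchGetCommand(batchGetParams));
```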

BatchWriteItem

💡
PutRequests will overwrite any existing records with the provided keys.

This operation, although its name only mentions write, allows us to aggregate up to 25 PutItem and DeleteItem operations in a single request.

Screenshot of a JavaScript code snippet for a `BatchWriteCommand`. It includes request items for two tables, "Table 1" with a `PutRequest`, and "Table 2" with a `DeleteRequest` using a primary key.

Similar to the previous option, we’ll still be limited by the 16 MB maximum, but we would theoretically be able to replace 25 sequential or parallel requests with a single one.
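A `BatchWriteCommand` input mixes PutRequest and DeleteRequest entries per table, as in this sketch (table names and keys are, again, made-up examples):

```javascript
// Shape of a BatchWriteCommand input: up to 25 Put/Delete requests
// in total across all tables in the request.
const batchWriteParams = {
  RequestItems: {
    Table1: [
      // PutRequests overwrite any existing record with the same keys.
      { PutRequest: { Item: { PK: "PRODUCT#1", SK: "DETAILS", name: "Widget" } } },
    ],
    Table2: [
      { DeleteRequest: { Key: { PK: "PRODUCT#9", SK: "STOCK" } } },
    ],
  },
};

// It would then be sent with something like:
// const { UnprocessedItems } =
//   await docClient.send(new BatchWriteCommand(batchWriteParams));
```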

Pagination for Batch operations

Pagination only applies to the 16 MB limit; requests that exceed the 100-record read or 25-record write limits will instead cause DynamoDB to throw a ValidationException.

Similar to the Scan and Query operations, any of the above Batch*Item operations can run into the scenario where the 16 MB maximum is reached and some form of pagination is required.

Screenshot of a JavaScript code snippet defining an asynchronous function named `executeRequest`. It uses a try-catch block to handle a `payload`, checking for `UnprocessedItems`. If any, it recursively calls itself with a `BatchWriteItemCommand`. Errors are logged to the console.

For Batch* operations, this comes in the form of the UnprocessedKeys (BatchGetItem) or UnprocessedItems (BatchWriteItem) attribute that can be part of the response.

Developers are expected to check for this attribute in the response and, if desired, handle it with a recursive function that processes the leftover items automatically.
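The recursive retry pattern can be sketched as follows, here for the BatchWriteItem flavour. The `send` parameter is a stand-in for `docClient.send(new BatchWriteCommand(...))`, injected so the retry logic can be shown in isolation; the depth cap is an assumption added to avoid infinite recursion:

```javascript
// Recursively resubmit whatever DynamoDB reports as unprocessed.
// UnprocessedItems comes back in the same shape as the original
// RequestItems, so it can be resubmitted as-is.
async function recursiveBatchWrite(requestItems, send, depth = 0) {
  if (depth > 8) {
    throw new Error("Too many retries for unprocessed items");
  }
  const { UnprocessedItems } = await send({ RequestItems: requestItems });
  if (UnprocessedItems && Object.keys(UnprocessedItems).length > 0) {
    await recursiveBatchWrite(UnprocessedItems, send, depth + 1);
  }
}
```

In production code you would typically also add exponential backoff between retries, since unprocessed items usually indicate throttling.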

💡
Full examples for Retrieving, Inserting, and Deleting records using BatchOperations with a recursive implementation to automatically handle the UnprocessedKeys can be found here.

Real-world Use Cases

Now that we are aware of the options and limitations for processing records in batches in DynamoDB, let’s look at some scenarios that showcase real-life improvements.

Scenario 1: Retrieving Data from Multi-table Design Architecture

For this first scenario, let’s imagine we are looking to improve the performance of a REST API that, given an array of productId, returns the list of desired product details with their respective stock and exact warehouse location. The data is stored in multiple tables, one for each data model (products, stock tracking, and warehouse product location).

Before

JavaScript code snippet that retrieves product, stock, and location data for a list of product IDs and returns them in an array.

The initial implementation was developed by having a for-loop to go over all the provided productIds and sequentially retrieve all the required data from the different tables.

After

From that initial implementation, you should be able to detect two distinct flaws:

  • no-await-in-loop - There is a loop with asynchronous operations inside, which is usually a bad practice, as each iteration must complete before the next one can start.

  • Sequential await getItem requests - This is also a bad practice, as the three operations are independent from each other and we would ideally not want them to block each other.

A better approach would look something like this:

A code snippet with four steps: 1) Checks if `idList` has more than 33 items and throws an error if true. 2) Builds a payload with `buildPayload(idList)`. 3) Awaits a recursive batch get with `recursiveBatchGet(payload)`. 4) Maps the responses to products with `mapResponse(batchGetResponses)` and returns them.

  1. Input Validation - Set a maximum number of items that can be requested, to avoid needing parallel BatchGetItem requests.
    For example - with a maximum of 100 items per BatchGetItem request, and every product requiring 3 GetItem requests, a single BatchGetItem request can retrieve up to 33 product details.

    This step could be skipped by executing BatchGetItem requests in parallel, but that brings back the chance of facing issues like the maximum connection limit reached error.
  2. Build Payloads - a helper function will be needed to programmatically build the required payload for the BatchGetItem operation, taking into consideration the different tables that need to be accessed for each product ID.

  3. Recursive BatchGetItem - a helper function that recursively calls itself to ensure that all UnprocessedKeys are retried.

  4. Response parsing - a helper function that transforms the BatchGetItem response into the schema that the consumers of this API expect.
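Step 2 can be sketched as a pure helper like the one below. The table names, key attribute, and the 33-item cap are assumptions for illustration, not the article's exact schema:

```javascript
// Hypothetical tables holding the three data models per product.
const TABLES = ["Products", "StockTracking", "WarehouseLocation"];

// Build one BatchGetItem payload that, for every productId,
// targets all three tables (3 keys per product).
function buildPayload(idList) {
  if (idList.length > 33) {
    // 33 products * 3 keys = 99 keys, just under the 100-key limit.
    throw new Error("Too many ids for a single BatchGetItem request");
  }
  const RequestItems = {};
  for (const table of TABLES) {
    RequestItems[table] = {
      Keys: idList.map((productId) => ({ productId })),
    };
  }
  return { RequestItems };
}
```

Keeping this helper free of any SDK calls makes it trivial to unit-test, while the actual `BatchGetCommand` call stays in the recursive helper from step 3.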

Applying all these changes should significantly increase the efficiency and performance of the API.

Scenario 2: Inserting Data in a Single-table Design Architecture

The second scenario would imply a DynamoDB single table design architecture where we have a single table to store all the information needed for a Dashboard to analyze racehorses’ historical data. Records such as basic horse information, performance statistics, and race results are stored in the same table.

Before

Code snippet for storing horse details, statistics, and race information using the `putItem` function in an asynchronous manner.

Similar to the first scenario, we can see that the initial implementation is based on a set of sequential PutItem requests.

After

From that initial implementation, you should be able to detect two distinct flaws:

  • no-await-in-loop - There is a loop with asynchronous operations inside, which is usually a bad practice, as each iteration must complete before the next one can start.

  • Sequential await putItem requests - This is also a bad practice, as the three operations are independent from each other and we would ideally not want them to block each other.

A better approach would look something like this:

Code snippet showing two steps: 1. Building a payload with the function `buildPayload` using parameters `horse`, `stats`, and `races`.2. Performing a recursive batch write with the function `recursiveBatchWrite`, using the payload.

  1. Build Payloads - a helper function will be needed to programmatically build the required payload for the BatchWriteItem operation, taking into consideration the different record types (horse details, statistics, and race results) that need to be stored for each horse.

  2. Recursive BatchWriteItem - a helper function that recursively calls itself to ensure that all UnprocessedKeys are retried.

    💡
    This approach would only work for uploading up to 25 records for a single horse in one request.
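Step 1 can be sketched like this for a single-table design. The table name ("RacehorseData"), key schema, and field names are made up for illustration:

```javascript
// Flatten horse details, statistics, and race results into PutRequests
// for one table, following a single-table PK/SK key pattern.
function buildPayload(horse, stats, races) {
  const putRequests = [
    { PutRequest: { Item: { PK: `HORSE#${horse.id}`, SK: "DETAILS", ...horse } } },
    { PutRequest: { Item: { PK: `HORSE#${horse.id}`, SK: "STATS", ...stats } } },
    ...races.map((race) => ({
      PutRequest: { Item: { PK: `HORSE#${horse.id}`, SK: `RACE#${race.raceId}`, ...race } },
    })),
  ];
  if (putRequests.length > 25) {
    // A single BatchWriteItem request is limited to 25 items.
    throw new Error("Too many records for a single BatchWriteItem request");
  }
  return { RequestItems: { RacehorseData: putRequests } };
}
```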

Applying all these changes should significantly reduce the required time to upload all information.

Conclusion

Utilizing batch operations in DynamoDB is a powerful strategy to optimize your database interactions. By aggregating multiple requests into a single operation, you can improve performance, reduce latency, and manage resources more effectively. Whether you're dealing with multi-table architectures or single-table designs, batch operations offer a scalable solution to handle large volumes of data efficiently. As you continue to work with DynamoDB, consider integrating batch operations into your workflows to maximize the potential of your applications.

Recap of key points

  • BatchGetItem can retrieve up to 100 records or 16 MB of data in a single request.

  • BatchWriteItem can be used to insert or delete up to 25 records or 16 MB of data in a single request.

  • Using Batch* operations can help you reduce execution time considerably by aggregating requests that were previously being made in sequence.

