Avoid and Fix Common DynamoDB Mistakes

TL;DR
This article provides a brief introduction to DynamoDB and the key differences between SQL and No-SQL databases in order to better understand the common mistakes and how to fix them. Feel free to skip to the Key takeaways section to read the most relevant points.

Not in the mood for reading? Check out the recorded version of this article on YouTube

Introduction

Getting started with DynamoDB is fairly easy, since there are plenty of examples and good documentation out there, but it’s just as easy, if not easier, to end up using it wrong due to a lack of knowledge.

During the last few years, the vast majority of mistakes I’ve seen have been due to misunderstandings or a lack of knowledge of how DynamoDB works and the differences between NoSQL and SQL databases.

This article aims to provide a brief introduction to DynamoDB and showcase the most common mistakes during its implementation and how one could solve them.

DynamoDB 101

The key aspects of DynamoDB could be described as follows:

Managed NoSQL database by AWS: Designed to handle large volumes of data with low latency.
High availability and automatic scalability: Automatically adjusts to demand without manual intervention (when configured with the PAY_PER_REQUEST billing mode).
Consistent millisecond performance: Ideal for applications that require fast response times.
Supports flexible data models: Supports document and key-value data models, with secondary indexes for efficient queries.
Security and global replication: Provides encryption and data replication across multiple regions for greater durability and availability.

Key differences: SQL vs NoSQL

	SQL	No-SQL
Data Model	Relational - Tables with rows and columns	Documents, Key-value, …
Schema	Fixed Schema - requires definition	Flexible and Dynamic Schema
Queries	SQL	System Specific
Scalability	Vertical	Horizontal
Data Consistency	ACID (Atomicity, Consistency, Isolation, Durability)	Eventual consistency to improve availability and performance

The table above shows what I would personally consider the key differences between the two database types since, if you’re not aware of them, you’ll probably end up using one of them wrong.

For example:

Schemas - NoSQL databases don’t have a strict or predefined schema, meaning that you can store a mix of data types and structures in a single table.
Data Consistency - In order to allow for high performance and availability, NoSQL databases generally don’t provide, by default, the strong consistency guarantees we’re used to with SQL (DynamoDB reads, for instance, are eventually consistent unless a strongly consistent read is explicitly requested).

Both of these examples are key things to keep in mind when choosing which database best fits your project, as NoSQL databases, and DynamoDB in particular, might not always be the right fit for you.

Example Key Difference SQL vs DynamoDB

Apart from the differences in how they are built and how they work under the hood, it’s also important to consider how services interact with each database since, as stated above, NoSQL databases tend to rely on system-specific query languages.

For example, INSERT INTO in SQL and PutCommand in DynamoDB behave differently, and not knowing the default behaviors of the systems you interact with can lead to what seem like inexplicable bugs.

Specifically, PutCommand and UpdateCommand in DynamoDB will, by default, behave like an UPSERT command in SQL: PutCommand replaces any existing record with the same primary key, and UpdateCommand creates a new record if none exists.

This means that if developers are not careful enough, or validations are missing before the action is performed, you might end up overwriting records (losing data!) or causing downstream errors through records that lack required attributes (records wrongly created by an UpdateCommand).

Avoiding this behavior can easily be done by adding conditions to those commands.

Condition Expressions

Condition Expressions can be seen as DynamoDB’s implementation of the WHERE clause, but limited to a single record.

Introduction

Using them allows developers to specify what conditions should be met for a given action to be performed.

For example, avoiding DynamoDB’s default UPSERT behavior on PutCommand and UpdateCommand is as easy as adding a condition to the command.

Adding a ConditionExpression such as attribute_exists(#pk) to an UpdateCommand, where #pk maps to the partition key attribute, ensures that DynamoDB only updates records whose primary key already exists (i.e. it only updates existing records and fails if no record is found for the given key).
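A minimal sketch of both conditional writes, assuming the AWS SDK for JavaScript v3 document client (@aws-sdk/lib-dynamodb). To keep the example self-contained, the functions only build plain objects shaped like the UpdateCommand and PutCommand inputs; the "Orders" table, the pk key attribute and the status attribute are hypothetical.

```typescript
// Minimal sketch: plain objects shaped like the inputs of UpdateCommand and
// PutCommand from @aws-sdk/lib-dynamodb. The "Orders" table, the "pk" key
// attribute and the "status" attribute are hypothetical.

interface UpdateParams {
  TableName: string;
  Key: Record<string, unknown>;
  UpdateExpression: string;
  ConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

interface PutParams {
  TableName: string;
  Item: Record<string, unknown>;
  ConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
}

// Update only an existing item. Without the condition, UpdateCommand would
// create a new item containing just the key and "status".
function buildUpdateExistingOnly(pk: string, status: string): UpdateParams {
  return {
    TableName: "Orders",
    Key: { pk },
    UpdateExpression: "SET #status = :status",
    ConditionExpression: "attribute_exists(#pk)",
    // "status" is a DynamoDB reserved word, so it has to go through a placeholder.
    ExpressionAttributeNames: { "#pk": "pk", "#status": "status" },
    ExpressionAttributeValues: { ":status": status },
  };
}

// Create an item only if none exists for this key. Without the condition,
// PutCommand would silently replace the existing item.
function buildPutIfAbsent(item: { pk: string } & Record<string, unknown>): PutParams {
  return {
    TableName: "Orders",
    Item: item,
    ConditionExpression: "attribute_not_exists(#pk)",
    ExpressionAttributeNames: { "#pk": "pk" },
  };
}
```

Either object would then be sent with, for example, docClient.send(new UpdateCommand(params)); when the condition isn’t met, the SDK throws a ConditionalCheckFailedException instead of writing.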

💡

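Any placeholders used inside a ConditionExpression must be defined in ExpressionAttributeNames (for #name placeholders) and ExpressionAttributeValues (for :value placeholders). Values always have to be passed as placeholders, while attribute names only need one when they are reserved words or contain special characters, although many developers use placeholders for every name.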

Idempotency Checks

But the usage of Condition Expressions doesn’t stop there. They also streamline, for example, the implementation of idempotency checks.

Idempotency checks are usually used to avoid processing a single event more than once. They become especially useful in distributed or event-driven applications where, in most cases, a single event could be delivered more than once or two consumers could be processing it at the same time.

The usual implementation of this kind of check is using a database or cache to store a record for every event that has been or is currently being processed.

This means that the first steps for a consumer Lambda function would be to:

Check if the event has already been processed
Update the database stating that the given event is currently being processed

Implementing these two steps as two separate DynamoDB operations, a read followed by a write, is what we would call a bad idempotency check: there is always some time lapse between the read and the write, so we can’t ensure that two consumers won’t process the same event concurrently.

The correct way to implement an idempotency check with DynamoDB is to use a single operation. Either a Put or an Update command works, depending on whether we need to keep any attributes stored by a previously failed execution.

A PutCommand implementing this check uses a condition expression that only succeeds if:

attribute_not_exists(#pk) - This will only be true if no record is found for the given primary key, since every stored record must contain its primary key attributes. OR

attribute_exists(#pk) AND #status = :failedStatus - This condition will only succeed if a record is found but the status of the previous execution was set as FAILURE. Implementing this additional check allows us to seamlessly retry failed executions.
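A minimal sketch of that single-call check, assuming the AWS SDK for JavaScript v3 document client. The function only builds the PutCommand input; the "IdempotencyRecords" table, the attribute names and the IN_PROGRESS/FAILURE values are hypothetical. Both conditions are combined with OR, so the check and the write happen atomically in one request.

```typescript
// Minimal sketch: PutCommand input for a single-call idempotency check.
// Table name, attribute names and status values are hypothetical.

const FAILED_STATUS = "FAILURE";

function buildIdempotencyPut(eventId: string, nowIso: string) {
  return {
    TableName: "IdempotencyRecords",
    Item: { pk: eventId, status: "IN_PROGRESS", startedAt: nowIso },
    // Succeeds only if the event was never seen, or if its previous run failed.
    // DynamoDB evaluates the condition and writes in one atomic step, so two
    // concurrent consumers cannot both claim the same event.
    ConditionExpression:
      "attribute_not_exists(#pk) OR (attribute_exists(#pk) AND #status = :failedStatus)",
    ExpressionAttributeNames: { "#pk": "pk", "#status": "status" },
    ExpressionAttributeValues: { ":failedStatus": FAILED_STATUS },
  };
}

// A failed condition means another execution already processed (or is
// processing) the event. SDK v3 errors expose the exception type via `name`.
function isAlreadyProcessed(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "ConditionalCheckFailedException"
  );
}
```

A consumer would send the object with new PutCommand(...) inside a try/catch, skip the event when isAlreadyProcessed(error) is true, and rethrow any other error.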

Business logic Checks

Condition Expressions are not limited to idempotency checks; they allow us to add a condition to any Put or Update command.

Another good scenario is the backend of a marketplace, where some business logic checks (for example, that enough stock is available) must pass before a transaction is approved.

Similar to the previous scenario, developers might be tempted to implement the business logic as part of the code by making a read operation, performing any business logic, and finally making an update operation to store the final value.

This implementation would be flawed and would probably generate some hard-to-find bugs in high-traffic scenarios, as nothing stops another request from modifying the record between the read and the write operations, so a wrong stock amount could be stored.

A better approach is to use Condition Expressions and, if specific error messages are required, to set ReturnValuesOnConditionCheckFailure to ALL_OLD.

With the request configured that way, DynamoDB throws a ConditionalCheckFailedException if the ConditionExpression is not met and includes the record as it was when the condition was evaluated.

Developers can then inspect error.Item to run any additional logic and choose the appropriate error message, based on which part of the condition failed.
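A minimal sketch of the stock check, assuming the document client’s UpdateCommand input shape. The "Products" table and the pk/stock attributes are hypothetical. The message helper assumes the item attached to the exception has already been converted to a plain object; depending on the client, it may arrive in DynamoDB’s low-level format and need unmarshall from @aws-sdk/util-dynamodb first.

```typescript
// Minimal sketch: decrement stock only if the product exists and has enough units.
// The "Products" table and the "pk"/"stock" attributes are hypothetical.

function buildReserveStock(productId: string, quantity: number) {
  return {
    TableName: "Products",
    Key: { pk: productId },
    UpdateExpression: "SET #stock = #stock - :qty",
    // Check and decrement happen atomically, so concurrent orders cannot oversell.
    ConditionExpression: "attribute_exists(#pk) AND #stock >= :qty",
    ExpressionAttributeNames: { "#pk": "pk", "#stock": "stock" },
    ExpressionAttributeValues: { ":qty": quantity },
    // On failure, attach the record as it was when the condition was evaluated.
    ReturnValuesOnConditionCheckFailure: "ALL_OLD" as const,
  };
}

// Choose an error message from the record returned with the exception.
// `oldItem` is undefined when no record existed for the key.
function explainReserveFailure(
  oldItem: { stock?: number } | undefined,
  quantity: number,
): string {
  if (oldItem === undefined) return "Product not found";
  const stock = oldItem.stock ?? 0;
  if (stock < quantity) {
    return `Insufficient stock: requested ${quantity}, available ${stock}`;
  }
  // Defensive fallback for a failure not explained by the checks above.
  return "Condition check failed";
}
```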

💡

Do you want to learn more about DynamoDB Condition Expressions? Feel free to head over to this article covering them in more detail.

Retrieving Data

As with Put and Update operations, NoSQL databases also behave differently when it comes to how developers are expected to retrieve data.

Limitations

The best-known limitation is that NoSQL databases perform poorly with access patterns and queries over attributes that are not part of the keys.

Apart from that, Scan and Query operations can retrieve at most 1 MB of data per request, acting somewhat like an implicit LIMIT statement in SQL.

This becomes a limitation whenever you need to retrieve more data and, especially, when you add a filter condition (FilterExpression) to your request: DynamoDB reads up to 1 MB before applying the filter, so it could, for example, return an empty response even though matching records exist in the table.

Pagination

To overcome that 1 MB limitation, developers are expected to implement pagination.

Implementing it is as easy as checking for the LastEvaluatedKey attribute in the Scan or Query response and passing it as the ExclusiveStartKey in the next request.

If this is developed as a recursive function that keeps requesting pages until no LastEvaluatedKey is returned, developers will ensure that all the records in a table, or all records that match a specific query, are found and returned.
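A minimal sketch of that recursive pagination, kept independent of the SDK by taking a fetchPage callback. fetchPage is hypothetical: in practice it would wrap something like docClient.send(new QueryCommand({ ...params, ExclusiveStartKey })). The SDK also ships ready-made paginators, such as paginateQuery in @aws-sdk/client-dynamodb.

```typescript
// Minimal sketch: collect every page of a Query or Scan by following LastEvaluatedKey.
// `fetchPage` is a hypothetical callback wrapping the actual SDK call.

type Key = Record<string, unknown>;

interface Page<T> {
  Items?: T[];
  LastEvaluatedKey?: Key;
}

// Recursively request pages, passing each LastEvaluatedKey as the next
// ExclusiveStartKey, until DynamoDB stops returning one.
async function fetchAllPages<T>(
  fetchPage: (exclusiveStartKey?: Key) => Promise<Page<T>>,
  exclusiveStartKey?: Key,
  accumulated: T[] = [],
): Promise<T[]> {
  const page = await fetchPage(exclusiveStartKey);
  // A page can be empty (e.g. when a FilterExpression discarded everything read)
  // while still carrying a LastEvaluatedKey, so keep going regardless.
  accumulated.push(...(page.Items ?? []));
  if (page.LastEvaluatedKey === undefined) {
    return accumulated;
  }
  return fetchAllPages(fetchPage, page.LastEvaluatedKey, accumulated);
}
```

Recursion matches the approach described above and keeps the awaits out of a loop; since each request needs the previous page’s key, the pages are fetched one after another either way.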

Batch Operations

Put, Update, Query and Scan operations are the most well-known operations, but there are also scenarios where you need to write or read multiple specific records at once.

Limitations

For those scenarios, people usually think of implementing a loop or sending multiple requests in parallel, which is usually not the best approach.

On one side, awaiting requests inside a loop is usually already flagged as a bad practice by linters, for example by ESLint’s no-await-in-loop rule.

This is due to the poor efficiency of running all those requests in sequence: the overall execution time of the function grows linearly with the number of requests.

On the other side, developers might think that retrieving the data in parallel with Promise.all or Promise.allSettled is a good approach, but this doesn’t scale well either and can be difficult to debug, as developers could face a "maximum connection limit reached" error.

The correct implementation would be to take advantage of the available Batch* operations of DynamoDB.

There are two different batch operations that can be used with DynamoDB:

BatchGetItem - Operation that allows us to retrieve up to 100 records (and at most 16 MB of data) from one or more tables in a single request
BatchWriteItem - As the name implies, this operation allows us to write (PutRequest) but also to delete (DeleteRequest) up to 25 records (and at most 16 MB of data) in one or more tables.

These operations are especially useful for aggregating multiple GetItem or Put/DeleteItem requests into a single call to DynamoDB.
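A minimal sketch of respecting the BatchGetItem per-request limit by splitting the keys into chunks of 100, assuming the document client’s BatchGetCommand input shape ({ RequestItems: { [table]: { Keys } } }). The table name and key shape are hypothetical, and partial results still have to be handled separately through UnprocessedKeys.

```typescript
// Minimal sketch: split keys into BatchGetItem requests of at most 100 keys,
// since a request above that limit is rejected as a whole.
// The table name and the { pk } key shape are hypothetical.

type Key = Record<string, unknown>;

const MAX_BATCH_GET_KEYS = 100;

function chunk<T>(values: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}

// Each returned object has the BatchGetCommand input shape:
// { RequestItems: { [tableName]: { Keys: [...] } } }
function buildBatchGetRequests(tableName: string, keys: Key[]) {
  return chunk(keys, MAX_BATCH_GET_KEYS).map((keysChunk) => ({
    RequestItems: { [tableName]: { Keys: keysChunk } },
  }));
}
```

Each request object could then be sent with docClient.send(new BatchGetCommand(request)); the same chunking applies to BatchWriteItem with a chunk size of 25.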

Unprocessed Items

Similar to the Query and Scan operations, and because of the limits on Batch* operations, developers should expect some requests to fail, either partially or entirely.

Any request that exceeds the 100-record read or 25-record write limit will fail entirely, throwing an error without making any modifications to the DynamoDB tables.

Partial failures are trickier. DynamoDB will do its best to process the batch (for example, up to 16 MB of data for a single BatchGetItem response) and returns anything it couldn’t process, whether because of that size limit or because of throttling, as part of the response: UnprocessedKeys for BatchGetItem and UnprocessedItems for BatchWriteItem.

Developers should always consider this when using these types of operations and implement a recursive function that retries any unprocessed keys or items found in the response, ideally with exponential backoff.
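A minimal sketch of a recursive retry for BatchWriteItem with exponential backoff. sendBatch is a hypothetical callback that, in practice, would wrap docClient.send(new BatchWriteCommand({ RequestItems })); the attempt limit and delays are illustrative assumptions. The same pattern applies to BatchGetItem, using UnprocessedKeys instead.

```typescript
// Minimal sketch: resend UnprocessedItems from BatchWriteItem with exponential backoff.
// `sendBatch` is a hypothetical callback wrapping the actual SDK call.

type WriteRequest = Record<string, unknown>; // { PutRequest: {...} } or { DeleteRequest: {...} }
type RequestItems = Record<string, WriteRequest[]>;

interface BatchWriteResult {
  UnprocessedItems?: RequestItems;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

function countRequests(items: RequestItems): number {
  return Object.values(items).reduce((total, requests) => total + requests.length, 0);
}

async function batchWriteWithRetry(
  sendBatch: (requestItems: RequestItems) => Promise<BatchWriteResult>,
  requestItems: RequestItems,
  attempt = 0,
  maxAttempts = 5,
  baseDelayMs = 50,
): Promise<void> {
  const result = await sendBatch(requestItems);
  const unprocessed = result.UnprocessedItems ?? {};
  if (countRequests(unprocessed) === 0) {
    return; // every put/delete request was applied
  }
  if (attempt + 1 >= maxAttempts) {
    throw new Error(`Unprocessed items left after ${maxAttempts} attempts`);
  }
  // Wait longer after each failed attempt (50 ms, 100 ms, 200 ms, ... by default),
  // then resend only the leftovers.
  await sleep(baseDelayMs * 2 ** attempt);
  return batchWriteWithRetry(sendBatch, unprocessed, attempt + 1, maxAttempts, baseDelayMs);
}
```

Giving up after a bounded number of attempts surfaces persistent throttling as an error instead of retrying forever.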

💡

Do you want to learn more about DynamoDB Batch Operations? Feel free to head over to this article covering them in more detail.

Conclusions

DynamoDB is a powerful and versatile NoSQL database that offers unique advantages for various workloads. However, to fully leverage its capabilities, developers must understand its specific behaviors and limitations. By mastering concepts such as condition expressions, pagination, and batch operations, developers can create more efficient, consistent, and scalable applications.

Developers should consider the following key points when implementing any workload that relies on or interacts with DynamoDB.

Key takeaways

Put and Update act like UPSERTs: Using the PutItem or UpdateItem commands without adding any condition behaves like an UPSERT in SQL.
Reducing calls with CONDITION EXPRESSIONS: Adding condition expressions allows us to perform our logic in a single call while keeping the data consistent.
Query and Scan require paging logic: Query and Scan operations return at most 1 MB per page, so pagination is required to retrieve more information.
Use Batch operations to reduce execution time: A Batch* operation is more efficient than multiple individual operations in sequence or in parallel, and can be used to aggregate operations on different tables into a single request.
Use BatchGet to retrieve higher volumes of information: The BatchGetItem operation allows you to retrieve up to 100 records or 16 MB, compared to 1 MB for Query or Scan operations.
BatchGet and BatchWrite require retry logic: With Batch* operations it is essential to apply retry logic on UnprocessedKeys (BatchGetItem) and UnprocessedItems (BatchWriteItem).
