Thursday, November 28, 2024
Google search engine
HomeData Modelling & AIAzure Table Storage Tips for the RDBMS Developer

Azure Table Storage Tips for the RDBMS Developer

Suppose you only have relational database management system (RDBMS) experience and are new to Azure Table storage. It’s common to find yourself “thinking in SQL” and trying to solve database modeling requirements with a SQL approach before translating them to a key-value mindset.

In this tutorial, I’ll cover some fundamentals of Azure Table storage to get you more familiar with this type of storage service, including property types and table limits, with your RDBMS background in mind (but I welcome IT pros from all backgrounds to use this guide and learn something new).

Then, I’ll dive into common questions you might find yourself asking about how to use Azure Table storage and provide some best practices with several query examples based on potential real-world scenarios you might experience on the job.

Where code samples or references are applicable in this blog post, I’ll be focusing on .NET and using the Azure SDK (specifically relating to the Microsoft.Azure.Cosmos.Table NuGet package).

If you have a RDBMS background and are new to Azure Table Storage, it’s common to find yourself “thinking in SQL” and trying to solve database modeling requirements with a SQL approach before then trying to translate that to a key-value mindset. In this blog post, I’ll cover some of the fundamentals of Azure Table Storage and dive into some common questions you might find yourself asking about Azure Table Storage. Where code samples or references are applicable in this blog post, we’ll be focusing on .NET and using the Azure SDK (specifically relating to the Microsoft.Azure.Cosmos.Table nuget package).

 

You can use the table of contents below to more easily jump to the specific area of the guide you’re interested in:

Azure Table Storage Explained

Properties and Limits

Indexes

Transactions

High Availability / Data Redundancy Comparison Chart

Pricing

Azure Table Storage FAQ

What kind of NoSQL store is Azure Table storage?

How do I create custom indexes on an Azure Table storage table?

How do I optimize querying by different properties if I can’t have secondary indexes?

Azure Table storage query example: How to search for an employee by employee id or domain username using PartitionKey and RowKey

Can I query by properties other than the PartitionKey and RowKey?

How can I insert/update multiple entities in an atomic operation?

How do I store entities with different schema in the same table?

How can I perform a LIKE query for values starting with a given value?

Azure Table storage query example: How to retrieve a distinct list of employees using Range Query

How do I enforce unique constraints?

How can I include special characters in the PartitionKey or RowKey?

How do I perform case-insensitive queries?

How can I perform an “ORDER BY Date DESC” query?

Azure Table storage query example: How to look up orders in both ascending and descending order of OrderDate

How do I limit the fields I pull back in a query (SELECT <ColumnList>)?

Azure Table storage vs. CosmosDB

 How to Start Using Azure Table Storage + Additional Resources Recommendation

 

Azure Table Storage Explained

First, let’s cover some of the fundamental constructs and features of Azure Table storage, including the main limitations and differences versus what you might be used to as an RDBMS developer.

Azure Table Storage Properties and Size Limits

  • An individual table can contain a collection of entities with different properties.
  • A table can contain up to 255 properties.
  • Each property can be up to 64KB in size.
  • Each entity can be up to 1MB in size.
  • You can have as many tables as you want, up to the storage capacity of an Azure Storage Account (500TB).
  • Every entity has a PartitionKey, RowKey, and a Timestamp. The Timestamp property is maintained server-side as the time the entity was last modified and is used to provide optimistic concurrency, so you cannot directly modify it.
  • PartitionKey and RowKey are both strings and can each be up to 1KB in size, and the combination of both must be unique.

Azure Table Storage Indexes

  • Only the PartitionKey and RowKey properties are indexed.
  • Secondary indexes on other properties cannot be created.
  • These values are indexed in ascending order.
  • Having a solid understanding of your query requirements is important, as there are strategies for dealing with the apparent limitation on indexes, which I’ll cover in some common scenarios below.

Azure Table Storage Transactions

  • Atomic updates can be performed as long as the entities are all within the same table partition (same PartitionKey).
  • You cannot perform cross-table or cross-partition transactions.
  • A maximum of 100 operations can be performed within the same atomic operation.
  • The total payload of the batch must be no more than 4MB.

Azure Table Storage High Availability / Data Redundancy Comparison Chart

Use the comparison chart below to choose the level of replication you may need for Azure Table storage:

Locally-Redundant Storage (LRS)

Read-Access Geo-Zone-Redundant Storage

(RA-GZRS)

  • Lower cost
  • 99.9% availability SLA for reads and writes
  • 99.9% availability SLA for writes
  • Least durability (99.999999999% – 11 9s)
  • No availability in event of a datacenter or region outage

 

  • Higher cost
  • 99.99% availability SLA for reads (Read-access (RA-*) redundancy options provide read access from a secondary region)
  • 99.9% SLA availability SLA for writes
  • Highest durability (99.99999999999999% – 16 9s)
  • Availability in event of a datacenter or region outage

Azure Table Storage Pricing

  • Azure Table storage is very cost-effective.
  • You’re charged based on the amount of storage and the number of storage transactions made against the service (e.g., an individual operation against the storage API).
  • For example, in December 2021, for 1000GB of storage and 100 million storage transactions/month, the approximate costs in U.S. regions are as follows:
    • LRS redundancy: $45.04/month
    • RA-GRS redundancy: $75.04/month
    • RA-GZRS (where offered): $126.54/month

Back to top  

Azure Table Storage FAQ for the RDBMS Developer

What kind of NoSQL store is Azure Table storage?

Azure Table storage is a NoSQL key-value PaaS data store designed to be a great option for highly scalable, highly available systems. It supports storing petabytes of data and a flexible data schema, meaning different entities in the same table can have different schemas.

References to NoSQL databases having “flexible schema” or having a “schemaless design” can give the impression database schema design is a thing of the past, and you can bypass it to focus more on the application code. The reality is, even in this NoSQL data world, schema design is very important and if you don’t give it due care and attention, it can come back to bite you.

How do I create custom indexes on an Azure Table Storage table?

The only index existing on a table is on the PartitionKey and RowKey properties. Secondary indexes are not supported.

How do I optimize querying by different properties if I can’t have secondary indexes?

Storing duplicate copies of the data with different PartitionKey and RowKeys properties is the way to go. Storage is inexpensive. You can use the in-partition batch support to insert/modify the copies in an atomic manner. Just bear in mind the limitations on batches—e.g., 100 operations max, 4MB limit on total payload size—but for most scenarios, this is not a concern.

Azure Table storage query example: How to search for an employee by employee id or domain username using PartitionKey and RowKey

In this scenario, we have the following data in the Employee table:

PartitionKey RowKey FirstName LastName EmployeeId DomainUsername
Employee Id_012345 Joe Bloggs 012345 jbloggs
Employee Uname_jbloggs Joe Bloggs 012345 jbloggs

Here, we’re storing two copies of an Employee entity—all the custom properties are the same (FirstName, LastName, etc.). However, to facilitate our query requirements, we’ll use a different RowKey for each.

A query by Employee Id would be an efficient Point Query (a query for a single entity identified by its PartitionKey and RowKey).

// Equivalent to SQL: SELECT * FROM Employee WHERE PartitionKey = 'Employee' AND RowKey = 'Id_012345'
var query = TableOperation.Retrieve<EmployeeEntity>("Employee", "Id_012345");
var result = await employeeTable.ExecuteAsync(query);

A query by domain username would also be an efficient Point Query. The above query is a shortened convenience wrapper for single-row lookups by PartitionKey and RowKey. However, you can also use ExecuteQuerySegmentedAsync (as shown below), which is what you’d use for queries returning multiple entities (complete example).

// Equivalent to SQL: SELECT * FROM Employee WHERE PartitionKey = 'Employee' AND RowKey = 'Uname_jbloggs'
var filter =
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey",QueryComparisons.Equal, "Employee"),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, "Uname_jbloggs"));

var query = new TableQuery<EmployeeEntity>().Where(filter);
var querySegment = employeeTable.ExecuteQuerySegmentedAsync(query, null);

But how would you retrieve a list of all employees? If we retrieved everything in the Employee partition, we’d get duplicates, as we’re storing multiple copies of the data. So, we would want to query for entities where the RowKey starts with “Id_” (we could also choose to search for RowKeys starting with “Uname_”). See the “How can I perform a LIKE query for values starting with a given value?” question further down for more information.

Can I query by properties other than the PartitionKey and RowKey?

Yes. But just like in an RDBMS, when you query on a field that is not indexed, it will perform a scan. If you’re searching on a custom property within a specific partition, it will be a Partition Scan. If you’re searching across all partitions, a Table Scan will be the most inefficient query to perform.

How can I insert/update multiple entities in an atomic operation?

Using the previous employee example, how do you go about creating both entities atomically when adding a new employee to the database? Well, as shown in the code below, you can perform multiple operations as part of a TableBatchOperation—just recall the limitations I mentioned in the fundamentals section: a maximum of 100 operations per batch and all entities must have the same PartitionKey and a maximum batch payload size of 4MB.

var employeeById = new EmployeeEntity {
    PartitionKey = "Employee",
    RowKey = "Id_12345",
    FirstName = "Joe",
    LastName = "Bloggs",
    EmployeeId = "012345",
    DomainUsername = "jbloggs"
};

var employeeByDomainUsername = new EmployeeEntity {
    PartitionKey = "Employee",
    RowKey = "Uname_jbloggs",
    ... rest of properties...
};

var batchOperation = new TableBatchOperation();
batchOperation.Insert(employeeById);
batchOperation.Insert(employeeByUname);

// employeeTable variable is a reference to a CloudTable instance pointing to the Employee table
await employeeTable.ExecuteBatchAsync(batchOperation);

How do I store entities with different schema in the same table?

There’s nothing special here—just create a new entity with different properties and insert it into the same table.

The following example demonstrates this for Order and OrderItem entities, storing the order header and order item entities in the same table.

var order = new OrderEntity {
    PartitionKey = "Order",
    RowKey = "12345",
    OrderId = "12345",
    Total = 10.99,
    OrderDate = "2020-01-02T03:04:05.678Z"
};

var orderItem = new OrderItemEntity {
    PartitionKey = "Order",
    RowKey = "Item_12345_ABC123",
    ProductCode = "ABC123",
    Quantity = 1,
    UnitPrice = 10.99
};

var batchOperation = new TableBatchOperation();
batchOperation.Insert(order);
batchOperation.Insert(orderItem);

This would result in the following data in the table:

PartitionKey RowKey OrderId Total OrderDate ProductCode Quantity UnitPrice
Order Id_12345 12345 10.99 2020-01-02T03:04:05.678Z      
Order Item_12345_ABC123       ABC123 1 10.99

A note about the limitations of properties on a single table—the combined total number of properties cannot exceed 255.

How can I perform a LIKE query for values starting with a given value?

You can run a Range Query equivalent to a LIKE ‘Something%’ condition. Range queries are the second-best types of query, filtering on the partition key and a range of row key values.

Azure Table storage query example: How to retrieve a distinct list of employees using Range Query

We’re looking for all rows starting with “Id_” in this case.

var rowKeyStartsWith = "Id_";

var partitionKeyFilter =
    TableQuery.GenerateFilterCondition("PartitionKey",QueryComparisons.Equal, "Employee");

// Replace the last character in the starting value with the next character in sequence
// to serve as the end of the range to filter on
var rowKeyRangeEnd =
    rowKeyStartsWith.Substring(0, rowKeyStartsWith.Length - 1) + (char)(rowKeyStartsWith.Last() + 1);

var rowKeyFilter =
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("RowKey",  QueryComparisons.GreaterThan, rowKeyStartsWith),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, rowKeyRangeEnd));

// Equivalent to SQL: SELECT * FROM Employee WHERE PartitionKey = 'Employee' AND RowKey LIKE 'Id[_]%'
var fullFilter = TableQuery.CombineFilters(partitionKeyFilter, TableOperators.And, rowKeyFilter);

var query = new TableQuery<EmployeeEntity>().Where(fullFilter);

The outcome is we search for RowKeys > ‘Id_’ and < ‘Id’ for the employee list.

How do I enforce unique constraints?

PartitionKey and RowKey combinations are unique. In a relational database such as SQL Server, you might add a unique constraint or a unique index to fields unique outside of the Primary Key. In Azure Table storage, you can add an extra copy of the entity within the same atomic batch to enforce uniqueness.

For example, in the previous employee scenario above, EmployeeId and DomainUsername values would have uniqueness enforced, as we’ve duplicated the data with separate RowKeys with those properties built-in.

How can I include special characters in the PartitionKey or RowKey?

A limitation of PartitionKeys and RowKeys is that certain characters are not allowed. You’d get an error when trying to insert the Uname entity shown above because of the backslash if you try to create an employee with “MyDomain\jbloggs” as the DomainUsername.

One approach for this is to base64 encode the value, which results in a safe string except it could still contain a forward slash, which is not allowed. You can replace the forward slash with a substitute safe character, such as an underscore.

For “MyDomain\jbloggs” username the outcome would be as follows:

PartitionKey

RowKey

FirstName

LastName

EmployeeId

DomainUsername

Employee

Id_012345

Joe

Bloggs

012345

MyDomain\jbloggs

Employee

Uname_TXlEb21haW5camJsb2dncw==

Joe

Bloggs

012345

MyDomain\jbloggs

 

How do I perform case-insensitive queries?

Queries are case-sensitive. To perform a case insensitive search, standardize the case of the value being included in the RowKey to lower or uppercase before you store/query them. If you need to encode the value, standardize the casing before encoding, like in the last question.

You can store the original value unchanged inside the entity, but for search purposes, standardizing the value’s case inside the PartitionKey or RowKey is what we’re talking about here.

How can I perform an “ORDER BY Date DESC” query?

Rows are stored in ascending order. There are two parts to support being able to query by datetime order:

  1. Datetimes should be converted to a string in an orderable format (we’ll use the ISO8601 format)
  2. We need to convert datetimes to values that effectively reverse the ordering (i.e., the value decreases as the datetimes increase)

Azure Table storage query example: How to look up orders in both ascending and descending order of OrderDate

Let’s take the following example data in an Orders table:

PartitionKey

RowKey

OrderId

Total

OrderDate

Order

ASC_2020-01-02T03:04:05.678_1

1

10.99

2020-01-02T03:04:05.678Z

Order

DESC_7980-12-30T20:55:54.321_1

1

10.99

2020-01-02T03:04:05.678Z

Breaking those RowKeys down into three parts:

Date_Order_Descending

  1. The first part of the RowKey allows us to differentiate between entities stored in ascending and descending order.
  2. The second part for ASC rows will be the plain order date, and for DESC rows will be a DateTime value calculated by subtracting the order date from DateTime.MaxValue.
  3. The third part is the OrderId, which guarantees uniqueness of each PartitionKey + RowKey combination.

To retrieve a list of orders in:

  • Ascending OrderDate – Query for entities where the RowKey starts with “ASC_”
  • Descending OrderDate – Query for entities where the RowKey starts with “DESC_”

If multiple entities have the same OrderDate, the returned order would then come down to the OrderId part.

How do I limit the fields I pull back in a query (SELECT <ColumnList>)?

As shown in the example below, one way to do this is to specify the fields via the TableQuery.SelectColumns property.

// Equivalent to SQL:
// SELECT PartitionKey, RowKey, DomainUsername FROM Employee WHERE PartitionKey = 'Employee' AND RowKey = 'Id_012345'
var filter =
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey",QueryComparisons.Equal, "Employee"),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, "Id_012345"));

var query = new TableQuery<EmployeeEntity>().Where(filter);
query.SelectColumns = new List<string> { "DomainUsername" });

This will pull PartitionKey, RowKey, and DomainUsername from the table—PartitionKey and RowKey are always returned; however, you can exclude those, too, by providing a TableRequestOptions instance when executing the query, with ProjectSystemProperties=false.

Azure Table storage vs. CosmosDB

Azure Table storage has been around for many years, and since then, CosmosDB (formerly DocumentDB) has come along as another NoSQL PaaS offering from Microsoft. CosmosDB is also a great option with some added benefits, such as secondary indexes. However, these come at a cost, which is why Azure Table storage can be a great option depending on your needs.

I still love Azure Table storage as a non-relational data store for its cost-effectiveness and simple pricing model. When it suits the requirements and use cases, I keep it at the forefront of my mind to consider.

There’s also an option to migrate to CosmosDB without rewriting your application data access logic. CosmosDB has a Table API, and you can use the same client library to access it as you do for accessing Azure Table storage—it’s just a different connection string.

There are some other differences to be aware of in terms of the behaviors/limits of Azure Table storage vs. CosmosDB Table API. However, the main differences I outlined are a good place to start when evaluating which type of table API is best for your purposes.

How to Start Using Azure Table Storage + Additional Resources Recommendation

When you first try out a new database you’re unfamiliar with, there is a learning curve for everything from data model best practices and what query patterns are supported to which client library to use and how to perform the right operations through it. The natural starting point is to think in terms of what you know. For SQL developers trying out Azure Table storage, thinking about SELECT statements, clustered indexes, non-clustered indexes, etc., and then working out how to replicate those concepts can help accelerate this ramping up period.

There are many excellent resources out there for getting up to speed with Azure Table storage—the Microsoft documentation is a great resource—but, hopefully, this guide has given you some useful tips about how to translate some of those RDBMS concepts over.

Are you looking for a solution to more easily monitor Azure SQL database performance? SolarWinds SQL Sentry is designed to help you quickly identify issues and improve query performance. You can also get actionable data about historical and long-running queries using the included Top SQL view. This insight can help you optimize the most resource-intensive statements and focus on tuning queries with the highest impact on your workload first for more proactive troubleshooting. You can download a free 14-day trial of SQL Sentry here to get started.

Adrian (@AdaTheDev) is a Lead Development Engineer at SentryOne, focusing on the development of the Data DevOps SaaS product portfolio – namely SentryOne Document and SentryOne Test. Based in the UK, he joined SentryOne from Pragmatic Works in 2018, with 18 years experience in software development that includes a background in developing SaaS platforms in the hospitality and digital marketing industries. Adrian is a fan of good (bad) puns, dad jokes, and proper British biscuits.

RELATED ARTICLES

Most Popular

Recent Comments