Azure Blob Storage Index Tags

Improving Data Management and Discovery through Key-Value Index Tag Attributes

Prabhashi Wijesinghe
LinkIT

Hello Everyone!!!

Today I will share my experience regarding Azure Blob Storage Index Tags with you all.

This article shows how to use blob index tags with the Azure Storage client library for .NET to improve data management and search. Index tags were officially released for Azure Blob Storage in June 2021.

Blob index tags let you categorize data in a storage account using key-value tag attributes. The tags are indexed automatically and exposed as a searchable multi-dimensional index, which makes data easy to find. The article covers how to set tags and how to find data with them.

As datasets grow, finding a particular object among a vast amount of data becomes difficult. Blob index tags improve both data management and discovery: you can categorize and search objects within a single container or across all containers in a storage account. As data needs evolve, objects can be reclassified dynamically by updating their index tags, while the objects themselves stay in their current container organization.

Blob index tags offer the following capabilities:

  • Categorize blobs dynamically using key-value index tags.
  • Quickly find specifically tagged blobs across an entire storage account.
  • Specify conditional behaviours for blob APIs based on the evaluation of index tags.
  • Use index tags to apply advanced controls on features such as blob lifecycle management.

Consider a scenario where you have a large number of blobs that contain customer data in your Azure Storage account, accessed by various applications for different purposes. You want to retrieve all blobs related to a specific region, but you don’t have their exact names or properties. These blobs are stored in different containers with unique naming conventions. However, each blob has a region tag indicating the region it belongs to. By using the region tag attribute, you can filter the blobs based on their region without manually searching through all blobs and analyzing their properties. The blob index will search through all containers in the storage account and quickly return the relevant blobs with the specified region tag.

Setting blob index tags

You can assign index tags to your blobs if your code is authorized to access the blob data through one of the following ways:

  • A security principal that has been given an Azure RBAC role with the “Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags/write” action. The built-in Storage Blob Data Owner role includes this action.
  • A Shared Access Signature (SAS) that grants permission to access the blob’s tags (the “t” permission).
  • The account key.

There are several limitations that apply to blob index tags:

  • Tag keys and values can only support string data types. Any numbers, dates, times, or special characters are stored as strings.
  • Tag keys and values are case-sensitive.
  • Each blob can have a maximum of 10 blob index tags.
  • Tag keys must be between one and 128 characters in length.
  • Tag values must be between zero and 256 characters in length.
  • When versioning is enabled, index tags are applied to a specific version of a blob. If you set index tags on the current version and a new version is created, the tag won’t be associated with the new version. It will only be associated with the previous version.
  • Tag keys and values may contain only alphanumeric characters (lowercase and uppercase letters, and digits) and the following special characters: space, plus, minus, period, colon, equals, underscore, and forward slash.
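
The limits above can be checked on the client before a request is sent. Below is a minimal sketch of such a check. The class name BlobTagRules and the method Validate are hypothetical helpers, not part of the Azure SDK, and the sketch assumes "alphanumeric" means ASCII letters and digits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the Azure SDK) that mirrors the limits listed above.
public static class BlobTagRules
{
    public const int MaxTagsPerBlob = 10;
    public const int MaxKeyLength = 128;
    public const int MaxValueLength = 256;

    // Special characters allowed in addition to letters and digits.
    private const string AllowedSpecial = " +-.:=_/";

    // Assumption: "alphanumeric" is treated as ASCII letters and digits only.
    private static bool HasOnlyAllowedChars(string text) =>
        text.All(c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                      (c >= '0' && c <= '9') || AllowedSpecial.IndexOf(c) >= 0);

    // Returns one message per violated rule; an empty list means every check passed.
    public static List<string> Validate(IDictionary<string, string> tags)
    {
        var problems = new List<string>();

        if (tags.Count > MaxTagsPerBlob)
            problems.Add($"Too many tags: {tags.Count} (maximum is {MaxTagsPerBlob}).");

        foreach (KeyValuePair<string, string> tag in tags)
        {
            string value = tag.Value ?? string.Empty;

            if (tag.Key.Length < 1 || tag.Key.Length > MaxKeyLength)
                problems.Add($"Key '{tag.Key}' must be 1 to {MaxKeyLength} characters long.");
            if (value.Length > MaxValueLength)
                problems.Add($"Value of '{tag.Key}' must be at most {MaxValueLength} characters long.");
            if (!HasOnlyAllowedChars(tag.Key))
                problems.Add($"Key '{tag.Key}' contains a character that is not allowed.");
            if (!HasOnlyAllowedChars(value))
                problems.Add($"Value of '{tag.Key}' contains a character that is not allowed.");
        }

        return problems;
    }
}
```

Calling BlobTagRules.Validate(tags) before setting tags gives a readable list of problems. The storage service remains the final authority on what it accepts.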

There are two methods available to set tags on blobs:

  • SetTags
  • SetTagsAsync

Here is a C# code example for setting index tags.
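
A minimal sketch, assuming the Azure.Storage.Blobs NuGet package (version 12) and a BlobClient that already points at an existing blob. The tag keys Region, Created and Status, their values, and the class name SetTagsExample are hypothetical choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public static class SetTagsExample
{
    // Hypothetical tag set; every key and value is stored as a string.
    public static Dictionary<string, string> BuildCustomerTags(string region, DateTime created) =>
        new Dictionary<string, string>
        {
            ["Region"] = region,
            // An ISO-style date keeps string order aligned with date order.
            ["Created"] = created.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture),
            ["Status"] = "Unprocessed",
        };

    public static async Task SetAndReadTagsAsync(BlobClient blob)
    {
        Dictionary<string, string> tags = BuildCustomerTags("EU-West", new DateTime(2023, 1, 15));

        // SetTagsAsync replaces the blob's whole tag set with this dictionary.
        await blob.SetTagsAsync(tags);

        // The tags can be read back from the blob itself right away.
        var response = await blob.GetTagsAsync();
        foreach (KeyValuePair<string, string> tag in response.Value.Tags)
            Console.WriteLine($"{tag.Key} = {tag.Value}");
    }
}
```

A client for this method could be created with new BlobClient(connectionString, "samplecontainer", "customer-001.json"), where the connection string, container name and blob name are placeholders. Because the call replaces all existing tags, read the current tags first and merge them if you only want to change one.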

Finding data using blob index tags

Once you have set your index tags, they will be exposed in a multi-dimensional index by the indexing engine. These tags will exist on the blob and can be retrieved immediately. However, it may take some time for the blob index to update. Once the update is completed, you can use the native discovery and query features provided by blob storage.

The Find Blobs by Tags function allows you to retrieve a subset of blobs that match a given query expression based on their index tags. This function can be used to filter across all containers in your storage account or just one container. As all index tag keys and values are stored as strings, relational operators use a lexicographic sorting method.
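
Lexicographic comparison has a practical consequence: numbers stored as strings do not sort numerically unless they are padded to a fixed width. The sketch below illustrates this locally. It uses ordinal string comparison as a stand-in for the service's ordering, which is an assumption, and TagOrdering is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper that illustrates lexicographic ordering of tag values.
public static class TagOrdering
{
    // Assumption: ordinal comparison approximates how the service orders tag values.
    public static bool IsGreater(string left, string right) =>
        string.CompareOrdinal(left, right) > 0;

    // Left-pads a non-negative number with zeros so string order matches numeric order.
    public static string Pad(int value, int width) =>
        value.ToString().PadLeft(width, '0');
}
```

With this helper, IsGreater("9", "10") is true because '9' sorts after '1', while IsGreater(TagOrdering.Pad(9, 3), TagOrdering.Pad(10, 3)) is false because "009" sorts before "010". Dates written as yyyy-MM-dd already sort correctly as strings.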

You can use index tags to locate and filter data if your code is authorized to access blob data through any of the following methods:

  • A security principal with an Azure RBAC role that has the “Microsoft.Storage/storageAccounts/blobServices/containers/blobs/filter/action” action, such as the built-in Storage Blob Data Owner role.
  • A Shared Access Signature (SAS) with permission to filter blobs by tags (the “f” permission).
  • An account key.

Here are some important rules that apply when using blob index filtering:

  • The @ character is only allowed for filtering on a specific container name (@container = 'ContainerName').
  • Double quotes should be used to enclose tag keys.
  • Single quotes should be used to enclose tag values and container names.
  • Same-sided range operations on the same key are not valid (for example, "Id" > '10' AND "Id" >= '15').
  • When creating a filter expression using REST, the characters of the expression must be URI encoded.
  • Blob index filtering works best when querying a single tag for equality or using range queries involving >, >=, <, <= on a single tag. Queries that involve multiple tags connected by AND are less efficient. For instance, queries like "Id" > '01' AND "Id" <= '100' are efficient, but queries like "Id" > '01' AND "OwnerId" = '2' are not as efficient.
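
The URI-encoding rule for REST calls can be sketched as follows. Uri.EscapeDataString is a standard .NET method. The class name TagFilterUri is hypothetical, and the sketch assumes the Find Blobs by Tags operation takes the expression in a where query parameter alongside comp=blobs.

```csharp
using System;

// Hypothetical helper that prepares a tag filter for use in a REST request URL.
public static class TagFilterUri
{
    public const string QueryPrefix = "?comp=blobs&where=";

    // Percent-encodes the filter so that spaces, double quotes and
    // comparison operators are safe inside a query string.
    public static string BuildFindBlobsQuery(string filterExpression) =>
        QueryPrefix + Uri.EscapeDataString(filterExpression);
}
```

For example, TagFilterUri.BuildFindBlobsQuery("\"Id\" > '01' AND \"Id\" <= '100'") returns a query string that can be appended to the storage account's blob endpoint. The .NET client library methods handle this encoding themselves, so the step only matters when calling REST directly.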

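The following table lists all of the operators that can be used when searching for blobs using tags.

  Operator     Meaning                          Example
  =            Equal to                         "OwnerId" = '2'
  >            Greater than                     "Id" > '01'
  >=           Greater than or equal to         "Id" >= '15'
  <            Less than                        "Id" < '100'
  <=           Less than or equal to            "Id" <= '100'
  AND          Logical and                      "Id" > '01' AND "Id" <= '100'
  @container   Scope to a specific container    @container = 'ContainerName' AND "OwnerId" = '2'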

There are two methods available for finding data using tags:

  • FindBlobsByTags
  • FindBlobsByTagsAsync

The following C# code example illustrates how to search for all blobs in sampleContainer that have been tagged with a created date within a particular range.
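
A minimal sketch, assuming the Azure.Storage.Blobs NuGet package (version 12), C# 8 or later for await foreach, and blobs that carry a Created tag in yyyy-MM-dd form. The tag key Created and the class name FindByTagsExample are hypothetical choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public static class FindByTagsExample
{
    // Builds a filter scoped to one container: the tag key goes in double quotes,
    // tag values and the container name go in single quotes.
    public static string BuildCreatedRangeFilter(string container, string from, string to) =>
        $"@container = '{container}' AND \"Created\" >= '{from}' AND \"Created\" <= '{to}'";

    // Returns "container/blob" for every blob whose Created tag is in the range.
    public static async Task<List<string>> FindCreatedBetweenAsync(
        BlobServiceClient service, string container, string from, string to)
    {
        string filter = BuildCreatedRangeFilter(container, from, to);
        var matches = new List<string>();

        // FindBlobsByTagsAsync returns the matching blobs as an async sequence.
        await foreach (TaggedBlobItem item in service.FindBlobsByTagsAsync(filter))
            matches.Add($"{item.BlobContainerName}/{item.BlobName}");

        return matches;
    }
}
```

A call could look like FindCreatedBetweenAsync(service, sampleContainer, "2023-01-01", "2023-03-31"), where sampleContainer holds the container name (lowercase, as Azure requires for container names) and the dates are placeholders. The range uses >= and <= on the same key, which is valid because the two bounds are on opposite sides.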

In conclusion, Azure Blob Storage Index Tags let you tag and locate data within a storage account, organizing it by key-value pairs of your choosing. The feature is available through several interfaces, such as the Azure Portal, the REST API, and Azure PowerShell. It is important to follow the rules when writing tag queries, because they affect both the validity and the efficiency of a search; equality or range queries on a single tag perform best. Overall, Azure Blob Storage Index Tags are a powerful tool for organizing data in a storage account and making it easier to find.

Hope you gained some new knowledge on Azure Blob Storage Index Tags.

Thanks for reading!!!
