The Rain and The Shade

July 8, 2011

Using Reverse Time Stamp in TableStorage

Filed under: Table Storage,Windows Azure — ovaisakhter @ 11:59 am

When you read about the Azure Table Storage one of the earlier things you come to know is that there are only two (properties) fields in the stored Entities which are indexed. i.e. PartitionKey and RowKey.

All the records inside a partition are indexed by RowKey and are also automatically sorted on the RowKey also. I can say that if you can design your key in such a way that the records are always sorted in a way which is more suitable for most of your data access scenarios then you can save a lot processing and get much better performance.

In a lot cases the records should be sorted by their date of creation, so new records should be shown first. In TableStorage you get a property in every entity called TimeStamp which could be the first choice normally in this case for Ordering(good old SQL days ). When you go on and write your LINQ query with OrderBy the first thing you will get will be an error at the runtime. Because table storage does not supports OrderBy.

In TableStorage you can use a TimeStamp in the beginning your RowKey to get the records sorted by time of creation, and if you want to get the new records first you can reverse the timestamp. Here is an interesting code I have used which can do that for you.

string myRowKey = DateTime.MaxValue – DateTime.UtcNow).Ticks.ToString(“d19”)

//think I saw this code in one of the CloudCover videos by Steve Marx

so you can create your key like Entity.RowKey = myRowKey+whatOtherwiseCouldHavebeenMyRowKey+SomethingElseIfYoureallyWantto and when get your records they will be nicely sorted on the date of creation.

Advertisements

June 27, 2011

Querying the Entities using the RowKey in Azure Table Storage

Filed under: Table Storage,Windows Azure — ovaisakhter @ 3:44 pm

When you look into the table and table storage you are introduced to the concept of mandatory properties of every entity that can be stored in the Tables. i.e

  • PartitionKey
  • RowKey
  • TimeStamp

You are also told that only Partition Key and Row Key are the only properties that are indexed. When reading this I got this idea that may be it should be a good idea to put some of the data in the key so that you can use this data for searching. For example if you are making a blog site then if can be a good idea to put the UserId inside the Row key of every blog’s row key and then find it using a String.Contains give me the blogs of the user ovais@gmail.com just return all the blogs where the Row id contains the email. At least we can draw this conclusion that a string.Contains should run faster (much faster) on the Row key rather than a “Non-indexed” field inside the entity.

So I tweeted and tried to confirm my hunch from the people at Cloud Cover on channel 9, who replied with affirmative.

Now I set off to measure the performance gains that I will get using the above mentioned approach. I created a user object like following,

Next I create records on this entity with the following code

image

Run the code twice with slight change in the Email address. So ideally I should have 8000 records in my table but there were 7600. I will investigate that later and report back but carrying on.

Now the fun part I started querying this entity from my code. Remember I have RowKey and Email having the email address. Which in my case contain a lot of entities starting with “o1”.

So I wrote one query each to get the record starting with “o1” on the User.RowKey and one with User.Email starting with “o1”. Now ideally the query running on RowKey should be much fast than the one running on the User.Email, not so ideally they should be almost the same. But in my case absolutely worst case happened. The RowKey Query was around 3 times slower than the Email query. Run the code 100 times took an average and the result was

  • Email Query Took 286 MilliSeconds
  • RowKey Query Took 919 MilliSeconds

Then I changed my queries and instead on doing Contains I did the equals comparison, and this time the RowKey query was much faster than the Email query.

So I can make this conclusion that the row keys are not stored as strings in the database most probably an Integer representation of them is saved and indexed. So the equals operation is fast but any string operation on them is extremely slow. I think this way doing things is highly non intuitive.

Here is the code I used to Query (Please do not mind a lot of Console.Write statements I was just trying to generate a MS Excel compatible output.)

image

So refrain yourself from querying any way except the equal on the RowKey or else you are in for a surprise and some I have a feeling that this will not be a good surprise.

 

Steven Max pointed out the an error with my code I was getting less records in the case of the Entity.Field case which gave such huge difference good news for me is that Entity.Field is still a little fast :)

RowKey Query: 1610,05 MilliSeconds
Entity.Field Query: 1590,07 MilliSeconds

Blog at WordPress.com.