The Rain and The Shade

June 27, 2011

Querying the Entities using the RowKey in Azure Table Storage

Filed under: Table Storage,Windows Azure — ovaisakhter @ 3:44 pm

When you look into tables and table storage, you are introduced to the concept of mandatory properties that every entity stored in a table must have, i.e.

  • PartitionKey
  • RowKey
  • Timestamp

You are also told that PartitionKey and RowKey are the only properties that are indexed. When reading this I got the idea that it might be good to put some of the data in the key so that you can use it for searching. For example, if you are making a blog site, it could be a good idea to put the UserId inside the RowKey of every blog's row and then find it using String.Contains: to get the blogs of the user ovais@gmail.com, just return all the blogs where the RowKey contains the email. At the very least, I assumed that a String.Contains should run faster (much faster) on the RowKey than on a “non-indexed” field inside the entity.

So I tweeted to confirm my hunch with the people at Cloud Cover on Channel 9, who replied in the affirmative.

Now I set off to measure the performance gain I would get from the above-mentioned approach. I created a user object like the following:
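As a rough illustration in Python (this is not the code from the original post), such an entity might look like the sketch below. The class name, the field names and the "users" partition value are assumptions; the post only states that RowKey and Email both hold the email address.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class User:
    """Hypothetical user entity with the three mandatory table properties."""
    partition_key: str   # mandatory, indexed
    row_key: str         # mandatory, indexed
    email: str           # ordinary, non-indexed property
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def make_user(email: str, partition: str = "users") -> User:
    # The same email is stored twice on purpose: once in the indexed
    # RowKey and once in a plain property, so both can be queried.
    return User(partition_key=partition, row_key=email, email=email)
```

Storing the same value in both places is what makes the later comparison possible: the two queries look for the same data and differ only in whether the field is indexed.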

Next I created records of this entity with the following code:

image

I ran the code twice with a slight change in the email address. Ideally I should have had 8000 records in my table, but there were 7600. I will investigate that later and report back, but carrying on.

Now the fun part: I started querying this entity from my code. Remember that both RowKey and Email hold the email address, and in my case many of them start with “o1”.

So I wrote two queries to get the records starting with “o1”: one on User.RowKey and one on User.Email. Ideally the query running on RowKey should be much faster than the one running on User.Email; less ideally, they should be almost the same. But in my case the absolute worst case happened: the RowKey query was around 3 times slower than the Email query. I ran the code 100 times, took an average, and the result was:

  • Email Query Took 286 MilliSeconds
  • RowKey Query Took 919 MilliSeconds
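The averaging step can be sketched roughly as follows. This is a minimal Python harness, not the code used for the numbers above; `average_ms` and its `query` parameter are hypothetical names, and `query` stands for any zero-argument function that runs one of the two queries.

```python
import time


def average_ms(query, runs=100):
    """Run `query` `runs` times and return the mean duration in milliseconds."""
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        query()  # the operation being measured
        total_seconds += time.perf_counter() - start
    return total_seconds / runs * 1000.0
```

When comparing two queries this way, it is worth checking that both return the same number of records, since a query that returns fewer rows will naturally look faster.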

Then I changed my queries to do an equals comparison instead of Contains, and this time the RowKey query was much faster than the Email query.

So my conclusion is that the row keys are probably not stored as strings in the database; most likely an integer representation of them is saved and indexed. So the equals operation is fast, but any string operation on them is extremely slow. I think this way of doing things is highly non-intuitive.

Here is the code I used to query. (Please do not mind the many Console.Write statements; I was just trying to generate MS Excel compatible output.)

image

So refrain from querying the RowKey in any way except with equals, or else you are in for a surprise, and I have a feeling it will not be a good one.
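One way to picture why different kinds of comparison on an indexed key can behave so differently is a sorted list of keys. The sketch below is an in-memory Python model under that assumption, not Azure Table Storage itself, and all function names are hypothetical. With sorted keys, an equality lookup is a binary search, a "starts with" match can be rewritten as a range (key >= "o1" and key < "o2"), and a "contains" match has no choice but to examine every key.

```python
from bisect import bisect_left


def find_equal(sorted_keys, key):
    """Binary search: touches O(log n) keys."""
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with `prefix`.

    Assumes a non-empty prefix whose last character is not the highest
    possible code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def find_prefix(sorted_keys, prefix):
    """'Starts with' expressed as a range: prefix <= key < upper bound."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix_upper_bound(prefix))
    return sorted_keys[lo:hi]


def find_contains(sorted_keys, fragment):
    """Substring match: the sort order does not help, every key is scanned."""
    return [k for k in sorted_keys if fragment in k]


keys = sorted([
    "a@x.com", "o1@gmail.com", "o10@gmail.com", "o2@gmail.com", "zo1@x.com",
])
print(find_equal(keys, "o2@gmail.com"))
print(find_prefix(keys, "o1"))
print(find_contains(keys, "o1"))
```

In this model only the equality and range forms benefit from the ordering; the substring form costs the same whether or not the field is "indexed".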

 

Steven Max pointed out an error in my code: I was getting fewer records in the Entity.Field case, which caused such a huge difference. The good news for me is that the Entity.Field query is still a little faster :)

RowKey Query: 1610.05 MilliSeconds
Entity.Field Query: 1590.07 MilliSeconds


June 25, 2011

What is Microsoft Azure VM Role and what it is Not

Filed under: VM Role,Windows Azure — ovaisakhter @ 12:18 pm

Some months back I heard about the Platform as a Service initiative from Microsoft at PDC. It seemed exciting, especially the VM Role. I started thinking about the possible scenarios in which this feature could be used, such as hosting our own servers like SharePoint 2010 or MS Dynamics CRM in the cloud.

A friend of mine, who is not as lazy as I am, jumped at the opportunity, uploaded a VM to Azure and started running a number of instances of it. He installed MS CRM and SharePoint on the instances and connected them to his local domain with Windows Azure Connect. Life seemed as it should be, and then suddenly the dreams shattered. I received a very distressed Skype message from him.

After some conversation which is not mentionable here, I learned that the problem was this: whenever he changed any configuration on the VM Role, all the instances were reinitialized, or in other words reverted to the “base image”. Initially I joined in his verbal bashing of Microsoft, but later, when I thought about it, I realized that this could not be a bug; it had to be by design.

Then I started to look into it a bit further and found this amazing video from the Channel 9 series called Cloud Cover: http://bit.ly/iyImpC. One sentence in the video cleared up the whole scenario for me: “VM roles are an extension to the worker roles”.

Worker roles are, simply put, the Azure version of Windows Services. They are long-running processes used to perform resource-demanding batch operations, running on the Microsoft Azure operating system. They are stateless, but they can use one of the storage mechanisms provided by Windows Azure (Blobs, Table Storage, SQL Azure, etc.).

The VM Role is an extension of the same concept, with the difference that in this case you can use your own operating system. You make a virtual machine (and do some abracadabra with the csupload utility) and upload it to Azure. Do some configuration and voilà, your VM is running in Azure. If you need two of them, change the configuration and now there are two of them, and so on and so forth. As Azure is responsible for starting and disposing of these instances, it is not possible for the instances to maintain state. So each time an instance is started, it is as pure as the “base image”. You can contaminate it a bit with startup tasks, but that is about it.

But the question is: where should we use it? Let us look at one example of a potential use of this offering.

You have a video sharing website. You use a utility to encode all the uploaded videos before they are published. This utility (like most of the available ones) does not support the Azure operating system. You can set up the encoding process on a VM and run it on Azure using the VM Role. Just keep in mind that the processed videos should be persisted to either Azure storage or your on-premises storage (you can use Azure Connect for this too) as soon as you are done. The added value is that you can start with a single instance and scale to hundreds if and when required. You can also increase the number of instances for a certain period (I don’t know when people share more videos, maybe after the holiday season) and then scale down when they are no longer needed. You can scale from servicing 1000 clients to 1000000 and then back to 1000 in one night.
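The pattern described here, a stateless instance that takes work, processes it and immediately persists the result somewhere durable, can be sketched as follows. This is an illustrative Python model only: `encode`, `run_worker`, the in-memory queue and the dictionary standing in for Azure storage are all hypothetical, and no real Azure API is used.

```python
from queue import Queue, Empty


def encode(video_bytes):
    # Placeholder for the real encoding utility running inside the VM.
    return video_bytes[::-1]


def run_worker(jobs, durable_store):
    """Drain the job queue, writing each result to durable storage at once.

    Nothing is kept on the instance itself, so the instance can be
    reverted to its base image or disposed of without losing work.
    """
    processed = 0
    while True:
        try:
            name, data = jobs.get_nowait()
        except Empty:
            return processed
        durable_store[name] = encode(data)  # persist outside the instance
        processed += 1


jobs = Queue()
jobs.put(("holiday.avi", b"abc"))
jobs.put(("party.avi", b"xyz"))
store = {}  # stands in for blob storage or on-premises storage
count = run_worker(jobs, store)
```

Because each worker holds no state of its own, adding more instances is just a matter of starting more copies of the same loop against the same queue and the same storage.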
