When you look into the table and table storage you are introduced to the concept of mandatory properties of every entity that can be stored in the Tables. i.e
You are also told that only Partition Key and Row Key are the only properties that are indexed. When reading this I got this idea that may be it should be a good idea to put some of the data in the key so that you can use this data for searching. For example if you are making a blog site then if can be a good idea to put the UserId inside the Row key of every blog’s row key and then find it using a String.Contains give me the blogs of the user email@example.com just return all the blogs where the Row id contains the email. At least we can draw this conclusion that a string.Contains should run faster (much faster) on the Row key rather than a “Non-indexed” field inside the entity.
So I tweeted and tried to confirm my hunch from the people at Cloud Cover on channel 9, who replied with affirmative.
Now I set off to measure the performance gains that I will get using the above mentioned approach. I created a user object like following,
Next I create records on this entity with the following code
Run the code twice with slight change in the Email address. So ideally I should have 8000 records in my table but there were 7600. I will investigate that later and report back but carrying on.
Now the fun part I started querying this entity from my code. Remember I have RowKey and Email having the email address. Which in my case contain a lot of entities starting with “o1”.
So I wrote one query each to get the record starting with “o1” on the User.RowKey and one with User.Email starting with “o1”. Now ideally the query running on RowKey should be much fast than the one running on the User.Email, not so ideally they should be almost the same. But in my case absolutely worst case happened. The RowKey Query was around 3 times slower than the Email query. Run the code 100 times took an average and the result was
- Email Query Took 286 MilliSeconds
- RowKey Query Took 919 MilliSeconds
Then I changed my queries and instead on doing Contains I did the equals comparison, and this time the RowKey query was much faster than the Email query.
So I can make this conclusion that the row keys are not stored as strings in the database most probably an Integer representation of them is saved and indexed. So the equals operation is fast but any string operation on them is extremely slow. I think this way doing things is highly non intuitive.
Here is the code I used to Query (Please do not mind a lot of Console.Write statements I was just trying to generate a MS Excel compatible output.)
So refrain yourself from querying any way except the equal on the RowKey or else you are in for a surprise and some I have a feeling that this will not be a good surprise.
Steven Max pointed out the an error with my code I was getting less records in the case of the Entity.Field case which gave such huge difference good news for me is that Entity.Field is still a little fast
RowKey Query: 1610,05 MilliSeconds
Entity.Field Query: 1590,07 MilliSeconds