The Rain and The Shade

July 27, 2011

Why I like RavenDb as a .Net developer

Filed under: NoSQL Databases,RavenDb — ovaisakhter @ 1:00 pm

We often have discussions about amazing new NoSQL databases like CouchDB, Cassandra and Redis.

When I am asked about my choice of NoSQL database, I always say “I will use RavenDb”. I have used RavenDb in one of my projects and I am quite happy with the results. In my experience, if you do it right, RavenDb performs amazingly well under extreme load (if you do it wrong you can seriously screw things up, but that is a talk for another time).

As I mentioned, I am generally very happy with the performance of RavenDb, but I have another reason to use it: I am a .Net developer, and RavenDb is a database built on top of .Net. So the learning curve for a .Net developer is negligibly small.

Here I will describe the process of starting a new RavenDb endeavor.

“Installing” RavenDb

To start using RavenDb, download the binaries from http://ravendb.net/download, unzip the file, go to the Server directory and run Raven.Server.exe. That is it: you are up and running with a brand new instance of RavenDb.

Navigate to http://localhost:8080/ in your browser and you will see the all-new Management Studio for RavenDb, built in Silverlight. (The interface gives you an option to create sample data if you don’t want to get your hands dirty in code right now. Neat idea, I say.)

Using RavenDb

RavenDb server interacts with the client using RESTful web services and the response is in the form of JSON. For .Net developers a very extensive .Net client API is also available, which makes your life a lot easier. Using the client API it is extremely simple to get started.

Now that you have your server up and running, all you need to do is write the following code in your application:

var documentStore1 = new DocumentStore { Url = "http://localhost:8080" }.Initialize();

using (var session1 = documentStore1.OpenSession())
{
    session1.Store(new User { Id = "users/ayende", Name = "Ayende" });
    session1.SaveChanges();
}
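The snippet above relies on a User class that this post never shows. A minimal sketch of what it might look like, assuming only the two properties used in the examples (Id and Name), is a plain class with no RavenDb-specific base type:

```csharp
using System;

// Assumed shape of the document class used in the examples.
// Only Id and Name appear in the post; anything else would be a guess.
public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class UserDemo
{
    public static void Main()
    {
        // Build the same object that the post stores in the session.
        var user = new User { Id = "users/ayende", Name = "Ayende" };
        Console.WriteLine(user.Id + " -> " + user.Name);
    }
}
```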

And there you go: you just saved your first record (or document, as it is called in this neck of the woods) into the database. Here is what you need to do to get it back:

using (var session1 = documentStore1.OpenSession())
{
    var myUserBackFromRaven = session1.Load<User>("users/ayende");

    Console.WriteLine(myUserBackFromRaven.Name);
}

Simple?

And if you need to query your documents, all you need to know is how to write a LINQ statement.

var users = session1.Query<User>().Where(x => x.Name.StartsWith("A"));

foreach (var user in users)
{
    Console.WriteLine(user.Name);
}

RavenDb lets you index your documents for faster retrieval. The best part is that all RavenDb index definitions are written using LINQ, which keeps the learning curve extremely small.

Here is how you define an index on the user name inside the “Raven Management Studio” web interface.

[Image: index definition in Raven Management Studio]

Alternatively, you can create an index from inside the code.

public class UsersNameIndex : AbstractIndexCreationTask<User>
{
    public UsersNameIndex()
    {
        Map = users => from user in users
                       select new { user.Name };
    }
}

and then query this index from the code like this:

var users = session1.Query<User>("UsersNameIndex").Where(x => x.Name.StartsWith("A"));

foreach (var user in users)
{
    Console.WriteLine(user.Name);
}

Conclusion

The purpose of this post is not to give you a RavenDb tutorial; rather, it is to show how terribly simple it is for a .Net developer to start using RavenDb. I believe that if you are a .Net developer you will be up and running with RavenDb in a matter of hours, and that is why I like RavenDb as a .Net developer.

Links

www.Ravendb.net

July 14, 2011

Let Google do it

Filed under: General Software Archiecture — ovaisakhter @ 10:59 am

One question I always ask my tech-savvy friends is “What is the best way to search the MSDN (Microsoft Developer Network) website?” Most of the time I get the answer I am looking for: “Search it with Google.” And it is true: I have almost never been able to find anything on MSDN using the search provided on MSDN itself. I usually use Google for it.

Let’s say I am looking for something related to Microsoft development tools and Facebook. I go to Google and start typing “MSDN” and then “Facebook”, and it offers a search suggestion like “MSDN facebook sdk”. If you run that search, the first link is the correct one; life does not get any better. Try doing this on MSDN (or on Bing, for that matter). :)

I think that if you do not have a special need to implement full-text search, then don’t even bother; Google will do a much better job than you anyway. You may want to spend the time saved on making your site more search engine friendly.

July 8, 2011

Using Reverse Time Stamp in TableStorage

Filed under: Table Storage,Windows Azure — ovaisakhter @ 11:59 am

When you read about Azure Table Storage, one of the first things you learn is that only two properties (fields) of a stored entity are indexed: PartitionKey and RowKey.

All the records inside a partition are indexed by RowKey and are also automatically sorted by RowKey. If you can design your key so that the records are always sorted in the order that suits most of your data access scenarios, you can save a lot of processing and get much better performance.

In a lot of cases the records should be sorted by their date of creation, so that new records are shown first. In TableStorage every entity has a property called Timestamp, which would normally be the first choice for ordering (good old SQL days). When you go on and write your LINQ query with OrderBy, the first thing you get is an error at runtime, because Table Storage does not support OrderBy.

In TableStorage you can put a timestamp at the beginning of your RowKey to get the records sorted by time of creation, and if you want the newest records first you can reverse the timestamp. Here is an interesting piece of code I have used that does this for you:

string myRowKey = (DateTime.MaxValue - DateTime.UtcNow).Ticks.ToString("d19");

// I think I saw this code in one of the CloudCover videos by Steve Marx

So you can create your key like Entity.RowKey = myRowKey + whatOtherwiseCouldHaveBeenMyRowKey + somethingElseIfYouReallyWantTo, and when you get your records they will be nicely sorted by date of creation, newest first.
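To see why the reversed, zero-padded tick count sorts newest-first, here is a small sketch that builds a few keys and sorts them the way a string-ordered RowKey would be sorted. The MakeRowKey helper and its suffix parameter are hypothetical names used only for illustration; nothing here talks to Table Storage.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseTimestampDemo
{
    // Builds a key whose ordinal string order is newest-first.
    // "suffix" is a hypothetical extra identifier that keeps keys unique.
    public static string MakeRowKey(DateTime createdUtc, string suffix)
    {
        long reverseTicks = (DateTime.MaxValue - createdUtc).Ticks;
        // "d19" pads to 19 digits so that string order matches numeric order.
        return reverseTicks.ToString("d19") + "-" + suffix;
    }

    public static void Main()
    {
        var start = new DateTime(2011, 7, 8, 12, 0, 0, DateTimeKind.Utc);
        var keys = new List<string>
        {
            MakeRowKey(start, "first"),
            MakeRowKey(start.AddMinutes(1), "second"),
            MakeRowKey(start.AddMinutes(2), "third"),
        };

        // Ordinal sort stands in for the RowKey ordering of the store.
        keys.Sort(StringComparer.Ordinal);

        // The most recently created key ("third") is listed first.
        foreach (var key in keys)
        {
            Console.WriteLine(key);
        }
    }
}
```

The fixed width matters: without the padding, a shorter number could sort after a longer one when compared as text.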

July 6, 2011

The keys are Kool again

Filed under: NoSQL Databases,Windows Azure — ovaisakhter @ 11:35 pm

(This post is highly inspired by my mail correspondence with Thomas Jespersen who works at www.spiir.dk)

In the good old days when SQL was the king (it kind of still is), we were in love with the “identity column”, and the key of a record was insignificant in the design. So normally you would design a database where every table has one primary key whose value is either a Guid or an auto-incrementing number (identity in MSSQL).

With the popularity of NoSQL and high-performance databases, where many other things were revolutionized, the record key got its due importance back. Here is what the tutorial on the Redis site says about Redis:

“Redis is what is called a key-value store, often referred to as a NoSQL database. The essence of a key-value store is the ability to store some data, called a value, inside a key. This data can later be retrieved only if we know the exact key used to store it”

This seems to be the theme across most of the NoSQL databases: Redis, Azure Table Storage, RavenDB, Cassandra and many more use keys to access huge amounts of data. These systems index the keys for very fast retrieval of information. I am not saying that querying the data is impossible (for example, RavenDB provides amazing possibilities for creating indexes on the data, but more on that later), but the fastest way to get or set data in these systems is still “if you can somehow know the key”.

Now the question is: how do you know the keys without getting them from the store? One answer is that you should be able to generate the keys from the context and the type of query you want to run. Let us take Twitter as an example. A request comes in and says “who follows me”. From the context we know the current user (ovaisakhter in my case). So the user is ovaisakhter and he wants to know who follows him, and we can keep a list in the database against the key “ovaisakhter-followers”. Now we can get all this information in one request.

Let us take another example. We need to save a user’s tweets. One way of doing that could be to maintain one list of tweets per month (depending on how the data will be accessed), so the key can be “UserId-mmyyyy-Tweets”. Now if someone comes and asks for the tweets, you know exactly where to find them quickly.
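The two key formats above can be generated from the request context alone. Here is a minimal sketch; the KeyBuilder class and its method names are made up for illustration, and “mmyyyy” is interpreted as a two-digit month followed by a four-digit year.

```csharp
using System;
using System.Globalization;

public static class KeyBuilder
{
    // Key for the list of followers of a user, e.g. "ovaisakhter-followers".
    public static string FollowersKey(string userId)
    {
        return userId + "-followers";
    }

    // Key for one month of tweets, following the "UserId-mmyyyy-Tweets" pattern.
    public static string TweetsKey(string userId, DateTime month)
    {
        // Invariant culture keeps the key stable regardless of server locale.
        string stamp = month.ToString("MMyyyy", CultureInfo.InvariantCulture);
        return userId + "-" + stamp + "-Tweets";
    }

    public static void Main()
    {
        Console.WriteLine(FollowersKey("ovaisakhter"));
        // prints "ovaisakhter-followers"
        Console.WriteLine(TweetsKey("ovaisakhter", new DateTime(2011, 7, 6)));
        // prints "ovaisakhter-072011-Tweets"
    }
}
```

Because the key is a pure function of the user and the month, no lookup is needed before the read or write itself.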

Azure Table Storage is a bit different from the other NoSQL offerings. It provides less opportunity to play with the structure of the document, as the document has to be a set of name-value pairs (maximum 255) with a total size of 1MB. It does give you a further categorization possibility, though: you have two keys to play with, PartitionKey and RowKey. PartitionKey has a very important role to play in the scaling of your data store (more on that later). It can also be used as a categorization point for fast data retrieval. Let us see how our Twitter example could look if modeled on TableStorage. We can use “UserId-mmyyyy-Tweets” as the PartitionKey, and then all the tweets for that month can be stored under this PartitionKey. You can use reverse ticks (a reverse time stamp) plus some identifier as the RowKey for better sorting. Remember, once you know how to generate the key you can use Parallel.ForEach to get tweets for multiple months in parallel.

So in the new era of software development the keys are back in fashion, and due time should be spent on designing what your keys should be, based on things like your data structures, data retrieval requirements, scaling requirements and other things.
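As a rough sketch of the Parallel.ForEach idea, the code below generates one partition key per month and fetches the partitions concurrently. The fetchPartition delegate is a hypothetical stand-in for a real Table Storage query by PartitionKey; here it just reads from an in-memory dictionary so the example runs on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelPartitionDemo
{
    // Fetches several monthly partitions in parallel.
    // fetchPartition is an assumed callback: partition key in, tweets out.
    public static IDictionary<string, List<string>> FetchMonths(
        string userId,
        IEnumerable<DateTime> months,
        Func<string, List<string>> fetchPartition)
    {
        var results = new ConcurrentDictionary<string, List<string>>();
        Parallel.ForEach(months, month =>
        {
            string stamp = month.ToString("MMyyyy", CultureInfo.InvariantCulture);
            string partitionKey = userId + "-" + stamp + "-Tweets";
            results[partitionKey] = fetchPartition(partitionKey);
        });
        return results;
    }

    public static void Main()
    {
        // In-memory stand-in for the table; keys follow the pattern from the text.
        var fakeStore = new Dictionary<string, List<string>>
        {
            { "ovaisakhter-062011-Tweets", new List<string> { "june tweet" } },
            { "ovaisakhter-072011-Tweets", new List<string> { "july tweet 1", "july tweet 2" } },
        };

        var months = new[] { new DateTime(2011, 6, 1), new DateTime(2011, 7, 1) };
        var found = FetchMonths("ovaisakhter", months,
            key => fakeStore.ContainsKey(key) ? fakeStore[key] : new List<string>());

        foreach (var pair in found.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value.Count + " tweet(s)");
        }
    }
}
```

A ConcurrentDictionary is used for the results because the loop body runs on several threads at once.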

July 1, 2011

Generating Weighted Random

Filed under: C# Coding — ovaisakhter @ 2:02 pm

Have you ever been given a requirement by the business people along the lines of “we want 5% of the visitors of the site to participate in the survey”? The first thought that comes to mind (at least it came to mine) is that we will need a data store where we maintain a count of all the visitors, and that we will show the popup to every 20th visitor or something like that. When your code is deployed in a load-balanced environment, this data store will be some database, and then things become hairy: you need to deal with problems like locking and concurrency and, most importantly, performance. Interestingly, you will never be able to get the exact percentage without seriously hurting performance.

The situation becomes even more interesting when the requirement turns into something like: 10% will see survey 1, 20% will see survey 2 and the rest will not see any survey.

A while back I was facing this very problem, and a friend of mine (Mads Voigt Hingelberg) suggested a great way out.

It is very simple if you think about it.

Take the example I described above. When a user comes in, generate a random number from 0 to 99. If the number is from 0 to 9, show him survey 1; if it is from 10 to 29, show him survey 2; and if it is 30 or above, do not show him anything. This approach may not give you 100% accurate results (which you would not get anyway), but if you take a large enough sample you will see that the values are pretty close to what you are looking for.
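Before the generic version, here is the idea spelled out for just this one case, as a minimal sketch. The SurveyPicker name and the "none" marker are illustrative choices; the simulation in Main only shows that the observed shares land near 10% and 20% for a large sample.

```csharp
using System;
using System.Collections.Generic;

public static class SurveyPicker
{
    // Maps a roll in the range 0..99 to a survey: 10% survey 1, 20% survey 2.
    public static string PickSurvey(int roll)
    {
        if (roll < 10) return "survey 1";   // rolls 0..9
        if (roll < 30) return "survey 2";   // rolls 10..29
        return "none";                      // rolls 30..99
    }

    public static void Main()
    {
        var random = new Random();
        var counts = new Dictionary<string, int>
        {
            { "survey 1", 0 }, { "survey 2", 0 }, { "none", 0 },
        };

        const int visitors = 100000;
        for (int i = 0; i < visitors; i++)
        {
            // Next(0, 100) returns 0..99; the upper bound is exclusive.
            counts[PickSurvey(random.Next(0, 100))]++;
        }

        foreach (var pair in counts)
        {
            Console.WriteLine(pair.Key + ": " + (100.0 * pair.Value / visitors).ToString("F1") + "%");
        }
    }
}
```

No shared counter is needed, so the same code works unchanged on every server behind the load balancer.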

I wrote a simple class to handle this problem generically.

using System;
using System.Collections.Generic;
using System.Linq;

public class WeightedRandom<T>
{
    // Half-open ranges [From, To) that together cover the rolls 0..99.
    private readonly List<Range> _cases = new List<Range>();
    // Random is not thread-safe, so calls to it are serialized with this lock.
    private readonly object _lock = new object();
    private readonly Random _rand = new Random();

    public WeightedRandom(IEnumerable<WeightedRandomDto<T>> cases)
    {
        var caseList = cases.ToList();
        if (caseList.Sum(x => x.Percentage) != 100)
        {
            throw new ArgumentException("The total of the percentages should be exactly 100");
        }

        var from = 0;
        foreach (var entry in caseList)
        {
            // Each case gets exactly Percentage consecutive rolls.
            var range = new Range { Object = entry.Object, From = from, To = from + entry.Percentage };
            from = range.To;
            _cases.Add(range);
        }
    }

    public T GetNext()
    {
        int random;
        lock (_lock)
        {
            random = _rand.Next(0, 100); // 0..99, upper bound exclusive
        }

        foreach (var range in _cases)
        {
            if (range.From <= random && random < range.To)
            {
                return range.Object;
            }
        }
        return default(T);
    }

    private struct Range
    {
        public T Object { get; set; }
        public int From { get; set; }
        public int To { get; set; }
    }
}

The constructor takes a collection of WeightedRandomDto<T>:

public struct WeightedRandomDto<T>
{
    public WeightedRandomDto(int percentage, T obj)
        : this()
    {
        Percentage = percentage;
        Object = obj;
    }

    public int Percentage { get; private set; }
    public T Object { get; private set; }
}

Here is how you can use it:

var list = new List<WeightedRandomDto<string>>
{
    new WeightedRandomDto<string>(10, "case 1"),
    new WeightedRandomDto<string>(30, "case 2"),
    new WeightedRandomDto<string>(30, "case 4"),
    new WeightedRandomDto<string>(30, "case 5")
};
var weightedRandom = new WeightedRandom<string>(list);
var ret = weightedRandom.GetNext();

Have fun and do remember to share your feedback.
