Smart Databases : RethinkDB

I started playing with RethinkDB and it is very interesting and definitely worth a look.

Let’s start with a typical example from application development: the client table. It can be modified via the UI, Salesforce, by hand, or any number of other ways. Hence an app cannot just read this table once and keep it in cache; we have to constantly poll to pick up changes.

This is very frustrating, because the table rarely changes, but when it does we need to know about the change immediately. So far we have relied either on polling or on the system that makes the change informing the other systems so that they can blow away their local caches.

RethinkDB changes all that.

RethinkDB is like a smart database: your client can listen for changes to a table, and whenever that table changes the database itself notifies every listening client and, on top of that, tells it the old and new values.
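To make the idea concrete without a running database, here is a minimal sketch of the same push model using a plain Clojure atom and `add-watch`. This is not RethinkDB's API: the `clients` atom, the `changes` log and the `:cache-invalidator` key are hypothetical names, and a real changefeed would arrive over the network through the database driver. The point is only the shape of the notification, where the listener is handed both the old and the new value.

```clojure
;; In-memory stand-in for the client table (hypothetical data).
(def clients (atom {1 {:name "Acme"}}))

;; Collects every change notification we receive.
(def changes (atom []))

;; Register a listener: it is called on every change to `clients`
;; with the old and the new state, so there is nothing to poll.
(add-watch clients :cache-invalidator
           (fn [_key _ref old-val new-val]
             (swap! changes conj {:old old-val :new new-val})))

;; Any writer (UI, sync job, manual fix) just updates the table.
(swap! clients assoc-in [1 :name] "Acme Corp")

(println (first @changes))
```

A consumer would typically use the notification to drop or refresh its local cache entry instead of re-reading the whole table.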

This is almost revolutionary. (Yes, I know you can do this in MySQL by installing the JDBC driver and a trigger that listens on the table and informs other apps, but it is way too complicated, and good luck getting your ops team to install it.) RethinkDB does it for you OUT OF THE BOX. And they now have an awesome Java driver.

Just awesome sauce.

Favorite languages, why so great? and why not so much?

About my favorite languages: I actually have two.

  • Ruby: for all scripting and making quick apps.
  • Clojure: for development.

Why Ruby is great

  1. The language was designed for programmers; you can see that from the API, which is totally intuitive.
  2. Lots of libraries. My favorite is Sinatra, which lets you build quick and dirty web apps; the other is Sequel.
  3. I wrote a blog post on how to delete RFC-822 incompatible emails (if you are a developer using Linux and your company uses Outlook, you know what I am talking about). It is a simple example of how I have used Ruby for quick and dirty scripts.

I have used Ruby numerous times to write scripts to fix production data, correct files, and to generate complex reports. I have used Sinatra with Google Charts to make web apps that can show load times, server status ….

Why Ruby is not so great

  1. Not really meant for performance. In recent years there has been a push to develop a virtual machine for Ruby, but it is still nowhere close to C/Java performance.
  2. Rails is a pain to deploy. Heroku takes away the pain, but what do you do if you have to deploy internally? I personally have 2 apps on Heroku, one of which is http://first3links.com/

Why Clojure is great

I have been on a quest to learn a functional programming language for the past 3 years. I have read the Erlang book (please see the various posts I wrote about Erlang here). Erlang is a fine language, but I lost interest in it after I could not find a single good library that can connect Erlang to Oracle. The problem: there are too few third-party libraries. The next language I looked at was Haskell. It has lots of libraries and seems to perform well on the surface; the problem I see is acceptance by business, where most of the code is in Java. Then I found Clojure and fell in love with it.

  1. It is just another DSL for the JVM. If you provide type hints, the generated code will be essentially the same as what Java would produce (so you can easily sneak it in).
  2. Totally embraces the JVM, unlike JRuby.
  3. The author, Rich Hickey, has done a lot to reduce the pain points of Lisp.
  4. Finally a language that frees your mind of OOP. (Have you ever noticed how much time you spend trying to achieve the best object model when a simple one would do? And for what? The customers don’t care as long as it works, and the computers sure don’t care as long as it is 0s and 1s.)
  5. Code is so concise and elegant.
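The type-hint point in item 1 can be sketched as follows. The function names are made up for illustration, and the example uses the `^String` hint syntax of newer Clojure releases (older code, including code on this blog, writes it as `#^String`). Evaluating `(set! *warn-on-reflection* true)` at the REPL first makes the compiler print a warning for the unhinted version.

```clojure
;; Without a hint the compiler does not know the class of `s`,
;; so the `.length` call is resolved by reflection at run time.
(defn slow-length [s]
  (.length s))

;; With the ^String hint the compiler can emit a direct call to
;; String.length, much like the equivalent Java source would.
(defn fast-length [^String s]
  (.length s))

(println (slow-length "clojure") (fast-length "clojure"))
```

Both functions return the same answer; the hint only changes how the call is compiled, which is where the performance difference comes from.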

Why Clojure is not so great

  1. It has been called the language with the steepest learning curve on the JVM, and I tend to agree.
  2. Unlike Scala you have no wiggle room: it is either functional code or nothing (I like this feature, actually).
  3. Debugging is a major pain point (though there has been improvement with the latest swank-clojure).

I have written many posts on Clojure on my blog; you can see them here. In the most recent post I show how one can parse a one million record file in less than 15 seconds with Clojure.

Denormalizing One million records with Clojure.

MovieLens is a research project that provides datasets of various sizes and attributes, containing movie ratings. These datasets are free to download and use for non-commercial purposes. They have done an awesome job putting this data together and a big thanks goes to them for making it available.

I wanted to exercise my Clojure skills (more like add to my tiny set of Clojure skills 🙂) and it just so happens that I recently came across the MovieLens project, so how about analyzing that data using Clojure?

One of the datasets they make available is the One Million Dataset. This set consists of 3 files:

  1. “movies.dat”, containing 3883 movie listings with title, genre…
  2. “users.dat”, containing 6040 unique users with age, occupation, gender…
  3. “ratings.dat”, containing 1000209 movie ratings, each referencing a movie id and a user id from the above 2 files.

I could analyze this data to answer questions such as: What age group gave the most ratings? What was the highest rated movie for a given time period?

But before I could do this I wanted to denormalize the ratings file so that it also contains the user and movie information. Why? Because I don’t want to look it up while I am analyzing the data; each record should be self-contained.

The outline of the program is quite simple

  • Read the users file into memory
  • Read the movies file into memory
  • For each line in the ratings
    • Find the corresponding movie and user
    • Print it out to a file.

Take a minute to think about how you would do this in Java and then look at the code below. I ran it on a dual 2.2GHz Dell laptop with 4 GB of RAM. Care to guess how long it takes? Scroll down for the answer.

(ns com.dev.file-reader
 (:use [clojure.contrib.duck-streams])
 (:import [java.io BufferedReader FileReader BufferedWriter FileWriter]))

(defstruct user :id :gender :age :occupation :zip-code)
(defstruct movie :id :title :genres)

(defn format-user [user] (str (:id user) "::" (:gender user) "::" (:age user) "::" (:occupation user) "::" (:zip-code user)))

(defn format-movie [movie] (str (:id movie) "::" (:title movie) "::" (:genres movie)))

(defn read-user-file [fileName]
 (loop [users {} fileSeq (read-lines fileName)]
   (let [line (first fileSeq)]
     (if (nil? line)
     users
     (let [tokens (.split line "::")
           id (aget tokens 0)
           user (struct user id (aget tokens 1) (aget tokens 2) (aget tokens 3) (aget tokens 4))]
        (recur (merge users {id user}) (rest fileSeq)))))))

(defn read-movies-file [fileName]
 (loop [movies {} fileSeq (read-lines fileName)]
   (let [line (first fileSeq)]
     (if (nil? line)
     movies
     (let [tokens (.split line "::")
           id (aget tokens 0)
           movie (struct movie (Integer/parseInt (aget tokens 0)) (aget tokens 1) (aget tokens 2))]
         (recur (merge movies {id movie}) (rest fileSeq)))))))

(defn convert-ratings-file
 "read the ratings file and denormalize it"
 [moviesF usersF ratingsF outputF]
   (let [movies (read-movies-file moviesF) users (read-user-file usersF)]
     (with-open [#^BufferedReader rdr (BufferedReader. (FileReader. ratingsF) 1048576)
                 #^BufferedWriter wtr (BufferedWriter. (FileWriter. outputF) 1048576)]
       (doseq [line (line-seq rdr)]
         (let [tokens (.split line "::")
               user-id (aget tokens 0)
               movie-id (aget tokens 1)
               user (get users user-id)
               movie (get movies movie-id)
               rating (aget tokens 2)
               timestamp (aget tokens 3)]
           (.write wtr (str (format-user user) "::" (format-movie movie) "::" rating "::" timestamp "\n")))))))

(defn doIt []
 (time (convert-ratings-file
 "movielens-1m/movies.dat"
 "movielens-1m/users.dat"
 "movielens-1m/ratings.dat"
 "movielens-1m/output.dat"
 )))
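For comparison, the read-and-index step and the per-line join can also be written with `reduce` and destructuring instead of an explicit `loop`/`recur`. This is only a sketch, not the code that was timed: it assumes a Clojure version that ships `clojure.string`, the function names are invented for illustration, and it joins the user record only (movies would be indexed and joined the same way). The two input lines are samples in the users.dat and ratings.dat layout.

```clojure
(require '[clojure.string :as str])

(defn parse-user
  "Turns one '::'-separated line of users.dat into a map."
  [line]
  (let [[id gender age occupation zip-code] (str/split line #"::")]
    {:id id :gender gender :age age :occupation occupation :zip-code zip-code}))

(defn index-by-id
  "Builds a map from id to parsed record, one line at a time."
  [parse-fn lines]
  (reduce (fn [index line]
            (let [record (parse-fn line)]
              (assoc index (:id record) record)))
          {}
          lines))

(defn denormalize
  "Expands the user id at the front of a rating line into the full user record."
  [users rating-line]
  (let [[user-id movie-id rating timestamp] (str/split rating-line #"::")
        user (get users user-id)]
    (str/join "::" [(:id user) (:gender user) (:age user) (:occupation user)
                    (:zip-code user) movie-id rating timestamp])))

;; Sample lines in the users.dat / ratings.dat layout.
(def users (index-by-id parse-user ["1::F::1::10::48067"]))
(println (denormalize users "1::1193::5::978300760"))
```

Keeping the parsing, indexing and joining in separate small functions makes each piece easy to try at the REPL before pointing it at the full files.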

So, ready with your guess?
I ran the program 5 times and here is the output

"Elapsed time: 12130.035819 msecs"
"Elapsed time: 13113.92823 msecs"
"Elapsed time: 13364.234216 msecs"
"Elapsed time: 12553.478168 msecs"
"Elapsed time: 14488.706176 msecs"

On average that is about 13.13 seconds to read in 1 million records, look up the movie and user for each record, and write the result back to disk.
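The average can be checked at the REPL from the five timings listed above. The var names here are just for illustration.

```clojure
;; The five elapsed times reported above, in milliseconds.
(def elapsed-ms [12130.035819 13113.92823 13364.234216 12553.478168 14488.706176])

;; Mean in seconds: sum the runs, divide by the count, convert ms to s.
(def mean-seconds (/ (/ (reduce + elapsed-ms) (count elapsed-ms)) 1000.0))

(println mean-seconds)
```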

Clojure puts the FUNctional back in programming.

Redis and Clojure

Check out my previous post about Redis.

In this post I build a very simple example of using Redis with Clojure. I will be using a client library for Redis written in Clojure called redis-clojure. You could also use the Java library; to see the complete list of supported languages go to this link.

So here we go..

  1. Create a simple Clojure project (I personally use Leiningen). To create a new project execute ‘lein new com.dev/try-redis’; this will create an entire project structure.
  2. Edit the project.clj file under the newly created project directory and add a new dependency for redis-clojure. The file should look close to this after you are done.
    (defproject com.dev/try-redis "1.0.0-SNAPSHOT"
      :description "simple example of using redis"
      :dependencies [[org.clojure/clojure "1.1.0"]
                     [org.clojure/clojure-contrib "1.1.0"]
                     [redis-clojure "1.0.3-SNAPSHOT"]]
      :dev-dependencies [[swank-clojure "1.2.1"]])
    
  3. Run ‘lein deps’ so that all the dependencies are downloaded.
  4. Edit the file core.clj under the directory try-redis/src/com/dev/try_redis, and add the following.
    (ns com.dev.try-redis.core
      (:require redis))
    
    (defn test-redis []
         (redis/with-server {:host "127.0.0.1" :port 6379 :db 0}
           (do
             (redis/set "foo" "bar")
             (println (redis/get "foo")))))
    
    

    On lines 7 and 8 we set a key-value pair and retrieve the value.

  5. Start the Redis server: ‘./redis-server redis.conf’
  6. Now we are ready to execute the script. There are 2 ways to do this.
    1. The easiest way is to go to your project root directory and run ‘lein repl’ (see the output below), which opens a read-eval-print loop. Once you have that, run ‘(load-file "src/com/dev/try_redis/core.clj")’ to load the file, and then run ‘(com.dev.try-redis.core/test-redis)’ to run the example.
    2. I personally use Emacs/SLIME, but for this option you need to have Emacs and slime-clojure installed (see my Emacs page). Run ‘lein swank’ in the project directory and then connect to it from Emacs using ‘M-x slime-connect’. This opens a REPL; do a C-c C-k to compile the file, and in the REPL execute ‘(com.dev.try-redis.core/test-redis)’.

If everything has gone well you should see this output.

Clojure 1.1.0
user=> (load-file "src/com/dev/try_redis/core.clj")
#'com.dev.try-redis.core/test-redis
user=> (com.dev.try-redis.core/test-redis)         
bar
nil
user=>

Nice Presentation on F#

If you are studying Clojure, or for that matter Scala or any other functional programming language, I highly recommend that you check out the presentation “Introduction to Microsoft F#”. You can follow along and try out his examples in your own language. The talk is funny and packed with information that applies to any functional language. I was really impressed by the F# IDE; man, I wish Clojure had the same level of IDE support.

Drools Rules!

For anyone contemplating using a Rules Engine, here is some free advice. In my career I have had the chance to use three different Rule Engines (ILOG, QuickRules and JBoss Rules). Of the three I have found JBoss Rules the best. I love it!

JBoss Rules used to be a standalone project called Drools before it got merged into JBoss. The beauty of Drools was that it was simple: no fancy tools or interfaces, just a plain old jar file that you included in your classpath and started using. And best of all it was, and still is, free, so less of a battle with management.

JBoss Rules works with POJOs, integrates well with Spring, and best of all you can learn it quickly. Performance-wise, I don’t remember the exact numbers, but we did have over 100,000 different rules and it would go through them in seconds. There was never a problem on that end.

It has been over a year since I last used it, and at that time it did not have any fancy interface, so we built a rudimentary interface for loading a new set of rules into a running application, and it worked out very well.

All the rules are stored in a place called ‘Production Memory’; I just call it a blueprint. Every time you need to evaluate rules you make an instance of this blueprint, assert the facts into it, get the results and throw away the instance. Creating an instance is very fast and lightweight, and while there are instances floating around in your app you can update the blueprint; the next time an instance is created the updated rules will be used.
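The blueprint-and-instance pattern can be sketched in a few lines of Clojure. This is a toy model and not the JBoss Rules API: `rule-blueprint`, `new-session` and `fire` are hypothetical names, and the rules are plain functions rather than anything a real engine would compile. It only shows why an update to the blueprint affects new instances while leaving existing ones alone.

```clojure
;; The "blueprint": a shared, updatable collection of rules.
;; Each rule maps a fact to a result, or to nil if it does not apply.
(def rule-blueprint
  (atom [(fn [fact] (when (> (:amount fact) 1000) :needs-approval))]))

(defn new-session
  "Takes an immutable snapshot of the rules as they are right now."
  []
  @rule-blueprint)

(defn fire
  "Runs every rule in the session against every fact and returns the results."
  [session facts]
  (for [fact facts
        rule session
        :let [result (rule fact)]
        :when result]
    result))

;; A session created before the update keeps the old rules.
(def old-session (new-session))

;; Hot-swap the blueprint while the application is running.
(swap! rule-blueprint conj
       (fn [fact] (when (= "EU" (:region fact)) :apply-vat)))

(def facts [{:amount 5000 :region "EU"}])
(println (fire old-session facts))
(println (fire (new-session) facts))
```

Because a session is just an immutable snapshot, throwing it away after use costs nothing, and no locking is needed while the blueprint is being replaced.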

JBoss Rules gives you several options for writing rules: you can write them in spreadsheets (also called Decision Tables) or in the provided DSL. Spreadsheets are really good if you have a small number of columns. I’d say as long as they fit your screen you are good; once you have to start scrolling vertically, debugging gets a little difficult.

NOTE: do not let your business people edit the spreadsheets directly. If you have to, give them a website where they can upload and verify them. Regardless of what JBoss says, these Excel sheets follow a strict convention; one minor formatting error and you will be in trouble. I wrote a simple program that loaded these spreadsheets and verified that they worked before doing anything else.

Testing is a must; sorry, but you cannot get away from this. Based on your data (or facts) many different rules can become active, and unless you specify your own conflict resolution strategy the default one is used and you may get some unexpected results. This was probably the most tedious part of using a Rules Engine. Also, things like ‘OR’ and ‘AND’ work differently from what you are used to. The more rules you have, the more testing you will need to do. If there is a wish list for JBoss Rules features somewhere, I would say a rules coverage feature would be nice to have.

I have been told ‘Jess in Action’ is a very good book if you want a good introduction to Rules Engines, and it tells you how to use rules well. For example, one piece of advice I have heard is that you should use the Rules Engine to get a result and then apply that result to your data yourself, rather than letting the engine modify the data.

Anyways that was my brain dump on Rules Engines.

Loving Clojure

I seem to be liking Clojure …

Let me back up a bit. In the last couple of weeks I have been debating between picking up Scala or Clojure (don’t get me wrong, Ruby is still my favorite).

I always wanted to pick up a functional programming language, so I dabbled a bit with Erlang and Haskell. I liked Haskell a lot, but without much practice it kind of died (sad times). Scala seems too much like Java; yeah, I know it seems to have a bigger crowd than Clojure and there are a lot of big names behind it.

Maybe that’s exactly why I chose Clojure (since it’s the underdog), or because it is different enough from Java, or simply because it has a better syntax and seems more elegant (apparently Clojure has better integration with Java, but don’t quote me on it). Anyways, I decided to learn Clojure.

Peepcode has a nice screencast to get you started on Clojure. If you are on the Mac there is a nice bundle for TextMate; anywhere else, NetBeans with the Enclojure plugin seems to be the best.

On a side note, it seems more and more that NetBeans has the latest and greatest plugins for everything, then comes IntelliJ and finally Eclipse. What’s going on with Eclipse? Has it reached its peak, and will it now start dropping off? On the flip side, there seem to be more and more apps built on top of the Eclipse RCP, like XMind, so is Eclipse no longer going to be the leading IDE and just become a platform for building rich client apps? This of course depends on what Oracle is going to do with NetBeans; I really hope they give it the same amount of love as Sun did.

OK, getting back to Clojure: don’t get your panties in a bunch when you see all those parentheses. It is just the layout that is shocking; indent it well and it is no more than what you are used to.

Here’s an example

(defn fac
  "Returns the factorial of n, which must be a positive integer."
  [n]
  (if (= n 1)
    1
    (* n (fac (- n 1)))))

is the same as

(defn fac [n] (if (= n 1) 1 (* n (fac (- n 1)))))

But the first one is a lot easier on the eyes (even the brain?) than the second one. Most examples that you see look like the second one and it frightens people. Don’t let that stop you; take my word and go for it.

Clojure seems to be very easy to pick up and things seem very intuitive. The other day I was wondering how to return a default value from a map if the key is not found, and there it was, right there in the API: call the map itself with the key and the default.

(my-map key default-value)

So simple! I was easily able to extend the examples that came with the peepcode screencast. Anyways I have started on this path, let’s see where it goes.
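The default-value lookup mentioned above looks like this in practice. The `ratings` map is made-up sample data.

```clojure
(def ratings {"Toy Story" 5})

;; A map is also a function of its keys; the optional second
;; argument is returned when the key is missing.
(println (ratings "Toy Story" 0))
(println (ratings "Heat" 0))

;; `get` does the same and also tolerates a nil map.
(println (get ratings "Heat" 0))
(println (get nil "Heat" 0))
```

Calling the map directly is the shortest form; `get` is the safer choice when the map itself might be nil.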

Update 2009/12/03

– Looked at the Clojure source code, and it looks squeaky clean. I applied to become a member so that I can expand the test coverage; hopefully they will accept me.

– It is (load-file "hello.clj") and not load-file "hello.clj". I keep forgetting that, and only realize it after a few minutes.

Language wars are the new IDE wars

(I feel really sorry for managers. Every programmer has his/her favorite programming language and is trying to sneak it into the system. And you know what, it is already in your code base; sorry, but that’s the truth, and whatever side you take, you will end up losing.)

I think people used to have IDE wars because they only had one primary language to work with. Now, with the explosion of languages, almost all of them having some port that runs on the JVM, everyone is either trying to sneak one in or advertising the virtues of using it. And that ultimately results in a passionate email war.

And of course if you have somehow magically gotten past that, there is always the discussion on the best IDE for that language. Hehehe, let the wars continue.

Doing evil things, overriding jar location in Maven

Every once in a while you are stuck in a situation where you just cannot add a jar to the repository but still want to use Maven. There is a workaround. In the old Maven 1.x you had to do this using the project properties; now it is even easier: just add a dependency like the one below and put the jar in the ${basedir}/lib folder. The system scope was created for a totally different purpose, but here we are using it to override the jar location.

<dependency>
  <groupId>pircbot</groupId>
  <artifactId>pircbot</artifactId>
  <version>1.0</version>
  <scope>system</scope>
  <systemPath>${basedir}/lib/pircbot-1.0.jar</systemPath>
</dependency>

JRuby Blues

Man, today was one of those days. I am about to give up on JRuby. Don’t get me wrong, I absolutely love the idea of JRuby; it is a great way to sneak Ruby into the enterprise. But I don’t think it is quite there yet.

If you want to use JRuby and Spring, all you have to do is include these dependencies in your pom file.

<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring</artifactId>
  <version>2.5</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.jruby</groupId>
  <artifactId>jruby-complete</artifactId>
  <version>1.0.3</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>cglib</groupId>
  <artifactId>cglib-nodep</artifactId>
  <version>2.1_3</version>
  <scope>compile</scope>
</dependency>

And it works great… as long as you don’t have to use Hibernate. It just so happens that JRuby uses asm-2.2.3.jar and Hibernate uses asm-1.5.3, and apparently the API is very different between these two versions. The result is

java.lang.NoSuchMethodError: net.sf.cglib.core.Signature.<init>(Ljava/lang/String;Lnet/sf/cglib/asm/Type;[Lnet/sf/cglib/asm/Type;)V

Man, this is frustrating. I spent all day trying to work around the problem, but no go. Now here is the kicker: it works perfectly fine in Eclipse, and I am using the Maven 2 IDE.

I spoke with my colleague (Tim) about this and he thinks it works because of OSGi, which allows different jars to depend on different versions of other jars.