Last.fm to Couchbase exporter

Last week I attended the 33rd Degree Conference in Warsaw. One of my favorite talks was “Discover NoSQL Development with Couchbase 2.0” by Tugdual Grall. Couchbase is an open source NoSQL database.

After Tug’s workshop I started playing around with Couchbase. In this article I will describe my experiments. 

I have written a small application which loads artist information from Last.fm and saves the JSON results in Couchbase. The application starts with an arbitrary artist, loads its information and continues recursively with the artists similar to that one. This process never stops, since every artist on Last.fm has similar artists. To speed things up, loading and saving are done in multiple threads. With 10 threads you can reach up to 25 writes per second with the application, which is not a lot for Couchbase, I guess. The performance bottleneck is querying the Last.fm web service. After running the application for one hour you have thousands of artists in your Couchbase bucket. Run it for longer and it produces a kind of “Big Data”: useful data, not clutter created by a simple loop.

The source code of the application is on GitHub. The repository is: lastfm-exporter.

Prerequisites for running the application are a running Couchbase Server and a Last.fm API account. For Couchbase installation instructions, please consult the documentation: “Chapter 2. Installing and Upgrading”. For the web service itself, see the Last.fm API documentation.

There are only two classes in the application. First I will show both classes en bloc. Below you will find some explanations.

LastfmExporter.java - Main class for initialization and starting the application.

ArtistExportThread.java - Thread to load and save data concurrently

Initialization

The constructor of LastfmExporter initializes three things:

  • Couchbase client – Used to interact with the Couchbase server. You can add multiple URIs to the URI list if you have a Couchbase cluster. See the Java SDK guide for more details.
  • Jersey client – Used to load artist data from the Last.fm web service. Jersey is the open source reference implementation of JAX-RS for building RESTful web services.
  • Thread executor – Used to execute multiple threads at the same time. See this documentation for details.
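As a rough illustration of the first and third item, here is a minimal sketch that uses only the JDK. The class name ExporterSetup and its fields are hypothetical and not taken from the repository; the node address in the usage note is only an assumption about a local installation. The real constructor would hand the URI list to the Couchbase client, which is left out here so the sketch runs without extra libraries.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical holder for what the exporter needs at start-up. */
final class ExporterSetup {
    /** One URI per Couchbase node; several entries for a cluster. */
    final List<URI> couchbaseNodes = new ArrayList<>();
    /** Fixed pool that runs the export tasks concurrently. */
    final ExecutorService executor;

    ExporterSetup(List<String> nodeUrls, int threadCount) {
        for (String url : nodeUrls) {
            couchbaseNodes.add(URI.create(url));
        }
        // Threads are created lazily, when the first tasks are submitted.
        this.executor = Executors.newFixedThreadPool(threadCount);
    }

    /** Stops accepting new tasks; already submitted tasks still finish. */
    void shutdown() {
        executor.shutdown();
    }
}
```

For a single local node this could be created with `new ExporterSetup(Arrays.asList("http://127.0.0.1:8091/pools"), 10)`, matching the 10 threads mentioned above.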

Execution

This is the core of the application:

  • Load similar artists from Last.fm
  • Export the similar artists in multiple threads
  • Call execution recursively for each similar artist

Artist.getSimilar(artistName, key) is a static method provided by the Last.fm API bindings for Java.
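The three steps can be sketched roughly as follows. This is not the code from the repository: the class SimilarArtistCrawler is hypothetical, the similarLookup function stands in for Artist.getSimilar, and exportTask stands in for the work done by ArtistExportThread. Two things are additions of this sketch and are not part of the application described here: a depth limit, so the recursion ends, and a set of already seen artists, so nobody is exported twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Function;

final class SimilarArtistCrawler {
    private final Function<String, List<String>> similarLookup; // stand-in for Artist.getSimilar
    private final Consumer<String> exportTask;                  // stand-in for the export thread
    private final Executor executor;
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    SimilarArtistCrawler(Function<String, List<String>> similarLookup,
                         Consumer<String> exportTask,
                         Executor executor) {
        this.similarLookup = similarLookup;
        this.exportTask = exportTask;
        this.executor = executor;
    }

    void crawl(String artistName, int remainingDepth) {
        if (remainingDepth <= 0) {
            return; // assumption of this sketch: stop after a fixed depth
        }
        // Step 1: load the similar artists, keeping only the ones not seen before.
        List<String> fresh = new ArrayList<>();
        for (String similar : similarLookup.apply(artistName)) {
            if (seen.add(similar)) {
                fresh.add(similar);
            }
        }
        // Step 2: hand each new artist to the executor for export.
        for (String similar : fresh) {
            executor.execute(() -> exportTask.accept(similar));
        }
        // Step 3: continue recursively with each new artist.
        for (String similar : fresh) {
            crawl(similar, remainingDepth - 1);
        }
    }
}
```

Passing `Executors.newFixedThreadPool(10)` as the executor runs the exports concurrently. Passing `Runnable::run` runs them inline, which is handy for trying the traversal without any threads.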

Load artist info from Last.fm

Loading artist info from Last.fm using the Jersey client is straightforward:

  • Build web service URL
  • Create a Jersey WebResource and call the get method

Both methods are part of class ArtistExportThread.java.
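To illustrate the first step, here is a minimal sketch of a URL builder for the artist.getinfo method of the Last.fm web service. The class and method names are hypothetical and may differ from the ones in ArtistExportThread.java; the API key in the usage note is a placeholder. Artist names are URL-encoded because names such as “AC/DC” contain characters that would otherwise break the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class LastfmUrls {
    static final String API_ROOT = "http://ws.audioscrobbler.com/2.0/";

    /** Builds the request URL for one artist, asking for a JSON response. */
    static String artistInfoUrl(String artistName, String apiKey) {
        return API_ROOT
                + "?method=artist.getinfo"
                + "&artist=" + encode(artistName)
                + "&api_key=" + encode(apiKey)
                + "&format=json";
    }

    private static String encode(String value) {
        try {
            // Form-style encoding: spaces become '+', other unsafe characters %XX.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required by the Java platform", e);
        }
    }
}
```

Calling `LastfmUrls.artistInfoUrl("Daft Punk", "YOUR_API_KEY")` yields a URL whose artist parameter reads `Daft+Punk`. With Jersey 1.x, a string like this would typically be passed to the client's `resource(...)` method, followed by `get(String.class)` to read the response body.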

Save JSON in Couchbase

Since Couchbase stores documents in JSON format, you can take the result from the Last.fm web service and put it directly into your Couchbase bucket: very nice! Every entry in a Couchbase bucket needs a key, which is similar to a primary key in an “old school” database.
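The key idea can be sketched as follows. The key scheme (“artist::” plus a normalized name) is purely an assumption for illustration and not necessarily what the application uses, and a plain Map stands in for the Couchbase bucket so the example runs without a server. With the real client, the put would become a set call on the Couchbase client instead.

```java
import java.util.Locale;
import java.util.Map;

final class ArtistDocuments {
    /**
     * Hypothetical key scheme: prefix plus the trimmed, lower-cased name,
     * with runs of whitespace replaced by a single underscore.
     */
    static String keyFor(String artistName) {
        String normalized = artistName.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", "_");
        return "artist::" + normalized;
    }

    /** Stores the raw JSON unchanged under its key; the map stands in for a bucket. */
    static void save(Map<String, String> bucket, String artistName, String json) {
        bucket.put(keyFor(artistName), json);
    }
}
```

Because the key is derived from the name, saving the same artist twice overwrites the earlier document instead of creating a duplicate, much like an upsert on a primary key.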

Next steps

You can now start to query the artist database. See this chapter from the Couchbase Developer Guide to find out more: “Finding Data with Views”.

If you want to find out more about views, I recommend chapter 9 of the Couchbase Server manual: Views and Indexes.

  1. Hello Daniel!

    Great blog post! 25 inserts are really no problem for Couchbase; you should easily reach a “few thousand” per second. As a test, you could collect them all in a collection and then write that out as a “batch”.

    Feel free to get in touch with me any time (e.g. via @daschl on Twitter) if you need anything (I am the release manager of the Couchbase Java SDK)!
