/ ubuntu

A CouchDB walkthrough sprinkled with Database Thoughts and Opinions

If you're anything like me, you've come from a RDBMS world and you've begun dabbling in the NoSQL realm. I mean seriously, if google's adopting NoSQL then maybe it's time that we do it too, right?

I've got several previous posts that use MongoDB as the back end. I fell deeply, madly in love with that solution which made my ventures into other such database systems very slow indeed. In fact, every time I came back around to play with CouchDB I got frustrated with their lack of an awesome JavaScript interpreter for the shell and data interactions. Playing with curl is pretty neat-o but there comes to a point where I'd like to just run some cool queries and analytic's on my data without having to create map-reduce documents to do it. (More on this later)

In order to make this cooler, I decided to not use this cool plugin for Google Chrome called Postman and go hardcore and just use the command line for everything. I'm going to go ahead and assume that you've installed or figured out a way to install CouchDB as it is immensely simple.

Ubuntu: sudo apt-get install couchdb
Windows: you know. Windows stuff.

You're also going to need curl...
Ubuntu: sudo apt-get install curl
Windows: You know.. More windows stuff.

Anyway once it's all installed you can use Postman or Curl to perform a GET request on http://your-server-address:5984 (assuming you didn't play with any configuration files). For most of us, this will be http://localhost:5984 and I'm going to use this notation for the rest of the discussions.

$ curl -X GET http://localhost:5984

You should get a response like this:

{
  "couchdb": "Welcome",
  "uuid": "b959c0f17d5530befa418642ec24b128",
  "version": "1.6.1",
  "vendor": {
    "version": "1.6.1",
    "name": "The Apache Software Foundation"
  }
}

Whammo. we're cooking with fire. What do we do now though? This thing is 100 percent controlled by REST Actions! There is a GUI available in the prepackaged versions if you want to use it but what's the use in this? No point creating a database in a GUI when you can create it on the fly from within an app instead.

To get a list of all the databases currently installed you can make a request to:
http://localhost:5984/_all_dbs. For fresh installs you will probably jsut see the _users database (which is for securing and authenticating your system) I have a database in there right now that I don't feel like removing because it's a list of all the projects where my company currently operates and their status'. Pretty neato stuff! It's also the reason why I've decided to learn CouchDB. I want to get an application up and running fast so that we can see what opportunities are sitting out as public knowledge. I digress. My db list looks a little like this:

$ curl -S GET http://localhost:5984/_all_dbs
[
  "_users",
  "projects"
]

We're going to make a chat-like database for instant messaging. This is where my REST understanding took a weird twist. In order to create a new database, you've got to submit a 'PUT' statement to your database server followed by the database name. I naturally assumed that this would be done by a POST verb but alas this was not the case. The logic is probably something along the lines of the fact that you are 'updating' the list of available repositories as opposed to creating a new one. When thinking about it this way it made a little more sense to me.

Let's make a database called 'chatter' by executing the following:

$ curl -X PUT http://localhost:5984/chatter

The server should respond with: {"ok": true}. Re-request the list of databases previously and ensure that you've got your new chatter database added to that list.

[
  "_users",
  "chatter",
  "projects"
]

To delete a database, you simply execute the Delete command in the same format that we issued the Put command. Since there is no info in the chatter database let's remove it just for demonstration and learning purposes:

$ curl -X DELETE http://localhost:5984/chatter

Again we should get a response of {"ok":true}. If this worked for you fantastic! Hit your up arrow until you get our Put command and recreate it as we want to play with that database now.

To get some cool information on the status of your database, you can simply submit a get request to the endpoint. Issue a curl to http://localhost:5984/chatter to see something similar to the following:

$ curl -X GET http://localhost:5984/chatter
{
  "db_name": "chatter",
  "doc_count": 0,
  "doc_del_count": 0,
  "update_seq": 0,
  "purge_seq": 0,
  "compact_running": false,
  "disk_size": 79,
  "data_size": 0,
  "instance_start_time": "1460655353072335",
  "disk_format_version": 6,
  "committed_update_seq": 0
}

Most of the fields in this document are pretty well defined and I'm not going to go into what they all mean. You can see that the database is currently empty as a request to the super endpoint _all_docs will yield an empty response array. Try out curl -X GET http://localhost:5984/chatter/_all_docs and get the following response:

$ curl -X GET http://localhost:5984/chatter/_all_docs
{
  "total_rows": 0,
  "offset": 0,
  "rows": []
}

Now let's say that we want to add an object into our chatter database. In a previous post I had a specific schema for a chat application using Firebase. I'm not going to use the same schema because I have learned CouchDB a little better than Firebase and I know that I can query data with a little more finesse than I could with Firebase.
Anyways bla bla bla here is what I want to store:

{
  "lobby":[String],
  "from":[String],
  "content":[String]
}

Note that a pretty useful item to include in this schema would be a timestamp. It's not really possible to store it in here because we're just going to do this completely sans code and only use rest commands. To create this document we're going to curl it!
Note: Depending on your system you may have to send JSON in a specific way. For windows you need to escape your quotes in the following manner:

$ curl -X POST http://localhost:5984/chatter -H "Content-Type:application/json" -d {"""lobby""":"""guest""","""from""":"""james""","""content""":"""Hello, World!"""}

In Ubuntu you can get away with the following:

$ curl -X POST http://localhost:5984/chatter -H "Content-Type:application/json" -d '{"lobby":"guest","from":"james","content":"Hello, World!"}'
More reasons why linux is better than Windows :-)

You should get a response of a JSON document followed with ok being set to true and with the ID and Revision of the generated document. In my case it looks like this:

{"ok":true,"id":"ed52b492618ae5bb35ba80ed1a000add","rev":"1-794191f12d236795ca94a0f7970d8200"}

Now if I request this resource from the chatter endpoint you will retreive the JSON document we just stored! curl -X GET http://localhost:5984/chatter/{theId}
Yields

$ curl -X GET http://localhost:5984/chatter/ed52b492618ae5bb35ba80ed1a000add
{
  "_id": "ed52b492618ae5bb35ba80ed1a000add",
  "_rev": "1-794191f12d236795ca94a0f7970d8200",
  "lobby": "guest",
  "from": "james",
  "content": "Hello, World!"
}

In order to perform updates in CouchDB, you must provide the document revision along with any of the other attributes you want to change. This is a full, in-place update of the entire document. This is why schema design in NoSql solutions is so important. Consider our design for a moment. It's pretty simple and gives us some pretty straight forward information. What if we wanted to link the users to a registered user? Do we store all of their information in the post? Should we have embedded the conversations into a top-level lobby object? Will we want there to be a 'like' feature or a 'comment on' feature? If a document gets very large (4 gb is the default in CouchDB) will we really want to be moving an entire 4 gb document over the wire whenever we want to update something? This is always my debate when working with NoSQL Solutions. You really need to determine what parts of your data stay static and what parts are updated over and over again. Sending all the chat conversations for a lobby over the wire every time someone writes to it is not my definition of efficient but who knows, maybe there is some sort of a use-case for it.

Diatribe aside, let's update our document. Turns out that I'm going to abuse my admin rules and move this message from the guest lobby to the 'scary' lobby and instead of greeting the world this James character is going to greet space.

$ curl -X PUT http://localhost:5984/chatter/ed52b492618ae5bb35ba80ed1a000add -H "Content-Type:application/json" -d {"""_id""":"""ed52b492618ae5bb35ba80ed1a000add""","""_rev""": """1-794191f12d236795ca94a0f7970d8200""","""lobby""":"""scary""","""from""": """James""","""content""":"""Hello, Space!"""}

You will get some feedback with a new OK and Revision starting with "2-".

{"ok":true,"id":"ed52b492618ae5bb35ba80ed1a000add","rev":"2-a825fa840906a22990bb2a9e41f6abf9"}

If you go right ahead now and re-issue that same request you're going to get an error:

{"error":"conflict","reason":"Document update conflict."}

This is how CouchDB enforces currency control. You can think of it like a document-level lock where if user A and user B read a document at the same time, and user A updates before user B, user B has to lose their work and reload the document in order to make any adjustments. This is another thing to consider when doing schema designs. If you have a big document of data that changes and is touched by multiple people then you could run into these update conflicts very often and potentially have a disastrous user experience for your stakeholders.

We can now go a head and add some more random data into our fake blog database so that we can learn how to properly use views! I'm going to add another fie or so records some to the 'guest' lobby and some to a new lobby called 'Adele Fans' because she is a delightful artist.

//Intermission so you can enter some fun data that you will play with

I added 6 more documents. When I curl my database again (curl -X GET localhost:5984/chatter) I get some new metrics showing disk usage, that I have seven documents in there, etc.

To get a list of all the documents you can add the /_all_docs to the end of the URL and you'll get a big list of items!

$ curl -X GET http://localhost:5984/chatter/_all_docs
{
  "total_rows": 7,
  "offset": 0,
  "rows": [
    {
      "id": "5f9458368400e230b69dbb12ae000811",
      "key": "5f9458368400e230b69dbb12ae000811",
      "value": {
        "rev": "1-b03ecef4133720644cd028dd3fb8239c"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae000b7b",
      "key": "5f9458368400e230b69dbb12ae000b7b",
      "value": {
        "rev": "1-1de50cc356d9eb219bb8b24b23ee1da3"
      }
    },
    ...
  ]
}

That's probably something a RDMS user isn't used to. You'd expect to get some actual data from this! If you request an item by it's id then it returns as expected but let's say that you're interested in getting a list of some small sized documents and that is good for latency. Simply append the parameter ?include_docs=true and it will do exactly as it states. include the doc!

$ curl -X GET http://localhost:5984/chatter/_all_docs?include_docs=true
{
  "total_rows": 7,
  "offset": 0,
  "rows": [
    {
      "id": "5f9458368400e230b69dbb12ae000811",
      "key": "5f9458368400e230b69dbb12ae000811",
      "value": {
        "rev": "1-b03ecef4133720644cd028dd3fb8239c"
      },
      "doc": {
        "_id": "5f9458368400e230b69dbb12ae000811",
        "_rev": "1-b03ecef4133720644cd028dd3fb8239c",
        "lobby": "guest",
        "from": "james",
        "content": "hello, world!"
      }
    },
    ...
  ]
}

There you have it! You have the documents included in your array. However I did state earlier that I have conversations going in two lobbies and I'm only interested in what's happening in the Adele Fans channel. We can accomplish this using something called design documents and map-reduce functionality.

You can make these however you want! It's more things that you need to take into account when making a CouchDB System. Since our app is so small we are going to make a design document called _design/lobby. The only restriction is that a design document needs to have the _design flag in front of it.
Here is how my design document is going to look.

{
  "_id":"_design/lobby",
  "language":"javascript",
  "views":{
    "lobbies":{
      "map":"function(doc){emit(doc.lobby, null)}",
      "reduce":"_count"
    },
    "messages":{
      "map":"function(doc){emit(doc.lobby, doc)}"
    }
  }
}

We are going to create this as by posting it to our chatter database again! Same format as before

$ curl -X POST http://localhost:5984/chatter -H "Content-Type:application/json" -d '{"_id":"_design/lobby","language":"javascript","views":{"lobbies":{"map":"function(doc){emit(doc.lobby, null)}","reduce":"_count"},"messages":{"map":"function(doc){emit(doc.lobby, doc)}"}}}'

Assuming that you got your OK Message, we're good to start querying! Lets first explain what is happening here. The views are our declared map-reduce functions written in Javascript. They are just simple functions that you can throw some logic into in order to extract only the data that you are looking for. The first "lobbies" view is emitting a key-value store with a lobby name and then reducing it into a count of lobbies. Since we are only storing messages in our schema, we can assume that the count is the number of messages in the channel. First we MAP our output then we REDUCE it using a built in CouchDB Function (_count).

Let's run it!

$ curl -X GET localhost:5984/chatter/_design/lobby/_view/lobbies
{
  "rows": [
    {
      "key": null,
      "value": 7
    }
  ]
}

Woah, what?
That's not expected! Well not really. We emitted some documents based on whatever criteria and then reduced it into a total count of documents. We have 7 documents in our database right now. We can group the results by the emitted key (doc.lobby) by passing the parameter ?group=true after the previous request.

$ curl -X GET localhost:5984/chatter/_design/lobby/_view/lobbies?group=true
{
  "rows": [
    {
      "key": "adele fans",
      "value": 4
    },
    {
      "key": "guest",
      "value": 3
    }
  ]
}

This looks a little better!
Now we know the name of the lobbies and want to get some data out of them. Let's look for the adele fans channel using our second design document

$ curl -X GET localhost:5984/chatter/_design/lobby/_view/messages
{
  "total_rows": 7,
  "offset": 0,
  "rows": [
    {
      "id": "5f9458368400e230b69dbb12ae000b7b",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae000b7b",
        "_rev": "1-1de50cc356d9eb219bb8b24b23ee1da3",
        "lobby": "adele fans",
        "from": "adele",
        "content": "hello, its me"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae0018da",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae0018da",
        "_rev": "1-69b9163732528fd84155e1a4e71d4207",
        "lobby": "adele fans",
        "from": "adele",
        "content": "I was wondering if after all these years youd like to meet"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae002063",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae002063",
        "_rev": "1-54bf585810504f64208ae8e143b97c9c",
        "lobby": "adele fans",
        "from": "adele",
        "content": "To go over everything"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae002aed",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae002aed",
        "_rev": "1-fe4e28c815eff064ea71141cf00b7556",
        "lobby": "adele fans",
        "from": "adele",
        "content": "They say that times supposed to heal ya but I aint done much healing"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae000811",
      "key": "guest",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae000811",
        "_rev": "1-b03ecef4133720644cd028dd3fb8239c",
        "lobby": "guest",
        "from": "james",
        "content": "hello, world!"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae003247",
      "key": "guest",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae003247",
        "_rev": "1-4f2a8b7ae32f6d0cf0e864bb84677351",
        "lobby": "guest",
        "from": "James",
        "content": "Go check out the Adele Fans Lobby"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae003642",
      "key": "guest",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae003642",
        "_rev": "1-b757479d0be649406309b9de62da62a5",
        "lobby": "guest",
        "from": "James",
        "content": "Adele is in there"
      }
    }
  ]
}

We can see what we expected. We emitted the lobby name as the key and then emitted the document. You can see here though that we have more documents than expected for just the adele fans channel. We can restrict our query based on the keys we emitted during the map function by using the key=adele%20fans parameter in our curl statement. Curl likes to parse out quotes so we have to do some fun wrapping in her in order to pass a proper query string. --FYI: %20 is the encoding for a single space. In case you didn't know

$ curl -X GET 'localhost:5984/chatter/_design/lobby/_view/messages?key="adele%20fans"'
...
{
  "total_rows": 7,
  "offset": 0,
  "rows": [
    {
      "id": "5f9458368400e230b69dbb12ae000b7b",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae000b7b",
        "_rev": "1-1de50cc356d9eb219bb8b24b23ee1da3",
        "lobby": "adele fans",
        "from": "adele",
        "content": "hello, its me"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae0018da",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae0018da",
        "_rev": "1-69b9163732528fd84155e1a4e71d4207",
        "lobby": "adele fans",
        "from": "adele",
        "content": "I was wondering if after all these years youd like to meet"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae002063",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae002063",
        "_rev": "1-54bf585810504f64208ae8e143b97c9c",
        "lobby": "adele fans",
        "from": "adele",
        "content": "To go over everything"
      }
    },
    {
      "id": "5f9458368400e230b69dbb12ae002aed",
      "key": "adele fans",
      "value": {
        "_id": "5f9458368400e230b69dbb12ae002aed",
        "_rev": "1-fe4e28c815eff064ea71141cf00b7556",
        "lobby": "adele fans",
        "from": "adele",
        "content": "They say that times supposed to heal ya but I aint done much healing"
      }
    }
  ]
}

There we go! we can see what happened in the Adele Channel, all through the command line using only web requests. Pretty neat hey? This has been a pretty naive implementation of the CouchDB Functionality. Feel free to check out the docs and play around to really unlock the full potential of this system.

Closing Thoughts

You get some really mixed reviews when reading up on NoSQL Solutions on the internet. I'm here to tell you that most of it is straight opinionated trash. I don't use a screwdriver to hammer a nail and I don't use a sponge to chop firewood. With enough dedication you probably could probably make it work but it wont be the most optimal way to perform the work. This is why it's really important to understand the requirements of your application before jumping into a technology stack. I got super excited over MongoDB when I first tried it and quickly learned it was NOT useful for the application I was building leading to a lot of re-work before it was put into production. If you have an app with a lot of static data that you don't need to really relate back to anything then NoSQL is probably for you. If you've got to maintain relationships between a bunch of different objects over a lot of different documents with different structures, you should probably stick with a relational model.
I currently have two applications running in production at my company that leverage NoSQL solutions. One is a full MEAN stack implementation and is running with no issues with high performance and with very satisfied stakeholders. Data gets created and read and updated very rarely. Perfect for a NoSQL Database. One application uses CouchDB for a portion that demands project-to-project customization for certain aspects and hooks are written to keep the important corporately-mandated required data synchronized with our relational model. This allows flexibility for the users who are immersed in detail and gives us the ability to perform our analytics and typical business requirements at the same time.

Enjoy learning and never be afraid to try something new!

Cheers, friends and keep hacking away!

References: