2016-12-02

Friday Links 0.0.20 - MongoDB

This is based on an email I sent to my .NET team at work.

Happy Friday,

I’ve been exploring MongoDB because we were supposed to be starting an engagement with a client that is pretty heavily invested in it. While that project was going to be primarily Node.js, I also wanted to explore how well the ecosystem works with .NET. In my experience, C# and MongoDB worked pretty well together.

MongoDB is an open source document database. A document database essentially stores blobs of JSON and lets you retrieve them by key. Most also provide powerful querying, filtering, and aggregation functions, and you can create indexes on document properties to speed up searches. MongoDB supports clustered, distributed installations in which your data is spread across, and replicated over, multiple machines.
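To make the “blobs of JSON retrieved by a key, plus indexes” idea concrete, here is a minimal sketch in Python of a toy in-memory collection. It is not how MongoDB is implemented; `ToyDocumentStore`, its methods, and the sample fields are hypothetical. It assumes indexed values are hashable and that each key is inserted only once.

```python
from collections import defaultdict


class ToyDocumentStore:
    """Hypothetical in-memory stand-in for a single document collection."""

    def __init__(self, indexed_field):
        self.docs = {}                   # key -> whole document (a dict)
        self.indexed_field = indexed_field
        self.index = defaultdict(set)    # indexed value -> set of keys

    def insert(self, key, doc):
        # Insert-only for brevity: overwriting a key would leave a stale index entry.
        # No schema is enforced; each document can have a different shape.
        self.docs[key] = doc
        if self.indexed_field in doc:
            self.index[doc[self.indexed_field]].add(key)

    def get(self, key):
        # Primary access path: fetch a whole document by its key.
        return self.docs.get(key)

    def find(self, value):
        # Use the index instead of scanning every document.
        return [self.docs[k] for k in sorted(self.index.get(value, ()))]


store = ToyDocumentStore(indexed_field="author")
store.insert("level-1", {"author": "alice", "title": "Warm-up", "grid": [[0, 1], [1, 0]]})
store.insert("level-2", {"author": "bob", "title": "Maze", "par": 12})
store.insert("level-3", {"author": "alice", "title": "Spiral"})

print(store.get("level-2")["title"])                # Maze
print([d["title"] for d in store.find("alice")])    # ['Warm-up', 'Spiral']
```

The three documents have different shapes; nothing forces them to share fields, which is both the flexibility and the risk of a document store.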

What is a document database good for?

Document databases really shine when you have unstructured data. Storing user-created content, raw requests and responses, and other content whose shape you don’t know beforehand can be a good use case.

For example, I run an online puzzle game with user-created levels. I’m currently storing those levels as JSON text in a SQL Server column. But since the levels are the central domain concept of the game, it would probably have made more sense to use a document-oriented database instead.

Why should I learn anything about this?

I think it’s helpful to have some experience with the trade-offs involved in using a document database instead of a relational database like SQL Server. As .NET developers, we tend to be pretty familiar with the process of cramming a business’s data model into tables, rows, and columns. But it might not be the best fit for every use case (though I think it works out well far more often than document database proponents admit).

Keep in mind that Microsoft also provides an unimaginatively named “DocumentDB” service in Azure, which offers an API that is fairly compatible with MongoDB’s. Learning to think like a document database using Mongo will also help if clients start asking for Microsoft’s option.

I think it’s good to have introductory experience with a variety of competing technical philosophies, so that you have the ammunition to tackle problems in the best way. You can always dive deeper when the need arises.

Also, Sitecore now ships with MongoDB for its analytics tracking. If we work on a Sitecore project, it might help to know a little bit about it.

Mongo C# Driver

https://github.com/mongodb/mongo-csharp-driver

One of the great things is that the company behind MongoDB maintains an official .NET driver, rather than leaving it entirely to the community. This means we have easy access to a well-supported, fully featured C# client.

Check out the GitHub repository for some example queries. The driver provides an untyped API (accessing fields by string names) as well as support for mapping documents to strongly typed classes. You can build queries manually or use a LINQ provider that works a lot like Entity Framework, converting your LINQ expressions into MongoDB’s underlying query language.
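MongoDB’s query language is itself made of documents, so a LINQ expression like `l => l.Author == "alice" && l.Par < 15` ends up as roughly a filter such as `{"author": "alice", "par": {"$lt": 15}}` (the exact field names depend on how your classes are mapped). The sketch below is a hypothetical Python evaluator for a handful of real comparison operators (`$eq`, `$gt`, `$gte`, `$lt`, `$lte`), followed by the untyped-versus-typed distinction using a `dataclass`. It is not the C# driver’s API, and it ignores nested paths, arrays, and MongoDB’s own handling of missing fields.

```python
import operator
from dataclasses import dataclass

# A few of MongoDB's comparison operators, mapped to Python equivalents.
COMPARISONS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, filter_doc):
    """Toy evaluator: top-level fields only, no arrays, no nested paths."""
    for field, condition in filter_doc.items():
        if field not in doc:
            return False  # simplification: a missing field never matches here
        value = doc[field]
        if isinstance(condition, dict):
            # {"par": {"$lt": 15}}: every operator must hold.
            if not all(COMPARISONS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # {"author": "alice"}: a plain value means equality.
            return False
    return True


@dataclass
class Level:
    author: str
    title: str
    par: int


docs = [
    {"author": "alice", "title": "Warm-up", "par": 8},
    {"author": "bob", "title": "Maze", "par": 12},
    {"author": "alice", "title": "Spiral", "par": 20},
]

query = {"author": "alice", "par": {"$lt": 15}}

# Untyped: raw dictionaries addressed by string field names.
raw = [d for d in docs if matches(d, query)]

# Typed: map each match onto a class, similar in spirit to the driver's class mapping.
typed = [Level(**d) for d in raw]
print(typed)  # [Level(author='alice', title='Warm-up', par=8)]
```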

Documentation on data modelling concepts

If you’re storing financial data, know that MongoDB stores numbers as floating point by default. Floating point numbers can’t exactly represent some values (0.1, for example), and when those values appear in your system, error can accumulate in your calculations. MongoDB has some guidance on modelling money, but basically you do the same thing you would do in JavaScript: pretend everything is an integer, for example by storing cents instead of dollars.

https://docs.mongodb.com/v3.2/tutorial/model-monetary-data/
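A quick Python demonstration of the problem and of the scaled-integer workaround. The scale of 100 (cents) and the `to_cents` helper are assumptions for illustration; pick the scale your currency and precision requirements actually need.

```python
from decimal import Decimal

# Binary floating point can't represent 0.10 exactly, so repeated
# addition drifts away from the "obvious" answer.
total = 0.0
for _ in range(10):
    total += 0.10
print(total)          # 0.9999999999999999
print(total == 1.0)   # False

# Scaled integers: store every amount as a whole number of cents.
SCALE = 100  # assumption: two decimal places is enough for this currency


def to_cents(amount: str) -> int:
    # Parse through Decimal so the amount never passes through a float.
    # Digits beyond the scale are truncated; a real system would pick a rounding rule.
    return int(Decimal(amount) * SCALE)


cents = sum(to_cents("0.10") for _ in range(10))
print(cents)                      # 100
print(cents == to_cents("1.00"))  # True
```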

You also have a few options for modelling relationships between entities. You can either embed related entities within a single document (a nested structure) or store them in separate collections and use references between documents.
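The two options, sketched with plain Python dictionaries standing in for documents. The order/line-item shape and the `load_order` helper are hypothetical; the point is that both layouts describe the same data, but the referenced layout needs an extra lookup (a join you perform yourself) to reassemble it.

```python
# Embedded: the order carries its line items inside one document,
# so a single lookup returns everything.
embedded_order = {
    "_id": "order-1",
    "customer": "alice",
    "items": [
        {"sku": "widget", "qty": 2},
        {"sku": "gadget", "qty": 1},
    ],
}

# Referenced: line items live in their own collection and point back
# at the order by ID, so reading a full order takes a second lookup.
orders = {"order-1": {"_id": "order-1", "customer": "alice"}}
line_items = [
    {"_id": "li-1", "order_id": "order-1", "sku": "widget", "qty": 2},
    {"_id": "li-2", "order_id": "order-1", "sku": "gadget", "qty": 1},
]


def load_order(order_id):
    # The application performs the "join" by following the references.
    order = dict(orders[order_id])
    order["items"] = [
        {"sku": li["sku"], "qty": li["qty"]}
        for li in line_items
        if li["order_id"] == order_id
    ]
    return order


print(load_order("order-1") == embedded_order)  # True
```

Embedding suits data that is read together and owned by a single parent; references make more sense when the related entities are shared, large, or updated independently.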

Downsides

I’d be remiss if I didn’t share a couple of widely publicized articles about problems users have had with MongoDB.

Call Me Maybe: MongoDB Stale Reads

This article outlines a case where MongoDB would acknowledge writes before they were fully saved. Mongo has a few different “Write Concerns” to describe how concerned (get it?) you are that the data you asked it to save really was saved.

Acknowledging a write before it is fully in place is common in most databases. To improve performance, they don’t always write a change to disk immediately. Instead, they tend to write it to a journal; a different thread then picks up the journal entries and applies them to the database file, possibly moving other entries around and updating the relevant indexes.
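A minimal, single-threaded Python model of the journal idea: `write` acknowledges as soon as the entry is in the journal, and a separate `apply_journal` step moves entries into the data and updates an index. The class and its methods are hypothetical and simplify away durability, concurrency, and crash recovery; the sketch only shows why an acknowledged write may not yet be visible in the main data.

```python
from collections import defaultdict, deque


class ToyJournaledStore:
    """Hypothetical model: writes are acknowledged once journaled, applied later."""

    def __init__(self):
        self.journal = deque()         # pending (key, value) entries in write order
        self.data = {}                 # stand-in for the main data file
        self.index = defaultdict(set)  # value -> keys holding that value

    def write(self, key, value):
        # Fast path: append to the journal and acknowledge right away.
        # A real journal is an on-disk log replayed after a crash; this one is in memory.
        self.journal.append((key, value))
        return "acknowledged"

    def apply_journal(self):
        # In a real engine a background thread does this; here it is called explicitly.
        while self.journal:
            key, value = self.journal.popleft()
            if key in self.data:
                self.index[self.data[key]].discard(key)  # drop the stale index entry
            self.data[key] = value
            self.index[value].add(key)


jstore = ToyJournaledStore()
print(jstore.write("door", "red"))  # acknowledged
print(jstore.data.get("door"))      # None: journaled but not applied yet
jstore.write("door", "blue")
jstore.apply_journal()
print(jstore.data["door"])          # blue
```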

In a distributed environment, you can also have a write concern that says “Don’t tell me this document is saved until you’ve copied it to a replica and the replica confirmed it”. This article describes some cases where that write concern was not working. It also provides a lot of good background information on how distributed database systems work.
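MongoDB’s write concern is expressed as options such as `w` (a number of members, or `"majority"`) and `j` (wait for the journal). The toy model below, written in Python, shows only the counting logic behind `w`: how many members must confirm a write before it is acknowledged. `ToyReplicaSet` is hypothetical and treats replication as instant, unlike real asynchronous replication.

```python
class ToyReplicaSet:
    """Hypothetical model of how a write concern decides when to acknowledge."""

    def __init__(self, members_up):
        # members_up[0] is the primary; the rest are secondaries. True = reachable.
        self.members_up = members_up
        self.copies = [[] for _ in members_up]

    def write(self, doc, w=1):
        if not self.members_up[0]:
            raise RuntimeError("no primary available")
        confirmed = 0
        for i, up in enumerate(self.members_up):
            if up:  # every reachable member stores a copy and confirms it
                self.copies[i].append(doc)
                confirmed += 1
        required = len(self.members_up) // 2 + 1 if w == "majority" else w
        # MongoDB would keep waiting (or, with a wtimeout set, report a write concern
        # error) and would not undo the copies already made; this sketch returns False.
        return confirmed >= required


rs = ToyReplicaSet([True, False, False])  # primary up, both secondaries down
print(rs.write({"x": 1}, w=1))            # True: the primary alone satisfies w=1
print(rs.write({"x": 2}, w="majority"))   # False: 1 of 3 members is not a majority
```

The first write shows the risk: it was acknowledged while only the primary had it, so if that machine failed before the secondaries caught up, the write could be lost.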

Keep in mind this was from a few years ago, and may have been addressed since then. I also think it was a little blown out of proportion. Reading it a few years ago put it into my mind that “Mongo will just drop your writes without telling you and it’s completely untrustworthy”. I developed a prejudice against it and never even gave it a chance. But I’ve enjoyed playing with it this week and wish I had tried it out earlier.

Why You Should Never Use MongoDB

I think this title is over the top, and the story isn’t even specific to MongoDB. It describes a case in which something that sounded like a good fit for a document database (social data) turned out to be relational data that should have gone into a SQL database. It’s not so much “MongoDB sucks” as “We should have used a relational database”.

It’s a great story that demonstrates some of the trade-offs. The author’s core argument at the end is that the range of data that is a good fit for MongoDB or other document databases is rather narrow.

I learned something from that experience: MongoDB’s ideal use case is even narrower than our television data. The only thing it’s good at is storing arbitrary pieces of JSON. “Arbitrary,” in this context, means that you don’t care at all what’s inside that JSON. You don’t even look. There is no schema, not even an implicit schema, as there was in our TV show data. Each document is just a blob whose interior you make absolutely no assumptions about.

I think she’s mostly right, but there are still some use cases where you know the structure of some parts of the document but not its entire shape, i.e., it’s not completely arbitrary.
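A sketch of that middle ground in Python: check only the fields the application relies on and treat the rest of the document as an opaque blob. `KNOWN_FIELDS`, `check_known_fields`, and the sample level are hypothetical; this is application-side checking, not a MongoDB feature.

```python
# Hypothetical: the fields we rely on, and their expected Python types.
KNOWN_FIELDS = {"author": str, "created": str}


def check_known_fields(doc):
    """Return problems with the fields we rely on; ignore everything else."""
    problems = []
    for name, expected in KNOWN_FIELDS.items():
        if name not in doc:
            problems.append(f"missing {name}")
        elif not isinstance(doc[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


level = {
    "author": "alice",
    "created": "2016-12-02",
    # Free-form, user-defined content we never inspect:
    "board": {"width": 8, "tiles": ["wall", "gem", "exit"]},
}
print(check_known_fields(level))            # []
print(check_known_fields({"author": 42}))   # ['author should be str', 'missing created']
```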

Matt Burke