I do stuff over the interwebs: March 2014

Sunday, March 30, 2014

Mysql to MongoDb, chapter 2: Diving in

[This is a second part in an ongoing series, part 1 is here)

So, after making the call to go mongo, doing a mongo 101 crash course, we've started working on two major fronts (Only two developers):
1. DB Layer rewrite - This was pretty much straight forward. We've had about 100 functions to rewrite, but a lot of them were simple CRUD function. We've decided to use the native mongo node drivers, as I don't like to use frameworks in my code. (There's mongoose, which is a nice ODM layer, but, as I've said, I'd rather use native stuff, unless there's a performance advantage there.)
Major points you need to consider when transforming your code:

- Mysql has auto count features for unique inserts in table. Mongo has a unique id per object in database. If you're not using one, single unique id (And you're not) for each record in your mysql database, you need to use some sort of an applicative counter solution for inserts in mongo. This is also very useful to return insert id for new entities.

- Type checks and conversion: Rather than using a framework, I've decided to implement a simple hash table for field names and types.

- Logging: Like printing mysql statements to log, write a function which logs your query objects in mongo native format. (Like db.users.find({name:"yuval"}). It makes it much easier to debug.

THIS ONE IS REALLY IMPORTANT:

Do not break mysql support in order to support mongo! Fix both the mysql and mongo db layers, not in your application layer! Make it work, and don't do irreversible things that would brake mysql support. Support switching form mysql to mongo in a single configuration flag, so you can compare performance.

2. Data conversion - We used mongify, a neat ruby tool, which translates sql databases into mongo. Performance was a bit dodgy for huge tables, se we've contributed some code, which also upped the performance by ~20 times.

Important note for people using open source software - Don't just report bugs. You can fix stuff and contribute to the community.

[Especially if you're using it for commercial purposes]

Some things we've encountered during our conversion process:
- Dry run your conversion process. Dump your mysql, reload into a vanilla server with mongo installed, and do the dry runs from that server. The operational conversion is something you only need to do once, so it's ok to leave things for manual tinkering later!

- You'll see your application is working slower. Don't worry about it. There's a lot of tuning to do.

- Indexing: You need to take good care of this. Use explain({verbose:1}) for your big queries in order to find out why. Indexing in mongo will solve a lot of your performance problems.

- Large sorts won't work, even with indexes. In our case, it was a sort for an set of more than 130000 records. Instead of implementing paging, we've moved the sorting to the application (Works really fast, thank you). We will need to implement paging eventually, because we've just postponed the inevitable...

- Uniqueness: Like mysql, mongo has an ensureUnique method on index creation. We decided to add indexes manually and not automatically.

The next chapter will deal with more sophisticated tuning methods post conversion. Stay tuned :)

Thursday, March 27, 2014

From mysql to mongo in less than a week, chapter one

This week, we finally took the plunge at Lightapp, and migrated our database into mongo.

In the coming weeks I'll be writing a series of posts about the experience, along with some insights and tips, and (Of course) the end result, in numbers.

The first step was deciding we need mongo instead of mysql. This one was pretty simple. Our company's product reads data from numerous sources, then aligns them and analyses them.
Since we're using big EAV tables for storing all that data, mongodb is a far more suitable solution than mysql.
The stage was right, as our database only contains tens of millions of records, right before we migrate all our current customers (Which will scale it towards hundreds of millions). So, with the help of my good friend (And scale pro) Moshe Kaplan, we started the process.

To make things easier, we also decided on a '1 to 1' conversion. This means that we only switch our db layer, and do it without touching our application. Luckily we've designed our application just like that - Only 1 module was responsible for db connection and querying, thus it was simple to write a parallel mongo db module, and switch between the two seamlessly.

So the first major tip I recommend:
Build your db layer as a 'replaceable' layer. This means no SQL queries or DB connections outside your layer. If you currently have a written application with lots of 'history', just search for all the queries, and replace them with function calls (And filter parameters) into one file.
You might even find that you can get rid of some duplicated code in the process :)

Second tip:
Fiddle with mongo a bit. You can use a tool like pentaho to pull data from your mysql db, and into mongo, and then login to mongo, do some queries - Learn the whole query and CRUD mechanism in mongo. It's pretty simple, and has lots of great documentation all over the web.

Third tip:
Don't expect miracles. As we only switched our query functionality at first stage, we were not expecting a huge performance gain. Optimisation takes time, and the big value we were expecting from mongo was by moving our processing engine (Data bucketing and aggregation) lower from our application layer into the db layer.

Next thing was to approach the conversion process itself, but this would be elaborated in the next post of the series.

Stay tuned for more :)

[edit:] continue to chapter 2.

Wednesday, March 19, 2014

My take on Logging Levels

A long time ago, while developing a realtime server for Verint, the system's architect devised a great document called 'logging policy'.

It explained how a log message should look like, which information they should contain, and elaborated the rules for each log message level: For example: "There is no such thing as a 'Good' warning. Do NOT put 'Server is now up' messages in warning level".

Back then, the common practice ranged more than five levels, from critical to verbose.

At least at the beginning of development, I found that, though it's nice to do so, right now the most effective number of logging levels is 2. Debug mode and production mode.
Production mode should contain high level log events (Flow events, processing metrics) and errors, debug - the rest.

Logging is one of the strongest maintainability tools for your server. However, keep it simple, or you'll drown in misleading information.