This weekend, we moved our cluster from Rackspace to Amazon.
It was a tense process: we needed to move gigabytes of data seamlessly from one server to another and switch our data-writing server from the old database to the new one, all while losing as little data as possible.
In the end, it took less than two hours, most of which was spent exporting data from the old server and transferring it to the new one.
My main tips for performing such a move:
- Sit down beforehand and write out the process step by step: preparing the new server, exporting and transporting the data, updating DNS services, testing. Discuss and review it with your team.
- Review the process, assess the risks, and plan contingencies. What happens if it takes too long to transport the data? What happens if you have to roll back? What happens if the new server crashes?
- For DNS updates, it is faster and more reliable to add the new IP for your domain to /etc/hosts than to wait for DNS to refresh, which might take time. Don't forget to jot down the IPs of your old servers; you might need them :)
- In our case, we also had some on-premise sensors with no access to DNS, which meant we had to keep both the old and new stacks active, both working with the new database, until we could access those sites and change the destination IP.
- After you've got your plan laid down, do a dry run. There's nothing like a dry run to sort out bugs and add missing steps.
- Be 100% clear about who performs which step and when, but keep one person in charge of the whole process.
- Keep everyone in the loop. The upgrade was performed on Friday night, so we decided to do it from home. We were a team of three, using Google Hangout to stay in touch while everyone was online.
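
To make the /etc/hosts tip concrete, here is a minimal Python sketch of how such an override could be prepared. It is not the script we used; the function name `override_host`, the hostname `db.example.com`, and the documentation-range IPs are all hypothetical. It only builds the new file contents as a string, so you can review the result before writing anything to the real /etc/hosts.

```python
import ipaddress


def override_host(hosts_text: str, hostname: str, new_ip: str) -> str:
    """Return hosts-file text in which `hostname` maps to `new_ip`.

    Existing active entries for the hostname are commented out rather than
    deleted, so the old IP stays on record for a possible rollback.
    Limitation of this sketch: a line listing several names is commented
    out as a whole, so its other aliases would need to be re-added by hand.
    """
    ipaddress.ip_address(new_ip)  # raises ValueError if the IP is malformed

    lines = []
    for line in hosts_text.splitlines():
        # Ignore anything after '#', then split into "<ip> <name> [<alias>...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            lines.append("# " + line)  # keep the stale mapping as a comment
        else:
            lines.append(line)
    lines.append(f"{new_ip}\t{hostname}")
    return "\n".join(lines) + "\n"


# Hypothetical example data (example.com and documentation-range IPs).
current = "127.0.0.1 localhost\n198.51.100.7 db.example.com\n"
print(override_host(current, "db.example.com", "203.0.113.10"))
```

Because the old entry is kept as a comment, rolling back is a matter of swapping which line is commented out.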
In the end, the process went quite well. The Amazon SSD servers are fast and zippy, and we're ready for our next scaling challenge: growing to tables larger than 100 million records while keeping our high performance standards.