Thoughts After Playing with Amazon EC2 Container Service
Docker has been getting more and more popular over the past year. Despite the love-hate reactions and the accompanying security vulnerabilities, the rise of immutable microservices seems unstoppable, and Docker is definitely one of the painkillers on the market. Amazon launched EC2 Container Service at the end of last year to provide a solution, similar to the open-source Shipyard, for managing a cluster of Docker hosts. At the very beginning it could only be accessed through the API, but now it has its own dashboard in the AWS Console.
However, after a few days of trial, I am still quite unsatisfied with it. It is usable, but full of gotchas that get on your nerves. A few of my complaints:
- By default, after you click through all of the “Get Started” buttons, it creates a brand-new AWS CloudFormation stack that launches a whole new VPC and puts the Docker hosts in a public subnet. That is undesirable from a security standpoint, and it ignores the possibility that you already have a VPC in service. You have to work around this default: launch an ECS-optimized Amazon Linux image from the “EC2” dashboard rather than the “ECS” console, then create a new cluster in EC2 Container Service so that the instances you launched in your existing VPC can be registered to it (see the sketch after this list). This is very unintuitive.
- It still lacks integration with OpsWorks: you cannot launch the ECS-optimized Amazon Linux image from OpsWorks. This greatly reduces the accessibility of EC2 Container Service; after all, without that integration you don't get all the monitoring OpsWorks provides for free.
- It is possible to pull private images from Docker Hub, or from a Docker registry you host yourself, but you have to log in to each instance and change the agent settings to do so. That is a pain when you manage a lot of machines; I wish there were a tab in the console where I could change it once and have it take effect automatically (a sketch of the per-instance setting follows this list).
- No error messages from the agent are surfaced in the console, so you cannot tell why a container failed to run. In OpsWorks you at least get the logs generated by the Chef recipes.
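For reference, the cluster workaround above comes down to something roughly like this (the cluster name my-cluster is a placeholder; the instance must run the ECS-optimized AMI so the agent is already on it):

# create the cluster first
aws ecs create-cluster --cluster-name my-cluster
# on each instance in your existing VPC, point the agent at that cluster
# (do this in user data before the agent starts, or restart the agent after)
echo "ECS_CLUSTER=my-cluster" | sudo tee /etc/ecs/ecs.config
# verify the instances registered
aws ecs list-container-instances --cluster my-cluster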
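And the per-instance registry setting mentioned above lives in the same /etc/ecs/ecs.config; a minimal sketch, with the credentials and email obviously placeholders:

# tell the agent how to authenticate against Docker Hub (dockercfg format)
ECS_ENGINE_AUTH_TYPE=dockercfg
ECS_ENGINE_AUTH_DATA={"https://index.docker.io/v1/":{"auth":"<base64 of user:password>","email":"you@example.com"}}

Then restart the agent so it picks the auth data up; on the ECS-optimized AMI it runs as an upstart job:

sudo stop ecs
sudo start ecs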
Having to set up your own Docker cluster is painful and time-consuming. I hope AWS refines EC2 Container Service to be truly easy to use in the coming months; for now, I have to stick with the CLI.
Notes on Apache ZooKeeper
Built on a Paxos-like consensus protocol (ZAB), Apache ZooKeeper has been incorporated into many distributed systems as a configuration or synchronization service. Apache Kafka and SolrCloud are two prominent open-source examples, and if you count in-house software for service discovery and the like, there are many more. However, ZooKeeper is a well-known headache among ops people: it is picky about latency between instances, and recovery and monitoring are painful. Netflix released Exhibitor to ease the pain a little, but here I will still jot down what I have learned so far.
For a five-instance ZooKeeper ensemble, zoo.cfg looks like the following; you have to specify each server's IP and ports in the configuration. (This example runs all five instances on a single host, which is why they share one IP and differ only in ports and data directories; on separate machines they would use the same ports with different IPs.)
# quorum port and leader-election port for each instance
# (all on one host here, so the ports must differ)
server.1=10.0.5.219:2888:3888
server.2=10.0.5.219:2889:3889
server.3=10.0.5.219:2890:3890
server.4=10.0.5.219:2891:3891
server.5=10.0.5.219:2892:3892
# port for client connections
clientPort=2181
# where snapshots and transaction logs are stored for this instance
dataDir=/var/zookeeper/1
dataLogDir=/var/zookeeper/1
# base time unit in milliseconds; syncLimit and initLimit are multiples of it
tickTime=2000
syncLimit=5
initLimit=10
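One thing the config alone does not capture: each instance also needs a myid file in its dataDir whose content matches its server.N number, otherwise it cannot tell which server it is. For the layout above:

# write each instance's id into its data directory
for i in 1 2 3 4 5; do
  echo $i | sudo tee /var/zookeeper/$i/myid
done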
If something really serious happens and you cannot recover consistency any other way, the last resort is to clean up the snapshots: remove the version-2 directory under each server's data directory, then restart the servers. Be aware that these commands wipe out everything stored in ZooKeeper; use them carefully.
cd /opt/zookeeper/zookeeper-3.4.5/
# stop all five instances
for cfg in zoo.cfg zoo2.cfg zoo3.cfg zoo4.cfg zoo5.cfg; do
  sudo bin/zkServer.sh stop conf/$cfg
done
# wipe the snapshots and transaction logs of each instance
for i in 1 2 3 4 5; do
  sudo rm -rf /var/zookeeper/$i/version-2
done
# start them again
for cfg in zoo.cfg zoo2.cfg zoo3.cfg zoo4.cfg zoo5.cfg; do
  sudo bin/zkServer.sh start conf/$cfg
done
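After the restart, it is worth checking that every instance rejoined the quorum. ZooKeeper's four-letter commands over the client port do the job (2181 is the clientPort of the first instance in the config above; the others listen on their own client ports):

# "imok" means the server is alive; stat shows whether it is leader or follower
echo ruok | nc localhost 2181
echo stat | nc localhost 2181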
ZooKeeper does not clean up the accumulated snapshots and transaction log files by default; you have to purge them yourself. In the command below, -n 10 means only the most recent 10 snapshots are kept. With a use case like Kafka, the space can be consumed very fast, so be sure to leave enough room on the partition.
sudo java -cp zookeeper-3.4.5.jar:lib/log4j-1.2.15.jar:conf:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar org.apache.zookeeper.server.PurgeTxnLog /var/zookeeper/1/ /var/zookeeper/1/ -n 10
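Since 3.4, ZooKeeper can also do this purge by itself; the autopurge options in zoo.cfg are off by default, but enabling them runs the equivalent of the command above on a schedule:

# keep the 10 most recent snapshots and purge every 24 hours
autopurge.snapRetainCount=10
autopurge.purgeInterval=24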