One-Click Installation for Clusters

Orchestrate and Parameterize Cluster Deployments with Chef

Automated Clusters

Karamel is an orchestration engine for Chef Solo that enables the deployment of arbitrarily large distributed systems on both virtualized platforms (AWS, Vagrant) and bare-metal hosts. A distributed system is defined in YAML as a set of node groups, each of which implements a number of Chef recipes, where the Chef cookbooks are hosted on GitHub. Karamel orchestrates the execution of Chef recipes using a set of ordering rules defined in a YAML file (Karamelfile) in each cookbook. For each recipe, the Karamelfile can define a set of dependent (possibly external) recipes that must be executed before it. At the system level, the set of Karamelfiles defines a directed acyclic graph (DAG) of service dependencies.

Karamel system definitions are very compact. We leverage Berkshelf to transparently download and install transitive cookbook dependencies, so large systems can be defined in a few lines of code. Finally, the Karamel runtime builds and manages the execution of the DAG of Chef recipes: it first launches the virtual machines (or configures the bare-metal boxes) and then executes the recipes with Chef Solo. The node setup steps are carried out using JClouds or SSH. Karamel transparently handles faults by retrying, as virtual machine creation and configuration are not always reliable or timely.
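Conceptually, this orchestration step is a topological sort of the service-dependency DAG. The sketch below, in plain Ruby using the standard library's TSort module, illustrates how a valid execution order can be derived from per-recipe ordering rules of the kind a Karamelfile expresses. It is an illustration of the idea, not Karamel's actual implementation, and the dependency data is hypothetical:

```ruby
# Illustrative sketch only: derive a valid recipe execution order from
# per-recipe ordering rules ("these recipes must run before me"), the way
# an orchestrator must. Karamel reads such rules from Karamelfiles.
require 'tsort'

class RecipeDag
  include TSort

  # deps maps each recipe to the list of recipes that must execute before it
  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(recipe, &block)
    @deps.fetch(recipe, []).each(&block)
  end
end

# Hypothetical rules for the Hadoop example: DataNodes need a running
# NameNode; NodeManagers need the ResourceManager; the JobHistoryServer
# needs both HDFS and YARN metadata services up.
deps = {
  'hadoop::nn'  => [],
  'hadoop::rm'  => ['hadoop::nn'],
  'hadoop::jhs' => ['hadoop::nn', 'hadoop::rm'],
  'hadoop::dn'  => ['hadoop::nn'],
  'hadoop::nm'  => ['hadoop::rm'],
}

order = RecipeDag.new(deps).tsort
puts order.join(' -> ')  # dependencies always precede their dependents
```

A useful property of this formulation is that TSort raises TSort::Cyclic if the rules contain a cycle, mirroring the requirement that the combined Karamelfiles must form a DAG.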

Existing Chef cookbooks can easily be wrapped to add the Karamelfile. In contrast to Chef, which is used primarily to manage production clusters, Karamel is designed to support the creation of reproducible clusters for running experiments or benchmarks. Karamel also provides additional Chef cookbook support for copying experiment results to persistent storage before tearing down clusters.
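A recipe of that kind could look roughly like the following Chef recipe sketch. This is a hypothetical illustration, not Karamel's actual cookbook: the paths, the bucket name, and the assumption of a pre-installed AWS CLI are all ours:

```ruby
# Hypothetical sketch: archive experiment results to persistent storage
# (here S3, via an assumed pre-installed AWS CLI) before cluster teardown.
# All paths and the bucket name are illustrative.
bash 'archive_experiment_results' do
  code <<-EOH
    tar czf /tmp/results.tgz /srv/experiments/results
    aws s3 cp /tmp/results.tgz s3://example-results-bucket/run-$(date +%s).tgz
  EOH
  only_if { ::File.directory?('/srv/experiments/results') }
end
```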

Declarative Clusters

The cluster definition file shown below defines an Apache Hadoop V2 cluster to be launched on AWS/EC2. The cluster definition includes a cookbook called 'hadoop', and recipes for HDFS' NameNode (nn) and DataNodes (dn), YARN's ResourceManager (rm) and NodeManagers (nm), and finally a recipe for the MapReduce JobHistoryServer (jhs). The nn, rm, and jhs recipes are included in a single group called 'metadata', and a single node will be created (size: 1) on which all three services will be installed and configured. In a second group (the 'datanodes' group), the dn and nm services will be installed and configured on two nodes (size: 2). If you want more nodes in a particular group, you simply increase the value of its size attribute (e.g., set "size: 100" for the datanodes group if you want 100 DataNodes and NodeManagers for Hadoop). Finally, we parameterize this cluster deployment with version 2.7.1 of Hadoop (attrs -> hadoop -> version). The attrs section is used to supply parameters that are fed to the Chef recipes during installation.

name: ApacheHadoopV2
ec2:
    type: m3.medium
    region: eu-west-1
cookbooks:
    hadoop:
        github: "hopshadoop/apache-hadoop-chef"
        version: "v0.1"
attrs:
    hadoop:
        version: 2.7.1
groups:
    metadata:
        size: 1
        recipes:
            - hadoop::nn
            - hadoop::rm
            - hadoop::jhs
    datanodes:
        size: 2
        recipes:
            - hadoop::dn
            - hadoop::nm

The cluster definition file also includes a cookbooks section. GitHub is our artifact server: only cookbooks hosted on GitHub may be referenced in a cluster definition file. Dependent cookbooks (resolved through Berkshelf) may come from Opscode's repository, the Chef Supermarket, or GitHub, but the cookbooks referenced in the YAML file itself must be hosted on GitHub. The reason for this is that the Karamel runtime uses the GitHub API to query cookbooks for configuration information, available recipes, dependencies (the Berksfile), and orchestration rules (the Karamelfile). The set of all Karamelfiles for all services is used to build a directed acyclic graph (DAG) of the installation order for recipes. This allows for modular development and composition of cookbooks, each of which encapsulates its own orchestration rules. In this way, the deployment of complicated distributed systems, where the order in which services start matters, can be tested incrementally.
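As a sketch, the 'hadoop' cookbook's Berksfile might declare its dependencies along these lines. The syntax is standard Berkshelf; the dependency names and the GitHub repository below are illustrative assumptions, not the actual Hadoop cookbook's dependencies:

```ruby
# Hypothetical Berksfile sketch: Berkshelf resolves these cookbooks (and
# their transitive dependencies) at deployment time, which is what keeps
# cluster definitions compact. Names and repos below are illustrative.
source 'https://supermarket.chef.io'

metadata  # name, version, and dependencies come from this cookbook's metadata.rb

# Dependencies can come from the Supermarket or directly from GitHub:
cookbook 'java'
cookbook 'ntp', github: 'example-org/ntp-cookbook'
```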
