Download and Run Karamel

A Hadoop Cluster on Amazon EC2

You will need an account on Amazon Web Services. You should have an account-key (id) and an account secret-key (password). The secret-key is long: 40 characters, compared with 20 for the key id.

You should then open a Cluster Definition YAML file. The Cluster Definition file defines the machines you are going to create and the software that will be installed. Here is an example of an Apache Hadoop cluster - the hadoop.yml file. It can be found in the "examples" directory included in the Karamel download. Here is what the YAML file looks like:

name: ApacheHadoop
ec2:
    type: m3.medium
    region: eu-west-1

cookbooks:
  hadoop:
    github: "hopshadoop/apache-hadoop-chef"
    branch: "master"

attrs:
  hadoop:
    version: 2.7.1

groups:
  namenodes:
    size: 1
    recipes:
        - hadoop::nn
        - hadoop::rm
        - hadoop::jhs
  datanodes:
    size: 2
    recipes:
        - hadoop::dn
        - hadoop::nm

Launching Clusters from the command-line in Linux/Mac

You can either set environment variables containing your EC2 credentials or enter them from the console. We recommend you set the environment variables, as shown below.

export AWS_KEY=...
export AWS_SECRET_KEY=...
./bin/karamel -launch examples/hadoop.yml

After you launch a cluster from the command line, the client loops, printing the status of the install DAG of Chef recipes to stdout every 20 seconds or so. Both the GUI and command-line launchers also write stdout and stderr to log files under the current working directory, which you can follow with:

tail -f log/karamel.log

How to write a cluster YAML file

We define each cluster in YAML format, like the Hadoop example given here. Each cluster is identified by a name, which has to be unique within the scope of the Karamel runtime. The cluster YAML supports two levels of scope: global-scope and group-scope. All the settings in the global-scope can be overridden inside the groups. The main building blocks of the YAML file are the provider, cookbooks, attrs and groups.
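The override rule between the two scopes can be pictured as a recursive merge. The sketch below is only an illustration of that rule, not Karamel's own code: the function name `resolve_group_settings` is hypothetical, and the `m3.large` override is made up for the example.

```python
from copy import deepcopy


def resolve_group_settings(global_scope, group_scope):
    """Return the effective settings for one group.

    Values set in the group scope win; anything the group does not
    mention is inherited from the global scope.
    """
    merged = deepcopy(global_scope)
    for key, value in group_scope.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both scopes define a nested block: merge key by key.
            merged[key] = resolve_group_settings(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Global scope taken from the Hadoop example; the group override is hypothetical.
global_scope = {"ec2": {"type": "m3.medium", "region": "eu-west-1"}}
group_scope = {"ec2": {"type": "m3.large"}}

effective = resolve_group_settings(global_scope, group_scope)
print(effective)  # {'ec2': {'type': 'm3.large', 'region': 'eu-west-1'}}
```

The group only overrides the machine type, so the region is still inherited from the global scope.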

1. Provider

There is no provider keyword; instead, the user directly specifies the type of provider, such as "ec2", "baremetal" or "vagrant". For the moment only ec2 is functional; the rest will be available soon. In the ec2 block the user can choose either the machine type and region, or an AMI image.

2. Cookbooks

In the cookbooks section, the GitHub address of every cookbook used has to be defined. By default Karamel takes the master branch of each cookbook, unless a different branch is specified in branch or, alternatively, a specific version of the cookbook is referred to.
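The defaulting rule for cookbooks can be sketched as follows. This is a minimal illustration under assumptions: `cookbook_ref` is a hypothetical helper, the `v0.1` version tag is invented, and rejecting an entry that sets both branch and version is a choice made for this sketch, since the text does not say which one would win.

```python
def cookbook_ref(cookbook):
    """Pick the git ref for one cookbook entry.

    Falls back to "master" when neither branch nor version is given.
    """
    branch = cookbook.get("branch")
    version = cookbook.get("version")
    if branch and version:
        # Assumption of this sketch: the two are treated as alternatives.
        raise ValueError("specify either branch or version, not both")
    return version or branch or "master"


# Entry from the Hadoop example, with and without an explicit branch.
print(cookbook_ref({"github": "hopshadoop/apache-hadoop-chef", "branch": "master"}))
print(cookbook_ref({"github": "hopshadoop/apache-hadoop-chef"}))
# Hypothetical version tag, for illustration only.
print(cookbook_ref({"github": "hopshadoop/apache-hadoop-chef", "version": "v0.1"}))
```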

3. Attributes

Attributes are configuration parameters that each cookbook defines in its "attributes/default.rb" file. Only attributes declared in the metadata.rb file are public and can be set by the user; Karamel won't accept any attribute that is not specified in metadata.rb. Attributes go under the attrs section, which follows the same hierarchical structure as the kitchen.yml file.
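The acceptance rule for attributes amounts to checking every leaf of the attrs hierarchy against the names declared in metadata.rb. The sketch below assumes the declared names use Chef's slash-separated form (for example `hadoop/version`); the helper names and the `secret_flag` attribute are hypothetical, and this is not how Karamel itself is implemented.

```python
def flatten_attrs(attrs, prefix=""):
    """Turn the nested attrs section into {"cookbook/name": value} pairs."""
    flat = {}
    for key, value in attrs.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_attrs(value, path))
        else:
            flat[path] = value
    return flat


def undeclared_attrs(attrs, declared):
    """Return the attribute names that metadata.rb does not declare."""
    return sorted(set(flatten_attrs(attrs)) - set(declared))


# Assumed to be the names declared in the cookbook's metadata.rb.
declared = {"hadoop/version"}
# "secret_flag" is a made-up attribute that is not declared anywhere.
attrs = {"hadoop": {"version": "2.7.1", "secret_flag": True}}

print(flatten_attrs(attrs))
print(undeclared_attrs(attrs, declared))  # ['hadoop/secret_flag']
```

An empty result would mean every attribute in the cluster definition is public and can be accepted.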

4. Groups

A group is a set of machines running the same stack of software/services. The user must define size, the number of machines in the group, and recipes, the list of all recipes to be installed on every machine in the group. To give a group its own provider or attribute configuration, follow the same structure as in the global-scope sections described above.
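To make the meaning of size and recipes concrete, the sketch below expands a groups section into one (machine, recipe) pair per installation. The group names, the machine naming scheme and the `expand_groups` helper are all hypothetical; only the sizes and recipe lists come from the Hadoop example.

```python
def expand_groups(groups):
    """List every (machine, recipe) pair implied by the groups section.

    Each of the `size` machines in a group gets all of that group's recipes.
    """
    plan = []
    for name, group in groups.items():
        for index in range(1, group["size"] + 1):
            machine = f"{name}{index}"  # hypothetical naming scheme
            for recipe in group["recipes"]:
                plan.append((machine, recipe))
    return plan


# Group names are illustrative; sizes and recipes follow the Hadoop example.
groups = {
    "namenodes": {"size": 1, "recipes": ["hadoop::nn", "hadoop::rm", "hadoop::jhs"]},
    "datanodes": {"size": 2, "recipes": ["hadoop::dn", "hadoop::nm"]},
}

plan = expand_groups(groups)
machines = {machine for machine, _ in plan}
print(len(machines))  # 3 machines in total
print(len(plan))      # 7 recipe installations: 1 * 3 + 2 * 2
```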

Karamelize a cookbook

Take the following steps to make your cookbook readable and executable by Karamel.
  1. Deploy your cookbook into a public GitHub repository.
  2. Karamel assumes that each cookbook has install and default recipes. By default, all the recipes of a cookbook are locally dependent on its install recipe, so your cookbook must contain the install recipe.
  3. Define all the public recipes and attributes in the metadata.rb file; only those are visible to the Karamel system. To improve the user experience, write a description for each recipe, and a display_name, description, type (data type) and default value for each attribute, to be displayed in the Karamel GUI.
  4. Create the Karamelfile at the root level of each cookbook; it holds the cluster-wide and recipe-to-recipe dependencies. The Karamelfile is the source of information for ordering and orchestration in Karamel, so when you define dependencies in the Karamelfile you are specifying the precedence of your recipes among all other recipes in the cluster. When a recipe has one or more prerequisite dependencies, you must list them in the Karamelfile. Below is a sample Karamelfile. A local dependency means the recipe just waits until the dependent recipe has finished on the current machine, whereas a global dependency means the recipe, on every machine, waits until the dependent recipe has run on all the machines in the cluster that it is assigned to.
  dependencies:
    - recipe: ndb::ndbd
      global:
        - ndb::mgmd
    - recipe: ndb::mysqld
      global:
        - ndb::mgmd
        - ndb::ndbd
    - recipe: ndb::memcached
      global:
        - ndb::mgmd
        - ndb::ndbd
        - ndb::mysqld
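
The difference between local and global dependencies, and the ordering they imply, can be sketched in a few lines. This is an illustration under stated assumptions, not Karamel's scheduler: the in-memory `karamelfile` dictionary, the `is_ready` helper and the machine names `m1` to `m4` are hypothetical, and the sample's dependencies are all assumed to be global. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-memory form of the sample Karamelfile.
karamelfile = {
    "ndb::ndbd": {"local": [], "global": ["ndb::mgmd"]},
    "ndb::mysqld": {"local": [], "global": ["ndb::mgmd", "ndb::ndbd"]},
    "ndb::memcached": {
        "local": [],
        "global": ["ndb::mgmd", "ndb::ndbd", "ndb::mysqld"],
    },
}


def is_ready(recipe, machine, placement, done):
    """Can `recipe` start on `machine`?

    placement: recipe -> set of machines the recipe is assigned to
    done:      set of (recipe, machine) pairs that have finished
    """
    deps = karamelfile.get(recipe, {})
    # Local: the dependency only has to be finished on this machine.
    local_ok = all((dep, machine) in done for dep in deps.get("local", []))
    # Global: the dependency has to be finished on every machine running it.
    global_ok = all(
        (dep, other) in done
        for dep in deps.get("global", [])
        for other in placement.get(dep, set())
    )
    return local_ok and global_ok


# Recipe-level order implied by the dependencies (prerequisites first).
graph = {r: d["local"] + d["global"] for r, d in karamelfile.items()}
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['ndb::mgmd', 'ndb::ndbd', 'ndb::mysqld', 'ndb::memcached']

# Hypothetical placement: ndbd has finished on m2 but not yet on m3.
placement = {"ndb::mgmd": {"m1"}, "ndb::ndbd": {"m2", "m3"}, "ndb::mysqld": {"m4"}}
done = {("ndb::mgmd", "m1"), ("ndb::ndbd", "m2")}
print(is_ready("ndb::ndbd", "m3", placement, done))    # True
print(is_ready("ndb::mysqld", "m4", placement, done))  # False
```

In this example ndb::mysqld on m4 has to wait because ndb::ndbd is still running on m3, even though it has finished on m2.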