
A single experiment in Karamel is a Chef recipe; by convention, its name usually contains the word "experiment". An experiment takes parameters and produces results. At runtime, Karamel binds values to the experiment's variables, and when the experiment finishes it downloads the results to the machine running Karamel. Like other recipes, experiments can depend on other recipes - typically system requirements or preconditions - and these dependencies are defined in the Karamelfile.

- Experiment Designer

- Wordcount on Flink cluster

Experiment Designer

The Experiment Designer in Karamel helps you design your experiment as a bash or Python script without needing to know Chef or Git. Take the following steps to design and deploy your experiment.

1. When you have the Karamel web app up and running, you can access the Experiment Designer from the Experiment menu item on the left-hand side of the application.


2. GitHub is Karamel's artifact server. You will have to log in to your GitHub account the first time; Karamel will remember your credentials afterwards.



3. You can either create a new experiment or load an already designed experiment into the designer.


4. If you choose to create a new experiment, you will need to choose a name for it, optionally describe it, and select which GitHub repository you want to host your experiment in. As you can see in the image below, Karamel connects to GitHub and fetches your available repositories.


5. At this point you land in the programming section of your experiment. By default, Karamel takes your experiment's name as the cookbook name, and the default experiment recipe is called "experiment". In the large text area, as you can see in the screenshot below, you can write your experiment code in either bash or Python; Karamel will wrap your code in Chef code. All experiment parameters come in the form of Chef attributes, which you should wrap inside #{} - Chef populates them at runtime. If you write the results of your experiment to a file called /tmp/wordcount_experiment.out - assuming your cookbook is called "wordcount" and your recipe "experiment" - Karamel will download that file and put it into the ~/.karamel/results/ folder for further analysis.
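As a rough illustration (not the exact code Karamel generates), the wrapping of a bash script in a Chef recipe could look like the sketch below. The resource name and the node attribute path are hypothetical; only the result-file convention comes from the text above.

```ruby
# Hypothetical sketch of the Chef recipe a cookbook "wordcount" with
# recipe "experiment" might contain after Karamel wraps your bash code.
bash 'wordcount_experiment' do
  user 'root'
  code <<-EOF
    echo "result: #{node['wordcount']['some_param']}" > /tmp/wordcount_experiment.out
  EOF
end
```

The #{node['wordcount']['some_param']} reference (an assumed parameter name) is interpolated by Chef at runtime, which is why experiment parameters must be wrapped in #{} as described above.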


6. Placing your experiment at the right point in the cluster orchestration is an essential part of your experiment design. To do so, tick the advanced check-mark and specify which other recipes in the cluster your experiment depends on. Once you have listed all the dependencies, you must give the right reference to the cookbook address in the second text area.
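The cookbook references in the second text area use Berksfile syntax, as in the Hadoop example later in this document:

```ruby
# Berksfile-style reference to a Karamelized cookbook on GitHub
cookbook 'hadoop', github: 'hopshadoop/apache-hadoop-chef'
```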


7. Finally, by pressing the save button, your cookbook will be generated and pushed to GitHub.


8. Look into your GitHub repository and you will see your cookbook.




Simple Experiment: Measuring the performance of word count with Apache Flink

This experiment has two parts: input generation and word count. We keep them as separate experiments so that we can measure their times individually.

a) Random Text Generator

The following code snippet generates a random text file of size 128MB and copies it into HDFS. The file in HDFS is named after the running node to avoid conflicts in case we want several nodes to contribute to text generation for increased parallelization.

# Generate 128MB of random text locally
rm -f /tmp/input.txt
base64 /dev/urandom | head -c 128000000 > /tmp/input.txt
# Copy it into HDFS, naming the file after the node to avoid collisions
/srv/hadoop/bin/hdfs dfs -mkdir -p /words
/srv/hadoop/bin/hdfs dfs -copyFromLocal /tmp/input.txt /words/#{node.name}

a-1) Orchestration (Karamelfile)

The text generator needs all Hadoop datanodes to be up and running, both to have access to the filesystem and to get good replication.
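This dependency might be expressed in the Karamelfile along the following lines. The key names (dependencies, recipe, global) and the Hadoop recipe names (hadoop::nn, hadoop::dn) are assumptions here, not taken from this document; consult the Karamelfile schema for the exact layout.

```yaml
# Hypothetical Karamelfile sketch: run the generator experiment only
# after the Hadoop namenode and all datanodes are up.
dependencies:
  - recipe: generator::experiment
    global:
      - hadoop::nn
      - hadoop::dn
```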

a-2) GitHub References (Berksfile)

We refer to our Hadoop cookbook, which is already Karamelized.

cookbook 'hadoop', github: 'hopshadoop/apache-hadoop-chef'

b) Experiment: Word Count on Flink

The following code snippet runs the Flink wordcount on the text generated in the previous section. As you can see, the parallelization parameter is the number of Hadoop datanodes (Flink taskmanagers), which depends on the size of your cluster. The namenode address will also be bound at runtime.

# Remove any output left over from previous runs
/srv/hadoop/bin/hdfs dfs -rm -r -f /counts
cd /usr/local/flink
# -p sets the parallelism to the number of datanodes/taskmanagers;
# both it and the namenode address are bound by Chef at runtime
./bin/flink run -p #{node.hadoop.dn.public_ips.size} -j ./examples/flink-java-examples-0.9.1-WordCount.jar hdfs:///words/ hdfs://#{node.hadoop.nn.public_ips[0]}:29211/counts

b-1) Orchestration (Karamelfile)

If we call our text generator recipe "generator::experiment", our wordcount experiment depends on it and on Flink's taskmanager. Karamel will enforce that dependency globally in the cluster.
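A Karamelfile sketch of this dependency might look as follows. As before, the key names (dependencies, recipe, global) and the recipe name flink::experiment are assumptions for illustration; only generator::experiment and the taskmanager dependency come from the text above.

```yaml
# Hypothetical Karamelfile sketch: the wordcount experiment waits for
# the text generator and for Flink taskmanagers, cluster-wide.
dependencies:
  - recipe: flink::experiment
    global:
      - generator::experiment
      - flink::taskmanager
```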

b-2) GitHub References (Berksfile)

We use our Hadoop and Flink cookbooks, which are already Karamelized.

cookbook 'hadoop', github: 'hopshadoop/apache-hadoop-chef'
cookbook 'flink', github: 'hopshadoop/flink-chef'