Hello world

The simplest possible Snakemake pipeline is one that produces a single file. Here we will show you how to make it.

To begin with, you need to tell Snakemake the name of the file you want to produce. This is done by writing

rule all:
    input: "hello/world.txt"

This tells Snakemake that the pipeline should create a file called hello/world.txt.

(If you are wondering what rule all: and input: mean, they tell Snakemake that you have a rule called all that takes the file hello/world.txt as input. You will learn more about rules in this and the following chapters.)

Save your Snakemake workflow as pipeline.txt. You can now run your pipeline with snakemake --snakefile pipeline.txt.

This should be your result:

MissingInputException in line 1 of snakemake_book/chapters/hello_world/Snakefile1:
Missing input files for rule all:
hello/world.txt

The problem is that you have only told Snakemake what file to produce, not how to produce it. Snakemake has looked through your pipeline, but not found any instructions for how to produce the file. That is what the error message means.
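
One way to picture what went wrong is the toy model below, written in plain Python. It is only a sketch and not how Snakemake is implemented: the rules dictionary and the missing_inputs function are hypothetical names invented for this illustration. The idea is that an input file is a problem when no rule lists it as output and it does not already exist on disk.

```python
import os

# Toy stand-in for the one-rule pipeline above.
# (Hypothetical structure for illustration, not Snakemake's data model.)
rules = {
    "all": {"input": ["hello/world.txt"], "output": []},
}


def missing_inputs(rules, rule_name, exists=os.path.exists):
    """Return inputs of rule_name that no rule produces and that are not on disk."""
    produced = {out for rule in rules.values() for out in rule["output"]}
    return [
        path
        for path in rules[rule_name]["input"]
        if path not in produced and not exists(path)
    ]


# Pretend the disk is empty so the result does not depend on the current directory.
print(missing_inputs(rules, "all", exists=lambda path: False))
```

With only the all rule present, the sketch reports hello/world.txt as missing, which is the situation the error message describes. Once some rule lists that file as output, the list becomes empty.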

Now we will tell Snakemake how to produce "hello/world.txt". To do that you need to create a new rule. You can call your rules almost anything, but it is best to choose a descriptive name. (Snakemake itself does not care what a rule is called, as long as no two rules share a name.)

Let's call the rule hello_world. Write it in your "pipeline.txt" file after your all rule.

rule hello_world:
    output: "hello/world.txt"
    shell: "echo Hello World"

There are two new directives here, output and shell. (By the way, keywords followed by a colon, such as input, output and shell, are called directives.) The output directive tells Snakemake which file the rule should produce. The shell directive is where you write the command that should produce that file. The command we have written above just means "write 'Hello World'", which is exactly what we want in our file.

Now running your pipeline with snakemake --snakefile pipeline.txt should produce the following output:

Waiting at most 5 seconds for missing files.
Error in job hello_world while creating output file hello/world.txt.
MissingOutputException in line 4 of snakemake_book/chapters/hello_world/Snakefile2:
Missing files after 5 seconds:
hello/world.txt
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

Everything seemed to work so well! What happened?

The problem is that the shell command you wrote does not write anything to a file. It just writes Hello World to the screen. This makes Snakemake complain, since the output directive of the rule says that it should produce the file "hello/world.txt", but the shell directive did no such thing. You will need to modify your shell command to the following:

"echo Hello World > hello/world.txt"

In shell this means "write 'Hello World' to the file hello/world.txt", which is what we meant to do above.
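
The difference between the two commands can be checked outside Snakemake. The Python sketch below is an illustration under stated assumptions, not part of the pipeline: run_rule is a hypothetical helper that runs a shell command in a scratch directory and then looks for the expected output file, roughly the check that produced the MissingOutputException above. It assumes a POSIX-style shell is available, and it creates the hello directory itself so that the redirect has somewhere to write.

```python
import os
import subprocess
import tempfile


def run_rule(command, output):
    """Run a shell command in a scratch directory; report (file created?, screen text)."""
    with tempfile.TemporaryDirectory() as workdir:
        # The sketch prepares the output's directory itself.
        os.makedirs(os.path.join(workdir, os.path.dirname(output)), exist_ok=True)
        result = subprocess.run(
            command,
            shell=True,          # interpret the string as a shell command
            cwd=workdir,         # keep the experiment out of your real folders
            capture_output=True, # collect what would be written to the screen
            text=True,
            check=True,
        )
        created = os.path.exists(os.path.join(workdir, output))
    return created, result.stdout


# Without the redirect, the text goes to the screen and no file appears.
print(run_rule("echo Hello World", "hello/world.txt"))
# With the redirect, the text goes into the file instead.
print(run_rule("echo Hello World > hello/world.txt", "hello/world.txt"))
```

The first call should report that no file was created and that the text went to the screen; the second should report the opposite.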

Now when you run the pipeline, it should work:

Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1    all
    1    hello_world
    2
rule hello_world:
    output: hello/world.txt
1 of 2 steps (50%) done
localrule all:
    input: hello/world.txt
2 of 2 steps (100%) done

Congratulations! You have written your first Snakefile! You are one step closer to that Nobel prize!

Takeaways

  • The basic building blocks of Snakemake workflows are rules. These produce the files you want.
  • Rules consist of input and output files, together with the code that should produce the output files.
  • There is one special rule, normally called all, that does not produce anything; it just says which files the whole pipeline should produce. (You can call it whatever you want. Snakemake does not care, remember?)
  • The only thing that marks the all rule as special is its position: it is the first rule in the Snakemake file.
  • A pipeline will fail if your rules do not produce the files you say they will.
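
The point about the first rule can be pictured with a few lines of Python. This is a toy model rather than Snakemake's own code, and default_target and files_to_build are hypothetical names used only here. It assumes the rules are stored in the order they appear in the workflow file.

```python
# Rules in the order they appear in the workflow file
# (Python dictionaries keep insertion order).
rules = {
    "all": {"input": ["hello/world.txt"], "output": []},
    "hello_world": {"input": [], "output": ["hello/world.txt"]},
}


def default_target(rules):
    """Return the name of the first rule, which decides what gets built."""
    return next(iter(rules))


def files_to_build(rules):
    """The files the whole pipeline should produce: the inputs of the first rule."""
    return rules[default_target(rules)]["input"]


print(default_target(rules))
print(files_to_build(rules))

# Renaming the first rule changes nothing about which files are requested.
renamed = {"everything": rules["all"], "hello_world": rules["hello_world"]}
print(files_to_build(renamed))
```

In this model the position of the rule matters and its name does not, which matches the takeaways above.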

Exercises

  1. Try changing the name of the file in the input directive of the all rule to hello/world.csv. What happens when you now try to run the pipeline?
  2. (Change the workflow file back to the working version before doing this exercise.) Now change the name of the file in the output directive of the rule hello_world to hello/world.rst. What happens now when you try to run the pipeline?
  3. Why are the error messages so similar in exercises 1 and 2? And what does the subtle difference in the error messages mean?

Advanced exercises

  1. Are you able to make the pipeline produce two files, where the second file is called "hei/verden.txt"? Hint: You will need to change the input directive of the all rule to input: "hello/world.txt", "hei/verden.txt".
  2. If you did not do so already, are you able to produce both files in the same rule?
  3. Basically, all our code did was write the words "Hello World" to a file. This can be done with one simple line in your terminal: echo Hello World > hello/world.txt. To recreate what Snakemake did, first remove the folder hello (and with it the file world.txt that it contains) with the command rm -rf hello. What happens now when you try to run the command echo Hello World > hello/world.txt? Why does it happen?
