Custom interpreters¶
Interpreters are the way to use other languages for tasks’ code directly in the pipeline XML. By default, tasks’ code are interpreted with bash, however you can define a custom interpreter for any task.
How to define an interpreter for a task¶
You need to add the interpreter
attribute to the <task>
element whose
code will be written in another language. The interpreter attribute takes a
bash script that will be run instead of the task code. You will then need
to run the task code manually (normally by means of an interpreter, such as
python, awk, etc.).
Inside this script you will have available all the task’s parameters as environment variables, as well as:
$task_code
. The task’s code. You can send this code to the correct interpreter.$task_id
. The task’sid
.$task_interpreter
. The task’sinterpreter
itself.$task_params
. A list of parameters of the task, separated by white-spaces.
Note
Portability: In order to keep the pipeline as portable as possible, it is recommended to use interpreters that are available in the majority of operating systems.
Examples¶
awk¶
Consider the following <task>
definition:
<task id="copy-lines"
params="input_file output_file_awk"
interpreter="/usr/bin/awk -e '$task_code' $input_file > $output_file_awk">
{
print $1
}
</task>
Here, the interpreter is: /usr/bin/awk -e '$task_code' $input_file > $output_file_awk
. When the task is about to run, the code that runs is simply that specified in interpreter
, which in turn runs the AWK interpreter against the task’s code (-e '$task_code'
), and uses the $input_file
and $output_file
parameters of the task to feed the AWK script and to save its output, respectively.
python¶
Consider the following <task>
definition:
<task id="write-parameter" params="input_file output_file_python"
interpreter="/usr/bin/python -c "$task_code"" >
import os
f = open(os.environ['output_file_python'], "a")
f.write(os.environ['input_file'])
</task>
Here, the interpreter is: /usr/bin/python -c "$task_code"
which simply uses the Python interpreter to run the tasks code (-c "$task_code"
). Inside the task’s code, you can see how to access the task’s parameters via the process environment (os.environ
, remember that parameters in Compi are passed to the process as environmental variables).
R¶
Consider the following <task>
definition:
<task id="task-r" interpreter="docker run --rm r-base:3.6.0 R -e "$task_code"">
<![CDATA[
a <- 1;
b <- 2;
cat(a*b);
cat("\nbye\n");
]]>
</task>
Here, the interpreter is docker run --rm r-base:3.6.0 R -e "$task_code"
, which uses the R interpreter (-e
) from the r-base
Docker image.
Note
Known issues with R interpreters:
- Previous versions of R (e.g. 3.2 and 3.4) fail to process multiline code using the
-e
parameter. Thus, it is recommended the most recent versions (e.g. 3.6.0). - Version 3.6.0 of R also fails to process multiline code when lines are indented using tabs instead of spaces. This bug can be overcome by using spaces to indent code (as in the example provided) or by using the following interpreter (it simply removes the tabs from the
$task_code`
usingtr
before passing it to the interpreter):R -e "$(echo "$task_code" | tr -d '\t')"