Creating a pipeline from scratch

To create a new TOPPAS file, you can either:

open TOPPAS without providing any existing workflow - an empty workflow will be opened automatically
in a running TOPPAS program choose: File > New
create an empty file in your file browser (Windows Explorer, MacOS Finder, Nautilus, etc.) with the suffix .toppas and double-click it (on Windows, all .toppas files are associated with TOPPAS automatically during installation of OpenMS; on Linux and MacOS you might need to associate the extension manually)
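The third option, creating an empty file with the .toppas suffix, can also be scripted. The following is a minimal sketch that only relies on the statement above that an empty .toppas file is a valid starting point; the function name, the workflow name, and the use of a temporary directory are illustrative assumptions and not part of OpenMS.

```python
import tempfile
from pathlib import Path


def create_empty_workflow(directory: str, name: str) -> Path:
    """Create an empty file with the .toppas suffix and return its path."""
    path = Path(directory) / f"{name}.toppas"
    # exist_ok=False raises FileExistsError if a file of that name already
    # exists, so an existing workflow is never silently reused.
    path.touch(exist_ok=False)
    return path


# Demo in a temporary directory so that nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    workflow = create_empty_workflow(tmp, "my_pipeline")  # hypothetical name
    print(workflow.name, workflow.stat().st_size)  # my_pipeline.toppas 0
```

In practice you would pass the folder where your workflows live and then open the resulting file with TOPPAS.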

When you start TOPPAS, you will see the main window with a list of TOPP tools on the left side.

The following figure shows the TOPPAS main window and a pipeline which is just being created. The user has added some tools by drag&dropping them from the TOPP tool list on the left onto the central window (double-clicking an item in the tool list also works). Additionally, the user has added nodes for input and output files. You can arrange the tools/nodes on the canvas freely: left-click a node to select it (selected nodes are drawn bold), then drag it (i.e. keep the left mouse button pressed while moving the mouse) to the desired position.

Note: To find TOPP tools in the list, you can either scroll through the list or use the search bar at the top of the list. The search bar will filter the list as you type, so you can quickly find the tool you are looking for.

On connections (=edges)

Edges determine the data flow of the pipeline. Connections can be drawn by dragging (i.e. left-click and keep the mouse button pressed) the mouse from the source to the target node. Before starting the drag, make sure that you de-select any node or edge by left-clicking anywhere on the white canvas background. When a connection is created and the source has more than one output parameter (or the target has more than one input parameter), an input/output parameter mapping dialog shows up and lets the user select the output parameter of the source node and the input parameter of the target node for this data flow - shown above for the connection between FalseDiscoveryRate and IDFilter. If the file types of the selected input and output parameters are not compatible with each other, TOPPAS will refuse to add the connection. It will also refuse to add a connection if it would create a cycle in the workflow, or if the connection would not make sense, e.g., if an edge points to an input file node.
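To illustrate the cycle rule, here is a minimal sketch of how an editor could reject an edge that would close a cycle. It is not the TOPPAS implementation: the Workflow class and its methods are hypothetical, and the file-type compatibility check is left out.

```python
from collections import defaultdict


class Workflow:
    """Toy directed graph; node names are arbitrary strings."""

    def __init__(self):
        self.successors = defaultdict(set)

    def _reaches(self, start, goal):
        """Return True if goal is reachable from start along existing edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.successors[node])
        return False

    def add_edge(self, source, target):
        """Add source -> target and return True, or return False if refused."""
        # A new edge closes a cycle exactly when the source is already
        # reachable from the target (this also rejects self-loops).
        if self._reaches(target, source):
            return False
        self.successors[source].add(target)
        return True


wf = Workflow()
wf.add_edge("FalseDiscoveryRate", "IDFilter")         # accepted
print(wf.add_edge("IDFilter", "FalseDiscoveryRate"))  # False: would be a cycle
```

The check runs a depth-first search from the target before the edge is stored, so the graph never contains a cycle at any point during editing.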

An orange edge indicates that the edge is not ready yet, usually because no input files have been specified.
A green edge indicates that the edge is ready to be executed.
A red edge indicates that the edge is not ready to be executed, e.g., because the input files are not compatible with the tool's input requirements.

The input/output mapping of connections can be changed at any time during the editing process by double-clicking a connection or by selecting Edit I/O mapping from the context menu which appears when a connection is right-clicked. All visible items (i.e. connections and the different kinds of nodes) have such a context menu. For a detailed list of the different menus and their entries, see TOPPAS Menus.

Configuring tool parameters

TOPP tools can be configured by double-clicking the tool node or by selecting Edit parameters from the context menu of the tool. By default, the standard parameters are used for each tool.

About input nodes

Once the pipeline has been set up, the input files have to be specified before the pipeline can be executed. This is done by double-clicking an input node and selecting the desired files in the dialog that appears. You can also drag'n'drop files from your file manager into the dialog to add them to the list.

About output nodes

Output files from any TOPP tool in the pipeline can be stored permanently (i.e., after the pipeline has finished and TOPPAS is closed) by adding either of these nodes after any TOPP tool:

an output files node
This node can be connected to any tool that produces output files - either a single file or a list of files.
an output folder node
This node can be connected to any tool that supports output folders (which is rarer than output files), e.g., the QualityControl tool.

You should use these output nodes to store the results of any TOPP node you may need later on; typically the TOPP nodes which come last in the pipeline. If you do not add output nodes, the results from TOPP nodes will be stored in the temporary folder and will be deleted when you close TOPPAS. You can add multiple output nodes at different places in the pipeline to store intermediate results, if you feel you need them later on.

See Output and temporary files and Running the pipeline for more information on output and temporary files.

On "Recycling" mode

Input nodes and all TOPP nodes have a special mode named "recycling mode". Imagine a typical node, such as CometAdapter. Every time it runs, it consumes a single mzML file and a single FASTA file. Thus, the node has two input edges, one for the mzML file and one for the FASTA file. In a typical workflow, you have a bunch of mzML files, say five, in one input files node, but only one FASTA file in the other input files node. CometAdapter will run five times; each such invocation of the node is what we call a 'round'. If you want to run CometAdapter with the same FASTA file for all five mzML files, you can set the FASTA input node to "recycle" the FASTA file. The alternative would be to have five identical FASTA files in the input node, which is not very elegant.

The input from a recycled node can be used an arbitrary number of times, but the recycling has to be "complete", i.e. the number of rounds of the downstream node (CometAdapter in our example) has to be a multiple of the number of input files. Typically, the number of items to be recycled is one (e.g. one FASTA file), so this is usually not a problem.
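The pairing of recycled files with rounds can be sketched as follows. This is only an illustration of the "complete recycling" rule described above, not TOPPAS code; the function name and all file names are hypothetical.

```python
from itertools import cycle, islice


def recycle(files, rounds):
    """Repeat `files` so that they cover `rounds` downstream invocations."""
    # Recycling must be "complete": the number of rounds has to be a
    # multiple of the number of recycled files.
    if not files or rounds % len(files) != 0:
        raise ValueError(
            f"{rounds} rounds is not a multiple of {len(files)} recycled file(s)"
        )
    return list(islice(cycle(files), rounds))


mzml_files = [f"sample_{i}.mzML" for i in range(1, 6)]  # hypothetical names
fasta_files = ["database.fasta"]                        # hypothetical name

# One (mzML, FASTA) pair per round of the downstream node.
pairs = list(zip(mzml_files, recycle(fasta_files, len(mzml_files))))
print(len(pairs))  # 5
```

With two recycled files and five rounds, recycle would raise a ValueError, because five is not a multiple of two.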

Recycling mode can be activated by right-clicking the input node and clicking the "Toggle recycling mode" entry from the context menu.

See On connections (=edges) for an example of a recycling input node.

On special nodes (Merger and Collector)

Sometimes, it is necessary to merge or collect files from different input nodes. This is where the Merger and Collector nodes come into play.

As its name suggests, a merger merges its incoming file lists, i.e., files of all incoming edges are appended into new lists (which have as many elements as the merger has incoming connections). All tools this merger has outgoing connections to are called with these merged lists as input files. All incoming connections should pass the same number of files (unless the corresponding preceding tool is in recycling mode). For example, if a merger has three incoming connections, it will pass on a list of three files to the next tool. This will happen as often as each incoming connection has files.

A collector node, on the other hand, waits for all rounds to finish before concatenating all files from all incoming connections into one single list. It then calls the next tool with this list of files as input. This will happen exactly once during the entire pipeline run. Typically, a collector node is used to collect all files from a FeatureFinder node (which is invoked many times, once for each raw file) and pass the list of resulting featureXML files to a MapAligner tool (which runs only once, on all featureXML files simultaneously).

There is also a splitter node, which is the opposite of a collector, but it should be required only in very rare cases.
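The difference between a merger and a collector can be sketched with plain lists. This is an illustration only: the function names are hypothetical, recycling is ignored, and the order in which the collector concatenates files is an assumption.

```python
def merge(incoming):
    """Merger: one list per round, holding the i-th file of every edge."""
    lengths = {len(files) for files in incoming}
    if len(lengths) != 1:
        raise ValueError("all incoming edges must carry the same number of files")
    return [list(round_files) for round_files in zip(*incoming)]


def collect(incoming):
    """Collector: a single list with every file from every edge and round."""
    return [f for files in incoming for f in files]


# Three incoming edges with two files each (hypothetical names).
edges = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]

print(merge(edges))    # [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2']]
print(collect(edges))  # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
```

The merger yields two lists of three files, so the downstream tool runs twice; the collector yields one list of six files, so the downstream tool runs once.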

Running the pipeline

Finally, if you have input and output nodes at every end of your pipeline and all connections are green, you can select Pipeline > Run in the menu bar or just press F5.

You will be asked for an output file directory where a sub-directory, TOPPAS_out, will be created. This directory will contain your output files. Also, you can specify the number of jobs (i.e. TOPP tool invocations) that TOPPAS is allowed to run in parallel (see On parallel execution below for details).

During pipeline execution, the status lights in the top-right corner of the tools indicate if the tool has finished successfully (green), is currently running (yellow), has not done anything so far (gray), is scheduled to run next (blue), or has crashed (red). The numbers in the bottom-right corner of every tool show how many files have already been processed and the overall number of files to be processed by this tool. When the execution has finished, you can check the generated output files of every node quickly by right-clicking on the node and selecting Open files in TOPPView or Open containing folder from the context menu.

Output and temporary files

In addition to TOPPAS_out, which holds all files captured in the output files and output folder nodes of the pipeline, a TOPPAS_tmp directory will be created in the OpenMS temp path (call the OpenMSInfo tool to see where exactly). The TOPPAS_tmp directory will contain all temporary files that are passed from tool to tool within the pipeline. Both folders contain further sub-directories which are named after the number in the top-left corner of the node they belong to (plus the name of the tool for temporary files).

Note: Files in the TOPPAS_out directory are not automatically deleted after the pipeline execution. These are your results! You have to delete them manually if you don't need them anymore. Files in the TOPPAS_tmp directory are deleted automatically upon closing the pipeline or the TOPPAS GUI.

On parallel execution

You can specify the number of jobs (i.e. TOPP tool invocations) that TOPPAS is allowed to run in parallel in the "Run dialog" (after pressing F5). If a number greater than 1 is selected, TOPPAS will parallelize the pipeline execution in the following scenarios:

A tool has to process more than one input file, but can only handle one file at a time (as is the case for most TOPP tools; notable exceptions are MapAligners and FeatureLinkers). In this case, multiple instances of the same tool are run in parallel.
The pipeline contains multiple branches that are independent of each other. In this case, nodes in independent branches are run in parallel.

Be careful with this setting, however, as some of the TOPP tools require larger amounts of RAM (depending on the size of your dataset). Running too many parallel jobs on a machine with not enough memory may cause problems. Also, do not confuse this setting with the threads parameter of the individual TOPP tools: every TOPP tool has this parameter specifying the maximum number of threads the tool is allowed to use (although only a subset of the TOPP tools make use of this parameter, since there are tasks that cannot be computed in parallel). Be especially careful with combinations of both parameters! If you have a pipeline containing the FeatureFinderCentroided, for example, and its threads parameter is set to 8, and you additionally set the number of parallel jobs in TOPPAS to 8, then you may end up using 8*8=64 threads in parallel (if you have 8 or more input files), which might not be what you intended to do.
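The arithmetic behind this warning can be written down as a small helper. It is a back-of-the-envelope sketch for a single tool node, assuming every running instance actually uses its full threads setting; the function and parameter names are hypothetical.

```python
def worst_case_threads(parallel_jobs, threads_per_tool, pending_invocations):
    """Upper bound on threads used at the same time by one tool node."""
    # TOPPAS cannot run more instances than there are invocations left to do.
    concurrent_jobs = min(parallel_jobs, pending_invocations)
    return concurrent_jobs * threads_per_tool


print(worst_case_threads(8, 8, 8))  # 64, the example from the text
print(worst_case_threads(8, 8, 3))  # 24, only three input files
print(worst_case_threads(1, 8, 8))  # 8, no parallel jobs
```

Comparing the result against the number of cores (and the memory each instance needs) gives a quick sanity check before pressing F5.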

Mouse and keyboard

Using the mouse, you can

drag&drop tools from the TOPP tool list onto the workflow window (you can also double-click them instead)
select items (by clicking)
select multiple items (by holding down CTRL while clicking)
select multiple items (by holding down CTRL and dragging the mouse in order to "catch" items with a selection rectangle)
move all selected items (by dragging one of them)
draw a new connection from one node to another (by dragging; source must be deselected first)
specify input files (by double-clicking an input node)
configure parameters of tools (by double-clicking a tool node)
specify the input/output mapping of connections (by double-clicking a connection)
translate the view (by dragging anywhere but on an item)
zoom in and out (using the mouse wheel)
make the context menu of an item appear (by right-clicking it)

Using the keyboard, you can

delete all selected items (DEL or BACKSPACE)
zoom in and out (+ / -)
run the pipeline (F5)
open this tutorial (F1)

Using the mouse+keyboard, you can

copy a node's parameters to another node (by holding down CTRL while creating an edge). Only parameters with identical names will be copied, e.g., 'fixed_modifications'. The edge will be colored dark magenta to indicate parameter copying.

TOPPAS Menus

Main Menu bar:

In the File menu, you can

create a new, empty workflow (New)
open an existing one (Open)
open an example file (Open example file)
include an existing workflow in the current workflow (Include)
save a workflow (Save / Save as)
export the workflow as image (Export as image)
refresh the parameter definitions of all tools contained in the workflow. This is useful to make old pipelines run on the latest OpenMS/TOPPAS versions (Refresh parameters)
close the current window (Close)
load and save TOPPAS resource files (.trf) (Load / Save TOPPAS resource file)

In the Pipeline menu, you can

run a pipeline (Run)
abort a currently running pipeline (Abort)

In the Windows menu, you can

show or hide the TOPP tool list window on the left, the description window on the right, and the log message window at the bottom.

In the Help menu, you can

go to the OpenMS website (OpenMS website)
open this tutorial (TOPPAS tutorial)

Context menus:

In the context menu of an input node, you can

specify the input files
open the specified files in TOPPView
open the input files' folder in the file manager (Windows Explorer, MacOS Finder, etc.)
toggle the "recycling" mode
copy, cut, and remove the node

In the context menu of a tool, you can

configure the parameters of the tool
resume the pipeline at this node
open its temporary output files in TOPPView
open the temporary output folder in the file manager (Windows Explorer, MacOS Finder, etc.)
toggle the "recycling" mode
copy, cut, and remove the node

In the context menu of a Merger or Collector, you can

toggle the "recycling" mode
copy, cut, and remove the node

In the context menu of an output node, you can

open the output files in TOPPView
open the output files' folder in the file manager (Windows Explorer, MacOS Finder, etc.)
copy, cut, and remove the node