Only when certain events recur in accordance with rules or regularities, as is the case with repeatable experiments, can our observations be tested — in principle — by anyone.

Karl Popper, 1959

#### CITK — Why?

Research on autonomous robots and human-robot interaction with systems that integrate a large number of skills in a single architecture has achieved considerable progress in recent years. Reported research results are typically validated through experimental evaluation or demonstrated live at robotics competitions such as the DARPA Robotics Challenge, RoboCup or RockIn. Given the complexity of these systems, many of the described experiments cannot easily be reproduced by interested researchers to confirm the reported findings. We consider this a critical shortcoming of research in robotics since replicable experiments are considered good experimental practice in many other research disciplines. Despite this observation, robotics has already made significant progress. This trend can mainly be attributed to the following developments. Firstly, diverse "off-the-shelf" robots have become available that ideally only need to be unboxed and powered on, e.g., the PeopleBot, PR2, NAO, or iCub. These are often available in simulation as well. Secondly, open source and community-driven software ecosystems, established frameworks, and libraries are available, such as ROS, OPRoS or Orocos, which support researchers by providing sophisticated software building blocks.

Lastly, dedicated activities towards systematic benchmarking of robotic systems have been carried out in terms of toolkits for benchmarking and publicly available data sets, e.g., the Rawseeds Project. From our point of view, these are promising developments that foster reproducibility in terms of hardware as well as software aspects. However, besides these initiatives, there are also more fundamental methodological issues that prevent reproducibility of robotic system experiments. One example is deficiencies in experimental methodology, including the frequently neglected impact on experiments caused by the relationship between individual components and the whole system, as well as the way publications need to be written in order to improve reproducibility. We identified the following four issues that are critical with respect to sustainable reproducibility of robotic system experiments:

• i) Information retrieval and aggregation: Publications and associated artifacts relevant for reproduction (software components, data sets, documentation and related publications) are often distributed over different locations, such as digital libraries or diverse websites. Hence, even the discovery, identification and aggregation of all required artifacts is difficult. Furthermore, this kind of information is typically not available in a machine-interpretable representation.
• ii) Semantic relationships: Often, crucial relationships between artifacts are unknown or underspecified, for example: which specific versions (master or 1.33.7) of software components, in combination with which data set, hardware or experiment variant, were in use for a particular study?
• iii) Software deployment: Most current systems are realized using a component-based architecture and usually not all components are written in the same language. Consequently, they do not make use of the same build infrastructure, binary deployment mechanism, and execution environment. Therefore, it is an inherently complex and labor-intensive task to build and distribute a system in order to reproduce experimental results. This becomes even more complex when experiments require software artifacts from more than one ecosystem because there is usually no cross-ecosystem integration model.
• iv) Experiment testing, execution and evaluation: Advanced robotics experiments require significant effort for system development, integration testing, execution, evaluation and preservation of results. This is particularly costly if many of these tasks are carried out manually, which is surprising, as established methods from software engineering are available to automate these tasks, e.g., based on the continuous integration (CI) paradigm. To the best of our knowledge, these techniques have so far not been widely adopted by the general community for the iterative design, automated execution and ensured repeatability of robotics experiments.

To tackle these issues we introduce an approach for reproducible robotics experimentation based on an integrated software toolchain for system developers and experiment designers.

#### Exemplary User Stories

##### Developer View

Steve is a researcher in the field of robotics. He has developed a novel method and a corresponding system with the help of a newly introduced software tool chain, the CITK. The incorporated software tools and the CITK's underlying iterative development process allowed him to consistently specify his system setup in a structured manner by providing textual and formalized specifications of required software component versions, their dependencies and associated artifacts, such as configuration files and runtime parameters. Furthermore, during the development phase the tool chain enabled him to *iteratively* extend the system specification, automatically deploy new versions, record and verify intermediate results, and frequently test new algorithms or variations with little effort. Once Steve was satisfied with the results of his method, he fixed the required artifact versions of all included system components. He also recorded multiple data sets during experimentation that, when replayed, provide realistic data streams in order to repeatedly produce the desired system behavior (repeatability). The CITK tool chain also allowed him to automatically publish all required system artifacts, based on the textual representation, in a dedicated web-based catalog platform. The web-based catalog also includes instructions on how to install his entire system and how to run his experiments in a simulation environment using the recorded data sets. Fortunately, a paper about his method has been accepted at a major robotics conference. Before submitting his final version, Steve provided a reference to the URL of the system's catalog entry in his paper. In the catalog, he additionally linked other relevant research artifacts, such as related data sets, previous publications, and similar systems. Eventually, Steve presents his new method at the conference.
Since Steve encourages the idea of open and reproducible research, his publication is made openly available. He also adds the paper to the system entry in the CITK web catalog.

##### User View

Kerry is also a researcher in the field of robotics. She attended Steve's talk at the conference, which is related to the field she is working on, and hence has great interest in it. A new method as introduced in Steve's talk may foster Kerry's further research. Back at her office, Kerry quickly locates Steve's openly available paper on the web. Kerry is able to obtain the full version that includes a link to a web catalog platform. She browses the catalog, which provides detailed information about how to technically reproduce Steve's software system using a tool chain called the CITK. Following these instructions, she first fetches a software tool set that is explained in the web catalog's tutorials and, by utilizing it, she downloads Steve's source code, binary artifacts, related data sets and material. The instructions also explain how to easily install the system on her machine by merely executing a few command lines. Moreover, the catalog provides instructions on how to execute Steve's system, including a simulation environment, in order to reproduce (reproducibility) the results reported in the associated publication. After installing Steve's system as referenced in the publication, she runs his experiment by executing *just one command line*. At the end of the simulation run, plots identical to the ones in Steve's publication are automatically generated based on her experiment run. Kerry now investigates other components listed in the catalog and reconstructs their functionality. After a while she is able to alter the input parameters of diverse software components involved in the experiment. In this way, she is able to change the experiment's outcome to approximate her research scenario. Kerry is impressed. Thus, for the next step in her research, she builds up a system based on the CITK tools and process. Kerry studies further developer tutorials and "getting started" pages in the catalog.
She also subscribes to developer mailing lists in order to connect to the community. After a short while Kerry is able to reuse and further extend Steve's system, as well as to integrate her own algorithms and data sets. After extensively testing her system using the CITK's testing and evaluation facilities, she also publishes her system in the web-based catalog and links it to Steve's system. After a few days Steve checks his system entry and notices Kerry's new entry as well as the added link to his system. Steve decides to replicate Kerry's system and to check her results and added functionalities.

##### Project View

Funding has been granted to a cooperative robotics research project. The two participating universities are located at different sites and thus do not share a co-located infrastructure or laboratory. However, the two institutions use the same robot platform in their project. Since they share the same research goals, the partners need to exchange software, data sets and results. Both partners are required to develop new algorithms and utilities based on their already established build systems and software ecosystems. While one partner completely relies on a widely used ecosystem, the other has its own "in-house" build and runtime environment. In order to minimize the transfer and integration overhead, the development of the project's demonstrator scenario is based on the CITK. By using the CITK, the contributed software components are modeled individually in the CITK's tool chain, but can easily be exchanged and merged by setting up a centralized repository for individual system descriptions. Thus, independently developed sub-systems and milestone demonstrators are also based on the CITK's system specification. This enables both universities to seamlessly install, test and verify each other's work packages. Furthermore, studies that have been conducted using the CITK, including the recorded data and evaluation scripts, can be reliably reproduced. Lastly, project members are able to browse and inspect diverse software components and other material in the web-based catalog. That way, student assistants also get easy access to all required resources and gain a complete overview of a (sub-)system before they start helping in the project. The students can even install a software system on their own machines, e.g., a laptop, and become familiar with it before working on the actual production system.

#### The Concept

In general, the CITK process incorporates a set of software utilities and methodologies to enable researchers to aggregate artifacts used in computational science: programs and programming languages, frameworks, libraries, and associated artifacts such as publications, data sets, experiment protocols and user documentation. These tools and artifacts are considered to be part of the daily working routine, environment, and infrastructure of a researcher. By utilizing these tools and data sources, researchers create stand-alone software components and often also entire integrated systems. In combination with a dedicated hardware stack (robot), these systems are used to investigate their research questions. The collection of tools and related artifacts is oftentimes novel, prototypical, unsystematic and not well documented (cf. Mesirov). This situation leads to isolated knowledge about the internals of a system, which in turn makes it difficult for others to reproduce.

At the beginning of a project, researchers usually identify an initial set of software and hardware components that are already available and that will be used to assemble a first prototype system to build upon. As mentioned earlier, these components usually form an unsystematic collection that originates from different fields or departments and comes with highly diverse requirements (Figure (1)). One could also think of this initial assembly of components as an unstable, unordered list without any explicit relationships, for instance in terms of a shared integration model (Figure (2)). In this context, the CITK system descriptions and associated tools and platforms support a broad and open set of entities, i.e., they are not limited to a particular programming language, build environment, data structure or robot.

Hence, the CITK process engages at an early stage, during the system planning, design, and implementation phase. To this end, the CITK provides facilities that allow for a systematic work flow to describe and aggregate the required hardware, software components, their dependencies, and related artifacts. This is achieved by creating structured, versioned, reusable, and machine-interpretable text files called "recipes", which represent software components and artifacts (configuration files, data sets, etc.). Compositions of diverse software components and artifacts ultimately form entire system descriptions, which are referred to as "distributions". The template-based recipes and distributions are collected in a central public web repository (Figure (3)). Openness is achieved by providing generic and user-extensible recipe templates and by following a strategy of inclusion rather than exclusion, for instance with respect to supported build environments. Distributions must include all necessary artifacts that are required to build, test and run a robotics system experiment, including an executable protocol of how and what software is required to be executed (cf. scientific work flow engines) in order to obtain the published results.

To make the transition from a textual system description (distribution) to an applicable and directly usable robotics system, the CITK proposes an iterative development process which is aligned with current best practices in both software and robotics engineering. In a first step, by utilizing the CITK buildjob-generator tool, a researcher's system description is transformed into so-called build jobs, which are uploaded to a Continuous Integration server (Figure (4)(1)).

    "Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early."


A CI server is a stand-alone program that is capable of downloading source code or any other content from the web (or local file system) to check and verify the current status of the content. Depending on the content type, the check or verification process may be "just" a compilation process, a unit test execution, a simple file consistency check, or any arbitrary user-defined method. A CI server usually features a graphical user interface and notification capabilities in order to report the status of build jobs to its associated developer(s). After the transformation and upload via the buildjob-generator, each software component and artifact (downloadable data set, configuration file, executable experiment work flow) is represented as a single build job on the server. Build jobs can be triggered frequently based on a timer, by changes in the corresponding repository, or manually by the researcher. In order to install all required software components and artifacts (the system), a so-called "build-flow" job, which is also generated by the buildjob-generator, needs to be triggered. After the build-flow job has been activated, all subsequent jobs are automatically invoked in the correct order. If all jobs have run successfully (i.e., no errors occurred), the system is installed in an arbitrary location.
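
The behavior of such a build-flow job can be illustrated with a minimal sketch. It is not the CITK implementation: the job names, the dependency table and the `run_job` callback are hypothetical, and the ordering simply relies on Python's standard `graphlib` module (Python 3.9 or newer).

```python
from graphlib import TopologicalSorter

# Hypothetical build jobs: each job maps to the jobs that must finish first.
JOB_DEPENDENCIES = {
    "middleware": set(),
    "vision-component": {"middleware"},
    "dataset-download": set(),
    "experiment-workflow": {"vision-component", "dataset-download"},
}


def run_build_flow(dependencies, run_job):
    """Invoke every job after its prerequisites and stop at the first failure.

    `run_job` is a callback that returns True if the job succeeded.
    Returns the jobs that completed successfully, in execution order.
    """
    completed = []
    for job in TopologicalSorter(dependencies).static_order():
        if not run_job(job):
            break  # a failed job means the system is not (fully) installed
        completed.append(job)
    return completed


# Stand-in for a real build step: every job "succeeds".
order = run_build_flow(JOB_DEPENDENCIES, run_job=lambda job: True)
print(" -> ".join(order))
```

Stopping at the first failure mirrors the statement that the system counts as installed only if all jobs ran without errors.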

CITK-enabled software components and systems can be tested and validated according to appropriate quality measures in an automated and frequent way. This mechanism is inspired by the aforementioned Continuous Integration paradigm. On the one hand, because these are software-intensive systems, validation measures can be derived from simple native metrics such as "compiles without warnings or errors" or "all unit tests pass". On the other hand, the CITK process extends the CI-based testing and verification process to the automated execution and verification of claims made in associated publications (Figure (4) 01). How does this work?

For instance: algorithm A as described in the referenced paper P produces the desired output O by using data set D.

Therefore, using a CI server, an entire software system is frequently compiled (if necessary), executed, and evaluated according to the metrics (claims, in this case) presented in the corresponding publication. For native metrics, with respect to single software components, method and tool support is already well established in the software engineering community, e.g., via static (check-style) and dynamic (unit) tests. In contrast, the verification of scientific claims relies on methods and metrics that often depend on the research question and are not standardized. Therefore, they are defined in and implemented by the investigating researcher's software stack (cf. Bonsignorio et al.). In the CITK, the evaluation methods and tools, such as R, Matlab, SPSS, Python or shell scripts, are encapsulated in recipes that are part of the system's distribution file. These scripts, executables or work flow files provide means to evaluate the data produced while running a CITK-enabled system. As a reminder, all necessary artifacts that are required to build, test and run a robotics system experiment, including an executable protocol, are part of a CITK system description. The execution protocol, i.e., the correct start-up and execution sequence of all required software components (e.g., command lines), as well as automated shut-down and reporting capabilities are provided by another CITK tool called Finite State Machine Based Testing (FSMT).

FSMT experiment runs are also represented as recipes and are therefore transformed into build jobs. Hence, by triggering an experiment build job, a complete system is started, evaluated and shut down automatically. During an FSMT build job run, the data is evaluated by i) running a simulation or instantiating the CI server (build job) on a physical robot, ii) executing the incorporated evaluation tools and iii) generating appropriate reports, e.g., text files, plots, or other artifacts, on the CI server. The reports of each experiment run are archived on the CI server, and thus a history over multiple runs can be accessed by the developing researcher, or even by others.
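
To make the idea of an automatically verified claim ("algorithm A produces output O on data set D") more concrete, the following sketch shows what a small evaluation script could look like. Everything in it is an illustrative assumption: the stand-in algorithm, the sample values, the error metric and the tolerance are hypothetical and not taken from the CITK.

```python
def mean_absolute_error(produced, reference):
    """Average absolute deviation between produced and reference values."""
    if len(produced) != len(reference):
        raise ValueError("produced and reference output differ in length")
    return sum(abs(p - r) for p, r in zip(produced, reference)) / len(reference)


def verify_claim(algorithm, data_set, reference_output, max_error):
    """Run `algorithm` on `data_set` and compare with the published output.

    Returns a tuple (claim_holds, measured_error).
    """
    produced = algorithm(data_set)
    error = mean_absolute_error(produced, reference_output)
    return error <= max_error, error


def double_values(samples):
    """Hypothetical stand-in for algorithm A."""
    return [2.0 * sample for sample in samples]


recorded_data = [0.5, 1.0, 1.5]      # stand-in for data set D
published_output = [1.0, 2.0, 3.0]   # stand-in for output O

holds, error = verify_claim(double_values, recorded_data, published_output,
                            max_error=1e-9)
# CI servers commonly treat a non-zero exit status as a failed build job.
exit_status = 0 if holds else 1
print(f"claim holds: {holds}, error: {error}, exit status: {exit_status}")
```

Mapping the result to an exit status is one simple way to let a build job report whether the published claim still holds after a change.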

Furthermore, the CITK development process facilitates the CI-based deployment of an entire system, e.g., in terms of a persistent installation on a researcher's machine or a publicly accessible network storage. Thus, after testing and validating the system on the server, the deployment step can also be used to provide a shared installation in order to give other researchers access to it. The deployment step is also repeatable, and of course automatable (Figure (4) 02). The development process also foresees a dedicated and necessary versioning step where all software components and artifacts are tagged (fixed) and the corresponding recipe/distribution file is updated according to the current version, e.g., version-1.1, stable, nightly (Figure (4) 03). By using the CITK tool set, these incremental system versions are ideally "instantly recoverable" or, in other words, technically reproducible.

Lastly, in this process the aforementioned system descriptions and corresponding build jobs are interpreted, aggregated and uploaded to a human-readable web-based representation. This web-based representation serves as a "catalog" for scientific software systems, hardware, experiments, and other relevant artifacts. The catalog also allows importing third-party content, e.g., associated publications and data sets, from their original data sources, for instance from publicly available digital libraries. Moreover, a CITK user is able to cross-reference different yet related entries in the catalog, even if they were not authored by him or her. This link mechanism is enabled across all entries in order to foster reuse and to create a network of scholarly artifacts between applied technologies, systems, and fields. This allows outside researchers to conveniently browse and discover software components, experiment results (the status is imported from the CI server), and entire systems including their dependencies, metadata, and the context in which they have been applied (Figure (4) 04). The upload and synchronization to the catalog is again realized using a dedicated build job on the CI server.

Hence, in contrast to related work, the CITK development process incorporates a complete ecosystem of tools and services that foster the retrieval, the creation and, most importantly, the actual replication of intermediate (in the development phase) and final published research findings in software-intensive robotics experiments. The ultimate goal of the CITK is to enable researchers to use the same tool set for system development and replication (cf. the user stories above). Given the combination of a dedicated set of tools and the previously depicted iterative development process, the reproduction and verification work flow is already embedded in a researcher's daily routine. Cooperating researchers are able to include their data sets, software components, and evaluation methods starting at a very early stage of a project. Even if the robot hardware is not yet available, a simulator and the corresponding software stack including evaluation scripts can be part of the corresponding distribution file. As mentioned earlier, this setup can be used to, e.g., verify first experimental control algorithms or simulated user studies iteratively, running through the whole tool chain: system description, transformation, testing, verification, deployment, and catalog upload.

Subsequently, all these tools, resources, and methods can be openly shared with other outside researchers and also allow them to reproduce findings quickly using the same methodology. No additional effort is required in order to make findings reproducible in retrospect.

#### The Model

The fundamental basis of the model, which has been derived from the CITK vision, is a system version (figure above). A system version mandatorily consists of multiple software component versions --- the functional building blocks of a system. Versioning is enforced by, e.g., referencing specific source code repository tags and branches, commits or binary release identifiers. As proposed in the vision, these software components and their configuration must be continuously built (compiled) and tested in a suitable infrastructure in order to provide functional monitoring. Thus, a software component version must always have a testing process artifact attached to it. Here, corresponding artifacts are, e.g., hyperlinks to the current build and unit test results.

Moreover, the documentation of a system version (e.g., user documentation, hardware documentation, and how-tos) and the documentation of incorporated components (API docs, etc.) must be versioned and is hence system-version-specific too. These documentation artifacts are, e.g., PDFs, plain text documents, and HTML pages. Hyperlinks to those documents must be referenced in the corresponding entities, such as a software component version. In the ideal case, a system version eventually produces data which leads to a publication. This publication and the obtained data sets are also part of the artifact model and must be included and, more importantly, linked from their original location to the system version.

An experiment description or protocol that was followed in order to acquire published results in a corresponding computational experiment must also be provided along with, and linked to, a system version. This protocol can be a textual representation but ultimately must be transformed into a digital representation, i.e., an executable work flow engine file. If feasible, the entire experiment must also be executed regularly. Here, the work flow file in combination with a linked data set can be used, in simulation for instance, to provide functional verification and to prevent regressions over time. Results derived in the verification process, such as experiment-related archives, plots and images, must also be attached to the associated system version.

Lastly, a textual description of the utilized hardware including links to the documentation of customizations or deviations from the delivery status must be noted and linked to the system version.
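
The constraints of this artifact model can be summarized in a small data structure. The sketch below is only an illustration: the class names, field names and example URL are assumptions, not the schema used by the CITK catalog.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComponentVersion:
    name: str
    revision: str                  # tag, branch, commit or binary release id
    test_results_url: str          # link to current build and unit test results
    documentation_urls: tuple = ()


@dataclass
class SystemVersion:
    name: str
    version: str
    components: list
    hardware_description: str = ""
    experiment_protocol_url: str = ""
    publication_urls: list = field(default_factory=list)
    data_set_urls: list = field(default_factory=list)

    def missing_artifacts(self):
        """Return descriptions of model constraints this entry does not meet."""
        problems = []
        if not self.components:
            problems.append("no software component versions")
        for component in self.components:
            if not component.test_results_url:
                problems.append(f"{component.name}: no testing process artifact")
        if not self.experiment_protocol_url:
            problems.append("no experiment protocol")
        if not self.hardware_description:
            problems.append("no hardware description")
        return problems


# Hypothetical entry; the component name and URL are placeholders.
example = SystemVersion(
    name="flobi",
    version="0.1",
    components=[
        ComponentVersion("head-control", "0.1",
                         "https://ci.example.org/job/head-control/"),
    ],
    hardware_description="Robot head in delivery state, no customizations",
)
print(example.missing_artifacts())
```

A check like `missing_artifacts` makes the "must" statements of the model explicit; publications and data sets stay optional here because a system version only produces them in the ideal case.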

#### The Concept

Usually, the starting point from a researcher's or engineer's point of view (Steve) is the source code of her or his software components that are to be integrated into a system or scenario (figure above (1) orange path). These components are often located in different distributed repositories and are usually written in different languages and thus make use of diverse build environments (CMake, Setuptools, etc.). For instance, ROS provides Catkin as the build system and NAOqi promotes qiBuild. Both solutions have chosen CMake, a standard build system for cross-platform C++ builds, as their basis. While such a solution is straightforward, it comes with several drawbacks. Firstly, developers have to learn a new technology, which sometimes results in a refusal to integrate at all. Secondly, established build systems, especially for languages other than C++, are locked out.

##### Recipes and Distributions

The CITK addresses this issue by applying a recipe-generator-based solution (2). A researcher who is familiar with her or his software stack and build environment initially creates minimalistic so-called "recipes". Recipes are template-based text files written in JSON syntax. Here, the logic (and required knowledge) of the utilized build environments is encapsulated in per-build-system templates. In the most minimal version of a recipe for a software component, a researcher needs to provide a) a build system identifier, such as CMake, b) the location (URL) of her or his source code, and c) the available branches or tags. Since new templates can be added on the fly, additional build systems can be incorporated easily without restricting component developers to certain choices. At the time of writing, templates for these common build systems are supported: CMake (C/C++), Catkin (ROS), Maven (Java), Setuptools (Python) and Autotools (C). For instance, a cooperating researcher who is not familiar with CMake but is interested in a component that uses CMake for compilation and installation just needs to "invoke the desired recipe" since the underlying command logic, e.g.,

    cmake -D$build-variables .. && make && make install

is abstracted by the corresponding template. The clear benefit of this solution is that researchers are able to use a shared infrastructure in order to integrate their (and others') tools and components, even without detailed technical knowledge of the build environment in use. Additionally, recipes also feature a "freestyle" template. In freestyle mode, an extra field in the recipe can be used to execute any UNIX command. This feature can be used, for instance, to download, extract, and install related data sets. Lastly, required operating system dependencies, e.g., Debian packages, are defined in a component recipe.

The figure above (3) depicts a system composite. A system composite is an aggregation of multiple recipes and thus constitutes a complete software system. In the context of the CITK this composition is called a distribution. Besides aggregating all required software components, data sets, and additional freestyle artifacts, a distribution also defines the versions of the included software components by referencing branches or tags denoted in the recipes. Therefore, the distribution itself always represents a specific version, e.g., Flobi-0.1. Moreover, a distribution may additionally define fundamental operating system dependencies, e.g., the build-essential package, and the install prefix. The install prefix is the operating system path where an entire distribution is installed. The CITK concept supports multiple installations in different prefixes in order to allow distributions to be self-contained, e.g., in the researcher's home folder, such as $USER/flobi-0.1/bin. This mechanism also prevents interference with the base operating system. In addition, such an installation strategy does not require administrative permissions.
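
How a template could turn a minimal recipe into concrete commands can be sketched as follows. This is a hedged illustration only: the JSON field names, the template table and the `build_commands` helper are hypothetical and do not reproduce the actual CITK recipe format. Only the `git`, `cmake`, `configure` and `make` invocations are standard usage.

```python
import json
import shlex

# Hypothetical per-build-system templates; {prefix} is the install prefix.
TEMPLATES = {
    "cmake": ["cmake -DCMAKE_INSTALL_PREFIX={prefix} ..", "make", "make install"],
    "autotools": ["./configure --prefix={prefix}", "make", "make install"],
}

# Hypothetical minimal recipe: build system, source location, versions.
RECIPE_JSON = """
{
  "name": "example-component",
  "build-system": "cmake",
  "url": "https://example.org/example-component.git",
  "branches": ["master"],
  "tags": ["0.1"]
}
"""


def build_commands(recipe, version, prefix):
    """Expand the recipe's build system template for one pinned version."""
    known_versions = recipe.get("branches", []) + recipe.get("tags", [])
    if version not in known_versions:
        raise ValueError(f"{recipe['name']} has no branch or tag {version!r}")
    checkout = "git clone --branch {} {}".format(
        shlex.quote(version), shlex.quote(recipe["url"])
    )
    steps = TEMPLATES[recipe["build-system"]]
    return [checkout] + [step.format(prefix=shlex.quote(prefix)) for step in steps]


recipe = json.loads(RECIPE_JSON)
for command in build_commands(recipe, version="0.1", prefix="/tmp/flobi-0.1"):
    print(command)
```

The version argument plays the role of the distribution, which pins each component to one of the branches or tags listed in its recipe, and the prefix argument corresponds to the distribution's install prefix.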

##### Continuous Integration and Build Jobs

The next step of the CITK concept is depicted in the figure above in (4). Here, a newly implemented generator tool is executed which uses the aforementioned distribution, and subsequently the incorporated recipes, as input and generates so-called build jobs for a Continuous Integration (CI) server. Usually, a CI server is a web-based service that hosts several build jobs. A build job, by convention, defines how a piece of software is built (i.e., compiled), tested, and deployed. In most CI server implementations build jobs can be chained; thus, a build job hierarchy can be established. The hierarchy is mostly defined manually and based on developer knowledge, e.g., component A must be built before component B, otherwise the build job for B will fail. Moreover, a CI server provides a web front-end for visual job monitoring, i.e., to inspect whether a job failed or succeeded. If a build job fails, developers can be automatically notified by the CI server via email and react instantly. This notification capability is the core of the Continuous Integration paradigm, which allows build jobs and chains to be triggered automatically (on a schedule or upon code check-in) in order to get immediate status feedback. In the CITK, however, the generator tool derives the dependencies and required build steps automatically from the included recipes, which augment the artifact model, and from an automatic source code repository analysis. Thus, no manual developer interaction is required in order to correctly chain all generated jobs.
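
One way to picture the automatic derivation of the job hierarchy is sketched below. It assumes, purely for illustration, that every recipe lists the features it provides and requires. These field names and the matching strategy are hypothetical; the actual generator additionally analyzes the source code repositories.

```python
def derive_upstream_jobs(recipes):
    """Map each component to the components that must be built before it.

    A requirement that no recipe in the distribution provides is ignored
    here, under the assumption that it is an operating system dependency.
    """
    provider_of = {}
    for name, recipe in recipes.items():
        for feature in recipe.get("provides", []):
            provider_of[feature] = name

    upstream = {}
    for name, recipe in recipes.items():
        required = recipe.get("requires", [])
        upstream[name] = sorted(
            {provider_of[f] for f in required
             if f in provider_of and provider_of[f] != name}
        )
    return upstream


# Hypothetical recipes of a small distribution.
RECIPES = {
    "component-a": {"provides": ["liba"], "requires": ["libc6-dev"]},
    "component-b": {"provides": ["libb"], "requires": ["liba"]},
    "experiment": {"provides": [], "requires": ["liba", "libb"]},
}

for job, parents in derive_upstream_jobs(RECIPES).items():
    print(job, "<-", parents)
```

With such a table, the statement "component A must be built before component B" no longer has to be maintained by hand; it follows from what the recipes declare.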

##### Deployment

Afterwards, the corresponding CI server build jobs are instantiated according to user-defined build templates and uploaded to a running CI server instance. Moreover, different jobs can be generated from the same knowledge base, either for deploying a system in an arbitrary self-contained prefix or for supporting the ongoing development process, e.g., by running the implemented unit tests of each component and skipping the install step. Since setting up an appropriate CI server with all required plug-ins takes some time and requires expert knowledge, the CITK provides pre-packaged installations for new users. This adds the benefit of using the same build and deploy mechanism to deploy either globally, for instance in a prefix that is accessible by a whole department, or locally, such as in the user's home folder for local testing and experimentation. In the latter case, a researcher simply starts the CI server and executes the generator, which is included in the pre-packaged CI download, using the desired distribution file as input. After executing these steps, all required software components, configuration, and related artifacts are installed in the local file system. Lastly, required hardware, e.g., cameras, laser scanners, or even an entire robot, can be connected to the machine on which the CI server is running and can hence be accessed by the software stack that has just been installed. Thus, "hardware-in-the-loop" is also supported by the CITK concept. If feasible, the CITK tool chain can also be instantiated on the target robot's hardware itself (5).

##### Verification and Experiments

At this point the focus returns to (2), because besides gathering information about a system and automatically deploying it, successful reproduction also includes repeating tests and experiments. This is necessary to ensure the intended system responses. It is well known that sound experiments imply a well-defined protocol. Unfortunately, experiment execution and testing are mostly carried out manually and are thus infrequent and prone to user-induced errors. This is especially the case because test setups usually require a high level of technical understanding from the operator. Therefore, the CITK suggests transferring the concept of an experiment protocol to the orchestration of the software components involved in an experiment, and executing, testing, and evaluating software-intensive experiments in an automated manner. In the CITK, researchers or engineers use a newly implemented framework called Finite State Machine Based Testing (FSMT). FSMT is a lightweight, event-driven software tool that implements the aforementioned suggestions based on a finite state machine that formalizes experiment execution with respect to the configuration and orchestration of invoked software components. It supports automated bootstrapping (startup), evaluation, and shutdown of an entire software system used in an experiment. An FSMT experiment description includes three mandatory parts: a) an environment definition, e.g., prefix path, required environment variables, or runtime configurations, b) software component descriptions, such as the path to an executable plus command line arguments (configuration!), and c) success conditions. A success condition is, e.g., whether the PID of a component's process is present within a given time frame. Furthermore, FSMT may check the standard output and standard error streams of components for a developer-defined prompt (expert knowledge), again within a specified time frame.
Based on the results of multiple success criteria checks, FSMT will either block or advance in the execution of the state machine. If a criterion is not satisfied after a given timeout, FSMT will abort the experiment to prevent subsequent failures.
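
A success condition of the kind described above, a prompt that must appear on a component's output within a time frame, can be sketched in a few lines of Python. This is not FSMT's implementation or configuration format: the function name `wait_for_prompt`, the example command, and the prompt string are assumptions chosen for illustration.

```python
import queue
import subprocess
import sys
import threading
import time


def wait_for_prompt(command, prompt, timeout):
    """Start `command`; return True if `prompt` shows up on its output in time."""
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward output lines so the main thread can wait with a timeout.
        for line in process.stdout:
            lines.put(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    deadline = time.monotonic() + timeout
    found = False
    while not found:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            line = lines.get(timeout=remaining)
        except queue.Empty:
            break
        found = prompt in line

    # This sketch always stops the component; a real run would keep it alive.
    if process.poll() is None:
        process.terminate()
    process.wait()
    reader.join(timeout=1)
    process.stdout.close()
    return found


# Hypothetical component that announces readiness and then keeps running.
component = [sys.executable, "-c",
             "import time; print('ready', flush=True); time.sleep(30)"]
print(wait_for_prompt(component, "ready", timeout=10))
```

In a full experiment the result of such a check would decide whether the state machine advances to the next component or aborts.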

After specifying the environment, the components, and their success criteria, a researcher determines the execution order (orchestration) of the components. Here, empirical scientists may provide their own scripts, which are already included in the distribution, to evaluate the data produced in each run or trial, for instance to generate plots. These scripts can be defined and executed like regular system components once the system is completely bootstrapped and is therefore producing data. After a user-defined period of time, and only if all success criteria are met, the system is automatically shut down and all system output, such as log files and plots, is saved and archived. This formalization of an experiment protocol allows test results to be reproduced consistently in an automated fashion, which makes it well suited for running on a CI server, as is the case in the CITK concept. Since FSMT experiments are included in the distribution and deployed via the CI server, they are available in the distribution prefix. A researcher may trigger experiments like regular build jobs, and the CI server will report the outcome of the experiment. This enables empirical scientists to easily execute tests and prototype experiments consistently without having to acquire additional knowledge about technical details of the system.
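
The experiment protocol described above can be pictured as a small finite state machine. The sketch below is a simplified illustration in Python, not FSMT's actual state model: the state names, the event names, and the `run_experiment` helper are assumptions, and components are reduced to callables that report whether their success criteria were met.

```python
from enum import Enum, auto


class State(Enum):
    BOOTSTRAP = auto()
    RUNNING = auto()
    SHUTDOWN = auto()
    ABORTED = auto()


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.BOOTSTRAP, "criteria_met"): State.RUNNING,
    (State.BOOTSTRAP, "criteria_failed"): State.ABORTED,
    (State.RUNNING, "time_elapsed"): State.SHUTDOWN,
}


def run_experiment(components, collect_data):
    """Start components in order; abort on the first failed criterion.

    `components` is an ordered list of (name, criteria_met) pairs, where
    criteria_met() returns True once the component satisfies its criteria.
    """
    state = State.BOOTSTRAP
    started = []
    for name, criteria_met in components:
        started.append(name)
        if not criteria_met():
            return TRANSITIONS[(state, "criteria_failed")], started
    state = TRANSITIONS[(state, "criteria_met")]
    collect_data()  # the system is fully bootstrapped and producing data
    state = TRANSITIONS[(state, "time_elapsed")]
    return state, started


final_state, started = run_experiment(
    [("camera", lambda: True), ("tracker", lambda: True)],
    collect_data=lambda: None,
)
print(final_state, started)
```

Keeping the transitions in a table makes the protocol explicit: an experiment can only reach `SHUTDOWN` by passing through `RUNNING`, which in turn requires every component's criteria to be met.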

Finally, once all CI jobs have been generated, the system has been deployed, the experiment has been formalized, tested, and executed, and data sets have been recorded, the CITK entity model is satisfied. At this point a researcher can synchronize the data to a web-based catalog that represents the current status of the system version, including all artifacts according to the CITK reproduction process, in a human-readable form. Content in the web-based catalog is, for the most part, automatically generated from distributions. The web catalog is also entity-based, and entries, e.g., system versions, included software components, and CI jobs, are automatically interlinked. Moreover, data sets, FSMT descriptions, and publications according to the CITK entity model can be linked by the researcher. Entity-specific information, such as source code repository locations, links to wiki pages, API documentation, and the information necessary for the reproduction of these systems, is provided per entry where applicable.

##### The user's point of view

From the point of view of a user who is interested in replicating an experiment in order to verify reported results (Kerry), the starting point is the web-based catalog that has been referenced in a corresponding publication (figure above (1) blue path). The catalog enables her/him to browse and search for software components or complete systems and their related artifacts as described above. In this case she/he is pointed directly to a system version. The first step is to browse the catalog and load the page that contains a general textual description of the system and information about the included software components as well as the dependencies required on her/his Linux computer. The next step is to install the required dependencies using the package management tool of the Linux distribution. Afterwards, she/he follows the link to the build generator recipes comprising the system and downloads the required recipes. In order to generate CI server jobs from the recipes, a CI server installation as well as the generator are required. Download instructions for a pre-packaged CI server are provided on the website, and the user follows these instructions to install the environment on her/his computer. After starting the local CI instance, the generator needs to be invoked to configure the CI server with the jobs for the system (2).

The last step is to start the installation process. To do so, the user opens the web page of the local CI instance and starts the orchestration job for the distribution. After this job has finished, the system is deployed on the user's computer (3). Now the user can focus on reproducing the experiments that have been defined and deployed along with the system. These experiments are available as configuration files for the FSMT testing framework. FSMT and the configurations have just been installed as parts of the software distribution for the system. The web catalog entry for the system version lists the available experiments, and each experiment comes with a description of how to execute it. In this case this is merely a command line to launch FSMT or a manual trigger of a CI build job. The user executes the described commands and FSMT reports the results through a report (xUnit, for instance) and the process return code. After the experiment has been executed successfully, the resulting output, e.g., in the form of a plot, can be found on the user's computer (as well as all required logs and data files). Besides the FSMT report, the generated plot can be compared to a reference plot from the catalog entry of the experiment (4).
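
As a rough illustration of how a user might inspect such a report, the following Python sketch summarizes an xUnit-style XML document. The report content, the test case names, and the `summarize_report` helper are hypothetical; the exact layout of the reports FSMT writes is not specified here, so the common JUnit/xUnit structure is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical report in the common JUnit/xUnit XML style.
REPORT = """<testsuite name="experiment" tests="2" failures="1" errors="0">
  <testcase name="component_started"/>
  <testcase name="prompt_seen"><failure message="timeout"/></testcase>
</testsuite>"""


def summarize_report(xml_text):
    """Return the number of test cases and the names of the failed ones."""
    root = ET.fromstring(xml_text)
    cases = list(root.iter("testcase"))
    failed = [
        case.get("name")
        for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"total": len(cases), "failed": failed, "passed": not failed}


summary = summarize_report(REPORT)
print(summary)
```

A check like this complements the process return code: the return code says whether the experiment succeeded, while the report shows which criterion failed.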