# Extending and Applying Spartan to Perform Temporal Sensitivity Analyses for Predicting Changes in Influential Biological Pathways in Computational Models

Kieran Alden, York Computational Immunology Lab, the Centre for Immunology and Infection at the Department of Biology and Hull York Medical School, Department of Electronics, University of York, Heslington
Jon Timmis, York Computational Immunology Lab and Department of Electronics, University of York
Paul S. Andrews, York Computational Immunology Lab and Department of Electronics, University of York
Henrique Veiga-Fernandes, Instituto de Medicina Molecular, Faculdade de Medicina de Lisboa
Mark C. Coles, York Computational Immunology Lab, the Centre for Immunology and Infection at the Department of Biology and Hull York Medical School, University of York, Heslington

Pages: 431–442

Abstract—Through integrating real time imaging, computational modelling, and statistical analysis approaches, previous work has suggested that the induction of and response to cell adhesion factors is the key initiating pathway in early lymphoid tissue development, in contrast to the previously accepted view that the process is triggered by chemokine mediated cell recruitment. These model derived hypotheses were developed using spartan , an open-source sensitivity analysis toolkit designed to establish and understand the relationship between a computational model and the biological system that model captures. Here, we extend the functionality available in spartan to permit the production of statistical analyses that contrast the behavior exhibited by a computational model at various simulated time-points, enabling a temporal analysis that could suggest whether the influence of biological mechanisms changes over time. We exemplify this extended functionality by using the computational model of lymphoid tissue development as a time-lapse tool. By generating results at twelve- hour intervals, we show how the extensions to spartan have been used to suggest that lymphoid tissue development could be biphasic, and predict the time-point when a switch in the influence of biological mechanisms might occur.

Keywords—Sensitivity analysis; peyer's patches (PP); spartan; lymphoid organs; computational model

## 1   Introduction

The development of computational models that aim to provide insights into biological systems has become more prevalent. For this approach to successfully inform our biological understanding, the relationship between the simulation and the real-world system has to be established. Given that these computational models typically have to capture systems where substantial aspects of the biological detail are unknown, it can be difficult to understand how results from an abstract simulation should be interpreted in terms of the real biology. Previously we noted that for a majority of simulation results in the literature, little attempt was made to reveal how representative a simulation result is in terms of the biological system it was designed to represent [1]. This observation drove us to develop and release spartan [1], [2], an open-source software toolkit that provides a researcher with statistical tools to help understand the relationship between a simulator and the biological system it represents. The included techniques were designed to be applicable to traditional ordinary or partial differential equation simulations as well as agent-based implementations.

We have previously adopted a principled approach to the design and implementation of a computational model that aimed to further understand the pre-natal development of secondary lymphoid tissue [3], [4]. These tissues include lymph nodes, Peyer's Patches (PP) and the spleen: each having a key role in triggering adaptive immune responses to infection. An understanding of the key cellular and molecular mechanisms involved in the development of secondary lymphoid tissue has previously been derived through the analysis of gene-deficient mice [5], [6], [7]. Although this approach has provided significant insight into the role of individual cell types and molecules, current experimental techniques cannot fully explain how lymphoid tissues develop through complex temporal interactions between these biological components. By complementing these approaches with computational modelling techniques, in silico experiments could be performed that cannot be conducted using conventional technologies: generating additional, novel, hypotheses that can address interesting research questions and inform laboratory studies.

The modelling approach we applied ensures that there is clear separation between the biological understanding to be captured in the model and the description of how this understanding is to be implemented as a simulation platform: the overall objective being to ensure researchers are confident that predictions generated by the platform are grounded in the biological system being studied [8], [9],[10]. This model was constructed to examine the role of the hematopoietic CD4$-$ lymphoid tissue initiator (LTin) and CD4+ lymphoid tissue inducer (LTi) cells, and their interactions with VCAM+ lymphoid tissue organiser cells (LTo), in PP development (Fig. 1). As such, our implementation adopts an agent-based modelling (ABM) approach, where each cell can be captured as an individual agent that possesses attributes and state, located within a specified environment, allowing for an exploration of the dynamics that emerge from interactions between these three cell types and their environment [11]. The executable model simulates the 72 hour period of murine pre-natal development where PP organogenesis is thought to occur [5]. Populations of hematopoietic LTin and LTi cells migrate into the developing gut from embryonic day 14.5 [12], and move randomly, with a velocity within a range previously determined [7]. Interactions between CD4$-$ LTin cells and VCAM+ LTo cells induces LTo cell differentiation, and the simulated expression of adhesion factors VCAM-1, ICAM-1, and MaDCAM-1. Receptors for these adhesion factors are expressed on the surface of LTin and LTi cells, which if bound to expressed adhesion factors impacts the motility and observed behaviours (velocity and displacement) of the LTin/LTi cell [5], [7]. Further interactions between CD4+ LTi cells and differentiated LTo cells induces LTo expression of chemokines CXCL13, CCL19, and CCL21: factors thought to be key in recruiting further LTi cells to a developing PP [13], [14], [15] With LTi cells expressing CXCR5 and CCR7 receptors for these chemokines [16], a positive feedback loop is created that induces migration of LTi cells towards a differentiated LTo cell, promoting cellular interactions that further influence key hematopoietic cell behaviour responses, forming large cell aggregations that mature to become PP [17].

The mathematical constructs that have been used to model chemokine and adhesion factor expression and response are detailed and justified in our previous work [4]. By performing a process of model parameter calibration, values have been assigned for parameters within these constructs, named in Fig. 1, such that our model exhibits emergent cell behaviour responses at early stages of PP development (12 h) that are statistically similar to those observed in ex vivo cell culture; for both cells in the vicinity of a developing PP (<$50\;{\mu}\mathrm{m}$) and those more distant [3]. This statistical evidence and the transparent approach to implementation provided us with the confidence to utilise this simulation platform as a tool for performing in silico experimentation to further understand the mechanisms that give rise to this emergent behaviour.

We used this model as an exemplar of how the statistical analysis approaches in the spartan toolkit could provide additional biological insight [1]. These statistical techniques, specifically Sensitivity Analysis techniques, were used to perturb the values of parameters controlling the mathematical constructs that represent aspects of the biological system [1], [18], [19], [20]. We previously concluded that, at hour 12 in murine PP development, tissue formation was highly dependent on adhesion factor expression and response [3]. This model-derived prediction is in contrast to the widely accepted view that PP development is triggered by chemokine production [15], [21]. Yet we have also shown that the simulation can replicate previously published experimental work conducted at the end of the development time-period (hour 72), showing that PP do not form in mice deficient for chemokine receptors CXCR5 and CCR7 [4], [15]. This suggests that the simulated process does become chemokine dependent at some point within the time-course, and by extension suggests that PP development could potentially be split into two distinct phases.

Such an hypothesis raises the important question of when a biological factor, such as an adhesion factor or chemokine, becomes the key pathway influencing an observed biological phenomenon. Although the integration of computational models with laboratory studies is becoming more popular, applying the developed simulation as a tool to perform a temporal analysis of the influence of simulated biological factors has been limited. The only prominent example of such an application to date lies in studies of granuloma performance for controlling Mycobacterium tuberculosis infection [22], where agents within the system have been tracked constantly through the simulation time period, and sensitivity analysis techniques utilised to examine behaviour at regular time intervals. Utilising such an approach has the potential to not only suggest the biological factors that are highly influential, but suggest time-points where system dynamics are influenced by particular parameters, potentially revealing that behaviour which emerges through these system dynamics occurs in distinct phases.

To demonstrate this concept, we previously presented a proof-of-concept experiment that tracked simulated hematopoietic cells for a simulated one hour period, at twelve hour intervals up to hour 48 of tissue development [4]. The distributions of cell behaviour responses for each time-point were then compared with responses generated at hour 12, revealing that there is no significant change in cell behaviour until after hour 36 of PP development, where cell behaviours become significantly different. Whereas earlier analyses in that paper were able to suggest the key simulated biological pathways at hour 12, no statistical analyses were completed to determine the factors that cause this significant behaviour change, and the analysis omitted a further 24 hour period where further behavioural changes may become apparent. Yet the proof of concept does suggest that a full temporal analysis of the 72 hour period could potentially increase our understanding of the entire process of PP development.

The limited application of more detailed temporal analyses in both our study and that of others may be due, in part, to a lack of suitable statistical tools that have this capability. To both counter this gap in available tools and encourage the adoption of simulated temporal analyses in further research studies, we have released an extended version of spartan that possesses the capability to perform a temporal analysis of simulation responses. The objective of this paper is to demonstrate the use of this new functionality to suggest the influence of key simulated biological pathways throughout the 72 hour period of PP development. As we possess ex vivo culture cell responses at hour 12 and observed biological phenomena at hour 72 (the development of the tissue), while considering computational complexity and available resources, we have configured the simulator such that behaviour responses (velocity and displacement) for hematopoietic cells in the vicinity of developing PP are recorded for an hour at twelve hour intervals. By utilising the three sensitivity analysis techniques available in spartan , simulations were performed that replicate different physiological conditions, and cell behaviour responses analysed to determine if a change in behaviour is observed under those conditions at each time-point. These physiological conditions are simulated by perturbing the values of a key set of simulation parameters that capture expression of and response to chemokines and adhesion factors, and parameters that influence the probability of cell receptor binding. By perturbing these values, we can gain greater insight into how robust the simulator is to each simulated physiological condition throughout the time course, in turn suggesting the time-points when simulated biological pathways are influential. This could act as vital information for informing future laboratory experimentation.

The focus of this paper is on the application of the techniques in spartan in performing a temporal analysis. For complete detail of the statistical techniques themselves, we direct the reader to the available spartan publications [1], [2]. For full detail of the lymphoid tissue model, we direct the reader to the relevant model publications [3], [4], [10]. To encourage wider adoption of the approach demonstrated here, our computational model, the data from which the following results have been generated, and the spartan toolkit are all available to download from our website (www.york.ac.uk/ycil).

## 2   Results

### 2.1 Parameter Value Selection and Simulation Platform Executions

The spartan package contains three methods to generate simulation parameter value sets, one for single parameter robustness analysis and two for global parameter sensitivity analyses. The single parameter robustness is used to alter the value of just one parameter, whereas the global techniques permit perturbation of the value of a number of parameters simultaneously. The parameter sampling methods are detailed in Section C of the Methods, and described in further detail in the publication that accompanies the spartan package [1]. The following sections of this manuscript detail how behaviours at simulated time-points under the conditions specified in the generated parameter value set can be used to gain insight into the both the behaviour of the computational model and the biological system of interest.

As an exemplar, we perform a temporal sensitivity analysis, using the three parameter value sampling techniques, of our model of lymphoid tissue development [3], [4]. We describe how each approach perturbs the values of the six key parameters of interest described above and shown in Fig. 1. For each parameter sampling technique, the appropriate spartan method (Methods, Sections E-G) was then applied to analyse the cell behaviour responses for all parameter value sets, at all time-points.

As our model adopts an agent-based approach in implementation, it is vital that we mitigate the impact that any inherent stochasticity has on the results produced, as each run can produce slightly different results. As such we used spartan to deduce the number of simulation replicate runs required, as described in section D of the Methods. In each run behaviour responses (velocity and displacement) were calculated for each cell within $50\; \mathrm{m}$ of a developing PP for a one hour period, at simulated twelve  hour intervals.

### 2.2 Understanding Robustness of Identified Parameters to Simulated Chemokine Expression and Response

The objective of a single parameter robustness analysis is to explore the implication of biological uncertainty or estimation of that parameter on simulation result. This analysis is very useful to simulations of biological systems, which will feature parameters that either cannot be determined experimentally or which are part of an abstract mathematical construct derived to capture a biological element. By applying the procedure detailed in section G of the Methods, the values of simulation parameters of interest are perturbed individually. A significant change in simulation behaviour when this perturbation is performed can reveal the parameters for which the simulation is sensitive. Where a simulation is highly sensitive to one or more parameters, caution should be applied when interpreting model-derived results, as these may be artefacts of model parameterisation rather than a true representation of the biology. The extended functionality in spartan will provide an indication as to whether parameter sensitivity changes over simulated time.

Taking the conclusions from our previous studies into account, that chemokine expression does not appear influential at early PP development yet PP are not observed in chemokine-deficient mice, the initial stage of this study considered how robust each identified simulated cell response measure is over time to when the parameters that control the chemokine mechanism are individually altered. Parameter samples were generated as described in the previous section and simulation executions under each generated parameter set condition were analysed using the procedure described in the Methods (Section E).

Figs. 2a and 2b show a comparison between simulation behaviour for different values of a specified parameter and that where the parameter value has been perturbed, at twelve hour intervals. This comparison is made using the Vargha-Delaney A-Test [23], an effect magnitude test that can reveal the difference between two non-parametric distributions. An A-Test score of 0.5 indicates no difference in the simulation platform response distributions, whereas values closer to 0 and 1 indicate a significant difference between the two sets of simulation results.

Fig. 2. An examination of single parameter robustness over simulation time. A and B show how a change in maximum expression level of chemokines impacts LTin and LTi cell behavior as simulated development time elapses. C and D show how a change in LTi cell response to chemokine expression impacts cell behaviour over the same time period. In this case, these parameters have been altered individually, with all other parameters remaining at their calibrated values.

Figs. 2a and 2b focus on the parameter that controls maximum chemokine expression level by an LTo cell. The analyses in both plots support our previous findings that a change in this chemokine expression has little impact on cell behaviour at the twelve- hour time-point. However, as time increases, the impact this parameter has on the recorded velocity (Fig. 2a) and displacement (Fig. 2b) of cells increases. One notable result from both analyses is how, as the value of this parameter is either increased or decreased, there is a large difference between the impact this has at the 24 hour time-point and that at hour 36. This could suggest that, between these time-points, the model is less robust to a change in chemokine expression; thus the expression level of chemokines could be an influential factor at a later time-point. Related to this, Figs. 2c and 2d show the impact that a change in the parameter that controls LTi cell response to chemokine has on cell velocity and displacement respectively. An increase in solely this parameter value makes it more likely that a cell will respond to a level of chemokine expression in the environment. Thus as the response becomes more likely, the cell is more likely to move towards a developing PP and be affected by adhesion factor expression, thus impacting the cells velocity and displacement measures recorded over a one hour period. The analyses are showing both behaviour measures to be increasingly sensitive to the value of this parameter as the period of simulated PP development elapses.

### 2.3 Identifying Key Biological Factors at Each Time-point Using Global Parameter Sensitivity Analysis Approaches

A single parameter robustness analysis does however only indicate the impact of a change of that parameter alone: it cannot elucidate any higher-order effects that occur due to interactions between parameters. By perturbing the value of a number of parameters simultaneously, while covering the complete parameter value space of interest, parameters having the greatest influence on simulation response can be identified, thus indicating the key simulated biological pathways in the model at various time-points. By extension, these conclusions can be used to suggest the key pathways in the biological model if the implementation is well grounded in the biological domain. The spartan toolkit includes two global sensitivity analysis techniques, one sampling-based and one variance-based, both detailed in sections F and G of the Methods respectively. We note here that the global analysis techniques we are applying in spartan are designed to provide statistical information regarding the contribution of each of the parameters of interest to changes in simulation response, in this case simulated cell measures. This differs from alternative applications of global parameter sampling, such as that applied in [24], where global parameter sampling has proven advantageous in determining both the robustness of simulation behaviour under different conditions, and the volume of the parameter space where the simulation behaves as one would expect or observe biologically. Instead the techniques in spartan utilise simulation results obtained for each parameter set generated during sampling to calculate a statistical measure for each parameter of interest. Below we discuss the extension of both global parameter analysis techniques to permit a temporal global sensitivity analysis of a simulation, and exemplify their application on the lymphoid tissue development simulation.

#### 2.3.1 Sampling-based Global Parameter Sensitivity Analysis (Partial Rank Correlation)

This approach utilises latin-hypercube sampling to generate 500 sets of model parameters where the values of the six key parameters were perturbed (see Methods, Section C). Simulation runs have been performed for each set, with replicate runs produced to mitigate aleatory uncertainty (Methods, Section D). For each parameter, the Partial Rank Correlation Coefficient (PRCC) has been generated using the procedure described in the Methods (Section F). Through an examination of the change in PRCC over simulated time, it is possible to determine whether the influence of each of the six parameters, each an abstract representation of a biological feature, changes in the course of simulated PP development.

Fig. 3 shows the PRCC for both cell velocity and displacement responses at twelve- hour intervals, for each of the six parameters, indicating the extent of the correlation between the value assigned to this parameter and the change in model response. Values closer to 1 or $-$1 indicate that there is a strong correlation. As the procedure is altering a range of parameters simultaneously, a higher PRCC value suggests a highly influential parameter, and by extension a key biological pathway at that time-point.

Fig. 3. Partial Rank Correlation Coefficients (PRCC) for each of the six parameters identified in Fig. 1, calculated at simulated twelve- hour intervals. Parameter values were sampled using the latin-hypercube approach in the spartan package [1]. Examining how the PRCC changes over time gives an indication of when a parameter begins to become influential in affecting cell velocity and displacement. P -Values for both measures are shown in the table in the graph, and produced in a CSV file by spartan. Where the p-value becomes very small, 0 may be displayed due to the display of significant figures.

Considering cell velocity first, there is no significant increase in the strength of correlation between the parameters that capture the initial and maximum levels of chemokine expression by an LTo cell (Figs. 3a and 3b). Discounting direction of correlation, the same conclusion can be drawn for the parameter that captures the probability an LTi cell does not respond to chemokine expression in the cells locality (Fig. 3c). Examining the parameter controlling the probability two cells form a stable bind, thus inducing LTo cell differentiation, it can be noted that the PRCC values are higher than those for the chemokine parameters above, yet the value remains relatively stable over simulated time (Fig. 3d). The higher PRCC values are expected due to the side effect that setting this parameter to values close to and including zero has on simulation response: that LTo cells cannot differentiate and express adhesion factors and chemokines. We have previously shown that this parameter has a huge impact on simulation response for values between 0 and 4 percnt, yet further increases do not impact simulation response [4]. As such we deduce that the high PRCC values are contributed to by impact of this side effect. In contrast to the previous four parameters, there is a strong correlation between the value assigned to the parameter that captures the probability an LTin/LTi cell responds to adhesion integrins expressed in the vicinity of a developing PP and cell velocity, at all simulated time-points (Fig. 3e). This suggests that the response to adhesion integrins VCAM-1, ICAM-1, and MaDCAM-1 has a key influence on all stages of PP development, in addition to our previous findings that this is the highly significant factor in early PP organogenesis. The correlation between the parameter controlling the level of adhesion factor expression upon LTo cell differentiation (Fig. 3f) and cell velocity further supports that conclusion from our previous study, yet this influence does decrease between 12 and 36 hours. This potentially suggests that adhesion factor expression and response is influential in the early stages of PP development, but not through the entire time period.

Considering cell displacement in Fig. 3, it is clear that there is a correlation between the simulated chemokine pathway and cell behaviour as simulated time progresses. In support of our previous study, that found no significant role for chemokines in early PP development [3], [4], there is no correlation between the value assigned to these two parameters and cell displacement at hour 12, yet this increases between hours 12 and 36 (Figs. 3b and 3c), after which there is a clear trend between the value of the parameter and displacement. This indicates that the process may change from adhesion to chemokine dependent between hours 24 and 36. Yet the influence of the parameter that captures LTin/LTi cell response to adhesion factor expression (Fig. 3e) is initially stronger and does increase, suggesting that cell response to adhesion integrins is influential throughout the entire period. Although this is the case, interestingly no correlation becomes apparent between the level of adhesion factor expression and cell displacement, at any time-point in development (Fig. 3f).

#### 2.3.2 Variance-Based Global Parameter Sensitivity Analysis (eFAST)

In contrast to the above method that focuses on correlation between each parameter and model response, the extended Fourier Amplitude Sampling Test (eFAST) approach can partition the variance in simulation platform response between parameters of interest. For each simulated time-point, spartan has been used to calculate the First Order Sensitivity Index (Si) of each parameter, indicating the fraction of output variance that can be explained by the value assigned to that parameter. By examining this value over simulated time, we can further deduce the impact of each simulated biological pathway over the course of PP development.

Fig. 4 shows the Si values for each of the six parameters at twelve  hour intervals. Values closer to 1.0 indicate that a large fraction of variance in the output can be attributed to the value assigned to that parameter, thus determining this parameter to be highly influential.

Fig. 4. eFAST First-Order Sensitivity Indices for each of the six parameters in Fig. 1, calculated at simulated twelve- hour intervals. This shows the fraction of output variance in simulation platform response that, at each time-point, can be explained by a particular parameter.

Considering cell velocity (Fig. 4a), this analysis supports the conclusions drawn from Fig. 3. We have previously shown that the expression level of adhesion factors (adhesionFactorExpressionSlope) accounts for a statistically significant amount of variance in simulation response at hour 12 [1], [4], yet here this reduces in the same manner as observed in the previous results section. In contrast, the fraction of output variance explained by LTin/LTi cell response to adhesion integrin expression (maxProbabilityOfAdhesion) vastly increases between hours 12 and 24, and continues to increase for the remaining simulated time, becoming the only parameter to have a significant impact on cell velocity.

Examining cell displacement (Fig. 4b), the eFAST results suggest a vast increase in the variance accounted for by the LTin/LTi cell response to adhesion integrins and the level of chemokine expressed by the LTo upon cell differentiation (maxChemokineExpressionValue) between hours 12 and 36, after which the value stabilises. In contrast to the results in the previous section, no significant increase is observed in the Si value for LTi response to chemokine (chemokineExpressionThreshold), a value that remains close to constant throughout. Yet for all simulated time-points for hours 24 onwards, the variance accounted for by that parameter is statistically significant in comparison to the dummy parameter, suggesting this parameter does have an effect, albeit not the major influence on development.

## 3   Discussion

The application of computational models of biological systems is becoming more prevalent: for providing an interpretation of biological data, or acting as a scientific tool through which new hypotheses can be developed [25], [26]. It is rare to see a combination of a computer model and sensitivity analysis techniques applied to suggest whether the influence of modelled biological pathways changes over time. Yet this provides further experimental capacity that cannot currently be performed in the laboratory. An example we have noted previously is a flow cytometry analysis: running a biological sample through a flow cytometer irretrievably destroys that sample, making it impossible to study further time-points [27]. Yet computational models can produce output at numerous time-points, which if analysed appropriately, may be then used to design appropriate laboratory experiments.

Previous experimental studies have produced the generally accepted hypothesis that there are three phases of PP development: the appearance of VCAM+ LTo cells in the developing gut, the appearance of clusters of hematopoietic LTin/LTi cells around these LTo cells, and the recruitment of lymphocyte cells from E18.5 [13]. Yet through in silico experimentation using our computational model, we have previously found that there may be an additional development phase between the first two phases [3], one that is dependent on the expression of and response to adhesion integrin rather than chemokine expression. This hypothesis, in addition to the finding that our computational model does indeed replicate chemokine knockout experiments [4], led us to question whether, through simulation, we could determine the time-points at which these changes in phase occur.

The initial analyses in this study sought to understand how robust our computational model was to changes in the mathematical constructs that capture the expression of and response to chemokine expression (Fig. 2). By using the Single Parameter Robustness technique in spartan , we have compared cell behaviour responses where the levels of expression of chemokines by an LTo cell and response to chemokines by an LTi cell are adjusted individually. Both analyses reveal similar trends: a change in the value of the parameters that capture chemotaxis become more influential as simulated PP development time elapses. Thus although the conclusions in our previous studies of early PP development are supported, these initial findings suggest that there is a time-point where this simulated pathway becomes influential. Yet these conclusions are drawn from studying each parameter individually: identification of this time-point requires us to determine the parameters influence in comparison with the other five of interest in this analysis. To do this, we move from the use a single parameter analysis to the application of sampling techniques that consider a number of parameters simultaneously.

When considering use of global parameter sensitivity analysis techniques over time, an exploration of changes in Partial Rank Correlation Coefficients (PRCC), as demonstrated in this study (Fig. 3), is not novel, having previously found application in determining correlations between simulated biological factors and extracellular bacterial load to study TNF in controlling tuberculosis in a granuloma [22]. However, by adding the capability to perform such an analysis to spartan , the methods exemplified in this paper can be easily replicated and adapted for use in other computational modelling studies. In contrast, we are unaware of any study that has examined the first-order sensitivity indexes, generated using the eFAST technique [20], [28], over simulation time in the manner that we have presented here.

The results from both global parameter sensitivity analysis techniques presented in this study again support our previous findings from hour 12 of PP development [3]. Yet the expression of adhesion integrins by a VCAM+ LTo cell becomes less of an influential pathway by hour 36 (Figs. 3f and 4a). Thus an initial stage could exist, mediated by cell adhesion factors, covering the first 36 hours of development, after which point the effect of a change in adhesion factor expression level reduces amid a growing influence of other factors.

Conversely, the analyses in this paper suggest that the level of chemokine expression from a differentiated LTo cell becomes more influential as simulated time progresses: supporting previously published experiments that suggest mice deficient for chemokine receptor genes do not form PP [4], [12], [21]. By performing a sensitivity analysis of simulation platform responses over time, we can suggest when the process becomes chemokine dependent. For chemokine expression from an LTo cell, the analyses in this paper suggest this occurs between hours 12 and 36 (Figs. 3(b), 4(a) and (4b)), with the LTi cell response to this expression becoming more influential after hour 24 (Fig. 3c). This would suggest that the chemokine expression level has to be sufficient for LTi cells to respond, a reaction that then drives the process of hematopoietic cell aggregation. This effect continues through to the end of the simulated time, suggesting no further change in the key pathways until the aggregation has formed at hour 72, after which the third of the accepted development phases begins [13]. Our model covers only the first two phases, stopping at aggregation and prior to recruitment of lymphocytes.

Although the level of adhesion factor expression has been determined to only have a significant influence on cell behaviour for the first 36 hours (Figs. 3(f) and 4(a)), the parameter that captures LTin/LTi cell response to adhesion factors has been shown to be highly influential throughout the time period (Figs. 3(e) and 4(b)). Thus although we proposal a biphasic stage of development between the aforementioned phases 1 and 2, one that moves from adhesion integrin expression to chemokine dependency, LTi cell response to adhesion does have an influential role in both these stages.

The conclusions drawn from our previous studies using our computational model, the identification of an adhesion integrin dependent stage, have been examined and verified in cell culture systems [3]. By providing a statistical analysis of a number of time-points of PP development, the conclusions in this paper may inform future laboratory studies that target later time-points, to determine if the development stages identified here can also be verified. Through extending spartan such that others can adopt this technique in their own research studies, and releasing our simulation platform responses as examples, we hope that other researchers are encouraged to adopt this promising approach that has real potential to further our understanding of computational model behaviour and inform informing future laboratory work.

## 4   Methods

### 4.1 Computational Model of Lymphoid Tissue Development

Our previously developed model of PP development in the mouse, available from www.york.ac.uk/ycil/software/ppsim/, adopts an agent-based approach where each cell is explicitly captured as an agent, each possessing individual attributes, with results of interactions between other agents and the gastrointestinal tract environment given by a set of rules [4]. Mathematical constructs are utilised to represent the expression of and response to adhesion factors and chemokines. By changing the values of parameters within these constructs, we can examine how key cell behaviour responses change under a variety of physiological conditions. To produce the model output required for the analyses in this study, the simulation platform has been altered such that the LTin/LTi cell behavioural responses, namely velocity, displacement, meandering index, displacement rate, and total migration distance, all calculated for a period of one simulated hour, are output to CSV files at twelve-hour intervals. The analyses in this study focus on cell velocity and displacement of cells in the vicinity of a developing PP (<$50\;{\mu}\mathrm{m}$).

### 4.2 The Statistical Package: Spartan

Spartan is a package of statistical techniques that has been compiled with the specific aim of assisting researchers understand the relationship between their computational or mathematical model and the biological system it represents, with the aim of providing novel biological insight [1], [2]. The package is open source, implemented within the R statistical environment, and available from both the Comprehensive R Archive Network (CRAN) and www.york.ac.uk/ycil/software/spartan. Accompanying the package are comprehensive tutorials and example simulation data that aid the adoption of all the techniques demonstrated in this paper. For the purposes of the study in this paper, spartan has been extended such that these analyses can be performed for model results generated at a number of time-points in an execution. This enables the researcher to contrast the behaviour of the model at various time-points, to determine if the influence of simulated biological pathways alters over time.

### 4.3 Parameter Value Selection

Similar to our previously published studies [3], [4], we focus this analysis on six parameters that influence the mathematical constructs used to model the expression of and response to chemokine and adhesion factors (Fig. 1), each constrained such that a value is selected from a given range: chemokineExpressionThreshold (0-1), maxChemokineExpressionValue (0.015-0.08), initialChemokineExpressionValue (0.1-0.5), stableBindProbability (0-1), adhesionFactorExpressionSlope (0.25-2), and maxProbabilityOfAdhesion (0-1). These value ranges either explore the full parameter value space (where the parameter is 0-1) or a wide range established when the model was originally analysed in previous studies [3], [4]. As these constructs are abstract representations of a biological phenomena for which parameter values could not be directly obtained, baseline values of these parameters have been set through a process of calibration. The full detail of the constructs used to model adhesion and chemoattractant factors is detailed in our previously published model description [4].

Values for these six parameters were selected using three parameter sampling techniques in the spartan package, introduced briefly below. Full detail of each sampling algorithm can be found in the papers describing the software [1], [2].

#### 4.3.1 Single Parameter Robustness

The first, aiming to examine how robust the simulated system is to a single parameter alteration, changes the value of each parameter of interest independently, assigning the parameter a different value within the respective ranges specified above. The algorithm works through each parameter in turn, initially setting the parameter value to a specified minimum, and increases the value by a set increment until a specified maximum value is reached. In this case, increments of 0.1, 0.005, 0.05, 0.1, 0.05, and 0.1 were used respectively the six parameters in the above section. spartan outputs these simulation parameter value sets to a CSV file for post-processing into simulation parameter files or reading into a simulation directly.

#### 4.3.2 Latin-Hypercube Sampling

Perturbing each parameter independently however does not elucidate any compound effects where the influence of one parameter is directly linked to the value of another. Thus we utilise two global parameter sensitivity analysis sampling techniques from the spartan package that simultaneously selects different values for all six parameters from the parameter space. The first, latin-hypercube sampling [29], selects values for each parameter from the value space, aiming to reduce any possible correlations while ensuring efficient coverage of the space over a minimal number of samples[18]. Using spartan , 500 sets of parameters were generated for the analyses in this paper.

#### 4.3.3 Fourier Frequency Sampling

This sampling technique selects parameter value sets through the use of sinusoidal functions of a particular frequency through the parmaeter value space. Each parameter of interest is considered in turn. On its particular turn, that parameter is assigned one frequency, with its complementary parameter set assigned a significantly different frequency [20]. A number of parameter values are selected from points along each of these curves. This creates a set of simulation parameters for each parameter of interest. Due to the symmetrical properties of sinusoidal functions, it is probable that the same parameter value sets could be selected. To address this, a re-sampling scheme is encouraged where a phase shift is introduced into each frequency, and sampling repeated [20], [28]. Thus, a number of parameter value sets are created for each parameter of interest. This process is repeated for an extra parameter, the dummy, which has an arbitrary value range but no impact on simulation behaviour, yet exists to enable a comparison between the impact of each parameter and one known to have no effect on simulation response. For the analyses in this paper, that consider six parameters of interest plus the dummy, we used three re-sample curves and selected 65 parameter values from points along the curves, leading to 1,560 parameter value sets on which the model was executed.

### 4.4 Addressing Stochasticity-Derived Uncertainty in Model Response

As our model adopts an agent-based approach, agent (cell) behaviour is affected by the use of pseudo-random number generation, and as such no two sets of simulation responses will be identical [30]. To ensure that the results generated from the model are representative of the simulated physiological conditions and the impact of the inherent stochasticity is mitigated, a number of replicate model executions are performed for each set of parameter values for which model results are required. This number of runs has been determined using the Consistency Analysis technique in the spartan package [1]. For all experiments documented in this paper, 500 sets of model executions have been performed for each set of parameter values. The median of each cell behaviour response is calculated for all 500 runs, producing a distribution that is compared to median results gained from an alternative parameter set.

### 4.5 Model Robustness to Single Parameter Value Alteration

Taking each of the six parameters of interest in turn, the parameter value was perturbed across the range specified in the Parameter Value Selection section above, with the other five parameters remaining at their calibrated values. spartan was used to calculate the median values for both cell velocity and displacement responses for each simulation run under a specified parameter condition. With these calculated, spartan compares this distribution of medians to a set generated under calibrated parameter value conditions using the Vargha-Delaney A-Test [23], an effect magnitude test that provides an indication of the difference between two distributions. The A-Test results were used to determine whether the change in a single parameter value has a significant impact on the behaviour of our computational model. An A-Test score of 0.5 suggests there is no difference between the simulation runs at calibrated values and those where the value of one parameter has been perturbed. Scores towards 0 and 1 suggest that the behaviour of the simulation significantly changes due to the new value assigned to that parameter. This comparison was performed for distributions of medians generated at twelve hour intervals up to hour 72, and is shown in Fig. 2.

### 4.6 Identifying Key Biological Pathways from Parameters Sampled using Latin-Hypercube Approach

For each parameter value set generated from the hypercube, spartan was used to calculate the median values for both cell velocity and displacement responses at twelve hour intervals for each run under those parameter conditions, and in turn the overall median values for each response was calculated from the time-point medians of each run. These overall time-point median values for cell displacement and velocity are deemed to be representative of model behaviour under those parameter conditions.

Taking each of the six parameters in turn, correlations between the value assigned to that parameter and the model response were determined through calculation of the Partial Rank Correlation Coefficient (PRCC): a robust measure for quantifying non-linear relationships between an input and output [20]. Where the PRCC value is high, this suggests that, although a number of complementary parameters are also being perturbed, this parameter has a significant impact on model response. For the analyses presented in this paper, the PRCC for each time-point was calculated and plotted, to ease identification of the simulated time-points in development where a relationship between this parameter and the model response changes.

### 4.7 Identifying Key Biological Pathways from Parameters Sampled Using Sinusoidal Frequency Approach

Similar to the approach above, spartan was used to generate overall median responses to summarise the results of all model executions under the 1,560 parameter value conditions the sampling process generated. Again these overall responses were calculated for model responses generated at twelve-hour intervals.

The simulation responses were analysed by taking into account the frequencies that were used to generate that parameter set. Through Fourier analysis using these frequencies, variation in output was partitioned between the parameters, giving an indication of the impact each has on model response. Two sensitivity indexes are calculated for each parameter [20], [28]: an eFAST First-Order Sensitivity Index and eFAST Total-Order Sensitivity Index (STi). The first indicates the fraction of output variance that can be explained by the value assigned to that parameter. The latter indicates the variance caused by higher-order non-linear effects between that parameter and the others explored. In this case we were interested in the first statistic. For each of the six parameters studied, Si values were calculated at twelve-hour intervals, and plotted to ease identification of any correlation in Si value over simulation time.

## Acknowledgments

This work was partly funded by the Wellcome Trust [ref: 097829] through the Centre for Chronic Diseases and Disorders (C2D2) at the University of York and the Medical Research Council (G0601156) to Mark Coles. Paul Andrews was funded by EPSRC grant EP/I005943/1 Resilient Futures. Jon Timmis is partly funded by the Royal Society and the Royal Academy of Engineering. Henrique Veiga-Fernandes was funded by the FCT Portugal (PTDC/SAU-MII/100016/2008), European Molecular Biology Organisation (Project 1648) and European Research Council (Project 207057).

## References

• [1]K. Alden, M. Read, J. Timmis, P.S. Andrews, H. Veiga-Fernandes, and M. C. Coles, “Spartan: A Comprehensive Tool for Understanding Uncertainty in Simulations of Biological Systems,” PLoS Comp. Bio, vol. 9, no. 2, 2013.
• [2]K. Alden, M. Read, P. S. Andrews, J. Timmis, and M. Coles, “Applying spartan to understand parameter uncertainty in simulations,” R J., vol. 6, no. 2, pp. 63–80, 2014.
• [3]A. Patel, N. Harker, L. Moreira-Santos, M. Ferreira, K. Alden, J. Timmis, K. E. Foster, A. Garefalaki, P. Pachnis, P. S. Andrews, H. Enomoto, J. Milbrandt, V. Pachnis, M. C. Coles, D. Kioussis, and H. Veiga-Fernandes, “Differential RET responses orchestrate lymphoid and nervous enteric system development,” Sci. Signalling, vol. 5, no. 235, 2012.
• [4]K. Alden, J. Timmis, P. S. Andrews, H. Veiga-Fernandes, and M. C. Coles, “Pairing experimentation and computational modelling to understand the role of tissue inducer cells in the development of lymphoid organs,” Frontiers Immunol., vol. 3, pp. 1–20, 2012.
• [5]T. D. Randall, D. M. Carragher, and J. Rangel-Moreno, “Development of secondary lymphoid organs,” Ann. Rev. Immunol., vol. 26, pp. 627–650, 2008.
• [6]S. A. van de Pavert and R. E. Mebius, “New insights into the development of lymphoid tissues,” Nature Rev. Immunol., vol. 10, pp. 1–11, Aug.2010.
• [7]H. Veiga-Fernandes, M. C. Coles, K. E. Foster, A. Patel, A. Williams, D. Natarajan, A. Barlow, V. Pachnis, and D. Kioussis, “Tyrosine kinase receptor RET is a key regulator of Peyer's patch organogenesis,” Nature, vol. 446, no. 7135, pp. 547–51, Mar.2007.
• [8]P. S. Andrews, F. A. C. Polack, A. T. Sampson, S. Stepney, and J. Timmis, “The CoSMoS Process, Version 0.1 : A Process for the modelling and simulation of complex systems,” Dept. Comput. Sci., Univ. York, UK, Tech. Rep. YCS-2010-453, Mar. 2010, pp. 1–40.
• [9]M. Read, P. S. Andrews, J. Timmis, and V. Kumar, “Modelling biological behaviours with the unified modelling language: An immunological case study and critique,” J. Roy. Soc. Interface, vol. 11, no. 99, 2014.
• [10]K. Alden, P. S. Andrews, F. A. C. Polack, H. Veiga-Fernandes, J. Timmis, and M. C. Coles, “Using argument notation to engineer biological simulations with increased confidence,” J. Roy. Soc. Interface, vol. 12, no. 105, 2015.
• [11]S. Forrest and C. Beauchemin, “Computer immunology,” Immunol. Rev., vol. 216, no. 1, pp. 176–197, 2007.
• [12]R. E. Mebius, T. Miyamoto, J. Christensen, J. Domen, T. Cupedo, I. L. Weissman, and K. Akashi, “The fetal liver counterpart of adult common lymphoid progenitors gives rise to all lymphoid lineages, CD45+CD4+CD3- cells, as well as macrophages,” J. Immunol., vol. 166, no. 11, pp. 6593–601, Jun.2001.
• [13]S. Adachi, H. Yoshida, H. Kataoka, and S. Nishikawa, “Three distinctive steps in Peyer's patch formation of murine embryo,” Int. Immunol., vol. 9, no. 4, pp. 507–514, Apr.1997.
• [14]K. Honda, H. Nakano, H. Yoshida, S. Nishikawa, P. Rennert, K. Ikuta, M. Tamechika, K. Yamaguchi, T. Fukumoto, T. Chiba, and S.-I. Nishikawa, “Molecular basis for Hematopoietic/Mesenchymal interaction during initiation of Peyer's patch organogenesis,” J. Exp. Med., vol. 193, no. 5, pp. 621–630, Mar.2001.
• [15]S. Luther, K. M. Ansel, and J. G. Cyster, “Overlapping roles of CXCL13, interleukin 7 receptor alpha, and CCR7 ligands in lymph node development,” J. Exp. Med., vol. 197, no. 9, pp. 1191–1198, May2003.
• [16]L. Ohl, G. Henning, S. Krautwald, M. Lipp, S. Hardtke, G. Bernhardt, O. Pabst, and R. Förster, “Cooperating mechanisms of CXCR5 and CCR7 in development and organization of secondary lymphoid organs,” J. Exp. Med., vol. 197, no. 9, pp. 1199–1204, May2003.
• [17]J. G. Cyster, “Chemokines and cell migration in secondary lymphoid organs,” Science, vol. 286, no. 5447, pp. 2098–2102, Dec.1999.
• [18]M. Read, P. S. Andrews, J. Timmis, and V. Kumar, “Techniques for grounding agent-based simulations in the real Domain: A case study in experimental autoimmune encephalomyelitis,” Math. Comput. Model. Dyn. Syst., vol. 18, no. 1, pp. 67–86, 2012.
• [19]A. Saltelli, Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. New York, NY, USA: Wiley, 2004.
• [20]S. Marino, I. B. Hogue, C. J. Ray, and D. E. Kirschner, “A methodology for performing global uncertainty and sensitivity analysis in systems biology,” J. Theor. Biol., vol. 254, no. 1, pp. 178–96, Sep.2008.
• [21]G. Eberl, S. Marmon, M.-J. Sunshine, P. D. Rennert, Y. Choi, and D. R. Littman, “An essential function for the nuclear receptor ROR gamma in the generation of fetal lymphoid tissue inducer cells,” Nature Immunity, vol. 5, no. 1, pp. 64–73, 2004.
• [22]J. C. J. Ray, J. L. Flynn, and D. E. Kirschner. (2009 Mar.). Synergy between individual TNF-dependent functions determines granuloma performance for controlling mycobacterium tuberculosis infection. J. Immunol. [Online]. 182(6). Available: http://www.ncbi.nlm.nih.gov/pubmed/19265149
• [23]A. Vargha and H. D. Delaney, “A critique and improvement of the CL common language effect size statistics of McGraw and Wong,” J. Edu. Behav. Stat., vol. 25, pp. 101–132, 2000.
• [24]M. Hafner, H. Koeppl, M. Hasler, and A. Wagner, “Glocal robustness analysis and model discrimination for circadian oscillators,” PLoS Comput. Biol., vol. 5, no. 10, 2009.
• [25]Z. Guo and J. C. Tay, “A comparative study on modeling strategies for immune system dynamics under HIV-1 infection,” in Proc. 4th Int. Conf. Artif. Immune Syst.,2005, pp. 220–233.
• [26]P. S. Andrews, F. Polack, A. T. Sampson, J. Timmis, L. Scott, and M. Coles, “Simulating biology: Towards understanding what the simulation shows,” in Proc. Workshop Complex Syst. Model. Simul., 2008, pp. 93–123.
• [27]J. A. Butler, K. Alden, H. V. Fernandes, J. Timmis, and M. Coles, “Novel approaches to the visualization and quantification of biological simulations by emulating experimental techniques,” in Proc. 14th Int. Conf. Synthesis Simul. Living Syst., 2014, pp. 614–621.
• [28]A. Saltelli and R. Bollardo, “An alternative way to compute Fourier amplitude sensitivity test (FAST),” Comput. Stat. Data Anal., vol. 26, no. 4, pp. 445–460, 1998.
• [29]M. D. McKay, R. J. Beckman, and W. J. Conover, “A comparison of three methods for selecting values of input variables in the analysis of output from a computer code,” Techometrics, vol. 21, pp. 239–245, 1979.
• [30]J. C. Helton, “Uncertainty and sensitivity analysis for models of complex systems,” in Proc. Comput. Methods Transport: Verification Validation, 2008, pp. 207–228.

Kieran Alden is a research associate in intelligent and adaptive systems in the Department of Electronics, University of York, York, United Kingdom. He conducts interdisciplinary research that aims to increase confidence in predictions generated by computer models of biological systems, through the development and application of novel techniques that understand and quantify the relationship between the model and the biological system that model is designed to capture. He is a member of the IEEE.
Jon Timmis is professor of intelligent and adaptive systems and head of the Department of Electronics, University of York, York, United Kingdom. He is a Royal Society-Wolfson Research Merit Award Holder and a previous holder of a Royal Academy of Engineering Enterprise Fellowship. His research interests include interdisciplinary in nature, and focus on the modelling and simulation of the immune system, the development of evidence-based simulations, and fault tolerance in biologically-inspired systems. He is a senior member of the IEEE.
Paul S. Andrews is a researcher and engineer at SimOmics and the Department of Electronics, University of York, United Kingdom. His research interests include developing evidencing methodologies and tools to support the process of modelling and simulation across various domains of science from biological to social systems.
Henrique Veiga-Fernandes is the group leader and a member of the iMM Lisboa board of directors, Portugal. He has made important contributions to immunological memory, innate lymphoid cells, and hematopoietic stem cell biology. He is an EMBO member and received several European Research Council Awards.
Mark Coles is a senior lecturer in immunology at the Centre for Immunology and Infection and the co-director of the York Computational Immuno-logy Laboratory, University of York, York, United Kingdom. He conducts research on experimental and systems immunology. He has developed a program of research to understand cellular, molecular, and biophysical mechanisms regulating immune tissue development and function. To model complex dynamical immune environments, his work has focused on the development, application, and analysis of multi-scale agent-based models.