source("skwara_et_al_2022.R")
GLV and experimental data — Hands-on tutorial
Data
For our explorations, we are going to use recent data from Ishizawa and colleagues, which you can find here:
H. Ishizawa, Y. Tashiro, D. Inoue, M. Ike, H. Futamata. Learning beyond-pairwise interactions enables the bottom-up prediction of microbial community structure. PNAS 121(7): e2312396121 (2024).
The authors inoculated duckweed (Lemna minor) with synthetic bacterial communities formed by all possible combinations of seven strains. They cultured the inoculated duckweed in flasks for 10 days; at the end of the experiment, they plated the communities on agar plates containing antibiotics that would allow the growth of only a particular strain. In this way, they were able to measure the final density of each of the seven strains in each of the synthetic communities.
More modestly, here we are going to focus on a smaller pool of three strains taken from the seven available. We therefore have $2^3 - 1 = 7$ possible communities, each grown and measured in replicate (50 experiments in total).
Here’s the full table:
| DW067 | DW102 | DW145 |
| --- | --- | --- |
| | | 0.8 |
| | | 0.812 |
| | | 0.858 |
| | | 0.874 |
| | | 0.854 |
| | | 0.951 |
| | | 0.823 |
| | | 0.922 |
| | | 1 |
| | 0.529 | |
| | 0.356 | |
| | 0.532 | |
| | 0.409 | |
| | 0.416 | |
| | 0.548 | |
| | 0.645 | |
| | 0.41 | |
| | 0.367 | |
| | 0.663 | 0.604 |
| | 0.703 | 0.909 |
| | 0.395 | 0.661 |
| | 0.54 | 0.487 |
| | 0.919 | 0.981 |
| | 0.707 | 0.828 |
| 0.729 | | |
| 0.781 | | |
| 0.811 | | |
| 1 | | |
| 0.811 | | |
| 0.793 | | |
| 0.617 | | |
| 0.919 | | |
| 0.645 | | |
| 0.513 | | 0.719 |
| 0.419 | | 0.428 |
| 0.432 | | 0.546 |
| 0.197 | | 0.598 |
| 0.249 | | 0.904 |
| 0.242 | | 0.558 |
| 0.541 | 0.563 | |
| 0.568 | 0.755 | |
| 0.371 | 0.707 | |
| 0.566 | 0.524 | |
| 0.839 | 0.433 | |
| 0.796 | 0.609 | |
| 0.328 | 1 | 0.416 |
| 0.321 | 0.436 | 0.58 |
| 0.378 | 0.57 | 0.381 |
| 0.386 | 0.602 | 0.563 |
| 0.379 | 0.774 | 0.293 |

Empty cells mark strains that were not inoculated in that community (recorded as a density of 0 in the data file).
We can therefore associate each measurement with a) the strain being measured, and b) the community in which the strain was grown (i.e., which of the seven combinations was inoculated).
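For example, here is a minimal sketch in R that recovers the community label from the data file (assuming, as the file confirms later in this tutorial, that absent strains are recorded as a density of 0):

library(tidyverse)
# read the data: one row per experiment, one column per strain
dt <- read_csv("Ishizawa_3_strains.csv")
# label each experiment by its community: a strain is present if its density is nonzero
dt <- dt %>%
  mutate(comm = paste0(as.integer(DW067 > 0), as.integer(DW102 > 0), as.integer(DW145 > 0)))
# tally the replicates: there should be 7 distinct communities
dt %>% count(comm)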
Loading the code
We are going to use the code accompanying the paper
- Skwara, A., Lemos‐Costa, P., Miller, Z.R. and Allesina, S., 2023. Modelling ecological communities when composition is manipulated experimentally. Methods in Ecology and Evolution, 14(2), pp.696-707.
which you can find here:
https://github.com/StefanoAllesina/skwara_et_al_2022/
We have slightly massaged the code for this tutorial.
To load the code in R, change to the directory where you have stored the tutorial, and source it:
source("skwara_et_al_2022.R")
Sum of squares
First, we are going to use all the data to find a matrix $B$ of interaction coefficients such that, for each community $k$, the predicted densities $\tilde{x}^{(k)}$, obtained by solving $B^{(k)} \tilde{x}^{(k)} = 1$ (where $B^{(k)}$ is the submatrix of $B$ corresponding to the strains present in community $k$), are as close as possible to the observed densities, in the sense of minimizing the sum of squared deviations (SSQ).
This can be accomplished by calling:
full_SSQ <- run_model(
  datafile = "Ishizawa_3_strains.csv", # csv file containing the data to be fit
  model = "full", # estimate B allowing each coefficient to take the best value
  goalf = "SSQ", # minimize Sum of Squared Deviations
  pars = NULL, # start from Identity matrix
  skipEM = TRUE, # go directly to numerical optimization
  plot_results = TRUE # plot the results
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] 1.289597
[1] 1.289597
[1] 1.289597
[1] 1.289597
[1] 1.289597
[1] 1.289597
[1] 1.289597
[1] 1.289597
[1] 1.289597
[1] 1.289597
The values being printed are the SSQ after each round of numerical optimization (in this case, the calculation converges immediately to the solution).
The plots show the data (points), as well as the corresponding boxplots, with the horizontal line being the median value of the replicates for each strain in each community.
The variable `full_SSQ` stores the data, the predictions, and the estimated parameters:
glimpse(full_SSQ)
List of 8
$ data_name : chr "Ishizawa_3_strains"
$ observed : num [1:50, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "DW067" "DW102" "DW145"
$ predicted : num [1:50, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "DW067" "DW102" "DW145"
$ variances : num [1:50, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "DW067" "DW102" "DW145"
$ B : num [1:3, 1:3] 1.3 -0.479 0.875 0.246 2.143 ...
$ pars : num [1:9] 1.63166 0.02719 -1.28356 -0.00694 0.43581 ...
$ goal_function: num 1.29
$ goal_type : chr "SSQ"
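To see how the estimated matrix translates into predictions, here is a minimal sketch (assuming, as described above, that the predicted densities for a community solve $B^{(k)} x = 1$ on the strains present, and that the SSQ goal sums squared deviations over all entries of the stored matrices; if the package defines the objective differently, the last number will not match exactly):

# predicted equilibrium for the full three-strain community: solve B x = 1
solve(full_SSQ$B, rep(1, 3))
# recompute the goal function from the stored observed and predicted matrices
sum((full_SSQ$observed - full_SSQ$predicted)^2)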
Let’s plot the predicted vs. observed values:
plot_pred_obs(full_SSQ)
The points mark the predicted vs. observed means, the crosses the full data, and the dashed line is the 1:1 line.
Weighted least squares
We can repeat the calculation, but this time trying to minimize the weighted sum of squares

$$\mathrm{WLS} = \sum_{i,k} \frac{\left(x_i^{(k)} - \tilde{x}_i^{(k)}\right)^2}{\sigma_{ik}^2}$$

where $\sigma_{ik}^2$ is the empirical variance of the measurements of strain $i$ in community $k$, by calling:
full_WLS <- run_model(
  datafile = "Ishizawa_3_strains.csv", # csv file containing the data to be fit
  model = "full", # estimate B allowing each coefficient to take the best value
  goalf = "WLS",
  pars = NULL, # start from Identity matrix
  skipEM = TRUE, # go directly to numerical optimization
  plot_results = TRUE # plot the results
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] 67.07833
[1] 67.07833
[1] 67.07833
[1] 67.07833
[1] 67.07833
[1] 67.07833
[1] 67.07833
[1] 67.07833
[1] 67.07833
[1] 67.07833
plot_pred_obs(full_WLS)
Notice that the points have moved slightly—this is because we are penalizing deviations differently depending on the measured variance (and thus points with a higher variance can depart more strongly from the 1:1 line).
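As a sanity check, we can recompute the weighted objective from the stored matrices, assuming each squared deviation is divided by the corresponding measured variance (entries for absent strains are zero, and must be skipped); if the package weights differently, the value will not match the one printed above:

w <- full_WLS$variances > 0 # skip absent strains (variance recorded as 0)
sum((full_WLS$observed[w] - full_WLS$predicted[w])^2 / full_WLS$variances[w])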
Maximum likelihood
Now we take a different approach, and take the observations to be independent samples from a log-normal distribution:

$$x_i^{(k)} \sim \mathrm{LogNormal}\left(\log \tilde{x}_i^{(k)},\ \sigma^2\right)$$

i.e., we take each observation to have a (log-scale) mean determined by the corresponding predicted density, and we estimate the parameters by maximizing the likelihood:
full_LN <- run_model(
  datafile = "Ishizawa_3_strains.csv", # csv file containing the data to be fit
  model = "full", # estimate B allowing each coefficient to take the best value
  goalf = "LikLN",
  pars = NULL, # start from Identity matrix
  skipEM = TRUE, # go directly to numerical optimization
  plot_results = TRUE # plot the results
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] -49.19108
[1] -49.19108
[1] -49.19108
[1] -49.19108
[1] -49.19108
[1] -49.19108
[1] -49.19108
[1] -49.19108
[1] -49.19108
[1] -49.19108
plot_pred_obs(full_LN)
Also in this case, we obtain a good fit.
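As a rough check of the reported value, we can recompute a log-normal negative log-likelihood from the stored matrices, under two assumptions that may differ from the package's exact definition: the log-mean of each observation is the log of the corresponding prediction, and a single variance (profiled at its maximum-likelihood value) is shared by all observations:

obs <- full_LN$observed
pred <- full_LN$predicted
keep <- obs > 0 # only strains present in a community contribute
# maximum-likelihood estimate of the shared log-scale variance, given the predictions
s2 <- mean((log(obs[keep]) - log(pred[keep]))^2)
# negative log-likelihood under these assumptions
-sum(dlnorm(obs[keep], meanlog = log(pred[keep]), sdlog = sqrt(s2), log = TRUE))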
Leave-one-out cross validation
The ultimate test for this type of model is its ability to predict experimental results before running the experiment.
In our case, we can leave out one of the 7 communities, fit the model on the remaining data, and see which goal function recovers the left-out observations best:
dt <- read_csv("Ishizawa_3_strains.csv")
Rows: 50 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (3): DW067, DW102, DW145
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# row 40 contains an experiment with only DW067 and DW102
dt %>% slice(40)
# A tibble: 1 × 3
DW067 DW102 DW145
<dbl> <dbl> <dbl>
1 0.541 0.563 0
LOO_LN <- run_model_LOO(
  datafile = "Ishizawa_3_strains.csv", model = "full", goalf = "LikLN", pars = NULL, skipEM = TRUE, plot_results = TRUE,
  LOO_row_num = 40 # exclude all experiments with this community
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] -43.08928
[1] -43.08928
[1] -43.08928
[1] -43.08928
[1] -43.08928
[1] -43.08928
[1] -43.08928
[1] -43.08928
[1] -43.08928
[1] -43.08928
LOO_WLS <- run_model_LOO(
  datafile = "Ishizawa_3_strains.csv", model = "full", goalf = "WLS", pars = NULL, skipEM = TRUE, plot_results = TRUE,
  LOO_row_num = 40 # exclude all experiments with this community
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] 56.05198
[1] 56.05198
[1] 56.05198
[1] 56.05198
[1] 56.05198
[1] 56.05198
[1] 56.05198
[1] 56.05198
[1] 56.05198
[1] 56.05198
LOO_SSQ <- run_model_LOO(
  datafile = "Ishizawa_3_strains.csv", model = "full", goalf = "SSQ", pars = NULL, skipEM = TRUE, plot_results = TRUE,
  LOO_row_num = 40 # exclude all experiments with this community
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] 1.042831
[1] 1.042831
[1] 1.042831
[1] 1.042831
[1] 1.042831
[1] 1.042831
[1] 1.042831
[1] 1.042831
[1] 1.042831
[1] 1.042831
All three models are quite successful at recovering the observed means for the community we left out.
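We can also inspect the target directly: under the data layout shown above, rows 40-45 are the replicates of the DW067+DW102 community, so their observed means are what the leave-one-out fits are trying to recover:

dt %>%
  slice(40:45) %>%
  summarise(across(everything(), mean))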
Simplified models
We have seen in the lecture that this approach estimates all $n^2$ coefficients of the matrix $B$, and therefore the number of parameters (and the number of experiments needed to fit them) grows quadratically with the number of strains.
The idea brought forward by Skwara et al. is to approximate the matrix $B$ with a diagonal matrix plus a low-rank correction. The simplest such model is

$$B = \mathrm{diag}(d) + \alpha\, 1 1^\top$$

i.e., a model in which each diagonal element $d_i + \alpha$ can take its own value, while all off-diagonal interactions are equal to $\alpha$ (for a total of $n + 1$ parameters).
A more general model uses a rank-one interaction matrix,

$$B = \mathrm{diag}(d) + v\, w^\top$$

in which the diagonal coefficients are $d_i + v_i w_i$ and the interactions are $B_{ij} = v_i w_j$ (taking $w = v$ yields a symmetric model with $2n$ parameters; the general case has $3n - 1$ identifiable parameters).
Another advantage of this approach is that we can write the inverse in linear time, thereby removing the only big computational hurdle of the approach: by the Sherman-Morrison formula,

$$\left(\mathrm{diag}(d) + v\, w^\top\right)^{-1} = \mathrm{diag}(d)^{-1} - \frac{\mathrm{diag}(d)^{-1} v\, w^\top\, \mathrm{diag}(d)^{-1}}{1 + w^\top \mathrm{diag}(d)^{-1} v}$$
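To see the linear-time inverse at work, here is a small numerical check of the Sherman-Morrison identity (with arbitrary illustrative numbers, unrelated to the fitted model):

d <- c(1.5, 2.0, 0.7) # diagonal elements
v <- c(0.3, -0.2, 0.5)
w <- c(0.1, 0.4, -0.3)
B <- diag(d) + outer(v, w) # diag(d) + v w^T
ones <- rep(1, 3)
# Sherman-Morrison: only vector operations, no matrix inversion
x_sm <- ones / d - (v / d) * sum(w / d) / (1 + sum(w * v / d))
# compare with the direct (cubic-time) solution of B x = 1
all.equal(x_sm, solve(B, ones))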
These models can be derived from the consumer-resource framework that you have seen in class.
We can test different versions of the simplified model. A model with only the diagonal and a single extra interaction parameter ($B = \mathrm{diag}(d) + \alpha 1 1^\top$) performs very poorly:
diag_a11 <- run_model(
  datafile = "Ishizawa_3_strains.csv", # csv file containing the data to be fit
  model = "diag_a11t",
  goalf = "LikLN",
  pars = NULL, # start from Identity matrix
  skipEM = TRUE, # go directly to numerical optimization
  plot_results = TRUE # plot the results
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] -22.62175
[1] -22.62175
[1] -22.62175
[1] -22.62175
[1] -22.62175
[1] -22.62175
[1] -22.62175
[1] -22.62175
[1] -22.62175
[1] -22.62175
plot_pred_obs(diag_a11)
A model with symmetric interactions, $B = \mathrm{diag}(d) + v v^\top$, does much better:
diag_vvt <- run_model(
  datafile = "Ishizawa_3_strains.csv", # csv file containing the data to be fit
  model = "diag_vvt",
  goalf = "LikLN",
  pars = NULL, # start from Identity matrix
  skipEM = TRUE, # go directly to numerical optimization
  plot_results = TRUE # plot the results
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] -38.12933
[1] -38.36503
[1] -38.37226
[1] -38.38069
[1] -38.38508
[1] -38.39331
[1] -38.39618
[1] -38.39655
[1] -38.39655
[1] -38.39655
plot_pred_obs(diag_vvt)
And, finally, the model with 8 parameters, $B = \mathrm{diag}(d) + v w^\top$, in which interactions need not be symmetric, practically recovers the fit of the full 9-parameter model:
diag_vwt <- run_model(
  datafile = "Ishizawa_3_strains.csv", # csv file containing the data to be fit
  model = "diag_vwt",
  goalf = "LikLN",
  pars = NULL, # start from Identity matrix
  skipEM = TRUE, # go directly to numerical optimization
  plot_results = TRUE # plot the results
)
`mutate_if()` ignored the following grouping variables:
• Column `comm`
[1] "numerical search"
[1] -34.87107
[1] -49.06619
[1] -49.06753
[1] -49.0681
[1] -49.0681
[1] -49.0681
[1] -49.0681
[1] -49.06811
[1] -49.06811
[1] -49.06811
plot_pred_obs(diag_vwt)
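To compare the models at a glance, we can compute a rough AIC, assuming the printed goal values are negative log-likelihoods stored in each object's goal_function (as for full_SSQ above), and counting 9, 4, 6 and 8 parameters for the full, diag_a11t, diag_vvt and diag_vwt models respectively (ignoring the variance, which is common to all models):

# negative log-likelihoods from the fits above
nll <- c(full = full_LN$goal_function,
         diag_a11t = diag_a11$goal_function,
         diag_vvt = diag_vvt$goal_function,
         diag_vwt = diag_vwt$goal_function)
k <- c(full = 9, diag_a11t = 4, diag_vvt = 6, diag_vwt = 8)
sort(2 * k + 2 * nll) # smaller is better

Under these assumptions, the 8-parameter rank-one model would be preferred: it fits almost as well as the full model while using one fewer parameter.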