More

    Viral kinetics of sequential SARS-CoV-2 infections


    Study design

    Between March 11, 2020, and July 28, 2022, the NBA conducted regular surveillance for SARS-CoV-2 infection among players, staff, and affiliates as part of an occupational health program. This included frequent viral testing (often daily during high community COVID-19 prevalence) using a variety of platforms, but primarily via nucleic acid amplification tests, as well as clinical assessment including case diagnosis and symptom tracking. To assess viral concentration, RT-qPCR tests were conducted when possible, using anterior nares and oropharyngeal swabs collected by a trained professional and combined into a single viral transport media. Cycle threshold (Ct) values were obtained from the Roche cobas target 1 assay. Ct values were converted to genome equivalents per milliliter using a standard curve16. Positive controls were run on every plate and the efficacy of the primer and probe sequences used in the assay were routinely monitored for mutations that would reduce assay sensitivity. Data on participant age and vaccination status were collected where possible. Viral lineages were assigned using whole-genome sequencing, when feasible. This resulted in a longitudinal dataset of 424,401 SARS-CoV-2 tests with clinical COVID-19 history and demographic information for 3021 individuals.

    Vaccination and booster status was assigned at the time of the first positive test for each infection. Full vaccination corresponded to 14 days following either the second dose of a Pfizer or Moderna vaccine or the first dose of a Johnson and Johnson/Janssen vaccine. A person was considered “boosted” 14 days after an additional Pfizer or Moderna dose following their initial vaccination course.

    Genome sequencing and lineage assignment

    Whole genome sequencing of remnant diagnostic samples was performed to determine viral lineages using an overlapping amplicon-based library preparation strategy (i.e., Primal Seq). Following previously described methods, RNA was extracted from clinical samples and confirmed as SARS-CoV-2 positive17. Libraries were prepared in accordance with the selected sequencing platform. For samples sequenced on the Oxford Nanopore Technologies MinION platform, following amplicon generation samples were prepared for multiplex sequencing using the Ligation Sequencing Kit (SQK-LSK114) with Native Barcoding (SQK-NBD114.24). Final libraries were sequenced to a target of 100,000 reads per sample. For samples sequenced on Illumina platforms, libraries were prepared using the amplicon-based Illumina COVIDseq Test v033 with COVID-Seq ARTIC viral amplication primer set (V4, 384 samples, cat#20065135) and sequenced 2×74 on Illumina NextSeq 550 or 2×100 on the Illumina NovaSeq600 following Illumina’s documentation. The resulting FASTQ files were processed and analyzed on Illumina BaseSpace Labs using the Illumina DRAGEN COVID Lineage Application18; versions included were 3.5.0, 3.5.1, 3.5.2, 3.5.3, and 3.5.4. The DRAGEN COVID Lineage pipeline was run with default parameters as recommended by Illumina. Lineage assignment and phylogenetics analysis were accomplished using the most recent versions of Pangolin19 and NextClade20, respectively. Sequences are available at BioProject under accession number PRJNA1014408.

    Estimating viral kinetic parameters

    We characterized the viral kinetics of the well-documented infections by fitting a hierarchical piecewise linear model to the viral concentration measurements on a logarithmic scale (as measured by the PCR cycle threshold, or Ct), following previous methods5. The model captures the viral proliferation time (i.e., time from first possible detection to peak), peak viral concentration, and viral clearance time (i.e., time from peak to last possible detection) of acute SARS-CoV-2 infections. Using this approach, the viral kinetics of an infection can be described by three “hinge” points: (1) the theoretical time of first PCR positivity to, (2) the peak viral load δ (which occurs at time tp), and the theoretical time of last PCR positivity tr. According to this model, the expected viral load as measured by Ct units beyond the limit of detection, E[y], sits at the limit of detection prior to time to, then increases linearly to δ at time tp, then decreases linearly back to the limit of detection at time tr (Supplementary Figure 1). From these values, we can derive the proliferation ωp and clearance times ωr: ωp = tpto and ωr = trtp.

    We characterized an individual i’s proliferation time ωp[i], clearance time ωr[i], and peak viral load δ[i] using the following formulae:

    $${\omega }_{p[i]}={{{{\mathrm{Exp}}}}}\left[{\beta }_{p}+{\beta }_{p[c]}+\mathop{\sum}\limits_{a}{\beta }_{p[a]}+{\tau }_{p}{\eta }_{p[i]}\right]{\omega }_{p}^{\ast }$$

    (1)

    $${\omega }_{r[i]}={{{{\mathrm{Exp}}}}}\left[{\beta }_{r}+{\beta }_{r[c]}+\mathop{\sum}\limits_{a}{\beta }_{r[a]}+{\tau }_{r}{\eta }_{r[i]}\right]{\omega }_{r}^{\ast }$$

    (2)

    $${\delta }_{[i]}={{{{\mathrm{Exp}}}}}\left[{\beta }_{\delta }+{\beta }_{\delta [c]}+\mathop{\sum}\limits_{a}{\beta }_{\delta [a]}+{\tau }_{\delta }{\eta }_{\delta [i]}\right]{\delta }^{\ast }$$

    (3)

    Re-arranging yields the following equations:

    $$\log \left({\omega }_{p[i]}/{\omega }_{p}^{\ast }\right)={\beta }_{p}+{\beta }_{p[c]}+\mathop{\sum}\limits_{a}{\beta }_{p[a]}+{\tau }_{p}{\eta }_{p[i]}$$

    (4)

    $$\log \left({\omega }_{r[i]}/{\omega }_{r}^{\ast }\right)={\beta }_{r}+{\beta }_{r[c]}+\mathop{\sum}\limits_{a}{\beta }_{r[a]}+{\tau }_{r}{\eta }_{r[i]}$$

    (5)

    $$\log \left({\delta }_{[i]}/{\delta }^{\ast }\right)={\beta }_{\delta }+{\beta }_{\delta [c]}+\mathop{\sum}\limits_{a}{\beta }_{\delta [a]}+{\tau }_{\delta }{\eta }_{\delta [i]}$$

    (6)

    The left-hand side of these equations are the logged multiplicative factor between the individual-level parameter value (indexed with subscript [i]) and a fixed, baseline value for these parameters (marked with *); for example, a proliferation time ωp[i] of 6 days relative to a baseline value ωp* of 3 days would yield a logged multiplicative factor of log(6/2) ≈ 0.7. The choice of baseline value is arbitrary and is included here to improve the robustness of the MCMC algorithm by setting the parameters on a similar scale.

    The remaining coefficients (β, τ, η) are estimated from the data. The summed β coefficients constitute the population mean for the associated parameter. Thus, the unadjusted population mean proliferation time, clearance time, and peak viral load are represented by βp, βr, and δ, respectively. These unadjusted means are adjusted according to the cardinality of infection (first or second, represented by β values with subscript [c]) and the age group, variant category, and vaccination status of the individual (represented by β values with subscript [a]). Together, these β values constitute the upper level of the hierarchical model.

    The individual-level effects are obtained by multiplying τ, the standard deviation of the population distribution, and η, which is a standard normal random variable drawn independently for each person i. This follows the non-centered model parameterization for hierarchical models advocated by Gelman et al.21.

    Each β coefficient was assigned a Normal(0,1) prior distribution. This prior was chosen because, after exponentiating and multiplying by the fixed baseline values (in Eqs. S1–S3), the middle 98% of the prior distribution corresponds to a range of roughly one-tenth to ten times the baseline value. For example, for a fixed baseline proliferation time of ωp* = 3 days, the middle 98% of the prior unadjusted population mean distribution (corresponding to Exp[βp] × ωp*) would cover a range of roughly 0.3 days to 30 days. Thus, we considered these to be broad, minimally informative priors. The qualitative findings from the main text were unchanged when using narrower priors of Normal(0, 0.25).

    Similarly, we specified a Normal(0,1) distribution, truncated to be non-negative, as the priors for the τ coefficients. With this choice, the individual-level draws could have a standard deviation up to 10 times larger than the mean, population-level distribution (i.e., the distribution of the summed β values), following the same logic as before.

    As in prior work5, we characterized the likelihood of observing a given ΔCt(t) using the following mixture model:

    $$L(\,{y}_{[it]}{|{{{{{\rm{\delta }}}}}}}_{[i]},{t}_{p[i]},{{{{{{\rm{\omega }}}}}}}_{p[i]}{{{{{{\rm{\omega }}}}}}}_{r[i]})= (1-{{{{{\rm{\lambda }}}}}})[\,\,{f}_{N}(x|E[\,{y}_{[it]}|{{{{{{\rm{\delta }}}}}}}_{[i]},{t}_{p[i]},{{{{{{\rm{\omega }}}}}}}_{p[i]},{{{{{{\rm{\omega }}}}}}}_{r[i]}],{{{{{\rm{\sigma }}}}}}) \\ +{I}_{lod}{F}_{N}(0|E[\,{y}_{[it]}|{{{{{{\rm{\delta }}}}}}}_{[i]},{t}_{p[i]},{{{{{{\rm{\omega }}}}}}}_{p[i]},{{{{{{\rm{\omega }}}}}}}_{r[i]}],{{{{{\rm{\sigma }}}}}})] \\ +{{{{{\rm{\lambda }}}}}}{f}_{Exp}(x|k)$$

    (7)

    The left-hand side of the equation denotes the likelihood (L) of observing a given viral load for person i at time t, y[it], as measured by Ct deviation from the limit of detection, given the model parameters δ (peak viral load), tp (time of peak viral load), ωp (proliferation time), and ωr (clearance time) for individual i and time t. Recall that E[y[it] | δ[i], tp[i], ωp[i], ωr[i]] is the expected viral load for person i at time t as specified by the viral kinetic model given the parameters. Here, σ denotes the observation noise, i.e., the variation in observed vs. expected (model-derived) viral load for a person at a given time point. This noise is also estimated from the data, using a prior distribution of σ ~ Normal(0,1), truncated to nonnegative values. This roughly covers a range of +/– 2.5 Ct for the measurement error, which falls within the range of measurement error based on repeated viral load measurements from previous studies in the same cohort11.

    The likelihood captures two distinct processes: the viral kinetic process, denoted by the bracketed term preceded by a (1−λ); and false negatives, denoted by the term preceded by a λ. In the bracketed term representing the modeled viral kinetic process, fN(x | E[y], σ(t)) represents the Normal PDF evaluated at x with mean E[y] (generated by the model equations above) and observation noise σ(t). FN(0 | E[y], σ(t)) is the Normal CDF evaluated at 0 with the same mean and standard deviation. This represents the scenario where the true viral load goes below the limit of detection, so that the observation sits at the limit of detection. Ilod is an indicator function that is 1 if y = 0 and 0 otherwise; this way, the FN term acts as a point mass concentrated at y = 0. Last, fExp(x | κ) is the Exponential PDF evaluated at x with rate κ. We set κ = log (10) so that 90% of the mass of the distribution sat below 1 Ct unit and 99% of the distribution sat below 2 Ct units, ensuring that the distribution captures values distributed at or near the limit of detection. We did not estimate values for λ or the exponential rate because they were not of interest in this study; we simply needed to include them to account for some small probability mass that persisted near the limit of detection to allow for the possibility of false negatives.

    Model parameters were fit using a Hamilton Monte Carlo algorithm implemented in R (version 4.1.2) and Stan (version 2.21.3). Four chains were run for 2000 iterations each, and the first half of each chain was discarded as burn-in, yielding 4000 total posterior draws. Convergence was assessed using a Gelman–Rubin statistic of <1.1 for all parameters and the absence of divergent transitions. Code for the full analysis is available at https://github.com/skissler/Ct_SequentialInfections.

    Statistical approach

    We assessed differences in viral kinetic parameters across category subsets (e.g., for first vs. second infections) by subtracting the relevant posterior draws and measuring the posterior probability mass of these differences that sat above/below 0, depending on the scenario. When fewer than 5% of these differenced posterior draws sat above/below zero, we took this as evidence of a significant difference.

    To assess relative persistence in individual-level viral kinetic attributes across infections, we measured both the Pearson (raw) and Spearman (rank-based) correlations between the adjusted first-infection and second-infection proliferation time, clearance time, and peak viral load at the individual level. To perform the adjustment, we subtracted the model-estimated adjustments for age, variant, and vaccination status, leaving only the effects from infection cardinality and individual variation. We measured the Pearson and Spearman correlation for each of the 4000 draws from the posterior distribution generated by the Hamiltonian Monte Carlo fitting approach. This yielded a mean and 95% credible interval for the Pearson and Spearman correlations between each of the first- and second-infection viral kinetic parameters.

    Study oversight

    This work was approved as “research not involving human subjects” by the Yale Institutional Review Board (HIC protocol # 2000028599), as it involved de-identified samples. This work was also designated as “exempt” by the Harvard Institutional Review Board (IRB20-1407). Informed consent for virological testing and anonymized analysis of the results was obtained from all participants.

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.



    Source link

    Latest articles

    Related articles

    Discover more from Blog | News | Travel

    Subscribe now to keep reading and get access to the full archive.

    Continue reading