Bayesian phylodynamic analysis reveals the evolutionary history and the dispersal patterns of citrus tristeza virus in China based on the p25 gene | Virology Journal

0
36


Sample collection

For each sample, suspected CTV-infected leaves of four different directions (east, south, west, and north) from the same Newhall orange tree were cut off with a germ-free surgical scissors. The sampled leaves were placed in a Ziplock bag, transferred to the lab, and stored in the refrigerator at -80℃ until RNA extraction. The wild citrus samples were collected from Hunan and Jiangxi provinces, and the cultivated samples were collected from Sichuan, Chongqing, Hubei, Zhejiang, Fujian, Guangdong and Guangxi provinces. According to the different habitats and geographical regions of the citrus growing, the citrus producing area were divided into five groups as following (Fig. 1): SCH (Sichuan, Chongqing, and Hubei provinces), YN (Yunnan wild citrus main communities), HJ (Hunan cultivated citrus main producing areas, Hunan and Jiangxi wild citrus main communities), and GG (Guangxi and Guangdong provinces), and FZ group (Fujian and Zhejiang provinces; Fig. 1).

Besides, 39 CTV p25 gene sequences and associated information including collection date and geographic location were retrieved from GenBank for this study. Those public data collected between 2005 and 2019 from citrus producing regions, including Hubei, Zhejiang, Hunan, Chongqing, Yunnan and Guangxi provinces (Supplementary Table 1).

RNA extraction and p25 gene sequencing

Before extraction, the leaves stored under − 80℃ were grinded in liquid nitrogen. About 1 g of the grinded leaves for each sample was used for RNA extraction, following the instructions of TRIzol reagent (TaKaRa, Beijing, China). The RNA samples were stored under − 80℃ for the following steps.

Total RNA were reverse-transcribed using a PrimeScript™ RT reagent Kit with gDNA Eraser (Perfect Real Time, TaKaRa). CP gene was then amplified by regular PCR with the synthesized cDNA samples. Each PCR assay included the following components: 10× PCR buffer (Mg2 + plus; including 100 mM Tris-HCl, pH 8.9; 500 mM KCl; 15 mM MgCl2), 10 mM dNTP, 0.5 µM Forward primer, 0.5 µM Reverse primer, and 5 U/µl r-Taq, CP primers was designed by Michael et al [23] (CP-F: 5’-ATGGACGACGAAACAAAG-3’, CP-R:5’-TCAACGT GTGTTGAATTT-3’). Water was used as negative control, and cDNA of a CTV isolate was used as positive control. The PCR conditions included: one cycle at 94 ℃ for 5 min, then 35 cycles at 94 ℃ for 30 s, 55 ℃ for 30 s, and 72 ℃ for 45 s, with a final extension for 12 min at 72 ℃. The amplicons were visualized on 1.2% agarose gel using a gel imaging system (Universal Hood II, BIO-RAD, America) after electrophoresis for 30 min and stained with M5 Gel red dye. The positive cDNA samples that produced a correct amplicon were kept for further analysis and stored at -80℃. PCR products were ligated to PMD19-T vector (Tiangen, Beijing, China) and then subsequently propagated in E. coli strain DH5α. The recombinant plasmids were extracted and sequenced in both directions by Sangon Biological (Shanghai). At least four cDNA clones for each amplicon were sequenced to obtain a consensus sequence using SeqMan 7.1 software [24].

Molecular variation analysis

To determine whether more pronounced genetic differences occurred between cultivated and wild citrus populations, we used the analysis of molecular variance (AMOVA) to partition the variation component among and within the population by GenALEx 6.502. Meanwhile, we computed the selective pressure with default arguments by MEGA-X, mainly obtained by calculating the replacement ratio of nonsynonymous codons and synonymous codons in the protein-coding sequence, namely, ω = dN/dS, where ω = 1 represents neutral mutation, ω < 1 means negative selection, and ω > 1 indicates positive selection [25].

Phylogenetic analysis of CTV

To avoid bias in the phylogenetic, phylodynamic, and phylogeographic analysis, we used the RDP, GENECONV, MAXCHI, CHIMAERA, 3SEQ, BOOTSCAN, LARD, and SISCAN heuristic recombination detection methods implemented in the RDP 5.0 software [26] package with default settings (p-values less than 10− 5). Recombinats detected by at least four of the seven algorithms were considered to be true recombinats. Meanwhile, we use the Bayesian information criterion in ModelFinder [27] in IQ-TREE [28] software to select the substitution model.

To infer the substitution rate and timescale of CTV p25 gene, we analyzed the p25 gene sequences using a molecular clock. Firstly, we used Bayesian evaluation of temporal signal (BETS) [29] to test temporal structure in the data set. This method used the marginal likelihood (MLE) estimated by generalized stepping-stone sampling (GSS) to compare heterochronous model (Mhet) and isochronous model (Miso). Bayes factor (BF) was used to identify the best fit model to the data set, if log(BF) = log(P(Y|Mhet)) – log(P(Y|Miso)) > 5 which indicate a strong preference of the Mhet and providing decisive support for the presence of temporal signal in the data set [30, 31](Guy et al. 2016; Mathieu et al. 2020).

Using the substitution models, as selected above, the path sampling (PS) and stepping-stone sampling (SS) methods [32] were employed in BEAST v1.10 [33]. Two molecular clock models were combined (strict and uncorrelated lognormal relaxed) with three different demographic models (the constant-size coalescent, exponential growth coalescent, and Bayesian skyline coalescent) to choose the best combination model.

To infer the substitution rate and the most recent common ancestor of CTV in China, we used sampling time of each sequence to calibrate the molecular clock. Posterior distributions of parameters were estimated by Markov chain Monte Carlo (MCMC) sampling, and the best models were used to obtain maximum clade credibility (MCC) tree. All Bayesian analysis was run for 310 million steps with sampling every 31,000 steps, to ensure that the effective sampling size was greater than 200. After discarding the first 10% samples as burn-in, we used Tracer 1.71 to examined the convergence of MCMC chains (the effective sampling size greater than 200). Finally, the MCC tree was summarized using Tree Annotator v1.10.4 from the BEAST v1.10 package and visualized with FigTree 1.4.3.

Phylogeographic and demographic history

To gain insight into the circulation of CTV across the citrus-producing of China, a phylogeographic analysis was conducted using Bayesian stochastic search variable selection (BSSVS) [34] to model the geographic transmission patterns in BEAST v1.10. Five geographic regions (SCH, YN, HJ, GG, and FZ regions) were coded as sampling location, and posterior distributions of parameters were estimated by MCMC sampling as described above. Besides, to test the robustness of the dispersal patterns of CTV migration in China, the dataset in this study were reanalysed by phylogeographic analysis with the data of YN merged into HJ group, which contains all the wild samples in this study. We run MCMC simulations for 100 million steps across three independent Markov chains and collected samples every 10,000 steps. After discarding the first 10% samples as burn-in, we used Tracer 1.71 to checked the effective sampling size, accepting only values higher than 200 for all the parameters.

The best-supported pair wise diffusions were identified using BF in SPREAD3 [35]. Migration pathways were considered to be of statistical significance when BF greater than 3 and mean posterior value greater than 0.5 [36]. We also estimated the number of expected location-state transitions (Markov jump counts) along the branches of the phylogeny [37].



Source link