NGS Technology in Monitoring the Genetic Diversity of Cytomegalovirus Strains

Modern molecular genetic methods, massive parallel sequencing in particular, make it possible to genotype various pathogens for epidemiological marking and to improve molecular epidemiological surveillance of relevant infections, including cytomegalovirus infection.
The aim of the study was to evaluate next-generation sequencing (NGS) technology for genotyping clinical isolates of cytomegalovirus (CMV).
Materials and Methods. The study material comprised samples of biological substrates (leukocyte mass, saliva, urine) taken from patients who had undergone liver or kidney transplantation. CMV DNA was detected by real-time PCR using commercial diagnostic AmpliSense CMV-FL test systems (Central Research Institute for Epidemiology, Moscow, Russia). DNA was extracted using DNA-sorb AM and DNA-sorb V kits (Central Research Institute for Epidemiology) in accordance with the manufacturer's manual. The quality of the DNA library prepared for sequencing was assessed with the QIAxcel Advanced System for capillary gel electrophoresis (QIAGEN, Germany). Nucleotide sequences were aligned and assembled using CLC Genomics Workbench 5.5 software (CLC bio, USA). The sequencing results were analyzed using BLAST on the NCBI server.
Results. CMV DNA samples were selected for genotyping. Two variable genes, UL55(gB) and UL73(gN), were used to determine the CMV genotype by NGS on a MiSeq sequencer (Illumina, USA). Based on exploratory studies and analysis of the literature, primers for genotyping by the UL55(gB) and UL73(gN) genes were selected and the optimal PCR conditions were defined. Sequencing of the UL55(gB) and UL73(gN) gene fragments of CMV clinical isolates from solid organ recipients made it possible to determine the virus genotypes, among which gB2, gN4c, and gN4b were dominant. In some cases, an association of two or three CMV genotypes was revealed.
Conclusion. The application of NGS technology for genotyping cytomegalovirus strains can become one of the main methods of CMV molecular epidemiology, as it yields reliable results with a significant reduction in research time.


Introduction
At present, molecular genetic methods play an important role in the diagnosis of infectious diseases and in their epidemiological surveillance. Equipping modern laboratories with automated capillary sequencers based on Sanger's method, as well as with platforms for massive parallel sequencing (next-generation sequencing, NGS), makes it possible to genotype different causative agents for epidemiological marking and to improve molecular epidemiological surveillance of relevant infections, including cytomegalovirus infection.
Cytomegalovirus (CMV) is one of the main causes of congenital pathology in newborns (the fetal infection rate ranges from 6 to 53% and is 70% among preterm babies), and the main cause of complications after transplantation of hematopoietic cells and solid organs and of severe pneumonia in HIV-infected patients [1,2].
The genetic variability of CMV allows the virus to evade the immune response (for example, through changes in antigen epitopes), enhance its tropism for host cells, increase the efficiency of virus replication, and alter its sensitivity to drugs.
Different methods are used to identify the CMV variants present in biomaterials from infected people: analysis of DNA restriction fragment length polymorphism, Sanger sequencing of fragments, real-time genotype-specific PCR, and massive parallel sequencing. Sequencing-based methods of DNA analysis are preferable. They have helped demonstrate that a large number of genetically diverse CMV strains exist in the world [7-9].
Polymorphic genes are used as epidemiological markers for studying virus circulation in the human population. Genomic variants of CMV strains from various geographic regions may be identical but differ substantially in how frequently they occur. In addition, rare or new CMV variants may be detected in regions isolated from the rest of the world [3,7].
NGS technology makes it possible to sequence thousands of DNA molecules simultaneously, thereby increasing the speed of investigation and the volume of data obtained. Using an NGS platform for genotyping yields reliable results while substantially reducing the time needed to acquire and analyze them. In addition, NGS can detect several virus strains in one reaction, including those present in minor amounts [12]. Immunocompromised patients (HIV-infected or those after organ transplantation) and newborns with congenital CMV infection are known to be often infected with more than one CMV strain. The CMV strains persisting in the recipient's body before organ transplantation and in the donor's body may reactivate in patients receiving immunosuppressive therapy after the operation. The strains in these patients may belong to one genotype or to several [8,13].
Moreover, some researchers [8,14] have shown that patients with CMV infection caused by an association of different virus genotypes have a higher viral load and require more time for CMV elimination.
The aim of the study is to evaluate next-generation sequencing technology for genotyping clinical isolates of cytomegalovirus.

Materials and Methods
Biological material for investigation (leukocyte mass, saliva, urine) was taken from patients treated at the Department of Transplantology of Privolzhsky District Medical Center of Federal Medico-Biologic Agency of Russia (Nizhny Novgorod, Russia) after liver and kidney transplantation. The clinical material was collected and transported in compliance with SanPiN 3.3686-21 "Sanitary and epidemiological requirements for the prevention of infectious diseases".
Cytomegalovirus DNA was detected by real-time PCR using diagnostic AmpliSense CMV-FL test systems (Central Research Institute for Epidemiology, Moscow, Russia). DNA was extracted using DNA-sorb AM and DNA-sorb V kits (Central Research Institute for Epidemiology) according to the instructions for use. The sensitivity of the test systems, as specified in the certificate, was 1000 virions/ml.
For genotyping, 16 samples of CMV DNA were selected. CMV genotypes were determined by two variable genes, UL55(gB) and UL73(gN), using the MiSeq NGS system (Illumina, USA). Based on exploratory studies and analysis of the literature, 19 pairs of primers were tested (Table 1).
The study design is presented in Table 2.
With the help of the adapter sequence, a DNA fragment was hybridized with one or two primers immobilized on the solid surface and participating in PCR. The reaction mixture containing a set of enzymes and a pool of the DNA samples was loaded into the flow cell of the MiSeq system for sequencing. The resulting data set was aligned and assembled both against a reference genome and de novo. The short reads were aligned and assembled against the reference genome by the sequencer's built-in software.
The following sequences of the CMV UL55(gB) and UL73(gN) genes with known genotypes, taken from the GenBank database, were used as references: full-length genomes GQ466044, HCU66425, FJ527563, BK000394, GQ121041, GQ221975, X17403, AY446894, GQ466044; sequences of the UL55(gB) gene: HS5GLYBM, HS5GLYBL, HS5GLYBK, X04606, HS5GLYBI, M60929.

Results
Based on the analysis of the literature data, primers used by different researchers for identifying the CMV genotype by the UL55(gB) and UL73(gN) genes were selected.
Primers were selected by the following criteria: match between the primer and the analyzed gene region, purity of the PCR-generated fragment, optimal annealing temperature, and size of the resulting fragment. Primers proposed in six works [4,15-19] were considered for genotyping by the UL55(gB) gene. All the authors proposed a variable region located at the N-terminus of the gB protein as the target fragment for genotype discrimination. Primers suggested by de Vries et al. in 2012 [19], who recommended using a separate primer pair, flanking a 92-bp fragment, for each gB genotype, were excluded from the study. They used four pairs of primers to investigate each sample, which increased the duration of the study. This approach is justified when the genotype is determined by PCR with electrophoretic detection of the amplified fragments in agarose gel, but is unacceptable for genotype determination by sequencing, which we intended to use in our study. Primers proposed by de Albuquerque and Costa in 2003 [15] flank a 305-bp variable region located closer to the C-terminus of gB. The remaining primer variants covered approximately the same region located at the N-terminus of the gB protein, with fragment lengths varying from 256 to 522 bp. It should be noted that the primer pairs for nested PCR suggested by Barbi et al. in 2006 [16] include the region preceding the UL55 gene and flank the largest gene fragment, 522 bp long. We corrected the nucleotide sequence of the 5'-primer for UL55(gB) genotyping proposed by Chou and Dennison in 1991 [4].
Variable nucleotides were replaced with degenerate ones. All primer pairs were first tested on the control AD169 CMV strain and then on clinical samples. The best results were obtained with the primers also proposed by Chou and Dennison in 1991 [4].
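The replacement of variable nucleotides with degenerate ones can be illustrated with a minimal Python sketch. It uses the standard IUPAC ambiguity codes; the primer and template sequences are hypothetical and are not the sequences used in this study, and the helper names `primer_to_regex` and `degeneracy` are introduced here for illustration only.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regular expression."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))

def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate sequences the primer represents."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total

# Hypothetical primer and template fragments (NOT from the study).
primer = "ACGRTYCA"            # R = A or G, Y = C or T
variant_a = "TTACGATCCAGG"     # carries A at the R position, C at Y
variant_b = "TTACGGTTCAGG"     # carries G at the R position, T at Y
mismatch = "TTACGCTCCAGG"      # carries C at the R position

pattern = primer_to_regex(primer)
for name, seq in [("variant_a", variant_a), ("variant_b", variant_b), ("mismatch", mismatch)]:
    print(name, "matches" if pattern.search(seq) else "does not match")
```

A single degenerate primer of this kind matches both sequence variants, whereas a non-degenerate primer would match only one of them. The sketch checks exact matching only and does not model annealing thermodynamics.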
To select the optimal primers for CMV genotyping by the UL73(gN) gene, primers proposed in works [11,18,20] were analyzed. Primers proposed by Lisboa et al. in 2012 [20] were designed for nested PCR and, after testing, proved to be complementary to a region of the UL72 gene rather than UL73; they were therefore excluded from the analysis. Primers suggested by Grosjean et al. in 2009 [18] cover substantially the same variable region of the UL73 gene as primers proposed by Pignatelli et al. in 2003 [11]; however, the primer positions are shifted and the amplified fragment is 20 bp longer. A comparative analysis of primer performance on clinical samples showed that the specific fragment is detected substantially more often with the primer pair proposed by Pignatelli et al. [11].
Thus, primers proposed by Chou and Dennison in 1991 and Pignatelli et al. in 2003 were selected for CMV genotyping on UL55 and UL73 genes, respectively [4,11].
In the course of the work, the optimal sample volume for the reaction, 10 μl, was determined. The optimal PCR conditions were also selected: the temperature and time of primer annealing and the number of reaction cycles. As a result, the following parameters were set: 98°C for 2 min; 98°C for 10 s; 55°C for 15 s; 72°C for 1 min; 40 cycles (Figures 1 and 2).
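As a rough check of how long such a program runs, the parameters can be written down as data and summed. The sketch below assumes that the first 98°C step (2 min) is a single initial denaturation and that the remaining three steps are repeated for 40 cycles; this reading is an interpretation of the listed parameters, and the step names are labels added here. Ramp times between temperatures are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str       # descriptive label (assumed, not from the source)
    temp_c: int     # hold temperature, degrees Celsius
    seconds: int    # hold time, seconds

# Assumed reading of the program: one initial step, then a 3-step cycle.
INITIAL = Step("initial denaturation", 98, 120)
CYCLE = (
    Step("denaturation", 98, 10),
    Step("annealing", 55, 15),
    Step("extension", 72, 60),
)
N_CYCLES = 40

def hold_time_seconds(initial: Step, cycle: tuple, n_cycles: int) -> int:
    """Total hold time of the program, excluding temperature ramps."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)

total = hold_time_seconds(INITIAL, CYCLE, N_CYCLES)
print(f"{total} s = {total // 60} min {total % 60} s")
```

Under these assumptions the hold time alone is 3520 s, a little under one hour; the actual run time on a thermal cycler would be longer because of ramping.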
The analysis of the UL55 and UL73 sequencing results allowed us to determine the genotype landscape of CMV circulating among the population of one region of Russia. In patients who had undergone solid organ transplantation, four gB genotypes of CMV were identified: gB2, gB1, gB3, and gB4 (in decreasing order of frequency). Two CMV genotypes, gB3 and gB4, were present concurrently in one patient.
The analysis of the sequenced UL73 fragments from CMV isolates taken from solid organ recipients revealed five gN variants: gN4c, gN4a, gN4b, gN1, and gN3b.
Several gN genotypes of CMV were present simultaneously in several patients. Associations of two and three genotypes were found in liver recipients (gN4c with gN4b; gN3b with gN4a and gN1), and the combinations gN4c with gN1 and gN4c with gN4a were found in two kidney recipients. The data obtained show that NGS technology allows a detailed, in-depth analysis of the genetic variability of viral pathogens, which is needed for both fundamental and practical tasks of epidemiology, and makes it possible to identify associations of various CMV genotypes in a clinical sample, which substantially influences the choice of etiotropic therapy.
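Once genotypes have been called per sample, flagging mixed infections and tallying genotypes is a simple bookkeeping step. The sketch below uses only the four gN combinations reported above for recipients with mixed infection; the sample labels are hypothetical, and the counts cover these four samples only, not the full set of 16.

```python
from collections import Counter

# gN genotype combinations reported in the text for four recipients.
# Sample labels are hypothetical placeholders.
samples = {
    "liver_1": {"gN4c", "gN4b"},
    "liver_2": {"gN3b", "gN4a", "gN1"},
    "kidney_1": {"gN4c", "gN1"},
    "kidney_2": {"gN4c", "gN4a"},
}

def mixed_infections(calls: dict) -> dict:
    """Return samples in which more than one genotype was detected."""
    return {s: sorted(g) for s, g in calls.items() if len(g) > 1}

def genotype_counts(calls: dict) -> Counter:
    """Count in how many samples each genotype occurs."""
    counts = Counter()
    for genotypes in calls.values():
        counts.update(genotypes)
    return counts

mixed = mixed_infections(samples)
counts = genotype_counts(samples)

for sample, genotypes in mixed.items():
    print(sample, "+".join(genotypes))
print(counts.most_common())
```

Within these four samples, gN4c occurs in three, which is in line with its reported prevalence among the gN variants.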

Discussion
Current NGS technologies are among the most promising and precise methods for evaluating the genetic diversity of infectious disease agents, including CMV.
As a result of the exploratory work, primer pairs, reaction conditions, and a scheme for analyzing the results were selected.
At present, the best-studied genes, UL55(gB), UL73(gN), UL74(gO), and UL144 (TNFR), are used by researchers abroad as potential epidemiological markers for differentiating clinical CMV isolates. The frequency of CMV genotypes differs between geographic regions worldwide and depends on the cohort examined. It has been established that the gB2 genotype prevails in the HIV-infected group, while the gB1 and gN3a genotypes are encountered more often in those who have undergone organ transplantation, and the gB1, gB2, gN4c, and gN4a genotypes dominate among children with congenital CMV infection [14,21-24].
The selected genotyping parameters and the applied NGS technology allowed us to determine that the gB2 and gN4c CMV genotypes prevailed in clinical samples collected from solid organ recipients. In some cases, NGS made it possible to identify CMV infection caused by an association of two or three CMV genotypes.
The data obtained show that NGS technology enables a simultaneous search for the entire spectrum of CMV genotypes present in one sample and identification of the genotype structure of the CMV population typical of a region. Such investigations are necessary for examining people in CMV risk groups, including children in their first years of life and patients after organ transplantation. Moreover, as mentioned above, CMV infection caused by an association of several CMV genotypes may have a more severe course and require more time for virus elimination.
Studies of the genetic diversity of CMV are needed to obtain new knowledge on the prevalence of its gene variants in the population, to improve the quality of CMV infection diagnosis, and to manage risk groups effectively.

Conclusion
Application of NGS technology for studying the genetic diversity of cytomegalovirus makes it possible to optimize molecular monitoring of the causative agent of cytomegalovirus infection, dynamically monitor the risk groups (pregnant women, newborns, children in the first year of life, and patients who have undergone solid organ transplantation), predict the epidemiological situation for cytomegalovirus infection, and improve the system of epidemiological surveillance of infections in general. Data on the genotypes of the circulating cytomegalovirus provide objective information about the genotype structure of the CMV population in the region, which opens new perspectives for the development of vaccines and immunobiological preparations.
Conflicts of interest. The authors have no conflicts of interest to declare.