Improved population genetics parameters through control for microsatellite stuttering
A simple procedure to detect, test for the presence of stuttering, and cure stuttered data with spreadsheet programs
Recommendation: posted 23 August 2022, validated 28 August 2022
Molecular markers have drastically changed and improved our understanding of biological processes. In combination with PCR, markers revolutionized the study of all organisms, even tiny insects, and eukaryotic pathogens amongst others. Microsatellite markers were the most prominent and successful ones. Their success started in the early 1990s. They were used for population genetic studies, mapping of genes and genomes, paternity testing, and inference of relatedness. Their popularity is based on characteristics such as codominance, high polymorphism information content, and ease of isolation (Schlötterer 2004). Microsatellites remain the marker of choice for a range of non-model organisms, because next-generation sequencing technologies produce a huge number of single nucleotide polymorphisms (SNPs), but often at the expense of sample size and at higher cost.
The high level of polymorphism of microsatellite markers, which consist of one to six base-pair nucleotide motifs replicated up to 10 or 20 times, results from slippage events during DNA replication. Short hairpin loops might shorten the template strand or extend the new strand. However, such slippage events might also occur during PCR amplification, resulting in additional bands or peaks. Such stutter alleles often appear to differ by one repeat unit; they can be hard to interpret and definitely hamper the automated scoring of microsatellite results.
A standalone software package available to handle stuttering is Microchecker (van Oosterhout et al., 2004), which nowadays faces incompatibilities with updated versions of different operating systems. Thus, de Meeûs and Noûs (2022), in their manuscript, tackled the stuttering issue by developing an OS-independent analysis pipeline based on standard spreadsheet software such as Microsoft Office (Excel) or Apache OpenOffice (Calc). The authors use simulated populations differing in the mating system (pangamic, selfing (30%), clonal) and in the number of subpopulations and individuals per subpopulation to test for differences among the null model (no stuttering), a test population with 2 out of 20 loci (10%) with stuttering, and the latter with stuttering cured. In addition, the authors re-analyse data from previous studies on organisms differing in the mating system to understand whether control of stuttering changes major parameter estimates and conclusions of those studies.
Stuttering of microsatellite loci might result in increased heterozygote deficits. The authors utilise the FIS (inbreeding coefficient) as a tool to compare the different treatments of the simulated populations. Their method detected stuttering in pangamic and selfing populations, while the detection of stuttering in clonal organisms is more difficult. The cure for stuttering resulted in FIS values similar to those populations lacking stuttering. The re-analysis of four previously published studies indicated that the new method presented here is more accurate than Microchecker (van Oosterhout et al., 2004) in a direct comparison. For the Lyme disease-transmitting tick Ixodes scapularis (De Meeûs et al., 2021), three loci showed stuttering and curing these resulted in data that are in good agreement with pangamic reproduction. In the tsetse fly Glossina palpalis palpalis (Berté et al., 2019), two out of seven loci were detected as stuttering. Curing them resulted in decreased FIS for one locus, while the other showed an increased FIS, an indication of other problems such as the occurrence of null alleles. Overall, in dioecious pangamic populations, the method works well, and the cure of stuttering improves population genetic parameter estimates, although FST and FIS might be slightly overestimated. In monoecious selfers, the detection and cure work well, if other factors such as null alleles do not interfere. In clonal organisms, only loci with extremely high FIS might need a cure to improve parameter estimates.
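To illustrate why stuttering can inflate FIS, here is a minimal Python sketch, not the authors' spreadsheet procedure. It uses a simple single-locus estimate, FIS = 1 - Ho/He, with He taken as an unbiased gene diversity; the function names `f_is` and `add_stutter` are hypothetical, and the mis-scoring rule (a heterozygote whose alleles differ by one repeat is read as a homozygote for the longer allele) is an assumption made only for this example.

```python
from collections import Counter


def f_is(genotypes):
    """Simple single-locus F_IS estimate: 1 - Ho/He.

    genotypes: list of (allele_a, allele_b) tuples, alleles given as repeat counts.
    He is an unbiased gene diversity; this is a simplification and not
    necessarily the estimator used in the recommended manuscript.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies over the 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    return 1 - ho / he


def add_stutter(genotypes):
    """Assumed mis-scoring rule (illustration only): heterozygotes whose
    alleles differ by exactly one repeat are scored as homozygous for the
    longer allele."""
    scored = []
    for a, b in genotypes:
        if abs(a - b) == 1:
            scored.append((max(a, b), max(a, b)))
        else:
            scored.append((a, b))
    return scored


# Nine individuals in Hardy-Weinberg proportions for three equally frequent alleles.
clean = [(10, 10), (11, 11), (12, 12),
         (10, 11), (10, 11), (10, 12), (10, 12), (11, 12), (11, 12)]
stuttered = add_stutter(clean)

print(f"F_IS without stuttering: {f_is(clean):.3f}")
print(f"F_IS with stuttering:    {f_is(stuttered):.3f}")
```

In this toy data set the mis-scored heterozygotes lower the observed heterozygosity much more than the expected one, so the estimate moves from close to zero to a strong heterozygote deficit.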
This spreadsheet-based method helps to automate microsatellite analysis at very low costs and thus improves the accuracy of parameter estimates. This might certainly be very useful for a range of non-model organisms, parasites, and their vectors, for which microsatellites are still the marker of choice.
Berté D, De Meeûs T, Kaba D, Séré M, Djohan V, Courtin F, N'Djetchi KM, Koffi M, Jamonneau V, Ta BTD, Solano P, N'Goran EK, Ravel S (2019) Population genetics of Glossina palpalis palpalis in sleeping sickness foci of Côte d'Ivoire before and after vector control. Infection Genetics and Evolution 75, 103963. https://doi.org/10.1016/j.meegid.2019.103963
de Meeûs T, Chan CT, Ludwig JM, Tsao JI, Patel J, Bhagatwala J, Beati L (2021) Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the U.S.A. Peer Community Journal 1, e40. https://doi.org/10.24072/pcjournal.34
de Meeûs T, Noûs C (2022) A simple procedure to detect, test for the presence of stuttering, and cure stuttered data with spreadsheet programs. Zenodo, v5, peer-reviewed and recommended by PCI Zoology. https://doi.org/10.5281/zenodo.7029324
Schlötterer C (2004) The evolution of molecular markers - just a matter of fashion? Nature Reviews Genetics 5, 63-69. https://doi.org/10.1038/nrg1249
van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P (2004) MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Molecular Ecology Notes 4, 535-538. https://doi.org/10.1111/j.1471-8286.2004.00684.x
Michael Lattorff (2022) Improved population genetics parameters through control for microsatellite stuttering. Peer Community in Zoology, 100016. https://doi.org/10.24072/pci.zool.100016
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.
Evaluation round #2
DOI or URL of the preprint: https://zenodo.org/record/6822544
Version of the preprint: v2
Author's Reply, 18 Aug 2022
Decision by Michael Lattorff, posted 15 Aug 2022
Dear Authors, the two reviewers of the manuscript during the first round of review are quite convinced by your revision and have just minor issues to be corrected. However, a third reviewer, who was not available during the first round of review, is also quite convinced but has two issues that could be addressed or at least commented on: 1. the integration of the analysis into R, including an example of a binomial test on two alleles, and 2. a concern about your simulations and conclusions for strictly clonal populations. I would be grateful if you could address these smaller issues.
Reviewed by Thibaut Malausa, 05 Aug 2022
Reviewed by Thierry Rigaud, 13 Jul 2022
Reviewed by Fabien Halkett, 26 Jul 2022
Evaluation round #1
DOI or URL of the preprint: https://doi.org/10.5281/zenodo.5761550
Author's Reply, 12 Jul 2022
Decision by Michael Lattorff, posted 23 Jun 2022
We have now received two reviews on your manuscript. Both are positive, although one of the reviewers has some reservations. Please read the reviews carefully and refer to the points raised in your revision. Overall, both reviewers recommend shortening the text and bringing out the added value.
We are looking forward to receiving your revised version of the manuscript.