Close printable page

Recommendation

Improved population genetics parameters through control for microsatellite stuttering

Michael Lattorff based on reviews by Thibaut Malausa, Fabien Halkett and Thierry Rigaud

A recommendation of:

A simple procedure to detect, test for the presence of stuttering, and cure stuttered data with spreadsheet programs

Thierry de Meeûs and Camille Noûs (2022), Zenodo, ver. 5 peer-reviewed and recommended by Peer Community in Zoology https://doi.org/10.5281/zenodo.7029324

Read preprint in preprint server Now published in Peer Community Journal

Data used for results

Scripts used to obtain or analyze results

Abstract

EN

AR

ES

FR

HI

JA

PT

RU

ZH-CN

A simple procedure to detect, test for the presence of stuttering, and cure stuttered data with spreadsheet programs

Microsatellite are powerful markers for empirical population genetics, but may be affected by amplification problems like stuttering that produces heterozygote deficits between alleles with one repeat difference. In this paper, we present a simple procedure that aims at detecting stuttering for each locus overall subsamples and only requires the use of a spreadsheet interactive application on any operating system. We compare the performances of this procedure with the one of MicroChecker on simulations of dioecious pangamic populations, monoecious selfing populations and clonal populations with or without stuttering, and on real data of vectors and parasites. We also propose a cure for loci affected and compare the results with those expected without stuttering. In sexual populations (dioecious or selfers), the new procedure appeared more than three times more efficient than MicroChecker. Cure was able to restore Wright's FIS of stuttered data to the expected value, and particularly so in selfing simulations. In clones, lack of segregation artificially increased false stuttering detection, and only highly significant stuttering tests and loci strongly deviating from others, could be usefully cured, in which case FIS estimate could be much improved. In doubt, and whenever possible, removal of affected and not curable loci may help to shift population genetics parameter estimates towards more reliable values.

Microsatellite markers; Short tandem repeats (STRs); Simple sequence repeats (SSRs); Polymerase chain reaction (PCR); Genotyping errors

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

إجراء بسيط لاكتشاف التأتأة واختبار وجودها وعلاج البيانات المتلعثمة باستخدام برامج جداول البيانات

تعد الأقمار الصناعية الدقيقة علامات قوية لعلم الوراثة السكانية التجريبية، ولكنها قد تتأثر بمشاكل التضخيم مثل التأتأة التي تنتج عجزًا في الزيجوت المتغاير بين الأليلات مع اختلاف متكرر واحد. في هذا البحث، نقدم إجراءً بسيطًا يهدف إلى اكتشاف التأتأة لكل موقع في العينات الفرعية الشاملة ويتطلب فقط استخدام تطبيق جدول بيانات تفاعلي على أي نظام تشغيل. قمنا بمقارنة أداء هذا الإجراء مع أداء MicroChecker في محاكاة المجموعات ثنائية المسكن، والمجموعات السكانية أحادية المسكن، والمجموعات النسيلية مع أو بدون تأتأة، وعلى البيانات الحقيقية للنواقل والطفيليات. ونقترح أيضًا علاجًا للمواقع المتضررة ومقارنة النتائج مع تلك المتوقعة دون التأتأة. في المجموعات الجنسية (ثنائية المسكن أو الذاتية)، بدا الإجراء الجديد أكثر كفاءة بثلاث مرات من MicroChecker. كان كيور قادرًا على استعادة بيانات رايت FIS المتعثرة إلى القيمة المتوقعة، وخاصة في عمليات المحاكاة الذاتية. في الحيوانات المستنسخة، أدى الافتقار إلى الفصل إلى زيادة اكتشاف التلعثم الكاذب بشكل مصطنع، ولا يمكن علاج سوى اختبارات التلعثم المهمة للغاية والمواضع التي تنحرف بقوة عن الآخرين، وفي هذه الحالة يمكن تحسين تقدير FIS كثيرًا. في حالة الشك، وكلما أمكن، قد تساعد إزالة المواقع المصابة وغير القابلة للعلاج في تحويل تقديرات المعلمات الوراثية السكانية نحو قيم أكثر موثوقية.

علامات الأقمار الصناعية الصغيرة. التكرارات الترادفية القصيرة (STRs)؛ تكرار التسلسل البسيط (SSRs)؛ تفاعل البلمرة المتسلسل (PCR) ؛ أخطاء التنميط الجيني

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Un procedimiento sencillo para detectar, probar la presencia de tartamudeo y curar datos tartamudos con programas de hoja de cálculo

Los microsatélites son marcadores potentes de la genética de poblaciones empírica, pero pueden verse afectados por problemas de amplificación como la tartamudez que produce déficits de heterocigotos entre alelos con una diferencia repetida. En este artículo, presentamos un procedimiento simple que tiene como objetivo detectar la tartamudez para cada submuestra general de locus y solo requiere el uso de una aplicación interactiva de hoja de cálculo en cualquier sistema operativo. Comparamos el rendimiento de este procedimiento con el de MicroChecker en simulaciones de poblaciones pangámicas dioicas, poblaciones autofecundadas monoicas y poblaciones clonales con o sin tartamudeo, y en datos reales de vectores y parásitos. También proponemos una cura para los loci afectados y comparamos los resultados con los esperados sin tartamudez. En poblaciones sexuales (dioicas o autónomas), el nuevo procedimiento parecía más de tres veces más eficiente que MicroChecker. Cure pudo restaurar el FIS de Wright de datos tartamudos al valor esperado, y particularmente en simulaciones autofecundas. En los clones, la falta de segregación aumentó artificialmente la detección de tartamudez falsa, y sólo las pruebas de tartamudez muy significativas y los loci que se desvían fuertemente de otros podrían curarse de manera útil, en cuyo caso la estimación FIS podría mejorarse mucho. En caso de duda, y siempre que sea posible, la eliminación de los loci afectados y no curables puede ayudar a cambiar las estimaciones de los parámetros genéticos de poblaciones hacia valores más fiables.

Marcadores de microsatélites; Repeticiones cortas en tándem (STR); Repeticiones de secuencia simple (SSR); Reacción en cadena de la polimerasa (PCR); Errores de genotipado

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Une procédure simple pour détecter, tester la présence du bégaiement et corriger les données saccadées avec des tableurs

Les microsatellites sont de puissants marqueurs de la génétique empirique des populations, mais peuvent être affectés par des problèmes d'amplification tels que le bégaiement qui produit des déficits hétérozygotes entre allèles avec une différence de répétition. Dans cet article, nous présentons une procédure simple qui vise à détecter le bégaiement pour chaque sous-échantillon global de locus et nécessite uniquement l'utilisation d'un tableur interactif sur n'importe quel système d'exploitation. Nous comparons les performances de cette procédure avec celle de MicroChecker sur des simulations de populations pangamiques dioïques, de populations monoïques autofécondées et de populations clonales avec ou sans bégaiement, et sur des données réelles de vecteurs et de parasites. Nous proposons également un remède aux loci touchés et comparons les résultats avec ceux attendus sans bégaiement. Dans les populations sexuelles (dioïques ou selfers), la nouvelle procédure s'est révélée plus de trois fois plus efficace que MicroChecker. Cure a pu restaurer le FIS de Wright des données saccadées à la valeur attendue, et particulièrement dans les simulations d'autofécondation. Chez les clones, le manque de ségrégation augmentait artificiellement la détection du faux bégaiement, et seuls les tests de bégaiement très significatifs et les locus fortement déviants des autres pourraient être utilement guéris, auquel cas l'estimation du FIS pourrait être considérablement améliorée. En cas de doute, et lorsque cela est possible, la suppression des loci affectés et non curables peut aider à déplacer les estimations des paramètres génétiques des populations vers des valeurs plus fiables.

Marqueurs microsatellites ; Répétitions courtes en tandem (STR) ; Répétitions de séquences simples (SSR) ; Réaction en chaîne par polymérase (PCR) ; Erreurs de génotypage

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

स्प्रेडशीट प्रोग्राम के साथ हकलाने वाले डेटा का पता लगाने, उसकी उपस्थिति का परीक्षण करने और हकलाने वाले डेटा को ठीक करने की एक सरल प्रक्रिया

माइक्रोसैटेलाइट अनुभवजन्य जनसंख्या आनुवंशिकी के लिए शक्तिशाली मार्कर हैं, लेकिन हकलाना जैसी प्रवर्धन समस्याओं से प्रभावित हो सकते हैं जो एक दोहराव अंतर के साथ एलील्स के बीच हेटेरोज़ीगोट घाटे का उत्पादन करते हैं। इस पेपर में, हम एक सरल प्रक्रिया प्रस्तुत करते हैं जिसका उद्देश्य प्रत्येक लोकस के समग्र उप-नमूनों के लिए हकलाना का पता लगाना है और इसके लिए केवल किसी भी ऑपरेटिंग सिस्टम पर स्प्रेडशीट इंटरैक्टिव एप्लिकेशन के उपयोग की आवश्यकता होती है। हम इस प्रक्रिया के प्रदर्शन की तुलना डायोसियस पैंगामिक आबादी, मोनोइकियस सेल्फिंग आबादी और हकलाने के साथ या बिना क्लोनल आबादी के सिमुलेशन और वैक्टर और परजीवियों के वास्तविक डेटा पर माइक्रोचेकर के साथ करते हैं। हम प्रभावित लोकी के लिए एक इलाज का भी प्रस्ताव करते हैं और हकलाने के बिना अपेक्षित परिणामों के साथ परिणामों की तुलना करते हैं। यौन आबादी (डायोसियस या सेल्फर्स) में, नई प्रक्रिया माइक्रोचेकर की तुलना में तीन गुना अधिक कुशल दिखाई दी। क्योर राइट के अटके हुए डेटा के एफआईएस को अपेक्षित मूल्य पर पुनर्स्थापित करने में सक्षम था, और विशेष रूप से सेल्फिंग सिमुलेशन में। क्लोनों में, पृथक्करण की कमी ने कृत्रिम रूप से झूठी हकलाहट का पता लगाने में वृद्धि की, और केवल अत्यधिक महत्वपूर्ण हकलाना परीक्षण और दूसरों से दृढ़ता से भटकने वाले लोकी को उपयोगी रूप से ठीक किया जा सकता है, इस मामले में एफआईएस अनुमान में काफी सुधार किया जा सकता है। संदेह में, और जब भी संभव हो, प्रभावित और इलाज योग्य नहीं लोकी को हटाने से जनसंख्या आनुवंशिकी पैरामीटर अनुमानों को अधिक विश्वसनीय मूल्यों की ओर स्थानांतरित करने में मदद मिल सकती है।

माइक्रोसैटेलाइट मार्कर; लघु अग्रानुक्रम दोहराव (एसटीआर); सरल अनुक्रम दोहराव (एसएसआर); पॉलिमरेज़ चेन रिएक्शन (पीसीआर); जीनोटाइपिंग त्रुटियाँ

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

スプレッドシートプログラムを使用して、途切れの有無を検出、テストし、途切れのあるデータを修正する簡単な手順

マイクロサテライトは、経験的な集団遺伝学の強力なマーカーですが、反復が 1 つ異なる対立遺伝子間にヘテロ接合体の欠損を引き起こす吃音などの増幅問題の影響を受ける可能性があります。この論文では、各軌跡全体のサブサンプルの吃音を検出することを目的とした簡単な手順を紹介します。この手順では、オペレーティングシステム上でスプレッドシートの対話型アプリケーションを使用するだけで済みます。この手順のパフォーマンスを、雌雄異株のパンガミック集団、雌雄異株の自家受粉集団、吃音の有無にかかわらずクローン集団のシミュレーション、およびベクターと寄生虫の実際のデータで MicroChecker のパフォーマンスと比較します。また、影響を受けた遺伝子座の治療法を提案し、その結果を吃音なしで期待される結果と比較します。性的集団（雌雄異株または自家雄）では、新しい手順は MicroChecker よりも 3 倍以上効率的であることがわかりました。 Cure は、特にセルフフィングシミュレーションにおいて、途切れたデータの Wright の FIS を期待値に復元することができました。クローンでは、分離の欠如により誤った吃音検出が人為的に増加し、非常に重要な吃音テストおよび他から大きく逸脱した遺伝子座のみが有効に修復でき、その場合、FIS 推定値は大幅に改善される可能性があります。疑わしい場合は、可能であればいつでも、影響を受けている治癒不可能な遺伝子座を削除すると、集団遺伝学パラメーターの推定値をより信頼できる値にシフトするのに役立つ可能性があります。

マイクロサテライトマーカー。ショートタンデムリピート（STR）。単純配列反復 (SSR)。ポリメラーゼ連鎖反応 (PCR);ジェノタイピングエラー

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Um procedimento simples para detectar, testar a presença de gagueira e corrigir dados gaguejados com programas de planilhas

Os microssatélites são marcadores poderosos para a genética empírica de populações, mas podem ser afetados por problemas de amplificação, como a gagueira, que produz déficits heterozigotos entre alelos com uma diferença de repetição. Neste artigo, apresentamos um procedimento simples que visa detectar gagueira para cada subamostra geral de locus e requer apenas o uso de um aplicativo de planilha interativa em qualquer sistema operacional. Comparamos o desempenho deste procedimento com o do MicroChecker em simulações de populações pangâmicas dióicas, populações monóicas de autofecundação e populações clonais com ou sem gagueira, e em dados reais de vetores e parasitas. Também propomos uma cura para os loci afetados e comparamos os resultados com os esperados sem gagueira. Em populações sexuadas (dióicas ou autônomas), o novo procedimento pareceu mais de três vezes mais eficiente que o MicroChecker. Cure foi capaz de restaurar o FIS de dados interrompidos de Wright para o valor esperado, e particularmente em simulações de autofecundação. Nos clones, a falta de segregação aumentou artificialmente a detecção de falsa gagueira, e apenas testes de gagueira altamente significativos e loci fortemente desviantes de outros poderiam ser curados de forma útil, caso em que a estimativa do FIS poderia ser muito melhorada. Na dúvida, e sempre que possível, a remoção de loci afetados e não curáveis pode ajudar a mudar as estimativas dos parâmetros genéticos populacionais para valores mais confiáveis.

Marcadores microssatélites; Repetições curtas em tandem (STRs); Repetições de sequência simples (SSRs); Reação em cadeia da polimerase (PCR); Erros de genotipagem

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Простая процедура обнаружения, проверки наличия заиканий и устранения заиканий данных с помощью программ электронных таблиц.

Микросателлиты являются мощными маркерами для эмпирической популяционной генетики, но на них могут влиять проблемы амплификации, такие как заикание, которое приводит к дефициту гетерозигот между аллелями с одной разницей в повторах. В этой статье мы представляем простую процедуру, которая направлена на обнаружение заиканий для каждой подвыборки локуса и требует только использования интерактивного приложения для работы с электронными таблицами в любой операционной системе. Мы сравниваем эффективность этой процедуры с эффективностью MicroChecker на моделировании двудомных пангамных популяций, однодомных самоопыляющихся популяций и клональных популяций с заиканием или без него, а также на реальных данных о переносчиках и паразитах. Мы также предлагаем лечение пораженных локусов и сравниваем результаты с ожидаемыми без заикания. В половых популяциях (раздельнополых или самостоятельных) новая процедура оказалась более чем в три раза эффективнее, чем MicroChecker. Cure удалось восстановить FIS Райта для заикающихся данных до ожидаемого значения, особенно в симуляциях самообучения. В клонах отсутствие сегрегации искусственно увеличивало ложное обнаружение заикания, и только очень значимые тесты на заикание и локусы, сильно отклоняющиеся от других, могли быть успешно вылечены, и в этом случае оценка FIS могла быть значительно улучшена. Сомнительно, и когда это возможно, удаление пораженных и неизлечимых локусов может помочь сдвинуть оценки параметров популяционной генетики в сторону более надежных значений.

Микросателлитные маркеры; Короткие тандемные повторы (STR); Простые повторы последовательностей (SSR); Полимеразная цепная реакция (ПЦР); Ошибки генотипирования

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

一个简单的程序，用于检测、测试是否存在口吃，并使用电子表格程序修复口吃数据

微卫星是经验群体遗传学的强大标记，但可能会受到诸如口吃等扩增问题的影响，口吃会在具有一个重复差异的等位基因之间产生杂合子缺陷。在本文中，我们提出了一个简单的程序，旨在检测每个位点整体子样本的口吃，并且只需要在任何操作系统上使用电子表格交互式应用程序。我们将这一程序与 MicroChecker 的性能进行比较，模拟雌雄异株的泛性种群、雌雄同株的自交种群和有或没有口吃的克隆种群，以及载体和寄生虫的真实数据。我们还提出了治疗受影响位点的方法，并将结果与预期的不口吃的结果进行比较。在有性群体（雌雄异体或自交）中，新程序的效率比 MicroChecker 高出三倍多。 Cure 能够将断断续续的数据的赖特 FIS 恢复到预期值，尤其是在自交模拟中。在克隆中，缺乏分离人为地增加了假口吃检测，并且只有高度显着的口吃测试和与其他基因座严重偏离的基因座才能被有效地治愈，在这种情况下，FIS 估计可以得到很大的改善。如有疑问，只要有可能，去除受影响的和不可治愈的位点可能有助于将群体遗传参数估计值转向更可靠的值。

微卫星标记；短串联重复序列（STR）；简单重复序列（SSR）；聚合酶链式反应（PCR）；基因分型错误

Submission: posted 06 December 2021
Recommendation: posted 23 August 2022, validated 28 August 2022

Cite this recommendation as:
Lattorff, M. (2022) Improved population genetics parameters through control for microsatellite stuttering. Peer Community in Zoology, 100016. https://doi.org/10.24072/pci.zool.100016

Recommendation

Molecular markers have drastically changed and improved our understanding of biological processes. In combination with PCR, markers revolutionized the study of all organisms, even tiny insects, and eukaryotic pathogens amongst others. Microsatellite markers were the most prominent and successful ones. Their success started in the early 1990s. They were used for population genetic studies, mapping of genes and genomes, and paternity testing and inference of relatedness. Their popularity is based on some of their characteristics as codominance, the high polymorphism information content, and their ease of isolation (Schlötterer 2004). Still, microsatellites are the marker of choice for a range of non-model organisms as next-generation sequencing technologies produce a huge amount of single nucleotide polymorphisms (SNPs), but often at expense of sample size and higher costs.

The high level of polymorphism of microsatellite markers, which consist of one to six base-pair nucleotide motifs replicated up to 10 or 20 times, results from slippage events during DNA replication. Short hairpin loops might shorten the template strand or extend the new strand. However, such slippage events might occur during PCR amplification resulting in additional bands or peaks. Such stutter alleles often appear to differ by one repeat unit and might be hard to interpret but definitively reduce automated scoring of microsatellite results.

A standalone software package available to handle stuttering is Microchecker (van Oosterhout et al., 2004, which nowadays faces incompatibilities with updated versions of different operating systems. Thus, de Meeûs and Noûs (2022), in their manuscript, tackled the stuttering issue by developing an OS-independent analysis pipeline based on standard spreadsheet software such as Microsoft Office (Excel) or Apache Open Office (Calc). The authors use simulated populations differing in the mating system (pangamic, selfing (30%), clonal) and a different number of subpopulations and individuals per subpopulation to test for differences among the null model (no stuttering), a test population with 2 out of 20 loci (10%) with stuttering, and the latter with stuttering cured. Further to this, the authors also re-analyse data from previous studies utilising organisms differing in the mating system to understand whether control of stuttering changes major parameter estimates and conclusions of those studies.

Stuttering of microsatellite loci might result in increased heterozygote deficits. The authors utilise the FIS (inbreeding coefficient) as a tool to compare the different treatments of the simulated populations. Their method detected stuttering in pangamic and selfing populations, while the detection of stuttering in clonal organisms is more difficult. The cure for stuttering resulted in FIS values similar to those populations lacking stuttering. The re-analysis of four previously published studies indicated that the new method presented here is more accurate than Microchecker (van Oosterhout et al., 2004) in a direct comparison. For the Lyme disease-transmitting tick Ixodes scapularis (De Meeûs et al., 2021), three loci showed stuttering and curing these resulted in data that are in good agreement with pangamic reproduction. In the tsetse fly Glossina palpalis palpalis (Berté et al., 2019), two out of seven loci were detected as stuttering. Curing them resulted in decreased FIS for one locus, while the other showed an increased FIS, an indication of other problems such as the occurrence of null alleles. Overall, in dioecious pangamic populations, the method works well, and the cure of stuttering improves population genetic parameter estimates, although FST and FIS might be slightly overestimated. In monoecious selfers, the detection and cure work well, if other factors such as null alleles do not interfere. In clonal organisms, only loci with extremely high FIS might need a cure to improve parameter estimates.

This spreadsheet-based method helps to automate microsatellite analysis at very low costs and thus improves the accuracy of parameter estimates. This might certainly be very useful for a range of non-model organisms, parasites, and their vectors, for which microsatellites are still the marker of choice.

References

Berté D, De Meeus T, Kaba D, Séré M, Djohan V, Courtin F, N'Djetchi KM, Koffi M, Jamonneau V, Ta BTD, Solano P, N’Goran EK, Ravel S (2019) Population genetics of Glossina palpalis palpalis in sleeping sickness foci of Côte d'Ivoire before and after vector control. Infection Genetics and Evolution 75, 103963. https://doi.org/0.1016/j.meegid.2019.103963

de Meeûs T, Chan CT, Ludwig JM, Tsao JI, Patel J, Bhagatwala J, Beati L (2021) Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the U.S.A. Peer Community Journal 1, e40. https://doi.org/10.24072/pcjournal.34

de Meeûs T, Noûs C (2022) A simple procedure to detect, test for the presence of stuttering, and cure stuttered data with spreadsheet programs. Zenodo, v5, peer-reviewed and recommended by PCI Zoology. https://doi.org/10.5281/zenodo.7029324

Schlötterer C (2004) The evolution of molecular markers - just a matter of fashion? Nature Reviews Genetics 5, 63-69. https://doi.org/10.1038/nrg1249

van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P (2004) MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Molecular Ecology Notes 4, 535-538. https://doi.org/10.1111/j.1471-8286.2004.00684.x

PDF recommendation

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
This work was financed by ongoing funding of the IRD

Reviews

Evaluation round #2

DOI or URL of the preprint: https://zenodo.org/record/6822544

Version of the preprint: v2

Author's Reply, 18 Aug 2022

Download author's reply https://doi.org/10.24072/pci.zool.100122.ar2

Decision by Michael Lattorff, posted 15 Aug 2022

Dear Authors, the two reviewers of the manuscript during the first round of review are quite convinced about your revision and have just minor issues to be corrected. However, a third reviewer who was not available during the first round of review ios also quite convinced but has two issues that could be addressed or at least commented on, 1. the integration of analysis into R incl. an example for a binomial test on two alleles, and 2. a concern related to your simulations and conclusions related to strictly clonal populations. I would be grateful if you could address these smaller issues.

Kind regards

Michael Lattorff

https://doi.org/10.24072/pci.zool.100122.d2

Reviewed by Thibaut Malausa, 05 Aug 2022

The authors have substantially modified the manuscript. I found the updated version, which is significantly shorter, much easier to read. A few adjustments in the abstract, introduction and conclusion also better highlight the strengths of the proposed tool.

The authors have also very carefully taken into account the comments from the two reviewers and I found their responses to the several questions or criticisms convincing.

I just recommend the authors to check the new text parts for minor issues (e.g. line 672 of the version with track change: "occut" should be replaced by "occur"; "need being" is also written several times and I suppose it could be replaced by "need to be" or "will have to be", etc.).

https://doi.org/10.24072/pci.zool.100122.rev21

Reviewed by Thierry Rigaud, 13 Jul 2022

I have no further comment. The paper has been improved and made more compact. It has therefore gained in efficiency.

https://doi.org/10.24072/pci.zool.100122.rev22

Reviewed by Fabien Halkett, 26 Jul 2022

This manuscript by Dr. T. De Meeûs presents and details a new implementation to test for stuttering pattern among microsatellite locus.

This manuscript is a revision. I did not take part in the first round of review. I have read the revised file and found it to be worthy of recommendation as it addresses an important topic for empirical population genetic studies and is a valuable contribution with the proposed methodology. I have also read the author’s response letter and found that the author has convincingly addressed all the concerns raised by the previous reviewers (Dr Thibaut Malausa and Dr. Thierry Rigaud). I do, however, have a few additional comments and suggestions that might help to further improve the article.

Both reviewers and the authors agree to state that microsatellites are (still) powerful markers to conduct sound population genetic study. One of the strength of microsatellite lies especially in the amount of information we have on the putative bias linked to empirical data acquisition and how to correct for these biases in appropriately curing the data. This curating step (according to relevant procedures) it is at utmost importance in order to perform accurate inference in population genetics. The opinions of the two reviewers diverge on how the method proposed can be applied to genomic data. I agree that it is important that the proposed method be versatile enough to be applied to a handful of microsat markers analyzed on an Excel sheet but also to a bio-informatic pipeline of NGS derived microsat typing. I was curious to test this procedure on my own data set, and even if I handled a reduced dataset I found more convenient to translate the procedure into the R language. I am not a skillfull R programmer, but I manage to write down a small script that do the job quite well, I think. It is convenient to use, as a starting point of this analyses, the output file generated by Genepop that summarize basic information, including the table of genotype frequencies, in the form:

Genotypes Obs. Expected

148 , 148 3 2.1801

151 , 148 1 2.1801

151 , 151 2 0.4845

Then in few lines on R it is possible to conduct the binomial test proposed by Dr. T. De Meeûs on the class of heterozygote genotypes that possess two alleles differing by only one repeat.

geno_table <- read.table("stutter_test.txt", header = T)

geno_table$diff = geno_table$geno1 - geno_table$geno2

range_repeats <- unique(geno_table$diff)

Obs <- vector(mode ="numeric", length = length(range_repeats))

Exp <- vector(mode ="numeric", length = length(range_repeats))

for (i in range_repeats) {

Obs[i+1] <- sum(geno_table[which(geno_table$diff==i),]$obs)

Exp[i+1] <- sum(geno_table[which(geno_table$diff==i),]$exp)

}

binom.test(c(round(Exp[2],0), Obs[2]), n=2, p=0.5)

May be it is possible to expand on it (it misses the loop on all locus across populations), but it could worth at least including these few lines in an appendix to the article. Hopefully, someone in the PCI forum will propose a R package but I am not sure it is mandatory for the use of this method in R.

My second main comment is more scientific. The main strength of the study by Dr. T. De Meeûs is to test the application of his method on simulated data generated after different reproductive modes. The method works very well for selfing, and it is clearly stated. The case of clonality is more puzzling, and I agree that even in the revised version it is still quite hard to follow. It could be worth stating in the introduction that this reproductive mode magnifies the effect of genetic drift due to the lack of segregation (Stoeckel & Masson, 2014; Reichel et al., 2016). This generate a large variability across loci that is indeed a pitfall for this stuttering test. With this respect, it is important to note that the author test the case of full clonal populations. The variability across loci is even magnified when considering mixture of clonal and sexual reproduction (clonal rate of 0.9 and above, Stoeckel & Masson 2014; Balloux et al., 2003). I am thus not convinced with the statement lines 632-634 (document with revision marks): “In partial clones, and given the lack of accuracy of the expected number of heterozygotes with one repeat difference, using the panmictic expectations will probably display better performances.” I think it is important for practical considerations that the author comment on that point or provide other simulation results.

For minor points see the annotated pdf that I upload with the review.

References cited

Reichel, K., Masson, J.-P., Malrieu, F., Arnaud-Haond, S., & Stoeckel, S. (2016). Rare sex or out of reach equilibrium? The dynamics of F IS in partially clonal organisms. BMC Genetics, 17(1). doi: 10.1186/s12863-016-0388-z

Stoeckel, S., & Masson, J.-P. (2014). The Exact Distributions of FIS under Partial Asexuality in Small Finite Populations with Mutation. PLoS ONE, 9(1), e85228. doi: 10.1371/journal.pone.0085228

Balloux, F., Lehmann, L., & Meeûs, T. de. (2003). The Population Genetics of Clonal and Partially Clonal Diploids. Genetics, 164(4), 1635–1644.

Download the review https://doi.org/10.24072/pci.zool.100122.rev23

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.5281/zenodo.5761550

Author's Reply, 12 Jul 2022

Download author's reply Download tracked changes file https://doi.org/10.24072/pci.zool.100122.ar1

Decision by Michael Lattorff, posted 23 Jun 2022

Dear authors

we have now received two reviews on your manuscript. Both are positive, although one of the reviewers has some reservations. Please read the reviews carefully and refer to the points raised in your revision. Overall, both reviewers recommend shortening the text and bring out the added value.

We are looking forwward to receive your revcised version of the manuscript.

Kind regards,

Michael

https://doi.org/10.24072/pci.zool.100122.d1

Reviewed by Thibaut Malausa, 05 Apr 2022

This article presents and tests a method for the detection of stutters in microsatellite genotyping projects. It also benchmarks this method and a data correction method proposed by de Meeûs et al. 2019 using several datasets from several case studies.

The proposed detection method is exposed in details and the technical choices at each step of the design are well justified. The added value of the proposed tool is documented and this method represents a decent alternative to the software Microchecker developed in 2003 and proving more and more incompatible with most systems. As such, this method deserves publication and can be of interest for teams using microsatellites.

However, as reader and possible user of such method, I must acknowledge that I am not convinced that I would use or recommend the proposed tools. I agree that microsatellite markers are still cost-efficient and informative for a wide range of projects (although the frequency of their use has decreased over years). However, next-generation sequencing has considerably facilitated the design of microsatellite markers, enabling to be very exigent in terms of markers’ quality without spending much money. Hence, I am not sure that tools for stutter management are currently a top priority for most research teams. The other argument limiting my enthusiasm is related to the apparent ergonomy and extent of added-value of the tool. The advantage of the method is its simplicity and portability over operating systems. However, it does not seem straightforward to automatise it or incorporate it in a high-throughput workflow. If I am mistaken on this point, I would recommend to be more explicit in the ms about the possibilities of use of the tool. In terms of added-value, I found that overall the ms did not underline a large added-value provided by its use.

In terms of form of the article, I also think that the text could be much shortened to convert the ms into a short and smooth methodological paper. In my opinion, some elements of contexts and some detailed results and interpretations provide limited added-value while considerably complexifying the reading. Also, the text would probably be easier to read with a section pooling “Results & Discussion”: adding a small interpretation and “take-home message” after each section of the results would likely lead to little or no loss when compared to the current discussion, and it would avoid many repetitions and statements that are too far from the result section to be easily understood.

Finally, I have a series of remarks, suggestions and comments, listed below:

· L20-21: This statement does not look self-explanatory. This may be inserted and explained in the introduction but I would suggest to remove it from the abstract.

· L27-32: I found this summary of the results hard to follow. The statements “work well”, “not perfectly”, “improve parameters” or “behaviour of their variations” sound fuzzy. Referring to more detailed results (comparisons of a set of criteria or indices used to benchmark the methods) may be more informative and clear. I would suggest to re-write the second part of the abstract.

· L41-42: I see reasons why microsatellites are useful for non-model species, but why are they particularly useful for small organisms and vectors? This may be explained here.

· L43 : represent

· L48: kinds?

· L50-60: I wonder to what extent it is useful to make a focus on SNPs here. It kind of breaks the train of ideas while providing little added-value (the article is not really about comparing SSR to SNP and sequencing).

· L56-60: The idea is clear but this sentence may be reorganized to be easier to read.

· L60-65: I still do not understand why this is particularly true for parasitic organisms and their vectors. Many laboratories on many topics lack resources to carry out large genotyping projects. I would say that the constraint might be lower for laboratories working on species with high economic or health impact, so this would go against the statement in this sentence.

· L66: On ? (is it correct to write « Polymerase Chain Reaction of the targeted DNA strand”?)

· L83-88: I think it is not necessary to provide so many details. Stating that “Microchecker was developed in 2003 and displays incompatibility issues with most current systems” is sufficient to convince the readers.

· L114 : 10,000

· L120 : simulation

· L120 : 10,000

· L121-122: Why 20? Whenever possible, people usually try to genotype 30-40 samples to get more reliable estimates of population genetics indices.

· L134 : Identity

· L136-137: What are the reasons behind this choice?

· L138-139: When less than 20 alleles were present, how many loci were recoded? 2 or 10%?

· L139-140: This is probably not very important here, but in my experience, this is generally not the most realistic option. In general, stutters affect more than 10% of the alleles and sizes of stutters increase as the fragment size increases. Hence, most of the largest alleles are generally affected.

· L148: I do not think it is indispensable to keep the formula in French. Most readers are not interested in the formula in French and French readers will easily translate the formula.

· L158 : 10,000

· L197: Same remark as before: I do not see the interest of providing the formula in French

· L203-204: Same remark as before: I do not see the interest of providing the formula in French

· L209: Same remark as before: I do not see the interest of providing the formula in French

· L214: Same remark as before: I do not see the interest of providing the formula in French

· L220: Same remark as before: I do not see the interest of providing the formula in French

· L226-227: Same remark as before: I do not see the interest of providing the formula in French

· L297: At this stage, the reader can hardly remember what is BH. I suggest to be explicit. It will not increase much the size of the text and will be easier to read.

· L327: What is SAD?

· L331: Is it really obvious for the reader at this stage?

· L359: I suggest to remove “The performance of” as it implicitly sates that detecting stuttering is positive/successful while it is not the case under H0.

· L360: Placing comas around “respectively” may facilitate the reading

· L361: Reminding the H0 would help the reader here

· L363-364: See last comment: it would be better to be explicit earlier in the paragraph

· L370-372: This sounds already like a discussion. The previous paragraph was already clear and referring to Table 1 in this paragraph looks sufficient

· L382-383: Replace « the power to detect stuttering” by “stuttering detection”?

· L396: Again, the term « performance” seems to me misleading as it can be implicitly understood as a valuable property (while it is not under H0).

· L407: Replace « seemed” by “was”?

· L472-L486: I found this entire subsection “Clonal populations” is hard to follow. I think this comes from the first sentence that contains the statement “significant stuttering signature” that I do not find explicit; and at the end of the section I do not see as self-explanatory the fact that average FIS CI are consistent with the expectations

· L474-476: This second statement is relative to the H1 I suppose?

· L489-490 : This first sentence seems little useful

· L531-592: I find this section far too detailed. I would find it much more clear and convincing if it could be shorter with a focus on the qualitative/quantitative comparisons between the results obtained using the several methods.

· L540 : translate “Côte d’Ivoire”?

· L541: “authorS”?

· Discussion : in my opinion, this text would be much easier to read with a section pooling “Results & Discussion”. Overall, I think that adding a small interpretation and “take-home message” after each section of the results would result in little or no loss when compared to the current discussion, and it would avoid many repetitions and statements that are too far from the result section to be easily understood.

· L595: This statement is confusing (I also had the same feeling when reading the introduction): are we talking about the results under H0 (hence, this is a positive feature), under H1 (negative feature), or overall?

· L598: Reminding some results may be more convincing here

· L685: Who is JBR ?

https://doi.org/10.24072/pci.zool.100122.rev11

Reviewed by Thierry Rigaud, 06 May 2022

Despite the rise of NGS and genomics, microsatellite genetic markers remain useful in population genetics studies. Due to their long-standing use, we now have an excellent perspective on the advantages and disadvantages of their use. One caveat of their usage is stuttering during the amplification process, which produces artificially alleles with one repeat difference, generating artificial heterozygote deficits in population genetics studies. This paper proposes a new method to detect stuttering in microsatellite data. This paper is an important contribution to the field because, as well described in the text, this new method is more efficient in most cases than the only tool presently available (MicroChecker). This study is therefore useful and timely.

The paper present the method, compare its efficiency with MicroShecker, but also examine consequences on various F statistics. This is done by a combination of simulations and tests on real datasets. The paper provide results of simulations for exploring the detection of stuttering, and answer the question: is the new method improve detection? The answer is often “yes”, but high proportions of false stuttering detections were found in clonal organisms. Then the consequences of stuttering of F statistics in population genetics (FIS, FST) are explored. Interestingly, 10% stuttering induce significant deficit of heterozygotes, especially in populations with selfing, but stuttering do not influence occurrence of linkage disequilibrium between pairs of loci.

I found the paper well written (albeit sometimes a bit too detailed, see comments below) and methodology accurate. I only have few comments aiming to improve (I hope) the reading of the paper and the usage of the method. Please find below these comments:

1- My main comment is about the description of the alternative method, L. 190-260. Instead of the long and fastidious description of what and where paste the results from Fstat and formula, why not providing a template (or example) on an excel sheet as supplementary material? This would fit the paper tittle

2- The end of the first paragraph of the introduction is a kind of mix between different ideas economic impact of the non-model diseases / Constraints for developing genetic markers. It is a bit hard to follow. I would suggest rewriting and separate these two things.

3- L. 81. Please explain why a “global test might be more …/… robust”

4- L. 134: “identitiy” should be “identity”

5- L. 306-308, about curing data. I do not understand why this “otherwise” procedure is made here, but not in cases where alleles were less rare. In other words, why forcing the grouping with more than one repeat difference? Why not leave them as they are? (is it making a difference?)

6- L. 475-477. I think the sentence is not complete (the proportions increase, but why are they increasing?)

7- L. 532. I do not understand this beginning of sentence since the following sentence show that there are differences.

8- L. 557-561 (and in other examples). All these list of alleles are very specific. Can you please recall here the reference of where to find the data?

9- L. 640. Change “tried out” by “considered” (?)

10- L. 675. Remove “;”

https://doi.org/10.24072/pci.zool.100122.rev12