Re-examination of genome imputation analysis of whole genome sequencing data
- Author(s)
- 서윤종 (Yoonjong Seo)
- Issued Date
- 2024
- Abstract
- Yoonjong Seo. Advisor: Prof. Jungsoo Gim, Ph.D. Department of Integrative Biological Sciences, Graduate School of Chosun University. Genome imputation is a standard procedure in genetic analysis for exploring associations between the genome and various phenotypes. Despite its utility and importance, many genetically homogeneous minority populations make up only a small proportion of the reference panels, and few studies have evaluated imputation performance in such populations. In this study, we analyzed how well imputation results approximate whole-genome sequencing (WGS) data, using Koreans as an example of a genetically homogeneous minority population and a large dataset of 2,253 whole-genome sequencing and genotype array data for a more accurate and meaningful performance assessment. For imputation, we selected four reference panels commonly used in the field, considering the characteristics of each: a Korean reference panel, the Haplotype Reference Consortium (HRC), the 1000 Genomes Project, and Trans-Omics for Precision Medicine (TOPMed). As expected, the Korean reference panel outperformed all other reference panels on every performance metric. Its accuracy advantage was especially pronounced for variants with a minor allele frequency (MAF) below 1%. When using the pipeline from the Michigan Imputation Server, we observed cases where called genotypes were corrected based on the imputed genotypes. In these cases, the Korean reference panel produced the fewest genotype-correction errors among the panels. In the imputation results from the best-performing Korean reference panel, we identified variants that were not called in the WGS data. Among these, 34.7% were determined to be variants that had been filtered out during WGS variant calling because they did not meet the quality thresholds. The outstanding performance of imputation with the Korean reference panel in Koreans, a genetically homogeneous minority population, highlights the importance of developing ethnic-specific reference panels for the full utilization of genome imputation analysis. It also suggests new applications of genome imputation in deep WGS.
- Alternative Title
- 전유전체서열분석 (WGS)자료의 유전체 대치분석 필요성 재고 (Reconsidering the need for genome imputation analysis of whole genome sequencing (WGS) data)
- Alternative Author(s)
- Yoonjong Seo
- Affiliation
- Graduate School of Chosun University
- Department
- Graduate School, Department of Integrative Biological Sciences
- Advisor
- Jungsoo Gim
- Awarded Date
- 2024-02
- Table Of Contents
- LIST OF TABLES Ⅱ
LIST OF FIGURES Ⅲ
ABSTRACT Ⅳ
Ⅰ. INTRODUCTION 1
Ⅰ-1. Whole genome sequence and genotype array 1
Ⅰ-2. Genome imputation 1
Ⅰ-3. Related studies 2
Ⅰ-4. Research purpose 3
Ⅱ. MATERIALS AND METHODS 4
Ⅱ-1. Genotype array data 4
Ⅱ-2. WGS data 4
Ⅱ-3. Reference panels for genome imputation 5
Ⅱ-4. Genome imputation 7
Ⅱ-5. Performance measure of imputation result 7
Ⅲ. RESULTS 9
Ⅲ-1. Study overview 9
Ⅲ-2. Imputation result 11
Ⅲ-3. Imputation performance 14
Ⅲ-4. Correction of genotyped SNPs 19
Ⅲ-5. Genome imputation with WGS 21
Ⅳ. DISCUSSION 24
Ⅴ. ABSTRACT (IN KOREAN) 26
Ⅵ. REFERENCES 28
Ⅶ. APPENDIX 30
- Degree
- Master
- Publisher
- Graduate School of Chosun University
- Citation
- Yoonjong Seo. (2024). Re-examination of genome imputation analysis of whole genome sequencing data.
- Type
- Dissertation
- URI
- https://oak.chosun.ac.kr/handle/2020.oak/18642
http://chosun.dcollection.net/common/orgView/200000720270
- Appears in Collections
- General Graduate School > 3. Theses(Master)
- Authorize & License
- Authorize: Open
- Embargo: 2024-02-23