Illumina User Group meeting in Phuket 2010 abstract by Theragen BiO Institute
Large scale bioinformatics analysis of genomic sequences
12th April, Phuket, Thailand.
by Jong Bhak
In Dec. 2008, we reported the first Korean human genome sequence (SJK), produced on the Illumina GA2. The diploid genome of a Korean male was sequenced to 28.95-fold redundancy using the Illumina paired-end sequencing method. SJK covered 99.9% of the NCBI human reference genome. We identified 420,083 novel single nucleotide polymorphisms (SNPs) that are not in the dbSNP database.
Despite a close similarity, significant differences were observed between SJK and the Chinese genome (YH), the only other Asian genome available: (1) 39.87% (1,371,239 out of 3,439,107) of SNPs were SJK-specific (49.51% against Venter's, 46.94% against Watson's, and 44.17% against the Yoruba genomes); (2) 99.5% (22,495 out of 22,605) of short indels (< 4 bp) discovered on the same loci had the same size and type as in YH; and (3) 11.3% (331 out of 2,920) of deletion structural variants were SJK-specific. Even after attempting to map the unmapped SJK reads to unanchored NCBI scaffolds, HGSV, and available personal genomes, 5.77% of SJK reads still could not be mapped. All these findings indicate that the overall genetic differences among individuals from closely related ethnic groups may be significant.
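As a quick check on the arithmetic behind the three SJK versus YH comparisons above, the quoted percentages can be recomputed from the counts given in the text. The following is only a minimal sketch: the dictionary labels and the `percent` helper are illustrative names, not part of the original analysis pipeline, and the code uses nothing beyond the numbers stated in this abstract.

```python
# Counts quoted in the abstract, as (part, whole) pairs.
# The labels are illustrative and not taken from the original pipeline.
counts = {
    "SJK-specific SNPs (vs YH)": (1_371_239, 3_439_107),
    "short indels matching YH": (22_495, 22_605),
    "SJK-specific deletion SVs": (331, 2_920),
}


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Recompute each percentage, rounded to two decimal places.
percentages = {
    label: round(percent(part, whole), 2)
    for label, (part, whole) in counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value:.2f}%")
```

The recomputed values agree with the rounded figures quoted in the text (39.87%, 99.5%, and 11.3%).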
The whole analysis consisted of two steps: sequencing and bioinformatics analysis. We used a cluster of CPUs running automated pipelines to analyze the NGS data. As a commercial development of this NGS analysis system, Theragen launched "HelloGenome", a full-length genome sequencing service. It is the first fully commercial service of its kind in Asia.