Computational Genomics and Structural Bioinformatics
Junwen John Wang, PhD
Assistant Professor, Department of Biochemistry
BEng (Huazhong Agric.); MSc (Penn, Jiangnan); PhD (UW-Seattle)
- Contact
- Email:
- Tel: (852) 2831 5075; Fax: (852) 2855 1254
- Office: L1-05E, Human Research Institute, 5 Sassoon Road, Hong Kong
Publications, Achievements, and Grants are available at HKU Scholars Hub and Google Scholar
Web Servers: ChIP-Array, EpiRegNet
Databases: gwasDB
Software: FastPval, co-evo, NRProF
Research Description:
We employ computational and biological approaches to study the relationship of biological sequence, structure and function. We focus on three areas:
Computational and transcriptional genomics: Defining the core promoter and its surrounding transcription factor binding sites (TFBS) is a crucial step toward understanding gene regulation. We have developed computational models to detect core promoters in the human genome. We have also defined DNA sequence motifs associated with the core promoter and explored their relations to known genetic networks. Recent studies showed that many genes have multiple promoters. We discovered that, among these promoters, the 5'-most promoters are more likely to be located within a CpG island. We are exploring this finding both computationally and biochemically. Computationally, we investigate the structural and functional variations among different promoters with regard to their TFBS composition, CpG islands and promoter specificity. The computational findings are verified biochemically by DNA mutagenesis (i.e., introducing insertions and deletions that disrupt the TFBS), aiming to demonstrate the correlation between the presence of a TFBS and promoter function. We are also developing computational methods to discover the genetic and epigenetic signatures of human/mouse embryonic stem cell differentiation.
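As a rough illustration of the CpG island property mentioned above, the sketch below scores a promoter-sized sequence by its GC content and its observed/expected CpG ratio. It is not the lab's own method. The function names are hypothetical, and the thresholds (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer style criteria, assumed here for demonstration only.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    expected = c * g / n           # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to one candidate region (hypothetical helper)."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp
```

In practice a sliding window over the genome would be scored this way, and windows passing the thresholds would be merged into islands; the single-region check here only shows the arithmetic.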
Structural bioinformatics: The primary sequence of a protein determines its secondary and higher-order structures. However, the rules governing this determination are still poorly understood. We are interested in correlations between protein sequence and structure that can better define these rules. We have developed statistical methods to explore these correlations, and are using these methods to improve protein sequence alignments. We plan to develop computational tools to improve prediction of higher-order protein structure, which in turn will help us construct protein-protein interaction networks. We are also interested in studying the evolutionary relationships between transcription factors and their binding sites. We are developing hidden Markov model (HMM)-based algorithms to model protein-DNA and protein-RNA interactions.
Genome variation and diseases: Single Nucleotide Polymorphisms (SNPs) and Copy Number Variations (CNVs) are powerful tools for studying genetic diseases, such as cancers of the breast, colon and lung. There are more than 10 million SNPs in the human genome, but only a fraction have been associated with diseases. Discovering new disease-associated SNPs will improve prediction, prevention and therapy of these diseases. We have developed algorithms to detect SNPs that lie within the binding sites of transcription factors or within putative microRNA targets. These SNPs are likely to alter normal gene regulation and cause disease. We are developing new probabilistic models to improve detection of disease-associated SNPs and CNVs. In addition, we are developing analysis pipelines for The Cancer Genome Atlas (TCGA) project.
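The idea of finding SNPs that fall inside transcription factor binding sites can be sketched as a simple interval lookup. This is an illustrative sketch, not the lab's algorithm. The function names, the tuple layouts, and the 0-based half-open coordinate convention are all assumptions made for the example, and the identifiers in any input are placeholders rather than real SNP or TF names.

```python
from collections import defaultdict


def index_sites(sites):
    """Group binding sites by chromosome, sorted by start position.

    sites: iterable of (chrom, start, end, name), 0-based half-open (assumed).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in sites:
        by_chrom[chrom].append((start, end, name))
    for chrom in by_chrom:
        by_chrom[chrom].sort()
    return by_chrom


def snps_in_sites(snps, sites):
    """Return (snp_id, site_name) pairs for every SNP position inside a site.

    snps: iterable of (snp_id, chrom, pos) with 0-based positions (assumed).
    """
    index = index_sites(sites)
    hits = []
    for snp_id, chrom, pos in snps:
        for start, end, name in index.get(chrom, []):
            if start > pos:
                break              # sites are sorted; later ones start even further right
            if pos < end:
                hits.append((snp_id, name))
    return hits
```

A linear scan per SNP is enough to show the logic; for genome-scale data an interval tree or a sorted sweep over both inputs would be the more usual choice.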
Position Available: Post-doctoral fellow in bioinformatics
Current Lab Members:
- Dr. Junwen John Wang; PI since March, 2008
- Mr. Weixin Jacky Wang, PhD student since Oct., 2009; BSc., ZJU
- Mr. Hari Krishna Yalamanchili, PhD student since Jan., 2010; BSc., JUIT; MSc., IIIT, India
- Ms. Jing Qin, RA since March, 2010; PhD student since Aug., 2010; BSc., ZJU; MPhil., CUHK
- Ms. Yan Wang, RA since March, 2010; PhD student since Aug., 2010; BSc., PKU
- Mr. Feng Xu, RA since Sept., 2010; PhD student since Sept., 2011; BSc., NEFU; MSc., NEFU
- Mr. Mulin Jun Li, RA since March, 2010; BSc., USTA; MSc., USTC
- Mr. Panwen Wang, RA since Oct, 2010; BSc., WHU; MSc., BUT
- Mr. Xiaorong Liu, RA since Feb., 2011; BSc., HNNU; MSc., CSU
- Mr. Zhao Liu, MPhil student from Statistics Dept. (joint with Dr. GD Li), since Sept., 2011; BSc., SDU
- Mr. Dongfang Zou, RA since Sept., 2011; BSc., NKU; MSc., CAS
- Dr. Alan Wing-Fu Lai, Post-doc associate since Oct., 2011; BSc., HKU; PhD., HKU
Past Lab Members:
- Mr. Shu Yang, MPhil student (Sept. 2008~July, 2011); now at UBC, Canada
- Dr. Kalpana Agrawal, part time RA (Nov. 2008~June, 2010)
- Mr. Xinran Li, undergraduate FYP (Aug. 2008~July, 2009); now at UMich, USA
- Mr. Zhanyong Wang, RA (Mar. 2009~July, 2009); now at UCLA, USA
- Mr. Po Lo Paul Chan, undergraduate FYP (Sept. 2009~May, 2010)
- Mr. Leung Hing Lok, undergraduate project student (Sept. 2009~May, 2010)
- Ms. Pony Chan, undergraduate FYP (Sept. 2010~May, 2011)
- Mr. Ocean Wong, undergraduate FYP (Sept. 2010~May, 2011)
Past Exchange Students/Summer Interns:
- Mr. Xueya Zhou (May, 2011), from Tsinghua University, China
- Ms. Ee Lyn Lim (Sept., 2010~Sept., 2010), from University of Oxford, UK
- Mr. Long Chan (July, 2010~Aug, 2010), from Carleton College, USA
- Mr. Kevin Mao (July, 2010~Aug, 2010), from Royal College of Surgeons in Ireland
- Ms. Tina Yuen (July, 2010~Aug, 2010), from Royal College of Surgeons in Ireland
- Ms. Vijitra Luang-In (July, 2010~Aug, 2010), from Imperial College London, UK
- Ms. Ruijuan Li (May, 2010), from Tsinghua University, China
- Mr. Yugang Hu (July, 2010), from NIBS, China
- Ms. Grace Yip (July, 2009~Aug, 2009), from Imperial College London, UK
Selected Publications (name in bold: lab member, *Corresponding author):
- Zhang G, Zhou B, Wang W, Zhang M, Zhao Y, Wang Z, Yang L, Zhai J, Feng CG, Wang JW*, and Chen X* (2012) A functional Single-Nucleotide Polymorphism in interleukin-6 promoter is associated with susceptibility to Tuberculosis. The Journal of Infectious Diseases, in press.
- Li MJ, Wang P, Liu X, Lim EL, Wang Z, Yeager M, Wong MP, Sham PC, Chanock S, and Wang JW* (2012) GWASdb: a database for human genetic variants identified by genome wide association studies. Nucleic Acids Research, 40(1):D1047-54.
- Wang JW* (2012) A database of genetic variants in microRNA genes and their putative functional roles in gene regulation. Human Mutation, 33(1):vii.
- Wang LY, Wang PW, Li MJ, Qin J, Wang XO, Zhang MQ, and Wang JW* (2011) EpiRegNet: constructing epigenetic regulatory networks from high throughput gene expression data for human. Epigenetics, 6(12):1505-12.
- Yang S, Yalamanchili HK, Li X, Yao KM, Sham PC, Zhang MQ, and Wang JW* (2011) Correlated evolution of transcription factors and their binding sites. Bioinformatics, 27(21):2972-2978.
- Yalamanchili HK, Xiao QW, and Wang JW* (2011) NRProF: Neural Response Based Protein Function Prediction Algorithm. IEEE International Conference on Systems Biology, 33-40.
- Wu HJ, Wu W, Sun HY, Qin GW, Wang HB, Wang PW, Yalamanchili HK, Wang JW, Tse HF, Lau CP, Vanhoutte PM, and Li GR (2011) Acacetin causes a frequency- and use-dependent blockade of hKv1.5 channels by binding to the S6 domain. Journal of Molecular and Cellular Cardiology, 51(6):966-973.
- Zhang Y, Liao S, Yang M, Liang X, Poon MW, Wong CY, Wang JW, Zhou Z, Cheong SK, Lee CN, Tse HF, and Lian Q (2011) Improved Cell Survival and Paracrine Capacity of Human Embryonic Stem Cells-Derived Mesenchymal Stem Cells Promote Therapeutic Potential for Pulmonary Arterial Hypertension. Cell Transplantation, in press.
- Wang W, Wei Z, Lam T-W, and Wang JW* (2011) Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions. Scientific Reports, 1:55.
- Zhang G, Chen X, Chan L, Zhang M, Zhu B, Wang L, Zhu X, Zhang J, Zhou B, and Wang JW* (2011) An SNP selection strategy identified IL-22 associating with susceptibility to tuberculosis in Chinese. Scientific Reports, 1:20.
- Qin J, Li MJ, Wang P, Zhang MQ, and Wang JW* (2011) ChIP-Array: combinatory analysis of ChIP-seq/chip and microarray gene expression data to discover direct/indirect targets of a transcription factor. Nucleic Acids Research, 39:W430-436.
- Li MJ, Sham PC, and Wang JW* (2010) FastPval: a fast and memory efficient program to calculate very low p-values from empirical distribution. Bioinformatics, 26(22):2897-99.
- Wei F, Zaprazna K, Wang JW, and Atchison ML (2009) PU.1 Can Recruit BCL6 to DNA To Repress Gene Expression in Germinal Center B Cells. Molecular and Cellular Biology, 29(17):4612-4622.
- Tseng H, Chou W, Wang J, Zhang X, Zhang S, and Schultz RM (2008) Mouse ribosomal RNA genes contain multiple differentially regulated variants. PLoS ONE, 3(3):e1843.

- Wang J*, Ungar LH, Tseng H, and Hannenhalli S (2007) MetaProm: a neural network based meta-predictor for alternative human promoter prediction. BMC Genomics 8, 374. (*Corresponding author)
- Zhang S, Wang J, and Tseng H (2007) Basonuclin regulates a subset of ribosomal RNA genes in HaCaT cells. PLoS ONE 2, e902.
- Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, Wang J, Yu K, Chatterjee N, Orr N, Willett WC, Colditz GA, Ziegler RG, Berg CD, Buys SS, McCarty CA, Feigelson HS, Calle EE, Thun MJ, Hayes RB, Tucker M, Gerhard DS, Fraumeni JF Jr, Hoover RN, Thomas G, and Chanock SJ (2007) A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 39, 870-874.
- Vardhanabhuti S, Wang J, and Hannenhalli S (2007) Position and distance specificity are important determinants of cis-regulatory motifs in addition to evolutionary conservation. Nucleic Acids Res 35, 3203-3213.
- Hannenhalli S, Putt ME, Gilmore JM, Wang J, Parmacek MS, Epstein JA, Morrisey EE, Margulies KB, and Cappola TP (2006) Transcriptional genomics associates FOX transcription factors with human heart failure. Circulation 114, 1269-1276.
- Wang J, and Hannenhalli S (2006) A mammalian promoter model links cis elements to genetic networks. Biochem Biophys Res Commun 347, 166-177.
- Wang J*, Zhang S*, Schultz RM, and Tseng H (2006) Search for basonuclin target genes. Biochem Biophys Res Commun 348, 1261-1271. (* joint first author)
- Wang J, and Hannenhalli S (2005) Generalizations of Markov model to characterize biological sequences. BMC Bioinformatics 6, 219.
- Wang J, and Feng JA (2005) NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities. Proteins 58, 628-637.
- Wang J, and Feng JA (2003) Exploring the sequence patterns in the alpha-helices of proteins. Protein Eng 16, 799-807.