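The dry-hot valleys of Yunnan and Sichuan lie on the southern edge of the Hengduan Mountains, a biodiversity center in southwestern China. They consist mainly of four major dry-hot valley regions, along the Yuanjiang, Nujiang, Jinsha, and Lancang rivers. Their geographical location, topography, and climate give them distinctive dry and hot ecological characteristics [1]. Because of climate and human activity, the original vegetation of the dry-hot valleys has almost disappeared. What remains is mainly semi-savanna vegetation, which resembles African savanna (tropical grassland) vegetation to some degree [1]. Some areas have become bare, barren hills where biodiversity and ecosystem stability have been damaged and soil erosion is severe. Plant taxa growing in this region have adapted well to this harsh dry and hot habitat [2], and Phyllanthus emblica is a typical example.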
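Phyllanthus emblica L. belongs to the genus Phyllanthus in the family Phyllanthaceae. It is an important economic tree species of tropical and subtropical regions whose products serve as both food and medicine. The World Health Organization has listed it as one of three health-care plants to be promoted for cultivation worldwide [3]. Its leaves, roots, and fruits contain health-promoting constituents such as vitamin C, flavonoids, and superoxide dismutase, and modern medical research has confirmed many of its therapeutic effects [4-5]. The fruit in particular is one of the richest natural sources of vitamin C and has high commercial potential [6]. Ecologically, P. emblica tolerates drought and infertile soil extremely well, so it is often used as a pioneer species for afforesting barren hills in the dry-hot valleys of southwestern China. There it noticeably improves water retention and soil stabilization in severely eroded terrain [3]. Despite this combination of medicinal, economic, and ecological value, the genetic background of P. emblica remains unclear, and research on its molecular biology is particularly limited.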
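Advances in high-throughput sequencing have greatly reduced sequencing time and cost. For species without a reference genome, high-throughput transcriptome sequencing yields large amounts of data. It allows cDNA libraries to be constructed and important functional genes to be mined, and it lays the groundwork for developing large numbers of molecular markers. It has therefore become an important means of studying desirable traits in plants [7]. In this study, we sampled P. emblica from the dry-hot valley of Binchuan, Yunnan, and sequenced its transcriptome on the Illumina HiSeq 4000 platform. The raw data were filtered and assembled de novo. The resulting Unigenes were then analyzed bioinformatically for functional annotation, CDS prediction, and prediction of transcription factors (TFs) and resistance genes. Our aims were to characterize gene expression and its functional distribution in P. emblica at a given developmental stage, and to lay a foundation for transcriptome-level research, microsatellite marker development, and the mining of drought-resistance genes in this species. We also hope to provide a reference for ecological restoration and economic development in the dry-hot valleys.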
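Read quality after filtering is conventionally summarized as Q20 and Q30. These are the percentages of base calls with a Phred quality score of at least 20 or at least 30, which correspond to estimated error rates of at most 1% and at most 0.1% (error probability 10^(-Q/10)). The sketch below shows how such percentages can be computed from FASTQ quality strings. It assumes Phred+33 encoding and four-line FASTQ records. The function names and toy reads are hypothetical illustrations and do not represent the filtering procedure used in this study.

```python
"""Q20/Q30 summary of FASTQ base-call quality (illustrative sketch)."""

from typing import Iterable, Iterator, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def fastq_quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 3:
            yield line.rstrip("\n")


def q20_q30(qualities: Iterable[str], offset: int = PHRED_OFFSET) -> Tuple[float, float]:
    """Return the percentages of base calls with Phred quality >= 20 and >= 30."""
    total = q20 = q30 = 0
    for qual in qualities:
        for char in qual:
            score = ord(char) - offset  # decode one Phred score
            total += 1
            if score >= 20:  # estimated error rate <= 1%
                q20 += 1
            if score >= 30:  # estimated error rate <= 0.1%
                q30 += 1
    if total == 0:
        raise ValueError("no base calls supplied")
    return 100.0 * q20 / total, 100.0 * q30 / total


if __name__ == "__main__":
    # Two hypothetical records: 'I' = Q40, '5' = Q20, '+' = Q10, '?' = Q30, '#' = Q2
    toy_fastq = ["@read1", "ACGT", "+", "II5+", "@read2", "GA", "+", "?#"]
    pct_q20, pct_q30 = q20_q30(fastq_quality_lines(toy_fastq))
    print(f"Q20 = {pct_q20:.2f}%, Q30 = {pct_q30:.2f}%")  # Q20 = 66.67%, Q30 = 50.00%
```

In real data the quality lines would be streamed from the clean-read files rather than held in a list. The generator keeps memory use constant regardless of file size.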
Transcriptome Analysis for Phyllanthus emblica Distributed in Dry-hot Valleys in Yunnan, China
Abstract:
Objective To characterize the transcriptome of Phyllanthus emblica from the dry-hot valleys of Yunnan and thereby provide comprehensive background information for developing microsatellite markers and mining functional genes in this species.
Method The transcriptome of young P. emblica leaves was sequenced on the Illumina HiSeq 4000 platform. After filtering, de novo assembly, and clustering to remove redundancy, the Unigenes were compared against public databases, including the NCBI non-redundant protein database (NR), Gene Ontology (GO), Clusters of Orthologous Groups (COG), KEGG, SwissProt, PlantTFDB, and PRGdb. These comparisons were used for functional annotation, CDS prediction, prediction of transcription factor (TF) coding potential, and R-gene prediction.
Result A total of 10.52 Gb of Clean reads was generated, with Q20 and Q30 of 98.47% and 95.28%, respectively. Assembly and redundancy removal yielded 76 881 Unigenes with an average length of 713 nt and an N50 of 1 257 nt. Of these, 44 768 Unigenes were functionally annotated against four protein databases (NR, COG, KEGG, and SwissProt). Based on COG annotation, the Unigenes fell into 25 functional categories. Based on GO annotation, they were assigned to three major categories (biological process, cellular component, and molecular function) comprising 47 subcategories. Based on KEGG annotation, they mapped to 6 pathway categories and 21 pathway classes, about 3/5 of which were related to metabolism. The annotation results identified 42 953 CDS, and ESTScan predicted a further 2 058 CDS from the remaining unannotated Unigenes. In addition, 56 TF families and 18 types of R-genes were predicted.
Conclusion The P. emblica Unigenes obtained here show high assembly quality, good completeness, rich gene content, and diverse functions. They greatly expand the gene resources available for P. emblica and provide important baseline data for future work on functional gene mining, resistance mechanisms, molecular marker development, and molecular-assisted breeding in P. emblica and other Phyllanthus species.
Key words:
- Phyllanthus emblica
- transcriptome
- unigene
- functional annotation
- CDS
- transcription factor
- R-gene
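The abstract summarizes the assembly by its mean Unigene length (713 nt) and its N50 (1 257 nt). N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of both statistics. It assumes the Unigenes are available as FASTA text. The function names and toy lengths are hypothetical and are not taken from this assembly.

```python
"""Mean length and N50 of assembled sequences (illustrative sketch)."""

from typing import Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def mean_and_n50(lengths: Iterable[int]) -> Tuple[float, int]:
    """Return (mean length, N50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences supplied")
    total = sum(ordered)
    running = 0
    n50 = ordered[0]
    for length in ordered:
        running += length
        if 2 * running >= total:  # at least half of all bases are now covered
            n50 = length
            break
    return total / len(ordered), n50


if __name__ == "__main__":
    toy_fasta = [">unigene_1", "ACGTACGT", "ACGT", ">unigene_2", "ACG"]
    print(fasta_lengths(toy_fasta))  # [12, 3]
    mean_len, n50 = mean_and_n50([2000, 1500, 1000, 500, 300, 200])
    print(f"mean = {mean_len:.1f} nt, N50 = {n50} nt")  # mean = 916.7 nt, N50 = 1500 nt
```

N50 is weighted toward the longest sequences. It therefore usually exceeds the mean length when an assembly contains many short Unigenes, which is consistent with the relationship between the two reported values.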
[1] Jin Z Z, Ou X K. Vegetation in the hot and dry valleys along the Yuanjiang, Nujiang, Jinshajiang, and Lancangjiang rivers[M]. Kunming: Yunnan University Press, Yunnan Science and Technology Press, 2000. (in Chinese)
[2] Zhou Z, Ma H, Lin K, et al. RNA-seq reveals complicated transcriptomic responses to drought stress in a nonmodel tropic plant, Bombax ceiba L.[J]. Evolutionary Bioinformatics, 2015, 11(S1): 27-37.
[3] Li Q M, Zhao J L. Genetic diversity of Phyllanthus emblica populations in dry-hot valleys in Yunnan[J]. Biodiversity Science, 2007, 15(1): 84-91. doi: 10.3321/j.issn:1005-0094.2007.01.009 (in Chinese)
[4] Variya B C, Bakrania A K, Patel S S. Emblica officinalis (Amla): a review for its phytochemistry, ethnomedicinal uses and medicinal potentials with respect to molecular mechanisms[J]. Pharmacological Research, 2016, 111: 180-200. doi: 10.1016/j.phrs.2016.06.013
[5] Chaphalkar R, Apte K G, Talekar Y, et al. Antioxidants of Phyllanthus emblica L. bark extract provide hepatoprotection against ethanol-induced hepatic damage: a comparison with Silymarin[J]. Oxidative Medicine and Cellular Longevity, 2017, 2017: 1-10.
[6] Srinivasan M. Vitamin C in plants: Indian Gooseberry (Phyllanthus emblica)[J]. Nature, 1944, 153(3892): 684. doi: 10.1038/153684c0
[7] Alvarez M, Schrey A W, Richards C L. Ten years of transcriptomics in wild populations: what have we learned about their ecology and evolution?[J]. Molecular Ecology, 2015, 24(4): 710-725. doi: 10.1111/mec.13055
[8] Kumar A, Singh K. Isolation of high quality RNA from Phyllanthus emblica and its evaluation by downstream applications[J]. Molecular Biotechnology, 2012, 52(3): 269-275. doi: 10.1007/s12033-011-9492-5
[9] Grabherr M G, Haas B J, Yassour M, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome[J]. Nature Biotechnology, 2011, 29(7): 644-652. doi: 10.1038/nbt.1883
[10] Fu L M, Niu B F, Zhu Z W, et al. CD-HIT: accelerated for clustering the next-generation sequencing data[J]. Bioinformatics, 2012, 28(23): 3150-3152. doi: 10.1093/bioinformatics/bts565
[11] Iseli C, Jongeneel C V, Bucher P. ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences[C]// Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology. Menlo Park: AAAI Press, 1999: 138-148.
[12] Fang W P, Xie D Y, Li Z F, et al. Molecular mechanisms of plant disease resistance responses mediated by NBS-LRR resistance proteins[J]. Molecular Plant Breeding, 2015, 13(2): 469-474. (in Chinese)
[13] Chisholm S T, Coaker G, Day B, et al. Host-microbe interactions: shaping the evolution of the plant immune response[J]. Cell, 2006, 124(4): 803-814. doi: 10.1016/j.cell.2006.02.008
[14] Bose Mazumdar A, Chattopadhyay S. Sequencing, de novo assembly, functional annotation and analysis of Phyllanthus amarus leaf transcriptome using the Illumina platform[J]. Frontiers in Plant Science, 2016, 6(340): 1199.
[15] Jia X P, Sun X B, Deng Y M, et al. High-throughput sequencing and analysis of the transcriptome of bird's-nest fern (Asplenium nidus)[J]. Acta Horticulturae Sinica, 2014, 41(11): 2329-2341. (in Chinese)
[16] Sato S, Hirakawa H, Isobe S, et al. Sequence analysis of the genome of an oil-bearing tree, Jatropha curcas L.[J]. DNA Research, 2011, 18(1): 65-76. doi: 10.1093/dnares/dsq030
[17] Bai T D, Xu L A, Xu M, et al. Characterization of masson pine (Pinus massoniana Lamb.) microsatellite DNA by 454 genome shotgun sequencing[J]. Tree Genetics & Genomes, 2014, 10(2): 429-437.
[18] Li T Q, Liu X F, Wan Y M, et al. Transcriptome analysis of Rhododendron longipedicellatum, a wild plant with extremely small populations, based on high-throughput sequencing[J]. Bulletin of Botanical Research, 2017, 37(6): 825-834. (in Chinese)
[19] Cai N H, Deng L L, Xu Y L, et al. Transcriptome analysis of Pinus yunnanensis based on high-throughput sequencing[J]. Bulletin of Botanical Research, 2016, 36(1): 75-83. (in Chinese)
[20] Zhang L, Fan X M, Lin Q, et al. Transcriptome of Castanea henryi seed kernels and expression analysis of genes encoding enzymes of starch and sucrose metabolism[J]. Journal of Plant Genetic Resources, 2015, 16(3): 603-611. (in Chinese)
[21] Niu Y L, Jiang X M, Xu X Y. Research progress on the MYB transcription factor gene family in plants[J]. Molecular Plant Breeding, 2016, 14(8): 2050-2059. (in Chinese)
[22] Wang C, Lan H Y. Research progress on the functions of plant bHLH transcription factors in abiotic stress[J]. Life Science Research, 2016, 20(4): 358-364. (in Chinese)
[23] Jones J D G, Dangl J L. The plant immune system[J]. Nature, 2006, 444: 323-329. doi: 10.1038/nature05286
[24] Gururani M A, Venkatesh J, Upadhyaya C P, et al. Plant disease resistance genes: current status and future directions[J]. Physiological and Molecular Plant Pathology, 2012, 78(51): 51-65.
[25] Dubey N, Singh K. Role of NBS-LRR proteins in plant defense[M]// Singh A, Singh I. Molecular Aspects of Plant-Pathogen Interaction. Singapore: Springer Singapore, 2018: 115-138.