
BIOINFORMATICS AND FUNCTIONAL GENOMICS
Second Edition

Jonathan Pevsner
Department of Neurology, Kennedy Krieger Institute
and
Department of Neuroscience and Division of Health Sciences Informatics, The Johns Hopkins School of Medicine, Baltimore, Maryland

Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved.

Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley's global Scientific, Technical, and Medical business with Blackwell Publishing.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at . Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at .

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books
in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For more information about Wiley products, visit our web site at www. .

Cover illustration includes detail from Leonardo da Vinci (1452-1519), dated c. 1506-1507, courtesy of the Schlossmuseum (Weimar).

ISBN: 978-0-470-08585-1

Library of Congress Cataloging-in-Publication Data is available.

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

For Barbara, Ava and Lillian with all my love.

Contents in Brief

PART I  ANALYZING DNA, RNA, AND PROTEIN SEQUENCES IN DATABASES
1  Introduction
2  Access to Sequence Data and Literature Information
3  Pairwise Sequence Alignment
4  Basic Local Alignment Search Tool (BLAST)
5  Advanced Database Searching
6  Multiple Sequence Alignment
7  Molecular Phylogeny and Evolution

PART II  GENOMEWIDE ANALYSIS OF RNA AND PROTEIN
8  Bioinformatic Approaches to Ribonucleic Acid (RNA)
9  Gene Expression: Microarray Data Analysis
10  Protein Analysis and Proteomics
11  Protein Structure
12  Functional Genomics

PART III  GENOME ANALYSIS
13  Completed Genomes
14  Completed Genomes: Viruses
15  Completed Genomes: Bacteria and Archaea
16  The Eukaryotic Chromosome
17  Eukaryotic Genomes: Fungi
18  Eukaryotic Genomes: From Parasites to Primates
19  Human Genome
20  Human Disease

Glossary
Answers to Self-Test Quizzes
Author Index
Subject Index

Contents

Preface to the Second Edition
Preface to the First Edition
Foreword

PART I  ANALYZING DNA, RNA, AND PROTEIN SEQUENCES IN DATABASES

1  Introduction
    Organization of The Book
    Bioinformatics: The Big Picture
    A Consistent Example: Hemoglobin
    Organization of The Chapters
    A Textbook for Courses on Bioinformatics and Genomics
    Key Bioinformatics Websites
    Suggested Reading
    References

2  Access to Sequence Data and Literature Information
    Introduction to Biological Databases
    GenBank: Database of Most Known Nucleotide and Protein Sequences
    Amount of Sequence Data
    Organisms in GenBank
    Types of Data in GenBank
    Genomic DNA Databases
    cDNA Databases Corresponding to Expressed Genes
    Expressed Sequence Tags (ESTs)
    ESTs and UniGene
    Sequence-Tagged Sites (STSs)
    Genome Survey Sequences (GSSs)
    High Throughput Genomic Sequence (HTGS)
    Protein Databases
    National Center for Biotechnology Information
    Introduction to NCBI: Home Page
    PubMed
    Entrez
    BLAST
    OMIM
    Books
    Taxonomy
    Structure
    The European Bioinformatics Institute (EBI)
    Access to Information: Accession Numbers to Label and Identify Sequences
    The Reference Sequence (RefSeq) Project
    The Consensus Coding Sequence (CCDS) Project
    Access to Information via Entrez Gene at NCBI
    Relationship of Entrez Gene, Entrez Nucleotide, and Entrez Protein
    Comparison of Entrez Gene and UniGene
    Entrez Gene and HomoloGene
    Access to Information: Protein Databases
    UniProt
    The Sequence Retrieval System at ExPASy
    Access to Information: The Three Main Genome Browsers
    The Map Viewer at NCBI
    The University of California, Santa Cruz (UCSC) Genome Browser
    The Ensembl Genome Browser
    Examples of How to Access Sequence Data
    HIV pol
    Histones
    Access to Biomedical Literature
    PubMed Central and Movement toward Free Journal Access
    Example of PubMed Search: RBP
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Problems
    Self-Test Quiz
    Suggested Reading
    References

3  Pairwise Sequence Alignment
    Introduction
    Protein Alignment: Often More Informative Than DNA Alignment
    Definitions: Homology, Similarity, Identity
    Gaps
    Pairwise Alignment, Homology, and Evolution of Life
    Scoring Matrices
    Dayhoff Model: Accepted Point Mutations
    PAM1 Matrix
    PAM250 and Other PAM Matrices
    From a Mutation Probability Matrix to a Log-Odds Scoring Matrix
    Practical Usefulness of PAM Matrices in Pairwise Alignment
    Important Alternative to PAM: BLOSUM Scoring Matrices
    Pairwise Alignment and Limits of Detection: The “Twilight Zone”
    Alignment Algorithms: Global and Local
    Global Sequence Alignment: Algorithm of Needleman and Wunsch
    Step 1: Setting Up a Matrix
    Step 2: Scoring the Matrix
    Step 3: Identifying the Optimal Alignment
    Local Sequence Alignment: Smith and Waterman Algorithm
    Rapid, Heuristic Versions of Smith-Waterman: FASTA and BLAST
    Pairwise Alignment with Dot Plots
    The Statistical Significance of Pairwise Alignments
    Statistical Significance of Global Alignments
    Statistical Significance of Local Alignments
    Percent Identity and Relative Entropy
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Problems/Computer Lab
    Self-Test Quiz
    Suggested Reading
    References

4  Basic Local Alignment Search Tool (BLAST)
    Introduction
    BLAST Search Steps
    Step 1: Specifying Sequence of Interest
    Step 2: Selecting BLAST Program
    Step 3: Selecting a Database
    Step 4a: Selecting Optional Search Parameters
    1. Query
    2. Limit by Entrez Query
    3. Short Queries
    4. Expect Threshold
    5. Word Size
    6. Matrix
    7. Gap Penalties
    8. Composition-Based Statistics
    9. Filtering and Masking
    Step 4b: Selecting Formatting Parameters
    BLAST Algorithm Uses Local Alignment Search Strategy
    BLAST Algorithm Parts: List, Scan, Extend
    BLAST Algorithm: Local Alignment Search Statistics and E Value
    Making Sense of Raw Scores with Bit Scores
    BLAST Algorithm: Relation between E and p Values
    Parameters of a BLAST Search
    BLAST Search Strategies
    General Concepts
    Principles of BLAST Searching
    How to Evaluate Significance of Your Results
    How to Handle Too Many Results
    How to Handle Too Few Results
    BLAST Searching With Multidomain Protein: HIV-1 pol
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Computer Lab/Problems
    Self-Test Quiz
    Suggested Reading
    References

5  Advanced Database Searching
    Introduction
    Specialized BLAST Sites
    Organism-Specific BLAST Sites
    Ensembl BLAST
    Wellcome Trust Sanger Institute
    Specialized BLAST-Related Algorithms
    WU BLAST 2.0
    European Bioinformatics Institute (EBI)
    Specialized NCBI BLAST Sites
    Finding Distantly Related Proteins: Position-Specific Iterated BLAST (PSI-BLAST)
    Assessing Performance of PSI-BLAST
    PSI-BLAST Errors: The Problem of Corruption
    Reverse Position-Specific BLAST
    Pattern-Hit Initiated BLAST (PHI-BLAST)
    Profile Searches: Hidden Markov Models
    BLAST-Like Alignment Tools to Search Genomic DNA Rapidly
    Benchmarking to Assess Genomic Alignment Performance
    PatternHunter
    BLASTZ
    MegaBLAST and Discontiguous MegaBLAST
    BLAT
    LAGAN
    SSAHA
    SIM4
    Using BLAST for Gene Discovery
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Problems/Computer Lab
    Self-Test Quiz
    Suggested Reading
    References

6  Multiple Sequence Alignment
    Introduction
    Definition of Multiple Sequence Alignment
    Typical Uses and Practical Strategies of Multiple Sequence Alignment
    Benchmarking: Assessment of Multiple Sequence Alignment Algorithms
    Five Main Approaches to Multiple Sequence Alignment
    Exact Approaches to Multiple Sequence Alignment
    Progressive Sequence Alignment
    Iterative Approaches
    Consistency-Based Approaches
    Structure-Based Methods
    Conclusions from Benchmarking Studies
    Databases of Multiple Sequence Alignments
    Pfam: Protein Family Database of Profile HMMs
    Smart
    Conserved Domain Database
    Prints
    Integrated Multiple Sequence Alignment Resources: InterPro and iProClass
    PopSet
    Multiple Sequence Alignment Database Curation: Manual versus Automated
    Multiple Sequence Alignments of Genomic Regions
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Problems/Computer Lab
    Self-Test Quiz
    Suggested Reading
    References

7  Molecular Phylogeny and Evolution
    Introduction to Molecular Evolution
    Goals of Molecular Phylogeny
    Historical Background
    Molecular Clock Hypothesis
    Positive and Negative Selection
    Neutral Theory of Molecular Evolution
    Molecular Phylogeny: Properties of Trees
    Tree Roots
    Enumerating Trees and Selecting Search Strategies
    Type of Trees
    Species Trees versus Gene/Protein Trees
    DNA, RNA, or Protein-Based Trees
    Five Stages of Phylogenetic Analysis
    Stage 1: Sequence Acquisition
    Stage 2: Multiple Sequence Alignment
    Stage 3: Models of DNA and Amino Acid Substitution
    Stage 4: Tree-Building Methods
    Phylogenetic Methods
    Distance
    The UPGMA Distance-Based Method
    Making Trees by Distance-Based Methods: Neighbor Joining
    Phylogenetic Inference: Maximum Parsimony
    Model-Based Phylogenetic Inference: Maximum Likelihood
    Tree Inference: Bayesian Methods
    Stage 5: Evaluating Trees
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Problems/Computer Lab
    Self-Test Quiz
    Suggested Reading
    References

PART II  GENOMEWIDE ANALYSIS OF RNA AND PROTEIN

8  Bioinformatic Approaches to Ribonucleic Acid (RNA)
    Introduction to RNA
    Noncoding RNA
    Noncoding RNAs in the Rfam Database
    Transfer RNA
    Ribosomal RNA
    Small Nuclear RNA
    Small Nucleolar RNA
    MicroRNA
    Short Interfering RNA
    Noncoding RNAs in the UCSC Genome and Table Browser
    Introduction to Messenger RNA
    mRNA: Subject of Gene Expression Studies
    Analysis of Gene Expression in cDNA Libraries
    Pitfalls in Interpreting Expression Data from cDNA Libraries
    Full-Length cDNA Projects
    Serial Analysis of Gene Expression (SAGE)
    Microarrays: Genomewide Measurement of Gene Expression
    Stage 1: Experimental Design for Microarrays
    Stage 2: RNA Preparation and Probe Preparation
    Stage 3: Hybridization of Labeled Samples to DNA Microarrays
    Stage 4: Image Analysis
    Stage 5: Data Analysis
    Stage 6: Biological Confirmation
    Microarray Databases
    Further Analyses
    Interpretation of RNA Analyses
    The Relationship of DNA, mRNA, and Protein Levels
    The Pervasive Nature of Transcription
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Problems
    Self-Test Quiz
    Suggested Reading
    References

9  Gene Expression: Microarray Data Analysis
    Introduction
    Microarray Data Analysis Software and Data Sets
    Reproducibility of Microarray Experiments
    Microarray Data Analysis: Preprocessing
    Scatter Plots and MA Plots
    Global and Local Normalization
    Accuracy and Precision
    Robust Multiarray Analysis (RMA)
    Microarray Data Analysis: Inferential Statistics
    Expression Ratios
    Hypothesis Testing
    Corrections for Multiple Comparisons
    Significance Analysis of Microarrays (SAM)
    From t-Test to ANOVA
    Microarray Data Analysis: Descriptive Statistics
    Hierarchical Cluster Analysis of Microarray Data
    Partitioning Methods for Clustering: k-Means Clustering
    Clustering Strategies: Self-Organizing Maps
    Principal Components Analysis: Visualizing Microarray Data
    Supervised Data Analysis for Classification of Genes or Samples
    Functional Annotation of Microarray Data
    Perspective
    Pitfalls
    Discussion Questions
    Problems/Computer Lab
    Self-Test Quiz
    Suggested Reading
    References

10  Protein Analysis and Proteomics
    Introduction
    Protein Databases
    Community Standards for Proteomics Research
    Techniques to Identify Proteins
    Direct Protein Sequencing
    Gel Electrophoresis
    Mass Spectrometry
    Four Perspectives on Proteins
    Perspective 1. Protein Domains and Motifs: Modular Nature of Proteins
    Added Complexity of Multidomain Proteins
    Protein Patterns: Motifs or Fingerprints Characteristic of Proteins
    Perspective 2. Physical Properties of Proteins
    Accuracy of Prediction Programs
    Proteomic Approaches to Phosphorylation
    Proteomic Approaches to Transmembrane Domains
    Introduction to Perspectives 3 and 4: Gene Ontology Consortium
    Perspective 3: Protein Localization
    Perspective 4: Protein Function
    Perspective
    Pitfalls
    Web Resources
    Discussion Questions
    Problems/Computer Lab
    Self-Test Quiz
    Suggested Reading
    References

11  Protein Structure
    Overview of Protein Structure
    Protein Sequence and Structure
    Biological Questions Addressed by Structural Biology: Globins
    Principles of Protein Structure
    Primary Structure
    Secondary Structure
    Tertiary Protein Structure: Protein-Folding Problem
    Target Selection and Acquisition of Three-Dimensional Protein Structures
    Structural Genomics and the Protein Structure Initiative
    The Protein Data Bank
    Accessing PDB Entries at the NCBI Website
    Integrated Views of the Universe of Protein Folds
    Taxonomic System for Protein Structures: The SCOP Database
    The CATH Database
    The Dali Domain Dictionary
    Comparison of Resources
    Protein Structure Prediction
    Homology Model
    Fold Recognition (Threading)
    Ab Initio Prediction (Template-Free Modeling)
    A Competition to Assess Progress in Structure Prediction
    Intrinsically Disordered Proteins
    Protein Structure and Disease
    Perspective
    Pitfalls
    Discussion Questions
    Problems/Computer Lab
    Self-Test Quiz
    Suggested Reading
    References

12  Functional Genomics
    Introduction to Functional Genomics
    The Relationship of Genotype and Phenotype
    Eight Model Organisms for Functional Genomics
    The Bacterium Escherichia coli
    The Yeast Saccharomyces cerevisiae
    The Plant Arabidopsis thaliana
    The Nematode Caenorhabditis elegans
    The Fruitfly Drosophila melanogaster
    The Zebrafish Danio rerio
    The Mouse Mus musculus
    Homo sapiens: Variation in Humans
    Functional Genomics Using Reverse Genetics and Forward Genetics
    Reverse Genetics: Mouse Knockouts and the β-Globin Gene
    Reverse Genetics: Knocking Out Genes in Yeast Using Molecular Barcodes
    Reverse Genetics: Random Insertional Mutagenesis (Gene Trapping)
    Reverse Genetics: Insertional Mutagenesis in Yeast
    Reverse Genetics: Gene Silencing by Disrupting RNA
