NCBI Field Guide
A Field Guide: Genome Resources; Sequence Similarity

Genome Resources
Genomic Biology
Genome Projects: Microb…

LocusLink: a single query interface to
- Sequences: RefSeqs, GenBank, HomoloGene
- Maps: MapViewer
- Entrez links
LocusLink will be replaced by Entrez Gene on March 1, 2005. Check the Gene FAQ for current information.

Entrez Gene: a single query interface to
- Sequences: RefSeqs, GenBank, HomoloGene
- Maps: MapViewer
- Entrez links
- More organisms: all RefSeq genomes
- Entrez integration

Example query: GSN [sym] (amyloidosis)
Global Entrez: NADH2 (47)
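Global Entrez searches such as the NADH2 query can also be issued from a script. The sketch below is a minimal illustration that assumes NCBI's E-utilities `esearch.fcgi` endpoint, which these slides do not cover; it only builds the request URL and never contacts the server. The function name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities esearch endpoint. This sketch only builds
# the URL string; it does not send any request.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an esearch URL for an Entrez query in database `db`."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


print(esearch_url("gene", "NADH2"))
print(esearch_url("snp", "HFE[gene name] AND human[orgn]"))
```

`urlencode` takes care of escaping Entrez field tags such as `[gene name]`, so the same query strings typed into the web interface can be reused unchanged.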
Entrez Gene: NADH2 (26 records)
Gene record for Pongo NADH2; Homo sapiens
Display exons/introns: Gene Table
A record with more data: human HFE (hemochromatosis)
Gene graphic links: NM_, NP_
Introns/exons: Gene Table; links to sequence
Entrez SNP: HFE[gene name] AND human[orgn], 52 (hemochromatosis)
Linking to SNP: chromosome location, gene location, sequence location
SNP in structure
Link to OMIM
Variants in OMIM
UniGene
- Gene-oriented clusters of expressed sequences
- Automatic clustering using MegaBLAST
- Each cluster represents a unique gene
- Informed by genome hits
- Information on tissue types and map locations
- Useful for gene discovery and selection of mapping reagents

A cluster of ESTs
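The UniGene idea of grouping overlapping expressed sequences into one gene-oriented cluster can be sketched as single-linkage clustering. This is a minimal sketch under stated assumptions: the pairwise overlaps (which UniGene finds with MegaBLAST and genome hits) are taken as given, and the union-find grouping is an illustrative choice, not necessarily UniGene's actual procedure. The names `cluster`, `est1`…`est8` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster(ids: Iterable[str], overlaps: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Single-linkage clustering: sequences joined by any chain of overlaps share a cluster."""
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for i in parent:
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first; members sorted for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


ests = [f"est{i}" for i in range(1, 9)]
overlaps = [("est1", "est2"), ("est2", "est3"), ("est3", "est4"), ("est4", "est5"),
            ("est6", "est7"), ("est7", "est8")]
print([len(c) for c in cluster(ests, overlaps)])  # [5, 3]
```

Single linkage means one shared overlap is enough to merge two groups, which is why chimeric or contaminated sequences can fuse clusters in practice.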
[Figure: query; 5 EST hits; 3 EST hits]

UniGene collections
Example UniGene cluster
Histogram of cluster sizes for UniGene Hs build 177
UniGene cluster Hs.95351
UniGene cluster Hs.95351: expression
UniGene cluster Hs.95351: sequences
Download sequences: web page; FTP site

The new HomoloGene
- Automated detection of homologs among the annotated genes of completely sequenced eukaryotic genomes
- No longer UniGene-based
- Protein similarities, first guided by the taxonomic tree
- Includes orthologs and paralogs

Orthologs and paralogs are two types of homologous sequences. Orthologs are proteins in different species that evolved from a common ancestor by vertical descent (speciation) and typically retain the same function as the original protein. Paralogs are proteins within one species that arose by gene duplication and may evolve new functions related to the original one. See the literature for more information.

Gene duplication: paralogs vs orthologs
[Figure: an early globin gene duplicates into an α-chain gene and a β-chain gene, each present in frog, chick and mouse; α vs β genes are paralogs, while the α genes (or the β genes) of different species are orthologs]

The new HomoloGene
[Table: HomoloGene build 37.2; columns: species, number of genes input, grouped, groups]
RAG1 in HomoloGene: RAG1 (12); recombination activating gene
RAG1 HomoloGene: RAG1, Amniota
HomoloGene: RAG1

MapViewer
List view
Human MapViewer: ADAR (adenosine deaminase)
MapViewer: human ADAR (4)
MapViewer, Hs ADAR: 3' UTR, 5' UTR
Maps & Options
- Sequence maps: Ab initio, Assembly, Repeats, BES_Clone, Clone, NCI_Clone, Contig, Component, CpG Island, dbSNP Haplotype, Fosmid, GenBank_DNA, Gene, Phenotype, SAGE_Tag, STS, TCAG_RNA, Transcript (RNA), Hs_UniGene, Hs_EST
- Cytogenetic maps: Ideogram, FISH Clone, Gene_Cytogenetic, Mitelman Breakpoint, Morbid/Disease
- Genetic maps: deCODE, Genethon, Marshfield
- RH maps: GeneMap99-G3, GeneMap99-GB4, NCBI RH, Stanford-G3, TNG, Whitehead-RH, Whitehead-YAC
- Mm_UniGene, Mm_EST, Rn_UniGene, Rn_EST, Ssc_UniGene, Ssc_EST, Bt_UniGene, Bt_EST, Gga_UniGene, Gga_EST
- Variation (SNP)
MapViewer: UniGene, Component, Repeats, Gene
Master map: Repeats
Gene, Phenotype, Variation

Strongylocentrotus purpuratus traces

BLAST: Basic Local Alignment Search Tool
Web access: Entrez (text), BLAST (sequence), VAST (structure)

Basic Local Alignment Search Tool
- Why use sequence similarity?
- BLAST algorithm
- BLAST statistics
- BLAST output
- Examples

Why do we need sequence similarity searching?
- To identify and annotate sequences
- To evaluate evolutionary relationships
- Other: model genomic structure (e.g., Spidey); check primer specificity in silico (NCBI's tool)

BLAST website stats

Global vs local alignment
[Figure: Seq 1 vs Seq 2, global alignment and local alignment]

Global vs local alignment
Seq1: WHEREISWALTERNOW (16 aa)
Seq2: HEWASHEREBUTNOWISHERE (21 aa)

Global:
Seq1: 1 W-HEREISWALTERNOW 16
Seq2: 1 HEWASHEREBUTNOWISHERE 21

Local:
Seq1:  1 W--HERE  5
         W  HERE
Seq2:  3 WASHERE  9

Seq1:  1 W--HERE  5
         W  HERE
Seq2: 15 WISHERE 21

The flavors of BLAST
- Standard BLAST: traditional "contiguous" word hit; position-independent scoring; nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
- MegaBLAST: optimized for large batch searches; can use discontiguous words
- PSI-BLAST: constructs PSSMs automatically and uses them as the query; very sensitive protein search
- RPS-BLAST: searches a database of PSSMs; tool for conserved domain searches

Basic Local Alignment Search Tool
- Widely used similarity search tool
- Heuristic approach based on the Smith-Waterman algorithm
- Finds best local alignments
- Provides statistical significance
- All combinations (DNA/protein) of query and database:
    DNA vs DNA: blastn
    DNA translation vs protein: blastx
    protein vs protein: blastp
    protein vs DNA translation: tblastn
    DNA translation vs DNA translation: tblastx
- WWW, standalone, and network clients

Translated BLAST
Program   Query                     Database
blastx    nucleotide (translated)   protein
tblastn   protein                   nucleotide (translated)
tblastx   nucleotide (translated)   nucleotide (translated)
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA.

How BLAST works
- Make a lookup table of "words" for the query
- Scan the database for hits
- Ungapped extensions of hits (initial HSPs)
- Gapped extensions (no traceback)
- Gapped extensions (traceback; alignment details)

Nucleotide words
Query: GTACTGGACATGGACCCTACAGGAA
Make a lookup table of words (11-mers):
GTACTGGACAT
 TACTGGACATG
  ACTGGACATGG
   CTGGACATGGA
    TGGACATGGAC
     GGACATGGACC
      GACATGGACCC
       ACATGGACCCT
...
Program     Minimum word size   Default word size
blastn      7                   11
MegaBLAST   8                   28

Protein words
Query: GTQITVEDLFYNIATRRKALKN
Make a lookup table of words (word size = 3 by default; word size can only be 2 or 3):
GTQ TQI QIT ITV TVE VED EDL DLF ...
Neighborhood words (e.g., for ITV): LTV, MTV, ISV, LSV, etc.
-f 11 = blastp default

Minimum requirements for a hit
- Nucleotide BLAST requires one exact match, e.g., CATGCTTAATT within ATCGCCATGCTTAATTGGGCTT
- Protein BLAST requires two neighboring matches within 40 aa, e.g., the neighborhood words SEI and YYN against GTQITVEDLFYNI
-A 40 = blastp default

BLASTP summary
Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV
Example query words: HFL, YLS
Neighborhood words, with neighborhood score threshold T (-f) = 11:
HFL 18, HFV 15, HFS 14, HWL 13, NFL 13, DFL 12, HWV 10, etc.
YLS 15, YLT 12, YVS 12, YIT 10, etc.

High-scoring pair (HSP):
Query:   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47
Sbjct: 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333

Gapped extension with traceback (final HSP):
Query:   1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV 50
           +E  YA YL K    F+YLSL +SP+ +DVNVHP+K  VHFL+++ I   +
Sbjct: 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI 337

Scoring systems: nucleotides
Identity matrix:
     A    G    C    T
A   +1   -3   -3   -3
G   -3   +1   -3   -3
C   -3   -3   +1   -3
T   -3   -3   -3   +1

CAGGTAGCAAGCTTGCATGTCA
CACGTAGCAAGCTTG-GTGTCA
Raw score = 19 - 9 = 10 (19 matches; 2 mismatches and 1 gap at -3 each), with -r 1 -q -3

Scoring systems: proteins
Position-independent matrices
- PAM matrices (Percent Accepted Mutation): derived from observation of a small dataset of alignments; implicit model of evolution; all calculated from PAM1; PAM250 widely used
- BLOSUM matrices (BLOck SUbstitution Matrices): derived from observation of a large dataset of highly conserved blocks; each matrix derived separately from blocks with a defined percent identity cutoff; BLOSUM62 is the default matrix for BLAST
Position-specific score matrices (PSSMs): PSI-BLAST and RPS-BLAST

BLOSUM62
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
Negative scores for less likely substitutions (e.g., D-F = -3); positive scores for more likely substitutions (e.g., Y-F = 3).

Position-specific score matrix
[Figure: DAF-1, serine/threonine protein kinase catalytic loop, with PSSM scores]
        A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
435 K -1  0  0 -1 -2  3  0  3  0 -2 -2  1 -1 -1 -1 -1 -1 -1 -1 -2
436 E  0  1  0  2 -1  0  2 -1  0 -1 -1  0  0  0 -1  0  0 -1 -1 -1
437 S  0  0 -1  0  1  1  0  1  1  0 -1  0  0  0  2  0 -1 -1  0 -1
438 N -1  0 -1 -1  1  0 -1  3  3 -1 -1  1 -1  0  0 -1 -1  1  1 -1
439 K -2  1  1 -1 -2  0 -1 -2 -2 -1 -2  5  1 -2 -2 -1 -1 -2 -2 -1
440 P -2 -2 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1  0 -3  7 -1 -2 -3 -1 -1
441 A  3 -2  1 -2  0 -1  0  1 -2 -2 -2  0 -1 -2  3  1  0 -3 -3  0
442 M -3 -4 -4 -4 -3 -4 -4 -5 -4  7  0 -4  1  0 -4 -4 -2 -4 -1  2
443 A  4 -4 -4 -4  0 -4 -4 -3 -4  4 -1 -4 -2 -3 -4 -1 -2 -4 -3  4
444 H -4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5  0 -5
445 R -4  8 -3 -4  0 -1 -2 -3 -2 -5 -4  0 -3 -2 -4 -3 -3  0 -4 -5
446 D -4 -4 -1  8 -6 -2  0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5
447 I -4 -5 -6 -6 -3 -4 -5 -6 -5  3  5 -5  1  1 -5 -5 -3 -4 -3  1
448 K  0  0  1 -3 -5 -1 -1 -3 -3 -5 -5  7 -4 -5 -3 -1 -2 -5 -4 -4
449 S  0 -3 -2 -3  0 -2 -2 -3 -3 -4 -4 -2 -4 -5  2  6  2 -5 -4 -4
450 K  0  3  0  1 -5  0  0 -4 -1 -4 -3  4 -3 -2  2  1 -1 -5 -4 -4
451 N -4 -3  8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5
452 I -3 -5 -5 -6  0 -5 -5 -6 -5  6  2 -5  2 -2 -5 -4 -3 -5 -3  3
453 M -4 -4 -6 -6 -3 -4 -5 -6 -5  0  6 -5  1  0 -5 -4 -3 -4 -3  0
454 V -3 -3 -5 -6 -3 -4 -5 -6 -5  3  3 -4  2 -2 -5 -4 -3 -5 -3  5
455 K -2  1  1  4 -5  0 -1 -2  1 -4 -2  4 -3 -2 -3  0 -1 -5 -2 -3
456 N  1  1  3  0 -4 -1  1  0 -3 -4 -4  3 -2 -5 -2  2 -2 -5 -4 -4
457 D -3 -2  5  5 -1 -1  1 -1  0 -5 -4  0 -2 -5 -1  0 -2 -6 -4 -5
458 L -3 -1  0 -3  0 -3 -2  3 -4 -2  3  0  1  1 -2 -2 -3  5 -1 -3
Position-specific score matrix for the catalytic loop, generated with:
./blastpgp -i NP_499868.2 -d nr -j 3 -Q NP_499868.pssm

Local alignment statistics
High scores of local alignments between two random sequences follow the extreme value distribution [figure: number of alignments vs score S] (applies to ungapped alignments).
$E = K m n e^{-\lambda S}$ or $E = m n 2^{-S'}$
where m and n are the lengths of the query and the database, K is a scale for the search space, λ is a scale for the scoring system, and $S' = (\lambda S - \ln K)/\ln 2$ is the bit score.
Expect value: E is the number of database hits you expect to find by chance with a score at least as high as your score S (the expected number of random hits).
More info: /blast/tutorial/altschul-1.html

Advanced BLAST options: nucleotide
Example Entrez queries (nucleotide):
- all[filter] NOT mammalia[organism]
- green plants[organism]
- biomol mrna[properties]
- gbdiv est[properties] AND rat[organism]
Other advanced options:
- -e 10000: expect value
- -v 2000: descriptions
- -b 2000: alignments

Advanced BLAST options: protein
Matrix selection: PAM30 (most stringent) to BLOSUM45 (least stringent)
Example Entrez queries (proteins):
- all[filter] NOT mammalia[organism]
- green plants[organism]
- srcdb refseq[properties]
Other advanced options:
- -e 10000: expect value
- -v 2000: descriptions
- -b 2000: alignments
Limit by taxon: Mus musculus[organism]; Mammalia[organism]; Viridiplantae[organism]

Low-complexity filtering
>sp|P27476|NSR1_YEAST Nuclear localization sequence binding protein (p67)
Length = 414
Score = 40.2 bits (92), Expect = 0.013
Identities = 35/131 (26%), Positives = 56/131 (42%), Gaps = 4/131 (3%)
Query: 362 sttsltssstsgssdkvyahqmvrtdsreqkldaflqplskpls-sqpqaivtedktd 418
Sbjct:  29 sssssesssssssssesesesesesessssssssdsesssssssdseseaetkkeeskds 88
[Figure labels: filtered; unfiltered]

Other BLAST algorithms
- MegaBLAST
- Discontiguous MegaBLAST
- PSI-BLAST
- PHI-BLAST

MegaBLAST: NCBI's genome annotator
- Long alignments of similar DNA sequences
- Greedy algorithm
- Concatenation of query sequences
- Faster than blastn; less sensitive

MegaBLAST & word size
Trade-off: sensitivity vs speed
Program     Minimum word size   Default word size
blastp      2                   3
MegaBLAST   8                   28
blastn      7                   11

Discontiguous MegaBLAST
- Uses discontiguous word matches
- Better for cross-species comparisons

Templates for discontiguous words
W = word size (number of matches in the template); t = template length
W    t    Coding                   Non-coding
11   16   1101101101101101         1110010110110111
12   16   1111101101101101         1110110110110111
11   18   101101100101101101       111010010110010111
12   18   101101101101101101       111010110010110111
11   21   100101100101100101101    111010010100010010111
12   21   100101101101100101101    111010010110010010111
Reference: Ma B, Tromp J, Li M. PatternHunter: faster and more sensitive homology search. Bioinformatics. 2002 Mar;18(3):440-5.

Discontiguous (cross-species) MegaBLAST
Discontiguous word options

MegaBLAST vs discontiguous MegaBLAST
Query: NM_017460 Homo sapiens cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), transcript variant 1, mRNA (2768 letters), vs Drosophila
- MegaBLAST: "No significant similarity found."
- Discontiguous MegaBLAST: [figure: hits]

Another example
Query: NM_078651 Drosophila melanogaster CG18582-PA (mbt) mRNA (3244 bp); /note="mushroom bodies tiny; synonyms: PAK2, STE20, dPAK2"
Database: nr (nt), mammalia[orgn]
- MegaBLAST: "No significant similarity found."
- Discontiguous MegaBLAST: numerous hits
Ex: discontiguous MegaBLAST
Ex: blastn

PSI-BLAST (Position-Specific Iterated BLAST)
Example: confirming relationships of purine nucleotide metabolism proteins
>gi|113340|sp|P03958|ADA_MOUSE Adenosine deaminase (adenosine …)
maqtpafnkpkvelhvhldgaikpetilyfgkkrgialpadtveelrniigmdkplslpgfviagcreaikriayefvemkakegvvyvevrysphllanskvdpmpwnqtegdvtpddvvdeqafgikvrsilccmrhqpswslevlelckkynqktvvamdlagdetiegsslfpghveayrtvhagevgspevvreavdilktervghgyhtiedealynrllkenmhfevcpwssyltgavrfkndkanyslntddplifkstldtdyqmtkkdmgfteeefkrlninaakssflpeeekk
PSI-BLAST E-value cutoff for the PSSM: 0.005
Results, initial BLASTP: same results as protein-protein BLAST; different format
Results of first PSSM search: other purine nucleotide metabolizing enzymes not found by ordinary BLAST
Tenth PSSM search: convergence; just below the threshold, another nucleotide metabolism enzyme (check to add to the PSSM)

Reverse PSI-BLAST (RPS-BLAST)
Adenosine/AMP deaminase domain; AMP deaminases …

PHI-BLAST
>gi|231729|sp|P30429|CED4_CAEEL Cell death protein 4
mlceiecralstahtrlihdfeprdaltylegkniftedhseliskmstrlerianflriyrrqaselidffnynnqshladfledyidfainepdllrpvviapqfsrqmldrklllgnvpkqmtcyireyhvikkldemcdldsfflflhgragsgksviasqalsksdqliginydsivwlkdsgtapkstfdlftdilkseddllnfpsvehvtsvvlkrmicnalidrpntlfvfddvvqeetirwaqelrlrclvttrdveiasqtcefievtsleidecydfleaygmpmpvgekeedvlnktielssgnpatlmmffkscepktfek
Pattern: [GA]xxxxGK[ST]

What's new?
BLAST databases
- Nucleotide: refseq_rna = NM_*, XM_*; refseq_genomic = NC_*, NG_*; env_nt = environmental samples; filter, e.g., 16S rRNA
- Protein: refseq = NP_*, XP_*; env_nr
New formatter: select lower case; select red
New formatter: gray line = same database hit; HSPs color-coded independently
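The "Global vs local alignment" slides contrast end-to-end alignment with the best-matching region only. The sketch below shows that difference with the two textbook dynamic-programming recurrences (Needleman-Wunsch style for global, Smith-Waterman style for local). The scoring (match +2, mismatch -1, linear gap -1) is an assumption chosen for illustration, not a BLAST default.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best alignment score by dynamic programming.

    local=False: global score, both sequences aligned end to end.
    local=True:  local score, cells floored at 0 so an alignment may start
                 and end anywhere.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)     # restart instead of carrying a negative score
                best = max(best, cell)  # the local alignment may end at any cell
            dp[i][j] = cell
    return best if local else dp[-1][-1]


seq1, seq2 = "WHEREISWALTERNOW", "HEWASHEREBUTNOWISHERE"
print(alignment_score(seq1, seq2, local=False), alignment_score(seq1, seq2, local=True))
```

A local score can never be negative and is never below the global score, because the full-length alignment is one of the candidates it considers. BLAST itself only approximates the local (Smith-Waterman) answer heuristically.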
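The "Translated BLAST" slide relies on translating nucleotide sequences before comparing them as protein. A minimal sketch of six-frame translation with the standard genetic code follows; it is an illustration of the idea, not BLAST's internal code, and it maps any codon containing a non-ACGT base to X (an assumption).

```python
# Standard genetic code (NCBI translation table 1), codons enumerated with
# bases in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one reading frame; an incomplete trailing codon is dropped."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if any(base not in BASES for base in codon):
            protein.append("X")  # ambiguous base such as N
            continue
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        protein.append(AMINO_ACIDS[index])
    return "".join(protein)


def six_frames(dna: str) -> list:
    """Three forward frames followed by three frames of the reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna[f:]) for f in range(3)] + [translate(reverse[f:]) for f in range(3)]


print(six_frames("ATGAAATAG"))  # ['MK*', '*N', 'EI', 'LFH', 'YF', 'IS']
```

blastx applies this to the query, tblastn to every database sequence, and tblastx to both, which is why tblastx is the slowest of the three.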
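The first two steps on the "How BLAST works" and "Nucleotide words" slides (build a lookup table of query words, then scan the database for exact word hits) can be sketched with a dictionary keyed by word. This is a simplified illustration using the slide's query and word size 11; real BLAST uses packed, much faster lookup structures. The names `build_lookup` and `scan` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_lookup(query: str, w: int = 11) -> Dict[str, List[int]]:
    """Map every length-w word of the query to the query offsets where it starts."""
    table: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return table


def scan(subject: str, table: Dict[str, List[int]], w: int = 11) -> List[Tuple[int, int]]:
    """Return (query_offset, subject_offset) for every exact word hit."""
    hits = []
    for j in range(len(subject) - w + 1):
        for i in table.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


table = build_lookup("GTACTGGACATGGACCCTACAGGAA")
print(list(table)[:3])  # ['GTACTGGACAT', 'TACTGGACATG', 'ACTGGACATGG']
print(scan("ATCGCCATGCTTAATTGGGCTT", build_lookup("CATGCTTAATT")))  # [(0, 5)]
```

The second call mirrors the "one exact match" requirement for nucleotide hits: a single shared 11-mer is enough to seed an extension.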
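For proteins, the "Protein words" and "BLASTP summary" slides use neighborhood words: every 3-letter word whose BLOSUM62 score against a query word reaches the threshold T (-f 11). The sketch enumerates them by brute force over all 8,000 tripeptides, using BLOSUM62 values transcribed from the matrix in these slides. Brute-force enumeration is an illustrative simplification, not how BLAST builds its table.

```python
from itertools import product
from typing import Dict

ORDER = "ARNDCQEGHILKMFPSTWYV"
# Lower triangle of BLOSUM62, rows and columns in ORDER (transcribed from the slides).
BLOSUM62_LOWER = """
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
"""


def load_matrix(order: str, lower: str) -> Dict[tuple, int]:
    rows = [list(map(int, line.split())) for line in lower.strip().splitlines()]
    matrix = {}
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            matrix[(order[i], order[j])] = value
            matrix[(order[j], order[i])] = value  # BLOSUM62 is symmetric
    return matrix


BLOSUM62 = load_matrix(ORDER, BLOSUM62_LOWER)


def word_score(a: str, b: str) -> int:
    return sum(BLOSUM62[(x, y)] for x, y in zip(a, b))


def neighborhood(word: str, threshold: int = 11) -> Dict[str, int]:
    """All words of the same length scoring >= threshold against `word`."""
    hood = {}
    for letters in product(ORDER, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            hood[candidate] = score
    return hood


hfl = neighborhood("HFL")
print(hfl["HFL"], hfl["HFV"], hfl["HWL"])  # 18 15 13
```

Raising T shrinks the neighborhood (fewer seeds, faster, less sensitive); for example HWV scores 10 and falls just below the default threshold of 11.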
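The "Minimum requirements for a hit" slide says protein BLAST needs two neighboring matches within 40 residues (-A 40). The sketch below is a simplified version of that two-hit rule: a second, non-overlapping word hit on the same diagonal within the window triggers an extension. Measuring the distance between word start positions is an assumption; BLAST's exact bookkeeping differs in detail.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def two_hit_triggers(hits: Iterable[Tuple[int, int]], word_size: int = 3,
                     window: int = 40) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) of second hits that would trigger an extension.

    A trigger needs an earlier, non-overlapping hit on the same diagonal
    (subject_pos - query_pos) no more than `window` residues away.
    """
    diagonals = defaultdict(list)
    for q, s in hits:
        diagonals[s - q].append(q)
    triggers = []
    for diagonal, starts in diagonals.items():
        starts.sort()
        for first, second in zip(starts, starts[1:]):
            if word_size <= second - first <= window:
                triggers.append((second, second + diagonal))
    return sorted(triggers)


print(two_hit_triggers([(0, 10), (20, 30), (5, 50), (100, 110)]))  # [(20, 30)]
```

Requiring two hits lets BLAST use a lower threshold T (more sensitive seeds) while extending far fewer of them, since isolated random hits are discarded.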
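The "ungapped extensions of hits (initial HSPs)" step extends a seed in both directions until the score drops too far below the best seen. A minimal sketch of that X-drop idea follows, assuming the reward 1 / penalty -3 scores from the nucleotide scoring slide; the `xdrop` values used here are hypothetical, not BLAST defaults.

```python
from typing import Callable


def xdrop_extend(q: str, s: str, qi: int, si: int, w: int,
                 score: Callable[[str, str], int], xdrop: int) -> int:
    """Ungapped extension of a w-long seed at q[qi:], s[si:]; returns the HSP score.

    Each direction stops once the running score falls more than `xdrop`
    below the best score reached so far in that direction.
    """
    seed = sum(score(q[qi + k], s[si + k]) for k in range(w))

    def extend(i: int, j: int, step: int) -> int:
        best = run = 0
        while 0 <= i < len(q) and 0 <= j < len(s):
            run += score(q[i], s[j])
            if run > best:
                best = run
            elif best - run > xdrop:
                break
            i += step
            j += step
        return best

    right = extend(qi + w, si + w, 1)
    left = extend(qi - 1, si - 1, -1)
    return seed + left + right


def dna_score(a: str, b: str) -> int:
    return 1 if a == b else -3  # reward 1, penalty -3


print(xdrop_extend("AAAATAAAA", "AAAAGAAAA", 0, 0, 2, dna_score, xdrop=5))  # 5
print(xdrop_extend("AAAATAAAA", "AAAAGAAAA", 0, 0, 2, dna_score, xdrop=2))  # 4
```

With the larger drop-off the extension survives the single mismatch and picks up the matches beyond it; with the smaller one it stops at the mismatch.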
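The raw score of 10 on the "Scoring systems: nucleotides" slide can be checked column by column. The sketch assumes reward +1 and penalty -3 (-r 1 -q -3) and, to reproduce the slide's arithmetic of 19 - 9, charges the single gap column -3 as well; that gap cost is an assumption, since BLAST normally scores gaps with separate open and extension penalties.

```python
def score_alignment(top: str, bottom: str, reward: int = 1,
                    penalty: int = -3, gap: int = -3) -> int:
    """Sum column scores of a pairwise alignment written as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap  # assumption: flat cost per gap column, as in the slide's example
        elif a == b:
            total += reward
        else:
            total += penalty
    return total


print(score_alignment("CAGGTAGCAAGCTTGCATGTCA", "CACGTAGCAAGCTTG-GTGTCA"))  # 10
```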
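The "Local alignment statistics" formulas $E = K m n e^{-\lambda S}$ and $E = m n 2^{-S'}$ with $S' = (\lambda S - \ln K)/\ln 2$ are two forms of the same quantity. The sketch computes both so the identity can be checked numerically. λ = 0.267 and K = 0.041 are values commonly quoted for BLOSUM62 with gap costs 11/1 and are used here only as illustrative assumptions; m and n are hypothetical lengths.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue_from_raw(raw: float, lam: float, k: float, m: int, n: int) -> float:
    """E = K m n exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """E = m n 2^(-S')."""
    return m * n * 2.0 ** (-bits)


LAMBDA, K = 0.267, 0.041  # illustrative BLOSUM62 parameters (assumption)
m, n = 250, 10_000_000    # hypothetical query and database lengths
S = 92
bits = bit_score(S, LAMBDA, K)
print(round(bits, 1), evalue_from_raw(S, LAMBDA, K, m, n), evalue_from_bits(bits, m, n))
```

Because the bit score already absorbs λ and K, E-values from different scoring systems can be compared through $S'$ and the search-space size $mn$ alone.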
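The "Templates for discontiguous words" table lists PatternHunter-style spaced seeds: a "1" position must match and a "0" position is ignored. The coding templates place a "0" at every third position, which presumably tolerates third-codon (often synonymous) differences between species. The sketch extracts spaced words with the W = 11, t = 16 coding template; the two example sequences are made up for illustration.

```python
from typing import Dict


def spaced_word(window: str, template: str) -> str:
    """Keep only the positions marked '1' in the template."""
    return "".join(base for base, keep in zip(window, template) if keep == "1")


def spaced_words(seq: str, template: str) -> Dict[int, str]:
    """Spaced word starting at each offset where the whole template fits."""
    t = len(template)
    return {i: spaced_word(seq[i:i + t], template) for i in range(len(seq) - t + 1)}


CODING_W11_T16 = "1101101101101101"  # W = 11 ones, t = 16 (from the template table)
a = "ATGGCCAAAGGGTTTC"
b = "ATAGCTAAGGGCTTAC"  # differs from a only at the template's '0' positions
print(spaced_word(a, CODING_W11_T16) == spaced_word(b, CODING_W11_T16))  # True
print(a[:11] == b[:11])  # False: a contiguous 11-mer seed misses this pair
```

This is why discontiguous MegaBLAST finds the cross-species hits that contiguous-word MegaBLAST reports as "No significant similarity found."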