Studying differences in each person's single nucleotide polymorphisms (SNPs): lecture slides

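Application of Support Vector Machine to detect an association between a disease or trait and multiple SNP variations
Authors: Gene Kim, Myung Ho Kim
Advisor: Dr. Hsu
Graduate: Ching-Wen Hong

Outline
1. Motivation
2. Objective
3. What's SNP (single nucleotide polymorphism)
4. How to find SNP variations
5. A review of Support Vector Machine
6. A representation of multiple SNP variations as a vector
7. The remarks
8. Inseparable Case
9. Test results with clinical data
10. Personal opinion

Motivation
Studying the differences in each person's single nucleotide polymorphisms (SNPs) can help identify disease-causing genes and even predict whether a drug will be effective for a given individual. This in turn makes it possible to design tailor-made drugs, which has a major impact on new drug development. SNP research is a main trend in the development of the biotechnology industry in the post-genomic era.

Objective
We present a method for detecting whether there is an association between multiple SNP variations and a trait or disease. The method exploits the Support Vector Machine (SVM), which has attracted much attention recently.

What's SNP
What is an SNP (single nucleotide polymorphism)? Although the chromosomes of individuals of the same species differ very little, on average one variation occurs per 1,000 base pairs. These variations are called SNPs, and they account for individual differences in drug sensitivity, blood type, height, and so on. SNPs are also associated with cancer, cardiovascular disease, autoimmune disease, and other conditions. In Taiwan, 賽亞基因 and National Taiwan University Hospital are collaborating on hepatitis C SNP research, trying to identify patients' SNPs in order to predict whether a drug will be effective for a patient.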

Genetic markers are locations M1, M2, … in the DNA. The different DNA variants that different people carry at a marker are called alleles, denoted by 1, 2, 3, … The number of alleles per marker is small: typically fewer than ten (for so-called microsatellite markers) or exactly two (for SNPs).

How to find SNP variations
The problem of determining whether a set of SNP variations causes a specific disease or trait can be formulated as follows. For a given disease or trait:
1. For each set of SNP variations, find its representation as a vector in a Euclidean space (haplotype data, clinical data, …; this is discussed under "A representation of multiple SNP variations as a vector").
2. Find a systematic way of distinguishing the SNP genotypes of normal people from those of people with the disease or trait.
We will use the Support Vector Machine (SVM) to separate SNP vectors into two groups (normal, sick).

A review of Support Vector Machine
What is an SVM? A family of learning algorithms for classifying objects into two classes.
Input: a training set $\{(x_1, y_1), \dots, (x_l, y_l)\}$ of objects $x_i \in \mathbb{R}^n$ (an n-dimensional vector space) and their known classes $y_i \in \{-1, +1\}$.
Output: a classifier $f: \mathbb{R}^n \to \{-1, +1\}$ that predicts the class $f(x)$ for any (new) object $x \in \mathbb{R}^n$.

(1) Linear SVM for separable training sets
Given a training set $S = \{(x_1, y_1), \dots, (x_l, y_l)\}$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$, the optimal hyperplane is defined by the pair $(w, b)$ that solves the optimization problem
$$\min_{w, b} \ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (x_i \cdot w + b) - 1 \ge 0, \quad i = 1, \dots, l.$$
This is a classical quadratic (convex) program, not a linear program.

(2) Linear SVM for non-separable training sets
Solve the optimization problem
$$\min_{w, b, \varepsilon} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \varepsilon_i \quad \text{s.t.} \quad y_i (x_i \cdot w + b) - 1 + \varepsilon_i \ge 0, \quad \varepsilon_i \ge 0, \quad i = 1, \dots, l,$$
where $C$ is an extremely large constant. In the dual formulation of this problem, the Lagrange multipliers satisfy $0 \le \alpha_i \le C$.
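The soft-margin problem above is a quadratic program, and dedicated solvers treat it as one. Purely as an illustration, the sketch below minimizes the equivalent unconstrained objective, ½‖w‖² + C Σ max(0, 1 − y_i(x_i·w + b)), by plain subgradient descent. This is a swapped-in technique, not the solver used in the slides, and the function names, learning rate, epoch count, and toy data are all assumptions chosen for the example.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Approximately minimize 0.5*||w||^2 + C * sum(hinge losses) by subgradient descent."""
    n = len(X[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:        # positive slack: the point violates the margin
                for j in range(n):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classifier f(x) in {-1, +1}: which side of the hyperplane x lies on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable training set (hypothetical data).
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

A larger C penalizes margin violations more heavily, which pushes the result toward the separable-case solution when the data allow it.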

A representation of multiple SNP variations as a vector
Scheme: given a disease or trait and a collection of SNP data that depend on genotype in a consistent way (haplotype data, clinical data), proceed in 7 steps.
1. Assume that there is no environmental factor.
2. Assume that the SNP locations are known for the disease or trait.
3. Assume that reference SNP data exist (records of people in good health).
4. By giving scores based on the difference from the reference data, assign a vector to each SNP data record. The dimension of the vector is the number of SNPs related to the disease or trait.
5. Choose a training set for the disease or trait, in other words, SNP genotype data from the normal and sick populations.
6. Using step 4, compute the SNP vectors of the training data set {(x_i, y_i)}, where x_i is an SNP vector and y_i = 1 (sick) or -1 (normal).
7. Use the SVM to obtain a hyperplane dividing the data into two groups (sick, normal).

The remarks
1. The reference data can be built by collecting SNP genotypes from the healthy normal population.
2. The hyperplane obtained can be considered as a criterion: given a new data set, it can be used to test whether the person the data belong to is susceptible to the disease or trait.
3. The representation of an object as a vector may be critical for making use of the SVM. How to incorporate domain knowledge into the vector representation is one of the major issues.
4. The idea of difference scoring could be applied to other data sets (visual data such as X-ray or MRI images, …), in particular to haplotype data, and to finding a linkage between SNPs and the disease or trait.
5. Once a group of SNP patterns is identified, one can compute the contribution score of each of those SNPs to the disease or trait.

Inseparable Case
For the inseparable case, the iterated use of SVM enables us to divide a collection of labelled vectors into several clusters.
1. Set a threshold value, say 80%.
2. Use SVM to separate the collection of labelled vectors into two groups A and B.
3. Check whether each group contains more than 80% vectors with the same label (either 1 or -1). Suppose A does not. Then apply SVM to A again to split it into two subgroups.
4. Repeat this procedure until each subgroup has a majority of more than 80%.
5. For each subgroup, determine its range.
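Steps 3 and 4 of the scheme and the iterated procedure for the inseparable case can be sketched together as follows. This is only a sketch under stated assumptions. The slides do not specify the scoring rule, so scoring 1 where a genotype differs from the reference and 0 otherwise is one simple choice among many. The function names (`difference_vector`, `iterated_split`), the subgradient-descent trainer standing in for a real SVM solver, the depth limit, and the toy genotypes are all hypothetical.

```python
def difference_vector(genotype, reference):
    """Score each SNP site: 1.0 if it differs from the reference, else 0.0 (assumed rule)."""
    return [0.0 if g == r else 1.0 for g, r in zip(genotype, reference)]


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Stand-in trainer: subgradient descent on 0.5*||w||^2 + C * sum(hinge losses)."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(n):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def purity(labels):
    """Fraction of the group taken up by its majority label."""
    pos = sum(1 for v in labels if v == 1)
    return max(pos, len(labels) - pos) / len(labels)


def iterated_split(data, threshold=0.8, max_depth=5):
    """Split (vector, label) pairs with a linear SVM, recursing into any group
    whose majority label does not exceed the threshold."""
    labels = [lab for _, lab in data]
    if purity(labels) > threshold or max_depth == 0:
        return [data]
    w, b = train_linear_svm([x for x, _ in data], labels)
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x, _ in data]
    group_a = [pair for pair, s in zip(data, scores) if s >= 0]
    group_b = [pair for pair, s in zip(data, scores) if s < 0]
    if not group_a or not group_b:   # the hyperplane failed to divide the group
        return [data]
    return (iterated_split(group_a, threshold, max_depth - 1)
            + iterated_split(group_b, threshold, max_depth - 1))


# Hypothetical genotypes at four SNP sites; +1 = sick, -1 = normal.
reference = "AACG"
records = [("AACG", -1), ("AACG", -1), ("AACT", -1),
           ("GTCG", 1), ("GTCG", 1), ("GTCT", 1)]
data = [(difference_vector(g, reference), lab) for g, lab in records]
groups = iterated_split(data)
```

On this toy data a single split already yields two single-label groups, so the recursion stops immediately; with inseparable data the same function keeps splitting impure groups until the threshold or the depth limit is reached.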

Test results with clinical data
The clinical data is a data set of cardiac patient records. Height, age, sex, weight, ethnic background, medical history, birthplace, blood pressure (systolic and diastolic), lipid measurements, etc. are converted to numerical values. The label is +1 for a patient with heart attack, stroke, or heart failure, and -1 otherwise.
We used Thorsten Joachims' implementation of SVM.

Personal opinion
Applying SVM is effective, but it is difficult to solve nonlinear problems with it. How to incorporate domain knowledge into the vector representation is one of the major issues.

