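Application of Support Vector Machine to detect an association between a disease or trait and multiple SNP variations

Author: Gene Kim, MyungHo Kim
Advisor: Dr. Hsu
Graduate: Ching-Wen Hong

Outline
1. Motivation
2. Objective
3. What's SNP (single nucleotide polymorphism)
4. How to find SNP variations
5. A review of Support Vector Machine
6. A representation of multiple SNP variations as a vector
7. The remarks
8. Inseparable Case
9. Test results with clinical data
10. Personal opinion

Motivation
Studying the differences in each person's single nucleotide polymorphisms (SNPs) can help identify disease-causing genes and even predict whether a drug will be effective for a given individual. This in turn makes it possible to design tailor-made drugs and has a major impact on new drug development. SNP research is a main trend in the development of the biotechnology industry in the post-genomic era.

Objective
We present a method of detecting whether there is an association between multiple SNP variations and a trait or disease. The method exploits the Support Vector Machine (SVM), which has been attracting a lot of attention recently.

What's SNP
What is an SNP (single nucleotide polymorphism)? Although chromosomes differ very little among members of the same species, on average about one in every 1,000 base pairs carries a variation. These variations are called SNPs, and they account for differences between individuals in drug sensitivity, blood type, height, and so on. SNPs are also associated with diseases such as cancer, cardiovascular disease, and autoimmune disorders. In Taiwan, Vita Genomics (賽亞基因) is currently collaborating with National Taiwan University Hospital on hepatitis C SNP research, trying to identify patients' SNPs in order to predict whether a drug will be effective for a patient.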
A genetic marker is a location M1, M2, … in the DNA. The different variants of DNA that different people have at a marker are called alleles, denoted by 1, 2, 3, …. The number of alleles per marker is small: typically fewer than ten (for so-called microsatellite markers) or exactly two (for SNPs).

How to find SNP variations
The problem of determining whether a set of SNP variations causes a specific disease or trait can be formulated as follows. For a given disease or trait:
1. For each set of SNP variations, find its representation as a vector in a Euclidean space (haplotype data, clinical data, …; this is discussed under "A representation of multiple SNP variations as a vector").
2. Get a systematic way of distinguishing the SNP genotypes of normal people from those of people with the disease or trait.
We will use the Support Vector Machine (SVM) to separate SNP vectors into two groups (normal, sick).

A review of Support Vector Machine
What is an SVM? A family of learning algorithms for classifying objects into two classes.
Input: a training set $\{(x_1, y_1), \dots, (x_l, y_l)\}$ of objects $x_i \in \mathbb{R}^n$ (an n-dimensional vector space) and their known classes $y_i \in \{-1, +1\}$.
Output: a classifier $f: \mathbb{R}^n \to \{-1, +1\}$, which predicts the class $f(x)$ for any (new) object $x \in \mathbb{R}^n$.

(1) Linear SVM for separable training sets
Given a training set $S = \{(x_1, y_1), \dots, (x_l, y_l)\}$, $x_i \in \mathbb{R}^n$, $y_i \in \{-1, +1\}$, the optimal hyperplane is defined by the pair $(w, b)$. Solve the optimization problem

$$\min_{w,\,b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (x_i \cdot w + b) - 1 \ge 0, \quad i = 1, \dots, l$$

This is a classical quadratic (convex) program.

(2) Linear SVM for non-separable training sets
Solve the optimization problem

$$\min_{w,\,b,\,\varepsilon} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \varepsilon_i \quad \text{s.t.} \quad y_i (x_i \cdot w + b) - 1 + \varepsilon_i \ge 0, \quad \varepsilon_i \ge 0, \quad i = 1, \dots, l$$

where $C$ is an extremely large value. In the corresponding dual problem, the Lagrange multipliers satisfy $0 \le \alpha_i \le C$, $i = 1, \dots, l$.
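The non-separable formulation in (2) can be tried out on a toy problem. The sketch below is not Thorsten Joachims' implementation and not a quadratic-programming solver. It is a minimal illustration, assuming the equivalent hinge-loss form $\frac{1}{2}\|w\|^2 + C \sum_i \max(0,\, 1 - y_i(x_i \cdot w + b))$ and plain subgradient descent with a fixed step size. The function names, the toy data, and the values of `C`, `lr` and `epochs` are all assumptions chosen for illustration.

```python
def decision_value(w, b, x):
    """Value of x . w + b for one vector x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(xs, ys, C=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on 1/2 * ||w||^2 + C * sum(max(0, 1 - y * (x . w + b)))."""
    n = len(xs[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 1/2 * ||w||^2 term
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # A point with y * (x . w + b) < 1 has positive slack and contributes
            # the subgradient of its hinge term.
            if y * decision_value(w, b, x) < 1:
                for j in range(n):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classifier f(x): +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Toy, linearly separable training set (made-up numbers, not SNP data).
xs = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
ys = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(xs, ys)
predictions = [predict(w, b, x) for x in xs]
print(predictions == ys)
```

With a fixed step size the iterates only settle into a neighbourhood of the optimum, so the learned `w` and `b` are approximate. A dedicated SVM solver would be used for real data.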
A representation of multiple SNP variations as a vector
Scheme: given a disease or trait and a collection of SNP data that depends on genotype in a consistent way (haplotype, clinical data), proceed in 7 steps.
1. Assume that there is no environmental factor.
2. SNP locations are assumed to be known for the disease or trait.
3. Assume there is a reference SNP data set (good health records).
4. By giving scores based on the difference from the reference data, assign a vector to each SNP data record. The dimension of the vector is the number of SNPs related to the disease or trait.
5. A training set is chosen for the disease or trait, in other words, SNP genotype data of the normal and sick populations.
6. By using Step 4, compute the SNP vectors of the training data set $\{(x_i, y_i)\}$, where $x_i$ is the vector of one SNP data record and $y_i = 1$ (sick) or $-1$ (normal).
7. Use the SVM to get a hyperplane dividing the vectors into two groups (sick, normal).

The remarks
1. The reference data can be built by collecting SNP genotypes from the healthy normal population.
2. The hyperplane obtained can be considered as a criterion: given a new data set, it can be used for testing whether the person the data belongs to is susceptible to the disease or trait.
3. The representation of an object as a vector might be critical for making use of the SVM. How to encode domain knowledge in the vector representation is one of the major issues.
4. The idea of difference scoring could be applied to other data sets (visual data such as X-ray or MRI images, …), in particular to haplotype data, and to finding a linkage between SNPs and the disease or trait.
5. Once a group of SNP patterns is identified, the contribution score of each of those SNPs to the disease or trait can be computed.

Inseparable Case
For the inseparable case, the iterated use of SVM enables us to divide a collection of labelled vectors into several cluster groups.
1. Set a threshold value, say 80%.
2. Use SVM to separate the collection of labelled vectors into two groups A and B.
3. Check whether each group contains more than 80% of either +1 or -1 labelled vectors. Suppose A does not. Then apply SVM to A again to split it into two subgroups.
4. Repeat this procedure until each subgroup has a majority of more than 80%.
5. For each subgroup, figure out a range.
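The iterated procedure for the inseparable case can be sketched as a short recursion. This is a hedged illustration and not the authors' code. It assumes a toy linear SVM trained by subgradient descent on the hinge loss, and reads "range" in step 5 as the per-coordinate minimum and maximum of a subgroup. It also adds two stopping guards that the slides do not mention: a maximum depth, and a check for a hyperplane that leaves one side empty. All function names and the synthetic data are hypothetical.

```python
def decision_value(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(xs, ys, C=1.0, lr=0.01, epochs=2000):
    """Toy soft-margin linear SVM via subgradient descent (stand-in for a real solver)."""
    n = len(xs[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0
        for x, y in zip(xs, ys):
            if y * decision_value(w, b, x) < 1:
                for j in range(n):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def majority_share(labels):
    """Fraction of the group that carries its most common label (+1 or -1)."""
    positives = sum(1 for y in labels if y == 1)
    return max(positives, len(labels) - positives) / len(labels)


def iterated_split(points, threshold=0.8, max_depth=5):
    """Split (vector, label) pairs until each subgroup has a majority above threshold."""
    labels = [y for _, y in points]
    if majority_share(labels) > threshold or max_depth == 0:
        return [points]
    w, b = train_linear_svm([x for x, _ in points], labels)
    group_a = [(x, y) for x, y in points if decision_value(w, b, x) >= 0]
    group_b = [(x, y) for x, y in points if decision_value(w, b, x) < 0]
    if not group_a or not group_b:  # guard: the hyperplane did not divide the group
        return [points]
    return (iterated_split(group_a, threshold, max_depth - 1)
            + iterated_split(group_b, threshold, max_depth - 1))


def coordinate_ranges(group):
    """Assumed reading of step 5: (min, max) of every coordinate in the subgroup."""
    vectors = [x for x, _ in group]
    return [(min(col), max(col)) for col in zip(*vectors)]


# Synthetic labelled vectors: two clusters plus one -1 point inside the +1 cluster.
points = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), -1), ((0.5, 0.5), -1),
          ((5, 5), 1), ((5, 6), 1), ((6, 5), 1), ((6, 6), 1), ((5, 5.5), 1),
          ((5.5, 5.5), -1)]

groups = iterated_split(points)
for group in groups:
    share = majority_share([y for _, y in group])
    print(len(group), round(share, 2), coordinate_ranges(group))
```

A subgroup returned because of one of the guards may still be below the threshold, so the printed majority share should be checked before a subgroup is treated as a cluster.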
Test results with clinical data
The clinical data is a data set of cardiac patient records. Height, age, sex, weight, ethnic background, medical history, birthplace, blood pressure (systolic and diastolic), lipid measurements, etc. are converted to numerical values. The label is +1 for a patient with heart attack, stroke or heart failure, and -1 otherwise. We used Thorsten Joachims' implementation of SVM.

Personal opinion
Application of SVM is effective, but it is difficult to solve nonlinear problems. How to encode domain knowledge in the vector representation is one of the major issues.
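Since the vector representation is named as a major issue, a small example of the difference scoring from step 4 of the scheme may help. The slides do not specify the scoring rule. The sketch below assumes one simple possibility: each SNP contributes the number of alleles (0, 1 or 2) that differ from the reference genotype, so the vector has one coordinate per SNP. The SNP identifiers, genotypes and labels are invented for illustration and are not real data.

```python
def difference_score(genotype, reference):
    """Number of alleles in an unordered genotype that differ from the reference."""
    remaining = list(reference)
    mismatches = 0
    for allele in genotype:
        if allele in remaining:
            remaining.remove(allele)  # this allele matches one reference allele
        else:
            mismatches += 1
    return mismatches


def snp_vector(record, reference_record, snp_ids):
    """One coordinate per SNP: the difference score against the reference data."""
    return [difference_score(record[s], reference_record[s]) for s in snp_ids]


# Hypothetical SNP identifiers and a hypothetical reference (healthy) genotype.
snp_ids = ["snp1", "snp2", "snp3"]
reference_record = {"snp1": "AA", "snp2": "CC", "snp3": "GG"}

# Hypothetical training records with labels: +1 (sick) or -1 (normal).
records = [
    ({"snp1": "AA", "snp2": "CT", "snp3": "GG"}, -1),
    ({"snp1": "AG", "snp2": "TT", "snp3": "GG"}, 1),
]

training_set = [(snp_vector(rec, reference_record, snp_ids), label)
                for rec, label in records]
print(training_set)  # [([0, 1, 0], -1), ([1, 2, 0], 1)]
```

The resulting pairs have the form $(x_i, y_i)$ used in step 6 and could be passed to any SVM implementation. Other scoring rules, for example weights per SNP, would change only `difference_score`.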