Estimation of Distribution Algorithms
Computational Intelligence Laboratory, University of Jinan. Yuehui Chen (yhchen@)

1 Motivation
The crossover and mutation operators of a genetic algorithm can destroy individuals that are already well optimized. To avoid this, a new class of evolutionary algorithms, the estimation of distribution algorithms (EDAs), was developed. An EDA has no crossover and no mutation: it builds a probability model of the good individuals and samples that model to produce the next generation.
2 GA to EDA

3 Population-Based Incremental Learning (PBIL, Baluja, 1994)
- Population-based search, such as GA:
  - Create a probability vector by counting the number of 1s and 0s in each gene position
  - Generate the new population using the probability vector
  - No information is carried from generation to generation!
- Supervised competitive learning (CL), e.g. LVQ:
  - Winner-take-all reinforcement learning in ANNs
  - The winner is a kind of prototype of the sample presented
- PBIL = GA + CL: capture the trend from the best performer

4 Basic PBIL

    P := initialize probability vector (each position = 0.5)
    while (generations++ < limit)
        for each vector i do
            for each position j do
                generate Vi(j) according to P(j)
            end-do
            evaluate f(Vi)
        end-do
        Vmax = max(f(Vi))
        update P according to Vmax
        if random(0,1] < Pmutate
            mutate P
        end-if
    end-while

5 Details
- The population is replaced by a probability vector P = {p1, p2, ..., pl}, where pi is the probability of a 1 in the ith bit
- Generate n individuals from P
- Update P using the best individual: pi(t+1) = α·xi + pi(t)·(1 - α), i = 1, 2, ..., l
- Mutate P: pi(t+1) = α_m·U[0,1) + pi(t+1)·(1 - α_m)
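A minimal runnable reading of the pseudocode above, in Python. The onemax fitness and the population-size, rate, and generation constants are illustrative choices, not values from the slides:

```python
import random

def pbil(fitness, n_bits, pop_size=20, generations=100,
         lr=0.1, mut_prob=0.02, mut_shift=0.05):
    """Basic PBIL: evolve a probability vector instead of a population."""
    p = [0.5] * n_bits                      # each position starts at 0.5
    for _ in range(generations):
        # Sample pop_size vectors Vi from the probability vector P
        pop = [[1 if random.random() < p[j] else 0 for j in range(n_bits)]
               for _ in range(pop_size)]
        best = max(pop, key=fitness)        # Vmax
        # Move P toward the best individual: p_j <- lr*x_j + p_j*(1 - lr)
        p = [lr * best[j] + p[j] * (1 - lr) for j in range(n_bits)]
        # With small probability, mutate a position toward a random value
        p = [mut_shift * random.random() + pj * (1 - mut_shift)
             if random.random() < mut_prob else pj
             for pj in p]
    return p

# Example: maximize the number of 1s (onemax); P should drift toward 1s
print(pbil(sum, n_bits=8))
```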
6 PBIL Example
- t = 0, P = {0.5, 0.5, 0.5, 0.5}
- Generate 5 individuals: {1010, 1100, 0100, 0111, 0001}
- Fitness (number of 1s): {2, 2, 1, 3, 1}; best individual: 0111; α = 0.1
- Update P:
  p1 = 0.1·0 + 0.5·(1 - 0.1) = 0.45
  p2 = p3 = p4 = 0.1·1 + 0.5·(1 - 0.1) = 0.55

7 Some Applications
- Function optimization
- Job-shop scheduling
- TSP
- Bin-packing
- Knapsack problem
- Neural network weight training

8 EDA: The General Framework
Estimation of distribution algorithms do just that! Typically they operate as follows:
- Step 0: Randomly generate a set of individuals (t = 0)
- Step 1: Evaluate the individuals
- While (not done):
  - Step 2: Select individuals to be parents; develop a probability distribution/density function pt based on the parents
  - Step 3: Create offspring using pt
  - Step 4: Evaluate the offspring
  - Step 5: The offspring replace the parents (t = t + 1)
  - Step 6: Go to While

9 Flowchart
[figure: flowchart of the EDA loop above]

10 What Models to Use?
- Start with: a probability vector for binary strings; a Gaussian distribution
- Later: dependency tree models (COMIT); Bayesian networks

11 Probability Vector PMBGAs

12 EDA: Probability Vector
- This EDA is known as the Univariate Marginal Distribution Algorithm (UMDA)
- Let's try to solve the following problem: f(x) = x², where -2.0 ≤ x ≤ 2.0
- Let l = 7, therefore our mapping function will be d(2, -2, 7, c) = 4·decode(c)/127 - 2
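The Step 0 to Step 6 loop, instantiated as a sketch for the f(x) = x² problem of slide 12 (7-bit encoding, truncation selection of the best 3, univariate model). The population size of 6 matches the worked example that follows; the generation count and random seed are illustrative:

```python
import random

def decode(bits):                        # 7-bit string -> integer in [0, 127]
    return int("".join(map(str, bits)), 2)

def phenotype(bits):                     # d(2, -2, 7, c) = 4*decode(c)/127 - 2
    return 4 * decode(bits) / 127 - 2

def f(bits):                             # fitness f(x) = x^2, maximized here
    return phenotype(bits) ** 2

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(7)] for _ in range(6)]   # Step 0
for t in range(20):
    pop.sort(key=f, reverse=True)        # Steps 1-2: evaluate, truncation-select
    parents = pop[:3]
    # Model p_t: marginal probability of a 1 at each position (UMDA)
    p = [sum(ind[j] for ind in parents) / len(parents) for j in range(7)]
    # Steps 3-5: sample 6 offspring from the probability vector, replace parents
    pop = [[1 if random.random() < p[j] else 0 for j in range(7)]
           for _ in range(6)]
best = max(pop, key=f)
print(phenotype(best), f(best))          # should approach x = +/-2, f = 4
```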
13 EDA: Probability Vector
Randomly generate an initial population:

          Genotype   Phenotype   Fitness
Person 1  1001010     0.331      ?
Person 2  0100101    -0.835      ?
Person 3  1101010     1.339      ?
Person 4  0110110    -0.300      ?
Person 5  1001111     0.488      ?
Person 6  0001101    -1.591      ?

14 EDA: Probability Vector
Evaluate the population at t = 0:

          Genotype   Phenotype   Fitness
Person 1  1001010     0.331      0.109
Person 2  0100101    -0.835      0.697
Person 3  1101010     1.339      1.790
Person 4  0110110    -0.300      0.090
Person 5  1001111     0.488      0.238
Person 6  0001101    -1.591      2.531
15 EDA: Probability Vector
Select the best 3 individuals (truncation selection) to be parents: Persons 2, 3 and 6.

          Genotype   Phenotype   Fitness
Person 1  1001010     0.331      0.109
Person 2  0100101    -0.835      0.697   <- selected
Person 3  1101010     1.339      1.790   <- selected
Person 4  0110110    -0.300      0.090
Person 5  1001111     0.488      0.238
Person 6  0001101    -1.591      2.531   <- selected
16 EDA: Probability Vector
Construct a joint probability distribution function, p0, given the three parents:

          Genotype   Phenotype   Fitness
Person 2  0100101    -0.835      0.697
Person 3  1101010     1.339      1.790
Person 6  0001101    -1.591      2.531

Since our genotype has only two values, we only need to be concerned with the probability that the value of a gene is 1; the probability that a gene is 0 is 1 minus the probability that it is 1.
17 EDA: Probability Vector
Construct a joint probability distribution function, p0, given the three parents (Persons 2, 3 and 6 above). Thus, p0 is:

p(g0=1 | Parent0) = 0.333,   p(g1=1 | Parent0) = 0.667
p(g2=1 | Parent0) = 0.000,   p(g3=1 | Parent0) = 0.667
p(g4=1 | Parent0) = 0.667,   p(g5=1 | Parent0) = 0.333
p(g6=1 | Parent0) = 0.667

18 Summary
[figure: the selected binary strings are compressed into a probability vector, with one value between 0.0 and 1.0 per position, which is then sampled to generate new strings]
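Slide 17's vector can be checked mechanically: p0 is just the column-wise mean of the selected genotypes. A three-line sketch:

```python
parents = ["0100101", "1101010", "0001101"]   # Persons 2, 3, 6
p0 = [sum(int(b) for b in col) / len(parents) for col in zip(*parents)]
print([round(p, 3) for p in p0])
# -> [0.333, 0.667, 0.0, 0.667, 0.667, 0.333, 0.667]
```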
19 EDA: Probability Vector
Evaluate the offspring sampled from p0:

          Genotype   Phenotype   Fitness
Child 1   0001010    -1.685      2.839
Child 2   1101101     1.433      2.054
Child 3   0101101    -0.583      0.340
Child 4   0001011    -1.654      2.736
Child 5   1100110     1.213      1.470
Child 6   0100101    -0.835      0.697
20 EDA: Probability Vector
Replace the parents with the offspring.

Parents:
          Genotype   Phenotype   Fitness
Person 2  0100101    -0.835      0.697
Person 3  1101010     1.339      1.790
Person 6  0001101    -1.591      2.531

Offspring:
          Genotype   Phenotype   Fitness
Child 1   0001010    -1.685      2.839
Child 2   1101101     1.433      2.054
Child 3   0101101    -0.583      0.340
Child 4   0001011    -1.654      2.736
Child 5   1100110     1.213      1.470
Child 6   0100101    -0.835      0.697
21 EDA: Probability Vector
Select the best 3 to form Parent1:

          Genotype   Phenotype   Fitness
Person 1  0001010    -1.685      2.839   <- selected
Person 2  1101101     1.433      2.054   <- selected
Person 3  0101101    -0.583      0.340
Person 4  0001011    -1.654      2.736   <- selected
Person 5  1100110     1.213      1.470
Person 6  0100101    -0.835      0.697
22 EDA: Gaussian Distribution
An example run (by hand). Randomly generate an initial population:

          Phenotype   Fitness
Person 1   0.331      ?
Person 2  -0.835      ?
Person 3   1.339      ?
Person 4  -0.300      ?
Person 5   0.488      ?
Person 6  -1.591      ?
23 EDA: Gaussian Distribution
Evaluate the initial population:

          Phenotype   Fitness
Person 1   0.331      0.110
Person 2  -0.835      0.697
Person 3   1.339      1.793
Person 4  -0.300      0.900
Person 5   0.488      0.238
Person 6  -1.591      2.531
24 EDA: Gaussian Distribution
Select the best 3 of 6:

          Phenotype   Fitness
Person 1   0.331      0.110
Person 2  -0.835      0.697
Person 3   1.339      1.793   <- selected
Person 4  -0.300      0.900   <- selected
Person 5   0.488      0.238
Person 6  -1.591      2.531   <- selected
25 EDA: Gaussian Distribution
Calculate the joint density function (Gaussian) from the parents:

          Phenotype   Fitness
Person 3   1.339      1.793
Person 4  -0.300      0.900
Person 6  -1.591      2.531

x_avg = -0.184,  σ = 1.199
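Slide 25's parameters can be reproduced with the mean and the population standard deviation of the three parents (the latter matches σ = 1.199); sampling the fitted Gaussian is the "create new population" step of the next slide:

```python
import random, statistics

parents = [1.339, -0.300, -1.591]            # best 3 phenotypes (slide 25)
mu = statistics.mean(parents)                # -0.184
sigma = statistics.pstdev(parents)           # population std dev: 1.199
print(round(mu, 3), round(sigma, 3))
children = [random.gauss(mu, sigma) for _ in range(6)]   # new population
```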
26 EDA: Gaussian Distribution
Create a new population of 6 by sampling the fitted Gaussian:

          Phenotype   Fitness
Child 1    1.015      1.030
Child 2   -1.383      1.913
Child 3   -0.784      0.615
Child 4    0.416      0.173
Child 5    0.116      0.013
Child 6   -1.284      1.649
27 EDA: Gaussian Distribution
Replace the parents with the children; select the best 3 of 6:

          Phenotype   Fitness
Person 1   1.015      1.030   <- selected
Person 2  -1.383      1.913   <- selected
Person 3  -0.784      0.615
Person 4   0.416      0.173
Person 5   0.116      0.013
Person 6  -1.284      1.649   <- selected
28 EDA: Gaussian Distribution
Calculate the new joint density function from the selected individuals:

          Phenotype   Fitness
Person 1   1.015      1.030
Person 2  -1.383      1.913
Person 6  -1.284      1.649

x_avg = -0.551,  σ = 1.11
29 What's Next?
- Use tree models (COMIT)
- Cluster bits into groups (Extended Compact GA)
- Use Bayesian networks (BOA)

30 Beyond Single Bits: COMIT

31 How to Learn a Tree Model?

32 Prim's Algorithm
- Start with a graph with no edges
- Add an arbitrary node to the tree
- Iterate:
  - Hang a new node onto the current tree
  - Prefer adding edges with large mutual information (greedy approach)
- Complexity: O(n²)

33 Variants of PMBGAs with Tree Models

34 Beyond Pairwise Dependencies: ECGA

35 Learning the Model in ECGA

36 How to Compute Model Quality?
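Returning to slide 32's procedure, a minimal sketch of Prim's algorithm driven by mutual information. The 4-bit MI matrix is made up for illustration, and this naive scan is O(n³); the O(n²) bound on the slide needs the standard best-edge bookkeeping, omitted here for clarity:

```python
def max_mi_tree(mi):
    """Grow a spanning tree, always hanging on the node whose edge to the
    current tree carries the largest mutual information.
    mi[i][j] = mutual information between bits i and j."""
    n = len(mi)
    in_tree = {0}                       # start from an arbitrary node
    edges = []
    while len(in_tree) < n:
        i, j = max(((i, j) for i in in_tree for j in range(n)
                    if j not in in_tree), key=lambda e: mi[e[0]][e[1]])
        edges.append((i, j))            # hang j onto the tree at i
        in_tree.add(j)
    return edges

# Hypothetical pairwise MI estimates for 4 bits
mi = [[0, .9, .1, .2],
      [.9, 0, .3, .1],
      [.1, .3, 0, .8],
      [.2, .1, .8, 0]]
print(max_mi_tree(mi))    # -> [(0, 1), (1, 2), (2, 3)]
```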
37 Minimum Description Length (MDL)
Basic idea: the best theory for explaining a set of data is the one that minimizes the sum of
- the length, in bits, needed to describe the theory, and
- the length, in bits, needed to encode the data with the help of the theory.
(The minimum description length is also called the stochastic complexity of the given data.)

The MDL principle in ECGA: we should seek a plausible and small structure such that most of the training data conforms to it, encoding the non-conforming samples as exceptions. Minimize the sum of
- the bits needed to encode the group structure, which represents the hypothesis, and
- the bits needed to encode the exception instances.
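Slide 37's two-part sum can be written down directly for ECGA's marginal product models. A minimal sketch, assuming one common form of the criterion (model complexity plus compressed population complexity); the population and the candidate groupings below are made up:

```python
import math
from collections import Counter

def mdl_score(population, groups):
    """ECGA-style MDL score of a marginal-product model: bits to encode the
    group marginals plus bits to encode the population under the model.
    Lower is better. One common form of the criterion, used as a sketch."""
    N = len(population)
    # Model complexity: each group of s bits needs 2^s - 1 frequencies
    model_bits = sum(math.log2(N + 1) * (2 ** len(g) - 1) for g in groups)
    data_bits = 0.0
    for g in groups:
        counts = Counter(tuple(ind[i] for i in g) for ind in population)
        # Shannon entropy of the group's marginal distribution
        h = -sum((c / N) * math.log2(c / N) for c in counts.values())
        data_bits += N * h
    return model_bits + data_bits

pop = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1], [1, 0, 0, 1]]
print(mdl_score(pop, [(0, 1), (2, 3)]))          # paired grouping
print(mdl_score(pop, [(0,), (1,), (2,), (3,)]))  # univariate grouping
```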
38 Sampling the Model in ECGA
- Sample a group of bits at a time, based on the observed probabilities/proportions
- But one can also apply population-based crossover, similar to uniform crossover but with respect to the model

39 What's Next?
- We saw: probability vectors (no edges), tree models (some edges), marginal product models (groups of variables)
- Next: Bayesian networks, which can represent all of the above and more

40 Joint Probability Distribution (JPD)
- A solution is a set of random variables
- The joint probability distribution (JPD) is exponential in the number of variables, and therefore not feasible to calculate in most cases
- Needs simplification!

41 Factorisation of the JPD
- Univariate model: no interaction; the simplest model
- Bivariate model: pair-wise interaction
- Multivariate model: interaction of more than two variables

42 Typical Estimation and Sampling of the JPD in EDAs
- Learn the interactions between variables in the solution
- Learn the probabilities associated with interacting variables
- This specifies the JPD: p(x)
- Sample the JPD (i.e. the learned probabilities)

43 Probabilistic Graphical Models
- An efficient tool to represent the factorisation of the JPD
- A marriage between probability theory and graph theory
- Consist of two components: structure and parameters
- Two types of PGM: directed PGMs (Bayesian networks) and undirected PGMs (Markov random fields)

44 Directed PGMs (Bayesian Networks)
- Structure: a directed acyclic graph (DAG)
- Independence relationship: a variable is conditionally independent of the rest of the variables given its parents
- Parameters: conditional probabilities
[figure: example Bayesian network over X1, ..., X5]

45 Bayesian Networks
The factorisation of the JPD, encoded in terms of conditional probabilities, is
p(x1, ..., xn) = Π_i p(xi | Pa(xi)),
where Pa(xi) are the parents of xi in the DAG.

46 Estimating a Bayesian Network
- Estimate the structure
- Estimate the parameters
- This completely specifies the JPD; the JPD can then be sampled

47 BN-based EDAs
1. Initialise the parent solutions
2. Select a set from the parent solutions
3. Estimate a BN from the selected set: estimate the structure, then the parameters
4. Sample the BN to generate the new population
5. Replace the parents with the new set and go to 2 until the termination criterion is satisfied

48 How to Estimate and Sample a BN in EDAs
- Estimating the structure: score-plus-search techniques, or conditional-independence tests
- Estimating the parameters: trivial in EDAs, since the data set is complete; estimate the probabilities of parents before children
- Sampling: probabilistic logic sampling, i.e. sample parents before children (a sketch follows below)

49 BN-based EDAs
- A well-established approach in EDAs: BOA, EBNA, LFDA, MIMIC, COMIT, BMDA
- References: Larrañaga and Lozano 2002; Pelikan 2002

50 Bayesian Networks
- Computing the full joint probability is enormously complex, while naive Bayes is too simple
- In practice we need a natural, effective way to capture and reason about uncertain knowledge
- Independence and conditional independence between variables can greatly reduce the number of probabilities needed to define the full joint distribution

51 Definition of a Bayesian Network
- A directed acyclic graph (DAG)
- A set of random variables forms the network nodes; variables may be discrete or continuous
- A set of directed edges (arrows) connects pairs of nodes
- Each node Xi has a conditional probability table P(Xi | Parents(Xi)) quantifying its parents' influence on the node

52 Other Names for Bayesian Networks
- Belief network
- Probability network
- Causal network
- Knowledge map
- Graphical model, or probabilistic graphical model (PGM)
- Decision network / influence diagram
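Slide 48's probabilistic logic sampling (sample parents before children in a topological order) can be sketched in a few lines. The three-node chain network, its CPTs, and the function name below are illustrative assumptions, not taken from the slides:

```python
import random

def logic_sample(nodes, parents, cpt):
    """Probabilistic logic sampling: visit nodes in topological order, so
    every node's parents are assigned before the node itself.
    cpt[X] maps a tuple of parent values to P(X = 1)."""
    sample = {}
    for x in nodes:                       # nodes assumed topologically sorted
        pa = tuple(sample[p] for p in parents[x])
        sample[x] = 1 if random.random() < cpt[x][pa] else 0
    return sample

# Hypothetical 3-node chain X1 -> X2 -> X3
nodes = ["X1", "X2", "X3"]
parents = {"X1": (), "X2": ("X1",), "X3": ("X2",)}
cpt = {"X1": {(): 0.6},
       "X2": {(0,): 0.2, (1,): 0.9},
       "X3": {(0,): 0.5, (1,): 0.1}}
print(logic_sample(nodes, parents, cpt))
```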
53 Independence and Conditional Independence
- Weather is independent of the other 3 variables
- Given Cavity, Toothache and Catch are conditionally independent
[figure: network with nodes Weather, Cavity, Catch, Toothache]

54 Independency
[figure illustrating the independence relations]

55 Joint Probability Distribution
The joint probability distribution of (X, Y, Z) is p(x, y, z), defined for all values x, y, z.
56 Marginal distributions: p(x) = Σ_{y,z} p(x, y, z), and similarly for p(y) and p(z)

57 Second-order joint probability functions: p(x, y) = Σ_z p(x, y, z), and similarly for p(x, z) and p(y, z)

58 Conditional probability functions: p(x | y) = p(x, y) / p(y)

59 Conditional probability functions: p(x, y | z) = p(x, y, z) / p(z)

60 Conditional probability functions: p(x | y, z) = p(x, y, z) / p(y, z)

61 Semantics of Bayesian Networks
- A Bayesian network has two readings: a representation of the joint probability distribution (used to construct the network), and an encoding of a collection of conditional-dependence statements (used to design inference procedures)
- Semantics: P(x1, ..., xn) = P(x1 | parents(x1)) ... P(xn | parents(xn))

62 Bayesian Network Example
Structure: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls; Alarm -> MaryCalls

P(B) = 0.001    P(E) = 0.002

B  E | P(A)        A | P(J)        A | P(M)
t  t | 0.95        t | 0.90        t | 0.70
t  f | 0.94        f | 0.05        f | 0.01
f  t | 0.29
f  f | 0.001

63 Semantics of Bayesian Networks: A Worked Example
Compute the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, while both John and Mary call.
Solution: P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998 = 0.00062 = 0.062%

64 Learning Bayes Nets From Data
What is a Bayesian network?
- A directed acyclic graph; nodes are variables (discrete or continuous); arcs indicate dependence between variables
- Conditional probabilities (local distributions)
- Missing arcs imply conditional independence
- Independencies + local distributions => a modular specification of a joint distribution

65 Why Bayesian Networks?
- Expressive language: finite mixture models, factor analysis, HMMs, Kalman filters, ...
- Intuitive language: can utilize causal knowledge in constructing models; domain experts are comfortable building a network
- General-purpose "inference" algorithms, e.g. P(BadBattery | HasGas, Won'tStart)
  - Exact: modular specification leads to large computational efficiencies
  - Approximate: "loopy" belief propagation
[figure: Gas/Start/Battery network]

66 Overview
- Learning probabilities (local distributions): introduction to Bayesian statistics (learning a probability); learning probabilities in a Bayes net
- Learning Bayes-net structure: Bayesian model selection/averaging
- Applications

67 Maximum Likelihood Estimation
The maximum likelihood idea: there are two shooters, one with hit rate 0.9 and the other with hit rate 0.1. One of them fired a single shot at the target and hit it; who do we estimate fired the shot? In general, the probability of an event A depends on a parameter θ: different values of θ give different values of P(A), so we write P(A | θ). If A has occurred, we take as the estimate the value of θ that maximizes P(A | θ). This is the maximum likelihood idea.

68 The Likelihood Function and the MLE
L(θ) = L(x1, ..., xn; θ) = Π_{i=1..n} f(xi; θ) is the likelihood function of the population. Definition: if there is a θ̂ such that L(x1, ..., xn; θ̂) = max_θ L(x1, ..., xn; θ), then θ̂ is called the maximum likelihood estimate (MLE) of θ.

69 Steps for Finding the MLE
(1) Form the likelihood function L(θ)
(2) Form the log-likelihood ln L(θ)
(3) Set up the likelihood equation d ln L(θ)/dθ = 0; if this equation has a solution, the solution is the MLE θ̂

70 Example 1
Let X1, ..., Xn be a sample from a Poisson population with parameter λ; find the MLE of λ. The likelihood is L(λ) = Π_i λ^{xi} e^{-λ} / xi!, so ln L = (Σ xi) ln λ - nλ - Σ ln(xi!). Setting d ln L/dλ = (Σ xi)/λ - n = 0 gives λ̂ = (1/n) Σ xi = x̄.

71 Example 2
Let X1, ..., Xn be a sample from a normal population N(μ, σ²); find the MLEs of the parameters μ and σ². Setting the partial derivatives of ln L with respect to μ and σ² to zero gives

72 Example 2 (continued)
μ̂ = x̄ = (1/n) Σ xi and σ̂² = (1/n) Σ (xi - x̄)², which are the MLEs of μ and σ².

73 Criteria for Evaluating Estimators
- Unbiasedness: E(θ̂) = θ
- Efficiency: a smaller variance D(θ̂) means a more efficient estimator
- Consistency: as the sample size tends to infinity, θ̂ converges to θ in probability

74 Bayesian Estimation: Maximum A Posteriori
- Estimate the unknown parameter θ from a sample set K = {x1, x2, ..., xN}
- Treat θ as a random variable with prior distribution p(θ); the posterior given the sample set K is p(θ | K)
- Maximum a posteriori (MAP) estimation: θ̂_MAP = argmax_θ p(θ | K) = argmax_θ p(K | θ) p(θ)

75 Bayesian (Minimum-Risk) Estimation
- Conditional risk of a parameter estimate: the expected loss of the estimator given x
- Risk of the estimate: the expectation of the estimator's conditional risk
- Bayesian estimation: the estimate that minimizes the risk

76 Bayesian (Minimum-Risk) Estimation
Define the loss function as the squared error: λ(θ̂, θ) = (θ - θ̂)².
Theorem: if the loss function is the squared-error loss, then the Bayesian estimate is the posterior mean, θ̂ = E[θ | K] = ∫ θ p(θ | K) dθ.
77 Steps of Bayesian Estimation
1. Determine the prior distribution p(θ)
2. From the sample set K = {x1, x2, ..., xN}, compute the joint sample distribution p(K | θ)
3. Compute the posterior distribution of θ: p(θ | K) = p(K | θ) p(θ) / ∫ p(K | θ) p(θ) dθ
4. Compute the Bayesian estimate: θ̂ = ∫ θ p(θ | K) dθ

78 Worked Example: Univariate Normal Distribution
The population density is p(x | μ) = N(μ, σ²), with the mean μ unknown; the prior on μ is p(μ) = N(μ0, σ0²). Use the Bayesian estimation method to find the estimator of μ from the sample set K = {x1, x2, ..., xN}.

79 Compute the posterior of μ: p(μ | K) ∝ p(K | μ) p(μ) = N(μN, σN²), with
μN = (N σ0² x̄ + σ² μ0) / (N σ0² + σ²),   σN² = σ0² σ² / (N σ0² + σ²).
Compute the Bayesian estimate of μ: μ̂ = ∫ μ p(μ | K) dμ = μN.

80 Learning Probabilities: the Classical Approach
- Simple case: flipping a thumbtack, which lands either heads or tails
- The true probability q is unknown
- Given i.i.d. data, estimate q using an estimator with good properties: low bias, low variance, consistent (e.g., the ML estimate)
- i.i.d.: independent and identically distributed
81 Learning Probabilities: the Bayesian Approach
- The true probability q is unknown
- Instead of a point estimate, maintain a Bayesian probability density for q: p(q), for q in [0, 1]
- We observe M i.i.d. toss results: D = (x[1], x[2], ..., x[M]); we want to estimate p(q | D)

82 Bayesian Approach
Use Bayes' rule to compute a new density for q given the data:
p(q | D) = p(q) p(D | q) / p(D)   (posterior = prior × likelihood / evidence)

83 The Likelihood ("binomial distribution")
p(D | q) = q^h (1 - q)^t, where h and t are the numbers of observed heads and tails.

84 Example: Applying Bayes' Rule to the Observation of a Single "Heads"
p(q | heads) ∝ p(q) · p(heads | q) = p(q) · q
[figure: the prior p(q), the likelihood p(heads | q) = q, and the resulting posterior, each plotted over q in [0, 1]]

85 The Probability of Heads on the Next Toss
p(x[M+1] = heads | D) = ∫ q p(q | D) dq = E[q | D]
How good is a particular q? The likelihood indicates how likely the observed data is to be generated; ML (maximum likelihood) picks the q that maximizes it.
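A worked instance of slides 82-85, assuming a Beta prior (the conjugate choice for the binomial likelihood; the slides do not fix a particular prior) and hypothetical counts:

```python
# Beta(a, b) prior is conjugate to the likelihood q^h (1-q)^t, so the
# posterior is Beta(a + h, b + t), and the predictive probability of
# heads on the next toss is the posterior mean E[q | D].
a, b = 1, 1          # uniform prior over q (an assumption, not from slides)
h, t = 7, 3          # hypothetical data: 7 heads, 3 tails observed
post_a, post_b = a + h, b + t
p_next_heads = post_a / (post_a + post_b)
print(p_next_heads)  # 0.667 = E[q | D]
```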
86 Overview
- Learning probabilities: introduction to Bayesian statistics (learning a probability); learning probabilities in a Bayes net
- Learning Bayes-net structure: Bayesian model selection/averaging
- Applications

87 From Thumbtacks to Bayes Nets
The thumbtack problem can be viewed as learning the probability for a very simple BN: X (heads/tails).
[figure: parameter node Θ with children X1, ..., XN (toss 1 to toss N); equivalently a plate over Xi, i = 1 to N]

88 The Next Simplest Bayes Net
Two thumbtacks: X (heads/tails) and Y (heads/tails).

89-91 The Next Simplest Bayes Net
With parameter nodes ΘX and ΘY over the plates Xi, Yi (i = 1 to N): is ΘX independent of ΘY? Under "parameter independence" the answer is yes, and learning splits into two separate thumbtack-like learning problems.

92-95 A Bit More Difficult...
Now X -> Y (X heads/tails, Y heads/tails). There are three probabilities to learn: qX=heads, qY=heads|X=heads, and qY=heads|X=tails. With parameter independence, splitting the cases by the observed value of X (heads vs tails) gives 3 separate thumbtack-like problems.

96 In General...
Learning probabilities in a BN is straightforward if: the likelihoods come from the exponential family (multinomial, Poisson, gamma, ...); parameter independence holds; the priors are conjugate; and the data are complete.

97 Incomplete Data Makes Parameters Dependent
[figure: with X1 unobserved, the parameters ΘX, ΘY|X=heads and ΘY|X=tails become coupled]

98 Incomplete Data
- Incomplete data makes the parameters dependent
- Parameter learning for incomplete data:
  - Monte Carlo integration: investigate properties of the posterior and perform prediction
  - Large-sample approximations (Laplace/Gaussian approximation): the expectation-maximization (EM) algorithm and inference to compute the mean and variance
  - Variational methods

99 Overview
Next: learning Bayes-net structure (Bayesian model selection/averaging), then applications.

100 Difficulties
If the network structure is unknown, constructing the Bayesian model is very difficult: if the number of attributes is n, the number of possible structures is at least exponential in n. Searching such a huge space for a reasonable network structure is extremely time-consuming, so candidate structures must be compared using an evaluation criterion. Common criteria include MDL (minimum description length), BIC (Bayesian information criterion), and Be (Bayesian likelihood equivalent). MDL and BIC are both grounded in information theory; at their core they seek short encodings. The Be criterion suits parameters that follow particular distributions; the best known variants are BDe (Bayesian Dirichlet likelihood equivalent) and BGe (Bayesian Gaussian likelihood equivalent). Common search algorithms include greedy search, simulated annealing, and best-first search. Heckerman et al. (1995) pointed out that, weighing search quality against computational cost, greedy search is the best strategy. Since greedy search usually reaches only a local optimum, one remedy is to restart it from multiple random initial points.

101 Two Types of Methods for Learning BNs
- Constraint-based: find a Bayesian network structure whose implied independence constraints "match" those found in the data
- Scoring methods (Bayesian, MDL, MML): find the Bayesian network structure that can represent distributions that "match" the data (i.e. could have generated the data)

102 Learning Bayes-net Structure
Given data, which model is correct? Model 1: X, Y with no edge; Model 2: X -> Y.

103 Bayesian Approach
Given data d, which model is correct, or more likely? Compute p(model | d) for each model.

104 Bayesian Approach: Model Averaging
Given data d, compute the posterior of each model and average the predictions over the models.

105 Bayesian Approach: Model Selection
Given data d, keep the best model, for the sake of explanation, understanding, and tractability.

106 To Score a Model, Use Bayes' Rule
Given data d: p(m | d) ∝ p(m) p(d | m), where the model score p(d | m) = ∫ p(d | θ, m) p(θ | m) dθ is the "marginal likelihood" and p(d | θ, m) is the likelihood.

107-108 Occam's Razor
[figures: the marginal likelihood automatically trades off data fit against model complexity]

109 Computation of the Marginal Likelihood
- An efficient closed form exists if: the likelihoods come from the exponential family (binomial, Poisson, gamma, ...); parameter independence holds; the priors are conjugate; and there is no missing data, including no hidden variables
- Otherwise use approximations: Monte Carlo integration, large-sample approximations, variational methods

110 Practical Considerations
The number of possible BN structures is super-exponential in the number of variables. How do we find the best graph(s)?

111 Model Search
- Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)
- Heuristic methods: greedy, greedy with restarts, MCMC
- Greedy loop: initialize the structure; score all possible single changes; if any change is better, perform the best change and repeat; otherwise return the saved structure (a runnable skeleton follows below)

112 Learning the Correct Model
- Let G be the true graph and P the generative distribution
- Markov assumption: P satisfies the independencies implied by G
- Faithfulness assumption: P satisfies only the independencies implied by G
- Theorem: under Markov and faithfulness, with enough data generated from P one can recover G (up to equivalence), even with the greedy method!
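Slide 111's greedy loop as a runnable skeleton. The score function and the "structures as edge sets" encoding here are toy stand-ins; a real run would plug in a Bayesian or MDL score over DAGs:

```python
import itertools

def greedy_search(score, initial, neighbors):
    """Score every single change, perform the best one, and repeat until
    no single change improves the score (slide 111's greedy loop)."""
    g, best = initial, score(initial)
    while True:
        cand = max(neighbors(g), key=score, default=None)
        if cand is None or score(cand) <= best:
            return g, best                   # return saved structure
        g, best = cand, score(cand)          # perform best change

# Toy stand-in: "structures" are undirected edge sets over 3 nodes, and the
# score simply prefers exactly two edges (a real score would measure fit).
edges = [frozenset(e) for e in itertools.combinations(range(3), 2)]
nbrs = lambda g: [g ^ {e} for e in edges]    # toggle (add/remove) one edge
print(greedy_search(lambda g: -abs(len(g) - 2), frozenset(), nbrs))
```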
113 Learning Bayes Nets From Data
Data (e.g. a table of cases over variables X1 = true/false/false/true/..., X2 = 1/5/3/2/..., X3 = Red/Blue/Green/Red/...) plus prior/expert information feed a Bayes-net learner, which outputs Bayes net(s) over X1, ..., X9.

114 K2 Algorithm
- K2 is an algorithm for constructing a Bayes network from a database of records
- "A Bayesian Method for the Induction of Probabilistic Networks from Data", Gregory F. Cooper and Edward Herskovits, Machine Learning 9, 1992
- The basic idea of K2: first define a scoring function that measures the quality of a network structure; then, starting from an empty network and following a predetermined node ordering, choose for each node the parent that maximizes the posterior structure probability. Traverse all nodes in order, greedily adding the best parent for each variable step by step.
115 Basic Model
The problem: to find the most probable Bayes-network structure given a database.
- D: a database of cases
- Z: the set of variables represented by D
- Bsi, Bsj: two Bayes network structures containing exactly those variables that are in Z
The quantity of interest is the ratio P(Bsi | D) / P(Bsj | D) = P(Bsi, D) / P(Bsj, D).
116 Basic Model
- By computing such ratios for pairs of Bayes network structures, we can rank-order a set of structures by their posterior probabilities.
- Based on four assumptions, the paper introduces an efficient formula for computing P(Bs, D); let Bs represent an arbitrary Bayes network structure containing just the variables in D.
117 Computing P(Bs, D)
- Assumption 1: the database variables, which we denote as Z, are discrete
- Assumption 2: cases occur independently, given a Bayes network model
- Assumption 3: there are no cases that have variables with missing values
- Assumption 4: the density function f(Bp | Bs) is uniform, where Bp is a vector whose values denote the conditional-probability assignments associated with structure Bs
118 Computing P(Bs, D)
- D: the data set; it has m cases (records)
- Z: a set of n discrete variables (x1, ..., xn)
- ri: a variable xi in Z has ri possible value assignments
- Bs: a Bayes network structure containing just the variables in Z
- πi: each variable xi in Bs has a set of parents, which we represent with a list of variables πi
- qi: the number of unique instantiations of πi
- wij: the jth unique instantiation of πi relative to D
- Nijk: the number of cases in D in which variable xi has the value vik and πi is instantiated as wij
- Nij = Σ_{k=1..ri} Nijk

Under the four assumptions, the formula is
P(Bs, D) = P(Bs) · Π_{i=1..n} Π_{j=1..qi} [ (ri - 1)! / (Nij + ri - 1)! ] Π_{k=1..ri} Nijk!

119 Decreasing the Computational Complexity
Three more assumptions decrease the computational complexity to polynomial time:
1. There is an ordering on the nodes such that if xi precedes xj, then we do not allow structures in which there is an arc from xj to xi.
2. There exists a sufficiently tight limit on the number of parents of any node.
3. P(πi -> xi) and P(πj -> xj) are independent when i ≠ j.

120 K2 Algorithm: A Heuristic Search Method
Use the following function:
g(i, πi) = Π_{j=1..qi} [ (ri - 1)! / (Nij + ri - 1)! ] Π_{k=1..ri} Nijk!
where the Nijk are relative to πi being the parents of xi and relative to a database D.
Pred(xi) = {x1, ..., x_{i-1}} returns the set of nodes that precede xi in the node ordering.

121 K2 Algorithm: A Heuristic Search Method
- Input: a set of n nodes, an ordering on the nodes, an upper bound u on the number of parents a node may have, and a database D containing m cases
- Output: for each node, a printout of the parents of the node

122 K2 Algorithm: A Heuristic Search Method

    Procedure K2
    for i := 1 to n do
        πi := ∅;
        P_old := g(i, πi);
        OKToProceed := true;
        while OKToProceed and |πi| < u do
            let z be the node in Pred(xi) - πi that maximizes g(i, πi ∪ {z});
            P_new := g(i, πi ∪ {z});
            if P_new > P_old then
                P_old := P_new;
                πi := πi ∪ {z};
            else OKToProceed := false;
        end {while};
        write("Node:", xi, "parents of this node:", πi);
    end {for};
    end {K2}
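A runnable sketch of K2 as given above, computed in log space to avoid factorial overflow. The tiny binary data set is hypothetical, and variable values are assumed to be coded as integers 0..ri-1:

```python
import math

def log_g(i, parents, data, r):
    """log of g(i, parents): product over parent instantiations j of
    (ri-1)!/(Nij+ri-1)! * prod_k Nijk!  (unobserved instantiations
    contribute a factor of 1 and can be skipped)."""
    ri = r[i]
    counts = {}                                   # Nijk per instantiation
    for case in data:
        j = tuple(case[p] for p in parents)
        counts.setdefault(j, [0] * ri)[case[i]] += 1
    total = 0.0
    for nijk in counts.values():
        nij = sum(nijk)
        total += (math.lgamma(ri) - math.lgamma(nij + ri)
                  + sum(math.lgamma(n + 1) for n in nijk))
    return total

def k2(data, r, order, u):
    """K2: for each node, greedily add the predecessor that most
    increases g(i, parents), up to u parents."""
    result = {}
    for pos, i in enumerate(order):
        parents, p_old = [], log_g(i, [], data, r)
        while len(parents) < u:
            cands = [z for z in order[:pos] if z not in parents]
            if not cands:
                break
            z = max(cands, key=lambda z: log_g(i, parents + [z], data, r))
            p_new = log_g(i, parents + [z], data, r)
            if p_new > p_old:
                p_old, parents = p_new, parents + [z]
            else:
                break
        result[i] = parents
    return result

# Hypothetical 3-variable binary data in which x0 tracks x1
data = [(0,0,1), (0,0,0), (1,1,0), (1,1,1), (0,0,1), (1,1,0)]
print(k2(data, r=[2, 2, 2], order=[0, 1, 2], u=2))   # expect 0 as parent of 1
```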
123 Conditional Probabilities
Let θijk denote the conditional probability P(xi = vik | πi = wij), that is, the probability that xi has value vik for some k from 1 to ri, given that the parents of xi, represented by πi, are instantiated as wij. We call θijk a network conditional probability. Let ξ denote the four assumptions. The expected value of θijk is
E[θijk | D, Bs, ξ] = (Nijk + 1) / (Nij + ri).
124 BOA: The Bayesian-Dirichlet Principle
The Bayesian optimization algorithm (BOA) uses a probability model to capture the distribution of individuals in the population; it is an estimation of distribution algorithm. BOA builds a Bayesian network model and then samples that model to produce new individuals. Replacing the genetic algorithm's crossover and mutation operators with this mechanism effectively addresses shortcomings of the GA, such as the fitness-scaling problem, the schema-disruption problem, and deceptive problems.

125 Bayesian Optimization Algorithm
BOA uses a Bayesian network (BN) to model the correlations between variables; these correlations reflect the schemata of schema theory. A Bayesian network is a directed acyclic graph that represents the probabilistic relationships among a set of variables. Let B = (G, θ) denote a Bayesian network, where G is the network structure and θ is the set of network parameters quantifying the probabilistic relationships between the variables. BOA selects a subset of good individuals, called the sample data D, to build the BN model. For given sample data D, many network models can be constructed; a metric is therefore needed

126 Bayesian Optimization Algorithm
to evaluate the quality of each BN. The Bayesian-Dirichlet (BD) metric evaluates a BN on the principle of posterior-probability maximization: given sample data D, the larger a BN's BD metric value, the better that BN matches the sample data, and the BN model can then be sampled to generate new individuals.

127 How the Algorithm Runs
(a) Randomly initialize the population; set the population size, the problem size, and the maximum in-degree of each BN node; let the generation counter t = 0.
(b) Compute each individual's fitness as the basis for evaluation.
(c) Using a greedy algorithm and the BD metric, build a good Bayesian network B = (G, θ).
(d) Sample the constructed network B = (G, θ) to generate a new group of individuals.
(e) Update the population with the newly generated individuals to form the next generation.
(f) Check whether the convergence condition is satisfied; if so, stop; otherwise go to (b).

128 Bayesian Networks
A Bayesian network represents the dependencies among variables and provides a natural, effective and concise specification of any joint probability distribution (JPD). It is a directed acyclic graph (DAG) in which every node is annotated with quantitative probability information. Concretely, a Bayesian network is a directed graph whose topology satisfies the following:
1) A set of random variables forms the network nodes; variables may be discrete or continuous.
2) A set of directed edges connects pairs of nodes; if there is a directed edge from node X to node Y, then X is called a parent of Y.

129 Bayesian Networks
3) Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) quantifying its parents' influence on the node.
4) The graph contains no directed cycles; hence a Bayesian network is a directed acyclic graph (DAG).

130 Constructing a Bayesian Network
Scoring metric: given a set of data, many Bayesian networks can be constructed, so a quantitative evaluation measure is needed to compare how well each network matches the data; this is called the scoring metric. The larger the metric value, the better the match. Here the Bayesian-Dirichlet (BD) scoring metric is used.

131 Constructing a Bayesian Network
Note: Πxi denotes the parents of xi; m(Πxi) is the number of individuals whose parents are instantiated as Πxi; m(xi, Πxi) is the number of individuals in which xi takes its value and its parents are instantiated as Πxi. m′(Πxi), m′(xi, Πxi) and p(B|ξ) all come from the previous generation's BN. Usually the K2 metric is adopted, in which case m′(Πxi), ...
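A high-level sketch of steps (a) to (f). For brevity the model-building step here degenerates to an edgeless network (univariate marginals); a full BOA would learn the structure greedily under the BD metric, as described above, and the function names, sizes and fitness below are illustrative assumptions:

```python
import random

def build_and_sample(selected, n_bits):
    """One model-building + sampling step (sketch). The 'network' here has
    no edges, i.e. only univariate marginals; a real BOA would add edges
    chosen by greedy search with the BD metric."""
    p = [sum(ind[j] for ind in selected) / len(selected)
         for j in range(n_bits)]
    return [1 if random.random() < p[j] else 0 for j in range(n_bits)]

def boa(fitness, n_bits, pop_size=40, n_sel=20, generations=30):
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]                       # step (a)
    for _ in range(generations):                           # steps (b)-(f)
        selected = sorted(pop, key=fitness, reverse=True)[:n_sel]
        pop = [build_and_sample(selected, n_bits) for _ in range(pop_size)]
    return max(pop, key=fitness)

print(boa(sum, n_bits=10))   # onemax demo
```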