A Novel Automatic Image Annotation Method Based on Multi-instance Learning

Shunle Zhu, Xiaoqiu Tan
School of Mathematics, Physics and Information Science, Zhejiang Ocean University, Zhoushan 316000, China

Abstract
Automatic image annotation (AIA) is the bridge between high-level semantic information and low-level features, and an effective method to address the problem of the "semantic gap". Starting from an intrinsic characteristic of AIA, namely that an annotated image contains many regions, this paper proposes an AIA method built on the framework of multi-instance learning (MIL). Each keyword is analyzed hierarchically at low granularity under the MIL framework. By mining representative instances, the semantic similarity of images can be expressed effectively and better annotation results can be acquired, which testifies to the effectiveness of the proposed annotation method.

1. Introduction
With the development of multimedia and network technology, image data has been growing rapidly. Facing such a mass of image resources, content-based image retrieval (CBIR), a technology for organizing, managing, and analyzing these resources efficiently, has become a research hotspot. However, CBIR is limited by the "semantic gap": underlying visual features such as color, texture, and shape cannot fully reflect or match the user's query intention, so CBIR confronts an unprecedented challenge. In recent years, automatic image annotation (AIA) has focused on erecting a bridge between high-level semantics and low-level features, which is an effective approach to the above-mentioned semantic gap. Research on AIA was initiated by the co-occurrence model proposed by Mori et al. in 1999 [1]. In [2], a translation model was developed to annotate images automatically, based on the assumption that keywords and visual features are different languages describing the same image. Similar to [2], [3] proposed the Cross-Media Relevance Model (CMRM), in which the visual information of each image is denoted as a set of blobs that manifests the semantic information of the image. However, the blob set in CMRM is obtained by clustering discrete regions, which loses visual feature information, so the annotation results were imperfect. To compensate for this problem, a Continuous-space Relevance Model (CRM) was proposed in [4]. Furthermore, [5] proposed the Multiple-Bernoulli Relevance Model (MBRM) to improve on CMRM and CRM. Despite the differences among the above methods, their core idea is identical: annotated images are used to learn a model that describes the latent relationship, or map, between keywords and image features, which is then used to predict annotations for unknown images. Even though previous work achieved results from various angles, none of it explicitly defines the semantic description of each keyword. To this end, after investigating the characteristics of automatic image annotation (images annotated by keywords comprise multiple regions), we regard automatic image annotation as a multi-instance learning problem. The proposed method analyzes each keyword in a multi-granularity hierarchy to reflect semantic similarity, so that it not only characterizes semantic implication accurately but also improves the performance of image annotation, which verifies the effectiveness of the proposed method.

This article is organized as follows: Section 1 introduces automatic image annotation briefly; Section 2 discusses automatic image annotation based on the multi-instance learning framework in detail; Section 3 describes the experimental process and results; Section 4 summarizes and briefly discusses future research.

2. Automatic Image Annotation in the Framework of Multi-instance Learning
In previous learning frameworks, a sample is viewed as one instance, i.e., the relationship between samples and instances is one-to-one, whereas in multi-instance learning a sample may contain multiple instances, that is to say, the relationship between samples and instances is one-to-many. The ambiguity among the training samples of multi-instance learning differs completely from that of supervised learning, unsupervised learning, and reinforcement learning, so previous methods can hardly solve such problems. Owing to its characteristic features and wide application prospects, multi-instance learning is attracting more and more attention in the machine learning community and is referred to as a new learning framework [7]. The core idea of multi-instance learning is that the training set consists of concept-annotated bags, each of which contains individually unannotated instances.
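To make the bag/instance abstraction concrete, here is a minimal Python sketch (not from the paper; the `Bag` class and the feature values are illustrative assumptions): an image becomes a bag whose instances are its region feature vectors, and the standard MIL rule labels a bag positive if and only if at least one instance is positive.

```python
import numpy as np

# An image is modeled as a bag whose instances are the feature vectors
# of its segmented regions (hypothetical 4-dimensional features below).
class Bag:
    def __init__(self, instances, label=None):
        self.instances = np.asarray(instances)  # shape: (num_regions, feature_dim)
        self.label = label                      # +1 (positive), -1 (negative), or None

def bag_label(instance_labels):
    """Standard MIL rule: a bag is positive iff at least one instance is positive."""
    return +1 if any(l == +1 for l in instance_labels) else -1

# One region depicts the keyword, so the whole image (bag) is positive.
bag = Bag([[0.1, 0.4, 0.2, 0.9],
           [0.7, 0.1, 0.5, 0.3],
           [0.2, 0.8, 0.6, 0.1]])
print(bag.instances.shape)        # (3, 4)
print(bag_label([-1, +1, -1]))    # 1
```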
The purpose of multi-instance learning is to assign conceptual annotations to bags beyond the training set by learning from the training bags. In general, a bag is annotated as positive if and only if at least one of its instances is positive; otherwise, the bag is annotated as negative.

2.1 Framework of Image Annotation Based on Multi-instance Learning
According to the above definition of multi-instance learning, namely that a positive bag contains at least one positive instance, we can conclude that positive instances should be distributed much more densely in positive bags than negative instances. This conclusion shares common properties with the Diverse Density (DD) algorithm [8] in the multi-instance learning domain. If some point in feature space can represent the core semantics of a specified keyword better than any other point, then at least one instance in each positive bag should be close to this point, while all instances in negative bags will be far away from it. In the proposed method, we take each semantic keyword into consideration independently. Even though part of the useful information is lost by neglecting the relationships between keywords, the various keywords of each image are used to compute the similarities between images, so that the proposed method can represent the semantic similarity of images effectively at low granularity. In the following, each keyword is analyzed and applied at the local level, so that information irrelevant to the keyword is eliminated, improving the precision of the representation of the keyword's semantics. First, the bags of keyword w, both positive and negative, are collected, and the region surrounded by the positive bags is obtained by adaptive clustering. Second, the cluster that contains the most items and is farthest from the negative bags is viewed as the positive set of w. Third, a Gaussian Mixture Model (GMM) is trained on this set to learn the semantics of w. Finally, images can be annotated automatically based on the posterior probability of each keyword given the image, which is computed from the image's likelihood under the GMM by Bayesian estimation. Figure 1 illustrates this process.

Fig. 1. The framework of automatic image annotation based on multi-instance learning

2.2 Automatic Image Annotation
For convenience, we first put forward some notation. Let w denote a semantic keyword, X = {X_k | k = 1, …, N} a set of training samples, where N is the number of training samples, and S = {x_1, …, x_n} the set of representative instances after adaptive clustering, where x_n is the n-th item in a cluster. A GMM is constructed to describe the semantic concept of w, i.e., the GMM estimates the distribution of each keyword in feature space in order to erect a one-to-one map from keywords to visual features. Note that the superiority of the GMM lies in producing a smooth estimate of any density distribution, so it can reflect the feature distribution of semantic keywords effectively by non-parametric density estimation. For a specified keyword w, the GMM represents its visual feature distribution; p(x|w) is defined as follows:

$$p(x \mid w) = \sum_{i=1}^{M} \pi_i \, \mathcal{N}(x; \mu_i, \Sigma_i), \qquad \sum_{i=1}^{M} \pi_i = 1 \tag{1}$$

where $\mathcal{N}(x; \mu_i, \Sigma_i)$ is the Gaussian distribution of the i-th component, $\mu_i$ and $\Sigma_i$ are the corresponding mean and covariance respectively, $\pi_i$ is the weight of the i-th component, reflecting its significance, and M is the number of components. Each component represents a cluster in feature space, reflecting one visual feature of w. Within each component, the conditional probability density of a low-level visual feature vector x is computed as follows:

$$\mathcal{N}(x; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)\right) \tag{2}$$

where d is the dimension of the feature vector x. The parameters of the GMM are estimated by the EM method, a maximum-likelihood estimator for distribution parameters from incomplete data. EM consists of two steps, the expectation step (E-step) and the maximization step (M-step), which are executed alternately until convergence after multiple iterations. Assuming that keyword w produces N_w representative instances, $(\mu_i, \Sigma_i)$ is the mean and covariance of the i-th Gaussian component. Intuitively, different semantic keywords should be represented by different visual features, and in general the numbers of components are not identical across keywords, so an adaptive value of M is obtained based on the Minimum Description Length (MDL) principle [9]. The proposed method extracts semantic clustering sets from the training images and uses them to construct the GMM, in which each component represents one visual feature of the specified keyword. From the perspective of semantic mapping, the proposed model describes the one-to-many relationship between keywords and the corresponding visual features, and the extracted semantic clustering sets reflect the semantic similarity between instances and keywords.
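As a concrete reading of the construction above, the following sketch fits one GMM per keyword by EM and chooses the number of components adaptively. It is a hypothetical illustration, not the authors' code: scikit-learn's `GaussianMixture` stands in for the EM estimation, diagonal covariances are assumed for simplicity, and BIC is used as a commonly accepted proxy for the MDL criterion of [9]; `keyword_instances` is assumed to be the output of the adaptive clustering step of Section 2.1.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_keyword_gmm(instances, max_components=10, seed=0):
    """Fit p(x|w) for one keyword w from its representative instances S.
    M is chosen adaptively by minimizing BIC (a stand-in for MDL [9])."""
    X = np.asarray(instances)
    best, best_score = None, np.inf
    for m in range(1, min(max_components, len(X)) + 1):
        gmm = GaussianMixture(n_components=m, covariance_type='diag',
                              random_state=seed).fit(X)
        score = gmm.bic(X)
        if score < best_score:
            best, best_score = gmm, score
    return best

def image_log_likelihood(gmm, regions):
    """Sum of log p(x_j|w) over an image's regions; this is the per-keyword
    quantity aggregated by formula (3) below."""
    return float(np.sum(gmm.score_samples(np.asarray(regions))))

# keyword_instances: dict {keyword w: array of representative instances S},
# assumed precomputed by the adaptive clustering step of Section 2.1.
def fit_all_keywords(keyword_instances):
    return {w: fit_keyword_gmm(S) for w, S in keyword_instances.items()}
```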
According to the above method, a GMM is constructed for each keyword to describe its semantics. Then, for a specified image to be annotated, X = {x_1, …, x_m}, where x_j denotes the j-th segmented region, the posterior probability of keyword w is computed according to formula (3); assuming the regions are conditionally independent given w, Bayesian estimation yields

$$P(w \mid X) = \frac{P(w)\,p(X \mid w)}{p(X)} \propto P(w) \prod_{j=1}^{m} p(x_j \mid w) \tag{3}$$

Finally, the image X is annotated with the 5 keywords of greatest posterior probability.

3. Experimental Results and Analysis
For a fair comparison with other image annotation algorithms, COREL [2], a widely used image dataset, is selected for our experiments. This image set consists of 5,000 images, of which 4,500 are used as training samples and the remaining 500 as test samples. Each image is annotated with 1 to 5 keywords, and 371 keywords exist in the dataset in total. In our experiments, each image is divided into 10 regions using the Normalized Cuts segmentation technique [6]. 42,379 regions are produced over the whole image dataset; these regions are then clustered into 500 groups, each of which is called a blob. For each region, 36-dimensional features covering color, shape, position, etc. are considered, as in [2]. To measure the performance of the various image annotation methods, we adopt the same evaluation metrics as [5], which are popular indicators in automatic image annotation and image retrieval. Precision is the ratio of the number of correct annotations to the total number of annotations, while recall is the ratio of the number of correct annotations to the number of all positive samples. The detailed definitions are as follows:

$$\text{precision} = \frac{B}{A} \tag{4}$$

$$\text{recall} = \frac{B}{C} \tag{5}$$

where A is the number of images the system annotates with a given keyword, B is the number of images annotated with it correctly, and C is the number of images annotated with that keyword in the whole dataset. As a tradeoff between the above indicators, their geometric mean is widely adopted, namely:

$$F = \sqrt{\text{precision} \times \text{recall}} \tag{6}$$

Moreover, we count the number of keywords that correctly annotate at least one image. This statistic reflects the keyword coverage of the proposed method and is denoted "NumWords".

3.1 Experimental Results
Figure 2 shows that the annotation results of the proposed method, MIL-Annotation, keep a rather high consistency with the ground truth. This fact verifies the effectiveness of the proposed method.

Fig. 2. Illustrations of annotation results of MIL-Annotation

3.2 Annotation Results of MIL-Annotation
Table 1 and Table 2 compare the average performance of the proposed method with some traditional annotation models, such as COM [1], TM [2], CMRM [3], CRM [4], and MBRM [5], on the COREL image dataset. In the experiments, 263 keywords are concerned.

Table 1. The performance of various annotation models on COREL
Table 2. The comparison of F-measure between various models

From Table 1 and Table 2, we can see that the proposed method outperforms the other models on the two keyword sets, and it achieves a significant improvement over existing algorithms in average precision, average recall, F-measure, and "NumWords". Specifically, MIL-Annotation obtains a significant improvement over COM, TM, CMRM, and CRM; among existing probability-based image annotation models, MBRM gets the best annotation performance, which is equivalent to that of MIL-Annotation.
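The per-keyword metrics of formulas (4)-(6) reduce to simple counting. The helper below is a hypothetical sketch under the definitions of A, B, and C given above (it takes dictionaries mapping image ids to keyword sets, an assumed data layout, not the authors' format) and also tallies the "NumWords" statistic; F follows the geometric-mean form stated in formula (6).

```python
import math

def keyword_metrics(predicted, truth, keyword):
    """predicted/truth: dicts mapping image id -> set of keywords.
    A = images the system annotated with `keyword`;
    B = those among A that the ground truth confirms;
    C = images carrying `keyword` in the ground truth."""
    A = sum(1 for img in predicted if keyword in predicted[img])
    B = sum(1 for img in predicted
            if keyword in predicted[img] and keyword in truth.get(img, set()))
    C = sum(1 for img in truth if keyword in truth[img])
    precision = B / A if A else 0.0    # formula (4)
    recall = B / C if C else 0.0       # formula (5)
    f = math.sqrt(precision * recall)  # formula (6), geometric mean
    return precision, recall, f

def num_words(predicted, truth, keywords):
    """'NumWords': keywords that annotate at least one image correctly."""
    return sum(1 for w in keywords
               if any(w in kws and w in truth.get(img, set())
                      for img, kws in predicted.items()))

# Usage:
# predicted = {'img1': {'sky', 'water'}}; truth = {'img1': {'sky', 'tree'}}
# keyword_metrics(predicted, truth, 'sky')  -> (1.0, 1.0, 1.0)
```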
4. Conclusions
Analyzing the properties of automatic image annotation deeply shows that it can be viewed as a multi-instance learning problem, so we proposed a method to annotate images automatically based on multi-instance learning. Each keyword is analyzed independently to guarantee a more effective semantic similarity at low granularity. Then, under the multi-instance learning framework, each keyword is further analyzed at various hierarchies. Information irrelevant to the keywords is eliminated by mapping keywords to their corresponding regions, which improves the precision of the representation of keyword semantics. Experimental results demonstrate the effectiveness of MR-MIL.

References
[1] Mori Y, Takahashi H, Oka R. Image-to-word transformation based on dividing and vector quantizing images with words. In: Proc. of Intl. Workshop on Multimedia Intelligent Storage and Retrieval Management (MISRM'99), Orlando, Oct. 1999.
[2] Duygulu P, Barnard K, de Freitas N, Forsyth D. Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: Proc. of European Conf. on Computer Vision (ECCV'02), Copenhagen, Denmark, May 2002.
[3] Jeon J, Lavrenko V, Manmatha R. Automatic image annotation and retrieval using cross-media relevance models. In: Proc. of Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (ACM SIGIR'03), Toronto, Canada, Jul. 2003: 119-126.
[4] Lavrenko V, Manmatha R, Jeon J. A model for learning the semantics of pictures. In: Proc. of Advances in Neural Information Processing Systems (NIPS'03), 2003.
[5] Feng S, Manmatha R, Lavrenko V. Multiple Bernoulli relevance models for image and video annotation. In: Proc. of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR'04), Washington DC, USA, 2004.
[6] Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905.
[7] Maron O. Learning from ambiguity. PhD dissertation, Department of Electrical Engineering and Computer Science, MIT, 1998.
[8] Maron O, Lozano-Pérez T. A framework for multiple-instance learning. In: Proc. of Advances in Neural Information Processing Systems (NIPS'98), Pittsburgh, USA, Oct. 1998: 570-576.
[9] Li J, Wang J. Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2003, 25(9): 1075-1088.