Towards Data-Efficient Deep Learning with Meta-Learning and Symmetries

Jin Xu

Balliol College

University of Oxford

A thesis submitted for the degree of Doctor of Philosophy in Statistics

Trinity 2023

Acknowledgements

First and foremost, I want to express my deep gratitude to my supervisors, Prof. Yee Whye Teh and Dr. Tom Rainforth. Their unwavering support, careful guidance, and constant inspiration have been invaluable throughout my PhD journey. It has been a privilege to be mentored by them, whom I regard as research role models. Their depth and breadth of knowledge have been both humbling and enlightening. Special acknowledgement goes to Yee Whye, who has always been considerate and ready to help in tough times. My heartfelt thanks go to Tom for his guidance during the challenging times brought on by the pandemic.

I would like to extend my gratitude to all my collaborators Hyunjik Kim, Jean-Francois Ton, Adam Kosiorek, Emilien Dupont, and Kaspar Märtens. Their expertise and feedback have been crucial in improving my work and I have learned a great deal from them. A big thank you to Prof. Ryan Adams from Princeton University and to my internship hosts, James Hensman and Max Croci at Microsoft Research. Their mentorship outside of my PhD life has been an indispensable part of my research experience.

Moreover, I feel extremely fortunate to be surrounded by amazing and caring friends whose names are not possible to enumerate here. Among them are Emilien Dupont, Jean-Francois Ton, Charline Le Lan, Bobby He, Sheheryar Zaidi, Qinyi Zhang, Guneet Dhillon, Andrew Campbell, Chris Williams, Carlo Alfano, Faaiz Taufiq, Anna Menacher and others from our lovely office 1.17, Hanwen Xing, Yanzhao Yang, Ning Miao, Chao Zhang, Yutong Lu, Yixuan He, Xi Lin, Yuan Zhou, Fan Wu, Bohao Yao from the department of statistics, Dunhong Jin, Sihan Zhou, Sijia Yao, Huining Yang, Kevin Wang, Natalia Hong, Hang Yuan, Kangning Zhang, Chengyang Wang and many others from other departments at Oxford, Deniz Oktay, Sulin Liu, Jenny Zhan and others from Princeton University, internship peers at Microsoft Research including Alexander Meulemans, Saleh Ashkboos from ETH.

A special thanks to all university and department staff, especially Chris Cullen for his kind and patient support during difficult times, and to Joanna Stoneham, Stuart McRobert, and others who ensured a smooth PhD experience.

Finally, above all, my deepest thanks go to Yifan Yu for her love and companionship. She immensely enriched my time in Oxford, bringing colour and joy to my life. Additionally, I am eternally grateful to my parents Chengxiang Xu and Feng Chen for giving me the freedom to pursue my passions and for their unquestioning support throughout this journey.

Abstract

Recent advances in deep learning have been significantly propelled by the increasing availability of data and computational resources. While the abundance of data enables models to perform well in certain domains, there are real-world applications, such as in the medical field, where the data is scarce or difficult to collect. Furthermore, there are also scenarios where the large dataset is better viewed as lots of related small datasets, and the data becomes insufficient for the task associated with one of the small datasets. It is also noteworthy that human intelligence often requires only a handful of examples to perform well on new tasks, emphasizing the importance of designing data-efficient AI systems. This thesis delves into two strategies to address this challenge: meta-learning and symmetries. Meta-learning approaches the data-rich environment as a collection of many small, individual datasets. Each of these small datasets represents a distinct task, yet there is underlying shared knowledge between them. Harnessing this shared knowledge allows for the design of learning algorithms that can efficiently address new tasks within similar domains. In comparison, symmetry is a form of direct prior knowledge. By ensuring that models’ predictions remain consistent despite any transformation to their inputs, these models enjoy better sample efficiency and generalization.

In the subsequent chapters, we present novel techniques and models which all aim at improving the data efficiency of deep learning systems. Firstly, we demonstrate the success of encoder-decoder style meta-learning methods based on Conditional Neural Processes (CNPs). Secondly, we introduce a new class of expressive meta-learned stochastic process models which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. Finally, we propose group equivariant subsampling/upsampling layers which tackle the loss of equivariance in conventional subsampling/upsampling layers. These layers can be used to construct end-to-end equivariant models with improved data-efficiency.

Contents

1 Introduction
1.1 Motivation
1.2 Thesis outline
1.3 Papers
2 Background
2.1 Meta-learning
2.1.1 Conventional supervised learning and meta-learning
2.1.2 Different views of meta-learning
2.1.3 Common approaches to meta-learning
2.2 Neural processes
2.2.1 Stochastic processes
2.2.2 Neural processes as stochastic processes
2.2.3 Neural process training objectives
2.2.4 A meta-learning perspective
2.3 Symmetries in deep learning
2.3.1 Group, coset and quotient space
2.3.2 Group homomorphism, group actions and group equivariance
2.3.3 Homogeneous spaces and lifting feature maps
2.3.4 Feature maps in G-CNNs
2.3.5 Group equivariant neural networks
3 MetaFun: Meta-Learning with Iterative Functional Updates
3.1 Introduction
3.2 MetaFun
3.2.1 Learning functional task representation
3.2.2 MetaFun for regression and classification
3.3 Related work
3.4 Experiments
3.4.1 1-D function regression
3.4.2 Classification: miniImageNet and tieredImageNet
3.4.3 Ablation study
3.5 Conclusions and future work
3.6 Supplementary materials
3.6.1 Functional gradient descent
Reproducing kernel Hilbert space
Functional gradients
Functional gradient descent
3.6.2 Experimental details
4 Deep Stochastic Processes via Functional Markov Transition Operators
4.1 Introduction
4.2 Background
4.3 Markov neural processes
4.3.1 A more general form of Neural Process density functions
4.3.2 Markov chains in function space
4.3.3 Parameterisation, inference and training
4.4 Related work
4.5 Experiments
4.5.1 1D function regression
4.5.2 Contextual bandits
4.5.3 Geological inference
4.6 Discussion
4.7 Supplementary materials
4.7.1 Proofs
4.7.2 Implementation details
4.7.3 Data
Model architectures and hyperparameters
Computational costs and resources
4.7.4 Broader impacts
5 Group Equivariant Subsampling
5.1 Introduction
5.2 Equivariant subsampling and upsampling
5.2.1 Translation equivariant subsampling for CNNs
5.2.2 Group equivariant subsampling and upsampling
5.2.3 Constructing Φ
5.3 Application: Group equivariant autoencoders
5.4 Related work
5.5 Experiments
5.5.1 Basic properties: Equivariance, disentanglement and out-of-distribution generalization
5.5.2 Single object
5.5.3 Multiple objects
5.6 Conclusions, limitations and future work
5.7 Supplementary materials
5.7.1 Equivariant subsampling and upsampling
Constructing Φ
Multiple subsampling layers
5.7.2 Group equivariant autoencoders
5.7.3 Proofs
5.7.4 Implementation details
Data
Model architectures
Hyperparameters
Computational resources
6 Conclusions and Future Outlook
Bibliography

Chapter 1

Introduction

1.1 Motivation

Recent breakthroughs in deep learning can be largely attributed to the vast amount of data available and the advancement of computational resources [Deng et al., 2009, Raina et al., 2009, Silver et al., 2016, Jumper et al., 2021, Brown et al., 2020a]. While training on large datasets enables deep learning models to excel in certain tasks, many real-world applications only provide limited data for a specific task. For instance, in medical fields, obtaining data, especially for rare diseases, is challenging and often expensive. In drug development or recommendation systems, there will always be insufficient data for new drugs/users, even though abundant data exists for other drugs or users. Therefore, to apply deep learning to these fields, it is vital to develop systems that are data-efficient. Moreover, for advanced AI systems, data-efficiency can be a crucial ingredient: Firstly, AI systems should be able to generalize beyond specific data distributions without relying on data; for instance, an image recognition system should recognize objects regardless of their position or orientation. Secondly, human intelligence can often solve new tasks with just a few examples. Thus, for AI to emulate human-like intelligence, it should also have such capability.

From a Bayesian perspective, learning involves updating our beliefs about a model (represented by $\theta$) given the data, i.e. computing the posterior $p(\theta \mid D_{\text{data}})$. For a model to learn efficiently from a small amount of data, it’s important to start with a good initial guess or "prior" $p(\theta)$. In this thesis, we look at two directions to obtain such a prior for data-efficient learning: The first is meta-learning, which learns the prior (or the shared knowledge) from similar tasks. It can be understood as "learning to learn more efficiently". The second is symmetries in deep learning, which serve as a known prior for certain problems. Symmetry, a fundamental concept in physics, represents a form of prior knowledge that is ubiquitously observed throughout our physical world.
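To make the role of the prior concrete, the following is a minimal sketch using a Beta-Bernoulli model. The model, the function name `posterior_mean` and the prior settings are assumptions chosen for illustration and are not taken from the thesis. With only three observations, the posterior mean under a prior centred near the assumed true rate stays closer to that rate than the posterior mean under a flat prior.

```python
def posterior_mean(alpha, beta, observations):
    """Posterior mean of a Bernoulli rate under a Beta(alpha, beta) prior.

    Conjugacy gives the posterior Beta(alpha + successes, beta + failures).
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return (alpha + successes) / (alpha + beta + successes + failures)


# Three observations from a coin whose true rate is assumed to be 0.7.
data = [1, 1, 1]

flat = posterior_mean(1.0, 1.0, data)      # uninformative Beta(1, 1) prior
informed = posterior_mean(7.0, 3.0, data)  # prior centred on 0.7

print(f"flat prior: {flat:.3f}, informed prior: {informed:.3f}")
```

The informative prior here is simply asserted; the two directions studied in the thesis differ in where such a prior comes from (learned from related tasks, or known in advance as a symmetry).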

Meta-learning tackles a specific scenario in which the vast pool of data can be viewed as many small datasets, each representing a distinct task. Yet, these tasks contain underlying shared knowledge that can be harnessed to address new tasks within the same category. This scenario is prevalent in many applications. Take, for instance, an online retail company with data from customers worldwide. The data associated with each user is typically sparse. In this context, predicting behaviours for each user constitutes an individual task, but patterns among different users often exhibit similarities. Meta-learning algorithms are designed to handle such circumstances. The goal of meta-learning is to learn data-efficient learning algorithms that can later be applied to a particular task. The training data for meta-learning comprises numerous related tasks, each with a limited set of data points. After the meta-learning phase, the learned learning algorithms can solve a new task in a data-efficient manner. In contrast, the aim of conventional supervised learning is just to learn a predictive model.

Meta-learning problems can be tackled from various perspectives, and these approaches can be understood through different viewpoints such as optimization-based approaches [Ravi and Larochelle, 2016, Finn et al., 2017a], metric-based approaches [Koch, 2015, Vinyals et al., 2016, Sung et al., 2018, Snell et al., 2017], and model-based approaches [Santoro et al., 2016, Mishra et al., 2018, Garnelo et al., 2018a], among others. Note that these views are not exclusive. For example, methods such as Prototypical Networks [Snell et al., 2017], MAML [Finn et al., 2017a], ML-PIP [Gordon et al., 2018] etc. can be reformulated under a model-based framework that uses an encoder-decoder setup. In this setup, the encoder produces a task representation using training data, and the decoder then makes predictions based on the task representation. These approaches transform the meta-learning challenge to resemble a regular learning problem involving sequences, and they are also more computationally efficient if no gradient computation is involved in either the encoder or the decoder, as in CNP-type models [Garnelo et al., 2018a]. Our study in Chapter 3 explicitly adopts this encoder-decoder framework for meta-learning. By using a functional task representation, and iteratively updating the representation directly in function space, we demonstrate that encoder-decoder approaches without gradient information can also be competitive with other approaches, which has not been shown before.
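The encoder-decoder setup described above can be sketched in a few lines. This is an illustration under stated assumptions and not the MetaFun or CNP architecture: the hand-written `encode` and `decode` functions stand in for learned neural networks, the task family (lines through the origin) is made up, and the decoder returns point predictions instead of a predictive distribution. What the sketch does show is the structure: the encoder aggregates the context set into a task representation by averaging, and the decoder predicts from that representation alone, with no gradient step at prediction time.

```python
def encode(context):
    """Encoder: map each (x, y) pair to features and average them.

    Mean aggregation makes the task representation independent of the
    order of the context points.
    """
    n = len(context)
    return (
        sum(x * y for x, y in context) / n,
        sum(x * x for x, _ in context) / n,
    )


def decode(representation, x_target):
    """Decoder: predict y at x_target from the task representation alone."""
    mean_xy, mean_xx = representation
    slope = mean_xy / mean_xx
    return slope * x_target


def predict(context, x_targets):
    """Map a context set and target inputs to point predictions."""
    r = encode(context)
    return [decode(r, x) for x in x_targets]


context = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]     # points on y = 3x
print(predict(context, [0.5, 4.0]))
print(predict(list(reversed(context)), [0.5, 4.0]))  # same task, different order
```

Because the representation is an average over context points, reordering the context leaves the predictions unchanged.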

Furthermore, because training data for each task in meta-learning is often limited, uncertainty estimation becomes crucial. Stochastic Processes (SPs) (e.g. Gaussian Processes (GPs)) can be used to make predictions with uncertainty estimation. Thus, learning these processes can be seen as a way to approach meta-learning with uncertainty in mind. In Chapter 4, we propose a new framework to construct expressive neural parameterised SPs by parameterising Markov transitions in function space.

Unlike meta-learning above, which discovers shared knowledge from related tasks, symmetry serves as a direct form of prior or inductive bias, integrated into deep learning models without the need for pre-training. Symmetries refer to transformations that maintain certain properties of an object of interest unchanged. These include transformations such as image translation, rotation, or permutation of set elements. By incorporating these symmetries into deep learning models, ensuring that the outputs remain consistent (either unchanged, i.e. invariance, or transformed correspondingly, i.e. equivariance) despite input transformations, the model inherently generalizes to transformed inputs. Consequently, deep learning models equipped with these symmetries not only become more data-efficient but also generalize better. A simple example of this is Convolutional Neural Networks (CNNs), which are invariant to input translations for classification tasks, and perform significantly better compared to plain feed-forward networks. Earlier research has introduced many methods to build convolutional [Cohen and Welling, 2016, 2017, Cohen et al., 2019] and attention blocks [Hutchinson et al., 2021, Fuchs et al., 2020] that are equivariant w.r.t. various symmetries. However, the pooling layers or subsampling/upsampling layers commonly used in various deep learning architectures break these symmetries [Zhang, 2019]. In Chapter 5, we present group equivariant subsampling/upsampling layers that have exact equivariance.
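The claims in the paragraph above can be checked on a toy example. The sketch below is an illustration, not the thesis's construction: it uses 1-D signals with circular boundary conditions, and the helper names `shift`, `circular_conv` and `subsample` as well as the signal and kernel values are assumptions. It shows that a convolution commutes with translation, that a global maximum over positions is translation invariant, and that plain stride-2 subsampling breaks equivariance for an odd shift.

```python
def shift(signal, k):
    """Circularly shift a 1-D signal to the right by k positions."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def circular_conv(signal, kernel):
    """Circular cross-correlation of a 1-D signal with a short kernel."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def subsample(signal, stride=2):
    """Keep every `stride`-th element, starting at index 0."""
    return signal[::stride]


x = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 5.0, 1.0]
w = [1.0, -1.0, 0.5]

# Equivariance: convolving a shifted input equals shifting the convolved output.
assert circular_conv(shift(x, 1), w) == shift(circular_conv(x, w), 1)

# Invariance: a global max over positions ignores the shift entirely.
assert max(circular_conv(shift(x, 1), w)) == max(circular_conv(x, w))

# Stride-2 subsampling after an odd shift keeps a different set of samples,
# so the two subsampled outputs are not shifts of one another.
a = subsample(circular_conv(x, w))
b = subsample(circular_conv(shift(x, 1), w))
is_shift = any(b == shift(a, k) for k in range(len(a)))
print("subsampled outputs related by a shift:", is_shift)
```

The last check is the failure mode that motivates equivariant subsampling; how to repair it is the subject of Chapter 5 and is not attempted here.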

1.2 Thesis outline

In Chapter 2, we provide a short introduction to meta-learning, neural processes and symmetries in deep learning, to set the stage for later chapters.

In Chapter 3, we introduce an iterative functional encoder-decoder method for supervised meta-learning, which is based on Neural Processes (NPs) [Garnelo et al., 2018a,b]. On standard few-shot classification benchmarks like miniImageNet and tieredImageNet, it is demonstrated that meta-learning methods based on the neural process family can be competitive with or even outperform gradient-based methods such as MAML [Finn et al., 2017a] and LEO [Rusu et al., 2019].

In Chapter 4, we introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. The proposed iterative construction adds substantial flexibility and expressivity to the original framework of Neural Processes (NPs) without compromising consistency or adding restrictions. Our experiments demonstrate clear advantages of MNPs over baseline models on a variety of tasks. It’s noteworthy that SP models can be viewed through a meta-learning lens, so the proposed method can also be seen as a meta-learning approach with principled uncertainty estimation.

In Chapter 5, we first introduce translation equivariant subsampling/upsampling layers that can be used to construct exact translation equivariant CNNs. We then generalise these layers beyond translations to general groups, thus proposing group equivariant subsampling/upsampling. We use these layers to construct group equivariant autoencoders (GAEs) that allow us to learn low-dimensional equivariant representations. We empirically verify on images that the representations are indeed equivariant to input translations and rotations, and thus generalise well to unseen positions and orientations. We further use GAEs in models that learn object-centric representations on multi-object datasets, and show improved data efficiency and decomposition compared to non-equivariant baselines.

In Chapter 6, we summarize our findings and explore potential avenues for future research to further advance the field.

1.3 Papers

This is an integrated thesis and includes the following published papers:

Chapter 3 contains:

Xu, J., Ton, J. F., Kim, H., Kosiorek, A., & Teh, Y. W. MetaFun: Meta-learning with iterative functional updates. International Conference on Machine Learning (ICML), 2020 [Xu et al., 2020]

Chapter 4 contains:

Xu, J., Dupont, E., Märtens, K., Rainforth, T., & Teh, Y. W. Deep Stochastic Processes via Functional Markov Transition Operators. Advances in Neural Information Processing Systems (NeurIPS), 2023 [Xu et al., 2023]

Chapter 5 contains:

Xu, J., Kim, H., Rainforth, T., & Teh, Y. W. Group equivariant subsampling. Advances in Neural Information Processing Systems (NeurIPS), 2021 [Xu et al., 2021]

Chapter 2

Background

2.1 Meta-learning

2.1.1 Conventional supervised learning and meta-learning

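In conventional supervised learning, the objective is to learn a function $f$ that maps an input feature vector $x \in \mathcal{X}$ to an output label $y \in \mathcal{Y}$. Learning is based on example input-output pairs in a training set $D_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{m}$. Common types of supervised learning tasks include regression, where output labels are real-valued, and classification, where the output labels represent different classes. The function $f$, often referred to as the predictive model, is a member of a hypothesis class $\mathcal{H} := \{f \mid f(x; \phi), \phi \in \mathbb{R}^{d_\phi}\}$.

For each task, there is a risk function $\ell(y, f(x))$ which measures prediction error. As an example, in the context of a regression task, $\ell$ often takes the form of a squared error, $\ell(y, f(x)) = (y - f(x))^2$. The training process of the model $f$ translates to solving an optimization problem defined as follows:

$$\hat{f} = \arg\min_{f \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \ell(y_i, f(x_i)). \tag{2.1}$$

It is called empirical risk minimization because this objective is an estimate of the population risk $\mathbb{E}_{(x_i, y_i) \sim p(x, y)}[\ell(y_i, f(x_i))]$ based on the empirical distribution of the training data.

After training, the model should generalize effectively when presented with a test set, denoted as $D_{\text{test}} = \{(x_i, y_i)\}_{i=m+1}^{n}$. The model’s performance can be assessed using the test risk $\hat{R}(f; D_{\text{test}})$, i.e. the average of $\ell(y_i, f(x_i))$ over $D_{\text{test}}$, which serves as an estimate of the overall population risk using unseen data.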
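As a small worked instance of empirical risk minimization with a squared-error risk, the sketch below fits a one-feature linear model in closed form and then reports the training and test risks. The hypothesis class $f(x) = a + bx$, the helper names and the toy data are assumptions made for the example.

```python
def squared_error(y, y_hat):
    """Squared-error risk for a single data point."""
    return (y - y_hat) ** 2


def empirical_risk(predict, dataset, loss=squared_error):
    """Average loss of `predict` over (x, y) pairs."""
    return sum(loss(y, predict(x)) for x, y in dataset) / len(dataset)


def fit_linear(dataset):
    """Minimise the empirical squared-error risk over f(x) = a + b * x.

    Uses the closed-form least-squares solution for one input feature.
    """
    m = len(dataset)
    mean_x = sum(x for x, _ in dataset) / m
    mean_y = sum(y for _, y in dataset) / m
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in dataset)
    var_x = sum((x - mean_x) ** 2 for x, _ in dataset)
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return lambda x: a + b * x


# Toy data near y = 2x + 1 with small fixed perturbations (made up for the example).
d_train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
d_test = [(4.0, 9.1), (5.0, 10.9)]

f_hat = fit_linear(d_train)
train_risk = empirical_risk(f_hat, d_train)
test_risk = empirical_risk(f_hat, d_test)
print(f"train risk: {train_risk:.4f}, test risk: {test_risk:.4f}")
```

The test risk is computed on points the fit never saw, which is what makes it an estimate of the population risk and not of the training objective.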

Figure 2.1: Data for a meta-classification problem. Both the meta-training and meta-test sets consist of tasks (red rectangles) and are presumed to come from the same task distribution $p(T)$. Each of these tasks encompasses its own task-specific training and test sets, which are commonly referred to as the context (yellow labels) and the target (grey labels) respectively.

In practice, it is common to have scenarios where lots of supervised learning tasks are related to each other, yet the number of data points for each individual task is limited. Meta-learning emerges as a new learning paradigm to address such challenges.

Specifically, we have a meta-training set defined as $M_{\text{train}} = \{(D^{(j)}_{\text{train}}, D^{(j)}_{\text{test}}, \ell^{(j)})\}_{j=1}^{M}$ and a meta-test set given by $M_{\text{test}} = \{(D^{(j)}_{\text{train}}, D^{(j)}_{\text{test}}, \ell^{(j)})\}_{j=M+1}^{N}$. Each element in these meta-datasets is a tuple consisting of a training set (called the context), a test set (called the target) and a risk function (typically the same within a meta-dataset). This 3-tuple characterizes a task $T_j$ (see Figure 2.1 for an illustration). In supervised learning, we use training data to train a predictive model, hoping it can generalize across the entire data distribution. In meta-learning, the assumption is that there is a common task distribution, denoted as $p(T)$, from which both the meta-training set and the meta-test set are drawn. Meta-learning algorithms aim to use meta-training data to discover learning algorithms that can generalize across the entire task distribution.
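The structure of a meta-dataset can be written down directly as a data type. The sketch below is one possible representation, assuming a toy task distribution in which every task is a line whose slope is drawn from a shared Gaussian; the names `Task` and `sample_task` and all numeric settings are hypothetical.

```python
import random
from typing import Callable, List, NamedTuple, Tuple

Example = Tuple[float, float]


class Task(NamedTuple):
    """One element of a meta-dataset: context set, target set, risk function."""
    context: List[Example]  # task-specific training set
    target: List[Example]   # task-specific test set
    loss: Callable[[float, float], float]


def squared_error(y: float, y_hat: float) -> float:
    return (y - y_hat) ** 2


def sample_task(rng: random.Random, n_context: int = 5, n_target: int = 10) -> Task:
    """Draw one task from a toy task distribution p(T).

    Assumption for the example: each task is a line y = slope * x whose
    slope is drawn from a shared Gaussian, so tasks are related.
    """
    slope = rng.gauss(2.0, 0.5)
    points = []
    for _ in range(n_context + n_target):
        x = rng.uniform(-1.0, 1.0)
        points.append((x, slope * x))
    return Task(points[:n_context], points[n_context:], squared_error)


rng = random.Random(0)
meta_train = [sample_task(rng) for _ in range(100)]  # many tasks, few points each
meta_test = [sample_task(rng) for _ in range(20)]    # unseen tasks from the same p(T)

print(len(meta_train), len(meta_train[0].context), len(meta_train[0].target))
```

Both meta-datasets are drawn by the same `sample_task`, mirroring the assumption that meta-training and meta-test tasks share one task distribution.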

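More specifically, a learning algorithm for a supervised learning task takes in a training set $D_{\text{train}}$ and a risk function $\ell$, and outputs a predictive model, written as:

$$\hat{f} = \Phi_{\text{ALGO}}(D_{\text{train}}, \ell). \tag{2.2}$$

Since $\ell$ is usually fixed, we will omit the dependency on it in subsequent discussions. For a particular task, the learning algorithm $\Phi_{\text{ALGO}}$ can be evaluated by the test risk of the learned predictive model, denoted as:

$$\hat{R}(\hat{f}; D_{\text{test}}). \tag{2.3}$$

Meta-learning finds a learning algorithm based on tasks from the meta-training set $M_{\text{train}}$, so that this learning algorithm can be more efficiently applied to new tasks, and generalizes across the task distribution $p(T)$. The meta-learning algorithm can be represented as:

$$\Phi_{\text{ALGO}} = \text{MetaAlgo}(M_{\text{train}}). \tag{2.4}$$

To evaluate the meta-learning algorithm, we can compute:

$$\frac{1}{N - M} \sum_{j=M+1}^{N} \hat{R}\big(\Phi_{\text{ALGO}}(D^{(j)}_{\text{train}}); D^{(j)}_{\text{test}}\big), \tag{2.5}$$

where the sum runs over the $N - M$ tasks of the meta-test set $M_{\text{test}}$. While it resembles the test loss in supervised learning, the aggregated test risk for a task replaces the traditional risk function for a data point.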
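The idea that a learning algorithm is itself a function, from a training set to a predictive model, and that it is scored by averaging per-task test risks, can be sketched as follows. The family of algorithms used here (least squares for $y = \text{slope} \cdot x$, shrunk towards a prior slope) and the two toy tasks are assumptions for illustration only.

```python
def average_risk(model, dataset):
    """Average squared error of `model` on a list of (x, y) pairs."""
    return sum((y - model(x)) ** 2 for x, y in dataset) / len(dataset)


def make_learning_algorithm(prior_slope, strength):
    """Return a learning algorithm: a function from a training set to a model.

    Illustrative assumption: the algorithm fits y = slope * x by least
    squares shrunk towards `prior_slope`; `strength` controls how much
    the prior is trusted.
    """
    def learn(train_set):
        num = strength * prior_slope + sum(x * y for x, y in train_set)
        den = strength + sum(x * x for x, _ in train_set)
        slope = num / den
        return lambda x: slope * x
    return learn


def meta_evaluate(learn, meta_test):
    """Average, over tasks, of the test risk of the model learned from the context."""
    risks = [average_risk(learn(context), target) for context, target in meta_test]
    return sum(risks) / len(risks)


# Two toy tasks given as (context, target); the slopes 1.9 and 2.2 are made up.
meta_test = [
    ([(1.0, 1.9)], [(2.0, 3.8), (-1.0, -1.9)]),
    ([(0.5, 1.1)], [(1.0, 2.2), (2.0, 4.4)]),
]

uninformed = make_learning_algorithm(prior_slope=0.0, strength=1.0)
informed = make_learning_algorithm(prior_slope=2.0, strength=1.0)

print(meta_evaluate(uninformed, meta_test), meta_evaluate(informed, meta_test))
```

With a single context point per task, the algorithm whose prior slope matches the task family achieves a much lower aggregated risk. Choosing such settings from meta-training tasks is the job of the meta-learning algorithm, which this sketch leaves out.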

It is worth noting that while we focus on supervised learning tasks here, meta-learning can be extended to unsupervised learning [Edwards and Storkey, 2016, Reed et al., 2018, Hsu et al., 2018] or reinforcement learning [Wang et al., 2016, Finn et al., 2017a,b].

2.1.2 Different views of meta-learning

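Bi-level optimization view. Let us assume both the predictive model $f$ and the learning algorithm $\Phi_{\text{ALGO}}$ can be parameterised, and the parameters are denoted as $\phi$ and $\theta$ respectively. That is to say, the learning algorithm can be written as:

$$\phi = \Phi_{\text{ALGO}}(D_{\text{train}}; \theta). \tag{2.6}$$

Meta-learning can be formulated as the following bi-level optimization problem:

$$\theta^{*} = \arg\min_{\theta} \frac{1}{M} \sum_{j=1}^{M} \hat{R}\big(f(\cdot\,; \phi_j(\theta)); D^{(j)}_{\text{test}}\big), \tag{2.7}$$

where the task-specific parameter $\phi_j$ depends on $\theta$ through the inner-loop optimization:

$$\phi_j(\theta) = \Phi_{\text{ALGO}}(D^{(j)}_{\text{train}}; \theta). \tag{2.8}$$

Many meta-learning algorithms are developed based on this bi-level optimization view, such as Finn et al. [2017a], Nichol et al. [2018], Ravi and Larochelle [2016].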
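A minimal sketch of the bi-level structure follows, assuming a scalar model $y = \phi x$, a squared-error risk, and an inner loop that takes a single gradient step from a shared initialisation $\theta$. The single inner step is in the spirit of MAML [Finn et al., 2017a], but the model, the toy task distribution and the step sizes are assumptions, and the outer gradient is derived by hand for this specific case.

```python
import random


def moments(dataset):
    """Return the mean of x*x and the mean of x*y over (x, y) pairs."""
    n = len(dataset)
    sxx = sum(x * x for x, _ in dataset) / n
    sxy = sum(x * y for x, y in dataset) / n
    return sxx, sxy


def inner_update(theta, context, inner_lr):
    """Inner loop: one gradient step on the context squared error for y = phi * x."""
    sxx, sxy = moments(context)
    grad = 2.0 * (theta * sxx - sxy)
    return theta - inner_lr * grad


def outer_gradient(theta, task, inner_lr):
    """Gradient of the target risk at phi_j(theta) with respect to theta."""
    context, target = task
    phi = inner_update(theta, context, inner_lr)
    txx, txy = moments(target)
    d_risk_d_phi = 2.0 * (phi * txx - txy)
    sxx, _ = moments(context)
    d_phi_d_theta = 1.0 - 2.0 * inner_lr * sxx  # chain rule through the inner step
    return d_risk_d_phi * d_phi_d_theta


def sample_task(rng):
    """Toy task: noiseless line y = slope * x with slope near 2 (an assumption)."""
    slope = rng.gauss(2.0, 0.3)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    points = [(x, slope * x) for x in xs]
    return points[:3], points[3:]


rng = random.Random(0)
meta_train = [sample_task(rng) for _ in range(200)]

theta, inner_lr, outer_lr = 0.0, 0.1, 0.05
for _ in range(300):  # outer loop over the shared parameter theta
    grad = sum(outer_gradient(theta, t, inner_lr) for t in meta_train) / len(meta_train)
    theta -= outer_lr * grad

print(f"meta-learned initialisation: {theta:.3f}")
```

The outer loop only ever sees the target risk after adaptation, so the gradient has to pass through the inner update; in this scalar case that is the factor `d_phi_d_theta`.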

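Hierarchical model view. From a probabilistic perspective, the generative process for each task $T_j$ can be expressed as:

$$\theta \sim p(\theta), \quad \phi_j \sim p(\phi_j \mid \theta), \quad y_i^{(j)} \sim p(y_i^{(j)} \mid x_i^{(j)}, \phi_j, \theta). \tag{2.9}$$

Both the training set $D^{(j)}_{\text{train}}$ and the test set $D^{(j)}_{\text{test}}$ follow the same distribution (as illustrated in Figure 2.2). This can be seen as a probabilistic hierarchical model where $\theta$ indicates the high-level global parameters for all tasks and $\phi_j$ denotes the low-level local parameters for each task. In this context, meta-learning is about inferring $\theta$ from lots of tasks in the meta-training set, that is $p(\theta \mid M_{\text{train}})$. Learning, on the other hand, infers $\phi_j$ given the training set $D^{(j)}_{\text{train}}$ for task $T_j$, that is $p(\phi_j \mid \theta, D^{(j)}_{\text{train}})$.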

Figure 2.2: Meta-learning as hierarchical models (a remake of Figure 1 in Gordon et al. [2018]). The task-specific parameter $\phi_j$ depends on the global parameter $\theta$. Data points in both the context and the target have the same generative process, which depends on both $\theta$ and $\phi_j$.

Note that $p(\phi_j \mid \theta)$ can be seen as a prior for task $T_j$ conditioned on $\theta$. Therefore, meta-learning can be seen as learning an empirical prior from the meta-training set. Finn et al. [2018], Requeima et al. [2019] adopt this view.
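The hierarchical view can be made concrete with a conjugate Gaussian example. The sketch below assumes a scalar global parameter that sets the mean of the task-level prior, a linear likelihood $y \sim \mathcal{N}(\phi_j x, \sigma^2)$, and known values for `theta`, `tau` and `sigma`; none of these choices come from the thesis. It performs ancestral sampling for one task and then computes the task-level posterior in closed form, treating the global parameter as already inferred.

```python
import random


def sample_task(rng, theta, tau, sigma, n_points):
    """Ancestral sampling for one task: phi_j ~ N(theta, tau^2), y ~ N(phi_j * x, sigma^2)."""
    phi_j = rng.gauss(theta, tau)
    data = []
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, rng.gauss(phi_j * x, sigma)))
    return phi_j, data


def task_posterior(theta, tau, sigma, context):
    """Gaussian posterior over phi_j given theta and the context, as (mean, variance).

    Follows from conjugacy of the Gaussian prior N(theta, tau^2) with the
    Gaussian likelihood of a linear model.
    """
    precision = 1.0 / tau**2 + sum(x * x for x, _ in context) / sigma**2
    mean = (theta / tau**2 + sum(x * y for x, y in context) / sigma**2) / precision
    return mean, 1.0 / precision


rng = random.Random(1)
theta, tau, sigma = 2.0, 0.3, 0.1  # toy values chosen for the example

phi_true, data = sample_task(rng, theta, tau, sigma, n_points=20)

# With no context the posterior equals the prior; more context narrows it.
for n in (0, 2, 20):
    mean, var = task_posterior(theta, tau, sigma, data[:n])
    print(f"n={n:2d}  posterior mean={mean:.3f}  variance={var:.5f}")
```

The posterior variance gives the uncertainty estimate that matters when a task has only a handful of context points; a full treatment would also infer the global parameter from the meta-training set instead of fixing it.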

Model-based view. A learning algorithm $f = \Phi_{\text{ALGO}}(D_{\text{train}})$ can be seen as a function that takes in the entire training set and outputs a predictive model. The model is then used to make predictions on test data in $D_{\text{test}}$. The learning and prediction processes can thus be conceptualized as sequence-to-sequence mappings. For the sake of brevity, let’s use a concise notation for data sequences, such as $x_{1:n} = \{x_1, x_2, \ldots, x_n\}$. For a specific task $T_j$, making predictions for test set data points based on those from the training set can be described as the following inference task:

$$p(y_{m+1:n} \mid x_{m+1:n}, x_{1:m}, y_{1:m}). \tag{2.10}$$

From this perspective, meta-learning is about creating this conditional model. Meta-learning only differs from conventional supervised learning in that both the inp
