Deep Learning with H2O

Arno Candel    Erin LeDell

Edited by: Angela Bartz

http://h2o.ai/resources/

October 2021: Sixth Edition

Deep Learning with H2O

by Arno Candel & Erin LeDell

with assistance from Viraj Parmar & Anisha Arora

Edited by: Angela Bartz

Published by H2O.ai, Inc.
2307 Leghorn St.
Mountain View, CA 94043

? 2016-2021 H2O.ai, Inc. All Rights Reserved.

October 2021: Sixth Edition

Photos by ? H2O.ai, Inc.

All copyrights belong to their respective owners. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Printed in the United States of America.

Contents

Introduction
What is H2O?
Installation
Installation in R
Installation in Python
Pointing to a Different H2O Cluster
Example Code
Citation
Deep Learning Overview
H2O's Deep Learning Architecture
Summary of Features
Training Protocol
Initialization
Activation and Loss Functions
Parallel Distributed Network Training
Specifying the Number of Training Samples
Regularization
Advanced Optimization
Momentum Training
Rate Annealing
Adaptive Learning
Loading Data
Data Standardization/Normalization
Convergence-based Early Stopping
Time-based Early Stopping
Additional Parameters
Use Case: MNIST Digit Classification
MNIST Overview
Performing a Trial Run
N-fold Cross-Validation
Extracting and Handling the Results
Web Interface
Variable Importances
Java Model
Grid Search for Model Comparison
Cartesian Grid Search
Random Grid Search
Checkpoint Models
Achieving World-Record Performance
Computational Performance
Deep Autoencoders
Nonlinear Dimensionality Reduction
Use Case: Anomaly Detection
Stacked Autoencoder
Unsupervised Pretraining with Supervised Fine-Tuning
Parameters
Common R Commands
Common Python Commands
Acknowledgments
References
Authors

Introduction

This document introduces the reader to Deep Learning with H2O. Examples are written in R and Python. Topics include:

installation of H2O

basic Deep Learning concepts

building deep neural nets in H2O

how to interpret model output

how to make predictions

as well as various implementation details.

What is H2O?

H2O.ai is focused on bringing AI to businesses through software. Its flagship product is H2O, the leading open source platform that makes it easy for financial services, insurance companies, and healthcare companies to deploy AI and deep learning to solve complex problems. More than 9,000 organizations and 80,000+ data scientists depend on H2O for critical applications like predictive maintenance and operational intelligence. The company, which was recently named to the CB Insights AI 100, is used by 169 Fortune 500 enterprises, including 8 of the world's 10 largest banks, 7 of the 10 largest insurance companies, and 4 of the top 10 healthcare companies. Notable customers include Capital One, Progressive Insurance, Transamerica, Comcast, Nielsen Catalina Solutions, Macy's, Walgreens, and Kaiser Permanente.

Using in-memory compression, H2O handles billions of data rows in-memory, even with a small cluster. To make it easier for non-engineers to create complete analytic workflows, H2O's platform includes interfaces for R, Python, Scala, Java, JSON, and CoffeeScript/JavaScript, as well as a built-in web interface, Flow. H2O is designed to run in standalone mode, on Hadoop, or within a Spark Cluster, and typically deploys within minutes.

H2O includes many common machine learning algorithms, such as generalized linear modeling (linear regression, logistic regression, etc.), Na?ve Bayes, principal components analysis, k-means clustering, and word2vec. H2O implements best-in-class algorithms at scale, such as distributed random forest, gradient boosting, and deep learning. H2O also includes a Stacked Ensembles method, which finds the optimal combination of a collection of prediction algorithms using a process known as "stacking." With H2O, customers can build thousands of models and compare the results to get the best predictions.

H2O is nurturing a grassroots movement of physicists, mathematicians, and computer scientists to herald the new wave of discovery with data science by collaborating closely with academic researchers and industrial data scientists. Stanford University giants Stephen Boyd, Trevor Hastie, and Rob Tibshirani advise the H2O team on building scalable machine learning algorithms. And with hundreds of meetups over the past several years, H2O continues to remain a word-of-mouth phenomenon.

Try it out

Download H2O directly at http://h2o.ai/download.

Install H2O's R package from CRAN at https://cran.r-project.org/web/packages/h2o/.

Install the Python package from PyPI at https://pypi.python.org/pypi/h2o/.

Join the community

To learn about our training sessions, hackathons, and product updates, visit http://h2o.ai.

To learn about our meetups, visit https://www.meetup.com/topics/h2o/all/.

Have questions? Post them on Stack Overflow using the h2o tag at https://stackoverflow.com/questions/tagged/h2o.

Have a Google account (such as Gmail or Google+)? Join the open source community forum at https://groups.google.com/d/forum/h2ostream.

Join the chat at https://gitter.im/h2oai/h2o-3.

Installation

H2O requires Java; if you do not already have Java installed, install it from https://www.java.com/en/download/ before installing H2O.

The easiest way to directly install H2O is via an R or Python package.

Installation in R

To load a recent H2O package from CRAN, run:

install.packages("h2o")

Note: The version of H2O in CRAN may be one release behind the current version.

For the latest recommended version, download the latest stable H2O-3 build from the H2O download page:

Go to http://h2o.ai/download.

Choose the latest stable H2O-3 build.

Click the "Install in R" tab.

Copy and paste the commands into your R session.

After H2O is installed on your system, verify the installation:

library(h2o)

# Start H2O on your local machine using all available cores.
# By default, CRAN policies limit use to only 2 cores.
h2o.init(nthreads = -1)

# Get help
?h2o.glm
?h2o.gbm
?h2o.deeplearning

# Show a demo
demo(h2o.glm)
demo(h2o.gbm)
demo(h2o.deeplearning)

Installation in Python

To load a recent H2O package from PyPI, run:

pip install h2o

To download the latest stable H2O-3 build from the H2O download page:

Go to http://h2o.ai/download.

Choose the latest stable H2O-3 build.

Click the "Install in Python" tab.

Copy and paste the commands into your Python session.

After H2O is installed, verify the installation:

import h2o

# Start H2O on your local machine
h2o.init()

# Get help
help(h2o.estimators.glm.H2OGeneralizedLinearEstimator)
help(h2o.estimators.gbm.H2OGradientBoostingEstimator)
help(h2o.estimators.deeplearning.H2ODeepLearningEstimator)

# Show a demo
h2o.demo("glm")
h2o.demo("gbm")
h2o.demo("deeplearning")

Pointing to a Different H2O Cluster

The instructions in the previous sections create a one-node H2O cluster on your local machine.

To connect to an established H2O cluster (in a multi-node Hadoop environment, for example), specify the IP address and port number for the established cluster using the ip and port parameters in the h2o.init() command. The syntax for this function is identical for R and Python:

h2o.init(ip = "123.45.67.89", port = 54321)

Example Code

R and Python code for the examples in this document can be found here:

https://github.com/h2oai/h2o-3/tree/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette_code_examples


The document source itself can be found here:

https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette.tex

Citation

To cite this booklet, use the following:

Candel, A., Parmar, V., LeDell, E., and Arora, A. (Oct 2021). Deep Learning with H2O. http://h2o.ai/resources.

Deep Learning Overview

Unlike the neural networks of the past, modern Deep Learning provides training stability, generalization, and scalability with big data. Since it performs quite well in a number of diverse problems, Deep Learning is quickly becoming the algorithm of choice for the highest predictive accuracy.

The first section is a brief overview of deep neural networks for supervised learning tasks. There are several theoretical frameworks for Deep Learning, but this document focuses primarily on the feedforward architecture used by H2O.

The basic unit in the model is the neuron, a biologically inspired model of the human neuron. In humans, the varying strengths of the neurons' output signals travel along the synaptic junctions and are then aggregated as input for a connected neuron's activation.

In the model, the weighted combination $\alpha = \sum_{i=1}^{n} w_i x_i + b$ of input signals is aggregated, and then an output signal $f(\alpha)$ is transmitted by the connected neuron. The function $f$ represents the nonlinear activation function used throughout the network, and the bias $b$ represents the neuron's activation threshold.

Multi-layer, feedforward neural networks consist of many layers of interconnected neuron units, starting with an input layer to match the feature space, followed by multiple layers of nonlinearity, and ending with a linear regression or classification layer to match the output space. The inputs and outputs of the model's units follow the basic logic of the single neuron described above.

Bias units are included in each non-output layer of the network. The weights linking neurons and biases with other neurons fully determine the output of the entire network. Learning occurs when these weights are adapted to minimize the error on the labeled training data. More specifically, for each training example j, the objective is to minimize a loss function,

L(W,B|j).

Here, W is the collection $\{W_i\}_{1:N-1}$, where $W_i$ denotes the weight matrix connecting layers i and i+1 for a network of N layers. Similarly, B is the collection $\{b_i\}_{1:N-1}$, where $b_i$ denotes the column vector of biases for layer i+1.

This basic framework of multi-layer neural networks can be used to accomplish Deep Learning tasks. Deep Learning architectures are models of hierarchical feature extraction, typically involving multiple levels of nonlinearity. Deep Learning models are able to learn useful representations of raw data and have exhibited high performance on complex data such as images, speech, and text (Bengio, 2009).

H2O's Deep Learning Architecture

H2O follows the model of multi-layer, feedforward neural networks for predictive modeling. This section provides a more detailed description of H2O's Deep Learning features, parameter configurations, and computational implementation.


Summary of Features

H2O's Deep Learning functionalities include:

supervised training protocol for regression and classification tasks

fast and memory-efficient Java implementations based on columnar compression and fine-grain MapReduce

multi-threaded and distributed parallel computation that can be run on a single or a multi-node cluster

automatic, per-neuron, adaptive learning rate for fast convergence

optional specification of learning rate, annealing, and momentum options

regularization options such as L1, L2, dropout, Hogwild!, and model averaging to prevent model overfitting

elegant and intuitive web interface (Flow)

fully scriptable R API from H2O's CRAN package

fully scriptable Python API

grid search for hyperparameter optimization and model selection

automatic early stopping based on convergence of user-specified metrics to user-specified tolerance

model checkpointing for reduced run times and model tuning

automatic pre- and post-processing for categorical and numerical data

automatic imputation of missing values (optional)

automatic tuning of communication vs. computation for best performance

model export in plain Java code for deployment in production environments

additional expert parameters for model tuning

deep autoencoders for unsupervised feature learning and anomaly detection

Training Protocol

The training protocol described below follows many of the ideas and advances discussed in recent Deep Learning literature.

Initialization

Various Deep Learning architectures employ a combination of unsupervised pre-training followed by supervised training, but H2O uses a purely supervised training protocol. The default initialization scheme is the uniform adaptive option, which is an optimized initialization based on the size of the network. Deep Learning can also be started using a random initialization drawn from either a uniform or normal distribution, optionally specifying a scaling parameter.
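For illustration, a minimal sketch in the Python API (where train is an H2OFrame and x, y are placeholder predictor and response column names) of requesting a non-default random initialization:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Sketch: draw initial weights from a Normal distribution with a small scale
# instead of the default uniform adaptive scheme.
dl = H2ODeepLearningEstimator(
    hidden=[200, 200],
    initial_weight_distribution="Normal",  # default is "UniformAdaptive"; "Uniform" is also available
    initial_weight_scale=0.01)             # scaling parameter for the random draw
dl.train(x=x, y=y, training_frame=train)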

Activation and Loss Functions

The choices for the nonlinear activation function f described in the introduction are summarized in Table 1 below. Here, $x_i$ and $w_i$ represent the firing neuron's input values and their weights, respectively; α denotes the weighted combination $\alpha = \sum_i w_i x_i + b$.

Table 1: Activation Functions

Function            Formula                                                                     Range
Tanh                $f(\alpha) = \frac{e^{\alpha} - e^{-\alpha}}{e^{\alpha} + e^{-\alpha}}$     $f(\cdot) \in [-1, 1]$
Rectified Linear    $f(\alpha) = \max(0, \alpha)$                                               $f(\cdot) \in \mathbb{R}_+$
Maxout              $f(\alpha_1, \alpha_2) = \max(\alpha_1, \alpha_2)$                          $f(\cdot) \in \mathbb{R}$

The tanh function is a rescaled and shifted logistic function; its symmetry around 0 allows the training algorithm to converge faster. The rectified linear activation function has demonstrated high performance on image recognition tasks and is a more biologically accurate model of neuron activations (LeCun et al, 1998).

Maxout is a generalization of the Rectified Linear activation, where each neuron picks the largest output of k separate channels, where each channel has its own weights and bias values. The current implementation supports only k = 2. Maxout activation works particularly well with dropout (Goodfellow et al, 2013). For more information, refer to Regularization.

The Rectifier is the special case of Maxout where the output of one channel is always 0. It is difficult to determine a "best" activation function to use; each may outperform the others in separate scenarios, but grid search models can help to compare activation functions and other parameters. For more information, refer to Grid Search for Model Comparison. The default activation function is the Rectifier. Each of these activation functions can be operated with dropout regularization. For more information, refer to Regularization.
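As a rough sketch in Python (not a full grid search; train, x, and y are placeholder data and column names), the activation function is an ordinary model parameter, so a small loop can compare the options directly:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Sketch: train one small network per activation function and record a
# training metric for a quick comparison.
scores = {}
for act in ["Tanh", "Rectifier", "Maxout"]:
    dl = H2ODeepLearningEstimator(activation=act, hidden=[64, 64], epochs=10)
    dl.train(x=x, y=y, training_frame=train)
    scores[act] = dl.mse()  # substitute a metric appropriate to the task
print(scores)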

Specify one of the following distribution functions for the response variable using the distribution argument:

AUTO

Bernoulli

Multinomial

Poisson

Gamma

Tweedie

Laplace

Quantile

Huber

Gaussian

Each distribution has a primary association with a particular loss function, but some distributions allow users to specify a non-default loss function from the group of loss functions specified in Table 2. Bernoulli and Multinomial are primarily associated with Cross Entropy (also known as log-loss), Gaussian with Mean Squared Error, Laplace with Absolute loss (a special case of Quantile with quantile_alpha = 0.5), and Huber with Huber loss. For Poisson, Gamma, and Tweedie distributions, the loss function cannot be changed, so loss must be set to AUTO.

The system default enforces the table's typical use rule based on whether regression or classification is being performed. Note here that t(j) and o(j) are the predicted (also known as target) output and actual output, respectively, for training example j; further, let y represent the output units and O the output layer.

Table 2: Loss Functions

Mean Squared Error (typical use: regression):
  $L(W,B|j) = \frac{1}{2} \lVert t^{(j)} - o^{(j)} \rVert_2^2$

Absolute (typical use: regression):
  $L(W,B|j) = \lVert t^{(j)} - o^{(j)} \rVert_1$

Huber (typical use: regression):
  $L(W,B|j) = \frac{1}{2} \lVert t^{(j)} - o^{(j)} \rVert_2^2$ for $\lVert t^{(j)} - o^{(j)} \rVert_1 \leq 1$, otherwise $\lVert t^{(j)} - o^{(j)} \rVert_1 - \frac{1}{2}$

Cross Entropy (typical use: classification):
  $L(W,B|j) = -\sum_{y \in O} \left[ \ln(o_y^{(j)}) \cdot t_y^{(j)} + \ln(1 - o_y^{(j)}) \cdot (1 - t_y^{(j)}) \right]$

To predict the 80th percentile of the petal length of the Iris dataset in R, use the following:

Example in R

library(h2o)
h2o.init(nthreads = -1)
train.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
splits <- h2o.splitFrame(train.hex, 0.75, seed = 1234)
dl <- h2o.deeplearning(x = 1:3, y = "petal_len",
                       training_frame = splits[[1]],
                       distribution = "quantile",
                       quantile_alpha = 0.8)
h2o.predict(dl, splits[[2]])

To predict the 80th percentile of the petal length of the Iris dataset in Python, use the following:

Example in Python

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
h2o.init()
train = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
splits = train.split_frame(ratios=[0.75], seed=1234)
dl = H2ODeepLearningEstimator(distribution="quantile", quantile_alpha=0.8)
dl.train(x=list(range(0, 2)), y="petal_len", training_frame=splits[0])
print(dl.predict(splits[1]))

Parallel Distributed Network Training

The process of minimizing the loss function L(W,B|j) is a parallelized version of stochastic gradient descent (SGD). A summary of standard SGD is provided below, with the gradient ?L(W,B|j) computed via backpropagation (LeCun et al, 1998). The constant α is the learning rate, which controls the step sizes during gradient descent.

Standard stochastic gradient descent

1. Initialize W, B
2. Iterate until convergence criterion reached:
   a. Get training example i
   b. Update all weights $w_{jk} \in W$ and biases $b_{jk} \in B$:
      $w_{jk} := w_{jk} - \alpha \frac{\partial L(W,B|j)}{\partial w_{jk}}$
      $b_{jk} := b_{jk} - \alpha \frac{\partial L(W,B|j)}{\partial b_{jk}}$

Stochastic gradient descent is fast and memory-efficient but not easily parallelizable without becoming slow. We utilize Hogwild!, the recently developed lock-free parallelization scheme from Niu et al, 2011, to address this issue.

Hogwild! follows a shared memory model where multiple cores (where each core handles separate subsets or all of the training data) are able to make independent contributions to the gradient updates ?L(W,B|j) asynchronously.

In a multi-node system, this parallelization scheme works on top of H2O's distributed setup that distributes the training data across the cluster. Each node operates in parallel on its local data until the final parameters W, B are obtained by averaging.

Parallel distributed and multi-threaded training with SGD in H2O Deep Learning

1. Initialize global model parameters W, B
2. Distribute training data T across nodes (can be disjoint or replicated)
3. Iterate until convergence criterion reached:
   a. For nodes n with training subset $T_n$, do in parallel:
      i.   Obtain copy of the global model parameters $W_n$, $B_n$
      ii.  Select active subset $T_{na} \subset T_n$ (user-given number of samples per iteration)
      iii. Partition $T_{na}$ into $T_{nac}$ by cores $n_c$
      iv.  For cores $n_c$ on node n, do in parallel:
           1. Get training example $i \in T_{nac}$
           2. Update all weights $w_{jk} \in W_n$ and biases $b_{jk} \in B_n$:
              $w_{jk} := w_{jk} - \alpha \frac{\partial L(W,B|j)}{\partial w_{jk}}$
              $b_{jk} := b_{jk} - \alpha \frac{\partial L(W,B|j)}{\partial b_{jk}}$
   b. Set $W, B := \mathrm{Avg}_n W_n, \mathrm{Avg}_n B_n$
   c. Optionally score the model on (potentially sampled) train/validation scoring sets

Here, the weights and bias updates follow the asynchronous Hogwild! procedure to incrementally adjust each node's parameters $W_n$, $B_n$ after seeing the example i. The $\mathrm{Avg}_n$ notation represents the final averaging of these local parameters across all nodes to obtain the global model parameters and complete training.

Specifying the Number of Training Samples

H2O Deep Learning is scalable and can take advantage of large clusters of compute nodes. There are three operating modes. The default behavior allows every node to train on the entire (replicated) dataset but automatically shuffles (and/or uses a subset of) the training examples for each iteration locally.

For datasets that don't fit into each node's memory (depending on the amount of heap memory specified by the -Xmx Java option), it might not be possible to replicate the data, so each compute node can be specified to train only with local data. An experimental single-node mode is available for cases where final convergence is slow due to the presence of too many nodes, but this has not been necessary in our testing.

To specify the global number of training examples shared with the distributed SGD worker nodes between model averaging, use the train_samples_per_iteration parameter. If the specified value is -1, all nodes process all their local training data on each iteration.

If replicate_training_data is enabled, which is the default setting, this will result in training N epochs (passes over the data) per iteration on N nodes; otherwise, one epoch will be trained per iteration. Specifying 0 always results in one epoch per iteration regardless of the number of compute nodes. In general, this parameter supports any positive number. For large datasets, we recommend specifying a fraction of the dataset.

A value of -2, which is the default value, enables auto-tuning for this parameter based on the computational performance of the processors and the network of the system, and attempts to find a good balance between computation and communication. This parameter can affect the convergence rate during training.

For example, if the training data contains 10 million rows, and the number of training samples per iteration is specified as 100,000 when running on four nodes, then each node will process 25,000 examples per iteration, and it will take 40 distributed iterations to process one epoch.

If the value is too high, it might take too long between synchronization steps and model convergence may be slow. If the value is too low, network communication overhead will dominate the runtime and computational performance will suffer.
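As a hedged sketch of how this looks in the Python API (train, x, and y are placeholders for an H2OFrame and its columns), the per-iteration sample count is an ordinary model parameter:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Sketch: request 100,000 globally shared training samples between model
# averaging steps. Special values: 0 = one epoch per iteration,
# -1 = all local data per iteration, -2 = auto-tuning (the default).
dl = H2ODeepLearningEstimator(
    hidden=[200, 200],
    epochs=10,
    train_samples_per_iteration=100000,
    replicate_training_data=True)  # default; set to False to train on local data only
dl.train(x=x, y=y, training_frame=train)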

Regularization

H2O's Deep Learning framework supports regularization techniques to prevent overfitting. $\ell_1$ (L1: Lasso) and $\ell_2$ (L2: Ridge) regularization enforce the same penalties as they do with other models: modifying the loss function so as to minimize loss:

$L'(W,B|j) = L(W,B|j) + \lambda_1 R_1(W,B|j) + \lambda_2 R_2(W,B|j)$

For $\ell_1$ regularization, $R_1(W,B|j)$ is the sum of all $\ell_1$ norms for the weights and biases in the network; $\ell_2$ regularization via $R_2(W,B|j)$ represents the sum of squares of all the weights and biases in the network. The constants $\lambda_1$ and $\lambda_2$ are generally specified as very small (for example, $10^{-5}$).

The second type of regularization available for Deep Learning is a modern innovation called dropout (Hinton et al., 2012). Dropout constrains the online optimization so that during forward propagation for a given training example, each neuron in the network suppresses its activation with probability P, which is usually less than 0.2 for input neurons and up to 0.5 for hidden neurons.

There are two effects: as with $\ell_2$ regularization, the network weight values are scaled toward 0. Although they share the same global parameters, each training example trains a different model. As a result, dropout allows an exponentially large number of models to be averaged as an ensemble to help prevent overfitting and improve generalization.

If the feature space is large and noisy, specifying an input dropout using the input_dropout_ratio parameter can be especially useful. Note that input dropout can be specified independently of the dropout specification in the hidden layers (which requires activation to be TanhWithDropout, MaxoutWithDropout, or RectifierWithDropout). Specify the amount of hidden dropout per hidden layer using the hidden_dropout_ratios parameter, which is set to 0.5 by default.
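A minimal sketch of these options in the Python API (placeholder train, x, y; values chosen only for illustration):

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Sketch: combine small L1/L2 penalties with input and hidden dropout.
# Hidden dropout requires one of the *WithDropout activations, and one
# ratio is supplied per hidden layer.
dl = H2ODeepLearningEstimator(
    activation="RectifierWithDropout",
    hidden=[128, 128],
    l1=1e-5,                           # L1 (lasso) penalty
    l2=1e-5,                           # L2 (ridge) penalty
    input_dropout_ratio=0.1,           # dropout on the input layer
    hidden_dropout_ratios=[0.5, 0.5])  # dropout per hidden layer (0.5 is the default)
dl.train(x=x, y=y, training_frame=train)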

Advanced Optimization

H2O features manual and automatic advanced optimization modes. The manual mode features include momentum training and learning rate annealing, and the automatic mode features an adaptive learning rate.

Momentum Training

Momentum modifies back-propagation by allowing prior iterations to influence the current update. In particular, a velocity vector, v, is defined to modify the updates as follows:

θ represents the parameters W, B

μ represents the momentum coefficient

α represents the learning rate

$v_{t+1} = \mu v_t - \alpha \nabla L(\theta_t)$
$\theta_{t+1} = \theta_t + v_{t+1}$

Using the momentum parameter can aid in avoiding local minima and any associated instability (Sutskever et al, 2014). Too much momentum can lead to instability, so we recommend incrementing the momentum slowly. The parameters that control momentum are momentum_start, momentum_ramp, and momentum_stable.

When using momentum updates, we recommend using the Nesterov accelerated gradient method, which is enabled via the nesterov_accelerated_gradient parameter. This method modifies the updates as follows:

$v_{t+1} = \mu v_t - \alpha \nabla L(\theta_t + \mu v_t)$
$W_{t+1} = W_t + v_{t+1}$

Rate Annealing

During training, the chance of oscillation or "optimum skipping" creates the need for a slower learning rate as the model approaches a minimum. As opposed to specifying a constant learning rate α, learning rate annealing gradually reduces the learning rate $\alpha_t$ to "freeze" into local minima in the optimization landscape (Zeiler, 2012).

For H2O, the annealing rate (rate_annealing) is the inverse of the number of training samples required to divide the learning rate in half (e.g., $10^{-6}$ means that it takes $10^6$ training samples to halve the learning rate).
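For illustration (a sketch under the same placeholder assumptions as above, not a recommended setting), the manual mode combines these learning-rate and momentum options once the adaptive learning rate is disabled:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Sketch: manually controlled optimization. adaptive_rate=False turns off
# ADADELTA so that rate, rate_annealing, and the momentum_* parameters apply.
dl = H2ODeepLearningEstimator(
    adaptive_rate=False,
    rate=0.01,                          # initial learning rate
    rate_annealing=1e-6,                # rate is halved after 1e6 training samples
    momentum_start=0.5,                 # initial momentum
    momentum_ramp=1e6,                  # samples over which momentum ramps up
    momentum_stable=0.99,               # final momentum
    nesterov_accelerated_gradient=True)
dl.train(x=x, y=y, training_frame=train)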

Adaptive Learning

The implemented adaptive learning rate algorithm ADADELTA (Zeiler, 2012) automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence. To simplify hyperparameter search, specify only ρ and ε.

In some cases, a manually controlled (non-adaptive) learning rate and momentum specifications can lead to better results but require a hyperparameter search of up to seven parameters. If the model is built on a topology with many local minima or long plateaus, a constant learning rate may produce sub-optimal results. However, the adaptive learning rate generally produced the best results during our testing, so this option is the default.

The first of two hyperparameters for adaptive learning is ρ (rho). It is similar to momentum and is related to the memory of prior weight updates. Typical values are between 0.9 and 0.999. The second hyperparameter, ε (epsilon), is similar to learning rate annealing during initial training and allows further progress during momentum at later stages. Typical values are between $10^{-10}$ and $10^{-4}$.
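A minimal sketch (same placeholder train, x, y as above) of the default adaptive mode, exposing only the two ADADELTA hyperparameters:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Sketch: adaptive learning rate (the default); only rho and epsilon are tuned.
dl = H2ODeepLearningEstimator(
    adaptive_rate=True,  # default
    rho=0.99,            # memory of prior weight updates, typically 0.9-0.999
    epsilon=1e-8)        # progress/stability term, typically 1e-10 to 1e-4
dl.train(x=x, y=y, training_frame=train)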

Loading Data

Loading a dataset in R or Python for use with H2O is slightly different from the usual methodology. Instead of using data.frame or data.table in R, or pandas.DataFrame
