Deep Learning with H2O

by Arno Candel & Erin LeDell
with assistance from Viraj Parmar & Anisha Arora
Edited by: Angela Bartz

http://h2o.ai/resources/

Published by H2O.ai, Inc.
2307 Leghorn St.
Mountain View, CA 94043

© 2016-2021 H2O.ai, Inc. All Rights Reserved.

October 2021: Sixth Edition

Photos by © H2O.ai, Inc.

All copyrights belong to their respective owners. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Printed in the United States of America.
Contents

Introduction
    What is H2O?
Installation
    Installation in R
    Installation in Python
    Pointing to a Different H2O Cluster
    Example Code
    Citation
Deep Learning Overview
H2O's Deep Learning Architecture
    Summary of Features
    Training Protocol
        Initialization
        Activation and Loss Functions
        Parallel Distributed Network Training
        Specifying the Number of Training Samples
        Regularization
        Advanced Optimization
            Momentum Training
            Rate Annealing
            Adaptive Learning
        Loading Data
        Data Standardization/Normalization
        Convergence-based Early Stopping
        Time-based Early Stopping
        Additional Parameters
Use Case: MNIST Digit Classification
    MNIST Overview
    Performing a Trial Run
    N-fold Cross-Validation
    Extracting and Handling the Results
    Web Interface
    Variable Importances
    Java Model
    Grid Search for Model Comparison
        Cartesian Grid Search
        Random Grid Search
    Checkpoint Models
    Achieving World-Record Performance
    Computational Performance
Deep Autoencoders
    Nonlinear Dimensionality Reduction
    Use Case: Anomaly Detection
    Stacked Autoencoder
    Unsupervised Pretraining with Supervised Fine-Tuning
Parameters
Common R Commands
Common Python Commands
Acknowledgments
References
Authors
Introduction

This document introduces the reader to Deep Learning with H2O. Examples are written in R and Python. Topics include:

- installation of H2O
- basic Deep Learning concepts
- building deep neural nets in H2O
- how to interpret model output
- how to make predictions
- various implementation details
What is H2O?

H2O.ai is focused on bringing AI to businesses through software. Its flagship product is H2O, the leading open source platform that makes it easy for financial services, insurance companies, and healthcare companies to deploy AI and deep learning to solve complex problems. More than 9,000 organizations and 80,000+ data scientists depend on H2O for critical applications like predictive maintenance and operational intelligence. The company, which was recently named to the CB Insights AI 100, is used by 169 Fortune 500 enterprises, including 8 of the world's 10 largest banks, 7 of the 10 largest insurance companies, and 4 of the top 10 healthcare companies. Notable customers include Capital One, Progressive Insurance, Transamerica, Comcast, Nielsen Catalina Solutions, Macy's, Walgreens, and Kaiser Permanente.

Using in-memory compression, H2O handles billions of data rows in-memory, even with a small cluster. To make it easier for non-engineers to create complete analytic workflows, H2O's platform includes interfaces for R, Python, Scala, Java, JSON, and CoffeeScript/JavaScript, as well as a built-in web interface, Flow. H2O is designed to run in standalone mode, on Hadoop, or within a Spark cluster, and typically deploys within minutes.

H2O includes many common machine learning algorithms, such as generalized linear modeling (linear regression, logistic regression, etc.), Naïve Bayes, principal components analysis, k-means clustering, and word2vec. H2O implements best-in-class algorithms at scale, such as distributed random forest, gradient boosting, and deep learning. H2O also includes a Stacked Ensembles method, which finds the optimal combination of a collection of prediction algorithms using a process known as "stacking." With H2O, customers can build thousands of models and compare the results to get the best predictions.

H2O is nurturing a grassroots movement of physicists, mathematicians, and computer scientists to herald the new wave of discovery with data science by collaborating closely with academic researchers and industrial data scientists. Stanford University giants Stephen Boyd, Trevor Hastie, and Rob Tibshirani advise the H2O team on building scalable machine learning algorithms. And with hundreds of meetups over the past several years, H2O continues to remain a word-of-mouth phenomenon.
Try it out

- Download H2O directly at http://h2o.ai/download.
- Install H2O's R package from CRAN at https://cran.r-project.org/web/packages/h2o/.
- Install the Python package from PyPI at https://pypi.python.org/pypi/h2o/.

Join the community

- To learn about our training sessions, hackathons, and product updates, visit http://h2o.ai.
- To learn about our meetups, visit http://www.meetup.com/topics/h2o/all/.
- Have questions? Post them on Stack Overflow using the h2o tag at http://stackoverflow.com/questions/tagged/h2o.
- Have a Google account (such as Gmail or Google+)? Join the open source community forum at https://groups.google.com/d/forum/h2ostream.
- Join the chat at https://gitter.im/h2oai/h2o-3.
Installation

H2O requires Java; if you do not already have Java installed, install it from https://java.com/en/download/ before installing H2O.

The easiest way to directly install H2O is via an R or Python package.
Installation in R

To load a recent H2O package from CRAN, run:

    install.packages("h2o")

Note: The version of H2O in CRAN may be one release behind the current version.

For the latest recommended version, download the latest stable H2O-3 build from the H2O download page:

1. Go to http://h2o.ai/download.
2. Choose the latest stable H2O-3 build.
3. Click the "Install in R" tab.
4. Copy and paste the commands into your R session.

After H2O is installed on your system, verify the installation:

    library(h2o)

    # Start H2O on your local machine using all available cores.
    # By default, CRAN policies limit use to only 2 cores.
    h2o.init(nthreads = -1)

    # Get help
    ?h2o.glm
    ?h2o.gbm
    ?h2o.deeplearning

    # Show a demo
    demo(h2o.glm)
    demo(h2o.gbm)
    demo(h2o.deeplearning)
Installation in Python

To load a recent H2O package from PyPI, run:

    pip install h2o

To download the latest stable H2O-3 build from the H2O download page:

1. Go to http://h2o.ai/download.
2. Choose the latest stable H2O-3 build.
3. Click the "Install in Python" tab.
4. Copy and paste the commands into your Python session.

After H2O is installed, verify the installation:

    import h2o

    # Start H2O on your local machine
    h2o.init()

    # Get help
    help(h2o.estimators.glm.H2OGeneralizedLinearEstimator)
    help(h2o.estimators.gbm.H2OGradientBoostingEstimator)
    help(h2o.estimators.deeplearning.H2ODeepLearningEstimator)

    # Show a demo
    h2o.demo("glm")
    h2o.demo("gbm")
    h2o.demo("deeplearning")
Pointing to a Different H2O Cluster

The instructions in the previous sections create a one-node H2O cluster on your local machine.

To connect to an established H2O cluster (in a multi-node Hadoop environment, for example), specify the IP address and port number for the established cluster using the ip and port parameters in the h2o.init() command. The syntax for this function is identical for R and Python:

    h2o.init(ip = "123.45.67.89", port = 54321)
Example Code

R and Python code for the examples in this document can be found here:

https://github.com/h2oai/h2o-3/tree/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette_code_examples

The document source itself can be found here:

https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette.tex
Citation

To cite this booklet, use the following:

Candel, A., Parmar, V., LeDell, E., and Arora, A. (Oct 2021). Deep Learning with H2O. http://h2o.ai/resources.
Deep Learning Overview

Unlike the neural networks of the past, modern Deep Learning provides training stability, generalization, and scalability with big data. Since it performs quite well in a number of diverse problems, Deep Learning is quickly becoming the algorithm of choice for the highest predictive accuracy.

The first section is a brief overview of deep neural networks for supervised learning tasks. There are several theoretical frameworks for Deep Learning, but this document focuses primarily on the feedforward architecture used by H2O.

The basic unit in the model is the neuron, a biologically inspired model of the human neuron. In humans, the varying strengths of the neurons' output signals travel along the synaptic junctions and are then aggregated as input for a connected neuron's activation.

In the model, the weighted combination α = Σᵢ₌₁ⁿ wᵢxᵢ + b of input signals is aggregated, and then an output signal f(α) is transmitted by the connected neuron. The function f represents the nonlinear activation function used throughout the network, and the bias b represents the neuron's activation threshold.

Multi-layer, feedforward neural networks consist of many layers of interconnected neuron units, starting with an input layer to match the feature space, followed by multiple layers of nonlinearity, and ending with a linear regression or classification layer to match the output space. The inputs and outputs of the model's units follow the basic logic of the single neuron described above.
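The following minimal NumPy sketch (illustrative only, not H2O code) makes the single-neuron logic concrete: the weighted combination α = Σᵢ wᵢxᵢ + b is computed, then the activation f is applied. The function name and values are assumptions chosen for the example.

    import numpy as np

    def neuron_forward(x, w, b, f=np.tanh):
        # Weighted combination of the input signals: alpha = sum_i w_i * x_i + b
        alpha = np.dot(w, x) + b
        # Output signal f(alpha) transmitted to connected neurons
        return f(alpha)

    x = np.array([0.5, -1.2, 0.3])   # input signals
    w = np.array([0.4, 0.1, -0.7])   # connection weights
    b = 0.2                          # bias: the neuron's activation threshold
    print(neuron_forward(x, w, b))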
Bias units are included in each non-output layer of the network. The weights linking neurons and biases with other neurons fully determine the output of the entire network. Learning occurs when these weights are adapted to minimize the error on the labeled training data. More specifically, for each training example j, the objective is to minimize a loss function, L(W,B|j).

Here, W is the collection {W_i}, i = 1, …, N−1, where W_i denotes the weight matrix connecting layers i and i+1 for a network of N layers. Similarly, B is the collection {b_i}, i = 1, …, N−1, where b_i denotes the column vector of biases for layer i+1.

This basic framework of multi-layer neural networks can be used to accomplish Deep Learning tasks. Deep Learning architectures are models of hierarchical feature extraction, typically involving multiple levels of nonlinearity. Deep Learning models are able to learn useful representations of raw data and have exhibited high performance on complex data such as images, speech, and text (Bengio, 2009).
H2O's Deep Learning Architecture

H2O follows the model of multi-layer, feedforward neural networks for predictive modeling. This section provides a more detailed description of H2O's Deep Learning features, parameter configurations, and computational implementation.
Summary of Features

H2O's Deep Learning functionalities include:

- supervised training protocol for regression and classification tasks
- fast and memory-efficient Java implementations based on columnar compression and fine-grain MapReduce
- multi-threaded and distributed parallel computation that can be run on a single or a multi-node cluster
- automatic, per-neuron, adaptive learning rate for fast convergence
- optional specification of learning rate, annealing, and momentum options
- regularization options such as L1, L2, dropout, Hogwild!, and model averaging to prevent model overfitting
- elegant and intuitive web interface (Flow)
- fully scriptable R API from H2O's CRAN package
- fully scriptable Python API
- grid search for hyperparameter optimization and model selection
- automatic early stopping based on convergence of user-specified metrics to user-specified tolerance
- model checkpointing for reduced run times and model tuning
- automatic pre- and post-processing for categorical and numerical data
- automatic imputation of missing values (optional)
- automatic tuning of communication vs computation for best performance
- model export in plain Java code for deployment in production environments
- additional expert parameters for model tuning
- deep autoencoders for unsupervised feature learning and anomaly detection
Training Protocol

The training protocol described below follows many of the ideas and advances discussed in recent Deep Learning literature.

Initialization

Various Deep Learning architectures employ a combination of unsupervised pre-training followed by supervised training, but H2O uses a purely supervised training protocol. The default initialization scheme is the uniform adaptive option, which is an optimized initialization based on the size of the network. Deep Learning can also be started using a random initialization drawn from either a uniform or normal distribution, optionally specifying a scaling parameter.
Activation and Loss Functions

The choices for the nonlinear activation function f described in the introduction are summarized in Table 1 below. xᵢ and wᵢ represent the firing neuron's input values and their weights, respectively; α denotes the weighted combination α = Σᵢ wᵢxᵢ + b.

Table 1: Activation Functions

    Function            Formula                           Range
    Tanh                f(α) = (eᵅ − e⁻ᵅ) / (eᵅ + e⁻ᵅ)    f(·) ∈ [−1, 1]
    Rectified Linear    f(α) = max(0, α)                  f(·) ∈ ℝ₊
    Maxout              f(α₁, α₂) = max(α₁, α₂)           f(·) ∈ ℝ
The tanh function is a rescaled and shifted logistic function; its symmetry around 0 allows the training algorithm to converge faster. The rectified linear activation function has demonstrated high performance on image recognition tasks and is a more biologically accurate model of neuron activations (LeCun et al, 1998).

Maxout is a generalization of the Rectified Linear activation, where each neuron picks the largest output of k separate channels, where each channel has its own weights and bias values. The current implementation supports only k = 2. Maxout activation works particularly well with dropout (Goodfellow et al, 2013). For more information, refer to Regularization.

The Rectifier is the special case of Maxout where the output of one channel is always 0. It is difficult to determine a "best" activation function to use; each may outperform the others in separate scenarios, but grid search models can help to compare activation functions and other parameters. For more information, refer to Grid Search for Model Comparison. The default activation function is the Rectifier. Each of these activation functions can be operated with dropout regularization. For more information, refer to Regularization.
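To connect Table 1 to code, here is a minimal NumPy sketch of the three activation functions (illustrative only; H2O's actual implementations are in Java):

    import numpy as np

    def tanh(alpha):
        # Rescaled, shifted logistic: (e^a - e^-a) / (e^a + e^-a)
        return np.tanh(alpha)

    def rectified_linear(alpha):
        # max(0, alpha), applied elementwise
        return np.maximum(0.0, alpha)

    def maxout(alpha1, alpha2):
        # Each neuron takes the larger of its k = 2 channel outputs
        return np.maximum(alpha1, alpha2)

    alpha = np.array([-2.0, 0.0, 1.5])
    print(tanh(alpha))
    print(rectified_linear(alpha))
    print(maxout(alpha, -alpha))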
Specify one of the following distribution functions for the response variable using the distribution argument:

- AUTO
- Bernoulli
- Multinomial
- Poisson
- Gamma
- Tweedie
- Laplace
- Quantile
- Huber
- Gaussian

Each distribution has a primary association with a particular loss function, but some distributions allow users to specify a non-default loss function from the group of loss functions specified in Table 2. Bernoulli and multinomial are primarily associated with cross-entropy (also known as log-loss), Gaussian with Mean Squared Error, Laplace with Absolute loss (a special case of Quantile with quantile_alpha = 0.5), and Huber with Huber loss. For Poisson, Gamma, and Tweedie distributions, the loss function cannot be changed, so loss must be set to AUTO.

The system default enforces the table's typical use rule based on whether regression or classification is being performed. Note here that t(j) and o(j) are the predicted (also known as target) output and actual output, respectively, for training example j; further, let y represent the output units and O the output layer.
Table 2: Loss Functions

    Function             Formula                                                Typical use
    Mean Squared Error   L(W,B|j) = ½ ‖t(j) − o(j)‖₂²                           Regression
    Absolute             L(W,B|j) = ‖t(j) − o(j)‖₁                              Regression
    Huber                L(W,B|j) = ½ ‖t(j) − o(j)‖₂²  if ‖t(j) − o(j)‖₁ ≤ 1,   Regression
                         ‖t(j) − o(j)‖₁ − ½  otherwise
    Cross Entropy        L(W,B|j) = −Σ_{y∈O} [ ln(o_y(j)) · t_y(j)              Classification
                         + ln(1 − o_y(j)) · (1 − t_y(j)) ]
To predict the 80th percentile of the petal length of the Iris dataset in R, use the following:

Example in R

    library(h2o)
    h2o.init(nthreads = -1)

    train.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
    splits <- h2o.splitFrame(train.hex, 0.75, seed = 1234)
    dl <- h2o.deeplearning(x = 1:3, y = "petal_len",
                           training_frame = splits[[1]],
                           distribution = "quantile",
                           quantile_alpha = 0.8)
    h2o.predict(dl, splits[[2]])
To predict the 80th percentile of the petal length of the Iris dataset in Python, use the following:

Example in Python

    import h2o
    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    h2o.init()

    train = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
    splits = train.split_frame(ratios=[0.75], seed=1234)
    dl = H2ODeepLearningEstimator(distribution="quantile", quantile_alpha=0.8)
    dl.train(x=list(range(0, 2)), y="petal_len", training_frame=splits[0])
    print(dl.predict(splits[1]))
Parallel Distributed Network Training

The process of minimizing the loss function L(W,B|j) is a parallelized version of stochastic gradient descent (SGD). A summary of standard SGD is provided below, with the gradient ∇L(W,B|j) computed via backpropagation (LeCun et al, 1998). The constant α is the learning rate, which controls the step sizes during gradient descent.

Standard stochastic gradient descent:

1. Initialize W, B.
2. Iterate until convergence criterion reached:
   a. Get training example i.
   b. Update all weights w_jk ∈ W, biases b_jk ∈ B:
      w_jk := w_jk − α ∂L(W,B|j)/∂w_jk
      b_jk := b_jk − α ∂L(W,B|j)/∂b_jk

Stochastic gradient descent is fast and memory-efficient but not easily parallelizable without becoming slow. We utilize Hogwild!, the recently developed lock-free parallelization scheme from Niu et al, 2011, to address this issue.

Hogwild! follows a shared memory model where multiple cores (where each core handles separate subsets or all of the training data) are able to make independent contributions to the gradient updates ∇L(W,B|j) asynchronously.

In a multi-node system, this parallelization scheme works on top of H2O's distributed setup that distributes the training data across the cluster. Each node operates in parallel on its local data until the final parameters W, B are obtained by averaging.
Parallel distributed and multi-threaded training with SGD in H2O Deep Learning:

1. Initialize global model parameters W, B.
2. Distribute training data T across nodes (can be disjoint or replicated).
3. Iterate until convergence criterion reached:
   a. For nodes n with training subset T_n, do in parallel:
      i.   Obtain copy of the global model parameters W_n, B_n.
      ii.  Select active subset T_na ⊂ T_n (user-given number of samples per iteration).
      iii. Partition T_na into T_nac by cores n_c.
      iv.  For cores n_c on node n, do in parallel:
           1. Get training example i ∈ T_nac.
           2. Update all weights w_jk ∈ W_n, biases b_jk ∈ B_n:
              w_jk := w_jk − α ∂L(W,B|j)/∂w_jk
              b_jk := b_jk − α ∂L(W,B|j)/∂b_jk
   b. Set W, B := Avg_n W_n, Avg_n B_n.
   c. Optionally score the model on (potentially sampled) train/validation scoring sets.

Here, the weight and bias updates follow the asynchronous Hogwild! procedure to incrementally adjust each node's parameters W_n, B_n after seeing example i. The Avg_n notation represents the final averaging of these local parameters across all nodes to obtain the global model parameters and complete training.
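As a conceptual illustration of the Hogwild! pattern described above, the following self-contained Python sketch trains a single linear neuron with squared loss using several threads that update a shared weight vector without any locking. This is an assumption-laden toy (synthetic data, made-up names), not H2O's Java implementation, and Python's GIL means it demonstrates the lock-free access pattern rather than real parallel speedup.

    import threading
    import numpy as np

    rng = np.random.default_rng(1234)
    n, d = 20000, 5
    X = rng.normal(size=(n, d))                   # toy training data
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.01 * rng.normal(size=n)    # noisy linear targets

    w = np.zeros(d)   # shared parameters, updated lock-free (Hogwild!)
    alpha = 0.01      # learning rate

    def sgd_worker(rows):
        # Each "core" sweeps its own subset of examples; overlapping,
        # unsynchronized writes to w are tolerated by design.
        for i in rows:
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of 1/2 (x.w - y)^2
            w[:] = w - alpha * grad           # asynchronous update, no lock

    cores = 4
    threads = [threading.Thread(target=sgd_worker, args=(range(c, n, cores),))
               for c in range(cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("max abs error vs true weights:", np.abs(w - w_true).max())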
Specifying the Number of Training Samples

H2O Deep Learning is scalable and can take advantage of large clusters of compute nodes. There are three operating modes. The default behavior allows every node to train on the entire (replicated) dataset but automatically shuffles (and/or uses a subset of) the training examples for each iteration locally.

For datasets that don't fit into each node's memory (depending on the amount of heap memory specified by the -Xmx Java option), it might not be possible to replicate the data, so each compute node can be specified to train only with local data. An experimental single node mode is available for cases where final convergence is slow due to the presence of too many nodes, but this has not been necessary in our testing.

To specify the global number of training examples shared with the distributed SGD worker nodes between model averaging, use the train_samples_per_iteration parameter. If the specified value is -1, all nodes process all their local training data on each iteration.

If replicate_training_data is enabled, which is the default setting, this will result in training N epochs (passes over the data) per iteration on N nodes; otherwise, one epoch will be trained per iteration. Specifying 0 always results in one epoch per iteration regardless of the number of compute nodes. In general, this parameter supports any positive number. For large datasets, we recommend specifying a fraction of the dataset.

A value of -2, which is the default value, enables auto-tuning for this parameter based on the computational performance of the processors and the network of the system and attempts to find a good balance between computation and communication. This parameter can affect the convergence rate during training.

For example, if the training data contains 10 million rows, and the number of training samples per iteration is specified as 100,000 when running on four nodes, then each node will process 25,000 examples per iteration, and it will take 40 distributed iterations to process one epoch.

If the value is too high, it might take too long between synchronization and model convergence may be slow. If the value is too low, network communication overhead will dominate the runtime and computational performance will suffer. A configuration sketch follows below.
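The sketch below shows how this parameter might be set through the h2o Python API, reusing the arithmetic from the example above; the layer sizes and the commented-out frame and column names are assumptions for illustration.

    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    dl = H2ODeepLearningEstimator(
        hidden=[200, 200],
        epochs=10,
        # 100,000 global samples between model averaging steps: on four
        # nodes, each node processes 25,000 examples per iteration, and one
        # epoch over 10 million rows takes 40 distributed iterations.
        train_samples_per_iteration=100000,
        # Alternatives: -2 (default) auto-tunes, -1 processes all local data
        # each iteration, 0 forces exactly one epoch per iteration.
    )
    # dl.train(x=predictors, y=response, training_frame=train)  # frames assumed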
Regularization

H2O's Deep Learning framework supports regularization techniques to prevent overfitting. ℓ₁ (L1: Lasso) and ℓ₂ (L2: Ridge) regularization enforce the same penalties as they do with other models: modifying the loss function so as to minimize loss:

    L′(W,B|j) = L(W,B|j) + λ₁R₁(W,B|j) + λ₂R₂(W,B|j)

For ℓ₁ regularization, R₁(W,B|j) is the sum of all ℓ₁ norms for the weights and biases in the network; ℓ₂ regularization via R₂(W,B|j) represents the sum of squares of all the weights and biases in the network. The constants λ₁ and λ₂ are generally specified as very small (for example, 10⁻⁵).

The second type of regularization available for Deep Learning is a modern innovation called dropout (Hinton et al., 2012). Dropout constrains the online optimization so that during forward propagation for a given training example, each neuron in the network suppresses its activation with probability P, which is usually less than 0.2 for input neurons and up to 0.5 for hidden neurons.

There are two effects: as with ℓ₂ regularization, the network weight values are scaled toward 0. Although they share the same global parameters, each training example trains a different model. As a result, dropout allows an exponentially large number of models to be averaged as an ensemble to help prevent overfitting and improve generalization.

If the feature space is large and noisy, specifying an input dropout using the input_dropout_ratio parameter can be especially useful. Note that input dropout can be specified independently of the dropout specification in the hidden layers (which requires activation to be TanhWithDropout, MaxoutWithDropout, or RectifierWithDropout). Specify the amount of hidden dropout per hidden layer using the hidden_dropout_ratios parameter, which is set to 0.5 by default.
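A hedged configuration sketch of these regularization options through the h2o Python API follows; the layer sizes and penalty values are illustrative assumptions, not recommendations from the text.

    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    dl_reg = H2ODeepLearningEstimator(
        activation="RectifierWithDropout",  # required for hidden dropout
        hidden=[200, 200],
        l1=1e-5,                            # lambda_1: l1 penalty constant
        l2=1e-5,                            # lambda_2: l2 penalty constant
        input_dropout_ratio=0.1,            # usually below 0.2 for inputs
        hidden_dropout_ratios=[0.5, 0.5],   # one ratio per hidden layer
    )
    # dl_reg.train(x=predictors, y=response, training_frame=train)  # frames assumed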
Advanced Optimization

H2O features manual and automatic advanced optimization modes. The manual mode features include momentum training and learning rate annealing, and the automatic mode features an adaptive learning rate.

Momentum Training

Momentum modifies back-propagation by allowing prior iterations to influence the current update. In particular, a velocity vector v is defined to modify the updates as follows, where θ represents the parameters W, B; μ represents the momentum coefficient; and α represents the learning rate:

    v_{t+1} = μ v_t − α ∇L(θ_t)
    θ_{t+1} = θ_t + v_{t+1}

Using the momentum parameter can aid in avoiding local minima and any associated instability (Sutskever et al, 2014). Too much momentum can lead to instability, so we recommend incrementing the momentum slowly. The parameters that control momentum are momentum_start, momentum_ramp, and momentum_stable.

When using momentum updates, we recommend using the Nesterov accelerated gradient method, which uses the nesterov_accelerated_gradient parameter. This method modifies the updates as follows:

    v_{t+1} = μ v_t − α ∇L(θ_t + μ v_t)
    W_{t+1} = W_t + v_{t+1}
Rate Annealing

During training, the chance of oscillation or "optimum skipping" creates the need for a slower learning rate as the model approaches a minimum. As opposed to specifying a constant learning rate α, learning rate annealing gradually reduces the learning rate αₜ to "freeze" into local minima in the optimization landscape (Zeiler, 2012).

For H2O, the annealing rate (rate_annealing) is the inverse of the number of training samples required to divide the learning rate in half (e.g., 10⁻⁶ means that it takes 10⁶ training samples to halve the learning rate). A sketch of a manual-mode configuration follows below.
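The following hedged sketch combines the manual-mode options (momentum plus rate annealing) through the h2o Python API; adaptive_rate must be disabled for them to take effect, and all values are illustrative assumptions.

    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    dl_manual = H2ODeepLearningEstimator(
        hidden=[200, 200],
        adaptive_rate=False,                 # switch to manual mode
        rate=0.01,                           # initial learning rate alpha
        rate_annealing=1e-6,                 # 10^6 samples halve the rate
        momentum_start=0.5,                  # initial momentum coefficient mu
        momentum_ramp=100000,                # samples over which mu increases
        momentum_stable=0.99,                # final momentum coefficient
        nesterov_accelerated_gradient=True,  # recommended with momentum
    )
    # dl_manual.train(x=predictors, y=response, training_frame=train)  # frames assumed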
Adaptive Learning

The implemented adaptive learning rate algorithm ADADELTA (Zeiler, 2012) automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence. To simplify hyperparameter search, specify only ρ and ε.

In some cases, a manually controlled (non-adaptive) learning rate and momentum specifications can lead to better results but require a hyperparameter search of up to seven parameters. If the model is built on a topology with many local minima or long plateaus, a constant learning rate may produce sub-optimal results. However, the adaptive learning rate generally produced the best results during our testing, so this option is the default.

The first of two hyperparameters for adaptive learning is ρ (rho). It is similar to momentum and is related to the memory of prior weight updates. Typical values are between 0.9 and 0.999. The second hyperparameter, ε (epsilon), is similar to learning rate annealing during initial training and allows further progress during momentum at later stages. Typical values are between 10⁻¹⁰ and 10⁻⁴.
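For comparison with the manual-mode sketch above, here is a hedged sketch of the default adaptive mode in the h2o Python API, exposing only the two ADADELTA hyperparameters; the values are illustrative picks within the typical ranges above.

    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    dl_adaptive = H2ODeepLearningEstimator(
        hidden=[200, 200],
        adaptive_rate=True,  # ADADELTA, the default
        rho=0.99,            # memory of prior weight updates (0.9 to 0.999)
        epsilon=1e-8,        # allows later progress (1e-10 to 1e-4)
    )
    # dl_adaptive.train(x=predictors, y=response, training_frame=train)  # frames assumed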
Loading Data

Loading a dataset in R or Python for use with H2O is slightly different from the usual methodology. Instead of using data.frame or data.table in R, or pandas.DataFrame in Python, datasets must be converted into H2O's distributed H2OFrame format, for example with h2o.importFile() in R or h2o.import_file() in Python.