Technical White Paper
H18597
Dell EMC PowerScale and NVIDIA DGX A100 Systems for Deep Learning
Abstract
This document demonstrates how the Dell EMC Isilon F800 all-flash scale-out NAS and NVIDIA DGX™ A100 systems with NVIDIA® A100 Tensor Core GPUs can be used to accelerate and scale deep learning training workloads.
November 2020
Revisions
Date             Description
November 2020    Initial release
Acknowledgments
Author: Damien Mas
Other: NVIDIA
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly.
This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third party content is updated by the relevant third parties, this document will be revised accordingly.
Copyright © 2020–2021 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [3/31/2021] [Technical White Paper] [H18597]
Table of contents
Revisions
Acknowledgments
Table of contents
Executive summary
Audience
1 Introduction
1.1 Deep learning data flow
2 Solution architecture
2.1 Overview
2.2 Storage: Dell EMC Isilon F800
2.3 Networking
2.3.1 Dell EMC PowerSwitch data center switches
2.3.2 NVIDIA Mellanox SN3700V Ethernet switch for Storage
2.3.3 NVIDIA Mellanox QM8700 InfiniBand switch for GPU Interconnect
2.4 Compute: NVIDIA DGX A100 system
2.5 NVIDIA NGC
2.6 Bill of materials
3 Deep learning training performance and analysis
3.1 Benchmark methodology
3.2 MLPerf Benchmark results
3.3 NVIDIA collective communication library (NCCL)
3.4 Storage-only performance using FIO
4 Solution sizing guidance
5 Conclusion
6 References
Executive summary
Deep learning (DL) techniques have enabled great successes in many fields such as computer vision, natural language processing (NLP), gaming and autonomous driving by enabling a model to learn from existing data and then to make corresponding predictions. The success is due to a combination of improved algorithms, access to larger datasets and increased computational power. To be effective at enterprise scale, the computational intensity of DL requires highly efficient parallel architectures. The choice and design of the system components, carefully selected and tuned for DL use cases, can have a big impact on the speed, accuracy and business value of implementing artificial intelligence (AI) techniques.
In such a demanding environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell Technologies and NVIDIA have established a strong partnership to help organizations fast-track their AI initiatives. Our partnership is built on the philosophy of offering flexibility and informed choice across a broad portfolio which combines best-of-breed GPU-accelerated compute, scale-out storage, and networking.
This paper focuses on how Dell EMC Isilon F800 all-flash scale-out NAS accelerates AI innovation by delivering the performance, scalability and I/O concurrency to complement the requirements of NVIDIA DGX A100 systems for high-performance AI workloads.
Audience
This document is intended for organizations interested in simplifying and accelerating DL solutions with advanced computing and scale-out data management solutions. Solution architects, system administrators and other interested readers within those organizations constitute the target audience.
1 Introduction
DL is an area of AI which uses artificial neural networks to enable computers to accurately recognize complex real-world patterns. These new levels of innovation have applicability across nearly every industry vertical. Some of the early adopters include advanced research, precision medicine, high tech manufacturing, advanced driver assistance systems (ADAS) and autonomous driving. Building on these initial successes, AI initiatives are springing up in various business units, such as manufacturing, customer support, life sciences, marketing, and sales.
Gartner predicts that AI augmentation will generate $2.9 trillion in business value by 2021 alone. Organizations are faced with a multitude of complex choices related to data, analytic skill sets, software stacks, analytic toolkits, and infrastructure components, each with significant implications on the time to market and the value associated with these initiatives.
In such a complex environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell Technologies and NVIDIA have established a strong partnership to help organizations accelerate their AI initiatives. Our partnership is built on the philosophy of offering flexibility and informed choice across an extensive portfolio. Together our technologies provide the foundation for successful AI solutions which drive the development of advanced DL software frameworks, deliver massively parallel compute in the form of NVIDIA GPUs for parallel model training, and provide scale-out file systems to support the concurrency, performance, and capacity requirements of unstructured image and video datasets.
This document focuses on the latest step in the Dell Technologies and NVIDIA collaboration, a new AI reference architecture with Dell EMC Isilon F800 storage and DGX A100 systems for DL workloads. This new offer gives customers more flexibility in how they deploy scalable, high performance DL infrastructure. The results of a standard image classification training benchmark using MLPerf 0.7, and of micro-benchmark utilities, are included.
1.1 Deep learning data flow
As visualized in Figure 1, DL usually consists of two distinct workflows, model development and inference.
Figure 1: Common DL workflows - model development and inference
Note: The Isilon storage and DGX A100 system architecture is optimized for the model development workflow, which consists of the model training and the batch inference validation steps. It is not intended for production inference, nor was it benchmarked for it.
The workflow steps are defined and detailed below.
1. Ingest Labeled Data: The labeled data (e.g. images and their labels which indicate whether the image contains a dog, cat, or horse) are ingested into the DL system.
2. Transform: Transformation includes all operations that are applied to the labeled data before they are passed to the DL algorithm. It is sometimes referred to as preprocessing. For images, this often includes file parsing, JPEG decoding, cropping, resizing, rotation, and color adjustments. Transformations can be performed on the entire dataset ahead of time, storing the transformed data on disk. Many transformations can also be applied in a training pipeline, avoiding the need to store the intermediate data.
3. Train Model: The model parameters (edge weights) are learned from the labeled data using the stochastic gradient descent optimization method. In the case of image classification, there are several prebuilt structures of neural networks that have been shown to work well.
4. Validate Model: Once the model training phase completes with a satisfactory accuracy, the next step is to measure its accuracy on validation data, that is, data that the model training process has not seen. This is done by using the trained model to make inferences from the validation data and comparing the result with the correct label. This is often referred to as inference, but keep in mind that this is a distinct step from production inference.
5. Production Inference: The trained and validated model is then often deployed to a system that can perform real-time inference. It will accept as input a single image and output the predicted class (dog, cat, horse). In some cases, inputs are batched for higher throughput but higher latency.
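The validation step (step 4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's code: `validation_accuracy`, `toy_predict`, and the sample data are hypothetical names and values introduced only for this example, and the "model" is a stand-in rule rather than a trained network.

```python
from typing import Callable, Sequence


def validation_accuracy(
    predict: Callable[[str], str],
    images: Sequence[str],
    labels: Sequence[str],
) -> float:
    """Return the fraction of validation images whose prediction matches the label."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    if not images:
        raise ValueError("validation set must not be empty")
    # Compare each inference result with the correct label.
    correct = sum(1 for image, label in zip(images, labels) if predict(image) == label)
    return correct / len(images)


def toy_predict(image: str) -> str:
    """Hypothetical stand-in for a trained model: classifies by a keyword."""
    return "dog" if "bark" in image else "cat"


# Hypothetical validation data that the "training process" has not seen.
val_images = ["bark-1", "meow-1", "bark-2", "neigh-1"]
val_labels = ["dog", "cat", "dog", "horse"]

accuracy = validation_accuracy(toy_predict, val_images, val_labels)
print(f"validation accuracy: {accuracy:.2f}")
```

The stand-in model never predicts "horse", so one of the four samples is misclassified; a real validation run would apply the same comparison to the trained network's predictions.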
2 Solution architecture
2.1 Overview
Figure 2 illustrates the reference architecture showing the key components that made up the solution as it was tested and benchmarked. Note that in a customer deployment, the number of DGX A100 systems and F800 storage nodes will vary and can be scaled independently to meet the requirements of the specific DL workloads. Refer to Solution sizing guidance for details.
Figure 2: Reference architecture. Components shown: (4) Dell EMC Isilon F800 nodes in (1) F800 chassis; (2) SN3700V switches; (4) DGX A100 systems; (2) QM8700 InfiniBand switches. Link labels: 40 GbE NFS, 100 GbE NFS, 200 Gb HDR IB, 200 GbE ISL. Notes: back-end 40 GbE switches for F800 are not shown; the QM8700 switches were interconnected with eight ISLs.
2.2 Storage: Dell EMC Isilon F800
Dell EMC Isilon F800 represents the sixth generation of hardware built to run the well-proven and massively scalable Dell EMC PowerScale OneFS operating system. Each Dell EMC Isilon F800 chassis, shown in Figure 3, contains four storage nodes, 60 high-performance solid state drives (SSDs) and eight 40 GbE network connections. OneFS combines up to 252 nodes in 63 chassis into a single high-performance file system designed to handle the most I/O-intense workloads such as DL. As performance and capacity demands increase, the platform can be scaled out simply and non-disruptively, allowing applications and users to continue working.
Figure 3: Dell EMC Isilon F800 chassis, containing four storage nodes
In the solution tested in this document, four F800 nodes, in one chassis, were used.
Dell EMC Isilon F800 has the following features.
• Low latency, high throughput, and massively parallel I/O for AI
  - Up to 250,000 file IOPS per chassis, up to 15.75 million IOPS per cluster
  - Up to 15 GB/s throughput per chassis, up to 945 GB/s per cluster
  - 96 TB to 924 TB raw flash capacity per chassis; up to 58 PB per cluster (all-flash)
  This shortens time for training and testing analytical models for datasets from tens of TBs to tens of PBs on AI platforms such as RAPIDS, TensorFlow, SparkML, Caffe, or proprietary AI platforms.
• The ability to run AI in-place on data using multi-protocol access
  - Multi-protocol support such as SMB, NFS, HTTP, S3, and native HDFS to maximize operational flexibility
  This eliminates the need to migrate/copy data and results over to a separate AI stack. Organizations can perform DL and run other IT apps on the same data already on Isilon by adding additional Isilon nodes to an existing cluster.
• Enterprise grade features out-of-box
  - Enterprise data protection and resiliency
  - Robust security options
  This enables organizations to manage AI data lifecycle with minimal cost and risk, while protecting data and meeting regulatory requirements.
• Extreme scale
  - Seamlessly tier between All Flash, Hybrid, and Archive nodes via Dell EMC PowerScale SmartPools
  - Grow-as-you-go scalability with up to 58 PB flash capacity per cluster
  - New nodes can be added to a cluster simply by connecting power, back-end Ethernet and front-end Ethernet
  - As new nodes are added, storage capacity, throughput, IOPS, cache, and CPU grow
  - Up to 63 chassis (252 nodes) may be connected to form a single cluster with a single namespace and a single coherent cache
  - Up to 85% storage efficiency to reduce costs with Dell EMC PowerScale SmartDedupe software
  - Optional data de-dup and compression enabling up to a 3:1 data reduction
  Organizations can achieve AI at scale in a cost-effective manner, enabling them to handle multi-petabyte datasets with high resolution content without re-architecture and/or performance degradation.
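The per-cluster maxima quoted in the feature list are consistent with multiplying the per-chassis figures by the 63-chassis limit. The short Python sketch below reproduces that arithmetic. It assumes ideal linear scaling, which is an assumption of this example rather than a measured result, and the constant and function names are hypothetical.

```python
# Per-chassis figures quoted in the feature list above.
NODES_PER_CHASSIS = 4
MAX_CHASSIS = 63
IOPS_PER_CHASSIS = 250_000
THROUGHPUT_GB_S_PER_CHASSIS = 15
MAX_RAW_TB_PER_CHASSIS = 924


def cluster_maxima(chassis: int = MAX_CHASSIS) -> dict:
    """Scale the per-chassis maxima to a cluster, assuming ideal linear scaling."""
    if not 1 <= chassis <= MAX_CHASSIS:
        raise ValueError(f"chassis count must be between 1 and {MAX_CHASSIS}")
    return {
        "nodes": chassis * NODES_PER_CHASSIS,
        "iops": chassis * IOPS_PER_CHASSIS,
        "throughput_gb_s": chassis * THROUGHPUT_GB_S_PER_CHASSIS,
        # SI prefixes: 1 PB = 1000 TB.
        "raw_pb": chassis * MAX_RAW_TB_PER_CHASSIS / 1000,
    }


full_cluster = cluster_maxima()
for name, value in full_cluster.items():
    print(f"{name}: {value:,}")
```

At 63 chassis this gives 252 nodes, 15.75 million IOPS, 945 GB/s, and about 58.2 PB of raw flash, matching the quoted cluster limits.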
There are several key features of OneFS that make it an excellent storage system for DL workloads that require performance, concurrency, and scale. These features are listed below.
• Storage Tiering using Dell EMC PowerScale SmartPools software enables multiple levels of performance, protection, and storage density to co-exist within the same file system and unlocks the ability to aggregate and consolidate a wide range of applications within a single extensible, ubiquitous storage resource pool. This helps provide granular performance optimization, workflow isolation, higher utilization, and independent scalability, all with a single point of management. For more details, see Storage Tiering with Dell EMC Isilon SmartPools.
• OneFS caching infrastructure design is predicated on aggregating the cache present on each node in a cluster into one globally accessible pool of memory. This allows all the memory cache in a node to be available to every node in the cluster. OneFS can take advantage of prefetching of data based on heuristics used by the Isilon SmartRead component. This greatly improves sequential-read performance across all protocols and means that reads come directly from RAM within milliseconds. For high-sequential cases, SmartRead can very aggressively prefetch ahead, allowing reads of individual files at very high data rates. For more details, see OneFS SmartFlash.
• OneFS has a fully distributed lock manager that coordinates locks on data across all nodes in a storage cluster. Efficient locking is critical to support the efficient parallel I/O profile demanded by many iterative DL workloads, enabling concurrent file read access up into the millions. For more details, see the OneFS Technical Overview.
2.3 Networking
2.3.1 Dell EMC PowerSwitch data center switches
The benchmark testing in this brief was performed in NVIDIA’s partner facility, and the networking materials mentioned represent the equipment they used during the testing. Dell Technologies offers top-of-rack switches for building high-capacity network fabrics, and core/aggregation switches designed for building optimized data center leaf/spine fabrics of virtually any size. Dell EMC PowerSwitch S- and Z-Series are tested and proven in Dell Technologies’ performance labs, top ranked in industry tests (Tolly and IT Brand Pulse), and are currently deployed in customer data centers around the world.
Learn more about Dell EMC PowerSwitch S- and Z-Series
Dell EMC PowerSwitch Data Center Quick Reference Guide
2.3.2 NVIDIA Mellanox SN3700V Ethernet switch for Storage
The NVIDIA Mellanox SN3700V Ethernet switches provide the high speed “front-end” Ethernet connectivity between the Isilon F800 cluster nodes and NVIDIA DGX A100 systems. The F800 nodes connect with 25 GbE or 40 GbE connections, the DGX A100 systems connect with 100 GbE or 200 GbE connections, and the SN3700 switches automatically forward traffic across the different speed connections with minimal latency. Based on the NVIDIA Spectrum-2 switch ASIC and purpose built for the modern data center, the SN3000 switches combine high performance packet processing, rich data center features, cloud network scale and visibility. A flexible unified buffer ensures fair and predictable performance across any combination of ports and speeds from 10 Gb/s to 200 Gb/s, and an Open Ethernet design supports multiple network OS choices including NVIDIA Cumulus Linux, NVIDIA Onyx, and SONiC.
Learn more about the NVIDIA Mellanox Spectrum SN3000 series switches.
2.3.3 NVIDIA Mellanox QM8700 InfiniBand switch for GPU Interconnect
The NVIDIA Mellanox QM8700 InfiniBand switches provide high-throughput, low-latency networking between the DGX A100 systems. Designed for both EDR 100 Gb/s and HDR 200 Gb/s InfiniBand links, they minimize latency and maximize throughput for all GPU-to-GPU communication between systems. The QM8700 switches support Remote Direct Memory Access (RDMA) and in-network computing offloads for AI and data analytics to enable faster and more efficient data transfers. They support NVIDIA GPUDirect, Mellanox SHARP for network-based AI and analytics offloads (such as MPI AllReduce), and Mellanox SHIELD for maximum resiliency in a self-healing network.
Learn more about the NVIDIA Mellanox Quantum QM8700 InfiniBand switches.
2.4 Compute: NVIDIA DGX A100 system
The DGX A100 system (Figure 4) is a fully integrated, turnkey hardware and software system that is purpose-built for DL workflows. Each DGX A100 system is powered by eight NVIDIA A100 Tensor Core GPUs that are interconnected using NVIDIA NVSwitch™ technology, which provides an ultra-high bandwidth, low-latency fabric for inter-GPU communication. This topology is essential for multi-GPU training, eliminating the bottleneck of PCIe-based interconnects, which cannot deliver linear performance scaling as GPU count increases. The DGX A100 system is also equipped with eight single-port NVIDIA Mellanox ConnectX-6 VPI HDR InfiniBand adapters for clustering and two dual-port ConnectX-6 VPI Ethernet adapters for storage and networking, all capable of 200 Gb/s.
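The adapter counts above translate into nominal aggregate line rates per DGX A100 system. The sketch below is simple arithmetic on the quoted port counts and the 200 Gb/s per-port capability; it gives theoretical line rates, not measured throughput, and the function names are hypothetical.

```python
def aggregate_gbps(adapters: int, ports_per_adapter: int, gbps_per_port: int) -> int:
    """Nominal aggregate line rate in Gb/s across all ports of a set of adapters."""
    return adapters * ports_per_adapter * gbps_per_port


def gbps_to_gbytes_per_sec(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


# Eight single-port HDR InfiniBand adapters for clustering.
compute_fabric_gbps = aggregate_gbps(adapters=8, ports_per_adapter=1, gbps_per_port=200)
# Two dual-port Ethernet adapters for storage and networking.
storage_fabric_gbps = aggregate_gbps(adapters=2, ports_per_adapter=2, gbps_per_port=200)

print(f"compute fabric: {compute_fabric_gbps} Gb/s "
      f"({gbps_to_gbytes_per_sec(compute_fabric_gbps):.0f} GB/s)")
print(f"storage/network: {storage_fabric_gbps} Gb/s "
      f"({gbps_to_gbytes_per_sec(storage_fabric_gbps):.0f} GB/s)")
```

The link speed actually used depends on the switch and cabling configuration, so these values are upper bounds on what a single system could move.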
Figure 4: NVIDIA DGX A100 system with eight NVIDIA A100 Tensor Core GPUs
2.5 NVIDIA NGC
The NVIDIA NGC™ container registry provides researchers, data scientists and developers with simple access to a comprehensive catalog of GPU-accelerated software for AI, DL, machine learning (ML) and HPC that takes full advantage of NVIDIA DGX A100 systems. NGC provides containers for today’s most popular AI frameworks such as RAPIDS, Caffe2, TensorFlow, PyTorch, MXNet and TensorRT, which are optimized for NVIDIA GPUs. The containers integrate the framework or application, necessary drivers, libraries and communications primitives, and they are optimized across the stack by NVIDIA for maximum GPU-accelerated performance. NGC containers incorporate the NVIDIA CUDA® Toolkit, which provides the NVIDIA CUDA Basic Linear Algebra Subroutines Library (cuBLAS), the NVIDIA CUDA Deep Neural Network Library (cuDNN), and much more. The NGC containers also include the NVIDIA Collective Communications Library (NCCL) for multi-GPU and multi-node collective communication primitives, enabling topology awareness for DL training. NCCL enables communication between GPUs inside a single DGX A100 system and across multiple DGX A100 systems.
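To make the idea of a collective communication primitive concrete, the Python sketch below simulates what an all-reduce (sum) computes: every participant contributes a gradient vector and every participant ends up with the element-wise sum. It illustrates the result of the operation only. It does not use NCCL, does not model how NCCL moves data between GPUs, and `all_reduce_sum` is a hypothetical name for this example.

```python
from typing import List


def all_reduce_sum(per_gpu_gradients: List[List[float]]) -> List[List[float]]:
    """Return, for each simulated GPU, the element-wise sum of all GPUs' gradients."""
    if not per_gpu_gradients:
        raise ValueError("at least one gradient vector is required")
    length = len(per_gpu_gradients[0])
    if any(len(grad) != length for grad in per_gpu_gradients):
        raise ValueError("all gradient vectors must have the same length")
    # Reduce: sum the i-th element across all GPUs.
    total = [sum(values) for values in zip(*per_gpu_gradients)]
    # Broadcast: every GPU receives its own copy of the reduced vector.
    return [list(total) for _ in per_gpu_gradients]


# Four simulated GPUs, each holding a two-element gradient.
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reduced = all_reduce_sum(gradients)
print(reduced[0])
```

In data-parallel training each GPU would typically divide the summed gradient by the number of participants before applying the update, so that all replicas stay in step.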
2.6 Bill of materials
Bill of materials
Component                                                          Purpose                  Quantity
Dell EMC Isilon F800                                               Shared storage           1 chassis (4 nodes)
  • 96 TB SSD
  • 1 TB RAM
  • Four 1 GbE, eight 40 GbE interfaces
NVIDIA Mellanox SN3700V 200 Gb Ethernet Switch                     Storage Fabric Switch    2
NVIDIA Mellanox QM8700 InfiniBand HDR Switch                       Compute Fabric Switch    2
NVIDIA DGX A100 system                                             Compute Server           4
  • 8 NVIDIA A100 Tensor Core GPUs with 40 GB
  • Two 64-Core AMD EPYC 7742 @ 3.3 GHz
  • 1 TB RAM
  • 2x Dual-Port NVIDIA Mellanox ConnectX-6 VPI 200 Gb/s Ethernet
  • 8x Single-Port NVIDIA Mellanox ConnectX-6 VPI 200 Gb/s HDR InfiniBand
Software versions that were tested for this document
Component                                        Version
Dell EMC Isilon - OneFS                          Patches: 8.2.1_KGA-RUP_2020-04_268538, 8.2.1_UGA-PATCH-INFRA_2019-11_263088, 8.2.1_UGA-RUP_2020-04_268536
NVIDIA Mellanox SN3700V - NCLU Version           1.0-cl4.2.1u1
NVIDIA Mellanox SN3700V - Distribution Release   4.2.1
NVIDIA Mellanox QM8700 Product Release           3.9.0606
DGX A100 - Base OS                               4.99.11
DGX A100 - Linux kernel                          5.3.0-59-generic
DGX A100 - NVIDIA Driver                         450.51.06
DGX A100 - Ubuntu                                18.04.5 LTS
NVIDIA NGC MXNet Image                           nvcr.io/nvidia/mxnet:20.06-py3
MLPerf Benchmarks                                /mlperf/training_results_v0.7/tree/master/NVIDIA/benchmarks/resnet/implementations/mxnet
3 Deep learning training performance and analysis
3.1 Benchmark methodology
To measure the performance of the solution, the image classification benchmark from the MLPerf Benchmark Suite repository was executed. This benchmark performs training of an image classification convolutional neural network (CNN) on labeled images using MXNet. Essentially, the system learns whether an image contains a cat, dog, car, train, etc. The well-known ILSVRC2012 image dataset (often referred to as ImageNet) was used. This dataset contains 1,281,167 training images in 144.8 GB¹. All images are grouped into 1000 categories or classes. This dataset is commonly used by DL researchers for benchmarking and comparison studies.
The individual JPEG images in the ImageNet dataset were converted to RecordIO format. The images were not resized or normalized, and no other preprocessing was performed on the raw ImageNet JPEG images. The RecordIO dataset retains the image compression offered by the JPEG format, so the total size of the dataset remained roughly the same (148 GB). The average image size was 115 KB.
The benchmark results in this section were obtained with four F800 nodes in the cluster. Each result is the average of five executions.
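The quoted average image size follows directly from the dataset figures above. The sketch below repeats that division in Python using SI units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), as the document does; the constant and function names are hypothetical.

```python
TRAINING_IMAGES = 1_281_167
RAW_JPEG_DATASET_BYTES = 144.8e9   # 144.8 GB of individual JPEG files
RECORDIO_DATASET_BYTES = 148e9     # about 148 GB after conversion to RecordIO


def average_image_kb(total_bytes: float, image_count: int) -> float:
    """Average size per image in kilobytes (SI, 1 KB = 1000 bytes)."""
    if image_count <= 0:
        raise ValueError("image_count must be positive")
    return total_bytes / image_count / 1000


raw_avg_kb = average_image_kb(RAW_JPEG_DATASET_BYTES, TRAINING_IMAGES)
recordio_avg_kb = average_image_kb(RECORDIO_DATASET_BYTES, TRAINING_IMAGES)

print(f"raw JPEG average:  {raw_avg_kb:.1f} KB per image")
print(f"RecordIO average:  {recordio_avg_kb:.1f} KB per image")
```

The RecordIO figure comes out at roughly 115.5 KB per image, in line with the stated 115 KB average, and the raw JPEG figure is about 113 KB.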
3.2 MLPerf Benchmark results
There are a few conclusions that we can make from the benchmarks represented in Figure 5.
• Image throughput, and therefore storage throughput, scales linearly from 8 to 32 GPUs.
• The difference between Epoch 0 (when the data is pulled from storage and cached) and Overall is minor, so the storage is not a bottleneck.
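The link between image throughput and storage throughput can be estimated by multiplying the image rate by the average image size. The sketch below does this for 80,000 images/sec, which is the top of the Figure 5 axis and is used here as an illustrative upper bound, not as a measured result. The 115 KB average comes from section 3.1, and the function name is hypothetical.

```python
AVG_IMAGE_BYTES = 115_000  # 115 KB average image size (SI units)


def storage_read_gb_per_sec(images_per_sec: float,
                            avg_image_bytes: int = AVG_IMAGE_BYTES) -> float:
    """Estimate the storage read rate in GB/s needed to sustain an image rate.

    Assumes every image is read from storage once per pass, with no client-side
    caching, which corresponds to the Epoch 0 case.
    """
    if images_per_sec < 0:
        raise ValueError("images_per_sec must be non-negative")
    return images_per_sec * avg_image_bytes / 1e9


# Illustrative upper bound: the top of the Figure 5 images/sec axis.
upper_bound_gb_s = storage_read_gb_per_sec(80_000)
print(f"estimated read rate at 80,000 images/sec: {upper_bound_gb_s:.1f} GB/s")
```

Under these assumptions the estimate is about 9.2 GB/s, which is below the up to 15 GB/s per-chassis throughput quoted in section 2.2.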
1 All unit prefixes in this document use the SI standard (base 10), where 1 GB is 1 billion bytes.
Figure 5: ResNet-50 training throughput in images/sec (axis range 0 to 80,000) at 8, 16, and 32 GPUs, with two series: Overall and Epoch 0.