Technical White Paper

H18597

Dell EMC PowerScale and NVIDIA DGX A100 Systems for Deep Learning

Abstract

This document demonstrates how the Dell EMC Isilon F800 all-flash scale-out NAS and NVIDIA DGX A100 systems with NVIDIA A100 Tensor Core GPUs can be used to accelerate and scale deep learning training workloads.

November 2020

Revisions

Date             Description
November 2020    Initial release

Acknowledgments

Author: Damien Mas

Other: NVIDIA

The information in this publication is provided "as is." Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any software described in this publication requires an applicable software license.

This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly.

This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third party content is updated by the relevant third parties, this document will be revised accordingly.

Copyright © 2020–2021 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [3/31/2021] [Technical White Paper] [H18597]

Table of contents

Revisions
Acknowledgments
Table of contents
Executive summary
Audience
1 Introduction
  1.1 Deep learning dataflow
2 Solution architecture
  2.1 Overview
  2.2 Storage – Dell EMC Isilon F800
  2.3 Networking
    2.3.1 Dell EMC PowerSwitch data center switches
    2.3.2 NVIDIA Mellanox SN3700V Ethernet switch for Storage
    2.3.3 NVIDIA Mellanox QM8700 InfiniBand switch for GPU Interconnect
  2.4 Compute: NVIDIA DGX A100 system
  2.5 NVIDIA NGC
  2.6 Bill of materials
3 Deep learning training performance and analysis
  3.1 Benchmark methodology
  3.2 MLPerf Benchmark results
  3.3 NVIDIA collective communication library (NCCL)
  3.4 Storage-only performance using FIO
4 Solution sizing guidance
5 Conclusion
6 References

Executive summary

Deep learning (DL) techniques have enabled great successes in many fields such as computer vision, natural language processing (NLP), gaming and autonomous driving by enabling a model to learn from existing data and then to make corresponding predictions. The success is due to a combination of improved algorithms, access to larger datasets and increased computational power. To be effective at enterprise scale, the computational intensity of DL requires highly efficient parallel architectures. The choice and design of the system components, carefully selected and tuned for DL use cases, can have a big impact on the speed, accuracy and business value of implementing artificial intelligence (AI) techniques.

In such a demanding environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell Technologies and NVIDIA have established a strong partnership to help organizations fast-track their AI initiatives. Our partnership is built on the philosophy of offering flexibility and informed choice across a broad portfolio which combines best-of-breed GPU-accelerated compute, scale-out storage, and networking.

This paper focuses on how Dell EMC Isilon F800 all-flash scale-out NAS accelerates AI innovation by delivering the performance, scalability and I/O concurrency to complement the requirements of NVIDIA DGX A100 systems for high-performance AI workloads.

Audience

This document is intended for organizations interested in simplifying and accelerating DL solutions with advanced computing and scale-out data management solutions. Solution architects, system administrators and other interested readers within those organizations constitute the target audience.

1 Introduction

DL is an area of AI which uses artificial neural networks to enable accurate pattern recognition of complex real-world patterns by computers. These new levels of innovation have applicability across nearly every industry vertical. Some of the early adopters include advanced research, precision medicine, high tech manufacturing, advanced driver assistance systems (ADAS) and autonomous driving. Building on these initial successes, AI initiatives are springing up in various business units, such as manufacturing, customer support, life sciences, marketing, and sales.

Gartner predicts that AI augmentation will generate $2.9 trillion in business value by 2021 alone. Organizations are faced with a multitude of complex choices related to data, analytic skill sets, software stacks, analytic toolkits, and infrastructure components, each with significant implications for the time to market and the value associated with these initiatives.

In such a complex environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell Technologies and NVIDIA have established a strong partnership to help organizations accelerate their AI initiatives. Our partnership is built on the philosophy of offering flexibility and informed choice across an extensive portfolio. Together our technologies provide the foundation for successful AI solutions which drive the development of advanced DL software frameworks, deliver massively parallel compute in the form of NVIDIA GPUs for parallel model training, and provide scale-out file systems to support the concurrency, performance, and capacity requirements of unstructured image and video datasets.

This document focuses on the latest step in the Dell Technologies and NVIDIA collaboration, a new AI reference architecture with Dell EMC Isilon F800 storage and DGX A100 systems for DL workloads. This new offer gives customers more flexibility in how they deploy scalable, high-performance DL infrastructure. The results of a standard image classification training benchmark using MLPerf 0.7, and of micro-benchmark utilities, are included.

1.1 Deep learning dataflow

As visualized in Figure 1, DL usually consists of two distinct workflows: model development and inference.

Figure 1: Common DL workflows - model development and inference

Note: The Isilon storage and DGX A100 system architecture is optimized for the model development workflow, which consists of the model training and the batch inference validation steps. It is not intended for, nor was it benchmarked for, production inference.

The workflow steps are defined and detailed below.

1. Ingest Labeled Data: The labeled data (e.g. images and their labels which indicate whether the image contains a dog, cat, or horse) are ingested into the DL system.

2. Transform: Transformation includes all operations that are applied to the labeled data before they are passed to the DL algorithm. It is sometimes referred to as preprocessing. For images, this often includes file parsing, JPEG decoding, cropping, resizing, rotation, and color adjustments. Transformations can be performed on the entire dataset ahead of time, storing the transformed data on disk. Many transformations can also be applied in a training pipeline, avoiding the need to store the intermediate data.

3. Train Model: The model parameters (edge weights) are learned from the labeled data using the stochastic gradient descent optimization method. In the case of image classification, there are several prebuilt structures of neural networks that have been shown to work well.

4. Validate Model: Once the model training phase completes with a satisfactory accuracy, you'll want to measure the accuracy of it on validation data, that is, data that the model training process has not seen. This is done by using the trained model to make inferences from the validation data and comparing the result with the correct label. This is often referred to as inference, but keep in mind that this is a distinct step from production inference.

5. Production Inference: The trained and validated model is then often deployed to a system that can perform real-time inference. It will accept as input a single image and output the predicted class (dog, cat, horse). In some cases, inputs are batched for higher throughput but higher latency.
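
The Validate Model step above amounts to comparing the predicted class with the correct label for each held-out sample. The following is a minimal sketch of that comparison, assuming the predictions and labels are already available as plain Python lists. The function name `top1_accuracy` and the sample values are illustrative assumptions and are not taken from the benchmark code.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Return the fraction of validation samples whose predicted class equals the label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("validation set must not be empty")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical validation samples (data the training process has not seen).
labels = ["dog", "cat", "horse", "dog"]
predictions = ["dog", "cat", "dog", "dog"]

accuracy = top1_accuracy(predictions, labels)
print(f"top-1 accuracy: {accuracy:.2f}")  # 3 of 4 correct -> 0.75
```

In a real pipeline the predictions would come from running the trained model over the validation set in batches; only the comparison logic is shown here.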

2 Solution architecture

2.1 Overview

Figure 2 illustrates the reference architecture showing the key components that made up the solution as it was tested and benchmarked. Note that in a customer deployment, the number of DGX A100 systems and F800 storage nodes will vary and can be scaled independently to meet the requirements of the specific DL workloads. Refer to Solution sizing guidance for details.

Figure 2: Reference architecture. Components shown: (4) Dell EMC Isilon F800 nodes in (1) F800 chassis, (2) SN3700V switches, (4) DGX A100 systems, and (2) QM8700 InfiniBand switches. Link labels: 40 GbE NFS, 100 GbE NFS, 200 GbE ISL, and 200 Gb HDR IB.

Note: Back-end 40 GbE switches for F800 are not shown.

Note: The QM8700 switches were interconnected with eight ISLs.

2.2 Storage – Dell EMC Isilon F800

Dell EMC Isilon F800 represents the sixth generation of hardware built to run the well-proven and massively scalable Dell EMC PowerScale OneFS operating system. Each Dell EMC Isilon F800 chassis, shown in Figure 3, contains four storage nodes, 60 high-performance solid state drives (SSDs) and eight 40 GbE network connections. OneFS combines up to 252 nodes in 63 chassis into a single high-performance file system designed to handle the most I/O-intense workloads such as DL. As performance and capacity demands increase, the platform can be scaled out simply and non-disruptively, allowing applications and users to continue working.

Figure 3: Dell EMC Isilon F800 chassis, containing four storage nodes

In the solution tested in this document, four F800 nodes, in one chassis, were used.

Dell EMC Isilon F800 has the following features.

• Low latency, high throughput, and massively parallel I/O for AI
  - Up to 250,000 file IOPS per chassis, up to 15.75 million IOPS per cluster
  - Up to 15 GB/s throughput per chassis, up to 945 GB/s per cluster
  - 96 TB to 924 TB raw flash capacity per chassis; up to 58 PB per cluster (all-flash)

  This shortens time for training and testing analytical models for datasets from tens of TBs to tens of PBs on AI platforms such as RAPIDS, TensorFlow, SparkML, Caffe, or proprietary AI platforms.

• The ability to run AI in-place on data using multi-protocol access
  - Multi-protocol support such as SMB, NFS, HTTP, S3, and native HDFS to maximize operational flexibility

  This eliminates the need to migrate/copy data and results over to a separate AI stack. Organizations can perform DL and run other IT apps on the same data already on Isilon by adding additional Isilon nodes to an existing cluster.

• Enterprise grade features out-of-box
  - Enterprise data protection and resiliency
  - Robust security options

  This enables organizations to manage AI data lifecycle with minimal cost and risk, while protecting data and meeting regulatory requirements.

• Extreme scale
  - Seamlessly tier between All Flash, Hybrid, and Archive nodes via Dell EMC PowerScale SmartPools
  - Grow-as-you-go scalability with up to 58 PB flash capacity per cluster
  - New nodes can be added to a cluster simply by connecting power, back-end Ethernet and front-end Ethernet
  - As new nodes are added, storage capacity, throughput, IOPS, cache, and CPU grow
  - Up to 63 chassis (252 nodes) may be connected to form a single cluster with a single namespace and a single coherent cache
  - Up to 85% storage efficiency to reduce costs with Dell EMC PowerScale SmartDedupe software
  - Optional data de-dup and compression enabling up to a 3:1 data reduction

  Organizations can achieve AI at scale in a cost-effective manner, enabling them to handle multi-petabyte datasets with high resolution content without re-architecture and/or performance degradation.
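
The per-cluster maxima quoted in the feature list follow from multiplying the per-chassis maxima by the 63-chassis cluster limit. The sketch below reproduces that arithmetic. It assumes the "up to" figures scale linearly with chassis count, which is a simplification for illustration and not a sizing method from this document. The constant and function names are illustrative.

```python
# Per-chassis "up to" figures and cluster limits, as quoted in the feature list.
CHASSIS_PER_CLUSTER = 63
NODES_PER_CHASSIS = 4
IOPS_PER_CHASSIS = 250_000
THROUGHPUT_GB_S_PER_CHASSIS = 15
RAW_TB_PER_CHASSIS_MAX = 924


def cluster_maxima(chassis: int = CHASSIS_PER_CLUSTER) -> dict:
    """Scale the per-chassis maxima linearly to a cluster of `chassis` chassis (assumption)."""
    if not 1 <= chassis <= CHASSIS_PER_CLUSTER:
        raise ValueError("chassis count must be between 1 and 63")
    return {
        "nodes": chassis * NODES_PER_CHASSIS,
        "iops": chassis * IOPS_PER_CHASSIS,
        "throughput_gb_s": chassis * THROUGHPUT_GB_S_PER_CHASSIS,
        # SI units: 1 PB = 1000 TB
        "raw_capacity_pb": chassis * RAW_TB_PER_CHASSIS_MAX / 1000,
    }


full_cluster = cluster_maxima()
for name, value in full_cluster.items():
    print(f"{name}: {value}")
```

For the full 63-chassis cluster this yields 252 nodes, 15.75 million IOPS, 945 GB/s and roughly 58 PB, matching the figures listed above.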

There are several key features of OneFS that make it an excellent storage system for DL workloads that require performance, concurrency, and scale. These features are listed below.

• Storage Tiering using Dell EMC PowerScale SmartPools software enables multiple levels of performance, protection, and storage density to co-exist within the same file system and unlocks the ability to aggregate and consolidate a wide range of applications within a single extensible, ubiquitous storage resource pool. This helps provide granular performance optimization, workflow isolation, higher utilization, and independent scalability, all with a single point of management. For more details, see Storage Tiering with Dell EMC Isilon SmartPools.

• OneFS caching infrastructure design is predicated on aggregating the cache present on each node in a cluster into one globally accessible pool of memory. This allows all the memory cache in a node to be available to every node in the cluster. OneFS can take advantage of prefetching of data based on heuristics used by the Isilon SmartRead component. This greatly improves sequential-read performance across all protocols and means that reads come directly from RAM within milliseconds. For high-sequential cases, SmartRead can very aggressively prefetch ahead, allowing reads of individual files at very high data rates. For more details, see OneFS SmartFlash.

• OneFS has a fully distributed lock manager that coordinates locks on data across all nodes in a storage cluster. Efficient locking is critical to support the efficient parallel I/O profile demanded by many iterative DL workloads, enabling concurrent file read access up into the millions. For more details, see the OneFS Technical Overview.

2.3 Networking

2.3.1 Dell EMC PowerSwitch data center switches

The benchmark testing in this brief was performed in NVIDIA's partner facility and the networking materials mentioned represent the equipment they used during the testing. Dell Technologies offers top-of-rack switches built for building high-capacity network fabrics, and core/aggregation switches designed for building optimized data center leaf/spine fabrics of virtually any size. Dell EMC PowerSwitch S- and Z-Series are tested and proven in Dell Technologies' performance labs, top ranked in industry tests (Tolly and IT Brand Pulse), and are currently deployed in customer data centers around the world.

Learn more about Dell EMC PowerSwitch S- and Z-Series: Dell EMC PowerSwitch Data Center Quick Reference Guide

2.3.2 NVIDIA Mellanox SN3700V Ethernet switch for Storage

The NVIDIA Mellanox SN3700V Ethernet switches provide the high-speed "front-end" Ethernet connectivity between the Isilon F800 cluster nodes and NVIDIA DGX A100 systems. The F800 nodes connect with 25 GbE or 40 GbE connections, the DGX A100 systems connect with 100 GbE or 200 GbE connections, and the SN3700V switches automatically forward traffic across the different speed connections with minimal latency. Based on the NVIDIA Spectrum-2 switch ASIC and purpose-built for the modern data center, the SN3000 switches combine high-performance packet processing, rich data center features, cloud network scale and visibility. A flexible unified buffer ensures fair and predictable performance across any combination of ports and speeds from 10 Gb/s to 200 Gb/s, and an Open Ethernet design supports multiple network OS choices including NVIDIA Cumulus Linux, NVIDIA Onyx, and SONiC.

Learn more about the NVIDIA Mellanox Spectrum SN3000 series switches.

2.3.3 NVIDIA Mellanox QM8700 InfiniBand switch for GPU Interconnect

The NVIDIA Mellanox QM8700 InfiniBand switches provide high-throughput, low-latency networking between the DGX A100 systems. Designed for both EDR 100 Gb/s and HDR 200 Gb/s InfiniBand links, they minimize latency and maximize throughput for all GPU-to-GPU communication between systems. The QM8700 switches support Remote Direct Memory Access (RDMA) and in-network computing offloads for AI and data analytics to enable faster and more efficient data transfers. They support NVIDIA GPUDirect, Mellanox SHARP for network-based AI and analytics offloads (such as MPI AllReduce), and Mellanox SHIELD for maximum resiliency in a self-healing network.

Learn more about the NVIDIA Mellanox Quantum QM8700 InfiniBand switches.

2.4 Compute: NVIDIA DGX A100 system

The DGX A100 system (Figure 4) is a fully integrated, turnkey hardware and software system that is purpose-built for DL workflows. Each DGX A100 system is powered by eight NVIDIA A100 Tensor Core GPUs that are interconnected using NVIDIA NVSwitch technology, which provides an ultra-high bandwidth, low-latency fabric for inter-GPU communication. This topology is essential for multi-GPU training, eliminating the bottleneck that is associated with PCIe-based interconnects that cannot deliver linearity of performance as GPU count increases. The DGX A100 system is also equipped with eight single-port NVIDIA Mellanox ConnectX-6 VPI HDR InfiniBand adapters for clustering and two dual-port ConnectX-6 VPI Ethernet adapters for storage and networking, all capable of 200 Gb/s.

Figure 4: NVIDIA DGX A100 system with eight NVIDIA A100 Tensor Core GPUs
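
As a rough illustration of what the eight single-port 200 Gb/s HDR InfiniBand adapters described above add up to, the sketch below computes the nominal aggregate line rate per DGX A100 system. It assumes every port runs at its full nominal rate and ignores encoding and protocol overhead, so it is an upper bound and not a measured figure. The function name is illustrative.

```python
from typing import Tuple


def aggregate_line_rate(ports: int, gbit_per_s_per_port: float) -> Tuple[float, float]:
    """Return (total Gb/s, total GB/s) for identical ports, ignoring protocol overhead."""
    if ports < 0 or gbit_per_s_per_port < 0:
        raise ValueError("ports and rate must be non-negative")
    total_gbit = ports * gbit_per_s_per_port
    return total_gbit, total_gbit / 8  # 8 bits per byte


# Eight single-port HDR InfiniBand adapters per DGX A100 system, 200 Gb/s each.
ib_gbit, ib_gbyte = aggregate_line_rate(ports=8, gbit_per_s_per_port=200)
print(f"nominal clustering bandwidth per system: {ib_gbit:.0f} Gb/s ({ib_gbyte:.0f} GB/s)")
```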

2.5 NVIDIA NGC

The NVIDIA NGC container registry provides researchers, data scientists and developers with simple access to a comprehensive catalog of GPU-accelerated software for AI, DL, machine learning (ML) and HPC that take full advantage of NVIDIA DGX A100 systems. NGC provides containers for today's most popular AI frameworks such as RAPIDS, Caffe2, TensorFlow, PyTorch, MXNet and TensorRT, which are optimized for NVIDIA GPUs. The containers integrate the framework or application, necessary drivers, libraries and communications primitives, and they are optimized across the stack by NVIDIA for maximum GPU-accelerated performance. NGC containers incorporate the NVIDIA CUDA Toolkit, which provides the NVIDIA CUDA Basic Linear Algebra Subroutines Library (cuBLAS), the NVIDIA CUDA Deep Neural Network Library (cuDNN), and much more. The NGC containers also include the NVIDIA Collective Communications Library (NCCL) for multi-GPU and multi-node collective communication primitives, enabling topology awareness for DL training. NCCL enables communication between GPUs inside a single DGX A100 system and across multiple DGX A100 systems.
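
One collective primitive commonly used in data-parallel training is all-reduce, in which every GPU contributes a buffer (for example, its gradients) and every GPU receives the element-wise sum. The sketch below shows only the semantics of that operation in plain Python on the host. It is not the NCCL API and does not reflect how NCCL implements the operation across GPUs and nodes. The function name and gradient values are illustrative assumptions.

```python
from typing import List


def all_reduce_sum(per_rank_values: List[List[float]]) -> List[List[float]]:
    """Naive reference all-reduce: every rank ends up with the element-wise sum."""
    if not per_rank_values:
        raise ValueError("need at least one rank")
    length = len(per_rank_values[0])
    if any(len(values) != length for values in per_rank_values):
        raise ValueError("all ranks must contribute buffers of the same length")
    total = [sum(column) for column in zip(*per_rank_values)]
    # Each rank receives its own copy of the reduced buffer.
    return [list(total) for _ in per_rank_values]


# Hypothetical gradients computed by four GPUs on different mini-batches.
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reduced = all_reduce_sum(gradients)

# Data-parallel training typically averages the summed gradients.
averaged = [value / len(gradients) for value in reduced[0]]
print(reduced[0])  # [16.0, 20.0]
print(averaged)    # [4.0, 5.0]
```

After the reduction, every rank holds identical values, so each can apply the same parameter update and the model replicas stay in sync.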

2.6 Bill of materials

Bill of materials

Component                                                              Purpose                 Quantity
Dell EMC Isilon F800                                                   Shared storage          1 chassis (4 nodes)
  - 96 TB SSD
  - 1 TB RAM
  - Four 1 GbE, eight 40 GbE interfaces
NVIDIA Mellanox SN3700V 200 Gb Ethernet Switch                         Storage Fabric Switch   2
NVIDIA Mellanox QM8700 InfiniBand HDR Switch                           Compute Fabric Switch   2
NVIDIA DGX A100 system                                                 Compute Server          4
  - 8 NVIDIA A100 Tensor Core GPUs with 40 GB
  - Two 64-Core AMD EPYC 7742 @ 3.3 GHz
  - 1 TB RAM
  - 2x Dual-Port NVIDIA Mellanox ConnectX-6 VPI 200 Gb/s Ethernet
  - 8x Single-Port NVIDIA Mellanox ConnectX-6 VPI 200 Gb/s HDR InfiniBand

Software versions that were tested for this document

Component                                       Version
Dell EMC Isilon - OneFS                         Patches: 8.2.1_KGA-RUP_2020-04_268538, 8.2.1_UGA-PATCH-INFRA_2019-11_263088, 8.2.1_UGA-RUP_2020-04_268536
NVIDIA Mellanox SN3700V - NCLU Version          1.0-cl4.2.1u1
NVIDIA Mellanox SN3700V - Distribution Release  4.2.1
NVIDIA Mellanox QM8700 Product Release          3.9.0606
DGX A100 - Base OS                              4.99.11
DGX A100 - Linux kernel                         5.3.0-59-generic
DGX A100 - NVIDIA Driver                        450.51.06
DGX A100 - Ubuntu                               18.04.5 LTS
NVIDIA NGC MXNet Image                          nvcr.io/nvidia/mxnet:20.06-py3
MLPerf Benchmarks                               /mlperf/training_results_v0.7/tree/master/NVIDIA/benchmarks/resnet/implementations/mxnet

3 Deep learning training performance and analysis

3.1 Benchmark methodology

In order to measure the performance of the solution, the image classification benchmark from the MLPerf Benchmark Suite repository was executed. This benchmark performs training of an image classification convolutional neural network (CNN) on labeled images using MXNet. Essentially, the system learns whether an image contains a cat, dog, car, train, etc. The well-known ILSVRC2012 image dataset (often referred to as ImageNet) was used. This dataset contains 1,281,167 training images in 144.8 GB¹. All images are grouped into 1000 categories or classes. This dataset is commonly used by DL researchers for benchmarking and comparison studies.

The individual JPEG images in the ImageNet dataset were converted to RecordIO format. The dataset was not resized or normalized, and no preprocessing was performed on the raw ImageNet JPEG images. The RecordIO files maintain the image compression offered by the JPEG format, and the total size of the dataset remained roughly the same (148 GB). The average image size was 115 KB.

The benchmark results in this section were obtained with four F800 nodes in the cluster. Each result is the average of five executions.
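
The dataset figures above also give a quick way to estimate the storage read rate needed to sustain a given training rate: multiply images per second by the average image size. The sketch below works through that arithmetic under the assumption that every image is read from storage, with no caching. The training rate used in the example is hypothetical and chosen only to illustrate the calculation. The function names are illustrative.

```python
TRAINING_IMAGES = 1_281_167  # ILSVRC2012 training images (from the text)
RECORDIO_BYTES = 148e9       # approximate RecordIO dataset size in SI units (from the text)


def average_image_bytes(total_bytes: float = RECORDIO_BYTES,
                        images: int = TRAINING_IMAGES) -> float:
    """Average stored size of one training image, in bytes."""
    return total_bytes / images


def required_read_gb_per_s(images_per_s: float, image_bytes: float) -> float:
    """Storage read rate in GB/s (SI) needed to feed a training rate with no caching."""
    return images_per_s * image_bytes / 1e9


avg_bytes = average_image_bytes()
print(f"average image size: {avg_bytes / 1e3:.1f} KB")

# Hypothetical training rate, not a benchmark result.
example_rate = 10_000
print(f"{example_rate} images/sec needs about "
      f"{required_read_gb_per_s(example_rate, avg_bytes):.2f} GB/s")
```

The computed average of roughly 115.5 KB per image is consistent with the 115 KB quoted above.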

3.2 MLPerf Benchmark results

There are a few conclusions that we can make from the benchmarks represented in Figure 5.

• Image throughput and therefore storage throughput scale linearly from 8 to 32 GPUs.

• The difference between Epoch 0 (when the data is pulled from storage and cached) and Overall is minor, so the storage is not a bottleneck.
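
Both conclusions can be expressed as simple ratios: scaling efficiency compares the measured speedup with the ideal linear speedup, and the Epoch 0 gap compares first-epoch throughput with overall throughput. The sketch below shows one way to compute them. The throughput values in the example are hypothetical placeholders and are not the values plotted in Figure 5. The function names are illustrative.

```python
def scaling_efficiency(base_gpus: int, base_rate: float, gpus: int, rate: float) -> float:
    """Measured speedup divided by ideal linear speedup; 1.0 means perfectly linear."""
    if base_gpus <= 0 or gpus <= 0 or base_rate <= 0:
        raise ValueError("GPU counts and base rate must be positive")
    ideal_speedup = gpus / base_gpus
    measured_speedup = rate / base_rate
    return measured_speedup / ideal_speedup


def epoch0_gap(overall_rate: float, epoch0_rate: float) -> float:
    """Relative shortfall of Epoch 0 throughput compared with overall throughput."""
    if overall_rate <= 0:
        raise ValueError("overall rate must be positive")
    return (overall_rate - epoch0_rate) / overall_rate


# Hypothetical images/sec values, NOT taken from Figure 5.
efficiency = scaling_efficiency(base_gpus=8, base_rate=20_000, gpus=32, rate=76_000)
gap = epoch0_gap(overall_rate=20_000, epoch0_rate=19_500)
print(f"scaling efficiency from 8 to 32 GPUs: {efficiency:.0%}")
print(f"Epoch 0 shortfall: {gap:.1%}")
```

An efficiency close to 1.0 corresponds to linear scaling, and a small Epoch 0 gap indicates that reading from storage costs little compared with reading cached data.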

¹ All unit prefixes in this document use the SI standard (base 10) where 1 GB is 1 billion bytes.

[Figure 5: chart titled "ResNet-50". Y-axis: Images/Sec, 0 to 80,000. X-axis: GPUs (8, 16, 32). Series: Overall and Epoch 0.]
