




Optimizing AI/ML Workflows in Python for GPUs
By: Daniel Howard (dhoward@), Consulting Services Group, CISL & NCAR
Date: August 25th, 2022
In this notebook we analyse the overall workflow of typical machine learning/deep learning projects, emphasizing how to work towards optimal performance on GPUs. We will NOT cover the theory of AI or how to implement AI-based projects. We will cover:
Background on machine learning research in Earth sciences
Setting up Python virtual conda environments
The RAPIDS AI software suite
GPU enabled TensorFlow and PyTorch
Enabling tuning and profiling with TensorFlow and PyTorch
Profiling with DLProf/TensorBoard and performance optimizations for NVIDIA Tensor Cores
Workshop Etiquette
Please mute yourself and turn off video during the session.
Questions may be submitted in the chat and will be answered when appropriate. You may also raise your hand, unmute, and ask questions during Q&A at the end of the presentation.
By participating, you are agreeing to UCAR's Code of Conduct.
Recordings & other material will be archived & shared publicly.
Feel free to follow up with the GPU workshop team via Slack or submit support requests to
Office Hours: Asynchronous support via Slack or schedule a time with an organizer
Start a JupyterHub Session
Head to the NCAR JupyterHub portal and start a JupyterHub session on Casper PBS Login Node and open the notebook at 15_OptimizeAIML/15_OptimizeAIML.ipynb. Be sure to clone (if needed) and update/pull the NCAR GPU_workshop directory. You are welcome to use an interactive GPU node for the final few cells of this notebook.
# Use the JupyterHub GitHub GUI on the left panel or the below shell commands
git clone git@:NCAR/GPU_workshop.git
git pull
Notebook Setup
The GPU_TYPE=gp100 nodes do not have tensor cores! Thus, the gpuworkshop queue is not as useful for this session. That said, please set GPU_TYPE=v100 and use the gpudev or casper queue both during the workshop and for independent work. See Casper queue documentation for more info.
Machine Learning and Deep Learning?
ML and DL are statistical models that are designed to learn and predict behavior from a large amount of input training data.
The BAMS article "Outlook for Exploiting Artificial Intelligence in the Earth and Environmental Sciences" by Boukabara, et al highlights additional applications of AI in the Earth Sciences.
Overview of an Earth Science AI Workflow - Remote Sensing
Multiple steps are needed to enable AI for Earth Science. GPUs are critical in the most expensive step, model building and training, since they perform well with matrix algebra, which is foundational to ML methods.
Image: Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review - Part II: Applications by Hoeser, et al
Why Use AI for Earth Science?
Earth Science is largely built on physics based theories and dynamical interactions with the biosphere. Today, these models have scaled to enormous sizes, consuming significant computational resources and data storage.
4 km global runs of E3SM (left) over 100 forecast years use 120M core-hours and 250 GB/forecast day, or 12 PB. 1 km ECMWF runs (right) are described in this article and in Nils Wedi's keynote at ESMD 2020.
AI offers an opportunity to reduce the computational resources required. Feel free to consult A Review of Earth Artificial Intelligence for current "Grand Challenges".
Surrogate Models
Novel ways can be explored to use Earth Science data to reduce required computational resources. A surrogate model in machine learning is a statistical model designed to more efficiently approximate the output of a physics based model.
Image: Introduction to Surrogate Modeling, Shuai Guo. See "Learning Nonlinear Dynamical Systems from Data Using Scientific Machine Learning" by Maulik, ANL.
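To make the surrogate idea concrete, here is a minimal sketch using only the Python standard library. The function expensive_model is a hypothetical stand-in for a costly physics based simulation, and the quadratic fit is just one simple choice of statistical model; neither comes from the workshop material.

```python
import math


def expensive_model(x):
    """Hypothetical stand-in for a costly physics based simulation."""
    return math.sin(x) + 0.5 * x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the 3x3 normal equations."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    moment_sums = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [A | rhs] of the normal equations.
    aug = [[power_sums[i + j] for j in range(3)] + [moment_sums[i]] for i in range(3)]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        rhs = aug[row][n] - sum(aug[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = rhs / aug[row][row]
    return coeffs


def surrogate(coeffs, x):
    """Cheap approximation evaluated in place of the expensive model."""
    a, b, c = coeffs
    return a + b * x + c * x * x


# "Run the simulation" at a handful of training points, then fit the surrogate.
train_x = [0.1 * i for i in range(11)]
train_y = [expensive_model(x) for x in train_x]
coeffs = fit_quadratic(train_x, train_y)

# Compare surrogate and original model at points not used for fitting.
test_x = [0.05 + 0.1 * i for i in range(10)]
max_error = max(abs(surrogate(coeffs, x) - expensive_model(x)) for x in test_x)
print(f"max surrogate error on held-out points: {max_error:.4f}")
```

In practice the surrogate would be an ML/DL model trained on archived simulation output, but the pattern is the same: pay for a limited number of expensive runs, then answer further queries with the cheap approximation.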
Neural Ordinary Differential Equations
For example, a stabilized neural ODE can be designed to accurately simulate shocks and chaotic dynamics.
See paper by Linot, et al "Stabilized Neural Ordinary Differential Equations for Long-Time Forecasting of Dynamical Systems".
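As a rough illustration of the forward pass of a neural ODE, the sketch below integrates dx/dt = f_theta(x) with a classic fourth-order Runge-Kutta step, where f_theta is a tiny one-hidden-layer network. The weights are hand-picked assumptions (chosen so the network roughly mimics dx/dt = -x), nothing is trained, and this does not reproduce the stabilization technique from the paper above.

```python
import math


def mlp_rhs(state, params):
    """Tiny one-hidden-layer network f_theta(x) that returns dx/dt."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * state + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def rk4_integrate(rhs, x0, t_end, num_steps, params):
    """Integrate dx/dt = rhs(x, params) from t=0 to t_end with RK4."""
    dt = t_end / num_steps
    x = x0
    trajectory = [x]
    for _ in range(num_steps):
        k1 = rhs(x, params)
        k2 = rhs(x + 0.5 * dt * k1, params)
        k3 = rhs(x + 0.5 * dt * k2, params)
        k4 = rhs(x + dt * k3, params)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trajectory.append(x)
    return trajectory


# Hand-picked (not trained) weights: -100 * tanh(0.01 * x) is close to -x for |x| <= 1.
params = ([0.01], [0.0], [-100.0], 0.0)
trajectory = rk4_integrate(mlp_rhs, x0=1.0, t_end=1.0, num_steps=20, params=params)
print(f"x(1) from the network ODE: {trajectory[-1]:.5f}, exp(-1): {math.exp(-1.0):.5f}")
```

In a real neural ODE the weights are learned by differentiating through the solver, typically with a DL framework rather than plain Python.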
Physics Informed Neural Networks (PINNs)
Other applications to consider are Physics Informed Neural Networks. PINNs attempt to embed known physics relationships into the design of a machine learning model. This may include defining the Navier-Stokes conservation laws as conditions to minimize in an ML model's loss function.
Image: Wikipedia - Physics Informed Neural Networks
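The loss-function idea can be sketched without any DL framework. The example below is an illustration under simplifying assumptions: instead of Navier-Stokes it uses the decay law du/dt = -k*u, and instead of a neural network it uses a two-parameter candidate function. The loss adds a data-misfit term to a physics-residual term evaluated at collocation points; all names are hypothetical.

```python
import math


def candidate(t, params):
    """Stand-in for a network output u_theta(t); here amplitude * exp(-rate * t)."""
    amplitude, rate = params
    return amplitude * math.exp(-rate * t)


def physics_informed_loss(params, observations, collocation_times,
                          decay_k=2.0, weight=1.0, eps=1e-4):
    """Data misfit plus weighted mean squared residual of du/dt + k*u = 0."""
    data_loss = sum((candidate(t, params) - u) ** 2 for t, u in observations)
    data_loss /= len(observations)

    residuals = []
    for t in collocation_times:
        # Central finite difference replaces automatic differentiation here.
        dudt = (candidate(t + eps, params) - candidate(t - eps, params)) / (2.0 * eps)
        residuals.append(dudt + decay_k * candidate(t, params))
    physics_loss = sum(r * r for r in residuals) / len(residuals)

    return data_loss + weight * physics_loss


# Sparse "observations" taken from the exact solution u(t) = exp(-2 t).
observations = [(t, math.exp(-2.0 * t)) for t in (0.0, 0.5, 1.0)]
collocation_times = [0.1 * i for i in range(11)]

loss_consistent = physics_informed_loss((1.0, 2.0), observations, collocation_times)
loss_inconsistent = physics_informed_loss((1.0, 0.5), observations, collocation_times)
print(f"physics-consistent parameters: {loss_consistent:.3e}")
print(f"physics-violating parameters:  {loss_inconsistent:.3e}")
```

The physics term penalizes candidates that violate the governing equation even at times where no observations exist, which is the main appeal of the approach.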
Resources for Engaging and Learning AI in Earth Sciences
Feel free to reach out to rchelp@ if you want assistance recreating environments for any below code examples.
OLCF AI4Science Fluid Flow Tutorial (GitHub) - Uses MiniWeatherML
Open Hackathons GPU Bootcamp (GitHub) - HPC AI Examples for PINNs, CFD, and Climate
NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES.org) - Education Materials and 2022 Trust-a-thon GitHub
Argonne ALCF
  2021 Simulation, Data, and Learning Workshop for AI (GitHub) - Detailed DL profiling tutorial notebooks & video
  2022 Introduction to AI-driven Science on Supercomputers (GitHub)
Data Driven Atmospheric and Water Dynamics Beucler Lab (U. of Lausanne - Switzerland) - Getting Started with Machine Learning curated resource list
NOAA Workshop on Leveraging Artificial Intelligence in Environmental Sciences - 4th Workshop free to register, virtual Sept 6-9 2022
National Academies - 2022 workshop Machine Learning and Artificial Intelligence to Advance Earth System Science: Opportunities and Challenges
Climate Informatics community - Conferences and Hackathons
Book - Deep learning for the Earth Sciences: A comprehensive approach to remote sensing, climate science and geosciences
climatechange.ai - Global initiative to catalyze impactful work at the intersection of climate change and machine learning.
How to Manage Python Software for ML and DL Models
The Python ecosystem already provides many robust pre-built software packages and libraries which are continually maintained. Learning about and employing the Python ecosystem well can simplify the process of using machine learning tools.
The kernel GPU_Workshop already has many useful packages plus others (notably Horovod for distributed deep learning) which you are welcome to explore on your own beyond this workshop.
Run the below cell to get a listing of all packages installed in the GPU_Workshop conda environment.
In [ ]:
!mamba list -p /glade/work/dhoward/conda/envs/GPU_Workshop/
Setting Up Conda Environments
Since ensuring compatibility and reproducibility is difficult across Python package environments, you are encouraged to maintain your own personalized conda virtual environments. Nonetheless, NCAR provides a base set of commonly used Python packages via the NCAR Package Library (NPL). NPL does include the faster package management tool mamba, which uses the same command syntax as conda.
If you prefer to install your own and not use module load conda, we encourage Mambaforge. In general, mamba is safe to use in place of conda. To update all non-pinned packages in an environment, you can use mamba update --all.
Choosing Conda Channels
To source packages, the channel conda-forge is recommended and set as priority on Casper but other channels you may consider are ncar, nvidia, rapidsai, intel, pytorch, and anaconda among others.
Learn to manage channels here using your $HOME/.condarc file
Define pinned packages, i.e. packages that should stay at a specific version or use a specific build type, via the /path/to/env/conda-meta/pinned file
RAPIDS AI Environment
The rapidsai channel provides RAPIDS, an open source, NVIDIA maintained suite for end-to-end data science and analytics pipelines on GPUs. Feel free to explore RAPIDS Getting Started Notebooks.
Scale Up with RAPIDS tools and Scale Out with Dask/UCX or Horovod tools.
Python Packages and RAPIDS Equivalents
Install RAPIDS environment
After setting flexible channel priority via conda config --set channel_priority flexible or in ~/.condarc, follow the install directions here or run:
conda create -n rapids-22.08 -c rapidsai -c nvidia -c conda-forge \
    rapids=22.08 python=3.9 cudatoolkit=11.5
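As a small illustration of the "equivalents" idea, the RAPIDS cudf package mirrors much of the pandas DataFrame API, so the same code can often target either library. The sketch below is an assumption-laden example rather than a recommended pattern: it falls back to pandas when cudf cannot be imported, returns None if neither is installed, and uses made-up station data.

```python
# Prefer the GPU DataFrame library; fall back to pandas, then to nothing.
try:
    import cudf as xdf  # available inside a RAPIDS environment
    BACKEND = "cudf"
except Exception:  # cudf missing or no usable GPU
    try:
        import pandas as xdf
        BACKEND = "pandas"
    except ImportError:
        xdf = None
        BACKEND = None


def mean_temperature_by_station():
    """Group-by mean written once against the shared pandas/cudf API."""
    if xdf is None:
        return None
    frame = xdf.DataFrame(
        {
            "station": ["a", "b", "a", "b"],  # illustrative data only
            "temp_c": [10.0, 20.0, 14.0, 22.0],
        }
    )
    means = frame.groupby("station")["temp_c"].mean()
    if BACKEND == "cudf":
        means = means.to_pandas()  # copy the small result back to the host
    return {key: float(value) for key, value in means.to_dict().items()}


print(BACKEND, mean_temperature_by_station())
```

The benefit of the GPU backend only shows up on large tables; for a four-row frame like this one, transfer overhead dominates.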
Installing Customized Python Packages
For more personalized environments, an example process to set up a conda environment on Casper is below:
module load conda
# Creates environment in /glade/work/$USER/conda-envs/my-env-name or a fully specified path
mamba create -n my-env-name
mamba activate my-env-name
# The Python version installed here will automatically be pinned
# Recommend to not use the latest Python version (3.10+) given compatibility issues
mamba install "python=3.9*"
# Ensures we get MKL optimized packages to run on Casper's Intel CPUs
mamba install numpy scipy pandas scikit-learn xarray "libblas=*=*mkl"
# Ensures common packages provide MPI support (typically defaults to OpenMPI).
# Useful to pin packages in `/path/to/env/conda-meta/pinned` file.
# Quote specs containing * so the shell does not expand them as file globs.
mamba install mpi4py "fftw=*=mpi*" "h5py=*=mpi*" "netcdf4=*=mpi*"
To highlight, adding <package-name>=<version>=<build-type> is important to ensure you install the most relevant and performant version for your needs.
For example, libblas=*=*mkl guarantees you get the Intel MKL optimized versions of packages that utilize the BLAS library. The * is a wildcard for the latest version or other build specifications/hashes.
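To show how the wildcard in a <package-name>=<version>=<build-type> spec selects builds, here is a simplified sketch using glob matching from the standard library. It is an illustration only, not conda's or mamba's actual matching logic (for instance, real solvers treat python=3.9 as a fuzzy match), and the build strings below are illustrative examples.

```python
from fnmatch import fnmatchcase


def parse_spec(spec):
    """Split 'name=version=build' into three fields, defaulting to '*'."""
    parts = spec.split("=")
    if len(parts) > 3:
        raise ValueError(f"too many '=' separators in spec: {spec!r}")
    name, version, build = (parts + ["*", "*"])[:3]
    return name, version or "*", build or "*"


def spec_matches(spec, name, version, build):
    """Return True if a concrete package build satisfies the spec."""
    spec_name, spec_version, spec_build = parse_spec(spec)
    return (
        spec_name == name
        and fnmatchcase(version, spec_version)
        and fnmatchcase(build, spec_build)
    )


# Candidate builds as (name, version, build string); values are illustrative.
candidates = [
    ("libblas", "3.9.0", "16_linux64_mkl"),
    ("libblas", "3.9.0", "16_linux64_openblas"),
]
selected = [c for c in candidates if spec_matches("libblas=*=*mkl", *c)]
print(selected)
```

Only the MKL build survives the filter, which is the effect the quoted "libblas=*=*mkl" argument has in the install commands above.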
GPU Enabled Python Packages and Tools
mamba install cudatoolkit cudnn cupy nvtx
# Make sure the package build string includes *cuda* to verify GPU support
mamba install "pytorch=1.12.1=cuda112*"
# Don't use tensorflow-gpu package as package solver is inconsistent in conda-forge channel
# TF recommends pip install for latest official version but conda-forge versions also work
mamba install "tensorflow=2.9.1=cuda112*"
# Enables added profiling capabilities, only available via pip and PyPI or NVIDIA's package index
pip install nvidia-pyindex
pip install nvidia-dlprof nvidia-dlprof-pytorch-nvtx
pip install tensorboard_plugin_profile
ML libraries pytorch and tensorflow require additional steps to ensure they are installed with GPU support.
Each library's documentation linked above has more info about installation options. As of this workshop, TensorFlow guarantees support up to CUDA v11.2 and PyTorch up to CUDA v11.6 so we specified builds with =cuda112*. Run mamba search <package> to view all available packages given available channels.
TensorFlow recommends installation via pip for their official versions but the community does tend to maintain similar quality releases via conda-forge. Combining pip with conda/mamba installs should be avoided if possible due to greater difficulty in maintaining environments.
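After installing, it is worth confirming that each framework actually sees a GPU. The helper below is a minimal sketch (the function name is our own): it reports True/False per framework, or None when the framework cannot be imported, so it can also run on a login node without GPUs.

```python
import importlib


def gpu_support_report():
    """Report whether PyTorch and TensorFlow can see a GPU (None if not importable)."""
    report = {}

    try:
        torch = importlib.import_module("torch")
        report["pytorch"] = bool(torch.cuda.is_available())
    except Exception:
        report["pytorch"] = None

    try:
        tf = importlib.import_module("tensorflow")
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except Exception:
        report["tensorflow"] = None

    return report


print(gpu_support_report())
```

A False result on a GPU node usually points to a CPU-only build having been installed, which is what pinning the cuda build string is meant to prevent.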
Horovod for Distributed Deep Learning
For distributed deep learning with Horovod instead of Dask, see below or Horovod installation documentation for how to use pip to install Horovod from PyPI on Casper.
module load cuda/11.7 gnu/10.1.0
mamba install pip gxx_linux-64 cmake nccl
export HOROVOD_NCCL_HOME=$CONDA_PREFIX
export HOROVOD_CUDA_HOME=$CUDA_HOME
HOROVOD_GPU_OPERATIONS=NCCL pip install "horovod[tensorflow,keras,pytorch]"
horovodrun --check-build
Note the specification of HOROVOD_GPU_OPERATIONS=NCCL to use the NVIDIA Collective Communications Library (NCCL). An MPI option is also selectable for CUDA-aware MPI libraries. Find more details about Horovod's GPU tensor operations and GPU install options here.
A useful tutorial for Horovod was given as part of the Argonne Training Program on Extreme-Scale Computing (ATPESC) - Data Parallel Deep Learning
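Once Horovod is built, a data-parallel PyTorch training script typically follows the pattern sketched below. This is a minimal illustration with a made-up linear model and random data, not the workshop's code; the function returns None when torch or Horovod is not installed so the cell stays harmless outside a Horovod environment.

```python
def train_with_horovod(num_steps=3):
    """Minimal data-parallel training loop; returns the final loss or None."""
    try:
        import torch
        import horovod.torch as hvd
    except ImportError:
        return None  # Horovod (or torch) is not available in this environment

    hvd.init()  # one process per GPU, started by the launcher
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())  # pin each process to its own GPU
    device = torch.device("cuda" if use_cuda else "cpu")

    model = torch.nn.Linear(8, 1).to(device)  # placeholder model
    # A common convention scales the learning rate by the number of workers.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
    # Wrap the optimizer so gradients are averaged across workers.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )
    # Start every worker from identical weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    loss = None
    for _ in range(num_steps):
        inputs = torch.randn(16, 8, device=device)  # random stand-in batch
        targets = torch.randn(16, 1, device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
    return float(loss.item())


print(train_with_horovod())
```

Saved as a script (say, a hypothetical train.py), this would be launched with something like horovodrun -np 4 python train.py, with each process working on its own shard of the data.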
Sharing Package Environments
Once your environment is set up, you can share or give access to your Python virtual environments, which is vitally important to consider towards enabling reproducible science.
On a shared cluster, share a path to your environment, see mamba env list. Make sure you provide read access plus write access if you want others to be able to modify the environment. Then run mamba activate /path/to/env
Others may instead clone a readable environment with mamba create --name cloned_env --clone /path/to/original_env
To distribute your environment, run mamba env export > my-env.yml. Others can then install this environment with mamba env create -f /path/to/yaml-file
Running a Profiler on TensorFlow and PyTorch Models
Both tensorflow and pytorch have built-in tools and a tensorboard GUI interface for DL profiling, which typically run profiles during the training portion of a deep learning model. Base guides for using these built-in tools follow:
PyTorch
  Profiler Tutorial
  Building a Benchmark Tutorial
  PyTorch Profiler with TensorBoard Tutorial
TensorFlow
  TensorFlow Profiler Guide
  TensorBoard Profiler Analysis Guide
  TensorBoard - Callbacks API Class
Easy Ways to Implement TensorFlow and PyTorch Profilers
PyTorch
import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity
model = models.resnet18().cuda()
inputs = torch.randn(5, 3, 224, 224).cuda()
# Profile both CPU and CUDA activity for one forward pass
with profile(activities=[
        ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
The record_shapes parameter ensures the profiler collects data on the data pipeline types, notably tensor shapes.
TensorFlow - See API for additional options
import tensorflow as tf
tf.profiler.experimental.start('/path/to/log/output/')
# ... training loop ...
tf.profiler.experimental.stop()
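Since profiling rarely needs the whole run, the PyTorch profiler can also be limited to a window of training steps with a schedule. The sketch below is one possible setup, with an assumed toy model and random data: it skips one step, warms up for one, records three, and summarizes the result in a callback. It runs on CPU if no GPU is present and returns None if torch is not installed.

```python
def profile_training_steps(num_steps=6):
    """Profile a short window of training steps; returns summary tables or None."""
    try:
        import torch
        from torch.profiler import ProfilerActivity, profile, schedule
    except ImportError:
        return None  # torch is not available in this environment

    device = "cuda" if torch.cuda.is_available() else "cpu"
    activities = [ProfilerActivity.CPU]
    if device == "cuda":
        activities.append(ProfilerActivity.CUDA)

    model = torch.nn.Linear(64, 8).to(device)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    tables = []

    def on_trace_ready(prof):
        # Called once the recorded window is complete.
        tables.append(
            prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
        )

    with profile(
        activities=activities,
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=on_trace_ready,
        record_shapes=True,
    ) as prof:
        for _ in range(num_steps):
            inputs = torch.randn(32, 64, device=device)  # random stand-in batch
            targets = torch.randn(32, 8, device=device)
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
            loss.backward()
            optimizer.step()
            prof.step()  # tell the profiler that one training step finished

    return tables


summaries = profile_training_steps()
if summaries:
    print(summaries[0])
```

To view results in TensorBoard instead, the callback can be swapped for torch.profiler.tensorboard_trace_handler with a log directory of your choice.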
Using NVIDIA Tools for Profiling DL Models
The tools nsys and ncu are similarly adaptable to run against DL Python codes. The dlprof tool was previously developed to run nsys on DL models then output a TensorBoard interface. However, dlprof is no longer being developed in favor of the previous built-in profiling methods.
PyTorch
  DNN Layer annotations are disabled by default
  Use with torch.autograd.profiler.emit_nvtx():
  Manually with torch.cuda.nvtx.range_(push/pop)
  TensorRT backend is already annotated
TensorFlow
  Annotated by default with NVTX, only in NVIDIA NGC containers or nvidia-pyindex TF 1.X
  export TF_DISABLE_NVTX_RANGES=1 to disable for production
  For TensorFlow 2.X, must manually inline NVTX ranges or use dlprof --mode=tensorflow2 ...
NVIDIA provides their own guides, such as NVIDIA Deep Learning Performance. A small example using the nsys/ncu tools and dlprof with DL models can be found here. dlprof can still work well in NVIDIA NGC Containers but compatibility elsewhere is not well supported.
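Manual NVTX annotation with torch.cuda.nvtx.range_push/range_pop is easiest to manage through a small context manager. The helper below is a sketch with names of our own choosing; it becomes a no-op when torch or a CUDA device is unavailable, so annotated code still runs anywhere.

```python
from contextlib import contextmanager

try:
    import torch
    _NVTX_ENABLED = torch.cuda.is_available()
except ImportError:
    torch = None
    _NVTX_ENABLED = False


@contextmanager
def nvtx_range(name):
    """Mark a named region on the nsys timeline (no-op without CUDA)."""
    if _NVTX_ENABLED:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _NVTX_ENABLED:
            torch.cuda.nvtx.range_pop()


def annotated_step(values):
    """Toy workload with two annotated phases."""
    with nvtx_range("sum_of_squares"):
        total = sum(v * v for v in values)
    with nvtx_range("normalize"):
        return total / len(values)


print(annotated_step([1.0, 2.0, 3.0]))
```

When the script is run under nsys with NVTX tracing enabled, the named ranges appear as labeled spans, which makes it easier to line up GPU kernels with phases of the training loop.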
Common Performance Considerations
I/O
  Use designated TF/PT data loaders
    TensorFlow - Better Performance with the tf.data API
    PyTorch - Datasets & Dataloaders
  Multithreading, e.g. Multi-Worker Training with Keras
CPU to/from GPU data copies
  Rewrite code with TF/PT tensors or use CuPy, etc
  Overlap copy and computation
Batch size - Increase batch size until the GPU is saturated
Precision (Background: See Theo Mary's Mixed Precision Arithmetic talk at London Math Society)
  Consider mixed precision, NVIDIA Mixed Precision Training Guide
  Automatic Mixed Precision (AMP) settings
    PT Guide: scaler = torch.cuda.amp.GradScaler()
    TF Guide: policy = mixed_precision.Policy('mixed_float16'); mixed_precision.set_global_policy(policy)
  Ensure usage of Tensor Cores with Mixed Precision
TensorFlow provides a comprehensive guide, Optimize TensorFlow GPU performance with the TensorFlow Profiler
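Several of the considerations above can be combined in one PyTorch training loop. The sketch below is illustrative only, with an assumed toy model and random data: it uses a DataLoader with pinned host memory, non-blocking host-to-device copies, autocast for mixed precision, and the GradScaler from the PT guide. Mixed precision is enabled only when a GPU is present, and the function returns None if torch is not installed.

```python
def amp_training_sketch(num_epochs=2):
    """Mixed-precision training loop sketch; returns the final loss or None."""
    try:
        import torch
        from torch.utils.data import DataLoader, TensorDataset
    except ImportError:
        return None  # torch is not available in this environment

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # Random stand-in data; feature sizes are multiples of 8 for Tensor Cores.
    dataset = TensorDataset(torch.randn(256, 64), torch.randn(256, 8))
    # Pinned host memory lets host-to-device copies run asynchronously.
    loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=use_cuda)

    model = torch.nn.Sequential(
        torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 8)
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Loss scaling guards against FP16 gradient underflow; disabled on CPU.
    scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
    amp_dtype = torch.float16 if use_cuda else torch.bfloat16

    loss = None
    for _ in range(num_epochs):
        for inputs, targets in loader:
            inputs = inputs.to(device, non_blocking=True)
            targets = targets.to(device, non_blocking=True)
            optimizer.zero_grad(set_to_none=True)
            with torch.autocast(device_type=device.type, dtype=amp_dtype,
                                enabled=use_cuda):
                loss = torch.nn.functional.mse_loss(model(inputs), targets)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
    return float(loss.item())


print(amp_training_sketch())
```

Batch size is left at 64 here; on a real model it is worth increasing it stepwise while watching GPU utilization and memory in the profiler.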
Performance Improvements with Tensor Cores
Per NVIDIA's recommendation on Optimizing for Tensor Cores, setting parameters such as matrix dimension sizes, batch sizes, convolution layer channel counts, etc. as multiples of 8 is optimal due to tensor core shape constraints.
Utilizing mixed precision and tensor cores effectively can lead to theoretical throughput performance of 9.70 TeraFLOPS for FP64 arithmetic up to 78.0 TeraFLOPS for FP16 arithmetic on A100 GPUs.
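A small helper can make the multiples-of-8 rule easy to apply when choosing layer widths or batch sizes. This is a sketch with hypothetical function names and example sizes; note that padding a dimension changes the model, so it is a design-time choice rather than a free optimization.

```python
def round_up_to_multiple(value, multiple=8):
    """Smallest multiple of `multiple` that is >= value."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def pad_layer_sizes(sizes, multiple=8):
    """Round every layer width up to a Tensor Core friendly size."""
    return [round_up_to_multiple(size, multiple) for size in sizes]


# Example widths, chosen for illustration only.
print(pad_layer_sizes([3, 50, 100, 128]))  # -> [8, 56, 104, 128]
```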
Profiler Runs of a Geomagnetic Field LSTM Model
This Long Short-Term Memory (LSTM) example comes courtesy of the Trustworthy AI for Environmental Science Trust-a-thon. You can follow the original example, with data preparation and explanation of how the LSTM model is implemented in the source notebook.
To begin, let's first download data to use for training and validation of our LSTM model.
In [ ]:
%%capture captured_io
%%bash
# Download data we need. If a directory "data/" already exists, we'll assume the data are already downloaded.
# The above "magic" statements are used to capture shell in/out and to run the following Bash commands.
if [ ! -d "data" ]; then
    wget --verbose /geomag/data/geomag/magnet/public.zip
    wget --verbose /geomag/data/geomag/magnet/private.zip
    unzip public.zip
    unzip private.zip
    mkdir -v data
    mv -v public private data/
    mv -v public.zip private.zip data/
fi
# Uncomment for debugging if you have trouble downloading:
# print(captured_io)
Profile the magnet_lstm_tutorial.py Python Script
The full Geomagnetic Field LSTM model is condensed into the Python file magnet_lstm_tutorial.py. Recall that profiling does not require analyzing the full runtime of most models. In DL, m