Introduction to cluster computing resources for NCN
Xufeng Wang
Electrical and Computer Engineering
Purdue University
West Lafayette, IN 47906

Introduction
Welcome! This presentation is designed to help people get familiar with NCN computational cluster resources. You will learn what a cluster is, what its components are, and more.

Table of contents
- Prelude: understand cluster computing from human thinking
- Cluster component #1: cluster computing nodes
- Cluster component #2: Portable Batch System (PBS)
- Cluster component #3: front-end machines
- NCN resources overview
- References

A simple problem
Problem: “I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?”

Critical elements of thinking
1. Describe the abstract problem with a certain model/tool that my brain can handle. For example, mathematical expressions.
2. Write the problem on a piece of paper: “3*10+4*2=?”. The problem is thus stored on the paper.
3. My eyes read the problem. “3*10+4*2=?” is stored, or buffered, in my brain, ready to be computed.
4. My brain begins to compute: 3*10+4*2=38
5. I got the answer! The result “38” is buffered in my brain.
6. My brain signals my hand to write down the result. The result is thus stored on the paper.
7. I can forget about the buffered result “38” in my brain now, as it is written down on the paper.
[Diagram: the seven steps above mapped onto Problem, Mathematical expression, Paper, Memory power of brain, and Computing power of brain]

Critical elements of computer's thinking
[Diagram: the same flow for a computer: Problem, MATLAB script, File stored on hard drive, Memory power of computer, Computing power of computer]

Key characteristics
- Mathematical expression / MATLAB script [Computer language]: Both are intermediates that translate a human's abstract thinking into a language convenient for computation and readable by others.
- Paper / File stored on hard drive [File storage system]: Both are physical items that can record information.
- Memory power of brain / computer [Random Access Memory]: Both are also physical items that can record information, but they are much faster and more precious.
- Computing power of brain / computer [CPU]: Both can compute, that is, process information. However, each can only process information held in a certain physical memory.

Components on a modern ASUS motherboard
[Photo labels: hard drive connector, RAM sockets (yellow & black), mounted CPU inside, NB, SB, USB. Diagram labels: Problem, MATLAB script]

Need for computer clusters
Here at NCN, we need computing resources that can:
- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.
Based on our understanding of a single-core computer, how do we expand it to suit our needs?
The obvious answer: if we simply get N single-core computer systems, we can allow up to N users to solve N problems at the same time!
Let's look at a scenario in which 2 users are trying to solve 3 problems simultaneously.

2 users with 3 problems
Based on our previous idea, we now have three independent and identical computers solving 3 problems from 2 users. But is it efficient?
[Diagram: Problem 1 (P_1.m, hard drive for User A, CPU, RAM); Problem 2 (P_2.m, hard drive for User A, CPU, RAM); Problem 3 (P_3.m, hard drive for User B, CPU, RAM)]

Hard drive storage explained
“Hard drive” and “Random Access Memory” (RAM) both have the capability to store information. Why do we need two memory units? What is the difference between them?

| | Hard drive | RAM |
|---|---|---|
| Usual size | On the order of GB or TB | 8 GB - 128 GB |
| Read/write speed | Slow | Fast |
| Structure | Platter with arm “needle” | Solid-state transistors |
| Volatile? | No | Yes |
| Price | Low | High |

A “hard drive” is thus ideal for storing:
- Large amounts of data (large size, low cost)
- Data with low read-write demand (slow I/O rate)
- Long-term data (non-volatile)

RAM storage explained
However, when doing intensive computation:
- Communication between memory and CPU must be rapid, so very fast I/O is needed.
- Only the variables in use are stored in memory, so the memory doesn't have to be large.
- The stored data is temporary, so volatile memory is OK.
RAM is thus ideal for such a situation, and that is why we have two forms of memory storage in a computer.

E Pluribus Unum
Memory storage can be shared among users, as long as the information is well managed so that users' files won't get mixed up.
[Diagram: Problems 1 to 3, each on its own CPU, sharing one hard drive (1 MB of 500 GB used) and one RAM (4 GB of 8 GB used)]

Additional problems without increasing the cost?
[Diagram: Problems 1 to 3 each on a CPU, plus Problem 4; 1.5 MB of 500 GB used; 6 GB of 8 GB used]
4 problems cannot be efficiently solved on 3 CPUs simultaneously. We can, however, solve 3 problems first and then the remaining one whenever a CPU becomes free.
It's like dining at a busy restaurant: you put in your request and wait to be seated.

When a single CPU takes multiple jobs
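As a toy illustration of a single CPU taking multiple jobs, the sketch below assumes the CPU serves its jobs in turn, one short time slice each (a round-robin policy). The function name `round_robin`, the job names, and the work units are made up for the example, and switching overhead is ignored, so a real CPU would do somewhat worse than this.

```python
from collections import deque


def round_robin(jobs, time_slice=1):
    """Simulate one CPU taking turns among several jobs.

    jobs maps a job name to the units of work it still needs.
    Returns the order in which slices ran and each job's finish time.
    """
    queue = deque(jobs.items())
    timeline = []      # which job ran in each slice
    finish_time = {}   # clock value at which each job completed
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        clock += run
        timeline.append(name)
        if remaining - run > 0:
            # Not done yet: go to the back of the line.
            queue.append((name, remaining - run))
        else:
            finish_time[name] = clock
    return timeline, finish_time


# Three hypothetical jobs, each needing 3 units of CPU time.
timeline, finish_time = round_robin({"P1": 3, "P2": 3, "P3": 3})
print(timeline)
print(finish_time)
```

Each job would need 3 time units on a dedicated CPU, but when the three share one CPU none of them finishes before time 7. This is the slowdown that dedicated CPUs avoid.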
If a single CPU has multiple tasks at the same time (a common scenario on desktop computers), it simply processes one task for a very short moment, stops, processes the next task for a very short moment, and so on. This rapid processing of all tasks in succession gives the user the illusion that all tasks are being processed at the same time.
As the number of jobs increases, more time is spent on CPU I/O communication. Jobs become slower because each one waits longer to be served by the CPU and because there are more I/O requests.
[Diagram: one CPU cycling through Process #1 to Process #5]

Solving 4 problems with 3 CPUs
[Diagram: Problems 1 to 4 pass through PBS, which manages which job is submitted to the CPUs; 1.5 MB of 500 GB used; 6 GB of 8 GB used]
Scientific computation requires CPU(s) dedicated to one process. Thus, a management system is needed to ensure proper assignment of CPUs to each task. This is the concept of the Portable Batch System (PBS).

Cluster components
- Front-end machine: Users write, edit, and manage files. Stores large amounts of files. Scripts are prepared here for running.
- PBS: Manages users' requests (number of CPUs, RAM size, etc.). Coordinates tasks with computational resources.
- Clusters: Provide raw computation power.

Clusters explained
“Compute! Compute! Compute!”
In our definition, “clusters” are groups of RAM and CPUs, with their supporting components, that provide raw computational power.
[Diagram: 3 CPUs connected to PBS]
Our simple example here, 3 CPUs sharing 1 RAM, is far from enough to be a computational powerhouse. How do we expand it into a huge cluster that can accommodate a large number of computational jobs?

A cluster node
RAM is capped at 8 GB max for our CPUs. The more CPUs attached to a RAM, the smaller the share of memory each CPU has on average.
In addition, CPU manufacturers usually pack 2 (dual core) or 4 (quad core) CPUs per socket, with 1 to 2 sockets sharing 1 RAM.
[Diagram: a (Steele) cluster node: Quad Core #1 and Quad Core #2, 8 CPUs in total, sharing 16 GB of RAM]

Forming a simple cluster with nodes
Our original goal:
- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.
We achieved the goal by coupling CPUs with RAM to form nodes, and by expanding the number of nodes in service.
In this small model cluster, we have 6 nodes with 8 CPUs per node = 48 total CPUs in service, averaging 16 GB / 8 = 2 GB of RAM per CPU at each node. Roughly, 48 problems can be solved at the same time.
[Diagram: 6 nodes connected to PBS]

Exploiting the computational resources, in a good way
“OK, clusters seem to me to be just a bunch of computers sitting together. How can that give them a computational advantage over single-core computers?”
Answer: The real power of clusters comes from the coupling of CPUs within a node and among the nodes themselves.
Our original problem: “I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?”
Solve: 3*10+4*2=?

Solve 3*10+4*2=?
[Diagram: a (Steele) cluster node connected to PBS. CPU #1 receives 3*10+4*2=? and computes in sequence: 3*10=30, 4*2=8, 30+8=38, with communications before and after]

Solve 3*10+4*2=?
Uncoupled calculations can be done simultaneously to save time.
This exploits parallelism, but not down to the machine level, i.e. human post-processing is needed. This is the “embarrassingly parallel” scheme.
[Diagram: a (Steele) cluster node connected to PBS. Task #1 >> 3*10=? and Task #2 >> 4*2=? run at the same time (3*10=30, 4*2=8); Task #3 >> 30+8=? is processed manually afterwards as post-processing (30+8=38)]

Solve 3*10+4*2=?
Parallel programming: master and slave configuration
[Diagram: master CPU #0 instructs the slave CPUs: CPU #1 do 3*10=?, CPU #2 do 4*2=?, then CPU #1 do 30+8=?. The CPUs send and receive messages, and the final step waits for the earlier results: 3*10=30, 4*2=8, 30+8=38]
Those “actions of collaboration” between CPUs cannot be achieved with traditional programming languages such as C, C++, and MATLAB on their own.

Message Passing Interface (MPI)
Message Passing Interface, commonly known as MPI, is provided as additional libraries for several popular existing computer languages (C, C++, FORTRAN) to achieve code-level parallel programming.
MPI allows the code writer to control the communication between CPUs. The “actions” mentioned previously can be achieved by writing specific MPI statements within the program.
Examples:
- “send this variable from CPU #0 to CPU #1”: MPI_Send
- “add the results from CPU #1 and CPU #2”: MPI_Reduce with the MPI_SUM operation
Modern scientific codes with MPI can use a large number of CPUs and many hours to solve complicated problems (OMEN, for example).

How can 10,000 CPUs work for 1 program?
- Nodes need to communicate with each other so that CPUs from several nodes can talk via MPI. Physical connections are needed.
- Not every node needs to communicate with all the others. A certain network configuration is thus needed.
- Interconnects are made with cables, and different types of cable network yield different performance.
[Diagram: 6 nodes connected to PBS and to each other through the nodes interconnect network (Gigabit Ethernet, InfiniBand, etc.)]

Interconnect network performance
Major factors in evaluating the performance of interconnect cables:
- Transfer rate: how much data can the cable transfer per second?
- Latency: how much delay does each transfer over the cable incur?
Three kinds of cables are deployed on Purdue clusters:
- Gigabit Ethernet: 1 Gb/sec with low latency. (Steele, Pete, etc.)
- InfiniBand: 10 Gb/sec with ultra-low latency. (Steele, non-NCN)
- 10 Gigabit Ethernet: 10 Gb/sec with ultra-low latency. (Coates)
Things worth mentioning:
- Serial programs do not benefit from these interconnect cables; MPI programs that need lots of I/O between CPUs will.
- Utilizing InfiniBand may require extra libraries when compiling.

Clusters summary

| User type | Solve problems via office desktop/laptop | Solve problems via clusters |
|---|---|---|
| Casual users: short serial programs | Slows down your computer. Unreliable. | Fast processors and large memory. Does not slow down your computer. |
| Intermediate users: multiple, long-run serial programs | Run programs 1 by 1. Significantly slows down your computer. | Run your jobs in embarrassingly parallel fashion. Fast and does not slow your PC down. |
| Advanced users: multiple, long-run, MPI-based parallel programs | Cannot do parallel runs on single-core computers. | Program is designed to run on clusters with many CPUs. |

The Steele cluster
Clusters have to meet the needs of various users, so they can be built with different kinds of nodes.
NCN-owned nodes are all located in the sub-cluster “Steele-A”. NCN also owns nodes on other clusters such as “Pete” and “Coates”. Details will be discussed later.

References and recommendations

Interlude
More complete picture of the entire system

Front-end machine explained
The front-end machine is the gateway for all users. It provides storage and allows users to compose, compile, and manage their files.
It is a rather complete computer itself, with its own CPUs and RAM.
It is designed to serve a great number of users and store an extremely high volume of files.
[Diagram: Problems 1 to 3, front-end RAM, front-end CPU]

Steele's front-end machine

Comparing the front-end machine to clusters

| | Front-end machine | Clusters |
|---|---|---|
| CPU and RAM: character | Same as clusters | Same as front-end machine |
| CPU and RAM: number | Few | Abundant |
| User control | No control over CPU assignment or RAM size. | Total control over CPU assignment and RAM size via PBS. |
| Parallel computing | Single-core programs only. Can compile but should not run MPI programs. | MPI programs can be compiled and run here. |
| Purpose | Light-duty file editing, management, and compiling | Heavy-duty computation |

Thus, do NOT run computational programs (e.g. MATLAB) on the front-end machine for heavy calculations. This even includes data post-processing.
For serial jobs, allocate a single CPU from the clusters via PBS.

File storage solutions
Our model “shared hard drive” is in reality a “shared network storage” offered via the BlueArc system.
Two tiers of storage offer 320 TB of space.
[Diagram: shared network storage. New and recent files sit on Fibre Channel disk (fast & expensive); old files sit on SATA disk (slow & cheap). A file moves to SATA if it gets old and unused, and back to Fibre Channel if it is called to be used]

Fortress DXUL System
The Fortress DXUL system provides a solution for long-term storage of large files.
- No active files shall be stored here.
- No large collections of small files shall be stored here. Compress them (via tarball or zip) first and then store.
[Diagram labels: Shared Network Storage; Fortress DXUL System; low-cost disks; tape/optical disks; tape cartridge; tape cartridge; primary copy; secondary copy; for files smaller than 0.5 MB; for files larger than 0.5 MB]

Front-end machines summary

| | Regular office workstation | Front-end machine with BlueArc storage | Fortress DXUL System |
|---|---|---|---|
| Primary storage size | Depends (usually 100 GB - 500 GB) | Large in total, but can be limited per person (1-10 GB) | Huge, up to 5 TB per person. |
| Primary backup? | Usually no | Yes | Yes |
| Secondary storage size | Depends (usually no second hard drive) | Scratch drives (250 GB). | Large. |
| Secondary backup? | Usually no | Yes | |
| Access speed | Slow (SATA drive) | Fast (Fibre disk) | Very slow |
| Software availability | Limited | Abundant | Very few |
| Purpose | Daily usage | Gateway to clusters | Long-term storage |
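The Fortress advice to compress a large collection of small files before storing it can be sketched as follows. This is a minimal example, not a Purdue-provided tool: the helper name `bundle_directory`, the directory `run_outputs`, and the file names are hypothetical, and the actual transfer of the archive to long-term storage is not shown because it depends on the site's own tools.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(source_dir, archive_path):
    """Pack every file under source_dir into one gzip-compressed tarball."""
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the directory name.
        archive.add(source_dir, arcname=source_dir.name)
    return Path(archive_path)


# Demonstration on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "run_outputs"
    results.mkdir()
    for i in range(100):
        (results / f"output_{i:03d}.txt").write_text(f"result {i}\n")

    archive = bundle_directory(results, Path(tmp) / "run_outputs.tar.gz")

    # Re-open the archive to confirm that all small files went in.
    with tarfile.open(archive) as check:
        file_count = sum(1 for member in check.getmembers() if member.isfile())
    print(file_count, "files bundled into", archive.name)
```

The long-term store then sees one large file instead of a hundred small ones, which is the access pattern the Fortress guidelines ask for.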