Introduction to cluster computing resources for NCN

Xufeng Wang
Electrical and Computer Engineering
Purdue University
West Lafayette, IN 47906

Introduction

Welcome! This presentation is designed to help people get familiar with NCN computational cluster resources. You will learn what a cluster is, what its components are, and more.

Table of contents

- Prelude: understanding cluster computing through human thinking
- Cluster component #1: cluster computing nodes
- Cluster component #2: Portable Batch System (PBS)
- Cluster component #3: front-end machines
- NCN resources overview
- References

A simple problem

Problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?"

Critical elements of thinking

1. Describe the abstract problem with a model or tool that my brain can handle, for example a mathematical expression.
2. Write the problem on a piece of paper: "3*10+4*2=?". The problem is thus stored on the paper.
3. My eyes read the problem. "3*10+4*2=?" is stored, or buffered, in my brain, ready to be computed.
4. My brain begins to compute: 3*10+4*2=38.
5. I have the answer! The result "38" is buffered in my brain.
6. My brain signals my hand to write down the result. The result is thus stored on the paper.
7. I can now forget the buffered result "38" in my brain, as it is written down on the paper.

[Diagram: Problem, Mathematical expression, Paper, Memory power of brain, Computing power of brain, annotated with steps 1 to 7 above.]

Critical elements of computer's thinking

[Diagram: Problem, MATLAB script, File stored on hard drive, Memory power of computer, Computing power of computer, annotated with step 1 above.]

Key characteristics

- Mathematical expression / MATLAB script [computer language]: both are intermediates that translate a human's abstract thinking into a language that is convenient for computation and readable by others.
- Paper / file stored on hard drive [file storage system]: both are physical items that can record information.
- Memory power of brain / computer [Random Access Memory]: both are also physical items that can record information, but they are much faster and more precious.
- Computing power of brain / computer [CPU]: both can compute, that is, process information. However, each can only process information held in a certain physical memory.

Components on a modern ASUS motherboard

[Figure: motherboard photo with labels: Problem, MATLAB script, hard drive connector, RAM sockets (yellow and black), mounted CPU inside, NB, SB, USB.]

Need for computer clusters

Here at NCN, we need computing resources that can:

- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.

Based on our understanding of a single-core computer, how do we expand it to suit our needs? The obvious answer: if we simply get N single-core computer systems, we can allow up to N users to solve N problems at the same time! Let's look at a scenario in which 2 users are trying to solve 3 problems simultaneously.

2 users with 3 problems

Based on our previous idea, we now have three independent and identical computers solving 3 problems from 2 users. But is it efficient?

[Diagram: three separate computers, each with its own hard drive, CPU, and RAM. Problem 1: hard drive for User A, P_1.m. Problem 2: hard drive for User A, P_2.m. Problem 3: hard drive for User B, P_3.m.]

Hard drive storage explained

"Hard drive" and "Random Access Memory" (RAM) both have the capability to store information. Why do we need two memory units? What is the difference between them?

| | Hard drive | RAM |
|---|---|---|
| Usual size | On the order of GB or TB | 8 GB to 128 GB |
| Read/write speed | Slow | Fast |
| Structure | Platter with arm "needle" | Solid-state transistors |
| Volatile? | No | Yes |
| Price | Low | High |

A hard drive is thus ideal for storing:

- Large amounts of data (large size, low cost)
- Data with low read/write demand (slow I/O rate)
- Long-term data (non-volatile)

RAM storage explained

However, when doing intensive computation:

- Communication between memory and CPU must be rapid, so very fast I/O is needed.
- Only the variables in use are stored in memory, so the memory does not have to be large.
- The data in memory is temporary, so volatile memory is acceptable.

RAM is thus ideal for this situation, and that is why we have two forms of memory storage in a computer.

E Pluribus Unum

Memory storage can be shared among users, as long as the information is well managed so that users' files do not get mixed up.

[Diagram: Problems 1, 2, and 3, each on its own CPU, sharing one hard drive (1 MB of 500 GB used) and one RAM (4 GB of 8 GB used).]

Additional problems without increasing the cost?

[Diagram: Problems 1 to 4 and three CPUs, sharing one hard drive (1.5 MB of 500 GB used) and one RAM (6 GB of 8 GB used).]

4 problems cannot be solved efficiently on 3 CPUs simultaneously. We can, however, solve 3 problems first and then the remaining one whenever a CPU becomes free. It is like dining at a busy restaurant: you put your name down and wait to be seated.

When a single CPU takes multiple jobs

If a single CPU has multiple tasks at the same time (a common scenario on desktop computers), it simply processes one task for a very short moment, stops, processes the next task for a very short moment, and so on. This rapid processing of all tasks in succession gives the user the illusion that all tasks are being processed at the same time. As the number of jobs increases, more time is spent on CPU I/O communication. Jobs become slower because each waits longer to be served by the CPU and because I/O requests increase.

[Diagram: one CPU cycling through Process #1 to Process #5.]

Solving 4 problems with 3 CPUs

[Diagram: Problems 1 to 4 and three CPUs; 1.5 MB of 500 GB used; 6 GB of 8 GB used; PBS manages which job is submitted to the CPUs.]

Scientific computation requires CPU(s) dedicated to one process. Thus a management system is needed to ensure the proper assignment of CPUs to each task. This is the concept of the Portable Batch System (PBS).

Cluster components

- Front-end machine: users write, edit, and manage files; store large numbers of files; prepare scripts for running.
- PBS: manages users' requests (number of CPUs, RAM size, etc.); coordinates tasks with computational resources.
- Clusters: provide raw computational power.

[Diagram: Problems 1 to 4 pass from the front-end machine through PBS to the CPUs of the cluster.]

Clusters explained

"Compute! Compute! Compute!"

In our definition, "clusters" are groups of RAM and CPUs, with their supporting components, that provide raw computational power.

[Diagram: three CPUs connected to PBS.]

Our simple example here, 3 CPUs sharing 1 RAM, is far from enough to be a computational powerhouse. How do we expand it into a huge cluster that accommodates a large number of computational jobs?

A cluster node

RAM is capped at 8 GB max for our CPUs. The more CPUs attached to a RAM, the smaller the share of memory each CPU has on average. In addition, CPU manufacturers usually pack 2 (dual core) or 4 (quad core) CPUs per socket, with 1 to 2 sockets sharing 1 RAM.

[Diagram: a (Steele) cluster node: Quad Core #1 and Quad Core #2, 8 CPUs in total, sharing 16 GB of RAM.]

Forming a simple cluster with nodes

Our original goal:

- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.

We achieved the goal by coupling CPUs with RAM to form nodes and by expanding the number of nodes in service. In this small model cluster, we have 6 nodes with 8 CPUs per node = 48 CPUs in service in total, averaging 16 GB / 8 = 2 GB of RAM per CPU at each node. Roughly, 48 problems can be solved at the same time.

[Diagram: six nodes connected to PBS.]

Exploiting the computational resources, in a good way

"OK, clusters seem to me to be just a bunch of computers sitting together. How can that give them a computational advantage over single-core computers?"

Answer: the real power of clusters comes from the coupling of CPUs within a node and among the nodes themselves.

Our original problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?"

Solve: 3*10+4*2=?

Solve 3*10+4*2=? (one CPU)

[Diagram: one CPU of a (Steele) cluster node (Quad Core #1 and Quad Core #2 sharing 16 GB of RAM, connected to PBS) receives the whole problem. CPU #1 >> 3*10+4*2=? Timeline: communications, 3*10=30, 4*2=8, 30+8=38.]

Solve 3*10+4*2=? (embarrassingly parallel)

- Uncoupled calculations can be done simultaneously to save time.
- This exploits parallelism, but not down to the machine level; that is, human post-processing is needed.
- This is the "embarrassingly parallel" scheme.

[Diagram: on the same node, Task #1 >> 3*10=? and Task #2 >> 4*2=? run on separate CPUs; Task #3 >> 30+8=? is processed manually. Timeline: communications, 3*10=30; communications, 4*2=8; communications, 30+8=38, with the labels "wait for CPU #1" and "post process".]

Solve 3*10+4*2=? (parallel programming: master and slave configuration)

[Diagram: master CPU #0 instructs the slave CPUs: CPU #1 do 3*10=?, CPU #2 do 4*2=?. After the results are sent back, CPU #0 instructs CPU #1 to do 30+8=?. Timeline: send and receive steps around 3*10=30 and 4*2=8, "wait for CPU #1", then 30+8=38.]

Those "actions of collaboration" between CPUs cannot be achieved by traditional programming languages such as C, C++, and MATLAB on their own.

Message Passing Interface (MPI)

Message Passing Interface, commonly known as MPI, is provided as additional libraries for several popular existing languages (C, C++, Fortran) to achieve code-level parallel programming. MPI allows the code writer to control the communication between CPUs. The "actions" mentioned previously can be achieved by writing specific MPI statements within the program.

Examples:

- "send this variable from CPU #0 to CPU #1": MPI_Send
- "add the results obtained from CPU #1 and CPU #2": MPI_Reduce with the MPI_SUM operation (the slide's "MPI_add" is shorthand; MPI has no routine by that name)

Modern scientific codes with MPI can consume a large number of CPUs and many hours to solve complicated problems (OMEN, for example).

How can 10,000 CPUs work for 1 program?

- Nodes need to communicate with each other so that CPUs from several nodes can talk via MPI. Physical connections are needed.
- Not every node needs to communicate with all the others, so a certain network configuration is needed.
- Interconnects are achieved through cables, and different types of cable network yield different performance.

[Diagram: six nodes connected to PBS and to each other through the nodes interconnect network (Gigabit Ethernet, InfiniBand, etc.).]

Interconnect network performance

Major factors in evaluating the performance of interconnect cables:

- Transfer rate: how much data can the cable transfer per second?
- Latency: how much delay does each transfer over the cable incur?

Three kinds of cables are deployed on Purdue clusters:

- Gigabit Ethernet: 1 Gb/s with low latency (Steele, Pete, etc.)
- InfiniBand: 10 Gb/s with ultra-low latency (Steele, non-NCN)
- 10 Gigabit Ethernet: 10 Gb/s with ultra-low latency (Coates)

Things worth mentioning:

- Serial programs do not benefit from these interconnect cables; MPI programs that need lots of I/O between CPUs do.
- Using InfiniBand may require an extra library at compile time.

Clusters summary

| User type | Solve problems via office desktop/laptop | Solve problems via clusters |
|---|---|---|
| Casual users (short serial programs) | Slows down your computer. Unreliable. | Fast processors and large memory. Does not slow down your computer. |
| Intermediate users (multiple, long-running serial programs) | Programs run one by one. Significantly slows down your computer. | Run your jobs in an embarrassingly parallel fashion. Fast, and does not slow your PC down. |
| Advanced users (multiple, long-running, MPI-based parallel programs) | Cannot do parallel runs on single-core computers. | The program is designed to run on clusters with many CPUs. |

The Steele cluster

Clusters have to meet the needs of various users, so they can be built with different kinds of nodes.

NCN-owned nodes are all located in the sub-cluster "Steele-A". NCN also owns nodes on other clusters such as "Pete" and "Coates". Details will be discussed later.

Interlude

A more complete picture of the entire system.

Front-end machine explained

The front-end machine is the gateway for all users. It provides storage and allows users to compose, compile, and manage their files. It is a rather complete computer in itself, with its own CPUs and RAM. It is designed to serve a great number of users and to store an extremely high volume of files.

[Diagram: Problems 1, 2, and 3 on the front-end machine, with the front-end RAM and front-end CPU.]

Steele's front-end machine

Comparing front-end machine to clusters

| | Front-end machine | Clusters |
|---|---|---|
| CPU and RAM: character | Same as clusters | Same as front-end machine |
| CPU and RAM: number | Few | Abundant |
| User control | No control over CPU assignment or RAM size. | Total control over CPU assignment and RAM size via PBS. |
| Parallel computing | Single-core programs only. Can compile but should not run MPI programs. | MPI programs can be compiled and run here. |
| Purpose | Light-duty file editing, management, and compiling | Heavy-duty computation |

Thus, do NOT run computational programs (MATLAB, for example) on the front-end machine for heavy calculations. This even includes data post-processing. For serial jobs, allocate a single CPU from the clusters via PBS.

File storage solutions

Our model "shared hard drive" is in reality a "shared network storage" offered via the BlueArc system. It has two tiers of storage offering 320 TB of space.

[Diagram: shared network storage. New files and recent files are kept on Fibre Channel disk (fast and expensive); old files are kept on SATA disk (slow and cheap). Files move to SATA if they get old and unused, and back to Fibre Channel if called to be used.]

Fortress DXUL system

- The Fortress DXUL system provides a solution for long-term storage of large files.
- No active files shall be stored here.
- No large collections of small files shall be stored here. Compress them first (via tarball or zip) and then store them.

[Diagram: shared network storage feeding the Fortress DXUL system. Labels: low-cost disks, tape/optical disks, tape cartridge, tape cartridge, primary copy, secondary copy, "for files smaller than 0.5 MB", "for files larger than 0.5 MB".]

Front-end machines summary

| | Regular office workstation | Front-end machine with BlueArc storage | Fortress DXUL system |
|---|---|---|---|
| Primary storage size | Depends (usually 100 GB to 500 GB) | Large in total, but can be limited per person (1 to 10 GB) | Huge, up to 5 TB per person |
| Primary backup? | Usually no | Yes | Yes |
| Secondary storage size | Depends (usually no second hard drive) | Scratch drives (250 GB) | Large |
| Secondary backup? | Usually no | Yes | |
| Access speed | Slow (SATA drive) | Fast (Fibre disk) | Very slow |
| Software availability | Limited | Abundant | Very few |
| Purpose | Daily usage | Gateway to clusters | Long-term storage |
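The master and slave configuration described for the pen-counting problem (3*10+4*2) can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads stand in for CPUs and queues stand in for MPI send and receive calls, so it is not MPI and it does not run on separate cluster nodes. The names `worker` and `count_pens` are hypothetical and do not come from the presentation.

```python
import queue
import threading


def worker(inbox, outbox):
    """Worker ("slave CPU"): receive one pair, multiply it, send the product back."""
    boxes, pens_per_box = inbox.get()   # "receive" the task from the master
    outbox.put(boxes * pens_per_box)    # "send" the partial result to the master


def count_pens(groups):
    """Master ("CPU #0"): give each worker one multiplication, then add the results."""
    results = queue.Queue()
    workers = []
    for pair in groups:
        inbox = queue.Queue()
        thread = threading.Thread(target=worker, args=(inbox, results))
        thread.start()
        inbox.put(pair)                 # "send" the task to this worker
        workers.append(thread)
    for thread in workers:
        thread.join()                   # wait until every worker has finished
    # Addition does not depend on arrival order, so results can be summed as they come.
    return sum(results.get() for _ in groups)


if __name__ == "__main__":
    # 3 red boxes with 10 pens each, 4 black boxes with 2 pens each
    print(count_pens([(3, 10), (4, 2)]))  # prints 38
```

The two multiplications are independent, so they can run at the same time; only the final addition has to wait for both partial results. In an actual MPI program, the queue operations would be replaced by calls such as MPI_Send and MPI_Recv, and the final sum could be done with MPI_Reduce.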
