Exploiting Thread-Level Parallelism in General Purpose Applications
Pen-Chung Yew 游本中
Department of Computer Science and Engineering, University of Minnesota
http:///Agassiz
2023/1/11 PCYew-Taiwan

Three Cobblers Outwit One Zhuge Liang
Three Zhuge Liangs Cannot Outwit One Cobbler

Impact of Hardware Technology on Computer Architectures
- Performance improvement of microprocessors so far has been driven primarily by higher clock rates: smaller feature sizes (Moore's conjecture), higher power density, higher cooling cost
- Results? Intel cancelled two Pentium projects recently
- VLSI technology allows more than 1 billion transistors on a single chip => plenty of gates, what to do with them?
- Superscalar is hard to scale beyond ~10 instructions per clock cycle: inherent ILP (Instruction-Level Parallelism) limitation in application programs, long wire delays, high power density
- Memory wall is getting higher between CPU and storage devices
- Improving single program performance is still very important

Parallel Processing Comes to the Rescue – Finally?
- Parallel processing has been proposed to salvage clock rate limitation => for the past thirty years!!!
- Finally? => multiple cores in Intel's roadmap as well as in most embedded processors today
- What is new here?
  - Use thread-level parallelism (TLP) to improve instruction-level parallelism (ILP) for general-purpose applications

ILP vs. TLP
[Figure: two timelines of operations op1 to op24 over time steps t1 to t6; labels: Superscalar, ILP, TLP, threads Th1 to Th4.]

Parallel Processing Comes to the Rescue – Finally?
- Is there enough TLP in general-purpose application programs (to improve ILP)? => much harder than scientific applications (floating-point-intensive)

TLP Challenges in General-Purpose Applications
- Mostly do-while loops
  - Need thread-level control speculation
- Parallelism exists mostly in outer loops
  - Not good for VLIW (i.e. software pipelining), or vector processing => need thread-level support
- Pointers complicate alias and data dependence analysis
  - Need runtime disambiguation and data speculation
- Many small loops and doacross loops
  - Need fast and low overhead communication
- Small basic blocks – need to exploit both ILP and TLP
Need new approaches to apply parallel processing to such applications!!

Outline
- Multi-threaded architectures
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analyses
- Conclusion

Multi-Threaded Architectures
- To improve single-program speedup
  - Multiscalar
  - Superthreaded processors
  - Trace processor
  - Multiprocessor on a chip
- To improve resource utilization, throughput
  - Simultaneous Multithreading (SMT)
- To hide memory latency
  - Tera computer, Hyperthreading
- To support system/application functionality
- Reference: Speculative Execution in High Performance Computer Architectures, edited by Kaeli and Yew, CRC Press, 2005

Superthreaded Architectures
- Exploit thread-level parallelism to enhance ILP
- Multiple vs. single instruction windows (not for scalability as in traditional parallel processing)
- Control speculation (not stopped by branch instructions)
- Data speculation (not stopped by data dependences between threads)
- Fast communication => small task granularity
- High cache hit rates, automatic data prefetching
- Need new hardware and compiler/software support
- Reference: The Superthreaded Processor Architecture, Tsai et al., IEEE Trans. on Computers, Sept. 1999
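
The data speculation bullet above can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the superthreaded hardware protocol: loop iterations stand in for threads, memory is a plain dict, each iteration buffers its stores and records the addresses it loaded from outside its own buffer, and iterations commit in program order. The names `execute`, `run_speculative`, `body`, and `SRC` are hypothetical and exist only for this example.

```python
def execute(body, i, base):
    """Run iteration i against `base`, buffering stores instead of writing memory."""
    read_set, write_buf = set(), {}

    def load(addr):
        if addr in write_buf:          # forwarded from this iteration's own buffer
            return write_buf[addr]
        read_set.add(addr)             # value came from outside this iteration
        return base.get(addr, 0)

    def store(addr, value):
        write_buf[addr] = value

    body(i, load, store)
    return read_set, write_buf


def run_speculative(body, n_iters, memory):
    """Speculate that iterations are independent; squash and redo the ones that are not."""
    snapshot = dict(memory)
    # Speculative phase: every iteration runs against the pre-loop state.
    runs = [execute(body, i, snapshot) for i in range(n_iters)]

    written, squashed = set(), []
    for i, (read_set, write_buf) in enumerate(runs):   # commit in program order
        if read_set & written:                         # it loaded a stale value
            squashed.append(i)
            _, write_buf = execute(body, i, memory)    # recover by re-executing
        memory.update(write_buf)
        written.update(write_buf)
    return squashed


# Iteration i computes a[i] = a[SRC[i]] + 1; only iteration 3 depends on an
# earlier iteration (it reads a[1], which iteration 1 writes).
SRC = [10, 11, 12, 1, 14]

def body(i, load, store):
    store(i, load(SRC[i]) + 1)

mem = {k: k for k in range(20)}
squashed = run_speculative(body, 5, mem)
print(squashed, [mem[k] for k in range(5)])
```

In this run only the one iteration with a real cross-iteration dependence is squashed and re-executed; the others commit their speculative results unchanged, which is the payoff when dependences are infrequent at runtime.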
[Figure: superthreaded processor organization with an instruction cache, a data cache, and four thread processing units, each containing an execution unit, a communication unit, a memory buffer, and a writeback unit.]

Speculation: Breaking Program Dependency
- Control and data dependences limit program performance
- However,
  - Most branches have good predictability
  - Most data dependences happen infrequently at runtime
- Speculation is an effective approach to break dependences
  - Optimize program execution by ignoring infrequent data dependences, or taking predicted paths
  - Check possible violation (mis-speculation) at runtime
  - Recover if violation occurs

Types of Speculation
- Control speculation
  - Speculate on program control flow path
- Data speculation
  - Speculate on how likely memory references are to the same memory location (address)
- Value speculation
  - Speculate on the result value of an operation

Outline
- Multi-threaded architectures
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analyses
- Conclusion

Speculation on Intel IA64
- Both control and data speculation are supported on Intel IA64
  - Special instructions and hardware are provided
- Memory load operation is targeted for speculation
  - Memory delay is usually the bottleneck of performance
  - Memory load is usually the start of speculative operations

Speculating on Data Dependence
More speculative optimizations
  I1: … = *q
  I2: *p = b
  I3: … = *q
  I4: *r = …
  I5: … = *p
  I6: *r = …
- Speculate on this dependence
- Redundancy elimination opportunity (if the store to *p in I2 is assumed not to alias *q, the load in I3 can reuse the value loaded in I1)

Speculate on Data Dependences
More speculative optimizations
  I1: … = *q
  I2: *p = b
  I3: … = *q
  I4: *r = …
  I5: … = *p
  I6: *r = …
- Speculate on this dependence
- Copy propagation opportunity (if the store to *r in I4 is assumed not to alias *p, the load in I5 can use b from I2)

Speculate on Data Dependences
More speculative optimizations
  I1: … = *q
  I2: *p = b
  I3: … = *q
  I4: *r = …
  I5: … = *p
  I6: *r = …
- Speculate on this dependence
- Dead store elimination opportunity (if the load of *p in I5 is assumed not to alias *r, the store in I4 is overwritten by I6 before it is read)

Observations
- Speculative optimization opportunities exist in many applications (originally, it was only for memory latency hiding during code scheduling)
- A general compiler framework is needed to support both control and data speculation in optimizations
- Need to generate recovery code for mis-speculation
- Need extensive support for data dependence, alias, and value profiling (no longer conservative analysis)
- Reference: A Compiler Framework for Speculative Analysis and Optimizations, ACM/SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), June 2003, also in ACM Trans. on Architecture and Code Optimization (TACO), Vol. 1, No. 3, Sept. 2004, pp. 247-271

A Compiler Framework: Intel Open Research Compiler (ORC)

Performance Improvement of Speculative Register Promotion
- Based on alias profile and compared with -O3 with type-based alias analysis on the Intel ORC compiler

Value Speculation
- Value locality: likelihood of a previously-seen value recurring within a storage location
- Observed in any storage location
  - Registers
  - Cache memory
  - Main memory
- Most work focuses on values stored in registers to break potential data dependences: register value locality

Performance of Value Predictors
- Predictability of Data Values, Sazeides and Smith, Micro-30, 1997
- Last value prediction varies from 23% to 61%, average about 40%
- Stride prediction varies from 38% to 80%, average about 56%
- FCM (finite context method) with an order of 3 varies from 56% to over 90%, with an average of about 78%
  - Improvement diminishes as order increases
- Less sensitive to different types of instructions

Outline
- Introduction
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analyses
- Conclusion

Compiler Optimizations for Speculative Threads
- Without compiler optimization, there is limited TLP even under perfect hardware support. [Oplinger PACT99]
- The compiler has to decide
  - Which loops/regions to transform into threads
  - Whether to use synchronization or speculation
  - How to schedule the code to improve overlaps
  - What transformations to use
  - When/how to generate recovery code
- Profile-based analysis could be very efficient

Loop Selection
[Figure: chart with axes labeled program and speedup.]
- Carefully selected loops can improve performance significantly!

Speculative Code Motion
[Figure: threads with stores (*p = ) and loads ( = *p) shown before and after code motion; labels: stall, critical path, other computation.]

Outline
- Introduction
- Speculation to break dependency
- Speculative execution on single-threaded processors
- Speculative execution on multi-threaded processors
- Profile-based analysis
- Conclusion

Crucial Considerations in Dependence Profiling
- Program coverage => need compiler's support or use heuristic rules
- Input sensitivity
- Profiling overhead (space and time)
- Using alias and data dependence profiles is inherently speculative => need hardware support for correct execution

Alias Profiling vs. Static Analysis
- Most possible data dependences reported by the compiler do not occur at runtime
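
A minimal Python sketch of how such a comparison could be made, assuming an instrumented run yields a trace of (reference site, address) pairs. The function names, the trace format, and the sample data are hypothetical and not taken from the ORC implementation; the "static" result is modeled as the most conservative answer, where every pair of sites may alias.

```python
from itertools import combinations


def alias_profile(trace):
    """Collect, per reference site, the set of addresses it touched in this run."""
    touched = {}
    for site, addr in trace:
        touched.setdefault(site, set()).add(addr)
    return touched


def observed_aliases(touched, may_alias_pairs):
    """Keep only the statically reported pairs whose address sets overlapped."""
    return {(a, b) for a, b in may_alias_pairs
            if touched.get(a, set()) & touched.get(b, set())}


# Hypothetical trace: dereference sites p, q, r and the addresses they accessed.
TRACE = [("p", 0x100), ("q", 0x200), ("r", 0x300), ("p", 0x104), ("r", 0x100)]

touched = alias_profile(TRACE)
# Assumed conservative static result: every pair of sites may alias.
may_alias = set(combinations(sorted(touched), 2))
observed = observed_aliases(touched, may_alias)
print(f"{len(observed)} of {len(may_alias)} may-alias pairs occurred: {sorted(observed)}")
```

A pair that never overlapped in the profiled run is only a candidate for speculation, not a proof of independence: the profile depends on the input used, so the optimized code still needs a runtime check and recovery path.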
Data Dependence Profiling
- Data dependence edges among memory references and function calls
- Detailed information
  - type: flow, anti, output, or input
  - probability: frequency of occurrence
- When loops are targeted
  - dependence distance: limited

Overhead of Profiling
[Figure: bar chart of profiling slowdown (X times slower) for bzip2, crafty, gap, gcc, gzip, mcf, parser, perlbmk, twolf, vortex, vpr, and the average; series: alias, DD without distance, DD for innermost loops, DD 4-level loops.]
- Compiler: ORC version 2.0
- Machine: Itanium 2, 900 MHz and 2 GB memory
- Benchmarks: SPEC CPU2000 Int
- Instrumentation optimization has been done

Techniques to Reduce Profiling Overhead
- Reduce the space requirement by hash table
  - Larger granularity of address
  - Smaller iteration counter
- Sampling
  - Sample the snapshots of procedures or loops instead of individual references
  - Use instrumentation-based sampling framework
  - Switch at procedures or loops

Conclusions
- Microprocessors have caught up with supercomputers in the '90s and have gone multi-core
- It is non-trivial to apply current supercomputing technologies to general-purpose applications
- New architectural support such as thread-level speculative execution, and new compiler techniques such as speculative optimizations using alias and data dependence profiling, even dynamic optimization at runtime, are crucial – as always
- A very exciting and new era for parallel processing might have arrived (especially in embedded systems) – finally!

References
(1) J. Lin et al., A Compiler Framework for Speculative Analysis and Optimizations, Proc. of ACM/SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), June 2003; also in ACM Trans. on Architecture and Code Optimization (TACO), Vol. 1, No. 3, Sept. 2004, pp. 247-271
(2) J. Lin et al., Recovery Code Generation for General Speculative Optimizations, to appear in ACM Trans. on Architecture and Code Optimization (TACO), 2005
(3) J. Lin et al., Speculative Register Promotion Using Advanced Load Address Table (ALAT), Proc. of IEEE/ACM Int'l Symp. on Code Generation and Optimization (CGO), March 2003
(4) T. Chen et al., Data Dependence Profiling for Speculative Optimizations, Proc. of Int'l Conf. on Compiler Construction (CC), March 2004
(5) T. Chen et al., An Empirical Study on the Granularity of Pointer Analysis in C Programs, Proc. 15th Workshop on Languages and Compilers for Parallel Computing (LCPC), August 2002
(6) J. Y. Tsai et al., The Superthreaded Processor Architecture, IEEE Trans. on Computers, special issue on Multithreaded Architecture, Vol. 48, No. 9, Sept. 1999

Control Speculation
- ld.s: move the load operation across the barri