




Computer Architecture, Spring 2008
Tsinghua University
Basic Pipelining Techniques
(Pipelining)
Prof. Dongsheng Wang (汪東升)
wds@
Department of Computer Science and Technology, Tsinghua University
http:〃CPU.

A relevant question
■ Assuming you've got:
  □ One washer (takes 30 minutes)
  □ One drier (takes 40 minutes)
  □ One "folder" (takes 20 minutes)
■ It takes 90 minutes to wash, dry, and fold 1 load of laundry.
  □ How long does 4 loads take?

The slow way
[Figure: timeline from 6 PM to midnight; four loads run back to back, each taking 30, 40, and 20 minutes]
■ If each load is done sequentially it takes 6 hours

Laundry Pipelining
■ Start each load as soon as possible
  □ Overlap loads
[Figure: timeline from 6 PM to midnight; the loads overlap: 30 minutes of washing, four 40-minute drying slots, then 20 minutes of folding]
■ Pipelined laundry takes 3.5 hours
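
The 6-hour and 3.5-hour figures can be checked with a small simulation. This is only a sketch: the stage times come from the slides, while the function names and the rule "a load starts a stage as soon as it is ready and the machine is free" are assumptions made for illustration.

```python
# Stage durations in minutes, taken from the slides: wash, dry, fold.
STAGES = [30, 40, 20]


def sequential_minutes(loads, stages=STAGES):
    """Each load runs start to finish before the next one begins."""
    return loads * sum(stages)


def pipelined_minutes(loads, stages=STAGES):
    """Start each stage as soon as the load is ready and the machine is free."""
    free_at = [0] * len(stages)   # time at which each machine becomes free
    finish = 0
    for _ in range(loads):
        ready = 0                 # time at which this load left its previous stage
        for i, duration in enumerate(stages):
            start = max(ready, free_at[i])
            ready = start + duration
            free_at[i] = ready
        finish = ready
    return finish


print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: once the drier is busy, it sets the pace for every later load.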

Pipelining Lessons
■ Pipelining doesn't help latency of single load, it helps throughput of entire workload
■ Pipeline rate limited by slowest pipeline stage
■ Multiple tasks operating simultaneously using different resources
■ Potential speedup = Number pipe stages
■ Unbalanced lengths of pipe stages reduce speedup
■ Time to "fill" pipeline and time to "drain" it reduce speedup
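
A rough way to see the last three lessons in numbers is sketched below. It assumes a lock-step pipeline in which every stage is given the time of the slowest stage, which is a simplification (for the four-load laundry example it gives 240 minutes rather than 210). The function name is made up for illustration.

```python
def pipeline_speedup(stage_times, n_tasks):
    """Speedup of a lock-step pipeline over running tasks one at a time."""
    unpipelined = n_tasks * sum(stage_times)
    cycle = max(stage_times)                        # rate limited by the slowest stage
    # (stages - 1) cycles to fill, then one task completes per cycle
    pipelined = (len(stage_times) + n_tasks - 1) * cycle
    return unpipelined / pipelined


print(pipeline_speedup([30, 30, 30], 4))     # 2.0: fill and drain time dominates
print(pipeline_speedup([30, 30, 30], 1000))  # close to 3, the number of stages
print(pipeline_speedup([30, 40, 20], 4))     # 1.5: unbalanced stages cost more
```

With balanced stages and many tasks the speedup approaches the number of stages; with few tasks or unbalanced stages it falls short.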

Pipelining is not just Multiprocessing
■ Pipelining does involve parallel processing, but in a specific way.
■ Both multiprocessing and pipelining relate to the processing of multiple "things" using multiple "functional units"
  □ Multiprocessing implies each thing is processed entirely by a single functional unit
    ■ e.g., multiple lanes at the supermarket
  □ In pipelining, each thing is broken into a sequence of pieces, where each piece is handled by a different (specialized) functional unit.
    ■ Supermarket analogy?
■ Pipelining and multiprocessing are not mutually exclusive
  □ Modern processors do both, with multiple pipelines (e.g., superscalar)

Pipelining
■ Pipelining is a general-purpose efficiency technique
  □ It is not specific to processors
■ Pipelining is used in:
  □ Assembly lines
  □ Bucket brigades
  □ Fast food restaurants
■ Pipelining is used in other CS disciplines:
  □ Networking
  □ Server software architecture
■ Useful to increase throughput in the presence of long latency
  □ More on that later...

Instruction execution review
■ Executing a MIPS instruction can take up to five steps.

Step                Name  Description
Instruction Fetch   IF    Read an instruction from memory.
Instruction Decode  ID    Read source registers and generate control signals.
Execute             EX    Compute an R-type result or a branch outcome.
Memory              MEM   Read or write the data memory.
Writeback           WB    Store a result in the destination register.

■ However, as we saw, not all instructions need all five steps.

Instruction  Steps required
beq          IF  ID  EX
R-type       IF  ID  EX       WB
sw           IF  ID  EX  MEM
lw           IF  ID  EX  MEM  WB

Single-cycle datapath diagram
■ How long does it take to execute each instruction?

Example: Instruction Fetch (IF)
■ Let's quickly review how lw is executed in the single-cycle datapath.
■ We'll ignore PC incrementing and branching for now.
■ In the Instruction Fetch (IF) step, we read the instruction memory.

Instruction Decode (ID)
■ The Instruction Decode (ID) step reads the source register from the register file.

Execute (EX)
■ The third step, Execute (EX), computes the effective memory address from the source register and the instruction's constant field.

Memory (MEM)
■ The Memory (MEM) step involves reading the data memory, from the address computed by the ALU.

Writeback (WB)
■ Finally, in the Writeback (WB) step, the memory value is stored into the destination register.

A bunch of lazy functional units
■ Notice that each execution step uses a different functional unit.
■ In other words, the main units are idle for most of the 8ns cycle!
  □ The instruction RAM is used for just 2ns at the start of the cycle.
  □ Registers are read once in ID (1ns), and written once in WB (1ns).
  □ The ALU is used for 2ns near the middle of the cycle.
  □ Reading the data memory only takes 2ns as well.
■ That's a lot of hardware sitting around doing nothing.
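
To put a number on "idle for most of the cycle", the busy times listed above can be divided by the 8ns clock period. This is a minimal sketch for an lw, which uses every unit; the dictionary keys and the function name are illustrative only.

```python
CYCLE_NS = 8  # single-cycle clock period from the slides

# Busy time per lw instruction for each unit, in ns, as listed above.
BUSY_NS = {
    "instruction memory": 2,
    "register file": 1 + 1,   # one read in ID, one write in WB
    "ALU": 2,
    "data memory": 2,
}


def utilization(busy_ns, cycle_ns=CYCLE_NS):
    """Fraction of the cycle during which each unit does useful work."""
    return {unit: ns / cycle_ns for unit, ns in busy_ns.items()}


for unit, frac in utilization(BUSY_NS).items():
    print(f"{unit}: busy {frac:.0%}, idle {1 - frac:.0%}")
```

Under these numbers every unit is busy a quarter of the time and idle for the other three quarters.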

Putting those slackers to work
■ We shouldn't have to wait for the entire instruction to complete before we can re-use the functional units.
■ For example, the instruction memory is free in the Instruction Decode step as shown below, so...
[Figure: datapath with the instruction memory marked "Idle" during Instruction Decode (ID)]

Decoding and fetching together
■ Why don't we go ahead and fetch the next instruction while we're decoding the first one?
[Figure: datapath labeled "Fetch 2nd" and "Decode 1st instruction"]

Executing, decoding and fetching
■ Similarly, once the first instruction enters its Execute stage, we can go ahead and decode the second instruction.
■ But now the instruction memory is free again, so we can fetch the third instruction!
[Figure: datapath labeled "Fetch 3rd", "Decode 2nd", and "Execute 1st"]

Making Pipelining Work
■ We'll make our pipeline 5 stages long, to handle each of the five steps in a load instruction (the longest instruction for this machine)
  □ Stages are: IF, ID, EX, MEM, and WB
■ We want to support executing 5 instructions simultaneously: one in each stage.

Break datapath into 5 stages
■ Insert pipeline registers
■ Each stage has its own functional units.
■ Each stage can execute in 2ns
[Figure: datapath divided into the stages IF, ID, EXE, MEM, WB]

Pipelining Performance

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB
(The first cycles are spent filling the pipeline.)

■ Execution time on ideal pipeline:
  □ time to fill the pipeline + one cycle per instruction
  □ How long for N instructions?
■ Compare with other implementations:
  □ Single Cycle: (8ns clock period)
■ How much faster is pipelining for N=1000?
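
The questions above can be worked through with the numbers given earlier: a 5-stage pipeline with a 2ns stage time, against an 8ns single-cycle clock. The sketch below assumes an ideal pipeline with no stalls; the function names are made up for illustration.

```python
def single_cycle_ns(n, clock_ns=8):
    """One 8ns cycle per instruction."""
    return n * clock_ns


def pipelined_ns(n, stages=5, clock_ns=2):
    """(stages - 1) cycles to fill the pipeline, then one instruction per cycle."""
    return (stages - 1 + n) * clock_ns


n = 1000
print(single_cycle_ns(n))                    # 8000
print(pipelined_ns(n))                       # 2008
print(single_cycle_ns(n) / pipelined_ns(n))  # about 3.98
```

The diagram agrees with the formula: 5 instructions finish in 4 + 5 = 9 cycles. The speedup stays just under 4 rather than 5 because the 2ns stage time is a quarter of the 8ns single-cycle period, not a fifth.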

Pipeline Datapath: Resource Requirements

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB

■ We need to perform several operations in the same cycle.
  □ Increment the PC and add registers at the same time.
  □ Fetch one instruction while another one reads or writes data.
■ What does that mean for our hardware?

Pipelining other instruction types
■ R-type instructions only require 4 stages: IF, ID, EX, and WB
  □ We don't need the MEM stage
■ What happens if we try to pipeline loads with R-type instructions?

Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   WB
sub $v0, $a0, $a1        IF   ID   EX   WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB

Important Observation
■ Each functional unit can only be used once per instruction
■ Each functional unit must be used at the same stage for all instructions:
  □ Load uses Register File's Write Port during its 5th stage
  □ R-type uses Register File's Write Port during its 4th stage

Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   WB
sub $v0, $a0, $a1        IF   ID   EX   WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB
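
The clash in the diagram can be found mechanically. The sketch below treats each stage name as the functional unit it occupies (WB standing for the register file's write port) and assumes one instruction is fetched per cycle; the function and variable names are hypothetical.

```python
from collections import defaultdict

# Stage sequences when R-type instructions skip MEM, as in the diagram above.
STAGES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "R-type": ["IF", "ID", "EX", "WB"],
}


def conflicts(program, stages=STAGES):
    """Return {(cycle, stage): [instruction indices]} for stages used twice in a cycle.

    Instruction i (0-based) is fetched in cycle i + 1.
    """
    usage = defaultdict(list)
    for i, kind in enumerate(program):
        for offset, stage in enumerate(stages[kind]):
            usage[(i + 1 + offset, stage)].append(i)
    return {key: users for key, users in usage.items() if len(users) > 1}


program = ["R-type", "R-type", "lw", "R-type", "lw"]  # add, sub, lw, or, lw
print(conflicts(program))  # {(7, 'WB'): [2, 3]}
```

The only conflict is in cycle 7, where the first lw and the or instruction both want the register file's write port.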

A solution: Insert NOP stages
■ Enforce uniformity
  □ Make all instructions take 5 cycles.
  □ Make them have the same stages, in the same order
    ■ Some stages will do nothing for some instructions

R-type  | IF | ID | EX | NOP | WB

Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   NOP  WB
sub $v0, $a0, $a1        IF   ID   EX   NOP  WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   NOP  WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB

■ Stores and Branches have NOP stages, too...

store   | IF | ID | EX | MEM | NOP
branch  | IF | ID | EX | NOP | NOP
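
The padding rule can be written down directly: keep the five slots in a fixed order and put NOP in any slot an instruction does not use. This is a sketch with illustrative names; the per-instruction stage sets are the ones given in the instruction execution review.

```python
ORDER = ["IF", "ID", "EX", "MEM", "WB"]

# Stages each instruction class actually needs.
USES = {
    "lw": {"IF", "ID", "EX", "MEM", "WB"},
    "R-type": {"IF", "ID", "EX", "WB"},
    "store": {"IF", "ID", "EX", "MEM"},
    "branch": {"IF", "ID", "EX"},
}


def padded(kind):
    """Five-slot stage sequence with NOP in every slot the instruction skips."""
    return [stage if stage in USES[kind] else "NOP" for stage in ORDER]


def diagram(program):
    """One text row per instruction; instruction i starts in cycle i + 1."""
    rows = []
    for i, kind in enumerate(program):
        cells = [""] * i + padded(kind)
        rows.append(f"{kind:<8}" + "".join(f"{c:<5}" for c in cells).rstrip())
    return rows


for row in diagram(["R-type", "R-type", "lw", "R-type", "lw"]):
    print(row)
```

Because WB now always sits in the fifth slot, two instructions issued in different cycles can never reach the write port in the same cycle.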

Summary
■ Pipelining attempts to maximize instruction throughput by overlapping the execution of multiple instructions.
■ Pipelining offers amazing speedup.
  □ In the best case, one instruction finishes on every cycle, and the speedup is equal to the pipeline depth.
■ The pipeline datapath is much like the single-cycle one, but with added pipeline registers
  □ Each stage needs its own functional units
■ Next time we'll see the datapath and control, and walk through an example execution.

Pipelined datapath and control
■ Last time we introduced the main ideas of pipelining.
■ Today we'll see a basic implementation of a pipelined processor.
  □ The datapath and control unit share similarities with the single-cycle implementation that we already saw.
  □ An example execution highlights important pipelining concepts.
■ In future lectures, we'll discuss several complications of pipelining that we're hiding from you for now.

Pipelining concepts
■ A pipelined processor allows multiple instructions to execute at once, and each instruction uses a different functional unit in the datapath.
■ This increases throughput, so programs can run faster.
  □ One instruction can finish executing on every clock cycle, and simpler stages also lead to shorter cycle times.

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $t5, $t6, $0                        IF   ID   EX   MEM  WB

Pipelined Datapath
■ The whole point of pipelining is to allow multiple instructions to execute at the same time.
■ We may need to perform several operations in the same cycle.
  □ Increment the PC and add registers at the same time.
  □ Fetch one instruction while another one reads or writes data.

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $t5, $t6, $0                        IF   ID   EX   MEM  WB

■ Thus, like the single-cycle datapath, a pipelined processor will need to duplicate hardware elements that are needed several times in the same clock cycle.
  □ What about the register file?

One register file is enough
■ We need only one register file to support both the ID and WB stages.
[Figure: register file with inputs Read register 1, Read register 2, Write register, Write data and outputs Read data 1, Read data 2]
■ Reads and writes go to separate ports on the register file.
■ We already took advantage of this property in our single-cycle CPU.

Single-cycle datapath, slightly rearranged

Pipeline registers
■ We'll add intermediate registers to our pipelined datapath.
■ There's a lot of information to save, however. We'll simplify our diagrams by drawing just one big pipeline register between each stage.
■ The registers are named for the stages they connect.
    IF/ID    ID/EX    EX/MEM    MEM/WB
■ No register is needed after the WB stage, because after WB the instruction is done.
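
One way to picture the four pipeline registers is as a chain that shifts by one position on every clock edge. The sketch below is a toy model, not the actual hardware: each register just holds an instruction name, and the `tick` function is a made-up helper.

```python
REGISTERS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def tick(regs, fetched):
    """Advance one clock cycle.

    regs maps each pipeline register name to the instruction it holds (or None).
    Returns (new_regs, retired), where retired is the instruction finishing WB.
    """
    retired = regs["MEM/WB"]          # nothing is stored after WB
    new_regs = {
        "IF/ID": fetched,
        "ID/EX": regs["IF/ID"],
        "EX/MEM": regs["ID/EX"],
        "MEM/WB": regs["EX/MEM"],
    }
    return new_regs, retired


regs = dict.fromkeys(REGISTERS)       # all registers start empty
program = ["lw", "sub", "and", "or", "add"]
retired = []
for cycle in range(1, 10):
    fetched = program[cycle - 1] if cycle <= len(program) else None
    regs, done = tick(regs, fetched)
    if done is not None:
        retired.append((cycle, done))

print(retired)  # [(5, 'lw'), (6, 'sub'), (7, 'and'), (8, 'or'), (9, 'add')]
```

The first instruction completes in cycle 5, and one more completes in each cycle after that.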

Pipelined datapath
[Figure: pipelined datapath. Recoverable labels: pipeline registers EX/MEM and MEM/WB; register file (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2); ALU (Zero, Result); data memory (Address, Write data, Read data); sign extend of Instr[15-0]; shift left 2; Instr[20-16] and Instr[15-11]; control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, MemWrite, MemRead, MemToReg]

Propagating values forward
■ Any data values required in later stages must be propagated through the pipeline registers.
■ The most extreme example is the destination register.
  □ The rd field of the instruction word, retrieved in the first stage (IF), determines the destination register. But that register isn't updated until the fifth stage (WB).
  □ Thus, the rd field must be passed through all of the pipeline stages, as shown in red on the next slide.
■ Notice that we can't keep a single "instruction register", because the pipelined machine needs to fetch a new instruction every clock cycle.
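
The point about rd can be illustrated with a toy model in which each pipeline register carries its own copy of the destination register number. Everything here is an assumption made for illustration: the `Latch` class, the `run` function, and the shortcut of knowing the result value at fetch time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latch:
    """Contents of one pipeline register; only the fields needed here."""
    rd: int      # destination register number carried with the instruction
    value: int   # result to be written to rd in WB (known up front in this toy)


def run(instructions, cycles):
    """instructions: list of (rd, value) pairs. Returns writes as (cycle, rd, value)."""
    if_id = id_ex = ex_mem = mem_wb = None
    writes = []
    for cycle in range(1, cycles + 1):
        # WB stage: write using the rd that travelled with this instruction,
        # not the rd of whatever instruction is being fetched right now.
        if mem_wb is not None:
            writes.append((cycle, mem_wb.rd, mem_wb.value))
        # Clock edge: every register takes the contents of the one before it.
        fetched = Latch(*instructions[cycle - 1]) if cycle <= len(instructions) else None
        if_id, id_ex, ex_mem, mem_wb = fetched, if_id, id_ex, ex_mem
    return writes


print(run([(8, 100), (9, 200)], 6))  # [(5, 8, 100), (6, 9, 200)]
```

Each write happens four cycles after its instruction was fetched, by which time newer instructions have already been fetched, so the rd value has to ride along in the registers.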

The destination register
[Figure: the pipelined datapath from the earlier slide, with the destination register field passed through the pipeline registers to the register file's Write register input]

What about control signals?
■ The control signals are generated in the same way as in the single-cycle processor: after an instruction is fetched, the processor decodes it and produces the appropriate control values.
■ But just like before, some of the control signals will not be needed until some later stage and clock cycle.
■ These signals must be propagated through the pipeline until they reach the appropriate stage. We can just pass them in the pipeline registers, along with the other data.
■ Control signals can be categorized by the pipeline stage that uses them.
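
A possible grouping of the signals named in the datapath figure is sketched below. The slide does not list the groups, so the assignment of signals to stages is an assumption based on the usual five-stage MIPS design, and `carried_in` is a hypothetical helper.

```python
# Assumed grouping: which stage consumes each control signal.
SIGNALS_BY_STAGE = {
    "EX": ["RegDst", "ALUOp", "ALUSrc"],
    "MEM": ["PCSrc", "MemRead", "MemWrite"],
    "WB": ["RegWrite", "MemToReg"],
}

# Stages still ahead of an instruction sitting in each pipeline register.
STAGES_AHEAD = {
    "ID/EX": ["EX", "MEM", "WB"],
    "EX/MEM": ["MEM", "WB"],
    "MEM/WB": ["WB"],
}


def carried_in(register):
    """Control signals that still need to be stored in the given pipeline register."""
    return [sig for stage in STAGES_AHEAD[register] for sig in SIGNALS_BY_STAGE[stage]]


for name in STAGES_AHEAD:
    print(name, carried_in(name))
```

Each pipeline register carries fewer control signals than the one before it, because a signal can be dropped once its stage has used it.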

Pipelined datapath and control
[Figure: pipelined datapath with the control unit added. Recoverable labels: Control, ID/EX, EX/MEM, PCSrc]