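Chapter 4 — The Processor

§4.1 Introduction

Overview
- Three key factors determine CPU performance
  - Instruction count: determined by the instruction set architecture (ISA) and the compiler
  - Clock cycles per instruction (CPI): determined by the CPU hardware
  - Clock cycle time: determined by the CPU hardware
- We will look at two MIPS implementations
  - A simplified version
  - A more realistic pipelined version
- A basic MIPS implementation covers
  - Memory-reference instructions: lw, sw
  - Arithmetic/logical instructions: add, sub, and, or, slt (set on less than)
  - Branch instructions: beq, j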
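Instruction Execution
- Program counter (PC) → the memory location holding the instruction: fetch the instruction from it
- Register numbers → register file: read the registers
  - A load word instruction reads one register; most other instructions read two
- The remaining steps depend on the instruction class
  - Use the arithmetic/logic unit (ALU) to calculate
    - an arithmetic result
    - a memory address for a load or store
    - a branch target address
  - Access data memory for a load or store
  - PC ← target address or PC + 4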
CPU Overview

Multiplexers
- Wires cannot simply be joined together
- Use multiplexers to select between sources

Control
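§4.2 Logic Design Conventions

Logic Design Basics
- Information is encoded in binary
  - Low voltage = 0, high voltage = 1
  - One wire per bit
  - Multi-bit data is encoded on multi-wire buses
- Combinational elements
  - Operate on data
  - The output is a function of the input
- State (sequential) elements
  - Store information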
Combinational Elements
- AND gate: Y = A & B
- Multiplexer: Y = S ? I1 : I0
- Adder: Y = A + B
- Arithmetic/logic unit (ALU): Y = F(A, B)

Sequential Elements
- Register: stores data in a circuit
  - The clock signal determines when the stored value is updated
  - Edge-triggered: the value is updated when Clk changes from 0 to 1

Sequential Elements
- Register with write control
  - Updates only on a clock edge when the write control input is 1
  - Used when the stored value is needed later
Clocking Methodology
- Combinational logic transforms data during a clock cycle
  - Between edge-triggered clock edges
  - Input comes from state elements, output goes to state elements
  - The longest delay determines the clock period
- Figure: the relationship among combinational logic, state elements, and the clock cycle
- Edge triggering allows a state element to be read and written in the same clock cycle

§4.3 Building a Datapath

Building a Datapath
- Datapath: the elements in the CPU that process data and addresses
  - Registers, ALUs, multiplexers, memories, and so on
- We will build a MIPS datapath incrementally
  - Understanding it by refining the overview design step by step
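Instruction Fetch
- The PC is a 32-bit register
- The address of the next instruction is PC + 4

R-Format Instructions
- Read two register operands
- Perform an arithmetic/logical operation
- Write the result to a register

  R-type instruction format:
    op (= 0) | rs | rt | rd | shamt | funct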
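Load/Store Instructions
- Read register operands
- Calculate the address using the 16-bit offset
  - Use the ALU, after sign-extending the offset
- Load: read memory and update the register
- Store: write the register value to memory
- Requires a sign-extension unit

  Load/store instruction format:
    op (= 35 or 43) | rs | rt | address

Branch Instructions
- Read register operands
- Compare the two operands
  - Use the ALU to subtract, then check the ALU's Zero output
- Calculate the target address
  - Sign-extend the displacement
  - Shift it left 2 places (the displacement is in words) [see p. 45]
  - Add it to PC + 4, which instruction fetch has already calculated as the address of the next instruction

  Branch instruction format:
    op (= 4, beq) | rs | rt | address

Branch Instructions
- In the datapath figure, the shift left by 2 only re-routes wires, and sign extension replicates the sign-bit wire

  Jump instruction format:
    op (= 2, jump) | address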
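The target-address steps for a branch (sign-extend, shift left 2, add to PC + 4) can be sketched in a few lines of Python. This is only an illustrative model, not part of the slides; the function names `sign_extend16` and `branch_target` are made up for this sketch.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Target of a taken beq: (PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


# A beq at address 40 with a word offset of 7 branches to address 72.
print(branch_target(40, 7))
# An offset of -1 (0xFFFF) branches back to the beq itself.
print(branch_target(40, 0xFFFF))
```

The word offset is relative to PC + 4 rather than to the branch itself, which is why an offset of -1 lands on the branch instruction.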
Composing the Elements
- The first-cut datapath completes one instruction in one clock cycle
  - Each datapath element can do only one function at a time
  - So separate instruction and data memories are needed
- Multiplexers select the data source for different instructions

R-Type/Load/Store Datapath [p. 191, Figure 4-10]

Full Datapath [p. 192]
§4.4 A Simple Implementation Scheme

ALU Control [p. 192, Section 4.4.1]
- The ALU is used for
  - Load/store: F = add
  - Branch: F = subtract
  - R-type: F is determined by the funct field

  ALU control   Function
  0000          AND
  0001          OR
  0010          add
  0110          subtract
  0111          set-on-less-than
  1100          NOR

ALU Control
- A 2-bit ALUOp is derived from the opcode
  - Combinational logic derives the ALU control

  opcode   ALUOp   Operation          funct    ALU function       ALU control
  lw       00      load word          XXXXXX   add                0010
  sw       00      store word         XXXXXX   add                0010
  beq      01      branch equal       XXXXXX   subtract           0110
  R-type   10      add                100000   add                0010
                   subtract           100010   subtract           0110
                   AND                100100   AND                0000
                   OR                 100101   OR                 0001
                   set-on-less-than   101010   set-on-less-than   0111

The Main Control Unit
- Control signals are derived from the instruction

  R-type       0          rs      rt      rd      shamt   funct
               31:26      25:21   20:16   15:11   10:6    5:0
  Load/Store   35 or 43   rs      rt      address
               31:26      25:21   20:16   15:0
  Branch       4          rs      rt      address
               31:26      25:21   20:16   15:0

  - opcode: always in bits 31:26
  - rs: always read
  - rt: read, except for load
  - Destination register: written for R-type (rd) and load (rt)
  - address: sign-extended and added
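The two-level ALU control decoding can be written out as a small lookup. The sketch below is a Python model of the truth table for lw, sw, beq and the five R-type operations; the names `FUNCT_TO_ALU_CONTROL` and `alu_control` are illustrative only, and real hardware would use combinational logic rather than a dictionary.

```python
# funct field (bits 5:0) -> 4-bit ALU control, for R-type instructions
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op, funct=0):
    """Return the 4-bit ALU control value for a 2-bit ALUOp and a 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add to form the address (funct ignored)
        return 0b0010
    if alu_op == 0b01:      # beq: subtract and test the Zero output (funct ignored)
        return 0b0110
    if alu_op == 0b10:      # R-type: the funct field selects the operation
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError("ALUOp 11 is not used in this subset")


print(format(alu_control(0b10, 0b101010), "04b"))  # R-type slt
```

Splitting the decoding this way keeps the main control unit small: it only needs to produce the 2-bit ALUOp from the opcode.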
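Datapath With Control [p. 196]
- Built from (1) the elements, (2) the datapath, and (3) the control signals

R-Type Instruction [p. 198, Figure 4-19]
The datapath in operation for add $t1, $t2, $t3:
1) Fetch the instruction from instruction memory and increment the PC.
2) Read registers $t2 and $t3 from the register file; the main control unit computes the setting of each control signal.
3) The ALU operates on the data read from the register file, using the funct field (bits 5:0) to determine the ALU function.
4) Write the ALU result to the register file, using bits 15:11 of the instruction to select the destination register ($t1).

Load Instruction [p. 198, Figure 4-20]
The datapath in operation for lw $t1, offset($t2):
1) Fetch the instruction from instruction memory and increment the PC.
2) Read the value of register $t2 from the register file.
3) The ALU adds the value read from the register file to the sign-extended lower 16 bits of the instruction (offset).
4) Use the ALU result as the address for data memory.
5) Write the data from the memory unit to the register file; the destination register ($t1) is given by bits 20:16 of the instruction.

Branch Instruction [p. 198, Figure 4-21]
The datapath in operation for beq $t1, $t2, offset:
1) Fetch the instruction from instruction memory and increment the PC.
2) Read the values of registers $t1 and $t2 from the register file.
3) The ALU subtracts the two values read from the register file. The value of PC + 4 is added to the lower 16 bits of the instruction (offset), sign-extended and shifted left by 2.
4) The Zero output of the ALU decides which adder result is stored in the PC.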
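Implementing Jumps

  Jump   2       address
         31:26   25:0

- Jump uses a word address
- Update the PC with the concatenation of
  - the top 4 bits of the old PC (PC 31:28)
  - the 26-bit jump address (instruction bits 25:0, placed in bits 27:2)
  - 00
- Needs an extra control signal decoded from the opcode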
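The concatenation that forms the jump target can be sketched with shifts and masks. In this illustrative Python model the upper 4 bits are taken from PC + 4, which is how the textbook's single-cycle datapath wires it; that detail is an assumption here, since the slide only says "old PC". The name `jump_target` is made up for this sketch.

```python
def jump_target(pc, address26):
    """Build the 32-bit jump target from the PC and a 26-bit word address."""
    upper4 = (pc + 4) & 0xF0000000          # assumption: top 4 bits come from PC + 4
    word_address = address26 & 0x03FFFFFF   # keep only the 26-bit field
    return upper4 | (word_address << 2)     # the low two bits are always 00


print(hex(jump_target(0x00400000, 0x00100000)))
```

Because only 26 bits come from the instruction, a jump can reach any word within the current 256 MB region selected by the upper 4 bits.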
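Datapath With Jumps Added [p. 201, Figure 4-24]

Performance Issues
- The longest delay determines the clock period
  - Critical path: the load instruction
  - Instruction memory → register file → ALU → data memory → register file
- It is not feasible to use different clock periods for different instructions
- The single-cycle design violates a design principle
  - Making the common case fast
- We will improve performance by pipelining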
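§4.5 An Overview of Pipelining

Pipelining Analogy
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- The laundry takes 4 steps: (1) put the clothes in the washer; (2) move the washed clothes to the dryer; (3) fold the dried clothes; (4) put the clothes away. Each step takes half an hour, so four loads done one after another take 0.5 × 4 × 4 = 8 hours.
- Four loads: speedup = 8/3.5 = 2.3
- Non-stop: speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

MIPS Pipeline
- Five stages, one step per stage
  1. IF: instruction fetch from memory
  2. ID: instruction decode and register read
  3. EX: execute the operation or calculate the address
  4. MEM: access the memory operand
  5. WB: write the result back to a register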
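The laundry numbers follow from one observation: in a pipeline, the first load takes all four stages and each further load finishes one stage time later. A small Python sketch of that arithmetic (the function names are illustrative, not from the slides):

```python
def sequential_hours(loads, stages=4, stage_hours=0.5):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_hours


def pipelined_hours(loads, stages=4, stage_hours=0.5):
    """The first load takes `stages` steps; every later load adds one step."""
    return (stages + loads - 1) * stage_hours


def speedup(loads):
    return sequential_hours(loads) / pipelined_hours(loads)


print(sequential_hours(4), pipelined_hours(4), round(speedup(4), 1))
print(round(speedup(1_000_000), 3))  # approaches the number of stages
```

With `stages=4` and `stage_hours=0.5`, the ratio is 2n/(0.5n + 1.5), which tends to 4 as n grows.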
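Pipeline Performance
- Assume the time for each stage is
  - 100 ps for a register read or write
  - 200 ps for the other stages
- Compare the pipelined datapath with the single-cycle datapath

  Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
  lw         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
  sw         200 ps        100 ps          200 ps   200 ps                           700 ps
  R-format   200 ps        100 ps          200 ps                   100 ps           600 ps
  beq        200 ps        100 ps          200 ps                                    500 ps

Pipeline Performance
- Single-cycle: Tc = 800 ps
- Pipelined: Tc = 200 ps

Pipeline Speedup
- If all stages are balanced (that is, all take the same time)
  - Time between instructions (pipelined) = time between instructions (nonpipelined) / number of stages
- If the stages are not balanced, the speedup is smaller
- Each stage has its own dedicated functional unit
- The speedup comes from increased throughput
  - The execution time of each individual instruction does not decrease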
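The stage times above are enough to work out both clock periods and the total time for a run of instructions. The following Python sketch assumes an ideal pipeline with no stalls; the names `STAGE_PS`, `single_cycle_time` and `pipelined_time` are invented for this example.

```python
# Stage latencies in picoseconds, taken from the lw row of the timing table
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

SINGLE_CYCLE_TC = sum(STAGE_PS.values())  # the slowest instruction (lw) sets the period
PIPELINED_TC = max(STAGE_PS.values())     # the slowest stage sets the period


def single_cycle_time(n):
    """Total time for n instructions on the single-cycle datapath."""
    return n * SINGLE_CYCLE_TC


def pipelined_time(n):
    """Total time for n instructions on an ideal pipeline (no stalls assumed)."""
    return (len(STAGE_PS) + n - 1) * PIPELINED_TC


print(SINGLE_CYCLE_TC, PIPELINED_TC)
print(single_cycle_time(3), pipelined_time(3))
print(round(single_cycle_time(1_000_000) / pipelined_time(1_000_000), 2))
```

The long-run speedup approaches 800/200 = 4 rather than 5, because the register read and write stages need only 100 ps but are still given a full 200 ps cycle.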
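Quantitative Measures of a Pipeline
The main performance measures of a scalar pipeline:
1. Throughput rate
   - Definition 1: the number of instructions the pipeline executes per unit of time.
   - Definition 2: the number of tasks or results that leave the pipeline per unit of time.
2. Speedup
   - Definition: the ratio by which pipelined execution is faster than non-pipelined (sequential) execution.
3. Efficiency
   - Definition: the ratio of the time the pipeline's units are actually in use to the total running time; also called the utilization of the pipeline hardware.

Pipelining and ISA Design
(ISA: instruction set architecture)
- Why the MIPS ISA suits pipelining
  - All instructions are 32 bits
    - Easy to fetch and decode in one cycle
    - Compare x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Decoding and register read can be done in one step
  - Load/store addressing
    - Calculate the address in the 3rd stage, access memory in the 4th stage
  - Alignment of memory operands
    - A memory access takes only one cycle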
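Hazards
- A situation in which the next instruction cannot start in the next cycle is called a hazard. There are three kinds:
  1. Structure hazards: a required resource is busy (resource conflict)
  2. Data hazards: an instruction must wait for a previous instruction to complete its data read/write
  3. Control hazards: the control action depends on a previous instruction

1 Structure Hazards
- A conflict over the use of a resource
- In a MIPS pipeline with a single memory
  - A load/store needs a data access
  - Instruction fetch would have to stall until that data-access cycle ends
    - This causes a pipeline bubble
- So pipelined datapaths need separate instruction and data memories
  - Or separate instruction/data caches

2 Data Hazards
- An instruction depends on a previous instruction completing its data access
    add $s0, $t0, $t1
    sub $t2, $s0, $t3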
2-1 Forwarding (Bypassing)
- Use a result as soon as it has been computed
  - Before it has been stored in a register
  - Requires extra connections in the datapath (an added feedback path)

2-2.1 Load-Use Data Hazard
- Forwarding cannot avoid every stall
  - The result may not have been computed yet when it is needed
  - Forwarding cannot go backward in time

Code Scheduling to Avoid Stalls
- Reorder code to avoid using a load result in the very next instruction
- C code for A = B + E; C = B + F;

  Original order (13 cycles):
          lw   $t1, 0($t0)
          lw   $t2, 4($t0)
    stall
          add  $t3, $t1, $t2
          sw   $t3, 12($t0)
          lw   $t4, 8($t0)
    stall
          add  $t5, $t1, $t4
          sw   $t5, 16($t0)

  Reordered (11 cycles):
          lw   $t1, 0($t0)
          lw   $t2, 4($t0)
          lw   $t4, 8($t0)
          add  $t3, $t1, $t2
          sw   $t3, 12($t0)
          add  $t5, $t1, $t4
          sw   $t5, 16($t0)
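The 13-cycle and 11-cycle totals can be reproduced by counting load-use pairs. The Python sketch below assumes a 5-stage pipeline with full forwarding, so the only stall is one cycle when an instruction uses the result of the lw immediately before it. The tuple encoding `(op, dest, sources)` and the function name `count_cycles` are hypothetical choices for this illustration.

```python
def count_cycles(program, stages=5):
    """Cycles to run `program`, charging one stall per adjacent load-use pair."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    # The first instruction needs `stages` cycles; each later one adds a cycle.
    return len(program) + (stages - 1) + stalls


original = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(count_cycles(original), count_cycles(reordered))
```

Moving the third lw up separates each load from the add that consumes it, so both stalls disappear without changing the result.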
Pipeline diagrams for the two orderings (cycles 1 to 13)
- Original order (lw, lw, add, sw, lw, add, sw): 13 cycles. Bypassing resolves the data hazards, but each add that uses a load result must still be delayed by one cycle.
- Reordered (lw, lw, lw, add, sw, add, sw): changing the instruction order so that each load result is written before it is read avoids the stalls (see p. 226).

3 Control Hazards
- A branch determines the flow of control
  - Fetching the next instruction depends on the branch outcome
  - The pipeline cannot always fetch the correct instruction
    - It is still working on the ID stage of the branch
- In the MIPS pipeline
  - The registers must be compared and the target address computed early
  - Add hardware to do this in the ID stage
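Stall on Branch
- Wait until the branch outcome is determined before fetching the next instruction

Branch Prediction
- A way to deal with branch stalls
- Longer pipelines cannot determine the branch outcome early
  - The stall penalty becomes unacceptable
- Predict the outcome of the branch
  - Stall only if the prediction is wrong
- In the MIPS pipeline
  - Can predict branches as not taken
  - Fetch the instruction after the branch, with no delay

MIPS with Predict Not Taken
- Figure: prediction correct / prediction incorrect

More-Realistic Branch Prediction
- Static branch prediction
  - Based on typical branch behavior
  - Example: loop and if-statement branches
    - Predict backward branches taken
    - Predict forward branches not taken
- Dynamic branch prediction
  - Hardware measures actual branch behavior
    - e.g., keep a history record for each branch
  - Assume future behavior will continue the current trend
    - When wrong, stall while re-fetching, and update the history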
Pipeline Summary
The BIG Picture
- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control
- Instruction set design affects the complexity of the pipeline implementation

There is less in this than meets the eye. (It looks like a lot, but it is not.)
§4.6 Pipelined Datapath and Control

MIPS Pipelined Datapath [p. 212]
- Stages: instruction fetch; decode and register read; execute and address calculation; memory access (MEM); write back (WB)
- Right-to-left data flow (in MEM and WB) leads to hazards

Pipeline Registers
- Registers are needed between stages
  - They hold the information produced in the previous cycle

Pipeline Operation
- Instructions flow through the pipelined datapath cycle by cycle
  - A single-clock-cycle diagram shows one clock cycle taken out of the pipeline
  - Compare: the multi-clock-cycle diagram
- We will look at single-clock-cycle diagrams for load and store
IF for Load, Store, …
- The IF/ID register is 64 bits wide: it holds both the instruction (IR) and the PC
- Each single-clock-cycle diagram shows the pipeline's operation in one cycle
  - The shaded parts are the elements used in that cycle

ID for Load, Store, …
- The results are written to the ID/EX register

EX for Load

MEM (memory access) for Load

WB for Load
- Writes the wrong register number

Corrected Datapath for Load

EX for Store

MEM for Store

WB for Store
- No operation. See p. 217-5: every instruction must pass through every stage of the pipeline, even if it does nothing in that stage.
Multi-Cycle Pipeline Diagram [p. 220]
- Shows clearly the resources used in each cycle
- The example program consists of 5 instructions

Multi-Cycle Pipeline Diagram [p. 221]
- The traditional form (a time-space diagram, with time on one axis and space on the other)

Single-Cycle Pipeline Diagram
- The state of the pipeline in a given cycle

Pipelined Control [p. 222]
- Control signals for the funct field [p. 222, Figure 4-6; see p. 80, Figure 2-19, third section, for the funct values when op = 000000]

Pipelined Control
- Control signals are derived from the instruction
  - As in the single-cycle implementation
§4.7 Data Hazards: Forwarding vs. Stalling

Data Hazards in ALU Instructions
- Consider this sequence:
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- The data hazards can be resolved by forwarding (bypassing)
  - How do we detect when to forward?
Dependencies & Forwarding
- Figure: the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, for example 1a. EX/MEM.RegisterRd (destination register) = ID/EX.RegisterRs (source register)

Detecting the Need to Forward
- Pass register numbers along the pipeline
  - e.g., ID/EX.RegisterRs = the register number for Rs held in the ID/EX pipeline register
- The ALU operand register numbers in the EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt
- Data hazards exist when
  - Forward from the EX/MEM pipeline register:
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
  - Forward from the MEM/WB pipeline register:
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

Detecting the Need to Forward
- But only if the forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not register 0
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Forwarding Paths
Forwarding Conditions
- EX hazard
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10
- MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01

Double Data Hazard
- Consider this sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
- Both hazards occur at once
  - We want to use the most recent result
- Revise the MEM hazard condition
  - Forward only if the EX hazard condition is not true

Revised Forwarding Condition
- MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
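The EX and revised MEM hazard conditions can be combined into one small forwarding-unit function. The Python sketch below models each pipeline register as a dictionary, which is purely an assumption for illustration, and returns "00" when no forwarding is needed (the slides only list the "10" and "01" cases, so treating "00" as "use the register file value" is also an assumption here).

```python
def forwarding_unit(id_ex, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) as 2-bit strings for the instruction in EX."""

    def select(source_reg):
        ex_hazard = (ex_mem["RegWrite"] and ex_mem["RegisterRd"] != 0
                     and ex_mem["RegisterRd"] == source_reg)
        mem_hazard = (mem_wb["RegWrite"] and mem_wb["RegisterRd"] != 0
                      and mem_wb["RegisterRd"] == source_reg)
        if ex_hazard:
            return "10"   # forward the most recent result, from EX/MEM
        if mem_hazard:
            return "01"   # forward from MEM/WB only when there is no EX hazard
        return "00"       # assumed: no forwarding, use the register file value

    return select(id_ex["RegisterRs"]), select(id_ex["RegisterRt"])


# Third instruction of: add $1,$1,$2 ; add $1,$1,$3 ; add $1,$1,$4
id_ex = {"RegisterRs": 1, "RegisterRt": 4}
ex_mem = {"RegWrite": True, "RegisterRd": 1}   # second add, most recent $1
mem_wb = {"RegWrite": True, "RegisterRd": 1}   # first add, stale $1
print(forwarding_unit(id_ex, ex_mem, mem_wb))
```

Checking the EX hazard first is what implements the "not (EX hazard)" term of the revised MEM condition.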
Datapath with Forwarding

Load-Use Data Hazard
- Need to stall for one cycle

Load-Use Hazard Detection
- Check when the using instruction is being decoded in the ID stage
- The ALU operand register numbers in the ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
- A load-use hazard exists when
    ID/EX.MemRead and
      ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
       (ID/EX.RegisterRt = IF/ID.RegisterRt))
- If detected, stall and insert a bubble
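The load-use test is a single Boolean expression over two pipeline registers. A minimal Python sketch, again modelling the pipeline registers as dictionaries (an assumption made only for this example, as is the name `load_use_hazard`):

```python
def load_use_hazard(id_ex, if_id):
    """True when the instruction in ID needs the result of a load that is in EX."""
    return bool(
        id_ex["MemRead"]
        and (id_ex["RegisterRt"] == if_id["RegisterRs"]
             or id_ex["RegisterRt"] == if_id["RegisterRt"])
    )


# lw $2, 20($1) is in EX while and $4, $2, $5 is being decoded in ID.
print(load_use_hazard({"MemRead": True, "RegisterRt": 2},
                      {"RegisterRs": 2, "RegisterRt": 5}))
```

ID/EX.MemRead is set only for loads, so an ALU instruction writing the same register does not trigger a stall; forwarding handles that case.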
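How to Stall the Pipeline
- Force the control values in the ID/EX register to 0
  - EX, MEM and WB then perform a no-operation
- Prevent the update of the PC and the IF/ID register
  - The instruction in ID is decoded again
  - The following instruction is fetched again
  - The 1-cycle stall allows MEM to read the data
    - which can then be forwarded to the EX stage

Stall/Bubble in the Pipeline
- Figure: the stall inserted in the pipeline; a second figure shows it more accurately

Datapath with Hazard Detection

Stalls and Performance
The BIG Picture
- Stalls reduce performance
  - But they are required to get correct results
- The compiler can rearrange code to avoid hazards and stalls
  - This requires knowledge of the pipeline structure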
§4.8 Control Hazards

Branch Hazards
- If the branch outcome is determined in MEM
  - The instructions fetched after the branch are flushed (their control values are set to 0) and the PC is redirected

Reducing Branch Delay
- Move the hardware that determines the branch outcome to the ID stage
  - Target address adder
  - Register comparator
- Example: branch taken
    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or  $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
        ...
    72: lw  $4, 50($7)

Example: Branch Taken
Data Hazards for Branches
- If a comparison register of the branch is the destination of the 2nd or 3rd preceding ALU instruction
    add $1, $2, $3    IF  ID  EX  MEM  WB
    add $4, $5, $6        IF  ID  EX   MEM  WB
    …                         IF  ID   EX   MEM  WB
    beq $1, $4, …                 IF   ID   EX   MEM  WB
  - The hazard can be resolved using forwarding

Data Hazards for Branches
- If a comparison register is the result of the immediately preceding ALU instruction, or of the 2nd preceding load instruction
  - One stall cycle is needed
    lw  $1, addr      IF  ID  EX  MEM  WB
    add $4, $5, $6        IF  ID  EX   MEM  WB
    beq $1, $4, …             IF  ID   ID   EX   MEM  WB    (beq stalled)

Data Hazards for Branches
- If a comparison register is the result of the immediately preceding load instruction
  - Two stall cycles are needed
    lw  $1, addr      IF  ID  EX  MEM  WB
    beq $1, $0, …         IF  ID  ID   ID   EX   MEM  WB    (beq stalled twice)
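Dynamic Branch Prediction
- In deeper and superscalar pipelines, the branch penalty is more significant
- Use dynamic prediction
  - Branch prediction buffer (also called a branch history table)
  - A storage area indexed by branch instruction addresses
  - Holds a flag recording whether the branch was taken
- To execute a branch
  - Check the table and expect the same outcome
  - Start fetching from the fall-through or the target address
  - If the guess was wrong, flush the wrongly fetched instructions, re-fetch from the correct location, and flip the prediction bit

1-Bit Predictor: Shortcoming
- Inner-loop branches are mispredicted twice!
    outer: …
           …
    inner: …
           …
           beq …, …, inner
           …
           beq …, …, outer
  - Mispredicted once when the inner loop exits
  - Mispredicted again on the first iteration of the inner loop the next time around, because the last execution set the bit to not taken

2-Bit Predictor
- Only change the prediction after two successive mispredictions

Calculating the Branch Target
- Even with a prediction bit, the branch target address still has to be calculated
  - A taken branch costs one clock cycle
- Branch target buffer
  - A cache of target addresses
  - Indexed by the PC when the instruction is fetched
    - If the branch is predicted taken, the target instruction can be fetched immediately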
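The difference between the two schemes shows up clearly on a nested loop. The Python sketch below counts mispredictions for an inner-loop branch that is taken three times and then falls through, repeated for three passes of the outer loop. The 2-bit predictor is modelled as a saturating counter, which is one common realization and an assumption here; the class and function names are illustrative.

```python
class OneBitPredictor:
    """Predicts whatever the branch did last time."""

    def __init__(self, taken=True):
        self.taken = taken

    def predict(self):
        return self.taken

    def update(self, taken):
        self.taken = taken


class TwoBitPredictor:
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, counter=3):
        self.counter = counter

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Inner-loop branch: taken 3 times, then not taken; outer loop runs 3 passes.
outcomes = [True, True, True, False] * 3
print(count_mispredictions(OneBitPredictor(), outcomes))
print(count_mispredictions(TwoBitPredictor(), outcomes))
```

Starting from a "taken" state, the 1-bit predictor misses on every loop exit and again on every re-entry after the first pass, while the 2-bit counter misses only on the exits.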
§4.9 Exceptions

Exceptions and Interrupts
- "Unexpected" events requiring a change in the flow of control
  - Different ISAs use the terms differently
- Exception
  - Arises within the CPU
    - e.g., undefined opcode, overflow, syscall, …
- Interrupt
  - From an external I/O controller
- Dealing with them without sacrificing performance is hard
Handling Exceptions
- In MIPS, exceptions are managed by a System Control Coprocessor (CP0)
- Save the PC of the offending (or interrupted) instruction
  - In MIPS: Exception Program Counter (EPC)
- Save an indication of the problem
  - In MIPS: Cause register
  - We'll assume 1 bit
    - 0 for undefined opcode, 1 for overflow
- Jump to the handler at 8000 0180

An Alternate Mechanism
- Vectored interrupts
  - The handler address is determined by the cause
- Example:
    Undefined opcode:  C000 0000
    Overflow:          C000 0020
    …:                 C000 0040
- The instructions there either
  - Deal with the interrupt, or
  - Jump to the real handler
Handler Actions
- Read the cause, and transfer to the relevant handler
- Determine the action required
- If restartable
  - Take corrective action
  - Use the EPC to return to the program
- Otherwise
  - Terminate the program
  - Report the error using EPC, cause, …

Exceptions in a Pipeline
- Another form of control hazard
- Consider overflow on add in the EX stage
    add $1, $2, $1
  - Prevent $1 from being clobbered
  - Complete previous instructions
  - Flush add and subsequent instructions
  - Set the Cause and EPC register values
  - Transfer control to the handler
- Similar to a mispredicted branch
  - Use much of the same hardware

Pipeline with Exceptions
Exception Properties
- Restartable exceptions
  - The pipeline can flush the instruction
  - The handler executes, then returns to the instruction
    - Refetched and executed from scratch
- PC saved in the EPC register
  - Identifies the causing instruction
  - Actually PC + 4 is saved
    - The handler must adjust

Exception Example
- Exception on add in
    40  sub $11, $2, $4
    44  and $12, $2, $5
    48  or  $13, $2, $6
    4C  add $1, $2, $1
    50  slt $15, $6, $7
    54  lw  $16, 50($7)
    …
- Handler
    80000180  sw $25, 1000($0)
    80000184  sw $26, 1004($0)
    …
Multiple Exceptions
- Pipelining overlaps multiple instructions
  - Could have multiple exceptions at once
- Simple approach: deal with the exception from the earliest instruction
  - Flush subsequent instructions
  - "Precise" exceptions
- In complex pipelines
  - Multiple instructions issued per cycle
  - Out-of-order completion
  - Maintaining precise exceptions is difficult!

Imprecise Exceptions
- Just stop the pipeline and save the state
  - Including the exception cause(s)
- Let the handler work out
  - Which instruction(s) had exceptions
  - Which to complete or flush
    - May require "manual" completion
- Simplifies hardware, but needs more complex handler software
- Not feasible for complex multiple-issue out-of-order pipelines

§4.10 Parallelism and Advanced Instruction Level Parallelism

Instruction-Level Parallelism (ILP)
- Pipelining: executing multiple instructions in parallel
- To increase ILP
  - Deeper pipeline
    - Less work per stage ⇒ shorter clock cycle
  - Multiple issue
    - Replicate pipeline stages ⇒ multiple pipelines
    - Start multiple instructions per clock cycle
    - CPI < 1, so use Instructions Per Cycle (IPC)
    - E.g., 4GHz 4-way multiple-issue
      - 16 BIPS, peak CPI = 0.25, peak IPC = 4
    - But dependencies reduce this in practice
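The peak figures for the 4 GHz, 4-way example are simple products and reciprocals. A short Python check, with function names invented for this sketch:

```python
def peak_instructions_per_second(clock_hz, issue_width):
    """Upper bound: every issue slot is filled in every cycle."""
    return clock_hz * issue_width


def peak_cpi(issue_width):
    """Best-case cycles per instruction is the reciprocal of the peak IPC."""
    return 1 / issue_width


rate = peak_instructions_per_second(4_000_000_000, 4)
print(rate / 1e9, "BIPS;", "peak CPI =", peak_cpi(4), "; peak IPC =", 4)
```

These are upper bounds only; hazards and dependencies leave issue slots empty, so the sustained rate is lower.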
Multiple Issue
- Static multiple issue
  - The compiler groups instructions to be issued together
  - Packages them into "issue slots"
  - The compiler detects and avoids hazards
- Dynamic multiple issue
  - The CPU examines the instruction stream and chooses instructions to issue each cycle
  - The compiler can help by reordering instructions
  - The CPU resolves hazards using advanced techniques at runtime

Speculation
- "Guess" what to do with an instruction
  - Start the operation as soon as possible
  - Check whether the guess was right
    - If so, complete the operation
    - If not, roll back and do the right thing
- Common to static and dynamic multiple issue
- Examples
  - Speculate on branch outcome
    - Roll back if the path taken is different
  - Speculate on load
    - Roll back if the location is updated

Compiler/Hardware Speculation
- The compiler can reorder instructions
  - e.g., move a load before a branch
  - Can include "fix-up" instructions to recover from an incorrect guess
- Hardware can look ahead for instructions to execute
  - Buffer results until it determines they are actually needed
  - Flush buffers on incorrect speculation

Speculation and Exceptions
- What if an exception occurs on a speculatively executed instruction?
  - e.g., a speculative load before a null-pointer check
- Static speculation
  - Can add ISA support for deferring exceptions
- Dynamic speculation
  - Can buffer exceptions until instruction completion (which may not occur)
Static Multiple Issue
- The compiler groups instructions into "issue packets"
  - A group of instructions that can be issued on a single cycle
  - Determined by the pipeline resources required
- Think of an issue packet as a very long instruction
  - Specifies multiple concurrent operations
  - ⇒ Very Long Instruction Word (VLIW)

Scheduling Static Multiple Issue
- The compiler must remove some/all hazards
  - Reorder instructions into issue packets
  - No dependencies within a packet
  - Possibly some dependencies between packets
    - Varies between ISAs; the compiler must know!
  - Pad with nop if necessary

MIPS with Static Dual Issue
- Two-issue packets
  - One ALU/branch instruction
  - One load/store instruction
  - 64-bit aligned
    - ALU/branch, then load/store
    - Pad an unused instruction with nop

  Address   Instruction type   Pipeline stages
  n         ALU/branch         IF  ID  EX  MEM  WB
  n + 4     Load/store         IF  ID  EX  MEM  WB
  n + 8     ALU/branch             IF  ID  EX   MEM  WB
  n + 12    Load/store             IF  ID  EX   MEM  WB
  n + 16    ALU/branch                 IF  ID   EX   MEM  WB
  n + 20    Load/store                 IF  ID   EX   MEM  WB
Hazards in the Dual-Issue MIPS
- More instructions executing in parallel
- EX data hazard
  - Forwarding avoided stalls with single-issue
  - Now can't use an ALU result in a load/store in the same packet
      add  $t0, $s0, $s1
      load $s2, 0($t0)
  - Split into two packets, effectively a stall
- Load-use hazard
  - Still one cycle use latency, but now two instructions
- More aggressive scheduling required

Scheduling Example
- Schedule this for dual-issue MIPS
    Loop: lw   $t0, 0($s1)       # $t0 = array element
          addu $t0, $t0, $s2     # add scalar in $s2
          sw   $t0, 0($s1)       # store result
          addi $s1, $s1, -4      # decrement pointer
          bne  $s1, $zero, Loop  # branch $s1 != 0

           ALU/branch               Load/store          cycle
    Loop:  nop                      lw   $t0, 0($s1)    1
           addi $s1, $s1, -4        nop                 2
           addu $t0, $t0, $s2       nop                 3
           bne  $s1, $zero, Loop    sw   $t0, 4($s1)    4

- IPC = 5/4 = 1.25 (c.f. peak IPC = 2)

Loop Unrolling
- Replicate the loop body to expose more parallelism
  - Reduces loop-control overhead
- Use different registers per replication
  - Called "register renaming"
  - Avoid loop-carried "anti-dependencies"
    - Store followed by a load of the same register
    - Aka "name dependence"
      - Reuse of a register name
Loop Unrolling Example

           ALU/branch               Load/store          cycle
    Loop:  addi $s1, $s1, -16       lw   $t0, 0($s1)    1
           nop                      lw   $t1, 12($s1)   2
           addu $t0, $t0, $s2       lw   $t2, 8($s1)    3
           addu $t1, $t1, $s2       lw   $t3, 4($s1)    4
           addu $t2, $t2, $s2       sw   $t0, 16($s1)   5
           addu $t3, $t3, $s2       sw   $t1, 12($s1)   6
           nop                      sw   $t2, 8($s1)    7
           bne  $s1, $zero, Loop    sw   $t3, 4($s1)    8

- IPC = 14/8 = 1.75
  - Closer to 2, but at the cost of registers and code size
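The IPC of a dual-issue schedule is the number of useful (non-nop) instructions divided by the number of issue cycles. The Python sketch below encodes the scheduled loop and its four-way unrolled version as lists of (ALU/branch, load/store) packets, keeping only the opcode names; the encoding and the function name `ipc` are illustrative assumptions.

```python
def ipc(schedule):
    """Useful instructions per cycle for a list of (alu_slot, mem_slot) packets."""
    issued = sum(1 for packet in schedule for slot in packet if slot != "nop")
    return issued / len(schedule)


# One iteration of the scheduled loop: 5 instructions in 4 cycles.
scheduled_loop = [
    ("nop", "lw"),
    ("addi", "nop"),
    ("addu", "nop"),
    ("bne", "sw"),
]

# Four iterations unrolled: 14 instructions in 8 cycles.
unrolled_loop = [
    ("addi", "lw"),
    ("nop", "lw"),
    ("addu", "lw"),
    ("addu", "lw"),
    ("addu", "sw"),
    ("addu", "sw"),
    ("nop", "sw"),
    ("bne", "sw"),
]

print(ipc(scheduled_loop), ipc(unrolled_loop))
```

Unrolling fills more slots because the four independent lw/addu/sw chains can be interleaved, and the addi and bne overhead is paid once per four elements.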
Dynamic Multiple Issue
- "Superscalar" processors
- The CPU decides whether to issue 0, 1, 2, … instructions each cycle
  - Avoiding structural and data hazards
- Avoids the need for compiler scheduling
  - Though it may still help
  - Code semantics are ensured by the CPU

Dynamic Pipeline Scheduling
- Allow the CPU to execute instructions out of order to avoid stalls
  - But commit results to registers in order
- Example
    lw   $t0, 20($s2)
    addu $t1, $t0, $t2
    sub  $s4, $s4, $t3
    slti $t5, $s4, 20
  - Can start sub while addu is waiting for lw

Dynamically Scheduled CPU
- Figure annotations:
  - Preserves dependencies
  - Hold pending operands
  - Results are also sent to any waiting reservation stations
  - Reorder buffer for register writes
  - Can supply operands for issued instructions
Register Renaming
- Reservation stations and the reorder buffer effectively provide register renaming
- On instruction issue to a reservation station
  - If an operand is available in the register file or reorder buffer
    - Copied to the reservation station
    - No longer required in the register; can be overwritten
  - If an operand is not yet available
    - It will be provided to the reservation station by a function unit
    - Register update may not be required

Speculation
- Predict branch and continue issuing
  - Don't commit until the branch outcome is determined
- Load speculation
  - Avoid load and cache miss delay
    - Predict the effective address
    - Predict the loaded value
    - Load before completing outstanding stores
    - Bypass stored values to the load unit
  - Don't commit the load until the speculation is cleared
Why Do Dynamic Scheduling?
- Why not just let the compiler schedule code?
- Not all stalls are predictable
  - e.g., cache misses
- Can't always schedule around branches
  - Branch outcome is …