English-language courseware, final version: Chapter 4, The Processor

Uploader: 我*** (IP location: Beijing). Uploaded: 2022-11-25. Format: PPTX. Pages: 134.


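Chapter 4: The Processor

§4.1 Introduction

Overview
  Three key factors determine CPU performance
    Instruction count: determined by the instruction set architecture (ISA) and the compiler
    Cycles per instruction (CPI) and clock cycle time: determined by the CPU hardware
  We examine two MIPS implementations
    A simplified version
    A more realistic pipelined version
  A basic MIPS implementation covers
    Memory-reference instructions: lw, sw
    Arithmetic/logical instructions: add, sub, and, or, slt (set on less than)
    Branch and jump instructions: beq, j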
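The three factors on the overview slide multiply into total CPU time. The sketch below only illustrates that product; the function name and the sample numbers are assumptions, not values from the slides.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: one million instructions, CPI of 1, 800 ps clock cycle.
example = cpu_time_seconds(1_000_000, 1.0, 800e-12)
```

Halving any one factor halves the total, which is why the chapter treats CPI and cycle time (the hardware's share) separately from instruction count.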

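Instruction Execution
  PC → instruction memory: fetch the instruction at the address held in the program counter
  Register numbers → register file: read the registers
    lw reads one register; most other instructions read two
  The remaining steps depend on the instruction class
    Use the ALU to calculate
      An arithmetic result
      A memory address for load/store
      A branch target address
    Access data memory for load/store
    PC ← target address or PC + 4

CPU Overview
  [Figure: CPU overview]

Multiplexers
  [Figure: CPU overview with multiplexers]
  Wires cannot simply be joined together here; use multiplexers

Control
  [Figure: CPU overview with control]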

§4.2 Logic Design Conventions

Logic Design Basics
  Information is encoded in binary
    Low voltage = 0, high voltage = 1
    One wire per bit
    Multi-bit data is encoded on multi-wire buses
  Combinational elements
    Operate on data
    Output is a function of input
  State (sequential) elements
    Store information

Combinational Elements
  AND gate: Y = A & B
  Multiplexer: Y = S ? I1 : I0
  Adder: Y = A + B
  Arithmetic/logic unit (ALU): Y = F(A, B)

Sequential Elements
  Register: stores data in a circuit
    Uses a clock signal to determine when to update the stored value
    Edge-triggered: updates when Clk changes from 0 to 1
  [Figure: register with inputs D and Clk and output Q, with timing diagram]

Sequential Elements
  Register with write control
    Only updates on a clock edge when the write control input is 1
    Used when a stored value is needed later
  [Figure: register with inputs D, Write and Clk and output Q, with timing diagram]

Clocking Methodology
  Combinational logic transforms data during clock cycles
    Between clock edges
    Input from state elements, output to state elements
    The longest delay determines the clock period
  [Figure: relationship between combinational logic, state elements and the clock cycle]

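  Edge triggering allows a state element to be read and written in the same clock cycle

§4.3 Building a Datapath

Building a Datapath
  Datapath: the elements that process data and addresses in the CPU
    Registers, ALUs, multiplexers, memories, ...
  We build a MIPS datapath incrementally
    Refining the overview design step by step

Instruction Fetch
  [Figure: the PC is a 32-bit register; the next instruction address is PC + 4]

R-Format Instructions
  Read two register operands
  Perform an arithmetic/logical operation
  Write the result to a register
  R-type format: op (0) | rs | rt | rd | shamt | funct

Load/Store Instructions
  Read register operands
  Calculate the address using the 16-bit offset
    Use the ALU, but sign-extend the offset
  Load: read memory and update the register
  Store: write the register value to memory
  [Figure: register file, ALU, data memory and sign-extension unit]
  Load/store format: op (35 or 43) | rs | rt | address

Branch Instructions
  Read register operands
  Compare the operands
    Use the ALU to subtract and check its Zero output
  Calculate the target address
    Sign-extend the displacement
    Shift left 2 places (the displacement is in words) [see p. 45]
    Add to PC + 4 (already calculated by instruction fetch)
  Branch format: op (4, beq) | rs | rt | address

Branch Instructions
  [Figure: branch datapath. The shift-left-2 unit just re-routes wires; the sign-extension unit replicates the sign-bit wire]
  Jump format: op (2, jump) | address

Composing the Elements
  The first-cut datapath executes an instruction in one clock cycle
    Each datapath element can only do one function at a time
    Hence, instruction and data memories must be separate
  Use multiplexers where different instructions need different data sources

R-Type/Load/Store Datapath
  [Figure: p. 191, Figure 4-10]

Full Datapath
  [Figure: p. 192]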

§4.4 A Simple Implementation Scheme

ALU Control [p. 192, §4.4.1 ALU control]
  The ALU is used for
    Load/store: F = add
    Branch: F = subtract
    R-type: F depends on the funct field

  ALU control   Function
  0000          AND
  0001          OR
  0010          add
  0110          subtract
  0111          set-on-less-than
  1100          NOR

ALU Control
  Assume a 2-bit ALUOp derived from the opcode
    Combinational logic derives the ALU control

  opcode   ALUOp   Operation          funct    ALU function       ALU control
  lw       00      load word          XXXXXX   add                0010
  sw       00      store word         XXXXXX   add                0010
  beq      01      branch equal       XXXXXX   subtract           0110
  R-type   10      add                100000   add                0010
                   subtract           100010   subtract           0110
                   AND                100100   AND                0000
                   OR                 100101   OR                 0001
                   set-on-less-than   101010   set-on-less-than   0111

The Main Control Unit
  Control signals are derived from the instruction

  R-type       0 [31:26]          | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
  Load/Store   35 or 43 [31:26]   | rs [25:21] | rt [20:16] | address [15:0]
  Branch       4 [31:26]          | rs [25:21] | rt [20:16] | address [15:0]

  opcode: always in bits 31:26
  rs: always read
  rt: read, except for load
  rd (R-type) or rt (load): written
  address: sign-extended and added

Datapath with Control [p. 196]
  [Figure]
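The ALU control table can be read as a small combinational function of ALUOp and funct. The following is a minimal sketch of that table; the names `alu_control` and `ALU_CONTROL_BY_FUNCT` are hypothetical, and only the encodings listed on the slide are covered.

```python
# funct field -> 4-bit ALU control, used only when ALUOp = 10 (R-type).
ALU_CONTROL_BY_FUNCT = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add to form the address (funct ignored)
        return 0b0010
    if alu_op == 0b01:      # beq: subtract to compare (funct ignored)
        return 0b0110
    if alu_op == 0b10:      # R-type: the funct field selects the operation
        return ALU_CONTROL_BY_FUNCT[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")
```

For loads, stores and branches the funct bits are "don't care" (XXXXXX in the table), which is why the first two branches ignore `funct`.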

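  [Figure annotations: 1. elements, 2. datapath, 3. control signals]

R-Type Instruction [p. 198, Figure 4-19]
  R-type instruction datapath for add $t1, $t2, $t3
    1) Fetch the instruction from instruction memory and increment the PC.
    2) Read registers $t2 and $t3 from the register file; the main control unit computes the setting of each control signal.
    3) The ALU operates on the data read from the register file, using the funct field (bits 5:0) to select the ALU function.
    4) Write the ALU result to the register file; bits 15:11 of the instruction select the destination register ($t1).

Load Instruction [p. 198, Figure 4-20]
  Load instruction datapath for lw $t1, offset($t2)
    1) Fetch the instruction from instruction memory and increment the PC.
    2) Read the value of register $t2 from the register file.
    3) The ALU adds the value read from the register file to the sign-extended low 16 bits of the instruction (offset).
    4) The ALU result is used as the address for the data memory.
    5) The data from the memory unit is written to the register file; bits 20:16 of the instruction give the destination register ($t1).

Branch Instruction [p. 198, Figure 4-21]
  Branch instruction datapath for beq $t1, $t2, offset
    1) Fetch the instruction from instruction memory and increment the PC.
    2) Read the values of registers $t1 and $t2 from the register file.
    3) The ALU subtracts the two values read from the register file. PC + 4 is added to the low 16 bits of the instruction (offset), sign-extended and shifted left by 2.
    4) The Zero output of the ALU decides which adder result is stored in the PC.

Implementing Jumps
  Jump uses a word address
  Update the PC with the concatenation of
    The top 4 bits of the old PC
    The 26-bit jump address
    00
  Needs an extra control signal decoded from the opcode
  Jump format: 2 [31:26] | address [25:0]
  Jump address: PC[31:28] : address[25:0] (bits 27:2) : 00

Datapath with Jumps Added [p. 201, Figure 4-24]
  [Figure]

Performance Issues
  The longest delay determines the clock period
    Critical path: the load instruction
    Instruction memory → register file → ALU → data memory → register file
  It is not feasible to vary the period for different instructions
  The single-cycle design violates the design principle of making the common case fast
  We will improve performance by pipelining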
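The branch and jump address calculations described above come down to a few bit operations. This is a sketch under stated assumptions: the helper names are hypothetical, addresses are treated as 32-bit unsigned values, and the jump uses the top 4 bits of PC + 4 (the incremented PC), which is how the MIPS datapath figure is usually drawn.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """PC + 4 plus the sign-extended word offset shifted left by 2."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def jump_target(pc: int, address26: int) -> int:
    """Top 4 bits of PC + 4, then the 26-bit address, then two zero bits."""
    return ((pc + 4) & 0xF0000000) | ((address26 & 0x03FFFFFF) << 2)
```

For instance, a beq at address 40 with offset 7 gives 40 + 4 + 7 × 4 = 72, and an offset of 0xFFFF (that is, -1) branches back to the beq itself.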

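§4.5 An Overview of Pipelining

Pipelining Analogy
  Pipelined laundry: overlapping execution
    Parallelism improves performance
  Four loads: speedup = 8/3.5 = 2.3
  Non-stop: speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
  The four steps: 1. put the clothes in the washer; 2. move the washed clothes to the dryer; 3. fold the dried clothes; 4. put the clothes away. Each step takes half an hour, so four people doing laundry one after another take 0.5 × 4 × 4 = 8 hours.

MIPS Pipeline
  Five stages, one step per stage
    IF: instruction fetch from memory
    ID: instruction decode and register read
    EX: execute the operation or calculate an address
    MEM: access a memory operand
    WB: write the result back to a register

Pipeline Performance
  Assume the time for each stage is
    100 ps for register read or write
    200 ps for the other stages
  Compare the pipelined datapath with the single-cycle datapath

  Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
  lw         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
  sw         200 ps        100 ps          200 ps   200 ps                           700 ps
  R-format   200 ps        100 ps          200 ps                   100 ps           600 ps
  beq        200 ps        100 ps          200 ps                                    500 ps

Pipeline Performance
  [Figure: single-cycle (Tc = 800 ps) versus pipelined (Tc = 200 ps)]

Pipeline Speedup
  If all stages are balanced (i.e., all take the same time)
    Time between instructions (pipelined) = time between instructions (nonpipelined) / number of stages
  If the stages are not balanced, the speedup is less
  Each stage has its own dedicated unit
  The speedup comes from increased throughput
    The execution time of each instruction does not decrease

Quantitative Measures of a Pipeline
  Main performance measures of a scalar pipeline
    1. Throughput rate
       Definition 1: the number of instructions the pipeline executes per unit time
       Definition 2: the number of tasks or results the pipeline can deliver per unit time
    2. Speedup
       Definition: the ratio by which pipelined execution is faster than nonpipelined (sequential) execution
    3. Efficiency
       Definition: the ratio of the time the pipeline's units are actually in use to the total running time; also called the utilization of the pipeline units

Pipelining and ISA Design
  The MIPS ISA is designed for pipelining
    All instructions are 32 bits
      Easier to fetch and decode in one cycle
      c.f. x86: 1- to 17-byte instructions
    Few and regular instruction formats
      Can decode and read registers in one step
    Load/store addressing
      Can calculate the address in the 3rd stage and access memory in the 4th stage
    Alignment of memory operands
      Memory access takes only one cycle
  ISA: instruction set architecture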
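The clock periods quoted for the two designs (800 ps and 200 ps) follow directly from the stage times in the table. The sketch below recomputes them; the dictionary and function names are assumptions, and the pipelined total assumes no stalls.

```python
# Stage latencies from the table, in picoseconds.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually uses.
STAGES_USED = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def instruction_time_ps(instr: str) -> int:
    """Total latency of one instruction: the sum of the stages it uses."""
    return sum(STAGE_PS[stage] for stage in STAGES_USED[instr])


def single_cycle_clock_ps() -> int:
    """The single-cycle clock must fit the slowest instruction."""
    return max(instruction_time_ps(instr) for instr in STAGES_USED)


def pipelined_clock_ps() -> int:
    """The pipelined clock must fit the slowest stage."""
    return max(STAGE_PS.values())


def single_cycle_total_ps(n: int) -> int:
    return n * single_cycle_clock_ps()


def pipelined_total_ps(n: int) -> int:
    """n instructions need n + (stages - 1) cycles when nothing stalls."""
    return (n + len(STAGE_PS) - 1) * pipelined_clock_ps()
```

Because the stages are unbalanced, the speedup between instructions is 800/200 = 4 rather than 5, and for short runs the pipeline fill time reduces it further: three loads take 2400 ps single-cycle versus 1400 ps pipelined.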

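Hazards
  Situations that prevent the next instruction from starting in the next cycle are called hazards. There are three kinds:
    1. Structure hazards: a required resource is busy (resource conflict)
    2. Data hazards: an instruction must wait for a previous instruction to complete its data read/write
    3. Control hazards: the control action depends on a previous instruction

1. Structure Hazards
  Conflict for use of a resource
  In a MIPS pipeline with a single memory
    Load/store requires data access
    Instruction fetch would have to stall until that data access cycle ends
      This would cause a pipeline bubble
  Hence, pipelined datapaths require separate instruction and data memories
    Or separate instruction/data caches

2. Data Hazards
  An instruction depends on the completion of a data access by a previous instruction
    add $s0, $t0, $t1
    sub $t2, $s0, $t3

2-1 Forwarding (Bypassing)
  Use the result as soon as it is computed
    Do not wait for it to be stored in a register
    Requires extra connections in the datapath (an added feedback path)

2-2.1 Load-Use Data Hazard
  Forwarding cannot avoid every stall
    The value may not have been computed when it is needed
    Forwarding cannot go backward in time

Code Scheduling to Avoid Stalls
  Reorder code to avoid using a load result in the next instruction
  C code for [...] F;

  Original order (13 cycles)
          lw   $t1, 0($t0)
          lw   $t2, 4($t0)
    stall
          add  $t3, $t1, $t2
          sw   $t3, 12($t0)
          lw   $t4, 8($t0)
    stall
          add  $t5, $t1, $t4
          sw   $t5, 16($t0)

  Reordered (11 cycles)
          lw   $t1, 0($t0)
          lw   $t2, 4($t0)
          lw   $t4, 8($t0)
          add  $t3, $t1, $t2
          sw   $t3, 12($t0)
          add  $t5, $t1, $t4
          sw   $t5, 16($t0)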
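The 13-cycle and 11-cycle totals can be checked by counting one stall for every instruction that uses the result of the load immediately before it. This is a sketch under simplifying assumptions: a 5-stage pipeline, full forwarding (so only load-use pairs stall), and a hypothetical tuple encoding of each instruction as (op, destination, sources).

```python
PIPELINE_STAGES = 5


def total_cycles(program):
    """Cycles to run the program: fill time + one per instruction + load-use stalls."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        # A load's data is only available after MEM, one cycle too late
        # for an instruction that needs it in EX in the very next cycle.
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return PIPELINE_STAGES + (len(program) - 1) + stalls


ORIGINAL = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]

REORDERED = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]
```

Moving the third load up separates each load from its first use, so both stalls disappear without changing the result.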

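  [Pipeline timing diagram, original order, cycles 1 to 13: lw, lw, add, sw, lw, add, sw, each passing through IF, ID, EX, MEM, WB. Forwarding resolves the dependences, but each load-use pair is still delayed by one cycle. Total: 13 cycles]

  [Pipeline timing diagram, reordered, cycles 1 to 13: lw, lw, lw, add, sw, add, sw. Changing the instruction order removes the load-use delays]
  Note: write first, then read (no; see p. 226)

3. Control Hazards
  A branch determines the flow of control
    Fetching the next instruction depends on the branch outcome
    The pipeline cannot always fetch the correct instruction
      It is still working on the ID stage of the branch
  In the MIPS pipeline
    Need to compare the registers and compute the target address early
    Add hardware to do this in the ID stage

Stall on Branch
  Wait until the branch outcome is determined before fetching the next instruction
  [Figure]

Branch Prediction
  A way to reduce branch stalls
  Longer pipelines cannot determine the branch outcome early
    The stall penalty becomes unacceptable
  Predict the outcome of the branch
    Only stall if the prediction is wrong
  In the MIPS pipeline
    Can predict branches as not taken
    Fetch the instruction after the branch, with no delay

MIPS with Predict Not Taken
  [Figure: prediction correct; prediction incorrect]

More-Realistic Branch Prediction
  Static branch prediction
    Based on typical branch behavior
    Example: loop and if-statement branches
      Predict backward branches taken
      Predict forward branches not taken
  Dynamic branch prediction
    Hardware measures actual branch behavior
      e.g., keep a history record for each branch
    Assume future behavior will continue the current trend
      When wrong, stall while re-fetching, and update the history

Pipeline Summary
  The BIG Picture
  Pipelining improves performance by increasing instruction throughput
    Executes multiple instructions in parallel
    Each instruction has the same latency
  Subject to hazards
    Structure, data, control
  Instruction set design affects the complexity of the pipeline implementation
  "There is less to this than meets the eye." (It looks like a lot, but it is not.)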

§4.6 Pipelined Datapath and Control

MIPS Pipelined Datapath [p. 212]
  [Figure: five stages. IF: instruction fetch; ID: instruction decode and register read; EX: execute or address calculation; MEM: memory access; WB: write back]
  Right-to-left data flow (WB, MEM) leads to hazards

Pipeline Registers
  Registers are needed between stages
    To hold the information produced in the previous cycle

Pipeline Operation
  Instructions flow through the pipelined datapath cycle by cycle
    "Single-clock-cycle" pipeline diagram
      A slice of the pipeline taken at one clock cycle
    c.f. "multi-clock-cycle" diagram
  We look at single-clock-cycle diagrams for load and store

IF for Load, Store, …
  [Figure: single-clock-cycle diagram; the shaded parts are the units used in this cycle]
  The IF/ID register is 64 bits wide; it combines IR and PC

ID for Load, Store, …
  [Figure: ID/EX register]

EX for Load
  [Figure]

MEM (memory access) for Load
  [Figure]

WB for Load
  [Figure annotation: wrong register number]

Corrected Datapath for Load
  [Figure]

EX for Store
  [Figure]

MEM for Store
  [Figure]

WB for Store
  [No operation] See p. 217, item 5: every instruction must pass through every step of the pipeline, even if it does nothing in that step.

Multi-Cycle Pipeline Diagram [p. 220]
  The resources used in each cycle are shown clearly
  This program segment consists of 5 instructions

Multi-Cycle Pipeline Diagram [p. 221]
  Traditional form [time-space diagram: time on one axis, space on the other]

Single-Cycle Pipeline Diagram
  State of the pipeline in a given cycle

Pipelined Control [p. 222]
  Control signals for the funct field [p. 222, Figure 4-6; see p. 80, Figure 2-19, third segment, function codes when op = 000000]

Pipelined Control
  Control signals are derived from the instruction
    As in the single-cycle implementation

Pipelined Control
  [Figure]

§4.7 Data Hazards: Forwarding vs. Stalling

Data Hazards in ALU Instructions
  Consider this sequence:
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
  We can resolve the data hazards with forwarding (bypassing)
    How do we detect when to forward?

Dependencies & Forwarding
  [Figure: e.g., 1a. EX/MEM.RegisterRd (destination register) = ID/EX.RegisterRs (source register); pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Detecting the Need to Forward
  Pass register numbers along the pipeline
    e.g., ID/EX.RegisterRs = the register number for Rs held in the ID/EX pipeline register
  The ALU operand register numbers in the EX stage are given by
    ID/EX.RegisterRs, ID/EX.RegisterRt
  Data hazards exist when
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
        (1a, 1b: forward from the EX/MEM pipeline register)
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
        (2a, 2b: forward from the MEM/WB pipeline register)

Detecting the Need to Forward
  But only if the forwarding instruction will write to a register!
    EX/MEM.RegWrite, MEM/WB.RegWrite
  And only if Rd for that instruction is not register 0
    EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Forwarding Paths
  [Figure]

Forwarding Conditions
  EX hazard
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10
  MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01

Double Data Hazard
  Consider this sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
  Both hazards occur at once
    We want to use the most recent result
  Revise the MEM hazard condition
    Only forward if the EX hazard condition is not true

Revised Forwarding Condition
  MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
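The EX and revised MEM conditions can be folded into one selection function per ALU operand. The sketch below models the pipeline-register fields as a plain data class; the class and field names are hypothetical stand-ins for EX/MEM.RegWrite, EX/MEM.RegisterRd and so on.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    ex_mem_reg_write: bool   # EX/MEM.RegWrite
    ex_mem_rd: int           # EX/MEM.RegisterRd
    mem_wb_reg_write: bool   # MEM/WB.RegWrite
    mem_wb_rd: int           # MEM/WB.RegisterRd
    id_ex_rs: int            # ID/EX.RegisterRs
    id_ex_rt: int            # ID/EX.RegisterRt


def forward_select(regs: PipelineRegs, source: int) -> int:
    """Return 0b10 (from EX/MEM), 0b01 (from MEM/WB) or 0b00 (register file)."""
    ex_hazard = (regs.ex_mem_reg_write and regs.ex_mem_rd != 0
                 and regs.ex_mem_rd == source)
    if ex_hazard:
        return 0b10
    # Checked only when there is no EX hazard, so the most recent result wins.
    mem_hazard = (regs.mem_wb_reg_write and regs.mem_wb_rd != 0
                  and regs.mem_wb_rd == source)
    return 0b01 if mem_hazard else 0b00


def forwarding_unit(regs: PipelineRegs) -> tuple:
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    return forward_select(regs, regs.id_ex_rs), forward_select(regs, regs.id_ex_rt)
```

Testing the EX hazard first is what implements the "not (EX hazard)" term of the revised MEM condition: in the double-hazard sequence, the third add takes $1 from EX/MEM rather than from the older value in MEM/WB.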

Datapath with Forwarding
  [Figure]

Load-Use Data Hazard
  [Figure: need to stall for one cycle]

Load-Use Hazard Detection
  Check when the using instruction is decoded in the ID stage
  The ALU operand register numbers in the ID stage are given by
    IF/ID.RegisterRs, IF/ID.RegisterRt
  A load-use hazard exists when
    ID/EX.MemRead and
      ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
       (ID/EX.RegisterRt = IF/ID.RegisterRt))
  If detected, stall and insert a bubble

How to Stall the Pipeline
  Force the control values in the ID/EX register to 0
    EX, MEM and WB then do a nop (no-operation)
  Prevent the update of the PC and the IF/ID register
    The using instruction is decoded again
    The following instruction is fetched again
    The 1-cycle stall allows MEM to read the data for lw
      It can subsequently be forwarded to the EX stage

Stall/Bubble in the Pipeline
  [Figure: stall inserted here]

Stall/Bubble in the Pipeline
  [Figure: or, more accurately…]

Datapath with Hazard Detection
  [Figure]

Stalls and Performance
  The BIG Picture
  Stalls reduce performance
    But they are required to get correct results
  The compiler can rearrange code to avoid hazards and stalls
    This requires knowledge of the pipeline structure
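The load-use condition is a single Boolean expression over three register numbers and one control bit. A minimal sketch, with hypothetical parameter names standing in for the pipeline-register fields:

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID needs the register a load in EX will write."""
    return bool(id_ex_mem_read
                and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# lw $2, 20($1) followed by and $4, $2, $5: the load's rt (2) matches the and's rs.
stall_needed = load_use_hazard(True, 2, 2, 5)
```

When the function returns true, the hazard unit would hold the PC and IF/ID register for one cycle and zero the ID/EX control values, as the stall slide describes.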

§4.8 Control Hazards

Branch Hazards
  If the branch outcome is determined in MEM
  [Figure: flush these instructions (set the control values to 0); PC]

Reducing Branch Delay
  Move the hardware that determines the outcome to the ID stage
    Target address adder
    Register comparator
  Example: branch taken
    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or  $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
        ...
    72: lw  $4, 50($7)

Example: Branch Taken
  [Figure]

Example: Branch Taken
  [Figure]

Data Hazards for Branches
  If a comparison register is the destination of the 2nd or 3rd preceding ALU instruction
    add $1, $2, $3
    add $4, $5, $6
    …
    beq $1, $4, …
  The data hazard can be resolved with forwarding

Data Hazards for Branches
  If a comparison register is the destination of the preceding ALU instruction or of the 2nd preceding load instruction
    One stall cycle is needed
    lw  $1, addr
    add $4, $5, $6
    beq $1, $4, …    (beq stalled: ID repeated once)

Data Hazards for Branches
  If a comparison register is the destination of the immediately preceding load instruction
    Two stall cycles are needed
    lw  $1, addr
    beq $1, $0, …    (beq stalled: ID repeated twice)

Dynamic Branch Prediction
  In deeper and superscalar pipelines, the branch penalty is more significant
  Use dynamic prediction
    Branch prediction buffer (or branch history table)
    A storage area indexed by branch instruction addresses
    Holds a flag recording whether the branch was taken
  To execute a branch
    Check the table and expect the same outcome
    Start fetching from the fall-through or target address
    If the guess is wrong, flush the wrongly fetched instructions, re-fetch from the right place, and flip the prediction bit

1-Bit Predictor: Shortcoming
  Inner-loop branches are mispredicted twice!
    outer: …
           …
    inner: …
           …
           beq …, …, inner
           …
           beq …, …, outer
  Mispredicted once at the end of the loop
  Mispredicted again on the first iteration the next time around, because the last iteration of the previous pass set the prediction to not taken

2-Bit Predictor
  Only change the prediction bit after two successive mispredictions
  [Figure: state diagram]

Calculating the Branch Target
  Even with a prediction bit, the branch target address still has to be calculated
    A taken branch costs one clock cycle
  Branch target buffer
    A cache of target addresses
    Indexed by the PC when the instruction is fetched
      If the branch is predicted taken, the target instruction can be fetched immediately
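The difference between the two predictors shows up when the inner-loop branch is replayed over several outer iterations. The sketch below is an illustration only: the 2-bit predictor is modeled as a common saturating-counter variant (an assumption, since the slide's state diagram is not reproduced here), and the function names and starting states are hypothetical.

```python
def mispredictions_1bit(outcomes, predict_taken=True):
    """1-bit predictor: always predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if taken != predict_taken:
            misses += 1
        predict_taken = taken
    return misses


def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter: values 2 and 3 predict taken, 0 and 1 not taken."""
    misses = 0
    for taken in outcomes:
        if taken != (counter >= 2):
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Inner-loop branch: taken three times, then falls through; the outer loop runs 3 times.
INNER_LOOP = [True, True, True, False] * 3
```

With this trace the 1-bit predictor misses at each loop exit and again at each re-entry, while the 2-bit counter misses only at the exits, because a single not-taken outcome moves it from "strongly taken" to "weakly taken" without changing the prediction.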

§4.9 Exceptions

Exceptions and Interrupts
  "Unexpected" events requiring a change in flow of control
    Different ISAs use the terms differently
  Exception
    Arises within the CPU
      e.g., undefined opcode, overflow, syscall, …
  Interrupt
    From an external I/O controller
  Dealing with them without sacrificing performance is hard

Handling Exceptions
  In MIPS, exceptions are managed by a System Control Coprocessor (CP0)
  Save the PC of the offending (or interrupted) instruction
    In MIPS: Exception Program Counter (EPC)
  Save an indication of the problem
    In MIPS: Cause register
    We'll assume 1 bit
      0 for undefined opcode, 1 for overflow
  Jump to the handler at 8000 0180

An Alternate Mechanism
  Vectored Interrupts
    Handler address determined by the cause
  Example:
    Undefined opcode: C000 0000
    Overflow:         C000 0020
    …:                C000 0040
  Instructions either
    Deal with the interrupt, or
    Jump to the real handler

Handler Actions
  Read the cause, and transfer to the relevant handler
  Determine the action required
  If restartable
    Take corrective action
    Use the EPC to return to the program
  Otherwise
    Terminate the program
    Report the error using EPC, cause, …

Exceptions in a Pipeline
  Another form of control hazard
  Consider overflow on add in the EX stage
    add $1, $2, $1
    Prevent $1 from being clobbered
    Complete previous instructions
    Flush add and subsequent instructions
    Set Cause and EPC register values
    Transfer control to the handler
  Similar to a mispredicted branch
    Use much of the same hardware

Pipeline with Exceptions
  [Figure]

Exception Properties
  Restartable exceptions
    The pipeline can flush the instruction
    The handler executes, then returns to the instruction
      Refetched and executed from scratch
  PC saved in the EPC register
    Identifies the causing instruction
    Actually PC + 4 is saved
      The handler must adjust

Exception Example
  Exception on add in
    40  sub $11, $2, $4
    44  and $12, $2, $5
    48  or  $13, $2, $6
    4C  add $1, $2, $1
    50  slt $15, $6, $7
    54  lw  $16, 50($7)
    …
  Handler
    80000180  sw $25, 1000($0)
    80000184  sw $26, 1004($0)
    …

Exception Example
  [Figure]

Exception Example
  [Figure]

Multiple Exceptions
  Pipelining overlaps multiple instructions
    Could have multiple exceptions at once
  Simple approach: deal with the exception from the earliest instruction
    Flush subsequent instructions
    "Precise" exceptions
  In complex pipelines
    Multiple instructions issued per cycle
    Out-of-order completion
    Maintaining precise exceptions is difficult!

Imprecise Exceptions
  Just stop the pipeline and save state
    Including the exception cause(s)
  Let the handler work out
    Which instruction(s) had exceptions
    Which to complete or flush
      May require "manual" completion
  Simplifies hardware, but needs more complex handler software
  Not feasible for complex multiple-issue out-of-order pipelines

§4.10 Parallelism and Advanced Instruction Level Parallelism

Instruction-Level Parallelism (ILP)
  Pipelining: executing multiple instructions in parallel
  To increase ILP
    Deeper pipeline
      Less work per stage ⇒ shorter clock cycle
    Multiple issue
      Replicate pipeline stages ⇒ multiple pipelines
      Start multiple instructions per clock cycle
      CPI < 1, so use Instructions Per Cycle (IPC)
      E.g., 4 GHz 4-way multiple-issue
        16 BIPS, peak CPI = 0.25, peak IPC = 4
      But dependencies reduce this in practice

Multiple Issue
  Static multiple issue
    Compiler groups instructions to be issued together
    Packages them into "issue slots"
    Compiler detects and avoids hazards
  Dynamic multiple issue
    CPU examines the instruction stream and chooses instructions to issue each cycle
    Compiler can help by reordering instructions
    CPU resolves hazards using advanced techniques at runtime

Speculation
  "Guess" what to do with an instruction
    Start the operation as soon as possible
    Check whether the guess was right
      If so, complete the operation
      If not, roll back and do the right thing
  Common to static and dynamic multiple issue
  Examples
    Speculate on branch outcome
      Roll back if the path taken is different
    Speculate on load
      Roll back if the location is updated

Compiler/Hardware Speculation
  Compiler can reorder instructions
    e.g., move a load before a branch
    Can include "fix-up" instructions to recover from an incorrect guess
  Hardware can look ahead for instructions to execute
    Buffer results until it determines they are actually needed
    Flush buffers on incorrect speculation

Speculation and Exceptions
  What if an exception occurs on a speculatively executed instruction?
    e.g., speculative load before a null-pointer check
  Static speculation
    Can add ISA support for deferring exceptions
  Dynamic speculation
    Can buffer exceptions until instruction completion (which may not occur)

Static Multiple Issue
  Compiler groups instructions into "issue packets"
    Group of instructions that can be issued on a single cycle
    Determined by the pipeline resources required
  Think of an issue packet as a very long instruction
    Specifies multiple concurrent operations
    ⇒ Very Long Instruction Word (VLIW)

Scheduling Static Multiple Issue
  Compiler must remove some/all hazards
    Reorder instructions into issue packets
    No dependencies within a packet
    Possibly some dependencies between packets
      Varies between ISAs; compiler must know!
    Pad with nop if necessary

MIPS with Static Dual Issue
  Two-issue packets
    One ALU/branch instruction
    One load/store instruction
    64-bit aligned
      ALU/branch, then load/store
      Pad an unused instruction with nop

  Address   Instruction type   Pipeline stages
  n         ALU/branch         IF ID EX MEM WB
  n + 4     Load/store         IF ID EX MEM WB
  n + 8     ALU/branch            IF ID EX MEM WB
  n + 12    Load/store            IF ID EX MEM WB
  n + 16    ALU/branch               IF ID EX MEM WB
  n + 20    Load/store               IF ID EX MEM WB

MIPS with Static Dual Issue
  [Figure]

Hazards in the Dual-Issue MIPS
  More instructions executing in parallel
  EX data hazard
    Forwarding avoided stalls with single-issue
    Now can't use an ALU result in a load/store in the same packet
      add  $t0, $s0, $s1
      load $s2, 0($t0)
    Split into two packets, effectively a stall
  Load-use hazard
    Still one cycle use latency, but now two instructions
  More aggressive scheduling required

Scheduling Example
  Schedule this for dual-issue MIPS

    Loop: lw   $t0, 0($s1)       # $t0 = array element
          addu $t0, $t0, $s2     # add scalar in $s2
          sw   $t0, 0($s1)       # store result
          addi $s1, $s1, -4      # decrement pointer
          bne  $s1, $zero, Loop  # branch if $s1 != 0

          ALU/branch              Load/store          cycle
    Loop: nop                     lw  $t0, 0($s1)     1
          addi $s1, $s1, -4       nop                 2
          addu $t0, $t0, $s2      nop                 3
          bne  $s1, $zero, Loop   sw  $t0, 4($s1)     4

  IPC = 5/4 = 1.25 (c.f. peak IPC = 2)

Loop Unrolling
  Replicate the loop body to expose more parallelism
    Reduces loop-control overhead
  Use different registers per replication
    Called "register renaming"
    Avoid loop-carried "anti-dependencies"
      Store followed by a load of the same register
      Aka "name dependence"
        Reuse of a register name

Loop Unrolling Example
          ALU/branch              Load/store          cycle
    Loop: addi $s1, $s1, -16      lw  $t0, 0($s1)     1
          nop                     lw  $t1, 12($s1)    2
          addu $t0, $t0, $s2      lw  $t2, 8($s1)     3
          addu $t1, $t1, $s2      lw  $t3, 4($s1)     4
          addu $t2, $t2, $s2      sw  $t0, 16($s1)    5
          addu $t3, $t3, $s2      sw  $t1, 12($s1)    6
          nop                     sw  $t2, 8($s1)     7
          bne  $s1, $zero, Loop   sw  $t3, 4($s1)     8

  IPC = 14/8 = 1.75
    Closer to 2, but at the cost of registers and code size
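The two IPC figures (1.25 and 1.75) are simply the number of non-nop slots divided by the number of issue cycles. The sketch below recomputes them from the two schedules; the list-of-pairs encoding and the name `ipc` are assumptions made for illustration.

```python
def ipc(schedule) -> float:
    """Useful instructions per cycle: non-nop slots divided by issue packets."""
    issued = sum(slot != "nop" for packet in schedule for slot in packet)
    return issued / len(schedule)


# Each packet is (ALU/branch slot, load/store slot), in issue order.
ROLLED = [
    ("nop", "lw"),
    ("addi", "nop"),
    ("addu", "nop"),
    ("bne", "sw"),
]

UNROLLED = [
    ("addi", "lw"),
    ("nop", "lw"),
    ("addu", "lw"),
    ("addu", "lw"),
    ("addu", "sw"),
    ("addu", "sw"),
    ("nop", "sw"),
    ("bne", "sw"),
]
```

Unrolling four iterations leaves only two empty slots in eight cycles, compared with three empty slots in four cycles for the original loop.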

Dynamic Multiple Issue
  "Superscalar" processors
  CPU decides whether to issue 0, 1, 2, … instructions each cycle
    Avoiding structural and data hazards
  Avoids the need for compiler scheduling
    Though it may still help
    Code semantics ensured by the CPU

Dynamic Pipeline Scheduling
  Allow the CPU to execute instructions out of order to avoid stalls
    But commit results to registers in order
  Example
    lw   $t0, 20($s2)
    addu $t1, $t0, $t2
    sub  $s4, $s4, $t3
    slti $t5, $s4, 20
    Can start sub while addu is waiting for lw

Dynamically Scheduled CPU
  [Figure annotations]
    Preserves dependencies
    Reservation stations hold pending operands
    Results are also sent to any waiting reservation stations
    Reorder buffer for register writes
      Can supply operands for issued instructions

Register Renaming
  Reservation stations and the reorder buffer effectively provide register renaming
  On instruction issue to a reservation station
    If the operand is available in the register file or reorder buffer
      Copied to the reservation station
      No longer required in the register; can be overwritten
    If the operand is not yet available
      It will be provided to the reservation station by a function unit
      Register update may not be required

Speculation
  Predict the branch and continue issuing
    Don't commit until the branch outcome is determined
  Load speculation
    Avoid load and cache miss delay
      Predict the effective address
      Predict the loaded value
      Load before completing outstanding stores
      Bypass stored values to the load unit
    Don't commit the load until the speculation is cleared

Why Do Dynamic Scheduling?
  Why not just let the compiler schedule code?
  Not all stalls are predictable
    e.g., cache misses
  Can't always schedule around branches
    Branch …
