




Review
- Basic concepts of pipelining
- Preconditions for pipelining
- Pipeline performance metrics
- The basic DLX pipeline
  - The five stages and the function of each stage
  - The role of the different datapaths
  - Placement and role of the pipeline registers

Table 3.1  Operations in each stage of the DLX pipeline

IF (any instruction type):
  IF/ID.IR ← Mem[PC];
  IF/ID.NPC, PC ← (if EX/MEM.cond then EX/MEM.ALUOutput else PC+4);

ID (any instruction type):
  ID/EX.A ← Regs[IF/ID.IR6..10]; ID/EX.B ← Regs[IF/ID.IR11..15];
  ID/EX.NPC ← IF/ID.NPC; ID/EX.IR ← IF/ID.IR;
  ID/EX.Imm ← (IR16)^16 ## IR16..31;

EX:
  ALU instruction:
    EX/MEM.IR ← ID/EX.IR;
    EX/MEM.ALUOutput ← ID/EX.A op ID/EX.B;
      or EX/MEM.ALUOutput ← ID/EX.A op ID/EX.Imm;
    EX/MEM.cond ← 0;
  Load/store instruction:
    EX/MEM.IR ← ID/EX.IR; EX/MEM.B ← ID/EX.B;
    EX/MEM.ALUOutput ← ID/EX.A + ID/EX.Imm;
    EX/MEM.cond ← 0;
  Branch instruction:
    EX/MEM.ALUOutput ← ID/EX.NPC + ID/EX.Imm;
    EX/MEM.cond ← (ID/EX.A op 0);

Table 3.1 (continued)  Operations in each stage of the DLX pipeline

MEM:
  ALU instruction:
    MEM/WB.IR ← EX/MEM.IR;
    MEM/WB.ALUOutput ← EX/MEM.ALUOutput;
  Load/store instruction:
    MEM/WB.IR ← EX/MEM.IR;
    MEM/WB.LMD ← Mem[EX/MEM.ALUOutput];
      or Mem[EX/MEM.ALUOutput] ← EX/MEM.B;

WB:
  ALU instruction:
    Regs[MEM/WB.IR16..20] ← MEM/WB.ALUOutput;
      or Regs[MEM/WB.IR11..15] ← MEM/WB.ALUOutput;
  Load instruction:
    Regs[MEM/WB.IR11..15] ← MEM/WB.LMD;

Review: hazards in the pipeline
- Structural hazards: require more hardware resources
- Data hazards: require forwarding and compiler scheduling
- Control hazards: detect the branch condition early, compute the target address early, delayed branches, prediction
- How each kind of hazard arises and how it can be avoided
- Case study: features of the MIPS R4000

ADD R1, R2, R3   IF ID EX ME WB
SUB R5, R1, R7      IF ID EX ME WB
XOR R6, R1, R7         IF ID EX ME WB
OR  R7, R1, R7            IF ID EX ME WB

LW  R1, 45(R2)   IF ID EX ME WB
SUB R8, R6, R7      IF ID EX ME WB
ADD R5, R1, R7         IF ID EX ME WB

ADD R1, R2, R3   IF ID EX ME WB
SUB R8, R6, R7      IF ID EX ME WB
LW  R5, 45(R1)         IF ID EX ME WB

LW  R1, 30(R2)   IF ID EX ME WB
SUB R8, R6, R7      IF ID EX ME WB
LW  R5, 45(R1)         IF ID EX ME WB

ADD R1, R2, R3   IF ID EX ME WB
SW  R5, 30(R1)      IF ID EX ME WB
SW  R6, 45(R1)         IF ID EX ME WB

ADD R1, R2, R3   IF ID EX ME WB
SW  R1, 45(R3)      IF ID EX ME WB
SW  R1, 45(R4)         IF ID EX ME WB

LW  R1, 56(R2)   IF ID EX ME WB
SW  R1, 45(R3)      IF ID EX ME WB
SW  R1, 45(R4)         IF ID EX ME WB

Ch 4 Instruction-Level Parallelism
Embedded System Lab, Fall 2012

Outline
- Basic instruction scheduling methods
- The scoreboard algorithm
- The Tomasulo algorithm

4.1 Instruction-Level Parallelism (ILP)
- Dependences are an intrinsic property of programs
- Dependences give rise to data hazards
- Hazards cause the CPU to stall
- Classes of dependence: data dependences, structural dependences, control dependences
- ILP: overlapping the execution of independent instructions

Loop: LD   F0,0(R1)
      SUBI R2,R2,8
      SUBI R3,R3,8
      ADDD F4,F0,F2

Name dependences
A further kind of dependence is the name dependence. Two instructions use the same name (register or memory location), but no data flows between them through that name.
- Antidependence (WAR): instruction j writes a register or memory location that instruction i reads. Note that instruction i executes first.
- Output dependence (WAW): instructions i and j write the same register or memory location, and the order of the two writes must be preserved.

Are there name dependences in the following code?
1  Loop: LD   F0,0(R1)
2        ADDD F4,F0,F2
3        SD   0(R1),F4
4        LD   F0,-8(R1)
5        ADDD F4,F0,F2
6        SD   -8(R1),F4
7        LD   F0,-16(R1)
8        ADDD F4,F0,F2
9        SD   -16(R1),F4
10       LD   F0,-24(R1)
11       ADDD F4,F0,F2
12       SD   -24(R1),F4
13       SUBI R1,R1,#32
14       BNEZ R1,LOOP
15       NOP
How can the name dependences be eliminated?

Eliminating name dependences
1  Loop: LD   F0,0(R1)
2        ADDD F4,F0,F2
3        SD   0(R1),F4      ;drop SUBI & BNEZ
4        LD   F6,-8(R1)
5        ADDD F8,F6,F2
6        SD   -8(R1),F8     ;drop SUBI & BNEZ
7        LD   F10,-16(R1)
8        ADDD F12,F10,F2
9        SD   -16(R1),F12   ;drop SUBI & BNEZ
10       LD   F14,-24(R1)
11       ADDD F16,F14,F2
12       SD   -24(R1),F16
13       SUBI R1,R1,#32     ;alter to 4*8
14       BNEZ R1,LOOP
15       NOP
This technique is called register renaming.

Definitions related to instruction-level parallelism
- Basic block: straight-line code with no branches. A whole program consists of basic blocks connected by branch instructions. Branches make up about 15% of MIPS instructions, and a basic block is 4 to 7 instructions long.
- OS code contains fewer branches. It is responsible for resource management: filling in status registers, filling in control registers and setting control variables.
- Parallelism across basic blocks (loop-level parallelism)
- Characteristics of loops: the branch that controls a loop is biased. It is taken in the vast majority of cases, so it is easy to predict, but a prediction scheme is still required.

Average pipeline CPI:
Pipeline CPI = Ideal Pipeline CPI + Structural Stalls + RAW Stalls + WAR Stalls + WAW Stalls + Control Stalls
This chapter studies methods and techniques for reducing the number of stalls.

Basic approaches to instruction scheduling
- Software approach (compiler optimization)
  - gcc: 17% of instructions are control instructions, about 5 instructions + 1 branch
  - Obtain more parallelism within a basic block
  - Exploit loop-level parallelism
- Hardware approach
  - Dynamic scheduling

Static vs. dynamic scheduling
- 8086: I/O cycles and CPU cycles
- 386: overlapped instruction execution
- 486: instruction-level parallelism
- Dynamic instruction scheduling: Pentium Pro; Pentium II, III, IV; AMD Athlon; MIPS R10K, R12K; Sun UltraSPARC; PowerPC 603, G3, G4, G5 (IBM-Motorola-Apple); Alpha 21264
- Static scheduling: Itanium and the Transmeta Crusoe

An example loop
for (i = 1; i <= 1000; i++) x(i) = x(i) + y(i);
- Characteristic: there are no dependences when computing x(i)
- Ways to parallelize it:
  - The simplest approach: loop unrolling.
  - The vector approach, X = X + Y. This has been used since the 1960s (Cray, Hitachi, NEC, Fujitsu). Today it usually takes the form of vector acceleration units such as GPUs and DSPs.

A simple loop and its assembly code
for (i = 1; i <= 1000; i++) x(i) = x(i) + s;

Loop: LD   F0,0(R1)     ;F0=vector element
      ADDD F4,F0,F2     ;add scalar from F2
      SD   0(R1),F4     ;store result
      SUBI R1,R1,8      ;decrement pointer 8B (DW)
      BNEZ R1,Loop      ;branch R1!=zero
      NOP               ;delayed branch slot

Dependences in the FP loop (same code as above)

Instruction producing result   Instruction using result   Latency (clock cycles)
FP ALU op                      Another FP ALU op          3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1
Load double                    Store double               0
Integer op                     Integer op                 0

Where do stalls need to be inserted? (Assume the branch obtains its target address and condition in the ID stage.)

Stalls in the FP loop: 10 clocks
1  Loop: LD   F0,0(R1)     ;F0=vector element
2        stall
3        ADDD F4,F0,F2     ;add scalar in F2
4        stall
5        stall
6        SD   0(R1),F4     ;store result
7        SUBI R1,R1,8      ;decrement pointer 8B (DW)
8        stall
9        BNEZ R1,Loop      ;branch R1!=zero
10       stall             ;delayed branch slot
Can reordering the code minimize the stalls?

Minimum stalls in the FP loop: 6 clocks
1  Loop: LD   F0,0(R1)
2        SUBI R1,R1,8
3        ADDD F4,F0,F2
4        stall
5        BNEZ R1,Loop      ;delayed branch
6        SD   8(R1),F4     ;altered when moved past SUBI
Swap BNEZ and SD by changing the address of SD.
Can unrolling the loop 4 times improve performance?

Loop unrolled 4 times (straightforward way)
Rewrite the loop to minimize stalls?
1  Loop: LD   F0,0(R1)
         stall
2        ADDD F4,F0,F2
         stall
         stall
3        SD   0(R1),F4      ;drop SUBI & BNEZ
4        LD   F6,-8(R1)
         stall
5        ADDD F8,F6,F2
         stall
         stall
6        SD   -8(R1),F8     ;drop SUBI & BNEZ
7        LD   F10,-16(R1)
         stall
8        ADDD F12,F10,F2
         stall
         stall
9        SD   -16(R1),F12   ;drop SUBI & BNEZ
10       LD   F14,-24(R1)
         stall
11       ADDD F16,F14,F2
         stall
         stall
12       SD   -24(R1),F16
13       SUBI R1,R1,#32     ;alter to 4*8
         stall
14       BNEZ R1,LOOP
15       NOP
15 + 4 x (1+2) + 1 = 28 cycles, or 7 per iteration.
Assumes R1 is a multiple of 4.
Name dependences: how are they resolved?
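The stall counts above (10, 6 and 28 clocks) can be reproduced with a small model. The Python sketch below is an illustration written for these notes and does not come from the slides. It assumes the following:
- one instruction issues per clock, in program order;
- a consumer waits, after its producer, the number of extra clocks given in the latency table;
- an integer op feeding a branch needs one extra clock, because the branch is resolved in ID (this is the slide's stall at clock 8);
- the delay slot always occupies one clock.

The instruction-class names (`load`, `fp_alu`, ...) and the functions `cycles` and `unrolled_straightforward` are hypothetical.

```python
# Hypothetical cycle-count model for the DLX loop listings above.
# Assumption: one instruction issues per clock, in program order.
LATENCY = {                       # extra clocks a consumer waits after its producer
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("int", "int"): 0,
    ("int", "branch"): 1,         # assumption: branch resolved in ID
}


def cycles(body):
    """Return the clock (counting from 1) in which the last instruction issues.

    `body` is a list of (kind, dest, sources); `dest` is None for stores,
    branches and NOPs. Producer/consumer pairs not in LATENCY wait 0 clocks.
    """
    produced = {}                 # register -> (producer kind, clock it issued)
    clock = 0
    for kind, dest, sources in body:
        earliest = clock + 1      # in order: at least one clock after the previous instruction
        for src in sources:
            if src in produced:
                prod_kind, prod_clock = produced[src]
                wait = LATENCY.get((prod_kind, kind), 0)
                earliest = max(earliest, prod_clock + wait + 1)
        clock = earliest
        if dest is not None:
            produced[dest] = (kind, clock)
    return clock


ORIGINAL = [("load", "F0", ["R1"]), ("fp_alu", "F4", ["F0", "F2"]),
            ("store", None, ["F4", "R1"]), ("int", "R1", ["R1"]),
            ("branch", None, ["R1"]), ("nop", None, [])]      # NOP fills the delay slot

SCHEDULED = [("load", "F0", ["R1"]), ("int", "R1", ["R1"]),
             ("fp_alu", "F4", ["F0", "F2"]), ("branch", None, ["R1"]),
             ("store", None, ["F4", "R1"])]                  # SD 8(R1),F4 in the delay slot


def unrolled_straightforward():
    """Four copies with renamed registers, followed by SUBI, BNEZ and NOP."""
    body = []
    for ld_reg, add_reg in [("F0", "F4"), ("F6", "F8"), ("F10", "F12"), ("F14", "F16")]:
        body += [("load", ld_reg, ["R1"]),
                 ("fp_alu", add_reg, [ld_reg, "F2"]),
                 ("store", None, [add_reg, "R1"])]
    return body + [("int", "R1", ["R1"]), ("branch", None, ["R1"]), ("nop", None, [])]


if __name__ == "__main__":
    print(cycles(ORIGINAL))                     # 10
    print(cycles(SCHEDULED))                    # 6
    print(cycles(unrolled_straightforward()))   # 28
```

The model only counts total clocks. It may place an idle clock in a different slot than the hand schedule: for the 6-clock version it leaves clock 5 idle after BNEZ rather than before it. It also does not model WAR or WAW hazards.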
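Loop unrolling with the fewest stalls
After code motion:
- SD moved after SUBI: note the change to its offset
- Loads moved before SD: note the change to the offsets
1  Loop: LD   F0,0(R1)
2        LD   F6,-8(R1)
3        LD   F10,-16(R1)
4        LD   F14,-24(R1)
5        ADDD F4,F0,F2
6        ADDD F8,F6,F2
7        ADDD F12,F10,F2
8        ADDD F16,F14,F2
9        SD   0(R1),F4
10       SD   -8(R1),F8
11       SUBI R1,R1,#32
12       SD   16(R1),F12
13       BNEZ R1,LOOP
14       SD   8(R1),F16     ; 8-32 = -24
14 clock cycles, or 3.5 per iteration

Summary of the loop unrolling example
- After SD is moved past SUBI and BNEZ, the offset in SD must be adjusted.
- Loop unrolling is an effective way to reduce stalls when the iterations of a loop are independent (loop-level parallelism).
- Different iterations use different registers.
- Instruction scheduling must leave the program's results unchanged.

Instruction reordering + loop unrolling (total clock cycles for the 1000-iteration loop):
No optimization                          10000
Instruction reordering                    6000
Loop unrolled 4 times                     7000
Unrolled 4 times + instruction reordering 3500

Loop unrolling (1/3)
Example: which data dependences exist in the following code? (A, B and C point to distinct, non-overlapping memory areas.)
for (i=1; i<=100; i=i+1) {
    A[i+1] = A[i] + C[i];     /* S1 */
    B[i+1] = B[i] + A[i+1];   /* S2 */
}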
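These dependences can also be found mechanically. The minimal Python sketch below is an illustration written for these notes, not a method from the slides. It replays the loop's subscripts for i = 1..100 and, for every array element read, looks up which statement and which iteration last wrote it. The helper names `trace_dependences`, `read` and `write` are hypothetical, and the sketch only detects true (RAW) dependences.

```python
def trace_dependences(n=100):
    """Replay the subscripts of
         for (i=1; i<=n; i=i+1) { A[i+1] = A[i] + C[i];     /* S1 */
                                  B[i+1] = B[i] + A[i+1]; } /* S2 */
    and return the true (RAW) dependences as
    (reader, writer, iteration distance, array) tuples.
    Distance 0 means the same iteration; distance > 0 means loop-carried.
    """
    last_writer = {}   # (array, index) -> (statement, iteration) of the latest write
    deps = set()

    def read(stmt, i, array, idx):
        writer = last_writer.get((array, idx))
        if writer is not None:          # the value was produced inside the loop
            deps.add((stmt, writer[0], i - writer[1], array))

    def write(stmt, i, array, idx):
        last_writer[(array, idx)] = (stmt, i)

    for i in range(1, n + 1):
        # S1: A[i+1] = A[i] + C[i]
        read("S1", i, "A", i)
        read("S1", i, "C", i)
        write("S1", i, "A", i + 1)
        # S2: B[i+1] = B[i] + A[i+1]
        read("S2", i, "B", i)
        read("S2", i, "A", i + 1)
        write("S2", i, "B", i + 1)
    return deps


if __name__ == "__main__":
    for dep in sorted(trace_dependences()):
        print(dep)
    # ('S1', 'S1', 1, 'A')
    # ('S2', 'S1', 0, 'A')
    # ('S2', 'S2', 1, 'B')
```

Entries with distance 1 are loop-carried dependences. The distance-0 entry is S2's use of the A[i+1] that S1 produced in the same iteration.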
1. S2 uses A[i+1], which S1 computed in the same iteration.
2. S1 uses a value that S1 computed in the previous iteration. Likewise, S2 uses a value that S2 computed in the previous iteration.
A dependence of this kind, which exists between iterations, is called a "loop-carried dependence". It means the iterations depend on one another and cannot execute in parallel. This distinguishes the loop from the earlier example, whose iterations were independent.

Loop unrolling (2/3)
Example: A, B, C, D distinct & nonoverlapping
for (i=1; i<=100; i=i+1) {
    A[i] = A[i] + B[i];       /* S1 */
    B[i+1] = C[i] + D[i];     /* S2 */
}
1. S1 and S2 do not depend on each other within an iteration, so swapping S1 and S2 does not affect the program's correctness.
2. S1 depends on the B[i] that S2 computed in the previous iteration. In the first iteration, S1 uses the value of B[1] computed before the loop.

Loop unrolling (3/3)
OLD:
for (i=1; i<=100; i=i+1) {
    A[i] = A[i] + B[i];       /* S1 */
    B[i+1] = C[i] + D[i];     /* S2 */
}
NEW:
A[1] = A[1] + B[1];
for (i=1; i<=99; i=i+1) {
    B[i+1] = C[i] + D[i];
    A[i+1] = A[i+1] + B[i+1];
}
B[101] = C[100] + D[100];

Midterm review
- Instruction-level parallelism: multiple instructions can execute in parallel in the pipeline
- Pipelining
  - Drawbacks of pipelining? Data hazards, control hazards, structural hazards; in-order execution
  - Solutions: instruction scheduling, loop unrolling, renaming; the scoreboard and Tomasulo algorithms

A simple loop and its corresponding assembly code: for (i=1; i ... out-of-order completion
- The scoreboard algorithm
- The Tomasulo algorithm

Hardware approach 1: the scoreboard
[Figure: basic concept of the scoreboard]

Four stages of scoreboard control (1/2)
1. Issue: issue the instruction and check for structural hazards.
   The scoreboard issues the instruction to a functional unit if two conditions hold: the unit the instruction needs is free, and no other active instruction has the same destination register (WAW). It then updates its internal data. If there is a structural hazard or a WAW hazard, issue of this instruction stalls. No later instructions are issued until the hazard clears.
2. Read operands: read the operands once there are no data hazards.
   A source operand is available in either of two cases: no earlier-issued instruction that is still executing will write the source register, or the functional unit writing that register has already completed the write. When the operands are available, the scoreboard tells the functional unit to read them and prepare to execute.
   The scoreboard resolves RAW hazards dynamically in this step, and instructions may execute out of order.

Four stages of scoreboard control (2/2)
3. Execution: execute once the operands have been fetched (EX).
   After receiving its operands, the functional unit begins execution. When the result is ready, it notifies the scoreboard that it has completed execution.
4. Write result: finish execution (WR).
   Once the scoreboard learns that the functional unit has finished, it checks for WAR hazards. If there is no WAR hazard, the result is written. If there is a WAR hazard, the instruction is stalled.
Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
The CDC 6600 scoreboard stalls SUBD until ADDD has read its operands. Only then does SUBD enter the WR stage.

Food for thought: how does the scoreboard relate to the DLX pipeline?
[Figure: the scoreboard stages IS, RO, EX, WR]

Structure of the scoreboard
1. Instruction status: records which of the four steps each executing instruction is in.
2. Functional unit status: records the state of each functional unit (FU), using nine fields per unit:
   Busy     Whether the unit is busy
   Op       The operation the unit performs
   Fi       Destination register number
   Fj, Fk   Source register numbers
   Qj, Qk   Functional units producing source operands Fj, Fk
   Rj, Rk   Flags indicating whether source operands Fj, Fk are ready; set to No after the operands are read
   In the example below, this table also has a Time column giving the execution clocks remaining.
3. Register result status: indicates which functional unit will write a register, if any. If no instruction will write the register, the field is blank.

Scoreboard example
Instructions:
LD    F6,34(R2)
LD    F2,45(R3)
MULTD F0,F2,F4
SUBD  F8,F6,F2
DIVD  F10,F0,F6
ADDD  F6,F8,F2
* An add takes 2 clock cycles to execute, a multiply 10 and a divide 40. LD instructions use the Integer unit.
Functional units: Integer, Mult1, Mult2, Add, Divide. Initially all units are idle and the register result status is empty.

Instruction status as of cycle 22:
Instruction        Issue  Read operands  Exec complete  Write result
LD    F6,34(R2)      1          2              3              4
LD    F2,45(R3)      5          6              7              8
MULTD F0,F2,F4       6          9             19             20
SUBD  F8,F6,F2       7          9             11             12
DIVD  F10,F0,F6      8         21
ADDD  F6,F8,F2      13         14             16             22

Cycle by cycle:
Cycle 1: LD F6 issues to Integer (Fi=F6, Fk=R2, Rk=Yes). Register status: F6=Integer.
Cycle 2: LD F6 reads its operand. Issue the 2nd LD? No: the Integer unit is busy (structural hazard).
Cycle 3: LD F6 completes execution (Rk=No). Issue MULTD? No: issue is in order and the 2nd LD is still waiting.
Cycle 4: LD F6 writes its result. Integer becomes free and F6 is cleared.
Cycle 5: LD F2 issues to Integer (Fi=F2, Fk=R3, Rk=Yes). Register status: F2=Integer.
Cycle 6: LD F2 reads its operand. MULTD issues to Mult1 (Fi=F0, Fj=F2, Fk=F4, Qj=Integer, Rj=No, Rk=Yes). Register status: F0=Mult1.
Cycle 7: LD F2 completes execution. SUBD issues to Add (Fi=F8, Fj=F6, Fk=F2, Qk=Integer, Rj=Yes, Rk=No). Register status: F8=Add. Read the multiply operands? No: F2 has not been written yet.
Cycle 8a (first half of the clock cycle): DIVD issues to Divide (Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes). Register status: F10=Divide.
Cycle 8b (second half of the clock cycle): LD F2 writes its result. Integer becomes free and F2 is cleared. Mult1 and Add now have Rj=Rk=Yes.
Cycle 9: MULTD and SUBD read their operands. Time remaining: Mult1 10, Add 2. Issue ADDD? No: the Add unit is busy (structural hazard).
Cycle 10: Mult1 9, Add 1 (Rj=Rk=No for both).
Cycle 11: SUBD completes execution (Add 0). Mult1 8.
Cycle 12: SUBD writes its result. Add becomes free and F8 is cleared. Mult1 7. Read operands for DIVD? No: F0 is still waiting on Mult1.
Cycle 13: ADDD issues to Add (Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes). Register status: F6=Add. Mult1 6.
Cycle 14: ADDD reads its operands. Add 2, Mult1 5.
Cycle 15: Add 1, Mult1 4.
Cycle 16: ADDD completes execution (Add 0). Mult1 3.
Cycle 17: Mult1 2. Why not write the result of ADDD? WAR hazard: DIVD has not yet read F6.
Cycle 18: Mult1 1.
Cycle 19: MULTD completes execution (Mult1 0).
Cycle 20: MULTD writes its result. Mult1 becomes free and F0 is cleared. Divide now has Rj=Rk=Yes.
Cycle 21: DIVD reads its operands. The WAR hazard is now gone.
Cycle 22: ADDD writes its result. Add becomes free and F6 is cleared. Divide has 39 clocks remaining.
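The instruction-status times above can be reproduced with a compact timing model. This Python sketch is a simplification written for these notes. It is neither the CDC 6600's actual logic nor the method used in the slides. It assumes the following:
- issue happens once per clock, in program order;
- an FU released by a write in clock w can accept an instruction in clock w+1;
- a register written in clock w can be read in clock w+1;
- execution takes the FU latency, counted from the read-operands clock (Integer 1, Add 2, Mult 10, Divide 40);
- a result is written no earlier than one clock after execution completes;
- a result is also written no earlier than one clock after every earlier instruction that reads the destination register has read its operands (the WAR check).

The names `simulate`, `PROGRAM` and `UNITS` are hypothetical. The DIVD completion and write clocks (61 and 62) go beyond cycle 22, the last cycle shown; they follow from the 39 clocks remaining at cycle 22.

```python
# Hypothetical analytic timing model of the scoreboard example (a simplification).
LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Divide": 40}
UNITS = {"Integer": ["Integer"], "Add": ["Add"],
         "Mult": ["Mult1", "Mult2"], "Divide": ["Divide"]}

PROGRAM = [  # (text, FU class, destination, sources)
    ("LD F6,34(R2)",   "Integer", "F6",  ["R2"]),
    ("LD F2,45(R3)",   "Integer", "F2",  ["R3"]),
    ("MULTD F0,F2,F4", "Mult",    "F0",  ["F2", "F4"]),
    ("SUBD F8,F6,F2",  "Add",     "F8",  ["F6", "F2"]),
    ("DIVD F10,F0,F6", "Divide",  "F10", ["F0", "F6"]),
    ("ADDD F6,F8,F2",  "Add",     "F6",  ["F8", "F2"]),
]


def simulate(program):
    """Return {text: (issue, read, complete, write)} in clock cycles."""
    unit_free = {u: 1 for units in UNITS.values() for u in units}  # first clock an FU can accept an issue
    last_write = {}      # register -> clock in which its latest producer writes it
    earlier = []         # (sources, read clock) of instructions already processed
    timing = {}
    prev_issue = 0
    for text, cls, dest, sources in program:
        # Issue: in order, needs a free FU of this class and no pending
        # write to the same destination register (WAW check).
        unit = min(UNITS[cls], key=lambda u: unit_free[u])
        issue = max(prev_issue + 1, unit_free[unit], last_write.get(dest, 0) + 1)
        # Read operands: every source must already be written (RAW check).
        read = max([issue + 1] + [last_write.get(s, 0) + 1 for s in sources])
        complete = read + LATENCY[cls]
        # Write result: every earlier reader of dest must have read it (WAR check).
        write = max([complete + 1] + [r + 1 for srcs, r in earlier if dest in srcs])
        unit_free[unit] = write + 1
        last_write[dest] = write
        earlier.append((sources, read))
        timing[text] = (issue, read, complete, write)
        prev_issue = issue
    return timing


if __name__ == "__main__":
    for text, (iss, rd, comp, wr) in simulate(PROGRAM).items():
        print(f"{text:16} issue={iss:3} read={rd:3} complete={comp:3} write={wr:3}")
```

Computing the times in program order works here because, in this model, each stage of an instruction depends only on events of earlier instructions. The model does not cover every scoreboard corner case; for example, it ignores limits on how many results can be written per clock.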