Computer Architecture: Exercise Solutions (1), 0929 reordered

Uploaded by: q*** · Upload date: 2022-07-05 · Format: PPT · Pages: 103


Outline
- Basic concepts
- Instruction set
- Pipelined processors
- Instruction-level parallelism

Basic Concepts

1-2. A computer implemented by interpretation can be divided by function into 4 levels. Each level needs N instructions of the level below to interpret one of its own instructions. If one level-1 instruction takes K ns to execute, how long does one instruction at levels 2, 3, and 4 take?

Analysis and answer: one level-(i+1) instruction is interpreted by N level-i instructions, so T(i+1) = N × T(i). Levels 2, 3, and 4 therefore take NK ns, N^2 K ns, and N^3 K ns.

1.6 Dhrystone is a well-known integer benchmark. Computer A is measured to perform D_A executions of the Dhrystone benchmark per second, and to achieve a millions of instructions per second rate of MIPS_A while doing Dhrystone. Computer B is measured to perform D_B executions of the Dhrystone benchmark per second, and to achieve a millions of instructions per second rate of MIPS_B while doing Dhrystone.

Q. What is the fallacy in calculating the MIPS rating of computer B as MIPS_B = MIPS_A × (D_B / D_A)?

Answer. The proposed formulation for MIPS_B can be rewritten as:

    MIPS_A / D_A = MIPS_B / D_B

Examining the units of each factor, we have:

    (Computer A instructions/second) / (Dhrystone_A/second) = (Computer B instructions/second) / (Dhrystone_B/second)

The time units cancel, revealing that the formulation is founded upon the assumption that:

    (Computer A instructions) / Dhrystone_A = (Computer B instructions) / Dhrystone_B

Unless Computers A and B have the same instruction set architecture and execute identically compiled Dhrystone executables, this assumption is likely false. If so, the formulation for MIPS_B is also incorrect.

1-10. What are the main ways to achieve software portability? What problems do they have, and where are they applicable?
1. A unified high-level language. Languages serve different purposes and run on different architectures, so the underlying structure is hard to unify (MIPS, ARM).
2. The series-machine (machine family) approach. Software can only be ported between machines with the same architecture, and compatibility often constrains architectural change.
3. Simulation and emulation. Simulation runs slowly, and emulation is hard when the architectures differ greatly.
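The unit argument in exercise 1.6 can be made concrete with a small numeric sketch. The figures below are invented purely for illustration and are not measurements of any real machine; `instr_per_run_a` and `instr_per_run_b` are hypothetical names for the number of instructions each machine executes per Dhrystone run.

```python
def mips(runs_per_second: float, instr_per_run: float) -> float:
    """MIPS rate implied by a Dhrystone rate and an instruction count per run."""
    return runs_per_second * instr_per_run / 1e6

# Hypothetical machines: B executes fewer instructions per Dhrystone run than A.
d_a, instr_per_run_a = 10_000.0, 1_000.0
d_b, instr_per_run_b = 20_000.0, 600.0

mips_a = mips(d_a, instr_per_run_a)
mips_b_true = mips(d_b, instr_per_run_b)   # what B actually achieves
mips_b_scaled = mips_a * (d_b / d_a)       # the fallacious scaling formula

print(mips_a, mips_b_true, mips_b_scaled)
```

With these assumed numbers the scaling formula overstates MIPS_B, because it silently assumes both machines execute the same number of instructions per Dhrystone run.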

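1-11. You want to develop a new model within a machine series. Which of the following proposals can be considered, which will not work, and why?
(1) Add a character data type and several character-processing instructions to support compiling transaction-processing programs.
(2) To strengthen interrupt handling, extend interrupt classification from 4 levels to 5 and readjust the interrupt response priorities.
(3) Add a cache between the CPU and main memory to overcome the system performance bottleneck caused by slow main-memory access.
(4) To make addressing more flexible and reduce the average instruction length, replace the fixed-length opcodes with expanding opcodes of 3 different lengths, and specify the source-operand addressing mode with an added addressing-mode field like that of the VAX-11 instead of with the opcode.
(5) Widen the data path between the CPU and main memory from 16 bits to 32 bits to speed up transfers inside the host.
(6) To reduce contention on the shared bus, change from a single bus to a dual bus.
(7) Turn general-purpose register 0 into a dedicated stack pointer.

Instruction Set

*2-1. What is the relationship among data types, data representations, and data structures? What are the main principles for choosing data representations when designing a computer system?
- Data types are the kinds of variables a programming language allows, including fixed-point numbers, floating-point numbers, Booleans, characters, trees, graphs, tables, and so on.
- Data representations are the data types that hardware can directly recognize and reference.
- Data structures are collections of data elements that carry structure.
- Data representations and data structures are both subsets of data types: data types implemented in hardware are data representations, data structures are data types implemented in software, and some are implemented jointly by software and hardware.
- Hardware implementation is fast, while software implementation saves hardware cost. The choice is guided by: (a) shortening program run time; (b) reducing traffic between the CPU and main memory; (c) the generality and utilization of the data representation.

*2-10. Suppose there are two types of processor, A and B. Data in processor A carry no tags, and both its instruction word and data word are 32 bits. Data in processor B carry tags: each data word grows to 36 bits, 4 of which are the tag, and its instruction count drops from at most 256 to fewer than 64. Each instruction accesses two operands on average, and each operand stored in memory is accessed 8 times on average. For a program of 1000 instructions, compute the storage it occupies on processor A and on processor B (including instructions and data). What does the result suggest?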

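Analysis. Notation:
- S_OP: opcode bits
- S_OD: operand address bits
- S_I: instruction bits, S_I = S_OP + S_OD
- S_F: tag bits
- S_V: data value bits
- S_D: data bits, S_D = S_F + S_V
- S_P: program bits, S_P = S_I + S_D = S_OP + S_OD + S_F + S_V

Processor A (256 instructions): instruction 32 bits = opcode (8 bits) + address (24 bits); data 32 bits.
Processor B (64 instructions): instruction 30 bits = opcode (6 bits) + address (24 bits); data 36 bits = tag (4 bits) + data (32 bits).

(1) Processor A: S_P(A) = S_I(A) + S_D(A) = 32 × 1000 + 32 × 2 × 1000 / 8 = 40000 bits
    Processor B: S_P(B) = S_I(B) + S_D(B) = 30 × 1000 + 36 × 2 × 1000 / 8 = 39000 bits
    So ΔS < 0, that is, S_P(B) < S_P(A).

Insight. Let N_I be the number of instructions and N_R the average number of times each operand is accessed. Since S_OD(A) = S_OD(B) and S_V(A) = S_V(B), the change in program storage is:

    ΔS = S_P(B) - S_P(A) = S_OP(B) + S_F(B) - S_OP(A)
    S_OP(A) = 8 N_I
    S_OP(B) + S_F(B) = 6 N_I + 4 × 2 N_I / N_R = 8 N_I (0.75 + 1/N_R)
    (S_OP(B) + S_F(B)) / S_OP(A) = 0.75 + 1/N_R = 0.875 for N_R = 8

- If 1 ≤ N_R < 4, the ratio exceeds 1 and ΔS > 0.
- If N_R > 4, the ratio is below 1 and ΔS < 0.
- In real programs, measurements give N_R ≥ 10, so the ratio is at most 0.85 and ΔS < 0.

Scalar Processor

    I1:  A1*B1 -> R1
    I2:  A2*B2 -> R2
    I3:  A3*B3 -> R3
    I4:  A4*B4 -> R4
    I5:  A5*B5 -> R5
    I6:  A6*B6 -> R6
    I7:  R1+R2 -> R7
    I8:  R3+R4 -> R8
    I9:  R5+R6 -> R9
    I10: R7+R8 -> R10
    I11: R9+R10 -> R11

Space-time diagram (time axis 1 to 22 Δt; entries are the instruction numbers that pass through each stage):

    S6: 1 2 3 4 5 6 7 8 9 10 11
    S5: 1 2 3 4 5 6
    S4: 1 2 3 4 5 6
    S3: 7 8 9 10 11
    S2: 7 8 9 10 11
    S1: 1 2 3 4 5 6 7 8 9 10 11

Throughput: TP = 11 / (22 Δt) = 1 / (2 Δt)
Speedup: S = 11 × 4 Δt / (22 Δt) = 2
Efficiency: E = (11 × 4 Δt) / (6 × 22 Δt) = 1/3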
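The throughput, speedup, and efficiency figures for the scalar pipeline follow directly from the space-time diagram. A minimal sketch, assuming 11 instructions that each occupy 4 stages, a 6-stage pipeline, and 22 time slots in total; the function and parameter names are illustrative only.

```python
from fractions import Fraction

def pipeline_metrics(tasks: int, stages_per_task: int, total_stages: int, total_slots: int):
    """Return (throughput per time slot, speedup, efficiency) as exact fractions."""
    throughput = Fraction(tasks, total_slots)
    # Speedup: sequential time (tasks * stages_per_task slots) over pipelined time.
    speedup = Fraction(tasks * stages_per_task, total_slots)
    # Efficiency: busy stage-slots over all stage-slots in the space-time diagram.
    efficiency = Fraction(tasks * stages_per_task, total_stages * total_slots)
    return throughput, speedup, efficiency

tp, s, e = pipeline_metrics(tasks=11, stages_per_task=4, total_stages=6, total_slots=22)
print(tp, s, e)  # 1/2 2 1/3
```

Exact fractions are used so the results can be compared with the hand calculation without rounding.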

Pipeline Basics

Appendix A.2 Use the following code fragment:

    Loop: LD     F0,0(R2)
          LD     F4,0(R3)
          MUL.D  F0,F0,F4
          ADD.D  F2,F0,F2
          DADDUI R2,R2,#8
          DADDUI R3,R3,#8
          DSUBU  R5,R4,R2
          BNEZ   R5,Loop

Assume that the initial value of R4 is R2+792. The step is 8, so the loop runs 792/8 = 99 iterations.

For this exercise assume the standard five-stage integer pipeline and the MIPS FP pipeline as described in Section A.5. If structural hazards are due to write-back contention, assume the earliest instruction gets priority and other instructions are stalled.

Qa. Show the timing of this instruction sequence for the MIPS FP pipeline without any forwarding or bypassing hardware but assuming a register read and a write in the same clock cycle "forward" through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references hit in the cache, how many cycles does this loop take to execute?

Qb. Show the timing of this instruction sequence for the MIPS FP pipeline with normal forwarding or bypassing hardware. Assume that the branch is handled by predicting it as not taken. If all memory references hit in the cache, how many cycles does this loop take to execute?

See the three hazards and forwarding:
- Structural hazard
- Data hazard
- Control hazard

Answer a (without forwarding). Stage sequence of each instruction over clock cycles 1 to 27 (the columns for cycles 9 to 12 are elided; s = stall):

    Loop 1
    LD     F0,0(R2)    F D E M W
    LD     F4,0(R3)    F D E M W
    MUL.D  F0,F0,F4    F D s s E E E M W
    ADD.D  F2,F0,F2    F s s D s s s s E E E E M W
    DADDUI R2,R2,#8    F s s s s D E M W
    DADDUI R3,R3,#8    F D E M W
    DSUBU  R5,R4,R2    F D s E M W
    BNEZ   R5,Loop     F s D s r
    Loop 2
    L.D    F0,0(R2)    F s s F D E M W

Total loop execution time = 22 × 99 = 2178 clock cycles.

Answer b (with normal forwarding). Stage sequence of each instruction over clock cycles 1 to 23 (the columns for cycles 8 to 11 are elided; s = stall):

    Loop 1
    LD     F0,0(R2)    F D E M W
    LD     F4,0(R3)    F D E M W
    MUL.D  F0,F0,F4    F D s E E ... E M W
    ADD.D  F2,F0,F2    F s D s ... s E E E E M W
    DADDUI R2,R2,#8    F s ... s D E M W
    DADDUI R3,R3,#8    F D E M W
    DSUBU  R5,R4,R2    F D s E M W
    BNEZ   R5,Loop     F s D r
    Loop 2
    L.D    F0,0(R2)    F s F D E M W

Total loop execution time = 18 × 98 + 19 = 1783 clock cycles.

Appendix A.3 Suppose the branch frequencies (as percentages of all instructions) are as follows:
- Conditional branches: 15%
- Jumps and calls: 1%
- Conditional branches: 60% are taken

We are examining a four-deep pipeline where the branch is resolved at the end of the second cycle for unconditional branches and at the end of the third cycle for conditional branches. Assuming that only the first pipe stage can always be done independent of whether the branch goes and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?

Answer.

    Pipeline CPI = Ideal pipeline CPI + (Structural hazard stalls + Data hazard stalls + Control hazard stalls)
    Pipeline speedup = Pipeline depth / (1 + Pipeline stalls)

No control hazards:

    Pipeline speedup_ideal = 4 / (1 + 0) = 4

With control hazards, assume 4 stages: IF, ID, EX, and WB. Stage sequences over clock cycles 1 to 6:

Handling a jump or call:

    Jump or Call    IF ID EX WB
    i+1             IF IF ID EX
    i+2             stall IF ID
    i+3             stall IF

Handling a taken conditional branch:

    Taken branch    IF ID EX WB
    i+1             IF stall IF ID
    i+2             stall stall IF
    i+3             stall stall

Handling a not-taken conditional branch:

    Not-taken branch    IF ID EX WB
    i+1                 IF stall ID EX
    i+2                 stall IF ID
    i+3                 stall IF

Summary of the three kinds of control-flow instruction:

    Control flow type          Frequency (per instruction)    Stall (cycles)
    Jump & Call                1%                             1
    Conditional (taken)        15% × 60% = 9%                 2
    Conditional (not taken)    15% × 40% = 6%                 1

    Pipeline stall_real = (1 × 1%) + (2 × 9%) + (1 × 6%) = 0.25
    Pipeline speedup_real = 4 / (1 + 0.25) = 3.2
    Speedup without control hazards = 4 / 3.2 = 1.25, that is, 25% faster.
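The A.3 arithmetic can be checked with a few lines of code. This is a minimal sketch using the frequencies and stall counts from the summary table; the dictionary keys and variable names are illustrative only.

```python
# (frequency per instruction, stall cycles) for each control-flow case in A.3
cases = {
    "jump_or_call": (0.01, 1),
    "cond_taken": (0.15 * 0.60, 2),
    "cond_not_taken": (0.15 * 0.40, 1),
}
depth = 4  # four-deep pipeline

# Average stall cycles per instruction caused by control hazards.
stalls = sum(freq * penalty for freq, penalty in cases.values())
speedup_real = depth / (1 + stalls)
speedup_ideal = depth / (1 + 0)
gain = speedup_ideal / speedup_real  # how much faster without branch hazards

print(round(stalls, 4), round(speedup_real, 4), round(gain, 4))
```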

A.4 A reduced hardware implementation of the classic five-stage RISC pipeline might use the EX stage hardware to perform a branch instruction comparison and then not actually deliver the branch target PC to the IF stage until the clock cycle in which the branch instruction reaches the MEM stage. Control hazard stalls can be reduced by resolving branch instructions in ID, but improving performance in one respect may reduce performance in other circumstances. How does determining branch outcome in the ID stage have the potential to increase data hazard stall cycles?

Answer. Stage sequences over clock cycles 1 to 7 (s = stall).

Register value computed, comparison performed in EX stage:

    cmpt      IF ID EX MEM WB
    branch    IF ID EX MEM WB
    target    s s IF ID EX

Register value computed, comparison performed in ID stage (the branch needs the register value in ID):

    cmpt      IF ID EX MEM WB
    branch    IF ID(s) ID EX MEM WB
    target    s IF ID EX

Register value loaded, comparison performed in EX stage:

    cmpt      IF ID EX MEM WB
    branch    IF ID s EX MEM WB
    target    s s s IF ID

Register value loaded, comparison performed in ID stage:

    cmpt      IF ID EX MEM WB
    branch    IF s s ID EX MEM
    target    s s IF ID

Instruction-Level Parallelism

3.2 Consider the following four MIPS code fragments, each containing two instructions:

    i.   DADDI R1,R1,#4
         LD    R2,7(R1)
    ii.  DADD  R3,R1,R2
         SD    R2,7(R1)
    iii. SD    R2,7(R1)
         SD    F2,200(R7)
    iv.  BEZ   R1,place
         SD    R1,7(R1)

a. For each fragment (i) to (iv) identify each type of dependence that exists or that may exist (a fragment may have no dependence) and describe what data flow, name reuse, or control structure causes or would cause the dependence. For a dependence that may exist, describe the source of the ambiguity and identify the time at which that uncertainty is resolved.
b. For each code fragment, discuss whether dynamic scheduling is, may be, or is not sufficient to allow out-of-order execution of the fragment.

Answer.

    Fragment i (DADDI R1,R1,4 / LD R2,7(R1))
      Data dependence: true dependence on R1.
      Dynamic scheduling sufficient for out-of-order execution? No. Changing the instruction order will break program semantics.
    Fragment ii (DADD R3,R1,R2 / SD R2,7(R1))
      Data dependence: none.
      Dynamic scheduling sufficient for out-of-order execution? Yes.
    Fragment iii (SD R2,7(R1) / SD F2,200(R7))
      Data dependence: an output dependence may exist.
      Dynamic scheduling sufficient for out-of-order execution? Maybe. If the hardware computes the effective addresses early enough, then the store order may be exchanged.
    Fragment iv (BEZ R1,place / SD R1,7(R1))
      Data dependence: none.
      Dynamic scheduling sufficient for out-of-order execution? No. Changing the instruction order is speculative until the branch is resolved.

3.5 List all the dependences (output, anti, and true) in the following code fragment. Indicate whether the true dependences are loop carried or not. Show why the loop is not parallel.

    for (i=2;i

... 40%

    Branch                 PB1  PB2  PB1  PB2  PB1  PB2  PB1  PB2
    Prediction             NT   T    NT   NT   T    T    NT   NT
    Outcome                T    NT   NT   T    T    NT   NT   T
    Correct prediction?    no   no   yes  no   yes  no   yes  no

    Branch                 PB1  PB1  PB1  PB1
    Prediction             NT   T    NT   T
    Outcome                T    NT   T    NT
    Correct prediction?    no   no   no   no

Answer b. Prediction accuracy decreases: 100% -> 0%

    Branch                 PB1  PB2  PB1  PB2  PB1  PB2  PB1  PB2
    Prediction             NT   T    NT   T    NT   T    NT   T
    Outcome                T    NT   T    NT   T    NT   T    NT
    Correct prediction?    no   no   no   no   no   no   no   no

    Branch                 PB1  PB1  PB1  PB1
    Prediction             NT   T    T    T
    Outcome                T    T    T    T
    Correct prediction?    no   yes  yes  yes

3.14 Suppose we have a deeply pipelined processor, for which we implement a branch-target buffer for the conditional branches only. Assume that the misprediction penalty is always 4 cycles and the buffer miss penalty is always 3 cycles. Assume 90% hit rate and 90% accuracy and 15% branch frequency.

Q. How much faster is the processor with the branch-target buffer versus a processor that has a fixed 2-cycle branch penalty? Assume a base CPI without branch stalls of 1.

Branch-target buffer (BTB): the address of the branch indexes the buffer to get the prediction and the branch target address (if taken).

Answer.

    CPI_BTB:  system with a branch-target buffer
    CPI_NBTB: system without a branch-target buffer

    Speedup = CPI_NBTB / CPI_BTB = (CPI_base + Stall_NBTB) / (CPI_base + Stall_BTB)
    CPI_base = 1 (from the exercise statement)
    Stall = sum over stall cases s of (Frequency_s × Penalty_s)

    BTB result    BTB prediction    Frequency (per instruction)    Penalty (cycles)
    Miss          -                 15% × 10% = 1.5%               3
    Hit           Correct           15% × 90% × 90% = 12.15%       0
    Hit           Incorrect         15% × 90% × 10% = 1.35%        4

    Stall_NBTB = 15% × 2 = 0.3
    Stall_BTB = 1.5% × 3 + 1.35% × 4 = 0.099

    Speedup = (1 + 0.3) / (1 + 0.099) ≈ 1.18 ≈ 1.2, that is, roughly 20% faster.
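The branch-target buffer comparison in 3.14 can be reproduced numerically. A minimal sketch, assuming the hit rate, accuracy, and penalties given in the exercise; `btb_speedup` and its parameter names are hypothetical helpers, not part of any library.

```python
def btb_speedup(branch_freq, hit_rate, accuracy,
                miss_penalty, mispredict_penalty, fixed_penalty, base_cpi=1.0):
    """Return (speedup of the BTB machine over the fixed-penalty machine, BTB stall cycles)."""
    # Stall cycles per instruction with a BTB: buffer misses plus hits predicted wrongly.
    stall_btb = (branch_freq * (1 - hit_rate) * miss_penalty
                 + branch_freq * hit_rate * (1 - accuracy) * mispredict_penalty)
    # Stall cycles per instruction when every branch pays the fixed penalty.
    stall_fixed = branch_freq * fixed_penalty
    return (base_cpi + stall_fixed) / (base_cpi + stall_btb), stall_btb

speedup, stall_btb = btb_speedup(0.15, 0.90, 0.90, 3, 4, 2)
print(round(stall_btb, 3), round(speedup, 3))
```

Carrying the unrounded frequencies through gives a BTB stall of about 0.099 cycles per instruction and a speedup a little under 1.2.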

3.20 When an instruction is correctly speculated, what is the effect on the three factors comprising the CPU time formula: dynamic instruction count, average clocks per instruction, and clock cycle time? When speculation is incorrect, it is possible for CPU time to increase. Which factor of the CPU time formula best models this increase, and why?

Analysis: CPU time = IC × CPI × clock cycle time.

Answer. When speculation is correct, it allows an instruction to execute earlier by reducing or eliminating stalls that would occur if execution were delayed until the instruction was no longer speculative. Early execution of a required instruction has no effect on instruction count or clock cycle time. The reduction in stall cycles improves CPI.

When speculation is incorrect, instructions that are not on the path of execution are executed and their results ignored. There is no effect on clock cycle time, but the dynamic instruction count increases. The mix of instructions executed may change and lead to a minor effect on CPI, but the majority of the increase in CPU time will be due to the cycles spent on incorrectly speculated instructions, which is best modeled as an increase in IC.

Memory System

4.6 In systems with a write-through L1 cache backed by a write-back L2 cache instead of main memory, a merging write buffer can be simplified.

Q. Explain how this can be done.

Merging write buffer:
- If the write buffer is empty, the data and the full address are written in the buffer.
- Write merging: if the write buffer contains other modified blocks, the addresses can be checked to see if the address of this new data matches the address of a valid write buffer entry. If so, the new data are combined with that entry.
- If the buffer is full and there is no address match, the cache (and CPU) must wait until the buffer has an empty entry.

Answer. The merging write buffer links the CPU to the write-back L2 cache. Two CPU writes cannot merge if they are to different sets in L2. So, for each new entry into the buffer, a quick check on only those address bits that determine the L2 set number need be performed at first. If there is no match in this "screening" test, then the new entry is not merged. If there is a set number match, then all address bits can be checked for a definitive result.

4.8 A 16×16 matrix must support conflict-free access by row, by column, by diagonal, and by anti-diagonal within one memory cycle. What is the minimum number of memory banks needed? Give the bank in which each matrix element is stored.

Answer. For an N×N array, let the address distance between adjacent elements in the same column be δ1 and between adjacent elements in the same row be δ2. Choose the number of banks m = 2^(2p) + 1. A sufficient condition for conflict-free access is δ1 = 2^p and δ2 = 1. In this problem m = 17, p = 2, δ1 = 4, δ2 = 1.
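The sufficient condition in 4.8 can be checked by brute force. The sketch below assumes the array starts in bank 0, so that element (i, j) lands in bank (δ1·i + δ2·j) mod m; that placement formula and the names `bank` and `conflict_free` are assumptions made for illustration.

```python
def bank(i: int, j: int, m: int, d1: int, d2: int) -> int:
    """Bank holding element (i, j), assuming the array starts in bank 0."""
    return (d1 * i + d2 * j) % m

def conflict_free(n: int, m: int, d1: int, d2: int) -> bool:
    """True if every row, column, main diagonal and anti-diagonal hits n distinct banks."""
    patterns = []
    for k in range(n):
        patterns.append([(k, j) for j in range(n)])      # row k
        patterns.append([(i, k) for i in range(n)])      # column k
    patterns.append([(i, i) for i in range(n)])           # main diagonal
    patterns.append([(i, n - 1 - i) for i in range(n)])   # anti-diagonal
    return all(len({bank(i, j, m, d1, d2) for i, j in p}) == n for p in patterns)

print(conflict_free(16, 17, 4, 1))  # 17 banks, as in the answer
print(conflict_free(16, 16, 4, 1))  # 16 banks with the same distances
```

Under this placement the 17-bank layout passes all four access patterns, while 16 banks with the same distances fail because elements of a column collide.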

Storage layout for m = 17, δ1 = 4, δ2 = 1 (banks numbered 1 to 17): elements 00, 01, 02, ..., 0,15 are followed by 10, 11, ..., then 20, 21, ...

4.19 Consider a cache using set-associative mapping. Main memory consists of 8 blocks, B0 to B7. The cache has 2 sets of 2 blocks each, each block is 16 bytes, and the LRU replacement algorithm is used.
During program execution, the block address stream accessing this cache is: B6, B2, B4, B1, B4, B6, B3, B0, B4, B5, B7, B3.
The cache blocks are C0, C1, C2, C3. Main-memory blocks B0 to B3 form region 0 and B4 to B7 form region 1.

1. Give the main-memory address format and the length of each field.
2. Give the cache address format and the length of each field.
3. With the cache blocks numbered C0, C1, C2, C3, list the cache block address stream during program execution.
4. With LRU replacement, compute the cache block hit rate.
5. With FIFO replacement, compute the cache block hit rate.

Answers.

1. Main-memory address: region number (1 bit) | set number (1 bit) | block number (1 bit) | offset within block (4 bits).
2. Cache address: set number (1 bit) | block number (1 bit) | offset within block (4 bits).
3. Cache block address stream:

    Memory block    B6  B2  B4  B1  B4  B6  B3  B0  B4  B5  B7  B3
    Cache block     C2  C3  C0  C1  C0  C2  C3  C1  C0  C1  C2  C3

4. LRU replacement (* marks a block held over without being accessed in that step):

    Access    B6   B2   B4   B1   B4   B6   B3   B0   B4   B5   B7   B3
    C0                  B4   B4*  B4   B4*  B4*  B4*  B4   B4*  B4*  B4*
    C1                       B1   B1*  B1*  B1*  B0   B0*  B5   B5*  B5*
    C2        B6   B6*  B6*  B6*  B6*  B6   B6*  B6*  B6*  B6*  B7   B7*
    C3             B2   B2*  B2*  B2*  B2*  B3   B3*  B3*  B3*  B3*  B3

   Hit rate: 4/12 = 1/3 = 33.3%

5. FIFO replacement:

    Access    B6   B2   B4   B1   B4   B6   B3   B0   B4   B5   B7   B3
    C0                  B4   B4*  B4   B4*  B4*  B0   B0*  B5   B5*  B5*
    C1                       B1   B1*  B1*  B1*  B1*  B4   B4*  B4*  B4*
    C2        B6   B6*  B6*  B6*  B6*  B6   B3   B3*  B3*  B3*  B3*  B3
    C3             B2   B2*  B2*  B2*  B2*  B2*  B2*  B2*  B2*  B7   B7*

   Hit rate: 3/12 = 1/4 = 25%
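The LRU and FIFO hit rates for 4.19 can be reproduced with a short simulation. A minimal sketch, assuming blocks are identified by their index (B6 is 6) and that the set number is the middle bit of the block index, which matches the address format above; `simulate` is a hypothetical helper name.

```python
from collections import OrderedDict

def simulate(stream, policy, num_sets=2, ways=2):
    """Count hits for a set-associative cache with LRU or FIFO replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for block in stream:
        # B0, B1, B4, B5 map to set 0; B2, B3, B6, B7 map to set 1.
        s = sets[(block // ways) % num_sets]
        if block in s:
            hits += 1
            if policy == "LRU":
                s.move_to_end(block)   # refresh recency; FIFO keeps load order
        else:
            if len(s) == ways:
                s.popitem(last=False)  # evict the oldest entry in the ordering
            s[block] = True
    return hits

stream = [6, 2, 4, 1, 4, 6, 3, 0, 4, 5, 7, 3]
lru_hits = simulate(stream, "LRU")
fifo_hits = simulate(stream, "FIFO")
print(lru_hits, fifo_hits)  # 4 3
```

The only difference between the two policies is whether a hit moves the block to the most-recent end of the ordering, which is why a single `OrderedDict` per set can serve both.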

4.20 Some memory systems handle TLB misses in software (as an exception), while others use hardware for TLB (translation lookaside buffer) misses.

Qa. What are the trade-offs between the two methods for handling TLB misses?
Qb. Will TLB miss handling in software always be slower than TLB miss handling in hardware? Explain.
Qc. Are there page table structures that would be difficult to handle in hardware, but possible in software? Are there any such structures that would be difficult for software to handle but easy for hardware to manage?
Qd. Use the data from Figure 5.45 to calculate the penalty to CPI for TLB misses on the following workloads, assuming hardware TLB handlers require 10 cycles per miss and software TLB handlers take 30 cycles per miss: (50% gcc, 25% perl, 25% ijpeg), (30% swim, 30% wave5, 20% hydro2d, 10% gcc).
Qe. Are the TLB miss times in part (d) realistic? Discuss.
Qf. Why are TLB miss rates for floating-point programs generally higher than those for integer programs?

Answer a. Software is slower because of the overhead of a context switch to the handler code, but the replacement algorithm can be more sophisticated than in hardware, and a wider variety of virtual memory organizations can be readily accommodated. Hardware is faster but less flexible.

Answer b. Factors affecting the handling time include:
- Is the page table itself paged?
- A more efficient page-table search algorithm (software)
- TLB entry prefetching (hardware)

Answer c. Page table structu
