Chapter 1: Fundamentals of Computer Architecture (course slides)
Computer Architecture

Undergraduate Course

Weimin Wu (吳為民)
School of Computer and Information Technology, Beijing Jiaotong University
Spring 2014

Contents
1. Fundamentals of Computer Architecture
2. Instruction Set
3. Pipelining
4. Memory Hierarchy
5. Input-Output Subsystem
6. Interconnection Networks
7. Parallel Computers

General course information
1. The course has 48 class hours (24 sessions): 32 hours of lectures (16 sessions) and 16 hours of lab work (8 sessions).
2. Coursework covers attendance, in-class assignments, and lab assignments.
3. There is a final exam. It is open-book, and the exam paper is in English.
4. Grading: coursework 40%, final exam 60%.
5. Try to read the original English text. For passages you cannot follow, consult the translated edition of the textbook or Zhang Chenxi's computer architecture textbook. You can also email me: wmwu@
Important: always write your class number (01 or 03), student ID, and name on homework and lab reports.

1. Fundamentals of Computer Architecture
1.1 Layers of Computer System
1.2 Computer Architecture and Implementation
1.3 The Task of a Computer Designer
1.4 Measuring and Reporting Performance
1.5 Quantitative Principles of Computer Design
1.6 Classification of Computer Architecture

1.1 Layers of Computer Systems

M5  Application Language Machine
M4  High-Level Language Machine
M3  Assembly Language Machine
M2  Operating System Machine
M1  Conventional Machine
M0  Microprogram Machine

* Each layer performs a related subset of functions.
* Each layer relies on the next lower layer to perform more primitive functions. This decomposes the problem into more manageable subproblems.
* Layers M2 through M5 are virtual machines.
* Instructions of the conventional machine (arithmetic, logic, and so on) are implemented by a program at the microprogram level. That program acts as an interpreter that understands a simple set of operations, called the microinstruction set.

1.2 Computer Architecture and Implementation

Computer Architecture

Computer architecture refers to those attributes of a system that are visible to a programmer, or those attributes that have a direct impact on the logical execution of a program.

Implementation
Implementation has two components: organization and hardware.
* Organization includes the high-level aspects of a computer's design, such as the memory system, the bus structure, and the internal CPU design.
* Hardware refers to the specifics of a machine, including the detailed logic design and the packaging technology.

Architectural attributes:
* instruction set
* I/O mechanisms
* techniques for addressing memory
* number of bits used to represent various data types (numbers, characters)

Hardware attributes:
* packaging technology
* power
* cooling

Organizational attributes are hardware details that are transparent to the programmer, such as:
* control signals
* computer/peripheral interfaces
* memory technology

Architectural design issue: whether a computer will have a multiply instruction.

Organizational issue: whether that instruction will be implemented by a special multiply unit or by repeated use of the add unit.

The decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit.

1.3 The Task of a Computer Designer

The task is a complex one:

* Determine what attributes are important for a new machine.
* Design a machine to maximize performance while staying within cost and power constraints, including:
  * instruction set design
  * functional organization
  * logic design
  * implementation: IC design, packaging, cooling

[Table not recoverable from the source: functional requirements and the typical features required or supported.]

Supplementary material: milestones in the development of the integrated circuit industry

1947: Bardeen, Brattain, and Shockley at Bell Labs invented the transistor. They shared the 1956 Nobel Prize in Physics. The transistor is the foundation of the IC industry.
1952: SONY developed the first transistor-based radio.
1958: Kilby at TI invented the first integrated circuit (IC), for which he received the 2000 Nobel Prize in Physics. Noyce refined it and made it practical.
1965: Moore made his prediction about IC development: Moore's Law.

[Photo: Gordon Moore, Intel Co-Founder and Chairman Emeritus. Image source: Intel Corporation]

Moore's Law: transistor density doubles every 18-24 months, and performance doubles every 18-24 months. History has borne this out so far, but will it continue? There are physical limits and economic limits. One example is the lithography process, which produces lithographic distortion that must be corrected (OPC).

Milestones in the development of the integrated circuit industry (continued):
1968: Noyce and Moore founded Intel.
1970: Intel developed 1K DRAM.
1971: Intel developed the 4-bit 4004 microprocessor (2250 transistors).
1976/81: Apple II / IBM PC.
1984: Xilinx invented the FPGA.
1985: Intel began to concentrate on microprocessor products.
1987: TSMC was founded, the world's largest dedicated chip manufacturing service company.
1991: ARM developed its first embeddable RISC IP core (chipless IC design).
1996: Samsung developed 1G DRAM.
1998: IBM developed an experimental 1 GHz microprocessor.
1999 or earlier: System-on-Chip (SoC) applications.
2002 or earlier: System-in-Package (SiP) technology.

1.4 Measuring and Reporting Performance

What does "faster" mean?
* The user may say a computer is faster when a program runs in less time.
* The computer center manager may say a computer is faster when it completes more jobs in an hour.
* The computer user is interested in reducing response time, the time between the start and the completion of an event, also referred to as execution time.
* The manager of a data processing center may be interested in increasing throughput, the total amount of work done in a given time.

Comparing design alternatives:
* "X is faster than Y" means that the response time is lower on X than on Y.
* "X is n times faster than Y" means:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

* Since execution time is the reciprocal of performance:

$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

Measuring Performance

Even execution time can be defined in different ways:
* Wall-clock time, response time, or elapsed time is the latency to complete a task, including disk accesses, memory accesses, input/output activities, and operating system overhead. This is the most straightforward definition.
* With multiprogramming, the CPU works on another program while waiting for I/O and may not necessarily minimize the elapsed time of one program. Hence we need a term that takes this activity into account.
* CPU time means the time the CPU is computing, not including the time spent waiting for I/O or running other programs.
* CPU time can be further divided into:
  * user CPU time: the CPU time spent in the program;
  * system CPU time: the CPU time spent in the operating system performing tasks requested by the program.
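The difference between elapsed time and CPU time can be observed on a running process. The following is a minimal sketch in Python, assuming the standard-library calls `time.perf_counter` (wall-clock time) and `os.times` (cumulative user and system CPU time of the current process). The helper names `measure` and `times_faster` and the two sample workloads are illustrative and are not part of the course material.

```python
import os
import time


def measure(task):
    """Run task() once; return elapsed, user CPU and system CPU time in seconds."""
    cpu_before = os.times()             # cumulative CPU times of this process
    wall_before = time.perf_counter()   # monotonic wall-clock timer
    task()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "elapsed": wall_after - wall_before,
        "user": cpu_after.user - cpu_before.user,
        "system": cpu_after.system - cpu_before.system,
    }


def times_faster(time_x, time_y):
    """Return n such that X is n times faster than Y (n = time_y / time_x)."""
    return time_y / time_x


if __name__ == "__main__":
    # A CPU-bound task: elapsed time and user CPU time should be close.
    busy = measure(lambda: sum(i * i for i in range(200_000)))
    # A waiting task: elapsed time grows, CPU time stays near zero.
    idle = measure(lambda: time.sleep(0.2))
    print("CPU-bound:", busy)
    print("sleeping: ", idle)
```

For the sleeping task the elapsed time is dominated by waiting, which is exactly the part that CPU time excludes.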

Choosing Programs to Evaluate Performance

Four levels of programs are listed below in decreasing order of accuracy of prediction.

1. Real applications
* Examples are compilers for C, text-processing software like Word, and other applications like Photoshop.
* Real applications have input, output, and options that a user can select when running the program.

2. Kernels
* Small, key pieces are extracted from real programs and used to evaluate performance.
* Unlike real programs, no user would run kernel programs, for they exist solely to evaluate performance.
* Kernels are best used to isolate the performance of individual features of a machine, in order to explain the reasons for differences in the performance of real programs.

3. Toy benchmarks
* Toy benchmarks are typically between 10 and 100 lines of code and produce a result the user already knows.
* Programs like Puzzle and Quicksort are popular because they are small, easy to type, and run on almost any computer.
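To make the idea of a toy benchmark concrete, here is a small sketch in Python. It is only an illustration under stated assumptions: the function names, the input size, and the fixed random seed are hypothetical choices, and the quicksort shown is a simple list-based variant rather than any particular published benchmark.

```python
import random
import time


def quicksort(items):
    """Return a new sorted list using a simple (not in-place) quicksort."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def run_toy_benchmark(n=20_000, seed=1):
    """Sort n pseudo-random integers; return (elapsed seconds, result is correct)."""
    rng = random.Random(seed)            # fixed seed: every run sorts the same input
    data = [rng.randrange(n) for _ in range(n)]
    start = time.perf_counter()
    result = quicksort(data)
    elapsed = time.perf_counter() - start
    # The expected answer is known in advance, as is typical for toy benchmarks.
    return elapsed, result == sorted(data)


if __name__ == "__main__":
    seconds, correct = run_toy_benchmark()
    print(f"quicksort toy benchmark: {seconds:.4f} s, correct={correct}")
```

A program this small is easy to port, but its timing says little about how a machine behaves on real applications.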

選擇程序來(lái)評(píng)估性能4.Syntheticbenchmarks

合成測(cè)試基準(zhǔn)*Similarinphilosophytokernels,syntheticbenchmarkstrytomatchtheaveragefrequencyofoperationsandoperandsofalargesetofprograms.匹配程序中操作和操作數(shù)的平均頻率*Nouserrunssyntheticbenchmarks,becausetheydon’tcomputeanythingausercouldwant.1.4MeasuringandReportingPerformance,cont’dChoosingProgramstoEvaluatePerformance

選擇程序來(lái)評(píng)估性能puttogethercollectionsofbenchmarkstomeasuretheperformanceofprocessorswithavarietyofapplications.是一個(gè)有各種應(yīng)用的組合Akeyadvantageofsuchsuitesisthattheweaknessofonebenchmarkislessenedbythepresenceofotherbenchmarks.優(yōu)劣互補(bǔ)Benchmarksuitsaremadeofcollectionsofprograms,someofwhichmaybekernels,butmanyofwhicharetypicallyrealprograms.有些是核心程序,但很多是真實(shí)程序BenchmarkSuites測(cè)試基準(zhǔn)程序1.4MeasuringandReportingPerformance,cont’dTheguidingprincipleofreportingperformancemeasurementsshouldbereproducibility

(可重現(xiàn)性).requiresafairlycompletedescriptionofthemachine,thecompilerflags,aswellasthepublicationofboththebaselineandoptimizedresults.要求完整的描述containstheactualperformancetimes,shownbothintabularformandasagraph.

包含實(shí)際性能,并用表或圖的形式表示ReportingPerformanceResults報(bào)告性能結(jié)果1.4MeasuringandReportingPerformance,cont’dComparingandSummarizingPerformance

Battles are fought over what is the fair way to summarize the relative performance of a collection of programs. For example, two articles on summarizing performance in the same journal took opposing points of view. Figure 1.5, taken from one article, is an example of the confusion that can arise. [Figure 1.5 is not reproduced in this text.] The following statements hold:
* A is 10 times faster than B for program P1.
* B is 10 times faster than A for program P2.
* A is 20 times faster than C for program P1.
* C is 50 times faster than A for program P2.
* B is 2 times faster than C for program P1.
* C is 5 times faster than B for program P2.

Taken together, the relative performance of A, B, and C is unclear.

Total Execution Time: A Consistent Summary Measure

Using the total execution time of P1 and P2:
* B is 9.1 times faster than A.
* C is 25 times faster than A.
* C is 2.75 times faster than B.

This summary tracks execution time, our final measure of performance. If the workload consisted of running programs P1 and P2 an equal number of times, the statements above would predict the relative execution times.

An average of the execution times is the arithmetic mean:

$$\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$

where $\text{Time}_i$ is the execution time for the $i$th program of a total of $n$ in the workload.

Weighted Execution Time

Are programs P1 and P2 in fact run equally in the workload? If not, then one approach is to assign a weighting factor $w_i$ to each program to indicate the relative frequency of the program in the workload. This is called the weighted arithmetic mean:

$$\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i$$

where $\text{Weight}_i$ is the frequency of the $i$th program in the workload and $\text{Time}_i$ is the execution time of that program.

Figure 1.6 shows the data from Figure 1.5 with three different weightings, each proportional to the execution time of a workload with a given mix. [Figure 1.6 is not reproduced in this text.]
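The total, arithmetic-mean, and weighted-mean summaries above can be checked numerically. The sketch below is in Python and rests on an assumption: because the figure with the measured data is missing from this text, the execution times in `TIMES` are values chosen to agree with every ratio quoted above (P1: 1, 10, 20 seconds and P2: 1000, 100, 20 seconds on A, B, C). The 90/10 workload mix at the end is purely hypothetical.

```python
# Execution times in seconds for programs [P1, P2] on each machine.
# Assumed values, chosen to match every ratio quoted in the text.
TIMES = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [20.0, 20.0]}


def total_time(times):
    """Total execution time of all programs in the workload."""
    return sum(times)


def arithmetic_mean(times):
    """Unweighted average execution time."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Sum of weight_i * time_i; the weights are frequencies that sum to 1."""
    if len(times) != len(weights):
        raise ValueError("need one weight per program")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


if __name__ == "__main__":
    for name, times in TIMES.items():
        print(name, "total:", total_time(times), "mean:", arithmetic_mean(times))
    print("B vs A:", total_time(TIMES["A"]) / total_time(TIMES["B"]))
    print("C vs A:", total_time(TIMES["A"]) / total_time(TIMES["C"]))
    print("C vs B:", total_time(TIMES["B"]) / total_time(TIMES["C"]))
    # Hypothetical mix: P1 accounts for 90% of the runs, P2 for 10%.
    for name, times in TIMES.items():
        print(name, "weighted:", weighted_arithmetic_mean(times, [0.9, 0.1]))
```

With equal weights the weighted mean reduces to the plain arithmetic mean, while a mix that favors P1 makes machine A look much better than the total-time summary suggests.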

Normalized Execution Time and the Pros and Cons of Geometric Means

A second approach to an unequal mixture of programs is to normalize execution times to a reference machine and take the average of the normalized execution times. The performance of a new program can then be predicted by simply multiplying this number by its performance on the reference machine.

Average normalized execution time can be expressed as either an arithmetic or a geometric mean. The formula for the geometric mean is

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

where $\text{Execution time ratio}_i$ is the execution time, normalized to the reference machine, for the $i$th program of a total of $n$ in the workload.
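The geometric mean formula can be tried on a small data set. The sketch below is in Python and uses assumed execution times (P1: 1, 10, 20 seconds and P2: 1000, 100, 20 seconds on machines A, B, C), chosen to agree with the ratios quoted earlier in this section because the original figure is not available here. The function names are illustrative. For each choice of reference machine it computes the arithmetic and the geometric mean of the normalized times.

```python
import math

# Assumed execution times in seconds for programs [P1, P2] on each machine.
TIMES = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [20.0, 20.0]}


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized(times, reference):
    """Execution time ratios of one machine relative to a reference machine."""
    return [t / r for t, r in zip(times, reference)]


def summarize(reference_name):
    """Map machine name -> (arithmetic mean, geometric mean) of normalized times."""
    reference = TIMES[reference_name]
    rows = {}
    for name, times in TIMES.items():
        ratios = normalized(times, reference)
        rows[name] = (sum(ratios) / len(ratios), geometric_mean(ratios))
    return rows


if __name__ == "__main__":
    for ref in TIMES:
        print(f"normalized to {ref}:")
        for name, (am, gm) in summarize(ref).items():
            print(f"  {name}: arithmetic={am:.2f} geometric={gm:.2f}")
```

Running it for different reference machines shows that the ratios between the geometric means stay the same, whereas the ranking given by the arithmetic means changes with the reference.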

Geometric means have a nice property for two samples $X_i$ and $Y_i$:

$$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$

That is, the ratio of the geometric means is equal to the geometric mean of the ratios.

In contrast to arithmetic means, geometric means of normalized execution times are consistent no matter which machine is the reference. Hence, the arithmetic mean should not be used to average normalized execution times.

Figure 1.7 shows some variations using both arithmetic and geometric means, with the execution times from Figure 1.5 normalized to each machine. [Figure 1.7 is not reproduced in this text.]

The arithmetic mean performance varies depending on which is the reference machine:
* In column 2, B's execution time is five times longer than A's, although the reverse is true in column 4.
* In column 3, C is slowest, but in column 9, C is fastest.

The geometric means are independent of normalization:
* A and B have the same performance, and the execution time of C is 0.63 of A or B (1/1.58 is 0.63).
* Unfortunately, the total execution time of A is 10 times longer than that of B, and B in turn is about 3 times longer than C.
* As a point of interest, the relationship between the means of the same set of numbers is always: geometric mean ≤ arithmetic mean.

Advantage: the geometric mean is independent of the running times of the individual programs, and it doesn't matter which machine is used to normalize.

Drawback: geometric means violate our fundamental principle of performance measurement, because they do not predict execution time.

1.5 Quantitative Principles of Computer Design

Make the Common Case Fast

Perhaps the most important and pervasive principle of computer design is to make the common case fast. In making a design tradeoff, favor the frequent case over the infrequent case. This principle also applies when determining how to spend resources.

1.5 Quantitative Principles of Computer Design, cont'd

Amdahl's Law

The performance gain obtained by improving some portion of a computer can be calculated using Amdahl's Law. Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Amdahl's Law defines the speedup that can be gained by using a particular feature. Speedup is the ratio

$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$$

or, equivalently,

$$\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$

Amdahl's Law gives us a quick way to find the speedup from some enhancement, $\text{Speedup}_{\text{overall}}$, which depends on two factors:
1. The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement: $\text{Fraction}_{\text{enhanced}} \le 1$.
2. The improvement gained by the enhanced execution mode: $\text{Speedup}_{\text{enhanced}} > 1$.

The new execution time is

$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left( (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$$

The overall speedup is the ratio of the execution times:

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

EXAMPLE: Suppose that we are considering an enhancement that runs 10 times faster than the original machine, but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement?

ANSWER: With $\text{Fraction}_{\text{enhanced}} = 0.4$ and $\text{Speedup}_{\text{enhanced}} = 10$:

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$

Amdahl's Law expresses the law of diminishing returns: the incremental improvement in speedup gained by an additional improvement in just a portion of the computation diminishes as improvements are added.

An important corollary of Amdahl's Law is that if an enhancement is only usable for a fraction of a task, we can't speed up the task by more than the reciprocal of 1 minus that fraction.
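Amdahl's Law and its corollary are easy to explore numerically. The following is a small Python sketch under the definitions given above. The function names `amdahl_speedup` and `speedup_upper_bound` are illustrative, and the list of enhancement factors in the loop is an arbitrary choice made to show the diminishing returns.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the original time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def speedup_upper_bound(fraction_enhanced):
    """Limit of the overall speedup as speedup_enhanced grows without bound."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # The example from the text: usable 40% of the time, 10 times faster.
    print("overall speedup:", round(amdahl_speedup(0.4, 10), 2))
    # Diminishing returns: making the enhanced mode ever faster helps less and less.
    for factor in (2, 10, 100, 1000):
        print(factor, round(amdahl_speedup(0.4, factor), 3))
    print("upper bound:", round(speedup_upper_bound(0.4), 3))
```

However large the enhancement factor becomes, the overall speedup for a 40% fraction stays below 1/(1 - 0.4), which is the corollary stated above.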

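EXAMPLE: Implementations of floating-point square root (FPSQR) vary significantly in performance. Suppose FPSQR is responsible for 20% of the execution time of a critical benchmark. One proposal is to add FPSQR hardware that will speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions run faster; FP instructions are responsible for a total of 50% of the execution time. The design team believes that they can make all FP instructions run two times faster with the same effort as required for the fast square root. Compare these two design alternatives.

ANSWER: Comparing the speedups:

$$\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22$$

$$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{2.0}} = \frac{1}{0.75} \approx 1.33$$

Improving the performance of the FP operations overall is slightly better because of the higher frequency.

The CPU Performance Equation

Essentially all computers are constructed using a clock running at a constant rate. These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles. Computer designers refer to the time of a clock period by its duration (e.g., 1 ns) or by its rate (e.g., 1 GHz). CPU time for a program can then be expressed in two ways:

$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

or

$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$

We can also count the number of instructions executed: the instruction path length or instruction count (IC). If we know the number of clock cycles and the instruction count, we can calculate the average number of clock cycles per instruction (CPI):

$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$

This allows us to use CPI in the execution time formula:

$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$

or

$$\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$

Expanding the first formula into its units of measurement:

$$\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time}$$

So CPU performance depends on the clock cycle (or rate), the CPI, and the IC. But it is difficult to change one parameter in isolation from the others, because the basic technologies involved are interdependent:
* Clock cycle time: hardware technology and organization
* CPI: organization and ISA
* Instruction count: ISA and compiler technology

Luckily, many improvement techniques primarily improve one component with small or predictable impacts on the other two.

Sometimes it is useful in designing the CPU to calculate the total number of CPU clock cycles as:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i$$

where $\text{IC}_i$ represents the number of times instruction $i$ is executed in a program and $\text{CPI}_i$ represents the average number of clock cycles for instruction $i$. This form can be used to express CPU time as:

$$\text{CPU time} = \left( \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \right) \times \text{Clock cycle time}$$

and CPI as:

$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{IC}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{IC}} \times \text{CPI}_i$$

EXAMPLE: Suppose we have the following measurements:
* Frequency of FP operations = 25%
* Average CPI of FP operations = 4.0
* Average CPI of other instructions = 1.33
* Frequency of FPSQR = 2%
* CPI of FPSQR = 20

Assume that the two design alternatives are to reduce the CPI of FPSQR to 2 or to reduce the average CPI of all FP operations to 2. Compare these two design alternatives using the CPU performance equation.

ANSWER: First, observe that only the CPI changes; the clock rate and instruction count remain identical. The original CPI is:

$$\text{CPI}_{\text{original}} = (4 \times 25\%) + (1.33 \times 75\%) = 2.0$$

We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:

$$\text{CPI}_{\text{new FPSQR}} = \text{CPI}_{\text{original}} - 2\% \times (\text{CPI}_{\text{old FPSQR}} - \text{CPI}_{\text{new FPSQR only}}) = 2.0 - 2\% \times (20 - 2) = 1.64$$

We compute the CPI for the enhancement of all FP instructions:

$$\text{CPI}_{\text{new FP}} = (75\% \times 1.33) + (25\% \times 2.0) = 1.5$$

Since the CPI of the overall FP enhancement is lower, its performance will be better. Specifically, the speedup for the overall FP enhancement is:

$$\text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{original}}}{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{new FP}}} = \frac{2.0}{1.5} = 1.33$$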
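The CPI example can be reproduced with a few lines of Python. This is a sketch under the measurements listed in the example. The helper names `cpu_time` and `average_cpi` are illustrative, and the instruction count and clock cycle time used for the speedup are arbitrary placeholders, since they cancel out when only the CPI changes.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time


def average_cpi(mix):
    """mix: (fraction of instructions, CPI) pairs whose fractions sum to 1."""
    return sum(fraction * cpi for fraction, cpi in mix)


# Original machine: 25% FP operations at CPI 4.0, 75% other instructions at CPI 1.33.
CPI_ORIGINAL = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Alternative 1: only FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
CPI_NEW_FPSQR = CPI_ORIGINAL - 0.02 * (20.0 - 2.0)

# Alternative 2: every FP operation (25% of instructions) averages CPI 2.
CPI_NEW_FP = average_cpi([(0.25, 2.0), (0.75, 1.33)])

if __name__ == "__main__":
    ic, cycle = 1_000_000, 1e-9   # placeholder values; they cancel in the ratio
    speedup_fp = cpu_time(ic, CPI_ORIGINAL, cycle) / cpu_time(ic, CPI_NEW_FP, cycle)
    print("original CPI:  ", round(CPI_ORIGINAL, 2))
    print("new FPSQR CPI: ", round(CPI_NEW_FPSQR, 2))
    print("new FP CPI:    ", round(CPI_NEW_FP, 2))
    print("speedup new FP:", round(speedup_fp, 2))
```

The unrounded values differ slightly from the rounded figures in the worked answer because 1.33 is itself a rounded CPI.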

Measuring the Components of CPU Performance

To use the CPU performance equation, we need measurements of the individual components.

Determining the clock cycle:
* It is easy for an existing CPU.
* For a completed design, low-level tools called timing estimators or timing verifiers are used.
* For a design that is not completed, the clock cycle is estimated by examining critical paths.

Measuring the instruction count:
* A compiler is used together with tools that measure the instruction set behavior.
* For a compiled version of a program, there are two major methods to obtain IC:
  * The first uses an instruction set simulator that interprets the instructions. It is slow but can measure almost any aspect of instruction set behavior accurately.
  * The second uses execution-based monitoring. The binary program is modified to include instrumentation code. It is very fast, since the program is executed rather than interpreted.

Measuring the CPI is difficult:
* For simple processors, the CPI can be read from a table.
* Modern processors use techniques such as pipelining and memory hierarchies. Designers often use average CPI values, but these average CPIs are computed by measuring the effects of the pipeline and cache structure.
* It is often useful to separate the component arising from the memory system and the component determined by the pipeline. Thus, we can compute the CPI for instruction $i$ as:

$$\text{CPI}_i = \text{Pipeline CPI}_i + \text{Memory system CPI}_i$$

Using the CPU Performance Equations: More Examples

EXAMPLE: We are considering two alternatives for our conditional branch instructions:
* CPU A: A condition code is set by a compare instruction and followed by a branch that tests the condition code.
* CPU B: A compare is included in the branch.

On both CPUs, the conditional branch instruction takes 2 cycles, and all other instructions take 1 clock cycle. On CPU A, 20% of all instructions executed are conditional branches. Since every branch needs a compare, another 20% of the instructions are compares. Because CPU A does not have the compare included in the branch, assume that its clock cycle time is 1.25 times faster than that of CPU B. Which CPU is faster? What if CPU A was only 1.1 times faster?

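ANSWER: We can use the CPU performance formula:

$$\text{CPI}_A = 0.20 \times 2 + 0.80 \times 1 = 1.2$$

$$\text{CPU time}_A = \text{IC}_A \times 1.2 \times \text{Clock cycle time}_A$$

$$\text{Clock cycle time}_B = 1.25 \times \text{Clock cycle time}_A$$

Compares are not executed in CPU B, so 20%/80% = 25% of its instructions are branches:

$$\text{CPI}_B = 0.25 \times 2 + 0.75 \times 1 = 1.25$$

Because $\text{IC}_B = 0.8 \times \text{IC}_A$:

$$\text{CPU time}_B = \text{IC}_B \times 1.25 \times \text{Clock cycle time}_B = 0.8 \times \text{IC}_A \times 1.25 \times (1.25 \times \text{Clock cycle time}_A) = 1.25 \times \text{IC}_A \times \text{Clock cycle time}_A > \text{CPU time}_A$$

So in this case CPU A is faster.

If CPU A were only 1.1 times faster, then $\text{Clock cycle time}_B = 1.10 \times \text{Clock cycle time}_A$, and the performance of CPU B is:

$$\text{CPU time}_B = \text{IC}_B \times \text{CPI}_B \times \text{Clock cycle time}_B = 0.8 \times \text{IC}_A \times 1.25 \times (1.10 \times \text{Clock cycle time}_A) = 1.10 \times \text{IC}_A \times \text{Clock cycle time}_A < \text{CPU time}_A$$

So in this case CPU B is faster. In essence, this is a tradeoff between clock cycle time and instruction count.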
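The branch-design comparison above can be replayed for any clock ratio. The Python sketch below follows the numbers of the example. The function names and the parameter `clock_ratio` (how many times longer CPU B's clock cycle is than CPU A's) are illustrative, and the instruction count and cycle time of CPU A are normalized to 1 as an assumption, because only the ratio of the two CPU times matters.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time


def compare_branch_designs(clock_ratio, ic_a=1.0, cycle_a=1.0):
    """Return (CPU time of A, CPU time of B) for a given clock cycle ratio."""
    # CPU A: 20% branches at 2 cycles, 80% other instructions (incl. compares) at 1 cycle.
    cpi_a = 0.20 * 2 + 0.80 * 1
    # CPU B: the compares disappear, so it executes only 80% of A's instructions,
    # and branches make up 20/80 = 25% of what remains.
    ic_b = 0.80 * ic_a
    cpi_b = 0.25 * 2 + 0.75 * 1
    time_a = cpu_time(ic_a, cpi_a, cycle_a)
    time_b = cpu_time(ic_b, cpi_b, clock_ratio * cycle_a)
    return time_a, time_b


if __name__ == "__main__":
    for ratio in (1.25, 1.10):
        time_a, time_b = compare_branch_designs(ratio)
        winner = "A" if time_a < time_b else "B"
        print(f"clock ratio {ratio}: A={time_a:.3f} B={time_b:.3f} -> CPU {winner} is faster")
```

Under these assumptions the two designs break even when CPU B's clock cycle is 1.2 times that of CPU A, since 0.8 × 1.25 × 1.2 = 1.2.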
