Northeastern University, Principles of Computer Organization: Chapter 8

Digital Design and Computer Architecture, Chapter 8

Chapter 8 :: Topics
- Introduction
- Memory System Performance Analysis
- Caches
- Virtual Memory
- Memory-Mapped I/O
- Summary

Introduction
Computer performance depends on:
- Processor performance
- Memory system performance

[Figure: Memory Interface]

Processor-Memory Gap
In prior chapters, we assumed memory could be accessed in 1 clock cycle, but that hasn't been true since the 1980s.
[Figure: Processor-Memory Gap]

Memory System Challenge
Make the memory system appear as fast as the processor by using a hierarchy of memories.
Ideal memory:
- Fast
- Cheap (inexpensive)
- Large (capacity)
But you can only choose two!

Development
The first computers used vacuum-tube flip-flops for storage. Later storage technologies, in order:
- Mercury delay line
- Magnetic tape
- Magnetic drum
- Magnetic core (from 1951)
- Semiconductor
- Magnetic disk, then optical disc (CD), then nano-scale storage

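[Figures: mercury delay line; magnetic drum; magnetic core]

Development: capacity units
Bit (b), Byte (B), Kilobyte (KB), Megabyte (MB), Gigabyte (GB), Terabyte (TB), Petabyte (PB), Exabyte (EB), Zettabyte (ZB), Yottabyte (YB), Nonabyte (NB), Doggabyte (DB). (Nonabyte and Doggabyte are informal names, not SI prefixes.)

Unit        Conventional decimal value     Memory capacity (binary) value
K (Kilo)    1K = 10^3 = 1000               1K = 2^10 = 1024
M (Mega)    1M = 10^6 = 10^3 K             1M = 2^20 = 2^10 K = 1048576
G (Giga)    1G = 10^9 = 10^3 M             1G = 2^30 = 2^10 M = 1073741824
T (Tera)    1T = 10^12 = 10^3 G            1T = 2^40 = 2^10 G = 1099511627776

Development trends
(1) From a two-level main/auxiliary structure to a multi-level hierarchy
(2) From a single memory bank to multiple interleaved (parallel) banks
(3) Virtual memory

Main Characteristics for Evaluating Primary Memory
Three key characteristics: capacity, speed, and price. The goal is memory that is larger, faster, and cheaper.
Price per bit: P = C/S, where C is the total price of the memory and S is its capacity in bits.
Larger capacity and higher speed come at a higher price.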
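The two value columns of the capacity table differ only in their base: decimal prefixes are powers of 10^3, while memory-capacity prefixes are powers of 2^10. A minimal C sketch makes the gap concrete; the function name `prefix_value` and its prefix numbering (1 = K, 2 = M, 3 = G, 4 = T) are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Value of the k-th capacity prefix (1 = K, 2 = M, 3 = G, 4 = T, ...).
   binary != 0 gives the memory-capacity meaning, 2^(10k);
   binary == 0 gives the conventional decimal meaning, 10^(3k).
   Valid for k <= 6 so that the result fits in 64 bits. */
uint64_t prefix_value(unsigned k, int binary) {
    if (binary)
        return UINT64_C(1) << (10 * k);   /* 2^10, 2^20, 2^30, 2^40, ... */
    uint64_t v = 1;
    for (unsigned i = 0; i < k; i++)
        v *= 1000;                        /* 10^3, 10^6, 10^9, 10^12, ... */
    return v;
}
```

For example, `prefix_value(4, 1)` is 1099511627776, roughly 10% more than `prefix_value(4, 0)`, which is why a "1 TB" drive shows up as less than 1 TB when capacity is reported in binary units.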

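Speed
(1) Memory access time (T_A): for a single read or write, the time from issuing the read signal until the data is available.
(2) Memory cycle time (T_M): the minimum interval between two consecutive read/write operations.
(3) Bandwidth (B_m): the amount of data accessed per unit time,
    B_m = W / T_M  (bits/second)
    where W is the data width of one read/write (generally equal to the memory word length) and T_M is the memory cycle time.
T_M is the most important of the three, because it most comprehensively reflects the working efficiency of the memory.

[Figure: transistor switching times. Turn-on time t_on = t_d + t_r (delay time plus rise time); turn-off time t_off = t_s + t_f (storage time plus fall time); levels marked at 0.1 I_CS and 0.9 I_CS.]

Classification
- By storage medium: magnetic / semiconductor
- By access mode: random / sequential (e.g., tape)
- By read/write function: ROM, RAM
  - RAM: static RAM / dynamic RAM
  - ROM: MROM / PROM / EPROM / EEPROM
- By function: primary / secondary / cache / control memory

Multi-layered Memory
Memory speed lags behind what the CPU needs, and memory capacity lags behind what software needs. To meet the demands for larger capacity, higher speed, and lower cost, a multi-level memory structure is adopted: cache, primary memory, and external memory.

[Figure: multi-level memory. From the CPU outward: general-purpose registers (GPRs); cache (SRAM); main memory (DRAM, SRAM), which together form the host; secondary storage, i.e., online external storage (disks, etc.); and external memory, i.e., offline external storage (tape, optical discs, etc.), which are external devices. Capacity increases away from the CPU; speed increases toward it.]

Memory Hierarchy
[Figure: Memory Hierarchy]
Exploit locality to make memory accesses fast.

Locality
Temporal locality: locality in time
- If data was used recently, it is likely to be used again soon.
- How to exploit: keep recently accessed data in higher levels of the memory hierarchy.
Spatial locality: locality in space
- If data was used recently, nearby data is likely to be used soon.
- How to exploit: when accessing data, bring nearby data into higher levels of the memory hierarchy too.

Cache Organization
[Figure: cache organization. Main memory with its address register (MAR), main-memory-to-cache address mapping, cache address register (CAR), cache cell array, and replacement logic; the CPU exchanges single words with the cache over the address and data buses, while multi-word blocks move between main memory and the cache on a miss.]
Hit: data is found in that level of the memory hierarchy.
Miss: data is not found (must go to the next level).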
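Both kinds of locality show up in ordinary loops. In this C sketch (the array sizes and function names are hypothetical), the two functions compute the same sum, but the row-major version walks memory in address order because C stores two-dimensional arrays row by row.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 8

/* Row-major traversal: consecutive iterations touch adjacent addresses,
   since C stores a[i][0], a[i][1], ... contiguously (spatial locality).
   The accumulator `sum` and the loop indices are reused on every
   iteration (temporal locality). */
long sum_row_major(int a[ROWS][COLS]) {
    long sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal of the same array: consecutive iterations are
   COLS * sizeof(int) bytes apart, so successive accesses can land in
   different cache blocks even though the result is identical. */
long sum_col_major(int a[ROWS][COLS]) {
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```

For large arrays the row-major loop is typically the faster of the two, because it uses every word of each block brought into the cache; with the tiny 4 × 8 array here the difference would not be measurable.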


Memory Performance Example 1
A program has 2,000 loads and stores. 1,250 of these data values are found in the cache; the rest are supplied by other levels of the memory hierarchy. What are the hit and miss rates for the cache?

Hit rate = 1250/2000 = 0.625

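Miss rate = 750/2000 = 0.375 = 1 - hit rate

Effective Access Time
$$T_e = h \times T_{hit} + (1-h) \times T_{miss}$$
where
- h is the cache hit rate;
- T_hit is the time spent accessing memory on a cache hit, i.e., the cache access time T_c;
- T_miss is the time spent accessing memory on a cache miss, i.e., the time to transfer a main memory block to the cache, T_m;
- T_e is the average access time, also written T_a.

Access Efficiency of the Cache, e
$$e = \frac{T_c}{T_a} = \frac{T_c}{h\,T_c + (1-h)\,T_m} = \frac{1}{h + (1-h)\,\dfrac{T_m}{T_c}}$$
Let r = T_m/T_c, the ratio of main memory access time to cache access time (how many times slower main memory is than the cache). Then:
$$e = \frac{1}{h + (1-h)\,r} = \frac{1}{r + (1-r)\,h}$$

Exercise: A CPU executes a program that accesses the cache 1900 times and main memory 100 times. Given T_c = 1 ns and T_m = 6 ns, find the efficiency e and the average access time T_a of the cache/main memory system.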

Answer:
h = N_c/(N_c + N_m) = 1900/(1900 + 100) = 0.95
r = T_m/T_c = 6 ns/1 ns = 6
e = 1/[r + (1 - r)h] = 1/[6 + (1 - 6) × 0.95] = 1/1.25 = 80.0%
T_a = T_c/e = 1 ns/0.80 = 1.25 ns
Check: T_a = h × T_c + (1 - h) × T_m = 0.95 × 1 + 0.05 × 6 = 1.25 ns, so e = T_c/T_a = 1/1.25 = 80.0%.

Average Memory Access Time (AMAT): the average time for the processor to access data.
$$\text{AMAT} = t_{cache} + MR_{cache}\left[t_{MM} + MR_{MM}\, t_{VM}\right]$$
where MR denotes a miss rate and t_VM is the access time of virtual memory (the hard disk).
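The efficiency exercise and the AMAT formula can be checked numerically. Note that they use different cost models: the efficiency formulas charge T_m instead of T_c on a miss, whereas AMAT charges t_cache on every access and adds t_MM on a miss. This C sketch keeps the two models in separate functions; all function names are hypothetical.

```c
/* Hit rate from access counts: h = Nc / (Nc + Nm). */
double hit_rate(unsigned long n_cache, unsigned long n_main) {
    return (double)n_cache / (double)(n_cache + n_main);
}

/* Efficiency model: a hit costs Tc and a miss costs Tm,
   so Ta = h*Tc + (1-h)*Tm. */
double avg_access_time(double h, double tc, double tm) {
    return h * tc + (1.0 - h) * tm;
}

/* e = Tc / Ta = 1 / (r + (1 - r) * h), with r = Tm / Tc. */
double cache_efficiency(double h, double tc, double tm) {
    double r = tm / tc;
    return 1.0 / (r + (1.0 - r) * h);
}

/* AMAT model: every access pays t_cache; a cache miss additionally
   pays t_mm, and a main-memory miss additionally pays t_vm.
   AMAT = t_cache + MR_cache * (t_mm + MR_mm * t_vm). */
double amat(double t_cache, double mr_cache, double t_mm,
            double mr_mm, double t_vm) {
    return t_cache + mr_cache * (t_mm + mr_mm * t_vm);
}
```

`hit_rate(1900, 100)` gives h = 0.95, and with T_c = 1 and T_m = 6 the functions return e ≈ 0.80 and T_a ≈ 1.25 ns, matching the worked answer. `amat(1, 0.375, 100, 0, 0)` evaluates a two-level cache/main-memory case to 38.5 cycles.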

Memory Performance Example 2
Suppose the processor has 2 levels of hierarchy, cache and main memory, with t_cache = 1 cycle and t_MM = 100 cycles. What is the AMAT of the program from Example 1?
$$\text{AMAT} = t_{cache} + MR_{cache}\, t_{MM} = [1 + 0.375(100)]\ \text{cycles} = 38.5\ \text{cycles}$$

Amdahl's Law
The effort spent increasing the performance of a subsystem is wasted unless the subsystem affects a large percentage of overall performance.
Gene Amdahl (1922-2015) co-founded 3 companies, including one called Amdahl Corporation in 1970.

Cache
- Highest level in the memory hierarchy
- Fast (typically ~1 cycle access time)
- Ideally supplies most data to the processor
- Usually holds the most recently accessed data

Cache Design Questions
- What data is held in the cache?
- How is data found?
- What data is replaced?
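Amdahl's Law is usually quantified as follows: if a fraction f of the execution time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). That formula is the standard statement of the law but is not written out on the slide; a minimal C sketch, with a hypothetical function name:

```c
/* Overall speedup when a fraction f (0..1) of the original execution
   time is accelerated by a factor s > 0; the other (1 - f) of the
   time is unchanged. */
double amdahl_speedup(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}
```

Speeding up a subsystem that accounts for 10% of execution time by even a factor of 10^9 gives `amdahl_speedup(0.1, 1e9)` below 1.12, which is exactly the waste the law warns about.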

The cache design questions above focus on data loads, but stores follow the same principles.

What data is held in the cache?
Ideally, the cache anticipates needed data and puts it in the cache, but it is impossible to predict the future. Instead, use the past to predict the future through temporal and spatial locality:
- Temporal locality: copy newly accessed data into the cache.
- Spatial locality: copy neighboring data into the cache too.

Cache Terminology
- Capacity (C): number of data bytes in the cache
- Block size (b): bytes of data brought into the cache at once
- Number of blocks (B = C/b): number of blocks in the cache
- Degree of associativity (N): number of blocks in a set
- Number of sets (S = B/N): each memory address maps to exactly one cache set

How is data found?
The cache is organized into S sets, and each memory address maps to exactly one set. Caches are categorized by the number of blocks in a set:
- Direct mapped: 1 block per set
- N-way set associative: N blocks per set
- Fully associative: all cache blocks in 1 set

Example Cache Parameters
Examine each organization for a cache with C = 8 words (capacity) and b = 1 word (block size), so B = 8 blocks. Ridiculously small, but it illustrates the organizations.
[Figures: Direct Mapped Cache; Direct Mapped Cache Hardware]

Direct Mapped Cache Performance
    # MIPS assembly code
          addi $t0, $0, 5
    loop: beq  $t0, $0, done
          lw   $t1, 0x4($0)
          lw   $t2, 0xC($0)
          lw   $t3, 0x8($0)
          addi $t0, $t0, -1
          j    loop
    done:
Miss rate = 3/15 = 20%
The three addresses map to sets 1, 3, and 2, so only the first access to each one misses (compulsory misses); the other 12 loads hit because of temporal locality.

Direct Mapped Cache: Conflict
    # MIPS assembly code
          addi $t0, $0, 5
    loop: beq  $t0, $0, done
          lw   $t1, 0x4($0)
          lw   $t2, 0x24($0)
          addi $t0, $t0, -1
          j    loop
    done:
Miss rate = 10/10 = 100%
Addresses 0x4 and 0x24 both map to set 1, so each load evicts the block the other one needs (conflict misses).

N-Way Set Associative Performance
[Figure: N-Way Set Associative Cache]
Running the same 0x4/0x24 loop on a 2-way set associative cache (4 sets of 2 blocks):
Miss rate = 2/10 = 20%
Associativity reduces conflict misses: both blocks fit in set 1 at the same time.

Fully Associative Cache
Reduces conflict misses, but is expensive to build.

Cache with Larger Block Size
Increase the block size to b = 4 words, with C = 8 words and a direct-mapped cache (1 block per set), so the number of blocks is B = C/b = 8/4 = 2. Does spatial locality help? Running the first loop (loads from 0x4, 0xC, 0x8) again:
Miss rate = 1/15 = 6.67%
All three words lie in the same 4-word block, so only the first load misses. Larger blocks reduce compulsory misses through spatial locality.

Cache Organization Recap
Capacity: C; block size: b; number of blocks in cache: B = C/b; number of blocks in a set: N; number of sets: S = B/N

Organization             Number of ways (N)   Number of sets (S = B/N)
Direct mapped            1                    B
N-way set associative    1 < N < B            B/N
Fully associative        B                    1

Capacity Misses
The cache is too small to hold all data of interest at once. If the cache is full, the program accesses data X and evicts data Y; a capacity miss occurs when it accesses Y again. How should Y be chosen to minimize the chance of needing it again? Least recently used (LRU) replacement: the least recently used block in a set is evicted.

LRU Replacement
    # MIPS assembly
    lw $t0, 0x04($0)
    lw $t1, 0x24($0)
    lw $t2, 0x54($0)
In the 2-way set associative cache, all three addresses map to set 1, so the third load evicts the least recently used block, the one holding address 0x04.

Cache Replacement Algorithms
Commonly used (implemented in hardware):
- RAND: random
- FIFO: first in, first out
- LRU: least recently used
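The miss rates quoted for the direct-mapped, 2-way set associative, and b = 4 caches can be reproduced by a small simulator. This is a minimal sketch, assuming byte addresses, 4-byte words, and the set index taken as (block address mod S) as in those examples; the `Cache` type and the function names are hypothetical, and only hits and misses are tracked, not data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 8   /* limits of this toy model */
#define MAX_WAYS 8

/* Toy cache: `sets` sets, `ways` blocks per set, `block_words` 4-byte
   words per block, LRU replacement. */
typedef struct {
    unsigned sets, ways, block_words;
    bool valid[MAX_SETS][MAX_WAYS];
    uint32_t tag[MAX_SETS][MAX_WAYS];
    unsigned long last_use[MAX_SETS][MAX_WAYS]; /* time of last access */
    unsigned long clock, hits, misses;
} Cache;

/* Requires sets <= MAX_SETS and ways <= MAX_WAYS. */
void cache_init(Cache *c, unsigned sets, unsigned ways, unsigned block_words) {
    memset(c, 0, sizeof *c);
    c->sets = sets;
    c->ways = ways;
    c->block_words = block_words;
}

/* Simulates one load from byte address `addr`; returns true on a hit. */
bool cache_access(Cache *c, uint32_t addr) {
    uint32_t block = addr / 4 / c->block_words; /* drop byte and block offsets */
    unsigned set = block % c->sets;             /* set index */
    uint32_t tag = block / c->sets;             /* remaining upper bits */
    unsigned victim = 0;

    c->clock++;
    for (unsigned w = 0; w < c->ways; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->last_use[set][w] = c->clock;
            c->hits++;
            return true;
        }
    }
    /* Miss: fill an empty way if there is one, else evict the LRU way. */
    for (unsigned w = 0; w < c->ways; w++) {
        if (!c->valid[set][w]) { victim = w; break; }
        if (c->last_use[set][w] < c->last_use[set][victim]) victim = w;
    }
    c->valid[set][victim] = true;
    c->tag[set][victim] = tag;
    c->last_use[set][victim] = c->clock;
    c->misses++;
    return false;
}

/* Replays a loop body's load addresses `iterations` times. */
void run_loop(Cache *c, const uint32_t *addrs, size_t n, int iterations) {
    for (int i = 0; i < iterations; i++)
        for (size_t j = 0; j < n; j++)
            cache_access(c, addrs[j]);
}
```

With `cache_init(&c, 8, 1, 1)` the 0x4/0xC/0x8 loop gives 3 misses out of 15 and the 0x4/0x24 loop 10 out of 10; `cache_init(&c, 4, 2, 1)` (2-way) brings the latter down to 2 out of 10, and `cache_init(&c, 2, 1, 4)` (4-word blocks) gives 1 miss out of 15 for the first loop.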

Example: Cache Replacement
Main memory has 8 blocks (0-7) and the cache has 4 blocks (0-3). The cache is set associative with 2 blocks per set, and LRU is used as the replacement algorithm. The memory block access sequence is 1, 2, 4, 1, 3, 7, 0, 1, 2, 5, 4, 6, 4, 7, 2, and the cache is initially empty. Show in detail how the cache is used.

Answer: In this example, memory is divided into groups of 2 blocks and memory group g maps to cache set g mod 2, so blocks 0, 1, 4, 5 share set 0 (cache blocks 0-1) and blocks 2, 3, 6, 7 share set 1 (cache blocks 2-3). Cache block contents after each access (M = miss, H = hit):

Access        1  2  4  1  3  7  0  1  2  5  4  6  4  7  2
Cache blk 0   1  1  1  1  1  1  1  1  1  1  4  4  4  4  4
Cache blk 1   -  -  4  4  4  4  0  0  0  5  5  5  5  5  5
Cache blk 2   -  2  2  2  2  7  7  7  7  7  7  6  6  6  2
Cache blk 3   -  -  -  -  3  3  3  3  2  2  2  2  2  7  7
Result        M  M  M  H  M  M  M  H  M  M  M  M  H  M  M

Hit ratio: 3/15 = 0.2
(Under the set = block mod 2 mapping used in the direct-mapped and N-way examples, the same sequence would give only 2 hits.)
[Figure: block allocation against access time 1-15]

Cache Summary
What data is held in the cache?
- Recently used data (temporal locality)
- Nearby data (spatial locality)
How is data found?
- The set is determined by the address of the data.
- The word within the block is also determined by the address.
- In associative caches, data could be in one of several ways.
What data is replaced?
- The least recently used way in the set.

Types of Misses
- Compulsory: first time data is accessed
- Capacity: cache too small to hold all data of interest
- Conflict: data of interest maps to the same location in the cache
Miss penalty: time it takes to retrieve a block from the lower level of the hierarchy

Miss Rate Trends
- Bigger caches reduce capacity misses.
- Greater associativity reduces conflict misses.
[Figure adapted from Patterson & Hennessy, Computer Architecture: A Quantitative Approach, 2011]
- Bigger blocks reduce compulsory misses.
- Bigger blocks increase conflict misses.

Multilevel Caches
Larger caches have lower miss rates but longer access times. Expand the memory hierarchy to multiple levels of caches:
- Level 1: small and fast (e.g., 16 KB, 1 cycle)
- Level 2: larger and slower (e.g., 256 KB, 2-6 cycles)
Most modern PCs have L1, L2, and L3 caches.
[Figure: Intel Pentium III die]

Virtual Memory
Gives the illusion of bigger memory: main memory (DRAM) acts as a cache for the hard disk.

Memory Hierarchy
- Physical memory: DRAM (main memory)
- Virtual memory: hard drive (slow, large, cheap)

Hard Disk
Takes milliseconds to seek the correct location on the disk.

Virtual Addresses
- Programs use virtual addresses.
- The entire virtual address space is stored on a hard drive.
- A subset of the virtual address data is in DRAM.
- The CPU translates virtual addresses into physical addresses (DRAM addresses).
- Data not in DRAM is fetched from the hard drive.

Memory Protection
- Each program has its own virtual-to-physical mapping.
- Two programs can use the same virtual address for different data.
- Programs don't need to be aware that others are running.
- One program (or virus) can't corrupt memory used by another.

Cache/Virtual Memory Analogues
Cache           Virtual Memory
Block           Page
Block size      Page size
Block offset    Page offset
Miss            Page fault
Tag             Virtual page number
Physical memory acts as a cache for virtual memory.

Virtual Memory Definitions
- Page size: amount of memory transferred from hard disk to DRAM at once
- Address translation: determining the physical address from the virtual address
- Page table: lookup table used to translate virtual addresses to physical addresses

Virtual & Physical Addresses
Most accesses hit in physical memory, but programs have the large capacity of virtual memory.
[Figure: Address Translation]

Virtual Memory Example
System:
- Virtual memory size: 2 GB = 2^31 bytes
- Physical memory size: 128 MB = 2^27 bytes
- Page size: 4 KB = 2^12 bytes
Organization:
- Virtual address: 31 bits (the most significant bit of a 32-bit address is always 0)
- Physical address: 27 bits (the upper 5 bits of a 32-bit address are always 0)
- Page offset: 12 bits
- # Virtual pages = 2^31/2^12 = 2^19 (VPN = 19 bits)
- # Physical pages = 2^27/2^12 = 2^15 (PPN = 15 bits)
[Figure: 19-bit virtual page numbers map to 15-bit physical page numbers]
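The field widths in the example follow from taking base-2 logarithms of the sizes and subtracting the page-offset width. A minimal C sketch; `address_bits` and `page_number_bits` are hypothetical helper names, and both assume the sizes are powers of two.

```c
#include <stdint.h>

/* log2 of a power-of-two size: how many address bits `bytes` bytes need. */
unsigned address_bits(uint64_t bytes) {
    unsigned n = 0;
    while (bytes > 1) {
        bytes >>= 1;
        n++;
    }
    return n;
}

/* Page-number width = address width minus page-offset width. */
unsigned page_number_bits(uint64_t space_bytes, uint64_t page_bytes) {
    return address_bits(space_bytes) - address_bits(page_bytes);
}
```

With 4 KB pages, `page_number_bits(2 GB, 4 KB)` gives the 19-bit VPN and `page_number_bits(128 MB, 4 KB)` the 15-bit PPN of the example system.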

Virtual Memory Example
What is the physical address of virtual address 0x247C?
- VPN = 0x2
- VPN 0x2 maps to PPN 0x7FFF
- 12-bit page offset: 0x47C
- Physical address = 0x7FFF47C

Page Table
- One entry for each virtual page
- Entry fields:
  - Valid bit: 1 if the page is in physical memory
  - Physical page number: where the page is located
How to perform translation? The VPN is the index into the page table.
[Figure: Page Table Example]
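Because the VPN is simply an index into the page table, translation amounts to an array lookup followed by splicing the PPN onto the unchanged page offset. A minimal C sketch, assuming the 4 KB pages of the example system; the `PTE` type, the `translate` function, and the eight-entry table are hypothetical, and the table holds only mappings stated in this section's examples (VPN 0x2 to PPN 0x7FFF, VPN 5 to PPN 1, VPN 7 invalid), with every other entry left invalid.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 12u   /* 4 KB pages, as in the example system */
#define NUM_VPAGES  8u    /* hypothetical: only the first 8 entries modeled */

typedef struct {
    bool valid;           /* 1 if the page is in physical memory */
    uint32_t ppn;         /* physical page number */
} PTE;

/* Hypothetical page table; unlisted entries are zero, i.e. invalid. */
static const PTE page_table[NUM_VPAGES] = {
    [2] = { true, 0x7FFF },
    [5] = { true, 0x0001 },
    [7] = { false, 0 },
};

/* Translates `va`; returns false on a page fault (invalid entry or a
   VPN outside the modeled range). */
bool translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> OFFSET_BITS;                   /* index into the table */
    uint32_t offset = va & ((1u << OFFSET_BITS) - 1u);  /* passes through unchanged */
    if (vpn >= NUM_VPAGES || !page_table[vpn].valid)
        return false;
    *pa = (page_table[vpn].ppn << OFFSET_BITS) | offset;
    return true;
}
```

`translate(0x247C, &pa)` yields 0x7FFF47C and `translate(0x5F20, &pa)` yields 0x1F20, while `translate(0x73E0, &pa)` returns false: entry 7 is invalid, so the page would have to be brought in from disk.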

Page Table Example 1
What is the physical address of virtual address 0x5F20?
- VPN = 5
- Entry 5 in the page table: VPN 5 => physical page 1
- Physical address: 0x1F20

Page Table Example 2
What is the physical address of virtual address 0x73E0?
- VPN = 7
- Entry 7 is invalid.
- The virtual page must be paged into physical memory from disk.

Page Table Challenges
The page table is large, so it is usually located in physical memory. Each load/store then requires 2 main memory accesses:
- one for translation (the page table read)
- one to access the data (after translation)
This cuts memory performance in half, unless we get clever…

Translation Lookaside Buffer (TLB)
A small cache of the most recent translations, which reduces the number of memory accesses for most loads/stores from 2 to 1.
- Page table accesses have high temporal locality: because the page size is large, consecutive loads/stores are likely to access the same page.
- Small: accessed in < 1 cycle
- Typically 16-512 entries
- Fully associative
- > 99% hit rates are typical
[Figure: TLB Example, a 2-entry TLB]

Memory Protection
- Multiple processes (programs) run at once.
- Each process has its own page table.
- Each process can use the entire virtual address space.
- A process can only access physical pages mapped in its own page table.

Virtual Memory Summary
- Virtual memory increases capacity.
- A subset of virtual pages is in physical memory.
- The page table maps virtual pages to physical pages (address translation).
- A TLB speeds up address translation.
- Different page tables for different programs provide memory protection.

Virtual Memory: Replacement Policies
1. Rand
2. FIFO
3. LRU
4. OPT
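The replacement policies listed above can be compared by counting hits on a page reference stream. This sketch implements FIFO, LRU, and OPT (Rand is omitted because its result depends on the random choices); the `Policy` enum and the `count_hits` function are hypothetical names, and pages are represented by plain integers.

```c
#include <stddef.h>

#define MAX_FRAMES 8  /* limit of this sketch */

typedef enum { POLICY_FIFO, POLICY_LRU, POLICY_OPT } Policy;

/* Counts page hits for reference string `refs` (length n) with `frames`
   page frames, 1 <= frames <= MAX_FRAMES. Frames start empty. */
int count_hits(const int *refs, size_t n, size_t frames, Policy p) {
    int page[MAX_FRAMES];
    size_t loaded_at[MAX_FRAMES];  /* FIFO: when the page was brought in */
    size_t used_at[MAX_FRAMES];    /* LRU: when the page was last referenced */
    size_t used = 0;
    int hits = 0;

    for (size_t t = 0; t < n; t++) {
        size_t f;
        for (f = 0; f < used && page[f] != refs[t]; f++)
            ;                         /* search the resident pages */
        if (f < used) {               /* hit */
            hits++;
            used_at[f] = t;
            continue;
        }
        if (used < frames) {          /* a free frame is available */
            f = used++;
        } else {                      /* choose a victim frame */
            f = 0;
            for (size_t k = 1; k < frames; k++) {
                if (p == POLICY_FIFO && loaded_at[k] < loaded_at[f]) f = k;
                if (p == POLICY_LRU && used_at[k] < used_at[f]) f = k;
            }
            if (p == POLICY_OPT) {
                /* Evict the page whose next use is farthest away (or never). */
                size_t best_next = 0;
                for (size_t k = 0; k < frames; k++) {
                    size_t next = t + 1;
                    while (next < n && refs[next] != page[k]) next++;
                    if (next > best_next) { best_next = next; f = k; }
                }
            }
        }
        page[f] = refs[t];
        loaded_at[f] = t;
        used_at[f] = t;
    }
    return hits;
}
```

With three frames and the reference stream 1, 2, 1, 5, 4, 1, 3, 4, 2, 4, it counts 2 hits for FIFO, 4 for LRU, and 5 for OPT. OPT needs the future reference stream, so it serves as a benchmark for the best achievable hit count rather than as a policy a real system can implement directly.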

Example: A program consists of 5 pages. During execution, its page address stream is P1, P2, P1, P5, P4, P1, P3, P4, P2, P4. The system allocates only three main memory pages to this program.
Questions:
(1) Give the dispatch tables for the FIFO, LRU, and OPT replacement policies.
(2) Work out the page hit ratio under LRU.

(1) Answer: page frame contents after each reference (M = miss, H = hit):

FIFO
Page     1  2  1  5  4  1  3  4  2  4
Frame 0  1  1  1  1  4  4  4  4  2  2
Frame 1  -  2  2  2  2  1  1  1  1  4
Frame 2  -  -  -  5  5  5  3  3  3  3
Result   M  M  H  M  M  M  M  H  M  M

LRU
Page     1  2  1  5  4  1  3  4  2  4
Frame 0  1  1  1  1  1  1  1  1  2  2
Frame 1  -  2  2  2  4  4  4  4  4  4
Frame 2  -  -  -  5  5  5  3  3  3  3
Result   M  M  H  M  M  H  M  H  M  H

OPT
Page     1  2  1  5  4  1  3  4  2  4
Frame 0  1  1  1  1  1  1  3  3  3  3
Frame 1  -  2  2  2  2  2  2  2  2  2
Frame 2  -  -  -  5  4  4  4  4  4  4
Result   M  M  H  M  M  H  M  H  H  H

(2) Answer: H_p = 4/10 = 0.4

Memory-Mapped I/O
The processor accesses I/O devices (like keyboards, monitors, printers) just like memory.
- Each I/O device is assigned one or more addresses.
- When that address is detected, data is read from or written to the I/O device instead of memory.
- A portion of the address space is dedicated to I/O devices.

Memory-Mapped I/O Hardware
- Address decoder: looks at the address to determine which device or memory communicates with the processor
- I/O registers: hold values written to the I/O devices
- ReadData multiplexer: selects between memory and the I/O devices as the source of data sent to the processor
[Figures: The Memory Interface; Memory-Mapped I/O Hardware]

Memory-Mapped I/O Code
Suppose I/O Device 1 is assigned the address 0xFFFFFFF4.
Write the value 42 to I/O Device 1 (0xFFFFFFF4):
    addi $t0, $0, 42
    sw   $t0, 0xFFF4($0)
Read the value from I/O Device 1 and place it in $t3:
    lw   $t3, 0xFFF4($0)
(The 16-bit offset 0xFFF4 is sign-extended to 0xFFFFFFF4.)

Input/Output (I/O) Systems
- Embedded I/O systems: toasters, LEDs, etc.
- PC I/O systems

Embedded I/O Systems
Example microcontroller: PIC32
- 32-bit MIPS processor
- Low-level peripherals include serial ports, timers, and A/D converters

Digital I/O Example
    // C Code
    #include <p32xxxx.h>

    int main(void) {
      int switches;
      TRISD = 0xFF00;             // TRIS bit 1 = input, 0 = output
                                  // RD[7:0] outputs, RD[11:8] inputs
      while (1) {
        // read & mask switches, RD[11:8]
        switches = (PORTD >> 8) & 0xF;
        PORTD = switches;         // display on LEDs
      }
    }

Serial I/O
Example serial protocols:
- SPI: Serial Peripheral Interface
- UART: Universal Asynchronous Receiver/Transmitter
- Also: I2C, USB, Ethernet, etc.

SPI: Serial Peripheral Interface
- The master initiates communication to the slave by sending pulses on SCK.
- The master sends SDO (Serial Data Out) to the slave, msb first.
- The slave may send data (SDI) to the master, msb first.

UART: Universal Asynchronous Rx/Tx
Configuration: start bit (0), 7-8 data bits, parit
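The MIPS `sw`/`lw` pair for I/O Device 1 has a direct C counterpart: a store or load through a `volatile` pointer to the device's address. The sketch below is hypothetical and, so that it runs on an ordinary computer, points at a plain variable instead of the real address 0xFFFFFFF4; the function names are illustrative only.

```c
#include <stdint.h>

/* On real hardware, I/O Device 1 would be reached through its fixed address:
       #define IO_DEVICE1 ((volatile uint32_t *)0xFFFFFFF4u)
   To keep this sketch runnable on an ordinary computer, the "device
   register" below is a plain variable standing in for that address. */
static uint32_t simulated_device1;
static volatile uint32_t *const io_device1 = &simulated_device1;

/* Like `sw $t0, 0xFFF4($0)`: a store to the device's address.
   `volatile` tells the compiler that every access must actually be
   performed, so it cannot be optimized away or merged with others. */
void device1_write(uint32_t value) {
    *io_device1 = value;
}

/* Like `lw $t3, 0xFFF4($0)`: a load from the device's address. */
uint32_t device1_read(void) {
    return *io_device1;
}
```

Calling `device1_write(42)` and then `device1_read()` returns 42 here only because the stand-in register is ordinary memory; on a real device, a read need not return the value last written.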
