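Development of High-Performance Multi-Core and Many-Core Processor Chip Technology
Professor Li Sanli (李三立), Tsinghua University

Slide 1: Introduction
Processors have always been a major driving force of computer technology and the computer industry. Further progress toward petaflops-class (10^15 floating-point operations per second) high-performance computers depends on the development of multi-core and many-core chips, and most new techniques in computer architecture now appear in high-performance multi-core and many-core chips. We should follow the development of high-performance computing closely: computer architecture today puts the whole "system" onto the "chip" (SoC). I hope that teachers and students of the "Computer Organization" and "Computer Architecture" courses in our computer science schools will add this material to their teaching and study, that teachers will give this direction more weight when applying for Natural Science Foundation and other research funding, and that our young faculty and students will take an interest in this field and raise the level of China's processor chip technology.

Slide 2: CPUs of China's 10-petaflops-class supercomputer are expected to be entirely domestically produced
The world-leading Tianhe-1 (天河一號) supercomputer system uses the FeiTeng-1000 (飛騰-1000) high-performance multi-core microprocessor.
Tianhe-1: peak speed 4.7 petaflops (4700 萬億次), sustained speed 2.566 petaflops (2566 萬億次). 1000 萬億次 per second equals 1 petaflops.
Source: Huanqiu.com (環球網) report dated 2019-3-8, remarks by Zhang Yulin (張育林), president of the National University of Defense Technology.

Slide 3: China's Tianhe-1 petaflops supercomputer ranked first in the world Top 500; President Obama mentioned it specifically.

Slide 4: Tianhe-1, first in the world Top 500: a plug-in board.

Slide 5: Outline
1. The need for multi-core and many-core processor chip technology
2. Development of multi-core and many-core architecture processor chips
3. Heterogeneous multi-core and many-core chips
4. Development of on-chip (SoC) interconnection networks
5. Further development of microelectronic process technology
6. Predictions for future exaFlops high-performance computer chips
7. Conclusions

Slide 6: Part (1). The need for multi-core and many-core processor chip technology

Slide 7: Application requirements of high-performance computing
[Chart, courtesy of Erik P. DeBenedictis: required system performance by application, on a scale of 100 teraflops; 1, 10 and 100 petaflops; 1, 10 and 100 exaflops; 1 zettaflops, over a time axis running from 2000 to 2020. Applications plotted: plasma fusion simulation [Jardin 03]; "compute as fast as the engineer can think" [NASA 99], with levels 100 and 1000 [SCaLeS 03]; geodata Earth station range [NASA 02]; full global climate [Malone 03]; protein folding; simulation of medium (us scale), large (ms scale) and more complex biomolecular structures. Levels of 50 TFLOPS, 250 TFLOPS and 1 PFLOPS are attributed to [HEC04]. The chart also carries the note "No schedule provided by source".]

References cited on the chart:
[Jardin 03] S.C. Jardin, "Plasma Science Contribution to the SCaLeS Report," Princeton Plasma Physics Laboratory, PPPL-3879 UC-70, available on Internet.
[Malone 03] Robert C. Malone, John B. Drake, Philip W. Jones, Douglas A. Rotman, "High-End Computing in Climate Modeling," contribution to SCaLeS report.
[NASA 99] R. T. Biedron, P. Mehrotra, M. L. Nelson, F. S. Preston, J. J. Rehder, J. L. Rogers, D. H. Rudy, J. Sobieski, and O. O. Storaasli, "Compute as Fast as the Engineers Can Think!" NASA/TM-1999-209715, available on Internet.
[NASA 02] NASA Goddard Space Flight Center, "Advanced Weather Prediction Technologies: NASA's Contribution to the Operational Agencies," available on Internet.
[SCaLeS 03] Workshop on the Science Case for Large-scale Simulation, June 24-25, proceedings on Internet at /scales/.
[DeBenedictis 04] Erik P. DeBenedictis, "Matching Supercomputing to Progress in Science," July 2004. Presentation at Lawrence Berkeley National Laboratory, also published as Sandia National Laboratories SAND report SAND2004-3333P. Sandia technical reports are available from the Sandia technical library.
[HEC04] Federal Plan for High-End Computing, May 2004.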
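Chinese annotations on the chart: plasma; global climate model; massive Earth data; simulation of more complex biomolecular structures; protein structure; biomolecular structures. Axes: system performance versus application. Marked levels: 10 petaflops (1 萬萬億次), 1 exaflops (100 萬萬億次) and 10 exaflops (1000 萬萬億次).

Slide 8: Growth in transistor count (Intel)
[Chart annotation: 32 billion transistors.]

Slide 9: On-chip clock frequency cannot keep increasing
Because of the power problem, frequency scaling has stalled.

Slide 10: [Picture illustrating the heat generated by power dissipation.]

Slide 11: Water cooling and air cooling of CPUs
[Pictures: a water-cooling system and an air-cooling system.]

Slide 12: Resolving the conflict between power growth and transistor growth
Solutions: new manufacturing materials; new cooling technology; multi-core and many-core architectures.

Slide 13: Effect of multi-core and many-core development on performance
[Chart: performance versus year, showing three years of change with multi-core. Intel's emphasis is on the PC market.]

Slide 14: Architectural progress: single core, multi-core, many-core, on-chip interconnect
[Timeline, in order: Pentium (1993), Pentium MMX, Pentium II, Pentium III, Tualatin, Pentium 4 Northwood, Pentium D, Core 2 Duo (Conroe), Core 2 Quad (Kentsfield), TeraScale 80-core prototype.]
First phase: single core with increased performance.
Second phase: multicore processors with more and more cores.
Key for multicore: interconnection.

Slide 15: Internal structure of an AMD general-purpose single core
[Block diagram labels: Fetch and Branch Prediction; 64KB L1 instruction cache; 64KB L1 data cache; Scan/Align/Decode; Fastpath and Microcode Engine producing micro-ops; Instruction Control Unit (72 entries); Int Decode & Rename; FP Decode & Rename; three integer reservation stations (Res) feeding ALU/AGU pairs plus MULT; 36-entry FP scheduler feeding FADD, FMUL and FMISC; 44-entry Load/Store Queue. Chinese labels: instruction fetch, branch prediction, microcode, hard-wired (fastpath) decode, micro-operations, data cache, instruction cache.]

Slide 16: Layout of the AMD dual-core chip
Dual-core AMD Opteron processor: 199 mm2, 90 nm process.
Single-core AMD Opteron processor: 193 mm2, 130 nm process.

Slide 17: Multi-core architecture of the AMD Opteron

Slide 18: Intel's multi-core and many-core roadmap
[Chart: number of cores (1 to 1024, doubling per step) versus year (2004 to 2020).]
Commercial path: Pentium D; Core Duo; Core 2 Duo (Conroe, Allendale, Wolfdale, Merom, Penryn); Core 2 Quad (Kentsfield, Yorkfield); Nehalem; Core i7; Sandy Bridge.
Research path: Polaris TeraScale (80 cores / 80 threads); Single-chip Cloud Computer (48 cores / 48 threads); Knights Corner (50 cores / 200 threads).

Slide 19: Intel's Nehalem multi-core structure
A graphics core is to be included; QuickPath interface.

Slide 20: Layout of Intel's Nehalem quad-core chip
[Die photo labels: two QuickPath links, each marked 96 GB/s.]

Slide 21: Hierarchical memory structure of the Intel Nehalem multi-core processor
Each CPU core: 32KB L1 data cache, 32KB L1 instruction cache, 256KB L2 cache.
Shared by 4 to 8 cores: 8MB L3 cache.
DDR3 DRAM memory controllers: each DRAM channel is 64/72 bits wide at up to 1.33 Gb/s.
QuickPath system interconnect: each direction is 20 bits at 6.4 Gb/s.
QPI is a key feature.

Slide 22: Structure of a single general-purpose Intel Nehalem core
[Block diagram labels: prefetch buffer, pre-decode, instruction queue, alignment, branch prediction, loop stream, decode, out-of-order execution buffer, memory access, L3 cache, QuickPath (QPI).]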
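The link widths and signalling rates quoted on the Nehalem memory-hierarchy slide can be turned into peak byte rates with simple arithmetic. The sketch below is only an illustration under stated assumptions: the "Gb/s" values are read as per-wire transfer rates, a 64/72-bit DRAM channel is assumed to carry 64 data bits (the other 8 being ECC), and all 20 bits of a QuickPath direction are counted as raw signalling with protocol overhead ignored. The function name `peak_gbytes_per_s` is hypothetical.

```python
def peak_gbytes_per_s(width_bits: int, gbits_per_wire: float) -> float:
    """Raw peak rate of a parallel link in GB/s: width x per-wire rate / 8."""
    return width_bits * gbits_per_wire / 8.0


# Figures quoted on the slide; the 64-data-bit reading is an assumption.
dram_channel = peak_gbytes_per_s(64, 1.33)   # one DDR3 channel
qpi_direction = peak_gbytes_per_s(20, 6.4)   # one QuickPath direction, raw

print(f"DDR3 channel      : {dram_channel:.2f} GB/s")
print(f"QuickPath, one way: {qpi_direction:.2f} GB/s (raw)")
```

Under these assumptions a single DRAM channel delivers far less than one QuickPath direction, which is one way to see why the slide shows several memory controllers per chip.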
Slide 23: Multi-core and many-core roadmaps of other vendors
[Timeline chart with a monthly time axis, from a JPL slide dated Dec-01-2009, highlighting chips with 8 physical cores or more. Each entry gives name, clock, and (processors)(cores)(threads).]

IBM
  Power4                1.1 to 1.3 GHz   (1)(2)(2)
  Power4+               1.9 GHz          (1)(2)(2)
  Power5                1.5-1.9 GHz      (1)(2)(4)
  Power5+               1.5-2.26 GHz     (1)(2)(4)
  CBE                   3.2 GHz          (1)(9)(10)
  PowerXCell8i          3.2 GHz          (1)(9)(10)
  Xenon                 3.2 GHz          (1)(3)(6)
  Power6                3.5-4.7 GHz      (1)(2)(4)
  Power6+               5 GHz            (1)(2)(4)
INTEL
  Pentium D             3.8 GHz          (1)(2)(4)
  Core 2                1.8-3.2 GHz      (1)(4)(8)
  Dual Core Atom        0.8-2.06 GHz     (1)(2)(2)
  Sandy Bridge          4.6 GHz          (1)(8)(16)
  Xeon                  2.863.56 GHz     (1)(2)(2)
  Xeon Quad Core        2.133.56 GHz     (1)(4)(8)
  Xeon Beckton          2.83.56 GHz      (1)(8)(16)
  Core i7               2.663.33 GHz     (1)(4)(8)
AMD
  Opteron Denmark       1.6-2.8 GHz      (1)(2)(2)
  Opteron Barcelona     1.76-2.6 GHz     (1)(4)(4)
  Opteron Istanbul      2.26-2.66 GHz    (1)(6)(6)
  Opteron Sao Paulo     ?                (1)(6)(6)
  Opteron Magny Cours   ?                (1)(12)(12)
  Opteron Interlagos    ?                (1)(16)(16)
SUN / ORACLE
  Ultra SPARC IV        1-1.356 GHz      (1)(2)(2)
  Ultra SPARC IV+       1.5-2.16 GHz     (1)(2)(2)
  Ultra SPARC T1        1-1.46 GHz       (1)(4)(32)
  Ultra SPARC T2        1-1.66 GHz       (1)(8)(64)
  Ultra SPARC VII       2.4-2.56 GHz     (1)(4)(16)
  Ultra SPARC VIIIfx    2.4-2.56 GHz     (1)(8)(16)

Slide 24: Summary: overall trends over 35 years of processor development
[Chart series: transistor count (thousands), single-thread performance (SpecINT), frequency (MHz), typical power (watts), number of cores.]

Slide 25: Part (2). Development of multi-core and many-core architecture processor chips

Slide 26: Why multi-core?
In the same process technology:
  One core with its cache:   Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1
  Two cores with caches:     Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = 1.8

Slide 27: Next step: heterogeneous multi-core chips
[Diagram: an array of general-purpose (GP) cores.]
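Heterogeneous multi-core platform on an SoC: general-purpose cores, special-purpose hardware (SP) blocks and caches (C), all joined by an interconnect fabric.

Slide 28: Multi-core technology will diversify!
Multiple parallel general-purpose processors (GPPs)
Multiple application-specific processors (ASPs)
Sun Niagara: 8 GPP cores (32 threads)
Intel network processor (IXP2800): 1 GPP core, 16 ASPs (128 threads)
[IXP2800 block diagram labels: Intel XScale core with 32K instruction cache and 32K data cache; microengines MEv2 1 to 16; Rbuf 64 @ 128B; Tbuf 64 @ 128B; Hash 48/64/128; Scratch 16KB; QDR SRAM 1 to 4, each with an E/D Q; RDRAM 1 to 3; Gasket; PCI (64b) 66 MHz; SPI4 or CSIX; Stripe; CSRs (Fast_wr, UART, Timers, GPIO, BootROM/SlowPort).]
IBM Cell: 1 GPP (2 threads), 8 ASPs
Picochip DSP: 1 GPP core, 248 ASPs
Cisco CRS-1: 188 Tensilica GPPs
A processor chip now carries thousands of threads. The processor has become the "transistor" of Moore's Law: "The Processor is the new Transistor" [Rowen].

Slide 29: Structure of AMD's multi-core SIMD GPU chip

Slide 30: Multi-core accompanied by instruction-set extensions for acceleration

Slide 31: Many-core processor structures
Intel Terascale 80-core processor.
Tilera 64-core processor: cloud storage, servers, wireless networks.

Slide 32: Structure of NVIDIA's new many-core GPU chip, Fermi
NVIDIA's Fermi GPU architecture consists of 16 streaming multiprocessors (SMs), each consisting of 32 cores, each of which can execute one floating-point or integer instruction per clock. The SMs are supported by a second-level cache, host interface, GigaThread scheduler, and multiple DRAM interfaces.
[Diagram label: each SM has 32 cores.]

Slide 33: The Fermi SM
Each Fermi SM includes 32 cores, 16 load/store units, four special-function units, a 32K-word register file, 64K of configurable RAM, and thread control logic. Each core has both floating-point and integer execution units.
[Diagram labels: 32K-word register file; floating-point unit and integer unit in each CUDA core.]

Slide 34: On-chip and off-chip memory access speed as a design consideration for multi-core chips (the Memory Wall)
  Level                               Size           Latency (cycles)       Bandwidth
  Processing unit registers           64 registers   Load 1, Store 1        1.92 TB/s
  On-chip cache                       16MB/32KB      Load 2, Store 1        640 GB/s
  Off-chip static cache (SRAM)        2.5MB          Load 20, Store 10      320 GB/s (6 times slower off chip)
  Off-board dynamic memory (DRAM)     16GB           Load 36, Store 18      16 GB/s (120 times slower off board)

Slide 35: Part (3). Heterogeneous multi-core chips

Slide 36: Why develop heterogeneous many-core chips?
1. Petaflops high-performance computers cannot be built from general-purpose homogeneous many-core chips from Intel or AMD alone; accelerators are required.
2. Homogeneous many-core chips also run into the power problem, because every core needs its own cache and other supporting hardware. Accelerators should therefore use a large number of "small cores".
3. If separate CPU and GPU chips are combined, the GPU needs large amounts of data, so moving that data between chips becomes the bottleneck and peak performance is hard to reach.
4. The CPU and GPU should therefore be built on one chip, where the data-transfer bandwidth is much higher. Going further, GPUs are still hard to program; small cores aimed at specific uses, for which algorithms and programming can be simplified, are more suitable. Another approach is to extend the instruction set of the many-core chip to provide acceleration.
5. High-performance computers are diverging in two directions. General-purpose HPC systems can be built quickly from existing blade servers plus InfiniBand; they are cheap and fast to develop. Custom HPC systems of several petaflops with specially designed board-level products can generally target only one or two applications, so they tend toward specialization.

Slide 37: Three eras of processor performance
Single-core era (single-thread performance). Enabled by: Moore's Law, voltage scaling, micro-architecture. Constrained by: power, complexity.
Multi-core era (throughput performance). Enabled by: Moore's Law, desire for throughput, 20 years of SMP architecture. Constrained by: power, parallel software availability, scalability.
Heterogeneous systems era (performance targeted at applications). Enabled by: Moore's Law, abundant data parallelism, power-efficient GPUs. Currently constrained by: programming models, communication overheads.
[Each curve carries a "We are here" marker; the marker on the heterogeneous curve is followed by a question mark.]

Slide 38: IBM's heterogeneous Cell processor with NoC: eight 64-bit vector units (SXU) and a scalar unit (PXU)

Slide 39: Detailed structure of the IBM Cell heterogeneous multi-core processor
Observed clock speed: a wide range of operating frequencies is supported to optimize for power and yield.
Peak performance (single precision): 256 GFlops
Peak performance (double precision): 26 GFlops
[Diagram labels: double precision, single precision, SIMD vector units, scalar unit, interconnection network.]

Slide 40: Next step: what to do about petaflops high-performance computers?
No matter how many Intel or AMD general-purpose processors are used, the target cannot be reached; only heterogeneous many-core processor chips with accelerator functions can reach it. The hardware can get there, but the software is not fully ready. (Our universities need not build HPC machines in future; they can work on software, including software combined with algorithms.)

Slide 41: GPUs are not ideal for supercomputers
GPU programming is ill-suited to high-performance computing; the remedy is to combine the CPU and the GPU. Jack Dongarra says:
"The obvious upside of GPUs is that they provide compelling performance for modest prices. The downside is that they are more difficult to program, since at the very least you will need to write one program for the CPUs and another program for the GPUs. Another problem that GPUs present pertains to the movement of data. Any machine that requires a lot of data movement will never come close to achieving its peak performance. The CPU-GPU link is a thin pipe, and that becomes the strangle-point for the effective use of GPUs. In the future this problem will be addressed by having the CPU and GPU integrated in a single socket."

Slide 42: The Cell processor is dead for HPC
Chips that contain both x86 general processing cores and graphics processing cores are essentially heterogeneous multi-core processors, which AMD calls Fusion. The vast majority of multi-core chips today are homogeneous chips that contain a number of similar processing engines. There are processors with different types of cores, such as the Cell chips jointly developed by IBM, Sony Corp. and Toshiba Corp., which originally promised to redefine the market of multimedia chips as well as CPUs for the HPC market. However, since all three companies have ceased to develop Cell, it has no future.
Jack Dongarra says: "The Cell architecture is no longer being developed, so it is effectively dead. No new supercomputers will use Cell."

Slide 43: A likely trajectory: collision or convergence?
[Chart after Justin Rattner, Intel, ISC. Parallelism axis, the CPU path: multi-threading, multi-core, many-core. Programmability axis, the GPU path: fixed function, partially programmable, fully programmable. The two paths meet at a "future processor" marked with a question mark.]
The combination of generality and parallelism leads to heterogeneous many-core.

Slide 44: Architecture of the IBM Cyclops-64 (C64) chip
80 cores. On-chip bisection bandwidth = 0.38 TB/s; total bandwidth to 6 neighbors = 48 GB/s.

Slide 45: Assembly of a 1.1-petaflops supercomputer built from heterogeneous processors

Slide 46: Other multi-purpose heterogeneous multi-core chips
Combination of different cores. Two main options:
  Different types: microcontroller + DSP, processor + accelerator, ...
  Different performance: big processor + small processor
Advantages:
  Processors can be optimized for different tasks: operating system, multimedia, graphics, low-power apps.
  Processors are decoupled: independent software development.
Disadvantages:
  Different architectures, so more to learn.
  Different tools.
  More complex software.

Slide 47: Texas Instruments' heterogeneous multi-core chip for mobile terminals
The cores execute different tasks in parallel; the chip can be used in mobile terminals.

Slide 48: Part (4). Development of on-chip (SoC) interconnection networks

Slide 49: Development of the NoC
On-chip interconnect evolves as process technology advances, and must eventually become a NoC (Network on Chip).
[Figure: 80386, Pentium, multi-core.]

Slide 50: Interconnection networks for on-chip many-core systems (1)
On-chip many-core + channels. On the SoC, P denotes a processor core.

Slide 51: Interconnection networks for on-chip many-core systems (2)
On-chip many-core + channels + routers (R).
[Figure: router structure, with a switch.]

Slide 52: Two typical topologies for on-chip interconnection networks
Torus topology; mesh topology.

Slide 53: Clocking: the on-chip clock of a NoC-based SoC is distributed
[Figure: a 4 x 4 array of NoC routers (R); each colored block is one clock domain.]
Two research areas:
  Asynchronous routers: simple design, low power.
  Asynchronous interconnect: high bandwidth, low power.

Slide 54: Future exa-scale on-chip networks (NoC)
Parallelism replaces clock frequency scaling and core complexity.
Resulting challenges: scalability, programming, power.

Slide 55: Future exa-scale on-chip networks (NoC)
Conventional NoC system (number of cores: 10^2) versus exa-scale micro-networking system (number of cores: 10^2 to 10^4).
Challenges: unpredictable traffic load (traffic of Application 1 and Application 2 varying over time); unbalanced resource allocation; scalability (good performance only on small-scale networks); faulty routers and links; complex design and verification.
NoC features: regular architecture; packet-based transmission; flexible bandwidth utilization.

Slide 56: MIT: analysis and considerations for many-core structures
An array of thousands of small cores can solve the chip-area and scalability problems, but programming becomes a barrier that is hard to overcome.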
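Parallelizing an application across thousands of cores is very difficult:
1. Tasks and data must be partitioned.
2. Communication increases latency.
3. Longer-distance communication causes resource contention along the route, which lowers performance and raises power consumption.
4. There is no efficient broadcast communication (the metal wires on silicon are too long).

Slide 57: MIT: analysis and considerations for many-core structures
To raise the performance of chips with thousands of cores, communication and locality must be managed effectively:
  Both tasks and data must be partitioned and placed optimally.
  Communication patterns must be analyzed to minimize latency.
  Data must be placed near the execution units that use it most often.
  Some frequently used routines must be close to DRAM and I/O.
  Dynamic and unpredictable communication is very hard to optimize.
For these reasons MIT proposes replacing array-style communication over electrical wires with broadcast optical communication:
  Broadcast communication makes a shared-memory model easy to implement, and therefore easy to program.
  It reduces the need to manage locality.
  It is cheap and consumes little power.
This is a good topic for basic technology research.

Slide 58: ATAC architecture
MIT's proposed broadcast optical communication, ATAC, for chips with thousands of cores.
[Diagram: a 4 x 4 array of tiles, each with a processor (p), a switch and memory (m), connected by two networks: an electrical mesh interconnect (array-style network over electrical wires) and an optical broadcast WDM interconnect (broadcast optical network).]

Slide 59: Advantages of the broadcast optical communication MIT proposes for many-core chips
The optical waveguide passes through every core on the many-core chip.
Different wavelengths in the waveguide can completely eliminate resource contention.
Signals reach all of the thousand-plus cores within 2 ns.
All cores can receive the same signal, which gives true broadcast.
[Figure: topology of the broadcast optical interconnect.]

Slide 60: Part (5). Further development of microelectronic process technology

Slide 61: Terascale integration capacity
[Chart: total transistors on a 300 mm2 die; about 1.5B logic transistors; 100MB cache.]
On-chip integration is heading toward hundreds of billions of transistors.

Slide 62: Scaling problems of frequency, voltage and power
[Charts for a 300 mm2 die: frequency, voltage, power.]
Frequency scaling will slow down.
Vdd scaling will slow down.
Power will be too high.

Slide 63: Wiring
Problems caused by ever finer process line widths: they affect clock distribution, delay design, the interconnect structure, and so on.
[Figure: metal layers 1 to 4.]

Slide 64: Packaging problem: System in a Package
[Figure: two Si chips in one package.]
Limited pins: 10 mm / 50 micron = 200 pins.
Signal distance is large (about 10 mm), so power is higher.
Complex package.

Slide 65: From two-dimensional to three-dimensional SoC
20 stacked chips (TSV).

Slide 66: Package heat-dissipation problem: anatomy of a silicon chip
[Figure: package, Si chip, heat sink; heat flows out on one side, power and signals enter on the other.]

Slide 67: DRAM at the bottom
[Figure: package with DRAM, CPU and heat sink.]
Power and I/O signals go through the DRAM to the CPU.
Thin DRAM die.
Through-DRAM vias.
The most promising solution to feed the beast.

Slide 68: Part (6). Predictions for future exaFlops high-performance computer chips

Slide 69: Progress beyond PetaFlops
The first 10 to 20 petaflop/s supercomputers should be in service by 2019 and after that comes a machine in the 100 petaflop/s range (2019). Scientists are moderately optimistic that exaflop/s (1000 petaflop/s) mainframes can be constructed by 2018 - 2020. However, are some of these expectations just plain irrational?
(In Chinese units: 10 to 20 petaflops is 1-2 萬萬億次; 100 petaflops is 10 萬萬億次; 1 exaflops is 100 萬萬億次.)
  Number of cores per chip will double every two years.
  Clock speed will not increase (and may possibly decrease).
  Need to deal with systems with millions of concurrent threads.
  Need to deal with inter-chip parallelism as well as intra-chip parallelism.
... the future machine's architecture. At best, it will require 20 Megawatts to run. So getting to the exaflop/s level or beyond may be extremely difficult.
  500 x performance (peak)
  100 x memory
  5000 x concurrency
  3 x power
Specialized software will be needed to best make use of the massive parallelism.
Argonne's Leadership Computing Facility (ALCF) will install Mira, a next generation Blue Gene system (BG/Q), in 2019. The ALCF's stated requirements for the 10 petaflops system include approximately 0.75 million cores and 0.75 petabytes of memory, with 16 cores and 16 gigabytes of memory per node.

Slide 70: An exaFlops high-performance computer at $200M, 20 MWatt, 64PB of RAM
"The current memory paradigm is hierarchical, based on registers, L1 and L2 caches, local memory, shared memory, and distributed memory among nodes. That is a potential model for exaFLOPS systems. However, we want exaFLOPS systems to be designed to be relatively easy to program. We therefore want a globally shared address space, and explicit methods to pass data between the processors in order to orchestrate the unfolding computation. That paradigm may be necessary for a machine that has a billion threads."

Slide 71: Two anticipated routes to exaFLOPS HPC
"There are two models that we can use to get to an exaflop while staying within a 20 MW budget. The first model employs huge numbers of lightweight processors, such as IBM Blue Gene Processor running at 1.0GHz. If we use 1 million chips, and each chip has 1000 cores, then we can get to a potential billion threads of execution. The other approach is a hybrid that makes extensive use of coprocessors or GPUs. It would use a 1.0GHz processor and 10 000 floating point units per socket, and 100 000 sockets per system,"

Slide 72: IBM Mira, a 10-petaflops supercomputer
Scientists will have to scale their current computer codes to more than 750,000 individual computing cores, providing them preliminary experience on how scalability might be achieved on an exascale-class system with 100s of millions of cores. Despite a popular trend to use both central processing units (CPUs) and graphics processing units (GPUs), Mira will be based only on IBM's PowerPC chips.
The IBM BlueGene/Q supercomputer design is based on the sixteen-core IBM PowerPC A2 chip with 4-way simultaneous multi-threading technology. Each processor has at least 1GB of DDR3 memory. Featuring 750 thousand processing cores, the new supercomputer will be cooled using a special water-cooling system.
IBM Blue Gene/Q: US Department of Energy's (DOE) Argonne National Laboratory.
IBM is to build the 20-petaflops Sequoia for Lawrence Livermore National Laboratory, and is extending the Blue Gene architecture to 50 petaflops and 100 petaflops.

Slide 73: The PowerPC A2 processor of the 10-petaflops Mira
The PowerPC A2 is a massively multi-core, multi-threaded 64-bit processor of the Power architecture. IBM calls it a "wire-speed processor": it is designed as a hybrid between a traditional network processor, which does switching and routing, and a typical server processor, which processes and packages data. Versions based on the A2 core range from 16 cores at 2.3 GHz and 65 W down to 4 cores at 1.4 GHz and 20 W.
Each A2 core can execute four threads simultaneously (for comparison, Intel's Hyper-Threading runs two). The chip has 8MB of cache and, besides the general-purpose cores, a set of task-specific engines, for example for XML, encryption and decryption, compression and regular-expression acceleration, together with four 10G Ethernet interfaces and two PCIe links. Up to four chips can be linked into an SMP (symmetric multiprocessor) system without any additional support chips. The chips are said to be extremely complex: they use 1.43 billion transistors, with a die size of 428 mm2 in a 45 nm process.
Note: "wire-speed processor" means that the data throughput of the processor matches the data rate of the communication standard. IBM explains the concept this way: the processor is no longer a place where data is consumed, that is, where data comes to rest, but a place where data is filtered or modified and then sent on.

Slide 74: Architecture of the IBM PowerPC A2
[Block diagram labels: PLLs; pattern engine; access; x8 and x4 PHYs; EI3 links; miscellaneous I/O; 4x 10GE MAC or 4x 1GE MAC; pervasive logic; two PCI Express Gen 2 blocks with a root engine and a root/endpoint engine; host Ethernet controller / packet processor; PBus macro; PBus external controller; PBICs.]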
[Block diagram labels, continued: PBICs; accelerators (Comp / Decomp, Crypto, XML); two memory controllers (MC), each with a memory PHY; four 2MB L2 blocks labelled AT0 to AT3.]

Slide 75: Acceleration and interconnect of the IBM PowerPC A2
Four chips can be interconnected into an SMP. 4 channels, 800-1600 MHz.
  Technology                IBM 45nm SOI
  Core frequency            2.3GHz @ 0.97V (worst case process)
  Chip size                 428 mm2 (including kerf)
  Chip power (4-AT node)    65W @ 2.0GHz, 0.85V, max single chip
  Chip power (1-AT node)    20W @ 1.4GHz, 0.77V, min single chip
  Main voltage (VDD)        0.7V to 1.1V
  Metal layers              11 Cu (3-1x, 2-1.3x, 3-2x, 1-4x, 2-10x)
  Latch count               3.2M
  Transistor count          1.43B
  A2 cores / threads        16 / 64
  L1 I & D cache            16 x (16KB + 16KB) SRAM
  L2 cache                  4 x 2MB eDRAM
  Hardware accelerators     Crypto, Compression, Re
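The scale figures quoted on the Mira and exaFLOPS slides can be cross-checked with a little arithmetic. The sketch below is an illustration, not part of the original deck: it takes the numbers as quoted (0.75 million cores, 16 cores and 16 GB per node, 4 threads per A2 core, the two exaflop models, the 20 MW budget), uses decimal units (1 PB = 10^6 GB), and assumes one floating-point operation per floating-point unit per cycle, which the source does not state. All function names are hypothetical.

```python
# Figures as quoted on the slides.
MIRA_CORES = 750_000        # "approximately 0.75 million cores"
CORES_PER_NODE = 16
GB_PER_NODE = 16
THREADS_PER_CORE = 4        # 4-way SMT on the PowerPC A2


def mira_summary():
    """Return (nodes, total memory in PB, hardware threads) for Mira."""
    nodes = MIRA_CORES // CORES_PER_NODE
    memory_pb = nodes * GB_PER_NODE / 1e6   # decimal units: 1 PB = 1e6 GB
    threads = MIRA_CORES * THREADS_PER_CORE
    return nodes, memory_pb, threads


def lightweight_model(chips=1_000_000, cores_per_chip=1_000):
    """Model 1: many lightweight cores; returns threads of execution."""
    return chips * cores_per_chip


def hybrid_model(clock_hz=1.0e9, fpus_per_socket=10_000, sockets=100_000,
                 flops_per_fpu_per_cycle=1):  # assumption, not in the source
    """Model 2: coprocessor/GPU hybrid; returns peak flop/s."""
    return clock_hz * fpus_per_socket * sockets * flops_per_fpu_per_cycle


def gflops_per_watt(flops_per_s, watts=20e6):
    """Energy efficiency needed to fit a given power budget."""
    return flops_per_s / watts / 1e9


if __name__ == "__main__":
    nodes, memory_pb, threads = mira_summary()
    print(f"Mira: {nodes} nodes, {memory_pb} PB, {threads} threads")
    print(f"Lightweight model: {lightweight_model():.0e} threads")
    print(f"Hybrid model: {hybrid_model():.0e} flop/s peak")
    print(f"Needed: {gflops_per_watt(hybrid_model()):.0f} GFLOPS per watt")
```

With these inputs the per-node figures reproduce the quoted 0.75 PB of memory, and the hybrid model reaches 10^18 flop/s only if every floating-point unit retires one operation per cycle.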