Lecture 7-1: Massively Parallel Processing (MPP) Systems

Gu Zhimin

Petaflop supercomputer: Tianhe-1 (2009)

Features of Tianhe-1
China's first petaflop supercomputer system, Tianhe-1, was developed by the National University of Defense Technology. In the 2009 top-100 ranking published by the China HPC TOP100 organization, Tianhe-1 ranked first. Experts regard Tianhe-1 as another major innovation in China's development of strategic high technology and large-scale scientific infrastructure. It took China's independent supercomputer-building capability from the 100-teraflop level to the petaflop level, making China the second country in the world, after the United States, able to build a petaflop supercomputer system.
The system's peak performance is 1,206 trillion (about 1.2 × 10^15) double-precision floating-point operations per second. Total memory capacity is 98 TB, point-to-point communication bandwidth is 40 Gb/s, and shared disk capacity is 1 PB. The system is notable for high performance, high energy efficiency, high security, and ease of use, and its overall technical level ranks among the world's best.

IBM petaflop supercomputer
The computer system IBM built for Los Alamos National Laboratory in the United States became the world's first supercomputer to exceed one quadrillion (10^15) calculations per second. Of the top 10 systems in the ranking, 5 came from IBM; of the top 50, 17; of the top 100, 35. In addition, the Dawning 5000A at the Shanghai Supercomputer Center ranked 15th. On the full TOP500 list, 188 supercomputers came from IBM, while 212 came from HP.

1 MPP (Massively Parallel Processing)
MPP (massively parallel processing) is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory. Typically, MPP processors communicate using some messaging interface. In some implementations, up to 200 or more processors can work on the same application.
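
The message-passing style described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not code from the lecture: Python threads stand in for processors, each one works only on the partition it receives in a message (a `queue.Queue`), and it answers with another message. A real MPP node has its own memory and would use a messaging library such as MPI; here the shared-nothing discipline holds only by convention. The names `node` and `parallel_search` are invented for this example.

```python
import queue
import threading
from typing import Callable, List

# Hypothetical sketch: each thread plays one MPP node. Nodes never touch each
# other's data; the only interaction is through explicit messages (queues).


def node(rank: int, inbox: "queue.Queue", outbox: "queue.Queue",
         predicate: Callable[[int], bool]) -> None:
    """Receive one partition, scan it with purely local work, reply with matches."""
    partition = inbox.get()                                  # message in: work assignment
    matches = [rec for rec in partition if predicate(rec)]   # local computation only
    outbox.put((rank, matches))                              # message out: partial result


def parallel_search(records: List[int], n_nodes: int,
                    predicate: Callable[[int], bool]) -> List[int]:
    """Scatter records to n_nodes simulated nodes, then gather and merge their answers."""
    inboxes = [queue.Queue() for _ in range(n_nodes)]
    results: "queue.Queue" = queue.Queue()
    nodes = [threading.Thread(target=node, args=(r, inboxes[r], results, predicate))
             for r in range(n_nodes)]
    for t in nodes:
        t.start()
    # Scatter: round-robin partitioning, node r gets records r, r+n, r+2n, ...
    for r in range(n_nodes):
        inboxes[r].put(records[r::n_nodes])
    # Gather: exactly one reply per node, in whatever order the nodes finish.
    found: List[int] = []
    for _ in range(n_nodes):
        _rank, matches = results.get()
        found.extend(matches)
    for t in nodes:
        t.join()
    return sorted(found)


if __name__ == "__main__":
    print(parallel_search(list(range(1000)), n_nodes=4, predicate=lambda x: x % 97 == 0))
```

Round-robin partitioning is used here because it needs no knowledge of the data; a parallel database would more likely partition by key hash or key range so that a query can skip nodes that cannot hold matches.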

An interconnect arrangement of data paths allows messages to be sent between processors. Typically, the setup for MPP is more complicated, requiring thought about how to partition a common database among processors and how to assign work among the processors. An MPP system is also known as a loosely coupled or shared-nothing system.
An MPP system is considered better than a symmetric multiprocessing (SMP) system for applications that allow a number of databases to be searched in parallel. These include decision support systems and data warehouse applications.

2 MPP Architecture
[Figure: MPP with/without SMP. Labels: high-speed network (HSN); local interconnect network; NIC; P/C; M; disks and other I/O; SMP/single processor]

SMP
- 2 to 64 processors today
- Shared-everything architecture
- All processors share all the available global resources
- A single copy of the OS runs on these systems

MPP
- A large parallel processing system with a shared-nothing architecture

- Consists of several hundred nodes with a high-speed interconnection network/switch
- Each node consists of a main memory and one or more processors
- Each node runs a separate copy of the OS

3 Scalability
Scalability: if an application needs more MIPS or megabytes, additional processors can be added to help solve the problem. This requires:
- a physically distributed main-memory structure (distributed memory system);
- processing power balanced against main-memory and I/O capability, so that data reach the processors quickly;
- computing power balanced against parallelism and interaction capability, so that the overhead of process/thread management, communication, and synchronization is minimal;
- scalability built on the conditions above.

In a massively parallel processing system, current levels of technology allow for:
- thousands of processors per system
- tens to hundreds of megabytes of RAM per processor
- gigabytes of disk storage per processor
- tens of megabytes/sec of global communication bandwidth per processor
- hundreds of MIPS/MFLOPS per processor

4 System Cost
The cost of every component in an MPP system must be controlled. Measures taken:
1. Exploit Moore's law (performance doubles every 18 to 24 months) by choosing commercial microprocessors designed for PCs, small systems, or workstations;
2. Adopt a shell architecture (with the shell approach, the rest of the system does not need to change), which lets the system scale across generations of (microprocessor) components.
However, this also creates problems: the physical address space is too small, the TLB (translation look-aside buffer) is too small, single-word-stride access is very inefficient, and so on.

5 Generality and Availability
- Supports MIMD;
- Supports PVM, MPI, and HPF;
- Supports node partitioning;
- High availability;
- Other: supports communication requirements; supports scalable I/O performance.

Some difficulties
- Poor actual performance: Rmax < Rpeak;
- Parallel programs are hard to write; new programming tools are needed.

If the system is designed intelligently, the overall performance of the system (global communication bandwidth, MIPS, MFLOPS, etc.) will scale up linearly with the system size. It should be noted, though, that the degree to which performance can be extracted from an MPP system is very algorithm-dependent.
Undoubtedly, the level of computing power available in a large MPP system will increase dramatically over time. Processor speeds and memory sizes are doubling approximately every eighteen months, and MPP manufacturers will adopt this increase quickly. This means that the age of the teraflop/terabyte computer is not far off, and extremely large amounts of data will be analyzable with this much processing power.

7 Example 1: Cray T3E architecture (NCC-NUMA + DSM)
[Figure: Cray T3E node: Alpha 21164 processor, memory control and registers, router, shell; nodes linked by a three-dimensional bidirectional torus; I/O devices attached through GigaRing channels]

8 Cray T3E Performance
- 300 MHz processor
- Each processor Rpeak = 600 MFLOPS
- 6 to 2048 processors
- System Rpeak = 3.6 to 1228 GFLOPS
- Memory size = 1 to 4096 GB
- Memory Rpeak = 7.2 to 2450 GB/s
- Network Rpeak = 600 MB/s

9 T3E System Software and Price
- UNICOS/mk (64-bit Unix)
- PVM, MPI, HPF
- C/C++
- TotalView parallel program debugger
- MPP Apprentice parallel performance analysis tool
- Price: US$1 million; delivered in 1995.
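
The Cray T3E performance figures above are peak rates, so they follow from multiplying per-processor numbers by the processor count. The sketch below redoes that arithmetic. It assumes, as the slide does not say explicitly, that the Alpha 21164 completes 2 floating-point operations per clock (which is what turns 300 MHz into 600 MFLOPS), and it takes 1.2 GB/s per processor as the memory bandwidth implied by 7.2 GB/s over 6 processors. The function name `system_peak` is hypothetical.

```python
from typing import Tuple

# Peak-rate arithmetic for the Cray T3E figures listed above.
# Assumption: 2 floating-point operations per clock on the Alpha 21164.
CLOCK_MHZ = 300
FLOPS_PER_CYCLE = 2
PER_PROC_MFLOPS = CLOCK_MHZ * FLOPS_PER_CYCLE      # 600 MFLOPS per processor

# Per-processor memory bandwidth implied by the smallest configuration
# (7.2 GB/s spread over 6 processors).
PER_PROC_MEM_GBS = 1.2


def system_peak(processors: int) -> Tuple[float, float]:
    """Return (peak GFLOPS, peak memory bandwidth in GB/s) for a processor count.

    Peak (Rpeak) figures scale linearly with the processor count; sustained
    performance (Rmax) on real programs is lower.
    """
    gflops = processors * PER_PROC_MFLOPS / 1000
    mem_gbs = processors * PER_PROC_MEM_GBS
    return gflops, mem_gbs


for n in (6, 2048):
    gflops, mem = system_peak(n)
    print(f"{n:5d} processors: {gflops:7.1f} GFLOPS peak, {mem:7.1f} GB/s memory")
```

Both ends of the processor range reproduce the slide's GFLOPS range (3.6 and 1228.8). Linear scaling of 1.2 GB/s gives 2457.6 GB/s at 2048 processors, slightly above the 2450 GB/s on the slide. Because Rmax < Rpeak, all of these numbers are upper bounds.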

10 Example 2: Intel/Sandia ASCI Option Red (delivered 1997, NORMA architecture)
- 4608 nodes: 4536 compute nodes, 32 service nodes, 24 I/O nodes, 2 system nodes, 14 backup nodes
- 1540 power supplies; 616 mainboards; 640 disks
- Two 200 MHz Pentium Pro processors per node
- 594 GB of memory

11 Intel/Sandia ASCI Option Red (architecture of the mesh routing component)
[Figure: a grid of mesh routing components (MRCs), each connected to the NIC of a mainboard]

12 Dual-node (4-CPU) mainboard structure
[Figure: four P6 processors with L2 caches; NICs; boot support; memory control with SIMMs; I/O bridges with expansion connectors; 64-bit, 66 MHz local bus; ICF; PCI bus]

13 Single-node (2-CPU) mainboard structure
[Figure: two P6 processors with L2 caches; NIC; boot support; memory control with SIMMs; I/O bridge with expansion connector; 64-bit, 66 MHz local bus; ICF; PCI bus]

14 ASCI Option Red system diagram
[Figure: disks, PCI nodes, compute nodes, service nodes, an Ethernet node, a node station (SSI), a boot node, and a system node, grouped into I/O, computing, and service partitions]

What is single system image (SSI)?
A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource. SSI makes the MPP/cluster appear like a single machine to the user, to applications, and to the network.

15 System software
- Paragon (based on OSF Unix)
- Compute nodes run Cougar (a lightweight kernel)
- MPI, NX message library
- C/C++

MPP network review
[Figure: MPP network review, without multithreading support and with multithreading support]

A related model to SIMD is vector processing. Goodyear MPP, 1983.
MIMD: IBM RS/6000 SP2 with 256 processors. This distributed-memory machine is built using boards from desktop computers, largely unchanged, plus a custom switch as the interconnect. Photo courtesy of the Lawrence Livermore National Laboratory.
[Figure: scalability vs. single system image]

16 Cluster Systems: Introduction

| System characteristic | SMP | MPP | Cluster |
|---|---|---|---|
| Number of nodes | ≤ 10 | 100 to 1000 | ≤ 100 |
| Node complexity | Medium or fine grain | Medium or fine grain | Medium or coarse grain |
| Communication | Shared memory | Shared memory or message passing | Message passing |
| Node OS | 1 | N kernels and 1 host OS | N |
| SSI | Always | Partial | Desired |
| Address space | Single | Multiple or single | Multiple |
| Job scheduling | Single queue | Single queue on the host | Cooperative multiple queues |
| Network protocol | Nonstandard | Nonstandard | Nonstandard or standard |
| Availability | Low | Low to medium | High or fault-tolerant |
| Performance/price | Average | Average | High |

Computer clusters (cluster of computers)
A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource. It is a collection of workstations or PCs that:
- are interconnected by a high-speed network;
- work as an integrated collection of resources;
- have a single system image spanning all its nodes.

Architecture of a cluster
[Figure: sequential applications, parallel applications, and a parallel programming environment (PVM, MPI, Java) run on cluster middleware (SSI, availability), which sits on the OS of each node; nodes are joined by a high-speed interconnection network (HSN)]

Computer clusters built with a network
Connection method 1 (shared nothing): [Figure: nodes, each with D, P/C, M, MIO, and NIC, connected by a LAN]
Connection method 2 (shared disk): [Figure: nodes, each with D, P/C, M, MIO, and NIC, attached to a shared disk]
19 Connection method (shared memory): [Figure: nodes, each with D, P/C, M, MIO, and NIC, joined through SCI]

21 Design points
- Availability: make full use of redundant resources so that the system serves users for as much of the time as possible;
- Single system image (SSI): provide unified access to system resources by combining the OSes of the individual nodes;
- Job management;
- PFS;
- An efficient communication system is needed.

Checkpointing for availability: checkpoint(a, b, c) can take place at three levels: the kernel, a library, or the application program.
[Figure: three processes with events a, b, d, c; x, y, z; and p, q, r; consistent snapshot]
Checkpoint cons…
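
The checkpointing note above depends on taking a consistent snapshot. A standard criterion, not spelled out in the lecture, is that a set of per-process checkpoints (a "cut") is consistent when no message is recorded as received without also being recorded as sent; such an "orphan" message describes a global state that could never have occurred. Messages sent but not yet received are allowed: they are in transit, and a checkpointing protocol must save or replay them. The sketch below checks only that criterion. The event layout, the message pattern, and the functions `recorded`, `orphans`, and `is_consistent` are hypothetical; the event names merely echo the slide's labels.

```python
from typing import Dict, List, Tuple

# Hypothetical example: each process's events in local order.
EVENTS: Dict[str, List[str]] = {
    "P1": ["a", "b", "c", "d"],
    "P2": ["x", "y", "z"],
    "P3": ["p", "q", "r"],
}
# Messages as (send event, receive event); this pattern is invented.
MESSAGES: List[Tuple[str, str]] = [("b", "y"), ("z", "q")]


def locate(events: Dict[str, List[str]], name: str) -> Tuple[str, int]:
    """Return (process, local index) of an event."""
    for proc, evs in events.items():
        if name in evs:
            return proc, evs.index(name)
    raise KeyError(name)


def recorded(cut: Dict[str, int], events: Dict[str, List[str]], name: str) -> bool:
    """cut[proc] = how many of proc's first events its checkpoint includes."""
    proc, idx = locate(events, name)
    return idx < cut[proc]


def orphans(cut: Dict[str, int], events: Dict[str, List[str]],
            messages: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Messages recorded as received but not recorded as sent."""
    return [(s, r) for s, r in messages
            if recorded(cut, events, r) and not recorded(cut, events, s)]


def is_consistent(cut: Dict[str, int], events: Dict[str, List[str]],
                  messages: List[Tuple[str, str]]) -> bool:
    """A cut is consistent when it contains no orphan messages."""
    return not orphans(cut, events, messages)


if __name__ == "__main__":
    print(is_consistent({"P1": 2, "P2": 2, "P3": 1}, EVENTS, MESSAGES))
    print(orphans({"P1": 1, "P2": 2, "P3": 2}, EVENTS, MESSAGES))
```

In the second call, P2's checkpoint includes receiving y while P1's excludes sending b (and likewise for z and q), so rolling back to that cut would leave P2 holding data that P1 has not yet sent.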
