




Port AMSS-NCKU code to GPU
Zhoujian Cao
Academy of Mathematics and System Science, CAS
Cowork with Zhihui Du, Steven Brandt, Frank Loeffler and Quan Yang
2013-8-7
2013 International School on Numerical Relativity and Gravitational Waves, Pohang, Korea

Outline
Motivations from gravitational wave detection
New parallel mesh refinement numerical scheme
GPU acceleration for NR
Summary

The most stringent test of GR
The anomalous precession of the perihelion of Mercury (1915, v )
Deflection of starlight (1919, v )
Gravitational redshift (1965, v )
Gravitational time delay effect (1968, v )
Evidence of gravitational waves (1978, v )
Frame-dragging effect (2010, v )
Direct gravitational wave detection (?, v1)
GR = Newtonian Gravity + PN(v) + PN(v^2) + ...

Gravitational wave astronomy
Search back to the extremely early universe
Hear the dark universe

Gravitational wave and its detection

Category of Black Holes
Supermassive black hole: M: 10^5 to 10^9 Msun
Stellar-mass black hole: M: 1 to 10s Msun
Intermediate-mass black hole (IMBH): M: 10s to 10^5 Msun (mainly in globular clusters)
Farrell, et al, Nature 460 (2009) 73; Feng, et al, New Astronomy Reviews 55 (2011) 166

Category of Black Holes Binary

IMBH and GW detection
ALIA: Xuefei Gong, et al, CQG 28, 094012 (2011)
1:1000, 1:1
Advanced LIGO: Abadie, et al, PRD 85, 102004 (2012)

Data analysis and template
Ref to Sang Hoon Oh's lecture

Template model for BBH?
Yi Pan's talk, 2013

Template model for BBH
PN templates: for the early stage of the inspiral
EOBNR (effective one body model together with numerical relativity): for the full inspiral + merger + ring down stage; works well for mass ratio less than 1:8 and for extreme mass ratio BBH, high spin, precession!
But no reliable template for mass ratio 1:10 to 1:100

PN estimation
From a given separation of the two BHs, the number of orbits increases quickly as the mass ratio increases. The cost of a numerical simulation with full GR increases accordingly. Compared to 1:1, 1:100 needs 10 times more computational cost.

Computational cost
1:1, 9 days
1:100, 20 days
LSSC cluster II, 128 CPUs, for the last 2 orbits
Computational cost 1 to 20!

Challenge of large mass ratio BBH to NR
Compared to 1:1, the computational cost of a 1:100 BBH increases roughly 200 times!
A typical simulation of a 1:1 BBH needs 14 days. So with the straightforward method, 1:100 needs roughly 1 year!

Possible ways out
1. Physical level: approximation methods, such as the self-force framework (but still first order only), ...
2. Numerical algorithm level: implicit scheme [R. Lau et al, PRD 84, 084023 (2011)], combining Cauchy evolution with null evolution, ...
3. Computer level: improve scalability to use more CPUs, use GPU, ...

Mesh refinement scheme
High resolution mesh grids for the region near the BH, low resolution mesh grids for the far region

Mesh refinement in CFD
Result based on PARAMESH
PARAMESH, GrACE, JASMIN

Comparison of NR and CFD
NR (only for BH): computationally expensive at a single grid point, but the functions are quite smooth; few grid points (hundreds), high order finite difference
CFD: computation at a single point is cheap, but the fluid dynamics is quite complex (compare the lectures on HD); the number of grid points is quite large (millions)

Mesh refinement scheme
Scheme adopted by PARAMESH
[Figure: Level 0 and Level 1 grids in the t-x plane]

Mesh refinement scheme
Scheme for NR: F. Loeffler et al, CQG 29, 115001 (2012)
[Figure: Level 0 and Level 1 grids]
LS scheme: distribute data along one level to the available processes

Mesh refinement scheme
Parallelization limit: 200 x 200 x 200 grid points with 6th order finite difference (8 ghost points for two sides) bound the number of usable processes
How about distributing data on all levels and calculating them in parallel?

Parallel mesh level algorithm
PX scheme: distribute data on all levels to all processes; calculate in parallel

Mesh refinement scheme (PX)
Procs for lev0   Procs for lev1   Procs for lev2
run              run              run
wait             wait             run
wait             run              run
wait             wait             run
run              run              run
(time runs downward)
Strong scaling property, because there is more data to distribute;
Resource wasting (Lx procs of LS) due to waiting!
Calculation speed: 2 times faster!

Parallel mesh level algorithm
P2 scheme: distribute data on the finest level to half of the processes, and distribute data on the other levels, one level at a time, to the other half; the finest level is calculated in parallel with the other levels, while the other levels are calculated sequentially
[Figure: lev0, lev1, lev2]

Mesh refinement scheme (P2)
Procs for lower levels   Procs for lev2
lev1                     run
lev0                     run
lev1                     run
wait                     run
lev1                     run
(time runs downward)
Scaling property is weaker than PX;
Less waiting (2x procs LS)!
Calculation speed: 2 times faster!

Comparison to LS scheme

More complicated case
[Figure: lev0, lev1, lev2 in the t-x plane]
Now, procs for the finest level have to wait!

GPU acceleration
For system biology: Yamazaki, Igarashi, Neural Networks, 2013
For GW data analysis: Zhihui Du, et al, CQG 29, 235018 (2012)

Put RHS calculation to GPU
For AMSS-NCKU code, time for RHS calculation: 80%
The RHS function involves too many variables; even just passing their addresses is time consuming
So pack these addresses and store them in constant memory (they are not passed again during evolution); this saves shared memory at the same time

Put RHS calculation to GPU
Keep the data on GPU until MPI data transfer between different processes
Use the buffer point method to reduce MPI transfers for RK4 from 4 times to only 1 time; this also reduces the number of data transfers between GPU and CPU

Put RHS calculation to GPU
Arrange shared memory
Divide the RHS calculation into 8 parts, so that the memory requirement of each part can be satisfied by shared memory
For one RHS calculation, copy data from global memory to shared memory once and use shared memory most of the time

Put restrict-prolong to GPU
After putting RHS on GPU, the most time consuming part is Restrict-Prolong interpolation
How to treat this part? The work is going on
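
The run/wait pattern listed for the PX scheme can be reproduced with a small sketch. This is only an illustration, not code from AMSS-NCKU: the names `px_schedule` and `idle_fraction` are hypothetical, and the sketch assumes that the time step halves on each finer level and that each level is owned by its own process group of equal size.

```python
def px_schedule(num_levels, num_substeps):
    """Return one row per finest-level substep.

    Each row lists 'run' or 'wait' for the process group that owns
    level 0, 1, ..., num_levels - 1 (hypothetical layout, one group per level).
    """
    rows = []
    for k in range(num_substeps):
        row = []
        for level in range(num_levels):
            # Assumption: the time step halves on each finer level, so this
            # level advances once every 2**(num_levels - 1 - level) substeps.
            stride = 2 ** (num_levels - 1 - level)
            row.append("run" if k % stride == 0 else "wait")
        rows.append(row)
    return rows


def idle_fraction(num_levels):
    """Fraction of (group, substep) slots spent waiting in one coarse step."""
    substeps = 2 ** (num_levels - 1)
    rows = px_schedule(num_levels, substeps)
    slots = num_levels * substeps
    waits = sum(cell == "wait" for row in rows for cell in row)
    return waits / slots


if __name__ == "__main__":
    # Five substeps of a three-level hierarchy, as in the PX table.
    for row in px_schedule(3, 5):
        print("  ".join(f"{cell:<4}" for cell in row))
    print(f"idle fraction for 3 levels: {idle_fraction(3):.3f}")
```

Under these assumptions, three levels leave 5 of the 12 slots in one coarse step idle, which is the kind of resource wasting the PX slide points out; the P2 scheme reduces it by letting one group handle all lower levels in turn.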