Parallel Programming, Chinese Courseware 06: OpenMP Multithreaded Programming

Parallel Programming

Instructor: Zhang Weizhe (張偉哲)
Computer Network and Information Security Technique Research Center, School of Computer Science and Technology, Harbin Institute of Technology

Programming with OpenMP

Outline

What Is OpenMP*?

- OpenMP is a set of extensions to Fortran/C/C++.
- OpenMP contains compiler directives, library routines and environment variables.
- Available on most single address space machines:
  - shared memory systems, including cc-NUMA
  - Chip MultiThreading: Chip MultiProcessing (Sun UltraSPARC IV), Simultaneous Multithreading (Intel Xeon)
  - not on distributed memory systems, classic MPPs, or PC clusters (yet!)

What Is OpenMP*?

- Compiler directives for multithreaded programming
- Easy to create threaded Fortran and C/C++ codes
- Supports the data parallelism model
- Incremental parallelism
- Combines serial and parallel code in a single source

What Is OpenMP*?

    omp_set_lock(lck)
    #pragma omp parallel for private(A, B)
    #pragma omp critical
    C$OMP parallel do shared(a, b, c)
    C$OMP PARALLEL REDUCTION (+: A, B)
    call OMP_INIT_LOCK (ilok)
    call omp_test_lock(jlok)
    setenv OMP_SCHEDULE "dynamic"
    CALL OMP_SET_NUM_THREADS(10)
    C$OMP DO lastprivate(XX)
    C$OMP ORDERED
    C$OMP SINGLE PRIVATE(X)
    C$OMP SECTIONS
    C$OMP MASTER
    C$OMP ATOMIC
    C$OMP FLUSH
    C$OMP PARALLEL DO ORDERED PRIVATE (A, B, C)
    C$OMP THREADPRIVATE(/ABC/)
    C$OMP PARALLEL COPYIN(/blk/)
    Nthrds = OMP_GET_NUM_PROCS()
    !$OMP BARRIER

- Current spec is OpenMP 2.5: 250 pages (combined C/C++ and Fortran)

OpenMP Syntax

- Most of the constructs in OpenMP are compiler directives or pragmas.
- For C and C++, the pragmas take the form:

    #pragma omp construct [clause [clause]…]

- For Fortran, the directives take one of the forms:

    C$OMP construct [clause [clause]…]
    !$OMP construct [clause [clause]…]
    *$OMP construct [clause [clause]…]

- Since the constructs are directives, an OpenMP program can be compiled by compilers that don't support OpenMP.

OpenMP Programming Model

- Fork-Join Parallelism: the master thread spawns a team of threads as needed.
- Parallelism is added incrementally: i.e. the sequential program evolves into a parallel program.

OpenMP: How is OpenMP Typically Used?

OpenMP is usually used to parallelize loops:

- Find your most time-consuming loops.
- Split them up between threads.

Sequential program:

    int main(void) {
        double Res[1000];
        for (int i = 0; i < 1000; i++) {
            do_huge_comp(Res[i]);
        }
    }

Parallel program (split up this loop between multiple threads):

    int main(void) {
        double Res[1000];
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++) {
            do_huge_comp(Res[i]);
        }
    }

OpenMP vs. POSIX Threads

- POSIX threads is the other widely used shared-memory programming API.

- Fairly widely available, usually quite simple to implement on top of OS kernel threads.
- Lower level of abstraction than OpenMP:
  - library routines only, no directives
  - more flexible, but harder to implement and maintain
- OpenMP can be implemented on top of POSIX threads.
- Not much difference in availability:
  - not that many OpenMP C++ implementations
  - no standard Fortran interface for POSIX threads

OpenMP Constructs

OpenMP's constructs fall into 5 categories:

- Parallel Regions
- Worksharing
- Data Environment
- Synchronization
- Runtime functions/environment variables

OpenMP is basically the same between Fortran and C/C++.

OpenMP: Parallel Regions

- You create threads in OpenMP with the "omp parallel" pragma.
- For example, to create a 4-thread parallel region:

    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }

- Each thread redundantly executes the code within the structured block.
- Each thread calls pooh(ID, A) for ID = 0 to 3.

How Many Threads?

- Set the environment variable for the number of threads:

    set OMP_NUM_THREADS=4

- There is no standard default for this variable.
- Many systems: # of threads = # of processors.
- Intel® compilers use this default.

OpenMP: Work-Sharing Constructs

- Splits loop iterations among threads
- Must be in the parallel region
- Must precede the loop

    #pragma omp parallel
    #pragma omp for
    for (I = 0; I < N; I++) {
        NEAT_STUFF(I);
    }

- By default, there is a barrier at the end of the "omp for". Use the "nowait" clause to turn off the barrier.

Work-sharing Construct

- Threads are assigned an independent set of iterations.
- Threads must wait at the end of the work-sharing construct.

    #pragma omp parallel
    #pragma omp for
    for (i = 1; i < 13; i++)
        c[i] = a[i] + b[i];

(Figure: iterations i = 1 through i = 12 divided among the threads, followed by an implicit barrier.)

Work Sharing Constructs

A motivating example

Sequential code:

    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region:

    #pragma omp parallel
    {
        int id, i, Nthrds, istart, iend;
        id = omp_get_thread_num();
        Nthrds = omp_get_num_threads();
        istart = id * N / Nthrds;
        iend = (id + 1) * N / Nthrds;
        for (i = istart; i < iend; i++) { a[i] = a[i] + b[i]; }
    }

OpenMP parallel region and a work-sharing for construct:

    #pragma omp parallel
    #pragma omp for schedule(static)
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

Assigning Iterations to Threads

- The schedule clause of the for directive deals with the assignment of iterations to threads.
- The general form of the schedule clause is schedule(scheduling_class[, parameter]).
- OpenMP supports four scheduling classes: static, dynamic, guided, and runtime.

Assigning Iterations to Threads: Example

    /* static scheduling of matrix multiplication loops */
    /* default(private) is not accepted in C/C++ by OpenMP 2.5,
       so the loop indices are listed in private() instead */
    #pragma omp parallel private(i, j, k) shared(a, b, c, dim) \
            num_threads(4)
    #pragma omp for schedule(static)
    for (i = 0; i < dim; i++) {
        for (j = 0; j < dim; j++) {
            c[i][j] = 0;
            for (k = 0; k < dim; k++) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

Which Schedule to Use

    Schedule clause   When to use
    STATIC            Predictable and similar work per iteration
    DYNAMIC           Unpredictable, highly variable work per iteration
    GUIDED            Special case of dynamic to reduce scheduling overhead

Parallel Sections

- Independent sections of code can execute concurrently.

    #pragma omp parallel sections
    {
        #pragma omp section
        phase1();
        #pragma omp section
        phase2();
        #pragma omp section
        phase3();
    }

(Figure: serial versus parallel execution of the three phases.)

Data Environment

- OpenMP uses a shared-memory programming model.
- Most variables are shared by default.
- Global variables are shared among threads:
  - C/C++: file scope variables, static
- But, not everything is shared...
  - Stack variables in functions called from parallel regions are PRIVATE.
  - Automatic variables within a statement block are PRIVATE.
  - Loop index variables are private (with exceptions):
    - C/C++: the first loop index variable in nested loops following a #pragma omp for

Data Scope Attributes

- The default status can be modified with:

    default(shared | none)

- Scoping attribute clauses:

    shared(varname, …)
    private(varname, …)

The Private Clause

- Reproduces the variable for each thread.
- Variables are un-initialized; a C++ object is default constructed.
- Any value external to the parallel region is undefined.

    /* a and b are assumed to be arrays declared at file scope */
    void work(float *c, int N) {
        float x, y;
        int i;
        #pragma omp parallel for private(x, y)
        for (i = 0; i < N; i++) {
            x = a[i];
            y = b[i];
            c[i] = x + y;
        }
    }

OpenMP: Reduction

Another clause that affects the way variables are shared:

    reduction(op : list)

- The variables in "list" must be shared in the enclosing parallel region.
- Inside a parallel or a worksharing construct:
  - A local copy of each list variable is made and initialized depending on the "op" (e.g. 0 for "+").
  - Each thread applies "op" pairwise to update its local copy.
  - Local copies are reduced into a single global copy at the end of the construct.

OpenMP: A Reduction Example

    #include <omp.h>
    #define NUM_THREADS 2
    double func(int i);

    int main(void) {
        int i;
        double ZZ, sum = 0.0;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel for reduction(+:sum) private(ZZ)
        for (i = 0; i < 1000; i++) {
            ZZ = func(i);
            sum = sum + ZZ;
        }
    }

Implicit Barriers

- Several OpenMP* constructs have implicit barriers:
  - parallel
  - for
  - single
- Unnecessary barriers hurt performance: waiting threads accomplish no work!
- Suppress implicit barriers, when safe, with the nowait clause.

OpenMP: Synchronization

OpenMP has the following constructs to support synchronization:

- barrier
- critical section
- atomic
- flush
- ordered
- single
- master

Barrier Construct

- Explicit barrier synchronization
- Each thread waits until all threads arrive.

    #pragma omp parallel shared(A, B, C)
    {
        DoSomeWork(A, B);
        printf("Processed A into B\n");
        #pragma omp barrier
        DoSomeWork(B, C);
        printf("Processed B into C\n");
    }

Atomic Construct

- Special case of a critical section
- Applies only to a simple update of a memory location

    #pragma omp parallel for shared(x, y, index, n)
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        x[index[i]] += work1(i);   /* only this update is atomic */
        y[i] += work2(i);
    }
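As a rough sketch of the atomic construct described above, the following counts how many values fall into each bucket. Several iterations may hit the same counter, so the increment is protected with `#pragma omp atomic` rather than a full critical section. The function name `histogram` and its parameters are hypothetical and do not come from the slides.

```c
/* Hypothetical helper (not from the slides): counts[b] is incremented once
   for every i with bucket_of[i] == b. */
void histogram(const int *bucket_of, int n, long *counts)
{
    int i;

    #pragma omp parallel for shared(bucket_of, counts, n)
    for (i = 0; i < n; i++) {
        /* Different threads may target the same bucket, so the
           read-modify-write of this one memory location must be atomic. */
        #pragma omp atomic
        counts[bucket_of[i]] += 1;
    }
}
```

When compiled without OpenMP support the pragmas are ignored and the loop simply runs serially, giving the same counts.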

Critical and Atomic

- Only one thread at a time can enter a critical section.

    C$OMP PARALLEL DO PRIVATE(B)
    C$OMP& SHARED(RES)
          DO 100 I = 1, NITERS
            B = DOIT(I)
    C$OMP CRITICAL
            CALL CONSUME(B, RES)
    C$OMP END CRITICAL
    100   CONTINUE

- Atomic is a special case of a critical section that can be used for certain simple statements:

    C$OMP PARALLEL PRIVATE(B)
          B = DOIT(I)
    C$OMP ATOMIC
          X = X + B
    C$OMP END PARALLEL

Master directive

- The master construct denotes a structured block that is only executed by the master thread. The other threads just skip it (no implied barriers or flushes).

    #pragma omp parallel private(tmp)
    {
        do_many_things();
        #pragma omp master
        { exchange_boundaries(); }
        #pragma omp barrier
        do_many_other_things();
    }

Single directive

- The single construct denotes a block of code that is executed by only one thread.
- A barrier and a flush are implied at the end of the single block.

    #pragma omp parallel private(tmp)
    {
        do_many_things();
        #pragma omp single
        { exchange_boundaries(); }
        do_many_other_things();
    }

OpenMP: Library routines

- Lock routines:
  omp_init_lock(), omp_set_lock(), omp_unset_lock(), omp_test_lock()
- Runtime environment routines:
  - Modify/check the number of threads:
    omp_set_num_threads(), omp_get_num_threads(), omp_get_thread_num(), omp_get_max_threads()
  - Turn on/off nesting and dynamic mode:
    omp_set_nested(), omp_set_dynamic(), omp_get_nested(), omp_get_dynamic()
  - Are we in a parallel region?
    omp_in_parallel()
  - How many processors in the system?
    omp_get_num_procs()

1. Hello World!

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        int nthreads, tid;
        /* Fork a team of threads giving them their own copies of variables */
        #pragma omp parallel private(nthreads, tid)
        {
            /* Obtain thread number */
            tid = omp_get_thread_num();
            printf("Hello World from thread = %d\n", tid);

            /* Only master thread does this */
            if (tid == 0) {
                nthreads = omp_get_num_threads();
                printf("Number of threads = %d\n", nthreads);
            }
        }   /* All threads join master thread and disband */
    }

Example Code - Pthread Creation and Termination

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define NUM_THREADS 5

    void *PrintHello(void *threadid) {
        printf("\n%ld: Hello World!\n", (long)threadid);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[]) {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;   /* long, so that the cast to and from void * is well sized */
        for (t = 0; t < NUM_THREADS; t++) {
            printf("Creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }
        pthread_exit(NULL);
    }
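The OpenMP Hello World above calls runtime routines, so unlike pragma-only code it needs `omp.h` and an OpenMP runtime. One common way to keep such a file building with a compiler that has no OpenMP support is to test the standard `_OPENMP` macro. The sketch below assumes serial stand-ins for two routines; those stand-ins and the function name `hello_team` are illustrative choices, not part of the slides or of OpenMP itself.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch): one thread, number 0. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: every thread greets once. Returns 1 if the number
   of greetings equals the team size reported by the runtime, else 0. */
int hello_team(void)
{
    int greetings = 0;   /* shared counter */
    int nthreads = 0;    /* shared, written by thread 0 only */

    #pragma omp parallel shared(greetings, nthreads)
    {
        int tid = omp_get_thread_num();   /* private: declared in the block */
        printf("Hello World from thread = %d\n", tid);

        #pragma omp atomic
        greetings += 1;

        if (tid == 0)
            nthreads = omp_get_num_threads();
    }   /* implicit barrier: every thread has finished the block here */

    return greetings == nthreads;
}
```

Compared with the Pthreads version, no thread is created or terminated by hand: the parallel region forks the team and joins it at the closing brace.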


2. Parallel Loop Reduction

          PROGRAM REDUCTION
          INTEGER I, N
          REAL A(100), B(100), SUM
    !     Some initializations
          N = 100
          DO I = 1, N
            A(I) = I * 1.0
            B(I) = A(I)
          ENDDO
          SUM = 0.0
    !$OMP PARALLEL DO REDUCTION(+:SUM)
          DO I = 1, N
            SUM = SUM + (A(I) * B(I))
          ENDDO
          PRINT *, 'Sum = ', SUM
          END

3. Matrix-vector multiply using a parallel loop and critical directive

    /*** Spawn a parallel region explicitly scoping all variables ***/
    #pragma omp parallel shared(a, b, c, nthreads, chunk) private(tid, i, j, k)
    {
        tid = omp_get_thread_num();   /* tid must be set before it is printed */
        #pragma omp for schedule(static, chunk)
        for (i = 0; i < NRA; i++) {
            printf("thread=%d did row=%d\n", tid, i);
            for (j = 0; j < NCB; j++)
                for (k = 0; k < NCA; k++)
                    c[i][j] += a[i][k] * b[k][j];
        }
    }

Parallelize: Win32 API, PI

    #include <windows.h>
    #include <stdio.h>
    #define NUM_THREADS 2
    HANDLE thread_handles[NUM_THREADS];
    CRITICAL_SECTION hUpdateMutex;
    static long num_steps = 100000;
    double step;
    double global_sum = 0.0;

    void Pi(void *arg) {
        int i, start;
        double x, sum = 0.0;
        start = *(int *)arg;
        step = 1.0 / (double)num_steps;
        for (i = start; i <= num_steps; i = i + NUM_THREADS) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        /* Only one thread at a time may update the shared sum */
        EnterCriticalSection(&hUpdateMutex);
        global_sum += sum;
        LeaveCriticalSection(&hUpdateMutex);
    }

    int main(void) {
        double pi;
        int i;
        DWORD threadID;
        int threadArg[NUM_THREADS];
        for (i = 0; i < NUM_THREADS; i++)
            threadArg[i] = i + 1;
        InitializeCriticalSection(&hUpdateMutex);
        for (i = 0; i < NUM_THREADS; i++) {
            thread_handles[i] = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)Pi,
                                             &threadArg[i], 0, &threadID);
        }
        WaitForMultipleObjects(NUM_THREADS, thread_handles, TRUE, INFINITE);
        pi = global_sum * step;
        printf("pi is %f\n", pi);
    }

Doubles code size!

Solution: Keep it simple

- Threads libraries:
  - Pro: Programmer has control over everything
  - Con: Programmer must control everything

(Figure: full control, increased complexity, programmers scared away.)

- Sometimes a simple evolutionary approach is better.

PI Program: an example

    static long num_steps = 100000;
    double step;

    int main(void) {
        int i;
        double x, pi, sum = 0.0;
        step = 1.0 / (double)num_steps;
        for (i = 1; i <= num_steps; i++) {
            x = (i - 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
    }

OpenMP PI Program:

Parallel Region example (SPMD Program)

    #include <omp.h>
    static long num_steps = 100000;
    double step;
    #define NUM_THREADS 2

    int main(void) {
        int i;
        double x, pi, sum[NUM_THREADS];
        step = 1.0 / (double)num_steps;
        omp_set_num_threads(NUM_THREADS);
        #pragma omp parallel
        {
            double x;
            int i, id;   /* i is declared here so each thread has its own loop index */
            id = omp_get_thread_num();
            for (i = id, sum[id] = 0.0; i < num_steps; i = i + NUM_THREADS) {
                x = (i + 0.5) * step;
                sum[id] += 4.0 / (1.0 + x * x);
            }
        }
        for (i = 0, pi = 0.0; i < NUM_THREADS; i++) pi +
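The SPMD version above keeps one partial sum per thread and combines them by hand. The reduction clause introduced earlier expresses the same idea more compactly. The sketch below is one possible formulation, not the slides' own code; the function name `compute_pi` and its parameter are hypothetical.

```c
/* Hypothetical helper: midpoint-rule estimate of the integral of
   4 / (1 + x^2) over [0, 1], which equals pi. */
double compute_pi(long num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    long i;

    /* Each thread accumulates into its own copy of sum (initialized to 0
       for "+"); the copies are combined when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;   /* private: declared inside the loop */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Calling `compute_pi(100000)` uses the same step count as the programs above. Because the loop index and `x` are private and `sum` is a reduction variable, no explicit thread IDs or per-thread arrays are needed.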
