Intel TBB Usage (Course Slides)

Uploaded by: 3***  IP location: Hubei  Upload date: 2021-10-24  Format: PPTX  Pages: 14


Document Summary

Several features of TBB

Tasks let you work at a higher level of abstraction than threads. Intel claims that on Linux, starting and terminating a task is 18 times faster than doing the same with a thread. Intel TBB ships with a task scheduler that balances load efficiently across multiple logical and physical cores. The default task-scheduling policy in Intel TBB differs from the round-robin policy used by most thread schedulers.

Intel TBB also provides:
• Ready-to-use thread-safe containers, such as concurrent_vector and concurrent_queue.
• Generic parallel algorithms, such as parallel_for and parallel_reduce.
• Lock-free (also called mutex-free) concurrent programming support through the template class atomic. This makes Intel TBB suitable for high-performance applications, because code can avoid the cost of locking and unlocking mutexes.

All of this is implemented in C++! Intel TBB uses no language extensions or macros, only standard C++ with extensive use of templates.

The contents of the figure above fall into the following categories:
• Generic parallel algorithms: TBB provides parallel_for, parallel_while, parallel_reduce and other algorithms for different parallel scenarios.
• Concurrent containers: thread-safe implementations of common containers. With performance in mind, they use fine-grained locking. The containers provided in TBB 2.0 include a hash map, a vector and a queue.
• Task scheduler: wraps the task mechanism.
• Synchronization primitives: wrappers for atomic operations, mutexes, locks and other synchronization primitives.
• Memory allocation: cache-friendly allocation support.

parallel_for
• Summary
parallel_for is a template function that performs parallel iteration over a range of values.
• Syntax (arguments in square brackets are optional)
```cpp
template<typename Index, typename Func>
void parallel_for(Index first, Index last, const Func& f
                  [, task_group_context& group]);

template<typename Index, typename Func>
void parallel_for(Index first, Index last, Index step, const Func& f
                  [, task_group_context& group]);

template<typename Range, typename Body>
void parallel_for(const Range& range, const Body& body
                  [, partitioner [, task_group_context& group]]);
```
• Header
#include "tbb/parallel_for.h"
• Description
parallel_for(first, last, step, f) represents parallel execution of the loop

for (auto i = first; i < last; i += step) f(i);

The form without step uses a step of 1.
Example:
```cpp
#include <iostream>
#include "tbb/parallel_for.h"
using namespace tbb;
using namespace std;

int main() {
    // Calls the lambda once for each v in [0, 10), possibly in parallel,
    // so the numbers may be printed in any order.
    parallel_for(0, 10, [](int v) { cout << v << " "; });
    return 0;
}
```

parallel_for prototype semantics
• Description
parallel_for(range, body, partitioner) is the generic form of parallel iteration. It executes body in parallel over each value in the range. The optional partitioner specifies the splitting strategy. The Range type must model the Range concept. body must meet the following requirements:
• Body::Body(const Body&): copy constructor
• Body::~Body(): destructor
• void Body::operator()(Range& range) const: applies the body to range

The example below rewrites the previous one using the last template form and an STL vector:
```cpp
#include <iostream>
#include <vector>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
using namespace std;
using namespace tbb;

typedef vector<int>::iterator IntVecIt;

struct body {
    // Called on each subrange that parallel_for produces by splitting.
    void operator()(const blocked_range<IntVecIt>& r) const {
        for (auto i = r.begin(); i != r.end(); i++)
            cout << *i << " ";
    }
};

int main() {
    vector<int> vec;
    for (int i = 0; i < 10; i++)
        vec.push_back(i);
    parallel_for(blocked_range<IntVecIt>(vec.begin(), vec.end()), body());
    return 0;
}
```

Range concept (prototype summary)
• R::R(const R&): copy constructor
• R::~R(): destructor
• bool R::empty() const: returns true if the range is empty
• bool R::is_divisible() const: returns true if the range can be split further
• R::R(R& r, split): splits r into two subranges

parallel_reduce
• Summary
The parallel_reduce template iterates over a range and combines the partial results computed by the individual tasks into a final result. parallel_reduce places the same requirements on the Range type as parallel_for.
• Syntax (arguments in square brackets are optional)
```cpp
template<typename Range, typename Value, typename Func, typename Reduction>
Value parallel_reduce(const Range& range, const Value& identity,
                      const Func& func, const Reduction& reduction
                      [, partitioner [, task_group_context& group]]);

template<typename Range, typename Body>
void parallel_reduce(const Range& range, Body& body
                     [, partitioner [, task_group_context& group]]);
```
• Header
#include "tbb/parallel_reduce.h"
• Description
The parallel_reduce template has two forms. The functional form is designed for convenient use with lambda expressions. The second form, which takes a Body object, is designed to minimize data copying. The type requirements on identity, func and reduction in the first form are:
• Value Identity: left identity element for Func::operator()
• Value Func::operator()(const Range& range, const Value& x): accumulates the result for a subrange, starting from the initial value x
• Value Reduction::operator()(const Value& x, const Value& y): combines the results x and y

parallel_reduce example:
```cpp
#include <iostream>
#include <vector>
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"
using namespace std;
using namespace tbb;

int main() {
    vector<int> vec;
    for (int i = 0; i < 100; i++)
        vec.push_back(i);
    int result = parallel_reduce(
        blocked_range<vector<int>::iterator>(vec.begin(), vec.end()),
        0,  // identity
        // func: sum one subrange, starting from init
        [](const blocked_range<vector<int>::iterator>& r, int init) -> int {
            for (auto a = r.begin(); a != r.end(); a++)
                init += *a;
            return init;
        },
        // reduction: combine two partial sums
        [](int x, int y) -> int { return x + y; });
    cout << "result:" << result << endl;  // prints result:4950
    return 0;
}
```

Understanding TBB tasks
Intel TBB is based on the concept of tasks. You define your own tasks, which derive from tbb::task, declared in tbb/task.h. In your own code you must override the pure virtual method task* task::execute(). Some properties of every Intel TBB task:
• When the Intel TBB task scheduler chooses to run a task, it calls the task's execute method. This is the entry point.
• The execute method returns a task* that tells the scheduler which task to run next. If it returns NULL, the scheduler is free to choose the next task.
• task::~task() is virtual. Whatever resources the user task holds must be released in this destructor.
• Tasks are allocated by calling task::allocate_root().
• The main task runs a task to completion by calling task::spawn_root_and_wait(task).

Creating the first Intel TBB task
```cpp
#include "tbb/tbb.h"
#include <iostream>
using namespace tbb;
using namespace std;

class first_task : public task {
public:
    task* execute() {
        cout << "Hello World!\n";
        return NULL;  // let the scheduler choose the next task
    }
};

int main() {
    task_scheduler_init init(task_scheduler_init::automatic);
    // Allocate a root task (a task with no parent) with the special placement new.
    first_task& f1 = *new(tbb::task::allocate_root()) first_task();
    // Run the task and wait until it finishes.
    tbb::task::spawn_root_and_wait(f1);
    return 0;
}
```

Simple Example: Fibonacci Numbers
This is the serial code:
```cpp
long SerialFib(long n) {
    if (n < 2)
        return n;
    else
        return SerialFib(n - 1) + SerialFib(n - 2);
}
```
The top-level code for the parallel task-based version is:
```cpp
long ParallelFib(long n) {
    long sum;
    FibTask& a = *new(task::allocate_root()) FibTask(n, &sum);
    task::spawn_root_and_wait(a);
    return sum;
}
```
This code uses a task of type FibTask to do the real work. It involves the following distinct steps:
1. Allocate space for the task. This is done by a special overloaded new and the method task::allocate_root. The _root suffix in the name denotes that the task created has no parent. It is the root of a task tree. Tasks must be allocated by special methods so that the space can be efficiently recycled when the task completes.
2. Construct the task with the constructor FibTask(n, &sum) invoked by new. When the task is run in step 3, it computes the nth Fibonacci number and stores it into *sum.
3. Run the task to completion with task::spawn_root_and_wait.

The real work is inside class FibTask. Its definition is shown below.
```cpp
// Threshold below which the serial version is used (see the discussion of CutOff below).
const long CutOff = 16;

class FibTask : public task {
public:
    const long n;
    long* const sum;
    FibTask(long n_, long* sum_) : n(n_), sum(sum_) {}
    task* execute() {  // Overrides virtual function task::execute
        if (n < CutOff) {
            *sum = SerialFib(n);
        } else {
            long x, y;
            FibTask& a = *new(allocate_child()) FibTask(n - 1, &x);
            FibTask& b = *new(allocate_child()) FibTask(n - 2, &y);
            // Set ref_count to two children plus one for the wait.
            set_ref_count(3);
            // Start b running.
            spawn(b);
            // Start a running and wait for all children (a and b).
            spawn_and_wait_for_all(a);
            // Do the sum
            *sum = x + y;
        }
        return NULL;
    }
};
```
Method FibTask::execute() does the following:
• Checks whether n is so small that serial execution would be faster. Finding the right value of CutOff requires some experimentation. In practice, a value of at least 16 gets most of the possible speedup out of this example. Falling back to a sequential algorithm when the problem size becomes small is characteristic of most divide-and-conquer patterns for parallelism. Finding the switch-over point requires experimentation, so write your code in a way that allows you to experiment.
• If the else branch is taken, the code creates and runs two child tasks that compute the (n-1)th and (n-2)th Fibonacci numbers. Here, the inherited method allocate_child() allocates space for each task. Remember that the top-level routine ParallelFib used allocate_root() to allocate space for a task. The difference is that here the task is creating child tasks, and the choice of allocation method expresses this relationship.
• Calls set_ref_count(3). The number 3 represents the two children plus an additional implicit reference that method spawn_and_wait_for_all requires. Make sure to call set_ref_count(3) before spawning any children. Failure to do so results in undefined behavior. The debug version of the library usually detects and reports this type of error.
• Spawns two child tasks. Spawning a task tells the scheduler that it can run the task whenever it chooses, possibly in parallel with other tasks. The first spawn, by method spawn, returns immediately without waiting for the child task to start executing. The second spawn, by method spawn_and_wait_for_all, makes the parent wait until all currently allocated child tasks have finished.
• After the two child tasks complete, the parent computes x + y and stores it in *sum.

https:/

TBB and the OpenMP API manage task scheduling through work stealing. In work stealing, each thread in the thread pool maintains a local task pool organized as a double-ended queue (deque). A thread uses its own task pool like a stack and pushes newly spawned tasks onto the top. When a thread finishes a task, it first pops a task from the top of its local stack. The task at the top is the newest, so it is the most likely to access data that is still hot in the cache. If its local task pool is empty, the thread tries to steal work from another thread (the victim). When stealing, a thread uses the victim's deque as an ordinary queue, so it steals only the oldest task in the victim's deque. For recursive algorithms, these oldest tasks are nodes high in the task tree. They are therefore large chunks of work, and usually not hot in the victim's cache. Work stealing is thus an efficient mechanism for balancing load while preserving cache locality.

To set up the terminology for the following discussion, a thread belonging to TBB's internal thread pool will be called a "worker". Any other thread will go under the alias of "master" (e.g. the application's main thread, or any thread explicitly created by the programmer). When the first master initialized the TBB 2.2 task scheduler, the following things happened:
1) A global "arena" object was instantiated.
2) An internal pool of "worker" threads (or simply workers) was created.
3) Workers registered themselves in the arena.
When the master then started a
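The Range concept and the splitting behaviour of parallel_for(range, body) can be made concrete with a minimal sketch in standard C++ only.

It rests on a few assumptions:
• IntRange, split_tag and sketch_parallel_for are hypothetical names; split_tag stands in for tbb::split.
• The driver splits divisible ranges in half and runs the halves with std::async. This is only a model of the contract, not how TBB's scheduler works internally.

```cpp
#include <cstddef>
#include <future>

// Hypothetical stand-in for tbb::split: an empty tag that selects the splitting constructor.
struct split_tag {};

// Half-open integer range [begin, end) modelling the Range concept:
// copy constructor, destructor, empty(), is_divisible(), R(R& r, split).
class IntRange {
public:
    IntRange(std::size_t b, std::size_t e, std::size_t grain = 1)
        : begin_(b), end_(e), grain_(grain) {}
    IntRange(const IntRange&) = default;  // R::R(const R&)
    ~IntRange() = default;                // R::~R()

    // R::R(R& r, split): this object takes the upper half, r keeps the lower half.
    IntRange(IntRange& r, split_tag)
        : begin_(r.begin_ + (r.end_ - r.begin_) / 2), end_(r.end_), grain_(r.grain_) {
        r.end_ = begin_;
    }

    bool empty() const { return !(begin_ < end_); }
    // Divisible only while the range is larger than the grain size.
    bool is_divisible() const { return grain_ < end_ - begin_; }

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }

private:
    std::size_t begin_, end_, grain_;
};

// Hypothetical driver: split divisible ranges, run the upper half asynchronously,
// recurse on the lower half here, and apply body to every indivisible, non-empty piece.
template <typename Range, typename Body>
void sketch_parallel_for(Range range, const Body& body) {
    if (range.is_divisible()) {
        Range upper(range, split_tag{});  // range now holds only the lower half
        std::future<void> upper_done = std::async(
            std::launch::async, [upper, &body] { sketch_parallel_for(upper, body); });
        sketch_parallel_for(range, body);
        upper_done.get();  // wait for the other half (and propagate exceptions)
    } else if (!range.empty()) {
        body(range);
    }
}
```

For example, sketch_parallel_for(IntRange(0, 1000, 100), body) calls body on 16 subranges of 62 or 63 elements each. Together they cover every index exactly once.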
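The contract of the functional form of parallel_reduce (identity, func, reduction) can also be modelled in a few lines of standard C++.

sketch_parallel_reduce is a hypothetical helper, not TBB API. Instead of a Range object it uses a plain index interval [first, last) and a grain size (both assumptions). It works as follows:
1. Split the interval in half until the pieces are no larger than grain.
2. Evaluate func on each piece, starting from identity.
3. Merge the partial results with reduction.

```cpp
#include <future>

// Hypothetical model of: value = parallel_reduce(range, identity, func, reduction).
// grain must be at least 1, otherwise a one-element interval would split forever.
template <typename Value, typename Func, typename Reduction>
Value sketch_parallel_reduce(long first, long last, long grain,
                             const Value& identity, const Func& func,
                             const Reduction& reduction) {
    if (last - first <= grain) {
        // Like Func::operator()(range, x): accumulate this piece starting from identity.
        return func(first, last, identity);
    }
    long mid = first + (last - first) / 2;
    // Upper half runs asynchronously; every piece starts from identity.
    std::future<Value> upper = std::async(std::launch::async, [=, &func, &reduction] {
        return sketch_parallel_reduce(mid, last, grain, identity, func, reduction);
    });
    Value lower = sketch_parallel_reduce(first, mid, grain, identity, func, reduction);
    // Like Reduction::operator()(x, y): combine the two partial results.
    return reduction(lower, upper.get());
}
```

Called over [0, 100) with identity 0, a func that sums its piece and x + y as the reduction, it returns 4950. This matches the result of the parallel_reduce example.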
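The task properties listed under "Understanding TBB tasks" can be illustrated with a single-threaded toy scheduler. In particular: the task* returned by execute() names the next task to run, and NULL leaves the choice to the scheduler.

MiniTask, LoggingTask and run_all are hypothetical names. The sketch neither uses nor reproduces TBB. Unlike TBB, it leaves ownership of the task objects with the caller.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class mirroring the tbb::task contract described above.
class MiniTask {
public:
    virtual ~MiniTask() = default;   // virtual destructor, as with task::~task()
    virtual MiniTask* execute() = 0; // entry point; returns the next task or nullptr
};

// Toy scheduler: takes tasks from a FIFO ready queue; when execute() returns a
// task, that task runs immediately instead of going through the queue.
inline void run_all(std::deque<MiniTask*>& ready) {
    while (!ready.empty()) {
        MiniTask* t = ready.front();
        ready.pop_front();
        while (t != nullptr) {
            t = t->execute();  // follow the "next task" hint until nullptr
        }
    }
}

// Example task: records its name, then optionally names a successor.
class LoggingTask : public MiniTask {
public:
    LoggingTask(std::string name, std::vector<std::string>& trace, MiniTask* next = nullptr)
        : name_(std::move(name)), trace_(trace), next_(next) {}
    MiniTask* execute() override {
        trace_.push_back(name_);
        return next_;
    }
private:
    std::string name_;
    std::vector<std::string>& trace_;
    MiniTask* next_;
};
```

Suppose task a returns task c as its successor and the ready queue holds a and then b. The tasks then run in the order a, c, b.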
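The cutoff idea in FibTask::execute() can be tried without TBB. This sketch swaps tbb::task for std::async, so it is not the TBB implementation. It works as follows:
• Below kCutOff (16, following the suggestion in the text) it falls back to the serial code.
• Otherwise it runs fib(n-1) asynchronously and computes fib(n-2) in the current thread.
• It then waits for the child before adding the two results.

kCutOff and AsyncFib are illustrative names.

```cpp
#include <future>

// Threshold below which the serial code is used; 16 follows the text, tune by experiment.
const long kCutOff = 16;

long SerialFib(long n) {
    return n < 2 ? n : SerialFib(n - 1) + SerialFib(n - 2);
}

long AsyncFib(long n) {
    if (n < kCutOff) {
        return SerialFib(n);  // small problem: serial execution is faster
    }
    // Start fib(n-1) asynchronously, roughly like spawning child task a ...
    std::future<long> x = std::async(std::launch::async, AsyncFib, n - 1);
    // ... and compute fib(n-2) in the current thread.
    long y = AsyncFib(n - 2);
    return x.get() + y;  // wait for the child, then do the sum
}
```

std::launch::async runs each call as if on a new thread, so the cutoff matters even more here than with TBB's lightweight tasks. With kCutOff = 16, AsyncFib(20) makes only 12 asynchronous calls.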
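The work-stealing description can be sketched as a per-thread task pool. WorkStealingPool is a hypothetical class that, for simplicity, guards a std::deque with a mutex; TBB's own task pools are implemented differently. The owner pushes and pops at the back, treating it as the top of a stack, while a thief takes from the front, i.e. the oldest task.

```cpp
#include <deque>
#include <mutex>
#include <utility>

// Simplified work-stealing task pool (one per thread in a real scheduler).
template <typename Task>
class WorkStealingPool {
public:
    // Owner: push a newly spawned task onto the top (back) of the pool.
    void push(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(std::move(t));
    }
    // Owner: pop the newest task (LIFO), the one most likely to be cache-hot.
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }
    // Thief: steal the oldest task (FIFO end), typically a large chunk of work.
    bool steal(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::deque<Task> tasks_;
};
```

After pushing 1, 2 and 3, the owner's pop() returns 3 (the newest task) and a thief's steal() returns 1 (the oldest).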
