外文翻譯-基于嵌入式系統(tǒng)的寄存器分配的混合進化算法的解決方案_第1頁
外文翻譯-基于嵌入式系統(tǒng)的寄存器分配的混合進化算法的解決方案_第2頁
外文翻譯-基于嵌入式系統(tǒng)的寄存器分配的混合進化算法的解決方案_第3頁
外文翻譯-基于嵌入式系統(tǒng)的寄存器分配的混合進化算法的解決方案_第4頁
外文翻譯-基于嵌入式系統(tǒng)的寄存器分配的混合進化算法的解決方案_第5頁
已閱讀5頁,還剩19頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認(rèn)領(lǐng)

文檔簡介

1、 畢業(yè)設(shè)計(論文)外文資料翻譯 學(xué)生姓名: 學(xué) 號: 所在學(xué)院: 專 業(yè): 指導(dǎo)教師: 2013年 4 月 18 日Hybrid Evolutionary Algorithm based solution for Register Allocation for Embedded SystemsAnjali MahajanG H Raisoni College of Engineering, Nagpur, IndiaM S AliProf. Ram Meghe Institute of Technology and Research,Badnera, Amravati, IndiaAbstra

2、ct Embedded systems have an ever-increasing need for optimizing compilers to produce high quality codes with a limited general purpose register set. Either memory or registers are used to store the results of computation of a program. As compared to memory, accessing a register is much faster, but t

3、hey are scarce resources and have to be utilized very efficiently. The optimization goal is to hold as many live variables as possible in registers in order to avoid expensive memory accesses. We present a hybrid evolutionary algorithm for graph coloring register allocation problem based on a new cr

4、ossover operator called crossover by conflict-free sets(CCS) and a new local search function. Index Termscompilers, compiler optimization, register allocation, hybrid evolutionary algorithm, embedded systems I. INTRODUCTION Register allocation is one of the most important optimizations a compiler pe

5、rforms and is becoming increasingly important as the gap between processor speed and memory access time widens. Its goal is to find a way to map the temporary variables used in a program into physical memory locations (either main memory or machine registers). Accessing a register is much faster tha

6、n accessing memory; therefore one tries to use registers as much as possible. Of course, this is not always possible, thus some variables must be transferred (spilled) to and from memory. This has a cost, the cost of load and store operations, which should be avoided as much as possible. Typically,

7、this degrades runtime performance and increases power consumption. Therefore, an efficient mapping should generally minimize the register requirements and the number of spilling instructions. The critical applications, especially in embedded computing, industrial compilers are ready to accept longer

8、 compilation times if the final code gets improved. This approach attempts to combine the flexibility of general-purpose programmable processors with the performance achieved by domain-specific architectureoptimizations. Compilers for embedded processors must cope with these architectural optimizati

9、ons and be able to exploit them. This paper presents a heuristic algorithm for graph coloring register allocation problem for embedded systems based on a new crossover operator and a new local search function. Graph coloring abstracts the problem of assigning registers to live ranges in a program in

10、to the problem of assigning colors to nodes in an interference graph. The register allocator attempts to “color” the graph with a finite number of machine registers, with one constraint that any two nodes connected by an interference edge must be colored with different registers. To model register a

11、llocation as a graph coloring problem, the compiler first constructs an interference graph G. The nodes in G correspond to live ranges, and the edges represent interferences. Thus, there is an edge in G from node i to node j if live range of i interferes with live range of j, that is, if they are si

12、multaneously live at some point and cannot occupy the same register. To find an allocation from G, the compiler looks for a k-coloring of G, that is, an assignment of k colors to the nodes of G such that adjacent nodes always have distinct colors. If we choose k to match the number of machine regist

13、ers, then we can map a k-coloring of G into a feasible register assignment for the underlying code. Because graph coloring is NP-complete, the compiler uses a heuristic method to search for a coloring; it is not guaranteed to find a k-coloring for all k-colorable graphs. If a k-coloring is not disco

14、vered, some values are spilled; the values are kept in memory rather than in registers. Spilling one or more live ranges creates a new and different interference graph. The compiler proceeds by iteratively spilling some live ranges and attempting to color the resulting new graph. In the context of e

15、mbedded processors, the most important restriction of the graph coloring approach is that it is based on the assumption of a homogenous register set. The different phases of register allocation are differentlyimpacted By embedded processor irregularities.II. RELATED WORKRegister allocation has been

16、widely addressed in literature, and many approaches have been proposed. Most of them are related to Chaitins 6 approach, but very few address problems raised by embedded processors architecture like support for irregular registers via register classes and register set concatenation. Although 13 prop

17、osed an approach using integer-programming supporting irregular register sets, this approach requires modeling of register constraints by inequations, making it difficult to implement in an industrial compiler. Briggs3,4 proposes an approach to model register pairs in the interference graph. He intr

18、oduced an improved coloring strategy that produces better allocations for many graphs on which Chaitins method fails. The difference lies in the timing of spill decision. George and Apple 9 introduced iterating coalescing where they eliminated aggressive coalescing completely. Embedded processors ar

19、e often characterized by a small number of registers. Algorithms and heuristics have been adapted in our infrastructure to limit spill when few registers are available. Ref. 17 presents a generalization of the degree K test , called the test, to handle irregular register sets and register classes. V

20、PO 13 is a portable optimizer for DSPs, based on the Zephyr retargetable compiler infrastructure. VPO address the problem of heterogeneous register classes by computing and propagating along the IR tree the correct reduced register class (register class resulting from the intersection of the target/

21、source operand of two instructions) that allows registers to be allocated without inserting an extra move operation. The PROPAN postpass optimizer 14 relies on integer linear programming to perform register allocation and scheduling. Some researchers attempt to use non-graph-coloring methods. Koes15

22、 proposes a progressive register allocator which uses a multi-commodity networkflow mode tp represent the intricacies of irregular architectures. Scholz18 formulates register allocation as a partitioned boolean quadratic optimization problem that allows generic modeling of processors peculiarities.

23、Hirnschrott 11 used Partitioned Boolean Quadratic Programming for register allocation shows better results in spill costs and coalescing benifits. Mahajan 16 present a new hybrid evolutionary algorithm (HGR) for graph coloring register allocation problem for embedded systems as one of the most effic

24、ient algorithms with highly specialized, domain specific crossover and local search function. There are many local search algorithms for graph coloring, such as simulated annealing 7, tabu search10,etc Galinier and Hao8 introduced a algorithm based on hybrid evolutionary algorithm. An important feat

25、ure of this algorithm is a specialized crossover but the mutation operator of the GA is replaced by an LS operator.III. A HYBRID EVOLUTIONARY ALGORITHMEvolutionary algorithms involve natural evolution and genetics. The genetic algorithm is a classical method in this category. It has been applied to

26、various optimization problems. There are several other methods like genetic programming, ant colony optimization, etc. The simple evolutionary algorithms are not generally efficient for complex combinatorial problems. The performance of the evolutionary algorithms is improved with the addition of pr

27、oblem specific knowledge. Specialized operators are combined with evolutionary algorithms to generate complex hybrid systems called hybrid genetic algorithms, hybrid evolutionary algorithms, genetic local search algorithms and memetic algorithms. An approach for combinatorial optimization is to embe

28、d local search into the framework of population based evolutionary algorithm, leading to hybrid evolutionary algorithm. HEA is based on two elements: an efficient local search (LS) operator and a highly specialized crossover operator. The basic idea consists in using the crossover operator to create

29、 new and potentially interesting configurations which are then improved by the LS operator. We present a hybrid evolutionary algorithm for graph coloring based on a new crossover operator for register allocation problem and a new local search function. This section presents our algorithm- Hybrid evo

30、lutionary Algorithm for Graph coloring Register Allocation with proposed operator called crossover by conflict-free sets(CCS).A. The Algorithm The HEA consists of a genetic component and a local search (LS). The genetic component initializes and evolves a population of solutions. The general algorit

31、hm is as given below:Input : Interference graph , IG = ( V,E ) ; number ofregisters , k Output : best configuration begin P = generate_population(|P|) iter = 0 while ( iter 0 ) do (p1, p2) = select_parents(P) p = crossover (p1, p2) p = local_search( p, L ) P = update_population(P,p) iter =iter + 1 e

32、ndwhile endThe algorithm first builds an initial population of configurations (generate_population) and then performs a series of cycles called generation. At each generation, two configurations p1 and p2 are chosen in the population (select_parents). A crossover is then used to produce an offspring

33、 p from p1 and p2 (crossover). The LS operator is applied to improve p for a fixed number L of iterations (local_search). Finally, the improved configuration p is inserted in the population by replacing another configuration ( update_population). This process repeats until the value of iter is less

34、than or equal to a prefixed number MaxIter is reached or population diversity (popu-diversity) is greater than zero. Population diversity is calculated as the average distance between all configurations in the population. For two configurations p1 and p2, the distance between p1 and p2 is the minimu

35、m number of elementary transformations necessary to transform p1 into p2. In our approach, we consider the partition method for string representation 8. Each solution Pi partitions the variables into register classes, Pi = R1 , R2 , . Rk where each class Ri includes the live ranges of variables that

36、 are mapped to the registers ri and k is the total number of registers. Given two parents p1= R11 , R12 , . R1k and p2= R21 , R22 , . R2k, the partial configuration will be a set R1 , R2 , . Rk of disjoint sets of nodes where each subset Ri is included in a class of one of the two parents, and all R

37、i are conflict-free sets. ( the nodes i and j in an interference graph are said to be conflict-free, when there is no edge connecting them) B. Initial population generation Generally the initial population is generated using the DSatur algorithm 5, which is a graph coloring heuristic. It considers t

38、he saturation degree of each node, which can be defined as the number of different colors to which the node is adjacent to. In the register allocation problem, both the spill costs and the degree of the nodes in a given interference graph are considered. We adopt the new metric called the spill degr

39、ee proposed by 17 that can be used for ordering the nodes. The spill degree of a node i can be defined by one of the three equations given below SDegree1(i) = SCost(i) x Degree(i) SDegree2(i) = SCost(i) x Degree2(i) SDegree3(i) = SCost(i) (1) In these expressions, SCost(i) is the spill cost and Degr

40、ee(i) is the number of edges incident to node i. In order to generate the initial population, the spill degrees are set using a combination of the three equations given in (1). The nodes are sorted in decreasing order of spill degrees. At each iteration, an unsigned node with the maximum spill degre

41、e is mapped to the register class with the lowest possible index value, where the selected node should be conflict free with the other nodes in the same class. This process is repeated until all the nodes ( i.e., their corresponding variables ) are mapped to one of the register classes.C. The crosso

42、ver operator The crossover used here is the new proposed operator Crossover by Conflict-free Sets (CCS). Given two parent configurations p1= R11 , R12 , . R1k and p2= R21 , R22 , . R2k, chosen randomly by the select_parents operator from the population, the algorithm crossover (p1, p2) builds an off

43、spring p= R1 , R2 , . Rk as followsInput: configurations p1= R11 , R12 , . R1k and p2= R21 , R22 , . R2kOutput: configuration p= R1 , R2 , . Rkbegin/ consider p1if conflict(p1)0 then call function conflict_free(p1)/ consider p2if conflict(p2)0 then call function conflict_free(p2) for i (1 i k ) doCf

44、R1n= max qp1 Cf_Individual(q)/ CfR1n is the partition with maximum number of/conflict-free variables from p1CfR2n= max qp2 Cf_Individual(q)/ CfR2n is the partition with maximum number of/ conflict-free variables from p2 choose j such that | CfR1nCfR2j| is maximum Ri = |CfR1n CfR2j| SpillQuality(Ri)=

45、Spillcost(Ri)*Reg_class_Conflict(Ri) remove the nodes of Ri from p1 and p2 endfor assign randomly the nodes of R- (Rl UU Rk)endfunction conflict_free(p)begin Cf_Individual=for i ( 1 i 0 then/ Reg_class_Conflict(i) is the total number of conflicts/ in ith register classbegininitialize t to 0while (t

46、(Reg_class_Conflict(i)/2) do select a variable m from Ri where Conflict(m)=maxcRiConflict(c) or spillcost(m)=mincRispillcost(c) remove variable m from Ri t=t+Conflict(m)endwhileCfR(i)= RiendifCf_Individual= Cf_Individual CfR(i) endforendHe algorithm builds step by step the k classes Rl , R2 Rk of th

47、e offspring. If any register set of the parent has conflict, the algorithm first determines the conflict free sets of all the register classes of each parent. The function to find the conflict free set is based on the heuristic from 20 for determining the conflict-free set with the maximum number of

48、 nodes of each register class. In that, initially the conflict-free set includes all variables in Rl. The total number of conflicts of each variable m in class Rl, conflict(m) are determined. The sum of the conflict(m) values gives us the total number of conflicts in the register class Rl., Reg_clas

49、s_Conflict(l). Then, the variables from the partition are removed one by one in decreasing order of conflict(m) values, until the total number of conflicts in Rl becomes equal to of its initial value. The function gives us the conflict free register classes of each individual parent. We then conside

50、r the partition CfR1n , which has maximum number of conflict-free nodes from p1 . It then selects a partition CfR2j from p2 that has largest number of nodes in common with CfR1n. The algorithm unifies the pair of these two partitions. It calculates the conflict factor of the produced partition. If i

51、t is zero, the partition is conflict free. It then removes all the nodes belonging to offspring register partition from the parents p1 and p2. In case of tie-breaking, any partition is taken randomly. This new partition becomes the base set of the first register class for the offspring. The process

52、is subsequently repeated for the other register classes. At the end of k steps, some nodes may remain unassigned. These are then assigned to a class which has minimum conflict factor.D. The Local Search LS operator After a solution is generated using the crossover operator, the local search phase im

53、proves it before inserting it into the population for a maximum of L iterations. We have used a Local search operator LS. It uses a 1-exchange neighborhood. Formally, given a partition R1 , R2 , . Rk, a neighboring partition is obtained by assigning a single variable to other register set, i.e., a v

54、ariable vi is removed from the original register class Ri to which it belongs to and where vi is already in conflict with one or more variables in that class. It is moved into a new register class Rj. A variable vi which has maximum conflict factor is selected. Assume that vi is already assigned to

55、class k. If the fitness value of the register class k with vi is more than the fitness of register class j without considering vi, then vi is moved from register class k to register class j. This process is repeated for other register classes also. The register class which has minimum value is the f

56、inal register class for variable vi. The same process is repeated with a new variable according to the decreasing order of conflict factor. In this way the quality of generated offspring is improved. It is then inserted in the population by replacing the worst of the two parents.E. Fitness function

57、calculation To solve a register allocation problem, we consider a set of k - register classes. In the interference graph IG = (V, E), all variables (i.e., nodes) vi are assigned to k - register classes. The nodes i and j in an interference graph are said to be conflict-free, when there is no edge co

58、nnecting them. Conflict(m) denotes the conflict of a variable m in register class Ri. Spillcost is the spill cost of variable m. The fitness of a register class is given as fitness(Ri)=Conflict(m)*Spillcost(m)/degree(m) (3) The total sum of fitness values of all register classes gives the fitness va

59、lue of the given individual kFitness = fitness(i) (4)i=1The goal of the optimization process is to minimize fitness until zero.IV. EXPERIMENTAL EVALUATIONThe experiments are based on MachineSUIF 19 compiler research framework. The register allocator of MachineSUIF implements George-Appels Iterated R

60、egister Coalescing algorithm 9. Our experimental analysis includes three algorithms We compare our algorithm with these algorithms. The first is the MachineSUIFs default algorithm9 with extenstions by Smith-Ramsey and Holloway 19 for handling register aliasing. We denote it by IRC. The second is bas

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

最新文檔

評論

0/150

提交評論