An Overview of the IRanges package

Patrick Aboyoun, Michael Lawrence, Hervé Pagès

Edited: February 2018; Compiled: January 13, 2020

Contents

1 Introduction
2 IRanges objects
  2.1 Normality
  2.2 Lists of IRanges objects
  2.3 Vector Extraction
  2.4 Finding Overlapping Ranges
  2.5 Counting Overlapping Ranges
  2.6 Finding Neighboring Ranges
  2.7 Transforming Ranges
    2.7.1 Adjusting starts, ends and widths
    2.7.2 Making ranges disjoint
    2.7.3 Other transformations
  2.8 Set Operations
3 Vector Views
  3.1 Creating Views
  3.2 Aggregating Views
4 Lists of Atomic Vectors
5 Session Information

1 Introduction

When analyzing sequences, we are often interested in particular consecutive subsequences. For
example, the letters vector could be considered a sequence of lower-case letters, in alphabetical order. We would call the first five letters (a to e) a consecutive subsequence, while the subsequence containing only the vowels would not be consecutive. It is not uncommon for an analysis task to focus only on the geometry of the regions, while ignoring the underlying sequence values. A list of indices would be a simple way to select a subsequence. However, a sparser representation for consecutive subsequences would be a range, a pairing of a start position and a width, as used when extracting sequences with window.

Two central classes are available in Bioconductor for representing ranges: the IRanges class defined in the IRanges package for representing ranges defined on a single space, and the GRanges class defined in the GenomicRanges package for representing ranges defined on multiple spaces.

In this vignette, we will focus on IRanges objects. We will rely on simple, illustrative example datasets, rather than large, real-world data, so that each data structure and algorithm can be explained in an intuitive, graphical manner. We expect that packages that
apply IRanges to a particular problem domain will provide vignettes with relevant, realistic examples.

The IRanges package is available from Bioconductor and can be installed via BiocManager::install:

> if (!require("BiocManager"))
+     install.packages("BiocManager")
> BiocManager::install("IRanges")
> library(IRanges)

2 IRanges objects

To construct an IRanges object, we call the IRanges constructor. Ranges are normally specified by passing two out of the three parameters: start, end and width (see help(IRanges) for more information).

> ir1 <- IRanges(start=1:10, width=10:1)
> ir1
IRanges object with 10 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1        10        10
   [2]         2        10         9
   [3]         3        10         8
   [4]         4        10         7
   [5]         5        10         6
   [6]         6        10         5
   [7]         7        10         4
   [8]         8        10         3
   [9]         9        10         2
  [10]        10        10         1
> ir2 <- IRanges(start=1:10, end=11)
> ir3 <- IRanges(end=11, width=10:1)
> identical(ir1, ir2) && identical(ir1, ir3)
[1] FALSE
> ir <- IRanges(c(1, 8, 14, 15, 19, 34, 40),
+               width=c(12, 6, 6, 15, 6, 2, 7))
> ir
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        12        12
  [2]         8        13         6
  [3]        14        19         6
  [4]        15        29        15
  [5]        19        24         6
  [6]        34        35         2
  [7]        40        46         7

The calls above use different combinations of the start, end and width parameters. Because ir2 and ir3 are built with end=11 while every range in ir1 ends at 10, the comparison returns FALSE.

Accessing the starts, ends and widths is supported via the start, end and width getters:

> start(ir)
[1]  1  8 14 15 19 34 40
> end(ir)
[1] 12 13 19 29 24 35 46
> width(ir)
[1] 12  6  6 15  6  2  7

Subsetting an IRanges object is supported, by numeric and logical indices:

> ir[1:4]
IRanges object with 4 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        12        12
  [2]         8        13         6
  [3]        14        19         6
  [4]        15        29        15
> ir[start(ir) <= 15]
IRanges object with 4 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        12        12
  [2]         8        13         6
  [3]        14        19         6
  [4]        15        29        15

In order to illustrate range operations, we'll create a function to plot ranges.

> plotRanges <- function(x, xlim=x, main=deparse(substitute(x)),
+                        col="black", sep=0.5, ...)
+ {
+   height <- 1
+   if (is(xlim, "IntegerRanges"))
+     xlim <- c(min(start(xlim)), max(end(xlim)))
+   bins <- disjointBins(IRanges(start(x), end(x) + 1))
+   plot.new()
+   plot.window(xlim, c(0, max(bins)*(height + sep)))
+   ybottom <- bins * (sep + height) - height
+   rect(start(x)-0.5, ybottom, end(x)+0.5, ybottom + height, col=col, ...)
+   title(main)
+   axis(1)
+ }
> plotRanges(ir)

Figure 1: Plot of original ranges

2.1 Normality

Sometimes, it is necessary to formally represent a subsequence, where no elements are repeated and order is preserved. Also, it is occasionally useful to think of an IRanges object as a set of integers, where no elements are repeated and order does not matter.

The NormalIRanges class formally represents a set of integers. By definition an IRanges object is said to be normal when its ranges are: (a) not empty (i.e. they have a non-zero width); (b) not overlapping; (c) ordered from left to right; (d) not even adjacent (i.e. there must be a non-empty gap between 2 consecutive ranges).

There are three main advantages of using a normal IRanges object: (1) it guarantees a subsequence encoding or set of integers, (2) it is compact in terms of the number of ranges, and (3) it uniquely identifies its information, which simplifies comparisons.

The reduce function reduces any IRanges object to a NormalIRanges by merging redundant ranges.

> reduce(ir)
IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        29        29
  [2]        34        35         2
  [3]        40        46         7
> plotRanges(reduce(ir))

2.2 Lists of IRanges objects

> rl <- IRangesList(ir, rev(ir))
> start(rl)
IntegerList of length 2
[[1]] 1 8 14 15 19 34 40
[[2]] 40 34 19 15 14 8 1

2.3 Vector Extraction

> set.seed(0)
> lambda <- c(rep(0.001, 4500), seq(0.001, 10, length=500),
+             seq(10, 0.001, length=500))
> xVector <- rpois(1e7, lambda)
> yVector <- rpois(1e7, lambda[c(251:length(lambda), 1:250)])
> xRle <- Rle(xVector)
> yRle <- Rle(yVector)
> irextract <- IRanges(start=c(4501, 4901), width=100)
> xRle[irextract]
integer-Rle of length 200 with 159 runs
  Lengths: 12  1  1  1  2  1  1  1  1  2 ...  1  1  1  1  1  1  1  1  1
  Values :  0  1  0  2  0  1  0  1  0  1 ...  9 12  6  5 10  9  6  9 12

Figure 3: Plot of ranges with accumulated coverage

2.4 Finding Overlapping Ranges

The
function findOverlaps detects overlaps between two IRanges objects.

> ol <- findOverlaps(ir, reduce(ir))
> as.matrix(ol)
     queryHits subjectHits
[1,]         1           1
[2,]         2           1
[3,]         3           1
[4,]         4           1
[5,]         5           1
[6,]         6           2
[7,]         7           3

2.5 Counting Overlapping Ranges

The function coverage counts the number of ranges over each position.

> cov <- coverage(ir)
> plotRanges(ir)
> cov <- as.vector(cov)
> mat <- cbind(seq_along(cov)-0.5, cov)
> d <- diff(cov) != 0
> mat <- rbind(cbind(mat[d,1]+1, mat[d,2]), mat)
> mat <- mat[order(mat[,1]),]
> lines(mat, col="red", lwd=4)
> axis(2)

2.6 Finding Neighboring Ranges

The nearest function finds the nearest neighbor ranges (overlapping is zero distance). The precede and follow functions find the non-overlapping nearest neighbors on a specific side.

2.7 Transforming Ranges

Utilities are available for transforming an IRanges object in a variety of ways. Some transformations, like reduce introduced above, can be dramatic, while others are simple per-range adjustments of the starts, ends or widths.

2.7.1 Adjusting starts, ends and widths

Perhaps the simplest transformation is to adjust the start values by a scalar offset, as performed by the shift function. Below, we shift all ranges forward 10 positions.

> shift(ir, 10)
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        11        22        12
  [2]        18        23         6
  [3]        24        29         6
  [4]        25        39        15
  [5]        29        34         6
  [6]        44        45         2
  [7]        50        56         7

There are several other ways to transform ranges. These include narrow, resize, flank, reflect, restrict, and threebands. For example, narrow supports the adjustment of start, end and width values, which should be relative to each range. These adjustments are vectorized over the ranges. As its name suggests, the ranges can only be narrowed.

> narrow(ir, start=1:5, width=2)
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         2         2
  [2]         9        10         2
  [3]        16        17         2
  [4]        18        19         2
  [5]        23        24         2
  [6]        34        35         2
  [7]        41        42         2

The restrict function ensures every range falls within a set of bounds. Ranges are contracted as necessary, and the ranges that fall completely outside of but not adjacent to the bounds are dropped, by default.

> restrict(ir, start=2, end=3)
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         2         3         2

The threebands function extends narrow so that the remaining left and right regions adjacent to the narrowed region are also returned in separate IRanges objects.

> threebands(ir, start=1:5, width=2)
$left
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         0         0
  [2]         8         8         1
  [3]        14        15         2
  [4]        15        17         3
  [5]        19        22         4
  [6]        34        33         0
  [7]        40        40         1

$middle
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         2         2
  [2]         9        10         2
  [3]        16        17         2
  [4]        18        19         2
  [5]        23        24         2
  [6]        34        35         2
  [7]        41        42         2

$right
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         3        12        10
  [2]        11        13         3
  [3]        18        19         2
  [4]        20        29        10
  [5]        25        24         0
  [6]        36        35         0
  [7]        43        46         4

The arithmetic operators +, - and * change both the start and the end/width by symmetrically expanding or contracting each range. Adding or subtracting a numeric (integer) vector to an IRanges causes each range to be expanded or contracted on each side by the corresponding value in the numeric vector.

> ir + seq_len(length(ir))
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         0        13        14
  [2]         6        15        10
  [3]        11        22        12
  [4]        11        33        23
  [5]        14        29        16
  [6]        28        41        14
  [7]        33        53        21

The * operator symmetrically magnifies an IRanges object by a factor, where positive contracts (zooms in) and negative expands (zooms out).

> ir * -2 # double the width
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        -5        18        24
  [2]         5        16        12
  [3]        11        22        12
  [4]         7        36        30
  [5]        16        27        12
  [6]        33        36         4
  [7]        36        49        14

WARNING: The semantics of these arithmetic operators might be revisited at some point. Please restrict their use to the context of interactive visualization (where they arguably provide some convenience) but avoid using them programmatically.

2.7.2 Making ranges disjoint

A more complex type of operation is making a set of ranges disjoint, i.e. non-overlapping. For example, threebands returns a disjoint set of three ranges for each input range.

The disjoin function makes an IRanges object disjoint by fragmenting it into the widest ranges where the set of overlapping ranges is the same.

> disjoin(ir)
IRanges object with 10 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1         7         7
   [2]         8        12         5
   [3]        13        13         1
   [4]        14        14         1
   [5]        15        18         4
   [6]        19        19         1
   [7]        20        24         5
   [8]        25        29         5
   [9]        34        35         2
  [10]        40        46         7
> plotRanges(disjoin(ir))

Figure 4: Plot of disjoined ranges

A variant of disjoin is disjointBins, which divides the ranges into bins, such that the ranges in each bin are disjoint. The return value is an integer vector of the bins.

> disjointBins(ir)
[1] 1 2 1 2 3 1 1

2.7.3 Other transformations

Other transformations include reflect and flank. The former “flips” each range within a set of common reference bounds.

> reflect(ir, IRanges(start(ir), width=width(ir)*2))
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        13        24        12
  [2]        14        19         6
  [3]        20        25         6
  [4]        30        44        15
  [5]        25        30         6
  [6]        36        37         2
  [7]        47        53         7

The flank function returns ranges of a specified width that flank, to the left (default) or right, each input range. One use case of this is forming promoter regions for a set of genes.

> flank(ir, width=seq_len(length(ir)))
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         0         0         1
  [2]         6         7         2
  [3]        11        13         3
  [4]        11        14         4
  [5]        14        18         5
  [6]        28        33         6
  [7]        33        39         7

Figure 5: Plot of gaps from ranges

2.8 Set Operations

Sometimes, it is useful to consider an IRanges object as a set of integers, although there is always an implicit ordering. This is formalized by NormalIRanges, above, and we now present versions of the traditional mathematical set operations complement, union, intersect, and difference for IRanges objects. There are two variants for each operation. The first treats each IRanges object as a set and returns a normal value, while the other has a “parallel” semantic like pmin/pmax and performs the operation for each range pairing separately.

The complement operation is implemented by the gaps and pgap functions. By default, gaps will return the ranges that fall between the ranges in the (normalized) input. It is possible to specify a set of bounds, so that flanking ranges are included.

pgap considers each parallel pairing between two IRanges objects and finds the range, if any, between them. Note that the function name is singular, suggesting that only one range
is returned per range in the input.

> gaps(ir, start=1, end=50)
IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        30        33         4
  [2]        36        39         4
  [3]        47        50         4
> plotRanges(gaps(ir, start=1, end=50), c(1,50))

The remaining operations, union, intersect and difference, are implemented by the union/punion, intersect/pintersect and setdiff/psetdiff functions, respectively. These are relatively self-explanatory.

3 Vector Views

The IRanges package provides the virtual Views class, which stores a vector-like object, referred to as the “subject”, together with an IRanges object defining ranges on the subject. Each range is said to represent a view onto the subject.

Here, we will demonstrate the RleViews class, where the subject is of class Rle. Other Views implementations exist, such as XStringViews in the Biostrings package.

3.1 Creating Views

There are two basic constructors for creating views: the Views function, based on indicators, and the slice function, based on numeric boundaries.

Note that RleList objects will be covered later in more detail in the “Lists of Atomic Vectors” section of this document.

3.2 Aggregating Views

While sapply can be used to loop over each window, the native functions viewMaxs, viewMins, viewSums, and viewMeans provide fast looping to calculate their respective statistical summaries.

4 Lists of Atomic Vectors

In addition to the range-based objects described in the previous sections, the IRanges package provides containers for storing lists of atomic vectors such as integer or Rle objects. The IntegerList and RleList classes represent lists of integer vectors and Rle objects respectively. They are subclasses of the AtomicList virtual class, which is itself a subclass of the List virtual class defined in the S4Vectors package.

> showClass("RleList")
Virtual Class "RleList", in package "IRa...
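
The container classes described in this section can be sketched with a small example. This is an illustration under stated assumptions rather than a listing from the vignette: the object names il and rl and the values stored in them are hypothetical, chosen only to keep the example short. The constructors IntegerList, RleList and Rle, and the helpers elementNROWS and runLength, are exported by IRanges and S4Vectors.

```r
# Minimal sketch: the names `il` and `rl` and their contents are
# illustrative assumptions, not objects defined earlier in the vignette.
library(IRanges)

# An IntegerList stores several integer vectors in a single object.
il <- IntegerList(a = 1:3, b = c(5L, 5L, 7L, 7L))

# An RleList stores run-length encoded vectors; Rle() compresses repeats.
rl <- RleList(a = Rle(c(0L, 0L, 1L)), b = Rle(c(2L, 2L, 2L)))

# Both containers extend the AtomicList virtual class.
is(il, "AtomicList")
is(rl, "AtomicList")

# Summaries are computed separately for each list element.
elementNROWS(il)       # number of values held by each element
sum(il)                # one sum per element
runLength(rl[["b"]])   # run lengths of a single Rle element
```

Because the operations are applied element by element, the same calls work unchanged whether the list holds two vectors or many thousands, which is the main reason to prefer these containers over an ordinary R list.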
