edgeR-DESeq2: Analyzing Differential Expression in RNA-seq Data

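Installing edgeR

edgeR is distributed through Bioconductor, so it cannot be installed from CRAN with install.packages(). The original instructions used the legacy biocLite.R installer. biocLite() was replaced by BiocManager starting with Bioconductor 3.8, so current releases are installed as follows:

```r
# Legacy installer used originally (does not work with current Bioconductor):
#   source("/biocLite.R")   # try http:// if https:// URLs are not supported
#   biocLite("edgeR")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("edgeR")
```

Data import

edgeR's downstream analysis of sequencing results relies on read counts to test genes for differential expression. Here featureCounts was used to count the reads mapped in the .bam files. The count table looks like this:

```r
library(edgeR)
# featureCounts output: one comment line, then a tab-separated table with a header
mydata <- read.table("counts.txt", header = TRUE, sep = "\t", skip = 1)
sampleNames <- c("CA_1", "CA_2", "CA_3", "CC_1", "CC_2", "CC_3")
names(mydata)[7:12] <- sampleNames  # replace the .bam file names with sample names
head(mydata)
```

```
    Geneid         Chr Start  End Strand Length CA_1 CA_2 CA_3 CC_1 CC_2 CC_3
1 gene1314 NW_139421.1  1257 1745      +    489    0    0    0    0    0    0
2 gene1315 NW_139421.1  2115 3452      +   1338    0    0    0    0    0    0
3 gene1316 NW_139421.1  3856 4680      +    825    0    0    0    0    0    0
4 gene1317 NW_139421.1  4866 5435      -    570    0    0    0    0    0    0
5 gene1318 NW_139421.1  6066 6836      -    771    0    0    0    0    0    0
6 gene1319 NW_139421.1  7294 9483      +   2190    0    0    0    0    0    0
```

Only the Geneid column and the counts in the last six (sample) columns are needed, so the table is reduced to a count matrix:

```r
countMatrix <- as.matrix(mydata[, 7:12])
rownames(countMatrix) <- mydata$Geneid
head(countMatrix)
```

```
         CA_1 CA_2 CA_3 CC_1 CC_2 CC_3
gene1314    0    0    0    0    0    0
gene1315    0    0    0    0    0    0
gene1316    0    0    0    0    0    0
gene1317    0    0    0    0    0    0
gene1318    0    0    0    0    0    0
gene1319    0    0    0    0    0    0
```

The matrix consists of 3 vs 3 samples (three biological replicates per group).

Creating a DGEList

```r
group <- factor(sampleNames)  # one label per sample, as shown in $samples below
y <- DGEList(counts = countMatrix, group = group)
y
```

```
An object of class "DGEList"
$counts
         CA_1 CA_2 CA_3 CC_1 CC_2 CC_3
gene1314    0    0    0    0    0    0
gene1315    0    0    0    0    0    0
gene1316    0    0    0    0    0    0
gene1317    0    0    0    0    0    0
gene1318    0    0    0    0    0    0
14212 more rows ...

$samples
     group lib.size norm.factors
CA_1  CA_1  1788537            1
CA_2  CA_2  1825546            1
CA_3  CA_3  1903017            1
CC_1  CC_1  1826042            1
CC_2  CC_2  2124468            1
CC_3  CC_3  2025063            1
```

Filtering

Genes whose counts are all zero are not expressed and contribute nothing to the analysis. The rule below is somewhat stricter: it keeps genes with more than 1 count per million (CPM) in at least two samples, which removes all-zero genes together with very lowly expressed ones. Filtering has two benefits: (1) it reduces memory usage, and (2) it reduces computation time.

```r
keep <- rowSums(cpm(y) > 1) >= 2
y <- y[keep, , keep.lib.sizes = FALSE]  # recompute library sizes from the kept genes
y
```

```
An object of class "DGEList"
$counts
         CA_1 CA_2 CA_3 CC_1 CC_2 CC_3
gene1321  161  138  129  218  194  220
gene1322    2    3    1    1    3    3
gene1323   20   27   33   47   51   46
gene1324   60   87   79   86  100  132
gene1325   32   29   21   58   75   56
3877 more rows ...

$samples
     group lib.size norm.factors
CA_1  CA_1  1788362            1
CA_2  CA_2  1825308            1
CA_3  CA_3  1902796            1
CC_1  CC_1  1825889            1
CC_2  CC_2  2124155            1
CC_3  CC_3  2024786            1
```

Normalization

edgeR normalizes with the TMM (trimmed mean of M-values) method; only after normalization are the libraries comparable with one another. The counts are unchanged; the normalization factors are stored in $samples:

```r
y <- calcNormFactors(y)
y$samples
```

```
     group lib.size norm.factors
CA_1  CA_1  1788362    0.9553769
CA_2  CA_2  1825308    0.9052539
CA_3  CA_3  1902796    0.9686232
CC_1  CC_1  1825889    0.9923455
CC_2  CC_2  2124155    1.1275178
CC_3  CC_3  2024786    1.0668754
```

Design matrix

Why is a design matrix needed? It tells edgeR which group each sample belongs to (and here also which replicate index it has), so that the groups can be compared within a single model.

```r
subGroup <- factor(c(1, 2, 3, 1, 2, 3))                   # replicate index
group <- factor(c("CA", "CA", "CA", "CC", "CC", "CC"))    # experimental group
design <- model.matrix(~ subGroup + group)
rownames(design) <- colnames(y)
design
```

```
     (Intercept) subGroup2 subGroup3 groupCC
CA_1           1         0         0       0
CA_2           1         1         0       0
CA_3           1         0         1       0
CC_1           1         0         0       1
CC_2           1         1         0       1
CC_3           1         0         1       1
attr(,"assign")
[1] 0 1 1 2
attr(,"contrasts")
attr(,"contrasts")$subGroup
[1] "contr.treatment"

attr(,"contrasts")$group
[1] "contr.treatment"
```

Estimating dispersion

```r
y <- estimateDisp(y, design)
y$common.dispersion
plotBCV(y)
```

```
[1] 0.02683622
```

[Figure: plotBCV(y), biological coefficient of variation plotted against average log CPM]

Differentially expressed genes

```r
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit)  # by default tests the last coefficient, groupCC
topTags(qlf)
```

```
Coefficient:  groupCC
              logFC    logCPM        F       PValue          FDR
gene7024  -5.515648  9.612809 594.9232 6.431484e-44 2.496702e-40
gene6612   5.130282  8.451143 468.2060 1.557517e-39 3.023140e-36
gene2743   4.377492  5.586773 208.0268 3.488383e-26 4.513967e-23
gene12032  4.734383  5.098148 192.9378 4.359649e-25 4.231040e-22
gene491   -2.733910 10.412673 190.9839 6.104188e-25 4.739291e-22
gene8941   2.997185  6.839106 177.7614 6.332836e-24 4.097345e-21
gene2611  -2.846924  7.216173 174.7332 1.099339e-23 6.096619e-21
gene6242   2.529125  9.897771 169.2658 3.022914e-23 1.466869e-20
gene7252   3.732315  6.137670 188.2094 3.890569e-23 1.678132e-20
gene6125   2.875423  6.569935 160.3189 1.656083e-22 6.428914e-20
```

Raw CPM values of the top differentially expressed genes

```r
top <- rownames(topTags(qlf))
cpm(y)[top, ]
```

```
                 CA_1        CA_2        CA_3       CC_1       CC_2       CC_3
gene7024  1711.383002 1405.861899 1480.121115   33.11418   37.16040   29.62696
gene6612    17.558649   12.103848   26.585753  403.99298  582.45796 1044.35046
gene2743     4.682306    1.815577    5.968230   62.91694   87.26431  114.34156
gene12032    1.755865    2.420770    2.712832   65.67646   47.59872   75.45617
gene491   2811.139727 2059.469669 2222.351938  444.83381  385.38258  253.68087
gene8941    23.996820   24.812888   24.415488  131.35291  244.67410  225.90560
gene2611   245.821088  310.463691  225.165052   43.04843   26.30455   39.81123
gene6242   231.188880  299.570228  298.411515 1348.29899 1343.61988 2191.93237
gene7252     9.364613   13.314232    5.425664   92.71970  108.55847  181.92807
gene6125    23.411532   14.524617   29.841152  145.70239  160.75005  185.16852
```

Number of up- and down-regulated genes

```r
dt <- decideTests(qlf)
summary(dt)
isDE <- as.logical(dt)
DEnames <- rownames(y)[isDE]
head(DEnames)
```

```
[1] "gene1325" "gene1326" "gene1327" "gene1331" "gene1340" "gene1343"
```

Plotting the differentially expressed genes

The differentially expressed genes are highlighted in the smear plot, and the blue lines mark log2 fold changes of -1 and 1:

```r
plotSmear(qlf, de.tags = DEnames)
abline(h = c(-1, 1), col = "blue")
```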
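The filtering rule `rowSums(cpm(y) > 1) >= 2` can be reproduced in base R to see what it actually removes. The sketch below assumes a tiny hypothetical featureCounts-style table (genes g1 to g4, samples s1 to s4, all values invented) and a hypothetical helper `simple_cpm()` that divides by raw library sizes. That matches edgeR's `cpm()` on a DGEList only before `calcNormFactors()`, while every normalization factor is still 1. Gene g2 has nonzero counts yet is still dropped, because it exceeds 1 CPM in only one sample.

```r
# Hypothetical featureCounts-style table: 6 annotation columns + 4 sample columns.
# Every sample column sums to 1e6 reads, so CPM is (up to rounding) the raw count.
mydata <- data.frame(
  Geneid = c("g1", "g2", "g3", "g4"),
  Chr    = "chr1",
  Start  = c(1, 101, 201, 301),
  End    = c(50, 150, 250, 350),
  Strand = "+",
  Length = 50,
  s1 = c(0, 0, 5, 999995),
  s2 = c(0, 2, 3, 999995),
  s3 = c(0, 0, 0, 1000000),
  s4 = c(0, 0, 4, 999996)
)

# Keep only the sample columns (7 onwards) and use Geneid as row names.
countMatrix <- as.matrix(mydata[, 7:ncol(mydata)])
rownames(countMatrix) <- mydata$Geneid

# Hypothetical helper: counts per million using raw library sizes (no TMM factors).
simple_cpm <- function(counts) {
  lib_size <- colSums(counts)
  t(t(counts) / lib_size) * 1e6
}

# Same rule as the tutorial: CPM > 1 in at least 2 samples.
keep <- rowSums(simple_cpm(countMatrix) > 1) >= 2
filtered <- countMatrix[keep, , drop = FALSE]
print(keep)
print(rownames(filtered))
```

With real data the threshold of 1 CPM corresponds to roughly 2 reads for libraries of about 1.8 million reads, as in this tutorial.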
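To see what the `groupCC` column of the design matrix encodes, the sketch below fits the same design to one hypothetical gene by ordinary least squares with base R's `lm.fit()`. This is a simplification for illustration only: edgeR fits negative binomial GLMs (here quasi-likelihood via `glmQLFit()`), not least squares, and the log-expression values are invented. In this balanced layout the `groupCC` coefficient equals the difference between the CC and CA means, while the replicate index is handled by the `subGroup` columns.

```r
# Same factors as the tutorial's design matrix.
subGroup <- factor(c(1, 2, 3, 1, 2, 3))
group    <- factor(c("CA", "CA", "CA", "CC", "CC", "CC"))
design   <- model.matrix(~ subGroup + group)
rownames(design) <- c("CA_1", "CA_2", "CA_3", "CC_1", "CC_2", "CC_3")

# Hypothetical log-scale expression of one gene (CA samples first, then CC).
logExpr <- c(5, 6, 7, 7, 9, 8)

# Least-squares fit on the design as given; it already has an intercept column.
fit <- lm.fit(design, logExpr)
print(fit$coefficients)

coef_groupCC <- unname(fit$coefficients["groupCC"])
# Balanced design: groupCC equals mean(CC) - mean(CA), here 8 - 6.
print(c(groupCC = coef_groupCC,
        mean_diff = mean(logExpr[4:6]) - mean(logExpr[1:3])))
```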
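The FDR column reported by `topTags()` is the Benjamini-Hochberg adjusted p-value, which is its default adjustment. The sketch below checks this against the ten rows printed above, taking the number of tested genes as 3882 (the 5 rows printed for the filtered DGEList plus "3877 more rows"). It only computes p × n / rank; the full procedure also takes a running minimum from the bottom of the ranked list, which ten rows cannot show, but for these rows the plain ratio already reproduces the printed FDR.

```r
# PValue and FDR columns copied from the topTags(qlf) output above.
pval <- c(6.431484e-44, 1.557517e-39, 3.488383e-26, 4.359649e-25, 6.104188e-25,
          6.332836e-24, 1.099339e-23, 3.022914e-23, 3.890569e-23, 1.656083e-22)
fdr_printed <- c(2.496702e-40, 3.023140e-36, 4.513967e-23, 4.231040e-22, 4.739291e-22,
                 4.097345e-21, 6.096619e-21, 1.466869e-20, 1.678132e-20, 6.428914e-20)

# Genes tested: 5 printed rows + "3877 more rows" in the filtered DGEList.
n_genes <- 5 + 3877

# Benjamini-Hochberg ratio p * n / rank for the top-ranked genes.
bh_top <- pval * n_genes / seq_along(pval)

# Relative deviation from the printed FDR (expected at the rounding level).
rel_err <- max(abs(bh_top / fdr_printed - 1))
print(signif(bh_top, 7))
print(rel_err)
```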

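Installing DESeq2

DESeq2 is also a Bioconductor package and is installed the same way:

```r
# Legacy installer used originally (does not work with current Bioconductor):
#   source("/biocLite.R")   # try http:// if https:// URLs are not supported
#   biocLite("DESeq2")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("DESeq2")
```

Data import

Data can be imported into DESeq2 in many ways; here the count matrix is imported directly:

```r
library(DESeq2)
sampleNames <- c("CA_1", "CA_2", "CA_3", "CC_1", "CC_2", "CC_3")
mydata <- read.table("counts.txt", header = TRUE, sep = "\t", skip = 1)
names(mydata)[7:12] <- sampleNames
countMatrix <- as.matrix(mydata[, 7:12])
rownames(countMatrix) <- mydata$Geneid
# Sample table: one row per sample, with the condition used in the design
table2 <- data.frame(name = sampleNames,
                     condition = factor(c("CA", "CA", "CA", "CC", "CC", "CC")))
rownames(table2) <- sampleNames
dds <- DESeqDataSetFromMatrix(countData = countMatrix, colData = table2,
                              design = ~ condition)
dds
```

```
class: DESeqDataSet
dim: 14217 6
metadata(0):
assays(1): counts
rownames(14217): gene1314 gene1315 ... gene6710 gene6709
rowRanges metadata column names(0):
colnames(6): CA_1 CA_2 ... CC_2 CC_3
colData names(2): name condition
```

Filtering

As before, unexpressed genes are useless for the analysis. The code keeps genes whose total count across all samples is greater than 1, which removes the all-zero genes:

```r
dds <- dds[rowSums(counts(dds)) > 1, ]
dds
```

```
class: DESeqDataSet
dim: 4190 6
metadata(0):
assays(1): counts
rownames(4190): gene1321 gene1322 ... gene6712 gene6710
rowRanges metadata column names(0):
colnames(6): CA_1 CA_2 ... CC_2 CC_3
colData names(2): name condition
```

PCA analysis

```r
rld <- rlog(dds)
plotPCA(rld, intgroup = c("name", "condition"))
```

[Figure: PCA plot from plotPCA(), samples coloured by group (CA_1:CA, CA_2:CA, CA_3:CA, CC_1:CC, CC_2:CC, CC_3:CC), axes PC1 and PC2 with percent variance]

The PCA plot can also be drawn with ggplot2:

```r
library(ggplot2)
rld <- rlog(dds)
data <- plotPCA(rld, intgroup = c("condition", "name"), returnData = TRUE)
percentVar <- round(100 * attr(data, "percentVar"))
p <- ggplot(data, aes(PC1, PC2, color = condition, shape = name)) +
  geom_point(size = 3) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance"))
p
```

Note: do not load the older DESeq package with library(DESeq) before the PCA step, otherwise the PCA analysis fails: DESeq exports its own plotPCA(), which masks the DESeq2 version.

[Figure: ggplot2 PCA plot, colour = condition (CA, CC), shape = sample name (CA_1 to CC_3), x-axis PC1 with percent variance]

Differential expression analysis and result output

```r
dds <- DESeq(dds)
res <- results(dds)
head(res)
```

```
           baseMean log2FoldChange     lfcSE       stat       pvalue         padj
gene1321 173.288681     0.26267959 0.2049983  1.2813742 2.000623e-01 3.790396e-01
gene1322   2.118367    -0.05237952 0.4989589 -0.1049776 9.163936e-01 9.559679e-01
gene1323  35.973701     0.50054580 0.3038096  1.6475641 9.944215e-02 2.337858e-01
gene1324  88.421661     0.17677605 0.2402727  0.7357309 4.618945e-01 6.565731e-01
gene1325  43.001828     0.81143104 0.2919396  2.7794486 5.445127e-03 2.447141e-02
gene1326 662.136259    -1.05356105 0.1752230 -6.0126880 1.824720e-09 4.520861e-08
```

Notes:
(1) rownames: gene ID
(2) baseMean: mean of the normalized read counts across all samples
(3) log2FoldChange: log2 of the fold change in expression (here CC vs CA)
(4) lfcSE and stat: standard error of log2FoldChange, and the Wald statistic log2FoldChange / lfcSE
(5) pvalue: p-value of the statistical test for differential expression
(6) padj: p-value adjusted for multiple testing (Benjamini-Hochberg by default); the smaller padj, the more significant the difference in expression

Overall results with summary()

```r
summary(res)
```

```
out of 4190 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 595, 14%
LFC < 0 (down)     : 644, 15%
...
```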
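DESeq2's default test is a Wald test: stat is log2FoldChange divided by lfcSE, and pvalue is the two-sided standard normal tail probability of stat. The sketch below recomputes both from the six rows of `res` printed above. The values are copied from the rounded output, so small rounding differences are expected.

```r
# First six rows of `res` as printed above (copied from the output).
res_head <- data.frame(
  row.names      = c("gene1321", "gene1322", "gene1323",
                     "gene1324", "gene1325", "gene1326"),
  log2FoldChange = c(0.26267959, -0.05237952, 0.50054580,
                     0.17677605, 0.81143104, -1.05356105),
  lfcSE          = c(0.2049983, 0.4989589, 0.3038096,
                     0.2402727, 0.2919396, 0.1752230),
  stat           = c(1.2813742, -0.1049776, 1.6475641,
                     0.7357309, 2.7794486, -6.0126880),
  pvalue         = c(2.000623e-01, 9.163936e-01, 9.944215e-02,
                     4.618945e-01, 5.445127e-03, 1.824720e-09)
)

# Wald statistic: log2 fold change divided by its standard error.
wald_stat <- res_head$log2FoldChange / res_head$lfcSE
# Two-sided p-value from the standard normal distribution.
wald_p <- 2 * pnorm(-abs(wald_stat))

rel_err_stat <- max(abs(wald_stat / res_head$stat - 1))
rel_err_p    <- max(abs(wald_p / res_head$pvalue - 1))
print(data.frame(wald_stat = signif(wald_stat, 7), wald_p = signif(wald_p, 7),
                 row.names = rownames(res_head)))
print(c(rel_err_stat = rel_err_stat, rel_err_p = rel_err_p))
```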
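summary(res) counts the genes whose padj is below the threshold (0.1 by default, as shown in its output) and splits them by the sign of log2FoldChange; genes with an NA padj are not counted. The sketch below applies that rule to the six rows printed above, where only gene1325 and gene1326 pass. It then recomputes the reported percentages, assuming they are taken relative to the 4190 genes with nonzero total read count.

```r
# log2FoldChange and padj of the six rows of `res` printed above.
lfc  <- c(0.26267959, -0.05237952, 0.50054580, 0.17677605, 0.81143104, -1.05356105)
padj <- c(3.790396e-01, 9.559679e-01, 2.337858e-01, 6.565731e-01, 2.447141e-02, 4.520861e-08)
alpha <- 0.1  # threshold shown as "adjusted p-value < 0.1"

# Rows with NA padj would be excluded; none of these six are NA.
sig  <- !is.na(padj) & padj < alpha
up   <- sum(sig & lfc > 0)
down <- sum(sig & lfc < 0)
print(c(up = up, down = down))

# Percentages for the full summary, relative to the 4190 genes with nonzero counts.
pct <- round(100 * c(up = 595, down = 644) / 4190)
print(pct)
```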
