GWAS and SNP Data Analysis

Statistical Analysis in Case-Control Studies
Summer International Workshop, Aug 09, Beijing
Liu Tian, Genome Institute of Singapore

Outline
- Introduction
- Basic Statistical Methods of Case-control Study
- GWAS
- A Novel Epistasis-testing Procedure

Aim of Genetic Studies
- Dramatic variation exists within a single species.
- Almost every biological phenomenon involves a genetic component.
- There is always a keen need to find the genetic variation related to complex traits.

Different Design Strategies
- Intervention studies
  - Clinical trials
- Observational studies
  - Case-control studies
  - Cohort studies

Cohort Studies
- A cohort study is a study in which a group of individuals is followed.
- Cohort studies can be either prospective or retrospective.

Case-Control Studies
- Case-control studies are used to identify factors that may contribute to a medical condition by comparing subjects who have that condition (the cases) with subjects who do not have the condition but are otherwise similar (the controls).
- Case-control studies are retrospective and non-randomized.
- [Figure: subjects grouped as Disease + and Disease -]

Selection of Cases
- Population-based cases: all subjects, or a random sample of all subjects, with the disease at a single point or during a given period of time in the defined population.
- Hospital-based cases: all patients in a hospital department at a given time.

Selection of Controls
Principles of control selection:
- Study base: controls can be used to characterise the distribution of exposure.
- Comparable accuracy: equal reliability of the information obtained from cases and controls (to avoid systematic misclassification).
- Overcome confounding: elimination of confounding through control selection (matching or stratified sampling).

Selection of Controls
- General population controls: registries, households, telephone sampling
  - costly and time consuming
  - recall bias
  - eventually high non-response rate
- Hospitalised controls: patients at the same hospital as the cases
  - easy to identify; less recall bias; higher response rate

Case-Control Studies vs. Cohort Studies
- Cohort study
  - Suitable for rare exposures
  - Examines multiple effects of a single exposure
  - Minimizes bias in exposure determination
  - Direct measurement of the incidence of the disease
- Case-control study
  - Quick, inexpensive
  - Well suited to the evaluation of diseases with a long latency period
  - Suitable for rare diseases
  - Examines multiple etiologic factors for a single disease

Case-Control Studies vs. Cohort Studies
- Cohort study
  - Not suitable for rare diseases
  - Prospective: expensive and time consuming
  - Retrospective: inadequate records
  - Validity can be affected by losses to follow-up
- Case-control study
  - Not suitable for rare exposures
  - Incidence rates cannot be estimated unless the study is population based
  - The retrospective, non-randomized nature limits the conclusions that can be drawn

Data Structure of Case-control Studies
[Table not preserved in this extract]

Population-Based Case-Control Study
- Individuals are unrelated.
- The aim is to test whether marker genotypes are distributed differently between the cases and the controls.
- By comparing genotype frequencies in cases and controls, we identify the genetic factors correlated with a pre-defined phenotype.

Coding Genotypes
For one marker with two alleles (A and a), there can be three possible genotypes: AA, Aa and aa.

Genetic Models and Underlying Hypotheses
- Genotypic model. Hypothesis: all 3 genotypes have different effects. Comparison: AA vs. Aa vs. aa. (The genotypic value is the expected phenotypic value of a particular genotype.)
- Dominant model. Hypothesis: the genetic effects of AA and Aa are the same (assuming A is the minor allele). Comparison: AA and Aa vs. aa.
- Recessive model. Hypothesis: the genetic effects of Aa and aa are the same (A is the minor allele). Comparison: AA vs. Aa and aa.
- Allelic model. Hypothesis: the genetic effects of allele A and allele a are different. Comparison: A vs. a.

Pearson's Chi-squared Test
- Genotypic model: null hypothesis of independence, df = 2
- Dominant model: null hypothesis of independence, df = 1
- Recessive model: null hypothesis of independence, df = 1
- Allelic model: null hypothesis of independence, df = 1

Test Statistic
Chi-squared test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where
- O is the observed cell count;
- E is the expected cell count under the null hypothesis of independence.

Example
The following table summarizes the genotype counts of marker M: [table not preserved in this extract]
Different tests can be performed:
- Allelic test
- Dominant gene action
- Recessive gene action
- Genotypic test

Example (Dominant Gene Action)
Using R:
    dominant_table <- matrix(c(80,90,20,10), ncol = 2)
    print(dominant_table)
    chisq.test(dominant_table, correct = FALSE)

Example (Recessive Gene Action)
Using R:
    recessive_table <- matrix(c(36,18,164,182), ncol = 2)
    print(recessive_table)
    chisq.test(recessive_table, correct = FALSE)

Example (Genotypic Test)
Using R:
    genotypic_table <- matrix(c(36,18,100,84,64,98), ncol = 3)
    print(genotypic_table)
    chisq.test(genotypic_table, correct = FALSE)

Example (Allelic Test)
Using R:
    allelic_table <- matrix(c(172,120,228,280), ncol = 2)
    print(allelic_table)
    chisq.test(allelic_table, correct = FALSE)

Logistic Regression Analysis
A general model:
$\log\frac{p_{disease}}{1 - p_{disease}} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_J X_J$
where:
- $p_{disease}$ is the probability that an individual has a particular disease;
- $\beta_0$ is the intercept;
- $\beta_1, \beta_2, \dots, \beta_J$ are the effects of the genetic factors;
- $X_1, X_2, \dots, X_J$ are the dummy variables of the genetic factors.

Logistic Regression Analysis
- Logistic regression describes the relationship between a dichotomous response variable and a set of explanatory variables.
- The logit model is the only model under which the effect parameter $\beta$ can be estimated in retrospective studies in the same way as in prospective studies.
- If the sampling rate for cases is 10 times that for controls, the estimated intercept is log(10) = 2.3 larger than the one estimated with a prospective study.

Inference and Interpretation
- Significance tests focus on $H_0: \beta_i = 0$.
- The estimator $\exp(\hat\beta_i)$ is the estimated odds ratio for genetic factor i.
- The sign of $\hat\beta_i$ determines whether $p_{disease}$ is increasing or decreasing when the effect of genetic factor i exists.

An Example of R Output
[R output not preserved in this extract]

Other Options
Fisher's exact test:
- When the sample size is small, the asymptotic approximation of the null distribution is no longer valid.
- By performing Fisher's exact test, the exact significance of the deviation from the null hypothesis can be calculated.
- For a 2 by 2 table, the exact p-value can be calculated as: [formula not preserved in this extract]

Other Options
Cochran-Armitage trend test:
- An advantage of the Cochran-Armitage test is that it does not assume Hardy-Weinberg equilibrium.
- It is typically used to test a 2 × k contingency table when the effects of AA, Aa, and aa are thought to be ordered.
- In genome-wide association studies, the additive (or codominant) version of the test is often used.

Genome-wide Association Study
In genetic epidemiology, a genome-wide association study (GWAS), also known as a whole genome association study (WGA study), is an examination of genetic variation across a given genome, designed to identify genetic associations with observable traits. In human studies, this might include traits such as blood pressure or weight, or why some people get a disease or condition.
From: (source not preserved in this extract)
Technology makes it feasible:
- Affymetrix: 500K; 1M chip arrives in early 2007 (randomly distributed).
- Illumina: 550K chip costs (gene-based).

Genome-wide Association Study
- Makes few demands on the sample: case-control data or case-parents trio data are enough.
- Good for moderate effect sizes (odds ratio 1.5).
- Particularly useful in finding genetic variations that contribute to common, complex diseases.

What Is A SNP?
Chromosome 1:  AAGTCAGTCTAGGATCGGG
               TTCAGTCAGATCCTAGCCC
Chromosome 2:  AAGTCAGTCTAGGGTCGGG
               TTCAGTCAGATCCCAGCCC
Single Nucleotide Polymorphism

Handling GWAS
- Storing and converting large amounts of genotype data
- Quality control
- Generating initial association analysis
- Specialized analysis

Quality Control of SNPs
- Exclude SNPs that fail the Hardy-Weinberg test
  - Expected proportions of genotypes are not consistent with the observed allele frequency
  - HWE p-value $10^{-4}$ to $10^{-6}$
- Genotyping success rate 95%
- Di
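The four Pearson chi-squared examples in the text can be derived from a single genotype table. The sketch below assumes the genotype counts implied by the genotypic and allelic examples (36/100/64 and 18/84/98) and assumes that the first row holds cases and the second controls; the slides do not label the rows. The helper `pearson()` is a hypothetical name that recomputes the statistic from observed and expected counts. The dominant table derived this way (136/64 and 102/98) differs from the counts used in the dominant example in the text, which may come from a different data set.

```r
# Genotype counts implied by the genotypic and allelic examples in the text.
# Assumption: row 1 is cases and row 2 is controls.
geno <- rbind(cases    = c(AA = 36, Aa = 100, aa = 64),
              controls = c(AA = 18, Aa = 84,  aa = 98))

# Collapse the genotypes according to each genetic model.
dominant  <- cbind(AA_or_Aa = geno[, "AA"] + geno[, "Aa"], aa = geno[, "aa"])
recessive <- cbind(AA = geno[, "AA"], Aa_or_aa = geno[, "Aa"] + geno[, "aa"])
allelic   <- cbind(A = 2 * geno[, "AA"] + geno[, "Aa"],
                   a = 2 * geno[, "aa"] + geno[, "Aa"])

# Pearson statistic computed by hand: sum((O - E)^2 / E),
# with E = row total * column total / grand total.
pearson <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

tests <- list(genotypic = geno, dominant = dominant,
              recessive = recessive, allelic = allelic)
for (name in names(tests)) {
  fit <- chisq.test(tests[[name]], correct = FALSE)
  cat(sprintf("%-9s X2 = %.3f (by hand %.3f), df = %d, p = %.3g\n",
              name, fit$statistic, pearson(tests[[name]]),
              as.integer(fit$parameter), fit$p.value))
}
```

The genotypic test has 2 degrees of freedom and the three collapsed tables have 1, matching the df values listed in the text.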
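The logistic regression model in the text can be fitted in R with `glm()`. This is a minimal sketch, again assuming the genotype counts implied by the genotypic example and that the first row of that table is the cases. The column names `cases`, `controls` and `n_A` are illustrative choices, not taken from the slides.

```r
# Assumed genotype counts, one row per genotype group; aa is the reference level.
dat <- data.frame(
  genotype = factor(c("aa", "Aa", "AA"), levels = c("aa", "Aa", "AA")),
  cases    = c(64, 100, 36),
  controls = c(98, 84, 18)
)

# Genotypic model: two dummy variables (Aa and AA against the reference aa).
fit_geno <- glm(cbind(cases, controls) ~ genotype, family = binomial, data = dat)

# Additive model: a single predictor counting the copies of allele A.
dat$n_A <- c(0, 1, 2)
fit_add <- glm(cbind(cases, controls) ~ n_A, family = binomial, data = dat)

# Wald tests of beta_i = 0 are in the coefficient table.
print(summary(fit_geno)$coefficients)

# exp(beta_i) is the estimated odds ratio relative to the reference genotype.
print(exp(coef(fit_geno)))
print(exp(coef(fit_add)))
```

Because the genotypic model is saturated for a single marker, its estimated odds ratios equal the cross-product ratios of the corresponding 2 by 2 sub-tables.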
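The statement that case-control sampling shifts only the intercept of the logit model can be checked with a short derivation. The notation here is an assumption added for illustration: $S = 1$ marks a sampled individual, $\pi_1$ and $\pi_0$ are the sampling probabilities for cases and controls, $p(x)$ is $p_{disease}$ for covariates $x$, and sampling is assumed to depend on disease status only.

```latex
\begin{aligned}
P(D = 1 \mid x, S = 1)
  &= \frac{\pi_1\, p(x)}{\pi_1\, p(x) + \pi_0\,\bigl(1 - p(x)\bigr)}, \\
\log\frac{P(D = 1 \mid x, S = 1)}{P(D = 0 \mid x, S = 1)}
  &= \log\frac{\pi_1}{\pi_0} + \log\frac{p(x)}{1 - p(x)}
   = \Bigl(\beta_0 + \log\frac{\pi_1}{\pi_0}\Bigr) + \sum_{j=1}^{J} \beta_j X_j .
\end{aligned}
```

With $\pi_1 / \pi_0 = 10$ the intercept increases by $\log 10 \approx 2.3$, while every $\beta_j$ is unchanged.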
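For Fisher's exact test, the probability of a single 2 by 2 table with cells a, b, c, d and all margins fixed is the hypergeometric probability choose(a+b, a) * choose(c+d, c) / choose(n, a+c), with n = a + b + c + d. The sketch below applies this to the recessive table from the text and compares a hand-computed two-sided p-value with `fisher.test()`. The helper name `table_prob` and the variable `c2` (used to avoid masking `c()`) are illustrative.

```r
# Recessive table from the text. Assumption: rows are cases and controls.
tab <- matrix(c(36, 18, 164, 182), ncol = 2)
a <- tab[1, 1]; b <- tab[1, 2]; c2 <- tab[2, 1]; d <- tab[2, 2]

# Probability of one table with all margins fixed, computed on the log scale.
table_prob <- function(a, b, c2, d) {
  exp(lchoose(a + b, a) + lchoose(c2 + d, c2) - lchoose(a + b + c2 + d, a + c2))
}

# The same quantity through the hypergeometric density.
p_obs <- dhyper(a, m = a + c2, n = b + d, k = a + b)

# Two-sided p-value: add up every table that is no more likely than the
# observed one (a small relative tolerance guards against rounding ties).
support <- max(0, (a + b) - (b + d)):min(a + b, a + c2)
probs <- dhyper(support, m = a + c2, n = b + d, k = a + b)
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

print(c(by_hand = p_two_sided, fisher.test = fisher.test(tab)$p.value))
```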
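The additive version of the Cochran-Armitage trend test mentioned in the text is available in base R as `prop.trend.test()`. The sketch below assumes the same genotype counts as the genotypic example, ordered by the number of copies of allele A, and recomputes the statistic by hand for comparison.

```r
# Assumed genotype counts, ordered by copies of allele A (0, 1, 2).
cases    <- c(aa = 64, Aa = 100, AA = 36)
controls <- c(aa = 98, Aa = 84,  AA = 18)
totals   <- cases + controls
score    <- c(0, 1, 2)

# Trend test on the proportion of cases within each genotype group.
trend <- prop.trend.test(x = cases, n = totals, score = score)
print(trend)

# The same statistic by hand:
# [sum s_i (x_i - n_i p)]^2 / (p (1 - p) [sum n_i s_i^2 - (sum n_i s_i)^2 / N])
N <- sum(totals)
p <- sum(cases) / N
num <- sum(score * (cases - totals * p))^2
den <- p * (1 - p) * (sum(totals * score^2) - sum(totals * score)^2 / N)
print(num / den)
```

The test has 1 degree of freedom regardless of the number of genotype groups, because only the linear trend across the scores is tested.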
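The Hardy-Weinberg quality-control step can be sketched as a chi-squared goodness-of-fit test on the genotype counts of one SNP. The function name `hwe_chisq` and the threshold variable are hypothetical; the threshold value is picked from the range quoted in the text, and the counts are the assumed control genotypes from the genotypic example.

```r
# Chi-squared test of Hardy-Weinberg equilibrium for one SNP.
hwe_chisq <- function(n_AA, n_Aa, n_aa) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)                     # frequency of allele A
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)   # HWE genotype counts
  observed <- c(n_AA, n_Aa, n_aa)
  stat <- sum((observed - expected)^2 / expected)
  # 1 df: 3 genotype classes, minus 1, minus 1 estimated allele frequency
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Assumed control genotype counts (AA, Aa, aa).
res <- hwe_chisq(18, 84, 98)
print(res)

# Hypothetical QC rule: keep the SNP unless the HWE p-value is below the threshold.
hwe_threshold <- 1e-4
keep_snp <- res[["p.value"]] >= hwe_threshold
print(keep_snp)
```

In a genome-wide scan the same function would be applied to every SNP, and SNPs failing the rule would be excluded before association testing.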
