




Databases for Microarrays
Vidhya Jagannathan
SIB, Lausanne

Overview
- Microarray data in a nutshell
- Why databases?
- What data to represent?
- What is a database?
- Different data models
- E-R modelling
- Microarray databases
- Standards being developed

Microarray Experiment

Microarray Data in a Nutshell
- Lots of data has to be managed before and after the experiment.
- Data to be stored before the experiment:
  - Description of the array and the sample.
  - Direct access to all the cDNA and gene sequences, annotations, and physical DNA resources.
- Data to be stored after the experiment:
  - Raw data: scanned images.
  - Gene expression matrix: relative expression levels observed at the various sites on the array.
- Hence database software capable of handling large volumes of numeric and image data is required.

Why Databases?
- Tailored to the data type.
- Tailored to the scientists.
- Intuitive ways to query the data: diagrams, forms, point and click, text, etc.
- Support for efficient answering of queries: query optimisation, indexes, compact physical storage.

Data Representation
Goal: represent data in an intuitive and convenient manner
- without unnecessary replication of information
- making it easy to write queries to find required information
- supporting efficient retrieval of required information

What is a Database?
- A database is an organised collection of pieces of structured electronic information.
- Example 1: Libraries use a database system to keep track of library inventory and loans.
- Example 2: Airlines use database systems to manage their flights and reservations.
- A collection of records kept for a common purpose, such as these, is known as a database.
- The records of the database normally reside on a hard disk and are retrieved into computer memory only when they are accessed.
- So it is clear why we need to discuss a microarray database.

Data Models
- A data model describes a container for data and the methods to store and retrieve data from that container.
- Abstract mathematical algorithms and concepts.
- A data model cannot be touched; it is an abstraction.
- Very useful.

Types of Data Models
- Ad-hoc file formats (not really data models!)
- Relational data model
- Object-relational data model
- Object-oriented data model
- XML (Extensible Markup Language)

Ad-hoc File Formats
The various ad-hoc file formats in use for microarray data are:
- Flat file formats.
- Spreadsheet formats.
- Last but not least, even MS Word documents!
These are a very rudimentary way to store data:
- They sometimes contain redundant information.
- They are extremely inefficient for retrieving particular subsets of the results.

Relational Data Model
- The most prevalent model, used in many databases developed today.
- A collection of related information is represented as a set of tables.
- Each data value is stored at the intersection of a row and a column.
- Values in a column are all of the same kind, which gives a simple form of data validation.
- Rows are unique, so there is no data redundancy; every row is meaningful and can be identified by its unique key.
- Uses Structured Query Language (SQL) for data storage, retrieval and manipulation.

Table GENE:

GENE_ID   CONTIG_ID      CONTIG_START   CONTIG_END   CONTIG_STRAND
GB2VN32   NT_0106051.3   2308745        2321072      Complement
GB2VN     NT_0106058.3   2354807        2360778      Complement

Example
- GENE is a table.
- Each line of values is a row, or record.
- Each named column, such as GENE_ID, is a field, or column.
- A second table, CLASSIFICATION, refers to the GENE table through GENE_ID.

Table CLASSIFICATION:

CLASS_ID     GENE_ID   DESCRIPTION       TYPE
GO:0003677   GB2VN32   DNA binding       Molecular function
MSX2         GB2VN     MSH(Drosophila)   Gene

Advantages of Relational Model
- Allows information to be broken up into logical units and stored in tables.
- Allows combining data from different tables in different ways to derive useful information.
- Great for queries involving information from multiple original sources.
- Related information can be gathered easily, e.g. information about a particular gene from multiple datasets/experiments.

Object Oriented Model
- The object-oriented model allows real-world data to be represented as objects.
- Objects encapsulate the data and provide methods to access or manipulate it.
- Objects with a specific structure and set of methods are said to belong to an object class.
- New classes can be created by extending the description of a parent class.
- Child classes inherit the data and methods of the parent class.

Example
- Biomolecule: String seq; get_bio_seq()
  - DNA inherits from Biomolecule
  - Protein inherits from Biomolecule
- OODBMS example: ArrayExpress

Database Design: Entity-Relationship Concept
[Figure: Entity A, Relationship, Entity B]

Entities
- are real-world objects, e.g. gene
- contain attributes, e.g. gene_id, sequence
- are drawn as rectangular boxes holding the name of the entity and its attributes, in one of two notations, as there is no standard

[Figure: the entity Gene with attributes gene_id and sequence, drawn in notation 1 and in notation 2]

Relationship
- Relationships provide connections between two or more entities, e.g. which genes were used in which experiment.
- When two entities are involved in a relationship, it is known as a binary relationship.
- When three entities are involved in a relationship, it is called a ternary relationship.
- When more than three entities are involved in a relationship, it is usually broken into one or more binary or ternary relationships.
- Relationships are drawn as a line linking the involved entities, e.g. Gene, used_in, Experiment.

Example E-R Diagram
Entities and their attributes:
- Experiment: Experiment-Id, Date, Image
- Experimenter: Experimenter-Id, Name, E-mail, Dept., Institution
- Sample: Sample-Id, Organism, Cell-type, Drug-Ids
- Array: Array-Id, Manufacturer, Type, Batch
- Gene: gene-id, sequence
Relationships:
- Expt-Exptr
- Expt-Sample
- Expt-Array
- Expression-value (attribute: value)
Notation: * and 1 mark a many-to-one relationship; multivalued attributes have their own marking.

Object Relational Data Model
- Improves the relational model by adding some features from object data models.
- Information is represented as in the relational model, but a column value is not restricted to a single value: multiple values are allowed.
- Example (the sample table from the previous slides):

sample_id   organism   celltype   drug_id (d1)   drug_id (d2)
s001        ecoli      c123       ac1 nm         ac3 nm
s002        ecoli      c123

Queries, queries, queries!
- Given a collection of microarray-generated gene expression data, what kinds of questions do users wish to pose?
- Constructing an extensive list of the interesting queries and data mining problems that the database has to support will facilitate the design process.

Queries on the data:
- Which genes are linked?
- Which genes are expressed similarly to my gene XYZ?
- Which genes have changed expression in a second condition?
- Which genes are co-expressed in differing conditions?
- Classification (of tumors, diseased tissues etc.): which patterns are characteristic of a certain class of samples, and which genes are involved?

More Queries!
Queries that link in additional knowledge:
- Functional classification of genes: are changes clustered in particular classes?
- Metabolic pathway information: is a certain pathway, or a route within a pathway, affected?
- Disease information and clinical follow-up: correlation with expression patterns.
- Phenotype information for mutants: are there correlations between particular phenotypes and expression patterns?
- In what region of the genome is the interesting gene located?
- Is there synteny in this region with other species?
- Is there a known trait that maps to this region?

Query Language
- The language in which a user requests information from the database.
- SQL: data definition helps you implement your model, and data manipulation helps you modify and retrieve data.
- Advantages:
  - A query can be specified declaratively, leaving the database system to work out the best way of finding the answers.
  - Supports queries of medium complexity.
- Specialized languages.
- SQL statements are not abstract; they read much like spoken language.

Basic SQL Queries
Find the image for experiment number 1345:

    select image
    from experiment
    where experiment_id = 1345;

Find the experiment id and image of all experiments involving E. coli:

    select experiment_id, image
    from experiment, sample
    where experiment.sample_id = sample.sample_id
      and sample.organism = 'e.coli';
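The two queries above can be tried out against a small in-memory database. What follows is only a minimal sketch using Python's built-in sqlite3 module. The table layout (an experiment table that carries a sample_id column), the underscore column names, the helper function names and every row of data are assumptions made for illustration; none of them come from the slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding hypothetical experiment and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table sample (
            sample_id text primary key,
            organism  text not null
        );
        create table experiment (
            experiment_id integer primary key,
            image         text not null,
            sample_id     text not null references sample(sample_id)
        );
        -- Illustrative rows only; not real microarray data.
        insert into sample values ('s001', 'e.coli');
        insert into sample values ('s002', 'yeast');
        insert into experiment values (1345, 'scan_1345.tif', 's001');
        insert into experiment values (1346, 'scan_1346.tif', 's002');
        insert into experiment values (1347, 'scan_1347.tif', 's001');
    """)
    return conn


def image_for_experiment(conn, experiment_id):
    """First query: the image stored for one experiment number."""
    row = conn.execute(
        "select image from experiment where experiment_id = ?",
        (experiment_id,),
    ).fetchone()
    return row[0] if row else None


def experiments_for_organism(conn, organism):
    """Second query: id and image of every experiment on the given organism."""
    cursor = conn.execute(
        """
        select experiment.experiment_id, experiment.image
        from experiment, sample
        where experiment.sample_id = sample.sample_id
          and sample.organism = ?
        order by experiment.experiment_id
        """,
        (organism,),
    )
    return cursor.fetchall()


conn = build_demo_db()
print(image_for_experiment(conn, 1345))
print(experiments_for_organism(conn, "e.coli"))
```

The values are passed through `?` placeholders instead of being pasted into the SQL text, which is the usual way for an application to hand user input to a query.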
All combinations of rows from the relations in the from clause are considered, and those that satisfy the where conditions are output.

Interfacing
- SQL queries are typed at a terminal, which is not user-friendly for an end user, so applications are built to translate friendlier requests into SQL statements.
- A web form is a typical example of an interface to SQL.
- Applications for data loading.
- More complex queries (e.g. data mining such as classification and clustering) are a very important part of the microarray analysis protocol.
- It is very important to interface the various applications used to analyse the retrieved data with the database.

Gene Expression Databases Require Integration
- There are many different types of data presenting numerous relationships.
- There are a number of databases with lots of information.
- Experiments need to be compared, because they are very difficult and very expensive to perform.
- Solution: make all the databases talk the same language.
- XML was the chosen data interchange format.

Why XML?
- XML provides a method for defining the meaning, or semantics, of data.
- Example: an XML file for the table defined earlier:

    GBVN32 NT_010651 2354807 2360778 Complement

Mapping XML to Relational Database
- The data structure of an XML file is defined in a Document Type Definition (DTD).
- A DTD of this kind also gives us control over the vocabulary used.
- SQL:

    create table gene (
        gene_id         varchar(5)  primary key,
        contig_id       varchar(10) not null,
        contig_start    integer     not null,
        contig_end      integer     not null,
        contig_sequence text        not null
    );

- So the DTD can be mapped directly onto a relational database.

MAGE-ML As Data Interchange Format
[Figure: Databases, Expression Data, Converter (program), MAGE-ML]

Existing Microarray Databases
- Several gene expression databases exist, both commercial and non-commercial.
- Most focus on a particular technology, a particular organism, or both.
- Commercial databases: Rosetta Inpharmatics and Gene Logic. The specifics of their internal structure are not available for scrutiny because of their proprietary nature.
- Some non-commercial efforts to design more general databases merit particular mention. We will discuss a few of the most promising ones:
  - ArrayExpress - EBI
  - The Gene Expression Omnibus (GEO) - NLM
  - The Stanford Microarray Database
  - ExpressDB - Harvard
  - GeneX - NCGR

ArrayExpress
- Public repository of microarray-based gene expression data.
- Implemented in Oracle at EBI.
- Contains several curated gene expression datasets; an image server may be introduced to archive the raw image data associated with the experiments.
- Accepts submissions in MAGE-ML format via a web-based data annotation/submission tool called MIAMExpress.
- A demo version of MIAMExpress is available at: industry.ebi.ac.uk/parkinso/subtool/subtype.html
- Provides a simple web-based query interface and is directly linked to the Expression Profiler data analysis tool, which allows expression data clustering and other types of data exploration directly through the web.

Gene Expression Omnibus
- The Gene Expression Omnibus (GEO) is a gene expression database hosted at the National Library of Medicine.
- It supports four basic data elements:
  - Platform (the physical reagents used to generate the data)
  - Sample (information about the mRNA being used)
  - Submitter (the person and organisation submitting the data)
  - Series (the relationship among the samples)
- It allows download of entire datasets, but it has no ability to query the relationships.
- Data are entered as tab-delimited ASCII records, with a number of columns that depends on the kind of array selected.
- Supports Serial Analysis of Gene Expression (SAGE) data.

Stanford Microarray Database
- Contains the largest amount of data.
- Uses a relational database to answer queries.
- Associated with numerous clustering and analysis features.
- Users can access the data in SMD through the web interface of the package.
- Disadvantages:
  - It supports only Cy3/Cy5 glass slide data.
  - It is designed to use an Oracle database exclusively.
  - It has only recently been released outside, without any kind of support!

MaxdSQL
- Minor changes to the ArrayExpress object data model allowed it to be instantiated as a relational database; MaxdSQL is the resulting implementation.
- MaxdSQL supports both spotted and Affymetrix data, but not SAGE data.
- MaxdSQL is associated with maxdView, a Java suite of analysis and visualisation tools.
- This tool also provides an environment for developing tools and integrating existing software.
- MaxdLoad is the data-loading application.

GeneX
- Open source database and integrated tool set released by NCGR.
- Open source: provides a basic infrastructure upon which others can build.
- Stores numeric values for a spot measurement (primary or raw data), as well as ratio and averaged data across array measurements.
- Includes a web interface to the database that allows users to:
  - retrieve entire datasets or subsets
  - run guided queries for processing by a particular analysis routine
  - download data in both tab-delimited form and GeneXML format (described later)

ExpressDB
- ExpressDB is a relational database containing yeast and E. coli RNA expression data.
- It was conceived as an example of how to manage that kind of data.
- It allows web querying or SQL querying.
- It is linked to an integrated database for functional genomics called the Biomolecule Interaction Growth and Expression Database (BIGED). BIGED is intended to support and integrate RNA expression data with other kinds of functional genomics data.

Survey of existing microarray systems
This survey is based on the article "A comparison of microarray databases", published in Briefings in Bioinformatics, Vol 2, No 2, pp 143-158, May 2001.

The Microarray Gene Expression Database Group (MGED)
History and Future:
- Founded at a meeting in November 1999 in Cambridge, UK.
- May 2000 and March 2001: development of recommendations for microarray data annotations (MIAME, MAML).
- MGED 2nd meeting: establishment of a steering committee consisting of