Introduction

The source text is a paper by Rick Cattell on scalable structured (SQL) and “unstructured” (NoSQL) data stores, the latter including key-value, document, and column-oriented stores. [Translator's note: NoSQL is central to supporting big-data applications. Rendering NoSQL as “unstructured” is not quite accurate, because the more common reading is “Not Only SQL”. In other words, NoSQL is not the opposite of structured SQL; it can cover both structured and unstructured data.]

Paper information

Scalable SQL and NoSQL Data Stores
Rick Cattell
Originally published in 2010, last revised December 2011

ABSTRACT

In this paper, we examine a number of SQL and so-called “NoSQL” data stores designed to scale simple OLTP-style application loads over many servers. Originally motivated by Web 2.0 applications, these systems are designed to scale to thousands or millions of users doing updates as well as reads, in contrast to traditional DBMSs and data warehouses. We contrast the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions. These systems typically sacrifice some of these dimensions, e.g. database-wide transaction consistency, in order to achieve others, e.g. higher availability and scalability.

Note: Bibliographic references for systems are not listed, but URLs for more information can be found in the System References table at the end of this paper.

Caveat: Statements in this paper are based on sources and documentation that may not be reliable, and the systems described are “moving targets,” so some statements may be incorrect. Verify through other sources before depending on information here. Nevertheless, we hope this comprehensive survey is useful!
Check for future corrections on the author's web site /datastores.

Disclosure: The author is on the technical advisory board of Schooner Technologies and has a consulting business advising on scalable databases.

1. OVERVIEW

In recent years a number of new systems have been designed to provide good horizontal scalability for simple read/write database operations distributed over many servers. In contrast, traditional database products have comparatively little or no ability to scale horizontally on these applications. This paper examines and compares the various new systems. [Translator's note (Yol): in effect, this is a comparison of various NoSQL designs, for example how Mongo and HBase are classified, with a brief introduction to each.]

Many of the new systems are referred to as “NoSQL” data stores. The definition of NoSQL, which stands for “Not Only SQL” or “Not Relational”, is not entirely agreed upon. For the purposes of this paper, NoSQL systems generally have six key features:

1. the ability to horizontally scale “simple operation” throughput over many servers,
2. the ability to replicate and to distribute (partition) data over many servers,
3. a simple call level interface or protocol (in contrast to a SQL binding),
4. a weaker concurrency model than the ACID transactions of most relational (SQL) database systems,
5. efficient use of distributed indexes and RAM for data storage, and
6. the ability to dynamically add new attributes to data records.

The systems differ in other ways, and in this paper we contrast those differences.
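
As a rough illustration of features 1 and 2 in the list above, the following Python sketch hashes each key to pick a primary server and copies the record to the next server in the list as a replica. It is not taken from the paper or from any of the systems it surveys: the names `owners` and `PartitionedStore`, the modulo placement rule, and the replica count of 2 are all assumptions made for the example.

```python
import hashlib


def owners(key, servers, replicas=2):
    """Return the servers that hold `key`: a hashed primary plus its successors.

    Hypothetical placement rule for illustration only (hash modulo server count).
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    count = min(replicas, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(count)]


class PartitionedStore:
    """Toy in-memory store: one dict per simulated server, nothing shared."""

    def __init__(self, servers, replicas=2):
        self.servers = list(servers)
        self.replicas = replicas
        self.data = {name: {} for name in self.servers}

    def put(self, key, record):
        # Write a copy of the record to every server that owns the key.
        for name in owners(key, self.servers, self.replicas):
            self.data[name][key] = dict(record)

    def get(self, key):
        # Read from the primary owner only.
        primary = owners(key, self.servers, self.replicas)[0]
        return self.data[primary].get(key)


store = PartitionedStore(["s1", "s2", "s3"])
store.put("user:1", {"name": "Ann"})
print(store.get("user:1"))
```

Because each record is a plain dictionary, new attributes can be added per record without a schema change, which loosely mirrors feature 6. Modulo placement is the simplest possible choice and moves most keys when the server list changes, so treat it only as a teaching device.
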
They range in functionality from the simplest distributed hashing, as supported by the popular memcached open source cache, to highly scalable partitioned tables, as supported by Google's BigTable [1]. In fact, BigTable, memcached, and Amazon's Dynamo [2] provided a “proof of concept” that inspired many of the data stores we describe here:

• Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
• Dynamo pioneered the idea of eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
• BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.

A key feature of NoSQL systems is “shared nothing” horizontal scaling: replicating and partitioning data over many servers. This allows them to support a large number of simple read/write operations per second. This simple operation load is traditionally called OLTP (online transaction processing), but it is also common in modern web applications.

The NoSQL systems described here generally do not provide ACID transactional properties: updates are eventually propagated, but there are limited guarantees on the consistency of reads.
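
The behaviour described above (a read may return stale data, but every update eventually reaches every node) can be sketched in a few lines of Python. This is a simplified model written for this text, not the mechanism of Dynamo or any other system: the class names, the single version counter, and the explicit `propagate` step are assumptions standing in for whatever versioning and background replication a real store uses.

```python
class Replica:
    """One node's copy of the data: key -> (version, value)."""

    def __init__(self):
        self.values = {}

    def apply(self, key, version, value):
        # Keep an update only if it is newer than what this replica has.
        current = self.values.get(key)
        if current is None or version > current[0]:
            self.values[key] = (version, value)

    def read(self, key):
        entry = self.values.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Writes land on replica 0 first and reach the others only on propagate()."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet delivered to the other replicas
        self.clock = 0     # hypothetical global version counter

    def write(self, key, value):
        self.clock += 1
        self.replicas[0].apply(key, self.clock, value)
        self.pending.append((key, self.clock, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].read(key)

    def propagate(self):
        # Deliver every pending update to every other replica.
        for key, version, value in self.pending:
            for replica in self.replicas[1:]:
                replica.apply(key, version, value)
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("k", "v1")
print(store.read("k", 0), store.read("k", 2))  # second read is stale
store.propagate()
print(store.read("k", 2))
```

Before `propagate()` runs, a reader that happens to contact replica 2 sees no value at all; afterwards all replicas agree. That window is the “limited guarantee on the consistency of reads” in miniature.
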
Some authors suggest a “BASE” acronym in contrast to the “ACID” acronym:

• BASE = Basically Available, Soft state, Eventually consistent
• ACID = Atomicity, Consistency, Isolation, and Durability

The idea is that by giving up ACID constraints, one can achieve much higher performance and scalability.

However, the systems differ in how much they give up. For example, most of the systems call themselves “eventually consistent”, meaning that updates are eventually propagated to all nodes, but many of them provide mechanisms for some degree of consistency, such as multi-version concurrency control (MVCC).

Proponents of NoSQL often cite Eric Brewer's CAP theorem [4], which states that a system can have only two out of three of the following properties: consistency, availability, and partition-tolerance. The NoSQL systems generally give up consistency. However, the trade-offs are complex, as we will see.

New relational DBMSs have also been introduced to provide better horizontal scaling for OLTP, when compared to traditional RDBMSs. After examining the NoSQL systems, we will look at these SQL systems and compare the strengths of the approaches. The SQL systems strive to provide horizontal scalability without abandoning SQL and ACID transactions.
We will discuss the trade-offs here.

In this paper, we will refer to both the new SQL and NoSQL systems as data stores, since the term “database system” is widely used to refer to traditional DBMSs. However, we will still use the term “database” to refer to the stored data in these systems. All of the data stores have some administrative unit that you would call a database: data may be stored in one file, or in a directory, or via some other mechanism that defines the scope of data used by a group of applications. Each database is an island unto itself, even if the database is partitioned and distributed over multiple machines: there is no “federated database” concept in these systems (as with some relational and object-oriented databases), allowing multiple separately-administered databases to appear as one. [Translator's note (Yol): that is, these systems do not let multiple separately administered databases appear as one.] Most of the systems allow horizontal partitioning of data, storing records on different servers according to some key; this is called “sharding”. Some of the systems also allow vertical partitioning, where parts of a single record are stored on different servers.

1.1 Scope of this Paper

Before proceeding, some clarification is needed in defining “horizontal scalability” and “simple operations”.
These define the focus of this paper.

By “simple operations”, we refer to key lookups, reads and writes of one record or a small number of records. This is in contrast to complex queries or joins, read-mostly access, or other application loads. With the advent of the web, especially Web 2.0 sites where millions of users may both read and write data, scalability for simple database operations has become more important. For example, applications may search and update multi-server databases of electronic mail, personal profiles, web postings, wikis, customer records, online dating records, classified ads, and many other kinds of data. These all generally fit the definition of “simple operation” applications: reading or writing a small number of related records in each operation.

The term “horizontal scalability” means the ability to distribute both the data and the load of these simple operations over many servers, with no RAM or disk shared among the servers. Horizontal scaling differs from “vertical” scaling, where a database system utilizes many cores and/or CPUs that share RAM and disks. Some of the systems we describe provide both vertical and horizontal scalability, and the effective use of multiple cores is important, but our main focus is on horizontal scalability, because the number of cores that can share memory is limited, and horizontal scaling generally proves less expensive, using commodity servers.
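
Since partitioning and scaling are easy to confuse, a small Python sketch of the two kinds of partitioning defined in the overview may help. Horizontal partitioning (sharding) places whole records on different servers according to a key, while vertical partitioning splits the attributes of one record across servers. The function names, the `id` field, and the modulo rule are hypothetical choices for this example and are not drawn from the paper.

```python
def shard_for(record_id, shard_count):
    """Pick a shard for a whole record from its integer key (illustrative rule)."""
    return record_id % shard_count


def partition_horizontally(records, shard_count):
    """Sharding: each complete record goes to exactly one shard."""
    shards = [[] for _ in range(shard_count)]
    for record in records:
        shards[shard_for(record["id"], shard_count)].append(record)
    return shards


def partition_vertically(record, attribute_groups):
    """Split one record into pieces, one per attribute group.

    Every piece keeps the "id" so the record can be reassembled later.
    """
    pieces = []
    for group in attribute_groups:
        piece = {"id": record["id"]}
        for attribute in group:
            if attribute in record:
                piece[attribute] = record[attribute]
        pieces.append(piece)
    return pieces


records = [
    {"id": 1, "name": "Ann", "bio": "likes graphs"},
    {"id": 2, "name": "Bob", "bio": "likes tables"},
    {"id": 3, "name": "Cy", "bio": "likes documents"},
]
print(partition_horizontally(records, 2))
print(partition_vertically(records[0], [["name"], ["bio"]]))
```

In both cases the pieces can live on servers that share no RAM or disk, which is why either form of partitioning can serve horizontal scaling.
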
Note that horizontal and vertical partitioning are not related to horizontal and vertical scaling, except that they are both useful for horizontal scaling.

1.2 Systems Beyond our Scope

Some authors have used a broad definition of NoSQL, including any database system that is not relational. Specifically, they include:

• Graph database systems: Neo4j and OrientDB provide efficient distributed storage and queries of a graph of nodes with references among them.
• Object-oriented database systems: Object-oriented DBMSs (e.g., Versant) also provide efficient distributed storage of a graph of objects, and materialize these objects as programming language objects.
• Distributed object-oriented stores: Very similar to object-oriented DBMSs, systems such as GemFire distribute object graphs in-memory on multiple servers.

These systems are a good choice for applications that must do fast and extensive reference-following, especially where data fits in memory. Programming language integration is also valuable. Unlike the NoSQL systems, these systems generally provide ACID transactions. Many of them provide horizontal scaling for reference-following and distributed query decomposition, as well. Due to space limitations, however, we have omitted these systems from our comparisons.
The applications and the necessary optimizations for scaling for these systems differ from the systems we cover here, where key lookups and simple operations predominate over reference-following and complex object behavior. It is possible these systems can scale on simple operations as well, but that is a topic for a future paper, and proof through benchmarks.

Data warehousing database systems provide horizontal scaling, but are also beyond the scope of this paper. Data warehousing applications are different in important ways:

• They perform complex queries that collect and join information from many different tables.
• The ratio of reads to writes is high: that is, the database is read-only or read-mostly.

There are existing systems for data warehousing that scale well horizontally. Because the data is infrequently updated, it is possible to organize or replicate the database in ways that make scaling possible.

1.3 Data Model Terminology

Unlike relational (SQL) DBMSs, the terminology used by NoSQL data stores is often inconsistent. For the purposes of this paper, we need a consistent way to compare the data models and functionality.

All of the systems described here provide a way to store scalar values, like numbers and strings, as well as BLOBs. Some of them also provide a way to store more complex nested or reference values.
The systems all store sets of attribute-value pairs, but use different data structures, specifically:

• A “tuple” is a row in a relational table, where attribute names are pre-defined in a schema, and the values must be scalar. The values are referenced by attribute name, as opposed to an array or list, where they are referenced by ordinal position.
• A “document” allows values to be nested documents or lists as well as scalar values, and the attribute names are dynamically defined for each document at runtime. A document differs from a tuple in that the attributes are not defined in a global schema, and this wider ra
