
Greenplum Database Development Basics
October 2014

Contents
- Greenplum overview and data distribution
- Data loading and external tables
- Client tools
- Greenplum database basics
- Greenplum and Oracle
- Optimization strategies
- Other key points and examples

MPP architecture

MPP (Massively Parallel Processing) shared-nothing architecture

Advantages of the MPP shared-nothing architecture
- Data is distributed across all parallel nodes, and each node processes only its own portion.
- Optimized I/O: all nodes work in parallel and share nothing with each other, so there is no I/O contention.
- Automatic parallelism: internal processing is parallelized automatically, with no manual partitioning or tuning. Loading and access work as in an ordinary database.
- The most easily scaled architecture, presented as the best choice for BI and data analysis: adding nodes increases storage, query, and load performance linearly.

Greenplum basic architecture

[Figure: MPP shared-nothing architecture. Master servers (query planning and dispatch) connect through the network interconnect to segment servers (query processing and data storage). Access is through SQL and MapReduce; external sources feed loading, streaming, etc.]

Parallel processing characteristics of Greenplum
- Parallel processing is automatic. All data is distributed evenly across the nodes and each node computes on its own portion, so no manual intervention is needed.
- No complex tuning is required: load the data and query it. The DBA workload for tuning and maintenance is minimal.
- Scalability: scales linearly to 10,000 nodes; each added node increases query and load performance linearly.
- Client access and third-party tools: standard database interfaces such as SQL, ODBC, JDBC, and OLE DB are fully supported, as is a wide range of BI and ETL tools.

Greenplum system architecture

Client interfaces and programs: psql, pgAdmin III, ODBC/DataDirect, JDBC, Perl DBI, Python, libpq, OLE DB

Master host
- Entry point to the system
- Establishes and manages client connections
- Parses SQL and builds the execution plan
- Dispatches the execution plan to the segments
- Collects the execution results from the segments
- Coordinates the overall processing
- Stores no business data, only the system catalog tables and metadata (the data dictionary)

Segment
- Each segment stores a portion of the user data
- A system can have many segments
- Users cannot access segments directly
- All access to segments goes through the master
- Executes the user's SQL queries

Interconnect
- The connection layer between the Greenplum database instances
- Handles inter-process coordination and management
- Based on Gigabit Ethernet
- Configured as a private network internal to the system
- Supports two protocols: TCP or UDP

Greenplum high-availability architecture

Master/standby mirroring
- The standby keeps its catalog and transaction logs synchronized with the master in real time.
- The standby provides the master service when the primary master fails.

Data redundancy: segment mirroring
- Each segment's data is stored redundantly on another segment and synchronized in real time.
- When a primary segment fails, its mirror segment takes over automatically.
- After the primary segment is back to normal, run gprecoverseg -F to resynchronize the data.

Table distribution policies: the basis of parallel computing

Hash distribution

    CREATE TABLE … DISTRIBUTED BY (column [, …])

Rows with the same key value are always placed on the same segment.

Round-robin distribution

    CREATE TABLE … DISTRIBUTED RANDOMLY

Rows with the same value are not necessarily placed on the same segment.

Distribution keys
- One or more columns used to spread the data evenly across the segments.
- Using the table's primary key as the distribution key gives an even distribution.
- The DISTRIBUTED BY clause of CREATE TABLE defines the distribution key:

    CREATE TABLE sales
        (dt date, prc float, qty int, cust_id int,
         prod_id int, vend_id int)
    DISTRIBUTED BY (dt, cust_id, prod_id);

- If the table has no primary key, or no column is a suitable distribution key, use random distribution (DISTRIBUTED RANDOMLY).
- If no distribution key is defined explicitly, the system uses the first column of the table.

Distributed storage: even data distribution is the key to parallel processing

[Figure: an Orders table (Order #, Order Date, Customer ID) whose rows for Oct 20 2005 are spread across the segments.]
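The difference between the two distribution policies can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Greenplum's implementation: SHA-256 stands in for the database's internal hash function, the cluster size NUM_SEGMENTS is made up, and all function names are hypothetical.

```python
import hashlib
from itertools import cycle

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute_by_hash(rows, key_columns, num_segments=NUM_SEGMENTS):
    """Model of DISTRIBUTED BY (key_columns): equal keys land on one segment."""
    placement = {seg: [] for seg in range(num_segments)}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        placement[segment_for(key, num_segments)].append(row)
    return placement


def distribute_round_robin(rows, num_segments=NUM_SEGMENTS):
    """Model of DISTRIBUTED RANDOMLY: rows are dealt out in turn."""
    placement = {seg: [] for seg in range(num_segments)}
    for seg, row in zip(cycle(range(num_segments)), rows):
        placement[seg].append(row)
    return placement


def segments_holding(placement, column, value):
    """Return the set of segments that store at least one row with this value."""
    return {seg for seg, seg_rows in placement.items()
            if any(r[column] == value for r in seg_rows)}


orders = [{"order_id": i, "cust_id": i % 10} for i in range(1000)]
hashed = distribute_by_hash(orders, ["cust_id"])
spread = distribute_round_robin(orders)
```

In this model every cust_id lives on exactly one segment under hash distribution, which is what lets a join or GROUP BY on the key run locally. Round-robin placement gives perfectly even row counts but scatters each customer's rows over several segments.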

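Strategy: distribute the data as evenly as possible across all nodes.

Query execution

SQL query processing: parallel query plan

    SELECT customer, amount
    FROM sales JOIN customer USING (cust_id)
    WHERE date = '2008-04-30';

Compressed storage and row/column storage

Compressed storage
- ZLIB and QUICKLZ compression are supported; compression ratios can reach 10:1.
- Only append-only tables can be compressed.
- Compression does not necessarily reduce performance: a compressed table uses more CPU but less I/O.
- Syntax:

    CREATE TABLE foo (a int, b text)
    WITH (appendonly=true, compresstype=zlib, compresslevel=5);

Row or column storage
- Column orientation is currently supported only for append-only tables.
- If typical queries read only a few columns of the table, column orientation is more efficient; if they read most of the columns, row orientation is more efficient.
- Syntax:

    CREATE TABLE sales2 (LIKE sales) WITH (appendonly=true, orientation=column);

Locks: stopping an active SQL query

Find the process ID of the SQL query to stop: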

Run select * from pg_stat_activity to see the client IP address, user name, submitted query, and so on for each current database connection. (Alternatively, look at the processes on the master host, which creates one process per client connection: ps -ef | grep -i postgres | grep -i con)

To check whether a query is waiting on a lock:

    select procpid, t.*
    from pg_stat_activity t
    where usename = 'lds_betl'
      and datname = 'ldsdb'
      and waiting = 't';

To stop the SQL, run

    select pg_cancel_backend(procpid);

or

    select pg_terminate_backend(procpid);

or, from the operating system on the master:

    $ kill procpid

Note: in the extreme case where kill cannot stop the SQL, use kill -11 to stop the process. Never use kill -9: it crashes the database.

Do not use kill on production systems.

Table partitioning concepts
- A large table is divided logically into several parts. Queries that filter on the partitioning condition scan less data, which improves performance.
- Partitioning speeds up queries on particular kinds of data and makes the database easier to maintain and update.
- Two types:
  - Range partitioning (date ranges or numeric ranges), for example by date or price
  - List partitioning, for example by region or product
- In Greenplum a partitioned table inherits from its parent table, and CHECK constraints identify the matching child table.
- The child tables of a partitioned table are still distributed across the segments according to the distribution policy.
- Partitioning is a very useful optimization. For example, when a year of transactions is partitioned by transaction date, a query for a single day scans about 1/365 of the data, so it can run up to 365 times faster.

[Figure: Data Distribution & Partitioning. Monthly partitions Jan 2005 to Dec 2005 are spread across segments 1A-1D, 2A-2D, and 3A-3D. The data of each partition is distributed evenly across the nodes; partitioning narrows the range of data searched and improves query performance.]

[Figure: Full Table Scan vs. Partition Pruning, comparing hash distribution alone with hash distribution plus table partitioning across segments 1A-3D for the query
    SELECT COUNT(*) FROM orders WHERE order_date >= 'Oct 20 2005' AND order_date < 'Oct 27 2005']

Range partitioning

    CREATE TABLE orders
    ( order_id      BIGINT,
      order_date    TIMESTAMP,
      order_mode    VARCHAR(8),
      customer_id   NUMERIC(6),
      order_status  NUMERIC(2),
      order_total   NUMERIC(8,2),
      sales_rep_id  NUMERIC(6),
      promotion_id  NUMERIC(6) )
    DISTRIBUTED BY (customer_id)
    PARTITION BY RANGE (order_date)
    ( START ('2005-12-01') END ('2007-12-01') EVERY (interval '1 year'),
      START ('2007-12-01') END ('2008-12-01') EVERY (interval '1 month'),
      START ('2008-12-01') END ('2008-12-08') EVERY (interval '1 day'),
      START ('2008-12-08') END ('2008-12-09') EVERY (interval '1 hour') );

List partitioning

    CREATE TABLE rank (
        id int,
        rank int,
        year int,
        gender char(1),
        count int )
    DISTRIBUTED BY (id)
    PARTITION BY LIST (gender)
    ( PARTITION girls VALUES ('F'),
      PARTITION boys VALUES ('M'),
      DEFAULT PARTITION other );

Multi-level partitioning

    CREATE TABLE sales (
        trans_id int,
        date date,
        amount decimal(9,2),
        region text )
    DISTRIBUTED BY (trans_id)
    PARTITION BY RANGE (date)
    SUBPARTITION BY LIST (region)
    SUBPARTITION TEMPLATE
    ( SUBPARTITION usa VALUES ('usa'),
      SUBPARTITION asia VALUES ('asia'),
      SUBPARTITION europe VALUES ('europe') )
    ( START (date '2008-01-01') INCLUSIVE
      END (date '2009-01-01') EXCLUSIVE
      EVERY (INTERVAL '1 month'),
      DEFAULT PARTITION outlying_dates );

Modifying table partitions

    ALTER TABLE …
        ALTER PARTITION …
      | DROP PARTITION …
      | TRUNCATE PARTITION …
      | RENAME PARTITION …
      | ADD PARTITION …
      | EXCHANGE PARTITION …
      | SPLIT PARTITION …
      | SET SUBPARTITION TEMPLATE

Example:

    ALTER TABLE foo EXCHANGE PARTITION FOR (RANK(1)) WITH TABLE bar;

Query the catalog:

    SELECT partitiontablename, partitionlevel, partitionrank,
           partitionrangestart, partitionrangeend, partitioneveryclause
    FROM pg_partitions;

Data loading and external tables

Loading with external tables

Characteristics of external tables
- Read-only
- The data is stored outside the database
- SELECT, JOIN, SORT, and similar commands work on them much as on regular tables

Advantages of external tables
- Parallel loading
- Flexibility for ETL
- Tolerant handling of badly formatted rows
- Support for multiple data sources

Two kinds
- External tables: file-based
- Web tables: URL-based or command-based

High-speed data loading with external tables
- Using its parallel dataflow engine, Greenplum can operate on external tables directly with SQL.
- Data loading is fully parallel.

[Figure: an ETL server on the internal network runs two gpfdist processes that serve external table files; the master host and the segment hosts reach them through the interconnect (a Gigabit Ethernet switch).]

Characteristics of external table loading
- Parallel data loading gives the best performance
- Can process files stored remotely
- Uses the HTTP protocol
- Each gpfdist instance can deliver up to 200 MB/s

gpfdist: the file distribution daemon

Start it with:

    gpfdist -d /var/load_files/expenses -p 8080 -l /home/gpadmin/log &

External table definition:

    CREATE EXTERNAL TABLE ext_expenses
        (name text, date date,
         amount float4, description text)
    LOCATION ('gpfdist://etlhost:8081/*', 'gpfdist://etlhost:8082/*')
    FORMAT 'TEXT' (DELIMITER '|')
    ENCODING 'UTF-8'
    LOG ERRORS INTO ext_expenses_loaderrors
    SEGMENT REJECT LIMIT 10000 ROWS;

A gpfdist instance must be listening on each host and port named in LOCATION.

Error handling for external table loads

Good rows are loaded and badly formatted rows are captured, for example:
- Rows with missing attributes
- Attributes of the wrong data type
- Invalid character set encoding

Constraint violations (PRIMARY KEY, NOT NULL, CHECK or UNIQUE constraints) are not handled by this mechanism.

Optional error-handling clause for external tables:

    [LOG ERRORS INTO error_table] SEGMENT REJECT LIMIT count [ROWS | PERCENT]
    (PERCENT is based on the gp_reject_percent_threshold parameter)

Example:

    CREATE EXTERNAL TABLE ext_customer
        (id int, name text, sponsor text)
    LOCATION ('gpfdist://filehost:8081/*.txt')
    FORMAT 'TEXT' (DELIMITER '|' NULL '')
    LOG ERRORS INTO err_customer
    SEGMENT REJECT LIMIT 5 ROWS;
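The reject-limit behaviour can be illustrated with a small Python sketch. It models one segment's view of the ext_customer example (id int, name text, sponsor text) and assumes that the load is aborted as soon as the number of rejected rows reaches the limit. The function and exception names are hypothetical, and the tuple stored per bad row only loosely mirrors what an error table would hold.

```python
class RejectLimitReached(Exception):
    """Raised when too many badly formatted rows have been seen."""


def load_rows(lines, reject_limit, delimiter="|"):
    """Parse (id int, name text, sponsor text) rows, isolating format errors.

    Returns (good_rows, error_rows). Raises RejectLimitReached once the
    number of rejected rows reaches reject_limit, so nothing is returned.
    """
    good, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        try:
            if len(fields) != 3:
                raise ValueError("expected 3 columns, got %d" % len(fields))
            good.append((int(fields[0]), fields[1], fields[2]))
        except ValueError as exc:
            # Comparable to a row written to the error table.
            errors.append((lineno, line, str(exc)))
            if len(errors) >= reject_limit:
                raise RejectLimitReached(
                    "%d rows rejected; load aborted" % len(errors))
    return good, errors


sample = ["1|Ann|Acme", "x|Bob|Initech", "3|Cy", "4|Di|Globex"]
good_rows, error_rows = load_rows(sample, reject_limit=5)
```

With a limit of 5 the two bad lines (a non-numeric id and a missing column) are set aside and the two good rows are loaded; calling load_rows(sample, reject_limit=2) raises RejectLimitReached instead.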

COPY SQL command
- A PostgreSQL command
- Supports both loading and unloading data
- Optimized for loading large numbers of rows
- Loads all rows serially (not in parallel)
- Reads the data to load from a file or from standard input
- Supports the same error handling as external tables

Examples:

    -- the file is on the master host
    COPY mytable FROM '/data/myfile.csv' WITH CSV HEADER;

    -- the file is on the local client (psql meta-command)
    \copy mytable FROM '/data/myfile.csv' WITH CSV HEADER;

    COPY country FROM '/data/gpdb/country_data'
    WITH DELIMITER '|' LOG ERRORS INTO err_country
    SEGMENT REJECT LIMIT 10 ROWS;

Data loading performance tips
- Drop indexes before loading and rebuild them afterwards.
- Run ANALYZE after loading.
- Run VACUUM after load errors and after DELETE/UPDATE operations.
- Do not load large amounts of data with ODBC INSERT statements.

Client tools
- pgAdmin3: graphical tool for administration and for running, analyzing, and monitoring SQL
- psql: command-line tool for operation and administration

pgAdmin3 for GPDB
- pgAdmin3 is a major open-source graphical administration and development tool for PostgreSQL.
- Monitors active sessions, equivalent to the SQL: select * from pg_stat_activity;
- Monitors locks, using information from pg_locks.
- Can stop running SQL.

psql
- Connects through the master.
- Connection options:
  - database name (-d | PGDATABASE)
  - master host name (-h | PGHOST)
  - master port (-p | PGPORT)
  - user name (-U | PGUSER)

Connecting to GPDB:

    psql -h db_ip -p port -U dbusr -v ON_ERROR_STOP=1 -d database

    e.g. psql -h 1 -p 5432 -U gpadmin -v ON_ERROR_STOP=1 -d sordb

Common psql commands
- \? (help on psql meta-commands)
- \h (help on SQL command syntax)
- \dt (show tables)
- \dtS (show system tables)
- \dg or \du (show roles)
- \l (show databases)
- \c db_name (connect to this database)
- \q (quit psql)
- \! (enter shell mode)
- \df (show functions)
- \dn (show schemas)
- SET search_path = …
- \timing

Greenplum database basics

Databases
- To create: CREATE DATABASE or createdb
- To drop: DROP DATABASE or dropdb
- To edit: ALTER DATABASE

  - Change the name
  - Assign a new owner
  - Set configuration parameters

psql tips
- psql shows the connected database in its prompt, for example:
    template1=#   (superuser)
    names=>       (non-superuser)
- To show a list of all databases: \l
- To connect to another database: \c db_name
- Use the PGDATABASE environment variable to set the default database.

Schemas
- To create: CREATE SCHEMA
- To drop: DROP SCHEMA
- To edit: ALTER SCHEMA
  - Change the name
  - Assign a new owner

psql tips
- To see the current schema: SELECT current_schema();
- To see a list of all schemas in the database: \dn
- To see the schema search path: SHOW search_path;
- To set the search path for a database:
    ALTER DATABASE db_name SET search_path TO myschema, public, pg_catalog;

Tables
- To create: CREATE TABLE
  - Additional DISTRIBUTED BY or DISTRIBUTED RANDOMLY clause
  - Some syntax is not supported
- To edit: ALTER TABLE
  - Cannot alter distribution key columns
- To drop: DROP TABLE

psql tips
- To list tables in the database: \dt
- To see the structure of a table: \d+ table_name
- To list system catalog tables: \dtS
- To list external tables only: \dx
- To see the distribution key columns of a table: \d table_name

Table and column constraints
- CHECK table or column constraints
- NOT NULL column constraints
- UNIQUE column constraints
  - One allowed per table
  - Unique columns must also be in the distribution key
  - Not allowed if the table also has a primary key
- PRIMARY KEY table constraints
  - Used as the distribution key by default
- FOREIGN KEY constraint definitions are supported but not enforced. Foreign key relationships are used by the query planner to improve query plans.

Views

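View SQL commands: CREATE VIEW, DROP VIEW

psql tips
- To list all views while in psql: \dv
- To see a view definition: \d+ view_name

Example:

    CREATE VIEW topten
    AS SELECT name, rank, gender, year
    FROM names, rank
    WHERE rank < '11' AND names.id = rank.id;

    SELECT * FROM topten ORDER BY year, rank;

Indexes
- Create indexes with care in Greenplum Database.
- An index does not necessarily speed up a query; test whether it really improves performance.
- Drop indexes that are not used.
- Primary key indexes are created automatically.
- Unique indexes can be created only on distribution key columns.

Indexes (continued)
- Index types: B-tree, Bitmap
- Index-related SQL commands: CREATE INDEX, ALTER INDEX, DROP INDEX, REINDEX
- psql tips:
  - To show all indexes: \di
  - To show an index definition: \d+ index_name
- For bulk ETL processing it is best not to create indexes; they do little for performance.
- B-tree suits queries that filter on a single column and return a small amount of data.
- Bitmap suits queries that filter on several columns and return a large amount of data.

Other database objects
- Functions and operators
- Sequences
- Triggers
- Tablespaces

Data types

Common data types
- CHAR, VARCHAR, TEXT
- smallint, integer, bigint
- numeric, real, double precision
- timestamp, date, time
- boolean
- Array types, such as integer[]

For other data types, see: [link not preserved]

Common system tables and views
- All system tables are in the pg_catalog schema.
- Standard PostgreSQL system tables (pg_*)
- Commonly used: pg_stat_activity, pg_tables, pg_class, pg_attribute, pg_namespace
- To show all system tables in psql: \dtS
- To show all system views in psql: \dvS

For other catalogs, see: [link not preserved]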

Functions

Date functions
- extract(day | month | year … FROM date)
- SELECT date + '1 day'::interval, date + '1 month'::interval
- SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40');    Result: 16
- SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');  Result: 2001-02-16 20:00:00
- pg_sleep(seconds);

System date and time values
- current_date, current_time, current_timestamp, now()
- timeofday()

The value of timeofday() changes within a transaction; the other functions above stay constant for the duration of a transaction.

String functions
- substr / length / lower / upper / trim / replace / position
- rpad / lpad
- to_char, || (string concatenation)
- substring
- LIKE, SIMILAR TO (pattern matching)

Miscellaneous
- CASE … WHEN / COALESCE
- nullif
- generate_series
- IN / NOT IN / EXISTS / ANY / ALL

Built-in functions (SELECT)

    Function            Description                                                      Example
    CURRENT_DATE        Returns the current system date                                  2006-11-06
    CURRENT_TIME        Returns the current system time                                  16:50:54
    CURRENT_TIMESTAMP   Returns the current system date and time                         2008-01-06 16:51:44.430000+00:00
    LOCALTIME           Returns the current system time with time zone adjustment        19:50:54
    LOCALTIMESTAMP      Returns the current system date and time with time zone          2008-01-06 19:51:44.430000+00:00
                        adjustment
    CURRENT_ROLE        Returns the current database user                                jdoe

Mathematical functions

    Operator    Returns   Description                        Example           Result
    + - * /     same      Add, subtract, multiply, divide    1 + 1             2
    %           Integer   Modulo                             10 % 2            0
    ^           same      Exponentiation                     2 ^ 2             4
    |/          Numeric   Square root                        |/ 9              3
    ||/         Numeric   Cube root                          ||/ 8             2
    !           Numeric   Factorial                          3 !               6
    & | # ~     Numeric   Bitwise AND, OR, XOR, NOT          91 & 15           11
    << >>       Numeric   Bitwise shift left, right          1 << 4, 8 >> 2    16, 2

Mathematical functions (continued)

    Function           Returns   Description                                         Example         Result
    abs                same      Absolute value                                      abs(-998.2)     998.2
    ceiling(numeric)   Numeric   Returns smallest integer not less than argument     ceiling(48.2)   49
    floor(numeric)     Numeric   Returns largest integer not greater than argument   floor(48.2)     48
    pi()               Numeric   The π constant                                      pi()            3.14159…
    random()           Numeric   Random value between 0.0 and 1.0                    random()        .87663
    round()            Numeric   Round to nearest integer                            round(22.7)     23

String functions

    Function                                     Returns   Description                                 Example                                Result
    string || string                             Text      String concatenation                        'my' || 'my'                           'mymy'
    char_length(string)                          Integer   Number of characters in string              char_length('mymy')                    4
    position(string in string)                   Integer   Location of specified substring             position('my' in 'ohmy')               3
    lower(string)                                Text      Converts to lower case                      lower('MYMY')                          'mymy'
    upper(string)                                Text      Converts to upper case                      upper('mymy')                          'MYMY'
    substring(string from n for n)               Text      Displays portion of string                  substring('myohmy' from 3 for 2)       'oh'
    trim(both, leading, trailing from string)    Text      Removes leading and/or trailing characters  trim('  mymy  ')                       'mymy'

String functions (continued)

    Function                                    Returns   Description                  Example                                Result
    initcap(string)                             Text      Capitalizes each word        initcap('my my')                       'My My'
    length(string)                              Integer   Returns string length        length('mymy')                         4
    split_part(string, delimiter, occurrence)   Text      Separates a delimited list   split_part('one|two|three', '|', 2)    'two'

Date functions

    Function                         Returns            Description                                            Example                                               Result
    age(timestamp, timestamp)        Interval           Difference in years, months and days                   age(current_timestamp, timestamp '2008-08-12')        0 years 1 month 11 days
    extract(field from timestamp)    Double precision   Returns year, month, day, hour, minute or second       extract(day from current_date)                        11
    now()                            Timestamp          Returns current date and time                          now()                                                 2008-09-22 11:00:01
    overlaps                         Boolean            Simplifies comparing date ranges                       WHERE (date '2008-01-01', date '2008-02-11')          TRUE
                                                                                                               overlaps (date '2008-02-01', date '2008-09-11')

Stored procedures

Greenplum supports building functions in SQL, Python, Perl, and C. This part concentrates on SQL stored procedures written in PL/pgSQL. A stored procedure runs as a single transaction, and calls to sub-procedures run inside that same transaction.

Structure of a stored procedure:

    CREATE FUNCTION somefunc() RETURNS integer AS $$
    DECLARE
        quantity integer := 30;
    BEGIN
        RETURN quantity;
    END;
    $$ LANGUAGE plpgsql;

Assignment
- To assign a value to a variable or to a row/record: identifier := expression
- Example: user_id := 20;

Executing a query that returns no result
- PERFORM query;
- Example: PERFORM create_mv('cs_session_page_requests_mv', my_query);

For more on stored procedures, see: [link not preserved]

Dynamic SQL
- EXECUTE command-string [INTO [STRICT] target];

SELECT INTO
- Example: SELECT ID INTO VAR_ID FROM TABLEA

Obtaining the result status
- GET DIAGNOSTICS variable = item [, ...];
- Example: GET DIAGNOSTICS integer_var = ROW_COUNT;

SQL status variables
- SQLERRM (the SQL error message)
- SQLSTATE (the status code returned by the SQL execution)

Control structures
- IF ... THEN ... ELSEIF ... THEN ... ELSE
- LOOP, EXIT, CONTINUE, WHILE, FOR

Returning from a function
- Two commands return data from a function: RETURN and RETURN NEXT.
- Syntax: RETURN expression;

Setting callbacks (embedded SQL syntax)
- EXEC SQL WHENEVER condition action;
- condition can be one of: SQLERROR, SQLWARNING, NOT FOUND

Exception handling in stored procedures

Ignoring a specific error:

    EXCEPTION
        WHEN unique_violation THEN
            -- do nothing
    END;

Catching any other error, reporting it, and raising it again:

    EXCEPTION
        WHEN OTHERS THEN
            RAISE NOTICE 'an EXCEPTION is about to be raised';
            RAISE EXCEPTION 'NUM:%, DETAILS:%', SQLSTATE, SQLERRM;
    END;

Errors and messages

    RAISE level 'format' [, expression [, ...]];

Levels:
- INFO: informational output
- NOTICE: a notice message
- EXCEPTION: raises an exception and exits the stored procedure

Example: RAISE NOTICE 'Calling cs_create_job(%)', v_job_id;

OVER (PARTITION BY …) example

    SELECT *, row_number() OVER () FROM sale ORDER BY cn;

     row_number | cn | vn | pn  |     dt     | qty  | prc
    ------------+----+----+-----+------------+------+------
              1 |  1 | 10 | 200 | 1401-03-01 |    1 |    0
              2 |  1 | 30 | 300 | 1401-05-02 |    1 |    0
              3 |  1 | 50 | 400 | 1401-06-01 |    1 |    0
              4 |  1 | 30 | 500 | 1401-06-01 |   12 |    5
              5 |  1 | 20 | 100 | 1401-05-01 |    1 |    0
              6 |  2 | 50 | 400 | 1401-06-01 |    1 |    0
              7 |  2 | 40 | 100 | 1401-01-01 | 1100 | 2400
              8 |  3 | 40 | 200 | 1401-04-01 |    1 |    0
    (8 rows)

    SELECT *, row_number() OVER (PARTITION BY cn) FROM sale ORDER BY cn;

     row_number | cn | vn | pn  |     dt     | qty  | prc
    ------------+----+----+-----+------------+------+------
              1 |  1 | 10 | 200 | 1401-03-01 |    1 |    0
              2 |  1 | 30 | 300 | 1401-05-02 |    1 |    0
              3 |  1 | 50 | 400 | 1401-06-01 |    1 |    0
              4 |  1 | 30 | 500 | 1401-06-01 |   12 |    5
              5 |  1 | 20 | 100 | 1401-05-01 |    1 |    0
              1 |  2 | 50 | 400 | 1401-06-01 |    1 |    0
              2 |  2 | 40 | 100 | 1401-01-01 | 1100 | 2400
              1 |  3 | 40 | 200 | 1401-04-01 |    1 |    0
    (8 rows)

OVER (ORDER BY …) example

    SELECT vn, sum(prc*qty) FROM sale GROUP BY vn ORDER BY 2 DESC;

     vn |   sum
    ----+---------
     40 | 2640002
     30 |     180
     50 |       0
     20 |       0
     10 |       0
    (5 rows)

    SELECT vn, sum(prc*qty), rank() OVER (ORDER BY sum(prc*qty) DESC)
    FROM sale GROUP BY vn ORDER BY 2 DESC;

     vn |   sum   | rank
    ----+---------+------
     40 | 2640002 |    1
     30 |     180 |    2
     50 |       0 |    3
     20 |       0 |    3
     10 |       0 |    3
    (5 rows)

Transactions
- A transaction bundles several statements into one all-or-nothing operation.
- Transaction commands:
  - BEGIN or START TRANSACTION
  - END or COMMIT
  - ROLLBACK
  - SAVEPOINT and ROLLBACK TO SAVEPOINT
- To set autocommit mode in psql: \set AUTOCOMMIT on|off
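The all-or-nothing behaviour and the effect of a savepoint can be tried without a cluster. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine: SQLite accepts the same BEGIN, COMMIT, ROLLBACK, SAVEPOINT, and ROLLBACK TO SAVEPOINT statements, but it is not Greenplum, and the account table is invented for the illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below are the only
# transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

# All-or-nothing: both updates are undone together by ROLLBACK.
cur.execute("BEGIN")
cur.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
cur.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
cur.execute("ROLLBACK")
after_rollback = cur.execute(
    "SELECT balance FROM account ORDER BY id").fetchall()

# Savepoint: undo only the mistaken statement, keep the earlier one.
cur.execute("BEGIN")
cur.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
cur.execute("SAVEPOINT before_credit")
cur.execute("UPDATE account SET balance = balance + 999 WHERE id = 2")
cur.execute("ROLLBACK TO SAVEPOINT before_credit")
cur.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
cur.execute("COMMIT")
after_commit = cur.execute(
    "SELECT balance FROM account ORDER BY id").fetchall()
```

After the first block the balances are unchanged; after the second, only the debit and the corrected credit have been committed.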

Greenplum and Oracle

Data types

    Category      Oracle                                 Greenplum
    Numeric       NUMBER(p,s)                            SMALLINT (2 bytes)
                                                         INTEGER (4 bytes)
                                                         BIGINT (8 bytes)
                                                         DECIMAL(p,s) (11+p/2 bytes)
                                                         NUMERIC(p,s) (11+p/2 bytes)
                                                         REAL (4 bytes)
                                                         DOUBLE PRECISION (8 bytes)
    Character     CHAR and NCHAR                         CHAR
                  VARCHAR2 and NVARCHAR2                 VARCHAR
    Date & time   DATE (includes time to the second)     DATE or TIMESTAMP without time zone
                  TIMESTAMP                              TIMESTAMP
                  INTERVAL                               INTERVAL / TIME
    Binary        BFILE (> 1 GB)                         Large objects (up to 2 GB)
                  RAW, BFILE (< 1 GB)                    BYTEA
                  CLOB and NCLOB                         TEXT

Greenplum compared with Oracle (1)

DUAL
  Oracle:
    SELECT 1+1 FROM DUAL
  Greenplum:
    SELECT 1+1;
    or
    CREATE VIEW dual AS SELECT 'X'::VARCHAR(1) AS DUMMY;
    SELECT 1+1 FROM dual;

NEXTVAL
  Oracle:
    SELECT A_TABLE_SEQUENCE.NEXTVAL FROM DUAL
  Greenplum:
    SELECT NEXTVAL('A_TABLE_SEQUENCE') FROM DUAL

ROWNUM
  Oracle:
    SELECT * FROM AGE_TYPE WHERE ROWNUM <= 5
  Greenplum:
    SELECT * FROM AGE_TYPE
    LIMIT 5 OFFSET 0

  Oracle:
    SELECT * FROM AGE_TYPE WHERE CODE IS NOT NULL
    AND ROWNUM <= 5
    ORDER BY CODE DESC
  Greenplum:
    SELECT * FROM AGE_TYPE
    WHERE CODE IS NOT NULL
    ORDER BY CODE DESC
    LIMIT 5 OFFSET 0

AS
  Oracle:
    SELECT A.COL1 A_COL1,
           A.COL2 A_COL2
    FROM A_TABLE A
  Greenplum:
    SELECT A.COL1 AS A_COL1,
           A.COL2 AS A_COL2
    FROM A_TABLE A

Greenplum compared with Oracle (2)

(+) outer join operator
  Oracle:
    SELECT * FROM A_TABLE A, B_TABLE B
    WHERE A.ID (+) = B.ID
  Greenplum:
    SELECT * FROM A_TABLE A
    RIGHT OUTER JOIN B_TABLE B
    ON A.ID = B.ID

  Oracle:
    SELECT * FROM A_TABLE A, B_TABLE B
    WHERE A.ID (+) = B.ID
    AND A.COL1 = 'COL1_VALUE'
  Greenplum:
    SELECT * FROM A_TABLE A
    RIGHT OUTER JOIN B_TABLE B
    ON A.ID = B.ID AND A.COL1 = 'COL1_VALUE'

  Oracle:
    SELECT * FROM A_TABLE A, B_TABLE B, C_TABLE C, D_TABLE D
    WHERE A.ID = B.ID (+) AND
          A.ID = C.ID (+) AND
          A.COL1 = D.COL1
  Greenplum:
    SELECT * FROM (A_TABLE A
    LEFT OUTER JOIN B_TABLE B
    ON A.ID = B.ID)
    LEFT OUTER JOIN C_TABLE C
    ON A.ID = C.ID, D_TABLE D
    WHERE A.COL1 = D.COL1

  Oracle:
    SELECT * FROM A_TABLE A
    WHERE A.COL1 (+) = 0 AND
          A.COL2 (+) = 'A_VALUE2'
  Greenplum:
    SELECT * FROM A_TABLE A
    WHERE A.COL1 = 0 AND
          A.COL2 = 'A_VALUE2'

Greenplum compared with Oracle (3)

NVL
  Oracle:
    SELECT NVL(SUM(VALUE11), 0) FS_VALUE1,
           NVL(SUM(VALUE21), 0) FS_VALUE2
    FROM FIELD_SUM
  Greenplum:
    SELECT COALESCE(SUM(VALUE11), 0) AS FS_VALUE1,
           COALESCE(SUM(VALUE21), 0) AS FS_VALUE2
    FROM FIELD_SUM

TO_NUMBER
  Oracle:
    SELECT COL1
    FROM A_TABLE
    ORDER BY TO_NUMBER(COL1)
  Greenplum:
    SELECT COL1
    FROM A_TABLE
    ORDER BY TO_NUMBER(COL1, '999999')
    [note: '999999' is the length of COL1]

DECODE
  Oracle:
    SELECT DECODE(ENDFLAG, '1', 'A', 'B') ENDFLAG
    FROM TEST
  Greenplum:
    SELECT (CASE ENDFLAG
            WHEN '1' THEN 'A' ELSE 'B' END) AS ENDFLAG
    FROM TEST

|| (concatenation with NULL)
  Oracle:
    SELECT NULL || '-' || NULL AS VALUES1
    FROM DUAL
  Greenplum:
    SELECT COALESCE(NULL, '') || '-' || COALESCE(NULL, '') AS VALUES1
    FROM DUAL

Greenplum compared with Oracle (4)

SYSDATE
  Oracle:
    UPDATE A_TABLE
    SET ENTREDATE = SYSDATE
  Greenplum:
    UPDATE A_TABLE
    SET ENTREDATE = CURRENT_TIMESTAMP;
    or
    UPDATE A_TABLE
    SET ENTREDATE = TO_TIMESTAMP(CURRENT_TIMESTAMP, 'YYYY-MM-DD HH24:MI:SS')

  Oracle:
    SELECT TO_DATE(SYSDATE, 'YYYY-MM-DD') AS DAY
    FROM DUAL
  Greenplum:
    SELECT TO_DATE(CURRENT_DATE, 'YYYY-MM-DD') AS DAY
    FROM DUAL

Nested aggregates
  Oracle:
    SELECT ROUND(AVG(SUM(BASICCNT1))) BASICCNT
    FROM ACCESS_INFO_SUM1_V
    WHERE YEARCODE BETWEEN '200305' AND '200505'
    GROUP BY SCCODE
  Greenplum:
    SELECT ROUND(AVG(AIV.BASICCNT)) AS BASICCNT
    FROM (SELECT SUM(BASICCNT1) AS BASICCNT
          FROM ACCESS_INFO_SUM1_V
          WHERE YEARCODE BETWEEN '200305' AND '200505'
          GROUP BY sccode
         ) AIV

CEIL
  Oracle:
    SELECT CEIL(SYSDATE - TO_DATE('20051027 14:56:10', 'YYYYMMDD HH24:MI:SS')) AS DAYS
    FROM DUAL
  Greenplum:
    SELECT EXTRACT(DAY FROM (TO_TIMESTAMP(CURRENT_TIMESTAMP, 'YYYY-MM-DD-HH24-MI-SS')
           - TO_TIMESTAMP('2005-10-27 14:56:10', 'YYYY-MM-DD-HH24-MI-SS'))) + 1 AS DAYS
    FROM DUAL

Greenplum compared with Oracle (5)

Empty string (''): the same statement gives different results.

    Statement                                                   Oracle result    Greenplum result
    SELECT LENGTH('') AS VALUE1 FROM DUAL                       VALUE1 = NULL    VALUE1 = 0
    SELECT TO_DATE('', 'YYYYMMDD') AS VALUE2 FROM DUAL          VALUE2 = NULL    VALUE2 = 0001-01-01 BC
    SELECT TO_NUMBER('', 1) AS VALUE3 FROM DUAL                 VALUE3 = NULL    cannot execute
    INSERT INTO TEST (VALUE4) VALUES ('')                       VALUE4 = NULL    VALUE4 = 0
      [note: VALUE4 is a numeric type]
    INSERT INTO TEST (VALUE5) VALUES ('')                       VALUE5 = NULL    VALUE5 = '' (length 0)
      [note: VALUE5 is a character type]
    INSERT INTO TEST (VALUE6) VALUES (TO_DATE('', 'YYYYMMDD'))  VALUE6 = NULL    VALUE6 = 0001-01-01 BC
      [note: VALUE6 is a time type]

Greenplum compared with Oracle (6)

NULLIF
  Oracle:
    NULLIF is not supported
  Greenplum:
    SELECT NULLIF(VALUE1, VALUE2) AS COL1 FROM DUAL

CONCAT
  Oracle:
    CONCAT(CHAR, CHAR)
  Greenplum:
    CREATE FUNCTION CONCAT(CHAR, CHAR)
    RETURNS CHAR AS 'SELECT $1 || $2' LANGUAGE 'sql';

ADD_MONTHS
  Oracle:
    ADD_MONTHS(date, int)
  Greenplum:
    CREATE FUNCTION add_months(date, int)
    RETURNS date AS
    'SELECT ($1 + ($2::text || '' months'')::interval)::date;'
    LANGUAGE 'sql';

LAST_DAY
  Oracle:
    LAST_DAY(DATE)
  Greenplum:
    CREATE FUNCTION LAST_DAY(DATE)
    RETURNS DATE AS
    'SELECT date(substr(text($1 + interval ''1 month''), 1, 7) || ''-01'') - 1'
    LANGUAGE 'sql';

MONTHS_BETWEEN
  Oracle:
    MONTHS_BETWEEN(DATE, DATE)
  Greenplum:
    CREATE FUNCTION MONTHS_BETWEEN(DATE, DATE)
    RETURNS NUMERIC AS
    'SELECT to_number((date($1) - date($2)), ''999999999'') / 31'
    LANGUAGE 'sql';

BIN_TO_NUM
  Oracle:
    SELECT BIN_TO_NUM(1,0,1,0) AS VALUE1 FROM DUAL
  Greenplum:
    SELECT CAST(B'1010' AS INTEGER) AS VALUE1

BITAND
  Oracle:
    BITAND(int, int)
  Greenplum:
    SELECT int & int

Optimization strategies

Viewing execution plans

Commands for viewing the execution plan of a SQL statement:

    EXPLAIN <query>
    EXPLAIN ANALYZE <query>

- Read an execution plan from the bottom up.
- The following operations may involve Gather, Redistribute, or Broadcast motions:
  - Joins
  - Sorts
  - Aggregations
- Each step reports these measures:
  - cost (units of disk page fetches)
  - rows (rows output by this node)
  - width (byte count of the widest row produced by this node)

Execution plan (EXPLAIN)

    EXPLAIN SELECT * FROM names WHERE name = 'Joelle';

    QUERY PLAN
    Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
      -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
           Filter: name::text ~~ 'Joelle'::text

[Figure callouts: the SQL query; the filter condition; the gather motion; cost, rows, and width.]

Execution plan (EXPLAIN ANALYZE)

    EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';

    QUERY PLAN
    Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
      recv: Total 1 rows with 0.305 ms to first row, 0.537 ms to end.
      -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
           Total 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end.
           Filter: name::text ~~ 'Joelle'::text
    22.548 ms elapsed

[Figure callouts: 1 segment returned rows; actual time to run the query; 1 row returned to the master.]

Optimization strategies (1)

1. Data distribution: choose a suitable column as the distribution key (DK) and aim for an even distribution. To check the distribution:

    select gp_segment_id, count(*) from tablename group by 1;

Example 1: [screenshot not preserved]

Optimization strategies (2)

2. Prefer a column that is commonly used in join conditions or GROUP BY clauses as the distribution key. Ideally use a single column as the DK, and the more distinct values the DK column has, the better.

Example 2:

    insert into tablec (auction_id, …)
    select * from tablea left join tableb on tablea.selid = tableb.id;

- Time before optimization: 120 seconds.
- Optimization: change the distribution key of tablea to selid, of tableb to id, and of tablec to auction_id.
- Time after optimization: 88 seconds, an improvement of 32 seconds.
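Why a distribution key with many distinct values spreads data more evenly can be shown with a short Python model. It is only a sketch: SHA-256 stands in for Greenplum's hash function, the eight-segment cluster is an assumption, and rows_per_segment merely imitates the per-segment counts that the gp_segment_id query returns on a real table.

```python
import hashlib
from collections import Counter


def rows_per_segment(key_values, num_segments):
    """Count rows per segment, like SELECT gp_segment_id, count(*) ... GROUP BY 1."""
    counts = Counter({seg: 0 for seg in range(num_segments)})
    for value in key_values:
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % num_segments] += 1
    return counts


def skew_ratio(counts):
    """Largest segment divided by the average segment; 1.0 is perfectly even."""
    average = sum(counts.values()) / len(counts)
    return max(counts.values()) / average


NUM_SEGMENTS = 8  # hypothetical cluster size

# Low-cardinality key: only two distinct values, so at most two segments get data.
gender_key = ["F" if i % 2 else "M" for i in range(10000)]
low_cardinality = rows_per_segment(gender_key, NUM_SEGMENTS)

# High-cardinality key: every row has its own value.
order_id_key = range(10000)
high_cardinality = rows_per_segment(order_id_key, NUM_SEGMENTS)
```

In this model the two-valued key leaves at least six of the eight segments empty, so the busiest segment holds at least four times the average load, while the unique key gives a skew ratio close to 1. The slowest segment determines the query time, which is why the second case is preferred.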

Syntax for changing the distribution key:

    ALTER TABLE name SET DISTRIBUTED BY (column [, ...]);
    ALTER TABLE name SET WITH (REORGANIZE=true);

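3. Use CREATE TABLE instead of INSERT INTO, and use temporary tables for intermediate steps where possible.

Many tables in the database are fully refreshed on every load, so CREATE TABLE ... AS can replace INSERT INTO with a large performance gain. Rewriting the SQL of Example 2 as

    create table tablec as
        (select * from tablea left join tableb on tablea.selid = tableb.id)
    distributed by (auction_id)

brings the run time down to 65 seconds, a further improvement of 23 seconds over the 88 seconds above.

Optimization strategies (3)

4. Collect statistics and run VACUUM regularly
- Collecting statistics regularly lets the planner choose better execution paths. Syntax: ANALYZE table_name;
- Running VACUUM after data loads, or recreating the table, releases dead data and improves SQL efficiency.
- The system catalog also needs a regular VACUUM.
- GPDB uses the MVCC transaction concurrency model, so deleted or updated rows still occupy physical disk space even though new transactions can no longer see them. A database with many updates and deletes accumulates a large number of expired rows.
- The VACUUM command also collects table-level statistics, such as the number of rows and pages. The ANALYZE command collects the column statistics that the query planner needs.
- VACUUM and ANALYZE can be run together in one command. Example:

    =# VACUUM ANALYZE mytable;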
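The reason expired rows pile up under MVCC can be pictured with a deliberately simplified Python model. It is a teaching sketch only: real MVCC tracks transaction IDs per row version rather than a single dead flag, and the ToyTable class and its methods are hypothetical.

```python
class ToyTable:
    """Toy heap in which UPDATE and DELETE leave the old row version behind."""

    def __init__(self):
        self.versions = []  # every stored row version, live or dead

    def insert(self, key, value):
        self.versions.append({"key": key, "value": value, "dead": False})

    def delete(self, key):
        # The row is only marked dead; its space is not released.
        for version in self.versions:
            if version["key"] == key and not version["dead"]:
                version["dead"] = True

    def update(self, key, value):
        # An update is a delete of the old version plus an insert of a new one.
        self.delete(key)
        self.insert(key, value)

    def live_rows(self):
        return [v for v in self.versions if not v["dead"]]

    def vacuum(self):
        """Drop dead versions and return how many were reclaimed."""
        before = len(self.versions)
        self.versions = self.live_rows()
        return before - len(self.versions)


table = ToyTable()
for k in range(100):
    table.insert(k, "v1")
for k in range(100):
    table.update(k, "v2")      # leaves 100 dead versions behind
for k in range(50):
    table.delete(k)            # leaves 50 more

stored_before = len(table.versions)
live_before = len(table.live_rows())
reclaimed = table.vacuum()
stored_after = len(table.versions)
```

Before the vacuum step the table stores 200 row versions for only 50 visible rows, so a scan reads four times more data than it needs; afterwards stored and visible rows match again.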

5. Merge SQL statements

In Greenplum it is advisable to combine multi-table joins and nested subqueries into a single SQL statement where appropriate. This reduces I/O and so improves performance.
