2022 數(shù)據(jù)峰會(huì) 數(shù)據(jù)庫(kù) -面向未來(lái)的數(shù)據(jù)倉(cāng)庫(kù) 理查德·溫特_第1頁(yè)
2022 數(shù)據(jù)峰會(huì) 數(shù)據(jù)庫(kù) -面向未來(lái)的數(shù)據(jù)倉(cāng)庫(kù) 理查德·溫特_第2頁(yè)
2022 數(shù)據(jù)峰會(huì) 數(shù)據(jù)庫(kù) -面向未來(lái)的數(shù)據(jù)倉(cāng)庫(kù) 理查德·溫特_第3頁(yè)
2022 數(shù)據(jù)峰會(huì) 數(shù)據(jù)庫(kù) -面向未來(lái)的數(shù)據(jù)倉(cāng)庫(kù) 理查德·溫特_第4頁(yè)
2022 數(shù)據(jù)峰會(huì) 數(shù)據(jù)庫(kù) -面向未來(lái)的數(shù)據(jù)倉(cāng)庫(kù) 理查德·溫特_第5頁(yè)
已閱讀5頁(yè),還剩56頁(yè)未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

HowtoSelectYour

CloudDataWarehousePlatformStrategically

May2022

RichardWinter,CEO

WinterCorpLLC

TheExpertsinAnalyticDataManagementatScale

WC

RichardWinter

Presenter

RichardWinter,CEOandPrincipalConsultant,WinterCorp

?Careerfocusinanalyticdatamanagementatlargescale

?Pioneeringdeveloperofdatawarehousetechnologyandproducts

?Hasperformedmorethan50datawarehouseevaluations,oftenincludingbenchmarks,forleadingenterprises

?Advisertoindustryleadersinanalytics,userandvendor

?Skilledatdatawarehousestrategies,requirements,strategicdesign,engineeringanalysis,testdesignandevaluation

?Experiencedconsultantandtechnologyeducator,widelypublished

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC2

TheDataDrivenEnterpriseofthe2020s

MostCompaniesareIncreasingUseof:

1.Digital,realtimebusinessprocesses

2.Automateddecisionmaking(AI/ML)

3.Focusoncompetingwithanalytics

Alltrendsacceleratedbytheincreasingspeedofbusinesscompetition

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC3

SourceSystems

Orders

Report1

Products

Products

Report2

Suppliers

ReportN

Shipments

Cleanseddata,integrated

acrosstheenterprise

WeeklyLoads

TheDataWarehouseImpetus:1990s

Orders

CONSISTENT,ACCURATEREPORTING

DataWarehouse

Inventory

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC4

ANALYZING

WHYdidit

happen?

REPORTING

WHAT

happened?

A

STRATEGICINTELLIGENCE

AT

PREDICTING

WHATWILL

happen?

StagesofDataWarehouseEvolution

OPERATIONALINTELLIGENCE

OPERATIONALIZING

WHATIS

happeningnow?

ACTIVATING

MAKEithappen!

A

T

A

T

Batch

AdHoc

AnalyticsContinuousupdates, tacticalqueriesEventdriven

A

T

Source:DiagramadaptedfromBrobst&Rarey,2003 ?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.WWCC5

WC

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

LeadingEdge–200%CAGR

EB

5

4

3

2

Estimated

1

2010201420182022

Data

Warehouse

Growth

Source:WinterCorpPrimaryResearch–surveysandcustomerreports

WC6

WC

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

Key

Dimensions

ofData

Warehouse

Scale

WC7

Averagecommercialbuilding(20ksf)

BurjKhalifa,Dubai–TallestBuildingintheWorld

Average

single

family

home

(2500sf)

?2717feettall

?3.3millionsquarefeetinteriorspace

?57elevators

?62milesofpipesforplumbing(plus132milesforthefireemergencysystem)

?24kwindows

?62khpcoolingsystem

?1100xthesizeofasinglefamilyhome

?165xthesizeoftheaveragecommercialbuilding

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC8

SCALE+COMPLEXITY

Ifyouonlyrememberonethingtoday…

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC9

DataMartvsIntegratedDataWarehouse(IDW)

?Purposebuilt

?Onelargetable

?Dimensionsusuallysmall

?1-1or1-to-manyrelationships

?Commonpartitioningkeys

?Mostjoinsnotsodemanding

?User-visiblepre-joinscanbeok

StarSchema–DataMart

Store

PK

storeid

storename

street

city

state

zip

managerid

storeclass

storesize

salesstrategy

buildingtype

dateopened

dateclosed

numberofregisterscachementpopulation

SMSA

censustract

Transaction

PK

transid

transtype

date

time

productid

storeid

customerid

associateid

basketid

quantity

unitprice

discount

taxcategory

taxamount

extendedpricepaymentmethodaccountid

Product

PK

productid

namedescription

SKU

package

size

color

style

attr1-n

Customer

PK

customerid

name

street

city

state

zip

phone

email

maritalstatusfamilysizeagegroupincomegroupeducationallevelgenderbirthyear

attr1-n

?Manylargetableswithmultiplejoinpaths

?Many-to-manyrelationships

?Differentpartitioningkeys

?Joinsaremoredemanding

?User-visiblepre-joinsareaproblem

RelationalSchema-IDW

Organization

PK

FK1

name

street

city

state

zip

country

phone

fax

emailwebaddressindustrycodenumberofempbusinesssizestatus

startdate

enddate

orgtype

partyID

Person

PK

personID

FK1

FK2

name

street

city

state

zip

phone

email

gender

dob

incomelevelstatus

startdateenddatehouseholdIDpartyID

Household

PK

householdID

FK1

street

ctiy

state

zip

phone

householdsizeincomelevelpartyID

Party

PK

partyID

partytypestartdateenddate

Arrangement

PK

arrangementID

arrangementtypenamedescription

startdate

enddate

notes

Resource

PK

resourceID

FK1

resourcetypestartdataenddatenamedescriptionarrangementID

Location

PK

locationID

FK1

locationtype

name

street

city

state

zip

partyID

Product

PK

productID

FK2

productclassnamedescriptionproductstatusarrangementID

Conditions

PK

conditionID

FK1

conditiontypestartdateenddateamountarrangementID

Statement

PK

statement_id

FK1

balancepast_due_amountpastduedayseventID

Event

PK

eventID

FK1

eventtypeeventdateeventtimeeventamountaccountID

Relationship

PK

relationshipID

FK1

FK2

FK3

rel_typeaccountIDarrangementIDpartyID

Account

PK

accountID

accounttypestartdateenddateaccountstatus

orgID

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC10

IllustrativeDataWarehouseUses:FinancialServices

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC11

WC

12

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

TheDataRelationshipsinanIntegratedDataWarehouse

?Analysisofactualdatarelationshipsinanintegrateddatawarehouse,largeUScompany

?Dotsaretablesorviews

?Linesaredatarelationshipsthatactuallyoccurinqueries

?Diagramillustratesthatacombinatorialexplosionofjoinsdevelopsasthebusinessvalueofthedatarelationshipsisexploited

WC

IntegratedDataWarehouse

TYPICALCOREREQUIREMENTS

?Supportefficient(bigtable,bigtable)joinswithoutco-locationorpre-join(Toomanydifferentjoinsarerequired)

?Oftenlargerscale

?Supportcomplexqueriesalongavarietyofjoinpaths

?Effectivecomplexqueryoptimization

?Efficient,fasttacticalquery

?Managecomplex,mixedworkloads

?Supportmanyconcurrentoperations

?Supportupdatesthroughoutthedayaswellasanynightlybatchruns

?High,ifnotcontinuous,availabilityofup-to-datedata

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC13

JoinPerformanceatScale

?Challengingproblemiswhenthesetstobematchedarelarge

?Costandtimearemanytimesgreaterwhen:

—Techniquesarenotscalable

—Intermediateresultsspilltodisk

?Customerscanbemisledwhenthesamequeriesworkfineatsmallerscale

?Dominatestheperformancepictureinmanydatawarehouses

?Allpopulardatawarehouseplatformsperformsomejoinswell

?Manyperformthelargerandmorecomplexjoinspoorly

?Differencebetweenthebestandtheworstisoftenverylarge

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC14

JoinTestDatabase

?TPC-Hbenchmarkdatabase

?ScaleFactors:10,100,1000,10000

?AtSF1databaseis1GB

TableName

RowCount

RowSize

(Bytes)

EstimatedTableSize(MB,Uncompressed)

LineItem

6,000,000

121

725

Order

1,500,00

110

165

Part

200,00

120

24

Part-Supplier

800,000

143

114

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC15

Orders

X4

6millionrows

660MB

Part-Supplier

800Krows

114MB

N

Part

200Krows

24MB

4

N

N

1

1

DatabaseQuery–BigJoin

AtSF1,resultingjoinisatablewith:

?24millionrows

?450bytesperrow

?≈10.8GBtotalsize,uncompressed

Allsizesincreaseproportionallywithscalefactor

Line

Item

6millionrows

725MB

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC16

QueryPerformanceandCostvsDataSize

?Runonsmallvirtualdatawarehouse

?Twoservers,eachwith8virtualprocessors

DataSize

ResponseTime

(Seconds)

OnDemand

Cost

Comparedto

LinearGrowth

10GB

156

$0.09

-

100GB

1626

$0.90

104%

1TB

30,656

$17

196%

10TB

736,072

$409

454%

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC17

$17at1TB

(196%ofprojectedcost)

QueryCostisWorseThanLinearwithDataSize

POPULARCLOUDDATAWAREHOUSESERVICE

QueryCost($)

$409at10TB

Source:WinterCorpCloudDatabaseLabTestingonGCP

400

(472%ofprojectedcost)

320

240

ActualProjected

160

$0.90at100GB

Shouldbe$90at10TBGB

80

DataSize(GB)

100

1000

500

Shouldbe

$9at1TB

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC18

BigJoinExperiment:Interpretation

?Forthesystemtested,causeoftheperformanceproblemwasthatintermediatequeryresultsweresignificantlylargerthanSSDperserver

?Largeintermediateresultsoccurwith:

—Joinofmultiplebigtables—commonintheintegrateddatawarehouse

—Self-joinofalargetable—commoninpathanalytics,networkanalytics,IoT

?Specificquerypatternoccursinpractice,thoughinfrequently

?Sameproblemoccursinnconcurrentquerieseachneedcreate(1/n)intermediateresults—canoccurinawiderangeofusecases

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC19

15

?Independentconsultant

?Differentrequirement

ProductB

12

mightfavoradifferentproduct

9

WC

20

AnnualCloudChargesfortheSameDataWarehouse

WorkloadwithThreePopularCloudDWEngines

?Realcustomerdatabase&workload

$Millions

ProductA

6

Nooneproductis

alwaysbest!!!

ProductC

3

Source:BEZNextWhitePaper-WhichPlatformIsBestforYourCloudDataWarehouse

/#readthepaper

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC

DataWarehousingintheCloud

THEPROMISE

—Agility

—Elasticity

—Cost

—Scalability

—WorkloadManagement

Thesub-text:don’tworry,moreresourcesarealwaysavailable

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC21

DataWarehousingintheCloud

THEREALITY

?Datawarehouseengineeringchallengesarestillfundamental—Performance

—Cost

—DataAvailability

majorriskareas

hardtodeliveratenterprisescaleandcomplexity

?Resourcesnotfree

?Doublingtheresourcesoftendoesn’tdoubletheperformanceorthroughput

?Scalerisesfast

DatawarehouseproductsdiffergreatlyinperformanceandcostandtodaywearefocusingonBIGdifferences(usually>3x)

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC22

WWCC

MostWidelyUsedCloudDataWarehousePlatforms

Vendor

AmazonWeb

Services

(AWS)

Cloudera

Microsoft

Google

Oracle

Snowflake

Teradata

Product/Service

Redshift

ClouderaData

Warehouse

AzureSynapseAnalytics

BigQuery

AutonomousDataWarehouse(ADW)

SnowflakeDataPlatform

Vantage

CloudAvailability

AWS

AWS,Azure,GCP

Azure

GCP

OracleCloud(AWSNotAs-a-Service)

AWS,Azure,GCP

AWS

GCPAnnounced

MicrosoftAzure

TeradataIntellicloud

OnPremisesAvailability

No

Yes,onHadoop

MicrosoftSQLServer

(Similarbutnotidentical)

No

Yes

Oracle

Cloud@Customer

No

Yes,optionallywithconsumptionpricing

Serverless

No

No

No

Yes

Twomodes:dedicatedandshared(~serverless)

ServerT-ShirtSizes

No

PricingModel

Default:ondemand

pricingbythesecond;

reservedinstance

discount(1-3yearterm);

payseparatelyfordata

storage

Priceperhourperserver

forsoftware;customer

payscloudprovider

separatelyforcompute

andstorage

Separatecomputeand

storage.Payfordata

stored,withflexible

scaling.Payforthe

computelevel,with

optiontopausecompute

entirely.Monthlypricing

(with3yearterm)

Default:ondemandpricing-

payperquerybasedondata

scanned;flatratepricingfor

reserved"queryslots"(a

blendedresourceunit);pay

separatelyfordatastorage

Unitpricingfor"shared"

publiccloudusefor

computeandstorage;

dedicatedhardwarewith

licenseportability

availableinpubliccloud

andonpremise;private

cloudonpremise

availableonsubscription

basis.

Separationofcompute

andstorage;"Snowflake

onDemand"persecond

pricing;reservedcapacity

availableonrequest;3

editions

Licenseportability;

consumptionpricing--

payonlyforcompleted

queries(cloudandon

prem);customernot

chargedforoverhead

operationssuchas

statistics;alsooffer

capacitypricing;

?2020,2021WinterCorpLLC,TyngsboroMA

23

WC

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

The

Datanolongerneedstobein

onecentralstore:cloud,onprem,multi-cloud,allofabove

Modern

Data

Warehouse

WC24

TheModernAnalyticDataPlatform

DataLake

Alltypesofdata

including“unstructured”

(largevideo,scans,etc.)

Managesalltypesofdataforanalyticuseincludingdatathatis/has:

?largeandvariableinsize

?relatively“unstructured”

?lowervaluedensity

?lessshareduse

?variedlevelsofcuration

?colder

Data&Requests

Data&Requests

Data&requestsflowinboth

directions

Boundariesareunclearand

changing

Maybeitwillsoonbeone

“dataplatform”–sometimes

positionedasthe“datacloud”

DataWarehouse

Manytypesofdata,mosthassomestructure

Managesmanytypesof

dataforanalyticusethat

ismost:

?intensivelyused

?widelyshared

?highlycurated

?integrated

?valuable

DataLakehousehassomecharacteristicsofbothlakeandwarehouse

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC25

Industry

Trends

?QuantifiedRequirementsare:

—DatabaseMacro

structure

Projected

Growth**

—Scale

—Workload(e.g.,queryclassesandfrequencies)

—Userpopulation

—ServiceLevels

Estimateinranges,but

estimate!!

DefiningQuantifiedArchitecturalRequirements

Business**Interests/Vision

CurrentNeeds

Projected

Requirements

(Quantified)

?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.

WC26

DoaQuantifiedEvaluation,withMeasurement

WorkingfromyourEstimateofMacro-Requirements

1.Canthecandidateplatformsperformatyourtargetlevelsofscaleandcomplexity?

?Notonlyfeatures,butanengineeringassessment

?Requireproductionreferenceswithsimilarmacro-structureandscale

2.Criti

溫馨提示

  • 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論