版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
HowtoSelectYour
CloudDataWarehousePlatformStrategically
May2022
RichardWinter,CEO
WinterCorpLLC
TheExpertsinAnalyticDataManagementatScale
WC
RichardWinter
Presenter
RichardWinter,CEOandPrincipalConsultant,WinterCorp
?Careerfocusinanalyticdatamanagementatlargescale
?Pioneeringdeveloperofdatawarehousetechnologyandproducts
?Hasperformedmorethan50datawarehouseevaluations,oftenincludingbenchmarks,forleadingenterprises
?Advisertoindustryleadersinanalytics,userandvendor
?Skilledatdatawarehousestrategies,requirements,strategicdesign,engineeringanalysis,testdesignandevaluation
?Experiencedconsultantandtechnologyeducator,widelypublished
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC2
TheDataDrivenEnterpriseofthe2020s
MostCompaniesareIncreasingUseof:
1.Digital,realtimebusinessprocesses
2.Automateddecisionmaking(AI/ML)
3.Focusoncompetingwithanalytics
Alltrendsacceleratedbytheincreasingspeedofbusinesscompetition
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC3
SourceSystems
…
Orders
Report1
Products
Products
Report2
Suppliers
ReportN
Shipments
Cleanseddata,integrated
acrosstheenterprise
WeeklyLoads
TheDataWarehouseImpetus:1990s
Orders
CONSISTENT,ACCURATEREPORTING
DataWarehouse
Inventory
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC4
ANALYZING
WHYdidit
happen?
REPORTING
WHAT
happened?
A
STRATEGICINTELLIGENCE
AT
PREDICTING
WHATWILL
happen?
StagesofDataWarehouseEvolution
OPERATIONALINTELLIGENCE
OPERATIONALIZING
WHATIS
happeningnow?
ACTIVATING
MAKEithappen!
A
T
A
T
Batch
AdHoc
AnalyticsContinuousupdates, tacticalqueriesEventdriven
A
T
Source:DiagramadaptedfromBrobst&Rarey,2003 ?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.WWCC5
WC
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
LeadingEdge–200%CAGR
EB
5
4
3
2
Estimated
1
2010201420182022
Data
Warehouse
Growth
Source:WinterCorpPrimaryResearch–surveysandcustomerreports
WC6
WC
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
Key
Dimensions
ofData
Warehouse
Scale
WC7
Averagecommercialbuilding(20ksf)
BurjKhalifa,Dubai–TallestBuildingintheWorld
Average
single
family
home
(2500sf)
?2717feettall
?3.3millionsquarefeetinteriorspace
?57elevators
?62milesofpipesforplumbing(plus132milesforthefireemergencysystem)
?24kwindows
?62khpcoolingsystem
?1100xthesizeofasinglefamilyhome
?165xthesizeoftheaveragecommercialbuilding
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC8
SCALE+COMPLEXITY
Ifyouonlyrememberonethingtoday…
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC9
DataMartvsIntegratedDataWarehouse(IDW)
?Purposebuilt
?Onelargetable
?Dimensionsusuallysmall
?1-1or1-to-manyrelationships
?Commonpartitioningkeys
?Mostjoinsnotsodemanding
?User-visiblepre-joinscanbeok
StarSchema–DataMart
Store
PK
storeid
storename
street
city
state
zip
managerid
storeclass
storesize
salesstrategy
buildingtype
dateopened
dateclosed
numberofregisterscachementpopulation
SMSA
censustract
Transaction
PK
transid
transtype
date
time
productid
storeid
customerid
associateid
basketid
quantity
unitprice
discount
taxcategory
taxamount
extendedpricepaymentmethodaccountid
Product
PK
productid
namedescription
SKU
package
size
color
style
attr1-n
Customer
PK
customerid
name
street
city
state
zip
phone
maritalstatusfamilysizeagegroupincomegroupeducationallevelgenderbirthyear
attr1-n
?Manylargetableswithmultiplejoinpaths
?Many-to-manyrelationships
?Differentpartitioningkeys
?Joinsaremoredemanding
?User-visiblepre-joinsareaproblem
RelationalSchema-IDW
Organization
PK
FK1
name
street
city
state
zip
country
phone
fax
emailwebaddressindustrycodenumberofempbusinesssizestatus
startdate
enddate
orgtype
partyID
Person
PK
personID
FK1
FK2
name
street
city
state
zip
phone
gender
dob
incomelevelstatus
startdateenddatehouseholdIDpartyID
Household
PK
householdID
FK1
street
ctiy
state
zip
phone
householdsizeincomelevelpartyID
Party
PK
partyID
partytypestartdateenddate
Arrangement
PK
arrangementID
arrangementtypenamedescription
startdate
enddate
notes
Resource
PK
resourceID
FK1
resourcetypestartdataenddatenamedescriptionarrangementID
Location
PK
locationID
FK1
locationtype
name
street
city
state
zip
partyID
Product
PK
productID
FK2
productclassnamedescriptionproductstatusarrangementID
Conditions
PK
conditionID
FK1
conditiontypestartdateenddateamountarrangementID
Statement
PK
statement_id
FK1
balancepast_due_amountpastduedayseventID
Event
PK
eventID
FK1
eventtypeeventdateeventtimeeventamountaccountID
Relationship
PK
relationshipID
FK1
FK2
FK3
rel_typeaccountIDarrangementIDpartyID
Account
PK
accountID
accounttypestartdateenddateaccountstatus
orgID
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC10
IllustrativeDataWarehouseUses:FinancialServices
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC11
WC
12
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
TheDataRelationshipsinanIntegratedDataWarehouse
?Analysisofactualdatarelationshipsinanintegrateddatawarehouse,largeUScompany
?Dotsaretablesorviews
?Linesaredatarelationshipsthatactuallyoccurinqueries
?Diagramillustratesthatacombinatorialexplosionofjoinsdevelopsasthebusinessvalueofthedatarelationshipsisexploited
WC
IntegratedDataWarehouse
TYPICALCOREREQUIREMENTS
?Supportefficient(bigtable,bigtable)joinswithoutco-locationorpre-join(Toomanydifferentjoinsarerequired)
?Oftenlargerscale
?Supportcomplexqueriesalongavarietyofjoinpaths
?Effectivecomplexqueryoptimization
?Efficient,fasttacticalquery
?Managecomplex,mixedworkloads
?Supportmanyconcurrentoperations
?Supportupdatesthroughoutthedayaswellasanynightlybatchruns
?High,ifnotcontinuous,availabilityofup-to-datedata
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC13
JoinPerformanceatScale
?Challengingproblemiswhenthesetstobematchedarelarge
?Costandtimearemanytimesgreaterwhen:
—Techniquesarenotscalable
—Intermediateresultsspilltodisk
?Customerscanbemisledwhenthesamequeriesworkfineatsmallerscale
?Dominatestheperformancepictureinmanydatawarehouses
?Allpopulardatawarehouseplatformsperformsomejoinswell
?Manyperformthelargerandmorecomplexjoinspoorly
?Differencebetweenthebestandtheworstisoftenverylarge
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC14
JoinTestDatabase
?TPC-Hbenchmarkdatabase
?ScaleFactors:10,100,1000,10000
?AtSF1databaseis1GB
TableName
RowCount
RowSize
(Bytes)
EstimatedTableSize(MB,Uncompressed)
LineItem
6,000,000
121
725
Order
1,500,00
110
165
Part
200,00
120
24
Part-Supplier
800,000
143
114
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC15
Orders
X4
6millionrows
660MB
Part-Supplier
800Krows
114MB
N
Part
200Krows
24MB
4
N
N
1
1
DatabaseQuery–BigJoin
AtSF1,resultingjoinisatablewith:
?24millionrows
?450bytesperrow
?≈10.8GBtotalsize,uncompressed
Allsizesincreaseproportionallywithscalefactor
Line
Item
6millionrows
725MB
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC16
QueryPerformanceandCostvsDataSize
?Runonsmallvirtualdatawarehouse
?Twoservers,eachwith8virtualprocessors
DataSize
ResponseTime
(Seconds)
OnDemand
Cost
Comparedto
LinearGrowth
10GB
156
$0.09
-
100GB
1626
$0.90
104%
1TB
30,656
$17
196%
10TB
736,072
$409
454%
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC17
$17at1TB
(196%ofprojectedcost)
QueryCostisWorseThanLinearwithDataSize
POPULARCLOUDDATAWAREHOUSESERVICE
QueryCost($)
$409at10TB
Source:WinterCorpCloudDatabaseLabTestingonGCP
400
(472%ofprojectedcost)
320
240
ActualProjected
160
$0.90at100GB
Shouldbe$90at10TBGB
80
DataSize(GB)
100
1000
500
Shouldbe
$9at1TB
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC18
BigJoinExperiment:Interpretation
?Forthesystemtested,causeoftheperformanceproblemwasthatintermediatequeryresultsweresignificantlylargerthanSSDperserver
?Largeintermediateresultsoccurwith:
—Joinofmultiplebigtables—commonintheintegrateddatawarehouse
—Self-joinofalargetable—commoninpathanalytics,networkanalytics,IoT
?Specificquerypatternoccursinpractice,thoughinfrequently
?Sameproblemoccursinnconcurrentquerieseachneedcreate(1/n)intermediateresults—canoccurinawiderangeofusecases
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC19
15
?Independentconsultant
?Differentrequirement
ProductB
12
mightfavoradifferentproduct
9
WC
20
AnnualCloudChargesfortheSameDataWarehouse
WorkloadwithThreePopularCloudDWEngines
?Realcustomerdatabase&workload
$Millions
ProductA
6
Nooneproductis
alwaysbest!!!
ProductC
3
Source:BEZNextWhitePaper-WhichPlatformIsBestforYourCloudDataWarehouse
/#readthepaper
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC
DataWarehousingintheCloud
THEPROMISE
—Agility
—Elasticity
—Cost
—Scalability
—WorkloadManagement
Thesub-text:don’tworry,moreresourcesarealwaysavailable
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC21
DataWarehousingintheCloud
THEREALITY
?Datawarehouseengineeringchallengesarestillfundamental—Performance
—Cost
—DataAvailability
majorriskareas
hardtodeliveratenterprisescaleandcomplexity
?Resourcesnotfree
?Doublingtheresourcesoftendoesn’tdoubletheperformanceorthroughput
?Scalerisesfast
DatawarehouseproductsdiffergreatlyinperformanceandcostandtodaywearefocusingonBIGdifferences(usually>3x)
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC22
WWCC
MostWidelyUsedCloudDataWarehousePlatforms
Vendor
AmazonWeb
Services
(AWS)
Cloudera
Microsoft
Oracle
Snowflake
Teradata
Product/Service
Redshift
ClouderaData
Warehouse
AzureSynapseAnalytics
BigQuery
AutonomousDataWarehouse(ADW)
SnowflakeDataPlatform
Vantage
CloudAvailability
AWS
AWS,Azure,GCP
Azure
GCP
OracleCloud(AWSNotAs-a-Service)
AWS,Azure,GCP
AWS
GCPAnnounced
MicrosoftAzure
TeradataIntellicloud
OnPremisesAvailability
No
Yes,onHadoop
MicrosoftSQLServer
(Similarbutnotidentical)
No
Yes
Oracle
Cloud@Customer
No
Yes,optionallywithconsumptionpricing
Serverless
No
No
No
Yes
Twomodes:dedicatedandshared(~serverless)
ServerT-ShirtSizes
No
PricingModel
Default:ondemand
pricingbythesecond;
reservedinstance
discount(1-3yearterm);
payseparatelyfordata
storage
Priceperhourperserver
forsoftware;customer
payscloudprovider
separatelyforcompute
andstorage
Separatecomputeand
storage.Payfordata
stored,withflexible
scaling.Payforthe
computelevel,with
optiontopausecompute
entirely.Monthlypricing
(with3yearterm)
Default:ondemandpricing-
payperquerybasedondata
scanned;flatratepricingfor
reserved"queryslots"(a
blendedresourceunit);pay
separatelyfordatastorage
Unitpricingfor"shared"
publiccloudusefor
computeandstorage;
dedicatedhardwarewith
licenseportability
availableinpubliccloud
andonpremise;private
cloudonpremise
availableonsubscription
basis.
Separationofcompute
andstorage;"Snowflake
onDemand"persecond
pricing;reservedcapacity
availableonrequest;3
editions
Licenseportability;
consumptionpricing--
payonlyforcompleted
queries(cloudandon
prem);customernot
chargedforoverhead
operationssuchas
statistics;alsooffer
capacitypricing;
?2020,2021WinterCorpLLC,TyngsboroMA
23
WC
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
The
Datanolongerneedstobein
onecentralstore:cloud,onprem,multi-cloud,allofabove
Modern
Data
Warehouse
WC24
TheModernAnalyticDataPlatform
DataLake
Alltypesofdata
including“unstructured”
(largevideo,scans,etc.)
Managesalltypesofdataforanalyticuseincludingdatathatis/has:
?largeandvariableinsize
?relatively“unstructured”
?lowervaluedensity
?lessshareduse
?variedlevelsofcuration
?colder
Data&Requests
Data&Requests
Data&requestsflowinboth
directions
Boundariesareunclearand
changing
Maybeitwillsoonbeone
“dataplatform”–sometimes
positionedasthe“datacloud”
DataWarehouse
Manytypesofdata,mosthassomestructure
Managesmanytypesof
dataforanalyticusethat
ismost:
?intensivelyused
?widelyshared
?highlycurated
?integrated
?valuable
DataLakehousehassomecharacteristicsofbothlakeandwarehouse
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC25
Industry
Trends
?QuantifiedRequirementsare:
—DatabaseMacro
structure
Projected
Growth**
—Scale
—Workload(e.g.,queryclassesandfrequencies)
—Userpopulation
—ServiceLevels
Estimateinranges,but
estimate!!
DefiningQuantifiedArchitecturalRequirements
Business**Interests/Vision
CurrentNeeds
Projected
Requirements
(Quantified)
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC26
DoaQuantifiedEvaluation,withMeasurement
WorkingfromyourEstimateofMacro-Requirements
1.Canthecandidateplatformsperformatyourtargetlevelsofscaleandcomplexity?
?Notonlyfeatures,butanengineeringassessment
?Requireproductionreferenceswithsimilarmacro-structureandscale
2.Criti
溫馨提示
- 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 現(xiàn)澆混凝土銷售合同范例
- 購(gòu)置設(shè)備折舊合同范例
- 化工環(huán)境檢測(cè)合作合同范例
- 肉類進(jìn)口貿(mào)易合同范例
- 南昌商鋪購(gòu)買合同范例
- 武漢輕工大學(xué)《跨境電商溝通實(shí)務(wù)》2023-2024學(xué)年第一學(xué)期期末試卷
- 翻譯服務(wù)合同范例
- 相城區(qū)買賣房屋合同范例
- 武漢理工大學(xué)《人機(jī)工程學(xué)》2023-2024學(xué)年第一學(xué)期期末試卷
- 佛教合同范例
- 快樂(lè)讀書(shū)吧:中國(guó)民間故事(專項(xiàng)訓(xùn)練)-2023-2024學(xué)年五年級(jí)語(yǔ)文上冊(cè)(統(tǒng)編版)
- 2023-2024學(xué)年廣東省廣州市天河區(qū)七年級(jí)(上)期末英語(yǔ)試卷
- 2024秋國(guó)家開(kāi)放大學(xué)《形勢(shì)與政策》大作業(yè)參考答案
- 20以內(nèi)的加法口算練習(xí)題4000題 290
- Academic English智慧樹(shù)知到期末考試答案章節(jié)答案2024年杭州醫(yī)學(xué)院
- 敦煌的藝術(shù)智慧樹(shù)知到期末考試答案章節(jié)答案2024年北京大學(xué)
- 北京海淀區(qū)2023-2024學(xué)年六年級(jí)上學(xué)期期末數(shù)學(xué)數(shù)學(xué)試卷
- 2023年《思想道德與法治》期末考試復(fù)習(xí)題庫(kù)(帶答案)
- 銅芯聚氯乙烯絕緣聚氯乙烯護(hù)套控制電纜檢測(cè)報(bào)告可修改
- Q345鋼板焊接性能分析解析
- 神奇的數(shù)字治病法
評(píng)論
0/150
提交評(píng)論