




09 February 2025 | Data Warehousing and OLAP Technology

Data Warehousing and OLAP Technology
- What is a data warehouse?
- A multi-dimensional data model
- Data warehouse architecture
- Data warehouse implementation
- Further development of data cube technology
- From data warehousing to data mining

Definition of a database
- Traditional database technology is centered on a single data resource and handles every kind of processing at once, from transaction processing and batch processing to decision analysis.
- Databases mainly serve automation, streamlined work tasks, and high-speed data capture.
- A database is transaction-driven and application-oriented. Its basic task is to carry out data operations, that is, to store the records produced by current transactions promptly and safely.

Two different data processing needs
Computer systems have two different kinds of data processing needs:
- Operational processing (transaction processing): mainly queries and updates on one record or a group of records. The concerns are response time, data security, and data integrity.
- Analytical processing (informational processing): supports managers' decision analysis, e.g., DSS (decision support systems) and multidimensional analysis.

Why build a data warehouse?
[Figure: data is turned into knowledge, and knowledge into decisions.]
- Data: financial, economic, government, point-of-sale, demographic, lifestyle
- Knowledge: patterns, trends, facts, relations, models, associations, sequences
- Decisions: target markets, funds allocation, trading options, where to advertise, catalog mailing list, sales geography
The pain point: too much data, and no way to make sound judgments from it.

What is a Data Warehouse?
- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
- Data warehousing: the process of constructing and using data warehouses

Data Warehouse: Subject-Oriented
- Organized around major subjects, such as customer, product, sales.
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
- Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

Application-oriented example
- Purchasing subsystem:
  - Order (order no., supplier no., total amount, date)
  - Order detail (order no., item no., category, unit price, quantity)
  - Supplier (supplier no., supplier name, address, phone)
- Sales subsystem:
  - Customer (customer no., name, gender, age, address, phone)
  - Sale (employee no., customer no., item no., quantity, unit price, date)
- Inventory management subsystem:
  - Requisition slip (requisition no., requester, item no., quantity, date)
  - Receiving slip (receiving no., order no., deliverer, receiver, date)
  - Stock (item no., storeroom no., stock quantity, date)
  - Storeroom (storeroom no., storekeeper, location, description of stocked items)
- Personnel management subsystem:
  - Employee (employee no., name, gender, age, department no.)
  - Department (department no., department name, department head, phone)

Subject-oriented example
- Item:
  - Intrinsic information: item no., item name, category, color, etc.
  - Purchasing information: item no., supplier no., supply price, supply date, supply quantity, etc.
  - Sales information: item no., customer no., selling price, sale date, sales quantity, etc.
  - Stock information: item no., storeroom no., date, stock quantity, etc.
- Supplier:
  - Intrinsic information: supplier no., supplier name, address, phone, etc.
  - Supplied-item information: supplier no., item no., supply price, supply date, supply quantity, etc.
- Customer:
  - Intrinsic information: customer no., customer name, gender, age, address, phone, etc.
  - Purchase information: customer no., item no., selling price, purchase date, purchase quantity, etc.

Data Warehouse: Integrated
- Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
- Data cleaning and data integration techniques are applied.
  - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources. E.g., hotel price: currency, tax, breakfast covered, etc.
  - When data is moved to the warehouse, it is converted.

Data Warehouse: Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems.
  - Operational database: current value data.
  - Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years).
- Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
- The key of operational data may or may not contain a time element.

Data Warehouse: Non-Volatile
- A physically separate store of data transformed from the operational environment.
- Operational update of data does not occur in the data warehouse environment.
  - Does not require transaction processing, recovery, and concurrency control mechanisms.
  - Requires only two operations in data accessing: initial loading of data and access of data.

Data Warehouse vs. Heterogeneous DBMS
- Traditional heterogeneous DB integration:
  - Build wrappers/mediators on top of heterogeneous databases.
  - Query-driven approach: when a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set.
  - Involves complex information filtering, and the queries compete for resources.
- Data warehouse: update-driven, high performance.
  - Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis.

Data Warehouse vs. Operational DBMS
- OLTP (on-line transaction processing)
  - Major task of traditional relational DBMS
  - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
- OLAP (on-line analytical processing)
  - Major task of data warehouse system
  - Data analysis and decision making
- Distinct features (OLTP vs. OLAP):
  - User and system orientation: customer vs. market
  - Data contents: current, detailed vs. historical, consolidated
  - Database design: ER + application vs. star + subject
  - View: current, local vs. evolutionary, integrated
  - Access patterns: update vs. read-only but complex queries

OLTP vs. OLAP
[Table: OLTP vs. OLAP]

Why Separate Data Warehouse?
- High performance for both systems
  - DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
  - Warehouse, tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
- Different functions and different data:
  - Missing data: decision support requires historical data which operational DBs do not typically maintain
  - Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
  - Data quality: different sources typically use inconsistent data representations, codes, and formats which have to be reconciled

From Tables and Spreadsheets to Data Cubes
- A data warehouse is based on a multidimensional data model which views data in the form of a data cube.
- A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  - Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
  - Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
- In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Cube: A Lattice of Cuboids
- 0-D (apex) cuboid: all
- 1-D cuboids: time; item; location; supplier
- 2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
- 3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
- 4-D (base) cuboid: (time, item, location, supplier)

Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
- Star schema: a fact table in the middle connected to a set of dimension tables
- Snowflake schema: a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
- Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
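
The lattice of cuboids listed earlier for the dimensions time, item, location, and supplier can be generated mechanically, since each subset of the dimension set is one cuboid. The following is a minimal sketch under that assumption (concept hierarchies are ignored); the function name `cuboid_lattice` is made up for illustration.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group every subset of `dimensions` (one subset = one cuboid) by its size."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

dims = ["time", "item", "location", "supplier"]
lattice = cuboid_lattice(dims)

for k, cuboids in lattice.items():
    # Level 0 is the apex cuboid, level len(dims) is the base cuboid.
    print(f"{k}-D cuboids ({len(cuboids)}):", cuboids)

total_cuboids = sum(len(cuboids) for cuboids in lattice.values())
print("total:", total_cuboids)
```

With n dimensions and no hierarchies this yields 2^n cuboids, which is why materializing every cuboid quickly becomes expensive as dimensions are added.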

Example of Star Schema
- Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
- time: time_key, day, day_of_the_week, month, quarter, year
- item: item_key, item_name, brand, type, supplier_type
- branch: branch_key, branch_name, branch_type
- location: location_key, street, city, province_or_state, country

Example of Snowflake Schema
- Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
- time: time_key, day, day_of_the_week, month, quarter, year
- item: item_key, item_name, brand, type, supplier_key
  - supplier: supplier_key, supplier_type
- branch: branch_key, branch_name, branch_type
- location: location_key, street, city_key
  - city: city_key, city, province_or_state, country

Example of Fact Constellation
- Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
- Shipping fact table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
- time: time_key, day, day_of_the_week, month, quarter, year
- item: item_key, item_name, brand, type, supplier_type
- branch: branch_key, branch_name, branch_type
- location: location_key, street, city, province_or_state, country
- shipper: shipper_key, shipper_name, location_key, shipper_type

A Data Mining Query Language, DMQL: Language Primitives
- Cube definition (fact table):
    define cube <cube_name> [<dimension_list>]: <measure_list>
- Dimension definition (dimension table):
    define dimension <dimension_name> as (<attribute_or_subdimension_list>)
- Special case (shared dimension tables): the dimension is defined the first time as part of a cube definition, then reused:
    define dimension <dimension_name> as <dimension_name_first_time> in cube <cube_name_first_time>

Defining a Star Schema in DMQL
    define cube sales_star [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)

Defining a Snowflake Schema in DMQL
    define cube sales_snowflake [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier(supplier_key, supplier_type))
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city(city_key, province_or_state, country))

Defining a Fact Constellation in DMQL
    define cube sales [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)
    define cube shipping [time, item, shipper, from_location, to_location]:
        dollars_cost = sum(cost_in_dollars), units_shipped = count(*)
    define dimension time as time in cube sales
    define dimension item as item in cube sales
    define dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)
    define dimension from_location as location in cube sales
    define dimension to_location as location in cube sales

Measures: Three Categories
- Distributive: the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning. E.g., count(), sum(), min(), max().
- Algebraic: the function can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function. E.g., avg(), min_N(), standard_deviation().
- Holistic: there is no constant bound on the storage size needed to describe a subaggregate. E.g., median(), mode(), rank().

A Concept Hierarchy: Dimension (location)
[Figure: a location hierarchy with the levels all, region, country, city, office. Regions: Europe, North_America. Countries: Germany, Spain, Canada, Mexico. Cities: Frankfurt, Vancouver, Toronto. Offices: L. Chan, M. Wind.]

View of Warehouses and Hierarchies
Specification of hierarchies:
- Schema hierarchy: day < {month < quarter; week} < year
- Set_grouping hierarchy: {1..10} < inexpensive

Multidimensional Data
- Sales volume as a function of product, month, and region
- Dimensions: Product, Location, Time
- Hierarchical summarization paths:
  - Product: Industry, Category, Product
  - Location: Region, Country, City, Office
  - Time: Year, Quarter, Month or Week, Day

A Sample Data Cube
[Figure: a sample data cube over Date, Product, and Country with a sum cell along each dimension. One marked cell is the total annual sales of TV in U.S.A.; the (All, All, All) cell holds the overall sum.]
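
To make the sample cube concrete, here is a small sketch that fills every cell of a Date x Product x Country cube, including the `All` cells, from a handful of rows. The rows and sales figures are invented for illustration, and `build_cube` is a hypothetical helper. The sketch also illustrates the three measure categories described earlier: sum is distributive, avg is algebraic (derived from sum and count), and median is holistic.

```python
from collections import defaultdict
from itertools import product
from statistics import median

ALL = "All"

# Hypothetical (date, product, country, dollars_sold) rows.
rows = [
    ("1Qtr", "TV", "U.S.A.", 100),
    ("2Qtr", "TV", "U.S.A.", 120),
    ("1Qtr", "PC", "U.S.A.", 80),
    ("1Qtr", "TV", "Canada", 40),
    ("3Qtr", "VCR", "Mexico", 30),
]

def build_cube(rows):
    """Add each row to all 2**3 cells it belongs to (per dimension: its value or ALL)."""
    cube = defaultdict(lambda: {"sum": 0, "count": 0})
    for date, prod, country, dollars in rows:
        for cell in product((date, ALL), (prod, ALL), (country, ALL)):
            cube[cell]["sum"] += dollars
            cube[cell]["count"] += 1
    return cube

cube = build_cube(rows)

# Total annual sales of TV in the U.S.A., and the grand total.
tv_usa = cube[(ALL, "TV", "U.S.A.")]
grand_total = cube[(ALL, ALL, ALL)]["sum"]

# Distributive: the annual sum equals the sum of the quarterly sub-aggregates.
quarterly_sum = sum(cube[(q, "TV", "U.S.A.")]["sum"] for q in ("1Qtr", "2Qtr", "3Qtr", "4Qtr"))

# Algebraic: avg is derived from two distributive aggregates, sum and count.
tv_usa_avg = tv_usa["sum"] / tv_usa["count"]

# Holistic: the median of partition medians is generally not the overall median.
partitions = [[1, 2, 9], [3, 4]]
median_of_medians = median([median(p) for p in partitions])
overall_median = median([v for p in partitions for v in p])

print(tv_usa["sum"], grand_total, quarterly_sum, tv_usa_avg)
print(median_of_medians, overall_median)
```

Because sum and count can be combined from sub-aggregates, higher-level cells can be derived from lower-level ones without rescanning the base rows; the median comparison shows why holistic measures do not allow that shortcut.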

Cuboids Corresponding to the Cube
- 0-D (apex) cuboid: all
- 1-D cuboids: product; date; country
- 2-D cuboids: (product, date); (product, country); (date, country)
- 3-D (base) cuboid: (product, date, country)

Browsing a Data Cube
- Visualization
- OLAP capabilities
- Interactive manipulation

Typical OLAP Operations
- Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction
- Drill down (roll down): the reverse of roll-up, going from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions
- Slice and dice: project and select
- Pivot (rotate): reorient the cube for visualization, e.g., a 3-D cube to a series of 2-D planes
- Other operations:
  - Drill across: involving (across) more than one fact table
  - Drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
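
The operations above map fairly directly onto SQL over a star schema. The sketch below uses Python's built-in `sqlite3` module with a cut-down, hypothetical version of the sales star schema (three dimension tables, one measure). The table names with a `_dim` suffix, the year, and all sales figures are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales_fact   (time_key INTEGER, item_key INTEGER,
                           location_key INTEGER, dollars_sold REAL);
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'TV'), (2, 'PC');
INSERT INTO location_dim VALUES
    (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada'), (3, 'Frankfurt', 'Germany');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 100), (1, 2, 1, 50), (1, 1, 2, 70),
    (2, 1, 2, 30),  (2, 2, 3, 90), (1, 1, 3, 60);
""")

# Star join: the fact table joined to each of its dimension tables.
STAR_JOIN = """
FROM sales_fact f
JOIN time_dim t ON t.time_key = f.time_key
JOIN item_dim i ON i.item_key = f.item_key
JOIN location_dim l ON l.location_key = f.location_key
"""

def query(columns, where="1 = 1"):
    """Sum dollars_sold grouped by `columns`, optionally filtered by `where`."""
    sql = (f"SELECT {columns}, SUM(f.dollars_sold) {STAR_JOIN} "
           f"WHERE {where} GROUP BY {columns} ORDER BY {columns}")
    return conn.execute(sql).fetchall()

# Detailed view: sales by city and quarter.
by_city_quarter = query("l.city, t.quarter")

# Roll-up: climb the location hierarchy from city to country.
# Drill-down is the reverse step, from this result back to by_city_quarter.
roll_up = query("l.country")

# Slice: fix one dimension (time) to a single value.
slice_q1 = query("l.city, i.item_name", where="t.quarter = 'Q1'")

# Dice: select on two or more dimensions at once.
dice = query("l.city, t.quarter", where="l.country = 'Canada' AND i.item_name = 'TV'")

# Pivot: reorient city x quarter into quarter x city for presentation.
pivot = {}
for city, quarter, total in by_city_quarter:
    pivot.setdefault(quarter, {})[city] = total

print(roll_up)
print(slice_q1)
print(dice)
print(pivot)
```

Here slice and dice are expressed as selections (`WHERE`) combined with a projection onto the grouped columns, which matches the "project and select" description above.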

A Star-Net Query Model
[Figure: a star-net with one radial line per dimension and its abstraction levels: Shipping Method (AIR-EXPRESS, TRUCK), Customer Orders (ORDER, CONTRACTS), Customer, Product (PRODUCT ITEM, PRODUCT GROUP, PRODUCT LINE), Organization (SALES PERSON, DISTRICT, DIVISION), Promotion, Location (CITY, COUNTRY, REGION), Time (DAILY, QTRLY, ANNUALLY). Each circle is called a footprint.]

Design of a Data Warehouse: A Business Analysis Framework
Four views regarding the design of a data warehouse:
- Top-down view: allows selection of the relevant information necessary for the data warehouse
- Data source view: exposes the information being captured, stored, and managed by operational systems
- Data warehouse view: consists of fact tables and dimension tables
- Business query view: sees the perspectives of data in the warehouse from the view of the end user

Data Warehouse Design Process
- Top-down, bottom-up approaches, or a combination of both
  - Top-down: starts with overall design and planning (mature)
  - Bottom-up: starts with experiments and prototypes (rapid)
- From the software engineering point of view
  - Waterfall: structured and systematic analysis at each step before proceeding to the next
  - Spiral: rapid generation of increasingly functional systems, with short turnaround time
- Typical data warehouse design process
  - Choose a business process to model, e.g., orders, invoices, etc.
  - Choose the grain (atomic level of data) of the business process
  - Choose the dimensions that will apply to each fact table record
  - Choose the measure that will populate each fact table record

Multi-Tiered Architecture
[Figure: data sources (operational DBs, other sources) feed the data storage tier through extract, transform, load, and refresh, with a monitor & integrator and metadata; the data storage tier holds the data warehouse and data marts; these serve the OLAP server (OLAP engine), which supports the front-end tools for analysis, query, reports, and data mining.]

Architecture [Pieter, 1998]
[Figure: source databases (relational, application package, legacy, external) pass through data extraction, transformation, and load, supported by a data cleansing tool, a data modeling tool, and warehouse administration tools with central metadata. The central data warehouse (RDBMS) feeds mid-tier, architected data marts (RDBMS or MDB) with local metadata and metadata exchange, which are reached through end-user data access and analysis tools.]

Key issues in data warehousing: acquiring, storing, and using data
[Figure: relational, package, legacy, and external sources pass through a data cleaning tool and data staging into the enterprise data warehouse (RDBMS), which feeds data marts (RDBMS/ROLAP or MDB) used by end-user tools.]
- The loading capacity of the data warehouse and data marts is critical.
- The query output capacity of the data warehouse and data marts is critical.

ETL tools
- Remove unneeded data from the operational databases
- Unify and convert the names and definitions of data
- Compute summary data and derived data
- Estimate default values for missing data
- Reconcile changes in the definitions of source data

Three Data Warehouse Models
- Enterprise warehouse: collects all of the information about subjects spanning the entire organization
- Data mart: a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart.
  - Independent vs. dependent (sourced directly from the warehouse) data mart
- Virtual warehouse: a set of views over operational databases
  - Only some of the possible summary views may be materialized

Data Warehouse Development: A Recommended Approach
[Figure: define a high-level corporate data model, then refine the model along two paths, one toward data marts and distributed data marts, the other toward the enterprise data warehouse, both converging on a multi-tier data warehouse.]

OLAP Server Architectures
- Relational OLAP (ROLAP)
  - Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces
  - Includes optimization of the DBMS back end, implementation of aggregation navigation logic, and additional tools and services
  - Greater scalability
- Multidimensional OLAP (MOLAP)
  - Array-based multidimensional storage engine (sparse matrix techniques)
  - Fast indexing to pre-computed summarized data
- Hybrid OLAP (HOLAP)
  - User flexibility, e.g., low level: relational, high level: array
- Specialized SQL servers
  - Specialized support for SQL queries over star/snowflake schemas

Efficient Data Cube Computation
- A data cube can be viewed as a lattice of cuboids
  - The bottom-most cuboid is the base cuboid
  - The top-most cuboid (apex) contains only one cell
  - How many cuboids in an n-dimensional cube with L levels?
- Materialization of data cube
  - Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization)
  - Selection of which cuboids to materialize: based on size, sharing, access frequency, etc.

Cube Operation
- Cube definition and computation in DMQL:
    define cube sales [item, city, year]: sum(sales_in_dollars)
    compute cube sales
- Transform it into a SQL-like language (with a new operator cube by, introduced by Gray et al. '96):
    SELECT item, city, year, SUM(amount)
    FROM SALES
    CUBE BY item, city, year
- This requires computing the following group-bys:
    (item, city, year), (item, city), (item, year), (city, year), (item), (city), (year), ()
[Figure: the corresponding lattice, from () at the top, through (item), (city), (year) and (city, item), (city, year), (item, year), down to (city, item, year).]

Cube Computation: ROLAP-Based Method
- Efficient cube computation methods
  - ROLAP-based cubing algorithms (Agarwal et al. '96)
  - Array-based cubing algorithm (Zhao et al. '97)
  - Bottom-up computation method (Beyer & Ramakrishnan '99)
- ROLAP-based cubing algorithms
  - Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples
  - Grouping is performed on some subaggregates as a "partial grouping step"
  - Aggregates may be computed from previously computed aggregates, rather than from the base fact table

Cube Computation: ROLAP-Based Method (2)
This is not in the textbook but in a research paper.
Hash/sort based methods (Agarwal et al., VLDB '96):
- Smallest-parent: computing a cuboid from the smallest previously computed cuboid
- Cache-results: caching the results of a cuboid from which other cuboids are computed, to reduce disk I/Os
- Amortize-scans: computing as many cuboids as possible at the same time to amortize disk reads
- Share-sorts: sharing sorting costs across multiple cuboids when a sort-based method is used
- Share-partitions: sharing the partitioning cost across multiple cuboids when hash-based algorithms are used

Multi-way Array Aggregation for Cube Computation
- Partition arrays into chunks (a chunk is a small subcube which fits in memory).
- Compressed sparse array addressing: (chunk_id, offset)
- Compute aggregates in "multiway" by visiting cube cells in the order which minimizes the number of times each cell is visited, and reduces memory access and storage cost.
- What is the best traversing order to do multi-way aggregation?
[Figure: a 3-D array with dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 numbered chunks.]

Multi-Way Array Aggregation for Cube Computation (Cont.)
- Method: the planes should be sorted and computed according to their size in ascending order.
  - See the details of Example 2.12 (pp. 75-78)
  - Idea: keep the smallest plane in main memory; fetch and compute only one chunk at a time for the largest plane
- Limitation of the method: it computes well only for a small number of dimensions
  - If there are a large number of dimensions, "bottom-up computation" and iceberg cube computation methods can be explored

Indexing OLAP Data: Bitmap Index
- Index on a particular column
- Each value in the column has a bit vector: bit operations are fast
- The length of the bit vector: number of records in the base table
- The i-th bit is set if the i-th row of the base table has the value for the indexed column
- Not suitable for high-cardinality domains
[Figure: a base table with its bitmap indices on Region and on Type.]
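
The bitmap index described above can be sketched with plain Python integers used as bit vectors. This is a minimal illustration, not a production index: the base table contents and the helper names `build_bitmap_index` and `matching_rows` are assumptions made for the example.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose i-th bit is set
    when row i of the base table holds that value."""
    index = {}
    for i, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << i)
    return index

def matching_rows(bitmap):
    """Decode a bit vector back into the list of row positions it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical base table with two low-cardinality columns.
base_table = [
    {"cust": "C1", "region": "Asia", "type": "Retail"},
    {"cust": "C2", "region": "Europe", "type": "Dealer"},
    {"cust": "C3", "region": "Asia", "type": "Dealer"},
    {"cust": "C4", "region": "America", "type": "Retail"},
    {"cust": "C5", "region": "Europe", "type": "Dealer"},
]

region_index = build_bitmap_index(base_table, "region")
type_index = build_bitmap_index(base_table, "type")

# region = 'Europe' AND type = 'Dealer' becomes one bitwise AND.
europe_dealers = matching_rows(region_index["Europe"] & type_index["Dealer"])

# region = 'Asia' OR region = 'America' becomes one bitwise OR.
asia_or_america = matching_rows(region_index["Asia"] | region_index["America"])

print(europe_dealers, asia_or_america)
```

Each distinct value needs its own bit vector as long as the table, which is why the approach suits low-cardinality columns such as region or type and becomes wasteful for high-cardinality ones.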

Indexing OLAP Data: Join Indices
- Join index: JI(R-id, S-id) where R(R-id, …) ⋈ S(S-id, …)
- Traditional indices map the values to a list of record ids
  - A join index materializes the relational join in the JI file and speeds up the relational join, a rather costly operation
- In data warehouses, a join index relates the values of the dimensions of a star schema to rows in the fact table.
  - E.g., fact table Sales and two dimensions, city and product
    - A join index on city maintains, for each distinct city, a list of R-IDs of the tuples recording the sales in that city
  - Join indices can span multiple dimensions

Efficient Processing of OLAP Queries
- Determine which operations should be performed on the available cuboids: transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection
- Determine to which materialized cuboid(s) the relevant operations should be applied
- Explore indexing structures and compressed vs. dense array structures in MOLAP

Metadata Repository
Metadata is the data defining warehouse objects. It has the following kinds:
- Description of the structure of the warehouse: schema, view, dimensions, hierarchies, derived data definitions, data mart locations and contents
- Operational metadata: data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)
- The algorithms used for summarization
- The mapping from the operational environment to the data warehouse
- Data related to system performance: warehouse schema, view, and derived data definitions
- Business data: business terms and definitions, ownership of data, charging policies

Data Warehouse Back-End Tools and Utilities
- Data extraction: get data from multiple, heterogeneous, and external sources
- Data cleaning: detect errors in the data and rectify them when possible
- Data transformation: convert data from legacy or host format to warehouse format
- Load: sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
- Refresh: propagate the updates from the data sources to the warehouse
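
The back-end steps listed above (extraction, cleaning, transformation, load, refresh) can be sketched as a chain of small functions. Everything specific here is hypothetical: the two sources named `pos` and `web`, their field names and units, and the in-memory dictionary standing in for the warehouse. Real tools do far more at each step (for example, rectifying errors rather than only dropping bad records).

```python
def extract(sources):
    """Data extraction: collect raw records from every source, tagged with the source name."""
    return [(name, rec) for name, recs in sources.items() for rec in recs]

def clean(tagged):
    """Data cleaning: detect records with a missing amount and drop them."""
    kept = []
    for name, rec in tagged:
        amount = rec.get("amount_usd") if name == "pos" else rec.get("amount_cents")
        if amount not in (None, ""):
            kept.append((name, rec))
    return kept

def transform(tagged):
    """Data transformation: convert each source format to one warehouse format
    with a common key name (customer_id) and a common unit (dollars)."""
    records = []
    for name, rec in tagged:
        if name == "pos":
            records.append({"customer_id": rec["cust"], "dollars": float(rec["amount_usd"])})
        else:
            records.append({"customer_id": rec["customer_id"], "dollars": rec["amount_cents"] / 100})
    return records

def load(warehouse, records):
    """Load: append the records and recompute a summary (total dollars per customer)."""
    warehouse["facts"].extend(records)
    totals = {}
    for rec in warehouse["facts"]:
        totals[rec["customer_id"]] = totals.get(rec["customer_id"], 0.0) + rec["dollars"]
    warehouse["totals_by_customer"] = totals

warehouse = {"facts": [], "totals_by_customer": {}}

# Initial load from two heterogeneous sources with inconsistent names and units.
initial_sources = {
    "pos": [{"cust": "C1", "amount_usd": "120.50"}, {"cust": "C2", "amount_usd": ""}],
    "web": [{"customer_id": "C1", "amount_cents": 9950}],
}
load(warehouse, transform(clean(extract(initial_sources))))

# Refresh: propagate later updates from the sources through the same steps.
new_sources = {"web": [{"customer_id": "C3", "amount_cents": 500}]}
load(warehouse, transform(clean(extract(new_sources))))

print(warehouse["totals_by_customer"])
```

Keeping each step a separate function mirrors the list above and makes it easy to see where naming and unit inconsistencies between sources are reconciled before anything reaches the warehouse.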

Discovery-Driven Exploration of Data Cubes
- Hypothesis-driven: exploration by the user, huge search space
- Discovery-driven (Sarawagi et al. '98)
  - Pre-compute measures indicating exceptions, to guide the user in the data analysis at all levels of aggregation
  - Exception: a value significantly different from the value anticipated, based on a statistical model
  - Visual cues such as background color are used to reflect the degree of exception of each cell
  - Computation of exception indicato…