High-Dimensional OLAP: A Minimal Cubing Approach

Purpose
- How can cubing be done efficiently in high-dimensional data warehouses?
- This paper proposes a novel method that computes a thin layer of the data cube together with associated value-list indices.

Introduction
- The data cube has played an essential role in the implementation of fast OLAP operations.
- Many efficient cube computation algorithms have been proposed:
  - Multiway array aggregation
  - BUC
  - H-Cubing
  - Star-Cubing

Introduction (cont.)
- A traditional data warehouse may have about 10 dimensions but more than $10^9$ tuples.
- In bioinformatics and text processing, data are high in dimensionality (over 100 or even 1,000 dimensions) but only medium in size, e.g., around $10^6$ tuples.
- Existing methods are too costly in computation time and storage space for high-dimensional OLAP.

Introduction (cont.)
- A new method, called shell fragments:
  - Vertically partition a high-dimensional data set into a set of disjoint low-dimensional data sets.
  - For each fragment, compute its local data cube offline.
  - At query time, assemble these fragments online.

Analysis: Curse of Dimensionality
- A high-dimensional data cube requires massive memory and disk space.
- Current algorithms are unable to materialize the full cube under such conditions.

Iceberg Cube
- Compute only the cuboid cells whose count or other aggregate satisfies a condition such as HAVING COUNT(*) >= minsup.
- Motivation:
  - Only a small portion of the cube cells may be "above the water" in a sparse cube.
  - Only "interesting" data is calculated, that is, data above a certain threshold.

Problems of the Iceberg Cube
- First, if a high-dimensional cell has support that already passes the iceberg threshold, it cannot be pruned by the iceberg condition and will still generate a huge number of cells.
  - A base cuboid cell $(a_1, a_2, \ldots, a_{60}) : 5$ (i.e., with count 5) will still generate $2^{60}$ iceberg cube cells.

Problems of the Iceberg Cube (cont.)
- Second, it is difficult to set an appropriate iceberg threshold. Too low a threshold will still generate a huge cube, but too high a threshold may invalidate many useful applications.
- Third, an iceberg cube cannot be incrementally updated.
- The same problems arise in the Dwarf and quotient cube.

I/O Problem
- Accessing a fully materialized data cube incurs substantial I/O overhead.
- Cuboids are stored on disk in some fixed order, and that order might be incompatible with a particular query.

Current Partial Solution
- Compute a thin cube shell: for example, the cuboids with 3 dimensions or fewer in a 60-dimensional cube.
- Many problems remain:
  - A large number of cuboids must still be computed.
  - OLAP on 4 or more dimensions is not supported.
  - Drilling cannot be supported.

Computation Model
- A semi-online computation model with certain pre-processing.
- Observation: an OLAP query tends to
  - ignore many dimensions (i.e., treat them as irrelevant),
  - fix some dimensions (e.g., using query constants as instantiations),
  - leave only a few to be manipulated (for drilling, pivoting, etc.).

OLAP Operations

Precomputation of Shell Fragments

Inverted Index
- Lemma 1: The inverted index table uses the same amount of storage space as the original database.

Shell Fragments
- All the dimensions of a data set are partitioned into independent groups, called fragments.
- For each fragment, we compute the complete local data cube while retaining the inverted indices.
- For dimensions $(A_1, \ldots, A_{60})$, fragments of size 3 yield 140 cuboids, whereas a cube shell of size 3 has 36,050 cuboids.

Example
- Fragments: (A, B, C) and (D, E).
- For each fragment, we compute the complete data cube by intersecting the tid-lists, e.g., for the cell (a1, b2, *).
- Table: cuboid DE

- Lemma 2: Given a database of $T$ tuples and $D$ dimensions, the amount of memory needed to store the shell fragments of size $F$ is $O(T \lceil D/F \rceil (2^F - 1))$.

Computing Other Measures
- Sum, average
- ID_Measure array

Algorithm for Shell Fragment Computation

Online Query Computation
- Point query
  - Seeks a specific cuboid cell in the original data space.
  - In an $n$-dimensional data cube $(A_1, A_2, \ldots, A_n)$, a point query has the form $(a_1, a_2, \ldots, a_n : M)$, where $M$ is the inquired measure.
  - For dimensions that are irrelevant or aggregated, use * as the value.
- Subcube query
  - Seeks a set of cuboid cells in the original data space.
  - At least one of the relevant dimensions in the query is inquired, marked with ?.
  - Example: <a2; ?; c1; *; ?: count()>

Query Processing
- A query has the form <a1; a2; ...; an : M>. Each ai takes one of 3 possible values: an instantiated value, aggregate (*), or inquire (?).
- Steps for instantiated dimensions:
  - Gather all the instantiated ai's, if there are any.
  - Examine the shell fragment partitions to check which ai's are in the same fragments.
  - Retrieve the tid-lists.
  - Intersect the obtained tid-lists to derive the instantiated base table.
  - If there are no inquired dimensions, stop; otherwise continue with the steps below.
- Steps for inquired dimensions:
  - For each inquired dimension, retrieve all its possible values and their associated tid-lists.
  - Intersect them with the instantiated base table to form the local base cuboid of the inquired and instantiated dimensions.
  - Any cubing algorithm can be employed to compute the local data cube.

Shell Fragment Grouping & Size
- Grouping: domain-specific knowledge can be used for better grouping.
- Size (F): if F is too small, the space required to store the fragment cubes will be small, but the time needed to compute queries online will be long.
- Suggested range: 2 <= F <= 4.

Bottom-Up Computation (BUC)
- BUC (Beyer & Ramakrishnan, SIGMOD'99)
- Bottom-up vs. top-down? It depends on how you view it.
- Apriori property:
  - Aggregate the data, then move to the next level.
  - If minsup is not met, stop.
- If minsup = 1, the full cube is computed.

Partitioning
- Usually, the entire data set cannot fit in main memory.
- Sort distinct values and partition them into blocks that fit.
- Continue processing.

Optimizations
- Partiti
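
The shell-fragment precomputation and the instantiated-dimension query steps from the earlier slides can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: it assumes a hypothetical five-tuple table, count as the only measure, and in-memory Python sets as tid-lists. All function and variable names are made up for this sketch. The last two helpers simply re-derive the cuboid counts quoted on the Shell Fragments slide (140 versus 36,050).

```python
from itertools import combinations
from math import ceil, comb

# Hypothetical toy table (an assumption for illustration); tids start at 1.
ROWS = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": "e1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2", "E": "e1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d1", "E": "e2"},
    {"A": "a2", "B": "b1", "C": "c1", "D": "d1", "E": "e2"},
    {"A": "a2", "B": "b1", "C": "c1", "D": "d1", "E": "e3"},
]
FRAGMENTS = [("A", "B", "C"), ("D", "E")]


def build_inverted_index(rows, dims):
    """Map each (dimension, value) pair to its tid-list (a set of tuple IDs)."""
    index = {}
    for tid, row in enumerate(rows, start=1):
        for dim in dims:
            index.setdefault((dim, row[dim]), set()).add(tid)
    return index


def compute_fragment_cube(rows, fragment):
    """Materialize all non-empty cells of every cuboid in one fragment.

    Each cell is keyed by a tuple of (dimension, value) pairs and stores the
    intersection of the tid-lists of its values.
    """
    index = build_inverted_index(rows, fragment)
    cube = {}
    for k in range(1, len(fragment) + 1):
        for dims in combinations(fragment, k):
            for row in rows:
                cell = tuple((d, row[d]) for d in dims)
                if cell not in cube:
                    cube[cell] = set.intersection(*(index[p] for p in cell))
    return cube


def point_query(fragment_cubes, fragments, query):
    """Return count() for a point query; dimensions missing from `query` are '*'."""
    if not query:
        raise ValueError("this sketch needs at least one instantiated dimension")
    tids = None
    for fragment, cube in zip(fragments, fragment_cubes):
        dims = tuple(d for d in fragment if d in query)
        if not dims:
            continue  # no instantiated dimension falls in this fragment
        cell_tids = cube.get(tuple((d, query[d]) for d in dims), set())
        tids = cell_tids if tids is None else tids & cell_tids
    return len(tids)


def fragment_cuboids(num_dims, frag_size):
    """Cuboids materialized by shell fragments (assumes frag_size divides num_dims)."""
    return ceil(num_dims / frag_size) * (2 ** frag_size - 1)


def shell_cuboids(num_dims, shell_size):
    """Cuboids in a cube shell holding all cuboids of at most shell_size dimensions."""
    return sum(comb(num_dims, k) for k in range(1, shell_size + 1))


CUBES = [compute_fragment_cube(ROWS, f) for f in FRAGMENTS]
print(CUBES[0][(("A", "a1"), ("B", "b2"))])                      # tid-list of (a1, b2, *)
print(point_query(CUBES, FRAGMENTS, {"A": "a1", "B": "b2", "D": "d1"}))
print(fragment_cuboids(60, 3), shell_cuboids(60, 3))
```

The query for a1, b2, and d1 touches both fragments: the tid-list of cell (a1, b2) comes from the (A, B, C) fragment cube, the tid-list of d1 from the (D, E) fragment cube, and their intersection gives the count. Inquired (?) dimensions are left out of this sketch; they would need a local cube computed over the intersected base table.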