
Overview of Data Warehousing and Data Mining
Concepts, Architecture, Trends, Applications
Presenter: Zhu Jianqiu (朱建秋)

Outline
- Data warehouse concepts
- Data warehouse architecture and components
- Data warehouse design
- Data warehouse technology (differences from database technology)
- Data warehouse performance
- Data warehouse applications
- Overview of data mining applications
- Data mining techniques and trends
- Data mining application platform (project proposed to the Science and Technology Commission)

Data warehouse concepts
- Basic concepts
- Some misconceptions about data warehouses

Basic concepts: data warehouse
- "A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management's decisions." [Inmon, 1996]
- "A data warehouse is a set of methods, techniques, and tools that may be leveraged together to produce a vehicle that delivers data to end-users on an integrated platform." [Ladley, 1997]
- "A data warehouse is a process of creating, maintaining, and using a decision-support infrastructure." [Appleton, 1995] [Haley, 1997] [Gardner, 1998]

Basic concepts: data warehouse characteristics [Inmon, 1996]
- Subject-oriented
  - The tables of one subject area draw on several operational applications (e.g., the customer subject draws on order processing, accounts receivable, accounts payable, ...)
  - Typical subject areas: customer, product, transaction, account
  - A subject area is implemented as a set of related tables
  - Related tables are linked by a common key (e.g., the customer identifier CustomerID)
  - Every key carries a time element (from date to date; monthly cumulative; single date; ...)
  - Data within a subject can be stored on different media (summary level, detail level, multiple granularities)
- Integrated
  - Data extraction, cleansing, transformation, and loading
- Non-volatile
  - Data is added in batches; data already in the warehouse does not change
- Time-variant (time dimension)
- Supports management decisions

Basic concepts: Data Mart, ODS
- Data mart: a small data warehouse serving a department or workgroup.
- Operational Data Store (ODS): a collection of data that supports the enterprise's day-to-day, enterprise-wide applications. It is a new data environment distinct from the DB, a hybrid form obtained by extending the DW. Its four basic characteristics: subject-oriented, integrated, volatile, and current or near-current.

Basic concepts: ETL, metadata, granularity, partitioning
- ETL (Extract/Transformation/Load): tools for extracting, transforming, and loading data, e.g., Microsoft DTS, IBM Visual Warehouse, etc.
- Metadata: data about data, used to build, maintain, manage, and use the data warehouse. It is especially important in a data warehouse.
- Granularity: the level of detail or summarization of the data held in the warehouse's data units. The more detailed the data, the finer the granularity.
- Partitioning: data is spread across separate physical units that can be processed independently.

Some misconceptions about data warehouses
- Data warehouse and OLAP
  - Star data model
  - Multidimensional analysis
- A data warehouse is not a virtual concept
- Data warehouse and normalization theory
  - Denormalization is required

Data warehouse architecture and components
- Architecture
- ETL tools
- Metadata repository and metadata management
- Data access and analysis tools

Architecture [Pieter, 1998]
[Figure labels: Source Databases (Relational, Appl. Package, Legacy, External); Data Extraction, Transformation, Load (Data Cleansing Tool; Extract, Transform and Load; Data Modeling Tool; Warehouse Admin. Tools); Central Data Warehouse (RDBMS) with Central Metadata; Architected Data Marts (Mid-Tier, Data Mart, RDBMS, MDB) with Local Metadata; Metadata Exchange; Data Access and Analysis (End-User DW Tools)]

Architecture with an ODS
[Figure labels: Source Databases (Relational, Appl. Package, Legacy, External); Hub - Data Extraction, Transformation, Load (Data Cleansing Tool; Extract, Transform and Load; Data Modeling Tool; Warehouse Admin. Tools); Central Data Warehouse and ODS (Central Data Warehouse, ODS, OLTP Tools) with Central Metadata; Architected Data Marts (Mid-Tier RDBMS, Data Mart, MDB) with Local Metadata; Metadata Exchange; Data Access and Analysis (End-User DW Tools)]

The real-world environment: heterogeneity [Douglas Hackney, 2001]
[Figure labels: Custom Marketing Data Warehouse; Packaged Oracle Financial Data Warehouse; Packaged I2 Supply Chain; Non-Architected Data Mart; Subset Data Marts; Oracle Financials; i2 Supply Chain; Siebel CRM; 3rd Party; e-Commerce]

Federated data warehouse / data mart architecture
[Figure labels: Real Time ODS; Federated Financial Data Warehouse; Federated Marketing Data Warehouse; Federated Packaged I2 Supply Chain Data Marts; Subset Data Marts; Common Staging Area; Oracle Financials; i2 Supply Chain; Siebel CRM; 3rd Party; e-Commerce; Analytical Applications; Real Time Data Mining and Analytics; Real Time Segmentation, Classification, Qualification, Offerings, etc.]

Closed-loop federated BI architecture
[Figure labels: ETL tools & DW templates; Data profiling & reengineering tools; Demand-driven data acquisition & analysis; Metadata Interchange; Federated data warehouse and data mart systems; Decision engine models, rules and metrics; OLAP & data mining tools, Analysis templates; Analytic application development tools & components; Analytic applications; Front- and back-office OLTP; e-Business systems; External information providers; CRM Analytics & Reporting; Supply Chain Analytics & Reporting; EKP - Enterprise Knowledge Management Portal; EPM Analytics & Reporting; Financial Analytics & Reporting; HR Analytics & Reporting; Business information & recommendations; Informed decisions & actions]

Key data warehouse issues: acquiring, storing, and using data
[Figure labels: Relational, Package, Legacy, External source; Data Clean Tool; Data Staging; Enterprise Data Warehouse; Datamart (RDBMS, ROLAP); Datamart (RDBMS, MDB); End-User Tools]
- The loading capacity of the data warehouse and data marts is critical
- The query output capacity of the data warehouse and data marts is critical

ETL tools
- Remove unneeded data from the operational databases
- Unify and convert data names and definitions
- Compute summary data and derived data
- Estimate default values for missing data
- Reconcile changes in source data definitions

ETL tool architecture
[Figure: ETL tool architecture]

Metadata repository and metadata management
- Metadata categories: technical metadata; business metadata; data warehouse operational information. [Alex Berson et al., 1999]
- Technical metadata: information about the warehouse data used by warehouse designers and administrators to carry out warehouse development and management tasks. It includes:
  - Data source information
  - Transformation descriptions (the mapping from the operational databases to the data warehouse, and the algorithms used to transform the data)
  - Warehouse object and data structure definitions for the target data
  - Rules for data cleansing and data enhancement
  - Data mapping operations
  - Access authorization, backup history, archive history, information delivery history, data acquisition history, data access, etc.

Metadata repository and metadata management
- Business metadata: gives users information that is easy to understand, including:
  - Subject areas and information object types, including queries, reports, images, audio, video, etc.
  - Internet home pages
  - Other information supporting the data warehouse; for the information delivery system, for example, subscription information, scheduling information, details of delivery destinations, business query objects, etc.
- Data warehouse operational information: for example, data history (snapshots, versions), ownership, extraction audit trails, data usage

Metadata repository and metadata management
- Metadata repository and tools [Martin Stardt, 2000]

Data access and analysis tools
- Reporting
- OLAP
- Data mining

Data warehouse design
- Top-down
- Bottom-up
- Hybrid approach
- Data warehouse modeling

Top-down Approach
- Build the enterprise data warehouse
  - Common central data model
  - Data re-engineering performed once
  - Minimize redundancy and inconsistency
  - Detailed and history data; global data discovery
- Build data marts from the Enterprise Data Warehouse (EDW)
  - Subset of EDW relevant to department
  - Mostly summarized data
  - Direct dependency on EDW data availability
[Figure labels: Operational Data; External Data; Enterprise Warehouse; Local Data Mart; Local Data Mart]

Bottom-up design approach
- Create departmental data marts
  - Scope limited to one subject area
  - Fast ROI: local business needs are met
  - Departmental autonomy: flexibility in design
  - A good guide for other departments' data marts
  - Easy to replicate to other departments
  - Data re-engineering has to be done for each department
  - Some level of redundancy and inconsistency
  - A practical approach
- Expand to an enterprise data warehouse
  - Create the EDB as a long-term goal
[Figure labels: operational data (all); operational data (partial); operational data (partial); external data; local data mart; local data mart; enterprise data warehouse (EDB)]

Data warehouse modeling: star schema
Example of Star Schema
- Sales Fact Table
  - Keys: Date, Product, Store, Customer
  - Measurements: unit_sales, dollar_sales, Yen_sales
- Date: Date, Month, Year
- Cust: CustId, CustName, CustCity, CustCountry
- Product: ProductNo, ProdName, ProdDesc, Category, QOH
- Store: StoreID, City, State, Country, Region

Data warehouse modeling: snowflake schema
Example of Snowflake Schema
- Sales Fact Table
  - Keys: Date, Product, Store, Customer
  - Measurements: unit_sales, dollar_sales, Yen_sales
- Date: Date, Month
  - Month: Month, Year
  - Year: Year
- Cust: CustId, CustName, CustCity, CustCountry
- Product: ProductNo, ProdName, ProdDesc, Category, QOH
- Store: StoreID, City
  - City: City, State
  - State: State, Country
  - Country: Country, Region

[Figures: operational (OLTP) data source (sales database); star schema with time dimension and fact table; multidimensional model with fact measures (metrics) and time dimension; attributes of the time dimension]

Data warehouse technology [Inmon, 1996]
- Managing large amounts of data
  - The ability to manage large amounts of data
  - The ability to manage it well
- Managing multiple media (hierarchy)
  - Main memory, expanded memory, cache, DASD, optical disk, microfiche
- Monitoring data
  - Decide whether the data should be reorganized
  - Decide whether indexes are poorly built
  - Decide whether too much data has overflowed
  - Determine the remaining available space
- Acquiring and delivering data with a variety of technologies
  - Batch mode; online mode is not very useful
- Programmer/designer control over where data is placed (block/page)
- Parallel storage and management of data
- Metadata management

Data warehouse technology [Inmon, 1996]
- Data warehouse language interface
  - Can access a set of data at a time
  - Can access one record at a time
  - Supports one or more indexes
  - Has an SQL interface
- Efficient loading of data
- Efficient use of indexes
  - Bitmap methods, multilevel indexes, etc.
- Data compression
  - I/O resources are far scarcer than CPU resources, so the cost of decompression is not a major concern
- Compound keys (because data varies over time)
- Variable-length data
- Lock management (programmers can explicitly control the lock manager)
- Index-only processing (some requests can be served by looking at the index alone)
- Fast recovery

Data warehouse technology [Inmon, 1996]
- Other technical features, where traditional technology plays only a minor role
  - Transaction integrity, high-speed buffering, row/page-level locking, referential integrity, data views
- Differences between a traditional DBMS and a data warehouse DBMS
  - Design optimized for data warehousing and decision support
  - Manages more data: 10 GB / 100 GB / TB
  - A traditional DBMS suits record-level updates and provides locking (Lock), commit (Commit), checkpoints (CheckPoint), log processing (Log), deadlock handling (DeadLock), and rollback (Rollback)
  - Basic data management, e.g., block management: a traditional DBMS needs to reserve free space
  - Index differences: a traditional DBMS limits the number of indexes; a data warehouse DBMS does not
  - A general-purpose DBMS is physically optimized for transaction access, whereas a data warehouse is optimized for DSS access and analysis
- Changing DBMS technology
- Multidimensional DBMS and the data warehouse
  - The idea that a multidimensional DBMS is the database technology for the data warehouse is incorrect
  - A multidimensional DBMS (OLAP) is a technology; the data warehouse is an architectural foundation
- Dual levels of granularity (DASD/tape)

Data warehouse technology [Inmon, 1996]
- Metadata in the data warehouse environment
  - DSS analysts differ from IT professionals and need the help of metadata
  - The mapping between the operational environment and the data warehouse environment requires metadata
  - The data warehouse holds data spanning a long period, so metadata must record the data structures and definitions
- Context and content (context dimension)
  - Simple contextual information (data structures / encoding / naming conventions / metrics)
  - Complex contextual information (product definitions / market territories / pricing / packaging / organizational structure)
  - External contextual information (economic forecasts: inflation, finance, taxation / political information / competitive information / technological advances)
- Refreshing the data warehouse
  - Data replication (triggers)
  - Change data capture (CDC) (logs)

Data warehouse performance [Inmon, 1999]
- Usage
- Data
- Platform
- Service management
Source: "Data Warehouse Management" (《数据仓库管理》), translated by Wang Tianyou et al., Publishing House of Electronics Industry, May 2000

Data warehouse applications: survey of DW user counts
"DW systems with user counts in the 100-500 range or above will be the main segment for some time to come."
Survey of DW users
Source: Meta Group Survey, past year. Respondents: 3000+ users or prospective users.

Survey of DW data volume
Source: Meta Group Survey, past year. Respondents: 3000+ users or prospective users.

How Much?
- $3-6m for a mid-size company, less if smaller, more if larger
- $10m+ for large organizations, large data sets
- 10-50+% annual maintenance costs
- 33% Hardware / 33% Software / 33% Services

How Long?
- 2-4 years for 80/20 of the full system for a mid-size company
- 6-12 months for the initial iteration
- 3-6 months for subsequent iterations

How Risky?
- For EDW projects, 20% (Meta) to 70% (OTR, DWN) fail
- High failure rate for non-business-driven initiatives
- Very few systems meet the expectations of the business
- Failure is not due to technology but to "soft" issues
- Massive upside to successful projects (100% - 2000+% ROI)
- 99% politics - 1% technology

Overview of data mining applications
- Overview of data mining applications
- Data mining techniques and trends
- Data mining application platform

Overview of data mining applications
- Application share
- Data Mining Upsides
- Data Mining Downsides
- Data Mining Uses
- Data Mining Industry and Applications
- Data Mining Costs

Application share
[Figure: application share chart]

Data Mining Upsides
- Discovery of previously unknown relationships, trends, anomalies, etc.
- Powerful competitive weapon
- Automation of repetitive analysis
- Predictive capabilities

Data Mining Downsides
- Knowledge discovery technology is immature
- Long learning and tuning cycles for some technologies
- "Black box" technology minimizes confidence
- VLDB (Very Large Database) requirements

Data Mining Uses
- Discover anomalies, outliers and exceptions in process data
- Discover behavior and predict outcomes of customer relationships
  - Churn management
  - Target marketing (market of one)
  - Promotion management
- Fraud detection
- Pattern ID & matching (dark programs, science)

Data Mining Industry and Applications
- From research prototypes to data mining products, languages, and standards
  - IBM Intelligent Miner, SAS Enterprise Miner, SGI MineSet, Clementine, MS/SQL Server 2000, DBMiner, BlueMartini, MineIt, DigiMine, etc.
  - A few data mining languages and standards (esp. MS OLE DB for Data Mining)
- Application achievements in many domains
  - Market analysis, trend analysis, fraud detection, outlier analysis, Web mining, etc.

Data Mining Costs
- Desktop tools: $500 and up (MSFT coming in at a low price point)
- Server/MF based: $20,000 to $700,000+
- Must also add the cost of extensive consulting for high-end tools
- Don't forget long training and learning curve time
- An ongoing process, not task automation software

Data mining trends
- Historical review
- Confluence of multiple disciplines
- Classifying data mining from multiple perspectives
- Research progress in the last decade
- Trends in data mining
- Data mining and the standardization process

Historical review
- 1989 IJCAI Workshop on Knowledge Discovery in Databases
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- 1998 ACM SIGKDD, SIGKDD'1999-2001 conferences, and SIGKDD Explorations
- More conferences on data mining
  - PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc.

Data Mining: Confluence of Multiple Disciplines
[Figure labels: Data Mining (center); Database Technology; Statistics; Machine Learning (AI); Visualization; Information Science; Other Disciplines]

A Multi-Dimensional View of Data Mining

Research Progress in the Last Decade
- Multi-dimensional data analysis: data warehouse and OLAP (on-line analytical processing)
- Association, correlation, and causality analysis
- Classification: scalability and new approaches
- Clustering and outlier analysis
- Sequential patterns and time-series analysis
- Similarity analysis: curves, trends, images, texts, etc.
- Text mining, Web mining and Weblog analysis
- Spatial, multimedia, scientific data analysis
- Data preprocessing and database compression
- Data visualization and visual data mining
- Many others, e.g., collaborative filtering

Research Directions [Han J. W., 2001]
- Web mining
- Towards integrated data mining environments and tools
- "Vertical" (or application-specific) data mining
- Invisible data mining
- Towards intelligent, efficient, and scalable data mining methods

Towards Integrated Data Mining Environments and Tools
- OLAP Mining: Integration of Data Warehousing and Data Mining
- Querying and Mining: An Integrated Information Analysis Environment
- Basic Mining Operations and Mining Query Optimization
- "Vertical" (or application-specific) data mining
- Invisible data mining

Querying and Mining: An Integrated Information Analysis Environment
- Data mining as a component of a DBMS, data warehouse, or Web information system
  - Integrated information processing environment
    - MS/SQL Server-2000 (Analysis service)
    - IBM Intelligent Miner on DB2
    - SAS Enterprise Miner: data warehousing + mining
- Query-based mining
  - Querying database/DW/Web knowledge
  - Efficiency and flexibility: preprocessing, on-line processing, optimization, integration, etc.

"Vertical" Data Mining
- Generic data mining tools? Too simple to match domain-specific, sophisticated applications
  - Expert knowledge and business logic represent many years of work in their own fields!
  - Data mining + business logic + domain experts
- A multi-dimensional view of data miners
  - Complexity of data: Web, sequence, spatial, multimedia, ...
  - Complexity of domains: DNA, astronomy, market, telecom, ...
- Domain-specific data mining tools
  - Provide concrete, killer solutions to specific problems
  - Feedback to build more powerful tools

Invisible Data Mining
- Build mining functions into daily information services
  - Web search engines (link analysis, authoritative pages, user profiles), adaptive web sites, etc.
  - Improvement of query processing: history + data
  - Making services smart and efficient
- Benefits from/to data mining research
  - Data mining research has produced many scalable, efficient, novel mining solutions
  - Applications feed new challenge problems to research

Towards Intelligent Tools for Data Mining
- Integration paves the way to intelligent mining
- Smart interfaces bring intelligence
  - Easy to use, understand and manipulate
  - One picture may be worth 1,000 words
  - Visual and audio data mining
- Human-Centered Data Mining
- Towards self-tuning, self-managing, self-triggering data mining

Integrated Mining: A Booster for Intelligent Mining
- Integration paves the way to intelligent mining
- Data mining integrates with DBMS, DW, WebDB, etc.
  - Integration inherits the power of up-to-date information technology: querying, MD analysis, similarity search, etc.
  - Mining can be viewed as querying database knowledge
  - Integration leads to standard interfaces/languages, function/process standardization, utility, and reachability
- Efficiency and scalability bring intelligent mining to reality

Data mining and the standardization process
- CRISP-DM (CRoss-Industry Standard Process for Data Mining): process standardization
- XML: combined with data preprocessing
- SOAP (Simple Object Access Protocol): a standard for interoperability between databases and systems
- PMML: a standard for exchanging predictive models
- OLE DB for Data Mining: an API-based interface for data mining systems

Data mining application platform
- Final project goals
- Research content (including system architecture, layers, etc.)
- Technical approach and implementation methods
- Analysis of key technologies
- Form of deliverables and assessment
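
Supplementary sketch: querying a star schema

The star schema example in the data warehouse modeling slides (a sales fact table surrounded by date, product, store, and customer dimensions) can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration under stated assumptions: the table names (date_dim, sales_fact, ...), the *_key column names, and all sample rows are hypothetical and are not taken from the slides. Only the overall shape (fact table keys plus the unit_sales and dollar_sales measurements) follows the slide.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory star schema with a few hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: one row per date, product, store, customer.
        CREATE TABLE date_dim    (date_key TEXT PRIMARY KEY, month TEXT, year INTEGER);
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
        CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
        CREATE TABLE cust_dim    (cust_key INTEGER PRIMARY KEY, cust_name TEXT, cust_city TEXT);
        -- Fact table: foreign keys to every dimension plus the measurements.
        CREATE TABLE sales_fact (
            date_key     TEXT    REFERENCES date_dim(date_key),
            product_key  INTEGER REFERENCES product_dim(product_key),
            store_key    INTEGER REFERENCES store_dim(store_key),
            cust_key     INTEGER REFERENCES cust_dim(cust_key),
            unit_sales   INTEGER,
            dollar_sales REAL
        );
    """)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
        ("2001-01-15", "2001-01", 2001),
        ("2001-01-20", "2001-01", 2001),
        ("2001-02-03", "2001-02", 2001),
    ])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
        (1, "Tea", "Beverage"),
        (2, "Rice", "Grocery"),
    ])
    conn.execute("INSERT INTO store_dim VALUES (10, 'Shanghai', 'China')")
    conn.execute("INSERT INTO cust_dim VALUES (100, 'Customer A', 'Shanghai')")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
        ("2001-01-15", 1, 10, 100, 5, 50.0),
        ("2001-01-20", 1, 10, 100, 3, 30.0),
        ("2001-01-20", 2, 10, 100, 2, 40.0),
        ("2001-02-03", 2, 10, 100, 4, 80.0),
    ])
    return conn


def sales_by_month_and_category(conn):
    """Join the fact table to two dimensions and aggregate the measurements."""
    return conn.execute("""
        SELECT d.month, p.category, SUM(f.unit_sales), SUM(f.dollar_sales)
        FROM sales_fact AS f
        JOIN date_dim    AS d ON f.date_key = d.date_key
        JOIN product_dim AS p ON f.product_key = p.product_key
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
    """).fetchall()


if __name__ == "__main__":
    for row in sales_by_month_and_category(build_demo_warehouse()):
        print(row)
```

Each analysis joins the fact table directly to the dimensions it needs and groups by their attributes. In a snowflake variant, month and year would move into their own tables and the same query would need one more join per normalized level.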
