




Lesson 10 Data Warehouse Overview
Vocabulary · Important Sentences · Questions and Answers · Problems
The term “data warehouse” was coined by Bill Inmon in the early 1990s. He described it as an integrated collection of information that could help companies and organizations make better decisions.
To be effective, a data warehouse had to be integrated, subject oriented, non-volatile, and time variant. In this article, I will go over all these factors in detail. If you are building a data warehouse, it is important for you to understand why they matter.
Being subject oriented means that the data will provide information about a specific subject rather than information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data collected within the data warehouse can come from different sources but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse is associated with a given period of time.[1]
It is important that the information contained within a data warehouse is stable. While data can be added, it should never be deleted. This property is referred to as being non-volatile. When a company uses a data warehouse that is stable, it can get a better understanding of the operations within the company. Although these terms were first coined in the 1990s, they are still highly accurate today. However, it should be noted that some data warehouses are volatile. This is because many modern data warehouses deal with terabytes of data, and many companies are forced to delete some of their information after a certain period of time. For instance, some companies will systematically delete data that has reached three years of age.
Before a data warehouse can be built, the correct data must be located. Generally, the information that will be added to the warehouse will come from daily information or historical information. The historical information may be stored in a legacy system, where it is challenging to extract.
The design of the data warehouse is important as well. Designers must make sure the design is consistent with the queries that will be run against the warehouse, and to do this successfully they need to understand the database schema. It is crucial to make sure the data warehouse is designed correctly, as it is difficult to recreate some forms of data. Another important aspect of data warehouses is data acquisition. Data acquisition can be defined as transferring data from a source to the warehouse. It is one of the most expensive parts of building a data warehouse. This process will often be conducted with an ETL (Extract, Transform, Load) tool.
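To make the three ETL stages concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `sales` table, the in-memory source rows, and the `CANONICAL_NAMES` alias table are all hypothetical, and a commercial ETL tool does far more than this.

```python
import sqlite3

# Hypothetical alias table used by the transform step (an assumption for this sketch).
CANONICAL_NAMES = {"ms": "Microsoft", "microsoft": "Microsoft"}

def extract(source_rows):
    """Extract: read raw records from a source system (here, an in-memory list)."""
    return list(source_rows)

def transform(rows):
    """Transform: trim whitespace, unify vendor names, convert amounts to float."""
    cleaned = []
    for vendor, amount in rows:
        key = vendor.strip().lower()
        cleaned.append((CANONICAL_NAMES.get(key, vendor.strip()), float(amount)))
    return cleaned

def load(conn, rows):
    """Load: append the cleaned rows to the (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (vendor TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def run_etl(conn, source_rows):
    """Run the three stages in order."""
    load(conn, transform(extract(source_rows)))

conn = sqlite3.connect(":memory:")
run_etl(conn, [(" MS ", "100.5"), ("Microsoft", "20"), ("Oracle", "7")])
totals = conn.execute(
    "SELECT vendor, SUM(amount) FROM sales GROUP BY vendor ORDER BY vendor"
).fetchall()
print(totals)
```

Keeping the stages as separate functions means the same load step can be reused when the acquisition process is repeated for later batches.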
As of this time, there are just over 50 ETL tools being sold. It may cost a company millions of dollars to transfer data from sources to the warehouse. Once the initial data has been transferred to the data warehouse, the process must be repeated consistently. Data acquisition is a continuous process, and the goal of a company is to make sure the warehouse is updated on a regular basis. When the warehouse is updated, it is often hard to determine which information in the source has changed since the previous update. The process of dealing with this issue is called changed data capture (more commonly, change data capture). This process has become a separate field, and there are a number of products currently being sold to deal with it.
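One simple way to find what has changed since the previous update is to compare a fingerprint of every source row against the fingerprints saved last time. The sketch below assumes that the first column of each row is its primary key and that the rows fit in memory; both are assumptions made for illustration. Commercial products often read the database transaction log instead of comparing snapshots.

```python
import hashlib

def row_fingerprint(row):
    """Hash a row's values so a change can be detected with one comparison."""
    joined = "|".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def snapshot(rows):
    """Map each primary key (first column, by assumption) to its row fingerprint."""
    return {row[0]: row_fingerprint(row) for row in rows}

def capture_changes(previous, current_rows):
    """Return the keys inserted, updated, and deleted since the previous snapshot."""
    current = snapshot(current_rows)
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"inserted": inserted, "updated": updated, "deleted": deleted}, current

old_rows = [(1, "Alice", 10), (2, "Bob", 20), (3, "Cara", 30)]
new_rows = [(1, "Alice", 10), (2, "Bob", 25), (4, "Dan", 40)]
changes, latest = capture_changes(snapshot(old_rows), new_rows)
print(changes)
```

The second return value, `latest`, is the snapshot to store for the next update cycle.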
It is important for data to be cleaned before it is placed in the warehouse. The data cleansing process is usually done during the data acquisition phase. Any data that is placed in a warehouse before being cleaned poses a danger to the system and cannot be used. This is because uncleaned data may be incorrect, and a company may make incorrect decisions based on it, which could lead to a number of problems. For example, all the information within a data warehouse that means the same thing must be stored in the same form. If there is information that reads “MS” and “Microsoft”, even though they mean the same thing, only one of them can be used to recognize the element within the data warehouse.
1 Data Warehouse Tools
There are a number of important tools connected to data warehouses, and one of these is data aggregation. A data warehouse can be designed to store information at a certain level of detail. For example, you can store data for each transaction, or you can store it as a summary. These are examples of data aggregation. When data is summarized, queries will run much faster. However, some of the detail is lost in the summary, and this information may be important for solving a certain problem.
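The trade-off between detail and summary can be seen in a few lines of Python. The transaction rows below are hypothetical sample data, and the sketch only illustrates the idea; it does not describe how any particular warehouse stores aggregates.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (day, product, amount).
transactions = [
    ("2024-01-01", "widget", 5.0),
    ("2024-01-01", "gadget", 7.0),
    ("2024-01-02", "widget", 3.0),
]

def summarize_by_day(rows):
    """Aggregate the transactions into one total per day."""
    totals = defaultdict(float)
    for day, _product, amount in rows:
        totals[day] += amount
    return dict(totals)

daily_summary = summarize_by_day(transactions)
print(daily_summary)

# The detailed rows can still answer a per-product question...
widget_total = sum(amount for _day, product, amount in transactions if product == "widget")
print(widget_total)
# ...but the daily summary no longer carries the product column at all.
```

A query against `daily_summary` touches two entries instead of three rows, which is the speed gain at small scale, but the per-product question can only be answered from the detailed rows.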
Before you decide which one you will use, it is important to weigh your options carefully. Once you have carried out an operation, you will need to rebuild the warehouse in order to undo it. The best way to handle this situation is to make sure the data warehouse is constructed with a large amount of detail. However, the cost of this can be huge, depending on the storage options you choose. Once you have filled your data warehouse with important information, you will want to use this data to help you make smart investment decisions. The tools that allow you to do this fall under a topic called business intelligence.
Business intelligence is a very diverse field. It comprises things such as Executive Information Systems and Decision Support Systems. Business intelligence can be further broken down into a field called multi-dimensional analysis tools. These are tools that allow a user to view data from a wide variety of angles. A query tool will allow a user to send SQL queries to a warehouse to look for results. Data mining is also a field that falls under business intelligence, and it will allow you to look for patterns and relationships within a data warehouse.
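Viewing the same data from several angles usually comes down to grouping it by different columns. The following sketch sends SQL queries to an in-memory SQLite database standing in for the warehouse; the `sales` table, its columns, and the `total_by` helper are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "widget", "Q1", 10.0),
        ("North", "gadget", "Q1", 4.0),
        ("South", "widget", "Q1", 6.0),
        ("South", "widget", "Q2", 8.0),
    ],
)

ALLOWED_DIMENSIONS = {"region", "product", "quarter"}

def total_by(conn, dimension):
    """Sum sales along one dimension; the column name is checked against a whitelist."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()

by_region = total_by(conn, "region")
by_product = total_by(conn, "product")
print(by_region)
print(by_product)
```

The whitelist is there because column names cannot be passed as SQL parameters, so the dimension is validated before it is placed in the query text.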
Another tool that is connected to data warehouses is data visualization. The tools that are used for data visualization present visual models of data. This data could come in the form of intricate 3D images. The goal of data visualization is to allow the user to view trends in a way that is easier to understand than complicated models based on statistics. One tool that is allowing this field to advance is VRML, or Virtual Reality Modeling Language. In order for data warehouses to function properly, it is also important to place an emphasis on metadata management. Metadata can be described as “information about information”.
Metadata must be managed when data is acquired or analyzed. Metadata will be held in a repository, and it can give you important information about many of the data warehouse tools. The process of properly managing metadata has become a science in itself. If it is done properly, the company can greatly benefit. It is important because it allows organizations to analyze the changes that occur within database tables. This is a tool that plays an important part in the construction of a data warehouse.
Data warehousing is a somewhat complicated field. There are many vendors attempting to advertise the tools, but the cost and complexity of the products have kept them from being used by a large number of companies. Any company that is thinking of using data warehouses must make sure it has taken the time to review and understand the technology. It can only be useful if you know how to use it. Once you understand and acquire the technology, it is possible for you to gain a powerful advantage over your competitors. This has made data warehouses attractive to many companies.
One of the biggest advantages of data warehouses is that they allow you to store information that you can use to improve the marketing strategies of your company. Not only can you improve the marketing strategies, but you will also be able to make strategic decisions based on the information you have compiled and organized. With techniques such as data mining and data visualization, you will be able to discover important patterns that you didn’t know existed. The patterns that you discover can allow your company to earn large profits.
2 Data Warehousing Methods
Most organizations agree that data warehouses are a useful tool. They benefit from the ability to store and analyze data, and this can allow them to make sound business decisions. It is also important for them to make sure the correct information is published, and it should be easy to access by the people who are responsible for making decisions.
There are two elements that make up the data warehouse environment, and these are presentation and staging. The staging area is also known as the acquisition area. It is composed of ETL operations, and once the data has been prepared, it will be sent to the presentation area.
When the data is placed within the presentation area, a number of programs will analyze and review it. While many organizations agree on the overall goal of data warehouses, the approaches to building them may differ. Attempting to use data marts alone is not a good approach, because they are geared towards departments. In addition to this, attempting to use data marts alone will be inefficient, and you will run into a number of long-term problems. There are two techniques for building data warehouses that have become very popular. These are the Kimball Bus Architecture and the Corporate Information Factory.
With the Kimball technique, the raw data will be transformed and refined within the staging area. It is important to make sure the data is properly handled during this step. During the staging process, the raw data will be pulled from the source systems. While some of the staging processes may be centralized, others will be distributed. The presentation area will have a dimensional structure, and this model will hold the same information as a standard model. However, it will be easier to use, and it will display information that is summarized.
A dimensional model is created around a business operation. Departments within the organization do not play a role in this. The data will be populated once it is placed within the dimensional warehouse, and it is not dependent on the various departments that may compose an organization. When business processes have been developed within the warehouse, the system will become highly efficient. The next popular data warehouse approach that you will want to become familiar with is the Corporate Information Factory (CIF). Another name for this technique is the EDW approach. The data that is extracted from the source will be coordinated.
Within the CIF, a standard data warehouse is used to hold data repositories, and it may also have specific data warehouses which are designed for data mining. The data marts may be designed for specific departments, and they may have summary data in the form of a dimensional structure. The atomic data may be obtained from the standard data warehouse. While there are some similarities between these two techniques, there are some notable differences as well.
One of the primary differences between these two techniques is the normalized data foundation. With the Kimball approach, the data structures required before the dimensional presentation depend on the source data and the transformations involved. In most cases, the duplicate storage of data is not required in both dimensional and normalized foundations. Many of the people who choose to use a normalized data structure believe that it is faster than the dimensional structure, but they often fail to take ETL into consideration.
Another thing that separates the two data warehouse approaches is the management of atomic data. With the CIF, atomic data will be stored within a normalized data warehouse. In contrast, the Kimball method states that the atomic data should be placed within a dimensional structure. When the data is placed within a dimensional structure, it can be summarized in a wide variety of different ways.
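The following sketch shows how atomic data held in a dimensional structure can be summarized in more than one way. It uses SQLite from Python with one fact table and two dimension tables; the table names (`fact_sales`, `dim_date`, `dim_product`) and the sample rows are assumptions made for this example, not a schema prescribed by either approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- One fact row per atomic transaction, keyed to the dimension tables.
CREATE TABLE fact_sales (date_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO dim_date VALUES (1, '2024-01-15', '2024-01'), (2, '2024-02-03', '2024-02');
INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 2.5), (2, 1, 7.5);
""")

# Roll the atomic facts up by month through the date dimension.
by_month = conn.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.month ORDER BY d.month
""").fetchall()

# Roll the same facts up by category through the product dimension.
by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

print(by_month)
print(by_category)
```

Because the fact table keeps every transaction, a new summary only needs a different join and grouping; nothing has to be reloaded.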
It is important to make sure the information you have is detailed so that users will be able to ask relevant questions. While most users will not place an emphasis on the details of one atomic transaction, they may want a summary of a large number of transactions. It is important for them to have the details so that they will be able to answer important questions. The approach that you choose should be the one which best serves the needs of your company.
3 Data Warehouse Design Strategies
To build an effective data warehouse, it is important for you to understand data warehouse design principles. If your data warehouse is not built correctly, you can run into a number of different problems.
The proper methods for building a powerful data warehouse are based on information technology tactics. First off, it is important that you and your organization understand the importance of having a data warehouse. If workers feel that a data warehouse is unnecessary, they may not use it, and this could cause conflicts. Everyone in your organization should understand the importance of using the system.
After you have got your colleagues behind the concept of using a data warehouse, you will next want to focus on data integrity. You will want to avoid designing a data warehouse that will load data that is not consistent. It is also important to avoid creating a database that will replicate data. The goal of your organization should be to integrate data and create standards that will be used and followed. After data integrity, you will next want to look at implementation efficiency. This basically means that you will want to design a system that is simple to use. It doesn’t matter how well designed your data warehouse is if your workers have a hard time using it.
If your workers have a hard time using the data warehouse, it will slow down the speed and productivity of your operation. When it comes to creating a data warehouse, you will want to make it as simple as possible. All of your workers should be able to use it without problems. Implementation efficiency is a principle that naturally leads to the next topic you will want to focus on, and this is user friendliness. This is a concept that is an important part of your business, because end users will not utilize a program that is too difficult to use. It is important for you to keep them in mind. Use a design which is friendly and easy to learn.
Once you have designed a data warehouse that is user friendly, you will next want to look at operational efficiency. Once the data warehouse has been created, it should be able to carry out operations quickly. In addition to this, it should not have errors or other technical problems. When errors or technical problems do occur, they should be simple to fix. Another thing you will want to look at is the cost involved with supporting the system. You will want to keep these costs as low as possible.
The design principles that have been discussed in this article so far are more related to business than information technology. However, there are a number of IT design principles that you will want to follow. One of these is scalability. This is a problem that many data warehouse designers run into. The best way to deal with this issue is to create a data warehouse that is scalable from the beginning. Design it in a way which will allow it to support expansions or upgrades. You should be able to adapt it to a number of different business situations. The best data warehouses are those which are scalable.
The data warehouse that you design should fall under the guidelines of information technology standards. Every tool that you use to build your data warehouse should work well with IT standards. You will want to make sure it is designed in a way that makes it easier for your workers to use. While following the guidelines in this article will not guarantee success, it will greatly tip the odds in your favor. You should be wary of companies that promise you perfect results if you use their design methods.[2] No matter how well designed your data warehouse is, you will always run into problems. However, following the right principles will make the problems easier to recognize and solve.
When it comes to using a data warehouse, it is not a matter of “if” you will run into problems. It is a matter of “how” and “when”. When your data warehouse is well designed, you will be better equipped to solve any problems you encounter.
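Vocabulary
1. warehouse n. a storehouse; a depot.
2. go over: to be well received, to gain acceptance; to examine.
3. orient vt., vi. to familiarize, to adapt; to face toward; to determine a position or direction. n. the East, Asia.
4. variant n. a variant form; a variety. adj. different; varying; diverse.
5. specific adj. definite, precise, detailed; particular, distinctive; limited to something.
6. volatile adj. flying; evaporating readily; changeable, unstable; light-hearted; explosive. n. a winged creature; a volatile substance.
7. schema n. an outline, plan, diagram, or scheme.
8. acquisition n. the act of acquiring; something or someone acquired; a purchase.
9. aggregation n. a collection; clustering; integration; an assembled whole; a group.
10. strategy n. strategy, tactics, stratagem, plan of campaign; resourcefulness, skill. Example: strategy and tactics.
11. intricate adj. complicated, involved, hard to understand.
12. mart n. a market; a trading place.
13. repository n. a storehouse, a place of storage; a container; a museum; a learned person; a trusted person, a confidant.
14. staging n. the putting on or carrying out of something; configuration; a stage or level; transport in stages; grading.
15. populate vt. to inhabit, to settle people in; to migrate to; to colonize. Example: a densely (sparsely) populated city.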
Important Sentences
[1] Being subject oriented means that the data will provide information about a specific subject rather than information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data collected within the data warehouse can come from different sources but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse is associated with a given period of time.
In other words, “subject oriented” means that the data provides information about one specific subject rather than about how the company is run. Because a data warehouse is subject oriented, it lets you analyze the information related to a specific subject.