Serengeti: Virtualize Your Big Data Applications
藺永華 (Lin Yonghua), VMware, Inc.
2009 VMware Inc. All rights reserved.

Agenda
- Today's big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today's Big Data System
[Diagram: ETL; Unstructured Data (HDFS); Real-Time Structured Database; Big SQL; Data Parallel Batch Processing; Real-Time Streams; Real-Time Processing (s4, storm); Analytics]

Challenges of Using Hadoop on Physical Infrastructure
Deployment
- Difficult to deploy: takes several people several days, or even months
- Difficult to tune cluster performance
Low efficiency
- Hadoop clusters are typically not 100% utilized across all hardware resources
- Difficult to share resources safely between different workloads
Single point of failure
- The NameNode and JobTracker are single points of failure
- No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? Get your Hadoop cluster in minutes
- Manual process, costing days: server preparation, OS installation, network configuration, Hadoop installation and configuration
- Automated by Serengeti on vSphere with best practices: a fully automated process, 10 minutes to get a Hadoop/HBase cluster from scratch
- 1/1000 of the human effort, with minimal Hadoop operations knowledge required

Why Virtualize Hadoop? Consolidate sprawling clusters
- Cluster sprawl: single-purpose clusters for various business applications (Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop) lead to cluster sprawl
- Cluster consolidation: clusters share servers with strong isolation on one virtualization platform
- Simplify: single hardware infrastructure, unified operations
- Optimize: shared resources = higher utilization; elastic resources = faster on-demand access
- 30% CAPEX down

Why Virtualize Hadoop? Utilize all your resources to solve the priority problem
- 50%+ of resources sit idle while a high-priority job is burning up its own cluster
- Utilize all resources from the pool on demand
- Dynamic elastic scaling on a shared resource pool
- 3X faster to get analytic results

vSphere High Availability (HA): protection against unplanned downtime
Overview
- Protection against host and VM failures
- Automatic failure detection (host, guest OS)
- Automatic virtual machine restart in minutes, on any available host in the cluster
- OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram: ZooKeeper (Coordination); Management Server; HDFS (Hadoop Distributed File System); HBase (Key-Value store); MapReduce (Job Scheduling/Execution System); Pig (Data Flow); Hive (SQL); BI Reporting; ETL Tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog; HCatalog MDB; Server; components marked HA]

vSphere Fault Tolerance (FT) provides continuous protection
Overview
- Single identical VMs running in lockstep on separate hosts
- Zero downtime, zero data loss failover for all virtual machines in case of hardware failures
- Integrated with VMware HA/DRS
- No complex clustering or specialized hardware required
- Single common mechanism for all applications and operating systems
- Zero downtime for NameNode, JobTracker and other components in Hadoop clusters
[Diagram: App/OS virtual machines on two VMware ESX hosts, protected by FT]

Serengeti: Easy and rapid deployment and management
- Open source project launched in June 2012; 0.8 was released in April and 0.9 is planned for June
- Toolkit that leverages virtualization to simplify Hadoop deployment and operations
- Deploy a cluster in 10 minutes, fully automated
- Customize Hadoop and HBase clusters
- Automated cluster operation
- Comes with ecosystem components
- Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization
- Local disk: Can local disk be used in a virtualized environment?
- Flexibility and scalability: How can resources be scheduled flexibly between clusters and the different applications mentioned above?
- Data stability: In a virtual environment, how can data be distributed across hosts and racks?
- Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for reads and writes. Can a virtual environment achieve the same result?
- Performance: How does Hadoop perform in a virtual environment?

Can I use local disk easily?
[Diagram: Hadoop VMs and other VMs running side by side on six hosts, managed by Serengeti]

Extend Virtual Storage Architecture to Include Local Disk
- Shared storage (SAN or NAS): easy to provision; automated cluster rebalancing
- Hybrid storage: SAN for boot images and other workloads; local disk for Hadoop and HDFS

How to scale in and scale out flexibly? How to schedule resources flexibly between clusters and different applications?

Evolution of Hadoop on VMs: Data/Compute separation
- Current Hadoop: combined storage/compute
- Hadoop in VM: VM lifecycle determined by the DataNode; limited elasticity
- Separate storage: separate compute from data; remove the elasticity constraint imposed by the DataNode; elastic compute; raise utilization
- Separate compute clusters: separate virtual compute; compute cluster per tenant; stronger VM-grade security and resource isolation

Serengeti Node Scale Out / Scale In
[Diagram: NameNode and JobTracker hosts, plus slave hosts each carrying one node labelled D and four nodes labelled C]

Serengeti Ballooning Enhancement for Java Application
[Diagram: JVM on guest OS on host]

How to keep data stability? How to access data locally if the data node and compute node are located in different VMs?
- DataNode and TaskTracker combined cluster
- Data/Compute separated cluster
[Diagram: master and worker hosts; data nodes and TaskTrackers placed on hosts; Compute-only cluster 1 and Compute-only cluster 2 alongside an HDFS cluster]

Distributed and Data/Compute Associated VM Placement
[Diagram: JobTracker, NameNode, TaskTracker and data node VMs placed on hosts across Rack1 and Rack2]

Hadoop Topology Changes for Virtualization
Hadoop Topology Awareness versus Serengeti HVE
[Diagram: two topology trees rooted at "/". Both have data centers D1-D2, racks R1-R4 and hosts H1-H12; the tree with the extra layer also has nodes N1-N8]

Hadoop Virtualization Extensions for Topology (HVE)
- JIRA issues: HADOOP-8468 (umbrella JIRA), HADOOP-8469, HDFS-3495, HDFS-3498, MAPREDUCE-4310, MAPREDUCE-4309, HADOOP-8470, HADOOP-8472
- Hadoop Common: Hadoop network topology extension
- HDFS: replica placement policy extension, replica choosing policy extension, replica removal policy extension, balancer policy extension
- MapReduce: task scheduling policy extension

Is there significant performance degradation in a virtualized environment? Is there any performance data?

Virtualized Hadoop Performance
Native versus virtual platforms, 32 hosts, 16 disks/host
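
Returning to the topology slides: the idea behind the extra layer in the HVE tree can be illustrated with a small distance calculation. This is only a sketch, not the Hadoop implementation. It assumes that the labels N1-N8 stand for node groups (VMs that share one physical host) and that paths are written as /datacenter/rack/nodegroup/node; the function names `split_path`, `distance` and `closest_replica` are made up for this example.

```python
def split_path(path):
    """Split a topology path such as '/D1/R1/N1/H1' into its components."""
    return [part for part in path.split("/") if part]


def distance(a, b):
    """Count tree edges between two leaves: hops up to the closest
    common ancestor plus hops back down to the other leaf."""
    pa, pb = split_path(a), split_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest_replica(reader, replicas):
    """Pick the replica location with the smallest topology distance."""
    return min(replicas, key=lambda replica: distance(reader, replica))


# Hypothetical locations: a reader and three replicas of one block.
reader = "/D1/R1/N1/H1"
replicas = ["/D1/R2/N3/H5", "/D1/R1/N1/H2", "/D1/R1/N2/H3"]

for replica in replicas:
    print(replica, distance(reader, replica))
print("closest:", closest_replica(reader, replicas))
```

With only three layers (/D1/R1/H1), nodes H2 and H3 would both be at distance 2 from H1, so a scheduler could not tell which of them shares a physical host with the reader. The extra layer separates the two cases (distance 2 versus 4).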

Serengeti architecture diagram
[Diagram: UI Client (Flex UI); CLI Client (Spring Shell); Rest API; Serengeti Web Service; Spring Batch steps (VM placement calculation, VM provision step, software mgmt step, VHM step, update MetaDB step); Hibernate/DAO; vPostgres; VC adapter; Ironfan service; Thrift service; progress report; Ironfan; Chef server (Rest API, cookbook); RabbitMQ; VM runtime manager; hosts on the virtualization platform running Hadoop nodes with Chef client and HA kit; package repository; vCenter]

Customizing your Hadoop/HBase cluster with Serengeti
- Choice of distros
- Storage configuration: choice of shared storage or local disk
- Resource configuration
- High availability option
- Number of nodes

    {
      "distro": "apache",
      "groups": [
        {
          "name": "master",
          "roles": ["hadoop_namenode", "hadoop_jobtracker"],
          "storage": { "type": "SHARED", "sizeGB": 20 },
          "instance_type": "MEDIUM",
          "instance_num": 1,
          "ha": true
        },
        {
          "name": "worker",
          "roles": ["hadoop_datanode", "hadoop_tasktracker"],
          "instance_type": "SMALL",
          "instance_num": 5,
          "ha": false
        }
      ]
    }

One command to scale out your cluster with Serengeti

    cluster resize name -nodegroup worker instanceNum

Configure/reconfigure Hadoop with ease by Serengeti
Modify Hadoop cluster configuration from Serengeti
- Use the "configuration" section of the JSON spec file
- Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, perties
- Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at http://common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": 300
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
        }
      }
    }

    cluster config -name myHadoop -specFile /home/serengeti/myHadoop.json

Freedom of Choice and Open Source
- Distributions: flexibility to choose from major distributions

    cluster create -name myHadoop -distro apache

- Community projects: support for multiple projects
- Open architecture to welcome industry participation
- Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with NameNode Federation and HA
Deploy a CDH4 Hadoop cluster with:
- NameNode Federation
- NameNode HA
- MapReduce v1
- HBase, Pig, Hive, and Hive Server
Also listed: CDH4 configurations, scale out, elasticity, JobTracker HA/FT
[Diagram: two Active NameNode / Standby NameNode pairs; Zook]
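
The cluster spec file and its "configuration" section shown earlier can also be generated programmatically instead of edited by hand. The following is a minimal sketch under assumptions: the key names (`distro`, `groups`, `instance_type`, `instance_num`, `ha`, `storage`) are copied from the slide text and may differ from what a given Serengeti release actually accepts, and the helpers `node_group`, `build_spec` and `total_nodes` are hypothetical names introduced only for this example.

```python
import json


def node_group(name, roles, instance_type, instance_num, ha, storage=None):
    """Describe one node group, using the key names shown on the slide."""
    group = {
        "name": name,
        "roles": list(roles),
        "instance_type": instance_type,
        "instance_num": instance_num,
        "ha": ha,
    }
    if storage is not None:
        group["storage"] = storage
    return group


def build_spec(distro, groups, hadoop_config=None):
    """Assemble the spec; hadoop_config maps a file name to its settings."""
    spec = {"distro": distro, "groups": groups}
    if hadoop_config:
        spec["configuration"] = {"hadoop": hadoop_config}
    return spec


def total_nodes(spec):
    """Sum instance_num over all node groups."""
    return sum(group["instance_num"] for group in spec["groups"])


spec = build_spec(
    "apache",
    [
        node_group("master", ["hadoop_namenode", "hadoop_jobtracker"],
                   "MEDIUM", 1, True,
                   storage={"type": "SHARED", "sizeGB": 20}),
        node_group("worker", ["hadoop_datanode", "hadoop_tasktracker"],
                   "SMALL", 5, False),
    ],
    hadoop_config={"mapred-site.xml": {"io.sort.mb": 300}},
)

# Standard JSON has no comment syntax, so the "check for all settings"
# comment lines from the slide are left out of the generated text.
spec_text = json.dumps(spec, indent=2)
print(spec_text)
print("nodes:", total_nodes(spec))
```

The printed text could then be saved to a file such as /home/serengeti/myHadoop.json and passed to the `cluster config` command shown on the slide.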