Serengeti: Virtualize Your Big Data Applications
Yonghua Lin (藺永華), VMware, Inc.
2009 VMware Inc. All rights reserved

Agenda
- Today's big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today's Big Data System
[Diagram labels: ETL; unstructured data (HDFS); real-time structured database; Big SQL; data-parallel batch processing; real-time streams; real-time processing (S4, Storm); analytics]

Challenges to Using Hadoop on Physical Infrastructure
Deployment
- Difficult to deploy: takes several people several days or even months
- Difficult to tune cluster performance
Low efficiency
- Hadoop clusters are typically not 100% utilized across all hardware resources
- Difficult to share resources safely between different workloads
Single point of failure
- NameNode and JobTracker are single points of failure
- No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? Get your Hadoop cluster in minutes
- 1/1000 of the human effort, with minimal Hadoop operations knowledge required
- Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
- Manual process (costs days): server preparation, OS installation, network configuration, Hadoop installation and configuration
- The same steps are automated by Serengeti on vSphere, following best practices

Why Virtualize Hadoop? Consolidate sprawling clusters
Clusters share servers with strong isolation
- Single hardware infrastructure: unified operations
- Optimize shared resources = higher utilization
- Elastic resources = faster on-demand access
Cluster sprawl
- Single-purpose clusters for various business applications lead to cluster sprawl
Cluster consolidation
- Simplify
- CAPEX down 30%
[Diagram: separate Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, and Portal Hadoop clusters, consolidated onto one virtualization platform]

Why Virtualize Hadoop? Utilize all your resources to solve the priority problem
- 50%+ of resources sit idle while a high-priority job is burning up its cluster
- Utilize all resources from the pool on demand
- Dynamic elastic scaling on a shared resource pool
- 3X faster to get analytic results

vSphere High Availability (HA): protection against unplanned downtime
Overview
- Protection against host and VM failures
- Automatic failure detection (host, guest OS)
- Automatic virtual machine restart in minutes, on any available host in the cluster
- OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram labels: ZooKeeper (coordination); management server; HDFS (Hadoop Distributed File System); HBase (key-value store); MapReduce (job scheduling/execution system); Pig (data flow); Hive (SQL); BI reporting; ETL tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog]

vSphere Fault Tolerance (FT) provides continuous protection
Overview
- Single identical VMs running in lockstep on separate hosts
- Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures
- Integrated with VMware HA/DRS
- No complex clustering or specialized hardware required
- Single common mechanism for all applications and operating systems
Zero downtime for NameNode, JobTracker, and other components in Hadoop clusters
[Diagram: App/OS virtual machines on two VMware ESX hosts, protected by HA and FT]

Serengeti: easy and rapid deployment and management
- Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is due in June
- Toolkit that leverages virtualization to simplify Hadoop deployment and operations
- Deploy a cluster in 10 minutes, fully automated
- Customize Hadoop and HBase clusters
- Automated cluster operations
- Comes with ecosystem components
- Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization
Local disk
- Can local disk be used in a virtualized environment?
Flexibility and scalability
- How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
Data stability
- In a virtual environment, how can we distribute data across hosts and racks?
Data locality
- Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
Performance
- How does Hadoop perform in a virtual environment?

Can I use local disk easily?
Serengeti: extend the virtual storage architecture to include local disk
Shared storage: SAN or NAS
- Easy to provision
- Automated cluster rebalancing
Hybrid storage
- SAN for boot images and other workloads
- Local disk for Hadoop and HDFS
[Diagram: hosts running Hadoop VMs alongside other VMs]

How to scale in and scale out flexibly? How to schedule resources flexibly between clusters and different applications?
Evolution of Hadoop on VMs: data/compute separation
Current Hadoop: combined storage/compute (Hadoop in VM)
- VM lifecycle determined by DataNode
- Limited elasticity
Separate storage
- Separate compute from data
- Remove the elasticity constraint imposed by the DataNode
- Elastic compute
- Raise utilization
Separate compute clusters
- Separate virtual compute
- Compute cluster per tenant
- Stronger VM-grade security and resource isolation

Serengeti node scale out / scale in
[Diagram: NameNode and JobTracker, plus slave-node hosts each running one node labeled D and four nodes labeled C]

Serengeti ballooning enhancement for Java applications
[Diagram: JVM inside a guest OS on a host]

How to keep data stability? How to access data locally if the data node and compute node are located in different VMs?
- DataNode and TaskTracker combined cluster
- Data/compute separated cluster
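The two questions above (spreading data across hosts and racks, and keeping compute close to data once they live in different VMs) can be illustrated with a small placement sketch. This is only one plausible policy under stated assumptions, not Serengeti's actual placement algorithm: the function names `place_data_nodes` and `place_compute_nodes` and the host and rack names are hypothetical. Data-node VMs are spread rack by rack with at most one per host, and compute VMs are only ever placed on hosts that already run a data node, so the compute tier can grow or shrink without moving any data.

```python
from itertools import cycle


def place_data_nodes(hosts_by_rack, count):
    """Pick hosts for data-node VMs, alternating racks, at most one VM per host.

    hosts_by_rack maps a rack name to the list of hosts in that rack.
    """
    racks = sorted(hosts_by_rack)
    depth = max((len(hosts_by_rack[r]) for r in racks), default=0)
    ordered = []
    # Take the first host of every rack, then the second of every rack, etc.
    for i in range(depth):
        for rack in racks:
            if i < len(hosts_by_rack[rack]):
                ordered.append(hosts_by_rack[rack][i])
    if count > len(ordered):
        raise ValueError("not enough hosts for one data node per host")
    return ordered[:count]


def place_compute_nodes(data_hosts, count):
    """Round-robin compute VMs over hosts that already run a data node.

    Returns a dict mapping host -> number of compute VMs on that host.
    """
    if count > 0 and not data_hosts:
        raise ValueError("no data-node hosts to attach compute VMs to")
    placement = {host: 0 for host in data_hosts}
    for _, host in zip(range(count), cycle(data_hosts)):
        placement[host] += 1
    return placement


# Hypothetical two-rack environment.
racks = {"Rack1": ["h1", "h2"], "Rack2": ["h3", "h4"]}
data_hosts = place_data_nodes(racks, 3)
print(data_hosts)
print(place_compute_nodes(data_hosts, 8))  # scaled out
print(place_compute_nodes(data_hosts, 4))  # scaled in; data hosts unchanged
```

Because the compute placement is derived from the data placement and never the reverse, resizing the compute tier leaves the HDFS layout untouched, which is the property the data/compute separation slides argue for.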

Distributed and Data/Compute Associated VM Placement
[Diagram labels: master and worker hosts; hosts running a data node together with one or more tasktrackers; an HDFS cluster with compute-only cluster 1 and compute-only cluster 2; job trackers and name node; hosts spread across Rack1 and Rack2]

Hadoop Topology Changes for Virtualization
Hadoop topology awareness: Serengeti HVE
[Diagram: two topology trees rooted at "/", with nodes labeled D1-D2, R1-R4, N1-N8, and H1-H12]

Hadoop Virtualization Extensions (HVE) for Topology
Hadoop Common
- Hadoop network topology extension
HDFS
- Replica placement policy extension
- Replica choosing policy extension
- Replica removal policy extension
- Balancer policy extension
MapReduce
- Task scheduling policy extension
Related JIRAs: HADOOP-8468 (umbrella JIRA), HADOOP-8469, HADOOP-8470, HADOOP-8472, HDFS-3495, HDFS-3498, MAPREDUCE-4309, MAPREDUCE-4310

Is there significant performance degradation in a virtualized environment? Is there any performance data?
Virtualized Hadoop Performance
Native versus virtual platforms, 32 hosts, 16 disks/host
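The topology extension described above can be made concrete with a short sketch. It assumes the four layers in the topology diagram are data center (D), rack (R), node group (N), and Hadoop node (H), where a node group stands for the VMs that share one physical host. The function names are hypothetical, and the replica picker is a deliberately simplified greedy policy: it enforces only the rule that no two replicas land in the same node group, and it is not the HDFS block placement code.

```python
def parse_location(path):
    """Split a location such as '/D1/R1/N1/H1' into its four layers."""
    parts = tuple(p for p in path.split("/") if p)
    if len(parts) != 4:
        raise ValueError(f"expected 4 layers, got {path!r}")
    return parts


def distance(a, b):
    """Tree distance between two leaves: hops up to the common ancestor and back."""
    pa, pb = parse_location(a), parse_location(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return 2 * (len(pa) - common)


def same_node_group(a, b):
    """True when both nodes sit under the same data center, rack, and node group."""
    return parse_location(a)[:3] == parse_location(b)[:3]


def choose_replicas(writer, candidates, count=3):
    """Greedy pick: nearest nodes first, never two replicas in one node group."""
    chosen = []
    for node in sorted(candidates, key=lambda n: (distance(writer, n), n)):
        if all(not same_node_group(node, c) for c in chosen):
            chosen.append(node)
        if len(chosen) == count:
            break
    return chosen


nodes = ["/D1/R1/N1/H1", "/D1/R1/N1/H2", "/D1/R1/N2/H3", "/D1/R2/N3/H4"]
print(distance("/D1/R1/N1/H1", "/D1/R1/N1/H2"))  # 2: same node group
print(distance("/D1/R1/N1/H1", "/D1/R1/N2/H3"))  # 4: same rack
print(choose_replicas("/D1/R1/N1/H1", nodes))
# ['/D1/R1/N1/H1', '/D1/R1/N2/H3', '/D1/R2/N3/H4']
```

Without the node-group layer, H1 and H2 would look like two independent machines in the same rack, and two replicas of a block could end up on VMs that fail together when their shared physical host fails. The extra layer also gives the scheduler a locality level between "same node" and "same rack".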

Serengeti architecture diagram
[Diagram labels: UI client (Flex UI); CLI client (Spring Shell); REST API; Serengeti web service; Spring Batch steps (update meta DB, VM placement calculation, VM provision, software management, VHM); Hibernate/DAO; vPostgres; VC adapter; Ironfan service; Thrift service; Ironfan progress report; Chef server (REST API, cookbooks); RabbitMQ; VM runtime manager; package repository; vCenter; hosts on the virtualization platform running Hadoop nodes with Chef client and HA kit]

Customizing your Hadoop/HBase cluster with Serengeti
- Choice of distros
- Storage configuration: choice of shared storage or local disk
- Resource configuration
- High availability option
- Number of nodes

    {
      "distro": "apache",
      "groups": [
        {
          "name": "master",
          "roles": ["hadoop_namenode", "hadoop_jobtracker"],
          "storage": { "type": "SHARED", "sizeGB": 20 },
          "instance_type": "MEDIUM",
          "instance_num": 1,
          "ha": true
        },
        {
          "name": "worker",
          "roles": ["hadoop_datanode", "hadoop_tasktracker"],
          "instance_type": "SMALL",
          "instance_num": 5,
          "ha": false
        }
      ]
    }

One command to scale out your cluster with Serengeti

    cluster resize -name <cluster name> -nodegroup worker -instanceNum <number>

Configure/reconfigure Hadoop with ease by Serengeti
Modify Hadoop cluster configuration from Serengeti
- Use the "configuration" section of the JSON spec file
- Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, perties
- Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at http://common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": 300
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
        }
      }
    }

    cluster config -name myHadoop -specFile /home/serengeti/myHadoop.json

Freedom of Choice and Open Source
Distributions
- Flexibility to choose from major distributions

    cluster create -name myHadoop -distro apache

Community projects
- Support for multiple projects
- Open architecture to welcome industry participation
- Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with Namenode Federation and HA
Deploy a CDH4 Hadoop cluster
- Name Node Federation
- Name Node HA
- MapReduce v1
- HBase, Pig, Hive, and Hive Server
CDH4 configurations
- Scale out
- Elasticity
- JobTracker HA/FT
[Diagram labels: Active Namenode, Standby Namenode, Active Namenode, Standby Namenode, Zook]
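The spec-file workflow described above (define node groups, attach a "configuration" section, then resize a group) can be sketched in a few lines. This is a minimal illustration that only builds and edits the JSON document locally; it does not call Serengeti. The key names (`distro`, `groups`, `instance_type`, `instance_num`, `ha`, `storage`) are taken from the slide's example and may not match the schema of a given Serengeti release, and the helper names `node_group`, `build_spec`, `resize`, and `total_nodes` are hypothetical.

```python
import copy
import json


def node_group(name, roles, instance_type, instance_num, ha, storage=None):
    """Build one node-group entry using the key names shown on the slide."""
    group = {
        "name": name,
        "roles": list(roles),
        "instance_type": instance_type,
        "instance_num": instance_num,
        "ha": ha,
    }
    if storage is not None:
        group["storage"] = dict(storage)
    return group


def build_spec(distro, groups, hadoop_config=None):
    """Assemble a cluster spec and run a few basic sanity checks."""
    names = [g["name"] for g in groups]
    if not names or len(set(names)) != len(names):
        raise ValueError("need at least one group, with unique names")
    if any(g["instance_num"] < 1 for g in groups):
        raise ValueError("every group needs at least one instance")
    spec = {"distro": distro, "groups": list(groups)}
    if hadoop_config:
        spec["configuration"] = {"hadoop": hadoop_config}
    return spec


def resize(spec, group_name, instance_num):
    """Return a copy of the spec with one group's instance count changed."""
    updated = copy.deepcopy(spec)
    for group in updated["groups"]:
        if group["name"] == group_name:
            group["instance_num"] = instance_num
            return updated
    raise KeyError(group_name)


def total_nodes(spec):
    """Total number of VMs the spec asks for."""
    return sum(g["instance_num"] for g in spec["groups"])


spec = build_spec(
    "apache",
    [
        node_group("master", ["hadoop_namenode", "hadoop_jobtracker"],
                   "MEDIUM", 1, True, {"type": "SHARED", "sizeGB": 20}),
        node_group("worker", ["hadoop_datanode", "hadoop_tasktracker"],
                   "SMALL", 5, False),
    ],
    hadoop_config={"mapred-site.xml": {"io.sort.mb": 300}},
)
print(json.dumps(spec, indent=2))
print(total_nodes(resize(spec, "worker", 8)))  # 9
```

Writing the result of `json.dumps` to a file would give something to pass to the `-specFile` option shown above. Note that strict JSON has no comment syntax, so the `//` hints in the slide's configuration example would have to be dropped from a generated file.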
