© 2009 VMware Inc. All rights reserved

Serengeti - Virtualize Your Big Data Applications
Lin Yonghua (藺永華), VMware, Inc.

Agenda

- Today's big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today's Big Data System

[Diagram labels: ETL; Unstructured Data (HDFS); Real Time Structured Database; Big SQL; Data Parallel Batch Processing; Real Time Streams; Real-Time Processing (S4, Storm); Analytics]

Challenges of Using Hadoop on Physical Infrastructure

Deployment
- Difficult to deploy: it takes several people several days, or even months
- Difficult to tune cluster performance

Low Efficiency
- Hadoop clusters are typically not 100% utilized across all hardware resources.
- Difficult to share resources safely between different workloads

Single Point of Failure
- Single point of failure for the NameNode and JobTracker
- No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get your Hadoop cluster in minutes

- 1/1000 of the human effort, with the least Hadoop operations knowledge
- Manual process, costing days: server preparation, OS installation, network configuration, Hadoop installation and configuration
- Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch, automated by Serengeti on vSphere with best practices

Why Virtualize Hadoop? - Consolidate sprawling clusters

Cluster Sprawling
- Single-purpose clusters for various business applications lead to cluster sprawl.

Cluster Consolidation
- Clusters share servers with strong isolation.
- Simplify: single hardware infrastructure, unified operations
- Optimize: shared resources = higher utilization; elastic resources = faster on-demand access
- 30% CAPEX down

[Diagram labels: Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop, Virtualization Platform]

Why Virtualize Hadoop? - Utilize all your resources to solve the priority problem

- 50%+ of resources sit idle while a high-priority job is burning up its cluster.
- Utilize all resources from the pool on demand.
- Dynamic elastic scaling on a shared resource pool
- 3X faster to get analytic results

vSphere High Availability (HA) - protection against unplanned downtime

Overview
- Protection against host and VM failures
- Automatic failure detection (host, guest OS)
- Automatic virtual machine restart in minutes, on any available host in the cluster
- OS and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack

[Diagram labels: ZooKeeper (Coordination), Management Server, HDFS (Hadoop Distributed File System), HBase (Key-Value store), MapReduce (Job Scheduling/Execution System), Pig (Data Flow), Hive (SQL), BI Reporting, ETL Tools, RDBMS, Jobtracker, Namenode, Hive MetaDB, HCatalog, HCatalog MDB, Server]

vSphere Fault Tolerance provides continuous protection

Overview
- Single identical VMs running in lockstep on separate hosts
- Zero downtime, zero data loss failover for all virtual machines in case of hardware failures
- Integrated with VMware HA/DRS
- No complex clustering or specialized hardware required
- Single common mechanism for all applications and operating systems

Zero downtime for NameNode, JobTracker, and other components in Hadoop clusters

[Diagram labels: App, OS, HA, FT, VMware ESX]

Serengeti

Easy and rapid deployment and management

- Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is planned for June.
- Toolkit that leverages virtualization to simplify Hadoop deployment and operations
- Deploy a cluster in 10 minutes, fully automated
- Customize Hadoop and HBase clusters
- Automated cluster operation
- Comes with ecosystem components
- Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization

- Local disk: Can local disks be used in a virtualized environment?
- Flexibility and scalability: How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
- Data stability: In a virtual environment, how can we distribute data across hosts and racks?
- Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
- Performance: How does Hadoop perform in a virtual environment?

Can I use local disk easily?

Extend Virtual Storage Architecture to Include Local Disk

Shared Storage: SAN or NAS
- Easy to provision
- Automated cluster rebalancing

Hybrid Storage
- SAN for boot images and other workloads
- Local disk for Hadoop and HDFS

[Diagram labels: Host, Other VM, Hadoop, Serengeti]

How to scale in and scale out flexibly? How to schedule resources flexibly between clusters and different applications?

Evolution of Hadoop on VMs - Data/Compute separation

- Current Hadoop: combined storage/compute
- Hadoop in VM
  - VM lifecycle determined by Datanode
  - Limited elasticity
- Separate Storage
  - Separate compute from data
  - Remove the elasticity constraint imposed by the Datanode
  - Elastic compute
  - Raise utilization
- Separate Compute Clusters
  - Separate virtual compute
  - Compute cluster per tenant
  - Stronger VM-grade security and resource isolation

[Diagram labels: Slave Node, Compute, Storage, VM, T1, T2]

Serengeti Node Scale Out / Scale In

[Diagram labels: NameNode, JobTracker, and hosts each running VMs marked D and C (presumably data and compute nodes)]

Serengeti Ballooning Enhancement for Java Application

[Diagram labels: JVM, Guest OS, Host]

How to keep data stability? How to access data locally if the data node and the compute node are located in different VMs?

Distributed and Data/Compute Associated VM Placement

[Diagram: a combined Datanode and Tasktracker cluster (master and worker hosts) next to a data/compute separated cluster (an HDFS cluster plus compute-only clusters 1 and 2), with Name node, Job tracker, Data node, and Tasktracker VMs spread over hosts in Rack1 and Rack2]
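
The placement idea on this slide can be illustrated with a small sketch. It is not Serengeti's actual placement algorithm: the function name `place_vms`, the rack and host names, and the round-robin policy are all assumptions made for illustration. The sketch spreads data-node VMs across racks and hosts, then puts each compute-only VM on a host that already runs a data node, so tasks could read HDFS blocks from the same physical host.

```python
from itertools import cycle


def place_vms(hosts_by_rack, num_data_nodes, num_compute_nodes):
    """Return {vm_name: (rack, host)} for data and compute VMs.

    Hypothetical policy, for illustration only:
    - data nodes go to distinct hosts, alternating between racks;
    - compute nodes are assigned round-robin to hosts that hold a data node.
    """
    # Interleave hosts rack by rack: rack1/host1, rack2/host3, rack1/host2, ...
    racks = sorted(hosts_by_rack)
    depth = max(len(hosts_by_rack[r]) for r in racks)
    interleaved = [
        (rack, hosts_by_rack[rack][i])
        for i in range(depth)
        for rack in racks
        if i < len(hosts_by_rack[rack])
    ]
    if num_data_nodes > len(interleaved):
        raise ValueError("not enough hosts for one data node per host")

    placement = {}
    data_hosts = interleaved[:num_data_nodes]
    for i, location in enumerate(data_hosts):
        placement[f"data-{i}"] = location

    # Associate every compute VM with a host that already has a data node.
    if num_compute_nodes and not data_hosts:
        raise ValueError("compute nodes need at least one data node host")
    targets = cycle(data_hosts)
    for i in range(num_compute_nodes):
        placement[f"compute-{i}"] = next(targets)
    return placement


if __name__ == "__main__":
    topology = {"rack1": ["host1", "host2"], "rack2": ["host3", "host4"]}
    for vm, (rack, host) in place_vms(topology, 3, 6).items():
        print(vm, rack, host)
```

Alternating between racks keeps replicas from concentrating in one failure domain, while co-locating compute VMs with data VMs preserves host-level locality even though the two roles run in different VMs.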

Hadoop Topology Changes for Virtualization

Hadoop Topology Awareness - Serengeti HVE

[Diagram: two topology trees, one with data centers (D1, D2), racks (R1-R4), and hosts (H1-H12), and one that adds a layer of node groups (N1-N8) between racks and hosts]

Hadoop Virtualization Extensions for Topology (HVE)

- Hadoop Common: Hadoop Network Topology Extension
- HDFS: Replica Placement Policy Extension, Replica Choosing Policy Extension, Replica Removal Policy Extension, Balancer Policy Extension
- MapReduce: Task Scheduling Policy Extension

JIRA issues: HADOOP-8468 (umbrella JIRA), HADOOP-8469, HADOOP-8470, HADOOP-8472, HDFS-3495, HDFS-3498, MAPREDUCE-4309, MAPREDUCE-4310
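
To make the extra topology layer concrete, the sketch below computes the distance between two locations as the number of tree edges up to their closest common ancestor and back down, which is the notion of closeness Hadoop's network topology uses. The path strings such as `/d1/r1/n1/h1` and the function name `distance` are assumptions for illustration, not HVE code; the four layers follow the diagram's data center, rack, node group, and host labels.

```python
def distance(path_a, path_b):
    """Number of tree edges between two leaves given as '/a/b/c' paths."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Steps from each leaf up to the closest common ancestor.
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    # Four layers: data center / rack / node group / node.
    node = "/d1/r1/n1/h1"
    print(distance(node, "/d1/r1/n1/h1"))  # same node
    print(distance(node, "/d1/r1/n1/h2"))  # same node group
    print(distance(node, "/d1/r1/n2/h3"))  # same rack, other node group
    print(distance(node, "/d1/r2/n3/h5"))  # other rack
```

With four layers the distances come out as 0, 2, 4, and 6, so a scheduler or replica chooser can prefer a copy in the same node group before falling back to the rest of the rack.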

Is there significant performance degradation in a virtualized environment? Is there any performance data?

Virtualized Hadoop Performance

[Chart: Native versus Virtual Platforms, 32 hosts, 16 disks/host]

Serengeti architecture diagram

[Diagram labels: UI Client (Flex UI); CLI Client (Spring Shell); REST API; Serengeti Web Service; Spring Batch steps (Update MetaDB, VM Placement calculation, VM Provision, Software Mgmt, VHM); Hibernate/DAO; vPostgres; VC adapter; Ironfan service; Thrift Service; Progress report; Chef server (REST API, Cookbook); RabbitMQ; VM runtime Manager; vCenter; Package repository; Hadoop Nodes with Chef Client and HA kit; Hosts on the Virtualization Platform]

Customizing your Hadoop/HBase cluster with Serengeti

- Choice of distros
- Storage configuration: choice of shared storage or local disk
- Resource configuration
- High availability option
- Number of nodes

    …
    "distro": "apache",
    "groups": [
      {
        "name": "master",
        "roles": ["hadoop_namenode", "hadoop_jobtracker"],
        "storage": {"type": "SHARED", "sizeGB": 20},
        "instance_type": "MEDIUM",
        "instance_num": 1,
        "ha": true
      },
      {
        "name": "worker",
        "roles": ["hadoop_datanode", "hadoop_tasktracker"],
        "instance_type": "SMALL",
        "instance_num": 5,
        "ha": false
    …
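
A spec like the one on this slide can also be generated or checked programmatically. The following sketch builds the same two node groups as a Python dictionary and serializes them to JSON. The field names follow the slide; the helper `total_instances` and the choice to write `instance_type` values as quoted strings are illustrative assumptions, and the real Serengeti spec schema may differ.

```python
import json

# Node groups copied from the slide; values not shown there are omitted.
spec = {
    "distro": "apache",
    "groups": [
        {
            "name": "master",
            "roles": ["hadoop_namenode", "hadoop_jobtracker"],
            "storage": {"type": "SHARED", "sizeGB": 20},
            "instance_type": "MEDIUM",
            "instance_num": 1,
            "ha": True,
        },
        {
            "name": "worker",
            "roles": ["hadoop_datanode", "hadoop_tasktracker"],
            "instance_type": "SMALL",
            "instance_num": 5,
            "ha": False,
        },
    ],
}


def total_instances(cluster_spec):
    """Sum instance_num over all node groups (hypothetical helper)."""
    return sum(group["instance_num"] for group in cluster_spec["groups"])


if __name__ == "__main__":
    print(json.dumps(spec, indent=2))
    print("total VMs:", total_instances(spec))
```

Building the spec in code makes it easy to catch malformed JSON, such as unquoted values or mismatched quotes, before the file reaches the CLI.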

One command to scale out your cluster with Serengeti

> cluster resize --name <clustername> --nodegroup worker --instanceNum <#>

Configure/reconfigure Hadoop with ease by Serengeti

Modify Hadoop cluster configuration from Serengeti
- Use the "configuration" section of the JSON spec file
- Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, perties
- Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at /common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": "300"
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
    …

> cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json
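
Editing the spec file by hand works, but the same change can be scripted. The sketch below sets one Hadoop attribute in the "configuration" section of a spec dictionary and writes the result to a file that could then be passed to `cluster config --specFile`. The function name `set_hadoop_option` and the temporary output path are assumptions for illustration; only the section layout and the `io.sort.mb` value come from the slide.

```python
import json
import os
import tempfile


def set_hadoop_option(spec, config_file, key, value):
    """Set configuration.hadoop[config_file][key], creating levels as needed."""
    hadoop = spec.setdefault("configuration", {}).setdefault("hadoop", {})
    # The slide stores values as strings, so the value is converted here.
    hadoop.setdefault(config_file, {})[key] = str(value)
    return spec


if __name__ == "__main__":
    spec = {"distro": "apache"}
    set_hadoop_option(spec, "mapred-site.xml", "io.sort.mb", 300)

    # Write the edited spec; this path is a placeholder for a real spec file.
    path = os.path.join(tempfile.gettempdir(), "myHadoop.json")
    with open(path, "w") as handle:
        json.dump(spec, handle, indent=2)
    print(json.dumps(spec, indent=2))
```

Note that the generated file is strict JSON, so it cannot carry the `//` comments shown on the slide.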

Freedom of Choice and Open Source

Community Projects and Distributions
- Flexibility to choose from major distributions

> cluster create --name myHadoop --distro apache

- Support for multiple projects
- Open architecture to welcome industry participation
- Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with Namenode Federation and HA

Deploy a CDH4 Hadoop cluster
- NameNode Federation
- NameNode HA
- MapReduce v1
- HBase, Pig, Hive, and Hive Server

CDH4 configurations
- Scale out
- Elasticity
- JobTracker HA/FT

[Diagram: two Namenode groups, each with an Active Namenode and a Standby Namenode coordinated by a ZooKeeper group (ZK, ZK, ZK), a quorum-based metadata store, and Datanodes sending block reports]

Proactive monitoring and tuning with VCOPs

Proactive monitoring through VCOPs
- Gain comprehensive visibility
- Eliminate manual processes with intelligent automation
- Proactively manage operations

VMware brings Agility, Efficiency, and Elasticity to Big Data

Elasticity
- Enable full elasticity through separation of data and compute
- Scale Hadoop in and out under resource constraints

Agility
- Deploy, configure, and monitor Hadoop clusters on the fly
- Dynamically reconfigure Hadoop to meet changing business demands

Efficiency
- Consolidate Hadoop to achieve higher utilization
- Pool resources to allow for increased performance and priority job processing

Serengeti Resources

Download and try Serengeti
- VMware Hadoop site
- /hadoop