




Deployment, Scheduling, and Monitoring of Big Data Systems
Wei Xu (徐葳)

Goals of this lecture
- The importance of system management
- From bare metal to a big data system
- Maintaining and managing global system state
  - Consistency and Chubby / ZooKeeper
- Task scheduling
- Monitoring software and hardware systems

How to follow this lecture
- Me: researcher and practitioner (split personality)

The blood and tears of a system administrator (me)

The 200-node cluster I maintain
- Production support: 100s of researchers running "big data" workloads
- Systems research: self-driving big data infrastructure
- Everything managed by …

HPC: The good old days (for sysadmins)
- Rocks Cluster Rolls

John Boyle. Biology must develop its own big-data systems. Nature (World View). July 2013.

Demand 1: Customers want flexibility …

Demand 2: Customers demand performance …
- Prof. David Haussler, biologist at UC Santa Cruz
- God Damn I/O!

We have a variety of applications
- Scientific image processing
- Cryo-EM and protein structure
- Social "big data"
- Social networking
- Online education data

Lots of dependencies …
- Natural language processing
- * Image courtesy of Prof. Gerard de Melo @ Tsinghua

Resource hungry too …
- C, C++, Java
- Genome analysis

Customers' needs change …
- Protein design
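
System deployment: from bare metal to a big data system
(Source: Juju website)

Basic idea: install one machine, then automatically install all the others
- Rocks Cluster Rolls
- Head node and compute nodes
- No applicable roll? == Sorry

Network and hardware configuration
- IPMI, DNS, Cisco router, RAID, BMC, VPN, firewall

Solution: customized servers + whole-rack delivery
- Open Data Center Committee (ODCC) whole-rack servers
- OCP whole-rack servers
- * Container-scale delivery and deployment
- (Photo from Lintao Zhang)

Hardware support
- How do you remotely control a bare-metal machine?
- Intelligent Platform Management Interface (IPMI)
  - Implementation: a dedicated BMC chip
  - Functions: reboot the machine, console, voltage, temperature, network connection
- PXE (network boot)
  - Bootp, TFTP

Operating system and infrastructure
- CentOS, GPU drivers, LDAP, image servers, Cobbler, local repo, storage, OS optimization, network drivers, SSO, OS drivers, security, SR-IOV

Solution: configuration management
- Ubuntu MAAS (Metal as a Service)
- Turn configuration into programs
- Popular configuration management tools

Configuration management: visualization
(Figure from Juju website)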
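
One way to read "turn configuration into programs" is as a desired-state description plus a routine that converges a machine toward it. The sketch below illustrates that reading only; it is not how MAAS, Juju, or Cobbler work internally. The functions `plan` and `apply_actions`, and the package names and versions, are hypothetical.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]  # (kind, package, version)


def plan(desired: Dict[str, str], actual: Dict[str, str]) -> List[Action]:
    """Compute the actions needed to move `actual` to `desired`."""
    actions: List[Action] = []
    for pkg, version in sorted(desired.items()):
        if actual.get(pkg) != version:
            actions.append(("install", pkg, version))
    for pkg in sorted(set(actual) - set(desired)):
        actions.append(("remove", pkg, actual[pkg]))
    return actions


def apply_actions(actions: List[Action], actual: Dict[str, str]) -> Dict[str, str]:
    """Return the machine state after the actions (no real packages are touched)."""
    state = dict(actual)
    for kind, pkg, version in actions:
        if kind == "install":
            state[pkg] = version
        else:
            state.pop(pkg, None)
    return state


# Hypothetical desired state for a compute node, and what is currently installed.
desired = {"openjdk": "8", "hadoop": "2.7", "ntp": "4.2"}
actual = {"openjdk": "7", "ntp": "4.2", "telnet": "0.17"}

actions = plan(desired, actual)
converged = apply_actions(actions, actual)
second_run = plan(desired, converged)  # nothing left to do on a converged machine
```

The property worth noticing is idempotence: running the same program against an already converged machine produces an empty plan, which is what makes it safe to run on every node, repeatedly.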
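
Project requirements and deadlines
- Phase 0: choose a project and form a team (this week)
- Phase 1: first meeting with the user; submit the requirements analysis and project plan (November 11)
- Phase 2: meet with the user at least once every two weeks; submit progress reports (November 25, December 9, December 23)
- Phase 3: project presentation (in class on December 30)
- Phase 4: submit the project report (end of week 17)

Course project comments
1. In many groups' reports the background section strays from the project itself and only skims the parts that matter to the project; it reads like padding.
2. Several groups need to crawl their own data, and the same people write the crawler and build the system; try not to let this delay the later stages.
3. Some groups' technical roadmap only introduces existing technologies and is not yet organized, which may affect later progress.
4. The groups whose project preference was assigned to them all did quite well, which deserves encouragement.

Course project: special reminders
- No plagiarism (citing the source does not make it acceptable)
- Reusing some figures is fine, but the source must be credited

Questions about the Hadoop assignment?

Maintaining and managing global system state

Global system state: challenges
- GFS: master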
- MapReduce: master
- Dryad: master
- Problem: which node is the master? What if the master node dies?
- Solution? Have someone decide who the master is.
- Problems with that?

Chubby's solution
A service that provides
- synchronization (leader election, shared environment information)
- reliability
- availability
- easy-to-understand semantics
- performance, throughput, and latency are only secondary

Primary election
- A distributed consensus problem
- Asynchronous communication: loss, delay, reordering

Why is it hard?
- FLP impossibility result
- A model: the Two Generals' Problem
  - Two armies are on opposite sides of a city in the valley
  - The two generals should coordinate the attack; each has an initial value (attack or retreat)
  - The only communication is through sending messengers, who are prone to being captured or lost in the valley
  - No deterministic algorithm for reaching consensus!
  - Proof by contradiction

Fischer-Lynch-Paterson (FLP)
- Even if we have reliable message delivery …
- No consensus can be guaranteed in an asynchronous communication system in the presence of any failures.
- Intuition: a "failed" process may just be slow, and can rise from the dead at exactly the wrong time.

Paxos introduction
- Paxos is an asynchronous consensus algorithm.
- The FLP result says no asynchronous consensus algorithm can guarantee both safety and liveness.
- Paxos is guaranteed safe.
  - Consensus is a stable property: once reached it is never violated; the agreed value is not changed.
- Paxos is not guaranteed live.
  - Consensus is reached if "a large enough subnetwork ... is non-faulty for a long enough time."
  - Otherwise Paxos might never terminate.

Paxos: the name

Paxos consensus model

Leslie Lamport
- Turing Award, 2013: "fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency"
- LaTeX
- Sequential consistency
- Byzantine fault tolerance
- Paxos algorithm
- (Photo from Wikipedia)

A Paxos round
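
The slide for a Paxos round is a figure, so the following is a minimal in-memory sketch of one round of single-value Paxos, written under simplifying assumptions: calls are synchronous, nothing is persisted, and there are no learners or retries. The names `Acceptor`, `propose`, and the `reachable` list (which acceptors the proposer can talk to) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1      # highest proposal number this acceptor has promised
    accepted_n: int = -1    # number of the proposal it last accepted, if any
    accepted_v: Any = None  # value of that proposal

    def prepare(self, n: int) -> Tuple[bool, int, Any]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, -1, None

    def accept(self, n: int, v: Any) -> bool:
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_v = v
            return True
        return False


def propose(acceptors: List[Acceptor], reachable: List[int], n: int, value: Any) -> Optional[Any]:
    """Run one round with proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    replies = [acceptors[i].prepare(n) for i in reachable]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None  # safe, but no progress: liveness is not guaranteed
    prev_n, prev_v = max(granted, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_v  # must re-propose the highest-numbered accepted value
    acks = sum(acceptors[i].accept(n, value) for i in reachable)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, [0, 1], n=1, value="A")      # a majority chooses "A"
second = propose(acceptors, [1, 2], n=2, value="B")     # later round still yields "A"
stale = propose(acceptors, [0, 1, 2], n=1, value="C")   # old proposal number is rejected
minority = propose(acceptors, [0], n=5, value="D")      # one of three is not a majority
```

The second call shows the safety property from the slides: once a value is chosen, a later proposer that wanted "B" is forced to adopt "A". The last two calls show rounds that fail without violating safety.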

Replicated State Machine
- Maintain replicas by executing operations in exactly the same order
- Requires all replicas to "agree" on the (set and) order of operations
- The point: if one server fails, the other servers can be used, because they have exactly the same state

Using Paxos
- Three (five) replicas
- Clients can contact any replica (not just the primary)
- The server appends each client op to a replicated *log* of operations
  - Put, Get, Update, Delete
- Numbered log entries ("instances", seq)
- Paxos agreement on the content of each log entry
- Note: each instance (log entry) is an entirely separate Paxos agreement with entirely separate proposal numbers

Using Paxos to replicate states
[Figure: a KV server on top of a Paxos peer (library) that talks to the other peers. The log is a table with columns "Instance (log entry) #" and "Client op"; the visible entries are GET(a), PUT(a,b), …]

Example 1: Write
[Figure: Kvpaxos servers S1, S2, S3. Client 1 sends PUT(a,b). The three servers agree on "LogEntry 3, PUT(a,b)", and log entry 3 becomes PUT(a,b).]

Example 2: Read
[Figure: Kvpaxos servers S1, S2, S3. Client 2 sends GET(a). The three servers agree on "LogEntry 4, GET(a)"; the log now holds PUT(a,b) followed by GET(a) at entry 4.]
- Scan up to LogEntry 4
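
The "scan up to LogEntry 4" step can be sketched as a replay of the agreed log. This is an illustration rather than the lab's actual Kvpaxos code: the function `replay`, the tuple layout of an operation, and the contents of entries 1 and 2 (which the slides elide) are assumptions.

```python
from typing import Dict, Optional, Tuple

Op = Tuple[str, str, Optional[str]]  # (kind, key, value)


def replay(log: Dict[int, Op], upto: int) -> Tuple[Dict[str, str], Optional[str]]:
    """Apply entries 1..upto in order; return the state and the result of entry `upto`."""
    state: Dict[str, str] = {}
    result: Optional[str] = None
    for seq in range(1, upto + 1):
        if seq not in log:
            # A gap means this instance has not been decided here yet.
            raise LookupError(f"log entry {seq} is not decided yet")
        kind, key, value = log[seq]
        if kind == "PUT":
            state[key] = value
            result = None
        elif kind == "GET":
            result = state.get(key)
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return state, result


# Entries 3 and 4 follow the two examples; entries 1 and 2 are invented filler.
log = {
    1: ("PUT", "x", "1"),
    2: ("GET", "x", None),
    3: ("PUT", "a", "b"),
    4: ("GET", "a", None),
}
state, answer = replay(log, upto=4)
```

Because every replica replays the same entries in the same order, any replica that has the log up to entry 4 returns the same answer for GET(a).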

Consistent during a Partition
[Figure: Kvpaxos servers S1, S2, S3 separated by a network partition. Client 1 issues a GET; Clients 2 and 3 issue POSTs. One side is labeled "Works!", the other "Does not work!".]
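
The slide does not spell out which side works. Under the usual majority rule of Paxos-style replication, it is the side that can still reach more than half of the replicas. A small worked check, assuming a 3-server cluster split into {S1} and {S2, S3} (the split itself is an assumption; `can_commit` is a hypothetical helper):

```python
from typing import Set


def can_commit(side: Set[str], cluster_size: int) -> bool:
    """A side of a partition can agree on new log entries only with a strict majority."""
    return len(side) > cluster_size // 2


cluster = {"S1", "S2", "S3"}
minority_side = {"S1"}
majority_side = {"S2", "S3"}

minority_ok = can_commit(minority_side, len(cluster))
majority_ok = can_commit(majority_side, len(cluster))
```

Two disjoint sides can never both hold a strict majority, which is why the two halves cannot commit conflicting log entries during the partition.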

Chubby Design: System Structure
- Two main components:
  - server (Chubby cell)
  - client library
- (Figure from the Chubby paper)

Design: Files, Dirs, Handles
- FS interface
  - /ls/cs6464-cell/lab2/test
- Specialized API
- Also via the interface used by GFS

Lock Leases
- Session maintained through KeepAlives
- Handles, locks, and cached data remain valid
- Client must acknowledge invalidation messages
- Terminated explicitly, or after lease timeout

Open source alternative: ZooKeeper
[Figure: the ZooKeeper service, six servers with one marked as leader, serving eight clients.]
- All servers store a copy of the data (in memory)
- A leader is elected at startup
- Followers service clients; all updates go through the leader
- Update responses are sent when a majority of servers have persisted the change

Example use of ZooKeeper
[Figure: clients reach the service through a well-known address for ZooKeeper. The image credit is missing in the source.]
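
The "who is the master?" question and the lease rules above can be combined into a short sketch: the master is whoever holds a lock, and the lock survives only as long as the holder keeps renewing its lease. This is an in-memory stand-in, not the Chubby or ZooKeeper API; the class `LeaseLock`, its methods, and the 10-second lease are assumptions, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Optional


class LeaseLock:
    """A single lock with a lease, standing in for a lock-service cell."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def _expire(self, now: float) -> None:
        # A holder that stopped sending KeepAlives loses the lock at lease timeout.
        if self.holder is not None and now >= self.expires_at:
            self.holder = None

    def try_acquire(self, node: str, now: float) -> bool:
        """Return True if `node` holds the lock after this call."""
        self._expire(now)
        if self.holder is None:
            self.holder = node
            self.expires_at = now + self.lease_seconds
        return self.holder == node

    def keep_alive(self, node: str, now: float) -> bool:
        """Extend the lease; fails if the node no longer holds the lock."""
        self._expire(now)
        if self.holder != node:
            return False
        self.expires_at = now + self.lease_seconds
        return True

    def master(self, now: float) -> Optional[str]:
        self._expire(now)
        return self.holder


lock = LeaseLock(lease_seconds=10.0)
a_wins = lock.try_acquire("A", now=0.0)       # A becomes master
b_blocked = lock.try_acquire("B", now=5.0)    # lease still valid, B must wait
renewed = lock.keep_alive("A", now=8.0)       # lease now runs until t=18
still_a = lock.master(now=12.0)
# A crashes here and sends no more KeepAlives.
b_wins = lock.try_acquire("B", now=20.0)      # lease expired, B takes over
a_late = lock.keep_alive("A", now=21.0)       # the old master finds out it lost the lock
```

The last line is the case the FLP intuition warns about: a "dead" master that comes back must learn that its lease is gone before it acts as master again.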

Task scheduling: problems and challenges

Problem: Resource Sharing in Data Centers
- Problem: no single framework is optimal for all applications
- Want to run multiple frameworks in a single cluster
  - … to maximize utilization
  - … to share data between frameworks
[Figure: Hadoop, Pregel, and MPI on a shared cluster. Slide from Lintao Zhang.]

Solution: Resource Scheduler
[Figure: a resource manager in front of the nodes, with Hadoop, Pregel, … each running on some of them. Slide from Lintao Zhang.]

What are the "demands"?
- Multiple users
- Jobs and tasks
- Each has different requirements
- Requests come in over time (online)

What are the "resources"?
- CPU, RAM, disk space
- Networking
- Special constraints
  - Location, colocation
  - Special hardware

Goals for the scheduler (1)
What resources are available?
- Resource tracking (who already has what)
- Failure handling

Goals for the scheduler (2)
Who can get what resource (and when)?
- Fairness
- Improve utilization
- Improve average completion time
- Improve power efficiency
- (Often) conflicting goals

Goals for the scheduler (3)
How can the user access the resource?
- Naming
Other goals
- Ensure user isolation (containers, VMs)
- Allow users to monitor their services
- A description language / UI for resource specs

Task scheduling example: Borg

Borg
- 10+ years @ Google
- Managing millions of machines

Resources managed by Borg
- ~10,000 (median) servers per cell
- Heterogeneous machines
  - Size, processor type, external IPs, performance
  - Special hardware like SSD

Demands
- Jobs and tasks
- Different sizes
- Prod / non-prod
- Online and batch
- Requirement descriptions written in BCL
- Can "update" task requirements
  - Rolling updates

Borg Architecture
(Source: Borg EuroSys paper)

How Borg achieved the goals

Resource Tracking
- Through Borglets (local agents on each machine)
  - Monitoring + execution
- (Logically) single central Borg Master
  - Fault tolerant using Chubby (always knows which is the current master)
  - Records all jobs in a Paxos store

Borg Scheduling Policy
- Priority + admission control
- Uses a scoring mechanism
  - Minimize the cost change when placing a job
  - Vs. "best fit"
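
The slide names a scoring mechanism without giving the formula, so the sketch below shows only the general shape: filter the machines that can fit the task, then rank them with a scoring function. Plain "best fit" (least leftover capacity) is used as the example scorer. This is not Borg's actual cost model, and `Machine`, `Task`, `best_fit_score`, and `place` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Machine:
    name: str
    total_cpu: float
    total_ram_gb: float
    free_cpu: float
    free_ram_gb: float


@dataclass
class Task:
    cpu: float
    ram_gb: float


def feasible(m: Machine, t: Task) -> bool:
    """Step 1: the machine must have enough free resources."""
    return m.free_cpu >= t.cpu and m.free_ram_gb >= t.ram_gb


def best_fit_score(m: Machine, t: Task) -> float:
    """Step 2 (example scorer): fraction of the machine left idle; lower is better."""
    return (m.free_cpu - t.cpu) / m.total_cpu + (m.free_ram_gb - t.ram_gb) / m.total_ram_gb


def place(task: Task, machines: List[Machine],
          score: Callable[[Machine, Task], float] = best_fit_score) -> Optional[str]:
    """Return the name of the lowest-scoring feasible machine, or None."""
    candidates = [m for m in machines if feasible(m, task)]
    if not candidates:
        return None
    return min(candidates, key=lambda m: score(m, task)).name


machines = [
    Machine("m1", total_cpu=16, total_ram_gb=64, free_cpu=16, free_ram_gb=64),
    Machine("m2", total_cpu=16, total_ram_gb=64, free_cpu=4, free_ram_gb=16),
]
small = place(Task(cpu=2, ram_gb=8), machines)    # packs onto the fuller machine
large = place(Task(cpu=8, ram_gb=8), machines)    # only the empty machine fits
too_big = place(Task(cpu=32, ram_gb=1), machines)
```

Keeping the scorer as a parameter makes it easy to compare best fit against a policy that minimizes the change in some cost function, which is the contrast the slide draws.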

Naming
- Borg names a process with an IP address + ports
  - To allow different jobs to run on a single machine
- Should this be done by the scheduler?

Other things Borg handles
- Package distribution (how to copy the binaries to all machines)
- Autoscaling
- Re-packing tasks
- Containers for performance isolation
- Monitoring UI
- Debugging UI
- Tracing integration (later)
- BCL (Borg Configuration Language)
- Local disk management
- ……

Lessons
- The Borg master should be the kernel of the data center
  - Other things can move to separate services
- Should simplify naming and addressing management
- Should have multiple ways to group tasks (not necessarily jobs)
- Too many optimizations for power users; too complicated (230 specifications in BCL)
- Open source: Kubernetes

Task scheduling: Mesos

Mesos Demo

Mesos Architecture
(Slide from Lintao Zhang)

Resource Offering
- Resource offers
  - Offer available resources to frameworks; let them pick which resources to use and which tasks to launch
- Keeps Mesos simple and lets it support future frameworks
- Decentralized decisions might not be optimal
- Optimization: let frameworks short-circuit rejection by providing a predicate on the resources to be offered
  - E.g. "nodes from list L" or "nodes with > 8 GB RAM"
  - Could generalize to other hints as well
(Slide from Lintao Zhang)
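
The offer mechanism, including the predicate optimization, can be sketched in a few lines. This is a toy model and not the Mesos scheduler API: `Node`, `Framework`, `on_offer`, and `offer_round` are hypothetical, and a framework here simply takes the first nodes it is offered.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    ram_gb: int


@dataclass
class Framework:
    name: str
    wants: int                          # how many nodes it still needs
    predicate: Callable[[Node], bool]   # filter registered with the master
    launched: List[str] = field(default_factory=list)

    def on_offer(self, offered: List[Node]) -> List[Node]:
        """The framework, not the master, decides which offered nodes to use."""
        taken = offered[: self.wants]
        self.wants -= len(taken)
        self.launched += [n.name for n in taken]
        return taken


def offer_round(free: List[Node], frameworks: List[Framework]) -> List[Node]:
    """Offer free nodes to each framework in turn; return what is left over."""
    for fw in frameworks:
        if fw.wants == 0:
            continue
        # Only offer nodes the framework could accept, so it never has to reject.
        offered = [n for n in free if fw.predicate(n)]
        taken = fw.on_offer(offered)
        free = [n for n in free if n not in taken]
    return free


nodes = [Node("n1", 4), Node("n2", 16), Node("n3", 32)]
mpi = Framework("mpi", wants=2, predicate=lambda n: n.ram_gb > 8)  # "nodes with > 8 GB RAM"
hadoop = Framework("hadoop", wants=1, predicate=lambda n: True)

leftover = offer_round(nodes, [mpi, hadoop])
```

The master never needs to understand what MPI or Hadoop want beyond the predicate, which is the simplicity argument on the slide. The cost is also visible: the outcome depends on the order in which frameworks receive offers, so the result is not necessarily globally optimal.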

Task scheduling: Sparrow

Problem: what if the scheduler is too slow?
- Distribute it?
- Centralized scheduler: knows the global resource state
- Decentralized schedulers: how do they synchronize state?

[Figure: task durations keep shrinking. 2004: MapReduce batch job; 2009: Hive query; 2010: Dremel query; 2010: in-memory Spark query; 2012: Impala query; 2013: Spark streaming. Time axis: 10 min, 10 sec, 100 ms, 1 ms.]

Scheduler throughput on 1000 16-core machines
(The numbers are consistent with 16,000 cores divided by the task duration.)
- 10 min tasks: 26 decisions/second
- 10 sec tasks: 1.6K decisions/second
- 100 ms tasks: 160K decisions/second
- 1 ms tasks: 16M decisions/second
(Figure from Kay Ousterhout et al., Sparrow presentation)

The problem with multiple schedulers?
[Figure: one job arrives at one of four schedulers, which place tasks on six workers.]
(Figure from Kay Ousterhout et al., Sparrow presentation)

Per-task sampling
[Figure: the same job, schedulers, and workers.]
- Power of Two Choices
(Figure from Kay Ousterhout et al., Sparrow presentation)
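
Per-task sampling with the power of two choices can be simulated in a few lines: for each task the scheduler probes a couple of randomly chosen workers and sends the task to the one with the shorter queue, without any global state. The sketch below covers only this per-task variant from the slide and leaves out the other techniques in the Sparrow design; `place_tasks` and its parameters are hypothetical.

```python
import random
from typing import List


def place_tasks(num_tasks: int, num_workers: int, probes: int,
                rng: random.Random) -> List[int]:
    """Place tasks one by one; return the final queue length of every worker."""
    queues = [0] * num_workers
    for _ in range(num_tasks):
        # Probe a few distinct workers chosen at random.
        candidates = rng.sample(range(num_workers), probes)
        # Send the task to the least loaded of the probed workers.
        target = min(candidates, key=lambda w: queues[w])
        queues[target] += 1
    return queues


rng = random.Random(0)
one_choice = place_tasks(num_tasks=1000, num_workers=100, probes=1, rng=rng)
two_choices = place_tasks(num_tasks=1000, num_workers=100, probes=2, rng=rng)

print("longest queue with 1 probe: ", max(one_choice))
print("longest queue with 2 probes:", max(two_choices))
```

With `probes=1` this is plain random placement. Going from one probe to two typically shortens the longest queue noticeably, which is the reason the slide highlights two choices: most of the benefit of knowing the global state comes from a very small amount of sampling.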