System Management and Monitoring

Deployment, Scheduling, and Monitoring of Big Data Systems
Wei Xu (徐葳)

Goals of this lecture
  • Why system management matters
  • From bare metal to a big data system
  • Maintaining and managing global system state
  • Consistency, and Chubby / ZooKeeper
  • Task scheduling
  • Monitoring hardware and software systems

How to listen to this lecture
  • Me: researcher and practitioner (split personality)

The blood and tears of a system administrator (me)
  • The 200-node cluster I maintain
  • Production support: 100s of researchers running "big data" workloads
  • Systems research: self-driving big data infrastructure
  • Everything managed by …

HPC: the good old days (for sysadmins)
  • Rocks Cluster Rolls

John Boyle. "Biology must develop its own big-data systems." Nature (World View), July 2013.

Demand 1: customers want flexibility …

Motivation 2: customers demand performance …
  • Prof. David Haussler, biologist at UC Santa Cruz: "God Damn I/O!"

We have a variety of applications
  • Scientific image processing
  • Cryo-EM and protein structure
  • Social "big data"
  • Social networking
  • Online education data

Lots of dependencies …
  • Natural language processing
  (* Image courtesy of Prof. Gerard de Melo @ Tsinghua)

Resource hungry too …
  • C, C++, Java
  • Genome analysis

Customer's needs change …
  • Protein design

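System deployment: from bare metal to a big data system
(Source: Juju website)

Basic idea: install one machine, then automatically install all the others
  • Rocks Cluster Rolls
  • Head node, compute nodes
  • No applicable roll? == Sorry

Network and hardware configuration
  • IPMI, DNS, Cisco router, RAID, BMC, VPN, firewall

Solution: customized servers + whole-rack delivery
  • Open Data Center Committee (ODCC) whole rack
  • OCP whole rack
  • * Container-scale delivery and deployment
  (Photo from Lintao Zhang)

Hardware support
  • How do we control a bare-metal machine remotely?
  • Intelligent Platform Management Interface (IPMI)
      Implementation: a dedicated BMC chip
      Functions: reboot the machine, console, voltage, temperature, network connectivity
  • PXE (network boot): BOOTP, TFTP

Operating system and infrastructure
  • CentOS, GPU drivers, LDAP, image servers, Cobbler, local repo, storage, OS optimization, network drivers, SSO, OS drivers, security, storage, SR-IOV

Solution: configuration management
  • Ubuntu MAAS (Metal as a Service)
  • Turn configuration into programs
  • Popular configuration management tools
  • Configuration management: visualization (figure from Juju website)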
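The idea of "turning configuration into programs" can be illustrated with a small sketch. It is not taken from any of the tools named on the slides: the `Node` class, the `DESIRED` table, and the package and service names are all hypothetical, and the "machine" is just an in-memory object. The point it tries to show is that the desired state is declared as data and a converge step applies only the missing pieces, so running it twice is harmless.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """In-memory stand-in for a machine; a real tool would talk to the host."""
    name: str
    packages: set = field(default_factory=set)
    services: set = field(default_factory=set)


# Desired state per role, written as data (all names are made up).
DESIRED = {
    "compute": {"packages": {"gpu-driver", "ldap-client"}, "services": {"sshd"}},
    "head": {"packages": {"ldap-client", "image-server"}, "services": {"sshd", "dhcpd"}},
}


def converge(node: Node, role: str) -> list:
    """Bring `node` to the desired state of `role`; return the actions taken."""
    want = DESIRED[role]
    actions = []
    for pkg in sorted(want["packages"] - node.packages):
        node.packages.add(pkg)
        actions.append(f"install {pkg}")
    for svc in sorted(want["services"] - node.services):
        node.services.add(svc)
        actions.append(f"start {svc}")
    return actions


if __name__ == "__main__":
    n = Node("node-001")
    print(converge(n, "compute"))  # first run performs the work
    print(converge(n, "compute"))  # second run finds nothing to do
```

Because `converge` computes a difference against the current state, the same program can be pointed at a freshly installed node or at one that has drifted, which is roughly why this style scales better than hand-written install scripts.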

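Project requirements and deadlines
  • Phase 0: choose a project and form a team (this week)
  • Phase 1: first meeting with the user; submit the requirements analysis and project plan (November 11)
  • Phase 2: meet with the user at least once every two weeks; submit progress reports (November 25, December 9, December 23)
  • Phase 3: project presentation (in class, December 30)
  • Phase 4: submit the project report (end of week 17)

Course project: comments
  1. In quite a few groups, the background section strays far from the project itself while the parts that matter to the project are skimmed over; it reads a bit like padding.
  2. Several groups need to crawl their own data, and the people writing the crawler are the same people building the system; try not to let this delay the later parts.
  3. Some groups' technical plans are only an introduction to existing technology and are not yet organized, which may affect later progress.
  4. The few assigned volunteer groups all did quite well and deserve encouragement.

Course project: special reminders
  • No plagiarism (citing the source does not make it acceptable)
  • Reusing some images is fine, but the source must be credited

Questions about the Hadoop homework?

Maintaining and managing global system state

Global system state: the challenge
  • GFS master, MapReduce master, Dryad master
  • Problem: which node is the master? What if the master node dies?
  • Solution? Find someone to decide who the master is
  • Problem?

Chubby's solution: a service that provides
  • Synchronization (leader election, shared environment info)
  • Reliability
  • Availability
  • Easy-to-understand semantics
  • Performance, throughput, and latency are only secondary

Primary election
  • A distributed consensus problem
  • Asynchronous communication: loss, delay, reordering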
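To make "find someone to decide who the master is" concrete, here is a minimal sketch of leader election on top of a lock with a lease. `LockService` and its methods are hypothetical and are not Chubby's API; time is passed in explicitly so the behavior is easy to follow; and the sketch runs in a single process, whereas the real service must itself be replicated, which is exactly where consensus comes in.

```python
class LockService:
    """Toy in-memory lock service with lease expiry (not Chubby's real API)."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate: str, now: float, lease: float) -> bool:
        """Grant or renew the lock if it is free, expired, or already ours."""
        if self.holder is None or now >= self.expires_at or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + lease
            return True
        return False

    def current_master(self, now: float):
        """Return the holder while its lease is still valid, else None."""
        if self.holder is not None and now < self.expires_at:
            return self.holder
        return None


if __name__ == "__main__":
    svc = LockService()
    print(svc.try_acquire("A", now=0, lease=10))   # A becomes master
    print(svc.try_acquire("B", now=5, lease=10))   # B is refused while A's lease holds
    print(svc.try_acquire("B", now=12, lease=10))  # A stopped renewing, so B takes over
```

The lease is what answers "what if the master node dies?": a dead master simply stops renewing, and after the timeout another candidate can win the lock.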

Why is it hard? The FLP impossibility result

A model: the Two Generals' Problem
  • Two armies are on opposite sides of a city in the valley
  • The two generals should coordinate the attack; each has an initial value (attack or retreat)
  • The only communication is through messengers, who are prone to being captured or lost in the valley
  • No deterministic algorithm for reaching consensus!
  • Proof by contradiction

Fischer-Lynch-Paterson (FLP)
  • Even if we have reliable message delivery …
  • No consensus can be guaranteed in an asynchronous communication system in the presence of any failures.
  • Intuition: a "failed" process may just be slow, and can rise from the dead at exactly the wrong time.

Paxos: introduction
  • Paxos is an asynchronous consensus algorithm.
  • The FLP result says no asynchronous consensus algorithm can guarantee both safety and liveness.
  • Paxos is guaranteed safe.
      Consensus is a stable property: once reached it is never violated; the agreed value is not changed.
  • Paxos is not guaranteed live.
      Consensus is reached if "a large enough subnetwork ... is non-faulty for a long enough time."
      Otherwise Paxos might never terminate.

Paxos: the name

Paxos consensus model

Leslie Lamport
  • Turing Award, 2013: "fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency"
  • LaTeX
  • Sequential consistency
  • Byzantine fault tolerance
  • Paxos algorithm
  (Photo from Wikipedia)

A Paxos round

Replicated state machine
  • Maintain replicas by executing operations in exactly the same order
  • Requires all replicas to "agree" on the (set and) order of operations
  • The point: if one server fails, we can use the other servers, which have exactly the same state
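The slides show a Paxos round only as a figure, so the following is a minimal single-decree sketch under simplifying assumptions: everything runs in one process, messages are plain function calls, an unreachable acceptor is modeled as `None`, and proposal numbers are positive integers chosen by the caller. The names `Acceptor` and `propose` are made up for this illustration.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0       # highest proposal number promised so far
        self.accepted_n = 0     # number of the accepted proposal, 0 if none
        self.accepted_v = None  # value of the accepted proposal

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, 0, None

    def accept(self, n, v):
        """Phase 2b: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors, n, value):
    """Run one round with proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: collect promises from the reachable acceptors.
    replies = [a.prepare(n) for a in acceptors if a is not None]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered accepted proposal.
    prior_n, prior_v = max(granted, key=lambda r: r[0])
    if prior_n > 0:
        value = prior_v
    # Phase 2: ask the reachable acceptors to accept (n, value).
    acks = sum(a.accept(n, value) for a in acceptors if a is not None)
    return value if acks >= majority else None


if __name__ == "__main__":
    accs = [Acceptor() for _ in range(3)]
    print(propose(accs, 1, "X"))  # first proposer gets its value chosen
    print(propose(accs, 2, "Y"))  # a later proposer must adopt the chosen value
```

The second call illustrates the safety property stated above: once a value is chosen, later rounds can only re-propose it. Liveness is not addressed here; two proposers that keep outbidding each other could loop forever, which matches the FLP caveat.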

Using Paxos
  • Three (five) replicas
  • Clients can contact any replica (not just the primary)
  • The server appends each client op to a replicated *log* of operations
      Put, Get, Update, Delete
  • Numbered log entries ("instances", seq)
  • Paxos agreement on the content of each log entry
  • Note: each instance (log entry) is an entirely separate Paxos agreement with entirely separate proposal numbers

Using Paxos to replicate states
  [Diagram: client ops arrive at the KV server; the Paxos peer (library) agrees with the other peers on the log of instances (log entries): GET(a), PUT(a,b), …]

Example 1: write
  [Diagram: Client 1 sends PUT(a,b) to Kvpaxos server S1; S1, S2, and S3 agree on "LogEntry 3, PUT(a,b)"; each log now holds PUT(a,b) at entry 3]

Example 2: read
  [Diagram: Client 2 sends GET(a); S1, S2, and S3 agree on "LogEntry 4, GET(a)"; the log reads PUT(a,b), GET(a), …; the server scans up to LogEntry 4 to answer]

Consistent during a partition
  [Diagram: Kvpaxos servers S1, S2, S3 and Clients 1, 2, 3 issuing GET, POST, POST across a partition; on one side of the partition requests work, on the other they do not]
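The three diagrams above can be condensed into a small sketch. It makes loud simplifications: the per-entry Paxos agreement is replaced by a stand-in rule ("an entry is decided only if a majority of replicas is reachable"), lagging replicas catch up by copying the longest log in the group, and the names `Replica`, `Cluster`, and `submit` are invented for this illustration rather than taken from any Kvpaxos code.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []  # entries are (op, key, value) tuples

    def read(self, key, upto):
        """Replay the log up to entry `upto` (exclusive) to answer a GET."""
        value = None
        for op, k, v in self.log[:upto]:
            if op == "PUT" and k == key:
                value = v
        return value


class Cluster:
    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}

    def submit(self, via, reachable, op, key, value=None):
        """Append an op through replica `via`, which can reach `reachable`.

        Stand-in for one Paxos instance: the entry is decided only if the
        group (via plus reachable peers) is a majority of all replicas.
        """
        group = set(reachable) | {via}
        if len(group) <= len(self.replicas) // 2:
            raise RuntimeError("no majority: cannot decide log entry")
        # Any majority contains a replica that has seen every decided entry.
        longest = max((self.replicas[n].log for n in group), key=len)
        entry = (op, key, value)
        for n in group:
            self.replicas[n].log = list(longest) + [entry]
        seq = len(self.replicas[via].log)
        return self.replicas[via].read(key, seq) if op == "GET" else None


if __name__ == "__main__":
    c = Cluster(["S1", "S2", "S3"])
    c.submit("S1", ["S2", "S3"], "PUT", "a", "b")
    print(c.submit("S2", ["S1", "S3"], "GET", "a"))  # read goes through the log too
```

Putting the GET itself into the log is what keeps reads consistent: a replica on the minority side of a partition cannot decide the entry, so it cannot return stale data.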

Chubby design: system structure
  • Two main components:
      server (Chubby cell)
      client library
  (Figure from the Chubby paper)

Design: files, dirs, handles
  • FS interface
      /ls/cs6464-cell/lab2/test
  • Specialized API
  • Also via the interface used by GFS

Lock leases
  • Session maintained through KeepAlives
  • Handles, locks, and cached data remain valid
      The client must acknowledge invalidation messages
  • Terminated explicitly, or after lease timeout

Open source alternative: ZooKeeper
  [Diagram: the ZooKeeper service is a group of servers, one of them the leader; many clients connect to the servers]
  • All servers store a copy of the data (in memory)
  • A leader is elected at startup
  • Followers service clients; all updates go through the leader
  • Update responses are sent when a majority of servers have persisted the change

Example use of ZooKeeper
  (Well-known address for ZooKeeper)
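The session rules on the "Lock leases" slide (kept alive by KeepAlives, ended explicitly or by lease timeout) can be sketched as a tiny state holder. The `Session` class below is hypothetical, not the Chubby or ZooKeeper client API, and it takes the current time as an argument instead of reading a clock.

```python
class Session:
    """Toy client session kept alive by periodic KeepAlive calls."""

    def __init__(self, now: float, lease: float):
        self.lease = lease
        self.expires_at = now + lease
        self.closed = False

    def is_valid(self, now: float) -> bool:
        """Handles, locks, and cached data are usable only while this holds."""
        return not self.closed and now < self.expires_at

    def keep_alive(self, now: float) -> bool:
        """Extend the lease if the session is still valid; report success."""
        if not self.is_valid(now):
            return False
        self.expires_at = now + self.lease
        return True

    def close(self) -> None:
        """Explicit termination."""
        self.closed = True


if __name__ == "__main__":
    s = Session(now=0, lease=12)
    print(s.keep_alive(now=10))  # renewed in time
    print(s.is_valid(now=30))    # no KeepAlive since t=10, so the lease ran out
```

A KeepAlive that arrives after the timeout does not revive the session; the client has to start over, which is what lets the service safely hand its locks to someone else.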

Task scheduling: problems and challenges

Problem: resource sharing in data centers
  • No single framework is optimal for all applications
  • Want to run multiple frameworks in a single cluster
      … to maximize utilization
      … to share data between frameworks
  [Diagram: Hadoop, Pregel, and MPI on a shared cluster]
  (Slide from Lintao Zhang)

Solution: resource scheduler
  [Diagram: a resource manager assigns nodes to Hadoop, Pregel, …]
  (Slide from Lintao Zhang)

What are the "demands"?
  • Multiple users
  • Jobs, tasks
  • Each has different requirements
  • Requests come in over time (online)

What are the "resources"?
  • CPU, RAM, disk space
  • Networking
  • Special constraints
      Location, colocation
  • Special hardware

Goals for the scheduler (1)
  • What resources are available?
      Resource tracking (who already has what)
      Failure handling

Goals for the scheduler (2)
  • Who can get what resource (and when)?
      Fairness
      Improve utilization
      Improve average completion time
      Improve power efficiency
  • (Often) conflicting goals

Goals for the scheduler (3)
  • How can the user access the resource?
      Naming
  • Other goals
      Ensure user isolation (containers, VMs)
      Allow users to monitor their services
      A description language / UI for resource specs
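"Fairness" is listed as a goal without a definition. One common definition, used here purely as an assumption and not taken from the slides, is max-min fairness on a single resource: nobody receives more than they ask for, and whatever a small user leaves unused is split among the users who still want more. The function name `max_min_fair` and the example numbers are made up.

```python
def max_min_fair(capacity: float, demands: dict) -> dict:
    """Split `capacity` among users: nobody gets more than requested, and
    unused share is redistributed to users that still want more."""
    alloc = {u: 0.0 for u in demands}
    remaining = capacity
    unsatisfied = {u for u, d in demands.items() if d > 0}
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for u in sorted(unsatisfied):
            give = min(share, demands[u] - alloc[u])
            alloc[u] += give
            remaining -= give
        unsatisfied = {u for u in unsatisfied if demands[u] - alloc[u] > 1e-9}
    return alloc


if __name__ == "__main__":
    # 10 CPUs shared by three users asking for 2, 4, and 10 CPUs.
    print(max_min_fair(10, {"a": 2, "b": 4, "c": 10}))
```

In this example the small users are fully served and the large one gets what is left, which also hints at why the goals conflict: a policy tuned only for utilization or completion time could reasonably choose a different split.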

Task scheduling example: Borg

Borg
  • 10+ years @ Google
  • Managing millions of machines

Resources managed by Borg
  • ~10,000 (median) servers per cell
  • Heterogeneous machines
      Size, processor type, external IPs, performance
      Special hardware like SSD

Demands
  • Jobs, tasks
  • Different sizes
  • Prod / non-prod
  • Online and batch
  • Requirement descriptions written in BCL
  • Can "update" task requirements
      Rolling updates

Borg architecture
  (Source: Borg EuroSys paper)

How Borg achieved the goals

Resource tracking
  • Through Borglets (local agents on each machine)
      Monitoring + execution
  • (Logically) single central Borg master
      Fault tolerant using Chubby (always knows which is the current master)
      Records all jobs in a Paxos store

Borg scheduling policy
  • Priority + admission control
  • Uses a scoring mechanism
      Minimize the cost change when placing a job
      vs. "best fit"
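The contrast between a scoring mechanism and plain "best fit" can be shown with a pluggable score function. This is only a sketch: neither score below is Borg's actual cost function, resources are in made-up normalized units so that CPU and RAM can be added together, and `Machine`, `place`, `best_fit_score`, and `stranding_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cpu: float
    free_ram: float


def feasible(m, cpu, ram):
    """A machine is a candidate only if the task fits."""
    return m.free_cpu >= cpu and m.free_ram >= ram


def best_fit_score(m, cpu, ram):
    """Lower is better: total resources left over after placement."""
    return (m.free_cpu - cpu) + (m.free_ram - ram)


def stranding_score(m, cpu, ram):
    """Lower is better: imbalance between leftover CPU and leftover RAM."""
    return abs((m.free_cpu - cpu) - (m.free_ram - ram))


def place(machines, cpu, ram, score):
    """Pick the feasible machine with the lowest score, or None if none fits."""
    candidates = [m for m in machines if feasible(m, cpu, ram)]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda m: score(m, cpu, ram))
    chosen.free_cpu -= cpu
    chosen.free_ram -= ram
    return chosen.name


if __name__ == "__main__":
    fleet = [Machine("A", 2, 3), Machine("B", 6, 6)]
    print(place(fleet, 2, 2, best_fit_score))
```

With the same fleet and task, best fit packs the task onto the tightest machine, while the stranding score prefers the machine whose leftover stays balanced; swapping the score function changes the policy without touching the feasibility check.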

Naming
  • Borg names a process with an IP address + ports
      To allow different jobs to run on a single machine
  • Should this be done by the scheduler?

Other things Borg handles
  • Package distribution (how to copy the binaries to all machines)
  • Autoscaling
  • Re-packing tasks
  • Containers for performance isolation
  • Monitoring UI
  • Debugging UI
  • Tracing integration (later)
  • BCL (Borg Configuration Language)
  • Local disk management
  • ……

Lessons
  • The Borg master should be the kernel of the data center
      Other things can move to separate services
  • Should simplify naming and addressing management
  • Should have multiple ways to group tasks (not necessarily jobs)
  • Too many optimizations for power users; too complicated (230 specifications in BCL)

Open source: Kubernetes

Task scheduling: Mesos

Mesos demo

Mesos architecture
  (Slide from Lintao Zhang)

Resource offering
  • Resource offers
      Offer available resources to frameworks; let them pick which resources to use and which tasks to launch
  • Keeps Mesos simple, lets it support future frameworks
  • Decentralized decisions might not be optimal
  • Optimization: let frameworks short-circuit rejection by providing a predicate on the resources to be offered
      E.g. "nodes from list L" or "nodes with > 8 GB RAM"
      Could generalize to other hints as well
  (Slide from Lintao Zhang)
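The offer mechanism and the predicate optimization described above can be sketched in a few lines. This is not the Mesos scheduler API: `Offer`, `Framework`, `offer_round`, and the `wants` predicate are invented names, and every framework here simply accepts whatever offer reaches it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Offer:
    node: str
    cpus: int
    ram_gb: int


class Framework:
    def __init__(self, name: str, wants: Optional[Callable[[Offer], bool]] = None):
        self.name = name
        self.wants = wants    # optional filter registered with the master
        self.launched = []    # nodes on which this framework launched a task

    def on_offer(self, offer: Offer) -> bool:
        """Framework-side decision: here, always accept and launch a task."""
        self.launched.append(offer.node)
        return True


def offer_round(free: list, frameworks: list) -> int:
    """Offer each free resource to frameworks in order; return offers sent."""
    sent = 0
    for offer in free:
        for fw in frameworks:
            if fw.wants is not None and not fw.wants(offer):
                continue  # short-circuit: skip an offer that would be rejected
            sent += 1
            if fw.on_offer(offer):
                break
    return sent


if __name__ == "__main__":
    big = Framework("big-memory", wants=lambda o: o.ram_gb > 8)
    anything = Framework("anything")
    free = [Offer("n1", 4, 4), Offer("n2", 8, 16)]
    print(offer_round(free, [big, anything]), big.launched, anything.launched)
```

The master never needs to understand what a framework is trying to optimize; it only evaluates the predicate, which is the sense in which offers keep Mesos simple while the placement decision stays with the framework.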

Task scheduling: Sparrow

Problem: what if the scheduler is too slow? Go distributed?
  • Centralized scheduler: knows the global resource state
  • Decentralized schedulers: how do they synchronize state?

Task durations keep shrinking
  • 2004: MapReduce batch job
  • 2009: Hive query
  • 2010: Dremel query
  • 2012: Impala query
  • 2010: in-memory Spark query
  • 2013: Spark Streaming

Scheduler throughput needed on 1000 16-core machines

  Task duration    Scheduler throughput
  10 min           26 decisions/second
  10 sec           1.6K decisions/second
  100 ms           160K decisions/second
  1 ms             16M decisions/second

  (1000 × 16 = 16,000 cores; with one task per core, the required rate is roughly 16,000 divided by the task duration, e.g. 16,000 / 600 s ≈ 26 per second.)
  (Figure from Kay Ousterhout et al., Sparrow presentation)

Problems with multiple schedulers?
  [Diagram: a job arrives at one of several schedulers, each of which can place tasks on any worker]
  (Figure from Kay Ousterhout et al., Sparrow presentation)

Per-task sampling
  [Diagram: for each task, the scheduler probes a sample of workers]
  • Power of Two Choices
  (Figure from Kay Ousterhout et al., Sparrow presentation)
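The "power of two choices" idea behind per-task sampling can be tried with a small simulation. It is a simplified model rather than Sparrow itself: tasks never finish, queue length is the only load signal, and `place_tasks` and its parameters are made-up names. With `probes=1` each task goes to a random worker; with `probes=2` it goes to the shorter of two randomly probed queues.

```python
import random


def place_tasks(num_workers: int, num_tasks: int, probes: int, seed: int = 0):
    """Assign each task to the shortest queue among `probes` sampled workers."""
    rng = random.Random(seed)
    queues = [0] * num_workers
    for _ in range(num_tasks):
        sampled = [rng.randrange(num_workers) for _ in range(probes)]
        target = min(sampled, key=lambda w: queues[w])
        queues[target] += 1
    return queues


if __name__ == "__main__":
    for d in (1, 2):
        q = place_tasks(1000, 1000, probes=d)
        print(f"probes={d}: longest queue = {max(q)}")
```

No scheduler needs global state here: each decision touches only the workers it probes, which is what lets many schedulers run in parallel and reach the decision rates in the table above.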
