OS-caused Long JVM Pauses - Deep Dive and Solutions

Outline
- Introduction
- Background
- Scenario 1: startup state
- Scenario 2: steady state with memory pressure
- Scenario 3: steady state with heavy IO
- Lessons learned

Introduction
- Java + Linux
  - Java is popular in production deployments
  - Linux features interact with JVM operations
  - Unique challenges caused by concurrent applications
- Long JVM pauses caused by Linux OS
  - Production issues, in three scenarios
  - Root causes
  - Solutions
- References
  - Ensuring High-performance of Mission-critical Java Applications in Multi-tenant Cloud Platforms, IEEE Cloud 2014
  - Eliminating Large JVM GC Pauses Caused by Background IO Traffic, LinkedIn Engineering Blog, 2016 (Too many tweets bringing down a twitter server! :)

Background
- JVM and Heap
  - Oracle HotSpot JVM
- Garbage collection
  - Generations
  - Garbage collectors
- Linux OS
  - Paging (Regular page, Huge page)
  - Swapping (Anonymous memory)
  - Page cache writeback (Batched, Periodic)

Scenarios
- Three scenarios
  - Startup state
  - Steady state with memory pressure
  - Steady state with heavy IO
- Workload
  - Java application keeps allocating and de-allocating objects
  - Background applications consume memory or issue disk IO
- Performance metrics
  - Application throughput (K allocations/sec)
  - Java GC pauses

Scenario 1: Startup State (App. Symptoms)
- When Java applications start
- Life is good in the beginning
- Then Java throughput drops sharply
- Java GC pauses spike during the same period

Scenario 1: Startup State (Investigations)
- Java heap is gradually allocated
- Without enough memory, direct page scanning can happen
- Heap is swapped out and in
- This causes large GC pauses
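The investigation above depends on seeing swap activity and direct page scanning while the heap grows. A minimal sketch of one way to check this, assuming the kernel exposes the `pswpin`, `pswpout` and `pgscan_direct*` counters in `/proc/vmstat` (exact counter names vary by kernel version); the function names and sample numbers are hypothetical and not taken from the talk:

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def direct_scans(counters):
    """Sum the pgscan_direct* counters (older kernels split them per zone).

    pgscan_direct_throttle counts throttling events, not scanned pages,
    so it is skipped.
    """
    return sum(
        value
        for name, value in counters.items()
        if name.startswith("pgscan_direct") and name != "pgscan_direct_throttle"
    )


def swap_and_scan_delta(before, after):
    """Return the activity that happened between two counter snapshots."""
    return {
        "swapped_in": after.get("pswpin", 0) - before.get("pswpin", 0),
        "swapped_out": after.get("pswpout", 0) - before.get("pswpout", 0),
        "direct_scanned": direct_scans(after) - direct_scans(before),
    }


# Made-up snapshots for illustration. On a Linux host each snapshot would
# come from open("/proc/vmstat").read(), taken a few seconds apart.
BEFORE = "pswpin 100\npswpout 250\npgscan_direct 4000\n"
AFTER = "pswpin 180\npswpout 900\npgscan_direct 9000\n"

delta = swap_and_scan_delta(parse_vmstat(BEFORE), parse_vmstat(AFTER))
print(delta)
```

Non-zero deltas during a GC pause would be consistent with the heap being swapped and with direct page scanning, but they are only a hint, not proof of the root cause.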

Solutions
- Pre-allocating JVM heap spaces
  - JVM flag "-XX:+AlwaysPreTouch"
- Protecting JVM heap spaces from being swapped out
  - swapoff command
  - Swappiness
    - =0 for kernel versions before [version missing in source]
    - =1 for kernel versions from [version missing in source]
  - Cgroup

[Figure: Evaluations (Pre-allocating Heap)]

[Figure: Evaluations (Protecting Heap)]

Scenario 2: Steady State (App. Symptoms)
- During the steady state of a Java application, system memory comes under pressure from other applications
- Java throughput drops sharply and the application performs badly
- Java GC pauses spike

Scenario 2: Steady State (Level-1 Investigations)
- During GC pauses, swapping activity persists
- Swapping in JVM pages causes GC pauses
  - GC log excerpt (truncated in source): ... real=54.83 secs]
  - High sys-cpu usage, which swapping alone does not explain (swapping is not sys-cpu intensive)

Scenario 2: Steady State (Level-2 Investigations)
- THP (Transparent Huge Pages)
  - Improved TLB cache hits
- Bi-directional operations
  - THPs are allocated first, but split under memory pressure
  - Regular pages are collapsed to make THPs
  - CPU heavy, and thrashing!

[Figure: regular pages (4KB each) are collapsed into a 2MB Transparent Huge Page (THP); a THP is split back into regular pages]
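To make the page sizes above concrete, here is a small sketch of the arithmetic behind splitting and collapsing, together with a helper that reads the active THP mode. It assumes the usual sysfs format, in which `/sys/kernel/mm/transparent_hugepage/enabled` lists all modes and brackets the active one; the function names are hypothetical:

```python
REGULAR_PAGE_BYTES = 4 * 1024          # 4KB regular page
HUGE_PAGE_BYTES = 2 * 1024 * 1024      # 2MB transparent huge page


def pages_per_thp():
    """Number of regular pages that one THP is split into or collapsed from."""
    return HUGE_PAGE_BYTES // REGULAR_PAGE_BYTES


def parse_thp_mode(text):
    """Return the bracketed (active) mode from e.g. 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active THP mode found in: %r" % text)


print(pages_per_thp())
print(parse_thp_mode("[always] madvise never"))
```

Each split or collapse therefore touches 512 regular pages, which gives a sense of why repeated splitting and collapsing under memory pressure is CPU heavy.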

Solutions
- Dynamically adjusting THP
  - Enable THP when there is no memory pressure
  - Disable THP during periods of memory pressure
  - Fine tuning of THP parameters

Evaluations (Dynamic THP)
- Without memory pressure
  - Dynamic THP delivers performance similar to THP on

    Mechanism                        THP Off   THP On   Dynamic THP
    Throughput (K allocations/sec)   12        15       15

- With memory pressure
  - Dynamic THP has some performance overhead
  - Performance is lower than with THP off
  - But better than with THP on

    Mechanism                        THP Off   THP On   Dynamic THP
    Throughput (K allocations/sec)   13        11       12
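The slides do not say how memory pressure is detected, so the following is only a sketch of the enable/disable idea under stated assumptions: pressure is approximated by the `MemAvailable` to `MemTotal` ratio from `/proc/meminfo`, the 10% threshold is arbitrary, and all function names are hypothetical. Writing the sysfs file requires root.

```python
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style 'Name:   value kB' lines into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def choose_thp_mode(meminfo, min_available_fraction=0.10):
    """Pick 'never' under memory pressure, 'always' otherwise (assumed policy)."""
    under_pressure = (
        meminfo["MemAvailable"] < meminfo["MemTotal"] * min_available_fraction
    )
    return "never" if under_pressure else "always"


def apply_thp_mode(mode, path=THP_ENABLED_PATH):
    """Write the chosen mode to sysfs. Needs root; not called in this sketch."""
    with open(path, "w") as handle:
        handle.write(mode)


# Made-up samples; on Linux the text would come from open("/proc/meminfo").read().
RELAXED = "MemTotal:       16384000 kB\nMemAvailable:    8000000 kB\n"
PRESSURED = "MemTotal:       16384000 kB\nMemAvailable:    1000000 kB\n"

print(choose_thp_mode(parse_meminfo_kb(RELAXED)))
print(choose_thp_mode(parse_meminfo_kb(PRESSURED)))
```

A real controller would likely add hysteresis so that the mode does not flip back and forth near the threshold, which would itself cause splitting and collapsing.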

Scenario 3: Steady State (Heavy IO)
- Production issue
  - Online products
  - Applications have light workload
  - Both CMS and G1 garbage collectors
- Preliminary investigations
  - Examined many layers/metrics
  - The only suspect: disk IO is occasionally heavy
  - But all application IO is asynchronous

Reproducing the problem
- Workload
  - Simplified to avoid complex business logic
  - s://github/zhenyun/JavaGCworkload
- Background IO
  - Saturating HDD

Case I: Without background IO
- No single pause longer than 200 ms

Case II: With background IO
- Huge pause!

[Figure: Investigations]

Time lines
- At time 35.04 (line 2), a young GC starts and takes 0.12 seconds to complete.
- The young GC finishes at time 35.16, and the JVM tries to output the young GC statistics to the GC log file by issuing a write() system call (line 4).
- The write() call finishes at time 36.64 after being blocked for 1.47 seconds (line 5).
- When the write() call returns to the JVM, the JVM records at time 36.64 this stop-the-world (STW) pause of 1.59 seconds (i.e., 0.12 + 1.47) (line 3).

[Figure: Interaction between JVM and OS]

Non-blocking IO can be blocked
- Stable page write
  - For file-backed writing, the OS writes to the page cache first
  - The OS has a write-back mechanism to persist dirty pages
  - If a page is under write-back, the page is locked
- Journal committing
  - Journals are generated for a journaling file system
  - When appending to GC log files needs new blocks, journals need to be committed
  - The commit might need to wait
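One way to observe the blocking described above is to time individual buffered appends to a file on the disk in question. This is a minimal sketch, not the workload used in the talk; the file name, payload, write count and the 0.2-second threshold are all assumptions chosen for illustration:

```python
import os
import tempfile
import time


def timed_appends(path, payload, count):
    """Append payload to path `count` times and return each write() latency in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for _ in range(count):
            start = time.monotonic()
            os.write(fd, payload)  # buffered write: normally lands in the page cache
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return latencies


def slow_writes(latencies, threshold_secs=0.2):
    """Return (index, latency) pairs for writes slower than the threshold."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold_secs]


# Demo against a temporary file; point `path` at the GC log disk to test it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fake-gc.log")
    samples = timed_appends(path, b"[GC pause (young) 0.12 secs]\n", 100)
    print("max write latency: %.6f secs" % max(samples))
    print("slow writes:", slow_writes(samples))
```

On an idle disk every write is expected to return almost immediately. Running the same loop while another process saturates the disk is where long write() calls would be expected to show up.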

Background IO activities
- OS activity such as swapping
  - Data is written to the underlying disks
- Administration and housekeeping software
  - System-level software such as CFEngine also performs disk IO
- Other co-located applications
  - Co-located applications that share the disk drives contend for IO
- IO of the same JVM instance
  - The particular JVM instance may use disk IO in ways other than GC logging

Solutions
- Enhancing JVM
  - Another thread
  - Exposing JVM flags
- Reducing IO activities
  - OS, other apps, same app
- Latency sensitive applications
  - Separate disk
  - High performing disks such as SSD
  - Tmpfs

Evaluation
- SSD as the disk

The good, the bad, and the ugly
- The good: low real time
  - Low user time and low sys time
  - [user=0.18 sys=0.01, real=0.04 secs]
- The bad: non-low (but not high) real time
  - High user time and low sys time
  - [user=8.00 sys=0.02, real=0.50 secs]
- The ugly: high real time
  - High sys time [user=0.02 sys=1.20, real=1.20 secs]
  - Low sys time, low user time [Example? ]
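The three categories above can be applied mechanically to GC log lines. The sketch below parses the `user=... sys=..., real=... secs` fragment shown on the slide and labels it. The thresholds (0.1 and 1.0 seconds, and the "half of real time" rule) are assumptions made for illustration; the slides give no numeric cut-offs.

```python
import re

TIMES_RE = re.compile(
    r"user=(\d+\.\d+)\s+sys=(\d+\.\d+),\s+real=(\d+\.\d+)\s+secs"
)


def parse_times(line):
    """Return (user, sys, real) in seconds from a GC log line, or None if absent."""
    match = TIMES_RE.search(line)
    if match is None:
        return None
    user, sys_time, real = (float(group) for group in match.groups())
    return user, sys_time, real


def classify(user, sys_time, real, low_real=0.1, high_real=1.0):
    """Label a pause as good, bad or ugly using assumed thresholds."""
    if real < low_real:
        return "good"
    if real < high_real:
        return "bad"
    if sys_time >= 0.5 * real:
        return "ugly: high sys time"
    if user + sys_time < 0.5 * real:
        return "ugly: low user and sys time"
    return "ugly"


for sample in (
    "[user=0.18 sys=0.01, real=0.04 secs]",
    "[user=8.00 sys=0.02, real=0.50 secs]",
    "[user=0.02 sys=1.20, real=1.20 secs]",
):
    print(sample, "->", classify(*parse_times(sample)))
```

With these thresholds the three slide examples come out as good, bad and ugly (high sys time) respectively. The cut-offs would need tuning against each application's latency target.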

Lessons Learned (I)
- Be cautious about new features in Linux (and other OSes)
  - Constantly incorporating new features to optimize performance
  - Some features ...
