版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
FunctionalComparisonandPerformance
EvaluationOverview
Streaming
Core
MISC
Performance
BenchmarkChooseyour
weapon
!2*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Execution
Model
+Fault
ToleranceMechanismContinuous
StreamingMicro-BatchApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache
SparkStreaming*Apache
StormTrident*Source
Operator
SinkSourceOperatorSink4*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Continuous
StreamingAckperRecordContinuous
StreamingMicro-BatchCheckpoint
“perBatch”Checkpoint
perBatchApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache
SparkStreaming*Apache
StormTrident*StorageStorageStorageSourceOperatorSinkSourceOperatorSinkSourceoffsetOperator
Sinkoffsetstate
strackstate
str
job
statusididJobManager/HDFSAckerDriverHDFSThisisthecriticalpart,asitaffects
many
features5*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Continuous
StreamingAckperRecordContinuous
StreamingMicro-BatchCheckpoint
“perBatch”Checkpoint
perBatchApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache
SparkStreaming*Apache
StormTrident*LowLatencyHighLatencyLowOverheadHighThroughputHighOverheadLowThroughput6*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Delivery
GuaranteeApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache
SparkStreaming*Apache
StormTrident*Atleast
onceExactlyonce?
Ackersknow
about
ifarecord
is
processedsuccessfullyor
not.
Ifitfailed,replayit.?
State
is
persisted
indurablestorage?
Checkpointis
linked
withstate
storage
perBatch?
Thereis
nostateconsistency
guarantee.7*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Native
StateOperatorApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache
SparkStreaming*Apache
StormTrident*Yes*YesYes?
Storm:?
FlinkJava
API:
ValueState
ListState?Spark1.5:
KeyValueStateupdateStateByKey
ReduceState?
Spark1.6:
mapWithState?
Heron:X
UserMaintain?
FlinkScala
API:
mapWithState?
Trident:
persistentAggregateState?
Gearpump
persistState8*Other
names
and
brands
maybe
claimed
asthe
property
of
others.APIApacheStorm*TwitterHeron*CompositionalApacheGearpump*?
Highly
customizable
operator
basedonbasic
building
blocks?
Manualtopology
definition
andoptimizationTopologyBuilderbuilder=newTopologyBuilder();builder.setSpout(“input",newRandomSentenceSpout(),1);builder.setBolt("split",newSplitSentence(),3).shuffleGrouping("spout");builder.setBolt("count",newWordCount(),
2).fieldsGrouping("split",newFields("word"));“foo,foo,
bar”“foo”,“foo”,“bar”{“foo”:
2,
“bar”:1}SpoutBoltBolt10*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Apache
SparkStreaming*Apache
StormTrident*DeclarativeAapcheFlink*ApacheGearpump*?
Higher
order
function
asoperators
(filter,
mapWithState…)?
Logical
planoptimizationDataStream<String>text=env.readTextFile(params.get("input"));DataStream<Tuple2<String,Integer>>counts
=text.flatMap(newTokenizer()).keyBy(0).sum(1);“foo,foo,
bar”“foo”,“foo”,“bar”{“foo”:
1,
“foo”:1,
“bar”:1}{“foo”:
2,
“bar”:1}11*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Statistical?
Data
scientistfriendly?
Dynamic
typeApache
SparkStreaming*ApacheStorm*TwitterHeron*?StructuredStreaming*?ApacheStorm*PythonRlines<-textFile(sc,
“input”)words<-flatMap(lines,function(line)
{lines=ssc.textFileStream(params.get("input"))words=lines.flatMap(lambdaline:line.split(“,"))pairs=words.map(lambdaword:(word,
1))counts=pairs.reduceByKey(lambdax,y:
x
+y)counts.saveAsTextFiles(params.get("output"))strsplit(line,“
”)[[1]]})wordCount<-lapply(words,function(word)
{list(word,1L)}counts<-reduceByKey(wordCount,“+”,
2L)12*Other
names
and
brands
maybe
claimed
asthe
property
of
others.SQLApache
SparkStreaming*AapcheStructuredStreamingApache
StormTrident*PureStyleFusionStyleFlink*CREATE
EXTERNALTABLEInputDStream.transform((rdd:
RDD[Order],time:
Time)
=>{import
sqlContext.implicits._rdd.toDF.registAsTempTableval
SQL
="SELECT
ID,UNIT_PRICE
*QUANTITYAS
TOTAL
FROM
ORDERS
WHERE
UNIT_PRICE
*QUANTITY>
50"ORDERS
(ID
INTPRIMARY
KEY,UNIT_PRICE
INT,QUANTITYINT)LOCATION
'kafka://localhost:2181/brokers?topic=orders'TBLPROPERTIES
'{...}}‘INSERT
INTO
LARGE_ORDERS
SELECT
ID,UNIT_PRICE
*QUANTITYAS
TOTAL
FROM
ORDERS
WHERE
UNIT_PRICE
*QUANTITY>50val
largeOrderDF
=
sqlContext.sql(SQL)largeOrderDF.toRDD})bin/storm
sql
XXXX.sql13*Other
names
and
brands
maybe
claimed
asthe
property
of
others.SummaryCompositionalDeclarativePython/RSQLApache
SparkStreaming*X√√√ApacheStorm*NOTsupportaggregation,windowing
andjoining√X√X√X√√√X√XApache
StormTrident*ApacheGearpump*XXAapcheFlink*Supportselect,
from,where,unionXTwitterHeron*√?X14*Other
names
and
brands
maybe
claimed
asthe
property
of
others.RuntimeModelTwitterHeron*?
Single
Taskon
SingleProcessJVMProcessJVMProcessConnectwithlocalSMConnectwithlocalSMTaskTaskThreadThreadThreadThreadAapcheFlink*?
MultiTasksof
MultiApplications
onSingle
ProcessJVMProcessJVMProcessTaskTaskTaskTaskTaskThreadThreadThreadTaskThreadThreadTasktask
from
application
Atask
from
application
B16*Other
names
and
brands
maybe
claimed
asthe
property
of
others.?
MultiTasksof
Singleapplication
on
Single
ProcessApache
SparkStreaming*o
Single
taskonsinglethreadJVMProcessJVMProcessTaskTaskTaskTaskTaskThreadThreadThreadThreadThreadApacheStorm*Apache
StormTrident*ApacheGearpump*o
Multi
tasksonsinglethreadJVMProcessJVMProcessTaskTaskTaskTaskTaskTaskTaskTaskThreadThreadThreadThread17*Other
names
and
brands
maybe
claimed
asthe
property
of
others.●Window
Support
●Out-of-order
Processing
●Memory
Management●ResourceManagement
●WebUI
●CommunityMaturityWindow
Supportsmaller
than
gapttSliding
WindowCount
Window?Session
Window??session
gapSession
WindowSlidingWindowCountWindowApache
SparkStreaming*Apache√√X√√X√XX?XXX√Storm*Apache
StormTrident*√Apache√?√Gearpump*Apache
Flink*ApacheHeron*XX19*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Out-of-orderProcessingProcessing
TimeEvent
TimeWatermarkApache
Spark√√√√√√√?√X?√Streaming*ApacheStorm*Apache
StormTrident*X√X√ApacheGearpump*AapcheFlink*√√TwitterHeron*XX20*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Memory
ManagementJVM
ManageSelf
Manage
on-heapSelf
Manage
off-heapApache
Spark√√√√√√?√√?√Streaming*AapcheFlink*ApacheStorm*XXXXXXApacheGearpump*TwitterHeron*21*Other
names
and
brands
maybe
claimed
asthe
property
of
others.ResourceManagementStandaloneYARNMesosApache
Spark√√√√√√√√?√?√√√?√?XStreaming*ApacheStorm*Apache
StormTrident*ApacheGearpump*AapcheFlink*√XTwitterHeron*√√22*Other
names
and
brands
maybe
claimed
asthe
property
of
others.WebUISubmitJobsCancelJobsInspectJobsShowStatisticsShowInput
RateCheckExceptionsInspectConfigAlertApacheSparkXX√√X√√√√X√√√√√√√√√√√√?√?X√√√√√√√√√√XXXXXStreaming*ApacheStorm*ApacheGearpump*ApacheFlink*TwitterHeron*√?23*Other
names
and
brands
maybe
claimed
asthe
property
of
others.Past
1Months
Summary
onGitHubCommunity
Maturity10008006004002000CommitsCommittor780ApacheInitiationTimeContributorsTopProjectApacheSpark21718413020132011201420102014201492621921102213420205Streaming*SparkStorm
GearpumpFlinkHeronApacheStorm*Source
website:
/apache/spark/pulse/monthlyPast
3Months
Summary
onJIRAApacheGearpump*Incubator2015Created514Resloved25002000150010005002161ApacheFlink*2084423716177TwitterHeron*2014N/A0SparkStorm
GearpumpFlinkHeronSource
website:
/jira/secure/Dashboard.jspa24*Other
names
andbrandsmay
beclaimed
astheproperty
ofothers.Inteldoes
notcontrolor
auditthird-party
benchmark
dataorthe
websites
referenced
inthisdocument.
Youshouldvisitthereferenced
websiteandconfirmwhether
referenced
dataareaccurate.HiBench
6.0Test
Philosophical?
“LazyBenchmarking”?
Simple
test
caseinfer
practicalusecaseCluster
SetupApacheKafka*
ClusterNameVersion?CPU:
2
xIntel(R)
Xeon(R)
CPU
E5-2699
v3@
2.30GHzMem:
128
GBDisk:8
xHDD
(1TB)Network:
10
Gbpsx3x7Java...1???ScalaApache
Hadoop*Apache
Zookeeper*Apache
Kafka*Apache
Spark*Apache
Storm*Apache
Flink*Apache
Gearpump*Test
Cluster?CPU:
2
xIntel(R)
Xeon(R)
CPU
E5-2697
v2@
2.70GHzCore:
20
/
24Mem:
80
/128
GBDisk:8
xHDD
(1TB
)Network:
10
Gbps??????Apache
Heron*
require
specific
Operation
System
(Ubuntu/CentOS/MacOS)Structured
Streaming
doesn’tsupport
Kafka
sourceyet
(Spark
2.0)27*Other
names
and
brands
maybe
claimed
asthe
property
of
others.ArchitectureTestCluster(Standalone)KafkaBrokerDataGeneratorKafkaBrokerClientMasterTopicAInTimeTopicAKafkaBrokerSlaveSlaveSlave20Core80GMem20Core80GMem20Core80GMemSlaveSlaveSlaveSlaveOutTime20Core80GMem20Core80GMem20Core80GMem20Core80GMemFileSystemResultMetricsReaderOutTime–InTimeFramework
ConfigurationFrameworkRelated
ConfigurationApache
SparkStreaming*7Executor140Parallelism7TaskManager140ParallelismAapcheFlink*ApacheStorm*28Worker140KafkaSpoutApacheGearpump*28Executors140KafkaSource29*Other
names
and
brands
maybe
claimed
asthe
property
of
others.RawInputData?
Kafka
Topic
Partition:140?
SizePerMessage(configurable):
200bytes?
RawInput
MessageExample:“0,6,nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi,1991-06-10,0.115967035,Mozilla/5.0
(iPhone;
U;CPUlike
Mac
OS
X)AppleWebKit/420.1
(KHTML
like
Gecko)
Version/3.0
Mobile/4A93Safari/419.3,YEM,YEM-AR,snowdrops,1”?
Strong
Type:
classUserVisit
(ip,
sessionId,browser)5
minutes?
Keepfeedingdataat
specific
ratefor
5minutes30Data
InputRateThroughputMessage/SecondKafkaProducerNum40KB/s400KB/s4MB/s0.2K2K1120K200K400K2M140MB/s80MB/s400MB/s600MB/s800MB/s111015203M4MTest
Case:IdentityTheapplication
readsinput
datafrom
Kafka
andthen
writesresult
toKafka
immediately,
thereisno
complexbusinesslogic
involved.ResultP99
Latency
(s)8765432100100200300InputRate(MB/s)ApacheFlink*400500600700800ApacheSpark*ApacheStorm*withoutAckApacheStorm*withAck*Other
namesand
brands
maybe
claimed
astheproperty
ofothers.Formore
completeinformationaboutperformanceandbenchmark
results,visit
/benchmarks.Results
havebeenestimated
orsimulatedusing
internal
Intelanalysisorarchitecturesimulationormodeling,andprovided
to
youforinformationalpurposes.
Any
differencesinyoursystemhardware,
software
orconfigurationmayaffect
youractual
performance.TestCase:RepartitionBasically,
thistestcasecan
standfor
theefficiency
of
datashuffle.NetworkShuffleResultP99
Latency
(s)Throughput
(MB/s)8006004002000400300200100000200400600800200400600800InputRate(MB/s)InputRate(MB/s)ApacheSpark*ApacheFlink*ApacheSpark*ApacheFlink*ApacheStorm*withoutAckApacheGearpump*ApacheStorm*withAckApacheStorm*withoutAckApacheGearpump*ApacheStorm*withAck*Other
namesand
brands
maybe
claimed
astheproperty
ofothers.Formore
completeinformationaboutperformanceandbenchmark
results,visit
/benchmarks.Results
havebeenestimated
orsimulatedusing
internal
Intelanalysisorarchitecturesimulationormodeling,andprovided
to
youforinformationalpurposes.
Any
differencesinyoursystemhardware,
software
orconfigurationmayaffect
youractual
performance.Observation?
SparkStreaming
needto
scheduletaskwithadditionalcontext.Under
tinybatchintervalcase,
the
overhead
couldbedramaticworse
comparedtoother
frameworks.?
According
toourtest,minimumBatchInterval
ofSparkisabout80ms
(140tasksperbatch),otherwise
taskscheduledelaywill
keepincreasing?
Repartition
is
heavyforeveryframework,
but
usually
it’sunavoidable.?
Latencyof
Gearpumpis
stillquite
loweven
under800MB/s
inputthroughput.Test
Case:
StatefulWordCountNative
stateoperator
issupported
byall
frameworks
weevaluatedStateful
operator
performance
+Checkpoint/Acker
costResultP99
Latency
(s)Throughput
(MB/s)100808007006005004003002001000604020002004006008000200400600800InputRate(MB/s)InputRate(MB/s)ApacheSpark*ApacheFlink*withoutCPApacheGearpump*ApacheFlink*ApacheStorm*ApacheSpark*ApacheStorm*ApacheFlink*Gearpump**Other
namesand
brands
maybe
claimed
astheproperty
ofothers.Formore
completeinformationaboutperformanceandbenchmark
results,visit
/benchmarks.Results
havebeenestimated
orsimul
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 哺乳期解除勞動合同協(xié)議范本
- 2024年房屋補漏維修工程合同
- 2024專項資金借款的合同范本
- 員工聘用合同協(xié)議書范文2024年
- 建設(shè)工程內(nèi)部承包合同書2024年
- 2024新款供貨合同協(xié)議書
- 2024【流動資金外匯借貸合同】公司流動資金合同
- 2024年公司股東之間借款合同實例
- 專業(yè)房屋買賣合同模板大全
- 2024年事業(yè)單位聘用
- 市政道路工程施工全流程圖
- 猜猜哪是左哪是右課件
- 單層門式輕鋼結(jié)構(gòu)廠房施工組織設(shè)計
- 融資租賃租金計算模板
- DL5168-2023年110KV-750KV架空輸電線路施工質(zhì)量檢驗及評定規(guī)程
- 詳細解讀公文格式
- (全冊)教學設(shè)計(教案)新綱要云南省實驗教材小學信息技術(shù)四年級第3冊全冊
- 農(nóng)產(chǎn)品市場營銷-東北農(nóng)業(yè)大學中國大學mooc課后章節(jié)答案期末考試題庫2023年
- EN81-41升降平臺歐洲標準
- 內(nèi)鏡下粘膜剝離術(shù)-課件
- 2024屆福建省泉州高考一模地理試題(解析版)
評論
0/150
提交評論