Topic 6-4: A Data Processing System Based on Kafka and Spark Streaming, and Its Testing (甄麗霞, Zhen Lixia)

Spark Streaming data processing system and testing

Technologies involved in the system under test

[Figure residue, slides 2-5: "Answer", "Answer Colume", non-frequently updated items, frequently updated items; fast partitions and freq partitions numbered partition01 through partition10.]
[Figure residue, slide 6: jsonFast, hbaseFast, …, map1, map2, map3.]

zhihu-spooldir-kafka-agent.sources = zhihu-spooldir-source-freq zhihu-spooldir-source-freq3 zhihu-spooldir-source-fast zhihu-spooldir-source-instant
zhihu-spooldir-kafka-agent.channels = zhihu-kafka-channel-freq zhihu-kafka-channel-freq3 zhihu-kafka-channel-fast zhihu-kafka-channel-instant

Define the configuration items for each of the zhihu-freq, zhihu-freq3, zhihu-fast and zhihu-instant components.

#zhihu-freq
zhihu-spooldir-kafka-agent.sources.zhihu-spooldir-source-freq.inputCharset=UTF-8
// Note: character encoding; the default is "UTF-8".
zhihu-spooldir-kafka-agent.sources.zhihu-spooldir-source-freq.decodeErrorPolicy=IGNORE
// Note: an undecodable byte sequence in the input would otherwise cause Flume to stop; this setting makes Flume more robust.
zhihu-spooldir-kafka-agent.sources.zhihu-spooldir-source-freq.customSourceCounterType=TimedSourceC…
…erceptors=timestamp-interceptor static-interceptor circlenumber-interceptor docid-interceptor
// Note: interceptors that add key/value pairs to the event header, including the timestamp, the ring number, …
…erceptors.timestamp-interceptor.type=timestamp
// Note: adds a timestamp to the header.
…erceptors.docid-interceptor.type=…erceptor.zhihu.ZhihuDocIdInterceptor$Builder
// Note: calls the docid algorithm and adds the docid to the header.
zhihu-spooldir-kafka-agent.sources.zhihu-spooldir-source-freq.channels=zhihu-kafka-channel-freq
// Note: the name of the channel component this source writes to.
zhihu-spooldir-kafka-agent.channels.zhihu-kafka-channel-freq.type=org.apache.flume.channel.kafka.KafkaChannel
// Note: defines the channel type as Kafka channel.
zhihu-spooldir-kafka-agent.channels.zhihu-kafka-channel-freq.brokerList=10.1xx.1xx.29:xxxx
// Note: IP address and port of the Kafka broker to connect to.
zhihu-spooldir-kafka-agent.channels.zhihu-kafka-channel-freq.topic=zhihu-freq
// Note: the Kafka topic for this source is zhihu-freq.
zhihu-spooldir-kafka-agent.channels.zhihu-kafka-channel-freq.zookeeperConnect=10.1xx.1xx.29:xxxx/kafka
// Note: IP address and port of the ZooKeeper instance used for registration and resource allocation.

zhihu-kafka-hdfs

zhihu-kafka-hdfs-agent.sinks.zhihu-hdfs-sink-freq.hdfs.rollInterval=120
// Note: how often the HDFS sink rolls the temporary file into the final target file, in seconds.
zhihu-kafka-hdfs-agent.sinks.zhihu-hdfs-sink-freq.hdfs.rollSize=0
// Note: the temporary file is rolled into the target file when it reaches this size (in bytes); 0 means files are not rolled based on size.
zhihu-kafka-hdfs-agent.sinks.zhihu-hdfs-sink-freq.hdfs.callTimeout=120000
// Note: timeout for HDFS operations, in milliseconds.
zhihu-kafka-hdfs-agent.sinks.zhihu-hdfs-sink-freq.hdfs.batchSize=10000
// Note: number of events flushed to HDFS per batch.
zhihu-kafka-hdfs-agent.sinks.zhihu-hdfs-sink-freq.channel=zhihu-kafka-channel-freq
// Note: the name of the channel this sink reads from.
…-hdfs-sink-instant zhihu-hdfs-sink-blacklist
Channels.brokerList=ip:port

Define the channel and sink components; each channel has one corresponding sink configuration:

zhihu-kafka-hdfs-agent.channels = zhihu-kafka-channel-freq-localquery zhihu-kafka-channel-freq zhihu-kafka-channel-freq3 zhihu-kafka-channel-fast zhihu-kafka-channel-instant zhihu-kafka-channel-blacklist
zhihu-kafka-hdfs-agent.channels.zhihu-kafka-channel-freq.topic=zhihu-freq
// Note: the corresponding topic in Kafka.
zhihu-kafka-hdfs-agent.channels.zhihu-kafka-channel-freq.zookeeperConnect=10.1xx.1xx.29:xxxx/kafka
// Note: the ZooKeeper address and port.
zhihu-kafka-hdfs-agent.channels.zhihu-kafka-channel-freq.groupId=zhihu-freq
// Note: the consumer must set the group id it belongs to.
zhihu-kafka-hdfs-agent.channels.zhihu-kafka-channel-freq.kafka.fetch.message.max.bytes=2000000000
// Note: the maximum message size, in bytes.
zhihu-kafka-hdfs-agent.sinks.zhihu-hdfs-sink-freq.type=hdfs
// Note: the sink takes data from the channel queue and stores it in a storage file system; this item sets the type of that storage system.
zhihu-kafka-hdfs-agent.sinks.zhihu-hdfs-sink-freq.hdfs.path=hdfs://sss/xxx/xxx/data/zhihu-test/%{pushtype}/sjs_100_29/%Y%m/%Y%m%d
// Note: the path written to on HDFS, including the file system identifier. "sss/xxx/xxx/data/" is the HDFS system path; "zhihu-test" is the HDFS data directory created for this test; "%{pushtype}" is the pushtype value set in the spooldir-kafka configuration file (…erceptors.static-interceptor.value); "sjs_100_29" identifies this test machine; "%Y%m/%Y%m%d" gives the year-month and year-month-day directories.
// Note: name of a file stored in HDFS: /sss/xxx/xxx/data/zhihu-test/freq/sjs_100_2/i26s0.1t/0p16e0=12/dfrefqs-2016012423-7.1453808288400.lzo

[Figure: Kafka data flow. Labels: Producer (T3, push), Broker, Real-time (T2), Hadoop (T1), Other (T1), …ata (T3), with pull arrows.]

Event header fields: … (note: which original file the data came from), circlenumber (note: depending on the data type, a URL is fabricated for each type of data and the ring number is computed from it; the data is divided into 8 rings in total), pushtype (note: the data type, one of freq, instant, fast …). The events are then written to Kafka.

[Figure: Kafka topic/partition/replica layout. Labels: T1-P1-R0-L, T1-P2-R0, T1-P1-R2, T1-P2-R1-L, T2-P1-R2, T2-P2-R1, T1-P1-R1, T1-P2-R2, T2-P1-R1-L, T2-P2-R0-L, T2-P1-R1, T2-P1-R0-L, T2-P2-R2, T1-P3-R0-L (new).]

instant Spark Streaming: load the jar package for PA parsing; load the configuration file to determine the downstream modules; load the configuration file to determine the topic type.
freq Spark Streaming: load the configuration file to determine the topic type; load the Hadoop data.
Processing: use the Kafka key to fetch the corresponding HBase data; parse the data, assemble the xpage and send it to the index module; write the instant data to HBase.

[Figure labels: memory; number of processing processes.]

--master yarn-client \
// Note: connects to the YARN cluster in client mode; the location of the cluster is defined by the HADOOP_CONF_DIR environment variable; in this mode …
--driver-memory 1G \
// Note: driver memory is not how much memory the master is allocated but how much memory it manages; in other words, how much memory is allocated to the current application.
… the cluster queue to run in
--num-executors 5 \
// Note: starts 5 processes on the YARN cluster for data processing; one process reads the data and the remaining processes process it.

… beyond a certain point it overflows. In general, set DM (driver memory) first, then set EM (executor …) according to the actual situation, such as the state of the cluster and the size of the job.

instant … is serialized and then sent to the remaining processes for data processing, but Flume's serialization handling … even… When the job runs through YARN, data is distributed to multiple processes, so at that point the Flume even… function has to be modified; when the job runs on the local machine no data distribution is involved, so it can …

…correctness testing: correctness of instant PA xpage xslt parsing; different business flows involve different … fre…
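To make the hdfs.path note concrete, the following is a minimal sketch of how the `%{pushtype}` and `%Y%m/%Y%m%d` escapes could be expanded from an event's headers. It is a simplified model for checking expected directory names in a test, not Flume's own code. The function name `resolve_hdfs_path` is hypothetical, and two assumptions are made: the `timestamp` header holds epoch milliseconds (as set by the timestamp interceptor), and dates are computed in UTC, whereas a real HDFS sink would use whatever time zone it is configured with.

```python
import re
from datetime import datetime, timezone


def resolve_hdfs_path(template, headers):
    """Expand %{header} and %Y / %m / %d escapes in a path template.

    Hypothetical helper: models only the escapes used in the hdfs.path
    setting quoted in the document, and assumes UTC.
    """
    # The timestamp header is assumed to be epoch milliseconds.
    ts = datetime.fromtimestamp(int(headers["timestamp"]) / 1000, tz=timezone.utc)
    date_parts = {
        "Y": f"{ts.year:04d}",
        "m": f"{ts.month:02d}",
        "d": f"{ts.day:02d}",
    }

    def expand(match):
        # Group 1 matches %{name}; group 2 matches a date escape letter.
        if match.group(1) is not None:
            return headers[match.group(1)]
        return date_parts[match.group(2)]

    return re.sub(r"%\{([^}]+)\}|%([Ymd])", expand, template)


if __name__ == "__main__":
    template = "hdfs://sss/xxx/xxx/data/zhihu-test/%{pushtype}/sjs_100_29/%Y%m/%Y%m%d"
    # pushtype comes from the static interceptor, timestamp from the timestamp interceptor.
    headers = {"pushtype": "freq", "timestamp": "1453808288400"}
    print(resolve_hdfs_path(template, headers))
```

A test for the kafka-hdfs agent could use a helper like this to predict which directory a given event should land in and then compare that against what is actually listed on HDFS.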
