Lab answers for Chapter 7, MapReduce Programming Basics: computer-room lab guide accompanying Big Data Technology Principles and Applications (Lin Ziyu, Xiamen University)


Computer-room lab guide accompanying Big Data Technology Principles and Applications, by Lin Ziyu (林子雨), Xiamen University

Lab 5, Chapter 7: MapReduce Programming Basics
(Version of 14 May 2016)

Instructor: Lin Ziyu
Xiamen University Database
May 2016
Homepage: http:/w/linziyu

Contents
1. Purpose
2. Experiment environment
3. Tasks and requirements
    1. Merge and deduplicate files
    2. Sort the input files
    3. Mine information from a given table
4. Experiment
Appendix 1: About the instructor
Appendix 2: About the course
Appendix 3: Public introduction to big data courses at Chinese universities

1. Purpose

Through this lab, learn the basic MapReduce programming method, and learn to use MapReduce for some common data-processing problems, including data deduplication, data sorting, and data mining.

2. Experiment environment

A Hadoop pseudo-distributed environment that has already been configured. This lab runs on Ubuntu 14.04 with Hadoop 2.7.1.
Configuring a Hadoop pseudo-distributed environment on Ubuntu: http:/blog/install-hadoop-in-centos/
Compiling and running a MapReduce program with Eclipse on Ubuntu: http:/blog/hadouild-project-using-eclipse/

3. Tasks and requirements

1. Merge and deduplicate files

Given two input files, file A and file B, write a MapReduce program that merges the two files and removes duplicate content, producing a new output file C. A sample of the input and output files is given below for reference.

Sample input file A:

    20150101 x
    20150102 y
    20150103 x
    20150104 y
    20150105 z
    20150106 x

Sample input file B:

    20150101 y
    20150102 y
    20150103 x
    20150104 z
    20150105 y

Sample output file C, obtained by merging A and B:

    20150101 x
    20150101 y
    20150102 y
    20150103 x
    20150104 y
    20150104 z
    20150105 y
    20150105 z
    20150106 x

Code:

    package com.Merge;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    /**
     * Merges files A and B and removes duplicate lines,
     * producing a new output file C.
     */
    public class Merge {

        // map: copy the input value (one line) to the output key.
        public static class Map extends Mapper<Object, Text, Text, Text> {
            private static final Text EMPTY = new Text("");

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(value, EMPTY);
            }
        }

        // reduce: copy the input key to the output key. Identical lines share
        // one key, so each distinct line is written exactly once.
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                context.write(key, new Text(""));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The property name is missing from the source text; fs.defaultFS,
            // the Hadoop 2.x name for the default file system, is assumed here.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String[] otherArgs = new String[] {"input", "output"}; // paths set directly
            if (otherArgs.length != 2) {
                System.err.println("Usage: Merge <in> <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "Merge and duplicate removal");
            job.setJarByClass(Merge.class);
            job.setMapperClass(Map.class);
            job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

2. Sort the input files

There are several input files, and every line of every file contains one integer. Read the integers from all the files, sort them in ascending order, and write them to a new file. Each output line contains two integers: the first is the rank of the second in the sorted order, and the second is the original integer. A sample of the input and output files is given below for reference.

Sample input file 1:

    33
    37
    12
    40

Sample input file 2:

    4
    16
    39
    5

Sample input file 3:

    1
    45
    25

Output file obtained from input files 1, 2 and 3:

    1 1
    2 4
    3 5
    4 12
    5 16
    6 25
    7 33
    8 37
    9 39
    10 40
    11 45

Code:

    package com.MergeSort;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    /**
     * Reads several files that hold one integer per line and writes one line
     * per integer: its rank in ascending order, then the integer itself.
     */
    public class MergeSort {

        // map: parse the input value as an integer and emit it as the output
        // key, so that the shuffle phase sorts the integers.
        public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
            private static final IntWritable data = new IntWritable();
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                data.set(Integer.parseInt(value.toString().trim()));
                context.write(data, ONE);
            }
        }

        // reduce: copy the input key to the output value, once per element of
        // the value list. The counter lineNum holds the rank of the key; it is
        // local to one reducer, so ranks are global only with a single reducer.
        public static class Reduce
                extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
            private static final IntWritable lineNum = new IntWritable(1);

            @Override
            public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                for (IntWritable val : values) {
                    context.write(lineNum, key);
                    lineNum.set(lineNum.get() + 1);
                }
            }
        }

        // Custom partitioner: derive range boundaries from the largest input
        // value and the number of partitions, then return the ID of the range
        // that contains the key.
        public static class Partition extends Partitioner<IntWritable, IntWritable> {
            @Override
            public int getPartition(IntWritable key, IntWritable value, int numPartitions) {
                // The source calls 65223 the largest int value; it is only an
                // assumed upper bound for the input data.
                int maxNumber = 65223;
                int bound = maxNumber / numPartitions + 1;
                int keyNumber = key.get();
                for (int i = 0; i < numPartitions; i++) {
                    if (keyNumber < bound * (i + 1) && keyNumber >= bound * i) {
                        return i;
                    }
                }
                // The source returns -1 here, which Hadoop rejects as an illegal
                // partition; out-of-range keys go to the nearest end partition.
                return keyNumber < 0 ? 0 : numPartitions - 1;
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Property name missing from the source text; fs.defaultFS is assumed.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String[] otherArgs = new String[] {"input", "output"}; // paths set directly
            if (otherArgs.length != 2) {
                System.err.println("Usage: MergeSort <in> <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "Merge and sort");
            job.setJarByClass(MergeSort.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setPartitionerClass(Partition.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

3. Mine information from a given table

A child-parent table is given below. Mine the parent-child relationships in it and produce a table of grandchild-grandparent relationships.

Input file:

    child parent
    Steven Lucy
    Steven Jack
    Jone Lucy
    Jone Jack
    Lucy Mary
    Lucy Fr
    Jack Alice
    Jack Jesse
    David Alice
    David Jesse
    Philip David
    Philip Alma
    Mark David
    Mark Alma

Output file:

    grandchild grandparent
    Mark Jesse
    Mark Alice
    Philip Jesse
    Philip Alice
    Jone Jesse
    Jone Alice
    Steven Jesse
    Steven Alice
    Steven Fr
    Steven Mary
    Jone Fr
    Jone Mary

Code:

    package com.simple_data_mining;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    /**
     * Input: a child-parent table.
     * Output: a table of grandchild-grandparent relationships.
     */
    public class simple_data_mining {
        public static int time = 0; // becomes non-zero once the header is written

        // map: split each line on whitespace into child and parent, then emit
        // it twice: in original order as the right table and in reversed order
        // as the left table. The value must carry a tag that tells the two
        // tables apart.
        public static class Map extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().trim().split("\\s+");
                // Skip the header row and malformed lines.
                if (fields.length != 2 || fields[0].equals("child")) {
                    return;
                }
                String child = fields[0];
                String parent = fields[1];
                // Left table: keyed by parent, tagged "1".
                context.write(new Text(parent), new Text("1+" + child + "+" + parent));
                // Right table: keyed by child, tagged "2".
                context.write(new Text(child), new Text("2+" + child + "+" + parent));
            }
        }

        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                if (time == 0) { // write the header row once
                    context.write(new Text("grand_child"), new Text("grand_parent"));
                    time++;
                }
                // Lists replace the fixed 10-element arrays of the source, which
                // overflow when a key has more than 10 records on one side.
                List<String> grandChildren = new ArrayList<>();
                List<String> grandParents = new ArrayList<>();
                for (Text value : values) {
                    // Record layout: tag + "+" + child + "+" + parent
                    String[] parts = value.toString().split("\\+", 3);
                    if (parts.length != 3) {
                        continue;
                    }
                    if (parts[0].equals("1")) {
                        grandChildren.add(parts[1]); // left table: take the child
                    } else {
                        grandParents.add(parts[2]);  // right table: take the parent
                    }
                }
                // Emit every grandchild-grandparent pair that meets at this key.
                for (String grandChild : grandChildren) {
                    for (String grandParent : grandParents) {
                        context.write(new Text(grandChild), new Text(grandParent));
                    }
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Property name missing from the source text; fs.defaultFS is assumed.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String[] otherArgs = new String[] {"input", "output"}; // paths set directly
            if (otherArgs.length != 2) {
                System.err.println("Usage: simple_data_mining <in> <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "Single table join");
            job.setJarByClass(simple_data_mining.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

4. Experiment

Appendix 1: About the instructor

Lin Ziyu (born 1978), male, Ph.D., assistant professor in the Department of Computer Science, Xiamen University. His main research areas are databases, real-time active data warehouses, and data mining. Courses taught: Big Data Technology
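The merge-and-deduplicate task (task 1 in section 3) relies on the shuffle phase grouping identical keys. The following is a minimal sketch of that idea in plain Java, with no Hadoop dependency, so it can be tried without a cluster. The class name `MergeDedupSketch` and the method `mergeAndDedup` are hypothetical names chosen for illustration, and a `TreeMap` stands in for the shuffle, which is an assumption of this sketch rather than how Hadoop is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the lab's Hadoop code.
class MergeDedupSketch {

    /** Simulates map, shuffle and reduce for the merge-and-deduplicate job. */
    static List<String> mergeAndDedup(List<List<String>> files) {
        // "Map" and "shuffle": every line becomes a key with an empty value.
        // The TreeMap groups equal keys and keeps the keys sorted.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (List<String> file : files) {
            for (String line : file) {
                grouped.computeIfAbsent(line, k -> new ArrayList<>()).add("");
            }
        }
        // "Reduce": emit each key once, however many values it collected.
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        List<String> fileA = Arrays.asList("20150101 x", "20150102 y", "20150103 x");
        List<String> fileB = Arrays.asList("20150101 y", "20150102 y", "20150103 x");
        for (String line : mergeAndDedup(Arrays.asList(fileA, fileB))) {
            System.out.println(line);
        }
    }
}
```

Because the reducer ignores the value list entirely, the same class can also serve as the combiner, which is what the lab's job configuration does.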
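For the sorting task (task 2 in section 3), the range partitioner is what makes the concatenated reducer outputs globally ordered: partition i only ever receives keys smaller than those of partition i + 1. The sketch below shows this in plain Java under stated assumptions. `RankSortSketch`, `partitionFor` and `rank` are hypothetical names, and the sketch carries one rank counter across all partitions, which separate reducer processes in a real job would not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the lab's Hadoop code.
class RankSortSketch {

    /** Range partitioning: partition i holds keys in [bound * i, bound * (i + 1)). */
    static int partitionFor(int key, int maxValue, int numPartitions) {
        int bound = maxValue / numPartitions + 1;
        int partition = key / bound;
        // Clamp keys that fall outside [0, maxValue] into the end partitions.
        return Math.max(0, Math.min(partition, numPartitions - 1));
    }

    /** Returns one "rank value" line per input number, in ascending order. */
    static List<String> rank(List<Integer> numbers, int maxValue, int numPartitions) {
        // One sorted value-to-count map per partition, standing in for the
        // sorted input that each reducer would receive.
        List<TreeMap<Integer, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<>());
        }
        for (int n : numbers) {
            partitions.get(partitionFor(n, maxValue, numPartitions)).merge(n, 1, Integer::sum);
        }
        // Walk the partitions in order; the counter is shared across them here.
        List<String> out = new ArrayList<>();
        int rank = 1;
        for (TreeMap<Integer, Integer> partition : partitions) {
            for (Map.Entry<Integer, Integer> entry : partition.entrySet()) {
                for (int c = 0; c < entry.getValue(); c++) {
                    out.add(rank + " " + entry.getKey());
                    rank++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(33, 37, 12, 40, 4, 16, 39, 5, 1, 45, 25);
        for (String line : rank(numbers, 65223, 3)) {
            System.out.println(line);
        }
    }
}
```

In the lab's job the rank counter is a static field of the reducer, so the ranks come out as one continuous sequence only when the job runs with a single reducer.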
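The table-mining task (task 3 in section 3) is a self join of the child-parent table: each person is a join key, their children come from the left table and their parents from the right table. The following plain-Java sketch shows that join without Hadoop. `GrandparentJoinSketch` and `grandRelations` are hypothetical names, and the two maps stand in for the tagged values that the lab's mapper writes, which is a simplification assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the lab's Hadoop code.
class GrandparentJoinSketch {

    /** Each row is {child, parent}; returns "grandchild grandparent" lines. */
    static List<String> grandRelations(List<String[]> rows) {
        // "Map" and "shuffle": group the records by join key (a person).
        Map<String, List<String>> childrenOf = new TreeMap<>(); // left table
        Map<String, List<String>> parentsOf = new TreeMap<>();  // right table
        for (String[] row : rows) {
            String child = row[0];
            String parent = row[1];
            childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
            parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
        }
        // "Reduce": for a key present on both sides, pair every child of that
        // person with every parent of that person.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : childrenOf.entrySet()) {
            List<String> grandParents = parentsOf.get(entry.getKey());
            if (grandParents == null) {
                continue; // this person has children but no recorded parents
            }
            for (String grandChild : entry.getValue()) {
                for (String grandParent : grandParents) {
                    out.add(grandChild + " " + grandParent);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Steven", "Lucy"},
                new String[] {"Jone", "Lucy"},
                new String[] {"Lucy", "Mary"});
        for (String line : grandRelations(rows)) {
            System.out.println(line);
        }
    }
}
```

The order of the output lines depends on how keys are grouped, so it can differ from the order in the lab's sample output even when the set of pairs is the same.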
