Basic Tutorial on Building Cloud Computing with Hadoop



3 October 2013

Hadoop Tutorial: Part 1 - What is Hadoop? (an Overview)

Hadoop is an open source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications.

At least, this is what you will find as the first line of the definition of Hadoop on Wikipedia. So what are data-intensive distributed applications?

Data-intensive is nothing but Big Data (data that has grown very large in size), and distributed applications are applications that work over a network by communicating and coordinating with each other by passing messages (say, using RPC interprocess communication or a message queue).

Hence Hadoop works in a distributed environment and is built to store, handle and process large data sets (petabytes, exabytes and more). Saying that Hadoop stores petabytes of data doesn't mean that Hadoop is a database. Remember, it is a framework that handles large amounts of data for processing. You will get to know the difference between Hadoop and databases (or NoSQL databases, which is what we call Big Data's databases) as you go through the coming tutorials.

Hadoop was derived from the research papers published by Google on the Google File System (GFS) and Google's MapReduce. So there are two integral parts of Hadoop: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce.

Hadoop Distributed File System (HDFS)

HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

Let's get into the details of the statement above:

Very Large Files:

When we say very large files, we mean files whose size is in the range of gigabytes, terabytes, petabytes or maybe more.

Streaming Data Access:

HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. A dataset is typically generated or copied from a source, and then various analyses are performed on that dataset over time. Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the whole dataset is more important than the latency in reading the first record.

Commodity Hardware:

Hadoop doesn't require expensive, highly reliable hardware. It's designed to run on clusters of commodity hardware (commonly available hardware that can be obtained from multiple vendors), for which the chance of node failure across the cluster is high, at least for large clusters. HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failure.

Here we are talking about a filesystem, the Hadoop Distributed File System. We all know a few other filesystems, like the Linux and Windows filesystems. So the next question is...

What is the difference between a normal filesystem and the Hadoop Distributed File System?

The two major differences between HDFS and other filesystems are:

Block Size:

Every disk has a block size, which is the minimum amount of data that can be written to or read from the disk. A filesystem also works in blocks, which are built from these disk blocks. Disk blocks are normally 512 bytes, and filesystem blocks are a few kilobytes.

HDFS also has the concept of blocks, but here a block is 64 MB by default (the Hadoop 1.x default; later releases default to 128 MB). The block size is configurable and is often raised to 128 MB, 256 MB, 512 MB or even more, into gigabytes. It all depends on the requirements and use cases.

So why are these blocks so large in HDFS? Keep reading and you will find out in the next few tutorials :)
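One commonly cited reason for large blocks, offered here as a rough illustration rather than as the explanation the later tutorials give, is that a large block makes the disk seek a small part of the total read time. The sketch below assumes a 10 ms seek time and a 100 MB/s transfer rate; both figures and the function name `seek_overhead` are assumptions chosen for the arithmetic, not measurements.

```python
def seek_overhead(block_size_mb: float,
                  seek_ms: float = 10.0,
                  transfer_mb_per_s: float = 100.0) -> float:
    """Return the fraction of one block's read time that is spent seeking.

    The default seek time and transfer rate are assumed, illustrative values.
    """
    seek_s = seek_ms / 1000.0
    transfer_s = block_size_mb / transfer_mb_per_s
    return seek_s / (seek_s + transfer_s)


if __name__ == "__main__":
    # Compare a small filesystem-style block with HDFS-style blocks.
    for label, size_mb in [("4 KB", 4 / 1024), ("64 MB", 64), ("128 MB", 128)]:
        share = seek_overhead(size_mb)
        print(f"{label:>6} block: {share:.1%} of read time is seeking")
```

Under these assumptions, reading a few-kilobyte block is dominated by the seek, while reading a 64 MB block spends almost all of its time transferring data.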

Metadata Storage:

In a normal filesystem, metadata is stored hierarchically. Say there is a folder ABC, inside that folder there is another folder DEF, and inside that there is a file hello.txt. The information about hello.txt (i.e. its metadata) is kept with DEF, and the metadata of DEF is kept with ABC. This forms a hierarchy, and the hierarchy is maintained up to the root of the filesystem. In HDFS the metadata is not spread across the hierarchy in this way. All the metadata resides on a single machine known as the Namenode (or Master Node) of the cluster. This node holds all the information about the files and folders, and lots of other information too, which we will learn about in the next few tutorials. :)
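To make the idea of one machine holding all the metadata concrete, here is a minimal sketch of a toy metadata table keyed by full path. The class `ToyNamenode`, its methods and the block id `blk_1` are hypothetical names invented for this illustration; the real Namenode's data structures are more involved.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Metadata for one file: its path, size and the ids of its blocks."""
    path: str
    size_bytes: int
    block_ids: list = field(default_factory=list)


class ToyNamenode:
    """Toy model: every file's metadata lives in one in-memory table."""

    def __init__(self):
        self._files = {}  # full path -> FileMeta

    def add_file(self, path, size_bytes, block_ids):
        self._files[path] = FileMeta(path, size_bytes, list(block_ids))

    def stat(self, path):
        # One lookup on one machine; no walk through ABC and then DEF.
        return self._files[path]

    def list_dir(self, directory):
        # Directory listings are answered from the same single table.
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self._files if p.startswith(prefix))


if __name__ == "__main__":
    namenode = ToyNamenode()
    namenode.add_file("/ABC/DEF/hello.txt", 12, ["blk_1"])
    print(namenode.stat("/ABC/DEF/hello.txt"))
    print(namenode.list_dir("/ABC"))
```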

This was just an overview of Hadoop and the Hadoop Distributed File System. In the next part I will go into the depth of HDFS, and thereafter MapReduce, continuing from here...

Let me know in the comment section if you have any doubts in understanding anything, and I will be really glad to answer them :)


Comments:

Romain Rigaux said...

Nice summary!

October 03, 2013

pragya khare said...

I know I'm a beginner and this question might be a silly one, but can you please explain to me how PARALLELISM is achieved via map-reduce at the processor level? If I have a dual-core processor, is it that only 2 jobs will run at a time in parallel?

October 05, 2013

Anonymous said...

Hi, I am from a Mainframe background with little knowledge of core Java... Do you think Java is needed for learning Hadoop in addition to Hive/PIG? I even want to learn Java for MapReduce but couldn't find what all will be used in real time.. and the definitive guide books seem tough for learning MapReduce with Java.. Is there any option where I can learn it step by step?

Sorry for the long comment.. but it would be helpful if you can guide me..

October 05, 2013

Deepak Kumar said...

@Pragya Khare...

First thing, always remember the popular saying: NO questions are foolish :) And btw it is a very good question.

Actually there are two things: one is what the best practice is, and the other is what happens by default...

By default the number of mappers and reducers is set to 2 for any TaskTracker, hence one sees a maximum of 2 maps and 2 reduces at a given instant on a TaskTracker (which is configurable).. This doesn't depend only on the processor but on lots of other factors as well, like RAM, CPU, power, disk and others.

/blog/best-practices-for-selecting-apache-hadoop-hardware/

As for the other factor, i.e. best practices, it depends on your use case. You can go through the 3rd point of the link below to understand it more conceptually.

/blog/2009/12/7-tips-for-improving-mapreduce-performance/

I will explain all of these when I reach the advanced MapReduce tutorials.. Till then keep reading!! :)

October 05, 2013

Deepak Kumar said...

@Anonymous

As Hadoop is written in Java, most of its APIs are written in core Java... To know about the Hadoop architecture you don't need Java... But to go to its API level and start programming in MapReduce you need to know core Java.

As for the Java requirement you asked about... you just need simple core Java concepts and programming for Hadoop and MapReduce.. Hive and PIG are SQL-like data flow languages that are really easy to learn... And since you are from a programming background it won't be very difficult to learn Java :) You can also go through the link below for further details :)

/2013/09/What-are-the-Pre-requsites-for-getting-started-with-Big-Data-Technologies.html

October 05, 2013


About the author

Deepak Kumar

Big Data / Hadoop Developer, Software Engineer, Thinker, Learner, Geek, Blogger, Coder

I love to play around with data.


© 2013 All Rights Reserved, Big Data Planet. All articles on this website by Deepak Kumar are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.


6 October 2013

Hadoop Tutorial: Part 2 - Hadoop Distributed File System (HDFS)

In the last tutorial, What is Hadoop?, I gave you a brief idea about Hadoop. The two integral parts of Hadoop are Hadoop HDFS and Hadoop MapReduce.

Let's go deeper inside HDFS.

Hadoop Distributed File System (HDFS) Concepts:

First take a look at the following two terms that will be used while describing HDFS.

Cluster: A Hadoop cluster is made up of many machines in a network. Each machine is termed a node, and these nodes talk to each other over the network.

Block Size: This is the size of one block in a filesystem, the unit in which data is kept contiguously. The default size of a single block in HDFS is 64 MB.

In HDFS, data is kept by splitting it into chunks. Let's say you have a text file of 200 MB and you want to keep this file in a Hadoop cluster. The file is split into chunks, where each chunk is equal to the block size set for the HDFS cluster (64 MB by default); the last chunk holds whatever remains.

Hence a 200 MB file gets split into 4 parts, 3 parts of 64 MB and 1 part of 8 MB, and the parts are distributed across different machines. Which machine each split is kept on is decided by the Namenode, which we discuss in detail below.
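The 200 MB arithmetic above can be checked with a few lines of Python. This is only a sketch of the size calculation; the function name `split_into_blocks` is made up for this example and is not part of any Hadoop API.

```python
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 64 * MB) -> list:
    """Return the sizes, in bytes, of the blocks a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds what is left over.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    sizes = split_into_blocks(200 * MB)
    print([size // MB for size in sizes])
```

For a 200 MB file with the 64 MB default this gives three full blocks and one 8 MB block, matching the four parts described in the text.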

In a Hadoop Distributed File System (HDFS) cluster there are two kinds of nodes: a master node and many worker nodes. These are known as the Namenode (master node) and the Datanodes (worker nodes).

Namenode:

The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. So it holds the information about all the files and directories in the cluster and their hierarchy, in the form of a namespace image and edit logs. Along with the filesystem information, it also knows the datanodes on which all the blocks of a file are kept.

A client accesses the filesystem on behalf of the user by communicating with the namenode and datanodes. The client presents a filesystem interface similar to a Portable Operating System Interface (POSIX), so the user code does not need to know about the namenode and datanodes to function.

Datanode:

These are the workers that do the real work, and by real work we mean that the actual data is stored by the datanodes. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks that they are storing.
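The periodic reporting described above can be pictured as the namenode rebuilding a block-to-datanode map from each report. The sketch below is a toy model under that assumption; `BlockMap`, `receive_block_report` and the node and block names are hypothetical and do not mirror Hadoop's internal classes.

```python
from collections import defaultdict


class BlockMap:
    """Toy namenode-side view of which datanodes hold which blocks."""

    def __init__(self):
        self._locations = defaultdict(set)  # block id -> set of datanode names

    def receive_block_report(self, datanode, block_ids):
        # A report lists everything the datanode stores, so forget what we
        # previously believed about this datanode before recording it.
        for holders in self._locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self._locations[block_id].add(datanode)

    def locate(self, block_id):
        """Return the datanodes currently known to hold the block."""
        return sorted(self._locations.get(block_id, ()))


if __name__ == "__main__":
    block_map = BlockMap()
    block_map.receive_block_report("datanode-1", ["blk_1", "blk_2"])
    block_map.receive_block_report("datanode-2", ["blk_2"])
    print(block_map.locate("blk_2"))
```

Treating each report as a full replacement means a block that a datanode has lost drops out of the map at the next report.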

One important thing to note here: in one cluster there will be only one Namenode, and there can be N datanodes.

The Namenode contains the metadata of all the files and directories, and also knows the datanodes on which each split of a file is stored. So let's say the Namenode goes down; what do you think will happen?

Yes, if the Namenode is down we cannot access any of the files and directories in the cluster. We will not even be able to connect to any of the datanodes to get any of the files.

Now think about it: we have stored our files by splitting them into different chunks, and we have kept those chunks on different datanodes. It is the Namenode that keeps track of all the file metadata, so only the Namenode knows how to reconstruct a file from all its splits. This is the reason that if the Namenode is down in a Hadoop cluster, everything is down.
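A small sketch can show why the splits are useless without the Namenode's metadata. It assumes the metadata is an ordered list of (block id, datanode) pairs per file; the function `read_file`, the dictionaries and all names in them are illustrative assumptions, not Hadoop's read path.

```python
def read_file(path, namenode_meta, datanodes):
    """Reassemble a file from blocks scattered over several datanodes.

    namenode_meta: path -> ordered list of (block_id, datanode_name),
                   or None to model a namenode that is down.
    datanodes:     datanode_name -> {block_id: bytes}
    """
    if namenode_meta is None:
        # The blocks still exist, but nobody knows their order or location.
        raise RuntimeError("namenode unavailable: cannot locate blocks")
    parts = []
    for block_id, datanode in namenode_meta[path]:
        parts.append(datanodes[datanode][block_id])
    return b"".join(parts)


if __name__ == "__main__":
    datanodes = {
        "datanode-1": {"blk_2": b"world"},
        "datanode-2": {"blk_1": b"hello "},
    }
    meta = {"/ABC/DEF/hello.txt": [("blk_1", "datanode-2"),
                                   ("blk_2", "datanode-1")]}
    print(read_file("/ABC/DEF/hello.txt", meta, datanodes))
```

Passing `None` for the metadata models the Namenode being down: the datanodes still hold every byte, yet the file cannot be put back together.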

This is also why the Namenode is known as a single point of failure in Hadoop.

Since the Namenode is so important, we have to make it resilient to failure, and for that Hadoop provides two mechanisms.

The first way is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount.
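The idea of writing the same state to several locations before continuing can be sketched as follows. This is an illustration of the pattern only, not how the namenode is implemented: the function `write_state`, the file name `fsimage` and the directory names are assumptions, and the sketch makes each copy atomic on its own (write to a temporary file, then rename) rather than across all directories.

```python
import os
import tempfile


def write_state(data: bytes, directories, name="fsimage"):
    """Write data to every directory; return only once all copies are on disk."""
    for directory in directories:
        os.makedirs(directory, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(data)
                tmp.flush()
                os.fsync(tmp.fileno())  # synchronous: wait for the disk
            # Renaming over the old file swaps in the new copy in one step.
            os.replace(tmp_path, os.path.join(directory, name))
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins for a local disk directory and a remote NFS mount.
        targets = [os.path.join(root, "local"), os.path.join(root, "nfs")]
        write_state(b"namespace image bytes", targets)
        print([os.listdir(target) for target in targets])
```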

The second way is to run a Secondary Namenode.

Despite what the name suggests, it does not act like a Namenode. So if it doesn't act like a Namenode, how does it protect against failure?

The Secondary Namenode also holds a namespace image and edit logs, like the Namenode. At a regular interval (one hour by default) it copies the namespace image from the Namenode, merges it with the edit log, and copies the result back to the Namenode, so that the Namenode has a fresh copy of the namespace image. Now suppose that at some point the Namenode goes down and its metadata becomes corrupt. We can then start some other machine as a Namenode using the namespace image and the edit log held by the Secondary Namenode, and so avoid a total failure. Because the Secondary Namenode's copy lags behind the Namenode, edits made since the last checkpoint may be lost.
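The merge step can be illustrated with a toy model in which the namespace image is a dictionary and the edit log is a list of operations replayed in order. The function `apply_edits`, the three operation names and the tuple layout are assumptions made for this sketch; the real image and edit log formats are different.

```python
def apply_edits(image: dict, edit_log: list) -> dict:
    """Return a new namespace image with every logged edit applied in order.

    image:    path -> list of block ids
    edit_log: list of (operation, path, value) tuples
    """
    merged = dict(image)  # leave the old image untouched
    for operation, path, value in edit_log:
        if operation == "create":
            merged[path] = value
        elif operation == "delete":
            merged.pop(path, None)
        elif operation == "rename":
            merged[value] = merged.pop(path)
        else:
            raise ValueError(f"unknown edit operation: {operation}")
    return merged


if __name__ == "__main__":
    old_image = {"/a.txt": ["blk_1"]}
    edits = [
        ("create", "/b.txt", ["blk_2"]),
        ("rename", "/a.txt", "/c.txt"),
        ("delete", "/b.txt", None),
    ]
    print(apply_edits(old_image, edits))
```

After a merge like this the edit log can start again from empty, which is why a fresh image keeps the log short.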

The Secondary Namenode needs almost the same amount of memory and CPU as the Namenode, so it is also run on a separate machine, like the Namenode. Hence in a single cluster we have one Namenode, one Secondary Namenode and many Datanodes, and HDFS consists of these three elements.

This was again an overview of the Hadoop Distributed File System (HDFS). In the next part of the tutorial we will look at the working of the Namenode and Datanodes in more detail, and at how reads and writes happen in HDFS.

Let me know in the comment section if you have any doubts in understanding anything, and I will be really glad to answer your questions :)


Comments:

vishwash said...

very informative...

October 07, 2013

Tushar Karande said...

Thanks for such informative tutorials :)

please keep posting.. waiting for more... :)

October 08, 2013

Anonymous said...

Nice information. But I have one doubt: what is the advantage of keeping the file as chunks on different datanodes? What kind of benefit are we getting here?

October 08, 2013

Deepak Kumar said...

@Anonymous: Well, there are lots of reasons... I will explain them in great detail in the next few articles...

But for now let us understand this... since we have split the file into two, we can now use the power of two processors (parallel processing) on two different nodes to do our analysis (like search, calculation, prediction and lots more).. Again, let's say my file size is in petabytes... You won't find one hard disk that big.. and even if there is one... how do you think we are going to read and write on that hard disk? The latency will be really high and it will take lots of time... There are more reasons for the same... I will explain this in more technical ways in the coming tutorials... Till then keep reading :)

October 08, 2013


