3 October 2013
Hadoop Tutorial: Part 1 - What is Hadoop? (an Overview)
Hadoop is an open source software framework that supports data-intensive distributed applications and is licensed under the Apache v2 license.
At least, this is what you will find as the first line of the definition of Hadoop on Wikipedia. So what are data-intensive distributed applications?
Well, data-intensive is nothing but Big Data (data that has outgrown conventional sizes), and distributed applications are applications that work over a network by communicating and coordinating with each other through message passing (say, using RPC inter-process communication or a message queue).
Hence Hadoop works in a distributed environment and is built to store, handle and process large amounts of data (petabytes, exabytes and more). Now, just because I am saying that Hadoop stores petabytes of data, this doesn't mean that Hadoop is a database. Again, remember that it is a framework that handles large amounts of data for processing. You will get to know the difference between Hadoop and databases (or NoSQL databases, which is what we call Big Data's databases) as you go down the line in the coming tutorials.
Hadoop was derived from the research papers published by Google on the Google File System (GFS) and Google's MapReduce. So there are two integral parts of Hadoop: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce.
Hadoop Distributed File System (HDFS)
HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
Well, let's get into the details of the statement above:
Very large files:
When we say very large files, we mean files whose size is in the range of gigabytes, terabytes, petabytes or maybe more.
Streaming data access:
HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. A dataset is typically generated or copied from a source, and then various analyses are performed on that dataset over time. Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the whole dataset is more important than the latency in reading the first record.
Commodity hardware:
Hadoop doesn't require expensive, highly reliable hardware. It's designed to run on clusters of commodity hardware (commonly available hardware that can be obtained from multiple vendors), for which the chance of node failure across the cluster is high, at least for large clusters. HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failure.
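To see why the time to read the whole dataset matters more than the latency of the first record, here is a rough back-of-the-envelope sketch. The dataset size, the read rate and both latency values are made-up numbers chosen only for illustration, and full_scan_seconds is a hypothetical helper, not part of Hadoop.

```python
MB = 1024 ** 2
TB = 1024 ** 4

def full_scan_seconds(dataset_bytes, throughput_bytes_per_s, first_record_latency_s):
    """Total time = delay before the first record + time to stream every byte."""
    return first_record_latency_s + dataset_bytes / throughput_bytes_per_s

# Assumed numbers: a 1 TB dataset read at a sustained 100 MB/s.
slow_start = full_scan_seconds(1 * TB, 100 * MB, first_record_latency_s=1.0)
fast_start = full_scan_seconds(1 * TB, 100 * MB, first_record_latency_s=0.01)

# Both scans take close to three hours; the latency difference is under a second.
print(f"slow start: {slow_start:.2f} s, fast start: {fast_start:.2f} s")
```

Under these assumptions, improving the first-record latency a hundredfold changes the total by less than one second, while doubling the read rate would roughly halve it.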
Now, here we are talking about a file system, the Hadoop Distributed File System. And we all know a few other file systems, such as the Linux and Windows file systems. So the next question is...
What is the difference between a normal file system and the Hadoop Distributed File System?
The two major differences between HDFS and other file systems are:
Block Size:
Every disk has a block size, which is the minimum amount of data that can be written to or read from the disk. A file system also consists of blocks, which are built out of these disk blocks. Normally disk blocks are 512 bytes and file system blocks are a few kilobytes.
HDFS also has the concept of blocks, but here one block is 64 MB by default in Hadoop 1.x (Hadoop 2.x and later default to 128 MB), and it can be configured to larger values such as 128 MB, 256 MB, 512 MB or even more. It all depends on the requirements and use cases.
So why are these blocks so large in HDFS? Keep on reading and you will find out in the next few tutorials :)
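To get a feel for these numbers, the sketch below counts how many blocks a single file needs at different block sizes. The 1 TB file and the 4 KB block (standing in for "a few kilobytes") are assumptions picked for illustration, and block_count is a hypothetical helper.

```python
KB = 1024
MB = 1024 ** 2
TB = 1024 ** 4

def block_count(file_bytes, block_bytes):
    """Number of blocks needed to hold file_bytes (ceiling division)."""
    return -(-file_bytes // block_bytes)

file_size = 1 * TB  # assumed example file

for label, block_size in [("4 KB", 4 * KB), ("64 MB", 64 * MB), ("128 MB", 128 * MB)]:
    print(f"{label:>6} blocks: {block_count(file_size, block_size):,}")
```

With these assumed sizes, the same file needs hundreds of millions of 4 KB blocks but only a few thousand 64 MB blocks, so there is far less per-block bookkeeping.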
Metadata Storage:
In a normal file system, metadata is stored hierarchically. Let's say there is a folder ABC, inside that folder there is another folder DEF, and inside that there is a file hello.txt. The information about hello.txt (i.e. the metadata of hello.txt) is kept with DEF, and the metadata of DEF is kept with ABC. This forms a hierarchy, and the hierarchy is maintained all the way up to the root of the file system. But in HDFS the metadata is not spread out in this way. All the metadata resides on a single machine in the cluster known as the Namenode (or master node). This node holds all the information about the files and folders, and lots of other information too, which we will learn about in the next few tutorials. :)
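The contrast can be pictured with a toy model. This is only a sketch of where the metadata lives, not how either kind of file system is implemented. Every name here (root, namenode_metadata, both lookup functions, the blk_1 id) is hypothetical, and the Namenode's table is flattened to one dictionary keyed by full path purely for brevity.

```python
# Ordinary file system (simplified): each directory holds the metadata of its
# children, so finding hello.txt means walking ABC, then DEF.
root = {"ABC": {"DEF": {"hello.txt": {"size": 12}}}}

def lookup_hierarchical(tree, path):
    """Walk the tree one path component at a time."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node

# HDFS as described above (simplified): one machine, the Namenode, holds the
# metadata for every path in the cluster.
namenode_metadata = {
    "/ABC/DEF/hello.txt": {"size": 12, "blocks": ["blk_1"]},
}

def lookup_namenode(table, path):
    """Ask the single metadata holder directly."""
    return table[path]

print(lookup_hierarchical(root, "/ABC/DEF/hello.txt"))
print(lookup_namenode(namenode_metadata, "/ABC/DEF/hello.txt"))
```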
Well, this was just an overview of Hadoop and the Hadoop Distributed File System. In the next part I will go into the depth of HDFS, and thereafter MapReduce, continuing from here...
Let me know in the comment section if you have any doubts in understanding anything, and I will be really glad to answer them :)
Comments:
Romain Rigaux said... (October 03, 2013)
Nice summary!
pragya khare said... (October 05, 2013)
I know I'm a beginner and this question might be a silly one, but can you please explain to me how PARALLELISM is achieved via map-reduce at the processor level? If I have a dual-core processor, is it that only 2 jobs will run at a time in parallel?
Anonymous said... (October 05, 2013)
Hi, I am from a mainframe background with little knowledge of core Java... Do you think Java is needed for learning Hadoop in addition to Hive/Pig? I even want to learn Java for MapReduce but couldn't find out what is actually used in practice, and the Definitive Guide books seem tough for learning MapReduce with Java. Is there any option where I can learn it step by step?
Sorry for the long comment, but it would be helpful if you can guide me.
Deepak Kumar said... (October 05, 2013)
@Pragya Khare...
First thing, always remember the popular saying: no questions are foolish :) And by the way, it is a very good question.
Actually there are two things: one is what the best practice is, and the other is what happens by default...
Well, by default the number of mappers and reducers is set to 2 for any TaskTracker, hence one sees a maximum of 2 maps and 2 reduces at a given instant on a TaskTracker (which is configurable). This doesn't only depend on the processor but on lots of other factors as well, like RAM, CPU, power, disk and others:
/blog/best-practices-for-selecting-apache-hadoop-hardware/
As for the other factor, i.e. best practices, it depends on your use case. You can go through the 3rd point of the link below to understand it more conceptually:
/blog/2009/12/7-tips-for-improving-mapreduce-performance/
Well, I will explain all of this when I reach the advanced MapReduce tutorials. Till then keep reading!! :)
Deepak Kumar said... (October 05, 2013)
@Anonymous
As Hadoop is written in Java, most of its APIs are written in core Java... To know about the Hadoop architecture you don't need Java, but to go to the API level and start programming in MapReduce you need to know core Java.
As for the Java requirement you asked about, you just need simple core Java concepts and programming for Hadoop and MapReduce. Hive and Pig are SQL-like data flow languages that are really easy to learn. And since you are from a programming background, it won't be very difficult to learn Java :) You can also go through the link below for further details :)
/2013/09/What-are-the-Pre-requsites-for-getting-started-with-Big-Data-Technologies.html
About the author: Deepak Kumar
Big Data / Hadoop Developer, Software Engineer, Thinker, Learner, Geek, Blogger, Coder
I love to play around with Data. Big Data!
© 2013 All Rights Reserved, BigDataPlanet. All articles on this website by Deepak Kumar are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
6 October 2013
Hadoop Tutorial: Part 2 - Hadoop Distributed File System (HDFS)
In the last tutorial, What is Hadoop?, I gave you a brief idea about Hadoop. The two integral parts of Hadoop are Hadoop HDFS and Hadoop MapReduce.
Let's go deeper inside HDFS.
Hadoop Distributed File System (HDFS) Concepts:
First take a look at the following two terms that will be used while describing HDFS.
Cluster: A Hadoop cluster is made up of many machines in a network. Each machine is termed a node, and these nodes talk to each other over the network.
Block Size: This is the size of one block in a file system, the unit in which data is kept contiguously. The default size of a single block in HDFS is 64 MB.
In HDFS, data is kept by splitting it into chunks or parts. Let's say you have a text file of 200 MB and you want to keep this file in a Hadoop cluster. What happens is that the file is split into a number of chunks, where each chunk is equal to the block size set for the HDFS cluster (64 MB by default), except possibly the last one. Hence a 200 MB file gets split into 4 parts, 3 parts of 64 MB and 1 part of 8 MB, and the parts are spread across different machines. Which machine each split is kept on is decided by the Namenode, which we will discuss in detail below.
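The arithmetic of the 200 MB example can be checked with a few lines of Python. This is a minimal sketch of the size calculation only. split_into_blocks is a hypothetical helper, not an HDFS API, and it says nothing about which machine each part lands on.

```python
MB = 1024 ** 2

def split_into_blocks(file_bytes, block_bytes=64 * MB):
    """Return the sizes of the chunks a file of file_bytes is cut into."""
    full_blocks, remainder = divmod(file_bytes, block_bytes)
    sizes = [block_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last chunk only holds what is left over
    return sizes

sizes_mb = [size // MB for size in split_into_blocks(200 * MB)]
print(sizes_mb)  # [64, 64, 64, 8]
```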
Now, in a Hadoop Distributed File System (HDFS) cluster there are two kinds of nodes: a master node and many worker nodes. These are known as the Namenode (master node) and Datanodes (worker nodes).
Namenode:
The namenode manages the file system namespace. It maintains the file system tree and the metadata for all the files and directories in the tree. So it contains the information about all the files, directories and their hierarchy in the cluster in the form of a namespace image and edit logs. Along with the file system information, it also knows the datanodes on which all the blocks of a file are kept.
A client accesses the file system on behalf of the user by communicating with the namenode and datanodes. The client presents a file system interface similar to the Portable Operating System Interface (POSIX), so the user code does not need to know about the namenode and datanodes to function.
Datanode:
These are the workers that do the real work, and by real work we mean that the actual data is stored by the datanodes. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks they are storing.
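The division of labour between the two kinds of node can be sketched as a toy model: the namenode remembers which blocks make up each file, and learns where those blocks live from the reports the datanodes send. ToyNamenode, its methods, the block ids and the datanode names are all hypothetical, and real HDFS does this over the network with far more bookkeeping.

```python
class ToyNamenode:
    """Keeps metadata only; the block contents stay on the datanodes."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> set of datanode names

    def add_file(self, path, block_ids):
        self.file_blocks[path] = list(block_ids)

    def receive_block_report(self, datanode, block_ids):
        # A report replaces whatever this datanode reported last time.
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """For each block of the file, list the datanodes known to hold it."""
        return [
            (block_id, sorted(self.block_locations.get(block_id, ())))
            for block_id in self.file_blocks[path]
        ]

namenode = ToyNamenode()
namenode.add_file("/logs/a.txt", ["blk_1", "blk_2"])
namenode.receive_block_report("datanode-1", ["blk_1"])
namenode.receive_block_report("datanode-2", ["blk_2"])
print(namenode.locate("/logs/a.txt"))
```

A client in this model would call locate first and then fetch each block from one of the listed datanodes, which is why nothing can be read if the namenode is unavailable.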
One important thing to note here: in one cluster there will be only one Namenode, and there can be N datanodes.
The Namenode contains the metadata of all the files and directories and also knows which datanode each split of a file is stored on. So let's say the Namenode goes down; what do you think will happen?
Yes, if the Namenode is down we cannot access any of the files and directories in the cluster. We will not even be able to connect to any of the datanodes to get any of the files.
Now think about it: we have stored our files by splitting them into chunks and keeping those chunks on different datanodes, and it is the Namenode that keeps track of all the file metadata. So only the Namenode knows how to reconstruct a file from all its splits, and this is the reason that if the Namenode is down in a Hadoop cluster, everything is down.
This is also why the Namenode is known as a single point of failure in Hadoop.
Since the Namenode is so important, we have to make it resilient to failure. For that, Hadoop provides us with two mechanisms.
The first way is to back up the files that make up the persistent state of the file system metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple file systems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount.
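In Hadoop this is a matter of configuration rather than code you write yourself, but the idea of a synchronous, atomic write to several locations can be sketched as follows. This is an illustration under assumptions, not Hadoop's implementation: write_state_to_all is a hypothetical helper, the "fsimage" file name and its contents are placeholders, and two temporary directories stand in for the local disk and the NFS mount.

```python
import os
import tempfile

def write_state_to_all(directories, filename, data):
    """Write data to every directory before returning (synchronous).

    Each copy is written to a temporary file, flushed to disk, and then
    renamed into place, so a reader never sees a half-written file (atomic).
    """
    for directory in directories:
        final_path = os.path.join(directory, filename)
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, final_path)

# Stand-ins for a local disk directory and a remote NFS mount (assumption).
local_dir = tempfile.mkdtemp(prefix="local-")
remote_dir = tempfile.mkdtemp(prefix="nfs-")

write_state_to_all([local_dir, remote_dir], "fsimage", b"namespace-v1")
print(sorted(os.listdir(local_dir)), sorted(os.listdir(remote_dir)))
```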
The second way is running a Secondary Namenode. Despite what the name suggests, it does not act like a Namenode. So if it doesn't act like a namenode, how does it protect against failure?
Well, the Secondary namenode also holds a namespace image and edit logs, like the namenode. At regular intervals (one hour by default) it copies the namespace image from the namenode, merges it with the edit log, and copies the result back to the namenode, so that the namenode has a fresh copy of the namespace image. Now suppose that at some point the namenode goes down and becomes corrupt. Then we can start some other machine with the namespace image and the edit log that we have on the secondary namenode, and so a total failure can be prevented. Because the secondary's copy is only as recent as its last checkpoint, changes made after that checkpoint may be lost.
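The merge step can be pictured as replaying the edit log on top of the last namespace image to produce a fresh one. The sketch below is a toy model under assumptions: the dictionary image, the three operation names and apply_edits are hypothetical and do not reflect Hadoop's actual file formats.

```python
def apply_edits(namespace_image, edit_log):
    """Replay logged operations on a copy of the image and return the result."""
    merged = dict(namespace_image)  # leave the old image untouched
    for op, path, *args in edit_log:
        if op == "create":
            merged[path] = {"blocks": list(args[0])}
        elif op == "delete":
            merged.pop(path, None)
        elif op == "rename":
            merged[args[0]] = merged.pop(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged

# Last saved image plus the changes recorded since then (made-up data).
image = {"/a.txt": {"blocks": ["blk_1"]}}
edits = [
    ("create", "/b.txt", ["blk_2", "blk_3"]),
    ("rename", "/a.txt", "/c.txt"),
    ("delete", "/b.txt"),
]

checkpoint = apply_edits(image, edits)
print(checkpoint)  # {'/c.txt': {'blocks': ['blk_1']}}
```

After a merge like this, the edit log can start again from empty, which keeps it short and makes a restart from the image faster.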
The Secondary Namenode needs almost the same amount of memory and CPU as the Namenode, so it is also kept on a separate machine, like the namenode. Hence we see that in a single cluster we have one Namenode, one Secondary namenode and many Datanodes, and HDFS consists of these three elements.
This was again an overview of the Hadoop Distributed File System (HDFS). In the next part of the tutorial we will look at the working of the Namenode and Datanode in more detail, and at how reads and writes happen in HDFS.
Let me know in the comment section if you have any doubts in understanding anything, and I will be really glad to answer your questions :)
Comments:
vishwash said... (October 07, 2013)
Very informative...
Tushar Karande said... (October 08, 2013)
Thanks for such informative tutorials :)
Please keep posting.. waiting for more... :)
Anonymous said... (October 08, 2013)
Nice information, but I have one doubt: what is the advantage of keeping the file as chunks on different datanodes? What kind of benefit are we getting here?
Deepak Kumar said... (October 08, 2013)
@Anonymous: Well, there are lots of reasons... I will explain them in great detail in the next few articles...
But for now let us understand this: since we have split the file into two, we can now use the power of two processors (parallel processing) on two different nodes to do our analysis (like search, calculation, prediction and lots more). Again, let's say my file size is in petabytes. You won't find one hard disk that big, and even if there were one, how do you think we would read and write on that hard disk? The latency would be really high and it would take lots of time. There are more reasons as well, and I will explain them in more technical ways in the coming tutorials. Till then keep reading :)