![程序設(shè)計(jì)與算法基礎(chǔ)課件6_第1頁](http://file4.renrendoc.com/view/4e4fc331d0047fc5490823a70fbb47d8/4e4fc331d0047fc5490823a70fbb47d81.gif)
![程序設(shè)計(jì)與算法基礎(chǔ)課件6_第2頁](http://file4.renrendoc.com/view/4e4fc331d0047fc5490823a70fbb47d8/4e4fc331d0047fc5490823a70fbb47d82.gif)
![程序設(shè)計(jì)與算法基礎(chǔ)課件6_第3頁](http://file4.renrendoc.com/view/4e4fc331d0047fc5490823a70fbb47d8/4e4fc331d0047fc5490823a70fbb47d83.gif)
![程序設(shè)計(jì)與算法基礎(chǔ)課件6_第4頁](http://file4.renrendoc.com/view/4e4fc331d0047fc5490823a70fbb47d8/4e4fc331d0047fc5490823a70fbb47d84.gif)
![程序設(shè)計(jì)與算法基礎(chǔ)課件6_第5頁](http://file4.renrendoc.com/view/4e4fc331d0047fc5490823a70fbb47d8/4e4fc331d0047fc5490823a70fbb47d85.gif)
版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡介
程序設(shè)計(jì)與算法基礎(chǔ)(6)潘愛民2006/10/30程序設(shè)計(jì)與算法基礎(chǔ)(6)潘愛民1OutlineHashtablesBloomfilterInvertedindexOutlineHashtables2SearchingproblemagainForalinkedlist,->O(n)Forasortedarray,->O(logn)CanweexpectO(1)?whichmeansRegardlessofthenumberofelementsbeingsearched,theruntimeisalwaysthesameGivenakey,thepositioninthetablecanbeaccesseddirectlySearchingproblemagainForal3OperationsonhashtablesCollectionofpairs(key,element),herekeymaybeastring,number,record,etc.PairshavedifferentkeysOperationsGet(theKey)Delete(theKey)Insert(theKey,theElement)OperationsonhashtablesColle4IdealHashingUsesa1Darray(ortable)table[0…m-1].EachpositionofthisarrayisabucketAbucketcannormallyholdonlyonepairUsesahashfunctionhthatconvertseachkeykintoanindexintherange[0,m-1]h(k)isthehomebucketforkeykEverypair(key,element)isstoredinitshomebuckettable[h[key]]IdealHashingUsesa1Darray(5[0][1][2][3][4][5][6][7]IdealHashingExamplePairsare:(22,a),
(33,c),
(3,d),
(73,e),
(85,f)Hashtableis
table[0…7],m=8Hashfunctionis
h(key)=key/11
Pairsarestoredintableasbelow(85,f)(22,a)(33,c)(3,d)(73,e)
Get,
Insert,and
Deletetake
O(1)
time[0][1][2][3][4][5][6][7]Ideal6WhatCanGoWrong?Wheredoes(26,g)go?Keysthathavethesamehomebucketaresynonyms22and26aresynonymswithrespecttothehashfunctionthatisinuse.Thehomebucketfor(26,g)isalreadyoccupied.[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)WhatCanGoWrong?Wheredoes(7WhatCanGoWrong?AcollisionoccurswhenthehomebucketforanewpairisoccupiedbyapairwithadifferentkeyAnoverflowoccurswhenthereisnospaceinthehomebucketforthenewpairWhenabucketcanholdonlyonepair,collisionsandoverflowsoccurtogetherNeedamethodtohandleoverflows(85,f)(22,a)(33,c)(3,d)(73,e)WhatCanGoWrong?Acollision8HashTableIssuesChoiceofhashfunctionsCollisionresolutionSizeofhashtableThesizeofbucket&numberofbucketsHashTableIssuesChoiceofhas9HashfunctionsDefinition:AhashfunctiontransformsakeyKintoanindexinthetableusedforstoringitemsofthesametypeasKIfhashfunctionhtransformsdifferentkeysintodifferentnumbers,itiscalledaperfecthashfunction
Therefore,ahashfunctionisamappingfromnitemstompositionsTotalnumberofpossiblemappingsismnIfn
m,thenumberofperfecthashfunctionsism!/(m-n)!HashfunctionsDefinition:10HashFunctionsTwopartsConvertakeyintoanintegerincasethekeyisnotanintegerh(k)Mapanintegerintoahomebucketf(k)isanintegerintherange[0,m-1],wheremisthenumberofbucketsinthetableHashFunctionsTwoparts11StringToNon-negativeIntegerEachcharacteris1bytelongAnintis4bytesAtwo-characterstringsmaybeconvertedintoaunique4bytenon-negativeintusingthecode:intanswer=s.at(0);answer=(answer<<8)+s.at(1);Stringsthatarelongerthan3charactersdonothaveauniquenon-negativeintrepresentationStringToNon-negativeInteger12StringToNonnegativeIntegerunsignedlongoperator()(conststringtheKey){ //ConverttheKeytoanonnegativeinteger.unsignedlonghashValue=0;intlength=(int)theKey.length();for(inti=0;i<length;i++)hashValue=5*hashValue+theKey.at(i);returnhashValue;}StringToNonnegativeIntegeru13MapintoahomebucketMostcommonmethodisbydivision
homeBucket=h(theKey)%divisor;divisor
equalsthenumberofbuckets
m0
homeBucket<divisor=m[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)MapintoahomebucketMostcom14UniformHashFunctionLetkeySpacebethesetofallpossiblekeysAuniformhashfunction
mapsthekeysinkeySpaceintobucketssuchthatapproximatelythesamenumberofkeysgetmappedintoeachbucket[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)UniformHashFunctionLetkeySp15UniformHashFunctionEquivalently,theprobabilitythatarandomlyselectedkeyhasbucketiasitshomebucketis1/m,0
i<mAuniformhashfunctionminimizesthelikelihoodofanoverflowwhenkeysareselectedatrandom[0][1][2][3][4][5][6][7](85,f)(22,a)(33,c)(3,d)(73,e)UniformHashFunctionEquivale16HashingByDivisionkeySpace=allintsForeverym,thenumberofintsthatgetmapped(hashed)intobucketiisapproximately232/mTherefore,thedivisionmethodresultsinauniformhashfunctionwhenkeySpace=allintsInpractice,keystendtobecorrelatedSo,thechoiceofthedivisormaffectsthedistributionofhomebucketsHashingByDivisionkeySpace=17SelectingTheDivisorBecauseofthiscorrelation,applicationstendtohaveabiastowardskeysthatmapintooddintegers(orintoevenones)Whenthedivisorisanevennumber,oddintegershashintooddhomebucketsandevenintegersintoevenhomebuckets20%14=6,30%14=2,8%14=815%14=1,3%14=3,23%14=9ThebiasinthekeysresultsinabiastowardeithertheoddorevenhomebucketsSelectingTheDivisorBecauseo18SelectingTheDivisorWhenthedivisorisanoddnumber,odd(even)integersmayhashintoanyhome20%15=5,30%15=0,8%15=815%15=0,3%15=3,23%15=8ThebiasinthekeysdoesnotresultinabiastowardeithertheoddorevenhomebucketsBetterchanceofuniformlydistributedhomebucketsSodonotuseanevendivisorSelectingTheDivisorWhenthe19SelectingTheDivisorSimilarbiaseddistributionofhomebucketsisseen,inpractice,whenthedivisorisamultipleofprimenumberssuchas3,5,7,…TheeffectofeachprimedivisorpofmdecreasesaspgetslargerIdeally,choosemsothatitisaprimenumberAlternatively,choosemsothatithasnoprimefactorsmallerthan
20SelectingTheDivisorSimilarb20HashingbyfoldingThekeyisdividedintoseveralparts,andthesepartsarecombinedorfoldedtogetherShiftfoldingUsingasimpleoperationsuchasadditiontocombinetheminacertainwayForexample,thesocialsecuritynumber(SSN),123-45-6789(123+456+789)%m (m=1000)BoundaryfoldingThepiecesofthekeysarefoldedonthebordersbetweendifferentparts,forexample(123+654+789)%m (m=1000)HashingbyfoldingThekeyisd21AboutfoldingSimpleandfast,bitpatterncanbeusedinsteadofnumericalvaluesInthecaseofstringsXor’ingthecharacterstogether,andusingtheresultfortheaddressXor’ingthechunksofstringsratherthansinglecharacters.ThelengthofachunkisequaltothenumberofbytesinanintegerTypically,theresultoffoldingprocessingaredividedmodulomAboutfoldingSimpleandfast,22HashingbyaMid-squarefunctionThekeyissquaredandthemiddleormidpartoftheresultisusedasthehashvalueEntirekeyparticipatesingeneratingthehashvalue,soabetterchancethatthedifferentvaluesaregeneratedfordifferentkeysInpractice,itismoreefficienttochooseapowerof2forthesizeofthetable,m,andextractthemidpartofthebitrepresentationofthesquareofakey,justusingamaskandashiftoperationForexample,31212=100101001010000101100001,
themidpart101000010=322HashingbyaMid-squarefuncti23HashingbyextractionOnlyapartofthekeyisusedtocomputetheaddressIfthepartisdistributeduniformly,itcanbesufficientforhashing(providedtheomittedportiondistinguishesthekeysonlyinaninsignificantway)Forexample,inthestudentID,somedigitsaresafelyomittedinahashfunctionISBNcodeissimilarHashingbyextractionOnlyapa24HashingbyradixtransformationThekeyistransformedintoanothernumberbaseForexample,Kisthedecimalnumber345,thenitsvalueinbase9is423(9)Thenitisdividedmodulominoriginalbase,theresultingnumberisusedasthehashvalueIfm=100,345(10)and245(10)arenothashedtothesamevalue,but345(10)and264(10)willbehashedtoasamevalue,23Hashingbyradixtransformatio25HashingbycryptographichashalgorithmsStronghashalgorithmsAchangeofanybitintheinputwillcausethechangeofanybitintheoutputwith0.5probabilityapproximatelyTheencryptionalgorithmhassimilarfeatureUsingcryptographicalgorithmstohashakeyForexample,computetheMD5(key)orSHA1(key)
orencryptthekeywithafixedencryptionkey,DESk(key)ThendividedmodulomHashingbycryptographichash26CollisionresolutionAvoidcollisionIncreasingthetablesizemayleadtobetterhashing,butsometimesnotThetwofactors–hashfunctionandtablesize–mayminimizethenumberofcollisions,buttheycannotcompletelyeliminatethem
(ifthesizeofkeyspaceislargerthanthetablesize)SomestrategiesOpenaddressingChainingBucketaddressingCollisionresolutionAvoidcoll27CollisionresolutionbyopeningaddressingAcollisionoccurswhenthehomebucketforanewpair(key,element)isoccupiedWemayhandlecollisionsby:Searchthehashtableinsomesystematicfashionforabucketthatisnotoccupied.LinearprobingQuadraticprobingRandomprobingEliminateoverflowsbypermittingeachbuckettokeepalistofallpairsforwhichitisthehomebucketDynamicarraylinkedlistCollisionresolutionbyopenin28OpeningaddressingIfthepositionh(K)isoccupied,thenthepositionsintheprobingsequence
norm(h(K)+p(1)),norm(h(K)+p(2)),…,norm(h(K)+p(m))
aretrieduntileitheranavailablecellisfoundorthesamepositionsaretriedrepeatedlyorthetableisfullIntheprobingsequence,Functionpisprobingfunctioniisaprobenormisanormalizationfunction,(divisionmodulotothesizeofthetable)OpeningaddressingIftheposit29LinearprobingIntheopenaddressingp(i)=c*i,
soifc=1,thenthepositiontobetriedis(h(K)+i)Inlinearprobing,Insertion:thepositiontostoreakeyisfoundbysequentiallysearchingallpositionsstartingfromthepositioncalculatedbythehashfunctionuntilanemptycellisfoundTendencytocreateclustersinthetable,theemptycellsfollowingclustershaveamuchgreaterchancetobefilledthanotherpositionsLinearprobingIntheopenaddr30LinearProbing–GetAndInsertdivisor=m
(numberofbuckets)=17Homebucket=key%170481216Insertpairswhosekeysare6,12,34,29,28,11,23,7,0,33,30,45612293428112370333045LinearProbing–GetAndInser31QuadraticprobingIntheopenaddressingIngeneral,p(i)=c2*i2+c1*i,forexample,
p(i)=(-1)i-1((i+1)/2)2,sothepositiontobetriedis
h(K),h(K)+1,h(K)-1,h(K)+4,h(K)-4,…,
h(K)+(m-1)2/4,h(K)–(m-1)2/4Theexperiencevalueofm(tablesize)isaprimeintheformof4j+1Question:Whynotusep(i)=i2,0i<m
(i2-j2)=(i+j)(i-j)TheeffectofclusteringisbetterthanlinearprobingSecondaryclusterforthekeyshashedtothesamelocationQuadraticprobingIntheopena32RandomprobingpfunctionisdefinedasapseudorandomnumbergeneratorArequirementInordertofindtheprobingsequence,therandomnumbergeneratorshouldbeinitializedtothesameseedforthesamekeyAvoidsecondaryclustersThesameprobingsequenceforakeyDifferentprobingsequencesfordifferentkeyswiththesamehashvalueRandomprobingpfunctionisde33Doublehashingpfunctionisdefinedwithanotherhashfunction,hp(K)
p(i)=i*hp(K)Theprobingsequenceish(K),h(K)+hp(K),h(K)+2hp(K),h(K)+3hp(K),…,
h(K)+(m-1)hp(K)Theprobingsequencewilldependonthechoiceofhashfunctionhp(K)Thehashfunctionhp(K)
canbedefinedwiththeoriginalhashfunctionh(K),forexample,
hp(K)=i*h(K)+1,thentheprobingsequenceis
h(K)+i*(i*h(K)+1)=(i2+1)h(K)+iSecondaryclustersDoublehashingpfunctionisde34PerformanceofopeningaddressingSuccessfulsearchesvs.unsuccessfulsearches
[TheArtofComputerProgramming,Volume3]LinearprobingQuadraticsearchDoublehashingSuccessfulsearches[1+1/(1-LF)]/21-ln(1-LF)-LF/2-ln(1-LF)/LFUnsuccessfulsearches[1+1/(1-LF)2]/21/(1-LF)-LF-ln(1-LF)1/(1-LF)LF=n/m,loadfactor=(numberofelementsinthetable)/tablesizePerformanceofopeningaddress35PerformanceOfLinearProbingWorst-caseGet/InserttimeisQ(n),wherenisthenumberofkeysinthetableThishappenswhenallpairsareinthesameclusterExpectedPerformanceSn=expectednumberofbucketsexaminedinasuccessfulsearchUn=expectednumberofbucketsexaminedinaunsuccessfulsearch0481216612293428112370333045PerformanceOfLinearProbingW36ExpectedPerformanceSn~[1+1/(1-LF)]/2Un~[1+1/(1-LF)2]/2Notethat0LF1LF
0.65isrecommended.LFSnUn0.51.52.50.651.94.60.752.58.50.905.550.5ExpectedPerformanceSn~[1+137LinearProbing–DeleteDelete(0)0481216612293428112370333045048121661229342811237453330Searchclusterforpair(ifany)tofillvacatedbucket048121661229342811237453330LinearProbing–DeleteDelete(38LinearProbing–Delete(34)Searchclusterforpair(ifany)tofillvacatedbucket0481216612293428112370333045048121661229028112373330450481216612290281123733304504812166122928112370333045LinearProbing–Delete(34)Sea39LinearProbing–Delete(29)Searchclusterforpair(ifany)tofillvacatedbucket048121661229342811237033304504812166123428112370333045048121661211342823703330450481216612113428237033304504812166121134282370333045LinearProbing–Delete(29)Sea40HashTableDesignPerformancerequirementsaregiven,determinemaximumpermissibleloadfactorWewantasuccessfulsearchtomakenomorethan10compares(expected)Sn~?(1+1/(1–LF))LF18/19Wewantanunsuccessfulsearchtomakenomorethan13compares(expected)Un~?(1+1/(1–LF)2)LF4/5SoLFmin{18/19,4/5}=4/5HashTableDesignPerformancer41HashTableDesignDynamicresizingoftableWheneverloadfactorexceedsthreshold(4/5inourexample),rehashintoatableofapproximatelytwicethecurrentsizeFixedtablesizeKnowmaximumnumberofpairsNomorethan1000pairsLF
4/5=>m
5/4*1000=1250.Pick
m(equaltodivisor)tobeaprimenumberoranoddnumberwithnoprimedivisorssmallerthan20HashTableDesignDynamicresiz42Collisionresolution:chainingEachbucketkeepsalinearlistofallpairsforwhichitisthehomebucketNeveroverflowifthecapacityofthelinearlistisunlimitedThelinearlistmayormaynotbesortedbykey.ThelinearlistmaybeanarraylinearlistorachainPerformanceIncreasingthelengthofthelistscandegraderetrievalperformanceRequireadditionalspaceforpointersCollisionresolution:chaining43Putinpairswhosekeysare6,12,34,29,28,11,23,7,0,33,30,45Homebucket=key%17.SortedChains[0][4][8][12][16]126342928112370333045Putinpairswhosekeysare6,44ExpectedPerformanceNotethatLF0.ExpectedchainlengthisLF.Sn~1+LF/2(inthecaseofunsortedlist)Un~LFExpectedPerformanceNotethat45Coalescedhashing(coalescedchaining)Combinelinearprobingandchainingh(K)=K%17
Insertpairswhosekeysare6,12,34,29,28,11,23,7,0Alternatively,thecollidingkeycanbeputinanoverflowarea(calledacellar)0481216612293428112370Coalescedhashing(coalescedc46BucketaddressingAssociateabucketwitheachaddress,andabucketisablockofspaceenoughtostoremultipleitemsCollisionisnottotallyavoided,ifabucketisfull,thenAnewitemhashedtothebucketcanbestoredinthenextbucket,justasdoesintheopenaddressingapproachThenewitemcanalsobestoredinanoverflowarea.ThebucketwillbemarkedwithaflagindicatingithasadditionalitemstobesearchedBucketaddressingAssociateab47PerfecthashfunctionsPerfecthashfunctionsoridealhashfunctionsIthashesakeytoitsproperposition,andnocollisionsoccurAnassumptionisthatthekeysetisknownIfthenumberofcellsinthetableisequaltothenumberofdataitems,itiscalledaminimalperfecthashfunction sonospaceiswastedExamplesReservedwordsusedbyassemblers,compilers,filesonunerasableopticaldisks,dictionaries,…ItisnoteasytoobtainaperfecthashfunctionPerfecthashfunctionsPerfect48Cichelli’smethodtoconstructaminimalperfecthashfunctionDevelopedbyRichardJ.CichelliUsedtohasharelativelysmallnumberofreservedwordsThefunctionisoftheform
h(word)=(length(word)+g(firstletter(word))
+g(lastletter(word)))modm
wheregisthefunctiontobeconstructedCichelli’smethodtoconstruct49Cichelli’salgorithmChooseavalueforMaxComputethenumberofoccurrencesofeachfirstandlastletterinthesetofallwordsOrderallwordsinaccordancetothefrequencyofoccurrenceofthefirstandthelastlettersExamples:Calliope,Clio,Erato,Euterpe,Melpomene,Polyhymnia,Terpsichore,Thalia,Urania=>E(6),A(3),C(2),O(2),T(2),M(1),P(1),U(1)Euterpe,Calliope,Erato,Terpsichore,Melpomene,Thalia,Clio,Polyhymnia,UraniaCichelli’salgorithmChooseav50Cichelli’salgorithm(cont.)Search(wordList) ifwordListisempty halt;
word=firstwordfromwordList;
wordList=wordListwiththefirstworddetached; ifthefirstandthelastlettersofwordareassignedg-values try(word,-1,-1)//-1signifies‘valuealreadyassigned’ ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue; elseifneigherthefirstnorthelastlettershasag-value foreachn,min{0,…,Max) try(word,n,m); ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue; elseifeitherthefirstorthelastletterhasag-value foreachnin{0,…,Max) try(word,-1,n); ifsuccess Search(wordList) putwordatthebeginningofwordListanddetachitshashvalue;Cichelli’salgorithm(cont.)Se51Cichelli’salgorithm(cont.)try(word,firstLetterValue,lastLetterValue) ifh(word)hasnotbeenclaimed reserveh(word); ifnot-1(i.e.,notreserved) assignfirstLetterValueand/orlastLetterValue
asg-valuesof firstletter(word)and/orlastletter(word) returnsuccess returnfailureCichelli’salgorithm(cont.)tr52Ainvocationsofthesearchingprocedure reservedhashvaluesEuterpeE=0h=7 {7}CalliopeC=0h=8 {78} EratoO=0h=5 {578} TerpsichoreT=0 h=2 {2578} Melpomene M=0 h=0 {02578} Thalia A=0 h=6 {025678} Clio h=4 {0245678} Polyhymnia P=0 h=1 {01245678} Urania U=0 h=6* {01245678} Urania U=1 h=7* {01245678} Urania U=2 h=8* {01245678} Urania U=3 h=0* {01245678} Urania U=4 h=1* {01245678} Polyhymnia P=1 h=2* {0245678} Polyhymnia P=2 h=3 {02345678} Urania U=0 h=6* {02345678} Urania U=1 h=7* {02345678} Urania U=2 h=8* {02345678} Urania U=3 h=0* {02345678} Urania U=4 h=1 {012345678}Ainvocationsofthesearching53Aboutcichelli’salgorithmAbrute-forcesearchforg-function,sothesearchprocessisexponentialItisnotapplicabletoalargenumberofwordsItdoesnotguaranteethataperfecthashfunctioncanbefoundExtensions(1)Thesecondtolastlettersinthewordareinvolvedinthehashfunctions(2)h(word)=length(word)+g1(firstletter(word))+…+glength(word)(lastletter(word))(3)h(word)=bucketgr(word)+hgr(word)(word)Aboutcichelli’salgorithmAbr54FHCDalgorithmSearchforaminimalperfecthashfunctionoftheform
h(word)=h0(word)+g(h1(word))+g(h2(word))
h0:keyspace->[0,m)
h1:keyspace->[0,r)
h2:keyspace->[r,2r)Steps:Mapping(dependencygraph):betweenh1toh2foreachwordOrdering:firstdeterminethenodewithmorelinksSearching:assignhashvaluestokeysbychoosingappropriateg-valueforeachvertex
Foreachvertex,eitherg(h1(word))org(h2(word))isknownFHCDalgorithmSearchforamin55Extensiblehashing0001101100011
001100010110011
110111110001100
010112221b00b01b1h(k)=11001h(k)=000010001101100011
00110001011001101100
010112222b00b01b101101111100110012b1100000101001100011
000011001101100
010113322b000b01b101101111100110012b1110010111011100110
001013b001Extensiblehashing00011011000156AnexampleofhashfunctionIndistributedenvironments,howtopartitionalargenumberofjobs(orrecords)intoaclusterofmachineskeyv=h(key)AnexampleofhashfunctionIn57LinearhashingNoindexisneededThebucketissplitifnecessaryAteachlevelofsplitting,linearhashingmaintaintwohashfunctions,hlevelandhlevel+1,suchthat
hlevel=Kmode(TSize*2level)0001
001000110101
01111000
100110111100
1101LinearhashingNoindexisneed58STLhash_mapSimplyusesadivisorthatisanoddnumberThissimplifiesimplementationbecausewemustbeabletoresizethehashtableasmorepairsareputintothedictionaryArraydoubling,forexample,requiresyoutogofroma1Darraytablewhoselengthism(whichisodd)toanarraywhoselengthis2m+1(whichisalsoodd)STLhash_mapSimplyusesadivi59AbouthashtablesPerfecthashfunctions(n
m)Uniformhashfunctions(nm)PerformancerequirementsHowtodealwithcollisionsHowtodealwithoverflowsifthesizeofbucketsisfixedAbouthashtablesPerfecthash60BloomFilters–anintroductionDifferentialFilesSimplelargedatabaseFileofrecordsresidingondiskSinglekeyIndextorecordsOperationsRetrieveUpdateInsertanewrecordMakechangestoanexistingrecordDeletearecordBloomFilters–anintroductio61Na?veModeOfOperationProblemsIndexandFilechangewithtimeRecovery=>CopyMasterFile(MF)frombackupCopyMasterIndex(MI)frombackupProcessalltransactionssincelastbackupRecoverytimedependsonMF&MIsize+#transactionssincelastbackupKeyIndexFileNa?veModeOfOperationProblem62DifferentialFileMakenochangestomasterfileAlterindexandwriteupdatedrecordtoanewfilecalleddifferentialfileKeyIndexFileDFAdvantageDFissmallerthanFileandsomaybebackedupmorefrequentlyIndexneedstobebackedupwheneverDFis.So,indexshouldbenolargerthanDFRecoverytimeisreducedDifferentialFileMakenochang63DifferentialFileOperationKeyIndexFileDFDisadvantageEventuallyDFbecomeslargeandcannolongerbebackedupwithdesiredfrequencyMustintegrateFileandDFnowFollowingintegration,DFisemptyDifferentialFileOperationKey64DifferentialFileOperationKeyIndexFileDFLargeIndexIndexcannotbebackedupasfrequentlyasdesiredTimetorecovercurrentstateofindex&DFisexcessiveUseadifferentialindexMakenochangestoIndexDIisanindextoalldeletedrecordsandupdatedrecordsinDFDifferentialFileOperationKey65DifferentialFile&IndexOperationPerformancehitMostqueriessearchbothDIandIndexIncreasein#ofdiskaccesses/queryUseafiltertodecidewhetherornotDIshouldbesearchedKeyIndexFileDFDIYNYDifferentialFile&IndexOper66IdealFilterKeyIndexFileDFFilterYNYDIYY
=>thiskeyisintheDIN
=>thiskeyisnotintheDIFunctionalityofidealfilterissameasthatofDISo,afilterthateliminatesperformancehitofDIdoesn’texistIdealFilterKeyIndexFileDFFilt67BloomFilter(BF)N
=>thiskeyisnotintheDIM(maybe)
=>thiskeymaybeintheDIFiltererrorBFsaysMaybeDIsaysNoKeyIndexFileDFBFYNYDIMNBloomFilter(BF)N=>thiskey68BloomFilter(BF)FiltererrorBFsaysMaybeDIsaysNoKeyIndexFileDFBFYNYDIMNBFresidesinmemoryPerformancehitpaidonlywhenthereisafiltererrorBloomFilter(BF)FiltererrorK69BloomFilterByBurtonH.Bloomin1970Bloomfilterisaspace-efficientprobabilisticdatastructurethatisusedtotestwhetheranelementisamemberofasetFalsepositivesarepossibleFalsenegativearenotItissuitableforthesetthatsupportaddition/queryoperationsRemovinganelementisnoteasy(consideringacountingfilter)See/wiki/Bloom_filterBloomFilterByBurtonH.Bloom70BloomFilterDesignUsembitsofmemoryfortheBFLargerm=>fewerfiltererrorsWhenBFempty,allm
bits
=0Usek>0hashfunctions:f1(),f2(),…,fk()WhenkeyKeyinsertedintoBF,setbitsf1(Key),f2(Key),…,andfk(Key)to1f1(Key),f2(key),…,fk(Key)isthesignatureofkeyKeyBloomFilterDesignUsembits71Examplem=11(normally,mwouldbemuchmuchlarger)k=2(2hashfunctions)f1(Key)
=Keymodmf2(Key)
=(2Key)modmKey=15Key=17000000000001234567890111110Examplem=11(normally,mwou72ExampleBFhasKey=15andKey=17Searchfor
Keyf1(Key)
=0or
f2(Key)
=0=>
Key
notinBFf1(Key)
=1and
f2(Key)
=1=>
Key
maybeinBFKey=6
=>filtererror000000000001234567890111110ExampleBFhasKey=15andKey73BloomFilterDesignChoosem(thefiltersizeinbits)UseasmuchmemoryasisavailablePickk(thenumberofhashfunctions)ktoosmall=>probabilityofdifferentkeyshavingsamesignatureishigh.ktoolarge=>filterbecomesfilledwith1stoosoonSelectthekhashfunctionsHashfunctionsshouldberelativelyindependentBloomFilterDesignChoosem(t74TheparametersofourbloomfilterProbabilityofafiltererrordependson:Filtersize…m#ofhashfunctions…k#ofelementsinsertedbeforefilterisresetto0…nAssumethatmandnare
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲(chǔ)空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 2025年自來水廠自動(dòng)化系統(tǒng)項(xiàng)目可行性研究報(bào)告
- 2025年度房地產(chǎn)開發(fā)商短期借款格式合同
- 2025年度全球?qū)@夹g(shù)許可及實(shí)施合同
- 2025年度景區(qū)場地租賃與旅游體驗(yàn)項(xiàng)目合作合同樣本
- 2025年度智能化駕校學(xué)員培訓(xùn)服務(wù)合同協(xié)議書
- 2025年度化工研發(fā)合作項(xiàng)目合同范本
- 2025年度智慧社區(qū)建設(shè)合同簽約承諾書范本
- 2025年度城市排水與污水處理工程設(shè)計(jì)合同
- 2025年度地下綜合管廊設(shè)計(jì)合同示范文本GF
- 2025年化工廠安全生產(chǎn)標(biāo)準(zhǔn)化建設(shè)勞動(dòng)合同
- 2024年中考語文試題分類匯編:散文、小說閱讀(第03期)含答案及解析
- 《宮頸癌篩查》課件
- 2024年聯(lián)勤保障部隊(duì)第九四〇醫(yī)院社會(huì)招聘考試真題
- 第二章《有理數(shù)的運(yùn)算》單元備課教學(xué)實(shí)錄2024-2025學(xué)年人教版數(shù)學(xué)七年級上冊
- DB31-T 596-2021 城市軌道交通合理通風(fēng)技術(shù)管理要求
- 華為智慧園區(qū)解決方案介紹
- 2022年江西省公務(wù)員錄用考試《申論》真題(縣鄉(xiāng)卷)及答案解析
- 2024年國家公務(wù)員考試《行測》真題(地市級)及答案解析
- 【招投標(biāo)管理探究的國內(nèi)外文獻(xiàn)綜述2600字】
- 人教版八年級英語上冊期末專項(xiàng)復(fù)習(xí)-完形填空和閱讀理解(含答案)
- 一例蛇串瘡患者個(gè)案護(hù)理課件
評論
0/150
提交評論