




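PostgreSQL checkpoint source code analysis
2015-05-07 10:53:59 | Category: PgSQL

CheckPointBuffers

Source comment: flush all dirty blocks in the buffer pool to disk at checkpoint time. Note: temporary relations do not participate in checkpoints, so they don't need to be flushed.

CheckPointBuffers takes the checkpoint flags as an int and runs two phases, a buffer write phase and a sync phase. It stamps CheckpointStats.ckpt_write_t before writing the buffers, CheckpointStats.ckpt_sync_t before the sync, and CheckpointStats.ckpt_sync_end_t once the sync has finished. That completes the buffer checkpoint.

BufferSync

Author's note: BufferSync is the part of the checkpoint that flushes buffers. One thing to keep in mind is that the blocks marked as "checkpoint needed" in the first pass are counted, and it is the second pass that flushes them.

Source comment: BufferSync writes out all dirty buffers in the pool. It is called at checkpoint time to write out all dirty shared buffers. The checkpoint request flags should be passed in. If CHECKPOINT_IMMEDIATE is set, we disable delays between writes; if CHECKPOINT_IS_SHUTDOWN, CHECKPOINT_END_OF_RECOVERY or CHECKPOINT_FLUSH_ALL is set, we write even unlogged buffers, which are otherwise skipped. The remaining flags currently have no effect.

The function first makes sure it can handle the pin taken inside SyncOneBuffer, then builds the flag mask used to select buffers. Unless this is a shutdown checkpoint or we have been explicitly told otherwise, we write only permanent, dirty buffers. At shutdown or end of recovery we write all dirty buffers. In the code, the permanent bit is added to mask only when flags contains neither CHECKPOINT_IS_SHUTDOWN nor CHECKPOINT_END_OF_RECOVERY (nor, per the header comment, CHECKPOINT_FLUSH_ALL).

First loop. Loop over all buffers and mark the ones that need to be written with BM_CHECKPOINT_NEEDED. Count them as we go (num_to_write), so that we can estimate how much work needs to be done. This allows us to write only those pages that were dirty when the checkpoint began, and not those that get dirtied while it proceeds. Whenever a page with BM_CHECKPOINT_NEEDED is written out, either by us later in this function, or by normal backends or the bgwriter cleaning scan, the flag is cleared. Any buffer dirtied after this point won't have the flag set. Note that if we fail to write some buffer, we may leave buffers with BM_CHECKPOINT_NEEDED still set. This is OK since any such buffer would certainly need to be written for the next checkpoint attempt, too.

The loop runs buf_id from 0 up to NBuffers. For each buffer header bufHdr (a volatile BufferDesc pointer), the header spinlock is enough to examine BM_DIRTY. If bufHdr->flags contains every bit of mask, BM_CHECKPOINT_NEEDED is OR-ed into bufHdr->flags and the buffer is counted in num_to_write. If num_to_write is 0 there is nothing to do and the function returns. Otherwise it fires the TRACE_POSTGRESQL_BUFFER_SYNC_START probe and then loops over all buffers again, writing the ones (still) marked with BM_CHECKPOINT_NEEDED.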
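The marking pass described above can be illustrated with a small standalone sketch. It is not the PostgreSQL code: the flag values, the `SK_` names, and the function `mark_buffers_for_checkpoint` are hypothetical, and a plain array of flag words stands in for the buffer headers. It only shows how the mask is chosen and how the count is produced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only. */
enum {
    SK_DIRTY             = 1 << 0,
    SK_PERMANENT         = 1 << 1,
    SK_CHECKPOINT_NEEDED = 1 << 2
};

/*
 * Mark every buffer whose flags contain all bits of the mask and return
 * how many were marked (the num_to_write of the text).
 * write_unlogged models a shutdown, end-of-recovery or flush-all request.
 */
static int
mark_buffers_for_checkpoint(unsigned *flags, size_t nbuffers, int write_unlogged)
{
    unsigned mask = SK_DIRTY;
    int num_to_write = 0;

    assert(flags != NULL || nbuffers == 0);

    /* Ordinary checkpoints write only permanent, dirty buffers. */
    if (!write_unlogged)
        mask |= SK_PERMANENT;

    for (size_t i = 0; i < nbuffers; i++) {
        if ((flags[i] & mask) == mask) {
            flags[i] |= SK_CHECKPOINT_NEEDED;
            num_to_write++;
        }
    }
    return num_to_write;
}
```

With one permanent dirty buffer and one unlogged dirty buffer, an ordinary checkpoint marks one buffer, while a shutdown-style request marks both.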
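Second loop of BufferSync. In this loop, we start at the clock sweep point since we might as well dump soon-to-be-recycled buffers first. Note that we don't read the buffer alloc count here; that should be left untouched till the next BgBufferSync() call. The starting buf_id comes from StrategySyncStart, num_written starts at 0, and the loop visits num_to_scan buffers, looking at each buffer header bufHdr in turn.

Source comment: we don't need to acquire the lock here, because we're only looking at a single bit. It's possible that someone else writes the buffer and clears the flag right after we check, but that doesn't matter since SyncOneBuffer will then do nothing. However, there is a further race condition: it's conceivable that between the time we examine the bit here and the time SyncOneBuffer acquires the lock, someone else not only wrote the buffer but replaced it with another page and dirtied it. In that improbable case, SyncOneBuffer will write the buffer though we didn't need to. It doesn't seem worth guarding against this, though.

If bufHdr->flags has BM_CHECKPOINT_NEEDED set, the loop calls SyncOneBuffer(buf_id, false), and when that call reports that it wrote the buffer, num_written is incremented. We know there are at most num_to_write buffers with BM_CHECKPOINT_NEEDED set, so we can stop scanning if num_written reaches num_to_write. Note that num_written doesn't include buffers written by other backends, or by the bgwriter cleaning scan. That means that the estimate of how much progress we've made is conservative, and also that this test will often fail to trigger. But it seems worth making anyway.

After each write the loop sleeps to throttle our I/O rate, passing (double) num_written / num_to_write as the progress so far.

Author's example: suppose num_to_write is 100 and num_written is 10, so the value passed in is 0.1. Inside the throttling logic, progress is multiplied by the checkpoint completion target; with a target of 0.5 that gives 0.1 * 0.5 = 0.05. This is compared with elapsed_xlogs, computed as (recptr - ckpt_start_recptr) / XLogSegSize divided by the checkpoint segment limit. With a target of 0.5, progress can reach at most 0.5, which happens when num_written / num_to_write is 1 (1 * 0.5).

At the end of each iteration buf_id is incremented and wraps around at the end of the buffer pool. After the loop, the checkpoint statistics are updated: CheckpointStats.ckpt_bufs_written is increased by num_written. As noted above, this doesn't include buffers written by other backends or the bgwriter scan. At this point the buffers marked BM_CHECKPOINT_NEEDED have been flushed.

SyncOneBuffer

Source comment: SyncOneBuffer processes a single buffer during syncing. If skip_recently_used is true, we don't write currently-pinned buffers, nor buffers marked recently used, as these are not replacement candidates. It returns a bitmask containing the following flag bits. BUF_WRITTEN: we wrote the buffer. BUF_REUSABLE: the buffer is available for replacement, that is, it has pin count 0 and usage count 0. (BUF_WRITTEN could be set in error if FlushBuffer finds the buffer clean after locking it, but we don't care all that much.) The caller must already have made room for the buffer pin that SyncOneBuffer takes, which is what BufferSync does before its first loop.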
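The pacing arithmetic from the author's example in BufferSync's write loop above can be checked with a small standalone sketch. It models only the WAL-based comparison that survives in the text. The function names and the `segments_budget` parameter (a stand-in for the divisor applied after `XLogSegSize`) are assumptions made for illustration, not PostgreSQL identifiers.

```c
#include <assert.h>

/* Fraction of marked buffers written, scaled by the completion target. */
static double
scaled_progress(int num_written, int num_to_write, double completion_target)
{
    assert(num_to_write > 0);
    return ((double) num_written / num_to_write) * completion_target;
}

/*
 * Returns 1 if the writer is on schedule with respect to WAL consumption
 * (so it may sleep before the next write), 0 if it is behind.
 * segments_budget is a hypothetical name for the per-checkpoint
 * segment limit.
 */
static int
on_schedule(int num_written, int num_to_write, double completion_target,
            double wal_bytes_since_start, double seg_size,
            double segments_budget)
{
    double progress;
    double elapsed_xlogs;

    assert(seg_size > 0 && segments_budget > 0);

    progress = scaled_progress(num_written, num_to_write, completion_target);
    elapsed_xlogs = (wal_bytes_since_start / seg_size) / segments_budget;

    return progress >= elapsed_xlogs;
}
```

With 10 of 100 buffers written and a target of 0.5, the scaled progress is 0.05. If one segment out of a budget of ten has already been consumed, elapsed_xlogs is 0.1, so the writer is behind and should not sleep.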
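SyncOneBuffer is declared as static int SyncOneBuffer(int buf_id, bool skip_recently_used). It looks up the buffer header bufHdr and starts with an empty result.

Source comment: check whether the buffer needs writing. We can make this check without taking the buffer content lock so long as we mark pages dirty in access methods *before* logging changes with XLogInsert(): if someone marks the buffer dirty just after our check we don't worry, because our checkpoint.redo points before the log record for the upcoming changes and so we are not required to write such a dirty buffer.

If bufHdr->refcount is 0 and bufHdr->usage_count is 0, BUF_REUSABLE is added to result. Otherwise, if skip_recently_used is set, the caller told us not to write recently-used buffers and the function returns result. If the buffer is not both BM_VALID and BM_DIRTY, it's clean, so there is nothing to do and the function returns result.

Otherwise: pin it, share-lock it, write it. (FlushBuffer will do nothing if the buffer is clean by the time we've locked it.) The function acquires bufHdr->content_lock with LWLockAcquire, calls FlushBuffer on bufHdr, releases the lock, calls UnpinBuffer(bufHdr, true), and returns result | BUF_WRITTEN.

FlushBuffer

Author's note: when a buffer is flushed for the checkpoint, the WAL that covers it must be written to disk first.

Source comment: FlushBuffer physically writes out a shared buffer. NOTE: this actually just passes the buffer contents to the kernel; the real write to disk won't happen until the kernel feels like it. This is okay from our point of view since we can redo the changes from WAL. However, we will need to force the changes to disk via fsync before we can checkpoint WAL. The caller must hold a pin on the buffer and have share-locked the buffer contents. (Note: a share-lock does not prevent updates of hint bits in the buffer, so the page could change while the write is in progress, but we assume that that will not invalidate the data written.) If the caller has an smgr reference for the buffer's relation, pass it as the second parameter. If not, pass NULL.

FlushBuffer takes a volatile BufferDesc *buf and an SMgrRelation reln. It proceeds as follows.

Acquire the buffer's io_in_progress lock. If StartBufferIO returns false, then someone else flushed the buffer before we could, so we need not do anything.

Set up error traceback support for ereport(): fill in errcallback.callback, set errcallback.arg to (void *) buf, save error_context_stack in errcallback.previous, and push errcallback onto error_context_stack.

Find the smgr relation for the buffer: if reln is NULL, open it with smgropen on buf->tag.rnode.

Read the page LSN into recptr. PageGetLSN is run while holding the header lock, since we don't have the buffer locked exclusively in all cases. The BM_JUST_DIRTIED bit is cleared from buf->flags at the same time, to check whether the block content changes while flushing.

Force an XLOG flush up to the buffer's LSN. This implements the basic WAL rule that log updates must hit disk before any of the data-file changes they describe do. (Author's note: XLOG is flushed up to the buffer's LSN.) However, this rule does not apply to unlogged relations, which will be lost after a crash anyway. Most unlogged relation pages do not bear LSNs since we never emit WAL records for them, and therefore flushing up through the buffer LSN would be useless, but harmless. However, GiST indexes use LSNs internally to track page splits, and therefore unlogged GiST pages bear "fake" LSNs generated by a separate counter. It is unlikely but possible that the fake LSN counter could advance past the WAL insertion point; and if it did happen, attempting to flush WAL through that location would fail, with disastrous system-wide consequences.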
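The WAL-before-data rule described above can be sketched with a toy log and a toy page. Everything here (`ToyWal`, `ToyPage`, `toy_wal_flush`, `toy_flush_page`) is hypothetical and only models the ordering: the log is made durable up to the page's LSN before the page is handed off, and the log flush is skipped for non-permanent pages, which is one way to stay clear of the fake-LSN hazard.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;

typedef struct {
    Lsn flushed_upto;   /* log is durable up to here */
    Lsn insert_pos;     /* log has been generated up to here */
} ToyWal;

typedef struct {
    Lsn lsn;            /* LSN of the last log record that touched the page */
    int permanent;      /* 0 for pages of unlogged relations */
    int written;        /* set once the page has been handed to the kernel */
} ToyPage;

/* Make the log durable up to the given position. */
static void
toy_wal_flush(ToyWal *wal, Lsn upto)
{
    /* A log cannot be flushed past what has been inserted. */
    assert(upto <= wal->insert_pos);
    if (wal->flushed_upto < upto)
        wal->flushed_upto = upto;
}

/* Write one page, flushing the log first when the page is permanent. */
static void
toy_flush_page(ToyWal *wal, ToyPage *page)
{
    if (page->permanent)
        toy_wal_flush(wal, page->lsn);
    page->written = 1;
}
```

A page of an unlogged relation with an LSN beyond `insert_pos` is written without touching the log, so the assertion in `toy_wal_flush` is never reached for it.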
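To make sure that can't happen, skip the flush if the buffer isn't permanent: the WAL flush is performed only when the permanent bit is set in buf->flags.

Now it's safe to write the buffer to disk. Note that no one else should have been able to write it while we were busy with log flushing, because we have the io_in_progress lock. The block is fetched into bufBlock.

Update the page checksum if desired. Since we have only a shared lock on the buffer, other processes might be updating hint bits in it, so we must copy the page to private storage if we do checksumming. The result of PageSetChecksumCopy((Page) bufBlock, ...) is stored in bufToWrite, which is either the shared buffer or a copy, as appropriate. The page is then written, and when I/O timing is enabled the elapsed time is computed with INSTR_TIME_SUBTRACT on io_time and added to pgBufferUsage.blk_write_time with INSTR_TIME_ADD.

Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and end the io_in_progress state with TerminateBufferIO(buf, true, ...). (Author's note: this completes the flush of a single buffer.) Finally, pop the error context stack by restoring error_context_stack.

smgrwrite

Source comment: smgrwrite() writes the supplied buffer out. This is to be used only for updating already-existing blocks of a relation (that is, those before the current EOF). To extend a relation, use the extend routine instead. This is not a synchronous write: the block is not necessarily on disk at return, only dumped out to the kernel. However, provisions will be made to fsync the write before the next checkpoint. skipFsync indicates that the caller will make other provisions to fsync the relation, so we needn't bother. Temporary relations also do not require fsync.

smgrwrite takes an SMgrRelation reln, a ForkNumber forknum, a BlockNumber, a char *buffer and a bool skipFsync, and simply dispatches through the storage manager switch: it calls smgrsw[reln->smgr_which].smgr_write with the same arguments.

smgrsync

Source comment: smgrsync() syncs files to disk during checkpoint. It loops i from 0 to NSmgr and, for every storage manager that provides one, calls smgrsw[i].smgr_sync. The smgr_sync implementation examined here is mdsync.

mdsync

Source comment: mdsync() syncs previous writes to stable storage.

Its local state includes a static bool mdsync_in_progress (initially false), a HASH_SEQ_STATUS, a PendingOperationEntry pointer, and statistics on sync times (processed, longest, total_elapsed).

This is only called during checkpoints, and checkpoints should only occur in processes that have created a pending-operations table. If there is none, the function raises elog(ERROR, ...) with a "cannot sync without a ..." message.

If we are in the checkpointer, the sync had better include all fsync requests that were queued by backends up to this point. The tightest race condition that could occur is that a buffer that must be written and fsync'd for the checkpoint could have been dumped by a backend just before it was visited by BufferSync(). We know the backend will have queued an fsync request before clearing the buffer's dirty bit, so we are safe as long as we do an Absorb after completing BufferSync().

To avoid excess fsync'ing (in the worst case, maybe a never-terminating checkpoint), we want to ignore fsync requests that are entered into the hashtable after this point; they should be processed next time, instead. We use mdsync_cycle_ctr to tell old entries apart from new ones: new ones will have cycle_ctr equal to the incremented value of mdsync_cycle_ctr.

In normal circumstances, all entries present in the table at this point will have cycle_ctr exactly equal to the current (about to be old) value of mdsync_cycle_ctr. However, if we fail partway through the fsync'ing loop, then older values of cycle_ctr might remain when we come back here to try again. Repeated checkpoint failures would eventually wrap the counter around to the point where an old entry might appear new, causing us to skip it, possibly allowing a checkpoint to succeed that should not have. To forestall wraparound, any time the previous mdsync() failed to complete, run through the table and forcibly set cycle_ctr = mdsync_cycle_ctr.

Think not to merge this loop with the main loop, as the problem is exactly that that loop may fail before having visited all the entries. From a performance point of view it doesn't matter anyway, as this path will never be taken in a system that's functioning normally.

In code: if mdsync_in_progress is set, the prior try failed, so a hash_seq_init / hash_seq_search pass over the table updates any stale cycle_ctr values. The counter is then advanced so that new hashtable entries are distinguishable, and mdsync_in_progress is set to true to detect failure if we don't reach the end of the loop. Finally absorb_counter is initialized and a second hash_seq_init / hash_seq_search loop scans the hashtable for fsync requests to process.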
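The cycle-counter bookkeeping described above can be sketched on a plain array instead of a hash table. The types and functions (`ToyEntry`, `ToySyncState`, `toy_sync_begin`, `toy_sync_scan`) are hypothetical, and the 16-bit width of `CycleCtr` is an assumption of the sketch. Clearing `in_progress` at the end of a successful scan is also assumed; the text only says the flag exists to detect a scan that did not reach its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t CycleCtr;  /* assumed narrow, so wraparound is conceivable */

typedef struct {
    CycleCtr cycle_ctr;     /* counter value when the request was entered */
    int pending;            /* 1 while the request still needs an fsync */
} ToyEntry;

typedef struct {
    CycleCtr cycle_ctr;     /* stands in for mdsync_cycle_ctr */
    int in_progress;        /* stands in for mdsync_in_progress */
} ToySyncState;

/* Start a sync pass: repair stale stamps after a failure, then advance. */
static void
toy_sync_begin(ToySyncState *st, ToyEntry *tab, size_t n)
{
    if (st->in_progress) {
        /* Prior try failed, so bring every stamp up to the current value. */
        for (size_t i = 0; i < n; i++)
            tab[i].cycle_ctr = st->cycle_ctr;
    }
    st->cycle_ctr++;        /* entries added from now on look "new" */
    st->in_progress = 1;
}

/* Process all old entries; returns how many were processed. */
static int
toy_sync_scan(ToySyncState *st, ToyEntry *tab, size_t n)
{
    int processed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tab[i].pending)
            continue;
        if (tab[i].cycle_ctr == st->cycle_ctr)
            continue;       /* new entry: leave it for the next pass */
        /* Any remaining entry must be exactly one cycle old. */
        assert((CycleCtr) (tab[i].cycle_ctr + 1) == st->cycle_ctr);
        tab[i].pending = 0;
        processed++;
    }
    st->in_progress = 0;    /* reached the end of the loop */
    return processed;
}
```

A request stamped after `toy_sync_begin` survives the scan untouched, while an entry left over from a failed pass is restamped first so that the one-cycle-old assertion still holds.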
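Inside the scan loop of mdsync:

If the entry is new then don't process it this time; it might contain multiple fsync-request bits, but they are all new. Note that "continue" bypasses the hash-remove call at the bottom of the loop. The test is whether entry->cycle_ctr equals the current counter. Otherwise, assert that we haven't missed it: (CycleCtr) (entry->cycle_ctr + 1) must equal the current counter.

Scan over the forks and segments represented by the entry. The bitmap manipulations are slightly tricky, because we can call AbsorbFsyncRequests() inside the loop and that could result in bms_add_member() modifying and even re-palloc'ing the bitmapsets. This is okay because we unlink each bitmapset from the hashtable entry before scanning it. That means that any incoming fsync requests will be processed now if they reach the table before we begin to scan their fork.

For each forknum from 0 to MAX_FORKNUM, the code takes entry->requests[forknum] into a local requests set, resets entry->requests[forknum] and entry->canceled[forknum], and then pulls segment numbers out of the set one at a time with bms_first_member(requests).

For each segment:

If fsync is off then we don't have to bother opening the file at all. (We delay checking until this point so that changing fsync on the fly behaves sensibly.)

If in the checkpointer, we want to absorb pending requests every so often to prevent overflow of the fsync request queue. It is unspecified whether newly-added entries will be visited by hash_seq_search, but we don't care since we don't need to process them anyway. The code decrements absorb_counter and, when it drops to 0 or below, absorbs the queue and resets the counter.

The fsync table could contain requests to fsync segments that have been deleted (unlinked) by the time we get to them. Rather than just hoping an ENOENT (or EACCES on Windows) error can be ignored, what we do on error is absorb pending requests and then retry. Since the unlink path queues a "cancel" message before actually unlinking, the fsync request is guaranteed to be marked canceled after the absorb if it really was this case. DROP DATABASE likewise has to tell us to forget fsync requests before it starts deletions.

The retry loop is for (failures = 0;; failures++) and exits at "break". Inside it:

Find or create an smgr hash entry for this relation. This may seem a bit unclean (md calling smgr?), but it's really the best solution. It ensures that the open file reference isn't permanently leaked if we get an error here. (You may say "but an unreferenced SMgrRelation is still a leak!" Not really, because the only case in which a checkpoint is done by a process that isn't about to shut down is in the checkpointer, and it will periodically do smgrcloseall(). This fact justifies our not closing the reln in the success path either, which is a good thing since in non-checkpointer cases we couldn't safely do that.) The relation is opened with smgropen on entry->rnode.

Attempt to open and fsync the target segment: seg is obtained from _mdfd_getseg for the block at segno times the segment size. If seg is not NULL and the fsync of seg->mdfd_vfd succeeds, the code updates the statistics about sync timing: it computes sync_diff with INSTR_TIME_SUBTRACT, converts it to elapsed, raises longest if elapsed is larger, adds elapsed to total_elapsed, and logs at DEBUG1 a "checkpoint sync: number=%d file=%s time=%.3f" message with (double) elapsed / 1000 as the time. It then breaks out of the retry loop.

On failure the code goes on to compute the file name for use in the error message and to save errno. The source listing is cut off at this point.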
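The absorb-then-retry idea in the comment above can be sketched without any file I/O. The listing is cut off before the failure branch, so the handling here is an assumption: the sketch retries once after a "file is gone" error and gives up on any other error or on a second failure. `ToyRequest`, `toy_absorb` and `toy_sync_with_retry` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct {
    int canceled;       /* set once a "cancel" message has been absorbed */
    int sync_errno;     /* errno the fsync attempt fails with, 0 = success */
    int cancel_queued;  /* a cancel message is waiting in the request queue */
} ToyRequest;

typedef enum { SYNC_DONE, SYNC_SKIPPED_CANCELED, SYNC_FAILED } SyncOutcome;

/* Stand-in for absorbing the request queue. */
static void
toy_absorb(ToyRequest *req)
{
    if (req->cancel_queued) {
        req->canceled = 1;
        req->cancel_queued = 0;
    }
}

static SyncOutcome
toy_sync_with_retry(ToyRequest *req)
{
    assert(req != NULL);

    for (int failures = 0;; failures++) {
        if (req->sync_errno == 0)
            return SYNC_DONE;

        /* Assumed policy: only a vanished file on the first try is retried. */
        if (req->sync_errno != ENOENT || failures > 0)
            return SYNC_FAILED;

        /* The file may have been dropped; pick up any queued cancel. */
        toy_absorb(req);
        if (req->canceled)
            return SYNC_SKIPPED_CANCELED;
    }
}
```

A request whose file was unlinked with a cancel already queued ends as `SYNC_SKIPPED_CANCELED`, while a missing file with no cancel is reported as a failure on the second attempt.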