斯坦福大學(xué)自然語言處理14章課件_第1頁
斯坦福大學(xué)自然語言處理14章課件_第2頁
斯坦福大學(xué)自然語言處理14章課件_第3頁
斯坦福大學(xué)自然語言處理14章課件_第4頁
斯坦福大學(xué)自然語言處理14章課件_第5頁
已閱讀5頁,還剩52頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)

文檔簡介

CFGsandPCFGs(Probabilistic)Context-FreeGrammarsAphrasestructuregrammarSNPVPVPVNPVP

VNPPPNPNPNPNP

NPPPNPNNPePPPNPpeoplefishtankspeoplefishwithrodsNpeopleNfishNtanksN

rodsVpeopleVfishVtanksPwithPhrasestructuregrammars

=context-freegrammars(CFGs)G=(T,N,S,R)TisasetofterminalsymbolsNisasetofnonterminalsymbolsSisthestartsymbol(S∈N)Risasetofrules/productionsoftheformXX∈Nand∈(N∪T)*AgrammarGgeneratesalanguageL.PhrasestructuregrammarsinNLPG=(T,C,N,S,L,R)TisasetofterminalsymbolsCisasetofpreterminalsymbolsNisasetofnonterminalsymbolsSisthestartsymbol(S∈N)Listhelexicon,asetofitemsoftheformXxX∈Pandx∈TRisthegrammar,asetofitemsoftheformXX∈Nand∈(N∪C)*Byusualconvention,Sisthestartsymbol,butinstatisticalNLP,weusuallyhaveanextranodeatthetop(ROOT,TOP)Weusuallywriteeforanemptysequence,ratherthannothingAphrasestructuregrammarSNPVPVPVNPVP

VNPPPNPNPNPNPNPPPNPNNPePPPNPpeoplefishtankspeoplefishwithrodsN

people

N

fish

N

tanks

N

rods

V

people

V

fish

V

tanks

P

with

Probabilistic–orstochastic–context-freegrammars(PCFGs)G=(T,N,S,R,P)TisasetofterminalsymbolsNisasetofnonterminalsymbolsSisthestartsymbol(S∈N)Risasetofrules/productionsoftheformXPisaprobabilityfunctionP:R[0,1]

AgrammarGgeneratesalanguagemodelL.APCFGSNPVP 1.0VPVNP 0.6VP

VNPPP 0.4NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people

0.5N

fish 0.2N

tanks 0.2N

rods 0.1V

people 0.1V

fish 0.6V

tanks 0.3P

with 1.0[WithemptyNPremovedsolessambiguous]TheprobabilityoftreesandstringsP(t)–Theprobabilityofatreet

istheproductoftheprobabilitiesoftherulesusedtogenerateit.P(s)–Theprobabilityofthestrings

isthesumoftheprobabilitiesofthetreeswhichhavethatstringastheiryieldP(s)=Σj

P(s,t)wheretisaparseofs

=Σj

P(t)TreeandStringProbabilities s

=

peoplefishtankswithrodsP(t1)=1.0×0.7×0.4×0.5×0.6×0.7×1.0×0.2×1.0×0.7×0.1

=0.0008232P(t2)=1.0×0.7×0.6×0.5×0.6×0.2

×0.7×1.0×0.2×1.0×0.7×0.1=0.00024696

P(s)=P(t1)+P(t2) =0.0008232+0.00024696=0.00107016

VerbattachNounattachCFGs

andPCFGs(Probabilistic)Context-FreeGrammarsGrammar

TransformsRestrictingthegrammarformforefficientparsingChomskyNormalFormAllrulesareoftheformXYZorXwX,Y,Z∈Nandw∈TAtransformationtothisformdoesn’tchangetheweakgenerativecapacityofaCFGThatis,itrecognizesthesamelanguageButmaybewithdifferenttreesEmptiesandunariesareremovedrecursivelyn-aryrulesaredividedbyintroducingnewnonterminals(n>2)AphrasestructuregrammarSNPVPVPVNPVP

VNPPPNPNPNPNPNPPPNPNNPePPPNPN

people

N

fish

N

tanks

N

rods

V

people

V

fish

V

tanks

P

with

ChomskyNormalFormstepsSNPVPS

VPVPVNPVP

VVP

VNPPPVP

VPPNPNPNPNP

NPNPNPPPNP

PPNPNPPPNPPP

PN

people

N

fish

N

tanks

N

rods

V

people

V

fish

V

tanks

P

with

ChomskyNormalFormstepsSNPVPVPVNPS

VNPVP

VSVVP

VNPPPSVNPPPVP

VPPSVPPNPNPNPNP

NPNPNPPPNP

PPNPNPPPNPPP

PN

people

N

fish

N

tanks

N

rods

V

people

V

fish

V

tanks

P

with

ChomskyNormalFormstepsSNPVPVPVNPS

VNPVP

VVP

VNPPPSVNPPPVP

VPPSVPPNPNPNPNP

NPNPNPPPNP

PPNPNPPPNPPP

PN

people

N

fish

N

tanks

N

rods

V

people

S

people

V

fish

S

fish

V

tanks

S

tanks

P

with

ChomskyNormalFormstepsSNPVPVPVNPS

VNPVP

VNPPPSVNPPPVP

VPPSVPPNPNPNPNP

NPNPNPPPNP

PPNPNPPPNPPP

PN

people

N

fish

N

tanks

N

rods

V

people

S

people

VP

people

V

fish

S

fish

VP

fish

V

tanks

S

tanks

VP

tanks

P

with

ChomskyNormalFormstepsSNPVPVPVNPS

VNPVP

VNPPPSVNPPPVP

VPPSVPPNPNPNPNPNPPPNPPNPPPPNPNP

people

NP

fish

NP

tanks

NP

rods

V

people

S

people

VP

people

V

fish

S

fish

VP

fish

V

tanks

S

tanks

VP

tanks

P

with

PP

with

ChomskyNormalFormstepsSNPVPVPVNPS

VNPVP

V@VP_V@VP_V

NPPPSV@S_V@S_V

NPPPVP

VPPSVPPNPNPNPNPNPPPNPPNPPPPNPNP

people

NP

fish

NP

tanks

NP

rods

V

people

S

people

VP

people

V

fish

S

fish

VP

fish

V

tanks

S

tanks

VP

tanks

P

with

PP

with

AphrasestructuregrammarSNPVPVPVNPVP

VNPPPNPNPNPNPNPPPNPNNPePPPNPN

people

N

fish

N

tanks

N

rods

V

people

V

fish

V

tanks

P

with

ChomskyNormalFormstepsSNPVPVPVNPS

VNPVP

V@VP_V@VP_V

NPPPSV@S_V@S_V

NPPPVP

VPPSVPPNPNPNPNPNPPPNPPNPPPPNPNP

people

NP

fish

NP

tanks

NP

rods

V

people

S

people

VP

people

V

fish

S

fish

VP

fish

V

tanks

S

tanks

VP

tanks

P

with

PP

with

ChomskyNormalFormYoushouldthinkofthisasatransformationforefficientparsingWithsomeextrabook-keepinginsymbolnames,youcanevenreconstructthesametreeswithadetransformInpracticefullChomskyNormalFormisapainReconstructingn-ariesiseasyReconstructingunaries/emptiesistrickierBinarizationiscrucialforcubictimeCFGparsingTherestisn’tnecessary;itjustmakesthealgorithmscleanerandabitquickerROOTSNPVPNpeopleVNPPPPNProdswithtanksfishNNAnexample:beforebinarization…PNProdsNwithNPNpeopletanksfishNVPVNPPP@VP_VROOTSAfterbinarization…Treebank:emptiesandunariesROOTS-HLNNP-SUBJVPVB-NONE-eAtonePTBTreeROOTSNPVPVB-NONE-eAtoneNoFuncTagsROOTSVPVBAtoneNoEmptiesROOTSAtoneNoUnariesROOTVBAtoneHighLowUnaryrules:

alchemyinthelandoftreebanksSame-SpanReachabilityADJPADVPFRAGINTJNPPPPRNQPSSBARUCPVPWHNPTOPLSTCONJPWHADJPWHADVPWHPPNXNoEmptiesNACSBARQSINVRRCSQXPRTGrammar

TransformsRestrictingthegrammarformforefficientparsingCKYParsingExactpolynomialtimeparsingof(P)CFGsConstituencyParsing fish

peoplefishtanksRuleProbθi

SNPVP θ0NPNPNP θ1…Nfish θ42Npeople θ43V

fish θ44…PCFGNNVNVPNPNPSCocke-Kasami-Younger(CKY)

ConstituencyParsingfishpeoplefish

tanksViterbi(Max)ScorespeoplefishNP

0.35V 0.1N 0.5VP

0.06NP 0.14V 0.6N 0.2NP→NNNNS 0.13iNP=(0.13)(0.0023)(0.0014)=1.87×10-7NP→NNPNNS 0.056iNP=(0.056)(0.001)(0.0014)=7.84×10-8S1.87×10-7VPViterbi(Max)ScorespeoplefishNP

0.35V 0.1N 0.5VP

0.06NP 0.14V 0.6N 0.2SNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0ExtendedCKYparsingUnariescanbeincorporatedintothealgorithmMessy,butdoesn’tincreasealgorithmiccomplexityEmptiescanbeincorporatedUsefencepostsDoesn’tincreasecomplexity;essentiallylikeunariesBinarizationisvitalWithoutbinarization,youdon’tgetparsingcubicinthelengthofthesentenceandinthenumberofnonterminalsinthegrammarBinarizationmaybeanexplicittransformationorimplicitinhowtheparserworks(Early-styledottedrules),butit’salwaysthere.functionCKY(words,grammar)returns[most_probable_parse,prob]score=newdouble[#(words)+1][#(words)+1][#(nonterms)]back=newPair[#(words)+1][#(words)+1][#nonterms]]fori=0;i<#(words);i++forAinnontermsifA->words[i]ingrammarscore[i][i+1][A]=P(A->words[i])//handleunariesbooleanadded=truewhileaddedadded=falseforA,Binnontermsifscore[i][i+1][B]>0&&A->Bingrammarprob=P(A->B)*score[i][i+1][B]

ifprob>score[i][i+1][A]score[i][i+1][A]=probback[i][i+1][A]=Badded=trueTheCKYalgorithm(1960/1965)

…extendedtounariesforspan=2to#(words)forbegin=0to#(words)-spanend=begin+spanforsplit=begin+1toend-1forA,B,Cinnontermsprob=score[begin][split][B]*score[split][end][C]*P(A->BC)

ifprob>score[begin][end][A]score[begin]end][A]=probback[begin][end][A]=newTriple(split,B,C)

//handleunaries

booleanadded=true

whileadded

added=false

forA,Binnonterms

prob=P(A->B)*score[begin][end][B];

ifprob>score[begin][end][A]

score[begin][end][A]=prob

back[begin][end][A]=B

added=truereturnbuildTree(score,back)TheCKYalgorithm(1960/1965)

…extendedtounariesQuizQuestion!runsdownNNS 0.0023VB 0.001PP 0.2IN 0.0014NNS 0.0001PP→IN 0.002NP→NNSNNS 0.01NP→NNSNP 0.005NP→NNSPP 0.01VP→VBPP 0.045VP→VBNP 0.015?? ???? ??Whatconstituents(withwhatprobabilitycanyoumake?CKYParsingExactpolynomialtimeparsingof(P)CFGsCKYParsingAworkedexampleThegrammar:

Binary,noepsilons,SNPVP 0.9SVP 0.1VPVNP 0.5VP

V 0.1VP

V@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish 0.2N

tanks 0.2N

rods 0.1V

people 0.1V

fish 0.6V

tanks 0.3P

with 1.0score[0][1]score[1][2]score[2][3]score[3][4]score[0][2]score[1][3]score[2][4]score[0][3]score[1][4]score[0][4]012341234fishpeoplefishtanks012341234fishpeoplefishtanksSNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish

0.2N

tanks

0.2N

rods

0.1V

people 0.1V

fish

0.6V

tanks 0.3P

with

1.0fori=0;i<#(words);i++forAinnontermsifA->words[i]ingrammarscore[i][i+1][A]=P(A->words[i]);Nfish0.2Vfish0.6Npeople0.5Vpeople0.1Nfish0.2Vfish0.6N

tanks0.2V

tanks0.1012341234fishpeoplefishtanksSNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish

0.2N

tanks

0.2N

rods

0.1V

people 0.1V

fish

0.6V

tanks 0.3P

with

1.0//handleunariesbooleanadded=truewhileaddedadded=falseforA,Binnontermsifscore[i][i+1][B]>0&&A->Bingrammarprob=P(A->B)*score[i][i+1][B]if(prob>score[i][i+1][A])score[i][i+1][A]=probback[i][i+1][A]=Badded=trueNfish0.2Vfish0.6NP

N0.14VP

V0.06S

VP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006N

tanks0.2V

tanks0.1NPN0.14VPV0.03SVP0.003012341234fishpeoplefishtanksSNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish

0.2N

tanks

0.2N

rods

0.1V

people 0.1V

fish

0.6V

tanks 0.3P

with

1.0prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if(prob>score[begin][end][A])

score[begin]end][A]=prob

back[begin][end][A]=newTriple(split,B,C)Nfish0.2Vfish0.6NP

N0.14VP

V0.06S

VP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006N

tanks0.2V

tanks0.1NPN0.14VPV0.03SVP0.003NP

NPNP

0.0049VPVNP

0.105S

NPVP

0.00126NPNPNP

0.0049VPVNP

0.007SNPVP

0.0189NPNPNP

0.00196VPVNP

0.042SNPVP

0.00378012341234fishpeoplefishtanksSNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish

0.2N

tanks

0.2N

rods

0.1V

people 0.1V

fish

0.6V

tanks 0.3P

with

1.0//handleunariesbooleanadded=truewhileadded

added=false

forA,Binnonterms

prob=P(A->B)*score[begin][end][B];

ifprob>score[begin][end][A]

score[begin][end][A]=prob

back[begin][end][A]=B

added=trueNfish0.2Vfish0.6NP

N0.14VP

V0.06S

VP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006N

tanks0.2V

tanks0.1NPN0.14VPV0.03SVP0.003NP

NPNP

0.0049VPVNP

0.105S

VP

0.0105NPNPNP

0.0049VPVNP

0.007SNPVP

0.0189NPNPNP

0.00196VPVNP

0.042S

VP

0.0042012341234fishpeoplefishtanksSNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish

0.2N

tanks

0.2N

rods

0.1V

people 0.1V

fish

0.6V

tanks 0.3P

with

1.0forsplit=begin+1toend-1

forA,B,Cinnonterms

prob=score[begin][split][B]*score[split][end][C]*P(A->BC)

ifprob>score[begin][end][A]

score[begin]end][A]=prob

back[begin][end][A]=newTriple(split,B,C)Nfish0.2Vfish0.6NP

N0.14VP

V0.06S

VP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006N

tanks0.2V

tanks0.1NPN0.14VPV0.03SVP0.003NP

NPNP

0.0049VPVNP

0.105S

VP

0.0105NPNPNP

0.0049VPVNP

0.007SNPVP

0.0189NPNPNP

0.00196VPVNP

0.042S

VP

0.0042NPNPNP

0.0000686VPVNP

0.00147SNPVP

0.000882012341234fishpeoplefishtanksSNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish

0.2N

tanks

0.2N

rods

0.1V

people 0.1V

fish

0.6V

tanks 0.3P

with

1.0forsplit=begin+1toend-1

forA,B,Cinnonterms

prob=score[begin][split][B]*score[split][end][C]*P(A->BC)

ifprob>score[begin][end][A]

score[begin]end][A]=prob

back[begin][end][A]=newTriple(split,B,C)Nfish0.2Vfish0.6NP

N0.14VP

V0.06S

VP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006N

tanks0.2V

tanks0.1NPN0.14VPV0.03SVP0.003NP

NPNP

0.0049VPVNP

0.105S

VP

0.0105NPNPNP

0.0049VPVNP

0.007SNPVP

0.0189NPNPNP

0.00196VPVNP

0.042S

VP

0.0042NPNPNP

0.0000686VPVNP

0.00147SNPVP

0.000882NPNPNP0.0000686VPVNP

0.000098SNPVP

0.01323012341234fishpeoplefishtanksSNPVP 0.9SVP 0.1VPVNP 0.5VPV 0.1VPV@VP_V 0.3VPVPP 0.1@VP_VNPPP 1.0NPNPNP 0.1NPNPPP 0.2NPN 0.7PPPNP 1.0N

people 0.5N

fish

0.2N

tanks

0.2N

rods

0.1V

people 0.1V

fish

0.6V

tanks 0.3P

with

1.0forsplit=begin+1toend-1

forA,B,Cinnonterms

prob=score[begin][split][B]*score[split][end][C]*P(A->BC)

ifprob>score[begin][end][A]

score[begin]end][A]=prob

back[begin][end][A]=newTriple(split,B,C)Nfish0.2Vfish0.6NP

N0.14VP

V0.06S

VP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.0

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
  • 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論