
Dependency Parsing by Belief Propagation
David A. Smith (JHU → UMass Amherst)
Jason Eisner (Johns Hopkins University)

Outline
- Edge-factored parsing (old)
  - Dependency parses
  - Scoring the competing parses: edge features
  - Finding the best parse
- Higher-order parsing (new!)
  - Throwing in more features: graphical models
  - Finding the best parse: belief propagation
  - Experiments
- Conclusions

Word Dependency Parsing
(slide adapted from Yuji Matsumoto)
- Raw sentence: He reckons the current account deficit will narrow to only 1.8 billion in September.
- POS-tagged sentence: He/PRP reckons/VBZ the/DT current/JJ account/NN deficit/NN will/MD narrow/VB to/TO only/RB 1.8/CD billion/CD in/IN September/NNP ./.
- Word dependency parsed sentence: the tagged words are linked by labeled dependencies such as SUBJ, ROOT, S-COMP, SPEC, MOD, and COMP.

What does parsing have to do with belief propagation?
(The phrase "loopy belief propagation" is itself shown as a small dependency parse over the words "loopy", "belief", and "propagation".)

Great Ideas in NLP: Log-Linear Models
(Berger, della Pietra & della Pietra 1996; Darroch & Ratcliff 1972)
- In the beginning, we used generative models:
  p(A) * p(B | A) * p(C | A, B) * p(D | A, B, C) * ...
  - Each choice depends on a limited part of the history.
  - But which dependencies to allow? What if they're all worthwhile?
  - p(D | A, B, C)? Or p(D | A, B) * p(C | A, B, D)?
- Which dependencies to allow (given limited training data)? Solution: log-linear (max-entropy) modeling — throw them all in!
  (1/Z) * Φ(A) * Φ(B,A) * Φ(C,A) * Φ(C,B) * Φ(D,A,B) * Φ(D,B,C) * Φ(D,A,C) * ...
  - Features may interact in arbitrary ways.
  - Iterative scaling keeps adjusting the feature weights until the model agrees with the training data.
- Log-linear models are great for n-way classification, also good for predicting sequences, and also good for dependency parsing.
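A toy illustration of the (1/Z) * ∏ Φ(...) idea above, with made-up binary variables, features, and weights (none of this comes from the slides): each factor is the exponential of a weighted feature, and Z is obtained by summing the product of factors over all assignments.

```python
import math
from itertools import product

# Toy log-linear model over four binary variables A, B, C, D.
# Features and weights are invented purely for illustration.
WEIGHTS = {
    "A=1": 0.5,
    "B agrees with A": 1.0,
    "C agrees with A": 0.3,
    "C agrees with B": 0.7,
    "D = A and B": 1.2,
}

def factor_product(a, b, c, d):
    """Unnormalized score: product of exp(weight * feature) over all features."""
    feats = {
        "A=1": a,
        "B agrees with A": int(a == b),
        "C agrees with A": int(a == c),
        "C agrees with B": int(b == c),
        "D = A and B": int(d == (a and b)),
    }
    return math.exp(sum(WEIGHTS[k] * v for k, v in feats.items()))

# Normalizer Z: sum of the factor product over all 2^4 assignments.
Z = sum(factor_product(*x) for x in product([0, 1], repeat=4))

def p(a, b, c, d):
    return factor_product(a, b, c, d) / Z

print(p(1, 1, 1, 1))                                   # one joint probability
print(sum(p(*x) for x in product([0, 1], repeat=4)))   # sanity check: 1.0
```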

How about structured outputs?
- Log-linear models are also good for predicting sequences ("find preferred tags") — but to allow fast dynamic programming, only use n-gram features.
- And also good for dependency parsing ("find preferred links") — but to allow fast dynamic programming or MST parsing, only use single-edge features.

Edge-Factored Parsers (McDonald et al. 2005)
- Czech example: Byl jasný studený dubnový den a hodiny odbíjely třináctou.
  ("It was a bright cold day in April and the clocks were striking thirteen.")
  POS tags: V A A A N J N V C
- Is this a good edge? (jasný → den)
  Yes, lots of green:
  - jasný → den ("bright day")
  - jasný → N ("bright NOUN")
  - A → N
  - A → N preceding a conjunction
- How about this competing edge? (jasný → hodiny)
  Not as good, lots of red; its features include:
  - jasný → hodiny ("bright clocks") ... undertrained ...
  - jasn → hodi ("bright clock", stems only; the stemmed sentence is byl jasn stud dubn den a hodi odb tin)
  - A-plural → N-singular
  - A → N where N follows a conjunction
- Which edge is better, "bright day" or "bright clocks"?

Which edge is better?
- Score of an edge e = θ · features(e), where θ is our current weight vector.
- Standard algorithms then find the valid parse with maximum total score.
- But validity imposes hard constraints: can't have both (one parent per word); can't have both (no crossing links); can't have all three (no cycles).
- Thus, an edge may lose (or win) because of a consensus of other edges.
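A minimal sketch of that scoring rule in code. The feature templates and weights below are illustrative stand-ins, not the McDonald et al. (2005) feature set.

```python
def edge_features(sentence, tags, head, child):
    """Binary features for a candidate dependency edge head -> child."""
    return {
        f"word:{sentence[head]}->{sentence[child]}": 1,       # e.g. jasný -> den
        f"word->tag:{sentence[head]}->{tags[child]}": 1,      # e.g. jasný -> N
        f"tag:{tags[head]}->{tags[child]}": 1,                # e.g. A -> N
        f"direction:{'right' if head < child else 'left'}": 1,
    }

def edge_score(weights, sentence, tags, head, child):
    """Score of an edge e = theta . features(e)."""
    feats = edge_features(sentence, tags, head, child)
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

# Example: a made-up weight vector that likes A -> N edges and "bright day".
weights = {"tag:A->N": 2.0, "word:jasný->den": 1.5, "word:jasný->hodiny": -0.5}
sentence = ["Byl", "jasný", "studený", "dubnový", "den"]
tags = ["V", "A", "A", "A", "N"]
print(edge_score(weights, sentence, tags, head=1, child=4))   # jasný -> den: 3.5
```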

Finding the Highest-Scoring Parse
- Example: The cat in the hat wore a stovepipe. ROOT
- Convert to a context-free grammar (CFG), then use dynamic programming; each subtree is a linguistic constituent (here, a noun phrase). (Let's vertically stretch this graph drawing.)
- The CKY algorithm for CFG parsing is O(n^3). Unfortunately, it is O(n^5) in this case: to score the "cat → wore" link, it is not enough to know the constituent is an NP; we must know it is rooted at "cat", so the nonterminal set expands by a factor of O(n): NP_the, NP_cat, NP_hat, ... So CKY's "grammar constant" is no longer constant.
- Solution: use a different decomposition (Eisner 1996). Back to O(n^3).

Spans vs. Constituents
- Two kinds of substring:
  - A constituent of the tree links to the rest only through its headword (root).
  - A span of the tree links to the rest only through its endwords.
- Decomposing a tree into spans: "The cat in the hat wore a stovepipe. ROOT" splits into the spans "The cat", "cat in the hat wore", and "wore a stovepipe. ROOT"; then "cat in the hat wore" = "cat in" + "in the hat wore", and "in the hat wore" = "in the hat" + "hat wore".
- Can play the usual tricks for dynamic programming parsing:
  - Further refining the constituents or spans: allow the probability model to keep track of even more internal information.
  - A*, best-first, coarse-to-fine.
  - Training by EM etc. requires "outside" probabilities of constituents, spans, or links.
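A minimal sketch of the span-based O(n^3) decoding idea, in the usual complete/incomplete-span formulation. It returns only the best score of a projective tree under an edge-factored model (no backpointers, no labels), and the example scores are made up.

```python
def eisner_best_score(score):
    """Best projective parse score under an edge-factored model, via the
    Eisner (1996) span decomposition: O(n^3) time, O(n^2) space.
    score[h][m] = score of edge h -> m; word 0 is the artificial ROOT
    (which, in this simple version, may take several children)."""
    n = len(score)
    NEG = float("-inf")
    # complete spans with the head at the left end (CR) or right end (CL),
    # incomplete spans with a pending edge s -> t (IR) or t -> s (IL)
    CR = [[0.0 if s == t else NEG for t in range(n)] for s in range(n)]
    CL = [[0.0 if s == t else NEG for t in range(n)] for s in range(n)]
    IR = [[NEG] * n for _ in range(n)]
    IL = [[NEG] * n for _ in range(n)]
    for length in range(1, n):
        for s in range(n - length):
            t = s + length
            # attach step: combine two complete half-spans and add one edge
            best_split = max(CR[s][r] + CL[r + 1][t] for r in range(s, t))
            IR[s][t] = best_split + score[s][t]
            IL[s][t] = best_split + score[t][s]
            # complete step: absorb a finished span under the pending head
            CR[s][t] = max(IR[s][r] + CR[r][t] for r in range(s + 1, t + 1))
            CL[s][t] = max(CL[s][r] + IL[r][t] for r in range(s, t))
    return CR[0][n - 1]

# Example: ROOT + "find preferred links"; made-up edge scores.
score = [[0, 1, 8, 2],
         [0, 0, 3, 1],
         [0, 5, 0, 6],
         [0, 2, 4, 0]]
print(eisner_best_score(score))   # 19.0 for these made-up scores
```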

Hard Constraints on Valid Trees
- Score of an edge e = θ · features(e); standard algorithms find the valid parse with maximum total score.
- Validity means: can't have both (one parent per word); can't have both (no crossing links); can't have all three (no cycles).
- Thus, an edge may lose (or win) because of a consensus of other edges.

Non-Projective Parses
- "No crossing links" is the "projectivity" restriction. Do we really want it?
- English: "I 'll give a talk tomorrow on bootstrapping" — the subtree rooted at "talk" is a discontiguous noun phrase, so the correct parse has a crossing link. Non-projectivity is occasional in English.
- Latin: "ista meam norit gloria canitiem" (that-NOM my-ACC may-know glory-NOM going-gray-ACC), "That glory may know my going-gray" (i.e., it shall last till I go gray). Non-projectivity is frequent in Latin, etc.

Finding the Highest-Scoring Non-Projective Tree
- Consider the sentence "John saw Mary". (Left figure: a weighted graph of candidate edges among root, John, saw, and Mary; right figure: its maximum-weight spanning tree.)
- The Chu-Liu-Edmonds algorithm finds the maximum-weight spanning tree, which may be non-projective, in time O(n^2): every node selects its best parent; if cycles form, contract them and repeat.
  (slide thanks to Dragomir Radev)
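Cycle contraction is the heart of Chu-Liu-Edmonds; the sketch below is not that algorithm but a brute-force reference that finds the same maximum-weight tree for tiny sentences. The edge weights are hypothetical stand-ins for the figure's.

```python
from itertools import product

def best_tree_brute_force(score):
    """Max-weight dependency tree by exhaustive search (tiny n only).
    score[h][m] = weight of edge h -> m; node 0 is the root and has no parent."""
    n = len(score)
    best, best_parents = float("-inf"), None
    for parents in product(range(n), repeat=n - 1):   # each non-root picks a parent
        heads = (None,) + parents                     # heads[m] = parent of word m
        # reject assignments with cycles: every word must reach the root
        def reaches_root(m, seen=()):
            if m == 0:
                return True
            if m in seen:
                return False
            return reaches_root(heads[m], seen + (m,))
        if not all(reaches_root(m) for m in range(1, n)):
            continue
        total = sum(score[heads[m]][m] for m in range(1, n))
        if total > best:
            best, best_parents = total, heads
    return best, best_parents

# Hypothetical weights for root, "John", "saw", "Mary" (indices 0-3).
score = [[0, 9, 10, 9],
         [0, 0, 20, 3],
         [0, 30, 0, 30],
         [0, 11, 0, 0]]
print(best_tree_brute_force(score))   # -> (70, (None, 2, 0, 2))
```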

Summing Over All Non-Projective Trees
- How about the total weight Z of all trees? How about outside probabilities, or gradients?
- These can be found in time O(n^3) by matrix determinants and inverses (Smith & Smith, 2007).

Graph Theory to the Rescue!
- Tutte's Matrix-Tree Theorem (1948): the determinant of the Kirchhoff (aka Laplacian) adjacency matrix of a directed graph G, with row and column r struck out, equals the sum of scores of all directed spanning trees of G rooted at node r.
- Exactly the Z we need! O(n^3) time!

Building the Kirchhoff (Laplacian) Matrix
- Negate the edge scores.
- Sum columns (children) to fill the diagonal.
- Strike out the root row and column.
- Take the determinant.
- N.B.: This allows multiple children of the root; but see Koo et al. (2007).

Why Should This Work?
- By analogy with Chu-Liu-Edmonds: every node selects its best parent; if cycles form, contract and recur.
- The theorem is clear for a 1×1 matrix; use induction.
- It is classical for the undirected case; the directed case needs special handling of the root.
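A small numpy sketch of exactly that recipe (negate, sum columns onto the diagonal, strike the root row and column, take the determinant), checked against brute-force enumeration over parent assignments; the edge scores are arbitrary random numbers.

```python
import numpy as np
from itertools import product

def partition_function_matrix_tree(score, root=0):
    """Z = sum over all directed spanning trees rooted at `root` of the product
    of their edge scores, via the Matrix-Tree Theorem (O(n^3))."""
    n = score.shape[0]
    L = -score.copy()                          # negate edge scores
    np.fill_diagonal(L, 0.0)
    np.fill_diagonal(L, -L.sum(axis=0))        # diagonal = sum of scores into each child
    keep = [i for i in range(n) if i != root]  # strike root row and column
    return np.linalg.det(L[np.ix_(keep, keep)])

def partition_function_brute_force(score, root=0):
    """Same quantity by explicit enumeration of parent assignments (tiny n only)."""
    n = score.shape[0]
    non_root = [i for i in range(n) if i != root]
    total = 0.0
    for parents in product(range(n), repeat=n - 1):
        heads = dict(zip(non_root, parents))
        def reaches_root(m, seen=frozenset()):
            if m == root:
                return True
            if m in seen or heads[m] == m:
                return False
            return reaches_root(heads[m], seen | {m})
        if all(reaches_root(m) for m in heads):
            total += np.prod([score[heads[m], m] for m in heads])
    return total

rng = np.random.default_rng(0)
S = rng.uniform(0.1, 1.0, size=(4, 4))         # arbitrary positive edge scores
np.fill_diagonal(S, 0.0)
print(partition_function_matrix_tree(S))       # the two numbers should agree
print(partition_function_brute_force(S))
```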

Exactly Finding the Best Parse
- With arbitrary features, runtime blows up.
- Projective parsing: O(n^3) by dynamic programming. Non-projective: O(n^2) by maximum spanning tree. But to allow fast dynamic programming or MST parsing, we may only use single-edge features ("find preferred links").
- Richer features raise the cost:
  - grandparents: O(n^4)
  - grandparents + sibling bigrams: O(n^5)
  - POS trigrams: O(n^3 g^6)
  - sibling pairs (non-adjacent): O(2^n)
  - NP-hard: any of the above features, soft penalties for crossing links, and pretty much anything else!

Let's Reclaim Our Freedom (Again!)
- Output probability is a product of local factors. Throw in any factors we want! (log-linear model)
- But then how could we find the best parse?
  - Integer linear programming (Riedel et al., 2006) — but it doesn't give us probabilities when training or parsing.
  - MCMC — slow to mix? High rejection rate because of the hard TREE constraint?
  - Greedy hill-climbing (McDonald & Pereira 2006).
  - None of these exploit the tree structure of parses as the first-order methods do.

This Paper in a Nutshell
- Output probability is a product of local factors:
  (1/Z) * Φ(A) * Φ(B,A) * Φ(C,A) * Φ(C,B) * Φ(D,A,B) * Φ(D,B,C) * Φ(D,A,C) * ...
- Let's reclaim our freedom (again!): throw in any factors we want (log-linear model), and let the local factors negotiate via "belief propagation".
  - Links (and tags) reinforce or suppress one another.
  - Each iteration takes total time O(n^2) or O(n^3).
  - It converges to a pretty good (but approximate) global parse.
  - Certain global factors are OK too: each global factor can be handled fast via some traditional parsing algorithm (e.g., inside-outside).
- Training with many features vs. decoding with many features:
  - iterative scaling | belief propagation
  - each weight in turn is influenced by the others | each variable in turn is influenced by the others
  - iterate to achieve globally optimal weights | iterate to achieve locally consistent beliefs
  - to train the distribution over trees, use dynamic programming to compute the normalizer Z | to decode the distribution over trees, use dynamic programming to compute the messages
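Belief propagation itself is developed later in the talk; as a reminder of what sum-product message passing computes, here is a minimal sketch on a toy pairwise model with three binary variables (not the parsing factor graph from the paper). On a tree-structured graph like this chain the beliefs are exact marginals; on loopy graphs the same updates give the approximate beliefs the paper relies on.

```python
import itertools
import numpy as np

unary = {                       # psi_i(x_i) for x_i in {0, 1}; toy values
    0: np.array([1.0, 2.0]),
    1: np.array([1.0, 1.0]),
    2: np.array([3.0, 1.0]),
}
pairwise = {                    # psi_ij(x_i, x_j), indexed by one edge direction
    (0, 1): np.array([[2.0, 1.0], [1.0, 2.0]]),
    (1, 2): np.array([[1.0, 3.0], [3.0, 1.0]]),
}
edges = list(pairwise)

def neighbors(i):
    return [j for (a, b) in edges for j in (a, b) if i in (a, b) and j != i]

# messages msg[(i, j)] = message from variable i to variable j, initialized uniform
msg = {(i, j): np.ones(2) for (i, j) in edges}
msg.update({(j, i): np.ones(2) for (i, j) in edges})

for _ in range(20):             # synchronous ("flooding") updates
    new = {}
    for (i, j) in msg:
        psi = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
        # product of the unary factor and all incoming messages except the one from j
        incoming = unary[i].copy()
        for k in neighbors(i):
            if k != j:
                incoming *= msg[(k, i)]
        m = psi.T @ incoming                  # sum out x_i
        new[(i, j)] = m / m.sum()             # normalize for numerical stability
    msg = new

beliefs = {}
for i in unary:
    b = unary[i].copy()
    for k in neighbors(i):
        b *= msg[(k, i)]
    beliefs[i] = b / b.sum()

# sanity check against brute-force marginals of the same model
joint = np.zeros((2, 2, 2))
for x in itertools.product([0, 1], repeat=3):
    p = np.prod([unary[i][x[i]] for i in range(3)])
    p *= pairwise[(0, 1)][x[0], x[1]] * pairwise[(1, 2)][x[1], x[2]]
    joint[x] = p
joint /= joint.sum()
print(beliefs[1], joint.sum(axis=(0, 2)))     # should match on this tree
```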

First, a Familiar Example: a Conditional Random Field (CRF) for POS Tagging
Local factors in a graphical model:
- Observed input sentence (shaded): "find preferred tags".
- A possible tagging, i.e., an assignment to the remaining variables: v v v. Another possible tagging: v a n.
- A binary factor measures the compatibility of two adjacent tags:
        v   n   a
    v   0   2   1
    n   2   1   0
    a   0   3   1
  The model reuses the same parameters at each position, so the same table sits between every pair of adjacent tags.
- A "unary" factor evaluates each single tag; its values depend on the corresponding word (and could be made to depend on the entire observed sentence). For "tags": v 0.2, n 0.2, a 0 (it can't be an adjective).
- There is a different unary factor at each position: "find" gets v 0.3, n 0.02, a 0; "preferred" gets v 0.3, n 0, a 0.1; "tags" gets v 0.2, n 0.2, a 0.

- p(v a n) is proportional to the product of all the factors' values on the tagging "v a n":
  p(v a n) ∝ 1 * 3 * 0.3 * 0.1 * 0.2
  (the two binary factors on the adjacent tag pairs, times the three unary factors for "find", "preferred", and "tags").
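The same product can be spelled out in a few lines. The factor tables below are copied from the slide; the script recovers the 1 * 3 * 0.3 * 0.1 * 0.2 product and then normalizes over all 3^3 = 27 taggings.

```python
from itertools import product

words = ["find", "preferred", "tags"]
tags = ["v", "n", "a"]

# Binary factor between adjacent tags (rows: left tag, columns: right tag).
binary = {
    "v": {"v": 0, "n": 2, "a": 1},
    "n": {"v": 2, "n": 1, "a": 0},
    "a": {"v": 0, "n": 3, "a": 1},
}
# A different unary factor at each position (values depend on the word).
unary = {
    "find":      {"v": 0.3, "n": 0.02, "a": 0},
    "preferred": {"v": 0.3, "n": 0,    "a": 0.1},
    "tags":      {"v": 0.2, "n": 0.2,  "a": 0},
}

def unnormalized(tagging):
    score = 1.0
    for word, tag in zip(words, tagging):
        score *= unary[word][tag]
    for left, right in zip(tagging, tagging[1:]):
        score *= binary[left][right]
    return score

print(unnormalized(("v", "a", "n")))           # 1 * 3 * 0.3 * 0.1 * 0.2 = 0.018
Z = sum(unnormalized(t) for t in product(tags, repeat=3))
print(unnormalized(("v", "a", "n")) / Z)       # p(v a n)
```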

Now Let's Do Dependency Parsing!
- O(n^2) boolean variables for the possible links of the sentence "find preferred links".
- A possible parse is encoded as an assignment to these variables (e.g., t f f t f f); a different assignment encodes another possible parse (e.g., f f t f t f).
- Some assignments are illegal parses: one where the chosen links contain a cycle, and another where a word ends up with multiple parents.

Local Factors for Parsing
- So what factors shall we multiply to define the parse probability?
- Unary factors evaluate each link in isolation, e.g. t 2 / f 1 for one link, and t 1 / f 2, t 1 / f 2, t 1 / f 6, t 1 / f 3, t 1 / f 8 for the others.
  - As before, the goodness of a link can depend on the entire observed input context.
  - Some other links aren't as good given this input sentence.
- But what if the best assignment isn't a tree?

Global Factors for Parsing
- So what factors shall we multiply to define the parse probability?
  - Unary factors to evaluate each link in isolation.
  - A global TREE factor to require that the links form a legal tree. This is a "hard" constraint: the factor is either 0 or 1. Written as a table over the 2^6 = 64 joint assignments of the six link variables (f f f f f f → 0, f f f f f t → 0, f f f f t f → 0, ..., f f t f f t → 1, ..., t t t t t t → 0), it is 1 exactly on the assignments that are legal trees.
  - Optionally, the TREE factor can also require the tree to be projective (no crossing links).
- So far, this is equivalent to edge-factored parsing (McDonald et al. 2005). Note: McDonald et al. (2005) don't loop through this table to consider exponentially many trees one at a time; they use combinatorial algorithms — so should we!
- Second-order effects: factors on two variables, e.g. a grandparent factor.
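A brute-force sketch of this factor-graph definition for the toy three-word sentence, assuming the six link variables are the ordered pairs of words (so the root is whichever word ends up with no parent): multiply the unary link factors by a 0/1 TREE factor and enumerate all 64 assignments — exactly what the combinatorial algorithms let us avoid. The unary values are made up in the spirit of the slide's t/f tables.

```python
from itertools import product

words = ["find", "preferred", "links"]
pairs = [(h, c) for h in range(3) for c in range(3) if h != c]   # 6 possible links

# Unary factor per link: (value if the link is on, value if it is off); illustrative.
unary = {(0, 1): (2, 1), (0, 2): (1, 2), (1, 0): (1, 2),
         (1, 2): (1, 3), (2, 0): (1, 6), (2, 1): (1, 8)}

def tree_factor(assignment):
    """Hard 0/1 factor: 1 iff the present links form a dependency tree over the
    three words (every word but one root has exactly one parent, no cycles)."""
    links = [p for p, on in zip(pairs, assignment) if on]
    parents = {}
    for h, c in links:
        if c in parents:                 # multiple parents
            return 0
        parents[c] = h
    if len([w for w in range(3) if w not in parents]) != 1:
        return 0                         # must be exactly one root word
    for w in range(3):                   # acyclicity: every word reaches the root
        seen, cur = set(), w
        while cur in parents:
            if cur in seen:
                return 0
            seen.add(cur)
            cur = parents[cur]
    return 1

def score(assignment):
    s = tree_factor(assignment)
    for p, on in zip(pairs, assignment):
        s *= unary[p][0] if on else unary[p][1]
    return s

assignments = list(product([False, True], repeat=6))   # all 64 table entries
Z = sum(score(a) for a in assignments)
best = max(assignments, key=score)
print(Z, [p for p, on in zip(pairs, best) if on])      # normalizer and best tree
```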
