Dependency Parsing by Belief Propagation

David A. Smith (JHU / UMass Amherst)
Jason Eisner (Johns Hopkins University)

Outline
- Edge-factored parsing (old)
  - Dependency parses
  - Scoring the competing parses: edge features
  - Finding the best parse
- Higher-order parsing (new!)
  - Throwing in more features: graphical models
  - Finding the best parse: belief propagation
- Experiments
- Conclusions
Word Dependency Parsing

Raw sentence:
  He reckons the current account deficit will narrow to only 1.8 billion in September.

Part-of-speech tagging:
  He/PRP reckons/VBZ the/DT current/JJ account/NN deficit/NN will/MD narrow/VB
  to/TO only/RB 1.8/CD billion/CD in/IN September/NNP ./.

Word dependency parsing then attaches each word to its head, with labels such
as SUBJ, ROOT, S-COMP, SPEC, MOD, and COMP.
(slide adapted from Yuji Matsumoto)

So what does parsing have to do with belief propagation? The key phrase is
"loopy belief propagation", and the talk takes it apart piece by piece:
belief, loopy, propagation.
Great Ideas in NLP: Log-Linear Models
(Berger, della Pietra & della Pietra 1996; Darroch & Ratcliff 1972)

In the beginning, we used generative models:

  p(A) * p(B | A) * p(C | A, B) * p(D | A, B, C) * ...

Each choice depends on a limited part of the history. But which dependencies
should we allow? What if they are all worthwhile? p(D | A, B, C)? Or
p(D | A, B) * p(C | A, B, D)?

Solution: log-linear (max-entropy) modeling.

  (1/Z) * φ(A) * φ(B, A) * φ(C, A) * φ(C, B) * φ(D, A, B) * φ(D, B, C) * φ(D, A, C) * ...

Throw them all in! Features may interact in arbitrary ways. Iterative scaling
keeps adjusting the feature weights until the model agrees with the training
data.

How about structured outputs?
- Log-linear models are great for n-way classification.
- They are also good for predicting sequences (find preferred tags), but to
  allow fast dynamic programming, we only use n-gram features.
- They are also good for dependency parsing (find preferred links), but to
  allow fast dynamic programming or MST parsing, we only use single-edge
  features.
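To make the (1/Z) * φ(...) recipe concrete, here is a minimal Python sketch;
the four binary variables and all of the potential values are made up for
illustration.

    from itertools import product

    # Four binary variables A, B, C, D; each factor inspects a subset of them.
    factors = [
        lambda a, b, c, d: 2.0 if a else 1.0,               # phi(A)
        lambda a, b, c, d: 3.0 if a == b else 0.5,          # phi(B, A)
        lambda a, b, c, d: 1.5 if b != c else 1.0,          # phi(C, B)
        lambda a, b, c, d: 2.0 if d == (a and c) else 0.1,  # phi(D, A, C)
    ]

    def score(x):
        """Unnormalized probability: the product of all factor values on x."""
        s = 1.0
        for phi in factors:
            s *= phi(*x)
        return s

    space = list(product([False, True], repeat=4))
    Z = sum(score(x) for x in space)          # the normalizer
    p = {x: score(x) / Z for x in space}      # p(x) = (1/Z) * prod of phi's
    print(max(p, key=p.get))                  # most probable assignment

Iterative scaling, or any gradient method, would now adjust the potentials
until the model's expected feature counts match the training data.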
Edge-Factored Parsers (McDonald et al. 2005)

Running example (Czech): "Byl jasný studený dubnový den a hodiny odbíjely
třináctou" ("It was a bright cold day in April and the clocks were striking
thirteen"), with tags V A A A N J N V C.

Is jasný -> den a good edge? Yes: lots of green (well-attested) features fire.
- the word pair jasný -> den ("bright day")
- the word-tag pair jasný -> N ("bright NOUN")
- the tag pair A -> N
- the tag pair A -> N preceding a conjunction

How about the competing edge jasný -> hodiny? Not as good: lots of red.
- jasný -> hodiny ("bright clocks") is undertrained: the word pair is rare
- backing off to stems, jasn -> hodi ("bright clock") is still weak
- the adjective and noun disagree in number (plural vs. singular)
- A -> N where the N follows a conjunction is an unusual configuration

So which edge is better, "bright day" or "bright clocks"? Score each one:

  score of an edge e = θ · features(e)

where θ is our current weight vector. Standard algorithms then find the valid
parse with maximum total score.

The edges are not independent, though; hard constraints couple them:
- two edges can't both be chosen if they give a word two parents
  (one parent per word);
- two edges can't both be chosen if they cross (no crossing links);
- three edges can't all be chosen if they form a cycle (no cycles).
Thus, an edge may lose (or win) because of a consensus of the other edges.
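In code, "score of an edge e = θ · features(e)" might look like the sketch
below. The feature templates mirror the ones on the slides (word pair,
word-tag pair, tag pair, stem pair, direction); the template names and the
crude truncation stemmer are my assumptions, not McDonald et al.'s exact
feature set.

    def edge_features(sent, h, d):
        """Feature names for a candidate arc h -> d.
        sent is a list of (word, tag) pairs; index 0 is the artificial ROOT."""
        hw, ht = sent[h]
        dw, dt = sent[d]
        return [
            f"words:{hw}>{dw}",           # e.g. jasny > den ("bright day")
            f"word-tag:{hw}>{dt}",        # e.g. jasny > N ("bright NOUN")
            f"tags:{ht}>{dt}",            # e.g. A > N
            f"stems:{hw[:4]}>{dw[:4]}",   # crude stems, e.g. jasn > hodi
            f"tags-dir:{ht}>{dt}:{'R' if d > h else 'L'}",  # attachment direction
        ]

    def edge_score(theta, sent, h, d):
        """theta . features(e); theta is a sparse dict of learned weights.
        Undertrained features like words:jasny>hodiny simply stay near 0."""
        return sum(theta.get(f, 0.0) for f in edge_features(sent, h, d))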
Finding the Highest-Scoring Parse

Example sentence: "The cat in the hat wore a stovepipe." (plus ROOT).

One approach: convert to a context-free grammar (CFG) and then use dynamic
programming, since each subtree is a linguistic constituent (here, a noun
phrase). The CKY algorithm for CFG parsing is O(n^3); unfortunately it is
O(n^5) in this case. To score the link cat -> wore, it is not enough to know
that the subtree is an NP; we must know that it is rooted at "cat". So the
nonterminal set grows by a factor of O(n), to NP_the, NP_cat, NP_hat, ...,
and CKY's "grammar constant" is no longer constant.

Solution: use a different decomposition (Eisner 1996), which brings parsing
back to O(n^3). Distinguish two kinds of substring:
- a constituent of the tree links to the rest of the sentence only through
  its headword (root);
- a span of the tree links to the rest only through its endwords.

Decomposing a tree into spans, the parse of "The cat in the hat wore a
stovepipe . ROOT" breaks into overlapping spans such as [The cat] and
[cat in the hat wore], and [cat in the hat wore] in turn breaks into
[cat in] and [in the hat wore], and so on; adjacent spans share exactly one
endword.

We can still play the usual dynamic-programming tricks:
- refine the constituents or spans further, letting the probability model
  keep track of more internal information;
- use A*, best-first, or coarse-to-fine search;
- train by EM etc., which requires "outside" probabilities of constituents,
  spans, or links.
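As for the O(n^3) algorithm itself, here is a minimal sketch of the Eisner
(1996) decomposition: the chart items are the "complete" and "incomplete"
spans described above, indexed only by their endwords. For brevity it returns
only the best projective score, with no backpointers.

    import numpy as np

    def eisner_best_score(score):
        """Max total score of a projective dependency tree (Eisner 1996).
        score[h, d] = score of the arc h -> d; word 0 is the artificial ROOT.
        Chart C[i, j, dir, comp]: dir 1 = head at left end i, dir 0 = head at
        right end j; comp 1 = complete span, comp 0 = incomplete (arc pending)."""
        n = score.shape[0]
        C = np.full((n, n, 2, 2), -np.inf)
        C[np.arange(n), np.arange(n), :, :] = 0.0
        for m in range(1, n):                # span width
            for i in range(n - m):
                j = i + m
                # incomplete: join two complete halves, add the arc between i and j
                best = max(C[i, k, 1, 1] + C[k + 1, j, 0, 1] for k in range(i, j))
                C[i, j, 0, 0] = best + score[j, i]   # arc j -> i
                C[i, j, 1, 0] = best + score[i, j]   # arc i -> j
                # complete: absorb an already-finished complete span
                C[i, j, 0, 1] = max(C[i, k, 0, 1] + C[k, j, 0, 0] for k in range(i, j))
                C[i, j, 1, 1] = max(C[i, k, 1, 0] + C[k, j, 1, 1] for k in range(i + 1, j + 1))
        return C[0, n - 1, 1, 1]             # ROOT heads the whole sentence

Swapping max for log-sum-exp turns the same chart into the inside algorithm,
whose outside counterpart supplies the probabilities that EM-style training
needs.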
Non-Projective Parses

Recall the hard constraints on valid trees: one parent per word, no cycles,
and no crossing links. The last one is the "projectivity" restriction. Do we
really want it?

English: "I 'll give a talk tomorrow on bootstrapping." The subtree rooted at
"talk" ("a talk ... on bootstrapping") is a discontiguous noun phrase, so the
arc from talk to bootstrapping must cross another arc. Such non-projectivity
is occasional in English.

Latin: "ista meam norit gloria canitiem" (that-NOM my-ACC may-know glory-NOM
going-gray-ACC), "That glory may know my going-gray" (i.e., it shall last
till I go gray). Non-projectivity is frequent in Latin and many other
languages.

Finding the Highest-Scoring Non-Projective Tree

Consider the sentence "John saw Mary", with a weighted arc from every
possible head to every possible dependent (including arcs from ROOT). The
Chu-Liu-Edmonds algorithm finds the maximum-weight spanning tree, which may
be non-projective, in O(n^2) time: every node selects its best parent, and if
cycles form, contract them and repeat. (Slide thanks to Dragomir Radev.)
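Here is a Python sketch of that contract-and-recurse idea, assuming a dense
matrix with score[h, d] the weight of the arc h -> d and node 0 as ROOT. This
naive recursive version runs in O(n^3); the O(n^2) bound requires a more
careful implementation.

    import numpy as np

    def find_cycle(parent):
        """Return some cycle in the {dependent: head} map, or None."""
        for start in parent:
            path, pos, node = [start], {start: 0}, start
            while node in parent:
                node = parent[node]
                if node in pos:
                    return path[pos[node]:]      # the cycle we walked into
                pos[node] = len(path)
                path.append(node)
        return None

    def cle(score, root=0):
        """Chu-Liu-Edmonds: maximum spanning arborescence as {dependent: head}."""
        n = score.shape[0]
        parent = {}
        for d in range(n):                       # every node picks its best parent
            if d == root:
                continue
            heads = [h for h in range(n) if h != d and score[h, d] > -np.inf]
            if heads:                            # contracted-away nodes have none
                parent[d] = max(heads, key=lambda h: score[h, d])
        cycle = find_cycle(parent)
        if cycle is None:
            return parent
        C = set(cycle)
        c = n                                    # fresh supernode for the cycle
        new = np.full((n + 1, n + 1), -np.inf)
        enter, leave = {}, {}
        for h in range(n):
            for d in range(n):
                if h == d or score[h, d] == -np.inf or (h in C and d in C):
                    continue
                if h not in C and d not in C:
                    new[h, d] = score[h, d]
                elif d in C:                     # arc entering the cycle:
                    w = score[h, d] - score[parent[d], d]  # gain from breaking at d
                    if w > new[h, c]:
                        new[h, c], enter[h] = w, d
                else:                            # arc leaving the cycle
                    if score[h, d] > new[c, d]:
                        new[c, d], leave[d] = score[h, d], h
        sub = cle(new, root)                     # contract, then recurse
        result = {}
        for d, h in sub.items():                 # expand the supernode again
            if d == c:
                result[enter[h]] = h             # cycle broken at enter[h]
            elif h == c:
                result[d] = leave[d]
            else:
                result[d] = h
        for d in C:
            result.setdefault(d, parent[d])      # surviving cycle arcs
        return result

    # Tiny usage, with weights loosely after the slide's example
    # (node order: ROOT, John, saw, Mary):
    S = np.array([[-np.inf,      9.,     10.,      9.],
                  [-np.inf, -np.inf,     20.,      3.],
                  [-np.inf,     30., -np.inf,     30.],
                  [-np.inf,     11.,      0., -np.inf]])
    print(cle(S))    # {2: 0, 1: 2, 3: 2}: ROOT -> saw -> {John, Mary}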
Summing Over All Non-Projective Trees

What about the total weight Z of all trees? What about outside probabilities
or gradients? These can be found in O(n^3) time by matrix determinants and
inverses (Smith & Smith 2007).

Graph Theory to the Rescue!

Tutte's Matrix-Tree Theorem (1948): the determinant of the Kirchhoff (aka
Laplacian) adjacency matrix of a directed graph G, with row and column r
struck out, equals the sum of the scores of all directed spanning trees of G
rooted at node r. That is exactly the Z we need, in O(n^3) time!

Building the Kirchhoff (Laplacian) matrix:
- negate the edge scores;
- put each column's sum on the diagonal (columns index children, so this is
  each node's total incoming score);
- strike out the root's row and column;
- take the determinant.
N.B.: this allows multiple children of the root, but see Koo et al. (2007).

Why should this work? By analogy with Chu-Liu-Edmonds: every node selects its
best parent, and if there are cycles, contract and recur. The claim is clear
for a 1x1 matrix; proceed by induction. The undirected case is classical, and
the directed case needs special handling at the root.
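That recipe in a few lines of numpy (a sketch for multiplicative arc weights;
real implementations work with log-weights and slogdet for numerical
stability):

    import numpy as np

    def tree_partition(weights, root=0):
        """Total weight Z of all arborescences rooted at `root`, via the
        directed Matrix-Tree theorem. weights[h, d] = weight of arc h -> d."""
        A = weights.astype(float).copy()
        np.fill_diagonal(A, 0.0)             # no self-loops
        L = np.diag(A.sum(axis=0)) - A       # column sums on the diagonal,
                                             # negated scores off-diagonal
        keep = [i for i in range(len(A)) if i != root]
        return np.linalg.det(L[np.ix_(keep, keep)])

    # Tiny check: 3 nodes, root 0. The arborescences rooted at 0 are
    # {0->1, 0->2} = 2*3, {0->1, 1->2} = 2*5, {0->2, 2->1} = 3*7; total 37.
    W = np.array([[0., 2., 3.],
                  [0., 0., 5.],
                  [0., 7., 0.]])
    print(tree_partition(W))                 # 37.0 (multiple root children allowed)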
Higher-Order Parsing

Exactly Finding the Best Parse

With arbitrary features, runtime blows up. With single-edge features only,
projective parsing is O(n^3) by dynamic programming and non-projective
parsing is O(n^2) by maximum spanning tree. But for projective dynamic
programming:
- grandparent features: O(n^4)
- grandparent + sibling bigrams: O(n^5)
- POS trigrams: O(n^3 g^6)
- non-adjacent sibling pairs: O(2^n), NP-hard
And for non-projective parsing, any of the above features, soft penalties for
crossing links, or pretty much anything else makes the problem NP-hard.

Let's Reclaim Our Freedom (Again!)

The output probability is a product of local factors, and we throw in any
factors we want (a log-linear model). How could we find the best parse?
- Integer linear programming (Riedel et al. 2006): but this doesn't give us
  probabilities when training or parsing.
- MCMC: but perhaps slow to mix, with a high rejection rate because of the
  hard TREE constraint.
- Greedy hill-climbing (McDonald & Pereira 2006).
This Paper in a Nutshell

None of those methods exploit the tree structure of parses the way the
first-order methods do. So keep the probability a product of local factors,

  (1/Z) * φ(A) * φ(B, A) * φ(C, A) * φ(C, B) * φ(D, A, B) * φ(D, B, C) * φ(D, A, C) * ...

throw in any factors we want, and let the local factors negotiate via
"belief propagation":
- links (and tags) reinforce or suppress one another;
- each iteration takes total time O(n^2) or O(n^3);
- it converges to a pretty good (but approximate) global parse;
- certain global factors are fine too: each global factor can be handled
  fast via some traditional parsing algorithm (e.g., inside-outside).

The analogy, side by side:

  Training with many features          | Decoding with many features
  -------------------------------------+------------------------------------
  Iterative scaling                    | Belief propagation
  Each weight in turn is influenced    | Each variable in turn is influenced
    by the others                      |   by the others
  Iterate to achieve globally          | Iterate to achieve locally
    optimal weights                    |   consistent beliefs
  To train a distribution over trees,  | To decode a distribution over trees,
    use dynamic programming to         |   use dynamic programming to
    compute the normalizer Z           |   compute the messages
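To preview what "dynamic programming to compute messages" means, here is
sum-product message passing on a plain tagging chain (a sketch; a chain has
no loops, so BP is exact here and reduces to forward-backward; the point of
the paper is to run the same local updates on a loopy graph over links and
tags):

    import numpy as np

    def chain_bp_marginals(unary, pair):
        """Sum-product messages on a chain of tag variables.
        unary[i][t] : unary factor value for tag t at position i
        pair[s, t]  : binary factor value for adjacent tags (s, t)
        Returns normalized per-position beliefs (marginals)."""
        n, T = len(unary), len(unary[0])
        fwd = [np.ones(T) for _ in range(n)]     # messages flowing rightward
        bwd = [np.ones(T) for _ in range(n)]     # messages flowing leftward
        for i in range(1, n):
            fwd[i] = pair.T @ (fwd[i - 1] * unary[i - 1])
        for i in range(n - 2, -1, -1):
            bwd[i] = pair @ (bwd[i + 1] * unary[i + 1])
        beliefs = [fwd[i] * unary[i] * bwd[i] for i in range(n)]
        return [b / b.sum() for b in beliefs]

    # Tags indexed v=0, n=1, a=2, with the factor tables from the CRF
    # example in the next section:
    pair = np.array([[0., 2., 1.],
                     [2., 1., 0.],
                     [0., 3., 1.]])
    unary = [np.array([0.3, 0.02, 0.0]),         # find
             np.array([0.3, 0.0, 0.1]),          # preferred
             np.array([0.2, 0.2, 0.0])]          # tags
    for b in chain_bp_marginals(unary, pair):
        print(np.round(b, 3))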
Throwing in More Features: Graphical Models

First, a familiar example: a conditional random field (CRF) for POS tagging,
drawn as local factors in a graphical model. The observed input sentence
(shaded in the slides) is "find preferred tags". A possible tagging, i.e. an
assignment to the remaining variables, is v v v; another is v a n.

A binary factor measures the compatibility of two adjacent tags; the model
reuses the same parameters at each position:

        v  n  a
    v   0  2  1
    n   2  1  0
    a   0  3  1

A unary factor evaluates each tag in place. Its values depend on the
corresponding word (and could be made to depend on the entire observed
sentence), so there is a different unary factor at each position; a value of
0 rules a tag out entirely (e.g., "find" can't be an adjective):

    find:       v 0.3   n 0.02   a 0
    preferred:  v 0.3   n 0      a 0.1
    tags:       v 0.2   n 0.2    a 0

p(v a n) is proportional to the product of all the factors' values on the
tagging v a n: 1 * 3 * 0.3 * 0.1 * 0.2.
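The slide's arithmetic, checked by brute force over all 27 taggings (values
straight from the tables above):

    from itertools import product

    TAGS = "vna"
    pair_rows = [[0, 2, 1], [2, 1, 0], [0, 3, 1]]
    pair = {(s, t): v for s, row in zip(TAGS, pair_rows) for t, v in zip(TAGS, row)}
    unary = {"find":      {"v": 0.3, "n": 0.02, "a": 0.0},
             "preferred": {"v": 0.3, "n": 0.0,  "a": 0.1},
             "tags":      {"v": 0.2, "n": 0.2,  "a": 0.0}}
    words = ["find", "preferred", "tags"]

    def factor_product(tagging):
        """Product of all factor values on this tagging (unnormalized)."""
        p = 1.0
        for w, t in zip(words, tagging):
            p *= unary[w][t]                 # one unary factor per position
        for s, t in zip(tagging, tagging[1:]):
            p *= pair[(s, t)]                # one binary factor per adjacent pair
        return p

    Z = sum(factor_product(t) for t in product(TAGS, repeat=3))
    print(factor_product(("v", "a", "n")))       # 1 * 3 * 0.3 * 0.1 * 0.2 = 0.018
    print(factor_product(("v", "a", "n")) / Z)   # the normalized p(v a n)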
Now let's do dependency parsing! For the input "find preferred links", we
introduce O(n^2) boolean variables, one for each possible link. A possible
parse is encoded as an assignment to these variables: each candidate link is
either true (t) or false (f). Different assignments encode different parses,
but some assignments are illegal (a legality check is sketched after this
list):
- an assignment whose true links contain a cycle;
- an assignment in which some word has multiple parents.
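A sketch of the legality check that the upcoming TREE factor encodes: an
assignment is legal iff every word has exactly one parent and following
parents never loops (projectivity could be checked too, but is left out
here).

    def is_tree(n, links):
        """links: set of (head, dep) pairs over words 1..n-1, with 0 = ROOT.
        True iff every word has exactly one parent and there are no cycles."""
        parent = {}
        for h, d in links:
            if d in parent:              # multiple parents: illegal
                return False
            parent[d] = h
        if set(parent) != set(range(1, n)):
            return False                 # some word has no parent at all
        for start in range(1, n):        # follow heads: must reach ROOT
            seen, node = set(), start
            while node != 0:
                if node in seen:
                    return False         # cycle
                seen.add(node)
                node = parent[node]
        return True

    print(is_tree(4, {(0, 2), (2, 1), (2, 3)}))   # True: a legal parse
    print(is_tree(4, {(0, 2), (1, 3), (3, 1)}))   # False: words 1 and 3 loop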
Local Factors for Parsing

So what factors shall we multiply to define the parse probability? First,
unary factors that evaluate each link in isolation, say t 2 / f 1 for a
plausible link but t 1 / f 8 for an implausible one. As before, the goodness
of a link can depend on the entire observed input context; some other links
simply aren't as good given this input sentence.

Global Factors for Parsing

But what if the best assignment isn't a tree? Add a global TREE factor that
requires the links to form a legal tree. This is a hard constraint: the
factor's value is either 0 or 1. Over the six link variables of a three-word
sentence, it is a table with 64 entries:

    f f f f f f   0
    f f f f f t   0
    f f f f t f   0
    ...
    f f t f f t   1   (a legal parse!)
    ...
    t t t t t t   0

Optionally, the TREE factor can also require the tree to be projective (no
crossing links).

So far, this model is equivalent to edge-factored parsing (McDonald et al.
2005). Note that McDonald et al. (2005) don't loop through this table to
consider exponentially many trees one at a time; they use combinatorial
algorithms, and so should we. (Brute force is still instructive on a tiny
example; see the sketch below.) Beyond that, second-order effects enter as
factors on two link variables at once, for example a grandparent factor that
scores a link together with the link above it.
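Putting it together on a tiny example: unary link factors times the 0/1 TREE
factor, enumerated by brute force over every subset of candidate links. This
is exactly the exponential table the slide warns against looping over, but it
is instructive at this scale. The sketch reuses is_tree from above, adds an
explicit ROOT node (so 9 candidate links rather than the slide's 6), and the
factor values are hypothetical.

    from itertools import chain, combinations

    n = 4                                    # ROOT + "find preferred links"
    cands = [(h, d) for h in range(n) for d in range(1, n) if h != d]

    # Unary factor value for each link being true; being false scores 1.
    phi = {l: 2.0 if l in {(0, 1), (1, 2), (1, 3)} else 0.5 for l in cands}

    def weight(links):
        w = 1.0
        for l in links:
            w *= phi[l]                      # unary factors on the true links
        # global TREE factor: 0 or 1 (is_tree: see the sketch above)
        return w if is_tree(n, links) else 0.0

    subsets = chain.from_iterable(
        combinations(cands, k) for k in range(len(cands) + 1))
    table = {s: weight(set(s)) for s in subsets}
    Z = sum(table.values())
    best = max(table, key=table.get)
    print(sorted(best), table[best] / Z)     # the MAP tree and its probability

This is edge-factored parsing done the slow way; second-order factors would
simply multiply extra terms into weight(), and belief propagation replaces
the exponential enumeration.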