




When Corpus Meets Theory
James Pustejovsky
TSD 2002, September 10, 2002
Models and Data

Talk Outline
- Goals for Language Modeling
- The Role of Corpus in Theory
  - Disambiguation
  - Selection discovery
  - Clustering
  - Category modification and formation
  - Grammar induction
- The Role of Theory in Corpus

Goals of Language Modeling
- Statistically informed models improve application performance
  - Speech
  - Search
  - Clustering
  - Parsing
  - Machine translation
  - Summarization
  - Question answering

Theory Drives the Model
- Corpus behavior of words is determined by their type.
- You can't find what you can't model.
- But you don't want to find only what you model!
- Theory allows a model of reality, but Corpus brings reality to the model.

Language Modeling with Generative Lexicon
- Selection integrates paradigmatics and syntagmatics
- Models the relationship between selectional contexts
- Coercion in typing
- Complex type (Dot Objects)
- All major categories behave functionally
- Qualia structure models much of this behavior
- Semantic Types are differentiated and ranked
- Grammatical behavior follows (generally) from type

Quine's Gambit in Corpora
- Co-occurrence reveals surface relations.
- Paradigmatics is first order.
- Syntagmatics is first order.
- LSA and other techniques create non-superficial associations.
- Model Bias is necessary to create decision procedures
- Example: Complex Types

Recognizing Selection
1. a. The man fell/died.
   b. The rock fell/!died.
2. a. John forced/!convinced the door to open.
   b. John forced/convinced the guests to leave.
3. a. John poured milk into/!on his coffee.
   b. John poured milk into/on the bowl.

Modeling Paradigmatic Systems

Integrating Selection into Grammars
- Qualia are used to create new types: they are generative coherence relations between types.

Qualia Structure

Three Ranks of Type

Entities

Events

System of Generating Types
- Qualia are incorporated into the Type itself

Qualia as Types

Functional Selection

Functional Type Coercion

Co-composition

Coercion in Function Composition

Selection and Coercion

Type Specification

Type Determines Grammatical Behavior
- Behavior is measurable in corpus
- Corpus distribution of different types should correlate strongly with their type.
- Corpus Analysis provides probable values for Coercion
  - Drinking, sipping, cooling, ?pouring, ?spilling, ...

Complements of “begin” in AP (Pustejovsky and Rooth, 1991 ms)

Complements of “veto” in AP

Limitations of this Approach
- Fuzzy Selection
- Dependencies that require Models: Complex Types

Complex Types

Contexts Introducing Complex Types
1. a. John read the story/the book.
   b. John told the story/!the book.
2. Mary read the subway wall.

When Paradigmatic systems are modeled, Syntagmatic Processes are affected
- The specificity of argument selection by a predicate;
- The treatment of verbal polysemy and multiple subcategorization;
- The treatment of type mismatches and the semantics of solidarities.

Types of Properties
- Natural Binary Predicate
- Polar Predicates: hot/cold, big/small, short/tall, clean/dirty

Lexical Asymmetries
- Preferences and Defaults: clean/dirty, empty/full, pretty/ugly
- Lexical Gaps: bald/(hairy), toothless/(toothed)
- Lexical Perfectives: dead/alive

Sortal Opposition
- External Negation points up in the Type System
- Internal Negation points down in the Type System
(1) a. Rocks are not alive.
    b. !Rocks are dead.
(2) a. The Pope is not married.
    b. !The Pope is a bachelor.
(3) a. Bill did not run the race.
    b. Hence, Bill did not win the race.
    c. !Bill lost the race.

Case Study I: Corpus Drives Lexical Acquisition

Text Mining the Biobibliome
- 40,000 papers published each month in Medline
- 11 million abstracts currently in Medline Database
- 36 GB of text

Robust Extraction of Relations from Biomedical Texts
- Statistical techniques are
  too coarse-grained
  - “SU6656 does not inhibit the PDGF receptor.”
- Local Named Entity Extraction is not informative enough
  - “This protein binds to Src.”
- Bag of words and bag of entities approaches are too weak
  - “p16 inhibits Cdk4.”
  - “Cdk4 is inhibited by p16.”

Parsing Methodology
- Identify Targets of Interest
  - Entities and relations to be extracted
- Perform Corpus Analysis over targets
- Cluster corpus occurrences by syntactic behavior and semantic type
- Generate Patterns for extraction
- Test and modify patterns against development corpus

Possible Selectional Frames
- “p16 inhibits Cdk4.” (entity, entity)
- “p16 inhibits cell growth.” (entity, process)
- “Methylation inhibits HDAC1.” (process, entity)
- “Cell growth inhibits apoptosis.” (process, process)

Corpus Pattern Analysis
- Create concordances over target elements
- Automatically cluster complementation patterns
- Semi-automatically verify patterns and amend grammar rules accordingly.

Getting the Lexicon out of the Corpus
- Preliminary examination of the text
- Sort concordances according to semantic patterns
- One-sense-per-domain doesn't cut it
- Complementation patterns emerge from the corpus, with and without realization
- Semantic patterns are a first step towards identifying lexical sets
- Semantic patterns identified with specific lexical sets yield co-specifications
- Implicatures can be identified with co-specifications for a very high proportion of uses of all predicators.

Corpus-derived Grammars distinguish Textual Function
- Tensed sentence-based relational information conveys new information.
  - A peptide representing the carboxyl-terminal tail of the met receptor inhibits kinase activity.
- Nominalization functions to:
  - Allow further predication and modification;
  - Bridge the new information with acceptance as given;
  - Provide economy of expression in text.
- Agentive Nominal conveys a relation as a given fact.
  - The protein kinase C inhibitor staurosporine, inhibited actin assembly

Probable Syntactic Patterns: Sentential Forms
- A peptide representing the carboxyl-terminal tail of the met receptor inhibits kinase activity.
- Whereas phosphorylation of the IRK by ATP is inhibited by the nonhydrolyzable competitor adenylyl-imidodiphosphate, ...
- The Met tail peptide inhibits the closely related Ron receptor but does not affect ...
- Although the ability of individual trichothecenes to inhibit protein synthesis and activate JNK/p38 kinases are dissociable, both effects contribute to the induction
  of apoptosis.

Probable Syntactic Patterns: Nominal Forms
- 12S E1A, an inhibitor of p300-dependent transcription, reduces the binding of TFIIB, but not that of cyclin E-Cdk2, to p300.
- The protein kinase C inhibitor staurosporine, inhibited actin assembly and platelet aggregation induced by thrombin or PMA.

Probable Syntactic Patterns: Nominalizations
- Structural basis for inhibition of protein tyrosine phosphatases by Keggin compounds phosphomolybdate and phosphotungstate.
- Previous reports raised question as to whether 8-Cl-cAMP is a prodrug for its metabolite, 8-Cl-adenosine which exerts growth inhibition in a broad spectrum of cancer cells.

Case Study II: Theory Drives Corpus Analysis

Semantic Rerendering
- A general technique for adapting and modifying an existing ontology
- Types are extended and created through:
  - Corpus analysis of patterns implicated with type structures
  - Ad hoc database projections over a relational database

Specialized Ontologies in the Biomedical Domain
- The UMLS from National Library of Medicine
  - wide coverage
  - shallow semantic type structure
- 180,998 instances of Amino Acid, Peptide, or Protein in UMLS
- Chemical Viewed Functionally and Chemical Viewed Structurally
  - These 2
    subtrees cover a large number of all types in the UMLS
- The UMLS gives semantic type bindings to 1.5 million entities

NLP Applications using Semantic Typing
- Statistical Categorization and Disambiguation Tasks
  - Resolution of Prepositional Attachment
  - Relations between Constituents in Nominal Compounds
  - Generalizing across semantic classes = make up for the sparseness of data
- IR Tasks
  - Query Reformulation
  - Filtering & Ranking of Retrieved Results
- Information Extraction Tasks
  - Coreference Resolution
  - Relation Extraction (via Anaphora Resolution)
  - Entity Identification

GL as Modeling Bias in Rerendering
- Structural subtyping (Formal)
- Functional subtyping (Telic)
- Activation relations (Agentive)
- Molecular analysis (Const)

Syntactic Rerendering Algorithm (I)

Syntactic Rerendering Algorithm (II)

Syntactic Rerendering Algorithm (III)

Evaluating Results
- Comparison against Existing Ontologies
  - overlap with Gene Ontology (GO) for select categories
  - Receptor: 17.5% of 2nd level extension phrases are in GO
- Improved P&R for the client NLP Applications

Coreference Resolution Application
- Sortal Anaphora: “the enzyme”, “the protease”, “the same solvent”, etc.
- Derivation of Instances for the Proposed Subtypes
- Syntactic templates (inhibitor, solvent):
  - definitional constructions: “X is a Y inhibitor”
  - aliasing constructions: “X (the solvent)”
  - appositions: “X, the inhibitor of Y,”
  - nominal compounds: “the solvent X”
  - enumerations: “the following solvents: X, Y, ...”
  - relative clauses
  - adjuncts: “X and Y as solvents”

Semantic (Database) Rerendering
- Database of
  relations extracted from the Medline corpus
  - inhibit, block, phosphorylate
- Typed projection from relations table induces an ad hoc category subtype of T1
  - X = X : T1 | R(X,Y) T1 UMLS1

Syntactic vs. Semantic Rerendering
- Sortals with no corresponding relational form: solvent
- Sortal and relation predicates: inhibitor/inhibit, kinase/phosphorylate
- Relation predicates with no corresponding nominal forms: bind with, increase

Syntactic vs. Semantic Rerendering (II)
- Overlap of derived subtypes
  - CDK inhibitor
  - p21(WAF-1) inhibited CDK2 and CDK4
- Recover different types of information
  - Syntactic templates for sortal predicates: old information
  - Typed projections of database relations: new information

Case Study III: Applying Lexical Semantic Knowledge

TERQAS: Time and Event Recognition for Question Answering Systems

Relevance to Question Answering Systems
- Is Gates currently CEO of Microsoft?
- Were there any meetings between the terrorist hijackers and Iraq before the WTC event?
- Did the Enron merger with Dynegy take place?
- How long did the hostage situation in Beirut last?
- When did the war between Iran and Iraq end?
- When did John Sununu travel to a fundraiser for John Ashcroft?
- How many Tutsis were killed by Hutus in Rwanda in
  1994?
- Who was Secretary of Defense during the Gulf War?
- What was the largest U.S. military operation since Vietnam?
- When did the astronauts return from the space station on the last shuttle flight?

Questions over TIMEBANK Corpus

Workshop Goals
- TimeML: Define and design a metadata standard for markup of events, their temporal anchoring, and how they are related to each other in news articles.
- TIMEBANK: Given the specification of TimeML, create a gold standard corpus of 300 articles marked up for temporal expressions, events, and basic temporal relations.

TERQAS Participants
- James Pustejovsky, PI
- Rob Gaizauskas
- Graham Katz
- Bob Ingria
- José Castaño
- Inderjeet Mani
- Antonio Sanfilippo
- Dragomir Radev
- Patrick Hanks
- Marc Verhagen
- Beth Sundheim
- Andrea Setzer
- Jerry Hobbs
- Bran Boguraev
- Andy Latto
- John Frank
- Lisa Ferro
- Marcia Lazo
- Roser Saurí
- Anna Rumshisky
- David Day
- Luc Belanger
- Harry Wu
- Andrew See

How TimeML Differs from Previous Markups
- Extends TIMEX2 annotation:
  - Temporal Functions: three years ago
  - Anchors to events and other temporal expressions
- Identifies signals determining interpretation of temporal expressions:
  - Temporal Prepositions: for, during, on, at
  - Temporal Connectives: before, after, while
- Identifies event expressions:
  - tensed verbs: has left, was captured, will resign
  - stative adjectives: sunken, stalled, on board
  - event nominals: merger, Military Operation, Gulf War
- Creates dependencies between events and times:
  - Anchoring: John left on Monday.
  - Orderings: The party happened after midnight.
  - Embedding: John said Mary left.

attributes := eid class tense aspect
eid := ID
eid := EventID
EventID := e
class := OCCURRENCE | PERCEPTION | REPORTING | ASPECTUAL | STATE | I_STATE | I_ACTION | MODAL
tense := PAST | PRESENT | FUTURE | NONE
aspect := PROGRESSIVE | PERFECTIVE | PERFECTIVE_PROGRESSIVE | NONE

TimeML Event Classes
- Occurrence: die, crash, build, merge, sell, take advantage of, ...
- State: be on board, kidnapped, recovering, love, ...
- Reporting: say, report, announce, ...
- I-Action: attempt, try, promise, offer
- I-State: believe, intend, want, ...
- Aspectual: begin, start, finish, stop, continue
- Perception: see, hear, watch, feel

The young industry's rapid growth also is attracting regulators eager to police its many facets.

Israel will ask the United States to delay a military strike against Iraq until the Jewish state is fully prepared for a possible Iraqi attack.

Fully Specified Temporal Expressions
- June 11, 1989
- Summer, 2002

Underspecified Temporal Expressions
- Monday
- Next month
- Last year
- Two days ago

Durations
- Three months
- Two years

functionInDocument allows for relative anchoring of temporal expression values

TLINK
TLINK or Temporal Link represents the temporal relationship holding between events or between an event and a time, and establishes a link between the involved entities, making explicit if they are:
1. Simultaneous (happening at the same time)
2. Identical (referring to the same event): John drove to Boston. During his drive he ate a donut.
3. One before the other: The police looked into the slayings of 14 women. In six of the cases suspects have already been arrested.
4. One after the other
5. One immediately before the other: All passengers died when the plane crashed into the mountain.
6. One immediately after the other
7. One including the other: John arrived in Boston last Thursday.
8. One being included in the other
9. One holding during the duration of the other
10. One being the beginning of the other: John was in the gym between 6:00 p.m. and 7:00 p.m.
11. One being begun by the other
12. One being the ending of the other: John was in the gym between 6:00 p.m. and 7:00 p.m.
13. One being ended by the other

SLINK
SLINK or Subordination Link is used for contexts introducing relations between two events, or an event and a signal, of the following sort:
1. Modal: Relation introduced mostly by modal verbs (should, could, would, etc.) and events that introduce a reference to a possible world, mainly I_STATEs: John should have bought some wine. Mary wanted John to buy some wine.
2. Factive: Certain verbs introduce an entailment (or presupposition) of the argument's veracity. They include forget (in the tensed complement), regret, manage: John forgot that he was in Boston last year. Mary regrets that she didn't marry John. John managed to leave the party.
3. Counterfactive: The event introduces a presupposition about the non-veracity of its argument: forget (to), unable to (in past tense), prevent, cancel, avoid, decline, etc. John forgot to buy some wine. Mary was unable to marry John. John prevented the divorce.
4. Evidential: Evidential relations are introduced by REPORTING or PERCEPTION: John said he bought some wine. Mary saw John carrying only beer.
5. Negative evidential: Introduced by REPORTING (and PERCEPTION?) events conveying negative polarity: John denied he bought only beer.
6. Negative: Introduced only by negative particles (not, nor, neither, etc.), which will be marked as SIGNALs, with respect to the events they are modifying: John didn't forget to buy some wine. John did not want to marry Mary.

ALINK
ALINK or Aspectual Link represents the relatio...
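
To make the event attribute grammar and the SLINK categories above more concrete, the following is a minimal Python sketch. It is not TimeML syntax or any official tool: the names Event and slink_type, the label strings, and the idea of passing the trigger's lemma and class as arguments are all assumptions made for illustration. The value sets and trigger words are copied from the slides; the "unable to (in past tense)" case and the format of event IDs are deliberately left unhandled.

```python
from dataclasses import dataclass
from typing import Optional

# Value sets copied from the attribute grammar in the slides.
EVENT_CLASSES = {"OCCURRENCE", "PERCEPTION", "REPORTING", "ASPECTUAL",
                 "STATE", "I_STATE", "I_ACTION", "MODAL"}
TENSES = {"PAST", "PRESENT", "FUTURE", "NONE"}
ASPECTS = {"PROGRESSIVE", "PERFECTIVE", "PERFECTIVE_PROGRESSIVE", "NONE"}


@dataclass(frozen=True)
class Event:
    """Hypothetical container for one event; field names are illustrative."""
    eid: str
    event_class: str
    tense: str = "NONE"
    aspect: str = "NONE"

    def __post_init__(self) -> None:
        # Reject values that the grammar does not list.
        if self.event_class not in EVENT_CLASSES:
            raise ValueError(f"unknown class: {self.event_class}")
        if self.tense not in TENSES:
            raise ValueError(f"unknown tense: {self.tense}")
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")


# Trigger words taken from the SLINK descriptions in the slides.
MODAL_VERBS = {"should", "could", "would"}
FACTIVE_VERBS = {"regret", "manage"}
COUNTERFACTIVE_VERBS = {"prevent", "cancel", "avoid", "decline"}
NEGATIVE_PARTICLES = {"not", "nor", "neither"}


def slink_type(lemma: str,
               event_class: Optional[str] = None,
               infinitival: bool = False,
               negative_polarity: bool = False) -> Optional[str]:
    """Guess an SLINK category for a trigger word (toy lookup, assumed labels)."""
    if lemma in NEGATIVE_PARTICLES:
        return "NEGATIVE"
    if lemma == "forget":
        # "forget to" is counterfactive; "forget that ..." is factive.
        return "COUNTERFACTIVE" if infinitival else "FACTIVE"
    if lemma in FACTIVE_VERBS:
        return "FACTIVE"
    if lemma in COUNTERFACTIVE_VERBS:
        return "COUNTERFACTIVE"
    if lemma in MODAL_VERBS or event_class == "I_STATE":
        return "MODAL"
    if event_class in {"REPORTING", "PERCEPTION"}:
        return "NEGATIVE_EVIDENTIAL" if negative_polarity else "EVIDENTIAL"
    return None  # no subordination relation recognized by this sketch


if __name__ == "__main__":
    said = Event("e1", "REPORTING", tense="PAST")
    print(slink_type("say", said.event_class))        # John said he bought some wine.
    print(slink_type("forget", infinitival=True))     # John forgot to buy some wine.
    print(slink_type("want", event_class="I_STATE"))  # Mary wanted John to buy some wine.
```

The lookup order matters in this sketch: lexical triggers such as regret are checked before the class-based rules, so a factive verb is not swallowed by the broader I_STATE rule for modal contexts. A real annotation pipeline would need context the slides describe only informally, such as tense and complement type.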