編譯原理教學課件:Chapter 3 - Lexical Analysis_第1頁
編譯原理教學課件:Chapter 3 - Lexical Analysis_第2頁
編譯原理教學課件:Chapter 3 - Lexical Analysis_第3頁
編譯原理教學課件:Chapter 3 - Lexical Analysis_第4頁
編譯原理教學課件:Chapter 3 - Lexical Analysis_第5頁
已閱讀5頁,還剩164頁未讀 繼續(xù)免費閱讀

下載本文檔

版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領

文檔簡介

1、Where we are ?2022-4-261Lexical AnalysisSyntax AnalysisSemantic AnalysisIR GenerationIR OptimizationCode GenerationOptimizationSourceCodeMachineCodewhile (ip z)+ip;w h i l ewhile (ip z)+ip;( i pz ) nt + + i p ;w h i l ewhile (ip z)+ip;( i pz ) nt + + i p ;T_While(T_IdentT_Ident)+T_Identipzipw h i l

2、ewhile (ip z)+ip;( i pz ) nt + + i p ;T_While(T_IdentT_Ident)+T_IdentipzipWhile+IdentipIdentzIdentipScanning a Source Filew h i l e( 1 3 7 i ) nt + + i ;w h i l e( 1 3 7 i ) nt + + i ;Scanning a Source Filew h i l e( 1 3 7 i ) nt + + i ;Scanning a Source Filew h i l e( 1 3 7 i ) nt + + i ;Scanning a

3、 Source Filew h i l e( 1 3 7 i ) nt + + i ;Scanning a Source Filew h i l e( 1 3 7 i ) nt + + i ;Scanning a Source Filew h i l e( 1 3 7 i ) nt + + i ;Scanning a Source Filew h i l ei ) nt + + i ;T_While( 1 3 7Scanning a Source Filew h i l ei ) nt + + i ;( 1 3 7.T_WhileThe piece of the original progra

4、mfrom which we made the token iscalled a lexemeThis is called a token. You can think of it as an enumerated type representing what logical entity we read out of the source code.Scanning a Source FileScanning a Source Filew h i l ei ) nt + + i ;T_While( 1 3 7w h i l ei ) nt + + i ;T_While( 1 3 7Scann

5、ing a Source Filew h i l ei ) nt + + i ;T_While( 1 3 7Scanning a Source Filew h i l ei ) nt + + i ;T_While( 1 3 7Sometimes we will discard alexemerather than storing it for later use.Here, we ignore whitespace, since it has no bearing on the meaning Of the program.Scanning a Source Filew h i l ei )

6、nt + + i ;T_While( 1 3 7Scanning a Source Filew h i l ei ) nt + + i ;T_While( 1 3 7Scanning a Source Filew h i l ei ) nt + + i ;T_While( 1 3 7Scanning a Source Filei ) nt + + i ;( 1 3 7w h i l eT_While(Scanning a Source Filei ) nt + + i ;( 1 3 7w h i l eT_While(Scanning a Source Filei ) nt + + i ;(

7、1 3 7w h i l eT_While(Scanning a Source Filei ) nt + + i ;( 1 3 7w h i l eT_While(Scanning a Source Filei ) nt + + i ;( 1 3 7w h i l eT_While(Scanning a Source Filei ) nt + + i ;( 1 3 7w h i l eT_While (Scanning a Source Filei ) nt + + i ;w h i l eT_While( 1 3 7T_IntConst137(Scanning a Source Filew

8、h i l ei ) nt + + i ;T_While( 1 3 7(T_IntConst137Some tokens can haveattributes that store Extra information about the token. Here we store which integer is represented.Scanning a Source File C+: Nested template declarationsvectorvector myVectorScanning is Hard C+: Nested template declarationsvector

9、 vector myVectorScanning is Hard C+: Nested template declarations(vector (vector myVector)Scanning is HardScanning is Hard C+: Nested template declarations(vector (vector myVector) Again, can be difficult to determine where to split. PL/1: Keywords can be used as identifiers.Scanning is Hard PL/1: K

10、eywords can be used as identifiers.IF THEN THEN THEN = ELSE; ELSE ELSE = IFScanning is Hard PL/1: Keywords can be used as identifiers.IF THEN THEN THEN = ELSE; ELSE ELSE = IFScanning is HardScanning is Hard PL/1: Keywords can be used as identifiers.IF THEN THEN THEN = ELSE; ELSE ELSE = IF Can be dif

11、ficult to determine how to label lexemes.Challenges in ScanningnHow do we determine which lexemes are associated with each token?nWhen there are multiple ways we could scan the input, how do we know which one to pick?nHow do we address these concerns efficiently?#include using namespace std;int main

12、()for(int i=32;i=127;i+)coutASCII碼是i的字符是(char)iendl;return 0;2022-4-2639To automate the construction of a scanner1.Define the tokens with a Regular Expression2.Create the equivalent NFA3.Construct the equivalent DFA4.Program the DFA2022-4-2640Main Content of Scanning StudynSpecification of lexical s

13、tructure : Regular ExpressionnRecognition system: Finite Automata represents algorithms for recognizing strings given by regular expressionnPractical Methods for writing programs that implement the recognition processes represented by finite automata2022-4-2641Study GoalsnMaster: qThe construction o

14、f scanner; the transition from regular expression to DFAnUnderstand: qConcept of regular expression, NFA, DFAnKnow: qImplementation of a Scanner2022-4-26422.1 The Scanning ProcessReviewnThe task of scanner: Reading the source program as a file of characters and diving it up into tokensnTokenqToken i

15、s a sequence of characters that represents a unit of information.qToken represents a certain pattern of characters, such as keywords, identifiers, special symbols.2022-4-2643The Categories of Tokens: Fixed strings of characters that have special meaning in the language, such as “if” and “then”Includ

16、e arithmetic operations, assignment, equality and so on, such as +, -, :=, =sequences of letters and digits beginning with a letterinclude numeric constants and string literals ,such as 42, 3.14, “hello”, “a”2022-4-2644Token and lexemenToken is presented as (Kind, Value)nKinds are logical entities,

17、represented as IF,THEN,PLUS,MINUS,NUM,ID and so onnThe string value represented by a token is called lexeme. qReserved words and special symbol have only one lexemeqWhile number and identifier have infinitely many lexemesnExample:(IF, “if”) (PLUS, “+”) (ID, “x”)2022-4-2645Interface of ScannernScanni

18、ng is a single passqConvert the entire source program into a sequence of tokensnScanning is a sub function of the parserqWhen called by the parser it returns the single next token from the inputscannerparserTokenssourcescannerparsercallTokensource2022-4-26462.2 Regular ExpressionnFunctionRepresent p

19、atterns of strings of charactersnThe meaning of regular expressionqA regular expression r is completely defined by the set of strings that it matchesqThis set is called the language generated by the regular expression, written as The process of constructing a scanner :regular expressionprogram for s

20、cannerDFANFA2022-4-26472.2.1 String and Language1.AlphabetnDefinitionAny finite set of symbolsnExample=01 =ab,c2022-4-26482.StringnDefinitionqA string over some alphabet is a finite sequence of symbols drawn from that alphabet.qExamplesn0,00,10 are strings of =01na, ab, aaca are strings of =ab,cnLen

21、gth of stringqThe length of a string s, usually written as |s|,is the number of occurrences of symbols in s.qExample: |abc|=3nThe empty stringqDenoted by ,is a special string of length zero, a=a=aq is not equal to ( )2022-4-26493.Operations on StringnConcatenationqIf x and y are strings, then the co

22、ncatenation of x and y, written as xy, is the string formed by appending y to xqExample: x=ST, y=abu ,xy=STabu x = x=xnExponentiationqIf a is a string then an = aaaaqExample: a1=a a2=aa a0=2022-4-2650nOperations that generate new regular expression from existing ones are:qChoice among alternatives (

23、|)qConcatenation(.)qRepetition or closure (*)nThe precedence of the operations is:nExampleL(a|bc*)=a (b ,c, cc,)=a b, bc, bcc, =a, b, bc, bcc, Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for strings containing 00 as a substring: (0 | 1)*00(0 | 1)*

24、Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for strings containing 00 as a substring: (0 | 1)*00(0 | 1)*Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for strings containing 00 as a substring: (0 |

25、 1)*00(0 | 1)*11011100101000011111011110011111Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for strings of length exactly four:(0|1)(0|1)(0|1)(0|1)Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for s

26、trings of length exactly four:(0|1)(0|1)(0|1)(0|1)Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for strings of length exactly four:(0|1)(0|1)(0|1)(0|1)0000101011111000Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regu

27、lar expression for strings of length exactly four:(0|1)40000101011111000Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for strings that contain at most one zero:Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular exp

28、ression for strings that contain at most one zero:1*(0 | )1*Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere is a regular expression for strings that contain at most one zero:1*(0 | )1*1111011111111101110Simple Regular ExpressionsnSuppose the only characters are 0 and 1.nHere

29、 is a regular expression for strings that contain at most one zero:1*0?1*11110111111111011102022-4-2662ExampleGiven the description of the strings to be matched and translate the description into a regular expression=a,b,c, regular expression of strings that contain exactly one b is _?Regular expres

30、sion of strings that contain at most one b is _?Email address?(a|c)*b(a|c)*(a|c)*|(a|c)*b(a|c)* or (a|c)*(b| )(a|c)*a-z(a-z0-9*-_?a-z0-9+)*(a-z0-9*-_?a-z0-9+)+.a-z2,3(.a-z2)?2022-4-26634.LanguagenDefinitionAny set of strings over some fixed alphabetnExampleq is a languageq, the empty set is also a l

31、anguage2022-4-26645.Operations on language nConcatenationqConcatenation of L and M is written as LMLM =st|sL,tMqExample:L=ab,cde M = 0,1LM =ab0,ab1,cde0,cde1qA=A=AnExponentiationThe exponentiation of L is defined as:qL0 = qL1 = L , L2 = LLqLK = LL.L(LK is L concatenated with itself k-1 times)2022-4-

32、2665nClosureqClosure of L ( written as L*) denotes zero or more concatenations of L qL* = L 0 L1 L 2 L 3qExample:L=0,1L*=L0L1L2=,0,1,00,01,10,11,000,qPositive closure of L (written as L+) denotes one or more concatenation of LqL += L 1 L 2 L 3qL *= L 0 L +L += LL*= L* L2022-4-26662.2.2 Definition of

33、 Regular ExpressionsnThe set of basic regular expressionnEssential set of operations that generate new regular expression from existing onesA regular expression is one of the following:1) and are regular expressions ,L()=, L()= 2)Any is a regular expression of ,L()=2022-4-26673)if1 and2 are regular

34、expressions of , then the following are all regular expressions of :(1)1 (1)(2)12(1)(2)12(1)(1)language generated by the regular expressionregular expression2022-4-2668Example=a,b, the following are regular expressions and the language they generatedregular expression raa|bab(a|b)(a|b)a (a b) L(r)aa

35、, babaa, ab, ba, bb ,a, aa, ,a, b, aa, ab, 2022-4-2669ExplanationnThe same language may be generated by many different regular expressionsqe1= (ab), e2 = baqe1= b(ab) , e2 =(ba)bnNot all sets of strings that we can describe in simple terms can be generated by regular expressionsnExample:The set of s

36、trings S=b,aba,aabaa, = anban|n0 cannot be generated by regular expressions2022-4-26702.2.3 Regular Expression for Programming Language TokensnTypical regular expression for tokens, let l=a|b|z d=0|1|9qIdentifier:qUnsigned integer:qNumber with an exponent (e.g: 2.71E-2)qReserved word:2022-4-2671Ambi

37、guitynSome strings can be matched by several different regular expressionsnLanguage definition must give disambiguating rulesqWhen a string can be either an identifier or a keyword, keyword interpretation is preferredqWhen a string can be a single token or a sequence of several tokens, the single-to

38、ken interpretation is preferred (principle of longest substring)2022-4-2672Token delimitersnCharacters that are unambiguously part of other tokens are delimitersqExample: in string “xtemp=ytemp” = is a delimiter nBlanks, newlines, tab characters, comment are all token delimitersqExample: in string “

39、while x” two tokens “while” and “x” are separated by a blankqScanner discard them after checking for any token delimiting effects2022-4-2673LookaheadnScanner must deal with the problem of lookahead one or more charactersnFor example: recognizing the special symbol “:=”,when encounter “:” , scanner m

40、ust lookahead to determine whether the token is “:” or “:=” nLookahead tokens should not be consumed from the input string 2022-4-2674The process of constructing a scanner :regular expressionprogram for scannerDFANFAstartA Simple AutomatonA,B,C,.,ZEach circle is a state of the automaton. The automat

41、ons configuration is determined by what state(s) it is in.startA Simple AutomatonA,B,C,.,ZstartA Simple AutomatonA,B,C,.,ZThese arrows are called transitions . The automaton changes which state(s) it is in by following transitions.startA Simple AutomatonA,B,C,.,Z H E Y A The automaton takes a String

42、 as input and decides whether to accept or reject the string.startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H

43、 E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A startA Simple AutomatonA,B,C,.,Z H E Y A The double circle indicates that this state is an accepting state. Theautomaton accepts

44、the string if it ends in an accepting state. startA Simple AutomatonA,B,C,.,Z startA Simple AutomatonA,B,C,.,Z startA Simple AutomatonA,B,C,.,Z startA Simple AutomatonA,B,C,.,Z startA Simple AutomatonA,B,C,.,Z startA Simple AutomatonA,B,C,.,Z startA Simple AutomatonA,B,C,.,ZThere is no transition on

45、 “here, so the automaton dies and rejects. startA Simple AutomatonA,B,C,.,ZThere is no transition on “here, so the automaton dies and rejects.startA Simple AutomatonA,B,C,.,Z A B C A B CstartA Simple AutomatonA,B,C,.,ZstartA Simple AutomatonA,B,C,.,Z A B CstartA Simple AutomatonA,B,C,.,Z A B CstartA

46、 Simple AutomatonA,B,C,.,Z A B CstartA Simple AutomatonA,B,C,.,Z A B CstartA Simple AutomatonA,B,C,.,ZThis is not an accepting state, so the automaton rejects. A B C2022-4-261052.3 Finite AutomatanFunction :qFinite automata are mathematical ways of describing particular kinds of algorithms.qHere the

47、y are used to describe the process of recognizing patterns written in regular expressions and so can be used to construct scannersnCategory:qDeterministic Finite Automata (DFA)qNondeterministic Finite Automata (NFA)nRelationship between finite automata and regular expressions2022-4-26106ExampleRegul

48、ar expression for identifierlet letter=a|b|zdigit=0|1|9The process of recognizing such an identifier can be described as finite automataidentifier=letter(letter|digit)*13letterletterdigit2other2022-4-26107: are locations in the process of recognition recording how much of the pattern has already bee

49、n seen: record a change from one state to another: at which the recognition process begins: represent the end of the recognition process13letterletterdigit2other2022-4-26108nThe process of recognizing an actual string can be indicated by listing the sequence of states and transitions in the diagram

50、used in the recognition process. 13letterletterdigit2othernExample: recognizing process of “xtemp=.”:122222xtemp3=2022-4-261092.3.1 Definition of DFAA DFA M=(S,T,S0,A) is a set of statesis an alphabet is a transition function T:SS, T(Si,a)=Sj represents when the current state is Si and the current i

51、nput character is a,DFA will transit to state Sj is a start state is a set of accepting states2022-4-26110ExamplenDFA M=(S,U,V,Q, a,b, f, S, Q). f is defined as following:f(S,a)=Uf(S,b)=V f(V,a)=Uf(V,b)=Qf(U,a)=Qf(U,b)=Vf(Q,a)=Qf(Q,b)=Q2022-4-26111Transition Diagram of DFAnEach state is a node of th

52、e diagramnThe start state is indicated by drawing an unlabeled arrowed line to itnAccepting states are indicated by drawing a double-line border around the statenIf T(Si,a)=Sj, then drawing an arrowed line from the node Si to node Sj labeled by a2022-4-26112ExampleDFA M=(S,U,V,Q,a,b,f,S,Q)f(S,a)=Uf(S,b)=V f(V,a)=Uf(V,b)=Qf(U,a)=Qf(U,b)=Vf(Q,a)=Qf(Q,b)=QThe diagram of it is:bSUVQaaaba,b2022-4-26113nNotes about the DiagramqExtension to the definition:The transitions can also be labeled with names representing a set of characters13letterlette

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經權益所有人同意不得將文件中的內容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網僅提供信息存儲空間,僅對用戶上傳內容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
  • 6. 下載文件中如有侵權或不適當內容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論