Bottom Up Parsing: A Basic Concise Guide in 5 Points

Introduction

To parse means to break something down into its component parts. For Ex: Breaking a sentence down and explaining what a verb does. In terms of data, the compiler unit does the sequential parse operation in a top down parsing, and bottom-up parsing after its lexer /tokenizer/ scanner has produced its tokens. Parsing (top down and bottom up parsing) is affected by the parser software component on the input data from the analysis phase of the lexer, wherein the output is checked for appropriate grammar/syntax and shown as the parse tree with the difference between top down parsing and bottom-up parsing indicating the sequence of tokens. There are thus 2 kinds of parsers available, namely the bottom up and top down parsing types.

In this article let us look at:

  1. Top-Down Parser
  2. Bottom-Up Parser
  3. Classification of Bottom Up Parsers
  4. Construction of LR(0) parsing table and GOTO graph
  5. Explanation of all Types

1. Top-down Parser

Such parsers parse the string input from the start of data, using the left-most sequence in derivation while ending with the parsing terminals. Based on their sequence they explain the difference between top down and bottom-up parsing and can be of 2 types namely

  • Recursive descent or Brute Force parser using backtracking and brute force to parse.
  • Dynamic/Non-recursive descent/ Predictive/LL1 parser using a parse table and no backtracking operation to obtain the parse tree.

2. Bottom-up Parser

This parser compresses the non-terminals where it begins and moves backward to the start symbol using grammar productions to draw the input string’s parse tree. It has 2 types of bottom-up parsing under it, namely

  • LR parses using bottom up parsing grammar that is unambiguous with a bottoms-up structure and having 4 types under it. 
  • LR(0)
  • LALR(1)
  • SLR(1)
  • CLR(1) 
  • Operator precedence parser uses given bottom up parsing grammar on the condition that epsilon and 2 consecutive non-terminals are not present in the RHS of a grammar production. 

3. Classification of bottom up parsers

Bottom-up Parsers are also called Shift Reduce Parsers as they parse the tree moving from leaves/ terminals to the root/ start symbol in a bottom-up structure running from right to left on the input string denoted by w explaining what is bottom up parsing.

The L stands for scan operations of bottom up parsing left-to-right, and R stands for scan right-to-left in bottom up parsing. LR bottom up parsing parsers have several advantages of bottom up parsing like

  • Use in many programming languages except Perl and C .
  • Efficient implementation.
  • In L-operations, they detect quickly any syntactic errors present in a bottom up parsing example.

Let’s study the grammar GOTO graphs using the 4 LR parsing techniques going directly to the GOTO graph construction with a bottom up parse tree example.

4. Construction of LR(0) parsing table and GOTO graph

Assume closure of augmented bottom up parsing LR(0) item closure is State I0 and then find all LR(0) items or set collections using the DFA. The action function uses the terminal an (end input marker $) and states i as an argument. The 4 forms of ACTION[i, a] value are 

  • Reduce A -> β 
  • Shift j.
  • Accept
  • Error

Using the GOTO function for sets of items-to-states, when I= GOTO[Ii, A], the state of i is mapped to the state j of a non-terminal A. In the GOTO graph LR(0) parsing table, note that when 2 reduced productions or both shifted and reduced productions are present, a situation of conflict occurs in distinct problems of bottom up parsing where the grammar is not acceptable as LR grammar. The conflict situations in the state are called RR conflict with 2 reduced productions and SR conflict when 1 shifted and 1 reduced production is present. If no conflict is present its grammar is LR(0) grammar.

5. Explanation of all types

  • SLR Parser

 This parser is the same as the LR(0) bottom up parsing parser but with a reduced productions entry which is written such that it uses the FOLLOW of the reduced production variable. Do note that if multiple entries are there in the parsing table, there is conflict and the grammar is not the grammar used by the parser and needs conflict resolution of the syntax before being accepted as the parser’s unambiguous grammar in bottom up parsing solved examples.

  •  CLR PARSER

The SLR method has LR(0)) items. CLR parsing uses LR(1) items where the lookaheads length is k to define the LR(k) item. Thus LR(1) parsers are more powerful and work with a lookahead item applied to the same but modified functions of GOTO and Closure.

  • LALR PARSER

 This parser is like the CLR parser. But while CLR parsers have 2 states differing in the portion of the lookahead, the states are combined in the LALR parser. After the operation of minimization, it is called LALR grammar when the parsing table reveals no grammar conflicts.

Conclusion

It is sufficient to look at the GOTO graph and not having to construct the bottom up parsing tables to determine the LR(0) grammar by looking for the conflicts being present or absent. When no conflicts are present, it is called LR(0) grammar. When one shift and one variable entry are present, the shift entries move to the GOTO part and the variable to the ACTION part, becoming reduced entries. Similarly, when 2 variables are present on the TERMINAL, no conflict is produced.

If you are interested in making it big in the world of data and evolve as a Future Leader, you may consider our Business Analytics Certification, a 10-month online program, in collaboration with IIM Indore!

ALSO READ

Related Articles

loader
Please wait while your application is being created.
Request Callback