Prerequisite: LR Parser, SLR Parser
SLR (1) grammar
SLR stands for Simple LR grammar. It is an example of a bottom-up parser. The “L” in SLR represents the scanning that advances from left to right and the “R” stands for constructions of derivation in reverse order, and the “(1)” represents the number of input symbols of lookahead.
Shift and Reduce Operations
The next important concept is SHIFT and REDUCE operations. During parsing, we can enter two types of states. First, we may have a state where ‘·’ is at the end of production. This state is called “Handle”. Here comes the role of REDUCE operation. Second, while parsing we may have a state where ‘·’ is pointing at some grammar symbol and there is still scope for advancement. In this scenario, a Shift operation is performed.
Use of Dot [ · ]
In LR parsing we use Dot ‘·’ as a character in the rule so that we know the progress in parsing at any given point. Note that ‘·’ is like an informer and, it should not be considered as a grammar symbol. In SLR (1), we have only one input symbol as lookahead, which is ultimately helping us determine the current state during parsing activity.
Parsing Decisions
For making the parsing decisions we need to construct the deterministic finite automaton. It helps us determine the current state to which we have arrived while we are parsing. When we design it for SLR parser it is referred to as LR automaton. The states of this automaton are the set of “Items” related to the production rules set.
Generating LR Collection
The first step is to create an augmented grammar. The augmentation process starts by bringing the start symbol on the right-hand side of a production rule. We cannot alter the existing rule so we add a new rule in the productions. Bringing the start symbol on RHS ensures that the parsing reaches the acceptance state. The REDUCE operation on this newly added rule determines the string acceptance.
For example,
IF we have ‘K’ as a start symbol
THEN L → K is added in productions
(where ‘L’ represents any non-preexisting symbol in the collection)
CLOSURE and GOTO Operations
In the process of construction of deterministic finite automaton, there are two requirements. The first is creating “States” and the second is developing “Transitions” in the automaton.
1) CLOSURE
Closure operation helps us to form the “States”. Before taking the closure operation all the rules must be separated. Then number all the rules. This will be helpful later for making the Shift and Reduce entries in the parsing table. Let I0 be the collection of rules obtained after grammar augmentation. This means we also have a newly added rule in collection I0.
Assumption – (consider [L, K] are non-terminals and [m, t, p] are set of zero or more terminals or non-terminals)
DO REPEAT (Till-No-New-Rule-Gets-Added) {
IF (any production of the form “ L → m · K t ” exists) and (we have production K → · p)
THEN {add production K → · p to the Closure set if not preexisting}
}
2) GOTO
GOTO operation helps us to form the “Transitions”. In operation GOTO (I, X), ‘I’ can be elaborated as the state we are looking at and ‘X’ is the symbol pointed by Dot (·). So, GOTO takes in a state with items and a grammar symbol and produces the new or existing state as output.
The GOTO (I, X) represents the state transition from “I” on the input symbol “X”.
For any production rule “ L → m · K t ” in “I”
GOTO (I, X) outputs the closure of a set of all the productions “ L → m K · t ”
Program approach
The input for the program is the list of rules having a set of items and lists determining terminal and non-terminal symbols. The control starts form grammarAugmentation() function, then we have to calculate I0 state, that is calculated by calling findClosure(), now for new states generation generateStates() function is called and finally createParseTable() function is called.
A) grammarAugmentation
- In this function firstly we create a unique symbol and use it to create a new item to bring the start symbol on RHS. Then we format the items into a nested list and add a dot at the start of the item’s RHS. Also, we keep only one derivation in one item. Thus we have generated a list named separatedRulesList.
B) findClosure
- This function runs differently for I0 state. For I0 state, we directly append to closureSet the newly created item from augmentation. In any other case, closureSet gets initialized with the received argument “input_state”.
- Now continue iterations till we are receiving new items in closureSet. We follow the rules mentioned under “Item-set Closure” title above to look for the next items to be added in closureSet.
C) generateStates
- In this function we are starting with GOTO computation. “statesDict” is a dictionary that stores all the states and is used as a global variable. We iterate over the statesDict till new states are getting added to it. We call the compute_GOTO() function on each added state only once.
D) compute_GOTO
- Now control reaches compute_GOTO, this function doesn’t have actual goto logic, but it creates metadata to call the GOTO() function iteratively. For the given input state, we call GOTO(state,Xi), where Xi represents a symbol pointed by a dot. There can be multiple Xi, so isolating iteration procedure from actual GOTO() logic implementation reduces complexity.
E) GOTO
- This function is called by compute_GOTO() function, we receive “input_state” and “charNextToDot”, we perform the shift operation and generate a new state.
- Now for any item in a new state, dot can point to some symbol for which we may have another item, so to include all the items for the new state, we call findClosure() function that I discussed above.
- To store this information of “GOTO(state, symbol) = newState”, the stateMap dictionary is created having key of type tuple and value of type integer. This will help in parsing table generation and printing output.
F) createParseTable
- Firstly we create the table in its initial empty state. Columns will have ACTION (Terminals and $) and GOTO (Non-terminals). Rows will have numbered states (I0-In).
- Using stateMap fill in the SHIFT and GOTO entries. To add reduce entries in LR parser, find the “handles” (item with the dot at the end) in the generated states, and place Rn (n is the item number in separatedRulesList), in Table[ stateNo ] [ Ai ], where “stateNo” is the state in which item belongs.
- “Ai” is any symbol belonging to FOLLOW of LHS symbol of the current item that we are traversing. REDUCE entry for the new augmented rule shows “Acceptance”. The parser may have RR (Reduce-Reduce) and SR (Shift-Reduce) conflicts.
Python3
# SLR(1) import copy # perform grammar augmentation def grammarAugmentation(rules, nonterm_userdef, start_symbol): # newRules stores processed output rules newRules = [] # create unique 'symbol' to # - represent new start symbol newChar = start_symbol + "'" while (newChar in nonterm_userdef): newChar += "'" # adding rule to bring start symbol to RHS newRules.append([newChar, [ '.' , start_symbol]]) # new format => [LHS,[.RHS]], # can't use dictionary since # - duplicate keys can be there for rule in rules: # split LHS from RHS k = rule.split( "->" ) lhs = k[ 0 ].strip() rhs = k[ 1 ].strip() # split all rule at '|' # keep single derivation in one rule multirhs = rhs.split( '|' ) for rhs1 in multirhs: rhs1 = rhs1.strip().split() # ADD dot pointer at start of RHS rhs1.insert( 0 , '.' ) newRules.append([lhs, rhs1]) return newRules # find closure def findClosure(input_state, dotSymbol): global start_symbol, \ separatedRulesList, \ statesDict # closureSet stores processed output closureSet = [] # if findClosure is called for # - 1st time i.e. for I0, # then LHS is received in "dotSymbol", # add all rules starting with # - LHS symbol to closureSet if dotSymbol = = start_symbol: for rule in separatedRulesList: if rule[ 0 ] = = dotSymbol: closureSet.append(rule) else : # for any higher state than I0, # set initial state as # - received input_state closureSet = input_state # iterate till new states are # - getting added in closureSet prevLen = - 1 while prevLen ! = len (closureSet): prevLen = len (closureSet) # "tempClosureSet" - used to eliminate # concurrent modification error tempClosureSet = [] # if dot pointing at new symbol, # add corresponding rules to tempClosure for rule in closureSet: indexOfDot = rule[ 1 ].index( '.' ) if rule[ 1 ][ - 1 ] ! = '.' : dotPointsHere = rule[ 1 ][indexOfDot + 1 ] for in_rule in separatedRulesList: if dotPointsHere = = in_rule[ 0 ] and \ in_rule not in tempClosureSet: tempClosureSet.append(in_rule) # add new closure rules to closureSet for rule in tempClosureSet: if rule not in closureSet: closureSet.append(rule) return closureSet def compute_GOTO(state): global statesDict, stateCount # find all symbols on which we need to # make function call - GOTO generateStatesFor = [] for rule in statesDict[state]: # if rule is not "Handle" if rule[ 1 ][ - 1 ] ! = '.' : indexOfDot = rule[ 1 ].index( '.' ) dotPointsHere = rule[ 1 ][indexOfDot + 1 ] if dotPointsHere not in generateStatesFor: generateStatesFor.append(dotPointsHere) # call GOTO iteratively on all symbols pointed by dot if len (generateStatesFor) ! = 0 : for symbol in generateStatesFor: GOTO(state, symbol) return def GOTO(state, charNextToDot): global statesDict, stateCount, stateMap # newState - stores processed new state newState = [] for rule in statesDict[state]: indexOfDot = rule[ 1 ].index( '.' ) if rule[ 1 ][ - 1 ] ! = '.' : if rule[ 1 ][indexOfDot + 1 ] = = \ charNextToDot: # swapping element with dot, # to perform shift operation shiftedRule = copy.deepcopy(rule) shiftedRule[ 1 ][indexOfDot] = \ shiftedRule[ 1 ][indexOfDot + 1 ] shiftedRule[ 1 ][indexOfDot + 1 ] = '.' newState.append(shiftedRule) # add closure rules for newState # call findClosure function iteratively # - on all existing rules in newState # addClosureRules - is used to store # new rules temporarily, # to prevent concurrent modification error addClosureRules = [] for rule in newState: indexDot = rule[ 1 ].index( '.' ) # check that rule is not "Handle" if rule[ 1 ][ - 1 ] ! = '.' : closureRes = \ findClosure(newState, rule[ 1 ][indexDot + 1 ]) for rule in closureRes: if rule not in addClosureRules \ and rule not in newState: addClosureRules.append(rule) # add closure result to newState for rule in addClosureRules: newState.append(rule) # find if newState already present # in Dictionary stateExists = - 1 for state_num in statesDict: if statesDict[state_num] = = newState: stateExists = state_num break # stateMap is a mapping of GOTO with # its output states if stateExists = = - 1 : # if newState is not in dictionary, # then create new state stateCount + = 1 statesDict[stateCount] = newState stateMap[(state, charNextToDot)] = stateCount else : # if state repetition found, # assign that previous state number stateMap[(state, charNextToDot)] = stateExists return def generateStates(statesDict): prev_len = - 1 called_GOTO_on = [] # run loop till new states are getting added while ( len (statesDict) ! = prev_len): prev_len = len (statesDict) keys = list (statesDict.keys()) # make compute_GOTO function call # on all states in dictionary for key in keys: if key not in called_GOTO_on: called_GOTO_on.append(key) compute_GOTO(key) return # calculation of first # epsilon is denoted by '#' (semi-colon) # pass rule in first function def first(rule): global rules, nonterm_userdef, \ term_userdef, diction, firsts # recursion base condition # (for terminal or epsilon) if len (rule) ! = 0 and (rule is not None ): if rule[ 0 ] in term_userdef: return rule[ 0 ] elif rule[ 0 ] = = '#' : return '#' # condition for Non-Terminals if len (rule) ! = 0 : if rule[ 0 ] in list (diction.keys()): # fres temporary list of result fres = [] rhs_rules = diction[rule[ 0 ]] # call first on each rule of RHS # fetched (& take union) for itr in rhs_rules: indivRes = first(itr) if type (indivRes) is list : for i in indivRes: fres.append(i) else : fres.append(indivRes) # if no epsilon in result # - received return fres if '#' not in fres: return fres else : # apply epsilon # rule => f(ABC)=f(A)-{e} U f(BC) newList = [] fres.remove( '#' ) if len (rule) > 1 : ansNew = first(rule[ 1 :]) if ansNew ! = None : if type (ansNew) is list : newList = fres + ansNew else : newList = fres + [ansNew] else : newList = fres return newList # if result is not already returned # - control reaches here # lastly if eplison still persists # - keep it in result of first fres.append( '#' ) return fres # calculation of follow def follow(nt): global start_symbol, rules, nonterm_userdef, \ term_userdef, diction, firsts, follows # for start symbol return $ (recursion base case) solset = set () if nt = = start_symbol: # return '$' solset.add( '$' ) # check all occurrences # solset - is result of computed 'follow' so far # For input, check in all rules for curNT in diction: rhs = diction[curNT] # go for all productions of NT for subrule in rhs: if nt in subrule: # call for all occurrences on # - non-terminal in subrule while nt in subrule: index_nt = subrule.index(nt) subrule = subrule[index_nt + 1 :] # empty condition - call follow on LHS if len (subrule) ! = 0 : # compute first if symbols on # - RHS of target Non-Terminal exists res = first(subrule) # if epsilon in result apply rule # - (A->aBX)- follow of - # - follow(B)=(first(X)-{ep}) U follow(A) if '#' in res: newList = [] res.remove( '#' ) ansNew = follow(curNT) if ansNew ! = None : if type (ansNew) is list : newList = res + ansNew else : newList = res + [ansNew] else : newList = res res = newList else : # when nothing in RHS, go circular # - and take follow of LHS # only if (NT in LHS)!=curNT if nt ! = curNT: res = follow(curNT) # add follow result in set form if res is not None : if type (res) is list : for g in res: solset.add(g) else : solset.add(res) return list (solset) def createParseTable(statesDict, stateMap, T, NT): global separatedRulesList, diction # create rows and cols rows = list (statesDict.keys()) cols = T + [ '$' ] + NT # create empty table Table = [] tempRow = [] for y in range ( len (cols)): tempRow.append('') for x in range ( len (rows)): Table.append(copy.deepcopy(tempRow)) # make shift and GOTO entries in table for entry in stateMap: state = entry[ 0 ] symbol = entry[ 1 ] # get index a = rows.index(state) b = cols.index(symbol) if symbol in NT: Table[a][b] = Table[a][b]\ + f "{stateMap[entry]} " elif symbol in T: Table[a][b] = Table[a][b]\ + f "S{stateMap[entry]} " # start REDUCE procedure # number the separated rules numbered = {} key_count = 0 for rule in separatedRulesList: tempRule = copy.deepcopy(rule) tempRule[ 1 ].remove( '.' ) numbered[key_count] = tempRule key_count + = 1 # start REDUCE procedure # format for follow computation addedR = f "{separatedRulesList[0][0]} -> " \ f "{separatedRulesList[0][1][1]}" rules.insert( 0 , addedR) for rule in rules: k = rule.split( "->" ) # remove un-necessary spaces k[ 0 ] = k[ 0 ].strip() k[ 1 ] = k[ 1 ].strip() rhs = k[ 1 ] multirhs = rhs.split( '|' ) # remove un-necessary spaces for i in range ( len (multirhs)): multirhs[i] = multirhs[i].strip() multirhs[i] = multirhs[i].split() diction[k[ 0 ]] = multirhs # find 'handle' items and calculate follow. for stateno in statesDict: for rule in statesDict[stateno]: if rule[ 1 ][ - 1 ] = = '.' : # match the item temp2 = copy.deepcopy(rule) temp2[ 1 ].remove( '.' ) for key in numbered: if numbered[key] = = temp2: # put Rn in those ACTION symbol columns, # who are in the follow of # LHS of current Item. follow_result = follow(rule[ 0 ]) for col in follow_result: index = cols.index(col) if key = = 0 : Table[stateno][index] = "Accept" else : Table[stateno][index] = \ Table[stateno][index] + f "R{key} " # printing table print ( "\nSLR(1) parsing table:\n" ) frmt = "{:>8}" * len (cols) print ( " " , frmt. format ( * cols), "\n" ) ptr = 0 j = 0 for y in Table: frmt1 = "{:>8}" * len (y) print (f "{{:>3}} {frmt1.format(*y)}" . format ( 'I' + str (j))) j + = 1 def printResult(rules): for rule in rules: print (f "{rule[0]} ->" f " {' '.join(rule[1])}" ) def printAllGOTO(diction): for itr in diction: print (f "GOTO ( I{itr[0]} ," f " {itr[1]} ) = I{stateMap[itr]}" ) # *** MAIN *** - Driver Code # uncomment any rules set to test code # follow given format to add - # user defined grammar rule set # rules section - *START* # example sample set 01 rules = [ "E -> E + T | T" , "T -> T * F | F" , "F -> ( E ) | id" ] nonterm_userdef = [ 'E' , 'T' , 'F' ] term_userdef = [ 'id' , '+' , '*' , '(' , ')' ] start_symbol = nonterm_userdef[ 0 ] # example sample set 02 # rules = ["S -> a X d | b Y d | a Y e | b X e", # "X -> c", # "Y -> c" # ] # nonterm_userdef = ['S','X','Y'] # term_userdef = ['a','b','c','d','e'] # start_symbol = nonterm_userdef[0] # rules section - *END* print ( "\nOriginal grammar input:\n" ) for y in rules: print (y) # print processed rules print ( "\nGrammar after Augmentation: \n" ) separatedRulesList = \ grammarAugmentation(rules, nonterm_userdef, start_symbol) printResult(separatedRulesList) # find closure start_symbol = separatedRulesList[ 0 ][ 0 ] print ( "\nCalculated closure: I0\n" ) I0 = findClosure( 0 , start_symbol) printResult(I0) # use statesDict to store the states # use stateMap to store GOTOs statesDict = {} stateMap = {} # add first state to statesDict # and maintain stateCount # - for newState generation statesDict[ 0 ] = I0 stateCount = 0 # computing states by GOTO generateStates(statesDict) # print goto states print ( "\nStates Generated: \n" ) for st in statesDict: print (f "State = I{st}" ) printResult(statesDict[st]) print () print ( "Result of GOTO computation:\n" ) printAllGOTO(stateMap) # "follow computation" for making REDUCE entries diction = {} # call createParseTable function createParseTable(statesDict, stateMap, term_userdef, nonterm_userdef) |
Output:
Original grammar input: E -> E + T | T T -> T * F | F F -> ( E ) | id Grammar after Augmentation: E' -> . E E -> . E + T E -> . T T -> . T * F T -> . F F -> . ( E ) F -> . id Calculated closure: I0 E' -> . E E -> . E + T E -> . T T -> . T * F T -> . F F -> . ( E ) F -> . id States Generated: State = I0 E' -> . E E -> . E + T E -> . T T -> . T * F T -> . F F -> . ( E ) F -> . id State = I1 E' -> E . E -> E . + T State = I2 E -> T . T -> T . * F State = I3 T -> F . State = I4 F -> ( . E ) E -> . E + T E -> . T T -> . T * F T -> . F F -> . ( E ) F -> . id State = I5 F -> id . State = I6 E -> E + . T T -> . T * F T -> . F F -> . ( E ) F -> . id State = I7 T -> T * . F F -> . ( E ) F -> . id State = I8 F -> ( E . ) E -> E . + T State = I9 E -> E + T . T -> T . * F State = I10 T -> T * F . State = I11 F -> ( E ) . Result of GOTO computation: GOTO ( I0 , E ) = I1 GOTO ( I0 , T ) = I2 GOTO ( I0 , F ) = I3 GOTO ( I0 , ( ) = I4 GOTO ( I0 , id ) = I5 GOTO ( I1 , + ) = I6 GOTO ( I2 , * ) = I7 GOTO ( I4 , E ) = I8 GOTO ( I4 , T ) = I2 GOTO ( I4 , F ) = I3 GOTO ( I4 , ( ) = I4 GOTO ( I4 , id ) = I5 GOTO ( I6 , T ) = I9 GOTO ( I6 , F ) = I3 GOTO ( I6 , ( ) = I4 GOTO ( I6 , id ) = I5 GOTO ( I7 , F ) = I10 GOTO ( I7 , ( ) = I4 GOTO ( I7 , id ) = I5 GOTO ( I8 , ) ) = I11 GOTO ( I8 , + ) = I6 GOTO ( I9 , * ) = I7 SLR(1) parsing table: id + * ( ) $ E T F I0 S5 S4 1 2 3 I1 S6 Accept I2 R2 S7 R2 R2 I3 R4 R4 R4 R4 I4 S5 S4 8 2 3 I5 R6 R6 R6 R6 I6 S5 S4 9 3 I7 S5 S4 10 I8 S6 S11 I9 R1 S7 R1 R1 I10 R3 R3 R3 R3 I11 R5 R5 R5 R5