Tuesday, November 19, 2024
Google search engine
HomeLanguagesCompiler Design SLR(1) Parser Using Python

Compiler Design SLR(1) Parser Using Python

Prerequisite: LR Parser, SLR Parser

SLR (1) grammar

SLR stands for Simple LR grammar. It is an example of a bottom-up parser. The “L” in SLR represents the scanning that advances from left to right and the “R” stands for constructions of derivation in reverse order, and the “(1)” represents the number of input symbols of lookahead.

Shift and Reduce Operations

The next important concept is SHIFT and REDUCE operations. During parsing, we can enter two types of states. First, we may have a state where ‘·’ is at the end of production. This state is called “Handle”. Here comes the role of REDUCE operation. Second, while parsing we may have a state where ‘·’ is pointing at some grammar symbol and there is still scope for advancement. In this scenario, a Shift operation is performed.

Use of Dot [ · ]

In LR parsing we use Dot ‘·’ as a character in the rule so that we know the progress in parsing at any given point. Note that ‘·’ is like an informer and, it should not be considered as a grammar symbol. In SLR (1), we have only one input symbol as lookahead, which is ultimately helping us determine the current state during parsing activity.

Parsing Decisions

For making the parsing decisions we need to construct the deterministic finite automaton. It helps us determine the current state to which we have arrived while we are parsing. When we design it for SLR parser it is referred to as LR automaton. The states of this automaton are the set of “Items” related to the production rules set.

Generating LR Collection

The first step is to create an augmented grammar. The augmentation process starts by bringing the start symbol on the right-hand side of a production rule. We cannot alter the existing rule so we add a new rule in the productions. Bringing the start symbol on RHS ensures that the parsing reaches the acceptance state. The REDUCE operation on this newly added rule determines the string acceptance.  

For example, 

IF we have ‘K’ as a start symbol

THEN L → K is added in productions

(where ‘L’ represents any non-preexisting symbol in the collection)

CLOSURE and GOTO Operations

In the process of construction of deterministic finite automaton, there are two requirements. The first is creating “States” and the second is developing “Transitions” in the automaton.

1) CLOSURE

Closure operation helps us to form the “States”. Before taking the closure operation all the rules must be separated. Then number all the rules. This will be helpful later for making the Shift and Reduce entries in the parsing table. Let I0 be the collection of rules obtained after grammar augmentation. This means we also have a newly added rule in collection I0.  

Assumption – (consider [L, K] are non-terminals and [m, t, p] are set of zero or more terminals or non-terminals)

DO REPEAT (Till-No-New-Rule-Gets-Added) {

IF (any production of the form “ L → m · K t ” exists) and (we have production K → · p)

THEN {add production K → · p to the Closure set if not preexisting}

}

2) GOTO

GOTO operation helps us to form the “Transitions”. In operation GOTO (I, X), ‘I’ can be elaborated as the state we are looking at and ‘X’ is the symbol pointed by Dot (·). So, GOTO takes in a state with items and a grammar symbol and produces the new or existing state as output.

The GOTO (I, X) represents the state transition from “I” on the input symbol “X”.

For any production rule “ L → m · K t ” in “I”

GOTO (I, X) outputs the closure of a set of all the productions “ L → m  K · t ”

Program approach

The input for the program is the list of rules having a set of items and lists determining terminal and non-terminal symbols. The control starts form grammarAugmentation() function, then we have to calculate I0 state, that is calculated by calling findClosure(), now for new states generation generateStates() function is called and finally createParseTable() function is called.

A) grammarAugmentation

  • In this function firstly we create a unique symbol and use it to create a new item to bring the start symbol on RHS. Then we format the items into a nested list and add a dot at the start of the item’s RHS. Also, we keep only one derivation in one item. Thus we have generated a list named separatedRulesList.

B) findClosure

  • This function runs differently for I0 state. For I0 state, we directly append to closureSet the newly created item from augmentation. In any other case, closureSet gets initialized with the received argument “input_state”.
  • Now continue iterations till we are receiving new items in closureSet. We follow the rules mentioned under “Item-set Closure” title above to look for the next items to be added in closureSet.

C) generateStates

  • In this function we are starting with GOTO computation. “statesDict” is a dictionary that stores all the states and is used as a global variable. We iterate over the statesDict till new states are getting added to it. We call the compute_GOTO() function on each added state only once.

D) compute_GOTO

  • Now control reaches compute_GOTO, this function doesn’t have actual goto logic, but it creates metadata to call the GOTO() function iteratively. For the given input state, we call GOTO(state,Xi), where Xi represents a symbol pointed by a dot. There can be multiple Xi, so isolating iteration procedure from actual GOTO() logic implementation reduces complexity.

E) GOTO

  • This function is called by compute_GOTO() function, we receive “input_state” and “charNextToDot”, we perform the shift operation and generate a new state.
  • Now for any item in a new state, dot can point to some symbol for which we may have another item, so to include all the items for the new state, we call findClosure() function that I discussed above.
  • To store this information of “GOTO(state, symbol) = newState”, the stateMap dictionary is created having key of type tuple and value of type integer. This will help in parsing table generation and printing output.

F) createParseTable

  • Firstly we create the table in its initial empty state. Columns will have ACTION (Terminals and $) and GOTO (Non-terminals). Rows will have numbered states (I0-In).
  • Using stateMap fill in the SHIFT and GOTO entries. To add reduce entries in LR parser, find the “handles” (item with the dot at the end) in the generated states, and place Rn (n is the item number in separatedRulesList), in Table[ stateNo ] [ Ai ], where “stateNo” is the state in which item belongs.
  • “Ai” is any symbol belonging to FOLLOW of  LHS symbol of the current item that we are traversing. REDUCE entry for the new augmented rule shows “Acceptance”. The parser may have RR (Reduce-Reduce) and SR  (Shift-Reduce) conflicts.

Python3




# SLR(1)
 
import copy
 
# perform grammar augmentation
def grammarAugmentation(rules, nonterm_userdef,
                        start_symbol):
   
    # newRules stores processed output rules
    newRules = []
 
    # create unique 'symbol' to
    # - represent new start symbol
    newChar = start_symbol + "'"
    while (newChar in nonterm_userdef):
        newChar += "'"
 
    # adding rule to bring start symbol to RHS
    newRules.append([newChar,
                     ['.', start_symbol]])
 
    # new format  => [LHS,[.RHS]],
    # can't use dictionary since
    # - duplicate keys can be there
    for rule in rules:
       
        # split LHS from RHS
        k = rule.split("->")
        lhs = k[0].strip()
        rhs = k[1].strip()
         
        # split all rule at '|'
        # keep single derivation in one rule
        multirhs = rhs.split('|')
        for rhs1 in multirhs:
            rhs1 = rhs1.strip().split()
             
            # ADD dot pointer at start of RHS
            rhs1.insert(0, '.')
            newRules.append([lhs, rhs1])
    return newRules
 
 
# find closure
def findClosure(input_state, dotSymbol):
    global start_symbol, \
        separatedRulesList, \
        statesDict
 
    # closureSet stores processed output
    closureSet = []
 
    # if findClosure is called for
    # - 1st time i.e. for I0,
    # then LHS is received in "dotSymbol",
    # add all rules starting with
    # - LHS symbol to closureSet
    if dotSymbol == start_symbol:
        for rule in separatedRulesList:
            if rule[0] == dotSymbol:
                closureSet.append(rule)
    else:
        # for any higher state than I0,
        # set initial state as
        # - received input_state
        closureSet = input_state
 
    # iterate till new states are
    # - getting added in closureSet
    prevLen = -1
    while prevLen != len(closureSet):
        prevLen = len(closureSet)
 
        # "tempClosureSet" - used to eliminate
        # concurrent modification error
        tempClosureSet = []
 
        # if dot pointing at new symbol,
        # add corresponding rules to tempClosure
        for rule in closureSet:
            indexOfDot = rule[1].index('.')
            if rule[1][-1] != '.':
                dotPointsHere = rule[1][indexOfDot + 1]
                for in_rule in separatedRulesList:
                    if dotPointsHere == in_rule[0] and \
                            in_rule not in tempClosureSet:
                        tempClosureSet.append(in_rule)
 
        # add new closure rules to closureSet
        for rule in tempClosureSet:
            if rule not in closureSet:
                closureSet.append(rule)
    return closureSet
 
 
def compute_GOTO(state):
    global statesDict, stateCount
 
    # find all symbols on which we need to
    # make function call - GOTO
    generateStatesFor = []
    for rule in statesDict[state]:
        # if rule is not "Handle"
        if rule[1][-1] != '.':
            indexOfDot = rule[1].index('.')
            dotPointsHere = rule[1][indexOfDot + 1]
            if dotPointsHere not in generateStatesFor:
                generateStatesFor.append(dotPointsHere)
 
    # call GOTO iteratively on all symbols pointed by dot
    if len(generateStatesFor) != 0:
        for symbol in generateStatesFor:
            GOTO(state, symbol)
    return
 
 
def GOTO(state, charNextToDot):
    global statesDict, stateCount, stateMap
 
    # newState - stores processed new state
    newState = []
    for rule in statesDict[state]:
        indexOfDot = rule[1].index('.')
        if rule[1][-1] != '.':
            if rule[1][indexOfDot + 1] == \
                    charNextToDot:
                # swapping element with dot,
                # to perform shift operation
                shiftedRule = copy.deepcopy(rule)
                shiftedRule[1][indexOfDot] = \
                    shiftedRule[1][indexOfDot + 1]
                shiftedRule[1][indexOfDot + 1] = '.'
                newState.append(shiftedRule)
 
    # add closure rules for newState
    # call findClosure function iteratively
    # - on all existing rules in newState
 
    # addClosureRules - is used to store
    # new rules temporarily,
    # to prevent concurrent modification error
    addClosureRules = []
    for rule in newState:
        indexDot = rule[1].index('.')
        # check that rule is not "Handle"
        if rule[1][-1] != '.':
            closureRes = \
                findClosure(newState, rule[1][indexDot + 1])
            for rule in closureRes:
                if rule not in addClosureRules \
                        and rule not in newState:
                    addClosureRules.append(rule)
 
    # add closure result to newState
    for rule in addClosureRules:
        newState.append(rule)
 
    # find if newState already present
    # in Dictionary
    stateExists = -1
    for state_num in statesDict:
        if statesDict[state_num] == newState:
            stateExists = state_num
            break
 
    # stateMap is a mapping of GOTO with
    # its output states
    if stateExists == -1:
       
        # if newState is not in dictionary,
        # then create new state
        stateCount += 1
        statesDict[stateCount] = newState
        stateMap[(state, charNextToDot)] = stateCount
    else:
       
        # if state repetition found,
        # assign that previous state number
        stateMap[(state, charNextToDot)] = stateExists
    return
 
 
def generateStates(statesDict):
    prev_len = -1
    called_GOTO_on = []
 
    # run loop till new states are getting added
    while (len(statesDict) != prev_len):
        prev_len = len(statesDict)
        keys = list(statesDict.keys())
 
        # make compute_GOTO function call
        # on all states in dictionary
        for key in keys:
            if key not in called_GOTO_on:
                called_GOTO_on.append(key)
                compute_GOTO(key)
    return
 
# calculation of first
# epsilon is denoted by '#' (semi-colon)
 
# pass rule in first function
def first(rule):
    global rules, nonterm_userdef, \
        term_userdef, diction, firsts
     
    # recursion base condition
    # (for terminal or epsilon)
    if len(rule) != 0 and (rule is not None):
        if rule[0] in term_userdef:
            return rule[0]
        elif rule[0] == '#':
            return '#'
 
    # condition for Non-Terminals
    if len(rule) != 0:
        if rule[0] in list(diction.keys()):
           
            # fres temporary list of result
            fres = []
            rhs_rules = diction[rule[0]]
             
            # call first on each rule of RHS
            # fetched (& take union)
            for itr in rhs_rules:
                indivRes = first(itr)
                if type(indivRes) is list:
                    for i in indivRes:
                        fres.append(i)
                else:
                    fres.append(indivRes)
 
            # if no epsilon in result
            # - received return fres
            if '#' not in fres:
                return fres
            else:
               
                # apply epsilon
                # rule => f(ABC)=f(A)-{e} U f(BC)
                newList = []
                fres.remove('#')
                if len(rule) > 1:
                    ansNew = first(rule[1:])
                    if ansNew != None:
                        if type(ansNew) is list:
                            newList = fres + ansNew
                        else:
                            newList = fres + [ansNew]
                    else:
                        newList = fres
                    return newList
                   
                # if result is not already returned
                # - control reaches here
                # lastly if eplison still persists
                # - keep it in result of first
                fres.append('#')
                return fres
 
 
# calculation of follow
def follow(nt):
    global start_symbol, rules, nonterm_userdef, \
        term_userdef, diction, firsts, follows
     
    # for start symbol return $ (recursion base case)
    solset = set()
    if nt == start_symbol:
        # return '$'
        solset.add('$')
 
    # check all occurrences
    # solset - is result of computed 'follow' so far
 
    # For input, check in all rules
    for curNT in diction:
        rhs = diction[curNT]
         
        # go for all productions of NT
        for subrule in rhs:
            if nt in subrule:
               
                # call for all occurrences on
                # - non-terminal in subrule
                while nt in subrule:
                    index_nt = subrule.index(nt)
                    subrule = subrule[index_nt + 1:]
                     
                    # empty condition - call follow on LHS
                    if len(subrule) != 0:
                       
                        # compute first if symbols on
                        # - RHS of target Non-Terminal exists
                        res = first(subrule)
                         
                        # if epsilon in result apply rule
                        # - (A->aBX)- follow of -
                        # - follow(B)=(first(X)-{ep}) U follow(A)
                        if '#' in res:
                            newList = []
                            res.remove('#')
                            ansNew = follow(curNT)
                            if ansNew != None:
                                if type(ansNew) is list:
                                    newList = res + ansNew
                                else:
                                    newList = res + [ansNew]
                            else:
                                newList = res
                            res = newList
                    else:
                       
                        # when nothing in RHS, go circular
                        # - and take follow of LHS
                        # only if (NT in LHS)!=curNT
                        if nt != curNT:
                            res = follow(curNT)
 
                    # add follow result in set form
                    if res is not None:
                        if type(res) is list:
                            for g in res:
                                solset.add(g)
                        else:
                            solset.add(res)
    return list(solset)
 
 
def createParseTable(statesDict, stateMap, T, NT):
    global separatedRulesList, diction
 
    # create rows and cols
    rows = list(statesDict.keys())
    cols = T+['$']+NT
 
    # create empty table
    Table = []
    tempRow = []
    for y in range(len(cols)):
        tempRow.append('')
    for x in range(len(rows)):
        Table.append(copy.deepcopy(tempRow))
 
    # make shift and GOTO entries in table
    for entry in stateMap:
        state = entry[0]
        symbol = entry[1]
        # get index
        a = rows.index(state)
        b = cols.index(symbol)
        if symbol in NT:
            Table[a][b] = Table[a][b]\
                + f"{stateMap[entry]} "
        elif symbol in T:
            Table[a][b] = Table[a][b]\
                + f"S{stateMap[entry]} "
 
    # start REDUCE procedure
 
    # number the separated rules
    numbered = {}
    key_count = 0
    for rule in separatedRulesList:
        tempRule = copy.deepcopy(rule)
        tempRule[1].remove('.')
        numbered[key_count] = tempRule
        key_count += 1
 
    # start REDUCE procedure
    # format for follow computation
    addedR = f"{separatedRulesList[0][0]} -> " \
        f"{separatedRulesList[0][1][1]}"
    rules.insert(0, addedR)
    for rule in rules:
        k = rule.split("->")
         
        # remove un-necessary spaces
        k[0] = k[0].strip()
        k[1] = k[1].strip()
        rhs = k[1]
        multirhs = rhs.split('|')
         
        # remove un-necessary spaces
        for i in range(len(multirhs)):
            multirhs[i] = multirhs[i].strip()
            multirhs[i] = multirhs[i].split()
        diction[k[0]] = multirhs
 
    # find 'handle' items and calculate follow.
    for stateno in statesDict:
        for rule in statesDict[stateno]:
            if rule[1][-1] == '.':
               
                # match the item
                temp2 = copy.deepcopy(rule)
                temp2[1].remove('.')
                for key in numbered:
                    if numbered[key] == temp2:
                       
                        # put Rn in those ACTION symbol columns,
                        # who are in the follow of
                        # LHS of current Item.
                        follow_result = follow(rule[0])
                        for col in follow_result:
                            index = cols.index(col)
                            if key == 0:
                                Table[stateno][index] = "Accept"
                            else:
                                Table[stateno][index] =\
                                    Table[stateno][index]+f"R{key} "
 
    # printing table
    print("\nSLR(1) parsing table:\n")
    frmt = "{:>8}" * len(cols)
    print("  ", frmt.format(*cols), "\n")
    ptr = 0
    j = 0
    for y in Table:
        frmt1 = "{:>8}" * len(y)
        print(f"{{:>3}} {frmt1.format(*y)}"
              .format('I'+str(j)))
        j += 1
 
def printResult(rules):
    for rule in rules:
        print(f"{rule[0]} ->"
              f" {' '.join(rule[1])}")
 
def printAllGOTO(diction):
    for itr in diction:
        print(f"GOTO ( I{itr[0]} ,"
              f" {itr[1]} ) = I{stateMap[itr]}")
 
# *** MAIN *** - Driver Code
 
# uncomment any rules set to test code
# follow given format to add -
# user defined grammar rule set
# rules section - *START*
 
# example sample set 01
rules = ["E -> E + T | T",
         "T -> T * F | F",
         "F -> ( E ) | id"
         ]
nonterm_userdef = ['E', 'T', 'F']
term_userdef = ['id', '+', '*', '(', ')']
start_symbol = nonterm_userdef[0]
 
# example sample set 02
# rules = ["S -> a X d | b Y d | a Y e | b X e",
#          "X -> c",
#          "Y -> c"
#          ]
# nonterm_userdef = ['S','X','Y']
# term_userdef = ['a','b','c','d','e']
# start_symbol = nonterm_userdef[0]
 
# rules section - *END*
print("\nOriginal grammar input:\n")
for y in rules:
    print(y)
 
# print processed rules
print("\nGrammar after Augmentation: \n")
separatedRulesList = \
    grammarAugmentation(rules,
                        nonterm_userdef,
                        start_symbol)
printResult(separatedRulesList)
 
# find closure
start_symbol = separatedRulesList[0][0]
print("\nCalculated closure: I0\n")
I0 = findClosure(0, start_symbol)
printResult(I0)
 
# use statesDict to store the states
# use stateMap to store GOTOs
statesDict = {}
stateMap = {}
 
# add first state to statesDict
# and maintain stateCount
# - for newState generation
statesDict[0] = I0
stateCount = 0
 
# computing states by GOTO
generateStates(statesDict)
 
# print goto states
print("\nStates Generated: \n")
for st in statesDict:
    print(f"State = I{st}")
    printResult(statesDict[st])
    print()
 
print("Result of GOTO computation:\n")
printAllGOTO(stateMap)
 
# "follow computation" for making REDUCE entries
diction = {}
 
# call createParseTable function
createParseTable(statesDict, stateMap,
                 term_userdef,
                 nonterm_userdef)


Output:

Original grammar input:

E -> E + T | T
T -> T * F | F
F -> ( E ) | id

Grammar after Augmentation: 

E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

Calculated closure: I0

E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

States Generated: 

State = I0
E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

State = I1
E' -> E .
E -> E . + T

State = I2
E -> T .
T -> T . * F

State = I3
T -> F .

State = I4
F -> ( . E )
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

State = I5
F -> id .

State = I6
E -> E + . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

State = I7
T -> T * . F
F -> . ( E )
F -> . id

State = I8
F -> ( E . )
E -> E . + T

State = I9
E -> E + T .
T -> T . * F

State = I10
T -> T * F .

State = I11
F -> ( E ) .

Result of GOTO computation:

GOTO ( I0 , E ) = I1
GOTO ( I0 , T ) = I2
GOTO ( I0 , F ) = I3
GOTO ( I0 , ( ) = I4
GOTO ( I0 , id ) = I5
GOTO ( I1 , + ) = I6
GOTO ( I2 , * ) = I7
GOTO ( I4 , E ) = I8
GOTO ( I4 , T ) = I2
GOTO ( I4 , F ) = I3
GOTO ( I4 , ( ) = I4
GOTO ( I4 , id ) = I5
GOTO ( I6 , T ) = I9
GOTO ( I6 , F ) = I3
GOTO ( I6 , ( ) = I4
GOTO ( I6 , id ) = I5
GOTO ( I7 , F ) = I10
GOTO ( I7 , ( ) = I4
GOTO ( I7 , id ) = I5
GOTO ( I8 , ) ) = I11
GOTO ( I8 , + ) = I6
GOTO ( I9 , * ) = I7

SLR(1) parsing table:

         id       +       *       (       )       $       E       T       F 

 I0      S5                      S4                       1       2       3 
 I1              S6                           Accept                        
 I2              R2      S7              R2      R2                         
 I3              R4      R4              R4      R4                         
 I4      S5                      S4                       8       2       3 
 I5              R6      R6              R6      R6                         
 I6      S5                      S4                               9       3 
 I7      S5                      S4                                      10 
 I8              S6                     S11                                 
 I9              R1      S7              R1      R1                         
I10              R3      R3              R3      R3                         
I11              R5      R5              R5      R5    

RELATED ARTICLES

Most Popular

Recent Comments