Go

Parsing Ransack strings to SQL queries in Golang


Recently we were working on migrating a Rails app to Go but, just before we started working, we realized that the app we were migrating used ransack to do some basic search. We looked for a library that could do the same thing in Go but, we didn't find any.

We decided to implement a library that could take a Ransack-like string and parse it into a SQL query, then we could use it to do a normal query to a database in our Go code. And that's why we created Gransak for that task.

Tokenizing

The firs step was to tokenize the string. The Ransack statements use the underscore ( _ ) as a separator for fields and operators, due to this, it makes sense to split the string by the underscore to get the tokens but keeping in mind that the field names can be snake case (e.g. user_name).

Once we get the tokens, we just had to iterate through them and separate operators from fields.

Organizing

A state machine

We started thinking on the best way to parse the Ransack string and a state machine seemed to be the way to go. Let's consider the string not_cont_any. A simple state machine could be like following:

switch state {
    case init_state:
        state = initAction(token) //returns the new state
    case not_state: 
        state = notAction(token) 
    case not_cont:
        state = notContAction(token)
    ...
}

As we see in the example, we switch between states depending on the token.

We started on init_state which executes the initAction method and, inside this method, there should be logic that returns the not_state if the input is equal to "not" and so on with the other states.

Of course we can avoid the switch statement using the State design pattern but, the problem is that we could end up with a lot of functions that represent states scattered around the code. I think the state machine is better for situations like getting the content between the square brackets of this string "aaassss[aaxxx]dddd" where we change the state when we find the opening bracket, begin the extraction, and then we stop when we find the closing bracket but, in this case, we have a lot of posible states.

Using a Tree

We opted for a solution that could make the code more readable by changing our point of view so we could see the operators as some kind of trees e.g. "not" has the nodes "in", "eq", and "cont" and "cont" has the "any" node. With that tree, we can represent the "not_in", "not_eq", "not_cont" and "not_cont_any" operators.

image alt

We created a Node struct, which hold the name of the token, the nodes it contains, a method that will be executed in case the node is a "leaf" and a flag that will indicate if the node is an operator. This flag is necessary in the case of the operators such "not_cont_any" that checks for the operator "not_cont", so we need a way to specify that even when "cont" has the "any" node, it is itself an operator and should use the same rule as if it were a leaf.

type Node struct {
    Name       string
    Nodes      []*Node
    Apply      OperatorFunction
    IsOperator bool
}

Now we can represent the tree of "not_cont_any" like this:

&Node{
    Name: "not"
    Nodes: []*Node{
        &Node{
        Name: "cont",
        Apply: func(re *GransakFilter) {
            ...
        },
        IsOperator: true,
        Nodes: []*Node{
            &Node{
                Name: "any",
                Apply: func(re *GransakFilter) {
                    ...
                },
            },
        },
    },
     },
  },

We can see above that "not_cont" and "not_cont_any" are operators, so both have an Apply function. "not" is not an operator itself; is just the parent node.

Comparing

What comes to my mind when working with trees is recursion. While iterating over the tokens, we should call a recursive function that checks if the current token is an operator.

First, we need to find if the current token is part of an operator, specifically, the first part of the operator.
For example, if we find the token "not", we should check if the following tokens are "cont", "eq" or "in". If there is not a match, then probably is part of the name of a field.

func isCandidateToOperator(token string) (*Node, bool) {
    for _, node := range Tree.Nodes {
        if node.Name == token {
            return node, true
        }
    }
    return &Node{}, false
}

If we get a match, for example if the token is "not" we get the "not" node and pass it to the recursive function "find" which will check if some of the paths in the tree matches the following tokens.

The first thing our recursive function will do is to check if the current position given is not larger than the length of the total tokens. If it does, that means that there are no more tokens, so we must stop.

Then it is going to check if the passed token is equal to the name of the current node. If it does, then we check if the current node has child nodes and if it does, we call the recursive function for each one of them.

If a Node has no children nodes, that means it is a "leaf" node so we can return it. If it has child nodes but none of them matched and it is itself an operator, we return it.

func (this *GransakFilter) find(nodeParam *Node, pos int) (*Node, bool) {

    //Checking if there are tokens in the position we want to read   
    if pos >= len(this.toEvaluate) {
        return nil, false
    }

    //Getting next token
    next := this.toEvaluate[pos]

    //Checking if the current token matches the token name
    if nodeParam.Name != next {
        return nil, false
    }

    //check if the node has children
    if len(nodeParam.Nodes) > 0 {
            //call recursive function on each children checking the next token (pos+1)
        for _, node := range nodeParam.Nodes {
            if foundNode, found := this.find(node, pos+1); found {

                return foundNode, true

            }
        }

        //none of its children nodes matched, check if is itself an operator
        if nodeParam.IsOperator == true {
            this.pos = pos
            return nodeParam, true
        }

    } else {
            //it has no children so it is a leaf node
        this.pos = pos
        return nodeParam, true
    }

    return nil, false
}

Let's supose the Ransack string is "user_name_cont_any". The first time we call the "find" function, we pass the node and the current position of the current token, that is: token="cont" and node=Node{Name:"cont"}.

The function will detect that the "cont" node has child nodes, in this case the "any" node, so will call itself passing the "any" node and the current position plus 1 so we get the token in the left of the current token, that is: token="any" and node=Node{Name:"any"}.

The node name matches with the token and it has no children so we return the node and call its Apply method.

What about the fields?

As we saw earlier, the find method gets executed when we find a operator candidate but, what happens with the other tokens?.

Let's continue with the previous example string user_name_cont_any.

What happens while the parser hasn't found any operator candidate is that it puts the tokens in an array called "evaluated". Nothing happens with this array until an operator is found an the Apply function is executed. What happens here is that the elements in the "evaluated" array are joined by underscore ( If the field name has no underscore like "user-name" it will stay the same) and appended to a template followed by a placeholder e.g. "user_name {{.}}".

What the Apply function does

The functionality depends on the operator:

If an chain operator like "or" or "and" is found, then we append it to the template like this: "user_name {{.}} AND last_name {{.}}".

If a result operator like "eq", "cont", or "matches" is found, we replace the "{{.}}" placeholders with the corresponding symbol and append a "value holder" like this: "user\_name = {{v}} AND last_name = {{v}}".

Replacing the value holders

After all the nodes have been evaluated, the final step is to replace the value holder.

The value holders "{{v}}" are replaced by their corresponding place holder and that's it! now we should have a string like this: "user_name = ? AND last_name = ?", or "user_name = $1 AND last_name = $2" in the case of PostgreSql.

Conclusion

For parsing a Ransack string, a State Machine should work, but one of the goals when constructing Gransak was to be able to easily add more operators if needed. The tree structure offered a better way to organize the code so it could be more readable and closer to our mental model.

You can take a look at the details in the code by checking the Gransak project: https://github.com/crowdint/gransak.

Thanks for reading!