Recently we were working on migrating a Rails app to Go but, just before we started working, we realized that the app we were migrating used ransack to do some basic search. We looked for a library that could do the same thing in Go but, we didn't find any.
We decided to implement a library that could take a Ransack-like string and parse it into a SQL query, then we could use it to do a normal query to a database in our Go code. And that's why we created Gransak for that task.
Tokenizing
The firs step was to tokenize the string. The Ransack statements use the underscore ( _ ) as a separator for fields and operators, due to this, it makes sense to split the string by the underscore to get the tokens but keeping in mind that the field names can be snake case (e.g. user_name).
Once we get the tokens, we just had to iterate through them and separate operators from fields.
Organizing
A state machine
We started thinking on the best way to parse the Ransack string and a state machine seemed to be the way to go. Let's consider the string not_cont_any. A simple state machine could be like following:
switch state {
case init_state:
state = initAction(token) //returns the new state
case not_state:
state = notAction(token)
case not_cont:
state = notContAction(token)
...
}
As we see in the example, we switch between states depending on the token.
We started on init_state
which executes the initAction
method and, inside this method, there should be logic that returns the not_state
if the input is equal to "not" and so on with the other states.
Of course we can avoid the switch statement using the State design pattern but, the problem is that we could end up with a lot of functions that represent states scattered around the code. I think the state machine is better for situations like getting the content between the square brackets of this string "aaassss[aaxxx]dddd" where we change the state when we find the opening bracket, begin the extraction, and then we stop when we find the closing bracket but, in this case, we have a lot of posible states.
Using a Tree
We opted for a solution that could make the code more readable by changing our point of view so we could see the operators as some kind of trees e.g. "not" has the nodes "in", "eq", and "cont" and "cont" has the "any" node. With that tree, we can represent the "not_in", "not_eq", "not_cont" and "not_cont_any" operators.
We created a Node
struct, which hold the name of the token, the nodes it contains, a method that will be executed in case the node is a "leaf" and a flag that will indicate if the node is an operator. This flag is necessary in the case of the operators such "not_cont_any" that checks for the operator "not_cont", so we need a way to specify that even when "cont" has the "any" node, it is itself an operator and should use the same rule as if it were a leaf.
type Node struct {
Name string
Nodes []*Node
Apply OperatorFunction
IsOperator bool
}
Now we can represent the tree of "not_cont_any" like this:
&Node{
Name: "not"
Nodes: []*Node{
&Node{
Name: "cont",
Apply: func(re *GransakFilter) {
...
},
IsOperator: true,
Nodes: []*Node{
&Node{
Name: "any",
Apply: func(re *GransakFilter) {
...
},
},
},
},
},
},
We can see above that "not_cont" and "not_cont_any" are operators, so both have an Apply function. "not" is not an operator itself; is just the parent node.
Comparing
What comes to my mind when working with trees is recursion. While iterating over the tokens, we should call a recursive function that checks if the current token is an operator.
First, we need to find if the current token is part of an operator, specifically, the first part of the operator.
For example, if we find the token "not", we should check if the following tokens are "cont", "eq" or "in". If there is not a match, then probably is part of the name of a field.
func isCandidateToOperator(token string) (*Node, bool) {
for _, node := range Tree.Nodes {
if node.Name == token {
return node, true
}
}
return &Node{}, false
}
If we get a match, for example if the token is "not" we get the "not" node and pass it to the recursive function "find" which will check if some of the paths in the tree matches the following tokens.
The first thing our recursive function will do is to check if the current position given is not larger than the length of the total tokens. If it does, that means that there are no more tokens, so we must stop.
Then it is going to check if the passed token is equal to the name of the current node. If it does, then we check if the current node has child nodes and if it does, we call the recursive function for each one of them.
If a Node has no children nodes, that means it is a "leaf" node so we can return it. If it has child nodes but none of them matched and it is itself an operator, we return it.
func (this *GransakFilter) find(nodeParam *Node, pos int) (*Node, bool) {
//Checking if there are tokens in the position we want to read
if pos >= len(this.toEvaluate) {
return nil, false
}
//Getting next token
next := this.toEvaluate[pos]
//Checking if the current token matches the token name
if nodeParam.Name != next {
return nil, false
}
//check if the node has children
if len(nodeParam.Nodes) > 0 {
//call recursive function on each children checking the next token (pos+1)
for _, node := range nodeParam.Nodes {
if foundNode, found := this.find(node, pos+1); found {
return foundNode, true
}
}
//none of its children nodes matched, check if is itself an operator
if nodeParam.IsOperator == true {
this.pos = pos
return nodeParam, true
}
} else {
//it has no children so it is a leaf node
this.pos = pos
return nodeParam, true
}
return nil, false
}
Let's supose the Ransack string is "user_name_cont_any". The first time we call the "find" function, we pass the node and the current position of the current token, that is: token="cont"
and node=Node{Name:"cont"}
.
The function will detect that the "cont" node has child nodes, in this case the "any" node, so will call itself passing the "any" node and the current position plus 1 so we get the token in the left of the current token, that is: token="any"
and node=Node{Name:"any"}
.
The node name matches with the token and it has no children so we return the node and call its Apply
method.
What about the fields?
As we saw earlier, the find
method gets executed when we find a operator candidate but, what happens with the other tokens?.
Let's continue with the previous example string user_name_cont_any.
What happens while the parser hasn't found any operator candidate is that it puts the tokens in an array called "evaluated". Nothing happens with this array until an operator is found an the Apply function is executed. What happens here is that the elements in the "evaluated" array are joined by underscore ( If the field name has no underscore like "user-name" it will stay the same) and appended to a template followed by a placeholder e.g. "user_name {{.}}"
.
What the Apply function does
The functionality depends on the operator:
If an chain operator like "or" or "and" is found, then we append it to the template like this: "user_name {{.}} AND last_name {{.}}"
.
If a result operator like "eq", "cont", or "matches" is found, we replace the "{{.}}" placeholders with the corresponding symbol and append a "value holder" like this: "user\_name = {{v}} AND last_name = {{v}}"
.
Replacing the value holders
After all the nodes have been evaluated, the final step is to replace the value holder.
The value holders "{{v}}" are replaced by their corresponding place holder and that's it! now we should have a string like this: "user_name = ? AND last_name = ?"
, or "user_name = $1 AND last_name = $2"
in the case of PostgreSql.
Conclusion
For parsing a Ransack string, a State Machine should work, but one of the goals when constructing Gransak was to be able to easily add more operators if needed. The tree structure offered a better way to organize the code so it could be more readable and closer to our mental model.
You can take a look at the details in the code by checking the Gransak project: https://github.com/crowdint/gransak.
Thanks for reading!