Development

Sanitize your data from your Rails + Postgres application


Hello all, this time I would like to show you how to sanitize your
database using a very simple Ruby script.

This example has a very specific goal. You will find this
useful if you are using Postgres as your database engine and Rails as your
backend platform.

Rails is vastly used with Postgres as a database engine. So I think this
example could come in handy for a great number of developers.

The problem

So, what do I mean with sanitize? Well, sometimes you have records in the
database that contain characters that you don't want to have. Just imagine that you
forgot to add a validation before shipping your project to production
and now you have real information but with undesired characters.

Another scenario could be when you just changed your business logic, and
now you need to restrict certain characters from existing in your
records.

We are going to use Regular Expressions to find all of your records that contain
forbidden characters.

The scenarios

The first scenario is when you have to look for
text columns which may contain the following characters: \n \r \t.

The second scenario, a little bit more difficult to reproduce, but that can happen (believe me!), is when you have text columns with a
special control character. The control characters are those between
1 and 31 on the ASCII chart. For instance, char 27 is the ESC symbol, so on and so forth.

The solution.

Here is the script for the first scenario. Just adapt it
to your needs:

objects = YourModel.where("your_column ~ E'[\\n\\r\\t]+'")

objects.each do |object|
  object.update_attribute :your_column, object.your_column.squish
end

And for the second scenario, the only thing that changes is the
query, the each loop is the same:

objects = YourModel.where("your_column  ~ ('['||chr(1)||'-'||chr(31)||']')")

I want to emphasize that you could run these scripts within the Rails console or creating a migration and then run it with

rake db:migrate

It is up to you to decide which way suits your needs best. You could also personalize the each loop by adding puts and displaying which record has been updated. This could be helpful to keep an inventory of the records that were modified.

I hope you find this helpful and see you next time!

Thanks for reading.

Best Practices
De Código, Café y Cervezas 07 – ¿Somos profesionales?
Android
How to build Android apps and not crash in the attempt (Part I)
Beginner
Administrate review