Tips and Tricks in Sunspot
Lately I've been working a lot with Solr and the awesome Sunspot gem (sunspot_rails), and every now and then I've needed to do something unusual with it: deleting a document from the index by hand, sorting by date with the undated records first, handling different statuses, and so on.
So here are a few tips and tricks for that kind of task.
Removing an object from Solr index
Have you ever needed to delete a document from the index to reproduce a bug? This is how you can do it:
Sunspot.remove_by_id(ObjectType, id)
Cool, isn't it? Another way is to remove every document of a type that matches a query:
Sunspot.remove(ObjectType) do
  with(:field, 'value')
end
# Remove all posts older than one week
Sunspot.remove(Post) { with(:created_at).less_than(7.days.ago) }
Reading Solr Config
Inspect the current configuration:
puts Sunspot.config.inspect
Change the default of 30 results per page (for example, in an initializer):
Sunspot.config.pagination.default_per_page = 100
Conditional indexing
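You can make indexing conditional with the :if or :unless options:
class Product < ActiveRecord::Base
  # Index the record only when should_reindex? returns true
  searchable(if: proc { |model| model.should_reindex? }) do
    # field definitions go here
  end
end
You can also skip reindexing when only specific columns have changed:
class Product < ActiveRecord::Base
  # Changes limited to these columns do not trigger a reindex
  searchable(ignore_attribute_changes_of: [:updated_at, :internal_state]) do
    # field definitions go here
  end
end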
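To make the effect of ignore_attribute_changes_of concrete, here is a minimal plain-Ruby sketch of the rule it implies: a save needs a reindex only when at least one changed attribute is outside the ignored list. The helper name reindex_needed? is hypothetical, and this is not Sunspot's own code, only the rule as described above.

```ruby
# Hypothetical helper that models the rule behind ignore_attribute_changes_of.
# It is not part of Sunspot; it only answers "did anything relevant change?".
def reindex_needed?(changed_attributes, ignored_attributes)
  relevant = changed_attributes.map(&:to_sym) - ignored_attributes.map(&:to_sym)
  !relevant.empty?
end

ignored = [:updated_at, :internal_state]

# Only an ignored column changed, so no reindex is needed.
puts reindex_needed?(%w[updated_at], ignored)

# :name changed as well, so the document must be reindexed.
puts reindex_needed?(%w[updated_at name], ignored)
```

The first call prints false and the second prints true.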
Eager loading
You can pass ActiveRecord includes to eager load the data needed to reindex documents:
class Product < ActiveRecord::Base
  searchable(includes: [:variants]) do
    string :name, stored: true
    text :description, stored: true
    string(:sku, stored: true, multiple: true) { variants.map(&:sku) }
  end
end
Case Insensitive Ordering
Case-insensitive ordering is hard to support directly. If you need it, the easiest way is to index a virtual field that holds the value in lower case and sort on that field:
class Product < ActiveRecord::Base
  searchable(includes: [:variants]) do
    string(:name, stored: true)
    # Lowercased copy of the name, used only for sorting
    string(:name_order) { name.to_s.downcase }
  end
end
Product.search { order_by(:name_order, :asc) }
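The reason for the extra field can be seen without Solr at all. The sketch below is plain Ruby and the method names are hypothetical. It assumes the sort field compares raw character values, which is how Ruby compares strings by default, so uppercase letters come before lowercase ones unless you sort on a lowercased copy.

```ruby
# Default string comparison is case-sensitive: 'C' sorts before 'a'.
def case_sensitive_order(names)
  names.sort
end

# Sorting on a lowercased copy mirrors what the :name_order field does.
def case_insensitive_order(names)
  names.sort_by { |name| name.to_s.downcase }
end

names = ['banana', 'Cherry', 'apple']
p case_sensitive_order(names)    # "Cherry" comes first
p case_insensitive_order(names)  # "apple", "banana", "Cherry"
```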
Ordering by nil values first then ordered results
By default, Solr sorts the documents that have a value before the ones that have none. If you want the empty ones first, you need a workaround: index a second field that falls back to a very old date when there is no value, and sort on that field:
class Product < ActiveRecord::Base
  searchable(includes: [:variants]) do
    date(:last_purchased_at, stored: true) { orders.last.try(:created_at) }
    # Products without orders fall back to a date near the epoch, so they sort first
    date(:last_purchased_at_order, stored: true) do
      orders.last.try(:created_at) || Time.zone.at(1)
    end
  end
end
Product.search { order_by(:last_purchased_at_order, :asc) }
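The fallback date works as a sentinel value. Here is a small plain-Ruby sketch of the same idea, with hypothetical method names and hashes standing in for products: every missing date is replaced by a time close to the epoch, so an ascending sort puts those records first.

```ruby
# Hypothetical sentinel: one second after the Unix epoch, like Time.zone.at(1).
def sentinel_time
  Time.at(1)
end

# Sort key: the real date when present, the sentinel otherwise.
def order_key(last_purchased_at)
  last_purchased_at || sentinel_time
end

# Ascending sort, so records with no date (sentinel key) come first.
def never_purchased_first(products)
  products.sort_by { |product| order_key(product[:last_purchased_at]) }
end

products = [
  { name: 'kettle',  last_purchased_at: Time.at(1_700_000_000) },
  { name: 'toaster', last_purchased_at: nil },
  { name: 'blender', last_purchased_at: Time.at(1_600_000_000) }
]
p never_purchased_first(products).map { |product| product[:name] }
```

This prints toaster first, then blender, then kettle.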
Reindexing records
Do you want to reindex all documents?
By Model
WARNING: This removes all of the model's documents from the index and indexes them again from scratch!
Model.reindex
All Documents
RAILS_ENV=environment bundle exec rake sunspot:solr:reindex
Avoiding index destruction in production
You can prevent sunspot:solr:reindex from running in production (I ran it there once and the site went without products for more than 40 minutes):
namespace :sunspot do
  desc 'Prevents the Solr index from being deleted in the production environment'
  task :check_environment do
    if Rails.env.production?
      fail "Can't use this task in production mode, as it's destructive to the Solr index."
    end
  end

  # Declaring the task again adds check_environment as a prerequisite
  task :reindex => [:check_environment]
end
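The guard relies on a Rake feature: declaring an existing task a second time does not replace it, it adds the new prerequisites, and prerequisites run before the task's own action. The following standalone sketch shows that behaviour outside Rails. The demo namespace, the ReindexGuardDemo module, and the env_name attribute (a stand-in for Rails.env) are all hypothetical names used only for illustration.

```ruby
require 'rake'

module ReindexGuardDemo
  extend Rake::DSL

  class << self
    attr_accessor :env_name # hypothetical stand-in for Rails.env
    attr_reader :log        # records the order in which tasks ran
  end
  @env_name = 'development'
  @log = []

  # Raises when the given environment name is production.
  def self.ensure_safe_env!(name)
    raise "Can't reindex in #{name}" if name == 'production'
  end

  namespace :demo do
    # Stand-in for the destructive task.
    task :reindex do
      ReindexGuardDemo.log << :reindex
    end

    task :check_environment do
      ReindexGuardDemo.ensure_safe_env!(ReindexGuardDemo.env_name)
      ReindexGuardDemo.log << :checked
    end

    # Second declaration: adds a prerequisite to the existing task.
    task :reindex => [:check_environment]
  end
end

Rake::Task['demo:reindex'].invoke
p ReindexGuardDemo.log
```

The log shows the check running before the reindex action. With env_name set to 'production', the check raises and the reindex action never runs.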
Incremental reindexing
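Sometimes you need to reindex the full catalog, but you want to do it incrementally. In that case you can use something like this:
namespace :sunspot do
  desc 'Reindexes MODEL_NAME in batches without clearing the index first'
  task :incremental_reindex => :environment do
    model_name = ENV['MODEL_NAME']
    raise 'MODEL_NAME is required' if model_name.blank?

    # constantize raises NameError if the model does not exist
    model_name.constantize.find_in_batches do |records|
      Sunspot.index(records)
    end
    # Make the indexed batches visible to searches
    Sunspot.commit
  end
end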
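The batching itself can be sketched in plain Ruby. In the example below, FakeIndexer is a hypothetical stand-in for Sunspot that only records batch sizes, and each_slice plays the role of find_in_batches, which yields records in groups of 1000 by default. It is an illustration of the pattern, not a replacement for the rake task.

```ruby
# Hypothetical stand-in for Sunspot.index: it only records each batch size.
class FakeIndexer
  attr_reader :batch_sizes

  def initialize
    @batch_sizes = []
  end

  def index(records)
    @batch_sizes << records.size
  end
end

# Sends the records to the indexer in fixed-size slices instead of all at once,
# so memory use stays bounded by the batch size.
def incremental_reindex(records, indexer, batch_size: 1000)
  records.each_slice(batch_size) { |batch| indexer.index(batch) }
  indexer
end

indexer = incremental_reindex((1..2500).to_a, FakeIndexer.new, batch_size: 1000)
p indexer.batch_sizes
```

For 2500 records and a batch size of 1000, the indexer receives three batches of 1000, 1000, and 500 records.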
Searching by hashtags?
It's really hard to make Sunspot search by hashtags. Instead of running a full-text search on the normal text fields, you need a workaround like this:
class Product < ActiveRecord::Base
  searchable(includes: [:variants]) do
    text(:description, stored: true)
    # Holds only the hashtags found in the description
    text(:description_tags, stored: true) do
      description.to_s.scan(/#\w+/).join(' ')
    end
  end

  # Class method, so it can be called as Product.filter(...)
  def self.filter(query)
    search do
      fulltext(query, fields: query.start_with?('#') ? :description_tags : :description)
    end
  end
end
Product.filter('#tag1') # returns all documents that contain '#tag1'
Product.filter('tag1')  # returns all documents that contain 'tag1' or '#tag1'
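The two moving parts of this workaround are easy to check in isolation. The sketch below is plain Ruby with hypothetical helper names: one method extracts the hashtags that would be stored in description_tags, the other picks the field to search from the shape of the query. The pattern used here requires at least one word character after the #, so a lone # is not captured.

```ruby
# Collects the hashtags of a description into one space-separated string.
def extract_hashtags(description)
  description.to_s.scan(/#\w+/).join(' ')
end

# Queries starting with '#' go to the tags field, everything else to the text.
def search_field_for(query)
  query.start_with?('#') ? :description_tags : :description
end

puts extract_hashtags('Fresh #summer stock, now on #sale!')
p search_field_for('#sale')
p search_field_for('sale')
```

The first line prints "#summer #sale", and the two lookups return :description_tags and :description.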
So that's it, thank you for reading!