Sunspot, on my Servers, Makes me Happy

After a week troubleshooting Solr and getting it running on Ubuntu for a test version of the Limspec app, I think I am allowed the bad humor.

I recently posted a summary of how I deployed a Rails application on a Debian VPS using Nginx.  The steps are fairly similar to what you would do with Ubuntu.  For the Limspec project I’m working on, we’re using Apache – mostly because that is what I started with, but there is no practical reason to not use Nginx.  The instructions in that post should work well, but you would need to install the mod_rails module for Apache and create the appropriate sites-available files.

The application I was deploying there is a fairly simple one that my Church’s folk dance director uses to manage the dance program. At this point it doesn’t use any search, let alone full text search, so those instructions left out an element that is fairly critical to some of my other applications: Solr for full text searching.

Solr is from the Apache Lucene project, and is a very powerful enterprise search platform. I had implemented it for Limspec quite some time ago. However, we had a VM meltdown a few months ago, and the replacement VM only seemed to have pieces and parts from the previous VM. This made getting Limspec deployed again a huge problem (on top of that, I no longer had root access, which is probably good, as I was forced to set things up in a more secure fashion). When all was said and done, however, I had forgotten to check on Solr. It was working fine on my dev machine, so all search-related tests passed with flying colors. Important safety tip with TDD: even if you test extensively on your dev machine, you need to be very aware of the things that are deployed quite differently in production. Solr is one of those. Although it appears that you might be able to use the sunspot_solr gem in production, the developers who created it indicate they only intend it for use in development. After hours spent trying to make it work, I tend to agree. I could never quite get it running, so I finally gave up.

So, not having taken notes on how I installed Solr the first time (well over a year ago), I set out to do it again.  Of course, there is a newer version of Solr, and a newer version of the sunspot_rails gem.  When I was rebuilding the actual Limspec server, I created an Ubuntu VM on my desktop to try everything on first.  So, I continued to use that VM to figure out Solr.  The following instructions are based on adding Solr to my local VM, which is running Precise Pangolin (Ubuntu 12.04 LTS).

Jetty

Solr is a Java servlet, and so needs a servlet container of some sort. Previously, I had used Tomcat. However, Tomcat is fairly memory intensive and is really only necessary for more complex Solr installs (such as multiple instances). Of course, if you are already using Tomcat for other purposes, it would probably make more sense to deploy Solr with Tomcat than to run yet another web server. If not, Solr comes complete with its own copy of Jetty.

From here on out, everything is fairly straightforward, but it took me a while to figure it all out.

The first step is to download the latest version of Solr, or at least the version you are interested in. I opted for 4.6, which is the most recent version as I write this. Once you’ve downloaded the tar file, you can untar it wherever you’d like, as you’ll be copying a subdirectory out to another location. There are a number of locations you can use as your Solr home. I recommend that you take a look ahead to the startup script from the Solr Wiki and choose one of the standard locations in the script. Remember that I’m looking to use Solr in support of another goal, so I want to minimize anything that makes my installation non-standard. Taking this approach also makes maintenance of the application, and installation by other users, much easier. I chose /opt/solr as my home, so I executed a mv solr-4.6.0/example /opt/solr.
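
For anyone who wants the unpack-and-move step in one place, here is a rough sketch as a small shell function. It assumes the tarball has already been downloaded and that it unpacks to a single solr-<version> directory containing example/, as 4.6.0 does. The function name install_solr is my own invention for illustration, and you would run it with sudo if the destination is under /opt.

```shell
# Sketch: unpack a Solr tarball and move its example/ directory to the Solr home.
# install_solr is a hypothetical helper name, not part of Solr.
# Usage: install_solr solr-4.6.0.tgz /opt/solr
install_solr() {
    tarball=$1
    dest=$2
    # Refuse to continue if the destination exists; mv would nest example/ inside it.
    if [ -e "$dest" ]; then
        echo "install_solr: $dest already exists" >&2
        return 1
    fi
    workdir=$(mktemp -d) || return 1
    tar -xzf "$tarball" -C "$workdir" || return 1
    # The archive is assumed to unpack to one top-level directory, e.g. solr-4.6.0/
    mv "$workdir"/solr-*/example "$dest" || return 1
    rm -rf "$workdir"
}
```

The check on the destination is there because mv behaves differently when the target directory already exists, which is an easy way to end up with /opt/solr/example by accident.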

The next thing to do is decide whether you are going to run single or multicore. I tend to have staging instances on the same server as production, so I want multicore. To deploy for multicore, within /opt/solr, delete the solr directory (i.e. rm -r /opt/solr/solr), then mv /opt/solr/multicore /opt/solr/solr. This gives you a multicore deploy. By default, you have two cores in place, core0 and core1. You can certainly stick with those names, but I wanted names that would tell me what those cores are being used for. If you want to change the names, first execute a mv core0 <newCoreName> from within /opt/solr/solr, then update the solr.xml file in that same directory to indicate the new names and paths. That is, change

<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
  <core name="core0" instanceDir="core0" />
  <core name="core1" instanceDir="core1" />
</cores>

to:

<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
  <core name="newCore" instanceDir="newCore" />
  <core name="newCore2" instanceDir="newCore2" />
</cores>

or whatever names you chose.  I tried to use names with spaces in them, and Jetty didn’t like that.  I’m not sure if it was because of the spaces, or the fact that the name wasn’t the same as the instanceDir, so I just made them both the same and the problem went away.
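
The multicore switch and the rename can be sketched as two small shell functions. The function names, and the core name limspec_production in the usage note, are hypothetical. The sed expression simply swaps every quoted occurrence of the old name in solr.xml, which assumes the name only appears in the name and instanceDir attributes, so check the file afterwards.

```shell
# enable_multicore SOLR_ROOT
# Swap the single-core solr/ directory for the bundled multicore example.
enable_multicore() {
    root=$1
    [ -d "$root/multicore" ] || return 1
    rm -r "$root/solr" && mv "$root/multicore" "$root/solr"
}

# rename_core SOLR_ROOT OLD_NAME NEW_NAME
# Rename a core directory and update solr.xml to match.
rename_core() {
    root=$1
    old=$2
    new=$3
    mv "$root/solr/$old" "$root/solr/$new" || return 1
    # Replace every quoted "OLD_NAME" in solr.xml, which covers both the
    # name and instanceDir attributes; a .bak copy of the original is kept.
    sed -i.bak "s/\"$old\"/\"$new\"/g" "$root/solr/solr.xml"
}
```

With Solr in /opt/solr, that would be something like enable_multicore /opt/solr followed by rename_core /opt/solr core0 limspec_production. Keeping the name and the instanceDir identical sidesteps the problem I ran into above.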

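If you want to test your installation at this point, you can fire up Jetty by running java -jar start.jar from within /opt/solr (Solr looks for its solr directory relative to the working directory unless told otherwise). Then go to http://yourserver:8983/solr, and you should see both of your cores in the Core Admin screen. Note that the port defaults to 8983 through the jetty.port property that solr.xml references; to change it, you can pass -Djetty.port=<port> when starting Jetty.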
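
If you would rather check from the command line than a browser, Solr's CoreAdmin STATUS action reports the loaded cores. Here is a small sketch that builds that URL; the helper name solr_status_url is made up, and the host and port defaults are assumptions that match the setup above.

```shell
# solr_status_url [HOST] [PORT]
# Print the CoreAdmin STATUS URL for a Solr instance (defaults: localhost, 8983).
solr_status_url() {
    printf 'http://%s:%s/solr/admin/cores?action=STATUS\n' "${1:-localhost}" "${2:-8983}"
}

# Example (needs a running Solr and curl installed):
#   curl -s "$(solr_status_url)"
```

The response is XML with one entry per core, so both of your core names should show up in it.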

Starting Automatically

The next thing you’ll want is for Solr to launch on startup in the background.  This proves to be easy to do.  First, download the jetty.sh script linked to from the SolrJetty page.  If you looked ahead and parked Solr in one of the standard locations, the script will work fine as is.  Place the script in the /etc/init.d directory and make it executable.

Next, follow the instructions for creating the /etc/default/jetty file, which holds the parameters Jetty needs on launch: the Jetty home, Java home, Jetty user, etc., as appropriate. If you opt to run it under a non-privileged user, such as solr (always a good idea), then follow the instructions on this page for creating the user and changing ownership of the solr directory. Also set the user name correctly in the Jetty configuration file (/etc/default/jetty). Finally, set the run levels. I just used the defaults (update-rc.d jetty.sh defaults). I should note that I preface every Linux command you see on this page with sudo, as I’m not operating as root. More than likely this will be your situation, or should be.
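
As a rough illustration of what ends up in /etc/default/jetty, here is a sketch that writes the file. The variable names are the ones the jetty.sh script reads. The values are assumptions based on my layout (Solr in /opt/solr, a solr user, logs under /opt/solr/logs), and the Java home in the usage note is just an example path, so substitute your own. The function name write_jetty_defaults is hypothetical.

```shell
# write_jetty_defaults FILE JAVA_HOME [SOLR_ROOT]
# Write the settings jetty.sh reads at startup. SOLR_ROOT defaults to /opt/solr.
write_jetty_defaults() {
    file=$1
    java_home=$2
    solr_root=${3:-/opt/solr}
    # \$JAVA_OPTIONS is escaped so the literal variable reference lands in the file.
    cat > "$file" <<EOF
JAVA_HOME=$java_home
JAVA_OPTIONS="-Dsolr.solr.home=$solr_root/solr \$JAVA_OPTIONS"
JETTY_HOME=$solr_root
JETTY_USER=solr
JETTY_LOGS=$solr_root/logs
EOF
}
```

For example, write_jetty_defaults /etc/default/jetty /usr/lib/jvm/default-java (with sudo). After that, chown the Solr directory to the solr user and run update-rc.d jetty.sh defaults as described above.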

I always prefer to bounce my server after making a lot of these changes, to make sure that everything will start as it should. So, I recommend doing that, then visiting the Solr admin page again to make sure everything is loaded.

Configuring for Rails

As I stated before, this is for my Rails application, so I need to do a few things to make that work. I’m assuming you’ve followed something like this to install sunspot_rails in your application. If not, then do that. Once completed, you will have a schema.xml file in your <rails_project>/solr/conf folder. This needs to be copied into the conf folder for each core you are going to be using with your Rails application (i.e., cp <rails_project>/solr/conf/schema.xml /opt/solr/solr/core0/conf/schema.xml). If you have an old schema.xml that predates Solr 4, as I did, it will be missing a key field definition that needs to be added back. Sunspot has been patched, so if you just installed it, you shouldn’t have a problem. If you get an error message about field _version_, then add this line in the fields definition section of schema.xml:

<field name="_version_" type="string" indexed="true" stored="true" multiValued="false" />
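
Copying the schema into several cores is easy to script. Below is a minimal sketch; install_schema is a hypothetical helper name, and the grep is only a crude text check for the _version_ field, not real XML validation.

```shell
# install_schema SCHEMA CORES_DIR CORE...
# Copy one schema.xml into the conf/ directory of each named core.
install_schema() {
    schema=$1
    cores_dir=$2
    shift 2
    # Warn (but carry on) if the schema has no _version_ field, which Solr 4 expects.
    if ! grep -q 'name="_version_"' "$schema"; then
        echo "warning: $schema has no _version_ field" >&2
    fi
    for core in "$@"; do
        cp "$schema" "$cores_dir/$core/conf/schema.xml" || return 1
    done
}
```

With the default core names that would be install_schema <rails_project>/solr/conf/schema.xml /opt/solr/solr core0 core1, again with sudo if your user doesn't own the Solr directory.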

Next, make sure the sunspot.yml file located in <rails_project>/config is accurate with regard to port and path. One thing that wasn’t obvious to me, and that can be a real problem, is that the path is relative to the Solr directory. That is, if your Solr directory is /opt/solr, and your core is /opt/solr/solr/core0, then the path in the YAML file should be /solr/core0. The leading / is important, as you will get an error otherwise.
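
To make the path rule concrete, here is a small sketch that prints what one environment's stanza in sunspot.yml would look like for a given core. The hostname, the default port, and the helper name print_sunspot_stanza are assumptions for illustration; compare the output with your own file rather than overwriting it.

```shell
# print_sunspot_stanza ENVIRONMENT CORE [PORT]
# Print a sunspot.yml stanza pointing one Rails environment at one Solr core.
print_sunspot_stanza() {
    cat <<EOF
$1:
  solr:
    hostname: localhost
    port: ${3:-8983}
    path: /solr/$2
EOF
}
```

Running print_sunspot_stanza production core0 shows the path as /solr/core0, leading slash included. With staging and production on the same server, each environment gets its own stanza pointing at its own core.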

Finally, you will want to run a rake sunspot:reindex from within your app directory. If you get one of those great rake errors about having the wrong rake running, do a bundle exec rake sunspot:reindex, and all should be well. I typically run a reindex on every deploy, just to make sure everything is good. Sunspot will only index new and modified database rows, so if you want pre-existing rows to be searchable, you need to reindex.

My next step is to run all of this on our production server.  I’ll post back an update on how that goes.

UPDATE:  Ran this on the production server, and all worked as it should (provided you follow the directions, which I didn’t at first, but that’s another story).