Installing Redmine 3 on an Existing Ubuntu LAMP server

I will be upgrading some Ubuntu 12 servers in the near future and will be deploying Redmine 3 (upgrading from 2.2).  To prepare for that, and because I needed a Redmine instance for a project I will be undertaking (a store management app to integrate with Square), I decided to do a full deploy of Redmine to an Ubuntu 14 server I had set up for a course I was teaching at UCSD.  It had Apache, MySQL, and PHP installed to support a BPM solution, so I would need to add Rails support to the server and get Redmine running.  Although this wouldn’t be fully parallel to what I will be doing in a couple of weeks, it would serve as a refresher and alert me to things I need to be aware of.  I was mostly able to follow the Redmine install documents, but there were a few exceptions I wanted to document, as well as some additional pieces the Redmine docs don’t really address.

The first step is to get Passenger running.  The instructions are simple and straightforward, and this installation worked just fine.  Once Passenger is installed, you need to install Redmine.  The first question is about Ruby in your environment.  I attempted to use rbenv, as it seems to be growing in popularity.  After hitting a number of snags, I opted to uninstall rbenv and switch back to RVM, which I’ve had success with.  I suspect that had I kept at it a bit longer, I could have ironed out the rbenv issues, but I decided to move on.  What happened was that I first installed it in my home account, then realized this would be problematic and located instructions for doing a multiuser install, since I want to run my app under a service account, not as my user.  I should point out that I did all of this work with a “standard” user account that was also in the sudoers list.  I believe that I will be able to effectively manage upgrades over time with this approach.

I followed the multiuser RVM installation instructions located here.  Again, this is straightforward; just follow the instructions and make sure you are working from the multiuser section.  After completing the install, create a redmine user to act as a service account.  The account can be locked and will work just fine, so people cannot attempt to log on to your system if they guess the service account name.  Once RVM is installed, you can install whatever Ruby (or Rubies) you wish.  I installed 2.2.3, then made it the system default (rvm use 2.2.3 --default).  I gave the redmine user ownership of the files, tmp, and log directories recursively, as that user needs to write to them.
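For reference, the commands I ran looked roughly like this. This is a sketch from memory: check rvm.io for the current signed installer invocation, and note that the application path is an assumption based on the layout described later in this post.

```shell
# Multiuser RVM install (verify the current command on rvm.io)
curl -sSL https://get.rvm.io | sudo bash -s stable

# Create a locked service account for Redmine
sudo adduser --disabled-login --gecos "Redmine service account" redmine
sudo usermod -aG rvm redmine

# Install a Ruby and make it the system default
rvm install 2.2.3
rvm use 2.2.3 --default

# Give the service account the directories it needs to write to
# (assumes the app lives at /var/www/redmine)
sudo chown -R redmine:redmine /var/www/redmine/files /var/www/redmine/tmp /var/www/redmine/log
```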

Once that is complete, create a database for Redmine and a user to go with it.  As a matter of safety, I create a specific user to access the database and limit connections to localhost only for that user.  If you have phpMyAdmin running, this is a simple enough task.  In general, you shouldn’t be directly connecting to MySQL from another machine (the defaults enforce this).  If such a thing is necessary, you should allow it on an IP-by-IP basis, so that only a specific address is permitted.  I have an implementation at a client where an integration between MySQL and another tool requires a direct external read-only connection.  I created a read-only user and allow it to authenticate only from that one IP address.
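A sketch of that database setup, run from a mysql prompt; the database name, user name, and password are placeholders to replace with your own:

```sql
CREATE DATABASE redmine CHARACTER SET utf8;
CREATE USER 'redmine'@'localhost' IDENTIFIED BY 'choose_a_real_password';
GRANT ALL PRIVILEGES ON redmine.* TO 'redmine'@'localhost';
FLUSH PRIVILEGES;
```

Because the user is defined as 'redmine'@'localhost' rather than 'redmine'@'%', connections are only accepted from the local machine.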

Install both SVN and ImageMagick.  The OpenID libraries are only needed if you want OpenID.  ImageMagick is optional as well, but by default Redmine uses RMagick to handle images, and RMagick will not install without ImageMagick.  You’ll want images, so just go for it.  One caveat: the RMagick gem will not build if you install only ImageMagick itself (sudo apt-get install imagemagick).  You will also need to install a dev library by using sudo apt-get install libmagickwand-dev.

At this point, you will want to install the MySQL libraries and drivers so that the MySQL2 gem will install correctly.  Run the following.

sudo apt-get install libmysql-ruby libmysqlclient-dev

At this point, download Redmine.  I used SVN and downloaded it to a directory in /usr/local, then created a symlink to the directory in /var/www.  From here, configure the database.yml file as directed in the installation directions, install bundler (gem install bundler), then install the required gems using bundle install --without development test (you can exclude rmagick at this point as well, if you aren’t going to use it).
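Those steps, roughly as commands. The SVN branch path is an assumption on my part, so check redmine.org for the branch you actually want:

```shell
cd /usr/local
sudo svn co https://svn.redmine.org/redmine/branches/3.0-stable redmine
sudo ln -s /usr/local/redmine /var/www/redmine

cd /var/www/redmine
# edit config/database.yml to match the database and user created above, then:
gem install bundler
bundle install --without development test
```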

Next, create the application secret by executing rake secret from within the Redmine directory (this applies to any Rails app).  This prints a secret key, which you then capture in an environment variable.  Rails 4 requires that you create a config/secrets.yml file and place the following two lines:

production:
  secret_key_base: <%= ENV["SECRET_KEY_BASE"] %>

This secures your application by not placing your secret key in any of the application files (it exists only at the OS level).
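As a sketch, assuming you choose the service account’s profile as the place to set the variable (any mechanism that exposes it to the app’s environment will do):

```shell
cd /var/www/redmine
rake secret   # prints a long hex string

# Make the key available to the Redmine process; the profile location is
# just one option (placeholder value shown -- paste your own key):
echo 'export SECRET_KEY_BASE=<generated_key>' | sudo tee -a /home/redmine/.profile
```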

Finish by creating the database schema, default data set, and setting the file permissions as indicated.
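Those finishing steps, roughly as they appear in the Redmine install guide:

```shell
cd /var/www/redmine
RAILS_ENV=production rake db:migrate
RAILS_ENV=production rake redmine:load_default_data

# permissions for the service account, per the install guide
sudo chown -R redmine:redmine files log tmp public/plugin_assets
sudo chmod -R 755 files log tmp public/plugin_assets
```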

The last step is to let Apache know about the new application.  In general, I prefer, like many, to create a redmine.conf file in the /etc/apache2/sites-available directory.  Within it, create the appropriate virtual host entry pointing to the public directory inside of the redmine application (e.g. /var/www/redmine/public) for both the document root and the directory.  You will need the following lines to ensure that Passenger runs correctly:

PassengerRuby (path to ruby, see below)

PassengerDefaultUser <service account created earlier>

If you are upgrading from an older Ubuntu version rather than performing a clean install, make sure that you don’t have any Passenger configuration items in your apache2.conf file (typically in /etc/apache2/).  Passenger itself should be loaded as an Apache module, and although PassengerRuby and PassengerDefaultUser can be defined in apache2.conf, they should probably be specified within the virtual host definition.  It wouldn’t hurt to keep them in apache2.conf, but by keeping them in the site definition, the option exists to run different Rails apps using different Rubies and users.  If, as I have had in the past, you have a legacy app, you can run it under a legacy Ruby at the same time a newer app is running on the latest Ruby.

To determine the path to Ruby, run the following command: passenger-config about ruby-command, then copy the path into the redmine.conf file as indicated.
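Putting the pieces together, a minimal redmine.conf might look like the following. The ServerName and the Ruby path are assumptions; replace them with your own hostname and the path reported by passenger-config:

```apache
<VirtualHost *:80>
    ServerName redmine.example.com
    DocumentRoot /var/www/redmine/public

    # Path reported by: passenger-config about ruby-command
    PassengerRuby /usr/local/rvm/gems/ruby-2.2.3/wrappers/ruby
    # Service account created earlier
    PassengerDefaultUser redmine

    <Directory /var/www/redmine/public>
        Options -MultiViews
        Require all granted
    </Directory>
</VirtualHost>
```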

Enable the redmine site (a2ensite redmine), restart Apache, and all should work.

Automated Marketing Fail

As the world has become overwhelmed with spam, marketers are increasingly looking to tailor their message to the recipient. It began years ago with including your target’s name, so that it appears you are receiving a personal message when you are simply getting spam. Now marketers are becoming more creative, trolling news services for more specific information about the company you work for in order to craft a message that appears as if their company is truly interested in yours. Unfortunately, such automation can lead to abject failure and accomplish the exact opposite (to be fair, I don’t know for a fact that automation is involved in the story I’m about to share; I can only assume that an actual person would have been more careful).

The company I currently work for had an unfortunate negative result in a key clinical trial recently. Such things are not unusual. In fact, clinical trial failures are more the rule than the exception in the pharmaceutical business. We all wish it were otherwise, but it isn’t. So, a consulting firm that provides a variety of services around the pharma industry apparently has some robots that troll Biospace.com looking for interesting headlines. One of our recent press releases had the phrase “announces top-line data from” followed by the type of trial in question. From the title of the release, there would be no way to know whether the data was positive or negative. Anyone familiar with the industry would assume it was negative, simply because you are wrong more often than right in drug development. However, the individual who constructed this robot apparently didn’t know that, so I received a congratulatory email about the announcement. The email started with “Congratulations on your recent success! We read about on biospace.com. Continued success to you all on this project.” The email then went on to tell me about the company, which purports to be a life sciences solution provider. They claim to have expertise in our industry. Given that they don’t understand the success rate of clinical trials, that is clearly not the case.

This blog cites a study that looked at failure rates from drugs transitioning from phase 2 to phase 3. Per the embedded diagram, that number sits at about 68%. A company that understands our industry would know that, and would not do anything as foolish as assuming that a pivotal phase II announcement is going to be positive. As a result of this email, I will remember the company’s name, and won’t waste even 30 seconds talking to them about an engagement. If they can’t get something as trivial as this correct, I’m not going to look to them for understanding in any of the more complex aspects of our business. I’m being kind and not including their name in this blog posting. Perhaps I should rethink that.

At any rate, this gets back to a theme that I have been thinking about a lot, which is success recipes for IT leadership. I’ll be teaching a course at UCSD Extension this summer that is focused, at some level, on these recipes. Unfortunately, like drug development, there is no magic bullet that guarantees success, so we’ll be exploring approaches that can help, not dictating a solution. What we can do, and will do, is look at approaches that can guarantee failure. The most common one, which has plagued IT and IT-related initiatives since the dawn of the computer era, is a failure to understand the business you support. This particular email is yet another version of that story: an IT solution (the automated robot) was apparently crafted with a decision algorithm clearly not written by someone with an appropriate level of drug development understanding. Interestingly, and sadly, this means they have no chance at an engagement with me. I don’t have the luxury of that sort of risk taking. If they do this to other companies (or other people learn of their lack of expertise), what will it mean for the long-term success of their organization?

Finding Shangri-La

Several months ago, UCSD Extension assembled a group of senior IT professionals as an advisory board for their certificate programs in the IT space. One of the discussion threads was that software engineers (read: programmers, business analysts, whatever) were coming out of school with inadequate knowledge of the business to successfully provide services to the organizations they would work in. The other weak point seemed to be knowledge of the importance of systems integration. All of this led to a new class to be required as part of the Software Engineering Management certificate program. The class focuses on integration from three perspectives: integrating IT with the business, integrating IT with itself, and integrating IT with the regulatory environment of the business.

This isn’t a sales pitch for the class, but rather a bit of thinking out loud as I continue to struggle with exactly what I’m going to teach in it. Yes, I’m the person who volunteered for this first effort. Of course, I’m also the person who kicked off the aforementioned discussion, so I definitely deserve this. What was I after, exactly, when I initiated that discussion? And what is the solution to the underlying problem?

I started writing this post in early December. It is now the day of the first session, and I guess I’ve come up with at least a preliminary answer. I have a feeling it will morph a bit along the way.

For our first session we are going to look at software failures, some familiar and a few I’ve been involved with, as well as successes. One area of focus will be a period of time when I and a small group of others developed multiple systems that were broadly enabling, relatively popular, and used for almost a decade. Initial attempts to replace these systems with commercial products were a failure. I refer to this point in time as my Shangri-La. It was an almost mythical IT world. Since I left, however, I’ve never been able to make it back.

When I look at Shangri-La, as well as other successes, and most notably failures, I find only one common theme. Tools varied, methodologies varied, but the one common theme was the degree to which developers understood what the users needed. Not what they wanted, but what they needed. In order to return to Shangri-La, we need to figure out how to develop this understanding.

To that end, I’m going to be focusing on the discipline of enterprise architecture as the most likely candidate for providing a general approach to get us, if not to Shangri-La itself, to a place where IT systems are more successful more frequently. My thesis, which is all that it really is, is that software projects are most successful when the developers, through whatever mechanism, really understand the business, its drivers, and its needs.

I’ll try to post back about this topic as we proceed.

IT’s Role in Organizational Design

Enterprise architecture (EA), as a specific discipline, dates back to the late ’80s. However, it has been very difficult for EA programs to gain much traction. Very often, the architecture program simply becomes focused on creating as-builts of existing applications and infrastructure. Indeed, unless you are starting a completely new endeavor, some amount of retrospective work will be necessary. Prospectively, though, EA programs fall by the wayside. My own experience with an enterprise architecture program supports that. I engaged with the group because I thought the modeling tools might be useful. It never occurred to me that the architecture group might be of use in engineering new systems.

So, EA groups need to prepare and market themselves internally as resources to help facilitate new information systems engineering efforts. That would be a start, but it will not be enough. Two other things are necessary. The first is that the CIO needs to mandate that the architecture group isn’t simply a resource. Their tools and methodology must be the only path to travel. I’ll explore this topic in more detail down the road, as I prepare for a course I will be teaching early next year. The other thing needed is a significant change in the perspective of the typical CEO. Many CEOs look at IT as simply a provider of widgets, and as the group that makes sure computers are on desks and that email is on. That is a fundamentally flawed perspective.

As I sit here typing this on my iPad, and with an iPhone in my pocket, I’m led to reflect on the degree to which information technology has become ingrained in our daily lives. My devices help me find where I’m going, track my nutrition and exercise, allow me to communicate with others in a variety of ways, and help me stay plugged in to what is happening in the world. The devices have become integrated into my life. They improve it, and, yes, sometimes detract from it. But either way it has become integrated. Engineers and app developers all think about life processes and how IT might make them better. More and more frequently they re-imagine the processes themselves.

The modern CEO needs to take this perspective and expect their IT function to engage at the level of the business processes. Not only enhancing them, but even rethinking them. There are two different models that come to mind to effect this. One is an organization where IT is centralized. All information systems are built and managed by this group and they provide resources for business design to each line function. The other is a more distributed design, where IT is responsible for processes and standards, but execution is embedded within individual business units. I will explore both of these models in the future, but in either one we see that IT is not primarily a technology implementer, they are a business process designer. Failing to make this philosophical change will always limit the possibilities for an organization. The CEO shouldn’t just visit his head of IT when his phone is no longer receiving email. He should visit him every time he is pondering the way his organization runs.

Computer Science vs. STEM

The folks from Code.org posted something this morning that merits more than a standard Facebook response. They posted the following graphic:

[Graphic: jobgap]

They then raised the question: is the government focusing too much energy on traditional STEM education and not enough on computer science, since the job gap is in computer science, not in traditional STEM (in fact, there is a reverse job gap in STEM: more students than jobs)? My reply to them is sort of a yes, but…

Yes, we need to be focusing on computer science education if there is this job gap, but…

The real problem is that we shouldn’t be having this conversation.

Back in the mid-’80s (I think that is the late Cretaceous period of computer science), I was an undergraduate TA in the UCLA Department of Chemistry and Biochemistry undergraduate computer lab. I was majoring in Biochemistry, but had fallen in love with computers in high school, like so many others in my age group. The professor in charge of the lab, Dr. Sandra Lamb, was continuously frustrated by the hesitance of so many professors to include computer science education in their curriculum. This was a huge problem among the biochemistry professors, and less so among the chemistry professors. Only one professor, Dr. Daniel Atkinson, who had published a book on using spreadsheets to model cellular metabolism, agreed with her concept.

Fast forward maybe 10 years, to when I was setting up quality departments in the biotechnology sector. At that point, I was developing a significant amount of my own software, for document management, batch record management, etc., simply because it was impossible to manage the amount of work we had in a startup pharma company without the benefit of automation. It amazed me then, as computer-driven instrumentation had become the rule for the analytical lab and was becoming the rule for the manufacturing environment (in my case it was the rule; all of our manufacturing processes were computer driven), how few people were coming out of the university system with any significant understanding of how computers operate.

I had, by then, reached the conclusion that perhaps pure computer science wasn’t so important on its own as having a robust computer science component within a science education. In fact, I think virtually all disciplines need this. Within the sciences, the ability to use a computer is absolutely required (and as someone who has managed support functions for a number of years, it is a sore point how many scientists can barely turn one on these days). However, the ability to make a computer work for you is frequently just as important. So many science careers require programming at some level. Biostatisticians need to program in SAS, S+, and R. Process scientists need to build complex models, coding in tools like Matlab. People in other disciplines need to know how to code and script their various management and analysis tools to get the most use out of them.

In some cases, the lack of computer science knowledge simply means that a specific job may be out of reach (I can’t think of too many people in the biostats field who can’t handle a reasonable amount of stats-related programming), but in others it means that you have individuals who quite literally wallow in inefficiency. They don’t know how to code, so they do things the old-fashioned way, waiting for someone who does know how to code to come along and rescue them.

The time for that has come to an end. I somehow suspect that a large number of the computer science jobs represented in the above graphic are jobs where the ideal candidate would have significant knowledge in a specific arena, like STEM, but must also have the computer science background. I suspect the same is true in the jobs listed within the STEM categories. We should no longer have the conversation of computer science vs. STEM, because we should no longer be graduating STEM students without significant computer science education.

It boggles the mind that Code.org needs to exist. This is the 21st century, after all, and this conversation should have ended back in the 20th when I was in school. I love Code.org. I support what they are doing. I also look forward to the day, as I’m sure they do, that they no longer need to exist.

How the Cloud Is Allowing Computers to Realize Their Potential

Ever since the dawn of the computer age, we have been seeking a mythical computer utopia: a place where computers are truly an integral part of our lives and truly bring benefit to them.  I’m reminded, to some degree, of the car in the Robert Heinlein novel “The Number of the Beast” that was in virtually constant communication with its owners, was able to write its own programs, etc. (it may have been more of a computer that had an interface in the car; I haven’t read the book in probably 30 years).  Steve Jobs certainly had this vision, as is discussed in the book “Insanely Great”.  However, there is always the sense that we haven’t quite gotten there yet.  The ring is just out of reach.  It is always dangerous to say that thus and such a technology will allow us to grasp the ring, so I won’t go there.  However, I will say that cloud technology has allowed us to move closer to the ring.

We all think of the cloud as a way to store data or deploy applications outside of our own data centers, and that is still its primary objective.  But there is a positive side effect that necessarily comes with this: I can now access those applications and data from anywhere.  When I started moving more of our functions into the cloud, it was, in part, so that our small team could function anywhere.  It dawned on me this morning that this, still, was only the tip of the iceberg.  In fact, it is the ability to work anywhere, and on any device, that is key.  I am working in my home office right now, with no devices that belong to my company.  Yet I needed to see what time a couple of meetings were today.  I looked on my iPad (because it was the closest device to me and was open at the time).  As I opened the collection where my calendar was, I was actually looking for a planner app I had been using a while back, but it wasn’t there anymore.  Then I remembered why.  I had run across several planner apps that were something like the Covey planner system.  I liked the way they handled tasks and presented everything holistically.  The bad ones were almost unusable, but even the good ones used a system of tasks and reminders independent of the iOS ones.  Suddenly, any tasks I created were only available on my iPad.  I stopped using them.

My calendar, however, is available on every device I own.  From my office laptop, to my personal laptop, iPhone, iPad, you name it.  So are my reminders.  So are my emails.  So these apps are actually useful because they allow me to have my data at the ready wherever I am.  I’ll admit the most important device in this regard is my phone.  Not because I use it for that much, I don’t.  It is because it is the one device that is always with me.

In order for this sharing of data between devices to work, the data must exist in the cloud somewhere.  The alternative is to perform a computer-to-computer sync, which was quite the rage for a while with different applications, but is beginning to go the way of the dinosaur.  Calendar, email, tasks, etc., are all in the cloud for a lot of us.  A growing number of applications will use a cloud file storage solution (Box, Dropbox, SkyDrive, etc.) as a backend.  I use Dropbox extensively in my personal life simply because it permits me to have access to a lot of information no matter where I am.  Evernote is increasingly the poster child for this sort of functionality.  As I’ve looked at applications for things like handwritten note capture, I’ve dropped a great number of them because they don’t provide a seamless multi-device utilization model.

You see, the cloud allows our computers to be more integrated with our lives, and truly become servants.  Although a legitimate argument can be made that computers have taken over our lives, I think that is looking at the situation incorrectly.  When we had to go sit at a desk and fire up a machine, wait for the boot cycle, then dig for information in order to achieve our goals, we were slaves to the machine.  Somewhat like Oliver Twist begging for more, we would approach the machines like they were our masters.  Now, I expect my information to be at my finger tips when I want it, on my terms.  A bit more like Downton Abbey, where I can pull a cord from whatever room I am in, and a servant will arrive ready to provide whatever I need.  Applications that cannot perform this task are of no use to me, and must be banished.

When I first began thinking about using the cloud, it was all about risk mitigation and cost management.  Now, I see it more as how to enable computers to truly be integrated in our lives.  The next step, is to begin to educate my user base on this, so that they can realize these benefits.

Sunspot, on my Servers, Makes me Happy

After a week troubleshooting Solr and getting it running on Ubuntu for a test version of the Limspec app, I think I am allowed the bad humor.

I recently posted a summary of how I deployed a Rails application on a Debian VPS using Nginx.  The steps are fairly similar to what you would do with Ubuntu.  For the Limspec project I’m working on, we’re using Apache – mostly because that is what I started with, but there is no practical reason to not use Nginx.  The instructions in that post should work well, but you would need to install the mod_rails module for Apache and create the appropriate sites-available files.

The application that I was deploying is a fairly simple application for my Church’s folk dance director to use for managing the dance program, and at this point doesn’t utilize any search, let alone full text, so the instructions didn’t include what is a fairly critical element to some of my other applications, and that is Solr for full text searching.

Solr is from the Apache Lucene project and is a very powerful enterprise search platform.  I had implemented it for Limspec quite some time ago.  However, we had a VM meltdown a few months ago, and the replacement VM only seemed to have pieces and parts of the previous VM.  This made getting Limspec deployed again a huge problem (in addition, I no longer had root access, which is probably good, as I was forced to set things up in a more secure fashion).  When all was said and done, however, I had forgotten to check out Solr.  It was working fine on my dev machine, so all search-related tests passed with flying colors.  Important safety tip with TDD: even if you test extensively on your dev machine, you need to be very aware of those things that are deployed quite differently in production.  Solr is one of those.  Although it appears that you might be able to use the sunspot_solr gem in production, the developers who created it indicate they only intend it for use in development.  After hours spent trying to make it work, I tend to agree.  I could never quite get it running, so I finally gave up.

So, not having taken notes on how I installed Solr the first time (well over a year ago), I set out to do it again.  Of course, there is a newer version of Solr, and a newer version of the sunspot_rails gem.  When I was rebuilding the actual Limspec server, I created an Ubuntu VM on my desktop to try everything on first.  So, I continued to use that VM to figure out Solr.  The following instructions are based on adding Solr to my local VM, which is running Precise Pangolin (Ubuntu 12.04 LTS).

Jetty

Solr is a Java servlet, and so needs a servlet container of some sort.  Previously, I had used Tomcat.  However, Tomcat is fairly memory intensive and is really only necessary for more complex Solr installs (such as multiple instances).  Of course, if you are already using Tomcat for other purposes, it would probably make more sense to deploy Solr with Tomcat than to run yet another web server.  If not, Solr comes complete with its own copy of Jetty.

From here on out, everything is fairly straightforward, but it took me a while to figure it all out.

The first step is to download the latest version of Solr, or at least the version you are interested in.  I opted for 4.6, the most recent version at the time of this writing.  Once you’ve downloaded the tar file, you can untar it wherever you’d like, as you’ll be copying a subdirectory out to another location.  There are a number of locations you can use as your Solr home.  I recommend that you look ahead to the startup script from the Solr wiki and choose one of the standard locations in the script.  Remember that I’m looking to use Solr in support of another goal, so I want to minimize anything that makes my installation non-standard.  Taking this approach also makes maintenance of the application and installation by other users much easier.  I chose /opt/solr as my home, so I executed mv solr-4.6.0/example /opt/solr.

The next thing to do is decide whether you are going to run single or multicore.  I tend to have staging instances on the same server as production, so I want multicore.  To deploy for multicore, within /opt/solr, delete the solr directory (i.e. rm -r /opt/solr/solr), then mv /opt/solr/multicore /opt/solr/solr.  This gives you a multicore deploy.  By default, you have two cores in place, core0 and core1.  You can certainly stick with those names, but I wanted names that would tell me what those cores are being used for.  If you want to change the names, first execute mv core0 <newCoreName>, then update the solr.xml file within /opt/solr/solr to indicate the new names and paths.  That is, change

<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
<core name="core0" instanceDir="core0" />
<core name="core1" instanceDir="core1" />
</cores>

to:

<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
<core name="newCore" instanceDir="newCore" />
<core name="newCore2" instanceDir="newCore2" />
</cores>

or whatever names you chose.  I tried to use names with spaces in them, and Jetty didn’t like that.  I’m not sure if it was because of the spaces, or the fact that the name wasn’t the same as the instanceDir, so I just made them both the same and the problem went away.

If you want to test your installation at this point, you can fire up jetty by running java -jar /opt/solr/start.jar.  Then go to http://yourserver:8983/solr, and you should see both of your cores in the coreadmin screen.  Note that you can change the port for Solr within the solr.xml file if you would like.

Starting Automatically

The next thing you’ll want is for Solr to launch on startup in the background.  This proves to be easy to do.  First, download the jetty.sh script linked to from the SolrJetty page.  If you looked ahead and parked Solr in one of the standard locations, the script will work fine as is.  Place the script in the /etc/init.d directory and make it executable.

Next, follow the instructions for creating the /etc/default/jetty file for the various parameters Jetty will need on launch, setting the Jetty home, Java home, Jetty user, etc. as appropriate.  If you opt to run it under a non-privileged user, such as solr (always a good idea), then follow the instructions on this page for creating the user and changing ownership of the solr directory.  Also set the user name correctly in the Jetty configuration file (/etc/default/jetty).  Finally, set the run levels.  I just used the defaults (update-rc.d jetty.sh defaults).  I should note that every Linux command you see on this page, I preface with sudo, as I'm not operating as root.  More than likely this will be your situation, or should be.
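Here is a sketch of what /etc/default/jetty might contain.  The paths and the solr user name are assumptions from my own install, so adjust them to match yours; the file is written to a temp location here so the sketch is safe to run, with the real on-server commands shown as comments.

```shell
# Sketch of /etc/default/jetty -- written to a temp path here; on the server
# the target is /etc/default/jetty (via sudo).  All values are assumptions.
JETTY_DEFAULTS=$(mktemp)
cat > "$JETTY_DEFAULTS" <<'EOF'
JAVA_HOME=/usr/lib/jvm/default-java
JETTY_HOME=/opt/solr
JETTY_USER=solr
JETTY_PORT=8983
EOF
cat "$JETTY_DEFAULTS"

# Then, on the server:
#   sudo cp jetty.sh /etc/init.d/ && sudo chmod +x /etc/init.d/jetty.sh
#   sudo update-rc.d jetty.sh defaults
```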

I always prefer to bounce my server after making a lot of these changes, to make sure that everything will start as it should.  So, I recommend doing that, then visit the solr admin page again to make sure everything is loaded.

Configuring for Rails

As I stated before, this is for my Rails application, so I need to do a few things to make that work.  I'm assuming you've followed something like this to install sunspot_rails in your application.  If not, then do that.  Once completed, you will have a schema.xml file in your <rails_project>/solr/conf folder.  This needs to be copied into the conf folder for each core you are going to be using with your Rails application (i.e., cp <rails_project>/solr/conf/schema.xml /opt/solr/solr/core0/conf/schema.xml).  If, as I did, you have an old schema.xml that predates Solr 4, you will be missing a key field definition that needs to be added back.  Sunspot has been patched, so if you just installed it, you shouldn't have a problem.  If you get an error message about field _version_, then add this line in the fields definition section of schema.xml:

<field name="_version_" type="string" indexed="true" stored="true" multiValued="false" />

Next, make sure your sunspot.yml file located in <rails_project>/config is accurate with regard to port and path.  One thing that wasn't obvious to me, and caused me some trouble, is that the path is relative to the Solr directory.  That is, if your Solr directory is /opt/solr, and your core is /opt/solr/solr/core0, then the path in the yaml file should be /solr/core0.  The leading / is important, as you will get an error otherwise.
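A minimal sketch of the relevant sunspot.yml stanza, assuming Solr runs locally on the default port; the hostname and core name are placeholders, and note that path is relative to the Solr directory and keeps its leading slash:

```yaml
# config/sunspot.yml (sketch -- values are assumptions, adjust to your setup)
production:
  solr:
    hostname: localhost
    port: 8983
    path: /solr/core0   # core at /opt/solr/solr/core0, Solr dir is /opt/solr
```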

Finally, you will want to run a rake sunspot:reindex from within your app directory.  If you get one of those great rake errors about having the wrong rake running, do a bundle exec rake sunspot:reindex, and all should be well.  I typically run a reindex on every deploy, just to make sure everything is good.  Sunspot will only index new and modified database rows, so if you want pre-existing rows to be searchable, then you need to reindex.

My next step is to run all of this on our production server.  I’ll post back an update on how that goes.

UPDATE:  Ran this on the production server, and all worked as it should (provided you follow the directions, which I didn’t at first, but that’s another story).

Deploying a Rails App to a Linux Server


Why?

Recently, I’ve deployed different Rails apps to three different Linux servers from scratch.  All three were virtual servers.  Two were Ubuntu 12.04 servers using Apache.  The most recent is a Debian server (version 6) running Nginx.  Each server took progressively less time to launch, but with each one, I found myself grabbing bits and pieces from Stackoverflow.com, and other destinations in order to successfully complete my effort.  So, I’ve decided to take the notes I made on this most recent deploy and post them here so that perhaps others will be able to benefit from my trials.

The Host

For all of my basic websites and e-mail, as well as a couple of old Rails applications, I’ve used Dreamhost shared servers.  For websites, this has worked out well, being a very cost effective means of managing websites.  There has also been little downtime and few overall issues.  However, with Rails, there is a major issue.  While I understand the desire to avoid being on the bleeding edge with technology, Dreamhost has remained rather rooted in the distant past.  I have long since ceased doing updates to my first application, the one I use to manage our parish bookstore, simply because I can’t move to any current gems or technologies.

However, Dreamhost has a relatively low cost Virtual Private Server option, where you get a virtual server running Debian.  There are various levels of configuration for the server, from situations where Dreamhost manages most of the major configuration options (web server, users, etc.) to where you manage it all.  In the few days I've spent with it, I've found only two negatives with Dreamhost, and neither of them represents, in my judgement, much of a problem.  The first is that you have to use Debian – there is no other option.  The only other company I have to compare this to is Rackspace, where you have a wide variety of Linux flavors (and Windows for an appropriately larger fee).  With Rackspace, however, there is a bit of an increased cost that comes with the flexibility.  The other issue is around the chat support.  With Rackspace, I tend to have a very capable technical person in the chat app in under a minute.  With Dreamhost it has been 10-15 minutes, and I'll admit the quality of the technician isn't as high.  Nothing dramatic, but there is a level of "you get what you pay for."

While discussing hosts, I should mention my choice of code repository.  I have opted to use bitbucket.org, an Atlassian product. Why?  Well, free is the good part, but I also get to have the repository remain private.  The only limitation I have is that I can’t move past 5 users.  Well, since these are apps that I’m building myself, I doubt this will be an issue.  You can, of course, use Github, but you have to pay to be private.  I’m a huge Open Source fan, but I don’t necessarily want everything that I’m working on to simply be out there.  At some point in the future, I may take some of these apps and make them publicly available, but I like being able to start with them privately.

Okay, Let’s Get Down to Work

So, I provisioned the Dreamhost VPS, and since I get a week free, I opted to max out the memory available for the server.  This proves to be beneficial, as some of the software installation processes get very memory intensive.  Installing the Passenger-nginx module will complain if you have less than 1024 MB of RAM, and if your server doesn't have that much, it will reboot in the middle of the installation process.  I opted to deselect every "Dreamhost Managed" option, perhaps even when I didn't need to, but I figured it would be safest to be able to do my own installations.  This included selecting no web server to begin with.

For most of the installation process, I followed the instructions at Digital Ocean.  Yes, these instructions are for Ubuntu, but that is a Debian variant after all, so I didn’t run into any trouble.  The only thing I did differently to begin with was to run aptitude update and aptitude dist-upgrade in order to ensure that everything I needed was available.  I also opted to follow the RVM installation instructions from the RVM website for multiuser.  I have, over time, found various sets of instructions on RVM installation, and have always found it best to simply go with the authors.

Everything else installed as indicated (I did opt for Ruby version 2 instead of 1.9.3).

Nginx

I opted for Nginx in all of this for a couple of reasons.  The first is that I really didn’t need all of the capabilities of Apache to run just Rails applications.  Down the road, I do expect to use Solr, but I believe that the installation will build its own version of Apache.  Nginx is also supposed to keep a relatively small memory footprint, which is important as I’m paying for memory, and it is supposed to be faster.  I haven’t run my application on it long enough to decide, but time will tell.

When you are done running the above instructions, it is likely that Nginx won’t work.  :) Surprise.

I believe that the problem was that I had residual Dreamhost Nginx pieces on my server, most notably the nginx init script in the /etc/init.d directory.  For those who are very adept at fixing Linux scripts, fixing the one that is present probably isn't very difficult.  For my part, though, I just grabbed the script present on this page about setting up Debian and Rails.  The script is not entirely robust, as I find myself needing to manually kill the nginx processes if I need to restart them, but that isn't much trouble and I'll likely fix this later.  Outside of making the script executable and ensuring that it runs at startup, I mostly ignored this page.  A lot of that is because the default Debian install from Dreamhost has much of it taken care of.  The other issue has to do with RVM.  I've long since learned the advantage of using RVM, so manually installing Ruby seems like a bad idea.  There are some other interesting looking parts on that page, so I suspect it is more useful in general than I took advantage of.

After making these changes, Nginx just… didn't work.  The problem was with the log files for Nginx, which were all owned by root.  Seems like a bad idea.  I modified /opt/nginx/conf/nginx.conf to run as www-data, then changed the log file ownership appropriately.  This user is very much unprivileged in the system, and so seems like a good choice to run nginx as (Apache defaults to this too, so it should seem familiar to people who have worked with Apache).
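The edit amounts to changing the user directive at the top of nginx.conf and reassigning the logs.  This sketch rehearses the change on a temp copy of the file so it is safe to run; on the server the file is /opt/nginx/conf/nginx.conf and the commands need sudo.

```shell
# Stand-in for the stock nginx.conf, which ships with the user line commented out
CONF=$(mktemp)
printf '#user  nobody;\nworker_processes  1;\n' > "$CONF"

# Run nginx workers as the unprivileged www-data user
sed -i 's/^#\?user .*/user www-data www-data;/' "$CONF"
head -n 1 "$CONF"

# Then hand the logs over to the same user (server only):
#   sudo chown -R www-data:www-data /opt/nginx/logs
```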

MySQL

MySQL installation was amazingly painless.  I've had problems with it before, but I followed the instructions from cyberciti.biz, and all was happy.

Deployment and Rails Tidbits

A lot of what I’m going to say here will likely result in a bunch of face palming by more talented developers than I, but since I’ve not done a lot of new deploys in the past, I still trip over amazingly trivial things, so I figure (hope) I’m not alone in this.

The first bit is to remember to generate keys on both your development machine and on the server and provide the public keys to bitbucket, so you can download the source code during the deploy process.  BTW, I use Capistrano for deploying my rails apps, as I find it easier for doing updates.  Frankly for an initial install, I don’t think it helps too much, but down the road you’ll be happy if you use it.

When you create the keys on your server, make sure you do not use a pass phrase.  Although the server will ask for your passphrase during the deployment process, Capistrano doesn’t seem to actually transmit it, so your deploy will fail.

Also, don't forget to run cap (stage) deploy:setup.  I always forget to do that on first install, then watch it fail as the target directories don't exist.  Before you do the deploy, however, you should change the /var/www directory to be owned by (chown www-data:www-data) and writable by (chmod g+w) the www-data group.  I should have mentioned that my deployment server user is a member of www-data.  This makes it easier to make changes during the installation process.  It turns out that giving the www-data group too many privileges is not wise.  Plan on running the Rails application under a dedicated service account, and give that account permissions to the appropriate folders for running the application (typically you just need the public, tmp and log directories and their subdirectories, as well as any custom directories you need to write to).  The installation can run using your user account.
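The one-time setup above can be sketched as follows.  To keep it runnable anywhere, the permission change is rehearsed on a temp directory standing in for /var/www, and the cap commands (with "staging" as a placeholder stage name) appear as comments since they only make sense against a real project:

```shell
# Stand-in for /var/www; on the server, chown needs sudo and your deploy
# user must already be in the www-data group
DEPLOY_ROOT=$(mktemp -d)
# sudo chown www-data:www-data "$DEPLOY_ROOT"   # server only
chmod g+w "$DEPLOY_ROOT"   # group-writable so deploys can create releases

# Then, before the first deploy:
#   cap staging deploy:setup   # creates the releases/ and shared/ directories
#   cap staging deploy
stat -c '%A' "$DEPLOY_ROOT"   # the group write bit should now be set
```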

Two other issues I ran into had to do with bundler and with a javascript runtime.  I ran deploy and received an error that there was no bundler.  I performed a gem install bundler on the server, but that didn't help.  I then discovered that I was missing a require 'rvm/capistrano' at the top of my deploy file, which is necessary for doing capistrano deploys in an rvm environment.
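For reference, here is a sketch of the top of a Capistrano 2-era config/deploy.rb; the application name and Ruby version are placeholders, and rvm_ruby_string comes from the rvm-capistrano gem:

```ruby
# config/deploy.rb (sketch -- names and versions are placeholders)
require 'rvm/capistrano'       # must be present for deploys under RVM
require 'bundler/capistrano'   # runs bundle install on each deploy

set :application, 'myapp'
set :rvm_ruby_string, '2.0.0'  # match the Ruby you installed via RVM
```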

The javascript runtime is best dealt with by installing node.js which you can do by following the instructions here.  You can go get a cup of coffee and a doughnut while this installer is running.  It takes a while.

Another problem was with using Zurb Foundation.  Since the Foundation css file doesn't actually exist until it is compiled, the application will not run when you access it from your web browser.  So, it is necessary to run a bundle exec rake assets:precompile at the end of your installation.  Apparently you will also need the compass gem in your Gemfile ahead of the foundation gem.
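In Gemfile terms, the ordering looks something like this; the exact gem names varied across Foundation versions, so treat these as illustrative:

```ruby
# Gemfile (sketch): compass must appear ahead of foundation
gem 'compass'
gem 'zurb-foundation'
# Then, at the end of each install/deploy:
#   bundle exec rake assets:precompile
```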

Finally, if you are running multiple stages on your server (I have a staging, um, stage, for testing new stuff out with users) you want to make sure the RAILS_ENV variable is properly set.  You can follow the instructions at the mod rails site for doing this.
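With Passenger under Nginx, the per-stage setting is the rails_env directive inside each server block.  A hedged sketch, with hostname and paths as placeholders:

```nginx
# Sketch of a staging vhost in nginx.conf -- names and paths are placeholders
server {
    listen 80;
    server_name staging.example.com;
    root /var/www/myapp-staging/current/public;
    passenger_enabled on;
    rails_env staging;   # keeps this vhost out of production mode
}
```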

The Future of Cloud


Every time a significant shift in information technology appears, there is a period of extremes in response to it, where one minute everyone is really excited about the option and the next, an extreme pessimism sets in.  Cloud computing appears to be going through the same thing.  In a post from this morning’s CIO Journal, Salesforce.com’s declining profit forecast is used as a jumping off point to discuss some of the conversations at the MIT Sloan CIO Symposium, where the winner of this year’s CIO Leadership Award expressed some hesitancy around Cloud computing.

Now, I'm operating off of only a couple of quotes and paraphrases, so it is always hard to gauge the actual context of the remarks, but a superficial read of the blog posting might lead those pondering a move to the Cloud to doubt the wisdom of doing so.  However, this is really just a brief foray into pessimism.  In fact, a closer read of the piece merely underscores the sort of analysis that frankly everyone should go through before any technology decision is made.  My only real disagreement stems from the fact that either Mr. Blanchette is guilty of a bit of hyperbole, there is more to what he really said, or he is somewhat mistaken, when he is quoted as saying that value propositions from Cloud vendors must be drastically better than those of on-premise vendors.

Frequently, business success comes at the margin, so the only reason I could see for setting the bar as high as "drastically better" would be if he has his eye on other issues.  For instance, in a large organization with a large technology investment, movement into the Cloud can mean a significant, and negative, disruption to the organization.  Movement to the Cloud often means workforce realignment at some level.  In fact, I could envision a situation where movement to the Cloud could mean an increase in internal labor in order to manage the relationship.  This only seems plausible where the Cloud offering represents a new system, but I can envision such a situation.  At the same time, the architectural complexity of the environment has significantly increased, and in most circumstances the level of control would decline, at least a little bit.

However, as I've posted before, the control issue is less real than people imagine.  I think the same holds true for the SLA's Mr. Blanchette refers to.  Perhaps his company is different, but I have watched SLA's (which in most companies mean Service Level Acceptance – that is, the level the customer is forced to accept – as opposed to an actual agreement) gamed by a variety of techniques, such that outages and downtime are harder to quantify.  I have one Cloud vendor where I can get a pretty talented support person online within seconds for my apps and servers.  Even when I worked for a large company, that was absolutely never the case unless I was a personal friend of the individual I needed to talk to.  Frequently, the problems that the IT and business people further down in the trenches experience with their infrastructure and app support teams are shielded from the senior management and CIO until they become really severe.  Even a value proposition that appears only marginally better on paper may reflect a significantly better reality.  The question is, how would you know?  The only real way, of course, is to ensure that you really understand your environment and the way it works for people.  How to do that, of course, is the subject of a great number of management classes and programs, so I'll leave that for another time.

One thing I did like from the summit was the understanding on the part of the CIO's that people obtain Cloud services on their own, independent of IT, because most corporate IT organizations are not responsive enough.  I just hope that this wasn't a recent revelation.  The inability of most IT organizations to be responsive is very old news.  It is why, back in the early '90s, Filemaker and 4D (and later Access and Excel) became the bane of IT's existence.  Since IT couldn't provide any sort of rapid application delivery to address emerging needs, people found tools that allowed them to go it alone.  Of course, these tools were often used to create poorly engineered databases, but they worked, and business units grew to depend on them.  At some point, the original designer left, or they needed the application to support more people, and they would hand over the dysfunctional system to IT to manage.  The situation hasn't changed.  We live in a world where some large organizations didn't finish moving to Windows XP until over a year after it left mainstream support and a mere 4 years before End of Life.  That is the reality of corporate IT.  I have only seen glimmers of organizational transformation that lead me to think that this might change at some point.  Until it does, the only thing that IT can really do is to take the lead into the Cloud, and be supportive of businesses seeking solutions there.  I have one user who needs a moderately expensive tool that is available in the Cloud.  I could tell her to wait at least a year, until I have bandwidth to attempt to bring such a solution in house (assuming I can find the dollars necessary to support the infrastructure costs), or I can let her look at systems as a customer, with my presence only as a second opinion and to verify that these systems will fit into our overall strategy.
The former means she is operating with a significant handicap for at least a year, the latter that we are able to achieve more in less time.  Which do you think is the best approach?

Cloudy with a Chance of Leverage

Today's CIO Journal published a piece doing a compare and contrast between the big 3 of cloud computing.  There were several things of note, but one particular line caught my attention: "Amazon.com appears most willing to enter into customized agreements with its larger customers…"  Amazon is, from the sounds of the article, most willing to create some level of customization in their offering.  However, only for larger customers.  I get this.  Losing one small customer is unlikely to have a significant impact on the bottom line.  Losing one big customer would be a different story.

In many areas, to offset this large customer gets the perks approach, smaller customers can band together to generate the leverage that they cannot enjoy individually. It occurs to me that this could be possible in the cloud. That is, some form of consortium could form to negotiate pricing, SLA, and other topics of interest. Of course, small users could opt for other cloud providers that have a better offering in terms of price and control. However, for many there are solid reasons behind wanting to stick with a major player, and a consortium would allow greater opportunity.