Apache Solr Coding from DrupalCon Chicago


Drupal interacts with Solr over HTTP. Drupal sends content to the search index as XML documents: it walks through each node, a library converts the node's attributes to XML, and Solr indexes that XML. Documents are POSTed to Solr's /update handler, and deletions are also sent as a POST, carrying a separate delete message. Searching is done with GET requests.
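To make the three kinds of request concrete, here is a minimal plain-PHP sketch of the payloads. It is not the library the module actually uses. The field names, the `node/42` ID format, and the `http://localhost:8983/solr` base URL are illustrative assumptions, and nothing is actually sent over the network.

```php
<?php
// Sketch of the HTTP payloads Solr expects; field names, ID format and base
// URL are illustrative assumptions, not the apachesolr module's real schema.

// Build an <add> message for Solr's /update handler from a flat array.
function solr_add_xml(array $doc): string {
  $xml = '<add><doc>';
  foreach ($doc as $field => $values) {
    // A multi-valued field becomes one <field> element per value.
    foreach ((array) $values as $value) {
      $xml .= '<field name="' . htmlspecialchars((string) $field, ENT_QUOTES) . '">'
        . htmlspecialchars((string) $value, ENT_QUOTES) . '</field>';
    }
  }
  return $xml . '</doc></add>';
}

// Build a <delete> message; it is also POSTed to /update.
function solr_delete_xml(string $id): string {
  return '<delete><id>' . htmlspecialchars($id, ENT_QUOTES) . '</id></delete>';
}

// Build the URL for a search, which is sent as a GET to /select.
function solr_select_url(string $base, array $params): string {
  return rtrim($base, '/') . '/select?' . http_build_query($params);
}

echo solr_add_xml(['id' => 'node/42', 'title' => 'Tom & Jerry', 'tags' => ['cats', 'mice']]), "\n";
echo solr_delete_xml('node/42'), "\n";
echo solr_select_url('http://localhost:8983/solr', ['q' => 'drupal', 'start' => 0, 'rows' => 10]), "\n";
```

Escaping every name and value with `htmlspecialchars()` keeps characters such as `&` in a title from producing malformed XML.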

If you set up Solr locally (it requires Java), the server runs on the same default port the module expects (8983). Once you install and enable the module, Solr should work out of the box.

Once Solr is installed, its administrative interface gives you access to a number of links. One of them is the Schema, which can help you troubleshoot problems with indexing and search results.

Dynamic Fields

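When you start writing custom code, you're going to use dynamic fields, which give you a lot of flexibility. A dynamic field is declared with a wildcard (*) at the start or end of its name, so any field whose name matches that pattern is accepted without being declared individually. Matching fields can be indexed, stored, or even returned in the results. After all the dynamic fields are listed, we can add a catch-all entry that matches any remaining name. That way Solr ignores a field it doesn't expect instead of failing to index the document.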
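The matching rule is easier to see in code. Below is a simplified model of how Solr resolves a field name against dynamic field patterns, based on the rules documented in Solr's example schema: a single `*` is allowed at the start or end of a pattern, longer patterns are tried first, and ties go to schema order. The patterns and type names are hypothetical examples, and the model ignores explicitly declared fields.

```php
<?php
// Simplified model of dynamic field resolution. The patterns and type
// names used below are hypothetical examples.

function match_dynamic_field(string $name, array $schema): ?string {
  // $schema maps pattern => field type, listed in schema order.
  $patterns = array_keys($schema);
  $position = array_flip($patterns);
  usort($patterns, function (string $a, string $b) use ($position): int {
    // Longest pattern first; equal lengths keep their schema order.
    return [strlen($b), $position[$a]] <=> [strlen($a), $position[$b]];
  });
  foreach ($patterns as $pattern) {
    if ($pattern === '*') {
      return $schema[$pattern]; // catch-all: matches any remaining name
    }
    $fixed = trim($pattern, '*');
    $matches = ($pattern[0] === '*')
      ? (substr($name, -strlen($fixed)) === $fixed)       // suffix pattern, e.g. *_t
      : (strncmp($name, $fixed, strlen($fixed)) === 0);   // prefix pattern, e.g. ss_*
    if ($matches) {
      return $schema[$pattern];
    }
  }
  return null; // no match and no catch-all: Solr would reject the field
}

$schema = ['ss_*' => 'string', '*_t' => 'text', '*' => 'ignored'];
foreach (['ss_color', 'body_t', 'ss_body_t', 'nid'] as $field) {
  echo $field, ' => ', match_dynamic_field($field, $schema), "\n";
}
```

`ss_body_t` matches both `ss_*` and `*_t`. The longer pattern wins, so the field is treated as a string. `nid` only matches the catch-all, which is what keeps an unexpected field from breaking indexing.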

If you need to talk to the server directly, there's a factory function. You can have multiple Solr instances, so you select one by its instance ID:
$solr = apachesolr_get_solr($id);

Drupal 7 changed how parameters, fields, and taxonomy are handled. All you have to deal with now is the $query object, and the parameters are held as an array property of that object. In Drupal 6 we had a lot of taxonomy-specific code, whereas in Drupal 7 we extended the field-handling code to cover the taxonomy integration as well. We also got a few fixes into Drupal core, yay! As a result, all the code that duplicated core has been removed, since we're able to use the core search module's implementations.

Field API Integration

A facet is like a filter for your search results. Instead of writing custom code, you can use the Field API integration to turn fields on nodes into facets automatically. In Drupal 6 there was CCK integration (we would look at the widget), but in Drupal 7 the field type is more specific. You'd have something like a List (text) field, which is constrained to always hold values from a list. In Drupal 7, all the list fields can automatically be turned into facets. We create a field name that will go in the index, and then specify which of these fields we want to use as facets. We don't want to make every field a facet, because each extra facet adds computational work on the server for every search.
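The following sketch shows what "specify which fields are facets" becomes at the Solr level. Each enabled field adds one `facet.field` request parameter, and each one makes Solr count values for that field on every query. The `sm_field_*` index names are hypothetical examples, and the function is not part of the module.

```php
<?php
// Sketch: build Solr faceting parameters for the fields flagged as facets.
// The sm_field_* index names are hypothetical examples.

function facet_query_string(array $fields): string {
  // facet.mincount=1 hides values that have no matching results.
  $parts = ['facet=true', 'facet.mincount=1'];
  foreach ($fields as $field => $isFacet) {
    if ($isFacet) {
      // Solr expects one facet.field parameter per faceted field.
      $parts[] = 'facet.field=' . rawurlencode($field);
    }
  }
  return implode('&', $parts);
}

echo facet_query_string([
  'sm_field_color' => true,
  'sm_field_size' => true,
  'sm_field_notes' => false, // indexed, but not worth the cost of a facet
]), "\n";
```

The string is built by hand rather than with `http_build_query()`. That function would encode a repeated key as `facet.field[0]`, `facet.field[1]`, and Solr does not treat those as the same parameter.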

Apache Solr Recipes

I want to display additional pagination information

For example, "now viewing 1-5 of 98 results." We need a little custom code for this, in an implementation of hook_preprocess_search_results(&$vars).

First, we'll find the current page and how many results are shown per page. In Drupal 7, the API function pager_find_page() tells you which page you're on (counting from zero), and variable_get('apachesolr_rows', 10) gets the number of rows per page. Next, we'll get the total number of results from $response->response->numFound. Lastly, we'll build the message by concatenating strings and assign it to a template variable. Printing that variable in the template gives us the message we wanted.
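The arithmetic behind the message fits in a small pure function, sketched below. Inside the real preprocess hook the three inputs would come from pager_find_page(), variable_get('apachesolr_rows', 10) and $response->response->numFound. Here they are plain arguments so the logic runs on its own. The function name and the "No results" fallback are hypothetical.

```php
<?php
// Sketch of the "now viewing X-Y of Z results" arithmetic.
// $page is zero-based, as returned by pager_find_page() in Drupal 7.
// The function name and the empty-result text are hypothetical.

function pager_summary(int $page, int $rows, int $total): string {
  if ($total <= 0) {
    return 'No results';
  }
  $first = $page * $rows + 1;                // 1-based index of the first result shown
  $last = min($total, ($page + 1) * $rows);  // the last page may be short
  return "now viewing $first-$last of $total results";
}

echo pager_summary(0, 5, 98), "\n";  // now viewing 1-5 of 98 results
echo pager_summary(19, 5, 98), "\n"; // now viewing 96-98 of 98 results
```

In the hook, you would store the returned string in a new key of $vars and print it from the search results template.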

I want the default sort to be on the title

By default the sort is by relevancy, but title may be more useful for something like a glossary. For this one, we have to implement one of the Apache Solr module's extension hooks, hook_apachesolr_prepare_query(), which you'll probably use often for complex extensions. We'll write a simple implementation that checks whether $_GET['solrsort'] is set and, if not, sets the query to sort by title.
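The decision inside that hook implementation can be isolated as below. This is a sketch only: `sort_title asc` is an assumed index field name and direction, and `effective_sort()` is a hypothetical helper. In the real hook you would pass `$_GET` and apply the result to `$query` through the module's own sort setter.

```php
<?php
// Sketch of the default-sort decision: keep an explicit solrsort value,
// otherwise fall back to title. 'sort_title asc' is an assumed field name.

function effective_sort(array $get, string $default = 'sort_title asc'): string {
  if (isset($get['solrsort']) && $get['solrsort'] !== '') {
    // The user picked a sort explicitly. Real code should also check the
    // value against the available sorts before trusting it.
    return $get['solrsort'];
  }
  return $default; // no explicit choice: sort by title
}

echo effective_sort([]), "\n";                             // sort_title asc
echo effective_sort(['solrsort' => 'ds_created desc']), "\n"; // ds_created desc
```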

I want to give users the ability to sort by 'nid'

You might want to sort by something else, such as price. We'll implement the same hook and call the query object's set_available_sort() method, which registers an additional sort that users can choose.

I don't want users to sort by date

We'll implement the same hook and call the query object's remove_available_sort() method with 'created'.
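Both sort recipes edit the same thing: the list of sorts offered to users. The sketch below models that list so the effect of adding and removing an entry is visible. The `AvailableSorts` class and its `set()`/`remove()` methods are hypothetical stand-ins for the query object's set_available_sort() and remove_available_sort(), not the module's API. The sort names besides 'nid' and 'created' are examples.

```php
<?php
// Simplified, hypothetical model of a query's available sorts.

class AvailableSorts {
  /** @var array<string, array{title: string, default: string}> */
  private array $sorts = [];

  // Register (or overwrite) a sort users may choose.
  public function set(string $name, string $title, string $direction = 'asc'): void {
    $this->sorts[$name] = ['title' => $title, 'default' => $direction];
  }

  // Drop a sort so it is no longer offered.
  public function remove(string $name): void {
    unset($this->sorts[$name]);
  }

  public function names(): array {
    return array_keys($this->sorts);
  }
}

$sorts = new AvailableSorts();
$sorts->set('score', 'Relevancy', 'desc');
$sorts->set('created', 'Date', 'desc');
$sorts->set('nid', 'Node ID');   // recipe: let users sort by nid
$sorts->remove('created');       // recipe: don't let users sort by date
print_r($sorts->names());        // score, nid
```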

Why prepare_query() and modify_query()?

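Prepare is invoked before $query is statically cached, and modify is invoked after $query is statically cached. Changes made in prepare_query() are therefore visible to everything that later reads the cached query, such as the sort block. Changes made in modify_query() only affect the request sent to Solr. That is why the sort recipes above use prepare_query().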
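The following simplified model shows why the ordering matters. It assumes the static cache keeps its own copy of the query. `Query` and `current_query()` are hypothetical names, not the module's code.

```php
<?php
// Simplified model of static caching; assumes the cache stores a snapshot.
// Query and current_query() are hypothetical names.

class Query {
  public array $params = [];
}

function current_query(?Query $query = null): ?Query {
  static $saved = null;
  if ($query !== null) {
    $saved = clone $query; // cache a snapshot, not the live object
  }
  return $saved === null ? null : clone $saved;
}

$query = new Query();
$query->params['sort'] = 'sort_title asc'; // "prepare" step: runs before caching
current_query($query);
$query->params['fq'] = 'type:page';        // "modify" step: runs after caching

// The outgoing request sees both changes; the cached copy, which blocks
// read, only sees the prepare-time change.
$cached = current_query();
var_dump(isset($query->params['fq']), isset($cached->params['fq']));
```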

I want initial searches to be targeted

You might want users to enter a keyword and have a facet already selected. For example, you have an open keyword search, but users first pick from a list. First, we'll use hook_form_search_block_form_alter() to add a drop-down list to the search form with $form['group'] = ..., and then add a custom submit handler. The handler alters the form's redirect so that the selection is passed through the query string.
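A submit handler for this recipe might compute its redirect as sketched below. Drupal 7 form redirects accept an array of drupal_goto() arguments: a path followed by options such as 'query'. The 'search/site' path, the 'f' filter parameter and the 'sm_group' field are hypothetical, so use whatever your search page actually expects. `targeted_search_redirect()` is a hypothetical helper.

```php
<?php
// Sketch: turn typed keywords plus a chosen group into a redirect that
// carries a filter in the query string. Path, 'f' and 'sm_group' are
// hypothetical names.

function targeted_search_redirect(string $keys, string $group): array {
  $path = 'search/site/' . trim($keys);
  $query = [];
  if ($group !== '') {
    $query['f'] = ['sm_group:' . $group]; // pre-select one facet value
  }
  // Shape of a Drupal 7 $form_state['redirect'] array.
  return [$path, ['query' => $query]];
}

// In the submit handler: $form_state['redirect'] = targeted_search_redirect(...);
print_r(targeted_search_redirect('solr tips', 'docs'));
```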
