Extending Apache Solr and Views 3 for Agence Science-Presse
Agence Science-Presse, a French-language newswire service specializing in scientific news, turned to Koumbit to redesign and upgrade their old Drupal 4.7 site to a more modern layout built on top of Drupal 6. Read on for a description of how we customized Apache Solr and combined it with Views 3 for a powerful user experience.
Key features of the site include a combined content search and browse functionality based on Apache Solr, the ability for users to interact with the site by commenting on and voting for articles they like, faceted searching based on taxonomy terms and user comments and rankings, and integration with social networking sites Twitter and Facebook including shared sign-on functionality.
Data migration
Because our redesign included creating new content types with streamlined CCK fields and fewer taxonomy terms, we used the Table Wizard and Migrate modules rather than performing a standard upgrade of the existing Drupal 4.7 site to Drupal 5 and then Drupal 6. We also used this approach to import data from a blog managed with Nucleus CMS into a new content type.
Integration with Apache Solr
From a technical standpoint, integration with Solr was the most challenging part of this project. To make articles as easy for users to find as possible, we seamlessly integrated browsing and searching by using Solr as the primary interface for finding content. To control how this was formatted, we decided early on to display a View of Solr results, using the Apache Solr Views module, instead of theming the default search result pages. As Robert Douglass said in his blog, it's worth upgrading to Views 3 for!
All of this worked very well out of the box, but we needed to add facets and sorting options which weren't provided by default. The Apache Solr module comes with very good documentation in its README.txt file explaining how to add CCK fields as new facets by implementing hooks in a custom module, but in our case we needed Solr to index the number of comments and the number of bookmark flags, and to allow the user to sort results on these fields. Apache Solr has already been extended to allow adding new sorts via a hook, which made our job much easier. We also had to hook into Apache Solr to exclude content flagged by an administrator for inclusion in a private RSS feed, since these posts are not visible to the public.
We were required to allow users to switch between four different sorts for search results: highest scoring, most recent, most flagged, and most commented. Views allows changing the direction of the available sorts via an exposed filter block, but not which of the sorts is applied first. To implement this, we created a View with four page displays, each with different sorting options, and a custom block which links between these four displays. The block retains the current search options by checking $_GET['q'] to determine what the user had searched for. Exposing sorting by flag count via Views required implementing hook_views_data_alter(), but that took only a few lines of code, as Apache Solr Views author Scott Reynolds graciously explained after seeing how we had first tried to do this by modifying his module.
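The path rewriting behind those links can be sketched as a small standalone function. This is an illustration rather than the code used on the site: the function name sp_solr_example_sort_paths() and its return shape are hypothetical, and it assumes each display lives at articles/<sort prefix>/<rest of path>, with an empty prefix for the default sort.

```php
<?php
/**
 * Hypothetical helper: given the current Drupal path and the list of sort
 * prefixes, work out which sort is active and the equivalent path under
 * every other sort, keeping the rest of the path untouched.
 */
function sp_solr_example_sort_paths($q, array $prefixes) {
  $elements = explode('/', $q);
  // Drop the leading 'articles' element shared by all displays.
  array_shift($elements);

  // If the next element names a sort, remember it and remove it.
  $active = '';
  if (isset($elements[0]) && $elements[0] !== '' && in_array($elements[0], $prefixes, TRUE)) {
    $active = array_shift($elements);
  }

  // Whatever remains is carried over to every display.
  $suffix = implode('/', $elements);

  $paths = array();
  foreach ($prefixes as $prefix) {
    // Skip empty parts so the default sort does not produce 'articles//...'.
    $parts = array_filter(array('articles', $prefix, $suffix), 'strlen');
    $paths[$prefix] = implode('/', $parts);
  }
  return array('active' => $active, 'paths' => $paths);
}
```

Called with 'articles/recents/physique' and the prefixes '', 'recents', 'aimes' and 'commentes', it reports 'recents' as the active sort and maps the default sort to 'articles/physique'.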
Implementing a custom search
The entire custom module we wrote is available below. It implements the following hooks:
- hook_apachesolr_node_exclude(): Controls whether Solr indexes certain nodes. Used to stop Solr from indexing nodes with a specific flag.
- hook_apachesolr_update_index(): Modifies the document object before indexing. Used to add the number of bookmark flags to the document object.
- hook_apachesolr_prepare_query(): Runs before the query is statically cached. Used to add new sorts via the set_available_sort() method.
- hook_apachesolr_modify_query(): Can be used to modify the query object and the search params directly. Used while debugging.
- hook_block(): Used to add a block displayed alongside Solr search results which allows users to change the sort options (by switching to a different page display for the view, with the same search parameters).
- hook_flag(): Used to mark nodes for re-indexing by Solr after flagging or unflagging.
- hook_comment(): Used to mark nodes for re-indexing by Solr after comments are added or deleted.
- hook_views_data_alter(): Used to expose the new sort on flag count to the Solr Views integration.
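The sample code at the end of this post does not show the hook_flag() and hook_comment() implementations. The sketch below is one way they might look, not the code that ran on the site. It assumes the Apache Solr module's apachesolr_mark_node() helper for queuing a node for re-indexing, the Flag module's hook_flag($event, $flag, $content_id, $account) signature, and Drupal 6's hook_comment(&$a1, $op). The stand-in function at the top exists only so the sketch can run outside Drupal.

```php
<?php
// Stand-in so this sketch runs outside Drupal. Assumption: inside Drupal the
// Apache Solr module provides apachesolr_mark_node($nid) and this is skipped.
if (!function_exists('apachesolr_mark_node')) {
  function apachesolr_mark_node($nid) {
    $GLOBALS['sp_solr_marked'][] = (int) $nid;
  }
}

/*
hook_flag() sketch: flag counts are indexed, so any flag or unflag event on
a node should queue that node for re-indexing.
*/
function sp_solr_flag($event, $flag, $content_id, $account) {
  if ($flag->content_type == 'node') {
    apachesolr_mark_node($content_id);
  }
}

/*
hook_comment() sketch: queue the parent node for re-indexing when its
comment count may have changed. $a1 is an array for form operations such as
'insert' and an object for 'delete', 'publish' and 'unpublish'.
*/
function sp_solr_comment(&$a1, $op) {
  if (in_array($op, array('insert', 'delete', 'publish', 'unpublish'))) {
    $nid = is_array($a1) ? $a1['nid'] : $a1->nid;
    apachesolr_mark_node($nid);
  }
}
```

Marking the node rather than re-indexing it immediately leaves the actual work to the next indexing run.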
If we had only needed to index CCK fields, we could have tried the Apache Solr Facet Builder module, but writing custom hooks ourselves seemed like a simpler solution since we also needed to create custom sorting options.
Performance tuning with boost and APC
Other than the Solr search engine, which is hosted on our dedicated Solr hosting service, the site is run from just one machine, despite handling over 400,000 hits per day. Since most traffic is from anonymous users, we handle this load using APC, a PHP accelerator, and the boost module, which caches pages as static HTML so that Apache does not have to run PHP more than necessary.
We ran into an "interesting" problem with boost which we ended up working around instead of actually solving. It was hard to reproduce, meaning hard to understand -- you might call it a bugfoot kind of bug. We posted details of our workaround and all the debugging information we had in the boost issue queue, but unless someone else reports this problem it will be hard to make any progress on it, since it's not clear which combination of modules was responsible.
Screenshot Gallery
Searching and browsing articles

Editing View of Solr search results

Sample code
<?php
/*
sp_solr.module
written for Agence Science-Presse http://sciencepresse.qc.ca/
by Réseau Koumbit http://koumbit.org/
*/
/*
hook_apachesolr_node_exclude() is called for each node before it's added to
the index. all nodes for which this returns TRUE are omitted.
*/
function sp_solr_apachesolr_node_exclude($node, $namespace) {
  $flag_counts = flag_get_counts('node', $node->nid);
  // skip nodes flagged for the private RSS feed
  if (is_array($flag_counts) && !empty($flag_counts['fils_rss_prive'])) {
    return TRUE;
  }
  return FALSE;
}
/*
hook_apachesolr_update_index() modifies the $document object before
indexing.
*/
function sp_solr_apachesolr_update_index(&$document, $node) {
  $flag_counts = flag_get_counts('node', $node->nid);
  // default to zero when the node has never been bookmarked
  $flag_count = 0;
  if (is_array($flag_counts) && !empty($flag_counts['bookmarks'])) {
    $flag_count = $flag_counts['bookmarks'];
  }
  $document->is_flag_count = $flag_count;
}
/*
hook_apachesolr_prepare_query() runs before hook_apachesolr_modify_query(),
before query is statically cached. new sorts must be added at this point.
*/
function sp_solr_apachesolr_prepare_query(&$query) {
  $query->set_available_sort('comment_count', array(
    'title' => t('Comment count'),
    'default' => 'desc',
  ));
  $query->set_available_sort('is_flag_count', array(
    'title' => t('Flag count'),
    'default' => 'desc',
  ));
}
/*
views doesn't allow changing the priority of different sorts via exposed
blocks, just their direction. to fake this, we create multiple displays
implementing each of the different sorts, then manually parse the URL to
link between them, giving the effect of having changed the sort parameters.
*/
function sp_solr_block($op = 'list', $delta = 0, $edit = array()) {
  if ($op == 'list') {
    $blocks[0] = array('info' => 'sp_solr: '. t('Sort order'));
    return $blocks;
  }
  elseif ($op == 'view') {
    $block = array();
    switch ($delta) {
      case 0:
        // define available sorts
        $sorts['pertinents']['active'] = TRUE;
        $sorts['recents']['active'] = FALSE;
        $sorts['aimes']['active'] = FALSE;
        $sorts['commentes']['active'] = FALSE;
        $sorts['pertinents']['title'] = 'Plus pertinents';
        $sorts['recents']['title'] = 'Plus récents';
        $sorts['aimes']['title'] = 'Plus aimés';
        $sorts['commentes']['title'] = 'Plus commentés';
        $sorts['pertinents']['path_prefix'] = '';
        $sorts['recents']['path_prefix'] = 'recents/';
        $sorts['aimes']['path_prefix'] = 'aimes/';
        $sorts['commentes']['path_prefix'] = 'commentes/';
        // inspect path to determine current sort (default: 'plus pertinents')
        $elements = explode('/', $_GET['q']);
        if (isset($elements[1]) && isset($sorts[$elements[1]])) {
          $sorts[$elements[1]]['active'] = TRUE;
          $sorts['pertinents']['active'] = FALSE;
          unset($elements[1]);
        }
        // build new path: drop the leading 'articles' element, keep the rest
        unset($elements[0]);
        $path_suffix = implode('/', $elements);
        // create output
        $content = "<ul>\n";
        foreach ($sorts as $sort => $data) {
          $options = array();
          // preserve the 's' query parameter, if present
          if (!empty($_GET['s'])) {
            $options['query'] = array('s' => $_GET['s']);
          }
          if ($data['active']) {
            $options['attributes'] = array('class' => 'active-sort');
          }
          $content .= '<li>'. l($data['title'], 'articles/'. $data['path_prefix'] . $path_suffix, $options) ."</li>\n";
        }
        $content .= "</ul>\n";
        $block['content'] = $content;
        $block['title'] = t('Sort order');
        break;
    }
    return $block;
  }
}
/*
hook_views_data_alter() exposes our new field and sort to views
*/
function sp_solr_views_data_alter(&$data) {
  $data['apachesolr']['is_flag_count'] = array(
    'title' => t('Flag count'),
    'help' => t('The number of flags for the node.'),
    'field' => array(
      'numeric' => TRUE,
      'handler' => 'views_handler_field_numeric',
      'click sortable' => TRUE,
    ),
    'sort' => array(
      'handler' => 'apachesolr_views_handler_sort',
    ),
  );
}
?>