Allowing your Internal Search Engine to Index JIRA Issues

Ben Tasker

2014-04-27 10:01

I use a number of tools on my network, including a private JIRA install (i.e. you need to log in to view anything) and the Sphider PHP search engine (I've generated a lot of documentation over the years).

Unfortunately the two aren't exactly compatible, as Sphider has no way to log into JIRA, but I wanted my JIRA issues and comments to be indexed so that relevant items can be included in my search results. One option would be to set JIRA to public mode, but I'd rather maintain the need to log in.

So instead I created a simple PHP script - JIRA Issue Listing - that generates a list Sphider can index, but redirects 'real' users to the relevant issue on JIRA.

This post is the documentation for that script.

Installing the Script

Grab the script from GitHub

The script needs to be on a server that can talk to the JIRA database, so probably the same server as JIRA (depending on your setup).

I've taken the route of setting up an Apache virtual host specifically for the script, though you could just drop it into a subdirectory on an existing domain.

Configuring the Script

The script has a number of configuration variables at the top. Most should be reasonably self-explanatory:

$conf->db = 'JiraDB'; // Jira's Database name
$conf->host = 'localhost'; // The hostname/IP of the database server
$conf->user = 'jiradbuser'; // The username to authenticate with
$conf->password = ''; // The password to use when authenticating with MySQL
$conf->dbprefix = ''; // If you've got a database prefix set, specify it here

The remaining configuration variables are explained below.

$conf->scriptname = 'index.php';

You only need to change this if you're renaming the script, or want to redirect via some other intermediate script.

$conf->jiralocation = 'http://jira.example.com';

The URL you use to access JIRA

$conf->SphiderUA = 'Sphider';

The user-agent you've configured for your search engine

$conf->SphiderIP = array('192.168.1.65/30','192.168.1.96');

A list of IPs that should be treated as authorised spiders; these are the only addresses permitted to access the issue list. You can specify individual IPs or use CIDR notation (so 192.168.1.65/30 covers 192.168.1.64 to 192.168.1.67).
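To illustrate how a list like that can be evaluated, here's a minimal sketch of a CIDR check. It isn't lifted from the script: the function name ipIsAuthorised is hypothetical, it only handles IPv4, and it assumes a 64-bit build of PHP 7 or later.

```php
<?php
// Hypothetical helper, NOT taken from the JIRA Issue Listing script.
// Returns true if $ip equals an entry in $allowed or falls inside one of
// its CIDR ranges. IPv4 only; assumes 64-bit PHP.
function ipIsAuthorised(string $ip, array $allowed): bool
{
    $ipLong = ip2long($ip);
    if ($ipLong === false) {
        return false; // not a valid IPv4 address
    }

    foreach ($allowed as $entry) {
        // An entry without a prefix length is treated as a single host (/32)
        $parts = explode('/', $entry, 2);
        $bits  = isset($parts[1]) ? (int) $parts[1] : 32;
        $base  = ip2long($parts[0]);
        if ($base === false || $bits < 0 || $bits > 32) {
            continue; // skip malformed entries
        }

        // Build the netmask; a /0 prefix matches every address
        $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;

        // Compare the network portion of both addresses
        if (($ipLong & $mask) === ($base & $mask)) {
            return true;
        }
    }

    return false;
}

$allowed = array('192.168.1.65/30', '192.168.1.96');
var_dump(ipIsAuthorised('192.168.1.66', $allowed));
var_dump(ipIsAuthorised('192.168.1.97', $allowed));
```

Because the mask is applied to both sides of the comparison, it doesn't matter that 192.168.1.65 isn't the first address in its /30 block.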

Real Users and Securing your script

Unless they're accessing from an authorised IP, requests for the full issue list won't be satisfied. A user attempting to access a single issue will be redirected to that issue on JIRA (where they'll have to authenticate).

This redirect, though, is based on the user-agent, so anyone who sets their user-agent to match your spider's can easily avoid it. It's therefore strongly recommended that you take additional precautions to restrict access to the script.
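The behaviour described above can be sketched roughly as follows. This is an illustration rather than the script's actual code: decideAction and its return values are hypothetical, the substring match on the user-agent is an assumption, and the redirect target assumes JIRA's usual /browse/KEY-N issue URLs.

```php
<?php
// Hypothetical sketch of the access decision; the real script's internals
// may differ. Returns an array of [action, redirect target or null].
function decideAction(
    array $query,
    string $userAgent,
    bool $ipAuthorised,
    string $spiderUA,
    string $jiraLocation
): array {
    // Assumption: the spider is recognised by a case-insensitive substring match
    $isSpider = stripos($userAgent, $spiderUA) !== false;

    // Single issue requested, e.g. index.php?issue=1&proj=ISS
    if (isset($query['issue'], $query['proj'])) {
        if ($isSpider) {
            return array('show_issue', null);
        }
        // Everyone else is sent to JIRA, where they'll have to log in
        $key = rawurlencode($query['proj'] . '-' . (int) $query['issue']);
        return array('redirect', rtrim($jiraLocation, '/') . '/browse/' . $key);
    }

    // The full issue list is only served to authorised IPs
    return $ipAuthorised ? array('show_list', null) : array('deny', null);
}

print_r(decideAction(
    array('issue' => '1', 'proj' => 'ISS'),
    'Mozilla/5.0',
    false,
    'Sphider',
    'http://jira.example.com'
));
```

Note that only the full list depends on the IP check here; the single issue pages hinge entirely on a header the client controls, which is why the extra precautions matter.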

The simplest method would be to implement a conditional HTTP Basic Auth requirement for that virtual host/directory, i.e. allow Sphider through without authentication, but require auth from other users:

<VirtualHost *:80>
    ServerAdmin webmaster@example.com
    DocumentRoot /var/www/jiralisting
    ServerName jiralisting.example.com
    ErrorLog logs/jiralisting-error_log
    CustomLog logs/jiralisting-access_log common

    <Directory />
        Options Indexes FollowSymLinks
        # Apache 2.2-style access control (on 2.4 this needs mod_access_compat)
        Order deny,allow
        Deny from all
        # Authorised spider IPs get through without a password
        Allow from 192.168.1.65/30
        Allow from 192.168.1.96
        # Everyone else has to authenticate
        AuthType Basic
        AuthName "Authentication is a must"
        AuthUserFile "/etc/httpd/conf/.jiralisthtpasswd"
        Require valid-user
        # "any" = allow if EITHER the IP check OR the login succeeds
        Satisfy any
    </Directory>
</VirtualHost>

Note: Basic auth sends credentials base64-encoded rather than encrypted, so if you're using it you'll probably want to serve the virtual host over HTTPS.

Configuring your Search Engine

Taking the virtual host example above, we'd simply configure our search engine to index http://jiralisting.example.com.

Preventing Indexing of the Issues List

The script is preconfigured to prevent Sphider from indexing the full list (it'll follow the links and index the individual pages, but ignore the content of the full list). If you need to add equivalent tags for a different search engine, search the script for the following tags and add yours alongside them:

<!--sphider_noindex-->
<!--/sphider_noindex-->

Script Output

The output isn't designed to be particularly human-friendly, as it should only ever really be parsed/viewed by our search engine.

Full issue list (index.php)

Issue Page (index.php?issue=1&proj=ISS)

BACK

ISS-1: Some Issue


An issue relating to something really important
Reported By: btasker	Project:My Groundbreaking project

Comments

btasker
2014-04-25 22:53:27

Seems to have been broken by gremlins

The meta description used on the Issue page is the Issue description, and the title element is a concatenation of the Issue Key (e.g. ISS-1) and the Issue summary/title.
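As a rough illustration of how those two elements might be assembled, here's a minimal sketch. The function name issueHead is hypothetical, and the ': ' separator in the title is an assumption based on the heading in the sample output above; the script itself may build them differently.

```php
<?php
// Hypothetical helper, not the script's actual code: builds the <title> and
// meta description for an issue page from its key, summary and description.
function issueHead(string $key, string $summary, string $description): string
{
    // Escape everything, as issue text may contain quotes or markup
    $title = htmlspecialchars($key . ': ' . $summary, ENT_QUOTES, 'UTF-8');
    $meta  = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');

    return "<title>{$title}</title>\n"
         . "<meta name=\"description\" content=\"{$meta}\">\n";
}

echo issueHead('ISS-1', 'Some Issue', 'An issue relating to something really important');
```

Escaping matters here because the description ends up inside an attribute value, where an unescaped quote would break the tag.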