Showing posts with label lucene. Show all posts
Showing posts with label lucene. Show all posts

Thursday 26 August 2010

Using Payloads with DisMaxQParser in SOLR

Payloads are a good way of controlling the scores in SOLR/Lucene.

This post by Grant Ingersoll gives a good introduction to payloads, I also found http://www.ultramagnus.org/?p=1 pretty useful. 

What I will describe here is how to use the payloads and have the functionalities of the DisMaxQParser in SOLR.

SOLR already has a field type for analysing payloads 




and we can also define a custom Similarity to use with the payloads



 
then specify this in the SOLR schema

<!-- schema.xml -->
<similarity class="uk.org.company.solr.PayloadSimilarity" />


 
So far so good. We now need a QueryParser plugin in order to use the payloads in the search and as mentioned above, I want to keep the functionalities of the DisMaxQueryParser.
The problem is that we need to specify PayloadTermQuery objects instead of TermQueries which is down deep in the object hierarchies and cannot AFAIK be modified simply from DismaxQueryParser.
I have implemented a modified version of DismaxQueryParser which rewrites the main part of the query (a.k.a userQuery in the implementation) and substitutes the TermQueries with PayloadTermQueries.
First we'll create a QParserPlugin 




which does not do much but simply exposes the PLDisMaxQueryParser which is a modified version of the standard DisMaxQueryParser but with PayloadQuery objects.




Once these 3 classes have been compiled, jarred and put in the classpath of SOLR, we must add 



 to solrconfig.xml.
 
then specify for the requestHandler : 
 
<str name="defType">payload</str>
 
<!-- plf : comma separated list of field names --> 
 <str name="plf">
  payloads
 </str>
 
The fields listed in the parameter plf will be queried with Payload query objects.  Remember that you can use &debugQuery=true to get the details of the scores and check that the payloads are being used.