Thursday, 26 August 2010

Using Payloads with DisMaxQParser in SOLR

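Payloads are a good way of controlling the scores in SOLR/Lucene: a payload is a small piece of data stored in the index with each occurrence of a term, and it can be turned into a score factor at query time.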

This post by Grant Ingersoll gives a good introduction to payloads; I also found http://www.ultramagnus.org/?p=1 pretty useful.

What I will describe here is how to use payloads while keeping the functionality of the DisMaxQParser in SOLR.

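SOLR already has a field type for analysing payloads: the example schema.xml that ships with SOLR defines one, named payloads, built on DelimitedPayloadTokenFilterFactory.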
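
To make the indexed format concrete, here is a minimal standalone sketch of what a delimited payload analysis step does with a token such as solr|2.5. It does not use any Lucene or SOLR classes; the class name DelimitedPayloadSketch, the '|' delimiter and the sample tokens are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;

// Standalone illustration; it does not use any Lucene or SOLR classes.
final class DelimitedPayloadSketch {

    /** A term plus its optional float payload (null when the token had no delimiter). */
    static final class Token {
        final String term;
        final Float payload;

        Token(String term, Float payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Splits "term|weight" at the first delimiter found in the raw token. */
    static Token parse(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter);
        if (idx < 0) {
            return new Token(raw, null);
        }
        String term = raw.substring(0, idx);
        float weight = Float.parseFloat(raw.substring(idx + 1));
        return new Token(term, weight);
    }

    /** Encodes the weight as 4 big-endian bytes, a compact form suitable for storage. */
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    public static void main(String[] args) {
        // Hypothetical field content: two weighted tokens and one plain token.
        for (String raw : "solr|2.5 lucene|0.5 search".split(" ")) {
            Token t = parse(raw, '|');
            System.out.println(t.term + " -> " + t.payload);
        }
    }
}
```

Only the term part is searchable; the weight travels alongside it as bytes and has no effect on scoring until something decodes it at query time.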




We can also define a custom Similarity, whose scorePayload method turns each payload into a score factor, and then specify it in the SOLR schema:

<!-- schema.xml -->
<similarity class="uk.org.company.solr.PayloadSimilarity" />
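
The decoding step that such a Similarity would typically perform can be sketched without any Lucene dependency. This is not the original PayloadSimilarity class: the class name PayloadScoreSketch and the neutral default of 1.0 are assumptions. In a real implementation the logic would sit in an override of scorePayload, whose exact signature depends on the Lucene version, and Lucene's PayloadHelper offers a decodeFloat helper for the byte conversion.

```java
import java.nio.ByteBuffer;

// Standalone illustration of the decoding step; not the original PayloadSimilarity.
final class PayloadScoreSketch {

    /**
     * Turns the stored payload bytes into a score factor.
     * Returns 1.0 (neutral) when no 4-byte float payload is available.
     */
    static float scorePayload(byte[] payload, int offset, int length) {
        if (payload == null || length < 4) {
            return 1.0f;
        }
        // ByteBuffer reads big-endian by default.
        return ByteBuffer.wrap(payload, offset, length).getFloat();
    }

    public static void main(String[] args) {
        byte[] stored = ByteBuffer.allocate(4).putFloat(2.5f).array();
        System.out.println(scorePayload(stored, 0, stored.length));
        System.out.println(scorePayload(null, 0, 0));
    }
}
```

Returning a neutral factor for missing payloads keeps terms without a weight from being penalised or dropped.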


 
So far so good. We now need a QueryParser plugin in order to use the payloads at search time and, as mentioned above, I want to keep the functionality of the DisMaxQueryParser.
The problem is that we need to create PayloadTermQuery objects instead of TermQuery objects. These are built deep down in the object hierarchy and cannot, AFAIK, be changed simply from the DisMaxQueryParser.
I have implemented a modified version of the DisMaxQueryParser which rewrites the main part of the query (a.k.a. userQuery in the implementation) and substitutes PayloadTermQueries for the TermQueries.
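
The substitution idea can be sketched with toy classes. Nothing below is Lucene or SOLR code and it is not the original implementation: Term, PayloadTerm and Group are hypothetical stand-ins for TermQuery, PayloadTermQuery and composite queries such as BooleanQuery or DisjunctionMaxQuery, and the field names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-ins for Lucene's query classes, for illustration only.
final class PayloadRewriteSketch {

    interface Query {}

    /** Stand-in for TermQuery. */
    static final class Term implements Query {
        final String field;
        final String text;
        Term(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    /** Stand-in for PayloadTermQuery. */
    static final class PayloadTerm implements Query {
        final String field;
        final String text;
        PayloadTerm(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return "payload(" + field + ":" + text + ")"; }
    }

    /** Stand-in for composite queries that hold nested clauses. */
    static final class Group implements Query {
        final List<Query> clauses;
        Group(List<Query> clauses) { this.clauses = clauses; }
        @Override public String toString() { return clauses.toString(); }
    }

    /** Walks the tree and swaps term leaves on payload fields for payload-aware ones. */
    static Query rewrite(Query q, Set<String> payloadFields) {
        if (q instanceof Term) {
            Term t = (Term) q;
            if (payloadFields.contains(t.field)) {
                return new PayloadTerm(t.field, t.text);
            }
            return t;
        }
        if (q instanceof Group) {
            List<Query> rewritten = new ArrayList<Query>();
            for (Query clause : ((Group) q).clauses) {
                rewritten.add(rewrite(clause, payloadFields));
            }
            return new Group(rewritten);
        }
        return q; // any other query type is left untouched
    }

    public static void main(String[] args) {
        Query userQuery = new Group(Arrays.<Query>asList(
                new Term("title", "solr"), new Term("payloads", "solr")));
        Set<String> payloadFields = new HashSet<String>(Arrays.asList("payloads"));
        System.out.println(rewrite(userQuery, payloadFields));
    }
}
```

Rewriting the finished query tree, rather than changing how each leaf is created, is what lets the rest of the DisMax behaviour stay as it is.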
First we'll create a QParserPlugin, which does not do much: it simply exposes the PLDisMaxQueryParser, a modified version of the standard DisMaxQueryParser that builds PayloadTermQuery objects.




Once these 3 classes have been compiled, jarred and put in the classpath of SOLR, we must register the plugin in solrconfig.xml with a queryParser element named payload.
 
Then specify for the requestHandler:
 
<str name="defType">payload</str>
 
<!-- plf : comma separated list of field names --> 
 <str name="plf">
  payloads
 </str>
 
The fields listed in the plf parameter will be queried with PayloadTermQuery objects. Remember that you can use &debugQuery=true to get the details of the scores and check that the payloads are being used.
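
As a rough sketch of how a comma-separated plf value could be turned into a set of field names, here is a standalone example. How the original parser reads plf is not shown in this post, so the class name PlfParamSketch and the trimming behaviour are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Standalone illustration; the original parser's handling of plf is not shown in the post.
final class PlfParamSketch {

    /** Splits a comma-separated plf value into trimmed, non-empty field names. */
    static Set<String> parseFields(String plf) {
        Set<String> fields = new LinkedHashSet<String>();
        if (plf == null) {
            return fields;
        }
        for (String name : plf.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The value as it would arrive from the multi-line <str> element above.
        System.out.println(parseFields("\n  payloads\n "));
        // A hypothetical second field, to show the comma handling.
        System.out.println(parseFields("payloads, tags"));
    }
}
```

Trimming matters because a value written across several lines in solrconfig.xml carries its surrounding whitespace with it.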
 
 

7 comments:

  1. Awesome post. This was exactly what I needed, well done.

  2. Great post. It works well for DisMax. Would it be possible to use the same technique for Extended DisMax (edismax)?

  3. Thanks. Am pretty sure the same could be done with edismax (note : I haven't looked at the edismax code). Please post a comment if you manage to get it to work

  4. Hi great post, but i can't get the whole thing working.
    I compiled the classes in a jar and added to my classpath (lib dir)
    i've a field with payloads and debugging the query i get:
    PLDisMaxQParser
    so the modified parser is used however the payloads are not used
    checking the debugQuery...
    I've not specified a request handler but i'm running the query using
    select/?q=&plf=&defType=payload&qf=

    Am i doing something wrong?

    Replies
    1. Sorry, the request is not the one above but the following

      select/?q=something&plf=fieldwithpayload&defType=payload&qf=fields

      Furthermore i forgot to say that my payload
      value is a multivalued field.
      May this be the problem?

  5. The last bit, "then specify for the requestHandler :", does it mean I have to create a new one or just modify the dismax requestHandler to payload?

  6. It's been a long time since I wrote this and I can't remember the details. Maybe try modifying the existing one to see if it works? Sorry not to be more helpful
