Override "fl" parameter in Solr using SolrParams in a custom SearchComponent - solr

I have an interesting use case for a Solr implementation we have, where there are some fields in the Solr schema that shouldn't be returned when doing a query. The ideal solution is to change the calling program so it doesn't query for &fl=score like it does now and only requests the necessary fields, but that won't happen in the short term, so in the meantime we have to filter out some fields from the Solr response.
The approach we think has the smallest performance impact (let me know if there is a better way to do this) is to override the &fl= parameter so it lists all the fields but the ones that should be filtered out. For this, we added a new SearchComponent to the RequestHandler components list that modifies the &fl parameter. The issue we ran into with this approach is that once we get the SolrParams from the SolrQueryRequest, it cannot be modified (which I think is the right thing, since it could be changing something another SearchComponent relies on). But we still need to find a way to remove these extra fields.
So, this is the code we started to write:
public void prepare(ResponseBuilder rb) throws IOException {
    SolrQueryRequest req = rb.req;
    SolrParams params = req.getParams();
    String fl = params.get("fl");
    // Remove the "fl" parameter from params and replace it with a new list:
    // Cannot be done
    ...
And ran into the issue of not being able to add to the SolrParams.
As a plan B, that same SearchComponent is removing the fields in the process() method, but doing it this way is slower. The code has to go through the resulting SolrDocumentList and, for each SolrDocument, call removeFields(), something similar to this (simplified code):
public void process(ResponseBuilder rb) throws IOException {
    ...
    SolrQueryResponse rsp = rb.rsp;
    NamedList values = rsp.getValues();
    SolrDocumentList docs = (SolrDocumentList) values.get("response");
    Iterator<SolrDocument> docsIterator = docs.iterator();
    while (docsIterator.hasNext()) {
        SolrDocument sd = docsIterator.next();
        sd.removeFields(field);
    ...
Any ideas on how/if this can be achieved?
Thanks for any suggestion!

With your own SearchHandler you can specify invariants (things that will always be fixed, no matter the request) on any query parameter, including &fl.
It's something along the lines of:
<requestHandler name="filtered" class="solr.StandardRequestHandler">
<lst name="invariants">
<str name="fl">score,id,something_else,etc.</bool>
</lst>
</requestHandler>
More documentation:
http://wiki.apache.org/solr/SearchHandler
The only problem is that, for now, there's no negative fl parameter (i.e. return all fields except those I'm telling you): https://issues.apache.org/jira/browse/SOLR-3191
Finally, to specify which SearchHandler you want to use at query time, simply add &qt=filtered (or whatever name you used for it).
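For example, assuming a default local Solr instance (host, port, and core path are just placeholders), the request would look like this:

http://localhost:8983/solr/select?q=*:*&qt=filtered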

Try removing the fields that you don't want from the ReturnFields object.
For example, something like this:
@Override
public void process(ResponseBuilder rb) throws IOException {
    String fl = rb.req.getParams().get(CommonParams.FL);
    List<String> fields = Lists.newArrayList(fl.split(","));
    List<String> newFields = Lists.newArrayList();
    for (String field : fields) {
        if (!field.equals("score")) {
            newFields.add(field);
        }
    }
    String newFl = Joiner.on(",").join(newFields);
    ReturnFields returnFields = new ReturnFields(newFl, rb.req);
    rb.rsp.setReturnFields(returnFields);
}
I've set the custom SearchComponent in "last-components" in solrconfig.xml.
P.S.: I was using the Guava library for Lists and Joiner.
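For reference, the registration in solrconfig.xml could look roughly like the sketch below; the component name and the implementing class are made up for the example, not taken from the question.

<searchComponent name="flFilter" class="com.example.FlFilterComponent"/>
<requestHandler name="/select" class="solr.SearchHandler">
    <arr name="last-components">
        <str>flFilter</str>
    </arr>
</requestHandler>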

Related

How to overwrite the Solr document field?

<arr name="itemDescSpell">
<str>Cable Tie, 4.0L (102mm), Miniature, Nyl</str>
<str>Cable Tie, 4.0L (102mm), Miniature, Nyl</str>
</arr>
itemDescSpell is a copyField destination, and it causes an error each time the Solr document is updated. I don't want to make the field multiValued="true".
In the schema, the field and its copyField are defined like below:
<field name="itemDescSpell" type="textSpell"/>
<copyField source="description" dest="itemDescSpell"/>
The error is:
multiple values encountered for non multiValued field itemDescSpell.
Is anybody able to help me solve this problem via SolrJ, while keeping this field type as textSpell?
Try using a custom UpdateRequestProcessor to overwrite the value present in the itemDescSpell field. Solr will throw the exception you get when a copyField destination is already populated, so what you want to do is remove the copyField line from your schema and add a custom UpdateRequestProcessor to your config that could look like this:
public class CustomFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new Custom(next);
    }

    public class Custom extends UpdateRequestProcessor {
        public Custom(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            cmd.solrDoc.setField("foo", cmd.solrDoc.getFieldValue("bar"));
            super.processAdd(cmd); // pass the document on to the rest of the chain
        }
    }
}
This is NOT production-ready code, but it should give you an idea of how the final code should look. To customize the field values you can override the init method on the factory and pass those in via the config.
The main difference is that Solr uses addField() when it encounters a copyField, while this class uses setField().
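If it helps, wiring such a factory in typically looks something like the sketch below in solrconfig.xml; the chain name and factory class are placeholders, and the chain is then selected on updates (for example with &update.chain=overwrite-copyfield) or set as the default on the update handler.

<updateRequestProcessorChain name="overwrite-copyfield">
    <processor class="com.example.CustomFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>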

Spring data mongo ConditionalGenericConverter empty TypeDescriptor

I'm trying to implement a somewhat general converter, which transforms the data based on a given annotation. Say I want to transform these annotated strings in some manner.
All is well until the code hits my converter's "matches" method. The "sourceType" I'm getting is always stripped of all the useful information. Has anyone had any success with such a setup, or am I missing something?
public class TestStringWriteConverter implements ConditionalGenericConverter {
    @Override
    public boolean matches(TypeDescriptor sourceType, TypeDescriptor targetType) {
        if (sourceType.hasAnnotation(GivenAnnotation.class)) {
            // -> never gets executed, because sourceType is stripped of its useful info
        }
    }
I traced the problem to MappingMongoConverter in the package org.springframework.data.mongodb.core.convert:
protected void writeInternal(Object obj, final DBObject dbo, MongoPersistentEntity<?> entity) {
    //...
    if (null != propertyObj) {
        if (!conversions.isSimpleType(propertyObj.getClass())) {
            writePropertyInternal(propertyObj, dbo, prop);
        } else {
            // I always end up here, which is correct, but the whole prop object is being
            // omitted in favor of the getFieldName() property
            writeSimpleInternal(propertyObj, dbo, prop.getFieldName());
        }
    }
}
The spring versions I'm using:
<spring.version>3.2.5.RELEASE</spring.version>
<spring.data.version>1.3.2.RELEASE</spring.data.version>
Any help is much appreciated.
I think you misunderstand what sourceType.hasAnnotation(…) actually returns. As the name suggests, it inspects the type for annotations. So for a given type like this:
@MyAnnotation
class Foo { }
it would allow you to find #MyAnnotation. However you are writing about "annotated strings". I assume you mean something like:
class Bar {
    @MyAnnotation
    String property;
}
This is not a type annotation and the Converter API is not meant to cover such cases. If you think supporting such scenarios would be worthwhile, please file a JIRA ticket.

Solr, Special Characters, and the MultiFieldQueryParser

I need to programmatically build boolean queries against multiple Solr fields. I thought that the Lucene MultiFieldQueryParser would be a good way to go. This works well except when special characters are involved.
public class QueryParserSpike {

    String userQuery = "(-)-foo";
    String escapedQuery = ClientUtils.escapeQueryChars(userQuery); // \(\-\)\-foo
    Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_43);
    QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_43, new String[]{"a"}, analyzer);

    @Test(expected = ParseException.class)
    public void testNoEscape() throws Exception {
        parser.parse(userQuery); // Throws an exception
    }

    @Test
    public void testEscape() throws Exception {
        Query q = parser.parse(escapedQuery);
        System.out.println(q.toString()); // a:(-)-foo (This can't be parsed by Solr)
    }

    @Test
    public void testDoubleEscape() throws Exception {
        String doubleEscapedQuery = escapedQuery.replaceAll("\\\\", "\\\\\\\\");
        Query q = parser.parse(doubleEscapedQuery);
        System.out.println(q.toString()); // (a:\) (a:\-\) (a:\-foo) (This isn't the correct query)
    }
}
What I'm trying to get out of this would be a:\(\-\)\-foo. Is there a Solr class that does something similar? Or is the best option to write something to process the result of the MultiFieldQueryParser myself?
What Query.toString() produces is a best effort at a user-readable query. It is not necessarily a parsable query, as in this case. You can never rely on logic like parser.parse(query.toString()). The Lucene Query API is capable of expressing many things that there is no way at all to express with the QueryParser syntax.
The method you use to escape the query in testEscape() should be correct, and give you the query you are looking for. You could also use QueryParser.escape(userQuery), for the raw Lucene method.
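A minimal sketch of that pure-Lucene variant, using the same field and analyzer as in the test class above:

String escaped = QueryParser.escape(userQuery); // escapes (, ), -, etc., like ClientUtils.escapeQueryChars
Query q = parser.parse(escaped);
// q is now a term query on field "a" for the literal text "(-)-foo";
// its toString() is only a readable rendering and is not guaranteed to be re-parsable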

Limiting terms in Solr's TermsComponent to terms originating from certain documents

I am using Solr's TermsComponent to implement an autocomplete feature. My documents contain tags, which I have indexed in a "tags" field. Now I can use TermsComponent to find out which tags are used in all the stored documents. This works pretty well so far.
However, there is an additional requirement: every document has an owner field which contains the ID of the user who owns it. The autocomplete list should only contain tags from documents that are actually owned by the user requesting the autocomplete.
I have tried to set the query parameter, however this seems to be ignored by the TermsComponent:
public static List<String> findUniqueTags(String beginningWith, User owner) throws IOException {
    SolrParams q = new SolrQuery().setQueryType("/terms")
            .setQuery("owner:" + owner.id.toString())
            .set(TermsParams.TERMS, true).set(TermsParams.TERMS_FIELD, "tags")
            .set(TermsParams.TERMS_LOWER, beginningWith)
            .set(TermsParams.TERMS_LOWER_INCLUSIVE, false)
            .set(TermsParams.TERMS_PREFIX_STR, beginningWith);
    QueryResponse queryResponse;
    try {
        queryResponse = getSolrServer().query(q);
    } catch (SolrServerException e) {
        Logger.error(e, "Error when querying server.");
        throw new IOException(e);
    }
    NamedList tags = (NamedList) ((NamedList) queryResponse.getResponse().get("terms")).get("tags");
    List<String> result = new ArrayList<String>();
    for (Iterator iterator = tags.iterator(); iterator.hasNext();) {
        Map.Entry tag = (Map.Entry) iterator.next();
        result.add(tag.getKey().toString());
    }
    return result;
}
So is there a way of limiting the tags returned by TermsComponent, or do I have to manually query all the tags of the user and filter them myself?
According to this and that post on the Solr mailing list, filtering on the terms component is not possible because it operates on raw index data.
Apparently, the Solr developers are working on a real autosuggest component that supports your filtering.
Depending on your requirements you might be able to use the faceting component for autocomplete instead of the terms component. It fully supports filter queries for reducing the set of eligible tags to a subset of the documents in the index.
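As an illustration only, a facet-based version of the method above might look roughly like this (standard SolrJ calls; the field names come from the question, the rest is assumed):

SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("owner:" + owner.id.toString()); // restrict to the requesting user's documents
q.setRows(0);                                     // only the facet counts are needed, not the documents
q.setFacet(true);
q.addFacetField("tags");
q.setFacetPrefix(beginningWith);
q.setFacetMinCount(1);
QueryResponse rsp = getSolrServer().query(q);
List<String> result = new ArrayList<String>();
for (FacetField.Count count : rsp.getFacetField("tags").getValues()) {
    result.add(count.getName());
}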

Getting column length from Hibernate mappings?

To validate data I am receiving, I need to make sure that its length is not going to exceed the database column length. Now, all the length information is stored in the Hibernate mapping files; is there any way to access this information programmatically?
You can get to it but it's not easy. You might want to do something like below at startup and store a static cache of the values. There are a lot of special cases to deal with (inheritance, etc), but it should work for simple single-column mappings. I might have left out some instanceof and null checks.
for (Iterator iter = configuration.getClassMappings(); iter.hasNext();) {
    PersistentClass persistentClass = (PersistentClass) iter.next();
    for (Iterator iter2 = persistentClass.getPropertyIterator(); iter2.hasNext();) {
        Property property = (Property) iter2.next();
        String className = persistentClass.getClassName();
        String attribute = property.getName();
        int length = ((Column) property.getColumnIterator().next()).getLength();
    }
}
Based on Brian's answer, this is what I ended up doing.
private static final Configuration configuration = new Configuration().configure();

public static int getColumnLength(String className, String propertyName) {
    PersistentClass persistentClass = configuration.getClassMapping(className);
    Property property = persistentClass.getProperty(propertyName);
    int length = ((Column) property.getColumnIterator().next()).getLength();
    return length;
}
This appears to be working well. Hope this is helpful to anyone who stumbles upon this question.
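Usage then stays a one-liner; the entity, property, and variable names here are only examples:

int maxLength = getColumnLength(MyEntity.class.getName(), "myField");
boolean fits = incomingValue.length() <= maxLength; // validate incoming data before persisting it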
My preferred development pattern is to base the column length on a constant, which can be easily referenced:
class MyEntity {
    public static final int MY_FIELD_LENGTH = 500;

    @Column(length = MY_FIELD_LENGTH)
    String myField;
    ...
}
Sometimes it may be a problem to get the Configuration object (if you are using an application framework and are not creating the session factory yourself from the Configuration).
If you are using Spring, for example, you can use the LocalSessionFactoryBean (from your applicationContext) to obtain the Configuration object. Then obtaining the column length is just a piece of cake ;)
factoryBean.getConfiguration().getClassMapping(entityName).getTable().getColumn(column).getLength()
However, when I try to access the LocalSessionFactoryBean, I get a class cast exception:
LocalSessionFactoryBean factoryBean = (LocalSessionFactoryBean) WebHelper.instance().getBean("sessionFactory");
exception:
org.hibernate.impl.SessionFactoryImpl cannot be cast to org.springframework.orm.hibernate3.LocalSessionFactoryBean
<bean id="sessionFactory"
class="org.springframework.orm.hibernate3.LocalSessionFactoryBean>
This seems devious....
EDIT: Found the answer. You need to use an ampersand in front of the bean name string:
LocalSessionFactoryBean factoryBean = (LocalSessionFactoryBean) WebHelper.instance().getBean("&sessionFactory");
see this Spring forum post
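Putting the two pieces together, a minimal sketch (WebHelper comes from the snippet above; the entity and property names are assumed):

LocalSessionFactoryBean factoryBean =
        (LocalSessionFactoryBean) WebHelper.instance().getBean("&sessionFactory"); // '&' returns the FactoryBean itself
Configuration cfg = factoryBean.getConfiguration();
PersistentClass persistentClass = cfg.getClassMapping(MyEntity.class.getName());
int length = ((Column) persistentClass.getProperty("myField").getColumnIterator().next()).getLength();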
