Solr, Special Characters, and the MultiFieldQueryParser

I need to programmatically build boolean queries against multiple Solr fields. I thought the Lucene MultiFieldQueryParser would be a good way to go. It works well, except when special characters are involved.
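import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.junit.Test;

public class QueryParserSpike {
    String userQuery = "(-)-foo";
    String escapedQuery = ClientUtils.escapeQueryChars(userQuery); // \(\-\)\-foo
    Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_43);
    QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_43, new String[]{"a"}, analyzer);

    @Test(expected = ParseException.class)
    public void testNoEscape() throws Exception {
        parser.parse(userQuery); // Throws an exception
    }

    @Test
    public void testEscape() throws Exception {
        Query q = parser.parse(escapedQuery);
        System.out.println(q.toString()); // a:(-)-foo (This can't be parsed by Solr)
    }

    @Test
    public void testDoubleEscape() throws Exception {
        String doubleEscapedQuery = escapedQuery.replaceAll("\\\\", "\\\\\\\\");
        Query q = parser.parse(doubleEscapedQuery);
        System.out.println(q.toString()); // (a:\) (a:\-\) (a:\-foo) (This isn't the correct query)
    }
}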
What I'm trying to get is a:\(\-\)\-foo. Is there a Solr class that does something similar? Or is the best option to write something myself that processes the result of the MultiFieldQueryParser?

The string returned by Query.toString() is a best-effort, human-readable rendering of the query. It is not necessarily parsable, as in this case. You can never rely on logic like parser.parse(query.toString()): the Lucene Query API can express many things that cannot be expressed at all in QueryParser syntax.
The way you escape the query in testEscape() should be correct and should give you the query you are looking for; it is only the toString() output that looks unescaped. You could also use QueryParser.escape(userQuery) for the raw Lucene equivalent.
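If the end goal is a query string that Solr can parse, one option is to skip the Lucene Query object entirely and build the string directly from escaped input. The sketch below is illustrative only: the class and method names are hypothetical, the escaped character set is modeled on what ClientUtils.escapeQueryChars handles in Solr 4.x (in real code, call that method instead), and joining the fields with OR is an assumption about the boolean semantics you want.

```java
import java.util.List;

public class MultiFieldQuerySketch {

    // Character set modeled on SolrJ's ClientUtils.escapeQueryChars (Solr 4.x);
    // treat the exact set as an assumption and check it against your version.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefixes every query-syntax special character (and whitespace) with a backslash.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds "field1:term OR field2:term ..." as a string, so the escaping is kept
    // instead of being lost in a Query.toString() round trip.
    static String buildMultiFieldQuery(List<String> fields, String userInput) {
        String term = escapeQueryChars(userInput);
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildMultiFieldQuery(List.of("a"), "(-)-foo"));      // a:\(\-\)\-foo
        System.out.println(buildMultiFieldQuery(List.of("a", "b"), "(-)-foo")); // a:\(\-\)\-foo OR b:\(\-\)\-foo
    }
}
```

Because whitespace is escaped too, multi-word input stays a single term per field; if you want each word matched separately, split the input first and escape each word on its own.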

Related

Is there a way to avoid explicitly writing document fields as Strings in Spring Data MongoDB queries?

I have recently started to use Spring Data MongoDB and I wonder if there is any way to avoid writing entities' attributes explicitly as they are stored in the database. For example, given the following class representing a MongoDB collection:
public class Employee {
@Id
public String id;
private double salary;
...
}
If I want to make a query using MongoTemplate like:
public List findEmployeeBySalaryRange(double salary) {
Query query = new Query();
query.addCriteria(Criteria.where("salary").lt(salary));
...
}
I would like to avoid writing "salary", since that will make the code harder to maintain in the future in case the field name changes. I am thinking of something like getting the field name from the class attribute, but I am not quite sure how. Is there a way to do it? I have looked into the documentation but did not find anything related unless I missed it.
Thanks in advance.

You can create a utility class that holds all the database field names, annotate each entity field with @Field using a constant from that class, and use the same constant in your queries. That avoids error-prone hard-coded strings.
In the Employee model:
@Field(DbFields.SALARY)
private double salary;
In the query:
query.addCriteria(Criteria.where(DbFields.SALARY).lt(salary));
In the DbFields utility class:
public final class DbFields {
    public static final String SALARY = "salary";
    private DbFields() {}
}
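This works because Java annotation values must be compile-time constants, so SALARY has to be a static final String initialized with a literal. The self-contained sketch below uses a hypothetical @StoredAs annotation as a stand-in for Spring Data's @Field (so it runs without Spring on the classpath) and reads the stored name back via reflection, showing that the mapping and the query can share a single constant.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class FieldConstantSketch {

    // Hypothetical stand-in for Spring Data's @Field, so this runs without Spring.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface StoredAs {
        String value();
    }

    // Single source of truth for stored field names.
    static final class DbFields {
        // Must be a compile-time constant to be usable as an annotation value.
        static final String SALARY = "salary";

        private DbFields() {}
    }

    static class Employee {
        @StoredAs(DbFields.SALARY)
        private double salary;
    }

    // Reads the stored name from the annotation on a Java field; in this sketch,
    // falls back to the Java field name when the annotation is absent.
    static String storedName(Class<?> type, String javaField) {
        try {
            StoredAs ann = type.getDeclaredField(javaField).getAnnotation(StoredAs.class);
            return ann != null ? ann.value() : javaField;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field " + javaField + " on " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // A query would use DbFields.SALARY directly, e.g. Criteria.where(DbFields.SALARY).
        System.out.println(storedName(Employee.class, "salary")); // salary
    }
}
```

Renaming the stored field then means changing one literal in DbFields; both the mapping and every query that uses the constant pick up the change at compile time.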

adding new methods to LINQ to Entities

Is there any way to define the SQL conversion for additional functions in LINQ to Entities?
For example:
myQuery.Where(entity => entity.Contains("foo", SearchFlags.All))
Ideally I am looking for something that doesn't require editing EntityFramework.dll and building a new version of it. Is there any way to write extension methods for Entity Framework that support SQL generation?
So far I have a template which would represent the method I need to replace for LINQ to Entities:
public static bool Contains(this object source, string searchTerms, SearchFlags flags)
{
return true;
}
Of course this causes the error:
LINQ to Entities does not recognize the method 'Boolean CONTAINS(System.Object, System.String, SearchFlags)' method, and this method cannot be translated into a store expression.
To be clear, I don't want to do:
myQuery.AsEnumerable().Where(entity => entity.Contains("foo", SearchFlags.All))
because I want the filter to execute in SQL rather than pulling every entity into memory and filtering it there.
I also cannot take the .ToString() of the IQueryable and execute it manually, because I need Entity Framework to populate the objects from several .Include joins.

I don't fully understand your question. However, if the problem is that you can't use your own methods or other LINQ to Objects methods, just call .AsEnumerable() and do the rest of the work with LINQ to Objects rather than L2E:
myQuery.AsEnumerable().Where(entity => entity.Contains("foo", SearchFlags.All))
And if you need to use myQuery several times elsewhere, first load it into memory, then query it as many times as you want:
var myQuery = from e in context.myEntities
select e;
myQuery.Load();
// ...
var myOtherQuery = from d in context.myEntities.Local
select d;
// Now any L2O method is supported...

I ended up doing the following (which works but is very far from perfect):
All my entities implement an IEntity interface, which defines long Id { get; set; }.
I then added a redundant restriction, context.myEntities.Where(entity => entity.Id != 0). It is redundant because the identity starts at 1, but LINQ to Entities doesn't know that.
After applying all my other query operators, I call .ToString() on the IQueryable. Since it is of type DbQuery<Entity>, this returns the SQL command text, and I do a simple string replace to swap in my own query restriction.
To get all the .Include(...) calls to work, I actually execute two different SQL commands. There is no cleaner way to tap into this, because query execution plan caching causes issues otherwise (even when disabled).
As a result my code looks like this:
public IQueryable<IEntity> MyNewFunction(IQueryable<IEntity> myQueryable, string queryRestriction)
{
string rawSQL = myQueryable.Select(entity => entity.Id).ToString().Replace("[Extent1].Id <> 0", queryRestriction);
List<long> ids = // now execute rawSQL, get the list of ids;
return myQueryable.Where(entity => ids.Contains(entity.Id));
}
In short, other than manually executing the SQL (or running a similar SQL command and appending the restriction to the existing commands), the only way to add your own methods to LINQ to Entities is to modify and build your own EntityFramework.dll from the EF6 source.
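The string-replace step in this workaround is fragile: if the generated SQL does not contain the sentinel text exactly (for example, if a provider formats the predicate differently), a plain Replace silently returns the SQL unchanged and the custom restriction is lost. A small guard makes that failure loud. The idea is language-neutral and is sketched here in Java; the class name, the generated SQL, and the CONTAINS restriction in main are hypothetical examples, not actual Entity Framework output.

```java
public class SentinelSqlRewrite {

    // Replaces exactly one occurrence of the sentinel predicate in generated SQL.
    // Unlike a plain replace, it fails loudly if the sentinel is missing or repeated,
    // instead of silently returning SQL without the custom restriction.
    static String replaceSentinel(String sql, String sentinel, String restriction) {
        int first = sql.indexOf(sentinel);
        if (first < 0) {
            throw new IllegalStateException("Sentinel not found in generated SQL: " + sentinel);
        }
        if (sql.indexOf(sentinel, first + sentinel.length()) >= 0) {
            throw new IllegalStateException("Sentinel occurs more than once: " + sentinel);
        }
        return sql.substring(0, first) + restriction + sql.substring(first + sentinel.length());
    }

    public static void main(String[] args) {
        // Hypothetical generated SQL and restriction, for illustration only.
        String generated = "SELECT [Extent1].[Id] FROM [dbo].[Entities] AS [Extent1] WHERE [Extent1].Id <> 0";
        System.out.println(replaceSentinel(generated, "[Extent1].Id <> 0", "CONTAINS([Extent1].[Body], 'foo')"));
    }
}
```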

Queries with Objectify: UmbrellaException

I am using Objectify to manage the GAE Datastore for my GWT app. The problem is that I am not using queries properly, and I get UmbrellaExceptions like the one below:
Caused by: java.lang.RuntimeException: Server Error: java.lang.String cannot be cast to java.lang.Number
at com.google.web.bindery.requestfactory.shared.Receiver.onFailure(Receiver.java:44)
Say I have a class Box with a unique String field id. I want to get the Box object whose id is "cHVQP6zZiUjM".
This is how I do it now:
public Box getBox(String boxId)
{
Objectify ofy = ObjectifyService.begin();
Query<Box> q=ofy.query(Box.class).filter("id",boxId);
Box targetBox = q.get();
return targetBox;
}
@Entity
public class Box extends DatastoreObject{
private String id;
private String title;
}
I tried doing this with ofy.load(), but that method is not defined on my Objectify class (I don't know why).

Your key is encoded. Try decoding it with:
Box targetBox = ofy.get(Box.class, KeyFactory.stringToKey(boxId));

The short answer: You are missing the @Id annotation in your entity.
The long answer: Id fields are special in the datastore. The id is not a real property, but rather a part of the Key that identifies the entity. You can't really filter on id fields, but you can filter on a special field called __key__. Objectify is somewhat clever about letting you filter by the id field and converting this to a __key__ filter under the covers, but it can't do it if you don't annotate the entity properly!
Actually I'm a little confused, because Objectify shouldn't let you register the entity without an @Id field.
By the way, there are two sections of the documentation: Objectify4 (release coming soon) and Objectify3. Since you're using Ofy3, there is no load() method.
Another thing: Get-by-key operations are strongly preferred to queries when the operations are equivalent (as they are in your example).

Override "fl" parameter in Solr using SolrParams in a custom SearchComponent

I have an interesting use case for one of our Solr implementations: some fields in the Solr schema shouldn't be returned when doing a query. The ideal solution would be to change the calling program so that it requests only the necessary fields instead of &fl=score as it does now, but that won't happen in the short term, so in the meantime we have to filter some fields out of the Solr response.
The approach we think has the smallest performance impact (let me know if there is a better way) is to override the &fl= parameter so that it lists every field except the ones that should be filtered out. To do this, we added a new SearchComponent to the RequestHandler's components list that modifies the &fl parameter. The issue we ran into is that once we get the SolrParams from the SolrQueryRequest, they cannot be modified (which I think is the right design, since changing them could break something another SearchComponent relies on). But we still need a way to remove these extra fields.
So, this is the code we started to write:
public void prepare(ResponseBuilder rb) throws IOException {
SolrQueryRequest req = rb.req;
SolrParams params = req.getParams();
String fl = params.get("fl");
// Remove the "fl" parameter from params and replace it with a new list:
// Cannot be done
...
And ran into the issue that SolrParams cannot be modified.
As a plan B, the same SearchComponent removes the fields in the process() method, but this is slower: the code has to iterate over the resulting SolrDocumentList and call removeFields() on each SolrDocument, something like this (simplified):
public void process(ResponseBuilder rb) throws IOException {
...
SolrQueryResponse rsp = rb.rsp;
NamedList values = rsp.getValues();
SolrDocumentList docs = (SolrDocumentList) values.get("response");
Iterator<SolrDocument> docsIterator = docs.iterator();
while (docsIterator.hasNext()) {
SolrDocument sd = docsIterator.next();
sd.removeFields(field);
...
Any ideas on how/if this can be achieved?
Thanks for any suggestion!

With your own SearchHandler you can specify invariants (parameters that stay fixed no matter what the request says) for any query parameter, including &fl.
It's something along the lines of:
<requestHandler name="filtered" class="solr.StandardRequestHandler">
<lst name="invariants">
<str name="fl">score,id,something_else,etc.</str>
</lst>
</requestHandler>
More documentation:
http://wiki.apache.org/solr/SearchHandler
The only problem is that, for now, there's no negative fl parameter (i.e. "return all fields except the ones I list"): https://issues.apache.org/jira/browse/SOLR-3191
Finally, to choose which SearchHandler to use at query time, simply add &qt=filtered (or whatever name you gave it).

Try removing the fields that you don't want from the ReturnFields object.
For example, something like this:
@Override
public void process(ResponseBuilder rb) throws IOException {
    String fl = rb.req.getParams().get(CommonParams.FL);
    if (fl == null) {
        return; // no fl requested, so score is not returned anyway
    }
    List<String> fields = Lists.newArrayList(fl.split(","));
    List<String> newFields = Lists.newArrayList();
    for (String field : fields) {
        // trim so that "id, score" is handled like "id,score"
        if (!field.trim().equals("score")) {
            newFields.add(field.trim());
        }
    }
    String newFl = Joiner.on(",").join(newFields);
    ReturnFields returnFields = new ReturnFields(newFl, rb.req);
    rb.rsp.setReturnFields(returnFields);
}
I registered the custom SearchComponent under "last-components" in solrconfig.xml.
P.S.: I used the Guava library for Lists and Joiner.
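The field-list filtering in that answer can also be done without Guava. Below is a minimal plain-Java sketch (class and method names are hypothetical) that trims whitespace around entries, drops empty entries, and returns null when fl was absent. Like the original, it assumes fl is comma-separated, although Solr also accepts spaces as separators. The result would take the place of newFl when constructing ReturnFields.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class FlFilterSketch {

    // Returns fl with the excluded names removed, or null when fl was absent.
    // Entries are trimmed so "id, score" is handled like "id,score"; empty entries are dropped.
    static String filterFl(String fl, Set<String> excluded) {
        if (fl == null) {
            return null; // nothing requested explicitly, so leave Solr's default alone
        }
        return Arrays.stream(fl.split(","))
                .map(String::trim)
                .filter(f -> !f.isEmpty() && !excluded.contains(f))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(filterFl("score,id, title", Set.of("score"))); // id,title
    }
}
```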

Problem in solrQuery.setFilterQueries() Method

I have the following query, which I took from my URL:
public static String query="pen&mq=pen&f=owners%5B%22abc%22%5D&f=application_type%5B%22cde%22%5D";
public static String q="pen";
I parsed the query string, extracted each facet name and facet value, and stored them in a map:
String querydec = URLDecoder.decode(query, "UTF-8");
String[] facetswithval = querydec.split("&f=");
Map<String, String> facetMap = new HashMap<String, String>();
for (int i = 1; i < facetswithval.length; i++) {
String[] fsplit = facetswithval[i].split("\\[\"");
String[] value = fsplit[1].split("\"\\]");
facetMap.put(fsplit[0], value[0]);
}
Then I use the following code to query Solr using SolrJ:
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(q);
for (Iterator<String> iter = facetMap.keySet().iterator(); iter.hasNext();){
String key=iter.next();
System.out.println("key="+key+"::value="+facetMap.get(key));
solrQuery.setFilterQueries(key+":"+facetMap.get(key));
}
solrQuery.setRows(MAX_ROW_NUM);
QueryResponse qr = server.query(solrQuery);
SolrDocumentList sdl = qr.getResults();
But after running my code I found that the solrQuery.setFilterQueries method only keeps the filter for the last facet set. That is, if the loop calls this method three times, only the last filter values take effect.
Can somebody please clarify this and suggest a better approach? Also, I am decoding the URL, so if a facet value contains a special character in the middle, I get no results for it. I tried it without encoding too, but that didn't work either. :(

There is also an addFilterQuery method. I would call that instead, since you are adding the filter queries one at a time in your for loop; setFilterQueries replaces whatever filter queries were set before, which is why only the last one survives.
Also, see the post "Filter query with special character using SolrJ client" on the Solr Users mailing list about the need to still escape special characters in queries.
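A minimal sketch of the loop rewritten to collect one filter query per facet, assuming the facet fields are string fields (so a quoted value matches the whole stored value). Rather than backslash-escaping every special character, it wraps each value in double quotes: within a quoted term, the classic Lucene/Solr query syntax only gives special meaning to backslash and double quote, so only those two are escaped. Field names are assumed to need no escaping. The class and method names are hypothetical; with SolrJ, each returned string would be passed to solrQuery.addFilterQuery(fq).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterQuerySketch {

    // Wraps a value in double quotes, escaping backslash first and then double quote.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds one fq string per facet entry, preserving the map's iteration order.
    static List<String> buildFilterQueries(Map<String, String> facets) {
        List<String> fqs = new ArrayList<>();
        for (Map.Entry<String, String> e : facets.entrySet()) {
            fqs.add(e.getKey() + ":" + quote(e.getValue()));
        }
        return fqs;
    }

    public static void main(String[] args) {
        Map<String, String> facets = new LinkedHashMap<>();
        facets.put("owners", "abc");
        facets.put("application_type", "c&d (new)");
        for (String fq : buildFilterQueries(facets)) {
            System.out.println(fq);
        }
        // owners:"abc"
        // application_type:"c&d (new)"
    }
}
```

Decoding the URL before building the map, as in the question, is still needed; SolrJ URL-encodes the parameters again when it sends the request.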
