How do I get all unique documents by a max field - solr

I am working on a search feature for a Liferay 6.2 app, but I am struggling with how to get the latest articles.
For various reasons, the client wants to track every version of the Liferay Journal Articles in Solr. This means that each "version" is stored as a separate document with an incrementing version field. For the search, I need to retrieve only the latest version of each article.
For example, suppose I have Journal Articles like these in Solr:
[{
articleId:"123456",
title:"Sample Doc 1",
content:"abc 123 xyz",
version:"1.0"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"1111",
version:"1.0"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"2222",
version:"1.1"
},
{
articleId:"123456",
title:"Sample Doc 1",
content:"xxx xxx 1234556",
version:"1.1"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"3333",
version:"1.2"
}]
If I queried all documents, I would expect these results:
[{
articleId:"123456",
title:"Sample Doc 1",
content:"xxx xxx 1234556",
version:"1.1"
},
{
articleId:"222111",
title:"Sample Doc 2",
content:"3333",
version:"1.2"
}]
Note that only one document is returned per unique articleId: the one with the max version.
The exact versions I am working with are:
Liferay 6.2.ee sp11 (with some patches)
Solr 4.10.4 under Tomcat 7.0.64
I tried googling for answers, but I am not sure what to search for. I don't think facets are the answer, and grouping doesn't seem to return the results I need.

You can use grouping or a collapse filter for that. In my experience, the collapse filter is much faster than grouping. Here is how it would be used in your case:
fq={!collapse field=articleId max=version}
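A minimal Python sketch, for illustration only, of two things: the request parameters carrying the collapse filter, and a client-side equivalent of what it computes on the sample data (one document per articleId, the one with the highest version). Assumptions: the core URL is hypothetical, and since the collapse parser's `max` selects the group head by a numeric field or function query, `version` is assumed to be indexed as a numeric (e.g. float) field rather than the string "1.0" shown in the question.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def collapse_request_url(query="*:*"):
    """Build a select URL that keeps only the max-version doc per articleId."""
    params = {
        "q": query,
        # Collapse on articleId; the group head is the doc with max(version).
        "fq": "{!collapse field=articleId max=version}",
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def latest_versions(docs):
    """Client-side equivalent of the collapse filter, useful for checking results.

    Assumes version strings such as "1.2" compare correctly as floats; this
    holds for the sample data but would not order e.g. "1.10" after "1.9".
    """
    best = {}
    for doc in docs:
        key = doc["articleId"]
        if key not in best or float(doc["version"]) > float(best[key]["version"]):
            best[key] = doc
    return list(best.values())


if __name__ == "__main__":
    sample = [
        {"articleId": "123456", "content": "abc 123 xyz", "version": "1.0"},
        {"articleId": "222111", "content": "1111", "version": "1.0"},
        {"articleId": "222111", "content": "2222", "version": "1.1"},
        {"articleId": "123456", "content": "xxx xxx 1234556", "version": "1.1"},
        {"articleId": "222111", "content": "3333", "version": "1.2"},
    ]
    print(collapse_request_url())
    for doc in latest_versions(sample):
        print(doc["articleId"], doc["version"], doc["content"])
```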


How to respect Solr conditions in order

I need to send a query to Solr with two conditions combined with OR, instead of sending the query twice:
{!complexphrase inOrder=true}title:"some tests*" || title:(some tests*)
...where the first condition should give the exact (in-order phrase) result. If nothing matches it, the query should fall back to the OR branch and retrieve any result containing at least one word of the search phrase. But when I run the query, I still get the results of the right-hand condition first.
Here is my data:
{
"title": "some values"
},
{
"title": "data tests"
},
{
"title": "some tests"
}
The response I need is:
{
"title": "some tests"
},
{
"title": "data tests"
},
{
"title": "some values"
}
I already tried boosting, like so: {!complexphrase inOrder=true}title:"some tests*"^2 || title:(some tests*)^1, but it didn't work. I am NOT able to change the Solr configuration, since it's software that's already in production and not managed by me. I can't even sort by rating; in fact, I don't get the best matches first. The Solr version is 7.3.1. Any help is appreciated, thanks in advance!
I solved it with a workaround. Instead of using two OR conditions, I applied a working boost on the title field using edismax.
What I had to change in my Java application was:
From
SolrQuery q = new SolrQuery("*");
To
SolrQuery q = new SolrQuery("(" + query + "*)");
and added:
q.set("defType", "edismax");
q.set("qf", "title^100");
Now I'm not making an exact query, but documents with a better match come first, without changing any configuration! The equivalent request from the Solr frontend is similar; the query should look like this:
http://localhost:8983/solr/mycollection/select?defType=edismax&q=(some%20test*)&qf=title^100
Hope it helps someone.
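For callers not using SolrJ, the same request can be assembled with Python's standard library. This is only a sketch: the collection URL is the example one from the answer, and `edismax_url` is a hypothetical helper name. The percent-encoding differs cosmetically from the hand-written URL (`+` instead of `%20`, `%2A` instead of `*`) but decodes to the same parameters.

```python
from urllib.parse import urlencode

# Example collection URL from the answer; adjust for your installation.
SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def edismax_url(user_query, title_boost=100):
    """Build the edismax request described in the answer.

    The query text is wrapped as "(<terms>*)", mirroring the Java change,
    and qf boosts matches in the title field.
    """
    params = {
        "defType": "edismax",
        "q": "(" + user_query + "*)",
        "qf": f"title^{title_boost}",
    }
    return SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(edismax_url("some test"))
```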

Alfresco 7: Search both in the version store and the content store

I have Alfresco 7.0 Community Edition, with Search Services 2.0.2.
I need to search for all documents that have certain metadata values, in both the version store and the content store.
I've tried to search the version store through the public REST API with the following request body:
{
"query": {
"language": "afts",
"query": "TYPE:\"myc:projects\" AND myc:prop:\"pippo\""
},
"paging": {
"maxItems": 100,
"skipCount": 0
},
"include": [
"allowableOperations",
"properties"
],
"scope": {
"locations": "versions"
}
}
but I get HTTP status 500.
If I search from the Node Browser, I get this error message: No solr query support for store workspace://version2Store.
I also tried the lightweight store (what is the difference?).
Is it possible to search the versions with Lucene or AFTS? Do I need to enable some property in alfresco-global.properties?
I've also seen this question, and it seems to me that they are able to do such searches.
Thanks a lot.
I was able to handle both types of search with a CMIS query, but I had to write a module to handle that kind of search (the form, plus joining and displaying the results).
Versioned nodes have a property that points to the nodeRef of the current node.

Drupal Services Plugin ignoring multi-value fields

I'm using Drupal 7 with the Services plugin 3.17.
I'm trying to create a node that has a multi-value field through the JSON API, with the following data:
{
"type":"custom_type_article",
"title":"My title",
"language":"und",
"body": {
"und": [ { "value": "Article body" } ]
},
"field_article_auhtors": {
"und": [{"value": "author 1"}, {"value": "author 2"}, {"value": "author 3"}]
}
}
The node is successfully created, but only the first value of field_article_auhtors is populated.
Is my JSON structure incorrect for creating multiple values on "field_article_auhtors"?
Version 3.17 of Services has a bug with multi-value fields. It looks like the bug is a regression introduced around version 3.6.
A patch was released in November, and multiple users report that it works, though it's officially marked as 'Needs Work'. (The author has asked for a review of the code, and it has already been included in the dev version of Services.) That said, a gentle reminder: test it in a dev environment first. ;)
See the conversation, the patch, and a dev release of Services that includes it over on Drupal's official Services Project section at https://www.drupal.org/project/services/issues/2224803

How can you retrieve a full nested document in Solr?

In my Solr 4.10.3 instance, I would like to index JSON documents with a nested structure.
Example:
{
"id": "myDoc",
"title": "myTitle",
"nestedDoc": {
"name": "test name",
"nestedAttribute": {
"attr1": "attr1Val"
}
}
}
I am able to store it correctly through the admin interface:
/solr/#/mySchema/documents
and I'm also able to search and retrieve the document.
The problem I'm facing is that when I get the response document from my Solr search, I cannot see the nested attributes. I only see:
{
"id": "myDoc",
"title": "myTitle"
}
Is there a way to include ALL the nested fields in the returned documents?
I tried "fl=[child parentFilter=title:myTitle]", but it's not working (ChildDocTransformerFactory, from https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents). Is that the right way to do it, or is there another way?
I'm using Solr 4.10.3.
To return the whole nested structure, you do indeed need to use ChildDocTransformerFactory. However, you first need to index your documents properly.
If you just pass your structure as it is, Solr will index the parts as separate documents and won't know that they're actually connected. If you want to be able to query nested documents correctly, you'll have to pre-process your data structure as described in this post, or try using (and modifying as needed) a pre-processing script. Unfortunately, even as of the latest Solr 6.0, there's no clean, smooth solution for indexing and returning nested document structures, so everything is done through "workarounds".
Particularly in your case, you'll need to transform your document structure into this:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name",
"_childDocuments_": [
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}]
}
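The answer points to an external pre-processing script; as an independent illustration (not that script), here is a minimal recursive Python sketch that turns the original nested JSON into the `_childDocuments_` layout shown above. Assumptions: each nested object becomes one child document whose `type` is the key it was stored under, lists of objects are not handled, and `to_block_join` is a hypothetical name. In a real schema each child usually also needs its own uniqueKey (e.g. `id`), which this sketch does not generate.

```python
import json


def to_block_join(doc, doc_type="parentDoc"):
    """Convert a nested dict into Solr's _childDocuments_ shape.

    Every value that is itself a dict becomes a child document whose
    "type" is the key it was stored under; scalar values stay as fields.
    """
    out = {"type": doc_type}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            # Recurse so grandchildren end up nested under their parent child.
            children.append(to_block_join(value, doc_type=key))
        else:
            out[key] = value
    if children:
        out["_childDocuments_"] = children
    return out


if __name__ == "__main__":
    original = {
        "id": "myDoc",
        "title": "myTitle",
        "nestedDoc": {
            "name": "test name",
            "nestedAttribute": {"attr1": "attr1Val"},
        },
    }
    print(json.dumps(to_block_join(original), indent=2))
```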
Then the following ChildDocTransformerFactory query will return all subdocuments (by the way, although the documentation says it's available since Solr 4.9, I've only actually seen it in Solr 5.3, so you need to test it):
q=title:myTitle&fl=*,[child parentFilter=type:parentDoc limit=50]
Note that, although it returns all nested documents, the returned document structure will be flattened (alas!), i.e., you'll get:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name"
},
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}
This is probably not what you expected, but it is Solr's unfortunate current behavior, which should be fixed in an upcoming release.
You can set
q={!parent which=}
and, in the fl parameter, "fl=*,[child parentFilter=title:myTitle]".
This will return all parent fields and child fields of title:myTitle.

How to query a nested array (using pymongo)

I'm a newbie with MongoDB.
I created a document with nested arrays like this:
data = {
"title": "mongo community",
"description": "I am a new bee",
"topics": [{
"title": "how to find object in array",
"comments": [{
"description": "desc1"
}]
},
{
"title": "the case to use ensureIndex",
"comments": [{
"description": "before query"
},
{
"description": "If you want"
}
]
}
]
}
After that, I inserted it into the "community" collection:
db.community.insert(data)
Now I would like to get the "comments" of the topic whose title is "how to find object in array".
I tried:
data = db.community.find_one({"title":"mongo community","topics.title":"how to find object in array" } )
The result is:
>>> print data
{
u'topics': [{
u'comments': [{
u'description': u'desc1'
}],
u'title': u'how to find object in array'
},
{
u'comments': [{
u'description': u'before query'
},
{
u'description': u'If you want'
}],
u'title': u'the case to use ensureIndex'
}],
u'_id': ObjectId('4e6ce188d4baa71250000002'),
u'description': u'I am a new bee',
u'title': u'mongo community'
}
I don't need the topic "the case to use ensureIndex".
Could you give me any advice?
Thanks.
It looks like you're embedding topics as an array all in a single document. You should try to avoid returning partial documents frequently from MongoDB. You can do it with the "fields" argument of the find method, but it isn't very easy to work with if you're doing it frequently.
So, to solve this, you could try making each topic a separate document. I think that would be easier for you too. If you want to store information about the "community" forum, put it in a separate collection. For example, you could use the following in the MongoDB shell:
// add a forum:
var forum = {
title:"mongo community",
description:"I am a new bee"
};
db.forums.save(forum);
// add first topic:
var topic = {
title: "how to find object in array",
comments: [ {description:"desc1"} ],
forum:"mongo community"
};
db.topics.save(topic);
// add second topic:
var topic = {
title: "the case to use ensureIndex",
comments: [
{description:"before query"},
{description:"If you want"}
],
forum:"mongo community"
};
db.topics.save(topic);
print("All topics:");
printjson(db.topics.find().toArray());
print("just the 'how to find object in array' topic:")
printjson(db.topics.find({title:"how to find object in array"}).toArray());
Also, see the document Trees In MongoDB about schema design in MongoDB. It happens to be using a similar schema to what you are working with and expands on it for more advanced use cases.
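If you keep the embedded schema, the "fields" (projection) approach mentioned above can be sketched in Python. This is a minimal sketch, not taken from either answer: it assumes MongoDB 2.2 or later, where the positional projection `"topics.$": 1` returns only the first array element matched by the query condition on `topics.title`. The names `topic_query`, `fetch_topic_comments` and `comments_for_topic` are hypothetical helpers; the client-side fallback works with any server version.

```python
def topic_query(topic_title, community_title="mongo community"):
    """Return (filter, projection) for fetching one embedded topic."""
    filter_ = {"title": community_title, "topics.title": topic_title}
    # "topics.$" is MongoDB's positional projection (assumed server >= 2.2):
    # only the topics element matched by "topics.title" is returned.
    projection = {"_id": 0, "topics.$": 1}
    return filter_, projection


def fetch_topic_comments(collection, topic_title):
    """Server-side trimming; `collection` is a pymongo Collection."""
    filter_, projection = topic_query(topic_title)
    doc = collection.find_one(filter_, projection)
    return doc["topics"][0]["comments"] if doc else []


def comments_for_topic(community, topic_title):
    """Client-side fallback: pick the comments out of the full document."""
    for topic in community.get("topics", []):
        if topic.get("title") == topic_title:
            return topic.get("comments", [])
    return []
```

With pymongo this would be called as, e.g., `fetch_topic_comments(MongoClient().mydb.community, "how to find object in array")`, where `mydb` is a placeholder database name.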
MongoDB operates on documents, that is, the top level documents (the things you save, update, insert, find, and find_one on). Mongo's query language lets you search within embedded objects, but will always return, update, or manipulate one (or more) of these top-level documents.
MongoDB is often called "schema-less," but something more like "(has) flexible schemas" or "(has) per-document schemas" would be a more accurate description. This is a case where your schema design -- having topics embedded directly within a community -- is not working for this particular query. However there are probably other queries that this schema supports more efficiently, like listing the topics within a community in a single query. You might want to consider the queries you want to make and re-design your schema accordingly.
A few notes on MongoDB limitations:
top-level documents are always returned (optionally with only a subset of fields, as @scott noted; see the mongodb docs on this topic)
each document is limited to 16 megabytes of data (as of version 1.8+), so this schema will not work well if the communities have a long list of topics
For help with schema design, see the mongodb docs on schema design, Kyle Banker's video "Schema Design Basics", and Eliot Horowitz's video "Schema Design at Scale" for an introduction, tips, and considerations.
