Alfresco 7: Search both in the version store and the content store - solr

I have Alfresco 7.0 community edition, with search services version 2.0.2.
My need is to search for all documents having some metadata values, both in the version store and the content store.
I've tryied to search against the versionStore from public api with the following body
{
"query": {
"language": "afts",
"query": "TYPE:\"myc:projects\" AND myc:prop:\"pippo\""
},
"paging": {
"maxItems": 100,
"skipCount": 0
},
"include": [
"allowableOperations",
"properties"
],
"scope": {
"locations": "versions"
}
}
but I get http status 500.
If I try to search from Node Browser I receive this error message No solr query support for store workspace://version2Store.
I also tried the ligthweigth store (which is the difference?)
Is it possible to search with lucene, AFTS against the versions? Do I need to enable some property in alfresco-global.properties?
I've also seen this question and it's seems to me they can do searches.
Thanks a lot.

I've been able to handle both types of searches doing a CMIS query, but I had to write a module that handle that type of reseach (the form, results to be joined and displayed...).
Versioned nodes have a property which point to the noreRef of the current node.

Related

Differences between Suggesters and NGram

I've built an index with a Custom Analyzer
"analyzers": [
{
"#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "ingram",
"tokenizer": "whitespace",
"tokenFilters": [ "lowercase", "NGramTokenFilter" ],
"charFilters": []
}
],
"tokenFilters": [
{
"#odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
"name": "NGramTokenFilter",
"minGram": 3,
"maxGram": 8
}
],
I came upon Suggesters and was wondering what the pros/cons were between these 2 approaches.
Basically, I'm doing an JavaScript autocomplete text box. I need to do partial text search inside of the search text (i.e. search=ell would match on "Hello World".
Azure Search offers two features to enable this depending on the experience you want to give to your users:
- Suggestions: https://learn.microsoft.com/en-us/rest/api/searchservice/suggestions
- Autocomplete: https://learn.microsoft.com/en-us/rest/api/searchservice/autocomplete
Suggestions will return a list of matching documents even with incomplete query terms, and you are right that it can be reproduced with a custom analyzer that uses ngrams. It's just a simpler way to accomplish that (since we took care of setting up the analyzer for you).
Autocomplete is very similar, but instead of returning matching documents, it will simply return a list of completed "terms" that match the incomplete term in your query. This will make sure terms are not duplicated in the autocomplete list (which can happen when using the suggestions API, since as I mentioned above, suggestions return matching documents, rather than a list of terms).

Using search.in with all

Follwing statement find all profiles that has Facebook or twitter and this works:
$filter=SocialAccounts/any(x: search.in(x, 'Facebook,Twitter'))
But I cant find any samples for finding all that has both Facebook and twitter. I tried:
$filter=SocialAccounts/all(x: search.in(x, 'Facebook,Twitter'))
But this is not valid query.
Azure Search does not support the type of ‘all’ filter that you’re looking for. Using search.in with ‘all’ would be equivalent to using OR, but Azure Search can only handle AND in the body of an ‘all’ lambda (which is equivalent to OR in the body of an ‘any’ lambda).
You might try a workaround like this:
$filter=tags/any(t: t eq 'Facebook') and tags/any(t: t eq 'Twitter')
However, this isn't actually equivalent to using all with search.in. The query as expressed using all is matching documents where every social account is strictly either Facebook or Twitter. If any other social account is present, the document won’t match. The workaround doesn’t have this property. A document must have at least Facebook and Twitter in order to match, but not exclusively those. This is certainly a valid scenario; it just isn't the same as using all with search.in, which was the original question.
No matter how you try to rewrite the query, you won’t be able to express an equivalent to the all query. This is a limitation due to the way Azure Search stores collections of strings and other primitive types in the inverted index.
Please vote on user voice to help prioritize:
https://feedback.azure.com/forums/263029-azure-search/suggestions/37166749-efficient-way-to-express-a-true-all
A possible workaround is to use the new Complex Types feature, which does allow more expressive filters inside lambda expressions. For example, if you model tags as objects with a single value property instead of as a collection of strings, you should be able to execute a filter like this:
$filter=tags/all(t: search.in(t/value, 'Facebook,Twitter'))
In the REST API, you'd define tags like this:
{
"name": "myindex",
"fields": [
...
{
"name": "tags",
"type": "Collection(Edm.ComplexType)",
"fields": [
{ "name": "value", "type": "Edm.String", "filterable": true }
]
}
]
}
Note that this feature is in preview at the time of this writing, but will be generally available (and publicly documented) soon.

Fetch partial documents from couchdb

I'm using couchdb to store large documents, which is causing some trouble when fetching them to memory. I do realize the database is not meant to be used this way. As a fallback solution, is it possible to fetch partial documents from the database, without creating a view?
In example, if a document has the fields id, content and extra_content, I would like to retrieve only the first two.
Thank you in advance.
If you are using CouchDB 2.x, you can use /db/_find endpoint as a mechanism to retrieve part of the doc.
POST /db/_find
{
"selector": {
"_id": "a-doc-id"
},
"fields": [
"_id",
"content"
]
}
You'll get only the set of fields you have specified in the query
This is not possible prior to CouchDB 2.x. For CouchDB 2.x or greater, see JuanjoRodriguez's answer.
But one possible work around for any version of CouchDB would be to take advantage of file attachments, which by default are excluded from a fetch. If some of your data isn't always needed, and doesn't need to be included in indexes, you could potentially store it as (JSON) attachments, rather than as part of the document directly:
{
"id": "foo",
"content": "stuff",
"extra_content": "other stuff"
}
becomes:
{
"id": "foo",
"content": "stuff",
"_attachments": {
"extra_content": {
"content_type": "application/json",
"data": "ZXh0cmEgc3R1ZmYK"
}
}
}

Search for multiple id:s in cloudant

I am using node-red to communicate with cloudant and for each time my flow runs I might have different amount of id:s coming in msg.payload. Later I want to use these id:s to display all the relevant objects. Is it possible to search for multiple id:s in some way? Or do you have any other solution? Can't find anything about this online atm
It looks like Node-RED supports querying by _id, a search index, or all documents. When you use _id there does not seem to be a way to specify more than one ID. You can use a search index, however, to query for multiple IDs.
Create a search index in Cloudant similar to the following:
{
"_id": "_design/allDocSearch",
"views": {},
"language": "javascript",
"indexes": {
"byId": {
"analyzer": "standard",
"index": "function (doc) {\n index(\"id\", doc._id);\n}"
}
}
}
This corresponds to the following when using the Cloudant dashboard:
design doc = allDocSearch
index name = byId
index function =
function (doc) {
index("name", doc.name);
}
To search for multiple IDs your query would look something like this:
id:"1" OR id:"2"
In Node-Red set up your Cloudant node to point to the appropriate database, specify a "Search by" of search index, and configure your design document and index name (in this case it would be allDocSearch/byId).
You can test with a simple inject node with a payload similar to the search query above: id:"1" OR id:"2"

Solr document Submission

I am new in the solr technology.Can you please tell me how a document can submit to solr using user interface.Is it necessary to create xml of the document first?I expect a simplest way of document indexing..
Please Help.
The default Solr RequestHandler (from 4.0) supports four formats: XML, JSON, CSV and javabin. There's a page under the Admin interface to submit documents to the index (select the core and Documents).
There are examples of each of the formats available in the Solr reference guide. If you're using a client library, the library will usually handle this for you anyways, and use an appropriate format depending on which language it's written in and what built-in libraries are available.
The simplest format for manually adding documents is probably JSON:
[
{
"id": "1",
"title": "Doc 1"
},
{
"id": "2",
"title": "Doc 2"
}
]
You can also use the DataImportHandler to import data locally at the server, such as from an SQL-database. In that case you don't submit the actual rows to the server, but you tell the handler to fetch the rows and create documents for you.

Resources