Alfresco Solr Custom Search

Using Alfresco 4.0.1, we have added numerous new entities and linked them to cm:content. When we search, we want to be able to search not only by criteria on the content itself, but also to say: give us all content that is linked to libraries with these properties (for example).
We expect that we need to add a new Solr core (index) and populate it.
Has anyone done this? Can someone offer a hint or two, or a link to a post explaining it?
Thanks
--MB
Addition 1: linked means the content is 'linked' with other entities using Alfresco's Peer (Non-Child) Associations.
Addition 2: for example, suppose our model is content and libraries (it's much more complicated than that). These are linked using peer (non-child) associations because we were not able to use parent-child for other reasons. What we want to search for is all content with name "document" that is linked to libraries with location "Texas".

The bottom line is that Alfresco isn't relational. You can set up associations, and through the API you can ask a given node for its associations, but you cannot run queries across associations the way you can with joins in a relational database.
Maybe you should add a location property to your content node and update its value with a behavior any time an association is created, updated, or deleted on that node. Then you'd be able to run a query by AND-ing the location with other criteria on the node.
Obviously, if you have many such properties that you need to keep in sync your behavior could start to affect performance negatively, but if you have only a handful you should be okay.
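To make the idea concrete, here is a minimal sketch in plain Python. It is not Alfresco API code: in Alfresco the two handlers would be a behavior bound to the association create/delete events, and the class, function and field names below (Content, Library, library_locations, and so on) are made up for illustration. It only shows how a denormalized copy of the location turns the cross-association question into a simple AND on one node.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    name: str
    location: str


@dataclass
class Content:
    name: str
    # Peer associations to libraries (hypothetical in-memory stand-in).
    libraries: list = field(default_factory=list)
    # Denormalized copy of the linked libraries' locations, kept in sync
    # by the handlers below so it can be queried directly on the content.
    library_locations: set = field(default_factory=set)


def sync_locations(content):
    """Recompute the denormalized property from the current associations."""
    content.library_locations = {lib.location for lib in content.libraries}


def on_create_association(content, library):
    content.libraries.append(library)
    sync_locations(content)


def on_delete_association(content, library):
    content.libraries.remove(library)
    sync_locations(content)


def search(contents, name, location):
    """Name AND location, both evaluated on the content node alone."""
    return [c for c in contents
            if c.name == name and location in c.library_locations]


austin = Library("Austin Public", "Texas")
boston = Library("Boston Public", "Massachusetts")
doc_a = Content("document")
doc_b = Content("document")
on_create_association(doc_a, austin)
on_create_association(doc_b, boston)
texas_hits = search([doc_a, doc_b], "document", "Texas")
```

A change to a library's own location would also have to trigger a resync of every content node linked to it, which is where the performance concern mentioned above comes from.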

Related

What is the limit of sortable fields in the iForels Database?

iForels should support multiple fields for sorting. However, when I try to sort my iForels Database by 3 columns, it always deletes the previous features from the sort config.
Here is my request body: {"config":{"filters":[],"sorting":{"63e162f4d9afbf6e733534e8":1}}}
I tried to include multiple rules inside the sorting config and expected that they would not be removed automatically and that sorted datasets would be returned.

Yes, you can easily sort your data by multiple fields, and there are no limits. iForels Database is a kind of fork of Lucene technology. It keeps documents in memory (RAM). That's why it works much faster than MongoDB or SQL databases, even when sorting by multiple fields.
Actually, your API config works for me. Add as many fields as you'd like. Just make sure each entry in the sorting object uses a feature_id as the property name and 1 or -1 as the value.
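As a rough illustration, a request body with several sorting rules could be built like this. The first feature_id is the one from the question; the other two are placeholders, and the helper name build_sort_config is made up for this sketch. Whether the API honors the order of the keys is an assumption here, not something confirmed above.

```python
import json


def build_sort_config(rules):
    """Build a request body from (feature_id, direction) pairs.

    direction must be 1 or -1, as described in the answer.
    """
    sorting = {}
    for feature_id, direction in rules:
        if direction not in (1, -1):
            raise ValueError(f"direction for {feature_id} must be 1 or -1")
        sorting[feature_id] = direction
    return {"config": {"filters": [], "sorting": sorting}}


body = build_sort_config([
    ("63e162f4d9afbf6e733534e8", 1),   # feature_id taken from the question
    ("<second_feature_id>", -1),       # placeholder, replace with a real id
    ("<third_feature_id>", 1),         # placeholder, replace with a real id
])
print(json.dumps(body))
```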
If you have this issue on the front end, make sure you turn ON the Multisort option for your view: open the View Config, then select Multisort.
I hope you’ll find it helpful!

Solr documents with multiple parents

I'm currently trying to figure out if Solr is the right tool for me. I have the following setup:
There is the primary document type "blog". Then there are two additional document types "user" and "category". Both of these are parents of the "blog" document type.
Now when searching the "blog" documents, I not only want to search in those fields (e.g. title and content), but also in the parent fields (user>name and category>name).
Of course, I could just flatten that down to a single document for Solr, which would ease the search a lot. The downside to this is though, that when e.g. a user updates their name, I have to run through all blog posts of them and update the documents for that in Solr, instead of just updating a single document.
This becomes even worse when the user has another parent, on which I need to search as well.
Do you have any recommendations about how to handle this use case? Maybe my Google-fu is just not good enough, but what I found (block joins, etc.) doesn't seem to do the trick.

The most performant and easiest solution by far would be to flatten everything to a single document. It turns out that these relations aren't updated as often as people think, and that searches are performed far more often than documents are updated. And even if one of the values that is identical across a large set of documents changes, reindexing from the most recent documents (for a blog) and then going backwards will appear rather performant to most users. This assumes that you actually have to search the values and don't just need to display them, in which case you could look them up from secondary storage when displaying an item (and just store the never-changing id in the document).
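A minimal sketch of that flattening, in plain Python with no Solr client involved. The field names (user_name, category_name, published) and both helper functions are assumptions made for the example; the resulting dictionaries are what you would send to Solr as documents.

```python
def flatten_blog(blog, users, categories):
    """Copy the searchable parent fields into the blog document itself."""
    user = users[blog["user_id"]]
    category = categories[blog["category_id"]]
    return {
        "id": blog["id"],
        "title": blog["title"],
        "content": blog["content"],
        "published": blog["published"],
        "user_id": blog["user_id"],          # stable id, never changes
        "user_name": user["name"],           # denormalized, searchable
        "category_id": blog["category_id"],
        "category_name": category["name"],   # denormalized, searchable
    }


def docs_to_reindex_for_user(blogs, user_id, users, categories):
    """After a user rename: rebuild that user's posts, newest first."""
    affected = [b for b in blogs if b["user_id"] == user_id]
    affected.sort(key=lambda b: b["published"], reverse=True)
    return [flatten_blog(b, users, categories) for b in affected]


users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
categories = {"c1": {"name": "Search"}}
blogs = [
    {"id": "b1", "title": "Old post", "content": "...",
     "published": "2020-01-01", "user_id": "u1", "category_id": "c1"},
    {"id": "b2", "title": "New post", "content": "...",
     "published": "2023-06-01", "user_id": "u1", "category_id": "c1"},
    {"id": "b3", "title": "Other", "content": "...",
     "published": "2022-03-01", "user_id": "u2", "category_id": "c1"},
]

users["u1"]["name"] = "Alice Smith"  # the rename that triggers reindexing
reindex_batch = docs_to_reindex_for_user(blogs, "u1", users, categories)
```

Sending the newest posts first means the documents users are most likely to search for reflect the new name soonest.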
Another option is to divide this into a multi-search problem: one collection for blog posts, one collection for users and one collection for categories. You then search through each of the collections for the relevant data and merge the results in your search model. You can also use Streaming Expressions to hand off most of this processing to a Solr cluster.
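The merge step of the multi-search approach might look roughly like this. The three lists stand in for the three collections, and search_collection is a naive substring match standing in for a real Solr query; all names are invented for the sketch.

```python
def search_collection(docs, field, term):
    """Stand-in for one query against one collection."""
    term = term.lower()
    return [d for d in docs if term in d[field].lower()]


def search_blogs(term, blogs, users, categories):
    """Query each collection, then merge everything into blog hits."""
    # Step 1: find matching parents in their own collections.
    user_ids = {u["id"] for u in search_collection(users, "name", term)}
    category_ids = {c["id"]
                    for c in search_collection(categories, "name", term)}
    # Step 2: find blogs matching on their own fields.
    direct = {b["id"] for b in search_collection(blogs, "title", term)}
    direct |= {b["id"] for b in search_collection(blogs, "content", term)}
    # Step 3: merge, keeping blogs that match directly or via a parent.
    return [b["id"] for b in blogs
            if b["id"] in direct
            or b["user_id"] in user_ids
            or b["category_id"] in category_ids]


users = [{"id": "u1", "name": "Alice"}, {"id": "u2", "name": "Bob"}]
categories = [{"id": "c1", "name": "Cooking"}, {"id": "c2", "name": "Travel"}]
blogs = [
    {"id": "b1", "title": "Pasta basics", "content": "Boil water.",
     "user_id": "u1", "category_id": "c1"},
    {"id": "b2", "title": "Lisbon in a weekend", "content": "Trams.",
     "user_id": "u2", "category_id": "c2"},
    {"id": "b3", "title": "Cooking with Alice", "content": "A guest post.",
     "user_id": "u2", "category_id": "c1"},
]

alice_hits = search_blogs("alice", blogs, users, categories)
travel_hits = search_blogs("travel", blogs, users, categories)
```

A rename now touches only the single user document, at the cost of extra queries and merge logic on every search.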
The reason why I always recommend flattening if possible is that most features in Solr (and Lucene) are written for a flat document structure, so flattening lets you fully leverage the features available. Since Lucene by design is a flat document store, most other features require special care to support block joins and parent/child relationships, and you end up experimenting a lot to get the correct queries and feature set you want (if that is possible at all). If the documents are flat, it just works.

Properties vs Categories on an Aspect in Alfresco

I'm using Alfresco 4.1.6 and Solr 1.4.
I'm reading about the possibility of using classifications for the nodes, specified with a type d:category in an aspect on the content model.
Fast search times are the most important thing in our project, which is why I'm trying to design the best possible option. Our repository has over 2 million documents, spread over directories, where each user (we have approximately 3000 users) has their own root path.
For the queries (FTS_ALFRESCO), we currently use TYPE (we have 5 distinct types of nodes defined in our model) and custom properties (all the ones we use in queries are indexed).
My question is: imagine I change my model and turn one of our properties into a category. I delete the property and create an aspect with a d:category property in its place. Will the search be faster and more efficient if I search by TYPE, property and category? Does Alfresco give better performance if I search for this value as a category rather than as a normal indexed property? Or is it really the same? What are the benefits of using it as a category?

Categories and properties have different uses.
The main difference is:
Property: each content item can have a different value for the same property.
Category: the same category can be associated with multiple content items.
So you have to choose based on your requirements. As far as performance is concerned, I guess category-based search will be faster (I haven't actually tried it, though).

Partial Update of documents

We have a requirement that documents that we currently index in SOLR may periodically need to be PARTIALLY UPDATED. The updates can either be
a. add new fields
b. update the content of existing fields.
Some of the fields in our schema are stored, others are not.
SOLR 4 does allow this, but all the fields must be stored. See "Update a new field to existing document" and http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
Questions:
1. Is there a way that SOLR can achieve this? We've tried SOLR JOINs in the past, but they weren't the right fit for all our use cases.
On the other hand, can Elasticsearch, LinkedIn's SenseiDB or other text search engines achieve this?
For now, we manage by re-indexing the affected documents when they need to be updated.
Thanks

Solr has the limitation of stored fields, that's correct. The underlying Lucene always has to delete the old document and index the new one. In fact, Lucene segments are write-once: Lucene never goes back to modify existing segments, so it only marks documents as deleted and deletes them for real when a merge happens.
Search servers on top of Lucene try to work around this problem by exposing a single endpoint that can delete the old document and reindex the new one automatically, but there must be a way to retrieve the old document somehow. Solr can do that only if you store all the fields.
Elasticsearch works around it by storing the source documents by default, in a special field called _source. That's exactly the document that you sent to the search engine in the first place, while indexing. This is, by the way, one of the features that make Elasticsearch similar to NoSQL databases. The Elasticsearch Update API allows you to update a document in two ways:
Sending a new partial document that will be merged with the existing one (still deleting the old one and indexing the result of the merge)
Executing a script on the existing document and indexing the result after deleting the old one
Both options rely on the presence of the _source field. Storing the source can be disabled; if you disable it, you of course lose this great feature.
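The mechanics can be mimicked with a toy model in plain Python. This is not Lucene, Solr or Elasticsearch code; ToyIndex and its methods are invented purely to show why a partial update needs the original document: the old entry is only marked as deleted, and the merged document is written as a new entry.

```python
class ToyIndex:
    """Append-only index: entries are never modified, only marked deleted."""

    def __init__(self, store_source=True):
        self.store_source = store_source
        self.entries = []     # append-only list of (doc_id, source or None)
        self.deleted = set()  # positions of entries marked as deleted
        self.live = {}        # doc_id -> position of the current entry

    def index(self, doc_id, doc):
        # An existing document is marked deleted, not overwritten.
        if doc_id in self.live:
            self.deleted.add(self.live[doc_id])
        source = dict(doc) if self.store_source else None
        self.entries.append((doc_id, source))
        self.live[doc_id] = len(self.entries) - 1

    def get_source(self, doc_id):
        return self.entries[self.live[doc_id]][1]

    def partial_update(self, doc_id, partial):
        source = self.get_source(doc_id)
        if source is None:
            raise RuntimeError("source not stored, cannot merge a partial doc")
        merged = {**source, **partial}  # new and changed fields win
        self.index(doc_id, merged)      # delete old entry, index merged one
        return merged


idx = ToyIndex(store_source=True)
idx.index("1", {"title": "Hello", "body": "first version"})
updated = idx.partial_update("1", {"body": "second version", "tags": ["new"]})

no_source = ToyIndex(store_source=False)
no_source.index("1", {"title": "Hello"})
```

Calling no_source.partial_update("1", {...}) raises, which corresponds to the situation in the question where some fields are not stored and a full re-index from the original data is the only option.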

Data storage: "grouping" entities by property value? (like a dictionary/map?)

Using AppEngine datastore, but this might be agnostic, no idea.
Assume a database entity called Comment. Each Comment belongs to a User. Every Comment has a date property, pretty standard so far.
I want something that will let me: specify a User and get back a dictionary-ish (coming from a Python background, pardon. Hash table, map, however it should be called in this context) data structure where:
keys: every date appearing in the User's comments
values: Comments that were made on that date.
I guess I could just iterate over a range of dates and build a map like this myself, but I seriously doubt I need to "invent" my own solution here.
Is there a way/tool/technique to do this?

Datastore supports both references and list properties. This lets you build one-to-many relationships in two ways:
Parent (User) has a list property containing keys of Child entities (Comment).
Child has a key property pointing to Parent.
Since you need to limit Comments by date, you'd best go with option two. Then you could query Comments which have date=somedate (or date range) and where user=someuserkey.
There is no native grouping functionality in Datastore, so to also "group" by date, you can add a sort on date to the query. Then, as you iterate over the results, each time the date changes you can use/store it as a new grouping key.
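The grouping step can be sketched in plain Python. The list of dictionaries stands in for the result of a Datastore query filtered by user and sorted by date (no App Engine API is used here), and group_by_date is a name made up for this example.

```python
from datetime import date
from itertools import groupby


def group_by_date(sorted_comments):
    """Group comments that arrive already sorted by date.

    groupby starts a new group every time the key changes, which is
    exactly the "when the date changes" step described above.
    """
    return {day: list(group)
            for day, group in groupby(sorted_comments,
                                      key=lambda c: c["date"])}


# Stand-in for: query Comments where user = some_user_key, ordered by date.
comments = [
    {"text": "first", "date": date(2013, 5, 1)},
    {"text": "second", "date": date(2013, 5, 1)},
    {"text": "third", "date": date(2013, 5, 3)},
]

by_day = group_by_date(comments)
```

Note that groupby only merges adjacent items, so the sort on date in the query is what makes this correct.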
Update
Designing no-sql databases should be access-oriented (versus datamodel oriented in sql): for often-used operations you should be getting data out as cheaply (= as few operations) as possible.
So, as a rule of thumb you should, in one operation, only get data that is needed at that moment (= shown on that page to user). I'm not sure about your app's design, but I doubt you need all user's full comments (with text and everything) at one time.

I'd start by saying you shouldn't apologize for having a Python background. App Engine started out supporting only Python. Using the db module, you could have a User entity as the parent of several DailyCommentBatch entities, each of which is the parent of a couple of Comment entities. IIRC, this will keep all related entities stored together (or close).
If you are using NDB (I love it), you may have to employ a StructuredProperty at either the User or the DailyCommentBatch level.
