I am trying to figure out what Solr means by a compatible collection, so that I can run the following query:
Query all shards of multiple compatible collections, explicitly specified:
http://localhost:8983/solr/collection1/select?collection=collection1_NY,collection1_NJ,collection1_CT
Does this mean that the schema.xml must be exactly the same across those collections, or only partially the same (i.e. share the fields used to satisfy the query)?
cheers,
/Marcin
OK, so I have tested this myself and the schema.xml doesn't need to be exactly the same; it only needs to match within the scope of the query. For example, if you query description:text, all collections/shards need to have a description field, but the other fields can be completely different.
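As a sketch of what "partially the same" means here (the field names below are illustrative, not from the original setup): both schemas define the queried field identically, while the rest diverge freely.

```xml
<!-- collection1_NY schema.xml (fragment) -->
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="borough"     type="string"       indexed="true" stored="true"/>

<!-- collection1_NJ schema.xml (fragment) -->
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="county"      type="string"       indexed="true" stored="true"/>
```

A query such as q=description:text can fan out across both collections because the queried field exists, with a compatible type, in each schema; the borough/county fields never have to match.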
Related
I am new to Apache Solr and exploring some use cases that could potentially be applicable to my application.
In one of the use cases, I have multiple MongoDB instances pushing data to Solr via mongo-connector. I am able to do so by running two instances of mongo-connector against two different Mongo instances, using the same Solr core.
My question is: how do I handle a situation where a field in a Mongo collection, say "startTime", is of Date type in one Mongo instance while another treats it as long? I want this field to be treated as long in Solr. Does Solr provide any sort of automatic conversion, or will I have to write my own analyzer?
If you want both values to normalize to the same form, you should do that in an UpdateRequestProcessor (defined in solrconfig.xml). There are quite a number of them for various purposes, including date parsing. In fact, schemaless mode is implemented by a chain of URPs, so that's an example you can review.
To process different Mongo instances in different ways, you can define separate update request handler endpoints (again in solrconfig.xml) and set up different processing for each. Use shared definitions to avoid duplicating what's common (using a processor reference, as in the schemaless definition mentioned above).
It may be more useful to normalize to dates rather than back from dates, as Solr allows more interesting searches that way, such as Date Math.
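A minimal solrconfig.xml sketch of this idea, assuming the two mongo-connector instances can each be pointed at their own update endpoint (the handler and chain names here are made up):

```xml
<!-- Chain that parses incoming string values into real dates -->
<updateRequestProcessorChain name="parse-dates">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- One endpoint per Mongo source, each free to pick its own chain -->
<requestHandler name="/update/mongoA" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">parse-dates</str>
  </lst>
</requestHandler>
```

The second Mongo source would point at a second handler (e.g. /update/mongoB) with a different update.chain, and common processors can be shared between the chains by reference.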
I am working with a solr index that I have not made. I only have access to the solr admin.
Each document returned by the query I write in the Solr admin has around 40 fields. These fields are not sorted alphabetically.
Now my question is can I sort them somehow in the solr admin?
If I cannot, I have the opportunity to import that index locally on my dev machine. I also have access to the config files (solr config, data import config, etc.).
Is it possible to do some magic in any of those config files when importing locally so that the fields come out sorted alphabetically?
No, neither Lucene nor Solr guarantees the order of the fields returned (the order of values inside a multi-valued field is, however, guaranteed).
You might hope to have luck by explicitly using the fl parameter to request the order you want (you won't, though - see the comment below - fl maintains the same order as in the document), and it would also require maintaining a long list of fields to be returned.
It's usually better to ask why you need the order of the fields to be maintained. The data returned from Solr is usually not meant for the user directly, and should be processed in your controller / view layer to suit the use case.
You could return it using the XSLT response writer instead of the XML one. Usually it is used to transform XML into a different form, but you could probably use it for an identity transformation with sorting added.
I don't think that's the best way forward, but if you are desperate, it is a way.
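If you do go down that road, an identity transform with sorting could look roughly like this. It is an untested sketch: the stylesheet would be dropped into the core's conf/xslt/ directory (e.g. as sortfields.xsl) and selected with wt=xslt&tr=sortfields.xsl on the query.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy everything through unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For each result document, emit its fields sorted by field name -->
  <xsl:template match="doc">
    <xsl:copy>
      <xsl:apply-templates select="*">
        <xsl:sort select="@name"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

This sorts on the name attribute that Solr's XML response writer puts on each field element inside a doc.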
I have different datasources that upload different documents to a Solr sink. If two datasources send a field with the same name but different data types (say integer and double), indexing of the second field fails because the data type of the first field has already been added to the managed-schema.
All I need is for both fields to get indexed properly, as they used to in Solr 4.x versions.
Since field names arrive at runtime, please suggest a solution that would work for me. I suppose it needs a change in solrconfig.xml, but I could not find the required one.
How was your Solr configured to work in 4.x? You can still do it exactly the same way in Solr 6.
On the other hand, the schemaless feature will define the type mapping the first time it sees a field. It has no way of knowing what will come in the future. That's also why all auto-definitions are multivalued.
However, if you want to deal with specific mapping of integer being too narrow, you can change the definition of the UpdateRequestProcessor chain that is actually doing the mapping. Just merge the mapping of integer/long/number into one final tdoubles type.
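As a sketch, the merge could look like this inside the schemaless chain's AddSchemaFieldsUpdateProcessorFactory (adapted from the stock schemaless configuration; the exact field type names depend on your schema version):

```xml
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <!-- Map every numeric guess to one wide type, so a field that
         arrives first as integer and later as double no longer
         clashes with the initially auto-defined type -->
    <str name="valueClass">java.lang.Integer</str>
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdoubles</str>
  </lst>
</processor>
```

With this mapping in place, the first value seen for a numeric field always defines it as tdoubles, which can hold anything later sources send.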
I have some questions about how index in Alfresco One works with transactional queries.
We use Alfresco 5.0.2 and in documentation I can read this: "When you are upgrading the database, you can add optional indexes in order to support the metadata query feature."
Suppose that in my model.xml I add a custom property like this:
<type name="doc:myDoc">
   <title>Document</title>
   <parent>cm:content</parent>
   <properties>
      <property name="doc:level">
         <title>Level</title>
         <type>d:text</type>
         <mandatory>true</mandatory>
         <index enabled="true">
            <atomic>true</atomic>
            <stored>false</stored>
            <tokenised>both</tokenised>
         </index>
      </property>
      ...
   </properties>
</type>
And I have these settings in my alfresco-global.properties:
solr.query.cmis.queryConsistency=TRANSACTIONAL_IF_POSSIBLE
solr.query.fts.queryConsistency=TRANSACTIONAL_IF_POSSIBLE
system.metadata-query-indexes.ignored=false
My first question is: how does Alfresco know which properties I want to index in the DB? Does it read my model.xml and index only the indexed properties that I specify there? Does it index all the custom properties? Or do I need to create a script to add these new indexes?
I read the script metadata-query-indexes.sql but I don't understand how to rewrite it in order to add a new index for my property. If this script is necessary, could you give me an example with the doc:myDoc property that I wrote before, please?
Another question is about query syntax that isn't supported by DB and goes directly to SOLR.
I read that a query must not use PATH, SITE, ANCESTOR, OR, or any d:content, d:boolean or d:any properties (among others), or it will not be executable against the DB. But I don't understand what d:content is exactly.
For example, is a query (based on my custom property written before) like TYPE:whatever AND #doc\:level:"value" considered d:content? Is this query supported by the DB, or does it go to SOLR?
I read also this:
"Any property checks must be expressed in a form that means "identical value check" as querying the DB does not provide the same tokenization / similarity capabilities as the SOLR index. E.g. instead of my:property:"value" you'd have to use =my:property:"value" and "value" must be written in the proper case the value is stored in the DB."
Does this mean that if I use the =, for example =#doc\:level:"value", the query isn't accepted by the DB and goes to SOLR? Can't I search for an exact value in the DB?
I've been researching TMQs recently. I'm assuming that you need transactionality, which is why TMQ queries are interesting. Queries via SOLR are eventually consistent, but TMQs will immediately return the change. There are certain applications where eventual consistency is a huge problem, so I'm assuming this is why you are looking into them.
Alfresco says that they use TMQs by default, and in my limited testing (200k documents) I found no appreciable performance difference between a SOLR query and a TMQ. I can't imagine they are horrible for performance if Alfresco made them the default, but I need to do further testing with millions of documents to be sure. It will of course depend on your database load. If your database is a bottleneck and you don't need the transactionality, you could consider using the # syntax in metadata searches to avoid them, or you could disable them via properties configuration.
1) How Alfresco knows which properties I want to index on DB? Read my model.xml and index only the indexed properties that I specify there? Index all the custom properties? Or I need to create a script to add these new indexes?
When you execute a query using a syntax that is compatible with a TMQ, Alfresco will run it as one. The default behavior is "TRANSACTIONAL_IF_POSSIBLE":
http://docs.alfresco.com/4.2/concepts/intrans-metadata-configure.html
You do not have to have the field marked as indexable in the model for this to work. This is unclear from the documentation but I've tried disabling indexing for the field in the model and these queries still work. You don't even have to have solr running!
2) Another question is about query syntax that isn't supported by DB and goes directly to SOLR.
Your example with TYPE and an attribute does not go to SOLR. It's things like PATH that must go to SOLR.
3) "Any property checks must be expressed in a form that means "identical value check" as querying the DB does not provide the same tokenization / similarity capabilities as the SOLR index. E.g. instead of my:property:"value" you'd have to use =my:property:"value" and "value" must be written in the proper case the value is stored in the DB."
What they are saying is that you must use the = operator, not the default or # operator. The # operator depends on tokenization, but TMQs go straight to the database. However, you can use * in an attribute if you omit the quotes, like so:
=cm\:title:Startswith*
This works for me on 5.0.2 via TMQ. You can absolutely search for an exact value as well, however.
I hope this cleared it up for you. I highly recommend setting solr.query.fts.queryConsistency=TRANSACTIONAL to force TMQs in a test environment, and testing different queries if you still have questions about what syntax is supported.
Regards
A nice explanation can be found here.
https://community.alfresco.com/people/andy1/blog/2017/06/19/explaining-eventual-consistency
When changes are made to the repository they are picked up by SOLR via
a polling mechanism. The required updates are made to the Index Engine
to keep the two in sync. This takes some time. The Index Engine may
well be in a state that reflects some previous version of the
repository. It will eventually catch up and be consistent with the
repository - assuming it is not forever changing.
ElasticSearch has Mapping Types which, according to the docs:
Mapping types are a way to divide the documents in an index into
logical groups. Think of it as tables in a database.
Is there an equivalent in Solr for this?
I have seen that some people include a new field in the documents and later on use this new field to limit searches to a certain type of document, but as I understand it, those documents have to share the schema, and (I believe) ElasticSearch Mapping Types don't. So, is there an equivalent?
Or, maybe a better question,
If I have a multiple document types and I want to limit searches to a certain document type, which one should offer a better solution?
I hope this question makes sense, since I'm new to both of them.
Thanks!
You can configure multicore solr:
http://wiki.apache.org/solr/CoreAdmin
Maybe something has changed since Solr 4.0 and it's easier now; I didn't look at it since I switched to ElasticSearch. Personally, I find the ElasticSearch indexes/types system much better than that.
In Solr 4+:
If you are planning to do faceting or any other calculations across multiple types, then create a single schema with a differentiator field. Then, in your business/mapping/client layer, just define only the fields you actually want to look at. Use custom search handlers with the 'fl' field to return only the fields relevant to that object. Of course, that means that all those single-type-only fields cannot be compulsory.
If your document types are completely disjoint, you can create a core/collection per type, each with its own definition file. You have full separation, but still have only one Solr server to maintain.
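The custom-search-handler idea above could be sketched in solrconfig.xml like this (the handler name, field names and type value are illustrative):

```xml
<requestHandler name="/select-books" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Only return fields that belong to the "book" type -->
    <str name="fl">id,title,author</str>
  </lst>
  <lst name="appends">
    <!-- Always restrict this endpoint to one document type -->
    <str name="fq">doctype:book</str>
  </lst>
</requestHandler>
```

Clients then query /select-books without needing to know about the differentiator field or the other types sharing the schema.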
I have seen that some people include a new field in the documents and later on they use this new field to limit the search to a certain type of documents, but as I understand it, they have to share the schema and (I believe) ElasticSearch Mapping Type doesn't.
You can exactly do this in Solr. Add a field and use it to filter.
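A minimal sketch, with the field name and values invented for illustration: declare a string differentiator field in schema.xml and filter on it with fq at query time.

```xml
<!-- schema.xml: one extra field shared by all document types -->
<field name="doctype" type="string" indexed="true" stored="true"/>

<!-- query time: restrict the search to one type via a filter query,
     e.g. /solr/mycore/select?q=title:report&fq=doctype:invoice -->
```

Because fq is a filter query, the type restriction is cached independently of the main query and does not affect relevance scoring.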
It is correct that Mapping Types in ElasticSearch do not have to share the same schema, but under the hood ElasticSearch uses only ONE schema for all Mapping Types. So technically it makes no difference. In fact, the Mapping Type is mapped to an internal schema field.