sorting fields inside a document in solr - solr

I am working with a solr index that I have not made. I only have access to the solr admin.
In each document that is returned by the query I write in the solr admin, has around 40 fields. These fields are not sorted alphabetically.
Now my question is can I sort them somehow in the solr admin?
If I can not, I have the opportunity to import that index locally in my dev machine. I also have access to the config (solr config, data import config etc) files.
Is it possible to do some magic in any of those config files and import locally which will sort them alphabetically?

No, neither Lucene or Solr guarantees the order of the fields returned (the order of values inside a multi-valued field is however guaranteed)
You might have luck (you won't - see comment below - fl maintains the same order as in the document) by explicitly using the fl parameter to get the order you want, but that would require maintaining a long list of fields to be returned.
It's usually better to ask why you'd need the order of the fields to maintained. The data returned from Solr is usually not meant for the user directly, and should be processed in your controller / view layer to suit the use case.

You could return it using XSLT response writer instead of XML one. Usually it is used to transform XML into a different form, but you could probably use it for identity transformation but with sorting.
I don't think that's the best way forward, but if you are desperate, it is a way.

Related

Use Solr Schemaless feature without automatically adding unknown fields to managed-schema

I have different datasources that uploads different documents to Solr Sink. Now if two datasources sends a same name field with different data types (say integer & double) then indexing of second field fails because data type of first field is already added in managed-schema.
All I need is that both fields get indexed properly as they used to work in Solr 4.x versions .
Since field names come at runtime,please suggest a solution that would work for me. I suppose it needs a change in solrconfig.xml but couldnot find the required.
How was your Solr configured to work in 4.x? You can still do it exactly the same way in Solr 6.
On the other hand, schemaless feature will define the type mapping on the first time it sees the field. It has no way to know what will come in the future. That's also why all auto-definitions are multivalued.
However, if you want to deal with specific mapping of integer being too narrow, you can change the definition of the UpdateRequestProcessor chain that is actually doing the mapping. Just merge the mapping of integer/long/number into one final tdoubles type.

Is there a better way to represent provenenace on a field level in SOLR

I have documents in SOLR which consist of fields where the values come from different source systems. The reason why I am doing this is because this document is what I want returned from the SOLR search, including functionality like hit highlighting. As far as I know, if I use join with multiple SOLR documents, there is no way to get what matched in the related documents. My document has fields like:
id => unique entity id
type => entity type
name => entity name
field_1_s => dynamic field from system A
field_2_s => dynamic field from system B
...
Now, my problem comes when data is updated in one of the source systems. I need to update or remove only the fields that correspond to that source system and keep the other fields untouched. My thought is to encode the dynamic field name with the first part of the field name being a 8 character hash representing the source system.. this way they can have common field names outside of the unique source hash. And in this way, I can easily clear out all fields that start with the source prefix, if needed.
Does this sound like something I should be doing, or is there some other way that others have attempted?
In our experience the easiest and least error prone way of implementing something like this is to have a straight forward way to build the resulting document, and then reindex the complete document with data from both subsystems retrieved at time of reindexing. Tracking field names and field removal tend to get into a lot of business rules that live outside of where you'd normally work with them.
By focusing on making the task of indexing a specific document easy and performant, you'll make the system more flexible regarding other issues in the future as well (retrieving all documents with a certain value from Solr, then triggering a reindex for those documents from a utility script, etc.).
That way you'll also have the same indexing flow for your application and primary indexing code, so that you don't have to maintain several sets of indexing code to do different stuff.
If the systems you're querying isn't able to perform when retrieving the number of documents you need, you can add a local cache (in SQL, memcached or something similar) to speed up the process, but that code can be specific to the indexing process. Usually the subsystems will be performant enough (at least if doing batch retrieval depending on the documents that are being updated).

Create index of nested documents in SOLR

How should I import nested entities from DB to Solr index? For some reasons i don't want to flatten documents into single one. What should i write in schema.xml and data-config.xml ? I'm using Solr 4.10.
The currently distributed version of the DataImportHandler does not support nested documents (or BlockJoins as they're called in Solr/Lucene).
There is however a patch available that you can try out - be sure to follow the discussion on JIRA (SOLR-5147) about how to use it and where it goes in the future.
Since you can't use the DataImportHandler, you could write custom code to do this. I'd recommend using SolrJ to load childDocuments. To handle childDocuments, first you have to create all of your required fields (for all of your different record types) in your schema.xml (or use dynamic fields). From there, you can create a SolrInputDocument for the parent, and a SolrInputDocument for the child, and then call addChildDocument(doc) on the parent SolrInputDocument to add the child to it.
I'd also recommend creating a field that can indicate what level you're at - something like "content_type" that you fill in with "parent" or "root," or whatever works for you. Then, once you've loaded the records, you can use the Block/Join Queries to search hierarchically. Be aware that doing this will create an entry for each record, though, and if you do a q=: query, you'll get all of your records intermixed with each other.

Partial Update of documents

We have a requirement that documents that we currently index in SOLR may periodically need to be PARTIALLY UPDATED. The updates can either be
a. add new fields
b. update the content of existing fields.
Some of the fields in our schema are stored, others are not.
SOLR 4 does allow this but all the fields must be stored. See Update a new field to existing document and http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
Questions:
1. Is there a way that SOLR can achieve this. We've tried SOLR JOINs in the past but it wasn't the right fit for all our use cases.
On the other hand, can elastic search , linkedin's senseidb or other text search engines achieve this ?
For now, we manage by re-indexing the affected documents when they need to be indexed
Thanks
Solr has the limitation of stored fields, that's correct. The underlying lucene always requires to delete the old document and index the new one. In fact lucene segments are write-once, it never goes back to modify the existing ones, thus it only markes documents as deleted and deletes them for real when a merge happens.
Search servers on top of lucene try to work around this problem by exposing a single endpoint that's able to delete the old document and reindex the new one automatically, but there must be a way to retrieve the old document somehow. Solr can do that only if you store all the fields.
Elasticsearch works around it storing the source documents by default, in a special field called _source. That's exactly the document that you sent to the search engine in the first place, while indexing. This is by the way one of the features that make elasticsearch similar to NoSQL databases. The elasticsearch Update API allows you to update a document in two ways:
Sending a new partial document that will be merged with the existing one (still deleting the old one and indexing the result of the merge
Executing a script on the existing document and indexing the result after deleting the old one
Both options rely on the presence of the _source field. Storing the source can be disabled, if you disable it you of course lose this great feature.

Solr equivalent to ElasticSearch Mapping Type

ElasticSearch has Mapping Types to, according to the docs:
Mapping types are a way to divide the documents in an index into
logical groups. Think of it as tables in a database.
Is there an equivalent in Solr for this?
I have seen that some people include a new field in the documents and later on they use this new field to limit the search to a certain type of documents, but as I understand it, they have to share the schema and (I believe) ElasticSearch Mapping Type doesn't. So, is there an equivalent?
Or, maybe a better question,
If I have a multiple document types and I want to limit searches to a certain document type, which one should offer a better solution?
I hope this question has any sense since I'm new to both of them.
Thanks!
You can configure multicore solr:
http://wiki.apache.org/solr/CoreAdmin
Maybe something has changed since solr 4.0 and it's easier now, i didn't look at it since i have switched to elasticsearch. Personally i find elasticsearch indexes/types system much better than that.
In Solr 4+.
If you are planning to do faceting or any other calculations across multiple types than create a single schema with a differentiator field. Then, on your business/mapping/client layer just define only the fields you actually want to look at. Use custom search handlers with 'fl' field to only return the fields relevant to that object. Of course, that means that all those single-type-only fields cannot be compulsory.
If your document types are completely disjoint, you can create a core/collection per type, each with its own definition file. You have full separation, but still have only one Solr server to maintain.
I have seen that some people include a new field in the documents and later on they use this new field to limit the search to a certain type of documents, but as I understand it, they have to share the schema and (I believe) ElasticSearch Mapping Type doesn't.
You can exactly do this in Solr. Add a field and use it to filter.
It is correct that Mapping Types in ElasticSearch do not have to share the same schema but under the hood ElasticSearch uses only ONE schema for all Mapping Types. So technical it makes to difference. In fact the MappingType is mapped to an internal schema field.

Resources