Solr database with multiple schemas

My Solr DB has multiple schemas, as below:
***Part of Schema 1***
<field1>
<field2>
<field3>
<field4>
<field5>
***Part of Schema 2***
<field6>
<field7>
<field8>
When I do a q=*:* query, I get <field6>, <field7> and <field8>, but not the remaining fields.
I am able to select fields 1-5 only when I include field1:'value' in the q parameter.
Is there a way to know that fields 6-8 are part of schema 2 and fields 1-5 are part of schema 1?

Depending on your search handler (like (e)dismax) you can define the default search fields.
Or you can use the qf parameter to define the fields you would like to search in: http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
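For example, a request restricted to the schema-1 fields might look like this (host and query value are assumptions; the field names are the ones from your question):
http://localhost:8983/solr/select?q=value&defType=edismax&qf=field1+field2+field3+field4+field5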
If you would like to separate your DB schemas in Solr, so that fields from schema 1 do not know about the fields from schema 2, you can use two different Solr cores: one for each schema.
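A minimal sketch of the corresponding <cores> section in solr.xml (core names and instance directories are assumptions):
<cores adminPath="/admin/cores">
  <core name="schema1" instanceDir="schema1" />
  <core name="schema2" instanceDir="schema2" />
</cores>
Each core then gets its own schema.xml, so the two field sets stay completely separate.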
Is there a way to know that fields 6-8 are part of schema 2 and fields 1-5 are part of schema 1?
As far as I know, Solr does not support DB schemas. A field inside Solr is just a field. There is no way to attach additional (meta) information about where a field is coming from, so you will not be able to filter your queries depending on their origin - except by defining the query fields or by separating the schemas into cores, or something like that.

Related

Ranking SOLR results by query

I have a SOLR DIH-config.xml (dataimporter) with multiple queries, querying different tables (or views) in Oracle 11.
I simply want SOLR to return results from one specific query first before returning the results from the other queries.
How should I configure this in DIH-config.xml?
Thanks.
regards,
Erik
You can't configure that in data-config.xml in any way - what Solr returns is based on what's in the index. The Data Import Handler only imports data into Solr, the SQL queries aren't used for anything outside of getting data into Solr.
However, you can work around this by having a special field with static values returned from each query, effectively identifying which query the document was imported from.
In your SQL query, add an aliased field name as the priority of the document:
SELECT ..., 1000 AS priority FROM ...
In the second query, do the same, but with a higher priority value:
SELECT ..., 2000 AS priority FROM ...
This requires defining a long/integer field named priority first if you're not running in schemaless mode.
When querying Solr, use this value as the first sort criterion (sort=priority asc, score desc). This will return all documents from the first query first, internally sorted by score, then the documents from the last query, also internally sorted by score.
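Putting the pieces together, a hedged sketch of the priority field in schema.xml plus a query that uses it (host, core and field type are assumptions):
<field name="priority" type="int" indexed="true" stored="true" />
http://localhost:8983/solr/select?q=*:*&sort=priority+asc,score+desc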

How to: Reload entities in Solr

Let's say you have a Solr core with multiple entities in your document. In my case the reason for that is that the index is fed by SQL queries and I don't want to deal with multiple cores. So, in case you add or change one entity configuration, you eventually have to re-index the whole shop, which can be time consuming.
There is a way to delete and re-index one single entity, and this is how it works:
Prerequisite: your index entries have to have a field which reflects the entity name. You can either do that via a constant in your SQL statement or by using the TemplateTransformer:
<field column="entityName" name="entityName" template="yourNameForTheEntity"/>
You can use this name to remove all of the entity's items from the index using the Solr admin UI. Go to Documents,
Request-Handler: /update
Document-Type: JSON
Document(s): {"delete": {"query": "entityName:yourNameForTheEntity"}}
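If you prefer the command line over the admin UI, the same delete can be sent with curl (host and core name are assumptions):
curl -X POST 'http://localhost:8983/solr/yourCore/update?commit=true' -H 'Content-Type: application/json' -d '{"delete": {"query": "entityName:yourNameForTheEntity"}}'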
After submitting the document, all related items are gone, and you can verify that by running a query on the Query page:
{!term f=entityName}yourNameForTheEntity
Then go to the Dataimport page to re-load your entity. Uncheck the Clean checkbox, select your entity, and Execute.
After the indexing is complete, you can go back to the query page and check the result.
That's it.
Have fun,
Christian

Retrieve all documents but only specific fields from Cloudant database

I want to return all the documents in a Cloudant database but only include some of the fields. I know that you can make a GET request against https://$USERNAME.cloudant.com/$DATABASE/_all_docs, but there doesn't seem to be a way to select only certain fields.
Alternatively, you can POST to /db/_find and include selector and fields in the JSON body. However, is there a universal selector, similar to SELECT * in SQL databases?
You can use {"_id":{"$gt":0}} as your selector to match all the documents, although you should note that it is not going to be performant on large data sets.
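A hedged sketch of such a request (the entries under fields are placeholders, not field names from your actual database):
curl -X POST "https://$USERNAME.cloudant.com/$DATABASE/_find" -H "Content-Type: application/json" -d '{"selector": {"_id": {"$gt": 0}}, "fields": ["_id", "name", "email"]}'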

conversion of DateField to TrieDateField in Solr

I'm using Apache Solr to power the search functionality on my Drupal site, via a contributed Drupal module named ApacheSolr Search Integration. I'm fairly new to Solr and have only a basic understanding of it, so I wish to convey my apologies in advance if this query sounds outrageous.
I have a date field named ds_myDate, added through one of Drupal's hooks, which I initially used for sorting the search results. I then decided to use date boosting, so that the search results are displayed based on relevancy and boosted by their date, rather than merely being displayed in descending order of date. Once I had updated my hook to implement this by adding a boost field of recip(ms(NOW/HOUR,ds_myDate),3.16e-11,1,1), I got an HTTP 400 error stating
Can't use ms() function on non-numeric legacy date field ds_myDate
Googling for the error suggested that I use a TrieDateField instead of the legacy DateField to prevent it. Adding a TrieDate field named tds_myDate (following the suggested naming convention) and implementing the boost as recip(ms(NOW/HOUR,tds_myDate),3.16e-11,1,1) did effectively achieve the boosting. However, this requires me to re-index all the content (close to 500k records) to populate the new TrieDate field so that I can use it effectively.
I'd like to know whether there's a more effective workaround than re-indexing all my content, such as converting my ds_myDate field to a TrieDate field the way one would run an ALTER query on a MySQL table column to change its type. Since I'm unfamiliar with how Solr works, I'd appreciate knowing whether such an option is feasible and what the right thing to do would be in this case.
You may be able to achieve it by doing a partial update, but for that you need to be on Solr 4+ and to be storing all indexed fields.
Here is how I would go about this:
Make sure your version of Solr is 4+
Make sure all indexed fields are stored (a requirement for partial updates)
If the above two conditions are met, write a script (PHP, for example; a Python sketch follows after the sample query below) that does the following:
1) Iterate through the full Solr index, and for each doc:
----a) read the value stored in the ds_myDate field
----b) convert it to the TrieDateField format
----c) push it to Solr via a partial update to only the tds_myDate field (see the sample query)
Sample query:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"$id","tds_myDate":{"set":"$converted_Val"}}]'
For more details on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
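Here is a minimal sketch of such a script, written in Python rather than PHP for brevity. The core name collection1 and the batch size are assumptions, and it presumes the update log is enabled (a requirement for atomic updates). Stored legacy dates come back in ISO-8601 form, which TrieDateField accepts as-is, so the conversion step is a straight copy:

import requests

SOLR = "http://localhost:8983/solr/collection1"  # assumed core name
ROWS = 500  # batch size

while True:
    # Fetch the next batch of docs that still lack the new field;
    # querying for "not yet migrated" docs keeps the loop stable.
    resp = requests.get(SOLR + "/select", params={
        "q": "ds_myDate:[* TO *] -tds_myDate:[* TO *]",
        "fl": "id,ds_myDate",
        "rows": ROWS,
        "wt": "json",
    }).json()
    docs = resp["response"]["docs"]
    if not docs:
        break
    # Partial update: only tds_myDate is set, all other (stored) fields are kept.
    updates = [{"id": d["id"], "tds_myDate": {"set": d["ds_myDate"]}} for d in docs]
    # Commit each batch so the next query no longer matches the updated docs.
    requests.post(SOLR + "/update?commit=true",
                  headers={"Content-Type": "application/json"},
                  json=updates)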
Unfortunately, once a document has been indexed a certain way and you then change the schema, the new schema changes cannot be applied to existing documents until those documents are re-indexed.
Please see this previous question - Does Schema Change need Reindex for additional details.

Adding indexed non-stored fields to schema

I added a new field to my schema that is indexed but not stored, so that I can copy another field into it. Do I still have to re-index all the documents because of this schema change, or can I just restart my Solr server? It looks like I have to re-index all documents, since sorting on that new non-stored field is giving me unexpected results, but I would like confirmation of that.
You have to do a full re-index. As a schema change can involve different IndexAnalyzers, Solr can't apply schema changes to existing documents by itself.
Yes, you have to run the indexer to actually fill in the data for that field.
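For reference, a minimal sketch of the kind of schema addition being discussed (the field and type names are hypothetical):
<field name="title_sort" type="string" indexed="true" stored="false" />
<copyField source="title" dest="title_sort" />
Documents indexed before this change contain nothing in title_sort, which is why sorting on it misbehaves until everything has been re-indexed.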
