I have a Solr DIH-config.xml (Data Import Handler) with multiple queries, querying different tables (or views) in Oracle 11.
I simply want Solr to return the results from one specific query before returning the results from the other queries.
How should I configure this in DIH-config.xml?
You can't configure that in data-config.xml in any way - what Solr returns is based on what's in the index. The Data Import Handler only imports data into Solr; the SQL queries aren't used for anything beyond getting data into the index.
However, you can work around this by having a special field with a static value returned from each query, effectively identifying which query each document was imported from.
In your first SQL query, add an aliased column that serves as the document's priority:
SELECT ..., 1000 AS priority FROM ...
In the second query, do the same, but with a higher priority value:
SELECT ..., 2000 AS priority FROM ...
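To show where those queries live, here is a minimal data-config.xml sketch (connection details, table names, and columns are placeholders, not from the original question):

<dataConfig>
  <dataSource driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="solr_user" password="secret"/>
  <document>
    <!-- Documents from this entity get the lower priority value,
         so they come back first when sorting by priority ascending -->
    <entity name="first_table"
            query="SELECT id, title, 1000 AS priority FROM first_table"/>
    <!-- Documents from this entity sort after the ones above -->
    <entity name="second_view"
            query="SELECT id, title, 2000 AS priority FROM second_view"/>
  </document>
</dataConfig>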
This requires first defining a long / integer field named priority, unless you're running in schemaless mode.
When querying Solr, use this field as the first sort criterion (sort=priority asc, score desc). This will return all documents from the first query first, internally sorted by score, then the documents from the second query, also internally sorted by score.
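If you do need to define the priority field yourself, a sketch of the schema.xml entry and a matching request might look like this (core path and query value are placeholders; on newer Solr versions the field type would be plong):

<field name="priority" type="long" indexed="true" stored="true"/>

http://localhost:8983/solr/select?q=mysearch&sort=priority+asc,score+desc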
Related
I am trying to make a website and I'm using MongoDB as my database. I have a question about the performance of the findOne query, which I've used widely. Does this query pull the whole collection from the database to the server and then iterate over it, or does it iterate on the database side and just return the matching document to the server? Pulling the whole collection from the database would be an issue, because transferring such a huge chunk of data takes time.
Understanding how MongoDB uses indexes will help you answer this question. If you pass parameters to the findOne query, and those parameters match an index on the collection, then MongoDB will use the index to find your results. Without an index, MongoDB will need to scan the collection until it finds a match.
For example, if you run a query like:
db.coll.findOne({"_id": ObjectId("5a0a0e6f29642fd7a970420c")})
then MongoDB will know exactly which document you want, since the _id field is unique and automatically indexed. If you query on another field which isn't indexed, then MongoDB will need to do a COLLSCAN (a full collection scan) to find the document(s) to return. Either way, the scan happens inside the database; only the result is sent back.
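If you will query a non-indexed field often, you can add an index so the lookup avoids the collection scan. A quick sketch in the mongo shell (the email field is just an example, not from the question):

// Build an ascending index on the example field
db.coll.createIndex({ email: 1 })

// This lookup is now served by an IXSCAN instead of a COLLSCAN
db.coll.findOne({ email: "user@example.com" })

// Verify which plan the query uses
db.coll.find({ email: "user@example.com" }).explain()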
Quoting official MongoDB documentation:
findOne - Returns one document that satisfies the specified query criteria on the collection or view. If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on the disk.
The clear implication is that the database itself returns only the one document. In addition, you can always use Postman or console.log to check what the server actually returns, if you're not sure.
If I want to limit a Cloudant Query index to a certain set of documents, can I apply a selector clause at index creation time, the same way we apply a selector clause at query time?
Otherwise the index ends up covering the specified field for the whole database.
Yes, you can. However, it is only supported for text indexes. See the documentation here.
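As a rough sketch of what that looks like (database name, field, and selector are placeholders; the selector key inside the index definition is what scopes the text index to matching documents):

curl -X POST "https://$ACCOUNT.cloudant.com/mydb/_index" \
     -H "Content-Type: application/json" \
     -d '{
           "type": "text",
           "index": {
             "selector": { "status": "active" },
             "fields": [ { "name": "title", "type": "string" } ]
           }
         }'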
I have a report that generates an Excel file daily, with data extracted from an MS SQL database. I now have to add additional columns to my spreadsheet from an Oracle database, where the ID matches the ID in the MS SQL query results.
My problem is that I have about 1200-1400+ unique IDs generated on this report from the first query. When I plug them into an IN list in the Oracle query and try a CFDUMP to see if the results come out as they should, I receive a CF error saying that the list cannot contain more than 1000 items (Oracle's limit on expressions in an IN list).
I basically set the values from the first query into a value list for the ID column and then put that into the IN clause of the Oracle query. I then do a CFDUMP on the Oracle query, where I receive that error. I've also tried wrapping <cfloop query="firstquery"> around the Oracle query and just placing #firstquery.columnIDname# in the IN clause, but that does not work either.
So the two questions I have here are:
How do I handle Oracle's 1000-item limit, given that I only have read-only access to the Oracle database from ColdFusion?
Once #1 is figured out, how do I combine the results from the Oracle query with my MS SQL query - in other words, add the columns I'm pulling from Oracle to the spreadsheet rows with the matching IDs?
For a quick, dirty, and sub-optimal approach, visit cflib.org and look for a function called ListSplit(). It converts a long list into an array of shorter lists.
You then loop through this array and run a query each time. Make sure the query name changes with each loop iteration.
After the loop, stitch the results back together with a query-of-queries UNION. Then do whatever you have to do to combine that data with what you got from SQL Server.
Note that you will probably have to use array notation to access your dynamically named query objects.
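A rough CFML sketch of the whole flow, assuming firstquery already holds the MS SQL results with an id column, and that ListSplit() from cflib is available on the page (datasource, table, and column names are placeholders):

<!--- Collect the IDs from the MS SQL query, then split into chunks of at most 1000 --->
<cfset idChunks = listSplit(valueList(firstquery.id), 1000)>

<cfloop from="1" to="#arrayLen(idChunks)#" index="i">
    <!--- One Oracle query per chunk; the query name changes each iteration --->
    <cfquery name="oraQry#i#" datasource="myOracleDsn">
        SELECT id, extra_col
        FROM   some_oracle_table
        WHERE  id IN (#idChunks[i]#) <!--- assumes numeric IDs; use ListQualify() for strings --->
    </cfquery>
</cfloop>

<!--- Stitch the chunked results back together with a query-of-queries UNION --->
<cfset sqlParts = []>
<cfloop from="1" to="#arrayLen(idChunks)#" index="i">
    <cfset arrayAppend(sqlParts, "SELECT id, extra_col FROM oraQry#i#")>
</cfloop>
<cfquery name="oraAll" dbtype="query">
    #arrayToList(sqlParts, " UNION ")#
</cfquery>

oraAll can then be matched on id against firstquery while writing out the spreadsheet rows.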
My Solr database has multiple schemas, as below:
***Part of Schema 1***
<field1>
<field2>
<field3>
<field4>
<field5>
***Part of Schema 2***
<field6>
<field7>
<field8>
When I do a q=*:* query, I get <field6>, <field7> and <field8>, but not the remaining fields.
I am able to select fields 1-5 only when I put field1:'value' in the q parameter.
Is there a way to know that fields 6-8 are part of schema-2 and fields 1-5 are part of schema-1?
Depending on your search handler (like (e)dismax), you can define the default search fields.
Or you can use the qf parameter to define the fields you would like to search in: http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
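For example, a request restricted to the question's schema-1 fields might look like this (core path and query value are placeholders):

http://localhost:8983/solr/select?q=value&defType=edismax&qf=field1+field2+field3+field4+field5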
If you would like to separate your DB schemas in Solr, so that the fields from schema-1 do not know about the fields from schema-2, you can use two different Solr cores: one for each schema.
Is there a way to know that fields 6-8 are part of schema-2 and fields 1-5 are part of schema-1?
As far as I know, Solr does not support DB schemas. A field inside Solr is just a field. There is no way to attach additional (meta) information about where a field came from, so you will not be able to filter your queries depending on their origin - except by defining the query fields, or by separating the schemas into cores, as described above.
I'm using Apache Solr to power the search functionality on my Drupal site, via a contributed Drupal module named ApacheSolr Search Integration. I'm pretty novice with Solr and have only a basic understanding of it, hence my apologies in advance if this query sounds outrageous.
I have a date field, added through one of Drupal's hooks, named ds_myDate, which I initially used for sorting the search results. I decided to use date boosting instead, so that the search results are displayed based on relevancy and boosted by their date, rather than merely being displayed in descending order of date. Once I had updated my hook to implement this by adding a boost field of recip(ms(NOW/HOUR,ds_myDate),3.16e-11,1,1), I got an HTTP 400 error stating
Can't use ms() function on non-numeric legacy date field ds_myDate
Googling for the error suggested that I use a TrieDateField instead of the legacy DateField to prevent it. Adding a TrieDate field named tds_myDate, following the suggested naming convention, and implementing the boost as recip(ms(NOW/HOUR,tds_myDate),3.16e-11,1,1) did effectively achieve the boosting. However, this requires me to reindex all the content (close to 500k records) to populate the new TrieDate field before I can use it.
I'd like to know if there's a workaround that is more effective than re-indexing all my content - something like converting ds_myDate to a TrieDate field, the way one would run an ALTER query on a MySQL table to change a column's type. Since I'm unfamiliar with how Solr works internally, I'd like to know whether such an option is feasible and what the right thing to do would be in this case.
You may be able to achieve it by doing a partial update, but for that you need to be on Solr 4+ and have all indexed fields stored.
Here is how I would go about it:
Make sure your version of Solr is 4+
Make sure all indexed fields are stored (a requirement for partial updates)
If the above two conditions are met, write a script (e.g. in PHP) which does the following:
1) Iterate through the full Solr index, and for each doc:
    a) read the value stored in the ds_myDate field
    b) convert it to TrieDateField format
    c) push it to Solr, via a partial update to only the tds_myDate field (see the sample query)
Sample query:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"$id","tds_myDate":{"set":"$converted_Val"}}]'
For more details on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
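As an illustration only, here is a minimal sketch of that loop in Python rather than the suggested PHP (the logic is the same). It assumes Solr 4+, that ds_myDate is stored, and that the core URL, query, and batch size are placeholders:

import requests

SOLR = "http://localhost:8983/solr"  # placeholder; include your core name if needed
ROWS = 500

start = 0
while True:
    # Fetch a page of documents that have the legacy date field
    resp = requests.get(SOLR + "/select", params={
        "q": "ds_myDate:[* TO *]",
        "fl": "id,ds_myDate",
        "start": start,
        "rows": ROWS,
        "wt": "json",
    }).json()
    docs = resp["response"]["docs"]
    if not docs:
        break
    # Legacy DateField values are already ISO-8601 strings (e.g. 2012-07-09T00:00:00Z),
    # which is the format TrieDateField expects, so they can often be reused as-is.
    updates = [{"id": d["id"], "tds_myDate": {"set": d["ds_myDate"]}} for d in docs]
    # Partial (atomic) update: only tds_myDate is touched on each document
    requests.post(SOLR + "/update?commit=true",
                  headers={"Content-type": "application/json"},
                  json=updates)
    start += ROWS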
Unfortunately, once a document has been indexed a certain way and you then change the schema, the schema changes are not applied to the existing documents until those documents are re-indexed.
Please see this previous question - Does Schema Change need Reindex - for additional details.