Deleting documents from a Cloudant database

I need to delete specific documents from a Cloudant database, as illustrated below.
All documents are in JSON format:
{
"_id":"abcd",
"_rev":"1_efgh",
"moduleName":"co",
"userid":"knight",
"stateType":"yout"
}
I want to delete all documents whose "userid" field does not match any of the values in a list of String values,
say
["userid1", "userid2"]
I am using com.cloudant.client.api.CloudantClient in Spring Boot.

You can create a view that returns the documents matching your criteria. Then perform a bulk operation to delete the documents you fetched: submit each one with its _id and current _rev and with "_deleted": true.
See also the related question: How to delete docs in Couch db
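The selection and deletion steps above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the Cloudant client's API: it assumes the documents have already been fetched (for example from the view) as maps, and the class and method names (BulkDeleteSketch, deletionStubs, doc) are hypothetical. It only builds the deletion stubs; sending them is left to whichever bulk call your client exposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of the Cloudant client library.
final class BulkDeleteSketch {

    // Returns one deletion stub per document whose "userid" is NOT in allowedUserIds.
    // Documents without a "userid" field (e.g. design documents) are skipped; that
    // choice is an assumption made here to avoid deleting unrelated documents.
    static List<Map<String, Object>> deletionStubs(List<Map<String, Object>> docs,
                                                   Set<String> allowedUserIds) {
        List<Map<String, Object>> stubs = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object userId = doc.get("userid");
            if (userId == null || allowedUserIds.contains(userId)) {
                continue;
            }
            Map<String, Object> stub = new HashMap<>();
            stub.put("_id", doc.get("_id"));
            stub.put("_rev", doc.get("_rev"));
            stub.put("_deleted", true); // marks the document as deleted in a bulk request
            stubs.add(stub);
        }
        return stubs;
    }

    // Builds a document map shaped like the example in the question.
    static Map<String, Object> doc(String id, String rev, String userId) {
        Map<String, Object> d = new HashMap<>();
        d.put("_id", id);
        d.put("_rev", rev);
        d.put("userid", userId);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("abcd", "1_efgh", "knight"));
        docs.add(doc("ijkl", "1_mnop", "userid1"));
        Set<String> allowed = new HashSet<>(Arrays.asList("userid1", "userid2"));
        // Only the "knight" document is selected for deletion.
        System.out.println(deletionStubs(docs, allowed));
    }
}
```

Filtering on the client keeps the view simple, at the cost of fetching every document; if the list of user ids is fixed, the filter could instead live in the view itself.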

Related

Apache Solr Querying by search term from multiple tables and in all columns

I am new to Apache Solr. So far I have worked with a single table, importing it into Solr and retrieving data with queries.
Now I want to do the following:
Query across multiple tables: if I search for a word, it should return all occurrences in multiple tables.
Search in all fields of a table: I also want to query by a word across all fields of a single table.
Do I need to create a single document by importing data from multiple tables using joins in data-config.xml, and then query over it?
Any leads and guidance are welcome.
TIA.
Do I need to create single document by importing data from multiple tables using joins in data-config.xml? And then querying over it?
Yes. Solr uses a document model (rather than a relational model) and the general approach is to index a single document with the fields that you need for searching.
From the Apache Solr guide:
Solr’s basic unit of information is a document, which is a set of data
that describes something. A recipe document would contain the
ingredients, the instructions, the preparation time, the cooking time,
the tools needed, and so on. A document about a person, for example,
might contain the person’s name, biography, favorite color, and shoe
size. A document about a book could contain the title, author, year of
publication, number of pages, and so on.

Update all records in a solr column to "hello"

There are many records in a Solr collection. We need to update a particular column to "hello".
I executed the JSON below using the update request handler, but it created a new record with primary key * and set its column to hello.
{
"Primary_key":"*",
"Column1":{"set":"hello"}
}
Is there any way to update column1 in all records to hello?
There is no way to update documents in Solr using a query like '*'.
In my opinion, the best way to speed up the column update in this case is to submit multiple documents in a single update request and use atomic updates.
Atomic updates allow changing only specific fields of a document without having to reindex the entire document.
You can send multiple updates in one request, like this:
[{"id":"1",
"column1":{"set":"hello"}},
{"id":"2",
"column1":{"set":"hello"}}]
There is a very old JIRA issue about this.
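As a rough illustration of the batching idea, the sketch below builds one atomic-update request body for a list of document ids. It assumes the ids have already been collected (for example by querying Solr for the unique key field) and that the unique key is named "id" as in the answer's example; the class and method names are hypothetical, and sending the body to the update handler is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper; the field names "id" and "column1" follow the example above.
final class AtomicUpdatePayload {

    // Builds a JSON array with one atomic "set" update per document id.
    static String build(List<String> ids, String field, String value) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        for (String id : ids) {
            array.add("{\"id\":\"" + escape(id) + "\",\""
                    + escape(field) + "\":{\"set\":\"" + escape(value) + "\"}}");
        }
        return array.toString();
    }

    // Minimal escaping of backslashes and double quotes only;
    // a JSON library is the safer choice for arbitrary values.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // prints [{"id":"1","column1":{"set":"hello"}},{"id":"2","column1":{"set":"hello"}}]
        System.out.println(build(Arrays.asList("1", "2"), "column1", "hello"));
    }
}
```

For a large collection, the id list would typically be split into batches so that each request stays a manageable size.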

Database mapping to Solr

I'm building a Java app using a relational database and I wish to map its primary data to one or more Solr indexes. However, I'm not sure how to map the components of a database. At the moment I've mapped a single row cell to a Solr/Lucene Document.
A doc would be something like this (each line is a field):
schema: "schemaName"
table: "tableName"
column: "columnName"
row: "rowNumber"
data: "data on schemaName.tableName.columnName.row"
This allows me to have a "fixed" Solr schema.xml (as far as I know it has to be defined before creating indexes). Also, dynamic fields don't seem to serve my purpose.
What I've found while searching is that a single row is usually mapped to a Solr Document and each column is mapped to a Field. But how can I add the column names as fields in schema.xml when I don't know which columns a table has? Also, I would need the data to be queryable as if it were SQL, i.e. search for all rows of a column in a table, and so on.
With my current "solution" I can do that kind of query, but I'm worried about performance, as I'm new to Solr and don't know the implications it may have.
So, what do you say about my "solution"? Is there another way to map a database to a Solr index, given that the schema.xml fields should be set before indexing? I've also "heard" that a table is usually mapped to an index: how could I achieve that?
Maybe I'm just being a noob, but from the research I did I don't see how I can map a database row to a Solr Document without messing with the schema.xml fields.
I would appreciate any thoughts :) Regards.
You can specify your table columns in the schema beforehand, or use dynamic fields, and then use the Solr DIH (DataImportHandler) to import the data from the database into Solr. Alias the selected columns to your dynamic field names in the DIH queries.
See the Solr DIH documentation for database integration.
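To make the row-to-document mapping with dynamic fields concrete, here is a minimal client-side sketch. It is not DIH itself; it only shows how column names could be turned into dynamic field names when the columns are not known in advance. It assumes the schema declares a dynamic field rule matching "*_s" for strings; that rule, the "source_table_s" field, and the class and method names are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing one row -> one Solr document.
final class RowToDocumentSketch {

    // Maps one database row to the fields of one Solr document.
    // Every column becomes "<column>_s", relying on an assumed "*_s" dynamic field rule,
    // so schema.xml does not need to list the columns of each table.
    static Map<String, Object> toDocument(String table, String primaryKey,
                                          Map<String, Object> row) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", table + ":" + primaryKey); // unique across tables
        doc.put("source_table_s", table);        // lets queries filter by table
        for (Map.Entry<String, Object> column : row.entrySet()) {
            if (column.getValue() != null) {     // skip NULL columns
                doc.put(column.getKey() + "_s", String.valueOf(column.getValue()));
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Alice");
        row.put("city", "Lisbon");
        System.out.println(toDocument("customers", "7", row));
    }
}
```

With documents shaped like this, a query such as source_table_s:customers AND name_s:Alice would search one column of one table, which is close to the SQL-like access the question asks for.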

Updating solr index with deleted records

I was trying to figure out how to update the index for the deleted records. I'm indexing from the database. I search for documents in the database, put them in an array and index them by creating a SolrInputDocument.
So, I couldn't figure out how to update the index for the deleted records (because they don't exist in the database now).
I'm using the php-solr-pecl extension.
You need to handle the deletion of documents from Solr separately.
Solr won't handle it for you.
For incremental indexing, you need to keep track of the documents deleted from the database and then issue a delete query for them to clean up the index.
For this you have to maintain a timestamp and a delete flag to identify the documents.
For full indexing, you can just clear the index and reindex everything.
However, in case of failure you may lose all the data.
Solr DIH provides some handling for this (for example, the deletedPkQuery attribute for delta imports).
Create a delete trigger on the database table that inserts the id of each deleted record into another table. (Alternatively, add a boolean "deleted" field and mark the record instead of actually deleting it; considering the trade-offs, I would choose the trigger.)
Once in a while, do a batch delete on the index based on the "deleted" table, then remove those rows from the table itself.
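The batch delete step can be sketched as follows. This minimal example assumes the ids have already been read from the "deleted" table and only builds a delete-by-id request body in the JSON form accepted by Solr's update handler; the class and method names are hypothetical, and reading the table and posting the body are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that turns a batch of deleted record ids into one request body.
final class DeleteByIdPayload {

    // Builds {"delete":["id1","id2",...]} for the given ids.
    static String build(List<String> deletedIds) {
        StringJoiner ids = new StringJoiner(",", "{\"delete\":[", "]}");
        for (String id : deletedIds) {
            // Minimal escaping of backslashes and double quotes only.
            ids.add("\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
        }
        return ids.toString();
    }

    public static void main(String[] args) {
        // prints {"delete":["17","42"]}
        System.out.println(build(Arrays.asList("17", "42")));
    }
}
```

Removing the rows from the "deleted" table only after the delete has been committed in Solr avoids losing ids if the request fails.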
We faced the same issue and came up with a batch deletion approach.
We created a program that deletes documents from Solr based on the unique id: if a unique id is present in Solr but not in the database, that document can be deleted from Solr.
(unique id list from Solr) minus (unique id list from the database)
You can use SQL MINUS to get the list of unique ids belonging to the documents that need to be deleted.
Alternatively, you can do everything on the Java side: get the list from the database, get the list from Solr, compare the two lists and delete based on the result. This would be a lot faster for a huge number of documents. You can use binary search for the comparison, provided the list being searched is sorted.
Something like
Collections.binarySearch(DatabaseUniqueidArray, "SOLRuniqueid");
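The comparison can be fleshed out into a small runnable sketch. It assumes both id lists are already in memory as strings; the class and method names are hypothetical. Note that Collections.binarySearch only gives correct results on a sorted list, so the database ids are sorted first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper computing (ids in Solr) minus (ids in the database).
final class StaleIdFinder {

    // Returns the ids that are present in Solr but missing from the database.
    static List<String> staleIds(List<String> solrIds, List<String> databaseIds) {
        List<String> sortedDatabaseIds = new ArrayList<>(databaseIds);
        Collections.sort(sortedDatabaseIds); // binarySearch requires a sorted list
        List<String> stale = new ArrayList<>();
        for (String solrId : solrIds) {
            // A negative result means the id was not found in the database list.
            if (Collections.binarySearch(sortedDatabaseIds, solrId) < 0) {
                stale.add(solrId);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // prints [b]
        System.out.println(staleIds(Arrays.asList("a", "b", "c"), Arrays.asList("c", "a")));
    }
}
```

Putting the database ids into a HashSet and calling contains would work just as well and avoids the sort; binary search is used here only to stay close to the answer's suggestion.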

Solr return file name

I have indexed a couple of documents using Solr. Now when I perform a search using the admin interface, it returns search results in XML format.
I am trying to figure out how I can associate a document that I have indexed (for example, test.pdf) with the results I receive, and then serve that document to my user.
Will Solr return a unique ID for each document that I index, so that after indexing I can store the document along with that UID in my database, and then, when the user performs a search and Solr returns the unique IDs of the matching documents, serve them from the database?
You will need to add the filename as a stored field. Look at your schema.xml and make sure you declare a field of type string and set its stored attribute to true. Setting stored=true ensures that Solr can return the field in results.
See this page for more information: http://wiki.apache.org/solr/SchemaXml
