Solr export / replication to a database

My current application uses a Solr engine for storing large amounts of data, but I would like to have some of this data copied over to a Postgres database.
Ideally this "replication" should have a delta comparison mechanism so that no data coming from Solr is missed.
I can see lots of examples for importing data from a database into Solr (DataImportHandler), but not the other way around; I cannot find any DataExportHandler in the Solr documentation.
Any suggestions?
Thanks
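Since Solr ships no DataExportHandler, a common workaround is to poll Solr for changed documents and upsert them into Postgres yourself. Here is a minimal sketch of that pattern, not an official Solr feature; the last_modified_dt date field, the unique id key, and the solr_docs target table are all assumptions:

    import requests
    import psycopg2

    SOLR = "http://localhost:8983/solr/mycore/select"  # hypothetical core

    def fetch_since(last_run_iso):
        """Yield docs changed since last_run_iso, using cursorMark deep paging."""
        cursor = "*"
        while True:
            resp = requests.get(SOLR, params={
                "q": "last_modified_dt:[%s TO *]" % last_run_iso,
                "sort": "id asc",  # cursorMark requires a sort on the uniqueKey
                "rows": 500,
                "cursorMark": cursor,
                "wt": "json",
            }).json()
            for doc in resp["response"]["docs"]:
                yield doc
            if resp["nextCursorMark"] == cursor:  # no more pages
                break
            cursor = resp["nextCursorMark"]

    def upsert(conn, doc):
        """Insert-or-update one document into the hypothetical solr_docs table."""
        with conn.cursor() as cur:
            cur.execute(
                """INSERT INTO solr_docs (id, title, last_modified)
                   VALUES (%s, %s, %s)
                   ON CONFLICT (id) DO UPDATE
                   SET title = EXCLUDED.title,
                       last_modified = EXCLUDED.last_modified""",
                (doc["id"], doc.get("title"), doc.get("last_modified_dt")))

    conn = psycopg2.connect("dbname=mydb user=me")
    for doc in fetch_since("2024-01-01T00:00:00Z"):
        upsert(conn, doc)
    conn.commit()

Storing the timestamp of each successful run and passing it as last_run_iso gives you the delta behaviour: only documents modified since the previous run are copied over.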

Related

How do I check the data integrity after migrating a Cassandra database onto AWS Keyspaces

I am trying to migrate a Cassandra cluster onto AWS Keyspaces for Apache Cassandra.
After the migration is done, how can I verify that the data has been migrated successfully, as-is?
Many solutions are possible. For instance, you could simply read all rows of a partition, compute a checksum / signature, and compare it with your original data; then iterate through all your partitions, and repeat for all your tables. Checksums work.
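A minimal sketch of that checksum comparison with the DataStax Python driver, assuming a hypothetical ks.users table with partition key user_id; the Keyspaces TLS/auth setup is omitted for brevity:

    import hashlib
    from cassandra.cluster import Cluster

    def partition_checksum(session, key):
        """Hash every row of one partition in clustering order."""
        h = hashlib.sha256()
        rows = session.execute(
            "SELECT * FROM ks.users WHERE user_id = %s", (key,))
        for row in rows:
            h.update(repr(tuple(row)).encode("utf-8"))
        return h.hexdigest()

    src = Cluster(["10.0.0.1"]).connect()  # source cluster
    dst = Cluster(["cassandra.us-east-1.amazonaws.com"], port=9142).connect()  # Keyspaces

    for key in [1, 2, 3]:  # iterate over your real partition keys here
        if partition_checksum(src, key) != partition_checksum(dst, key):
            print("mismatch in partition", key)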
You could use AWS Glue to perform an 'except' function. Spark has a lot of useful functions for working with massive datasets, and Glue is serverless Spark. You can use the Spark Cassandra connector with Cassandra and Keyspaces to work with datasets in Glue. For example, you may want to see the data that is not in Keyspaces:
cassandraTableDataframe.except(keyspacesTableDataframe)
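A minimal PySpark sketch of that except comparison, as you might run it in a Glue job with the spark-cassandra-connector on the classpath; the hosts, keyspace, and table names are assumptions:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def read_table(host, port):
        # Read ks.users through the spark-cassandra-connector.
        return (spark.read.format("org.apache.spark.sql.cassandra")
                .option("spark.cassandra.connection.host", host)
                .option("spark.cassandra.connection.port", port)
                .options(keyspace="ks", table="users")
                .load())

    cassandra_df = read_table("10.0.0.1", "9042")
    keyspaces_df = read_table("cassandra.us-east-1.amazonaws.com", "9142")

    # Rows present in the source cluster but missing from Keyspaces.
    # subtract() is PySpark's equivalent of the Scala/Java except().
    missing = cassandra_df.subtract(keyspaces_df)
    missing.show()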
You could also do this by exporting both datasets to S3 and performing these queries in Athena.
Here is a helpful repository of Glue and Keyspaces functions including export, count, and distinct.

Replicate aws cloudsearch data into apache solr

Is there any way to replicate file index data from AWS CloudSearch to Apache Solr hosted on EC2 in real time?
Not really: in order to make sure that all of your original documents are indexed correctly, you have to reindex them into Solr in their original form.

Can we use Elasticsearch on MongoDB?

As web developers we hear about new technologies every day; recently I came across Elasticsearch, which is used to analyze big volumes of data. My data is in MongoDB. Is it possible to use Elasticsearch on it?
MongoDB Atlas has a feature called 'Atlas Search', which implements the Apache Lucene engine. This could be a solution for your search requirements.
See Atlas Search for details
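For illustration, a minimal sketch of an Atlas Search query issued through the aggregation pipeline with pymongo; the connection string, the "products" collection, and the "default" search index are assumptions:

    from pymongo import MongoClient

    client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
    products = client["shop"]["products"]

    pipeline = [
        # $search is only available on Atlas clusters with a search index defined.
        {"$search": {
            "index": "default",
            "text": {"query": "wireless headphones", "path": "description"},
        }},
        {"$limit": 10},
    ]
    for doc in products.aggregate(pipeline):
        print(doc["_id"], doc.get("description"))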
It depends on what you mean by "analyze the big volumes of data"; what are your requirements? Don't pay too much attention to marketing slogans. Maybe you can connect Elasticsearch with MongoDB via an ODBC driver. Elasticsearch is a document-oriented NoSQL database, like MongoDB. As usual, both have their pros and cons.
MongoDB is more like a database, i.e. it supports CRUD (Create, Read, Update, Delete) operations and the Aggregation Framework is very powerful.
In Elasticsearch you can store data and analyze or query it. I remember that in earlier releases it was not so easy to delete or update single existing documents.

Solr for different accounts in a system

I'm working on a SaaS that has a database for each account, with basically the same tables. What's the best way to index all databases separately? I was thinking about setting up a separate Solr instance (on a different port) for each database on the same server, but that could be hard on the server. So I'm stuck on what to do next; I haven't found any useful idea in the Solr documentation. Could you guys help out? Thanks in advance.
If you store all the data from all of your tenants in one collection, it will be easy in the beginning, because you will probably make several changes to your schema, and it is easier to do them once for all your customers.
As a negative point in this scenario, you will have lots of unrelated data grouped together, and you always have to use a filter query for the tenant (client) id, as in the example below.
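For example, a sketch of such a tenant filter over plain HTTP; the shared collection and the tenant_id field are assumptions:

    import requests

    resp = requests.get("http://localhost:8983/solr/shared/select", params={
        "q": "title:invoice",
        "fq": "tenant_id:42",  # forgetting this filter leaks other tenants' data
        "wt": "json",
    })
    print(resp.json()["response"]["numFound"])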
What if you create, for starters, a collection for each tenant on the same Solr server? This way you don't mix your tenants' data, and you achieve the functionality you basically need.
In this scenario, as with your relational database instances, you have to keep the schema changes in sync.
For relational databases there are tools like Flyway or Liquibase that can be used to version the changes applied to each tenant database.
For Solr there are, AFAIK, no such tools, but you can apply your schema changes programmatically through the Solr Schema API (see the sketch below). In case you have to make highly detailed changes that can't be done via the Schema API, you can replace the schema.xml file of each collection with an updated version and restart the Solr server.
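A minimal sketch of rolling one schema change out to every tenant collection through the Schema API; the field definition and collection names are assumptions:

    import requests

    tenants = ["tenant_a", "tenant_b", "tenant_c"]
    change = {"add-field": {
        "name": "invoice_total",
        "type": "pdouble",
        "stored": True,
    }}

    for collection in tenants:
        r = requests.post(
            "http://localhost:8983/solr/%s/schema" % collection, json=change)
        r.raise_for_status()  # stop if any tenant's schema update fails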
What you need to keep in mind is backward compatibility. Whenever you make changes to any of the databases (relational DB or Solr), you need to take into account that the old code must still work with the latest updates you perform on the relational database / Solr schema structure.

How to explore HBase data

I am currently building an app that loads data into HBase. I chose HBase because the data is not structured, and a column-based database is therefore recommended.
Once the data is in HBase I thought of integrating Solr with it, but I found little information on the subject and no answer to my question "https://stackoverflow.com/questions/36542936/integrating-solr-to-hbase".
So I wanted to ask: how can I query data stored in HBase? Spark Streaming doesn't seem to be made for that.
Any help please?
Thanks in advance
Assuming that your question is about how to query data from HBase:
Apache Phoenix provides a SQL wrapper over HBase.
Hive HBase integration: Hive also provides a SQL wrapper over HBase.
The Spark HBase plugin lets your Apache Spark application interact with Apache HBase.
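As an illustration of the Phoenix route, a minimal sketch using the phoenixdb driver against the Phoenix Query Server; the URL and the web_events table are assumptions:

    import phoenixdb

    conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
    cur = conn.cursor()
    # Plain SQL over data that physically lives in HBase.
    cur.execute("SELECT host, COUNT(*) FROM web_events GROUP BY host")
    for host, hits in cur.fetchall():
        print(host, hits)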
