SolrCloud Backup & Restore indexed data - solr

For the last couple of weeks I have been using SolrCloud on 3 development servers with a single load balancer (in the future I will extend it to 5 different servers for ZooKeeper and Solr). My current SolrCloud structure is as below.
Server 1 : Java + Solr(port 8983) + Zookeeper(port 2181)
Server 2 : Java + Solr(port 8983) + Zookeeper(port 2181)
Server 3 : Java + Solr(port 8983) + Zookeeper(port 2181)
Here I am able to create the Solr configuration from any server by uploading the conf of my collection and RELOAD the collection using the Collections API; all my Solr configuration is syncing, and I am able to index and search my documents perfectly. My collection had 1 shard and 3 replicas; then I split the single shard into two, so basically it is a single collection with 3 shards and 3 replicas now.
So, now I have some questions
Q1) Is my current structure OK, or do I need to change it?
Q2) How can I back up and restore my indexed collection data?
Q3) What would happen if one of my servers closed its connection while I am trying to back up and restore my Solr data?
I have seen the Collections API endpoints to back up and restore collection data at https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-Backup,
but I couldn't figure out how to set the path/to/my/shard/drive and related options on those two API endpoints to back up and restore my indexed data. I badly need help.

I have faced a similar problem. The Solr Collections API provides a backup of the complete collection from Solr v6.0; see: Using Spring Solr Data or Not for Flexible Requests as Like Backup?
Go to the link above; you can get a backup that way.
You need to call the backup command on each shard.

Use the location param to set path/to/my/shard/drive.
This path should be present on all your servers 1, 2, and 3.
When running the restore API, you need to provide the same path.
Restore will recover each shard using the data present at path/to/my/shard/drive.
If you don't want to back up to the local filesystem, you can use HDFS as the backup filesystem.
This can be done by adding a new repository in solr.xml and using this repository name in the Backup/Restore API.
The location and repository options are mutually exclusive.
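As a concrete sketch, the two Collections API calls can be assembled as below. The host, collection, and backup names are hypothetical placeholders; the script only builds and prints the URLs so you can inspect them, and each would be run with curl in practice:

```shell
# Hypothetical names: server1, mycollection, backup1. Adjust to your cluster.
SOLR="http://server1:8983/solr/admin/collections"
LOC="/path/to/my/shard/drive"   # must exist and be writable on servers 1, 2 and 3

# Back up the whole collection to the shared location.
BACKUP_URL="$SOLR?action=BACKUP&name=backup1&collection=mycollection&location=$LOC"
# Restore it later into a new collection, pointing at the same location.
RESTORE_URL="$SOLR?action=RESTORE&name=backup1&collection=mycollection_restored&location=$LOC"

echo "$BACKUP_URL"    # run with: curl "$BACKUP_URL"
echo "$RESTORE_URL"   # run with: curl "$RESTORE_URL"
```

Restoring into a differently-named collection (here `mycollection_restored`) avoids clashing with the existing one; that name is an example, not a requirement.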

Related

Load database from offline back-up Neo4j

I backed up a neo4j database using
bin/neo4j-admin dump --database=neo4j --to=c:/
Then I loaded a database from the archive created with the dump command as follows
bin/neo4j-admin load --from=/var/lib/neo4j/data/c: --database=db
From Neo4j Enterprise Browser I execute
SHOW DATABASES
but I don't see the db previously loaded. How can I show it?
If you are replacing your existing database named "db", then use the --force option:
bin/neo4j-admin load --from=/var/lib/neo4j/data/c: --database=dbase --force
If you are restoring into a new database, then after the load you need to create the database: CREATE DATABASE dbase
Note that I changed the name of your database from db to something else since database names in Neo4j must be at least 3 characters long.

Migration from Standalone Solr to SolrCloud in AWS

Our developers are working with a local standalone Solr server, and we have many cores in the local Solr. Now we are planning to migrate it to SolrCloud on AWS infrastructure for replication purposes, with numShards=3 and replicationFactor=3. We don't need the data to be migrated from the local Solr server to AWS SolrCloud; we only need to transfer the cores from local Solr to collections in SolrCloud. I am a newbie at this; can anyone please help me with it?
1) In layman's terms, we only need to transfer the contents of the conf folder of each core to a SolrCloud collection; we don't need to transfer the data (data folder).
Answering my own question, so anyone can check it if an issue arises.
Solution:
1) Create a new collection in SolrCloud with a config set name the same as that of the core.
2) Move the conf folder of the core in the local standalone Solr server to the SolrCloud 'Collection' folder.
3) Run Solr's zkcli.sh script from bash to upload the conf directory to ZooKeeper for all SolrCloud servers.
cd /opt/solr/server/scripts/cloud-scripts/
bash zkcli.sh -cmd upconfig -confdir /opt/solr-7.4.0/server/solr/collectionname/conf/ -z IP1:2181,IP2:2181,IP3:2181 -confname confname
Reference : https://lucene.apache.org/solr/guide/6_6/using-zookeeper-to-manage-configuration-files.html#UsingZooKeepertoManageConfigurationFiles-UploadingConfigurationFilesusingbin_solrorSolrJ
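Once the config is in ZooKeeper, the collection itself can be created through the Collections API with the shard and replica counts mentioned above. A minimal sketch, reusing the placeholder names from the steps above (collectionname, confname, IP1); the script only prints the URL, which would be run with curl:

```shell
# Placeholder names from the steps above; adjust to your environment.
CREATE_URL="http://IP1:8983/solr/admin/collections?action=CREATE&name=collectionname&numShards=3&replicationFactor=3&collection.configName=confname"
echo "$CREATE_URL"   # run with: curl "$CREATE_URL"
```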

How to undo a remove in mongoose

If I were to accidentally run UserModel.remove({}, cb) and delete all of my user documents, how would I retrieve those documents? Would the process be the same on a local DB on my computer vs. a DB hosted on a remote server such as mLab?
No, they cannot be retrieved once deleted. The only way to recover the data is to have created a dump of the collection before deleting it, or to have a replica of that server; you can restore from that.
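A minimal sketch of that precaution with MongoDB's own tools; the database name and backup path are hypothetical, and the script only prints the commands so you can review them before running:

```shell
# Hypothetical database name and backup directory.
DB="mydb"
OUT="/backups/$DB-$(date +%F)"

echo "mongodump --db=$DB --out=$OUT"    # take a dump before any destructive operation
echo "mongorestore --db=$DB $OUT/$DB"   # restore from that dump if needed
```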

Apache SOLR index on remote server

I want to be able to run a SOLR instance on my local computer but have the index directory on a remote server. Is this possible?
I've been trying to look for a solution for days. Please help.
Update: We've got a business legal requirement where we are not allowed to store client data on our servers... we can only read, insert, delete and update it on client request via our website, and the data has to be stored on client servers. So each client will have their own index, and we cannot run SOLR or any other web application on the client's server. Some of the clients have a Dropbox business account, so we thought maybe just having the SOLR index file uploaded to Dropbox might work.
Enable remote streaming in solrconfig.xml and configure the remote file location in it.
It's working.

How to reset / clear / delete neo4j database?

We can delete all nodes and relationships with the following query:
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r
But a newly created node gets an internal id of ({last node internal id} + 1); it doesn't reset to zero.
How can we reset the neo4j database so that a newly created node will get id 0?
From 2.3 on, we can delete all nodes along with their relationships:
MATCH (n)
DETACH DELETE n
Shut down your Neo4j server, do a rm -rf data/graph.db and start up the server again. This procedure completely wipes your data, so handle with care.
Run both commands.
match (a) -[r] -> () delete a, r
The above command will delete all nodes that have relationships.
Then run:
match (a) delete a
It will delete the nodes that have no relationships.
Dealing with multiple databases.
According to Neo4j manage multiple databases documentation:
One final administrative difference is how to completely clean out one database without impacting the entire instance with multiple databases. When dealing with a single instance and single database approach, users can delete the entire instance and start fresh. However, with multiple databases, we cannot do that unless we are comfortable losing everything from our other databases in that instance.
The approach is similar to other DBMSs where we can drop and recreate the database, but retain everything else. Cypher’s command for this is CREATE OR REPLACE DATABASE <name>. This will create the database (if it does not already exist) or replace an existing database with a clean one.
When neo4j is initiated, it is possible to access two databases, a system database and a default (neo4j) database. To clear/reset neo4j database:
1 - Switch to system database:
:use system
2 - Show all databases created with the instance:
SHOW DATABASES
3 - Run the command to clear the database.
CREATE OR REPLACE DATABASE <name>
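The same reset can be scripted from the command line with cypher-shell. The credentials and database name below are hypothetical, and (as noted elsewhere in this thread) creating or replacing databases requires an edition that supports multiple databases; the script only prints the command:

```shell
# Hypothetical credentials and database name.
CMD="CREATE OR REPLACE DATABASE mydb"
echo "cypher-shell -u neo4j -p <password> -d system \"$CMD\""
```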
In my experience, there are two ways to reset a Neo4j database, depending on what you need.
Method 1: Simply delete all nodes/relationships/indexes/constraints
In Neo4j Browser, or in Py2neo with graph.run().
// All nodes and relationships.
MATCH (n) DETACH DELETE n
// All indexes and constraints.
CALL apoc.schema.assert({},{},true) YIELD label, key RETURN *
However, despite being convenient, this approach is not suitable when using the neo4j-admin.bat import command for BULK import, which is ideal for importing millions of nodes at once quickly.
Method 2: Reset database for BULK Import Tool
It's not possible to BULK import when the database is not empty. I tried the above method, but still received the error:
Import error: C:\Users\[username]\AppData\Local\Neo4j\Relate\Data\dbmss\dbms-dd16c384-78c5-4c21-94f3-b0e63e6c4e06\data\databases\neo4j already contains data, cannot do import here
Caused by:C:\Users\[username]\AppData\Local\Neo4j\Relate\Data\dbmss\dbms-dd16c384-78c5-4c21-94f3-b0e63e6c4e06\data\databases\neo4j already contains data, cannot do import here
java.lang.IllegalStateException: C:\Users\[username]\AppData\Local\Neo4j\Relate\Data\dbmss\dbms-dd16c384-78c5-4c21-94f3-b0e63e6c4e06\data\databases\neo4j already contains data, cannot do import here
To tackle this issue, I deleted the following folders:
c:\Users\[username]\AppData\Local\Neo4j\Relate\Data\dbmss\dbms-dd16c384-78c5-4c21-94f3-b0e63e6c4e06\data\databases\neo4j
and
c:\Users\[username]\AppData\Local\Neo4j\Relate\Data\dbmss\dbms-dd16c384-78c5-4c21-94f3-b0e63e6c4e06\data\transactions\neo4j
Then carried out the Import command:
"C:\Users\[username]\AppData\Local\Neo4j\Relate\Data\dbmss\dbms-dd16c384-78c5-4c21-94f3-b0e63e6c4e06\bin\neo4j-admin.bat" import --database=neo4j --multiline-fields=true --nodes=node_ABC.csv --nodes=node_XYZ.csv --relationships=relationship_LMN.csv --relationships=relationship_UIO.csv
Start the Neo4j database. In Neo4j Desktop, the labels and relationships should now be recognized.
Notice that the database I deleted (neo4j) and the database I imported to are the same.
This worked for me with ver. 4.3.2 of the community edition:
Stop the server
cd <neo home>
rm -Rf data/databases/* data/transactions/*
Restart the server
Now you again have the system and neo4j DBs. The command above deletes the system DB too, and that seems necessary, since deleting only a regular DB (which, in the community edition, can only be 'neo4j') makes the metadata in the system DB inconsistent and you start seeing errors.
data/dbms seems to contain the user credentials, and you can keep it if you want to keep existing users (otherwise, you'll go back to the default neo4j/test user).
The recommended method is to use the DROP or CREATE Cypher commands; however, these are available in the enterprise edition only (I think it's a shame that a basic feature like this is part of their premium offer, but that's how it is).
This command deletes everything but requires APOC to be installed:
CALL apoc.periodic.iterate('MATCH (n) RETURN n', 'DETACH DELETE n', {batchSize:1000})
If you are using it on a docker container, you can do
docker-compose rm -f -s -v myNeo4jService
Since neo4j only runs the current database specified in the conf file, an easy way to start a new, clean DB is to change the active database in the neo4j.conf file and then restart the neo4j server.
dbms.active_database=graph.db --> dbms.active_database=graph2.db
Some might argue that the database name has changed, but as of this writing [2018-12], neo4j doesn't support multiple database instances. There is no need for us to differentiate between databases, so the database name is not used in our code.
You can clear/truncate the database with the command below:
MATCH (n) DETACH DELETE n
What this command does is match all the nodes in the database, then detach all the relationships the matched nodes have, and finally delete the nodes themselves.