SOLR (8.11) cloud and Data Import Handler

I have always used Solr in standalone mode and want to learn about using it in cloud mode. I was reading the documentation at https://solr.apache.org/guide/8_11/uploading-structured-data-store-data-with-the-data-import-handler.html
My question regarding data import is that the above link mentions that the Data Import Handler has to be registered in solrconfig.xml.
When I start Solr in cloud mode with solr.cmd start -e cloud and look at the directory structure, I can't find solrconfig.xml.
Can someone help and point out how to go about importing data in cloud mode?
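For context on why the file seems to be missing: in cloud mode the configuration lives in ZooKeeper rather than on the local filesystem, so solrconfig.xml has to be pulled down from ZooKeeper, edited, and pushed back. A sketch of that round trip, assuming the cloud example's embedded ZooKeeper on localhost:9983 and a configset named _default (both depend on the choices made when the example was started):

REM Download the configset from ZooKeeper into a local directory
solr.cmd zk downconfig -n _default -z localhost:9983 -d .\myconfig

REM Edit myconfig\conf\solrconfig.xml to register the Data Import Handler,
REM then push the modified configset back up to ZooKeeper
solr.cmd zk upconfig -n _default -z localhost:9983 -d .\myconfig

REM Reload the collection so the change takes effect (collection name assumed)
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=gettingstarted"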

Related

How can I connect Google App Engine to Elastic Cloud?

I'm trying to connect to my cluster on Elastic Cloud using elasticsearch-py on GAE, but I'm running into the following error:
ConnectionError: ConnectionError("'VerifiedHTTPSConnection' object has no attribute '_tunnel_host'") caused by: AttributeError("'VerifiedHTTPSConnection' object has no attribute '_tunnel_host'")
I've tried this fix, which I've seen in a number of places that reference the '_tunnel_host' error, but it's not resolving my issue:
from requests_toolbelt.adapters import appengine
appengine.monkeypatch()
I've also tried a few variations that I've seen for the es declaration, but none of them have worked; for example:
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://elastic:password@xxxxx.us-central1.gcp.cloud.es.io:9243"],
                   send_get_body_as='POST',
                   use_ssl=True,
                   verify_certs=True)
I'd like to be able to establish the connection and begin sending and consuming data from my cluster, but can't find a way to do this. Any help would be much appreciated!
There is an article with an example of a real-world app: Elasticsearch on Google Cloud with Firebase functions.
On the other hand, there is the Google Cloud Marketplace with many available Elasticsearch solutions, for example:
1. You can deploy and configure an Elasticsearch cluster that works with Kubernetes, using Google Click to Deploy containers.
2. Or a complete Elasticsearch solution using virtual machines provided by Google.
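Whichever route you take, a quick way to confirm that the cluster endpoint and credentials are valid, independently of GAE, is a plain HTTPS request (using the placeholder host and password from the question):

curl -u elastic:password "https://xxxxx.us-central1.gcp.cloud.es.io:9243/"

If that returns the cluster info JSON, the connection problem is specific to the GAE runtime rather than to the cluster itself.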

How do I completely delete Cloud Datastore from a project?

I want to create a Firestore in Native mode in an existing project.
I don't have any data in Cloud Datastore, but it blocks me, saying
This project uses another database service
Your current project is set up to use Cloud Datastore or Cloud Firestore in Datastore mode. You can access data for this project from the Cloud Datastore console.
when going through https://console.cloud.google.com/firestore/
and
Cannot enable Firestore for this project
Currently Firestore cannot be enabled in projects already using Cloud Datastore or App Engine
when going through https://console.firebase.google.com/
I've tried it with writes to Datastore enabled and disabled.
I just want to completely purge the Cloud Datastore product from my project.
Unfortunately, there is no way to purge the previous existence of a Cloud Datastore database to try either Cloud Firestore in native or Datastore mode. You'll have to use a new project to try Cloud Firestore in either native or Datastore mode.
You can switch using the command below (note that this only works while the database is still empty):
$ gcloud alpha firestore databases update --type=firestore-native
This is the response I got from Google Cloud Support today (February 16th, 2021):
Generally, we do recommend to create a new project and enable Firestore therein. Nonetheless I may submit a request to delete your existing database which may allow you to change the database to the desired mode. However, please keep in mind that I'm unable to guarantee its approval.
I just tried it on my end. I deleted all my entities (test project) and disabled my Datastore API, same issue as you when I visit the console.firebase.google.com page.
This is likely an issue that needs to be reported either through support (if you have a support package for Google Cloud Platform), or through our issue tracker.
Still the same as of 10 June 2020.
If you have an empty Datastore database and you never executed a write to the database, you can upgrade to Firestore in Datastore mode or Native mode.
If you do not receive this option, then your database instance will be automatically upgraded at a future date.
If you upgrade from Datastore to Firestore in Datastore mode or from Datastore mode to Native mode, you cannot undo the operation.
See this page for additional details:
https://cloud.google.com/datastore/docs/upgrade-to-firestore
If you just want to lock the database down, use the following security rule. It will not delete your Firestore, but it will lock it down so no one can read or write to it. It's not the answer you are looking for, but it's in the same spirit.
// Deny read/write access to all users under any conditions
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if false;
    }
  }
}
Just visit https://console.cloud.google.com/....... by clicking the button; if you have not added any data, it will show an option to switch to Native mode.

How to automate Solr indexing?

Normally we do indexing in Solr from a browser. Can we do it automatically by writing a batch job or Java code?
Please give me some idea of whether this is possible.
You can use the DataImportHandler, which can import from a lot of different sources such as databases or XML files: https://wiki.apache.org/solr/DataImportHandler
If you have specific requirements which are not satisfied by the DataImportHandler, you may implement your own indexer using a Solr client API:
https://cwiki.apache.org/confluence/display/solr/Client+APIs
If you want to do stuff with Solr programmatically, take a look at SolrJ, which is an API that'll do what you're asking for.
You can use a web debugging proxy such as Fiddler to view the HTTP request that is generated when you trigger the data import via a web browser. Then send the same request from your Java code.
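To make the answers above concrete: a DataImportHandler import is triggered by a plain HTTP request, so any scheduler or program that can issue one can automate indexing. A sketch, assuming a core named mycore and the handler registered at the conventional /dataimport path (both names are hypothetical here):

# Kick off a full import (add &clean=false to keep existing documents)
curl "http://localhost:8983/solr/mycore/dataimport?command=full-import"

# Check on the progress of the import
curl "http://localhost:8983/solr/mycore/dataimport?command=status"

Dropping the first command into a cron job or a Windows scheduled task is usually the simplest form of automation.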

Solr Cloud Managed Resources

I am implementing Solr Cloud for the first time. I've worked with normal Solr and have that down pretty well, but I'm not finding a lot on what you can and can't do with Solr Cloud. So my question is about Managed Resources. I know you can CRUD stop words and synonyms using the new RESTful API in Solr. However, with the cloud, do I need to CRUD my changes on each individual Solr server in the cloud, or do I send them to a different URL that sends them through to each server? I'm new to cloud and ZooKeeper. I have not found anything in the Solr wiki about working with managed resources in the cloud setup. Any advice would be helpful.
In SolrCloud, configuration and other files like stopwords are stored and maintained by ZooKeeper, which means you do not need to send updates to each server individually.
Once you have SolrCloud, before putting in any data, you will create a collection. Each collection has its own set of resources (its config folder).
So, for example, if you have a collection called techproducts on two servers, localhost1 and localhost2, the commands below, run against either server, will operate on the same resource:
curl "http://localhost1:8983/solr/techproducts/schema/analysis/synonyms/english"
curl "http://localhost2:8983/solr/techproducts/schema/analysis/synonyms/english"

How can I export data from Google App Engine High Replication datastore?

I am looking into using Google App Engine for a project and would like to make sure I have a way to export all my data if I ever decide to leave GAE (or GAE shuts down).
Everything I search about exporting data from GAE points to https://developers.google.com/appengine/docs/python/tools/uploadingdata. However, that page contains this note:
Note: This document applies to apps that use the master/slave datastore. If your app uses the High Replication datastore, it is possible to copy data from the app, but Google does not currently support this use case. If you attempt to copy from a High Replication datastore, you'll see a high_replication_warning error in the Admin Console, and the downloaded data might not include recently saved entities.
The problem is that the master/slave datastore was recently deprecated in favor of the High Replication datastore. I understand that the master/slave datastore is still supported for a little while, but I don't feel comfortable using something that has officially been deprecated and is on its way out. So that leaves me with the High Replication datastore, and the only way to export the data seems to be the method above, which is not officially supported (and thus does not provide me with a guarantee that I can get my data out).
Is there any other (officially supported) way of exporting data from the High Replication datastore? I don't feel comfortable using Google App Engine if it means my data could be locked in there forever.
It took me quite a long time to set up the download of data from GAE, as the documentation is not as clear as it should be.
If you are extracting data from a Unix server, you may be able to reuse the script below.
Also, if you do not provide the config_file parameter, it will extract all your data for the kind, but in a proprietary format which can only be used for restoring the data afterwards.
#!/bin/sh
#------------------------------------------------------------------
#-- Param 1 : Namespace
#-- Param 2 : Kind (table id)
#-- Param 3 : Directory in which the csv file should be stored
#-- Param 4 : Output file name
#------------------------------------------------------------------
appcfg.py download_data --secure --email=$BACKUP_USERID \
  --config_file=configClientExtract.yml --filename=$3/$4.csv --kind=$2 \
  --url=$BACKUP_WEBSITE/remote_api --namespace=$1 --passin <<-EOF
$BACKUP_PASSWORD
EOF
Currently, the App Engine datastore also supports another option: the data backup provision can be used to copy selected data into the Blobstore or Google Cloud Storage. This function is available under the Datastore Admin area in the App Engine console. If required, the backed-up data can then be downloaded from the blob viewer or Cloud Storage. When backing up the High Replication datastore, it is recommended that datastore writes are disabled before taking the backup.
You need to configure a builtin called remote_api. This article has all the information and guidance you need to be able to download all your data today and in the future.
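For reference, enabling the remote_api builtin is normally a two-line addition to app.yaml (a sketch for the first-generation Python runtime; the rest of the file is omitted):

builtins:
- remote_api: on

With that deployed, tools such as the appcfg.py download_data script shown earlier can talk to the app's /remote_api endpoint.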
