How to get the running status of Solr using SolrJ / Java code

I need to know whether Solr has started, whether indexing is still in progress, and whether it has stopped unexpectedly or finished. I also need the results of the queries I submit. In short, for every action performed in the admin interface, I need to be able to observe it via Java code.

To check whether Solr is running and responding, you can use Solr's ping service (the PingRequestHandler); see the Solr ping documentation.
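A minimal SolrJ sketch, assuming Solr listens on localhost:8983 and a core named mycore (both assumptions; adjust for your setup):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class SolrStatusCheck {
    public static void main(String[] args) {
        // Base URL and core name are assumptions; adjust for your setup
        try (SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrPingResponse ping = client.ping();
            System.out.println("status: " + ping.getStatus()); // 0 means OK
            System.out.println("QTime:  " + ping.getQTime() + " ms");
        } catch (Exception e) {
            // A connection error here means Solr is down or unreachable
            System.out.println("Solr is not responding: " + e.getMessage());
        }
    }
}

A status of 0 means the core is up and answering; a SolrServerException or IOException from ping() generally means the server is down or unreachable.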

Related

Questions regarding Flink streaming with Kafka

I have a Java application that launches a Flink job to process Kafka streams.
The application blocks at job submission, at flinkEnv.execute("flink job name"), since the job runs forever on the streams coming in from Kafka.
In this case, how can I get the job id returned from the execution? I see the job id printed in the console; I just wonder how to get it programmatically when flinkEnv.execute has not returned yet.
Also, how can I cancel a Flink job, given its job name, from a remote server in Java?
As far as I know there is currently no nice programmatic way to control Flink. But since Flink is written in Java, everything you can do from the console can also be done with the internal class org.apache.flink.client.CliFrontend, which is invoked by the console scripts.
An alternative would be using the REST API of the Flink JobManager.
You can use the REST API to inspect a running Flink job.
See https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html.
You can request http://host:port/jobs/overview to get a summary of every job, including each job's name and id, such as:
{"jobs":[{"jid":"d6e7b76f728d6d3715bd1b95883f8465","name":"Flink Streaming Job","state":"RUNNING","start-time":1628502261163,"end-time":-1,"duration":494208,"last-modification":1628502353963,"tasks":{"total":6,"created":0,"scheduled":0,"deploying":0,"running":6,"finished":0,"canceling":0,"canceled":0,"failed":0,"reconciling":0,"initializing":0}}]}
I really hope this will help you.
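As a rough sketch of calling that endpoint from plain Java (Java 11+ HttpClient; the host and port are assumptions - 8081 is the JobManager's default REST port):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlinkJobsOverview {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        // Assumed JobManager REST endpoint; adjust host and port for your cluster
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/jobs/overview"))
                .GET()
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        // The body is JSON like the sample above; pull out "jid" and "name"
        // with whatever JSON library you already use (Jackson, Gson, ...)
        System.out.println(response.body());
    }
}

Separately, on Flink 1.10 and newer, StreamExecutionEnvironment.executeAsync(...) returns a JobClient whose getJobID() gives you the job id without blocking, which addresses the original question directly.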

Timeout Solr search

I'm trying to timeout Solr if it takes too long to run a query.
What I've tried:
According to the Solr documentation there is a timeAllowed parameter which does exactly what I'm looking for. I've tried to:
add <int name="timeAllowed">50</int> to the requestHandler in solrconfig.xml;
add timeAllowed parameter to request URL:
localhost:8080/solr4/alfresco/select?q=*&timeAllowed=50
But it doesn't work (the time to load results is longer than the provided value).
Why I want to do this: we implemented access-rights validation on the Solr side, and now for some broad queries like * it takes Solr too long to respond.
It would probably be possible to close such connections from Alfresco or from the web container, but those queries would still run internally on Solr, slowing down the server.
timeAllowed is specified in milliseconds, so timeAllowed=50 means the request must complete within 50 milliseconds, which is almost certainly too strict.
Try timeAllowed=5000.
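A minimal SolrJ sketch of setting the limit per query (the base URL mirrors the one in the question and assumes a reasonably recent SolrJ):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimeAllowedQuery {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8080/solr4/alfresco").build()) {
            SolrQuery query = new SolrQuery("*:*");
            query.setTimeAllowed(5000); // milliseconds, not seconds
            QueryResponse rsp = client.query(query);
            System.out.println("numFound: " + rsp.getResults().getNumFound());
        }
    }
}

Note that when the limit is exceeded Solr returns the partial results gathered so far (the responseHeader carries a partialResults flag) rather than failing the request.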

Normal Query on Cassandra using DataStax Enterprise works, but not solr_query

I am having a strange issue occur while utilizing the solr_query handler to make queries in Cassandra on my terminal.
When I perform normal queries on my table, I am having no issues, but when I use solr_query I get the following error:
Unable to complete request: one or more nodes were unavailable.
Other people who have experienced this problem seem unable to run any queries on their data whatsoever, solr_query or not; my problem only occurs while using that handler.
Can anyone give me a suggestion for what the issue may be with my Solr node?
Also, I can run queries from the Solr Admin page, but as I said, I am unable to do so from a terminal on my MacBook.
Here is the query I used, for reference:
cqlsh:demo> select * from device WHERE solr_query='id:[1 to 10000000000}';
More info:
This is how I created my KEYSPACE:
CREATE KEYSPACE demo WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'Solr':1};
This is how I created the Solr core:
bin/dsetool create_core demo.device generateResources=true reindex=true
Performed a nodetool ring -h on my localhost and got this back:
Datacenter: Solr
Address    Rack   Status  State   Load    Owns     Token
127.0.0.1  rack1  Up      Normal  2.8 MB  100.00%  -673443545391973027
So it appears my node is up and normal, which leads me to believe it is an issue with the actual solr_query handler.
I also found the requestHandler within my config file
Your query probably isn't correct: id:[1 to 10000000000} uses a lowercase to where Solr range syntax requires an uppercase TO, and it mixes an inclusive opening bracket with an exclusive closing brace; see the corrected query below.
The "unavailable nodes" error is unfortunately a red herring, as that's the way Thrift (which cqlsh in Cassandra 2.0 is based upon) translates given errors, while you should get a more meaningful error if you repeat the same query with a driver based on the native protocol.

How to stop Solr servers properly

I'm using SolrCloud and I wanted to know, if there exists a proper way to stop/shutdown solr servers.
Currently I am using the kill command in a shell to kill the Solr process (using its PID), but it's not very nice!
Is there a way to stop Solr servers knowing their ports for example or something else?
I have found this post : How to stop solr with command line
I guess that there are different ways to stop Solr; can someone recommend alternative approaches for doing it with SolrCloud?
Thanks.
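For what it's worth: on Solr 5 and later, the bundled start script has a stop command that takes a port, which avoids a raw kill (a sketch; adjust the port to your instances):

bin/solr stop -p 8983    # stop the instance listening on port 8983
bin/solr stop -all       # stop every Solr instance started from this installation

Each SolrCloud node can be stopped this way using its own port.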

solr healthcheck for >0 documents

Solr's default /admin/ping handler, intended for load balancer health checks, integrates well with Amazon ELB health checks.
However, since we're using master-slave replication, when we provision a new node Solr starts up and replication happens, but in the meantime /admin/ping returns success before the index has replicated from the master and there are documents.
We'd like nodes to only be brought live once they have completed the first replication and actually have documents. I don't see any way of doing this with the /admin/ping PingRequestHandler - it always returns success if the search succeeds, even with zero results.
Nor is there any way to match (or not match) expected text in the response with the ELB health check configuration.
How can we achieve this?
To expand on the nature of the problem here: the PingRequestHandler will always return success unless:
Its query results in an exception being thrown.
It is configured to use a healthcheck file, and that file is not found.
Thus my suggestion is that you configure the PingRequestHandler to use a healthcheck file. You can then use a cron job on your Solr system whose job is to check for the existence of documents and create (or remove) the healthcheck file accordingly. If the healthcheck file is not present, the PingRequestHandler will throw an HTTP 503, which should be sufficient for ELB.
The rough algorithm that I'd use (a minimal SolrJ sketch follows the steps)...
Every minute, query http://localhost:8983/solr/select?q=*:*
If numDocs > 0 then touch /path/to/solr-enabled
Else rm /path/to/solr-enabled (optional, depending on your strictness)
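Here is a minimal SolrJ sketch of that cron job's body, assuming Solr on localhost:8983 and a core named mycore (both assumptions), with the flag path matching the <healthcheck> element shown below:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class HealthcheckFileUpdater {
    public static void main(String[] args) throws Exception {
        Path flag = Paths.get("/path/to/solr-enabled");
        try (SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0); // only the document count is needed
            long numDocs = client.query(q).getResults().getNumFound();
            if (numDocs > 0) {
                if (!Files.exists(flag)) {
                    Files.createFile(flag); // the "touch" step
                }
            } else {
                Files.deleteIfExists(flag); // optional, depending on your strictness
            }
        }
    }
}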
The healthcheck file can be configured in the <admin> block, and you can use an absolute path, or a filename relative to the directory from which you have started Solr.
<admin>
  <defaultQuery>solr</defaultQuery>
  <pingQuery>q=*:*</pingQuery>
  <healthcheck type="file">/path/to/solr-enabled</healthcheck>
</admin>
Let me know how that works out! I'm tempted to implement something similar for read slaves at Websolr.
I ran into an interesting solution here: https://jobs.zalando.com/tech/blog/zookeeper-less-solr-architecture-aws/?gh_src=4n3gxh1
It's basically a servlet that you could add to the Solr webapp and then check all of the cores to make sure they have documents.
I'm toying with a more sophisticated solution but haven't tested it/made much progress yet: https://gist.github.com/er1c/e261939629d2a279a6d74231ce2969cf
What I like about this approach (in theory) is the ability to check the replication status/success for multiple cores. If anyone finds an actual implementation of this approach please let me know!
