Solr - index updates automatically

In my data-config I handle data loading using:
query, deltaQuery, deltaImportQuery and deletedPkQuery.
I want to run updates automatically every X minutes.
Is there a way to do it?
Thanks.

You can schedule a cron job that calls the DataImportHandler's delta-import command (which runs your deltaQuery and deltaImportQuery) via curl at your chosen interval.
Also consider using the DataImportScheduler described in the Scheduling section of the DataImportHandler page on the Solr Wiki.

Related

Set Flink Detached Mode using Java

Flink cluster details:
Number of nodes: 4
Flink version: 1.11
Flink client: RestClusterClient
We submit Flink batch jobs from a streaming job using PackagedProgram, but we must execute only one job at a time. For example, if two events arrive from the source, two batch jobs should be triggered (one per event), but they must run one after the other. In previous Flink versions we achieved this with client.setDetached(false), but after migrating to 1.11 we found that the setDetached(false) API has been removed.
How can this requirement be implemented in Flink 1.11?
After analysing this further, I found the solution.
The Flink 1.11 API provides a utility class for submitting jobs, ClientUtils, which has two methods:
ClientUtils.submitJob() -> behaves like detached mode set to true (returns without waiting for the job to finish)
ClientUtils.submitJobAndWaitForExecutionResult() -> behaves like detached mode set to false (blocks until the job finishes)
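The "one job at a time" requirement can also be enforced on the caller's side, independently of which submission method is used. The sketch below is a minimal illustration in plain Java and does not call any Flink API: SequentialJobRunner and BlockingJob are hypothetical names, and BlockingJob stands in for whatever blocking call you use (for example a wrapper around submitJobAndWaitForExecutionResult). A single-threaded executor queues the jobs and runs them strictly in arrival order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SequentialJobRunner {
    /** Hypothetical stand-in for a submission call that blocks until the job finishes. */
    public interface BlockingJob {
        void runAndWait() throws Exception;
    }

    // One worker thread: the next job cannot start until the previous one returns.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues a job; it runs after every previously queued job has completed. */
    public void enqueue(BlockingJob job) {
        worker.execute(() -> {
            try {
                job.runAndWait();
            } catch (Exception e) {
                // A failed job is logged and does not block the jobs queued behind it.
                System.err.println("Job failed: " + e);
            }
        });
    }

    /** Stops accepting jobs and waits for queued ones; returns false on timeout or interrupt. */
    public boolean shutdownAndWait(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In the streaming job, each incoming event would call enqueue(...) with a job that performs the blocking submission, so two events produce two batch jobs that run back to back rather than concurrently.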

Start solr indexing from CI job

We use Solr 6.4.1 with several cores for searching. One of the cores contains several entities. All index refresh steps are currently started manually from the UI, including entering the database credentials.
My first question: can I reindex a Solr core with several entities from a remote console? I need to create a CI job for this.
My second question: where can I specify custom parameters with database credentials for all cores on the server?
If the application has some sort of command for this, you can trigger that command directly from the CI pipeline. If it doesn't, and the indexing/update code is tightly coupled to the UI, you can use the DataImportHandler: you configure the credentials, the queries that Solr needs to execute, and so on in Solr (as described in the documentation), and then trigger the import handler from the CI pipeline with a request like:
http://<host>:<port>/solr/<collection_name>/dataimport?command=delta-import
This starts a delta-import. For more commands, see the Data Import Handler Commands section of the DataImportHandler documentation.
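For a CI job that cannot simply shell out to curl, the same request can be issued from a small Java program. This is a minimal sketch, assuming Java 11+ and a handler registered at /dataimport; the class name DataImportTrigger and its method names are illustrative, not part of Solr. It uses the DataImportHandler's entity parameter, which can be repeated to import only selected entities of a core.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class DataImportTrigger {
    /**
     * Builds the DataImportHandler URL for one core.
     * An empty entity list means "all entities defined in the data-config".
     */
    public static String buildUrl(String baseUrl, String core, String command, List<String> entities) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append('/').append(core)
                .append("/dataimport?command=")
                .append(URLEncoder.encode(command, StandardCharsets.UTF_8));
        for (String entity : entities) {
            // One entity=... parameter per entity to import.
            url.append("&entity=").append(URLEncoder.encode(entity, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    /** Sends the GET request and returns the HTTP status code. */
    public static int trigger(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

A CI step could call trigger(buildUrl("http://localhost:8983/solr", "mycore", "delta-import", List.of("product", "category"))) and fail the build on a non-200 status; the host, core and entity names here are placeholders. Note that the handler returns as soon as the import has started, so a 200 status does not mean the import has finished.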

Timeout Solr search

I'm trying to time out a Solr query if it takes too long to run.
What I've tried:
According to the Solr documentation, there is a timeAllowed parameter that does exactly what I'm looking for. I've tried to:
add <int name="timeAllowed">50</int> to the requestHandler in solrconfig.xml;
add the timeAllowed parameter to the request URL:
localhost:8080/solr4/alfresco/select?q=*&timeAllowed=50
But it doesn't work (the results take longer to load than the value provided).
Why I want to do this: we implemented access-rights validation on the Solr side, and now for some vague queries like * Solr takes too long to respond.
It might be possible to close such connections from Alfresco or from the web container, but the queries would then keep running inside Solr and slow down the server.
timeAllowed is specified in milliseconds, so timeAllowed=50 asks Solr to complete the request within 50 milliseconds.
Try timeAllowed=5000.

Solr Query Log - Need SQL Results

We are using the Solr 1.4.1 DataImportHandler (DIH) to build our Solr index. Whenever a record in the table that the DIH queries is updated, we call the DIH with a query that updates the corresponding Solr record with the new values. The problem is that sometimes the Solr records are not updated, even though we can see in the logs that the Solr query was called when the record was updated on the DB side. Is there any way to make Solr show the following in the logs?
1. The SQL query it is executing
2. The results returned (both the count and the individual records)
We tried debugQuery=true, but that does not give us No. 2 above.
Any help would be greatly appreciated.
Thanks
You should be able to see the SQL queries fired by the Solr DataImportHandler if you change your logging level to FINE or FINEST.
You can change the logging level for Solr dynamically.
You can also use the debug feature described at http://wiki.apache.org/solr/DataImportHandler#Commands to test the import on a sample of your data.
debugQuery only helps you debug search results and relevance.

How can I Schedule data imports in Solr

The wiki page http://wiki.apache.org/solr/DataImportHandler explains how to index data using DataImportHandler, but the example uses a command to initiate the import operation. How can I schedule a job to do this on a regular basis?
On UNIX/Linux, cron jobs are your friends! On Windows, there is Task Scheduler.
UPDATE
To do it from Java code, since this is a simple GET request, you can use the HTTP Client library. See this tutorial on using the GetMethod.
If you need to send other requests to Solr programmatically, you should probably use the SolrJ library. It lets you send all the basic commands to Solr, and it can be configured to access any Solr handler:
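import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
// Point the client at the Solr base URL
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");
// Send the parameters to the /dataimport handler instead of the default /select
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport");
server.request(request);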
I was able to make it work by following these steps:
Create the classes ApplicationListener, HTTPPostScheduler and SolrDataImportProperties (source code listed at http://wiki.apache.org/solr/DataImportHandler#Scheduling).
I believe these classes haven't been committed yet.
Add the following listener to Solr's web.xml file:
<listener>
<listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>
Configure dataimport.properties as per instructions in the wiki page.
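Simply add this line to your crontab with the crontab -e command:
0,30 * * * * /usr/bin/wget -q -O /dev/null "http://<solr_host>:8983/solr/<core_name>/dataimport?command=full-import"
This runs a full import every 30 minutes. The URL is quoted so the shell does not interpret the ? character, and -q -O /dev/null stops wget from saving a response file on every run. Replace <solr_host> and <core_name> with the values for your setup.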
There's a fresh patch by Esteve Fernandez that makes the whole thing work on Unix/Linux: https://issues.apache.org/jira/browse/SOLR-2305
@Eldo If you need more help building your own JAR, just drop a question here...
This is a bit old, but I created a Windows WPF application and service to deal with this, as cron jobs and Task Scheduler are a bit difficult to maintain if you have a lot of cores / environments.
https://github.com/systemidx/SolrScheduler
You basically just drop in a JSON file in a specified folder and it will use a REST client to issue the commands to Solr.
You can use Quartz for this, which works like crontab on Linux. For a simple fixed interval, though, the TimerTask built into the JDK is enough.
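As a rough sketch of the JDK-only approach, the class below runs a task at a fixed interval. It uses ScheduledExecutorService rather than Timer/TimerTask, a swapped-in but closely related JDK facility; PeriodicImportScheduler is a hypothetical name, and the task passed in is assumed to be whatever code sends the import request to Solr.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicImportScheduler {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /** Runs importTask immediately, then again 'period' after each run finishes. */
    public void start(Runnable importTask, long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                importTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log it and carry on.
                System.err.println("Import run failed: " + e);
            }
        };
        // Fixed delay (not fixed rate): a slow import never overlaps with the next one.
        scheduler.scheduleWithFixedDelay(guarded, 0, period, unit);
    }

    /** Cancels future runs and interrupts a run that is in progress. */
    public void stop() {
        scheduler.shutdownNow();
    }
}
```

A caller might use start(task, 30, TimeUnit.MINUTES), where task issues a GET request to http://<solr_host>:8983/solr/<core_name>/dataimport?command=delta-import. Unlike a crontab entry, this schedule only lives as long as the JVM that hosts it.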
