How can I schedule data imports in Solr?

The wiki page http://wiki.apache.org/solr/DataImportHandler explains how to index data using DataImportHandler, but the example uses a command to initiate the import operation. How can I schedule a job to do this on a regular basis?

On UNIX/Linux, cron jobs are your friends! On Windows, there is Task Scheduler.
UPDATE
To do it from Java code, since this is a simple GET request, you can use the Apache HttpClient library; see its tutorial on using the GetMethod class.
If you need to programmatically send other requests to Solr, you should probably use the SolrJ library. It lets you send all the basic commands to Solr, and it can be configured to access any Solr handler:
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// point SolrJ at the Solr instance and issue a full-import against the /dataimport handler
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport");
server.request(request);

I was able to make it work by following these steps:
Create classes ApplicationListener, HTTPPostScheduler and SolrDataImportProperties (source code listed on http://wiki.apache.org/solr/DataImportHandler#Scheduling).
I believe these classes haven't been committed yet.
Add the following listener to Solr's web.xml file:
<listener>
<listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>
Configure dataimport.properties as per instructions in the wiki page.
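For reference, the scheduler's dataimport.properties ends up looking roughly like this (the key names follow the wiki page; the values here are placeholders for a local single-core setup):
# which cores to sync and how often (sketch; adjust cores, params and interval to your setup)
syncEnabled=1
syncCores=core1
server=localhost
port=8983
webapp=solr
params=/dataimport?command=delta-import&clean=false&commit=true
# interval in minutes between two runs
interval=30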

Simply add this line to your crontab with the crontab -e command:
0,30 * * * * /usr/bin/wget http://<solr_host>:8983/solr/<core_name>/dataimport?command=full-import
This will run a full import every 30 minutes. Replace <solr_host> and <core_name> with the values from your setup.

There's a fresh patch by Esteve Fernandez that makes the whole thing work on Unix/Linux: https://issues.apache.org/jira/browse/SOLR-2305
@Eldo: if you need more help building your own JAR, just drop a question here...

This is a bit old, but I created a Windows WPF application and service to deal with this, since cron jobs and Task Scheduler are a bit difficult to maintain if you have a lot of cores/environments.
https://github.com/systemidx/SolrScheduler
You basically just drop a JSON file into a specified folder, and it uses a REST client to issue the commands to Solr.

You can use Quartz for this, which is like crontab on Linux. But for basic needs, the TimerTask built into the JDK is enough.
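As a rough sketch (assuming Solr runs on localhost:8983 and the core is named core1, both of which you would adjust), a TimerTask that triggers a full import every 30 minutes could look like this:
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Timer;
import java.util.TimerTask;

public class SolrImportTimer {
    public static void main(String[] args) {
        Timer timer = new Timer("solr-dataimport");
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    // hypothetical host and core name; adjust to your setup
                    URL url = new URL("http://localhost:8983/solr/core1/dataimport?command=full-import");
                    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                    System.out.println("dataimport returned HTTP " + conn.getResponseCode());
                    conn.disconnect();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, 0, 30 * 60 * 1000L); // run now, then every 30 minutes
    }
}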

Related

Using Flink LocalEnvironment for Production

I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used to run in production.
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to the CPU cores and memory available on that one machine. You also don't have the HA that multiple JobManagers would give you. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this:
// load flink-conf.yaml from the current working directory
String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
Configuration conf = GlobalConfiguration.loadConfiguration(cwd);
// local environment with the Web UI / REST endpoint enabled
env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
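As a rough sketch of that last point (assuming the REST endpoint is on the default localhost:8081, and that you pull the actual job ID out of the /jobs response yourself):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlinkSavepointTrigger {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // list the running jobs; the response JSON contains their job IDs
        HttpResponse<String> jobs = client.send(
                HttpRequest.newBuilder(URI.create("http://localhost:8081/jobs")).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(jobs.body());

        // trigger a savepoint for a job ID taken from the response above (placeholder here)
        String jobId = "<job-id-from-the-/jobs-response>";
        String body = "{\"target-directory\": \"file:///tmp/savepoints\", \"cancel-job\": false}";
        HttpResponse<String> savepoint = client.send(
                HttpRequest.newBuilder(URI.create("http://localhost:8081/jobs/" + jobId + "/savepoints"))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(savepoint.body());
    }
}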

Solr to Application Insights

How can I configure Solr logs to get sent to Azure Application Insights?
I see I can use a Log4J appender.
https://learn.microsoft.com/en-us/azure/application-insights/app-insights-java-trace-logs
Solr is an open source project; I don't compile it myself, I just use the distribution.
How can I drop in the Application Insights Log4J appender without recompiling, having installed the SDK?
I just want to configure the logs to get sent to Application Insights, for what is effectively a 3rd-party application.
And configure the instrumentation key.
I'm normally a C# dev, but familiar with Log4Net, so apologies if this is simple in Java Log4J. I haven't been able to find a post for this scenario, so I'm posting here.
Using Solr 6.6.
It takes a lot less configuration than you'd expect, and most of the info is hidden away in the link that you've already got: https://learn.microsoft.com/en-gb/azure/azure-monitor/app/java-trace-logs
First, go download the jar files from https://github.com/Microsoft/ApplicationInsights-Java/releases. You'll want applicationinsights-logging-log4j1_2-2.3.0 and applicationinsights-core-2.3.0. Put these in the server/lib folder and Solr will load them automatically for you.
Next, you'll need to add a new appender for Application Insights to your log4j.properties file:
# Appinsights
log4j.appender.aiAppender=com.microsoft.applicationinsights.log4j.v1_2.ApplicationInsightsAppender
log4j.appender.aiAppender.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.aiAppender.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n
You also need to add this aiAppender to the log4j.rootLogger list in the same file (it'll probably look something like this: log4j.rootLogger=INFO, file, CONSOLE, aiAppender)
Finally, you need an ApplicationInsights.xml file, which you can get an example of from here https://learn.microsoft.com/en-gb/azure/azure-monitor/app/java-get-started#2-add-the-application-insights-sdk-for-java-to-your-project
Drop this in the server/resources folder, set your instrumentation key and you're good to go!
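For reference, a minimal ApplicationInsights.xml looks roughly like this (a sketch for the 2.x Java SDK; swap in your own instrumentation key and see the linked docs for further options):
<?xml version="1.0" encoding="utf-8"?>
<!-- minimal sketch; only the instrumentation key is set -->
<ApplicationInsights xmlns="http://schemas.microsoft.com/ApplicationInsights/2013/Settings">
  <InstrumentationKey>00000000-0000-0000-0000-000000000000</InstrumentationKey>
</ApplicationInsights>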

Start Solr indexing from a CI job

We use Solr 6.4.1 with several cores for searching. One of the cores contains several entities. All the steps for refreshing the index are started manually from the UI, including entering the database credentials.
My question is: can I reindex a Solr core with several entities from a remote console? I need to create a CI job for this.
And the second question: where can I specify custom parameters with database credentials for all cores on the server?
If the application has some sort of command, you could just trigger it directly from the CI pipeline. If that's not the case and the indexing/update code is tightly coupled to the UI, you could use the DataImportHandler: you configure in Solr (as described in the documentation) the credentials, the queries Solr needs to execute, etc., and then just trigger the import handler from the CI pipeline, something like:
http://<host>:<port>/solr/<collection_name>/dataimport?command=delta-import
This will start a delta-import; for some more commands, check the Data Import Handler Commands section of the documentation linked above.
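As for the credentials, they live in the DIH configuration file that each core's solrconfig.xml points at. A rough sketch (the file name, driver, URL, credentials and queries are all placeholders to adjust):
<!-- e.g. db-data-config.xml, referenced from the /dataimport handler in solrconfig.xml -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="jdbc:postgresql://dbhost:5432/mydb" user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, title FROM item">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>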

How do I create a Solr core without creating the config file first?

I am making a Solr web-based application, and one of its features is that the user can create a core and schema in Solr. My friend did it using a child process: by going to the Solr directory first and then running the command 'bin/solr create -c...', the core can be created. But I am thinking of another approach, like using the HTTP API. I found this:
http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=path/to/instance&configSet=configset2
But apparently it cannot run properly, because you need to create the config for the core first. The error looks like this:
Error CREATEing SolrCore 'mycore': Unable to create core [mycore] Caused by: Could not load configuration from directory /opt/solr/server/solr/configsets/configset2
So I am wondering what kind of approach I can take, since it seems like I can't make a core without setting up a config first. Or should I make an input menu with 'create core' and 'create schema', and only process everything after the user clicks 'submit': making a config, creating the schema, and then finally creating the core? I wonder if that's the best approach.
I am looking forward to any help.
You always need to provide a configuration when creating a core.
When your friend ran the command, it actually used the default configuration data_driven_schema_configs, which you can confirm by reading the help output of the create_core command (create is an alias for create_core in a non-Cloud setup):
bin/solr create_core -h
The solr script copied that configuration and then created the core with it.
The example you showed is only valid for SolrCloud. If you are not using SolrCloud, you need to use the Core Admin API directly and manually set up the directory with the configuration.
Notice that configsets are a bit of a tricky thing in the sense that if you create several cores from the same configset, that configset is shared and changes made to it by one core affect all of them. So, you most likely don't want to use them, but instead copy the configuration as I described above.
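As a rough sketch of that manual approach (the paths assume a standalone Solr 6.x install in /opt/solr with its default SOLR_HOME; adjust names and locations to your layout):
# copy the stock configset into the new core's own instance directory instead of sharing it
mkdir -p /opt/solr/server/solr/mycore
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs/conf /opt/solr/server/solr/mycore/
# then create the core via the Core Admin API, pointing at that instance directory
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=mycore"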

Does Gatling provide a way to compare previously run tests

I've been running Gatling tests and I have a whole bunch of reports in the results folder.
For example I have a report for 200 requests per second and one for 400 requests per second.
Is there any way to compare the reports against each other?
There's only the Jenkins plugin for now.
That's something we plan on providing as a commercial offer.
Gatling itself provides an enterprise version called Gatling FrontLine, which does support trends and history runs.
Another possibility is to take your simulation.log files and process them with the Nuxeo gatling-report utility, like this:
# check https://maven-eu.nuxeo.org/nexus/#nexus-search;quick~gatling-report for latest version, or build from source
wget 'https://maven-eu.nuxeo.org/nexus/service/local/repositories/public-releases/content/org/nuxeo/tools/gatling-report/4.0/gatling-report-4.0-capsule-fat.jar'
# do not create outputReportDirectory !
java -jar gatling-report-4.0-capsule-fat.jar results/complexscenario-20200618125705159/simulation.log results/complexscenario-20200617130307094/simulation.log -o outputReportDirectory
If you intend to generate trends during a Maven build, you can have a look at DennisRippinger's gatling-reporter Maven plugin, which encapsulates the previously mentioned project.
