I have two entities in my data-config.xml file. I want to run a cron job for each entity at a different frequency. Is it possible to have a different schedule for each entity, or will an import run from a cron job always process both entities?
You could do it using the entity parameter of the DataImportHandler commands.
DIH commands are sent to Solr via an HTTP request; the relevant operation here is delta-import.
Only the SqlEntityProcessor supports delta imports.
For example: http://localhost:8983/solr/dih/dataimport?command=delta-import
This command supports the same clean, commit, optimize and debug parameters as the full-import command.
entity
The name of an entity directly under the <document> tag in the configuration file. Use this to execute one or more entities selectively.
Multiple "entity" parameters can be passed on to run multiple entities at once. If nothing is passed, all entities are executed.
For example:
http://localhost:8983/solr/dih/dataimport?command=delta-import&entity=YOUR_ENTITY_NAME
Read more
https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html
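For the cron side, a minimal sketch (the core name dih is taken from the examples above; the entity names and schedules are placeholders) is two separate crontab entries, each hitting only one entity at its own frequency:

# entity_one: delta-import every 15 minutes
*/15 * * * * curl -s "http://localhost:8983/solr/dih/dataimport?command=delta-import&entity=entity_one" > /dev/null
# entity_two: delta-import once an hour
0 * * * * curl -s "http://localhost:8983/solr/dih/dataimport?command=delta-import&entity=entity_two" > /dev/null

Because each request names a single entity, the other entity is left untouched on that run.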
I am using JMeter to test APIs. I often use queries to access the DB (via a JDBC connection).
So far so good. However, as I use more and more queries, it seems I am duplicating them.
For instance:
Thread 1:
HTTP request 1
Query A
Query B
Query C
Thread 2:
HTTP request 2
Query D
Query A
Thread 3:
HTTP request 3
Query A
Query C
As you can see, I have the same query duplicated often, and not only in one .jmx file - I have a lot of .jmx files where I use these queries.
So I am looking for a way to write Query A only once. My idea is to create a new .jmx file containing the query and then include that file and call into it. Is this a good way to approach this? Also, how do I call Query A from any thread? I would need to pass (and return) parameters.
Help would be appreciated.
It appears you're looking for the Module Controller: you can define a "module" per query
and build your test using the "modules" instead of copying and pasting the real JDBC Request samplers.
If you're going to store the "modules" as external .jmx files, consider using Test Fragments.
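To pass parameters into a shared query (a sketch; the variable and column names below are invented), write the reusable JDBC Request with JMeter variables and let each caller set them before invoking the module:

SELECT order_id, status FROM orders WHERE customer_id = ${customerId}

Each thread can set ${customerId} with a User Defined Variables element or a JSR223 sampler (vars.put("customerId", "42")) before the Module Controller fires, and the JDBC Request's "Variable names" / "Result variable name" fields let the caller read the query results back as JMeter variables afterwards.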
So I'm learning about Spark and I have a question about how the client libraries work.
My goal is to do some sort of data analysis in Spark, telling it where the data sources are (databases, CSVs, etc.) and storing the results in HDFS, S3 or some kind of database like MariaDB or MongoDB.
I thought about having a service (an API application) that "tells" Spark what I want to do. The question is: is it enough to set the master configuration to spark://remote-host:7077 at context creation, or should I send the application to Spark with some sort of spark-submit command?
This depends entirely on how your environment is set up. If everything is linked to your account and on your path, you should be able to run one of the two commands below to open an interactive shell and try out test commands. The point of having a shell is that it lets you run commands dynamically, validate and learn how they behave, chain them together, and see what results come out.
Scala
spark-shell
Python
pyspark
Inside the environment, if everything is linked to Hive tables, you can check the tables by running
spark.sql("show tables").show(100,false)
The above command runs a "show tables" against the Spark/Hive metastore catalog and returns all active tables you can see (which doesn't mean you can access the underlying data). The 100 means show up to 100 rows, and false means show the full strings rather than just the first N characters.
In a mythical example, if one of the tables you see is called Input_Table, you can bring it into the environment with the commands below:
val inputDF = spark.sql("select * from Input_Table")
inputDF.count
While you're learning, I would strongly advise against running the commands via spark-submit, because you would have to pass in the class and jar, forcing you to edit and rebuild for every test; that makes it hard to figure out how commands behave and costs a lot of downtime.
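To answer the original question about the API application: yes, you can create the session against the standalone master directly from your own service instead of going through spark-submit. A minimal Scala sketch (the host, app name, column and paths are placeholders, and the driver needs the relevant HDFS/S3 connector jars on its classpath):

import org.apache.spark.sql.SparkSession

// connects this JVM (your API application) as the Spark driver
// to a standalone master; "remote-host" is a placeholder
val spark = SparkSession.builder()
  .appName("analysis-service")
  .master("spark://remote-host:7077")
  .getOrCreate()

// placeholder input and output locations
val inputDF = spark.read.option("header", "true").csv("s3a://my-bucket/input.csv")
inputDF.groupBy("country").count().write.parquet("hdfs:///results/by-country")

Keep in mind that the worker nodes must be able to connect back to the driver process over the network, which is often the practical reason people end up running spark-submit from a gateway node instead.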
I am looking for any option like dynamic fields (Solr) in Vespa. I need to add new fields to the existing schema without redeployment of the whole application.
The closest thing I found in the Vespa documentation is http://docs.vespa.ai/documentation/search-definitions.html#modify-search-definitions,
where it is mentioned that we can add a new field and run vespa-deploy prepare and vespa-deploy activate. But isn't that a resubmission of the whole application? What is the option that implements this with the least overhead?
http://docs.vespa.ai/documentation/search-definitions.html#modify-search-definitions is the right document, yes
Altering the search definition is safe: the deploy prepare step will output any steps needed (e.g. no steps, restart or re-feed). Most actions do not require a restart or re-feed, as Vespa is built to be production friendly, and adding fields is easy since it requires no extra actions.
Note that there is no support for default values for fields.
As the Vespa configuration is declarative, the full application package is submitted, but the config servers will calculate changes and deploy the delta to the nodes. This makes it easy to keep the application package config in a code repo like git - what you see in the repo is what is deployed.
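As a sketch (the field name and file layout are made up), adding a field is just an edit to the search definition followed by the two deploy commands already mentioned in the question:

# searchdefinitions/music.sd - new field added to the existing document
field popularity type int {
    indexing: summary | attribute
}

# then, from the application package directory
vespa-deploy prepare .
vespa-deploy activate

The prepare step prints any required actions if a change does need a restart or re-feed.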
It depends on what you mean by "dynamic".
If the fields number in the hundreds, are controlled by the application owners and change daily, then changing the schema and redeploying works perfectly fine: fields can be updated individually (even when indexed), they incur no overhead if not used, and deploying an application with new fields is cheap and does not require any restarts or anything like that.
If you need tens of thousands of fields, or the fields are added and removed by users, then Vespa does not have a solution for you out of the box. You'd need to mangle the fields into the same index by adding e.g. a "myfield_" prefix to each token. This is what engines that support this do internally to make it efficient.
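As an illustration of that mangling approach (the field and token names are invented here), user-defined fields can be flattened into a single indexed array field with the field name folded into each token:

field user_fields type array<string> {
    indexing: summary | index
}

# feed values such as "color_red" or "size_large" into user_fields,
# then match on them with a query clause like: user_fields contains "color_red"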
I need to add serial numbers to most of the entities in my application because I'm going to be running a Lucene search index side by side.
Rather than having an ongoing polling process, or manually running my indexer from my application, I'm thinking of the following:
Add a Created column with a default value of GETUTCDATE().
Add a Modified column with a default value of GETUTCDATE().
Add an ON UPDATE trigger to the table that updates Modified to GETUTCDATE() (can this happen as the UPDATE is executed? i.e. it adds SET [Modified] = GETUTCDATE() to the SQL query instead of updating it individually afterwards?)
The ON UPDATE trigger will call my Lucene indexer to update its index. This would presumably have to be an xp_cmdshell call, but is there a way of sending a message to the process instead of starting a new one? I heard I could use named pipes, but how do you use named pipes from within a sproc or trigger? (Searching for "SQL Server named pipes" gives me irrelevant results, of course.)
Does this sound okay, and how can I solve the small sub-problems?
As I understand it, you have to introduce two columns to your existing tables, keep them (at least one of them) maintained at runtime, and have them used by an external component.
Your first three points are nothing unusual. There are two types of triggers in SQL Server according to when the trigger is processed: the INSTEAD OF trigger (processed in place of, and effectively before, the triggering statement) and the AFTER trigger. However, inside an INSTEAD OF trigger you have to provide the logic that actually writes the data to the table, along with whatever other custom processing you require. I usually avoid that if not really necessary.
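A minimal sketch of points 1-3 with an AFTER UPDATE trigger (table, key and trigger names are placeholders); note that with an AFTER trigger the Modified value is written as a second statement inside the same transaction rather than being folded into the original UPDATE:

ALTER TABLE dbo.MyEntity ADD
    Created  DATETIME NOT NULL CONSTRAINT DF_MyEntity_Created  DEFAULT (GETUTCDATE()),
    Modified DATETIME NOT NULL CONSTRAINT DF_MyEntity_Modified DEFAULT (GETUTCDATE());
GO

CREATE TRIGGER trg_MyEntity_SetModified ON dbo.MyEntity
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- stamp only the rows touched by the original UPDATE
    UPDATE e
    SET Modified = GETUTCDATE()
    FROM dbo.MyEntity AS e
    INNER JOIN inserted AS i ON i.Id = e.Id;  -- assumes an Id primary key
END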
Now about your fourth point - it's tricky, and there are several approaches to solving it in SQL Server, but all of them are at least a bit ugly. Basically you have to either execute an external process or send a message to it. I don't have any experience with the Lucene indexer, but I guess one of these methods (execute or send a message) would apply.
So, you can do one of the following to directly or indirectly access the external component, meaning to access the Lucene indexer directly or via some proxy module:
Implement an unsafe CLR trigger; basically you execute .NET code inside the trigger and thus get access to (almost) the whole .NET Framework - be careful with that, it is not entirely unrestricted
Implement an unsafe CLR procedure; the only difference from the CLR trigger is that you wouldn't call it immediately after the INSERT, but you will do fine with a database job that runs it periodically
Use xp_cmdshell; you already know about this one, but you can combine this approach with the job-wrapping technique from the previous point
Call a web service; this technique is usually marked as experimental AND you have to implement the service yourself (unless the Lucene indexer installs some web service of its own)
There surely are other methods I can't think of right now...
I would personally go with the third point (job + xp_cmdshell) because of its simplicity, but that's just because I lack any knowledge of how the Lucene indexer works.
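As a sketch of that option (the executable path and arguments are placeholders), the periodic job step is little more than:

-- xp_cmdshell is disabled by default; an administrator must enable it once
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1; RECONFIGURE;

-- the SQL Agent job step (or, less ideally, the trigger itself) shells out to the indexer
EXEC master..xp_cmdshell 'C:\Tools\LuceneIndexer.exe --incremental';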
EDIT (another option):
Use Query Notifications; SQL Server Service Broker allows an external application to connect and monitor interesting changes. You even have several options for how to do that (basically synchronous or asynchronous); the only precondition is that Service Broker is up, running and available to your application. This is a more sophisticated method of informing an external component that something has changed.
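A rough C# sketch of the asynchronous flavour using SqlDependency, which rides on top of Query Notifications and Service Broker (the connection string, table and column names, and the indexer call are placeholders):

using System.Data.SqlClient;

class IndexChangeListener
{
    private readonly string connectionString;

    public IndexChangeListener(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public void Start()
    {
        SqlDependency.Start(connectionString);  // once per application
        Subscribe();
    }

    private void Subscribe()
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Id, Modified FROM dbo.MyEntity", connection))  // explicit column list and two-part name are required
        {
            var dependency = new SqlDependency(command);
            dependency.OnChange += (sender, e) =>
            {
                Subscribe();         // notifications are one-shot, so re-subscribe first
                RunLuceneIndexer();  // placeholder for kicking off your indexer
            };
            connection.Open();
            command.ExecuteReader().Dispose();  // executing the query registers the subscription
        }
    }

    private void RunLuceneIndexer()
    {
        // re-index the changed rows here
    }
}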
Still trying to wrap my head around Quartz.NET after reading all the tutorials, which seem very code-specific rather than implementation-focused. Here's what I'm trying to do. I have 20 SQL stored procs that do various things, like querying log tables, resubmitting data to other processes, etc. I'd like to have these SPs running throughout the day at regular intervals. So it seems like a natural fit for Quartz.NET. I plan on creating a Windows service that hosts Quartz.NET and contains jobs in assemblies in the same folder as the Quartz assembly.
One bad way to implement this, I think, would be to write a separate job class for every SP and associate a separate trigger with each one. The job class would simply execute a particular SP whose name was hard-coded in the class. That's the bad way.
But for the life of me I can't figure out what the Good way would be. Obviously having a single job class that just does a generic 'execute SP by name', where the names come from a simple SQL table, seems like the way to go, but how would I get different triggers associated with different SPs, and how would Quartz know to load up all twenty SPs on separate threads?
And how would Quartz know to pickup a changed trigger for example for one of the SPs? Would that have to be a start/stop cycle on the Win Svc to reload jobs and triggers, or would I have to hand code some kind of "reload" too?
Any thoughts? Am I misunderstanding what Quartz is? The verbiage makes it sound like it's an Enterprise Scheduler, a System, a thing you install. All the documentation OTOH makes it seem like just a bunch of classes you stitch together to create your OWN scheduler or scheduling system, no different from the classes MS provides in .NET to create apps that do FTP for example. Maybe I'm expecting too much?
A pretty easy way to fulfill your requirements could be:
Start with sample server
Take the Quartz.NET distribution's example server as a starting point; it gives you a ready-made template for a Windows service that uses TopShelf for easy installation.
Use XML configuration with change detection
The quartz.config file contains the actual configuration; there you can see that jobs and triggers are read from the XML file quartz_jobs.xml.
You need to add quartz.plugin.xml.scanInterval = 10 to watch for changes (every ten seconds).
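For reference, the XML plugin section of quartz.config then looks roughly like this (the plugin's assembly-qualified name differs between Quartz.NET versions, for example Quartz in 2.x versus Quartz.Plugins in 3.x, so copy it from the sample config that ships with your version):

quartz.plugin.xml.type = Quartz.Plugin.Xml.XMLSchedulingDataProcessorPlugin, Quartz
quartz.plugin.xml.fileNames = ~/quartz_jobs.xml
quartz.plugin.xml.scanInterval = 10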
Use trigger job data maps to parameterize the job
You can use the same job class for every trigger if the SQL execution is as trivial as you propose. Just add the needed configuration to the trigger's definition in the XML (the sample here runs every ten seconds; add as many triggers as you want):
<trigger>
  <simple>
    <name>sqlTrigger1</name>
    <job-name>genericSqlJob</job-name>
    <job-group>sqlJobs</job-group>
    <job-data-map>
      <entry>
        <key>sql_to_run</key>
        <value>select 1</value>
      </entry>
    </job-data-map>
    <misfire-instruction>SmartPolicy</misfire-instruction>
    <repeat-count>-1</repeat-count>
    <repeat-interval>10000</repeat-interval>
  </simple>
</trigger>
Just use the quartz_jobs.xml as base and make required changes.
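The trigger above also needs a matching <job> entry in quartz_jobs.xml. A sketch, where the assembly-qualified job-type is a placeholder for wherever your generic job class actually lives:

<job>
  <name>genericSqlJob</name>
  <group>sqlJobs</group>
  <description>Runs whatever SQL its trigger hands it</description>
  <job-type>MyCompany.Jobs.GenericSqlJob, MyCompany.Jobs</job-type>
  <durable>true</durable>
  <recover>false</recover>
</job>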
Use configuration in your job
You can access the configuration in your job from the context's MergedJobDataMap, which contains both the job's and the trigger's parameters, with the latter overriding the former.
public class GenericSqlJob : IJob // the class name just needs to match the job-type registered in quartz_jobs.xml
{
    public void Execute(IJobExecutionContext context)
    {
        string sqlToRun = context.MergedJobDataMap.GetString("sql_to_run");
        SqlTemplate.ExecuteSql(sqlToRun); // the answer's placeholder for your own data-access helper
    }
}