I am new to the community and looking forward to being a contributing member. I wanted to throw this out there and see if anyone had any advice:
I am currently in the middle of developing an MVC 3 app that controls various SQL jobs. It basically allows users to schedule jobs to be completed in the future, but also allows them to run jobs on demand.
I was thinking of having a thread run in the web app that pulls entity information into an XML file, and writing a Windows service to monitor this file and perform the requested jobs. Does this sound like a good method? Has anyone done something like this before and used a different approach? Any advice would be great. I will keep the forum posted on progress and practices.
Thanks
I can see you running into some issues using a file for complex communication between processes - files can generally only be written by one process at a time, so what happens if the worker process tries to remove a task at the same time as the web process tries to add a task?
A better approach would be to store the tasks in a database that is accessible to both processes - a database can be written to by multiple processes, and it is easy to select all tasks that have a scheduled date in the past.
Using a database you don't get to use FileSystemWatcher, which I suspect is one of the main reasons you want to use a file. If you really need the job to run instantly there are various sorts of messaging you could use, but for most purposes you can just check the queue table on a timer.
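As a rough illustration of the queue-table idea, here is a minimal sketch in Python using SQLite as a stand-in for whatever database both processes share. The table name `tasks`, its columns, and the helper names are all assumptions made up for this example, not part of any framework. The web app would call `schedule`, and the service would call `poll_once` from its timer.

```python
import sqlite3
import time


def create_queue(conn):
    # Hypothetical schema: run_at is stored as Unix epoch seconds.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " run_at REAL NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def schedule(conn, name, run_at):
    # Called by the web process; an on-demand job just uses run_at = now.
    conn.execute("INSERT INTO tasks (name, run_at) VALUES (?, ?)", (name, run_at))
    conn.commit()


def poll_once(conn, run_task, now=None):
    # Called by the worker on a timer: run every task whose time has passed.
    if now is None:
        now = time.time()
    due = conn.execute(
        "SELECT id, name FROM tasks WHERE done = 0 AND run_at <= ? ORDER BY run_at",
        (now,),
    ).fetchall()
    for task_id, name in due:
        run_task(name)
        conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
        conn.commit()
    return len(due)
```

A service would typically wrap `poll_once` in a loop with a sleep between calls. The polling interval then sets the worst-case delay for an on-demand job.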
There are two jobs running in Flink, as shown in the image below. If one of them fails, I need to fail the whole Flink application. How can I do that? Suppose the job with parallelism 1 fails due to some exception; how do I fail the job with parallelism 4?
The details of how you should go about this depend a bit on the type of infrastructure you are using to run Flink, and how you are submitting the jobs. But if you look at ClusterClient and JobClient and associated classes, you should be able to find a way forward.
If you aren't already, you may want to take advantage of application mode, which was added in Flink 1.11. This makes it possible for a single main() method to launch multiple jobs, and added env.executeAsync() for non-blocking job submission.
Is it possible in Apache Flink to create an application that consists of multiple jobs which together form a pipeline to process some data?
For example, consider a process with an input/preprocessing stage, a business logic and an output stage.
In order to be flexible in development and (re)deployment, I would like to run these as independent jobs.
Is it possible in Flink to build this and directly pipe the output of one job to the input of another (without external components)?
If yes, where can I find documentation about this and can it buffer data if one of the jobs is restarted?
If no, does anyone have experience with such a setup and can point me to a possible solution?
Thank you!
If you really want separate jobs, then one way to connect them is via something like Kafka, where job A publishes, and job B (downstream) subscribes. Once you disconnect the two jobs, though, you no longer get the benefit of backpressure or unified checkpointing/saved state.
Kafka can do buffering of course (up to some max amount of data), but that's not a solution to a persistent difference in performance, where the upstream job generates data faster than the downstream job can consume it.
I imagine you could also use files as the 'bridge' between jobs (streaming file sink and then streaming file source), though that would typically create significant latency as the downstream job has to wait for the upstream job to decide to complete a file, before it can be consumed.
An alternative approach that's been successfully used a number of times is to provide the details of the preprocessing and business logic stages dynamically, rather than compiling them into the application. This means that the overall topology of the job graph is static, but you are able to modify the processing logic while the job is running.
I've seen this done with purpose-built DSLs, PMML models, JavaScript (via Rhino), Groovy, Java classloading, ...
You can use a broadcast stream to communicate/update the dynamic portions of the processing.
Here's an example of this pattern, described in a Flink Forward talk by Erik de Nooij from ING Bank.
I am interested in making better programs with more responsive design and capabilities. Currently, when my programs access data remotely, the interface freezes and no animated GIF keeps playing while that happens.
I was told by David Hefferman that animated GIFs created in the VCL do not respond even when the work is in threads, because the VCL runs in the main thread, and the same goes for databases.
My question here is how to work with threads, specifically with databases, so I have lots of questions about it.
Do I have to implement my entire database in thread functions and procedures?
If that is correct, then I can't use the database by dropping components onto the form, right?
But what about user input and grids? Will they work correctly with those threads, or will I have to use a regular TEdit instead of a TDBEdit and then send its content to an insert/update SQL command?
The main objective here is to create a Delphi application that accesses remote databases like MySQL using Zeos without freezing on every query made to the server. At least the smaller ones. It would be very ugly if the system were to download a list of records to a table and the user could still input things. For those cases I would very much like my animated GIF (or some other solution) to work.
Thank you for any help at all!
In my experience, the best approach is to drop your database components on a Data module and then create this data module dynamically in each thread. Database components typically work fine if they are created and initialized in the thread that is using them.
There are, however, caveats - if you are connecting to a Firebird database, you should make sure that only one thread at a time is establishing a connection. (Use a critical section around the code that connects to the database.) This holds for Firebird 1.5, 2.0 and 2.1 but may not be necessary anymore for Firebird 2.5 (I haven't yet had the opportunity to test it).
EDIT (in answer to EASI's comment): Yes, connecting to a database can take some time. If you frequently need to execute short operations, it is best to keep threads connected and running for a longer period of time.
I can think of two ways to do that. 1) Keep threads alive and connected and run a message loop inside. This loop would receive commands from the main thread, process them and return a result. 2) Keep threads initialized and connected in a thread pool and activate them when you need to perform a database operation.
Basically, both approaches are the same; the difference is in the level that handles the 'receive and process command' loop.
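To make the first approach concrete, here is a minimal sketch of a long-lived worker thread that owns its own connection and runs a command loop. It is written in Python with SQLite purely for illustration. It is not Delphi, Zeos, or OmniThreadLibrary code, and the class name `DbWorker` and its methods are invented for this example. The point it shows is that the connection is created inside the thread that uses it, and the main thread only exchanges messages with that thread.

```python
import queue
import sqlite3
import threading


class DbWorker(threading.Thread):
    """Hypothetical worker: keeps one connection open and serves commands."""

    def __init__(self, db_path):
        super().__init__(daemon=True)
        self._db_path = db_path
        self._commands = queue.Queue()

    def run(self):
        # The connection is created in the thread that will use it.
        conn = sqlite3.connect(self._db_path)
        try:
            while True:
                item = self._commands.get()
                if item is None:  # stop signal
                    break
                sql, params, reply = item
                try:
                    rows = conn.execute(sql, params).fetchall()
                    conn.commit()
                    reply.put(("ok", rows))
                except sqlite3.Error as exc:
                    reply.put(("error", exc))
        finally:
            conn.close()

    def submit(self, sql, params=()):
        # Returns immediately; the caller reads the result from the reply queue.
        reply = queue.Queue(maxsize=1)
        self._commands.put((sql, params, reply))
        return reply

    def stop(self):
        self._commands.put(None)
```

In a GUI program the main thread would check the reply queue from a timer or an idle handler instead of blocking on it, so the interface keeps repainting while the query runs.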
The second approach can be easily implemented in the OmniThreadLibrary by using the IOmniConnectionPool.SetThreadDataFactory mechanism. See Adding connection pool mechanism to OmniThreadLibrary and demo 24_ConnectionPool for more information. Alternatively, you can use the high-level abstraction Background worker where you can establish database connection on a per-thread basis in a Task initialization block.
We have a web app in which a request can start a long-running or processor-intensive process.
We want to create a Windows service to off-load this from the IIS servers. We will install this service on multiple machines to lower the wait time for these jobs. One idea we are looking at is serializing the Job object into SQL Server with its JobType as another column.
The job service will claim a job by updating the row with its indicator; this will keep other services from picking it up. Once the job is complete, the service removes that entry.
What I am looking for is other, possibly better ideas to accomplish the Job Service Queuing.
I would say this is a great way to handle this issue. The only thing I would add is that while I don't know what the Job object is or how it is created, you might be able to offload this as well. Instead of creating the object and serializing it to the database, simply store the raw data in SQL. Let the Services handle building the Job object themselves from the ground up. That way you cut the serialization out of the mix. However, if this isn't possible, I would say that your solution seems to be the most viable.
If you do go this route, you could look into optimization of your Service offloading. For example, you could wake extra services when the load gets busy and then put some to sleep when the load lightens.
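The claim step described in the question can be sketched as follows. This is a minimal Python example with SQLite standing in for SQL Server. The `jobs` table, its columns, and the function names are assumptions for illustration only. The key detail is that the UPDATE repeats the "still unclaimed" condition, so when two services race for the same row only one update affects it.

```python
import sqlite3


def create_jobs(conn):
    # Hypothetical schema: payload holds the raw job data, not a serialized object.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " job_type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " claimed_by TEXT)"
    )
    conn.commit()


def claim_next(conn, worker_id):
    """Claim one unclaimed job; return (id, job_type, payload) or None."""
    while True:
        row = conn.execute(
            "SELECT id, job_type, payload FROM jobs"
            " WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (worker_id, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
        # Another service claimed this row first; look for the next one.


def complete(conn, job_id, worker_id):
    # Remove the entry once the owning service has finished the job.
    conn.execute(
        "DELETE FROM jobs WHERE id = ? AND claimed_by = ?", (job_id, worker_id)
    )
    conn.commit()
```

One thing this sketch leaves open is what happens when a service crashes after claiming a job. A claim timestamp that lets stale claims be released is one possible way to handle that.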
I'm having trouble finding the wording, but is it possible to submit a SQL query to an MS SQL server and retrieve the results asynchronously?
I'd like to submit the query from a web request, but I'd like the web process to terminate while the SQL server continues processing the query and dumps the results into a temp table that I can retrieve later.
Or is there some common modifier I can append to the query to make it process the results in the background (like "&" in bash)?
More info
I manage a site that allows trusted users to run arbitrary select queries on very large data sets. I'm currently using a Java daemon to examine a "jobs" table and run the queries; I was just hopeful that there might be a more native solution.
Based on your clarification, I think you might consider a derived OLAP database that's designed for those types of queries, since they seem to be strategic to the business.
This really depends on how you are communicating with the DB. With ADO.NET you can make a command execute asynchronously. If you want to do this without a library built for the purpose, you could insert a record into a job table, have SQL Agent poll the table, and then run your work as a stored procedure or something similar.
In all likelihood, though, I would guess your web request is received by ASP.NET, and you could use the ADO.NET classes.
See this question: "Start stored procedures sequentially or in parallel"
In effect, you would have the web page start a job. The job would execute asynchronously.
Since HTTP is connectionless, the only way to associate the retrieval with the query would be with sessions. Then you'd have all these answers waiting around for someone to claim them, and no way to know if the connection (that doesn't exist) has been broken.
In a web page, it's pretty much use-it-or-lose-it.
Some of the other answers might work with a lot of effort, but I don't get the sense that you're looking for an edge-case, high-tech option.
It's a complicated topic to be able to execute a stored procedure and then asynchronously retrieve the result. It's not really for the faint of heart and my first recommendation would be to reexamine your design and be certain that you in fact need to asynchronously process your request in the data tier.
Depending on what precisely you are doing, you should look at two technologies. The first is SQL Service Broker, which basically allows you to queue requests and receive responses asynchronously. It was introduced in SQL 2005 and sounds like it may be the best bet from the way you phrased your question.
Take a look at the tutorial for same database service broker conversations on MSDN: http://msdn.microsoft.com/en-us/library/bb839495(SQL.90).aspx
For longer-running or larger processing tasks I'd potentially look at something like BizTalk or Windows Workflow. These frameworks (they're largely the same, they came from the same team at MS) allow you to start an asynchronous workflow that may not return for hours, days, weeks, or even months.