T-SQL Background Processing - sql-server

I'm having trouble finding the right wording, but is it possible to submit a SQL query to an MS SQL Server and retrieve the results asynchronously?
I'd like to submit the query from a web request, but I'd like the web process to terminate while the SQL server continues processing the query and dumps the results into a temp table that I can retrieve later.
Or is there some common modifier I can append to the query to make it process in the background (like "&" in bash)?
More info
I manage a site that allows trusted users to run arbitrary SELECT queries on very large data sets. I'm currently using a Java daemon to examine a "jobs" table and run the queries; I was just hopeful that there might be a more native solution.

Based on your clarification, I think you might consider a derived OLAP database that's designed for those types of queries, since they seem to be strategic to the business.

This really depends on how you are communicating with the DB. With ADO.NET you can make a command execute asynchronously. If you wanted to do this outside the scope of a library built for it, you could insert a record into a jobs table, have SQL Agent poll the table, and then run your work as a stored procedure or something.
In all likelihood, though, your web request is received by ASP.NET, so you could use the ADO.NET classes.
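For illustration, a minimal sketch of the jobs-table approach (table and column names here are invented placeholders) might look like this:

    -- hypothetical jobs table; names are placeholders
    CREATE TABLE dbo.QueryJobs (
        JobId       INT IDENTITY(1,1) PRIMARY KEY,
        QueryText   NVARCHAR(MAX) NOT NULL,
        Status      VARCHAR(20)   NOT NULL DEFAULT 'Pending',  -- Pending / Running / Done / Failed
        ResultTable SYSNAME       NULL,                        -- where the results were dumped
        SubmittedAt DATETIME      NOT NULL DEFAULT GETDATE(),
        CompletedAt DATETIME      NULL
    );

    -- the web request just inserts a row and returns immediately
    INSERT INTO dbo.QueryJobs (QueryText) VALUES (N'SELECT ...');

    -- a SQL Agent job (or the existing Java daemon) polls for pending work
    SELECT TOP (1) JobId, QueryText
    FROM dbo.QueryJobs
    WHERE Status = 'Pending'
    ORDER BY SubmittedAt;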

See this question
Start stored procedures sequentially or in parallel
In effect, you would have the web page start a job. The job would execute asynchronously.
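For example, assuming you have already created an Agent job (the job name below is just an illustration), the page's stored procedure can kick it off and return immediately:

    -- returns as soon as the job is queued; the job itself runs asynchronously
    EXEC msdb.dbo.sp_start_job @job_name = N'ProcessReportQueue';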

Since HTTP is stateless, the only way to associate the retrieval with the query would be with sessions. Then you'd have all these answers waiting around for someone to claim them, and no way to know if the connection (which doesn't exist) has been broken.
In a web page, it's pretty much use-it-or-lose-it.
Some of the other answers might work with a lot of effort, but I don't get the sense that you're looking for an edge-case, high-tech option.

Executing a stored procedure and then asynchronously retrieving the result is a complicated topic. It's not really for the faint of heart, and my first recommendation would be to reexamine your design and be certain that you in fact need to process your request asynchronously in the data tier.
Depending on what precisely you are doing, you should look at two technologies. SQL Service Broker basically allows you to queue requests and receive responses asynchronously. It was introduced in SQL Server 2005 and sounds like it may be the best bet from the way you phrased your question.
Take a look at the tutorial for same database service broker conversations on MSDN: http://msdn.microsoft.com/en-us/library/bb839495(SQL.90).aspx
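A very condensed sketch of a same-database conversation follows; the object names are made up for illustration, and the linked tutorial covers the full setup (including enabling Service Broker on the database):

    -- one-time setup
    CREATE MESSAGE TYPE QueryRequest VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT QueryContract (QueryRequest SENT BY INITIATOR);
    CREATE QUEUE RequestQueue;
    CREATE QUEUE ResponseQueue;
    CREATE SERVICE RequestService ON QUEUE RequestQueue (QueryContract);
    CREATE SERVICE ResponseService ON QUEUE ResponseQueue (QueryContract);

    -- submit a request and return immediately
    DECLARE @h UNIQUEIDENTIFIER;
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE ResponseService
        TO SERVICE 'RequestService'
        ON CONTRACT QueryContract
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @h MESSAGE TYPE QueryRequest (N'<query>...</query>');

    -- a background reader (activation procedure or Agent job) picks the work up later
    RECEIVE TOP (1) conversation_handle, message_body FROM RequestQueue;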
For longer-running or larger processing tasks, I'd potentially look at something like BizTalk or Windows Workflow. These frameworks (they're largely the same; they came from the same team at MS) allow you to start an asynchronous workflow that may not return for hours, days, weeks, or even months.

Related

Reading and writing to BigQuery in GCP. What service?

I'm creating a BigQuery table where I join and transform data from several other BigQuery tables. It's all written in SQL, and the whole query takes about 20 minutes to run and consists of several SQL scripts. I'm also creating some intermediate tables before the end table is created.
Now I want to make the above query more robust and schedule it, and I can't decide on the tool. Alternatives I'm thinking about:
Make it into a Dataflow job and schedule it with Cloud Scheduler. This feels like it might be overkill because all the code is in SQL and goes from bq --> bq.
Create scheduled queries to load the data. No experience with this, but it seems quite nice.
Create a Python script that executes all the SQL using the BQ API. Create a cron job and schedule it to run somewhere in GCP.
Any suggestions on what would be a preferred solution?
If it's encapsulated in a single script (or even multiple), I'd schedule it through BQ. It will handle your query no differently than the other options, so it doesn't make sense to set up extra services for it.
Are you able to run it as a single query?
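For illustration, if the whole pipeline fits into one multi-statement BigQuery script, a single scheduled query could build the intermediate tables and then the final one. The dataset, table, and column names below are placeholders:

    -- multi-statement BigQuery script, run as one scheduled query
    CREATE OR REPLACE TABLE mydataset.intermediate_1 AS
    SELECT a.id, a.amount, b.category
    FROM mydataset.source_a AS a
    JOIN mydataset.source_b AS b USING (id);

    CREATE OR REPLACE TABLE mydataset.final_table AS
    SELECT category, SUM(amount) AS total_amount
    FROM mydataset.intermediate_1
    GROUP BY category;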
According to my experience with GCP, both Cloud Composer and Dataflow jobs would be, as you suggested, overkill. Neither of these products would be serverless, and they would probably imply a higher economic cost because of the instances running in the background.
On the other hand, you can create scheduled queries on a regular basis (daily, weekly, etc) that are separated by a big enough time window to make sure the queries are carried out in the expected order. In this sense, the final table would be constructed correctly from the intermediate ones.
From my point of view, both executing a Python Script and sending notifications to Pub/Sub triggering a Cloud Function (as apw-ub suggested) are also good options.
All in all, I guess the final decision should depend more on your personal preference. Please feel free to use the Google Cloud Pricing Calculator (1) to get an estimate of how costly each of the options would be.

open a URL with SQL Server 2012

I am trying to open a web URL through SQL Server 2012. We have tried SQLCLR, but it's outdated. We tried to run a batch file, and it gets stuck in the executing process:
EXEC xp_cmdshell 'c:\PATH.bat'
That's the code we used to open the batch file, and then it gets stuck executing the query. I waited 5 minutes and still nothing popped up.
We have checked the file permissions and everything is allowed. It's the 4th time I've tried this and I couldn't manage it. Can someone please show me an alternate solution?
While there are pros and cons to accessing a URL from within SQL Server, SQLCLR is most definitely not outdated. Even if you have no custom Assemblies, it is still being used internally for several things:
Hierarchy, Geometry, Geography datatypes
Replication
Several built-in functions such as FORMAT, TRY_PARSE, etc
etc
For more info on what SQLCLR actually is and can do, please see the series of articles I am writing on this topic on SQL Server Central: Stairway to SQLCLR (free registration is required to read content on that site, but it's worth it :-). Level 1 ("What is SQLCLR?") is a fairly comprehensive look at what SQLCLR both is and is not.
If you want a command-line utility, then you might be able to get away with using curl.
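As a rough sketch, assuming curl is installed on the server and xp_cmdshell is enabled (the URL below is just an example), you would capture its output into a table rather than launching anything interactive:

    -- capture the HTTP response body into a temp table; -s suppresses progress output
    CREATE TABLE #response (line NVARCHAR(4000));
    INSERT INTO #response
    EXEC xp_cmdshell 'curl -s "https://example.com/api/ping"';
    SELECT * FROM #response;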
If you want a pre-made SQLCLR function that can handle this so that you don't need to worry about the learning curve of doing such an operation in SQLCLR, then that is available in the SQL# library that I created (but it is not in the Free version; only available in the Full / paid version).
If you are going to be making this URL / Web Service call from within a Trigger (whether it is a SQLCLR Trigger or a T-SQL Trigger calling a SQLCLR object), then you need to be very careful, since Triggers execute within a system-created Transaction (if no explicit Transaction already exists). What this means is that the actual committing of the Transaction (i.e. the true saving of the change to the DB) will wait until the external call completes. The two problems you run into here are:
The Web Service does not respond super quickly (and it needs to respond super quickly)
There are more concurrent requests made to the specific URI such that .NET waits until there is an opening. This is controlled by ServicePointManager.DefaultConnectionLimit, which can be accessed via the HttpWebRequest object (I think there is a ServicePoint property). The default limit is 2, so any more than 1 - 3 calls to the Web Service per second (generally speaking) can cause blocking, even if the Web Service has the ability to respond quickly. 1 - 3 calls per second might not seem like much, but if using this approach in an audit Trigger scenario on multiple tables, it becomes quite easy to reach this limit. So you need to increase the limit to something much higher than 2, and you need to set it per call, since it is stored in the App Domain, which sometimes gets unloaded due to memory pressure.
For more info and considerations, please see my related answers to similar questions here on S.O.:
SQL Server 2012 make HTTP 'GET' Request from a stored procedure
SQL CLR awaitable not getting executed
SQL CLR for Event Driven Communication
Logging not Persisting When Exception Occurs in Method Executed in a Trigger
Also, this S.O. question is very similar in terms of wanting to get near real-time notification of DML changes, and might apply to your goal:
SqlDependency vs SQLCLR call to WebService

"Real Time" data change detection in SQL Server

We have a requirement for notifying external systems of changes in data in various tables in a SQL Server database. The choice of which data to monitor is somewhat under the control of the user (gets to choose from a list of what we support). The recipients of the notifications may be on a locally connected network (i.e., in the same data center) or they may be remote.
We currently handle this by application code within our data access layer that detects changes and queues notifications on a Service Broker queue which is monitored by a Windows service that performs the actual notification. Not quite real time but close enough.
This has proven to have some maintenance problems so we are looking at using one of the change detection mechanisms that are built into SQL Server. Unfortunately none of the ones I have looked at (I think I looked at them all) seem to fit very well:
Change Data Capture and Change Tracking: Major problem is that they require polling the captured information to determine changes that are to be passed on to recipients. I suspect that will introduce too much overhead.
Notification Services: Essentially uses SQL Server as a web server, which is a horrible waste of licenses. It also requires access through at least two firewalls in the network, which is unacceptable from a security perspective.
Query Notification: Seems the most likely candidate but does not seem to lend itself particularly well to dynamically choosing the data elements to watch. The need to re-register the query after each notification is sent means that we would keep SQL Server busy with managing the registrations.
Event Notification: Designed to notify on database or instance level events, not really applicable to data change detection.
About the best idea I have come up with is to use CDC and put insert triggers on the change data tables. The triggers would queue something to a Service Broker queue that would be handled by some other code to perform the notifications. This is essentially what we do now, except using a SQL Server feature to do the change detection. I'm not even sure that you can add triggers to those tables, but I thought I'd get feedback before spending a lot of time on a POC.
That seems like an awful roundabout way to get the job done. Is there something I have missed that will make the job easier or have I misinterpreted one of these features?
Thanks and I apologize for the length of this question.
Why don't you use UPDATE and INSERT triggers? A trigger can execute CLR code, which is explained here.
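As a hedged sketch of that idea (the table and column names below are invented), an ordinary DML trigger can capture the changed keys into a staging table that your existing notification service already polls:

    -- hypothetical: record changed keys for the notification service to pick up
    CREATE TRIGGER trg_Orders_Notify
    ON dbo.Orders
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.ChangeLog (TableName, KeyValue, ChangedAt)
        SELECT N'Orders', i.OrderId, GETDATE()
        FROM inserted AS i;
    END;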

Reliable asynchronous processing in SQL Server

Some of the services are provided to our customers by a 3rd party. The data which is created on their remote servers is replicated to an on-premises SQL server.
I need to perform some work on that 3rd party server, whose database is not directly accessible to me. They expose a set of APIs for that purpose. The work is performed on a linked SQL server by a SQL Server Agent job.
Business scenario: customers can receive "badges". A badge can be given to a customer by calling the UpdateCustomerBadgeInfo web method on the 3rd party server.
So a typical requirement for an automated task would look like this:
"Find all customers who logged in more than 50 times during theday, give them the [has-no-life] badge and send them an SMS notification"
The algorithm would be:
- Select all the matching accounts into a #TempTable
- For each customer record:
  - Call the UpdateCustomerBadgeInfo() method (via CLR)
  - If the badge info was successfully updated -> enqueue an SMS message (queue table)
  - Log successful actions (so that the record will not be picked up next time)
The biggest problem with the way it works now is that it takes a lot of time to process large datasets in a WHILE loop.
So the 3rd party provider created a solution to perform batch updates of the customer data. They created a table on the on-premises SQL server to which batch update requests are submitted and later picked up by their service for validation and processing.
The question is :
How should the above algorithm be changed to fit into this asynchronous model?
This answer is valid only if I understood the situation correctly:
the 3rd party server used to expose a web method to update customers one by one
now they expect to get this information from a SQL Server table available to you for INSERT/UPDATE/DELETE
you can just stuff your customer-related requests into this table and they will be processed some time later
when the customer-related info gets updated, you have to perform some additional local actions (queue SMS, log activity)
Generally, I don't see any significant changes to the algorithm, but I will try to explain what I would do in this case.
Select all the matching accounts into a #TempTable
This may not be necessary, because you already have a table to put your requests in: the 3rd party table. The only problem would be synchronizing requests, but to analyze that you would have to provide more details (are multiple requests for the same customer allowed? is there protection against re-issuing the same request?).
for each customer record...
This should be the only change in your implementation. It now has the meaning: for each customer record that is asynchronously processed on the 3rd party side. Of course, your 3rd party must give you some clue that they really did process your customer request, or you have no idea what to work with. So, when they validate and process the data, they can provide e.g. nullable columns 'success_time' and 'error_time' to leave you a message about what has been done and when. If there is success, you continue with processing. If not, you can probably do something about that as well.
But how do you react when you get the async information back (e.g. success_time IS NOT NULL)? Well, there are multiple ways to do that.
Personally, I try to avoid triggers because they can make your life complicated (their visibility sucks, they can cause problems with replication, they can cause problems with transactions...). I use them only if I really need first-class, immediate responsiveness.
Another possibility is using async queues with custom activation, which means Service Broker. However, a lot of people avoid using SB technology: it's different from the rest of SQL Server, it has its specifics, debugging is not as easy as with plain old SQL statements, etc.
Another possibility would be batch processing the async responses on your side using an Agent job. Since you are already using a job, you should be fine with it. Basically, the table acts as a synchronization point: you fill in your requests (INSERT), the 3rd party processes them (SELECT). After the requests get processed, they mark them as such (UPDATE success_time or error_time), and at the end you process that response (SELECT) in your Agent job task. Your processing includes the SMS message and logging, and maybe even DELETING from the 3rd party table.
Another thing to mention is that you need synchronization methods here. First, don't do anything without transactions, or you may end up processing ghost responses and/or skipping valid waiting responses. Second, when you SELECT responses (rows that have been processed on the 3rd party side), you can get some improvement by using the READPAST hint (skip what is locked). However, if you need to update/delete from the 3rd party table after processing a response, you may use SELECT with UPDLOCK to block the other side from tampering with the data between your SELECT and the subsequent UPDATE. Or don't use any locking hints at all if you are not completely sure what goes on with the table in question.
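A minimal sketch of that polling step, building on the hypothetical success_time column described above and an equally hypothetical processed_time marker column that you own (all names are illustrative):

    -- Agent job step: claim a batch of completed requests we haven't handled yet
    BEGIN TRANSACTION;

    DECLARE @batch TABLE (RequestId INT PRIMARY KEY);

    -- lock and claim a batch, skipping rows another worker is already holding
    INSERT INTO @batch (RequestId)
    SELECT TOP (100) r.RequestId
    FROM dbo.ThirdPartyRequests AS r WITH (UPDLOCK, READPAST)
    WHERE r.success_time IS NOT NULL
      AND r.processed_time IS NULL;

    -- ... enqueue SMS rows and write the activity log for @batch here ...

    UPDATE r
    SET r.processed_time = GETDATE()
    FROM dbo.ThirdPartyRequests AS r
    JOIN @batch AS b ON b.RequestId = r.RequestId;

    COMMIT TRANSACTION;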
Hope it helps.

Managing high-volume writes to SQL Server database

I have a web service that is used to manage files on a filesystem that are also tracked in a Microsoft SQL Server database. We have a .NET system service that watches for files that are added using the FileSystemWatcher class. When a file-added callback comes from FileSystemWatcher, metadata about the file is added to our database, and it works fairly well.
I've now come to a bit of a scalability problem. I'm adding large quantities of files to the filesystem in rapid succession, and this ends up hammering the database with file adds which results in locking up my web front-end.
I have yet to work on database scalability issues, so I'm trying to come up with mitigation tactics. I was thinking of perhaps caching file adds and only writing them to the database every five minutes or so, but I'm not sure how practical that is. This is data that needs to find its way into our database at some point anyway, so it's going to get hammered eventually. Maybe I could limit the number of file DB entries written per second, but then I risk that rate being less than the rate at which files are added. How can I best tackle this?
Have you thought about using something like SQL Server Service Broker? That way you could push through tons of entries in a burst and it would level out the inserts into your database.
Basically you'd be pushing messages onto a queue which would then be consumed by a receiver stored procedure that would perform the insert for you. You could limit the maximum number of receivers executing to help with the responsiveness issues in your web interface.
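As a rough illustration of limiting the receivers (the queue and procedure names below are placeholders), the cap is set in the queue's activation settings:

    -- cap the number of concurrently activated readers to protect the front end
    ALTER QUEUE dbo.FileAddQueue
    WITH ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.usp_ProcessFileAdd,  -- the receiver that performs the insert
        MAX_QUEUE_READERS = 4,
        EXECUTE AS OWNER
    );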
There's a nice intro paper here. Although it's for 2005, not much has changed between 2005 and the newer versions of SQL Server.
You have a performance problem and you should approach it with a performance investigation methodology like Waits and Queues. Once you identify the actual problem, we can discuss solutions.
This is just a guess but, assuming the notification 'update metadata' code is a straightforward insert, the likely problem is that you're generating one transaction per notification. This results in commit flush waits; see Diagnosing Transaction Log Performance. Batch commit (aggregate multiple notifications before committing) is the canonical solution.
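As a sketch of what batch commit means here (the table and column names are made up), the point is simply one transaction per batch rather than one per file notification:

    -- hypothetical batch of notifications collected by the watcher service
    DECLARE @pendingNotifications TABLE (
        FilePath  NVARCHAR(400),
        FileSize  BIGINT,
        CreatedAt DATETIME
    );
    -- ... the watcher fills @pendingNotifications with the files seen in the last interval ...

    -- one commit for the whole batch instead of one per notification
    BEGIN TRANSACTION;
    INSERT INTO dbo.FileMetadata (FilePath, FileSize, CreatedAt)
    SELECT FilePath, FileSize, CreatedAt
    FROM @pendingNotifications;
    COMMIT TRANSACTION;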
The first option is using caching to handle the high-volume data, or using clusters for analyzing high-volume data. Please click here for more information.
