Statistical calculations in SQL Server

Does anyone know of any packages or source code that does simple statistical analysis, e.g., confidence intervals or ANOVA, inside a SQL Server stored procedure?

The reason you probably don't want to do that is because these calculations are CPU-intensive. SQL Server is usually licensed by the CPU socket (roughly $5k/cpu for Standard, $20k/cpu for Enterprise) so DBAs are very sensitive to any applications that want to burn a lot of CPU power on the SQL Server itself. If you started doing statistics calculations and suddenly the server needs another CPU, that's an expensive licensing proposition.
Instead, it makes sense to do these statistical calculations on a separate application server. Query the data over the wire to your app server, do the number-crunching there, and then send the results back via an update statement or stored proc. Yes, it's more work, but as your application grows, you won't be facing an expensive licensing bill.
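For illustration, here's a minimal sketch of that round trip in C#. The Measurements and SampleStats tables, column names, and the plain 1.96 normal-approximation confidence interval are all placeholders, not anything from the question:

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    static class BatchStats
    {
        // Pull the raw values over the wire, crunch them on the app server,
        // then write the summary back with an UPDATE.
        public static void SummarizeBatch(string connStr, int batchId)
        {
            var values = new List<double>();
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "SELECT Value FROM Measurements WHERE BatchId = @batchId", conn)) // hypothetical table
            {
                cmd.Parameters.AddWithValue("@batchId", batchId);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        values.Add(reader.GetDouble(0));
            }

            // Number-crunching happens here, not on the SQL Server's CPUs (Welford's method).
            int n = values.Count;
            double mean = 0, m2 = 0;
            for (int i = 0; i < n; i++)
            {
                double delta = values[i] - mean;
                mean += delta / (i + 1);
                m2 += delta * (values[i] - mean);
            }
            double stdDev = n > 1 ? Math.Sqrt(m2 / (n - 1)) : 0;
            double halfWidth = n > 1 ? 1.96 * stdDev / Math.Sqrt(n) : 0; // 95% CI, normal approximation

            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "UPDATE SampleStats SET Mean = @mean, CiHalfWidth = @hw WHERE BatchId = @batchId", conn)) // hypothetical table
            {
                cmd.Parameters.AddWithValue("@mean", mean);
                cmd.Parameters.AddWithValue("@hw", halfWidth);
                cmd.Parameters.AddWithValue("@batchId", batchId);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }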

In more recent versions of SQL Server (2005 and later) you can host .NET objects natively through the CLR integration, so any .NET package will do. Other than that, there are always external proc calls...
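A CLR scalar function is just ordinary C# compiled into an assembly that you register with CREATE ASSEMBLY / CREATE FUNCTION and then call from T-SQL like any built-in function. The sketch below is hypothetical (the function name and the z = 1.96 normal approximation are mine, not anything that ships with SQL Server):

    using System;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public class StatsFunctions
    {
        // Half-width of a 95% confidence interval for a mean, given the sample
        // standard deviation and sample size (normal approximation, z = 1.96).
        [SqlFunction(IsDeterministic = true, IsPrecise = false)]
        public static SqlDouble ConfidenceInterval95(SqlDouble stdDev, SqlInt32 sampleSize)
        {
            if (stdDev.IsNull || sampleSize.IsNull || sampleSize.Value < 2)
                return SqlDouble.Null;

            return new SqlDouble(1.96 * stdDev.Value / Math.Sqrt(sampleSize.Value));
        }
    }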

Unless you have to do it within the stored proc I'd retrieve the data and do it outside SQL Server. That way you can choose from any of the open source or commercial stats routines and it would probably be faster too.

I don't know if a commercial package like this exists. There could be multiple reasons for this, some of which have been outlined above.
If what you are trying to accomplish is to avoid building statistical functions that process your data stored in SQL Server, you might want to integrate a statistical package with your database server by importing data from it. R, for example, can read data from SQL Server through database-connectivity packages available on CRAN.
Once you have accomplished that and you still feel that you'd like to make statistical analysis run inside your SQL Server, the next steps would be to call your stats package from a stored procedure using a command line interface. Your best option here is probably xp_cmdshell, though it requires careful configuration in order not to compromise your SQL Server security.

Related

Better understanding of MySQL transactions

I just realized that my application was needlessly making 50+ database calls per user request due to some hidden coding -- hidden in the sense that between LINQ, persistence frameworks and events it just so turned out that a huge number of calls were being made without me being aware.
Is there a recommended way to analyze individual transactions going to my SQL 2008 database, preferably with some integration to my Visual Studio 2010 environment? I want to be able to 'spy' on individual transactions being made, but only for certain pieces of my code, and without making serious changes to either the code or database.
In addition to SQL Server Profiler, there are a number of performance counters you can look at to see both a real-time evaluation and a historic trend:
Batch Requests/sec: Effectively measures the number of actual calls made to the SQL Server
Transactions/sec: Number of transactions in each database.
Connection resets/sec: Number of new connections started from the connection pool by your site.
There are many more performance counters you can monitor, especially if you want to measure performance, but going through them all is beyond the scope here. A good starting point is Monitoring Resource Usage.
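If you'd rather sample those counters from code than from Performance Monitor, System.Diagnostics.PerformanceCounter will read them. The category names below assume a default SQL Server instance running on the same machine (a named instance exposes them as "MSSQL$<InstanceName>:SQL Statistics" etc.), and "MyAppDb" is a placeholder database name:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class CounterSample
    {
        static void Main()
        {
            var batches = new PerformanceCounter("SQLServer:SQL Statistics", "Batch Requests/sec");
            var trans   = new PerformanceCounter("SQLServer:Databases", "Transactions/sec", "MyAppDb");

            batches.NextValue();   // rate counters need two samples;
            trans.NextValue();     // the first NextValue() always returns 0
            Thread.Sleep(1000);

            Console.WriteLine("Batch requests/sec: {0:F1}", batches.NextValue());
            Console.WriteLine("Transactions/sec:   {0:F1}", trans.NextValue());
        }
    }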
You can use the SQL Profiler tool that comes with SQL Server Management Studio.
Microsoft SQL Server Profiler is a graphical user interface to SQL Trace for monitoring an instance of the Database Engine or Analysis Services. You can capture and save data about each event to a file or table to analyze later. For example, you can monitor a production environment to see which stored procedures are affecting performance by executing too slowly.
As mentioned, SQL Profiler is useful at the SQL Server level. It is not available in SQL Server Management Studio Express, however.
At the .NET level, LINQ to SQL and the Entity Framework both support logging. See Logging every data change with Entity Framework, http://msdn.microsoft.com/en-us/magazine/gg490349.aspx, http://peterkellner.net/2008/12/04/linq-debug-output-vs2008/.
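For LINQ to SQL specifically, the built-in hook is the DataContext.Log property. A minimal fragment, where NorthwindDataContext and the Orders query stand in for whatever your own model defines:

    using (var db = new NorthwindDataContext()) // hypothetical LINQ to SQL DataContext
    {
        db.Log = Console.Out; // every generated SQL statement is written here; swap in any TextWriter
        var orders = (from o in db.Orders
                      where o.CustomerID == "ALFKI"
                      select o).ToList();
    }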

SQL Server Stress Test Tools?

I am looking for a stress tool for SQL Server. I've seen a lot of suggestions on Google. But nothing of what I really need.
I am really looking for a tool that can run a list of stored procedures in parallel to see how much contention there is on resources. The collection and reporting features are not that important, but I also want something server-side based for our enterprise build server.
I am not looking for a replay feature (yes, it could do the trick, but it would be difficult to program a lot of different scenarios).
I've looked at the following tools:
RML Utilities from Microsoft
DTM DB Stress (this is the closest to what I'm looking for)
SQL Stress
I created a simple test tool for this scenario; check it out to see if it will be of any use to you. It's free, no licensing of any sort required. No guarantees on performance or quality either ;-)
Usage: StressDb.exe <No. of instances> <Tot. Runtime (mins)> <Interval (secs)>
Connection string should reside in the configuration file.
All command line arguments are required. Use integers.
The stored proc to use is also in the config file.
You need to have the .NET Framework 3.5 installed. You can also run it from multiple workstations for additional load, or from multiple folders on the same machine if you want to run additional stored procedures. You also need a SQL Server login, as it currently doesn't use a trusted connection.
The code was actually super simple; the only clever bit was making sure that the connections are not pooled (sketched below).
http://......com/..../stressdb.zip
Let me know if you find it useful.
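This is not the StressDb source, just a sketch of the same non-pooled, parallel-hammering idea; the connection string, proc name and counts are placeholders:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Threading;

    static class StressRunner
    {
        public static void RunStress(string baseConnectionString, string procName, int workers, TimeSpan runTime)
        {
            // "Pooling=false" forces a genuinely new connection per call, which is
            // what produces realistic connection and contention load on the server.
            string connStr = baseConnectionString + ";Pooling=false";
            DateTime stopAt = DateTime.UtcNow + runTime;

            var threads = new List<Thread>();
            for (int i = 0; i < workers; i++)
            {
                var t = new Thread(() =>
                {
                    while (DateTime.UtcNow < stopAt)
                    {
                        using (var conn = new SqlConnection(connStr))
                        using (var cmd = new SqlCommand(procName, conn))
                        {
                            cmd.CommandType = CommandType.StoredProcedure;
                            conn.Open();
                            cmd.ExecuteNonQuery();
                        }
                    }
                });
                threads.Add(t);
                t.Start();
            }
            threads.ForEach(t => t.Join());
        }
    }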

ADO.NET database access

I have written a program in VB.NET, and one of the things this program does is insert records into a Microsoft Access database. The backend of my program that accesses the database is written as an interchangeable layer. If I swap this layer out for one that uses a Microsoft SQL Server database, my program flies. If I use MS Access it's still pretty quick, but it is much slower. Does anyone have any hints or tips on how to speed up ADO.NET transactions using Microsoft Access? I would really rather use MS Access over SQL Server so that I can distribute my database with my program (rather than connecting to some remote SQL Server). Any suggestions? Also, when I created the MS Access database, I created it in Access 2000 compatible mode. Would it be faster to use 2003 compatible mode?
Thanks in advance
Although you need to install it, SQL Server Express supports "XCopy file deployment" where all you need to do to deploy the application is ship an .mdf file and your executables.
Details are here on MSDN.
This does support stored procedures: I've used it in our unit tests to dynamically create a mocked-out database on the fly.
Access is, as you're experiencing, less than optimal.
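The relevant bit is really just the connection string; something along these lines (the instance name and .mdf file name are the usual defaults, not requirements):

    using System.Data.SqlClient;

    // User-instance deployment: the .mdf travels with the app and is attached on first open.
    const string connStr =
        @"Data Source=.\SQLEXPRESS;AttachDbFilename=|DataDirectory|\MyApp.mdf;Integrated Security=True;User Instance=True";

    using (var conn = new SqlConnection(connStr))
    {
        conn.Open();
        // ... run commands exactly as you would against a full SQL Server instance
    }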
Have you taken a look at SQL Server Compact Edition? It can be embedded and distributed with your application... and should perform much better than Access.
SQL Server Compact 3.5 will give you the same benefit - a single database file that you can deploy and distribute (as long as you include the runtime assemblies in your app).
It has reduced query capabilities compared to a full SQL Server instance, but it is definitely faster than the Access engine.
I have used it with a mobile app that has a desktop component and it did everything I needed it to do.
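Switching the data layer to Compact is mostly a matter of swapping the ADO.NET provider. A minimal sketch using the System.Data.SqlServerCe classes; the file name and table are hypothetical:

    using System.Data.SqlServerCe; // ships with the SQL Server Compact 3.5 runtime assemblies

    const string connStr = "Data Source=MyApp.sdf"; // a single file deployed next to the .exe

    // Create the database file on first run.
    if (!System.IO.File.Exists("MyApp.sdf"))
        using (var engine = new SqlCeEngine(connStr))
            engine.CreateDatabase();

    using (var conn = new SqlCeConnection(connStr))
    using (var cmd = new SqlCeCommand(
        "INSERT INTO Customers (Name) VALUES (@name)", conn)) // hypothetical table
    {
        cmd.Parameters.AddWithValue("@name", "Contoso");
        conn.Open();
        cmd.ExecuteNonQuery();
    }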
Did you also have the Access backend open in Access at the same time? If so, try your program without having it open. If that speeds things up, then you should open either a database connection or a recordset (against a table with few records) and leave it open while processing the data.
The problem is that if you open and close objects or recordsets against an Access database file while someone else is in the file, Jet wastes a lot of time taking locks against the LDB file. Keeping a permanent connection to the Access database file solves this problem.
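In ADO.NET terms, "keep a permanent connection" just means holding one open OleDbConnection for the duration of the batch. A sketch, with a hypothetical file path and table:

    using System.Data.OleDb;

    // Hypothetical path; the Jet 4.0 provider matches an Access 2000/2003 format .mdb.
    const string connStr =
        @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\MyApp.mdb";

    // Open one connection up front and hold it for the whole batch, so Jet
    // creates and locks the .ldb file once instead of on every open/close.
    using (var keepAlive = new OleDbConnection(connStr))
    {
        keepAlive.Open();

        for (int i = 0; i < 1000; i++)
        {
            using (var conn = new OleDbConnection(connStr))
            using (var cmd = new OleDbCommand(
                "INSERT INTO Log ([Message]) VALUES (?)", conn)) // hypothetical table/column
            {
                cmd.Parameters.AddWithValue("?", "row " + i);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }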
In my experience, ADO.NET is not particularly optimized for MS Access. Using the older ADO or DAO interfaces (which are available in VB.NET via COM) can bring performance improvements of a factor of 20 or more in some cases. But it all depends a lot on what SQL statements your program actually issues (lots of batch updates/inserts, lots of queries with large result sets, or lots of interactive load-transform-store cycles).
The MSDN features an Article on how to speed up ADO.NET: http://msdn.microsoft.com/en-us/library/ms998569.aspx
Even though the article is a bit dusty, it still makes a few good points :)
Other than that, using MS Access myself, I have found that a few techniques, such as caching data, selecting without the source schema, or optimizing queries, help keep performance at a halfway decent level.

What is the current trend for SQL Server Integration Services?

Could anybody tell me what the current trend for SQL Server Integration Services is? Is it better than other ETL tools available on the market, like Informatica, Cognos, etc.?
I was introduced to SSIS a couple of weeks ago. Executive summary: I am unlikely to consider it for future projects.
I'm pretty sure flow charts (i.e. non-structured designs) were discredited as an effective programming paradigm a long time ago, except in a tiny minority of cases.
There's no point replacing a clean textual (source code) interface with a colourful connect-the-dots one if the user still needs to think like a programmer to know where to drag the arrows.
A program design that you can't access (e.g. fulltext search, navigate using alternative methods, effectively version control, ...) except by one prescribed method is a massive productivity killer. And a wonderful source of RSI.
It's possible there is a particular niche where it's just right, but I imagine most ETL tasks would outgrow it pretty quickly.
SSIS isn't great for production applications from my experience for the following reasons:
To call an SSIS package remotely, you have to call a stored procedure, which calls a job, which in turn runs the SSIS package (sketched below).
Using the above method, you can't pass in parameters.
Passing parameters means you have to run the SSIS package on the local server - so code running on a remote server has to call code running on the SQL Server box to execute the package.
I would always rather write specific code to handle ETL and use SSIS for one off transforms.
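For what it's worth, the stored-procedure-to-job chain mentioned above ends up looking roughly like this from calling code. The job name is hypothetical, and note that sp_start_job only queues the job; it does not wait for the package to finish:

    using System.Data;
    using System.Data.SqlClient;

    // connStr must point at the server whose SQL Agent owns the job.
    string connStr = "Data Source=MyServer;Initial Catalog=msdb;Integrated Security=True"; // placeholder

    using (var conn = new SqlConnection(connStr))
    using (var cmd = new SqlCommand("msdb.dbo.sp_start_job", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@job_name", "Run Nightly Load Package"); // hypothetical SQL Agent job wrapping the SSIS package
        conn.Open();
        cmd.ExecuteNonQuery(); // fire-and-forget: the job is queued, not awaited
    }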
In my opinion it's quite a good platform, and I see good progress being made on it. Many of the drawbacks that the 2005 version had, and that the community complained about, have been corrected in 2008.
From my point of view, the best thing is that you can extend and complement it with SQL or .NET code in an organized way as much as you want.
For instance, you can decide whether your solution should be 80% C# code and 20% ETL components, or 5% C# code and 95% ETL components.
Disclaimer: I work for Microsoft.
Now the answer:
SSIS (SQL Server Integration Services) is a great tool for ETL operations, and there is a lot of uptake in the marketplace. There is no additional cost other than licensing SQL Server, and you can also use .NET languages to write tasks.
http://www.microsoft.com/sqlserver/2008/en/us/integration.aspx
http://msdn.microsoft.com/en-us/library/ms141026.aspx
I would list as benefits:
you use SSIS for bigger projects, probably/preferably built once or in one go, and then run the integration project for many months with only minor changes; the tasks, packages and everything in general are easily readable (of course, that depends on perspective)
the tool itself handles the scheduled runs and sends you mails with the logs, and - as far as my experience goes - it communicates very well with all the other tools (such as SSAS, SQL Server Management Studio, Microsoft Office Excel, Access etc., and other, non-Microsoft tools)
tasks that are configured manually and in detail take on the responsibility in all ways, leaving only a small chance for errors
as also mentioned above, many of the earlier problems have been corrected in the newer versions
I would recommend it for ETL, especially if you would continue with analytical processes, since the SSIS, SSAS and SSRS tools blend together quite smoothly.
Drawback: debugging/looking for errors is a bit harder until you get used to it.

Automatically measure all SQL queries

In Maybe Normalizing Isn't Normal Jeff Atwood says, "You're automatically measuring all the queries that flow through your software, right?" I'm not but I'd like to.
Some features of the application in question:
ASP.NET
a data access layer which depends on the MS Enterprise Library Data Access Application Block
MS SQL Server
In addition to Brad's mention of SQL Profiler, if you want to do this in code, then all your database calls need to be funnelled through a common library. You insert the timing code there, and voila, you know how long every query in your system takes.
A single point of entry to the database is a fairly standard feature of any ORM or database layer -- or at least it has been in any project I've worked on so far!
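A minimal sketch of that single funnel, assuming your DAL already routes reads through one helper like this (the Enterprise Library Database class would sit where the raw DbCommand does here, and the command's connection is assumed to be open):

    using System;
    using System.Data;
    using System.Data.Common;
    using System.Diagnostics;

    static class TimedDb
    {
        public static IDataReader ExecuteTimedReader(DbCommand command)
        {
            var watch = Stopwatch.StartNew();
            try
            {
                return command.ExecuteReader();
            }
            finally
            {
                watch.Stop();
                Trace.WriteLine(string.Format("{0} ms: {1}",
                    watch.ElapsedMilliseconds, command.CommandText));
            }
        }
    }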
SQL Profiler is the tool I use to monitor traffic flowing to my SQL Server. It allows you to gather detailed data about your SQL Server. SQL Profiler has been distributed with SQL Server since at least SQL Server 2000 (but probably before that also).
Highly recommended.
Take a look at this chapter Jeff Atwood and I wrote about performance optimizations for websites. We cover a lot of ground, including quite a bit on database tracing and optimization:
Speed Up Your Site: 8 ASP.NET Performance Tips
The Dropthings project on CodePlex has a class for timing blocks of code.
The class is named TimedLog. It implements IDisposable. You wrap the block of code you wish to time in a using statement.
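The pattern is easy to reproduce if you don't want the dependency. This is not the Dropthings class itself, just a sketch of the same IDisposable idea:

    using System;
    using System.Diagnostics;

    public sealed class TimedLog : IDisposable
    {
        private readonly string _label;
        private readonly Stopwatch _watch = Stopwatch.StartNew();

        public TimedLog(string label) { _label = label; }

        public void Dispose()
        {
            _watch.Stop();
            Debug.WriteLine(string.Format("{0}: {1} ms", _label, _watch.ElapsedMilliseconds));
        }
    }

    // Usage: the block is timed from construction to the end of the using scope.
    // using (new TimedLog("GetCustomerOrders"))
    // {
    //     ... run the query ...
    // }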
If you use rails it automatically logs all the SQL queries, and the time they took to execute, in your development log file.
I find this very useful because if you do see one that's taking a while, it's one step to just copy and paste it straight off the screen/logfile and put 'explain' in front of it in MySQL.
You don't have to go digging through your code and reconstruct what's happening.
Needless to say this doesn't happen in production as it'd run you out of disk space in about an hour.
If you define a factory that creates SqlCommands for you and always call it when you need a new command, you can return a RealProxy to an SqlCommand.
This proxy can then measure how long ExecuteReader / ExecuteScalar etc. take using a StopWatch and log it somewhere. The advantage to using this kind of method over Sql Server Profiler is that you can get full stack traces for each executed piece of SQL.
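A sketch of that factory + RealProxy idea (full .NET Framework only; System.Runtime.Remoting is not available on .NET Core). The proxy is exposed as IDbCommand, so callers work against the interface rather than the concrete SqlCommand; logging via Trace and the stack-trace capture are illustrative choices, not a prescribed design:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Diagnostics;
    using System.Reflection;
    using System.Runtime.Remoting.Messaging;
    using System.Runtime.Remoting.Proxies;

    public sealed class TimedCommandProxy : RealProxy
    {
        private readonly IDbCommand _inner;

        private TimedCommandProxy(IDbCommand inner) : base(typeof(IDbCommand))
        {
            _inner = inner;
        }

        // The "factory" entry point: always create commands through this.
        public static IDbCommand Create(string sql, SqlConnection connection)
        {
            var real = new SqlCommand(sql, connection);
            return (IDbCommand)new TimedCommandProxy(real).GetTransparentProxy();
        }

        public override IMessage Invoke(IMessage msg)
        {
            var call = (IMethodCallMessage)msg;
            var watch = Stopwatch.StartNew();
            try
            {
                object result = call.MethodBase.Invoke(_inner, call.Args);
                watch.Stop();

                // Only log the Execute* calls, with the SQL text and the call stack.
                if (call.MethodName.StartsWith("Execute"))
                    Trace.WriteLine(string.Format("{0} took {1} ms ({2}){3}{4}",
                        call.MethodName, watch.ElapsedMilliseconds, _inner.CommandText,
                        Environment.NewLine, Environment.StackTrace));

                return new ReturnMessage(result, null, 0, call.LogicalCallContext, call);
            }
            catch (TargetInvocationException ex)
            {
                return new ReturnMessage(ex.InnerException, call);
            }
        }
    }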
