ADO.NET database access - database

I have written a program in VB.NET, and one of the things it does is insert records into a Microsoft Access database. The backend of my program that accesses the database is written as an interchangeable layer. If I "swap" this layer out for one that uses a Microsoft SQL Server database, my program flies. If I use MS Access, it's still pretty quick, but it is much slower. Does anyone have any hints or tips on how to speed up ADO.NET transactions using Microsoft Access? I would really rather use MS Access over SQL Server so that I can distribute my database with my program (rather than connecting to some remote SQL Server). Any suggestions? Also, when I created the MS Access database, I created it in Access 2000 compatible mode. Would it be faster to use 2003 compatible mode?
Thanks in advance

Although you need to install it, SQL Server Express supports "XCopy file deployment" where all you need to do to deploy the application is ship an .mdf file and your executables.
Details are here on MSDN.
This does support stored procedures: I've used it in our unit tests to dynamically create a mocked-out database on the fly.

Access is, as you're experiencing, less than optimal.
Have you taken a look at SQL Server Compact Edition? It can be embedded and distributed with your application, and it should perform much better than Access.

SQL Server Compact 3.5 will give you the same benefit - a single database file that you can deploy and distribute (as long as you include the runtime assemblies in your app).
It has reduced query capabilities compared to a full SQL Server instance, but it is definitely faster than the Access engine.
I have used it with a mobile app that has a desktop component and it did everything I needed it to do.

Did you also have the Access backend open in Access at the same time? If so try your program without having it open. If that speeds things up then you should open either a database connection or a recordset (against a table with few records) and leave it open while processing the data.
The problem is that if you open and close objects or recordsets against an Access database file while someone else is in that file, Jet wastes a lot of time doing locks against the LDB file. Keeping a permanent connection to the Access database file solves this problem.
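The advice above amounts to opening one connection at startup and holding it for the whole run, rather than opening and closing one per operation. A minimal Python sketch of that shape follows. It uses the standard-library sqlite3 module purely as a stand-in for the Access/Jet file (the original program uses ADO.NET), and the class and table names are hypothetical.

```python
import sqlite3


class KeepAliveDb:
    """Holds one connection open for the lifetime of the program.

    sqlite3 is only a stand-in for the Access/Jet file; the class and
    table names are hypothetical.
    """

    def __init__(self, path):
        # Opened once at startup and reused by every call below.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def insert(self, name):
        # Reuses the long-lived connection instead of open/close per call.
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO records (name) VALUES (?)", (name,))

    def count(self):
        return self._conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

    def close(self):
        # Called once, when the program shuts down.
        self._conn.close()


db = KeepAliveDb(":memory:")
for n in ("alice", "bob", "carol"):
    db.insert(n)
print(db.count())
db.close()
```

In ADO.NET the equivalent would be a connection object kept open by the data layer for the duration of the processing, which is what keeps the lock file from being created and torn down repeatedly.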

In my experience, ADO.NET is not well optimized for MS Access. Using the older ADO or DAO interfaces (which are available in VB.NET via COM) can improve performance by a factor of 20 or more in some cases. But it all depends a lot on what SQL statements your program really issues (lots of batch updates/inserts, lots of queries with large result sets, or lots of interactive load-transform-store cycles).

MSDN features an article on how to speed up ADO.NET: http://msdn.microsoft.com/en-us/library/ms998569.aspx
Even though the article is a bit dusty, it still makes a few good points :)
Other than that, using MS Access myself, I found that a few techniques such as caching of data, selecting without the source scheme or optimizing queries are suitable to keep the performance at a halfway decent level.

Related

Is it possible to have an Access back-end database available for multiple users on the same network?

I am developing a Visual Basic .NET application to be used by the staff of a small training centre nearby. The front-end (UI, menus, etc.) will all be in VB .NET, and there will be a back-end database for storing all of the required data, such as student records and meeting information.
What I would like to know is if it's possible to use a Microsoft Access database for this purpose, and have it accessible by all the staff in the centre (on the same network) at the same time. For example, would I be able to put the database in a shared network folder, and have a copy of the VB application on each PC that would all be able to read/edit/add to the database?
Advice would be appreciated as to how I should proceed. (Note: I would really prefer a method of doing this with MS Access as opposed to suggestions to switch to SQL, as Access was the requested platform)
Thanks in advance.
Yes, it can be done, and from a programming standpoint it is not (much) different than using SQL Server. I think the biggest considerations are:
How many simultaneous users do you expect to have using the application?
How secure does the application need to be? Is Access security enough?
How big do you expect the database to become in the next 1 to 5 years?
I think those are your biggest considerations when using Access as a data store, and if your answers fall within the specs of Access's capabilities, then go for it. You can always migrate to SQL Server at a later time if you run into the limits of Access.
You did not mention the version of Access that you are using but a quick Google/Bing search should return specs for every version available.
Yes, but probably not advisable. Despite the disclaimer in your post, you should try to convince the powers that be to look at SQL Server Express instead; it's free.
But, if Access is the database, all you need to do is have the database reside on a shared directory with full read-write capabilities for all the users. Hopefully when you say "staff of a small training centre", you mean it.
Install the VB.Net program on the client computers and setup the connection string with the path to the database.
Someone else with more recent Microsoft Access experience can probably give better hints on how to reduce the corruption factor. My own experience was to stay away from queries in Access-- have the Access database only for tables and do all of your queries with SQL statements in your client code. My corrupted databases reduced dramatically when I did that, but that was 10-15 years ago.
Back up the database religiously.
Yes, just make sure you change the extension of your back-end Access db to your_database_name.be_accdb and it will start logging once users start writing to it. But I recommend SQL Server.

Better understanding of MySQL transactions

I just realized that my application was needlessly making 50+ database calls per user request due to some hidden coding -- hidden in the sense that between LINQ, persistence frameworks and events it just so turned out that a huge number of calls were being made without me being aware.
Is there a recommended way to analyze individual transactions going to my SQL 2008 database, preferably with some integration to my Visual Studio 2010 environment? I want to be able to 'spy' on individual transactions being made, but only for certain pieces of my code, and without making serious changes to either the code or database.
In addition to SQL Server Profiler, there are a number of performance counters you can look at to see both a real-time evaluation and a historic trend:
Batch Requests/sec: Effectively measures the number of actual calls made to the SQL Server
Transactions/sec: Number of transactions in each database.
Connection resets/sec: number of new connections started from the connection pool by your site.
There are many more performance counters you can monitor, especially if you want to measure performance, but going through them is beyond the scope here. A good starting point is Monitoring Resource Usage.
You can use the SQL Profiler tool that comes with SQL Server Management Studio.
Microsoft SQL Server Profiler is a graphical user interface to SQL Trace for monitoring an instance of the Database Engine or Analysis Services. You can capture and save data about each event to a file or table to analyze later. For example, you can monitor a production environment to see which stored procedures are affecting performance by executing too slowly.
As mentioned, SQL Profiler is useful at the SQL Server level. However, it is not available in SSMS Express.
At the .NET level, LINQ to SQL and the Entity Framework both support logging. See Logging every data change with Entity Framework, http://msdn.microsoft.com/en-us/magazine/gg490349.aspx, http://peterkellner.net/2008/12/04/linq-debug-output-vs2008/.

Statistical calculations in SQL Server

Does anyone know of any packages or source code that does simple statistical analysis, e.g., confidence intervals or ANOVA, inside a SQL Server stored procedure?
The reason you probably don't want to do that is because these calculations are CPU-intensive. SQL Server is usually licensed by the CPU socket (roughly $5k/cpu for Standard, $20k/cpu for Enterprise) so DBAs are very sensitive to any applications that want to burn a lot of CPU power on the SQL Server itself. If you started doing statistics calculations and suddenly the server needs another CPU, that's an expensive licensing proposition.
Instead, it makes sense to do these statistical calculations on a separate application server. Query the data over the wire to your app server, do the number-crunching there, and then send the results back via an update statement or stored proc. Yes, it's more work, but as your application grows, you won't be facing an expensive licensing bill.
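The three steps described above (query the data out, crunch the numbers on the app server, write the result back) can be sketched as follows. This is a minimal Python illustration under stated assumptions: sqlite3 stands in for SQL Server, the table names are hypothetical, and the confidence interval uses a simple normal approximation (z = 1.96) rather than a t-distribution.

```python
import math
import sqlite3
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, low, high) using a normal approximation (z=1.96 ~ 95%)."""
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * sem, mean + z * sem


# sqlite3 is a stand-in for the real server; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (value REAL)")
conn.execute("CREATE TABLE results (mean REAL, ci_low REAL, ci_high REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?)",
    [(v,) for v in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)],
)

# 1. Pull the raw data over the wire to the application server.
values = [row[0] for row in conn.execute("SELECT value FROM samples")]
# 2. Do the number-crunching outside the database.
mean, low, high = mean_confidence_interval(values)
# 3. Send only the result back.
with conn:
    conn.execute("INSERT INTO results VALUES (?, ?, ?)", (mean, low, high))
```

Only the raw rows and the three result numbers cross the wire; the CPU time for the statistics is spent on the application server.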
In more recent versions of SQL Server you can use .net objects natively. So any .net package will do. Other than that there's always external proc calls...
Unless you have to do it within the stored proc I'd retrieve the data and do it outside SQL Server. That way you can choose from any of the open source or commercial stats routines and it would probably be faster too.
I don't know if a commercial package like this exists. There could be multiple reasons for this, some of which have been outlined above.
If what you are trying to accomplish is to avoid building statistical functions that process your data stored in SQL Server, you might want to try and integrate statistical packages with your database server by importing data from it. For example, R supports it and there is also CRAN
Once you have accomplished that and you still feel that you'd like to make statistical analysis run inside your SQL Server, the next steps would be to call your stats package from a stored procedure using a command line interface. Your best option here is probably xp_cmdshell, though it requires careful configuration in order not to compromise your SQL Server security.

Automatically measure all SQL queries

In Maybe Normalizing Isn't Normal Jeff Atwood says, "You're automatically measuring all the queries that flow through your software, right?" I'm not but I'd like to.
Some features of the application in question:
ASP.NET
a data access layer which depends on the MS Enterprise Library Data Access Application Block
MS SQL Server
In addition to Brad's mention of SQL Profiler, if you want to do this in code, then all your database calls need to be funnelled through a common library. You insert the timing code there, and voila, you know how long every query in your system takes.
A single point of entry to the database is a fairly standard feature of any ORM or database layer -- or at least it has been in any project I've worked on so far!
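The single-entry-point idea can be sketched in a few lines. This is an illustrative Python version, not the Enterprise Library API: `run_query` and `query_log` are hypothetical names, and sqlite3 stands in for SQL Server.

```python
import sqlite3
import time

query_log = []  # (sql, seconds) pairs; a real system would use a logger


def run_query(conn, sql, params=()):
    """Single entry point for all database calls; times each one."""
    start = time.perf_counter()
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Recorded even when the query raises, so failures are timed too.
        query_log.append((sql, time.perf_counter() - start))


conn = sqlite3.connect(":memory:")
run_query(conn, "CREATE TABLE t (x INTEGER)")
run_query(conn, "INSERT INTO t VALUES (?)", (1,))
rows = run_query(conn, "SELECT x FROM t")
for sql, seconds in query_log:
    print(f"{seconds * 1000:.3f} ms  {sql}")
```

As long as no code bypasses `run_query`, the log covers every statement the application issues.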
SQL Profiler is the tool I use to monitor traffic flowing to my SQL Server. It allows you to gather detailed data about your SQL Server. SQL Profiler has been distributed with SQL Server since at least SQL Server 2000 (but probably before that also).
Highly recommended.
Take a look at this chapter Jeff Atwood and I wrote about performance optimizations for websites. We cover a lot of stuff, but there's a lot of stuff about database tracing and optimization:
Speed Up Your Site: 8 ASP.NET Performance Tips
The Dropthings project on CodePlex has a class for timing blocks of code.
The class is named TimedLog. It implements IDisposable. You wrap the block of code you wish to time in a using statement.
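The closest Python analogue to an IDisposable wrapped in a using statement is a context manager. The sketch below shows the pattern only; it is not the Dropthings TimedLog implementation, and `timed_log` and `timings` are hypothetical names.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds


@contextmanager
def timed_log(label):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


with timed_log("load report"):
    total = sum(range(100_000))  # stands in for a database call
```

The timing is recorded when the block exits, which mirrors what Dispose does at the end of a using block.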
If you use rails it automatically logs all the SQL queries, and the time they took to execute, in your development log file.
I find this very useful because if you do see one that's taking a while, it's one step to just copy and paste it straight off the screen/logfile, and put 'explain' in front of it in mysql.
You don't have to go digging through your code and reconstruct what's happening.
Needless to say this doesn't happen in production as it'd run you out of disk space in about an hour.
If you define a factory that creates SqlCommands for you and always call it when you need a new command, you can return a RealProxy to an SqlCommand.
This proxy can then measure how long ExecuteReader / ExecuteScalar etc. take using a StopWatch and log it somewhere. The advantage to using this kind of method over Sql Server Profiler is that you can get full stack traces for each executed piece of SQL.
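A rough Python equivalent of the factory-plus-proxy approach is shown below. It is a sketch under assumptions: sqlite3 stands in for SqlCommand, a plain wrapper class with `__getattr__` stands in for RealProxy, and all names are hypothetical.

```python
import sqlite3
import time
import traceback


class TimingConnectionProxy:
    """Forwards everything to the real connection, timing execute()."""

    def __init__(self, real, log):
        self._real = real
        self._log = log

    def execute(self, sql, params=()):
        # Capture the caller's stack (dropping this frame) before running.
        stack = traceback.format_stack()[:-1]
        start = time.perf_counter()
        try:
            return self._real.execute(sql, params)
        finally:
            self._log.append(
                {"sql": sql, "seconds": time.perf_counter() - start, "stack": stack}
            )

    def __getattr__(self, name):
        # Anything not intercepted above goes straight to the real object.
        return getattr(self._real, name)


def create_connection(log, path=":memory:"):
    """Factory: callers always get the proxy, never the raw connection."""
    return TimingConnectionProxy(sqlite3.connect(path), log)


log = []
conn = create_connection(log)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (?)", (42,))
conn.commit()  # forwarded via __getattr__
value = conn.execute("SELECT x FROM t").fetchone()[0]
```

Each log entry carries the call stack of the code that issued the statement, which is the information a server-side profiler cannot provide.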

When is it time to change database backends?

Is there a general rule of thumb to follow when storing web application data to know what database backend should be used? Is the number of hits per day, number of rows of data, or other metrics that I should consider when choosing?
My initial idea is that the order for this would look something like the following (but not necessarily, which is why I'm asking the question).
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
It's not quite that easy. The only general rule of thumb is that you should look for another solution when the current one can't keep up anymore. That could include using different software (not necessarily in any globally fixed order), hardware or architecture.
You will probably get a lot more benefit out of caching data using something like memcached than switching to another random storage backend.
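The caching suggestion usually takes the form of a cache-aside read: check the cache first and only hit the database on a miss. A minimal Python sketch follows, with a plain dict standing in for memcached and sqlite3 standing in for the storage backend; the function and table names are hypothetical.

```python
import sqlite3

cache = {}  # stand-in for memcached; a real client would also set an expiry
stats = {"db_hits": 0}


def get_user_name(conn, user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    stats["db_hits"] += 1
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    name = row[0] if row else None
    cache[key] = name  # populate the cache for the next caller
    return name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
first = get_user_name(conn, 1)   # goes to the database
second = get_user_name(conn, 1)  # served from the cache
```

Writes need to update or invalidate the cached entry as well, which this sketch leaves out.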
If you think you are going to ever need one of the heavyweights (SqlServer, Oracle), you should start with one of those at the beginning. Data migrations are extremely difficult. In the long run it will cost you less to just start at the top and stay there.
I think you're being overly specific in your rankings. You can pretty much start with flat files and the like for very small data sets, go up to something like DBM for slightly bigger ones that don't require SQL-like syntax, and go to some kind of SQL database after that.
But who wants to do all that rewriting? If the application will benefit from access to joins, stored procedures, triggers, foreign key validation, and the like--just use a SQL database regardless of the dataset size.
Which one should depend more on the client's existing installations and what DBA skills are available than on the amount of data you're holding.
In other words, the size of your database is far from the only consideration, and maybe not the most important one.
There is no blanket answer to this, but ALMOST always, using flat files is not a good idea. You have to parse through them (I suppose) and they do not scale well. Starting with a proper database, like Oracle or SQL Server (or MySQL or Postgres if you are looking for free options), is a good idea. For very little overhead, you will save yourself a lot of effort and headache later on. They also allow you to structure your data in a non-stupid fashion, leaving you free to think of WHAT you will do with the data rather than HOW you will be getting it in/out.
It really depends on your data, and how you intend to use it. At one of my previous positions, we used Postgres due to the native geo-location and timezone extensions which existed because it allowed us to manage our data using polygonal datatypes. For us, we needed to do that, and we also wanted to use stored procedures, views and the like.
Now, another place I worked at used MySQL simply because the data was normalized, standard row by row data.
SQL Server, for a long time, had a 4gb database limit (see SQL Server 2000), but despite that limitation it remains a very stable platform for small to medium applications for which the old data is purged.
Now, from working with Oracle and SQL Server 05/08, all I can tell you is that if you want the cream of the crop for stability, scalability and flexibility, then these two are your best bet. For enterprise applications, I strongly recommend them (merely because that's what we use where I work now).
Other things to consider:
Language integration (ASP.NET session storage, role management, etc.)
Query types (Select, Update, Delete) (although this is more of a schema design issue than a DBMS issue)
Data storage requirements
Your application's utilization of the database is the most critical factor. Mainly, which queries are used most often (SELECT, INSERT or UPDATE)?
For example, SQLite is geared toward smaller applications, but for a "web" application you might want a bigger one like MySQL or SQL Server.
The way you write scripts and your web application platforms also matters. If you're developing on a Microsoft platform, then SQL Server is a better alternative.
Typically, I go with what is commonly accepted by whichever framework I am using. So, if I'm doing .NET => SQL Server, Python (via Django or Pylons) => MySQL or SQLite.
I almost never use flat files though.
There is more to choosing an RDBMS solution than just "back end horsepower". The ability to have commitment control, for example, so you can roll back a failed transaction, is one reason.
Unless you are in the megatransaction rate application, most database engines would be adequate - so it becomes a question of how much you want to pay for the software, whether it runs on the hardware and operating system environment you want, and what expertise you have in managing that software.
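To make the commitment-control point concrete, here is a small Python sketch of rolling back a failed transaction. It uses sqlite3 from the standard library as an example engine; the accounts table and `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically; returns False if the transfer was rolled back."""
    try:
        with conn:  # commit on success, roll back if anything raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 30)
failed = transfer(conn, "a", "b", 500)  # would overdraw "a", so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to "b" has already been applied when the debit violates the CHECK constraint; the rollback undoes it, which is exactly what a flat file cannot do for you.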
That progression sounds painful. If you're going to include MS products (especially the for-pay SQL Server) in there anywhere, you may as well use the whole stack, since you only have to pay for the last of these:
SQL Server Compact -> SQL Server Express -> SQL Server Enterprise (clustered).
If you target your app at SQL Server Compact initially, all your SQL code is guaranteed to scale up to the next version without modification. If you get bigger than SQL Server Enterprise, then congratulations. That's what they call a good problem to have.
Also: go back and check the SO podcasts. I believe they talked about this briefly.
This question depends on your situation really.
If you have control over the server you're deploying to and you can install whatever services you need, then the time to install a MySql or MSSQL Express server and code against an existing database framework VERSUS coding against flat file structure is not worth the effort of considering.
What about FireBird? Where would that fit into that list?
And let's not forget the requirements that the "customer" of your solution must also have in place. If you're writing a commercial application for small companies, then Oracle might not be a good choice. But if you're writing a customized solution for a large enterprise that must share data among multiple campuses and has a good-sized IT department, then the decision of Oracle vs SQL Server would come down to what the customer most likely already has deployed.
Data migration nowadays isn't that bad since we have those great tools from Embarcadero, so I would instead let the customer's needs drive the decision.
If you have the option, SQL Server is a good choice from the word go, predominantly because you have access to solid procedures and functions, and the database backup facilities are totally reliable. Wrapping up as much of your logic as you can inside the database itself (rather than in whatever language you are using) helps security and performance. Indeed, there's a good argument to be made for always using procedures for insert/update logic, as these make you invulnerable to injection attacks.
If I have the choice the only time I'd consider MySQL in preference is with a large, fairly simple, database predominantly used for read access. This isn't to decry MySQL which has improved markedly of late and I happily use if I don't have the choice, but for more complex systems with update/insert activity MSSQL is generally the superior option.
I think your list is subjective but I will play your game.
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
Teradata

Resources