I have a Microsoft Access .accdb database on a company server. If someone opens the database over the network, and runs a query, where does the query run? Does it:
run on the server (as it should, and as I thought it did), and only the results are passed over to the client through the slow network connection
or run on the client, which means the full 1.5 GB database is loaded over the network to the client's machine, where the query runs, and produces the result
If it is the latter (which would be truly horrible and baffling), is there a way around this? The weak link is always the network, can I have queries run at the server somehow?
(Reason for asking is the database is unbelievably slow when used over network.)
The query is processed on the client, but that does not mean that the entire 1.5 GB database needs to be pulled over the network before a particular query can be processed. Even a given table will not necessarily be retrieved in its entirety if the query can use indexes to determine the relevant rows in that table.
For more information, see the answers to the related questions:
ODBC access over network to *.mdb
C# program querying an Access database in a network folder takes longer than querying a local copy
It is the latter, the 1.5 GB database is loaded over the network
The "server" in your case is a server only in the sense that it serves the file, it is not a database engine.
You're in a bad spot:
The good thing about access is that it's easy to create forms and reports and things by people who are not developers. The bad is everything else about it. Particularly 2 things:
People wind up using it for small projects that grow and grow and grow, and wind up in your shoes.
It sucks for multiple users, and it really sucks over a network when it gets big
I always convert them to a web-based app with SQL server or something, but I'm a developer. That costs money to do, but that's what happens when you use a tool that does not scale.
I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, w/400 jobs this qtr, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the remote tables. Works as expected.
Performance here in US is decent - 10-15 seconds for ~40K records. Our India teams are seeing >5-10 minutes for the same datasets. Their internet connection is decent, not great and a variable I cannot control.
I am wondering if MS Access is adding some overhead here than can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting vs Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 7, 10, 13 users.
Thanks!
It's not entirely clear but if the MSAccess database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN, there is too much latency.
When Access queries data, is doesn't open a stream, it fetches blocks of it, wait for the data wot be downloaded, then request another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India has probably around 200ms latency and you can't do much about it as it adds up very quickly if the communication protocol is chatty, all this on top of the connection's bandwidth that is very likely way below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest is maybe to automatically perform this dump every day and making it available on an FTP server or an internal website ready for download.
You can also make it available on demand, maybe trough an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server or simply though a website, or a shell.
You could also have a simple windows service on your SQL Server that listens to requests for a remote client installed on the local machines everywhere, that would process the dump and sent it to the client which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid using Access in an automated way. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
Renaud and all, thanks for taking time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks (vs a continguous DL) of data is exactly what I was hoping to avoid via an alternate approach.
Or workflow is evolving to better leverage both sides of the clock where User1 in US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
The primary issue is the initial DL of the local DB tables from the server (huge multi-year DB) for the current "job" - should happen just once at the start of the effort (~1 wk long process) This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP - DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamped-based push-pull of just the changes daily would have been an overall plus. Seems to be, but the initial DL hurdle remains.
We use a central SQL Server (2008 Standard edition) and several smaller, dedicated SQL Servers (Express editions). We need to implement some mechanism for transferring data asynchronously* from the dedicated decentralized SQL Server (bigger volume, see below) and back from the central SQL Server (few records, basically some notifications for the machines and possibly some optimization hints).
The dedicated SQL Servers are physically located near technology machines, and they are collecting say datetime, temperature rows in regular intervals (think about few seconds interval). There are about 500 records for one job, but the next job follows immediately (the machine does not know it is a new job--being quite stupid in the sense -- and simply collects the temperatures on and on).
The technology machines must be able to work without the central SQL Server, and the central SQL Server must work also when the machine is not accessible (i.e. its dedicated SQL engine cannot be reached, switched off with the machine). In other words, the solution need not to be super fast, but must be robust in the sense that no collected data is lost.
The basic idea is to move the collected data from the dedicated SQL Server (preprocessed to the normalized format with ID of the machine) to the well known table on the central SQL Server. Only the newer data should be sent to minimize the amount of the data. That transfer should be started by the dedicated SQL Server in regular intervals (say one hour) if the connection is OK. If the connection is not OK, the data will be sent after next hour, etc.
Another well known table on the central SQL Server will be used to send notifications for the dedicated SQL Server engines. This way the dedicated engine can be told (for example) what data was already processed/archived on the central SQL Server (i.e. the hint for what records may already be deleted from the local database on the dedicated machine), or whatever information that is hinted from the central (just hints or other not the real-time requirements). The hints will be collected by the dedicated SQL Server (i.e. also the machine responsibility). In other words, the central SQL Server only processes the well known, local tables. It does not try to connect the dedicated SQL Server machines.
The solution should use only the standard mechanisms -- SQL commands (via stored procedures), no external software. What kind of solution should I focus on?
Thanks,
Petr
[Edited later] The SQL servers are at the same Local Area Network.
If you are willing to make a mental switch and stop thinking in terms of tables and rows and instead think in terms of data and messages then Service Broker can do handle all the communication, delivery and message processing. Instead of locally (on the Express machines) doing INSERT INTO LocalTable(datetime, temperature) VALUES (...) you think in terms of:
BEGIN CONVERSATION WITH CentralServer ...;
SEND ON conversation MESSAGE TYPE [Measurement] (<datetime...><temperature ...>)
See Using Service Broker instead of Replication or High Volume Contiguous Real Time ETL
Sounds like a job for merge replication.
Listening to Scott Hanselman's interview with the Stack Overflow team (part 1 and 2), he was adamant that the SQL server and application server should be on separate machines. Is this just to make sure that if one server is compromised, both systems aren't accessible? Do the security concerns outweigh the complexity of two servers (extra cost, dedicated network connection between the two, more maintenance, etc.), especially for a small application, where neither piece is using too much CPU or memory? Even with two servers, with one server compromised, an attacker could still do serious damage, either by deleting the database, or messing with the application code.
Why would this be such a big deal if performance isn't an issue?
Security. Your web server lives in a DMZ, accessible to the public internet and taking untrusted input from anonymous users. If your web server gets compromised, and you've followed least privilege rules in connecting to your DB, the maximum exposure is what your app can do through the database API. If you have a business tier in between, you have one more step between your attacker and your data. If, on the other hand, your database is on the same server, the attacker now has root access to your data and server.
Scalability. Keeping your web server stateless allows you to scale your web servers horizontally pretty much effortlessly. It is very difficult to horizontally scale a database server.
Performance. 2 boxes = 2 times the CPU, 2 times the RAM, and 2 times the spindles for disk access.
All that being said, I can certainly see reasonable cases that none of those points really matter.
It doesn't really matter (you can quite happily run your site with web/database on the same machine), it's just the easiest step in scaling..
It's exactly what StackOverflow did - starting with single machine running IIS/SQL Server, then when it started getting heavily loaded, a second server was bought and the SQL server was moved onto that.
If performance is not an issue, do not waste money buying/maintaining two servers.
On the other hand, referring to a different blogging Scott (Watermasyck, of Telligent) - they found that most users could speed up the websites (using Telligent's Community Server), by putting the database on the same machine as the web site. However, in their customer's case, usually the db & web server are the only applications on that machine, and the website isn't straining the machine that much. Then, the efficiency of not having to send data across the network more that made up for the increased strain.
Tom is correct on this. Some other reasons are that it isn't cost effective and that there are additional security risks.
Webservers have different hardware requirements than database servers. Database servers fare better with a lot of memory and a really fast disk array while web servers only require enough memory to cache files and frequent DB requests (depending on your setup). Regarding cost effectiveness, the two servers won't necessarily be less expensive, however performance/cost ratio should be higher since you don't have to different applications competing for resources. For this reason, you're probably going to have to spend a lot more for one server which caters to both and offers equivalent performance to 2 specialized ones.
The security concern is that if the single machine is compromised, both webserver and database are vulnerable. With two servers, you have some breathing room as the 2nd server will still be secure (for a while at least).
Also, there are some scalability benefits since you may only have to maintain a few database servers that are used by a bunch of different web applications. This way you have less work to do applying upgrades or patches and doing performance tuning. I believe that there are server management tools for making these tasks easier though (in the single machine case).
I would think the big factor would be performance. Both the web server/app code and SQL Server would cache commonly requested data in memory and you're killing your cache performance by running them in the same memory space.
Security is a major concern. Ideally your database server should be sitting behind a firewall with only the ports required to perform data access opened. Your web application should be connecting to the database server with a SQL account that has just enough rights for the application to function and no more. For example you should remove rights that permit dropping of objects and most certainly you shouldn't be connecting using accounts such as 'sa'.
In the event that you lose the web server to a hijack (i.e. a full blown privilege escalation to administrator rights), the worst case scenario is that your application's database may be compromised but not the whole database server (as would be the case if the database server and web server were the same machine). If you've encrypted your database connection strings and the hacker isn't savvy enough to decrypt them then all you've lost is the web server.
One factor that hasn't been mentioned yet is load balancing. If you start off thinking of the web server and the database as separate machines, you optimize for fewer network round trips and also it gets easier to add a second web server or a second database engine as needs increase.
I agree with Daniel Earwicker - the security question is pretty much flawed.
If you have a single box setup with a webserver and only the database for that webserver on it, if that webserver is compromised you lose both the webserver and only the database for that specific application.
This is exactly the same as what happens if you lose the webserver on a 2-server setup. You lose the web server, and just the database for that specific application.
The argument that 'the rest of the DB server's integrity is maintained' where you have a 2-server setup is irrelevant, because in the first scenario, every other database server relating to every other application (if there are any) remain unaffected as well - being, as they are, hosted elsewhere.
Similarly, to the question posed by Kev 'what about all the other databases residing on the DB server? All you've lost is one database.'
if you were hosting an application and database on one server, you would only host databases on that server which related to that application. Therefore, you would not lose any additional databases in a single server setup when compared to a multiple server setup.
By contrast, in a 2 server setup, where the attacker had access to the Web Server, and by proxy, limited rights (in the best case scenario) to the database server, they could put the databases of every other application at risk by carrying out slow, memory intensive queries or maximising the available storage space on the database server. By separating the applications out into their own concerns, very much like virtualisation, you also isolate them for security purposes in a positive way.
I can speak from first hand experience that it is often a good idea to place the web server and database on different machines. If you have an application that is resource intensive, it can easily cause the CPU cycles on the machine to peak, essentially bringing the machine to a halt. However, if your application has limited use of the database, it would probably be no big deal to have them share a server.
Wow, No one brings up the fact that if you actually buy SQL server at 5k bucks, you might want to use it for more than your web application. If your using express, maybe you don't care. I see SQL servers run Databases for 20 to 30 applicaitions, so putting it on the webserver would not be smart.
Secondly, depends on whom the server is for. I do work for financial companies and the govt. So we use a crazy pain in the arse approach of using only sprocs and limiting ports from webserver to SQL. So if the web app gets hacked. The only thing the hacker can do is call sprocs as the user account on the webserver is locked down to only see/call sprocs on the DB. So now the hacker has to figure out how to get into the DB. If its on the web server well its kind of easy to get to.
It depends on the application and the purpose. When high availability and performance is not critical, it's not bad to not to separate the DB and web server. Especially considering the performance gains - if the appliation makes a large amount of database queries, a considerable amount of network load can be removed by keeping it all on the same system, keeping the response times low.
I listened to that podcast, and it was amusing, but the security argument made no sense to me. If you've compromised server A, and that server can access data on server B, then you instantly have access to the data on server B.
I think its because the two machines usually would need to be optimized in different ways. Other than that I have no idea, we run all our applications with the server-database on the same machine - granted we're not public facing - but we've had no problems.
I can't imagine that too many people care about one machine being compromised over both since the web application will usually have nearly unrestricted access to at the very least the data if not the schema inside the database.
Interested in what others might say.
Database licences are not cheep and are often charged per CPU, therefore by separating out your web-servers you can reduce the cost of your database licences.
E.g if you have 1 server doing both web and database that contains 8 CPUs you will have to pay for an 8 cpu licence. However if you have two servers each with 4 CPUs and runs the database on one server you will only have to pay for a 4 cpu licences
An additional concern is that databases like to take up all the available memory and hold it in reserve for when it wants to use it. You can force it to limit the memory but this can considerably slow data access.
Something not mentioned here, and the reason I am facing, is 0 downtime deployments. Currently I have DB/webserver on same machine and that makes updates a pain. If you they are on a seprate machine, you can perform A/B releases.
I.e.:
The DNS currently points to WebServerA
Apply sofware updates to WebServerB
Change DNS to point to WebServerB
Work on WebServerA at leisure for the next round of updates.
This works before the state is stored in the DB, on a separate server.
Arguing that there is a real performance gain to be had by running a database server on a web server is a flawed argument.
Since Database servers take query strings and return result sets, the data actually flowing from data server to web server is relatively small, but the horsepower required to process the query and generate the result set is relatively large. Optimizing performance around the data transfer time therefore is optimizing around the wrong thing.
Regarding security, there are advantages to having the data server on a different box than the web server. Having such a setup is not the be all and end all of security, but it is a step in the right direction.
Regarding scalability, it is easy and relatively cheap to add web servers and put them into cluster to handle increased traffic. It is not so easy and cheap to add data servers and cluster them. Also, web servers and data servers have different hardware needs, so multiple boxes help out with scalability.
If you are starting small and have only one box, then a good way would go would be to use virtual machines. Running the web server and data server in different VMs on one host gives you all the gains of separate boxes at the cost of one large box price.
Operating system is another consideration. While your database may require larger memory spaces and therefore UNIX, your web server - or more specifically your app server since you mention only two tiers - may be a .Net-based, and therefore require Windows.
Ok! Here is the thing, it is more Secure to have your DB Server installed on another Machine and your Application on the Web Server. You then connect your application to the DB with a Web Link. Thanks it.
Is it good, bad, or indifferent to run SQL Server on your webserver?
I'm using Server 2008 and SQL Server 2005, but I don't think that matters to this question.
For small sites, it doesn't make a bit of a difference.
As the load grows, though, this scales really badly, and quicker than you think:
Database servers are built on the premise they "own" the server. They trade memory for speed and they easily use all available RAM for internal caching.
Once resources start to be scarce, profiling becomes very difficult -- it is clear that IIS and SQL are both suffering, less clear where the bottleneck is. IIS needs CPU, SQL Server needs RAM or CPU etc etc
No matter how many layers you put in your code, it all runs on the same CPU, therefore a single layered application will run better in this context -- less overhead -- but it will not scale.
Security is really bad, usually you isolate SQL behind a firewall!
If you can afford it, it's probably better to shell out a few bucks and get a second server, maybe using PostgreSQL. One IIS server and one PostgreSQL cost about as much as on IIS + SQL Server because of licensing costs...
Larger shops would probably not consider this a best practice... However, if you aren't dealing with hundreds of requests per second, you're fine putting them both on one box.
In fact, for small apps, you will see better performance on the back-end because data does not have to go across the wire. It's all about scale.
Keep in mind that database servers eat memory. Here's one important lesson from the school of hard knocks: if you decide to run SQL Server 2005 on the same machine as your web server (and that is the setup you mentioned in your question), make sure you go into Sql Server Management Studio and do this:
Right click on the server instance and click properties
Select 'memory' from the list on the left
Change 'maximum server memory' to something your server can sustain.
If you don't do that, SQL Server will eventually take up all of your server's RAM and hang onto it indefinitely. This will cause your server to more or less sputter and die. If you are not aware of this, it can be very frustrating to troubleshoot.
I've done this quite a few times. It's not something you would do if you had the infrastructure of a large corporation and it does not scale, but it's fine for a lot of things.
It really comes down to how much work your webserver and your sql server are doing.
Without more information I doubt you are going to get any helpful answers.
If your web server is publicly accessible, this is a VERY bad idea from a security perspective.
Although it makes a lot of things more difficult from a routing, firewall, ports, authentication, etc. perspective, separation is good. When you have your database server running on the web server, if your web server is compromised, then your sql server is, too.
When you have them on separate boxes, you've raised the bar a little.
There's still a lot more work to be done to secure your web server AND your database server, but why make it easier than it needs to be?
I'd say it was best to run them on the same server until it becomes a problem. That way you'll save yourself some money and time upfront. Once the site becomes a success and requires a some architectural changes it should have already paid for itself.
Remember to back up :)
It will depend on the expected load of the server. For small sites, it is no problem at all (if correctly configured). For large sites, you might want to consider distributing the load over different servers: web server, file server, database server, etc.
I've seen this issue over and over again. The right answer is to put SQL Server on one machine and IIS (web server) on the other. Your money will go into the SQL Server machine because the right drive system and RAM must be purchased to support a efficient server but the web server can be a much scaled down & less expensive machine with just a mirrored drive set.