We have a coldfusion enterprise server with 2 instances. Each instance has 200+ data-sources to databases on one MSSQL server. This number will keep on growing. Now it seems that requests to a single data-source are getting slower even though the database is small. It is possible that requests get slower when CF has more data-sources?
Are the datasources partitioned for a reason (e.g. different clients/customers, etc)? If this is really just a big application with a bunch of databases, you may be able reduce the number of DSNs through cross-database queries through a single CF datasource.
If the account CF is using to connect to SQL Server has read access to both databases on the server, you can do something like this:
SELECT field1, field2, field3...
FROM [databaseA].[dbo].Table1 T1
JOIN [databaseB].[dbo].Table2 T2 ON ...
I've done this with State and Country tables that are shared across multiple DBs. Set the permissions carefully to prevent damage or errant updates.
Of course it's possible, I doubt there are many people with this kind of experience so we could just guess.
Personally I'd never make that many databases in SQL server, and that many datasources in CF. IMHO using db schemas would be much better solution, easier to maintain, administer and so on.
How's situation with memory? Could happen that huge amount of JDBC connections is choking the server. I'd check memory consumption first, SQL stats after to see data through-output and maybe later even SQL Severs performance settings, CF settings to see concurent possible JDBC connections, network settings and so on.
Again, just guessing and trying to give you a hint where to look.
There's more too it than just coldfusion. Each connection is about 4k, and each datasource can use multiple connections. So 200 DSN's might equal 300 or 400 connections (or 800 or 1000 when aggregated). The DB server itself uses the "tempdb" as a work space for handling requests. It expands this workspace to handle the traffic - but it is a shared resource in a way. So one DB can have an impact on another DB on the server.
I would:
Check the total number of connections on the SQL server (perfmon has some good counters for this)
Use server monitor to get a sense of the total number of connections on each instance.
Use network monitoring to determine what capacity the network connection on each server is using...
Of course it goes without saying that your databases need to be fine tuned to perform as well (indexed and optimized - with a good schema and backstopped by good query code). Creating a scalable solution requires all of these things :)
PS - it goes without saying you can contact me for more "formal" help. I'll be glad to chat about your problem.
Related
I have a Microsoft Access .accdb database on a company server. If someone opens the database over the network, and runs a query, where does the query run? Does it:
run on the server (as it should, and as I thought it did), and only the results are passed over to the client through the slow network connection
or run on the client, which means the full 1.5 GB database is loaded over the network to the client's machine, where the query runs, and produces the result
If it is the latter (which would be truly horrible and baffling), is there a way around this? The weak link is always the network, can I have queries run at the server somehow?
(Reason for asking is the database is unbelievably slow when used over network.)
The query is processed on the client, but that does not mean that the entire 1.5 GB database needs to be pulled over the network before a particular query can be processed. Even a given table will not necessarily be retrieved in its entirety if the query can use indexes to determine the relevant rows in that table.
For more information, see the answers to the related questions:
ODBC access over network to *.mdb
C# program querying an Access database in a network folder takes longer than querying a local copy
It is the latter, the 1.5 GB database is loaded over the network
The "server" in your case is a server only in the sense that it serves the file, it is not a database engine.
You're in a bad spot:
The good thing about access is that it's easy to create forms and reports and things by people who are not developers. The bad is everything else about it. Particularly 2 things:
People wind up using it for small projects that grow and grow and grow, and wind up in your shoes.
It sucks for multiple users, and it really sucks over a network when it gets big
I always convert them to a web-based app with SQL server or something, but I'm a developer. That costs money to do, but that's what happens when you use a tool that does not scale.
I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, w/400 jobs this qtr, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the remote tables. Works as expected.
Performance here in US is decent - 10-15 seconds for ~40K records. Our India teams are seeing >5-10 minutes for the same datasets. Their internet connection is decent, not great and a variable I cannot control.
I am wondering if MS Access is adding some overhead here than can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting vs Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 7, 10, 13 users.
Thanks!
It's not entirely clear but if the MSAccess database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN, there is too much latency.
When Access queries data, is doesn't open a stream, it fetches blocks of it, wait for the data wot be downloaded, then request another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India has probably around 200ms latency and you can't do much about it as it adds up very quickly if the communication protocol is chatty, all this on top of the connection's bandwidth that is very likely way below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest is maybe to automatically perform this dump every day and making it available on an FTP server or an internal website ready for download.
You can also make it available on demand, maybe trough an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server or simply though a website, or a shell.
You could also have a simple windows service on your SQL Server that listens to requests for a remote client installed on the local machines everywhere, that would process the dump and sent it to the client which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid using Access in an automated way. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
Renaud and all, thanks for taking time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks (vs a continguous DL) of data is exactly what I was hoping to avoid via an alternate approach.
Or workflow is evolving to better leverage both sides of the clock where User1 in US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
The primary issue is the initial DL of the local DB tables from the server (huge multi-year DB) for the current "job" - should happen just once at the start of the effort (~1 wk long process) This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP - DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamped-based push-pull of just the changes daily would have been an overall plus. Seems to be, but the initial DL hurdle remains.
I'm not a DBA so this may be a stupid question but I'll ask it anyway. We're upgrading our SQL Servers from 2000 to 2005 and we will probably use either database replication or database mirroring. Our DBA would like to "multipurpose" the standby server meaning that he'd like to increase our capabilities and capacity by running other database applications on the standby server since "it's just going to be sitting there anyway" (his words, not mine). Is this such a good idea? Right now, our main application server uses only one instance that contains 50+ databases. As I understand it, what we're doing now and what our DBA is proposing for a failover server is a bad idea because all of these databases are sharing memory, CPUs, and working areas. If one applications starts behaving badly, the other DBs could be affected.
Any thoughts?
It's really a business question that needs to be answered?? is a slow app better then no app if you can't afford the expense of extra hardware?
Standby and mirrored db's can be used for reporting. Using it as the failover db can work if you have enough headroom (i.e. both databases will comfortably run on the server)
Will you depend on these extra applications? Where do they run in the failover case?
You really need to understand your failure modes.
If you look at it as basic resource math, that doesn't often make sense unless the resources you have running in the failure scenarios can handle the entire expected load. Sometimes this is the case, but not always. In this case, to handle the actual load you may need yet another server to come in (like RAID - perhaps your load needs a minimum of 5 servers, but you have a farm of 6, then you need 1 standby server for ever server to fail above 1). Sometimes a farm can run degraded, but sometimes they just puke and die.
And in the case of out of normal operation, you often have accident cascading where a legitimate incident causes a cascade of issues - e.g. your backup tape is busy restoring a server from a backup (to a test environment, even - there are no real "failures"), now your sql server or exhcange server (or both) is not backed up and your log gets full.
Database Mirroring would not be the way to go here in my opinion as it provides redundancy at the database level only. So you would need to configure database mirroring for up to 50 databases based on the information you provided. The chances are that if one DB where to fail all, 50 would probably follow, as failures typically occur at the hardware level rather than a specific database.
It sounds to me like you should be using SQL Server Clustering technology. You could create an Active/Active cluster to support your requirements.
What is an Active/Active Cluster?
An Active/Active SQL Server cluster means that SQL Server is running on both nodes of a two-way cluster. Each copy of SQL Server acts independently, and users see two different SQL Servers. If one of the SQL Servers in the cluster should fail, then the failed instance of SQL Server will failover to the remaining server. This means that then both instances of SQL Server will be running on one physical server, instead of two.
Applying this to your scenario
You could then split the databases between two instances of SQL server, one active instance on each node. Should one node fail, the other node will pick up the slack and vice versa.
Further Reading
An introduction to SQL Server Clustering
I suspect that you will find the following MSDN thread useful reading also
"it's just going to be sitting there anyway"
It will be sitting there applying transactions...
Take note of John Sansom's recommendation. Keep in mind that a Active/Active cluster requires two sql server licenses and a failover cluster/mirror only needs one.
Setting up mirroring for a large number of db's could turn into a big pain. You need any jobs/maintenance to move over as well - which can be achieved with alerts on WMI failover events. There's probably more to think about that could complicate things.
Is it possible to configure multiple database servers (all hosting the same database) to execute a single query simultaneously?
I'm not asking about executing queries using multiple CPUs simultaneously - I know this it possible.
UPDATE
What I mean is something like this:
There are two 2 servers: Server1 and Server2
Both server host database Foo and both instances of Foo are identical
I connect to Server1 and submit a complicated (lots of joins, many calculations) query
Server1 decides that some calculations should be made on Server2 and some data should be read from that server, too - appropriate parts of the query are sent to Server2
Both servers read data and perform necessary calculations
Finally, results from Server1 and Server2 are merged and returned to the client
All this should happen automatically, without need to explicitly reference Server1 or Server2. I mean such parallel query execution - is it possible?
UPDATE 2
Thanks for the tips, John and wuputah.
I am researching alternatives of increasing both availability and capacity of MOSS database backend. So what I'm looking for is some kind out-of-the-box SQL Server load balancing solution that would be transparent to the application, because I cannot modify the application in any way. I guess SQL Server has no such feature (and Oracle, as far as I understand it, does - it is RAC mentioned by wuputah).
UPDATE 3
A quote from the Top Tips for SQL Server Clustering article:
Let's start by debunking a common
misconception. You use MSCS clustering
for high availability, not for load
balancing. Also, SQL Server does not
have any built-in, automatic
load-balancing capability. You have to
load balance through your
application's physical design.
What you're really talking about is a clustering solution. It looks like SQL Server and Oracle have solutions to this, but I don't know anything about them. I can guess they would be very costly to buy and implement.
Possible alternate suggestions would be as follows:
Use master-slave replication, and do your complex read queries from the slave. All writes must go to the master, which are then sent to the slave, so things stay in sync. This helps things go faster because the slave only has to worry about the writes coming from the master, which are already predetermined on behalf of the slave (no deadlocks etc). If you're looking to utilize multiple servers, this is the first place I would start.
Use master-master replication. This means that all writes from both servers go to each other, so they stay in sync (at least theoretically). This has some of the benefits as master-slave but you don't have to worry about writes going to one server instead of the other. The more common use of master-master replication is for failover support; master-slave is really better suited to performance.
Use the feature John Sansom talked about. I don't know much about it, but it seems its basis is splitting your database into tables on different servers, which will have some benefits as well as drawbacks. The big issue is that since the two systems can't share memory, they will have to share a lot of data over the network to compute complex joins.
Hope this helps!
RE Update 1:
If you can't modify the application, there is hope, but it might be a bit complicated. If you were to set up master-slave replication, you can then set up a proxy to send read queries to the slave(s) and write queries to the master(s). I've seen this done with MySQL, but not SQLServer. That's a bit of a problem unless you want to write the proxy yourself.
This has been discussed on SO previously, so you can find more information there.
RE Update 2:
Microsoft's clustering might not be designed for performance, but that's Microsoft fault. That's still the level of complexity you're talking about here. If they say it won't help, then your options are limited to those above and by what you do with your application (like sharding, splitting into multiple databases, etc).
Yes I believe it is possible, well sort of, let me explain.
You need to look into and research the use of Distributed Queries. A distributed query runs across multiple servers and is typically used to reference data that is not stored locally.
http://msdn.microsoft.com/en-us/library/ms191440.aspx
For example, Server A may hold my Customers table and Server B holds my Orders table. It is possible using distributed queries to run a query that references both Server A and Server B, with each server managing the processing of its local data (which could incorporate the use of parallelism).
Now in theory you could store the exact same data on each server and design your queries specifically so that only certain table were referenced on certain servers, thereby distributing the query load. This is not true parallel processing however, in terms of CPU.
If your intended goal is to distribute the processing load of your application then the typical approach with SQL Server is to use Replication to distribute data processing across multiple servers. This method is also not to be confused with parallel processing.
http://databases.about.com/cs/sqlserver/a/aa041303a.htm
I hope this helps but of course please feel free to pose any questions you may have.
Interesting question, but I'm struggling to get my head around this being beneficial for a multi-user system.
If I'm the only user having half my query done on Server1 and the other half on Server2 sounds cool :)
If there are two concurrent users (lets say with queries of identical difficulty) then I'm struggling to see that this helps :(
I could have identical data on both servers and load balancing - so I get Server1, my mate gets Server2 - or I could have half the data on Server1 and the other half on Server2, and each will be optimised, and cache, just their own data - spreading the load. But whenever you have to do a merge to complete a query the limiting factor becomes the pipe-size between them.
Which is basically Federated Database Servers. Instead of having all my Customers on one server and all my Orders on the other I could, say, have my USA customers and their orders on one, and my European customers/orders on the other, and only if my query spans both is there any need for a merge step.
I remember hearing Joel say he has 2 different locations where the servers are located, each location has 2 front end servers and 1 back end server.
If a one of the hosting facilities goes down, how can he switch over to the other one? (Or is it just going to be a DNS change that will take 24-72 hours to propagate?).
How can a single SQL Server instance have so many databases on it? FB has a completely separate database per account. I can't see a single SQL Server instance having more than say 200-250 databases on it! And I'm sure they have more customers than that.
They talked about this in one of the Stack Overflow podcasts, but I can't find it in the transcripts.
1) Each of the two centers handles approximately 1/2 of the users. Fairly often (hourly, I think Joel said) they ship transaction logs to the other site. If site A goes down, they bring up the db backups on site B, and do the DNS switchover. It won't be instantaneous or automated, nor do they want it to be, because they'll be coming up with slightly stale data, and want to avoid that if it's at all possible to bring the broken site back up.
I'm not sure how they handle the DNS situation, but you can set the TTL on DNS records to mere seconds to limit caching, and have failover occur very quickly.
2) Why not? I'm not sure of the hard limit of databases per instance, but there's also nothing keeping you from running multiple instances of SQL Server on your box. I would imagine you're more limited by hardware than software. (You can also run Fogbugz with a MySQL database backend).
Got a few questions in here so I'll break these out:
If a one of the hosting facilities goes down, how can he switch over to the other one?
There's several ways to do this, including database mirroring (new in SQL Server 2005), log shipping, and replication. I've recorded a podcast on SQL Server high availability and disaster recovery options at SQLServerPedia.
(Or is it just going to be a DNS change that will take 24-72 hours to propagate?)
Like the other post mentioned, you can set your DNS time-to-live numbers very slow, but the cooler method uses database mirroring. With mirroring, you can set both the primary and secondary server names in your connection string, and your application will automatically try the second server when the first one doesn't respond.
How can a single SQL Server instance have so many databases on it? FB has a completely separate database per account. I can't see a single SQL Server instance having more than say 200-250 databases on it! And I'm sure they have more customers than that.
The largest SQL Server I've worked with had over a thousand databases, and I've talked to a couple of other DBAs who have worked on systems with more than 2,000 databases on a server. It certainly makes management much more challenging, that's for sure.