Yay, first post on SO! (Good work Jeff et al.)
We're trying to solve a bottleneck in one of our web-applications that was introduced when we started allowing users to generate reports on-demand.
Our infrastructure is as follows:
1 server acting as a Webserver/DBServer (ColdFusion 7 and MSSQL 2005)
It's serving a web-application for our backend users and a frontend website. The reports are generated by the users from the backend so there's a level of security where the users have to log in (web based).
During peak hours when reports are generated it brings the web-application and frontend website to unacceptable speed due to SQL Server using resources for the huge queries and afterward ColdFusion generating multi page PDFs.
We're not exactly sure what the best practice would be to remove some load, but restricting access to the reports isn't an option at the moment.
We've considered denormalizing data to other tables to simplify the most common queries, but that seems like it would just push the issue further.
So, we're thinking of getting a second server and use it as a "report server" with a replicated copy of our DB on which the queries would be ran. This would fix one issue, but the second remains: generating PDFs is resource intensive.
We would like to offload that task to the reporting server as well, but being in a secured web-application we can't just fire HTTP GET to create PDFs with the user logged in the web-application from server 1 and displaying it in the web-application but generating/fetching it on server 2 without validating the user's credential...
Anyone have experience with this? Thanks in advance Stack Overflow!!
"We would like to offload that task to the reporting server as well, but being in a secured web-application we can't just fire HTTP GET to create PDFs with the user logged in the web-application from server 1 and displaying it in the web-application but generating/fetching it on server 2 without validating the user's credential..."
why can't you? you're using the world's easiest language for writing webservices. here are my suggestions.
first, move the database to it's own server thus having cf and sql server on separate servers. the first reason to do this is performance. as already mentioned, having both cf and sql on the same server isn't an ideal setup. the second reason is for security. if someone is able to hack your webserver, well there right there to get your data. you should have a firewall in place between your cf and sql server to give you more security. last reason is for scalability. if you ever need to throw more resources or cluster your database, it's easier when it's on it's own server.
now for the webservices. what you can do is install cf on another server and writing webservices to handle the generation of reports. just lock down the new cf server to accept only ssl connections and pass the login credentials of the users to the webservice. inside your webservice, authenticate the user before invoking the methods to generate the report.
now for the pdfs themselves. one of the methods i've done in the pass is generating a hash based on some parameters passed (user credentials and the generated sql to run the query) and then once the pdf is generated, you assign the hash to the name of the pdf and save it on disk. now you have a simple caching system where you can look to see if the pdf already exists and if so, return it, otherwise generate it and cache it.
in closing, your problem is not something that most haven't seen before. you just need to do a little work and your application will magnitudes faster.
The most basic best practice is to not have the web server and db server on the same hardware. I'd start with that.
You have to separate the perception between generating the PDF and doing the calculations. Both are separate steps.
What you can do is
1) Create a report calculated table that will run daily and populated it with all the calculated values for all your reports.
2) When someone requests a PDF report, have the report do a simple select query of the pre-calculated values. It will be much less db effort than calculating on the fly. You can use coldfusion to generate the PDF if it's using the fancy pdf settings. Otherwise you may be able to get away with using the raw PDF format (it's similar to html markup) in text form, or use another library (cfx_pdf, a suitable java library, etc) to generate them.
If the users don't need to download and only need to view/print the report, could you get away with flash paper?
An alternative is also to build a report queue. Whether you put it on the second server or not, what CF could do if you can get away with it, you could put report requests into a queue, and email them to the users as they get processed.
You can then control the queue through a scheduled process to run as regularly as you like and do only create a few reports at a time. I'm not sure if it's a suitable approach for your situation.
As mentioned above, doing a stored procedure may also help, and make sure you have your indexes set correctly in MySQL. I once had a 3 minute query that I brought down to 15 seconds because I forgot to declare additional indexes in each table that were being heavily used.
Let us know how it goes!
In addition to advice to separate web & db servers, I'd tried to:
a) move queries into stored procedures, if you're not using them yet;
b) generate reports by scheduler and keep them cached in special tables in ready-to-use state, so customers only select them with few fast queries -- this should also decrease report building time for customers.
Hope this helps.
Related
I have a system that requires a large amount of names and email addresses (two fields only) to be imported via CSV upload.
I can deal with the upload easily enough, how would I verify the email addresses before I process the import.
Also how could I process this quickly or as a background process without requiring the user to watch a script churning away?
Using Classic ASP / SQL server 2008.
Please no jibes at the classic asp.
Do you need to do this upload via the ASP application? If not, whatever kind of scripting language you feel most comfortable with, and can do this with the least coding time is the best tool for the job. If you need for users to be able to upload into the classic ASP app and have a reliable process to insert the valid records into the database and reject the invalid ones, your options change.
Do you need to provide feedback to the users? Like telling them exactly which rows were invalid?
If that second scenario is what you're dealing with, I would have the asp app simply store the file, and have another process, a .net service, or scheduled task or something, do the importing and report on its progress in a text file which the asp app can check. That brings you back to doing it in whatever scripting language you are comfortable with, and you don't have to deal with the http request timing out.
If you google "regex valid email" you can find a variety of regular expressions out there for identifying invalid email addresses.
In a former life, I used to do this sort of thing by dragging the file into a working table using DTS and then working that over using batches of SQL commands. Today, you'd use Integration Services.
This allows you to get the data into SQL Server very quickly, and prevent the script timing out, then you can use whatever method you prefer (e.g. AJAX-driven batches, redirection-driven batches, etc.) to work over discreet chunks of the data, or schedule it to run as a single batch (an SQL Server job) and just report on the results.
You might be lucky enough to get your 500K rows processed in a single batch by your upload script, but I wouldn't chance it.
Afternoon,
I'm looking for some advice on a project I am working on.
Presently I have 2 Console Applications (That run 2 separate API's one for the phone system ACD and one of the Call Recorder).
These churn through traffic and store information in a database ready for a client application (Windows Form) to obtain and use.
At present I am storing the information in a SQL Table and using calls to and from the SQL database in both the Console Applications and Windows Form application.
Is this best practice? In my experience under load the SQL Server could slow down and I don't like the idea of relying on DB requests to send the information back.
Is there a better way I could communicate between my Console Applications and my WIndows Form? I was potentially thinking that TCP messages from one to the other on a port would work and then use the Windows Form to use the 'stream' to perform its actions rather than getting the information from the database.
The reason being is ideally I'd like to have the 'database' as stored within memory for quicker retrieval than using the SQL DB.
Any and all advice is fully appreciated. Please give me an explanation as to why your suggestion is efficient as I would like to learn from this rather than just 'do'
Many Thanks!
James
We maintain a Software as a Service (SaaS) web application that sits on top of a multi-tenant SQL Server database. There are about 200 tables in the system, this biggest with just over 100 columns in it, at last look the database was about 10 gigabytes in size. We have about 25 client companies using the application every entering their data and running reports.
The single instance architecture is working very effectively for us - we're able to design and develop new features that are released to all clients every month. Each client experience can be configured through the use of feature-toggles, data dictionary customization, CSS skinning etc.
Our typical client is a corporate with several branches, one head office and sometimes their own inhouse IT software development teams.
The problem we're facing now is that a few of the clients are undertaking their own internal projects to develop reporting, data warehousing and dashboards based on the data presently stored in our multi-tenant database. We see it as likely that the number and sophistication of these projects will increase over time and we want to cater for it effectively.
At present, we have a "lite" solution whereby we expose a secured XML webservice that clients can call to get a full download of their records from a table. They specify the table, and we map that to a purpose-built stored proc that returns a fixed number of columns. Currently clients are pulling about 20 tables overnight into a local SQL database that they manage. Some clients have tens of thousands of records in a few of these tables.
This "lite" approach has several drawbacks:
1) Each client needs to develop and maintain their own data-pull mechanism, deal with all the logging, error handling etc.
2) Our database schema is constantly expanding and changing. The stored procs they are calling have a fixed number of columns, but occasionally when we expand an existing column (e.g. turn a varchar(50) into a varchar(100)) their pull will fail because it suddenly exceeds the column size in their local database.
3) We are starting to amass hundreds of different stored procs built for each client and their specific download expectations, which is a management hassle.
4) We are struggling to keep up with client requests for more data. We provide a "shell" schema (i.e. a copy of our database with no data in it) and ask them to select the tables they need to pull. They invariably say "all of them" which compounds the changing schema problem and is a heavy drain on our resources.
Sorry for the long winded question, but what I'm looking for is an approach to this problem that other teams have had success with. We want to securely expose all their data to them in a way they can most easily use it, but without getting caught in a constant process of negotiating data exchanges and cleaning up after schema changes.
What's worked for you?
Thanks,
Michael
I've worked for a SaaS company that went through a similar exercise some years back and Web Services is the probably the best solution here. incidentally, one of your "drawbacks" is actually a benefit. Customers should be encouraged to do their own data pulls because each customer's needs on timing and amount of data will be different.
Now instead of a LITE solution, you should look at building out a WSDL with separate CRUD calls for each table and good filtering capabilities. Also, make sure you have change times for records on each table. this way a customer can hit each table and immediately pull only the records that have been updated since the last time they pulled.
Will it be easy. Not a chance, but if you want scalability, it's the only route to go.
ood luck.
I am new to servers and online databases, so please bear with me.
I have a question regarding database server communication on mobile devices as follows:
I am currently developing a game application on iOS. I have set up a non-SQL database on Cloudant and I would like to access that data on my iOS device. I have to update multiple database entries each time I complete a round, and I also need to read multiple entries on my database to refresh the leaderboard. I have tried to access multiple entries on Cloudant individually via device before, but most of them returned as timeout.
Thus, right now I have written several PHP scripts on my application server so that my device only needs to access the script once, and do multiple updates on my database or filter through the data I require from Cloudant. However, this means I need an additional server, meaning higher costs. I feel there should be a better or more elegant solution out there, and thus I would like to ask for help from everybody out here. Is it better to do all the updates directly from the device, or to enlist the help of a 3rd party?
Thanks for your time!
For security reasons alone it is necessary to use a server in front of the cloudant database. I assume you don't want every user of your app to be able to access the whole database. Also, the reasons you gave seem valid to me. It's generally a good idea to reduce the number and size of requests for a mobile application. Also, this might allow you to do some caching in the PHP server, ultimately reducing your costs.
My current development project has two aspects to it. First, there is a public website where external users can submit and update information for various purposes. This information is then saved to a local SQL Server at the colo facility.
The second aspect is an internal application which employees use to manage those same records (conceptually) and provide status updates, approvals, etc. This application is hosted within the corporate firewall with its own local SQL Server database.
The two networks are connected by a hardware VPN solution, which is decent, but obviously not the speediest thing in the world.
The two databases are similar, and share many of the same tables, but they are not 100% the same. Many of the tables on both sides are very specific to either the internal or external application.
So the question is: when a user updates their information or submits a record on the public website, how do you transfer that data to the internal application's database so it can be managed by the internal staff? And vice versa... how do you push updates made by the staff back out to the website?
It is worth mentioning that the more "real time" these updates occur, the better. Not that it has to be instant, just reasonably quick.
So far, I have thought about using the following types of approaches:
Bi-directional replication
Web service interfaces on both sides with code to sync the changes as they are made (in real time).
Web service interfaces on both sides with code to asynchronously sync the changes (using a queueing mechanism).
Any advice? Has anyone run into this problem before? Did you come up with a solution that worked well for you?
This is a pretty common integration scenario, I believe. Personally, I think an asynchronous messaging solution using a queue is ideal.
You should be able to achieve near real time synchronization without the overhead or complexity of something like replication.
Synchronous web services are not ideal because your code will have to be very sophisticated to handle failure scenarios. What happens when one system is restarted while the other continues to publish changes? Does the sending system get timeouts? What does it do with those? Unless you are prepared to lose data, you'll want some sort of transactional queue (like MSMQ) to receive the change notices and take care of making sure they get to the other system. If either system is down, the changes (passed as messages) will just accumulate and as soon as a connection can be established the re-starting server will process all the queued messages and catch up, making system integrity much, much easier to achieve.
There are some open source tools that can really make this easy for you if you are using .NET (especially if you want to use MSMQ).
nServiceBus by Udi Dahan
Mass Transit by Dru Sellers and Chris Patterson
There are commercial products also, and if you are considering a commercial option see here for a list of of options on .NET. Of course, WCF can do async messaging using MSMQ bindings, but a tool like nServiceBus or MassTransit will give you a very simple Send/Receive or Pub/Sub API that will make your requirement a very straightforward job.
If you're using Java, there are any number of open source service bus implementations that will make this kind of bi-directional, asynchronous messaging a snap, like Mule or maybe just ActiveMQ.
You may also want to consider reading Udi Dahan's blog, listening to some of his podcasts. Here are some more good resources to get you started.
I'm mid-way through a similar project except I have multiple sites that need to keep in sync over slow connections (dial-up in some cases).
Firstly you need to track changes, if you can use SQL 2008 (even the Express version is enough if the 2Gb limit isn't a problem) this will ease the pain greatly, just turn on Change Tracking on the database and each table. We're using SQL Server 2008 at the head office with the extended schema and SQL Express 2008 at each site with a sub-set of data and limited schema.
Secondly you need to track your changes, Sync Services does the trick nicely and supports using a WCF gateway into the main database. In this example you will need to use the Sync using SQL Express Client sample as a starting point, note that it's based on SQL 2005 so you'll need to update it to take advantage of the Change Tracking features in 2008. By default the Sync Services uses SQL CE on the clients, which I'm sure isn't enough in your case. You'll need a service that runs on your Web Server that periodically (could be as often as every 10 seconds if you want) runs the Synchronize() method. This will tell your main database about changes made locally and then ask the server for all changes made there. You can set up the get and apply SQL code to call stored procedures and you can add event handlers to handle conflicts (e.g. Client Update vs Server Update) and resolve them accordingly at each end.
We have a shop as a client, with three stores connected to the same VPN
Two of the shops have a computer running as a "server" for that shop and the the third one has the "master database"
To synchronize all to the master we don't have the best solution, but it works: there is a dedicated PC running an application that checks the timestamp of every record in every table of the two stores and if it is different that the last time you synchronize, it copies the results
Note that this works both ways. I.e. if you update a product in the master database, this change will propagate to the other two shops. If you have a new order in one of the shops, it will be transmitted to the "master".
With some optimizations you can have all the shops synchronize in around 20minutes
Recently I have had a lot of success with SQL Server Service Broker which offers reliable, persisted asynchronous messaging out of the box with very little implementation pain.
It is quick to set up and as you learn more you can use some of the more advanced features.
Unknown to most, it is also part of the desktop editions so it can be used as a workstation messaging system
If you have existing T-SQL skills they can be leveraged as all the code to read and write messages is done in SQL
It is blindingly fast
It is a vastly under-hyped part of SQL Server and well worth a look.
I'd say just have a job that copies the data in the pub database input table into a private database pending table. Then once you update the data on the private side have it replicated to the public side. If you don't have any of the replicated data on the public side updated it should be a fairly easy transactional replication solution.