Database security / scaling question

Database security / scaling question - database

Typically I use a database such as MySQL or PostGreSQL on the same machine as the application using it, which makes access easy and secure. I'm just now building the first site that will have a separate physical database server (later this year it will). I'm wondering 3 things:
(security) What things should I look into for starters pertaining to security of accessing a separate machine's database?
(scalability) Are their scalability issues that I should think about pertaining to this (technology agnostic)?
(more ServerFaultish but related) If starting the DB out on the same physical server (using a separate VMWare VM) and later moving to a different physical server, are there implicit problems that I'll have to deal with? Isn't another VM still accessed via localhost?
If these questions are completely ludicrous, I apologize to you DB experts.

Easy, I'll grant you. Secure.. well, security has very little to do with the physical location of the database server.
To get to your three questions though:
First, look at how you can limit access to database tables using the database servers security model. Namely, if your application does not need to drop tables, make sure the user it uses to connect does not have that ability. Second, look into how to encrypt the connection between the database server and your application. In windows this is pretty transparent through kerberos and can even be enforced by group policy settings, not sure about other platforms. Third, look into what features the database has for encrypting the data "at rest". Meaning, does it natively support encryption of the actual data files themselves?
The point here is that your application is only one possible entry point to the database server itself. Ask yourself, what would happen if someone can connect directly without going through your application using your apps credentials. Next ask, what can happen if they find a SQL Injection issue.. Also, ask yourself, what information can be gleaned if someone is able to monitor the IP traffic going between your app and the server. Can they discern any data? Finally, ask yourself, what if they get a copy of the database itself?
The lengths you go for #1 is going to be dependent on several factors such as How valuable is the data (eg: what would happen to you, your company, or your clients if it was lost); and, How much time do you have to come up with an ideal solution?
scalability: This is purely a function of load. Unfortunately, the only way to scale most database applications is to scale up. Meaning that you acquire a larger database server as the need arises. Stack Overflow went through this not too long ago. Some database types (nosql, mongodb, etc) support a concept known as shredding or sharding. MySql, PostGreSql, etc don't. Instead you'll have to specifically design the app to handle it. Which means not using things like auto incrementing keys, etc. This can be a royal PITA... which is why scaling up is a much easier prospect depending on your application.
Another VM is not accessible via "localhost". localhost defines access to your current server. Whether that server is a VM or not is immaterial. You'll have to reference your database server by name. Now, transitioning the database VM to another physical server should have zero impact as your are referencing it by name. Beyond that there aren't any other considerations.

In addition to Chris's valid response,
Security
Use a security mechanism on the network in addition to whatever security features the database or app framework provides. Perhaps this is a simple as firewalling the network, running IPSEC, or over an ssl tunnel. The point is that you shouldn't assume the DB authors are network security experts, or that the DB authentication mechanism has even addressed network security at all.
Scalability
One scalability issue comes to mind when moving from local to remote dbs. Remote TCP/IP communication is much slower than local pipe communication. Your app may have hidden scalability issues due to frequent round-trips to the DB. Between each query, your app waits for each DB response in succession. On a local system, the latency is so small you may not have noticed it.

Related

MS Access connected to SQL Server backend over wan?

I need to develop a database that will have users scattered around the world (around 50 users total, but not simultaneous). I have two questions:
1) I know that with an Access front end connected to an Access back end I risk corruption over a wan, but is that still a risk using sql server?
2) If corruption is not a risk, will performance still make Access an undesirable choice for a front end?
Users will primarily be adding anywhere from 200-800 new records at a time.
Any advice on this would be much appreciated.

The team I work with has taken a variety of approaches to solving the problem you describe. Web apps connecting to a SQL Server back end, .NET desktop apps connection to a SQL Server back end, and MS Access apps, connecting to both SQL Server and MS Access back ends.
In terms of your two specific questions, here are my thoughts based on recent experience.
Question 1) I know that with an Access front end connected to an Access back end I risk corruption over a wan, but is that still a risk using SQLserver?
Answer 1) No. Unlike Access, SQL Server is a robust multi-user database management system. It handles multi-user access and preserves database integrity. You can build your app by creating linked tables in your Access database pointing to their counterpart tables in your SQL Server database. After that, code your Access application as you normally would.
Question 2) If corruption is not a risk, will performance still make Access an undesirable choice for a front end?
Answer 2) My experience is that when accessing a SQL Server back end using linked tables in Access, performance optimization is difficult. Mainly because Access makes decisions about how much data to pull across the network and when to do that. You can't control those things programatically yourself in the same ways you can with the data access objects available in web or .NET desktop apps. Access databases are also logistically more difficult to deploy and maintain (particularly in comparison to a web app), and they are dependent on the version of Access that is installed on your users' workstations.
Hope it helps. Good luck with your project.

You should not be worried about corrupted data when using SQL server as the back-end. Though you should still be concerned with backups, admin requirements, etc.
Re: Access as a front end. Actually works fairly well as long as you ignore deployment issues. Pushing out updates to multiple workstations can be a real pain, and the Access App itself is pretty finicky to install.
Lots of people write a web-app these days for applications just like this. If you don't want to write a web-app, a dot.net based smart client app is a popular choice too.
What is likely important is what expertise is available, how good is the support, how much it costs, etc. Pretty much any popular technology stack will work with enough TLC.
Also don't overlook getting a canned application that already does what you need.

Database synchronization

Recently my clients have asked me if they can use they’re application remotely, disconnected from the local network and the company server.
One solution is to place the database in the cloud, but a connection to the database, and the cloud and an internet connection must be always available.
There not always the case.
So my question is - Is there any database sync system, or a synchronization library so that I can work disconnected with local database and when I connect synchronize the changes I have made and receive changes others have made?
Update:
The application is under Windows (7/xp) ( for now )
It's in Delphi 2007 win32
All client need to have Read/Write access
All Clients have internet connection, but not always ON
Security is not critical, but the Sync service should encrypt the communication
When in the presence of the companies network the system should sync and use the Server Database and not the local one.

You have a host of issues with thinking about such a solution. First, there are lots of possible solutions, such as:
Using database replication within a database, to mimic every update (like a "hot" backup)
Building an application to copy the database periodically (every night)
Using a third-party tool (which is what you are asking, I think)
With replication services, the connection does not have to always be up. Changes to the database are logged when the connection is not available and then applied when they can be sent.
However, there are lots of other issues when you leave a corporate network. What about security of the data and access rights? Do you have other options, such as making it easier to access the database from within the network? Do the users need only read-access to the database or read-write access? Would both versions need to be accessed at the same time. Would there be updates to both at the same time?
You may have other options that are more secure than just moving a database to the cloud.

I believe RemObjects DataAbstract allows offline mode and synchronization by using what they call Briefcases. All your other requirements (security, encrypted connections, etc.) are also covered.
This is not a drop-in replacement, thought, and may need extensive rewrite/refactoring of your application. There are lots of upsides, thought; business rules can/should be enforced on the server (real security), scriptable business rules, multiplatform architecture, etc.

There are some products available in the Java world (SymmetricDS lgpl license) - apart from actually being a working system it is documents how it achieved synchronization. Connects to any db with jdbc support. . There is a pro version but the user guide (downloadable pdf) gives you the db schema plus rules on push pull syncing. Useful if you want to build your own.
Btw there is a data replication so tag that would help.

One possibility that is free is the Microsoft Sync Framework: http://msdn.microsoft.com/en-us/sync/bb736753.aspx
It may be possible for you to use it, but you would need to provid some more detail about your application and operating environment to be sure.

IS it possible to share a database like .mdb and work fine? i try but sometimes the file where the databse is changes from DB to DB1 i use delphi Xe4 and Google Drive .
Thank´s

Which server platform to choose: SQL Azure or Hosted SQL Server for new project

We're getting ready to build a new platform for our current system. Currently we install sql server express locally to all our clients and all their data is stored there. While the process works pretty good, it's still a pain to add columns/tables etc. We also want to have our data available outside of the local install. So we're moving to a central web based sql database and creating a web based application. Our new application will be a Silverlight 5, wcf ria services, mvvm, entity framework application
We've decided that either a web hosted sql server database or sql azure database are the way to go. However, I have no idea why I would choose one over the other. The limitations of azure don't seem to apply to us, but our application will be run on our current shared web host. Is it better to host the application on the same server as the database? Do we even know with shared web hosting that the server is on the same location as the app? There's also the marketing advantage of being 'in the cloud' which our clients love when we drop that word (they have no idea about anything technical, it's just a buzzword for them). I'm not too worried about the cost as I think both will ultimately be about the equivalent of each other.
I feel like I may be completely overthinking this and either will work, however I'd like to try and get the best solution for us and don't want to choose without getting some feedback.
In case it helps, our application is mostly dashboard/informational data. Mostly financial and trending data. It's almost entirely read only. Sometimes the data can get fairly large and we would be sending upwards of 50,000 rows of data to the application.
Thanks for any help/insight you can provide for me!

The main concerns I would have with using a SQL Azure DB from an application on your current shared web host would be
The effect of network latency: Depending on location, every time you do a DB round trip from your application to the SQL Azure DB you will incur a 50-100ms delay. If your application does lots of round trips, this will mount up. Often, if an application has been designed to work with a DB on the LAN (you use of local client DBs suggests this) the they tend to get "chatty" since network delays are very small on the LAN. You may find your application slows down significantly.
Security: You will have to open up the SQL Azure firewall to the IP address(es) that your application presents when querying. Depending on your host, it may be that this IP address is shared between several tenants. This would be a vulnerability.
If neither of these is a problem, then SQL Azure will provide a much lower management overhead (e.g. no need to patch etc.) and will give you very high reliability, especially in terms of the risk of data loss.

Caching to a local SQL instance on a web server

I run a very high traffic(10m impressions a day)/high revenue generating web site built with .net. The core meta data is stored on a SQL server. My team and I have a unique caching strategy that involves querying the database for new meta data at regular intervals from a middle tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantages are that this caching and persistense layers adds complexity in the code that queries the database, packages the data and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master sql server database to local database instances installed on each web server. The web server application would use normal sql/ORM techniques to instantiate objects. Here, we can still sustain a master database outage and we would not have to code up specialized caching code and could instead use nHibernate to handle the persistence.
This seems like a more elegant solution and would like to see what others think or if anyone else has any alternatives to suggest.

I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.

Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.

Just some addition to what RickNZ proposed above..
Since your master data which you are caching currently won't change so frequently and probably over some maintenance window, here is what should you do first on database side:
Create a SNAPSHOT replication for the master tables which you want to cache. Adding new entities will be equally easy.
On all the webservers, install SQL Express and subscribe to this Publication.
Since, this is not a frequently changing data, you can rest assure, no much server resource usage issue minus network trips for master data.
All your caching which was available via previous mechanism is still availbale minus all headache which comes when you add new entities.
Next, you can leverage .NET mechanisms as suggested above. You won't face memcached cluster failure unless your webserver itself goes down. There is a lot availble in .NET which a .NET pro can point out after this stage.

It seems to me that Windows Server AppFabric is exactly what you are looking for. (AKA "Velocity"). From the introductory documentation:
Windows Server AppFabric provides a
distributed in-memory application
cache platform for developing
scalable, available, and
high-performance applications.
AppFabric fuses memory across multiple
computers to give a single unified
cache view to applications.
Applications can store any
serializable CLR object without
worrying about where the object gets
stored. Scalability can be achieved by
simply adding more computers on
demand. The cache also allows for
copies of data to be stored across the
cluster, thus protecting data against
failures. It runs as a service
accessed over the network. In
addition, Windows Server AppFabric
provides seamless integration with
ASP.NET that enables ASP.NET session
objects to be stored in the
distributed cache without having to
write to databases. This increases
both the performance and scalability
of ASP.NET applications.

Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).

Why is it not advisable to have the database and web server on the same machine?

Listening to Scott Hanselman's interview with the Stack Overflow team (part 1 and 2), he was adamant that the SQL server and application server should be on separate machines. Is this just to make sure that if one server is compromised, both systems aren't accessible? Do the security concerns outweigh the complexity of two servers (extra cost, dedicated network connection between the two, more maintenance, etc.), especially for a small application, where neither piece is using too much CPU or memory? Even with two servers, with one server compromised, an attacker could still do serious damage, either by deleting the database, or messing with the application code.
Why would this be such a big deal if performance isn't an issue?

Security. Your web server lives in a DMZ, accessible to the public internet and taking untrusted input from anonymous users. If your web server gets compromised, and you've followed least privilege rules in connecting to your DB, the maximum exposure is what your app can do through the database API. If you have a business tier in between, you have one more step between your attacker and your data. If, on the other hand, your database is on the same server, the attacker now has root access to your data and server.
Scalability. Keeping your web server stateless allows you to scale your web servers horizontally pretty much effortlessly. It is very difficult to horizontally scale a database server.
Performance. 2 boxes = 2 times the CPU, 2 times the RAM, and 2 times the spindles for disk access.
All that being said, I can certainly see reasonable cases that none of those points really matter.

It doesn't really matter (you can quite happily run your site with web/database on the same machine), it's just the easiest step in scaling..
It's exactly what StackOverflow did - starting with single machine running IIS/SQL Server, then when it started getting heavily loaded, a second server was bought and the SQL server was moved onto that.
If performance is not an issue, do not waste money buying/maintaining two servers.

On the other hand, referring to a different blogging Scott (Watermasyck, of Telligent) - they found that most users could speed up the websites (using Telligent's Community Server), by putting the database on the same machine as the web site. However, in their customer's case, usually the db & web server are the only applications on that machine, and the website isn't straining the machine that much. Then, the efficiency of not having to send data across the network more that made up for the increased strain.

Tom is correct on this. Some other reasons are that it isn't cost effective and that there are additional security risks.
Webservers have different hardware requirements than database servers. Database servers fare better with a lot of memory and a really fast disk array while web servers only require enough memory to cache files and frequent DB requests (depending on your setup). Regarding cost effectiveness, the two servers won't necessarily be less expensive, however performance/cost ratio should be higher since you don't have to different applications competing for resources. For this reason, you're probably going to have to spend a lot more for one server which caters to both and offers equivalent performance to 2 specialized ones.
The security concern is that if the single machine is compromised, both webserver and database are vulnerable. With two servers, you have some breathing room as the 2nd server will still be secure (for a while at least).
Also, there are some scalability benefits since you may only have to maintain a few database servers that are used by a bunch of different web applications. This way you have less work to do applying upgrades or patches and doing performance tuning. I believe that there are server management tools for making these tasks easier though (in the single machine case).

I would think the big factor would be performance. Both the web server/app code and SQL Server would cache commonly requested data in memory and you're killing your cache performance by running them in the same memory space.

Security is a major concern. Ideally your database server should be sitting behind a firewall with only the ports required to perform data access opened. Your web application should be connecting to the database server with a SQL account that has just enough rights for the application to function and no more. For example you should remove rights that permit dropping of objects and most certainly you shouldn't be connecting using accounts such as 'sa'.
In the event that you lose the web server to a hijack (i.e. a full blown privilege escalation to administrator rights), the worst case scenario is that your application's database may be compromised but not the whole database server (as would be the case if the database server and web server were the same machine). If you've encrypted your database connection strings and the hacker isn't savvy enough to decrypt them then all you've lost is the web server.

One factor that hasn't been mentioned yet is load balancing. If you start off thinking of the web server and the database as separate machines, you optimize for fewer network round trips and also it gets easier to add a second web server or a second database engine as needs increase.

I agree with Daniel Earwicker - the security question is pretty much flawed.
If you have a single box setup with a webserver and only the database for that webserver on it, if that webserver is compromised you lose both the webserver and only the database for that specific application.
This is exactly the same as what happens if you lose the webserver on a 2-server setup. You lose the web server, and just the database for that specific application.
The argument that 'the rest of the DB server's integrity is maintained' where you have a 2-server setup is irrelevant, because in the first scenario, every other database server relating to every other application (if there are any) remain unaffected as well - being, as they are, hosted elsewhere.
Similarly, to the question posed by Kev 'what about all the other databases residing on the DB server? All you've lost is one database.'
if you were hosting an application and database on one server, you would only host databases on that server which related to that application. Therefore, you would not lose any additional databases in a single server setup when compared to a multiple server setup.
By contrast, in a 2 server setup, where the attacker had access to the Web Server, and by proxy, limited rights (in the best case scenario) to the database server, they could put the databases of every other application at risk by carrying out slow, memory intensive queries or maximising the available storage space on the database server. By separating the applications out into their own concerns, very much like virtualisation, you also isolate them for security purposes in a positive way.

I can speak from first hand experience that it is often a good idea to place the web server and database on different machines. If you have an application that is resource intensive, it can easily cause the CPU cycles on the machine to peak, essentially bringing the machine to a halt. However, if your application has limited use of the database, it would probably be no big deal to have them share a server.

Wow, No one brings up the fact that if you actually buy SQL server at 5k bucks, you might want to use it for more than your web application. If your using express, maybe you don't care. I see SQL servers run Databases for 20 to 30 applicaitions, so putting it on the webserver would not be smart.
Secondly, depends on whom the server is for. I do work for financial companies and the govt. So we use a crazy pain in the arse approach of using only sprocs and limiting ports from webserver to SQL. So if the web app gets hacked. The only thing the hacker can do is call sprocs as the user account on the webserver is locked down to only see/call sprocs on the DB. So now the hacker has to figure out how to get into the DB. If its on the web server well its kind of easy to get to.

It depends on the application and the purpose. When high availability and performance is not critical, it's not bad to not to separate the DB and web server. Especially considering the performance gains - if the appliation makes a large amount of database queries, a considerable amount of network load can be removed by keeping it all on the same system, keeping the response times low.

I listened to that podcast, and it was amusing, but the security argument made no sense to me. If you've compromised server A, and that server can access data on server B, then you instantly have access to the data on server B.

I think its because the two machines usually would need to be optimized in different ways. Other than that I have no idea, we run all our applications with the server-database on the same machine - granted we're not public facing - but we've had no problems.
I can't imagine that too many people care about one machine being compromised over both since the web application will usually have nearly unrestricted access to at the very least the data if not the schema inside the database.
Interested in what others might say.

Database licences are not cheep and are often charged per CPU, therefore by separating out your web-servers you can reduce the cost of your database licences.
E.g if you have 1 server doing both web and database that contains 8 CPUs you will have to pay for an 8 cpu licence. However if you have two servers each with 4 CPUs and runs the database on one server you will only have to pay for a 4 cpu licences

An additional concern is that databases like to take up all the available memory and hold it in reserve for when it wants to use it. You can force it to limit the memory but this can considerably slow data access.

Something not mentioned here, and the reason I am facing, is 0 downtime deployments. Currently I have DB/webserver on same machine and that makes updates a pain. If you they are on a seprate machine, you can perform A/B releases.
I.e.:
The DNS currently points to WebServerA
Apply sofware updates to WebServerB
Change DNS to point to WebServerB
Work on WebServerA at leisure for the next round of updates.
This works before the state is stored in the DB, on a separate server.

Arguing that there is a real performance gain to be had by running a database server on a web server is a flawed argument.
Since Database servers take query strings and return result sets, the data actually flowing from data server to web server is relatively small, but the horsepower required to process the query and generate the result set is relatively large. Optimizing performance around the data transfer time therefore is optimizing around the wrong thing.
Regarding security, there are advantages to having the data server on a different box than the web server. Having such a setup is not the be all and end all of security, but it is a step in the right direction.
Regarding scalability, it is easy and relatively cheap to add web servers and put them into cluster to handle increased traffic. It is not so easy and cheap to add data servers and cluster them. Also, web servers and data servers have different hardware needs, so multiple boxes help out with scalability.
If you are starting small and have only one box, then a good way would go would be to use virtual machines. Running the web server and data server in different VMs on one host gives you all the gains of separate boxes at the cost of one large box price.

Operating system is another consideration. While your database may require larger memory spaces and therefore UNIX, your web server - or more specifically your app server since you mention only two tiers - may be a .Net-based, and therefore require Windows.

Ok! Here is the thing, it is more Secure to have your DB Server installed on another Machine and your Application on the Web Server. You then connect your application to the DB with a Web Link. Thanks it.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight