Suggestions for a hosted database - database

I would like to have a SQL database online, but don't want to deal with its care and feeding. There are some commercial offerings out there for hosted DBs, for example Amazon SimpleDB. Can anybody suggest others, and if they used any of these services what their impressions were? Anything that helps me make an informed decision would be appreciated.
Edit: Since there's no one true answer, I've made this a community wiki.

Did you take a look the Amazon Relational Database Service. It is a MySql instance, and it is priced in a similar fashion to the EC2 products.

Google's AppEngine also has a SQL Database: http://code.google.com/appengine that is free, but it doesn't scale very well.
Amazon's SimpleDB is lacking a large chunk of the MySQL API, so if you want to go this route try and stick to SQL92 as much as possible. Also, keep in mind that you are changed per query. This means you want to make every query count. One way of doing that is by using relative updates:
UPDATE persondata SET age=age+1;
To be honest SimpleDB is a waste of money unless you need a large SQL cluster. I'd start off with a local sql db, when your load starts to get out of hand, move the sql db to its own server. After that, you will be looking at clustering your SQL db, and then SimpleDB starts to become an attractive solution.

Related

GCP Storage for large temporary data

I'm using a Cloud SQL instance to store two types of data: typical transactional data and large "read-only" data. Each of these read-only tables could have GBs of data and they work like snapshots that are refreshed once a day. The old data is totally replaced by the most recent data. The "read-only" tables reference data from the "transactional tables", but I don't necessarily need to perform joins between them, so they're kind of "independent".
In this context, I believe using Cloud SQL to store these kind of tables are going to be a problem in terms of billing. Because Cloud SQL is fully managed, I would be paying for maintenance work from Google and I wouldn't need any kind of maintenance for those specific tables.
Maybe there are databases more suitable for storing snapshot/temporary data. I'm considering to move those type of tables to another kind of storage, but it's possible that I would end up making the bill even higher. Or maybe I could continue using Cloud SQL for those tables and just unlog them.
Can anyone help me with this? Is there any kind of storage in GCP that would be great for storing large snapshots that are refreshed once a day? Or is there an workaround to make Cloud SQL not maintain those tables?
This is a tough question because there are a lot of options and a lot of things that could work. The GCP documentation page "Choosing a Storage Option" is very handy in this kind of cases. It has a flowchart to select a storage option based on the kind of data you want to store, a video that explains each storage option and a table with the description, strong points and use cases for each option. I would recommend to start there.
Also, if the issue with Cloud SQL is that is fully managed and pricy, you can set up MySQL on Google Compute Engine and manage it yourself. Is also fairly cheaper for the same machine. For a n1-standard-1, $0.0965 in Cloud SQL and $0.0475 in GCE (keep in mind that other charges may apply on top of the machine price)

Columnar Database Comparisons and DBA efforts

I'm trying to find a database solution and I came across Infobright and Amazon Redshift as potential solutions. Both are columnar databases. Infobright has been around for quite sometime whereas Amazon Redshift is newer.
What is the DBA effort between Infobright and Amazon Redshift?
How accessible is Infobright (API, query interface, etc.) vs AWS?
Where do both sit in your system architecture? Do the operate as a layer on top of your traditional RDBMS?
What is the DevOps effort to setting up both Infobright and Redshift?
I'm leaning a bit more towards Redshift because my application is hosted on AWS and I thought this would create tangible benefits in the long-run since everything is in AWS. Thank you in advance!
Firstly, I'll admit that I work for Infobright. I've done significant research into Redshift, and I feel I can give an honest opinion. I just wrote up a comparison between the two technologies; it can be found here: https://www.infobright.com/wp-content/plugins/download-monitor/download.php?id=37
DBA Effort - Infobright requires very little administration. You cannot index; you don't need to partition/etc. It's an SMP architecture and scales well. Thus, you won't be dealing with multiple nodes. Redshift is also fairly simple. You will need to maintain sorts as well as ensure Analyse is run enough.
Infobright uses a MySQL Shell. Thus, any tool that can utilize MySQL can utilize Infobright. Therefore, you have the same set of tools/interfaces/APIs for Infobright as you do with MySQL. AWS does have an SQL interface, and it does have some API capabilities. It does require that you load directly from S3. Infobright loads from flat files and named pipes from local or remote servers.
Both databases are analytic databases. You would not want to use either as a transactional database. Instead, you typically push data from your transactional system to your analytic database.
DevOps to setup Infobright will be lower than Redshift. However, Redshift is not that overly complicated either. Maintenance of the environment is more of a requirement for Redshift, though.
Infobright does have many AWS-specific installations. In fact, we have implementations that approach nearly 100TB of raw storage on one server. That said, Redshift with many nodes can achieve petabyte scale on an implementation.
There are other factors that can impact your choice. For example, Redshift has very nice failover/HA options already built-in. On the flipside, Infobright can support many concurrent queries and users; Redshift limits queries to 15 regardless of cluster size.
Take a look at the document, and feel free to contact me if you have any specific questions about either technology.

Cloud/hosted database/datastore services to replace local SQL Server instance

As a .NET web developer, I've always used SQL Server as my database store because it's already in the MSFT ecosystem and easy to work with from the .NET platform.
Recently, however, I had a computer almost literally blow up, and consequently lost all my data in SQL Server on that machine.
Now that I've got a new computer, I want to start using an off-site database so that this doesn't happen again. A database hosted by a third-party (i.e. hosting company) or cloud service.
It doesn't have to be SQL Server or even RMDBS necessarily, but if it's not, it'd be be something cutting-edge (e.g. redis, Cassandra, MongoDB, CouchDB, etc.) and not just MySQL or Postgre or something.
Does anyone have an recommendations for those with little financial means?
I'd like to be able to use it during development of projects, and if they ever go live, not have to migrate the data anywhere to a new service--keep the data right there where it is and point my live domain requiring the data to the same service it pointed while in development.
It's not so much a question of available hosted services as of what setup you want for your standard development environment. If one of the cloud datastores doesn't work for you, you can always get a virtual server and install whatever you need.
However, you may want to rethink the idea of putting dev databases in the cloud. Performance will not be as good as something running locally (particularly if you are working with things like bulk import), and turning a dev database into a production database isn't a particularly good idea. I think what you are really looking for is a combination of easy backup, schema management and data setup.
Backup on a live server is easy enough - either you are backing up the entire server or have a script that uploads the backup file somewhere. For dev I don't bother as I prefer to set up disposable environments - have code that can set up the database if it doesn't already exist and add any necessary default data. Most apps don't need much data unless there is some sort of import process involved, and the same code works quite nicely when you first set up the live environment.
Schema management is one of the more painful aspects of working with SQL and where NoSQL systems can make life a lot easier as most have the schema defined entirely by the code that is using it - I mostly use redis myself, but whether or not it is appropriate for you will depend on the type of project you work on - if you need a lot of joins or transactions you probably need SQL, but if you just need basic data storage most NoSQL platforms would be better.
May I suggest looking into Windows Azure table storage? It is quiet different from pure relational play of SQL Server, is the "next big thing" from Microsoft and is in general a somewhat of a paradigm shift for folks used to relational databases.
If you're ever going to come face to face with Azure in the future (and I suspect many .NET people will), it maybe a beneficial of an experience to have.
With respect to costs, they're negligible for individual use. 10,000 transactions a month cost a penny. A gigabyte per month of storage costs 15 cents, and data transfers are 10-15cents per gigabyte.
If you have only "development" projects that store their data in the cloud, I'll be damned if you pay more than $2-3/month to MS... if that :)
Google Cloud Datastore is in beta now and could be a good option for you. It's free up to 1GB and 50K requests per day. The API is rather low level. However, I wrote a high level ORM for GCD called Pogo that serializes and deserializes plain old objects into GCD entities.
Take a look at the documentation and open source here - http://code.thecodeprose.com/pogo
It's also available on Nuget called "Pogo".

How would you migrate hundreds of MS Access databases to a central service?

We have literally 100's of Access databases floating around the network. Some with light usage and some with quite heavy usage, and some no usage whatsoever. What we would like to do is centralise these databases onto a managed database and retain as much as possible of the reports and forms within them.
The benefits of doing this would be to have some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
There is no real constraints on RDBMS (Oracle, MS SQL server) or the stack it would run on (LAMP, ASP.net, Java) and there obviously won't be a silver bullet for this. We would like something that can remove the initial grunt work in an automated fashion.
We upsize (either using the upsize wizard or by hand) users to SQL server. It's usually pretty straight forward. Replace all the access tables with linked tables to the sql server and keep all the forms/reports/macros in access. The investment in access isn't lost and the users can keep going business as usual. You get reliability of sql server and centralized backups. Keep in mind - we’ve done this for a few large access databases, not hundreds. I'd do a pilot of a few dozen and see how it works out.
UPDATE:
I just found this, the sql server migration assitant, it might be worth a look:
http://www.microsoft.com/sql/solutions/migration/default.mspx
Update: Yes, some refactoring will be necessary for poorly designed databases. As for how to handle access sprawl? I've run into this at companies with lots of technical users (engineers esp., are the worst for this... and excel sprawl). We did an audit - (after backing up) deleted any databases that hadn't been touched in over a year. "Owners" were assigned based the location &/or data in the database. If the database was in "S:\quality\test_dept" then the quality manager and head test engineer had to take ownership of it or we delete it (again after backing it up).
Upsizing an Access application is no magic bullet. It may be that some things will be faster, but some types of operations will be real dogs. That means that an upsized app has to be tested thoroughly and performance bottlenecks addressed, usually by moving the data retrieval logic server-side (views, stored procedures, passthrough queries).
It's not really an answer to the question, though.
I don't think there is any automated answer to the problem. Indeed, I'd say this is a people problem and not a programming problem at all. Somebody has to survey the network and determine ownership of all the Access databases and then interview the users to find out what's in use and what's not. Then each app should be evaluated as to whether or not it should be folded into an Enterprise-wide data store/app, or whether its original implementation as a small app for a few users was the better approach.
That's not the answer you want to hear, but it's the right answer precisely because it's a people/management problem, not a programming task.
Oracle has a migration workbench to port MS Access systems to Oracle Application Express, which would be worth investigating.
http://apex.oracle.com
So? Dedicate a server to your Access databases.
Now you have the benefit of some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
This is what you were going to do anyway, only you wanted to use a different database engine instead of NTFS.
And now you have to force the users onto your server.
Well, you can encourage them by telling them that you aren't going to overwrite their data with old backups anymore, because now you will own the data, and you won't do that anymore.
Also, you can tell them that their applications will run faster now, because you are going to exclude the folder from on-access virus scanning (you don't do that to your other databases, which is why they are full of sql-injection malware, but these databases won't be exposed to the internet), and planning to turn packet signing off (you won't need that on a dedicated server: it's only for people who put their file-share on their domain-server).
Easy upgrade path, improved service to users, greater centralization and control for IT. Everyone's a winner.
Further to David Fenton's comments
Your administrative rule will be something like this:
If the data that is in the database is just being used by one user, for their own work (alone), then they can keep it in their own network share.
If the data that is in the database is for being used by more than one person (even if it is only two), then that database must go on a central server and go under IT's management (backups, schema changes, interfaces, etc.). This is because, someone experienced needs to coordinate the whole show or we will risk the time/resources of the next guy down the line.

When is it time to change database backends?

Is there a general rule of thumb to follow when storing web application data to know what database backend should be used? Is the number of hits per day, number of rows of data, or other metrics that I should consider when choosing?
My initial idea is that the order for this would look something like the following (but not necessarily, which is why I'm asking the question).
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
It's not quite that easy. The only general rule of thumb is that you should look for another solution when the current one can't keep up anymore. That could include using different software (not necessarily in any globally fixed order), hardware or architecture.
You will probably get a lot more benefit out of caching data using something like memcached than switching to another random storage backend.
If you think you are going to ever need one of the heavyweights (SqlServer, Oracle), you should start with one of those at the beginning. Data migrations are extremely difficult. In the long run it will cost you less to just start at the top and stay there.
I think you're being overly specific in your rankings. You can pretty much start with flat files and the like for very small data sets, go up to something like DBM for slightly bigger ones that don't require SQL-like syntax, and go to some kind of SQL database after that.
But who wants to do all that rewriting? If the application will benefit from access to joins, stored procedures, triggers, foreign key validation, and the like--just use a SQL database regardless of the dataset size.
Which one should depend more on the client's existing installations and what DBA skills are available than on the amount of data you're holding.
In other words, the size of your database is far from the only consideration, and maybe not the most important one.
There is no blanket answer to this, but ALMOST always, using flat files is not a good idea. You have to parse through them (i suppose) and they do not scale well. Starting with a proper database, like Oracle or SQL Server (or MySQL, Postgres if you are looking for free options) is a good idea. For very little overhead, you will save yourself a lot of effort and headache later on. They also allow you to structure your data in a non-stupid fashion, leaving you free to think of WHAT you will do with the data rather than HOW you will be getting it in/out.
It really depends on your data, and how you intend to use it. At one of my previous positions, we used Postgres due to the native geo-location and timezone extensions which existed because it allowed us to manage our data using polygonal datatypes. For us, we needed to do that, and we also wanted to use stored procedures, views and the like.
Now, another place I worked at used MySQL simply because the data was normalized, standard row by row data.
SQL Server, for a long time, had a 4gb database limit (see SQL Server 2000), but despite that limitation it remains a very stable platform for small to medium applications for which the old data is purged.
Now, from working with Oracle and SQL Server 05/08, all I can tell you is that if you want the creme of the crop for stability, scalability and flexibility, then these two are your best bet. For enterprise applications, I strongly recommend them (merely because that's what we use where I work now).
Other things to consider:
Language integration (ASP.NET session storage, role management, etc.)
Query types (Select, Update, Delete) [Although this is more of a schema design issue, not a DBMS issue)
Data storage requirements
Your application's utilization of the database is the most critical ones. Mainly what queries are used most often (SELECT, INSERT or UPDATE)?
Say if you use SQLite, it is gears for smaller application but for "web" application you might a bigger one like MySQL or SQL Server.
The way you write scripts and your web application platforms also matters. If you're developing on a Microsoft platform, then SQL Server is a better alternative.
Typically, I go with what is commonly accepted by whichever framework I am using. So, if I'm doing .NET => SQL Server, Python (via Django or Pylons) => MySQL or SQLite.
I almost never use flat files though.
There is more to choosing an RDBMS solution that just "back end horsepower". The ability to have commitment control, for example, so you can roll back a failed transaction is one. reason.
Unless you are in the megatransaction rate application, most database engines would be adequate - so it becomes a question of how much you want to pay for the software, whether it runs on the hardware and operating system environment you want, and what expertise you have in managing that software.
That progression sounds painful. If you're going to include MS products (especially the for-pay SQL Server) in there anywhere, you may as well use the whole stack, since you only have to pay for the last of these:
SQL Server Compact -> SQL Server Express -> SQL Server Enterprise (clustered).
If you target your app at SQL Server Compact initially, all your SQL code is guaranteed to scale up to the next version without modification. If you get bigger than SQL Server Enterprise, then congratulations. That's what they call a good problem to have.
Also: go back and check the SO podcasts. I believe they talked about this briefly.
This question depends on your situation really.
If you have control over the server you're deploying to and you can install whatever services you need, then the time to install a MySql or MSSQL Express server and code against an existing database framework VERSUS coding against flat file structure is not worth the effort of considering.
What about FireBird? Where would that fit into that list?
And lets not forget the requirements that the "customer" of your solution must also have in place. If your writing a commercial application for a small companies, then Oracle might not be a good choice... but if your writing a customized solution for a large enterprise which must share data among multiple campuses, and has a good sized IT department then the decision of Oracle vs Sql Server would come down to what does the customer most likely already have deployed.
Data migration nowdays isn't that bad since we have those great tools from Embarcadero, so I would instead let the customer needs drive the decision.
If you have the option SQL Server is a good choice from the word go, predominantly because you have access to solid procedures and functions and the database backup facilities are totally reliable. Wrapping up as much as your logic as you can inside the database itself (rather than in whatever language you are using) helps security and performance - indeed there's an good argument to be made for always using procedures for insert/update logic as these make you invulnerable to injection attacks.
If I have the choice the only time I'd consider MySQL in preference is with a large, fairly simple, database predominantly used for read access. This isn't to decry MySQL which has improved markedly of late and I happily use if I don't have the choice, but for more complex systems with update/insert activity MSSQL is generally the superior option.
I think your list is subjective but I will play your game.
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
Teradata

Resources