How do SQL Azure and Azure Table Storage compare? [closed]

Currently I have a prototype that runs in IIS on my local machine and uses SQL Server Express 2005 for storing data in three SQL tables. I run queries with transactions that employ up to two of those three tables.
Now I need to move my prototype to Windows Azure and can't decide which to choose - SQL Azure or Azure Table Storage.
How do they compare? How do I decide which to choose?

Windows Azure's SQL Database is a relational database, with all the things you'd expect from a relational database (multiple indexes, stored procedures, powerful queries, etc.). Azure Table Storage is a non-relational, massively scalable (up to 100TB per account) storage facility, where entities are located by partition key (a colocation of entities) and row key.
If you want to have a very simple storage mechanism that doesn't require sophisticated relational operations, Azure Table Storage will work quite nicely.
EDIT June 7, 2012: Updated with Spring Release pricing
There are cost differences too. SQL Database starts at $4.99 for 100MB, scaling up on a tiered scale (about $26 for 5GB, $125 for 50GB, $225 for 150GB) but has no transactional costs. Azure Table Storage runs $0.125/GB (or $0.09/GB without geo-replication), dropping in per-GB price as quantity goes up, but has a $0.01 per 100,000 transactions cost (nominal for low-volume systems, but potentially significant for very high-volume systems). Full pricing details are here.
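To make that concrete with the figures above: 50GB costs a flat ~$125/month in SQL Database, versus about 50 × $0.125 = $6.25/month in Table Storage; even 100 million storage transactions a month would only add (100,000,000 ÷ 100,000) × $0.01 = $10. Storage-heavy workloads with simple access patterns are therefore dramatically cheaper in Table Storage.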
There's a fairly recent article in MSDN Magazine that goes into greater detail regarding use cases, differences, etc.
If you're going for a straightforward migration of what you have in place today, SQL Database will closely match what you have in SQL Server Express 2005. However, since it's only a prototype at this point, it's worth re-evaluating your needs.

"How do I decide which to choose?"
Good question.
You have to work out what you care about, and evaluate the options against each other.
The key difference in my view is that SQL Azure is less scalable than table storage. If you're expecting huge numbers of visitors, your database may not be able to keep up, and it becomes the bottleneck that stops you from scaling any further - in the Cloud, you can keep adding front-end servers until your credit card bleeds, but once you've gone to a "big" database server, you have nowhere else to go.
(Except that's not really true - you can also find a way of partitioning your application across multiple database servers).
So, if you care about scalability, you may want to go to table storage - it doesn't have the same scalability limits as SQL Azure.
However, the cost of that scalability is stuff you might also care about - you basically have to architect your application from scratch to work with table storage, and you have to recreate a lot of stuff you get for free from a relational database. Transactions don't really work the way you might expect, for instance.
So, if it's only a prototype, and you're not explicitly intending to become the next Facebook, I'd stay with SQL Azure until the scalability pain becomes real.

Related

Multiple Databases Vs Single Database with logically partitioned data [closed]

I am pondering over a database design issue. Any help would be highly appreciated.
We are designing an application which has 20 tables (which may grow to about 30 maximum during new feature development)
The technology stack
MVC4, .NET 4.x, Entity Framework 5, SQL Server 2012, ASP.NET membership framework
Number of users
We intend to cater to about 1000 clients who would have on average 20 users.
The Question
Should we design the database and the application in such a way that the tables are logically partitioned, i.e. all clients use the same tables, with a partition guid to separate the data?
OR
Go for multiple databases, which could prove difficult during new feature launches and bug fixing, BUT could potentially allow for scaling?
Caveats: one of the tables has a binary column that stores files (maximum 5MB per record).
In addition to this, we need to consider the Membership framework tables, which we will be extending with another custom table, logically mapping users to a partition guid.
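(For concreteness, a minimal sketch of the logically partitioned option; table and column names are illustrative only:)

    -- Shared schema: every row carries the partition guid,
    -- and every query must filter on it.
    CREATE TABLE dbo.Client (
        ClientId uniqueidentifier NOT NULL PRIMARY KEY,  -- the partition guid
        Name     nvarchar(200)    NOT NULL
    );

    CREATE TABLE dbo.Document (
        DocumentId int              IDENTITY(1,1) NOT NULL PRIMARY KEY,
        ClientId   uniqueidentifier NOT NULL
                   REFERENCES dbo.Client (ClientId),
        Content    varbinary(max)   NULL   -- the binary file column (<= 5MB per record)
    );

    -- Typical access pattern:
    -- SELECT ... FROM dbo.Document WHERE ClientId = @ClientId;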
You'll wish you had used separate databases:
If you ever want to grant permissions to the databases themselves to clients or superusers.
If you ever want to restore just one client's database without affecting the data of the others.
If there are regulatory concerns governing your data and data breaches, and you belatedly discover that these regulations can only be met by having separate databases. (Update: a little over 4 years after the writing of this answer, GDPR went into effect)
If you ever want to easily move your customer data to multiple database servers or otherwise scale out, or move larger/more important customers to different hardware. In a different part of the world.
If you ever want to easily archive and decommission old customer data.
If your customers care about their data being siloed, and they find out that you did otherwise.
If your data is subpoenaed and it's hard to extract just one customer's data, or the subpoena is overly broad and you have to produce the entire database instead of just the data for the one client.
When you forget to maintain vigilance and just one query slips through that didn't include AND CustomerID = @CustomerID. Hint: use a scripted permissions tool, or schemas, or wrap all tables with views that include WHERE CustomerID = SomeUserReturningFunction(), or some combination of these (see the sketch after this list).
When you get permissions wrong at the application level and customer data is exposed to the wrong customer.
When you want to have different levels of backup and recovery protection for different clients.
Once you realize that building an infrastructure to create, provision, configure, deploy, and otherwise spin up/down new databases is worth the investment because it forces you to get good at it.
When you didn't allow for the possibility of some class of people needing access to multiple customers' data, and you need a layer of abstraction on top of Customer because WHERE CustomerID = @CustomerID won't cut it now.
When hackers target your sites or systems, and you made it easy for them to get all the data of all your customers in one fell swoop after getting admin credentials in just one database.
When your database backup takes 5 hours to run and then fails.
When you have to get the Enterprise edition of your DBMS so you can make compressed backups so that copying the backup file over the network takes less than 5 hours more.
When you have to restore the entire database every day to a test server which takes 5 hours, and run validation scripts that take 2 hours to complete.
When only a few of your customers need replication and you have to apply it to all of your customers instead of just those few.
When you want to take on a government customer and find out that they require you to use a separate server and database, but your ecosystem was built around a single server and database and it's just too hard or will take too long to change.
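To illustrate the view-wrapping hint above (a minimal sketch; the table, columns, and helper function are made up, and SESSION_CONTEXT requires SQL Server 2016 or later - older versions can use CONTEXT_INFO instead):

    -- Hypothetical helper that resolves the current tenant from session state.
    CREATE FUNCTION dbo.CurrentCustomerId()
    RETURNS int
    AS
    BEGIN
        RETURN CONVERT(int, SESSION_CONTEXT(N'CustomerID'));
    END;
    GO

    -- Applications get rights on the view only, never the base table,
    -- so the tenant filter cannot be forgotten.
    CREATE VIEW dbo.v_Orders
    AS
    SELECT OrderID, CustomerID, OrderDate, Total
    FROM   dbo.Orders
    WHERE  CustomerID = dbo.CurrentCustomerId();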
You'll be glad you used separate databases:
When a pilot rollout to one customer completely explodes and the other 999 customers are completely unaffected. And you can restore from backup to fix the problem.
When one of your database backups fails and you can fix just that one in 25 minutes instead of starting the entire 10-hour process over again.
You'll wish you had used a single database:
When you discover a bug that affects all 1000 clients and deploying the fix to 1000 databases is hard.
When you get permissions wrong at the database level and customer data is exposed to the wrong customer.
When you didn't allow for the possibility of some class of people needing access to a subset of all the databases (perhaps two customers merge).
When you didn't think how hard it would be to merge two different databases of data.
When you've merged two different databases of data and realize one was the wrong one, and you didn't plan for recovering from this scenario.
When you try to grow past 32,767 customers/databases on a single server and find out that this is the maximum in SQL Server 2012.
When you realize that managing 1,000+ databases is a bigger nightmare than you ever imagined.
When you realize that you can't onboard a new customer just by adding some data in a table, and you have to run a bunch of scary and complicated scripts to create, populate, and set permissions on a new database.
When you have to run 1000 database backups every day, make sure they all succeed, copy them over the network, restore them all to a test database, and run validation scripts on every single one, reporting any failures in a way that is guaranteed to be seen and that is easily and quickly actionable. And then 150 of these fail in various places and have to be fixed one at a time.
When you find out you have to set up replication for 1000 databases.
Just because I listed more reasons for one doesn't mean it is better.
Some readers may get value from MSDN: Multi-Tenant Data Architecture. Or perhaps SaaS Tenancy App Design Patterns. Or even Developing Multi-tenant Applications for the Cloud, 3rd Edition
If you are referring to your architecture as "multi-tenant", Microsoft has a good article that is worth reading here. It shows a comparison between "isolated" (multiple databases) and "shared" (single database). Generally, shared wins when the number of tenants (clients) is big, but when the size of each tenant is big, an isolated approach is recommended.
Those considerations, however, can only really be weighed by experienced developers.
Even if you do manage to use the isolated (multiple-database) architecture, you won't get a direct performance benefit while the databases all run on the same instance. And if you use the shared (single-database) architecture, consider using an int instead of a guid for the key, or a sequential guid if you still need one (see the sketch below).
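A minimal sketch of the sequential-guid point (illustrative names; note that NEWSEQUENTIALID() can only be used in a column default):

    -- Random guids (NEWID()) scatter inserts all over a clustered index,
    -- causing page splits and fragmentation. NEWSEQUENTIALID() generates
    -- ever-increasing values, so new rows append to the end instead.
    CREATE TABLE dbo.Customer (
        CustomerId uniqueidentifier NOT NULL
            CONSTRAINT DF_Customer_Id DEFAULT NEWSEQUENTIALID()
            CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
        Name nvarchar(200) NOT NULL
    );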

SQL Server: fundamental design issue? [closed]

The director of the company I work for met another director of a software company at a party (this is not a joke!).
The second director told my director that,
'he had spent a fortune abandoning SQL Server' because 'if several people were querying the same table in different ways the database locked up'. Further, this is a 'known fundamental design issue with SQL Server where each query competes for maximum resources until SQL Server locks up after about 7 concurrent queries to the same table'.
Now I know a bit about SQL Server locking and IO, and this is news to me. AFAIK there is nothing inherent in the SQL Server architecture that causes such problems. And SQL Server performs well in the TPC benchmarks, especially on price/performance.
Feel embarrassed asking but I have to be sure - is there any grain of truth to what he said ?
EDIT - after reading some of the comments, I thought I'd make it clear that I agree it's possible to write poorly performing SQL in SQL Server, as it is on any DB platform. I'm sort of asking whether there is anything inherent in the architecture that locks it up under certain high-concurrency conditions, irrespective of how well your DB/SQL is crafted.
There is absolutely no truth to this, and the other director clearly had one of the following two scenarios in his IT department, which can be found at very many firms:
No DBA, or the (idiot) director was the DBA, or someone with no DBA qualifications whatsoever was forced to act as the DBA
An incompetent DBA or incompetent IT employees who created the DB schema(s) and queried the database in incompetent ways given those schemas. Also, see TomTom's comment to this answer below, which expands and expounds on this item further.
Also, do have a look at SQL Server's market dominance at the moment, before coming to any conclusions of its inability to do something.
That is not to say that SQL Server is perfect and has no "fundamental design issues," such as, for example, this unbelievable bug you can encounter when using identity columns, which practically everybody uses for surrogate keys:
SCOPE_IDENTITY() sometimes returns incorrect value
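The workaround usually cited for that bug (it shows up under parallel query plans) is to capture generated keys with the OUTPUT clause instead of reading SCOPE_IDENTITY() afterwards. A minimal sketch, with a made-up table:

    -- Instead of: INSERT ...; SELECT SCOPE_IDENTITY();
    -- capture the identity value atomically as part of the INSERT.
    DECLARE @NewIds TABLE (Id int);

    INSERT INTO dbo.Widget (Name)      -- dbo.Widget is hypothetical
    OUTPUT INSERTED.Id INTO @NewIds    -- Id is its identity column
    VALUES (N'sprocket');

    SELECT Id FROM @NewIds;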
I know you can lock up SQL Server accidentally or with some poorly written queries, but I wouldn't say it is a design flaw with SQL Server itself.
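For what it's worth, the usual first lever for reader/writer blocking under the default READ COMMITTED isolation level is row versioning; a sketch, with a placeholder database name:

    -- Readers see the last committed version of a row instead of waiting
    -- on writers' locks (and vice versa). Switching requires that no other
    -- connections are active in the database.
    ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;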
I have worked on software that was deployed across the UK, 80+ stores all querying the same tables, and never had any problems. Our company also produced much larger software built on a SQL database, with hundreds of stores and anywhere from 10 to 100+ applications all querying/updating the same tables; we had a few issues with deadlocks, but they were all resolved.
With the right people and knowledge, SQL Server is an amazing tool; when you take any old programmer and make them the DBA, you get problems, which I suspect is what happened here.
When I hear stories like this, unless the person telling me is developing some huge enterprise-level application, I always think: if big businesses can manage to use SQL Server without it giving them huge problems, then me saying I can't use it in my smaller application is just me doing it wrong.
It would be interesting to know what they migrated to instead of SQL though.
Poorly written queries can result in the behavior he is talking about. This is not a reason to abandon SQL Server though, it is a reason to look for better developers/DBAs.
Any technology, when used incorrectly can appear to have problems. There are a number of huge projects out there that are built on SQL Server (including StackOverflow). If what he was describing were a result of SQL Server and not the developers then SQL Server would most likely not even be around...

What's the best DB to store banking transactions?

We are planning to create a web app to store banking transactions for customers, e.g. purchases, transfers, etc., and allow them to tag/categorize each transaction.
Could someone point us to the best DB for this purpose? It needs to scale horizontally and we also need to perform analysis on all transactions.
Thanks
The best database to store banking transactions is the one the banks use, DB2/z.
But, since I doubt you'd be able to afford a System z mainframe, that's probably not an option. That doesn't make it any less the best database of course.
If, however, you're talking about storing transactions for Joe Bloggs or Dodgy Brothers Rug Emporium (as opposed to the two hundred million or so customers of ICBC), pretty well any database will be up to the task - Oracle (despite its inability to differentiate NULLs from empty strings), SQL Server, MySQL, PostgreSQL, even SQLite probably.
I'm going to start this by saying it's almost impossible to recommend a system based on what you've described. It could be for such a varied number of uses, ranging from mission-critical real-time financial data that needs to be there and needs to be accurate, through to a web app that sucks in financial records from a bank/credit card statement and lets the user annotate them, in which case it isn't as sensitive.
If you're storing mission critical, sensitive data, I'd go with a commercial option that includes significant support. Also a DBA would be a good idea.
Oracle or MS SQL would be my inclination, and probably Oracle over MS SQL because of its multi-platform support. If you're happy to run on Windows, then MS SQL is fine.
If you're storing existing transactions that can be tagged (à la Blippy), then any database would be sufficient. If you're thinking of scaling this out to the nth degree, you might like one of the document-database flavours of the month (MongoDB, Couch, etc.).
Really, I think the question should be reconsidered from the context of what your application will do, not the fact that it happens to do it with financial data. The fact that financial data may require additional security or additional accuracy checks forms part of what the system will do, as does the way the user interacts with your web app, etc.
This may not answer your question directly, but here is what I have experienced.
I think it's really about how you'd save your banking transactions. Most database vendors provide a sufficient amount of performance, so all you have to do is choose one over the other.
What you are left with is the actual information to be saved (besides the schema). You might think about using a database encryption option, but it may not be realistic in your case: because you are talking about transactions, I assume there are quite a lot of them coming in, and you are doing a large amount of reads for your reporting (besides writes), possibly for mining, etc.
Usually (in SQL Server), with encryption enabled, any data written to the database file is encrypted. Snapshots and backups also use encryption, and the transaction log is protected as well, so encryption would eat into the performance you're after.
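For reference, what's being described here is SQL Server's Transparent Data Encryption (an Enterprise edition feature). A minimal setup sketch; the certificate name, database name, and password are placeholders:

    USE master;
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
    CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate';
    GO

    USE BankingDb;   -- placeholder database name
    CREATE DATABASE ENCRYPTION KEY
        WITH ALGORITHM = AES_256
        ENCRYPTION BY SERVER CERTIFICATE TdeCert;
    GO

    -- From here on, data and log files (and backups) are encrypted at rest.
    ALTER DATABASE BankingDb SET ENCRYPTION ON;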
So, I see your question really boiling down to: how do I protect sensitive data?
Btw, I have deployed solutions with Oracle, SQL Server, and even Sybase as backends, with transactions pouring in from several ATMs, and what I really look for is performance, besides security. Except for minute limitations of one over another, they are all the same.
The following articles might help:
Database security: protecting sensitive and critical information
Using One-Way Functions to Protect Sensitive Information in SQL Server Databases

Why would you use Oracle database? [closed]

I'm curious for technical reasons why you choose Oracle database versus the latest flavors of:
1) Microsoft SQL Server
2) MySQL
3) PostgreSQL
What features or functionality justify the extra cost?
I'm interested in technical arguments, not a religious war.
A friend asked me this and I've always used one of the 3 I listed.
I didn't know enough about Oracle Databases to offer an opinion.
Thanks.
No one seems to talk about the cost of developers' time working with Oracle. Most developers who know any other DB hate Oracle; those that don't assume that all DB code and/or ORM tools are equally difficult to use.
If I started a business that I believed was going to scale to Amazon proportions, I might consider NoSQL solutions; otherwise I'd choose PostgreSQL, SQL Server (or indeed even Sybase now) over Oracle every time. I say this having worked (as a dev) with Oracle for 2 years - it's terrible to work with!
Only Oracle and Microsoft's SQL Server are closed source, and when something goes wrong and you have a problem, the answer is just a phone call away (and cash, of course). MySQL and PostgreSQL have several enterprise consulting services, but in the end these consultants aren't really responsible for the product, because the product belongs to everyone. Which is great, because you can go in and fix the code if you are good with C and relatively low-level programming; but if you aren't, finding the solution might become a wild goose chase.
Since not everyone is skilled enough, and enterprises with money prefer the security (in the business sense) of closed-source databases, these solutions haven't gone out of business - besides the fact that their implementations are solid and worth the money if you have it.
Finally, the most important difference is between SQL Server and Oracle, and that difference is the OS: most people using Windows will stick with, you guessed it, SQL Server, but if you run on flavors of Unix, Oracle is your closed-source solution. I use Oracle on Solaris, but if our target were Windows I would probably use SQL Server, because both products are rock solid, and I trust Microsoft has some special tricks under the hood to get the best performance on Windows.
Just to name a few:
Oracle Real Application Cluster - provides advanced clustering features
Oracle Data Guard - in short provides physical and logical stand-by features.
Oracle Exadata - implements the database aware storage (that can do predicate filtering, column projection filtering, join processing, hastens tablespace creation). The solution comes with HP servers, full 24/7 warranty, and other nice things. It's quite nice for applications with highly intensive data loading (for example thanks to the independent tablespace creation).
Oracle Virtualization
And of course the magic of the brand ;)
And when it comes to choosing the RDBMS? Usually the choice is pretty obvious - Oracle or the rest of the world. After that you can narrow the choice down by:
platform (windows-only or not)
weight (sqlite, MySQL, PostgreSQL, ...)
budget (initial license cost, maintenance + support cost)
evolution perspectives, for example:
Oracle Express -> Oracle
SQL Server Express -> MSSQL
business perspectives - a "secure, well-known product" or an open-source product (bear in mind the quotation marks around the first phrase). Another post looks deeper into this aspect.
The real question is what kind of application is going to make use of the RDBMS. You certainly don't need Oracle for your WordPress blog or Twitter clone. But if you want to do some heavy business intelligence, then Oracle might have some features that help you do it more efficiently than the others.
MS SQL Server is very good as well; it has tons of features. If you are stuck on Linux and you need a database with features like those offered by MS SQL, then Oracle would be a good pick.
I think it's because Oracle was the first RDBMS that supported "sharding".
The costs of SQL Server and Oracle are not that far apart, you know.
In fact for small systems the cost of Oracle vs Your Favourite Free Database is between zero (Oracle Express Edition) and not-very-big ($5,800 processor perpetual for Standard Edition One).
Here's a link to the capabilities of the various editions in 11g: http://www.oracle.com/database/product_editions.html.
List prices are available for all territories at http://store.oracle.com -- typically large companies do not pay retail, of course ;)

SQL Server and Oracle, which one is better in terms of scalability? [closed]

MS SQL Server and Oracle, which one is better in terms of scalability?
For example, if the data size reaches 500 TB, etc.
Both Oracle and SQL Server are shared-disk databases so they are constrained by disk bandwidth for queries that table scan over large volumes of data. Products such as Teradata, Netezza or DB/2 Parallel Edition are 'shared nothing' architectures where the database stores horizontal partitions on the individual nodes. This type of architecture gives the best parallel query performance as the local disks on each node are not constrained through a central bottleneck on a SAN.
Shared-disk systems (such as Oracle Real Application Clusters or clustered SQL Server installations) still require a shared SAN, which has constrained bandwidth for streaming. On a VLDB this can seriously restrict the table-scanning performance that can be achieved. Most data warehouse queries run table or range scans across large blocks of data; if the query will hit more than a few percent of rows, a single table scan is often the optimal query plan.
Multiple local direct-attach disk arrays on nodes give more disk bandwidth.
Having said that, I am aware of an Oracle DW shop (a major European telco) that has an Oracle-based data warehouse that loads 600 GB per day, so the shared-disk architecture does not appear to impose insurmountable limitations.
Between MS SQL and Oracle there are some differences. IMHO Oracle has better VLDB support than SQL Server, for the following reasons:
Oracle has native support for bitmap indexes, an index structure suitable for high-speed data warehouse queries. They essentially trade CPU for I/O, as they are run-length encoded and use relatively little space. On the other hand, Microsoft claims that index intersection is not appreciably slower.
Oracle has better table-partitioning facilities than SQL Server. IIRC, table partitioning in SQL Server 2005 can only be done on a single column (see the sketch after this list).
Oracle can be run on somewhat larger hardware than SQL Server, although one can run SQL server on some quite respectably large systems.
Oracle has more mature support for materialized views and query rewrite to optimise relational queries. SQL 2005 does have some query-rewrite capability, but it is poorly documented and I haven't seen it used in a production system. However, Microsoft will suggest that you use Analysis Services, which does support shared-nothing configurations.
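To illustrate the single-column limitation mentioned above (a sketch with illustrative names; SQL Server partitions a table via a partition function plus a partition scheme, keyed on exactly one column):

    -- The partition function maps values of ONE column to partitions.
    CREATE PARTITION FUNCTION pfByYear (datetime)
        AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01');

    -- The scheme maps those partitions to filegroups.
    CREATE PARTITION SCHEME psByYear
        AS PARTITION pfByYear ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Sales (
        SaleId   int      NOT NULL,
        SaleDate datetime NOT NULL
    ) ON psByYear (SaleDate);   -- the single partitioning column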
Unless you have truly biblical data volumes and are choosing between Oracle and a shared nothing architecture such as Teradata you will probably see little practical difference between Oracle and SQL Server. Particularly since the introduction of SQL2005 the partitioning facilities in SQL Server are viewed as good enough and there are plenty of examples of multi-terabyte systems that have been successfully implemented on it.
When you are talking 500TB, that is (a) big and (b) specialized.
I'd be going to a consultancy firm with appropriate specialists to look at the existing skill sets, integration with existing technology stacks, expected usage, backup/recovery/DR requirements....
In short, it's not the sort of project I'd be heading into based on opinions from Stack Overflow. No offence intended, but there are simply too many factors to take into account, many of which would be business confidential.
Whether Oracle or MSSQL will scale/perform better is question #15. The data model is the first make-or-break item, regardless of whether you're running Oracle, MSSQL, Informix or anything else. Data-model structure, what kind of application it is, how it accesses the DB, and which platform your developers know well enough to target for a large system are the first questions you should ask yourself.
I've worked as a DBA on Oracle (although some years back) and I use MSSQL extensively now, although not as a formal DBA. My advice would be that in the vast majority of cases both will meet everything you can throw at them, and your performance issues will be much more dependent upon database design and deployment than on the underlying characteristics of the products, which in both cases are absolutely and utterly solid (MSSQL is the best product that MS makes, in many people's opinion, so don't let the usual perception of MS blind you on that).
Myself I would tend towards MSSQL unless your system is going to be very large and truly enterprise level (massive numbers of users, multiple 9's uptime etc.) simply because in my experience Oracle tends to require a higher level of DBA knowledge and maintenance than MSSQL to get the best out of it. Oracle also tends to be more expensive, both for initial deployment and in the cost to hire DBAs for it. OTOH if you are looking at an enterprise system then Oracle would have the edge, not least because if you can afford it their support is second to none.
I have to agree with those who said design was more important.
I've worked with super-fast and super-slow databases of many different flavors (the absolute worst being an Oracle database, but it wasn't Oracle's fault). The design of the database, and how you decide to index it, partition it, and query it, has far more to do with scalability than whether the product is MS SQL Server or Oracle.
I think you may more easily find Oracle DBAs with terabyte-database experience (running a large database is a specialty, just like knowing a particular flavor of SQL), but that could depend on your local area.
Oracle people will tell you Oracle is better; SQL Server people will tell you SQL Server is better.
I say they scale pretty much the same. Use what you know better. There are databases out there of that size on Oracle as well as SQL Server.
When you get to OBSCENE database sizes (where over 1TB is already really big, and 500TB is frigging massive), operational support must come very high up the list of requirements. With that much data, you don't mess about with penny-pinching system specifications.
How are you going to back up a system that size? Upgrade the OS and patch the database? Are scalability and reliability a concern?
I have experience of both Oracle and MS SQL, and for the really, really big systems (in users, data, or importance), Oracle is better designed for operational support and data management.
Ever tried to back up and restore a 1TB+ SQL Server database split over multiple databases on multiple instances, with transaction log files being spat out everywhere by each database, while trying to keep it all in sync? Good luck with that.
With Oracle, you have ONE database (so I disagree that the "shared nothing" approach is better) with ONE set of REDO logs(1) and one set of archive logs(2), and you can just add extra hardware nodes without changing (i.e. repartitioning) your application and data.
(1) Redo logs are, of course, mirrored.
(2) Archive logs are, of course, stored in multiple locations.
It would also depend on what your application is meant for. If it does only inserts with very few updates, then I think MSSQL would be more scalable and better in terms of performance. However, if one has lots of updates, then Oracle would scale up better.
I very much doubt that you are going to get an objective answer to that particular question until you come across someone who has implemented the same database (schema, data, etc.) on both platforms.
However, given that you can find millions of happy users of both databases, I dare say it's not too much of a stretch to say either will scale just fine (I've seen a snappy SQL 2005 implementation of 300 TB that seemed pretty responsive).
Oracle is like a high-quality manual film camera, which needs the best photographer to take the best picture, while MS SQL is like an automatic digital camera. In the old days, of course, all professional photographers used film cameras; now think about how many professional photographers use automatic digital cameras.

Resources