Related
I am learning to design systems, specifically the database part in it and I know a lot of similar posts are available on SO but either they are a decade old (and a lot has changed since then) or they don't have helpful answers.
All the previous answers or blog posts compare relational and non-relational DBMS (or SQL and NoSQL) in some way. And I can't see a well defined line between the two. Because as soon as I read about a property of NoSQL DBMS, I find there's a latest version of a SQL DBMS that provides the same property and vice versa. For example, Document data-stores store data in JSON-like objects and you can even query into these objects but then I found that PostreSQL also provides this functionality.
Next common point of comparison is Scalability. Well, I can see a lot of giants like Amazon using SQL dbms and they don't seem to have any scalability issues.
Next point of comparison is that SQL dbms enforce schema. We can kind of do that with MongoDB schema validation.
And now newSQL databases claim to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.
So it seems like Relational dbms and Non-relational dbms are moving towards each other.
Keeping all these things in mind, how to choose a dbms? I mean what points do you consider? What's the thought process?
Thank you!
Choosing as a platform for learning or choosing for a specific project? That really does have an impact. If it's for a specific project then the choice will depend on the needs of the project... where is the data coming from? What is the purpose of storing it? (Is it for record keeping, transactional, etc.) Who will access it and how? What needs to be accomplished?
There are a lot of options to consider, that's a fact, but it's helpful to consider the purpose of the project to realize which systems are a better fit.
Beyond that, another major factor in a decision is the skills of the people involved or what is already available in the company (if anything).
Do you have specific details at this point or is this more of learning exercise?
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
For a current project I'm creating a data model. Are there any sources where I can find "best practices" for a good data model? Good means flexible, efficient, with good performance, style, ... Some example questions would be "naming of columns", "what data should be normalized", or "which attributes should be exported into an own table". The source should be a book :-)
Personally I think you should read a book on performance tuning before beginning to model a database. The right design can make a world of difference. If you are not expert in performance tuning, you aren't qualified to design a database.
These books are Database specific, here is one for SQl Server.
http://www.amazon.com/Server-Performance-Tuning-Distilled-Experts/dp/1430219025/ref=sr_1_1?s=books&ie=UTF8&qid=1313603282&sr=1-1
Another book that you should read before starting to design is about antipatterns. Always good to know what you should avoid doing.
http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1313603622&sr=1-1
Do not get stuck in the trap of designing for flexibility. People use that as a way to get out of doing the work to design correctly and flexible databases almost always perform badly. If more than 5% of your database design depends on flexibility, you haven't modeled correctly in my opinion. All the worst COTS products I've had to work with were designed for flexibility first.
Any decent database book will discuss normalization. You can also find that information easily on the web. Be sure to actually create FK/PK relationships.
As far as naming columns, pick a standard and stick with it consistently. Consistency is more important than the actual standard. Don't name columns ID (see SQL antipatterns book). Use the same name and datatypes if columns are going to be in several different tables. What you are going for is to not have to use functions to do joins because of datatype mismatches.
Always remember that databases can (and will) be changed outside the application. Anything that is needed for data integrity must be in the database not the application code. The data will be there long after the application has been replaced.
The most important things for database design:
Thorough definition of the data needed (including correct datatypes)
and the relationships between pieces of data (including correct normalization)
data integrity
performance
security
consistency (of datatypes, naming standards etc.)
The best book I've read on the design of database systems was "An Introduction to Database Systems". Joe Celko's SQL for Smarties books are also worth reading.
Assuming you're building an application and not just a database, and assuming you're using an Object Oriented language, Applying UML and Patterns by Craig Larman has a good discussion on mapping databases to objects.
In terms of defining "good", in my experience "maintainable" is probably top of the list. Maintainability in database design means many things, such as sticking to conventions - I often recommend http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/. Normalization is another obvious maintainability strategy. I often recommend being generous with column types - it's hard to change an application if you find out that postal codes in different countries are longer than in the US. I often recommend using views to abstract complex data relations away for less experienced developers.
A key thing with maintainability is the ability to test and deploy. It's worth reading up about Continuous Database Integration (http://www.codeproject.com/KB/architecture/Database_CI.aspx) - whilst not strictly associated with the design of the database schema, it's important context.
As for performance - I believe you should design for maintainability first, and only design for performance if you know you have a problem. Sometimes, you know in advance that performance will be a major problem - designing a database for Facebook (or Stack Exchange), designing a database with huge amounts of data (terabytes and up), or huge numbers of users. Most systems don't fall into that camp - so I recommend regular performance tests, with representative data, to find if you have a problem, and only tune when you can prove you have to. Many performance optimizations are at the expense of maintainability - denormalization, for instance.
Oh, and in general, avoid triggers and stored procedures if you can. That's just my opinion, though...
Even though it is not a book I recommend to read Query evaluation techniques for large databases. It gives a background on query processing which largely influences your schema design, especially for data intensive (e.g., analytical) workloads. It is less hands-on but I believe every database designer should read it at least once :-).
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Why did object oriented databases fail?
I find it astonishing that:
foo bar = new foo();
bar.saveToDatabase();
Lost to:
foo bar = new foo();
/* write complicated code to extract stuff from foo */
/* write complicated code to write stuff to database */
Related questions:
Are Object oriented databases still
in use?
Object Oriented vs Relational
Databases
Why have object oriented databases not been successful (yet)?
Probably because of their coupling with specific programming languages.
First, I don't believe they have "failed" entirely. There are still some around, and they're still used by a couple of companies, as far as I know.
anyway, probably because a lot of the data we want to store in a database is relational in nature.
The problem is that while yes, OO databases are easier to integrate into OO programming languages, relational databases make it easier to define queries and, well, relations between the data stored. Which is often the complicated part.
I have been using db4o (an object oriented database) lately on my personal pet projects, just because it is so quick to set up & get going. No need with the itty, gritty details.
That aside, as I see it, the main reasons why they haven't become popular are:
Reporting is more difficult on object oriented databases. Related to this, it is also easier to manually look into the actual data in a relational database.
Microsoft and Oracle base so much of their business on relational databases.
Lots of businesses already have relational databases in place.
The IT departments often have relational database expertise.
And, as Jan Aagaard, have pointed out, lately it is because the problem have been solved in a different way, giving programmers the object oriented feel even though they program against a relational database.
There are countless numbers of existing applications out there storing their data in relational databases. This data is the lifeblood of those companies. They have collectively invested huge amounts in storing, maintaining and reporting on this data. The cost and risk of moving this priceless information into a fundamentally different environment is extremely high.
Now consider that ORM tools can map modern application data structures into traditional relational models, and you remove pretty much any incentive to migrate to OODBMS. They provide a low-risk alternative to a very costly high risk migration.
Because, as much as ODBMS advertisements were laden with derogatory language about ORM systems, it wasn't that hard to make ORMs do the job, and without all the various hits taken in switching to a pure ODBMS.
What actually happened is that your first code sample won, it just happens to be on a RDBMS persistence layer.
I think it is because the problem was solved differently. You might be using a relational database behind the scenes when you are coding in Ruby on Rails or LINQ to SQL, but it feels like you are working with objects.
Very subjective, but a few reasons come to mind:
Performance has not been as good as relational databases (or at least that's the perception)
Again with performance - relational databases allow you to do things like denormalizing data to further improve performance.
Legacy support for all the non-OO apps that need to access the data.
I think a lot of your answer lies in the "Why we abandoned Object Databases" answer of "Object Oriented vs Relational Databases".
As far as your example goes, it doesn't have to be that way. Linq to SQL is actually a quite nice basic layer over a DBMS, and Linq to Entities (v2 -- v1 sucked) will be pretty cool too. (N)Hibernate has been solving the problem you're having for years now using RDBMSes.
So I guess my answer to you is that O/R mappers are getting to the point where they solve your problem nicely, and you don't need an ODBMS to get what you need.
They will succeed some day. They are future.
Looking back to software technologies in history, the trend is sacrificing performance to reduce complexity (Assembly => C => C++ => .NET). An application which takes 30 minutes to code now, some days in past took a month.
ORMs are right answer to wrong question. Currently, they are the choice, since they make life easier in the absence of a better solution. But they cannot handle the level of complexity they aimed to. "Problems cannot be solved by the same level of thinking that created them." A.E
As others mentioned relational databases are heavily used and relied and replacing them forces a lot of risks. Look the interval between SQL versions and the major changes between these versions and other Microsoft products (conservative approach, which is necessary here). Also I'll add the following items:
Current approach still works. You may argue it will work forever (we
can code assembly yet), but here I mean it doesn't
work logically when, the AVERAGE level of projects complexity and
the time to develop them on relational databases rings the bell.
Major companies did not involved seriously. When the market signals, they do.
The problem is not well-defined yet. Unfortunately current failures help.
It need some improvements in other sciences (QC, AI) rather than
computer. Storing and querying multidimensional data on flat
infrastructure and without enough smartness for self-organizing are
the top obstacles at the theoretic level.
Why not?
I guess they were a solution to a problem nobody was having, or not having enough to pay for it.
Further, OOP and set-based programming are not always very comptatble.
Personally, when I started reading about OO databases, I couldn't help but think "Boy, I hope I never have to work on one of those, update 1 million rows out of a 6 million row table and then make sure all appropriate records in other tables get updated as well"
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Modern database systems today come with loads of features. And you would agree with me that to learn one database you must unlearn the concepts you learned in another database. For example, each database would implement locking differently than others. So to carry the concepts of one database to another would be a recipe for failure. And there could be other examples where two databases would perform very very differently.
So while developing the database driven systems should the programmers need to know the database in detail so that they code for performance? I don't think it would be appropriate to have the DBA called for performance later as his job is to only maintain the database and help out the developer in case of an emergency but not on a regular basis.
What do you think should be the extent the developer needs to gain an insight into the database?
I think these are the most important things (from most important to least, IMO):
SQL (obviously) - It helps to know how to at least do basic queries, aggregates (sum(), etc), and inner joins
Normalization - DB design skills are an major requirement
Locking Model/MVCC - Its nice to have at least a basic grasp of how your databases manage row locking (or use MVCC to accomplish similar goals with optimistic locking)
ACID compliance, Txns - Please know how these work and interact
Indexing - While I don't think that you need to be an expert in tablespaces, placing data on separate drives for optimal performance, and other minutiae, it does help to have a high level knowledge of how index scans work vs. tablescans. It also helps to be able to read a query plan and understand why it might be choosing one over the other.
Basic Tools - You'll probably find yourself wanting to copy production data to a test environment at some point, so knowing the basics of how to restore/backup your database will be important.
Fortunately, there are some great FOSS and free commercial databases out there today that can be used to learn quite a bit about db fundamentals.
I think a developer should have a fairly good grasp of how their database system works, not matter which one it is. When making design and architecture decisions, they need to understand the possible implications when it comes to the database.
Personally, I think you should know how databases work as well as the relational model and the rhetoric behind it, including all forms of normalization (even though I rarely see a need to go beyond third normal form). The core concepts of the relational model do not change from relational database to relational database - implementation may, but so what?
Developers that don't understand the rationale behind database normalization, indexes, etc. are going to suffer if they ever work on a non-trivial project.
I think it really depends on your job. If you are a developer in a large company with dedicated DBAs then maybe you don't need to know much, but if you are in a small company then it may be really helpful knowing more about databases. In small companies you may wear more than one hat.
It cannot hurt to know more in any situation.
It certainly can't hurt to be familiar with relational database theory, and have a good working knowledge of the standard SQL syntax, as well as knowing what stored procedures, triggers, views, and indexes are. Obviously it's not terribly important to learn the database-specific extensions to SQL (T-SQL, PL/SQL, etc) until you start working with that database.
I think it's important to have a basic understand of databses when developing database applications just like it's important to have an understanding of the hardware your your software runs on. You don't have to be an expert, but you shouldn't be totally ignorant of anything your software interacts with.
That said, you probably shouldn't need to do much SQL as an application developer. Most of the interaction with the database should be done through stored procedures developed by the DBA, I'm not a big fan of including SQL code in your application code. If your queries are in stored procedures, then the DBA can change the implementation of the stored procedure, or even the database schema, and so long as the result is the same it doesn't require any changes to your application code.
If you are uncertain about how to best access the database you should be using tried and tested solutions like the application blocks from Microsoft - http://msdn.microsoft.com/en-us/library/cc309504.aspx. They can also prove helpful to you by examining how that code is implemented.
Basic things about Sql queries are must. then you can develop simple system. but when you are going to implement Complex systems you should know Normalization, Procedures, Functions, etc.