How do the advanced features in relational databases work?

To make a long question short, I know about the basics of a relational database (indexing, replication, locking, concurrency, etc.) and SQL syntax (SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, TRUNCATE) when used with simple expressions such as:
SELECT EventID,EventName FROM Events WHERE CustomerID=5 ORDER BY EventType
But I don't understand any of the "advanced" topics in Relational databases, like:
Domains
Constraints
Indices
Will anyone please give me a quick primer, an approximate explanation on what these aspects do and how they work?
You may down-vote and totally trash this question, but please explain to me, approximately how these topics work because I need to get up to speed on Relational databases very quickly.

The Wikipedia articles on Relational Databases and the Relational Model are a good place to start. They have links to other articles on the specific topics you mention and these have examples, such as:
Domains
Constraints
Index
Primary Key and Foreign Key
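As a rough illustration of what constraints do, here is a minimal sketch using Python's built-in sqlite3 module. The Customers table, the allowed EventType values and the index name are assumptions made up for the example. SQLite has no CREATE DOMAIN statement, so a CHECK constraint stands in for a domain here; other systems differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE Events (
    EventID    INTEGER PRIMARY KEY,
    EventName  TEXT NOT NULL,
    -- hypothetical "domain": only these two values are allowed
    EventType  TEXT NOT NULL CHECK (EventType IN ('concert', 'talk')),
    CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
);
CREATE INDEX idx_events_customer ON Events(CustomerID);
""")

conn.execute("INSERT INTO Customers VALUES (5, 'Acme')")
conn.execute("INSERT INTO Events VALUES (1, 'Launch', 'talk', 5)")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Events VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

bad_type = try_insert((2, 'Party', 'rave', 5))      # violates the CHECK constraint
bad_customer = try_insert((3, 'Demo', 'talk', 99))  # violates the foreign key
good = try_insert((4, 'Gig', 'concert', 5))         # satisfies every constraint
```

The point is that the database itself refuses the bad rows, so no application code has to remember to check them.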

I think that one issue you're going to face with this is that features vary widely between different RDBMS implementations. Locking, consistency and concurrency are very different in Oracle to <insert random name of other system here>. If there is a particular RDBMS that you have an interest in then I'd urge you to investigate how that particular system implements them, because the devil is in the details, as they say.
For example, start with the Oracle Concepts Guide, available in HTML and PDF from http://docs.oracle.com for each version.

Related

How to choose a DBMS?

I am learning to design systems, specifically the database part in it and I know a lot of similar posts are available on SO but either they are a decade old (and a lot has changed since then) or they don't have helpful answers.
All the previous answers or blog posts compare relational and non-relational DBMS (or SQL and NoSQL) in some way. And I can't see a well defined line between the two. Because as soon as I read about a property of NoSQL DBMS, I find there's a latest version of a SQL DBMS that provides the same property and vice versa. For example, document data stores store data in JSON-like objects and you can even query into these objects, but then I found that PostgreSQL also provides this functionality.
Next common point of comparison is Scalability. Well, I can see a lot of giants like Amazon using SQL dbms and they don't seem to have any scalability issues.
Next point of comparison is that SQL dbms enforce schema. We can kind of do that with MongoDB schema validation.
And now newSQL databases claim to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.
So it seems like Relational dbms and Non-relational dbms are moving towards each other.
Keeping all these things in mind, how to choose a dbms? I mean what points do you consider? What's the thought process?
Thank you!
Choosing as a platform for learning or choosing for a specific project? That really does have an impact. If it's for a specific project then the choice will depend on the needs of the project... where is the data coming from? What is the purpose of storing it? (Is it for record keeping, transactional, etc.) Who will access it and how? What needs to be accomplished?
There are a lot of options to consider, that's a fact, but it's helpful to consider the purpose of the project to realize which systems are a better fit.
Beyond that, another major factor in a decision is the skills of the people involved or what is already available in the company (if anything).
Do you have specific details at this point or is this more of learning exercise?

What can an RDBMS do that Neo4j (and graph databases) can't?

“A Graph Database –transforms a–> RDBMS”
The Neo4j site seems to imply that whatever you can do in RDBMS, you can do in Neo4j.
Before choosing Neo4j as a replacement for an RDBMS, I need some doubts answered.
I am interested in Neo4j for
ability to quickly modify the data "schema"
ability to express entities naturally instead of relations and normalizations
...which leads to highly expressive code (better than ORM)
This is a NoSQL solution I am interested in for its features, not high performance.
Question: Does Neo4j present any issues that may make it unsuitable as a RDBMS replacement?
I am particularly concerned about these:
is there any DB feature I must implement in application logic? (For example, you must implement joins at application layer for a few NoSQL DBs)
Are the fields "indexed" to allow a lookup faster than O(n)?
How do I handle hot backups and replication?
any issues with "altering" schema or letting entities with different versions of the schema living together?
This is an extremely broad topic covering everything from modeling and implementation to IT and support. It's impossible to really answer all those questions here, especially without details on your situation. However, you seem to be exploring options and avenues. So, I'll just pass on some general food for thought as someone that's implemented a number of systems.
Everybody seems to think their new database paradigm is a replacement for relational databases. So, take those claims with a grain of salt.
I like to think in terms of 3 fundamental models: Relational, Document, and Graphing. Depending on your problem space one or even more of these is the right answer. I would not do financial transactions in anything but relational (SQL Based). If you are building a CMS, then a Document DB is the way to go. If my application is modeling networks (roads, people, connections, networks etc.) I use Neo4J.
As far as production quality, there are solid options in each category. Relational has a bunch. For document databases I'd go MongoDB or a higher level JCR system like Apache Jackrabbit. For graphing, I only have experience with Neo4j and it is rock solid for me.
Whatever you do, don't buy into the hype that "We have the one technology that solves all your problems." It's not there and it narrows your thinking.
I'm convinced Neo4j is a good replacement for relational databases by now.
It is ACID compliant
Though the community version lacks some features like hot backups, the enterprise edition has them
You can get support for it
At first sight (and in the new releases where you don't need a START clause) its query language CYPHER can do almost anything SQL can
but
it's harder to find a CYPHER developer than a SQL one
and it does not have an equivalent optimizer: it matters more than with SQL how you write the query
Though it supports replication and Neo explicitly markets it as a big data product, I can't confirm it is scalable enough and I did not study security aspects.
In recent releases (newer than the question above), one can define indexes on labels, which work like indexes on tables in a relational DB, allowing for O(log(n)) lookups.
(FYI: Neo4j has no tables, but each node (~= row) can have different labels, comparable to Gmail labels. This is more flexible: you don't have to choose whether or not to put cars and bicycles in one vehicles table: a bicycle would have both a :vehicle and a :bicycle label.)
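To make the label idea concrete, here is a toy sketch in plain Python. It is not how Neo4j stores or indexes anything; the TinyGraph class and its method names are invented for illustration, and the index is a hash-based dict rather than the tree structure that an O(log(n)) lookup implies.

```python
from collections import defaultdict

class TinyGraph:
    """Toy node store: each node has a set of labels and a dict of properties."""

    def __init__(self):
        self.nodes = {}                      # node id -> (labels, properties)
        self.by_label = defaultdict(set)     # label -> ids of nodes carrying it

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = (set(labels), properties)
        for label in labels:
            self.by_label[label].add(node_id)

    def with_label(self, label):
        """Look up nodes through the label index instead of scanning all nodes."""
        return sorted(self.by_label.get(label, set()))

g = TinyGraph()
g.add_node(1, ["vehicle", "car"], wheels=4)
g.add_node(2, ["vehicle", "bicycle"], wheels=2)   # one node, two labels
g.add_node(3, ["person"], name="Ann")

vehicles = g.with_label("vehicle")
bicycles = g.with_label("bicycle")
```

The bicycle shows up under both labels without anyone deciding up front which "table" it belongs to.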
To answer the original question: Neo4j has hardly any support for schema enforcement. Neo advises implementing automated consistency tests on your database, which you run on your acceptance test instance as part of your release cycle.
Using an enterprise db such as oracle will give you many, many features which may or may not be part of neo. These include:
ACID transactions
High availability / backups / standby
ability to use sql to get data in the most efficient way using a cost based optimizer - the db determines the best way to retrieve the data based on your latest statistics
Scalability, partitioning
support
security
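The optimizer point in the list above can be seen in miniature with SQLite, whose query planner is far simpler than Oracle's cost-based optimizer but follows the same idea: you state what you want and the database picks the access path. This is only a sketch; the table contents and the index name are made up, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Events (EventID INTEGER PRIMARY KEY, EventName TEXT, CustomerID INTEGER)"
)
conn.executemany(
    "INSERT INTO Events (EventName, CustomerID) VALUES (?, ?)",
    [(f"event-{i}", i % 100) for i in range(1000)],
)

QUERY = "SELECT EventID, EventName FROM Events WHERE CustomerID = 5"

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text

plan_before = plan(QUERY)   # no index yet, so the planner has to scan the table
conn.execute("CREATE INDEX idx_events_customer ON Events(CustomerID)")
plan_after = plan(QUERY)    # same SQL, but the planner can now use the index

print(plan_before)
print(plan_after)
```

The query text never changes; only the plan does, which is the part a non-SQL system may leave to you.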
If you are going to implement most of the functionality of your application in code by yourself and don't require the structure and advanced features offered by an RDBMS, or if your data structures are better suited to a graph-based DB, then by all means trial Neo. There is a reason that most corporate apps use one of the traditional RDBMS servers, but this may not always be the case in the future.

What kind of databases are used on the cloud and what are the concepts behind?

I was reading this article, and the author mentioned that "Relational databases are no longer the norm in the cloud." What does he mean by that? If relational databases are not used, what kind of databases are we going to use, and what is the concept behind them?
The link to your article is missing. However, I suspect that the author is referring to various forms of "NoSQL" dbs, such as key-value stores, document-based databases, object-based databases, and various other technologies. The main advantage of this approach is that full relational capabilities of a RDBMS are not always needed, and if you discard them, you can sometimes improve performance and scalability.
For example, Amazon offers Amazon SimpleDB, which is basically a giant distributed dictionary with a nicer query language.
I could go on, but I suspect a few searches for NoSQL would be more useful than anything I could write.

What's the attraction of schemaless database systems?

I've been hearing a lot of talk about schema-less (often distributed) database systems like MongoDB, CouchDB, SimpleDB, etc...
While I can understand they might be valuable for some purposes, in most of my applications I'm trying to persist objects that have a specific number of fields of a specific type, and I just automatically think in the relational model. I'm always thinking in terms of rows with unique integer ids, null/not null fields, SQL datatypes, and select queries to find sets.
While I'm attracted to the distributed nature and easy JSON/RESTful interfaces of these new systems, I don't understand how loosely typed key/value hashes will help me with my development. Why would a loose typed, schema-less system be good for keeping clean data sets? How can I for example, find all items with dates between x and y when they might not have dates? Is there any concept of a join?
I understand many systems have their own differences and strengths, but I'm wondering at the difference in paradigm. I suppose this is an open-ended question, but perhaps the community's answers and ways they have personally seen the advantages of these systems will help enlighten me and others about when I would want to make use of these (admittedly more hip) systems instead of the traditional RDBMS.
I'll just call out one or two common reasons (I'm sure people will be writing essay answers)
With highly distributed systems, any given data set may be spread across multiple servers. When that happens, the relational constraints which the DB engine can guarantee are greatly reduced. Some of your referential integrity will need to be handled in application code. When doing so, you will quickly discover several pain points:
your logic is spread across multiple layers (app and db)
your logic is spread across multiple languages (SQL and your app language of choice)
The outcome is that the logic is less encapsulated, less portable, and MUCH more expensive to change. Many devs find themselves writing more logic in app code and less in the database. Taken to the extreme, the database schema becomes irrelevant.
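A small sketch of what "referential integrity in application code" tends to look like. The two dicts stand in for data living on different servers, and the ShardedStore class and its method names are hypothetical.

```python
class ShardedStore:
    """Toy model: customers and orders live on different 'servers' (plain dicts)."""

    def __init__(self):
        self.customer_shard = {}   # customer id -> name
        self.order_shard = {}      # order id -> customer id

    def add_customer(self, customer_id, name):
        self.customer_shard[customer_id] = name

    def add_order(self, order_id, customer_id):
        # The check a foreign key would do inside a single database
        # now has to be written, and kept correct, in application code.
        if customer_id not in self.customer_shard:
            raise ValueError(f"unknown customer {customer_id}")
        self.order_shard[order_id] = customer_id

    def remove_customer(self, customer_id):
        # Nothing stops this from orphaning orders unless we also code that rule.
        if customer_id in self.order_shard.values():
            raise ValueError(f"customer {customer_id} still has orders")
        del self.customer_shard[customer_id]

store = ShardedStore()
store.add_customer(1, "Ann")
store.add_order(100, 1)

try:
    store.add_order(101, 2)   # customer 2 does not exist
    rejected = False
except ValueError:
    rejected = True
```

Every rule like this is one more piece of logic that lives outside the database, and a real distributed version would also have to cope with the two checks racing each other.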
Schema management, especially on systems where downtime is not an option, is difficult. Reducing the schema complexity reduces that difficulty.
ACID doesn't work very well for distributed systems (BASE, CAP, etc). The SQL language (and the entire relational model to a certain extent) is optimized for a transactional ACID world. So some of the SQL language features and best practices are useless while others are actually harmful. Some developers feel uncomfortable going "against the grain" and prefer to drop SQL entirely in favor of a language which was designed from the ground up for their requirements.
Cost: most RDBMS systems aren't free. The leaders in scaling (Oracle, Sybase, SQL Server) are all commercial products. When dealing with large ("web scale") systems, database licensing costs can meet or exceed the hardware costs! The costs are high enough to change the normal build/buy considerations drastically towards building a custom solution on top of an OSS offering (all the significant NOSQL offerings are OSS)
The primary concern should be what you need to do with your data. If you have a huge data set and are finding a traditional RDBMS to be a bottleneck, then you may want to experiment with a schemaless or a NOSQL solution.
Most environments that I am aware of using NOSQL solutions also use an RDBMS solution in some form or fashion. RDBMS based solutions are the norm where data integrity is extremely important and you need ACID transactions. However if your system is not highly transaction based but you need to scale up or scale out real quick, a NOSQL solution may be desirable.
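For readers who, like a later poster, have to look up ACID: the atomicity part can be sketched with Python's sqlite3 module. The accounts table and the transfer function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("alice", "bob", 30)        # fits within alice's balance
failed = transfer("alice", "bob", 500)   # would overdraw alice, so bob's credit is undone too

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer bob is credited first, yet his balance is unchanged afterwards because the whole transaction is rolled back.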
Schemaless is great for two reasons:
Brain optimising intuitiveness of document storage
Resolves Sparse-Matrix and Entity-Attribute-Value storage problems.
I've used both SQL and No-SQL for production applications in Ruby on Rails. I'm not a database expert and I have to confess to googling ACID and similar terms as they're not familiar to me.
"Ah ha! Another know-nothing trend follower jumping on the latest bandwagon" you may say. But, actually, I'm really pleased with my decision to use MongoDB on our most recent 2 year old app and here's why...
The flip-side of brain-optimising intuitiveness was my experience with the Magento e-commerce system. I don't want to bash it because it served me well at the time but it really hit the processor hard trying to calculate the attributes for each product. The underlying reason was the Entity-Attribute-Value store of product data. Cache or be damned was the solution.
The major advantage to me is the optimisation in the only place that really matters - your own brain. So many technologies are critiqued on their efficiency in memory, processors, hardware and yet having a DB that's extremely intuitive to understand brings its own merits. We've found it quick to add features to our code because the database simply looks a lot like the real world we're modelling. When I've asked e-commerce clients to present me with their product list they will naturally tend to use Excel (think table store). The first columns are easy:
Product Name
Price
Product Type (
Then it gets harder and covered in notes, colour coding and links to other tables (yep.. relationships)
Colour (Only some products)
Size (X Large, Large, Small) - only for products 8'9'10, golf clubs use a different scale
Colour 2. The cat collars have two colour choices.
Wattage
Fixing type (Male, Female)
So it ends in a terrible mess of Excel tables that make no sense to me and not much sense to the people who work with the products day in and day out. We throw our arms in the air and decide to go through the catalogue and then it hits me! Wouldn't it be great if you could store the data as it appears in the catalogue!? Just collections of records on each product that just lists the attribute of that product. You can then pick out common attributes to index for retrieval at a later date. Of course, that's a document store.
In summary, document stores are great when you have a sparse matrix problem or objects that mutate their attributes over time. Having lived in a No-SQL world for 2 years, I can't think of a real world application that doesn't have those features because the world itself looks like a document store.
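The sparse-matrix point can be sketched with plain Python dicts standing in for documents. This is not MongoDB's API; the product data and function names are made up. It also touches on the question's worry about date ranges when some items have no date.

```python
from datetime import date

# Each product lists only the attributes it actually has.
products = [
    {"name": "Cat collar", "price": 7.5, "colour": "red", "colour2": "black"},
    {"name": "Light bulb", "price": 2.0, "wattage": 60, "released": date(2020, 3, 1)},
    {"name": "Golf club", "price": 120.0, "size": "9", "released": date(2021, 6, 15)},
]

def released_between(docs, start, end):
    """Return names of documents with a 'released' date in [start, end].

    Documents without the field simply do not match.
    """
    return [d["name"] for d in docs
            if "released" in d and start <= d["released"] <= end]

def distinct_attributes(docs):
    """The implicit 'schema': every attribute that appears in any document."""
    return sorted({key for d in docs for key in d})

recent = released_between(products, date(2021, 1, 1), date(2021, 12, 31))
attributes = distinct_attributes(products)
```

A relational table holding the same three products would need seven columns, most of them NULL in any given row.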
I've only played with MongoDB but one thing that really interested me was how you could nest documents. In MongoDB a document is basically like a record. This is really nice because traditionally, in a RDBMS, if you needed to pull a "Person" record and get the associated address, employer info, etc. you'd frequently have to go to multiple tables, join them up, make multiple database calls. In a NoSQL solution like MongoDB, you can just nest the associated records (documents) and not have to mess with foreign keys, joining, multiple database calls. Everything associated with that one record is pulled.
This is especially handy when dealing with objects. You can in many cases just store an object as a series of nested documents.
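A side-by-side sketch of the two layouts, using sqlite3 for the relational version and a plain dict as a stand-in for a nested document. The table names and sample data are assumptions for illustration, not MongoDB code.

```python
import sqlite3

# Relational layout: the person and their addresses live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (person_id INTEGER REFERENCES person(id), city TEXT);
INSERT INTO person  VALUES (1, 'Ann');
INSERT INTO address VALUES (1, 'Oslo'), (1, 'Bergen');
""")
rows = conn.execute("""
    SELECT p.name, a.city
    FROM person p JOIN address a ON a.person_id = p.id
    WHERE p.id = 1
    ORDER BY a.city
""").fetchall()
cities_from_join = [city for _name, city in rows]

# Document layout: the same data nested inside one record.
person_doc = {
    "_id": 1,
    "name": "Ann",
    "addresses": [{"city": "Bergen"}, {"city": "Oslo"}],
}
cities_from_doc = [a["city"] for a in person_doc["addresses"]]
```

Both routes give the same cities. The trade-off is that the nested form is convenient when you always read the person as a whole, while the joined form makes it easier to query addresses on their own.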
NoSQL databases are not schemaless; the schema is embedded in the data. They are properly called semistructured. In some KV data stores, however, the schema may even be embedded in code. The advantage of the semi-structured approach is twofold: flexibility in which columns are part of a row (one row could have 5 columns and another have 5 different columns), and flexibility in the characteristics of the columns (e.g., variable lengths).
Normally the attraction is that of snake oil - most people favouring them have no clue about relational theory and speak SQL on a level that makes professionals puke. They have no idea what the ACID conditions are, why they are important, etc.
Not saying they do not have valid uses... just saying that mostly the attraction comes from people not knowing what they should know and drawing stupid conclusions. Again, not everyone is like that, but most developers favouring them are not good at understanding what a database system is actually responsible for.

What is NoSQL, how does it work, and what benefits does it provide? [closed]

I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottleneck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with SQL, the Structured Query Language that grew out of Edgar F. Codd's relational model. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts for how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with its own query language. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout change constantly, or when you are dealing with datasets which belong together but still look very different. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
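As a loose illustration of "defining data by its relation to other data", the sketch below turns a relational-style join table into an adjacency structure and walks it. It is plain Python, not a graph database API; the names and sample data are invented.

```python
# Relational view: a join table of (person_a, person_b) pairs.
friendship_rows = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]

# Graph view: each node keeps direct references to its neighbours.
neighbours = {}
for a, b in friendship_rows:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)

def within_hops(start, max_hops):
    """Breadth-first search: everyone reachable from start in at most max_hops steps."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in neighbours[node]} - seen
        seen |= frontier
    return sorted(seen - {start})

friends = within_hops("ann", 1)
friends_of_friends = within_hops("ann", 2)
```

In SQL each extra hop usually means another self-join on the pair table, whereas here it is one more loop iteration over direct references.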
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
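The difference between the two kinds of lookup can be sketched with a Python dict playing the role of the key-value store. The user records are invented, and real stores add persistence and distribution on top.

```python
# A key-value store behaves much like one big dictionary.
store = {
    "User157641": {"name": "Ann", "age": 19, "favorite_food": "waffles"},
    "User157642": {"name": "Bob", "age": 31, "favorite_food": "waffles"},
    "User157643": {"name": "Cat", "age": 22, "favorite_food": "pizza"},
}

# Lookup by known key: one hash lookup, however many users exist.
profile = store["User157641"]

# Query by attributes: there is no key to go on, so every value must be inspected.
def young_waffle_fans(kv):
    return sorted(
        user["name"]
        for user in kv.values()
        if 16 <= user["age"] <= 24 and user["favorite_food"] == "waffles"
    )

matches = young_waffle_fans(store)
```

The second query works, but only by touching every record, which is exactly what stops being practical at scale.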
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still has its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers on research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
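To give a feel for the idea without reproducing the actual NoSQL program, here is a rough Python sketch in which a "table" is just a tab-separated text file and a query is a filter plus a sort, much like piping the file through shell tools. The file layout and the select helper are assumptions for illustration.

```python
import csv
import os
import tempfile

# A "table" is just a tab-separated text file with a header row.
table_path = os.path.join(tempfile.mkdtemp(), "users.tsv")
with open(table_path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows([
        ["name", "age"],
        ["bob", "31"],
        ["ann", "19"],
        ["cat", "22"],
    ])

def select(path, predicate, order_by):
    """Rough equivalent of filtering and sorting a flat file with shell tools."""
    with open(path, newline="") as f:
        rows = [row for row in csv.DictReader(f, delimiter="\t") if predicate(row)]
    return sorted(rows, key=lambda row: row[order_by])

under_30 = [row["name"] for row in select(table_path, lambda r: int(r["age"]) < 30, "name")]
```

Because the data is an ordinary file, anything that can read text can also read the table.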
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need the more advanced features of relational database engines. For example, a Key->Value store providing a simple query-by-id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key-based access strategy. This is a departure from traditional SQL access, which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and the limitations on the display format restrict traditional SQL. BLOB implementations of traditional relational databases suffer from these restrictions too.
Behind the scenes, it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new data formats. "Support" means not just storage but full access capabilities, programmatic and query-wise, using the standard model.
Relational enthusiasts were quick to modify the definition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good, especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clear-cut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key-value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed random keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database, which they termed NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottlenecks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead, you build queries using an API that the database provides. Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How does it work internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

Resources