I see that Oracle, DB2 and SQL Server offer a new XML column type. I'm developing with DB2, and from a database design standpoint it seems you can break 1NF if the XML contains a list.
Am I wrong to assume that SQL/XML can break 1NF?
Thank you,
The relational model is orthogonal to types and places no particular limitations on type complexity. A type could be arbitrarily complex, perhaps containing documents, images, video, etc., as long as all relational operations are supported for relations containing that type. First Normal Form is really just the definition of what a relation schema is, so in principle XML types are permissible under 1NF.
Oracle, DB2 and Microsoft SQL Server are not truly relational, however, and don't always represent relations and relational operations faithfully. For example, SQL Server doesn't support comparison between XML values, which means operations like σ(x=x)R or even π(x)R are not possible if x is an XML column. I haven't tried the same with DB2 and Oracle. It is moot whether such tables can properly be said to satisfy 1NF, since the XML is implemented as "special" data that doesn't behave as we expect data to behave in relations. Given such limitations, I think the important question is whether the proprietary XML type in your chosen DBMS is actually fit for your purposes at all.
Part 14 of the SQL standard ("SQL/XML") defines the XML data type, its semantics and the functions around it. You could "legally" store a few bytes in an XML column or stuff an entire database into a single XML value. It is up to the user, and yes, it breaks classic database design. However, if the rest of the database is in 1NF and the XML-typed column is used only for some special payloads (app data, configurations, legal docs, digital signatures, ...), they make a great combination.
There are already other data types and SQL features that allow you to break 1NF. Same as above, it is up to the user.
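To make the "list inside an XML value" concern concrete, here is a minimal sketch. It uses Python's built-in sqlite3 with a plain TEXT column standing in for a real XML type (an assumption; DB2's XML type behaves differently), and the table and element names are made up for illustration. It shows the same list stored once as an XML payload and once as one row per element.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# Design A: the list lives inside a single XML value (hypothetical schema).
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (1, ?)",
    ("<items><item>pen</item><item>ink</item></items>",),
)

# Design B: one row per list element in a child table.
conn.execute("CREATE TABLE order_items (order_id INTEGER, item TEXT)")
for order_id, payload in conn.execute("SELECT order_id, payload FROM orders").fetchall():
    for item in ET.fromstring(payload).findall("item"):
        conn.execute("INSERT INTO order_items VALUES (?, ?)", (order_id, item.text))

# With design B the individual items are reachable with ordinary SQL.
items = [row[0] for row in conn.execute(
    "SELECT item FROM order_items WHERE order_id = 1 ORDER BY item")]
print(items)
```

Whether design A counts as a 1NF violation is exactly the debate above; the sketch only shows what each choice costs when you need the individual items.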
I thought at first that CrateDB isn't a relational DB, but after I read that I can join tables, and given what is written on their site https://crate.io/overview/ (see Use cases), I'm not sure.
I was especially confused by this sentence:
CrateDB is based on a NoSQL architecture, but features standard SQL.
from https://crate.io/overview/high-level-architecture/
Going by Codd's 12 rules (which have been used to identify relational databases), CrateDB is not a relational database - yet. CrateDB's eventual consistency model does not prohibit that.
Rule 0: For any system that is advertised as, or claimed to be, a relational data base management system, that system must be able to manage data bases entirely through its relational capabilities.
CrateDB doesn't have another interface with which data can be inserted, retrieved, and updated.
Rule 1: All information in a relational data base is represented explicitly at the logical level and in exactly one way — by values in tables.
Exactly what can be found in CrateDB.
Rule 2: Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
This is strictly enforced. Access through primary keys will even give you read-after-write consistency.
Rule 3: Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
CrateDB supports null.
Rule 4: The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
CrateDB has, among other meta-tables, Information Schema tables.
Rule 5: A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items:
Data definition.
View definition.
Data manipulation (interactive and by program).
Integrity constraints.
Authorization.
Transaction boundaries (begin, commit and rollback).
CrateDB supports data definition and data manipulation parts and only a single integrity constraint (primary key). This is definitely incomplete.
Rule 6: All views that are theoretically updatable are also updatable by the system.
CrateDB does not support views yet.
Rule 7: The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.
CrateDB currently only does that for data retrieval...
Rule 8: Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.
CrateDB's use of SQL allows for this; performance/storage level improvements are even delivered via system upgrades.
Rule 9: Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
Parts of this are still missing (the views, inserts/updates on joins). However for retrieving data, this is already the case.
Rule 10: Integrity constraints specific to a particular relational data base must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
This is quite tricky for a distributed database, specifically the foreign key constraints. CrateDB only supports primary key constraints for now.
Rule 11: A relational DBMS has distribution independence.
In CrateDB any kind of sharding/partitioning/distribution is handled transparently for the user. Any kinds of constraints/settings for data distribution are applied on the data definition level.
Rule 12: If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).
One could argue that COPY FROM directly violates this rule since there is no type checking and conversion happening underneath. However there is no other command/language/API that would allow data manipulation otherwise.
While CrateDB certainly has some catching up to do, there is no reason why it wouldn't become a relational database in this sense soon. Its SQL support may not be on par with Oracle's or Postgres' but many people can live without some very use-case specific features.
As said above, none of the rules is directly violated; some are simply not yet implemented in a satisfactory manner, so there is no reason why CrateDB can't become a fully relational database eventually.
(Disclaimer: I work there)
Since the beginning of the relational model the three main components that a system must have to be considered relational are (applying Codd's three-component definition of "data model" to the relational model):
data is presented as relations (tables)
manipulation is via relation and/or logic operators/expressions
integrity is enforced by relation and/or logic operators/expressions
Also, a multi-user DBMS has been understood to support apparently atomic persistent transactions while benefiting from implementation via overlapped execution (ACID), and a distributed DBMS has been understood to support an apparent single database while benefiting from implementation at multiple sites.
By these criteria CrateDB is not relational.
It has tables, but its manipulation of tables is extremely limited and it has almost no integrity functionality. Re manipulation, it allows querying for rows of a table meeting a condition (including aggregation), and it allows joining multiple tables, but that's not optimized, even for equijoin. Re constraints, its only functionality is column typing, primary keys and non-null columns. It uses a tiny subset of SQL.
See the pages at your link re Supported Features and Standard SQL Compliance as addressed in:
Crate SQL
Data Definition
Constraints (PRIMARY KEY Constraint, NOT NULL Constraint)
Indices
Data Manipulation
Querying Crate
Retrieving Data (FROM Clause, Joins)
Joins
Crate SQL Syntax Reference
As usual with non-relational DBMSs, their documentation does not reflect an understanding or appreciation of the relational model or other fundamental DBMS functionality.
CrateDB is a distributed SQL database. The underlying technology is similar to what so-called NoSQL databases typically use (shared-nothing architecture, columnar indexes, eventual consistency, support for semi-structured records), but it is made accessible via a traditional SQL interface.
So, therefore: YES, CrateDB is somewhat of a relational SQL DB.
What is the difference between a DBMS and an RDBMS? Please include some examples, including some newer tools. Why can't we simply use a DBMS instead of an RDBMS, or vice versa?
A relational DBMS will expose to its users "relations, and nothing else". Other DBMS's will violate that principle in various ways. E.g. in IDMS, you could do <ACCEPT <hostvar> FROM CURRENCY> and this would expose the internal record id of the "current record" to the user, violating the "nothing else".
A relational DBMS will allow its users to operate exclusively at the logical level, i.e. work exclusively with assertions of fact (which are represented as tuples). Other DBMS's made/make their users operate more at the "record" level (too "low" on the conceptual-logical-physical scale) or at the "document" level (in a certain sense too "high" on that same scale, since a "document" is often one particular view of a multitude of underlying facts).
A relational DBMS will also offer facilities for manipulation of the data, in the form of a language that supports the operations of the relational algebra. Other DBMS's, seeing as they don't support relations to begin with, obviously cannot build their data manipulation facilities on relational algebra, and as a consequence the data manipulation facilities/language is mostly ad hoc. On the "too low" end of the spectrum, this forces DBMS users to hand-write operations such as JOIN again and again and again. On the "too high" end of the spectrum, it causes problems of combinatorial explosion in language complexity/size (the RA has some 4 or 5 primitive operators and that's all it needs - can you imagine 4 or 5 operators that will allow you to do just any "document transform" anyone would ever want to do?)
(Note very carefully that even SQL systems violate basic relational principles quite seriously, so "relational DBMS" is a thing that arguably doesn't even exist, except perhaps in rather small specialized spaces; see e.g. http://www.thethirdmanifesto.com/ - projects page.)
DBMS: Database Management System. Here we can store some data and retrieve it.
Imagine a single table that you save to and read from.
RDBMS: Relational Database Management System. Here you can join several tables together and get related data and filtered data (say, data for a particular user or a particular order, not all users or all orders).
The normal forms come into play in an RDBMS: we don't need to store repeated data again and again. We can store it once in one table and use its id in another table, which makes updates easier, and for reading we can join both tables and get what we want.
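The "store it once, reference it by id, join when reading" idea can be sketched as follows. This is only an illustration using Python's built-in sqlite3; the users/orders tables and their columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The user's name is stored once; orders only carry the user's id.
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         product TEXT NOT NULL);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'mouse'), (12, 2, 'monitor');
""")

# Join the two tables and keep only one user's orders.
alice_orders = conn.execute(
    """
    SELECT u.name, o.product
    FROM orders AS o
    JOIN users  AS u ON u.id = o.user_id
    WHERE u.name = ?
    ORDER BY o.id
    """,
    ("alice",),
).fetchall()
print(alice_orders)
```

Renaming a user is then a single-row update in users, and every order picks up the change through the join.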
DBMS:
DBMS applications store data as files. In a DBMS, data is generally stored in either a hierarchical form or a navigational form. Normalization is not present in a DBMS.
RDBMS:
RDBMS applications store data in a tabular form. In an RDBMS, the tables have an identifier called a primary key and the data values are stored in the form of tables. Normalization is present in an RDBMS.
I have seen a lot of topics asking about the choice of a database for a voting mechanism, but my inputs are a bit different. I have an application with a GUI that can contain multiple fields, radio buttons, or a combination of these. The GUI is not fixed. Based on the form submitted, the answer XML is dynamically generated.
Thus, for one form there can be 10000 different people submitting it, and I will have 10000 different forms (the numbers will increase).
I now have the following 2 options: store every XML as it is in the database (I have not yet chosen between a relational DB and a NoSQL DB like MongoDB), or parse the XML and create tables for every form. That way the number of tables will be huge.
Now, I have to build a voting mechanism which basically looks at all the XML documents that have been generated for a particular form, i.e. 10000 of them, extracts the answers submitted (note: the XML is complex because 1 form can have multiple answer elements) and then does a vote to find how many people have given the same answer.
My Questions:
Should I use a relational db or NOSQL (MongoDB /Redis or similar ones)?
Do I need to save the XML documents as they are in the DB, or should I parse them, convert them to tables and save those? Is there any other approach that I can follow?
I am currently using Java/J2EE for development.
If your question is about how to store data of variable structure, then a document database would be pretty handy. As it is schema-less, there will be no issues with maintaining RDBMS columns.
Logically, this is pretty similar to storing XML in a relational DB. The difference is that with the RDBMS approach, each database reader needs its own XML parsing layer. (Also, regarding XML, you can refer to Why would I ever choose to store and manipulate XML in a relational database?.)
In general, if you're planning to have a single database client, you can use XML in an RDBMS.
By the way, instead of storing XML, you can use an RDBMS in another way: define a "generic" structure. For example, you can have an "Entities (name, type, id)" table and an "Attributes (entityId, name, type, value)" table.
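A rough sketch of that generic structure, using Python's built-in sqlite3 for brevity (the question uses Java, so treat this purely as an illustration). The Entities and Attributes columns follow the description above; the save_submission helper, the 'submission' type value and the sample answers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entities   (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE Attributes (entityId INTEGER REFERENCES Entities(id),
                             name TEXT, type TEXT, value TEXT);
""")

def save_submission(form_name, answers):
    """Store one submitted form as an entity plus one attribute row per answer."""
    cur = conn.execute(
        "INSERT INTO Entities (name, type) VALUES (?, 'submission')", (form_name,))
    conn.executemany(
        "INSERT INTO Attributes VALUES (?, ?, 'text', ?)",
        [(cur.lastrowid, field, value) for field, value in answers.items()],
    )
    return cur.lastrowid

save_submission("survey", {"color": "red", "size": "L"})
save_submission("survey", {"color": "red"})
save_submission("survey", {"color": "blue", "size": "L"})

# The vote becomes a plain GROUP BY, whatever fields the form had.
votes = conn.execute("""
    SELECT a.value, COUNT(*) AS n
    FROM Attributes AS a
    JOIN Entities AS e ON e.id = a.entityId
    WHERE e.name = 'survey' AND a.name = 'color'
    GROUP BY a.value
    ORDER BY n DESC, a.value
""").fetchall()
print(votes)
```

The trade-off is that every value ends up as text in one column, so typing and constraints have to be handled by the application.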
If you store XML in the DB, you gain flexibility at the expense of performance and maintainability (XML parsing with XPath etc. can be verbose and error-prone, especially with complex and deeply nested XML structures).
If you store tables for each XML, you gain performance and ease of use, at the cost of flexibility and added complexity (many tables).
Pick a hybrid approach. Store the XML content in RDBMS tables using a generic structure (as suggested in one of the answers). This way you have fewer tables (less complexity) and avoid the performance issues of XML parsing.
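For a sense of what the voting step looks like if the XML is kept as-is, here is a small sketch in Python (standard library only). The document shape, with answer elements carrying a q attribute, is an assumption made up for the example; the real answer XML will differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Three hypothetical submissions of the same form.
submissions = [
    '<form id="f1"><answer q="color">red</answer><answer q="size">L</answer></form>',
    '<form id="f1"><answer q="color">red</answer></form>',
    '<form id="f1"><answer q="color">blue</answer><answer q="size">L</answer></form>',
]

def tally(xml_docs):
    """Count, per question, how many submissions gave each answer."""
    counts = {}
    for doc in xml_docs:
        # iter() walks nested elements too, so answers may sit at any depth.
        for answer in ET.fromstring(doc).iter("answer"):
            question = answer.get("q")
            value = (answer.text or "").strip()
            counts.setdefault(question, Counter())[value] += 1
    return counts

result = tally(submissions)
print(result["color"].most_common(1))
```

Every vote re-parses every document, which is the performance cost mentioned above; extracting the answers once at write time avoids it.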
It's clear that you can use numeric characters in SQL table names and use them so long as they're not at the beginning. (There's a discussion here on one of the side effects: SQLite issue with Table Names using numbers?)
The database I'm targeting is Oracle 10g/11g.
I'm designing a reporting database where naming some of the entities clearly is best done by describing the reports, which are named after numbers ('part 45', '102S', '401'). It's just the business domain language: these reports just aren't commonly referred to by any other name. The entities I'm modelling really are best named this way.
My question is: am I going to have difficulties with maintenance or programmability if I put numbers in a table name? I'm always worried about ancillary software around the database: drivers, ETL code that might not play nice with a non-plain-vanilla name. But there's a real benefit in intelligibility in this business domain, so am I just being squeamish?
My question put simply is: are there any 'gotchas' or corner cases that would rule out a table name like PART_45_AUDIT?
If PART_45_AUDIT is really the clearest description of the entity you're modeling (which would be very rare), there shouldn't be any gotchas to having numbers in the middle of a name. Putting numbers at the front of the name would be a different story because that would require using double-quoted identifiers and there are plenty of tools that don't fully support double-quoted identifiers. Plus, of course, it's rather annoying to have to type the double-quotes every time you reference the table.
-- A name that starts with a digit must be double-quoted wherever it is used.
CREATE TABLE "102S" (
  col1 NUMBER
);

SELECT *
  FROM "102S";
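If you want to try the difference quickly without an Oracle instance, the same pattern can be reproduced with SQLite through Python's built-in sqlite3 module. This is only an analogy: SQLite's identifier rules are similar here but not identical to Oracle's, so it does not prove anything about Oracle tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Digits in the middle of a name need no quoting.
conn.execute("CREATE TABLE PART_45_AUDIT (col1 NUMERIC)")

# A leading digit is rejected unless the name is quoted.
try:
    conn.execute("CREATE TABLE 102S (col1 NUMERIC)")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

# Quoted, it works, but every later reference needs the quotes too.
conn.execute('CREATE TABLE "102S" (col1 NUMERIC)')
conn.execute('INSERT INTO "102S" VALUES (1)')
rows = conn.execute('SELECT * FROM "102S"').fetchall()
print(unquoted_ok, rows)
```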
I am experimenting with a possible data structure for an app of mine, and I need to provision a column in one of my SQL Server data tables to hold various data of unpredictable size.
Literally, this could mean a string of text, or a Base64 encoded video clip and everything in between.
I realize that the instant response is going to be that I should provision different tables for different types -- and I don't disagree -- but please humor me here.
varchar(MAX)?
nvarchar(MAX)?
I am not a DBA so I don't know what type gives me the most flexibility for the lowest storage cost.
VARBINARY(MAX)?
In principle, trying to force multiple different data types into a single type is a bad idea. You may be better served with a different table for each type. But if you're never going to search the field, you should be able to do anything with a binary field...
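As a rough sketch of the binary-column idea, the following uses SQLite's BLOB type via Python's sqlite3 as a stand-in for VARBINARY(MAX) (an assumption; SQL Server's limits and storage behavior differ). The payloads table, the content_type column and the put/get helpers are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payloads ("
    " id INTEGER PRIMARY KEY,"
    " content_type TEXT NOT NULL,"   # tells the reader how to interpret body
    " body BLOB NOT NULL)"
)

def put(content_type, data):
    """Store raw bytes together with a label describing what they are."""
    cur = conn.execute(
        "INSERT INTO payloads (content_type, body) VALUES (?, ?)",
        (content_type, data),
    )
    return cur.lastrowid

def get(payload_id):
    content_type, body = conn.execute(
        "SELECT content_type, body FROM payloads WHERE id = ?", (payload_id,)
    ).fetchone()
    return content_type, bytes(body)

# Text goes in as encoded bytes; binary data goes in untouched.
text_id = put("text/plain; charset=utf-8", "héllo".encode("utf-8"))
clip_id = put("video/mp4", bytes([0, 1, 2, 255]))
print(get(text_id)[1].decode("utf-8"), len(get(clip_id)[1]))
```

Storing the raw bytes rather than a Base64 string also avoids Base64's roughly one-third size overhead, which matters if storage cost is the concern.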
Consider using the xml datatype. It will permit you to store, query and index arbitrary XML documents.