My question is: must a relation/table in a database have a candidate key, and hence a primary key? Is it possible to have a relation where a row cannot be uniquely identified by any combination of attributes?
If no, why? And if yes, how does a DBMS make operations like search and delete efficient?
Relations always have distinct tuples which means that in a Relational DBMS a table always has at least one candidate key.
SQL is a different case. SQL tables are "tuple bags", not relations. SQL tables can have duplicate rows, which is one of SQL's biggest flaws. Despite the fact that SQL supports duplicate rows the language is ill-suited to cope with them. In the presence of duplicate rows the SQL standard UPDATE and DELETE for instance have no guaranteed way to reference individual rows without resorting to some complex cursor-based operations.
Consequent problems of duplicate rows are certain inefficiencies and complexities of SQL DBMSs and a lack of orthogonality in their features. SQL DBMS engines have to use internal structures and support special features as a prerequisite in order to deal with duplicate rows. Some DBMS vendors try to get around the difficulties by disabling certain features for tables that don't have keys.
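To make the duplicate-row problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a convenient stand-in, and the table and column names are invented for illustration. It shows that once two rows are identical, a DELETE written against the declared columns cannot remove just one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no key of any kind: SQL allows identical rows.
conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2)],
)

# No predicate on the declared columns can single out one of the two ("a", 1) rows,
# so this statement removes both of them.
deleted = conn.execute(
    "DELETE FROM readings WHERE sensor = 'a' AND value = 1"
).rowcount
remaining = conn.execute("SELECT sensor, value FROM readings").fetchall()
print(deleted, remaining)
```

SQLite does keep a hidden rowid for such tables, which is one example of the internal structures an engine needs in order to cope with duplicates.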
A database does not require a table to have a primary key. A table is just an unordered collection of rows. Without any indexes, the only mechanism for accessing rows in a table is a full table scan (or a full partition scan, if the table is partitioned). Such operations are only efficient for very small numbers of rows.
Tables are more useful when you can refer to particular rows. Often, the best primary keys are auto incremented/identity primary keys. These are maintained by the database. In practice, all tables in a well-designed database are going to have primary keys. Here are three reasons:
Rows can be referred to by other tables.
Individual rows can be updated and deleted.
Individual rows can be selected efficiently and unambiguously.
Note: you can have indexes on a table without primary keys. And combinations of one or more columns can be made unique, even if the combination is not a primary key. The converse does not hold: a primary key is itself backed by an index, so you cannot have a primary key without an index. And all rows in a table have "row addresses" which are unique. Whether or not these are available for queries depends on the database engine.
Yes, this is possible.
Just note that some identifier does exist behind the scenes (example from SQL Server):
When a table is stored as a heap, individual rows are identified by reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page
How will operations be performed?
A table scan will be needed for almost any operation:
If a table is a heap and does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row
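The difference between a scan and an indexed lookup can be seen in a query plan. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so the plan wording is SQLite's, and the table, index name and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")  # no key, no index
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(f"user{i}", f"city{i % 10}") for i in range(1000)],
)

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM people WHERE name = 'user42'"
before = plan(query)   # with no index, SQLite reports a scan of the table
conn.execute("CREATE INDEX people_name ON people (name)")
after = plan(query)    # with the index, SQLite reports a search using people_name
print(before)
print(after)
```

The exact plan text varies between SQLite versions, but the first plan describes a scan and the second a search through the new index.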
Friends,
I am new to databases and need some help.
There is a table in our project, say "record_table", and values are inserted into it by C++ code.
This table has multiple columns. For three of them, say "serialNo, type, sub_type", the C++ code is inserting duplicate combinations of values (these columns are not declared unique or primary anywhere in the table), but the combination of the three columns should be unique.
Now we want to make sure duplicates of this combination cannot be inserted. I was thinking of adding a unique constraint on these columns so that a new record with duplicated values is rejected.
I assume this will work, but I have a doubt about performance: the C++ binary runs daily and inserts around 2 million records. Will creating a unique constraint slow the run down? Or, since the table has millions of records, does a unique constraint make no sense because it has to maintain a hash of these columns, etc.?
Please suggest if you can.
Unique constraints are enforced through an index. Chances are you need that index anyway, for querying the data back again, so the overhead of maintaining it is irrelevant.
The real question is, what is the performance impact of handling duplicate records if you don't enforce the constraint? Generally speaking the performance impact of enforcing constraints is trivial compared to fixing data corruption.
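As a rough illustration of what the constraint does, here is a sketch using Python's built-in sqlite3 module. The table and the three column names come from the question; the payload column, the sample values and the try_insert helper are hypothetical. It says nothing about performance on the asker's DBMS, which would have to be measured there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE record_table (
        serialNo INTEGER,
        type     TEXT,
        sub_type TEXT,
        payload  TEXT,                      -- hypothetical extra column
        UNIQUE (serialNo, type, sub_type)   -- enforced through an index
    )
    """
)

def try_insert(row):
    """Return True if the row was stored, False if the constraint rejected it."""
    try:
        conn.execute("INSERT INTO record_table VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert((1, "A", "x", "original"))
second = try_insert((1, "A", "x", "duplicate"))        # same three-column key
third = try_insert((1, "A", "y", "different sub_type"))
count = conn.execute("SELECT COUNT(*) FROM record_table").fetchone()[0]
print(first, second, third, count)
```

The loading code then has to decide what to do with a rejected row (skip it, log it, or update the existing one), which is usually cheaper than cleaning up duplicates afterwards.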
Very new to SQL Server. I have a db with about 20 tables, each with around 40 columns. How can I select two tables and see if they have any columns in common?
I basically want to see where I can make joins. If there's a better way of quickly telling where I can combine info from two tables, that would be helpful too.
First of all, in relational databases there is no such concept as "joinable tables and/or columns". You can always combine two relations (= tables) by pairing every row in one relation with each row of the other (their cross/Cartesian product) and then filtering the result with some predicate (this is called a "join" if the predicate involves columns of both relations).
The idea of "joinable" tables/columns comes into being only when thinking about the database schema. The schema's author can ask the database engine to enforce some referential integrity, by means of foreign keys.
Now if your database schema is well done (that is, its author was kind/clever enough to put referential integrity all over the schema) you can have a clue of which tables are joinable (by which columns).
To find those foreign keys, for each table you can run sp_help 'databasename.tablename' (you can omit the databasename. part, if it is the current database).
This command will output some facts about the given table, like its columns (along with their datatypes, requiredness, ...), its indexes and so on. Somewhere near the end it will list foreign keys along with where (if ever) its primary key is imported as foreign key on other tables.
For each key imported as foreign key on other table you have a candidate predicate for a join.
Please note that this procedure will only work if the foreign keys are set correctly. If they aren't, you can fix your database schema (but to do this you must know already which tables are joinable anyway). Also it won't show you joinable tables on other databases (in the same or linked server).
This also won't work for views.
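The same idea (read the declared foreign keys, and fall back to shared column names as a hint) can be sketched against any engine's metadata. The example below uses Python's built-in sqlite3 module and SQLite's PRAGMA statements, not sp_help; the two tables are invented for illustration. On SQL Server the corresponding metadata would come from its own catalog views, such as INFORMATION_SCHEMA.COLUMNS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        placed_on   TEXT
    );
    """
)

def column_names(table):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

def foreign_keys(table):
    # PRAGMA foreign_key_list rows: (id, seq, parent_table, from_col, to_col, ...)
    return [(row[3], row[2], row[4])
            for row in conn.execute(f"PRAGMA foreign_key_list({table})")]

shared = column_names("customers") & column_names("orders")
fks = foreign_keys("orders")
print(shared, fks)
```

A shared column name is only a hint that a join might make sense; a declared foreign key is the stronger signal, for the reasons given above.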
Try looking at the database diagram in SQL Server Management Studio; it shows the relationships between tables.
Can HashTables be used to create indexes in databases? What is the ideal Data structure to create indexes?
If a table has a foreign key referencing a field in another database, will it help if we create an index on the foreign key?
Can HashTables be used to create indexes in databases?
Some DBMSes support hash-based indexes, some don't.
What is the ideal Data structure to create indexes?
No data structure occupies 0 bytes, nor can it be manipulated in 0 CPU cycles, therefore no data structure is "ideal". It is up to us, the software engineers, to decide which data structure has the most benefits and fewest detriments for the specific goal we are trying to accomplish.
For example, B-Trees are useful for range scans and hash indexes aren't. Does that mean the B-Trees are "better"? Well, they are if you need range scans, but may not necessarily be if you don't.
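The trade-off can be sketched in plain Python, with a dict standing in for a hash index and a sorted list searched with bisect standing in for a B-tree. Neither is how a real DBMS implements its indexes, and the sample rows are invented; the point is only that ordering is what makes a range scan cheap.

```python
import bisect

rows = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (25, "m")]

# Hash-style index: key -> row, fast for equality lookups, no ordering.
hash_index = {key: value for key, value in rows}

# Ordered index (stand-in for a B-tree): keys kept sorted.
sorted_keys = sorted(key for key, _ in rows)

def range_scan(low, high):
    """Keys with low <= key <= high, located by two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    stop = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:stop]

def range_via_hash(low, high):
    """The hash index offers no order, so every key must be inspected."""
    return sorted(k for k in hash_index if low <= k <= high)

point = hash_index[42]
in_range = range_scan(5, 30)
print(point, in_range)
```

Both range functions return the same keys; the ordered version touches only the matching slice, while the hash version has to visit every entry.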
If a table has a foreign key referencing a field in another database, will it help if we create an index on the foreign key?
You cannot normally have a foreign key toward another database, only another table.
And yes, it tends to help, since every time a row is updated or deleted in the parent table, the child table needs to be searched to see if the FK was violated. This search can significantly benefit from such an index. Some (but not all) DBMSes require an index on the FK (and might even create it automatically if it is not already there).
OTOH, if you only add rows to the parent table, you could consider leaving the child table unindexed on FK fields (assuming your DBMS allows you to do so).
Oracle Perspective
Oracle supports clustering by hash value, either for single or multiple tables. This physically colocates rows having the same hash value for the cluster columns, and is faster than accessing via an index. There are disadvantages due to increased complexity and a certain need for preplanning.
You could also use a function-based index to index based on a hash function applied to one or more columns. I'm not sure what the advantage of that would be though.
Foreign key columns in Oracle generally benefit from indexing due to the obvious performance advantages.
Index Organized Tables (IOTs) are tables stored in an index structure. Whereas a table stored in a heap is unorganized, data in an IOT is stored and sorted by primary key (the data is the index). IOTs behave just like “regular” tables, and you use the same SQL to access them.
Every table in a proper relational database is supposed to have a primary key... If every table in my database has a primary key, should I always use an index organized table?
I'm guessing the answer is no, so when is an index organized table not the best choice?
Basically an index-organized table is an index without a table. There is a table object which we can find in USER_TABLES but it is just a reference to the underlying index. The index structure matches the table's projection. So if you have a table whose columns consist of the primary key and at most one other column then you have a possible candidate for INDEX ORGANIZED.
The main use case for index organized table is a table which is almost always accessed by its primary key and we always want to retrieve all its columns. In practice, index organized tables are most likely to be reference data, code look-up affairs. Application tables are almost always heap organized.
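For readers without an Oracle instance to hand, SQLite's WITHOUT ROWID tables are a loosely comparable feature: the rows are stored in the primary-key B-tree itself. The sketch below uses Python's built-in sqlite3 module; it illustrates the general idea only and is not Oracle's ORGANIZATION INDEX behaviour. The table names and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Ordinary SQLite table: rows are keyed by a hidden rowid,
    -- and the primary key is a separate index pointing at it.
    CREATE TABLE status_plain (code TEXT PRIMARY KEY, label TEXT);
    -- WITHOUT ROWID: rows are stored in the primary-key B-tree itself.
    CREATE TABLE status_iot (code TEXT PRIMARY KEY, label TEXT) WITHOUT ROWID;
    """
)
data = [("S", "Shipped"), ("P", "Paid"), ("R", "Received")]
conn.executemany("INSERT INTO status_plain VALUES (?, ?)", data)
conn.executemany("INSERT INTO status_iot VALUES (?, ?)", data)

# The same SQL works against both tables.
label_plain = conn.execute(
    "SELECT label FROM status_plain WHERE code = 'P'").fetchone()[0]
label_iot = conn.execute(
    "SELECT label FROM status_iot WHERE code = 'P'").fetchone()[0]

# Only the ordinary table has a separate row address to expose.
try:
    conn.execute("SELECT rowid FROM status_iot").fetchall()
    iot_has_rowid = True
except sqlite3.OperationalError:
    iot_has_rowid = False
print(label_plain, label_iot, iot_has_rowid)
```

As in the answer above, the interesting candidates are small code look-up tables that are read by primary key.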
The syntax allows an IOT to have more than one non-key column. Sometimes this is correct. But it is also an indication that maybe we need to reconsider our design decisions. Certainly if we find ourselves contemplating the need for additional indexes on the non-primary key columns then we're probably better off with a regular heap table. So, as most tables probably need additional indexes most tables are not suitable for IOTs.
Coming back to this answer I see a couple of other responses in this thread propose intersection tables as suitable candidates for IOTs. This seems reasonable, because it is common for intersection tables to have a projection which matches the candidate key: STUDENTS_CLASSES could have a projection of just (STUDENT_ID, CLASS_ID).
I don't think this is cast-iron. Intersection tables often have a technical key (i.e. STUDENT_CLASS_ID). They may also have non-key columns (metadata columns like START_DATE, END_DATE are common). Also there is no prevailing access path - we want to find all the students who take a class as often as we want to find all the classes a student is taking - so we need an indexing strategy which supports both equally well. I'm not saying intersection tables are not a use case for IOTs, just that they are not automatically so.
I'd consider them for very narrow tables (such as the join tables used to resolve many-to-many relationships). If (virtually) all the columns in the table are going to be in an index anyway, then why shouldn't you use an IOT?
Small tables can be good candidates for IOTs as discussed by Richard Foote here
I consider the following kinds of tables excellent candidates for IOTs:
"small" "lookup" type tables (e.g. queried frequently, updated infrequently, fits in a relatively small number of blocks)
any table that you already are going to have an index that covers all the columns anyway (i.e. may as well save the space used by the table if the index duplicates 100% of the data)
From the Oracle Concepts guide:
Index-organized tables are useful when related pieces of data must be stored together or data must be physically stored in a specific order. This type of table is often used for information retrieval, spatial (see "Overview of Oracle Spatial"), and OLAP applications (see "OLAP").
This question from AskTom may also be of interest, especially where someone gives a scenario and then asks whether an IOT would perform better than a heap-organised table. Tom's response is:
we can hypothesize all day long, but until you measure it, you'll never know for sure.
An index-organized table is generally a good choice if you only access data from that table by the key, the whole key, and nothing but the key.
Further, there are many limitations about what other database features can and cannot be used with index-organized tables -- I recall that in at least one version one could not use logical standby databases with index-organized tables. An index-organized table is not a good choice if it prevents you from using other functionality.
All an IOT really saves is the logical read(s) on the table segment, and as you might have spent two or three or more on the IOT/index this is not always a great saving except for small data sets.
Another feature to consider for speeding up lookups, particularly on larger tables, is a single table hash cluster. When correctly created they are more efficient for large data sets than an IOT because they require only one logical read to find the data, whereas an IOT is still an index that needs multiple logical i/o's to locate the leaf node.
I can't per se comment on IOTs, however if I'm reading this right then they're the same as a 'clustered index' in SQL Server. Typically you should think about not using such an index if your primary key (or the value(s) you're indexing, if it's not a primary key) is likely to be distributed fairly randomly, as such inserts can result in many page splits (expensive).
Keys such as identity columns (sequences in Oracle?) and dates 'around the current date' tend to make good candidates for such indexes.
An Index-Organized Table--in contrast to an ordinary table--has its own way of structuring, storing, and indexing data.
Index organized tables (IOT) are indexes which actually hold the data which is being indexed, unlike the indexes which are stored somewhere else and have links to actual data.
If I have a lookup table with very few records in it (say, less than ten), should I bother putting an index on the Foreign Key of another table to which it is attached? For that matter, does the lookup table even need an index on the Primary Key?
Specifically, is there any performance benefit that outweighs the overhead of maintaining the indexes? If not, are there any benefits other than speed?
Note: an example of a lookup table might be Order Status, where the tuples are:
1 - Order Received
2 - In Process
3 - Shipped
4 - Paid
On a transactional system there may be no significant benefit to putting an index on such a column (i.e. a low cardinality reference column) as the query optimiser probably won't use it. It will also generate additional disk traffic on writes to the table as the indexes have to be updated. So for low cardinality FK's on a transactional database it is usually better not to index the columns. This particularly applies to high volume systems.
Note that you may still want the FK for referential integrity and that the FK lookup on a small reference table will probably generate no I/O as the lookup table will almost always be cached.
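To illustrate that the constraint and the index are separate decisions, here is a small sketch using Python's built-in sqlite3 module with the Order Status values from the question. The orders table and its column names are hypothetical, and SQLite is only a stand-in for whichever DBMS is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE order_status (status_id INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO order_status VALUES
        (1, 'Order Received'), (2, 'In Process'), (3, 'Shipped'), (4, 'Paid');
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES order_status (status_id)
    );
    """
)

conn.execute("INSERT INTO orders VALUES (100, 3)")      # valid status
try:
    conn.execute("INSERT INTO orders VALUES (101, 9)")  # no such status
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# No index was created on orders.status_id, yet the constraint is enforced.
indexes = conn.execute("PRAGMA index_list(orders)").fetchall()
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, indexes, order_count)
```

The invalid insert is rejected by a lookup on the small parent table; the child table needs no index on the FK column for that check.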
However, you may find that you want to include the column in a composite index for some reason - perhaps to create a covering index for a commonly used query.
On a table that is frequently bulk-loaded (e.g. a data warehouse) the index write traffic will be much larger than that of the table load if you have many indexed columns. You will probably need to drop or disable the FKs and indexes for a bulk load if any indexes are present.
On a Star Schema you can get some benefit from indexing low cardinality columns, even on SQL Server. If you are doing a highly selective query (i.e. one where the query optimiser decides that the row set returned will be small) then it can do a 'star query' plan where it uses a technique known as index intersection.
Generally, query plans on a star schema should be based around a table scan of the fact table or a highly selective process that bookmarks the fact table and then returns a smaller set of rows. Index intersection is efficient for the latter type of query as the selection can be resolved before doing any I/O on the fact table.
Bitmap indexes are a real win for low cardinality columns on platforms such as Oracle that support them, but SQL Server does not. Even so, low cardinality indexes can still participate in star query plans on SQL Server.
Yes, always have an index.
The query optimizer of a modern database management system (DBMS) will make the determination as to which is faster: (1) actually reading from an index on a column, (2) performing a full table scan.
The table size (in number of rows) needs to be "large enough" for use of the index to be considered.
Yes to both. Always index as a rule of thumb.
Points:
You also can't set up an FK without a unique index on the lookup table
What if you want to delete or update in the lookup table? Especially accidentally...
However, saying that, we don't always.
We have a very busy OLTP table (5 million+ rows per day) with several parent tables. We only index the FK columns where we need to. We assume no deletes/key updates on some parent tables, so we reduce the amount of work needed and disk space used.
We used the SQL Server 2005 DMVs to establish that the indexes weren't used. We still have the FKs in place though.
My personal opinion is that you should... it may be small now but ALWAYS anticipate your tables growing in size. A good database schema will grow easily with more records. Foreign Keys are almost always a good idea.
In SQL Server, the primary key is created as the clustered index by default if the table does not already have a clustered index.