Identifying where joins can be made between two SQL Server tables - sql-server

Very new to SQL Server. I have a db with about 20 tables, each with around 40 columns. How can I select two tables and see if they have any columns in common?
I basically want to see where I can make joins. If there's a better way of quickly telling where I can combine info from two tables, that could be helpful too.

First of all, in relational databases there is no such concept as "joinable tables and/or columns". You can always list two relations (= tables), crossing every row in one relation with each row of the other (their cross/Cartesian product), and then filter those rows based on some predicate (this is also called a "join" when the predicate involves columns of both relations).
The idea of "joinable" tables/columns comes into play only when you look at the database schema. The schema's author can ask the database engine to enforce some referential integrity, by means of foreign keys.
Now if your database schema is well designed (that is, its author was kind/clever enough to declare referential integrity throughout the schema), you can get a good idea of which tables are joinable, and by which columns.
To find those foreign keys, for each table you can run sp_help 'schemaname.tablename' (you can omit the schemaname. part if the table is in your default schema; sp_help only looks at the current database).
This command outputs some facts about the given table, such as its columns (along with their data types, nullability, ...), its indexes, and so on. Near the end it lists the table's foreign keys, along with the tables (if any) that import its primary key as a foreign key.
Each key imported as a foreign key by another table gives you a candidate predicate for a join.
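If you would rather see all of them at once instead of running sp_help table by table, you can query the catalog views to list every foreign key relationship in the current database. A minimal sketch (the views and functions below are SQL Server's; the column aliases are mine):

-- list every declared foreign key in the current database, one row per column pair,
-- i.e. every candidate join predicate the schema knows about
select
    fk.name                                                      as foreign_key,
    object_name(fkc.parent_object_id)                            as referencing_table,
    col_name(fkc.parent_object_id, fkc.parent_column_id)         as referencing_column,
    object_name(fkc.referenced_object_id)                        as referenced_table,
    col_name(fkc.referenced_object_id, fkc.referenced_column_id) as referenced_column
from sys.foreign_keys as fk
join sys.foreign_key_columns as fkc
    on fkc.constraint_object_id = fk.object_id
order by referencing_table, foreign_key;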
Please note that this procedure will only work if the foreign keys are set up correctly. If they aren't, you can fix your database schema (but to do that you must already know which tables are joinable anyway). Also, it won't show you joinable tables in other databases (on the same or a linked server).
This also won't work for views.

Try the database diagram in SQL Server Management Studio; there you can see the relationships between the tables.

Related

Transferring data from MS Access to SQL Server

I have MS Access tables that are indexed but accept duplicates. Tables have been transferred to SQL Server and linked to Access.
How do you replicate primary keys that accept duplicates?
You should consider using SSMA
SQL Server Migration Assistant for Access.
It can move your tables up to SQL Server.
It will move up the related data and set up PK columns for you.
It will maintain and create relationships for you.
It will maintain and create all the indexes you have now.
If you just have a few tables, say 2-5? Then sure, just import them and set up the relationships and indexes yourself.
However, the last few migrations of data from Access to SQL Server I did involved in excess of 80 tables - and HUGE numbers of relationships, indexes, and of course PK settings. The migration wizard can send all of the tables up and set them up correctly on SQL Server for you, including PK values, FK values (foreign keys / related tables), and in most cases even the constraints are correctly moved up to SQL Server.
What is nice is that you can then re-link the tables in your Access application, and you are now using SQL Server as the back-end database.
SSMA can be found here:
https://www.microsoft.com/en-us/download/details.aspx?id=54255
I think you're confusing an "index" with a "primary key".
An "index" is a structure that helps optimise queries. Indexes don't have to be unique. A "primary key" is a logical constraint on a column which requires that all values in the column are unique.
It sounds like what you want to do is import the data into SQL and create an index to help speed up queries, but where that index is not constrained to be unique.
Here's the syntax to do that. Suppose we have some table T:
create table T(i int, j int, k int);
We want to create an index on column i to speed up queries, but i is not unique. To do that we create a regular (non-unique) index:
create index MyIndexName on T(i);
As a rule, I tend to name my indexes based on what they are indexing. So in the above case I wouldn't call the index "MyIndexName", I would call it something like ix_T_i.
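Following that convention, the index from the example above would be created as:

-- same index as before, named after the table (T) and the column (i) it covers
create index ix_T_i on T(i);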

Minimum number of candidate keys for a relation?

My question is, is it necessary for a relation/table in database to have a candidate key and hence a primary key? Is it possible to have a relation where a row cannot be uniquely identified by any combination of attributes?
If not, why? And if so, how does a DBMS make operations like search, delete, etc. efficient?
Relations always have distinct tuples which means that in a Relational DBMS a table always has at least one candidate key.
SQL is a different case. SQL tables are "tuple bags", not relations. SQL tables can have duplicate rows, which is one of SQL's biggest flaws. Despite the fact that SQL supports duplicate rows the language is ill-suited to cope with them. In the presence of duplicate rows the SQL standard UPDATE and DELETE for instance have no guaranteed way to reference individual rows without resorting to some complex cursor-based operations.
The consequences of duplicate rows include certain inefficiencies and complexities in SQL DBMSs and a lack of orthogonality in their features. SQL DBMS engines have to use internal structures and support special features as a prerequisite in order to deal with duplicate rows. Some DBMS vendors try to get around the difficulties by disabling certain features for tables that don't have keys.
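A small illustration of the DELETE problem (the table and values are invented for the example):

-- a table with no key can hold duplicate rows
create table Payments (amount decimal(10,2));
insert into Payments values (9.99), (9.99);

-- standard SQL gives no way to address just one of the two identical rows;
-- this predicate matches (and deletes) both of them
delete from Payments where amount = 9.99;

-- vendor-specific workarounds exist, e.g. SQL Server's TOP clause:
-- delete top (1) from Payments where amount = 9.99;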
A database does not require a primary key. A table is just an unordered set of rows. Without any indexes, the only mechanism for accessing rows in a table is a full table scan (or a full partition scan, if the table is partitioned). Such operations are only efficient for very small numbers of rows.
Tables are more useful when you can refer to particular rows. Often, the best primary keys are auto incremented/identity primary keys. These are maintained by the database. In practice, all tables in a well-designed database are going to have primary keys. Here are three reasons:
Rows can be referred to by other tables.
Individual rows can be updated and deleted.
Individual rows can be selected efficiently and unambiguously.
Note: you can have indexes on a table that has no primary key. And a combination of one or more columns can be declared unique even if it is not the primary key. The reverse is not true, though: a primary key is always backed by an index. Also, all rows in a table have "row addresses", which are unique; whether or not these are available to queries depends on the database engine.
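A short sketch of that note in T-SQL (the table and column names are invented for the example):

-- a table with no primary key...
create table Readings (
    sensor_id int       not null,
    read_at   datetime2 not null,
    value     float     null
);

-- ...can still have a plain (non-unique) index to speed up lookups...
create index ix_Readings_sensor_id on Readings (sensor_id);

-- ...and a unique constraint on a combination of columns,
-- even though that combination is not declared as the primary key
alter table Readings
    add constraint uq_Readings_sensor_read_at unique (sensor_id, read_at);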
Yes, this is possible.
Just note that some identifier does exist behind the scenes (example from SQL Server):
When a table is stored as a heap, individual rows are identified by reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page.
How will operations be performed?
A table scan will be needed for almost any operation:
If a table is a heap and does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row.

Database normalization for electricity monitoring system

I've read a lot of tips and tutorials about normalization but I still find it hard to understand how and when we need normalization. So right now I need to know if this database design for an electricity monitoring system needs to be normalized or not.
So far I have one table with fields:
monitor_id
appliance_name
brand
ampere
uptime
power_kWh
price_kWh
status (ON/OFF)
This monitoring system monitors multiple appliances (TV, Fridge, washing machine) separately.
So does it need to be normalized further? If so, how?
Honestly, you can get away without normalizing every database. Normalization is worthwhile if the database is going to be a project that affects many people, or if there are performance issues and the database handles OLTP. Database normalization in many ways boils down to having a larger number of tables, each with fewer columns. Denormalization involves having fewer tables with larger numbers of columns.
I've never seen a real database with only one table, but that's ok. Some people denormalize their database for reporting purposes. So it isn't always necessary to normalize a database.
How do you normalize it? You need a primary key (on a column that is unique, or on a combination of two or more columns that is unique in its combined form). You would then create another table and relate it with a foreign key. A foreign key relationship is a pair of columns that exist in two (or more) tables and share the same data type; it acts as a map from one table to the other. The tables are usually separated by real-world purpose.
For example, you could have a table with status, uptime and monitor_id. It would have a foreign key relationship on monitor_id back to your original table, which could then drop its uptime and status columns. You could have a third table with brands, models and the things that all models have in common (e.g., power_kWh, ampere, etc.), with a foreign key relationship to the first table based on model. The brand column could then be eliminated (via the DDL command DROP) from the first table, since the third table now supplies it via the model name.
To create a new table, you invoke the DDL command CREATE TABLE, giving it a foreign key on the column that will in effect be shared by the new table and the original table. With foreign key constraints, the new tables share a column. Highly normalized tables hold less information each (fewer columns), but more tables are needed to accommodate and store all the data. This way you can update one table without locking all the other columns, as you would with one big table in a denormalized database.
Once the new tables hold the data from the original table's columns, you can drop those columns from the original table (except for the foreign key column). To drop columns, you invoke DDL commands such as ALTER TABLE originalTable DROP COLUMN brand.
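A rough sketch of those two steps, assuming monitor_id is the key of the original table (here called monitors; the data types are guesses):

-- split status/uptime readings into their own table,
-- related to the original table by monitor_id
create table monitor_status (
    monitor_id int        not null,
    status     varchar(3) not null,  -- 'ON' / 'OFF'
    uptime     int        null,
    constraint fk_monitor_status_monitor
        foreign key (monitor_id) references monitors (monitor_id)
);

-- once the data has been copied over, drop the now-redundant columns
-- from the original table (keep the foreign key column, monitor_id)
alter table monitors drop column status;
alter table monitors drop column uptime;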
In many ways, performance will be better in a normalized database when you do many reads and writes (commit many transactions) against the tables. If you mostly use the table for reporting and want to present all the data as it sits in one table, normalization will hurt performance.
By the way, normalizing the database can prevent redundant data. This can make the database consume less storage space and use less memory.
It is nice to have your database normalized. It helps keep the data efficient, because it prevents redundancy and saves storage. When normalizing tables, each table needs a primary key, which you use to connect it to other tables; when a table's primary key (unique within that table) appears in another table, it is called a foreign key there (it is used to connect the two tables).
For example, say you already have this table:
Table name : appliances_tbl
-inside it you have
-appliance_id : the primary key
-appliance_name
-brand
-model
and so on about the appliance...
Next you have another table:
Table name : appliance_info_tbl (the name can be anything, as long as it relates to its fields)
-appliance_info_id : primary key
-appliance_price
-appliance_uptime
-appliance_description
-appliance_id : foreign key (so you can get the name of the appliance by using only its id)
and so on....
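In SQL DDL, that layout might look something like this (the data types and identity columns are assumptions, using SQL Server syntax; the table and column names come from the sample above):

create table appliances_tbl (
    appliance_id   int identity(1,1) primary key,
    appliance_name varchar(100) not null,
    brand          varchar(50)  null,
    model          varchar(50)  null
);

create table appliance_info_tbl (
    appliance_info_id     int identity(1,1) primary key,
    appliance_price       decimal(10,2) null,
    appliance_uptime      int           null,
    appliance_description varchar(255)  null,
    appliance_id          int not null
        references appliances_tbl (appliance_id)  -- the foreign key back to the appliance
);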
You can add more tables like that; just make sure you have a primary key in each table. You can also note the cardinality to make your normalization easier to understand.

Many-to-Many Relationship between two tables in two different databases

As mentioned in the title, is it possible to create a many-to-many relationship between two tables that belong to two different databases? If so, how can I do that with PostgreSQL?
The standard way of using foreign key constraints to enforce referential integrity is only possible within the same database - not across databases in the same cluster. But you can operate across multiple schemas in the same database.
Other than that, you can create tables just the same way. And even join tables dynamically among remote databases using dblink or FDW. Referential integrity cannot be guaranteed across databases by the RDBMS, though.
It does not matter much whether the other DB is on the same physical machine or even in the same DB cluster - that just makes the connection faster and more secure.
Or you can replicate data to a common database and add standard constraints there.
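For illustration, joining a local table to a table that lives in another database via dblink might look roughly like this (the connection string, table and column names are all placeholders):

-- requires the dblink extension in the local database
create extension if not exists dblink;

-- the remote query's result set has to be given an explicit column definition list
select u.user_id, u.user_name, r.task_id, r.task_name
from users as u
join dblink('dbname=other_db host=localhost',
            'select task_id, user_id, task_name from tasks')
       as r(task_id int, user_id int, task_name text)
  on r.user_id = u.user_id;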
It should be possible, but as has been stated you cannot expect much in the way of referential integrity.
If you follow the standard design pattern of using a linking table, you can generate a sort of M2M relationship.
DB1.dbo.Users has the USER_ID primary key
DB2.dbo.Tasks has the TASK_ID primary key
you could create a table on either DB1 or DB2 that is UsersToTasks
DB1.dbo.UsersToTasks
USER_ID - KEY
TASK_ID - KEY
This way, a unique pairing of USER_ID and TASK_ID is used as the key in that table. The only thing is that you cannot create a foreign key to the table in the other database.
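A sketch of that linking table, assuming it lives in the same database as Users, so only that side can get a real foreign key (names follow the layout above):

-- lives in DB1 alongside Users; Tasks is in the other database
create table UsersToTasks (
    USER_ID int not null
        references Users (USER_ID),  -- real foreign key, same database
    TASK_ID int not null,            -- no foreign key possible: Tasks is elsewhere
    primary key (USER_ID, TASK_ID)   -- the unique pairing acts as the key
);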
As a pseudo workaround, you could write a trigger on DB2.dbo.Task that would write the TASK_ID to DB1.dbo.TASK_IDS and link that as the foreign key on the linking table above. I'm not sure, but you could also potentially create a delete trigger that would remove the TASK_ID as well.
http://solaimurugan.blogspot.com/2010/08/cross-database-triggers-in-postgresql.html

Database normalization - How not OK is it to have a table with no relationships?

I'm really new to database design, as I will now demonstrate:
I have an MS Sql database that I need to add a table to. The table contains information that pertains to another table. However, there are no candidates for primary keys (all fields can be duplicates). The only thing the table will ever be used for is to keep records that may be required for a certain kind of query, and they can be retrieved super-easily using a field that my other tables also contain (but never uniquely).
Specifically, my main table has a bunch of chemistry records. Each chemistry record is associated with another set of records called quality-control records (in my second table). They are associated by a field called "BatchID". The super-easy part is that I can say, "get all records with this BatchID" and get exactly what I need. But there can be multiple instances of any BatchID in both tables (in fact, there usually are), so I'd need to jump through hoops to link them. In a more general sense, in theory, is it OK to have a table floating around not attached to anything?
The overwhelmingly simple solution is to just put the quality control in the db with no relationships to the chemistry table. I'd need to insert at least one other table to relate it to anything else, maybe more, and the only reason for complicating my life like that is that I don't want to violate some important precept of database design.
My question is, is it ever OK to just have a free-floating table in a database? Or is that right out?
Thanks for any help.
In theory, it's ok to have a table that doesn't have any foreign key constraints. But the table you describe (both tables you describe) should probably have a foreign key that references the table of batches. We'd expect the table of batches to have "BatchID" as its primary key.
The relational model requires tables to have at least one candidate key. It's almost always a bad idea to have a SQL table that doesn't have a candidate key.
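For instance, a minimal sketch of that arrangement (all names other than BatchID are invented):

-- a table of batches, with BatchID as its primary key
create table Batches (
    BatchID int primary key
);

-- both the chemistry and the quality-control tables reference it,
-- so every BatchID they contain must exist in Batches
alter table Chemistry
    add constraint fk_Chemistry_Batches
        foreign key (BatchID) references Batches (BatchID);

alter table QualityControl
    add constraint fk_QualityControl_Batches
        foreign key (BatchID) references Batches (BatchID);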
