splitting table - database

I have the following table which holds data on customers and staff. Would it be beneficial if I split it into 2 separate tables: Persons and Address? Each single person can have only one address, phone and mobile. I have a separate table for orders.
My database is quite complex and I wonder if this would be useful for implementation.
Many thanks,
zan
Persons
+----------------+
| PersonID       |
| FirstName      |
| LastName       |
| OrderName      |
| Email          |
| Telephone      |
| Mobile         |
| StreetAddress  |
| City           |
| RegionID (FK)  |
| Country        |
| PostCode       |
| TitleID (FK)   |
| PersonCat (FK) |
| MailingList    |
+----------------+

Only split tables for normalization purposes: for example, if one person can have multiple addresses, or if only some people have an address (say, fewer than 90%), so that a single table would contain a lot of NULL values.
If it's not for normalizing, don't split tables.
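For comparison, here is a minimal sketch of what the split would look like in the one case where it pays off, namely a person with several addresses. It uses Python's built-in sqlite3 module as a stand-in for your DBMS; the table and column names are hypothetical, loosely based on the Persons table in the question.

```python
import sqlite3

# Hypothetical schema: one person row, any number of address rows.
def build_split_schema():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE persons (
            person_id  INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL
        );
        CREATE TABLE addresses (
            address_id     INTEGER PRIMARY KEY,
            person_id      INTEGER NOT NULL REFERENCES persons (person_id),
            street_address TEXT NOT NULL,
            city           TEXT NOT NULL
        );
    """)
    return conn

def addresses_for(conn, person_id):
    """Return every street address stored for one person."""
    rows = conn.execute(
        "SELECT street_address FROM addresses WHERE person_id = ? ORDER BY address_id",
        (person_id,),
    )
    return [street for (street,) in rows]

# Made-up sample data: one person with two addresses.
demo = build_split_schema()
demo.execute("INSERT INTO persons VALUES (1, 'Sam', 'Smith')")
demo.executemany(
    "INSERT INTO addresses (person_id, street_address, city) VALUES (?, ?, ?)",
    [(1, "105 Easy St", "Springfield"), (1, "9 Gale Blvd", "Springfield")],
)
```

With exactly one address per person, as in the question, this second table buys nothing and only adds a join.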

Related

Counting records with some conditions in a table and displaying with a button on a form

In Access I have this table tblcls
I have a button on a form. I need some code/vba/SQL etc. for this button so when my client clicks it, they see something like in image2. Where for example.. State OK has 2 English classes, 4 Maths classes, and 3 Science classes. Please note that there won't be more classes, so no more columns but there will be coming more states and cities so the table there will be growing by rows.
Assuming that any of the three class fields can contain any subject, the desired result can be obtained using simple conditional aggregation, e.g.:
select
t.state,
-Sum(t.[Class 1]="English" or t.[Class 2]="English" or t.[Class 3]="English") as English,
-Sum(t.[Class 1]="Maths" or t.[Class 2]="Maths" or t.[Class 3]="Maths") as Maths,
-Sum(t.[Class 1]="Science" or t.[Class 2]="Science" or t.[Class 3]="Science") as Science
from tblcls t
group by t.state
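If you want to try the same conditional aggregation outside Access, the sketch below reproduces it with Python's built-in sqlite3 module. The column names (class1 to class3, city) and the sample rows are assumptions, since the real tblcls layout is only shown in an image. SUM(CASE ...) replaces the Access-specific -Sum(boolean) trick, which relies on True being stored as -1.

```python
import sqlite3

# Made-up sample rows: (state, city, class1, class2, class3)
SAMPLE_ROWS = [
    ("OK", "Tulsa", "English", "Maths", "Science"),
    ("OK", "Norman", "Maths", "English", None),
    ("TX", "Austin", "Science", "Science", "Maths"),
]

def count_classes_by_state(rows):
    """Count, per state, the rows in which each subject appears in any class column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblcls (state TEXT, city TEXT, class1 TEXT, class2 TEXT, class3 TEXT)"
    )
    conn.executemany("INSERT INTO tblcls VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT state,
               SUM(CASE WHEN 'English' IN (class1, class2, class3) THEN 1 ELSE 0 END) AS english,
               SUM(CASE WHEN 'Maths'   IN (class1, class2, class3) THEN 1 ELSE 0 END) AS maths,
               SUM(CASE WHEN 'Science' IN (class1, class2, class3) THEN 1 ELSE 0 END) AS science
        FROM tblcls
        GROUP BY state
        ORDER BY state
    """
    return conn.execute(query).fetchall()
```

As in the Access query, a row is counted once per subject even if the subject appears in more than one class column.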
However, this solution is clunky and inelegant because your database does not adhere to database normalisation rules.
For example, since a state may offer many classes, and a class may be taught in many states, you are working with a many-to-many relationship between states & classes, so a better way to structure the database whilst adhering to the rules of database normalisation would be to make use of a junction table.
Hence, at the very least you may have four tables:
States
+--------------+
| StateID (PK) |
| StateName |
+--------------+
Cities
+--------------+
| CityID (PK) |
| StateID (FK) |
| CityName |
+--------------+
Classes
+--------------+
| ClassID (PK) |
| ClassName |
+--------------+
City_Class_Xref
+--------------+
| ID (PK) |
| CityID (FK) |
| ClassID (FK) |
| StartDate |
| EndDate |
| Cost |
+--------------+
With this structure, there are now many ways to obtain your desired output - one possible method is using a crosstab query, e.g.:
transform count(*)
select states.statename
from
states inner join
(
cities inner join
(
classes inner join city_class_xref on
classes.classid = city_class_xref.classid
)
on cities.cityid = city_class_xref.cityid
)
on states.stateid = cities.stateid
group by states.statename
pivot classes.classname
The beauty of this approach is that if you later decide to add or remove a class, city, or state, the query remains unchanged as nothing has been hard-coded - upon adding another class, the class name will automatically appear in the results of the query.
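The normalised layout can also be sketched outside Access. The example below uses Python's built-in sqlite3 module. SQLite has no TRANSFORM/PIVOT, so the pivot step is done in Python instead; the snake_case names and the sample data are assumptions, and the StartDate, EndDate and Cost columns are left out for brevity.

```python
import sqlite3
from collections import defaultdict

def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE states  (state_id INTEGER PRIMARY KEY, state_name TEXT);
        CREATE TABLE cities  (city_id INTEGER PRIMARY KEY,
                              state_id INTEGER REFERENCES states (state_id),
                              city_name TEXT);
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, class_name TEXT);
        -- Junction table: one row per class offered in a city
        CREATE TABLE city_class_xref (
            id       INTEGER PRIMARY KEY,
            city_id  INTEGER REFERENCES cities (city_id),
            class_id INTEGER REFERENCES classes (class_id)
        );
        INSERT INTO states  VALUES (1, 'OK'), (2, 'TX');
        INSERT INTO cities  VALUES (1, 1, 'Tulsa'), (2, 1, 'Norman'), (3, 2, 'Austin');
        INSERT INTO classes VALUES (1, 'English'), (2, 'Maths');
        INSERT INTO city_class_xref (city_id, class_id)
        VALUES (1, 1), (1, 2), (2, 1), (3, 2);
    """)
    return conn

def class_counts_by_state(conn):
    """Return {state: {class_name: count}}; nothing about the classes is hard-coded."""
    query = """
        SELECT s.state_name, c.class_name, COUNT(*)
        FROM states s
        JOIN cities ci         ON ci.state_id = s.state_id
        JOIN city_class_xref x ON x.city_id   = ci.city_id
        JOIN classes c         ON c.class_id  = x.class_id
        GROUP BY s.state_name, c.class_name
    """
    pivot = defaultdict(dict)
    for state, class_name, n in conn.execute(query):
        pivot[state][class_name] = n
    return dict(pivot)
```

Adding a row to classes and city_class_xref makes the new class show up in the result without touching the query.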

Ensuring that two column values are related in SQL Server

I'm using Microsoft SQL Server 2017 and was curious about how to constrain a specific relationship. I'm having a bit of trouble articulating so I'd prefer to share through an example.
Consider the following hypothetical database.
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+------+--------------+------------------+
Orders
+-----------------------------------+
| Id | CustomerId | AddressId |
+-----------------------------------+
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 1 | 3 | <--- Invalid Customer/Address Pair
+-----------------------------------+
Notice that the final Order links a customer to an address that isn't theirs. I'm looking for a way to prevent this.
(You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.)
From the related reading I was able to find, it seems that one method may be to enable a CHECK constraint targeting a User-Defined Function. This User-Defined Function would be something like the following:
WHERE EXISTS (SELECT 1 FROM Addresses WHERE Id = Order.AddressId AND CustomerId = Order.CustomerId)
While I imagine this would work, given the somewhat "generality" of the articles I was able to find, I don't feel entirely confident that this is my best option.
An alternative might be to remove the CustomerId column from the Addresses table entirely, and instead add another table with Id, CustomerId, AddressId. The Order would then reference this Id instead. Again, I don't love the idea of having to channel through an auxiliary table to get a Customer or Address.
Is there a cleaner way to do this? Or am I simply going about this all wrong?
Good question. However, at the root it seems you are struggling to create a foreign key constraint that references something that is not a key:
Orders.CustomerId -> Addresses.CustomerId
There is no simple built-in way to do this because it is normally not done. In ideal RDBMS practices you should strive to encapsulate data of specific types in their own tables only. In other words, try to avoid redundant data.
In the example above, the address ownership is stored redundantly in both the Addresses table and the Orders table, and because of this, additional checks are required to keep them synchronized. This can easily get out of hand with bigger datasets.
You mentioned:
However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
But that is why a relational database is relational. It does this so that distinct data can be kept distinct and referenced with relative IDs.
I think the best solution would be to simply drop this requirement.
In other words, just go with:
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+------+--------------+------------------+
Orders
+--------------------+
| Id | AddressId |
+--------------------+
| 1 | 1 |
| 2 | 3 |
| 3 | 3 | <--- Valid Order/Address Pair
+--------------------+
With that said, to accomplish your purpose exactly, you do have views available for this kind of thing:
create view CustomerOrders
as
select o.Id OrderId,
a.CustomerId,
o.AddressId
from Orders o
join Addresses a on a.Id = o.AddressId
I know this is a pretty trivial use-case for a view but I wanted to put in a plug for it because they are often neglected and come in handy with organizing big data sets. Using WITH SCHEMABINDING they can also be indexed for performance.
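To see the view idea in action without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module with the sample rows from the question. SQLite is only a stand-in here (it has no WITH SCHEMABINDING or indexed views), and the snake_case names are my own.

```python
import sqlite3

def build_orders_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            address     TEXT NOT NULL
        );
        -- Orders only store the address; the customer is derived from it
        CREATE TABLE orders (
            id         INTEGER PRIMARY KEY,
            address_id INTEGER NOT NULL REFERENCES addresses (id)
        );
        CREATE VIEW customer_orders AS
            SELECT o.id AS order_id, a.customer_id, o.address_id
            FROM orders o
            JOIN addresses a ON a.id = o.address_id;
        INSERT INTO addresses VALUES
            (1, 1, '105 Easy St'), (2, 1, '9 Gale Blvd'), (3, 2, '717 Fourth Ave');
        INSERT INTO orders VALUES (1, 1), (2, 3), (3, 3);
    """)
    return conn

def customers_by_order(conn):
    """Read the 'flattened' order rows straight from the view."""
    return conn.execute(
        "SELECT order_id, customer_id FROM customer_orders ORDER BY order_id"
    ).fetchall()
```

Queries against customer_orders get the customer without spelling out the join, and an invalid customer/address pair cannot be stored because the customer is never stored on the order at all.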
You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
If you face performance problems, the first step is to create or amend proper indexes. DBMSs are usually good at join operations (with proper indexes). Yes, denormalization can sometimes help in performance tuning, but it should be a last resort. If that route is taken, one should really know what one is doing and be very careful not to do more damage in the end than one has gained. I doubt that you are out of options here and really need to go down that path; you're likely barking up the wrong tree. Therefore I recommend you take the "normal", "sane" way: just drop customerid from orders and create proper indexes.
But if you really insist, you can try to make (id, customerid) a key in addresses (with a unique constraint) and then create a foreign key based on that.
ALTER TABLE addresses
ADD UNIQUE (id,
customerid);
ALTER TABLE orders
ADD FOREIGN KEY (addressid,
customerid)
REFERENCES addresses
(id,
customerid);
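The effect of this composite foreign key can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so the DDL is written inline in CREATE TABLE instead of via ALTER TABLE, and foreign key enforcement has to be switched on explicitly. The rows are the ones from the question.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE addresses (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            address     TEXT NOT NULL,
            UNIQUE (id, customer_id)          -- makes the pair a valid FK target
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            address_id  INTEGER NOT NULL,
            FOREIGN KEY (address_id, customer_id)
                REFERENCES addresses (id, customer_id)
        );
        INSERT INTO addresses VALUES
            (1, 1, '105 Easy St'), (2, 1, '9 Gale Blvd'), (3, 2, '717 Fourth Ave');
    """)
    return conn

def try_insert_order(conn, order_id, customer_id, address_id):
    """Return True if the order is accepted, False if the FK rejects it."""
    try:
        conn.execute(
            "INSERT INTO orders (id, customer_id, address_id) VALUES (?, ?, ?)",
            (order_id, customer_id, address_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Orders 1 and 2 from the question are accepted, while order 3 (customer 1 with address 3) is rejected because no address row has that (id, customer_id) pair.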

Single table column refers to multiple primary key

I need to store multiple values in a single column.
For example I am creating table which holds the user preferences
e.g.
| user_id | cities | countries |
|---------|------------|------------|
| 1 | 10, 11, 23 | 21, 34 |
Because I can't store them as an array (or would prefer not to, even where arrays are available, for maintenance and performance reasons and for better RDBMS design), I have to create a mapping table like this:
| user_id | type | reference_id |
|---------|---------|--------------|
| 1 | CITY | 10 |
| 1 | CITY | 11 |
| 1 | CITY | 23 |
| 1 | COUNTRY | 21 |
| 1 | COUNTRY | 34 |
The reference id in this column refers to the master tables like city, country, etc.
The problems I see here are:
I can't have an FK reference to the city or country table, because a single reference_id column may refer to a city or a country depending on the type.
As I can't have an FK, there is no guarantee against dirty data.
Is there any better approach?
Note:
I have given city/country as a sample, but I need around 20 columns which can have multiple values like city or country.
In future I may introduce some boolean preference like "whether you like to travel", so I might want to store type as "TRAVEL" and reference_id as 0 for yes, 1 for no, which definitely will not have any reference.
You could create a Location table {LocationId, LocationType (city/country)}.
Then, every time you add a new record to the City or Country table, add it to the Location table first, and then add it to the City (or Country) table as appropriate, with the same CityId (or CountryId) as was used as LocationId in the Location table.
Then create an FK between the preferences table and the Location table, and add a [zero or one]-to-one (0/1 - 1) FK relationship from the City and Country tables to the Location table. (Every record in the City and Country tables must be in the Location table, but not the other way around.)
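A rough sketch of this Location approach, using Python's built-in sqlite3 module as a stand-in for the real DBMS. All table, column and function names are illustrative assumptions, and only the city side is filled in. The sketch does not enforce that a city row points at a location of type 'CITY'; that would need an extra composite key.

```python
import sqlite3

def make_preferences_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE locations (
            location_id   INTEGER PRIMARY KEY,
            location_type TEXT NOT NULL CHECK (location_type IN ('CITY', 'COUNTRY'))
        );
        -- City and country ids are also location ids (0/1-to-1 with locations)
        CREATE TABLE cities (
            city_id INTEGER PRIMARY KEY REFERENCES locations (location_id),
            name    TEXT NOT NULL
        );
        CREATE TABLE countries (
            country_id INTEGER PRIMARY KEY REFERENCES locations (location_id),
            name       TEXT NOT NULL
        );
        CREATE TABLE user_preferences (
            user_id     INTEGER NOT NULL,
            location_id INTEGER NOT NULL REFERENCES locations (location_id),
            PRIMARY KEY (user_id, location_id)
        );
    """)
    return conn

def add_city(conn, name):
    """Insert into locations first, then reuse the generated id for the city row."""
    cur = conn.execute("INSERT INTO locations (location_type) VALUES ('CITY')")
    conn.execute("INSERT INTO cities (city_id, name) VALUES (?, ?)", (cur.lastrowid, name))
    return cur.lastrowid

def add_preference(conn, user_id, location_id):
    """Return True if stored, False if the id does not exist in locations."""
    try:
        conn.execute("INSERT INTO user_preferences VALUES (?, ?)", (user_id, location_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

The single FK on user_preferences now covers both cities and countries, so a preference pointing at a non-existent id is rejected.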
You're saying you want a table for generic data instead of 20 lookup tables enforcing referential integrity (RI)? On a large system, the data would be stored in multiple tables instead of using a delimiter to separate the values and then exploding them out in another table, which introduces the problem of enforcing RI. If you're storing values that are really generic, like code/description pairs, you just need a codeSetID field to identify which codes belong in which code sets.

Condensing Row Data into a View

I have data in my PeopleInfo table where there are some people that have multiple records that I am trying to combine together into one record for a view.
All of a person's data is almost the same except for the PlanId and PlanName. So:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | NULL |
| John | Doe | 123456789 | 2 | Plan B | Secondary | NULL | 5.00 |
I would like to have only one John Doe record in my view, looking like this:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | 5.00 |
Where the Primary status determines which PlanId and PlanName to show. Can anyone help me with this query?
-- Table variables are declared with @ (a # prefix is for temp tables made with CREATE TABLE)
declare @t table (FNAME varchar(10), LNAME varchar(10), SSN varchar(10), PLANID int, PLANNAME varchar(10), stat varchar(10), Price1 decimal(18,2), Price2 decimal(18,2))
insert into @t (FNAME, LNAME, SSN, PLANID, PLANNAME, stat, Price1, Price2)
values ('john', 'doe', '12345', 1, 'PlanA', 'primary', 9.00, NULL),
       ('john', 'doe', '12345', 1, 'PlanB', 'secondary', NULL, 8.00)
-- Note: MIN/MAX return the primary row's values here only because of how this sample data sorts
select
FNAME,
LNAME,
SSN,
MAX(PLANID) PLANID,
MIN(PLANNAME) PLANNAME,
MIN(stat) stat,
MIN(Price1) Price1,
MIN(Price2) Price2
from @t
GROUP BY FNAME, LNAME, SSN
(I can't yet add a comment, so have an answer.)
The only thing that troubles me here is that i am also determining which PlanId and PlanName since they are different and i want to show a specific one based off of the Status field of both records.
Then you don't even need GROUPing. It would be much simpler. Just SELECT WHERE 'Primary' = Status. Assuming that (A) there will always be a row with this Status for each user, and (B) you are happy to ignore all others.
P.S. If you will only be using Primary and Secondary Status values, you might want to change the column to a bit named something like isPrimaryPlan where 1 indicates true and 0 false. However, if you might bring in e.g. Bronze and Consolation Prize Plans later, then you'll need to retain a more variable datatype. Perhaps store the plans in a separate table and have an int FOREIGN KEY to it... I could go on!
OK, I'm back after having a sleep, which has improved my brain slightly,
First, let the record reflect that I don't like the database design here. The People and Plans should be separate tables, linked by foreign keys - via a 3rd table, e.g. PeoplePlans. That takes me to another point: the people here have no primary key (at least not that you have specified). So when writing the below, I had to pick the SSN, assuming that will always be present and unique.
Anyway, something like this should work, with the caveat that I'm not going to replicate the database structure to test it.
select
FirstName,
LastName,
SSN,
PlanId,
PlanName,
Status,
_ca._sum_Price1,
_ca._sum_Price2
from
PeopleInfo as _Primary
cross apply (
select
sum(Price1) as _sum_Price1,
sum(Price2) as _sum_Price2
from
PeopleInfo
where
_Primary.SSN = SSN
) as _ca
where
'Primary' = Status;
This SELECTs all People with Primary status in order to get you those rows. It then CROSS APPLYs their Primary and any other rows and takes the summed Prices.
Hopefully this makes sense. If not, you'll have to do some reading about CROSS APPLY, in addition to about good relational database design. ;-)
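Since I didn't test the query above, here is a small sketch that checks the idea against the sample rows from the question, using Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so the sketch swaps it for a join against a grouped subquery, which should give the same result for this case. The snake_case names are my own.

```python
import sqlite3

def build_people_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE people_info (
            first_name TEXT, last_name TEXT, ssn TEXT,
            plan_id INTEGER, plan_name TEXT, status TEXT,
            price1 REAL, price2 REAL
        )
    """)
    conn.executemany(
        "INSERT INTO people_info VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("John", "Doe", "123456789", 1, "Plan A", "Primary", 9.0, None),
            ("John", "Doe", "123456789", 2, "Plan B", "Secondary", None, 5.0),
        ],
    )
    return conn

def condensed_people(conn):
    """One row per person: plan columns from the Primary row, prices summed over all rows."""
    query = """
        SELECT p.first_name, p.last_name, p.ssn,
               p.plan_id, p.plan_name, p.status,
               t.sum_price1, t.sum_price2
        FROM people_info p
        JOIN (
            SELECT ssn, SUM(price1) AS sum_price1, SUM(price2) AS sum_price2
            FROM people_info
            GROUP BY ssn
        ) t ON t.ssn = p.ssn
        WHERE p.status = 'Primary'
    """
    return conn.execute(query).fetchall()
```

For the two John Doe rows this returns a single row with Plan A, Primary, 9.0 and 5.0, because SUM ignores the NULL prices.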

DB Data migration

I have a database table called A, and I have now created a new table called B and moved some columns of A into table B.
E.g. suppose the tables have the following columns:
Table A // The one that already exists
Id, Country, Age, Firstname, Middlename, Lastname
Table B // The new table I created
Id, Firstname, Middlename, Lastname
Now table A will look like this:
Table A // new table A after the modification
Id, Country, Age, Name
In this case, Name will map to table B.
So my problem is that I now need to maintain the reports which were generated before the table modifications, and my friend told me I need a data migration. May I know what a data migration is and how it works, please?
Thank you.
Update
I forgot to address the reporting issue raised by the OP (Thanks Mark Bannister). Here is a stab at how to deal with reporting.
In the beginning (before data migration) a report to generate the name, country and age of users would use the following SQL (more or less):
-- This query orders users by their Lastname
SELECT Lastname, Firstname, Age, Country FROM tableA order by Lastname;
The name related fields are no longer present in tableA post data migration. We will have to perform a join with tableB to get the information. The query now changes to:
SELECT b.Lastname, b.Firstname, a.Country, a.Age FROM tableA a, tableB b
WHERE a.name = b.id ORDER BY b.Lastname;
I don't know how exactly you generate your report but this is the essence of the changes you will have to make to get your reports working again.
Original Answer
Consider the situation when you had only one table (table A). A couple of rows in the table would look like this:
# Picture 1
# Table A
------------------------------------------------------
Id | Country | Age | Firstname | Middlename | Lastname
1 | US | 45 | John | Fuller | Doe
2 | UK | 32 | Jane | Margaret | Smith
After you add the second table (table B) the name related fields are moved from table A to table B. Table A will have a foreign key pointing to the table B corresponding to each row.
# Picture 2
# Table A
------------------------------------------------------
Id | Country | Age | Name
1 | US | 45 | 10
2 | UK | 32 | 11
# Table B
------------------------------------------------------
Id | Firstname | Middlename | Lastname
10 | John | Fuller | Doe
11 | Jane | Margaret | Smith
This is the final picture. The catch is that the data will not move from table A to table B on its own. Alas human intervention is required to accomplish this. If I were the said human I would follow the steps given below:
1. Create table B with columns Id, Firstname, Middlename and Lastname. You now have two tables A and B. A has all the existing data, B is empty.
2. Add a foreign key to table A. This FK will be called name and will reference the id field of table B.
3. For each row in table A create a new row in table B using the Firstname, Middlename and Lastname fields taken from table A.
4. After copying each row, update the name field of table A with the id of the newly created row in table B.
The database now looks like this:
# Table A
-------------------------------------------------------------
Id | Country | Age | Firstname | Middlename | Lastname | Name
1 | US | 45 | John | Fuller | Doe | 10
2 | UK | 32 | Jane | Margaret | Smith | 11
# Table B
------------------------------------------------------
Id | Firstname | Middlename | Lastname
10 | John | Fuller | Doe
11 | Jane | Margaret | Smith
Now you no longer need the Firstname, Middlename and Lastname columns in table A so you can drop them.
Voilà, you have performed a data migration!
The process I just described above is but a specific example of a data migration. You can accomplish it in a number of ways using a number of languages/tools. The choice of mechanism will vary from case to case.
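As one concrete illustration, the steps above can be sketched with Python's built-in sqlite3 module. This is only a sketch under assumptions: the table names table_a and table_b and the helper functions are made up, and the generated ids in table B will be 1 and 2 rather than the 10 and 11 used in the pictures. The final step of dropping the old name columns is left out, because support for dropping columns varies by DBMS and version.

```python
import sqlite3

def make_table_a():
    """Build the pre-migration table A with the two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE table_a (
            id INTEGER PRIMARY KEY, country TEXT, age INTEGER,
            firstname TEXT, middlename TEXT, lastname TEXT
        )
    """)
    conn.executemany(
        "INSERT INTO table_a VALUES (?, ?, ?, ?, ?, ?)",
        [(1, "US", 45, "John", "Fuller", "Doe"),
         (2, "UK", 32, "Jane", "Margaret", "Smith")],
    )
    return conn

def migrate_names(conn):
    # Step 1: create the empty table B
    conn.execute("""
        CREATE TABLE table_b (
            id INTEGER PRIMARY KEY, firstname TEXT, middlename TEXT, lastname TEXT
        )
    """)
    # Step 2: add the foreign key column "name" to table A
    conn.execute("ALTER TABLE table_a ADD COLUMN name INTEGER REFERENCES table_b (id)")
    # Steps 3 and 4: copy each row's name fields into B, then point A at the new row
    rows = conn.execute(
        "SELECT id, firstname, middlename, lastname FROM table_a"
    ).fetchall()
    for a_id, first, middle, last in rows:
        cur = conn.execute(
            "INSERT INTO table_b (firstname, middlename, lastname) VALUES (?, ?, ?)",
            (first, middle, last),
        )
        conn.execute("UPDATE table_a SET name = ? WHERE id = ?", (cur.lastrowid, a_id))
    conn.commit()

def report_after_migration(conn):
    """The report query rewritten to read the name fields from table B."""
    return conn.execute("""
        SELECT b.lastname, b.firstname, a.country, a.age
        FROM table_a a JOIN table_b b ON b.id = a.name
        ORDER BY b.lastname
    """).fetchall()
```

Running migrate_names once on the connection returned by make_table_a leaves every row of table A linked to its own row in table B, which is the state just before the old columns are dropped.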
Maintenance of the existing reports will depend on the tools used to write / generate those reports. In general:
Identify the existing reports that used table A. (Possibly by searching for files that have the name of table A inside them - however, if table A has a name [eg. Username] which is commonly used elsewhere in the system, this could return a lot of false positives.)
Identify which of those reports used the columns that have been removed from table A.
Amend the existing reports to return the moved columns from table B instead of table A.
A quick way to achieve this is to create a database view that mimics the old structure of table A, and amend the affected reports to use the database view instead of table A. However, this adds an extra layer of complexity into maintaining the reports (since developers may need to maintain the database view as well as the reports) and may be deprecated or even blocked by the DBAs - consequently, I would only recommend using this approach if a lot of existing reports are affected.
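A minimal sketch of such a compatibility view, again using Python's built-in sqlite3 module as a stand-in for the real DBMS. The names table_a, table_b and table_a_legacy are hypothetical; the data is the post-migration sample from the answer above.

```python
import sqlite3

def build_migrated_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_b (
            id INTEGER PRIMARY KEY, firstname TEXT, middlename TEXT, lastname TEXT
        );
        CREATE TABLE table_a (
            id INTEGER PRIMARY KEY, country TEXT, age INTEGER,
            name INTEGER REFERENCES table_b (id)
        );
        -- The view exposes the same columns that table A had before the migration
        CREATE VIEW table_a_legacy AS
            SELECT a.id, a.country, a.age, b.firstname, b.middlename, b.lastname
            FROM table_a a
            JOIN table_b b ON b.id = a.name;
        INSERT INTO table_b VALUES
            (10, 'John', 'Fuller', 'Doe'), (11, 'Jane', 'Margaret', 'Smith');
        INSERT INTO table_a VALUES (1, 'US', 45, 10), (2, 'UK', 32, 11);
    """)
    return conn

def legacy_report(conn):
    """The original report query, with only the table name swapped for the view."""
    return conn.execute(
        "SELECT lastname, firstname, age, country FROM table_a_legacy ORDER BY lastname"
    ).fetchall()
```

The only change an affected report needs is the table name, which is what makes this approach attractive when many reports are involved.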

Resources