Database Designing and Normalization issue

Database Designing and Normalization issue - sql-server

I have a huge access mdb file which contains a single table with 20-30 columns and over 50000 rows and
i have some thing like this
columns:
id desc name phone email fax ab bc zxy sd country state zip .....
1 a ab 12 fff 12 w 2 3 2 d sd 233
2 d ab 12 fff 12 s 2 3 1 d sd 233
here I have some column values related to addresses repeating is there a way to normalize the above table so that we can remove duplicates or repeating data.
Thanks in advance.

Here's a quick answer. You just need to move your address fields to a new table (remove dups) and add a FK back to your primary table.
Table 1 (People or whatever)
id desc name phone email fax ab bc zxy sd address_id
1 a ab 12 fff 12 w 2 3 2 1
2 d ab 12 fff 12 s 2 3 1 2
3 d ab 12 fff 12 s 2 3 1 2
4 d ab 12 fff 12 s 2 3 1 1
Table 2 (Address)
address_id country state zip .....
1 d sd 233
2 e ac 123

Jim W has a good start, but to normalize even further, make your redundant address elements into separate tables as well.
Create the tables for which address data is repeated (Country, State, etc.) Once you have your data tables, you'll want to add columns such as StateID, CountryID, etc. to the Address table.
You now have options for fixing the existing data. You can be quick and dirty and use Update statements to set all the newly created ID fields to point to the right data table.
UPDATE Addresses SET StateID=1 WHERE STATE='AL'
You can do this fairly quickly as a batch .sql file, but I'd recommend a more programmatic solution that rolls through the Address table and tries to match the current 'State' to an entry in the new States table. If found, the StateID on the Address table is updated with the id from the corresponding row in States.
You can then delete the old State field from the address table, as it is now normalized nice and neatly into a separate States table.
This process can be repeated for all redundant data elements. However, IMO db normalization can be taken too far. For example, if you have a commonly used query that, after normalization, requires 10 joins to accomplish, you may see a performance reduction. This doesn't appear to be the case here, as I think you're on the right track.
From a comment above:
#Lance i wanted something similar to that but here is the problem i have raw data coming in the form of single table and i need to refine and send it to two tables i can add address in table 2 but i m not undertanding how would you insert the address_id in table 1
You can retrieve the newly created ID from the address table using ##IDENTITY, and update the address_ID with this value.

Related

Generate String Value Table Automatically in EF Core

I have a table called Customer that has several columns called National Code and Name. It also has a number of other features called Contact Numbers and Recommenders, since the number of Contact Numbers and Recommenders is more than one, so you need some other table to store them.
Also suppose I have other tables like the Customer, each of which has a number of attributes greater than one.
What is your suggestion for storing these values?
In one source, it was suggested that for each table, a table called StringValue be used for storage. Does EF core have a way to implement StringValue without writing additional code?
Example:
Customer Table:
CustomerId Name NationalCode
------------------------------------------------------------------------
1 David xxxx
------------------------------------------------------------------------
StringValue Table:
StringId CustomerId StringName Value
------------------------------------------------------------------------
10 1 PhoneNumber 915245
11 1 PhoneNumber 985452
12 1 PhoneNumber 935446
13 1 Recommenders Mr Jhon
14 1 Recommenders Mr bb
------------------------------------------------------------------------

I think it is more intutive create a new table for the field which has more than one records, then configure a one-to-many relationship between the two tables. Take your case as an example, you can divide the customer table into three tables, they can be linked by foreignkey:
1.Customer Table:
CustomerId Name NationalCode
---------------------------------------------
1 David xxxx
2.Contact Table:
Id CustomerId PhoneNumber
---------------------------------------------
1 1 915245
2 1 985452
3 1 935446
3.Recommender Table:
Id CustomerId RecommenderName
---------------------------------------------
1 1 Mr Jhon
2 1 Mr bb

Editing MS SQL DB with 20 tables to 1 table without data lost

Hello I don't know how to do changes of my MS SQL DB to integrate it to work with new software.
The case is related with software limitations.
Our old software can write, read and work with MS SQL DB with multiple tables but the new software understand only from one table or one view table.
My question is how can I edit my MS SQL DB form 20 tables to do it to be one DB with one table with all data from 20 tables and columns without data lost?
And one last question is true about View Tables in MS SQL that they are read only for applications and software?

Why is it not a good idea to put data from all your 20 tables into one table ?
I will try to explain with an example, since I do not know your database I just think of some tables here
suppose you have a table Clients
ClientID Name Street City
1 John ChuchStreet Denver
2 Anna FlowerStreet Boston
and a table Products
ProductID Name Price
1 Mouse 10
2 Keyboard 30
3 Usb Cable 10
and table Orders
OrderID OrderNumber CLientID TotalAmount
1 123 1 10
2 345 1 20
3 678 2 30
and finally table OrderDetail
OrderDetailID OrderID ProductID Quantity
1 1 1 1
2 2 1 1
3 2 3 1
4 3 2 1
Now to put this into one table, you could do this
ID ClientName ClientStreet ClientCity OrderNumber TotalAmount ProductName ProductPrice ProductQuantity
1 John ChurchStreet Denver 123 10 Mouse 10 1
1 John ChurchStreet Denver 345 20 Mouse 10 1
2 John ChurchStreet Denver 345 20 Usb Cable 10 1
3 Anna FlowerStreet Boston 678 30 Keyboard 30 1
Now you can already see the redundancy,
you need to repeat the address of each customer, time and time again in your table
you need to repeat the ordernumber and total amount time and time again
you need to repeat the productname and price time and time again
Now suppose that John moves to another address, now you have to search for John in every row in the table, and adjust the address
Now suppose a productname changes, again you have to search all rows and update
That is lots of work, very inefficient, and guaranteed to go wrong at some point
Now I only used 4 tables in this example, can you image what will happen if you would merge 20 tables into 1 ?
And the redundancy is not your only problem, what if you want to look at a client, what row should you use ?
What if you want to look at an order, what row should you use ?
What if you want to look at a product, what row should you use ?
In this one table design, you cannot identify a single row for customer, or order anymore. That is because each row contains everyting, there is no distinct row anymore for a customer, or a product, or an order...
Merging all tables into one big table is simply not possible to maintain

Hi thank you for your answers!
I am trying to integrate old ms sql db to new IDFLOW Software.
This software understand from ms sql db but only from one table but my old db contains 18 tables....
IDFLOW understand from views, but it is not good to work with views, they are okay for card design in IDFLOW, but views are not okay to write new data in db from IDFLOW interface , because in views it is not possible to write new data in db!
And now I started to think to create one db with one table with all columns, and I completed it successfully!
Now in my sql server I am with two db, old db and new db.
Now I don't know how to export data from old db and import it to new db?
I am talking about export column2 from table1 from db1 and import it in column2 from table1 from db2 ....
and so and so ........... column2 from table2 from db1 and import it in column3 from table1 from db2 ............
and so and so .... column 5 from table18 from db1 and import it in column10 form table1 from db2.
Is it possible to do that?

Okay, but IDFLOW works only with one table...
And I asked Jolly and they say utilize your DB for one table.
IDFLOW is a software for ID CARDS .
I think it is possible to use one db with one table and all columns for employee information.
The goal is.. in IDFLOW enter all data for new employee, when you enter data from IDFLOW interface they go into MS SQL DB and all fields from DB are in card design configured, and when you select records for one employee from IDFLOW interface (from db) you can print ID CARD with all data for employee.
The question is how to migrate old records from old db in new.
It is not big database, it is only 6 GB from 2006.
And we use it only for printing id cards.

Sqlserver getting to much time fetch the details

I have table like below,
Id(Pk) User_Id C1 C2 C3 C4
1 111 2 a b c
2 111 5 d e f
3 111 7 a f ty
4 222 2 a b c
5 222 5 d e f
6 222 7 a f ty
This table almost having 10L records. And each user_id having almost 10k Records. If I am fetching details by User_id, it is getting almost 5 mins to get the details. Where i have to tune this?
Im using below query
Select * from User where user_id ='111'
And total number of columns in this table is around 130 columns.

Assuming you have a right index defined on (User_id) & just select the column which you actually want so, i would do with backed SSMS:
SET NOCOUNT ON
SELECT User_Id, C1, C2, . . .
FROM User
WHERE user_id = 111;
If, relative index is not present with user_id then you have to pay with poor performance.
Note : If user_id has numeric type, then you don't need to use ''.

If you PK on the Id column is a clustered index, change it to non-clustered and create a clustered index on the User_Id column.
This will most likely make SQL Server use a partial table scan, and as all requested records will be saved "close to eachother", this might reduce the number of page reads required to retrieve the values (which can be "grabbed" from the disk without further criteria checks).

Multiple fields in one sheet mapping to single field in another sheet

I need help designing the following architecture. Say Table 1 has a list of companies along with their information. Then Table 2 has a list of "primary" companies, followed by multiple fields denoting the companies that they do business with ("secondary" companies). So the first field in Table 2 is the "primary" company, and then there could follow up to 20 columns, each with a single entry of a company that does business with the "primary company". Each of those companies are listed in Table 1. I want to be able to tie the information about those "secondary companies" in with the information about them in Table 1. For example, I want to be able to see how many "secondary" companies in Table 2 are from California. I each company listed once in Table 1 along with headquarters locations; but each company in Table 2 is in a distinct column. I'm just getting confused about how to structure this, because when I try to make a query relationship between Table 1 and Table 2, I end up making a join between "Company" in Table 1 and every column in Table 2. Not good, right?
I have something like this...
Table 1
Company City State
A Los Angeles CA
B San Diego CA
C New York NY
.
.
.
Table 2
Primary Company Secondary1 Secondary2 Secondary3 Secondary4
A B C X Y
B A C Z W
C A B W X
Do I need another table for this? Should I just concatenate these 20+ secondary company fields into one column somehow? Any direction you could give would be very helpful.
Thank you!

I think what you want is this:
Table 1
Company City State
A Los Angeles CA
B San Diego CA
C New York NY
.
.
.
Table 2
Primary Company Secondary
A B
A C
A X
A Y
B A
B C
B Z
B W
...
In this case you will have a relationship between table1, column1 and table2 column 1 and 2. This is acceptable (though technically better if you use numbers instead of letters).

SQL delete rows based on date difference

The situation is quite complicated to express in the title. An example should be much easier to understand.
My table A:
uid id ticket created_date
001 1 movie 2015-01-23 08:23:16
002 25 TV 2012-01-13 12:02:20
003 1 movie 2015-02-01 07:15:36
004 1 movie 2014-02-15 15:38:40
What I need to achieve is to remove duplicate records that appear within 31 days between each other and retain the record that appear first. So the above table would be reduced to B:
uid id ticket created_date
001 1 movie 2015-01-23 08:23:16
002 25 TV 2012-01-13 12:02:20
004 1 movie 2014-02-15 15:38:40
because the 3rd row in A were within 31 days of row 1 and it appeared later than row 1 (2015-02-01 vs 2015-01-23), so it gets removed.
Is there a clean way to do this?

I would suggest the following approach:
SELECT A.uid AS uid
INTO #tempA
FROM A
LEFT JOIN A AS B
ON A.id=B.id AND A.ticket=B.ticket
WHERE DATEDIFF(SECOND,B.date,A.date) > 0 AND
DATEDIFF(SECOND,B.date,A.date) < 31*24*60*60;
DELETE FROM A WHERE uid IN (SELECT uid FROM #tempA);
This is assuming that by 'duplicate records' you mean records that have both identical id as well as identical ticket fields. If that's not the case you should adjust the ON clause accordingly.