DB Data migration - database

I have a database table called A. I have now created a new table called B and moved some of A's columns into it.
For example, suppose the tables have the following columns:
Table A // the one that already exists
Id, Country, Age, Firstname, Middlename, Lastname
Table B // the new table I created
Id, Firstname, Middlename, Lastname
Table A will now look like this:
Table A // table A after the modification
Id, Country, Age, Name
Here the Name column maps to table B.
My problem is that I now need to maintain the reports that were generated before the table modifications. A friend told me I need a data migration. What is a data migration, and how does it work?
Thank you.

Update
I forgot to address the reporting issue raised by the OP (Thanks Mark Bannister). Here is a stab at how to deal with reporting.
In the beginning (before data migration) a report to generate the name, country and age of users would use the following SQL (more or less):
-- This query orders users by their Lastname
SELECT Lastname, Firstname, Age, Country FROM tableA order by Lastname;
The name related fields are no longer present in tableA post data migration. We will have to perform a join with tableB to get the information. The query now changes to:
SELECT b.Lastname, b.Firstname, a.Age, a.Country FROM tableA a
JOIN tableB b ON a.name = b.id ORDER BY b.Lastname;
I don't know how exactly you generate your report but this is the essence of the changes you will have to make to get your reports working again.
Original Answer
Consider the situation when you had only one table (table A). A couple of rows in the table would look like this:
# Picture 1
# Table A
------------------------------------------------------
Id | Country | Age | Firstname | Middlename | Lastname
1 | US | 45 | John | Fuller | Doe
2 | UK | 32 | Jane | Margaret | Smith
After you add the second table (table B) the name related fields are moved from table A to table B. Table A will have a foreign key pointing to the table B corresponding to each row.
# Picture 2
# Table A
------------------------------------------------------
Id | Country | Age | Name
1 | US | 45 | 10
2 | UK | 32 | 11
# Table B
------------------------------------------------------
Id | Firstname | Middlename | Lastname
10 | John | Fuller | Doe
11 | Jane | Margaret | Smith
This is the final picture. The catch is that the data will not move from table A to table B on its own. Alas, human intervention is required to accomplish this. If I were the said human I would follow the steps given below:
1. Create table B with columns Id, Firstname, Middlename and Lastname. You now have two tables A and B. A has all the existing data, B is empty.
2. Add a foreign key to table A. This FK will be called name and will reference the id field of table B.
3. For each row in table A create a new row in table B using the Firstname, Middlename and Lastname fields taken from table A.
4. After copying each row, update the name field of table A with the id of the newly created row in table B.
The database now looks like this:
# Table A
-------------------------------------------------------------
Id | Country | Age | Firstname | Middlename | Lastname | Name
1 | US | 45 | John | Fuller | Doe | 10
2 | UK | 32 | Jane | Margaret | Smith | 11
# Table B
------------------------------------------------------
Id | Firstname | Middlename | Lastname
10 | John | Fuller | Doe
11 | Jane | Margaret | Smith
Now you no longer need the Firstname, Middlename and Lastname columns in table A so you can drop them.
Voilà, you have performed a data migration!
The process I just described above is but a specific example of a data migration. You can accomplish it in a number of ways using a number of languages/tools. The choice of mechanism will vary from case to case.
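The steps above can be sketched end to end. This is only an illustration, assuming SQLite driven from Python's standard sqlite3 module rather than whatever database the question actually uses; the table names tableA and tableB follow the queries in this answer, and tableA_new is a hypothetical helper name. The final step rebuilds table A instead of dropping columns in place, because SQLite versions before 3.35 have no ALTER TABLE ... DROP COLUMN; many other databases can drop the columns directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Starting point (Picture 1), followed by steps 1 and 2.
conn.executescript("""
CREATE TABLE tableA (
    Id INTEGER PRIMARY KEY, Country TEXT, Age INTEGER,
    Firstname TEXT, Middlename TEXT, Lastname TEXT
);
INSERT INTO tableA VALUES
    (1, 'US', 45, 'John', 'Fuller', 'Doe'),
    (2, 'UK', 32, 'Jane', 'Margaret', 'Smith');
-- Step 1: create the empty table B.
CREATE TABLE tableB (
    Id INTEGER PRIMARY KEY, Firstname TEXT, Middlename TEXT, Lastname TEXT
);
-- Step 2: add the foreign key column to table A.
ALTER TABLE tableA ADD COLUMN Name INTEGER REFERENCES tableB(Id);
""")

# Steps 3 and 4: copy each row's name fields into B, then point A at the new row.
rows = conn.execute(
    "SELECT Id, Firstname, Middlename, Lastname FROM tableA"
).fetchall()
for a_id, first, middle, last in rows:
    cur = conn.execute(
        "INSERT INTO tableB (Firstname, Middlename, Lastname) VALUES (?, ?, ?)",
        (first, middle, last),
    )
    conn.execute("UPDATE tableA SET Name = ? WHERE Id = ?", (cur.lastrowid, a_id))
conn.commit()

# Last step: remove the name columns from A (done here by rebuilding the table).
conn.executescript("""
CREATE TABLE tableA_new (
    Id INTEGER PRIMARY KEY, Country TEXT, Age INTEGER,
    Name INTEGER REFERENCES tableB(Id)
);
INSERT INTO tableA_new SELECT Id, Country, Age, Name FROM tableA;
DROP TABLE tableA;
ALTER TABLE tableA_new RENAME TO tableA;
""")

columns = [row[1] for row in conn.execute("PRAGMA table_info(tableA)")]
report = conn.execute(
    "SELECT b.Lastname, b.Firstname, a.Age, a.Country "
    "FROM tableA a JOIN tableB b ON a.Name = b.Id ORDER BY b.Lastname"
).fetchall()
print(columns)
print(report)
```

The ids generated in table B here start at 1 rather than 10 as in the pictures; only the link between the two tables matters. On a real system the whole migration would normally run inside a single transaction, after taking a backup.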

Maintenance of the existing reports will depend on the tools used to write / generate those reports. In general:
Identify the existing reports that used table A. (Possibly by searching for files that have the name of table A inside them - however, if table A has a name [eg. Username] which is commonly used elsewhere in the system, this could return a lot of false positives.)
Identify which of those reports used the columns that have been removed from table A.
Amend the existing reports to return the moved columns from table B instead of table A.
A quick way to achieve this is to create a database view that mimics the old structure of table A, and amend the affected reports to use the database view instead of table A. However, this adds an extra layer of complexity into maintaining the reports (since developers may need to maintain the database view as well as the reports) and may be deprecated or even blocked by the DBAs - consequently, I would only recommend using this approach if a lot of existing reports are affected.
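A minimal sketch of the view approach, assuming SQLite through Python's sqlite3 module and the post-migration layout from the question. The view name tableA_legacy is hypothetical; the point is only that a report written against the old column list can run unchanged once it reads from the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableB (
    Id INTEGER PRIMARY KEY, Firstname TEXT, Middlename TEXT, Lastname TEXT
);
CREATE TABLE tableA (
    Id INTEGER PRIMARY KEY, Country TEXT, Age INTEGER,
    Name INTEGER REFERENCES tableB(Id)
);
INSERT INTO tableB VALUES
    (10, 'John', 'Fuller', 'Doe'),
    (11, 'Jane', 'Margaret', 'Smith');
INSERT INTO tableA VALUES (1, 'US', 45, 10), (2, 'UK', 32, 11);

-- The view exposes the same columns table A had before the migration.
CREATE VIEW tableA_legacy AS
SELECT a.Id, a.Country, a.Age, b.Firstname, b.Middlename, b.Lastname
FROM tableA a JOIN tableB b ON a.Name = b.Id;
""")

# The old report query, with only the table name swapped for the view name.
old_report_sql = (
    "SELECT Lastname, Firstname, Age, Country FROM tableA_legacy ORDER BY Lastname"
)
legacy_rows = conn.execute(old_report_sql).fetchall()
print(legacy_rows)
```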

Related

SQL Consecutive Sequence Number gets messed up with ORDER BY

I am working on a Windows Forms application that accesses a SQL Server 2014 database. I have an EmployeeTable from which I retrieve data and display all the records in a DataGridView. This table has a SequenceID column, which increments from 1 up to the number of records in the table. It is not the same as AUTO INCREMENT: SequenceID is updated each time the table is modified, so it stays in numerical order no matter how many records are inserted or deleted. For example, if the data looks like
SequenceID | Name
1 | John
2 | Mary
3 | Robert
and Mary is removed, then the resulting table needs to look like
SequenceID | Name
1 | John
2 | Robert
In order to achieve this, I used the best answer by zombat from Update SQL with consecutive numbering, and it was working great until I used ORDER BY expression.
This EmployeeTable also has DateAdded column, containing the date when the record was inserted. I need to display all records ordered by this DateAdded column, with the oldest record shown at the top and the newest at the bottom in addition to the correct SequenceID order. However, it gets messed up when a record is deleted, and a new one is inserted.
If I insert 3 records like,
SequenceID | Name | DateAdded
1 | John | 9/25/2017
2 | Mary | 9/26/2017
3 | Robert | 9/27/2017
and remove Mary, it becomes
SequenceID | Name | DateAdded
1 | John | 9/25/2017
2 | Robert | 9/27/2017
and this is good so far. However, if I add another record Tommy on, say, 9/28/2017, which should be added at the bottom because it is the newest, it results in something like,
SequenceID | Name | DateAdded
1 | John | 9/25/2017
3 | Robert | 9/27/2017
2 | Tommy | 9/28/2017
The ORDER BY is working fine, but it messes up the SequenceID, and I am not sure why this is happening. All I am doing is,
SELECT *
FROM EmployeeTable
ORDER BY DateAdded
I tried placing zombat's SQL command both before and after this SQL command, but neither worked. It seems to me like when I delete a row, the row has an invisible spot, and a new record is inserted in there.
Is there any way to fix this so I can order the records by DateAdded and still have the SequenceID working correctly?
If you need the ID only for the GUI (presentation), you could use:
SELECT ROW_NUMBER() OVER(ORDER BY DateAdded) AS sequenceId, Name, DateAdded
FROM EmployeeTable
ORDER BY DateAdded;
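EDIT:
"I am trying to update the SequenceID, but it is not getting updated"
You should not try to renumber your table every time it changes. A stored sequence number goes stale after every insert or delete, so it makes more sense to compute it at query time with ROW_NUMBER().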
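To illustrate, here is a small sketch that replays the scenario from the question and numbers the rows at query time. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of SQL Server, and it stores the dates as ISO strings so that they sort correctly as text; both are assumptions of this example, not part of the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeTable (Name TEXT, DateAdded TEXT)")
conn.executemany(
    "INSERT INTO EmployeeTable VALUES (?, ?)",
    [("John", "2017-09-25"), ("Mary", "2017-09-26"), ("Robert", "2017-09-27")],
)

# Replay the question: remove Mary, then add Tommy a day later.
conn.execute("DELETE FROM EmployeeTable WHERE Name = 'Mary'")
conn.execute("INSERT INTO EmployeeTable VALUES ('Tommy', '2017-09-28')")

# No SequenceID column is stored; the number is derived from the date order.
numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY DateAdded) AS SequenceID, Name, DateAdded
    FROM EmployeeTable
    ORDER BY DateAdded
""").fetchall()
for row in numbered:
    print(row)
```

Because the number is derived on every read, there is nothing to update after an insert or delete.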

Single table column refers to multiple primary key

I need to store multiple values in a single column.
For example I am creating table which holds the user preferences
e.g.
| user_id | cities | countries |
|---------|------------|------------|
| 1 | 10, 11, 23 | 21, 34 |
Because I can't store them as an array (or prefer not to, even where arrays are available, for maintenance, performance, and better RDBMS design reasons), I have to create a mapping table like this:
| user_id | type | reference_id |
|---------|---------|--------------|
| 1 | CITY | 10 |
| 1 | CITY | 11 |
| 1 | CITY | 23 |
| 1 | COUNTRY | 21 |
| 1 | COUNTRY | 34 |
The reference_id in this table refers to the master tables, such as city and country.
The problems I see here are:
I can't have an FK reference to the city or country table, because a single reference_id column may refer to either one depending on the type
As I can't have an FK, there is no guarantee against dirty data
Is there any better approach?
Note:
I have given city/country as a sample, but I need around 20 columns that can hold multiple values like city or country
In future I may introduce some boolean preferences like "whether you like to travel", so I might want to store the type as "TRAVEL" and reference_id as 0 for yes, 1 for no; this definitely will not have any reference
You could create a Location table {LocationId, LocationType (city/country)}.
Then, every time you add a new record to the city or country table, add it to the Location table first, and then add it to the city (or country) table as appropriate, with the same cityId (or countryId) that was used as the LocationId in the Location table.
Then create an FK between the preferences table and the Location table, and add a [zero or one] to one (0/1 - 1) FK relationship from the City and Country tables to the Location table. (Every record in the City and Country tables must be in the Location table, but not the other way around.)
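A rough sketch of that layout, assuming SQLite through Python's sqlite3 module. All table and column names (location, city, country, user_preference) and the sample city and country names are hypothetical; the ids are taken from the question's example. The last insert shows the database rejecting a preference that points at a location that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE location (
    location_id   INTEGER PRIMARY KEY,
    location_type TEXT NOT NULL CHECK (location_type IN ('CITY', 'COUNTRY'))
);
-- city and country reuse the id that was first registered in location.
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY REFERENCES location(location_id),
    name    TEXT NOT NULL
);
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY REFERENCES location(location_id),
    name       TEXT NOT NULL
);
CREATE TABLE user_preference (
    user_id     INTEGER NOT NULL,
    location_id INTEGER NOT NULL REFERENCES location(location_id),
    PRIMARY KEY (user_id, location_id)
);
INSERT INTO location VALUES (10, 'CITY'), (21, 'COUNTRY');
INSERT INTO city VALUES (10, 'Sample City');
INSERT INTO country VALUES (21, 'Sample Country');
INSERT INTO user_preference VALUES (1, 10), (1, 21);
""")

# A preference that references an unknown location is refused by the FK.
try:
    conn.execute("INSERT INTO user_preference VALUES (1, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

pref_count = conn.execute("SELECT COUNT(*) FROM user_preference").fetchone()[0]
print(rejected, pref_count)
```

This sketch does not stop a city row from reusing an id that was registered as a COUNTRY; enforcing that needs an extra constraint, for example a composite key that includes the type.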
You're saying you want a table for generic data instead of 20 lookup tables enforcing RI? On a large system, the data would be stored in multiple tables instead of using a delimiter to separate the values and then exploding them out in another table, introducing the problem of enforcing RI. If you're storing values that are really generic, like code/description pairs, you just need a codeSetID field to identify which codes belong in which codesets.

Condensing Row Data into a View

I have data in my PeopleInfo table where there are some people that have multiple records that I am trying to combine together into one record for a view.
All of a person's data is almost the same across records, except for the PlanId and PlanName. So:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | NULL |
| John | Doe | 123456789 | 2 | Plan B | Secondary | NULL | 5.00 |
I would like to have only one John Doe record in my view, looking like this:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | 5.00 |
Where the Primary status determines which PlanId and PlanName to show. Can anyone help me with this query?
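-- Sample data in a table variable (T-SQL table variables are named with @, not #)
declare @t table (FNAME varchar(10), LNAME varchar(10), SSN varchar(10), PLANID int, PLANNAME varchar(10), stat varchar(10), Price1 decimal(18,2), Price2 decimal(18,2))
insert into @t (FNAME, LNAME, SSN, PLANID, PLANNAME, stat, Price1, Price2)
values ('john', 'doe', '12345', 1, 'PlanA', 'primary', 9.00, NULL),
       ('john', 'doe', '12345', 1, 'PlanB', 'secondary', NULL, 8.00)
-- Collapse each person to one row. Aggregates skip NULLs, so each price keeps its non-NULL value.
-- MIN/MAX pick PLANNAME and stat by sort order, not by which row is the primary one.
select
FNAME,
LNAME,
SSN,
MAX(PLANID) as PLANID,
MIN(PLANNAME) as PLANNAME,
MIN(stat) as stat,
MIN(Price1) as Price1,
MIN(Price2) as Price2
from @t
group by FNAME, LNAME, SSN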
(I can't yet add a comment, so have an answer.)
The only thing that troubles me here is that i am also determining which PlanId and PlanName since they are different and i want to show a specific one based off of the Status field of both records.
Then you don't even need GROUPing. It would be much simpler. Just SELECT WHERE 'Primary' = Status, assuming that (A) there will always be a row with this Status for each user, and (B) you are happy to ignore all others.
P.S. If you will only be using the Primary and Secondary Status values, you might want to change the column to a bit named something like isPrimaryPlan where 1 indicates true and 0 false. However, if you might bring in e.g. Bronze and Consolation Prize Plans later, then you'll need to retain a more variable datatype. Perhaps store the plans in a separate table and have an int FOREIGN KEY to it... I could go on!
OK, I'm back after having a sleep, which has improved my brain slightly.
First, let the record reflect that I don't like the database design here. The People and Plans should be separate tables, linked by foreign keys - via a 3rd table, e.g. PeoplePlans. That takes me to another point: the people here have no primary key (at least not that you have specified). So when writing the below, I had to pick the SSN, assuming that will always be present and unique.
Anyway, something like this should work, with the caveat that I'm not going to replicate the database structure to test it.
select
FirstName,
LastName,
SSN,
PlanId,
PlanName,
Status,
_ca._sum_Price1,
_ca._sum_Price2
from
PeopleInfo as _Primary
cross apply (
select
sum(Price1) as _sum_Price1,
sum(Price2) as _sum_Price2
from
PeopleInfo
where
_Primary.SSN = SSN
) as _ca
where
'Primary' = Status;
This SELECTs all People with Primary status in order to get you those rows. It then CROSS APPLYs their Primary and any other rows and takes the summed Prices.
Hopefully this makes sense. If not, you'll have to do some reading about CROSS APPLY, in addition to about good relational database design. ;-)
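For anyone who wants to try the idea without a SQL Server instance, here is a runnable sketch using SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so this version swaps in a plain join against a grouped subquery, which gives the same result for this case; it is not the query above verbatim. The data is the sample from the question, and the aliases p and t are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PeopleInfo (
        FirstName TEXT, LastName TEXT, SSN TEXT, PlanId INTEGER,
        PlanName TEXT, Status TEXT, Price1 REAL, Price2 REAL
    )
""")
conn.executemany(
    "INSERT INTO PeopleInfo VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("John", "Doe", "123456789", 1, "Plan A", "Primary", 9.00, None),
        ("John", "Doe", "123456789", 2, "Plan B", "Secondary", None, 5.00),
    ],
)

# Keep only the Primary row per person, and attach price totals
# computed over all of that person's rows (matched on SSN).
condensed = conn.execute("""
    SELECT p.FirstName, p.LastName, p.SSN, p.PlanId, p.PlanName, p.Status,
           t.sum_Price1, t.sum_Price2
    FROM PeopleInfo AS p
    JOIN (
        SELECT SSN, SUM(Price1) AS sum_Price1, SUM(Price2) AS sum_Price2
        FROM PeopleInfo
        GROUP BY SSN
    ) AS t ON t.SSN = p.SSN
    WHERE p.Status = 'Primary'
""").fetchall()
print(condensed)
```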

Best structure for this database?

The Problem: I want to design a website and I need a database for that, however I don't know which structure is better!
What will happen: Users will add some URIs to their favorites.
Possible structures:
Structure one:
TABLE "USERS":
=====================================================================
id | name | last_name | urls
1 | John | Smith | [google.com,stackoverflow.com,yahoo.com,...]
=====================================================================
Structure two:
TABLE "USERS":
=============================================
id | name | last_name
1 | John | Smith
2 | Joe | Roth
==============================================
TABLE "URLS":
==============================================
id | user_id | url
1 | 1 | google.com
2 | 1 | stackoverflow.com
3 | 2 | ask.com
4 | 1 | yahoo.com
5 | 2 | being.com
==============================================
Which structure is best? Thanks in advance!
The second schema (with a foreign key constraint on URLS.user_id) will be easier to manage. A select query will need to use a join and you'll be performing more inserts, but you won't need to perform string parsing to figure out what the urls are (which, with the first schema, you would need for every single select and update).
One of the tables in the production database at my current job has a schema similar to the first case, and it makes my entire department cringe and complain when they have to write code working with it. Yet the table remains, because fixing it would be a massive overhaul. (The table was created by a manager who no longer works here.) If you're creating your schema now, do it right, while you still can.
Absolutely create another table that stores the URL together with the user_id. Your first structure is not in first normal form, because the urls column holds a list of values.
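As a sketch of structure two, here it is built in SQLite through Python's sqlite3 module (an assumption of this example; the question does not name a database). The table and column names and the rows come from the question. Fetching one user's favorites is a join, with no string parsing involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT, last_name TEXT
);
CREATE TABLE urls (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    url     TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'John', 'Smith'), (2, 'Joe', 'Roth');
INSERT INTO urls VALUES
    (1, 1, 'google.com'),
    (2, 1, 'stackoverflow.com'),
    (3, 2, 'ask.com'),
    (4, 1, 'yahoo.com'),
    (5, 2, 'being.com');
""")

# All favorites for one user: a join on the foreign key.
john_urls = [
    row[0]
    for row in conn.execute(
        "SELECT u.url FROM urls u JOIN users s ON s.id = u.user_id "
        "WHERE s.name = 'John' ORDER BY u.id"
    )
]
print(john_urls)
```

Adding or removing a favorite is then a single-row INSERT or DELETE on urls.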

splitting table

I have the following table which holds data on customers and staff. Would it be beneficial if I split it into 2 separate tables: Persons and Address? Each single person can have only one address, phone and mobile. I have a separate table for orders.
My database is quite complex and I wonder if this would be useful for implementation.
Many thanks,
zan
_______________
Persons |
_______________|
PersonID |
FirstName |
LastName |
OrderName |
Email |
Telephone |
Mobile |
StreetAddress |
City |
RegionID FK |
Country |
PostCode |
TitleID FK |
PersonCat FK |
MailingList |
_______________|
Only split tables for normalization purposes: for example, if one person can have multiple addresses, or if only a fraction of people have an address (say, fewer than 90%), which would leave a lot of NULL values in the combined table.
If it's not for normalizing, don't split tables.

Resources