My original question can be found here, for which I've gotten some great answers, ideas, and tips.
As part of a feasibility and performance study, I've started converting my schemas to version my data using those ideas. In doing so, I've run into another problem.
In my original question, my example was simple, with no real relational references. To preserve the example from my previous question, I will now extract the 'Name' part into a table of its own.
So now, my data becomes:
Person
------------------------------------------------
ID UINT NOT NULL,
NameID UINT NOT NULL,
DOB DATE NOT NULL,
Email VARCHAR(100) NOT NULL
PersonAudit
------------------------------------------------
ID UINT NOT NULL,
NameID UINT NOT NULL,
DOB DATE NOT NULL,
Email VARCHAR(100) NOT NULL,
UserID UINT NOT NULL, -- Who
PersonID UINT NOT NULL, -- What
AffectedOn DATE NOT NULL, -- When
Comment VARCHAR(500) NOT NULL -- Why
Name
------------------------------------------------
ID UINT NOT NULL,
FirstName VARCHAR(200) NOT NULL,
LastName VARCHAR(200) NOT NULL,
NickName VARCHAR(200) NOT NULL
NameAudit
------------------------------------------------
ID UINT NOT NULL,
FirstName VARCHAR(200) NOT NULL,
LastName VARCHAR(200) NOT NULL,
NickName VARCHAR(200) NOT NULL,
UserID UINT NOT NULL, -- Who
NameID UINT NOT NULL, -- What
AffectedOn DATE NOT NULL, -- When
Comment VARCHAR(500) NOT NULL -- Why
In a GUI, we could see the following form:
ID : 89213483
First Name : Firsty
Last Name : Lasty
Nick Name : Nicky
Date of Birth : January 20th, 2005
Email Address : my.email@host.com
A change can be made:
1. Only to the 'name' part
2. Only to the 'person' part
3. To both the 'name' and 'person' parts
If '1' occurs, we copy the original record to NameAudit and update our Name record with the changes. Since the person reference to the name is still the same, no changes to Person or PersonAudit are required.
If '2' occurs, we copy the original record to PersonAudit and update the Person record with the changes. Since the name part has not changed, no changes to Name or NameAudit are required.
If '3' occurs, we update our database according to the two methods above.
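For case 1, a minimal sketch of the copy-then-update step might look like the following (generic SQL against the schema above; the @-prefixed parameters and the CURRENT_DATE call are assumptions, not part of the original design):
BEGIN TRANSACTION;

-- Preserve the current Name row in NameAudit before changing it.
INSERT INTO NameAudit (FirstName, LastName, NickName, UserID, NameID, AffectedOn, Comment)
SELECT n.FirstName, n.LastName, n.NickName, @UserID, n.ID, CURRENT_DATE, @Comment
FROM Name n
WHERE n.ID = @NameID;

-- Apply the change in place; Person.NameID is untouched.
UPDATE Name
SET NickName = @NewNickName
WHERE ID = @NameID;

COMMIT;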
If we were to make 100 changes to both the person and name parts, a problem appears when you later try to show a history of changes: every change shows the person with the latest version of the name, which is obviously wrong.
In order to fix this, it would seem that the NameID field in Person should reference the NameAudit record instead (but only when the Name has changed).
And it is this conditional logic that starts complicating things.
I would be curious to find out if anyone has had this kind of problem before with their database and what kind of solution was applied?
You should probably try to read about 'Temporal Database' handling. Two books you could look at are Darwen, Date and Lorentzos "Temporal Data and the Relational Model" and (at a radically different extreme) "Developing Time-Oriented Database Applications in SQL", Richard T. Snodgrass, Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999, 504+xxiii pages, ISBN 1-55860-436-7. That is out of print but available as PDF on his web site at cs.arizona.edu. You can also look for "Allen's Relations" for intervals - they may be helpful to you.
I assume that the DATE type in your database includes time (so you probably use Oracle). The SQL standard type would probably be TIMESTAMP with some number of fractional digits for sub-second resolution. If your DBMS does not include time with DATE, then you face a difficult problem deciding how to handle multiple changes in a single day.
What you need to show, presumably, is a history of the changes in either table, with the corresponding values from the other table that were in force at the time when the changes were made. You also need to decide whether what you are showing is the before or after image; presumably, again, the after image. That means that you will have a 'sequenced' query (Snodgrass's term), with columns like the following (a rough sketch of such a query appears after the list):
Start time -- When this set of values became valid
End time -- When this set of values became invalid
PersonID -- Person.ID (or PersonAudit.ID) for Person data
NameID -- Name.ID (or NameAudit.ID) for Name data
DOB -- Date of Birth recorded while data was valid
Email -- Email address recorded while data was valid
FirstName -- FirstName recorded while data was valid
LastName -- LastName recorded while data was valid
NickName -- NickName recorded while data was valid
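As a rough, hedged sketch only (none of these history tables exist in the question's schema yet): if each table's audit trail were first condensed into rows carrying explicit StartTime/EndTime validity columns, the sequenced query could pair them using Allen's 'overlaps' relation:
SELECT CASE WHEN p.StartTime > n.StartTime THEN p.StartTime ELSE n.StartTime END AS StartTime,
       CASE WHEN p.EndTime   < n.EndTime   THEN p.EndTime   ELSE n.EndTime   END AS EndTime,
       p.PersonID, n.NameID,
       p.DOB, p.Email,
       n.FirstName, n.LastName, n.NickName
FROM PersonHistory p
JOIN NameHistory n
    ON n.NameID = p.NameID
   AND p.StartTime < n.EndTime   -- the two validity intervals overlap
   AND n.StartTime < p.EndTime;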
I assume that once a Person.ID is established, it does not change; ditto for Name.ID. That means that they remain valid while the records do.
One of the hard parts in this is establishing the correct set of 'start time' and 'end time' values, since transitions could occur in either table (or both). I'm not even sure, at the moment, that you have all the data you need. When a new record is inserted, you don't capture the time it becomes valid (there is nothing in the XYZAudit table when you insert a new record, is there?).
There's a lot more could be said. Before going further, though, I'd rather have some feedback about some of the issues raised so far.
Some other SO questions that might help:
Data structure for non-overlapping ranges within a single dimension
Why do we need a temporal database
Determine whether two date ranges overlap
Best relational database representation of time-bound hierarchies
Since this answer was first written, there's another book published about another set of methods called 'Asserted Versioning' for handling temporal data. The book is 'Managing Time in Relational Databases: How to Design, Update and Query Temporal Data' by Tom Johnston and Randall Weiss. You can find their company at AssertedVersioning.com. Beware: there may be patent issues around the mechanism.
Also, the SQL 2011 standard (ISO/IEC 9075:2011, in a number of parts) has been published. It includes some temporal data support. You can find out more about that and other issues related to temporal data at TemporalData.com, which is more of a general information site rather than one with a particular product axe to grind.
Keep a single changes table with an autoincrement ID and make all your changes refer to that table.
Always copy the original record into the audit table.
To build a history, select all your changes and, for each one, show the latest audit value at or before that change.
Like this:
`Change`
1
2
3
4
5
6
`NameAudit`
1 - created as John Smith
5 - changed to James Smith
`PersonAudit`
1 - created as born on `01.01.1980` in `Seattle, WA`
2 - changed DOB to `01.01.1980`
3 - changed DOB to `02.01.1980`
4 - changed DOB to `02.01.1980`
6 - changed POB to `Washington, DC`
Then select:
SELECT c.id,
       na.*,
       pa.*
FROM change c
JOIN NameAudit na
    ON na.id = (SELECT MAX(id)
                FROM NameAudit
                WHERE id <= c.id)
JOIN PersonAudit pa
    ON pa.id = (SELECT MAX(id)
                FROM PersonAudit
                WHERE id <= c.id)
WHERE c.id BETWEEN 1 AND 6
Our software is a collection of Windows applications that connect to a SQL database. Currently all our client sites have their own server and SQL Server database, however I'm working on making our software work with Azure-hosted databases too.
I've hit one snag with it, and so far not found anything particularly helpful while Googling around.
The current SQL Server version includes a database auditing system I wrote, which does the following:-
The C# Applications include in the connection string information about which program and version it is, and which User is currently logged in.
Important tables have Update and Delete triggers, which send details of any changes to a Service Broker queue. (I don't log Inserts).
The Service Broker then goes through the queue, and records details of the change to a separate AuditLog table.
These details include:-
Table, PK of the row changed, Field, Old Value, New Value, whether it was an Update or Delete, date/time of change, UserID of the user logged in to our software, and which program and version made the change.
This all works very nicely, and I was hoping to keep the system as-is for the Azure version, but unfortunately SQL Azure does not have Service Broker.
So, I need to look for an alternative, which as I mentioned is proving troublesome.
There is Azure SQL Managed Instance, which does have Service Broker; however, it is way too expensive for us to even consider. Not one of our clients would pay that much per month.
Nothing else I've looked at seems to have everything I need, in particular logging the program, version and UserID. Note that this isn't the SQL login UserID, which will be the same for everyone; it's the ID from the Users table with which they log in to our software, and it is passed in the Connection String.
So, ideally I'd like something similar to what I have, just with something else in the place of the Service Broker:-
The C# Applications include in the connection string information about which program and version it is, and which User is currently logged in.
Important tables have Update and Delete triggers, which send details of any changes to an asynchronous queue of some sort.
Something then goes through the queue outside the normal program flow, and records details of the change to a separate AuditLog table.
The asynchronous queue and processing outside the normal program flow is important. Obviously I could very easily have the Update and Delete triggers do all the processing and add the records to the AuditLog table, in fact that was v1.0 of the system, but the problem there is that SQL will wait until the triggers have finished before returning to the C# program. This then causes the C# program to slow down considerably when multiple Updates or Deletes are happening.
I'd be happy to look into other logging systems instead of the above, however something which only records data changes without the extra information I pass, specifically program, version and UserID, won't be of any use to me. Our Users always want to know this information whenever they query something they think is an incorrect change.
So, any suggestions for an alternative to Service Broker for SQL Azure please? TIA!
Ok, looks like I have a potential solution: Temporal Tables
Temporal Tables work in Azure, and record a new row in a History table whenever something changes:-
CREATE TABLE dbo.LMSTemporalTest
(
[EmployeeID] INT NOT NULL PRIMARY KEY CLUSTERED
, [Name] NVARCHAR(100) NOT NULL
, [Position] NVARCHAR(100) NOT NULL
, [Department] NVARCHAR(100) NOT NULL
, [Address] NVARCHAR(1024) NOT NULL
, [AnnualSalary] DECIMAL (10,2) NOT NULL
, [UpdatedBy] UniqueIdentifier NOT NULL
, [UpdatedDate] DateTime NOT NULL
, [ValidFrom] DateTime2 (2) GENERATED ALWAYS AS ROW START HIDDEN
, [ValidTo] DateTime2 (2) GENERATED ALWAYS AS ROW END HIDDEN
, PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.LMSTemporalTestHistory));
GO
I can then insert a record into the table...
INSERT INTO LMSTemporalTest(EmployeeID,Name,Position,Department,Address,AnnualSalary, UpdatedBy, UpdatedDate)
VALUES(1, 'Bob', 'Builder', 'Fixers','Oops I forgot', 1, '0D7F5584-C79B-4044-87BD-034A770C4985', GetDate())
GO
Update the row...
UPDATE LMSTemporalTest SET
Address = 'Sunflower Valley, Bobsville',
UpdatedBy = '2C62290B-61A9-4B75-AACF-02B7A5EBFB80',
UpdatedDate = GetDate()
WHERE EmployeeID = 1
GO
Update the row again...
UPDATE LMSTemporalTest SET
AnnualSalary = 420.69,
UpdatedBy = '47F25135-35ED-4855-8050-046CD73E5A7D',
UpdatedDate = GetDate()
WHERE EmployeeID = 1
GO
And then check the results:-
SELECT * FROM LMSTemporalTest
GO
EmployeeID Name Position Department Address AnnualSalary UpdatedBy UpdatedDate
1 Bob Builder Fixers Sunflower Valley, Bobsville 420.69 47F25135-35ED-4855-8050-046CD73E5A7D 2019-07-01 16:20:00.230
Note: Because I set them as Hidden, the Valid From and Valid To don't show up
Check the changes for a date / time range:-
SELECT * FROM LMSTemporalTest
FOR SYSTEM_TIME BETWEEN '2019-Jul-01 14:00' AND '2019-Jul-01 17:10'
WHERE EmployeeID = 1
ORDER BY ValidFrom;
GO
EmployeeID Name Position Department Address AnnualSalary UpdatedBy UpdatedDate
1 Bob Builder Fixers Oops I forgot 1.00 0D7F5584-C79B-4044-87BD-034A770C4985 2019-07-01 16:20:00.163
1 Bob Builder Fixers Sunflower Valley, Bobsville 1.00 2C62290B-61A9-4B75-AACF-02B7A5EBFB80 2019-07-01 16:20:00.197
1 Bob Builder Fixers Sunflower Valley, Bobsville 420.69 47F25135-35ED-4855-8050-046CD73E5A7D 2019-07-01 16:20:00.230
And I can even view the History table
SELECT * FROM LMSTemporalTestHistory
GO
EmployeeID Name Position Department Address AnnualSalary UpdatedBy UpdatedDate ValidFrom ValidTo
1 Bob Builder Fixers Oops I forgot 1.00 0D7F5584-C79B-4044-87BD-034A770C4985 2019-07-01 16:20:00.163 2019-07-01 16:20:00.16 2019-07-01 16:20:00.19
1 Bob Builder Fixers Sunflower Valley, Bobsville 1.00 2C62290B-61A9-4B75-AACF-02B7A5EBFB80 2019-07-01 16:20:00.197 2019-07-01 16:20:00.19 2019-07-01 16:20:00.22
Note: the current row doesn't show up, as it's still Valid
All of our important tables have CreatedBy, CreatedDate, UpdatedBy and UpdatedDate already, so I can use those for the UserID logging. No obvious way of handling the Program and Version as standard, but I can always add another hidden field and use Triggers to set that.
EDIT: Actually tested it out
First hurdle was: can you actually change an existing table into a Temporal Table, and the answer was: yes!
ALTER TABLE Clients ADD
[ValidFrom] DateTime2 (2) GENERATED ALWAYS AS ROW START HIDDEN NOT NULL DEFAULT '1753-01-01 00:00:00.000',
[ValidTo] DateTime2 (2) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL DEFAULT '9999-12-31 23:59:59.997',
PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
GO
ALTER TABLE Clients SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ClientsHistory))
GO
An important bit above is the defaults on the ValidFrom and ValidTo fields. It only works if ValidTo is the maximum value a DateTime2 can be, hence '9999-12-31 23:59:59.997'. ValidFrom doesn't seem to matter, so I set that to the minimum just to cover everything.
Ok, so I've converted a table, but it now has two extra fields that the non-Azure table doesn't, which are theoretically hidden, but will our software complain about them?
Seems not. Fired up the software, edited a record on the Clients table and saved it, and the software didn't complain at all.
Checked the Clients and ClientsHistory tables:-
SELECT * FROM Clients
FOR SYSTEM_TIME BETWEEN '1753-01-01 00:00:00.000' AND '9999-12-31 23:59:59.997'
WHERE sCAccountNo = '0001064'
ORDER BY ValidFrom
Shows two records, the original and the edited one, and the existing UpdatedUser and UpdatedDate fields show correctly so I know who made the change and when.
SELECT * FROM ClientsHistory
Shows the original record, with ValidTo set to the date of the change.
All seems good, now I just need to check that it still only returns the current record in queries and to our software:-
SELECT * FROM Clients
WHERE sCAccountNo = '0001064'
Just returned the one record, and doesn't show the HIDDEN fields, ValidFrom and ValidTo.
Did a search in our software for Client 0001064, and again it just returned the one record, and didn't complain about the two extra fields.
Still need to set up a few Triggers and add another HIDDEN field to record the program and version from the Connection String, but it looks like Temporal Tables gives me a viable audit option.
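A hedged sketch of that last part, assuming the C# apps put something like 'Application Name=MyApp 1.2.3' in the connection string (which is what APP_NAME() returns) and that Clients has a ClientID key; note that the trigger's own UPDATE will write one extra row to the history table:
ALTER TABLE Clients ADD [UpdatedByApp] NVARCHAR(128) NULL
GO
CREATE TRIGGER trClients_RecordApp ON Clients
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Stamp the rows just updated with the program/version from the connection string.
    UPDATE c
    SET UpdatedByApp = APP_NAME()
    FROM Clients c
    INNER JOIN inserted i ON i.ClientID = c.ClientID;
END
GO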
The only downside so far is that it creates an entire row for each set of changes, meaning you have to compare a row with its predecessor to find out what changed, but I can write something to simplify that easily enough.
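For instance, a minimal sketch of that comparison using LAG() over the full history (T-SQL; column names are from the example table above):
SELECT EmployeeID, ValidFrom,
       LAG(Address)      OVER (PARTITION BY EmployeeID ORDER BY ValidFrom) AS PreviousAddress,
       Address,
       LAG(AnnualSalary) OVER (PARTITION BY EmployeeID ORDER BY ValidFrom) AS PreviousSalary,
       AnnualSalary
FROM LMSTemporalTest FOR SYSTEM_TIME ALL
ORDER BY EmployeeID, ValidFrom;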
I need to create a service, but I need help choosing the tools.
Imagine a service in which users create data that has value as history (e.g. transactions). Other users can see this data, but they need proof that the data is real and has not been falsified by the users or even by the service itself.
Example:
User A creates a record with the number 42
A couple of months pass
User B sees this record and wants to be sure that the service can't have replaced the number with any other value, e.g. 37
The service has a trust window of 24 hours: it may even change users' data created within the current day.
Question: which tools can help me achieve that?
I was thinking about publishing daily backups (or reports?) that any user can download. A hash would be calculated from each report and inserted into the next backup, creating a chain of hashes. If the service later changes something in the past, the hashes in this chain will no longer match. Of course, I'll create an open-source tool for easily diffing the data and checking that the chain is valid.
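A minimal sketch of that chain in SQL (MySQL syntax; the table and column names are hypothetical): each daily report stores the previous day's hash, so rewriting any past day breaks every later link.
CREATE TABLE DAILY_REPORT (
    REPORT_DATE DATE PRIMARY KEY,
    CONTENT     MEDIUMTEXT NOT NULL,
    PREV_HASH   CHAR(64) NOT NULL,  -- CHAIN_HASH of the previous day's report
    CHAIN_HASH  CHAR(64) NOT NULL   -- SHA2(CONCAT(PREV_HASH, CONTENT), 256)
);

-- Verification: recompute every link; any row returned means tampering.
SELECT r.REPORT_DATE
FROM DAILY_REPORT r
LEFT JOIN DAILY_REPORT prev
    ON prev.REPORT_DATE = r.REPORT_DATE - INTERVAL 1 DAY
WHERE r.CHAIN_HASH <> SHA2(CONCAT(r.PREV_HASH, r.CONTENT), 256)
   OR (prev.REPORT_DATE IS NOT NULL AND r.PREV_HASH <> prev.CHAIN_HASH);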
Point of trust: there is one thing I'm afraid of. The service could run many databases simultaneously and regenerate all backups, with all hashes, in one go (because the first backup contains no hash of a previous one). So, to cover that case too, I'm thinking of storing the hashes somewhere the service can't change at all: for example, in one of the existing blockchains (BTC, ETH, ...) from the service's official wallet, or maybe a DAG with some blockchain like IOTA?
What do you think of this point of trust?
Can I achieve my goal in some simpler way (without a blockchain)? If so, which one?
What are the bottlenecks in this logic?
There are two participating variables here:
the timestamp at which the record is created.
the data.
Solution premises:
Tamper-proof.
The data can be changed within the same GMT calendar day without violating the tamper-proof guarantee (this can be changed to a fixed window after creation).
RDBMS as the data store (can be changed to any NoSQL store with minor modifications; the idea remains the same).
Doesn't depend on any other mechanism which can be faulty or error-prone.
Single-query verification.
## Proposed solution
Create the data table:
CREATE TABLE TEST(
ID INT PRIMARY KEY AUTO_INCREMENT,
DATA VARCHAR(64) NOT NULL,
CREATED_AT DATETIME DEFAULT CURRENT_TIMESTAMP()
);
Create the checksum table, which monitors tampering:
CREATE TABLE SIGN(
ID INT PRIMARY KEY AUTO_INCREMENT,
DATA_ID INT NOT NULL,
SIGNATURE VARCHAR(128) NOT NULL,
CREATED_AT DATETIME DEFAULT CURRENT_TIMESTAMP(),
UPDATED_AT TIMESTAMP
);
Create a trigger on insert of data:
/** Trigger on insert */
DELIMITER //
CREATE TRIGGER sign_after_insert
AFTER INSERT
ON TEST FOR EACH ROW
BEGIN
-- INSERT VAL
INSERT INTO SIGN(DATA_ID, `SIGNATURE`) VALUES(
NEW.ID, MD5(CONCAT (NEW.DATA, DATE(NEW.CREATED_AT)))
);
END; //
DELIMITER ;
Create a trigger for updates of data:
-- UPDATE TRIGGER
DELIMITER //
CREATE TRIGGER SIGN_AFTER_UPDATE
AFTER UPDATE
ON TEST FOR EACH ROW
BEGIN
-- UPDATE VALS
IF (NEW.DATA <> OLD.DATA) AND (DATE(OLD.CREATED_AT) = CURRENT_DATE() ) THEN
UPDATE SIGN SET SIGNATURE=MD5(CONCAT(NEW.DATA, DATE(NEW.CREATED_AT))) WHERE DATA_ID=OLD.ID;
END IF;
END; //
DELIMITER ;
Test
Step 1: insert the data
INSERT INTO TEST(DATA) VALUES ('DATA2');
An MD5 of the data concatenated with the date on which it was created is recorded as the signature in the SIGN table.
Step 2: update the data
The signature only gets updated if the value changes on the SAME DAY the record was created.
UPDATE TEST SET DATA='DATA' WHERE ID =1;
Step 3: validate
You can always validate the data signature as follows:
SELECT MD5(CONCAT(T.DATA, DATE(T.CREATED_AT))) AS CHECKSUM, S.SIGNATURE
FROM TEST AS T
JOIN SIGN AS S ON S.DATA_ID = T.ID
WHERE S.ID = 1;
Output
| CHECKSUM | SIGNATURE |
| ------ | ------ |
| 2bba70178abdafc5915ba0b5061597fa | 2bba70178abdafc5915ba0b5061597fa |
I have a small problem with a SQL Server query.
I have a view over several base tables that contains duplicate values; so far no problem, as these duplicates are logical. But unfortunately I do not get the desired end result. I could do the work in the front end of my application, but I would prefer to do it on the server.
I will explain the principle:
I have 30 companies which each have an employee table.
My view is a union of the 30 employee tables.
Each employee has a unique serial number, and the number is the same across tables, so an employee named "John Doe" with ID number 'S0000021' can be hired by company A then transferred to company Q without any problems; he will retain the serial number 'S0000021'.
The difference between the data in the Employee tables of A and Q will be, in this example, the start (hire) and release (transfer) dates entered for company A and just the start date for company Q, so the view will have 2 lines for "John Doe".
The common fields are the following:
Serial Number (Identical in every employee table)
Social Security Number (Same in every employee table)
Start/Hire Date
Release/Transfer date (empty/null if the employee is current)
Name (Can change across companies if the person divorces)
First name
Maiden name
Last Name
Gender
Final Released
Company Code
The problem seems simple: I want to show only the latest information for each employee. But even with a GROUP BY, if the name or release date has changed, the employee is displayed twice.
I tried the two queries below, but they don't return what I want.
Both return results, but I always see duplicates, because the dates are never identical across companies and the name may change.
Sorry for this Google translation.
1 --
select
vue.matricule,
vue.numsecu,
vue.name,
vue.lastname,
vue.maidenname,
vue.secondname,
vue.genre,
vue.released,
vue.companycode
from
vue
group by
vue.matricule,
vue.numsecu,
vue.name,
vue.lastname,
vue.maidenname,
vue.secondname,
vue.genre,
vue.released,
vue.companycode
2 --
select
distinct(vue.matricule),
vue.numsecu,
vue.name,
vue.lastname,
vue.maidenname,
vue.secondname,
vue.genre,
vue.released,
vue.companycode
from
vue
I assumed the following:
there is a view (vue) that already gathers all data from each of the 30 companies
you are just looking for the latest record for each employee
If you need to also see a record for each name change we can change this.
--set up test data
declare @vue table (
matricule varchar(20),
numsecu varchar(20),
name varchar(20),
lastname varchar(20),
maidenname varchar(20),
secondname varchar(20),
genre varchar(20),
start datetime,
released datetime,
companycode varchar(20));
insert @vue values
('S0000021','123456789','John', 'Doe',null,null,'M','2015-01-01','2015-12-31','A'),
('S0000021','123456789','Johnny', 'Doe',null,null,'M','2016-01-01',null,'Q'), --new company, name change, currently employed
('S0000022','123456780','Jane', 'Doe',null,null,'M','2015-01-01','2015-12-31','A'),
('S0000022','123456780','Jane', 'Doe',null,null,'M','2016-01-01','2016-02-01','Q'); --new company, same name, terminated
select * from @vue order by matricule, start;
--get latest record for each employee
select *
from (--add row numbering
select *, row_number() over (partition by matricule order by start desc) row_num
from @vue
) vue2
where vue2.row_num = 1;
I am trying to get all the data from all tables in one DB.
I have looked around, but I haven't been able to find any solution that works for my current problem.
I made a C# program that creates a table for each day the program runs. The table name looks like tbl18_12_2015 for today's date (Danish date format).
Now, in order to make a yearly report, I would love to get ALL the data from all the tables in the DB that store these reports. I have no way of knowing how many tables there will be or what they are called, other than the format (tblDD_MM_YYYY).
I'm thinking of something like this (which obviously doesn't work):
SELECT * FROM DB_NAME.*
All the tables have the same columns, and one of them is a primary key, that auto increments.
Here is a table named tbl17_12_2015
ID PERSONID NAME PAYMENT TYPE RESULT TYPE
3 92545 TOM 20,5 A NULL NULL
4 92545 TOM 20,5 A NULL NULL
6 117681 LISA NULL NULL 207 R
Here is a table named tbl18_12_2015
ID PERSONID NAME PAYMENT TYPE RESULT TYPE
3 117681 LISA 30 A NULL NULL
4 53694 DAVID 78 A NULL NULL
6 58461 MICHELLE NULL NULL 207 R
What i would like to get is something like this(from all tables in the DB):
PERSONID NAME PAYMENT TYPE RESULT TYPE
92545 TOM 20,5 A NULL NULL
92545 TOM 20,5 A NULL NULL
117681 LISA NULL NULL 207 R
117681 LISA 30 A NULL NULL
53694 DAVID 78 A NULL NULL
58461 MICHELLE NULL NULL 207 R
I have tried some different queries, but none of them returned this, just a lot of info about the tables.
Thanks in advance, and happy holidays.
edit: corrected tbl18_12_2015 col 3 header to english rather than danish
Thanks to all those who tried to help me solve this question, but I can't get the UNION to work (most likely due to my skill set), so I decided to refactor my DB.
While you could store the table names in a database and use dynamic SQL to union them together, this is NOT a good idea and you shouldn't even consider it - STOP NOW!!!!!
What you need to do is create a new table with the same fields - and add an ID (auto-incrementing identity column) and a DateTime field. Then, instead of creating a new table for each day, just write your data to this table with the DateTime. Then, you can use the DateTime field to filter your results, whether you want something from a day, week, month, year, decade, etc. - and you don't need dynamic sql - and you don't have 10,000 database tables.
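A minimal sketch of that design (T-SQL; the table name and column types are assumptions based on the sample data in the question):
CREATE TABLE tblPayments (
    ID          INT IDENTITY(1,1) PRIMARY KEY,
    PERSONID    INT           NOT NULL,
    NAME        NVARCHAR(100) NOT NULL,
    PAYMENT     DECIMAL(10,2) NULL,
    PAYMENTTYPE CHAR(1)       NULL,
    RESULT      INT           NULL,
    RESULTTYPE  CHAR(1)       NULL,
    CreatedDate DATETIME      NOT NULL DEFAULT GETDATE()
);

-- The yearly report then becomes a simple date-range filter:
SELECT PERSONID, NAME, PAYMENT, PAYMENTTYPE, RESULT, RESULTTYPE
FROM tblPayments
WHERE CreatedDate >= '2015-01-01' AND CreatedDate < '2016-01-01';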
I know some people posted comments expressing the same sentiments, but, really, this should be an answer.
If you had all the tables in the same database, you would be able to use the UNION operator to combine all your tables.
Maybe you can do something like this to select all the table names from a given database.
For SQL Server:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_CATALOG='dbName'
For MySQL:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA='dbName'
Once you have the list of tables, you can move them all into one database and create your report using UNIONs.
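For example (a hedged sketch: the two TYPE columns are given distinct names here, PAYMENTTYPE and RESULTTYPE, since the real column names in the day tables aren't shown):
SELECT PERSONID, NAME, PAYMENT, PAYMENTTYPE, RESULT, RESULTTYPE FROM tbl17_12_2015
UNION ALL
SELECT PERSONID, NAME, PAYMENT, PAYMENTTYPE, RESULT, RESULTTYPE FROM tbl18_12_2015
-- ...one SELECT per day table...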
You will need to use a UNION between each select query.
Do not use *; always list the names of the columns you are bringing back.
If you want duplicates, then UNION ALL is what you want.
If you want unique records based on PERSONID but there are likely to be differences, then I would guess that an UPDATE_DATE column would be useful to determine which one to use; but what if records with the same PERSONID lived a life of their own on each side?
You'd need to determine business rules to find out which specific changes to keep and merge into the unique resulting record and you'd be on your own.
What is "Skyttenavn"? Is it Danish? If it is the same as "NAME", you'd want to alias that column as 'NAME' in the select query, although it's the order of the columns as listed that counts when determining what to unite.
You'd need a new auto-incremented ID as a unique primary key, by the way, if you are likely to have conflicting IDs. If you instead want to carry the old IDs into the new identity column, you'd set IDENTITY_INSERT ON for the copy and back to OFF afterwards to resume natural incrementation.
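A minimal sketch of that switch (T-SQL; tblCombined and the column names are assumptions, and this only works if the old IDs don't collide across tables - otherwise omit ID and let the identity column assign fresh values):
SET IDENTITY_INSERT tblCombined ON;

INSERT INTO tblCombined (ID, PERSONID, NAME, PAYMENT, PAYMENTTYPE, RESULT, RESULTTYPE)
SELECT ID, PERSONID, NAME, PAYMENT, PAYMENTTYPE, RESULT, RESULTTYPE
FROM tbl18_12_2015;

SET IDENTITY_INSERT tblCombined OFF;  -- new rows resume auto-numbering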
I'm creating an application that will work off a User table; the User table has many other columns like Email, Name, Address, etc. There is a verification stage: a User needs to be verified when they enter an email address. The page would display all the users who entered an email address but are not verified.
I've prepared three examples/approaches and am wondering which would be the most productive if there were also a need to log all the verifications, in case one is verified by mistake and the verification therefore has to be deleted. Also, let's say there is more than one type of verification, so there is one for the email address and another for the street address.
One thing to consider is that if the user adds an email address at a later date, it should automatically go into the verification queue, either because the query recognizes it directly (as in example 1) or because a pre-set value has to be created (like example 2).
I prefer making the application more lightweight (like example 1), but I don't want to compromise on performance, and I need some help or some reaffirmation on how to approach this.
FIRST EXAMPLE
Verification Queue Page would display the data:
SELECT * FROM User WHERE User.Email <> '' AND User.IsVerified = 0
The fields would look more like this:
//for handling verified
User.IsVerified bit = 0,
User.VerifiedDate datetime,
User.VerifiedNotes varchar(250),
User.VerifiedBy varchar(20)
SECOND EXAMPLE
First we define if it needs verification:
UPDATE User SET User.NeedsVerification = 1 WHERE User.Email <> ''
Second, the page would display the data:
SELECT * FROM User WHERE User.NeedsVerification = 1 AND User.IsVerified = 0
The fields would look more like this:
//for handling verified
User.NeedsVerification bit,
User.IsVerified bit,
User.VerifiedDate datetime,
User.VerifiedNotes varchar(250),
User.VerifiedBy varchar(20)
THIRD EXAMPLE
Verification Queue Page would display the data:
SELECT * FROM User INNER JOIN Verification ON User.ID = Verification.UserID WHERE User.Email <> '' AND Verification.ID IS NOT NULL
The fields would be in another table. I'm only considering this option because there may be more types of Verifications, and instead of a NeedsVerification field it becomes more of an 'if it even exists' sort of deal:
Verification.UserID int,
Verification.Type int,
Verification.IsVerified bit,
Verification.VerifiedDate datetime,
Verification.VerifiedNotes varchar(250),
Verification.VerifiedBy varchar(20)
I'd lean toward a collection of tables to record the various verifications, one for each type. Thus you would have EmailVerifications, BillToAddressVerifications, ShipToAddressVerifications, CreditVerifications, ..., e.g.:
EmailVerificationId int identity,
UserId int,
EmailAddress varchar(256),
ChangeDate datetime, -- When the user updated the EmailAddress.
VerifiedDate datetime null, -- IsVerified if this is not NULL.
VerifiedNotes varchar(250),
VerifiedBy varchar(20) -- Or AdminId?
This allows you to accommodate:
Multiple changes per user, e.g. several email addresses over the years.
Adding additional types of verifications, e.g. telephone number, as time goes on.
You can also use the history for things like sending notifications to both old and new addresses when a change is made.
Adding verifications that are applied according to business rules, e.g. someone signing up for a newsletter doesn't need a verified email address. A supplier does.
You could extend it to support "validated" and "verified" data. A validated email address doesn't cause bounces and is good enough for a newsletter, a verified email address has been confirmed to be correct.
For performance you could denormalize the data and keep an IsVerified summary bit in the user record and use triggers to maintain the value based on updates to the other tables.
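A minimal sketch of that trigger-maintained summary bit (T-SQL; it assumes the EmailVerifications layout above and User.ID as the key):
CREATE TRIGGER trEmailVerifications_SyncUser
ON EmailVerifications
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recompute the summary bit for every user touched by this statement.
    UPDATE u
    SET IsVerified = CASE WHEN EXISTS (SELECT 1
                                       FROM EmailVerifications v
                                       WHERE v.UserId = u.ID
                                         AND v.VerifiedDate IS NOT NULL)
                          THEN 1 ELSE 0 END
    FROM [User] u
    WHERE u.ID IN (SELECT UserId FROM inserted
                   UNION
                   SELECT UserId FROM deleted);
END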