Delete specific history in versioned tables? - database

I'm using MariaDB 10.3. Is there a way to delete history for a specific record? I have a situation where once I delete a record, I'm under contract to remove all records (including historical ones).
Consider the following table:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`email` varchar(255) DEFAULT NULL,
`start_trxid` bigint(20) unsigned GENERATED ALWAYS AS ROW START,
`end_trxid` bigint(20) unsigned GENERATED ALWAYS AS ROW END,
PRIMARY KEY (`id`,`end_trxid`),
PERIOD FOR SYSTEM_TIME (`start_trxid`, `end_trxid`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
And consider the following commands run against that table:
insert into users (name, email) values ('cory', 'name#corycollier.com'), ('bob', 'bob#gmail.com');
UPDATE users set name='Cory' where id=1;
UPDATE users set name='Cory Collier' where id=1;
UPDATE users set name='Cory C' where id=1;
UPDATE users set name='Cory' where id=1;
That leaves me with the following history:
select * from (select * from users FOR SYSTEM_TIME BETWEEN (NOW() - INTERVAL 1 DAY) and (NOW())) as history where id=1;
+----+--------------+-----------------------------+-------------+----------------------+
| id | name | email | start_trxid | end_trxid |
+----+--------------+-----------------------------+-------------+----------------------+
| 1 | cory | corycollier#corycollier.com | 697377 | 697384 |
| 1 | Cory Collier | corycollier#corycollier.com | 697384 | 697391 |
| 1 | Cory C | corycollier#corycollier.com | 697391 | 697394 |
| 1 | Cory | corycollier#corycollier.com | 697394 | 697401 |
I don't have a way to delete history for that user. I'd like to.

Related

Oracle 11g Insert into from another table that has duplicates

Ok so I have 2 new tables: Client and Contract. I'm gonna focus on the first one as they have the same structure. Client looks like:
+-----------+---------+
| client_id | name |
+-----------+---------+
| Value 1 | Value 2 |
+-----------+---------+
And created like this:
CREATE TABLE Client (
client_id varchar2(15) NOT NULL,
name varchar2(100) NOT NULL,
CONSTRAINT Client_pk PRIMARY KEY (client_id)
) ;
Also I have an old table: old_contracts looking like:
+------------+----------+------+
| contractid | clientid | name |
+------------+----------+------+
| con1 | cli1 | n1 |
| con2 | cli2 | n2 |
| con3 | cli2 | n2 |
| con4 | cli3 | n3 |
| con5 | cli3 | n3 |
+------------+----------+------+
Defined:
CREATE TABLE old_contracts(
contractid varchar2(15) NOT NULL,
clientid varchar2(15) NOT NULL,
name varchar2(100) NOT NULL
) ;
I want to take the data from old_contracts and insert it into Client.
This old_contracts table has rows with duplicate clientid (one client can have more than one contract) but I don't want to have duplicates on Client table so I am doing this:
INSERT INTO Client (
client_id,
name
) SELECT DISTINCT
clientid,
name
FROM old_contracts;
to not get duplicates. Anyway, I'm getting this error:
Error SQL: ORA-00001: unique constraint (USER.CLIENT_PK) violated
00001.00000 - "unique constraint (%s.%s) violated"
What's going on? I believe the DISTINCT keyword was going to do the thing.
I've also tried adding a WHERE NOT EXISTS clause as suggested in related posts (i.e. this one), but the result I'm getting is the same error.
Most likely, the name is not always the same for a given clientid.
Try this instead:
INSERT INTO Client (
client_id,
name
) SELECT clientid,
max(name)
FROM old_contracts
GROUP BY clientid;
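A minimal sketch of why DISTINCT is not enough here, using Python's sqlite3 module as a stand-in for Oracle. The trailing space in one of the cli2 names is an invented example of a name that differs for the same clientid; the sample data in the question does not show such a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (client_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE old_contracts (contractid TEXT, clientid TEXT, name TEXT);
    INSERT INTO old_contracts VALUES
        ('con1', 'cli1', 'n1'),
        ('con2', 'cli2', 'n2'),
        ('con3', 'cli2', 'n2 '),  -- hypothetical: same client, name with trailing space
        ('con4', 'cli3', 'n3');
""")

# DISTINCT compares the whole (clientid, name) pair, so cli2 appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT clientid, name FROM old_contracts"
).fetchall()

# Inserting those rows therefore violates the primary key on client_id.
try:
    conn.execute(
        "INSERT INTO client (client_id, name) "
        "SELECT DISTINCT clientid, name FROM old_contracts"
    )
    distinct_insert_failed = False
except sqlite3.IntegrityError:
    distinct_insert_failed = True

# GROUP BY clientid yields exactly one row per client, whatever the names are.
conn.execute(
    "INSERT INTO client (client_id, name) "
    "SELECT clientid, MAX(name) FROM old_contracts GROUP BY clientid"
)
client_ids = [row[0] for row in conn.execute(
    "SELECT client_id FROM client ORDER BY client_id"
)]
```

Running a query such as SELECT clientid, COUNT(DISTINCT name) FROM old_contracts GROUP BY clientid HAVING COUNT(DISTINCT name) > 1 against the real table should show which clients have conflicting names.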

TSQL procedural issue and advice

I am looking for advice on the best way to accomplish the following
I have a table in SQL Server that holds downloaded data from an external system. I need to use it to update another database. Some records will be inserts and others will be updates. There is a comment table and a main table to insert/update. Comments are linked by an ID created in the comments table and stored in a column of the main table record. (one to one relationship)
So insert into comment table and get a scope_identity return value and then use that as part of the insert statement for the main table.
The updates get the comment ID from the record in the main table and then update the comment table where necessary and also the main table where necessary
EG Table has 5 records
Get first record
If exists in database
get commentID column from comment table and update comment and main table
If not exists
insert into comment table and return comment ID and insert the record into the main table with that comment ID
get the next record
I'm struggling to figure out how to best do this in SQL Server. Can't find the right combination of cursor, while loops, stored procedure etc. Haven't done much by way of procedural work in SQL Server.
Any advice/help is greatly appreciated
Thanks
Habo. I appreciate the feedback. I do struggle to write a clear concise question. The linked page provides good advice. Hope this script below helps clarify.
Thanks again.
USE TEMPDB
--TABLE TO HOLD JOB RECORDS
create table tbl_jobs(
jobnumber varchar(16) primary key clustered,
jobdesc varchar(50),
commentID int
)
GO
INSERT INTO tbl_jobs VALUES ('Job1','Desc1', '1')
INSERT INTO tbl_jobs VALUES ('Job2','Desc2', '2')
INSERT INTO tbl_jobs VALUES ('Job3','Desc3', '3')
--TABLE TO HOLD JOB RECORD COMMENTS
create table tbl_jobComments(
commentID INT IDENTITY(1,1) NOT NULL,
comment text
)
GO
Insert into tbl_jobComments VALUES ('Comment1')
Insert into tbl_jobComments VALUES ('Comment2')
Insert into tbl_jobComments VALUES ('Comment3')
--TABLE TO HOLD RECORDS DOWNLOADED FROM EXTERNAL SYSTEM
create table tbl_updates(
jobnumber varchar(16) primary key clustered,
jobdesc varchar(50),
comment text
)
GO
INSERT INTO tbl_updates VALUES ('Job1','Desc1Modified', 'Comment1')
INSERT INTO tbl_updates VALUES ('Job2','Desc2', 'Comment2a')
INSERT INTO tbl_updates VALUES ('Job3','Desc3Modified', 'Comment3')
INSERT INTO tbl_updates VALUES ('Job4','Desc4', 'Comment4')
GO
--OUTPUT FROM tbl_Jobs
+-----------+---------+-----------+
| jobnumber | jobdesc | commentID |
+-----------+---------+-----------+
| Job1 | Desc1 | 1 |
| Job2 | Desc2 | 2 |
| Job3 | Desc3 | 3 |
+-----------+---------+-----------+
--OUTPUT FROM tbl_JobComments
+-----------+----------+
| commentID | comment |
+-----------+----------+
| 1 | Comment1 |
| 2 | Comment2 |
| 3 | Comment3 |
+-----------+----------+
--OUTPUT FROM tbl_updates
+-----------+---------------+-----------+
| jobnumber | jobdesc | comment |
+-----------+---------------+-----------+
| Job1 | Desc1Modified | Comment1 |
| Job2 | Desc2 | Comment2a |
| Job3 | Desc3Modified | Comment3 |
| Job4 | Desc4 | Comment4 |
+-----------+---------------+-----------+
--DESIRED RESULTS tbl_jobs
+-----------+-----------------+-----------+
| jobnumber | jobdesc | commentID |
+-----------+-----------------+-----------+
| Job1 | Desc1Modified | 1 |
| Job2 | Desc2 | 2 |
| Job3 | Desc3Modified | 3 |
| Job4 | Desc4 | 4 |
+-----------+-----------------+-----------+
--DESIRED RESULTS tbl_jobComments
+-----------+-----------+
| commentID | comment |
+-----------+-----------+
| 1 | Comment1 |
| 2 | Comment2a |
| 3 | Comment3 |
| 4 | Comment4 |
+-----------+-----------+
You can break this into 2 statements, an update and an insert query
(This assumes there is only 1 comment per ID)
UPDATE maintable
SET Comment=upd.comment
FROM maintable mt
JOIN updatestable upd
ON mt.id=upd.id
then insert what is missing:
INSERT INTO maintable (id,comment)
SELECT id, comment
FROM updatestable
WHERE id NOT IN (SELECT id FROM maintable)
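The two-step approach above can be sketched against the tables from the question. This is only an illustration in Python's sqlite3, not SQL Server: the updates are written as correlated subqueries, and new comment IDs are read from the cursor's lastrowid where T-SQL would use SCOPE_IDENTITY(). It assumes one comment per job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_jobComments (commentID INTEGER PRIMARY KEY AUTOINCREMENT, comment TEXT);
    CREATE TABLE tbl_jobs (jobnumber TEXT PRIMARY KEY, jobdesc TEXT, commentID INTEGER);
    CREATE TABLE tbl_updates (jobnumber TEXT PRIMARY KEY, jobdesc TEXT, comment TEXT);
    INSERT INTO tbl_jobComments (comment) VALUES ('Comment1'), ('Comment2'), ('Comment3');
    INSERT INTO tbl_jobs VALUES ('Job1', 'Desc1', 1), ('Job2', 'Desc2', 2), ('Job3', 'Desc3', 3);
    INSERT INTO tbl_updates VALUES
        ('Job1', 'Desc1Modified', 'Comment1'), ('Job2', 'Desc2', 'Comment2a'),
        ('Job3', 'Desc3Modified', 'Comment3'), ('Job4', 'Desc4', 'Comment4');
""")

# Step 1a: update the comment of every job that already exists.
conn.execute("""
    UPDATE tbl_jobComments
    SET comment = (SELECT u.comment
                   FROM tbl_updates u
                   JOIN tbl_jobs j ON j.jobnumber = u.jobnumber
                   WHERE j.commentID = tbl_jobComments.commentID)
    WHERE commentID IN (SELECT j.commentID
                        FROM tbl_jobs j
                        JOIN tbl_updates u ON u.jobnumber = j.jobnumber)
""")

# Step 1b: update the description of every job that already exists.
conn.execute("""
    UPDATE tbl_jobs
    SET jobdesc = (SELECT u.jobdesc FROM tbl_updates u
                   WHERE u.jobnumber = tbl_jobs.jobnumber)
    WHERE jobnumber IN (SELECT jobnumber FROM tbl_updates)
""")

# Step 2: insert what is missing, creating the comment first to obtain its ID.
new_rows = conn.execute("""
    SELECT jobnumber, jobdesc, comment FROM tbl_updates
    WHERE jobnumber NOT IN (SELECT jobnumber FROM tbl_jobs)
""").fetchall()
for jobnumber, jobdesc, comment in new_rows:
    cur = conn.execute("INSERT INTO tbl_jobComments (comment) VALUES (?)", (comment,))
    conn.execute("INSERT INTO tbl_jobs VALUES (?, ?, ?)", (jobnumber, jobdesc, cur.lastrowid))
conn.commit()

jobs = conn.execute("SELECT * FROM tbl_jobs ORDER BY jobnumber").fetchall()
comments = conn.execute("SELECT * FROM tbl_jobComments ORDER BY commentID").fetchall()
```

Only the rows that are genuinely new are handled one at a time; the updates stay set-based, which is the main point of splitting the work into an update and an insert.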

Database structure for storing personal skills

I need to design a database for storing a person's skills. A person can have none, one or several skills. What is a good way to store this so that skills are easy to modify and fast to search?
I have been thinking
1. use a bit array, each bit position represents a skill,
2. a relation table in which each row links a person to a skill
3. each skill as a field in the table of the person
Any other suggestion or what should I aim for?
First, we need a persons table (all code examples use MySQL syntax):
CREATE TABLE IF NOT EXISTS `persons` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(50) NOT NULL,
`last_name` varchar(50) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB Comment='Persons';
And pretend this is the data in the table:
|----|------------|-----------|
| id | first_name | last_name |
|----|------------|-----------|
| 1 | John | Doe |
| 2 | Benny | Hill |
| 3 | Linus | Torvalds |
| 4 | Donald | Knuth |
| .. | .......... | ......... |
|----|------------|-----------|
Then we need a skills table to hold all known skills:
CREATE TABLE IF NOT EXISTS `skills` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB Comment='Skills';
|----|---------------|
| id | name |
|----|---------------|
| 1 | Swimming |
| 2 | Pilot |
| 3 | Writing |
| 4 | Create kernel |
| 5 | Astronaut |
| .. | ............. |
|----|---------------|
Finally we need a table that associates a person with a skill:
CREATE TABLE IF NOT EXISTS `persons_skills` (
`person_id` int unsigned NOT NULL,
`skill_id` int unsigned NOT NULL,
PRIMARY KEY (`person_id`, `skill_id`),
KEY (`person_id`),
KEY (`skill_id`)
) ENGINE=InnoDB Comment='Skills held by every person';
ALTER TABLE `persons_skills`
ADD FOREIGN KEY (`person_id`) REFERENCES `persons` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
ADD FOREIGN KEY (`skill_id`) REFERENCES `skills` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
The primary key is defined so that no person can be associated with the same skill more than once, and both columns are foreign keys to their respective tables.
Assume the data below:
|-----------|----------|
| person_id | skill_id |
|-----------|----------|
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 4 |
| 4 | 2 |
| 4 | 3 |
| ......... | ........ |
|-----------|----------|
This data would indicate that John Doe, Benny Hill and Linus Torvalds all have the skill "Swimming". Benny Hill and Donald Knuth are both pilots. Linus Torvalds created a kernel. And Donald Knuth is a writer. None of the persons are an Astronaut...
It's a classic many-to-many relationship, so I would suggest a persons table, a skills table and a personToSkill table. Your other suggested solutions might be tempting at first, but they are both maintenance hell.
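As a rough, runnable illustration of the three-table design, here is a sketch using Python's sqlite3 in place of MySQL. The table and column names follow the answer above; the SQLite types and the foreign_keys pragma are adaptations for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE persons (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL
    );
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE persons_skills (
        person_id INTEGER NOT NULL REFERENCES persons (id) ON DELETE CASCADE,
        skill_id INTEGER NOT NULL REFERENCES skills (id) ON DELETE CASCADE,
        PRIMARY KEY (person_id, skill_id)
    );
    INSERT INTO persons VALUES
        (1, 'John', 'Doe'), (2, 'Benny', 'Hill'),
        (3, 'Linus', 'Torvalds'), (4, 'Donald', 'Knuth');
    INSERT INTO skills VALUES
        (1, 'Swimming'), (2, 'Pilot'), (3, 'Writing'),
        (4, 'Create kernel'), (5, 'Astronaut');
    INSERT INTO persons_skills VALUES
        (1, 1), (2, 1), (2, 2), (3, 1), (3, 4), (4, 2), (4, 3);
""")


def persons_with_skill(skill_name):
    """Return (first_name, last_name) of everyone holding the given skill."""
    return conn.execute("""
        SELECT p.first_name, p.last_name
        FROM persons p
        JOIN persons_skills ps ON ps.person_id = p.id
        JOIN skills s ON s.id = ps.skill_id
        WHERE s.name = ?
        ORDER BY p.id
    """, (skill_name,)).fetchall()


swimmers = persons_with_skill("Swimming")
pilots = persons_with_skill("Pilot")
astronauts = persons_with_skill("Astronaut")

# The composite primary key rejects a second (person, skill) pair.
try:
    conn.execute("INSERT INTO persons_skills VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Adding or removing a skill for a person is a single-row insert or delete in persons_skills, and adding a new kind of skill needs no schema change, which is what the bit array and column-per-skill options make hard.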

SQL Server : Insert Multiple Rows into Multiple Tables From Table Type Parameter

I'm trying to write a stored procedure which takes in a table type parameter and inserts into two tables at once.
I have an entity table which is a base table holding the id for various tables, below is the entity table and a sample Site table.
------ Entity Table ------------------------------------------
| Id | bigint | NOT NULL | IDENTITY(1,1) | PRIMARY KEY
| TypeId | tinyint | NOT NULL |
| Updated | datetime | NULL |
| Created | datetime | NOT NULL |
| IsActive | bit | NOT NULL |
------- Site Table ---------------------------------------
| EntityId | bigint | NOT NULL | PRIMARY KEY
| ProductTypeCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| SupplierCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| Name | nvarchar(128) | NOT NULL |
| Description | nvarchar(max) | NULL |
And here is my table type used to pass into the stored procedure
------- Site Table Type ----------------------------------
| EntityTypeId | tinyint | NOT NULL |
| ProductTypeCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| SupplierCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| Name | nvarchar(128) | NOT NULL |
| Description | nvarchar(max) | NULL |
The idea is that I will pass in a table type parameter into the stored procedure and insert multiple rows at once to save looping inserting one row at a time.
Here's what I have so far
CREATE PROCEDURE InsertSites
@Sites SiteTypeTable READONLY
AS
BEGIN
-- Insert into Entity & Site Tables here, using the Id from the Entity Table in the Site table
INSERT INTO Entity (TypeId, Updated, Created, IsActive)
OUTPUT [inserted].[Id], S.ProductTypeCode, S.SupplierCode, S.Name, S.Description
INTO Site
SELECT EntityTypeId, NULL, GETDATE(), 1
FROM @Sites S
END
I've read about using insert and output together but cannot get this to work. I've also read about merge but also cannot get this to work.
Any help or pointers you can give will be greatly appreciated.
Thanks
Neil
---- Edit ----
Could I do something like this? I'm not sure how to finish this off...
CREATE PROCEDURE InsertSites
@Sites SiteTypeTable READONLY
AS
BEGIN
-- First insert enough rows into Entity table, saving the inserted Ids to a table variable
DECLARE @InsertedOutput TABLE (EntityId bigint)
INSERT INTO Entity (TypeId, Updated, Created, IsActive)
OUTPUT [inserted].[id]
INTO @InsertedOutput
SELECT EntityTypeId, NULL, GETDATE(), 1
FROM @Sites S
-- Use the Ids in @InsertedOutput against the rows in @Sites to insert into Sites
END

How to implement Twitter retweet action in my database

I am implementing a web application similar to Twitter. I need to implement a 'retweet' action, and one tweet can be retweeted by one person multiple times.
I have a basic 'tweets' table that have columns for:
Tweets: tweet_id | tweet_text | tweet_date_created | tweet_user_id
(where tweet_id is primary key for tweets, tweet_text contains tweet text, tweet_date_created is the DateTime when tweet was created and tweet_user_id is the foreign key to users table and identifies user who has created the tweet)
Now I am wondering how should I implement the retweet action in my database.
Option 1
Should I create new join table, which would look like this:
Retweets: tweet_id | user_id | retweet_date_retweeted
(Where tweet_id is a foreign key to tweets table, user_id is a foreign key to users table and identifies user who has retweeted the tweet, retweet_date_retweeted is a DateTime which specifies when the retweet was done.)
pros: There will be no empty columns; when a user retweets, a new row is created in the retweets table.
cons: Querying process will be more difficult, it will need to join two tables and somehow sort the tweets by two dates (when tweet is not retweet, sort it by tweet_date_created, when tweet is retweet, sort it by retweet_date_retweeted).
Option 2
Or should I implement it in the tweets table as parent_id, it will then look like this:
Tweets: tweet_id | tweet_text | tweet_date_created | tweet_user_id | parent_id
(Where all the columns remain the same and parent_id is a foreign key to the same tweets table. When a tweet is created, parent_id remains empty. When a tweet is retweeted, parent_id contains the original tweet's id, tweet_user_id contains the user who performed the retweet, tweet_date_created contains the DateTime when the retweet was done, and tweet_text remains empty, because we will not let users change the original tweet when retweeting.)
pros: Querying process is much more elegant, as I do not have to join two tables.
cons: There will be empty cells every time a tweet is retweeted. So if I have 1 000 tweets in my database and each of them is retweeted 5 times, there will be 5 000 extra rows with an empty tweet_text in my tweets table.
Which is the most efficient way? Is it better to have empty cells or to have querying process more clean?
IMO option #1 would be better. The query to join the tweet and retweet tables would not be at all complex and could be done via a left or inner join, depending on whether you want to show all tweets or only tweets which were retweeted. And the join query should be performant as the table is narrow, the columns being joined are ints, and they will each have indices due to the FK constraints.
Another recommendation is not to label all your columns with tweet or retweet, those can be inferred from the table in which the data is stored, for example:
tweet
id
user_id
text
created_at
retweet
tweet_id
user_id
created_at
And sample joins:
# Return all tweets which have been retweeted
SELECT
count(*),
t.id
FROM
tweet AS t
INNER JOIN retweet AS rt ON rt.tweet_id = t.id
GROUP BY
t.id
# Return tweet and possible retweet data for a specific tweet
SELECT
t.id
FROM
tweet AS t
LEFT OUTER JOIN retweet AS rt ON rt.tweet_id = t.id
WHERE
t.id = :tweetId
-- Update per request --
The following is demonstrative only, representing why I would opt for option #1, there are no foreign keys nor are there any indices, you will have to add these yourself. But the results should demonstrate that the joins won't be too painful.
CREATE TABLE `tweet` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`value` varchar(255) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=utf8;
CREATE TABLE `retweet` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`tweet_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;
# Sample Rows
mysql> select * from tweet;
+----+---------+----------------+---------------------+
| id | user_id | value | created_at |
+----+---------+----------------+---------------------+
| 1 | 1 | User1 | Tweet1 | 2012-07-27 00:04:30 |
| 2 | 1 | User1 | Tweet2 | 2012-07-27 00:04:35 |
| 3 | 2 | User2 | Tweet1 | 2012-07-27 00:04:47 |
| 4 | 3 | User3 | Tweet1 | 2012-07-27 00:04:58 |
| 5 | 1 | User1 | Tweet3 | 2012-07-27 00:06:47 |
| 6 | 1 | User1 | Tweet4 | 2012-07-27 00:06:50 |
| 7 | 1 | User1 | Tweet5 | 2012-07-27 00:06:54 |
+----+---------+----------------+---------------------+
mysql> select * from retweet;
+----+----------+---------+---------------------+
| id | tweet_id | user_id | created_at |
+----+----------+---------+---------------------+
| 1 | 4 | 1 | 2012-07-27 00:06:37 |
| 2 | 3 | 1 | 2012-07-27 00:07:11 |
+----+----------+---------+---------------------+
# Query to pull all tweets for user_id = 1, including retweets and order from newest to oldest
select * from (
select t.* from tweet as t where user_id = 1
union
select t.* from tweet as t where t.id in (select tweet_id from retweet where user_id = 1))
a order by created_at desc;
mysql> select * from (select t.* from tweet as t where user_id = 1 union select t.* from tweet as t where t.id in (select tweet_id from retweet where user_id = 1)) a order by created_at desc;
+----+---------+----------------+---------------------+
| id | user_id | value | created_at |
+----+---------+----------------+---------------------+
| 7 | 1 | User1 | Tweet5 | 2012-07-27 00:06:54 |
| 6 | 1 | User1 | Tweet4 | 2012-07-27 00:06:50 |
| 5 | 1 | User1 | Tweet3 | 2012-07-27 00:06:47 |
| 4 | 3 | User3 | Tweet1 | 2012-07-27 00:04:58 |
| 3 | 2 | User2 | Tweet1 | 2012-07-27 00:04:47 |
| 2 | 1 | User1 | Tweet2 | 2012-07-27 00:04:35 |
| 1 | 1 | User1 | Tweet1 | 2012-07-27 00:04:30 |
+----+---------+----------------+---------------------+
Notice in the last set of results, that we were able to also include the retweets and display the retweet of #4 before the retweet of #3.
-- Update --
You can accomplish what you are asking for by changing the query a bit:
select * from (
select t.id, t.value, t.created_at from tweet as t where user_id = 1
union
select t.id, t.value, rt.created_at from tweet as t inner join retweet as rt on rt.tweet_id = t.id where rt.user_id = 1)
a order by created_at desc;
mysql> select * from (select t.id, t.value, t.created_at from tweet as t where user_id = 1 union select t.id, t.value, rt.created_at from tweet as t inner join retweet as rt on rt.tweet_id = t.id where rt.user_id = 1) a order by created_at desc;
+----+----------------+---------------------+
| id | value | created_at |
+----+----------------+---------------------+
| 3 | User2 | Tweet1 | 2012-07-27 00:07:11 |
| 7 | User1 | Tweet5 | 2012-07-27 00:06:54 |
| 6 | User1 | Tweet4 | 2012-07-27 00:06:50 |
| 5 | User1 | Tweet3 | 2012-07-27 00:06:47 |
| 4 | User3 | Tweet1 | 2012-07-27 00:06:37 |
| 2 | User1 | Tweet2 | 2012-07-27 00:04:35 |
| 1 | User1 | Tweet1 | 2012-07-27 00:04:30 |
+----+----------------+---------------------+
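For anyone who wants to reproduce the ordering above without a MySQL server, here is a small sketch that loads the same sample rows into SQLite through Python's sqlite3 and runs the same UNION query. Storing the timestamps as ISO-8601 text is an adaptation for SQLite; they sort correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweet (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        value TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE TABLE retweet (
        id INTEGER PRIMARY KEY,
        tweet_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        created_at TEXT NOT NULL
    );
    INSERT INTO tweet VALUES
        (1, 1, 'User1 | Tweet1', '2012-07-27 00:04:30'),
        (2, 1, 'User1 | Tweet2', '2012-07-27 00:04:35'),
        (3, 2, 'User2 | Tweet1', '2012-07-27 00:04:47'),
        (4, 3, 'User3 | Tweet1', '2012-07-27 00:04:58'),
        (5, 1, 'User1 | Tweet3', '2012-07-27 00:06:47'),
        (6, 1, 'User1 | Tweet4', '2012-07-27 00:06:50'),
        (7, 1, 'User1 | Tweet5', '2012-07-27 00:06:54');
    INSERT INTO retweet VALUES
        (1, 4, 1, '2012-07-27 00:06:37'),
        (2, 3, 1, '2012-07-27 00:07:11');
""")


def timeline(user_id):
    """Own tweets plus retweets, each retweet placed at the time it was retweeted."""
    return conn.execute("""
        SELECT * FROM (
            SELECT t.id, t.value, t.created_at
            FROM tweet AS t
            WHERE t.user_id = :uid
            UNION
            SELECT t.id, t.value, rt.created_at
            FROM tweet AS t
            INNER JOIN retweet AS rt ON rt.tweet_id = t.id
            WHERE rt.user_id = :uid
        ) a
        ORDER BY created_at DESC
    """, {"uid": user_id}).fetchall()


rows = timeline(1)
timeline_ids = [row[0] for row in rows]
```

The retweet of tweet 3 sorts first because its retweet timestamp, not the original creation time, is what the second branch of the UNION selects.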
I would choose option 2 with a slight modification. Column parent_id in tweets table should point to itself if it is not a retweet. Then, the querying will be extremely easy:
SELECT tm.Id, tm.UserId, tc.Text, tm.Created,
CASE WHEN tm.Id <> tc.Id THEN tc.UserId ELSE NULL END AS OriginalAsker
FROM tweet tm
LEFT JOIN tweet tc ON tm.ParentId = tc.Id
ORDER BY tm.Created DESC
(tc is parent table - the one with content.. it has tweet's text, original poster's Id, etc.)
The reason for introducing rule about pointing to itself if not retweet is that then it is easy to add more joins to original tweet. You just join a table with tc and don't care if it is retweet or not.
Not only is the query easy, it should also perform better than option 1, because sorting is done using only one physical column, which can be indexed.
The only drawback is that the DB will be a little bit larger.
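A minimal sketch of this self-referencing variant, run in SQLite through Python's sqlite3. The sample rows are invented for illustration, and the last column here returns the parent row's user id for retweets, which is my reading of what OriginalAsker is meant to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweet (
        Id INTEGER PRIMARY KEY,
        UserId INTEGER NOT NULL,
        Text TEXT,                       -- empty (NULL) for retweets
        Created TEXT NOT NULL,
        ParentId INTEGER NOT NULL REFERENCES tweet (Id)
    );
    -- Hypothetical sample data: originals point to themselves, retweets to the original.
    INSERT INTO tweet VALUES
        (1, 1, 'hello', '2012-07-27 00:04:30', 1),
        (2, 2, 'world', '2012-07-27 00:04:47', 2),
        (3, 1, NULL,    '2012-07-27 00:07:11', 2);
""")

# tm is the timeline row; tc is the row that carries the content.
feed = conn.execute("""
    SELECT tm.Id, tm.UserId, tc.Text, tm.Created,
           CASE WHEN tm.Id <> tc.Id THEN tc.UserId ELSE NULL END AS OriginalPoster
    FROM tweet tm
    LEFT JOIN tweet tc ON tm.ParentId = tc.Id
    ORDER BY tm.Created DESC
""").fetchall()
```

Because every row has a parent, the join never has to special-case originals, and the feed is ordered by the single Created column.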
