Choice of a primary key in SQL table - database

I want to make an SQL table to keep track of notes that are added/edited/deleted. I want to be able to display the state of each NOTEID at this moment in a table, display a log of changes for a selected note, and delete all notes marked with a given NOTEID.
create table [dbo].[NOTES] (
NOTEID [varchar](128) NOT NULL,
CREATEDBY [varchar](128) NOT NULL, /* is this redundant? */
TIMECREATED DATE NOT NULL, /* is this redundant? */
MODIFIEDBY [varchar](128) NOT NULL,
TIMEMODIFIED DATE NOT NULL,
NOTE [varchar](2000) NULL,
PRIMARY KEY ( /* undecided */ )
);
What is the natural way of making this table? Should I autogenerate the primary ID, or should I use (NOTEID, TIMEMODIFIED) as the primary key? What kind of foolproof protection should be added?
I would like to be able to display all notes in a "Note history" window. So, I should store note from 3 days ago, when it was created, note from 2 days ago and from today, when it was modified.
However, the "Notes" table will show the final state for each NOTEID. That is
SELECT NOTE FROM NOTES WHERE NOTEID = 'selected_note_id' AND date = latest -- (pseudocode: the row with the latest TIMEMODIFIED)

The best way is to create two tables.
CREATE TABLE NOTES (
NOTE_ID SERIAL PRIMARY KEY, -- autogenerated / autonumeric
CREATEDBY VARCHAR(128) NOT NULL, -- only appears once
TIMECREATED TIMESTAMP NOT NULL, -- only appears once
NOTE VARCHAR(2000)
);
CREATE TABLE NOTES_UPDATE (
NOTES_UPDATE_ID SERIAL PRIMARY KEY, -- autogenerated / autonumeric
NOTE_ID INT NOT NULL REFERENCES NOTES (NOTE_ID), -- foreign key to NOTES
MODIFIEDBY VARCHAR(128) NOT NULL,
TIMEMODIFIED TIMESTAMP NOT NULL,
NOTE VARCHAR(2000)
);
You can get your note updates with:
SELECT N.*, NU.*
FROM NOTES N
JOIN NOTES_UPDATE NU
ON N.NOTE_ID = NU.NOTE_ID
and to get the last update just add
ORDER BY NOTES_UPDATE_ID DESC
LIMIT 1 -- this is PostgreSQL syntax
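Putting it together for a single note, a full query might look like this (a sketch against the DDL above; the literal 42 is just an example ID):
SELECT NU.NOTE, NU.MODIFIEDBY, NU.TIMEMODIFIED
FROM NOTES_UPDATE NU
WHERE NU.NOTE_ID = 42 -- the selected note
ORDER BY NU.NOTES_UPDATE_ID DESC
LIMIT 1; -- PostgreSQL syntax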

SIMPLE ANSWER:
The PRIMARY KEY should be the value that uniquely identifies each row in your table. In your particular case, NOTEID should be your primary key.
ELABORATING:
It is important to remember that a PRIMARY KEY creates an index by default, which means that whenever you do a query similar to:
SELECT * FROM table WHERE NOTEID = something
The query will execute a lot faster than without an index (which is mostly relevant for bigger tables). The PRIMARY KEY is also forced to be unique, hence no two rows can have the same PRIMARY KEY.
A general rule is that you should have an INDEX on any value that is often used in the WHERE clause of a statement. If NOTEID is not the only value you will be using in the WHERE clause, consider creating more indexes.
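For example, if you also filter on TIMEMODIFIED regularly, a plain index on that column might look like this (a sketch; the index name is arbitrary):
CREATE INDEX IX_NOTES_TIMEMODIFIED ON NOTES (TIMEMODIFIED);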
HOWEVER, TREAD WITH CAUTION: indexes help speed up SELECT searches, but they make UPDATE and INSERT slower.

I think your current table design is fine, though you might want to make the NOTEID the primary key and auto increment it. I don't see the point of making (NOTEID, TIMEMODIFIED) a composite primary key because a given note ID should ideally only appear once in the table. If the modified time changes, the ID should remain the same.
If we treat notes like files on a computer, then there should be only one table (the file system) which stores them. If a given note gets modified, the timestamp changes to reflect this.

Related

Can someone explain this statement (foreign key) of a table

CREATE TABLE genres
(
genre_id INT GENERATED BY DEFAULT AS IDENTITY NOT NULL,
genre VARCHAR(255) NOT NULL,
parent_id INT NULL,
-- Will be thankful to you for explaining the 3 lines below
PRIMARY KEY (genre_id),
CONSTRAINT fk_parent
FOREIGN KEY(parent_id) REFERENCES genres(genre_id)
);
PRIMARY KEY (genre_id) - means that at most one row in the table GENRES can have a specific value in the GENRE_ID column. In other words, for every row in the table, the value in the GENRE_ID column will be unique. Additionally, the value in the column cannot be null, and that value serves to identify the row uniquely without needing any other value. The DDL shows that the database will generate the value for the GENRE_ID column by default.
The "CONSTRAINT fk_parent FOREIGN KEY(parent_id) REFERENCES genres(genre_id) )" means that the database-manger
will enforce that if column PARENT_ID is not null then the value in this column must be an existing value in a row in the GENRES table. Another way of thinking about this is that the database-manager is asked to maintain a parent-child relationship, so a genre may have sub-genres (i.e. a genre may have child genres). So the database manager would not let you specify that a particular genre was a sub-genre of a non-existent parent genre.
The database manager might also enforce the relationship is valid over time, for example it might prevent a delete or update action if that delete would produce orphan rows (i.e. child rows with no parent row) , or it may set such parent_id values to null, depending on the DDL and Db2-version/platform.
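To make the parent-child behaviour concrete, here is a small hypothetical illustration (the genre names and IDs are invented; explicit IDs are allowed because the column is GENERATED BY DEFAULT):
INSERT INTO genres (genre_id, genre, parent_id) VALUES (1, 'Rock', NULL); -- top-level genre
INSERT INTO genres (genre_id, genre, parent_id) VALUES (2, 'Indie Rock', 1); -- child of Rock
-- This would be rejected: there is no genre with genre_id 99.
-- INSERT INTO genres (genre_id, genre, parent_id) VALUES (3, 'Shoegaze', 99);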
Answer: Self-Referencing Foreign Key
https://www.red-gate.com/simple-talk/sql/t-sql-programming/questions-about-primary-and-foreign-keys-you-were-too-shy-to-ask/

How to do transaction.insert_or_update on secondary index and not the primary index?

I have a table in Google Cloud Spanner.
CREATE TABLE test_id (
Id STRING(MAX) NOT NULL,
KeyColumn STRING(MAX) NOT NULL,
parent_id INT64 NOT NULL,
Updated TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
) PRIMARY KEY (Id)
And, I am trying to perform transaction.insert_or_update through a python script.
For each row in a pandas dataframe, I am doing:
transaction.insert_or_update(
'test_id', columns=['Id','KeyColumn', 'parent_id', 'Updated'],
values=[(uuid.uuid4().hex, row["KeyColumn"], row["parent_id"], spanner.COMMIT_TIMESTAMP)],
)
What I want is that if the row["KeyColumn"] is already present in KeyColumn of the table, update its parent_id column, otherwise insert a new row in the Spanner table corresponding to that KeyColumn.
But since my primary key is Id, which is generated randomly by uuid.uuid4().hex, it inserts a new row every time.
If I understand you correctly, the following is the situation:
ID is the primary key of your table.
There is a unique index defined for the table on the column KeyColumn.
You want to insert_or_update a row using KeyColumn as the column that should be used to determine whether the row already exists.
That is unfortunately not possible. insert_or_update will always use the primary key of the table to determine whether the row exists. I can think of three possible solutions to this problem, but they all have their drawbacks:
You could change the table definition and make KeyColumn the primary key and set a unique index on the Id column. The problem with this is of course that any other code that depends on Id being the primary key also needs to change. It is also a rather cumbersome change, because Cloud Spanner does not allow you to change the primary key of a table, so you would have to create a copy of the test_id table and then drop the old table.
You could fetch the row from Cloud Spanner before updating it by reading it using the KeyColumn value that you have. The big problem with this is obviously performance. You will need to do a read for each row that you want to update.
You could use a DML statement (UPDATE test_id SET parent_id=@parent WHERE KeyColumn=@key) to execute the update, and check whether it actually updated a row by checking the returned update count. If it did not update anything, you could then execute the insert. This will obviously also be slower than an insert_or_update mutation.
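A sketch of that third option in DML form (the parameter names are illustrative; PENDING_COMMIT_TIMESTAMP() supplies the commit timestamp in Spanner DML):
-- Try the update first and check the returned update count in application code.
UPDATE test_id SET parent_id = @parent, Updated = PENDING_COMMIT_TIMESTAMP() WHERE KeyColumn = @key;
-- If the update count was 0, fall back to an insert:
INSERT INTO test_id (Id, KeyColumn, parent_id, Updated)
VALUES (@id, @key, @parent, PENDING_COMMIT_TIMESTAMP());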
There is a way to query Cloud Spanner with a specific index.
You should add something like this to your query: FROM test_id@{FORCE_INDEX=KeyColumnIndex}.
Even though this is the way to execute queries on secondary indexes and the answer for the question in the title, I do not know how much it can be applied in your use case.
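For completeness, a full query with the hint might look like this (assuming a secondary index named KeyColumnIndex exists on KeyColumn):
SELECT Id, KeyColumn, parent_id
FROM test_id@{FORCE_INDEX=KeyColumnIndex}
WHERE KeyColumn = @key;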

Building comment system for different types of entities

I'm building a comment system in PostgreSQL where I can comment on (and "like") different entities that I already have (such as products, articles, photos, and so on). For the moment, I came up with this:
(note: the foreign key between comment_board and product/article/photo is very loose here. ref_id is just storing the id, which is used in conjunction with the comment_board_type to determine which table it is)
Obviously, this doesn't seem like good data integrity. What can I do to give it better integrity? Also, I know every product/article/photo will need a comment_board. Could that mean I should add a comment_board_id to each product/article/photo entity?
I do recognize this SO solution, but it made me second-guess supertypes and the complexities of it: Database design - articles, blog posts, photos, stories
Any guidance is appreciated!
I ended up just pointing the comments directly to the product/photo/article rows. Here is what I came up with in total:
CREATE TABLE comment (
id SERIAL PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT (now()),
updated_at TIMESTAMP WITH TIME ZONE,
account_id INT NOT NULL REFERENCES account(id),
text VARCHAR NOT NULL,
-- commentable sections
product_id INT REFERENCES product(id),
photo_id INT REFERENCES photo(id),
article_id INT REFERENCES article(id),
-- constraint to make sure this comment appears in only one place
CONSTRAINT comment_entity_check CHECK(
(product_id IS NOT NULL)::INT
+
(photo_id IS NOT NULL)::INT
+
(article_id IS NOT NULL)::INT
= 1
)
);
CREATE TABLE comment_likes (
id SERIAL PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT (now()),
updated_at TIMESTAMP WITH TIME ZONE,
account_id INT NOT NULL REFERENCES account(id),
comment_id INT NOT NULL REFERENCES comment(id),
-- comments can only be liked once by an account.
UNIQUE(account_id, comment_id)
);
This makes it so that I have to do one less join to an intermediary table. Also, it lets me add a field and update the constraints easily.
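To illustrate the check constraint, a hypothetical example (assuming account 1, product 42 and photo 7 exist):
-- Succeeds: exactly one parent reference is set.
INSERT INTO comment (account_id, text, product_id) VALUES (1, 'Great product!', 42);
-- Rejected by comment_entity_check: two parent references are set.
INSERT INTO comment (account_id, text, product_id, photo_id) VALUES (1, 'Nice', 42, 7);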

Database design - composite key relationship issue

I had posted a similar question before, but this is more specific. Please have a look at the following diagram:
The explanation of this design is as follows:
Bakers produce many Products
The same Product can be produced by more than one Baker
Bakers change their pricing from time-to-time for certain (of their) Products
Orders can be created, but not necessarily finalised
The aim here is to allow the store manager to create an Order "Basket" based on whatever goods are required, and also allow the system being created to determine the best price at that time based on what Products are contained within the Order.
I therefore envisaged the ProductOrders table to initially hold the productID and associated orderID, whilst maintaining a null (undetermined) value for bakerID and pricingDate, as that would be determined and updated by the system, which would then constitute a finalised order.
Now that you have an idea of what I am trying to do, please advise me on how to best set these relationships up.
Thank you!
If I understand correctly, an unfinalised order is not yet assigned a baker / pricing (meaning when an order is placed, no baker has yet been selected to bake the product).
In which case, the order is probably placed against the Products Table and then "Finalized" against the BakersProducts table.
A solution could be to give ProductsOrders two separate product ID columns: one for the originally ordered product (non-nullable), say ProductId, and a second one as part of the foreign key to the assigned BakersProducts row, say ProductId2. In ProductsOrders, the composite foreign key columns BakerId, ProductId2 and PricingDate would then all be nullable, as they are only set once the order is finalised.
To remove this redundancy, you might also consider using a surrogate key instead of the composite key. This way BakersProducts would have a surrogate PK (e.g. BakerProductId) which would then be referenced as a nullable FK in ProductsOrders. This would also avoid the confusion with the direct FK in ProductsOrders to Products.ProductId (which, from above, is the original product line of the order).
HTH?
Edit:
CREATE TABLE dbo.BakersProducts
(
BakerProductId int identity(1,1) not null, -- New Surrogate PK here
BakerId int not null,
ProductId int not null,
PricingDate datetime not null,
Price money not null,
StockLevel bigint not null,
CONSTRAINT PK_BakerProducts PRIMARY KEY(BakerProductId),
CONSTRAINT FK_BakerProductsProducts FOREIGN KEY(ProductId) REFERENCES dbo.Products(ProductId),
CONSTRAINT FK_BakerProductsBaker FOREIGN KEY(BakerId) REFERENCES dbo.Bakers(BakerId),
CONSTRAINT U_BakerProductsPrice UNIQUE(BakerId, ProductId, PricingDate) -- unique constraint mimics the original PK for uniqueness ... could also use a unique index
)
CREATE TABLE dbo.ProductOrders
(
OrderId INT NOT NULL,
ProductId INT NOT NULL, -- This is the original Ordered Product set when order is created
BakerProductId INT NULL, -- This is nullable and gets set when Order is finalised with a baker
OrderQuantity BIGINT NOT NULL,
CONSTRAINT FK_ProductsOrdersBakersProducts FOREIGN KEY(BakerProductId) REFERENCES dbo.BakersProducts(BakerProductId)
-- .. other keys here
)
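Finalising an order line is then just a matter of setting the surrogate FK (a sketch with invented IDs):
UPDATE dbo.ProductOrders
SET BakerProductId = 17 -- the chosen BakersProducts row (baker, product, pricing date)
WHERE OrderId = 1001 AND ProductId = 5;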

SQL Server: how to constrain a table to contain a single row?

I want to store a single row in a configuration table for my application. I would like to enforce that this table can contain only one row.
What is the simplest way to enforce the single row constraint ?
You make sure one of the columns can only contain one value, and then make that the primary key (or apply a uniqueness constraint).
CREATE TABLE T1(
Lock char(1) not null,
/* Other columns */
constraint PK_T1 PRIMARY KEY (Lock),
constraint CK_T1_Locked CHECK (Lock='X')
)
I have a number of these tables in various databases, mostly for storing config. It's a lot nicer knowing that, if the config item should be an int, you'll only ever read an int from the DB.
I usually use Damien's approach, which has always worked great for me, but I also add one thing:
CREATE TABLE T1(
Lock char(1) not null DEFAULT 'X',
/* Other columns */
constraint PK_T1 PRIMARY KEY (Lock),
constraint CK_T1_Locked CHECK (Lock='X')
)
Adding the "DEFAULT 'X'", you will never have to deal with the Lock column, and won't have to remember which was the lock value when loading the table for the first time.
You may want to rethink this strategy. In similar situations, I've often found it invaluable to leave the old configuration rows lying around for historical information.
To do that, you add an extra column, creation_date_time (the date/time of insertion or update), and an insert or insert/update trigger which will populate it with the current date/time.
Then, in order to get your current configuration, you use something like:
select * from config_table order by creation_date_time desc fetch first row only
(depending on your DBMS flavour).
That way, you still get to maintain the history for recovery purposes (you can institute cleanup procedures if the table gets too big but this is unlikely) and you still get to work with the latest configuration.
You can implement an INSTEAD OF Trigger to enforce this type of business logic within the database.
The trigger can contain logic to check if a record already exists in the table and if so, ROLLBACK the Insert.
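A minimal sketch of such a trigger in SQL Server (the table name Config and the error message are illustrative):
CREATE TRIGGER TR_Config_SingleRow
ON Config
INSTEAD OF INSERT
AS
BEGIN
    IF EXISTS (SELECT 1 FROM Config)
    BEGIN
        RAISERROR('Config may only contain one row.', 16, 1);
        ROLLBACK TRANSACTION;
        RETURN;
    END;
    -- No row yet: let the insert through.
    INSERT INTO Config SELECT * FROM inserted;
END;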
Now, taking a step back to look at the bigger picture, I wonder if perhaps there is an alternative and more suitable way for you to store this information, perhaps in a configuration file or environment variable for example?
I know this is very old, but instead of thinking big, sometimes it's better to think small. Use an identity integer like this:
Create Table TableWhatever
(
keycol int primary key not null identity(1,1) check (keycol = 1),
Col2 varchar(7)
)
This way, each time you try to insert another row, the check constraint will raise an error, preventing the insert, since the identity primary key won't accept any value but 1.
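For example:
INSERT INTO TableWhatever (Col2) VALUES ('first');  -- succeeds, keycol = 1
INSERT INTO TableWhatever (Col2) VALUES ('second'); -- fails: keycol would be 2, violating the check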
Here's a solution I came up with for a lock-type table which can contain only one row, holding a Y or N (an application lock state, for example).
Create the table with one column. I put a check constraint on the one column so that only a Y or N can be put in it. (Or 1 or 0, or whatever)
Insert one row in the table, with the "normal" state (e.g. N means not locked)
Then create an INSERT trigger on the table that only has a SIGNAL (DB2) or RAISERROR (SQL Server) or RAISE_APPLICATION_ERROR (Oracle). This makes it so application code can update the table, but any INSERT fails.
DB2 example:
create table PRICE_LIST_LOCK
(
LOCKED_YN char(1) not null
constraint PRICE_LIST_LOCK_YN_CK check (LOCKED_YN in ('Y', 'N') )
);
--- do this insert when creating the table
insert into PRICE_LIST_LOCK
values ('N');
--- once there is one row in the table, create this trigger
CREATE TRIGGER ONLY_ONE_ROW_IN_PRICE_LIST_LOCK
NO CASCADE
BEFORE INSERT ON PRICE_LIST_LOCK
FOR EACH ROW
SIGNAL SQLSTATE '81000' -- arbitrary user-defined value
SET MESSAGE_TEXT='Only one row is allowed in this table';
Works for me.
I use a bit field as the primary key, named IsActive.
So there can be at most 2 rows, and the SQL to get the valid row is:
select * from Settings where IsActive = 1
if the table is named Settings.
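A sketch of such a table (SettingValue is just an illustrative config column):
CREATE TABLE Settings (
    IsActive BIT NOT NULL PRIMARY KEY, -- a bit holds only 0 or 1, so at most two rows
    SettingValue NVARCHAR(100) NULL    -- hypothetical config column
);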
The easiest way is to define the ID field as a computed column with the constant value 1 (or any other number), then create a unique index on ID.
CREATE TABLE [dbo].[SingleRowTable](
[ID] AS ((1)),
[Title] [varchar](50) NOT NULL,
CONSTRAINT [IX_SingleRowTable] UNIQUE NONCLUSTERED
(
[ID] ASC
)
) ON [PRIMARY]
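For example:
INSERT INTO dbo.SingleRowTable (Title) VALUES ('first');  -- ID computes to 1
INSERT INTO dbo.SingleRowTable (Title) VALUES ('second'); -- fails: duplicate ID violates IX_SingleRowTable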
You can write a trigger on the insert action on the table: whenever someone tries to insert a new row, the insert trigger logic removes the previous row, so only the newest row remains.
Old question, but how about using IDENTITY(MAX, 1) on a small column type? With tinyint IDENTITY(255, 1), the first insert takes the value 255, and any further insert overflows the column and fails.
CREATE TABLE [dbo].[Config](
[ID] [tinyint] IDENTITY(255,1) NOT NULL,
[Config1] [nvarchar](max) NOT NULL,
[Config2] [nvarchar](max) NOT NULL
)
Alternatively, you can guard the insert itself:
IF NOT EXISTS ( SELECT * FROM table )
BEGIN
    -- your insert statement here
END
You can also use a fixed value which stays the same after the first entry in the database. Example:
Student table:
Id: int
firstname: char
If the entry form always supplies the same value for the Id column, the primary key constraint will reject every insert after the first, without needing a lock column, so the table holds only one row forever.
Hope this helps!
