I want to save some data from an external source that has no ID other than the data itself (e.g. Red, Used, Volvo, etc. from the eBay API).
So I have set up my tables like so:
aspectHeaders
id INT AUTO_INCREMENT PRIMARY KEY
name varchar(30)
UNIQUE KEY `NAME`(`name`)
aspects
id INT AUTO_INCREMENT PRIMARY KEY
name varchar(30)
aspectHeader_id INT
UNIQUE KEY `DETAIL` (`aspectHeader_id`,`name`)
aspectHeaders would contain:
7 Manufacturer
and aspects would contain:
1 Volvo 7
So in two stages I could check for the existence of any given piece of data in either table and then insert it if it doesn't exist. But my question is: can I do it in one stage? That is, is there code that checks whether the data exists, inserts it if not, and returns the id either way?
Hope this is verbose enough.
Thanks
If by "code" you mean a single MySQL statement, I don't think there is. Or at least I don't think there is without making an overly-complicated multi-query query.
Just make an alreadyExists() method (or something) and check before insert - something like:
if (!$this->MyModel->alreadyExists($data)) {
    // do insert here
}
I usually use a method like touch() for this that does all the magic internally.
// always returns the id (or false on failure)
// (sketched against CakePHP's Model API, to match $this->MyModel above;
//  assumes $data is array('name' => ...))
public function touch($data) {
    // check if it already exists (field() returns false when no row matches)
    $id = $this->field('id', array('name' => $data['name']));
    if ($id !== false) {
        // return the existing id and do nothing
        return $id;
    }
    // create() and save(); return the new id, or false if the data was invalid
    $this->create();
    if ($this->save($data)) {
        return $this->id;
    }
    return false;
}
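For reference, here is the check-then-insert that such a touch() method would run internally, sketched directly in MySQL against the aspectHeaders table from the question (the literal 'Manufacturer' just stands in for whatever value is being saved):
-- Step 1: look for an existing row.
SELECT id FROM aspectHeaders WHERE name = 'Manufacturer';
-- Step 2: if step 1 returned nothing, insert and read back the new id.
INSERT INTO aspectHeaders (name) VALUES ('Manufacturer');
SELECT LAST_INSERT_ID();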
Related
I am building an API with a table (I am deliberately simplifying the schema below to only focus on what is questionable) that looks like this:
CREATE TABLE IF NOT EXISTS some_table
(
id INTEGER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY,
user_ids TEXT[] NOT NULL,
field_1 TEXT NOT NULL,
field_2 TEXT NOT NULL,
field_3 TEXT NOT NULL,
hash_id TEXT GENERATED ALWAYS AS ( MD5(field_1 || field_2 || field_3)) STORED UNIQUE NOT NULL
)
The API is a bit trickier than a conventional CRUD in that:
(1) Inserting into the table depends on whether md5(field_1||field_2||field_3) already exists. If it does, I need to append the value user_id to the array field user_ids. Else, insert the row.
(2) Deleting a row also depends on the state of user_ids. Actually, my current implementation makes the database handle deletions in that there is a trigger that acts on updates and deletes rows whenever cardinality(user_ids) = 0.
CREATE OR REPLACE FUNCTION delete_row() RETURNS trigger AS
$$
BEGIN
IF tg_op = 'UPDATE' THEN
DELETE FROM some_table WHERE id = NEW.id;
RETURN NULL;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER some_table_delete_row
AFTER UPDATE
ON some_table
FOR EACH ROW
WHEN (CARDINALITY(NEW.user_ids) = 0)
EXECUTE PROCEDURE delete_row();
As you can see, there is no traditional deleting. What really happens is that items are removed from user_ids until the length of the array is 0, and then the database autoremoves the row.
I think that the PUT method is the best match for how I want to implement upserts.
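Something like this single statement is what I have in mind for the insert-or-append step (a sketch; $1 is the incoming user_id, $2-$4 are the three fields, and I'm assuming a Postgres version that supports ON CONFLICT against the unique index on the stored generated column):
-- Insert a new row, or append the user to the existing one.
INSERT INTO some_table (user_ids, field_1, field_2, field_3)
VALUES (ARRAY[$1], $2, $3, $4)
ON CONFLICT (hash_id) DO UPDATE
SET user_ids = array_append(some_table.user_ids, $1)
WHERE NOT some_table.user_ids @> ARRAY[$1]; -- no-op if the user is already listed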
It's trickier with DELETE/PUT for decrementing user_ids. Ideally, that looks like PATCH, in that only one field is modified at a time and nothing is allowed to be deleted manually.
Using an auto-generated hash_id value is convenient. That said, I am not sure whether it's the best option when I think about how deletes should work. The endpoint for that is base_url/items/{hash_id}, but in this case I will also need to calculate the hashed value in code or, as another option, always pass the whole object in the request so that I can do WHERE hash_id = md5($field_1 || $field_2 || $field_3).
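For illustration, the decrement itself would be something along these lines (a sketch; $1 is the user_id to remove and $2 the hash):
-- Remove one user; the AFTER UPDATE trigger then deletes the row
-- once user_ids becomes empty.
UPDATE some_table
SET user_ids = array_remove(user_ids, $1)
WHERE hash_id = $2;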
What do you think?
I want to make an SQL table to keep track of notes that are added/edited/deleted. I want to be able to display the state of each NOTEID at this moment in a table, display log of changes of selected note and be able to delete all notes marked with a given NOTEID.
CREATE TABLE [dbo].[NOTES] (
    NOTEID [varchar](128) NOT NULL,
    CREATEDBY [varchar](128) NOT NULL, /* is this redundant? */
    TIMECREATED DATE NOT NULL, /* is this redundant? */
    MODIFIEDBY [varchar](128) NOT NULL,
    TIMEMODIFIED DATE NOT NULL,
    NOTE [varchar](2000) NULL,
    PRIMARY KEY ( /* undecided */ )
);
What is the natural way of making this table? Should I autogenerate the primary ID, or should I use (NOTEID, TIMEMODIFIED) as the primary key? What kind of foolproof protection should be added?
I would like to be able to display all notes in a "Note history" window. So, I should store note from 3 days ago, when it was created, note from 2 days ago and from today, when it was modified.
However, the "Notes" table will show the final state for each NOTEID. That is
SELECT NOTE from NOTES where NOTEID = 'selected_note_id' and date = latest
The best way is to create two tables.
NOTES (
NOTE_ID -- primary key and autogenerated / autonumeric
CREATEDBY -- only appear once
TIMECREATED -- only appear once
NOTE
)
NOTES_UPDATE (
NOTES_UPDATE_ID -- primary key and autogenerated / autonumeric
NOTE_ID -- Foreign Key to NOTES
MODIFIEDBY
TIMEMODIFIED
NOTE
)
You can get your note updates with:
SELECT N.*, NU.*
FROM NOTES N
JOIN NOTES_UPDATE NU
ON N.NOTE_ID = NU.NOTE_ID
and to get the last update, just add
ORDER BY NOTES_UPDATE_ID DESC
LIMIT 1 -- this is Postgres syntax
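And if you also need the current state of every note at once (the "Notes" view from the question), one way to get it, sketched in SQL Server syntax since the question uses [dbo]:
-- Latest state per note: the newest update if one exists, else the original.
SELECT N.NOTE_ID,
       CASE WHEN NU.NOTES_UPDATE_ID IS NULL THEN N.NOTE ELSE NU.NOTE END AS CURRENT_NOTE
FROM NOTES N
OUTER APPLY (SELECT TOP 1 U.NOTES_UPDATE_ID, U.NOTE
             FROM NOTES_UPDATE U
             WHERE U.NOTE_ID = N.NOTE_ID
             ORDER BY U.NOTES_UPDATE_ID DESC) NU;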
SIMPLE ANSWER:
The PRIMARY KEY should be the value that uniquely identifies each row in your table. In your particular case, NOTEID should be your id.
ELABORATING:
It is important to remember that a PRIMARY KEY creates an index by default, which means that whenever you do a query similar to:
SELECT * FROM table WHERE NOTEID = something
The query will execute a lot faster than without an index (which is mostly relevant for bigger tables). The PRIMARY KEY is also forced to be unique, hence no two rows can have the same PRIMARY KEY.
A general rule is that you should have an INDEX for any value that will often be used within the WHERE ... part of the statement. If NOTEID is not the only value you will be using in the WHERE ... part of the query, consider creating more indexes.
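For example (the index name here is just illustrative):
CREATE INDEX IX_NOTES_NOTEID_TIMEMODIFIED ON [dbo].[NOTES] (NOTEID, TIMEMODIFIED);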
HOWEVER! TREAD WITH CAUTION. Indexes help speed up searches with SELECT; however, they make UPDATE and INSERT work slower.
I think your current table design is fine, though you might want to make NOTEID the primary key and auto-increment it. I don't see the point of making (NOTEID, TIMEMODIFIED) a composite primary key, because a given note ID should ideally appear only once in the table. If the modified time changes, the ID should remain the same.
Assuming we treat notes like files on a computer, there should be only one table (the file system) that stores them. If a given note gets modified, then its timestamp changes to reflect this.
We are building a multi-user web app where users need a unique postId for each post they create. Each post has (userId, postId) as the compound primary key.
Right now, postId is an identity value, but because we need to support some operations that require postId to be inserted as-is (no re-numbering), we decided to use SET IDENTITY_INSERT ON/OFF.
However, our DBA told us that such an operation is not meant to be used by the application server because of the ALTER permission requirement:
Permissions
User must own the table or have ALTER permission on the table.
https://msdn.microsoft.com/en-ca/library/ms188059.aspx
If the application server were hacked, having ALTER permission seems rather risky. Our DBA suggests that we not use identity values at all, and instead locally generate a unique postId per user.
Can SET IDENTITY_INSERT ON be left on globally?
If it can't be left on globally, does avoiding identity values and generating postId locally (per user) with max(postId) + 1 make sense? We would much prefer to use an identity value if possible, because we are worried about possible deadlocks and performance issues with custom postId generation.
Starting with SQL Server 2012 you can use sequences like in Oracle. You may be better off with those. First, create the sequence:
CREATE SEQUENCE mySeq AS BIGINT START WITH 1 INCREMENT BY 1;
GO
Then have the table's primary key default to the next sequence value (instead of being an IDENTITY value):
CREATE TABLE myTable (
myPK BIGINT PRIMARY KEY DEFAULT (NEXT VALUE FOR mySeq),
myWhatever...
);
If you don't specify a PK value with an INSERT, you'll get a unique, generated sequence value. It's basically the same behavior as an IDENTITY. But if you want to specify a PK value, you can, as long as you don't violate the primary key's uniqueness - but again, that's the same behavior as an IDENTITY with SET IDENTITY_INSERT ON.
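A quick illustration of both behaviors (assuming the table's remaining columns are nullable or have defaults):
-- PK generated from the sequence:
INSERT INTO myTable DEFAULT VALUES;
-- Explicit PK, no SET IDENTITY_INSERT dance needed:
INSERT INTO myTable (myPK) VALUES (42);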
It sounds like you need to evaluate your database design, if that is possible. A post should be a fixed entity, and an identity column as a single primary key should be sufficient.
In your comment you mentioned that you might want to copy posts from one user to another. If you want to split the post so that user1 and user2 can independently control their own versions of it, then it's just a matter of copying all the post attributes into a new record (which creates a new identity key) and then updating the new record's user attribute from user1 to user2.
But if you want the users to share the same post, then you should model that with a relationship from user to post, to avoid maintaining duplicate data in your post table. In other words, if you want to assign user1 and user2 to an identical version of the post, create a relationship table with two fields (Post ID, User ID). This way you can add a user to the post simply by inserting a new record into the relationship table.
Example: Post 1 is owned by user 1. Post 2 is owned by user 1 and 2.
Post Table - Key (Post ID)
(Post ID=1, PostText="This post is important!")
(Post ID=2, PostText="This post is also important!")
Users - Key (User ID)
(User ID=1, Name="Bob")
(User ID=2, Name="George")
Post Users - Key (Post ID, User ID)
(Post ID=1, User ID=1)
(Post ID=2, User ID=1)
(Post ID=2, User ID=2)
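Sketched as DDL (the column types are my assumptions; the names come from the example above):
CREATE TABLE Posts (
    PostID INT IDENTITY(1, 1) PRIMARY KEY,
    PostText NVARCHAR(500) NOT NULL
);
CREATE TABLE Users (
    UserID INT IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);
CREATE TABLE PostUsers (
    PostID INT NOT NULL REFERENCES Posts (PostID),
    UserID INT NOT NULL REFERENCES Users (UserID),
    PRIMARY KEY (PostID, UserID) -- one row per user assigned to a post
);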
This concerns me a little:
"...the need to support some operations that require postId to be inserted as is (no re-numbering)..."
I assume this sort of operation is the exception, not the norm? I can also only assume that you're inserting the same post with the same Post ID into the same table without deleting the original. It's still not clear WHY you want to do this.
I really don't see why you'd need to worry about changing the Post ID if you're assigning posts to another user; nothing else changes except the values in the User ID column, by the sounds of it. Unless you mean you can have two or more posts with the same Post ID and the same User ID. If so, why?
To answer your questions:
No, IDENTITY_INSERT cannot be set globally. It is a per-object, per-session setting.
Using MAX(PostID) + 1 doesn't make sense. Is there a reason why IDENTITY(1, 1) doesn't work for the PostID column? You can re-seed if necessary.
Don't use application-generated UUIDs as key values. They make queries so much slower. At worst, use NEWSEQUENTIALID() in SQL Server if you absolutely want to use a UUID. UUIDs unnecessarily bloat tables and indexes, and with the exception of NEWSEQUENTIALID, are query performance killers.
What you could do is have a Primary key column simply called ID, and then a surrogate key called Post ID if that needs to be non-unique. That way, when you copy a post, the copy of the original gets a new ID, but still retains the original Post ID, but with no need to worry about doing anything unnecessary to the PK.
However, without any changes to the application or the DB, what I would suggest is using a stored procedure executed as the owner of the stored proc (which also owns the Posts table) to set IDENTITY_INSERT to ON only when absolutely necessary. Here is an example I've just written:
create procedure dbo.sp_CopyPost
(
@PostID INT
,@UserID INT
)
with execute as owner
as
begin
set nocount on;
set identity_insert dbo.Posts on;
begin tran
insert into dbo.Posts
(
PostID
,UserID
--Whatever other columns
)
values
(
@PostID
,@UserID
--Whatever other values
)
commit tran
set identity_insert dbo.Posts off;
select @@IDENTITY
end
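You would call it like so (the values are just examples):
EXEC dbo.sp_CopyPost @PostID = 1, @UserID = 2;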
That will do what you want, based on the current wording of your question, and the comments you've made.
If you need to essentially "plug the gaps" in your identity column, you will find Microsoft-recommended queries to do so in section B here:
https://msdn.microsoft.com/en-us/library/ms186775.aspx
We have a multi-tenant database which holds multiple customers, with each customer having a collection of users, like so (simplified example omitting the foreign key specification from Users to Customers):
CREATE TABLE dbo.Customers
(
CustomerId INT NOT NULL IDENTITY(1, 1),
Name NVARCHAR(256) NOT NULL
)
CREATE TABLE dbo.Users
(
UserId INT NOT NULL IDENTITY(1, 1),
CustomerId INT NOT NULL
)
As part of this design, users are required to have a membership number. When we designed this, we decided to use the UserId as the membership number; however, as with all things, this requirement has grown, and that is no longer an option for two reasons:
After we upgraded to 2012, the column jumps by 1000 values on each server restart. We used the workaround shown here: http://www.codeproject.com/Tips/668042/SQL-Server-2012-Auto-Identity-Column-Value-Jump-Is (-t272) to stop that happening, but it made us realise that IDENTITY(1, 1) isn't good enough.
What we really want now is to ensure that the number is incremented per customer, and that it is permanent and cannot change once assigned.
Obviously a sequence will not work, as again it needs to be per customer. We also need to enforce a unique constraint on this per customer/user, ensure that the value never changes once assigned, and ensure that it does not change if a user is deleted (although this shouldn't happen, as we don't delete users but mark them as archived; however, I want to guarantee this won't affect it).
Below is a sample of what I wrote that can generate the number, but what is the best way to use this, or something similar, to ensure a unique, sequential value per customer/user without a chance of any issues, given that users could be created at the same time from different sessions?
ROW_NUMBER() OVER (ORDER BY i.UserId) + ISNULL((SELECT MAX(users.MembershipNumber)
FROM [User].Users users
WHERE users.Customers_CustomerId = i.Customers_CustomerId), 0)
EDIT: Clarification
I apologise; I just re-read my question and did not make this clear enough. We are not looking to replace UserId; we are happy with the gaps and the per-database unique identifier that is used on all foreign keys. What we are looking to add is a MembershipNumber that will be displayed to the user, which is why it needs to be sequential per customer with no gaps: this membership number will be printed on cards given to the user, so it needs to be unique.
Since you already found the problem with Identity columns and how to fix it, I wouldn't say it's not good enough.
However, it doesn't seem to suit your needs since you want the user number to increment per customer.
I would suggest keeping the UserId column as an identity column and the primary key of the table, and adding another column to hold the user number per customer. This column will also be an integer, with a default value of the result of a UDF that calculates the next number per customer (see the example in this post).
You can protect that value from ever changing by using an INSTEAD OF UPDATE trigger on the users table.
This way you keep a single-column primary key, and you have a unique, sequential user number per customer.
Update
Apparently, it is impossible to send column values to a default constraint.
But you can still use an INSTEAD OF INSERT trigger to accomplish your goal.
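To make that concrete, here is a rough sketch of such an INSTEAD OF INSERT trigger. MembershipNumber is the assumed name of the new column, and note that truly simultaneous inserts for the same customer would still need appropriate locking (e.g. UPDLOCK/HOLDLOCK hints), which I have left out:
CREATE TRIGGER dbo.Users_AssignMembershipNumber
ON dbo.Users
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Next number per customer = current max + position within this batch.
    INSERT INTO dbo.Users (CustomerId, MembershipNumber)
    SELECT i.CustomerId,
           ISNULL((SELECT MAX(u.MembershipNumber)
                   FROM dbo.Users u
                   WHERE u.CustomerId = i.CustomerId), 0)
           + ROW_NUMBER() OVER (PARTITION BY i.CustomerId ORDER BY (SELECT NULL))
    FROM inserted i;
END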
It's because of the default caching SQL Server implements for sequence objects. See this former thread:
Identity increment is jumping in SQL Server database
If the gaps are an issue, SQL Server 2012 introduced the Sequence object, which you can declare with NOCACHE so that restarting the server doesn't create gaps.
I want to share my thoughts on this. Please see below.
Create a separate table that holds CustomerID and Count columns, like below.
CREATE TABLE dbo.CustomerSequence
(
CustomerID int,
Count int
);
Then write a stored proc like the one below.
CREATE PROC dbo.usp_GetNextValueByCustomerID
    @CustomerID int,
    @Count int OUTPUT
AS
BEGIN
    -- Atomically increment the counter and capture the new value.
    UPDATE dbo.CustomerSequence
    SET @Count = Count += 1
    WHERE CustomerID = @CustomerID;
END
Just call the above stored proc, passing the CustomerID, and get the next sequence value from it.
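Calling it would look like this (the values are examples, and the customer's row must already exist in dbo.CustomerSequence):
DECLARE @NextValue int;
EXEC dbo.usp_GetNextValueByCustomerID @CustomerID = 7, @Count = @NextValue OUTPUT;
SELECT @NextValue;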
If you have several users adding new records simultaneously, I think the best idea is to create a compound primary key, where the user is a TINYINT (if you have fewer than 255 users) and the incremental number is an integer. Then, when adding a new record, you create a string primary key like 'NN.xxxxxx'. Assuming [Number] is your incremental number and [Code] is the user's code (or locally assigned machine number), you assign the new UserId using the DMax function, as follows:
NextNumber = Nz(DMax("Number", "clients", "Code=" & Me!code), 0) + 1
UserId = code & "." & NextNumber
where
NN is the user's code,
"." is used to separate both fields, and
xxxxxx is the new Number.
I have a table which stores comments. A comment can come either from another user or from another profile, which are separate entities in this app.
My original thinking was that the table would have both user_id and profile_id fields, so if a user submits a comment, it sets the user_id and leaves the profile_id blank.
Is this right or wrong? Is there a better way?
Which solution is best depends, IMHO, on more than just the table; it also depends on how the data is used elsewhere in the application.
Assuming that the comments are all associated with some other object, let's say you extract all the comments for that object. In your proposed design, extracting all the comments requires selecting from just one table, which is efficient. But that extracts the comments without any information about the poster of each comment. Maybe you don't want to show it, or maybe the posters are already cached in memory.
But what if you had to retrieve information about the posters while retrieving the comments? Then you have to join with two different tables, and the resulting record set gets polluted with a lot of NULL values (for a profile comment, all the user fields will be NULL). The code that parses this result set could also get more complex.
Personally, I would probably start with the fully normalized version, and then denormalize when I start seeing performance problems.
There is also a completely different possible solution to the problem, but it depends on whether or not it makes sense in the domain. What if there are other places in the application where a user and a profile can be used interchangeably? What if a User is just a special kind of Profile? Then I think the problem should be solved generally in the user/profile tables. For example (some abbreviated pseudo-SQL):
create table AbstractProfile (ID primary key, type ) -- type can be 'user' or 'profile'
create table User(ProfileID primary key references AbstractProfile , ...)
create table Profile(ProfileID primary key references AbstractProfile , ...)
Then, any place in your application where a user or a profile can be used interchangeably, you can reference the ProfileID.
If the comments are general to several objects, you could create a table for each object:
user_comments (user_id, comment_id)
profile_comments (profile_id, comment_id)
Then you do not have to have any empty columns in your comments table. It will also make it easy to add new comment source objects in the future without touching the comments table.
Another way to solve this is to always denormalize (copy) the name of the commenter onto the comment, and also to store a reference back to the commenter via a type and an id field. That way you have a unified comments table which you can search, sort, and trim quickly. The drawback is that there isn't any real FK relationship between a comment and its owner.
In the past I have used a centralized comments table and had a field for the fk_table it is referencing.
eg:
comments(id,fk_id,fk_table,comment_text)
That way you can use UNION queries to concatenate the data from several sources.
SELECT c.comment_text FROM comment c JOIN user u ON u.id=c.fk_id WHERE c.fk_table="user"
UNION ALL
SELECT c.comment_text FROM comment c JOIN profile p ON p.id=c.fk_id WHERE c.fk_table="profile"
This ensures that you can expand the number of objects that have comments without creating redundant tables.
Here's another approach, which allows you to maintain referential integrity through foreign keys, manage everything centrally, and get high performance using standard database tools such as indexes and, if you really need it, partitioning:
create table actor_master_table(
type char(1) not null, /* e.g. 'u' or 'p' for user / profile */
id varchar(20) not null, /* e.g. 'someuser' or 'someprofile' */
primary key(type, id)
);
create table user(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'u'),
foreign key (type, id) references actor_master_table(type, id)
);
create table profile(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'p'),
foreign key (type, id) references actor_master_table(type, id)
);
create table comment(
creator_type char(1) not null,
creator_id varchar(20) not null,
comment text not null,
foreign key(creator_type, creator_id) references actor_master_table(type, id)
);
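For completeness, inserting a commenter and a comment under this scheme takes one extra step, since the master row must exist first (a sketch with made-up values):
-- Register the actor once in the master table, then in its own table.
insert into actor_master_table (type, id) values ('u', 'someuser');
insert into user (type, id) values ('u', 'someuser');
-- Comments can now reference the actor with full referential integrity.
insert into comment (creator_type, creator_id, comment)
values ('u', 'someuser', 'hello world');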