How to do transaction.insert_or_update on secondary index and not the primary index?

I have a table in Google Cloud Spanner.
CREATE TABLE test_id (
    Id STRING(MAX) NOT NULL,
    KeyColumn STRING(MAX) NOT NULL,
    parent_id INT64 NOT NULL,
    Updated TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
) PRIMARY KEY (Id)
I am trying to perform a transaction.insert_or_update through a Python script.
For each row in a pandas DataFrame, I am doing:
transaction.insert_or_update(
    'test_id',
    columns=['Id', 'KeyColumn', 'parent_id', 'Updated'],
    values=[(uuid.uuid4().hex, row["KeyColumn"], row["parent_id"], spanner.COMMIT_TIMESTAMP)],
)
What I want is this: if row["KeyColumn"] is already present in the KeyColumn column of the table, update that row's parent_id; otherwise, insert a new row into the Spanner table for that KeyColumn.
But since my primary key is Id, which is generated randomly by uuid.uuid4().hex, a new row is inserted every time.

If I understand you correctly, the following is the situation:
ID is the primary key of your table.
There is a unique index defined for the table on the column KeyColumn.
You want to insert_or_update a row using KeyColumn as the column that should be used to determine whether the row already exists.
That is unfortunately not possible. insert_or_update will always use the primary key of the table to determine whether the row exists. I can think of three possible solutions to this problem, but they all have their drawbacks:
You could change the table definition and make KeyColumn the primary key and set a unique index on the Id column. The problem with this is of course that any other code that depends on Id being the primary key also needs to change. It is also a rather cumbersome change, because Cloud Spanner does not allow you to change the primary key of a table, so you would have to create a copy of the test_id table and then drop the old table.
You could fetch the row from Cloud Spanner before updating it by reading it using the KeyColumn value that you have. The big problem with this is obviously performance. You will need to do a read for each row that you want to update.
You could use a DML statement (UPDATE test_id SET parent_id=@parent WHERE KeyColumn=@key) to execute the update and check whether it actually updated a row by checking the returned update count. If it did not update anything, you could then execute the insert. This will obviously also be slower than an insert_or_update mutation. A sketch of this approach follows below.
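Below is a minimal sketch of what that third option could look like with the Python client. It assumes a Database object named database from the google-cloud-spanner library and the column names from your table; the handling of the Updated column via PENDING_COMMIT_TIMESTAMP() and the function name are illustrative, not a drop-in implementation.
import uuid

from google.cloud.spanner_v1 import param_types

def update_or_insert_by_key_column(database, key_column, parent_id):
    """Update parent_id for an existing KeyColumn, or insert a new row if none was updated."""

    def work(transaction):
        # execute_update returns the number of affected rows.
        updated = transaction.execute_update(
            "UPDATE test_id "
            "SET parent_id = @parent, Updated = PENDING_COMMIT_TIMESTAMP() "
            "WHERE KeyColumn = @key",
            params={"parent": parent_id, "key": key_column},
            param_types={"parent": param_types.INT64, "key": param_types.STRING},
        )
        if updated == 0:
            # No row with this KeyColumn exists yet, so insert one.
            transaction.execute_update(
                "INSERT INTO test_id (Id, KeyColumn, parent_id, Updated) "
                "VALUES (@id, @key, @parent, PENDING_COMMIT_TIMESTAMP())",
                params={"id": uuid.uuid4().hex, "key": key_column, "parent": parent_id},
                param_types={
                    "id": param_types.STRING,
                    "key": param_types.STRING,
                    "parent": param_types.INT64,
                },
            )

    database.run_in_transaction(work)
Note that run_in_transaction may retry the function if the transaction aborts, so it should not have side effects outside the transaction.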

There is a way to query Cloud Spanner using a specific index.
You should use something like this in the FROM clause of your query: FROM test_id@{FORCE_INDEX=KeyColumnIndex}.
Even though this is the way to execute queries against secondary indexes, and so answers the question in the title, I do not know how well it applies to your use case.
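As an illustration, such a query could be issued from the Python client roughly as follows. KeyColumnIndex is the assumed name of the unique index on KeyColumn, and database is again a Database object from the google-cloud-spanner library; treat this as a sketch rather than a definitive implementation.
from google.cloud.spanner_v1 import param_types

def find_by_key_column(database, key_column):
    """Look up Id and parent_id for a KeyColumn value through the secondary index."""
    with database.snapshot() as snapshot:
        results = snapshot.execute_sql(
            "SELECT Id, parent_id "
            "FROM test_id@{FORCE_INDEX=KeyColumnIndex} "
            "WHERE KeyColumn = @key",
            params={"key": key_column},
            param_types={"key": param_types.STRING},
        )
        # snapshot.read(..., index="KeyColumnIndex") is an alternative, but the read API
        # can only return columns that are covered by (or stored in) that index.
        return list(results)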

Related

Adding a primary key column to an existing table in SQL Server?

I need to add a new column and make it the primary key in SQL Server; the table in question does not yet have a unique key column.
Here is the sample table http://www.sqlfiddle.com/#!18/8a161/1/0
My goal is simply to have an ID column with values from 1 to 1160 (the total number of records in this table) and make that the primary key. Also, is there a way to add the numbers from 1 to 1160 automatically, without adding them to each record one by one, since there are 1000+ rows in this table?
Thank you!
Simply alter the table and add the column with the appropriate characteristics.
alter table x add id smallint identity(1,1) not null primary key;
This was thrown together quickly, and you should probably get in the habit of naming constraints. I will note that you may or may not want to use an identity column; think carefully about your actual goal and how you want to continue using this table after the alteration.

The row-limitation in compound primary key in SQL Server 2014

I am going to insert 2.3 billion rows (2,300,000,000) from table_a into table_b. The schemas of table_a and table_b are identical; the only difference is that table_a doesn't have a primary key, while table_b has a four-column compound primary key and 0 rows of data. I encountered this error message after 24 hours:
Msg 666, Level 16, State 2, Line 1
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 422223771074560. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
This is my compound PK in table_b and the sample query code; any help will be appreciated.
column1: varchar(10), not null
column2: nvarchar(50), not null
column3: nvarchar(100), not null
column4: int, not null
Sample code
insert into table_b
select *
from table_a
where date < '2017-01-01' -- some filters here
According to the SQL Server documentation, part of creating a primary key is creating a unique index on that same table:
When you create a PRIMARY KEY constraint, a unique index on the
column, or columns, is automatically created. By default, this index
is clustered; however, you can specify a nonclustered index when you
create the constraint.
When the clustered index is not unique, rows with duplicate key values get what the docs call a "uniqueifier", which is 4 bytes in length (roughly 2.14 billion possible combinations):
If the clustered index is not created with the UNIQUE property, the
Database Engine automatically adds a 4-byte uniqueifier column to the
table. When it is required, the Database Engine automatically adds a
uniqueifier value to a row to make each key unique. This column and
its values are used internally and cannot be seen or accessed by
users.
From this information and your error message we can tell two things:
There is a clustered index on the table
There is not a primary key on the table
Given the volume of data you're dealing with, I'm betting you have a clustered columnstore index on the table, which in SQL Server 2014 cannot be combined with a primary key.
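For reference, the roughly 2.14 billion figure mentioned above is simply the capacity of a signed 4-byte integer; a back-of-the-envelope check (a sketch, not a description of the storage engine's internals):
uniqueifier_bytes = 4
values_per_duplicate_group = 2 ** (uniqueifier_bytes * 8 - 1)  # sign bit excluded
print(values_per_duplicate_group)  # 2147483648, the limit behind Msg 666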
One possible solution is to partition table_b based on a particular column (one that has fewer than 15,000 unique values, per the limitations specified in the documentation). As a side-note, the same partitioning effort could have a significant impact on minimizing the run time of any queries using table_b, depending on which column is used in the partition function.
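To make the partitioning suggestion concrete, here is a rough sketch that creates a partition function, a partition scheme, and a partitioned copy of the table from Python via pyodbc. The object names, boundary values, filegroup, and connection string are placeholders; the real partitioning column and boundaries should be chosen to stay within the documented limits mentioned above.
import pyodbc

# Placeholder DDL: adjust names, boundary values, and filegroups to your environment.
DDL_STATEMENTS = [
    """
    CREATE PARTITION FUNCTION pf_table_b (int)
    AS RANGE RIGHT FOR VALUES (100, 200, 300)
    """,
    """
    CREATE PARTITION SCHEME ps_table_b
    AS PARTITION pf_table_b ALL TO ([PRIMARY])
    """,
    """
    CREATE TABLE table_b_partitioned (
        column1 varchar(10)   NOT NULL,
        column2 nvarchar(50)  NOT NULL,
        column3 nvarchar(100) NOT NULL,
        column4 int           NOT NULL
    ) ON ps_table_b (column4)
    """,
]

def create_partitioned_copy(conn_str):
    """Run the partitioning DDL against SQL Server via pyodbc."""
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        for ddl in DDL_STATEMENTS:
            cursor.execute(ddl)
    finally:
        conn.close()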
You know that:
If the clustered index is not created with the UNIQUE property, the
Database Engine automatically adds a 4-byte uniqueifier column to the
table. When it is required, the Database Engine automatically adds a
uniqueifier value to a row to make each key unique. This column and
its values are used internally and cannot be seen or accessed by
users.
While it's unlikely that you will face an issue related to uniqueifiers, we have seen rare cases where a customer reaches the uniqueifier limit of 2,147,483,648, generating error 666.
And from this topic about the issue we have:
As of February 2018, the design goal for the storage engine is to not reset uniqueifiers during REBUILDs. As such, a rebuild of the index ideally would not reset uniqueifiers, and the issue would continue to occur while inserting new data with a key value for which the uniqueifiers were exhausted. But current engine behavior is different for one specific case: if you use the statement ALTER INDEX ALL ON <table> REBUILD WITH (ONLINE = ON), it will reset the uniqueifiers (across all versions from SQL Server 2005 to SQL Server 2017).
So, if this is the cause of your issue, you can add an additional integer column and build the index over it.

Resetting the primary key to 1

I have a script for a Microsoft SQL Server database which has hundreds of tables, and the tables contain data as well. This is the database of a web application. What I want to do is delete the previous records and reset the primary key to 1 or 0.
I have tried
DBCC CHECKIDENT ('dbo.tbl', RESEED, 0);
but it does not work for me as in most of the tables the primary key is not identity.
I cannot truncate the table as its primary key is being used as an FK in many other tables.
I have also tried adding the identity specification to the primary key of the table, running the CHECKIDENT query, and then changing it back to a non-identity spec, but after adding a record again it starts from where it left off.
Making changes in the code is not an option for me.
Please help.
Based on your question I am not sure about the main objective. If you need to truncate a lot of tables and change their structure to have an Identity property, why can't you disable the FKs? In the past I have used a standard process to rebuild a table and migrate all of the information; it comes down to the following steps.
Steps:
1) Disable the FKs so you can alter the structure of your tables. You can find a solution for this task in the following link:
Temporarily disable all foreign key constraints
2) Alter the table to add the new Identity property (a classic ALTER TABLE).
3) Execute the statement posted previously:
DBCC CHECKIDENT ('dbo.tbl', RESEED, 0);
Follow this path, and if you have any problems just ask.
You cannot truncate a table that has relations. You should remove the relations first.
My understanding of this question:
You have a database with tables that you want to empty and next have them use primary key values starting at 0 or 1.
Some of these tables use an identity value and you already have a solution for those (you know you can find out which columns have an identity by using the sys.columns view? Look for the is_identity column).
Some tables do not use an identity but get their pk values from an unknown source, which we can't modify.
The only solution I see is creating (or modifying) an AFTER INSERT trigger on those tables that subtracts from the new pk value.
E.g.: your "hidden generator" will generate 5254 as the next value, but you want the next pk value to become one:
CREATE TRIGGER trg_sometable_ai
ON sometable
AFTER INSERT
AS
BEGIN
    -- Shift the newly generated key down so the sequence effectively restarts at 1
    UPDATE st
    SET st.pk_col = st.pk_col - 5253
    FROM sometable AS st
    INNER JOIN INSERTED AS i
        ON i.pk_col = st.pk_col
END
You'll have to determine the next value and thus the "subtract value" for each table.
If the code also inserts child records into tables with a foreign key to this table, and uses the previously generated value, you have to modify those triggers as well...
This is a "last resort" solution and something I would recommend against in any scenario that has other options. Manipulating primary key values is generally not a good idea.

SQL Server: how to constrain a table to contain a single row?

I want to store a single row in a configuration table for my application. I would like to enforce that this table can contain only one row.
What is the simplest way to enforce the single-row constraint?
You make sure one of the columns can only contain one value, and then make that the primary key (or apply a uniqueness constraint).
CREATE TABLE T1(
Lock char(1) not null,
/* Other columns */,
constraint PK_T1 PRIMARY KEY (Lock),
constraint CK_T1_Locked CHECK (Lock='X')
)
I have a number of these tables in various databases, mostly for storing config. It's a lot nicer knowing that, if the config item should be an int, you'll only ever read an int from the DB.
I usually use Damien's approach, which has always worked great for me, but I also add one thing:
CREATE TABLE T1(
Lock char(1) not null DEFAULT 'X',
/* Other columns */,
constraint PK_T1 PRIMARY KEY (Lock),
constraint CK_T1_Locked CHECK (Lock='X')
)
Adding the "DEFAULT 'X'", you will never have to deal with the Lock column, and won't have to remember which was the lock value when loading the table for the first time.
You may want to rethink this strategy. In similar situations, I've often found it invaluable to leave the old configuration rows lying around for historical information.
To do that, you add an extra column creation_date_time (the date/time of insertion or update) and an insert or insert/update trigger that populates it with the current date/time.
Then, in order to get your current configuration, you use something like:
select * from config_table order by creation_date_time desc fetch first row only
(depending on your DBMS flavour).
That way, you still get to maintain the history for recovery purposes (you can institute cleanup procedures if the table gets too big but this is unlikely) and you still get to work with the latest configuration.
You can implement an INSTEAD OF Trigger to enforce this type of business logic within the database.
The trigger can contain logic to check if a record already exists in the table and if so, ROLLBACK the Insert.
Now, taking a step back to look at the bigger picture, I wonder if perhaps there is an alternative and more suitable way for you to store this information, perhaps in a configuration file or environment variable for example?
I know this is very old, but instead of thinking big, sometimes it is better to think small; use an identity integer like this:
Create Table TableWhatever
(
    keycol int primary key not null identity(1,1) check (keycol = 1),
    Col2 varchar(7)
)
This way, each time you try to insert another row, the check constraint will raise an error and prevent the insert, since the identity primary key won't accept any value but 1.
Here's a solution I came up with for a lock-type table which can contain only one row, holding a Y or N (an application lock state, for example).
Create the table with one column. I put a check constraint on the one column so that only a Y or N can be put in it. (Or 1 or 0, or whatever)
Insert one row in the table, with the "normal" state (e.g. N means not locked)
Then create an INSERT trigger on the table that only has a SIGNAL (DB2) or RAISERROR (SQL Server) or RAISE_APPLICATION_ERROR (Oracle). This makes it so application code can update the table, but any INSERT fails.
DB2 example:
create table PRICE_LIST_LOCK
(
LOCKED_YN char(1) not null
constraint PRICE_LIST_LOCK_YN_CK check (LOCKED_YN in ('Y', 'N') )
);
--- do this insert when creating the table
insert into PRICE_LIST_LOCK
values ('N');
--- once there is one row in the table, create this trigger
CREATE TRIGGER ONLY_ONE_ROW_IN_PRICE_LIST_LOCK
NO CASCADE
BEFORE INSERT ON PRICE_LIST_LOCK
FOR EACH ROW
SIGNAL SQLSTATE '81000' -- arbitrary user-defined value
SET MESSAGE_TEXT='Only one row is allowed in this table';
Works for me.
I use a bit field as the primary key, with the name IsActive.
So there can be at most 2 rows, and the SQL to get the valid row is:
select * from Settings where IsActive = 1
if the table is named Settings.
The easiest way is to define the ID field as a computed column with the constant value 1 (or any number), then create a unique index on the ID.
CREATE TABLE [dbo].[SingleRowTable](
[ID] AS ((1)),
[Title] [varchar](50) NOT NULL,
CONSTRAINT [IX_SingleRowTable] UNIQUE NONCLUSTERED
(
[ID] ASC
)
) ON [PRIMARY]
You can write a trigger on the insert action on the table: whenever someone tries to insert a new row, have the insert trigger's logic remove the latest row.
Old question, but how about using an IDENTITY seeded at the maximum of a small column type?
CREATE TABLE [dbo].[Config](
    [ID] [tinyint] IDENTITY(255,1) NOT NULL,
    [Config1] [nvarchar](max) NOT NULL,
    [Config2] [nvarchar](max) NOT NULL
)
Another option is to check whether the table already contains a row before inserting:
IF NOT EXISTS (SELECT * FROM table)
BEGIN
    -- Your insert statement
END
You can also rely on the primary key constraint itself by always writing the same value to the key column. Example:
Student table:
Id: int
firstname: char
If the application always supplies the same value for the Id column, the primary key constraint will reject every insert after the first one, so the table can only ever hold one row.
Hope this helps!

Detailed error message for violation of Primary Key constraint in sql2008?

I'm inserting a large amount of rows into an empty table with a primary key constraint on one column.
If there is a duplicate key error, is there any way to find out the value of the key (or row) that caused the error?
Validating the data prior to the insert is sadly not something I can do right now.
Using SQL 2008.
Thanks!
Doing the count(*) / group by thing is something I'm trying to avoid; this is an insert of hundreds of millions of rows from hundreds of different DBs (some of which are on remote servers), and I don't have the time or space to do the insert twice.
The data is supposed to be unique from the providers, but unfortunately their validation doesn't seem to work correctly 100% of the time and I'm trying to at least see where it's failing so I can help them troubleshoot.
Thank you!
There's not a way of doing it that won't slow your process down, but here's one way that will make it easier. You can add an instead-of trigger on that table for inserts and updates. The trigger will check each record before inserting it and make sure it won't cause a primary key violation. You can even create a second table to catch violations, and have a different primary key (like an identity field) on that one, and the trigger will insert the rows into your error-catching table.
Here's an example of how the trigger can work:
CREATE TRIGGER mytrigger ON sometable
INSTEAD OF INSERT
AS BEGIN
    INSERT INTO sometable SELECT * FROM inserted WHERE ISNUMERIC(somefield) = 1;
    INSERT INTO sometableRejects SELECT * FROM inserted WHERE ISNUMERIC(somefield) = 0;
END
In that example, I'm checking a field to make sure it's numeric before I insert the data into the table. You'll need to modify that code to check for primary key violations instead - for example, you might join the INSERTED table to your own existing table and only insert rows where you don't find a match.
The solution would depend on how often this happens. If it's <10% of the time then I would do the following:
Insert the data
If error then do Bravax's revised solution (remove constraint, insert, find dup, report and kill dup, enable constraint).
This means it's only costing you on the few times an error occurs.
If this is happening more often then I'd look at sending the boys over to see the providers :-)
Revised:
Since you don't want to insert twice, could you:
Drop the primary key constraint.
Insert all data into the table
Find any duplicates, and remove them
Then re-add the primary key constraint
Previous reply:
Insert the data into a duplicate of the table without the primary key constraint.
Then run a query on it to determine the rows which have duplicate values for the primary key column.
select count(*), <Primary Key>
from table
group by <Primary Key>
having count(*) > 1
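If you do load the data into a copy of the table without the constraint, a small client-side script can run that duplicate check and give you a list of offending keys to send back to the providers. The sketch below assumes pyodbc and placeholder connection, table, and column names.
import pyodbc

# Placeholder connection string; replace with your own server and database.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
)

def report_duplicate_keys(staging_table="staging_table", key_column="pk_column"):
    """Print every key value that appears more than once in the staging table."""
    sql = (
        f"SELECT {key_column}, COUNT(*) AS dup_count "
        f"FROM {staging_table} "
        f"GROUP BY {key_column} "
        f"HAVING COUNT(*) > 1"
    )
    conn = pyodbc.connect(CONN_STR)
    try:
        cursor = conn.cursor()
        for key_value, dup_count in cursor.execute(sql):
            print(f"{key_value}: {dup_count} occurrences")
    finally:
        conn.close()

if __name__ == "__main__":
    report_duplicate_keys()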
Use SSIS to import the data and have it check for this as part of the data flow. That is the best way to handle it. SSIS can send the bad records to a table (that you can later send to the vendor to help them clean up their act) and process the good ones.
I can't believe that SSIS does not easily address this "reality", because, let's face it, oftentimes you need and want to be able to:
See if a record exists with a certain unique or primary key
If it does not, insert it
If it does, either ignore it or update it.
I don't understand how they would let a product out the door without this capability built-in in an easy-to-use manner. Like, say, set an attribute of a component to automatically check this.
