Add unique constraint and deduplicate column data - database

I am working on a Spring project which uses PostgreSQL and Liquibase. I need to add a unique constraint to a specific column in a table. The table already has a lot of entries and some of them violate the new unique constraint.
Since the application is in production, dropping the table is not an option. I need to implement some sort of modification to the data in the column, so that duplicates get indexed (e.g. we have 2 entries with the value 'foo', after the operation these entries should look something like 'foo' and 'foo2').
So far I've only implemented the change which adds the unique constraint, but I have yet to implement this modification. Is there any functionality in either PostgreSQL or Liquibase which might address this issue?

You need to write an SQL UPDATE query (or several) that finds the duplicates and assigns each of them a unique value.
Then use Liquibase's sql change type to have Liquibase run that query, in a changeset that runs before the one adding the unique constraint.
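To make the renaming logic concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table item, the column name and the helper suffix_duplicates are all hypothetical names chosen for illustration. It keeps the first occurrence of each value, appends 2, 3, ... to later ones, and skips suffixes that already exist in the column, so that 'foo' is never renamed onto an existing 'foo2'.

```python
import sqlite3

def suffix_duplicates(conn, table, column):
    """Rename later duplicates to value2, value3, ..., skipping taken suffixes."""
    rows = conn.execute(f"SELECT id, {column} FROM {table} ORDER BY id").fetchall()
    taken = {value for _, value in rows}  # every value currently in the column
    seen = set()                          # values whose first occurrence is kept
    for row_id, value in rows:
        if value is None or value not in seen:
            seen.add(value)               # NULLs and first occurrences stay as-is
            continue
        n = 2
        while f"{value}{n}" in taken:     # never collide with an existing value
            n += 1
        new_value = f"{value}{n}"
        taken.add(new_value)
        conn.execute(f"UPDATE {table} SET {column} = ? WHERE id = ?",
                     (new_value, row_id))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)",
                 [("foo",), ("foo",), ("bar",), ("foo",), ("foo2",)])

suffix_duplicates(conn, "item", "name")
# Only after the clean-up can the unique index be created without errors.
conn.execute("CREATE UNIQUE INDEX item_name_uq ON item (name)")
names = [r[0] for r in conn.execute("SELECT name FROM item ORDER BY id")]
print(names)
```

In a real migration the same idea would more likely be written as a single SQL UPDATE placed in the Liquibase changeset; the row-by-row version here is only meant to show the rule being applied.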

Related

Autoincrement in Entity Framework 5 without identity column in database

I have not been able to find any appropriate solution for my problem, so here's my question for you:
In Entity Framework (5.0), how can I set up an ID column (PK) to be auto-incremented when no identity column is defined in the actual database (SQL Server 2005)?
I have seen the StoreGeneratedPattern, but I'm not sure how this would work without an identity column in the db. The manual approach would be to populate the POCO with MAX(id)+1 myself, but that feels like a hack and I'm worried that it will introduce problems in a multi-threaded environment where multiple requests may insert records into my table at the "same" time.
Note that I do not have the possibility to alter the table schema in the database.
What's the best way to solve this?
If one instance of your application is the only thing inserting rows into this table, then the MAX(Id) + 1 hack is probably good enough. Otherwise, you'll need to alter the database schema to generate these values on insert -- either by using IDENTITY or by re-inventing the wheel using triggers, sprocs, etc.
Whatever your solution, it should guarantee that a duplicate key will never be generated -- even if a transaction happens to roll back one or more inserts.
If nothing else inserts into the table, you should be able to alter Id to an identity column without breaking compatibility.
FYI: Entity Framework's StoreGeneratedPattern (or DatabaseGeneratedOption) only specifies how values are handled on insert and update. Using Identity tells EF that the value is expected to be generated by the database on insert. Computed means it's generated on both insert and update.
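For illustration, a minimal sketch of the MAX(Id) + 1 approach described above. It uses Python's sqlite3 as a stand-in for SQL Server and Entity Framework, and the person table and insert_with_next_id helper are hypothetical. The id is computed inside the INSERT statement itself, and a primary-key violation triggers a retry, so a clash with another writer shows up as an error rather than as a duplicate key.

```python
import sqlite3

def insert_with_next_id(conn, name, retries=3):
    """Insert a row whose id is MAX(id) + 1, computed inside the INSERT."""
    for _ in range(retries):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO person (id, name) "
                    "SELECT COALESCE(MAX(id), 0) + 1, ? FROM person",
                    (name,),
                )
            return
        except sqlite3.IntegrityError:
            continue  # another writer took that id first; try again
    raise RuntimeError("could not allocate an id")

conn = sqlite3.connect(":memory:")
# INT (not INTEGER) PRIMARY KEY: a plain key column with no auto-generation.
conn.execute("CREATE TABLE person (id INT PRIMARY KEY, name TEXT)")
for name in ("ann", "bob", "cy"):
    insert_with_next_id(conn, name)
ids = [r[0] for r in conn.execute("SELECT id FROM person ORDER BY id")]
print(ids)
```

This is still the hack the answer describes: it relies on the primary key to reject clashes and is not a substitute for an identity column when several applications write to the table.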

Concurrency error with MS-Access Linked Tables

I am linking tables to a SQL 2008R2 DB via MS Access Linked Tables.
I am getting this warning when I want to change the data in an Access linked table where the underlying SQL table has more than one bit field in it:
The record has been changed by another user since you started editing
it. If you save the record, you will overwrite the changes the other
user made
I don't have any problems when there is only one bit field in the table. It's a really strange error, in my opinion. Has anyone else encountered this before and found a workaround for it?
I've seen this sort of issue in working with linked tables in general with SQL. I'm not sure why you're seeing the issue specifically with bit fields. Try adding a 'ts' column with the datatype of timestamp (rowversion) to the table and relink it in Access.
I know this is an old question, but maybe my answer will benefit others, since I struggled with the same and other similar issues.
I had a similar error and was mostly able to get around it. One thing that may help is to use SQL Profiler on the database and watch the SQL commands made by Access while you are trying to add a new row.
A few things to check:
1) Verify that you have an ID column in the table set as the Primary key and AutoNumber
2) If this involves a master/child relationship with another table, specify the relationship and the join type between these tables under "Relationships" in the Access Database Tools.
3) If a join between tables, then play around with the primary column and foreign column being exposed in the query.
Using the SQL Profiler, I would see where it would try to find the row to update based on other columns besides the primary key. e.g.
update table
set ...
where id = 5 and data1 = somevalue and data2 = othervalue
When doing this, I would sometimes get the same error since I may have edited other values in the new row and therefore the complex where clause would fail. What you want is to have the update rely totally on the primary key.

SQL Server update query that only update table itself not indexes

I need to write a query that updates only the table, not its indexes,
because I want to update an int field and don't need 10 huge indexes to be updated.
If the int field is included in any of the index definitions then they will have to be updated too.
SQL Server won't allow the base table to have one value and the indexes another for obvious data integrity reasons.
If the int field is not included in any of the index definitions then only the table will be updated anyway.
You can disable the indexes but to re-enable them involves rebuilding the whole index.
It depends on what you really want to do
Keeping the index consistent with the table data is the Consistency in ACID. This is how SQL Server and ACID-compliant RDBMSes work.
There are cases such as bulk loads where you want to delay this Consistency. So if you have this use case, DROP or DISABLE the indexes.
If you disable the indexes:
they will never be used for any query
all associated unique and foreign keys etc will be disabled too
they are not maintained
If you DROP them, of course they can't be used either.
After your bulk load is finished, you enable or create the indexes/constraints again.
If this is what you really want then read MSDN:
Disabling Indexes
Guidelines for Disabling Indexes and Constraints
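The drop, bulk-change, recreate workflow described above can be sketched as follows. This is only an illustration using Python's sqlite3 as a stand-in: SQLite has no way to disable an index, so the sketch drops and recreates it, whereas on SQL Server the corresponding statements would be ALTER INDEX ... DISABLE and ALTER INDEX ... REBUILD. The reading table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, sensor INT, value INT)")
conn.execute("CREATE INDEX reading_value_ix ON reading (value)")
conn.executemany("INSERT INTO reading (sensor, value) VALUES (?, ?)",
                 [(i % 10, i) for i in range(1000)])

with conn:  # one transaction: either everything happens or nothing does
    conn.execute("DROP INDEX reading_value_ix")           # stop maintaining it
    conn.execute("UPDATE reading SET value = value + 1")  # the bulk change
    conn.execute("CREATE INDEX reading_value_ix ON reading (value)")  # rebuild

index_names = [r[1] for r in conn.execute("PRAGMA index_list('reading')")]
total = conn.execute("SELECT SUM(value) FROM reading").fetchone()[0]
print(index_names, total)
```

Whether this is faster than letting the index be maintained row by row depends on how much of the table changes; the rebuild itself reads the whole table.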
Perhaps Filtered Indexes are what you're looking for.
This is a SQL Server 2008 feature that lets you create an index that only applies to certain values in a column.
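As a rough illustration of the idea, here is a sketch using Python's sqlite3, where the equivalent feature is called a partial index and uses a similar CREATE INDEX ... WHERE form. The orders table and the index name are hypothetical. Rows that do not satisfy the predicate get no entry in the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, status TEXT)")
conn.executemany("INSERT INTO orders (customer_id, status) VALUES (?, ?)",
                 [(7, "open"), (7, "closed"), (8, "open"), (7, "open")])

# Only rows with status = 'open' are stored in this index.
conn.execute(
    "CREATE INDEX orders_open_ix ON orders (customer_id) WHERE status = 'open'")

# A query whose WHERE clause repeats the predicate is a candidate for the index.
open_ids = [r[0] for r in conn.execute(
    "SELECT id FROM orders WHERE status = 'open' AND customer_id = 7 ORDER BY id")]
index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'orders_open_ix'").fetchone()[0]
print(open_ids)
print(index_sql)
```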

How do I manage identities with ETL?

I need help figuring out a workflow and I'm not sure how to go about it... Let's say I'm transforming (ETL?) data from Table A to Table B. Table A has a composite primary key A.a+A.b+A.c, while Table B has just an automatically populated identity column. How can I map the composite keys from A back to the identities created when inserting into B?
Preferably I would like to not have any columns in table B related to A's composite key because there are many other tables that need to undergo the same operation but don't have the same composite key structure.
If I understand you correctly, you can't relate records from table B back to the records of table A after the transformation unless you somehow capture a mapping between A's composite key and B's identifier during the transformation.
You could add a column to A and pre-compute the identifiers to be used when inserting into B. Then you would have a mapping. This could also be done using a separate mapping table, if you don't want to add a column to A.
If you don't want to override the default assignment of identifiers, then you will have to capture them during the load. Oracle provides the returning clause for insert in PL/SQL for this purpose. I'm not sure about SQL Server. It may also be possible to accomplish this by using a trigger on B to insert into a separate mapping table or update a column in A. Though that's likely to slow down your load considerably.
If nothing else, you could create additional columns in B to hold the keys of A during the load, query out the mappings into a separate table afterwards, and then drop the extra columns.
I hope that helps.
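A minimal sketch of the separate mapping table idea, capturing each generated identifier during the load. It uses Python's sqlite3 as a stand-in for the real database; the names table_a, table_b, key_map and the payload column are hypothetical, and the upper-casing only stands in for whatever transformation the ETL performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (a INT, b INT, c INT, payload TEXT, PRIMARY KEY (a, b, c));
    CREATE TABLE table_b (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
    -- Mapping lives outside table_b, so table_b carries no trace of A's key.
    CREATE TABLE key_map (a INT, b INT, c INT, b_id INT, PRIMARY KEY (a, b, c));
    INSERT INTO table_a VALUES (1, 1, 1, 'x'), (1, 2, 1, 'y'), (2, 1, 3, 'z');
""")

with conn:
    source = conn.execute("SELECT a, b, c, payload FROM table_a").fetchall()
    for a, b, c, payload in source:
        cur = conn.execute("INSERT INTO table_b (payload) VALUES (?)",
                           (payload.upper(),))
        # lastrowid is the identity value the database just generated.
        conn.execute("INSERT INTO key_map (a, b, c, b_id) VALUES (?, ?, ?, ?)",
                     (a, b, c, cur.lastrowid))

mapped = conn.execute(
    "SELECT m.a, m.b, m.c, t.payload FROM key_map m "
    "JOIN table_b t ON t.id = m.b_id ORDER BY m.a, m.b, m.c").fetchall()
print(mapped)
```

Row-by-row loading like this is slow for large tables; it is shown only because it makes the captured mapping easy to see.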
Ask yourself exactly what you need the original keys for. The answer may vary depending on the source system. This may lead you to maintain a "source system" column and an "original source keys" column. The latter may need to be a comma-delimited list of the original keys.
Or, you may find that you never actually need to map back, so don't need to keep anything.

Preventing Duplicate Inserts Into SQL With PHP

I'm going to be running thousands of queries against a SQL database and I need to prevent duplicate values in the field 'domain'. I've never had to do this before, and any help would be appreciated.
You probably want to create a "UNIQUE" constraint on the field "Domain" - this constraint will raise an error if you create two rows that have the same domain in the database. For an explanation, see this W3Schools tutorial:
http://www.w3schools.com/sql/sql_unique.asp
If this doesn't solve your problem, please clarify which database you have chosen to use (MySQL?).
NOTE: This constraint is completely separate from your choice of PHP as a programming language, it is a SQL database definition thing. A huge advantage of expressing this constraint in SQL is that you can trust the database to preserve the constraint even when people import / export data from the database, your application is buggy or another application shares the database.
If this is an absolute database integrity requirement (It's not likely to change, nor does existing data have this problem), I would enforce it at the database with a unique constraint.
As far as detecting it before or after the attempt in order to notify the user, there are a number of techniques which could be used.
Where is the data coming from? Is this something you only want to run once, or a couple of times, or often? If the domain value already exists, do you just want to skip the insert or do something else (e.g. increment a counter)?
Depending on your answers, there are many possible solutions:
Pre-sort your data, eliminate duplicates, then insert
(assumes relatively static data, empty table to begin with)
Use an associative array in PHP as a local domain-value cache
(if table already contains data, start by reading existing content;
not thread-safe, but works if it only runs once at a time)
Make domain a UNIQUE column and write wrapper code to handle return errors
Make domain a UNIQUE or PRIMARY KEY column and use an ON DUPLICATE KEY clause:
INSERT INTO mydata ( domain, count ) VALUES
( 'firstdomain', 1 ),
( 'seconddomain', 1 ),
( 'thirddomain', 1 )
ON DUPLICATE KEY
UPDATE count = count+1
Insert all data into the table, then remove duplicates
Note that batching inserts (i.e. using multiple value clauses per statement) can be significantly faster.
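The "UNIQUE column plus wrapper code" option from the list above could look roughly like this. The sketch is in Python with sqlite3 rather than PHP and MySQL, purely so that it is self-contained; the mydata table follows the example above and the record_domain helper is hypothetical. The database rejects the duplicate, and the wrapper turns that error into a counter update.

```python
import sqlite3

def record_domain(conn, domain):
    """Insert a domain; if the UNIQUE constraint rejects it, bump its counter.

    Returns True when a new row was inserted, False for a duplicate.
    """
    try:
        with conn:
            conn.execute("INSERT INTO mydata (domain, count) VALUES (?, 1)",
                         (domain,))
        return True
    except sqlite3.IntegrityError:
        with conn:
            conn.execute("UPDATE mydata SET count = count + 1 WHERE domain = ?",
                         (domain,))
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (domain TEXT UNIQUE, count INT)")
results = [record_domain(conn, d) for d in ("a.com", "b.com", "a.com", "a.com")]
rows = conn.execute("SELECT domain, count FROM mydata ORDER BY domain").fetchall()
print(results, rows)
```

On MySQL the ON DUPLICATE KEY form shown above does the same thing in a single statement, which avoids the extra round trip.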
I'm not really sure I understood your question, but perhaps you are looking for SQL's "UNIQUE" constraint. If the query tries to insert a pre-existing value to a field, you (PHP) will be notified about this constraint breach.
There are a bunch of ways to approach this. You could set a unique constraint (like a primary key) on that column. This will cause the insert to fail if that domain has already been inserted. You could also insert all of the duplicate domains and just delete them later on. This will work well if not that many of the domains are duplicated. There are a few questions posted already on finding duplicate rows.
This can be done with SQL, rather than with PHP.
I am assuming that you are using MySQL, but the same principles will work with different databases.
Make the Domain column the primary key. (This makes sense, as it has to be unique.)
Rather than using a plain INSERT, use an upsert such as REPLACE or INSERT ... ON DUPLICATE KEY UPDATE. (A plain UPDATE only changes rows that already exist and never inserts new ones.)
If the primary key you are trying to put into the table already exists, the existing tuple is overwritten rather than a new tuple being created.
So you will overwrite existing data if it is different, and if it is identical nothing effectively changes.
