Remove duplicate rows in PowerPivot - relationship

When I create a relationship between two tables, it says each column has duplicate entries.
How can I delete them? I just need to check my primary key column and remove the rows with duplicate entries.

I would resolve this in the Edit Queries window. Select your primary key column and from the Home ribbon, choose Remove Rows / Remove Duplicates.
If you have any interest in understanding what the duplicates are, you can investigate by using Keep Rows / Keep Duplicates.

Related

Use foreign keys to update/ delete matching records in multiple tables in Talend Open Studio for Data Integration

I'm struggling with deleting and updating multiple tables from the same input data.
Here is a screenshot of the job.
I'm comparing the rm table in the v24 MSSQL database with the one in v26: if a record was deleted in v24, it also has to be deleted in v26. This is what tMap_1 does, and it works. It matches records using bl_id, fl_id and rm_id as keys.
But before that record can be deleted from the rm table, the records in other tables that use the three keys as foreign keys have to be deleted or updated.
For that, I create a second output row from tMap_1 which holds the bl_id, fl_id and rm_id that have to be adapted. I link it to a first table, 'rmpct', and use a delete; it seems to work even though its primary key is not used (I adapted the field options for the deletion keys).
What I'm trying to do now is, in the other tables with matching keys, update the values of those keys in the record (they have to become empty). I tried it with another table, 'activity_log', but it doesn't find the matching rows based on the foreign keys, and I don't know how to update those values with values different from the input row.
Can somebody help me please?
Thanks!
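For reference, the end result the job is aiming for can be sketched in plain SQL, independent of Talend: first blank out the matching foreign-key columns in the dependent table, then delete the parent row. A minimal sketch using SQLite via Python's standard library; the rm and activity_log table names come from the question, while the columns and sample data are only illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE rm (bl_id TEXT, fl_id TEXT, rm_id TEXT);
    CREATE TABLE activity_log (id INTEGER PRIMARY KEY, bl_id TEXT, fl_id TEXT, rm_id TEXT);
    INSERT INTO rm VALUES ('B1', 'F1', 'R1');
    INSERT INTO activity_log (bl_id, fl_id, rm_id) VALUES ('B1', 'F1', 'R1');
""")

key = ("B1", "F1", "R1")  # a record found to be deleted in v24 (the tMap lookup result)

# Step 1: empty the foreign-key values in the dependent table
con.execute("""UPDATE activity_log SET bl_id = NULL, fl_id = NULL, rm_id = NULL
               WHERE bl_id = ? AND fl_id = ? AND rm_id = ?""", key)

# Step 2: only then delete the parent record
con.execute("DELETE FROM rm WHERE bl_id = ? AND fl_id = ? AND rm_id = ?", key)
```

In Talend this would correspond to running the UPDATE flow (with the key fields as lookup keys and the target columns set to null) before the DELETE flow.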

Entity Framework should check duplicate records based on unique set of values for few columns and then add the new record

I have a scenario where multiple API calls are triggered from different clients (sometimes via a WebJob), causing the API code to run concurrently.
This API code inserts records into SQL Server through Entity Framework. Because of the concurrent calls, duplicate records get inserted even though there is a check for an existing record: this happens when two calls are only a fraction of a second apart.
I would like to handle this at the database table level by setting a composite key (no single column can be unique) on a combination of three columns, so that SQL Server will never allow duplicate entries based on those three columns.
However, one of the columns in the composite key should allow multiple inactive records, while not allowing two active records.
The example below (from the screenshot) illustrates the issue:
Records where CategoryId and CategoryTypeId are the same and both are active (boxed in red) should not be allowed.
Records where only one of several rows with the same CategoryId and CategoryTypeId is active (boxed in green) should be allowed.
Please suggest how to design a solution that fixes this at the database end.
However, one of the columns in the composite key should allow multiple inactive records, while not allowing two active records.
And what is the problem here?
Define 2 indexes:
1 unique index on CategoryId, CategoryTypeId with IsActive = 1 as filter
1 non-unique index on CategoryId, CategoryTypeId with IsActive = 0 as filter
Done.
Filters are a little-known feature of indexes that most people overlook.
This is the ONLY way to GUARANTEE it.
Just to give an idea (I can't comment, so I'm posting this as an answer): maybe you should go through a UNION ALL:
SELECT DISTINCT CategoryId, CategoryTypeId, IsActive FROM Table WHERE IsActive = 1
UNION ALL
SELECT CategoryId, CategoryTypeId, IsActive FROM Table WHERE IsActive = 0
This may help you.
With TomTom's guidance, I resolved this issue with the SQL statements below.
I created two non-clustered indexes on the table, tried re-inserting the duplicates, and got the appropriate error. Creating these indexes makes sure that only one active record can exist in the table.
--add unique nonclustered filtered index that allows only one active record
CREATE UNIQUE NONCLUSTERED INDEX unique_active_projects
ON TestDuplicates(CategoryId,CategoryTypeId,IsActive)
WHERE IsActive = 1
GO
--add non-unique nonclustered filtered index to allow multiple non-active records
CREATE NONCLUSTERED INDEX nonunique_inactive_projects
ON TestDuplicates(CategoryId,CategoryTypeId,IsActive)
WHERE IsActive = 0
GO
Hope this helps someone with a similar issue!
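The same filtered-index behaviour can be checked outside SQL Server: SQLite supports partial indexes with the same WHERE syntax, so the logic is easy to verify from Python's standard library. A minimal sketch mirroring the index above (table and data are only illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TestDuplicates (CategoryId INTEGER, CategoryTypeId INTEGER, IsActive INTEGER)")

# Filtered (partial) unique index: uniqueness is enforced only for active rows
con.execute("""CREATE UNIQUE INDEX unique_active_projects
               ON TestDuplicates(CategoryId, CategoryTypeId, IsActive)
               WHERE IsActive = 1""")

con.execute("INSERT INTO TestDuplicates VALUES (1, 1, 1)")  # first active row: allowed
con.execute("INSERT INTO TestDuplicates VALUES (1, 1, 0)")  # inactive duplicate: allowed
con.execute("INSERT INTO TestDuplicates VALUES (1, 1, 0)")  # another inactive: allowed
try:
    con.execute("INSERT INTO TestDuplicates VALUES (1, 1, 1)")  # second active: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The second non-unique index from the answer is a performance aid only; the unique filtered index alone carries the constraint.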

Create primary key in SQL Server

I have a table in SQL Server with a duplicate ID, but I cannot delete those duplicate records. The requirement is now to create a primary key on the column that has duplicate data. Is there any way to create the primary key without changing the data?
No, there is no way to add a PRIMARY KEY constraint to a column that already has duplicate values.
From Creating and Modifying PRIMARY KEY Constraints:
When a PRIMARY KEY constraint is added to an existing column or
columns in the table, the Database Engine examines the existing column
data and metadata to make sure that the existing data complies with the
following rules for primary keys:
The columns cannot allow for null values.
There can be no duplicate values.
If a PRIMARY KEY constraint is added to a
column that has duplicate values or allows for null values, the
Database Engine returns an error and does not add the constraint.
If the ID column is incremental, a possible workaround is to add a unique filtered index:
CREATE UNIQUE INDEX AK_MyUniqueIndex ON dbo.MyTable (ID)
WHERE ID > ... max value of existing ID here
This way, uniqueness will be applied only to newly added records.
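This "enforce uniqueness only for new rows" trick can be demonstrated with SQLite's partial indexes, which use the same WHERE clause. A sketch via Python's standard library, with a toy threshold standing in for the real maximum ID:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (ID INTEGER)")
con.executemany("INSERT INTO MyTable VALUES (?)", [(1,), (1,), (2,)])  # pre-existing duplicates

# Uniqueness applies only above the current maximum ID (2 in this toy data)
con.execute("CREATE UNIQUE INDEX AK_MyUniqueIndex ON MyTable(ID) WHERE ID > 2")

con.execute("INSERT INTO MyTable VALUES (3)")       # new, unique: allowed
try:
    con.execute("INSERT INTO MyTable VALUES (3)")   # new duplicate: rejected
except sqlite3.IntegrityError:
    print("new duplicates are rejected")
```

The old duplicates (the two rows with ID = 1) remain untouched below the filter threshold.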
I know this is old, but I had an idea I wanted to share:
Step 1. Add a non-nullable int column with a default value of 0.
Optional step. Update that column to 1, so you can identify the existing records afterwards.
Step 2. In all existing rows that have duplicates, update the new column with ROW_NUMBER() over a combination of unique columns (or all columns).
Step 3. Define the primary key with your ID column first (so it is indexed first), then add the column from Step 1.
And you are done, with a special column that helps identify the duplicates easily; the new records will all be marked with 0. A best practice would be to add a character or number to all IDs if possible and standardize them (this approach helps you do that afterwards), or to use something like a per-year sequence, etc.
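The ROW_NUMBER() step above can be sketched in SQLite (window functions require SQLite >= 3.25) through Python's standard library; the tiebreaker column name and sample data are only illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (ID INTEGER, Tiebreak INTEGER NOT NULL DEFAULT 0)")
con.executemany("INSERT INTO MyTable (ID) VALUES (?)", [(1,), (1,), (2,)])  # ID 1 is duplicated

# Number the rows within each group of duplicate IDs (requires SQLite >= 3.25)
con.execute("""
    UPDATE MyTable
    SET Tiebreak = (
        SELECT rn FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY rowid) AS rn
            FROM MyTable
        ) WHERE rid = MyTable.rowid
    )
""")

# (ID, Tiebreak) is now unique and can back a composite primary key
con.execute("CREATE UNIQUE INDEX pk_like ON MyTable(ID, Tiebreak)")
```

After the update, the duplicate IDs carry tiebreakers 1 and 2, so the composite index succeeds where a single-column primary key could not.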

How to prevent updating duplicate rows in SQLite Database?

I'm inserting new rows into a SQLite table, but I don't want to insert duplicate rows.
I also don't want to specify every column in the database if possible.
I don't even know if this is possible.
I should be able to take my values and create a new row with them, but if they duplicate another row they should either overwrite the existing row or do nothing.
This is one of the very first steps in database design and normalization. You have to be able to explicitly define what you mean by a duplicate row, and then place a primary key constraint (or a unique constraint) on the columns in your table that represent that definition.
Before you can define what duplicate means, you have to decide exactly what the table is to contain, i.e., what real-world business domain entity or abstraction each row in the table represents or holds data for.
Once you have done this, the PK or unique constraint will stop you from inserting duplicate rows. The same PK will help you find the existing duplicate row and update it with the values of the non-duplicate-defining (non-PK) columns that differ from the existing row. Only after all this has been done can an INSERT OR REPLACE (as defined by SQLite) help. This command checks whether a duplicate row (as defined by your PK constraint) exists; if it does, instead of inserting a new row, it updates the non-PK columns of that row with the values supplied by your query.
Your desires appear mutually contradictory. While Andrey's INSERT OR REPLACE answer will get you close to what you say you want, you should probably clarify for yourself what you really want.
If you don't want to specify every column, and you want a (presumably) partial row to update rather than insert, you should look at the UNIQUE constraint, and know that the ambiguity of your requirements was also faced by the SQL-92 committee.
http://www.sqlite.org/lang_insert.html
insert or replace might interest you
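INSERT OR REPLACE is easy to try from Python's built-in sqlite3 module; a minimal sketch with illustrative table and column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
con.execute("INSERT INTO t VALUES (1, 'widget', 5)")

# Same primary key: the existing row is replaced instead of duplicated
con.execute("INSERT OR REPLACE INTO t VALUES (1, 'widget', 9)")

# Alternative: INSERT OR IGNORE leaves the existing row untouched
con.execute("INSERT OR IGNORE INTO t VALUES (1, 'widget', 7)")

print(con.execute("SELECT * FROM t").fetchall())  # one row: (1, 'widget', 9)
```

Note that both forms key off the PRIMARY KEY (or a UNIQUE constraint), so the duplicate definition still has to be declared on the table first, as the answer above explains.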

SQL Server unique index allowing duplicates

I am using SQL Server 2008 and had a table with a numeric id column as the primary key, plus a unique index on three varchar columns. I was able to add a row with the exact same values for those three columns. I verified this with a simple query on the values, and two rows were returned.
I edited the index and added the id column. When I then tried to edit it again and remove the id column, it complained that there were duplicate rows: it deleted the index but couldn't recreate it.
I then cleaned the duplicates out of the database and recreated the index on the same three varchar columns as unique and nonclustered, and now it works properly and doesn't allow duplicates.
Does anyone know why the uniqueness of this index was ignored?
The index could have been disabled (see Disabling Indexes), your 'duplicate' values may have been different (trailing spaces, for example), or your test may be incorrect.
For sure you did not insert a duplicate into an enforced unique index.
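The "values may have been different" point is easy to demonstrate: invisible differences such as trailing spaces can let two rows that look identical coexist under a unique index. SQLite, which compares strings byte by byte, shows this directly (SQL Server's string-padding and collation rules differ, so check those in your actual case); a sketch with illustrative names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT)")
con.execute("CREATE UNIQUE INDEX ux ON t(a, b, c)")

con.execute("INSERT INTO t VALUES ('x', 'y', 'z')")
con.execute("INSERT INTO t VALUES ('x', 'y', 'z ')")  # trailing space: a distinct value here

try:
    con.execute("INSERT INTO t VALUES ('x', 'y', 'z')")  # exact duplicate: rejected
except sqlite3.IntegrityError:
    print("exact duplicate rejected")
```

A SELECT that trims or pads strings would report both rows as "the same", which is exactly how an apparent duplicate can slip past a perfectly healthy unique index.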
I'm not a pro on this subject, but the 'is unique' setting of the index probably refers to the way the index is built and stored, not to it being a constraint. This probably also means the performance of the index is sub-optimal. The DBMS might check this when creating the index.
