Need to avoid repetition results in sql table - sql-server

I have written a SQL command to find the entries that exist in one table but not in another. However, table3 is not static: it is filled one row at a time by an RFID reader. So each time the command runs, the difference table (table2) receives the same entries again. Please help me avoid adding the same entries repeatedly.
INSERT INTO table2 (EPC)
SELECT PL.EPC
FROM priorityLevel PL
WHERE NOT EXISTS (
SELECT 1
FROM table3 T3
WHERE PL.EPC = T3.EPC
);
I expect each entry to appear in table2 only once.
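One possible approach (not from the original post) is to add a second NOT EXISTS that checks table2 itself, so an EPC that was already recorded is skipped on later runs. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample EPC values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE priorityLevel (EPC TEXT);
CREATE TABLE table3 (EPC TEXT);
CREATE TABLE table2 (EPC TEXT);
INSERT INTO priorityLevel VALUES ('A'), ('B'), ('C');
""")

# Same statement as in the question, plus a guard against table2 itself.
INSERT_MISSING = """
INSERT INTO table2 (EPC)
SELECT PL.EPC
FROM priorityLevel PL
WHERE NOT EXISTS (SELECT 1 FROM table3 T3 WHERE PL.EPC = T3.EPC)
  AND NOT EXISTS (SELECT 1 FROM table2 T2 WHERE PL.EPC = T2.EPC)
"""

# Simulate the reader filling table3 one tag at a time,
# with the insert re-run after each read.
conn.execute("INSERT INTO table3 VALUES ('A')")
conn.execute(INSERT_MISSING)
conn.execute("INSERT INTO table3 VALUES ('B')")
conn.execute(INSERT_MISSING)

rows = [r[0] for r in conn.execute("SELECT EPC FROM table2 ORDER BY EPC")]
print(rows)
```

Note that this only prevents duplicates. An EPC that shows up in table3 after it was written to table2 stays in table2, so a separate cleanup step would be needed if table2 must always reflect the current difference.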

Related

snowflake merge statement using golden gate json as source table

I am running a MERGE into a target table in Snowflake, using JSON data as the source table:
merge into cust tgt using (
select parse_json(s.$1):application_num as application num
from prd_json s qualify
row_number() over(partition application
order_by application desc)=1) src
on tgt.application =src.application
when not matched and op_type='I' then
insert(application) values (src.application );
The QUALIFY clause drops all the duplicate data and returns only unique records, but once it is joined in the MERGE it yields fewer records than a normal SELECT statement.
For example:
select distinct application
from prd_json where op_type='I';
--15000 rows are there
When joined, it shows there are no matching records in the target. Since nothing matches, it should insert all 15000 rows, but only 8500 rows are inserted, even though none of them are duplicates. Is there a way to insert the records without using QUALIFY? If I remove QUALIFY, I get a DML error about duplicate rows. Please guide me if anyone knows.
How about using SELECT DISTINCT?
Your demo SQL does not compile, and because you are using $1 it is also hard to guess the names of your columns and to know how the ROW_NUMBER is working.
So it's hard to nail down the problem.
But with the following SQL you can replace ROW_NUMBER with DISTINCT:
CREATE TABLE cust(application INT);
CREATE OR REPLACE table prd_json as
SELECT parse_json(column1) as application, column2 as op_type
FROM VALUES
('{"application_num":1,"other":1}', 'I'),
('{"application_num":1,"other":2}', 'I'),
('{"application_num":2,"other":3}', 'I'),
('{"application_num":1,"other":1}', 'U')
;
MERGE INTO cust AS tgt
USING (
SELECT DISTINCT
parse_json(s.$1):application_num::int as application,
s.op_type
FROM prd_json AS s
) AS src
ON tgt.application = src.application
WHEN NOT MATCHED AND src.op_type = 'I' THEN
INSERT(application) VALUES (src.application );
number of rows inserted
2
SELECT * FROM cust;
APPLICATION
1
2
running the MERGE code a second time gives:
number of rows inserted
0
Now if I truncate CUST and swap to using this SQL for the inner part:
SELECT --DISTINCT
parse_json(s.$1):application_num::int as application,
s.op_type
FROM prd_json AS s
qualify row_number() over (partition by application order by application desc)=1
I get three rows inserted, because the partition by application is effectively binding to s.application, not the output application, and there are 3 different "applications" because of the other values.
The reason I wrote my code this way is that your
select distinct application
from prd_json where op_type='I';
implies there is already something called application in the table, and thus it runs the chance of being used in the ROW_NUMBER statement.
Anyway, there is a potentially large problem if you also have "update data" (I guess U) in your transaction block: you would want to ORDER BY in the sub-select so that an insert and an update are never applied in update, insert order. And that assumes you want all update operations if there are many of them... I will stop there. But if you do not have updates, the sub-select should filter on op_type='I' to stop the non-insert ops making it out or, possibly worse again, replacing the inserts in your ROW_NUMBER pattern. I suspect that is the underlying cause of your problem.
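To make that last point concrete, here is a small sketch of how deduplicating before filtering on op_type can drop inserts. It uses Python's built-in sqlite3 module as a stand-in for Snowflake (SQLite has no QUALIFY, so a sub-select on the row number plays that role, and window functions need SQLite 3.25 or later). The seq column and the sample rows are hypothetical, added only to make the ordering deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prd_json (application INTEGER, op_type TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO prd_json VALUES (?, ?, ?)",
    [(1, "I", 1), (1, "U", 2), (2, "I", 3), (3, "I", 4), (3, "I", 5)],
)

# Pattern from the question: keep one row per application first,
# then let the MERGE condition (op_type = 'I') filter afterwards.
dedupe_then_filter = [r[0] for r in conn.execute("""
    SELECT application FROM (
        SELECT application, op_type,
               ROW_NUMBER() OVER (PARTITION BY application ORDER BY seq DESC) AS rn
        FROM prd_json
    )
    WHERE rn = 1 AND op_type = 'I'
    ORDER BY application
""")]

# Suggested fix: filter to inserts inside the sub-select, then deduplicate.
filter_then_dedupe = [r[0] for r in conn.execute("""
    SELECT application FROM (
        SELECT application,
               ROW_NUMBER() OVER (PARTITION BY application ORDER BY seq DESC) AS rn
        FROM prd_json
        WHERE op_type = 'I'
    )
    WHERE rn = 1
    ORDER BY application
""")]

print(dedupe_then_filter, filter_then_dedupe)
```

Application 1 disappears from the first result because its surviving row is the 'U' row, which the insert condition then rejects. Filtering first keeps all three applications.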

Can I inner join a previous SQL query then perform counts

I've created a query from a fairly large database.
At the moment each single procedure undertaken by an employee appears as 3 rows with identical times. Each row records one thing: the site where the procedure occurred, the equipment part used, or whether it was day or night.
I want to combine the rows with matching name and time together to create a single row containing all the other fields.
I then want to be able to create a log for the employee to show how many of each procedure has been undertaken and the site and technique used.
I figure an inner join may be the best way to do this but would be grateful for further help as to how to set this up on a sub-query.
Current query:
SELECT procedure, employee, chart_time, form,
FROM cust.records
WHERE employeeID IN () AND procedurelabel LIKE 'rad1'
Really appreciate the help
Your question could use a little clarification... not least being the DDL for your table and a few notes on what each column is. Nonetheless, assuming these rows are all in the cust.records table and that each set of 3 rows have a unique combination of name and time, you could do something like this...
SELECT -- first select fields common to all rows... may as well take these from the first table
records1.procedure, records1.employee, records1.chart_time,
-- ... then select records from your joins
records2.some_column,
records3.some_column
FROM cust.records records1
INNER JOIN cust.records records2 on records1.chart_time = records2.chart_time
and records1.procedure = records2.procedure
and records1.employee = records2.employee
-- Possible condition required here to not join this row to itself
-- or to explicitly join to a specific type of row
INNER JOIN cust.records records3 on records1.chart_time = records3.chart_time
and records1.procedure = records3.procedure
and records1.employee = records3.employee
-- Possible condition required here to not join this row to itself
-- or to explicitly join to a specific type of row
WHERE records1.employeeID IN ()
AND records1.procedurelabel LIKE 'rad1'
-- Possible condition required here to specify the row to select for records1.
Might also be worth considering a table re-design since what you've described doesn't sound normalised.
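As a rough illustration of the self-join with the "specific type of row" conditions filled in, the sketch below uses Python's built-in sqlite3 module. The table layout is an assumption, since the question gives no DDL: the form column is taken to say what kind of detail a row holds ('site', 'equipment' or 'shift'), and the value column and all sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per detail, three rows per procedure event.
conn.execute(
    "CREATE TABLE records (procedure TEXT, employee TEXT, chart_time TEXT, form TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        ("rad1", "ann", "09:00", "site", "left arm"),
        ("rad1", "ann", "09:00", "equipment", "probe A"),
        ("rad1", "ann", "09:00", "shift", "day"),
        ("rad1", "ann", "21:30", "site", "left arm"),
        ("rad1", "ann", "21:30", "equipment", "probe B"),
        ("rad1", "ann", "21:30", "shift", "night"),
    ],
)

# Join the table to itself twice; each alias is pinned to one kind of row,
# which also stops a row from joining to itself.
combined = conn.execute("""
    SELECT s.procedure, s.employee, s.chart_time,
           s.value AS site, e.value AS equipment, d.value AS shift
    FROM records s
    JOIN records e ON e.chart_time = s.chart_time
                  AND e.procedure = s.procedure
                  AND e.employee = s.employee
                  AND e.form = 'equipment'
    JOIN records d ON d.chart_time = s.chart_time
                  AND d.procedure = s.procedure
                  AND d.employee = s.employee
                  AND d.form = 'shift'
    WHERE s.form = 'site'
    ORDER BY s.chart_time
""").fetchall()

for row in combined:
    print(row)
```

With one row per procedure event, the per-employee log becomes a plain GROUP BY with COUNT(*) over this result.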

Number of rows updated in a oracle table

I have a table called t1 which has already been updated from a file. I have a table t2 which was created as a backup of t1 before the modifications. Now I want to know how many records got updated in table t1. Is there any way I can join with the backup table and find out how many records were altered? Or how can I use sql%rowcount on an already updated table? Or how should I proceed with ALL_TAB_MODIFICATIONS?
You can join the tables on their primary key (because you didn't update that, hopefully!) and then compare every column. You'll have to check for nulls too, and it'll make for quite a lot of typing. You could use all_tab_cols and a bit of SQL to create your query, though (write SQL that produces SQL as its output).
Actually, thinking about it, you might be able to get away with less typing by natural joining the tables together to get the set of rows that didn't change, and then removing that set from the original full set:
select * from original
minus
-- unqualified *: Oracle rejects a table qualifier on columns used in a NATURAL join
select * from original natural inner join backup
I've never done it, but the theory is that a natural join joins on all equal column names, so every column of each table will feature in the join condition. It's an inner join, so only rows that have not changed will be represented. Any rows where a column has become null, or gone from null to a value, will also disappear. This is hence the set of rows that have not changed. One caveat: an unchanged row that contains a null in any column also drops out of the join, because null never equals null, so it would be reported as changed. If all you're after is a count, take the count of the original table less the count of this join result. If you want to know which rows changed, do the result set minus.
Ideally you shouldn't do this; instead, at the point the update is run, capture the number of rows it affected. However, this technique could be used long after the update was performed (but before some other update was run).
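A small runnable sketch of the natural join idea, using Python's built-in sqlite3 module in place of Oracle (so EXCEPT stands in for MINUS). The table names follow the answer, and the columns and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE original (id INTEGER, name TEXT);
CREATE TABLE backup   (id INTEGER, name TEXT);
INSERT INTO backup   VALUES (1, 'a'), (2, 'b'),  (3, 'c');
INSERT INTO original VALUES (1, 'a'), (2, 'b2'), (3, 'c');  -- row 2 was updated
""")

# Rows that are identical in both tables: the natural join matches on every
# shared column name, here id and name.
unchanged = conn.execute(
    "SELECT * FROM original NATURAL JOIN backup ORDER BY id"
).fetchall()

# Full set minus the unchanged set leaves the rows that were altered.
changed = conn.execute("""
    SELECT * FROM original
    EXCEPT
    SELECT * FROM original NATURAL JOIN backup
""").fetchall()

total = conn.execute("SELECT COUNT(*) FROM original").fetchone()[0]
changed_count = total - len(unchanged)
print(changed, changed_count)
```

The sample data contains no nulls on purpose, since rows with nulls never match in the join.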

String or binary data would be truncated error in SQL server. How to know the column name throwing this error

I have an INSERT query that inserts data using a SELECT query with certain joins between tables.
When I run that query, it gives the error "String or binary data would be truncated".
There are thousands of rows and multiple columns that I am trying to insert into that table.
So it is not possible to inspect all the data by eye and see which data is throwing this error.
Is there any specific way to identify which column is throwing this error, or which specific record is failing to insert and causing it?
I found one article on this:
RareSQL
But this is when we insert data using some values and that insert is one by one.
I am inserting multiple rows at the same time using SELECT statements.
E.g.,
INSERT INTO TABLE1 VALUES (COLUMN1, COLUMN2,..) SELECT COLUMN1, COLUMN2,.., FROM TABLE2 JOIN TABLE3
Also, in my case, I have multiple insert and update statements and am not even sure which statement is throwing this error.
You can do a selection like this:
select TABLE2.ID, TABLE3.ID, TABLE1.COLUMN1, TABLE1.COLUMN2, ...
FROM TABLE2
JOIN TABLE3
ON TABLE2.JOINCOLUMN1 = TABLE3.JOINCOLUMN2
LEFT JOIN TABLE1
ON TABLE1.COLUMN1 = TABLE2.COLUMN1 and TABLE1.COLUMN2 = TABLE2.COLUMN2, ...
WHERE TABLE1.ID IS NULL
The first join reproduces the selection you have been using for the insert and the second join is a left join, which will yield null values for TABLE1 if a row having the exact column values you wanted to insert does not exist. You can apply this logic to your other queries, which were not given in the question.
You might just have to do it the hard way. To make it a little simpler, you can do this
Temporarily remove the insert command from the query, so you are getting a result set out of it. You might need to give some of the columns aliases if they don't come with one. Then wrap that select query as a subquery, and test likely columns (nvarchars, etc) like this
Select top 5 len(Col1), *
from (Select col1, col2, ... your query (without insert) here) A
Order by 1 desc
This will sort the rows with the largest values in the specified column first and just return the rows with the top 5 values - enough to see if you've got a big problem or just one or two rows with an issue. You can quickly change which column you're checking simply by changing the column name in the len(Col1) part of the first line.
If the subquery takes a long time to run, create a temp table with the same columns but with the string sizes large (like varchar(max) or something) so there are no errors, and then you can do the insert just once to that table, and run your tests on that table instead of running the subquery a lot
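The same length check can be looped over every suspect column. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (LENGTH instead of LEN). The staging table, its data and the target_widths mapping are all hypothetical: in practice the widths would come from the definition of the real target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Wide staging table holding the output of the SELECT (no size limits here).
conn.execute("CREATE TABLE staging (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("Al", "Oslo"), ("Bartholomew", "Rome"), ("Cy", "Llanfairpwllgwyngyll")],
)

# Hypothetical declared widths of the target table's columns.
target_widths = {"name": 10, "city": 30}

# For each column, compare the longest staged value with the target width.
too_long = {}
for col, width in target_widths.items():
    longest = conn.execute(f"SELECT MAX(LENGTH({col})) FROM staging").fetchone()[0]
    if longest is not None and longest > width:
        too_long[col] = longest

print(too_long)
```

Any column that shows up in too_long is a candidate for the truncation error, and the TOP 5 query above can then be run on just that column to see the offending rows.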
From this answer,
you can use a temp table and compare it with the target table.
For example, this:
Insert into dbo.MyTable (columns)
Select columns
from MyDataSource ;
becomes this:
Select columns
into #T
from MyDataSource;
select *
from tempdb.sys.columns as TempCols
full outer join MyDb.sys.columns as RealCols
on TempCols.name = RealCols.name
and TempCols.object_id = Object_ID(N'tempdb..#T')
and RealCols.object_id = Object_ID(N'MyDb.dbo.MyTable')
where TempCols.name is null -- no match for real target name
or RealCols.name is null -- no match for temp target name
or RealCols.system_type_id != TempCols.system_type_id
or RealCols.max_length < TempCols.max_length ;

Searching for the unique key (not in meta)

Which standard SQL Server tool can search for a unique key in a table's data (rather than in its metadata declaration)?
P.S. I am thinking of writing such a script myself. Maybe you could point me to a snippet for
combinatorics in T-SQL, e.g. for generating all combinations of n items taken 1..n at a time?
P.P.S. About the complexity of the problem, for those who do not see it: what matters is that we do not need to analyze the whole data to dismiss the hypothesis that a given pair of columns is the "unique key". With real-world, "report-like", sorted data, I think many column combinations can be ruled out after analysing just the first two rows. So I feel such an algorithm should have a "before full table compare" phase. The question is then which portion of the data to choose for this phase. The best candidate I can think of is the "page": if the data is unique within the page, we can test uniqueness on the whole table; if it is not unique within the page, we go to the next column set.
select t1.col, count(*)
from table t1
join table t2
on t1.col = t2.col
group by t1.col
having count(*) > 1
If zero rows are returned, then it is unique.
For more than one column:
select t1.cola, t1.colb, count(*)
from table t1
join table t2
on t1.cola = t2.cola
and t1.colb = t2.colb
group by t1.cola, t1.colb
having count(*) > 1
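For the combinatorics part of the question, a driver script can generate the column sets and run one duplicate check per set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module instead of SQL Server, a plain GROUP BY ... HAVING in place of the self-join above, and a hypothetical report table. It tests smaller sets first and skips any superset of a key already found.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (region TEXT, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?)",
    [("N", 2020, 10), ("N", 2021, 10), ("S", 2020, 10), ("S", 2021, 20)],
)

def is_unique(conn, table, cols):
    """Return True if no combination of values in cols occurs more than once."""
    col_list = ", ".join(cols)
    sql = f"SELECT 1 FROM {table} GROUP BY {col_list} HAVING COUNT(*) > 1 LIMIT 1"
    return conn.execute(sql).fetchone() is None

columns = ["region", "year", "amount"]
candidate_keys = []
for size in range(1, len(columns) + 1):
    for combo in combinations(columns, size):
        # A superset of a known key is unique by definition, so skip it.
        if any(set(key) <= set(combo) for key in candidate_keys):
            continue
        if is_unique(conn, "report", combo):
            candidate_keys.append(combo)

print(candidate_keys)
```

The LIMIT 1 lets the check stop at the first duplicate group it finds, which is a modest version of the early dismissal described in the question. Sampling a single page first would be a further optimisation that this sketch does not attempt.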
