I need to clean up some inaccurate observations in a table before joining it to the table mentioned below; this will avoid duplicate observations in the output.
I validated that filtering on max(date_value) removes the 9K inaccurate transactions: newer transactions were completed later, which fixed the problem.
The code below fixes the issue without INTO #temp, but as soon as I add the temp table I get a syntax error and it will not execute. I need about 20 variables from the table and would rather not list them all, so there must be a simple syntax or an alternative method.
SELECT * INTO #temp FROM db.dbo.table WHERE MAX(date_value);
SELECT a.* INTO #temp
FROM table a
INNER JOIN (SELECT id, MAX(created_at) AS max_created
FROM db.table
GROUP BY id) b
ON a.id = b.id
AND a.created_at = b.max_created;
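As a rough sketch of joining each row to its group's maximum date, the script below uses Python's standard-library sqlite3 module as a stand-in for SQL Server. The table name txn and its sample rows are invented for illustration, and SQLite's CREATE TEMP TABLE ... AS SELECT plays the role of SELECT ... INTO #temp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: id 1 has an older and a newer transaction.
    CREATE TABLE txn (id INTEGER, created_at TEXT, amount INTEGER);
    INSERT INTO txn VALUES
        (1, '2020-01-01', 10),
        (1, '2020-02-01', 15),
        (2, '2020-01-05', 20);

    -- Keep every column (a.*) but only the newest row per id.
    CREATE TEMP TABLE latest AS
    SELECT a.*
    FROM txn a
    INNER JOIN (SELECT id, MAX(created_at) AS max_created
                FROM txn
                GROUP BY id) b
        ON a.id = b.id
       AND a.created_at = b.max_created;
""")

latest_rows = conn.execute(
    "SELECT id, created_at, amount FROM latest ORDER BY id"
).fetchall()
print(latest_rows)
```

Matching on both the id and the maximum date is what drops the older rows; joining on id alone would return every row again.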
I have a CTE that pulls some data from two tables into an intermediate result called cte_list, something like:
with cte_list as (
select pl.col_val from prune_list pl join employees.employee emp on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id' limit 100
)
Then, I am doing an insert to move records from the cte_list to another archive table (if they don't exist) called employee_arch_test
insert into employees.employee_arch_test (
select * from employees.employee where id in (select col_val::uuid from cte_list)
and not exists (select 1 from employees.employee_arch_test where employees.employee_arch_test.id=employees.employee.id)
);
This seems to work fine. The problem is when I add another statement after, to do some deletions from the main employee table using this aforementioned cte_list - the cte_list apparently no longer exists?
SQL Error [42P01]: ERROR: relation "cte_list" does not exist
the actual delete query:
delete from employees.employee where id in (select col_val::uuid from cte_list);
Can the cte_list CTE table only be used once or something? I'm running these statements in a LOOP and I need to run the exact same calls for about 2 or 3 other tables but hit a sticking point here.
A CTE only exists for the duration of the statement of which it's a part. I gather you have an INSERT statement with the CTE preceding it:
with cte_list
as (select pl.col_val
from prune_list pl
join employees.employee emp
on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id'
limit 100
)
insert into employees.employee_arch_test
(select *
from employees.employee
where id in (select col_val::uuid from cte_list)
and not exists (select 1
from employees.employee_arch_test
where employees.employee_arch_test.id = employees.employee.id)
);
The CTE is part of the INSERT statement - it is not a separate statement by itself. It only exists for the duration of the INSERT statement.
If you need something which lasts longer your options are:
Add the same CTE to each of your following statements. Note that because data may be changing in your database each invocation of the CTE may return different data.
Create a view which performs the same operations as the CTE, then use the view in place of the CTE. Note that because data may be changing in your database each invocation of the view may return different data.
Create a temporary table to hold the data from your CTE query, then use the temporary table in place of the CTE. This has the advantage of providing a consistent set of data to all operations.
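The temporary-table option can be sketched as follows. This is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than PostgreSQL, the tables are simplified (integer ids instead of uuid casts, no schemas), and tmp_list is a made-up name for the table that replaces cte_list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_arch (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE prune_list (col_nm TEXT, col_val INTEGER);
    INSERT INTO employee VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO prune_list VALUES ('employee_ref_id', 1), ('employee_ref_id', 3);

    -- Materialise the former CTE once so later statements see the same rows.
    CREATE TEMP TABLE tmp_list AS
    SELECT pl.col_val
    FROM prune_list pl
    JOIN employee emp ON pl.col_val = emp.id
    WHERE pl.col_nm = 'employee_ref_id';

    -- Statement 1: archive the listed rows that are not archived yet.
    INSERT INTO employee_arch
    SELECT * FROM employee e
    WHERE e.id IN (SELECT col_val FROM tmp_list)
      AND NOT EXISTS (SELECT 1 FROM employee_arch a WHERE a.id = e.id);

    -- Statement 2: the same list is still available for the delete.
    DELETE FROM employee WHERE id IN (SELECT col_val FROM tmp_list);

    DROP TABLE tmp_list;
""")

archived = [r[0] for r in conn.execute("SELECT id FROM employee_arch ORDER BY id")]
remaining = [r[0] for r in conn.execute("SELECT id FROM employee ORDER BY id")]
print(archived, remaining)
```

Inside a loop, dropping (or truncating) the temporary table at the end of each iteration keeps the next pass from seeing stale rows.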
I have a number of queries that are run at the same time but now I want the result to populate a permanent table that I've created.
Each of the queries will have a column called 'Descript', which is the column I want all the results to join on. I want to make sure that if the Descript column is out of order (or null) in one of the queries, the figures are still linked to the correct Descript.
I performed an INTO after the end of each query being run but this didn't work.
The first level of data went in but the second level just went underneath the first (if that makes sense) creating more rows.
INSERT INTO dbo.RESULTTABLE (Descript, Category, DescriptCount)
SELECT Descript, Category, DescriptCount
FROM #Query1
I have around 15 queries to join into 1 table so any help to understand the logic is appreciated.
Thanks
If I understood your question correctly, you want to insert the query results that are not yet stored in the result table and update the records that already exist there.
UPDATE R SET Category = Q.Category, DescriptCount = Q.DescriptCount
FROM dbo.RESULTTABLE R INNER JOIN #Query1 Q ON R.Descript = Q.Descript;
INSERT INTO dbo.RESULTTABLE (Descript, Category, DescriptCount)
SELECT Descript, Category, DescriptCount FROM #Query1 WHERE Descript NOT IN (SELECT Descript FROM dbo.RESULTTABLE);
Then you can apply the same approach to the other queries.
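Here is a minimal, runnable sketch of that update-then-insert pattern, using Python's sqlite3 module in place of SQL Server. The lower-case table and column names and the sample rows are assumptions made for the example. The update is written with correlated subqueries so it does not rely on UPDATE ... FROM support, and the insert uses NOT EXISTS rather than NOT IN, so a NULL Descript in the result table cannot silently block every insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result_table (descript TEXT PRIMARY KEY, category TEXT, descript_count INTEGER);
    CREATE TABLE query1 (descript TEXT, category TEXT, descript_count INTEGER);
    INSERT INTO result_table VALUES ('apples', 'old', 1);
    INSERT INTO query1 VALUES ('apples', 'fruit', 5), ('carrots', 'veg', 2);

    -- Step 1: refresh rows whose descript is already in the result table.
    UPDATE result_table
    SET category = (SELECT q.category FROM query1 q
                    WHERE q.descript = result_table.descript),
        descript_count = (SELECT q.descript_count FROM query1 q
                          WHERE q.descript = result_table.descript)
    WHERE descript IN (SELECT descript FROM query1);

    -- Step 2: add rows whose descript is not there yet.
    INSERT INTO result_table (descript, category, descript_count)
    SELECT q.descript, q.category, q.descript_count
    FROM query1 q
    WHERE NOT EXISTS (SELECT 1 FROM result_table r WHERE r.descript = q.descript);
""")

rows = conn.execute("SELECT * FROM result_table ORDER BY descript").fetchall()
print(rows)
```

Because rows are matched on descript rather than on position, the order in which each query returns its rows does not matter.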
I have a query that runs fairly fast under normal circumstances. But it is running very slow (at least 20 minutes in SSMS) due to how many values are in the filter.
Here's the generic version of it, and you can see that one part is filtering by over 8,000 values, making it run slow.
SELECT DISTINCT
column
FROM
table_a a
JOIN
table_b b ON (a.KEY = b.KEY)
WHERE
a.date BETWEEN @Start AND @End
AND b.ID IN (... over 8,000 values)
AND b.place IN ( ... 20 values)
ORDER BY
a.column ASC
It's to the point where it's too slow to use in the production application.
Does anyone know how to fix this, or optimize the query?
To make a query fast, you need indexes.
You need a separate index for the following columns: a.KEY, b.KEY, a.date, b.ID, b.place.
As gotqn wrote before, if you put your 8,000 items into a temp table and inner join to it, the query will be even faster, but without an index on the other side of the join it will still be slow.
What you need is to put the filtering values in a temporary table. Then use that table to filter with an INNER JOIN instead of WHERE IN. For example:
IF OBJECT_ID('tempdb..#FilterDataSource') IS NOT NULL
BEGIN;
DROP TABLE #FilterDataSource;
END;
CREATE TABLE #FilterDataSource
(
[ID] INT PRIMARY KEY
);
-- INSERT INTO #FilterDataSource ([ID]) ... (you need to split the values into rows here)
SELECT DISTINCT column
FROM table_a a
INNER JOIN table_b b
ON (a.KEY = b.KEY)
INNER JOIN #FilterDataSource FS
ON b.id = FS.ID
WHERE a.date BETWEEN @Start AND @End
AND b.place IN ( ... 20 values)
ORDER BY a.column ASC;
A few important notes:
we are using a temporary table in order to allow parallel execution plans to be used
if you have a fast splitting function (for example, a CLR function), you can join to the function itself
it is not good to use IN with many values: SQL Server is not always able to build the execution plan, which may lead to timeouts or internal errors
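The filter-table join can be tried end to end with the sketch below. It uses Python's sqlite3 module rather than SQL Server, so it demonstrates the query shape, not the performance effect. The names filter_ids, join_key, event_date and col, and the two-element wanted_ids list standing in for the 8,000+ values, are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (join_key INTEGER, event_date TEXT, col TEXT);
    CREATE TABLE table_b (join_key INTEGER, id INTEGER, place TEXT);
    INSERT INTO table_a VALUES
        (1, '2021-01-10', 'x'), (2, '2021-01-11', 'y'), (3, '2021-03-01', 'z');
    INSERT INTO table_b VALUES
        (1, 100, 'north'), (2, 200, 'north'), (3, 100, 'south');
    -- The primary key gives the filter table an index to join on.
    CREATE TEMP TABLE filter_ids (id INTEGER PRIMARY KEY);
""")

wanted_ids = [100, 300]  # stands in for the long list of filter values
conn.executemany("INSERT INTO filter_ids (id) VALUES (?)", [(i,) for i in wanted_ids])

rows = conn.execute(
    """
    SELECT DISTINCT a.col
    FROM table_a a
    INNER JOIN table_b b ON a.join_key = b.join_key
    INNER JOIN filter_ids f ON b.id = f.id   -- replaces b.id IN (...)
    WHERE a.event_date BETWEEN ? AND ?
      AND b.place IN ('north', 'south')
    ORDER BY a.col
    """,
    ("2021-01-01", "2021-01-31"),
).fetchall()
print(rows)
```

Loading the values with a parameterised bulk insert also sidesteps the string splitting mentioned above when the list already exists in application code.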
I intended to run the following UPDATE statement on a SQL Server database table:
UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599
By mistake, I ran the following statement instead:
UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A_COPY A
INNER JOIN TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
AND A.COL_1=91216599
Notice that in this second statement (wrong one), the FROM clause specifies table TABLE_A_COPY instead of TABLE_A. Both tables have exactly the same schema (i.e., same columns) and the same data (before any UPDATE is executed, that is).
Both TABLE_A and TABLE_A_COPY have about 100 million records and the update affects about 500,000 records. The second statement (the wrong one) runs for several hours and fails while the 1st statement (the correct one) runs for 40 seconds and succeeds.
Clearly, both statements are syntactically correct, but I am not sure what exactly I asked SQL Server to do with the second statement.
My questions are:
1. What was SQL Server trying to do in the second statement? With my mistake I didn't specify any linkage between records in TABLE_A and TABLE_A_COPY, so was it trying to do a CROSS JOIN between the two and then update each record in TABLE_A a gazillion times?
2. If it isn't too broad a question to ask, what would be a valid scenario for such an UPDATE statement, in which the table being updated is not mentioned in the FROM/JOIN clauses? Why would anyone do that? Why would SQL Server even allow that?
I did try searching for an answer to my questions, but Google seems to think I'm asking about UPDATE FROM syntax.
1) There is no connection between TABLE_A and TABLE_A_COPY, so you get a CROSS JOIN and the same row is updated many times. The result can be non-deterministic if parallel execution is involved:
CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY,COL_1 INT);
CREATE TABLE #TABLE_A_COPY(KEY_1 INT PRIMARY KEY,COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);
INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_A_COPY VALUES (1,91216599),(2,91216599),(3,91216599),
(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20), (3,30,30);
/*
UPDATE #TABLE_A
SET COL_1=B.COL_1
--SELECT *
FROM #TABLE_A A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599;
*/
UPDATE #TABLE_A
SET COL_1=B.COL_1
FROM #TABLE_A_COPY A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
AND A.COL_1=91216599
SELECT *
FROM #TABLE_A;
In the code above, check how the TABLE_A record with KEY_1 = 6 changed.
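To make the cross join visible, the sketch below counts how many candidate values each target row is paired with. It reuses the sample data from the demo but runs on SQLite through Python's sqlite3 module, and it expresses the unlinked UPDATE as an explicit SELECT with a CROSS JOIN. That query is an illustration of the pairing, not the plan SQL Server actually builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key_1 INTEGER PRIMARY KEY, col_1 INTEGER);
    CREATE TABLE table_a_copy (key_1 INTEGER PRIMARY KEY, col_1 INTEGER);
    CREATE TABLE table_b (key_1 INTEGER PRIMARY KEY, col_1 INTEGER, col_2 INTEGER);
    INSERT INTO table_a VALUES
        (1,91216599),(2,91216599),(3,91216599),(4,91216599),(5,91216599),(6,6);
    INSERT INTO table_a_copy SELECT * FROM table_a;
    INSERT INTO table_b VALUES (1,10,10),(2,20,20),(3,30,30);
""")

# Nothing links t (the update target) to a/b, so every target row
# is paired with every row produced by the FROM clause.
pairs_per_target = conn.execute("""
    SELECT t.key_1, COUNT(*) AS candidate_values
    FROM table_a t
    CROSS JOIN table_a_copy a
    INNER JOIN table_b b ON a.key_1 = b.key_1
    WHERE b.col_2 IS NOT NULL
      AND a.col_1 = 91216599
    GROUP BY t.key_1
    ORDER BY t.key_1
""").fetchall()
print(pairs_per_target)
```

Every row of the target, including KEY_1 = 6, ends up with three candidate values. On the tables from the question that would be roughly 100 million target rows times about 500,000 source rows, which fits the multi-hour run time.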
2) SQL Server's UPDATE FROM/DELETE FROM syntax is much broader than the ANSI standard. The problem you encountered reduces to updating the same row multiple times, and with UPDATE you don't get any error or warning.
From "Let's deprecate UPDATE FROM!" and "Deprecate UPDATE FROM and DELETE FROM":
Correctness? Bah, who cares?
Well, most do. That’s why we test.
If I mess up the join criteria in a SELECT query so that too many rows
from the second table match, I’ll see it as soon as I test, because I
get more rows back then expected. If I mess up the subquery criteria
in an ANSI standard UPDATE query in a similar way, I see it even
sooner, because SQL Server will return an error if the subquery
returns more than a single value. But with the proprietary UPDATE FROM
syntax, I can mess up the join and never notice – SQL Server will
happily update the same row over and over again if it matches more
than one row in the joined table, with only the result of the last of
those updates sticking. And there is no way of knowing which row that
will be, since that depends in the query execution plan that happens
to be chosen. A worst case scenario would be one where the execution
plan just happens to result in the expected outcome during all tests
on the single-processor development server – and then, after
deployment to the four-way dual-core production server, our precious
data suddenly hits the fan…
If you use MERGE instead, for example, you will get an error stating:
The MERGE statement attempted to UPDATE or DELETE the same row more
than once. This happens when a target row matches more than one source
row. A MERGE statement cannot UPDATE/DELETE the same row of the target
table multiple times. Refine the ON clause to ensure a target row
matches at most one source row, or use the GROUP BY clause to group
the source rows.
So you need to be more careful and check your code. I also wish there were an error, but as you can see in the Connect link, this won't happen.
One way to avoid this is to UPDATE the alias, so you are sure that only tables taking part in the FROM/JOIN are involved:
UPDATE A
SET COL_1=B.COL_1
FROM #TABLE_A A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599;
SQL will allow a lot of stuff that probably does not make sense.
Notice that tableB is on both sides of the ON:
select *
from tableA
join tableB
on tableB.col1 = tableB.col1
SQL just checks syntax; it is up to you to write a statement that makes sense.
There might be some cases where you really do want a cross-product type of update.
This is how I would write that statement.
I line the table names up so it is easier to see:
UPDATE TABLE_A
SET A.COL_1 = B.COL_1
FROM TABLE_A A
JOIN TABLE_B B
ON A.KEY_1 = B.KEY_1
AND B.COL_2 IS NOT NULL
AND A.COL_1 = 91216599
AND A.COL_1 <> B.COL_1
Hello, I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows, including duplicates.
This is my SQL:
DECLARE @tmp AS TABLE (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO @tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM @tmp
WHERE Name=CustomerName
AND Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)
DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1
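The difference between the two queries is easy to see on a tiny data set. The sketch below uses Python's sqlite3 module, and the snake_case column names and three sample customers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_surname TEXT
    );
    -- 'Ann Lee' appears twice, 'Bob Ray' once.
    INSERT INTO customers (customer_name, customer_surname) VALUES
        ('Ann', 'Lee'), ('Ann', 'Lee'), ('Bob', 'Ray');
""")

# DISTINCT collapses duplicates into a single row each.
one_per_pair = conn.execute("""
    SELECT DISTINCT customer_name, customer_surname
    FROM customers
    ORDER BY customer_name
""").fetchall()

# GROUP BY ... HAVING COUNT(*) = 1 keeps only pairs that never repeat.
never_duplicated = conn.execute("""
    SELECT customer_name, customer_surname
    FROM customers
    GROUP BY customer_name, customer_surname
    HAVING COUNT(*) = 1
    ORDER BY customer_name
""").fetchall()

print(one_per_pair)
print(never_duplicated)
```

'Ann Lee' survives the first query as one row but is excluded entirely by the second.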
At first, I thought that @David's answer was what you wanted. But rereading your comments, perhaps you want all combinations of names and surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;
Are you doing that while your @tmp table is still empty?
If so: your entire SELECT is fully evaluated before the INSERT runs. It doesn't "run the query and get one row, insert the row, run the query and get another row, insert the row, etc."
If you want to insert only the customers that have no duplicates, use that same Customers table in your NOT EXISTS clause:
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct"
Typically, if you're doing a WHERE NOT EXISTS, WHERE EXISTS, or WHERE NOT IN subquery, you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both the inside and outside tables (and the inside table is joined to the outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on).
Sometimes, if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in the column), you could do the same thing by joining the main query to that table on the column you want in the EXISTS. That is not always the best or safest solution: it might make the query slower if there are many rows in that table, and it could produce many duplicate rows if there are duplicate values for that column in the joined table. In that case you'd have to add DISTINCT to the main query, which causes it to SORT the data on all columns. That is not efficient at all.
Similarly, WHERE NOT IN or NOT EXISTS correlated subqueries can be replaced (often with the same execution plan) by a LEFT OUTER JOIN to the table you were going to subquery, plus a WHERE <joined column> IS NULL.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)
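The three ways of writing "rows with no match" can be compared side by side. The sketch below runs on SQLite through Python's sqlite3 module, and the customers and orders tables are hypothetical. The NULL in orders.customer_id is deliberate, since that is the case where NOT IN behaves differently from the other two forms under standard SQL three-valued logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann has two orders; one order has no customer recorded.
    INSERT INTO orders VALUES (1), (1), (NULL);
""")

# Correlated NOT EXISTS: customers without any order.
not_exists = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

# Anti-join: LEFT OUTER JOIN plus IS NULL on the joined column.
left_join = conn.execute("""
    SELECT c.name FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL
    ORDER BY c.name
""").fetchall()

# NOT IN: the NULL in the subquery makes the test unknown for every row.
not_in = conn.execute("""
    SELECT c.name FROM customers c
    WHERE c.id NOT IN (SELECT customer_id FROM orders)
    ORDER BY c.name
""").fetchall()

print(not_exists, left_join, not_in)
```

The first two queries agree and need no DISTINCT, even though Ann matches two orders. The NOT IN version returns nothing once the subquery yields a NULL, which is one reason NOT EXISTS is often the safer default.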