I am working with SQL Server 2008 and have a SELECT query in which I am trying to create a NULL column.
The query is used to create a tab-delimited text file that is imported by a different system. I am using:
Select data1 as col1, data2 as col2, '' as col3, data4 as col4
The problem apparently is that the other system does not see Col3 as NULL even though when I open it in Notepad++ it shows a NULL in that column. The vendor says there is something in that column. I assume they are seeing it as an empty string and not null.
What is a different/better way to put that NULL column in?
Thanks,
Select data1 as col1, data2 as col2, NULL as col3, data4 as col4 FROM table_name
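To check the difference between the two literals before exporting, here is a minimal sketch. The name table_name is a placeholder and VARCHAR(10) is only an assumed type for illustration. How a NULL ends up in the text file still depends on the export tool, which the question does not name.

```sql
-- '' is a zero-length string, not NULL; only the second expression is NULL.
SELECT
    CASE WHEN '' IS NULL THEN 'NULL' ELSE 'not NULL' END AS empty_string_check,
    CASE WHEN CAST(NULL AS VARCHAR(10)) IS NULL THEN 'NULL' ELSE 'not NULL' END AS null_check;

-- A typed NULL column (VARCHAR(10) is an assumed type, table_name is a placeholder).
SELECT data1 AS col1,
       data2 AS col2,
       CAST(NULL AS VARCHAR(10)) AS col3,
       data4 AS col4
FROM table_name;
```

The CAST is optional; it only helps when the consumer of the result set needs col3 to have a definite type.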
I have a query as:
SELECT T1.col1, T1.col2, T2.col3, T2.col4 from Table1 T1 join Table2 T2
This returns the columns (col1, col2, col3, col4) as expected.
col1 col2 col3 col4
A 11 0.5 B
Now I want to add a column to the result set without modifying the above SELECT statement.
The output result set should have another column (col8) from Table1, but I cannot add this column to the existing SELECT statement.
col1 col2 col3 col4 col8
A 11 0.5 B 9
I know SQL Server has UNION, but UNION requires:
Each SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in each SELECT statement must also be in the same order.
I want to achieve something similar to UNION but without the above conditions. I only want a column to be added to the result set.
Is this achievable in SQL Server?
Any help?
Thanks in advance.
I am looking for an efficient way to delete duplicate records from my database. First, I used a stored procedure that uses joins and such, which caused the query to execute very slowly. Now I am trying a different approach. Please consider the following queries:
/* QUERY A */
SELECT *
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
This query just executed in 12 seconds, with a result of 182.400 records. The row count in the table is currently 420.930.407, and col1 and col3 are indexed.
The next query:
/* QUERY B */
WITH ALL_RECORDS AS
(SELECT id
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value)
SELECT *
FROM ALL_RECORDS
This query took less than 2 seconds and gives me the ids of all 182.400 records that match the WHERE clause.
Then, my last query, is a query that selects the lowest (first) id of all records grouped on the columns I want to group on to check for duplicates:
/* QUERY C */
SELECT MIN(id)
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
GROUP BY col1,
col2,
col3,
col4,
col5,
col6
Again, this query executes in less than 2 seconds. The result is 30.400 rows, which means there are 30.400 unique records among the 182.400 records.
Now, I'd like to delete (or, first select to make sure I have my query right) all records that are not unique. So, I'd like to remove 182.400 - 30.400 = 152.000 records from my_table.
I thought I'd combine the two last queries: get all id's that belong to my dataset according to the where clause on col1, col2 and col3 (query B), and then delete/select all records from that dataset of which the id is not in the id list of the unique record id's (query C).
However, when I select all from query B where query B.id NOT IN query C, the query does not take 2, 4 or 12 (14 or 16) seconds, but seems to take forever (20.000 records shown after 1 minute, around 40.000 after 2 minutes, so I canceled the query since it'll find 152.000 records, which will take 8 minutes this way).
WITH ALL_RECORDS AS
(SELECT id
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value)
SELECT id
FROM ALL_RECORDS
WHERE id NOT IN
(SELECT MIN(id)
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
GROUP BY col1,
col2,
col3,
col4,
col5,
col6)
I know NOT IN is slow, but I can't grasp how it's THIS slow (since both queries without the not in part execute in less than 2 seconds each).
Does anyone have some good advice for me on how to solve this puzzle?
------------------ Additional information ------------------
Previous solution was the following stored procedure. For some reason it executes perfectly on my acceptance environment, but not on my production environment. Currently, we have over 400 million records on production and a little over 2 million records on acceptance, so this might be a reason.
DELETE my_table
FROM my_table
LEFT OUTER JOIN
(SELECT MIN(id) AS RowId,
col1,
col2,
col3,
col4,
col5,
col6
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
GROUP BY col1,
col2,
col3,
col4,
col5,
col6) AS KeepRows ON my_table.id = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
AND my_table.col1 = value
AND my_table.col2 = value
AND my_table.col3 = value
I have based this solution on another answer on stackoverflow (can't find it at the moment), but I feel I should be able to create a query based on Query B and C that executes within a few seconds...
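-- Number the rows within each group of duplicates; rn = 1 is the row with the lowest id.
with dupl as (
select row_number() over(partition by col1,col2,col3,col4,col5,col6 order by id) rn,
id,col1,col2,col3,col4,col5,col6
from my_table
)
-- Deleting through the CTE removes the underlying rows, keeping one row per group.
delete dupl where rn>1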
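Since the question asks to select first to make sure the query is right, the same idea can be previewed before deleting. This is a minimal sketch, assuming integer filter columns; the variables @v1, @v2 and @v3 are hypothetical stand-ins for the unspecified "value" placeholders in the question.

```sql
-- Hypothetical filter values; the real ones are not given in the question.
DECLARE @v1 INT = 1, @v2 INT = 2, @v3 INT = 3;

WITH dupl AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4, col5, col6
                              ORDER BY id) AS rn
    FROM my_table
    WHERE col1 = @v1
      AND col2 = @v2
      AND col3 = @v3
)
-- Rows with rn > 1 are the duplicates that a DELETE would remove.
SELECT id
FROM dupl
WHERE rn > 1;
```

Once the returned ids look right, the final SELECT can be swapped for a DELETE against the same CTE.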
Combining two 2-second queries will not generally produce a single 4-second query, because query results, unlike their underlying tables, are rarely indexed.
The usual approach for this kind of task is to cache the ids you want to keep in a temporary table, index it accordingly, and then use it in the left join (or NOT IN; I bet the resulting execution plans are practically the same).
You can probably get some more performance by experimenting with indexes on the main table. For example, I think an index on (col1, col2, col3) should give your code some boost (the columns need not appear in this order; the best order usually depends on their cardinalities).
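The temp-table approach might look like the following sketch. The temp table #keep_ids, the index name ix_keep_ids and the variables @v1, @v2, @v3 are hypothetical, and integer filter columns are assumed; this is an illustration of the idea rather than a tuned solution.

```sql
-- Hypothetical filter values; the real ones are not given in the question.
DECLARE @v1 INT = 1, @v2 INT = 2, @v3 INT = 3;

-- 1. Cache the ids to keep (the lowest id of each duplicate group).
SELECT MIN(id) AS id
INTO #keep_ids
FROM my_table
WHERE col1 = @v1 AND col2 = @v2 AND col3 = @v3
GROUP BY col1, col2, col3, col4, col5, col6;

-- 2. Index the cached ids so the anti-join below can seek instead of scan.
CREATE UNIQUE CLUSTERED INDEX ix_keep_ids ON #keep_ids (id);

-- 3. Delete every matching row whose id is not in the keep list.
DELETE t
FROM my_table AS t
LEFT JOIN #keep_ids AS k ON k.id = t.id
WHERE t.col1 = @v1 AND t.col2 = @v2 AND t.col3 = @v3
  AND k.id IS NULL;

DROP TABLE #keep_ids;
```

Replacing DELETE t with SELECT t.id in step 3 gives a way to check the affected rows first.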
I have a new requirement to design a report that uses date range parameters for the start and end dates.
I am using these queries in my report:
Main Dataset:
SELECT Col1, Col2, StartDate, TargetDate, Col3
FROM Table
WHERE (StartDate BETWEEN @StartDateFrom AND @StartDateTo)
AND (TargetDate BETWEEN @TargetDateFrom AND @TargetDateTo)
Dataset 1:
SELECT DISTINCT Col1
FROM Table
Dataset 2:
SELECT DISTINCT Col2
FROM Table
WHERE (Col1 IN (@Param1))
ORDER BY Col2
Dataset 3:
SELECT DISTINCT Col1, Col2, Col3
FROM Table
WHERE (Col1 IN (@Param1))
AND (Col2 IN (@Param2))
GROUP BY Col1, Col2, Col3
While running the report I get an error: TargetDate parameter is missing value.
Can someone help?
I don't see where you are "DECLARE"ing your variables @TargetDateTo and @TargetDateFrom. You may want to check your query again, as SSRS may not be picking up all your variables.
There are two ways I've found of upserting many rows into a table with SQL Server 2008.
One, described at http://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx, is to create a temp table, insert the values into the temp table, and finally merge that table with the target table.
This doesn't seem very efficient to me, because you have to create a table, fill it, merge it into the target table, and then delete the temp table.
The only other thing I can think of is as follows...
MERGE dbo.targettable as tgt
USING (
SELECT 12 as col1, 13 as col2, 'abc' as col3, 'zyx' as col4
UNION ALL
SELECT 11 as col1, 11 as col2, 'def' as col3, 'def' as col4
(etc etc)
UNION ALL
SELECT 7 as col1, 10 as col2, 'jfj' as col3, 'tub' as col4)
as new
ON tgt.col1=new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2=new.col2, tgt.col3=new.col3, tgt.col4=new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES(new.col1, new.col2, new.col3, new.col4);
Based on usr's answer I was able to find http://msdn.microsoft.com/en-us/library/bb510625.aspx
I think this is the way to do it. Could someone verify that this syntax appears correct?
MERGE dbo.targettable as tgt
USING (VALUES(12, 13, 'abc', 'zyx'), (11, 11, 'def', 'def'),(7, 10, 'jfj', 'tub'))
AS new (col1, col2, col3, col4)
ON tgt.col1=new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2=new.col2, tgt.col3=new.col3, tgt.col4=new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES(new.col1, new.col2, new.col3, new.col4);
Where does the data to be merged come from?
If it comes from a query, inline the query into the merge.
If it comes from the app, use table-valued parameters.
If it is generated iteratively, use a temp table or table variable.
If it is a constant like in your example use the VALUES clause. Don't use UNION ALL because it is more verbose, does not document semantics nicely and increases query compile time because the optimizer has to convert it to VALUES form.
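For the case where the rows come from the app, a table-valued parameter could be sketched as follows. The type name dbo.TargetRow, the procedure name dbo.UpsertTarget and the column types are assumptions inferred from the sample values in the question, not something the thread specifies.

```sql
-- Hypothetical table type; column types are guessed from the sample values.
CREATE TYPE dbo.TargetRow AS TABLE (
    col1 INT PRIMARY KEY,
    col2 INT,
    col3 VARCHAR(10),
    col4 VARCHAR(10)
);
GO

-- Hypothetical procedure: the app passes all rows in one call.
CREATE PROCEDURE dbo.UpsertTarget
    @rows dbo.TargetRow READONLY  -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.targettable AS tgt
    USING @rows AS src
    ON tgt.col1 = src.col1
    WHEN MATCHED THEN
        UPDATE SET tgt.col2 = src.col2, tgt.col3 = src.col3, tgt.col4 = src.col4
    WHEN NOT MATCHED THEN
        INSERT (col1, col2, col3, col4)
        VALUES (src.col1, src.col2, src.col3, src.col4);
END
GO
```

The GO lines are batch separators for client tools such as SSMS or sqlcmd, not T-SQL statements.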
I have a table in SQL Server, and I want to add the amount of a specific column and have the result in the next row.
How can I do that?
I want to add the amount of a specific column and have the result in the next row.
In case you are looking to insert another row based on the previous row after adding a certain amount, you could use the following:
INSERT INTO MyTable (Col1, Col2, Col3)
SELECT Col1, Col2 + <additional amount>, Col3
FROM MyTable
WHERE
<Criteria to select that row of interest>
In case you are looking to select all the rows in a table and aggregate the amount column and show the result in a separate row, then you could use the following:
SELECT Col1, Col2, Col3 FROM MyTable
UNION ALL -- plain UNION would also remove duplicate detail rows
SELECT '', SUM(Col2), '' FROM MyTable