No results with DISTINCT - Multiple tables - sql-server

I am trying to run the following query which gets data from two tables but I get no results:
SELECT DISTINCT([dbo].[SF].Assignment), [X].Code
FROM [dbo].[SF], [dbo].[X]
WHERE (CHARINDEX([X].Code, [dbo].[SF].Assignment COLLATE SQL_Latin1_General_CP1_CI_AS) > 0)
AND [dbo].[SF].Code = 'NULL'
When I remove the DISTINCT, I get far too many results: my data set is large, and the query fails with an out-of-memory exception.

Removing the parentheses after DISTINCT and changing to an INNER JOIN fixed my issue. @Tab Alleman and @squillman, please add an answer here so I can mark it as correct.
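The exact rewritten query was not posted, so the following is only a sketch of what the fix might look like, assuming the CHARINDEX test becomes the join condition. The aliases sf and x are illustrative and do not appear in the original.

```sql
-- Sketch only: the accepted rewrite was not shown in the thread.
-- DISTINCT applies to the whole select list, so no parentheses are needed.
SELECT DISTINCT sf.Assignment, x.Code
FROM dbo.SF AS sf
INNER JOIN dbo.X AS x
    ON CHARINDEX(x.Code, sf.Assignment COLLATE SQL_Latin1_General_CP1_CI_AS) > 0
WHERE sf.Code = 'NULL';  -- compares against the literal string 'NULL', as in the question
```

If rows where Code is actually missing are wanted, the last predicate would need to be sf.Code IS NULL instead; the question does not say which was intended.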

Related

Temp Table with Wild Card

I need to clean up some inaccurate observations in a table before joining it to the aforementioned table; this avoids duplicate observations in the output.
I validated that filtering on max(date_value) removes the 9K inaccurate transactions; newer transactions were completed that fixed the problem.
The code below fixes the issue without INTO #temp, but as soon as I add the temp table I get a syntax error and the query will not execute. I need about 20 columns out of the table and would rather not list them all; there must be a simple syntax or an alternative method.
SELECT * INTO #temp FROM db.dbo.table WHERE MAX(date_value);
SELECT a.* INTO #temp
FROM db.dbo.[table] a
INNER JOIN (SELECT id, MAX(created_at) AS max_created
            FROM db.dbo.[table]
            GROUP BY id) b
    ON a.id = b.id
   AND a.created_at = b.max_created -- keep only the latest row per id

Avoid duplicate values in comma delimited sql query

Hello, I have a query that builds comma-delimited lists:
select [Product_Name]
,(select h2.Location_name + ', ' from (select distinct * from [dbo].[Product_list]) h2 where h1.Product_Name = h2.Product_Name
order by h2.Product_Name for xml path ('')) as Location_name
,(select h2.[Store name] + ', ' from [dbo].[Product_list] h2 where h1.Product_Name = h2.Product_Name
order by h2.Product_Name for xml path ('')) as store_name, sum(Quantity) as Total_Quantity from [dbo].[Product_list] h1
group by [Product_Name]
This query shows duplicated values in the comma-delimited columns. How can I show only the distinct values of each column in comma-delimited form? Can anyone please help me?
If you replace SELECT DISTINCT * FROM dbo.Product_list with a SELECT DISTINCT over only the columns the subquery needs (Location_name, plus Product_Name for the correlation), it will return only distinct values.
T-SQL supports the use of the asterisk, or “star” character (*) to
substitute for an explicit column list. This will retrieve all columns
from the source table. While the asterisk is suitable for a quick
test, avoid using it in production work, as changes made to the table
will cause the query to retrieve all current columns in the table’s
current defined order. This could cause bugs or other failures in
reports or applications expecting a known number of columns returned
in a defined order. Furthermore, returning data that is not needed can
slow down your queries and cause performance issues if the source
table contains a large number of rows. By using an explicit column
list in your SELECT clause, you will always achieve the desired
results, providing the columns exist in the table. If a column is
dropped, you will receive an error that will help identify the problem
and fix your query.
Using SELECT DISTINCT will filter out duplicates in the result set.
SELECT DISTINCT specifies that the result set must contain only unique
rows. However, it is important to understand that the DISTINCT option
operates only on the set of columns returned by the SELECT clause. It
does not take into account any other unique columns in the source
table. DISTINCT also operates on all the columns in the SELECT list,
not just the first one.
From Querying Microsoft SQL Server 2012 MCT Manual.
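As a sketch of that advice applied to the query in the question: each derived table selects DISTINCT over just the product name and the one column being concatenated, so repeated values collapse before FOR XML PATH builds the list. The alias d and the ordering by the concatenated column are choices made here, not part of the original answer.

```sql
-- Sketch: de-duplicate per column before concatenating.
SELECT h1.Product_Name,
       (SELECT d.Location_name + ', '
        FROM (SELECT DISTINCT Product_Name, Location_name
              FROM dbo.Product_list) AS d
        WHERE d.Product_Name = h1.Product_Name
        ORDER BY d.Location_name
        FOR XML PATH('')) AS Location_name,
       (SELECT d.[Store name] + ', '
        FROM (SELECT DISTINCT Product_Name, [Store name]
              FROM dbo.Product_list) AS d
        WHERE d.Product_Name = h1.Product_Name
        ORDER BY d.[Store name]
        FOR XML PATH('')) AS store_name,
       SUM(h1.Quantity) AS Total_Quantity
FROM dbo.Product_list AS h1
GROUP BY h1.Product_Name;
```

Product_Name has to stay in each derived table because the correlated WHERE clause refers to it. As in the original query, each list still ends with a trailing ', '.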

String or binary data would be truncated error in SQL server. How to know the column name throwing this error

I have an INSERT query that inserts data using a SELECT query with several joins between tables.
When I run the query, it fails with the error "String or binary data would be truncated".
I am trying to insert thousands of rows with multiple columns into that table.
So it is not possible to inspect all the data by eye and see which value causes the error.
Is there a way to identify which column is causing this error, or which record failed to insert and triggered it?
I found one article on this:
RareSQL
But that applies when the data is inserted from literal values, one row at a time.
I am inserting multiple rows at once using SELECT statements.
E.g.,
INSERT INTO TABLE1 (COLUMN1, COLUMN2, ..) SELECT COLUMN1, COLUMN2, .. FROM TABLE2 JOIN TABLE3
Also, in my case there are multiple INSERT and UPDATE statements, and I am not even sure which statement is throwing the error.
You can do a selection like this:
select TABLE2.ID, TABLE3.ID, TABLE1.COLUMN1, TABLE1.COLUMN2, ...
FROM TABLE2
JOIN TABLE3
ON TABLE2.JOINCOLUMN1 = TABLE3.JOINCOLUMN2
LEFT JOIN TABLE1
ON TABLE1.COLUMN1 = TABLE2.COLUMN1 and TABLE1.COLUMN2 = TABLE2.COLUMN2, ...
WHERE TABLE1.ID IS NULL
The first join reproduces the selection you have been using for the insert and the second join is a left join, which will yield null values for TABLE1 if a row having the exact column values you wanted to insert does not exist. You can apply this logic to your other queries, which were not given in the question.
You might just have to do it the hard way. To make it a little simpler, you can do this:
Temporarily remove the insert command from the query, so that you get a result set out of it. You might need to give some of the columns aliases if they don't come with one. Then wrap that select query as a subquery, and test likely columns (nvarchars, etc.) like this:
Select top 5 len(Col1), *
from (Select col1, col2, ... your query (without insert) here) A
Order by 1 desc
This will sort the rows with the largest values in the specified column first and just return the rows with the top 5 values - enough to see if you've got a big problem or just one or two rows with an issue. You can quickly change which column you're checking simply by changing the column name in the len(Col1) part of the first line.
If the subquery takes a long time to run, create a temp table with the same columns but with large string sizes (varchar(max), for example) so there are no errors. You can then do the insert just once into that table and run your tests against it instead of rerunning the subquery many times.
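A minimal sketch of that staging approach follows. MyDataSource, Col1, and Col2 are hypothetical placeholders for your own SELECT and its string columns; substitute the real names.

```sql
-- Hypothetical names: MyDataSource, Col1, Col2 stand in for your own query.
-- Widen every string column so the load itself cannot truncate.
SELECT CAST(Col1 AS varchar(max)) AS Col1,
       CAST(Col2 AS varchar(max)) AS Col2
INTO #staging
FROM MyDataSource;

-- Longest value per column; compare these against the target column sizes.
SELECT MAX(LEN(Col1)) AS MaxLenCol1,
       MAX(LEN(Col2)) AS MaxLenCol2
FROM #staging;

DROP TABLE #staging;
```

Any column whose maximum length exceeds the declared size in the target table is a candidate for the truncation error. Note that LEN ignores trailing spaces, so DATALENGTH may be a better check if trailing blanks matter.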
Following this answer, you can load the data into a temp table and compare its columns with those of the target table.
For example, this:
Insert into dbo.MyTable (columns)
Select columns
from MyDataSource ;
becomes this:
Select columns
into #T
from MyDataSource;
select *
from tempdb.sys.columns as TempCols
full outer join MyDb.sys.columns as RealCols
on TempCols.name = RealCols.name
and TempCols.object_id = Object_ID(N'tempdb..#T')
and RealCols.object_id = Object_ID(N'MyDb.dbo.MyTable')
where TempCols.name is null -- no match for real target name
or RealCols.name is null -- no match for temp target name
or RealCols.system_type_id != TempCols.system_type_id
or RealCols.max_length < TempCols.max_length ;

How can I fix this query (IN-clause) so that SQL server performance does not degrade when there is a low volume of data?

The following chart shows the performance of a process over time.
The process is calling a stored procedure with the following form:
CREATE PROCEDURE [dbo].[GetResultSetsAndResultsWhereStatusIsValidatedByPatientId]
@PatientId uniqueidentifier
AS
BEGIN
SELECT DISTINCT resultSetTable.ResultSetId,
resultSetTable.OrderId,
resultSetTable.ReceivedDateTime,
resultSetTable.ProfileId,
resultSetTable.Status,
profileTable.Code,
testResultTable.AbnormalFlag,
testResultTable.Result,
orderTable.ReceptionDateTime,
testTable.TestCode,
orderedProfileTable.[Status] as opStatus
FROM dbo.ResultSet resultSetTable
INNER JOIN dbo.[Profile] profileTable on (profileTable.ProfileId = resultSetTable.ProfileId)
INNER JOIN dbo.[TestResult] testResultTable on (testResultTable.ResultSetId = resultSetTable.ResultSetId)
INNER JOIN dbo.[Order] orderTable on (resultSetTable.OrderId = orderTable.OrderId)
INNER JOIN dbo.[Test] testTable on (testResultTable.TestId = testTable.TestId)
INNER JOIN dbo.OrderedProfile orderedProfileTable on (orderedProfileTable.ProfileId = resultSetTable.ProfileId)
WHERE orderTable.PatientId = @PatientId
AND orderedProfileTable.[Status] in ('V', 'REP')
END
The problem seems to be the IN-clause. If I remove the IN-clause and only check for one of the values then I get consistent performance as seen in the second part of the graph.
AND orderedProfileTable.[Status] = 'V'
The issue also seems to be related to the amount of data in the tables. Only two tables grow, [ResultSet] and [TestResult], and both these tables are empty at the start of performance runs.
I have tried the following:
Move the IN-clause to an outer select - no effect
Replace the IN-clause with a join - severe performance degradation
Create an index for the "Status" field used in the IN-clause - no effect
Is there a way to always get the fast execution times, even when there is no data in the two relevant tables?
Have you tried moving the IN condition into an EXISTS clause?
WHERE
orderTable.PatientId = @PatientId
AND
EXISTS
(SELECT *
FROM dbo.OrderedProfile as p
WHERE
p.profileid = orderedprofiletable.profileid
AND
[Status] IN ('v','rep'))
Since you're only searching for static values ('v' and 'rep'), I would think that the IN clause by itself would be your best bet, but EXISTS can sometimes speed things up, so it's worth a shot.
The problem was not related to the IN-clause in the end but a logic error. We started questioning why the query needed DISTINCT and when we removed it we discovered a logic error in the last join (needed some more criteria in what it matches against).
The error has been partially resolved and the performance issue seems to be resolved.
The stored procedure now completes in less than 10 ms on average and no performance degradation.

Getting wrong results with specific "WHERE" condition

Hello, gurus.
I'm stuck on a problem and would appreciate any help or suggestions. Please check this pic.
I don't understand why I'm getting the wrong result in the bottom query. As you can see, the only difference from the previous query is the WHERE clause, but this difference should lead to the same results since it's a one-to-one join.
An important detail is that v_last_part_info is a view, and I changed it recently. I thought the cause was the cached query execution plan, but I tried OPTION (RECOMPILE) and even the solution described here. The result is still the same.
Please help! What am I missing?
P.S.: [OBJECT_ID] is a column name, not the built-in function.
P.P.S.: ANOTHER_DB has a different collation; that's why I need collate database_default.
select Tracking
, SoItem
from v_last_part_info
where Tracking = '4170664293'
Tracking SoItem
4170664293 20
--================================================================================--
select
lpi.Tracking
, lpi.SoItem
from v_last_part_info lpi
join ANOTHER_DB..SO_HEADER h on lpi.Tracking = h.[OBJECT_ID] collate database_default
where Tracking = '4170664293'
Tracking SoItem
4170664293 20
--================================================================================--
select
lpi.Tracking
, lpi.SoItem
from v_last_part_info lpi
join ANOTHER_DB..SO_HEADER h on lpi.Tracking = h.[OBJECT_ID] collate database_default
where [OBJECT_ID] = '4170664293'
Tracking SoItem
4170664293 10
Thanks to GarethD, I found the reason: the row_number() function inside v_last_part_info. According to the definition on MSDN:
There is no guarantee that the rows returned by a query using
ROW_NUMBER() will be ordered exactly the same with each execution
unless the following conditions are true.
Values of the partitioned column are unique.
Values of the ORDER BY are unique.
Combinations of values of the partition column and ORDER BY columns are unique.
In my case, condition 2 did not hold: the ORDER BY values were not unique.
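The view definition was not posted, so the following is only a sketch of how such a ROW_NUMBER() can be made deterministic. The table dbo.part_info and the column CreatedAt are hypothetical; the idea is to append a tie-breaking column to the ORDER BY so that the ordering within each partition is unique.

```sql
-- Hypothetical source table and columns; the real view definition is not shown.
SELECT Tracking, SoItem
FROM (
    SELECT Tracking,
           SoItem,
           ROW_NUMBER() OVER (
               PARTITION BY Tracking
               ORDER BY CreatedAt DESC, SoItem DESC  -- SoItem breaks ties on CreatedAt
           ) AS rn
    FROM dbo.part_info
) AS ranked
WHERE rn = 1;
```

This assumes the combination of Tracking, CreatedAt, and SoItem is unique. With that in place, the same row gets rn = 1 regardless of which execution plan the optimizer picks, so adding a join or changing the WHERE clause no longer changes the result.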

Resources