Avoid duplicate values in comma delimited sql query


Hello, I have this comma-delimited query:
select [Product_Name]
    ,(select h2.Location_name + ', '
      from (select distinct * from [dbo].[Product_list]) h2
      where h1.Product_Name = h2.Product_Name
      order by h2.Product_Name
      for xml path ('')) as Location_name
    ,(select h2.[Store name] + ', '
      from [dbo].[Product_list] h2
      where h1.Product_Name = h2.Product_Name
      order by h2.Product_Name
      for xml path ('')) as store_name
    ,sum(Quantity) as Total_Quantity
from [dbo].[Product_list] h1
group by [Product_Name]
But this query shows duplicate values in the comma-delimited lists. How can I show only the distinct values of each column in comma-delimited form? Can anyone please help me?

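Well, if you don't SELECT DISTINCT * FROM dbo.Product_list and instead select only the columns the subquery needs, SELECT DISTINCT Product_Name, Location_name FROM dbo.Product_list (Product_Name is still needed for the correlation with h1), it will return only distinct values. The store_name subquery needs the same treatment, with [Store name] in place of Location_name.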
T-SQL supports the use of the asterisk, or “star” character (*) to substitute for an explicit column list. This will retrieve all columns from the source table. While the asterisk is suitable for a quick test, avoid using it in production work, as changes made to the table will cause the query to retrieve all current columns in the table’s current defined order. This could cause bugs or other failures in reports or applications expecting a known number of columns returned in a defined order. Furthermore, returning data that is not needed can slow down your queries and cause performance issues if the source table contains a large number of rows. By using an explicit column list in your SELECT clause, you will always achieve the desired results, providing the columns exist in the table. If a column is dropped, you will receive an error that will help identify the problem and fix your query.
Using SELECT DISTINCT will filter out duplicates in the result set. SELECT DISTINCT specifies that the result set must contain only unique rows. However, it is important to understand that the DISTINCT option operates only on the set of columns returned by the SELECT clause. It does not take into account any other unique columns in the source table. DISTINCT also operates on all the columns in the SELECT list, not just the first one.
From Querying Microsoft SQL Server 2012 MCT Manual.
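To see the effect the quoted passage describes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no FOR XML PATH, so GROUP_CONCAT(x, ', ') plays the role of the concatenation, and the sample rows are made up. Because DISTINCT * compares every column, two rows that differ only in Quantity both survive; DISTINCT over just Product_Name and Location_name removes the repeat.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server.
# Assumption: the sample rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_list (Product_Name TEXT, Location_name TEXT, Quantity INT)")
conn.executemany(
    "INSERT INTO Product_list VALUES (?, ?, ?)",
    [("Pen", "Manila", 5), ("Pen", "Manila", 3), ("Pen", "Cebu", 2), ("Ink", "Cebu", 1)],
)

def locations(sql):
    """Run a concatenating query and return {product: sorted list of locations}."""
    return {name: sorted(locs.split(", ")) for name, locs in conn.execute(sql)}

# Concatenating straight from the table repeats 'Manila' for 'Pen'.
with_dupes = locations(
    "SELECT Product_Name, GROUP_CONCAT(Location_name, ', ') "
    "FROM Product_list GROUP BY Product_Name"
)

# DISTINCT * keeps both Manila rows because their Quantity values differ.
star = locations(
    "SELECT Product_Name, GROUP_CONCAT(Location_name, ', ') "
    "FROM (SELECT DISTINCT * FROM Product_list) GROUP BY Product_Name"
)

# DISTINCT over only the needed columns removes the repeat.
distinct = locations(
    "SELECT Product_Name, GROUP_CONCAT(Location_name, ', ') "
    "FROM (SELECT DISTINCT Product_Name, Location_name FROM Product_list) "
    "GROUP BY Product_Name"
)

print(with_dupes["Pen"])  # ['Cebu', 'Manila', 'Manila']
print(star["Pen"])        # ['Cebu', 'Manila', 'Manila']
print(distinct["Pen"])    # ['Cebu', 'Manila']
```

On SQL Server 2017 and later, STRING_AGG can replace the FOR XML PATH trick, but it does not accept DISTINCT, so the de-duplication still has to happen in a derived table first.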

Related

Snowflake MERGE statement using GoldenGate JSON as source table

I am running a MERGE into a target table in Snowflake, using JSON data as the source table:
merge into cust tgt using (
select parse_json(s.$1):application_num as application num
from prd_json s qualify
row_number() over(partition application
order_by application desc)=1) src
on tgt.application =src.application
when not matched and op_type='I' then
insert(application) values (src.application );
The QUALIFY clause drops all the duplicate data and returns only unique records, but once the join is applied it returns fewer records than a plain SELECT statement.
For example:
select distinct application
from prd_json where op_type='I';
-- returns 15000 rows
With the join, it shows there are no matching records in the target. If nothing matches, it should insert all 15,000 rows, but only 8,500 rows are inserted, even though they are not duplicate records. Is there any way to insert the records without using QUALIFY? If I drop QUALIFY, I get a DML duplicate-row error. Please guide me if anyone knows.

How about using SELECT DISTINCT?

Your demo SQL does not compile, and because you use $1 it is also hard to guess the names of your columns, and therefore how the ROW_NUMBER is working.
So it's hard to nail down the problem.
But with the following SQL you can replace ROW_NUMBER with DISTINCT:
CREATE TABLE cust(application INT);
CREATE OR REPLACE table prd_json as
SELECT parse_json(column1) as application, column2 as op_type
FROM VALUES
('{"application_num":1,"other":1}', 'I'),
('{"application_num":1,"other":2}', 'I'),
('{"application_num":2,"other":3}', 'I'),
('{"application_num":1,"other":1}', 'U')
;
MERGE INTO cust AS tgt
USING (
SELECT DISTINCT
parse_json(s.$1):application_num::int as application,
s.op_type
FROM prd_json AS s
) AS src
ON tgt.application = src.application
WHEN NOT MATCHED AND src.op_type = 'I' THEN
INSERT(application) VALUES (src.application );
number of rows inserted
2
SELECT * FROM cust;
APPLICATION
1
2
Running the MERGE code a second time gives:
number of rows inserted
0
Now if I truncate CUST and swap to this SQL for the inner part:
SELECT --DISTINCT
parse_json(s.$1):application_num::int as application,
s.op_type
FROM prd_json AS s
qualify row_number() over (partition by application order by application desc)=1
I get three rows inserted, because PARTITION BY application effectively binds to the source column s.application, not to the output alias application, and there are three different "applications" (JSON objects) because of the differing "other" values.
The reason I wrote my code this way is that your query
select distinct application
from prd_json where op_type='I';
implies there is already a column called application in the table, so it may be the one picked up by the ROW_NUMBER statement.
Anyway, there is a larger potential problem: if you also have update data (I guess op_type 'U') in your transaction block, you want to ORDER BY in the sub-select so that an insert and an update for the same record are never applied in update-then-insert order. And presumably you want all the update operations if there are several of them... I will stop there. But if you do not have updates, the sub-select should filter on op_type = 'I' to keep the non-insert operations out, or, possibly worse, to stop them replacing the inserts in your ROW_NUMBER pattern. I suspect that is the underlying cause of your problem.
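The suspected cause can be reproduced outside Snowflake. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later, and SQLite has no QUALIFY, so a WHERE on a derived table stands in for it). The seq column is a hypothetical change-sequence number, not something from the post. When the de-duplication runs before the op_type = 'I' filter, a later 'U' row wins the ROW_NUMBER for application 1 and that insert is lost; filtering first keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical change feed: seq is an assumed ordering column, not from the post.
conn.execute("CREATE TABLE prd_json (seq INT, application INT, op_type TEXT)")
conn.executemany(
    "INSERT INTO prd_json VALUES (?, ?, ?)",
    [(1, 1, "I"), (2, 2, "I"), (3, 1, "U")],
)

# De-duplicate first, then keep inserts: the later 'U' row takes rn = 1
# for application 1, so application 1 never reaches the insert.
dedupe_then_filter = [r[0] for r in conn.execute("""
    SELECT application FROM (
        SELECT application, op_type,
               ROW_NUMBER() OVER (PARTITION BY application ORDER BY seq DESC) AS rn
        FROM prd_json
    )
    WHERE rn = 1 AND op_type = 'I'
    ORDER BY application
""")]

# Filter to inserts first, then de-duplicate: every inserted application survives.
filter_then_dedupe = [r[0] for r in conn.execute("""
    SELECT DISTINCT application FROM prd_json
    WHERE op_type = 'I'
    ORDER BY application
""")]

print(dedupe_then_filter)   # [2]
print(filter_then_dedupe)   # [1, 2]
```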

How do I Select an aggregate function from a temp table without getting the invalid column error from not including the column in the GROUP BY clause?

I performed aggregate functions in a temp table but I'm getting an error because the field I performed the aggregate function on is not included in a GROUP BY in the table I am selecting from. To clarify, this is just a snippet so these tables are temp tables in the larger query. They are also named in the actual code.
WITH #t1 AS
(SELECT
Name,
Date,
COUNT(Email),
COUNT(DISTINCT Email)
FROM SentEmails)
SELECT
#t1.*,
#t2.GrossSents
FROM #t1
--***JOINS***
GROUP BY
#t1.Name,
#t1.Date
I expect a table with Name, Date, Count of Emails, Unique Emails, and Gross Sends fields, but I get:
Column '#t1.COUNT(Email)' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Break your issue into steps.
Start by getting the query inside your CTE to return the data you expect from it. The query as written here won't run because you're doing aggregation without a GROUP BY clause.
Once that query is giving you the results you want, wrap it in the CTE syntax and try a SELECT * FROM cteName to see if that works. You'll get an error here because each column in a CTE has to have a name and your last two columns don't have names. Also, as noted in the comments, it's a poor practice to name your CTE with a #. It makes the subsequent code more confusing, since it appears as though there's a temp table someplace, and there isn't.
After you have the CTE returning what you need, start joining other tables, one at a time. Monitor those results as you add tables so you're sure that your JOINs are working as you expect.
If you're doing further aggregation on the outer query, specifying SELECT * is just asking for trouble because you're going to need to specify every non-aggregated column in your GROUP BY anyway. As a general rule, you should enumerate your columns in your SELECT, and in this case that will allow you to copy & paste them to your eventual GROUP BY.
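Following those steps, here is a minimal sketch of the reworked query, run through Python's sqlite3 module as a stand-in for SQL Server with made-up rows. The aggregation and its GROUP BY live inside the CTE, each aggregate gets a name, the CTE drops the #, and the outer SELECT lists its columns explicitly, so no outer GROUP BY is needed. The join to the question's #t2 is left out because its definition is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data; table and column names follow the question.
conn.execute("CREATE TABLE SentEmails (Name TEXT, Date TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO SentEmails VALUES (?, ?, ?)",
    [("A", "2019-01-01", "x@a.com"), ("A", "2019-01-01", "x@a.com"),
     ("A", "2019-01-01", "y@a.com"), ("B", "2019-01-02", "z@b.com")],
)

rows = conn.execute("""
    WITH t1 AS (                                  -- a CTE, so no '#' prefix
        SELECT Name,
               Date,
               COUNT(Email)          AS EmailCount,    -- every CTE column needs a name
               COUNT(DISTINCT Email) AS UniqueEmails
        FROM SentEmails
        GROUP BY Name, Date                       -- aggregate once, inside the CTE
    )
    SELECT t1.Name, t1.Date, t1.EmailCount, t1.UniqueEmails
    FROM t1                                       -- further JOINs would go here
    ORDER BY t1.Name
""").fetchall()

print(rows)  # [('A', '2019-01-01', 3, 2), ('B', '2019-01-02', 1, 1)]
```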

Concatenation with a complex query - SQL Server

So I've got a query with multiple joins and several rows that I want to put on one line. A couple of PIVOT statements solved most of this problem, but I have one last field with multiple rows (User Names) that I want to concatenate into one column.
I've read about COALESCE and got a sample to work, but I did not know how to combine the variable returned with the other data fields, as it has no key.
I also saw this recommended approach:
SELECT [ID],
STUFF((
SELECT ', ' + CAST([Name] AS VARCHAR(MAX))
FROM #YourTable WHERE (ID = Results.ID)
FOR XML PATH(''),TYPE
/* Use .value to un-escape XML entities, e.g. &gt; &lt; &amp; */
).value('.','VARCHAR(MAX)')
,1,2,'') as NameValues
FROM #YourTable Results
GROUP BY ID
But again, I'm not sure how to incorporate this into a complex query.
BTW, the users do not have write access to the DB, so they cannot create functions, views or tables, or even execute functions. This limits the options somewhat.
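One way to fit the concatenation into a larger query is to leave the complex query as it is (for example, wrapped in a CTE) and add the concatenated column as a correlated subquery keyed on each output row's ID; the correlation supplies the "key" the COALESCE variable lacks. This only reads data, so it needs no functions, views or tables. The sketch uses Python's sqlite3 module, with GROUP_CONCAT standing in for STUFF(... FOR XML PATH('')); the Orders and OrderUsers tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the multi-join query and the user list.
conn.execute("CREATE TABLE Orders (ID INT PRIMARY KEY, Customer TEXT)")
conn.execute("CREATE TABLE OrderUsers (ID INT, UserName TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO OrderUsers VALUES (?, ?)",
    [(1, "alice"), (1, "bob"), (2, "carol")],
)

rows = conn.execute("""
    WITH Results AS (                 -- the existing complex query goes here
        SELECT o.ID, o.Customer
        FROM Orders o
    )
    SELECT r.ID,
           r.Customer,
           (SELECT GROUP_CONCAT(u.UserName, ', ')   -- one string per outer row
            FROM OrderUsers u
            WHERE u.ID = r.ID) AS UserNames         -- correlated on the outer ID
    FROM Results r
    ORDER BY r.ID
""").fetchall()

for row in rows:
    print(row)
```

SQLite does not guarantee the order of names inside GROUP_CONCAT; in the T-SQL version, an ORDER BY inside the FOR XML PATH subquery controls it.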

SQL: Row number is different when sorting on columns with null values

In my C# application I'm using the following query to search for a particular string:
;WITH selectRows AS (SELECT *, row = ROW_NUMBER() OVER (ORDER BY <column_name>) FROM <table_name>)
SELECT row FROM selectRows WHERE <column_name> LIKE '%<search_string>%' COLLATE <collate> ORDER BY row;
This query always worked fine for me, even when the column_name in the OVER (ORDER BY ...) clause was a column that contained null values. Yesterday I tried to search a somewhat bigger SQL table (about 1 million records), and it surprised me that I got different row numbers returned without changing the query between executions. This only seems to happen on bigger tables and when the column_name in the OVER (ORDER BY ...) clause contains null values. When the column_name points to a column WITHOUT null values, the query returns the same result over and over again.
I also tried the following query, but it did not work either:
;WITH selectRows AS (SELECT *, row = ROW_NUMBER() OVER (ORDER BY ISNULL(<column_name>, '')) FROM <table_name>)
SELECT row FROM selectRows WHERE <column_name> LIKE '%<search_string>%' COLLATE <collate> ORDER BY row;
Note: both queries were tested on SQL Server 2012 and SQL Server 2008. The searched table also had a clustered primary key on an identity column and a nonclustered index on the column_name used in the OVER (ORDER BY ...) clause.
Thanks in advance!

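You have specified an ordering criterion that is not a total order. For example, if you order on a column that is always zero, every row ties, so the CTE can number the rows in a different order on each execution. The same applies to your NULLs: all rows with a NULL in <column_name> tie with each other (ISNULL only turns them into tied '' values), so SQL Server is free to number them differently each time.
You are filtering after numbering. The filter matches the same rows each time, but the row numbers attached to those rows can change between executions.
In general, SQL queries are not guaranteed to be deterministic because of certain constructs, and this is only one of them.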
Fix: Specify a total order. Use anything to break ties such as ORDER BY X, ID. As a habit I always specify a total order.
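A minimal sketch of the fix, using Python's sqlite3 module as a stand-in for SQL Server with a made-up table: ORDER BY Name alone leaves the NULL rows tied, while ORDER BY Name, ID (ID being the primary key) pins every row number. Both SQLite and SQL Server sort NULLs first in ascending order, so the NULL rows take the lowest numbers. The alias rn replaces the question's row, which SQLite treats as a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up table: ID is the primary key, Name contains NULLs.
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, None), (2, "banana"), (3, None), (4, "apple"), (5, None)],
)

# ORDER BY Name alone is not a total order: the three NULL rows tie.
ties = conn.execute(
    "SELECT COUNT(*) FROM T GROUP BY Name HAVING COUNT(*) > 1"
).fetchall()
print(ties)  # [(3,)]

# Adding the primary key as a tie-breaker fixes every row's number.
hits = conn.execute("""
    WITH selectRows AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY Name, ID) AS rn
        FROM T
    )
    SELECT rn FROM selectRows WHERE Name LIKE '%a%' ORDER BY rn
""").fetchall()
print(hits)  # [(4,), (5,)]
```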

Sorting a query: how does it work?

Can someone explain to me why this is possible in SQL Server:
select column1 c,column2 d
from table1
order by c,column3
I can sort by column1 using the alias because the ORDER BY clause is applied after the SELECT clause, but how is it possible to sort by a column that I'm not retrieving?
Thanks in advance.

All column names from the objects in the FROM clause are available to ORDER BY, except when the query uses GROUP BY or DISTINCT. As you've indicated, the alias is also available, because the SELECT clause is processed before the ORDER BY.
This is one of those cases where you trust the optimizer.

According to Books Online (http://technet.microsoft.com/en-us/library/ms188385(v=sql.90).aspx):
The ORDER BY clause can include items that do not appear in the select list. However, if SELECT DISTINCT is specified, or if the statement contains a GROUP BY clause, or if the SELECT statement contains a UNION operator, the sort columns must appear in the select list.
Additionally, when the SELECT statement includes a UNION operator, the column names or column aliases must be those specified in the first select list.

You can sort by aliases, which you define in the SELECT (select column1 c), and you can also sort by a column that you do not include in the SELECT but that still exists in the table. This also lets you sort by expressions over the data without having them in the SELECT list:
SELECT cost, tax FROM [table] ORDER BY (cost * tax)
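Both behaviours described in these answers can be checked with a short sketch in Python's sqlite3 module, used here as a stand-in for SQL Server. The rows are made up, and Prices is a hypothetical table for the cost/tax example. SQLite is more lenient than SQL Server about ORDER BY with DISTINCT, so this sketch does not demonstrate the Books Online restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up tables: table1 and its columns follow the question; Prices is hypothetical.
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 TEXT, column3 INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("x", "p", 3), ("x", "q", 1), ("w", "r", 2)],
)
conn.execute("CREATE TABLE Prices (cost REAL, tax REAL)")
conn.executemany("INSERT INTO Prices VALUES (?, ?)", [(10, 0.5), (4, 2.0), (6, 1.0)])

# Sort by an alias (c), then by a column that is not selected (column3).
by_alias = conn.execute(
    "SELECT column1 AS c, column2 AS d FROM table1 ORDER BY c, column3"
).fetchall()
print(by_alias)  # [('w', 'r'), ('x', 'q'), ('x', 'p')]

# Sort by an expression that never appears in the select list.
by_expr = conn.execute(
    "SELECT cost, tax FROM Prices ORDER BY cost * tax"
).fetchall()
print(by_expr)  # [(10.0, 0.5), (6.0, 1.0), (4.0, 2.0)]
```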