SQL Server: Duplicate columns in joined table, but distinct row info - sql-server

So I have joined two tables to identify claims and their corresponding reversals if there are any.
The following is a simplified explanation of what I have done: I joined where MbrNo is the same in both tables and where one Amount is the negation of the other (a.Amount = -b.Amount). The output table now contains duplicate column names:
MbrNo | ClaimType | Amount | MbrNo | ClaimType | Amount
xyz | Medicine | R 300 | xyz | Reversal | - R300
I cannot insert this into a table because the column names are not unique.
Instead, I would like to format the result to look as follows:
MbrNo | ClaimType | Amount
xyz | Medicine | R 300
xyz | Reversal | - R300
with t as
(
select *,
count(*) over(partition by [MbrNo], [DepNo], [PracticeNo], [DisciplineCd], [ServiceDt],[PayAmt]) as rownum
from Claims
)
Select * from
(Select * from t where PayAmt<0) a
left outer join
(Select * from t where PayAmt>0) b
on a.[MbrNo]=b.[MbrNo]
and a.[DepNo]=b.[DepNo]
and a.[PracticeNo]=b.[PracticeNo]
and a.[DisciplineCd]=b.[DisciplineCd]
and a.[ServiceDt]=b.[ServiceDt]
and a.[PayAmt]=-b.[PayAmt]
Basically, I want to put the rows from the second table in the join underneath the rows from the first table.
Please help.

If I've understood your requirements correctly, then I think you want the UNION operator. See if this gets you going in the right direction.
with t as
(
select *,
count(*) over(partition by [MbrNo], [DepNo], [PracticeNo], [DisciplineCd], [ServiceDt],[PayAmt]) as rownum
from Claims
)
Select t.* from t where PayAmt < 0
union all
select b.* from
(Select * from t where PayAmt < 0) a
inner join
(Select * from t where PayAmt > 0) b
on a.[MbrNo] = b.[MbrNo]
and a.[DepNo] = b.[DepNo]
and a.[PracticeNo] = b.[PracticeNo]
and a.[DisciplineCd] = b.[DisciplineCd]
and a.[ServiceDt] = b.[ServiceDt]
and a.[PayAmt] = -b.[PayAmt]
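A minimal runnable sketch of the UNION ALL idea, using Python's built-in sqlite3 module with SQLite standing in for SQL Server. The table is cut down to the three columns of the simplified example (MbrNo, ClaimType, PayAmt); a real query would keep the other join keys (DepNo, PracticeNo, DisciplineCd, ServiceDt). The ORDER BY is an addition, because UNION ALL on its own does not guarantee row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Claims (MbrNo TEXT, ClaimType TEXT, PayAmt INTEGER)")
conn.executemany(
    "INSERT INTO Claims VALUES (?, ?, ?)",
    [
        ("xyz", "Medicine", 300),
        ("xyz", "Reversal", -300),
        ("abc", "Medicine", 150),  # no reversal, so it should not appear
    ],
)

# Stack the two halves of the join with UNION ALL so they share one set of
# column names: first the reversals, then the claims they reverse.
rows = conn.execute(
    """
    SELECT MbrNo, ClaimType, PayAmt FROM Claims WHERE PayAmt < 0
    UNION ALL
    SELECT b.MbrNo, b.ClaimType, b.PayAmt
    FROM (SELECT * FROM Claims WHERE PayAmt < 0) AS a
    INNER JOIN (SELECT * FROM Claims WHERE PayAmt > 0) AS b
        ON a.MbrNo = b.MbrNo
       AND a.PayAmt = -b.PayAmt
    ORDER BY MbrNo, PayAmt DESC  -- claim first, then its reversal
    """
).fetchall()

for row in rows:
    print(row)
```

Both rows for member xyz come back under a single MbrNo | ClaimType | PayAmt header, which is the layout asked for in the question.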

Related

How to split a sql table into two tables using a third table containing reference values

I am new here and new to SQL, so I hope I am asking the question correctly.
table internal product EAN
EAN/UPC
1234567789
2233445566
table shop sales
EAN/UPC | product | sales value |
1234567789 | xyz | 200 |
2233445566 | abc | 100 |
9685444444 | yyy | 150 |
The result should look like this:
table my company sales
EAN/UPC | product | sales value |
1234567789 | xyz | 200 |
2233445566 | abc | 100 |
and
table my competitor sales
EAN/UPC | product | sales value |
9685444444 | yyy | 150 |
I have all my EAN/UPC codes available (about 100,000).
I receive the sales data from the shop, including competitor EAN/UPC codes, which I need to separate from mine. I would like to use the first table as a reference: move the rows whose EAN/UPC matches into the table my company sales, and the rows with no matching EAN/UPC into the table my competitor sales.
I was thinking about using a SELECT INTO statement with the condition where EAN/UPC in T1 is not EAN/UPC in T2.
Thank you very much for your help.
Select *
into CompanySales
from ShopSales
where [EAN/UPC] in (select [EAN/UPC] from productEAN);
Select *
into CompetitorSales
from ShopSales
where [EAN/UPC] not in (select [EAN/UPC] from productEAN);
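One caveat with the NOT IN version: if the reference table ever contains a NULL EAN/UPC, `x NOT IN (subquery)` evaluates to unknown for every non-matching row, and the competitor query silently returns nothing. This small sketch uses Python's sqlite3 module (SQLite follows the same standard NULL rules as SQL Server with its default ANSI_NULLS ON setting) to compare NOT IN with NOT EXISTS. The column name `ean` is a simplified, hypothetical stand-in for [EAN/UPC].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE productEAN (ean TEXT);
    CREATE TABLE ShopSales (ean TEXT, product TEXT, sales_value INTEGER);
    INSERT INTO productEAN VALUES ('1234567789'), ('2233445566'), (NULL);
    INSERT INTO ShopSales VALUES
        ('1234567789', 'xyz', 200),
        ('2233445566', 'abc', 100),
        ('9685444444', 'yyy', 150);
    """
)

# NOT IN: the NULL in productEAN makes the predicate unknown for 'yyy'.
not_in = conn.execute(
    "SELECT product FROM ShopSales "
    "WHERE ean NOT IN (SELECT ean FROM productEAN)"
).fetchall()

# NOT EXISTS: a NULL never compares equal, so the competitor row survives.
not_exists = conn.execute(
    "SELECT product FROM ShopSales AS s "
    "WHERE NOT EXISTS (SELECT 1 FROM productEAN AS p WHERE p.ean = s.ean)"
).fetchall()

print(not_in)      # []
print(not_exists)  # [('yyy',)]
```

If the reference column is declared NOT NULL, both forms return the same rows; NOT EXISTS is simply the safer default.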
This should work out:
Setup (for testing purposes) using CTE:
WITH [shop sales] ([EAN/UPC], [product], [sales value]) AS (
SELECT * FROM (
VALUES
(1234567789,'xyz',200),
(2233445566,'abc',100),
(9685444444,'yyy',150)
) AS A (Column1, Column2, Column3)
),
[internal product EAN] ([EAN/UPC]) AS (
SELECT * FROM (
VALUES
(1234567789),
(2233445566)
) AS A (Column1)
)
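Two queries to pull the information (in SQL Server a CTE is only in scope for the single statement that follows it, so the setup CTEs have to be placed in front of each query):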
SELECT s.*
FROM [internal product EAN] ip
INNER JOIN [shop sales] s ON ip.[EAN/UPC] = s.[EAN/UPC]
SELECT s.*
FROM [shop sales] s
WHERE s.[EAN/UPC] NOT IN (SELECT [EAN/UPC] FROM [internal product EAN])
To create tables from the results, an INSERT INTO (into an existing table) or a SELECT INTO (which creates a new table) should suffice.
Try something like this:
INSERT INTO OnlyMyProductSales([EAN],[product],SalesValue)
SELECT s.*
FROM [MyProduct] p
INNER JOIN [AllSales] s
ON p.[EAN] = s.[EAN]
INSERT INTO MyCompetitionSales([EAN],[product],SalesValue)
SELECT s.*
FROM [AllSales] s
LEFT JOIN [MyProduct] p
ON p.[EAN] = s.[EAN]
WHERE p.[EAN] IS NULL
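In a LEFT JOIN anti-join, the IS NULL test belongs on a column of the right-hand table (p): unmatched sales rows get NULL in every p column, whereas s.[EAN] comes from the sales row itself and is never NULL, so filtering on it returns nothing. A quick check with Python's sqlite3 module, using cut-down, hypothetical versions of the MyProduct and AllSales tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MyProduct (EAN TEXT);
    CREATE TABLE AllSales (EAN TEXT, product TEXT, SalesValue INTEGER);
    INSERT INTO MyProduct VALUES ('1234567789'), ('2233445566');
    INSERT INTO AllSales VALUES
        ('1234567789', 'xyz', 200),
        ('2233445566', 'abc', 100),
        ('9685444444', 'yyy', 150);
    """
)

anti_join = """
    SELECT s.product
    FROM AllSales AS s
    LEFT JOIN MyProduct AS p ON p.EAN = s.EAN
    WHERE {col} IS NULL
"""

# Unmatched sales rows get NULLs in every p column, so test p.EAN.
right_side = conn.execute(anti_join.format(col="p.EAN")).fetchall()
# s.EAN comes from the sales row itself and is never NULL here.
left_side = conn.execute(anti_join.format(col="s.EAN")).fetchall()

print(right_side)  # [('yyy',)]
print(left_side)   # []
```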
This will help:
CREATE TABLE #internalproductEAN (EAN_UPC VARCHAR(50))
INSERT INTO #internalproductEAN
SELECT 1234567789 UNION ALL
SELECT 2233445566
CREATE TABLE #shopsales (EAN_UPC VARCHAR(50),product VARCHAR(10),SalesValue BIGINT)
INSERT INTO #shopsales
SELECT '1234567789','xyz',200 UNION ALL
SELECT '2233445566','abc',100 UNION ALL
SELECT '9685444444','yyy',150
CREATE TABLE #companysales (EAN_UPC VARCHAR(50),product VARCHAR(10),SalesValue BIGINT)
INSERT INTO #companysales
SELECT ss.* FROM #shopsales ss
INNER JOIN #internalproductEAN ip ON ss.EAN_UPC=ip.EAN_UPC
SELECT * FROM #companysales
CREATE TABLE #competitorsales (EAN_UPC VARCHAR(50),product VARCHAR(10),SalesValue BIGINT)
INSERT INTO #competitorsales
SELECT ss.* FROM #shopsales ss
LEFT JOIN #internalproductEAN ip ON ss.EAN_UPC=ip.EAN_UPC
WHERE ip.EAN_UPC IS NULL
SELECT * FROM #competitorsales
--SELECT * FROM #internalproductEAN
--SELECT * FROM #shopsales
DROP TABLE #internalproductEAN
DROP TABLE #shopsales
DROP TABLE #companysales
DROP TABLE #competitorsales

Postgres - join on array values

Say I have a table with schema as follows
id | name | tags |
1 | xyz | [4, 5] |
Where tags is an array of references to ids in another table called tags.
Is it possible to join these tags onto the row, i.e. replace the id numbers with the values of those rows in the tags table, such as:
id | name | tags |
1 | xyz | [[tag_name, description], [tag_name, description]] |
If not, is this an issue with the design of the schema?
Example tags table:
create table tags(id int primary key, name text, description text);
insert into tags values
(4, 'tag_name_4', 'tag_description_4'),
(5, 'tag_name_5', 'tag_description_5');
You should unnest the tags column, use its elements to join the tags table, and aggregate the columns of the latter. You can aggregate arrays into an array:
select t.id, t.name, array_agg(array[g.name, g.description])
from my_table as t
cross join unnest(tags) as tag
join tags g on g.id = tag
group by t.id;
id | name | array_agg
----+------+-----------------------------------------------------------------
1 | xyz | {{tag_name_4,tag_description_4},{tag_name_5,tag_description_5}}
(1 row)
Or aggregate strings into an array:
select t.id, t.name, array_agg(concat_ws(', ', g.name, g.description))
...
Or maybe strings inside a single string:
select t.id, t.name, string_agg(concat_ws(', ', g.name, g.description), '; ')
...
Or, last but not least, as jsonb:
select t.id, t.name, jsonb_object_agg(g.name, g.description)
from my_table as t
cross join unnest(tags) as tag
join tags g on g.id = tag
group by t.id;
id | name | jsonb_object_agg
----+------+------------------------------------------------------------------------
1 | xyz | {"tag_name_4": "tag_description_4", "tag_name_5": "tag_description_5"}
(1 row)
Live demo: db<>fiddle.
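SQLite has no array type, so this runnable sketch (Python's sqlite3 module) stores tags as a JSON array and uses SQLite's json_each table-valued function as a stand-in for unnest; the shape of the query (expand the array, join each element to tags, aggregate back per row) is the same as the PostgreSQL version. It assumes an SQLite build with the JSON functions available, which is the default in recent releases. group_concat plays the role of string_agg, and, as with string_agg without ORDER BY, the order of the concatenated items is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    INSERT INTO tags VALUES
        (4, 'tag_name_4', 'tag_description_4'),
        (5, 'tag_name_5', 'tag_description_5');
    -- tags stored as a JSON array because SQLite has no array column type
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, tags TEXT);
    INSERT INTO my_table VALUES (1, 'xyz', '[4, 5]');
    """
)

rows = conn.execute(
    """
    SELECT t.id, t.name,
           group_concat(g.name || ', ' || g.description, '; ') AS tag_info
    FROM my_table AS t, json_each(t.tags) AS tag  -- one row per array element
    JOIN tags AS g ON g.id = tag.value            -- look up each referenced tag
    GROUP BY t.id, t.name
    """
).fetchall()

for row in rows:
    print(row)
```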
Not sure if this is still helpful for anyone, but unnesting the tags can be noticeably slower than letting Postgres work directly from the array. You can rewrite the query as follows; it is often more performant because g.id = ANY(tags) can be answered with a simple primary-key index scan, without the expansion step:
SELECT t.id, t.name, ARRAY_AGG(ARRAY[g.name, g.description])
FROM my_table AS t
LEFT JOIN tags AS g
ON g.id = ANY(t.tags)
GROUP BY t.id;

Total Count of column value in Sql Server

Two tables, TableA and TableB, have a filename column that contains the same values in both tables; only the number of occurrences differs.
e.g.
|###TableA#########|
|id|filename_TableA|
|01|file1 |
|02|file1 |
|03|file2 |
|04|file2 |
|05|file3 |
|06|file4 |
|## TableB ########|
|id|filename_TableB|
|01|file1 |
|02|file1 |
|03|file1 |
|04|file2 |
|05|file2 |
|06|file3 |
|07|file3 |
|08|file4 |
|09|file4 |
I need to generate a SQL query that shows each distinct filename with its number of occurrences in each table, plus the total count of distinct filenames.
Using select count(distinct filename_TableA) as totalCount from TableA gives the total count of filenames, but I am not able to write the SQL query for the full result.
What I tried for a single table:
select
filename_TableA,
count(filename_TableA)as filecount_TableA,
totalCount = (
select count(distinct filename_TableA) from TableA
)
from TableA
group by filename_TableA
Always try to break your problem into smaller parts!
Your question consists of two parts:
Get distinct files and counts from tableA
Get distinct files and counts from tableB
We write the two queries:
1.
SELECT filename_TableA, COUNT( * ) AS filecount_TableA
FROM TableA
GROUP BY filename_TableA
2.
SELECT filename_TableB, COUNT( * ) AS filecount_TableB
FROM TableB
GROUP BY filename_TableB
Check that the results of each individual query are correct.
Then we combine the queries:
SELECT filename_TableA, filename_TableB, filecount_TableA, filecount_TableB, ISNULL( filecount_TableA, 0 ) + ISNULL( filecount_TableB, 0 ) AS totalCount,
COUNT(*) OVER() AS UniqueFileCount
FROM
( SELECT filename_TableA, COUNT( * ) AS filecount_TableA
FROM TableA
GROUP BY filename_TableA ) AS A
FULL OUTER JOIN
( SELECT filename_TableB, COUNT( * ) AS filecount_TableB
FROM TableB
GROUP BY filename_TableB ) AS B
ON A.filename_TableA = filename_TableB
Note: To cover the case where a file name appears in one table but not the other, I have used FULL OUTER JOIN.
If that cannot happen, i.e. every file name appears at least once in each table, you can use INNER JOIN instead, which may be faster.
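A runnable sketch with Python's sqlite3 module and the sample data from the question. Instead of FULL OUTER JOIN (SQLite only supports it from release 3.39), it swaps in a different technique: stack both tables with UNION ALL, tagging each row with its source, then pivot the counts with conditional aggregation. This also keeps files that appear in only one table. As in the answer, COUNT(*) OVER () gives the number of distinct files; window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER, filename_TableA TEXT)")
conn.execute("CREATE TABLE TableB (id INTEGER, filename_TableB TEXT)")
a_files = ["file1", "file1", "file2", "file2", "file3", "file4"]
b_files = ["file1", "file1", "file1", "file2", "file2",
           "file3", "file3", "file4", "file4"]
conn.executemany("INSERT INTO TableA VALUES (?, ?)", enumerate(a_files, 1))
conn.executemany("INSERT INTO TableB VALUES (?, ?)", enumerate(b_files, 1))

rows = conn.execute(
    """
    SELECT filename, filecount_TableA, filecount_TableB, totalCount,
           COUNT(*) OVER () AS UniqueFileCount  -- one row per distinct file
    FROM (
        SELECT filename,
               SUM(CASE WHEN src = 'A' THEN 1 ELSE 0 END) AS filecount_TableA,
               SUM(CASE WHEN src = 'B' THEN 1 ELSE 0 END) AS filecount_TableB,
               COUNT(*) AS totalCount
        FROM (
            -- stack both tables, remembering where each row came from
            SELECT filename_TableA AS filename, 'A' AS src FROM TableA
            UNION ALL
            SELECT filename_TableB, 'B' FROM TableB
        ) AS stacked
        GROUP BY filename
    ) AS per_file
    ORDER BY filename
    """
).fetchall()

for row in rows:
    print(row)
```

The CASE expressions are written so the inner query also runs unchanged on SQL Server.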

select resultset of counts by array param in postgres

I've been searching for this and it seems like it should be something simple, but apparently not so much. I want to return a resultSet within PostgreSQL 9.4.x using an array parameter so:
| id | count |
--------------
| 1 | 22 |
--------------
| 2 | 14 |
--------------
| 14 | 3 |
where I'm submitting a parameter of {'1','2','14'}.
I tried something like this (which clearly doesn't work):
SELECT id, count(a.*)
FROM tablename a
WHERE a.id::int IN array('{1,2,14}'::int);
I want to test it first of course, and then write it as a storedProc (function) to make this simple.
Never mind, here is the answer:
SELECT a.id,
COUNT(a.id)
FROM tableName a
WHERE a.id IN
(SELECT b.id
FROM tableName b
WHERE b.id = ANY('{1,2,14}'::int[])
)
GROUP BY a.id;
You can simplify to:
SELECT id, count(*) AS ct
FROM tbl
WHERE id = ANY('{1,2,14}'::int[])
GROUP BY 1;
More:
Check if value exists in Postgres array
To include IDs from the input array that are not found I suggest unnest() followed by a LEFT JOIN:
SELECT id, count(t.id) AS ct
FROM unnest('{1,2,14}'::int[]) id
LEFT JOIN tbl t USING (id)
GROUP BY 1;
Related:
Preserve all elements of an array while (left) joining to a table
If there can be NULL values in the array parameter as well as in the id column (which would be an odd design), you'd need (slower!) NULL-safe comparison:
SELECT id.id, count(t.id) AS ct
FROM unnest('{1,2,14}'::int[]) id
LEFT JOIN tbl t ON t.id IS NOT DISTINCT FROM id.id
GROUP BY 1;
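SQLite has neither arrays nor unnest(), so this sketch (Python's sqlite3 module, with made-up table contents) turns the list of requested IDs into a VALUES CTE and then applies the same LEFT JOIN plus count(t.id) idea: IDs with no matching rows still show up, with a count of 0, because count(t.id) ignores the NULLs produced by the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(1,), (1,), (1,), (2,), (7,), (7,)])

wanted = [1, 2, 14]  # plays the role of the array parameter
ids_cte = ", ".join("(?)" for _ in wanted)  # one placeholder per ID

rows = conn.execute(
    f"""
    WITH ids(id) AS (VALUES {ids_cte})
    SELECT ids.id, count(t.id) AS ct   -- count(t.id) skips the NULLs
    FROM ids
    LEFT JOIN tbl AS t ON t.id = ids.id
    GROUP BY ids.id
    ORDER BY ids.id
    """,
    wanted,
).fetchall()

print(rows)  # [(1, 3), (2, 1), (14, 0)]
```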

Select N rows avoiding duplicates on a non-key, non-index field

Using T-SQL, how can I select n rows of a non-key, non-index column and avoid duplicate results?
Example table:
ID_ | state | customer | memo
------------------------------------------
1 | abc | 123 | memo text xyz
2 | abc | 123 | memo text abc
3 | abc | 456 | memo text def
4 | abc | 456 | memo text rew
5 | abc | 789 | memo text yte
6 | def | 123 | memo text hrd
7 | def | 432 | memo text dfg
I want to select, say, 2 memos for state 'abc' but the returned memos should not be for the same customer.
memo
----
memo text xyz
memo text def
PS: The only select condition available is state (e.g. where state = 'abc').
I have managed to do this in a very inefficient way:
SELECT top 2 MAX(memo)
FROM table
WHERE state = 'abc'
GROUP BY customer
This works fine for small sample size, but the production table has over 1 billion rows.
You can try the following query on your actual database. I am not sure how it performs on a table with a billion rows, so you will need to test that yourself.
SELECT TOP 2 memo
FROM (SELECT memo,
ROW_NUMBER() OVER (PARTITION BY customer ORDER BY (SELECT 0)) AS RN
FROM table1 WHERE state = 'abc') T
WHERE RN = 1
You can check the SQL Fiddle.
EDIT: Adding a nonclustered index on state and customer that includes memo should improve performance considerably:
CREATE NONCLUSTERED INDEX [custom_index] ON table
(
[state] ASC,
[customer] ASC
)
INCLUDE ( [memo]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [DATA]
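A runnable check of the ROW_NUMBER approach with Python's sqlite3 module and the sample rows from the question. SQLite uses LIMIT instead of TOP, and this sketch orders by ID_ inside each partition (instead of ORDER BY (SELECT 0)) so that the memo picked for each customer is deterministic. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID_ INTEGER PRIMARY KEY, state TEXT, "
    "customer INTEGER, memo TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "abc", 123, "memo text xyz"),
        (2, "abc", 123, "memo text abc"),
        (3, "abc", 456, "memo text def"),
        (4, "abc", 456, "memo text rew"),
        (5, "abc", 789, "memo text yte"),
        (6, "def", 123, "memo text hrd"),
        (7, "def", 432, "memo text dfg"),
    ],
)

memos = conn.execute(
    """
    SELECT memo
    FROM (
        SELECT memo, ID_,
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY ID_) AS rn
        FROM table1
        WHERE state = ?
    ) AS T
    WHERE rn = 1          -- first row for each customer
    ORDER BY ID_
    LIMIT ?               -- SQLite's equivalent of TOP n
    """,
    ("abc", 2),
).fetchall()

print(memos)  # [('memo text xyz',), ('memo text def',)]
```

The result matches the expected output in the question: two memos for state 'abc', each from a different customer.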
One way to get one row per distinct state/customer pair is to pick a single ID for every group:
SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
(MIN can be substituted by MAX, it's just a way to get one of the values)
Then JOIN that back to the table, adding the state condition:
WITH getID AS (
SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
)
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
INNER JOIN getID g ON t.ID_ = g.ID
WHERE t.state = 'abc'
SQLFiddle demo
If your version of SQL Server doesn't support WITH (common table expressions were introduced in SQL Server 2005), the CTE can become a subquery:
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
INNER JOIN (SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
) g ON t.ID_ = g.ID
WHERE t.state = 'abc'
Another way is to use CROSS APPLY to get one ID per group:
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
CROSS APPLY (SELECT TOP 1
ID_
FROM table1 t1
WHERE t1.State = t.State AND t1.Customer = t.Customer
ORDER BY t1.ID_) c -- without ORDER BY, TOP 1 may pick any row
WHERE t.state = 'abc'
AND c.ID_ = t.ID_;
SQLFiddle demo
