How to do UNION on a single table - sql-server

I'm not sure why I'm getting the following error:
Incorrect syntax near the keyword 'UNION'.
My table looks like the following:
+-------+--------+---------------+
| label | Name | Budget |
+-------+--------+---------------+
| 1 | ABC | Allocated |
| 1 | DEF | NotAllocated |
| 0 | XYZ | Allocated |
| 0 | LMN | Allocated |
| 1 | QRS | NotAllocated |
+-------+--------+---------------+
I have a column called Label consisting of 1's and 0's.
Number of records where label is 1 = 10540
Number of records where label is 0 = 1546
I have many records for "1" so I want to undersample them to the "0" level
I'm trying to get 1600 records where label is 1 and 1546 records where label is 0.
I have tried the following but I'm getting an error. How to solve this issue?
SELECT TOP 1600 *
FROM myTable
ORDER BY label ASC
UNION ALL
SELECT TOP 1546 *
FROM myTable
ORDER BY label DESC

You can use the following solution:
SELECT * FROM (SELECT TOP 1600 * FROM myTable ORDER BY label ASC) t1
UNION ALL
SELECT * FROM (SELECT TOP 1546 * FROM myTable ORDER BY label DESC) t2
The error comes from using ORDER BY inside the individual queries of a UNION. With UNION, a single ORDER BY is allowed only at the very end, after the last SELECT, where it sorts the combined result. That does not help here because each query needs its own ORDER BY, so you can wrap each query in a derived table and select from it again. You can find more information about this topic in the Transact-SQL documentation.
As @Zorkolot already mentioned in his answer, you don't need an ORDER BY if you only sort by the column label to pick the rows with 0 or 1. You can filter with WHERE instead:
SELECT TOP 1600 * FROM myTable WHERE label = 1
UNION ALL
SELECT TOP 1546 * FROM myTable WHERE label = 0
demo (for both solutions): http://sqlfiddle.com/#!6/d9c21/4/1
Another thought:
If you want a maximum number of rows per group (at most 1600 rows with label = 1 and at most 1600 rows with label = 0, so at most 3200 rows in total), you can use the following:
SELECT TOP 1600 * FROM myTable WHERE label = 0
UNION ALL
SELECT TOP 1600 * FROM myTable WHERE label = 1

I am trying to get 1600 records where label is 1 and 1546 records
where label is 0.
You could just say WHERE label = # instead of using order by.
SELECT TOP 1600 *
FROM myTable
WHERE label = 1
UNION ALL
SELECT TOP 1546 * --you might not even need TOP here (there are only 1546)
FROM myTable
WHERE label = 0
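Note that TOP without ORDER BY returns an arbitrary set of rows, not a random sample. If the goal is random undersampling, one common approach is to order the label = 1 rows by NEWID() before taking the TOP. The following is only a minimal sketch under assumptions: the table variable @myTable and its rows are made up to stand in for myTable, and TOP (2) stands in for TOP 1600.

```sql
-- Hypothetical demo data standing in for myTable.
DECLARE @myTable TABLE (label INT, Name VARCHAR(10), Budget VARCHAR(20));
INSERT INTO @myTable (label, Name, Budget) VALUES
    (1, 'ABC', 'Allocated'),
    (1, 'DEF', 'NotAllocated'),
    (1, 'QRS', 'NotAllocated'),
    (0, 'XYZ', 'Allocated'),
    (0, 'LMN', 'Allocated');

-- Random sample of the majority class: NEWID() gives each row a random sort key.
SELECT ones.*
FROM (SELECT TOP (2) * FROM @myTable WHERE label = 1 ORDER BY NEWID()) AS ones
UNION ALL
-- All rows of the minority class; no TOP needed.
SELECT zeros.*
FROM @myTable AS zeros
WHERE zeros.label = 0;
```

The ORDER BY is legal here because it sits inside a derived table together with TOP, not directly in a branch of the UNION.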

Related

PostgreSQL - Filtering result set by array column

I have a function which returns a table. One of the columns happens to be a text array. Currently the values in this array column will only ever have at most 2 elements, however there is the instance when the same row will be returned twice with the duplicate row elements in the opposite order. I'm hoping to find a way to only return 1 of these rows and discard the other. To give an example, I run a function as
SELECT * FROM schema.function($1,$2,$3)
WHERE conditions...;
which returns me something like this
ID | ARRAY_COL | ...
1 | {'Good','Day'} | ...
2 | {'Day','Good'} | ...
3 | {'Stuck'} | ...
4 | {'with'} | ...
5 | {'array'} | ...
6 | {'filtering'} | ...
So in this example, I want to return the whole result set with the exception that I only want either row 1 or 2 as they have the same elements in the array (albeit inverted with respect to each other). I'm aware this is probably a bit of a messy problem, but it's something I need to get to the bottom of. Ideally I would like to stick a WHERE clause at the end of my function call which forced the result set to ignore any array value that had the same elements as a previous row. Pseudo code might be something like
SELECT * FROM schema.function($1,$2,$3)
WHERE NOT array_col @> (any previous array_col value);
Any pointers in the right direction would be much appreciated, thanks.
I'm not sure this is the best solution, but it could work, especially in cases where you might have partially overlapping arrays.
The solution has 3 steps:
1. Unnest the array_col column and order by id and the item value:
select
id,
unnest(array_col) array_col_val
from dataset
order by
id,
array_col_val
2. Regroup by id. The ORDER BY inside array_agg makes the element order deterministic, so the rows with id 1 and 2 now have the same array_col value:
select id,
array_agg(array_col_val order by array_col_val) array_col
from ordering
group by id
3. Select the min id, grouping by array_col:
select array_col, min(id) from regrouping group by array_col
Full statement
with dataset as (
select 1 id, ARRAY['Good','Day'] array_col UNION ALL
select 2 id, ARRAY['Day','Good'] array_col UNION ALL
select 3 id, ARRAY['Stuck'] array_col UNION ALL
select 4 id, ARRAY['with'] array_col UNION ALL
select 5 id, ARRAY['array'] array_col UNION ALL
select 6 id, ARRAY['filtering'] array_col
)
, ordering as
(select id,
unnest(array_col) array_col_val
from dataset
order by id,
array_col_val)
, regrouping as
(
select id,
-- sort inside the aggregate; the order of the CTE rows alone is not guaranteed to carry over
array_agg(array_col_val order by array_col_val) array_col
from ordering
group by id
)
select array_col, min(id) from regrouping group by array_col;
Result
array_col | min
---------------+-----
{Stuck} | 3
{array} | 5
{filtering} | 6
{with} | 4
{Day,Good} | 1
(5 rows)
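If the original rows (including the array in its original element order) should be kept, the same idea can be sketched with PostgreSQL's DISTINCT ON: compute a sorted copy of each array and keep one row per sorted copy. This is only an illustration under assumptions; the dataset CTE stands in for the result of the function call, and the alias sorted_col is a made-up name.

```sql
WITH dataset(id, array_col) AS (
    VALUES (1, ARRAY['Good','Day']),
           (2, ARRAY['Day','Good']),
           (3, ARRAY['Stuck'])
)
SELECT DISTINCT ON (s.sorted_col) d.id, d.array_col
FROM dataset AS d
CROSS JOIN LATERAL (
    -- Sorted copy of the array, used only as the deduplication key.
    SELECT array_agg(x ORDER BY x) AS sorted_col
    FROM unnest(d.array_col) AS x
) AS s
-- DISTINCT ON keeps the first row per key, here the one with the lowest id.
ORDER BY s.sorted_col, d.id;
```

With this shape the dataset CTE could be replaced by the function call from the question, and any other columns of the row remain available.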

Stored procedure: set output parameter from first row of select

I have a stored procedure in SQL Server 2014 that selects some rows from a table with pagination, along with total row count:
SELECT
[...], COUNT(*) OVER () AS RowCount
FROM
[...]
WHERE
[...]
ORDER BY
[...]
OFFSET ([..]) ROWS FETCH NEXT 3 ROWS ONLY
Output:
+----+------+----------+
| ID | Name | RowCount |
+----+------+----------+
| 1 | Bob | 55 |
| 123| John | 55 |
| 99 | Jack | 55 |
+----+------+----------+
I would like to return results with actual data only, passing RowCount in an output parameter.
+----+------+
| ID | Name |
+----+------+
| 1 | Bob |
| 123| John |
| 99 | Jack |
+----+------+
@OutRowCount = 55
I tried with a CTE, but CTE is available only within the first SELECT:
WITH CTE AS
(
SELECT [...], COUNT(*) OVER () AS RowCount
FROM [...]
WHERE [...]
ORDER BY [...]
OFFSET ([..]) ROWS FETCH NEXT 3 ROWS ONLY
)
SELECT
ID, Name
FROM
CTE
SET @OutRowCount = (SELECT TOP 1 RowCount FROM CTE) -- here CTE is no longer defined
How can I do this? I think I can use temp table but I wonder if in this case performance might be an issue.
The "total row count" you have in mind is a bit unclear. Typically when paging you also display the total number of (filtered) rows, e.g. "Showing 3 of 42 Blue Widgets". That doesn't involve Max.
A WITH clause can define multiple CTEs, e.g.:
with
AllRows as ( -- All of the filtered rows.
select ..., Count(*) over (...) as [RowCount] -- ROWCOUNT is a reserved keyword, hence the brackets.
from ...
where ... -- Filter criteria.
),
SinglePage as ( -- One page of filtered rows.
select ...
from AllRows
order by ... -- Order here to get the correct rows in the page.
offset (...) rows fetch next 3 rows only
)
select SP.Id, SP.Name,
( select Count(42) from AllRows ) as TotalRowCount -- Constant over all rows.
from SinglePage as SP
order by ...; -- Keep the rows in the desired order.
Re: SET @OutRowCount = (SELECT TOP 1 RowCount FROM CTE)
Note that TOP 1 without order by isn't guaranteed to pick the row you have in mind.
Thanks to @Larnu and @Stu, I solved this using a table variable, this way:
CREATE PROCEDURE MyProc
@OutRowCount INT OUTPUT
AS
BEGIN
DECLARE @TempTbl TABLE (
ID INT,
Name VARCHAR(MAX),
[RowCount] INT -- ROWCOUNT is a reserved keyword, so the name is bracketed
)
INSERT INTO
@TempTbl
SELECT
ID,
Name,
COUNT(*) OVER() AS TotalRows
FROM
MyTable
WHERE
[...]
ORDER BY
Name
OFFSET ([...]) ROWS FETCH NEXT 3 ROWS ONLY
SELECT
ID,
Name
FROM
@TempTbl
ORDER BY
Name -- rows read back from a table variable have no guaranteed order
SET @OutRowCount = ISNULL((SELECT TOP 1 [RowCount] FROM @TempTbl), 0)
END
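For completeness, a caller reads the output parameter by passing a variable marked OUTPUT. This is a minimal usage sketch that assumes the MyProc procedure above has been created; the variable name @Total is made up for illustration.

```sql
DECLARE @Total INT;

-- The first result set is the page of rows; the count comes back in @Total.
EXEC MyProc @OutRowCount = @Total OUTPUT;

SELECT @Total AS TotalRows;
```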

SQL: doing several joins to evaluate different columns

I'm quite new to SQL but use it a lot in my work now (Microsoft SQL Server).
So the issue is this: I collect data that is atypical for a certain column.
Let's say I got different Burgers and they should have a standardized calories value. So I did this with a query
------------------------------------------
| Burger | calories | numBurgers | Rank |
------------------------------------------
| Chicken| 600 | 20 | 1 |
| Chicken| 400 | 3 | 2 |
| Beef | 700 | 35 | 1 |
| Beef | 850 | 4 | 2 |
-------------------------------------------
To get a list of all the "wrong" burgers I use a temporary table and filter out Rank = 1
USE database;
GO
WITH GapRanking AS
(
SELECT TOP 100 PERCENT Burger, calories, COUNT(calories),
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
)
SELECT * FROM GapRanking
WHERE Rank <> 1
...
I get all combinations of Burgers and calories that are not "standard"
Then I do an Inner Join with the original table and all columns on the one above.
SELECT * FROM BaseTable as base
INNER JOIN
(SELECT * FROM GapRanking
WHERE Rank <> 1) AS err
ON (base.Burgers = err.Burgers
AND base.calories = err.calories)
This way I get a table with complete information about the "not-standard" burgers. So far so good.
Now I want to add other rows where there is a deviation in another criterion, price for example, not just calories, and add them to the list if they are not already there.
So I thought of UNION or JOIN.
So what is the best approach? UNION the above query with the same query on a different column (price instead of calories)?
Or do a JOIN with the same query on a different column (price instead of calories)?
The code gets quite "ugly" and I'm not sure if my approach is right.
Also, because I build the temporary table using WITH, a UNION does not seem to be possible so easily.
I'm really glad for any ideas here. Cheers
Use subqueries and a join. The following is only pseudo-code, not a working query, but you can follow this pattern:
select t1.*, t2.required_column
from
(SELECT TOP 100 PERCENT Burger, calories, COUNT(calories),
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
) as t1
join
(SELECT TOP 100 PERCENT Burger, calories, COUNT(calories),
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
) as t2
on t1.colname = t2.colname
where t1.Rank != 1 and t2.Rank != 1
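Regarding the concern in the question that a UNION is hard to combine with WITH: a single WITH clause can define several CTEs, and the statement that follows can UNION queries over them. The following is a minimal sketch under assumptions, not a drop-in solution: the table variable @BaseTable, its price column, the CTE names and the sample rows are all invented for illustration.

```sql
-- Hypothetical sample data; the real BaseTable will have more columns.
DECLARE @BaseTable TABLE (Burger VARCHAR(20), calories INT, price DECIMAL(5,2));
INSERT INTO @BaseTable (Burger, calories, price) VALUES
    ('Chicken', 600, 5.00), ('Chicken', 600, 5.00), ('Chicken', 400, 5.00),
    ('Beef',    700, 6.00), ('Beef',    700, 6.00), ('Beef',    700, 7.50);

WITH CalorieRanking AS (
    -- rn = 1 marks the most frequent ("standard") calories value per burger.
    SELECT Burger, calories,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY COUNT(*) DESC) AS rn
    FROM @BaseTable
    GROUP BY Burger, calories
),
PriceRanking AS (
    -- Same ranking, applied to price.
    SELECT Burger, price,
           ROW_NUMBER() OVER (PARTITION BY Burger ORDER BY COUNT(*) DESC) AS rn
    FROM @BaseTable
    GROUP BY Burger, price
)
SELECT b.*
FROM @BaseTable AS b
JOIN CalorieRanking AS c ON c.Burger = b.Burger AND c.calories = b.calories
WHERE c.rn <> 1
UNION   -- UNION (not UNION ALL) so a row deviating in both columns is listed once
SELECT b.*
FROM @BaseTable AS b
JOIN PriceRanking AS p ON p.Burger = b.Burger AND p.price = b.price
WHERE p.rn <> 1;
```

Because UNION removes duplicates across all selected columns, it also collapses base rows that are completely identical; if the real table has a key column, selecting it keeps such rows apart.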

Order by number of matches in array

I'm trying to get a list of best matching items giving a list of tags with the data below:
DROP TABLE IF EXISTS testing_items;
CREATE TEMP TABLE testing_items(
id bigserial primary key,
tags text[]
);
CREATE INDEX ON testing_items using gin (tags);
INSERT INTO testing_items (tags) VALUES ('{123,456, abc}');
INSERT INTO testing_items (tags) VALUES ('{222,333}');
INSERT INTO testing_items (tags) VALUES ('{222,555}');
INSERT INTO testing_items (tags) VALUES ('{222,123}');
INSERT INTO testing_items (tags) VALUES ('{222,123,555,666}');
I have the tags 222, 555 and 666. How can I get a list like this?
id | matches
----+---------
5 | 3
3 | 2
2 | 1
4 | 1
id 1 should not be in the list because it doesn't match any tag:
id | matches
----+---------
1 | 0
A GIN index must be used because there will be tons of records.
Unnest tags, filter unnested elements and aggregate remaining ones:
select id, count(distinct u) as matches
from (
select id, u
from testing_items,
lateral unnest(tags) u
where u in ('222', '555', '666')
) s
group by 1
order by 2 desc
id | matches
----+---------
5 | 3
3 | 2
2 | 1
4 | 1
(4 rows)
Considering all the answers, it seems that this query combines good sides of each of them:
select id, count(*)
from testing_items,
unnest(array['11','5','8']) u
where tags @> array[u]
group by id
order by 2 desc, 1;
It has the best performance in Eduardo's test.
Here's my two cents using unnest and array contains:
select id, count(*)
from (
select unnest(array['222','555','666']) as tag, *
from testing_items
) as w
where tags @> array[tag]
group by id
order by 2 desc
Results:
+------+---------+
| id | count |
|------+---------|
| 5 | 3 |
| 3 | 2 |
| 2 | 1 |
| 4 | 1 |
+------+---------+
This is how I tested with 10 million records with 3 tags each with random numbers between 0 and 100:
BEGIN;
LOCK TABLE testing_items IN EXCLUSIVE MODE;
INSERT INTO testing_items (tags) SELECT (ARRAY[trunc(random() * 99 + 1), trunc(random() * 99 + 1), trunc(random() * 99 + 1)]) FROM generate_series(1, 10000000) s;
COMMIT;
I added ORDER BY c DESC, id LIMIT 5 so that I don't have to wait for big result sets.
The solutions from @paqash and @klin have similar performance. My laptop runs them in 12 seconds with the tags 11, 8 and 5.
But this runs in 4.6 seconds:
SELECT id, count(*) as c
FROM (
SELECT id FROM testing_items WHERE tags @> '{11}'
UNION ALL
SELECT id FROM testing_items WHERE tags @> '{8}'
UNION ALL
SELECT id FROM testing_items WHERE tags @> '{5}'
) as items
GROUP BY id
ORDER BY c DESC, id
LIMIT 5
But I still think there is a faster way.
Check it here: http://rextester.com/UTGO74511
If you are using a GIN index, use &&:
select *
from testing_items
where not (ARRAY['333','555','666'] && tags);
id | tags
----+---------------
1 | {123,456,abc}
4 | {222,123}
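The overlap operator can also serve as a prefilter for the counting problem in the question: keep only rows that share at least one tag, then count the matching tags per row. This is a rough sketch rather than a benchmarked solution; it recreates the sample table from the question, and the alias names are made up.

```sql
-- Same sample data as in the question.
CREATE TEMP TABLE testing_items (
    id   bigserial PRIMARY KEY,
    tags text[]
);
CREATE INDEX ON testing_items USING gin (tags);
INSERT INTO testing_items (tags) VALUES
    ('{123,456,abc}'), ('{222,333}'), ('{222,555}'), ('{222,123}'), ('{222,123,555,666}');

SELECT i.id,
       -- Count the row's tags that appear in the search list.
       (SELECT count(*)
        FROM unnest(i.tags) AS t(tag)
        WHERE t.tag = ANY (ARRAY['222','555','666'])) AS matches
FROM testing_items AS i
WHERE i.tags && ARRAY['222','555','666']   -- overlap test; this is the part a GIN index can serve
ORDER BY matches DESC, i.id;
```

For the sample rows this should list ids 5, 3, 2 and 4 with 3, 2, 1 and 1 matches. A tag stored twice in the same row would be counted twice, and whether the planner actually uses the index depends on table size and statistics.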

SQL Server - Transpose rows into columns

I've searched high and low for an answer to this so apologies if it's already answered!
I have the following result from a query in SQL 2005:
ID
1234
1235
1236
1267
1278
What I want is
column1|column2|column3|column4|column5
---------------------------------------
1234 |1235 |1236 |1267 |1278
I can't quite get my head around the pivot operator but this looks like it's going to be involved. I can work with there being only 5 rows for now but a bonus would be for it to be dynamic, i.e. can scale to x rows.
EDIT:
What I'm ultimately after is assigning the values of each resulting column to variables, e.g.
DECLARE @id1 int, @id2 int, @id3 int, @id4 int, @id5 int
SELECT @id1 = column1, @id2 = column2, @id3 = column3, @id4 = column4,
@id5 = column5 FROM [transposed_table]
You also need a value field in your query for each id to aggregate on. Then you can do something like this
select [1234], [1235]
from
(
-- replace code below with your query, e.g. select id, value from table
select
id = 1234,
value = 1
union
select
id = 1235,
value = 2
) a
pivot
(
avg(value) for id in ([1234], [1235])
) as pvt
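Since the stated goal in the question is to assign each value to a variable, another option is to number the rows and use conditional aggregation, which avoids hard-coding the ID values. This is only a sketch under assumptions: the table variable @ids stands in for the query result, and the rows are numbered in ascending ID order.

```sql
-- Hypothetical stand-in for the query that returns the five IDs.
DECLARE @ids TABLE (ID INT);
INSERT INTO @ids (ID)
SELECT 1234 UNION ALL SELECT 1235 UNION ALL SELECT 1236
UNION ALL SELECT 1267 UNION ALL SELECT 1278;

DECLARE @id1 INT, @id2 INT, @id3 INT, @id4 INT, @id5 INT;

-- Each CASE picks the ID of one row number; MAX collapses the rows into one.
SELECT @id1 = MAX(CASE WHEN rn = 1 THEN ID END),
       @id2 = MAX(CASE WHEN rn = 2 THEN ID END),
       @id3 = MAX(CASE WHEN rn = 3 THEN ID END),
       @id4 = MAX(CASE WHEN rn = 4 THEN ID END),
       @id5 = MAX(CASE WHEN rn = 5 THEN ID END)
FROM (SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS rn FROM @ids) AS numbered;

SELECT @id1 AS column1, @id2 AS column2, @id3 AS column3,
       @id4 AS column4, @id5 AS column5;
```

If fewer than five rows exist, the remaining variables are simply left NULL.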
I think you'll find the answer in this answer to a slightly different question: Generate "scatter plot" result of members against sets from SQL query
The answer uses Dynamic SQL. Check out the last link in mellamokb's answer: http://www.sqlfiddle.com/#!3/c136d/14 where he creates column names from row data.
In case you have a grouped flat data structure that you want to group transpose, like such:
GRP | ID
---------------
1 | 1234
1 | 1235
1 | 1236
1 | 1267
1 | 1278
2 | 1234
2 | 1235
2 | 1267
2 | 1289
And you want its group transposition to appear like:
GRP | Column 1 | Column 2 | Column 3 | Column 4 | Column 5
-------------------------------------------------------------
1 | 1234 | 1235 | 1236 | 1267 | 1278
2 | 1234 | 1235 | NULL | 1267 | NULL
You can accomplish it with a query like this:
SELECT
Column1.GRP AS GRP,
Column1.ID AS column1,
Column2.ID AS column2,
Column3.ID AS column3,
Column4.ID AS column4,
Column5.ID AS column5
FROM
(SELECT GRP, ID FROM FlatTable WHERE ID = 1234) AS Column1
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1235) AS Column2
ON Column1.GRP = Column2.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1236) AS Column3
ON Column1.GRP = Column3.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1267) AS Column4
ON Column1.GRP = Column4.GRP
LEFT OUTER JOIN
(SELECT GRP, ID FROM FlatTable WHERE ID = 1278) AS Column5
ON Column1.GRP = Column5.GRP
(1) This assumes you know ahead of time which columns you will want — notice that I intentionally left out ID = 1289 from this example
(2) This basically uses a bunch of left outer joins to append 1 column at a time, thus creating the transposition. The left outer joins (rather than inner joins) allow for some columns to be null if they don't have corresponding values from the flat table, without affecting any subsequent columns.
