SQL - split data into two columns based on identifier

SQL - split data into two columns based on identifier - sql-server

I have a table that contains the product categories/headers and product names but all in one column. I need to split it out into separate Category and Product columns. I also have a helper column Header_Flag which can be used to determine if the row contains the header or product name. The input table looks like this:
|---------------------|------------------|
| Product | Header_Flag |
|---------------------|------------------|
| Furniture | Y |
| Bed | N |
| Table | N |
| Chair | N |
| Cosmetics | Y |
| Lip balm | N |
| Lip stick | N |
| Eye liner | N |
| Apparel | Y |
| Shirt | N |
| Trouser | N |
|---------------------|------------------|
The output format I'm looking for would be like this:
|---------------------|------------------|
| Category | Product |
|---------------------|------------------|
| Furniture | Bed |
| Furniture | Table |
| Furniture | Chair |
| Cosmetics | Lip balm |
| Cosmetics | Lip stick |
| Cosmetics | Eye liner |
| Apparel | Shirt |
| Apparel | Trouser |
|---------------------|------------------|

With the data is it stands, you cannot get the results you are after. To be able to achieve this, you need to be able to order your data, using an ORDER BY clause, and ordering on either of these column does not achieve the same result as the sample data:
CREATE TABLE dbo.YourTable (Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable
ORDER BY Product;
GO
SELECT *
FROM dbo.YourTable
ORDER BY HeaderFlag
GO
DROP TABLE dbo.YourTable;
AS you can see, the orders both differ.
If you add a column you can order on though (I'm going to use an IDENTITY) then you can achieve this:
CREATE TABLE dbo.YourTable (I int IDENTITY, Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable YT
ORDER BY I;
Then you can use a cumulative COUNT to put the values into groups and get the header:
WITH Grps AS(
SELECT YT.I,
YT.Product,
YT.HeaderFlag,
COUNT(CASE YT.HeaderFlag WHEN 'Y' THEN 1 END) OVER (ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM dbo.YourTable YT),
Split AS(
SELECT G.I,
MAX(CASE G.HeaderFlag WHEN 'Y' THEN Product END) OVER (PARTITION BY G.Grp) AS Category,
G.Product,
G.HeaderFlag
FROM Grps G)
SELECT S.Category,
S.Product
FROM Split S
WHERE HeaderFlag = 'N'
ORDER BY S.I;

Related

How to identify valid records based on column values in snowflake

I have a table as below
I want output like below
This means I have few predefined pairs, example
if one employee is coming from both HR_INTERNAL and HR_EXTERNAL, take only that record which is from HR_INTERNAL
if one employee is coming from both SALES_INTERNAL and SALES_EXTERNAL, take only that record which is from SALES_INTERNAL
etc.
Is there a way to achieve this?
I used ROW_NUMBER to rank
ROW_NUMBER() OVER(PARTITION BY "EMPID" ORDER BY SOURCESYSTEM ASC) AS RANK_GID

I just put them on a table like this:
create or replace table predefined_pairs ( pairs ARRAY );
insert into predefined_pairs select [ 'HR_INTERNAL', 'HR_EXTERNAL' ] ;
insert into predefined_pairs select [ 'SALES_INTERNAL', 'SALES_EXTERNAL' ] ;
Then I use the following query to produce the output you wanted:
select s.sourcesystem, s.empid,
CASE WHEN COUNT(1) OVER(PARTITION BY EMPID) = 1 THEN 'ValidRecord'
WHEN p.pairs[0] IS NULL THEN 'ValidRecord'
WHEN p.pairs[0] = s.sourcesystem THEN 'ValidRecord'
ELSE 'InvalidRecord'
END RecordValidity
from source s
left join predefined_pairs p on array_contains( s.sourcesystem::VARIANT, p.pairs ) ;
+-------------------+--------+----------------+
| SOURCESYSTEM | EMPID | RECORDVALIDITY |
+-------------------+--------+----------------+
| HR_INTERNAL | EMP001 | ValidRecord |
| HR_EXTERNAL | EMP001 | InvalidRecord |
| SALES_INTERNAL | EMP002 | ValidRecord |
| SALES_EXTERNAL | EMP002 | InvalidRecord |
| HR_EXTERNAL | EMP004 | ValidRecord |
| SALES_INTERNAL | EMP005 | ValidRecord |
| PURCHASE_INTERNAL | EMP003 | ValidRecord |
+-------------------+--------+----------------+

SQL QUERY - check column has 'letter' or '-' then bring value from another column

I have table 1 data and I need to create another column as on criteria
If id has "-" or "any letters" bring value from invoice
Table1
+---------+---------+
| id | invoice |
+---------+---------+
| 1234 | 2534 |
| 9870 | 6542 |
| ABC234 | 9874 |
| 34-5469 | 325416 |
+---------+---------+
Expected Result as id2
+---------+---------+--------+
| id | invoice | id2 |
+---------+---------+--------+
| 1234 | 2534 | 1234 |
| 9870 | 6542 | 9870 |
| ABC234 | 9874 | 9874 |
| 34-5469 | 325416 | 325416 |
+---------+---------+--------+

You can use isnumeric function to find out whether the id is int or not.
select *,
case when isnumeric(id) = 1 then id else invoice end as id2
from [yourtable]
Edit: isnumeric is not providing credible results all the time and hence if you are using SQL server 2012 or 2014 and above you may go for try_cast
select *
,case when try_cast(id as int) is not null then id else invoice end as id2
from [yourtable]

Assuming that you are just looking for values that have a letter or a hyphen (-) you could use a CASE expression and a LIKE like this:
SELECT CASE WHEN id LIKE '%[A-z-]%' THEN invoice ELSE id END
FROM dbo.YourTable;
What would likely be better, however, is to check that id doesn't value any characters apart from digits:
SELECT CASE WHEN id LIKE '%[^0-9]%' THEN invoice ELSE id END
FROM dbo.YourTable;

Check if records exists at least once in LEFT JOINED Table

I have an Images, Orders and OrderItems table, I want to match for any images, if any has already been bought by the User passed as parameters by displaying true or false in an IsBought column.
Select Images.Id,
Images.Title,
Images.Description,
Images.Location,
Images.PriceIT,
Images.PostedAt,
CASE WHEN OrderItems.ImageId = Images.Id THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT) END
AS 'IsBought'
FROM Images
INNER JOIN Users as u on Images.UserId = u.Id
LEFT JOIN Orders on Orders.UserId = #userId
LEFT JOIN OrderItems on Orders.Id = OrderItems.OrderId and OrderItems.ImageId = Images.Id
Group By Images.Id,
Images.Title,
Images.Description,
Images.Location,
Images.PriceIT,
Images.PostedAt,
OrderItems.ImageId,
Orders.UserId
When I use this CASE WHEN I have duplicates when the item has been bought where IsBought is True and the duplicate is False.
In the case where the Item has never been bought, there is no duplicates, IsBought is just equal to False
----------------------------------
| User | type |
----------------------------------
| Id | nvarchar(450) |
----------------------------------
| .......|
----------------------------------
----------------------------------
| Orders | type |
----------------------------------
| Id | nvarchar(255) |
----------------------------------
| UserId | nvarchar(450) |
----------------------------------
| ........................... |
----------------------------------
----------------------------------
| OrderItems | type |
----------------------------------
| Id | nvarchar(255) |
----------------------------------
| OrderId | nvarchar(255) |
----------------------------------
| ImageId | int |
----------------------------------
----------------------------------
| Images | type |
----------------------------------
| Id | int |
----------------------------------
| UserId | nvarchar(450) |
----------------------------------
| Title | nvarchar(MAX) |
----------------------------------
| Description| nvarhar(MAX) |
----------------------------------
| ......................... |
----------------------------------
Any ideas on how I could just have one row per Images with IsBought set to true or false but not duplicates?
I would like something like this:
----------------------------------------------------------------------------
| Id | Title | Description | Location | PriceIT | Location | IsBought |
----------------------------------------------------------------------------
| 1 | Eiffel Tower | .... | ...... | 20.0 | Paris | true |
----------------------------------------------------------------------------
| 2 | Tore di Pisa | .... | ...... | 20.0 | Italia | false |
---------------------------------------------------------------------------
| etc ......
---------------------------------------------------------------------------

Your query logic looks suspicious. It is unusual to see a join that consists only of a comparison of a column from the unpreserved table to a parameter. I suspect that you don't need a join to users at all since you seem to be focused on things "bought" by a person and not things "created" (which is implied by the name "author") by that same person. And a group by clause with no aggregate is often a cover-up for a logically flawed query.
So start over. You want to see all images apparently. For each, you simply want to know if that image is associated with any order of a given person.
select img.*, -- you would, or course, only select the columns needed
(select count(*) from Sales.SalesOrderDetail as orddet
where orddet.ProductID = img.ProductID) as [Order Count],
(select count(*) from Sales.SalesOrderDetail as orddet
inner join Sales.SalesOrderHeader as ord
on orddet.SalesOrderID = ord.SalesOrderID
where orddet.ProductID = img.ProductID
and ord.CustomerID = 29620
) as [User Order Count],
case when exists(select * from Sales.SalesOrderDetail as orddet
inner join Sales.SalesOrderHeader as ord
on orddet.SalesOrderID = ord.SalesOrderID
where orddet.ProductID = img.ProductID
and ord.CustomerID = 29620) then 1 else 0 end as [Has Ordered]
from Production.ProductProductPhoto as img
where img.ProductID between 770 and 779
order by <something useful>;
Notice the aliases - it is much easier to read a long query when you use aliases that are shorter but still understandable (i.e., not single letters). I've included 3 different subqueries to help you understand correlation and how you can build your logic to achieve your goal and help debug any issues you find.
This is based on AdventureWorks sample database - which you should install and use as a learning tool (and to help facilitate discussions with others using a common data source). Note that I simply picked a random customer ID value - you would use your parameter. I filtered the query to a range of images to simplify debugging. Those are very simple but effective methods to help write and debug sql.

How do you create a query which returns dynamic column names in Postgresql?

I have two tables in a reporting database, one for orders, and one for order items. Each order can have multiple order items, along with a quantity for each:
Orders
+----------+---------+
| order_id | email |
+----------+---------+
| 1 | 1#1.com |
+----------+---------+
| 2 | 2#2.com |
+----------+---------+
| 3 | 3#3.com |
+----------+---------+
Order Items
+---------------+----------+----------+--------------+
| order_item_id | order_id | quantity | product_name |
+---------------+----------+----------+--------------+
| 1 | 1 | 1 | Tee Shirt |
+---------------+----------+----------+--------------+
| 2 | 1 | 3 | Jeans |
+---------------+----------+----------+--------------+
| 3 | 1 | 1 | Hat |
+---------------+----------+----------+--------------+
| 4 | 2 | 2 | Tee Shirt |
+---------------+----------+----------+--------------+
| 5 | 3 | 3 | Tee Shirt |
+---------------+----------+----------+--------------+
| 6 | 3 | 1 | Jeans |
+---------------+----------+----------+--------------+
For reporting purposes, I'd love to denormalise this data into a separate PostgreSQL view (or just run a query) that turns the data above into something like this:
+----------+---------+-----------+-------+-----+
| order_id | email | Tee Shirt | Jeans | Hat |
+----------+---------+-----------+-------+-----+
| 1 | 1#1.com | 1 | 3 | 1 |
+----------+---------+-----------+-------+-----+
| 2 | 2#2.com | 2 | 0 | 0 |
+----------+---------+-----------+-------+-----+
| 3 | 3#3.com | 3 | 1 | 0 |
+----------+---------+-----------+-------+-----+
ie, it's a sum of the quantity of each item within the order with the product name; and the product names set as the column titles. Do I need to use something like crosstab to do this, or is there a clever way using subqueries even if I don't know the list of distinct product names at before the query runs.

This is one possible answer:
create table orders
(
orders_id int PRIMARY KEY,
email text NOT NULL
);
create table orders_items
(
order_item_id int PRIMARY KEY,
orders_id int REFERENCES orders(orders_id) NOT NULL,
quantity int NOT NULL,
product_name text NOT NULL
);
insert into orders VALUES (1, '1#1.com');
insert into orders VALUES (2, '2#2.com');
insert into orders VALUES (3, '3#3.com');
insert into orders_items VALUES (1,1,1,'T-Shirt');
insert into orders_items VALUES (2,1,3,'Jeans');
insert into orders_items VALUES (3,1,1,'Hat');
insert into orders_items VALUES (4,2,2,'T-Shirt');
insert into orders_items VALUES (5,3,3,'T-Shirt');
insert into orders_items VALUES (6,3,1,'Jeans');
select
orders.orders_id,
email,
COALESCE(tshirt.quantity, 0) as "T-Shirts",
COALESCE(jeans.quantity,0) as "Jeans",
COALESCE(hat.quantity, 0) as "Hats"
from
orders
left join (select orders_id, quantity from orders_items where product_name = 'T-Shirt')
as tshirt ON (tshirt.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Jeans')
as jeans ON (jeans.orders_id = orders.orders_id)
left join (select orders_id, quantity from orders_items where product_name = 'Hat')
as hat ON (hat.orders_id = orders.orders_id)
;
Tested with postgresql. Result:
orders_id | email | T-Shirts | Jeans | Hats
-----------+---------+----------+-------+------
1 | 1#1.com | 1 | 3 | 1
2 | 2#2.com | 2 | 0 | 0
3 | 3#3.com | 3 | 1 | 0
(3 rows)
Based on your comment, you can try to use tablefunc like this:
CREATE EXTENSION tablefunc;
SELECT * FROM crosstab
(
'SELECT orders_id, product_name, quantity FROM orders_items ORDER BY 1',
'SELECT DISTINCT product_name FROM orders_items ORDER BY 1'
)
AS
(
orders_id text,
TShirt text,
Jeans text,
Hat text
);
But I think you are thinking the wrong way about SQL. You usually know which rows you want and have to tell it SQL. "Rotating tables" 90 degrees is not part of SQL and should be avoided.

MySQL I want to optimize this further

So I started off with this query:
SELECT * FROM TABLE1 WHERE hash IN (SELECT id FROM temptable);
It took forever, so I ran an explain:
mysql> explain SELECT * FROM TABLE1 WHERE hash IN (SELECT id FROM temptable);
+----+--------------------+-----------------+------+---------------+------+---------+------+------------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------------+------+---------------+------+---------+------+------------+-------------+
| 1 | PRIMARY | TABLE1 | ALL | NULL | NULL | NULL | NULL | 2554388553 | Using where |
| 2 | DEPENDENT SUBQUERY | temptable | ALL | NULL | NULL | NULL | NULL | 1506 | Using where |
+----+--------------------+-----------------+------+---------------+------+---------+------+------------+-------------+
2 rows in set (0.01 sec)
It wasn't using an index. So, my second pass:
mysql> explain SELECT * FROM TABLE1 JOIN temptable ON TABLE1.hash=temptable.hash;
+----+-------------+-----------------+------+---------------+----------+---------+------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+----------+---------+------------------------+------+-------------+
| 1 | SIMPLE | temptable | ALL | hash | NULL | NULL | NULL | 1506 | |
| 1 | SIMPLE | TABLE1 | ref | hash | hash | 5 | testdb.temptable.hash | 527 | Using where |
+----+-------------+-----------------+------+---------------+----------+---------+------------------------+------+-------------+
2 rows in set (0.00 sec)
Can I do any other optimization?

You can gain some more speed by using a covering index, at the cost of extra space consumption. A covering index is one which can satisfy all requested columns in a query without performing a further lookup into the clustered index.
First of all get rid of the SELECT * and explicitly select the fields that you require. Then you can add all the fields in the SELECT clause to the right hand side of your composite index. For example, if your query will look like this:
SELECT first_name, last_name, age
FROM table1
JOIN temptable ON table1.hash = temptable.hash;
Then you can have a covering index that looks like this:
CREATE INDEX ix_index ON table1 (hash, first_name, last_name, age);

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

SQL - split data into two columns based on identifier - sql-server

Related

How to identify valid records based on column values in snowflake

SQL QUERY - check column has 'letter' or '-' then bring value from another column

Check if records exists at least once in LEFT JOINED Table

How do you create a query which returns dynamic column names in Postgresql?

MySQL I want to optimize this further

Categories

Resources