I have two tables, ItemDetails and ItemHeaders.
ItemHeaders:
ItemID ItemName
1 Apple
2 Orange
3 Grapes
ItemDetails:
ID ItemHeader1 ItemHeader2 ItemHeader3
1 1 2 1
2 3 2 1
3 2 1 2
4 2 3 3
Output:
ID Category1 Category2 Category3
1 Apple Orange Apple
2 Grapes Orange Apple
3 Orange Apple Orange
4 Orange Grapes Grapes
My Query:
Select d.ID, i1.ItemName as Category1, i2.ItemName as Category2, i3.ItemName as Category3
From ItemDetails d
Left Join ItemHeaders i1 on d.ItemHeader1 = i1.ItemID
Left Join ItemHeaders i2 on d.ItemHeader2 = i2.ItemID
Left Join ItemHeaders i3 on d.ItemHeader3 = i3.ItemID
Question: This is sample data; I have 50,000 records in ItemDetails, and my query takes a long time to run. Can someone suggest how to optimize the query, or a better way to achieve the above result? Please let me know if the question or query is unclear.
Edit: There is an index on ItemID. You mentioned PIVOT; how can I use it to get my result? Also, there are actually 10 header columns, not 3; I have shown only 3 here.
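The question does not name the database engine, so the sketch below runs the same three LEFT JOINs in SQLite through Python's sqlite3 module, only to check the join logic against the sample data. It says nothing about performance at 50,000 rows. The column types and primary keys are assumptions; declaring ItemID as INTEGER PRIMARY KEY gives the lookup side an index, as the question says the real table has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema: ItemID is the indexed lookup key.
CREATE TABLE ItemHeaders (ItemID INTEGER PRIMARY KEY, ItemName TEXT);
INSERT INTO ItemHeaders VALUES (1, 'Apple'), (2, 'Orange'), (3, 'Grapes');

CREATE TABLE ItemDetails (ID INTEGER PRIMARY KEY, ItemHeader1 INTEGER,
                          ItemHeader2 INTEGER, ItemHeader3 INTEGER);
INSERT INTO ItemDetails VALUES (1, 1, 2, 1), (2, 3, 2, 1), (3, 2, 1, 2), (4, 2, 3, 3);
""")

# One LEFT JOIN per header column, each resolving an ID to its name.
rows = conn.execute("""
SELECT d.ID, i1.ItemName AS Category1, i2.ItemName AS Category2, i3.ItemName AS Category3
FROM ItemDetails d
LEFT JOIN ItemHeaders i1 ON d.ItemHeader1 = i1.ItemID
LEFT JOIN ItemHeaders i2 ON d.ItemHeader2 = i2.ItemID
LEFT JOIN ItemHeaders i3 ON d.ItemHeader3 = i3.ItemID
ORDER BY d.ID
""").fetchall()

for row in rows:
    print(row)
```

With 10 header columns the same pattern needs 10 joins against ItemHeaders, which is what makes the comments below about table design and paging relevant.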
Apart from the index: I hope LEFT JOIN is correct, i.e. your requirement does not allow an INNER JOIN.
What is the purpose of ItemDetails?
You could have one column called itemHeader plus a type column.
Why do you need to fetch 50,000+ rows in one go? Why not use paging?
You can also do the pivot in the front end.
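One reading of the "one itemHeader column plus a type" suggestion is a normalized table with one row per (detail, header position), which needs a single join and is then pivoted back to columns. The sketch below assumes that layout; the table name ItemDetailHeaders and the column HeaderType are hypothetical. It uses conditional aggregation (MAX over CASE), a different technique from SQL Server's PIVOT operator, and runs in SQLite via Python only to show the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ItemHeaders (ItemID INTEGER PRIMARY KEY, ItemName TEXT);
INSERT INTO ItemHeaders VALUES (1, 'Apple'), (2, 'Orange'), (3, 'Grapes');

-- Hypothetical normalized layout: one row per (detail ID, header position).
CREATE TABLE ItemDetailHeaders (
    ID INTEGER, HeaderType INTEGER, ItemHeader INTEGER,
    PRIMARY KEY (ID, HeaderType)
);
INSERT INTO ItemDetailHeaders VALUES
    (1, 1, 1), (1, 2, 2), (1, 3, 1),
    (2, 1, 3), (2, 2, 2), (2, 3, 1),
    (3, 1, 2), (3, 2, 1), (3, 3, 2),
    (4, 1, 2), (4, 2, 3), (4, 3, 3);
""")

# A single join, then one CASE per header position turns rows back into columns.
rows = conn.execute("""
SELECT d.ID,
       MAX(CASE WHEN d.HeaderType = 1 THEN h.ItemName END) AS Category1,
       MAX(CASE WHEN d.HeaderType = 2 THEN h.ItemName END) AS Category2,
       MAX(CASE WHEN d.HeaderType = 3 THEN h.ItemName END) AS Category3
FROM ItemDetailHeaders d
LEFT JOIN ItemHeaders h ON d.ItemHeader = h.ItemID
GROUP BY d.ID
ORDER BY d.ID
""").fetchall()

for row in rows:
    print(row)
```

For 10 headers, this grows by one CASE line per position rather than by one join per position.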
Related
Let's say I have the following data. How can I segment it in Google Data Studio to get the result shown in the second table?
Item Name | Category | Returned
Apples | Fruits | 0
Potato | Vegetables | 1
TV | Electronics | 2
Banana | Fruits | 2
Tomato | Vegetables | 0
Fridge | Electronics | 2
Grapes | Fruits | 1
Onion | Vegetables | 2
AC | Electronics | 2
Pineapple | Fruits | 0
Carrot | Vegetables | 1
Oven | Electronics | 1
I am looking for an end result like the following, where Returned (0), Returned (1) and Returned (2) are counts of items, not sums:
Category | Returned (0) | Returned (1) | Returned (2)
Fruits | 2 | 1 | 1
Vegetables | 1 | 2 | 1
Electronics | 0 | 1 | 3
I tried filtering, but the result did not appear correctly.
Create a new calculated field with the formula:
CONCAT("Returned(",Returned,")")
Next, create a pivot table chart with Category as the row dimension and the calculated field created above as the column dimension.
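Data Studio does this through its UI, so there is no code in the answer. Purely to illustrate what the pivot table computes, the Python sketch below counts records per (Category, calculated label) cell; the helper name label is hypothetical and only mirrors the CONCAT formula.

```python
from collections import Counter

# Sample rows from the question: (item name, category, returned).
data = [
    ("Apples", "Fruits", 0), ("Potato", "Vegetables", 1), ("TV", "Electronics", 2),
    ("Banana", "Fruits", 2), ("Tomato", "Vegetables", 0), ("Fridge", "Electronics", 2),
    ("Grapes", "Fruits", 1), ("Onion", "Vegetables", 2), ("AC", "Electronics", 2),
    ("Pineapple", "Fruits", 0), ("Carrot", "Vegetables", 1), ("Oven", "Electronics", 1),
]

def label(returned):
    # Same text the calculated field CONCAT("Returned(", Returned, ")") produces.
    return f"Returned({returned})"

# Count records per (row dimension, column dimension) cell.
counts = Counter((category, label(returned)) for _, category, returned in data)

columns = [label(n) for n in range(3)]
pivot = {
    category: [counts[(category, col)] for col in columns]  # missing cells count as 0
    for category in ("Fruits", "Vegetables", "Electronics")
}
for category, cells in pivot.items():
    print(category, cells)
```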
-
I'm trying to write a query and I'm having trouble with one thing. Suppose I have a table that looks like this:
id | Sample | Species | Quantity | Group
1 | 1 | AA | 5 | A
2 | 1 | AB | 6 | A
3 | 1 | AC | 10 | A
4 | 1 | CD | 15 | C
5 | 1 | CE | 20 | C
6 | 1 | DA | 13 | D
7 | 1 | DB | 7 | D
8 | 1 | EA | 6 | E
9 | 1 | EF | 4 | E
10 | 1 | EB | 2 | E
In the table above I have filtered to a single sample (but I have many). Each row has the species, the quantity of that species, and a functional group (there are only five groups, A to E). I would like a query that groups by sample and has one column per group counting the species in that group, something like this:
Sample | N_especies | Group A | Group B | Group C | Group D | Group E
1 | 10 | 3 | 0 | 2 | 2 | 3
Counting the species is easy, but I don't know how to make a column for each group. Can anyone help?
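You can use PIVOT (this assumes the Group column is named Grp, since GROUP is a reserved word; otherwise write [Group]):
SELECT a.Sample, [A], [B], [C], [D], [E],
       [A] + [B] + [C] + [D] + [E] AS N_especies
FROM (SELECT t.Sample, t.Grp FROM [WS_Database].[dbo].[test1] t) t
PIVOT (
    COUNT(Grp)
    FOR Grp IN ([A], [B], [C], [D], [E])
) a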
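A different technique, conditional aggregation (SUM over CASE), gives the same counts without PIVOT and also works in engines that lack the operator. The sketch runs it in SQLite via Python's sqlite3; the table name test1 follows the answer, while the column types are assumptions, and "Group" is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE test1 (id INTEGER, Sample INTEGER, Species TEXT, Quantity INTEGER, "Group" TEXT)')
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "AA", 5, "A"), (2, 1, "AB", 6, "A"), (3, 1, "AC", 10, "A"),
        (4, 1, "CD", 15, "C"), (5, 1, "CE", 20, "C"), (6, 1, "DA", 13, "D"),
        (7, 1, "DB", 7, "D"), (8, 1, "EA", 6, "E"), (9, 1, "EF", 4, "E"),
        (10, 1, "EB", 2, "E"),
    ],
)

# Each CASE yields 1 for rows in that group and 0 otherwise; SUM turns that into a count.
rows = conn.execute("""
SELECT Sample,
       COUNT(*) AS N_especies,
       SUM(CASE WHEN "Group" = 'A' THEN 1 ELSE 0 END) AS GroupA,
       SUM(CASE WHEN "Group" = 'B' THEN 1 ELSE 0 END) AS GroupB,
       SUM(CASE WHEN "Group" = 'C' THEN 1 ELSE 0 END) AS GroupC,
       SUM(CASE WHEN "Group" = 'D' THEN 1 ELSE 0 END) AS GroupD,
       SUM(CASE WHEN "Group" = 'E' THEN 1 ELSE 0 END) AS GroupE
FROM test1
GROUP BY Sample
ORDER BY Sample
""").fetchall()
print(rows)
```

COUNT(*) assumes one row per species per sample, as in the example; COUNT(DISTINCT Species) would be the safer choice if a species can repeat within a sample.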
I have been working with country-level survey data in Stata that I needed to reshape. I ended up exporting the .dta to a .csv and making a pivot table in Excel, but I am curious how to do this in Stata, as I couldn't figure it out.
Suppose we have the following data:
country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
I would like the data to be reformatted as such:
country sum_1 sum_2
A 4 4
B 3 2
First I tried a simple reshape wide command but got the error that "values of variable response not unique within country" before realizing reshape without additional steps wouldn't work anyway.
Then I tried generating new variables conditional on the value of response and using reshape after that, but the whole thing turned into a mess, so I just used Excel.
I'm just curious whether there is a more intuitive way to do that transformation.
If you just want a table, then just ask for one:
clear
input str1 country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
end
tabulate country response
| response
country | 1 2 | Total
-----------+----------------------+----------
A | 4 4 | 8
B | 3 2 | 5
-----------+----------------------+----------
Total | 7 6 | 13
If you want the data to be changed to this, reshape is part of the answer, but you should contract first. collapse is in several ways more versatile, but your "sum" is really a count or frequency, so contract is more direct.
contract country response, freq(sum_)
reshape wide sum_, i(country) j(response)
list
+-------------------------+
| country sum_1 sum_2 |
|-------------------------|
1. | A 4 4 |
2. | B 3 2 |
+-------------------------+
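For readers outside Stata, the two steps (contract, then reshape wide) can be mirrored in Python with collections.Counter. This is only an illustration of the transformation, not Stata code, and the variable names are hypothetical; unlike Stata, a country/response combination that never occurs would count as 0 here.

```python
from collections import Counter

# (country, response) pairs from the example, in the original order.
data = [("A", 1), ("A", 1), ("A", 2), ("A", 2), ("A", 1),
        ("B", 1), ("B", 2), ("B", 2), ("B", 1), ("B", 1),
        ("A", 2), ("A", 2), ("A", 1)]

# Like contract: one frequency per (country, response) combination.
freq = Counter(data)

# Like reshape wide: one row per country, one sum_<response> column per response value.
responses = sorted({r for _, r in data})
wide = {
    country: {f"sum_{r}": freq[(country, r)] for r in responses}
    for country in sorted({c for c, _ in data})
}
for country, cols in wide.items():
    print(country, cols)
```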
In Stata 16 and later, help frames introduces frames as a way to work with multiple datasets in the same session.
I am using the LAG function to move my values one row down.
However, when the value in the SOURCE column repeats, I need to reuse the value from before the repeated run:
ID | SOURCE | LAG | DESIRED OUTCOME
1 | 4 | - | -
2 | 2 | 4 | 4
3 | 3 | 2 | 2
4 | 3 | 3 | 2
5 | 3 | 3 | 2
6 | 1 | 3 | 3
7 | 4 | 1 | 1
8 | 4 | 4 | 1
For instance, in IDs 3-5 the source value doesn't change, so the desired outcome should come from the last row with a different value (ID 2 in this case).
SQL Server's LAG accepts an expression as its second (offset) argument, which determines how many rows back to look. You can replace the constant offset with a check that decides whether to look back at all, e.g.:
select lagged = lag(data,iif(decider < 0,0,1)) over (order by id)
from (values(0,1,'dog')
,(1,2,'horse')
,(2,-1,'donkey')
,(3,2,'chicken')
,(4,23,'cow'))f(id,decider,data)
This returns the following list
null
dog
donkey
donkey
chicken
The third row returns donkey, its own value, because the decider on the row with id 2 is negative, so the offset there is 0.
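To make the variable-offset behaviour concrete outside SQL Server, here is a small Python sketch; the helper lag_with_offsets is hypothetical and simply reproduces lag(data, iif(decider < 0, 0, 1)) over (order by id) on the same five rows.

```python
def lag_with_offsets(values, offsets):
    """Return values[i - offsets[i]] for each row, or None when that falls before the first row."""
    result = []
    for i, off in enumerate(offsets):
        j = i - off
        result.append(values[j] if j >= 0 else None)
    return result

rows = [(0, 1, "dog"), (1, 2, "horse"), (2, -1, "donkey"), (3, 2, "chicken"), (4, 23, "cow")]
rows.sort(key=lambda r: r[0])                                   # over (order by id)
data = [r[2] for r in rows]
offsets = [0 if decider < 0 else 1 for _, decider, _ in rows]   # iif(decider < 0, 0, 1)
lagged = lag_with_offsets(data, offsets)
print(lagged)
```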
Well, first, LAG may not be the right tool for the job; this might be easier to solve with a recursive CTE. SQL and window functions work over sets. That said, our goal here is to describe what we want: a way to partition the data so that consecutive islands of the same value belong to the same set.
One way we can do that is by using lag to help us discover if the previous row was different or not.
From there, a running sum over these change events creates partitions. Once we have partitions, we can assign a row number to each element within its partition. Finally, we use that row number as the LAG offset to look back that many rows.
;with d as (
select * from (values
(1,4)
,(2,2)
,(3,3)
,(4,3)
,(5,3)
,(6,1)
,(7,4)
,(8,4)
)f(id,source))
select *,lag(source,rn) over (order by Id)
from (
select *,rn=row_number() over (partition by partition_id order by id)
from (
select *, partition_id = sum(change) over (order by id)
from (
select *,change = iif(lag(source) over (order by id) != source,1,0)
from d
) source_with_change
) partitioned
) row_counted
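The same four steps (change flags, running-sum partitions, row numbers, lag by row number) can be traced in plain Python. This is a sketch of the CTE's logic, not SQL Server code; the function name lag_skipping_repeats is hypothetical.

```python
def lag_skipping_repeats(source):
    """Mirror the CTE: change flags -> running-sum partitions -> row numbers -> lag(source, rn)."""
    n = len(source)
    # change = 1 when the value differs from the previous row; the first row gets 0,
    # just as iif(NULL != source, 1, 0) does.
    change = [0 if i == 0 or source[i - 1] == source[i] else 1 for i in range(n)]
    # partition_id = running sum of change events.
    partition_id = []
    running = 0
    for c in change:
        running += c
        partition_id.append(running)
    # rn = position within the partition; partitions are contiguous runs of rows.
    rn = []
    for i in range(n):
        if i > 0 and partition_id[i] == partition_id[i - 1]:
            rn.append(rn[i - 1] + 1)
        else:
            rn.append(1)
    # lag(source, rn): look back rn rows, or None before the first row.
    return [source[i - rn[i]] if i - rn[i] >= 0 else None for i in range(n)]

source = [4, 2, 3, 3, 3, 1, 4, 4]  # SOURCE column for IDs 1..8
print(lag_skipping_repeats(source))
```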
As an aside, this is an absolutely cruel interview question I was once asked.
I am trying to condense a table which contains multiple rows per event to a smaller table which contains counts of key sub-events within each event. Events are defined based on unique combinations across columns.
As a specific example, say I have the following data involving customer visits to various stores on different dates with different items purchased:
cust date store item_type
a 1 Main St 1
a 1 Main St 2
a 1 Main St 2
a 1 Main St 2
b 1 Main St 1
b 1 Main St 2
b 1 Main St 2
c 1 Main St 1
d 2 Elm St 1
d 2 Elm St 3
e 2 Main St 1
e 2 Main St 1
a 3 Main St 1
a 3 Main St 2
I would like to restructure the data to a table that contains a single line per customer visit on a given day, with appropriate counts. I am trying to understand how to use SQLite to condense this to:
Index cust date store n_items item1 item2 item3 item4
1 a 1 Main St 4 1 3 0 0
2 b 1 Main St 3 1 2 0 0
3 c 1 Main St 1 1 0 0 0
4 d 2 Elm St 2 1 0 1 0
5 e 2 Main St 2 2 0 0 0
6 a 3 Main St 2 1 1 0 0
I can do this in Excel for this trivial example (begin with SUMPRODUCT(customer * date) as suggested here, followed by a cumulative sum on that column to generate Index, then COUNTIF and COUNTIFS to generate the desired counts).
Excel is poorly suited to doing this for thousands of rows, so I am looking for a solution using SQLite.
Sadly, my SQLite kung-fu is weak.
I think this is the closest I have found, but I am having trouble understanding exactly how to adapt it.
I tried a more basic approach, starting by generating a unique index:
CREATE UNIQUE INDEX ui ON t(cust, date);
I get:
Error: indexed columns are not unique
I would greatly appreciate any help with where to start. Many thanks in advance!
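The unique index fails because each (cust, date) pair appears once per purchased item, so it identifies a visit but not a row. The sketch below, run in SQLite via Python's sqlite3 with assumed column types, lists the repeated pairs and shows that creating the index raises an error on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust TEXT, date INTEGER, store TEXT, item_type INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("a", 1, "Main St", 1), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2),
    ("b", 1, "Main St", 1), ("b", 1, "Main St", 2), ("b", 1, "Main St", 2),
    ("c", 1, "Main St", 1),
    ("d", 2, "Elm St", 1), ("d", 2, "Elm St", 3),
    ("e", 2, "Main St", 1), ("e", 2, "Main St", 1),
    ("a", 3, "Main St", 1), ("a", 3, "Main St", 2),
])

# (cust, date) pairs that occur more than once, in order of first appearance.
dupes = conn.execute("""
SELECT cust, date, COUNT(*) FROM t
GROUP BY cust, date
HAVING COUNT(*) > 1
ORDER BY MIN(rowid)
""").fetchall()
print(dupes)

# The unique index cannot be built while those duplicates exist.
try:
    conn.execute("CREATE UNIQUE INDEX ui ON t(cust, date)")
    index_created = True
except sqlite3.DatabaseError:
    index_created = False
print(index_created)
```

Grouping by those columns, rather than indexing them, is what produces one line per visit.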
To create one result record for each unique combination of column values, use GROUP BY.
The number of records in the group is available with COUNT.
To count specific item types, use a boolean expression like item_type=x, which returns 0 or 1, and sum this over all records in the group:
SELECT cust,
date,
store,
COUNT(*) AS n_items,
SUM(item_type = 1) AS item1,
SUM(item_type = 2) AS item2,
SUM(item_type = 3) AS item3,
SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
date,
store
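The query can be checked against the sample data in SQLite through Python's sqlite3. Two additions are assumptions, not part of the answer: ORDER BY MIN(rowid) keeps visits in first-appearance order, and the Index column is produced by numbering the result rows in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust TEXT, date INTEGER, store TEXT, item_type INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("a", 1, "Main St", 1), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2),
    ("b", 1, "Main St", 1), ("b", 1, "Main St", 2), ("b", 1, "Main St", 2),
    ("c", 1, "Main St", 1),
    ("d", 2, "Elm St", 1), ("d", 2, "Elm St", 3),
    ("e", 2, "Main St", 1), ("e", 2, "Main St", 1),
    ("a", 3, "Main St", 1), ("a", 3, "Main St", 2),
])

# In SQLite a comparison yields 0 or 1, so SUM(item_type = x) counts matching rows.
rows = conn.execute("""
SELECT cust, date, store,
       COUNT(*) AS n_items,
       SUM(item_type = 1) AS item1,
       SUM(item_type = 2) AS item2,
       SUM(item_type = 3) AS item3,
       SUM(item_type = 4) AS item4
FROM t
GROUP BY cust, date, store
ORDER BY MIN(rowid)
""").fetchall()

# Number the visits 1..n to play the role of the Index column.
table = [(index, *row) for index, row in enumerate(rows, start=1)]
for line in table:
    print(line)
```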