how to count classses in columns - sql-server

I'm trying to make a query and i'm having a bad time with one thing. Suppose I have a table that looks like this:
id
Sample
Species
Quantity
Group
1
1
AA
5
A
2
1
AB
6
A
3
1
AC
10
A
4
1
CD
15
C
5
1
CE
20
C
6
1
DA
13
D
7
1
DB
7
D
8
1
EA
6
E
9
1
EF
4
E
10
1
EB
2
E
In the table I filter to have just 1 sample (but i have many), it has the species, the quantity of that species and a functional group (there are only five groups from A to E). I would like to make a query to group by the samples and make columns of the counts of the species of certain group, something like this:
Sample
N_especies
Group A
Group B
Group C
Group D
Group E
1
10
3
0
2
2
3
So i have to count the species (thats easy) but i don't know how to make the columns of a certain group, can anyone help me?

You can use PIVOT :
Select a.Sample,[A],[B],[C],[D],[E], [B]+[A]+[C]+[D]+[E] N_especies from
(select t.Sample,t.Grp from [WS_Database].[dbo].[test1] t) t
PIVOT (
COUNT(t.Grp)
for t.Grp in ([A],[B],[C],[D],[E])
) a

Related

How to aggregate number of notes sent to each user?

Consider the following tables
group (obj_id here is user_id)
group_id obj_id role
--------------------------
100 1 A
100 2 root
100 3 B
100 4 C
notes
obj_id ref_obj_id note note_id
-------------------------------------------
1 2 10
1 3 10
1 0 foobar 10
1 4 20
1 2 20
1 0 barbaz 20
2 0 caszes 30
2 1 30
4 1 70
4 0 taz 70
4 3 70
Note: a note in the system can be assigned to multiple users (for instance: an admin could write "sent warning to 2 users" and link it to 2 user_ids). The first user the note gets linked to is stored differently than the other linked users. The note itself is linked to the first linked user only. Whenever group.obj_id = notes.obj_id then ref_obj_id = 0 and note <> null
I need to make an overview of the notes per user. Normally I would do this by joining on group.obj_id = notes.obj_idbut here this goes wrong because of ref_obj_id being 0 (in which case I should join on notes.obj_id)
There are 4 notes in this system (foobar, barbaz, caszes and taz).
The desired output is:
obj_id user_is_primary notes_primary user_is_linked notes_linked
-------------------------------------------------------------------
1 2 10;20 2 30;70
2 1 30 2 10;20
3 0 2 10;70
4 1 70 1 20
How can I get to this aggregated result?
I hope that I was able to explain the situation clearly; perhaps it is my inexperience but I find the data model not the most straightforward.
Couldn't you simply put this in the ON clause of your join?
case when notes.ref_obj_id = 0 then notes.obj_id else notes.ref_obj_id end = group.obj_id

T-SQL to sum total value instead of rejoining table multiple times

I've looked for an example question like this, I ask for grace if it's been answered (I thought it would have been but have a hard time finding meaningful results with the terms I searched.)
I work at a manufacturing plant where at ever manufacturing operation a part is issued a new serial number. The database table I have to work with has the serial number recorded in the Container field and the previous serial number the part had recorded in the From_Container field.
I'm trying to SUM the Extended_Cost column on parts we've had to re-do operations on.
Here's a sample of data from tbl_Container:
Container From_Container Extended_Cost Part_Key Operation
10 9 10 PN_100 60
9 8 10 PN_100 50
8 7 10 PN_100 40
7 6 10 PN_100 30
6 5 10 PN_100 20
5 4 10 PN_100 50
4 3 10 PN_100 40
3 2 10 PN_100 30
2 1 10 PN_100 20
1 100 10 PN_100 10
In this example the SUM I would expect returned is 40, because operations 20, 30, 40 and 50 were all re-done and cost $10 each.
So far I've been able to do this by rejoining the table to itself 10 times using aliases in the following fashion:
LEFT OUTER JOIN tbl_Container AS FCP_1
ON tbl_Container.From_Container = FCP_1.Container
AND FCP_1.Operation <= tbl_Container.Operation
AND tbl_Container.Part_Key = FCP_1.Part_Key
And then using SUM to add the Extended_Cost fields together. However, I'm violating the DRY principle and there has got to be a better way.
Thank you in advance for your help,
Me
You can try this query.
;WITH CTE AS
(
SELECT TOP 1 *, I = 0 FROM tbl_Container C ORDER BY Container
UNION ALL
SELECT T.*, I = I + 1 FROM CTE
INNER JOIN tbl_Container T
ON CTE.Container = T.From_Container
AND CTE.Part_Key = T.Part_Key
)
SELECT Part_Key, SUM(T1.Extended_Cost) Sum_Extended_Cost FROM CTE T1
WHERE
EXISTS( SELECT * FROM
CTE T2 WHERE
T1.Operation = T2.Operation
AND T1.I > T2.I )
GROUP BY Part_Key
Result:
Part_Key Sum_Extended_Cost
---------- -----------------
PN_100 40

Rowmax as new column in data table

I have rank scores of countries for different variables.
I would like to create a column with the maximum rank that occurs per row.
Say the data look something like:
A B C D E F G H I ....
V1 1 4 5 3 12 . 6 9 83
V2 . . 4 6 1 4 7 6 32
So A - X are countries. In rows V1 up you have various variables and in the cells you have the rank score relating to the variable.
Problem is that some countries for whatever reasons don´t score in relation to certain variables, perhaps because V1 is not relevant to country C or whatever.
So in the end I´d like something like
A B C D E F G H I .... newv
V1 1 4 5 3 12 . 6 9 83 83
V2 . . 4 6 1 4 7 6 5 6
I think egen newvar=rowmax(A B C D E F G H I…) does what you need. Have a look at the egen help file for more information. (I presume you need value 7 in the second row, not 6?)

Condense multiple rows to single row with counts based on unique values in sqlite

I am trying to condense a table which contains multiple rows per event to a smaller table which contains counts of key sub-events within each event. Events are defined based on unique combinations across columns.
As a specific example, say I have the following data involving customer visits to various stores on different dates with different items purchased:
cust date store item_type
a 1 Main St 1
a 1 Main St 2
a 1 Main St 2
a 1 Main St 2
b 1 Main St 1
b 1 Main St 2
b 1 Main St 2
c 1 Main St 1
d 2 Elm St 1
d 2 Elm St 3
e 2 Main St 1
e 2 Main St 1
a 3 Main St 1
a 3 Main St 2
I would like to restructure the data to a table that contains a single line per customer visit on a given day, with appropriate counts. I am trying to understand how to use SQLite to condense this to:
Index cust date store n_items item1 item2 item3 item4
1 a 1 Main St 4 1 3 0 0
2 b 1 Main St 3 1 2 0 0
3 c 1 Main St 1 1 0 0 0
4 d 2 Elm St 2 1 0 1 0
5 e 2 Main St 2 2 0 0 0
6 a 3 Main St 2 1 1 0 0
I can do this in excel for this trivial example (begin with sumproduct( cutomer * date) as suggested here, followed by cumulative sum on this column to generate Index, then countif and countifs to generate desired counts).
Excel is poorly suited to doing this for thousands of rows, so I am looking for a solution using SQLite.
Sadly, my SQLite kung-fu is weak.
I think this is the closest I have found, but I am having trouble understanding exactly how to adapt it.
When I tried a more basic approach to begin by generating a unique index:
CREATE UNIQUE INDEX ui ON t(cust, date);
I get:
Error: indexed columns are not unique
I would greatly appreciate any help with where to start. Many thanks in advance!
To create one result record for each unique combination of column values, use GROUP BY.
The number of records in the group is available with COUNT.
To count specific item types, use a boolean expression like item_type=x, which returns 0 or 1, and sum this over all records in the group:
SELECT cust,
date,
store,
COUNT(*) AS n_items,
SUM(item_type = 1) AS item1,
SUM(item_type = 2) AS item2,
SUM(item_type = 3) AS item3,
SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
date,
store

Informix subqueries with FIRST option

What is the best way of transcribing the following Transact-SQL code to Informix Dynamic Server (IDS) 9.40:
Objective: I need the first 50 orders with their respective order lines
select *
from (select top 50 * from orders) a inner join lines b
on a.idOrder = b.idOrder
My problem is with the subselect because Informix does not allow the FIRST option in the subselect.
Any simple idea?.
The official answer would be 'Please upgrade from IDS 9.40 since it is no longer supported by IBM'. That is, IDS 9.40 is not a current version - and should (ideally) not be used.
Solution for IDS 11.50
Using IDS 11.50, I can write:
SELECT *
FROM (SELECT FIRST 10 * FROM elements) AS e
INNER JOIN compound_component AS a
ON e.symbol = a.element
INNER JOIN compound AS c
ON c.compound_id = a.compound_id
;
This is more or less equivalent to your query. Consequently, if you use a current version of IDS, you can write the query using almost the same notation as in Transact-SQL (using FIRST in place of TOP).
Solution for IDS 9.40
What can you do in IDS 9.40? Excuse me a moment...I have to run up my IDS 9.40.xC7 server (this fix pack was released in 2005; the original release was probably in late 2003)...
First problem - IDS 9.40 does not allow sub-queries in the FROM clause.
Second problem - IDS 9.40 does not allow 'FIRST n' notation in either of these contexts:
SELECT FIRST 10 * FROM elements INTO TEMP e;
INSERT INTO e SELECT FIRST 10 * FROM elements;
Third problem - IDS 9.40 doesn't have a simple ROWNUM.
So, to work around these, we can write (using a temporary table - we'll remove that later):
SELECT e1.*
FROM elements AS e1, elements AS e2
WHERE e1.atomic_number >= e2.atomic_number
GROUP BY e1.atomic_number, e1.symbol, e1.name, e1.atomic_weight, e1.stable
HAVING COUNT(*) <= 10
INTO TEMP e;
SELECT *
FROM e INNER JOIN compound_component AS a
ON e.symbol = a.element
INNER JOIN compound AS c
ON c.compound_id = a.compound_id;
This produces the same answer as the single query in IDS 11.50. Can we avoid the temporary table? Yes, but it is more verbose:
SELECT e1.*, a.*, c.*
FROM elements AS e1, elements AS e2, compound_component AS a,
compound AS c
WHERE e1.atomic_number >= e2.atomic_number
AND e1.symbol = a.element
AND c.compound_id = a.compound_id
GROUP BY e1.atomic_number, e1.symbol, e1.name, e1.atomic_weight,
e1.stable, a.compound_id, a.element, a.seq_num,
a.multiplicity, c.compound_id, c.name
HAVING COUNT(*) <= 10;
Applying that to the original orders plus order lines example is left as an exercise for the reader.
Relevant subset of schema for 'Table of Elements':
-- See: http://www.webelements.com/ for elements.
-- See: http://ie.lbl.gov/education/isotopes.htm for isotopes.
CREATE TABLE elements
(
atomic_number INTEGER NOT NULL UNIQUE CONSTRAINT c1_elements
CHECK (atomic_number > 0 AND atomic_number < 120),
symbol CHAR(3) NOT NULL UNIQUE CONSTRAINT c2_elements,
name CHAR(20) NOT NULL UNIQUE CONSTRAINT c3_elements,
atomic_weight DECIMAL(8,4) NOT NULL,
stable CHAR(1) DEFAULT 'Y' NOT NULL
CHECK (stable IN ('Y', 'N'))
);
CREATE TABLE compound
(
compound_id SERIAL NOT NULL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE
);
-- The sequence number is used to order the components within a compound.
CREATE TABLE compound_component
(
compound_id INTEGER REFERENCES compound,
element CHAR(3) NOT NULL REFERENCES elements(symbol),
seq_num SMALLINT DEFAULT 1 NOT NULL
CHECK (seq_num > 0 AND seq_num < 20),
multiplicity INTEGER NOT NULL
CHECK (multiplicity > 0 AND multiplicity < 20),
PRIMARY KEY(compound_id, seq_num)
);
Output (on my sample database):
1 H Hydrogen 1.0079 Y 1 H 1 2 1 water
1 H Hydrogen 1.0079 Y 3 H 2 4 3 methane
1 H Hydrogen 1.0079 Y 4 H 2 6 4 ethane
1 H Hydrogen 1.0079 Y 5 H 2 8 5 propane
1 H Hydrogen 1.0079 Y 6 H 2 10 6 butane
1 H Hydrogen 1.0079 Y 11 H 2 5 11 ethanol
1 H Hydrogen 1.0079 Y 11 H 4 1 11 ethanol
6 C Carbon 12.0110 Y 2 C 1 1 2 carbon dioxide
6 C Carbon 12.0110 Y 3 C 1 1 3 methane
6 C Carbon 12.0110 Y 4 C 1 2 4 ethane
6 C Carbon 12.0110 Y 5 C 1 3 5 propane
6 C Carbon 12.0110 Y 6 C 1 4 6 butane
6 C Carbon 12.0110 Y 7 C 1 1 7 carbon monoxide
6 C Carbon 12.0110 Y 9 C 2 1 9 magnesium carbonate
6 C Carbon 12.0110 Y 10 C 2 1 10 sodium bicarbonate
6 C Carbon 12.0110 Y 11 C 1 2 11 ethanol
8 O Oxygen 15.9990 Y 1 O 2 1 1 water
8 O Oxygen 15.9990 Y 2 O 2 2 2 carbon dioxide
8 O Oxygen 15.9990 Y 7 O 2 1 7 carbon monoxide
8 O Oxygen 15.9990 Y 9 O 3 3 9 magnesium carbonate
8 O Oxygen 15.9990 Y 10 O 3 3 10 sodium bicarbonate
8 O Oxygen 15.9990 Y 11 O 3 1 11 ethanol
If I understand your question you are having a problem with "TOP". Try using a TOP-N query.
For example:
select *
from (SELECT *
FROM foo
where foo_id=[number]
order by foo_id desc)
where rownum <= 50
This will get you the top fifty results (because I order by desc in the sub query)

Resources