PostgreSQL: find minimum value across multiple columns but return column name - arrays

For each row in a table, I want to find the minimum value across a couple of numeric columns, then take the name of the column that holds that value and populate a new column with the name (or a custom string).
A few rules in my specific scenario: the value to be found across the columns must also be > 0. Also, if no value in the row is > 0, a custom string should be placed instead (i.e. 'none').
For example, take this table below with columns alpha to delta storing the values:
id | alpha | bravo | charlie | delta
------+--------+--------+---------+--------
1 | 5 | 2.3 | -1 | -5
2 | 9 | 8 | 3 | 1
3 | -1 | -4 | -7 | -9
4 | 6.1 | 4 | 3.9 | 0
For each row, I want to find out which column holds the lowest positive value. My expected output is something like this:
id | alpha | bravo | charlie | delta | lowest_positive
------+--------+--------+---------+--------+---------------
1 | 5 | 2.3 | -1 | -5 | 'col: bravo'
2 | 9 | 8 | 3 | 1 | 'col: delta'
3 | -1 | -4 | -7 | -9 | 'col: none'
4 | 6.1 | 4 | 3.9 | 0 | 'col: charlie'
Should I use a CASE ... WHEN ... THEN ...? Should I convert the row into an array first, then assign a column name to each position in the array?

You can do:
select *,
       case when mp = alpha   then 'col: alpha'
            when mp = bravo   then 'col: bravo'
            when mp = charlie then 'col: charlie'
            when mp = delta   then 'col: delta'
            else 'col: none'  -- no positive value in the row, so mp is null
       end as lowest_positive
from (
  select *,
         least(
           case when alpha   > 0 then alpha   end,
           case when bravo   > 0 then bravo   end,
           case when charlie > 0 then charlie end,
           case when delta   > 0 then delta   end
         ) as mp
  from t
) x
However, this solution doesn't handle ties: if several columns share the minimum, the first one (from left to right) wins.
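To double-check the rule itself (smallest value > 0, 'none' when nothing qualifies, leftmost column on ties), here is a minimal Python sketch that applies it to the sample rows from the question. The function name `lowest_positive_column` and the in-memory `ROWS` dict are illustrative assumptions; the real work happens in the PostgreSQL query, and this only models its per-row logic.

```python
from typing import Optional, Sequence


def lowest_positive_column(names: Sequence[str], values: Sequence[float]) -> str:
    """Return 'col: <name>' for the smallest value > 0, or 'col: none'.

    Ties go to the leftmost column, mirroring the CASE order in the SQL above.
    """
    best_name: Optional[str] = None
    best_value: Optional[float] = None
    for name, value in zip(names, values):
        # Only strictly positive values qualify; strict '<' keeps the leftmost on ties.
        if value > 0 and (best_value is None or value < best_value):
            best_name, best_value = name, value
    return f"col: {best_name}" if best_name is not None else "col: none"


COLUMNS = ["alpha", "bravo", "charlie", "delta"]
ROWS = {
    1: [5, 2.3, -1, -5],
    2: [9, 8, 3, 1],
    3: [-1, -4, -7, -9],
    4: [6.1, 4, 3.9, 0],
}

if __name__ == "__main__":
    for row_id, vals in ROWS.items():
        print(row_id, lowest_positive_column(COLUMNS, vals))
```

Row 4 shows why the filter is "> 0" rather than ">= 0": the 0 in delta is skipped and charlie (3.9) is reported.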

Related

How to create array of given length and same number?

When using SUMSERIES I need to specify "the array or range containing the coefficients of the power series" but I want to make it so the number of elements is dynamic while the element itself (1) remains the same.
Example:
SUM FROM 0 TO N of x^1,5
(cell) Length of series N: 7 --> SUMSERIES(1,5;0;1;{1,1,1,1,1,1,1})
But I should be able to change the 7 to a 3 and get --> SUMSERIES(1,5;0;1;{1,1,1})
In Java, for example, you'd declare and instantiate the array --> int[] arr = new int[N];
And then fill it in a loop --> for (int i = 0; i < arr.length; i++) { arr[i] = 1; }
Thanks in advance and sorry if the explanation isn't clear, it's my first time around hehe
This should work:
=SUMSERIES(1,5;0;1;SEQUENCE(1;[cell];1;0))
try:
=ARRAYFORMULA(SIGN(TRANSPOSE(ROW(INDIRECT("A1:A"&A1)))))
and then:
=INDEX(SUMSERIES(1,5; 0; 1; SIGN(TRANSPOSE(ROW(INDIRECT("A1:A"&A1))))))
In older versions of Excel you can get this array using the following (all of them are array formulas):
=INDEX(MUNIT(n),1,0)*0+x for horizontal array
=INDEX(MUNIT(n),0,1)*0+x for vertical array
Where:
n is dimension of the array
x is value of each item in the array
How it works:
MUNIT creates an identity matrix of size n:
+---++---+---+---+---+---+
| || 1 | 2 | . | . | n |
+---++---+---+---+---+---+
+---++---+---+---+---+---+
| 1 || 1 | 0 | 0 | 0 | 0 |
| 2 || 0 | 1 | 0 | 0 | 0 |
| . || 0 | 0 | 1 | 0 | 0 |
| . || 0 | 0 | 0 | 1 | 0 |
| n || 0 | 0 | 0 | 0 | 1 |
+---++---+---+---+---+---+
Now we extract one (the first) row/column (n is set to 7 here)
=INDEX(MUNIT(7),1,0) for row extraction
=INDEX(MUNIT(7),0,1) for column extraction
Then fill it with the desired number (9 here):
=INDEX(MUNIT(7),1,0)*0+9 for row
=INDEX(MUNIT(7),0,1)*0+9 for column
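The MUNIT trick, and what the series formula then computes, can be modelled in a few lines of Python. This is a sketch under assumptions: `identity_matrix`, `constant_row` and `series_sum` are hypothetical stand-ins for MUNIT, INDEX(...)*0+x and Excel's SERIESSUM (the function the question calls SUMSERIES, presumably a localized name), which returns the sum of a_i * x^(n + i*m) over the coefficients, with i starting at 0.

```python
from typing import List


def identity_matrix(n: int) -> List[List[int]]:
    """Stand-in for MUNIT(n): an n x n identity matrix."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def constant_row(n: int, x: float) -> List[float]:
    """Stand-in for INDEX(MUNIT(n),1,0)*0+x: first row, zeroed, then shifted by x."""
    return [v * 0 + x for v in identity_matrix(n)[0]]


def series_sum(x: float, n: float, m: float, coefficients: List[float]) -> float:
    """Stand-in for SERIESSUM: sum of a_i * x**(n + i*m), i = 0, 1, 2, ..."""
    return sum(a * x ** (n + i * m) for i, a in enumerate(coefficients))


if __name__ == "__main__":
    for length in (3, 7):
        print(length, series_sum(1.5, 0, 1, constant_row(length, 1)))
```

With x = 1.5, n = 0, m = 1 and three coefficients of 1, the sum is 1 + 1.5 + 2.25 = 4.75; changing the length only changes how many terms are added.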

Creating an Excel array having unique values

I would like to clean up an Excel array by replacing non-zero duplicate values with zeros.
For example, an array having arbitrary positive integers {1;0;5;0;4;0;6;4;0;5} should result in {1;0;5;0;4;0;6;0;0;0} when the non-zero duplicate elements are replaced by zeros.
A similar array say, {1;"";5;"";4;"";6;4;"";5} should result in {1;"";5;"";4;"";6;"";"";""} when the duplicate numbers are padded by null strings.
Could it be done using excel functions only?
Your kind help would be appreciated.
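The requested transformation (keep the first occurrence of each non-zero value, replace later repeats, pass zeros or blanks through unchanged) can be pinned down with a short Python sketch; it is not an Excel solution. The name `blank_repeats` and its `filler` parameter are illustrative. In a worksheet, the same first-occurrence test is commonly expressed by counting matches in an expanding range such as COUNTIF($A$1:A1,A1).

```python
from typing import List, Union

Cell = Union[int, str]


def blank_repeats(values: List[Cell], filler: Cell = 0) -> List[Cell]:
    """Keep the first occurrence of each non-zero value and replace later repeats with filler.

    Zeros and empty strings are treated as gaps and copied through unchanged.
    """
    seen = set()
    result: List[Cell] = []
    for v in values:
        if v == 0 or v == "":
            result.append(v)          # gap: leave as-is
        elif v in seen:
            result.append(filler)     # duplicate of an earlier non-zero value
        else:
            seen.add(v)               # first occurrence: keep it
            result.append(v)
    return result
```

Passing `filler=""` covers the second example, where repeats become empty strings instead of zeros.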
  A       B
----------------
| 1 | 1 |
----------------
| 0 | 2 |
----------------
| 2 | 3 |
----------------
| 0 | 4 |
----------------
| 3 | 5 |
----------------
| 4 | 6 |
----------------
| 0 | |
----------------
| 5 | |
----------------
| 6 | |
----------------
Formula in B1, filled down (it lists the non-zero values of column A in ascending order):
=IFERROR(SMALL($A$1:$A$9,COUNTIF($A$1:$A$9,0)+ROW(A1)),"")
https://support.office.com/en-us/article/substitute-function-6434944e-a904-4336-a9b0-1e58df3bc332
=SUBSTITUTE(A2,0,"")
{1;0;5;0;4;0;6;4;0;5} {1;;5;;4;;6;4;;5}

How to generate long (up to 25 million) random sequences of integers in C (with no repetition)?

I need to generate long (pseudo)random arrays (1,000 to 25,000,000 integers) in which no element is repeated. How do I do it, given that rand() does not generate large enough numbers?
I tried to use this idea: array[i] = (rand() << 14) | rand() % length; however I suppose there is much better way that I don't know.
Thank you for your help.
You can use the Fisher-Yates shuffle for this.
Create an array of n elements and populate each element sequentially.
-------------------------
| 1 | 2 | 3 | 4 | 5 | 6 |
-------------------------
In this example n is 6. Now select a random index from 0 to n-1 (i.e. rand() % n) and swap the number at that index with the number at the end of the array (index n-1). Let's say the random index is 2, so we swap the value at index 2 (3) with the one at index n-1 (6). Now we have:
v
-------------------------
| 1 | 2 | 6 | 4 | 5 | 3 |
-------------------------
Now we do the same, this time with the upper bound of the index being n-2, and swap the value at the chosen index with the value at index n-2. Let's say this time we randomly get 0, so we swap index 0 (1) with index n-2 (5):
v
-------------------------
| 5 | 2 | 6 | 4 | 1 | 3 |
-------------------------
Then repeat. Let's say the next random index is 3. This happens to be our upper limit, so no change:
v
-------------------------
| 5 | 2 | 6 | 4 | 1 | 3 |
-------------------------
Next we get 0:
v
-------------------------
| 6 | 2 | 5 | 4 | 1 | 3 |
-------------------------
And finally 1, which again equals the current upper limit, so nothing changes and the shuffle is complete:
v
-------------------------
| 6 | 2 | 5 | 4 | 1 | 3 |
-------------------------
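The walk-through above is the classic Fisher-Yates (Knuth) shuffle. Below is a minimal Python sketch of the same loop, not C: the upper bound i goes from n-1 down to 1, and each step swaps a[i] with a uniformly chosen a[j], 0 <= j <= i. For the C question, note that rand() is only guaranteed to reach RAND_MAX, which may be as small as 32767, and rand() % n is slightly biased, so for 25 million elements you need a wider, unbiased random source; Python's random.randrange handles both. The `rng` parameter is an assumption added so a seeded or scripted generator can be passed in.

```python
import random


def shuffled_range(n, rng=random):
    """Return a random permutation of 1..n using the Fisher-Yates shuffle.

    `rng` is anything with a randrange(stop) method (the random module or a
    random.Random instance), so a seeded generator can be passed in.
    """
    a = list(range(1, n + 1))        # 1, 2, ..., n in order
    for i in range(n - 1, 0, -1):    # upper bound: n-1, n-2, ..., 1
        j = rng.randrange(i + 1)     # uniform index in 0..i, no modulo bias
        a[i], a[j] = a[j], a[i]      # move the pick into the settled tail
    return a


if __name__ == "__main__":
    print(shuffled_range(10, random.Random(2024)))
```

Replaying the walk-through's picks 2, 0, 3, 0, 1 through a scripted `rng` yields 6 2 5 4 1 3, the final array shown.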

Using Arithmetic in SQL on my own columns to fill a third column where it is zero. (complicated, only when certain criteria are met)

So here is my question. Brace yourself as it takes some thinking just to wrap your head around what I am trying to do.
I'm working with Quarterly Census of Employment and Wages (QCEW) data. QCEW data has something called suppression codes. If a data category (overall, location quotient, or over-the-year, reported for each year and quarter) is suppressed, then all the data for that category is zero. I have my table set up in the following way (only showing the columns that are relevant to the question):
A County_Id column,
Industry_ID column,
Year column,
Qtr column,
Suppressed column (0 for not suppressed and 1 for suppressed),
Data_Category column (1 for overall, 2 for lq, and 3 for over the year),
Data_Denomination column (1-8, for the specific measure being looked at in that category, e.g. monthly employment, taxable wages, and other typical data),
and a value column (which will be zero if the Data_Category is suppressed - since all the data denomination values will be zero).
Now, if overall data (cat 1) for, say, 1990 quarter 1 is suppressed, but 1991 quarter 1 has both overall and over-the-year data (cats 1 and 3) NOT suppressed, then we can infer the value of that first year's suppressed data, since OTY1991q1 = (Overall1991q1 - Overall1990q1). So to find the suppressed data we would just subtract our cat 1 (denom 1-8) values from our cat 3 (denom 1-8) values and use the results to replace the zeros in the suppressed values from the year before. It's fairly easy to grasp mathematically; the difficulty is that there are millions of rows on which to check these criteria. I'm trying to write some kind of SQL query that would do this for me: check that overall data for a given year and quarter is suppressed, then check that neither overall nor OTY data is suppressed for the same quarter of the next year (maybe in some sort of complicated CASE statement?). If those criteria are met, it should perform the arithmetic for the two Data_Cat/Data_Denom categories and replace the zero in the respective Cat-Denom values.
Below is a simple sample (non-relevant data_cats removed) that I hope will help get what I'm trying to do across.
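+----------+------------+------+-----+------------+----------+------------+-------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value |
+----------+------------+------+-----+------------+----------+------------+-------+
| 5        | 10         | 1990 | 1   | 1          | 1        | 1          | 0     |
| 5        | 10         | 1990 | 1   | 1          | 1        | 2          | 0     |
| 5        | 10         | 1990 | 1   | 1          | 1        | 3          | 0     |
| 5        | 10         | 1991 | 1   | 0          | 1        | 1          | 5     |
| 5        | 10         | 1991 | 1   | 0          | 1        | 2          | 15    |
| 5        | 10         | 1991 | 1   | 0          | 1        | 3          | 25    |
| 5        | 10         | 1991 | 1   | 0          | 3        | 1          | 20    |
| 5        | 10         | 1991 | 1   | 0          | 3        | 2          | 20    |
| 5        | 10         | 1991 | 1   | 0          | 3        | 3          | 35    |
+----------+------------+------+-----+------------+----------+------------+-------+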
So basically, for each data_denom in 1991 (I've narrowed these down from 8 to 3 for simplicity, and removed the lq rows, data_cat 2, because they aren't relevant), we take the overall value and subtract it from the over-the-year value; that gives the value for the previous year's (1990) cat 1. So here data_cat 1, data_denom 1 would be 15 (20-5), denom 2 would be 5 (20-15), and denom 3 would be 10 (35-25): (OTY 1991q1 - overall 1991q1) = 1990q1. I hope this helps. Like I said, the problem isn't the math, it's formulating a query that will check these criteria millions and millions of times.
If you want to find suppressed data that has 2 rows of unsuppressed data for the next year and the same quarter, we could use cross apply() to do something like this:
test setup: http://rextester.com/ORNCFR23551
using cross apply() to return rows with a valid derived value:
select t.*
, NewValue = cat3.value - cat1.value
from t
cross apply (
select i.value
from t as i
where i.CountyID = t.CountyID
and i.IndustryID = t.IndustryID
and i.Data_Denom = t.Data_Denom
and i.Year = t.Year +1
and i.Qtr = t.Qtr
and i.Suppressed = 0
and i.Data_Cat = 1
) cat1
cross apply (
select i.value
from t as i
where i.CountyID = t.CountyID
and i.IndustryID = t.IndustryID
and i.Data_Denom = t.Data_Denom
and i.Year = t.Year +1
and i.Qtr = t.Qtr
and i.Suppressed = 0
and i.Data_Cat = 3
) cat3
where t.Suppressed = 1
and t.Data_Cat = 1
returns:
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| 5 | 10 | 1990 | 1 | 1 | 1 | 1 | 0 | 15 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 2 | 0 | 5 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 3 | 0 | 10 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
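The cross apply() lookup can be mirrored in a few lines of Python, which is handy for checking expected values on the sample before running anything against millions of rows. This is a sketch assuming at most one row per (county, industry, year, quarter, category, denomination); `infer_suppressed` and `SAMPLE` are illustrative names. It computes cat 3 minus cat 1, which reproduces the question's sample (15, 5, 10). Note that if the over-the-year figure is defined as Overall1991q1 - Overall1990q1, the algebra gives Overall1990q1 = Overall1991q1 - OTY1991q1 instead, which is what n_oa.value - n_oty.value in the join-based answer below computes, so it is worth confirming which definition the data follows.

```python
from typing import Dict, List, Tuple

# (CountyID, IndustryID, Year, Qtr, Suppressed, Data_Cat, Data_Denom, Value)
Row = Tuple[int, int, int, int, int, int, int, int]

SAMPLE: List[Row] = [
    (5, 10, 1990, 1, 1, 1, 1, 0),
    (5, 10, 1990, 1, 1, 1, 2, 0),
    (5, 10, 1990, 1, 1, 1, 3, 0),
    (5, 10, 1991, 1, 0, 1, 1, 5),
    (5, 10, 1991, 1, 0, 1, 2, 15),
    (5, 10, 1991, 1, 0, 1, 3, 25),
    (5, 10, 1991, 1, 0, 3, 1, 20),
    (5, 10, 1991, 1, 0, 3, 2, 20),
    (5, 10, 1991, 1, 0, 3, 3, 35),
]


def infer_suppressed(rows: List[Row]) -> Dict[Tuple[int, int, int, int, int], int]:
    """Return {(county, industry, year, qtr, denom): inferred value} for suppressed cat-1 rows."""
    # Index every unsuppressed value by its identifying columns (one pass).
    unsuppressed = {
        (c, ind, y, q, cat, d): v
        for c, ind, y, q, s, cat, d, v in rows
        if s == 0
    }
    inferred = {}
    for c, ind, y, q, s, cat, d, _ in rows:
        if s != 1 or cat != 1:
            continue  # only suppressed overall rows need a value
        overall_next = unsuppressed.get((c, ind, y + 1, q, 1, d))
        oty_next = unsuppressed.get((c, ind, y + 1, q, 3, d))
        if overall_next is not None and oty_next is not None:
            # cat 3 - cat 1, as in the question's sample and the cross apply() query
            inferred[(c, ind, y, q, d)] = oty_next - overall_next
    return inferred
```

Building the lookup once keeps the pass linear in the number of rows; in SQL the counterpart is an index on the join columns.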
Using outer apply() to return all rows
select t.*
, NewValue = coalesce(nullif(t.value,0),cat3.value - cat1.value,0)
from t
outer apply (
select i.value
from t as i
where i.CountyID = t.CountyID
and i.IndustryID = t.IndustryID
and i.Data_Denom = t.Data_Denom
and i.Year = t.Year +1
and i.Qtr = t.Qtr
and i.Suppressed = 0
and i.Data_Cat = 1
) cat1
outer apply (
select i.value
from t as i
where i.CountyID = t.CountyID
and i.IndustryID = t.IndustryID
and i.Data_Denom = t.Data_Denom
and i.Year = t.Year +1
and i.Qtr = t.Qtr
and i.Suppressed = 0
and i.Data_Cat = 3
) cat3
returns:
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
| 5 | 10 | 1990 | 1 | 1 | 1 | 1 | 0 | 15 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 2 | 0 | 5 |
| 5 | 10 | 1990 | 1 | 1 | 1 | 3 | 0 | 10 |
| 5 | 10 | 1991 | 1 | 0 | 1 | 1 | 5 | 5 |
| 5 | 10 | 1991 | 1 | 0 | 1 | 2 | 15 | 15 |
| 5 | 10 | 1991 | 1 | 0 | 1 | 3 | 25 | 25 |
| 5 | 10 | 1991 | 1 | 0 | 3 | 1 | 20 | 20 |
| 5 | 10 | 1991 | 1 | 0 | 3 | 2 | 20 | 20 |
| 5 | 10 | 1991 | 1 | 0 | 3 | 3 | 35 | 35 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
UPDATE 1 - fixed some column names
UPDATE 2 - improved aliases in 2nd query
Ok, I think I get it.
If you're just wanting to make that one inference, then the following may help. (If this is just the first of many inferences you want to make in filling data gaps, you may find that a different method leads to a more efficient solution for doing both/all of them, but I guess cross that bridge when you get there...)
While much of the basic logic stays the same, how you'd tweak it depends on whether you want a query just to provide the values you would infer (e.g. to drive an UPDATE statement), or whether you want to use this logic inline in a bigger query. For performance reasons, I suspect the former makes more sense (especially if you can do the update once and then read the resulting dataset many times), so I'll start by framing things that way and come back to the other in a moment...
It sounds like you have a single table (I'll call it QCEW) with all these columns. In that case, use joins to associate each suppressed overall datapoint (c_oa in the following code) with the corresponding overall and oty datapoints from a year later:
SELECT c_oa.*, n_oa.value - n_oty.value inferred_value
FROM QCEW c_oa --current yr/qtr overall
inner join QCEW n_oa --next yr (same qtr) overall
on c_oa.countyId = n_oa.countyId
and c_oa.industryId = n_oa.industryId
and c_oa.year = n_oa.year - 1
and c_oa.qtr = n_oa.qtr
and c_oa.data_denom = n_oa.data_denom
inner join QCEW n_oty --next yr (same qtr) over-the-year
on c_oa.countyId = n_oty.countyId
and c_oa.industryId = n_oty.industryId
and c_oa.year = n_oty.year - 1
and c_oa.qtr = n_oty.qtr
and c_oa.data_denom = n_oty.data_denom
WHERE c_oa.SUPPRESSED = 1
AND c_oa.DATA_CAT = 1
AND n_oa.SUPPRESSED = 0
AND n_oa.DATA_CAT = 1
AND n_oty.SUPPRESSED = 0
AND n_oty.DATA_CAT = 3
Now, it sounds like the table is big, and we've just joined 3 instances of it, so for this to perform acceptably you'll need good physical design (appropriate indexes/stats for the join columns, etc.). That's why I'd suggest running an UPDATE based on the above query once; sure, it may run long, but afterwards you can read the inferred values in no time.
But if you really want to merge this directly into a query of the data you could modify it some to show all values, with inferred values mixed in. We need to switch to outer joins to do this, and I'm going to do some slightly weird things with join conditions to make it fit together:
SELECT src.COUNTYID
, src.INDUSTRYID
, src.YEAR
, src.QTR
, case when (n_oa.value - n_oty.value) is null
then src.suppressed
else 2
end as SUPPRESSED_CODE -- 0=NOT SUPPRESSED, 1=SUPPRESSED, 2=INFERRED
, src.DATA_CAT
, src.DATA_DENOM
, coalesce(n_oa.value - n_oty.value, src.value) as VALUE
FROM QCEW src --a source row from which we'll generate a record
left join QCEW n_oa --next yr (same qtr) overall (if src is suppressed/overall)
on src.countyId = n_oa.countyId
and src.industryId = n_oa.industryId
and src.year = n_oa.year - 1
and src.qtr = n_oa.qtr
and src.data_denom = n_oa.data_denom
and src.SUPPRESSED = 1 and n_oa.SUPPRESSED = 0
and src.DATA_CAT = 1 and n_oa.DATA_CAT = 1
left join QCEW n_oty --next yr (same qtr) over-the-year (if src is suppressed/overall)
on src.countyId = n_oty.countyId
and src.industryId = n_oty.industryId
and src.year = n_oty.year - 1
and src.qtr = n_oty.qtr
and src.data_denom = n_oty.data_denom
and src.SUPPRESSED = 1 and n_oty.SUPPRESSED = 0
and src.DATA_CAT = 1 and n_oty.DATA_CAT = 3

Fill sequence in sql rows

I have a table that stores a group of attributes and keeps them ordered in a sequence. The chance exists that one of the attributes (rows) could be deleted from the table, and the sequence of positions should be compacted.
For instance, if I originally have these set of values:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 2 | two | 2 |
| 3 | three | 3 |
| 4 | four | 4 |
+----+--------+-----+
And the second row was deleted, the position of all subsequent rows should be updated to close the gaps. The result should be this:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 3 | three | 2 |
| 4 | four | 3 |
+----+--------+-----+
Is there a way to do this update in a single query? How could I do this?
PS: I'd appreciate examples for both SQL Server and Oracle, since the system is supposed to support both engines. Thanks!
UPDATE: The reason for this is that users are allowed to modify the positions at will, as well as add or delete rows. Positions are shown to the user, and for that reason they should show a consistent sequence at all times (and this sequence must be stored, not generated on demand).
Not sure it works, but with Oracle I would try the following:
update my_table set pos = rownum;
This would work but may be suboptimal for large datasets:
SQL> UPDATE my_table t
SET pos = (SELECT COUNT(*) FROM my_table x WHERE x.pos <= t.pos);
3 rows updated
SQL> select * from my_table;
ID NAME POS
---------- ---------- ----------
1 one 1
3 three 2
4 four 3
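The compaction itself is just "sort by the current position and renumber from 1". A minimal Python sketch, assuming distinct pos values and rows given as (id, name, pos) tuples; `compact_positions` is a hypothetical helper, not part of either database. Ordering by pos rather than by id matters because users can reorder rows, so id order and display order may differ.

```python
from typing import List, Tuple

Row = Tuple[int, str, int]  # (id, name, pos)


def compact_positions(rows: List[Row]) -> List[Row]:
    """Renumber pos to 1..k following the current pos order, closing any gaps.

    Assumes pos values are distinct; returns new tuples sorted by the new pos.
    """
    ordered = sorted(rows, key=lambda row: row[2])  # current display order
    return [(row_id, name, new_pos)
            for new_pos, (row_id, name, _old) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    after_delete = [(1, "one", 1), (3, "three", 3), (4, "four", 4)]
    print(compact_positions(after_delete))
```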
Do you really need the sequence values to be contiguous, or do you just need to be able to display the contiguous values? The easiest way to do this is to let the actual sequence become sparse and calculate the rank based on the order:
select id,
name,
dense_rank() over (order by pos) as pos,
pos as sparse_pos
from my_table
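(note: dense_rank() over (order by ...) is an analytic function available in both Oracle and SQL Server, so this query is not Oracle-specific)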
If you make the position sparse in the first place, this would even make re-ordering easier, since you could make each new position halfway between the two existing ones. For instance, if you had a table like this:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 100 |
| 2 | two | 200 |
| 3 | three | 300 |
| 4 | four | 400 |
+----+--------+-----+
When it becomes time to move ID 4 into position 2, you'd just change the position to 150.
Further explanation:
Using the above example, the user initially sees the following (because you're masking the position):
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 2 | two | 2 |
| 3 | three | 3 |
| 4 | four | 4 |
+----+--------+-----+
When the user, through your interface, indicates that the record in position 4 needs to be moved to position 2, you update the position of ID 4 to 150, then re-run your query. The user sees this:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 4 | four | 2 |
| 2 | two | 3 |
| 3 | three | 4 |
+----+--------+-----+
The only reason this wouldn't work is if the user is editing the data directly in the database. Though, even in that case, I'd be inclined to use this kind of solution, via views and instead-of triggers.
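The sparse-position idea can be sketched in Python as two small functions: one moves a row halfway between two neighbours, the other produces the dense 1..k numbering the user sees, like dense_rank() over (order by pos) when positions are distinct. The names and the integer-midpoint choice are assumptions. Once two neighbours are only 1 apart there is no integer midpoint left, so the sketch raises an error, and at that point you would renumber (for example back to multiples of 100).

```python
from typing import Dict


def move_between(pos: Dict[int, int], moving_id: int, lower_id: int, upper_id: int) -> Dict[int, int]:
    """Return a copy of pos with moving_id placed between lower_id and upper_id."""
    low, high = pos[lower_id], pos[upper_id]
    if high - low < 2:
        raise ValueError("no free position between neighbours; renumber first")
    new_pos = dict(pos)
    new_pos[moving_id] = (low + high) // 2  # e.g. 100 and 200 -> 150
    return new_pos


def display_order(pos: Dict[int, int]) -> Dict[int, int]:
    """Map id -> 1-based rank by sparse position (assumes distinct positions)."""
    ranked_ids = sorted(pos, key=lambda rid: pos[rid])
    return {rid: rank for rank, rid in enumerate(ranked_ids, start=1)}
```

Moving id 4 between ids 1 and 2 gives it position 150 and a displayed order of one, four, two, three, matching the table above.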
