Combinations of two elements using multiple columns with PostgreSQL

I want to compute a weighted value for different combinations of only 4 variables in each row.
I have a table like this, with several possible sets of weights (w1 to w4) for each inc_id:
id | inc_id | sem_90 | sem_85 | sem_80 | t_90 | t_85 | t_80 | time | total | w1 | w2 | w3 | w4
1 | A | 0.01 | 0.08 | 0.09 | 0 | 0 | 0.001 | 0.99 | 0.006 | 0 | 0.1 | 0.01 | 0.08
2 | A | 0.01 | 0.08 | 0.09 | 0 | 0 | 0.001 | 0.99 | 0.006 | 0 | 0.1 | 0.02 | 0.07
3 | B | ...
4 | B | ...
5 | C | ...
and I need to create a new column holding a weighted value for each set of weights within an inc_id, such as:
(sem_90 * w1) + (t_90 * w2) + (time * w3) + (total * w4)
but for every possible combination of the sem_ and t_ variables for each inc_id, for example:
(sem_90 * w1) + (t_85 * w2) + (time * w3) + (total * w4)
(sem_90 * w1) + (t_80 * w2) + (time * w3) + (total * w4)
etc
So my final data should look like this:
inc_id | combination | w1 | w2 | w3 | w4 | pondered_value
A | sem_90 - t_90 | 0 | 0.1 | 0.01 | 0.08 | 0.0147
A | sem_90 - t_85 | 0 | 0.1 | 0.01 | 0.08 | 0.0147
A | sem_90 - t_80 | 0 | 0.1 | 0.01 | 0.08 | 0.0148
A | sem_85 - t_90 | 0 | 0.1 | 0.01 | 0.08 | 0.0147
A | ...
A | sem_90 - t_90 | 0 | 0.1 | 0.02 | 0.07 | 0.024
A | sem_90 - t_85 | 0 | 0.1 | 0.02 | 0.07 | 0.024
A | ...
B | sem_90 - t_90 | ...
Is it possible to do this with a query in a PostgreSQL database?

You can cross join each source row with lateral values() lists, which multiplies the source rows so you get one row per sem_* + t_* combination.
Something like this:
select src.inc_id
,sem.lbl || ' - ' || t.lbl as combination
,src.w1,src.w2,src.w3,src.w4
,sem.val * src.w1 + t.val * src.w2 + src."time" * src.w3 + src.total * src.w4 as pondered_value
from sometable src -- change "sometable" to the name of your table
cross join lateral (values ('sem_90',sem_90),('sem_85',sem_85),('sem_80',sem_80)) sem(lbl,val)
cross join lateral (values ( 't_90', t_90),( 't_85', t_85),( 't_80', t_80)) t(lbl,val)
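To see what the two lateral values() joins produce for a single source row, the same pairing can be sketched outside the database. This is only an illustration in Python, not part of the query: the function name weighted_combinations and the dict layout are assumptions, and the row is the first sample row from the question.

```python
from itertools import product

def weighted_combinations(row):
    """Yield (combination label, weighted value) for every sem_* / t_* pair in one row."""
    sem_cols = ["sem_90", "sem_85", "sem_80"]
    t_cols = ["t_90", "t_85", "t_80"]
    # product() pairs every sem column with every t column, like the two cross joins.
    for sem, t in product(sem_cols, t_cols):
        value = (row[sem] * row["w1"] + row[t] * row["w2"]
                 + row["time"] * row["w3"] + row["total"] * row["w4"])
        yield f"{sem} - {t}", value

# First sample row from the question (id 1, inc_id A).
row = {"sem_90": 0.01, "sem_85": 0.08, "sem_80": 0.09,
       "t_90": 0, "t_85": 0, "t_80": 0.001,
       "time": 0.99, "total": 0.006,
       "w1": 0, "w2": 0.1, "w3": 0.01, "w4": 0.08}

combos = list(weighted_combinations(row))
for label, value in combos:
    print(label, round(value, 5))
```

Each source row yields 3 x 3 = 9 output rows, which is the row multiplication the query relies on.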

Related

SUM of Multi Columns with group

I have a big, complex query with 40+ columns. The result of the query looks like this:
itemCode | itemName | customerCode | customerName | fac1 | fac2 | sum(fac2) | fac3 | sum(fac3)
001 | ABCD | 0023 | Dummy 1 | 0.25 | 0.25 | 0.75 | 0.1 | 0.5
001 | ABCD | 0024 | Dummy 2 | 0.25 | 0.25 | 0.75 | 0.2 | 0.5
001 | ABCD | 0025 | Dummy 3 | 0.50 | 0.25 | 0.75 | 0.2 | 0.5
002 | EFGH | 0023 | Dummy 1 | 0.20 | 0.52 | 0.52 | 0.1 | 0.1
003 | MNOP | 0023 | Dummy 1 | 0.50 | 0.75 | 1.25 | 0.3 | 0.7
003 | MNOP | 0024 | Dummy 2 | 0.20 | 0.50 | 1.25 | 0.4 | 0.7
I want the individual values (i.e. the fac columns) and the SUM of some of these columns (like the sum(fac2) column above). The problem is that the sum should be grouped by itemCode only, not by all the columns (and not by customer code either). I don't mind the sum values being repeated in the sum(fac2) or sum(fac3) columns. Thanks in advance.
I got it working with the following:
sumFac2 = SUM(fac2) OVER (PARTITION BY itemCode)
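The behaviour of that windowed sum (every row kept, with the per-itemCode total repeated on each row) can be illustrated with a small Python sketch. The helper name add_partition_sum is hypothetical, and the rows are cut down to the itemCode, customerCode and fac2 values from the table above.

```python
from collections import defaultdict

def add_partition_sum(rows, key, column):
    """Attach the per-key total of `column` to every row, keeping all rows."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[column]
    # Unlike GROUP BY, no rows are collapsed: each row just gains the group total.
    return [dict(r, **{f"sum_{column}": totals[r[key]]}) for r in rows]

rows = [
    {"itemCode": "001", "customerCode": "0023", "fac2": 0.25},
    {"itemCode": "001", "customerCode": "0024", "fac2": 0.25},
    {"itemCode": "001", "customerCode": "0025", "fac2": 0.25},
    {"itemCode": "002", "customerCode": "0023", "fac2": 0.52},
    {"itemCode": "003", "customerCode": "0023", "fac2": 0.75},
    {"itemCode": "003", "customerCode": "0024", "fac2": 0.50},
]

result = add_partition_sum(rows, "itemCode", "fac2")
for r in result:
    print(r["itemCode"], r["customerCode"], r["fac2"], r["sum_fac2"])
```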

SQL Server : adding rows for each row?

I have a table in SQL Server like this:
Col1 Col2 Col3
----- ---- -----
1 1 1
0.5 0.5 2
0.3 0.1 3
For each value in Col3 (1, 2, 3), I would like to add a fourth column that contains the numbers 1-53 in sequence. So, something like:
Col1 Col2 Col3 Col 4
----- ---- ----- ------
1 1 1 1
1 1 1 2
1 1 1 3
And so forth.
How could I accomplish this in T-SQL / Microsoft SQL Server 2016?
Thanks!
Are these the results you're trying to get?
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Col1 DECIMAL(9,1) NOT NULL,
Col2 DECIMAL(9,1) NOT NULL,
Col3 INT NOT NULL
);
INSERT #TestData (Col1, Col2, Col3) VALUES
(1, 1 ,1), (0.5,0.5,2), (0.3,0.1,3);
SELECT
td.Col1, td.Col2, td.Col3, Col4 = t.n
FROM
#TestData td
CROSS APPLY dbo.tfn_Tally(53, 1) t; -- dbo.tfn_Tally is a user-defined tally (numbers) function, not a built-in
Results...
Col1 Col2 Col3 Col4
----- ----- ---- -----
1.0 1.0 1 1
0.5 0.5 2 1
0.3 0.1 3 1
1.0 1.0 1 2
0.5 0.5 2 2
0.3 0.1 3 2
1.0 1.0 1 3
0.5 0.5 2 3
0.3 0.1 3 3
1.0 1.0 1 4
0.5 0.5 2 4
0.3 0.1 3 4
1.0 1.0 1 5
0.5 0.5 2 5
0.3 0.1 3 5
1.0 1.0 1 6
0.5 0.5 2 6
0.3 0.1 3 6
1.0 1.0 1 7
0.5 0.5 2 7
0.3 0.1 3 7
1.0 1.0 1 8
0.5 0.5 2 8
0.3 0.1 3 8
1.0 1.0 1 9
0.5 0.5 2 9
0.3 0.1 3 9
1.0 1.0 1 10
0.5 0.5 2 10
0.3 0.1 3 10
1.0 1.0 1 11
0.5 0.5 2 11
0.3 0.1 3 11
1.0 1.0 1 12
0.5 0.5 2 12
0.3 0.1 3 12
1.0 1.0 1 13
0.5 0.5 2 13
0.3 0.1 3 13
1.0 1.0 1 14
0.5 0.5 2 14
0.3 0.1 3 14
1.0 1.0 1 15
0.5 0.5 2 15
0.3 0.1 3 15
1.0 1.0 1 16
0.5 0.5 2 16
0.3 0.1 3 16
1.0 1.0 1 17
0.5 0.5 2 17
0.3 0.1 3 17
1.0 1.0 1 18
0.5 0.5 2 18
0.3 0.1 3 18
1.0 1.0 1 19
0.5 0.5 2 19
0.3 0.1 3 19
1.0 1.0 1 20
0.5 0.5 2 20
0.3 0.1 3 20
1.0 1.0 1 21
0.5 0.5 2 21
0.3 0.1 3 21
1.0 1.0 1 22
0.5 0.5 2 22
0.3 0.1 3 22
1.0 1.0 1 23
0.5 0.5 2 23
0.3 0.1 3 23
1.0 1.0 1 24
0.5 0.5 2 24
0.3 0.1 3 24
1.0 1.0 1 25
0.5 0.5 2 25
0.3 0.1 3 25
1.0 1.0 1 26
0.5 0.5 2 26
0.3 0.1 3 26
1.0 1.0 1 27
0.5 0.5 2 27
0.3 0.1 3 27
1.0 1.0 1 28
0.5 0.5 2 28
0.3 0.1 3 28
1.0 1.0 1 29
0.5 0.5 2 29
0.3 0.1 3 29
1.0 1.0 1 30
0.5 0.5 2 30
0.3 0.1 3 30
1.0 1.0 1 31
0.5 0.5 2 31
0.3 0.1 3 31
1.0 1.0 1 32
0.5 0.5 2 32
0.3 0.1 3 32
1.0 1.0 1 33
0.5 0.5 2 33
0.3 0.1 3 33
1.0 1.0 1 34
0.5 0.5 2 34
0.3 0.1 3 34
1.0 1.0 1 35
0.5 0.5 2 35
0.3 0.1 3 35
1.0 1.0 1 36
0.5 0.5 2 36
0.3 0.1 3 36
1.0 1.0 1 37
0.5 0.5 2 37
0.3 0.1 3 37
1.0 1.0 1 38
0.5 0.5 2 38
0.3 0.1 3 38
1.0 1.0 1 39
0.5 0.5 2 39
0.3 0.1 3 39
1.0 1.0 1 40
0.5 0.5 2 40
0.3 0.1 3 40
1.0 1.0 1 41
0.5 0.5 2 41
0.3 0.1 3 41
1.0 1.0 1 42
0.5 0.5 2 42
0.3 0.1 3 42
1.0 1.0 1 43
0.5 0.5 2 43
0.3 0.1 3 43
1.0 1.0 1 44
0.5 0.5 2 44
0.3 0.1 3 44
1.0 1.0 1 45
0.5 0.5 2 45
0.3 0.1 3 45
1.0 1.0 1 46
0.5 0.5 2 46
0.3 0.1 3 46
1.0 1.0 1 47
0.5 0.5 2 47
0.3 0.1 3 47
1.0 1.0 1 48
0.5 0.5 2 48
0.3 0.1 3 48
1.0 1.0 1 49
0.5 0.5 2 49
0.3 0.1 3 49
1.0 1.0 1 50
0.5 0.5 2 50
0.3 0.1 3 50
1.0 1.0 1 51
0.5 0.5 2 51
0.3 0.1 3 51
1.0 1.0 1 52
0.5 0.5 2 52
0.3 0.1 3 52
1.0 1.0 1 53
0.5 0.5 2 53
0.3 0.1 3 53
You'll have to generate a table of numbers, for example with a recursive CTE:
WITH nums AS (
SELECT 1 AS num
UNION ALL
SELECT num + 1 FROM nums
WHERE num < 53 -- the last recursion emits 53; "<= 53" would also emit 54
)
SELECT yourtable.*, nums.num AS col4
FROM yourtable
CROSS JOIN nums
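The stopping condition of a recursive numbers CTE is easy to get off by one, because the WHERE test is applied to the previous num while the row emitted is num + 1. The following Python sketch mimics that recursion under this reading; the function name recursive_numbers is made up for the illustration.

```python
def recursive_numbers(keep_recursing):
    """Mimic the recursive CTE: start at 1, then emit num + 1 while the WHERE test passes for num."""
    nums = [1]
    while keep_recursing(nums[-1]):
        nums.append(nums[-1] + 1)
    return nums

strict = recursive_numbers(lambda num: num < 53)      # WHERE num < 53
inclusive = recursive_numbers(lambda num: num <= 53)  # WHERE num <= 53

# Cross join the three sample rows with the 1..53 sequence.
rows = [(1.0, 1.0, 1), (0.5, 0.5, 2), (0.3, 0.1, 3)]
crossed = [row + (n,) for row in rows for n in strict]

print(strict[-1], inclusive[-1], len(crossed))  # prints "53 54 159"
```

With the strict bound the sequence ends at 53, so the cross join returns 3 x 53 = 159 rows; the inclusive bound would add a 54th number.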
You can use the code below. There are many ways to generate a sequence (you can store it in a temp table or use a CTE).
CREATE TABLE temp
(
Col1 DECIMAL(10,1),
Col2 DECIMAL(10,1),
Col3 INT
)
INSERT INTO temp
VALUES
(1,1,1)
,(0.5,0.5,2)
,(0.3,0.1,3)
DECLARE @Start INT = 1
, @End INT = 53
SELECT
t.*
, seq.n AS Col4
FROM temp t
CROSS APPLY
(
SELECT DISTINCT n = number
FROM master..[spt_values]
WHERE number BETWEEN @Start AND @End
) seq
RESULT:
Col1 Col2 Col3 Col4
--------------------------------------- --------------------------------------- ----------- -----------
1.0 1.0 1 1
1.0 1.0 1 2
1.0 1.0 1 3
1.0 1.0 1 4
1.0 1.0 1 5
1.0 1.0 1 6
1.0 1.0 1 7
1.0 1.0 1 8
1.0 1.0 1 9
1.0 1.0 1 10
1.0 1.0 1 11
1.0 1.0 1 12
1.0 1.0 1 13
1.0 1.0 1 14
1.0 1.0 1 15
1.0 1.0 1 16
1.0 1.0 1 17
1.0 1.0 1 18
1.0 1.0 1 19
1.0 1.0 1 20
1.0 1.0 1 21
1.0 1.0 1 22
1.0 1.0 1 23
1.0 1.0 1 24
1.0 1.0 1 25
1.0 1.0 1 26
1.0 1.0 1 27
1.0 1.0 1 28
1.0 1.0 1 29
1.0 1.0 1 30
1.0 1.0 1 31
1.0 1.0 1 32
1.0 1.0 1 33
1.0 1.0 1 34
1.0 1.0 1 35
1.0 1.0 1 36
1.0 1.0 1 37
1.0 1.0 1 38
1.0 1.0 1 39
1.0 1.0 1 40
1.0 1.0 1 41
1.0 1.0 1 42
1.0 1.0 1 43
1.0 1.0 1 44
1.0 1.0 1 45
1.0 1.0 1 46
1.0 1.0 1 47
1.0 1.0 1 48
1.0 1.0 1 49
1.0 1.0 1 50
1.0 1.0 1 51
1.0 1.0 1 52
1.0 1.0 1 53
0.5 0.5 2 1
0.5 0.5 2 2
0.5 0.5 2 3
0.5 0.5 2 4
0.5 0.5 2 5
0.5 0.5 2 6
and so on...

SQL Server : how to map decimals to corrected values

I have a situation where I get trip data from another company. The other company measures fuel with a precision of ⅛ gallon.
I get data from the other company and store it in my SQL Server table. The aggregated fuel amounts aren't right. I discovered that while the other company stores fuel in 1/8 gallons, it was sending me only one decimal place.
Furthermore, thanks to this post, I've determined that the company isn't rounding the values to the nearest tenth but is instead truncating them.
Query:
/** Fuel Fractions **/
SELECT DISTINCT ([TotalFuelUsed] % 1) AS [TotalFuelUsedDecimals]
FROM [Raw]
ORDER BY [TotalFuelUsedDecimals]
Results:
TotalFuelUsedDecimals
0.00
0.10
0.20
0.30
0.50
0.60
0.70
0.80
What I'd like is an efficient way to add a corrected fuel column to my views which would map as follows:
0.00 → 0.000
0.10 → 0.125
0.20 → 0.250
0.30 → 0.375
0.50 → 0.500
0.60 → 0.625
0.70 → 0.750
0.80 → 0.875
1.80 → 1.875
and so on
I'm new to SQL so please be kind.
Server is running Microsoft SQL Server 2008. But if you know a way better function only supported by newer SQL Server, please post it too because we may upgrade someday soon and it may help others.
Also, if it makes any difference, there are several different fuel columns in the table that I'll be correcting.
While writing up the question, I tried the following method using a temp table and multiple joins which seemed to work. I expect there are better solutions out there to be had.
CREATE TABLE #TempMap
([from] decimal(18,2), [to] decimal(18,3))
;
INSERT INTO #TempMap
([from], [to])
VALUES
(0.0, 0.000),
(0.1, 0.125),
(0.2, 0.250),
(0.3, 0.375),
(0.5, 0.500),
(0.6, 0.625),
(0.7, 0.750),
(0.8, 0.875)
;
SELECT [TotalFuelUsed]
,[TotalFuelCorrect].[to] + ROUND([TotalFuelUsed], 0, 1) AS [TotalFuelUsedCorrected]
,[IdleFuelUsed]
,[IdleFuelCorrect].[to] + ROUND([IdleFuelUsed], 0, 1) AS [IdleFuelUsedCorrected]
FROM [Raw]
JOIN [#TempMap] AS [TotalFuelCorrect] ON [TotalFuelUsed] % 1 = [TotalFuelCorrect].[from]
JOIN [#TempMap] AS [IdleFuelCorrect] ON [IdleFuelUsed] % 1 = [IdleFuelCorrect].[from]
ORDER BY [TotalFuelUsed] DESC
DROP TABLE #TempMap;
Try adding a column as:
select ....
, case when right(cast([TotalFuelUsed] as decimal(12,1)), 1) = 1 then [TotalFuelUsed] + 0.025
when right(cast([TotalFuelUsed] as decimal(12,1)), 1) = 2 then [TotalFuelUsed] + 0.05
when right(cast([TotalFuelUsed] as decimal(12,1)), 1) = 3 then [TotalFuelUsed] + 0.075
when right(cast([TotalFuelUsed] as decimal(12,1)), 1) = 6 then [TotalFuelUsed] + 0.025
when right(cast([TotalFuelUsed] as decimal(12,1)), 1) = 7 then [TotalFuelUsed] + 0.05
when right(cast([TotalFuelUsed] as decimal(12,1)), 1) = 8 then [TotalFuelUsed] + 0.075
else [TotalFuelUsed] end as updatedTotalFuelUsed
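The mapping in the question also has a closed form: if the true value is a multiple of 1/8 that was truncated to one decimal place, then multiplying by 8 and rounding up recovers the number of eighths. The sketch below checks that arithmetic in Python with exact decimals; the function name correct_fuel is hypothetical, and it assumes non-negative inputs that really were truncated from eighths. The same arithmetic could presumably be expressed in T-SQL with CEILING, which is not shown here.

```python
from decimal import Decimal, ROUND_CEILING

def correct_fuel(truncated):
    """Recover a multiple of 1/8 from its value truncated to one decimal place.

    Assumes `truncated` is a decimal string such as "1.8" and is non-negative.
    """
    eighths = (Decimal(truncated) * 8).to_integral_value(rounding=ROUND_CEILING)
    return eighths / 8

for raw in ["0.00", "0.10", "0.20", "0.30", "0.50", "0.60", "0.70", "0.80", "1.80"]:
    print(raw, "->", correct_fuel(raw))
```

For example, 0.30 * 8 = 2.4 rounds up to 3 eighths, which is 0.375, and 1.80 * 8 = 14.4 rounds up to 15 eighths, which is 1.875.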

Find index in vector where the values start to decrease strictly monotonic

I have a bunch of number pairs [a b] like
A = [0 0.001;
0.01 2 ;
0.02 0.5 ;
0.03 0.4 ;
0.04 0.9 ;
0.05 0.7 ;
0.06 0.5 ;
0.07 0.8 ;
0.08 0.8 ;
0.09 0.8 ;
0.10 0.3 ;
0.11 0.1 ;
0.12 0.05 ]
I want to find the last value in the b series after which the series only descends. For example, here the answer is [0.04 0.8].
The matrix is really big, and I don't need to sort its values.
The matrix form should stay intact.
What about:
A = [0 0.001;
0.01 2 ;
0.02 0.5 ;
0.03 0.4 ;
0.04 0.9 ;
0.05 0.7 ;
0.06 0.5 ;
0.07 0.8 ;
0.08 0.8 ;
0.09 0.8 ;
0.10 0.3 ;
0.11 0.1 ;
0.12 0.05 ]
X = find( diff(A(:,2)) > 0 ,1,'last') + 1
out = A(X,:)
returns:
X = 8
out = 0.0700 0.8000
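Note that diff(...) > 0 treats the flat stretch of 0.8 values as part of the descent, so row 8 marks the start of a non-increasing tail. If the tail must be strictly decreasing, as the title asks, ties have to count as a break too. The Python sketch below shows both variants on the b column; the function name last_descent_start and its strict flag are assumptions for illustration, and indices are 0-based, so add 1 to compare with MATLAB.

```python
def last_descent_start(values, strict=False):
    """Return the 0-based index where the final descending run of `values` begins.

    With strict=False, equal neighbours are allowed inside the run (non-increasing).
    With strict=True, equal neighbours end the run (strictly decreasing).
    """
    for i in range(len(values) - 1, 0, -1):
        step = values[i] - values[i - 1]
        if step > 0 or (strict and step == 0):
            return i
    return 0

a = [0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12]
b = [0.001, 2, 0.5, 0.4, 0.9, 0.7, 0.5, 0.8, 0.8, 0.8, 0.3, 0.1, 0.05]

i_loose = last_descent_start(b)
i_strict = last_descent_start(b, strict=True)
print(i_loose, (a[i_loose], b[i_loose]))
print(i_strict, (a[i_strict], b[i_strict]))
```

The non-strict variant agrees with the find/diff answer (row 8 in MATLAB numbering), while the strict variant moves to the last 0.8, at a = 0.09.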

R extension in C, setting matrix row/column names

I'm writing an R package that manipulates matrices in C. Currently, the matrices returned to R have numbers for the row/column names. I would rather assign my own row/column names when modifying the object in C.
I've googled around for about an hour, but haven't found a good solution yet. The closest I've found is dimnames, but I want to name each column, not just the two dimensions. The matrices get larger than 4x4; below is just a small example of what I want to do.
The number of rows is 4^x, where x is the length of the row name.
Current
[,1] [,2] [,3] [,4]
[1,] 0.20 0.00 0.00 0.80
[2,] 0.25 0.25 0.25 0.25
[3,] 0.25 0.25 0.25 0.25
[4,] 1.00 0.00 0.00 0.00
[5,] 0.20 0.00 0.00 0.80
[6,] 0.25 0.25 0.25 0.25
[7,] 0.25 0.25 0.25 0.25
[8,] 1.00 0.00 0.00 0.00
[9,] 0.20 0.00 0.00 0.80
[10,] 0.25 0.25 0.25 0.25
[11,] 0.25 0.25 0.25 0.25
[12,] 1.00 0.00 0.00 0.00
[13,] 0.20 0.00 0.00 0.80
[14,] 0.25 0.25 0.25 0.25
[15,] 0.25 0.25 0.25 0.25
[16,] 1.00 0.00 0.00 0.00
Desired
[A] [C] [G] [T]
[AA] 0.20 0.00 0.00 0.80
[AC] 0.25 0.25 0.25 0.25
[AG] 0.25 0.25 0.25 0.25
[AT] 1.00 0.00 0.00 0.00
[CA] 0.20 0.00 0.00 0.80
[CC] 0.25 0.25 0.25 0.25
[CG] 0.25 0.25 0.25 0.25
[CT] 1.00 0.00 0.00 0.00
[GA] 0.20 0.00 0.00 0.80
[GC] 0.25 0.25 0.25 0.25
[GG] 0.25 0.25 0.25 0.25
[GT] 1.00 0.00 0.00 0.00
[TA] 0.20 0.00 0.00 0.80
[TC] 0.25 0.25 0.25 0.25
[TG] 0.25 0.25 0.25 0.25
[TT] 1.00 0.00 0.00 0.00
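The row labels in the desired output follow a regular pattern: every string of length x over the alphabet A, C, G, T, in lexicographic order, which is where the 4^x row count comes from. As a language-neutral illustration of how those labels could be enumerated before being passed to the C code, here is a short Python sketch; the function name kmer_names is hypothetical.

```python
from itertools import product

def kmer_names(length, alphabet="ACGT"):
    """Return all strings of the given length over `alphabet`, in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=length)]

col_names = kmer_names(1)  # A, C, G, T
row_names = kmer_names(2)  # AA, AC, ..., TT
print(len(row_names), row_names[:4], row_names[-1])
```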
If you are open to C++ instead of C, then Rcpp can make this a little easier. We just create a list object with rows and column names as we would in R, and assign that to the dimnames attribute of the matrix object:
R> library(inline) # to compile, link, load the code here
R> src <- '
+ Rcpp::NumericMatrix x(2,2);
+ x.fill(42); // or more interesting values
+ // C++0x can assign a set of values to a vector, but we use older standard
+ Rcpp::CharacterVector rows(2); rows[0] = "aa"; rows[1] = "bb";
+ Rcpp::CharacterVector cols(2); cols[0] = "AA"; cols[1] = "BB";
+ // now create an object "dimnms" as a list with rows and cols
+ Rcpp::List dimnms = Rcpp::List::create(rows, cols);
+ // and assign it
+ x.attr("dimnames") = dimnms;
+ return(x);
+ '
R> fun <- cxxfunction(signature(), body=src, plugin="Rcpp")
R> fun()
AA BB
aa 42 42
bb 42 42
R>
The actual assignment of the column and row names is so manual ... because the current C++ standard does not allow direct assignment of vectors at initialization, but that will change.
Edit: I just realized that I can of course use the static create() method on the row and column names too, which makes this a little easier and shorter still:
R> src <- '
+ Rcpp::NumericMatrix x(2,2);
+ x.fill(42); // or more interesting values
+ Rcpp::List dimnms = // two vec. with static names
+ Rcpp::List::create(Rcpp::CharacterVector::create("cc", "dd"),
+ Rcpp::CharacterVector::create("ee", "ff"));
+ // and assign it
+ x.attr("dimnames") = dimnms;
+ return(x);
+ '
R> fun <- cxxfunction(signature(), body=src, plugin="Rcpp")
R> fun()
ee ff
cc 42 42
dd 42 42
R>
So we are down to three or four statements, no monkeying with PROTECT / UNPROTECT and no memory management.
As Jim said, this is much easier to do in R. I'm passing the names into the C function via the nam argument.
#include <Rinternals.h>
SEXP myMat(SEXP nam) {
/*PrintValue(nam);*/
SEXP ans, dimnames;
PROTECT(ans = allocMatrix(REALSXP, length(nam), length(nam)));
PROTECT(dimnames = allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, nam);
SET_VECTOR_ELT(dimnames, 1, nam);
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(2);
return(ans);
}
If you put that code in a file called myMat.c, you can test it via the line below. I'm using Ubuntu, so you will have to change myMat.so to myMat.dll if you're on Windows.
R CMD SHLIB myMat.c
Rscript -e 'dyn.load("myMat.so"); .Call("myMat", c("A","C","G","T"))'
The note above is instructive. dimnames is a list with as many elements as the matrix has dimensions, where each element is a character vector holding one name per entry along that dimension, e.g., list(c('a','c','g','t'), c('a','c','g','t')).
To set that in C, I would recommend:
PROTECT(dimnames = allocVector(VECSXP, 2));
PROTECT(rownames = allocVector(STRSXP, 4));
PROTECT(colnames = allocVector(STRSXP, 4));
setAttrib( ? , R_DimNamesSymbol, dimnames);
You'll have to then set the relevant rowname and colname elements. In general, this stuff is much easier to do in R.
jim
