I have a big, complex query with 40+ columns. The result of the query looks like this:
itemCode  itemName  customerCode  customerName  fac1  fac2  sum(fac2)  fac3  sum(fac3)
001       ABCD      0023          Dummy 1       0.25  0.25  0.75       0.1   0.5
001       ABCD      0024          Dummy 2       0.25  0.25  0.75       0.2   0.5
001       ABCD      0025          Dummy 3       0.50  0.25  0.75       0.2   0.5
002       EFGH      0023          Dummy 1       0.20  0.52  0.52       0.1   0.1
003       MNOP      0023          Dummy 1       0.50  0.75  1.25       0.3   0.7
003       MNOP      0024          Dummy 2       0.20  0.50  1.25       0.4   0.7
I want the individual values (i.e. the fac columns) as well as the SUM of some of these columns (like the sum(fac2) column above). The problem is that the sum should be grouped by itemCode only, not by all the other columns (so not by customerCode as well). I don't mind the sum values being repeated in the sum(fac2) or sum(fac3) columns. Thanks in advance.
I got it working with the following:
sumFac2 = SUM(fac2) OVER (PARTITION BY itemCode)
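To see what the windowed sum returns, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The table name `items` is made up for the example, only three of the 40+ columns are included, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions). SQLite needs the `AS sumFac2` alias form rather than `sumFac2 = ...`.

```python
import sqlite3

# Rows taken from the question: (itemCode, customerCode, fac2)
rows = [
    ("001", "0023", 0.25),
    ("001", "0024", 0.25),
    ("001", "0025", 0.25),
    ("002", "0023", 0.52),
    ("003", "0023", 0.75),
    ("003", "0024", 0.50),
]

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE items (itemCode TEXT, customerCode TEXT, fac2 REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

# The window function keeps every detail row and repeats the per-item total,
# so no GROUP BY (and no collapsing of customer rows) is needed.
query = """
    SELECT itemCode, customerCode, fac2,
           SUM(fac2) OVER (PARTITION BY itemCode) AS sumFac2
    FROM items
    ORDER BY itemCode, customerCode
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every input row comes back once, with the total for its itemCode repeated alongside it.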
I want to get the pondered (weighted) value of different combinations of only 4 variables for each row.
The thing is, I have a table like this, with different possible weights w for each inc_id:
id|inc_id | sem_90 | sem_85 | sem_80 | t_90 | t_85 | t_80 | time | total | w1 | w2 | w3 | w4
1 A 0.01 0.08 0.09 0 0 0.001 0.99 0.006 0 0.1 0.01 0.08
2 A 0.01 0.08 0.09 0 0 0.001 0.99 0.006 0 0.1 0.02 0.07
3 B ...
4 B ...
5 C ...
and I need to create a new column with a pondered value for each weight in inc_id like:
(sem_90 * w1) + (t_90 * w2) + (time * w3) + (total * w4) but for all the possible combinations of the sem_ and t_ variables for each inc_id like:
(sem_90 * w1) + (t_85 * w2) + (time * w3) + (total * w4)
(sem_90 * w1) + (t_80 * w2) + (time * w3) + (total * w4)
etc
So my final data should look like this
inc_id | combination | w1 | w2 | w3 | w4 | pondered_value |
A sem_90 - t90 0 0.1 0.01 0.08 0.0147
A sem_90 - t85 0 0.1 0.01 0.08 0.0147
A sem_90 - t80 0 0.1 0.01 0.08 0.0148
A sem_85 - t90 0 0.1 0.01 0.08 0.0147
A ...
A sem_90 - t90 0 0.1 0.02 0.07 0.024
A sem_90 - t85 0 0.1 0.02 0.07 0.024
A ...
B sem_90 - t_90 ...
Is it possible to do this with a query in a PostgreSQL database?
You can use lateral values() table joins to multiply the source rows, so that you get one row per sem_* / t_* combination.
Something like this:
select src.inc_id
,sem.lbl || ' - ' || t.lbl as combination
,src.w1,src.w2,src.w3,src.w4
,sem.val * src.w1 + t.val * src.w2 + src."time" * src.w3 + src.total * src.w4 as pondered_value
from sometable src -- change "sometable" to the name of your table
cross join lateral (values ('sem_90',sem_90),('sem_85',sem_85),('sem_80',sem_80)) sem(lbl,val)
cross join lateral (values ( 't_90', t_90),( 't_85', t_85),( 't_80', t_80)) t(lbl,val)
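If it helps to see what the two lateral joins do to a single source row, here is a rough Python equivalent. It is only a sketch: the dict stands in for the first sample row from the question, and `expand` is a helper name invented for this example. Each source row is paired with every (sem_*, t_*) label, which gives 3 x 3 = 9 output rows.

```python
from itertools import product

# First sample row from the question, as a plain dict.
row = {
    "inc_id": "A",
    "sem_90": 0.01, "sem_85": 0.08, "sem_80": 0.09,
    "t_90": 0.0, "t_85": 0.0, "t_80": 0.001,
    "time": 0.99, "total": 0.006,
    "w1": 0.0, "w2": 0.1, "w3": 0.01, "w4": 0.08,
}

SEM_COLS = ("sem_90", "sem_85", "sem_80")
T_COLS = ("t_90", "t_85", "t_80")


def expand(row):
    """Yield one record per (sem_*, t_*) pair, mirroring the two lateral joins."""
    for sem, t in product(SEM_COLS, T_COLS):
        value = (row[sem] * row["w1"] + row[t] * row["w2"]
                 + row["time"] * row["w3"] + row["total"] * row["w4"])
        yield {
            "inc_id": row["inc_id"],
            "combination": f"{sem} - {t}",
            "pondered_value": value,
        }


combos = list(expand(row))
for c in combos:
    print(c["inc_id"], c["combination"], c["pondered_value"])
```

The SQL version does the same thing set-wise: the values() lists play the role of `SEM_COLS` and `T_COLS`, and the cross joins play the role of `product`.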
I have a table in SQL Server like this:
Col1 Col2 Col3
----- ---- -----
1 1 1
0.5 0.5 2
0.3 0.1 3
For each value in Col3 (1, 2, 3), I would like to add a 4th column that contains the numbers 1-53 in sequence. So, something like:
Col1 Col2 Col3 Col 4
----- ---- ----- ------
1 1 1 1
1 1 1 2
1 1 1 3
And so forth.
How could I accomplish this in T-SQL / Microsoft SQL Server 2016?
Thanks!
Are these the results you're trying to get?
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Col1 DECIMAL(9,1) NOT NULL,
Col2 DECIMAL(9,1) NOT NULL,
Col3 INT NOT NULL
);
INSERT #TestData (Col1, Col2, Col3) VALUES
(1, 1 ,1), (0.5,0.5,2), (0.3,0.1,3);
SELECT
td.Col1, td.Col2, td.Col3, Col4 = t.n
FROM
#TestData td
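-- dbo.tfn_Tally is a user-defined tally (numbers) function, not a built-in;
-- it must already exist in the database and is assumed here to return 1 through 53 in column n
CROSS APPLY dbo.tfn_Tally(53, 1) t;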
Results...
Col1 Col2 Col3 Col4
----- ----- ---- -----
1.0 1.0 1 1
0.5 0.5 2 1
0.3 0.1 3 1
1.0 1.0 1 2
0.5 0.5 2 2
0.3 0.1 3 2
1.0 1.0 1 3
0.5 0.5 2 3
0.3 0.1 3 3
1.0 1.0 1 4
0.5 0.5 2 4
0.3 0.1 3 4
1.0 1.0 1 5
0.5 0.5 2 5
0.3 0.1 3 5
1.0 1.0 1 6
0.5 0.5 2 6
0.3 0.1 3 6
1.0 1.0 1 7
0.5 0.5 2 7
0.3 0.1 3 7
1.0 1.0 1 8
0.5 0.5 2 8
0.3 0.1 3 8
1.0 1.0 1 9
0.5 0.5 2 9
0.3 0.1 3 9
1.0 1.0 1 10
0.5 0.5 2 10
0.3 0.1 3 10
1.0 1.0 1 11
0.5 0.5 2 11
0.3 0.1 3 11
1.0 1.0 1 12
0.5 0.5 2 12
0.3 0.1 3 12
1.0 1.0 1 13
0.5 0.5 2 13
0.3 0.1 3 13
1.0 1.0 1 14
0.5 0.5 2 14
0.3 0.1 3 14
1.0 1.0 1 15
0.5 0.5 2 15
0.3 0.1 3 15
1.0 1.0 1 16
0.5 0.5 2 16
0.3 0.1 3 16
1.0 1.0 1 17
0.5 0.5 2 17
0.3 0.1 3 17
1.0 1.0 1 18
0.5 0.5 2 18
0.3 0.1 3 18
1.0 1.0 1 19
0.5 0.5 2 19
0.3 0.1 3 19
1.0 1.0 1 20
0.5 0.5 2 20
0.3 0.1 3 20
1.0 1.0 1 21
0.5 0.5 2 21
0.3 0.1 3 21
1.0 1.0 1 22
0.5 0.5 2 22
0.3 0.1 3 22
1.0 1.0 1 23
0.5 0.5 2 23
0.3 0.1 3 23
1.0 1.0 1 24
0.5 0.5 2 24
0.3 0.1 3 24
1.0 1.0 1 25
0.5 0.5 2 25
0.3 0.1 3 25
1.0 1.0 1 26
0.5 0.5 2 26
0.3 0.1 3 26
1.0 1.0 1 27
0.5 0.5 2 27
0.3 0.1 3 27
1.0 1.0 1 28
0.5 0.5 2 28
0.3 0.1 3 28
1.0 1.0 1 29
0.5 0.5 2 29
0.3 0.1 3 29
1.0 1.0 1 30
0.5 0.5 2 30
0.3 0.1 3 30
1.0 1.0 1 31
0.5 0.5 2 31
0.3 0.1 3 31
1.0 1.0 1 32
0.5 0.5 2 32
0.3 0.1 3 32
1.0 1.0 1 33
0.5 0.5 2 33
0.3 0.1 3 33
1.0 1.0 1 34
0.5 0.5 2 34
0.3 0.1 3 34
1.0 1.0 1 35
0.5 0.5 2 35
0.3 0.1 3 35
1.0 1.0 1 36
0.5 0.5 2 36
0.3 0.1 3 36
1.0 1.0 1 37
0.5 0.5 2 37
0.3 0.1 3 37
1.0 1.0 1 38
0.5 0.5 2 38
0.3 0.1 3 38
1.0 1.0 1 39
0.5 0.5 2 39
0.3 0.1 3 39
1.0 1.0 1 40
0.5 0.5 2 40
0.3 0.1 3 40
1.0 1.0 1 41
0.5 0.5 2 41
0.3 0.1 3 41
1.0 1.0 1 42
0.5 0.5 2 42
0.3 0.1 3 42
1.0 1.0 1 43
0.5 0.5 2 43
0.3 0.1 3 43
1.0 1.0 1 44
0.5 0.5 2 44
0.3 0.1 3 44
1.0 1.0 1 45
0.5 0.5 2 45
0.3 0.1 3 45
1.0 1.0 1 46
0.5 0.5 2 46
0.3 0.1 3 46
1.0 1.0 1 47
0.5 0.5 2 47
0.3 0.1 3 47
1.0 1.0 1 48
0.5 0.5 2 48
0.3 0.1 3 48
1.0 1.0 1 49
0.5 0.5 2 49
0.3 0.1 3 49
1.0 1.0 1 50
0.5 0.5 2 50
0.3 0.1 3 50
1.0 1.0 1 51
0.5 0.5 2 51
0.3 0.1 3 51
1.0 1.0 1 52
0.5 0.5 2 52
0.3 0.1 3 52
1.0 1.0 1 53
0.5 0.5 2 53
0.3 0.1 3 53
You'll have to generate a table of numbers yourself, for example with a recursive CTE:
WITH nums as(
SELECT 1 as num
UNION ALL
SELECT num + 1 FROM nums
WHERE num < 53
)
SELECT yourtable.*, num as col4 FROM
Yourtable
CROSS JOIN
nums
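The stop condition on the recursive CTE is easy to get wrong by one: the recursive member emits num + 1, so the guard has to be num < 53 for the sequence to end at exactly 53. Below is a small sketch that checks this against an in-memory SQLite database from Python. It is an illustration only; SQLite conventionally spells it WITH RECURSIVE and names the column in the CTE header, whereas SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (col1 REAL, col2 REAL, col3 INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, 1), (0.5, 0.5, 2), (0.3, 0.1, 3)],
)

# The recursive member emits num + 1, so the guard must be num < 53
# for the generated sequence to stop at exactly 53.
query = """
    WITH RECURSIVE nums(num) AS (
        SELECT 1
        UNION ALL
        SELECT num + 1 FROM nums WHERE num < 53
    )
    SELECT yourtable.*, num AS col4
    FROM yourtable
    CROSS JOIN nums
    ORDER BY col3, col4
"""
expanded = conn.execute(query).fetchall()
print(len(expanded), expanded[0], expanded[-1])
```

With three source rows and 53 numbers, the cross join produces 3 x 53 rows in total.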
You can use the code below. There are many ways to generate a sequence (you can store it in a temp table or use a CTE).
CREATE TABLE temp
(
Col1 DECIMAL(10,1),
Col2 DECIMAL(10,1),
Col3 INT
)
INSERT INTO temp
VALUES
(1,1,1)
,(0.5,0.5,2)
,(0.3,0.1,3)
DECLARE @Start INT = 1
, @End INT = 53
SELECT
t.*
, seq.n AS Col4
FROM temp t
CROSS APPLY
(
-- master..spt_values is an undocumented system table; DISTINCT removes its repeated numbers
SELECT DISTINCT n = number
FROM master..[spt_values]
WHERE number BETWEEN @Start AND @End
) seq
RESULT:
Col1 Col2 Col3 Col4
--------------------------------------- --------------------------------------- ----------- -----------
1.0 1.0 1 1
1.0 1.0 1 2
1.0 1.0 1 3
1.0 1.0 1 4
1.0 1.0 1 5
1.0 1.0 1 6
1.0 1.0 1 7
1.0 1.0 1 8
1.0 1.0 1 9
1.0 1.0 1 10
1.0 1.0 1 11
1.0 1.0 1 12
1.0 1.0 1 13
1.0 1.0 1 14
1.0 1.0 1 15
1.0 1.0 1 16
1.0 1.0 1 17
1.0 1.0 1 18
1.0 1.0 1 19
1.0 1.0 1 20
1.0 1.0 1 21
1.0 1.0 1 22
1.0 1.0 1 23
1.0 1.0 1 24
1.0 1.0 1 25
1.0 1.0 1 26
1.0 1.0 1 27
1.0 1.0 1 28
1.0 1.0 1 29
1.0 1.0 1 30
1.0 1.0 1 31
1.0 1.0 1 32
1.0 1.0 1 33
1.0 1.0 1 34
1.0 1.0 1 35
1.0 1.0 1 36
1.0 1.0 1 37
1.0 1.0 1 38
1.0 1.0 1 39
1.0 1.0 1 40
1.0 1.0 1 41
1.0 1.0 1 42
1.0 1.0 1 43
1.0 1.0 1 44
1.0 1.0 1 45
1.0 1.0 1 46
1.0 1.0 1 47
1.0 1.0 1 48
1.0 1.0 1 49
1.0 1.0 1 50
1.0 1.0 1 51
1.0 1.0 1 52
1.0 1.0 1 53
0.5 0.5 2 1
0.5 0.5 2 2
0.5 0.5 2 3
0.5 0.5 2 4
0.5 0.5 2 5
0.5 0.5 2 6
and so on...
I am trying to make a chart with a line graph showing the change in the Count column for each month, plus two points showing the min and max value in that month. The table is below.
Date Min Max Count
1/1/2015 0.28 6.02 13
2/1/2015 0.2 7.72 8
3/1/2015 1 1 1
4/1/2015 0.4 6.87 7
5/1/2015 0.36 3.05 8
6/1/2015 0.17 1.26 13
7/1/2015 0.31 1.59 15
8/1/2015 0.39 3.35 13
9/1/2015 0.22 0.86 10
10/1/2015 0.3 2.48 13
11/1/2015 0.16 0.82 9
12/1/2015 0.33 2.18 5
1/1/2016 0.23 1.16 14
2/1/2016 0.38 1.74 7
3/1/2016 0.1 8.87 9
4/1/2016 0.28 0.68 3
5/1/2016 0.13 3.23 11
6/1/2016 0.33 1 5
7/1/2016 0.28 1.26 4
8/1/2016 0.08 0.41 2
9/1/2016 0.43 0.61 2
10/1/2016 0.49 1.39 4
11/1/2016 0.89 0.89 1
I tried doing a scatter plot but when I try to Add a Line from Column value I get an error saying that the line cannot work on categorical data.
Any suggestions on how I can prepare this visualization?
Thanks!
I would do this in a combination chart.
Insert a combination chart (Line & Bar Graph)
On your X-Axis put your date as <BinByDateTime([Date],"Year.Month",1)>
On your Y-Axis put your aggregations: Sum([Count]), Max([Max]), Min([Min])
Right click > Properties > Series > set the Min and Max to Line Type
(Optional) Change the Y-Axis scale
I have a dataset in long format, e.g.:
time subject var1 var2 var3
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
I would like to get a dataset in a wide format as
time var1_sub1 var2_sub1 var3_sub1 var1_sub2 var2_sub2 var3_sub2
1 0.41 0.48 0.85 0.76 0.49 0.03
2 0.58 0.38 0.15 0.36 0.26 0.93
3 0.08 0.39 0.96 0.83 0.88 0.63
4 0.58 0.87 0.15 0.19 0.65 0.99
5 0.55 0.40 0.67 0.89 0.91 0.47
So far, I came up with an idea to do it in the following way:
data data_sub1;
set data;
if subject=1;
var1_sub1=var1;
var2_sub1=var2;
var3_sub1=var3;
run;
data data_sub2;
set data;
if subject=2;
var1_sub2=var1;
var2_sub2=var2;
var3_sub2=var3;
run;
proc sort data=data_sub1;
by time;
run;
proc sort data=data_sub2;
by time;
run;
data datamerged;
merge data_sub1 data_sub2;
by time;
run;
It works and everything is fine, but I would like to learn how to code it more elegantly, since in practice I have many more subjects and variables.
This is a PROC TRANSPOSE problem. To solve most PROC TRANSPOSE problems, make it totally vertical (one value-one variable name per row) and then transpose using the ID statement.
data have;
input time subject var1 var2 var3;
datalines;
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
;;;;
run;
data have_vert;
set have;
array vars var:;
do _t = 1 to dim(vars);
id=cats(vname(vars[_t]),'_','sub',subject); *this is our future variable name;
value = vars[_t]; *this is our future variable value;
output;
end;
keep time id value subject;
run;
proc sort data=have_vert;
by time subject id;
run;
proc transpose data=have_vert out=want;
by time;
var value;
id id;
run;
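For readers who don't use SAS, the same two-step idea (go fully vertical, then pivot on a generated id) can be sketched in plain Python. This is only an illustration of the technique, not a replacement for PROC TRANSPOSE; the names `long_rows`, `vertical` and `wide` are invented for the example.

```python
# Long-format rows from the question: (time, subject, var1, var2, var3)
long_rows = [
    (1, 1, 0.41, 0.48, 0.85),
    (2, 1, 0.58, 0.38, 0.15),
    (3, 1, 0.08, 0.39, 0.96),
    (4, 1, 0.58, 0.87, 0.15),
    (5, 1, 0.55, 0.40, 0.67),
    (1, 2, 0.76, 0.49, 0.03),
    (2, 2, 0.36, 0.26, 0.93),
    (3, 2, 0.83, 0.88, 0.63),
    (4, 2, 0.19, 0.65, 0.99),
    (5, 2, 0.89, 0.91, 0.47),
]
VAR_NAMES = ("var1", "var2", "var3")

# Step 1: go fully vertical, one (time, id, value) triple per cell.
# The id is the future column name, e.g. "var1_sub2".
vertical = []
for time, subject, *values in long_rows:
    for name, value in zip(VAR_NAMES, values):
        vertical.append((time, f"{name}_sub{subject}", value))

# Step 2: pivot, using the generated id as the column name.
wide = {}
for time, col, value in vertical:
    wide.setdefault(time, {"time": time})[col] = value

for time in sorted(wide):
    print(wide[time])
```

Because the column name is built from the variable name and the subject, adding more subjects or variables needs no extra code, which is the same reason the array loop in the data step scales.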
I'm writing an R package that manipulates Matrices in C. Currently, the matrices returned to R have numbers for the row/column names. I would rather assign my own row/column names when modifying the object in C.
I've googled around for about an hour, but haven't found a good solution yet. The closest I've found is dimnames, but I want to name each column, not just the two dimensions. The matrices get larger than 4x4, below is just a small example of what I want to do.
The number of rows is 4^x, where x is the length of the row name.
Current
[,1] [,2] [,3] [,4]
[1,] 0.20 0.00 0.00 0.80
[2,] 0.25 0.25 0.25 0.25
[3,] 0.25 0.25 0.25 0.25
[4,] 1.00 0.00 0.00 0.00
[5,] 0.20 0.00 0.00 0.80
[6,] 0.25 0.25 0.25 0.25
[7,] 0.25 0.25 0.25 0.25
[8,] 1.00 0.00 0.00 0.00
[9,] 0.20 0.00 0.00 0.80
[10,] 0.25 0.25 0.25 0.25
[11,] 0.25 0.25 0.25 0.25
[12,] 1.00 0.00 0.00 0.00
[13,] 0.20 0.00 0.00 0.80
[14,] 0.25 0.25 0.25 0.25
[15,] 0.25 0.25 0.25 0.25
[16,] 1.00 0.00 0.00 0.00
Desired
[A] [C] [G] [T]
[AA] 0.20 0.00 0.00 0.80
[AC] 0.25 0.25 0.25 0.25
[AG] 0.25 0.25 0.25 0.25
[AT] 1.00 0.00 0.00 0.00
[CA] 0.20 0.00 0.00 0.80
[CC] 0.25 0.25 0.25 0.25
[CG] 0.25 0.25 0.25 0.25
[CT] 1.00 0.00 0.00 0.00
[GA] 0.20 0.00 0.00 0.80
[GC] 0.25 0.25 0.25 0.25
[GG] 0.25 0.25 0.25 0.25
[GT] 1.00 0.00 0.00 0.00
[TA] 0.20 0.00 0.00 0.80
[TC] 0.25 0.25 0.25 0.25
[TG] 0.25 0.25 0.25 0.25
[TT] 1.00 0.00 0.00 0.00
If you are open to C++ instead of C, then Rcpp can make this a little easier. We just create a list object with rows and column names as we would in R, and assign that to the dimnames attribute of the matrix object:
R> library(inline) # to compile, link, load the code here
R> src <- '
+ Rcpp::NumericMatrix x(2,2);
+ x.fill(42); // or more interesting values
+ // C++0x can assign a set of values to a vector, but we use older standard
+ Rcpp::CharacterVector rows(2); rows[0] = "aa"; rows[1] = "bb";
+ Rcpp::CharacterVector cols(2); cols[0] = "AA"; cols[1] = "BB";
+ // now create an object "dimnms" as a list with rows and cols
+ Rcpp::List dimnms = Rcpp::List::create(rows, cols);
+ // and assign it
+ x.attr("dimnames") = dimnms;
+ return(x);
+ '
R> fun <- cxxfunction(signature(), body=src, plugin="Rcpp")
R> fun()
AA BB
aa 42 42
bb 42 42
R>
The actual assignment of the column and row names is so manual ... because the current C++ standard does not allow direct assignment of vectors at initialization, but that will change.
Edit: I just realized that I can of course use the static create() method on the row and column names too, which makes this a little easier and shorter still:
R> src <- '
+ Rcpp::NumericMatrix x(2,2);
+ x.fill(42); // or more interesting values
+ Rcpp::List dimnms = // two vec. with static names
+ Rcpp::List::create(Rcpp::CharacterVector::create("cc", "dd"),
+ Rcpp::CharacterVector::create("ee", "ff"));
+ // and assign it
+ x.attr("dimnames") = dimnms;
+ return(x);
+ '
R> fun <- cxxfunction(signature(), body=src, plugin="Rcpp")
R> fun()
ee ff
cc 42 42
dd 42 42
R>
So we are down to three or four statements, no monkeying with PROTECT / UNPROTECT and no memory management.
As Jim said, this is much easier to do in R. I'm passing the names into the C function via the nam argument.
#include <Rinternals.h>
SEXP myMat(SEXP nam) {
/*PrintValue(nam);*/
SEXP ans, dimnames;
PROTECT(ans = allocMatrix(REALSXP, length(nam), length(nam)));
PROTECT(dimnames = allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, nam);
SET_VECTOR_ELT(dimnames, 1, nam);
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(2);
return(ans);
}
If you put that code in a file called myMat.c, you can test it via the line below. I'm using Ubuntu, so you will have to change myMat.so to myMat.dll if you're on Windows.
R CMD SHLIB myMat.c
Rscript -e 'dyn.load("myMat.so"); .Call("myMat", c("A","C","G","T"))'
The note above is instructive. dimnames is a list with as many elements as the object has dimensions, where each element holds one name per entry along that dimension, e.g. list(c('a','c','g','t'), c('a','c','g','t')).
To set that in C, I would recommend:
PROTECT(dimnames = allocVector(VECSXP, 2));
PROTECT(rownames = allocVector(STRSXP, 4));
PROTECT(colnames = allocVector(STRSXP, 4));
setAttrib( ? , R_DimNamesSymbol, dimnames);
You'll have to then set the relevant rowname and colname elements. In general, this stuff is much easier to do in R.
jim
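The labels in the question's desired output follow a regular pattern: the row names are all strings of length x over A, C, G, T (4^x of them) and the column names are the four bases. Whichever language ends up building the dimnames list, the names can be generated rather than typed out. Here is a short Python sketch of that pattern; `kmer_names` is a helper name invented for this example, and the lexicographic order is an assumption read off the desired output in the question.

```python
from itertools import product

BASES = "ACGT"


def kmer_names(length):
    """Return all 4**length strings over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=length)]


row_names = kmer_names(2)          # one name per row of the 16 x 4 example
col_names = list(BASES)            # one name per column
dimnames = [row_names, col_names]  # same shape as R's list(rownames, colnames)
print(row_names)
print(col_names)
```

The resulting vectors are what would be passed in as the row and column elements of dimnames, for example through an argument like `nam` in the .Call example.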