Data Transformation in Oracle


What query should I use if I have a data set like this that I want to transform:
From | To  | Val1 | Val2 | Val3
123  | 130 | AB   | DE   | EF
131  | 140 | WS   | ED   | RF
141  | 145 | GT   | HY   | JU
and I want to produce the following data set in Oracle:
ID  | Val1 | Val2 | Val3
123 | AB   | DE   | EF
124 | AB   | DE   | EF
125 | AB   | DE   | EF
126 | AB   | DE   | EF
127 | AB   | DE   | EF
128 | AB   | DE   | EF
129 | AB   | DE   | EF
130 | AB   | DE   | EF
131 | WS   | ED   | RF
132 | WS   | ED   | RF
133 | WS   | ED   | RF
134 | WS   | ED   | RF
135 | WS   | ED   | RF
136 | WS   | ED   | RF
137 | WS   | ED   | RF
138 | WS   | ED   | RF
139 | WS   | ED   | RF
140 | WS   | ED   | RF
141 | GT   | HY   | JU
142 | GT   | HY   | JU
143 | GT   | HY   | JU
144 | GT   | HY   | JU
145 | GT   | HY   | JU

Assumptions: your table is called inputs (if not, use your actual table name); the first two columns are called f and t (they can't be called from and to as unquoted identifiers, because those are Oracle reserved words); the f column is unique (it has no duplicates); and you guarantee that f <= t in all rows:
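select f + level - 1 as id, val1, val2, val3
from inputs
-- generate t - f + 1 rows per input row, one for each id in the range
connect by level <= t - f + 1
-- only connect a row to itself, never to a different input row
and prior f = f
-- non-deterministic condition stops Oracle from raising ORA-01436 (CONNECT BY loop)
and prior sys_guid() is not null
order by id
;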
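A minimal Python sketch of what the CONNECT BY query computes, assuming the input rows are (f, t, val1, val2, val3) tuples as in the table above; the function name expand_ranges is hypothetical. Each input row is expanded into t - f + 1 output rows, with the inner loop counter playing the role of LEVEL.

```python
def expand_ranges(rows):
    """Expand each (f, t, val1, val2, val3) row into one row per id in f..t."""
    out = []
    for f, t, v1, v2, v3 in rows:
        # LEVEL runs from 1 to t - f + 1, so the id runs from f to t
        for level in range(1, t - f + 2):
            out.append((f + level - 1, v1, v2, v3))
    # Stand-in for ORDER BY id
    out.sort(key=lambda r: r[0])
    return out


inputs = [
    (123, 130, "AB", "DE", "EF"),
    (131, 140, "WS", "ED", "RF"),
    (141, 145, "GT", "HY", "JU"),
]
result = expand_ranges(inputs)
print(len(result))  # 23
```

The sort at the end stands in for an ORDER BY; without one, the order in which Oracle returns the generated rows is not guaranteed.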

Related

Google Sheets (Formula or GAScript) - Combine 2 sheets with unique columns into a single sheet

The Problem
I am trying to combine Sheet1 and Sheet2 into Sheet3, sorted by timestamp, but I can't get the columns of the two datasets to line up. Is this possible using formulas, or is Google Apps Script my only option?
My Attempt
=query({Sheet1!A2:F;Sheet2!A2:F},"WHERE Col1 is not null ORDER BY Col1")
I have also tried other methods using helper columns, but that did not work very well either.
Spreadsheet:
https://docs.google.com/spreadsheets/d/1w1RIygC4GodoIvzBGKbx5P_GwSqBMPJ6AkL8Dl5ZLOU/edit?usp=sharing
Sheet1
Timestamp | First Name | Email | Address | Phone Number | Comments
3/15/2022 8:12:00 | Jed | JedRigby# | 123 St | (778) 913-4767 | Comment A
3/15/2022 9:23:00 | Elle-May | Elle-MayMcdermott# | 124 St | (660) 632-5480 | Comment B
3/15/2022 10:11:00 | Junayd | JunaydDavis# | 125 St | (774) 516-6738 | Comment C
3/19/2022 19:55:04 | Caleb | CalebMaddox# | 128 St | (624) 540-7406 | Comment D
3/19/2022 22:17:04 | Misbah | MisbahHowarth# | 129 St | (890) 436-0537 | Comment E
Sheet2
Timestamp | First Name | Last Name | Email | Address
3/15/2022 13:37:00 | Jody | English | JodyEnglish# | 126 St
3/19/2022 17:32:04 | Samual | Savage | SamualSavage# | 127 St
3/22/2022 7:24:04 | Bill | Short | BillShort# | 130 St
3/22/2022 9:51:04 | Jevon | Conner | JevonConner# | 131 St
3/22/2022 12:33:04 | Clementine | Talley | ClementineTalley# | 132 St
COMBINED (Sheet1 & Sheet2) - Expected Results
Timestamp | First Name | Last Name | Email | Address | Phone Number | Comments
3/15/2022 8:12:00 | Jed | Rigby | JedRigby# | 123 St | (778) 913-4767 | Comment A
3/15/2022 9:23:00 | Elle-May | Mcdermott | Elle-MayMcdermott# | 124 St | (660) 632-5480 | Comment B
3/15/2022 10:11:00 | Junayd | Davis | JunaydDavis# | 125 St | (774) 516-6738 | Comment C
3/15/2022 13:37:00 | Jody | English | JodyEnglish# | 126 St | (492) 298-3670 |
3/19/2022 17:32:04 | Samual | Savage | SamualSavage# | 127 St | (871) 816-6015 |
3/19/2022 19:55:04 | Caleb | Maddox | CalebMaddox# | 128 St | (624) 540-7406 | Comment D
3/19/2022 22:17:04 | Misbah | Howarth | MisbahHowarth# | 129 St | (890) 436-0537 | Comment E
3/22/2022 7:24:04 | Bill | Short | BillShort# | 130 St | (660) 632-5480 |
3/22/2022 9:51:04 | Jevon | Conner | JevonConner# | 131 St | (549) 806-8647 |
3/22/2022 12:33:04 | Clementine | Talley | ClementineTalley# | 132 St | (660) 632-5480 |

try:
=ARRAYFORMULA(QUERY({QUERY({Sheet1!A2:F, REGEXEXTRACT(Sheet1!C2:C, Sheet1!B2:B&"(.*)#")},
"select Col1,Col2,Col7,Col3,Col4,Col5,Col6");
QUERY(Sheet2!A2:F, "select A,B,C,D,E,F,' ' label ' '''")},
"where Col1 is not null order by Col1", ))

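The same steps, sketched in Python to make the logic of the formula explicit (the function names and the tuple layout are assumptions for illustration, not part of the Sheets answer): derive Last Name for Sheet1 rows from the email, reorder Sheet1 into the combined column order, pad Sheet2 rows with empty trailing cells, drop rows without a timestamp, and sort by the parsed timestamp. Padding to seven columns covers both the five Sheet2 columns shown above and a sixth Phone Number column in F, which the formula's Sheet2!A2:F range and the expected results suggest.

```python
import re
from datetime import datetime


def from_sheet1(row):
    """Sheet1 row -> combined layout, deriving Last Name from the email."""
    ts, first, email, address, phone, comments = row
    # Same idea as REGEXEXTRACT(C, B & "(.*)#"): drop the first name from the
    # start of the email and keep everything before '#'.
    match = re.match(re.escape(first) + r"(.*)#", email)
    last = match.group(1) if match else ""
    return [ts, first, last, email, address, phone, comments]


def from_sheet2(row, width=7):
    """Sheet2 row -> combined layout: pad missing trailing columns with ''."""
    return list(row) + [""] * (width - len(row))


def combine(sheet1, sheet2):
    # Skip rows without a timestamp (the formula's "where Col1 is not null").
    rows = [from_sheet1(r) for r in sheet1 if r and r[0]]
    rows += [from_sheet2(r) for r in sheet2 if r and r[0]]
    # Sort on the parsed timestamp rather than its text ("order by Col1").
    rows.sort(key=lambda r: datetime.strptime(r[0], "%m/%d/%Y %H:%M:%S"))
    return rows


sheet1 = [
    ("3/15/2022 8:12:00", "Jed", "JedRigby#", "123 St", "(778) 913-4767", "Comment A"),
    ("3/19/2022 19:55:04", "Caleb", "CalebMaddox#", "128 St", "(624) 540-7406", "Comment D"),
]
sheet2 = [("3/15/2022 13:37:00", "Jody", "English", "JodyEnglish#", "126 St")]

for row in combine(sheet1, sheet2):
    print(row)
```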
when duplicate values found then

I want a query that selects all duplicate values in a column. If those values meet the condition, I'd like the query to return only those values.
Class Student_ID Location
Biology 511 4A
Biology 512 15B
Biology 513 15B
English 514 6A
Biology 521 6A
Spanish 522 6A
Spanish 523 15B
Chemistry 524 4A
English 531 15B
Biology 532 4A
Chemistry 534 4A
Select all duplicate values in the Class column, and if those rows include both location 4A and location 15B, assign 1.
CASE WHEN count(class) > 1 AND (Location = '4A' AND Location = '15B') THEN 1
ELSE 0 END
What matters most is how to select the duplicate values as a group and then check the condition (location must be 4A and 15B). So the query must first group the duplicated values in the Class column and then check whether, within each group, the rows meet the location condition. For example, grouping the Class column gives 5x Biology; this is treated as one group, and if within it there is at least one row with location 4A AND at least one row with location 15B, then, and only then, Biology gets the value 1. Almost all values in the Class column have duplicates.
Desired Output
Class Location
Biology 1
Chemistry 0
English 0
Spanish 0

As an alternative to Tim Schmelter's answer, you can also do this with a LEFT JOIN.
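SELECT yt1.Class, IIF(COUNT(yt2.Class) > 0, 1, 0) AS IsMatch  -- IIF needs SQL Server 2012 or later
FROM YourTable yt1
-- pair each 4A row with every 15B row of the same class;
-- classes without such a pair only get NULLs on the yt2 side, so COUNT is 0
LEFT JOIN YourTable yt2 ON yt1.Location = '4A' AND yt2.Location = '15B' AND
yt2.Class = yt1.Class
GROUP BY yt1.Class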
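The group-then-check logic the question describes can be sketched in Python (the function name flag_classes and the (Class, Student_ID, Location) tuple layout are assumptions for illustration): first collect the distinct locations per class, then test whether both 4A and 15B are among them.

```python
from collections import defaultdict


def flag_classes(rows, required=("4A", "15B")):
    """Map each class to 1 if its rows cover every required location, else 0."""
    locations = defaultdict(set)
    # Step 1: group the rows by class, collecting the distinct locations.
    for cls, _student_id, location in rows:
        locations[cls].add(location)
    # Step 2: check the condition inside each group.
    return {cls: int(set(required) <= locs) for cls, locs in sorted(locations.items())}


rows = [
    ("Biology", 511, "4A"), ("Biology", 512, "15B"), ("Biology", 513, "15B"),
    ("English", 514, "6A"), ("Biology", 521, "6A"), ("Spanish", 522, "6A"),
    ("Spanish", 523, "15B"), ("Chemistry", 524, "4A"), ("English", 531, "15B"),
    ("Biology", 532, "4A"), ("Chemistry", 534, "4A"),
]
print(flag_classes(rows))
```

Because a class can only contain both 4A and 15B if it has at least two rows, the "duplicates" requirement is satisfied automatically by the location check.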

Why am I encountering this error pertaining to non-matching object lengths when the individual parts of this script work fine on their own?

I have a script that converts polar weather radar data into Cartesian coordinates and then plots it. I have tested each component individually, and each one does what it's supposed to every time. Recently I needed to streamline everything, so I put it all in one script, but when I run my data through it, it produces an error message I don't fully understand. Any help would be greatly appreciated. Thanks in advance. I am executing the script with the command radProcess(test, 1, ref).
My data looks like this (this example has been scaled down from the 3600 x 800 data frame it comes from):
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 -96 -75 -69 -62 51 40 47 50 52 47
2 -94 -80 -67 -57 53 37 44 51 54 50
3 -100 -81 -72 -61 54 42 50 48 56 50
4 -101 -82 -72 -63 55 43 40 47 50 48
5 -999 -78 -73 -59 55 40 46 49 54 54
6 -102 -81 -71 -59 51 37 44 52 55 57
7 -101 -79 -74 -59 54 43 42 47 55 47
8 -95 -80 -73 -59 52 40 48 54 58 54
9 -96 -78 -75 -58 57 44 39 50 47 55
10 -99 -79 -73 -59 57 45 46 56 55 53
When I run my data through the script, I get this error message:
Error in radProcess(test, 10, ref) :
dims [product 864000] do not match the length of object [2880000]
In addition: Warning message:
In final.levl[cbind(z.rad, t.rad, r.rad)] * conversion.factor :
longer object length is not a multiple of shorter object length
The script is below:
radProcess <- function(file, level, product){
## Convert file to 3 dimensional array format organized ##
## by scan level (10-top, 1-bottom of array) ##
print("Converting to 3D Array")
x.arr.vert <- array(unlist(file), dim = c(10,360,800))
x.arr.horz <- aperm(array(x.arr.vert, dim = c(360,10,800)), c(2,1,3))
final.levl <- x.arr.horz[ c(10:1),,]
## Create matrix of coordinates and values and then ##
## converts from polar to cartesian coordinates ##
print("Creating Matrix of Polar Coordinates")
mat <- which( (final.levl > -1000), arr.ind = TRUE)
z.rad <- mat[, 1]
t.rad <- mat[, 2]
r.rad <- mat[, 3]
print("Converting to Cartesian Coordinates")
theta.polar <- t.rad * pi / 180
r.polar <- r.rad * 0.075
x.cart <- r.polar * cos(theta.polar)
y.cart <- r.polar * sin(theta.polar)
## Reflectivity adjustment constant = .514245 ##
## Velocity adjustment constant = .1275628 ##
print("Determining Conversion Factor")
conversion.factor <- ifelse( (product == "ref"), yes = .514245, no = .1275628)
print("Copying Values from Array to Matrix")
value <- (final.levl[cbind(z.rad, t.rad, r.rad)] * conversion.factor)
Cart.Coord.Matrix <- matrix( NA, nrow = 2880000, ncol = 4)
Cart.Coord.Matrix <- cbind(z.rad, y.cart, x.cart, value)
## Create new matrix level wanted from transposed value matrix ##
print("Reducing down to Level Wanted")
specified.level <- Cart.Coord.Matrix[z.rad == level,]
## Plot level values in Radar plot ##
print("Plotting Data Points")
x1<-specified.level[,3]
y2<-specified.level[,2]
z3<-specified.level[,4]
d1 <- data.frame(x1,y2,z3)
dg1 <-qplot(y2,x1,colour=z3,data=d1)
dg1 + scale_colour_gradientn(limits = c(0, 60), colours = rev(rainbow(10)))
}
[Image: example of the finished product]
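A likely cause, offered as a hypothesis rather than a confirmed diagnosis: ifelse() returns a result shaped like its test, so if the third argument in radProcess(test, 1, ref) is an existing data frame or matrix named ref rather than the string "ref", then product == "ref" is a whole logical matrix and conversion.factor becomes a matrix instead of a single number. Multiplying the value vector by that matrix would produce exactly this kind of "dims do not match the length of object" error plus the recycling warning. Calling radProcess(test, 1, "ref") and choosing the factor with if (product == "ref") ... else ... would avoid it. The Python sketch below mirrors the script's conversion with a scalar factor chosen once; the names conversion_factor and polar_to_cartesian, and the nested-list layout levels[level][azimuth][gate] (already in the order the script produces after reversing the levels), are assumptions for illustration.

```python
import math

REF_FACTOR = 0.514245   # reflectivity adjustment constant (from the script)
VEL_FACTOR = 0.1275628  # velocity adjustment constant (from the script)


def conversion_factor(product):
    # One scalar chosen once, like `if (product == "ref") ... else ...`,
    # rather than an element-wise ifelse over a whole array.
    return REF_FACTOR if product == "ref" else VEL_FACTOR


def polar_to_cartesian(levels, level, product):
    """levels[z][t][r] -> list of (y, x, value) for one scan level.

    t is the 1-based azimuth in degrees and r the 1-based range gate,
    matching which(arr.ind = TRUE); 0.075 is the script's gate spacing.
    """
    factor = conversion_factor(product)
    out = []
    for t_idx, gates in enumerate(levels[level - 1], start=1):
        theta = t_idx * math.pi / 180
        for r_idx, raw in enumerate(gates, start=1):
            # Same filter as which(final.levl > -1000); None stands in for NA.
            if raw is None or raw <= -1000:
                continue
            r = r_idx * 0.075
            out.append((r * math.sin(theta), r * math.cos(theta), raw * factor))
    return out


levels = [[[10, -1000], [20, 30]]]  # one level, two azimuths, two gates
for y, x, value in polar_to_cartesian(levels, 1, "ref"):
    print(round(y, 5), round(x, 5), round(value, 5))
```

Unlike the script, which converts every level and then keeps the rows with z.rad == level, this sketch selects the level first; the resulting (y, x, value) rows are the same.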

Sum of multiple variables by group

I have a dataset with over 900 observations; each observation represents the population of a sub-geographical area for a given year, by gender (male, female, all) and 20 different age groups.
I have dropped the variable for the sub-geographical area, and I want to collapse the data into the greater geographical area (called Geo).
I am having a difficult time doing a SUM or PROC MEANS because I have so many age groups to sum up, and I am trying to avoid writing them all out. I want to collapse by year, geo and sex so that I have only 3 observations per Geo (my raw data could have as many as 54 observations).
This is an example of what a tiny section of the raw data looks like:
Year Geo Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
This is how I want it to look:
Year Group Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 133 111 118
2010 1 2 109 122 108
2010 1 3 252 233 226
2010 2 1 103 101 102
2010 2 2 92 95 106
2010 2 3 195 196 208
Any ideas? Please help!

You don't have to write out each variable name individually - there are ways of getting around that. E.g. if all of the age group variables that need to be summed up start with age then you can use a : wildcard to match them:
proc summary nway data = have;
var age:; /*Every variable whose name starts with age*/
class year geo sex;
output out = want sum=;
run;
If your variables don't have a common prefix, but are all next to each other in one big horizontal group in your dataset, you can use a double dash list instead:
proc summary nway data = have;
var age0005--age1115; /*Includes all variables between these two*/
class year geo sex;
output out = want sum=;
run;
Note also the use of sum= - this means that each summarised variable is reproduced with its original name in the output dataset.
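For comparison, a minimal Python sketch of the same idea as var age:; (the function name collapse and the dict-per-row layout are assumptions): the columns to sum are chosen by name prefix rather than listed one by one, rows are grouped on year, geo and sex, and each sum keeps its original column name, as sum= does.

```python
def collapse(rows, keys=("Year", "Geo", "Sex"), prefix="Age"):
    """Sum every column whose name starts with `prefix`, grouped by `keys`."""
    totals = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        sums = totals.setdefault(group, {})
        for name, value in row.items():
            # Case-insensitive prefix match, like the SAS `age:` name list.
            if name.lower().startswith(prefix.lower()):
                sums[name] = sums.get(name, 0) + value
    # One output row per group, sorted by the class variables; each summed
    # column keeps its original name (like `sum=`).
    return [dict(zip(keys, group), **sums) for group, sums in sorted(totals.items())]


columns = ["Year", "Geo", "Sex", "Age0005", "Age0610", "Age1115"]
raw = [
    (2010, 1, 1, 92, 73, 75), (2010, 1, 2, 57, 81, 69), (2010, 1, 3, 159, 154, 144),
    (2010, 1, 1, 41, 38, 43), (2010, 1, 2, 52, 41, 39), (2010, 1, 3, 93, 79, 82),
    (2010, 2, 1, 71, 66, 68), (2010, 2, 2, 63, 64, 70), (2010, 2, 3, 134, 130, 138),
    (2010, 2, 1, 32, 35, 34), (2010, 2, 2, 29, 31, 36), (2010, 2, 3, 61, 66, 70),
]
want = collapse([dict(zip(columns, r)) for r in raw])
for row in want:
    print(row)
```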

I personally like to use proc sql for this, since it makes it very clear what you're summing and grouping by.
data old ;
input Year Geo Sex Age0005 Age0610 Age1115 ;
datalines;
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
;
run;
proc sql ;
create table new as select
year
, geo label = 'Group'
, sex
, sum(age0005) as age0005
, sum(age0610) as age0610
, sum(age1115) as age1115
from old
group by geo, year, sex ;
quit;

How do I sum for distinct items in a table?

I need to show my table data sorted by design_no.
Here is my data
design_no fname meter rate s m l xl
---------------------------------------------------------------
3092 2111-1 432.00 235.00 32 33 21 21
3092 2111-1 498.75 235.00 38 37 24 24
3092 2111-1 460.50 235.00 31 35 23 24
3092 2111 501.75 245.00 37 38 25 24
I want to show it like this:
design_no fname meter rate pcs
---------------------------------------------------
3092 2111 501.75 245.00 124
3092 2111-1 1391.25 235.00 343
Kindly help me.

SELECT design_no, fname, SUM(meter) AS meter, rate,
       SUM(s) + SUM(m) + SUM(l) + SUM(xl) AS pcs  -- pieces across all sizes
FROM tab
GROUP BY design_no, fname, rate
ORDER BY design_no, fname
What behaviour do you want if the rate is different for the same design_no and fname?
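As a rough Python sketch of what the GROUP BY query returns (the tuple layout and the name summarize are assumptions for illustration): rows are grouped on (design_no, fname, rate), meter is summed, and the four size columns are added into one pcs total.

```python
def summarize(rows):
    """Group by (design_no, fname, rate); sum meter and the four size columns into pcs."""
    groups = {}
    for design_no, fname, meter, rate, s, m, l, xl in rows:
        key = (design_no, fname, rate)
        total_meter, total_pcs = groups.get(key, (0.0, 0))
        groups[key] = (total_meter + meter, total_pcs + s + m + l + xl)
    # Sorted by design_no, then fname (then rate), like the ORDER BY
    return [(d, f, meter, rate, pcs) for (d, f, rate), (meter, pcs) in sorted(groups.items())]


tab = [
    (3092, "2111-1", 432.00, 235.00, 32, 33, 21, 21),
    (3092, "2111-1", 498.75, 235.00, 38, 37, 24, 24),
    (3092, "2111-1", 460.50, 235.00, 31, 35, 23, 24),
    (3092, "2111", 501.75, 245.00, 37, 38, 25, 24),
]
for row in summarize(tab):
    print(row)
```

Because rate is part of the grouping key, two different rates for the same design_no and fname would come out as two separate rows, which is the situation the comment above asks about.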
