T-SQL - How can I get data in a row to show in a column - sql-server

I have a record set that contains course attendance data in rows that I want to display in columns, based on the last letter of the Course_Code, and I haven't been able to find a method for this.
The Course_Code field contains the city followed by a sequence letter denoting the order in which the modules are to be taken: A must be first, followed by B, then C, etc.
The data looks like this:
Course_Code  Student_ID
MadridA      123
ParisB       123
NewYorkC     123
HamburgD     123
HamburgA     456
ParisB       456
HamburgC     456
HamburgD     456
HamburgA     789
ParisB       789
HamburgC     789
MadridD      789
What I need the result to look like is:
Student_ID  CourseA   CourseB  CourseC   CourseD
123         MadridA   ParisB   NewYorkC  HamburgD
456         HamburgA  ParisB   HamburgC  HamburgD
789         HamburgA  ParisB   HamburgC  MadridD
I've been looking into PIVOT as a likely solution but can't find any example that doesn't involve SUM or AVG on data values. I don't need to change the data, just move it to the appropriate column.
Is PIVOT going to do what I need or am I in the wrong creek with a broken paddle on that?

You can use the PIVOT function to get the result, but you will need to use either the max or min aggregate function since your data is a string.
You should be able to use the following:
select student_id,
       CourseA, CourseB,
       CourseC, CourseD
from
(
  select course_code, student_id,
         -- append the course letter (A, B, etc.) to 'Course' to get the new column names
         col = 'Course' + right(course_code, 1)
  from yourtable
) d
pivot
(
  max(course_code)
  for col in (CourseA, CourseB, CourseC, CourseD)
) piv;
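If it helps to see the reshaping logic outside SQL, here is a minimal Python sketch of the same idea, with plain dictionaries standing in for the table (names and sample rows taken from the question):

```python
# Rows of (Course_Code, Student_ID), as in the question.
rows = [
    ("MadridA", 123), ("ParisB", 123), ("NewYorkC", 123), ("HamburgD", 123),
    ("HamburgA", 456), ("ParisB", 456), ("HamburgC", 456), ("HamburgD", 456),
]

# Derive the target column name from the last letter of the course code,
# then "pivot" each student's rows into one wide record.
wide = {}
for code, student in rows:
    wide.setdefault(student, {})["Course" + code[-1]] = code

# wide[123] -> {'CourseA': 'MadridA', 'CourseB': 'ParisB',
#               'CourseC': 'NewYorkC', 'CourseD': 'HamburgD'}
```

Because each (student, column) pair holds exactly one course code, the MAX in the SQL version is a formality: it aggregates a group of one.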

Related

Comparing values between records in a table using Informatica PowerCenter

Consider a table with the following records in a Database:
>>> Table A:
Col_1  Col_2  Col_3
GGG    123    -
GGG    123    X
GGG    123    Y
KKK    786    X
MMM    999    Y
DDD    456    X
DDD    456    U
Wherever we have records with matching values in col_1 and col_2, and we have values X and Y in col_3, the records with X and Y must be deleted. In other cases, we should keep the records.
For example in the above table, the output should look like this:
>>> Output_Table:
Col_1  Col_2  Col_3
GGG    123    -
KKK    786    X
MMM    999    Y
DDD    456    X
DDD    456    U
How can this scenario be implemented (using expression transformation, variable ports, lookup, and so on)? Any help would be greatly appreciated.
There can be multiple scenarios, and I am not sure your issue is exactly as you described, but I will answer as per your question. Assuming 'X' and 'Y' in Col_3 are the hardcoded values you want to remove:
First, sort the data on Col_1, Col_2.
Then use an EXP transformation and create 7 ports as below. Here we compare each row with its previous row to see if they belong to the same group; if so, we concatenate col3 into one single column.
col1
col2
in_col3
v_col3= iif(v_prev_col1=col1 and v_prev_col2=col2, v_col3||col3, col3)
v_prev_col1=col1
v_prev_col2=col2
o_col3=v_col3
After that, use an Aggregator; the group-by ports will be col1, col2, and col3 will be MAX(o_col3) from the expression before. The Aggregator will stamp the concatenated col3 into one single column.
Then add a filter like below to check whether the concatenated value is XY or YX for duplicate rows:
iif(max_col3='XY' or max_col3='YX', FALSE, TRUE) -- you can place any hardcoded values here
EDIT :
5. Now, if you want to get the original data (as in the comments) excluding the XY combination, use a Joiner: join the output from step 4 with the output of step 1. It will be a normal join on Col_1, Col_2, and the output of the Joiner will have no XY combination.
The whole mapping should look like this:
            |--> 2.EXP --> 3.AGG --> 4.FIL --|
--> 1.SRT --|                                |--> 5.JNR --...--> TGT
            |------------------------------->|
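For reference, the deletion rule from the question can be sketched in plain Python (illustrative only; it is not part of the Informatica mapping, but it shows the result the mapping should produce):

```python
from collections import defaultdict

# The sample rows from the question.
rows = [
    ("GGG", "123", "-"), ("GGG", "123", "X"), ("GGG", "123", "Y"),
    ("KKK", "786", "X"), ("MMM", "999", "Y"),
    ("DDD", "456", "X"), ("DDD", "456", "U"),
]

# Collect every Col_3 value per (Col_1, Col_2) group.
groups = defaultdict(set)
for c1, c2, c3 in rows:
    groups[(c1, c2)].add(c3)

# Delete the X and Y records only where the group contains both X and Y;
# every other record is kept.
output = [
    (c1, c2, c3)
    for c1, c2, c3 in rows
    if not (c3 in ("X", "Y") and {"X", "Y"} <= groups[(c1, c2)])
]
```

Running this keeps GGG/123/-, KKK/786/X, MMM/999/Y, DDD/456/X, and DDD/456/U, matching the expected output table.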

Hive table Array Columns - explode using array_index

Hi, I have a Hive table:
select a,b,c,d from riskfactor_table
In the above table, columns B, C and D are array columns. Below is my Hive DDL:
Create external table riskfactor_table
(a string,
b array<string>,
c array<double>,
d array<double> )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '~'
stored as textfile location 'user/riskfactor/data';
Here is my table data:
ID400S,["jms","jndi","jaxb","jaxn"],[100,200,300,400],[1,2,3,4]
ID200N,["one","two","three"],[212,352,418],[6,10,8]
If I want to split the array columns, how can I do it?
If I use the explode function, I can split the array values for only one column:
select explode(b) as b from riskfactor_table;
Output:
jms
jndi
jaxb
jaxn
one
two
three
But I want all the columns to be populated using one select statement, as below:
Query - select a,b,c,d from risk_factor;
Output:
row1- ID400S jms 100 1
row2- ID400S jndi 200 2
row3- ID400S jaxb 300 3
row4- ID400S jaxn 400 4
How can I populate all the data?
You can achieve this using LATERAL VIEW:
SELECT Mycoulmnb, Mycoulmnc, Mycoulmnd
FROM riskfactor_table
LATERAL VIEW explode(b) myTableb AS Mycoulmnb
LATERAL VIEW explode(c) myTablec AS Mycoulmnc
LATERAL VIEW explode(d) myTabled AS Mycoulmnd;
Note that stacking lateral views over independent arrays like this produces their cross product, not position-aligned rows. For more detail, go through the Hive language manual on lateral views.
Use the 'numeric_range' UDF from Brickhouse. Here is a blog posting describing the details.
https://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/
In your case, your query would be something like
SELECT a,
array_index( b, i ),
array_index( c, i ),
array_index( d, i )
FROM risk_factor_table
LATERAL VIEW numeric_range( 0, 3 );
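The position-aligned explode that numeric_range enables can be sketched in Python: each index i pairs up the i-th element of every parallel array (the sample row below is from the question):

```python
# One source row: a scalar key plus three parallel arrays.
row = ("ID400S", ["jms", "jndi", "jaxb", "jaxn"], [100, 200, 300, 400], [1, 2, 3, 4])
a, b, c, d = row

# Index-aligned explode: one output row per array position, which is
# what array_index(col, i) over a generated range of i produces.
exploded = [(a, b[i], c[i], d[i]) for i in range(len(b))]
# exploded[0] -> ('ID400S', 'jms', 100, 1)
```

This is why the range bound matters: the generated indices must cover the array length, as the next answer discovers.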
I was also looking for a solution to the same question. Thanks Jerome for this Brickhouse solution.
I had to make a slight change (the addition of the alias "n1 as n") as below to make it work for my case:
hive> describe test;
OK
id string
animals array<string>
cnt array<bigint>
hive> select * from test;
OK
abc ["cat","dog","elephant","dolphin","snake","parrot","ant","frog","kuala","cricket"] [10597,2027,1891,1868,1804,1511,1496,1432,1305,1299]
hive> select `id`, array_index(`animals`,n), array_index(`cnt`,n) from test lateral view numeric_range(0,10) n1 as n;
OK
abc cat 10597
abc dog 2027
abc elephant 1891
abc dolphin 1868
abc snake 1804
abc parrot 1511
abc ant 1496
abc frog 1432
abc kuala 1305
abc cricket 1299
The only thing is I have to know beforehand that there are 10 elements to be exploded.

Find valid combinations based on matrix

I have in Calc the following matrix: the first row (1) contains employee numbers, the first column (A) contains product codes.
Wherever there is an X, that product item was sold by the corresponding employee above:
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304.
What I would like to see is a list of which product was sold by which employees, but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
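The unpivot-then-concatenate idea in the query above can be sketched in Python (illustrative names; the real data lives in Calc / SQL Server):

```python
# Employee numbers from the header row, and one list of marks per product.
employees = ["0302", "0303", "0304", "0402"]
matrix = {
    1625: ["X", None, "X", "X"],
    1643: [None, "X", "X", None],
}

# For each product, keep the employees whose cell is marked 'X'
# and join them into one comma-separated string.
sold_by = {
    product: ", ".join(e for e, mark in zip(employees, marks) if mark == "X")
    for product, marks in matrix.items()
}
# sold_by[1625] -> '0302, 0304, 0402'
```

The CROSS APPLY (VALUES ...) step corresponds to the zip over the header, and the FOR XML PATH('') trick corresponds to the join.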
I think there are two problems to solve:
get the product codes for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you may handle both issues separately.
1.
To replace the X marks by the respective product codes, you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2.
Now, you'll have to concatenate the product codes. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single product IDs, copy the complete second sheet, paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (employee IDs) and the last one (with the concatenated product ids).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the concatenate function to create the columnlist + datatype for the create table statement
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the column names were identical, 99% of the mapping was done correctly.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata <-- here I simply referenced the whole table
),cte
and in the next part I used the same 'worksheet' trick to list and format all the column names and pasted them in.
),cte
AS (SELECT prod_code, <-- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table, and my colleagues and I are querying our hearts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns:
prodcode | employee
which contains all the unique combinations of prodcode + employee number, which is a lot faster and much more practical to query.

Sum of values of json array in PostgreSQL

In PostgreSQL 9.3, I have a table like this
id | array_json
---+----------------------------
1 | ["{123: 456}", "{789: 987}", "{111: 222}"]
2 | ["{4322: 54662}", "{123: 5121}", "{1: 5345}" ... ]
3 | ["{3232: 413}", "{5235: 22}", "{2: 5453}" ... ]
4 | ["{22: 44}", "{12: 4324}", "{234: 4235}" ... ]
...
I want to get the sum of all values in array_json column. So, for example, for first row, I want:
id | total
---+-------
1 | 1665
Where 1665 = 456 + 987 + 222 (the values of all the elements of json array). No previous information about the keys of the json elements (just random numbers)
I'm reading the documentation page about JSON functions in PostgreSQL 9.3, and I think I should use json_each, but can't find the right query. Could you please help me with it?
Many thanks in advance
You started looking in the right place (going to the docs is always the right place).
Since your values are JSON arrays, I would suggest using json_array_elements(json).
And since it's a JSON array which you have to explode into several rows and then combine back by running sum over json_each_text(json), it would be best to create your own function (Postgres allows it).
As for your specific case, assuming the structure you provided is correct, some string parsing plus JSON-heavy wizardry can be used. Let's say your table name is "json_test_table" and the columns are "id" and "json_array"; here is the query that does your "thing":
select id, sum(val) from
(select id,
substring(
json_each_text(
replace(
replace(
replace(
replace(
replace(json_array,':','":"')
,'{',''),
'}','')
,']','}')
,'[','{')::json)::varchar
from '\"(.*)\"')::int as val
from json_test_table) j group by id ;
If you plan to run it on a huge dataset, keep in mind that string manipulations are expensive in terms of performance.
You can get it using this:
/*
Sorry, sqlfiddle is busy :p
CREATE TABLE my_table
(
id bigserial NOT NULL,
array_json json[]
--,CONSTRAINT my_table_pkey PRIMARY KEY (id)
)
INSERT INTO my_table(array_json)
values (array['{"123": 456}'::json, '{"789": 987}'::json, '{"111": 222}'::json]);
*/
select id, sum(json_value::integer)
from
(
select id, json_data->>json_object_keys(json_data) as json_value from
(
select id, unnest(array_json) as json_data from my_table
) A
) B
group by id
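The intended arithmetic can be checked outside the database with a small Python sketch (assuming the array elements are single-key JSON objects, as in the second answer's INSERT):

```python
import json

# One row's array_json value: a JSON array of single-key objects.
array_json = '[{"123": 456}, {"789": 987}, {"111": 222}]'

# Sum every value of every object in the array, ignoring the keys,
# mirroring what json_object_keys + ->> achieve per element in SQL.
total = sum(int(v) for obj in json.loads(array_json) for v in obj.values())
# total -> 1665  (456 + 987 + 222)
```

This matches the expected result for id 1 in the question.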

CakePHP iterate through all rows in the DB and update?

I am not sure of the CakePHP way to do this. My model looks like below (simplified):
Model
id    column1  column2  column3  sum
1232  3        5        2
5474  5        10       4
Now, because of the nature of the program, I need to iterate through the database, multiply each column value by a multiplier, then sum those values, then put that value into each record's sum. So, for example, if I had a variable $multiplier = 2, then I would want to have this happen for the first row:
(3*$multiplier) + (5*$multiplier) + (2*$multiplier) = 20
Model
id    column1  column2  column3  sum
1232  3        5        2        20
5474  5        10       4        38
Of course, this is very simplified, but it's representative of what I want to do.
Is there a CakePHP way to do this? I don't have an auto-incrementing id column in the DB, but rather just an id column (which is unique).
Thank you!
Let the database do it for you, including the multiplier:
$this->Model->updateAll(array('sum' => '(column1 + column2 + column3) * ' . $multiplier));
http://book.cakephp.org/view/1031/Saving-Your-Data (see section updateAll).
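The arithmetic the update should produce can be sketched in Python (the $multiplier of 2 and the row values are from the question):

```python
multiplier = 2

# id -> (column1, column2, column3)
rows = {1232: (3, 5, 2), 5474: (5, 10, 4)}

# sum = (column1 + column2 + column3) * multiplier, per row
sums = {row_id: sum(cols) * multiplier for row_id, cols in rows.items()}
# sums -> {1232: 20, 5474: 38}
```

A single SQL UPDATE with this expression touches every row at once, which is why letting the database do it beats iterating over records in PHP.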
