How to aggregate column values into array after groupBy?

I want to group by name and collect the colors into an array. I tried the following, but it did not work:
val color = flatten(collect_list($"color")).alias("color")
val df00 = df_a.groupBy($"name")
  .agg(color)
I have a DataFrame with the following values:
---------------
|name  |color |
---------------
|gaurav|red   |
|harsh |black |
|nitin |yellow|
|gaurav|white |
|harsh |blue  |
---------------
I want to group by name and store the color values in an array using Scala, to get a result like this:
----------------------
|name  | color       |
----------------------
|gaurav| [red,white] |
|harsh | [black,blue]|
|nitin | [yellow]    |
----------------------

Use collect_list, which gathers the values of each group into an array:
import org.apache.spark.sql.functions._
df.groupBy($"name").agg(collect_list($"color").as("color_list")).show()
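As a fuller sketch of the same approach, the snippet below builds the sample data in a local SparkSession and applies collect_list. The object name CollectColors, the helper colorsByName, and the local[*] master are illustrative assumptions, not part of the original answer. Note that flatten should not be needed here: it expects an array of arrays, whereas collect_list on a string column already yields an array of strings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_list

object CollectColors {
  // Groups by "name" and gathers each group's "color" values into an array column.
  // (Hypothetical helper name, used only for this illustration.)
  def colorsByName(df: DataFrame): DataFrame =
    df.groupBy("name").agg(collect_list("color").as("color"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a real job would get its session elsewhere.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-colors")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question.
    val df = Seq(
      ("gaurav", "red"),
      ("harsh", "black"),
      ("nitin", "yellow"),
      ("gaurav", "white"),
      ("harsh", "blue")
    ).toDF("name", "color")

    colorsByName(df).show(truncate = false)
    spark.stop()
  }
}
```

The order of the elements inside each array is not guaranteed after the shuffle; if a stable order matters, wrapping the aggregate in sort_array is one option.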
Hope it helps!!

Related

Replacing placeholder with another table's data (without knowing in advance the substitutions)

I need to replace the placeholders in a message template with values returned by the query stored alongside that template.
Table Template_Messages
| ID         | String                                                         | Query                               |
| PICKUP_MSG | Your {vehicle_name} will be ready for pick-up on {pickup_date} | SELECT * FROM vehicles WHERE ID = ? |
The query in the 'Query' column reads from the following table:
Table Vehicles
| ID   | vehicle_name | plate   | pickup_date | ... |
| P981 | BMW X5       | AA014CC | 2022-09-20  | ... |
| Z323 | Ford Focus   | HH000JJ | 2022-10-21  | ... |
Then with the following query:
SELECT * FROM vehicles WHERE ID = 'Z323'
By making the appropriate substitutions I should obtain this output:
Your Ford Focus will be ready for pick-up on 2022-10-21
How can I achieve this?
And since the 'query' column of the first table does not only refer to the 'vehicles' table, can it work dynamically on any placeholder/query?
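The substitution step itself can be sketched independently of the database. The snippet below is a minimal illustration in Scala, assuming the query result for one row has already been read into a Map from column name to value; the object name TemplateFill and the function fill are hypothetical. Because it looks up whatever name appears between the braces, it does not need to know the placeholders or the source table in advance.

```scala
import scala.util.matching.Regex

object TemplateFill {
  // Matches {placeholder} tokens made of letters, digits and underscores.
  private val Placeholder: Regex = """\{(\w+)\}""".r

  // Replaces each {column} token with the matching value from the row.
  // Tokens with no matching column are left untouched.
  def fill(template: String, row: Map[String, String]): String =
    Placeholder.replaceAllIn(
      template,
      m => Regex.quoteReplacement(row.getOrElse(m.group(1), m.matched))
    )
}
```

With the row for ID 'Z323' represented as Map("vehicle_name" -> "Ford Focus", "pickup_date" -> "2022-10-21"), calling TemplateFill.fill on the PICKUP_MSG string yields "Your Ford Focus will be ready for pick-up on 2022-10-21".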

Retrieve an array from a dataframe using Scala/Spark

I have a DataFrame, dfFood, with some data that I would like to extract and analyze.
Dataframe looks like this:
|amount| description| food| id| name|period|rule|typeFood|version|
| ---- | ------------ | ------------- | - | --- | ---- | -- | ------ | ----- |
| 100|des rule name2|[Chicken, Fish]| 2|name2| 2022| 2| [1, 2]| 2|
| 55|des rule name3| [Vegetables]| 3|name3| 2022| 3| [3]| 3|
| 13|des rule name4| [Ramen]| 4|name4| 2022| 4| [4]| 4|
I want to read all the rows and analyze some fields in those rows. I'm using foreach to extract the DataFrame info.
dfFood.foreach(row => {
  println(row)
  val food = row(2)
  println(food)
})
That code prints this:
[100,des rule name2,WrappedArray(Chicken, Fish),2,name2,2022,2,WrappedArray(1, 2),2]
WrappedArray(Chicken, Fish)
[55,des rule name3,WrappedArray(Vegetables),3,name3,2022,3,WrappedArray(3),3]
WrappedArray(Vegetables)
[13,des rule name4,WrappedArray(Ramen),4,name4,2022,4,WrappedArray(4),4]
WrappedArray(Ramen)
I'm getting the field food using row(2), but I would like to get it by name, using something like row("food").
I would also like to transform that wrapped array into a list, and then analyze all the data inside the array.
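One way to read a field by name is Row.getAs with the column name, which also gives the array a static type that can be converted to a List. The sketch below uses a reduced stand-in for dfFood with only three of the columns; the object name ReadFoodArray and the helper foodOf are illustrative assumptions. It collects the rows to the driver first, which is only reasonable for small data.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object ReadFoodArray {
  // Reads the "food" array column by name and converts it to a List.
  // scala.collection.Seq is used so the cast holds on both Scala 2.12 and 2.13 builds of Spark.
  def foodOf(row: Row): List[String] =
    row.getAs[scala.collection.Seq[String]]("food").toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("read-food-array")
      .getOrCreate()
    import spark.implicits._

    // Reduced stand-in for dfFood (assumption: only three of the original columns).
    val dfFood = Seq(
      (100, Seq("Chicken", "Fish"), 2),
      (55, Seq("Vegetables"), 3),
      (13, Seq("Ramen"), 4)
    ).toDF("amount", "food", "id")

    // collect() brings the rows to the driver, so println output appears locally.
    dfFood.collect().foreach { row =>
      val id = row.getAs[Int]("id")
      foodOf(row).foreach(item => println(s"id=$id food=$item"))
    }
    spark.stop()
  }
}
```

Access by name relies on the row carrying a schema, which is the case for rows that come out of a DataFrame.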

How to get data to other sheet if cell have number value only

I have a Google Sheet with some data, like below:
Sheet one
Column A | | Column B
=================================
Hello | | *1*
World! | | p
Foo | | *3*
Bar | | L
Bar1 | | *0*
I want Sheet two to contain only the rows whose Column B value is a number:
Sheet two
Column A | | Column B
=================================
Hello | | *1*
Foo | | *3*
Bar1 | | *0*
Hope you understand what I want.
try:
=FILTER(Sheet1!A1:B, ISNUMBER(Sheet1!B1:B*1))
or maybe:
=FILTER(Sheet1!A1:B, ISNUMBER(REGEXEXTRACT(Sheet1!B1:B, "\*(\d+)\*")*1))

SSRS How to Concatenate Multiple Rows with Dynamic Columns

Right now I have dynamic columns based off of a query, and then I have data attached to those columns that I want to populate in rows. They associate just fine, but the problem is that the rows hold whatever place they were in, instead of ascending to the top again, like so:
|column|column2|column3|
| row1 | | |
| | row2 | |
| | | row3 |
And the goal is:
|column|column2|column3|
| row1 | row2 | row3 |
| | | |
| | | |
I know that I could do this in the query, combining the headers with a dynamic query, but is there any SSRS magic that can achieve this without that?
EDIT 1
I am using a matrix; sorry for not specifying. I heard that a matrix is the only way to get dynamic columns, so I thought it was implied.
EDIT 2
The rows come in like
Wanted_Column | Wanted Row
Column | data
Column2 | data
Column | data
and I want it so that the table will look like
|column|column2|
| data | data |
| data | |
| | |
for any number of columns/rows
I'm assuming you are using a matrix to do this. It's not clear from your question... Anyway, you'll need to add row grouping. If you don't have data that can be grouped by an actual column value then set the group expression to 1 and that should do it.
If this is not correct, then please show your data as it comes from the dataset and the expected output.

How to tell SSIS unpivot to make rows even when values is null

I have a flat file that I import, and it needs to be unpivoted. All works well, except that I would like the unpivot to produce a row even if the value is null.
I don't want to resort to a hack such as adding -1 and replacing the -1 afterwards.
The software that uses the database always expects 3 rows for each line imported from the flat file, even if a value is null.
A sketch to explain the problem:
flat file line
-----------------------------------------------------------------
|id of person | code1 | value1 | code2 | value2 | code3 | value3|
-----------------------------------------------------------------
|123 | hh1 | hh2 | 2 | hh3 | | |
-----------------------------------------------------------------
What I get is
------------------------------
|id of person | code | value |
------------------------------
|123 | hh2 | 2 |
------------------------------
what I want
------------------------------
|id of person | code | value |
------------------------------
|123 |hh1 | null |
------------------------------
|123 |hh2 | 2 |
------------------------------
|123 |hh3 | null |
------------------------------
I think this is the input row you meant, given the result you want:
-----------------------------------------------------------------
|id of person | code1 | value1 | code2 | value2 | code3 | value3|
-----------------------------------------------------------------
|123 | hh1 | NULL | hh2 | 2 | hh3 | NULL |
-----------------------------------------------------------------
I am not positive that you can do this with the Unpivot Data Flow Transformation; however, you can easily accomplish it with a script transformation (a Script Component in the data flow) that has an asynchronous output. To make the output asynchronous, change the Output properties of the script under Inputs and Outputs: set SynchronousInputID to None.
After that, the following code should work for you.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // First output row: code1 / value1
    Output0Buffer.AddRow();
    Output0Buffer.ID = Row.ID;
    Output0Buffer.Code = Row.Code1;
    Output0Buffer.Value = Row.Value1;

    // Second output row: code2 / value2
    Output0Buffer.AddRow();
    Output0Buffer.ID = Row.ID;
    Output0Buffer.Code = Row.Code2;
    Output0Buffer.Value = Row.Value2;

    // Third output row: code3 / value3
    Output0Buffer.AddRow();
    Output0Buffer.ID = Row.ID;
    Output0Buffer.Code = Row.Code3;
    Output0Buffer.Value = Row.Value3;
}
This will create 3 new rows for every 1 row. I assumed that your identifier is ID.
