Concatenate zipcode, housenumber, housenumber_ext Pyspark - concatenation

I have several tables that contain the columns zipcode, housenumber and housenumber_ext.
I have split the work into two parts in a Jupyter notebook. The first part selects the records
where housenumber_ext is empty and concatenates zipcode and housenumber;
the second part handles the records where housenumber_ext is not empty.
Concatenate zipcode and housenumber when housenumber_ext is empty:
from pyspark.sql import functions as F

df_dump_no_ext = (
    df_dump.selectExpr("*")
    # keep only the rows without a house number extension
    .filter(F.col("housenumber_ext") == "")
    .select(
        F.concat_ws("_", "zipcode", "housenumber").alias("PCHx"),
        "zipcode", "housenumber", "housenumber_ext",
    )
)
Concatenate zipcode, housenumber and housenumber_ext when housenumber_ext is not empty:
df_dump_ext = (
    df_dump.selectExpr("*")
    # keep only the rows that do have a house number extension
    .filter(F.col("housenumber_ext") != "")
    .select(
        F.concat_ws("_", "zipcode", "housenumber", "housenumber_ext").alias("PCHx"),
        "zipcode", "housenumber", "housenumber_ext",
    )
)
The zipcode, housenumber and housenumber_ext are concatenated:
+-------+-----------+---------------+
|zipcode|housenbr |housenbr_ext |
+-------+-----------+---------------+
|1017KG |468 | |
|1019AG |111 |D |
+-------+-----------+---------------+
+---------------+-------+-----------+---------------+
|PCHx |zipcode|housenbr |housenbr_ext |
+---------------+-------+-----------+---------------+
|1017KG_468 |1017KG |468 | |
|1019AG_111_D |1019AG |111 |D |
+---------------+-------+-----------+---------------+
The code above is repeated for the next table.
I am not yet familiar with function definitions (def), but I presume that repeated code like this can be put into a function.
Any advice or help would be appreciated.

Related

Store a list of values as a string when creating a table in snowflake

I am trying to create a table with 5 columns. Column 2 (PROGRESS) is a comma-separated list (e.g. 1,2,3,4), but when I try to create this column as a string, variant or varchar, Snowflake refuses to allow it. Any advice on how I can store a comma-separated list from a CSV in a single column? I tried importing the data as a TSV, XML and JSON file, but without success.
create or replace TABLE AD_HOC.TEMP.NEW_DATA (
    VISITOR_ID VARCHAR(16777216),
    PROGRESS VARCHAR(16777216),
    DATE DATETIME,
    ROLE VARCHAR(16777216),
    FIRST_VISIT DATETIME
) COMMENT='Interaction data';
Goal:
VISITOR_ID | PROGRESS | DATE | ROLE | FIRST_VISIT
111 | [1,2,3] | 1/1/2022 | OWNER | 1/1/2021
123 | [1] | 1/2/2022 | ADMIN | 2/2/2021
23321 | [1,2,3,4] | 2/22/2022 | USER | 3/12/2021
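I encoded the column in Python and then loaded the data into Snowflake:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
# One-hot encode PROGRESS: one 0/1 column per distinct value
# (assumes each PROGRESS value is already a list of labels).
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(mlb.fit_transform(doc_data.pop("PROGRESS")),
                       columns=mlb.classes_,
                       index=doc_data.index)
df = doc_data.join(encoded)
df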

How to transform data when we have comma separated values in csv format file in snowflake

I have a CSV file exported from Excel with the following data:
Columns: id, product_name, sales, quantity, Profit
Data: 1, "Novimex Executive Leather Armchair, Black","$3,709.40", 9, -$288.77
When I load these records from the stage into a Snowflake table, the data shifts from the product_name column onward, because the value itself contains a comma (", Black"). The same happens in the following columns. After loading, the data looks like this:
+----+-------------------------------------+--------+----------+---------+
| id | product_name | sales | quantity | Profit |
+----+-------------------------------------+--------+----------+---------+
| 1 | "Novimex Executive Leather Armchair | Black" | $3 | 709.40" |
+----+-------------------------------------+--------+----------+---------+
Query used:
copy into orders_staging (id, Product_Name, Sales, Quantity, Profit)
from
    (select $1, $2, $3, $4, $5
     from @sales_data_stage)
file_format = (type = csv field_delimiter = ',' skip_header = 1 ENCODING = 'iso-8859-1');
Use a field enclosure in the file format:
FIELD_OPTIONALLY_ENCLOSED_BY='"'
If accounting-style numbers (such as $3,709.40) cause problems, make sure they are enclosed in double quotes too.
https://community.snowflake.com/s/question/0D50Z00008pDcoRSAS/copying-csv-files-delimited-by-commas-where-commas-are-also-enclosed-in-strings
Additional documentation for COPY INTO <table>:
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#type-csv
Additional documentation for CREATE FILE FORMAT:
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html
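To see why the enclosure matters, the following sketch mimics the two behaviours outside Snowflake, using Python's standard csv module. It is only an illustration, not Snowflake's parser, and the sample row is the one from the question with the spaces after the delimiters removed (an assumption made to keep the example simple).

```python
import csv
import io

raw = (
    "id,product_name,sales,quantity,Profit\n"
    '1,"Novimex Executive Leather Armchair, Black","$3,709.40",9,-$288.77\n'
)

# Naive split on every comma: the quoted fields are broken apart,
# which matches the shifted columns described in the question.
naive = raw.splitlines()[1].split(",")

# Quote-aware parse: commas inside double quotes stay within the field.
header, record = list(csv.reader(io.StringIO(raw), quotechar='"'))

print(len(naive), naive)
print(len(record), record)
```

The naive split yields more fields than there are columns, while the quote-aware parse yields exactly one value per column.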

Usage of " " inside a concat statement in excel

I'm working on data cleansing of a database and I'm currently changing upper-case names into proper case. To do that, I'm using Excel to build an UPDATE statement for each row, like this:
|   | A    | B  | C                | D             |
|---|------|----|------------------|---------------|
| 1 | Name | id | Proper case name | SQL Statement |
| 2 | AAAA | 1  | Aaaa             | =CONCAT("UPDATE table SET Name = "'",C2,"'" WHERE id = ",B2,";") |
| 3 | BBBB | 2  | Bbbb             | =CONCAT("UPDATE table SET Name = "'",C3,"'" WHERE id = ",B3,";") |
The resulting SQL statements should look something like this:
UPDATE table SET Name = 'Aaaa' WHERE id = 1
UPDATE table SET Name = 'Bbbb' WHERE id = 2
I'm finding it difficult to get the apostrophes around the name.
I think you need:
=CONCATENATE("UPDATE table SET Name = '",C2,"' WHERE id = ",B2,";")
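For comparison, the same statements can be generated outside Excel. The sketch below is a hypothetical Python equivalent of the formula: the function name update_statement and the sample rows are made up for illustration. It also doubles any apostrophe inside a name, which the Excel formula does not do, so names like O'Neil would otherwise break the statement.

```python
def update_statement(row_id: int, proper_name: str) -> str:
    """Build one UPDATE statement, like the CONCATENATE formula does per row."""
    # Double single quotes so the SQL string literal stays valid.
    escaped = proper_name.replace("'", "''")
    return f"UPDATE table SET Name = '{escaped}' WHERE id = {row_id};"


# Sample rows taken from the spreadsheet in the question: (id, proper case name).
rows = [(1, "Aaaa"), (2, "Bbbb")]
statements = [update_statement(row_id, name) for row_id, name in rows]

for statement in statements:
    print(statement)
```

This is meant for a one-off, reviewed script like the spreadsheet approach; when the updates are executed directly from code, bound parameters are the safer choice.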

Yii2 - Combining data from 2 tables with a 3rd table to compare it with

I'm working on getting data from a database, in this case from 3 tables. 'product' is used to check whether the product is deleted or enabled. If the product is not deleted and is enabled, 'product_translation' is used to get the language, translation and product_id (the latter is used to group the data with ArrayHelper::index). This works well (see the code below). I now need to get data from a third table, 'category_translation', which has a translation field that we want to add to the output. A row from this table should only be used when its category_id, attribute and language fields all match.
I'm new to Yii2, so I'm still figuring out how to use all of this.
----------------------------------------------
| product |
----------------------------------------------
| id | is_deleted | is_enabled | category_id |
----------------------------------------------
---------------------------------------------------
| product_translation |
---------------------------------------------------
| product_id | attribute | language | translation |
---------------------------------------------------
----------------------------------------------------
| category_translation |
----------------------------------------------------
| category_id | attribute | language | translation |
----------------------------------------------------
Used query until now:
$query = new Query;
$slugs = $query->select('pt.language, pt.translation, pt.product_id, p.category_id, ct.translation as category')
    ->from(['pt' => 'product_translation'])
    ->leftJoin(['p' => 'product'], 'p.id = pt.product_id')
    ->leftJoin(['ct' => 'category_translation'], 'ct.category_id = p.category_id')
    ->where([
        'p.is_deleted' => 0,
        'p.is_enabled' => 1,
        'pt.attribute' => 'slug',
        'ct.attribute' => 'slug',
        // when adding this it will not return any data at all
        // when not adding this it will return double/wrong data
        'ct.language' => 'pt.language',
    ])
    ->all();
// Group values by product_id.
$results = ArrayHelper::index($slugs, null, 'product_id');
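A possible explanation, offered as an assumption because the thread contains no answer: in the hash format of where(), the right-hand side is bound as a value, so 'ct.language' => 'pt.language' compares the column with the literal string pt.language instead of with the other column. The sketch below reproduces the three cases with plain SQL, using Python's sqlite3 module rather than Yii2; the sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (id INTEGER, is_deleted INTEGER, is_enabled INTEGER, category_id INTEGER);
CREATE TABLE product_translation (product_id INTEGER, attribute TEXT, language TEXT, translation TEXT);
CREATE TABLE category_translation (category_id INTEGER, attribute TEXT, language TEXT, translation TEXT);
INSERT INTO product VALUES (1, 0, 1, 10);
INSERT INTO product_translation VALUES (1, 'slug', 'en', 'chair'), (1, 'slug', 'nl', 'stoel');
INSERT INTO category_translation VALUES (10, 'slug', 'en', 'furniture'), (10, 'slug', 'nl', 'meubels');
""")

base = """
SELECT pt.language, pt.translation, ct.translation AS category
FROM product_translation pt
LEFT JOIN product p ON p.id = pt.product_id
LEFT JOIN category_translation ct ON ct.category_id = p.category_id {extra_on}
WHERE p.is_deleted = 0 AND p.is_enabled = 1
  AND pt.attribute = 'slug' AND ct.attribute = 'slug' {extra_where}
ORDER BY pt.language, ct.language
"""

# No language condition: every product slug pairs with every category language.
no_language_match = con.execute(base.format(extra_on="", extra_where="")).fetchall()

# Column compared with the literal string 'pt.language': nothing matches.
literal_compare = con.execute(
    base.format(extra_on="", extra_where="AND ct.language = 'pt.language'")
).fetchall()

# Column compared with the other column in the join condition: one row per language.
column_compare = con.execute(
    base.format(extra_on="AND ct.language = pt.language", extra_where="")
).fetchall()

print(len(no_language_match), len(literal_compare), column_compare)
```

In Yii2 terms this would correspond to writing the comparison as a string condition, for example extending the join condition to 'ct.category_id = p.category_id AND ct.language = pt.language'; that variant is untested here.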

Looping on checkbox array values inside Laravel Controller is not working

I have multiple checkboxes on a form, generated from a model in the view like this:
{{Form::open(array('action'=>'LaboratoryController@store'))}}
@foreach (Accounts::where('accountclass',$i)->get() as $accounttypes)
    {{ Form::checkbox('accounttype[]', $accounttypes->id)}}
@endforeach
{{Form::submit('Save')}}
{{Form::close()}}
When I return Input::all() from my controller's store method, the output looks like this:
{"client":"1","accounttype":["2","3","5","12","13","14","16","31","32","33"]}
Now I want to store the accounttype array values in the accounts table by looping through the array, so that each value is stored in its own row with the same client id.
The same accounttype will also be inserted into a second table, but with different data.
So, my accounts table:
+-------------+---------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------------+------+-----+---------+----------------+
| accountno | int(11) unsigned zerofill | NO | PRI | NULL | auto_increment |
| accounttype | int(11) | NO | | NULL | |
| client | int(11) | NO | | NULL | |
| created_at | datetime | NO | | NULL | |
+-------------+---------------------------+------+-----+---------+----------------+
My controller store method:
public function store()
{
    $accounttypes = Input::get('accounttype');
    if(is_array($accounttypes))
    {
        for($i=0;$i < count($accounttypes);$i++)
        {
            // insert data on first table (accounts table)
            $accountno = DB::table('accounts')->insertGetId(array('client'=>Input::get('client'),'accounttype',$accounttypes[$i]));
            // insert data on the second table (account summary table) using the account no above
            // DB::table('accountsummary')...blah blah
        }
    }
    return Redirect::to('some/path');
}
The function seems to work, but only for the first array value, which is "2". I don't know what's wrong with the code, but the loop does not seem to go through the rest of the values. I tested other loop constructs such as while and foreach, but the loop variable ($i) still stays at zero.
I wondered whether a Laravel controller doesn't allow loops in POST methods.
Your input is greatly appreciated. Thanks.
foreach combined with DB::insert() works for me:
foreach ($accounttypes as $accounttype) {
    DB::insert('INSERT INTO tb_accounts (accounttype, client) VALUES (?, ?)', array($accounttype, Input::get('client')));
}
I just need a separate query to get the last insert id, because DB::insertGetId doesn't work the way I want it to. But that's another issue. Anyway, thanks.
