How to reference sub-objects in Talend schema - Salesforce

So I have the following SOQL query that includes the ActivityHistories relationship of the Account object:
SELECT Id, Name, ParentId, (SELECT Description FROM ActivityHistories)
FROM Account
WHERE Name = '<some client>'
This query works just fine in SOQLXplorer and returns 5 nested rows under the ActivityHistories key. In Talend, I am following the instructions from this page to access the sub-objects (although the example uses the query "up" syntax, not the query "down" syntax). My schema mapping is as follows:
The query returns the parent Account rows but not the ActivityHistory rows that are in the subquery:
Starting job GetActivities at 15:43 22/06/2016.
[statistics] connecting to socket on port XXXX
[statistics] connected
0X16000X00fQd61AAC|REI||
[statistics] disconnected
Job GetActivities ended at 15:43 22/06/2016. [exit code=0]
Is it possible to reference the subrows using Talend? If so, what is the syntax for the schema to do so? If not, how can I unpack this data in some way to get to the Description fields for each Account? Any help is much appreciated.
Update: I have written a small Python script to extract the ActivityHistory records and dump them to a file, then used a tFileInput to ingest the CSV and continue through my process. But this seems very kludgy. Any better options out there?

I've done some debugging from the code perspective, and it seems that if you specify the correct column name, you will get the correct response. For your example, it should be: Account_ActivityHistories_records_Description
The output from tLogRow will be similar to:
00124000009gSHvAAM|Account1|tests;Lalalala
As you can see, the Description values from all child elements are stored as one string, delimited by the semicolon. You can change the delimiter in the Advanced settings view of the SalesforceInput component.

I have written a small Python script (source gist here) to extract the ActivityHistory records and dump them to a file (given as a command-line argument), then used a tFileInput to ingest the CSV and continue through my process.
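For anyone curious, a minimal sketch of what such a script can look like, assuming the simple_salesforce package and placeholder credentials (the actual gist may differ):
# Sketch of the extraction script described above (assumes the simple_salesforce
# package and placeholder credentials; adjust the SOQL and output path as needed).
import csv
import sys

from simple_salesforce import Salesforce

def dump_activity_histories(account_name, out_path):
    sf = Salesforce(username="user@example.com",   # placeholder credentials
                    password="secret",
                    security_token="token")
    soql = ("SELECT Id, Name, "
            "(SELECT Description FROM ActivityHistories) "
            "FROM Account WHERE Name = '{}'".format(account_name))
    result = sf.query_all(soql)

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["AccountId", "AccountName", "Description"])
        for account in result["records"]:
            # The subquery comes back under the relationship name; it is None
            # when an Account has no ActivityHistory children.
            children = account.get("ActivityHistories") or {"records": []}
            for activity in children["records"]:
                writer.writerow([account["Id"], account["Name"],
                                 activity["Description"]])

if __name__ == "__main__":
    # usage: python dump_activities.py "<some client>" activities.csv
    dump_activity_histories(sys.argv[1], sys.argv[2])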

Related

Locating Columns that Contain a String in their Name

Other than manually traversing every table schema in the entire database, how can I produce a list of all tables that contain a field with the string "email" in its name in Pervasive 13?
For example, in IBM DB2, I can do this with a query like this:
select tabschema,tabname,colname
from syscat.columns
where upper(colname) LIKE UPPER('%email%')
order by tabname
How can I achieve this in Pervasive 13?
You can query the system objects. Use:
SELECT f.Xf$Name, g.Xe$Name
FROM X$File f
INNER JOIN X$Field g ON g.Xe$File = f.Xf$Id
WHERE UPPER(g.Xe$Name) LIKE '%EMAIL%';
I'm still open to other suggestions, but the way I did this was by exporting the database schema to a .sql text file and using the regular expression create table.*email to search through that file and locate all tables containing a column with email in its name.
This worked, but I look forward to other people's suggestions.
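If you want to script that search rather than run it in an editor, here is a rough Python sketch of the same regex idea, assuming the export contains one CREATE TABLE ... ( ... ); statement per table and a hypothetical file name:
# Rough sketch of the regex search over an exported schema dump
# (assumes one CREATE TABLE ... ( ... ); statement per table).
import re

with open("schema_export.sql") as f:      # hypothetical export file name
    schema = f.read()

# Capture the table name and its column list for each CREATE TABLE statement.
pattern = re.compile(r'create table\s+"?(\w+)"?\s*\((.*?)\)\s*;',
                     re.IGNORECASE | re.DOTALL)

for table, columns in pattern.findall(schema):
    if "email" in columns.lower():
        print(table)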

COPY INTO with partitioned ADLS

I have a container with partitioned parquet files that I want to use with the COPY INTO command. My directories look like the following:
ABC_PARTITIONED_ID=1 (directory)
1-snappy.parquet
2-snappy.parquet
3-snappy.parquet
4-snappy.parquet
ABC_PARTITIONED_ID=2 (directory)
1-snappy.parquet
2-snappy.parquet
3-snappy.parquet
ABC_PARTITIONED_ID=3 (directory)
1-snappy.parquet
2-snappy.parquet
....
Each partitioned directory can contain multiple parquet files. I do not have a Hive partition column that matches the pattern of the directories (ID 1, ID 2, etc.).
How do I properly use the pattern parameter in the COPY INTO command to load a Snowflake table from my ADLS container? I am using this https://www.snowflake.com/blog/how-to-load-terabytes-into-snowflake-speeds-feeds-and-techniques/ as an example.
I do not think you need to do anything with the pattern parameter.
You said you do not have a Hive partition column that matches the pattern of the directories. If you do not have a column that uses these partitions, then they are probably not beneficial for querying the data; maybe they were generated to help with maintenance. If that is the case, ignore the partitions and read all files with the COPY command.
If you think having such a column would help, then the blog post you mentioned already shows how you can parse the file names to generate the column value. Add the partition column to your table (you may even define it as the clustering key), and run the COPY command to read all files in all partitions/directories, parsing the value of the column from the file name.
For parsing the partition value, I would use this expression, which seems easier:
copy into TARGET_TABLE from (
  select
    REGEXP_SUBSTR(
      METADATA$FILENAME,
      '.*ABC_PARTITIONED_ID=(.*)\/.*',
      1, 1, 'e', 1
    ) partitioned_column_value,
    $1:column_name,
    ...
  from @your_stage/data_folder/
);
If the directory/partition name doesn't matter to you, then you can use some of the newer functions in Public Preview that support the Parquet format to create the table and ingest the data. The answer to your question on how to construct the pattern would be PATTERN = '.*\.parquet', as all subfolders would then be read.
// create file format, only required one time
create file format my_parquet_format
  type = parquet;
// EXAMPLE CREATE AND COPY INTO FOR TABLE ABC
// create an empty table named ABC using this file format and the stage location
create or replace table ABC
  using template (
    select array_agg(object_construct(*))
    from table(
      infer_schema(
        location => '@mystage/ABC_PARTITIONED_ROOT',
        file_format => 'my_parquet_format'
      )
    ));
// copy the parquet files from all partition subfolders into table ABC
copy into ABC
  from @mystage/ABC_PARTITIONED_ROOT
  pattern = '.*\.parquet'
  file_format = my_parquet_format
  match_by_column_name = case_insensitive;
This should be possible by creating a storage integration, granting Snowflake access to the storage location in Azure, and then creating an external stage.
Alternatively you can generate a shared access signature (SAS) token to grant Snowflake (limited) access to objects in your storage account. You can then access an external (Azure) stage that references the container using the SAS token.
Snowflake provides the following metadata columns:
METADATA$FILENAME - Name of the staged data file the current row belongs to. Includes the path to the data file in the stage.
METADATA$FILE_ROW_NUMBER - Row number for each record
We could do something like this:
select $1:normal_column_1, ..., METADATA$FILENAME
FROM
  '@stage_name/path/to/data/' (pattern => '.*.parquet')
limit 5;
For example, it would give something like:
METADATA$FILENAME
----------
path/to/data/year=2021/part-00020-6379b638-3f7e-461e-a77b-cfbcad6fc858.c000.snappy.parquet
We need to handle deriving the column from it. We could do a regexp_replace and get the partition value as a column, like this:
select
  regexp_replace(METADATA$FILENAME, '.*\/year=(.*)\/.*', '\\1') as year,
  $1:normal_column_1
FROM
  '@stage_name/path/to/data/' (pattern => '.*.parquet')
limit 5;
In the above regexp, we match on the partition key (year).
The third parameter, \\1, is the regex group number; in our case the first group match holds the partition value.
A more detailed answer and other approaches to solving this issue are available in this Stack Overflow answer.

Postgres: is there any row_to_json equivalent that returns values only?

In a project I'm working on, I need to stream potentially large data sets from a Postgres database to the client, for analytics purposes.
The application is built in Rails (irrelevant for this question) and after a bit of research I'm currently able to stream query results by using COPY in Postgres:
COPY (SELECT row_to_json(t) from (#{query}) t) TO STDOUT;
Sources (for whoever is interested):
https://shift.infinite.red/fast-csv-report-generation-with-postgres-in-rails-d444d9b915ab
https://github.com/brianhempel/stream_json_demo
This works, but it yields every row as a key-value pair, e.g.:
["{\"id\":403457,\"email\":\"email403457#example.com\",\"first_name\":\"Firstname403457\",\"last_name\":\"Lastname403457\",\"source\":\"adwords\",\"created_at\":\"2015-08-05T22:43:07.295796\",\"updated_at\":\"2017-01-19T04:48:29.464051\"}"]
In the spirit of minimising the size (in bytes) of the response and especially since this is getting served through the web, I want to return just an array of values for every row, i.e.:
["[403457, \"email403457#example.com\", \"Firstname403457\", \"Lastname403457\", \"adwords\", \"2015-08-05T22:43:07.295796\", \"2017-01-19T04:48:29.464051\"]"]
Is there a way to achieve this within Postgres, even by nesting functions, starting from the query above?
You could create a simple SQL function that converts a row into the desired format:
CREATE FUNCTION row2json(anyelement) RETURNS json
LANGUAGE sql STABLE AS
'SELECT json_agg(z.value) FROM json_each(row_to_json($1)) z';
Then you use that to transform the output:
SELECT row2json(mytab) FROM mytab;
If performance is more important than JSON output, just cast the result to a string:
SELECT CAST(mytab AS text) FROM mytab;
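For completeness, and only as an illustration (the question is Rails-based, but the same COPY approach works from any client), here is a minimal Python sketch using psycopg2 that streams the row2json output defined above; the connection string and the mytab table are placeholders:
# Minimal sketch: stream the row2json() output of a COPY ... TO STDOUT query
# (assumes psycopg2, a reachable database, the row2json function defined above,
# and a placeholder table/connection; adapt to your setup).
import sys
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")   # placeholder connection string
with conn, conn.cursor() as cur:
    copy_sql = "COPY (SELECT row2json(mytab) FROM mytab) TO STDOUT"
    # copy_expert streams rows as the server produces them instead of
    # buffering the whole result set in memory.
    cur.copy_expert(copy_sql, sys.stdout)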

Microsoft Word Database quick part - how to use a merge field as a filter for the database query

I am using mail merge to input data from an Excel sheet. Everything works great and I can access my variables using «MyMergefield».
Now, for each letter generated, I need to look into another Excel file and run a query that takes «MyMergefield» as a query filter: SELECT * FROM x WHERE field1 = «MyMergefield»
The way I am proceeding is by inserting a Quick Part => Field in my Word document.
In the Quick Part dialog, I choose "Database", then I choose my Excel file.
Once the data source is chosen, there is an option to change the query parameters. I click on it and get the filter configuration popup, where I can choose the field (from the Excel sheet) and the operator ("equals" in this case). Then there is the "compare with" field. In my case it's not as simple as comparing to a string; it's comparing to a mail merge field.
I tried the following syntax:
«Myfield»
MERGEFIELD Myfield
MERGEFIELD "Myfield"
{MergeField Myfield}
{ MERGEFIELD Myfield}
None worked; it complained that it did not find any match, so it did not insert the database. (Of course it will not find any match for the syntax if I don't run the mail merge.)
I did look directly at the OpenXML of an existing example (because I can't edit an existing Quick Part - correct me if I'm wrong), and the database query looked like:
FROM `Candidates$` WHERE ((`column` = '</w:instrText>
...
<w:instrText xml:space="preserve"> MERGEFIELD Myfield</w:instrText>
</w:r>
Any ideas? Thank you!

Loading 532 columns from a CSV file into a DB2 table

Summary: Is there a limit to the number of columns that can be imported/loaded from a CSV file? If yes, what is the workaround? Thanks.
I am very new to DB2, and I am supposed to import a | (pipe) delimited CSV file containing 532 columns into a DB2 table that also has 532 columns, in the exact same positions as the CSV. I also have a smaller file with only 27 columns in both the CSV and the table. I am using the following command:
IMPORT FROM "C:\myfile.csv" OF DEL MODIFIED BY COLDEL| METHOD P (1, 2,....27) MESSAGES "C:\messages.txt" INSERT INTO PRE_SUBS_GPRS2_1010 (col1,col2,....col27);
This works fine.
But the second command, for the 532-column file:
IMPORT FROM "C:\myfile.csv" OF DEL MODIFIED BY COLDEL| METHOD P (1, 2,....532) MESSAGES "C:\messages.txt" INSERT INTO PRE_SUBS_GPRS_1010 (col1,col2,....col532);
It does not work. It gives me an error that says:
SQL3037N An SQL error "-206" occurred during Import processing.
Explanation:
An SQL error occurred during processing of the Action String (for
example, "REPLACE into ...") parameter.
The command cannot be processed.
User Response:
Look at the SQLCODE (message number) in the message for more
information. Make changes and resubmit the command.
I am using the Control Center to run the command, not the command prompt.
The problem was that one of the column names in the column list of the INSERT statement was more than 30 characters long. It was getting truncated and was not recognized.
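As a quick sanity check, a trivial snippet like this can flag the offending names (the column list here is a hypothetical stand-in for the real 532-column list):
# Flag column names that would exceed a 30-character limit
# (replace the hypothetical list with the real column list from the INSERT).
columns = ["col1", "col2", "a_very_long_column_name_exceeding_thirty_characters"]
for name in columns:
    if len(name) > 30:
        print(len(name), name)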
Hope this helps others in the future. Please let me know if you need further details.
The specific error code is SQL0206, and the documentation about this error is here:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.messages.sql.doc/doc/msql00206n.html
As for the limits, I think the maximum number of columns in an import should be the maximum permitted for a table. Take a look in the Information Center:
Database fundamentals > SQL > SQL and XML limits
Maximum number of columns in a table: 1012
Try to import just one row. If you have problems, it is probably due to incompatible types, column order, or rows that duplicate ones already present in the table.
