Compare date from source file and choose the most recent one - arrays

I'm struggling with the following task. I have a source file in sft and I must run an ETL job to compare the dates in this file (DateA) with the ones in my database (DateB).
Could you please help me set up this condition:
Condition 1) If Date A is null, leave Date B as it is
Condition 2) If Date A is more recent than Date B, replace Date B with Date A
I was thinking of something like this, but I can't finish it:
"expression": "DATEA eq '' ? DATEB : Condition 2",
"field": "data_CHANNEL_EMAIL_ACCEPTANCEDATE"
}
Is that correct? How do I write the second condition?
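The intended logic can be sketched in Python terms. This is only an illustration of the two conditions, not the ETL tool's expression syntax; the function and variable names are assumptions standing in for DATEA and DATEB:

```python
from datetime import date

def choose_date(date_a, date_b):
    """Condition 1: if DateA is missing, leave DateB as it is.
    Condition 2: if DateA is more recent than DateB, take DateA."""
    if date_a is None:                      # DateA null -> keep DateB
        return date_b
    if date_b is None or date_a > date_b:   # DateA more recent -> replace DateB
        return date_a
    return date_b

# Example: a newer source date replaces the stored one
print(choose_date(date(2020, 5, 1), date(2019, 1, 1)))  # 2020-05-01
```

In the expression language above, this would likely be a nested ternary: the null check first, then the comparison in the "else" branch.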

Snapshot of a table daily for 30 days with task and stored procedure

I want to capture a view of a table at the end of each day at midnight in the PST timezone. These tables are very small, only about 300 entries per day on average.
I want to track changes to the rows based on the ids of the table and take a snapshot of their state each day, where each new table view would have a 'date' status.
The challenge is that the original table is growing, so each new 'snapshot' will be a different size.
Here is an example of my data:
Day 1
Id, Article Title, Genre, Views, Date
1, "I Know Why the Caged Bird Sings", "Non-Fiction", 10, 01-26-2019
2, "In the Cold", "Non-Fiction", 20, 01-26-2019
Day 2
Id, Article Title, Genre, Views, Date
1, "I Know Why the Caged Bird Sings", "Non-Fiction", 20, 02-27-2019
2, "In the Cold", "Non-Fiction", 40, 02-27-2019
3, "Bury My Heart At Wounded Knee", "Non-Fiction", 100, 02-27-2019
I have a stored procedure that I would like to create to copy the state of the current table. However, it is not recommended to create a table inside a stored procedure, so I am trying to create a task that manages the table creation and the stored procedure call:
USE WAREHOUSE "TEST";
CREATE DATABASE "Snapshots";
USE DATABASE "Snapshots";
Create or Replace Table ArticleLibrary (id int, Title string, Genre string, Viewed number, date_captured timestamp );
INSERT INTO ArticleLibrary Values
(1, 'The man who walked the moon', 'Non-Fiction', 10, CURRENT_DATE() ),
(2, 'The girl who went to Vegas', 'Fiction', 20 , CURRENT_DATE())
;
SELECT * FROM ArticleLibrary;
//CREATE Stored Procedure
create procedure Capture_daily()
Returns string
LANGUAGE JAVASCRIPT
AS
$$
var rs = snowflake.execute({sqlText: `COPY INTO "ARTICLELIBRARY" FROM ArticleLibrary;`});
return 'success';
$$
;
CREATE OR REPLACE TASK Snapshot_ArticleLibrary_Task
WAREHOUSE = 'TEST'
SCHEDULE = 'USING CRON 0 0 * * * America/Los_Angeles'
AS
CREATE TABLE "ARTICLELIBRARY"+"CURRENT_DATE()";
CALL Capture_daily();
//Run tomorrow
INSERT INTO ArticleLibrary Values
(3, 'The Best Burger in Town', 'News', 100, CURRENT_DATE());
I need some help improving the stored procedure and the task I set up; I am not sure how to call more than one SQL statement at the end of the task.
I am open to advice on how to better achieve this, considering this is a small amount of data and just an experiment to demonstrate compute cost on a small scale. I am also considering using a window function with a window frame over one large table that inserts data from each new day, where the date is the status; the ids would then not be unique.
Since you're talking about daily snapshots and such a small amount of data, I would insert each day's snapshot into a single table, with CURRENT_DATE() in a new column called "snapshot_id", for example.
You can have a view on top of this table that shows the latest day, or even a UDF that takes the day as a parameter and returns the results for any day. This table will be extremely quick, since it'll be naturally clustered by the "snapshot_id" column, and you will have all of your history in one spot, which is nice and clean.
I've done this in the past where our source tables had millions of records and you can get quite far with this approach.
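As a rough sketch of this single-table pattern, here is the idea in Python with SQLite standing in purely for illustration; the table and column names are assumptions, not the asker's schema:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE article_library_snapshots (
    snapshot_id TEXT, id INT, title TEXT, genre TEXT, views INT)""")

def take_snapshot(rows, snap_date):
    # Insert the current state of the source table, tagged with the snapshot date
    conn.executemany(
        "INSERT INTO article_library_snapshots VALUES (?, ?, ?, ?, ?)",
        [(snap_date.isoformat(), *r) for r in rows])

take_snapshot([(1, "In the Cold", "Non-Fiction", 20)], date(2019, 1, 26))
take_snapshot([(1, "In the Cold", "Non-Fiction", 40)], date(2019, 2, 27))

# A "view" over the latest snapshot only
latest = conn.execute("""SELECT * FROM article_library_snapshots
    WHERE snapshot_id = (SELECT MAX(snapshot_id) FROM article_library_snapshots)
    """).fetchall()
print(latest)  # [('2019-02-27', 1, 'In the Cold', 'Non-Fiction', 40)]
```

The same shape in Snowflake would be a daily `INSERT INTO ... SELECT *, CURRENT_DATE() FROM source_table` run by the task, with a view filtering on the max snapshot_id.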
I would recommend leveraging the Zero-Copy Cloning functionality of Snowflake for this. You can create a clone every day with a simple command, it will take no time, and if the underlying data isn't completely changing every day, then you're not going to use any additional storage, either.
https://docs.snowflake.net/manuals/sql-reference/sql/create-clone.html
You would still need an SP to dynamically create the table name based on the date, and you can execute that SP from a TASK.
Regarding the question "I am not sure how to call more than one SQL statement at the end of the task":
One approach would be to embed the multiple SQL commands, in their desired order, in a stored procedure and call that stored procedure through the task.
create or replace procedure capture_daily()
returns string
language javascript
as
$$
var sql_command1 = snowflake.createStatement({ sqlText: `Create or Replace Table
"ARTICLELIBRARY".....` });
var sql_command2 = snowflake.createStatement({ sqlText: `COPY INTO
"ARTICLELIBRARY" ...` });
var sql_command3 = snowflake.createStatement({ sqlText: `Any Other DML
Command OR CALL sp_name` });
 
try 
{ 
sql_command1.execute(); 
sql_command2.execute(); 
sql_command3.execute(); 
return "Succeeded."; 
} 
catch (err) 
{ 
return "Failed: " + err;
} 
$$
;
 
CREATE OR REPLACE TASK Snapshot_ArticleLibrary_Task
WAREHOUSE = 'TEST'
SCHEDULE = 'USING CRON 0 0 * * * America/Los_Angeles'
AS
CALL Capture_daily();

Fitbit Data Export - Creating a data warehouse

I plan to create a Fitbit data warehouse for educational purposes, and there doesn't seem to be any material online for Fitbit data specifically.
A few issues faced:
You can only export 1 month of data (max) at a time from the Fitbit website. My plan would be to drop a month's worth of data at a time into a folder and have these files read separately.
You can export the data as either CSV or .XLS. The issue with XLS is that each day in the month creates a separate sheet for food logs, which then needs to be merged in a staging table. The issue with CSV is that there is one sheet per file, with all of the data in there: CSV Layout
I would then use SSIS to load the data into a SQL Server database for reporting purposes.
Which would the more suited approach be, to export the data using .XLS format or CSV?
Edit: How would it be possible to load a CSV file with such a format into SSIS?
The CSV layout would be as such:
Body,,,,,,,,,
Date,Weight,BMI,Fat,,,,,,
01/06/2018,71.5,23.29,15,,,,,,
02/06/2018,71.5,23.29,15,,,,,,
03/06/2018,71.5,23.29,15,,,,,,
04/06/2018,71.5,23.29,15,,,,,,
05/06/2018,71.5,23.29,15,,,,,,
06/06/2018,71.5,23.29,15,,,,,,
07/06/2018,71.5,23.29,15,,,,,,
08/06/2018,71.5,23.29,15,,,,,,
09/06/2018,71.5,23.29,15,,,,,,
10/06/2018,71.5,23.29,15,,,,,,
11/06/2018,71.5,23.29,15,,,,,,
12/06/2018,71.5,23.29,15,,,,,,
13/06/2018,71.5,23.29,15,,,,,,
14/06/2018,71.5,23.29,15,,,,,,
15/06/2018,71.5,23.29,15,,,,,,
16/06/2018,71.5,23.29,15,,,,,,
17/06/2018,71.5,23.29,15,,,,,,
18/06/2018,71.5,23.29,15,,,,,,
19/06/2018,71.5,23.29,15,,,,,,
20/06/2018,71.5,23.29,15,,,,,,
21/06/2018,71.5,23.29,15,,,,,,
22/06/2018,71.5,23.29,15,,,,,,
23/06/2018,71.5,23.29,15,,,,,,
24/06/2018,71.5,23.29,15,,,,,,
25/06/2018,71.5,23.29,15,,,,,,
26/06/2018,71.5,23.29,15,,,,,,
27/06/2018,71.5,23.29,15,,,,,,
28/06/2018,71.5,23.29,15,,,,,,
29/06/2018,72.8,23.72,15,,,,,,
30/06/2018,72.95,23.77,15,,,,,,
,,,,,,,,,
Foods,,,,,,,,,
Date,Calories In,,,,,,,,
01/06/2018,0,,,,,,,,
02/06/2018,0,,,,,,,,
03/06/2018,0,,,,,,,,
04/06/2018,0,,,,,,,,
05/06/2018,0,,,,,,,,
06/06/2018,0,,,,,,,,
07/06/2018,0,,,,,,,,
08/06/2018,0,,,,,,,,
09/06/2018,0,,,,,,,,
10/06/2018,0,,,,,,,,
11/06/2018,0,,,,,,,,
12/06/2018,0,,,,,,,,
13/06/2018,100,,,,,,,,
14/06/2018,0,,,,,,,,
15/06/2018,0,,,,,,,,
16/06/2018,0,,,,,,,,
17/06/2018,0,,,,,,,,
18/06/2018,0,,,,,,,,
19/06/2018,0,,,,,,,,
20/06/2018,0,,,,,,,,
21/06/2018,0,,,,,,,,
22/06/2018,0,,,,,,,,
23/06/2018,0,,,,,,,,
24/06/2018,0,,,,,,,,
25/06/2018,0,,,,,,,,
26/06/2018,0,,,,,,,,
27/06/2018,"1,644",,,,,,,,
28/06/2018,"2,390",,,,,,,,
29/06/2018,981,,,,,,,,
30/06/2018,0,,,,,,,,
For example, "Foods" would be the table name, and "Date" and "Calories In" would be column names. "01/06/2018" is the Date, "0" is the "Calories In", and so on.
Tricky. I just pulled my Fitbit data, as this piqued my curiosity. That CSV is messy: you basically have mixed file formats in one file, which won't be straightforward in SSIS. As for the XLS format, with the food logs adding a worksheet for each day as you mentioned, SSIS won't like that changing.
A couple of options off the top of my head that I see for CSV:
Individual exports from Fitbit
I see you can pick which data you want to include in your export: Body, Foods, Activities, Sleep.
Do each export individually, saving each file with a prefix of what type of data it is.
Then build SSIS with multiple foreach loops and data flow task for each individual file format.
That would do it, but it would be a tedious effort when having to export the data from Fitbit.
Handle the one file with all the data
With this option you would have to get creative, since the formats are mixed and you have sections with different column definitions, etc.
One option would be to create a staging table with as many columns as whichever section has the most, which looks to be "Activities". Give each column a generic name such as Column1, Column2, and make them all VARCHAR.
Since we have mixed "formats" and not all data types would line up, we just need to get all the data out first and sort out conversion later.
From there you can build one data flow with a flat file source, and also add a line number, since we will need to work out later where each section of data sits.
When building the file connection for your source, you will have to add all the columns manually: since the first row of data in your file doesn't include all the commas for each field, SSIS won't be able to detect all the columns. Manually add the number of columns needed, and also make sure:
Text Qualifier = "
Header row Delimiter = {LF}
Row Delimiter = {LF}
Column Delimiter = ,
That should at least get your data loaded into a staging table in the database. From there you would need a bunch of T-SQL to zero in on each "section" of data, and then parse, convert and load from there.
For a small test, I just had a table called TestTable:
CREATE TABLE [dbo].[TestTable](
[LineNumber] [INT] NULL,
[Column1] [VARCHAR](MAX) NULL,
[Column2] [VARCHAR](MAX) NULL,
[Column3] [VARCHAR](MAX) NULL,
[Column4] [VARCHAR](MAX) NULL,
[Column5] [VARCHAR](MAX) NULL,
[Column6] [VARCHAR](MAX) NULL,
[Column7] [VARCHAR](MAX) NULL,
[Column8] [VARCHAR](MAX) NULL,
[Column9] [VARCHAR](MAX) NULL
)
I set up a data flow and hooked up the file source. After executing the data flow, the data was loaded into the table.
From there I worked out some T-SQL to get at each "section" of data. Here's an example that shows how you could filter to the "Foods" section:
DECLARE @MaxLine INT = (
SELECT MAX([LineNumber])
FROM [TestTable]
);
--Something like this, using a sub query that gets you starting and ending line numbers for each section.
--Doing the conversion of what column that section of data ended up in.
SELECT CONVERT(DATE, [a].[Column1]) AS [Date]
, CONVERT(BIGINT, [a].[Column2]) AS [CaloriesIn]
FROM [TestTable] [a]
INNER JOIN (
--Something like this to build out starting and ending line number for each section
SELECT [Column1]
, [LineNumber] + 2 AS [StartLineNumber] --We add 2 here as the line that start the data in a section is 2 after its "heading"
, LEAD([LineNumber], 1, @MaxLine) OVER ( ORDER BY [LineNumber] )
- 1 AS [EndLineNumber]
FROM [TestTable]
WHERE [Column1] IN ( 'Body', 'Foods', 'Activities' ) --Each of the sections of data
) AS [Section]
ON [a].[LineNumber]
BETWEEN [Section].[StartLineNumber] AND [Section].[EndLineNumber]
WHERE [Section].[Column1] = 'Foods'; --Then just filter on whichever section you want.
Which in turn gave me the expected "Foods" rows.
There could be other options for parsing that data, but this should give a good starting point and an idea of how tricky this particular CSV file is.
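Outside of SSIS, the same "find each section, then parse its rows" idea can be sketched in a few lines of Python. This is a hedged sketch assuming the layout shown above: a section-name line, a header line, then data rows, with a blank line between sections:

```python
import csv
import io

# Abbreviated stand-in for the exported file shown in the question
sample = """Body,,,,
Date,Weight,BMI,Fat,
01/06/2018,71.5,23.29,15,
,,,,
Foods,,,,
Date,Calories In,,,
01/06/2018,0,,,
02/06/2018,"1,644",,,
"""

def split_sections(text):
    sections, current = {}, None
    for row in csv.reader(io.StringIO(text)):
        cells = [c for c in row if c != ""]
        if not cells:                       # blank line ends the current section
            current = None
        elif current is None:               # section-name line, e.g. "Foods"
            current = cells[0]
            sections[current] = {"header": None, "rows": []}
        elif sections[current]["header"] is None:
            sections[current]["header"] = cells   # column names for this section
        else:
            sections[current]["rows"].append(cells)
    return sections

foods = split_sections(sample)["Foods"]
print(foods["header"])   # ['Date', 'Calories In']
print(foods["rows"][1])  # ['02/06/2018', '1,644']
```

Note that the csv module also handles the quoted thousands values like "1,644" correctly, which a naive split on commas would not.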
As for the XLS option, that would be straightforward for all sections except the food logs. You would basically set up an Excel file connection, where each sheet is a "table" in the source, and have an individual data flow for each worksheet.
But then what about the food logs? Once those changed and you rolled into the next month, SSIS would freak out, error, and probably complain about metadata.
One obvious workaround would be to manually manipulate the Excel file and merge all of those sheets into one "Food Log" sheet prior to running it through SSIS. Not ideal, because you'd probably want something completely automated.
I'd have to tinker around with that. Maybe a script task and some C# code to combine all those sheets into one, parsing the date out of each sheet name and appending it to the data prior to a data flow loading it. Maybe possible.
It looks like there are challenges with both of the files Fitbit exports, no matter which format you look at.

Talend parse Date "yyyy-MM-dd'T'HH:mm:ss'.000Z'"

I have an error parsing a date in Talend.
My input is an Excel file read as String, and my output is a Date with the following Salesforce format: "yyyy-MM-dd'T'HH:mm:ss'.000Z'"
I have a tMap with this connection:
TalendDate.parseDate("yyyy-MM-dd'T'HH:mm:ss'.000Z'",Row1.firstDate)
but is throwing the following error:
java.lang.RuntimeException: java.text.ParseException: Unparseable
date: "2008-05-11T12:02:46.000+0000" at
routines.TalendDate.parseDate(TalendDate.java:895)
Any help?
Thanks
In TalendDate.parseDate, the "pattern" parameter must match the pattern of the input String, not the pattern of the Date you want in the output.
You can try:
TalendDate.parseDate("yyyy-MM-dd'T'HH:mm:ss'.000+0000'",Row1.firstDate )
Formatting of the Date output is accessible in the 'schema' menu, in the "Date Model" column.
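The same principle, illustrated with Python's strptime as an analogy (not Talend code): the pattern describes the incoming string, and a timezone directive covers the "+0000" offset instead of hard-coding it as a literal.

```python
from datetime import datetime

# The pattern matches the INPUT string "2008-05-11T12:02:46.000+0000":
# %f consumes the fractional seconds, %z the "+0000" offset
parsed = datetime.strptime("2008-05-11T12:02:46.000+0000",
                           "%Y-%m-%dT%H:%M:%S.%f%z")
print(parsed.year, parsed.hour)  # 2008 12
```

Once parsed into a real date value, formatting it back out in the Salesforce pattern is a separate, second step, which is exactly the split Talend makes between parseDate and the output Date Model.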
Try this:
TalendDate.parseDate("MM/dd/yyyy'T'HH:mm:ss", Row1.firstDate);

SSIS Derived Column Replace Dates

Overview of the situation: I have two databases (one DB2, one MSSQL), and I am using an SSIS package to feed from one to the other via jobs. In our SQL table, datetime fields were set up years ago as SMALLDATETIME (we cannot change to DATETIME at this point in time). We are now getting dates coming through as year 2099 (1/1/2099), which fails because SMALLDATETIME can only go up to a max date of 06/06/2079 11:59:59.
My/our solution is to use the Derived Column transform to check the date and, if it is over year 2078, make it NULL. It was also advised to check for NULL before checking the date.
I tried doing this,
[Derived Column Name] [Derived Column ] [Expression]
[ MyDate ] [Replace "MyDate"] [MyDate == "" ? NULL(DT_WSTR,5) : MyDate]
[ VerifiedDates ] [Add As New Column] [VerifiedDates == YEAR((DT_DBDATE)MyDate) > = 2078 ? NULL(DT_WSTR,10) : MyDate]
But this did not work, for two reasons. Not only was the expression wrong, it also would not allow me to replace the "MyDate" column like I did in the first run. Can I not replace a column more than once? Do these tasks happen at the same time?
Due to that issue, I tried to just replace the dates via the expression
[ MyDate ][Replace "MyDate"][YEAR((DT_DBDATE)MyDate) >= 2078 ? NULL(DT_WSTR, 10) : MyDate]
as well as
[ MyDate ][Replace "MyDate"][MyDate == YEAR((DT_DBDATE)MyDate) >= 2078 ? NULL(DT_WSTR, 10) : MyDate]
But none of these seem to be the correct syntax... Can anyone point out where I am off?
I'm also having trouble finding a good resource for the syntax; presently I'm using this reference.
Have you tried the DATEPART function instead?
[ MyDate ][Replace "MyDate"][ DATEPART("yyyy", (DT_DBTIMESTAMP)MyDate) >= 2078 ? NULL(DT_WSTR, 10) : MyDate ]
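The intended behavior of that expression, sketched in Python for clarity (the SSIS expression itself is the line above; this only illustrates the null-check-first, then year-check logic, with an assumed input format):

```python
from datetime import datetime

SMALLDATETIME_CUTOFF_YEAR = 2078  # dates at or past this year become NULL

def verify_date(my_date):
    """Mirror the Derived Column: NULL/empty check first, then the year check."""
    if my_date is None or my_date == "":
        return None
    parsed = datetime.strptime(my_date, "%Y-%m-%d")  # assumed incoming format
    return None if parsed.year >= SMALLDATETIME_CUTOFF_YEAR else my_date

print(verify_date("2099-01-01"))  # None
print(verify_date("2020-06-15"))  # 2020-06-15
```

Doing the null check before the cast also answers why the order matters: casting an empty string to a date type fails before the year comparison ever runs.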

Assign date in Salesforce

I have written a batch job. In this batch job I have a condition to take date >= 1/1/2012.
I am not able to give the date in the query. Can you please help me?
global Database.QueryLocator start(Database.BatchableContext BC) {
system.debug('Inside start');
//date mydate = date.valueof(1/1/2012);
return Database.getQueryLocator('select name from opportunity');
}
I have tried it in two ways.
1st: I took the date into a date field and gave the condition as date >= :mydate
(this shows an error in the debug log: Invalid date: 0)
2nd: I gave the date right there as date >= 1/1/2012
(this shows an error in the debug log: unexpected token: /)
Can you please help me?
Thanks,
Anu
You must follow the proper date format, YYYY-MM-DD:
select name from opportunity where mydate >= 2012-01-01
More information here
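For illustration, building that SOQL string from a Python date shows the required literal shape (the field name mydate is taken from the snippet above; SOQL date literals are written unquoted in YYYY-MM-DD form):

```python
from datetime import date

cutoff = date(2012, 1, 1)
# isoformat() yields exactly the unquoted YYYY-MM-DD literal SOQL expects
query = f"select name from opportunity where mydate >= {cutoff.isoformat()}"
print(query)  # select name from opportunity where mydate >= 2012-01-01
```

In Apex itself, the equivalent would be binding a Date variable with the :mydate syntax from the question, once the variable actually holds a valid Date.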
