Salesforce - Fire Apex trigger only after complete data load

So here is the issue:
We are loading data into CustomObject__c using Data Loader.
Usually the number of records passed is 3.
Also, if there is any issue with the data passed, they run Data Loader again and pass the corrected data; the older data then has to be deleted.
So I am handling the delete in the before insert code and calling a batch in the after insert code.
Here is the code for my trigger:
trigger TriggerCustom on CustomObject__c (before insert, after insert) {
    // Records created earlier today; these should be replaced by the current load
    List<CustomObject__c> customobjectlist = [SELECT Id FROM CustomObject__c WHERE CreatedDate = TODAY];
    if (Trigger.isBefore) {
        delete customobjectlist;
    }
    if (Trigger.isAfter) {
        BatchApex b = new BatchApex();
        Database.executeBatch(b);
    }
}
This was designed keeping in mind that they pass only 3 records at a time.
However, now they want to pass more than 200 records using Data Loader.
How can I modify my trigger so that it fires only after one single data load is completed (for example, if they pass 1,000 records at once, the trigger has to fire only after all 1,000 records are completely inserted)?

The trigger will not know when you are done, whether after 3, 203, or 10,000 records (you can use the Bulk API to load large volumes; they'll be chunked into 10K packets, but triggers will still work 200 records at a time).
If you have a scripted data load, maybe you can update something else as the next step: another object (something dummy that has just 1 record) and have a trigger on that?
If you have a scripted data load, maybe you can query the Ids and then pass them to a delete operation that runs before the upload task. This becomes a bit too much for the poor little Data Loader, but proper ETL tools like Talend, Informatica, Azure Data Factory, or Jitterbit could do it. (Although deleting before the load is a bit brave... what if the load fails? You're screwed... Maybe the delete should happen after a successful upload.)
Maybe you can guarantee that the last record in your daily load will have some flag set and, in the trigger, look for that flag?
Maybe you can schedule the batch to run every hour. You can't do that easily from the UI, but you can write the cron expression and schedule it as a one-liner in the Developer Console. In the Schedulable's execute(), check whether anything was loaded today and, if there is even a single record, kick off the batch (see the sketch below).
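A minimal sketch of that last option, assuming the batch class is the BatchApex from the question; the Schedulable class name and the cron expression are illustrative:

global class HourlyLoadCheck implements Schedulable {
    global void execute(SchedulableContext sc) {
        // Start the batch only if at least one record was loaded today
        Integer loadedToday = [SELECT COUNT() FROM CustomObject__c WHERE CreatedDate = TODAY];
        if (loadedToday > 0) {
            Database.executeBatch(new BatchApex());
        }
    }
}

// One-liner for the Developer Console: run at the top of every hour
System.schedule('Hourly load check', '0 0 * * * ?', new HourlyLoadCheck());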

Related

Can we parametrize Snowflake tasks?

I need to do a one-time historical data load, followed by an incremental load every 10 minutes.
Is there a way to parametrize a Snowflake task so it first runs the historical load and then changes the parameter to execute incremental loads? If not, can you suggest a better approach to handle historical (one-time) and incremental loads via tasks?
Note: The underlying table of the Snowflake stream contains the historical records, and any new data arriving after the stream/tasks are implemented is considered incremental.
If you have a task call a stored procedure, you could have the stored procedure first check whether the target table is empty (or whatever check you want; as long as you can write it as code, it'll work. Heck, you could have it insert a task run log into a separate table and check whether it's the first time it has run) and do the initial historical load in that case, and not otherwise.
Then the first time you run it, it will take one code path, and forever after it will take the other.
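A rough Snowflake Scripting sketch of that idea; the procedure, task, warehouse, table, and stream names are all made up, and the real check and load statements depend on your schema:

CREATE OR REPLACE PROCEDURE load_target()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  row_count INTEGER;
BEGIN
  -- "is this the first run?" check; any check you can write as code works here
  SELECT COUNT(*) INTO :row_count FROM target_table;
  IF (row_count = 0) THEN
    -- one-time historical load straight from the base table
    INSERT INTO target_table SELECT * FROM source_table;
    RETURN 'historical load';
  ELSE
    -- subsequent runs: consume only the stream's incremental changes
    INSERT INTO target_table SELECT * FROM source_stream;
    RETURN 'incremental load';
  END IF;
END;
$$;

-- Task that calls the procedure every 10 minutes
CREATE OR REPLACE TASK load_task
  WAREHOUSE = my_wh
  SCHEDULE = '10 MINUTE'
AS
  CALL load_target();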

Is there a way to define a Dynamic Table comprised of entries that have NOT been touched by an event recently?

I'm new to Flink and I'm trying to use it to have a bunch of live views of my application. At least one of the dynamic views I'd like to build would be to show entries that have not met an SLA -- or essentially expired -- and the condition for this would be a simple timestamp comparison. So I would basically want an entry to show up in my dynamic table if it has NOT been touched by an event recently. In playing around with Flink 1.6 (constrained to this due to AWS Kinesis) in a dev environment, I'm not seeing that Flink is re-evaluating a condition unless an event touches that entry.
I've got my dev environment plugged into a Kinesis stream that's sending in live access log events from a web server. This isn't my real use case but it was an easy one to begin testing with. I've written a simple table query that pulls in a request path, its last access time, and computes a boolean flag to indicate whether it hasn't been accessed in the last minute. I'm debugging this via a retract stream connected to PrintSinkFunction so all updates/deletes are printed to my console.
tEnv.registerDataStream("AccessLogs", accessLogs, "username, status, request, responseSize, referrer, userAgent, requestTime, ActionTime.rowtime");
Table paths = tEnv.sqlQuery("SELECT request AS path, MAX(requestTime) as lastTime, CASE WHEN MAX(requestTime) < CURRENT_TIMESTAMP - INTERVAL '1' MINUTE THEN 1 ELSE 0 END AS expired FROM AccessLogs GROUP BY request");
DataStream<Tuple2<Boolean, Row>> retractStream = tEnv.toRetractStream(paths, Row.class);
retractStream.addSink(new PrintSinkFunction<>());
I expect that when I access a page, an Add event is sent to this stream. Then if I wait 1 minute (do nothing), the CASE statement in my table will evaluate to 1, so I should see a Delete and then Add event with that flag set.
What I actually see is that nothing happens until I load that page again. The Delete event actually has the flag set, while the Add event that immediately follows it has the flag cleared again (as it should, since it's no longer "expired").
// add/delete, path, lastAccess, expired
(true,/mypage,2019-05-20 20:02:48.0,0) // first page load, add event
(false,/mypage,2019-05-20 20:02:48.0,1) // second load > 2 mins later, remove event for the entry with expired flag set
(true,/mypage,2019-05-20 20:05:01.0,0) // second load, add event
Edit: The most useful tip I've come across in my searching is to create a ProcessFunction. I think this is something I could make work with my dynamic tables (in some cases I'd end up with intermediate streams to look at computed dates), but hopefully it doesn't have to come to that.
I've gotten the ProcessFunction approach to work but it required a lot more tinkering than I initially thought it would:
I had to add a field to my POJO that changes in the onTimer() method (could be a date or a version that you simply bump each time)
I had to register this field as part of the dynamic table
I had to use this field in my query in order for the query to get re-evaluated and change the boolean flag (even though I don't actually use the new field). I just added it as part of my SELECT clause.
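For reference, here's roughly what that workaround can look like with a ProcessFunction on a keyed stream; AccessLog, its expiredVersion field, and the keying by request path are hypothetical stand-ins for the POJO behind the AccessLogs table:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Re-emits the latest record per key one minute after each event, with a bumped
// "version" field, so the downstream table query is re-evaluated without new events.
public class ExpiryMarker extends ProcessFunction<AccessLog, AccessLog> {

    private transient ValueState<AccessLog> latest;

    @Override
    public void open(Configuration parameters) {
        latest = getRuntimeContext().getState(
                new ValueStateDescriptor<>("latest", AccessLog.class));
    }

    @Override
    public void processElement(AccessLog in, Context ctx, Collector<AccessLog> out) throws Exception {
        latest.update(in);
        // fire one minute from now; a newer event for the same key simply registers another timer
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 60_000);
        out.collect(in);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<AccessLog> out) throws Exception {
        AccessLog last = latest.value();
        if (last != null) {
            last.setExpiredVersion(timestamp); // the extra field registered in the table
            out.collect(last);
        }
    }
}

// Usage (hypothetical): state and timers are per request path, so key the stream first
// DataStream<AccessLog> marked = accessLogs.keyBy("request").process(new ExpiryMarker());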
Your approach looks promising but a comparison with a moving "now" timestamp is not supported by Flink's Table API / SQL (yet).
I would solve this in two steps.
1. Register the dynamic table in upsert mode, i.e., a table that is upserted per key (request in your case) based on a version timestamp (requestTime in your case). The resulting dynamic table would hold the latest row for every request.
2. Have a query with a simple filter predicate like yours that compares the version timestamp of the rows of the dynamic (upsert) table and filters out all rows whose timestamps are too close to now.
Unfortunately, neither of these features (upsert conversion and comparison against a moving "now" timestamp) is available in Flink yet. There is some ongoing work on upsert table conversion, though.

Auto-updating Access database (can't be linked)

I've got a CSV file that refreshes every 60 seconds with live data from the internet. I want to automatically update my Access database (on a 60-second or so interval) with the new rows that get downloaded; however, I can't simply link the DB to the CSV.
The CSV always contains exactly 365 days of data, so when another day ticks over, a day of data drops off. If I were to link to the CSV, my DB would only ever have those 365 days of data, whereas I want to append the newly added data to the existing database.
Any help with this would be appreciated.
Thanks.
As per the comments, the first step is to link your CSV to the database: not as your main table, but as a secondary table that will be used to update your main table.
Once you do that you have two problems to solve:
1. Identify the new records. I assume there is a way to do so by timestamp or ID, so all you have to do is hold on to the last ID or timestamp imported (that will require an additional mini-table to hold the value persistently); see the sketch after this answer.
2. Make it happen every 60 seconds. To get that update on a regular interval you have two options:
A form's 'OnTimer' event is the easy way but requires very specific conditions. You have to make sure the form that triggers the event is only open once. This is possible even in a multi-user environment with some smart tracking.
If having an Access form open to do the updating is not workable, then you have to work with Windows scheduled tasks. You can set up an Access Macro to run as a Windows scheduled task.
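For the "identify the new records" step, a minimal sketch of an Access append query, assuming the linked CSV table is CsvLive, the main table is MainData, and ImportLog is a one-row table holding the last imported timestamp (all of these names are made up):

INSERT INTO MainData
SELECT c.*
FROM CsvLive AS c
WHERE c.RecordTime > (SELECT Max(LastImported) FROM ImportLog);

After the append succeeds, update ImportLog.LastImported to the new Max(RecordTime) so the next run only picks up newer rows; the form's OnTimer event or the scheduled macro simply runs these two steps every 60 seconds.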

High volume inserts for SQL Server

I'm looking for some advice on how to implement a process for mass inserts, to the tune of 400 records per second. The data comes from an outside real-time trigger, and the app gets notified when a data change happens; when it does, I need to consume that change.
I've looked at several different implementations for doing batch processing, including using DataTables with SqlBulkCopy or writing to CSV and consuming it.
What can you recommend?
400 inserts per second doesn't feel like it should present any major challenge. It depends on what you're inserting, whether there are any indexes that could suffer page splits from the inserts, and whether you have any extra logic going on in your insert proc or script.
If you want to insert them one by one, I would recommend just building a barebones stored procedure which does a simple insert of its parameters into a staging table with no indexes, constraints, or anything else. That will allow you to get the data into the database very quickly, and you can have a separate process come through every minute or so and work off the rows in batches.
Alternatively, you could have your application store up records until you reach a certain number, and then insert them into the database with a proc using a table-valued parameter (a sketch follows). Then you'll only have one insert for however many rows you chose to batch up, and the cost of that should be pretty trivial. Do note, however, that if your application crashes before it has inserted the batched-up rows, those rows will be lost.
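A rough T-SQL sketch of that table-valued parameter approach; the type, procedure, table, and column names are all illustrative:

-- User-defined table type that describes one batch of rows
CREATE TYPE dbo.ReadingList AS TABLE
(
    SensorId  int           NOT NULL,
    ReadingTs datetime2(3)  NOT NULL,
    Value     decimal(18,4) NOT NULL
);
GO

CREATE PROCEDURE dbo.InsertReadings
    @Readings dbo.ReadingList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- one round trip inserts the whole batch; keep the target table lean (no extra indexes)
    INSERT INTO dbo.Readings (SensorId, ReadingTs, Value)
    SELECT SensorId, ReadingTs, Value
    FROM @Readings;
END;

The application side would fill a DataTable (or a list of SqlDataRecord) matching dbo.ReadingList and pass it as a single structured parameter on the SqlCommand, so each batch costs one call.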
SqlBulkCopy is a powerful tool, but as the name suggests, it's built more for bulk loading of tables. If you have a constant stream of insert requests coming in, I would not recommend using it to load up your data. That might be a good approach if you want to batch up a LOT of requests to load all at once, but not as a recurring and frequent activity.
The following works pretty well for me, though I can't guarantee you 400 per second:
private async Task BulkInsert(string tableName, DataTable dt)
{
    if (dt == null)
        return;

    // The connection string here is a placeholder
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy("./sqlserver..."))
    {
        bulkCopy.DestinationTableName = tableName;
        await bulkCopy.WriteToServerAsync(dt);
    }
}

How to update part of an object

I need to insert some data into the DB each time a web service method is called: at the beginning of request processing and at the end.
My intention is to insert a record that contains all the incoming information at the beginning of request processing, and then update the same record once the request has been processed and the data are ready to be sent back (or an error occurred and I need to store the error message).
The problem is that the incoming data can be pretty long, and before an update LINQ to SQL needs to fetch the object's data from the DB and then "store" it again. In this case the incoming data travels 3 times:
1st time, when inserting: it goes into the DB;
2nd time, before the object update: it is fetched from the DB;
3rd time, on update: it goes to the DB again.
Is there any way to optimize this process if I already have the object fetched from the DB?
Does the same apply to Entity Framework? Does it allow updating only part of an object?
An ORM is geared towards converting complete rows to complete objects, and back again, so updates are always to the full object.
However, both LINQ to SQL and Entity Framework are definitely smart enough to figure out which properties have changed on an entity, so if you only update some fields, the generated UPDATE command will only touch those changed fields.
So basically: just try it! Fire up SQL Profiler and see what SQL goes to the database; in Entity Framework, I'm positive that if you only change some fields, only those changed fields will be updated in an UPDATE statement and nothing else.
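For instance, in Entity Framework you can even skip the extra fetch by attaching a stub entity and marking just the changed properties; AppDbContext, RequestLog, and the field names below are made-up examples:

// Update only two columns without re-fetching the row.
// RequestLog and its fields are hypothetical names for the log record.
using (var context = new AppDbContext())
{
    var log = new RequestLog { Id = requestLogId };   // stub with just the primary key
    context.RequestLogs.Attach(log);

    log.Status = "Completed";
    log.CompletedAt = DateTime.UtcNow;

    // Mark only the changed properties; the generated UPDATE touches just these columns.
    context.Entry(log).Property(x => x.Status).IsModified = true;
    context.Entry(log).Property(x => x.CompletedAt).IsModified = true;

    context.SaveChanges();
}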
