Auto-updating an Access database (can't be linked)

I've got a CSV file that refreshes every 60 seconds with live data from the internet. I want to update my Access database automatically (on roughly a 60-second interval) with the new rows that get downloaded; however, I can't simply link the DB to the CSV.
The CSV always contains exactly 365 days of data, so when another day ticks over, a day of data drops off. If I were to link to the CSV, my DB would only ever hold those 365 days of data, whereas I want to append the new data to the existing database.
Any help with this would be appreciated.
Thanks.

As per the comments, the first step is to link your CSV to the database: not as your main table, but as a secondary table that will be used to update your main table.
Once you do that you have two problems to solve:
Identify the new records
I assume there is a way to do so by timestamp or ID, so all you have to do is hold on to the last ID or timestamp imported (that will require an additional mini-table to hold the value persistently); a sketch of the resulting append query follows this list.
Make it happen every 60 seconds. To get that update on a regular interval you have two options:
A form's 'OnTimer' event is the easy way but requires very specific conditions. You have to make sure the form that triggers the event is only open once. This is possible even in a multi-user environment with some smart tracking.
If having an Access form open to do the updating is not workable, then you have to work with Windows scheduled tasks. You can set up an Access Macro to run as a Windows scheduled task.
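To make the first step concrete, here is a minimal sketch of the append query the timer (or the scheduled macro) would run. Every name is hypothetical: the linked CSV appears as tblCsvLive, the main table is tblHistory, the mini-table is tblSyncState with a single LastImported field, and each row carries a ReadingTime timestamp:

INSERT INTO tblHistory (ReadingTime, Reading)
SELECT c.ReadingTime, c.Reading
FROM tblCsvLive AS c
WHERE c.ReadingTime > DLookup("LastImported", "tblSyncState");

UPDATE tblSyncState
SET LastImported = DMax("ReadingTime", "tblHistory");

The domain functions DLookup/DMax stand in for subqueries because Access is fussy about aggregate subqueries inside an UPDATE. Running the pair on every tick is harmless: rows at or before LastImported are simply skipped.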

Related

SQL Server Change Data Capture - Validating Incremental Window

I want to implement an incremental load process using SQL Server Change Data Capture. Every example I find takes the "happy path."
In other words, they assume that the CDC history exceeds the time since the last successful incremental load.
Suppose we leave the cleanup job with the default of 3 days, and for some reason our load hasn't successfully completed for longer than that. I need to check for this and run a full extract instead.
I'm logging the successful execution datetime in SQL Server tables. So, if I compare the last successful date to the earliest record in the cdc.lsn_time_mapping table, will this accomplish my task?
Basically something like:
SELECT @LastSuccessfulDate = ... FROM audit....
SELECT @MinCdcDate = MIN(tran_begin_time) FROM cdc.lsn_time_mapping;
IF @MinCdcDate > @LastSuccessfulDate THEN 'Full' ELSE 'Incremental'
Should this work? Is there a better way to accomplish it?
I would always stay in the "log domain", not the "time domain", when working directly with CDC. So track the last LSN of each run and compare it against sys.fn_cdc_get_min_lsn every time you synchronize.
So if you last synchronized at lsn=100, and the min_lsn=110, then you've got a gap of 10 missing log records.
But this is only one of many scenarios that will require you to reinitialize the replication with a full sync, so you should also have an input parameter or something similar to force a full sync.
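A minimal T-SQL sketch of that check, assuming a capture instance named dbo_MyTable and a hypothetical dbo.SyncState table that persists the last LSN processed:

DECLARE @LastLsn binary(10) =
    (SELECT LastLsn FROM dbo.SyncState WHERE CaptureInstance = 'dbo_MyTable');
DECLARE @MinLsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');

IF @LastLsn IS NULL OR @LastLsn < @MinLsn
    PRINT 'Full';        -- cleanup has already discarded log records past our bookmark
ELSE
    PRINT 'Incremental'; -- everything since @LastLsn is still in the CDC tables

On a successful run you would then store sys.fn_cdc_get_max_lsn() back into dbo.SyncState as the new bookmark.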

Can we parametrize Snowflake tasks?

I need to do a one-time historical data load, followed by an incremental load every 10 minutes.
Is there a way to parameterize a Snowflake task to first run the historical load and then change the parameter to execute incremental loads? If not, can you suggest a better approach to handle historical (one-time) and incremental loads via tasks?
Note: the underlying table of the Snowflake stream contains the historical records, and any new data arriving after the stream/tasks are implemented is considered incremental.
If you have a task call a stored procedure, you could have the stored procedure first check whether the target table is empty (or whatever check you want; as long as you can write it as code, it'll work. Heck, you could have it insert a task-run log entry into a separate table and check whether it's the first time it has run) and do the initial historical load in that case, and not otherwise.
Then the first time you run it, it will take one code path, and forever after it will take the other.
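A minimal sketch of that pattern in Snowflake SQL Scripting, with all names hypothetical (target table tgt with columns id and payload, base table src, and a stream src_stream on src):

CREATE OR REPLACE PROCEDURE load_tgt()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  n INTEGER;
BEGIN
  SELECT COUNT(*) INTO :n FROM tgt;
  IF (n = 0) THEN
    -- first ever run: one-time historical load straight from the base table
    INSERT INTO tgt (id, payload) SELECT id, payload FROM src;
    RETURN 'historical';
  ELSE
    -- every later run: consume only the new rows sitting in the stream
    INSERT INTO tgt (id, payload)
    SELECT id, payload FROM src_stream WHERE METADATA$ACTION = 'INSERT';
    RETURN 'incremental';
  END IF;
END;
$$;

The task then just calls the procedure on its schedule:

CREATE TASK load_tgt_task
  WAREHOUSE = my_wh
  SCHEDULE = '10 MINUTE'
AS
  CALL load_tgt();

ALTER TASK load_tgt_task RESUME;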

Persist Data in SSIS for Next Execution

I have data to load where I only need to pull records added since the last time I pulled them. There are no date fields in my destination table to save this information, so I have to keep track of the maximum date I last pulled. The problem is I can't see how to save this value in SSIS for the next time the project runs.
I saw this:
Persist a variable value in SSIS package
but it doesn't work for me because there is another process that purges and reloads the data separately from my process. This means I have to do more than just know the last time my process ran.
The only solution I can think of is to create a table but it seems a bit much to create a table to hold one field.
This is a very common thing to do. You create an execution table that stores the package name, the start time, the end time, and whether the package succeeded or failed. You can then pull the max start time of the last successful execution.
You can't persist anything in a package between executions.
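A minimal sketch, with all object names hypothetical:

CREATE TABLE dbo.PackageExecutionLog (
    ExecutionId int IDENTITY(1, 1) PRIMARY KEY,
    PackageName nvarchar(200) NOT NULL,
    StartTime   datetime2(3)  NOT NULL,
    EndTime     datetime2(3)  NULL,
    Succeeded   bit           NULL
);

-- at the start of a run, fetch the high-water mark left by the last good run
SELECT MAX(StartTime) AS LastGoodStart
FROM dbo.PackageExecutionLog
WHERE PackageName = 'MyPackage'
  AND Succeeded = 1;

The package inserts its row (with Succeeded left NULL) as its first step and updates EndTime and Succeeded as its last, so a crashed run never counts as successful and the next run falls back to the previous good high-water mark.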
What you're talking about is a form of differential replication, and this has been done many, many times.
For differential replication it is normal to store some kind of state in the subscriber (the system reading the data) or the publisher (the system providing the data) that remembers what state you're up to.
So I suggest you:
Read up on differential replication design patterns
Absolutely put your mind at rest about writing data to a table
If you end up having more than one source system or more than one source table, your storage table is not going to have just one record. Have a think about that. I answered a question like this the other day; you'll find over time that you're going to add handy things like the last time the replication ran, how long it took, how many records were transferred, and so on:
Is it viable to have a SQL table with only one row and one column?
TTeeple and Nick.McDermaid are absolutely correct, and you should follow their advice if humanly possible.
But if for some reason you don't have access to write to an execution table, you can always use a script task to read/write the last loaded date to a text file on whatever local file system you're running SSIS on.

Order by creation time in OpenEdge

Is there an automatic way of knowing which rows are the latest to have been added to an OpenEdge table? I am working with a client and have access to their database, but they are not saving IDs or timestamps for the data.
I was wondering if, hopefully, OpenEdge somehow does this out of the box. (I doubt it does, but it won't hurt to check.)
Edit: My Goal
My goal is to be able to import only the new data, i.e. the delta, of a specific table. Without knowing which rows are new, I am forced to import everything because I have no clue what was added.
1) Short answer is no: there's no "in the box" way for you to tell which records were added, or the order in which they were added.
The only way to tell the order of creation is by applying a sequence or by time-stamping the record. Since your application does neither, you're out of luck.
2) If you're looking for changes without applying schema changes, you can capture updates using session or database triggers and save that activity log somewhere.
3) If you're just looking for a "delta", you can take a periodic backup of the database and then use queries to compare the current db with the backup db and get the differences that way.
4) Maintain a db on the customer site with the contents of the last table dump. The next time you want to get deltas from the customer, compare that table's contents with the current table, dump the differences, then update the db table to match the current db's table.
5) Personally, I'd talk to the customer to (a) see if they actually require this functionality, and (b) find out what they think about adding some fields and a bit of code to the system to get an activity log. Adding a few fields and some code to update them shouldn't be that big of a deal.
You could use database triggers to meet this need. In order to do so you will need to be able to write and deploy trigger procedures. And you need to keep in mind that the 4GL and SQL-92 engines do not recognize each other's triggers. So if updates are possible via SQL, 4GL triggers will be blind to those updates. And vice-versa. (If you do not use SQL none of this matters.)
You would probably want to use WRITE triggers to catch both insertions and updates to data. Do you care about deletes?
Simple-minded 4GL WRITE trigger:

TRIGGER PROCEDURE FOR WRITE OF Customer.
/* OLD BUFFER oldCustomer. */
/* OLD BUFFER is optional and not needed in this use case ... */

/* append the record that was just written to a flat file */
output to "customer.dat" append.
export customer.
output close.

return.

How to capture table level data changes in SQL Server 2008 R2?

I have a high volume of data normalized into more than 100 tables. There are multiple applications that change the underlying data in those tables, and I want to raise events on those changes. Possible options that I know of are:
Change Data Capture
Change Tracking
Using Triggers on each table (bad option but possible)
Can someone share the best way of doing this if someone has already done this before?
What I really want in the end is this: if one transaction affects 12 tables out of 100, I should be able to bubble up one event instead of 12. Assume there are concurrent users changing these tables.
Two options I can think of:
Triggers ARE the right way to capture change events in the DB layer
Codewise, I make sure in my app that each table is changed through only one place in the code, regardless of what the change is (I call it a hub for that table, as it channels many different pathways into one place); it becomes very easy to catch change events that way in the code layer
One possibility is SQL Server Query Notifications: Using Query Notifications
As long as you want to 'batch' multiple changes, I think you should follow the route of Change Data Capture or Change Tracking (depending on whether you just want to know that something changed or what changes happened).
They should be used by a 'polling' procedure, where you poll for changes every few minutes (seconds? milliseconds?) and raise events. The nice thing about this is that as long as you store the last rowversion of the previous poll, for each table, you can check whenever you like for changes since the last poll. You don't rely on a real-time trigger approach that, if halted, would lose all events forever. The check could easily live inside a single procedure that examines each table, and you would need only one more table to store the last rowversion per table.
Also, the overhead of this approach would be controlled by you and by how frequently the polling happens.
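If you go the Change Tracking route, a minimal polling pass might look like this, assuming change tracking is enabled on dbo.Orders (primary key OrderId) and a hypothetical dbo.PollState table holds the version seen by the previous poll:

DECLARE @last bigint =
    (SELECT LastVersion FROM dbo.PollState WHERE TableName = 'dbo.Orders');
DECLARE @current bigint = CHANGE_TRACKING_CURRENT_VERSION();

-- one row per key changed since the last poll; raise (or batch) events from this set
SELECT ct.OrderId, ct.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.Orders, @last) AS ct;

UPDATE dbo.PollState
SET LastVersion = @current
WHERE TableName = 'dbo.Orders';

Looping that block over every tracked table inside one stored procedure gives you the single poll-state table described above.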
