Snowflake Total Bytes Over Network at any Given Point of Time

I am trying to check, at the Snowflake account level, how much data has been transferred in and out of Snowflake at any given point in time. I looked at the BYTES_SENT_OVER_THE_NETWORK column in the Query History view, but it doesn't give me a point-in-time figure; instead I get a start time, an end time, and how many bytes were transferred. Is there any way to get these details? Happy to share more context if required.

There is a DATA_TRANSFER_HISTORY view in ACCOUNT_USAGE which captures this for COPY, REPLICATION and EXTERNAL_FUNCTIONS:
https://docs.snowflake.com/en/sql-reference/account-usage/data_transfer_history.html
For per-query data transfer, you can use the QUERY_HISTORY columns INBOUND_DATA_TRANSFER_BYTES and OUTBOUND_DATA_TRANSFER_BYTES:
https://docs.snowflake.com/en/sql-reference/account-usage/query_history.html
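For example, here is a rough sketch of a query that approximates in/out bytes over time. Note that the account usage views only record each query's start time, end time, and total bytes, so any point-in-time view is an approximation; here, bytes are attributed to the hour in which the query started:

    -- Approximate network bytes in/out per hour, attributed to the
    -- hour in which each query started (illustrative only).
    SELECT
        DATE_TRUNC('hour', start_time)    AS hour_bucket,
        SUM(inbound_data_transfer_bytes)  AS bytes_in,
        SUM(outbound_data_transfer_bytes) AS bytes_out
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    GROUP BY 1
    ORDER BY 1;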

Related

Having trouble with interface table structures in SQL Server

I'm currently working on a project that involves a third-party database and application. So far we are able to successfully test and interface data between our databases. However, we run into trouble when extracting a large set of data (e.g. 100,000 rows with 10 columns per row) and the transaction suddenly stops in the middle for whatever reason (blackouts, forced exit, etc.); in this scenario we end up with missing or duplicated data.
Can you please give us suggestions for handling these types of scenarios? Thank you!
Here's our current interface structure
OurDB -> Interface DB -> 3rdParty DB
OurDB: we extract records from OurDB (with a bit column set to false) into the InterfaceDB
InterfaceDB: after inserting records from OurDB, we update the OurDB bit column to true
3rdPartyDB: they extract and then delete all records from InterfaceDB (they assume every record there is ready for extraction)
Well, you definitely need an ETL tool then, and preferably SSIS. First, it will drastically improve your transfer rates while also providing robust error handling. Additionally, you will have to use Lookup transforms to ensure duplicates do not enter the system. I would suggest using the Cache Connection Manager to perform the lookups.
In terms of design, if your source system (OurDB) has a primary key, say recId, then add a column, say source_rec_id, to your InterfaceDB table. Say your first run transferred 100 rows; on your second run you would then pick up from the 101st record and move on from there. This gives you a tracking mechanism and a one-to-one correlation between the source and destination systems, so you can tell how many records have been transferred, how many are left, and so on.
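As a rough sketch of that pick-up-where-you-left-off logic (recId and source_rec_id are the names from this design; the other table and column names are placeholders):

    -- Transfer only source rows with a recId higher than the highest
    -- one already landed in the interface table.
    INSERT INTO InterfaceDB.dbo.InterfaceTable (source_rec_id, col1, col2)
    SELECT s.recId, s.col1, s.col2
    FROM OurDB.dbo.SourceTable AS s
    WHERE s.recId > (SELECT COALESCE(MAX(i.source_rec_id), 0)
                     FROM InterfaceDB.dbo.InterfaceTable AS i);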
For a good introduction to SSIS, see the SSIS content on Channel 9 (MSDN). Very helpful resource.

Exporting one day's data without holding a lock, in Oracle

I have to write a script which will take a day's data from a single table and dump it into another table (where my other process will process that data). However, the table from which I want to export data is a "live" table with a high volume of inserts. Any kind of locking (while the export is being done) is not acceptable. I know that MySQL has a way of NOT holding a lock via the "--skip-lock-tables" flag. Does Oracle have something similar? If not, what is the best way to achieve this?
Oracle version being used is - Oracle SE One 12.1.0.1.v2
At the point in time when a SELECT starts, an SCN is attached to it. Any changes to the data after that SCN do not impact your SELECT; it reads the old data from UNDO. If this is a high-transaction table and you expect your SELECT to be long-running, ensure that you have sufficient UNDO space and a long enough UNDO_RETENTION.
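Because of this, a plain INSERT ... SELECT already gives you a consistent one-day export without any explicit locking. A minimal sketch, assuming a created_at date column and illustrative table names:

    -- The SELECT sees live_table as of the SCN at which the statement
    -- started; concurrent inserts are simply not visible to it.
    INSERT INTO export_table
    SELECT *
    FROM live_table
    WHERE created_at >= TRUNC(SYSDATE) - 1
      AND created_at <  TRUNC(SYSDATE);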
Also, good design prevents many potential issues. From this point of view, I would suggest implementing daily partitions on your source table. That would make it easy to extract one day's data and will help you maintain the table in the future.
Hope this helps.

How to control which rows were sent via SSIS

I'm trying to create an SSIS package which will periodically send data to another database. I want to send only new records (and I need to keep track of the sent ones), so I created a status column in my source table.
I want the package to update this column after successfully sending the data, but I can't simply update all rows with "unsent" status, because some rows may have been added during package execution. I also can't use transactions (I mean isolation levels that would solve my problem: I can't use Serializable because I mustn't prevent users from adding new rows, and the Sequence Container doesn't support Snapshot).
My next idea was to use a recordset and, after sending the data to the other DB, use it to get the IDs of the sent rows, but I couldn't find a way to use it as a data source.
I don't think I should set the status to "to send" and then update it to "sent"; I believe that would be too costly.
Now I'm thinking about using a temporary table, but I'm not convinced that this is the right way to do it. Am I missing something?
The Recordset is a destination; you cannot use it as a source in a Data Flow task.
But since the data is saved to a variable, it is available in the Control Flow.
After completing the Data Flow, go back to the Control Flow and create a Foreach Loop container that enumerates the recordset variable.
Read each recordset row into variables and use them to run an update query.
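The update itself can be a simple parameterized statement in an Execute SQL Task, along these lines (Id and Status are assumed column names):

    -- ? is mapped to the loop variable holding the current row's Id
    UPDATE dbo.SourceTable
    SET    Status = 'Sent'
    WHERE  Id = ?;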
Also, see whether the Lookup transform can be useful to you; it can produce the rows that match and the rows that don't.
I will improve the answer based on discussions.
What you have here is a very typical data mirroring problem. To start with, I would not simply have a boolean that signifies that a record was "sent" to the destination (mirror) database. At the very least, I would put a LastUpdated datetime column in the source table, and have triggers on that table, on insert and update, that put the system date into that column. Then, every day I would execute an SSIS package that reads the records updated in the last week, checks to see if those records exist in the destination, splitting the datastream into records already existing and records that do not exist in the destination. For those that do exist, if the LastUpdated in the destination is less than the LastUpdated in the source, then update them with the values from the source. For those that do not exist in the destination, insert the record from the source.
It gets a little more interesting if you also have to deal with record deletions.
I know it may seem wasteful to read and check a week's worth every day, but your database should hardly feel it; it provides a lot of good double-checking and saves you a lot of headaches by giving you a simple, error-tolerant algorithm. If some record doesn't get transferred because of a network hiccup, no worries: it gets picked up the next day.
I would still set up the SSIS package as a server task that sends me an email with any errors, so that I can keep track. Most days you get no errors, and when there are errors, you can wait a day or resolve the cause and let the next day's run pick up the problems.
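A minimal T-SQL sketch of the LastUpdated pattern described above, assuming a source table dbo.SourceTable with primary key Id (all names are illustrative):

    ALTER TABLE dbo.SourceTable
        ADD LastUpdated DATETIME2 NOT NULL
            CONSTRAINT DF_SourceTable_LastUpdated DEFAULT SYSUTCDATETIME();
    GO
    -- Keep LastUpdated current on every update; inserts are covered
    -- by the default above.
    CREATE TRIGGER trg_SourceTable_LastUpdated
    ON dbo.SourceTable
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        UPDATE s
        SET    LastUpdated = SYSUTCDATETIME()
        FROM   dbo.SourceTable AS s
        JOIN   inserted AS i ON i.Id = s.Id;
    END;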
I am doing a similar thing; in my case, I have a status on the source record.
I read in all records with a status of "New".
Then I use an OLE DB Command to execute SQL on each row, changing the status to "In Progress" (in your WHERE clause, enter a ? as the value in the Component Property tab, and you can configure it as a parameter from the table row, such as an ID or primary key, in the Column Mappings tab).
Once the records are processed, you can change all "In Progress" records to "Success", or something similar, using another OLE DB Command.
Depending on what you are doing, you can also use the status to mark records that errored at some point and require further attention.
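Concretely, the two OLE DB Commands might run statements like these (Id and Status are assumed column names):

    -- Per row, inside the data flow: mark the row as picked up
    UPDATE dbo.SourceTable SET Status = 'In Progress' WHERE Id = ?;

    -- After the data flow completes: promote everything that made it through
    UPDATE dbo.SourceTable SET Status = 'Success' WHERE Status = 'In Progress';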

Persist Data in SSIS for Next Execution

I have data to load where I only need to pull records since the last time I pulled them. There are no date fields in my destination table to save this information, so I have to keep track of the maximum date I last pulled. The problem is I can't see how to save this value in SSIS for the next time the project runs.
I saw this:
Persist a variable value in SSIS package
but it doesn't work for me because there is another process that purges and reloads the data separately from my process. This means I have to do more than just know the last time my process ran.
The only solution I can think of is to create a table but it seems a bit much to create a table to hold one field.
This is a very common thing to do. You create an execution table that stores the package name, the start time, the end time, and whether the package failed or succeeded. You can then pull the max start time of the last successful execution.
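A minimal sketch of such an execution table and the watermark lookup (all names here are illustrative):

    CREATE TABLE dbo.PackageExecutionLog (
        PackageName NVARCHAR(260) NOT NULL,
        StartTime   DATETIME2     NOT NULL,
        EndTime     DATETIME2     NULL,
        Succeeded   BIT           NULL
    );

    -- Watermark for the next run: the max start time of the last
    -- successful execution of this package.
    SELECT MAX(StartTime)
    FROM dbo.PackageExecutionLog
    WHERE PackageName = 'MyLoadPackage'  -- hypothetical package name
      AND Succeeded = 1;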
You can't persist anything in a package between executions.
What you're talking about is a form of differential replication, and this has been done many, many times.
For differential replication it is normal to store some kind of state in the subscriber (the system reading the data) or the publisher (the system providing the data) that remembers what state you're up to.
So I suggest you:
Read up on differential replication design patterns
Absolutely put your mind at rest about writing data to a table
If you end up having more than one source system or more than one source table, your storage table is not going to have just one record; have a think about that. I answered a question like this the other day. You'll find over time that you're going to add handy things like the last time the replication ran, how long it took, how many records were transferred, etc. (see the sketch after the link below).
Is it viable to have a SQL table with only one row and one column?
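As a sketch of the kind of state table this grows into, with one row per source table rather than a single value (all names are assumptions):

    CREATE TABLE dbo.ReplicationState (
        SourceSystem    NVARCHAR(100) NOT NULL,
        SourceTable     NVARCHAR(260) NOT NULL,
        LastLoadedAt    DATETIME2     NOT NULL,  -- the watermark itself
        LastRunStart    DATETIME2     NULL,      -- handy extras over time
        LastRunSeconds  INT           NULL,
        RowsTransferred INT           NULL,
        CONSTRAINT PK_ReplicationState PRIMARY KEY (SourceSystem, SourceTable)
    );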
TTeeple and Nick.McDermaid are absolutely correct, and you should follow their advice if humanly possible.
But if for some reason you don't have access to write to an execution table, you can always use a Script Task to read/write the last loaded date to a text file on whatever local file system you're running SSIS on.

Large file, SSIS to Small chunks, parallel enrichment

I have a pretty large file, 10 GB in size, and need to load its records into the DB.
I want to add two additional columns:
LoadId, which is a constant (this indicates the file's unique load number)
ChunkNumber, which indicates the chunk within the batch.
So if I have a batch size of 10,000 records, I want
LoadId = {GUID}
ChunkNumber = 1
and for the next 10,000 records I want
LoadId = {GUID}
ChunkNumber = 2
Is this possible in SSIS? I suppose I could write a custom component for this, but there should be a built-in ID I could use, since SSIS is already running things in batches of 10,000.
Can someone help me figure out whether such a parameter exists and whether it can be used?
OK, a little more detail on the background of what and why.
Once we get the data into slices of 10,000 records, we can start calling stored procedures to enrich the data in chunks; all I am trying to do is see whether SSIS can help here by stamping a chunk number and a GUID.
This helps the stored proc move the data in chunks. Although I could do this after the fact with a row number, the SELECT would have to travel through the whole set again to update the chunk numbers; that's double the effort.
A GUID will represent the complete dataset, and individual chunks are related to it.
Some more insight: there is a working table we import this large file into, and if we start enriching all the data at once the transaction log gets used up. It is more manageable if we can get the data into chunks, so the transaction log does not blow up and we can also parallelize the enrichment process.
The data moves from a de-normalized format to a normalized format from here. An SP is more maintainable in terms of release and day-to-day management, so any help is appreciated.
Or is there another, better way of dealing with this?
For the LoadID, you could use the SSIS variable
System::ExecutionInstanceGUID
which is generated by SSIS when the package runs.
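For instance, a Derived Column transformation can stamp every row with it; System::ExecutionInstanceGUID is already a string variable, so the expression is simply:

    @[System::ExecutionInstanceGUID]

As far as I know there is no similar built-in counter for the chunk number; the 10,000-row batching you see is buffer/commit sizing, so ChunkNumber would still need something like a Script Component counter.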
