Calculate Data Storage for a database in Snowflake

Snowflake database storage includes (maybe there are some others):
- tables
- time travel
- fail-safe
- clones
- staged files
I am trying to find a way to calculate tables + time travel + fail-safe, with and without clones, without using TABLE_STORAGE_METRICS.
Currently I am looking at ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY, but I am not sure what is included in AVERAGE_DATABASE_BYTES.
How do I find the correct values for the current database?
Edit:
I am not an account admin
I would like to use a query instead of the UI
Edit 2: Result from SELECT * FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS (run with no IMPORTED PRIVILEGES and no permission to view the SNOWFLAKE database).

The documentation is a great source of information on this:
https://docs.snowflake.com/en/sql-reference/account-usage/database_storage_usage_history.html#database-storage-usage-history-view
"Number of bytes of database storage used, including data in Time Travel."
Per your list, this would include tables, clones (which are tables on their own), and Time Travel.
For stages, you'd need to use the STAGE_STORAGE_USAGE_HISTORY view.
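For completeness, here's a minimal sketch of querying both views. This assumes your role has been granted IMPORTED PRIVILEGES on the shared SNOWFLAKE database (which you note you currently lack), and MY_DB is a placeholder name:
-- daily storage per database: AVERAGE_DATABASE_BYTES covers tables
-- (clones included) plus Time Travel; fail-safe is reported separately
SELECT usage_date,
       average_database_bytes,
       average_failsafe_bytes
FROM snowflake.account_usage.database_storage_usage_history
WHERE database_name = 'MY_DB'
ORDER BY usage_date DESC;
-- staged files are tracked separately, account-wide
SELECT usage_date, average_stage_bytes
FROM snowflake.account_usage.stage_storage_usage_history
ORDER BY usage_date DESC;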
Is there a reason you don't want to use the TABLE_STORAGE_METRICS? Just curious.

If you are an account admin, you should be able to see this in your Snowflake console.

Related

How to get a list of file names from a Snowflake external S3 stage?

I am looking for the best way to automatically detect new files in an S3 bucket and then load the data into a Snowflake table.
I know this can be achieved using Snowpipe with SNS/SQS notifications set up in AWS, but I would like a self-contained solution within Snowflake that can be used for multiple data sources.
I want to have a table that is updated with the file names from an S3 bucket, and then to load the files that have not already been loaded from S3 into Snowflake.
The only way I have found so far to automatically detect new files from an external S3 stage in Snowflake is to use the code below with a task on a set schedule. It lists the file names and then uses RESULT_SCAN to display the last query as a table.
-- list the files in the stage, then read that LIST output via RESULT_SCAN
list @STAGE_NAME;
set qid = last_query_id();
select "name" from table(result_scan($qid));
Does anyone know a better way to automatically detect new files in an external stage from Snowflake? Any help is much appreciated.
Not necessarily better than the way you've already found, but there is an alternative approach to listing the files in an S3 bucket.
If you create an EXTERNAL TABLE over the data in S3, you can then use the METADATA$FILENAME property in a query. If you have a record of which files have already been loaded into Snowflake, you can compare against it and select the names of the new files to process.
e.g.
ALTER EXTERNAL TABLE MYSCHEMA.MYEXTERNALTABLE REFRESH;

SELECT DISTINCT
    METADATA$FILENAME AS filename
FROM MYSCHEMA.MYEXTERNALTABLE;
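To pick out only the not-yet-loaded files, a rough sketch, assuming you keep a tracking table of loaded file names (MYSCHEMA.LOADED_FILES is hypothetical):
-- stage files that do not yet appear in the tracking table
SELECT DISTINCT METADATA$FILENAME AS filename
FROM MYSCHEMA.MYEXTERNALTABLE
WHERE METADATA$FILENAME NOT IN (
    SELECT filename FROM MYSCHEMA.LOADED_FILES
);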
Short Run:
Your approach
You've already found a viable solution, and your concern about the reliability of the last_query_id() function is understandable. Stored procedures' sessions are isolated, so last_query_id() inside a procedure sees only the statements executed within that procedure. A procedure might be unnecessary here, but I personally like that they let you create reusable abstractions.
Another approach
An alternative, if you don't like the approach you're using, would be to create a single table with a single VARIANT data column plus the stage metadata columns, maintained by a single giant pipe, and then maintain a set of materialized views over that table which filter, convert variant fields to columns, and sanitize as appropriate (see the sketch after the list below).
There are some benefits:
- simpler: integrating new prefixes for a stage requires only an additional materialized view, not an additional pipe + task
- more control: you'd be able to operate directly and automatically on the data in raw form, rather than needing to load it into a table and then check it. This means you can perform data quality checks, metadata checks, and sanitization.
- maintainable: the use of materialized views over an immutable source means you can change the logic at any time and perform a full backfill with little effort.
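A rough sketch of that pattern; the object names (RAW_LANDING, RAW_PIPE, MYSTAGE, CLEAN_EVENTS) and the JSON paths are all hypothetical:
-- one landing table for everything the pipe ingests
CREATE TABLE raw_landing (
    data      VARIANT,                                  -- the raw record
    filename  STRING,                                   -- stage metadata captured at load
    file_row  NUMBER,
    loaded_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
-- the single giant pipe
CREATE PIPE raw_pipe AUTO_INGEST = TRUE AS
COPY INTO raw_landing (data, filename, file_row)
FROM (
    SELECT $1, METADATA$FILENAME, METADATA$FILE_ROW_NUMBER
    FROM @mystage
)
FILE_FORMAT = (TYPE = 'JSON');
-- one materialized view per consumer: filter, convert variant fields to columns
CREATE MATERIALIZED VIEW clean_events AS
SELECT data:id::NUMBER          AS id,
       data:event_ts::TIMESTAMP AS event_ts,
       data:payload::STRING     AS payload
FROM raw_landing
WHERE data:event_type::STRING = 'purchase';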
Long Run:
Notification integrations enable Snowflake to listen (and possibly notify in the future, roadmap-gods willing) to external messaging systems. At the moment only Azure is supported, so it won't work for your case, but keep an eye out over the next few months -- I think it's safe to speculate that we will see this feature grow to support AWS, and that a more direct and concise way of implementing your original solution will eventually become available.

How and where to store the current customer purchasing history data?

I am working on a project that requires showing the transaction history of a customer and whether the product the customer buys is under warranty or not. I need to use the data from the current system; the system can provide a Web API, which returns a .csv file. So how can I make use of the current system's data?
A solution I'm thinking of is to download all the .csv files and write scripts to insert every record into a database I built, which contains the necessary tables and relations to hold the data I retrieve. Then I would have the new database I want. Because I have never done this before, I want to know if it is feasible.
And one more question: should I store the data locally or use a cloud database like Firebase?
High-end databases like SQL Server and Oracle come with utilities that allow you to read directly from a csv file. Check the docs. Having done this many times, the best procedure I found was to read the file into one holding table. This gives you the chance to examine the data and find any unexpected quirks or missing fields. This allows you to correct the data, where possible.
Then write the scripts to move the data from the holding table into the proper tables you have designed. This must be done in a logical manner. For example, move the customer data before the buy transactions. Thus any error messages you get will not be because you tried to store a transaction before you stored the customer. (You will have referential integrity set up, yes?) This gives you more chances to correct or adjust the data or just identify problems more or less at your leisure.
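For example, a minimal SQL Server sketch of the holding-table step; the file path, table, and column names are all invented:
-- 1) land the raw CSV in a permissive holding table
CREATE TABLE holding_purchases (
    customer_id   VARCHAR(50),
    product_id    VARCHAR(50),
    purchase_date VARCHAR(50),  -- keep everything as text until validated
    price         VARCHAR(50)
);

BULK INSERT holding_purchases
FROM 'C:\data\purchases.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- 2) move customers before their transactions, so referential integrity holds
INSERT INTO customers (customer_id)
SELECT DISTINCT h.customer_id
FROM holding_purchases h
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = h.customer_id);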
Whether or not to store the data in the cloud is strictly according to the preferences of your employer.

Standard practice/API for sharing database data without giving direct database access

We would like to give some of our customers the option to read data from our central database. The data is live and new records are being added every few seconds. Our database is MySQL running on Amazon RDS.
I was wondering what is the common practice for doing so.
One option would be to grant them SELECT rights on specific tables, but in that case they would be able to access other customers' data as well.
I have tried searching with database, interface, and API as keywords, among others, but I couldn't find a good answer.
Thanks!
Use a REST API to expose specific tables for CRUD operations. You can control access on it too.
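If you do end up granting direct database access instead, one common MySQL pattern is a per-customer view, so each account sees only its own rows. A sketch; the table, column, and user names are hypothetical:
-- expose only customer 42's rows through a dedicated view
CREATE VIEW orders_customer42 AS
SELECT order_id, product_id, order_ts
FROM orders
WHERE customer_id = 42;

-- the customer's MySQL user gets the view, never the base table
GRANT SELECT ON mydb.orders_customer42 TO 'customer42'@'%';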

Collect data from 80 users, hiding other user's data

My wife works for a medium-sized retail chain. Managers from each of the 80 outlets have to fill in one row of performance info for each of their staff (900 in all), but aren't allowed to see the data of other stores' staff.
My wife currently manages this with lots of spreadsheets, because each month the executives change what they want to collect, and their IT team don't have the resources to update their SAS system. She has to manually compile all the data into one spreadsheet for analysis, which is time-consuming and error-prone. She has recently gone from doing this for 20 outlets to 80 and thinks there must be an easier way.
Is there a simple form-based system that can leverage what is already installed (Microsoft Office and Lotus, but not MS Access), or that can be run from a network drive? Cloud apps are banned. Excel's security is all wrong. Can Word form templates append to a shared data source? Any ideas?
TIA
You could have a single table with all the data, then create 'shadow tables' on this table for each individual store.
In MySQL this would probably be either a partitioned table (I've never used this, so I'm not sure how it works) or the use of temp tables.
You would then need to implement a method whereby, when a user logs in at a given location (IP address), a trigger creates the temp table and populates it with the relevant data for the store at that IP address.
An alternative (probably easier, too) would be to have a dedicated table for each store, then grant users specific privileges on each table you create. Again, you'll need triggers to populate a single 'master table' with info as it is updated, or you can just combine the store tables with a query like
-- UNION ALL stacks the store tables' rows (a comma-separated FROM
-- list would cross-join them, which isn't what you want here)
select * from outlet1
union all
select * from outlet2
-- ... and so on, through ...
union all
select * from outlet80;
Again, you may decide to create a temp table from the above select, and implement a custom script to create it only when required.
In fact, that is probably how I would do it.
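A rough MySQL sketch of the per-store table plus trigger-maintained master table; every name here is made up:
-- one table per store
CREATE TABLE outlet1_perf (
    staff_id INT,
    metric   DECIMAL(10,2),
    month    DATE
);

-- the combined table the head office reads
CREATE TABLE master_perf (
    outlet_id INT,
    staff_id  INT,
    metric    DECIMAL(10,2),
    month     DATE
);

-- copy each new store row into the master table automatically
DELIMITER //
CREATE TRIGGER outlet1_to_master
AFTER INSERT ON outlet1_perf
FOR EACH ROW
BEGIN
    INSERT INTO master_perf (outlet_id, staff_id, metric, month)
    VALUES (1, NEW.staff_id, NEW.metric, NEW.month);
END//
DELIMITER ;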
Then in your web interface have a button to create the temp table, and display it to the current user (provided they have the required privileges to view all the tables, of course).
I don't know for certain if Lotus is able to implement this; I don't know about its 'database' solution. I know that doing something similar in Access isn't that hard; the only downside would be needing to handle user identification (which Access doesn't do natively). Again, I don't know about Lotus.
In my experience the 'flat file' database systems don't generally handle user permissions natively; it falls to the interface development to handle this.
I'm not sure how helpful this answer is, but it may take you a little way toward a solution (even if you end up going for a client/server DBMS).
You can use Lotus for this. A simple start for you:
Create a database with one form and one view
On the form add whatever fields you want but also add a computed-when-composed multi-value field of type "Readers" with formula:
"[Admin]" : #Name( [CANONICALIZE];#userName)
With the exception of those with an Admin role (e.g., your wife), the view will display to each user only the records that the user created. The users will have to create one record per row.
Alternatively you could create an agent in the database that reads the data from an Excel file and builds the documents (records) with the READERS field's value computed as the documents are created.
If that's the route you want to take, post a reply here and I'll post some code to (i) prompt a user to select an Excel file, (ii) read the Excel file data into Lotus Notes, and (iii) implement a READERS field to ensure documents are kept confidential between the creator and the people with the Admin role.
Hope that helps.

Storing data in text files instead of SQL Server

I'm intending to use both SQL Server and simple text files to save my data.
Information like user data is going to be stored in SQL Server. The RSS feed for each user is going to be stored in a folder with the user id as its title, and inside this folder I put the files that store the data; each file can hold only 20 lines, and if there are more than 20 I make a new file.
When I need to read this data I simply call the last file in the user's folder.
I need to know the advantages and disadvantages of using this method.
Thanks
I would suggest storing the text file data in either a VARCHAR(8000) or a BLOB column inside a table in the database.
The advantages of storing it in the database are:
- All your data is stored in a single place. It is very easy to back up and restore elsewhere, if required.
- A database comes with concurrency by default: if you have, say, multiple users trying to access the same row or the same table, the database handles it inherently.
- When you go for a hybrid files-plus-database approach, you are going for distributed storage, and you always have to make sure the two stay consistent.
- If you want to store just the latest text file content, go for UPDATE. If you want to keep the history of earlier text file contents, go for SCD Type 2 style storage, or for a historical table containing the previous text file data.
- A database is a single contained unit, and you can do many things on it in that one unit: transparent data encryption, masking, access control, and all the security-related stuff. In a hybrid approach, you have to manage security in two places.
- When all your data is in a single place, once you have proper indexes you can write queries covering many different reporting use cases using SQL. But if the data is distributed, you have to work out how to handle each reporting use case.
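A small sketch of the UPDATE-versus-history choice in SQL Server; the table and column names are invented:
-- feed content lives in the database, with history kept SCD Type 2 style
CREATE TABLE user_feeds (
    feed_id    INT IDENTITY PRIMARY KEY,
    user_id    INT NOT NULL,
    content    VARCHAR(MAX) NOT NULL,             -- the feed text itself
    valid_from DATETIME NOT NULL DEFAULT GETDATE(),
    is_current BIT NOT NULL DEFAULT 1
);

-- an "update" expires the old row and inserts the new one
UPDATE user_feeds SET is_current = 0 WHERE user_id = 42 AND is_current = 1;
INSERT INTO user_feeds (user_id, content) VALUES (42, '<rss>...</rss>');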
The question is not quite the right one to ask.
You should start with a clarification of the requirements for the application. Answer the following questions for yourself:
- What types of data queries need to be executed (selects, updates, reports)?
- How many users will there be? How often will their requests come in? Must data be synchronized across users (concurrency)?
- Is there a need for authentication, authorization, or localization?
- Is there a need for modification-history support?
- Etc.
Databases usually have all these mechanisms, and you do not have to implement them in your application.
Depending on your application's needs, you then decide on a strategy for storing the data: a database, files, or both.
