Two questions about Time Travel storage costs in Snowflake - snowflake-cloud-data-platform

I have read the Snowflake documentation extensively. Snowflake incurs storage costs when data is updated.
"tables-storage-considerations.html" mentions the following:
As an extreme example, consider a table with rows associated with
every micro-partition within the table (consisting of 200 GB of
physical storage). If every row is updated 20 times a day, the table
would consume the following storage:
Active 200 GB | Time Travel 4 TB | Fail-safe 28 TB | Total Storage 32.2 TB
My first question is: if a periodic task runs 20 times a day, and each run updates exactly one row in every micro-partition, will the table still consume 32.2 TB of total storage?
"data-time-travel.html" mentions the following:
Once the defined period of time has elapsed, the data is moved into
Snowflake Fail-safe and these actions can no longer be performed.
So my second question is: why does Fail-safe cost 28 TB rather than 24 TB (i.e. reduced by the 4 TB already counted under Time Travel, since the data is moved rather than copied)?
https://docs.snowflake.com/en/user-guide/data-cdp-storage-costs.html
https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html
https://docs.snowflake.com/en/user-guide/data-time-travel.html

First question: yes. What matters is that the micro-partition changes, not how many rows within it change.
Question 2: Fail-safe retains 7 days of data: 4 TB × 7 = 28 TB.
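The arithmetic behind the documented example can be written out as a short sketch. The assumptions (taken from the docs example) are 7 days of Fail-safe and every micro-partition being rewritten on each of the 20 daily updates:

```python
# Sketch of the storage math from the Snowflake docs example.
# Assumptions: every micro-partition is rewritten on each of the
# 20 daily updates, and Fail-safe retains 7 days of retired data.

active_gb = 200          # physical size of the table
updates_per_day = 20

# Each update rewrites every micro-partition, so each update
# retires a full 200 GB copy into Time Travel.
time_travel_gb = active_gb * updates_per_day   # 4000 GB = 4 TB

# Fail-safe keeps 7 days' worth of retired partitions.
fail_safe_gb = time_travel_gb * 7              # 28000 GB = 28 TB

total_tb = (active_gb + time_travel_gb + fail_safe_gb) / 1000
print(total_tb)  # 32.2 (TB)
```

Note that the Time Travel figure assumes whole micro-partitions are retired on each update; that is exactly why updating one row per micro-partition costs the same as updating every row.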

Related

Tricky: SQL Server-side aggregation of time-series data for charting

I have a large time-series data set in a table that contains 5 years of data. The data is highly structured: it is clustered/ordered on the time column, and there is exactly one record for every 10-minute interval over the entire 5-year period.
In my user-side application I have a time-series chart that is 400 pixels wide, and users can set the time scale from 1 hour up to 5 years. Therefore any query to the database by this chart that returns more than 400 records provides data that cannot be physically displayed.
What I want to know is: can anyone suggest an approach such that, when the database is queried for a certain time range, SQL Server would dynamically apply a suitable averaging aggregation and return no more than 400 records?
Example 1: if the time range were 5 years, SQL Server would calculate ~1 value for every ~4.5 days (5 years × 365 days / 400 records), so it would average all the 10-minute samples in each 4.5-day bin and return one record per bin: about 400 in total.
Example 2: if the time range were one month, SQL Server would calculate ~1 record for every ~1.85 hours (31 days × 24 h / 400 records), so it would average all the 10-minute samples in each 1.85-hour bin and return one record per bin: about 400 in total.
Ideally I'd like a solution that, from the application's perspective, can be queried just like a static table.
I'd really appreciate any suggested approaches or code snippets.
Some examples, if you have a datetime column (which is not quite clear from your question, as there is no table schema):
Grouping into interval of 5 minutes within a time range
SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc)
They should be quite easy to port to SQL Server: use DATEDIFF to convert your datetime values into a Unix timestamp, and use ROUND() with a non-zero function parameter for the division.
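Dialect aside, the binning idea can be sketched outside SQL as well: compute a bin width from the requested range and the 400-point budget, map each timestamp to a bin index (the same trick as DATEDIFF plus integer division), and average within each bin. Names and the sample data here are illustrative:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def downsample(samples, start, end, max_points=400):
    """Average (timestamp, value) samples into at most max_points bins.

    samples: iterable of (datetime, float) pairs within [start, end).
    Returns one (bin_start, mean) record per non-empty bin.
    """
    bin_width = (end - start) / max_points
    bins = defaultdict(list)
    for ts, value in samples:
        idx = int((ts - start) / bin_width)  # DATEDIFF + integer division
        bins[idx].append(value)
    return [(start + idx * bin_width, sum(vs) / len(vs))
            for idx, vs in sorted(bins.items())]

# 10-minute samples over one day: 144 raw points, reduced to 24 hourly bins
start = datetime(2024, 1, 1)
raw = [(start + timedelta(minutes=10 * i), float(i)) for i in range(144)]
result = downsample(raw, start, start + timedelta(days=1), max_points=24)
print(len(result))  # 24
```

Wrapping the equivalent SQL in a table-valued function parameterised by the time range would give the "query it like a static table" behaviour the question asks for.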

Datastore Read Operations Calculation

So I am currently running a test to estimate how much work my Google App Engine app can do without going over quota.
This is my test:
I have an entity in the datastore that, according to my local dashboard, needs 18 write operations. I have 5 entries of this type in a table.
Every 30 seconds, I fetch those 5 entities mentioned above. I DO NOT USE MEMCACHE FOR THESE!!!
That means 5 * 18 = 90 read operations per fetch, right?
In 1 minute that means 180, and in 1 hour 10,800 read operations, which is ~20% of the daily quota...
However, after my test had run for 1 hour, I noticed on my online dashboard that only 2% of the read-operation quota had been used. My question is: why? Where is the flaw in my calculations?
Also, where in the online dashboard can I see how many read/write operations an entity needs?
Thanks
A write of your entity may need 18 write operations, but a get on your entity will cost you only 1 read.
So if you get 5 entries every 30 seconds for one hour, you'll have about 5 reads × 120 = 600 reads.
That is the case if you do a get on your 5 entries (fetching each entry by its id).
If you make a query to fetch them, the cost is "1 read + 1 read per entity retrieved", i.e. 6 reads per fetch, so around 720 reads in one hour.
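Written out, under the classic GAE pricing model this answer assumes (a get() costs 1 read; a query costs 1 read plus 1 read per entity returned):

```python
# Rough read-op arithmetic for the scenario above.
# Assumption: classic GAE Datastore pricing, where a get() is 1 read
# and a query is 1 read + 1 read per entity returned.

fetches_per_hour = 3600 // 30    # one fetch every 30 seconds -> 120
entities = 5

get_reads = entities * fetches_per_hour            # 5 * 120 = 600 reads/hour
query_reads = (1 + entities) * fetches_per_hour    # 6 * 120 = 720 reads/hour

print(get_reads, query_reads)  # 600 720
```

Either way, the figure is far below the 10,800/hour the question computed, because the 18-operation cost applies to writes, not reads.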
For more detailed information, here is the documentation on estimating costs.
You can't see on the dashboard how many write/read operations an entity needs, but I suggest checking Appstats for that.

GAE Datastore vs MongoDB by Price

I need a NoSQL database for writing continuous log data: approximately 100 writes per second, where each record contains 3 columns and is less than 1 KB. Reads are needed only once a day, after which I can delete all of that day's data. But I can't decide which is the cheaper solution: Google App Engine with the Datastore, or Heroku with MongoLab?
I can give you costs for GAE:
Taking the billing docs and assuming about 258M operations per month (86,400 seconds per day × 100 requests/s, accumulated over the month), this would cost you:
Writing: 258M records × ($0.2 / 100k ops) = $516 for writing unindexed data
Reading: 258M records × ($0.07 / 100k ops) = $180 for reading each record once
Deleting: 258M records × ($0.2 / 100k ops) = $516 for deleting unindexed data
Storage: 8.6M entities at 1 KB per day = 8.6 GB per day ≈ 240 GB accumulated per month, averaging ~120 GB
Storage cost: 120 GB × $0.12/GB = $15 / month
So your total operating cost on GAE would be about $1,300 per month. Note that using a structured database to write unstructured data is not optimal, and that is reflected in the price.
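The estimate above can be reproduced with a few lines of arithmetic (rates as quoted in this answer; they have since changed):

```python
# Monthly GAE cost sketch using the rates quoted in the answer above.
ops = 258e6                           # monthly operations, per the answer

write_cost = ops / 100_000 * 0.20     # $516.00  (unindexed writes)
read_cost = ops / 100_000 * 0.07      # $180.60  (each record read once)
delete_cost = ops / 100_000 * 0.20    # $516.00  (unindexed deletes)
storage_cost = 120 * 0.12             # $14.40 for ~120 GB average storage

total = write_cost + read_cost + delete_cost + storage_cost
print(round(total))  # 1227 -- "about $1,300" once rounded up
```

Operations dominate the bill; storage is almost negligible, which is why batching writes (or absorbing them in memcache, as the other answer suggests) changes the economics so dramatically.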
With App Engine, it is recommended that you use memcache for operations like this, and memcache does not incur database charges. With Python 2.7 and ndb, memcache is used automatically, and you will get at most 1 database write per second.
At current billing:
6 cents per day for reads/writes.
Less than $1 per day for storage

Large amount of timecourses in database

I have a rather large amount of data (~400 million datapoints) organized into a set of ~100,000 timecourses. This data may change every day and, for revision-safety reasons, has to be archived daily.
Obviously this is far too much data to handle naively, so I did some analysis on sample data. Approximately 60-80% of the courses do not change at all between two days, and for the rest only a very limited number of elements change. All in all, I expect far fewer than 10 million datapoints to change.
The question is: how do I make use of this knowledge? I am aware of concepts like the delta trees used by SVN and similar techniques, but I would prefer the database itself to handle such semantic compression. We are using Oracle 11g for storage, and the question is whether there is a better way than a homebrew solution.
Clarification
I am talking about timecourses representing hourly energy currents. Such a timecourse might start in the past (say, 2005), contains 8,760 elements per year, and might end any time up to 2020 (currently). Each timecourse is identified by a unique string.
The courses themselves are more or less boring:
"Course_XXX: 1.1.2005 0:00 5; 1.1.2005 1:00 5;1.1.2005 2:00 7,5;..."
My task is to make day-to-day changes in these courses visible, and to do so a snapshot has to be taken each day at a given time. My hope is that some lossless semantic compression will spare me from archiving ~20 GB per day.
Basically my source data looks like this:
Key | Value0 | ... | Value23
To archive that data I need to add an additional dimension that directly or indirectly tells me the time at which the data was loaded from the source system, so my archive database is
Key | LoadID | Value0 | ... | Value23
where LoadID is more or less the time the source DB was accessed.
Now, compression in my scenario is easy. LoadIDs grow with each run, and I can store a range, i.e.
Key | LoadID1 | LoadID2 | Value0 | ... | Value23
where LoadID1 gives me the ID of the first load at which the 24 values were observed and LoadID2 gives me the ID of the last consecutive load at which the same 24 values were observed.
In my scenario, this reduces the amount of data stored in the database to about 1/30th.
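This [LoadID1, LoadID2] range scheme is essentially run-length encoding across loads. A sketch of the merge logic (the data layout is hypothetical, with 3 values standing in for the 24 columns):

```python
def compress_loads(rows):
    """Collapse consecutive loads with identical values into ranges.

    rows: (key, load_id, values) tuples sorted by key, then load_id,
    where values stands for the (Value0, ..., Value23) tuple.
    Returns (key, load_id1, load_id2, values) range tuples.
    """
    runs = []
    for key, load_id, values in rows:
        last = runs[-1] if runs else None
        if (last and last[0] == key and last[3] == values
                and load_id == last[2] + 1):
            # Same key, same values, consecutive load: extend the run.
            runs[-1] = (key, last[1], load_id, values)
        else:
            runs.append((key, load_id, load_id, values))
    return runs

rows = [("Course_A", 1, (5, 5, 7.5)),
        ("Course_A", 2, (5, 5, 7.5)),   # unchanged -> merged into run 1..2
        ("Course_A", 3, (5, 6, 7.5))]   # changed   -> new run 3..3
print(compress_loads(rows))
# [('Course_A', 1, 2, (5, 5, 7.5)), ('Course_A', 3, 3, (5, 6, 7.5))]
```

In the database itself, the equivalent daily step is a MERGE: close the open range (set LoadID2) for rows whose values changed, and insert a new range row for them, leaving unchanged courses untouched.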

What is a viable local database for Windows Phone 7 right now?

I was wondering what a viable database solution for local storage on Windows Phone 7 is right now. Using search I stumbled upon the two threads below, but they are a few months old, so I was wondering whether there have been new developments in databases for WP7. I also didn't find any reviews of the databases mentioned in those links.
windows phone 7 database
Local Sql database support for Windows phone 7
My requirements are:
It should be free for commercial use
Saving/updating a record should only save the actual record and not the entire database (unlike WinPhone7 DB)
Able to quickly query a table with ~1000 records using LINQ.
Should also work in simulator
EDIT:
Just tried Sterling using a simple test app: It looks good, but I have 2 issues.
Creating 1000 records takes 30 seconds using db.Save(myPerson). Person is a simple class with 5 properties.
Then I discovered there is a db.SaveAsync<Person>(IList) method. This is fine because it doesn't block the current thread anymore.
BUT my question is: is it safe to call db.Flush() immediately and query the IList that is currently being saved? (It takes up to 30 seconds to save the records in synchronous mode.) Or do I have to wait until the BackgroundWorker has finished saving?
Querying these 1000 records with LINQ and a where clause takes up to 14 seconds to load into memory the first time.
Is there a way to speed this up?
Here are some benchmark results: (Unit tests was executed on a HTC Trophy)
-----------------------------
purging: 7,59 sec
creating 1000 records: 0,006 sec
saving 1000 records: 32,374 sec
flushing 1000 records: 0,07 sec
-----------------------------
//async
creating 1000 records: 0,04 sec
saving 1000 records: 0,004 sec
flushing 1000 records: 0 sec
-----------------------------
//get all keys
persons list count = 1000 (0,007)
-----------------------------
//get all persons with a where clause
persons list with query count = 26 (14,241)
-----------------------------
//update 1 property of 1 record + save
persons list with query count = 26 (0,003s)
db saved (0,072s)
You might want to take a look at Sterling - it should address most of your concerns and is very flexible.
http://sterling.codeplex.com/
(Full disclosure: my project)
Try Siaqodb. It is a commercial project and, unlike Sterling, it does not serialize objects and keep everything in memory for querying. Siaqodb can be queried through a LINQ provider that can efficiently pull even just field values from the database without creating any objects in memory, or load/construct only the objects that were requested.
Perst is free for non-commercial use.
You might also want to try Ninja Database Pro. It looks like it has more features than Sterling.
http://www.kellermansoftware.com/p-43-ninja-database-pro.aspx
