Datastore Read Operations Calculation - google-app-engine

I am currently running a test to estimate how much work my Google App Engine app can do without going over quota.
This is my test:
I have an entity in the datastore that, according to my local
dashboard, needs 18 write operations. I have 5 entries of this type
in a table.
Every 30 seconds, I fetch those 5 entities mentioned above. I DO
NOT USE MEMCACHE FOR THESE!!!
That means 5 * 18 = 90 read operations per fetch, right?
In 1 minute that means 180 read operations; in 1 hour that means 10,800, which is ~20% of the daily quota...
However, after 1 hour of running my test, I noticed on my online dashboard that only 2% of the read operations quota was used. Why is that? Where is the flaw in my calculations?
Also, where can I see in the online dashboard how many read/write operations an entity needs?
Thanks

A write on your entity may need 18 writes, but a get on your entity will cost you only 1 read.
So if you get your 5 entities every 30 seconds for one hour, you'll have about 5 reads * 120 fetches = 600 reads.
That is the case when you do a get on your 5 entities (fetching each entity by its id).
If you run a query to fetch them instead, the cost is 1 read for the query plus 1 read per entity retrieved, i.e. 1 + 5 = 6 reads per fetch, or around 720 reads in one hour.
For more details, here is the documentation for estimating costs.
You can't see on the dashboard how many read/write operations an entity needs, but I invite you to check Appstats for that.
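To make the arithmetic concrete, here is a small Python sketch of the cost model quoted above (a get costs 1 read per entity; a query costs 1 read plus 1 read per entity retrieved); the function names are just for illustration:

def reads_per_hour_get(num_entities, fetch_interval_s):
    # A get by id costs 1 read per entity, regardless of its write cost.
    fetches_per_hour = 3600 // fetch_interval_s
    return num_entities * fetches_per_hour

def reads_per_hour_query(num_entities, fetch_interval_s):
    # A query costs 1 read for the query itself plus 1 read per entity retrieved.
    fetches_per_hour = 3600 // fetch_interval_s
    return (1 + num_entities) * fetches_per_hour

print(reads_per_hour_get(5, 30))    # 600
print(reads_per_hour_query(5, 30))  # 720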

Related

Two questions about Time Travel storage costs in Snowflake

I have read the Snowflake documentation a lot. Snowflake incurs storage costs when data is updated.
"tables-storage-considerations.html" mentions that:
As an extreme example, consider a table with rows associated with
every micro-partition within the table (consisting of 200 GB of
physical storage). If every row is updated 20 times a day, the table
would consume the following storage:
Active 200 GB | Time Travel 4 TB | Fail-safe 28 TB | Total Storage 32.2 TB
The first question is: if a periodic task runs 20 times a day, and each run updates exactly one row in each micro-partition, will the table still consume 32.2 TB of total storage?
"data-time-travel.html" mentions that:
Once the defined period of time has elapsed, the data is moved into
Snowflake Fail-safe and these actions can no longer be performed.
So my second question is: why does Fail-safe cost 28 TB rather than 24 TB (i.e., with the Time Travel portion subtracted)?
https://docs.snowflake.com/en/user-guide/data-cdp-storage-costs.html
https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html
https://docs.snowflake.com/en/user-guide/data-time-travel.html
First question: yes. It's the fact that the micro-partition changes that matters, not how many rows within it change.
Question 2: Fail-safe holds 7 days of data: 4 TB x 7 = 28 TB.
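As a rough sketch of the arithmetic behind the documentation's example (assuming 1 day of Time Travel retention, the fixed 7-day Fail-safe period, and that every update rewrites every micro-partition):

active_tb = 0.2                                  # 200 GB of active storage
updates_per_day = 20
churn_per_day_tb = active_tb * updates_per_day   # ~4 TB of rewritten micro-partitions per day
time_travel_tb = churn_per_day_tb * 1            # 1 day of Time Travel retention
fail_safe_tb = churn_per_day_tb * 7              # 7 days of Fail-safe retention
total_tb = active_tb + time_travel_tb + fail_safe_tb
print(time_travel_tb, fail_safe_tb, total_tb)    # 4.0 28.0 32.2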

Merging different granularity time series in influxdb

I want to store trades as well as best ask/bid data, where the latter updates much more rapidly than the former, in InfluxDB.
I want to, if possible, use a schema that allows me to query: "for each trade on market X, find the best ask/bid on market Y whose timestamp is <= the timestamp of the trade".
(I'll use any version of Influx.)
For example, trades might look like this:
Time Price Volume Direction Market
00:01.000 100 5 1 foo-bar
00:03.000 99 50 0 bar-baz
00:03.050 99 25 0 foo-bar
00:04.000 101 15 1 bar-baz
And tick data might look more like this:
Time Ask Bid Market
00:00.763 100 99 bar-baz
00:01.010 101 99 foo-bar
00:01.012 101 98 bar-baz
00:01.012 101 99 foo-bar
00:01.238 100 99 bar-baz
...
00:03.021 101 98 bar-baz
I would want to be able to somehow join each trade for some market, e.g. foo-bar, with only the most recent ask/bid data point on some other market, e.g. bar-baz, and get a result like:
Time Trade Price Ask Bid
00:01.000 100 100 99
00:03.050 99 101 98
Such that I could compute the difference between the trade price on market foo-bar and the most recently quoted ask or bid on market bar-baz.
Right now, I store trades in one time series and ask/bid data points in another and merge them on the client side, with logic along the lines of:
def merge(trades, quotes, data_points):
    next_trade, more_trades = trades[0], trades[1:]
    # Advance while the next quote is still at or before the trade, so that
    # quotes[0] ends up being the most recent quote at or before the trade.
    while len(quotes) > 1 and quotes[1].timestamp <= next_trade.timestamp:
        quotes = quotes[1:]
    data_point = join(next_trade, quotes[0])  # join() pairs a trade with a quote
    if more_trades:
        return merge(more_trades, quotes, data_points + [data_point])
    return data_points + [data_point]
The problem is that the client has to discard tons of ask/bid data points because they update so frequently, and only the most recent update before the trade is relevant.
There are tens of markets whose most recent ask/bid I might want to compare a trade with, otherwise I'd simply store the most recent ask/bid in the same series as the trades.
Is it possible to do what I want to do with Influx, or with another time series database? An alternative solution that produces lower quality results is to group the ask/bid data by some time interval, say 250ms, and take the last from each interval, to at least impose an upper bound on the amount of quotes the client has to drop before finding the one that's closest to the next trade.
NB. Just a clarification on InfluxDB terminology: you're probably storing trade and tick data in different measurements (a measurement is analogous to a table). A series is a subdivision within a measurement based on tag values, e.g.
Time Ask Bid Market
00:00.763 100 99 bar-baz
is one series, and
Time Ask Bid Market
00:01.010 101 99 foo-bar
is another series (assuming you are storing the market name/id as a tag and not a field).
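For illustration, here is what writing one such tick could look like with the influxdb Python client, with Market as a tag (each distinct tag value creates its own series); the measurement and field names follow the queries below and are otherwise assumptions:

from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="markets")
client.write_points([{
    "measurement": "tick_data",
    "tags": {"Market": "bar-baz"},          # tag: part of the series key
    "time": "2019-01-01T00:00:00.763Z",
    "fields": {"ask": 100.0, "bid": 99.0},  # fields: the actual values, not indexed
}])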
Answer
InfluxQL https://docs.influxdata.com/influxdb/v1.7/query_language/spec/ - I can't think of a way to achieve what you need with InfluxQL (Influx Query Language), as it does not support joins.
Perhaps what you could do on the client side, instead of requesting all tick data for a period and discarding most of it, is make a request per trade and market to get exactly the ask/bid data point that you need (the most recent with respect to the trade). Something like:
def merge(trades, market):
    points = []
    for next_trade in trades:
        # Most recent ask/bid at or before the trade, looking back at most 1 minute.
        quote = db.query(
            "select last(ask), last(bid) from tick_data "
            "where time <= '%s' and Market = '%s' and time > '%s' - 1m"
            % (next_trade.timestamp, market, next_trade.timestamp))
        # or, to get one result per market with a single query:
        # quote_per_market = db.query(
        #     "select last(ask), last(bid) from tick_data "
        #     "where time <= '%s' group by Market" % next_trade.timestamp)
        points.append(join(next_trade, quote))
    return points
Of course you'd have the overhead of querying the database more frequently, but depending on the number of trades and your resource constraints it may be more efficient.
NB. A potential pitfall here is that the ask and bid retrieved this way are not fetched as a pair but independently, so although they come back together, they can have different timestamps. If for some timestamp you only have an ask or only a bid price, you might run into this problem. However, as long as you write them in pairs and have no missing data, it should be fine.
Flux https://www.influxdata.com/products/flux/ - Flux is a more sophisticated query language, included in InfluxDB 1.7 and 2.0, that allows you to do joins and operations across different measurements. I can't give you any examples yet, but it's worth having a look at.
Other (relational) time series DBs that you could have a look at, which would also allow you to do joins, are CrateDB https://crate.io/ and Postgres + TimescaleDB https://www.timescale.com/products
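As a sketch of what the "most recent quote at or before each trade" join could look like in Postgres/TimescaleDB, using a lateral subquery (the table and column names here are assumptions, not from the original post):

import psycopg2  # plain psycopg2 works for both Postgres and TimescaleDB

conn = psycopg2.connect("dbname=markets")
cur = conn.cursor()
cur.execute("""
    select t.time, t.price, q.ask, q.bid
    from trades t
    cross join lateral (
        select ask, bid
        from tick_data
        where market = %s and time <= t.time
        order by time desc
        limit 1
    ) q
    where t.market = %s
    order by t.time
""", ("bar-baz", "foo-bar"))
rows = cur.fetchall()  # one row per foo-bar trade, joined with the latest bar-baz quote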

Go - AppEngine - Performance

I have a Go app that receives JSON in a POST and stores it in the Datastore (App Engine).
The statistics for the first 24 hours:
40 entities were stored in the datastore (every entity is small, less than 1 KB: JSON with 7-10 fields).
7.20 instance hours consumed.
7 hours is much more than I expected. I expected to see 7 seconds or even 1 second.
Is that normal?
Instance hours measure how long your instances stay up, not how long your requests take. Since GAE shuts an instance down after it has been idle for about 15 minutes, if a request arrives roughly every 15 minutes you could consume at most 40 req * 15 min / 60 = 10 instance hours, so 7.2 instance hours is plausible.
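A quick sketch of that upper bound (assuming the ~15-minute idle timeout mentioned above):

requests = 40
idle_timeout_min = 15.0    # an instance stays up roughly this long after its last request
max_instance_hours = requests * idle_timeout_min / 60.0
print(max_instance_hours)  # 10.0 -> the observed 7.2 instance hours is within this bound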

GAE Datastore vs MongoDB by Price

I need a NoSQL database to write continuous log data, approx. 100 writes per second, where a single record contains 3 columns and is less than 1 KB. Reads are needed only once a day, after which I can delete all of that day's data. But I can't decide which is the cheaper solution: Google App Engine with the Datastore, or Heroku with MongoLab?
I can give you costs for GAE:
Taking the billing docs and assuming about 258M operations per month (86,400 seconds per day * 100 requests/s * 30 days), this would cost you:
Writing: 258M records * ($0.2 / 100k ops) = $516 for writing unindexed data
Reading: 258M records * ($0.07 / 100k ops) = $180 for reading each record once
Deleting: 258M records * ($0.2 / 100k ops) = $516 for deleting unindexed data
Storage: 8.6M entities at 1 KB per day = 8.6 GB per day = ~240 GB accumulated per month, averaging ~120 GB
Storage cost: 120 GB * $0.12/GB = $15 / month
So your total on GAE would be roughly $1,230 per month. Note that using a structured database for writing unstructured data is not optimal, and that is reflected in the price.
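The same estimate as a small Python sketch (the per-100k prices are the ones quoted above and may well have changed since):

writes_per_second = 100
ops = writes_per_second * 86400 * 30          # ~259M operations per month

write_cost = ops / 100e3 * 0.20               # unindexed writes at $0.20 per 100k ops
read_cost = ops / 100e3 * 0.07                # each record read once at $0.07 per 100k ops
delete_cost = ops / 100e3 * 0.20              # deletes billed like unindexed writes
storage_cost = 120 * 0.12                     # ~120 GB average at $0.12 per GB-month

print(write_cost + read_cost + delete_cost + storage_cost)  # ~1230 dollars per month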
With App Engine, it is recommended that you use memcache for operations like this, and memcache does not incur database charges. Using Python 2.7 and ndb, memcache is used automatically and you will get at most 1 database write per second.
At current billing:
6 cents per day for reads/writes.
Less than $1 per day for storage

How do I measure response time in seconds given the following benchmarking data?

We recently got some data back on a benchmarking test from a software vendor, and I think I'm missing something obvious.
If there were 17 transactions (I assume they mean successfully completed requests) per second, and 1500 of these requests could be served in 5 minutes, then how do I get the response time for a single user? Is this sort of thing even possible with benchmarking? I have a lot of other data from them, including Apache config settings, but I'm not sure how to do all the math.
Given the server setup they sent, I want to know how I can deduce the user response time. I have looked at other similar benchmarking tests, but I'm having trouble measuring requests to response time. What other data do I need to provide here to get that?
If only 1500 of these can be served per 5 minutes, then:
1500 / 5 = 300 transactions per minute can be served
300 / 60 = 5 transactions per second can be served
So how are they getting 17 completed transactions per second? Last time I checked, 5 < 17!
This doesn't seem to fit. Or am I looking at it wrongly?
I presume by user response time you mean the time it takes to serve a single transaction:
If they can serve 5 per second, then it takes 200 ms (1/5 s) per transaction
If they can serve 17 per second, then it takes 59 ms (1/17 s) per transaction
That is all we can tell from the given data. Perhaps ask them to clarify how many transactions are actually being completed per second.
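A quick sketch of that arithmetic, with the caveat (an assumption on my part, not stated in the benchmark) that latency = 1 / throughput only holds when transactions are served one at a time; with N concurrent workers the per-transaction latency would be roughly N / throughput:

def per_transaction_seconds(transactions, seconds, workers=1):
    # Throughput in transactions per second, inverted to get latency per transaction.
    throughput = transactions / float(seconds)
    return workers / throughput

print(per_transaction_seconds(1500, 5 * 60))  # 0.2   -> 200 ms at 5 tx/s
print(per_transaction_seconds(17, 1))         # 0.059 -> ~59 ms at 17 tx/s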
