By "allocation" I mean: for how long have I "assumed ownership" of the resource.
By "consumption" I mean: how much has the resource actually been used.
For example, with a single vCPU:
My App Engine instance runs for a whole month, 24/7 = 30 * 24 = 720h.
Average CPU usage is 60% for the whole month.
Do I pay a fixed price for 720h, or...
Do I pay a calculated price of 720h * actual usage (60%) = 432h?
Standard and flexible environment pricing differ from each other in the granularity of billing:
a few instance classes in the standard environment vs. a wider range of vCPU and memory combinations in the flexible environment
some (small) differences in uptime calculation and start/stop offsets
But fundamentally both are charged based on consumption (as in uptime, not effective CPU used!).
Scaling matters as well:
automatic and basic scaling shut down idle dynamic instances, more or less bringing the answer close to "consumption" for them.
manual scaling, as well as resident instances under automatic/basic scaling, is charged as "allocation", since those instances are always running.
Billing is based on clock time, not percentage used during that time period.
If your usage of a compute resource is 0%, you will still pay for it (assuming the compute resource is running and not shut down).
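To make the arithmetic above concrete, here is a minimal sketch in plain Python (the hourly rate is a made-up placeholder, not a real App Engine price) showing that the bill follows instance uptime rather than CPU utilisation:

# Hypothetical hourly rate for illustration only; check the current
# App Engine pricing page for real numbers.
HOURLY_RATE = 0.05           # USD per instance-hour (assumed)

hours_running = 30 * 24      # instance up 24/7 for a 30-day month = 720 h
avg_cpu_usage = 0.60         # 60% average CPU utilisation over the month

billed_hours = hours_running                                # billing follows clock time: 720 h
usage_scaled_hours = hours_running * avg_cpu_usage          # 432 h, but this is NOT what you pay

print("Billed instance-hours: %d -> $%.2f" % (billed_hours, billed_hours * HOURLY_RATE))
print("Usage-scaled hours (not billed this way): %d" % usage_scaled_hours)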
I'd like to know if a time-series database will crumble with this scenario:
I have tens of thousands of IoT devices, each sending 4 different values every 5 minutes.
I will query those values for each IoT, for certain time spans. My question is:
Is a tsdb approach feasible and scalable up to, e.g., a million IoTs, having metrics like:
iot.key1.value1
iot.key1.value2
iot.key1.value3
iot.key1.value4
iot.key2.value1
.
.
.
iot.key1000000.value4
? Or is that simply too many metrics?
The retention policy will be 2 years, with possible roll-ups maybe after (TBA) months. But as far as I know, this consideration only matters for disk size.
Right now I'm using Graphite.
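To put that in numbers, here is a rough back-of-the-envelope sketch (plain Python, figures taken from above):

# Back-of-the-envelope sizing for the scenario described above.
devices = 1000000          # upper bound from the question: a million IoT devices
values_per_device = 4      # value1..value4
interval_seconds = 5 * 60  # one reading per value every 5 minutes

total_series = devices * values_per_device
datapoints_per_second = total_series / float(interval_seconds)

print("distinct metric series: %d" % total_series)                           # 4,000,000
print("sustained write rate: ~%.0f datapoints/sec" % datapoints_per_second)  # ~13,333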
A reporting frequency of five minutes should be fairly manageable; just be sure to set your storage schema with five minutes as the smallest resolution, in order to save space, since you won't need to hold on to data at shorter periods.
With that said, scaling a Graphite cluster to meet your needs isn't easy, as Whisper isn't optimized for this. There are several resources/stories where others have shared their dismay trying to achieve this, for example: here and here.
There are other limitations to consider too: Whisper is built in such a way that it can record only one datapoint per timestamp, and the last datapoint received "wins". This might not be an issue for you now, but later down the road you might find that you need to increase the datapoint reporting frequency to get better insight into your data.
Therein lies the question: how can you get around that? Often, StatsD is the answer - it's an aggregator that takes your individual metrics over a defined period of time and churns out a histogram-like set of metrics with different statistical derivatives of your data (minimum, maximum, X-percentile, and so on). Suddenly you're faced with the prospect of managing a Graphite instance or cluster plus one (or more) StatsD services, and that's before you even get to the fun part of visualising your data: Grafana is often used here, and it too needs to be set up and maintained.
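To illustrate the kind of reduction StatsD performs (a hedged sketch of the idea, not StatsD's actual code), each flush interval collapses all raw datapoints for a metric into a handful of summary values:

# Sketch of a StatsD-style aggregation step for one metric over one flush interval.
def aggregate(raw_values, percentile=90):
    """Reduce every raw datapoint from one interval to summary statistics."""
    ordered = sorted(raw_values)
    idx = min(len(ordered) - 1, int(percentile / 100.0 * len(ordered)))
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": sum(ordered) / float(len(ordered)),
        "upper_%d" % percentile: ordered[idx],
    }

# e.g. all readings received for iot.key1.value1 during one flush window
print(aggregate([21.4, 21.6, 22.0, 21.9, 21.5]))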
Conversely, assuming you will maintain that reporting frequency, but increase the number of devices (as you mentioned), you might find another component of your Graphite stack - Carbon-relay - running into some bottlenecking issues (as described here).
I work at MetricFire, formerly Hosted Graphite, where we had a lot of these considerations in mind when building our product/service. Collectively we process millions of datapoints per second across hundreds of accounts. Data is rolled up and stored at four resolutions: 5-second, 30-second, 5-minute and 1-hour, with each resolution available for 24 hours, 3 days, six months and two years, respectively.
A key component of our set-up is that our storage is not built on the typical Whisper backend - instead we use a custom-built data store on top of Riak, which allows us to do things like scale easily and aggregate datapoints per metric into Data Views, to name a couple. That article about Data Views was written by one of our engineers and goes into some detail about the decisions we made when building our storage layer.
I want to calculate the page life expectancy of my SQL Server.
If I query the PLE with the following query, I get the value 46.000:
SELECT [object_name],
       [counter_name],
       [cntr_value]
FROM   sys.dm_os_performance_counters
WHERE  [object_name] LIKE '%Manager%'
       AND [counter_name] = 'Page life expectancy';
I don't think this is the final value because it is so high. Do I have to calculate the actual value with a specific formula?
Thanks
Although some counters reported by sys.dm_os_performance_counters are cumulative, PLE reflects the current value so no calculation is necessary.
Whether the value of 46 seconds is a cause for concern depends greatly on the workload and storage system. This value would be a concern on a high-volume OLTP system with local spinning-disk media, due to the multi-millisecond latency incurred for each physical IO and the roughly 200 IOPS per spindle. Conversely, the same workload with high-performance local SSD may be fine because the storage is capable of well over 100K IOPS.
I have to configure Graphite to hold data at 1-second resolution for important data. I'm not going to have a lot of it, but high resolution is important for me.
Yesterday I created around 400 metrics, each with 12 values, and the DB size increased to 300 GB.
pattern = .*
retentions = 1s:1800d
Am I missing something ?
Graphite's backing database, Whisper, uses flat files as its storage mechanism. As soon as a metric is created, it allocates all the space needed to store a continuous stream of data at your resolution for the entire retention period. This is because all time slots within an archive take up space whether or not a value is stored. If you don't add any more metrics, you needn't worry about your storage footprint expanding further.
See http://graphite.readthedocs.io/en/latest/whisper.html
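As a rough illustration of why the files are so large, here is a sketch of the per-metric file size for the schema above (assuming Whisper's roughly 12 bytes per stored point and ignoring the small header):

# Approximate Whisper file size for one metric with retentions = 1s:1800d.
BYTES_PER_POINT = 12                     # timestamp + value, approximately

resolution_seconds = 1                   # 1s
retention_seconds = 1800 * 24 * 3600     # 1800d

points = retention_seconds // resolution_seconds
size_bytes = points * BYTES_PER_POINT

print("points per file: %d" % points)                            # 155,520,000
print("approx. size per metric: %.2f GB" % (size_bytes / 1e9))   # ~1.87 GB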
Assume an app that collects real-time temperature data for various cities around the world every 10 minutes.
Using the following GAE datastore model,
class City(db.Model):
    name = db.StringProperty()

class DailyTempData(db.Model):
    date = db.DateProperty()
    temp_readings = db.ListProperty(float, indexed=False)  # appended every 10 minutes
and a cron.yaml like so,
cron:
- description: read temperature
  url: /cron/read_temps
  schedule: every 10 minutes
I am already hitting GAE's daily free quota for datastore writes, and I'm looking for ways to get around this problem.
I'm thinking of reducing my datastore writes by persisting the temperature data only at the end of each day, which will effectively reduce the daily write volume (for each city) from 144 times to 1.
One way to do this is to use memcache as a temporary scratchpad, but due to the possibility of random data evictions, I could well lose all my data for the day. (Aside question: from experience, how often does unplanned eviction really happen?)
Questions are as follows:
Is there such a memory/storage facility (persistent and guaranteed across cron jobs) that would allow me to reduce datastore writes as described?
If not, what could be some alternative solutions?
The only other requirement is that the temperature readings must be accessible (for serving to the client side) at any given time of day.
The only guaranteed storage is the datastore.
As to memcache evictions - it depends on what is going on in your app and in Google App Engine land; evictions could happen within a minute or two, or only after hours. In my App Engine instances I usually see the oldest items sitting at about 2 hours old. But it all depends, and you just can't rely on it.
Task queue payloads are about 10K.
You could just write a blob (containing all cities measured in the 10-minute interval), then reprocess it, unpick it, and write out the city details at the end of the day.
When you say clients must be able to access temperature readings, do you mean just the current one, or all the readings for the day?
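One way to sketch the task-queue idea (purely illustrative; the handler path and helper name are made up) is to hand each 10-minute batch of readings off as a task payload instead of writing it to the datastore immediately:

import json

from google.appengine.api import taskqueue

def record_readings(readings):
    """readings: dict of city name -> temperature for this 10-minute run."""
    # Defer persistence: the payload travels with the task (mind the size
    # limit mentioned above) and a worker can consolidate the batches and
    # write the city details out at the end of the day.
    taskqueue.add(url='/worker/store_readings',   # hypothetical worker handler
                  payload=json.dumps(readings))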
You could also change your model so that one large object is stored for each execution of the cron, rather than one per city.
For example, say the object is called Measures... A Measures entity would contain a list of all your measurements for the corresponding time. Store them as non-indexed properties and you should have no problems... and that's also just 144 writes a day.
For the reading part... use memcache to cache the Measures items, as a good usage pattern.
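A minimal sketch of that model change, using the same old db API as the question (the Measures name and its fields are illustrative, not a prescribed schema):

from google.appengine.ext import db

class Measures(db.Model):
    # One entity per cron execution: a single datastore write covers every
    # city's reading for that 10-minute slot (144 writes per day in total).
    taken_at = db.DateTimeProperty(auto_now_add=True)
    cities = db.StringListProperty(indexed=False)    # e.g. ['london', 'tokyo', ...]
    temps = db.ListProperty(float, indexed=False)    # readings, parallel to cities

def store_run(readings):
    """readings: dict of city name -> temperature for this cron run."""
    names = sorted(readings)
    Measures(cities=names,
             temps=[readings[n] for n in names]).put()   # one write per run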
How does the performance of db.get() compare with that of db.get_by_key_name()?
get_by_key_name must compute the keys based on app, model, name and parent, so it should consume a (tiny but nonzero) amount of CPU more than db.get, which needs no computation. However, I doubt you could measure the difference in elapsed time, since fetching from storage will vastly dominate in both cases.
For all intents and purposes they are equivalent.
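For reference, a quick sketch of the two equivalent calls in the old db API (the entity kind and key name are made up):

from google.appengine.ext import db

class City(db.Model):
    name = db.StringProperty()

# Fetch by key name: the key is built for you from the model kind and the name.
city_a = City.get_by_key_name('london')

# Fetch by key: you construct the same key explicitly and pass it to db.get().
city_b = db.get(db.Key.from_path('City', 'london'))

# Both return the same entity, or None if it doesn't exist.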