I am investigating different structures for our database, which is expected to contain millions of files. I have narrowed it down to two different models; one of which is 4 times faster and uses 3 times less CPU, but uses 4 times more IO reads than the other.
So what is more expensive in both money and server bottlenecks, considering we are planning to host it in either Amazon or Azure cloud, IO or CPU?
It totally depends on the type of IO device and the size of the virtualized instance used. In a cloud hosted environment the real hardware specs are totally abstracted into marketing terms like EC2 Compute Unit. The only real way to know is to spin up in all environments and load test. Anything else is just a plain old guess.
Just want to add one more variable - Memory.
High memory instances can dramatically reduce the IOPS / CPU requirements.
For example, a MongoDB instance that holds most of its working set in memory makes hardly any I/O calls.
And I agree with jeremyjjbrown - test, test, test.
Your KPIs would be transactions (R/W) per second and transactions per dollar.
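A minimal sketch of how those two numbers could be computed from a load test run. The function name, its parameters, and the idea of pricing the run by an hourly instance rate are assumptions made for illustration; they are not taken from either cloud provider's billing model.

```python
def throughput_kpis(transactions: int, elapsed_seconds: float, hourly_cost_usd: float) -> dict:
    """Return transactions per second and transactions per dollar for one test run.

    hourly_cost_usd is a hypothetical all-in hourly price for the instance
    (compute plus storage); plug in whatever your own bill shows.
    """
    if elapsed_seconds <= 0 or hourly_cost_usd <= 0:
        raise ValueError("elapsed_seconds and hourly_cost_usd must be positive")
    run_cost_usd = hourly_cost_usd * elapsed_seconds / 3600.0
    return {
        "tps": transactions / elapsed_seconds,
        "tx_per_dollar": transactions / run_cost_usd,
    }


if __name__ == "__main__":
    # Example: 36,000 transactions in one hour on a $2/hour instance.
    print(throughput_kpis(36_000, 3600, 2.0))
```

Running the same test against each candidate model and instance type gives directly comparable figures.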
I have a GAE standard Python app that does some fairly computational processing. I need to complete the processing within the 60 second request time limit, and ideally I'd like to do it faster for a better user experience.
Splitting the work across multiple threads doesn't seem to be a good solution because the threads would likely run on the same CPU and thus wouldn't give a speedup.
I was wondering if Google Cloud Functions (GCF) could be used in a similar manner as threads. For example, if I create a GCF to do the processing, split my work into 10 chunks, and make 10 GCF calls in parallel, can I expect to get results 10x faster? (aside from latency and GCF startup costs)
Each function invocation runs in its own server instance, and a function will scale up to 1000 instances to handle concurrent requests in parallel. So yes, you can do this, if you are willing to potentially pay the cold start cost of each server instance as it's allocated for its first request.
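The fan-out pattern described above can be sketched roughly as follows. This is only an illustration: `process_chunk` is a hypothetical local stand-in for the HTTP request you would send to your deployed function, and the chunking helper is an assumption about how the work might be divided. Threads are acceptable on the calling side here because each one would mostly wait on the network while the real computation runs remotely.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces of nearly equal size."""
    size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical stand-in for one remote function invocation.

    In a real app this would be an HTTP call to the function's URL.
    """
    return sum(x * x for x in chunk)


def fan_out(items, n_chunks=10):
    """Send every chunk in parallel and combine the partial results."""
    chunks = split_into_chunks(items, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(fan_out(list(range(100))))
```

The wall-clock time is then bounded by the slowest chunk plus request latency and any cold start, so the speedup will generally be somewhat less than the chunk count.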
If you're able to split the workload into smaller chunks that you'd launch in parallel via separate (external) requests, I suspect you'd get better performance (and cost) by using GAE itself (maybe in a separate service) instead of CFs:
GAE standard environment instances can have higher CPU speeds - a B8 instance has 4.8 GHz, the max CF CPU speed is 2.4 GHz
you have better control over the GAE scaling configuration and starting time penalties
I suspect networking delays would be at least the same if not better on GAE - not going to another product infra (unsure though)
GAE costs would likely be smaller since you pay per instance hours (regardless of how many requests the instance handles) not per request/invocations
We are planning our new EBS structure on amazon to get the best performance out of SQL Server. During the process some doubts appeared:
1 - Using the Amazon calculator (http://calculator.s3.amazonaws.com/index.html) we got the costs below:
General purpose (SSD) - 1000GB - 3000 IOPS = $184,30
Provisioned IOPS (SSD) - 1000GB - 3000 IOPS = $511,00
That is a huge difference per month for the same performance (???). I'm aware of the "IOPS burst implementation" on General purpose SSD, but according to the documentation:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
When the volume size is 1000 GB the burst duration is "infinite" (Always 3000 IOPS).
Is it safe to say that the performance of the two disks above is exactly the same?
2 - We need about 1700 GB for 100 databases, what layout should we use?
Options:
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and distribute the workload between the two.
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and put them together with RAID 0? (We will be able to have 6000 IOPS burst, but should I be worried about EBS faults?)
Get four disks (GP SSD) with 1000GB (3000 IOPS) each and use RAID 10? (Is it necessary with EBS?)
Please share your suggestions; I will be glad to hear them.
From Amazon support, hope this helps!
Greetings
The disk cost question is easy enough to answer. General purpose (SSD) and Provisioned IOPS (SSD) use similar technology. Side by side they can achieve the same speeds, the only difference being that the GP2 maximum speed is 3000 IOPS and the PIOPS maximum is 4000 IOPS, per volume. One reason PIOPS is much more expensive is that you also pay for the number of IO you use, whereas with GP2 there is no per-IO cost.
As for the design of the 1700GB datastore, there are 2 main factors. Redundancy and Performance. And of course cost is a big factor. To provide proper guidance here we would need to know what your actual needs are going to be then we could suggest some solutions. However, there are a couple of main RAID levels etc that match what you suggested that we can talk about.
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and distribute the workload between the two.
No RAID. I take it you mean just having some databases on one volume and some on the other? This, to me, is actually fine. All I would do in addition is back up the DBs to some other locally attached EBS volumes. This would be for workloads no greater than 3000 IO (reads and writes combined). It's also easily expandable: just add more disks.
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and put them together with RAID 0? (We will be able to have 6000 IOPS burst, but should I be worried about EBS faults?)
RAID 0. All you have done here is make things twice as fast. But lose one disk and you lose everything. Again, if you are happy to restore from backup if a disk fails, this is a fast, cheap config for up to 6000 IO. Not easily expandable.
Get four disks (GP SSD) with 1000GB (3000 IOPS) each and use RAID 10? (Is it necessary with EBS?)
RAID 5, 6, and 10. These are all faster and more redundant. Arguably, RAID 10 is the best config for database, and probably the right config for you. With 1700 GB of data, if things go wrong there will be lots and lots of unhappy people.
Any suggestions?
Have you considered Amazon RDS? RDS has lots of advantages. We do all the heavy lifting, including multi AZ deployments, and RDS can expand vertically (CPU) and horizontally (Space) as your needs grow.
http://aws.amazon.com/rds/details/
The other thing to consider with GP2 is that you 'might' not need to provision 1TB volumes. You probably do not need the 3000 IO 'infinity' burst model. Let's say you do want to run at 3000 IO all the time. Why not provision 5 x 200GB volumes, where each volume has 3 IO per GB? So 5 x 200 x 3 = 3000 IO baseline. Put the 5 volumes in RAID 5 (for example) and you should get around 3000 IO all day long, and never run out of credit if you don't go over that (and IO is equally distributed).
However, those volumes can each burst to 3000 IO for 30 minutes continuously before you get rate limited to 600 IO per volume, which is still 3000 IO in total. So in this config you can burst to 15,000 IO at any time, and when you do get limited you still have the 3000 IO you predicted you need. Just don't run at over 3000 for longer than needed or you'll have no burst left.
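The arithmetic above can be written out as a small sketch. The constants are the figures quoted in this answer (3 IO per GB baseline, 3000 IO burst per volume); current EBS limits may differ, and the function name is made up for illustration. RAID parity overhead is deliberately ignored.

```python
GP2_IOPS_PER_GB = 3     # baseline rate quoted in the answer above
GP2_BURST_IOPS = 3000   # per-volume burst ceiling quoted in the answer above


def gp2_array_iops(volumes: int, size_gb: int) -> dict:
    """Aggregate baseline and burst IOPS for identical GP2 volumes.

    Ignores RAID overhead and assumes I/O is spread evenly across volumes.
    """
    baseline_per_volume = size_gb * GP2_IOPS_PER_GB
    return {
        "baseline_per_volume": baseline_per_volume,
        "baseline_total": volumes * baseline_per_volume,
        "burst_total": volumes * max(GP2_BURST_IOPS, baseline_per_volume),
    }


if __name__ == "__main__":
    print(gp2_array_iops(5, 200))   # the 5 x 200GB layout
    print(gp2_array_iops(2, 1000))  # the 2 x 1000GB layout from the question
```

Comparing layouts this way shows why several small volumes can offer the same sustained rate with much more burst headroom.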
Neat, huh? I think it is worthwhile to call or chat in to discuss your actual needs and answer any questions. Ultimately, though, you will need to test and benchmark whichever design you decide to go with, as talking about things and actual results will always differ! I imagine you guys are quite advanced, but here is a great example benchmark if you want to do some simple tests on various designs to help you decide what is best.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_piops.html
What could be the reasons for Redis slow work/response?
For example, I found on Stack Overflow that storing large files or data in Redis makes it slow. What else?
There is no simple answer to this question. With all NoSQL or SQL based storage solutions, there are plenty of conditions that could result in high latency or slowness of the storage engine. Redis is no exception.
I would suggest starting by reading:
How fast is Redis?
Redis latency problems troubleshooting
Here is a non exhaustive list of potential reasons:
Inadequate hardware (network, memory, CPU)
Software based virtualization (Xen on low-end hardware for instance)
Not enough memory, generating swapping at the OS level
Too many O(n) operations (like KEYS) executed in the single-threaded engine
Large objects stored in Redis, leading to uncontrolled expansion of the communication buffers
Huge number of simultaneous sessions (>30000)
Too many connection operations per second (Redis is not a webserver, connections are supposed to be permanent, not transient).
Too many roundtrips generated by the client application (no pipelining or aggregated command usage)
Large fork operations generated by bgsave or AOF rewrite (especially on VMs)
I/O related latencies when AOF is used
Accumulation of many expire operations triggered at the same time
Accumulation of memory in client and master/slave communication buffers, or slow log data
TCP incast conditions when network bandwidth consumption is significant
Using distributed storage (and especially cloudy ones such as EC2 EBS) to store dump or AOF files
There are probably many other reasons, related to the workload generated by your own application.
If some people think about other general reasons, we can add them to this list.
As was mentioned, opening many new connections (more than 200 per minute) could cause slowness. A possible solution is to add a proxy that keeps a constant number of connections:
twemproxy
envoy
We're working on an application that's going to serve thousands of users daily (90% of them will be active during the working hours, using the system constantly during their workday). The main purpose of the system is to query multiple databases and combine the information from the databases into a single response to the user. Depending on the user input, our query load could be around 500 queries per second for a system with 1000 users. 80% of those queries are read queries.
Now, I did some profiling using the SQL Server Profiler tool and I get on average ~300 logical reads for the read queries (I did not bother with the write queries yet). That would amount to 150k logical reads per second for 1k users. Full production system is expected to have ~10k users.
How do I estimate actual read requirement on the storage for those databases? I am pretty sure that actual physical reads will amount to much less than that, but how do I estimate that? Of course, I can't do an actual run in the production environment as the production environment is not there yet, and I need to tell the hardware guys how much IOPS we're going to need for the system so that they know what to buy.
I tried the HP sizing tool suggested in the previous answers, but it only suggests HP products, without actual performance estimates. Any insight is appreciated.
EDIT: Main read-only dataset (where most of the queries will go) is a couple of gigs (order of magnitude 4gigs) on the disk. This will probably significantly affect the logical vs physical reads. Any insight how to get this ratio?
Disk I/O demand varies tremendously based on many factors, including:
How much data is already in RAM
Structure of your schema (indexes, row width, data types, triggers, etc)
Nature of your queries (joins, multiple single-row vs. row range, etc)
Data access methodology (ORM vs. set-oriented, single command vs. batching)
Ratio of reads vs. writes
Disk (database, table, index) fragmentation status
Use of SSDs vs. rotating media
For those reasons, the best way to estimate production disk load is usually by building a small prototype and benchmarking it. Use a copy of production data if you can; otherwise, use a data generation tool to build a similarly sized DB.
With the sample data in place, build a simple benchmark app that produces a mix of the types of queries you're expecting. Scale memory size if you need to.
Measure the results with Windows performance counters. The most useful stats are for the Physical Disk: time per transfer, transfers per second, queue depth, etc.
You can then apply some heuristics (also known as "experience") to those results and extrapolate them to a first-cut estimate for production I/O requirements.
If you absolutely can't build a prototype, then it's possible to make some educated guesses based on initial measurements, but it still takes work. For starters, turn on statistics:
SET STATISTICS IO ON
Before you run a test query, clear the RAM cache:
CHECKPOINT
DBCC DROPCLEANBUFFERS
Then, run your query, and look at physical reads + read-ahead reads to see the physical disk I/O demand. Repeat in some mix without clearing the RAM cache first to get an idea of how much caching will help.
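If you collect the messages that SET STATISTICS IO prints, the counters can be totalled with a short script. This is a sketch that assumes the commonly seen message form ("Table 'X'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, ..."); newer SQL Server versions add further counters, and the function name is made up for illustration. The "lob" counters are deliberately excluded from the totals.

```python
import re

# Counters of interest; the negative lookbehind below skips the "lob ..." variants.
_COUNTERS = ("logical reads", "physical reads", "read-ahead reads")


def summarize_statistics_io(messages: str) -> dict:
    """Sum the non-LOB read counters over all tables in the pasted messages."""
    totals = {}
    for name in _COUNTERS:
        pattern = r"(?<!lob )" + re.escape(name) + r" (\d+)"
        totals[name] = sum(int(value) for value in re.findall(pattern, messages))
    # Physical disk demand as described above: physical reads + read-ahead reads.
    totals["physical demand"] = totals["physical reads"] + totals["read-ahead reads"]
    return totals


if __name__ == "__main__":
    sample = (
        "Table 'Orders'. Scan count 1, logical reads 300, physical reads 12, "
        "read-ahead reads 40, lob logical reads 5, lob physical reads 1, "
        "lob read-ahead reads 2.\n"
        "Table 'Customers'. Scan count 1, logical reads 20, physical reads 3, "
        "read-ahead reads 0, lob logical reads 0, lob physical reads 0, "
        "lob read-ahead reads 0."
    )
    print(summarize_statistics_io(sample))
```

Comparing the totals from a cold-cache run with those from a warm-cache run gives a rough picture of how much caching helps.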
Having said that, I would recommend against using IOPS alone as a target. I realize that SAN vendors and IT managers seem to love IOPS, but they are a very misleading measure of disk subsystem performance. As an example, there can be a 40:1 difference in deliverable IOPS when you switch from sequential I/O to random.
You certainly cannot derive your estimates from logical reads. This counter really is not that helpful because it is often unclear how much of it is physical and also the CPU cost of each of these accesses is unknown. I do not look at this number at all.
You need to gather virtual file stats which will show you the physical IO. For example: http://sqlserverio.com/2011/02/08/gather-virtual-file-statistics-using-t-sql-tsql2sday-15/
Google for "virtual file stats sql server".
Please note that you can only extrapolate IOs from the user count if you assume that cache hit ratio of the buffer pool will stay the same. Estimating this is much harder. Basically you need to estimate the working set of pages you will have under full load.
If you can ensure that your buffer pool can always take all hot data you can basically live without any reads. Then you only have to scale writes (for example with an SSD drive).
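To make the extrapolation caveat concrete, here is a minimal sketch of the linear scaling it refers to. It assumes, as stated above, that the buffer pool hit ratio stays constant as users are added, which is exactly the part that is hard to guarantee; the function and parameter names are illustrative only, and the measured figure in the example is made up.

```python
def extrapolate_physical_iops(measured_iops: float, measured_users: int, target_users: int) -> float:
    """Scale measured physical IOPS linearly with user count.

    Only valid under the assumption that the cache hit ratio does not
    change between the measured load and the target load.
    """
    if measured_users <= 0 or target_users < 0:
        raise ValueError("user counts must be positive")
    return measured_iops * target_users / measured_users


if __name__ == "__main__":
    # Hypothetical measurement: 400 physical reads/s observed with 1,000 users.
    print(extrapolate_physical_iops(400, 1_000, 10_000))
```

If the working set outgrows RAM at the higher load, the real figure will be higher than this estimate; if all hot data fits in the buffer pool, it may be far lower.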
We're being asked to spec out production database hardware for an ASP.NET web application that hasn't been built yet.
The specs we need to determine are:
Database CPU
Database I/O
Database RAM
Here are the metrics I'm currently looking at:
Estimated number of future hits to website - based on current IIS logs.
Estimated worst-case peak loads to website.
Estimated number of DB queries per page, on average.
Number of servers in web farm that will be hitting database.
Cache polling traffic from database (using SqlCacheDependency).
Estimated data cache misses.
Estimated number of daily database transactions.
Maximum acceptable page render time.
Any other metrics we should be taking into account?
Also, once we have all those metrics in place, how do they translate into hardware requirements?
What I have been doing lately for server planning is using some free tools that HP provides, which are collectively referred to as the "server sizers". These are great tools because they figure out the optimal type of RAID to use and the correct number of disk spindles to handle the load (very important when planning for a good DB server), as well as memory, processor, etc. I've provided the link below; I hope this helps.
http://h71019.www7.hp.com/ActiveAnswers/cache/70729-0-0-225-121.html?jumpid=reg_R1002_USEN
What I am missing is a measure for the needed / required / defined level of reliability.
While you could probably spec out a big honking machine to handle all the load, depending on your reliability requirements, you might rather want to invest in smaller but multiple machines, and in safer disk subsystems (RAID 5).
Marc
In my opinion, estimating hardware for an application that hasn't been built and designed yet is more of a political issue than a scientific issue. By the time you finish the project, current hardware capability and their price, functional requirements, expected number of concurrent users, external systems and all other things will change and this change is beyond your control.
However, this question comes up very often, since you need to put numbers in a proposal or provide a report to your manager. If it is a proposal, what you are trying to accomplish is to come up with a spec that can support the proposed software system. The only trick is to propose a system that will not raise your cost to the point of hurting your competitiveness, while not putting yourself at risk of a low-performance system.
If you can characterize your current workload in terms of hits to pages, then you can then:
1) calculate the typical type of query that will be done for each page
2) using the above 2 pieces of information, estimate the workload on the database server
You also need to determine your performance requirements - what is the max and average response time you want for your website?
Given the workload, and performance requirements, you can then calculate capacity. The best way to make this estimate is to use some existing hardware, run a simulated database workload on a database on that hardware, and then extrapolate your hardware requirements based on your data from the first steps.
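The first two steps could be sketched like this. The page names, rates, and the optional cache-miss factor are made-up illustrations, not figures from the question; the result is only the input to the benchmark, not a hardware spec.

```python
def db_queries_per_second(page_hits_per_second: dict, queries_per_page: dict,
                          cache_miss_ratio: float = 1.0) -> float:
    """Estimate database queries per second from per-page hit rates.

    page_hits_per_second: hits/s for each page (e.g. derived from IIS logs).
    queries_per_page: typical number of DB queries each page issues.
    cache_miss_ratio: fraction of queries that actually reach the database
    (1.0 means no application-level caching).
    """
    if not 0.0 <= cache_miss_ratio <= 1.0:
        raise ValueError("cache_miss_ratio must be between 0 and 1")
    total = 0.0
    for page, hits in page_hits_per_second.items():
        total += hits * queries_per_page[page] * cache_miss_ratio
    return total


if __name__ == "__main__":
    hits = {"home": 50, "search": 20}      # hypothetical peak hits per second
    queries = {"home": 2, "search": 5}     # hypothetical queries per page
    print(db_queries_per_second(hits, queries))
    print(db_queries_per_second(hits, queries, cache_miss_ratio=0.5))
```

Feeding the estimated query rate into a simulated workload on existing hardware then gives the measurements to extrapolate from.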