How to scale up Datomic transactor for throughput? - datomic

I want to test how much throughput I can possibly get out of Datomic, the bottleneck being its transactor, which does not scale out.
With the rather simple transactions that I used, I got a TPS rate of x (I can't tell you the number due to Datomic's DeWitt clause, but to give you an idea, it's not less than 5590.92 and not more than 1,000,000,000), which I would like to boost, say by a factor of 10, so that transaction spikes do not lead to seconds of latency.
I gave my transactor (virtual) machine 12 cores, of which it only used y (again, I won't tell you the number, but it's somewhere between 5 and 12).
The Datomic peer creating the transactions sat on another virtual machine.
I tried both the dev storage and a remote in-memory h2 server, and the TPS figures were about the same, which is why I believe the transactor is the bottleneck, not storage. And the Datomic peer can easily generate many more transactions (in its single thread) when I just comment out the conn.transactAsync(data) call, so the bottleneck is not in the peer either.
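For reference, the peer-side load generator is essentially a loop of asynchronous transactions. Below is a minimal sketch of that kind of loop using the Datomic peer API in Java; the URI, database name, and the transaction data are placeholders for illustration, not my actual test data:

    import datomic.Connection;
    import datomic.Peer;
    import datomic.Util;

    import java.util.List;
    import java.util.concurrent.Future;

    public class TransactorLoadTest {
        public static void main(String[] args) throws Exception {
            // Placeholder URI for dev storage; adjust host, port and db name to your setup.
            String uri = "datomic:dev://localhost:4334/throughput-test";
            Peer.createDatabase(uri);
            Connection conn = Peer.connect(uri);

            int txCount = 10_000;
            long start = System.currentTimeMillis();
            Future<?> last = null;
            for (int i = 0; i < txCount; i++) {
                // One tiny assertion per transaction; transactAsync queues the work
                // for the transactor without blocking the peer thread.
                List<?> tx = Util.list(Util.map(
                        Util.read(":db/id"), Peer.tempid(":db.part/user"),
                        Util.read(":db/doc"), "tx " + i));
                last = conn.transactAsync(tx);
            }
            last.get(); // transactions are applied in submission order, so waiting on the last is enough
            long elapsedMs = System.currentTimeMillis() - start;
            System.out.printf("%d transactions in %d ms (~%.0f TPS)%n",
                    txCount, elapsedMs, txCount * 1000.0 / elapsedMs);
        }
    }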
Can I make Datomic's transactor use more cores or in some other way scale up the transactor? For my use case it would be fine even if I could only get a temporary boost to handle transaction spikes just for a few seconds - the high TPS rate does not need to be maintained for more than say 5 seconds.
Transactor settings:
memory-index-threshold=32m
memory-index-max=1g
object-cache-max=1g
Changing memory-index-max should not have much of an impact, as the TPS rate in my tests is relatively stable. (Once memory-index-max is used up, back-pressure should throttle the TPS rate, but that is not what I'm seeing.)
Transactor started with:
bin/transactor -Xms6g -Xmx6g config/transactor.properties

Related

SQL Server - EBS Storage Design

We are planning our new EBS structure on Amazon to get the best performance out of SQL Server. During the process, some questions came up:
1 - Using the Amazon calculator (http://calculator.s3.amazonaws.com/index.html) we got the costs below:
General purpose (SSD) - 1000GB - 3000 IOPS = $184.30
Provisioned IOPS (SSD) - 1000GB - 3000 IOPS = $511.00
That is a huge monthly price difference for the same performance (???). I'm aware of the "IOPS burst implementation" on General Purpose SSD, but according to the documentation:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
When the volume size is 1000 GB the burst duration is "infinite" (Always 3000 IOPS).
Is it safe to say that the performance of the two disks above is exactly the same?
2 - We need about 1700 GB for 100 databases; what layout should we use?
Options:
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and distribute the workload between the two.
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and put them together in RAID 0? (We would be able to get a 6000 IOPS burst, but should I be worried about EBS faults?)
Get four disks (GP SSD) with 1000GB (3000 IOPS) each and use RAID 10? (Is that necessary with EBS?)
Please give your suggestions; I will be glad to hear them.
This is the reply from Amazon support; hope it helps!
Greetings,
The disk cost question is easy enough to answer. General Purpose (SSD) and Provisioned IOPS (SSD) use similar technology. Side by side they can achieve the same speeds; the only difference is that GP2 maxes out at 3,000 IOPS per volume while PIOPS goes up to 4,000. One reason PIOPS is much more expensive is that you also pay for the IOPS you provision, whereas with GP2 there is no separate per-IOPS charge.
As for the design of the 1700GB datastore, there are two main factors: redundancy and performance. And of course cost is a big factor. To provide proper guidance here we would need to know what your actual needs are going to be; then we could suggest some solutions. However, there are a couple of main RAID levels matching what you suggested that we can talk about.
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and distribute the workload between the two.
No RAID. I take it you mean just having some databases on one volume and some on the other? To me this is actually fine. All I would do in addition is back up the DBs to some other locally attached EBS volumes. This would be for workloads no greater than 3000 IOPS (reads and writes combined). It's also easily expandable - just add more disks.
Get two disks (GP SSD) with 1000GB (3000 IOPS) each and put them together in RAID 0? (We would be able to get a 6000 IOPS burst, but should I be worried about EBS faults?)
RAID 0. All you have done here is make things twice as fast, but lose one disk and you lose everything. Again, if you are happy to restore from backup when a disk fails, this is a fast, cheap config for up to 6000 IOPS. It is not easily expandable.
Get four disks (GP SSD) with 1000GB (3000 IOPS) each and use RAID 10? (Is that necessary with EBS?)
RAID 5, 6, and 10. These are all faster and more redundant. Arguably, RAID 10 is the best config for databases, and probably the right config for you. With 1700 GB of data, if things go wrong there will be lots and lots of unhappy people.
Any suggestions?
Have you considered Amazon RDS? RDS has lots of advantages. We do all the heavy lifting, including multi-AZ deployments, and RDS can expand vertically (CPU) and horizontally (space) as your needs grow.
http://aws.amazon.com/rds/details/
The other thing to consider with GP2 is that you might not need to provision 1TB volumes. You probably do not need the 3000 IOPS "infinite" burst model. Let's say you do want to run at 3000 IOPS all the time. Why not provision 5 x 200GB volumes, where each volume has a baseline of 3 IOPS per GB, so 5 x 200 x 3 = 3000 IOPS baseline? Put the 5 volumes in RAID 5 (for example) and you should get around 3000 IOPS all day long, and never run out of burst credit as long as you don't go over that (and the IO is equally distributed).
However, each of those volumes can also burst to 3000 IOPS for roughly 30 minutes continuously before it is rate-limited back to its 600 IOPS baseline, which is still 3000 IOPS in total. So in this config you can burst to 15,000 IOPS at any time, and when you do get limited you still have the 3000 IOPS you predicted you need. Just don't run at over 3000 for longer than needed or you'll have no burst credit left.
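To make the burst arithmetic concrete, here is a small, purely illustrative sketch of the documented gp2 credit model (3 IOPS per GB baseline, a 5.4 million I/O credit bucket per volume, burst up to 3,000 IOPS); for a 200 GB volume it lands in the same ballpark as the 30-minute figure above.

    public class Gp2BurstMath {
        // Documented gp2 model (at the time of writing): 3 IOPS/GB baseline,
        // a 5.4 million I/O credit bucket per volume, burstable up to 3,000 IOPS.
        static final double IOPS_PER_GB = 3.0;
        static final double CREDIT_BUCKET = 5_400_000.0;
        static final double BURST_IOPS = 3_000.0;

        public static void main(String[] args) {
            int volumeGb = 200;                        // one of the five 200 GB volumes
            double baseline = volumeGb * IOPS_PER_GB;  // 600 IOPS
            // While bursting at 3,000 IOPS you drain credits at (burst - baseline) per second.
            double burstSeconds = CREDIT_BUCKET / (BURST_IOPS - baseline);
            System.out.printf("Baseline: %.0f IOPS, a full burst lasts ~%.0f minutes%n",
                    baseline, burstSeconds / 60.0);
            // Five such volumes striped together: 3,000 IOPS baseline, 15,000 IOPS burst.
            System.out.printf("5 volumes: %.0f IOPS baseline, %.0f IOPS burst%n",
                    5 * baseline, 5 * BURST_IOPS);
        }
    }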
Neat, huh? I think it is worthwhile to call or chat in to discuss your actual needs and answer any questions. Ultimately, though, you will need to test and benchmark whichever design you decide to go with, as theory and actual results will always differ! I imagine you guys are quite advanced, but here is a great example benchmark if you want to do some simple tests on various designs to help you decide what is best.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_piops.html

Why is the CPU not maxed out?

I have an application that I'd like to make more efficient - it isn't taxing any one resource enough that I can identify it as a bottleneck, so perhaps the app is doing something that is preventing full efficiency.
The application pulls data from a database on one SQL Server instance, does some manipulation on it, then writes it to a database on another SQL Server instance - all on one machine. It doesn't do anything in parallel.
While the app is running (it can take several hours), none of the 4 CPU cores are maxed out (they hover around 40-60% utilization each), the disks are almost idle and very little RAM is used.
Reported values:
Target SQL Server instance: ~10% CPU utilization, 1.3GB RAM
Source SQL Server instance: ~10% CPU utilization, 300MB RAM
Application: ~6% CPU utilization, 45MB RAM
All the work is happening on one disk, which writes around 100KB/s during the operation, on average. 'Active time' according to Task Manager is usually 0%, occasionally flickering up to between 1 and 5% for a second or so. Average response time, again according to Task Manager, moves between 0ms and 20ms, mainly showing between 0.5 and 2ms.
Databases are notorious for IO limitations. Now, seriously, as you say:
The application pulls data from a database on one SQL Server instance,
does some manipulation on it, then writes it to a database on another
SQL Server instance - all on one machine.
I somehow get the idea this is an end-user-level machine, maybe a workstation. Your linear code (a bad idea for full utilization, by the way, as you never run all 3 parts - read, process, write - in parallel) will be seriously limited by whatever IO subsystem you have.
But that will not come into play as long as you can state:
It doesn't do anything in parallel.
What it needs to do is work in parallel:
One task is reading the next data
One task does the data processing
One task does the data writing
You can definitely max out a lot more than your 4 cores. The last time I did something like that (read / manipulate / write) we were maxing out 48 cores with around 96 or so processing threads running in parallel (and a smaller number doing the writes). But the core requirement is that your application must actually start using multiple CPUs - a minimal sketch of such a pipeline follows below.
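Here is a hypothetical sketch of that three-stage split using standard Java concurrency primitives; the row type, thread counts, and the placeholder reader/transformer/writer bodies are illustrative only, not part of the original application:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class EtlPipeline {
        public static void main(String[] args) {
            // Bounded queues provide back-pressure between the three stages.
            BlockingQueue<String> toProcess = new ArrayBlockingQueue<>(1_000);
            BlockingQueue<String> toWrite = new ArrayBlockingQueue<>(1_000);
            ExecutorService pool = Executors.newFixedThreadPool(6);

            // Stage 1: one reader task pulls rows from the source database (placeholder loop).
            pool.submit((Callable<Void>) () -> {
                for (int i = 0; i < 100_000; i++) {
                    toProcess.put("row-" + i);          // replace with a real SELECT
                }
                return null;
            });

            // Stage 2: several workers do the CPU-bound manipulation in parallel.
            for (int w = 0; w < 4; w++) {
                pool.submit((Callable<Void>) () -> {
                    while (true) {
                        String row = toProcess.take();
                        toWrite.put(row.toUpperCase()); // replace with the real transformation
                    }
                });
            }

            // Stage 3: one writer task drains results into the target database (placeholder).
            // The worker tasks run until the process is stopped; this is only a sketch.
            pool.submit((Callable<Void>) () -> {
                while (true) {
                    String processed = toWrite.take();  // replace with a batched INSERT
                    System.out.println("wrote " + processed);
                }
            });
        }
    }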
If you do not parallelize:
You will only max out one core at most,
You basically waste time waiting for the databases on both ends. While you wait for data to be read or committed, you are not processing anything.
;) And once you fix that you will get IO problems. Promised.
I recommend reading How to analyse SQL Server performance. You need to capture and analyze the wait stats. These will tell you what the execution is doing that prevents it from maxing out the CPU. You already have a feeling that the workload is causing the SQL engine to wait rather than run, but only after you understand the wait stats will you be able to get a feel for what it is waiting on. Follow the linked article for specific analysis techniques.
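As a concrete starting point, the server-wide wait statistics live in the sys.dm_os_wait_stats DMV. Here is a small, hypothetical JDBC sketch that dumps the top waits; the connection string is a placeholder, and in practice you would filter out benign idle waits as the linked article describes.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TopWaits {
        public static void main(String[] args) throws Exception {
            // Placeholder connection string; use your actual instance and credentials.
            String url = "jdbc:sqlserver://localhost;databaseName=master;user=sa;password=...";
            // Note: idle/background wait types (SLEEP_*, BROKER_*, ...) will dominate
            // and should be filtered out for a real analysis.
            String sql = "SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count "
                       + "FROM sys.dm_os_wait_stats "
                       + "ORDER BY wait_time_ms DESC";
            try (Connection con = DriverManager.getConnection(url);
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.printf("%-40s %12d ms %10d waits%n",
                            rs.getString("wait_type"),
                            rs.getLong("wait_time_ms"),
                            rs.getLong("waiting_tasks_count"));
                }
            }
        }
    }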

What is more expensive for databases, IO or CPU?

I am investigating different structures for our database, which is expected to contain millions of files. I have narrowed it down to two different models, one of which is 4 times faster and uses 3 times less CPU, but performs 4 times more IO reads than the other.
So what is more expensive in both money and server bottlenecks, considering we are planning to host it in either Amazon or Azure cloud, IO or CPU?
It totally depends on the type of IO device and the size of the virtualized instance used. In a cloud hosted environment the real hardware specs are totally abstracted into marketing terms like EC2 Compute Unit. The only real way to know is to spin up in all environments and load test. Anything else is just a plain old guess.
Just want to add one more variable - memory.
High-memory instances can dramatically reduce the IOPS / CPU requirements.
For example, a MongoDB instance that holds most of its working set in memory hardly does any IO calls.
And I agree with jeremyjjbrown - test, test, test.
Your KPIs would be transactions (R/W) per second and transactions per dollar.
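Purely as an illustration of that second KPI (not real pricing or measurements), transactions per dollar is just measured throughput divided by the instance's hourly price:

    public class KpiMath {
        public static void main(String[] args) {
            // Hypothetical load-test results for two candidate designs/instance types.
            double tpsModelA = 1200.0;      // measured transactions per second
            double tpsModelB = 300.0;
            double hourlyCostA = 0.50;      // placeholder $/hour, not real pricing
            double hourlyCostB = 0.10;

            System.out.printf("Model A: %.0f tx/s, %.0f tx per dollar%n",
                    tpsModelA, tpsModelA * 3600 / hourlyCostA);
            System.out.printf("Model B: %.0f tx/s, %.0f tx per dollar%n",
                    tpsModelB, tpsModelB * 3600 / hourlyCostB);
        }
    }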

Optimizing Solr 4 on EC2 debian instance(s)

My Solr 4 instance is slow and I don't know why.
I am attempting to modify the configurations of JVM, Tomcat6 and Solr 4 in order
to optimize performance, with queries per second as the key metric.
Currently I am running on an EC2 small tier with Debian squeeze, but ready to switch to Ubuntu if needed.
There is nothing special about my use case. The index is small. Queries do include a moderate number of unions (e.g. 10), plus faceting, but I don't think that's unusual.
My understanding is that these areas could need tweaking:
Configuring the JVM Garbage collection schedule and memory allocation ("GC tuning is a precise art form", ref)
Other JVM settings
Solr's Query Result cache, Filter cache, Document cache settings
Solr's Auto-warming settings
There are a number of ways to monitor the performance of Solr:
SolrMeter
Sematext SPM
New Relic
But none of these methods indicate which settings need to be adjusted, and there's no guide that I know of that steps through an exhaustive list of settings that could possibly improve performance. I've reviewed the following pages (one, two, three, four), and gone through some rounds of trial and error so far without improvement.
Questions:
How do I tell the JVM to use all of the 2 GB of memory on the small EC2 instance?
How do I debug and optimize JVM garbage collection?
How do I know when I/O throttling, such as the new EBS IOPS pricing, is the issue?
Using figures like the New Relic examples below, how do I detect problematic behavior, and how should I approach solutions?
Answers:
I'm looking for links to good documentation for setting up and optimizing Solr 4 from a DevOps or server-admin perspective (not index or application design).
I'm looking for the top trouble spots in catalina.sh, solrconfig.xml, and solr.xml (others?) that are the most likely causes of problems.
Or any tips you think address the questions.
First, you should not focus on switching your Linux distribution. A different distribution might bring some changes, but considering the information you gave, nothing proves that those changes would be significant.
You are mentioning lots of possible areas for optimisation, which can be overwhelming. You should look into a tweaking area only once you have proven that the problem lies in that particular part of your stack.
JVM Heap Sizing
You can use the parameter -mx1700m (the short form of -Xmx1700m) to give a maximum of 1.7GB of RAM to the JVM. Hotspot might not need it, so don't be surprised if your heap usage does not reach that number.
You should set the minimum heap size to a low value, so that Hotspot can optimise its memory usage. For instance, to set a minimal heap size of 128MB, use -ms128m (the short form of -Xms128m).
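To check that the flags actually took effect (question 1 above), you can ask the JVM for its own heap limits. This trivial snippet is only illustrative; for the running Tomcat/Solr process the same numbers should show up under the JVM memory section of the Solr admin dashboard.

    public class HeapReport {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // maxMemory() reflects -Xmx / -mx; totalMemory() is the currently committed heap.
            System.out.printf("max heap:       %d MB%n", rt.maxMemory() / (1024 * 1024));
            System.out.printf("committed heap: %d MB%n", rt.totalMemory() / (1024 * 1024));
            System.out.printf("free in heap:   %d MB%n", rt.freeMemory() / (1024 * 1024));
        }
    }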
Garbage Collector
From what you say, you have limited hardware (a single core at 1.2GHz max, see this page):
M1 Small Instance
1.7 GiB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
...
One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2
GHz 2007 Opteron or 2007 Xeon processor
Therefore, using a low-latency GC (CMS) won't do any good. It won't be able to run concurrently with your application since you have only one core. You should switch to the throughput GC using -XX:+UseParallelGC -XX:+UseParallelOldGC.
Is the GC really a problem?
To answer that question, you need to turn on GC logging. It is the only way to see whether GC pauses are responsible for your application's response time. You can turn it on with -Xloggc:gc.log -XX:+PrintGCDetails.
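Besides reading gc.log, you can also sample the collectors from inside the JVM. This illustrative snippet uses the standard management beans to show how much wall-clock time has gone into GC since startup; for the running Tomcat/Solr process you would read the same MBeans remotely over JMX (e.g. with jconsole).

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcCheck {
        public static void main(String[] args) {
            long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
            long totalGcMs = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%-20s %6d collections, %8d ms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                totalGcMs += gc.getCollectionTime();
            }
            // If GC takes only a few percent of uptime, it is unlikely to be the problem.
            System.out.printf("GC time: %.2f%% of %d ms uptime%n",
                    100.0 * totalGcMs / uptimeMs, uptimeMs);
        }
    }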
But I don't think the problem lies here.
Is it a hardware problem?
To answer this question, you need to monitor resource utilization (disk I/O, network I/O, memory usage, CPU usage). You have a lot of tools to do that, including top, free, vmstat, iostat, mpstat, ifstat, ...
If you find that some of these resources are saturating, then you need a bigger EC2 instance.
Is it a software problem?
In your stats, the document cache hit rate and the filter cache hit rate are healthy. However, I think the query result cache hit rate is pretty low, which implies a lot of query executions.
You should monitor the query execution time. Depending on that value you may want to increase the cache size or tune the queries so that they take less time.
More links
JVM options reference : http://jvm-options.tech.xebia.fr/
A write-up I did of an application performance audit: http://www.pingtimeout.fr/2013/03/petclinic-performance-tuning-about.html
Hope that helps!

Reasons for Redis to slow down

What could be the reasons for Redis being slow to respond?
For example, I found on Stack Overflow that storing large files or large values in Redis makes it slow. What else?
There is no simple answer to this question. With all NoSQL or SQL based storage solutions, there are plenty of conditions that could result in high latency or slowness of the storage engine. Redis is no exception.
I would suggest starting by reading:
How fast is Redis?
Redis latency problems troubleshooting
Here is a non-exhaustive list of potential reasons:
Inadequate hardware (network, memory, CPU)
Software based virtualization (Xen on low-end hardware for instance)
Not enough memory, generating swapping at the OS level
Too many O(n) operations (like KEYS) executed in the single-threaded engine
Large objects stored in Redis, leading to uncontrolled expansion of the communication buffers
Huge number of simultaneous sessions (>30000)
Too many connection operations per second (Redis is not a webserver, connections are supposed to be permanent, not transient).
Too many roundtrips generated by the client application (no pipelining or aggregated commands; see the pipelining sketch after this list)
Large fork operations generated by bgsave or AOF rewrite (especially on VMs)
I/O related latencies when AOF is used
Accumulation of many expire operations triggered at the same time
Accumulation of memory in client and master/slave communication buffers, or slow log data
TCP incast conditions when network bandwidth consumption is significant
Using distributed storage (and especially cloud-based storage such as EC2 EBS) to store dump or AOF files
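To illustrate the roundtrip point from the list above, here is a small sketch assuming the Jedis client; the host, port, key names, and counts are placeholders. The first loop pays one network roundtrip per command, while the pipelined loop batches them.

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Pipeline;

    public class PipelineExample {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Without pipelining: one network roundtrip per command.
                for (int i = 0; i < 1000; i++) {
                    jedis.set("key:" + i, String.valueOf(i));
                }
                // With pipelining: commands are buffered and sent in batches,
                // so the 1000 SETs share a handful of roundtrips.
                Pipeline p = jedis.pipelined();
                for (int i = 0; i < 1000; i++) {
                    p.set("key:" + i, String.valueOf(i));
                }
                p.sync(); // flush the pipeline and wait for all replies
            }
        }
    }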
There are probably many other reasons, related to the workload generated by your own application.
If some people think about other general reasons, we can add them to this list.
As was mentioned, a high rate of new connections (> 200 per minute) can cause slowness. A possible solution is to add a proxy that maintains a constant pool of connections:
twemproxy
envoy
