How to load test Server Sent Events? - benchmarking

I have a small app that sends Server Sent Events. I would like to load test it so I can benchmark the latency from the time a message is pushed to the time it is received, and find out when/where the performance breaks down. What tools are available for this?

Since Server-Sent Events is just HTTP, you can use the siege utility. Here is an example:
siege -b -t 1m -c45 http://127.0.0.1:9292/streaming
Where:
-b benchmark mode i.e. don't wait between connections
-t 1m benchmark for 1 minute
-c45 number of concurrent connections
http://127.0.0.1:9292 my dev server host and custom port
/streaming the HTTP endpoint which responds with Content-Type: text/event-stream
Output:
Lifting the server siege... done.
Transactions: 79 hits
Availability: 100.00 %
Elapsed time: 59.87 secs
Data transferred: 0.01 MB
Response time: 23.43 secs
Transaction rate: 1.32 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 30.91
Successful transactions: 79
Failed transactions: 0
Longest transaction: 30.12
Shortest transaction: 10.04

I took the simple path of creating a shell script that starts N background cURL jobs, each connected to the SSE endpoint of my service.
To get the exact cURL syntax, open your Chrome web dev tools -> Network tab -> right click on the entry of the request to the SSE endpoint and choose from the context menu "Copy as cURL"
Then you paste that command in a shell script that roughly looks like:
#!/bin/bash
i=0
while [ $i -lt 50 ]; do
  [PASTE YOUR cURL COMMAND HERE] -s -o /dev/null &
  i=$((i + 1))
done
This adds 50 background cURL jobs each time it is run. Notice that I appended the parameters -s -o /dev/null to Chrome's cURL command; they run cURL in silent mode and discard the output.
In my case the service was implemented in Node.js, so I used process.hrtime() for high-precision timing to measure how long it takes to loop through the N connected clients and broadcast the data.
The results were OK: broadcasting to 1000+ active connections took ~0.02 sec.
Keep in mind that if you run the server and the cURL clients on the same machine, you'll probably hit the OS limit on open files. To see the open-file limit on your Linux box (commonly 1024), run:
$ ulimit -n
To reach the 1000+ active cURLs I used without hitting this limit, you can:
start them from multiple machines
or increase the limit (see ulimit/sysctl and the sketch below)
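A minimal sketch of checking and raising those limits on a typical Linux machine; the numbers and the user name loadtester are placeholders, not values from the original setup:
#!/bin/bash
# Show the current per-process limit on open file descriptors.
ulimit -n
# Show the system-wide maximum number of open files.
cat /proc/sys/fs/file-max
# Raise the soft limit for this shell session (cannot exceed the hard limit).
ulimit -n 65536
# Raise the system-wide maximum until the next reboot.
sudo sysctl -w fs.file-max=200000
# To make the per-user limit permanent, add lines like these to
# /etc/security/limits.conf and open a new session:
#   loadtester  soft  nofile  65536
#   loadtester  hard  nofile  65536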
The problem I faced was that eventually Node crashed with an ELIFECYCLE error, and the log was not very helpful in diagnosing the problem. Any suggestions are welcome.

Related

Once the threads complete, the countdown timer keeps running.. why is that?

I used 100 users for my testing; all of them completed and it shows 100/100 in the upper right-hand corner, but the timer keeps running. I don't know why that is. Can you please explain why this happens? When I use 5 users, the timer stops once they complete. Please check the attachment.
There is no attachment; however, I can try to guess the possible reason.
When you start JMeter the following beautiful warning is displayed:
================================================================================
Don't use GUI mode for load testing !, only for Test creation and Test debugging.
For load testing, use CLI Mode (was NON GUI):
jmeter -n -t [jmx file] -l [results file] -e -o [Path to web report folder]
& increase Java Heap to meet your test requirements:
Modify current env variable HEAP="-Xms1g -Xmx1g -XX:MaxMetaspaceSize=256m" in the jmeter batch file
Check : https://jmeter.apache.org/usermanual/best-practices.html
================================================================================
So my expectation is that 100 users is more than the JMeter GUI can handle, so consider running your JMeter test in command-line non-GUI mode and it should end normally/gracefully. If it doesn't, make sure to follow JMeter Best Practices.
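For example, a non-GUI run might look like this (the file and folder names are placeholders for your own test plan):
jmeter -n -t my_test_plan.jmx -l results.jtl -e -o report_folder
Here -n runs without the GUI, -l writes the results file, and -e -o generate the HTML dashboard report at the end of the test.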

Share storage bucket between apps

I have an internal tool that lets me edit configuration files and then the config files gets synced to Google Storage (* * * * * gsutil -m rsync -d /data/www/config_files/ gs://my-site.appspot.com/configs/).
How can I use these config files across multiple instances in Google App Engine? (I don't want to use the Google PHP SDK to read / write to the config files in the bucket).
The only thing I can come up with is a cron.yaml file that downloads the configs from the bucket to /app/configs/ every minute, but then I'd have to reload php-fpm every minute as well.
app.yaml:
runtime: custom
env: flex
service: my-site
env_variables:
  CONFIG_DIR: /app/configs
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 20
  cpu_utilization:
    target_utilization: 0.5
Dockerfile:
FROM eu.gcr.io/google-appengine/php71
RUN mkdir -p /app;
ADD . /app
RUN chmod -R a+r /app
I am assuming you are designing a solution where the applications pull from the GCS bucket to get their configuration, so you can update them en masse quickly.
There are many points in the process, depending on your exact flow, where you can insert a "please update now" command. For example, why can't you simply queue a task as you update the configuration in your GCS bucket? That task would basically download the configuration and redeploy your application.
That is, unless you are thinking about multiple applications that have access to that bucket and you want to be able to update them all centrally at the same time. In that case, your cron job solution makes sense. Dan's suggestion definitely works, but I think you can make it easier by using version numbers. Simply keep another file with a version number in it; the cron job pulls that file, compares it, and performs an update only if the version is newer. It's very similar to Dan's solution except you don't really need to hash anything. If you are updating GCS with your configurations, you might as well tag on another file with the version information.
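A rough sketch of that cron job, assuming you add a VERSION file next to the configs in the bucket (the bucket path matches the question; the script and file names are made up):
#!/usr/bin/env bash
# check_config_version.sh - run from cron, e.g. every minute.
BUCKET="gs://my-site.appspot.com/configs"
LOCAL_VERSION="/app/configs/VERSION"
REMOTE_VERSION="/tmp/remote_config_version"
# Fetch only the small version file first.
gsutil cp "${BUCKET}/VERSION" "${REMOTE_VERSION}"
# Sync the whole config directory only when the version changed (or is missing locally).
if ! cmp -s "${REMOTE_VERSION}" "${LOCAL_VERSION}"; then
  gsutil -m rsync -c -r "${BUCKET}" /app/configs
  # Reload PHP-FPM so the new configs are picked up.
  pgrep php-fpm | xargs kill -s USR2
fi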
Another solution is to expose a handler in all those applications, for example an "/update" handler. Whenever it's hit, the application performs the update. You can hit that handler whenever you actually update the configuration in your GCS bucket. This is more of a push solution. The advantage is that you have more control over which applications get the update, which might be useful if you aren't sure about a certain configuration yet and don't want to roll it out everywhere at once.
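A sketch of the push side, assuming each application exposes such an /update handler (the URLs and the endpoint name are hypothetical):
#!/usr/bin/env bash
# notify_apps.sh - run right after uploading new configs to the bucket.
APPS="https://app-one.example.com https://app-two.example.com"
for app in ${APPS}; do
  # Each app re-syncs its configuration from GCS when this endpoint is hit.
  curl -fsS -X POST "${app}/update" || echo "update failed for ${app}" >&2
done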
We did not want to add a handler in our application for this. We thought it was best to use supervisord.
additional-supervisord.conf:
[program:sync-configs]
command = /app/scripts/sync_configs.sh
startsecs = 0
autorestart = false
startretries = 1
sync_configs.sh:
#!/usr/bin/env bash
while true; do
  # Sync configs from Google Storage.
  gsutil -m rsync -c -r ${CONFIG_BUCKET} /app/config
  # Reload PHP-FPM.
  pgrep php-fpm | xargs kill -s USR2
  # Wait 60 seconds.
  sleep 60
done
Dockerfile:
COPY additional-supervisord.conf /etc/supervisor/conf.d/

Always getting message on startup: "Still not seeing Solr listening on 8983 after 180 seconds"

Version: Solr 6.3
OS: CentOs 7.3
After installation, when running service solr restart, I always get the same message after 180 seconds, before the INFO messages print out.
$ service solr restart
Archiving 1 old GC log files to /var/solr/logs/archived
Archiving 1 console log files to /var/solr/logs/archived
Rotating solr logs, keeping a max of 9 generations
Waiting up to 180 seconds to see Solr running on port 8983 [-] Still not seeing Solr listening on 8983 after 180 seconds!
What's weird is that the Solr server comes up and is accessible via the web interface almost immediately; however, the full 180 seconds are spent waiting, only to throw that message each time. What causes this message, and how can I get Solr recognized as running sooner?
Thanks!
This looks like either Solr not running on that port, or it listening on a specific interface while the checker script uses the default (localhost?) one. Can you run the check with debug output, or check the definitions in the startup script?
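A quick way to check both, assuming a standard service install (paths may differ on your system):
# See whether anything is listening on 8983 and on which interface.
ss -tlnp | grep 8983
# Check which host/port the service configuration expects.
grep -E 'SOLR_HOST|SOLR_PORT' /etc/default/solr.in.sh
# Ask Solr itself for its status.
/opt/solr/bin/solr status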
I was getting the same error message trying to start solr: "Still not seeing Solr listening on 8983 after 180 seconds!". However, I couldn't access solr's web interface either. Checking the log files in /var/log/solr I read the following error message:
java.nio.file.AccessDeniedException: /tmp/start_6692986047430088693.properties
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.createFile(Files.java:632)
at java.nio.file.TempFileHelper.create(TempFileHelper.java:138)
at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161)
at java.nio.file.Files.createTempFile(Files.java:897)
at org.eclipse.jetty.start.StartArgs.getMainArgs(StartArgs.java:596)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:205)
at org.eclipse.jetty.start.Main.start(Main.java:458)
at org.eclipse.jetty.start.Main.main(Main.java:76)
The problem was that I was inside a FreeBSD jail that had unusual permissions set on the /tmp directory (also on /var/tmp). Fixing the permissions on these directories solved the problem:
# chmod 1777 /tmp /var/tmp
I realize the cause of your problem is probably different. But since the error message is the same, I thought it could be useful to add this solution here.

Starting ServerAgent in a JMeter Thread Group from a batch file (called from a Beanshell sampler)

I've created a JMeter script that I will later run on Jenkins.
First, in my script I add a Thread Group with 1 user and 1 loop, and a Beanshell Sampler that starts the ServerAgent so I can later collect PerfMon metrics from the server.
In Beanshell Sampler I have added:
Runtime.getRuntime().exec("C:/Windows/System32/cmd.exe /c D:/AppServer/AppServer.bat");
Inside the batch file (AppServer.bat) I have the following:
plink -v -ssh <username>@<host> -pw <password> -m D:\AppServer\commands.sh > D:\AppServer\outputlog.txt
and in commands.sh file I have:
#!/bin/bash
cd /ServerAgent-2.2.1
./startAgent.sh --udp-port 0 --sysinfo --auto-shutdown
The second Thread Group (multiple users, looping forever for 10 min) contains the recorded test steps, and in the third Thread Group (1 user, 1 loop) I have only the PerfMon listener.
Problem
When I run the test, it seems like all Thread Groups start at the same time, there is not enough time to connect to the server, and so the PerfMon metrics are not displayed (Connection refused: connect).
What I've tried
I tried timers, a setUp Thread Group, and setting a delay on the second and third Thread Groups, but I wasn't able to find a solution. The bat file is working: I tested it with the other Thread Groups disabled and I connect to the server successfully.
I also tried a different approach: I split the first Thread Group into one .jmx test and the other two Thread Groups into another test. In Jenkins I have two Build Steps.
In the first Build step I have Execute Windows Batch Command:
jmeter -n -t D:/SearchOrder/AppServerStart.jmx
Then I've added another Build Step in Jenkins to invoke an Ant build file that starts the test script with the PerfMon listener.
The Console Output:
...
Executing test plan: D:\SearchOrder\AppServerStart.jmx ==> D:\SearchOrder\OrderSearch.jtl
Created the tree successfully using D:\SearchOrder\AppServerStart.jmx
Starting the test @ Wed Dec 03 04:52:25 PST 2014 (1417611145595)
Waiting for possible shutdown message on port 4445
Tidying up ... @ Wed Dec 03 04:52:25 PST 2014 (1417611145847)
... end of run
Executing test plan: D:\SearchOrder\OrderSearch.jmx ==> D:\SearchOrder\OrderSearch.jtl
Created the tree successfully using D:\SearchOrder\OrderSearch.jmx
Starting the test @ Wed Dec 03 04:52:27 PST 2014 (1417611147067)
Waiting for possible shutdown message on port 4445
...
The builds start one right after the other, and the server does not come up in the ~2 seconds between the two build steps.
Does anyone have similar experience or some advice on how to resolve this?
Are you sure you can connect to the ServerAgent?
Try
telnet localhost <portNumber>
A successful connection can be checked by keying in test; the response should be Yep. Or by querying for metrics:
metrics:cpu
The output should look something like:
13.451776649746192
11.688311688311689
12.071156289707751
16.64564943253468

Nagios: CRITICAL - Socket timeout after 10 seconds

I've been running nagios for about two years, but recently this problem started appearing with one of my services.
I'm getting
CRITICAL - Socket timeout after 10 seconds
for a check_http -H my.host.com -f follow -u /abc/def check, which used to work fine. No other services are reporting this problem. The remote site is up and healthy, and I can do a wget http://my.host.com/abc/def from the Nagios server, and it downloads the response just fine. Also, doing a check_http -H my.host.com -f follow works just fine, i.e. it's only when I use the -u argument that things break. I also tried passing it a different user agent string, with no difference. I tried increasing the timeout, with no luck. I tried with -v, but all I get is:
GET /abc/def HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host: my.host.com
CRITICAL - Socket timeout after 10 seconds
... which does not tell me what's going wrong.
Any ideas how I could resolve this?
Thanks!
Try using the -N option of check_http.
I ran into similar problems, and in my case the web server didn't terminate the connection after sending the response (https was working, http wasn't). check_http tries to read from the open socket until the server closes the connection. If that doesn't happen then the timeout occurs.
The -N option tells check_http to receive only the header, but not the content of the page / document.
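Applied to the check from the question, that would be something like (the plugin path is an assumption and may differ on your system):
/usr/lib/nagios/plugins/check_http -H my.host.com -f follow -u /abc/def -N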
I tracked my issue down to a problem with the security providers configured in the most recent version of OpenSUSE.
From a summary of other web pages, it appears to be an issue with an attempt to use the TLSv2 protocol, which does not appear to work correctly, or is missing something in the default configuration to allow it to work.
To overcome the problem I commented out the security provider in question from the JRE security configuration file.
#security.provider.10=sun.security.pkcs11.SunPKCS11
The security.provider. value may be different in your configuration, but essentially the SunPKCS11 provider is at issue.
This configuration is normally found in
$JAVA_HOME/lib/security/java.security
of the JRE that you are using.
Fixed with this URL in nrpe.cfg (on Debian 6.0 Squeeze using nagios-nrpe-server):
command[check_http]=/usr/lib/nagios/plugins/check_http -H localhost -p 8080 -N -u /login?from=%2F
For whoever is interested, I stumbled into this problem too, and it ended up being caused by mod_itk on the web server.
A patch is available, although it seems it's not included in the current CentOS or Debian packages:
https://lists.err.no/pipermail/mpm-itk/2015-September/000925.html
In my case the /etc/postfix/main.cf file was not configured correctly.
My mail relay host was not defined, and the restrictions were too strict.
I had to add:
relayhost = mailrelay.ext.example.com
smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
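After editing main.cf, Postfix has to pick up the change, for example:
postfix reload
# or: service postfix reload / systemctl reload postfix, depending on your init system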
