Why is this JeroMQ (ZeroMQ port) benchmark so slow?

I would like to use this library I found; it's a pure Java port (not a wrapper) of ZeroMQ.
I am trying to test it, and while it claims some good numbers, the test I am performing gives rather poor results, even though it's performed locally (client and server on the same machine). I'm sure it's something I am doing wrong. It takes approx. 5 seconds to execute this 10,000-message loop.
All I did was take the Hello World example and remove the pause and the sysouts. Here is the code:
The Server:
package guide;

import org.jeromq.ZMQ;

public class hwserver {
    public static void main(String[] args) throws Exception {
        // Prepare our context and socket
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket socket = context.socket(ZMQ.REP);

        System.out.println("Binding hello world server");
        socket.bind("tcp://*:5555");

        while (true) {
            byte[] request = socket.recv(0);
            String replyString = "Hello";
            byte[] reply = replyString.getBytes();
            socket.send(reply, 0);
        }
    }
}
The Client:
package guide;

import org.jeromq.ZMQ;

public class hwclient {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket socket = context.socket(ZMQ.REQ);
        socket.connect("tcp://localhost:5555");
        System.out.println("Connecting to hello world server");

        long start = System.currentTimeMillis();
        for (int request_nbr = 0; request_nbr != 10_000; request_nbr++) {
            String requestString = "Hello";
            byte[] request = requestString.getBytes();
            socket.send(request, 0);
            byte[] reply = socket.recv(0);
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);

        socket.close();
        context.term();
    }
}
Is it possible to fix this code and get some decent numbers?

You're doing round-trip request-reply, and this will be just as slow using the C++ libzmq. You will only get fast performance from JeroMQ, ZeroMQ, or any I/O when you do streaming.
Round-tripping is slow due to how I/O and TCP work. With libzmq we can do about 20K messages/second round-tripping, and 8M/sec streaming. Streaming allows additional optimizations like batching, which you can't do with round-trip request-reply.
For a throughput performance test, send 10M messages from node 1 to node 2, then send back a single ACK when they have all arrived. Time that on ZeroMQ and on JeroMQ; you should see around a 3x difference in speed.
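For illustration, here is a minimal sketch of such a streaming test using PUSH/PULL sockets. This is a sketch only: it assumes the same org.jeromq.ZMQ API as the question; the class name, port, and message count are made up; and since sender and receiver run in one process, the single ACK is replaced by a join.

package guide;

import org.jeromq.ZMQ;

public class thrtest {
    public static void main(String[] args) throws Exception {
        final int COUNT = 1_000_000;    // number of messages to stream
        final ZMQ.Context context = ZMQ.context(1);

        // Receiver: pull messages as fast as they arrive, no replies
        Thread receiver = new Thread(() -> {
            ZMQ.Socket pull = context.socket(ZMQ.PULL);
            pull.bind("tcp://*:5556");
            for (int i = 0; i < COUNT; i++) {
                pull.recv(0);
            }
            pull.close();
        });
        receiver.start();

        // Sender: fire-and-forget, which lets the transport batch messages
        ZMQ.Socket push = context.socket(ZMQ.PUSH);
        push.connect("tcp://localhost:5556");
        byte[] msg = "Hello".getBytes();

        long start = System.currentTimeMillis();
        for (int i = 0; i < COUNT; i++) {
            push.send(msg, 0);
        }
        receiver.join(); // stand-in for the single ACK at the end
        long elapsed = System.currentTimeMillis() - start;

        System.out.println((COUNT * 1000L / elapsed) + " messages/second");
        push.close();
        context.term();
    }
}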

Please refer to the throughput test comparing synchronous and asynchronous round-trips at
https://github.com/zeromq/jeromq/blob/master/src/test/java/guide/tripping.java
The asynchronous round-trip was about 40x faster than the synchronous one.
If you want to benchmark the full speed of JeroMQ, please run perf.LocalThr and perf.RemoteThr in your environment.

Related

Send and receive a w3c.dom.Document over socket as byte[] Java

I send a document over a socket like this:
sendFXML(asByteArray(getRequiredScene(fetchSceneRequest())));

private void sendFXML(byte[] requiredFXML) throws IOException, TransformerException {
    dataOutputStream.write(requiredFXML);
    dataOutputStream.flush();
}

private Document getRequiredScene(String requiredFile) throws IOException, ParserConfigurationException, SAXException, TransformerException {
    return new XMLLocator().getDocumentOrReturnNull(requiredFile);
}

private String fetchSceneRequest() throws IOException, ClassNotFoundException {
    return dataInputStream.readUTF();
}
On the XMLLocator side it finds the correct document and parses it correctly; I can see this by printing the whole doc to the console.
But I cannot handle it on the client's side, where it's fetched by:
public static void receivePage() throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] data = new byte[989898];
    int bytesRead = -1;
    while ((bytesRead = dataInputStream.read(data)) != -1) { // stops here
        baos.write(data, 0, bytesRead);
    }
    Files.write(Paths.get(FILE_TO_RECEIVED), data);
}
After the first iteration of the while() loop it just stops at the commented line.
I don't know whether I have an error on the server side and am sending the doc in an incorrect format, or whether I'm reading the sent byte array incorrectly. Where is the problem?
Edit:
For debugging purposes, in the receivePage() method, I've chosen a different way of reading the byte array from the server:
int count = inputStream.available();
byte[] b = new byte[count];
int bytes = dataInputStream.read(b);
System.out.println(bytes);
for (byte by : b) {
    System.out.print((char) by);
}
And now I'm able to print the fetched FXML to the console, but a new problem has appeared.
Under the debugger it normally receives the byte[] from the server, prints 2024 for count and displays the content of the file, but if I run the app normally via Shift+F10 it fetches nothing and just prints 0 to the console.
Edit2:
For some reason, once again under the debugger, it's even able to write to a file:
for (byte by : b) {
    Files.write(Paths.get(FILE_TO_RECEIVED), b);
    System.out.print((char) by);
}
But when I try to return this FXML in debug and then show it like this:
Parent fxmlToShow = FXMLLoader.load(getClass().getResource("/network/gui.fxml"));
Scene childScene = new Scene(fxmlToShow);
Stage window = (Stage) ((Node) ae.getSource()).getScene().getWindow();
window.setScene(childScene);
return window;
It shows only previous files. On the first debug attempt it shows a blank page when I ask for the 1st one from the server. On the second debug attempt, when I ask for the 3rd page from the server, it shows me the previously asked one, and so on.
To me this seems absolutely insane, because the FXML file actually refreshes before the line
Parent fxmlToShow = FXMLLoader.load(getClass().getResource("/network/gui.fxml"));
is invoked.
Yeah, thanks everybody for participating.
So, the issue of incorrectly displayed FXML files was caused by an incorrect FILE_TO_RECEIVED path.
When FXMLLoader.load(getClass().getResource("/network/gui.fxml")); loads gui.fxml, it takes it not from D:\\JetBrains\\IdeaProjects\\Client\\src\\network\\gui.fxml, in my case, but from D:\\JetBrains\\IdeaProjects\\Client\\OUT\\PRODUCTION\\Client\\network\\gui.fxml.
To me, that wasn't obvious.
As for the different behaviour between debug and run: in the receivePage() method you need to wait until data is available.
int count = inputStream.available();
If you read the docs for this method you will see:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream ...
The available method for class InputStream always returns 0...
So you just need to wait for data to become available:
while (inputStream.available() == 0) {
    Thread.sleep(100);
}
Otherwise it just prepares byte[] b = new byte[count]; for 0 bytes, and you read nothing into it.
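Note that polling available() is fragile: a single read() may also return only part of the data. A more robust pattern (not from the original answer; a sketch assuming the same dataOutputStream/dataInputStream pair as in the question) is to prefix each message with its length and use readFully:

// Sender: write the payload length first, then the payload
dataOutputStream.writeInt(requiredFXML.length);
dataOutputStream.write(requiredFXML);
dataOutputStream.flush();

// Receiver: read the length, then block until exactly that many bytes have arrived
int length = dataInputStream.readInt();
byte[] data = new byte[length];
dataInputStream.readFully(data);
Files.write(Paths.get(FILE_TO_RECEIVED), data);

This way the receiver knows exactly how many bytes belong to one document and never depends on timing.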

Amazon MWS - Request Throttled

After reading the throttling documentation at https://docs.developer.amazonservices.com/en_US/products/Products_Throttling.html and https://docs.developer.amazonservices.com/en_US/dev_guide/DG_Throttling.html, I've started honoring the quotaRemaining and quotaResetsAt response headers so that I don't go beyond the quota limit. However, whenever I fire a few requests in quick succession, I get the following exception.
The documentation doesn't mention anything about burst limits. It talks about a maximum request quota, but I don't know how that applies to my case. I'm invoking the ListMatchingProducts API:
Caused by: com.amazonservices.mws.client.MwsException: Request is throttled
at com.amazonservices.mws.client.MwsAQCall.invoke(MwsAQCall.java:312)
at com.amazonservices.mws.client.MwsConnection.call(MwsConnection.java:422)
... 19 more
I guess I figured it out.
ListMatchingProducts mentions that the Maximum Request Quota is 20. Practically this means that you can fire at most 20 requests in quick succession, but after that you must wait until the Restore Rate "replenishes" your request "credits" (in my case, 1 request every 5 seconds).
The Restore Rate will then (every 5 seconds) re-fill the quota, up to a maximum of 20 requests. The following code worked for me...
class Client {
    private final int maxRequestQuota = 19
    private Semaphore maximumRequestQuotaSemaphore = new Semaphore(maxRequestQuota)
    private volatile boolean done = false

    Client() {
        new EveryFiveSecondRefiller().start()
    }

    ListMatchingProductsResponse fetch(String searchString) {
        maximumRequestQuotaSemaphore.acquire()
        // .....
    }

    class EveryFiveSecondRefiller extends Thread {
        @Override
        void run() {
            while (!done()) {
                int availablePermits = maximumRequestQuotaSemaphore.availablePermits()
                if (availablePermits == maxRequestQuota) {
                    log.debug("Max permits reached. Waiting for 5 seconds")
                    sleep(5000)
                    continue
                }
                log.debug("Releasing a single permit. Current available permits are $availablePermits")
                maximumRequestQuotaSemaphore.release()
                sleep(5000)
            }
        }

        boolean done() {
            done
        }
    }

    void close() {
        done = true
    }
}

JDBC Connection pooling for SQL Server: DBCP vs C3P0 vs No Pooling

I have a Java webapp that communicates heavily with a SQL Server database, and I want to decide how to manage the connections to this DB efficiently. The first option that comes to mind is using a third-party connection pool. I chose C3P0 and DBCP and prepared some test cases to compare these approaches, as follows:
No Pooling:
public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    try {
        for (int i = 0; i < 100; i++) {
            Connection conn = ConnectionManager_SQL.getInstance().getConnection();
            String query = "SELECT * FROM MyTable;";
            PreparedStatement prest = conn.prepareStatement(query);
            ResultSet rs = prest.executeQuery();
            if (rs.next()) {
                System.out.println(i + ": " + rs.getString("CorpName"));
            }
            conn.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("Finished in: " + (System.currentTimeMillis() - startTime) + " milli secs");
}
DBCP:
public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    try {
        for (int i = 0; i < 100; i++) {
            Connection conn = ConnectionManager_SQL_DBCP.getInstance().getConnection();
            String query = "SELECT * FROM MyTable;";
            PreparedStatement prest = conn.prepareStatement(query);
            ResultSet rs = prest.executeQuery();
            if (rs.next()) {
                System.out.println(i + ": " + rs.getString("CorpName"));
            }
            conn.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("Finished in: " + (System.currentTimeMillis() - startTime) + " milli secs");
}
C3P0:
public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    try {
        for (int i = 0; i < 100; i++) {
            Connection conn = ConnectionManager_SQL_C3P0.getInstance().getConnection();
            String query = "SELECT * FROM MyTable;";
            PreparedStatement prest = conn.prepareStatement(query);
            ResultSet rs = prest.executeQuery();
            if (rs.next()) {
                System.out.println(i + ": " + rs.getString("CorpName"));
            }
            conn.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("Finished in: " + (System.currentTimeMillis() - startTime) + " milli secs");
}
And here are the results:

Max pool size for c3p0 and dbcp = 10
c3p0:       5534 milli secs
dbcp:       4807 milli secs
No pooling: 2660 milli secs

Max pool size for c3p0 and dbcp = 100
c3p0:       4937 milli secs
dbcp:       4798 milli secs
No pooling: 2660 milli secs
One might say that the initialization and startup time of the pooling libraries could affect the results of these test cases, but I have repeated them with larger loop counts and the results are almost the same.
Surprisingly, the no-pooling approach is much faster than the connection-pooling methods, while I would assume that when we physically close a connection, getting a new one must be more time-consuming.
So, what's going on here?
EDIT_01: c3p0 and dbcp configurations
c3p0:
cpds.setMinPoolSize(5);
cpds.setAcquireIncrement(5);
cpds.setMaxPoolSize(100);
cpds.setMaxStatements(1000);
dbcp:
basicDataSource.setMinIdle(5);
basicDataSource.setMaxIdle(30);
basicDataSource.setMaxTotal(100);
basicDataSource.setMaxOpenPreparedStatements(180);
The rest of the configuration is left at the defaults. It's worth mentioning that all connections are established to a DB on localhost.
c3p0 is not deader than a doornail. It's old but (somewhat) actively maintained. Whether newer alternatives better suit your application is for you to decide.
What version of c3p0 are you using? If you think it is deader than a doornail, are you using an old version? You should be using 0.9.5.2.
The outcome of the test as you've defined it will be highly dependent on lots of things that are difficult to evaluate with the information you've provided. As Mark Rotteveel points out, you've not shown any information about your config, and you've not said anything about the location of the SQL Server. You'll notice greater benefit from a Connection pool when the database is remote than when it is local, since some of the performance improvement comes from amortizing the network latency of Connection acquisition over multiple client uses.
Your test executes a query and iterates through the result set. The longer the result set, the more the overhead of the Connection pool (which must proxy the ResultSet) overtakes the benefit of faster Connection acquisition. (The numbers you are getting look unusually bad, though; c3p0 typically has very fast ResultSet passthrough performance.) With sufficiently long queries the cost of Connection acquisition becomes negligible, while the pooling library's overhead on ResultSet iteration keeps accruing, making a Connection pool not so useful.
But this is far from the typical use case for web or mobile clients, which usually make short queries, inserts, and updates. For short queries, inserts, and updates, the cost of a de novo Connection acquisition can be very large relative to the execution of the query. This is the use case for which Connection pools offer a large improvement. That may not be what you are testing; it depends on how big MyTable is.
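To see the effect described here, it can help to time Connection acquisition separately from query execution. A minimal sketch (names are illustrative; it assumes a javax.sql.DataSource configured for either the pooled or the unpooled case):

import java.sql.Connection;
import javax.sql.DataSource;

public class AcquisitionBenchmark {
    // Times only how long it takes to obtain and release connections,
    // which is the cost a pool is meant to amortize.
    static long timeAcquisitionsMillis(DataSource ds, int iterations) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (Connection conn = ds.getConnection()) {
                // no query: isolate acquisition cost from execution cost
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}

Running this against both a pooled and an unpooled DataSource separates the acquisition cost (where pools should win, especially over a network) from the ResultSet-proxying overhead the answer describes.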

Algorithm for concurrent access to resource(s) on database

Some time ago we implemented a warehouse management app that keeps track of the quantity of each product we have in the store. We solved the problem of concurrent access to the data with database locks (SELECT FOR UPDATE), but this approach led to poor performance when many clients tried to consume product quantities from the same store. Note that we manage only a small set of product types (fewer than 10), so the degree of concurrency can be heavy (also, we don't care about stock re-fill). We thought about splitting each resource quantity into smaller "buckets", but this approach could lead to starvation for clients that try to consume a quantity bigger than each bucket's capacity: we would have to manage bucket merges and so on...
My question is: are there any broadly accepted solutions to this problem? I also looked for academic articles, but the topic seems too wide.
P.S. 1: our application runs in a clustered environment, so we cannot rely on in-application concurrency control. The question aims to find an algorithm that structures and manages the data differently than a single row, while keeping all the advantages that a DB transaction (with locks or not) has.
P.S. 2: for your information, we manage a large number of similar warehouses; the example focuses on a single one, but we keep all the data in one DB (prices are all the same, etc.).
Edit: the setup below will still work on a cluster if you use a queueing program that can coordinate among multiple processes / servers, e.g. RabbitMQ.
You can also use a simpler queueing algorithm that only uses the database, with the downside that it requires polling (whereas a system like RabbitMQ allows threads to block until a message is available). Create a Requests table with a column for unique requestIds (e.g. a random UUID) that acts as the primary key, a timestamp column, a resourceType column, and an integer requestedQuantity column. You'll also need a Logs table with a unique requestId column that acts as the primary key, a timestamp column, a resourceType column, an integer requestedQuantity column, and a boolean/tinyint/whatever success column.
When a client requests a quantity of ResourceX, it generates a random UUID, adds a row to the Requests table using the UUID as the requestId, and then polls the Logs table for that requestId. If the success column is true the request succeeded, else it failed.
The server with the database assigns one thread or process to each resource, e.g. ProcessX is in charge of ResourceX. ProcessX retrieves all rows from the Requests table where resourceType = ResourceX, sorted by timestamp, and then deletes them from Requests; it then processes each request in order, decrementing an in-memory counter for each successful request, and at the end of the batch it updates the quantity of ResourceX in the Resources table. It then writes each request and its success status to the Logs table, retrieves all of the requests from Requests where resourceType = ResourceX again, and so on.
It may be slightly more efficient to use an autoincrement integer as the Requests primary key and have ProcessX sort by primary key instead of by timestamp.
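A minimal sketch of the client side of this protocol over JDBC (the table and column names follow the description above; the timestamp column name and polling interval are illustrative):

import java.sql.*;
import java.util.UUID;

public class QueueClient {
    // Inserts a request row, then polls Logs until the resource process has written a result.
    static boolean requestQuantity(Connection conn, String resourceType, int quantity)
            throws SQLException, InterruptedException {
        String requestId = UUID.randomUUID().toString();
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO Requests (requestId, ts, resourceType, requestedQuantity) VALUES (?, ?, ?, ?)")) {
            insert.setString(1, requestId);
            insert.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
            insert.setString(3, resourceType);
            insert.setInt(4, quantity);
            insert.executeUpdate();
        }
        // Poll the Logs table until our requestId shows up
        try (PreparedStatement poll = conn.prepareStatement(
                "SELECT success FROM Logs WHERE requestId = ?")) {
            poll.setString(1, requestId);
            while (true) {
                try (ResultSet rs = poll.executeQuery()) {
                    if (rs.next()) {
                        return rs.getBoolean("success");
                    }
                }
                Thread.sleep(100); // polling interval; tune to taste
            }
        }
    }
}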
One option is to assign one DAOThread per resource - this thread is the only thing that accesses that resource's database table, so there is no locking at the database level. Workers (e.g. web sessions) request resource quantities using a concurrent queue - the example below uses a Java BlockingQueue, but most languages have some sort of concurrent queue implementation you can use.
public class Request {
    final int value;
    final BlockingQueue<ReturnMessage> queue;

    public Request(int value, BlockingQueue<ReturnMessage> queue) {
        this.value = value;
        this.queue = queue;
    }
}

public class ReturnMessage {
    final int value;
    final String resourceType;
    final boolean isSuccess;

    public ReturnMessage(int value, String resourceType, boolean isSuccess) {
        this.value = value;
        this.resourceType = resourceType;
        this.isSuccess = isSuccess;
    }
}

public class DAOThread implements Runnable {
    private final int MAX_CHANGES = 10;
    final String resourceType;          // package-private so BufferThreads can read it
    final BlockingQueue<Request> queue; // package-private so BufferThreads can forward to it
    private int quantity;
    private int changeCount = 0;
    private DBTable table;              // DBTable is a stand-in for this resource's table

    public DAOThread(DBTable table, BlockingQueue<Request> queue) {
        this.table = table;
        this.resourceType = table.select("resource_type");
        this.quantity = table.select("quantity");
        this.queue = queue;
    }

    public void run() {
        try {
            while (true) {
                Request request = queue.take();
                if (request.value <= quantity) {
                    quantity -= request.value;
                    if (++changeCount > MAX_CHANGES) {
                        changeCount = 0;
                        table.update("quantity", quantity);
                    }
                    request.queue.offer(new ReturnMessage(request.value, resourceType, true));
                } else {
                    request.queue.offer(new ReturnMessage(request.value, resourceType, false));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

public class Worker {
    final Map<String, BlockingQueue<Request>> dbMap;
    final SynchronousQueue<ReturnMessage> queue = new SynchronousQueue<>();

    public Worker(Map<String, BlockingQueue<Request>> dbMap) {
        this.dbMap = dbMap;
    }

    public boolean request(String resourceType, int value) throws InterruptedException {
        dbMap.get(resourceType).offer(new Request(value, queue));
        return queue.take().isSuccess;
    }
}
The Workers send resource requests to the appropriate DAOThread's queue; the DAOThread processes these requests in order, either updating the local resource quantity if the request's value doesn't exceed the quantity and returning a Success, or leaving the quantity unchanged and returning a Failure. The database is only updated after every ten changes to reduce the amount of IO; the larger MAX_CHANGES is, the more complicated it will be to recover from a system failure. You can also have a dedicated IOThread that does all of the database writes - this way you don't need to duplicate any logging or timing (e.g. there ought to be a Timer that flushes the current quantity to the database every few seconds).
The Worker uses a SynchronousQueue to wait for a response from the DAOThread (a SynchronousQueue is a BlockingQueue that can only hold one item); if the Worker is running in its own thread then you may want to replace this with a standard multi-item BlockingQueue so that the Worker can process the ReturnMessages in any order.
There are some databases, e.g. Riak, that have native support for counters, so this might improve your IO throughput and reduce or eliminate the need for MAX_CHANGES.
You can further increase throughput by introducing BufferThreads to buffer the requests to the DAOThreads.
public class BufferThread implements Runnable {
    final SynchronousQueue<ReturnMessage> returnQueue = new SynchronousQueue<>();
    final int BUFFERSIZE = 10;
    private DAOThread daoThread;
    private BlockingQueue<Request> queue;
    private ArrayList<Request> buffer = new ArrayList<>(BUFFERSIZE);
    private int tempTotal = 0;

    public BufferThread(DAOThread daoThread, BlockingQueue<Request> queue) {
        this.daoThread = daoThread;
        this.queue = queue;
    }

    public void run() {
        try {
            while (true) {
                Request request = queue.poll(100, TimeUnit.MILLISECONDS);
                if (request != null) {
                    tempTotal += request.value;
                    buffer.add(request);
                }
                // flush when the buffer is full or the 100ms poll timed out
                if (!buffer.isEmpty() && (buffer.size() == BUFFERSIZE || request == null)) {
                    daoThread.queue.offer(new Request(tempTotal, returnQueue));
                    ReturnMessage message = returnQueue.take();
                    if (message.isSuccess) {
                        for (Request buffered : buffer) {
                            buffered.queue.offer(new ReturnMessage(buffered.value, daoThread.resourceType, true));
                        }
                    } else {
                        // send unbuffered requests to the DAOThread to see if any can be satisfied individually
                        for (Request buffered : buffer) {
                            daoThread.queue.offer(buffered);
                        }
                    }
                    buffer.clear();
                    tempTotal = 0;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
The Workers send their requests to the BufferThreads, which wait until they've buffered BUFFERSIZE requests or have waited 100ms for a request to come through (Request request = queue.poll(100, TimeUnit.MILLISECONDS)), at which point they forward the buffered message to the DAOThread. You can have multiple buffers per DAOThread - rather than sending a Map<String, BlockingQueue<Request>> to the Workers you instead send a Map<String, ArrayList<BlockingQueue<Request>>>, one queue per BufferThread, with the Worker either using a counter or a random number generator to determine which BufferThread to send a request to (see the sketch below). Note that if BUFFERSIZE is too large and/or if you have too many BufferThreads then Workers will suffer long pause times while they wait for the buffer to fill.
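For instance, the Worker's request method might pick a BufferThread queue at random like this (a fragment of the Worker class, assuming a bufferMap field with the Map layout just described):

public boolean request(String resourceType, int value) throws InterruptedException {
    // one queue per BufferThread for this resource; pick one at random
    ArrayList<BlockingQueue<Request>> buffers = bufferMap.get(resourceType);
    BlockingQueue<Request> target = buffers.get(ThreadLocalRandom.current().nextInt(buffers.size()));
    target.offer(new Request(value, queue));
    return queue.take().isSuccess;
}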

Querying real time data from an SQL database sudden latency problem

We are testing an application that is supposed to display real-time data for multiple users on a 1-second basis. 128 new rows are inserted every second by the server application into a SQL database, and then all users have to query them along with another 128 old referential rows.
We tested the query time and it didn't exceed 30 milliseconds; the interface function that invokes the query also didn't take more than 50 milliseconds, including processing the data and all.
We developed a testing application that creates a thread and a SQL connection per user. Each user issues 7 queries every second. Everything starts fine, and no user takes more than 300 milliseconds for the 7 data series (queries). However, after 10 minutes the latency exceeds 1 second and keeps increasing. We don't know whether the problem is SQL Server 2008 handling multiple requests at the same time, nor how to overcome such a problem.
Here's our testing client, in case it helps. Note that the client and server run on the same machine, which has 8 CPUs and 8 GB of RAM. Now we're questioning whether a database is the optimal solution for us.
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Enter Number of threads");
        int threads = int.Parse(Console.ReadLine());
        ArrayList l = new ArrayList();
        for (int i = 0; i < threads; i++)
        {
            User u = new User();
            Thread th = new Thread(u.Start);
            th.IsBackground = true;
            th.Start();
            l.Add(u);
            l.Add(th);
        }
        Thread.CurrentThread.Join();
        GC.KeepAlive(l);
    }
}

class User
{
    BusinessServer client; // the database interface dll
    public static int usernumber = 0;
    static TextWriter log;

    public User()
    {
        client = new BusinessServer(); // creates an SQL connection in the constructor
        Interlocked.Increment(ref usernumber);
    }

    public static void SetLog(int processnumber)
    {
        log = TextWriter.Synchronized(new StreamWriter(processnumber + ".txt"));
    }

    public void Start()
    {
        Dictionary<short, symbolStruct> companiesdic = client.getSymbolData();
        short[] symbolids = companiesdic.Keys.ToArray();
        Stopwatch sw = new Stopwatch();
        while (true)
        {
            int current;
            sw.Start();
            current = client.getMaxCurrentBarTime();
            for (int j = 0; j < 7; j++)
            {
                client.getValueAverage(dataType.mv, symbolids,
                    action.Add, actionType.Buy,
                    calculationType.type1,
                    weightType.freeFloatingShares, null, 10, current, functionBehaviour.difference); // this is the function that has the queries
            }
            sw.Stop();
            Console.WriteLine(DateTime.Now.ToString("hh:mm:ss") + "\t" + sw.ElapsedMilliseconds);
            if (sw.ElapsedMilliseconds > 1000)
            {
                Console.WriteLine("warning");
            }
            sw.Reset();
            long diff = 0; //(1000 - sw.ElapsedMilliseconds);
            long sleep = diff > 0 ? diff : 1000;
            Thread.Sleep((int)sleep);
        }
    }
}
Warning: this answer is based on knowledge of MSSQL 2000 - I'm not sure whether it is still correct.
If you do a lot of inserts, the indexes will eventually get out of date and the server will automatically switch to table scans until the indexes are rebuilt. Some of this is done automatically, but you may want to force reindexing periodically if this kind of performance is critical.
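On more recent SQL Server versions a periodic rebuild can be forced with ALTER INDEX ... REBUILD (the MSSQL 2000 era equivalent was DBCC DBREINDEX). A minimal sketch of a maintenance job issuing it over JDBC (Java here for consistency with the other sketches; the connection URL, credentials, and table name are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class Reindex {
    public static void main(String[] args) throws Exception {
        // adjust host, database, and credentials to your setup
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=MyDb;user=sa;password=secret");
             Statement stmt = conn.createStatement()) {
            // rebuild all indexes on the hot table so inserts don't degrade into table scans
            stmt.execute("ALTER INDEX ALL ON dbo.MyHotTable REBUILD");
        }
    }
}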
I would suspect the query itself. While it may not take much time on an empty database, as the amount of data grows it may require more and more time, depending on how the lookup is done. Have you examined the query plan to make sure that it is doing index lookups instead of table scans to find the data? If not, perhaps introducing some indexes would help.
