Why is SymmetricDS taking so long to sync?

When running SymmetricDS on a system that has an unreliable connection, it takes a long time for the sync to catch up when the connection comes back.
What kind of configuration values can I look at to make the sync go faster?

There's no single definitive answer. Configuring SymmetricDS parameters is a tradeoff: improving one thing can harm another. Spend some time reading about the different parameters, measuring, and then trying whatever improves performance... or be patient.

Related

Should I keep this "GlobalConnection" or create connection for every query?

I have inherited a legacy Delphi application that uses ADO to connect to SQL Server.
The application has a notion of a "Global Connection" -- that is a single connection that it opens at the start, and then keeps open all throughout the running of the application (which can be days, weeks, or longer....)
So my question is this: Should I keep this way of doing things or should I switch to a "connect-query-disconnect" mode of doing things? Does it matter?
Switching would be a non-trivial task, but I'll do it if it means better performance, data management, etc.
Well, it depends on what you're expecting to get out of it, and what kind of application it is.
There's nothing in particular wrong with using a single long-running connection, as long as the application can gracefully handle disconnections and recover or log/notify when it can't reconnect.
The problem with a connect-query-disconnect setup is that you're adding the overhead of connecting and disconnecting on every query. That's going to slow things down, and in an interactive GUI application users may notice the additional overhead. You also have to make sure that authorization is transparently handled if it isn't already.
At the same time, there may be interactive performance gains to be had if you can push all the queries off onto background threads and asynchronously update the GUI. If contention appears because the queries are serialized, you can migrate to a connection-pool system fairly readily as well and improve things even more. This has a fairly high complexity cost to it though, so now you're looking to balancing what the gains are compared to the work involved.
Right now, my ultimate response is "if it ain't broke, don't fix it." Changes along the lines you propose are a lot of work -- how much do the users of this application stand to gain? Are there other problems to solve that might benefit them more?
Edit: Okay, so it's broke. Well, slow at least, which is all the same to me. If you've ruled out problems with the SQL Server itself, and the queries are performing as fast as they can (i.e. DB schema is sane, the right indexes are available, queries aren't completely braindead, server has enough RAM and fast enough I/O, network isn't flaky, etc.), then yes, it's time to find ways to improve the performance of the app itself.
Simply moving to a connect-query-disconnect is going to make things worse, and the more queries you're issuing the bigger the drop off is going to be. It sounds like you're going to need to rearchitect the app so that you can run fewer queries, run them in the background, cache more aggressively on the client, or some combination of all 3.
Don't forget that making the clients perform better means server-side performance gets more important, since the server will probably be handling a higher load if clients start making multiple connections and issuing multiple queries in parallel.
As Mr. Frazier said before, the one global connection is not bad per se.
If you intend to change it, first detect WHAT the problem is. Let's look at some scenarios:
1
Some screens (IOW: a set of 1..n forms that operate on a business entity) are slow. Possible causes:
Insufficient filtering, resulting in a plethora of records being pulled from the database unnecessarily.
The number of records is OK, but it takes too long to render them. Solution: faster controls or intelligent rendering (e.g., virtual list views).
Too many queries each time you open a screen. Possible solutions: use TClientDatasets (or any in-memory dataset) to hold infrequently modified lookup tables. A more sophisticated cache for larger tables, or opening those datasets in other threads, can improve response times.
Scrolling through datasets with bound controls can be slow (just a reminder, because those little details are easily forgotten).
2
The whole app simply slows down. Checklist:
Are the network cards OK? A few malfunctioning network cards can wreak havoc even on well-structured networks, as they create unnecessary noise on the line.
[MSSQL DBA HAT ON] Next in the line of attack is SQL Server. Ask the DBA to trace blocks and deadlocks. Log slow queries and work on speeding them up. This relates directly to #1.1 and #1.3.
Detect whether some naive developer has put SELECTs inside transactions. In read committed isolation it's just overhead, as it creates more network traffic. Open the query, retrieve the data, and close the dataset.
Review the database schema, if you can.
Are any data-bound operations on bulk records (say, re-pricing some/most/all products) being done in the app? Move them into a stored procedure or refactor the operation into a set-based query; it will be much faster and will reduce the load on the whole server (see the sketch after this checklist).
Extensive operations on a group of records? Learn how to do those operations in one pass on the server instead of record by record. See MSSQL MVP Erland Sommarskog's article on arrays and lists in SQL Server for an examination of the most commonly used alternatives.
Beware of queries with WHERE clauses like WHERE SomeFunction(table1.blabla) = @SomeParam. Most of the time these will not use an index, forcing a read of the entire table to find the desired data. If it's a big table... an index on a persisted computed column can work miracles (also sketched below). [MSSQL DBA HAT OFF]
That's all I can think of without a little more detail... ;-)
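To make the set-based and computed-column points above concrete, here are two minimal T-SQL sketches; every table, column, and parameter name is invented for illustration:

DECLARE @CategoryId int = 42, @Year int = 2008;

-- 1. A set-based update instead of looping record by record in the app:
UPDATE dbo.Products
SET Price = Price * 1.10                -- e.g. a 10% re-pricing
WHERE CategoryId = @CategoryId;         -- one statement, one round trip

-- 2. A WHERE clause that wraps a column in a function, such as
--    WHERE YEAR(OrderDate) = @Year, usually cannot use an index on OrderDate.
--    Materialize the expression as a persisted computed column and index it:
ALTER TABLE dbo.Orders ADD OrderYear AS YEAR(OrderDate) PERSISTED;
CREATE INDEX IX_Orders_OrderYear ON dbo.Orders (OrderYear);

-- Then rewrite the query so the new index can be used:
SELECT OrderId, OrderDate FROM dbo.Orders WHERE OrderYear = @Year;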
Database connections are time-consuming resources to create, and the rule of thumb should be to create as few as possible and reuse them as much as possible. That's why some other technologies have database connection pools, which are typically established at application/service startup and then kept open as long as possible and shared among threads.
From your comment, the application has performance issues, but without more details it's difficult to make any recommendation.
You should try to nail down what is slow: are all queries slow, or just some specific ones?
If just some specific ones, is there some correlation between them?
My 2 cents.

Is relational database appropriate for soft real-time system?

I'm working on a real-time video analysis system which processes the video stream frame by frame. At each frame it can generate several events which should be recorded and some delivered to another system via network. The system is soft real-time, i.e. message latencies higher than 25ms are highly undesirable, but not fatal.
Are relational databases (specifically, MySQL and Postgres) appropriate as the datastore for such system?
Can I expect the DB to work well when it is installed on its own server and has ~50 25fps streams of single-row SQL inserts coming in over the network?
EDIT: I think in general performance would not be a problem, but I worry about the latency variance. If it will occasionally delay for 1000 ms, that would be very bad.
Oh, and the system runs 24/7 so the DB could grow arbitrarily big. Does that degrade the insert latency?
I wouldn't worry too much about performance when choosing a relational database over another type of datastore; choose the solution that best meets your requirements for accessing that data later. However, if you do choose not only an RDBMS but one over the network, then you might want to consider buffering events briefly to a local disk on their way to the DB. Use a separate thread or process to push events into the DB so the realtime system stays unaffected.
The biggest problems are how unpredictable the latency will be and how it never goes down, only up. But modern hardware comes to the rescue: specify a machine with enough CPU cores. You can count on at least two, and getting four is easy. So you can spin up a thread and dedicate one core to the database updates, isolating them from your soft real-time code. Now you don't care about the variability in the delays, at least as long as the database updates don't take so long that you generate data faster than that thread can consume it.
Set up a database server and load it up with fake data, double the amount you think it will ever need to store. Test continuously while you develop, and add the instrumentation code you need to measure how it is doing early in the project.
As I've written, if you queue the rows that need to be saved and save them asynchronously (so as not to stall the "main" thread) there shouldn't be any problem... BUT!!!
You want to save them in a DB... so someone else will read the rows AT THE SAME TIME they are being written. Sadly, it's normally quite difficult to tell a DB "this work is very high priority, everything else can be stalled but not this". So if someone does:
BEGIN TRANSACTION
SELECT COUNT(*) FROM SomeTable
WAITFOR DELAY '01:00:00'
(I'm using T-SQL here, but I think it's quite clear: ask for the COUNT(*) of the table so that there is a lock on the table, then WAITFOR an hour.)
then the writes could be stalled and time out. In general, if you configure everyone but the app to only be able to do reads, these problems shouldn't appear.
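A complementary mitigation on SQL Server (the dialect used in this answer), assuming you are able to change database options, is to turn on row versioning so that readers no longer take the shared locks that stall the writer; the database name below is just a placeholder:

-- Readers then see a consistent snapshot instead of blocking the writer.
-- Switching requires briefly being the only active connection to the database
-- (or adding WITH ROLLBACK IMMEDIATE to kick the others off).
ALTER DATABASE EventStore SET READ_COMMITTED_SNAPSHOT ON;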

What is the Speed Difference Between Database and Web Service Calls?

All things being equal, and in the most simple form, which is faster?
1.) A call to a web service method
2.) A call to a database
For example, assume that you have a simple web service that just returns an integer that is calculated in X time. You also have a database that, when queried in the right way, also takes X time to calculate the answer. (So the compute time is the same in both cases.) In both cases, assume the amount of data in both directions is the same, say, a single 32-bit integer, for simplicity.
Thus far, the calculation times of both the web service and the database are exactly the same.
The environment is 1 application server, where the app resides, and 1 other server that is holding both the web service and the database. There is nothing else going on in the environment other than the application calling either the web service or database repeatedly. This all within one single LAN, so any network latency is equal.
From an application, which will be faster, the call to the database, or the call to the web service?
What I am trying to isolate, I guess, is which is more heavy-weight. Does the set up, open, close, tear down of a database connection end up slower than that for a web service, or is it the same? Additionally, if there are other things, such as parsing the result from a web service, how do they affect the speed?
O(1) doesn't refer to any length of time. A single operation could take .001 ms on a webservice and 100 seconds in a database and they both could be using O(1) functions:
http://en.wikipedia.org/wiki/Big_O_notation
It's hard to know quite what you're asking. If you're asking whether accessing a local database is generally faster than accessing a similar service over the internet, then I expect that, generally, the answer is that the local database will be faster. The call over the internet to the web service has a lot of overhead, and communication over the internet is relatively slow. Even on a slow computer, a database can perform many thousands of simple queries per second. Contrast that with access over the internet, where you'd be lucky to get 50 round-trip requests per second, not even accounting for the time it takes to perform the requested operation on the server.
If you're asking whether a server on the web can serve data faster by avoiding a database and calculating results directly, then the answer is it depends. The call to the database in this case adds unnecessary overhead if the data in it can be easily calculated in a stand-alone function. The answer to this question doesn't really have anything to do with a "web service". Is it faster to calculate an answer in a function or to access the answer using a query on a database? As I said, the answer would depend on the complexity of the particular function you had to use, and weighing its computation time against the overhead of accessing the answer (or part of the answer) directly from a database.
In short, the answer to your question depends on what exactly you're asking. It would also probably help to know why you're asking the question. I have a suspicion that the real answer is that this isn't something you need to worry about; it's not really a practical concern unless you have a particular situation requiring optimization.
If you're concerned about comparing speed when the web service and database are both on a LAN, I'm pretty sure the overhead of the DB is less than that of the web service. The application typically maintains stateful connection(s) to the DB, while requests to a web service go over HTTP, which is stateless, has relatively higher overhead, and is slower. Could be wrong, though. The best answer would be to whip up a simple web service and query, then (1) measure the time it takes to retrieve results using both methods and compare, and/or (2) create an app that opens a lot of threads and does some load testing.
A caveat: If your app doesn't maintain an open connection or have access to a pool of connections with the db, then the db alternative may well be slower. Initial creation of a db connection can be relatively slow. But that shouldn't figure into things, since you should write your app so that an open connection is always maintained.
Based on practical experience, I would say that the database call is significantly faster.
It all depends on the network topology and languages you're using. If you're talking C#...my money would be on the database call being faster almost every time.
Your calls to the database server are going to be made over the native protocol. Everything is going to be optimized.
If you're calling a web service, you're going to need some mechanism to send the request to the web server, wait for the web server to respond, and then something to parse the result of the web service call back into your code.
One could say that generally, latency of the network in a web service (which will typically be over the internet) is going to be slower than the call to a database (which is typically on a LAN or something, which is faster than one's connection to the internet).
Of course, this makes a LOT of assumptions about setups/software/etc, etc which effectively reduces it to an apples and oranges comparison, which there is never a good answer for.
O(1) doesn't specify the speed, it specifies the 'growth' in time required as the underlying data gets larger. The constants are dropped from the equation. What this means is that O(N^2) can be less than O(N) for some really small N.
A web service is a way to connect to some functionality. Besides the network latency, the real time is bound by what the service is actually doing. There could be a database underneath for example. If it is something that just returns an Integer, the computational time is mostly trivial, the request is bounded by the network.
A database needs to parse the query, build a query tree, optimize it, then apply some search algorithms against a series of caches and files. If you just plopped an integer into a trivial table, or made a table-less SQL call, then fetching the data is probably trivial; it's the whole transactional packaging that will eat CPU.
Can you get a packet back and forth to a server before you can parse trivial SQL and punch back a tabled result? Mostly, these days, I'd say it's a toss-up. Some networks are faster than others, while some databases and servers are pretty good. Nothing is certain.
In general, is a web service faster than a database? Yes, if and only if the service is trivial (if it's hiding a database, then it's obviously just additional time). Databases are big, bulky engines, and while they've gotten much faster over the years, their baseline transactional integrity demands an awful lot of minimum CPU usage. They're slower because they are doing so much more work. Contrast that with some explicit minimal computation hidden behind network access. A fiber or gigabit network can move data rapidly. It's just so much less work to get accomplished.
Of course, the reason we don't replace databases with custom-written web services is time. It takes too long to write them and keep them up to date. Way more effort than just slamming it into a database and accepting its performance.
Paul.
IMHO, I would say the database call would be faster hands down. I say this because there is much less overhead. With the verbosity of the HTTP protocol and the SOAP markup incurred, you have a lot more bloat in your data, and that bloat has an extra cost for packaging and unpacking. With a stored procedure call you could use an output parameter to return a single int instead of a result set, making it even lighter (a rough sketch follows).
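As a rough sketch of that output-parameter idea (the procedure and table here are hypothetical):

-- Returns a single integer through an OUTPUT parameter instead of a result set.
CREATE PROCEDURE dbo.GetOrderCount
    @CustomerId int,
    @OrderCount int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;  -- also suppresses the "rows affected" chatter
    SELECT @OrderCount = COUNT(*) FROM dbo.Orders WHERE CustomerId = @CustomerId;
END;
GO

-- Caller:
DECLARE @Count int;
EXEC dbo.GetOrderCount @CustomerId = 42, @OrderCount = @Count OUTPUT;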
Algorithmic complexity is just one variable that impacts the overall performance of a system. Other factors might include network latency or network bandwidth, especially when the size of the returned data is different.
If you run the same O(1) algorithm on a local machine, you will get the results faster than if you run the algorithm on a machine on another continent and need the same results sent over the network.
Other factors might include raw CPU speed if the calls are done on physically different machines.
That's why premature optimisation is the root of all evil.
EDIT:
I'd say it depends even more now on the details of the system, i.e., what database software you are using, or whether or not your web service is reading data from a static web page or dynamically generating the data.
But I am beginning to lose sight of why you are asking the question. You seem to say that both methods take the same amount of time. So if they take the same amount of time, how can you ask which is faster? Clearly they are equally fast. You need to tell us more about how and when they stop taking the same amount of time.
If we are assuming that you are communicating to a different server for both the web and database calls, wouldn't they be pretty much the same, since both requests are transferred through TCP/IP? The only thing then that could be compared is how big the actual results are that are sent back in terms of bits across the wire.

Quick and dirty way to compare SQL server performance

Further to my previous question about the Optimal RAID setup for SQL server, could anyone suggest a quick and dirty way of benchmarking the database performance on the new and old servers to compare them? Obviously, the proper way would be to monitor our actual usage and set up all sorts of performance counters and capture the queries, etc., but we are just not at that level of sophistication yet and this isn't something we'll be able to do in a hurry. So in the meanwhile, I'm after something that would be a bit less accurate, but quick to do and still better than nothing. Just as long as it's not misleading, which would be worse than nothing. It should be SQL Server specific, not just a "synthetic" benchmark. It would be even better if we could use our actual database for this.
Measure the performance of your application itself with the new and old servers. It's not necessarily easy:
Set up a performance test environment with your application on (depending on your architecture this may consist of several machines, some of which may be able to be VMs, but some of which may not be)
Create "driver" program(s) which give the application simulated work to do
Run batches of work under the same conditions - remember to reboot the database server between runs to nullify the effects of caching (otherwise your 2nd and subsequent runs will probably be amazingly fast); a lighter-weight alternative is sketched after this list.
Ensure that the performance test environment has enough hardware machines to be able to load the database heavily - this may mean swapping out some VMs for real hardware.
Remember to use production-grade hardware in your performance test environment - even if it is expensive.
Our database performance test cluster contains six hardware machines, several of which are production-grade, one of which contains an expensive storage array. We also have about a dozen VMs on a 7th simulating other parts of the service.
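If rebooting the database server between runs (as suggested in the list above) is impractical, a rough SQL Server-specific alternative is to flush its caches by hand; only do this on a test box, since these commands affect the whole instance:

CHECKPOINT;              -- write dirty pages out so the next step is meaningful
DBCC DROPCLEANBUFFERS;   -- empty the buffer pool (data cache)
DBCC FREEPROCCACHE;      -- throw away cached query plans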
You can always insert, read, and delete a couple of million rows - it's not a realistic mix of operations, but it should strain the disks nicely...
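A crude T-SQL sketch of that idea; the table name and row count are arbitrary, and each step can be timed from the client or by selecting GETDATE() before and after:

CREATE TABLE dbo.BenchRows (Id int IDENTITY PRIMARY KEY, Payload char(200) NOT NULL);

-- Insert ~2 million rows (the cross join of system views is just a cheap row generator).
INSERT INTO dbo.BenchRows (Payload)
SELECT TOP (2000000) REPLICATE('x', 200)
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- Read everything back.
SELECT COUNT(*), MAX(Payload) FROM dbo.BenchRows;

-- Delete it all, then clean up.
DELETE FROM dbo.BenchRows;
DROP TABLE dbo.BenchRows;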
Find at least a couple of the queries that are taking some time, or at least that you suspect are taking time, insert a lot of data if you don't have it already, and run the queries having set:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SET STATISTICS PROFILE ON
Those should give you a rough idea of the resources being consumed.
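For example (dbo.YourBigTable is a stand-in for one of your own slow queries), run the batch in Management Studio and read the logical reads and elapsed times from the Messages tab:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM dbo.YourBigTable WHERE SomeColumn = 'some value';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;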
You can also run SQL Server Profiler to get a general idea of what queries are taking a long time and how long they are taking plus other statistics. It outputs a lot of data so try to filter it down a little bit, possibly by long duration or one of the other performance statistics.

Why is it bad practice to make multiple database connections in one request?

A discussion about singletons in PHP has me thinking about this issue more and more. Most people advise that you shouldn't make a bunch of DB connections in one request, and I'm just curious as to what your reasoning is. My first thought is the expense to your script of making that many requests to the DB, but then I counter myself with the question: wouldn't multiple connections make concurrent querying more efficient?
How about some answers (with evidence, folks) from some people in the know?
Database connections are a limited resource. Some DBs have a very low connection limit, and wasting connections is a major problem. By consuming many connections, you may be blocking others from using the database.
Additionally, throwing a ton of extra connections at the DB doesn't help anything unless there are resources on the DB server sitting idle. If you've got 8 cores and only one is being used to satisfy a query, then sure, making another connection might help. More likely, though, you are already using all the available cores. You're also likely hitting the same hard drive for every DB request, and adding additional lock contention.
If your DB has anything resembling high utilization, adding extra connections won't help. That'd be like spawning extra threads in an application with the blind hope that the extra concurrency will make processing faster. It might in some certain circumstances, but in other cases it'll just slow you down as you thrash the hard drive, waste time task-switching, and introduce synchronization overhead.
It is the cost of setting up the connection, transferring the data and then tearing it down. It will eat up your performance.
Evidence is harder to come by but consider the following...
Let's say it takes x microseconds to make a connection.
Now you want to make several requests and get data back and forth. Let's say that the difference in transport time is negligible between one connection and many (just for the sake of argument).
Now let's say it takes y microseconds to close the connection.
Opening one connection will take x + y microseconds of overhead; opening n of them will take n * (x + y). For example, if x + y is 20,000 microseconds, ten separate connections add 200,000 microseconds (0.2 s) of pure overhead versus 20,000 for one. That will delay your execution.
Setting up a DB connection is usually quite heavy. A lot of things are going on backstage (DNS resolution/TCP connection/Handshake/Authentication/Actual Query).
I once had an issue with some weird DNS configuration that made every TCP connection take a few seconds to come up. My login procedure (because of a complex architecture) took 3 different DB connections to complete. With that issue, it was taking forever to log in. We then refactored the code to make it go through one connection only.
We access Informix from .NET and use multiple connections. Unless we're starting a transaction on each connection, it is often handled by the connection pool. I know that's very brand-specific, but most(?) database systems' client access will pool connections to the best of its ability.
As an aside, we did have a problem with connection count because of cross-database connections. Informix supports synonyms, so we synonymed the common offenders and the multiple connections were handled server-side, saving a lot in transfer time, connection-creation overhead, and (the real crux of our situation) license fees.
I would assume that it is because your requests are not being sent asynchronously: since your requests are executed iteratively on the server, blocking each time, you pay the overhead of creating a connection every time, when you only need to do it once...
In Flex, all web service calls are automatically made asynchronously, so it is common to see multiple connections, or queued-up requests on the same connection.
Asynchronous requests mitigate the connection cost through faster request/response times... Because you cannot easily achieve this in PHP without some threading, the performance hit is greater than with simply reusing the same connection.
that's my 2 cents...

Resources