Symmetric DS taking too long on DataExtractorService - database

I'm using SDS to migrate data from a SQL server to a Mysql database. My tests of moving the data of a database that was not in use worked correctly though they took like 48 hours to migrate all the existing data. I configured dead triggers to move all current data and triggers to move the new added data.
When moving to a live database that it is in use the data is being migrated too slow. On the log file I keep getting the message:
[corp-000] - DataExtractorService - Reached one sync byte threshold after 1 batches at 105391240 bytes. Data will continue to be synchronized on the next sync
I have like 180 tables and I have created 15 channels for the dead triggers and 6 channels for the triggers. For the configuration file I have:
job.routing.period.time.ms=2000
job.push.period.time.ms=5000
job.pull.period.time.ms=5000
I have none foreign key configuration so there wont be an issue with that. What I would like to know is how to make this process faster. Should I reduce the number of channels?
I do not know what could be the issue since the first test I ran went very well. Is there a reason why the threshold is not being clreared.
Any help will be apreciated.
Thanks.

How large are your tables? How much memory does the SymmetricDS instance have?
I've used SymmetricDS for a while, and without having done any profiling on it I believe that reloading large databases went quicker once I increased available memory (I usually run it in a Tomcat container).
That being said, SymmetricDS isn't by far as quick as some other tools when it comes to the initial replication.
Have you had a look at the tmp folder? Can you see any progress in file size. That is, the files which SymmetricDS temporarily writes to locally before sending the batch off to the remote side? Have you tried turning on more fine grained logging to get more details? What about database timeouts? Could it be that the extraction queries are running too long, and that the database just cuts them off?

Related

Access local front-end connected to Azure SQL Server back-end very slow

I've been using Access to rapid-prototype a DB. Now I'd like to do a small group online test so I split the DB and placed the back-end on Azure SQL Server, then re-linked. It's incredibly slow and I've been researching solutions for days without positive results. My local environment is Win10, Office2016 64bit and internet connection is fast and stable.
I have tried different ODBC drivers, including the SQL Native Client v11.
I've disabled auto-tuning level on the NIC.
I've recreated all queries from access on the server.
I've made sure that Tracing in ODBC is off.
But I enabled tracing temporarily to see what was happening. If I opened the front-end, logged in (Small user table), and did something on the first form (Add 1 record with 3 sub-records...and really...nothing fancy or heavy at all and this only takes 1 minute) then closed the DB, I see that the Tracing log file is 1.5MB.
So I created an empty Access file and an ODBC link to only the User table (12 columns, 20 records), and then monitored the tracing log file again. Opening access doesn't add anything to the log file, but opening this one, linked table made the log file grow to 255kb. Opening this table in access took 5 seconds.
Access sent about 800 requests to the server for opening just this one small table. If I paste all the User table data into a text file, it's only 2kb. SO why is it so slow?
Any ideas on this, and specifically other suggestions to get this working faster?
Kind regards,
Well, the reason why using Azure is slower than running Access connected to a local instance of SQL server is because, well slow is slow!
I mean, if you going to travel 30 miles, you have a choice to walk, or to take a car.
So here is the question you need to know:
Why is walking slower than driving a car?
Answer: Because you are travelling at a slower speed!
So why is using Azure slower the using an instance of SQL server running on your local computer or local network?
Answer:
Because the connection speed to Azure is about 100 times slower!
The idea here that you not going to take into account the DIFFERENCE in connection speed is the issue here. It is a disservice to the reading public that may conclude that such a setup (Access front end on a pc to Azure instance of SQL server) is not a viable setup.
So the first issue here is to make a note of your connection speed to the back end database.
A typical office local area network has a speed of 100mbits, or today most are 1gig – even the el-cheapo routers you purchase at Best Buy are now rated at 1gig (1000 mbits).
However, your typical high speed internet is rated at about 5, or 10 mbits. So that is 100 times slower. (Actually 1000/5 = 200 times slower!!!).
That means if something NOW takes 3 seconds on your office network with Access and SQL server? Well then a WAN (over the internet), then you need to multiple the time by the change in your connection speed (this is so simple – yet it seems to escape all!). So, if you lucky, you might have a 5 mbits speed rating for your internet. That means you go
1000 / 5 = 200
You now take the 200 and multiple the existing delay you have of say 3 seconds and you get 600 seconds (that is 10 minutes if you are wondering!). So you going from 3 seconds to 10 minutes!
This kind of comparison in speed would be like walking into a sports shop to purchase a rubber boat to cross the Atlantic. So not taking into account the change in internet speed and wondering why things are slow is the issue here.
You can most certainly use Access to Azure, but you have to realize two simple concepts.
a connection and test with a connection that is 50-200 times slower than your LAN is a test that going to run 50 to 200 times slower! The failure to mention and take into considering the MASSIVE DIFFERENCE in your speed connection of your LAN compared to a WAN is the simple issue here.
opening a form bound to a large table of data is going to case performance issues.
I was sitting at the bus stop talking to a 90 year old granny lady. I asked her the following:
Have you ever used an instant teller?
She said, why yes, I use them all the time.
I then asked here don’t you think it would be bad to have the teller machine download all the peoples accounts while you wait and THEN ask you for your account number?
The old lady stated, of course, that would be silly. I type in my account pin and the machine ONLY downloads my account information – this is practical and obvious.
In other words that old lady realised that downloading a bunch of data BEFORE you the user even types in or does anything is a waste of bandwidth.
So you never want to launch a form bound to a table and THEN ask the user what record to work on. Why have Access download large numbers of records into a form and THEN ask the user or allow the user to navigate to the required record?
Even when using Google, it does not download the whole internet into your web browser page and you then go ctrl+f to search the contents of that web page.
The same concepts should be applied to Access applications. A design that asks for what to work on and then launches a form bound to a table with a "where" clause will thus fix this issue.
So if you have a form (and even a sub form) that displays a customer invoice, you would FIRST ASK FOR the invoice number, and then simply launch that form using a where clause that restricts the form to the ONE invoice!
Keep in mind that you can STILL use that invoice form bound to a table of 1 million rows and ONLY THE ONE record will be pulled down the network connection *if one used the where clause.
So a typical internet connection has adequate speed to run a browser, and also has MORE than adequate bandwidth speed to pull down a few records. Access often gets a bad rap for poor performance, but that is ONLY DUE to Access developers IGNORING the obvious advice that downloading tons of things that you don’t yet need into a form will run slow.
So web based applications, or even desktop applications written in vb.net perform well with SQL Azure running in the cloud over that MUCH slower internet connection because those applications don’t launch forms bound to large datasets WITHOUT FIRST simply allowing the user to request what they need to see and view.
As for Access and using SharePoint? That setup can be VERY fast, and in fact MUCH faster than SQL Azure, MySQL or any traditional database system because when you using SharePoint tables and Access, then Access automatic syncs a copy of the data local. This setup means your application will continue to run WITHOUT ANY internet connection. The instant the connection is restored, then the data sync can resume.
This means that if you have a table with 15,000 rows and run a report on that data the report can run and launch in an instant with SharePoint back end since a local copy of the data exists in the front end at ALL TIMES! So this setup is VERY well suited an off line mode or in cases that you have a poor and slow internet connections since you as noted always have local copy of the data – only when a record is changed does a sync occur, and that sync can occur independent of Access. So you change one record – and it starts syncing with SharePoint.
However for larger data sets that have to be updated, then SQL server is far better since you can execute a sql update on 10,000 rows and ZERO network traffic and transfer of data need occur to update those 10,000 rows when using SQL server (a pass through query) and when using SharePoint, the 10,000 rows WILL transfer over the network since the local copy requires the rows to be updated. So that massive advantage of using SharePoint for the database backend does not exist for applications that have to update lots of rows or do lots of row update types of data processing.
So the key concepts and take away here:
The high speed internet connection you have is often 10-200 times slower than your typical cheap office (local) network. So that means a 2 second operation will now take 10-200 times longer.
The Access application needs to be optimized to avoid things like loading too many records into a form. So building search forms etc. That FIRST ASK the user what they need to work on is a basic and simple requirement for all good developers and that includes Access developers.
Access and SharePoint can be the BEST option, and such a setup allows the application to run EVEN WHEN there is no internet connection at all. If table sizes are below say 10,000 rows, then this setup can often be ideal. However for applications that have to update lots of rows and for data processing heavy applications this setup is poor since updates to any rows will case data syncing to occur over the network. This setup is also the cheapest, since a single office 365 account with SharePoint support for Access can be had for $6 per month, and that $6 account allows up to 500 free users and those 500 users can even use their Gmail or non-Microsoft account for this setup. And such access applications that do fit within the bounds of SharePoint tables tend to need far less changes and optimizing then using SQL server over the internet.
With SQL server, use of views, pass-tough query and in some cases writing store procedures allows updates and code to run WITHOUT using ANY bandwidth. So one can send a single update query to the server that updates 10,000 rows of data – the only network cost will be the “tiny” amount of bandwidth to send that sql statement.
So while bound forms can be used with SQL Azure running in the cloud, one needs to build software like those do for the web, or vb.net in which they FIRST ask the user what account or customer to work on and THEN launch the UI to display that given data.
So in access, you build a search form say like this:
So at the end of the day, it is important to ignore posts here that suggests Access to SQL in the cloud is not viable. Access with proper designs will work rather well over typical internet connections to SQL server running on Azure.
In fact I seen people use Access to SQL over a 56k modem!
One has to adopt sensible designs in which the data pulled for a given task is restricted – this is a hall mark of all developers – the only issue is Access does NOT enforce this approach while most other developer tools don’t let you hang yourself with things like bound forms to large tables! It not that Access is slow, but Access is slow when you make poor design decisions.
Access to SharePoint can be a real winner – especially for poor bandwidth, spotty bandwidth and even when the connection is lost, the application will continue to run and run faster than 99% of the cases if you were running the same application with a SQL back end. There is a BIG caveat here since only certain types of applications will work well with SharePoint tables. For me to explain the why, how, and when such applications are better is beyond a simple post here, but one simply needs to be aware that SharePoint can be incredible solution, but not for all applications and SQL server can and will be better choice. This SharePoint “better” choice can only be determined on a case by case evaluation of the given type of application in question.
The problem is simply that Azure SQL Database is not very fast running with small DTUs (Database Transaction Units) compared to, say, an in-house instance of SQL Server hosted on even a moderate modern server.
I've checked it out too, and it requires extremely careful design of queries and filtering - far from what you normally can get away with - to obtain acceptable overall speed. On the other hand, this is a giving experience that will bring focus to potential bottlenecks you otherwise wouldn't encounter before it might be too late.
OK, so after almost a week of trying to get this to work (Access front-end to SQL Server back-end on Azure), I've come to the conclusion that it's not a viable solution.
I've tried SQL Server, and setup a Sharepoint 2016 server on Azure, which also failed.
What has worked is using a product from Bullzipp called MS Access to MySQL to convert the access tables, then adding a MySQL DB on the server and importing the file generated by Bullzip. The only thing to note here is that Bullzip doesn't like the newer access formats (it wants an MDB file) so go to Access, create a new, empty file, but make sure you set its file type to MDB. then import your tables across and run Bullzip.
It's now working a hell of a lot faster than the SQL Server, but I am getting some write conflicts in Access, so I just need to go through the code and do whatever I need to so I can avoid those messages.
Using Access as a front to Azure SQL tables is the worst solution. But, sometimes you have to do it. I have a client who is adamant that she wants to keep her Access database. When she hired her very first employee, it became clear she needed to SQL tables behind the screens.
This was a bit of a nightmare. However, after redesigning some terrible table structures, creating views and many procs, I've been able to do it. I use local tables in some cases, and refill by pulling from a stored proc and inserting into the local table. I use linked tables for basic data edits, and do explicit save records almost constantly.
I also have a first-load module that opens all forms, goes to the last record, back to the first record, and then hides the form until needed. The load limps along for about 3
My only remaining issue is now that Azure will close connections after idle time of (I think) 30 or more minutes -- or maybe it's when the laptop sleeps? That kills the app and it has to be closed and re-opened.

Postgresql: direct replication on demand

I have two Postgresql databases (on different servers): lets call them stable and unstable. The stable is for read purposes only. From time to time the the stable is updated to the current unstable.
I would like to improve the update process. Dumping to a file and restoring is not possible any more due to the size of database. I found many replication tools designed for a master-slave configuration, but that is not exactly what I do, as the slave (stable) is updated on demand. The tools seem to be more complicated that the problem I want them to solve.
Ideally, I would like a simple tool with interface like pg_replicate source target which would result in target db being exact clone of source and the data being transfered directly between the two database servers (they are connected by a local network) without creating large temporary files. The database size is 50 GB and growing, so transfering SQL inserts might not suffice.
So, is there a tool which could do it? Or anything close?

SQL Server Table > MS Access Local Copy?

I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, w/400 jobs this qtr, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the remote tables. Works as expected.
Performance here in US is decent - 10-15 seconds for ~40K records. Our India teams are seeing >5-10 minutes for the same datasets. Their internet connection is decent, not great and a variable I cannot control.
I am wondering if MS Access is adding some overhead here than can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting vs Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 7, 10, 13 users.
Thanks!
It's not entirely clear but if the MSAccess database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN, there is too much latency.
When Access queries data, is doesn't open a stream, it fetches blocks of it, wait for the data wot be downloaded, then request another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India has probably around 200ms latency and you can't do much about it as it adds up very quickly if the communication protocol is chatty, all this on top of the connection's bandwidth that is very likely way below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest is maybe to automatically perform this dump every day and making it available on an FTP server or an internal website ready for download.
You can also make it available on demand, maybe trough an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server or simply though a website, or a shell.
You could also have a simple windows service on your SQL Server that listens to requests for a remote client installed on the local machines everywhere, that would process the dump and sent it to the client which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid using Access in an automated way. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
Renaud and all, thanks for taking time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks (vs a continguous DL) of data is exactly what I was hoping to avoid via an alternate approach.
Or workflow is evolving to better leverage both sides of the clock where User1 in US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
The primary issue is the initial DL of the local DB tables from the server (huge multi-year DB) for the current "job" - should happen just once at the start of the effort (~1 wk long process) This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP - DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamped-based push-pull of just the changes daily would have been an overall plus. Seems to be, but the initial DL hurdle remains.

Failover strategy for database application

I've got a writing and reading database application holding a local cache. In case of an application server fault a backup server shall start working.
The primary and backup application can only run exclusively because of its local cache and some low isolation level on the database.
As far as my communication knowledge goes it is impossible to let both servers always figure out who is allowed to run exclusively.
Can I somehow solve this communication conflict through using the database as a third entity? I think this is a quite typical problem and there might not be a 100% safe method, but I would be happy to know how other people recommend to solve such issues? Or if there is some best practice to this.
It's okay if both application are not working for 30 minutes or so, but there is not enough time to get people out of bed and let them figure out what the problem is.
Can you set up a third server which is monitoring both application servers for health? This server could then decide appropriately in case one of the servers appears to be gone: Instruct the hot standby to start processing.
if i get the picture right, your backup server constantly polls the primary server for data updates, it wouldn't be hard to check if the poll fails, schedule it again for 30s later 3 times and in the third failure dynamically update the DNS entry to the database server to reflect the change in active server. Both Windows DNS and Bind accept dynamic updates signed and unsigned.

SQL Server transactional replication for very large tables

I have set up transactional replication between two SQL Servers on different ends of a relatively slow VPN connection. The setup is your standard "load snapshot immediately" kind of thing where the first thing it does after initializing the subscription is to drop and recreate all tables on the subscriber side and then start doing a BCP of all the data. The problem is that there are a few tables with several million rows in them, and the process either a) takes a REALLY long time or b) just flat out fails. The messages I keep getting when I look in Replication Monitor are:
The process is running and is waiting for a response from the server.
Query timeout expired
Initializing
It then tries to restart the bulk loading process (skipping any BCP files that it has already loaded).
I am currently stuck where it just keeps doing this over and over again. It's been running for a couple days now.
My questions are:
Is there something I could do to improve this situation given that the network connection is so slow? Maybe some setting or something? I don't mind waiting a long time as long as the process doesn't keep timing out.
Is there a better way to do this? Perhaps make a backup, zip it, copy it over and then restore? If so, how would the replication process know where to pick up when it starts applying the transactions, since updates will be occurring between the time I make the backup and get it restored and running on the other side.
Yes.
You can apply the initial snapshot manually.
It's been a while for me, but the link (into BOL) has alternatives to setting up the subscriber.
Edit: From BOL How-tos, Initialize a Transactional Subscriber from a Backup
In SQL 2005, you have a "compact snapshot" option, that allow you to reduce the total size of the snapshot. When applied over a network, snapshot items "travel" compacted to the suscriber, where they are then expanded.
I think you can easily figure the potential speed gain by comparing sizes of standard and compacted snapshots.
By the way, there is a (quite) similar question here for merge replication, but I think that at the snapshot level there is no difference.

Resources