I'm in charge of fixing part of an application that syncs data entities from a DB2 database running on an iSeries to a SQL Server database on Azure. The application seems to run this synchronization just fine when the iSeries and the SQL Server host are on the same local network. However, add in the change in latency when the database host is in Azure and this process slows to an unacceptable level.
I'm looking for ideas on how to proceed. Some additional information:
Some of our clients might have millions of records in some tables.
An iSeries platform is not virtualizable/containerizable as far as I know.
Internet connection speed will vary wildly. Highly dependent on the client.
The current solution uses an ODBC driver to get data from the iSeries. I've started looking at taking a different approach as far as retrieving the data including creating API's (doesn't seem to be the best idea for transferring lots of data) or using Azure Data Services, but, those are really pieces to a bigger puzzle.
Edit: I'm trying to determine what the bottleneck is and how to fix it. I forgot to add that the IT manager (we only have one person in IT... we are a small company) has setup a test environment using a local iSeries and an instance of our software in Azure. He tried analyzing the network traffic using Wireshark and told me we aren't using tons of bandwidth, but, there seems to be alot of TCP "chatter". His conclusion was we need to consider another way to get the data. I'm not convinced and that is part of the reason I'm here asking this question.
More details:
Our application uses data from an ERP system that runs on the iSeries.
Part of our application is the sync process in question.
The process has two components: a full data sync and a delta/changes data sync.
When these processes are run when both the iSeries and our application are on site, the performance isn't great, but, it is acceptable.
The sync process is written in C# and connects to DB2 using an ODBC driver.
We do some data manipulation and verification as part of the sync process. i.e., making sure parent records exist, foreign keys are appropriate, etc.
Related
First of all: I'm a newb. Its my first time working in a important project.
This is my plan:
How it works now:
The rasp has a state based on the others connected to the same computer. The computer has an algorithm to calculate these states.
The rasp also generate data every second, and send it directly to the server using http requests.
The server shows a page to the user based on rasp properties, where the user want to have access to the rasp state, memory load, camera stream and also send commands.
Every new Computer needs to be registered on the Server Computer, also every Computer and Rasp properties.
So if I add a new camera to the rasp, someone needs to update the Server database.
This approach is not very scalable. I'm targeting a very scalable solution.
So, I'm considering:
Each computer has an individual database and an API answer http requests. This database hold rasp information, like the cameras, current state, etc...
In this approach, it's better to the computer send the information or to the server asks for it?
The server has access to the Computers databases via database synchronization (I need to do a better research about this). So, no to individual APIs, only individual databases synchronized with the server.
Suggestions?
OBS: in the current state, all the rasps, computers and the server are inside a single VPN.
Why are you bothering with the intermediary computers/databases? Have the Raspberry make HTTP request to central server to "register" itself, and have it send it's content periodically to that centralized server.
If there is too much load, split that centralized server into application/database tiers and load-balance the application tier. If there is still too much load on the database, re-architect your database.
Having multiple sources of "truth" is one of the biggest downfalls of an application design. Don't try and synchronize your way out of this, just create a single point of truth. If you need to scale that centralized db higher, then look at solutions that are amenable to that, like NoSQL solutions instead of trying to distribute it out to the clients.
In this approach, it's better to the computer send the information or to the server asks for it?
Let the server request the data when it needs it... and when it can! If the computers send data whenever they feel it, your server may have performance problems.
The server has access to the Computers databases via database synchronization (I need to do a better research about this). So, no to individual APIs, only individual databases synchronized with the server.
Try to make the replication one-way only, so identify the tables used for the computers' data and do not write in them on the server.
Database replication may be a solution for information sharing, but it implies a heavy coupling of the systems because of the schema structure. Isolate the shared tables.
Suggestions?
Use a database that supports replication (mySql does, I think), don't write your own system
Compute/aggregate/adapt the data on your computers to make the work of the server easier
Every new Computer needs to be registered on the Server Computer
Have a look at tools like Consul
I have a Microsoft Access .accdb database on a company server. If someone opens the database over the network, and runs a query, where does the query run? Does it:
run on the server (as it should, and as I thought it did), and only the results are passed over to the client through the slow network connection
or run on the client, which means the full 1.5 GB database is loaded over the network to the client's machine, where the query runs, and produces the result
If it is the latter (which would be truly horrible and baffling), is there a way around this? The weak link is always the network, can I have queries run at the server somehow?
(Reason for asking is the database is unbelievably slow when used over network.)
The query is processed on the client, but that does not mean that the entire 1.5 GB database needs to be pulled over the network before a particular query can be processed. Even a given table will not necessarily be retrieved in its entirety if the query can use indexes to determine the relevant rows in that table.
For more information, see the answers to the related questions:
ODBC access over network to *.mdb
C# program querying an Access database in a network folder takes longer than querying a local copy
It is the latter, the 1.5 GB database is loaded over the network
The "server" in your case is a server only in the sense that it serves the file, it is not a database engine.
You're in a bad spot:
The good thing about access is that it's easy to create forms and reports and things by people who are not developers. The bad is everything else about it. Particularly 2 things:
People wind up using it for small projects that grow and grow and grow, and wind up in your shoes.
It sucks for multiple users, and it really sucks over a network when it gets big
I always convert them to a web-based app with SQL server or something, but I'm a developer. That costs money to do, but that's what happens when you use a tool that does not scale.
I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, w/400 jobs this qtr, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the remote tables. Works as expected.
Performance here in US is decent - 10-15 seconds for ~40K records. Our India teams are seeing >5-10 minutes for the same datasets. Their internet connection is decent, not great and a variable I cannot control.
I am wondering if MS Access is adding some overhead here than can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting vs Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 7, 10, 13 users.
Thanks!
It's not entirely clear but if the MSAccess database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN, there is too much latency.
When Access queries data, is doesn't open a stream, it fetches blocks of it, wait for the data wot be downloaded, then request another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India has probably around 200ms latency and you can't do much about it as it adds up very quickly if the communication protocol is chatty, all this on top of the connection's bandwidth that is very likely way below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest is maybe to automatically perform this dump every day and making it available on an FTP server or an internal website ready for download.
You can also make it available on demand, maybe trough an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server or simply though a website, or a shell.
You could also have a simple windows service on your SQL Server that listens to requests for a remote client installed on the local machines everywhere, that would process the dump and sent it to the client which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid using Access in an automated way. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
Renaud and all, thanks for taking time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks (vs a continguous DL) of data is exactly what I was hoping to avoid via an alternate approach.
Or workflow is evolving to better leverage both sides of the clock where User1 in US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
The primary issue is the initial DL of the local DB tables from the server (huge multi-year DB) for the current "job" - should happen just once at the start of the effort (~1 wk long process) This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP - DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamped-based push-pull of just the changes daily would have been an overall plus. Seems to be, but the initial DL hurdle remains.
We're getting ready to build a new platform for our current system. Currently we install sql server express locally to all our clients and all their data is stored there. While the process works pretty good, it's still a pain to add columns/tables etc. We also want to have our data available outside of the local install. So we're moving to a central web based sql database and creating a web based application. Our new application will be a Silverlight 5, wcf ria services, mvvm, entity framework application
We've decided that either a web hosted sql server database or sql azure database are the way to go. However, I have no idea why I would choose one over the other. The limitations of azure don't seem to apply to us, but our application will be run on our current shared web host. Is it better to host the application on the same server as the database? Do we even know with shared web hosting that the server is on the same location as the app? There's also the marketing advantage of being 'in the cloud' which our clients love when we drop that word (they have no idea about anything technical, it's just a buzzword for them). I'm not too worried about the cost as I think both will ultimately be about the equivalent of each other.
I feel like I may be completely overthinking this and either will work, however I'd like to try and get the best solution for us and don't want to choose without getting some feedback.
In case it helps, our application is mostly dashboard/informational data. Mostly financial and trending data. It's almost entirely read only. Sometimes the data can get fairly large and we would be sending upwards of 50,000 rows of data to the application.
Thanks for any help/insight you can provide for me!
The main concerns I would have with using a SQL Azure DB from an application on your current shared web host would be
The effect of network latency: Depending on location, every time you do a DB round trip from your application to the SQL Azure DB you will incur a 50-100ms delay. If your application does lots of round trips, this will mount up. Often, if an application has been designed to work with a DB on the LAN (you use of local client DBs suggests this) the they tend to get "chatty" since network delays are very small on the LAN. You may find your application slows down significantly.
Security: You will have to open up the SQL Azure firewall to the IP address(es) that your application presents when querying. Depending on your host, it may be that this IP address is shared between several tenants. This would be a vulnerability.
If neither of these is a problem, then SQL Azure will provide a much lower management overhead (e.g. no need to patch etc.) and will give you very high reliability, especially in terms of the risk of data loss.
I am building an Asp.net MVC site where I have a fast dedicated server for the web app but the database is stored in a very busy Ms Sql Server used by many other applications.
Also if the web server is very fast, the application response time is slow mainly for the slow response from the db server.
I cannot change the db server as all data entered in the web application needs to arrive there at the end (for backup reasons).
The database is used only from the webapp and I would like to find a cache mechanism where all the data is cached on the web server and the updates are sent to the db asynchronously.
It is not important for me to have an immediate correspondence between read db data and inserted data: think like reading questions on StackOverflow and new inserted questions that are not necessary to show up immediately after insertion).
I thought to build an in between WCF service that would exchange and sync the data between the slow db server and a local one (may be an Sqllite or an SqlExpress one).
What would be the best pattern for this problem?
What is your bottleneck? Reading data or Writing data?
If you are concerning about reading data, using a memory based data caching machanism like memcached would be a performance booster, As of most of the mainstream and biggest web sites doing so. Scaling facebook hi5 with memcached is a good read. Also implementing application side page caches would drop queries made by the application triggering lower db load and better response time. But this will not have much effect on database servers load as your database have some other heavy users.
If writing data is the bottleneck, implementing some kind of asyncronyous middleware storage service seems like a necessity. If you have fast and slow response timed data storage on the frontend server, going with a lightweight database storage like mysql or postgresql (Maybe not that lightweight ;) ) and using your real database as an slave replication server for your site is a good choise for you.
I would do what you are already considering. Use another database for the application and only use the current one for backup-purposes.
I had this problem once, and we decided to go for a combination of data warehousing (i.e. pulling data from the database every once in a while and storing this in a separate read-only database) and message queuing via a Windows service (for the updates.)
This worked surprisingly well, because MSMQ ensured reliable message delivery (updates weren't lost) and the data warehousing made sure that data was available in a local database.
It still will depend on a few factors though. If you have tons of data to transfer to your web application it might take some time to rebuild the warehouse and you might need to consider data replication or transaction log shipping. Also, changes are not visible until the warehouse is rebuilt and the messages are processed.
On the other hand, this solution is scalable and can be relatively easy to implement. (You can use integration services to pull the data to the warehouse for example and use a BL layer for processing changes.)
There are many replication techniques that should give you proper results. By installing a SQL Server instance on the 'web' side of your configuration, you'll have the choice between:
Making snapshot replications from the web side (publisher) to the database-server side (suscriber). You'll need a paid version of SQLServer on the web server. I have never worked on this kind of configuration but it might use a lot of the web server ressources at scheduled synchronization times
Making merge (or transactional if requested) replication between the database-server side (publisher) and web side(suscriber). You can then use the free version of MS-SQL Server and schedule the synchronization process to run according to your tolerance for potential loss of data if the web server goes down.
I wonder if you could improve it adding a MDF file in your Web side instead dealing with the Sever in other IP...
Just add an SQL 2008 Server Express Edition file and try, as long as you don't pass 4Gb of data you will be ok, of course there are more restrictions but, just for the speed of it, why not trying?
You should also consider the network switches involved. If the DB server is talking to a number of web servers then it may be being constrained by the network connection speed. If they are only connected via a 100mb network switch then you may want to look at upgrading that too.
the WCF service would be a very poor engineering solution to this problem - why make your own when you can use the standard SQLServer connectivity mechanisms to ensure data is transferred correctly. Log shipping will send the data across at selected intervals.
This way, you get the fast local sql server, and the data is preserved correctly in the slow backup server.
You should investigate the slow sql server though, the performance problem could be nothing to do with its load, and more to do with the queries and indexes you're asking it to work with.