I would like to connect to the same database on Cloudant with different applications. Only one of the applications is going to be writing to the database, and the other would be reading only.
Is this okay to do?
Yep! Perfectly OK.
As Kim mentioned, the only thing to worry about with multiple asynchronous connections arises when different applications/clients try updating the same doc. When this happens, Cloudant will create a conflict, storing both updates so you can sort out which is correct.
Check out Conflict Management in CouchDB for why this occurs and how to deal with it, and the Cloudant docs on conflicts for methods to examine conflicts.
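For example, here is a minimal sketch of spotting and resolving a conflicted doc over Cloudant's HTTP API with Python's requests library; the account URL, credentials, and doc ID are all placeholders:

```python
# Minimal sketch: detect and resolve conflicts on a single doc.
# DB_URL, AUTH, and the doc ID "mydoc" are placeholders, not real values.
import requests

DB_URL = "https://ACCOUNT.cloudant.com/mydb"
AUTH = ("username", "password")

# Ask Cloudant to include any conflicting revisions with the doc.
doc = requests.get(f"{DB_URL}/mydoc", params={"conflicts": "true"}, auth=AUTH).json()

# Resolve by deleting the revisions you decide are losers; the surviving
# revision becomes the winner. A real app would merge fields first.
for losing_rev in doc.get("_conflicts", []):
    requests.delete(f"{DB_URL}/mydoc", params={"rev": losing_rev}, auth=AUTH)
```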
I am developing various services which talk directly to BigQuery by streaming rows into the database. Right now I am updating the schema directly from the Google Cloud UI, which, as you can imagine, has been causing issues due to forgetfulness!
I would like to understand how best to keep code & schemas aligned for what are still fast evolving services and schemas.
My current ideas are:
use something like Terraform, but I am unsure how this works on live tables which need updating or migrating
add code to the service to check / set the schema, which would at least throw errors if not automate the process
Thanks in advance!
EDIT:
To give more clarity as requested in the comments: we are using Cloud Run microservices to stream rows into BigQuery; the services are written in Python/Node. Their primary goal is to do a light transform on the data and store it in BQ.
Not really sure what more to add. My ideal scenario is that we have something in the code which also defines, or at least checks, the schema, to keep the code and the db in sync.
As with any database, you have to follow some rules and best practices to avoid errors and conflicts. For example, you should avoid updating the schema manually and instead always do it in code (which is better because you can follow schema changes through your git history).
Then you can have a side process that checks the schema and updates it with the newest changes, or do this at startup (only if you aren't on serverless and app start duration isn't a concern for you), as in the sketch below.
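As a sketch of that check-and-set idea, using the official google-cloud-bigquery Python client; the table ID and field list are made-up examples, and note that this path only allows additive changes:

```python
# Sketch: make sure the live table has at least the fields the code expects.
# Project/dataset/table and the field list are examples, not real names.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.events"

expected = [
    bigquery.SchemaField("event_id", "STRING"),
    bigquery.SchemaField("payload", "STRING"),
    bigquery.SchemaField("ingested_at", "TIMESTAMP"),
]

table = client.get_table(table_id)
existing = {field.name for field in table.schema}
missing = [field for field in expected if field.name not in existing]

if missing:
    # Adding NULLABLE columns is the only schema change allowed this way;
    # type changes or column drops need a proper migration.
    table.schema = list(table.schema) + missing
    client.update_table(table, ["schema"])
```

Run at startup this at least guarantees the table can accept every row the service streams; run as a side process it doubles as a drift check.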
Terraform is perfect for deploying infrastructure, but it is much more limited when it comes to updating/patching existing components. I don't recommend it here.
I have to write a web service in PHP to serve three different zones (cities or countries). Each zone will have its own machine running an instance of this web service. Behind every web service is a database which is an exact clone/copy in each region; the web service serves clients with data from the db. The main reason for multiple instances of the web service is to distribute client load.
The clients can make read and write calls via web service APIs.
Write calls will modify the database for that instance, but the change has to be applied as soon as possible to the databases in all other zones too. Since all the databases in each zone are exact clones, a change in one db must be synced to the databases in all other zones.
I presume the write calls must go to some kind of master server which coordinates among all the web services etc. But I am sure this pattern is quite common and some solution is already out there.
Please advise if there is any database or application level technique which would keep the databases in sync when there are write calls, so that a modification or addition is reflected in all instances of the db? I can choose the database of my choice, but my primary choice would be MySQL or Postgres; I can change to another database which solves this issue.
You're right, this pattern is quite common and there is a name for it - Synchronous Master-Master replication. Most modern RDBMS support it:
PostgreSQL supports it through PgCluster https://wiki.postgresql.org/wiki/PgCluster
MySQL https://www.howtoforge.com/mysql_master_master_replication
But before implementing it straight away I'd recommend reading more about different types of replication, their pros and cons:
https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
https://dev.mysql.com/doc/refman/8.0/en/replication.html
Synchronous Master-Master replication will be quite slow, especially in a multi-zone scenario, so you might consider other techniques (see the sketch below):
Asynchronous replication
Sharding/Partitioning
A mix of sharding and replication
There is a very good book on different distributed techniques (including sharding and replication) - "Designing Data-Intensive Applications" by Martin Kleppmann.
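To illustrate the single-primary variant you presume in your question (all writes go to one zone, reads stay local), here is a rough application-level sketch, assuming PostgreSQL with psycopg2; the connection strings, table, and columns are placeholders:

```python
# Sketch: route writes to the primary zone, reads to the local replica.
# DSNs are placeholders; the replication itself is configured in the database.
import psycopg2

PRIMARY_DSN = "host=primary.zone1.example dbname=app user=app"  # accepts writes
REPLICA_DSN = "host=replica.local.example dbname=app user=app"  # local read copy

def read_user(user_id):
    # Reads hit the nearby replica; it may lag slightly behind the primary.
    with psycopg2.connect(REPLICA_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT name, email FROM users WHERE id = %s", (user_id,))
        return cur.fetchone()

def update_email(user_id, email):
    # All writes go to the single primary; asynchronous replication then
    # fans the change out to every zone's replica.
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute("UPDATE users SET email = %s WHERE id = %s", (email, user_id))
```

The trade-off is the one in the links above: reads are fast and local, but a replica can serve slightly stale data until replication catches up.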
Replication techniques are definitely worth looking at, but there can be a certain amount of technical overhead and cost to replication. I work for a company called Redactics (https://www.redactics.com), and we came up with a simpler solution that is a sort of near-realtime replication based on delta updates, using a pure SQL approach.
There are certainly pros and cons to both approaches, I'm not trying to push Redactics hard if this is not the most appropriate solution for your needs, but Redactics simply tracks the most recent primary keys and uses modification timestamps to find new and changed records, and then copies them over. You can run the sync pretty often without a lot of load since it is just a delta update. Obviously any workflow can break, but repairing broken replication can be tricky, so we like this approach and running these sync workflows within your own infrastructure.
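In spirit the delta update is something like this very rough sketch (not Redactics code; tables, columns, and connection strings are invented, assuming PostgreSQL via psycopg2):

```python
# Rough sketch of a delta sync: new rows have a higher primary key, changed
# rows have a newer modification timestamp. All names/DSNs are invented.
import psycopg2

src = psycopg2.connect("host=source.example dbname=app user=sync")
dst = psycopg2.connect("host=replica.example dbname=app user=sync")

def sync_table(table, columns, last_max_id, last_sync_ts):
    with src.cursor() as cur:
        cur.execute(
            f"SELECT {', '.join(columns)} FROM {table} "
            "WHERE id > %s OR updated_at > %s",
            (last_max_id, last_sync_ts),
        )
        rows = cur.fetchall()
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != "id")
    with dst.cursor() as cur:
        for row in rows:
            # Upsert so re-running the same sync window is harmless.
            cur.execute(
                f"INSERT INTO {table} ({', '.join(columns)}) "
                f"VALUES ({placeholders}) "
                f"ON CONFLICT (id) DO UPDATE SET {updates}",
                row,
            )
    dst.commit()

sync_table("orders", ["id", "status", "updated_at"],
           last_max_id=1041, last_sync_ts="2024-01-01")
```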
I am new to iPad development. I have to develop an app for a client whose employees use iPads. The app should take the data that they have and store it in the main SQL Server database on their server. While researching, I came across the approach of keeping data on the iPad and later syncing it with the server. I have used SQLite for Android before, but that was a school project, basically just CRUD operations. So, since I have a little knowledge of SQLite, I want to pursue this app in this way. My question is: can I write an app that will sync temporary SQLite data with the server once they sync? I have more questions...
Thanks.
It is certainly possible to synchronize data between multiple databases.
Generally speaking, you have to record all changes made since the last synchronization (usually done with serial numbers or timestamps), and apply those changes to the other database.
If the same data has been modified by multiple users, you have to resolve this conflict somehow.
If multiple users can add data, you have to prevent duplicates of primary keys.
See these Wikipedia articles for explanations of some related concepts:
Data synchronization
Replication
Change data capture
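Putting those three points together, here is a small sketch using sqlite3 from the standard library; the table names and the conflict rule are just illustrative:

```python
# Sketch of client-side sync bookkeeping: UUID keys to avoid duplicate
# primary keys, timestamps to find changes, a check for conflicting edits.
import sqlite3

conn = sqlite3.connect("app.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS items (
    id          TEXT PRIMARY KEY,   -- UUIDs, so two devices never collide
    body        TEXT,
    modified_at REAL NOT NULL       -- set on every local edit
);
CREATE TABLE IF NOT EXISTS sync_state (last_sync REAL NOT NULL);
""")

def local_changes():
    """Rows edited since the last successful sync; these get pushed to the server."""
    row = conn.execute("SELECT MAX(last_sync) FROM sync_state").fetchone()
    last = row[0] or 0.0
    return conn.execute(
        "SELECT id, body, modified_at FROM items WHERE modified_at > ?", (last,)
    ).fetchall()

def apply_server_change(item_id, body, server_modified_at, last_sync):
    """Apply one server row; flag the case where both sides changed it."""
    local = conn.execute(
        "SELECT modified_at FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    if local and local[0] > last_sync:
        raise RuntimeError(f"conflict on {item_id}: resolve before applying")
    conn.execute(
        "INSERT OR REPLACE INTO items VALUES (?, ?, ?)",
        (item_id, body, server_modified_at),
    )
    conn.commit()
```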
This may solve the problem, but it only supports Xamarin (iOS or Android).
http://forums.xamarin.com/discussion/5719/sync-sqlite-with-sql-server-merge-replication
Need a little advice here. We do some Windows Mobile development using the .NET Compact Framework and SQL CE on the mobile devices, along with a central SQL 2005 database at the customers' offices. Currently we synchronize the data using merge replication technology.
Lately we've had some annoying problems with synchronization throwing errors and generally being a bit unreliable. This is compounded by the fact that there seems to be limited information out there on replication issues. This suggests to me that it isn't a commonly used technology.
So, I was just wondering if replication is the way to go for synchronizing data, or are there more reliable methods? I was thinking web services, maybe, or something like that. What do you use for implementing this solution?
Dave
I haven't used replication a great deal, but I have used it and I haven't had problems with it. The thing is, you need to set things up carefully. No matter which method you use you need to decide on the rules governing all of the various possible situations - changes in both databases, etc.
If you are more specific about the "generally being a bit unreliable" then maybe you'll get more useful advice. As it is all I can say is, I haven't had issues with it.
EDIT: Given your response below I'll just say that you can certainly go with a custom replication that uses SSIS or some other method, but there are definitely shops out there using replication successfully in a production environment.
Well, we've had the error occur twice, which was a real pain to fix:
The insert failed. It conflicted with an identity range check constraint in database 'egScheduler', replicated table 'dbo.tblServiceEvent', column 'serviceEventID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
When we tried running the stored procedure it messed with the identities so now when we try to synchronize it throws the following error in the replication monitor.
The row operation cannot be reapplied due to an integrity violation. Check the Publication filter. [,,,Table,Operation,RowGuid] (Source: MSSQLServer, Error number: 28549)
We've also had a few issues where snapshots became invalid, but these were relatively easy to fix. However, all this is making me wonder whether replication is the best method for what we're trying to do here, or whether there's an easier method. This is what prompted my original question.
We're working on a similar situation, except ours involves a tool that works in a disconnected model and runs on the Windows desktop... We're using SQL Server Compact Edition for the clients and Microsoft SQL Server 2005 with a web service for the server side.
To enable synchronization services, we initially started by building our own synchronization framework, but after many issues with keeping that framework in sync with the rest of the system, we opted to go with the Microsoft Synchronization Framework (http://msdn.microsoft.com/en-us/sync/default.aspx for reference). Our initial requirement was to make the application as easy to use as installing other packages like Intuit QuickBooks, and I think we have closely succeeded.
The Synchronization Framework from Microsoft has its ups and downs, but the only bad thing that I can say at this point is that documentation is horrendous.
We're in discussions now to decide whether or not to continue using it or to go back to maintaining our own synchronization subsystem. YMMV on it, but for us, it was a quick fix to the issue.
You're definitely pushing the stability envelope for CE, aren't you?
When I've done this, I've found it necessary to add in a fair amount of conflict tolerance, by thinking of it not so much as synchronization but as simultaneous asynchronous data collection, with intermittent mutual updates and/or refreshes. In particular, I've always avoided using identity columns for anything. If you can strictly adhere to true primary keys based on real (not surrogate) data, it makes things easier. Sometimes a PK comprising SourceUnitNumber and a timestamp works well.
If you can, view the remotely collected data as a simple timestamped log of cumulative, chronologically ordered transactions, tagged with source and user IDs; something like the sketch below. Going the other way, the host provides static validation info which never needs to go back - send back the CRUD transactions instead.
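For instance, a table along these lines (a hypothetical sketch with sqlite3; all names are invented, and note that a millisecond timestamp makes two same-instant writes from one unit a theoretical collision risk):

```python
# Sketch of a log-style table keyed on real data (source unit + timestamp)
# instead of an identity column. All names here are invented.
import sqlite3
import time

conn = sqlite3.connect("unit.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS txn_log (
    source_unit INTEGER NOT NULL,   -- which device collected the row
    recorded_at INTEGER NOT NULL,   -- epoch milliseconds
    user_id     TEXT    NOT NULL,   -- who made the change
    operation   TEXT    NOT NULL,   -- 'C', 'U', or 'D'
    payload     TEXT    NOT NULL,   -- body of the CRUD transaction
    PRIMARY KEY (source_unit, recorded_at)   -- no identity ranges to manage
)
""")

def log_transaction(source_unit, user_id, operation, payload):
    # The (source_unit, recorded_at) pair stays unique across all devices,
    # so merging logs on the host can never collide on a key.
    conn.execute(
        "INSERT INTO txn_log VALUES (?, ?, ?, ?, ?)",
        (source_unit, int(time.time() * 1000), user_id, operation, payload),
    )
    conn.commit()
```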
Post back how this turns out. I'm interested in seeing any kind of reliable Microsoft technology that helps with this.
TomH & le dorfier - I think that part of our problem is that we're allowing the customer to insert a large number of rows into one of the replicated tables with an identity field. It's a scheduling application which can automatically generate multiple tasks up to a specified month/year. One of the times that it failed was around the time they entered 15000 rows into the table. We'll look into increasing the identity range.
The synchronization framework sounds interesting, but it sounds like it suffers from a similar problem to replication: poor documentation. Trying to find help on replication is a bit of a nightmare, and I'm not sure I want us to move to something with similar issues. Wish Microsoft would stop releasing stuff that only seems to have the support of beta software!
My current development project has two aspects to it. First, there is a public website where external users can submit and update information for various purposes. This information is then saved to a local SQL Server at the colo facility.
The second aspect is an internal application which employees use to manage those same records (conceptually) and provide status updates, approvals, etc. This application is hosted within the corporate firewall with its own local SQL Server database.
The two networks are connected by a hardware VPN solution, which is decent, but obviously not the speediest thing in the world.
The two databases are similar, and share many of the same tables, but they are not 100% the same. Many of the tables on both sides are very specific to either the internal or external application.
So the question is: when a user updates their information or submits a record on the public website, how do you transfer that data to the internal application's database so it can be managed by the internal staff? And vice versa... how do you push updates made by the staff back out to the website?
It is worth mentioning that the more "real time" these updates occur, the better. Not that it has to be instant, just reasonably quick.
So far, I have thought about using the following types of approaches:
Bi-directional replication
Web service interfaces on both sides with code to sync the changes as they are made (in real time).
Web service interfaces on both sides with code to asynchronously sync the changes (using a queueing mechanism).
Any advice? Has anyone run into this problem before? Did you come up with a solution that worked well for you?
This is a pretty common integration scenario, I believe. Personally, I think an asynchronous messaging solution using a queue is ideal.
You should be able to achieve near real time synchronization without the overhead or complexity of something like replication.
Synchronous web services are not ideal because your code will have to be very sophisticated to handle failure scenarios. What happens when one system is restarted while the other continues to publish changes? Does the sending system get timeouts? What does it do with those? Unless you are prepared to lose data, you'll want some sort of transactional queue (like MSMQ) to receive the change notices and take care of making sure they get to the other system. If either system is down, the changes (passed as messages) will just accumulate and as soon as a connection can be established the re-starting server will process all the queued messages and catch up, making system integrity much, much easier to achieve.
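MSMQ itself is .NET territory, but to give a feel for the pattern, here is a hedged sketch with RabbitMQ and the Python pika library standing in for the transactional queue; the host, queue name, and message shape are invented:

```python
# Sketch: a durable queue between the two systems. If the consumer is down,
# messages accumulate; acking only after a successful apply prevents data loss.
import json
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("mq.example.internal"))
channel = conn.channel()
channel.queue_declare(queue="record_changes", durable=True)  # survives restarts

def publish_change(record_id, fields):
    # Called by whichever side accepted the edit.
    channel.basic_publish(
        exchange="",
        routing_key="record_changes",
        body=json.dumps({"id": record_id, "fields": fields}),
        properties=pika.BasicProperties(delivery_mode=2),  # persist to disk
    )

def on_change(ch, method, properties, body):
    change = json.loads(body)
    # ... apply `change` to the local database here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

channel.basic_consume(queue="record_changes", on_message_callback=on_change)
channel.start_consuming()
```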
There are some open source tools that can really make this easy for you if you are using .NET (especially if you want to use MSMQ).
nServiceBus by Udi Dahan
Mass Transit by Dru Sellers and Chris Patterson
There are commercial products also, and if you are considering a commercial option see here for a list of options on .NET. Of course, WCF can do async messaging using MSMQ bindings, but a tool like nServiceBus or MassTransit will give you a very simple Send/Receive or Pub/Sub API that will make your requirement a very straightforward job.
If you're using Java, there are any number of open source service bus implementations that will make this kind of bi-directional, asynchronous messaging a snap, like Mule or maybe just ActiveMQ.
You may also want to consider reading Udi Dahan's blog and listening to some of his podcasts. Here are some more good resources to get you started.
I'm mid-way through a similar project except I have multiple sites that need to keep in sync over slow connections (dial-up in some cases).
Firstly, you need to track changes. If you can use SQL 2008 (even the Express version is enough if the 2Gb limit isn't a problem), this will ease the pain greatly: just turn on Change Tracking for the database and each table. We're using SQL Server 2008 at the head office with the extended schema, and SQL Express 2008 at each site with a sub-set of the data and a limited schema.
Secondly, you need to ship those changes around. Sync Services does the trick nicely and supports using a WCF gateway into the main database. In this example you will need to use the Sync using SQL Express Client sample as a starting point; note that it's based on SQL 2005, so you'll need to update it to take advantage of the Change Tracking features in 2008.
By default, Sync Services uses SQL CE on the clients, which I'm sure isn't enough in your case. You'll need a service that runs on your web server and periodically (as often as every 10 seconds, if you want) runs the Synchronize() method. This will tell your main database about changes made locally and then ask the server for all changes made there. You can set up the get-and-apply SQL code to call stored procedures, and you can add event handlers to handle conflicts (e.g. Client Update vs. Server Update) and resolve them accordingly at each end.
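To give a flavour of the Change Tracking pieces, here is a hedged sketch driving the T-SQL from Python with pyodbc; the database, table, and key column names are placeholders:

```python
# Sketch: enable Change Tracking, then fetch what changed since the last cycle.
# MainDb, dbo.Orders, and OrderID are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=HEADOFFICE;"
    "DATABASE=MainDb;Trusted_Connection=yes",
    autocommit=True,
)
cur = conn.cursor()

# One-time setup, per database and per table.
cur.execute("ALTER DATABASE MainDb SET CHANGE_TRACKING = ON "
            "(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)")
cur.execute("ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING "
            "WITH (TRACK_COLUMNS_UPDATED = ON)")

# Each sync cycle: ask for everything changed since the version we last saw.
last_seen_version = 0  # persist this between runs
rows = cur.execute(
    "SELECT CT.SYS_CHANGE_OPERATION, CT.OrderID "
    "FROM CHANGETABLE(CHANGES dbo.Orders, ?) AS CT",
    last_seen_version,
).fetchall()
for operation, order_id in rows:
    print(operation, order_id)  # 'I', 'U', or 'D' plus the key to replay

# Remember the high-water mark for the next cycle.
last_seen_version = cur.execute(
    "SELECT CHANGE_TRACKING_CURRENT_VERSION()"
).fetchone()[0]
```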
We have a shop as a client, with three stores connected to the same VPN.
Two of the shops have a computer running as a "server" for that shop, and the third one has the "master database".
To synchronize everything to the master we don't have the best solution, but it works: there is a dedicated PC running an application that checks the timestamp of every record in every table of the two stores, and if it is different from the last time it synchronized, it copies the record across.
Note that this works both ways. I.e. if you update a product in the master database, this change will propagate to the other two shops. If you have a new order in one of the shops, it will be transmitted to the "master".
With some optimizations you can have all the shops synchronize in around 20 minutes.
Recently I have had a lot of success with SQL Server Service Broker which offers reliable, persisted asynchronous messaging out of the box with very little implementation pain.
It is quick to set up and as you learn more you can use some of the more advanced features.
Unknown to most, it is also part of the desktop editions so it can be used as a workstation messaging system
If you have existing T-SQL skills they can be leveraged as all the code to read and write messages is done in SQL
It is blindingly fast
It is a vastly under-hyped part of SQL Server and well worth a look.
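As a taste of what that T-SQL looks like, here is a hedged sketch of a minimal Service Broker setup plus one send and one receive, driven from Python via pyodbc; all object names are invented, and it assumes the database already has ENABLE_BROKER set (the default for newly created databases):

```python
# Sketch: minimal Service Broker message type/contract/queue/services, then
# one SEND and one RECEIVE. All names are invented placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=SyncDemo;"
    "Trusted_Connection=yes",
    autocommit=True,
)
cur = conn.cursor()

# One-time setup.
cur.execute("CREATE MESSAGE TYPE ChangeNotice VALIDATION = WELL_FORMED_XML")
cur.execute("CREATE CONTRACT ChangeContract (ChangeNotice SENT BY INITIATOR)")
cur.execute("CREATE QUEUE ChangeQueue")
cur.execute("CREATE SERVICE SenderService ON QUEUE ChangeQueue (ChangeContract)")
cur.execute("CREATE SERVICE ReceiverService ON QUEUE ChangeQueue (ChangeContract)")

# Send one message on a new conversation.
cur.execute("""
DECLARE @h UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE SenderService TO SERVICE 'ReceiverService'
    ON CONTRACT ChangeContract WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @h MESSAGE TYPE ChangeNotice
    (N'<change table="orders" id="42"/>');
""")

# Receive it, waiting up to five seconds.
row = cur.execute("""
WAITFOR (
    RECEIVE TOP (1) CAST(message_body AS NVARCHAR(MAX)) AS body
    FROM ChangeQueue
), TIMEOUT 5000;
""").fetchone()
print(row.body if row else "no message")
```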
I'd say just have a job that copies the data in the pub database input table into a private database pending table. Then, once you update the data on the private side, have it replicated to the public side. If none of the replicated data is updated on the public side, it should be a fairly easy transactional replication solution.