Delphi solution for data replication between two loosely connected remote sites - database

I'm using Delphi XE4 Architect (Delphi XE3 is fine as well).
I need to find a smart solution to the following problem,
and I would like to use one of these frameworks: kbmMW, RemObjects SDK / DataAbstract, or RealThinClient.
Currently I have an application using a very simple MSSQL database at site A, which is used by the users of site B through Remote Desktop.
The application sometimes needs to show some pictures and view some PDFs, but it is mostly text data entry.
There is no particular reason for me to use MSSQL; it is simply the database I found already active and populated, I did not build it myself, and by now it would be complicated to change it.
(The database itself is not important: no vendor-specific features, no stored procedures, no triggers.)
Users at site B are connected to site A via a very slow network connection, and occasionally the connection is unavailable for a few hours, up to one day (this is the major problem).
Unfortunately, the connection cannot be improved, for various reasons.
The database is quite simple: it has many tables that hardly ever change, but about ten of them undergo daily updates and may potentially be subject to concurrent changes.
Mostly, the records of these tables are locked for update by a single user, who edits some fields and then saves, releasing the lock.
I would like to set up something quite different to optimize performance.
Users at site A have higher priority; they are more important, because site A is the headquarters.
I would like to have a copy of the site A database at site B,
so that users at site B can work locally, much faster, without connecting to site A over Remote Desktop.
The RDP protocol is not very efficient, and in any case, if the connection is down, users cannot work at all.
Synchronizing the databases and propagating record locks between the two of them may not be a big problem.
Basically, when a user at site B acquires a record for editing in database B, a user at site A should obviously not be able to modify the same record in the site A database.
This should also work in the opposite direction, of course.
My big problem is figuring out how best to handle the situation that occurs when the connection between B and A is unavailable for some hours (while transactions/events keep accumulating at site B).
Events at site A generally have priority (on collision) over events at site B,
but users at site B must be able to continue working.
When the connection comes back up, the changes should be sent to the database at site A.
Obviously this can result in conflicts; in that case the changes made at site B can be discarded, or handled through a supervised selective merge with record-by-record approval by a user at site B.
Well, I hope the scenario is explained clearly enough.
Additional info:
The DB schema is very simple: only tables, no triggers, no stored procedures. I can build one as an example, but imagine 10 tables that can be updated concurrently.
The DB is used by a desktop app of the sales department, so it contains highly confidential data.
The remote connection is typically 512 Kbit at most, but the main problem here is that the connection sometimes may not be available,
and the users at the remote site must be able to work anyway. This is the main focus.
The total daily update volume could be at most 10 MB, compressed, for the DB traffic alone. There is some other data synchronized
over the same connection, but it is not part of this job.
I don't want to use specific MSSQL tools or services (replication and so on), because the DB could change in the future.
Thanks

We do almost exactly this using a Delphi client app, a kbmMW-based Delphi server app, and an MSSQL database (though it used to work quite happily on a DBISAM database too).
We have some tables that only the head office site users are allowed to modify. The smaller tables are transferred in their entirety each time there is a "merge". The larger tables and the transaction type tables all have a date added and/or a date modified field and only those records that have been changed or added in the last 3 weeks or so (configurable) are transferred. This means sites can still update to the latest data even if they have been disconnected for quite some time - we used to have clients in remote places on dubious dial-up lines!
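Our setup is Delphi/kbmMW, but purely as an illustration of the date-window selection (and since, as noted further down, the framework is not the point), here is a minimal, hypothetical JDBC sketch against MSSQL. The table and column names (ORDERS, DATE_ADDED, DATE_MODIFIED) are invented for illustration only:

    import java.io.PrintWriter;
    import java.sql.*;

    public class IncrementalExport {
        /**
         * Sketch: export rows added or modified in the last `windowDays` days to a simple
         * delimited file, ready to be zipped and dropped into the "outgoing" folder.
         * Table/column names (ORDERS, DATE_ADDED, DATE_MODIFIED) are hypothetical.
         */
        public static void exportChanged(Connection con, int windowDays, PrintWriter out)
                throws SQLException {
            String sql = "SELECT * FROM ORDERS "
                       + "WHERE DATE_ADDED >= DATEADD(day, ?, GETDATE()) "
                       + "   OR DATE_MODIFIED >= DATEADD(day, ?, GETDATE())";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, -windowDays);
                ps.setInt(2, -windowDays);
                try (ResultSet rs = ps.executeQuery()) {
                    int cols = rs.getMetaData().getColumnCount();
                    while (rs.next()) {
                        StringBuilder line = new StringBuilder();
                        for (int i = 1; i <= cols; i++) {
                            if (i > 1) line.append('\t');
                            line.append(rs.getString(i));   // null-safe enough for a sketch
                        }
                        out.println(line);
                    }
                }
            }
        }
    }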
We only run the merge routines once or twice a day but it would work equally well on an hourly basis or other time schedule.
At given times of day each site (including head office) "export" their changed/new records to files (eg client dataset tables or similar). These are then zipped up by the application and placed in an "outgoing" folder. The zip file is named based on the location id, date, time etc. The files are transferred by some external means eg via FTP / file share / email etc etc. Each branch office sends/transfers its data files to head office and head office transfers its data to each branch. The files are transferred by whatever means to an "incoming" folder.
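As an illustration of the packaging step only (again, the real application is Delphi; this sketch uses Java's standard java.util.zip, and the folder layout and ".dat" extension are assumptions):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.file.*;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    /** Sketch: zip the exported table files into "outgoing" under a self-describing name. */
    public class PackageExport {
        public static Path zipExport(String locationId, Path exportDir, Path outgoingDir)
                throws IOException {
            String stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
            // e.g. BR01_20240131_183000.zip - location id + date + time, as described above
            Path zipFile = outgoingDir.resolve(locationId + "_" + stamp + ".zip");
            try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile.toFile()));
                 DirectoryStream<Path> files = Files.newDirectoryStream(exportDir, "*.dat")) {
                for (Path f : files) {
                    zos.putNextEntry(new ZipEntry(f.getFileName().toString()));
                    Files.copy(f, zos);
                    zos.closeEntry();
                }
            }
            return zipFile;   // ready to be transferred by FTP / file share / email
        }
    }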
On a regular basis (eg hourly) each location does a check on the incoming folder to see if there is anything new for it to import. If so it adds all the new records, branch locations overwrite the head-office data tables with the new ones and edited records are merged in "somehow". This is the tricky bit. The easiest policy is "head office wins" so all edits are accepted unless there is a conflict in which case the head office version wins. Alternatively you could use "last edited wins" - but then you need to make sure clocks are in sync across locations. The other option is to add conflicting records to some form of "suspense" status and let an end user decide at some point in the future. We do this on one data set. Whichever conflict method you choose you need to record each decision in some form of log table and prompt an administrative level user to check occasionally.
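Purely as an illustration of the conflict policy (field names and the enum are invented, and the real system is Delphi), the decision step could look something like this, with every decision also written to the log table mentioned above:

    import java.util.Date;

    /** Sketch of a conflict policy: what to do when an imported record collides with a local edit. */
    public class MergePolicy {

        public enum Decision { KEEP_LOCAL, TAKE_INCOMING, SUSPEND_FOR_REVIEW }

        /** Hypothetical view of a record for conflict purposes. */
        public static class Row {
            public String guid;            // GUID primary key, same on both sites
            public Date lastModified;      // only needed for a "last edited wins" policy
            public boolean fromHeadOffice; // true if the change originated at head office
        }

        /** "Head office wins"; anything else goes to a suspense queue for an admin to review. */
        public static Decision resolve(Row local, Row incoming) {
            if (incoming.fromHeadOffice) return Decision.TAKE_INCOMING;  // head office edit always applied
            if (local.fromHeadOffice)    return Decision.KEEP_LOCAL;     // local copy already holds head office data
            // Alternative policy, "last edited wins" (needs clocks in sync across sites):
            // return incoming.lastModified.after(local.lastModified)
            //        ? Decision.TAKE_INCOMING : Decision.KEEP_LOCAL;
            return Decision.SUSPEND_FOR_REVIEW;  // branch vs branch: park it and log it for later approval
        }
    }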
When the head office imports data or when data is added at the head office then a field is set to indicate the data is part of the master data. When branches add data this field is empty to indicate it has yet to reach the master set. This helps when branches export their data as they can include all data that doesn't have this field set.
We have found that you can't run the merge interactively as you'll end up never getting any work done and you won't be able to run the merge at night etc. It needs to be fully automated with the ability for an admin user to make adjustments at some point after the fact.
We've been running this approach for several years now on multi-site operations and once it settled down it has worked pretty much flawlessly. With 2 export/import schedules per day we have found the branch offices run perfectly well and are only ever missing less than a day's worth of transactions. Works well in our scenario where we don't often have conflicts. Exported data is in the region of 5-10MB which zips up plenty small enough.
Primary keys are vital! We use a GUID and it hasn't let us down yet.
The choice of database server and n-tier framework are, actually, irrelevant. It's the process that matters here.
"Basically, when a user at site B acquires a record for editing in database B, a user at site A should obviously not be able to modify the same record in the site A database. This should also work in the opposite direction, of course."
I can't see how you're ever going to make this bit work reliably if both sites have their own copy of the database and you're allowing for dropped/non-existent inter-site connections on occasion.

Related

Creating separate weblinks for QA and dev regions in SNOWFLAKE

I am not very sure if there is a possibility in SNOWFLAKE to create separate weblinks for the QA and dev regions.
Right now we have one common link to access SNOWFLAKE in our company, with the QA and Dev databases built in it. I was just wondering if there is an option to create separate web links: one link for QA and one link for Dev.
You can have a "secondary" account set up on a new URL that is part of the same bill, but it really is "another" account.
So the question becomes: what value does this add?
With different URLs you can reuse the same SQL verbatim and don't need to alter it per "region". You can also reuse the same user accounts. If you DDoS the endpoint (which used to happen with 100+ connections) you also lose access to the admin control surface you'd need to make the instance bigger to handle "the increase in load" (this might have changed over the years; we last had this problem in 2017).
Re-using the same account but having prod-x/dev-x/qa-x users/databases/roles means you have just one instance to administer. You do have to use some region-aware software to run/rewrite your SQL.
We did both at my old job. We started with everything in one account and just handled it, but we did DDoS the endpoint and block ourselves from making it bigger until we found the tool that was just opening new sessions and running heavy queries. So we got a second account (ignoring the extra accounts we already had in different world regions) and planned to do all dev from that. But when we spun it up, we created some warehouses, and back then the SQL commands didn't set a default auto-suspend time the way the UI did, and some features were missing in that region, so we walked away from that instance for a month or two and then got a bill with ~15K USD of server charges. Which was unpleasant. Anyway, the dev instance never really got used (the default for warehouse creation has since been changed, though). For our system, having a different account was really wasteful: to have data "always loading" and dashboards always loadable for test (and multi-region at that) means always having one extra-small warehouse running, whereas when QA and DEV were on the same instance, and given the total data load was so tiny, one instance was more than enough.
Which is to say, more instances lead to a lot of waste. If you like waste and extra overhead, go for it. Many people come from a big-iron perspective, where, to avoid the noisy-neighbour problem, each thing needs to be its own box, but that is just not an issue here. Just use prefixes, and it's "all separate".

Enable users to "hotfix" source data while waiting for upstream source data to change

For a few SaaS tools our company uses, a 3rd party administrates the tools and provides us with daily feeds, which we load into our data warehouse.
Occasionally, a record in one of the feeds will have an error that needs to be fixed ASAP for downstream reporting. However, the SLA for the 3rd party to correct the record(s) in the source SaaS system can take up to two weeks. The 'error' doesn't break anything; it is just that a record is closed when it should have stayed open, or a field has the wrong value.
The process is as follows:
BI team A, downstream of us in the data warehouse team, notices the discrepancy.
BI team A corrects the record in their database, which other teams consume from
BI team B, which receives data from the data warehouse and BI team A, raises an alarm because they see a discrepancy between our output and that which they receive from team A.
We (data warehouse team) have to correct the source data
The upstream 3rd party eventually corrects the records
Does anyone have a best practice for this scenario? What is an approach that would:
A. enable the BI team A to correct records ASAP without impacting the data warehouse team, and
B. be rollback-able once the upstream 3rd party corrects the source data?
One idea I had was to use a source-controlled csv file (like a dbt seed table), were it not that records usually contain PII and therefore can't be version-controlled.
How I would approach this:
Ensure that you have controls on your DW to catch any errors. Having a consumer of your data (BI Team A) telling you that your data is wrong is not a good place to be in!
Have 1 team responsible for fixing the data and in 1 place - this ensures you have control, consistency and auditing. As the data starts in the DW and then moves downstream to other systems, the DW is the place to fix it.
Build a standard process for fixing data that involves as little manual intervention as possible and which has been developed and tested in advance. When you encounter an error, and are under pressure from your customers to fix it, the last thing you want is to be trying to work out how to resolve the error and then developing/running untested code.
At a high level, your standard process should be a copy of the Production process, e.g. a copy of the staging table (where you can insert the corrected versions of the incorrect records) and a copy of the loading process, but pointed at this copied staging table (see the sketch after these points). Depending on your Production logic you may need to amend the copy to delete/insert or update the incorrect records in your DW. Depending on your toolset, you might be able to achieve this with a separate config file rather than copying tables/logic.
Auditing. You should always be able to trace the fact that records have been amended, which records have been affected and what the changes were.
Obviously you need to ensure that the changes you make to the DW cascade down to any consuming systems - either in the regular update process (if your consumers can wait until then) or as a one-off process. Similarly, you need to ensure that when the amended record is finally received from the 3rd Party that it updates your DW correctly and that you've audited the fact that an error has been corrected - presumably you'd want to be able to report on any errors not fixed by the 3rd party within their SLA?
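Purely as an illustration of the "copied staging table" plus auditing idea (not a drop-in implementation): corrected rows are loaded into a hypothetical FEED_STAGING_FIX table, an audit row is written first so the fix is traceable and reversible once the 3rd party catches up, and then a copy of the load logic applies the changes, all in one transaction. All table and column names are invented, and the UPDATE ... FROM syntax is T-SQL-style, so adjust for your warehouse:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class ApplyHotfix {
        /** Sketch: apply pre-approved corrections to the warehouse table, with an audit trail. */
        public static void applyFixes(Connection con) throws SQLException {
            con.setAutoCommit(false);
            try (Statement st = con.createStatement()) {
                // 1. Record what is about to change, so the fix can be reported on and rolled back.
                st.executeUpdate(
                    "INSERT INTO DW_FIX_AUDIT (record_id, old_status, new_status, fixed_at) " +
                    "SELECT t.record_id, t.status, f.status, CURRENT_TIMESTAMP " +
                    "  FROM DW_FEED t JOIN FEED_STAGING_FIX f ON f.record_id = t.record_id");
                // 2. Apply the corrected values (same shape as the normal load, different source table).
                st.executeUpdate(
                    "UPDATE t SET t.status = f.status, t.amount = f.amount " +
                    "  FROM DW_FEED t JOIN FEED_STAGING_FIX f ON f.record_id = t.record_id");
                con.commit();
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }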

Duplicate records and loss of primary key on MS Access table in multi-user database

Apologies if a similar question has been addressed elsewhere but I'm struggling to find the obvious answer to my issue....
I have rolled out a split database (.accdb created in Access 2013) to 6 members of my team by providing each with a copy of the front end which links to a back end on a shared network drive. Four of the users are opening the db through Access 2013, one through Access Runtime 2013 and one through Runtime 2010 (32 bit).
The primary job of the database is to allow users to allocate and manage tasks for a set of campaigns. The db centres around a task table which is updated via a bound form. When new task records are created, usually via a control from a parent 'campaign' form, some fields are pre-populated.
The (frequent) bug seems to occur when two users are editing different task records via the task form at the same time. Occasionally, one of the task records becomes corrupted (hashed out or Chinese characters!) but more often one of the tasks becomes duplicated in place of the other. This then leads to duplicate task IDs and the loss of the primary key on this field.
I have tried setting record locking both to 'No Locks' (optimistic locking) - on users' Access clients (except the Runtime versions, where I can't see an option to do this) and on the task form itself - and to 'Edited Record' (pessimistic locking) using the setting in the task form properties.
I am having trouble diagnosing whether the error lies with locking and/or the point at which a record is saved (currently just on form close), or whether there is a bigger weakness in the setup. Does anyone have any ideas as to why this duplication and sometimes corruption might occur? Thanks

Centralized data access or variables

I'm trying to find a way to access a centralized database for both retrieval and update.
The following is what I'm looking for:
Server 1 has this variable for example
int counter;
Server 2 will be interacting with the user, and will increase the counter whenever the user uses the service, until a certain threshold is reached. When this threshold is reached, server 2 will start rejecting the user's access.
Also, the user will be able to use multiple servers (like server 2) from multiple locations, and each time the user accesses any server the counter will be increased.
I tried Google, but it's hard to search for something without a name.
One approach to designing this is to do sharding by user - i.e. split the users between your servers depending on the ID of the user. That is, if you have 10 servers, then users with IDs ending in 2 would have all of their data stored on server 2, and so on. This assumes that user IDs are distributed uniformly.
One other approach is to shard the users by location - if you have servers in Asia vs Europe, for example. You'd need a property in the User record that tells you where the user is located; based on that, you'll know which server to route them to.
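To make the ID-based variant concrete, here is a minimal sketch (server names, shard rule and threshold are all invented): every request for a given user is routed to the shard that owns that user's counter, and the counter is incremented and checked there.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    public class UserRouter {
        private final String[] servers;   // e.g. { "server-0", "server-1", ..., "server-9" }
        private final ConcurrentHashMap<Long, AtomicInteger> counters = new ConcurrentHashMap<>();

        public UserRouter(String[] servers) { this.servers = servers; }

        /** Every request for a given user lands on the same server (here: user ID modulo shard count). */
        public String serverFor(long userId) {
            int shard = (int) Math.floorMod(userId, (long) servers.length);
            return servers[shard];
        }

        /** On the owning shard: count the access and reject once the threshold is reached. */
        public boolean allowAccess(long userId, int threshold) {
            int used = counters.computeIfAbsent(userId, id -> new AtomicInteger()).incrementAndGet();
            return used <= threshold;
        }
    }

In a real deployment the counter would live in that shard's database rather than in memory, but the routing rule stays the same: one authoritative home per user.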
Ultimately, all of these design options have a concept of "where does the master record for a user reside?" Each of these approaches attempts to definitively answer this question.
A different category of approaches has to do with multi-master replication, which is supported by some database vendors; this approach does not scale as well (i.e. it's hard to get it to scale to 20 servers), but you might want to look into it, too.

Database access time on Heroku with Play Framework

I am having a problem and I need your help.
I am working with Play Framework v1.2.4 in Java, and my server is deployed on the Heroku servers.
Everything works fine, I can access my databases and all is OK, but I am experiencing trouble when I do a couple of saves to the database.
I have a method that stores data several times in the database and then returns a notification to a mobile phone. My problem is that the notification arrives before the database has finished saving the data, because when it arrives I request the updated data from the server, and it returns the data without the last update. If I try again after a few seconds, the data shows correctly, so I think there is an access-time problem.
The idea would be that the server sends the notification only once the database has finished saving the data.
I don't know if this is caused by using the free version of the Heroku servers, but I want to be sure before purchasing it.
In general, requests to cloud databases are always slower than the same requests running on your local machine. Even a simple query that needs just 0.0001 sec on your computer can be as slow as 0.5 sec in the cloud. The reason is simple: cloud providers use shared databases + (geo) replication, which just... cannot be compared to a database accessed by only one program on the same machine.
Also keep in mind that the free Heroku DB plans don't offer ANY database cache, which means that every query is fetched from the cloud directly.
As we don't know your application, it's hard to say what the bottleneck is; in any case, you almost certainly have at least 3 ways to attack your problem. They are not alternatives; you will probably need to use (or at least check) all of them.
Risk paying for some basic plan and see how things change with the paid version; maybe it will be good enough for you, maybe not.
Redesign your application to make fewer queries. For example, instead of sending 10 queries to select 10 different rows, send one query that selects all 10 records at once.
Use Play's cache API to avoid selecting the same set of data again and again. For example, if you have some categories which change rarely, but you need the category tree for each article, you don't need to fetch the categories from the DB every time; instead you can store a List of categories in the cache, so you only need one request to fetch the article's content (which can be cached for a short time as well...). A sketch of this is shown below.
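For instance, a minimal sketch of that third suggestion, assuming Play 1.x's play.cache.Cache API and a hypothetical Category model class:

    import java.util.List;
    import play.cache.Cache;

    public class Categories {
        /**
         * Sketch: keep rarely-changing reference data in Play's cache so each request
         * doesn't hit the remote Heroku database again. `Category` is a hypothetical
         * JPA Model subclass; "categories" and the 30-minute expiry are arbitrary.
         */
        @SuppressWarnings("unchecked")
        public static List<Category> all() {
            List<Category> categories = Cache.get("categories", List.class);
            if (categories == null) {
                categories = Category.findAll();              // one DB round trip instead of one per request
                Cache.set("categories", categories, "30mn");  // expire after 30 minutes
            }
            return categories;
        }
    }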
