I have a cluster sync'd by SymmetricDS. I want to keep the ranges of primary key different per server. However if I change the auto increment base on one server it is reflected on the other servers. Is there a way to achieve this?
What database are you using? SymmetricDS will allow an identity insert (for example in MSSQL) on the target, so that the source ID matches the target and is not auto-incremented. This keeps the databases in sync. I'm not sure what you are asking when you say you want to keep different ranges per server. SymmetricDS does not sync the seed values for ranges, only the data. Are you, by chance, requesting a load with a create-tables step first? That would send the table definition from source to target and would adjust the target seed.
This problem could be solved by increasing the length of the columns representing these IDs at the central node. The extra length can then hold a unique prefix per leaf node in the sync graph. I have solved the issue by adding my own implementation of org.jumpmind.symmetric.io.data.writer.DatabaseWriterFilterAdapter that also implements the interface org.jumpmind.symmetric.ext.ISymmetricEngineAware. This filter is then packaged with the SymmetricDS engine and registered in the symmetric-extensions.xml file. The filter intercepts all incoming rows of data, checks the sender and prepends a unique ID. The same has to be done for all foreign keys.
If such data has to be synced back from central to leaf nodes, then a reverse implementation has to be created and registered that will be stripping the prefixes before sending data to leaf nodes.
It sounds complicated, but it's really not and it works quite well once put in place
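To make the prefixing idea concrete, here is a minimal, language-agnostic sketch in Python. It is not the SymmetricDS filter API; the node names, the prefixes and the key column names are all assumptions made up for illustration.

```python
# Hypothetical node prefixes; a real deployment would take these from
# whatever registry identifies each leaf node.
NODE_PREFIXES = {"store-001": "01", "store-002": "02"}

# Columns holding primary or foreign keys that must be rewritten (assumed names).
KEY_COLUMNS = ("id", "customer_id")


def add_prefix(row: dict, source_node: str) -> dict:
    """Return a copy of an incoming row with the sender's prefix on every key column."""
    prefix = NODE_PREFIXES[source_node]
    out = dict(row)
    for col in KEY_COLUMNS:
        if out.get(col) is not None:
            out[col] = f"{prefix}-{out[col]}"
    return out


def strip_prefix(row: dict, target_node: str) -> dict:
    """Reverse of add_prefix, for rows sent back from central to a leaf node.

    Keys come back as strings; converting them to the leaf's native type
    is left to the caller.
    """
    marker = NODE_PREFIXES[target_node] + "-"
    out = dict(row)
    for col in KEY_COLUMNS:
        value = out.get(col)
        if isinstance(value, str) and value.startswith(marker):
            out[col] = value[len(marker):]
    return out


central_row = add_prefix({"id": 17, "customer_id": 4, "name": "x"}, "store-001")
leaf_row = strip_prefix(central_row, "store-001")
```

Because primary and foreign keys get the same prefix, relations between rows from one leaf node stay intact at the central node.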
For my project I need IDs that can be easily shared, so Firestore's default auto-generated IDs won't work.
I am looking for a way to auto generate id like 8329423 that would be incremented or randomly chosen in range 0 to 9999999.
Firestore's auto-ID fields are designed to statistically guarantee that no two clients will ever generate the same value. This is why they're as long as they are: it's to ensure there is enough randomness (entropy) in them.
This allows Firestore to determine these keys completely client-side without needing to look up on the server whether the key it generated was already generated on another client before. And this in turn has these main benefits:
Since the keys are generated client-side, they can also be generated when the client is not connected to any server.
Since the keys are generated client-side, there is no need for a roundtrip to the server to generate a new key. This significantly speeds up the process.
Since the keys are generated client-side, there is no contention between clients generating keys. Each client just generates keys as needed.
If these benefits are important to your use-case, then you should strongly consider whether you're likely to create a better unique ID than Firestore already does. For example, Firestore's IDs have 62^20 unique values, which is why they're statistically guaranteed to never generate the same value over a very long period of time. Your proposed range of 0 - 9999999 has only 10 million unique values, which is much more likely to generate duplicates.
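To put rough numbers on that comparison, the sketch below uses the standard birthday-bound approximation for uniformly random IDs. The sample sizes (5,000 and one billion IDs) are arbitrary choices for illustration.

```python
import math


def collision_probability(num_ids: int, space: int) -> float:
    """Birthday-bound approximation: chance that at least two of num_ids
    uniformly random draws from `space` possible values coincide."""
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


small_space = 10_000_000   # IDs in the range 0..9999999
auto_id_space = 62 ** 20   # 20 characters from a 62-symbol alphabet

p_small = collision_probability(5_000, small_space)
p_auto = collision_probability(1_000_000_000, auto_id_space)

print(f"5,000 random 7-digit IDs: {p_small:.2%} chance of a duplicate")
print(f"1 billion 62^20 IDs:      {p_auto:.2e} chance of a duplicate")
```

With the small range, a duplicate is already more likely than not after only a few thousand random IDs, while the large space stays vanishingly unlikely to collide even at a billion IDs.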
If you really want this scheme for IDs, you will need to store the IDs that you've already given out on the server (likely in Firestore), so that you can check against it when generating a new key. A very common way to do this is to keep a counter of the last ID you've already handed out in a document. To generate a new unique ID, you:
Read the latest counter value from the document.
Increment the counter.
Write the updated counter value to the document.
Use the updated counter value in your code.
Since this read-update-write happens from multiple clients, you will need to use a transaction for it. Also note that the clients now are coordinating the key-generation, so you're going to experience throughput limits on the number of keys you can generate.
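The read-increment-write cycle above can be sketched without any Firestore dependency. In this in-memory simulation a lock stands in for the transaction, and `CounterDocument` is a hypothetical name, not a Firestore class; the point is only that the four steps must run as one indivisible unit.

```python
import threading


class CounterDocument:
    """In-memory stand-in for a single counter document (not a Firestore API)."""

    def __init__(self) -> None:
        self._last_id = 0
        self._lock = threading.Lock()  # plays the role of the transaction

    def next_id(self) -> int:
        with self._lock:
            current = self._last_id   # 1. read the latest counter value
            updated = current + 1     # 2. increment the counter
            self._last_id = updated   # 3. write the updated value back
            return updated            # 4. use the updated value


def allocate_many(counter: CounterDocument, clients: int, per_client: int) -> list[int]:
    """Simulate several clients requesting IDs at the same time."""
    issued: list[int] = []
    issued_lock = threading.Lock()

    def worker() -> None:
        for _ in range(per_client):
            new_id = counter.next_id()
            with issued_lock:
                issued.append(new_id)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return issued


ids = allocate_many(CounterDocument(), clients=8, per_client=100)
```

Every caller has to wait its turn for the lock, which is the same serialization that limits throughput when a real transaction guards a single counter document.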
I'm writing an application with offline support. i.e. browser/mobile clients sync commands to the master db every so often.
I'm using UUIDs on both the client and the server side. When syncing up to the server, the server will return a map of local UUIDs (luid) to server UUIDs (suid). Upon receiving this map, clients update their records' suid attributes with the appropriate values.
However, say a client record, e.g. a todo, has an attribute 'list_id' which holds the foreign key to the todos' list record. I use luids in foreign_keys on clients. However, when that attribute is sent over to the server, it would dirty the server db with luids rather than the suid the server is using.
My current solution, is for the master server to keep a record of the mappings of luids to suids (per client id) and for each foreign key in a command, look up the suid for that particular client and use the suid instead.
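As a rough sketch of that server-side lookup: the `FOREIGN_KEYS` table, the record shape and the helper names below are assumptions for illustration, and the mapping lives in a plain dictionary rather than a database table.

```python
import uuid

# mappings[client_id][luid] -> suid, kept by the server per client.
mappings: dict[str, dict[str, str]] = {}

# Attributes that hold foreign keys, per model (assumed names).
FOREIGN_KEYS = {"todo": ["list_id"]}


def resolve(client_id: str, luid: str) -> str:
    """Return the suid for a client's luid, allocating one on first sight."""
    client_map = mappings.setdefault(client_id, {})
    if luid not in client_map:
        client_map[luid] = str(uuid.uuid4())
    return client_map[luid]


def apply_command(client_id: str, model: str, record: dict) -> dict:
    """Rewrite the primary key and every foreign key from luid to suid."""
    stored = dict(record)
    stored["id"] = resolve(client_id, record["id"])
    for attr in FOREIGN_KEYS.get(model, []):
        if stored.get(attr) is not None:
            stored[attr] = resolve(client_id, stored[attr])
    return stored


saved_list = apply_command("client-1", "list", {"id": "l-1"})
saved_todo = apply_command("client-1", "todo", {"id": "t-1", "list_id": "l-1"})
```

Keying the map by client ID first means two clients that happen to use the same luid still get distinct suids.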
I'm wondering whether others have come across this problem and, if so, how they have solved it. Is there a more efficient, simpler way?
I took a look at this question "Synchronizing one or more databases with a master database - Foreign keys (5)" and someone seemed to suggest my current solution as one option, composite keys using suids and autoincrementing sequences and another option using -ve ids for client ids and then updating all negative ids with the suids. Both of these other options seem like a lot more work.
Thanks,
Saimon
From my experience it's easiest taking the composition approach, particularly when it comes to debugging issues and potential needs for rolling back, i.e. it's really helpful to know what requests came from what machine and resulted in what changes. Whenever you're effectively dealing with many to one you have to have a way to effectively isolate all of the many, it also allows you to do more intelligent conflict management when you have two of the 'many' sending updates that are non-complementary (if you want to do that sort of thing).
I've just thought of another possibility:
When assigning luids on the client-side, keep a map of all assignments of that luid e.g.
something like (json):
{
  "luid123": [{"model": "list", "attribute": "id"},
              {"model": "todo", "attribute": "list_id"}]
}
When we get the global luid2suid map from the server (after sync up), for each luid we look it up in the local luid map, update the appropriate attribute in the appropriate model with the suid for each entry, and then remove that luid from the local map.
What do you think?
This way I avoid having to do expensive look ups in the global luid2suid map for all foreign keys of command synched. Another benefit is that foreign keys are all suids on the client too and I'd only have to look up suids from luids on the server side for cases of offline record creation and modification before synching back to server.
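A minimal sketch of that client-side bookkeeping follows. For simplicity it records a reference to each affected record rather than a model name, which is a small departure from the JSON shape shown earlier; the sample records are made up.

```python
# Local records (made-up sample data).
lists = [{"id": "luid123", "title": "Groceries"}]
todos = [{"id": "luid456", "list_id": "luid123"}]

# luid -> every (record, attribute) pair where that luid is stored.
luid_usages: dict[str, list[tuple[dict, str]]] = {
    "luid123": [(lists[0], "id"), (todos[0], "list_id")],
    "luid456": [(todos[0], "id")],
}


def apply_suid_map(luid_to_suid: dict[str, str]) -> None:
    """Replace each luid with its suid everywhere it was recorded, then forget it."""
    for luid, suid in luid_to_suid.items():
        for record, attribute in luid_usages.pop(luid, []):
            record[attribute] = suid


# The server answered for one of the two local IDs so far.
apply_suid_map({"luid123": "suid-a"})
```

After the call, both the list's own id and the todo's foreign key hold the suid, and only the still-unsynced luid remains in the usage map.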
It's just an idea that just popped into my head. I'm still hoping for more feedback on the subject
I'm using Google Gears to be able to use an application offline (I know Gears is deprecated). The problem I am facing is the synchronization with the database on the server.
The specific problem is the primary keys or more exactly, the foreign keys. When sending the information to the server, I could easily ignore the primary keys, and generate new ones. But then how would I know what the relations are.
I had one solution in mind, but then I would need to save all the PKs for every client. What is the best way to synchronize multiple clients with one server DB?
Edit:
I've been thinking about it, and I guess sequential primary keys are not the best solution, but what other possibilities are there? Time-based keys don't seem right because of the collisions that could happen.
A GUID comes to mind, is that an option? It looks like generating a GUID in javascript is not that easy.
I can do something with natural keys or composite keys. As I'm thinking about it, that looks like the best solution. Can I expect any problems with that?
This is not quite a full answer, but might at least provide you with some ideas...
The question you're asking (and the problem you're trying to address) is not specific to Google Gears, and will remain valid with other solutions, like HTML 5 or systems based on Flash/Air.
There was a presentation on that subject at the last ZendCon a few months ago -- and the slides are available on slideshare : Planning for Synchronization with Browser-Local Databases
Going through those slides, you'll see notes about a couple of possibilities that might come to mind (some did actually come to your mind, or in other answers) :
Using GUID
Composite Keys
Primary key pool (i.e. reserve a range of keys beforehand)
Of course, for each one of those, there are advantages... and drawbacks -- I will not copy-paste them : take a look at the slides ;-)
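As an illustration of the third option only (a primary key pool), here is a small sketch. The class names and the block size are assumptions, and the "server" is just an in-process object.

```python
class KeyServer:
    """Central allocator handing out non-overlapping blocks of keys."""

    def __init__(self, block_size: int = 1000) -> None:
        self.block_size = block_size
        self._next_start = 1

    def reserve_block(self) -> range:
        block = range(self._next_start, self._next_start + self.block_size)
        self._next_start += self.block_size
        return block


class OfflineClient:
    """Draws keys from a block that was reserved while still connected."""

    def __init__(self, server: KeyServer) -> None:
        self._pool = iter(server.reserve_block())

    def new_key(self) -> int:
        try:
            return next(self._pool)
        except StopIteration:
            raise RuntimeError("key pool exhausted; reconnect to reserve more") from None


server = KeyServer(block_size=3)
client_a = OfflineClient(server)
client_b = OfflineClient(server)
keys_a = [client_a.new_key() for _ in range(3)]
keys_b = [client_b.new_key() for _ in range(3)]
```

The obvious drawback shows up in `new_key`: a client that runs out of reserved keys while disconnected cannot create more records until it reconnects.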
Now, in your situation, which solution will be the best ? Hard to say, actually -- and the sooner you think about synchronisation, the better/easier it'll probably be : adding stuff into an application is so much simpler when that application is still in its design stage ^^
First, it might be interesting to determine whether :
Your application is generally connected, and being dis-connected only rarely happens
Or if your application is generally dis-connected, and only connects once in a while.
Then, what are you going to synchronise ?
Data ?
Like "This is the list of all commands made by that user"
With that data replicated on each dis-connected device, of course -- which can each modify it
In this case, if one user deletes a line, and another one adds a line, how to know which one has the "true" data ?
Or actions made on those data ?
Like "I am adding an entry in the list of commands made by that user"
In this case, if one user deletes a line, and another one adds a line, it's easy to synchronize, as you just have to synchronise those two actions to your central DB
But this is not quite easy to implement, especially for a big application / system : each time an action is made, you have to kind of log it !
There is also a specific problem to which we don't generally think -- until it happens : especially if your synchronisation process can take some time (if you have a lot of data, if you don't synchronise often, ...), what if the synchronisation is stopped when it's not finished yet ?
For instance, what if :
A user, in a train, has access to the network, with some 3G card
The synchronisation starts
There is a tunnel -- and the connection is lost.
Having half-synchronised data might not be that good, in most situations...
So, you have to find a solution to that problem, too : in most cases, the synchronisation has to be atomic !
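One way to get that all-or-nothing behaviour on the receiving side is to apply each sync batch inside a single database transaction. The sketch below uses Python's built-in sqlite3 module; the `todo` table and the `flaky_stream` generator, which simulates the lost connection, are made up for the example.

```python
import sqlite3


def apply_batch(conn: sqlite3.Connection, rows) -> None:
    """Apply a whole sync batch or nothing at all."""
    with conn:  # commits on success, rolls back if an exception escapes
        for row_id, title in rows:
            conn.execute("INSERT INTO todo (id, title) VALUES (?, ?)", (row_id, title))


def flaky_stream():
    """Yield two rows, then fail the way a dropped connection would."""
    yield (1, "buy milk")
    yield (2, "call bank")
    raise ConnectionError("connection lost in the tunnel")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todo (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.commit()

try:
    apply_batch(conn, flaky_stream())
except ConnectionError:
    pass  # the client would simply retry the whole batch later

rows_after_failure = conn.execute("SELECT COUNT(*) FROM todo").fetchone()[0]

apply_batch(conn, [(1, "buy milk"), (2, "call bank")])
rows_after_retry = conn.execute("SELECT COUNT(*) FROM todo").fetchone()[0]
```

The two rows received before the failure are rolled back, so the retry can resend the full batch without running into half-applied data.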
I've come up with the following solution:
Every client gets a unique id from the server. Everywhere a primary key is referenced, I use a composite key with the client id and an auto increment field.
This way, the combination is unique, and it's easy to implement. The only thing left is making sure every client does get a unique id.
I just found out one drawback: SQLite doesn't support autoincrement on composite primary keys, so I would have to handle the id's myself.
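Handling the ID yourself could look something like the sketch below, again using sqlite3. The table layout is an assumption, and taking MAX + 1 is only one possible strategy: it reuses a number if the highest row of a client is deleted, so a separate per-client counter would be safer if that matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE todo (
        client_id INTEGER NOT NULL,
        local_id  INTEGER NOT NULL,
        title     TEXT NOT NULL,
        PRIMARY KEY (client_id, local_id)
    )
""")
conn.commit()


def insert_todo(conn: sqlite3.Connection, client_id: int, title: str) -> tuple[int, int]:
    """Insert a row, assigning the next local_id for this client by hand."""
    with conn:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(local_id), 0) + 1 FROM todo WHERE client_id = ?",
            (client_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO todo (client_id, local_id, title) VALUES (?, ?, ?)",
            (client_id, next_id, title),
        )
    return (client_id, next_id)


first = insert_todo(conn, 7, "buy milk")
second = insert_todo(conn, 7, "call bank")
other_client = insert_todo(conn, 8, "water plants")
```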
I would use a similar setup to your latest answer. However, to get around your auto-increment issue, I would use a single auto-increment surrogate key in your master database and then store the client primary key and your client id as well. That way you are not losing or changing any data in the process and you are also tracking which client the data was originally sourced from.
Be sure to also set up a unique index on your Client Pk, Client Id to enable referential integrity from any child tables.
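A sketch of what that master table might look like, with assumed table and column names, using sqlite3 for brevity:

```python
import sqlite3

master = sqlite3.connect(":memory:")
master.executescript("""
    CREATE TABLE todo (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key on the master
        client_id INTEGER NOT NULL,                   -- which client sent the row
        client_pk INTEGER NOT NULL,                   -- the key the client used locally
        title     TEXT NOT NULL
    );
    CREATE UNIQUE INDEX todo_client_key ON todo (client_pk, client_id);
""")


def upload(client_id: int, client_pk: int, title: str) -> int:
    """Store a client row unchanged and return the master's surrogate key."""
    with master:
        cur = master.execute(
            "INSERT INTO todo (client_id, client_pk, title) VALUES (?, ?, ?)",
            (client_id, client_pk, title),
        )
    return cur.lastrowid


id_a = upload(client_id=1, client_pk=1, title="from client 1")
id_b = upload(client_id=2, client_pk=1, title="from client 2")

try:
    upload(client_id=1, client_pk=1, title="sent twice")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Two clients can both use local key 1 without clashing, while the unique index rejects the same client row being inserted twice.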
Is there a reasonable limit to how many objects the client can create while disconnected?
One possibility I can see is to create a sort of "local sequence".
When your client connects to the central server, it gets a numeric ID, say a 7 digit number (the server generates it as a sequence).
The actual PKs are created as strings like this: 895051|000094 or 895051|005694 where the first part is the 7 digit number sent from the server, and the second part is a "local" sequence managed by the client.
As soon as you synch with the central server, you can get a new 7 digit number and restart your local sequence. This is not too different from what you were proposing, all in all. It just makes the actual PK completely independent of the client identity.
Another bonus is that if you have a scenario where the client has never connected to the server, it can use 000000|000094 locally, require a new number from the server and update the keys on its side before sending back to the server for synch (this is tricky if you have lots of FK constraints though, and could not be feasible).
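A small sketch of such a key generator is shown below. The class and method names are invented, and the sketch pads both parts to six digits because that is the width the example keys above actually use.

```python
class LocalKeyGenerator:
    """Builds keys of the form '<server number>|<local sequence>'."""

    UNASSIGNED = "000000"  # placeholder prefix before the first contact with the server

    def __init__(self) -> None:
        self.prefix = self.UNASSIGNED
        self.counter = 0

    def start_session(self, server_number: int) -> None:
        """Adopt a fresh number from the server and restart the local sequence."""
        self.prefix = f"{server_number:06d}"
        self.counter = 0

    def next_key(self) -> str:
        self.counter += 1
        return f"{self.prefix}|{self.counter:06d}"


gen = LocalKeyGenerator()
offline_key = gen.next_key()   # created before ever reaching the server
gen.start_session(895051)      # number handed out by the server (example value)
key_1 = gen.next_key()
key_2 = gen.next_key()
```

Keys created under the placeholder prefix are the ones that would have to be rewritten, together with any foreign keys pointing at them, before the first upload.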
I'm a J2EE developer & we are using hibernate mapping with a PostgreSQL database.
We have to keep track of any changes occurs in the database, in others words all previous & current values of any field should be saved. Each field can be any type (bytea, int, char...)
With a simple table it is easy, but with a graph of objects things are more difficult.
So we have, speaking in a UML point of view, a graph of objects to store in the database with every changes & the user.
Any idea or pattern how to do that?
A common way to do this is by storing versions of objects.
If you add a "version" and a "deleted" field to each table that you want to keep an audit trail for, then instead of doing normal updates and deletes, follow these rules:
Insert - Set the version number to 0 and insert as normal.
Update - Increment the version number and do an insert instead.
Delete - Increment the version number, set the deleted field to true and do an insert instead.
Retrieve - Get the record with the highest version number and return that.
If you follow this pattern, every time you update you will create a new record rather than overwriting the old data, so you will always be able to track back and see all the old objects.
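The four rules can be sketched for a single table as follows. The `person` table, its `entity_id` column (which groups all versions of one logical object) and the function names are assumptions; sqlite3 is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        entity_id INTEGER NOT NULL,            -- identifies the logical object
        version   INTEGER NOT NULL,
        deleted   INTEGER NOT NULL DEFAULT 0,
        name      TEXT,
        PRIMARY KEY (entity_id, version)
    )
""")
conn.commit()


def _latest(entity_id: int):
    """Row with the highest version number, or None if the entity is unknown."""
    return conn.execute(
        "SELECT version, deleted, name FROM person "
        "WHERE entity_id = ? ORDER BY version DESC LIMIT 1",
        (entity_id,),
    ).fetchone()


def insert(entity_id: int, name: str) -> None:
    with conn:  # Insert: version 0
        conn.execute("INSERT INTO person VALUES (?, 0, 0, ?)", (entity_id, name))


def update(entity_id: int, name: str) -> None:
    version, _, _ = _latest(entity_id)
    with conn:  # Update: new row with the next version number
        conn.execute("INSERT INTO person VALUES (?, ?, 0, ?)", (entity_id, version + 1, name))


def delete(entity_id: int) -> None:
    version, _, name = _latest(entity_id)
    with conn:  # Delete: new row flagged as deleted
        conn.execute("INSERT INTO person VALUES (?, ?, 1, ?)", (entity_id, version + 1, name))


def retrieve(entity_id: int):
    row = _latest(entity_id)
    if row is None or row[1]:
        return None
    return row[2]


insert(1, "Fred")
update(1, "Frederick")
name_before_delete = retrieve(1)
delete(1)
name_after_delete = retrieve(1)
history_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Nothing is ever overwritten: after one insert, one update and one delete, the table holds three rows for the same entity.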
This will work exactly the same for graphs of objects, just add the new fields to each table within the object graph, and handle each insert/update/delete for each table as described above.
If you need to know which user made the modification, you just add a "ModifiedBy" field as well.
(You can either do this processing in your DA layer code, or if you prefer you can use database triggers to catch your update/delete/retrieve calls and re-process them following the rules.)
Obviously, you need to consider space requirements, as every single update will result in a fully new record. If your application is update heavy, you are going to generate a lot of data. It's common to also include a "last modified time" fields so you can process the database off line and delete data older than required.
Current RDBMS implementations are not very good at handling temporal data. That's one reason why maintaining separate journalling tables through triggers is the usual approach. (The other is that audit trails frequently have different use cases to regular data, and having them in separate tables makes it easier to manage access to them). Oracle does a pretty slick job of hiding the plumbing in its Total Recall product, but being Oracle it charges $$$ for this.
Scott Bailey has published a presentation on temporal data in PostgreSQL. Alas it won't help you right now but it seems like some features planned for 8.5 and 8.6 will enable the transparent storage of time-related data.
(Not related to versioning the database schema)
Applications that interface with databases often have domain objects that are composed of data from many tables. Suppose the application were to support versioning, in the sense of CVS, for these domain objects.
For some arbitrary domain object, how would you design a database schema to handle this requirement? Any experience to share?
Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.
If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.
Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the-minute tracking of historical state (almost, but not quite, an oxymoron) this can be an effective solution with a much simpler code base than a full history tracking mechanism built directly into the application.
A technique I've used for this in the past has been to have a concept of "generations" in the database: each change increments the current generation number for the database - if you use Subversion, think revisions.
Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation that it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.
So to insert into the database:
increment the generation number
insert the data
tag the lifetime of that data with valid from, and a valid to of NULL
If you're updating some data:
mark all data that's about to be modified as valid to the current generation number
increment the generation number
insert the new data with the current generation number
deleting is just a matter of marking the data as terminating at the current generation.
To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.
Example:
Create a person.
|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |NULL|
Update tel no.
|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |NULL|
Delete fred:
|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |2   |
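The Fred example can be reproduced with a short sketch. The schema, the single-row `generation` table and the function names are assumptions; the steps follow the insert/update/delete rules exactly as described, with "valid to" treated as inclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE generation (current INTEGER NOT NULL);
    INSERT INTO generation VALUES (0);
    CREATE TABLE person (
        name       TEXT,
        telephone  TEXT,
        valid_from INTEGER NOT NULL,
        valid_to   INTEGER              -- NULL while the row is current
    );
""")


def _current() -> int:
    return conn.execute("SELECT current FROM generation").fetchone()[0]


def _bump() -> int:
    conn.execute("UPDATE generation SET current = current + 1")
    return _current()


def _close(name: str) -> None:
    """Mark the live row for `name` as valid up to the current generation."""
    conn.execute(
        "UPDATE person SET valid_to = ? WHERE name = ? AND valid_to IS NULL",
        (_current(), name),
    )


def insert(name: str, telephone: str) -> None:
    with conn:
        gen = _bump()
        conn.execute("INSERT INTO person VALUES (?, ?, ?, NULL)", (name, telephone, gen))


def update(name: str, telephone: str) -> None:
    with conn:
        _close(name)
        gen = _bump()
        conn.execute("INSERT INTO person VALUES (?, ?, ?, NULL)", (name, telephone, gen))


def delete(name: str) -> None:
    with conn:
        _close(name)


def as_of(generation: int):
    """Rows that were valid in the given generation."""
    return conn.execute(
        "SELECT name, telephone FROM person "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to >= ?)",
        (generation, generation),
    ).fetchall()


insert("Fred", "555-29384")
update("Fred", "555-43534")
delete("Fred")
rows = conn.execute(
    "SELECT name, telephone, valid_from, valid_to FROM person ORDER BY valid_from"
).fetchall()
```

The final `rows` match the last table above, and `as_of(1)` still returns Fred's original telephone number.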
An alternative to strict versioning is to split the data into 2 tables: current and history.
The current table has all the live data and has the benefits of all the performance that you build in.
Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.
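One way to keep the history table filled without touching application code is a pair of triggers. The sketch below uses sqlite3 with made-up table names; the trigger syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, telephone TEXT);

    CREATE TABLE person_history (
        id         INTEGER,
        name       TEXT,
        telephone  TEXT,
        changed_at TEXT NOT NULL      -- date marker: when the row was replaced
    );

    -- Copy the old values into the history table on every update or delete.
    CREATE TRIGGER person_after_update AFTER UPDATE ON person
    BEGIN
        INSERT INTO person_history
        VALUES (OLD.id, OLD.name, OLD.telephone, datetime('now'));
    END;

    CREATE TRIGGER person_after_delete AFTER DELETE ON person
    BEGIN
        INSERT INTO person_history
        VALUES (OLD.id, OLD.name, OLD.telephone, datetime('now'));
    END;
""")

with conn:
    conn.execute("INSERT INTO person VALUES (1, 'Fred', '555-29384')")
    conn.execute("UPDATE person SET telephone = '555-43534' WHERE id = 1")
    conn.execute("DELETE FROM person WHERE id = 1")

current_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
history = conn.execute(
    "SELECT telephone FROM person_history ORDER BY rowid"
).fetchall()
```

Queries against `person` only ever see live data, while `person_history` accumulates one row per change.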
If you are using Hibernate, JBoss Envers could be an option. You only have to annotate classes with @Audited to keep their history.
You'll need a master record in a master table that contains the information common among all versions.
Then each child table uses master record id + version no as part of the primary key.
It can be done without the master table, but in my experience it will tend to make the SQL statements a lot messier.
A simple fool-proof way, is to add a version column to your tables and store the Object's version and choose the appropriate application logic based on that version number.
This way you also get backwards compatibility for little cost. Which is always good
ZoDB + ZEO implements a revision based database with complete rollback to any point in time support. Go check it.
Bad Part: It's Zope tied.
Once an object is saved in a database, it can be modified any number of times. If we want to know how many times an object has been modified, we need to apply this versioning concept.
Whenever we use versioning, Hibernate inserts a version number of zero when the object is saved in the database for the first time. Later, Hibernate automatically increments that version number by one whenever that object is modified.
In order to use this versioning concept, we need the following two changes in our application
Add one property of type int in our pojo class.
In the Hibernate mapping file, add an element called version immediately after the id element.
I'm not sure if we have the same problem, but I required a large number of 'proposed' changes to the current data set (with chained proposals, ie, proposal on proposal).
Think branching in source control but for database tables.
We also wanted a historical log but this was the least important factor - the main issue was managing change proposals which could hang around for 6 months or longer as the business mulled over change approval and got ready for the actual change to be implemented.
The idea is that users can load up a Change and start creating, editing, deleting the current state of data without actually applying those changes. Revert any changes they may have made, or cancel the entire change.
The only way I have been able to achieve this is to have a set of common fields on my versioned tables:
Root ID: Required - set once to the primary key when the first version of a record is created. This represents the primary key across all of time and is copied into each version of the record. You should consider the Root ID when naming relation columns (eg. PARENT_ROOT_ID instead of PARENT_ID). As the Root ID is also the primary key of the initial version, foreign keys can be created against the actual primary key - the actual desired row will be determined by the version filters defined below.
Change ID: Required - every record is created, updated, deleted via a change
Copied From ID: Nullable - null indicates newly created record, not-null indicates which record ID this row was cloned from when updated
Effective From Date/Time: Nullable - null indicates proposed record, not-null indicates when the record became current. Unfortunately a unique index cannot be placed on Root ID/Effective From as there can be multiple null values for any Root ID. (Unless you want to restrict yourself to a single proposed change per record)
Effective To Date/Time: Nullable - null indicates current/proposed, not-null indicates when it became historical. Not technically required but helps speed up queries finding the current data. This field could be corrupted by hand-edits but can be rebuilt from the Effective From Date/Time if this occurs.
Delete Flag: Boolean - set to true when it is proposed that the record be deleted upon becoming current. When deletes are committed, their Effective To Date/Time is set to the same value as the Effective From Date/Time, filtering them out of the current data set.
The query to get the current state of data according to a change would be;
SELECT * FROM table WHERE (CHANGE_ID IN :ChangeId OR (EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now) AND ROOT_ID NOT IN (SELECT ROOT_ID FROM table WHERE CHANGE_ID IN :ChangeId)))
(The filtering of change-on-change multiples is done outside of this query).
The query to get the current state of data at a point in time would be;
SELECT * FROM table WHERE EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
Common indexes created on (ROOT_ID, EFFECTIVE_FROM), (EFFECTIVE_FROM, EFFECTIVE_TO) and (CHANGE_ID).
If anyone knows a better solution I would love to hear about it.