A way to differentiate data when synchronizing databases?

I have a web app in which I can create notes. Each time I create a new note, it is inserted into a table with an auto_increment id (quite obvious).
Now I want to develop an Android app in which I can create notes too (saving them locally in SQLite) and then synchronize those notes with the server.
The problem is that notes created on my phone get their own auto_increment ids, which will often be the same as the ids of notes on the server!
I don't mind having duplicated notes (actually, I don't think there is a way to tell whether a new note is a duplicate, because notes have no natural identifier). The problem is that if they have the same id (primary key), I won't be able to insert them on the server.
Any suggestions?

You could use a UUID as the key for your note.
That way, each entry gets a unique id, whether it is created on the server or on the client.
To create a UUID in Java, you can use UUID.randomUUID().
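As a rough illustration of why UUID keys merge cleanly, here is a minimal Python sketch. It uses Python's uuid.uuid4() in place of Java's UUID.randomUUID(), and two in-memory SQLite databases standing in for the phone and the server. The table layout and the helper names (new_note_db, add_note) are assumptions made for the example, not part of the original setup.

```python
import sqlite3
import uuid


def new_note_db() -> sqlite3.Connection:
    """Create an in-memory database whose notes are keyed by a UUID string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def add_note(conn: sqlite3.Connection, body: str) -> str:
    """Insert a note under a freshly generated random (version 4) UUID."""
    note_id = str(uuid.uuid4())
    conn.execute("INSERT INTO notes (id, body) VALUES (?, ?)", (note_id, body))
    return note_id


server = new_note_db()
phone = new_note_db()
add_note(server, "written in the web app")
add_note(phone, "written on the phone")

# Upload: copy the phone's rows to the server unchanged, keys included.
phone_rows = phone.execute("SELECT id, body FROM notes").fetchall()
server.executemany("INSERT INTO notes (id, body) VALUES (?, ?)", phone_rows)

note_count = server.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(note_count)
```

Because neither side depends on a counter, the rows can be copied without renumbering anything.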

The most obvious solution would be to give each note its own unique hash or GUID in addition to the database's auto_increment_id.
You'd then use these unique values as the basis for synchronisation in conjunction with a "last synced" timestamp in each of the tables so that you know what data needs to be synced and can easily determine if the data already exists in the destination (and should be updated) or whether it's a new note.
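The following is only a sketch of that idea, assuming a notes table that keeps its local auto-increment id, adds a guid column, and records an updated_at value. The schema, the push_changes helper, and the short placeholder GUIDs ('g1', 'g2') are all hypothetical, and conflict handling is deliberately naive (the pushed copy always wins).

```python
import sqlite3

SCHEMA = """
CREATE TABLE notes (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- local key, never synced
    guid       TEXT NOT NULL UNIQUE,               -- identity shared by all copies
    body       TEXT NOT NULL,
    updated_at INTEGER NOT NULL                    -- e.g. seconds since epoch
)
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def push_changes(source, dest, last_synced: int) -> int:
    """Copy rows changed in `source` after `last_synced` into `dest`.

    Rows are matched on guid: a known guid is updated, an unknown one is
    inserted. Returns the number of rows sent.
    """
    changed = source.execute(
        "SELECT guid, body, updated_at FROM notes WHERE updated_at > ?",
        (last_synced,),
    ).fetchall()
    for guid, body, updated_at in changed:
        exists = dest.execute(
            "SELECT 1 FROM notes WHERE guid = ?", (guid,)
        ).fetchone()
        if exists:
            dest.execute(
                "UPDATE notes SET body = ?, updated_at = ? WHERE guid = ?",
                (body, updated_at, guid),
            )
        else:
            dest.execute(
                "INSERT INTO notes (guid, body, updated_at) VALUES (?, ?, ?)",
                (guid, body, updated_at),
            )
    return len(changed)


phone, server = open_db(), open_db()
server.execute("INSERT INTO notes (guid, body, updated_at) VALUES ('g1', 'draft', 10)")
phone.execute("INSERT INTO notes (guid, body, updated_at) VALUES ('g1', 'edited', 30)")
phone.execute("INSERT INTO notes (guid, body, updated_at) VALUES ('g2', 'new note', 40)")

sent = push_changes(phone, server, last_synced=20)
server_notes = server.execute("SELECT guid, body FROM notes ORDER BY guid").fetchall()
```

The local id column never leaves its database, so it no longer matters that both sides count from 1.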

I'm sorry, but I think your DB structure is wrong. You cannot use an autoincrement field this way across different databases in a disconnected architecture. Autoincrement values are only meant to be unique within the table that generates them; if you need to merge two tables like this, you have to implement different logic. Use a note_id that identifies a note uniquely, built from more data (e.g. the user id, the device id, etc.). In this scenario, autoincrement will give you a messy architecture at best.

Related

SQL Server Employee table with existing ID number

I am attempting to create an Employee table in SQL Server 2016 and I want to use EmpID as the Primary Key and Identity. Here is what I believe to be true and my question: When I create the Employee table with EmpID as the Primary Key and an Identity(100, 1) column, each time I add a new employee, SQL Server will automatically create the EmpID, starting with 100 and incrementing by 1 with each new employee. What happens if I want to import a list of existing employees from another company and those employees already have an existing EmpID? I haven't been able to figure out how I would import those employees with their existing EmpID. If there is a way to import the employee list with the existing EmpID, will SQL Server check that the EmpIDs from the new list do not already exist for a current employee? Or is there some code I need to write to make that happen?
Thanks!

You are right about primary keys. But before importing employees from another company and merging them into your employee list, you have to ask a few things:
Why? There are ways to solve this problem, but why would you merge another company's employees into your own employee table?
The other company's ID structure: companies often have different ID structures; some use 4 characters, others use only numbers, and so on. You have to know how the two companies' ID structures differ.
If the merge can't be avoided, raise the concern with the higher-ups and tell them that the incoming employees must be given new employee IDs. With this in mind, simply appending the new data to your database is the solution.

This is an extremely common data warehousing issue, where a table has data sources from multiple places. It also comes up in migrations, acquisitions, etc.
There is no way to keep the existing IDs as a primary key if there are multiple people with the same ID.
In the data warehouse world we would always create a new surrogate key, which is the primary key to the table, and include the original key and a source system identifier as two attributes.
In your scenario you will probably keep the existing keys for the original company, and create new IDs for the new employees, and save the oldID in an additional column for historical use.
Either of these choices also means that as you migrate other associated data, such as leave information imported from the old system, you can translate it to the new key by looking up the OldID in the employee table and finding the associated NewID to use when saving your leave records in the new system.
At the end of the day there is no alternative to this, as you simply can't have two employees with the same primary key.
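A small sketch of the surrogate-key approach, using Python's sqlite3 as a stand-in for SQL Server. The column names (emp_id, source_system, old_id), the import_leave helper, and the sample employees are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- new surrogate key
    source_system TEXT NOT NULL,                      -- where the row came from
    old_id        INTEGER NOT NULL,                   -- key in the source system
    name          TEXT NOT NULL,
    UNIQUE (source_system, old_id)
);
CREATE TABLE leave_record (
    emp_id INTEGER NOT NULL REFERENCES employee (emp_id),
    days   INTEGER NOT NULL
);
""")

# Both companies used EmpID 100; the surrogate key keeps the rows apart.
conn.executemany(
    "INSERT INTO employee (source_system, old_id, name) VALUES (?, ?, ?)",
    [("ours", 100, "Alice"), ("acquired", 100, "Bob")],
)


def import_leave(source_system: str, old_id: int, days: int) -> int:
    """Translate (source_system, old_id) to the new key, then store the record."""
    row = conn.execute(
        "SELECT emp_id FROM employee WHERE source_system = ? AND old_id = ?",
        (source_system, old_id),
    ).fetchone()
    if row is None:
        raise LookupError(f"no employee {old_id} from {source_system}")
    conn.execute(
        "INSERT INTO leave_record (emp_id, days) VALUES (?, ?)", (row[0], days)
    )
    return row[0]


bob_new_id = import_leave("acquired", 100, 5)
```

The UNIQUE constraint on (source_system, old_id) keeps the lookup unambiguous even when old IDs collide across companies.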

I have never seen a company that migrated employees from another company and kept their existing employee IDs. Usually, they give them new IDs and keep the old ones in the employee file for reference. But they never use the old one as an active ID.
Large companies usually use a series of special identifiers that are already defined in the system to distinguish employees by field, specialty, etc.
Most companies don't do what the large ones do; instead, they stick with one identifier and use dimensions as an identity. These dimensions specify areas of work for employees, projects, vendors, etc. They are used globally in the system and affect the company's financial reports (which is the main point of using them).
So, what you need to do is find out the company's ID sequence requirements and work from that, as depending on IDENTITY alone won't be enough for most companies. If you can depend on identity alone, then use it; if not, see whether you can use dimensions as an identity (you could create five dimensions - Company, Project, Department, Area, Cost Center - which will be enough for any company).
If you use identity alone and want to migrate, then wrap your insert statement like this:
SET IDENTITY_INSERT tableName ON
INSERT INTO tableName (columns)
...
This allows you to insert explicit values into the identity column (set IDENTITY_INSERT back to OFF afterwards). However, you might then need to reseed the identity to a new value to avoid issues; read up on DBCC CHECKIDENT.
If you end up using dimensions, you could make the dimension and the ID a composite primary key, which ensures that the combination of the two is unique in the table (treated as one set).
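To make the composite-key idea concrete, here is a minimal sketch using Python's sqlite3 instead of SQL Server. It assumes a single hypothetical dimension (company) and made-up sample rows; a real design could involve several dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    company TEXT    NOT NULL,   -- the "dimension" part of the key
    emp_id  INTEGER NOT NULL,   -- the ID as issued by that company
    name    TEXT    NOT NULL,
    PRIMARY KEY (company, emp_id)
)
""")
conn.execute("INSERT INTO employee VALUES ('A', 100, 'Alice')")
conn.execute("INSERT INTO employee VALUES ('B', 100, 'Bob')")  # same ID, other company

try:
    # Same company and same ID as an existing row: the key rejects it.
    conn.execute("INSERT INTO employee VALUES ('A', 100, 'Carol')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```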

Event Sourcing SQL Populate Parent and Child Table

Following up from question
CQRS Read Model Design when Event Sourcing with a Parent-Child-GrandChild… relationship:
We use event sourcing with SQL Server 2016. Example: a furniture company.
(1) We have a Parent and Child table. Say a FurnitureDescriptionTable, (Parent table- description of all furniture Items) and FurnitureOrders(Child - multiple customers orders, refers to FurnitureDescription table). Should the join column between these be Guid or Integer Identity in SQL?
(2) If Guid, who generates the Guid, API or SQL? any reason?

Choosing the type for primary/foreign keys is a well-known problem in the RDBMS world. A simple search will help. But still:
Guids are usually generated on the application side. This option is popular (since you are referring to CQRS) when command handlers can generate complete domain objects, including the identity. Otherwise, you need a unique identity generator, which might be non-trivial but is still feasible in some databases, for example with Oracle sequences.
Numbers are usually chosen for database-generated ids. In that case, the new id is only known once the row has been inserted into a table. For an event-sourcing scenario this is not an option, since you only insert on the read side, but objects are created on the write side.
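A hedged sketch of the application-side option, written in Python rather than against any particular CQRS framework. The event class, the handler, and the dictionary standing in for the read model are all hypothetical names invented for this example.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class FurnitureOrderPlaced:
    """Event emitted by the write side; it already carries the order's identity."""
    order_id: uuid.UUID
    furniture_id: uuid.UUID
    quantity: int


def handle_place_order(furniture_id: uuid.UUID, quantity: int) -> FurnitureOrderPlaced:
    """Hypothetical command handler: the id is assigned here, not by the database."""
    return FurnitureOrderPlaced(uuid.uuid4(), furniture_id, quantity)


def project(read_model: dict, event: FurnitureOrderPlaced) -> None:
    """Read-side projection: stores the row under the id the event brought along."""
    read_model[event.order_id] = (event.furniture_id, event.quantity)


chair_id = uuid.uuid4()
event = handle_place_order(chair_id, quantity=2)
orders: dict = {}
project(orders, event)
```

The identity exists before anything is written to a table, so the read side only has to reuse it.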

Why Do I need user id attribute?

I am currently trying to design a social network type of website, and this is the class diagram that I have so far.
At the moment I have userId and username in separate tables because I wanted to normalize these tables, but now I am not sure why I need the userId attribute. I have done some research, and a lot of similar projects have this attribute, but I don't get why, if the username is already going to uniquely identify a particular user.
By the way, I am aware that I have a problem with the requests table, because with the attributes given at the moment I cannot identify a primary key.
Thanks

Two big reasons I can think of:
Optimization. SQL databases typically perform far better when using integer primary keys than varchar ones. Lookup-something-by-user is one of the most common operations in this environment, so this has real performance implications. Many DBAs don't like GUID/UUIDs as PKs for exactly this reason.
Nothing dictates that a username must uniquely identify users. Case in point: Stack Exchange user handles don't have to be unique, and are freely editable.
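The second point can be illustrated with a small sketch. It uses Python's sqlite3 with a hypothetical users/posts schema that is not taken from the class diagram in the question. Because other tables reference the integer user_id, changing a username is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    body    TEXT NOT NULL
);
INSERT INTO users (user_id, username) VALUES (1, 'alice');
INSERT INTO posts (user_id, body) VALUES (1, 'hello');
""")

# Renaming touches one row in users; posts keeps pointing at user_id 1.
conn.execute("UPDATE users SET username = 'alice_b' WHERE user_id = 1")

author = conn.execute("""
    SELECT u.username FROM posts p JOIN users u ON u.user_id = p.user_id
""").fetchone()[0]
```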

Single column/primary key only table for referential integrity?

Maybe I'm going about this wrong, but I'm working on a database design for one of my projects.
I have an entity with a classification column which groups entities into convenient categories for the user. These classifications are predefined and unchangeable by the user (at least that's the current design).
I'm trying to decide if I should have an 'EntityClassification' table which contains simply an 'Id' column as the primary key with no other information, in order to have an enforced relationship between Entity:Classification -> EntityClassification:Id.
I don't plan to have a name/description column in EntityClassification, since my current thought is that I'll need to support localization of these predefined names, which will be done with a static string table, like resource files downloaded to the client based on their country/language. There really isn't any other data associated with this EntityClassification that I would want, and a table seems like it might be overkill.
Is this a common/recommended practice for this type of problem? We're using SQL Server 2008 and don't have an enum datatype in the database, which seems to be what I'm really trying to achieve.

You should have the table with name and description not only for end user display, but internal documentation so when the users say 'my query based on this classification doesn't work!' someone hired in the future will know which ID they're talking about.

Do you just want to ensure that the values in Entity:Classification are restricted to your pre-determined list? If so a check constraint might be what you need.
Such constraints aren't as flexible as foreign keys: to alter the checked values we have to drop and recreate the constraint, but then you say there are no plans to change the values so that shouldn't matter.
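As a rough sketch of the check-constraint option, here is an example using Python's sqlite3 as a stand-in for SQL Server 2008. The table layout and the allowed values (1, 2, 3) are assumptions, since the question does not list the actual classifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE entity (
    id             INTEGER PRIMARY KEY,
    classification INTEGER NOT NULL
        CHECK (classification IN (1, 2, 3))  -- hypothetical predefined IDs
)
""")
conn.execute("INSERT INTO entity (classification) VALUES (2)")  # allowed

try:
    conn.execute("INSERT INTO entity (classification) VALUES (9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The list of allowed values lives in the table definition, so changing it means altering the constraint rather than inserting a row.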

What would you do to avoid conflicting data in this database schema?

I'm working on a multi-user internet database-driven website with SQL Server 2008 / LinqToSQL / custom-made repositories as the DAL. I have run across a normalization problem which can lead to an inconsistent database state if exploited correctly and I am wondering how to deal with the problem.
The problem: Several different companies have access to my website. They should be able to track their Projects and Clients at my website. Some (but not all) of the projects should be assignable to clients.
This results in the following database schema:
**Companies:**
ID
CompanyName
**Clients:**
ID
CompanyID (not nullable)
FirstName
LastName
**Projects:**
ID
CompanyID (not nullable)
ClientID (nullable)
ProjectName
This leads to the following relationships:
Companies-Clients (1:n)
Companies-Projects (1:n)
Clients-Projects(1:n)
Now, if a user is malicious, he might for example insert a Project with his own CompanyID, but with a ClientID belonging to another user, leaving the database in an inconsistent state.
The problem occurs in a similar fashion all over my database schema, so I'd like to solve this in a generic way if at all possible. I had the following two ideas:
Check for database writes that might lead to inconsistencies in the DAL. This would be generic, but requires some additional database queries before update and create queries are performed, so it will hurt performance.
Create an additional table for the clients-Projects relationship and make sure the relationships created this way are consistent. This also requires some additional select queries, but far less than in the first case. On the other hand it is not generic, so it is easier to miss something in the long run, especially when adding more tables / dependencies to the database.
What would you do? Is there any better solution I missed?
Edit: You might wonder why the Projects table has a CompanyID. This is because I want users to be able to add projects with and without clients. I need to keep track of which company (and therefore which website user) a clientless project belongs to, which is why a project needs a CompanyID.

I'd go with the latter, having one or more tables that define the allowable relationships between entities.

Note, there's no circularity in the references you have, so the title is misleading.
What you have is the possibility of conflicting data, that's different.
Why do you have "CompanyID" in the project table? The ID of the company involved is implicitly given by the client you link to. You don't need it.
Remove that column and you've removed your problem.
Additionally, what is the purpose of the "name" column in the client table? Can you have a client with one name, differing from the name of the company?
Or is "client" the person at that company?
Edit: OK, with the clarification about projects without clients, I would separate out the references, but you're not going to get rid of the problem you're describing without constraints that prevent multiple references from being made.
A simple constraint for your existing tables would be that the CompanyID and ClientID fields of a project row cannot both be non-null at the same time.

If you want to keep using the tables like this and avoid all the new queries, just put triggers on the table; when a user tries to insert a row with wrong data, the trigger will stop him.
Best Regards,
Iordan
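A sketch of what such a trigger could look like, written for SQLite through Python's sqlite3 rather than for SQL Server 2008, whose trigger syntax differs. The trigger name, the error message, and the sample rows are assumptions. Only inserts are covered here; a real setup would need a matching trigger for updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, CompanyName TEXT NOT NULL);
CREATE TABLE Clients (
    ID INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL REFERENCES Companies (ID),
    FirstName TEXT, LastName TEXT
);
CREATE TABLE Projects (
    ID INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL REFERENCES Companies (ID),
    ClientID INTEGER REFERENCES Clients (ID),
    ProjectName TEXT NOT NULL
);

-- Reject a project whose client belongs to a different company.
CREATE TRIGGER Projects_ClientSameCompany
BEFORE INSERT ON Projects
WHEN NEW.ClientID IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'client belongs to another company')
    WHERE NOT EXISTS (
        SELECT 1 FROM Clients
        WHERE ID = NEW.ClientID AND CompanyID = NEW.CompanyID
    );
END;

INSERT INTO Companies (ID, CompanyName) VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Clients (ID, CompanyID, FirstName, LastName) VALUES (10, 2, 'Ann', 'Lee');
""")

# Consistent rows pass: matching company, or no client at all.
conn.execute("INSERT INTO Projects (CompanyID, ClientID, ProjectName) VALUES (2, 10, 'ok')")
conn.execute("INSERT INTO Projects (CompanyID, ClientID, ProjectName) VALUES (1, NULL, 'no client')")

try:
    # Company 1 tries to attach a client that belongs to company 2.
    conn.execute("INSERT INTO Projects (CompanyID, ClientID, ProjectName) VALUES (1, 10, 'bad')")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True

project_count = conn.execute("SELECT COUNT(*) FROM Projects").fetchone()[0]
```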

My first thought would be to create a special client record for each company with the name "No client". Then eliminate the CompanyId from the Project table, and if a project has no client, use the "No client" record rather than a "normal" client record. If such no-client records need special processing, add a flag to the no-client record to explicitly identify it. (I'd hate to rely on the name being "No Client" or something like that -- too fuzzy.)
Then there would be no way to store inconsistent data so the problem would go away.
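One way to picture this layout is a minimal sketch with Python's sqlite3. The IsNoClient flag name and the sample rows are assumptions, and the Clients table is trimmed to the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, CompanyName TEXT NOT NULL);
CREATE TABLE Clients (
    ID INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL REFERENCES Companies (ID),
    LastName TEXT NOT NULL,
    IsNoClient INTEGER NOT NULL DEFAULT 0  -- explicit flag instead of the name
);
CREATE TABLE Projects (
    ID INTEGER PRIMARY KEY,
    ClientID INTEGER NOT NULL REFERENCES Clients (ID),  -- no CompanyID column
    ProjectName TEXT NOT NULL
);
INSERT INTO Companies (ID, CompanyName) VALUES (1, 'Acme');
INSERT INTO Clients (ID, CompanyID, LastName, IsNoClient) VALUES (1, 1, 'No client', 1);
INSERT INTO Clients (ID, CompanyID, LastName) VALUES (2, 1, 'Lee');
INSERT INTO Projects (ClientID, ProjectName) VALUES (1, 'internal'), (2, 'website');
""")

# A project's company is always derived through its client.
rows = conn.execute("""
    SELECT p.ProjectName, c.CompanyID, c.IsNoClient
    FROM Projects p JOIN Clients c ON c.ID = p.ClientID
    ORDER BY p.ID
""").fetchall()
```

With only one path from a project to a company, there is no second value that could disagree.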

In the end I implemented a completely generic solution which solves my problem without much runtime overhead and without requiring any changes to the database. I'll describe it here in case someone else has the same problem.
First off, the approach only works because the only table that other tables are referencing through multiple paths is the Companies table. Since this is the case in my database, I only have to check whether all n:1 referenced entities of each entity that is to be created / updated / deleted are referencing the same company (or no company at all).
I am enforcing this by deriving all of my Linq entities from one of the following types:
SingleReferenceEntityBase - The norm. Only checks (via reflection) if there really is only one reference (no matter if transitive or intransitive) to the Companies table. If this is the case, the references to the companies table cannot become inconsistent.
MultiReferenceEntityBase - For special cases such as the Projects table above. Asks all directly referenced entities what company ID they are referencing. Raises an exception if there is an inconsistency. This costs me a few select queries per CRUD operation, but since MultiReferenceEntities are much rarer than SingleReferenceEntities, this is negligible.
Both of these types implement a "CheckReferences" method, and I call it whenever the Linq entity is written to the database by partially implementing the OnValidate(System.Data.Linq.ChangeAction action) method, which is automatically generated for all Linq entities.
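The original implementation is C# with LINQ to SQL and is not shown, so the following is only a loose Python analogue of the multi-reference check. It assumes plain dataclasses instead of Linq entities and omits the reflection-based discovery of references. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class InconsistentCompanyError(Exception):
    """Raised when an entity's references point at different companies."""


@dataclass
class Client:
    id: int
    company_id: int


@dataclass
class Project:
    id: int
    company_id: int
    client: Optional[Client] = None

    def check_references(self) -> None:
        """Collect the company ID reached via each reference; they must agree."""
        company_ids = {self.company_id}
        if self.client is not None:
            company_ids.add(self.client.company_id)
        if len(company_ids) > 1:
            raise InconsistentCompanyError(
                f"project {self.id}: companies {sorted(company_ids)}"
            )


def save(project: Project, store: dict) -> None:
    """Stand-in for the write path: validate first, then persist."""
    project.check_references()
    store[project.id] = project


store: dict = {}
save(Project(1, company_id=1, client=Client(10, company_id=1)), store)
save(Project(2, company_id=1), store)  # a clientless project is fine
```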