Spring Data JPA repository partial save - database

I am creating a web application with Spring Boot that uses Spring Data JPA for database access. I have created a repository interface that extends JpaRepository as follows:
public interface MyRepository extends JpaRepository<MyClass, Integer> {
}
I am saving the whole list from my controller as follows:
myRepo.saveAll(myclassList); // myclassList is a List<MyClass>; Spring Data 1.x named this overload save(Iterable)
The database table corresponding to MyClass has a unique constraint on one of its columns, so if the constraint is violated, an exception is thrown. Ideally, I would want the records that do not violate the constraint to be saved anyway, but that is not what happens: if even one record violates the unique constraint, none of the records are inserted into the database. Is there any workaround for this? Or will I need to manually check each record to see if it exists in the DB and only insert the ones that do not exist in the database?

will I need to manually check each record to see if it exists in the DB and only insert the ones that do not exist in the database?
Short answer
Yes.
Longer version
The exact way to proceed depends on your specific use case. I see the following options:
1. Make sure your data is valid before you save it. This is probably the default approach.
2. Save each entity in a separate transaction; this way a rollback only undoes the changes to that one entity.
3. Try to save everything in one transaction. If that fails, fall back to 1 or 2. This takes a little more code but probably has the best performance if constraint violations are rare.
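A minimal sketch of option 3 with option 2 as the fallback. To stay self-contained it uses Python's built-in sqlite3 module rather than Spring Data JPA, and the my_class table with its unique name column is a hypothetical stand-in for MyClass and its constrained column. What carries over is the control flow: one transaction for the whole batch, and one transaction per row only if the batch was rolled back.

```python
import sqlite3

INSERT_SQL = "INSERT INTO my_class (name) VALUES (?)"  # hypothetical table/column


def save_with_fallback(conn: sqlite3.Connection, names: list[str]) -> list[str]:
    """Option 3: try the whole batch in one transaction; if a constraint is
    violated, fall back to option 2 (one transaction per row).
    Returns the values that could not be saved."""
    try:
        with conn:  # commits on success, rolls back the whole batch on error
            conn.executemany(INSERT_SQL, [(n,) for n in names])
        return []
    except sqlite3.IntegrityError:
        pass  # nothing from the batch was kept; retry row by row

    rejected = []
    for name in names:
        try:
            with conn:  # a separate transaction for each row
                conn.execute(INSERT_SQL, (name,))
        except sqlite3.IntegrityError:
            rejected.append(name)  # only this row is rolled back
    return rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_class (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    with conn:
        conn.execute(INSERT_SQL, ("b",))  # an existing row that will clash
    print(save_with_fallback(conn, ["a", "b", "c"]))  # ['b']
```

Because the batch transaction is rolled back as a whole, the valid rows are retried individually, so the extra round trips are only paid when a violation actually occurs.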


Dealing with 3NF in terms of database domain modelling where attributes are added knowing they create a transitive dependency

I am currently setting up a database domain model where, in terms of normalization, I am challenged by a transitive dependency. However, for this particular model we deliberately choose to add that transitive dependency for a reason, and I am wondering how you would deal with such cases from a normalization standpoint.
Let me show what I mean:
I have a table called UserSubscription that has the following attributes:
id {dbgenerated}
created
user
price
currency
subscriptionid
The values for price and currency depend on subscriptionid, which points to a second table, Subscription (subscriptionid is an FK referencing that table's PK). One might ask why I would even consider copying values from the Subscription table into the UserSubscription table. The reason is that a Subscription might change at any point in time, and for reference we want to store the original values of the subscription in UserSubscription, so that even if it changes we still have the values the user originally signed up for.
I know that, from a normalization perspective, the transitive dependency I create should be removed: ideally I would keep the values only in the Subscription table, not allow them to be modified, and instead create a new subscription whenever a change is necessary.
But I do not want to create a new subscription every time something needs to change in an existing one, simply because we expect these values to change often, for example to follow competitors' prices. At the same time, every new subscription created adds one more option for users to choose from.
This also means that if we no longer want to use a subscription, we would need to remove it and create a new one, whereas a simple update would do, since we no longer need the old one anyway.
This is a school project; I just wonder whether it would ever be "ok" in terms of normalization to choose such an approach deliberately, to reduce the work of removing and creating subscriptions when I expect them to change frequently.
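For reference, the snapshot design the question describes can be sketched as below, with Python's sqlite3 as a stand-in database. Column names follow the question; the types, the LookupError for an unknown subscription and the sign_up helper are assumptions. price and currency are copied once, at sign-up time, with INSERT ... SELECT, so later edits to subscription leave existing rows untouched.

```python
import sqlite3

SCHEMA = """
CREATE TABLE subscription (
    subscriptionid INTEGER PRIMARY KEY,
    price REAL NOT NULL,
    currency TEXT NOT NULL
);
CREATE TABLE user_subscription (
    id INTEGER PRIMARY KEY,                        -- db-generated
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    user TEXT NOT NULL,
    price REAL NOT NULL,                           -- copied at sign-up
    currency TEXT NOT NULL,                        -- copied at sign-up
    subscriptionid INTEGER NOT NULL REFERENCES subscription(subscriptionid)
);
"""


def sign_up(conn, user, subscriptionid):
    """Record a sign-up, copying the subscription's *current* price and
    currency so later changes to subscription do not rewrite history."""
    with conn:
        cur = conn.execute(
            """INSERT INTO user_subscription (user, price, currency, subscriptionid)
               SELECT ?, price, currency, subscriptionid
               FROM subscription WHERE subscriptionid = ?""",
            (user, subscriptionid),
        )
    if cur.rowcount == 0:  # INSERT ... SELECT matched no subscription
        raise LookupError(f"no subscription {subscriptionid}")
```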
Why don't you instead create an M:N (mapping) table USER_SUBSCRIPTION that holds the relationships between USER and SUBSCRIPTION? You can store all values there historically and don't have to remove or create anything when something changes. If the user decides to opt out, you only update flag_active, flag_deleted, dtime_end, whatever works for you.
Here is a simple model for demonstration:
USER
id_user PK
name
... other details
SUBSCRIPTION
id_subscription PK
name
details
flag_active (TRUE|FALSE or 1|0 values)
... other details
USER_SUBSCRIPTION
id_user FK
id_subscription FK
dtime_start -- when the subscription started
dtime_end -- when the subscription ended
flag_valid (T|F or 1|0) -- optional; gives you a quick heads-up about active subscriptions, but it is somewhat redundant, since you can derive it from dtime_start vs. dtime_end. Up to you.
This gives you a very generic (and therefore flexible and scalable) model for users' subscriptions: no duplication, everything handled by FK/PK referential constraints, etc.
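A minimal sketch of this model in Python's sqlite3, assuming ISO-8601 text timestamps and a composite primary key (id_user, id_subscription, dtime_start) on USER_SUBSCRIPTION so a user can re-subscribe later; both are assumptions, not part of the answer. The helpers opt_out and active_subscriptions are hypothetical and show how opting out becomes an update of dtime_end and how "active" can be derived instead of stored in flag_valid.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (id_user INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE subscription (
    id_subscription INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    details TEXT,
    flag_active INTEGER NOT NULL DEFAULT 1 CHECK (flag_active IN (0, 1))
);
CREATE TABLE user_subscription (
    id_user INTEGER NOT NULL REFERENCES user(id_user),
    id_subscription INTEGER NOT NULL REFERENCES subscription(id_subscription),
    dtime_start TEXT NOT NULL,   -- ISO-8601, when the subscription started
    dtime_end TEXT,              -- NULL while it is still running
    PRIMARY KEY (id_user, id_subscription, dtime_start)
);
"""


def opt_out(conn, id_user, id_subscription, when):
    """Close the running subscription instead of deleting anything."""
    with conn:
        conn.execute(
            """UPDATE user_subscription SET dtime_end = ?
               WHERE id_user = ? AND id_subscription = ? AND dtime_end IS NULL""",
            (when, id_user, id_subscription),
        )


def active_subscriptions(conn, id_user, at):
    """Derive 'active' from dtime_start/dtime_end rather than a flag_valid column."""
    rows = conn.execute(
        """SELECT s.name
           FROM user_subscription AS us
           JOIN subscription AS s ON s.id_subscription = us.id_subscription
           WHERE us.id_user = ? AND us.dtime_start <= ?
             AND (us.dtime_end IS NULL OR us.dtime_end > ?)
           ORDER BY s.name""",
        (id_user, at, at),
    )
    return [name for (name,) in rows]
```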

EF Core Code First Automatic Column Not Deletable

I have a Code-First .NET Core project using Entity Framework Core (2.0).
There are two tables, RhythmBlock and RhythmPattern.
Because of a List<RhythmBlock> property on RhythmPattern, EF created a RhythmPatternId column on RhythmBlock, which I didn't really want.
So I tried all of the following:
Marking the List<RhythmBlock> with the [NotMapped] attribute and updating the db.
Using the Fluent API to tell EF to ignore it and updating the db.
Deleting the List<RhythmBlock>, replacing it with a method that returns the same list, and updating the db.
All sorts of shenanigans with resetting the migrations, direct SQL, etc.
The result of all my attempts was that whenever I deleted the RhythmPatternId column in the RhythmBlocks table, I got the error Invalid Column Name. Even with the column and all its indexes and foreign keys deleted, and all references removed from the project, the error persisted.
The only way to get my project running again was to add the deleted column back. So it's working, but I don't understand why.

Dynamic database routing in Django

I have a Customer table in my database that all other tables have a foreign key to.
from django.db import models
class Customer(models.Model):
    ...
class TableA(models.Model):
    # on_delete is required since Django 2.0
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    ...
class TableB(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    ...
I'm trying to implement a database router that determines the database to connect to based on the primary key of the Customer table. For instance, ids in the range 1 - 100 will connect to Database A, ids in the range 101 - 200 will connect to Database B.
I've read through the Django documentation on routers but I'm unsure if what I'm asking is possible. Specifically, the methods db_for_read(model, **hints) and db_for_write(model, **hints) work on the type of the object. This is useless for me as I need routing to be based on the contents of the instance of the object. The documentation further states that the only **hints provided at this moment are an instance object where applicable and in some cases no instance is provided at all. This doesn't inspire me with confidence as it does not explicitly state the cases when no instance is provided.
I'm essentially attempting to implement application level sharding of the database. Is this possible in Django?
Solve the chicken-and-egg problem
You'll have to solve the chicken-and-egg problem when saving a new Customer: you have to save it to get an id, but you have to know the id to know where to save it.
You can solve that by saving all Customers in DatabaseA first, then checking the id and saving the Customer in the target db too. See Django multidb: write to multiple databases. If you do this consistently, you won't run into these problems. But make sure to pay attention to deleting Customers.
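A sketch of the save-in-DatabaseA-first approach, using two in-memory sqlite3 databases as hypothetical stand-ins for the Django aliases DatabaseA and DatabaseB and the id ranges from the question. create_customer and shard_for are illustrative names, not Django APIs; in Django the two writes would be two save(using=...) calls.

```python
import sqlite3


# Hypothetical layout from the question: ids 1-100 -> A, 101-200 -> B.
def shard_for(customer_id):
    if 1 <= customer_id <= 100:
        return "DatabaseA"
    if 101 <= customer_id <= 200:
        return "DatabaseB"
    raise ValueError(f"no shard configured for customer id {customer_id}")


def create_customer(dbs, name):
    """Save the new Customer in DatabaseA to obtain its id, then write the
    same row (same id) to the database that id maps to."""
    home = dbs["DatabaseA"]
    with home:
        customer_id = home.execute(
            "INSERT INTO customer (name) VALUES (?)", (name,)
        ).lastrowid
    target = shard_for(customer_id)
    if target != "DatabaseA":
        # Not atomic with the first write: a failure here leaves the
        # Customer only in DatabaseA.
        with dbs[target]:
            dbs[target].execute(
                "INSERT INTO customer (id, name) VALUES (?, ?)", (customer_id, name)
            )
    return customer_id, target
```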
Then route using **hints
The routing problem that's left is pretty straightforward if an instance is in the hints: either it is a Customer, and you return 'DatabaseA', or it has a customer, and you decide based on its customer_id or customer.id.
Try and remember, there is no spoon.
When there is no instance in the hints but the model is from your app, raise an error, so you can change the code that created the QuerySet. You should always provide hints when they aren't added automatically.
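A sketch of such a router, assuming the sharded models live in an app labelled customers and reusing the id ranges from the question (both hypothetical). Django calls db_for_read(model, **hints) and db_for_write(model, **hints) on each class listed in settings.DATABASE_ROUTERS; returning None passes the decision to the next router. The code only reads model._meta.app_label and the instance hint, so it does not import Django itself.

```python
# Hypothetical settings for this sketch: the app label of the sharded models
# and the customer-id ranges from the question.
SHARDED_APP = "customers"
SHARD_RANGES = [(1, 100, "DatabaseA"), (101, 200, "DatabaseB")]


def shard_for(customer_id):
    """Map a customer id to a database alias using SHARD_RANGES."""
    for low, high, alias in SHARD_RANGES:
        if low <= customer_id <= high:
            return alias
    raise ValueError(f"no shard configured for customer id {customer_id}")


class CustomerShardRouter:
    """Django database router (to be listed in settings.DATABASE_ROUTERS)
    that picks the database from the instance passed in **hints."""

    def _route(self, model, **hints):
        if model._meta.app_label != SHARDED_APP:
            return None  # not a sharded model: let other routers decide
        instance = hints.get("instance")
        if instance is None:
            # No hint: fail loudly so the calling code can be fixed.
            raise LookupError(f"no instance hint for {model.__name__}")
        if model.__name__ == "Customer":
            return "DatabaseA"  # every Customer is saved in DatabaseA first
        return shard_for(instance.customer_id)

    def db_for_read(self, model, **hints):
        return self._route(model, **hints)

    def db_for_write(self, model, **hints):
        return self._route(model, **hints)
```

A production router would usually also implement allow_relation and allow_migrate, which Django consults when relating objects and running migrations; they are left out here.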
What will really bake your cookie
If for most queries you have a known Customer, this is OK. But think about queries like TableA.objects.filter(customer__name__startswith='foo'), which can match customers on any shard, so there is no single instance to route on.

inserting into a view in SQL Server

I have SQL Server as the backend and use MS Access as the frontend.
I have two tables (persons and managers); managers is derived from persons (a 1:1 relation), so I created a view managersFull, which is basically:
SELECT *
FROM [managers] AS [m]
INNER JOIN [persons] AS [p]
ON [m].[id] = [p].[id]
id in persons is an auto-incrementing primary key; id in managers is the primary key and also a foreign key referencing persons.id.
Now I want to be able to insert a new record with a form in MS Access, but I can't get it to work: no error message, no status line, nothing. The new rows aren't inserted, and I have to press Escape to cancel my changes to get back to Design View in MS Access.
I'm talking about a managers form, and I want to be able to enter manager AND person information at the same time in a single form.
My question is: is what I want to do possible? If not, is there a "simple" workaround using AFTER INSERT triggers or a few lines of VBA code?
The problem is that your view spans several tables. When a view accesses multiple tables, an INSERT or UPDATE through it can affect only one of them.
Please also check MSDN for more detailed information on the restrictions and on proper strategies for updating views.
Assuming ODBC, some things to consider:
make sure you have a timestamp field in the person table, and that it is returned in your managers view. You also probably need the real PK of the person table in the manager view (I'm assuming your view takes the FK used for the self-join and aliases it as the ID field -- I wouldn't do that myself, as it is confusing. Instead, I'd use the real foreign key name in the managers view, and let the PK stand on its own with its real name).
try the Jet/ACE-specific DISTINCTROW predicate in your recordsource. With Jet/ACE back ends, this often makes it possible to insert into both tables when it's otherwise impossible. I don't know for certain if Jet will be smart enough to tell SQL Server to do the right thing, though.
if neither of those things works, change your form to use a recordsource based on your person table, and use a combo box based on the managers view as the control with which you edit the record to relate the person to a manager.
Ilya Kochetov pointed out that you can only update one table, but the work-around would be to apply the updates to the fields on one table and then the other. This solution assumes that the only access you have to these two tables is through this view and that you are not allowed to create a stored procedure to take care of this.
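The one-table-then-the-other order can be sketched with Python's sqlite3 as a stand-in for SQL Server. The name and department columns are hypothetical, since the question does not list the real ones. On SQL Server the generated id would typically be read with SCOPE_IDENTITY() or an OUTPUT clause, and the same two statements could live in a stored procedure if one is allowed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE persons (
    id INTEGER PRIMARY KEY AUTOINCREMENT,   -- generated by the database
    name TEXT NOT NULL                      -- hypothetical column
);
CREATE TABLE managers (
    id INTEGER PRIMARY KEY REFERENCES persons(id),
    department TEXT                         -- hypothetical column
);
CREATE VIEW managersFull AS
    SELECT p.id, p.name, m.department
    FROM managers AS m INNER JOIN persons AS p ON m.id = p.id;
"""


def insert_manager(conn, name, department):
    """Insert into persons first, read back the generated id, then insert
    into managers with that id. One transaction covers both inserts."""
    with conn:  # if the second insert fails, the first is rolled back too
        person_id = conn.execute(
            "INSERT INTO persons (name) VALUES (?)", (name,)
        ).lastrowid
        conn.execute(
            "INSERT INTO managers (id, department) VALUES (?, ?)",
            (person_id, department),
        )
    return person_id
```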
To model and maintain two related tables in Access, you don't use a query or view that joins both tables. Instead, you use a main form and drop in a sub-form based on the child table. If the Link Master Fields and Link Child Fields settings of the sub-form are set correctly, you don't need to write any code: Access will insert the person's id into the link field.
So don't use a joined table here. Simply use a form + sub-form setup, and you'll be able to edit and maintain both the main data and the data in the related child table.
This means you base the form on the table, not on a view, and you base the sub-form on the child table.

What would you do to avoid conflicting data in this database schema?

I'm working on a multi-user, database-driven website with SQL Server 2008 / LinqToSQL / custom-made repositories as the DAL. I have run across a normalization problem that, if exploited, can lead to an inconsistent database state, and I am wondering how to deal with it.
The problem: Several different companies have access to my website. They should be able to track their Projects and Clients at my website. Some (but not all) of the projects should be assignable to clients.
This results in the following database schema:
Companies:
ID
CompanyName
Clients:
ID
CompanyID (not nullable)
FirstName
LastName
Projects:
ID
CompanyID (not nullable)
ClientID (nullable)
ProjectName
This leads to the following relationships:
Companies-Clients (1:n)
Companies-Projects (1:n)
Clients-Projects (1:n)
Now, if a user is malicious, he might for example insert a Project with his own CompanyID, but with a ClientID belonging to another user, leaving the database in an inconsistent state.
The problem occurs in a similar fashion all over my database schema, so I'd like to solve it in a generic way if at all possible. I had the following two ideas:
Check in the DAL for database writes that might lead to inconsistencies. This would be generic, but it requires some additional database queries before update and create queries are performed, so it will cost some performance.
Create an additional table for the Clients-Projects relationship and make sure the relationships created this way are consistent. This also requires some additional select queries, but far fewer than the first option. On the other hand, it is not generic, so it is easier to miss something in the long run, especially when adding more tables/dependencies to the database.
What would you do? Is there any better solution I missed?
Edit: You might wonder why the Projects table has a CompanyID. This is because I want users to be able to add projects with and without clients. I need to keep track of which company (and therefore which website user) a clientless project belongs to, which is why a project needs a CompanyID.
I'd go with the latter: one or more tables that define the allowable relationships between entities.
Note, there's no circularity in the references you have, so the title is misleading.
What you have is the possibility of conflicting data, that's different.
Why do you have "CompanyID" in the project table? The ID of the company involved is implicitly given by the client you link to. You don't need it.
Remove that column and you've removed your problem.
Additionally, what is the purpose of the "name" column in the client table? Can you have a client with one name that differs from the name of the company?
Or is "client" the person at that company?
Edit: OK, with the clarification about projects without clients, I would separate out the references, but you're not going to get rid of the problem you're describing without constraints that prevent multiple references from being made.
A simple constraint for your existing tables would be that the CompanyID and ClientID fields of a project row cannot both be non-null at the same time.
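A minimal sketch of that constraint in Python's sqlite3, assuming CompanyID on projects is made nullable so that a project references either its company (clientless) or a client. The project_company helper is hypothetical and shows how the owning company is then recovered through the client.

```python
import sqlite3

SCHEMA = """
CREATE TABLE companies (id INTEGER PRIMARY KEY, companyname TEXT);
CREATE TABLE clients (
    id INTEGER PRIMARY KEY,
    companyid INTEGER NOT NULL REFERENCES companies(id),
    firstname TEXT,
    lastname TEXT
);
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    companyid INTEGER REFERENCES companies(id),   -- now nullable
    clientid INTEGER REFERENCES clients(id),
    projectname TEXT NOT NULL,
    -- never both set: a project hangs off its company OR off a client
    CHECK (companyid IS NULL OR clientid IS NULL)
);
"""


def project_company(conn, project_id):
    """The owning company: stored directly for clientless projects,
    otherwise derived through the client."""
    row = conn.execute(
        """SELECT COALESCE(p.companyid, c.companyid)
           FROM projects AS p LEFT JOIN clients AS c ON c.id = p.clientid
           WHERE p.id = ?""",
        (project_id,),
    ).fetchone()
    return None if row is None else row[0]
```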
If you want to keep using the table like this and avoid all the new queries, just put triggers on the table: when a user tries to insert a row with wrong data, the trigger will stop him.
Best Regards,
Iordan
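A sketch of the trigger idea against the original schema, using SQLite triggers through Python's sqlite3 (SQL Server's trigger syntax differs; for example, it exposes the new rows through the inserted pseudo-table). Column names are lower-cased versions of the question's; the trigger names are made up. Note that this only guards projects: changing a client's companyid afterwards would need a similar trigger on clients.

```python
import sqlite3

SCHEMA = """
CREATE TABLE clients (id INTEGER PRIMARY KEY, companyid INTEGER NOT NULL);
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    companyid INTEGER NOT NULL,
    clientid INTEGER,
    projectname TEXT NOT NULL
);
-- Reject a project whose client belongs to a different company.
CREATE TRIGGER projects_client_company_ins
BEFORE INSERT ON projects
WHEN NEW.clientid IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'client belongs to another company')
    WHERE (SELECT companyid FROM clients WHERE id = NEW.clientid)
          IS NOT NEW.companyid;
END;
-- Same check when either reference is changed later.
CREATE TRIGGER projects_client_company_upd
BEFORE UPDATE OF companyid, clientid ON projects
WHEN NEW.clientid IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'client belongs to another company')
    WHERE (SELECT companyid FROM clients WHERE id = NEW.clientid)
          IS NOT NEW.companyid;
END;
"""


def connect():
    """Open an in-memory database with the schema and triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```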
My first thought would be to create a special client record for each company with the name "No client". Then eliminate CompanyId from the Project table, and if a project has no client, use the "No client" record rather than a "normal" client record. If processing of such no-client records is special, add a flag to the no-client record to explicitly identify it. (I'd hate to rely on the name being "No client" or something like that; too fuzzy.)
Then there would be no way to store inconsistent data so the problem would go away.
In the end I implemented a completely generic solution which solves my problem without much runtime overhead and without requiring any changes to the database. I'll describe it here in case someone else has the same problem.
First off, the approach only works because the only table that other tables are referencing through multiple paths is the Companies table. Since this is the case in my database, I only have to check whether all n:1 referenced entities of each entity that is to be created / updated / deleted are referencing the same company (or no company at all).
I am enforcing this by deriving all of my Linq entities from one of the following types:
SingleReferenceEntityBase - The norm. Only checks (via reflection) if there really is only one reference (no matter if transitive or intransitive) to the Companies table. If this is the case, the references to the companies table cannot become inconsistent.
MultiReferenceEntityBase - For special cases such as the Projects table above. Asks all directly referenced entities what company ID they are referencing. Raises an exception if there is an inconsistency. This costs me a few select queries per CRUD operation, but since MultiReferenceEntities are much rarer than SingleReferenceEntities, this is negligible.
Both of these types implement a CheckReferences method, and I call it whenever a LINQ entity is written to the database by partially implementing the OnValidate(System.Data.Linq.ChangeAction action) method, which is automatically generated for all LINQ entities.
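A rough Python analogue of the MultiReferenceEntityBase check described above. The original is C# on LINQ to SQL and runs from OnValidate; here check_references is called explicitly, and the names Entity, reference_attrs and ReferenceConflictError are hypothetical. Like the described class, it only inspects directly referenced entities and allows references with no company at all.

```python
class ReferenceConflictError(Exception):
    """Raised when an entity's references point at different companies."""


class Entity:
    """Minimal stand-in for a LINQ entity that knows its company id."""
    company_id = None


class Client(Entity):
    def __init__(self, id, company_id):
        self.id = id
        self.company_id = company_id


class MultiReferenceEntityBase(Entity):
    """Entities that reach Companies through more than one path."""
    reference_attrs = ()  # names of attributes holding n:1 referenced entities

    def check_references(self):
        company_ids = {self.company_id}
        for attr in self.reference_attrs:
            ref = getattr(self, attr)
            if ref is not None:  # an unset reference (e.g. no client) is fine
                company_ids.add(ref.company_id)
        company_ids.discard(None)  # "or no company at all" is allowed
        if len(company_ids) > 1:
            raise ReferenceConflictError(
                f"{type(self).__name__} references companies {sorted(company_ids)}"
            )


class Project(MultiReferenceEntityBase):
    reference_attrs = ("client",)

    def __init__(self, id, company_id, client=None):
        self.id = id
        self.company_id = company_id
        self.client = client
```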
