I have a JPA entity with the following definition. It's marked as read-only and the cache is set to expire at 3 AM.
Please consider this scenario:
1. The table has a record with deptId 100 and department name "SALES".
2. Fetch the record programmatically using
   entityManager.createNamedQuery("Department.findById").setParameter("depId", 100).getResultList();
   The returned record contains the department name "SALES".
3. Modify the record directly in the backend database with an UPDATE SQL query, changing the department name from "SALES" to "REGIONAL_SALES".
4. Programmatically fetch the record with deptId 100 again, using the same named query as in step 2.
   The returned record contains the updated department name.
How does JPA know the value in the backend has been updated? Instead of fetching from the cache, it fetches the updated value from the DB. Or did the cache get updated after the DB change? I have set the cache to expire only at 3 AM (24-hour format).
Please help me understand.
@NamedQuery(name="Department.findById", query = "SELECT d FROM Department d WHERE d.deptNo = :depId")
@Entity
@Table(name="DEPARTMENT")
@Cache(expiryTimeOfDay=@TimeOfDay(hour=3))
@ReadOnly
public class Department {
    @EmbeddedId
    int deptNo;
    @Column
    String deptName;
    // constructor, getters and setters
}
JPA providers have different caching policies. You didn't state which provider you use, but I'm assuming you are using EclipseLink. Quote from their wiki:
By default in EclipseLink all queries access the database, unless they are by Id, or by cache indexed fields.
I recommend you check
http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching/Query_Cache
http://www.eclipse.org/eclipselink/documentation/2.6/concepts/cache003.htm
From my point of view, your query should hit the cache first.
It's strange, though, that you annotate an int-typed field with @EmbeddedId. Does that actually work for you? It should cause the following exception:
The Entity class (...) has an embedded attribute [id] of type [class int] which is NOT an Embeddable class. Probable reason: missing @Embeddable or missing <embeddable> in orm.xml if metadata-complete
A query will go to the database (barring any implementation-specific option to just use the cache). It isn't a case of the implementation "knowing" the datastore is updated ... it goes there anyway. It's not the same for an EM.find() call, though.
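To make that distinction concrete, here is a minimal sketch, assuming the Department entity from the question (with a working @Id mapping and a getDeptName() getter), a hypothetical persistence unit "demo-pu", and EclipseLink as the provider:

    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;
    import org.eclipse.persistence.config.HintValues;
    import org.eclipse.persistence.config.QueryHints;

    public class CacheDemo {
        public static void main(String[] args) {
            EntityManagerFactory emf =
                    Persistence.createEntityManagerFactory("demo-pu");

            // find() by primary key is resolved against the shared (L2) cache
            // first, so it may keep returning the stale "SALES" until 3 AM.
            EntityManager em1 = emf.createEntityManager();
            System.out.println(em1.find(Department.class, 100).getDeptName());
            em1.close();

            // A JPQL query executes SQL against the database by default, which
            // is why the second fetch in the question saw "REGIONAL_SALES".
            EntityManager em2 = emf.createEntityManager();
            Department d = em2
                    .createNamedQuery("Department.findById", Department.class)
                    .setParameter("depId", 100)
                    .getSingleResult();
            System.out.println(d.getDeptName());

            // EclipseLink can also serve query results from a query cache,
            // if you opt in with a hint:
            em2.createNamedQuery("Department.findById", Department.class)
                    .setParameter("depId", 100)
                    .setHint(QueryHints.QUERY_RESULTS_CACHE, HintValues.TRUE)
                    .getResultList();
            em2.close();
            emf.close();
        }
    }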
I am creating a web application with Spring Boot which uses Spring Data JPA for database access. I have created a repository interface that extends JpaRepository as follows:

    public interface MyRepository extends JpaRepository<MyClass, Integer> {
    }

I am invoking its save method from my controller as follows:

    myRepo.save(myclassList); // myclassList is a List<MyClass>

In the database table corresponding to MyClass there is a unique constraint on one of the columns, so an exception is thrown if the constraint is violated. Ideally, though, I would want the save to still work for the records that do not violate the constraint. Sadly, this is not the case: if the unique constraint is violated for even one record, none of the records get inserted into the database. Is there any workaround for this? Or will I need to manually check each record to see if it exists in the DB and only insert the ones that do not exist in the database?
will I need to manually check each record to see if it exists in the DB and only insert the ones that do not exist in the database?
Short answer
Yes.
Longer Version
The exact way to proceed depends on your specific use case. I see the following options:
1. Make sure your data is valid before you save it. This is probably the default approach.
2. Save each entity in a separate transaction; that way the rollback only rolls back the changes to that one entity.
3. Try to save everything in one transaction. If that fails, fall back to 1. or 2. This is a little more coding, but it probably has the best performance if constraint violations are a rare event (see the sketch below).
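Here is a sketch of option 3, using the MyRepository/MyClass names from the question and assuming the violation surfaces as Spring's DataIntegrityViolationException (the exact exception can depend on the JPA provider; saveAll is the current Spring Data method name, older versions used save(Iterable)):

    import java.util.ArrayList;
    import java.util.List;
    import org.springframework.dao.DataIntegrityViolationException;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.PlatformTransactionManager;
    import org.springframework.transaction.support.TransactionTemplate;

    @Service
    public class MyClassBatchSaver {

        private final MyRepository myRepo;
        private final TransactionTemplate txTemplate;

        public MyClassBatchSaver(MyRepository myRepo, PlatformTransactionManager txManager) {
            this.myRepo = myRepo;
            this.txTemplate = new TransactionTemplate(txManager);
        }

        public List<MyClass> saveAllOrSkipDuplicates(List<MyClass> myclassList) {
            try {
                // cheap path: one transaction for the whole list
                return txTemplate.execute(status -> myRepo.saveAll(myclassList));
            } catch (DataIntegrityViolationException e) {
                // fallback: one transaction per record, so only the offenders are lost
                List<MyClass> saved = new ArrayList<>();
                for (MyClass item : myclassList) {
                    try {
                        saved.add(txTemplate.execute(status -> myRepo.save(item)));
                    } catch (DataIntegrityViolationException duplicate) {
                        // this record violates the unique constraint; skip it
                    }
                }
                return saved;
            }
        }
    }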
I have a Person database in SQL Server with tables like address, license, relatives, etc., about 20 of them. All the tables have an id parameter that is unique per person. There are millions of records in these tables. I need to combine these records for a person using their common id parameter and convert them to a JSON table file with some column-name changes. The JSON file then gets pushed to Kafka through a producer. If I can get the example with the Kafka producer as the item writer, fine, but the real problem is understanding the strategy and the specifics of how to utilize a Spring Batch item reader, processor, and item writer to create the composite JSON file. This is my first Spring Batch application, so I am relatively new to this.
I am hoping for suggestions on an implementation strategy using a composite reader or processor that uses the person id as the cursor, queries each table using that id, converts the resulting records to JSON, and aggregates them into a composite, relational JSON file with root table PersonData that feeds the Kafka cluster.
Basically I have one data source, the same database, for the reader. I plan to use the Person table to fetch the id and the other records unique to the person, use the id in the WHERE clause for the 19 other tables, convert each result set to JSON, then compose the JSON object at the end and write it to Kafka.
We had such a requirement in a project and solved it with the following approach.
In a split flow that ran in parallel, we had a step for every table that loaded the table's data into a file, sorted by the common id (this is optional, but it is easier for testing if you have the data in files).
Then we implemented our own "MergeReader".
This MergeReader had a FlatFileItemReader for every file/table (let's call them dataReaders). All these FlatFileItemReaders were wrapped with a SingleItemPeekableItemReader.
The logic for the read method of the MergeReader is as follows:
    public MyContainerPerId read() throws Exception {
        // you need a container to store the items that belong together
        MyContainerPerId container = new MyContainerPerId();
        // peek through all "dataReaders" to find the lowest actual key
        int lowestId = searchLowestKey();
        for (SingleItemPeekableItemReader<MyItem> dataReader : dataReaders) {
            // I assume that more than one entry in a table can belong to
            // the same person id
            while (dataReader.peek() != null
                    && dataReader.peek().getId() == lowestId) {
                container.add(dataReader.read());
            }
        }
        // the container now holds all entries from all tables
        // belonging to the same person id
        return container;
    }
If you need restart capability, you have to implement ItemStream in such a way that it keeps track of the current read position for every dataReader, as sketched below.
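A rough sketch of that wiring, assuming dataReaders holds the SingleItemPeekableItemReader wrappers from above and a hypothetical MyItem type: both the wrappers and the underlying FlatFileItemReaders implement ItemStream and persist their own position (and any peeked item) in the ExecutionContext, so it is mostly a matter of propagating the callbacks and giving every delegate a unique name via setName().

    import java.util.List;
    import org.springframework.batch.item.ExecutionContext;
    import org.springframework.batch.item.ItemStream;
    import org.springframework.batch.item.ItemStreamException;
    import org.springframework.batch.item.support.SingleItemPeekableItemReader;

    // Only the restart-related methods are shown; read() is as above.
    public class MergeReader implements ItemStream {

        private final List<SingleItemPeekableItemReader<MyItem>> dataReaders;

        public MergeReader(List<SingleItemPeekableItemReader<MyItem>> dataReaders) {
            this.dataReaders = dataReaders;
        }

        @Override
        public void open(ExecutionContext executionContext) throws ItemStreamException {
            for (ItemStream reader : dataReaders) {
                reader.open(executionContext); // restores each position on restart
            }
        }

        @Override
        public void update(ExecutionContext executionContext) throws ItemStreamException {
            for (ItemStream reader : dataReaders) {
                reader.update(executionContext); // persists each current position
            }
        }

        @Override
        public void close() throws ItemStreamException {
            for (ItemStream reader : dataReaders) {
                reader.close();
            }
        }
    }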
I used the Driving Query Based ItemReaders usage pattern described here to solve this issue.
Reader: just a default implementation of JdbcCursorItemReader with SQL to fetch the unique relational id (e.g. select id from person)
Processor: uses this Long id as the input; a DAO implemented by me using Spring's JdbcTemplate fetches the data through queries against each of the tables for that specific id (e.g. select * from license where id = ?), maps the results in list form onto a Person POJO, and then converts it to a JSON object (using Jackson) and then to a string (see the sketch after this list)
Writer: either writes the file out with the JSON string, or publishes the JSON string to a topic in the Kafka case
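To make the Processor concrete, here is a hedged sketch; Person and its setters are a hypothetical aggregate POJO standing in for the roughly 20 tables, and Jackson's ObjectMapper does the JSON conversion:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Turns the driving id emitted by the reader into a JSON string.
    public class PersonToJsonProcessor implements ItemProcessor<Long, String> {

        private final JdbcTemplate jdbcTemplate;
        private final ObjectMapper objectMapper = new ObjectMapper();

        public PersonToJsonProcessor(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        @Override
        public String process(Long id) throws Exception {
            Person person = new Person();
            person.setId(id);
            // one query per dependent table, keyed by the common person id
            person.setAddresses(jdbcTemplate.queryForList(
                    "select * from address where id = ?", id));
            person.setLicenses(jdbcTemplate.queryForList(
                    "select * from license where id = ?", id));
            // ...and so on for the remaining tables...
            return objectMapper.writeValueAsString(person);
        }
    }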
We went through a similar exercise migrating 100mn+ rows from multiple tables as JSON so that we could post it to a message bus.
The idea is to create a view, de-normalize the data, and read from that view using a JdbcPagingItemReader. Reading from one source has less overhead.
When you de-normalize the data, make sure you do not get multiple rows for the master table.
Example (SQL Server):

    create or alter view viewName as
    select master.col1, master.col2,
           (select dep1.col1, dep1.col2
            from dependent1 dep1
            where dep1.col3 = master.col3
            for json path) as dep1
    from master master;
The above will give you the dependent table data as a JSON string, with one row per master table row. Once you retrieve the data, you can use GSON or Jackson to convert it to a POJO.
We tried to avoid JdbcCursorItemReader, as it pulls all the data into memory and reads it one by one, and it does not support pagination.
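For completeness, a hedged sketch of reading that view with a JdbcPagingItemReader (viewName, col1 and dep1 come from the SQL above; the builder API assumes Spring Batch 4+, and the sort key must be unique for paging to work):

    import java.util.HashMap;
    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.batch.item.database.JdbcPagingItemReader;
    import org.springframework.batch.item.database.Order;
    import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
    import org.springframework.jdbc.core.ColumnMapRowMapper;

    public class ViewReaderConfig {

        // Pages through the de-normalized view; the dep1 column arrives as a
        // JSON string that Jackson or GSON can turn into a POJO downstream.
        public JdbcPagingItemReader<Map<String, Object>> viewReader(DataSource dataSource) {
            Map<String, Order> sortKeys = new HashMap<>();
            sortKeys.put("col1", Order.ASCENDING);

            return new JdbcPagingItemReaderBuilder<Map<String, Object>>()
                    .name("viewReader")
                    .dataSource(dataSource)
                    .selectClause("select col1, col2, dep1")
                    .fromClause("from viewName")
                    .sortKeys(sortKeys)
                    .pageSize(1000)
                    .rowMapper(new ColumnMapRowMapper())
                    .build();
        }
    }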
On every request to my SessionBean I need to retrieve the last added instance of a JPA entity whose PK is declared as @Id @GeneratedValue(strategy=GenerationType.AUTO) Long id.
My current approach is to add ORDER BY e.id DESC to the query. Unfortunately, I'm not sure whether generated ids are strictly increasing for subsequently persisted entities, and I can't seem to find any documentation on that topic. Can anyone help me with that?
JPA does not specify the order of id generation, so the provider is free to issue non-sequential ids.
If you want to rely on the entity insertion order, consider adding a temporal createdAt or modifiedAt field to your entity. This approach is used by some persistence frameworks, e.g. ActiveRecord.
You can leave the generation of this value to the provider by using a callback in a base entity class:
    @PrePersist
    void makeCreationTimestamp() {
        createdAt = System.currentTimeMillis();
    }
In my database I have a Customer table that all other tables are foreign keyed on:
    class Customer(models.Model):
        ...

    class TableA(models.Model):
        Customer = models.ForeignKey(Customer)
        ...

    class TableB(models.Model):
        Customer = models.ForeignKey(Customer)
        ...
I'm trying to implement a database router that determines the database to connect to based on the primary key of the Customer table. For instance, ids in the range 1 - 100 will connect to Database A, ids in the range 101 - 200 will connect to Database B.
I've read through the Django documentation on routers, but I'm unsure whether what I'm asking is possible. Specifically, the methods db_for_read(model, **hints) and db_for_write(model, **hints) work on the type of the object. This is useless for me, as I need routing to be based on the contents of the instance of the object. The documentation further states that the only **hints provided at this moment are an instance object where applicable, and that in some cases no instance is provided at all. This doesn't inspire me with confidence, as it does not explicitly state the cases when no instance is provided.
I'm essentially attempting to implement application-level sharding of the database. Is this possible in Django?
Solve the chicken and egg
You'll have to solve the chicken-and-egg problem when saving a new Customer: you have to save to get an id, but you have to know the id to know where to save.
You can solve that by saving every Customer in DatabaseA first, then checking the id and saving it in the target db too. See Django multidb: write to multiple databases. If you do this consistently, you won't run into these problems. But make sure to pay attention to deleting Customers.
Then route using **hints
The routing problem that's left is pretty straightforward if an instance is in the hints: either it is a Customer and you'll return 'DatabaseA', or it has a customer and you'll decide based on its customer_id or customer.id.
Try and remember, there is no spoon.
When there is no instance in the hints, but it is a model from your app, raise an error so you can change the code that created the queryset. You should always provide hints when they aren't added automatically.
What will really bake your cookie
If you have a known Customer for most queries, this is OK. But think about queries like TableA.objects.filter(customer__name__startswith='foo').
I use Zend Framework 2 to build my web application. I implemented my database table models following this tutorial: http://framework.zend.com/manual/2.1/en/user-guide/database-and-models.html
I have a many-to-many relationship between two models in my database. To get data from them, I googled and found this link: http://mattmccormick.ca/2010/04/24/how-to-easily-create-models-and-table-relationships-in-zend-framework/
The problem is that all the table models in that example extend Zend_Db_Table_Abstract, so I don't know how to get data from my models.
I have a table containing votings, and every voting has a unique hash id. Every voting also has tags. Therefore I defined a tags table with all the available tags, and a voting_tag_map table where all the many-to-many relationships are mapped.
This is what I have tried so far, from my VotingTable class:
    public function getTagsByVoting($votingHash) {
        $select = $this->tableGateway->getSql()->select();
        $select->from(array('v' => 'voting'))
               ->join('voting_tag_map', 'v.voting_id = voting_tag_map.voting_id')
               ->join('tags', 'voting_tag_map.tag_id = tags.tag_id');
        $resultSet = $this->tableGateway->selectWith($select);
        return $resultSet;
    }
It then says:
Since this object was created with a table and/or schema in the constructor, it is read only.
That's because of the from() method. If I delete the from() call, it says:
Statement could not be executed
Can anyone help me, please?
Since this object was created with a table and/or schema in the constructor, it is read only.
This error occurs because you are trying to set the table name in the from clause, but it has already been set in the constructor of the TableGateway, and you can't change it once set.
If you really need to do this, you can extend AbstractTableGateway yourself; then you won't have to pass a string table name to the constructor. But you don't really need to use an alias on your main table...
The SQL error you get when you comment out the from() method will be because your joins reference the voting table by its alias 'v' while you are no longer using that alias; try changing 'v.XXX' to 'voting.XXX'.