I need to establish Oracle database connectivity in Drools to fetch some data as and when required while executing the rules. How do I go about that?
You shouldn't do this. Instead, you should query your data out of the database first, then pass it into the rules as facts in working memory.
I tried to write a detailed answer about all the reasons you shouldn't do this, but it turns out that StackOverflow has a character limit. So I'm going to give you the high level reasons.
Latency
Data consistency
Lack of DB access hardening
Extreme design constraints for rules
High maintenance burden
Potential security issues
Going in order ...
Latency. Database queries aren't free. Regardless of how good your connection management is, you will incur overhead every time you make a database call. If you have a solid understanding of the Drools execution lifecycle and how it executes rules, and you design your rules to explicitly query the database only in ways that minimize the number and size of those calls, you could consider this an acceptable risk. A good caching layer wouldn't be amiss. Note that having to design your rules this way is not trivial, and you'll incur perpetual overhead in making sure all of your rules remain compliant.
(Hint: this means you must never ever call the database from the 'when' clause.)
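To give a sense of what a caching layer in front of the data access object might look like, here is a hypothetical sketch in plain Java. DataService and Student are the types used in the example rules further down; the String id type and the getId() accessor are assumptions.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical memoizing wrapper: repeated lookups for the same student during a
// single rule session hit the in-memory cache instead of the database.
public class CachingDataService {

    private final DataService delegate; // the real data access layer
    private final Map<String, Student> studentCache = new ConcurrentHashMap<>();

    public CachingDataService(DataService delegate) {
        this.delegate = delegate;
    }

    public Student getStudentById(String id) {
        return studentCache.computeIfAbsent(id, delegate::getStudentById);
    }

    public void save(Student student) {
        delegate.save(student);
        studentCache.put(student.getId(), student); // keep the cache consistent with what was written
    }
}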
Data consistency. A database is a shared resource. If you make the same query in two different 'when' clauses, there is no guarantee that you'll get back the same result. Again, you could potentially work around this with a deep understanding of how Drools evaluates and executes rules, and by designing your rules appropriately. But the same issues from 'latency' will affect you here -- namely the burden of perpetual maintenance. Further, the rule design restrictions -- which are quite strict -- will likely make your other rules and use cases less efficient as well, because of the contortions you need to go through to keep your database-dependent rules compatible.
Lack of hardening. The Java code you can write in a DRL function is not the same as the Java code you can write in a Java class. DRL files are parsed as strings and then interpreted and then compiled; many language features are simply not available. (Some examples: try-with-resources, annotations, etc.) This makes properly hardening your database access extremely complicated and in some cases impossible. Libraries which rely on annotations like Spring Data are not available to you for use in your DRL functions. You will need to manage your connection pooling, transaction management, connection management (close everything!), error handling, and so on manually using a subset of the Java language that is roughly equivalent to Java 5.
This is, of course, specific to writing your code to access the database as a function in your DRL. If you instead implement your database access in a service which acts like a database access layer, you can leverage the full JDK and its features and functionality in that external service which you then pass into the rules as an input. But in terms of DRL functions, this point remains a major concern.
Rule design constraints. As I mentioned previously, you need an in-depth understanding of how Drools evaluates and executes rules in order to write effective rules that interact with the database. If you're not aware that all left hand sides ("when" clauses) are evaluated first, then the matches are ordered by salience, and then the right hand sides ("then" clauses) are executed sequentially ... well, you absolutely should not be trying to do this from the rules. Not only do you, as the initial implementor, need to understand the rules execution lifecycle, but everyone who comes after you and maintains your rules needs to understand it too, and keep implementing the rules within these restrictions. This is your high maintenance burden.
As an example, here are two rules. Let's assume that "DataService" is a properly implemented data access layer with all the necessary connection and transaction management, and it is passed into working memory as a fact.
rule "Record Student Tardiness"
when
$svc: DataService() // data access layer
Tardy( $id: studentId )
$student: Student($tardy: tardyCount) from $svc.getStudentById($id)
then
$student.setTardyCount($tardy + 1)
$svc.save($student)
end
rule "Issue Demerit for Excessive Tardiness"
when
$svc: DataService() // data access layer
Tardy( $id: studentId )
$student: Student(tardyCount > 3) from $svc.getStudentById($id)
then
AdminUtils.issueDemerit($student, "excessive tardiness")
end
If you understand how Drools executes rules, you'll quickly realize the problems with these rules. Namely:
we call getStudentById twice (latency, consistency)
the changes to the student's tardy count are not visible to the second rule
So if our student, Alice, has 3 tardies recorded in the database, and we pass in a new Tardy instance for her, the first rule will hit and her tardy count will increment and be saved (Alice will have 4 tardies in the database.) But the second rule will not hit! Because at the time the matches are calculated, Alice only had 3 tardies, and the "issue demerit" rule only triggers for more than 3. So while she has 4 tardies now, she didn't then.
The solution to the second problem is, of course, to call update to let Drools know to reevaluate all matches with the new data in working memory. This of course exacerbates the first issue -- now we'll be calling getStudentById four times!
Finally, the last problem is potential security issues. This really depends on how you implement your queries, but you'll need to be doubly sure you're not accidentally exposing any connection configuration (URL, credentials) in your DRLs, and that you've properly sanitized all query inputs to protect yourself against SQL injection.
The right way to do this, of course, is not to do it at all. Call the database first, then pass it to your rules.
As an example, let's say we have a set of rules which is designed to determine if a customer purchase is "suspicious" by comparing it to trends from the previous 3 months' worth of purchases.
// Assume this class serves as our data access layer and does proper connection,
// transaction management. It might be something like a Spring Data JPA repository,
// or something from another library; the specifics are not relevant.
private PurchaseService purchaseService;

public boolean isSuspiciousPurchase(Purchase purchase) {
    List<Purchase> previous = purchaseService.getPurchasesForCustomerAfterDate(
        purchase.getCustomerId(),
        LocalDate.now().minusMonths(3));

    KieBase kBase = ...;
    KieSession session = kBase.newKieSession();
    session.insert(purchase);
    session.insert(previous); // the List itself becomes a fact the rules can reason over
    // insert other facts as needed
    session.fireAllRules();
    // ...
}
As you can see, we call the database and pass the result into working memory. Then we can write the rules such that they do work against that existing list, without needing to interact with the database at all.
If our use case requires modifying the database -- e.g. saving updates -- we can pass those commands back to the caller and they can be invoked after fireAllRules completes. Not only does that keep us from having to interact with the database in the rules, it also gives us better control over our transaction management (you can probably group the updates into a single transaction, even if they originally came from multiple rules). And since we don't need to understand anything about how Drools evaluates and executes rules, it will be a little more robust in case a rule with a database "update" is triggered twice.
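To make that concrete, here is a minimal sketch of that pattern, building on the purchase example above. The global name "flagged", the checker class, and whatever the caller does with the returned list are assumptions for illustration; the DRL would declare the global (global java.util.List flagged) and rule consequences would only call flagged.add($p), never the database.

import java.util.ArrayList;
import java.util.List;

import org.kie.api.KieBase;
import org.kie.api.runtime.KieSession;

public class SuspiciousPurchaseChecker {

    // The rules only *record* what they want changed (via the "flagged" global);
    // the caller applies those changes afterwards, ideally in a single transaction.
    public List<Purchase> evaluate(KieBase kBase, Purchase purchase, List<Purchase> previous) {
        List<Purchase> flagged = new ArrayList<>();

        KieSession session = kBase.newKieSession();
        try {
            session.setGlobal("flagged", flagged); // rule consequences call flagged.add($p)
            session.insert(purchase);
            previous.forEach(session::insert);     // each prior purchase becomes its own fact
            session.fireAllRules();
        } finally {
            session.dispose();
        }
        return flagged; // the caller persists these after the session ends
    }
}

The caller can then wrap whatever persistence calls it needs to make for the returned purchases in one transaction, entirely outside the rules.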
You can use a function like the one below to get details from the DB. Here I have written the function in the DRL file, but it is recommended to put such code in a Java class and call the specific method from the DRL file.
function String ConnectDB(String connectionClass, String url, String user, String password) {
    try {
        Class.forName(connectionClass);
        java.sql.Connection con = java.sql.DriverManager.getConnection(url, user, password);
        java.sql.Statement st = con.createStatement();
        java.sql.ResultSet rs = st.executeQuery("select * from Employee where employee_id=199");
        rs.next();
        String name = rs.getString("employee_name");
        con.close();
        return name;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
rule "DBConnection"
when
person:PersonPojo(name == ConnectDB("com.mysql.jdbc.Driver","jdbc:mysql://localhost:3306/root","root","redhat1!"))
.. ..
then
. . ..
end
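For illustration, the Java-class variant recommended above might look something like the sketch below (the class name, method name and parameterized query are assumptions). Because this is an ordinary Java class, try-with-resources and a PreparedStatement are available, unlike in a DRL function.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical helper class: keeps the JDBC code out of the DRL entirely.
public class EmployeeLookup {

    public static String findEmployeeName(String url, String user, String password, int employeeId) {
        String sql = "select employee_name from Employee where employee_id = ?";
        try (Connection con = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, employeeId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("employee_name") : null;
            }
        } catch (Exception e) {
            throw new RuntimeException("Employee lookup failed", e);
        }
    }
}

The DRL would then import this class and call EmployeeLookup.findEmployeeName(...) where the example above calls ConnectDB(...).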
Related
When I want to soft delete resources as a policy of my company I can do it in one of two places.
I can do it in my database with some "instead of DELETE" trigger. Like so:
-- The function must exist before the trigger that references it.
CREATE FUNCTION resource_soft_delete() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    UPDATE resource SET deleted_at = now() WHERE id = OLD.id;
    RETURN NULL;
END;
$$;

CREATE TRIGGER prevent_resource_delete
BEFORE DELETE ON resource
FOR EACH ROW EXECUTE PROCEDURE resource_soft_delete();
That's how pretty much every article about soft deletes suggests doing it -- other than articles written specifically by an ORM author, because they have their own in-house solution.
I like this approach. The logic in my APIs looks like I am just deleting the resource.
Resource.query().deleteById(id); // Using a query builder
db.query('DELETE FROM resource WHERE id = $1;', [id]); // Using native library
To me it seems more natural and I don't have to worry about other developers accidentally hard deleting stuff. But it can also be confusing to those who don't know what is actually going on. And having any logic in the database means I can have bugs there (soft deleting logic is usually dead simple, but still...), which would be hard to debug. At least compared to those in my APIs.
But also I can instead have the logic in the APIs themselves. Keeping logic next to the other logic. Less elegant but more straightforward. No hidden logic somewhere else. I do lose the protection from people accidentally hard deleting resources.
Resource.query().findById(id).patch({deleted_at: new Date()}); // Using a query builder
db.query('UPDATE resource SET deleted_at = now() WHERE id = $1;', [id]); // Using native library
I am inclined to choose the former option, as I consider the choice of whether to soft delete to be a database matter. The database chooses what to do with deleted data. Deleted data, soft or hard, is in principle not part of the application anymore. The APIs can't retrieve it. It is for me, the developer, to use for analytics, legal reasons, or to manually aid a user who wants to recover something he/she considers lost.
But I don't like the downsides. I just talked to a colleague that was worried because he thought we were actually deleting stuff. Now, that could actually be solved with better onboarding and documentation. But should it be like that?
When to implement soft delete logic in the code over the database? Why does every article I find directly suggest the database without even considering the code? It looks like there is a strong reason I can't find.
In my opinion there isn't any strong reason; it depends on where the architect and developer decide to put the logic. But the following could be the possible reasons behind it:
First, since we are deleting something from the DB, it keeps the logic where it is best suited.
Second, writing the logic in each and every API is redundant; doing it in the DB once, for all tables (or nodes or collections), is less work. :)
I'm new to database applications and I'm trying to use DataMapper to make a Ruby web application.
I stumbled across this piece of code which I don't understand:
transaction do |txn|
link = Link.new(:identifier => custom)
link.url = Url.create(:original => original)
link.save
end
I have a few questions: What exactly are transactions? And why was this preferred instead of just doing:
link = Link.new(:identifier => custom)
link.url = Url.create(:original => original)
link.save
When should I consider using transactions? What are the best use cases? Is there any resource available online where I can read more about such concepts?
Thanks!
A transaction is an indivisible unit of work. The idea comes from the database world and is connected with the problems of data selection and update. Consider the following situation:
user A asks for object O in order to change it.
While A was doing his/her stuff, user B asked for the same object. Object O is currently identical for both users.
Then A writes the update to the database, having changed property O1 of object O. User B doesn't have this change -- his object O is still the same as it was before.
B writes his update to the database, having changed property O2 of object O. The change to O1 is effectively lost.
Basically, it has to do with multi-user access and changes - there are several kinds of problems that arise.
Transactions are also used to couple different operations together into one logical processing unit. For example, you need to delete a User together with all his/her associated Photos.
The topic is really too vast to cover in one post, so I'd recommend reading the following articles: wiki#1 and wiki#2.
A transaction is a series of instructions which, upon execution, are seen as one atomic instruction.
This means that all of the instructions must succeed in order for the transaction to succeed. If even one of them fails, you return to the state you were in before the transaction began. This is good for fault tolerance, for example.
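For illustration only (in Java/JDBC rather than DataMapper, with hypothetical table and column names), this is the same idea the Ruby snippet above expresses: either both rows are written, or neither is.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LinkTransactionSketch {

    public static void saveLinkWithUrl(String jdbcUrl, String identifier, String original) throws SQLException {
        try (Connection con = DriverManager.getConnection(jdbcUrl)) {
            con.setAutoCommit(false); // start the transaction
            try (PreparedStatement insertUrl = con.prepareStatement(
                     "insert into urls (original) values (?)");
                 PreparedStatement insertLink = con.prepareStatement(
                     "insert into links (identifier, original) values (?, ?)")) {

                insertUrl.setString(1, original);
                insertUrl.executeUpdate();

                insertLink.setString(1, identifier);
                insertLink.setString(2, original);
                insertLink.executeUpdate();

                con.commit();   // both rows become visible together
            } catch (SQLException e) {
                con.rollback(); // neither row is kept if anything failed
                throw e;
            }
        }
    }
}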
One other field in which transactions are useful is in concurrent applications. Using a transaction avoids interference by other processes.
Hope this helps.
How do you go about collecting and storing data which was not part of the initial database and software design? For example, if you've come up with a points system, you have to collect the points for every user who has already been registered. For new users that would be easy, because the changes to the business logic will cover the points system ... but what about the old ones?
In general, how does one deal with data, which should have been there from the beginning, but wasn't? Writing manual queries to collect the missing pieces? Using crons?
Well, you are asking for something that is by definition not possible, I think.
deal with data which should have been there from the beginning, but wasn't?
If you are able to deduce the number of points from the existing data in the database, then there is obviously no missing data. Storing the points separately would merely be redundant (still a fine option in case you need it for performance).
For example: stackoverflow rewards number of consecutive visits. Let's say they did not do that from the start. If they were logging date-of-visit already, you can recalc the points. So no missing data.
So if that is not possible, you need another solution: either get data from other sources (parse a webserver log) or get the business to draft some extra business rules for the determination of the default values for the existing users (difficult in this particular example).
Writing manual queries to collect the missing pieces? Using crons?
I would populate it using a conversion script, or even a dedicated conversion application if it is very complex.
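Purely as a hypothetical sketch of such a conversion script (the visits/users tables, the column names, and the one-point-per-visit rule are all invented for illustration), a one-off backfill might look like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BackfillPoints {

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(args[0])) { // JDBC URL passed as an argument
            con.setAutoCommit(false);
            try (PreparedStatement select = con.prepareStatement(
                     "select user_id, count(*) as visit_count from visits group by user_id");
                 PreparedStatement update = con.prepareStatement(
                     "update users set points = ? where id = ?");
                 ResultSet rs = select.executeQuery()) {

                while (rs.next()) {
                    update.setInt(1, rs.getInt("visit_count")); // default rule: one point per logged visit
                    update.setLong(2, rs.getLong("user_id"));
                    update.executeUpdate();
                }
                con.commit(); // apply the whole backfill as one unit
            }
        }
    }
}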
I would like to ask experienced users: do you prefer to use data-aware controls to add, insert, delete and edit data in the DB, or do you favor doing it manually?
I have developed some DB applications in which, for the sake of a "user friendly policy", I ran into a complicated web of table events (afterinsert, afteredit, after... and beforeedit, beforeinsert, before...). After that it was quite nasty work to debug the application.
Aware of this risk, in a later application I tried to avoid this problem, so I paid increased attention to writing code that was clean, readable and comprehensible. Everything seemed all right at the beginning, but as I needed to handle some preprocessing before sending and loading data, etc., I ran into the same problems again, "slowly and inevitably". Sometimes I could not use data-aware controls at all, and what seemed to be a "cool" feature of data-aware controls at the beginning turned into an obstacle in the end. I "had to" write special routines for non-data-aware controls in order to make them behave as if they were data-aware. Then I asked myself: why on earth should I use data-aware controls? Is it better to base an application's architecture on non-data-aware controls? That requires more time to write bug-proof code, of course, but is it worth it? I do not know...
It has happened to me several times, as if jinxed: paradise at the beginning, hell at the end...
I do not know whether I am using the wrong method to write DB programs, or whether there is some standard, common practice for how to proceed. Or is this a common problem for everybody?
Thanks for your advice and experiences.
I've written applications that used data aware components against TTable style components and applications which used non-data aware components.
My preference these days is to use data aware components but with TClientDataSets rather than TTable style components.
Using a TClientDataSet I don't have to make my user interface structure mimic my database structure. It's flexible enough to fill it with the data from several tables and then when you are applying the updates back to the database you can manually add/delete/update records as you see fit.
The secret should be in DataSet parameter automation: you can create a control that glues datasets together in a master-slave way, just by defining the connections between them. Of course, such a control should be fed with form parameters in some other, generalized way. In that case, when the form is called with an entity identifier, all datasets get filled in the proper order, and the provider can update data in the database automatically.
Generally it is better to have DataSets be an exact representation of the tables, with optional calculated fields (fkInternalCalc sometimes works better, as it updates on row change rather than field change), bound to data-aware controls. Data-aware controls are the most optimal approach and less error prone. As with everything, there are exceptions to that.
If you must write too many glue functions, the problem is probably in the design pattern, not in the VCL.
A lot of the time I use data aware controls linked to an in-memory table (kbmMemTable) that is filled from a query.
The benefits I see are:
I have full control over all inserts/updates/posts/edits to the database.
No need to worry about a user leaving a record in update mode (potentially locking other users)
Did I mention full control over all inserts/updates/posts/edits?
Using the in-memory table is as easy as:
dataset.sql.add('select a.field,b.field from a,b');
dataset.open;
inMemoryTable.loadfromdataset(dataset);
inMemoryTable.checkpoint;
And then "resolving" back to the database, you are given access to the original and new data for each field in each record (similar in a way to a trigger) - you can easily transaction and resolve a whole edit back in milliseconds - even if it took the end user 30 mins to fill in the data aware controls.
Have you considered an O/R mapper for Delphi, like tiOPF or hcOPF?
This will separate the business domain logic from the database layer. For big and legacy systems, it is even common to add another layer, the 'Anti Corruption Layer', which protects the model from changes in the database design.
We implement the majority of our business rules in the database, using stored procs.
I can never decide how best to pass data constraint violation errors from the database back to the user interface. The constraints I'm talking about are tied more to business rules than data integrity.
For example, a db error such as "Cannot insert duplicate key row" is the same as the business rule "you can't have more than one Foo with the same name". But we've "implemented" it at the most common sense location: as a unique constraint that throws an exception when the rule is violated.
Other rules, such as "You're only allowed 100 Foos per day", do not cause errors per se, since they're handled gracefully by custom code, such as returning an empty dataset that the application code checks for and passes back to the UI layer.
And therein lies the rub. Our ui code looks like this (this is AJAX.NET webservices code, but any ajax framework will do):
WebService.AddFoo("foo", onComplete, onError); // ajax call to web service
function onComplete(newFooId) {
if(!newFooId) {
alert('You reached your max number of Foos for the day')
return
}
// update ui as normal here
}
function onError(e) {
if(e.get_message().indexOf('duplicate key')) {
alert('A Foo with that name already exists');
return;
}
// REAL error handling code here
}
(As a side note: I notice this is what stackoverflow does when you submit comments too quickly: the server generates a HTTP 500 response and the ui catches it.)
So you see, we are handling business rule violations in two places here, one of which (i.e. the unique constraint error) is being handled as a special case by the code that is supposed to handle real errors (not business rule violations), since .NET propagates Exceptions all the way up to the onError() handler.
This feels wrong. My options I think are:
catch the 'duplicate key violation' exception at the app server level and convert it to whatever it is the ui expects as the "business rule violated" flag,
preempt the error (say, with a "select name from Foo where name = #Name") and return whatever it is the app server expects as the "business rule violated" flag,
in the same ballpark as 2): leverage the unique constraint built into the db layer and blindly insert into Foo, catching any exceptions and convert it to whatever it is the app server expects as the "business rule violated" flag
blindly insert into Foo (like 3) and let that Exception propagate to the ui, plus have the app server raise business rule violations as real Exceptions (as opposed to 1). This way ALL errors are handled in the ui layer's onError() (or similar) code.
What I like about 2) and 3) is that the business rule violations are "thrown" where they are implemented: in the stored proc. What I don't like about 1) and 3) is I think they involve stupid checks like "if error.IndexOf('duplicate key')", just like what is in the ui layer currently.
Edit: I like 4), but most people say to use Exceptions only in exceptional circumstances.
So, how do you people handle propagating business rule violations up to the ui elegantly?
We don't perform our business logic in the database but we do have all of our validation server-side, with low-level DB CRUD operations separated from higher level business logic and controller code.
What we try to do internally is pass around a validation object with functions like Validation.addError(message,[fieldname]). The various application layers append their validation results on this object and then we call Validation.toJson() to produce a result that looks like this:
{
success:false,
general_message:"You have reached your max number of Foos for the day",
errors:{
last_name:"This field is required",
mrn:"Either SSN or MRN must be entered",
zipcode:"996852 is not in Bernalillo county. Only Bernalillo residents are eligible"
}
}
This can easily be processed client side to display messages related to individual fields as well as general messages.
Regarding constraint violations we use #2, i.e. we check for potential violations before insert/update and append the error to the validation object.
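As a rough illustration of that validation object, here is a hypothetical Java sketch (the real implementation could be in any server-side language, and the hand-rolled JSON omits escaping for brevity):

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: layers append errors, and toJson() produces the
// {success, general_message, errors} shape shown above.
public class Validation {

    private String generalMessage;
    private final Map<String, String> errors = new LinkedHashMap<>();

    public void addError(String message) {
        this.generalMessage = message;       // an error not tied to a specific field
    }

    public void addError(String message, String fieldName) {
        this.errors.put(fieldName, message); // a field-level error
    }

    public boolean isValid() {
        return generalMessage == null && errors.isEmpty();
    }

    public String toJson() {
        StringBuilder sb = new StringBuilder("{\"success\":").append(isValid());
        if (generalMessage != null) {
            sb.append(",\"general_message\":\"").append(generalMessage).append("\"");
        }
        sb.append(",\"errors\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : errors.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append("\"").append(e.getKey()).append("\":\"").append(e.getValue()).append("\"");
            first = false;
        }
        return sb.append("}}").toString();
    }
}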
The problem is really one of a limitation in the architecture of your system. By pushing all logic into the database, you need to handle it in two places (as opposed to building a layer of business logic that links the UI with the database). Then again, the minute you have a layer of business logic you lose all the benefits of having logic in stored procs. Not advocating one or the other; they both suck about equally. Or don't suck. Depending on how you look at it.
Where was I?
Right.
I think a combination of 2 and 3 is probably the way to go.
By pre-empting the error you can create a set of procedures that can be called from the UI-facing code to provide detailed implementation-specific feedback to the user. You don't necessarily need to do this with ajax on a field-by-field basis, but you could.
The unique constraints and other rules that are in the database then become the final sanity check for all data; they can assume that data is good before being sent, and throw Exceptions as a matter of course (the premise being that these procedures should always be called with valid data, and therefore invalid data is an Exceptional circumstance).
In defense of #4, SQL Server has a pretty orderly hierarchy of predefined error severity levels. Since, as you point out, it's best to handle errors where the logic is, I'd be inclined to handle this by convention between the SP and the UI abstraction rather than adding a bunch of extra coupling, especially since you can raise errors with both a value and a string.
A stored procedure may use the RAISERROR statement to return error information to the caller. This can be used in a way that permits the user interface to decide how the error will appear, while permitting the stored procedure to provide the details of the error.
RAISERROR can be called with a msg_id, severity and state, and with a set of error arguments. When used this way, a message with the given msg_id must have been entered into the database using the sp_addmessage system stored procedure. This msg_id can be retrieved as the ErrorNumber property in the SqlException that will be raised in the .NET code calling the stored procedure. The user interface can then decide on what sort of message or other indication to display.
The error arguments are substituted into the resulting error message similarly to how the printf statement works in C. However, if you want to just pass the arguments back to the UI so that the UI can decide how to use them, simply make the error messages have no text, just placeholders for the arguments. One message might be '"%s"|%d' to pass back a string argument (in quotes), and a numeric argument. The .NET code could split these apart and use them in the user interface however you like.
RAISERROR can also be used in a TRY CATCH block in the stored procedure. That would allow you to catch the duplicate key error and replace it with your own error number that means "duplicate key on insert" to your code, and it can include the actual key value(s). Your UI could then use this to display "Order number x already exists", where "x" was the key value supplied.
I've seen lots of Ajax-based applications do a real-time check on fields such as username (to see if it already exists) as soon as the user leaves the edit box. It seems to me a better approach than leaving it to the database to raise an exception based on a db constraint -- it is more proactive, since you have a real process: get the value, check whether it is valid, show an error if not, allow the user to continue if there is no error. So it seems option 2 is a good one.
This is how I do things, though it may not be best for you:
I generally go for the pre-emptive model, though it depends a lot on your application architecture.
For me (in my environment) it makes sense to check for most errors in the middle (business objects) tier. This is where all the other business-specific logic takes place, so I try to keep as much of the rest of my logic here too. I think of the database as somewhere to persist my objects.
When it comes to validation, the easiest errors can be trapped in javascript (formatting, field lengths, etc.), though of course you never assume that those error checks took place. Those errors also get checked in the safer, more controlled world of server-side code.
Business rules (such as "you can only have so many foos per day") get checked in the server-side code, in the business object layer.
Only data rules get checked in the database (referential integrity, unique field constraints, etc.). We pre-empt checking all of these in the middle tier too, to avoid hitting the database unnecessarily.
Thus my database only protects itself against the simple, data-centric rules that it's well equipped to handle; the more variable, business-oriented rules live in the land of objects, rather than the land of records.