What are the semantics of capability manager functions in Pact?

The Pact documentation describes two flavors of capability: unmanaged and managed. Managed capabilities are dynamic: they can change some associated state as the capability is brought in and out of scope. A managed capability accomplishes this dynamic behavior with a dedicated capability manager function. The Pact documentation has a short section on manager functions which includes an example of the TRANSFER_mgr manager function.
This documentation is a high-level summary that demonstrates why you would want to use a manager function. However, there is no entry in the Pact documentation on the @managed metadata field or the semantics of the manager function. I am struggling to see how they relate to one another and what arguments each can and should take.
For example, the TRANSFER capability uses @managed like this:
(defcap TRANSFER (sender:string receiver:string amount:decimal)
  @managed amount TRANSFER_mgr
  ...)
Which implies, to me, that TRANSFER_mgr will be called with the amount as an argument. But the definition of TRANSFER_mgr has two parameters:
(defun TRANSFER_mgr:decimal (managed:decimal requested:decimal)
What are the semantics of the transfer manager function?
For example:
Where do the parameters come from in TRANSFER_mgr? (Is managed first the amount provided via @managed and then the result of prior calls to TRANSFER_mgr? Is requested the amount provided in calls to with-capability in the module code?)
How many arguments can @managed take? (Is it a single argument, or can there be many? If many, how does that affect the parameters that the manager function takes?)
When is the manager function invoked? (Is it on calls to with-capability for this capability?)

Just one clarification on the difference between managed and unmanaged
capabilities before I answer your questions, since they are a little bit
different from the picture described above:
Capabilities are never "changed": they are only granted by with-capability.
What happens with a managed capability is that, in addition to defining a
capability, it also defines a "resource" that is decreased in some way
whenever the associated capability is granted.
You can think of it like this: stateless capabilities are granted by
with-capability and demanded by require-capability; managed capabilities
set up an initial resource by install-capability (which ought to have a
different name, more like install-resource), deduct from the resource AND
grant by with-capability, and are demanded by require-capability.
Note how install-capability is unique to managed capabilities, while
with-capability does double duty. In a way, with-capability is really two
separate operations composed together in the managed case:
;; You write this:
(install-capability (TRANSFER FROM TO PROVIDED))
...
(with-capability (TRANSFER FROM TO REQUESTED) EXPR)

;; ----
;; But what it does internally is more like this:
(install-capability (TRANSFER FROM TO) PROVIDED)
...
(if (already-granted-p (TRANSFER FROM TO))
    EXPR
    (consume-resource (TRANSFER FROM TO) REQUESTED
      (with-capability (TRANSFER FROM TO) EXPR)))
You can see here that (TRANSFER FROM TO) identifies the capability -- in
both the managed and unmanaged cases. The extra parameter relating to the
resource is what's new in the managed case. The fact that it gets passed as an
argument in (TRANSFER FROM TO AMOUNT) to both install-capability and
with-capability is just a syntactic convenience. Which brings us directly to
your questions...
Where do the parameters come from in TRANSFER_mgr? (Is managed first the
amount provided via @managed and then the result of prior calls to
TRANSFER_mgr? Is requested the amount provided in calls to with-capability
in the module code?)
The @managed keyword identifies the argument referring to the resource
parameter. In the case of TRANSFER, this is the amount argument, as
declared by:
@managed amount TRANSFER_mgr
This also states that TRANSFER_mgr will receive two arguments related to the
amount: the first being the current amount of the resource, and the second
being the proposed amount to be deducted by the call to with-capability.
(defun TRANSFER_mgr:decimal (current:decimal requested:decimal)
For install-capability, the amount argument passed is the initial amount
of the resource. For with-capability, the amount argument is the amount of
resource being requested before the capability can be granted. In that case,
the current amount that is passed as the first argument to the management function
comes from the current state of the Pact evaluator.
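For concreteness, here is a minimal sketch of such a manager function, modeled on the pattern the coin contract uses (the exact error message is illustrative): it enforces that enough resource remains, and its return value becomes the new resource amount.

(defun TRANSFER_mgr:decimal (current:decimal requested:decimal)
  (let ((balance (- current requested)))
    (enforce (>= balance 0.0)
      (format "TRANSFER exceeded for balance {}" [current]))
    ;; the returned value is stored as the new resource amount
    balance))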
How many arguments can @managed take? (Is it a single argument, or can there
be many? If many, how does that affect the parameters that the manager
function takes?)
@managed allows for only a single argument, although that argument could be
a list or an object. For example, you could provide a list of names as the
"resource", and write a manager function that removes names from the list
as they are "used"; or the resource could be an object, if you wanted a single
managed capability to manage multiple resources.
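To make the list case concrete, here is a hypothetical sketch (the capability, its parameters, and the one-name-at-a-time convention are all invented for illustration):

(defcap USE-NAMES (user:string names:[string])
  @managed names USE-NAMES-mgr
  ...)

(defun USE-NAMES-mgr:[string] (current:[string] requested:[string])
  ;; convention: each grant consumes exactly one name
  (enforce (= (length requested) 1) "request exactly one name")
  (let ((name (at 0 requested)))
    (enforce (contains name current) "name not available")
    ;; the remaining names become the new resource
    (filter (!= name) current)))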
In practice, however, the managed capability feature is used mainly by coin
contracts to govern transfer amounts.
When is the manager function invoked? (Is it on calls to with-capability for
this capability?)
The manager function has the job of both confirming that sufficient resource
exists, and deducting from the resource. It is called whenever
with-capability is used and the capability has not yet been granted.
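Putting the pieces together, a sketch of the whole flow (the account names, amounts, and the debit function are hypothetical):

(install-capability (TRANSFER "alice" "bob" 1.0))  ; resource installed: 1.0
(with-capability (TRANSFER "alice" "bob" 0.6)      ; mgr called: 1.0 - 0.6 = 0.4
  (debit "alice" 0.6))                             ; granted, resource now 0.4
(with-capability (TRANSFER "alice" "bob" 0.6)      ; mgr called: 0.4 - 0.6 < 0
  (debit "alice" 0.6))                             ; fails: resource exhausted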

Related

SCD-2 in data modelling: how do I detect changes?

I know the concept of SCD-2 and I'm trying to improve my skills with it by doing some practice exercises.
I have the following scenario/experiment:
I'm calling daily to a rest API to extract information about companies.
In my initial load to the DB everything is new, so everything is very easy.
The next day I call the same REST API, which might return the same companies, but some of them might (or might not) have changes (i.e., they changed the size, the profits, the location, ...)
I know SCD-2 might be really simple if the REST API returns just the records with changes, but in this case it might return records without changes as well.
In this scenario, how do people detect whether the data of a company has changed in order to apply SCD-2? Do they compare all the fields?
Is there any example out there that I can see?
There is no standard SCD-2, nor even a unique concept of it. It is a general term for a large number of possible approaches. The only way is to practice and see what is suitable for your use case.
In any case you must identify the natural key of the dimension and the set of attributes for which you want to keep history.
You may of course make it more complex by deciding to use your own surrogate key.
You mentioned that there are two main types of interface for the process:
• You get a full set of the dimension data periodically
• You get the “changes only” (aka delta interface)
Paradoxically, the former is much simpler to handle than the latter.
First of all, in the full dimensional snapshot the natural key is unique within each load, contrary to the delta interface (where you may get more than one change for a single entity).
Additionally, with a delta interface you have to handle the case of late change delivery, or even changes delivered in the wrong order.
The next important decision is whether you expect deletes to occur. This is again trivial in the full interface; for the delta interface you must define some convention for how this information is passed.
Connected is the question of whether a previously deleted entity can be reused (i.e. reappear in the data).
If you support delete/reuse, you'll have to think about how to represent them in your dimension table.
In any case you will need some additional columns in the dimension to cover the historical information.
Some implementations use a change_timestamp; others use a validity interval with valid_from and valid_to.
Still other implementations claim that an additional sequence number is required, so you avoid the trap of multiple changes with an identical timestamp.
So you see that before you look for some particular implementation you need to carefully decide the options above. For example, the full and delta interfaces lead to completely different implementations.
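As for the detection question itself: with a full snapshot, a common trick is to compare only the tracked attributes, typically via a hash stored on the current dimension row. A minimal sketch, with invented column names (size, profits, location, attr_hash), not a standard:

from typing import Optional
import hashlib

# Attributes whose changes should produce a new SCD-2 version.
TRACKED = ("size", "profits", "location")

def row_hash(record: dict) -> str:
    # Join the tracked attributes with a separator, then hash them;
    # a change in any tracked attribute changes the hash.
    payload = "|".join(str(record.get(col)) for col in TRACKED)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def classify(incoming: dict, current: Optional[dict]) -> str:
    if current is None:
        return "INSERT"        # new natural key: open the first version
    if row_hash(incoming) != current["attr_hash"]:
        return "NEW_VERSION"   # close the current row, open a new one
    return "UNCHANGED"         # identical record: nothing to do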

Need to establish Database connectivity in DRL file

I need to establish Oracle database connectivity in Drools to get some data as and when required while executing rules. How do I go about that?
You shouldn't do this. Instead, you should query your data out of the database first, then pass it into the rules as facts in working memory.
I tried to write a detailed answer about all the reasons you shouldn't do this, but it turns out that StackOverflow has a character limit. So I'm going to give you the high level reasons.
• Latency
• Data consistency
• Lack of DB access hardening
• Extreme design constraints for rules
• High maintenance burden
• Potential security issues
Going in order ...
Latency. Database queries aren't free. Regardless of how good your connection management is, you will incur overhead every time you make a database call. If you have a solid understanding of the Drools execution lifecycle and how it executes rules, and you design your rules to explicitly only query the database in ways that minimize the number of calls, you could consider this an acceptable risk. A good caching layer wouldn't be amiss. Note that having to properly design your rules this way is not trivial, and you'll incur perpetual overhead in having to make sure all of your rules remain compliant.
(Hint: this means you must never ever call the database from the 'when' clause.)
Data consistency. A database is a shared resource. If you make the same query in two different 'when' clauses, there is no guarantee that you'll get back the same result. Again, you could potentially work around this with a deep understanding of how Drools evaluates and executes rules, and designing your rules appropriately. But the same issues from 'latency' will affect you here -- namely the burden of perpetual maintenance. Further the rule design restrictions -- which are quite strict -- will likely make your other rules and use cases less efficient as well because of the contortions you need to pull to keep your database-dependent rules compatible.
Lack of hardening. The Java code you can write in a DRL function is not the same as the Java code you can write in a Java class. DRL files are parsed as strings and then interpreted and then compiled; many language features are simply not available. (Some examples: try-with-resources, annotations, etc.) This makes properly hardening your database access extremely complicated and in some cases impossible. Libraries which rely on annotations like Spring Data are not available to you for use in your DRL functions. You will need to manage your connection pooling, transaction management, connection management (close everything!), error handling, and so on manually using a subset of the Java language that is roughly equivalent to Java 5.
This is, of course, specific to writing your code to access the database as a function in your DRL. If you instead implement your database access in a service which acts like a database access layer, you can leverage the full JDK and its features and functionality in that external service which you then pass into the rules as an input. But in terms of DRL functions, this point remains a major concern.
Rule design constraints. As I mentioned previously, you need to have an in-depth understanding of how Drools evaluates and executes rules in order to write effective rules that interact with the database. If you're not aware that all left hand sides ("when" clauses) are executed first, then the "matches" are ordered by salience, and then the right hand sides ("then" clauses) are executed sequentially... well, you absolutely should not be trying to do this from the rules. Not only do you as the initial implementor need to understand the rules execution lifecycle, but everyone who comes after you who is going to be maintaining your rules needs to also understand this and continue implementing the rules based on these restrictions. This is your high maintenance burden.
As an example, here are two rules. Let's assume that "DataService" is a properly implemented data access layer with all the necessary connection and transaction management, and it is passed into working memory as a fact.
rule "Record Student Tardiness"
when
$svc: DataService() // data access layer
Tardy( $id: studentId )
$student: Student($tardy: tardyCount) from $svc.getStudentById($id)
then
$student.setTardyCount($tardy + 1)
$svc.save($student)
end
rule "Issue Demerit for Excessive Tardiness"
when
$svc: DataService() // data access layer
Tardy( $id: studentId )
$student: Student(tardyCount > 3) from $svc.getStudentById($id)
then
AdminUtils.issueDemerit($student, "excessive tardiness")
end
If you understand how Drools executes rules, you'll quickly realize the problems with these rules. Namely:
• we call getStudentById twice (latency, consistency)
• the changes to the student's tardy count are not visible to the second rule
So if our student, Alice, has 3 tardies recorded in the database, and we pass in a new Tardy instance for her, the first rule will hit and her tardy count will increment and be saved (Alice will have 4 tardies in the database.) But the second rule will not hit! Because at the time the matches are calculated, Alice only had 3 tardies, and the "issue demerit" rule only triggers for more than 3. So while she has 4 tardies now, she didn't then.
The solution to the second problem is, of course, to call update to let Drools know to reevaluate all matches with the new data in working memory. This of course exacerbates the first issue -- now we'll be calling getStudentById four times!
Finally, the last problem is potential security issues. This really depends on how you implement your queries, but you'll need to be doubly sure you're not accidentally exposing any connection configuration (URL, credentials) in your DRLs, and that you've properly sanitized all query inputs to protect yourself against SQL injection.
The right way to do this, of course, is not to do it at all. Call the database first, then pass it to your rules.
As an example, let's say we have a set of rules which is designed to determine if a customer purchase is "suspicious" by comparing it to trends from the previous 3 months' worth of purchases.
// Assume this class serves as our data access layer and does proper connection,
// transaction management. It might be something like a Spring Data JPA repository,
// or something from another library; the specifics are not relevant.
private PurchaseService purchaseService;

public boolean isSuspiciousPurchase(Purchase purchase) {
    List<Purchase> previous = purchaseService.getPurchasesForCustomerAfterDate(
            purchase.getCustomerId(),
            LocalDate.now().minusMonths(3));

    KieBase kBase = ...;
    KieSession session = kBase.newKieSession();
    session.insert(purchase);
    session.insert(previous);
    // insert other facts as needed

    session.fireAllRules();
    // ...
}
As you can see, we call the database and pass the result into working memory. Then we can write the rules such that they do work against that existing list, without needing to interact with the database at all.
If our use case requires modifying the database -- e.g. saving updates -- we can pass those commands back to the caller, and they can be invoked after fireAllRules completes. Not only does that keep us from having to interact with the database in the rules, but it gives us better control over our transaction management (you can probably group the updates into a single transaction, even if they originally came from multiple rules). And since we don't need to understand anything about how Drools evaluates and executes rules, it'll be a little more robust in case a rule with a database "update" is triggered twice.
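A minimal sketch of that pattern (the SaveCommand type is an invented placeholder, not a Drools API): rules insert command objects into working memory instead of writing to the database, and the caller collects and executes them afterwards.

// Hypothetical command type inserted by rule consequences.
public interface SaveCommand {
    void execute();
}

public void runRulesThenPersist(KieSession session) {
    session.fireAllRules();
    // Collect the commands the rules inserted instead of touching the DB:
    for (Object o : session.getObjects(obj -> obj instanceof SaveCommand)) {
        ((SaveCommand) o).execute();   // e.g. grouped into one transaction
    }
}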
You can use a function like the one below to get details from the DB. Here I have written the function in the DRL file, but it's suggested to put such code in a Java file and call the specific method from the DRL file.
function String ConnectDB(String connectionClass, String url, String user, String password) {
    // DRL functions cannot use try-with-resources, so close manually.
    try {
        Class.forName(connectionClass);
        java.sql.Connection con = java.sql.DriverManager.getConnection(url, user, password);
        java.sql.Statement st = con.createStatement();
        java.sql.ResultSet rs = st.executeQuery("select * from Employee where employee_id = 199");
        String name = rs.next() ? rs.getString("employee_name") : null;
        rs.close();
        st.close();
        con.close();
        return name;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
rule "DBConnection"
when
person:PersonPojo(name == ConnectDB("com.mysql.jdbc.Driver","jdbc:mysql://localhost:3306/root","root","redhat1!"))
.. ..
then
. . ..
end

How do I correctly use libsodium so that it is compatible between versions?

I'm planning on storing a bunch of records in a file, where each record is then signed with libsodium. However, I would like future versions of my program to be able to check signatures the current version has made, and ideally vice-versa.
For the current version of Sodium, signatures are made using the Ed25519 algorithm. I imagine that the default primitive can change in new versions of Sodium (otherwise libsodium wouldn't expose a way to choose a particular one, I think).
Should I...
1. Always use the default primitive (i.e. crypto_sign)
2. Use a specific primitive (i.e. crypto_sign_ed25519)
3. Do (1), but store the value of sodium_library_version_major() in the file (either in a dedicated 'sodium version' field or a general 'file format revision' field) and quit if the currently running version is lower
4. Do (3), but also store crypto_sign_primitive()
5. Do (4), but also store crypto_sign_bytes() and friends
...or should I do something else entirely?
My program will be written in C.
Let's first identify the set of possible problems and then try to solve them. We have some data (a record) and a signature. The signature can be computed with different algorithms. The program can evolve and change its behaviour, and libsodium can also (independently) evolve and change its behaviour. On the signature generation front we have:
• crypto_sign(), which uses some default algorithm to produce signatures (at the moment of writing it just invokes crypto_sign_ed25519())
• crypto_sign_ed25519(), which produces signatures based on the specific ed25519 algorithm
I assume that for one particular algorithm given the same input data and the same key we'll always get the same result, as it's math and any deviation from this rule would make the library completely unusable.
Let's take a look at the two main options:
1. Using crypto_sign_ed25519() all the time and never changing this. Not that bad an option: it's simple, and as long as crypto_sign_ed25519() exists in libsodium and is stable in its output, you have nothing to worry about, with a stable fixed-size signature and zero management overhead. Of course, in the future someone could discover some horrible problem with this algorithm, and if you're not prepared to change the algorithm, that could mean a horrible problem for you.
2. Using crypto_sign(). With this we suddenly have a lot of problems, because the algorithm can change, so you must store some metadata along with the signature, which opens up a set of questions:
• what to store?
• should this metadata be record-level or file-level?
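To make the record-level variant concrete, here is a sketch of what such metadata could look like (the field names and sizes are mine, not a format defined by libsodium):

#include <sodium.h>

/* One record: the algorithm name and signature are stored per record so
 * that records signed by different algorithms can coexist in one file. */
struct signed_record {
    char algorithm[16];                    /* e.g. crypto_sign_primitive() */
    unsigned char sig[crypto_sign_BYTES];  /* fixed size today; see below  */
    unsigned int payload_len;
    unsigned char payload[];               /* the signed data              */
};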
What do the mentioned functions give us for the second approach?
• sodium_library_version_major() is a function that tells us the library API version. It's not directly related to changes in supported/default algorithms, so it's of little use for our problem.
• crypto_sign_primitive() is a function that returns a string identifying the algorithm used in crypto_sign(). That's a perfect match for what we need, because supposedly its output will change at exactly the time the algorithm changes.
• crypto_sign_bytes() is a function that returns the size of a signature produced by crypto_sign() in bytes. That's useful for determining the amount of storage needed for the signature, but it can easily stay the same if the algorithm changes, so it's not the metadata we need to store explicitly.
Now that we know what to store, there is the question of processing that stored data. You need to get the algorithm name and use it to invoke the matching verification function. Unfortunately, from what I see, libsodium itself doesn't provide any simple way to get the proper function given the algorithm name (like EVP_get_cipherbyname() or EVP_get_digestbyname() in OpenSSL), so you need to make one yourself (which of course should fail for an unknown name). And if you have to make one yourself, maybe it would be even easier to store some numeric identifier instead of the name from the library (more code though).
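A minimal sketch of such a dispatcher (the function name and the -1 convention are mine; ed25519 is the only primitive today, so it is the only branch):

#include <sodium.h>
#include <string.h>

/* Map a stored algorithm name (e.g. the output of crypto_sign_primitive())
 * to the matching verification routine; refuse anything unknown. */
int verify_record(const char *alg,
                  const unsigned char *sig,
                  const unsigned char *msg, unsigned long long msglen,
                  const unsigned char *pk)
{
    if (strcmp(alg, "ed25519") == 0) {
        return crypto_sign_ed25519_verify_detached(sig, msg, msglen, pk);
    }
    return -1;  /* unknown algorithm name: fail verification */
}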
Now let's get back to file-level vs record-level. To decide, there are two more questions to ask: can you generate new signatures for old records at any given time (is that technically possible, is that allowed by policy), and do you need to append new records to old files?
If you can't generate new signatures for old records, or you need to append new records and don't want the performance penalty of signature regeneration, then you don't have much choice and you need to:
• have a dynamic-size field for your signature
• store the algorithm (a dynamic string field or an internal, application-specific ID) used to generate the signature along with the signature itself
If you can generate new signatures, or especially if you don't need to append new records, then you can get away with a simpler file-level approach: store the algorithm used in a special file-level field and, if the signature algorithm changes, regenerate all signatures when saving the file (or keep using the old algorithm when appending new records; that's also more of a compatibility policy question).
Other options? Well, what's so special about crypto_sign()? It's that its behaviour is not under your control; the libsodium developers choose the algorithm for you (no doubt they choose a good one). But if you have any versioning information in your file structure (not signature-specific, I mean), nothing prevents you from making your own particular choice and using one algorithm with one file version and another with the next (with conversion code when needed, of course). Again, that's also based on the assumption that you can generate new signatures and that's allowed by policy.
Which brings us back to the original two choices and the question of whether it's worth the trouble of doing all that compared to just using crypto_sign_ed25519(). That mostly depends on your program's life span. I'd probably say (just as an opinion) that if it's less than 5 years, it's easier to just use one particular algorithm. If it can easily be more than 10 years, then no, you really need to be able to survive algorithm (and probably even whole crypto library) changes.
Just use the high-level API.
Functions from the high-level API are not going to use a different algorithm without the major version of the library being bumped.
The only breaking change one can expect in libsodium 1.x.y is the removal of deprecated/undocumented functions (that don't even exist in current releases compiled with the --enable-minimal switch). Everything else will remain backward compatible.
New algorithms might be introduced in 1.x.y versions without high-level wrappers, and will be stabilized and exposed via a new high-level API in libsodium 2.
Therefore, do not bother calling crypto_sign_ed25519(). Just use crypto_sign().

Managing system-wide parameters in C

I'm developing a system of many processes which have to be aware of many configurations, options and data of the system. To do that, I implemented a shared object that uses a pointer to a shared memory block of parameters and their data. The data of the parameters includes the types, the values, default values, functions for get/set, etc. Basically, the data is in a kind of look-up table.
This shared object has functions to get/set these parameters, so all the processes in the system can get/set them. I have many defines for the parameter codes and many possibilities for each parameter; for example, one code can be a float value, and another an array of ints. You can only imagine the complexity of the code with all the switches and cases...
My questions are:
Is this practice correct for handling system-wide parameters and configurations? For speed and efficiency I don't want to use a DB file; I have to keep the data close, in RAM. I thought about moving the look-up table to an in-memory DB, but the processing time is critical and I don't want to waste time on building SQL statements and compiling them. Any ideas of what is the best way to do it?
Your program design sounds fine, given that the parameters are properly encapsulated in a separate file, declared as static and only accessible through set/get functions. Then the code for accessing the data, as well as any potential thread-safety code, can be placed in the same file and hidden from the caller.
Whether it makes more sense to keep the parameters in RAM or in a DB really just depends on how fast you need the data to be available. It sounds like a DB isn't an option for you, because naturally it will be slower to access. It makes more sense to implement a DB if you have multiple clients that need to access the data, but that doesn't seem to be the case here.
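As a minimal sketch of that encapsulation (the parameter names, float-only values, and mutex-based locking are illustrative; a multi-process setup would place the table and a PTHREAD_PROCESS_SHARED lock in the shared memory block instead):

/* params.c -- all data is static and only reachable via get/set. */
#include <pthread.h>

typedef enum { PARAM_MAX_SPEED, PARAM_GAIN, PARAM_COUNT } param_id;

static float values[PARAM_COUNT] = { 100.0f, 1.5f };   /* default values */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

float param_get(param_id id)
{
    pthread_mutex_lock(&lock);
    float v = values[id];
    pthread_mutex_unlock(&lock);
    return v;
}

void param_set(param_id id, float v)
{
    pthread_mutex_lock(&lock);
    values[id] = v;
    pthread_mutex_unlock(&lock);
}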

pyDatalog: is it possible to define multiple independent datalog sessions?

I'm working on some code that assesses data in a database to see if instances in a stream of incoming events comply with a set of protocols. The idea is to use pyDatalog to do this. Ideally, we would like to be able to assess the data against several independent rule sets, which define separate protocols the events should comply with.
In other words, is it possible to create several logically independent pyDatalog sessions which each have their own sets of rules, but take data from the same underlying database?
Support for multiple rule sets is planned for release 0.14, together with thread safety.
With the current and previous releases, you can store the different rule sets in the same pyDatalog session, provided that there are no predicate name conflicts. For example, you could prefix each predicate with an identifier of the rule set it belongs to. Then, by calling the appropriate predicate, you'll activate the relevant rule set, without visible performance loss.
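A minimal sketch of the prefixing idea (the predicate and fact names are invented for illustration):

from pyDatalog import pyDatalog

pyDatalog.create_terms('X, event, protoA_valid, protoB_valid')

# facts shared by both rule sets
+event(1)
+event(2)

# "rule set A": predicates prefixed with protoA_
protoA_valid(X) <= event(X) & (X > 0)

# "rule set B": independent rules over the same facts
protoB_valid(X) <= event(X) & (X > 1)

# querying a prefixed predicate activates only that rule set
print(pyDatalog.ask('protoA_valid(X)'))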
For prefixed predicates (those referring to a Python class, e.g. Employee.id[X] == Y), you would need to create Python subclasses with the appropriate prefix. You could see some performance drop, but it should be small.
