Creating a platform for automating supply chain functions with Memgraph

I would like to know whether Memgraph is a good option for building a platform that automates supply chain functions, something like Procure-to-Pay, where all documents within the process contain highly connected data, such as a purchase order number that ties together all the other documents.

The use case seems very suited to be represented and handled as a graph. All the connections between documents, stakeholders and supply chain links can be easily modeled as a graph. It would also be very easy and computationally efficient to find connections to other documents in the network.
You can also use some built-in features to help automate the process. For example, triggers and query modules:
• Triggers can be used to define events on which a procedure will be triggered. For example, if you need to notify a person about a document once it’s created and added to the database, you can create a trigger that will fire every time a specific document is created and send a notification (email, push notification, etc.) to the person responsible for it or all users who are connected to the document in the database.
• Query modules can be used to create custom procedures, and you can implement them in Python, C/C++ or Rust. For example, you could implement the aforementioned notification procedure as a query module: a simple Python script that sends an email to the specified address (a minimal sketch follows this list).
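For illustration, here is a minimal sketch of what such a notification module could look like in Python. The module name, the trigger, the `Document` label, the `owner_email` property and the SMTP setup are all assumptions made for the example, not Memgraph defaults:

```python
# notification.py - a sketch of a Memgraph Python query module.
# Once loaded from the query modules directory, it could be wired to a trigger:
#   CREATE TRIGGER notify_on_document ON () CREATE AFTER COMMIT
#   EXECUTE CALL notification.notify_owner(createdVertices);
import smtplib
from email.message import EmailMessage

import mgp  # Memgraph's Python API for query modules


@mgp.read_proc
def notify_owner(ctx: mgp.ProcCtx,
                 created: mgp.List[mgp.Vertex]) -> mgp.Record(sent=int):
    """Send an e-mail for every newly created Document vertex."""
    sent = 0
    for vertex in created:
        if not any(label.name == "Document" for label in vertex.labels):
            continue
        props = dict(vertex.properties.items())
        recipient = props.get("owner_email")  # assumed property on the vertex
        if not recipient:
            continue
        msg = EmailMessage()
        msg["Subject"] = f"New document {props.get('number', '?')} was created"
        msg["From"] = "noreply@example.com"
        msg["To"] = recipient
        msg.set_content("A document you are responsible for was added.")
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
            smtp.send_message(msg)
        sent += 1
    return mgp.Record(sent=sent)
```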
Also, depending on the architecture of your system, if you are using a message broker like Apache Kafka, you can connect it directly to Memgraph and ingest the data, instead of having to implement an intermediary service that connects to the database.
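As a sketch of that path (the topic, field and module names are assumptions), a Python transformation module turns each Kafka message into a Cypher query; the stream itself would then be wired up with something like `CREATE KAFKA STREAM po_stream TOPICS purchase_orders TRANSFORM po_transform.to_cypher;` followed by `START STREAM po_stream;`:

```python
# po_transform.py - a sketch of a Memgraph Kafka transformation module.
import json

import mgp


@mgp.transformation
def to_cypher(messages: mgp.Messages
              ) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
    """Turn each JSON message into a MERGE that links documents by PO number."""
    records = []
    for i in range(messages.total_messages()):
        payload = json.loads(messages.message_at(i).payload().decode("utf-8"))
        records.append(mgp.Record(
            query=("MERGE (po:PurchaseOrder {number: $po_number}) "
                   "MERGE (doc:Document {id: $doc_id}) "
                   "MERGE (doc)-[:REFERENCES]->(po)"),
            parameters={"po_number": payload["po_number"],
                        "doc_id": payload["doc_id"]}))
    return records
```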
I hope this info is of help to you. Feel free to ping me if you have any further questions or need more info.

Related

How to provide data isolation / controlled access over data stored in tables across schemas and databases

I want to provide controlled access to data which is stored in multiple tables. The access is decided based on certain run-time attributes associated with the user. I am looking for a solution which is extensible, performant and highly secure.
ILLUSTRATION:
There is a framework-level module which stores authorization/access-related data for multiple other modules. Then there are n modules which manage their own lifecycle objects, e.g. module Test1 has 1000 instances which are created and stored in its base table. As a framework solution I want to protect this data from unauthorized access, so I created a notion of privileges and stored their mapping to users in my own table. Now, to provide controlled access to the data, my aim is that a user is shown only the objects to which he/she has access.
Approaches in my mind:
We use an Oracle database and currently rely on VPD (Virtual Private Database): we add a policy on each base table of the above-mentioned modules, which first evaluates the access of the currently logged-in user from the privileges granted to him, and then appends that predicate to every query against the base tables of the other modules (done by the database itself by default).
PROS: a very efficient and highly secure solution.
CONS: It cannot work if the base tables and our own table are in two different schemas. Two different schemas in the same database instance could perhaps be overcome, but some of my integrator systems might be in separate databases altogether.
Design at the Java layer:
We connect to our DBs through JPA data sources, so I could write a thin layer, basically a wrapper of sorts over EntityManager, and replicate what VPD does for me: first get the access-related data from my tables, then run a monitored query against the integrator's table, and maybe cache the data in a caching server (optimization).
CONS: I want to use this in a production system, so I want to get it right in the first shot. I would like to know about any patterns that are already established in the industry.
I do not think your solutions are flexible enough to work well in a complex scenario like yours. If you have very simple queries, then yes, you can design something like an SQL screener at the database or Java level and just pass all your queries through it.
But this is not flexible. As soon as your queries start to grow complex, improving this query screener becomes tremendously difficult, since it is not part of the business logic and cannot know the details of your permission system.
I suggest you implement access checks in your service layer. The service must know for which user it generates or processes data. For example, move the query-generation logic into repositories and have your services call different repository methods depending on user permissions, or simply parameterize repository calls based on those permissions.
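The question is about Java/JPA, but the shape of the idea is language-neutral; here is a rough sketch in Python, with the repository and permission-service names made up for illustration:

```python
class DocumentService:
    """Service layer that decides which repository query to run per user."""

    def __init__(self, repository, permission_service):
        self.repository = repository
        self.permissions = permission_service

    def list_documents(self, user):
        # Instead of screening generated SQL afterwards, the service itself
        # picks the query based on the caller's privileges.
        if self.permissions.has_privilege(user, "DOCUMENT_ADMIN"):
            return self.repository.find_all()
        allowed_ids = self.permissions.accessible_object_ids(user, "Document")
        return self.repository.find_by_ids(allowed_ids)
```

In JPA terms, `find_by_ids` would map to a repository method whose query carries an `IN (:ids)` restriction, so the filtering happens in the database rather than in memory.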

Observer pattern in Oracle

Can I set a hook on the change or addition of rows in a table and somehow get notified when such an event is raised? I have searched the web and only came across pipes, but there is no way to get a pipe message immediately when it is sent, only by periodically trying to receive it.
Implementing an Observer pattern from a database should generally be avoided.
Why? It relies on vendor proprietary (non-standard) technology, promotes database vendor lock-in and support risk, and causes a bit of bloat. From an enterprise perspective, if not done in a controlled way, it can look like "skunkworks" - implementing in an unusual way behaviour commonly covered by application and integration patterns and tools. If implemented at a fine-grained level, it can result in tight-coupling to tiny data changes with a huge amount of unpredicted communication and processing, affecting performance. An extra cog in the machine can be an extra breaking point - it might be sensitive to O/S, network, and security configuration or there may be security vulnerabilities in vendor technology.
If you're observing transactional data managed by your app:
• implement the Observer pattern in your app. E.g. in Java, the CDI and JavaBeans specs support this directly, and a custom OO design as per the Gang of Four book is a perfect solution.
• optionally send messages to other apps. Filters/interceptors, MDB messages, CDI events and web services are also useful for notification.
If users are directly modifying master data within the database, then either:
• provide a singular admin page within your app to control master data refresh, OR
• provide a separate master data management app and send messages to dependent apps, OR
• (best approach) manage master data edits in terms of quality (reviews, testing, etc.) and timing (treat the same as a code change), promote through environments, deploy and refresh data / restart the app on a managed schedule
If you're observing transactional data managed by another app (shared database integration) OR you use data-level integration such as ETL to provide your application with data:
• try to have data entities written by just one app (read-only by others)
• poll a staging/ETL control table to understand what/when changes occurred (a minimal polling sketch follows this list) OR
• use a JDBC/ODBC-level proprietary extension for notification or polling, as mentioned in Alex Poole's answer OR
• refactor overlapping data operations from the 2 apps into a shared SOA service, which can either avoid the observation requirement or lift it from a data operation to a higher-level SOA/app message
• use an ESB or a database adapter to invoke your application for notification, or a WS endpoint for bulk data transfer (e.g. Apache Camel, Apache ServiceMix, Mule ESB, Openadaptor)
• avoid use of database extension infrastructure such as pipes or advanced queuing
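To illustrate the control-table polling option from the list above, here is a minimal sketch against a hypothetical `etl_change_log` table; the table, column names and the cx_Oracle-style named bind are assumptions:

```python
import time


def poll_changes(conn, last_seen_id=0, interval_seconds=30):
    """Yield rows recorded by the ETL job since the last poll (coarse-grained)."""
    while True:
        cursor = conn.cursor()
        cursor.execute(
            "SELECT change_id, table_name, changed_at "
            "FROM etl_change_log WHERE change_id > :last ORDER BY change_id",
            last=last_seen_id,
        )
        for change_id, table_name, changed_at in cursor:
            last_seen_id = change_id
            yield table_name, changed_at
        cursor.close()
        time.sleep(interval_seconds)
```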
If you use messaging (send or receive), do so from your application(s). Messaging from the DB is a bit of an antipattern. As a last resort, it is possible to use triggers which invoke web services (http://www.oracle.com/technetwork/developer-tools/jdev/dbcalloutws-howto-084195.html), but great care is required to do this in a very coarse fashion, invoking a business (sub-)process when a set of data changes, rather than crunching fine-grained CRUD-type operations. It is best to trigger a job and have the job call the web service outside the transaction.
In addition to the other answers, you can look at database change notification. If your application is Java-based there is specific documentation covering JDBC, and similar for .NET here and here; and there's another article here.
You can also look at continuous query notification, which can be used from OCI.
I know link-only answers aren't good but I don't have the experience to write anything up (I have to confess I haven't used either, but I've been meaning to look into DCN for a while now...) and this is too long for a comment *8-)
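For a rough idea of what continuous query notification looks like from client code, here is an untested sketch using Python's cx_Oracle driver; the connection string, table name and chosen operation flags are assumptions, and the DB user needs the CHANGE NOTIFICATION privilege:

```python
import cx_Oracle


def on_change(message):
    # message.tables describes which registered tables changed.
    for table in message.tables:
        print("change detected in", table.name)


# events=True is required for change-notification subscriptions;
# the connecting user needs: GRANT CHANGE NOTIFICATION TO app_user;
conn = cx_Oracle.connect("app_user/secret@dbhost/orclpdb", events=True)
subscription = conn.subscribe(
    namespace=cx_Oracle.SUBSCR_NAMESPACE_DBCHANGE,
    callback=on_change,
    operations=(cx_Oracle.OPCODE_INSERT
                | cx_Oracle.OPCODE_UPDATE
                | cx_Oracle.OPCODE_DELETE),
)
subscription.registerquery("SELECT * FROM orders")

input("Waiting for notifications; press Enter to stop\n")
```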
Within the database itself triggers are what you need. You can run arbitrary PL/SQL when data is inserted, deleted, updated, or any combination thereof.
If you need to have the event propagate outside the database you would need a way to call out to your external application from your PL/SQL trigger. Some possible options are:
1. DBMS_PIPE - Pipes in Oracle are similar to Unix pipes: one session can write and a separate session can read to transfer information. They are also not transactional, so you get the message immediately. One drawback is that the API is poll-based, so I suggest option #2.
2. Java - PL/SQL can invoke arbitrary Java (assuming you load your class into your database). This opens the door to just about any type of messaging you'd like, including using JMS to push messages to a message queue. Depending on how you implement this, you can even have it transactionally tied to the INSERT/UPDATE/DELETE statement itself. The listening application would then just listen to the JMS queue and wouldn't be tied to the DB publishing the event at all.
Depending on your requirements, use triggers or auditing.
Look at DBMS_ALERT, DBMS_PIPE or (preferably) AQ (Advanced Queuing), Oracle's internal messaging system. Oracle AQ has its own API, but it can also be treated as a Java JMS provider.
There are also techniques like Streams or XStream, but those are quite complex.

Is OData suitable for multi-tenant LOB application?

I'm working on a cloud-based line-of-business application. Users can upload documents and other types of objects to the application. Users upload quite a number of documents, and together there are several million docs stored. I use SQL Server.
Today I have a somewhat RESTful API which allows users to pass in a DocumentSearchQuery entity where they supply keywords together with the requested sort order and paging info. They get a DocumentSearchResult back, which is essentially a sorted collection of references to the actual documents.
I now want to extend the search API to other entity types than documents, and I'm looking into using OData for this. But I get the impression that if I use OData, I will face several problems:
• There's no built-in limit on which fields users can query, which means that either performance will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
• Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I need to do my own parsing of incoming queries again.
• There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
I don't think I want to go down the road of manually parsing incoming expression trees to make sure they only try to access data which they have access to. This seems cumbersome.
My question is: Considering the above, is using OData a suitable protocol in a multi-tenant environment where customers write their own clients accessing the entities?
I think it is suitable here. Let me give you some opinions about the problems you think you will face:
There's no built-in limit on what fields users can query which means that either the perf will depend on if they query a indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and they share physical hardware, slow queries are not really acceptable since those affect other customers)
True. However, you can check the filter for allowed fields and either allow or deny the operation.
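As a rough, framework-neutral sketch of that idea (the allow list, keyword set and tokenisation below are illustrative; in WCF Data Services the check would hook into request processing):

```python
import re

INDEXED_FIELDS = {"Id", "Title", "CreatedOn", "OwnerId"}  # illustrative allow list
ODATA_KEYWORDS = {"eq", "ne", "gt", "ge", "lt", "le", "and", "or", "not",
                  "true", "false", "null", "startswith", "substringof"}


def assert_only_indexed_fields(filter_expression: str) -> None:
    """Reject a $filter that touches anything outside the indexed allow list."""
    expr = re.sub(r"'[^']*'", "''", filter_expression or "")  # drop string literals
    for token in re.findall(r"[A-Za-z_]\w*", expr):
        if token.lower() in ODATA_KEYWORDS:
            continue
        if token not in INDEXED_FIELDS:
            raise ValueError(f"Filtering on '{token}' is not allowed")


assert_only_indexed_fields("Title eq 'invoice' and OwnerId eq 42")  # passes
```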
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework which does this, but i will probably use something else in the future. Which means it's likely that I need to do my own parsing of incoming queries again.
Yes, there is a provider for EF. That means that if you use something else in the future, you will need to write your own provider. If you later switch away from EF, you probably committed to this decision too early, and I don't recommend WCF Data Services in that case.
There's no built-in support for limiting what data users can access. I need to validate incoming Odata queries to make sure they access data they actually have permission to access.
There isn't any out-of-the-box support for that in WCF Data Services, right. However, that is part of the authorization mechanism you will need to implement anyway. The good news is that doing it is pretty easy with QueryInterceptors: you simply intercept the query and restrict it based on the user's privileges. This is something you will have to implement regardless of the technology you use.
My answer: considering the above, WCF Data Services is a suitable choice in a multi-tenant environment where customers write their own clients accessing the entities, at least as long as you stay with EF. And you should keep in mind the huge effort it saves you.

Active Directory lookup failure due to replication delay

We have a third party tool that we use to create AD objects (users and groups). This tool uses ADSI to create the objects, and we do not and cannot specify a DC that it will write to. As such, it might write to DC1 today and DC2 tomorrow. Everything replicates around though, so no worries.
The problem we have is our process for creating groups looks like this:
1. Issue the group create to the 3rd-party tool.
2. If successful, look up the group object in AD via LDAP calls (this is a Java app) to get the SID. (The third-party tool doesn't return this.)
The problem is that the Java LDAP calls do specify a DC when performing a lookup. Let's say Java is set to read from DC1. If the third-party tool writes to DC2, then the Java lookup against DC1 fails to find the group.
The AD replication delay is small, so if we add a 15-second delay between the create and the lookup, it works, but it is a little ugly.
Also, I tried querying all DCs from Java. This works for the above example, but it still has the same basic trouble when we update an attribute on a user or group and immediately try to read it back. A delay seems to be the only working approach, but it feels like there should be a better one.
3rd-party tools should not be used in this way to update a directory. The eventual consistency model prevents results from being predictable in any meaningful way. The correct procedure is to perform the update (add/mod/delete) in the application code using an ADD, MODIFY, DELETE, or MODIFY DN request with the post-read request control attached. This method is defined by the standards process and is guaranteed to be predictable if the update worked. Please carefully study the information at "LDAP: Programming Practices" and its accompanying article.
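The app in question is Java, but for illustration here is a sketch of that flow with python-ldap; the post-read control (RFC 4527) is also available in Java LDAP SDKs such as UnboundID. The host, DNs and attribute choices are assumptions:

```python
import ldap
import ldap.modlist
from ldap.controls.readentry import PostReadControl

conn = ldap.initialize("ldap://dc1.example.com")
conn.simple_bind_s("CN=svc-provision,OU=Service,DC=example,DC=com", "secret")

# Ask the DC that performs the write to return the finished entry (including
# the generated objectSid) in the same response, so no follow-up search is needed.
post_read = PostReadControl(criticality=True, attrList=["objectSid"])

attrs = {
    "objectClass": [b"top", b"group"],
    "sAMAccountName": [b"new-group"],
    "groupType": [b"-2147483646"],  # global security group
}
msgid = conn.add_ext(
    "CN=new-group,OU=Groups,DC=example,DC=com",
    ldap.modlist.addModlist(attrs),
    serverctrls=[post_read],
)
_, _, _, resp_ctrls = conn.result3(msgid)
for ctrl in resp_ctrls:
    if isinstance(ctrl, PostReadControl):
        print("SID returned by the writing DC:", ctrl.entry["objectSid"][0])
```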

Is there a way to prevent users from doing bulk entries in a PostgreSQL database?

I have 4 new data-entry users who are using a particular GUI to create/update/delete entries in our main database. The GUI client allows them to see database records on a map and make modifications there, which is fine and the preferred way of doing it.
But lately a lot of them have been accessing the database directly using pgAdmin and running bulk queries (update, insert, delete, etc.), which introduces a lot of problems, like people updating many records without realizing it or making mistakes while setting values. It also affects our logging procedures, as we calculate averages and timestamps for reporting purposes, which are quite crucial to us.
So, is there a way to prevent users from using pgAdmin (please remember that a lot of these users work from home and we do not have access to their machines) and from running SQL queries directly against the database?
We still have to give them access to certain tables and allow them to execute SQL as long as it comes through a certain client, but deny access to the same user when he/she tries to execute a query directly against the DB.
The only sane way to control access to your database is to convert your DB access to a 3-tier structure. You should build a middleware layer (maybe a REST API or something similar) and use this API from your app. The database should be hidden behind this middleware, so no direct access is possible. From the DB's point of view, there is no way to tell whether a given connection comes from your app or from some other tool (pgAdmin, plain psql or some custom-built client). Your database should be accessible only from trusted hosts, and clients should not have access to those hosts.
This is only possible if you use a trick (which might get exploited too, but maybe your users are not smart enough).
In your client app, set some harmless parameter like geqo_pool_size=1001 (if it is 1000 normally).
Now write a trigger that checks whether this parameter is set and raises "No access through pgAdmin" if it is not set the way your app sets it (and the username is not your admin username).
Alternative: create a temporary table and check for its existence.
I believe you should block direct access to the database and set up an application to which your clients (human and software ones) will connect.
Let this application filter and pass through only allowed commands.
Great care should be taken with the filtering; I would think carefully about whether raw SQL should be allowed at all. Personally, I would design a simplified API, which would assure me that a hypothetical client-attacker (in God we trust, all others we monitor) could not find a way to sneak in some dangerous modification.
I suppose that, from a security standpoint, your current approach is very unsafe.
You should study advanced pg_hba.conf settings.
This file is the key point for user authorization. Basic settings only cover simple authentication methods like passwords and IP lists, but more advanced solutions are available:
• GSSAPI
• Kerberos
• SSPI
• RADIUS server
• any PAM method
So your official client can use a more advanced method, like something backed by a third-tier API or a really complex authentication mechanism. Without the application it then at least becomes difficult to redo these tasks, if the Kerberos key is encrypted into your client, for example.
What you want to do is REVOKE your users' write access, then create a new role with write access, then, as this role, CREATE FUNCTION defined as SECURITY DEFINER, which updates the table in the way you allow (with integrity checks), and then GRANT EXECUTE on this function to your users.
There is an answer on this topic on ServerFault which references the following blog entry with a detailed description.
I believe that using middleware, as other answers suggest, is unnecessary overkill in your situation. The above solution does not require the users to change the way they access the database; it just restricts their ability to modify the data to the predefined server-side methods.
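To make that concrete, here is a sketch of the REVOKE / SECURITY DEFINER setup, driven from Python with psycopg2 purely for convenience; the table, role and function names and the integrity check are illustrative:

```python
import psycopg2

SETUP_SQL = """
REVOKE INSERT, UPDATE, DELETE ON mapdata FROM entry_user;

CREATE ROLE mapdata_writer NOLOGIN;
GRANT INSERT, UPDATE, DELETE ON mapdata TO mapdata_writer;

-- Runs with the owner's (writer's) rights and enforces the checks the GUI
-- would normally perform.
CREATE OR REPLACE FUNCTION update_mapdata(p_id bigint, p_value text)
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER AS $func$
BEGIN
    IF p_value IS NULL OR length(p_value) > 200 THEN
        RAISE EXCEPTION 'invalid value for record %', p_id;
    END IF;
    UPDATE mapdata SET value = p_value, updated_at = now() WHERE id = p_id;
END;
$func$;

ALTER FUNCTION update_mapdata(bigint, text) OWNER TO mapdata_writer;
REVOKE ALL ON FUNCTION update_mapdata(bigint, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION update_mapdata(bigint, text) TO entry_user;
"""

with psycopg2.connect("dbname=gis user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(SETUP_SQL)
```

Users keep their read access; writes now only go through the function's checks.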
