I need to design tables for an execution flow (this may not be the right word), where each step is skipped or executed based on the conditions set. Users should be able to add multiple steps with conditions, so the 'next' flow/step changes depending on which conditions hold.
Currently we have designed the tables like below: a routing table holds the links between steps, and the next step is chosen based on conditions evaluated against runtime data.
Is this the correct approach, and is there a standard way to design this? As it stands, if we want to add any new step I need to create a new step table; is there any way we can avoid that? In some cases there won't be any conditions at all, just a straightforward flow like step1 to step3, then end.
Sample DB structure
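For concreteness, a minimal sketch of that kind of structure (hypothetical names, not the exact sample): one table for steps and one routing table for the transitions between them, where a NULL condition marks an unconditional edge:

-- Hypothetical sketch: steps are rows, not tables, so adding a step is just an INSERT
CREATE TABLE steps (
    step_id serial PRIMARY KEY,
    name    text NOT NULL
);

CREATE TABLE transitions (
    transition_id serial PRIMARY KEY,
    from_step     int NOT NULL REFERENCES steps,
    to_step       int NOT NULL REFERENCES steps,
    condition     text,                    -- evaluated against runtime data; NULL = always take this edge
    priority      int NOT NULL DEFAULT 0   -- decides which edge wins when several conditions match
);

With a shape like this, a new step is a row rather than a new table, and the plain step1-to-step3 flow is just a transition row whose condition is NULL.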
Don't reinvent the wheel. Using a database to implement workflows is a clear antipattern. I would recommend looking into Cadence Workflow, which provides a much higher-level API and includes tons of features that you would never be able to add to a home-grown ad-hoc solution.
When I want to soft delete resources as a company policy, I can do it in one of two places.
I can do it in my database with some "instead of DELETE" trigger. Like so:
-- The function must exist before the trigger that references it:
CREATE FUNCTION resource_soft_delete() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
UPDATE resource SET deleted_at = now() WHERE id = OLD.id;
RETURN NULL; -- returning NULL from a BEFORE trigger cancels the original DELETE
END;
$$;

CREATE TRIGGER prevent_resource_delete
BEFORE DELETE ON resource
FOR EACH ROW EXECUTE PROCEDURE resource_soft_delete();
That's how pretty much every article about soft deletes suggests doing it, other than articles written specifically by an ORM maintainer, because they have their own in-house solution.
I like this approach. The logic in my APIs looks like I am just deleting the resource.
Resource.query().deleteById(id); // Using a query builder
db.query('DELETE FROM resource WHERE id = $1;', [id]); // Using native library
To me it seems more natural, and I don't have to worry about other developers accidentally hard deleting stuff. But it can also be confusing to those who don't know what is actually going on. And having any logic in the database means I can have bugs there (soft delete logic is usually dead simple, but still...), which would be hard to debug, at least compared to bugs in my APIs.
But also I can instead have the logic in the APIs themselves. Keeping logic next to the other logic. Less elegant but more straightforward. No hidden logic somewhere else. I do lose the protection from people accidentally hard deleting resources.
Resource.query().findById(id).patch({deleted_at: new Date()}); // Using a query builder
db.query('UPDATE resource SET deleted_at = now() WHERE id = $1;', [id]); // Using native library
I am inclined to choose the former option, as I consider the choice of whether to soft delete to be a database matter. The database decides what to do with deleted data. Deleted data, soft or hard, is in principle not part of the application anymore: the APIs can't retrieve it. It is for me, the developer, to use for analytics, for legal reasons, or to manually help a user recover something they consider lost.
But I don't like the downsides. I just talked to a colleague who was worried because he thought we were actually deleting stuff. Now, that could actually be solved with better onboarding and documentation. But should it be like that?
When should soft delete logic be implemented in the code rather than the database? Why does every article I find suggest the database without even considering the code? It looks like there is a strong reason I can't find.
In my view there isn't any strong reason; it depends on where the architect and developers decide to put the logic. But the possible reasons behind it could be:
First, since we are deleting something from the DB, the logic lives where it's best suited.
Second, writing the logic in each and every API is redundant; doing it once in the DB covers all tables (or nodes, or collections) with far less work. :)
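As a sketch of that second point (assuming PostgreSQL, that every table follows the same id/deleted_at convention, and with a made-up orders table as the second example), one generic trigger function can be attached to every table, so the logic really is written only once:

CREATE FUNCTION soft_delete() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
-- TG_TABLE_NAME is whichever table this trigger fired on
EXECUTE format('UPDATE %I SET deleted_at = now() WHERE id = $1', TG_TABLE_NAME)
  USING OLD.id;
RETURN NULL; -- cancel the hard DELETE
END;
$$;

-- Attach the same function to as many tables as needed:
CREATE TRIGGER soft_delete_resource BEFORE DELETE ON resource
FOR EACH ROW EXECUTE PROCEDURE soft_delete();

CREATE TRIGGER soft_delete_orders BEFORE DELETE ON orders
FOR EACH ROW EXECUTE PROCEDURE soft_delete();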
I've been searching for a solution for recurring events; so far I've found two approaches:
First approach:
Create an instance for each event, so if a user has a daily event for one year, the table would need 365 rows.
It sounds plausible for a fixed time frame, but how do you deal with events that have no end date?
Second approach:
Create a recurring-pattern table that generates future events at runtime using some kind of Temporal Expression (Martin Fowler).
Is there any reason to not choose the first approach instead of the second one?
The first approach is going to overpopulate the database and maybe affect performance, right?!
There's a quote about approach number 1 that says:
"Storing recurring events as individual rows is a recipe for disaster." (https://github.com/bmoeskau/Extensible/blob/master/recurrence-overview.md)
What do you guys think about it? I would like some insights on why that would be a disaster.
I appreciate your help.
The proper answer is really both, and not either/or.
Setting aside for a moment the issue of no end date for recurrence: what you want is a header record that contains the recurrence rule for the whole pattern. That way, if you need to change the pattern, you've captured it in a single record that can be edited without risking update anomalies.
Now, joining against some kind of recurrence pattern in SQL is going to be a great big pain in the neck. Furthermore, what if your rules allow you to tweak (edit, or even delete) specific instances of this recurrence pattern?
How do you handle this? You create an instance table with one row per recurring instance and a link (foreign key) back to the single rule that was used to create it. This lets you modify an individual child without losing sight of where it came from, in case you need to edit (or delete) the entire pattern.
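A minimal sketch of that parent/child pair, assuming PostgreSQL and made-up names (the rrule column stands in for whatever encoding of the recurrence rule you choose):

CREATE TABLE recurrence_patterns (
    pattern_id   serial PRIMARY KEY,
    title        text NOT NULL,
    rrule        text NOT NULL,      -- e.g. an iCalendar-style rule string
    series_start timestamptz NOT NULL,
    series_end   timestamptz         -- NULL while the series is open-ended
);

CREATE TABLE event_instances (
    instance_id serial PRIMARY KEY,
    pattern_id  int NOT NULL REFERENCES recurrence_patterns,
    occurs_at   timestamptz NOT NULL,
    is_modified boolean NOT NULL DEFAULT false  -- set once a user tweaks this one instance
);

-- Cancelling the whole series stays a single statement thanks to the foreign key:
DELETE FROM event_instances WHERE pattern_id = 1 AND NOT is_modified;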
Consider a calendaring tool like Outlook or Google Calendar. These applications use this approach. You can move or edit an instance. You can also change the whole series. The apps ask you which you mean to do whenever you go into an editing mode.
There are some limitations to this. For example, if you edit an instance and then edit the pattern, you need a rule that says either (a) the new parent wins or (b) modified children always win. I think Outlook and Google Calendar use approach (a).
As for recording each instance explicitly: the only disastrous thing I can think of is that, without the link back to the original recurrence pattern, you would have a heck of a time cancelling the whole series in one action.
Back to the no-end-date case: this might be a situation where discretion is the better part of valour. Use some rule of thumb that imposes a practical limit on how far into the future you extend such a series, or alternatively just don't allow that kind of rule in a pattern. Force an end to the pattern and let the rule's creator worry about extending it at whatever future point that becomes necessary.
Store the calendar's events as rules rather than just as materialized events.
Storing a recurring event materialized as rows is a recipe for disaster for the apparent reason that the materialization would ideally be of infinite length. Since an endless table is not possible, the developer will try to mimic that behavior with some clever, incomprehensible trick, resulting in erratic behavior of the application.
My suggestion: store the rules, and materialize them into rows only when queried, leading to a hybrid approach.
So you will have two tables storing your information: the first stores rules; the second stores rows materialized from any rule in the rules table.
The general guidelines can be:
For a one-time event, add a row to the second table.
For a recurring event, add a row to the first table and materialize some of its occurrences into the second table.
For a query about a future date, materialize the relevant rules up to that date and save the resulting rows in the second table.
To modify a specific instance of a recurring event, materialize the event up to the instance you want to modify, then modify that last instance and store it.
Further, if an event is too far in the future, do not materialize it; keep it as a rule and execute it later, when the time arrives.
Plain tables will not be enough to store what you are trying to save. This kind of information is best maintained in the database with the support of stored procedures for access and modification.
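As a rough sketch of that hybrid (assuming PostgreSQL, made-up names, and a deliberately simple fixed-interval encoding of a rule; real calendar rules need something richer), a stored function can materialize occurrences only up to the horizon a query actually needs:

CREATE TABLE event_rules (
    rule_id      serial PRIMARY KEY,
    title        text NOT NULL,
    starts_at    timestamptz NOT NULL,
    repeat_every interval NOT NULL,   -- e.g. '1 day', '1 week'
    until        timestamptz          -- NULL = no end date
);

CREATE TABLE event_occurrences (
    rule_id   int REFERENCES event_rules,  -- NULL for one-time events
    title     text NOT NULL,
    occurs_at timestamptz NOT NULL,
    UNIQUE (rule_id, occurs_at)            -- avoid materializing the same row twice
);

-- Materialize one rule's occurrences up to the horizon a query needs.
CREATE FUNCTION materialize_rule(p_rule int, p_horizon timestamptz)
RETURNS void LANGUAGE sql AS
$$
INSERT INTO event_occurrences (rule_id, title, occurs_at)
SELECT r.rule_id, r.title, t
FROM event_rules r,
     generate_series(r.starts_at,
                     least(coalesce(r.until, p_horizon), p_horizon),
                     r.repeat_every) AS t
WHERE r.rule_id = p_rule
ON CONFLICT (rule_id, occurs_at) DO NOTHING;
$$;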
From the answers in the blog post and the answers here, materializing every occurrence as a row can:
1. eat DB storage and memory with these recurrences (needlessly), with the extreme case being the no-end-date event;
2. impact performance (queries, joins, updates, ...);
3. force you to update every row whenever you need to handle the recurrence set as a set rather than as individual occurrences.
I'd like suggestions for the design of a CRUD business app using Silverlight 4, the Business Application Template, WCF RIA Services and the Entity Framework 4. The app tracks lab test results performed on material samples and replaces a (difficult to maintain) existing web application. Lab test results are stored in two "SampleData" tables made up of hundreds of fields. The tables have a one-to-one relationship. I combined the two tables into one using Entity Framework's Table Per Type inheritance, which I'm very happy with. Note: I've decided not to change the database design to avoid destroying the existing application, but it was considered.
My dilemma is how to break up this huge table. Each record represents a material sample that is tested. The logical grouping of fields is by lab test, and I envision my UI having multiple tabs or separate pages, one for each test. The problem at this point is that I'm pulling in ALL the fields yet only displaying a few in a paged DataGrid, and there is a noticeable delay. Instead of one giant entity it might be nice to have several "Lab Test" entities (each representing a type of test) that are subsets of my one giant TPT inheritance table. How would I do this? The base SampleData table/entity contains header fields plus several child test result fields. The second, derived table/entity contains more test result fields linked to the base by SampleID. If split up, I'd need to maintain the header info with each Lab Test entity.
I'm willing to stick with one giant table/entity (despite a slight performance penalty). Still, I'm wondering about the best way to build my UI around this one entity. Can a DataForm be tabbed? If I make a dashboard with links to lab tests, how do I keep the header info in sync with each test page?
I know this is a broad question. I'm hoping to get suggestions on a good design path that will allow me to grow the app as new lab tests are added (making an even bigger entity). I'd hope to find a path that simplifies maintenance and takes advantage of the RAD experience Microsoft is promoting.
Thanks in advance!
I scanned the post discussing the database design and must say that, based on what you said and the fact that you've already got users asking for more tests (repeating values), I wish you'd reconsider the db redesign. You can create a flat view to simulate the existing flat samples-data table and use it to minimize breakage in the existing application.
But you've already made that decision, so how about reversing the situation? Instead of fixing the database, add code to the domain service that transforms the data from its flat layout, leaving out all the null values.
One idea is to write a view that un-flattens the data, leaving out the null no-test rows. The query will raise eyebrows (I'll probably get flamed for this) because it looks nasty, but in reality the DBMS does a fine job of optimizing and performing it (in Oracle, anyway). I've had great results with a view something like:
create view programmer_exp_unflat as
select programmer_id, 'C#' as lang, csharp_yrs as yrs
from programmer_exp_flat where csharp_yrs is not null
union
select programmer_id, 'Java' as lang, java_yrs as yrs
from programmer_exp_flat where java_yrs is not null
union
select programmer_id, 'Cobol' as lang, cobol_yrs as yrs
from programmer_exp_flat where cobol_yrs is not null;
-- ...one union branch per test column, repeated as needed
It's backwards and ugly no matter how you look at it, but it reduces your result set to a bare minimum and there's no need to break things into categories. New test values require modifying the view and, depending on UI flexibility and business rules, might not require any other changes.
It makes coding at the UI more difficult, as it would have been with the right database design in the first place, but your query result is reduced to only the tests that were completed. If your users are flexible, the UI could be designed to show the test results as a list, making display a piece of cake. Your current design pretty much forces you to modify the UI and the database with each and every new test.
These are the kinds of challenges that make being a developer so much fun, and why all the marketing-gimmick sample CRUD applications that can be built in five minutes are worthless in the real world.
I'm answering (and accepting) my own question to increase my Stack Overflow accept rate, but my "answer" is that I have found no answer yet. Because I've had to move on with the project, I continue to use one giant entity. I've also moved away from Silverlight and turned the project into a WPF app, due to various struggles with Silverlight such as its inherently asynchronous data access.
I'm evaluating the idea of building a set of generic database tables that will persist user input. There will then be a secondary process to kick off a workflow and process the input.
The idea is that the notion of saving the initial user input is separate from processing and putting it into the structured schema for a particular application.
An example might be some sort of job application or quiz with open-ended questions. The raw answers will not be super valuable to us for aggregate reporting without some human classification. But, we do want to store the raw input as a historical record.
We may also want the user to be able to partially fill out some information and have it persisted until he returns.
Processing all the input to the point where we can put it into our application-specific data schema may not be possible until we have ALL the data.
Two initial questions:
Assuming this concept has a name, what is it?
Is this a reasonable approach? Why or why not?
Update:
Here's another way to state the idea. The user is sequentially populating the fields of a DTO. I (think I) want to save the DTO to disk even in a partially complete state. Once the user has finished populating the fields, I want to pull out the DTO and process it for structured saving into a table that represents that specific DTO. I can't, however, save a partially complete or (worse) a temporarily incorrect set of input directly into the structured table, since some of the input really shouldn't be stored as part of the structured record.
My idea is to create some generic way to save any type of DTO and then pull them out for processing in a specific app as needed. So maybe this generic DTO table stores data relating to customer satisfaction surveys right next to questions answered in a new account setup wizard.
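For illustration, the kind of generic holding table I have in mind might look like this (a sketch assuming PostgreSQL; all names are made up, and the payload stays opaque until the structured processing step runs):

CREATE TABLE pending_input (
    input_id     serial PRIMARY KEY,
    dto_type     text NOT NULL,   -- e.g. 'satisfaction_survey', 'account_wizard'
    user_id      int NOT NULL,
    payload      jsonb NOT NULL,  -- the partially filled DTO, stored as-is
    is_complete  boolean NOT NULL DEFAULT false,
    processed_at timestamptz,     -- set once it's been moved into the structured schema
    updated_at   timestamptz NOT NULL DEFAULT now()
);

-- A returning user resumes a half-finished wizard:
SELECT payload FROM pending_input
WHERE user_id = 7 AND dto_type = 'account_wizard' AND processed_at IS NULL;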
You stated:
My idea is to create some generic way to save any type of DTO and then pull them out for processing in a specific app as needed.
I think you're one level of abstraction off. I would argue that the entire database is already fulfilling the role you want a limited set of tables to perform. You could create some kind of complicated storage schema that doesn't represent the data in any meaningful way, and then (slowly and painfully, from the DBMS's perspective) merge and render a view of the data... but I would suggest that this is an over-engineered solution.
I've written several applications where, because of custom user requirements, a (sometimes significant) portion of the application is dynamic: constructed by the user, from the schema to the business rules. The ones that manufactured their storage schemas by executing statements like CREATE TABLE and ALTER TABLE were, surprisingly, the easiest to maintain. They also allow users to create reports in a very straightforward, predictable way.
Sounds like you're initially storing the data in a normalized form (generic), and once you have the complete set you are denormalizing it (structured schema).
You might be describing workflow. If so, you may want to check out Windows Workflow Foundation.
The concepts of workflow mirror processes in real life. That is to say, you complete a document, but the document is not complete until it has been approved. In your case, that would be 'data is entered' but unclassified, so it is stored in the database (dehydrated) and a flag is raised for whoever needs to deal with the issue. It can persist in this state for as long as necessary. Once someone is able to deal with it, the workflow is kicked off again (rehydrated) and continues to the next steps.
Here are some SO questions regarding workflows:
This question, "Is it better to have one big workflow or several smaller specific ones?", clears up some of the ways workflow can be used, and also highlights some issues with it.
John Saunders has a very good breakdown of what workflow is good for in this question.
I have a question about best practices for storing complex workflow state for processing tasks in a database. I've been looking online to no avail, so I figured I'd ask the community what they think is best.
This question comes out of the same "BoxItem" example I gave in a prior question. This "BoxItem" is tracked in my system as various tasks are performed on it. A task may take place over several days and involve human interaction, so the state of the BoxItem must be persisted. Who did the task (if applicable) and when the task was done must also be tracked.
At first, I approached this by adding three fields to the "BoxItems" table for every human-interactive task that needed to be done:
IsTaskNameComplete
DateTaskNameComplete
UserTaskNameComplete
This worked while the workflow was simple... but now that it has grown into a complex process (more than 10 possible human interactions in the flow, about half of which are optional and may or may not be done for a given BoxItem, which led me to start adding "DoTaskName" fields as well for those optional tasks), I've found that what should have been a simple table now has 40 or so fields devoted entirely to retaining this state information.
I find myself asking if there isn't a better way to do it... but I'm at a loss.
My first thought was to make a generic "BoxItemTasks" table that defined the tasks that may be done on a given box, but I would still need to save the date and user information individually, so it didn't really help.
My second thought was that perhaps it doesn't matter, and I shouldn't worry if this table has 40 or more fields devoted to retaining state... maybe I'm just being paranoid. But it feels like a lot of information to retain.
Anyway, I'm at a loss as to what a third option might be, or whether one of the two options above is actually reasonable. I can see this workflow potentially getting even more complex in the future, and for each new task I'll need to add 3-4 fields just to support tracking it... it feels like it's spiraling out of control.
What would you do in this situation?
I should note that this is maintenance of an existing system, one that was built without an ORM, so I can't just leave it up to the ORM to take care of it.
EDIT:
Kev, are you talking about doing something like this:
BoxItems
(PK) BoxItemID
(Other irrelevant stuff)
BoxItemActions
(PK) BoxItemID
(PK) BoxItemTaskID
IsCompleted
DateCompleted
UserCompleted
BoxItemTasks
(PK) BoxItemTaskID
TaskType
Description (if even necessary)
Hmm... that would work... it would mean changing how I currently approach SQL queries to see which items are in what state, but in the long term something like this looks like it would work better (without having to make a fundamental design change like the serialization idea represents... though if I had the time, I think I'd like to do it that way).
So is this what you were mentioning, Kev, or am I off on it?
EDIT: Ah, I see your idea as well with the "Last Action" to determine the current state... I like it! I think that might work for me... I might have to change it up a little bit (because at some point tasks happen concurrently), but the idea seems like a good one!
EDIT FINAL: So, in summation, if anyone else looks this up in the future with the same question: it sounds like the serialization approach would be useful if your system has the information pre-loaded into some interface where it's queryable (i.e. not directly calling the database itself, as the ad-hoc system I'm working on does), but if you don't have that, the additional-tables idea seems like it should work well! Thank you all for your responses!
If I'm understanding correctly, I would add the BoxItemTasks table (just an enumeration table, right?), then a BoxItemActions table with foreign keys to BoxItems and to BoxItemTasks for what type of task it is. If you want to make it so that a particular task can only be performed once on a particular box item, just make the (Items + Tasks) pair of columns be the primary key of BoxItemActions.
(You laid it out much better than I did, and kudos for correctly interpreting what I was saying. What you wrote is exactly what I was picturing.)
As for determining the current state, you could write a trigger on BoxItemActions that updates a single column, BoxItems.LastAction. For concurrent actions, your trigger could have special cases to decide which action counts as the most recent.
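A sketch of that trigger, assuming PostgreSQL syntax and the column names from the question's edit (the actual system may need the T-SQL equivalent):

CREATE FUNCTION set_last_action() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
-- Naive "latest write wins"; the special cases for concurrent tasks go here.
UPDATE BoxItems
SET LastAction = NEW.BoxItemTaskID
WHERE BoxItemID = NEW.BoxItemID;
RETURN NEW;
END;
$$;

CREATE TRIGGER boxitemactions_set_last_action
AFTER INSERT OR UPDATE ON BoxItemActions
FOR EACH ROW EXECUTE PROCEDURE set_last_action();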
As the previous answer suggested, I would break your table into several.
BoxItemActions contains the list of actions the workflow needs to go through, created each time a BoxItem is created. In this table, you can track the detailed dates/times/users of when each task was completed.
With this type of application, knowing where the box is to go next can get quite tricky, so having a 'map' of the remaining steps for the box will prove quite helpful. As well, this table can grow like crazy, hundreds of rows per box, and it will still be very easy to query.
It also makes it possible to have 'different paths' that can easily be changed. A master data table of 'paths' through the workflow is one solution, where, as each box is created, the user selects which 'path' the box will follow. Or you could set it up so that when the user creates the box, they select which tasks are required for that particular box. It depends on your business problem.
How about a hybrid of the serialization and database models? Have an XML document that serves as your master workflow document, containing a node for each step with attributes and elements that detail its name, order in the process, conditions for whether it's optional or not, etc. Most importantly, each step node can have a unique step id.
Then in your database you have a simple two-table structure: the BoxItems table stores your basic BoxItem data, and a BoxItemActions table works much like the one in the solution you marked as the answer.
It's essentially similar to the accepted solution, but instead of a BoxItemTasks table storing the master list of tasks, you use an XML document, which allows some more flexibility in the actual workflow definition.
For what it's worth, in BizTalk they "dehydrate" long-running message patterns (workflows and the like) by binary serializing them to the database.
I think I would serialize the Workflow object to XML and store it in the database with an ID column. It may be more difficult to report on, but it sounds like it may work in your case.