Database design, avoiding redundant data on HTML forms

Database design, avoiding redundant data on HTML forms - database

I'm trying to come up with a clean database design for a new project I'm working on. One of the data items I need to store in the database will come from an HTML form:
Q1: "Anticoagulated patient?" [YES, NO]
JavaScript;
(If yes is selected, an additional question is displayed):
Q2: "Type of Anticoagulant" [Warfari, Coumarin, Clopidogrel]
My question is, is it necessary to store the first question's response in the database? To me the data seems redundant. If the type is specified, then it can be assumed that the patient is "anticoagulated".
Once the form is submitted, the form will be accessed at a later point so the data can be ammended and the interface will need to reflect the state of the database. I should still be able to do this without needing to record the first question:
JavaScript;(If Q2 has a value, then the default option should be set to Yes
otherwise it should be set to No)
Q1: "Anticoagulated patient?" [Yes, No]
JavaScript;(Only display if Q1 is set to Yes):
Q2: "Type of Anticoagulant" [Warfari, Coumarin, Clopidogrel]
What are your thoughts on this?

I would say the extra space required to store the response to question 1 will be far outweighed by the amount of extra logic involved in marking Q2 as implying Q1. Keep it simple!

I think the two questions should indeed be handled on the client-side, as you suggested. This will provide a better user experience.
As for the database, don't use two fields, one is more than enough. Note that even if you don't show the second question dynamically, you can still store the answer somewhere else (such as the web server session) and not the back-end database.

I think , you have make right decision on this, but just think about the future and extending the application logic , maybe you need to add another question later then you will have to store the question in db,
Briefly for current situation it is adequate to store the answer.

Related

How do I design a configurable checklist table for my database?

I am working on a database model and one of the areas I need to address is a client configurable checklist table(s). Ideally the client would have a set of predefined checklists that I could then enter as bit columns. Alas, this is not going to happen. The client wants the ability to add and group checklists so they are reusable.
I could go with an EAV type model and use strings for "true/false" but I've been down this road before and would rather not take that journey again.
Any thoughts or suggestions are welcome. Thanks.

"Client Configurable" almost always means EAV. EAV's store data fine... it's just getting the data back out that's the trick.
;-)
The other... somewhat-less objectionable approach is to make generic columns. Answer1 BIT, Answer2 BIT, Answer3...
If you do this, then make a table where you define the real names of the columns so that you could build the correct view for any given check list.
Say the first check list is
Gas in the tank?
Money in the Wallet?
Then store
CheckList_ID Answer_Number Column_Name
1 1 Tank has Gas
1 2 Wallet has Money.
Then the view would be:
SELECT Answer1 "Tank has Gas", Answer2 "Wallet has Money"
FROM Generic_Answer_Table
WHERE CheckList_ID = 1
It obvious how the view would be dynamically generated.
The downside is that if you include 10 columns they can't have more than 10 EVER.
XML is the other possibility and doesn't suffer that issue... however it's not quite as SQL friendly.

I'm thinking XML will be the best way to give them the control they seek along with keeping the check boxes dynamic. Maybe an XML file per page/section that contains all the custom check boxes. Whats good about XML is that its easy to work with and gives you great flexibility. At the database level just save the output of the check boxes as finalized data, just in case the check box source XML changes you will still see the originally selected values saved at the database level.

We're building http://tallyfy.com for such a purpose, although a direct relation between actions and your database table is something new.

Viewing a "log" of new (distinct) events in a database

I have an application which has several unrelated tables in its db. I'll explain by using an "auto-updating" version of the SO homepage as an example, so lets say I have the tables "users", "comments" and "questions".
The homepage client side needs to periodically poll the server, and get a log of all the new "events" that have happened. I.e., I'd like to display (somehow) the new questions, comments and users that have been added to SO since the last poll.
On way would be to simply keep a variable on the client side containing the last index of each of my tables, send it to the server, and have the server send me the new users, comments and questions.
The problem is, what happens when I add a new type of information, say, votes. Now I have to store another variable on the client-side, and the server has to poll another table. And so on, for every new type of information I keep.
I'm looking for a solution that helps me avoid this.
Another problem - say I'd like to see all the "events" that have happened since last time, but sorted according to when they took place.
One direction I had is to have a single "events" table, which contains the info about when each event happened. I can then poll only this table, and get a list of all the new events that have happened. The problem is that each event is pretty different (a new comment has different columns than a new upvote, etc.) So I'm not sure how to implement this, or if this is even a good idea.
Does anybody have any ideas how I can solve this? This seems like something that would come up a lot, but I don't really have much experience with databases, unfortunately.
Thanks!

It sounds to me like you're trying to future proof via database design. While this can be done through something an EVA model I caution against that because the value its adds tend to not be worth the cost.
Instead you should model the database as closely to reality as possible and not how you intend to use it.
Then use SQL to project the data to how you need it. You can do this by statements that will either deliver the meta data that you need
e.g.
Select
Count(ID) , 'Comments' Type
From
Comments
Where
lastUpdate > #InputParamter1#
UNION Select
Count(ID) , 'Questions' Type
From
Questions
Where
lastUpdate > #InputParamter1#
Or (and this doesn't get used Often enough)
Return more than one result set from your database in one go
Select
userid,
ComentText
From
Comments
Where
lastUpdate > #InputParamter1#;
Select
userId,
Questions,
Tags
From
Questions
Where
lastUpdate > #InputParamter1#
That said you will still have to write some code if you add new stuff but it should be limited to updating your sql, adding new containers for your data and then code to display to the end users and then to validate and store it.
Honestly the idea of adding new stuff requiring some work doesn't seem that awful to me.

Names of businesses keyed differently by different people

I have this table
tblStore
with these fields
storeID (autonumber)
storeName
locationOrBranch
and this table
tblPurchased
with these fields
purchasedID
storeID (foreign key)
itemDesc
In the case of stores that have more than one location, there is a problem when two people inadvertently key the same store location differently. For example, take Harrisburg Chevron. On some of its receipts it calls itself Harrisburg Chevron, some just say Chevron at the top, and under that, Harrisburg. One person may key it into tblStore as storeName Chevron, locationoOrBranch Harrisburg. Person2 may key it as storeName Harrisburg Chevron, locationOrBranch Harrisburg. What makes this bad is that the business's name is Harrisburg Chevron. It seems hard to make a rule (that would understandably cover all future opportunities for this error) to prevent people from doing this in the future.
Question 1) I'm thinking as the instances are found, an update query to change all records from one way to the other is the best way to fix it. Is this right?
Questions 2) What would be the best way to have originally set up the db to have avoided this?
Question 3) What can I do to make future after-the-fact corrections easier when this happens?
Thanks.
edit: I do understand that better business practices are the ideal prevention, but for question 2 I'm looking for any tips or tricks that people use that could help. And question 1 and 3 are important to me too.

This is not a database design issue.
This is an issue with the processes around using the database design.
The real question I have is why are users entering in stores ad-hoc? I can think of scenarios, but without knowing your situation it is hard to guess.
The normal solution is that the tblStore table is a lookup table only. Normally users only have access to stores that have already been entered.
Then there is a controlled process to maintain the tblStore table in a consistent manner. Only a few users would have access to this process.
Of course as I alluded to above this is not always possible, so you may need a different solution.
UPDATE:
Question #1: An update script is the best approach. The best way to do this is to have a copy of the database if possible, or a close copy if not, and test the script against this data. Once you have ensured that the script runs correctly, then you can run it against the real data.
If you have transactional integrity you should use that. Use "begin" before running the script and if the number of records is what you expect, and any other tests you devise (perhaps also scripted), then you can "commit"
Do not type in SQL against a live DB.
Question #3: I suggest your first line of attack is to create processes around the creation of new stores, but this may not be wiuthin your ambit.
The second is possibly to get proactive and identify and enter new stores (if this is the problem) before the users in the field need to do so. I don't know if this works inside your scenario.
Lastly if you had a script that merged "store1" into "store2" you can standardise on that as a way of reducing time and errors. You could even possibly build that into an admin only screen that automated merging stores.
That is all I can think of off the top of my head.

There is probably a name for this. Please re-title appropriately

I'm evaluating the idea of building a set of generic database tables that will persist user input. There will then be a secondary process to kick off a workflow and process the input.
The idea is that the notion of saving the initial user input is separate from processing and putting it into the structured schema for a particular application.
An example might be some sort of job application or quiz with open-ended questions. The raw answers will not be super valuable to us for aggregate reporting without some human classification. But, we do want to store the raw input as a historical record.
We may also want the user to be able to partially fill out some information and have it persisted until he returns.
Processing all the input to the point where we can put it into our application-specific data schema may not be possible until we have ALL the data.
Two initial questions:
Assuming this concept has a name, what is it?
Is this a reasonable approach? Why or why not?
Update:
Here's another way to state the idea. The user is sequentially populating fields in a DTO. I (think I) want to save the DTO to disk even in a partially-complete state. Once the user has completed populating the fields, I want to pull out the DTO and process it for structured saving into a table which represents the specific DTO. I can't, however, save a partially complete or (worse) a temporarily incorrect set of input since some of the input really shouldn't be stored as part of the structured record.
My idea is to create some generic way to save any type of DTO and then pull them out for processing in a specific app as needed. So maybe this generic DTO table stores data relating to customer satisfaction surveys right next to questions answered in a new account setup wizard.

You stated:
My idea is to create some generic way to save any type of DTO and then pull them out for processing in a specific app as needed.
I think you're one level-of-abstration off. I would argue that the entire database is fulfilling the role you want a limited set of tables to perform. You could create some kind of complicated storage schema that wouldn't represent the data in any way, and then (slowly and painfully, from the DBMS's perspective) merge and render a view of the data ... but I would suggest that this is an over-engineered solution.
I've written several applications where, because of custom user requirements, a (sometimes significant) portion of the application is dynamic - constructed by the user, from the schema to the business rules. The ones that manufactured their storage schemas by executing statements like CREATE TABLE and ALTER TABLE were, surprisingly, the ones easiest to maintain. They also allow users to create reports in a very straightforward, expected way.

Sounds like you're initially storing the data in a normalized form(generic), and once you have the complete set you are denormalizing it(structured schema).

You might be speaking about Workflow. You might want to check out Windows Workflow.
The concepts of Workflow are that they mirror the processes of real life. That is to say, you make complete a document, but the document is not complete until it has been approved. In your case, that would be 'Data is entered' but unclassified, so it is stored in the database (dehydrated) and a flag is sent up for whoever needs to deal with the issue. It can persist in this state for as long as necessary. Once someone is able to deal with it, the workflow is kicked off again (hydrated) and continues to the next steps.
Here are some SO questions regarding workflows:
This question: "Is it better to have one big workflow or several smaller specific ones?" clears up some of the ways that workflow can be used, and also highlights some issues with it.
John Saunders has a very good breakdown of what workflow is good for in this question.

Best Practices: Storing a workflow state of an item in a database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I have a question about best practices regarding how one should approach storing complex workflow states for processing tasks in a database. I've been looking online to no avail, so I figured I'd ask the community what they thought was best.
This question comes out of the same "BoxItem" example I gave in a prior question. This "BoxItem" is being tracked in my system as various tasks are performed on it. The task may take place over several days and with human interaction, so the state of the BoxItem must be persisted. Who did the task (if applicable), and when the task was done must also be tracked.
At first, I approached this by adding three fields to the "BoxItems" table for every human-interactive task that needed to be done:
IsTaskNameComplete
DateTaskNameComplete
UserTaskNameComplete
This worked when the workflow was simple... but now that it has grown to a complex process (> 10 possible human interactions in the flow... about half of which are optional, and may or may not be done for the BoxItem, which resulted in me beginning to add "DoTaskName" fields as well for those optional tasks), I've found that what should've been a simple table now has 40 or so field devoted entirely to the retaining of this state information.
I find myself asking if there isn't a better way to do it... but I'm at a loss.
My first thought was to make a generic "BoxItemTasks" table which defined the tasks that may be done on a given box, but I still would need to save the Date and User information individually, so it didn't really help.
My second thought was that perhaps it didn't matter, and I shouldn't worry if this table has 40 or more fields devoted to state retaining... and maybe I'm just being paranoid. But it feels like that's a lot of information to retain.
Anyways, I'm at a loss as far as what a third option might be, or if one of the two options above is actually reasonable. I can see this workflow potentially getting even more complex in the future, and for each new task I'm going to need to add 3-4 fields just to support the tracking of it... it feels like it's spiraling out of control.
What would you do in this situation?
I should note that this is maintenance of an existing system, one that was built without an ORM, so I can't just leave it up to the ORM to take care of it.
EDIT:
Kev, are you talking about doing something like this:
BoxItems
(PK) BoxItemID
(Other irrelevant stuff)
BoxItemActions
(PK) BoxItemID
(PK) BoxItemTaskID
IsCompleted
DateCompleted
UserCompleted
BoxItemTasks
(PK) TaskType
Description (if even necessary)
Hmm... that would work... it would represent a need to change how I currently approach doing SQL Queries to see which items are in what state, but in the long term something like this looks like it would work better (without having to make a fundamental design change like the Serialization idea represents... though if I had the time, I'd like to do it that way I think.).
So is this what you were mentioning Kin, or am I off on it?
EDIT: Ah, I see your idea as well with the "Last Action" to determine the current state... I like it! I think that might work for me... I might have to change it up a little bit (because at some point tasks happen concurrently), but the idea seems like a good one!
EDIT FINAL: So in summation, if anyone else is looking this up in the future with the same question... it sounds like the serialization approach would be useful if your system has the information pre-loaded into some interface where it's queryable (i.e. not directly calling the database itself, as the ad-hoc system I'm working on does), but if you don't have that, the additional tables idea seems like it should work well! Thank you all for your responses!

If I'm understanding correctly, I would add the BoxItemTasks table (just an enumeration table, right?), then a BoxItemActions table with foreign keys to BoxItems and to BoxItemTasks for what type of task it is. If you want to make it so that a particular task can only be performed once on a particular box item, just make the (Items + Tasks) pair of columns be the primary key of BoxItemActions.
(You laid it out much better than I did, and kudos for correctly interpreting what I was saying. What you wrote is exactly what I was picturing.)
As for determining the current state, you could write a trigger on BoxItemActions that updates a single column BoxItems.LastAction. For concurrent actions, your trigger could just have special cases to decide which action takes recency.

As the previous answer suggested, I would break your table into several.
BoxItemActions, containing a list of actions that the work flow needs to go through, created each time a BoxItem is created. In this table, you can track the detailed dates \ times \ users of when each task was completed.
With this type of application, knowing where the Box is to go next can get quite tricky, so having a 'Map' of the remaining steps for the Box will prove quite helpful. As well, this table can group like crazy, hundreds of rows per box, and it will still be very easy to query.
It also makes it possible to have 'different paths' that can easily be changed. A master data table of 'paths' through the work flow is one solution, where as each box is created, the user has to select which 'path' the box will follow. Or you could set up so that when the user creates the box, they select tasks are required for this particular box. Depends on our business problem.

How about a hybrid of the serialization and the database models. Have an XML document that serves as your master workflow document, containing a node for each step with attributes and elements that detail it's name, order in the process, conditions for whether it's optional or not, etc. Most importantly each step node can have a unique step id.
Then in your database you have a simple two table structure. The BoxItems table stores your basic BoxItem data. Then a BoxItemActions table much like in the solution you marked as the answer.
It's essentially similar to the solution accepted as the answer, but instead of a BoxItemTasks table to store the master list of tasks, you use an XML document that allows for some more flexibility for the actual workflow definition.

For what it's worth, in BizTalk they "dehydrate" long-running message patterns (workflows and the like) by binary serializing them to the database.

I think I would serialize the Workflow object to XML and store in the database with an ID column. It may be more difficult to report on, but it sounds like it may work in your case.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight