Best way to store incomplete data in a relational model? - sql-server

I am writing a web app that takes user input from a series of forms (think "wizard") and stores it in a SQL database as the user progresses.
Let's say they are submitting an application form on behalf of a company that has an address. The application form will also have a list of supporting info.
So the simplified schema will have 4 tables (note that each table will have plenty of other fields to capture):
**ApplicationForm**
Id: PK
CompanyId: FK
**Company**
Id: PK
Address: FK
**Address**
Id: PK
**SupportingInfo**
Id: PK
ApplicationFormId: FK
A user first starts an application form, fills in some important details over a couple of screens, then adds the company details, then adds the company address, and finally adds the list of supporting info.
Given that I cannot create the entire data model in one insert, how do I go about saving the user's input as they progress through the web pages?
When I save the ApplicationForm, I will have a FK violation because there is no Company. When I create the Company, I will have a FK violation because the Address only gets saved later. (Supporting Info does not have this problem because it has a many-to-one relation and the FK will always be valid.)
I have thought of the following:
Do I disable constraint checks?
Or allow the FKs to be null?
Or create "dummy" entries in the related tables until the user enters their details?
Or create separate data models for data entry and persistence?
Or save "in progress" data as JSON or XML?
All the options have significant drawbacks, and the last two in particular seem like overkill for a simple user entry app.
I am working with Entity Framework and SQL Server if that makes any difference to the answers. (I realise that NoSQL may be a suggestion here, but I can't change the database.)
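To make the nullable-FK option concrete, here is a rough sketch of what I imagine it would look like in SQL Server (the IsSubmitted flag and the final validation query are illustrative additions, not part of my actual schema):

```sql
-- Nullable-FK sketch: each FK stays NULL until the wizard step that supplies it.
CREATE TABLE Address (
    Id INT IDENTITY PRIMARY KEY
    -- plus the other address fields
);

CREATE TABLE Company (
    Id        INT IDENTITY PRIMARY KEY,
    AddressId INT NULL REFERENCES Address (Id)   -- set once the address step is done
);

CREATE TABLE ApplicationForm (
    Id          INT IDENTITY PRIMARY KEY,
    CompanyId   INT NULL REFERENCES Company (Id), -- set once the company step is done
    IsSubmitted BIT NOT NULL DEFAULT 0            -- assumed flag to enforce completeness
);

-- Final submission only succeeds when the whole graph is present
-- (@FormId would be supplied by the application):
UPDATE ApplicationForm
SET IsSubmitted = 1
WHERE Id = @FormId
  AND CompanyId IS NOT NULL
  AND EXISTS (SELECT 1 FROM Company c
              WHERE c.Id = CompanyId AND c.AddressId IS NOT NULL);
```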

Related

Oracle APEX - Data Modeling & Primary Keys

I'm creating a rather large APEX application which allows managers to go in and record statistics for associates in the company. Currently we have a database in Oracle with data from AD which holds all the associates' information: name, manager, employee ID, etc.
Now I'm responsible for creating and modeling a table that will house all the stats for each employee. The table I have created has over 90 columns in it. Some contain data such as:
Documents Processed
Calls Received
Amount of Doc 1 Processed
Amount of Doc 2 Processed
and the list goes on for well over 90 attributes. So here is my question:
When creating this table in my application with so many different columns, how would I go about choosing an appropriate primary key? Should I link it to our employee table using the employee's identification, which is unique (each has an associate number)?
Secondly, how can I create these tables (and possibly forms) to allow me to associate the statistic I am entering for an individual with the actual individual?
I have ordered two books from Amazon on data modeling since I am new to APEX and DBA design. Not a spring chicken, but new enough to need some guidance. An additional problem I am running into is that each form can have only 60 fields. So I had thought about splitting the 90+ columns I have into tables for different functions.
Thanks
APEX 4.2 allows for 200 items per page; see oracle apex component limits.
A couple of questions come to mind:
Are you sure that the employee IDs are not recyclable? If these IDs are unique and never recycled, you've found yourself a good primary key.
What do you plan on doing when you decide to add a new metric? Seems like you might have to add a new column to your rather large and likely not normalized table.
I'd recommend a vertical table for your metrics; you can use Oracle's PIVOT function to make your data appear more like a horizontal table.
If you went this route, you would store your employee ID in one column, your metric key in another, and the value in a third.
I'd recommend that you create a metric table consisting of a primary key, a metric label, an active indicator, creation timestamp, creation user id, modified timestamp, modified user id.
This metric table will allow you to add new metrics, change the name of the metric, deactivate a metric, and determine who changed what and when.
This would be a much more flexible approach in my opinion. You may also want to think about audit logs.
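As an illustration, a minimal Oracle sketch of that vertical design might look like the following; all table and column names (including the employee table) are placeholders, not something from the question:

```sql
-- Metric definitions: add/rename/deactivate metrics without touching the schema.
CREATE TABLE metric (
    metric_id    NUMBER PRIMARY KEY,
    metric_label VARCHAR2(100) NOT NULL,
    active_ind   CHAR(1) DEFAULT 'Y' NOT NULL,
    created_ts   TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
    created_by   VARCHAR2(30) NOT NULL,
    modified_ts  TIMESTAMP,
    modified_by  VARCHAR2(30)
);

-- Vertical table: one row per employee per metric (per date, if needed).
CREATE TABLE employee_metric (
    employee_id  NUMBER NOT NULL REFERENCES employee (employee_id),
    metric_id    NUMBER NOT NULL REFERENCES metric (metric_id),
    metric_value NUMBER NOT NULL,
    recorded_on  DATE NOT NULL,
    PRIMARY KEY (employee_id, metric_id, recorded_on)
);

-- PIVOT presents the vertical rows as a horizontal report:
SELECT *
FROM (
    SELECT em.employee_id, m.metric_label, em.metric_value
    FROM employee_metric em
    JOIN metric m ON m.metric_id = em.metric_id
)
PIVOT (
    SUM(metric_value)
    FOR metric_label IN ('Documents Processed' AS docs_processed,
                         'Calls Received'      AS calls_received)
);
```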

SQLite: Individual tables per user or one table for them all?

I've already designed a website which uses an SQLite database. Instead of using one large table, I've designed it so that when a user signs up, an individual table is created for them. Each user will possibly use several hundred records. I did this because I thought it would be easier to structure and access.
I found on other questions on this site that one table is better than using many tables for each user.
Would it be worth redesigning my site so that instead of having many tables, there would be one large table? My current method seems to work well, though the site is still in development, so I'm not sure how well it would stack up in a real environment.
The question is: would changing the code so that there is one large table instead of many individual ones be worth it in terms of performance, efficiency, organisation and space?
SQLite: Creating a user's table.
CREATE TABLE " + name + " (id INTEGER PRIMARY KEY, subject TEXT, topic TEXT, questionNumber INTEGER, question TEXT, answer TEXT, color TEXT)
SQLite: Adding an account to the accounts table.
"INSERT INTO accounts (name, email, password, activated) VALUES (?,?,?,?)", (name, email, password, activated,)
Please note that I'm using Python with Flask, if it makes any difference.
EDIT
I am also aware that there are questions like this already, however none state whether the advantages or disadvantages will be worth it.
In an object oriented language, would you make a class for every user? Or would you have an instance of a class for each user?
Having one table per user is a really bad design.
You can't search records based on any field that isn't the username. With your current solution, how would you find all records for a certain questionNumber?
You can't join against the per-user tables. You have to make two queries, one to find the table name and one to actually query the table, which requires two round-trips to the database.
Each user now has their own table schema. On an upgrade, you have to apply your schema migration to every messages table, and God help you if some of the tables are inconsistent with the rest.
It's effectively impossible to have foreign keys pointing to the per-user tables. A foreign key column has to reference one fixed table, but here the target table differs for every user.
You can have name conflicts with your current setup. What if someone registers with the username accounts? Admittedly, this is easy to fix by adding a user_ prefix, but still something to keep in mind.
SQL injection vulnerabilities. What if I register a user named lol; DROP TABLE accounts; --? Query parameters, the primary way of preventing such attacks, don't work on table names.
I could go on.
Please merge all of the tables, and read up on database normalization.
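A merged version of your per-user tables might look something like this; I'm assuming your accounts table has an integer id primary key:

```sql
-- One shared table, keyed to the owning account instead of named after it.
-- (SQLite only enforces the FK with PRAGMA foreign_keys = ON.)
CREATE TABLE questions (
    id             INTEGER PRIMARY KEY,
    account_id     INTEGER NOT NULL REFERENCES accounts (id),
    subject        TEXT,
    topic          TEXT,
    questionNumber INTEGER,
    question       TEXT,
    answer         TEXT,
    color          TEXT
);

-- Per-user lookups become ordinary parameterized queries (no table-name injection):
--   SELECT * FROM questions WHERE account_id = ?;
-- and cross-user queries are now possible:
--   SELECT * FROM questions WHERE questionNumber = ?;
```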

Suggestion on DB Design to log user activity in a system

I would like to ask for advice on a good database design to log user activity.
Currently I am implementing such an approach for a simple website where a user can post/edit/delete articles.
Table Logbook
- log_id
- log_change[Enum: new/edit/remove]
- log_date
- member_id
- post_id
Table Post
- post_id
- post_title
- etc....
Table Member
- member_id
- member_username
- member_pwd
- etc..
Using these tables, every time a user posts (or edits/removes) an article, it will be logged in the Logbook (along with the time when it happened).
However, what if I am dealing with a larger system where users can not only post articles but also do other things, such as logging in/out of the system or making purchases (transactions)?
Should I go for a different table for each module? For example, if the system has modules like article posting and e-commerce, I would have log tables for:
Article Log
E-Commerce Log
where each table logs activity in its corresponding module.
You could use an entity sub-typing design approach, where common log attributes, like who and when, are tracked in a single table for all types of changes. For changes that have additional attributes, you can have additional tables, one for each type.
Each of the sub-type log tables reference the common table using a foreign key. Typically the foreign key from the sub-type table to the common table is also the primary key of the sub-type table, i.e. the relationship is 1:1.
In such a design, the common table often includes a column (partitioning attribute) which indicates which sub-type is applicable to each record in the common table.
This approach reduces the amount of code you need to build and maintain your logging system while allowing you to keep your log tables normalized.
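A rough sketch of that layout, reusing the question's tables plus an assumed e-commerce module (column names are illustrative):

```sql
-- Common table: one row per logged action, whatever its type.
CREATE TABLE activity_log (
    log_id    INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES member (member_id),
    log_date  TIMESTAMP NOT NULL,
    log_type  VARCHAR(20) NOT NULL   -- partitioning attribute: 'POST', 'PURCHASE', ...
);

-- Sub-type tables: the PK doubles as the FK to the common table (1:1).
CREATE TABLE post_log (
    log_id     INTEGER PRIMARY KEY REFERENCES activity_log (log_id),
    post_id    INTEGER NOT NULL REFERENCES post (post_id),
    log_change VARCHAR(10) NOT NULL  -- new/edit/remove
);

CREATE TABLE purchase_log (
    log_id      INTEGER PRIMARY KEY REFERENCES activity_log (log_id),
    order_id    INTEGER NOT NULL,
    order_total DECIMAL(10,2) NOT NULL
);
```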

Database Tables - To decouple or not?

Is it better to create one table that stores a lot of data related to an entity (a User, for example), or many tables to store that data?
For example:
User Table
Name
Email
Subscription Id
Email Notifications
Permissions
Or
User Table
Name
Email
Subscription Table
User ID
Subscription ID
Notification Table
User ID
Receives?
... etc
Please consider code in this as well, or I would have posted to ServerVault.
From a relational design standpoint what is important is the normal form you're aiming for. In general, if the "column" would require multiple values (subscription_id1, subscription_id2, etc) then it is a repeating group, and that would indicate to you that it needs to be moved to a related table. You've provided very general table and column notes, but taking a cue from the fact that you named "Email Notifications" and "Permissions" with plurals, I'm going to assume that those require related tables.
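Assuming those plurals do indicate repeating groups, the split might look something like this (generic SQL, names illustrative):

```sql
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    email   VARCHAR(255) NOT NULL
);

-- Each repeating group moves to its own table keyed by user_id.
CREATE TABLE user_subscriptions (
    user_id         INTEGER NOT NULL REFERENCES users (user_id),
    subscription_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, subscription_id)
);

CREATE TABLE user_notifications (
    user_id  INTEGER NOT NULL REFERENCES users (user_id),
    channel  VARCHAR(50) NOT NULL,    -- e.g. 'email'
    receives BOOLEAN NOT NULL,
    PRIMARY KEY (user_id, channel)
);

CREATE TABLE user_permissions (
    user_id    INTEGER NOT NULL REFERENCES users (user_id),
    permission VARCHAR(50) NOT NULL,
    PRIMARY KEY (user_id, permission)
);
```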

Preferred way to map code with user created database entries

I am trying to work out the best database model for the current setup:
An administrator can create "customer products", meaning services/products which customers can attach/subscribe to. The simple cases, where a product simply has a price or a product subscription should send an e-mail, are easy to model in the database.
But what about very specific backend code for a customer product? For example, one product might have very specific code implemented for checking a customer's status in a different database. How can I map this relationship in the database so I can turn code on/off based on the product settings?
My intuitive way of handling it would be to have a string column on the CustomerProducts table where a pre-defined set of strings could be set, e.g. "MyCustomCodeHandler", and then the code would check for the existence of this string in order to execute it. But for me it doesn't really feel like a real relationship between the database and code.
Data is data, whereas code is code. I would not recommend storing code in the database.
If you need to allow customers to create product types (in the object-oriented sense of "types") with associated code, I'd choose to deploy that code in the same way you deploy other code.
The custom code may also reference custom data stored in the database. I'd choose to create a dependent table per product subtype, and put the type-specific columns in there. The relationship between this subtype table and the generic product table is one-to-one. That is, the primary key in the subtype table is also a foreign key to the generic product table.
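As a sketch, a one-to-one subtype table for the status-check product described in the question might look like this (all names hypothetical):

```sql
CREATE TABLE CustomerProducts (
    ProductId INTEGER PRIMARY KEY,
    Name      VARCHAR(100) NOT NULL,
    Price     DECIMAL(10,2) NOT NULL
);

-- Subtype table: the primary key is also the foreign key, so the relationship
-- is one-to-one. It holds only the configuration the type-specific code reads;
-- the code itself ships with the application, not in the database.
CREATE TABLE StatusCheckProducts (
    ProductId      INTEGER PRIMARY KEY REFERENCES CustomerProducts (ProductId),
    RemoteDatabase VARCHAR(128) NOT NULL,  -- where the customer status is checked
    CheckInterval  INTEGER NOT NULL        -- minutes between status checks
);
```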
