Unique kind of questionnaire - Database design - database

For a research experiment I need to design a web application that implements a particular kind of questionnaire, the results of which will serve to derive some statistics and draw some conclusions.
In short, the questionnaire has to work as follows:
1-> All answers are either on a scale from absolutely false to absolutely correct, or open answers.
2-> Each set of questions corresponds to a given word (for example BIRD or FISH) and a description (a small sentence).
3-> Within one set, all questions can take the form of either a small sentence (text) or an image (to be categorized) and each set may contain an arbitrary number of questions. I need to be able to store the questions that correspond to one word as both text and images and be able to choose between them.
4-> Each set of questions has 5 different kinds of help. In questionnaire type A the user may choose one at will. In questionnaire type B all kinds are shown.
5-> The user has to respond to all the questions once (and they have to appear in a random order). Then, if type A, he has to choose a kind of help or refuse to get help and possibly modify his answer. Or, if type B, see all kinds of help one by one (in a random order) and possibly modify his answer.
6-> For each question, for each type of questionnaire I have to know if an answer was modified, which kind of help caused the user to modify and (if type B) whether this kind of help appeared 1st, 2nd, 3rd etc.
I realize these may not be the most complicated requirements, but I am new to this and rather confused. My relations so far look something like the following:
QUESTIONNAIRE(id, help_choice, type, phase)
QUESTION_CATEG(id,type, name, description)
IMAGE(#qcat_id, filepath)
TEXT(#qcat_id, content)
INCLUDES(#questionnaire_id, #qcat_id)
HELP(#id, #qcat_id, content)
ANSWER(#questionnaire_id, #qcat_id, response, was_modified, help_taken, help_order).
Here help_taken can take special values to denote no help, and help_choice can take special values to denote that all help was shown.
What is bothering me is the different types of questions. I don't really like (and I don't think it will work) the way I have made the distinction between a text type and an image type question for a given question category. Knowing that for a given category (say BIRD) I may have both types (image and text), I have included a 'type' attribute in QUESTION_CATEG. But I feel like I am repeating information.
Do you have any hints as to how this might be fixed, or even ideas for a completely different approach? Any help is welcome.

This seems to work.
Q_CATEG(id, name, order, description, included)
QUESTION(id, q_categ_id, type, content, order)
AVAIL_ANSWER(id, question_id, content, order)
HELP_CATEG(id, order, name, description)
HELP(help_categ_id, q_categ_id, order, content)
QUESTIONNAIRE(id, type, phase, start, end)
GIVEN_ANSWER(questionnaire_id, question_id, answer_id, modified_answer_id, reason_answer_id, help_categ_id, help_order)
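
To make the key change concrete (the question type lives on QUESTION rather than on the category), here is a minimal sketch of two of the relations above using Python's built-in sqlite3 module. The column types, the CHECK constraint, the sample rows and the rename of "order" to sort_order (ORDER is a reserved word in SQL) are my assumptions, not part of the schema above.

```python
import sqlite3

# Only Q_CATEG and QUESTION are shown; the other relations follow the same pattern.
SCHEMA = """
CREATE TABLE q_categ (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,          -- the word, e.g. BIRD or FISH
    sort_order  INTEGER,                -- "order" in the relation above
    description TEXT,
    included    INTEGER DEFAULT 1
);
CREATE TABLE question (
    id          INTEGER PRIMARY KEY,
    q_categ_id  INTEGER NOT NULL REFERENCES q_categ(id),
    type        TEXT NOT NULL CHECK (type IN ('text', 'image')),
    content     TEXT NOT NULL,          -- a sentence, or a file path for an image
    sort_order  INTEGER
);
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def questions_for(conn, word, qtype):
    """Return the contents of all questions of one type for one word."""
    rows = conn.execute(
        "SELECT q.content FROM question q "
        "JOIN q_categ c ON c.id = q.q_categ_id "
        "WHERE c.name = ? AND q.type = ? ORDER BY q.id",
        (word, qtype),
    )
    return [row[0] for row in rows]

conn = open_db()
conn.execute(
    "INSERT INTO q_categ (id, name, description) VALUES (1, 'BIRD', 'An animal with feathers')"
)
conn.executemany(
    "INSERT INTO question (q_categ_id, type, content) VALUES (?, ?, ?)",
    [(1, "text", "It can fly."), (1, "image", "img/sparrow.png")],
)
print(questions_for(conn, "BIRD", "image"))
```

With this layout one category can hold text and image questions side by side, and choosing between them is just a filter on question.type.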

Related

Algorithm sorting details, but without excluding

I have come across a problem.
I’m not asking for help how to construct what I’m searching for, but only to guide me to what I’m looking for! 😊
The thing I want to create is some sort of ‘Sorting Algorithm/Mechanism’.
Example:
Imagine I have a database with over 1000 pictures of different vehicles.
A person sees a vehicle, he now tries to get as much information and details about that vehicle, such as:
Shape
number of wheels
number and shape of windows
number and shape of light(s)
number and shape of exhaust(s)
Etc…
He then gives me all information about that vehicle he saw. BUT! Without telling me anything about:
Make and model.
…
I will now take that information and tell my database to sort every vehicle, so that it arranges all 1000 vehicles by best match, based on the description it has been given.
But it should NOT exclude any vehicle!
So…
If the person tells me that the vehicle only has 4 wheels, but in reality it has 5 (he might not have seen the fifth wheel), it should just get a bad score for the number of wheels.
But if every other aspect matches that vehicle perfectly, it will still get a high score.
That way we don’t exclude the vehicle that he has seen, and we still have a chance to find the correct vehicle.
The whole point of this mechanism is, as said, to narrow things down: instead of looking through 1000 vehicles, we only need to look through the best matches, which is 10 to maybe 50 vehicles out of 1000 (hopefully).
I tried to describe it the best I could in a language that isn’t ‘my father’s tongue’. So bear with me.
Again, I’m not looking for anybody to tell me how to make this algorithm. I’m pretty sure nobody even wants or has the time to do that for me without getting paid somehow...
But I just need to know where to look regarding learning and understanding how to create this mess of a mechanism.
Kind regards
Gent!
Assuming that all your pictures have been indexed with the relevant fields (number of wheels, window shapes...), and given that they are not too numerous (a thousand is peanuts for a computer), you can proceed as follows:
- For every criterion, weight the possible discrepancies (e.g. one wheel too many costs 5, one wheel too few costs 10, a wrong window shape costs 8...). Do this in a coherent way so that the costs of the criteria are well balanced.
- To perform a search, evaluate the total discrepancy cost of every car, and sort the values in increasing order. Report the first ten.
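
The two steps above can be sketched roughly as follows. The Vehicle fields and sample data are made up for illustration, the costs 5, 10 and 8 are the example figures from the list, and I am reading "one wheel too many" as "the candidate has more wheels than were described", which is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    name: str
    n_wheels: int
    window_shape: str

def discrepancy(described, candidate):
    """Total cost of the differences between a description and one candidate."""
    extra = candidate.n_wheels - described.n_wheels
    # Candidate has more wheels than described: 5 per wheel.
    # Candidate has fewer wheels than described: 10 per wheel.
    cost = 5 * extra if extra > 0 else 10 * -extra
    if candidate.window_shape != described.window_shape:
        cost += 8
    return cost

def rank(described, vehicles, top=10):
    """Sort every vehicle by increasing cost; nothing is excluded, only ordered."""
    return sorted(vehicles, key=lambda v: discrepancy(described, v))[:top]

vehicles = [
    Vehicle("C", 3, "round"),
    Vehicle("B", 5, "square"),
    Vehicle("A", 4, "square"),
]
seen = Vehicle("description", 4, "square")
for v in rank(seen, vehicles):
    print(v.name, discrepancy(seen, v))
```

A vehicle with one mismatching field only moves down the list, so the witness missing a fifth wheel does not remove the right vehicle from the results.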
Technically, what you are after is called a "nearest neighbor search" in a high dimensional space. This problem has been well studied. There are fast solutions but they are extremely complex, and in your case are absolutely not worth using.
The default way of doing this, for example in artificial intelligence, is to encode all properties as a vector and apply a weight to each property. The distance can then be calculated using any metric you like. In your case the Manhattan distance should be fine. So in pseudocode:
distance(first_car, second_car):
    return abs(first_car.n_wheels - second_car.n_wheels) * wheels_weight
         + ...
         + abs(first_car.n_windows - second_car.n_windows) * windows_weight
This works fine for simple properties like the number of wheels. For more complex properties like the shape of a window you'll probably need to split it up into multiple attributes depending on your requirements on similarity.
Weights are usually picked in such a way as to normalize all values, if their range is known. Optionally an additional factor can be multiplied to increase the impact of a specific attribute on the overall distance.
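
As a rough sketch of that normalization: divide each attribute's difference by its known range, then optionally multiply by a boost factor. The attribute names, the ranges and the helper names make_weights and manhattan are assumptions for illustration.

```python
def make_weights(ranges, boost=None):
    """Weight = boost / (max - min), so each attribute contributes 0..boost."""
    boost = boost or {}
    return {
        name: boost.get(name, 1.0) / (hi - lo)  # assumes hi > lo
        for name, (lo, hi) in ranges.items()
    }

def manhattan(a, b, weights):
    """Weighted Manhattan distance over the attributes named in weights."""
    return sum(w * abs(a[name] - b[name]) for name, w in weights.items())

# Hypothetical known ranges for two numeric attributes.
RANGES = {"n_wheels": (2, 10), "n_windows": (0, 20)}

seen = {"n_wheels": 4, "n_windows": 6}
candidate = {"n_wheels": 5, "n_windows": 6}

plain = manhattan(seen, candidate, make_weights(RANGES))
boosted = manhattan(seen, candidate, make_weights(RANGES, {"n_wheels": 2.0}))
print(plain, boosted)
```

Without normalization, an attribute with a wide range (say, a length in millimetres) would dominate the distance simply because its numbers are bigger.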

I've got a pipe that consists of 5 pieces, each including 5 properties

Inlet -> front -> middle -> rear -> outlet
Each of those five properties has a value anywhere between 4 and 40. Now I want to find a match where each property, summed over the five pipe pieces, comes out as a full 10 or a 5 (that is, a multiple of 5). There might be hundreds of different pipe pieces, all with different properties.
So if I have all 5 pieces and, when summed, their properties come out as 54,51,23,71,37, that is not good and not what I'm looking for.
Instead 55,50,25,70,40. That would be perfect.
My trouble is that there are so many pieces that it would be insane to do the mixing and matching manually, and new ones come up frequently.
I have manually inserted about 100 of these into SQLite already, but it should be easy to convert into Excel or other database formats, so the answer can relate to anything like MySQL or Google Sheets.
I need the calculation that takes every piece in account and results either in "no match" or tells me the id of each piece that is required for a match and if multiple matches are available, it separates them.
Edit: Even just the math needed to do this kind of calculation would be a lot of help here, not much of a math guy myself. I guess there should be a reference piece i need to use and then that gets checked against every possible scenario.
If the value you want to verify is in A1, round it to the nearest multiple of 5 with: =ROUND(A1/5,0)*5
If the pipes may not be shorter than the given values, round up to the next multiple of 5 instead with: =CEILING(A1,5)
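
The formulas above only round a single sum. For the search itself, a brute-force sketch in Python might look like this. It assumes each piece belongs to exactly one position (inlet, front, middle, rear, outlet) and that a match means every summed property is a multiple of 5; the piece ids and values are invented. The number of combinations is the product of the pieces per position, so this is only practical for modest counts.

```python
from itertools import product

def round_to_5(x):
    """Nearest multiple of 5 for a non-negative integer (like ROUND(x/5,0)*5)."""
    return (x + 2) // 5 * 5

def ceil_to_5(x):
    """Next multiple of 5 at or above x (like CEILING(x,5))."""
    return -(-x // 5) * 5

def find_matches(slots):
    """slots: one list per position, each holding (piece_id, (p1..p5)) tuples.
    Returns every combination whose five property sums are all multiples of 5."""
    found = []
    for combo in product(*slots):
        sums = [sum(vals) for vals in zip(*(props for _, props in combo))]
        if all(s % 5 == 0 for s in sums):
            found.append(([pid for pid, _ in combo], sums))
    return found

base = (11, 10, 5, 14, 8)
slots = [
    [("in1", base), ("in2", (12, 10, 5, 14, 8))],  # inlet candidates
    [("f1", base)],                                # front
    [("m1", base)],                                # middle
    [("r1", base)],                                # rear
    [("o1", base)],                                # outlet
]
matches = find_matches(slots)
print(matches if matches else "no match")
```

Each entry in the result lists the piece ids for one matching combination together with its sums, so multiple matches stay separated.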

Practical differences between [] and null

For a numeric field, null can mean something very different than 0. For example, at a restaurant, someone's "Tip" could be $0, or it could be null, meaning that the bill was sent to the table but the patron has not signed the bill yet.
What would be some practical differences between [] and null? The only differences I can think of are at an integrity or storage level, but I'm having trouble thinking up real-world examples where data might have a different purpose using one vs. the other. What are some real-world examples for this difference?
The use of null for arrays is useful to denote that the information is unknown, as opposed to there being no members of the array.
For example:
In an application collecting information on real estate, one of the fields on a property might be an array of buildings. Null would mean that the person entering the information didn't specify the presence or lack of buildings (i.e. they don't know the information or don't want to say), while an empty array would mean that the property actually has no buildings (i.e. an empty lot).
This is particularly useful when collecting information from incomplete sources.
Think of a paper towel roll. A full paper towel roll is equivalent to some array with data inside. An empty paper towel roll with just the cardboard is []. No paper towel roll at all is null.
Consider the example of files and folders. An empty folder is [], and if it has some files then it is like an array with data inside. If there is no folder, then it is NULL.
Also, I have sometimes seen users get confused between ' ' (space) and null. The two are different: ' ' contains (at least) a space, while null means nothing.
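
To illustrate the distinction in code, here is a small Python sketch using the real estate example, where None plays the role of null. The function name and the JSON payloads are made up.

```python
import json
from typing import Optional

def describe_buildings(buildings: Optional[list]) -> str:
    """Distinguish 'not known' (None) from 'known to be empty' ([])."""
    if buildings is None:
        return "unknown"        # nobody told us either way
    if not buildings:
        return "empty lot"      # we know there are no buildings
    return f"{len(buildings)} building(s)"

# JSON null and [] survive parsing as two different values.
unknown = json.loads('{"buildings": null}')["buildings"]
empty = json.loads('{"buildings": []}')["buildings"]
some = json.loads('{"buildings": ["barn", "house"]}')["buildings"]

for value in (unknown, empty, some):
    print(describe_buildings(value))
```

Note that a plain truthiness check such as `if not buildings` treats None and [] the same, which is exactly how the distinction gets lost by accident.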

Pattern to handle "dynamic enums" that a user could modify (programming language agnostic)

I always face this problem, and every time I face it I find a different solution that doesn't satisfy me. This situation doesn't fit a specific language, but it involves a database.
Imagine a management application where the user (so, not a programmer) can fill in a table which represents the programming skill levels of its employees. Then he can assign those values to employees in the employees table.
So the situation in the database is this one:
Table: ProgrammingSkillLevel
- ID
- Name
- Value
Table: Employees
- ID
- SomeUselessData
- ProgrammingSkillLevelID
In code I would usually do this as an enum (I'll use C#):
enum ProgrammingSkillLevel
{
    Starter = 0,
    Medium,
    Advanced,
}
The advantage of this approach is that we use names instead of values. In fact, if we change the enum into something like this:
enum ProgrammingSkillLevel
{
    Starter = 0,
    Medium,
    Good,
    Advanced,
}
We won't have any problems, because the code refers to the names.
In my database approach I'm using IDs to avoid some byte wastes (I know there is enum type, but the user should fill the table, not me!), not names, so the advantage is definitely lost.
What approach should I use to allow the user to generate what I call "dynamic enums"? Should I use names?
I would like to find a good pattern that will apply to all my projects possibly, because I find this situation very often.
It seems like the question contains the answer.
I didn't find a better way to do this anywhere on the net, so I'll mark this as the answer.
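
One way to read "use names" is sketched below in Python with sqlite3: employees still reference the level by ID, but code never hard-codes an ID or a Value and looks levels up by Name instead. The UNIQUE constraint, the sample rows and the helper names are my assumptions, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProgrammingSkillLevel (
    ID    INTEGER PRIMARY KEY,
    Name  TEXT NOT NULL UNIQUE,   -- the stable handle the code relies on
    Value INTEGER NOT NULL        -- ordering, free to change
);
CREATE TABLE Employees (
    ID INTEGER PRIMARY KEY,
    SomeUselessData TEXT,
    ProgrammingSkillLevelID INTEGER REFERENCES ProgrammingSkillLevel(ID)
);
INSERT INTO ProgrammingSkillLevel (ID, Name, Value) VALUES
    (1, 'Starter', 0), (2, 'Medium', 1), (3, 'Advanced', 2);
INSERT INTO Employees (ID, SomeUselessData, ProgrammingSkillLevelID)
    VALUES (1, 'x', 3);
""")

def level_id(name):
    """Resolve a level by name at run time instead of hard-coding its ID."""
    row = conn.execute(
        "SELECT ID FROM ProgrammingSkillLevel WHERE Name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

def level_of(employee_id):
    """Return (Name, Value) of the employee's current skill level."""
    return conn.execute(
        "SELECT l.Name, l.Value FROM Employees e "
        "JOIN ProgrammingSkillLevel l ON l.ID = e.ProgrammingSkillLevelID "
        "WHERE e.ID = ?",
        (employee_id,),
    ).fetchone()

# The user inserts 'Good' between 'Medium' and 'Advanced'.
conn.execute("UPDATE ProgrammingSkillLevel SET Value = 3 WHERE Name = 'Advanced'")
conn.execute("INSERT INTO ProgrammingSkillLevel (Name, Value) VALUES ('Good', 2)")
print(level_of(1), level_id("Good"))
```

The employee row is untouched by the reordering, which mirrors what happens when a member is inserted into the C# enum and the code only refers to names.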

general database modeling and django specific modeling

I'm wondering what is the best way to model something like the following.
Let's say my company sells metal bars (parameters/fields are: length, profile_type, quantity etc.) of different profiles, where a profile may be pipe(pipe_diameter, wall_thickness) or hollow_rectangle(base, height, wall_thickness), or maybe some other profile with different parameters. Let's say the maximum number of profiles would be 12, each profile having between 2 and 5 parameters.
Should everything be in a single table like
table_bars: (id, length, quantity, profile_type, pipe_diameter, wall_thickness, base, height, etc.), where profile_type would be pipe, rectangle, etc.,
or should every shape have its own table with its own parameters, with table_bars keeping only (id, length, quantity, profile_type, profile_id)?
And are there any Django-specific issues if multiple tables are the best answer?
Thanks
This is a classic issue and comes down to your own judgment. I had a case like this many years ago and went for generalizing the parameters, in one table. Then I made a second table that contained descriptions for the parameters based on the profile, and flags for which were required, optional, or ignored.
Your various parameters might be:
Length: Universal for all, no?
WidthOuter1: might be height of square or rectangle, radius of pipe, etc.
WidthOuter2: ignored for pipe and square, required for rectangle
WidthInner1: ignored for solid objects, required for hollow. Treated just like WidthOuter1, i.e., radius for pipes, dimension of square, first dimension for rectangles
WidthInner2: same ideas as WidthOuter2 and WidthInner1
...And perhaps your other properties would yield to the same treatment.
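
A minimal Python sketch of that second "description" table, held as a dict here for brevity. The meanings given for each column and the validate helper are illustrative assumptions; in Django the same rules could live in a small lookup model or in a model's clean() method.

```python
# Hypothetical description table: for each profile, what each generic
# column means and whether it is required or ignored.
PARAM_RULES = {
    "pipe": {
        "WidthOuter1": ("outer radius", "required"),
        "WidthOuter2": (None, "ignored"),
        "WidthInner1": ("inner radius", "required"),
        "WidthInner2": (None, "ignored"),
    },
    "hollow_rectangle": {
        "WidthOuter1": ("outer height", "required"),
        "WidthOuter2": ("outer base", "required"),
        "WidthInner1": ("inner height", "required"),
        "WidthInner2": ("inner base", "required"),
    },
}

def validate(bar):
    """Return a list of problems for one row of the single bars table."""
    problems = []
    for column, (meaning, flag) in PARAM_RULES[bar["profile_type"]].items():
        value = bar.get(column)
        if flag == "required" and value is None:
            problems.append(f"{column} ({meaning}) is required")
        elif flag == "ignored" and value is not None:
            problems.append(f"{column} is ignored for this profile")
    return problems

good_pipe = {"profile_type": "pipe", "Length": 6000,
             "WidthOuter1": 25, "WidthInner1": 22}
bad_pipe = {"profile_type": "pipe", "Length": 6000, "WidthOuter1": 25}
print(validate(good_pipe), validate(bad_pipe))
```

Adding a new profile then means adding one entry to the rules rather than a new table, at the price of column names that only make sense together with the description table.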
