general database modeling and django specific modeling - django-models

I'm wondering what is the best way to model something like the following.
Let's say my company sells metal bars (fields: length, profile_type, quantity, etc.) of different profiles, where a profile may be pipe(pipe_diameter, wall_thickness) or hollow_rectangle(base, height, wall_thickness), or some other profile with different parameters. Let's say the maximum number of profiles would be 12, each profile having 2-5 parameters.
Should everything be in a single table, like
table_bars(id, length, quantity, profile_type, pipe_diameter, wall_thickness, base, height, etc.), where profile_type would be pipe, rectangle, etc.,
or should every shape have its own table with its own parameters, with table_bars keeping only id, length, quantity, profile_type and profile_id?
And are there any Django-specific issues if multiple tables are the best answer?
Thanks

This is a classic issue and comes down to your own judgment. I had a case like this many years ago and went for generalizing the parameters, in one table. Then I made a second table that contained descriptions for the parameters based on the profile, and flags for which were required, optional, or ignored.
Your various parameters might be:
Length: universal for all, no?
WidthOuter1: might be the height of a square or rectangle, the radius of a pipe, etc.
WidthOuter2: ignored for pipe and square, required for rectangle
WidthInner1: ignored for solid objects, required for hollow ones. Treated just like WidthOuter1, i.e., radius for pipes, dimension of a square, first dimension for rectangles
WidthInner2: same ideas as WidthOuter2 and WidthInner1
...And perhaps your other properties would yield to the same treatment.
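The two-table idea described above could be sketched roughly as follows, using Python's built-in sqlite3 module rather than Django models so that it runs on its own. The table names (bars, profile_params), the column names, the labels and the validate_bar helper are all assumptions made for illustration; they are not taken from the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One generalized table for every bar, whatever its profile (hypothetical names).
CREATE TABLE bars (
    id INTEGER PRIMARY KEY,
    profile_type TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    length REAL NOT NULL,
    width_outer1 REAL,
    width_outer2 REAL,
    width_inner1 REAL,
    width_inner2 REAL
);
-- Second table: what each generic column means for a profile, and whether it is used.
CREATE TABLE profile_params (
    profile_type TEXT NOT NULL,
    column_name TEXT NOT NULL,
    label TEXT,
    usage TEXT NOT NULL CHECK (usage IN ('required', 'optional', 'ignored')),
    PRIMARY KEY (profile_type, column_name)
);
""")

conn.executemany(
    "INSERT INTO profile_params VALUES (?, ?, ?, ?)",
    [
        ("pipe", "width_outer1", "outer radius", "required"),
        ("pipe", "width_outer2", None, "ignored"),
        ("pipe", "width_inner1", "inner radius", "required"),
        ("pipe", "width_inner2", None, "ignored"),
        ("hollow_rectangle", "width_outer1", "outer height", "required"),
        ("hollow_rectangle", "width_outer2", "outer base", "required"),
        ("hollow_rectangle", "width_inner1", "inner height", "required"),
        ("hollow_rectangle", "width_inner2", "inner base", "required"),
    ],
)


def validate_bar(conn, profile_type, values):
    """Return a list of problems for `values` (dict: column name -> number)."""
    problems = []
    rows = conn.execute(
        "SELECT column_name, usage FROM profile_params "
        "WHERE profile_type = ? ORDER BY column_name",
        (profile_type,),
    )
    for column_name, usage in rows:
        present = values.get(column_name) is not None
        if usage == "required" and not present:
            problems.append(f"{column_name} is required for {profile_type}")
        elif usage == "ignored" and present:
            problems.append(f"{column_name} is ignored for {profile_type}")
    return problems


print(validate_bar(conn, "pipe", {"width_outer1": 10.0}))
```

In a Django project the same two tables would presumably become two models, with the validation living in a form or in the model's clean method.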

Related

Algorithm sorting details, but without excluding

I have come across a problem.
I’m not asking for help how to construct what I’m searching for, but only to guide me to what I’m looking for! 😊
The thing I want to create is some sort of ‘Sorting Algorithm/Mechanism’.
Example:
Imagine I have a database with over 1000 pictures of different vehicles.
A person sees a vehicle, he now tries to get as much information and details about that vehicle, such as:
Shape
number of wheels
number and shape of windows
number and shape of light(s)
number and shape of exhaust(s)
Etc…
He then gives me all information about that vehicle he saw. BUT! Without telling me anything about:
Make and model.
…
I will now take that information and tell my database to sort every vehicle, so that all 1000 vehicles are arranged by best match, based on the description it has been given.
But it should NOT exclude any vehicle!
So…
If the person tells me that the vehicle only has 4 wheels, but in reality it has 5 (he might not have seen the fifth wheel) it should just get a bad score in the # of wheels.
But if every other aspect matches that vehicle perfectly, it will still get a high score.
That way we don’t exclude the vehicle that he has seen, and we still have a chance to find the correct vehicle.
The whole point of this mechanism is, as said, to narrow things down: instead of looking through 1000 vehicles, we only need to look through the best matches, which is 10 to maybe 50 vehicles out of 1000 (hopefully).
I tried to describe it the best I could in a language that isn’t ‘my father’s tongue’. So bear with me.
Again, I’m not looking for anybody to tell me how to make this algorithm. I’m pretty sure nobody even wants or has the time to do that for me without getting paid somehow...
But I just need to know where to look regarding learning and understanding how to create this mess of a mechanism.
Kind regards
Gent!
Assuming that all your pictures have been indexed with the relevant fields (number of wheels, window shapes...), and given that they are not too numerous (a thousand is peanuts for a computer), you can proceed as follows:
For every criterion, weight the possible discrepancies (e.g. one wheel too many costs 5, one wheel too few costs 10, a wrong window shape costs 8...). Do this in a coherent way so that the costs of the criteria are well balanced.
To perform a search, evaluate the total discrepancy cost of every car and sort the cars by increasing cost. Report the first ten.
Technically, what you are after is called a "nearest neighbor search" in a high dimensional space. This problem has been well studied. There are fast solutions but they are extremely complex, and in your case are absolutely not worth using.
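As a rough sketch of the brute-force scoring described above, assuming each vehicle is a plain dict: the field names and the direction of the wheel penalty (a candidate with more wheels than described costs 5 per wheel, one with fewer costs 10 per wheel) are one possible reading of the example costs, not something the answer specifies.

```python
def wheel_cost(described, candidate):
    # Assumption: the witness is more likely to miss a wheel than to invent one,
    # so a candidate with extra wheels is penalized less than one with too few.
    diff = candidate - described
    return 5 * diff if diff >= 0 else 10 * -diff


def window_cost(described, candidate):
    # Flat penalty for a window shape that does not match.
    return 0 if described == candidate else 8


def total_cost(description, vehicle):
    return (wheel_cost(description["wheels"], vehicle["wheels"])
            + window_cost(description["window_shape"], vehicle["window_shape"]))


def rank(description, vehicles, top=10):
    # Every vehicle gets a score; nothing is excluded, only ordered.
    return sorted(vehicles, key=lambda v: total_cost(description, v))[:top]


vehicles = [
    {"name": "a", "wheels": 4, "window_shape": "round"},
    {"name": "b", "wheels": 5, "window_shape": "square"},
    {"name": "c", "wheels": 3, "window_shape": "square"},
]
description = {"wheels": 4, "window_shape": "square"}
print([v["name"] for v in rank(description, vehicles)])
```

Here vehicle b, which has a fifth wheel the witness did not report, still ranks first because everything else matches.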
The default way of doing this, for example in artificial intelligence, is to encode all properties as a vector and apply a weight to each property. The distance can then be calculated using any metric you like. In your case the Manhattan distance should be fine. So in pseudocode:
distance(first_car, second_car):
    return abs(first_car.n_wheels - second_car.n_wheels) * wheels_weight + ... +
           abs(first_car.n_windows - second_car.n_windows) * windows_weight
This works fine for simple properties like the number of wheels. For more complex properties like the shape of a window you'll probably need to split it up into multiple attributes depending on your requirements on similarity.
Weights are usually picked in such a way as to normalize all values, if their range is known. Optionally an additional factor can be multiplied to increase the impact of a specific attribute on the overall distance.
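A minimal sketch of that normalization, assuming the range of each attribute is known in advance. The attribute names, the ranges and the importance factor below are made up for the example.

```python
def make_weights(ranges, importance=None):
    """Weight = optional importance factor / width of the attribute's value range."""
    importance = importance or {}
    return {
        name: importance.get(name, 1.0) / (high - low)
        for name, (low, high) in ranges.items()
    }


def distance(first, second, weights):
    """Weighted Manhattan distance over the attributes listed in `weights`."""
    return sum(abs(first[name] - second[name]) * w for name, w in weights.items())


# Hypothetical value ranges for two numeric attributes.
ranges = {"n_wheels": (2, 10), "n_windows": (0, 20)}

plain = make_weights(ranges)
boosted = make_weights(ranges, importance={"n_wheels": 2.0})

seen = {"n_wheels": 4, "n_windows": 6}
candidate = {"n_wheels": 6, "n_windows": 6}

print(distance(seen, candidate, plain))    # two wheels off, normalized by a range of 8
print(distance(seen, candidate, boosted))  # same mismatch, counted twice as heavily
```

With the weights normalized this way, each attribute contributes at most its importance factor to the total, so no attribute dominates just because its raw numbers are larger.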

Unique kind of questionnaire - Database design

For a research experiment I need to design a web application that implements a particular kind of questionnaire, the results of which will serve to derive some statistics and draw some conclusions.
In short, the questionnaire has to work as follows
1-> All answers are in a scale from absolutely false to absolutely correct or an open answer.
2-> Each set of questions corresponds to a given word (for example BIRD or FISH) and a description (a small sentence).
3-> Within one set, all questions can take the form of either a small sentence (text) or an image (to be categorized) and each set may contain an arbitrary number of questions. I need to be able to store the questions that correspond to one word as both text and images and be able to choose between them.
4-> Each set of questions has 5 different kinds of help. In questionnaire type A the user may choose one at will. In questionnaire type B all kinds are shown.
5-> The user has to respond to all the questions once (and they have to appear in a random order). Then, if type A, he has to choose a kind of help or refuse to get help and possibly modify his answer. Or, if type B, see all kinds of help one by one (in a random order) and possibly modify his answer.
6-> For each question, for each type of questionnaire I have to know if an answer was modified, which kind of help caused the user to modify and (if type B) whether this kind of help appeared 1st, 2nd, 3rd etc.
I realize those may not be the most complicated demands, but I am new to this and rather confused. My relations up to now look something like the following
QUESTIONNAIRE(id, help_choice, type, phase)
QUESTION_CATEG(id,type, name, description)
IMAGE(#qcat_id, filepath)
TEXT(#qcat_id, content)
INCLUDES(#questionnaire_id, #qcat_id)
HELP(#id, #qcat_id, content)
ANSWER(#questionnaire_id, #qcat_id, response, was_modified, help_taken, help_order).
With help_taken being able to take special values to denote no-help and help_choice being able to take special values to denote that all help was shown.
What is bothering me is the different types of questions. I don't really like (and I don't think it will work) the way I have made the distinction between a text type and an image type question for a given question category. Knowing that for a given category (say BIRD) I may have both types (image and text), I have included a 'type' attribute in QUESTION_CATEG. But I feel like I am repeating information.
Do you have any hints as to how this might be fixed, or even ideas for a completely different approach? Any help is welcome.
This seems to work.
Q_CATEG(id, name, order, description, included)
QUESTION(id, q_categ_id, type, content, order)
AVAIL_ANSWER(id, question_id, content, order)
HELP_CATEG(id, order, name, description)
HELP(help_categ_id, q_categ_id, order, content)
QUESTIONNAIRE(id, type, phase, start, end)
GIVEN_ANSWER(questionnaire_id, question_id, answer_id, modified_answer_id, reason_answer_id, help_categ_id, help_order)

Is there a way to optimize this Gremlin query?

I have a graph database which looks like this (simplified) diagram:
Each unique ID has many properties, which are represented as edges from the ID to unique values of that property. Basically that means that if two ID nodes have the same email, then their has_email edges will both point to the same node. In the diagram, the two shown IDs share both a first name and a last name.
I'm having difficulty writing an efficient Gremlin query to find matching IDs, for a given set of "matching rules". A matching rule will consist of a set of properties which must all be the same for IDs to be considered to have come from the same person. The query I'm currently using to match people based on their first name, last name, and email looks like:
g.V().match(
    __.as("id").hasId("some_id"),
    __.as("id")
        .out("has_firstName")
        .in("has_firstName")
        .as("firstName"),
    __.as("id")
        .out("has_lastName")
        .in("has_lastName")
        .as("lastName"),
    __.as("id")
        .out("has_email")
        .in("has_email")
        .as("email"),
    where("firstName", eq("lastName")),
    where("firstName", eq("email")),
    where("firstName", neq("id"))
).select("firstName")
The query returns a list of IDs which match the input some_id.
When this query tries to match an ID with a particularly common first name, it becomes very, very slow. I suspect that the match step is the problem, but I've struggled to find an alternative with no luck so far.
The performance of this query will depend on the edge degrees in your graph. Since many people share the same first name, you will most likely have a huge number of edges going into a specific firstName vertex.
You can make assumptions, like: there are fewer people with the same last name than people with the same first name. And of course, there should be even fewer people who share the same email address. With that knowledge you can start the traversal at the vertices with the lowest degree and then filter from there:
g.V().hasId("some_id").as("id").
    out("has_email").in("has_email").where(neq("id")).
    filter(out("has_lastName").where(__.in("has_lastName").as("id"))).
    filter(out("has_firstName").where(__.in("has_firstName").as("id")))
With that, the performance will mostly depend on the vertex with the lowest edge degree.
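The same "rarest property first" idea can be illustrated outside Gremlin with plain Python sets. This is only a sketch of the ordering principle, not of how a graph database executes the traversal. It also measures the degree directly instead of assuming that email is the rarest property, which is a variation on the answer above. The data layout and function names are invented for the example.

```python
from collections import defaultdict


def build_index(people):
    """Map (property, value) -> set of IDs, i.e. the in-edges of each value vertex."""
    index = defaultdict(set)
    for person_id, props in people.items():
        for key, value in props.items():
            index[(key, value)].add(person_id)
    return index


def find_matches(people, index, some_id, rule):
    me = people[some_id]
    # Start from the property whose value vertex has the fewest incoming edges.
    first = min(rule, key=lambda key: len(index[(key, me[key])]))
    candidates = index[(first, me[first])] - {some_id}
    # Only the small candidate set is checked against the remaining properties.
    rest = [key for key in rule if key != first]
    return {c for c in candidates if all(people[c].get(k) == me[k] for k in rest)}


people = {
    1: {"firstName": "Ann", "lastName": "Lee", "email": "a@x"},
    2: {"firstName": "Ann", "lastName": "Lee", "email": "a@x"},
    3: {"firstName": "Ann", "lastName": "Lee", "email": "b@x"},
    4: {"firstName": "Ann", "lastName": "Kim", "email": "a@x"},
}
index = build_index(people)
print(find_matches(people, index, 1, ["firstName", "lastName", "email"]))
```

The common first name is never expanded: it is only used as a cheap equality check on the few candidates that survive the rarer property.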

Graph Database Newbie Q- How to decide on the direction of a relation between 2 nodes

How do you decide the verb direction of a relation?
E.g., I have a Country that falls under a Sub Region, which in turn is under a Region.
Which one would be better, and are there any rules of thumb for deciding the direction?
(Region)-[HAS]->(Sub Region)-[HAS]->(Country)
or
(Region)<-[BELONGS_TO]-(Sub Region)<-[BELONGS_TO]-(Country)
Regards
San
I agree with #InverFalcon that directionality is mostly a subjective decision. However, there may be (at least) one situation in which you might want to use a specific direction, especially if that will make an important use case faster.
This is related to the fact that if you can make a Cypher pattern less specific (without affecting the output), neo4j often has less work to do and your query is faster.
For example, suppose your entire data model consists of 2 node labels and 2 relationship types, like below. (I use my own data model, since I don't know what your use cases are.)
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
In order to find the movies that an actor acted in, your query would have to look something like the following. (Notice that we have to specify the ACTED_IN type, because an outgoing relationship could also be of type DIRECTED. This means that neo4j has to explicitly test every outgoing relationship for its type):
MATCH (:Person {id: 123})-[:ACTED_IN]->(m:Movie)
RETURN m;
However, if your data model replaced the DIRECTED type with a DIRECTED_BY type that had opposite directionality, then it would look like this instead:
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)<-[:DIRECTED_BY]-(:Movie)
With that tweak, your query could be simpler and faster (since neo4j would not have to test relationship types):
MATCH (:Person {id: 123})-->(m:Movie)
RETURN m;
And to be complete, notice that in the above 2 MATCH patterns we could actually remove the :Movie label, since in both data models the ACTED_IN end node will always have the Movie label.

database search best performances

How can I do a search based on combinations of around 50 parameters used as filters?
These filters can be price, color, size, brand, etc.
So we can get different pages based on these params.
One link can have price, brand and size, another one size, brand and color, and so on.
My question is: what is the best practice for querying the database based on these params?
One idea I have is to encode them as a sequence of 1s and 0s, like 101101101, and search by that.
I have more than 2 million possible combinations, and I want to reduce the query time.
I have heard about B-trees but I don't know how to use them. I have given my table columns the proper indexes, but from this point I don't know in which direction I should go, or what my query is going to look like.
I think that it is a good idea to encode the params, but don't do it like "10100010", because then you'll have to store these values as strings.
Rather, store the bit pattern as an integer. That is, binary 100101 = 1*32+0*16+0*8+1*4+0*2+1*1 = 37.
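Of course, the number has to fit in the column type. 50 flags are too many for a 32-bit int, but they do fit in a bigint, which is 8 bytes (64 bits). If you end up with more flags than a bigint can hold, try to logically group the parameters and use 2-3 fields for them.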
The problem with this approach is querying the data: to query by only one parameter rather than all of them, you would have to write a function that extracts a flag from the number.
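Extracting a flag is usually done with a bitwise AND against a mask. A small sketch, assuming one bit position per filter; the filter names and helper functions are hypothetical:

```python
# Hypothetical filter names; each one is assigned a bit position.
FLAGS = ["price", "color", "size", "brand"]
BIT = {name: 1 << position for position, name in enumerate(FLAGS)}


def encode(active):
    """Combine the active filter names into a single integer."""
    mask = 0
    for name in active:
        mask |= BIT[name]
    return mask


def has_flag(mask, name):
    """True if the bit for `name` is set in `mask`."""
    return (mask & BIT[name]) != 0


def has_all(mask, required):
    """True if every bit set in `required` is also set in `mask`."""
    return (mask & required) == required


stored = encode(["price", "size", "brand"])
print(stored, has_flag(stored, "color"), has_all(stored, encode(["price", "brand"])))

# The worked example from the text: binary 100101 is 37 as an integer.
print(int("100101", 2))
```

Databases such as MySQL, PostgreSQL and SQLite also accept the & operator, so the same test can be written in a WHERE clause, for example comparing flags & mask with mask. Keep in mind that a condition of that form generally cannot use an ordinary index on the flags column.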
