Peewee - Optional fields in Model?

We have a few thousand databases, but the number of columns is not consistent across them.
Is it possible to define columns that may or may not appear in the table?
As example:
from peewee import Model, IntegerField

class ContactFields(Model):
    id = IntegerField()
    id_2 = IntegerField()  # missing from some databases
Sometimes id_2 does not exist. However, if I try to create a query, peewee errors out with:
InternalError: (1054, "Unknown column 't1.id_2' in 'field list'")

No, that would be magical as hell. You can try using reflection if you need to dynamically access tables. Or you can just explicitly select only those columns which are present across all databases.
http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#generate_models
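To illustrate the second suggestion (select only the columns that are actually present), here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the MySQL databases in the question. The helper names existing_columns and select_available are hypothetical, and PRAGMA table_info is SQLite-specific; on MySQL you would query information_schema.columns instead. The table name is interpolated into the SQL, so it is assumed to come from trusted code, not user input.

```python
import sqlite3

def existing_columns(conn, table):
    """Return the set of column names the table actually has."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

def select_available(conn, table, wanted):
    """Select only those wanted columns that exist in this database."""
    present = existing_columns(conn, table)
    cols = [c for c in wanted if c in present]
    sql = f"SELECT {', '.join(cols)} FROM {table}"
    return cols, conn.execute(sql).fetchall()

# A database where the optional id_2 column is missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contactfields (id INTEGER)")
conn.execute("INSERT INTO contactfields (id) VALUES (1)")

cols, rows = select_available(conn, "contactfields", ["id", "id_2"])
```

Here cols ends up containing only "id", so the query never mentions the missing column.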

Related

Relational database design tables with common base

I am having a lot of trouble finding the best design solution for this situation. I have two tables with a common base. Currently I have designed it like this: I have an order table (the common base):
[order_table]
order_id
order_type
company
created
I have another table with reference to the order table:
[product_order]
order_id fk
product_id
quantity
price
I have a second table with a reference to the order table:
[special_order]
order_id fk
description
price_estimate
color
size
Both tables share the same order_id, which I like. I often have to run large queries on order_table using the information available in that table, let's say 'company = 200'. But for each result I also need its data from product_order or special_order, depending on which type it is. So the only optimal solution I see is to left join the query with both tables on order_id and filter the information afterwards. The only other option I see is to add the common columns to each table, but then I would have a lot of reorganizing to do afterwards to get them in the correct order.
Is there a better way to organize the data?
So those extra tables are extra attributes to a specific order-id (1:1)?
I'd consider adding all the fields to the common tables, or at least the fields from the most used sub-table.
If that is not appropriate, you may want to add "Type" to the common table and let a trigger manage insert/delete of related records to avoid the fuss with orphans etc.
Use views with your left joins (wouldn't inner be better?) to fetch the different types.
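A minimal sketch of the view suggestion, using Python's built-in sqlite3 module so it runs anywhere. The view names product_orders and special_orders, the column types, and the sample rows are assumptions for illustration. Each view uses an inner join, since it only returns orders of its own type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_table (
    order_id   INTEGER PRIMARY KEY,
    order_type TEXT,
    company    INTEGER,
    created    TEXT
);
CREATE TABLE product_order (
    order_id   INTEGER REFERENCES order_table(order_id),
    product_id INTEGER,
    quantity   INTEGER,
    price      REAL
);
CREATE TABLE special_order (
    order_id       INTEGER REFERENCES order_table(order_id),
    description    TEXT,
    price_estimate REAL,
    color          TEXT,
    size           TEXT
);
-- One view per order type, each joining the common base to its sub-table.
CREATE VIEW product_orders AS
    SELECT o.order_id, o.company, o.created, p.product_id, p.quantity, p.price
    FROM order_table AS o JOIN product_order AS p ON p.order_id = o.order_id;
CREATE VIEW special_orders AS
    SELECT o.order_id, o.company, o.created, s.description, s.price_estimate
    FROM order_table AS o JOIN special_order AS s ON s.order_id = o.order_id;

INSERT INTO order_table VALUES (1, 'product', 200, '2020-01-01');
INSERT INTO order_table VALUES (2, 'special', 200, '2020-01-02');
INSERT INTO product_order VALUES (1, 10, 3, 9.5);
INSERT INTO special_order VALUES (2, 'custom sign', 120.0, 'red', 'L');
""")

# Filter on the common columns through whichever view matches the type.
product_rows = conn.execute(
    "SELECT order_id, product_id, quantity FROM product_orders WHERE company = 200"
).fetchall()
special_rows = conn.execute(
    "SELECT order_id, description FROM special_orders WHERE company = 200"
).fetchall()
```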

Database design and large tables?

Are tables with lots of columns indicative of bad design? For example say I have the following table that stores user information and user settings:
[Users table]
userId
name
address
somesetting1
...
somesetting50
As the site requires more settings the table gets larger. In my mind this table is normalized, all the settings are dependent on the userId.
I have a thing against tables with lots of columns; it just seems wrong to me. But then I remembered that you can select what data to return from the table, so if the table is large I could still break it into several different objects in code. For example:
[User object]
[UserSetting object]
and return only the data to fill those objects.
Is the above common practice, or are there other techniques for dealing with tables with lots of columns that are more suitable?
I think you should use multiple tables like this:
[Users table]
userId
name
address
[Settings table]
settingId
userId
settingKey
settingValue
The tables are related by the userId column which you can use to retrieve the settings for the user you need to.
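A minimal sketch of this two-table layout, using Python's built-in sqlite3 module. The UNIQUE constraint (one value per key per user), the load_settings helper, and the sample data are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE settings (
    settingId    INTEGER PRIMARY KEY,
    userId       INTEGER REFERENCES users(userId),
    settingKey   TEXT,
    settingValue TEXT,
    UNIQUE (userId, settingKey)  -- assumption: one value per key per user
);
INSERT INTO users VALUES (1, 'Ann', '1 Main St');
INSERT INTO settings (userId, settingKey, settingValue) VALUES
    (1, 'theme', 'dark'), (1, 'language', 'en');
""")

def load_settings(conn, user_id):
    """Return all settings for one user as a {key: value} dict."""
    rows = conn.execute(
        "SELECT settingKey, settingValue FROM settings WHERE userId = ?",
        (user_id,),
    )
    return dict(rows)

ann = load_settings(conn, 1)
```

A user with no rows in the settings table simply gets an empty dict, so new settings never require a schema change.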
I would say that it is bad table design. If a user doesn't have an entry for 47 of those 50 settings, then you will have a large number of NULLs in the table, which isn't good practice and will also slow down performance (NULLs have to be handled in a special way).
Instead, have the following:
USER TABLE
Id,
FirstName
LastName
etc
SETTINGS
Id,
SettingName
USER SETTINGS
Id,
SettingId,
UserId,
SettingValue
You then have a many-to-many join and eliminate the NULLs.
First, don't put spaces in table names! All the [braces] will be a real pain!
If you have 50 columns, how meaningful will all that data be for each user? Will there be lots of nulls? Most data may not even apply to any given user. Think 1-to-1 tables, where you break down the "settings" into logical groups:
Users: --main table where most values will be stored
userId
name
address
somesetting1 ---please note that I'm using "somesetting1", don't
... --- name the columns like this, use meaningful names!!
somesetting5
UserWidgets --all widget settings for the user
userId
somesetting6
....
somesetting12
UserAccounting --all accounting settings for the user
userId
somesetting13
....
somesetting23
--etc..
You only need to have a Users row for each user, and then a row in each table where that data applies to the given user. If a user doesn't have any widget settings, then there is no row for that user. You can LEFT JOIN each table as necessary to get all the settings as needed. Usually you only need to work on a subset of settings based on which part of the application is running, which means you won't need to join in all of the tables, just the one or two that you need at that time.
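As a rough sketch of the 1-to-1 grouping, here is the Users / UserWidgets pair in Python's built-in sqlite3 module. The sidebarWidth column and the sample users are hypothetical. The LEFT JOIN keeps users who have no row in the group table and returns NULL (None in Python) for their settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (userId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE UserWidgets (
    userId       INTEGER PRIMARY KEY REFERENCES Users(userId),
    sidebarWidth INTEGER   -- hypothetical widget setting
);
INSERT INTO Users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO UserWidgets VALUES (1, 240);  -- Bob has no widget settings
""")

# Join only the settings group this part of the application needs.
rows = conn.execute("""
    SELECT u.name, w.sidebarWidth
    FROM Users AS u
    LEFT JOIN UserWidgets AS w ON w.userId = u.userId
    ORDER BY u.userId
""").fetchall()
```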
You could consider an attributes table. As long as your indexes are good, then you wouldn't have too much of a performance issue:
[AttributeDef]
AttributeDefId int (primary key)
GroupKey varchar(50)
ItemKey varchar(50)
...
[AttributeVal]
AttributeValId int (primary key)
AttributeDefId int (FK -> AttributeDef.AttributeDefId)
UserId int (probably FK to users table?)
Val varchar(255)
...
Basically you're "pivoting" your table with many columns into 2 tables with fewer columns. You can write views and table functions around this structure to give you data for a group of related items or just a specific item, etc. You could also add other things to the attribute definition table to indicate required data elements, restrictions on the data elements, etc.
What's your thought on this type of design?
Use several tables with matching indexes to get the best SELECT speed. Use the indexes as a way to relate the information between tables using a JOIN.

Merge data object table with associated attributes table in a view

Here's the setup: I have several tables that hold information for data objects which have the potential to have various and sundry bits of data associated with them. Each of these tables has an associated attributes table, which holds 3 bits of information:
the id (integer) of the row the attribute is associated with
a short attribute name ( < 50 chars )
a value (varchar)
The object table will have any number of columns of varying data types, but will always have an integer primary key. If possible, I would like to set up a view that will allow me to select a row from the object table, and all of its associated attributes at one go.
****EDIT****
Ideally, the form I'd like this to take is having columns in the view with the names of the matched attribute from the attributes table, and the value as the value of the attribute.
So for example, if I have table Foo with columns 'Bar', 'Bat', and 'Baz' the view would have those columns, and additionally, columns for any attributes that a row might have.
****END EDIT****
Now, I know (or think I do) that SQL doesn't allow using variables as an alias for a column name. Is there a clean, practical way of doing what I want, or am I chasing a pipe dream?
The obvious solution is to handle all of this in the application code, but I'm curious if it can be done in SQL.
The answer depends on what you are actually seeking. Will the output of the view have one row per attribute per object or one column per attribute per object? If the former, then I'm not sure why you need a view:
Select ...
From ObjectTable
Join AttributeTable
On AttributeTable.Id = ObjectTable.Id
However, I suspect what you want is the latter, something like:
Select ...
, ... As Attribute1
, ... As Attribute2
, ... As Attribute3
...
From ObjectTable
In this scenario, the columns that would be generated are not known at execution because the attribute names are dynamic. This is commonly known as a dynamic crosstab. In general, the SQL language is not designed for dynamic column generation. The only way to do this in T-SQL is to use some fugly dynamic SQL. Thus, it is better done in a reporting tool or in middle-tier code.
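One way the middle-tier approach could look, sketched with Python's built-in sqlite3 module. The table names Foo and FooAttributes, the column names, and the fetch_with_attributes helper are assumptions for illustration. Instead of generating dynamic SQL, the code runs two fixed queries and merges the attribute rows into each object in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Foo (Id INTEGER PRIMARY KEY, Bar TEXT);
CREATE TABLE FooAttributes (Id INTEGER, Name TEXT, Value TEXT);
INSERT INTO Foo VALUES (1, 'first'), (2, 'second');
INSERT INTO FooAttributes VALUES
    (1, 'color', 'red'), (1, 'size', 'L'), (2, 'color', 'blue');
""")

def fetch_with_attributes(conn):
    """Return one dict per Foo row, with its attributes merged in as keys."""
    objects = {}
    for obj_id, bar in conn.execute("SELECT Id, Bar FROM Foo ORDER BY Id"):
        objects[obj_id] = {"Id": obj_id, "Bar": bar}
    for obj_id, name, value in conn.execute(
        "SELECT Id, Name, Value FROM FooAttributes"
    ):
        # Skip attribute rows that point at no existing object.
        if obj_id in objects:
            objects[obj_id][name] = value
    return list(objects.values())

result = fetch_with_attributes(conn)
```

Each row ends up with only the attribute keys it actually has, which sidesteps the need for a fixed column list.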
It sounds like you want a view for each of your 'object' tables as well as its 'attributes' table. Correct me if I am wrong in my reading. It's not clear what your intentions are with 'using variables as an alias for a column name'. Were you hoping to merge ALL your objects, with their different columns, into one view?
I suggest creating one view per entity table, joined to its relevant 'attributes' table.
Question though - why is there one matching attributes table for each entity table? Why are they split out? Perhaps you've made the question simpler or obfuscated, so perhaps my question is rhetorical.
CREATE VIEW Foo AS
SELECT O.ID
,O.EverythingElse
,A.ShortName
,A.SomeVarcharValue
FROM
ObjectTable AS O --customer, invoice, whathaveyou
INNER JOIN
ObjectAttribute AS A ON A.ObjectID = O.ID
To consume from this, you could:
SELECT * FROM Foo WHERE ID = 4
or
SELECT * FROM Foo WHERE ShortName = 'Ender'

Database design - do I need one or two database fields for this?

I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track if any application uses a database or a bug tracking system so right now I have fields in the Applications table called
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases:
id
name
Table: BugTracking:
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of Application that use bug tracking" (although I guess either approach could generate this data.)
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get those statistics like this:
select
count(*) as TotalApplications,
count(Database_ID) as UsesDatabase,
count(BugTracking_ID) as UsesBugTracking
from
Applications
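The query works because count(*) counts every row while count(column) skips NULLs. A small sketch with Python's built-in sqlite3 module and made-up sample rows shows how the percentage from the question's note could be derived from it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Applications (
    id             INTEGER PRIMARY KEY,
    Database_ID    INTEGER,  -- NULL means the application uses no database
    BugTracking_ID INTEGER   -- NULL means the application uses no bug tracker
);
INSERT INTO Applications (Database_ID, BugTracking_ID) VALUES
    (1, NULL), (2, 1), (NULL, NULL), (1, 1);
""")

# count(*) counts all rows; count(col) counts only rows where col is not NULL.
total, uses_db, uses_bt = conn.execute("""
    SELECT count(*), count(Database_ID), count(BugTracking_ID)
    FROM Applications
""").fetchone()

percent_bug_tracking = 100.0 * uses_bt / total
```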
Why not get rid of the two Use fields and simply let a NULL value in the _ID fields indicate that the record does not use that application (bug tracking or database)?
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 database (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
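A rough sketch of the join-table variant, using Python's built-in sqlite3 module. Only the ApplicationDatabases side is shown; the plural table names follow the question, and the composite primary key and sample rows are assumptions. An application with no rows in the join table simply counts as using zero databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Applications (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Databases (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ApplicationDatabases (
    application_id INTEGER REFERENCES Applications(id),
    database_id    INTEGER REFERENCES Databases(id),
    PRIMARY KEY (application_id, database_id)
);
INSERT INTO Applications VALUES (1, 'billing'), (2, 'wiki'), (3, 'intranet');
INSERT INTO Databases VALUES (1, 'orders_db'), (2, 'audit_db');
INSERT INTO ApplicationDatabases VALUES (1, 1), (1, 2), (2, 1);
""")

# LEFT JOIN keeps applications with no databases; count() skips their NULLs.
rows = conn.execute("""
    SELECT a.name, count(ad.database_id)
    FROM Applications AS a
    LEFT JOIN ApplicationDatabases AS ad ON ad.application_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
```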
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though it might be considered evil by database people ^^) is to default them to 0 and add in an ID 0 data row in both bugtrack and database tables with a name of "None"... when you do the reports, you'll have to do some more work unless you present the "None" values as they are as well with a neat percentage...
To answer the edited question-
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).

How to design a database for an unknown amount of 'meta'-data

I want to store certain items in the database with a variable number of properties.
For example:
An item can have 'url' and 'pdf' properties, but others do not and instead have 'image' and 'location' properties.
So the problem is that some items can have only a few properties and others a lot.
How would you design this database. How to make it searchable and performant?
What would the schema look like?
Thanks!
What you are after has a name - Entity Attribute Value (EAV). It is "a data model that is used in circumstances where the number of attributes (properties, parameters) that can be used to describe a thing (an "entity" or "object") is potentially very vast, but the number that will actually apply to a given entity is relatively modest."
If you are not necessarily tied to SQL, a triple store is designed for precisely this task. Most are designed to be queried with the SPARQL query language.
That sounds like a perfect job for a document database.
Start with your object (item) and create a table for items. Your item can have 1 or many attributes or none at all right? So set up a table of attributes with unique ids. Now set up a table that holds many items (some can duplicate) and many attributes (can duplicate as well)
Item
ItemID
ItemDescription
...
Attributes
AttributeID
AttributeDescription
...
ItemAttributes
rowID
ItemID
AttributeID
Now when you want to query you can simply join the tables and filter however you desire...
The Entity Attribute Value (EAV) model is very flexible. The semantic web and its query language SPARQL are based on EAV too. But some people don't like it because there is a performance penalty with this model.
Start by doing some high-load performance tests on your database. Don't do them when you are done coding, because then it is too late.
Edit: Focus on the speed of your select statements. Users expect quick results when they search.
I have designed tables like this in the past to have the following fields:
id
type
subtype
value
And then I would have another table that would define the types and subtypes used, and possibly give the datatype for each type and subtype combination so that you could programmatically enforce it.
It's not pretty, and you don't want to do it unless you have to. But it's the best way I have found when you do.
Update: even if you leave subtype blank, I find it's a good thing to have, because too often you want to subcategorize something that already exists. For example, you create the type "address", and now you need mailing address, billing address and physical address.
For these kinds of scenarios I use the XML-type column in MS SQL 2005...
You'll have all the advantages of XML + SQL. That is, you can use an XPath expression as part of an SQL statement.
It's a feature of MS SQL 2005; I am not sure which other RDBMSs support this.
I am not sure what the implications are performance-wise.
Create a properties table with the following fields:
item_id int(or whatever the ID type is in the item table)
property_name varchar(500)
property_value varchar(500)
Set a foreign key between item_id and the item's id field, and you're done.
That's how you do a many-to-one relationship in SQL.
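A minimal sketch of that properties table, using Python's built-in sqlite3 module. The items table's title column, the index, the items_with_property helper, and the sample data are assumptions for illustration. The index on (property_name, property_value) is one plausible way to keep searches by property reasonably fast, not a measured recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE properties (
    item_id        INTEGER NOT NULL REFERENCES items(item_id),
    property_name  VARCHAR(500),
    property_value VARCHAR(500)
);
CREATE INDEX idx_properties_lookup
    ON properties (property_name, property_value);
INSERT INTO items VALUES (1, 'manual'), (2, 'photo');
INSERT INTO properties VALUES
    (1, 'url', 'http://example.com/manual'), (1, 'pdf', 'manual.pdf'),
    (2, 'image', 'photo.png'), (2, 'location', 'Paris');
""")

def items_with_property(conn, name):
    """Return the titles of all items that have the named property."""
    rows = conn.execute(
        "SELECT i.title FROM items AS i "
        "JOIN properties AS p ON p.item_id = i.item_id "
        "WHERE p.property_name = ? ORDER BY i.item_id",
        (name,),
    )
    return [title for (title,) in rows]

with_pdf = items_with_property(conn, "pdf")
```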
Looks like an "items" table with primary key "item_id", a "properties" table with primary key "property_id" and a foreign key "item_id" with the "items" table. "properties" will have columns "name" and "value", both of type varchar.
Performant? Don't know.
