How can I specify the type of a column in a python behave step data table?

Assume I have a step defined as follows:
Then I would expect to see the following distribution for Ford
| engine | doors | color |
| 2.1L | 4 | red |
And I have the step implementation that reads the table and does the assert as follows:
from behave import then

@then('I would expect to see the following distribution for {car_type}')
def step(context, car_type):
    # find_car_method and assertEqual are helpers assumed to exist elsewhere
    # in the test code.
    car = find_car_method(car_type)
    for row in context.table:
        for heading in row.headings:
            assertEqual(getattr(car, heading),
                        row[heading],
                        "%s does not match. " % heading +
                        "Found %s" % getattr(car, heading))
(I do it this way because it allows more fields to be added while keeping the step generic enough to be reused for many checks of the car's attributes.)
When my car object has 4 doors (as an int), it does not match because the data table supplies the doors value as '4' (a unicode str).
I could implement this method to check the name of the column and handle it differently for different fields, but then maintenance becomes harder when adding a new field as there is one more place to add it. I would prefer specifying it in the step data table instead. Something like:
Then I would expect to see the following distribution for Ford
| engine | doors:int | color |
| 2.1L | 4 | red |
Is there something similar that I can use to achieve this (as this does not work)?
Note that I also have cases where I need to create the car from the data table, where I have the same problem. This makes it useless to try to use the type of the existing 'car' attribute to determine the target type, as it is None in that case.
Thank you,
Baire

After some digging I was not able to find anything, so I decided to implement my own solution. I am posting it here as it might help someone in the future.
I created a helper method:
import datetime
import re

def convert_to_type(full_field_name, value):
    """ Converts the value from a behave table into its correct type based on the name
    of the column (header). If it is wrapped in a convert method, then use it to
    determine the value type the column should contain.
    Returns: a tuple with the newly converted value and the name of the field (without the
        conversion method specified). E.g. int(size) will return size as the new field
        name and the value will be converted to an int and returned.
    """
    field_name = full_field_name.strip()
    matchers = [(re.compile(r'int\((.*)\)'), lambda val: int(val)),
                (re.compile(r'float\((.*)\)'), lambda val: float(val)),
                (re.compile(r'date\((.*)\)'),
                 lambda val: datetime.datetime.strptime(val, '%Y-%m-%d'))]
    for (matcher, func) in matchers:
        matched = matcher.match(field_name)
        if matched:
            return (func(value), matched.group(1))
    return (value, full_field_name)
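As a quick standalone check of the helper (illustrative only, not part of the original step code):
print(convert_to_type('int(doors)', '4'))             # -> (4, 'doors')
print(convert_to_type('float(engine_size)', '2.1'))   # -> (2.1, 'engine_size')
print(convert_to_type('color', 'red'))                # -> ('red', 'color')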
I could then set up my scenario as follows:
Then I would expect to see the following distribution for Ford
| engine | int(doors) | color |
| 2.1L | 4 | red |
I then changed my step as follows:
from behave import then

@then('I would expect to see the following distribution for {car_type}')
def step(context, car_type):
    car = find_car_method(car_type)
    for row in context.table:
        for heading in row.headings:
            (value, field_name) = convert_to_type(heading, row[heading])
            assertEqual(getattr(car, field_name),
                        value,
                        "%s does not match. " % field_name +
                        "Found %s" % getattr(car, field_name))
It should be easy to move 'matchers' to the module level, as they then don't have to be re-created each time the method is called. It is also easy to extend with more conversion methods (e.g. liter() and cc() for parsing engine size while also converting to a standard unit), as sketched below.
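As a rough sketch of that extension (the liter() and cc() converters, and the choice of litres as the standard unit, are my own assumptions rather than part of the original solution):
import datetime
import re

# Compiled once at import time instead of on every call.
_MATCHERS = [
    (re.compile(r'int\((.*)\)'), int),
    (re.compile(r'float\((.*)\)'), float),
    (re.compile(r'date\((.*)\)'),
     lambda val: datetime.datetime.strptime(val, '%Y-%m-%d')),
    (re.compile(r'liter\((.*)\)'), lambda val: float(val)),         # '2.1'  -> 2.1 litres
    (re.compile(r'cc\((.*)\)'), lambda val: float(val) / 1000.0),   # '2100' -> 2.1 litres
]

def convert_to_type(full_field_name, value):
    field_name = full_field_name.strip()
    for matcher, func in _MATCHERS:
        matched = matcher.match(field_name)
        if matched:
            return (func(value), matched.group(1))
    return (value, full_field_name)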

Related

'nested fields' in MS Access

I have a primary field called 'Unit', and I need to specify how many objects (exhibiting great variability) were found within them. There are 116 types of these objects, which can be broken down into 6 categories. Each category dictates what attributes may be used to describe each object type. For every unit, I need to record how many types of objects are found within it, and document how many of them exhibit each attribute. I sketched out examples of the schema and how I need to apply it. Perhaps the simplest solution would be to create a table for each type, and relate them to the table containing the unit list, but then there will be so many tables (is there a limit in MS Access?). Otherwise, is it possible to create 'nested fields' in access? I just made that term up, but it seems to describe what I'm looking to do.
Category 1 (attribs: a, b, c, d)
    Type 1
    Type 2
Category 2 (attribs: x, y, z)
    Type 3
    Type 4

Unit
    Type 1 | a | b | c | d |
    Type 2 | a | b | c | d |
    Type 3 | x | y | z |
    Type 4 | x | y | z |
UPDATE:
To clarify, I essentially need to create subtables for each field of my primary table. Each field has sub-attributes, and I need to be able to specify the distribution of objects at this finer-grained resolution.
What you want is similar to this question on SO: Database design - multi category products with properties.
So you'll have a third table which relates each attribute value to the Unit. And to control which attributes each Category can have, a fourth table is required which specifies the attribute names for each category.
Unit:
    .id
    .categoryid
Category:
    .id
    .cat_name
Category_Attributes:
    .attribID
    .categoryid
    .attribute_name
Unit_Attributes:
    .unitid
    .attribID
    .attrib_value
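To make the four-table layout concrete, here is a minimal sketch; it uses Python's sqlite3 purely for illustration (the question is about MS Access, and everything beyond the table and column names listed above is my own assumption):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (
        id             INTEGER PRIMARY KEY,
        cat_name       TEXT NOT NULL
    );
    CREATE TABLE Unit (
        id             INTEGER PRIMARY KEY,
        categoryid     INTEGER NOT NULL REFERENCES Category(id)
    );
    CREATE TABLE Category_Attributes (
        attribID       INTEGER PRIMARY KEY,
        categoryid     INTEGER NOT NULL REFERENCES Category(id),
        attribute_name TEXT NOT NULL
    );
    CREATE TABLE Unit_Attributes (
        unitid         INTEGER NOT NULL REFERENCES Unit(id),
        attribID       INTEGER NOT NULL REFERENCES Category_Attributes(attribID),
        attrib_value   TEXT,
        PRIMARY KEY (unitid, attribID)
    );
""")

# Example query: the attribute names defined by a unit's category,
# alongside the values recorded for that unit.
rows = conn.execute("""
    SELECT ca.attribute_name, ua.attrib_value
    FROM Unit_Attributes AS ua
    JOIN Category_Attributes AS ca ON ca.attribID = ua.attribID
    WHERE ua.unitid = ?
""", (1,)).fetchall()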

Can anyone suggest a method of versioning ATTRIBUTE (rather than OBJECT) data in DB

I'm taking MySQL as an example DB to perform this in (although I'm not restricted to relational flavours at this stage), and Java-style syntax for model/DB interaction.
I'd like the ability to allow versioning of individual column values (and their corresponding types) as and when users edit objects. This is primarily in an attempt to drop the amount of storage required for frequent edits of complex objects.
A simple example might be
- Food (Table)
- id (INT)
- name (VARCHAR(255))
- weight (DECIMAL)
So we could insert an object into the database that looks like...
Food banana = new Food("Banana",0.3);
giving us
+----+--------+--------+
| id | name   | weight |
+----+--------+--------+
| 1  | Banana | 0.3    |
+----+--------+--------+
if we then want to update the weight we might use
banana.weight = 0.4;
banana.save();
+----+--------+--------+
| id | name   | weight |
+----+--------+--------+
| 1  | Banana | 0.4    |
+----+--------+--------+
Obviously though this is going to overwrite the data.
I could add a revision column to this table, which could be incremented as items are saved, and set a composite key that combines id/revision, but this would still mean storing ALL attributes of the object for every single revision:
- Food (Table)
- id (INT)
- name (VARCHAR(255))
- weight (DECIMAL)
- revision (INT)
+----+--------+--------+----------+
| id | name   | weight | revision |
+----+--------+--------+----------+
| 1  | Banana | 0.3    | 1        |
| 1  | Banana | 0.4    | 2        |
+----+--------+--------+----------+
But in this instance we're going to be storing every single piece of data about every single item. This isn't massively efficient if users are making minor revisions to larger objects where Text fields or even BLOB data may be part of the object.
What I'd really like would be the ability to selectively store data discretely, so the weight could possibly be saved in a separate DB in its own right, one that would be able to reference the table, row and column that it relates to.
This could then be smashed together with a VIEW of the table, that could sort of impose any later revisions of individual column data into the mix to create the latest version, but without the need to store ALL data for each small revision.
+----+--------+--------+
| id | name   | weight |
+----+--------+--------+
| 1  | Banana | 0.3    |
+----+--------+--------+

+-----+------------+-------------+-----------+-----------+----------+
| ID  | TABLE_NAME | COLUMN_NAME | OBJECT_ID | BLOB_DATA | REVISION |
+-----+------------+-------------+-----------+-----------+----------+
| 456 | Food       | weight      | 1         | 0.4       | 2        |
+-----+------------+-------------+-----------+-----------+----------+
Not sure how successful storing any data as blob to then CAST back to original DTYPE might be, but thought since I was inventing functionality here, why not go nuts.
This method of storage would also be fairly dangerous, since table and column names are entirely subject to change, but hopefully this at least outlines the sort of behaviour I'm thinking of.
A table in 6NF has one CK (candidate key) (in SQL, a PK) and at most one other column. Essentially, 6NF allows the update time/version and value of each pre-6NF table column to be recorded in an anomaly-free way. You decompose a table by dropping a non-prime column and adding a table holding that column plus the old CK's columns. For temporal/versioning applications you further add a time/version column, and the new CK is the old one plus that column.
Adding a column for a time (or other) interval (in SQL, start time and end time columns) instead of a single time to a CK allows a kind of data compression, by recording the longest uninterrupted stretches of time (or of another dimension) through which a column had the same value. One queries by an original CK plus the time whose value you want. You don't need this for your purposes, but the initial process of normalizing to 6NF and the addition of a time/whatever column should be explained in temporal tutorials.
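As a rough sketch of that decomposition applied to the Food example above (sqlite3 is used only so the example is self-contained, and the per-column revision tables and their names are my own choices, not a prescription):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 6NF-style decomposition: one non-key column per table,
    -- keyed by the original id plus a revision number.
    CREATE TABLE Food (
        id INTEGER PRIMARY KEY
    );
    CREATE TABLE Food_name (
        id       INTEGER NOT NULL REFERENCES Food(id),
        revision INTEGER NOT NULL,
        name     TEXT NOT NULL,
        PRIMARY KEY (id, revision)
    );
    CREATE TABLE Food_weight (
        id       INTEGER NOT NULL REFERENCES Food(id),
        revision INTEGER NOT NULL,
        weight   REAL NOT NULL,
        PRIMARY KEY (id, revision)
    );
""")

# Editing only the weight stores one small row; the name is not re-stored.
conn.execute("INSERT INTO Food (id) VALUES (1)")
conn.execute("INSERT INTO Food_name (id, revision, name) VALUES (1, 1, 'Banana')")
conn.execute("INSERT INTO Food_weight (id, revision, weight) VALUES (1, 1, 0.3)")
conn.execute("INSERT INTO Food_weight (id, revision, weight) VALUES (1, 2, 0.4)")

# The latest version of each attribute, stitched back together (the sort of
# thing the VIEW mentioned in the question would do).
latest = conn.execute("""
    SELECT f.id,
           (SELECT n.name   FROM Food_name n   WHERE n.id = f.id ORDER BY n.revision DESC LIMIT 1),
           (SELECT w.weight FROM Food_weight w WHERE w.id = f.id ORDER BY w.revision DESC LIMIT 1)
    FROM Food AS f
""").fetchall()
print(latest)   # [(1, 'Banana', 0.4)]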
Read about temporal databases (which deal both with "valid" times, i.e. the times and time intervals during which data holds, and with "transaction" times/versions of database updates) and about 6NF and its role in them. (Snodgrass/TSQL2 is bad, Date/Darwen/Lorentzos is good, and SQL is problematic.)
Your final suggested table is an example of EAV. This is usually an anti-pattern. It encodes a database into one or more tables that are effectively metadata. But since the DBMS doesn't know that, you lose much of its functionality. EAV is not called for if DDL is sufficient to manage tables with the columns that you need. Just declare appropriate tables in each database. Which is really one database, since you expect transactions affecting both. From that link:
You are using a DBMS anti-pattern EAV. You are (trying to) build part of a DBMS into your program + database. The DBMS already exists to manage data and metadata. Use it.
Do not have a class/table of metadata. Just have attributes of movies be fields/columns of Movies.
The notion that one needs to use EAV "so every entity type can be extended with custom fields" is mistaken. Just implement via calls that update metadata tables sometimes, instead of just updating regular tables: DDL instead of DML.
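A minimal sketch of the "DDL instead of DML" advice (sqlite3 for illustration; the Movies table and the added column are invented for the example):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (id INTEGER PRIMARY KEY, title TEXT)")

# EAV would record a new "custom field" as rows in a generic attribute table (DML).
# The advice above is to extend the schema itself instead (DDL):
conn.execute("ALTER TABLE Movies ADD COLUMN running_time_minutes INTEGER")

# The DBMS now knows the column's name and type and can enforce and index it.
conn.execute("INSERT INTO Movies (title, running_time_minutes) VALUES ('Example', 120)")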

Can a normalized database (3rd normal form) have loops in it?

I have got a rather complex relationship between several entities:
TeacherTable
|
TeacherClassLinkTable
|
ClassTable
|
StudentClassLinkTable
|
StudentTable
|
StudentTestResults
|
TestTable
|
TestModuleTable
This works for most things that I need to do with it, but it fails when I try to find what modules are taken by a class. I am able to find out what modules have been taken by Students that are part of a class, but Students can belong to multiple classes taking different modules in rare cases. So I would not necessarily get an accurate answer to which modules are taken by a class. I therefore want to insert a new table, ClassModuleLinkTable. This would allow me to make that link easily; however, it would form a loop in my database structure and I'm not sure whether my database would therefore remain in 3rd normal form.
TeacherTable
|
TeacherClassLinkTable
|
ClassTable---------------------
|                             |
StudentClassLinkTable         |
|                             |
StudentTable                  |
|                             |
StudentTestResults            |
|                             |
TestTable                     |
|                             |
TestModuleTable--------------ClassModuleLinkTable
I don't think that this is a problem, and I don't actually think it's what I would call a loop or circular reference.
A circular reference is where e.g. table A has a non-nullable FK to table B, which has a non-nullable FK to table A (or the circle could be A to B to C to D to A). If both tables are empty you cannot actually add a row to either of them, as both require a reference to a row in the other. I'm not actually sure that this situation is against 3NF, but it's plainly a problem!
Your situation does not have a circular reference and so as far as I'm concerned it's fine.
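For illustration, here is a minimal sketch (not the asker's schema) of the circular-reference problem described above, using sqlite3 with two invented tables that hold non-nullable FKs to each other:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Each table requires a reference to a row in the other.
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER NOT NULL REFERENCES b(id))")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES a(id))")

try:
    # With both tables empty, neither row can be inserted first.
    conn.execute("INSERT INTO a (id, b_id) VALUES (1, 1)")
except sqlite3.IntegrityError as exc:
    print("circular reference bites:", exc)   # FOREIGN KEY constraint failed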

Database Design: how to store translated numbers?

This is a general DB design question. Assume the following table:
======================================================================
| product_translation_id | language_id | product_id | name   | price |
======================================================================
| 1                      | 1           | 1          | foobar | 29.99 |
----------------------------------------------------------------------
| 2                      | 2           | 1          | !##$%^ | &*()_ |
----------------------------------------------------------------------
(Assume that language_id = 2 is some language that is not based on Latin characters, etc.)
Is it right for me to store the translated price in the DB? While it allows me to display translations properly, I am concerned it will give me problems when I want to do mathematical operations on them (e.g. add a 10% sales tax to &*()_).
What's a good approach to handling numerical translations?
If you can programmatically convert "29.99" to "&*()_" then I'd put the price in the product table and leave the translation of it to the display layer. If you store it twice then you will have two obvious problems:
You will end up with consistency problems because you're storing the same thing in two different places in two different formats.
You will be storing numeric data in text format.
The first issue will cause you a lot of headaches when you need to update your prices, and your accountants will hate you for making a mess of the books.
The second issue will make your database hate you whenever you need to do any computations or comparisons inside the database. Calling CONVERT(string AS DECIMAL) over and over again will have a cost.
You could keep the price in numeric form in the product table (for computation, sorting, etc.) and then have the localized translation in your translation table as a string. This approach just magnifies the two issues above, though. However, if you need to have humans translating your numbers then this approach might be necessary. If you're stuck with this then you can mitigate your consistency problems by running a sanity checker of some sort after each update; you might even be able to wrap the sanity checker in a trigger of some sort.
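As an illustration of the display-layer approach recommended above (a sketch only; it assumes the third-party Babel library, which the original answer does not mention):
from decimal import Decimal
from babel.numbers import format_currency, format_decimal

price = Decimal("29.99")   # stored once, numerically, in the product table

# The same stored value rendered per locale at display time:
print(format_currency(price, "USD", locale="en_US"))   # $29.99
print(format_decimal(price, locale="ar_EG"))           # rendered with Eastern Arabic digits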

PostgreSQL multidimensional array search

I am a newbie to PostgreSQL and have been experimenting with it.
I have created a simple table:
CREATE TABLE items_tags (
    ut_id             SERIAL PRIMARY KEY,
    item_id           integer,
    item_tags_weights text[]
);
where:
item_id - the item these tags are associated with
item_tags_weights - the tags associated with the item, including their weights
Example entry:
--------------------
ut_id | item_id | item_tags_weights
---------+---------+-------------------------------------------------------------------------------------------------------------------------------
3 | 2 | {{D,1},{B,9},{W,3},{R,18},{F,9},{L,15},{G,12},{T,17},{0,3},{I,7},{E,14},{S,2},{O,5},{M,4},{V,3},{H,2},{X,14},{Q,9},{U,6},{P,16},{N,11},{J,1},{A,12},{Y,15},{C,15},{K,4},{Z,17}}
1000003 | 3 | {{Q,4},{T,19},{P,15},{M,14},{O,20},{S,3},{0,6},{Z,6},{F,4},{U,13},{E,18},{B,14},{V,14},{X,10},{K,18},{N,17},{R,14},{J,12},{L,15},{Y,3},{D,20},{I,18},{H,20},{W,15},{G,7},{A,11},{C,14}}
4 | 4 | {{Q,2},{W,7},{A,6},{T,19},{P,8},{E,10},{Y,19},{N,11},{Z,13},{U,19},{J,3},{O,1},{C,2},{L,7},{V,2},{H,12},{G,19},{K,15},{D,7},{B,4},{M,9},{X,6},{R,14},{0,9},{I,10},{F,12},{S,11}}
5 | 5 | {{M,9},{B,3},{I,6},{L,12},{J,2},{Y,7},{K,17},{W,6},{R,7},{V,1},{0,12},{N,13},{Q,2},{G,14},{C,2},{S,6},{O,19},{P,19},{F,4},{U,11},{Z,17},{T,3},{E,10},{D,2},{X,18},{H,2},{A,2}}
(4 rows)
where:
{D,1} - D = tag, 1 = tag weight
Well, I just want to list the item_ids that have the tag 'U', ordered by tag weight.
One way is to select ALL the tags from the database and do the processing (sorting) in a high-level language, then use the result set.
For this, I can do the following:
1) SELECT * FROM items_tags WHERE 'U' = ANY (item_tags_weights)
2) Extract and sort the information and display.
But considering that multiple items can be associated with a single 'TAG', and assuming 10 million entries, this method will surely be sluggish.
Any idea how to list them as needed, with a CREATE FUNCTION or similar?
Any pointers will be helpful.
Many thanks.
Have you considered normalization, i.e. moving the array field into another table? Apart from being easy to query and extend, it's likely to have better performance on larger databases.
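A rough sketch of that normalization (sqlite3 is used here only so the example is self-contained; the same table and query work in PostgreSQL, and the table name item_tags is invented):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per (item, tag) instead of an array column.
    CREATE TABLE item_tags (
        item_id INTEGER NOT NULL,
        tag     TEXT    NOT NULL,
        weight  INTEGER NOT NULL,
        PRIMARY KEY (item_id, tag)
    );
""")
conn.executemany("INSERT INTO item_tags VALUES (?, ?, ?)",
                 [(2, 'U', 6), (3, 'U', 13), (4, 'U', 19), (5, 'U', 11)])

# Items carrying tag 'U', ordered by weight; an index on (tag, weight)
# keeps this fast even with millions of rows.
rows = conn.execute("""
    SELECT item_id, weight
    FROM item_tags
    WHERE tag = 'U'
    ORDER BY weight DESC
""").fetchall()
print(rows)   # [(4, 19), (3, 13), (5, 11), (2, 6)]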
