I'm working on a transformation task in which I need to transform a property graph dataset into an RDF dataset. There are many n-ary relationships that need to be treated as classes, but I do not know how to assign a unique identifier to these relations. I tried to use the row index, but I have more than one file in this project, so that can't work. So I would like to know how you assign a unique identifier to relationships; if a URI is the solution, how do we do this in an OntoRefine mapping? Thank you for your answers.
Lee
There are several ways to address this:
Ideally, use some characteristics of the related entities to make a deterministic URL. E.g. if you're making a position (membership) node between a person and an org that involves a mandatory role and start date, you could use a URL like org/<org_id>/person/<person_id>/role/<role_id>/date/<date>
Use a blank node. In that case you don't need to worry about a URI at all
Use the row index if you prepend it with the table/file name (as a constant)
Use the GREL function random(). It doesn't produce a globally unique identifier, but if you ask for a large enough range, it'll be unique with a very high probability
Use a Jython function, as shown in "How to create UUID in OpenRefine based on the MD5 hash of the values"
If you do your mapping using SPARQL, then use the builtin uuid() function
I have a table, say Table1, which has the following columns:
1. Id
2. Name
3. TransportModeId
4. ParkingId
5. ActivityId
Columns 3, 4, and 5 are foreign keys, and all three reference simple list tables which have the following columns:
1. Id
2. Item
For simplicity I have shown 3 tables; my actual schema contains almost 25 list tables.
What would be the best practice?
Option 1.
Keep all list tables separate, which will create 25 tables, but on the other hand I will have a clean, modular schema.
Option 2.
Make a single table with a self join and add all the items to that table, where a row with a null ParentId represents the name of a list; it can be referenced more than once from other tables as described above, and it has to be kept in some kind of common module.
Thanks
Option 1 is how it is normally done when designing a system that is not supposed to be very configurable by the end user/implementer. It has several important advantages, two of them being:
When you need to add an extra attribute to any of the enumerations (e.g. a parking location to the Parking enumeration), it is quite simple and does not cause extra problems.
It is optimized for speed, using the relational database engine's native algorithms for linking records.
As for Option 2:
It is something called generalization: you take several types with similar attributes (methods) and create a class/table with a structure that fits the different purposes.
The self reference you describe is not a good idea for Option 2; rather, make a reference to a separate EnumerationType table containing type names like Parking, Activity, etc., each with an id (see the sketch below).
Using this approach could make sense if you need to let the end user configure the attributes themselves within your app. Otherwise it could cause you problems when you find out that different enumeration tables need different structures.
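A minimal SQL sketch of that Option 2 shape, assuming hypothetical table and column names (EnumerationType, EnumerationItem):
-- Hypothetical generalized layout: one shared item table typed by EnumerationType
CREATE TABLE EnumerationType (
  Id   INT PRIMARY KEY,
  Name VARCHAR(50) NOT NULL            -- e.g. 'TransportMode', 'Parking', 'Activity'
);
CREATE TABLE EnumerationItem (
  Id     INT PRIMARY KEY,
  TypeId INT NOT NULL REFERENCES EnumerationType (Id),
  Item   VARCHAR(100) NOT NULL
);
-- Populate one dropdown: all items of the 'Parking' enumeration
SELECT i.Id, i.Item
FROM EnumerationItem i
JOIN EnumerationType t ON t.Id = i.TypeId
WHERE t.Name = 'Parking';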
Good morning!
I need to look up, in my database, a column of a table whose name I do not know in advance. What happens is the following:
In my application, a table is created for each project; it takes the name of that project, concatenating the given name with the date and time of creation. The name of this table is stored in another table called projects, which has a field indicating the client that the project belongs to. When I do a SELECT, I want to see the names of the application projects related to the clients' IDs, find the database tables behind those clients, and bring back those tables, so that we can finally see the desired fields.
I don't know if I've been clear; if you need more details, just ask!
Thanks!
If I understood you correctly, you need to find the exact names of the tables that were named after your project plus some additional characters (that look like dates and times).
Well, you can list all the tables that start with the name of your project, using a query like this:
SELECT *
FROM sys.tables
WHERE name LIKE 'yourprojectname%'
sys.tables is a system view where all your tables are listed.
'yourprojectname%' is a mask used for filtering through the list of tables. The % character is necessary. It means 'any character or characters, any number of them (or none of them)'. (Without % the output would show you only one table whose name is exactly like your project's name. If such a table exists, that is.)
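If the generated table name is already stored in your projects table, you can also build the SELECT dynamically. A minimal SQL Server sketch, assuming hypothetical column names (table_name, client_id) in projects:
-- Hypothetical: projects(table_name, client_id) stores the generated table name per client
DECLARE @sql NVARCHAR(MAX) = N'';
SELECT @sql = @sql + N'SELECT * FROM ' + QUOTENAME(p.table_name) + N';' + CHAR(10)
FROM projects p
WHERE p.client_id = 42;   -- the client you are interested in
EXEC sp_executesql @sql;
The variable-concatenation trick is SQL Server specific; a cursor would do the same job more verbosely.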
Currently I'd like to develop a dictionary application for a mobile device. The dictionary itself uses an offline file/database to translate words. It just translates between two languages, for example an English - Spanish dictionary.
I have a simple design in mind. It would be two tables: an English table and a Spanish table.
Each table contains:
word_id = the id, which would be a foreign key for the other table
word = the word
word_description
correspond_trans_id = the id in the other table of the translation of this word into the other language.
Also, because this is for a mobile application, the database uses SQLite.
The definition data for each table has been provided ordered by the 'word' field. However, I'm still wondering what happens when data is added. Because the table would be ordered by the 'word' field, is there any method to insert a new record so that it is still in order by word? Or any idea to make this more efficient?
For each word there are usually a few translation possibilities depending on the context. If you'd like to build a bidirectional dictionary for two languages, you need at least three tables:
ENGLISH
ID | WORD
1 | 'dictionary'
GERMAN
ID | WORD
1 | 'lexikon'
2 | 'wörterbuch'
TRANSLATION_EN_DE
ID_EN | ID_DE
1 | 1
1 | 2
The first two tables contain all the words that are known in each language, and the bidirectional mapping is done by the third (mapping) table. This is a common n:n mapping case.
With two more tables you're always able to add a new language to your dictionary. If you do it with one table, you'll have multiple definitions for a single word and thus no normalized DB.
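A lookup against those three tables is then a simple two-join query; a sketch using the table and column names above (the exact dialect details are an assumption):
-- Translate the English word 'dictionary' into German
SELECT g.WORD
FROM ENGLISH e
JOIN TRANSLATION_EN_DE t ON t.ID_EN = e.ID
JOIN GERMAN g ON g.ID = t.ID_DE
WHERE e.WORD = 'dictionary';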
You can also merge your language tables into a single table, defining the word's language with another column (referencing a language table). In that case you'll need a two-column index on the language and the word itself.
What do you intend to do when a word in language 1 can be translated by more than one word in language 2? I think you have to use something like wursT's design to handle that.
RE inserting records in alphabetical order: You do not normally worry about the physical ordering of records in a database. You use an ORDER BY clause to retrieve them in any desired order, and an index to make it efficient. There is nothing in the SQL standard to control physical ordering. Umm, I recall coming across something about forcing a physical ordering on some database I worked with, I think it was MySQL, but most will not give you any control of this. I haven't worked with SQLite so I can't say if it provides a way.
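For example, with the question's single-table layout (a hypothetical english_words table using the columns listed above), an index plus ORDER BY gives alphabetical output no matter what order the rows were inserted in:
-- Physical insertion order does not matter; the index makes ordered retrieval cheap
CREATE INDEX idx_english_word ON english_words (word);
-- Always request the order explicitly
SELECT word, word_description
FROM english_words
ORDER BY word;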
Surely the relationship between words and their possible translations is one-to-many or many-to-many. I'm not clear how you will represent this in your model. Seems like you may need at least one more table.
I agree with Matt - to make life much easier I would stick with one table. Also, if you plan to use CoreData, the index modelling of traditional database design is different from the object-graph-based model when working in Objective-C/iOS.
It's very easy to think along the traditional lines of Select querying and inner / outer joins but for example your column 'correspond_trans_id' would normally be handled by setting a 'relationship' when defining your data model for the two tables (if you are using CoreData of course).
In essence unless there is a good reason to have two tables I would stick with just one.
In relation to the ordering, you might not need to keep the order of words in the dataset. I'm guessing you want to keep everything alphabetical, which would involve some work if the data were ever to change, even for just one table.
Again, using CoreData, NSFetchRequest, and NSSortDescriptor, it is very easy to return a set of records ordered by a specified column, freeing you from having to worry about amendments and additions to your database.
If you have any questions give me a shout.
I have an application with multiple "pick list" entities, such as those used to populate the choices of dropdown selection boxes. These entities need to be stored in the database. How does one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?
In the past I've created a table that has the name of the list and the acceptable values, then queried it to display the list. I also include an underlying value, so you can return a display value for the list, and a bound value that may be much uglier (a small int for normalized data, for instance).
CREATE TABLE PickList(
  ListName varchar(15),
  Value    varchar(15),
  Display  varchar(15),
  PRIMARY KEY (ListName, Display)
)
You could also add a sortOrder field if you want to manually define the order to display them in.
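For example, populating a single dropdown from that table might look like this (the list name 'Colors' is made up):
-- Fetch one named list for display
SELECT Value, Display
FROM PickList
WHERE ListName = 'Colors'
ORDER BY Display;   -- or ORDER BY sortOrder, Display if you added that column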
It depends on various things:
if they are immutable and non-relational (think "names of US states") an argument could be made that they should not be in the database at all: after all they are simply a formatting of something simpler (like the two-character code assigned). This has the added advantage that you don't need a round trip to the DB to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. In case of localized display, so you need a different formatting for each culture, then you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.
Well, you could do something like this:
PickListContent
IdList IdPick Text
1 1 Apples
1 2 Oranges
1 3 Pears
2 1 Dogs
2 2 Cats
and optionally..
PickList
Id Description
1 Fruit
2 Pets
I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering out based on type. While it works, it has invariably created headaches down the line. For example, you may find that something you presumed to be a simple pick list is not so simple and requires an extra field; do you now split this data into an additional table or extend your master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity, and it makes it easier to interpret the data in the database when you're not using the application.
We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.
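A sketch of that pattern; the referencing SNACK table here is hypothetical, invented only to show the FRUIT_ID convention:
-- Per-list table, as described above
CREATE TABLE FRUIT (
  ID          INT PRIMARY KEY,
  NAME        VARCHAR(50),
  DESCRIPTION VARCHAR(200)
);
-- Hypothetical referencing table; FRUIT_ID points at FRUIT.ID
CREATE TABLE SNACK (
  ID       INT PRIMARY KEY,
  NAME     VARCHAR(50),
  FRUIT_ID INT REFERENCES FRUIT (ID)
);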
Create one table for lists and one table for list_options.
# Put in the name of the list
insert into lists (id, name) values (1, 'Country in North America');
# Put in the values of the list
insert into list_options (id, list_id, value_text) values
(1, 1, 'Canada'),
(2, 1, 'United States of America'),
(3, 1, 'Mexico');
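And to populate a dropdown from those tables, something along these lines (same table and column names as the inserts above):
# All options for one named list
select o.id, o.value_text
from list_options o
join lists l on l.id = o.list_id
where l.name = 'Country in North America'
order by o.value_text;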
To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but with the US first and Canada second; you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2, and all other countries as 9.
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
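A minimal sketch of that table shape for a hypothetical country list (the table name countries is an assumption):
CREATE TABLE countries (
  country_id    INT PRIMARY KEY,              -- the xxx_id convention described above
  name          VARCHAR(100) NOT NULL,
  display_order SMALLINT NOT NULL DEFAULT 9,  -- default greater than 1, per the ordering trick
  active        CHAR(1) NOT NULL DEFAULT 'Y'  -- optional flag mentioned above
);
SELECT country_id, name
FROM countries
WHERE active = 'Y'
ORDER BY display_order, name;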
Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID), which you join to the list_option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value.
The query would look something like:
select
    b.option_name,
    b.option_value
from
    list a,
    list_option b
where
    a.name = 'States'
    and a.list_id = b.list_id
order by
    b.manual_sort asc
You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The pk and fk columns will typically automatically be indexed.
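For example (the index name is arbitrary):
-- Supports the where clause on list.name above
CREATE INDEX idx_list_name ON list (name);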
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).
Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... or if it is often changing, you can just copy the answer value to the answer table.
If you don't mind using strings for the actual values, you can simply give each list a different list_id value and populate a single table with:
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item.
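As a sketch (the table name list_item is an assumption):
CREATE TABLE list_item (
  item_id INT NOT NULL,
  list_id INT NOT NULL,
  text    VARCHAR(50) NOT NULL,
  PRIMARY KEY (list_id, item_id)
);
-- One list's options, e.g. list_id 3
SELECT item_id, text FROM list_item WHERE list_id = 3;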
We actually created entities to handle simple pick lists. We created a Lookup table that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.
I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the initial option as it makes updating lists easier (at least in my opinion).
Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model, and you really just want to persist it in the database? You could use an OR mapper like linq2sql or nhibernate (assuming you're in the .net world), or, depending on the data, you could store it manually, each list in its own table - there are situations where it would make good sense to put it all in the same table, but consider this only if you feel it really makes good sense. Normally, putting different data in different tables makes it a lot easier to (later) understand what is going on.
There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); Name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: really easy to add pick lists; disadvantage: a little more difficult to write group-by style queries (for example, "give me the number of records that picked value X").
I personally prefer option 1, it seems "cleaner" to me.
You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist - it needs an additional attribute, for instance - you can change just that picklist table with little effect on the rest of your schema. In the single-table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate-table solution.
This has served us well:
SQL> desc aux_values;
Name           Type
-------------- ------------
VARIABLE_ID    VARCHAR2(20)
VALUE_SEQ      NUMBER
DESCRIPTION    VARCHAR2(80)
INTEGER_VALUE  NUMBER
CHAR_VALUE     VARCHAR2(40)
FLOAT_VALUE    FLOAT(126)
ACTIVE_FLAG    VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.