I have a fairly simple ontology written in OWL 2, describing my particular world (TBox).
Since I have to store a lot of individuals, I would like to avoid the scalability problems of triplestores by storing the individuals outside the ontology (e.g., in a database table). At the same time, I would like to exploit reasoning capabilities.
For example, if I stored my individual assertions directly in my ontology:
:Mary a :Female
:Mary :motherOf :John
...
with appropriate modelling, a reasoner could entail that:
:Mary a :Parent
:Mary a :Mother
...
...but what if I stored my individual assertions outside of my ontology? Is there a framework or best practice for managing this scenario?
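For instance, here is a minimal sketch of the kind of thing I mean, assuming Apache Jena (the namespace, file name, and property lookups are just placeholders): the TBox stays in the ontology file, while the ABox assertions would be read from the external store and added to an inference model at query time.

import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class ExternalAboxSketch {
    static final String NS = "http://example.org/family#";   // placeholder namespace

    public static void main(String[] args) {
        // OWL model with a rule-based reasoner attached; loads only the TBox
        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
        model.read("family-tbox.owl");                        // placeholder file name

        // Assertions that would be fetched from the external store (e.g., a DB table)
        Resource mary = model.createResource(NS + "Mary");
        Resource john = model.createResource(NS + "John");
        mary.addProperty(RDF.type, model.getResource(NS + "Female"));
        mary.addProperty(model.getProperty(NS + "motherOf"), john);

        // With suitable TBox axioms, the reasoner should now entail :Mary a :Mother
        System.out.println(mary.hasProperty(RDF.type, model.getResource(NS + "Mother")));
    }
}

This works once the assertions have been added to the model, but ideally the individuals would stay in the external store rather than being loaded wholesale - hence the question about frameworks or best practices.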
I am looking for an ontology dataset which can describe the concept "Project". For example:
the type of the project (new project, improvement, translation, ...)
the industries involved in the project (mobile applications, design, music)
the products of the project
But I can't even find a keyword for such a case.
Are there any suitable datasets for my case?
Check this one: http://ckan.citek.ipn.pt/dataset/r-d-industry-ontology-and-data
Also, check the website below for similar datasets:
http://datahub.io/dataset
I have a lot of XSD files with different complex types in them. I want to import the data into my Oracle database, but the amount of data is so huge that I can't use xsd2db or Altova XMLSpy; it's blowing my mind. I'm looking for a simple and useful ETL tool that can help me with this. Does anyone know of a GUI tool to generate DDL from XSD?
This is a follow-up to my comment; I am not positioning this as an answer, but it should help you understand more about what it is you are after, and maybe what you can do about it. For sure, it should be a good example for #a_horse_with_no_name...
I am not familiar with XMLSpy, but given what I saw in xsd2db, it made me think of the .NET ability to infer a DataSet from an XML Schema. While the authoring style of the XSD itself may affect the way a DataSet is derived, that would be mostly insignificant for larger bodies of XSD. Even more, there is a big chance that the derivation might not work at all (there are actually many limitations there).
From my own experience, the derivation process in .NET gives you a very normalized structure. To illustrate, I am going to introduce a sample XML:
<ShippingManifest>
  <Date>2012-11-21</Date>
  <InvoiceNumber>123ABC</InvoiceNumber>
  <Customer>
    <FirstName>Sample</FirstName>
    <LastName>Customer</LastName>
  </Customer>
  <Address>
    <UnitNumber>2A</UnitNumber>
    <StreetNumber>123</StreetNumber>
    <StreetName>A Street</StreetName>
    <Municipality>Toronto</Municipality>
    <ProvinceCode>ON</ProvinceCode>
    <PostalCode>X9X 9X9</PostalCode>
  </Address>
  <PackingList>
    <LineItem>
      <ID>Box1</ID>
      <Load>1-233</Load>
      <Description>Package box</Description>
      <Items>22</Items>
      <Cartons>22</Cartons>
      <Weight>220</Weight>
      <Length>10</Length>
      <Width>10</Width>
      <Height>10</Height>
      <Volume>1000</Volume>
    </LineItem>
    <LineItem>
      <ID>Box2</ID>
      <Load>456-233</Load>
      <Description>Package box</Description>
      <Items>22</Items>
      <Cartons>22</Cartons>
      <Weight>220</Weight>
      <Length>10</Length>
      <Width>10</Width>
      <Height>10</Height>
      <Volume>1000</Volume>
    </LineItem>
  </PackingList>
</ShippingManifest>
Conceptually, its structure is very simple: a shipping manifest entity, a customer, a shipping address and a packing list.
Converting this to an ADO.NET DataSet is a straightforward exercise, with a very clean output.
It should be easy to imagine how the number of entities (tables in your database, if you wish) could mushroom for just slightly more complex XML...
As a sidebar, if one designs the XSD keeping in mind a process involving DataSets, then removing the PackingList element and making LineItem a repeating child of ShippingManifest gives a somewhat simplified layout: one without the PackingList entity.
Automatic tools to convert an XSD data model to a relational model, such as .NET's, are typically designed to generate a highly normalized structure. The denormalization, I guess, is left to the user for obvious reasons.
QTAssistant's XML Builder is different. Our requirement was to create an ER model that would work where .NET's XSD-to-DataSet doesn't, and with an output containing a smaller number of entities where possible. This is what QTAssistant generates for the same schema:
What QTAssistant did here was to merge all entities engaged in a one-to-one relationship. From a modeling perspective it's an obvious sin, but it does have its benefits, particularly for our users interested in a simple structure capable of capturing data (test data, to be more specific).
The generated mapping (XSD to ER) is bi-directional. It means that it can be used to generate valid XML from a database, or "shred" XML data into a database (the shredding is done by generating DML statements). The way this technology is used: test cases are stored in Excel spreadsheets, XML is generated, sent to a web service, and the results are stored back in Excel.
We also generate an XML file describing the structure, which through an XSLT could be converted to DDL. And this is where things can get messy, depending on your schema. It is rather common to see XSDs where simple types are unconstrained: strings without maxLength, patterns without a maximum length, unconstrained decimals, etc. These are just some of the reasons why, in our case, we don't have an out-of-the-box, straightforward way to generate the DDL, but rather provide hooks for customization.
So, to close my comment, I pretty much know what you want to do (I have to assume that other things, such as Oracle's XML capabilities, or XML databases and XQuery, etc. have been ruled out). Unfortunately, the XSD really matters here, so if you can share those as per my comment, I can take a look - it'll be up to you how much you want to share back here.
I've been following a mostly DDD methodology for this project, so, like any DDD'er, I created my domain model classes first. My intention is to use these POCOs as my LINQ-to-SQL entities (yes, they're not pure POCOs, but I'm OK with that). I've started creating the database schema and external mapping XML file, but I'm running into some issues with modeling the entities' relationships and associations.
An artifact represents a document. Artifacts can be associated with either a Task or a Case. The Case entity looks like this:
public class Case
{
    private EntitySet<Artifact> _Artifacts;

    public IList<Artifact> Artifacts
    {
        get
        {
            return _Artifacts;
        }
        set
        {
            _Artifacts.Assign(value);
        }
    }
    .
    .
    .
}
Since an Artifact can be associated with either a Case or a Task, I have the option of using inheritance on the Artifact class to create CaseArtifact and TaskArtifact derived classes. The only difference between the two classes, however, would be the presence of a Case field or a Task field. In the database, of course, I would have a single table, Artifact, with a type discriminator field and the CaseId and TaskId fields.
My question: is this a valid approach to solving this problem, or would creating a join table for each association (2 new tables, total) be a better approach?
I would probably go with two tables - it makes the referential integrity (PKs/FKs) a little simpler to handle in the database, since you won't need a complex constraint based on the discriminator column.
(To reply to your comment - I ran out of space, so I'm posting here as an edit.) My overall philosophy is that the database should be modelled with database best practices (protect your perimeter and ensure database consistency, using as much RI and as many constraints as possible; provide all access through SPs; log activity as necessary; control all modes of access; use triggers where necessary) and the object model should be modelled with OOP best practices to provide a powerful and consistent API. It's the job of your SPs/data-access layer to handle the impedance mismatch.
If you just persist a well-designed object model to a database, your database won't have much intrinsic value (difficult to data mine, report on, or warehouse; metadata is vague; etc.) when viewed without going through the lens of the object model - this is fine for some applications, but typically not for mine.
If you just mimic a well-designed database structure in your application, without providing a rich OO API, your application will be difficult to maintain and the internal structures will be awkward to deal with - typically very procedural, rigid, and with a lot of code duplication.
I would consider finding the commonalities between Case and Task - for lack of a better word, let's call it "CaseTask" - and then sub-typing (inheriting) from that one. After that, you attach the document to the super-type, roughly as sketched below.
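Roughly this shape (sketched in Java rather than C#, with invented member names, just to illustrate the idea):

import java.util.ArrayList;
import java.util.List;

// Case and Task inherit from a common supertype; Artifact references the
// supertype, so it no longer needs a discriminator plus two optional FKs.
abstract class CaseTask {
    private final List<Artifact> artifacts = new ArrayList<>();
    List<Artifact> getArtifacts() { return artifacts; }
}

class Case extends CaseTask { /* case-specific fields */ }
class Task extends CaseTask { /* task-specific fields */ }

class Artifact {
    private CaseTask owner;                 // single association

    void attachTo(CaseTask parent) {
        this.owner = parent;
        parent.getArtifacts().add(this);
    }
}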
UPDATE (after comment):
I would then consider something like this. Each document can be attached to several cases or tasks.
Object-relational mapping has been well discussed, including here. I have experience with a few approaches and their pitfalls and compromises. True resolution seems like it would require changes to the OO or relational models themselves.
If using a functional language, does the same problem present itself? It seems to me that these two paradigms should fit together better than OO and RDBMS. The idea of thinking in sets in an RDBMS seems to mesh with the automatic parallelism that functional approaches seem to promise.
Does anyone have any interesting opinions or insights? What's the state of play in the industry?
What's the purpose of an ORM?
The main purpose of using an ORM is to bridge between the networked model (object orientation, graphs, etc.) and the relational model. And the main difference between the two models is surprisingly simple. It's whether parents point to children (networked model) or children point to parents (relational model).
With this simplicity in mind, I believe there is no such thing as an "impedance mismatch" between the two models. The problems people usually run into are purely implementation specific, and should be solvable, if there were better data transfer protocols between clients and servers.
How can SQL address the problems we have with ORMs?
In particular, The Third Manifesto tries to address the shortcomings of the SQL language and relational algebra by allowing for nested collections, which have been implemented in a variety of databases, including:
Oracle (probably the most sophisticated implementation)
PostgreSQL (to some extent)
Informix
SQL Server, MySQL, etc. (through "emulation" via XML or JSON)
In my opinion, if all databases implemented the SQL standard MULTISET() operator (e.g. Oracle does), people would no longer use ORMs for mapping (perhaps still for object graph persistence), because they could materialise nested collections directly from within the databases, e.g. this query:
SELECT actor_id, first_name, last_name,
  MULTISET (
    SELECT film_id, title
    FROM film AS f
    JOIN film_actor AS fa USING (film_id)
    WHERE fa.actor_id = a.actor_id
  ) AS films
FROM actor AS a
Would yield all the actors and their films as a nested collection, rather than a denormalised join result (where actors are repeated for each film).
Functional paradigm at the client side
The question of whether a functional programming language on the client side is better suited for database interactions is really orthogonal. ORMs help with object graph persistence, so if your client-side model is a graph, and you want it to be a graph, you will need an ORM, regardless of whether you're manipulating that graph with a functional programming language.
However, because object orientation is less idiomatic in functional programming languages, you are less likely to shoehorn every data item into an object. For someone writing SQL, projecting arbitrary tuples is very natural. SQL embraces structural typing: each SQL query defines its own row type without the need to assign a name to it beforehand. That resonates very well with functional programmers, especially when type inference is sophisticated, in which case you won't ever think of mapping your SQL result to some previously defined object/class.
An example in Java using jOOQ from this blog post could be:
// Higher-order, SQL-query-producing function:
public static ResultQuery<Record2<String, String>> actors(Function<Actor, Condition> p) {
    return ctx.select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
              .from(ACTOR)
              .where(p.apply(ACTOR));
}
This approach leads to much better composability of SQL statements than if the SQL language were abstracted away by some ORM, or if SQL were used in its raw, string-based form. The above function can now be used e.g. like this:
// Get only actors whose first name starts with "A"
for (Record rec : actors(a -> a.FIRST_NAME.like("A%")))
System.out.println(rec);
FRM abstraction over SQL
Some FRMs try to abstract over the SQL language, usually for these reasons:
They claim SQL is not composable enough (jOOQ disproves this, it's just very hard to get right).
They claim that API users are more used to their "native" collection APIs, so e.g. JOIN is translated to flatMap() and WHERE is translated to filter(), etc. - roughly as in the toy sketch below.
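As a toy illustration of that second claim (plain Java streams rather than any particular FRM; the Actor/Film types are made up for the example):

import java.util.List;

public class JoinAsFlatMap {
    record Actor(long id, String firstName) {}
    record Film(long actorId, String title) {}

    public static void main(String[] args) {
        List<Actor> actors = List.of(new Actor(1, "Alice"), new Actor(2, "Bob"));
        List<Film> films = List.of(new Film(1, "A Movie"), new Film(2, "B Movie"));

        List<String> titles = actors.stream()
            .filter(a -> a.firstName().startsWith("A"))              // WHERE
            .flatMap(a -> films.stream()
                               .filter(f -> f.actorId() == a.id()))  // JOIN
            .map(Film::title)
            .toList();

        System.out.println(titles);                                  // [A Movie]
    }
}

The shape is the same as the SQL; it is just expressed through collection combinators.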
To answer your question
FRM is not "easier" than ORM; it solves a different problem. In fact, FRM doesn't really solve any problem at all, because SQL, being a declarative programming language itself (which is not so different from functional programming), is a very good match for other functional client programming languages. So, if anything at all, an FRM simply bridges the gap between SQL (the external DSL) and your client language.
(I work for the company behind jOOQ, so this answer is biased)
The hard problems of extending the relational database - extended transactions, data-type mismatches, automated query translation, and things like the N+1 select problem - are fundamental problems of leaving the relational system and, in my opinion, do not change by changing the receiving programming paradigm.
That depends on your needs
If you want to focus on the data structures, use an ORM like JPA/Hibernate.
If you want to focus on the processing, take a look at FRM libraries: QueryDSL or jOOQ.
If you need to tune your SQL queries for specific databases, use JDBC and native SQL queries.
The strength of the various "relational mapping" technologies is portability: you ensure your application will run on most ACID databases.
Otherwise, you will have to cope with the differences between the various SQL dialects when you write the SQL queries manually.
Of course you can restrict yourself to the SQL-92 standard (and then do some functional programming), or you can reuse some concepts of functional programming with ORM frameworks.
The ORM's strengths are built around a session object, which can become a bottleneck:
it manages the lifecycle of the objects as long as the underlying database transaction is running.
it maintains a one-to-one mapping between your Java objects and your database rows (and uses an internal cache to avoid duplicate objects).
it automatically detects association updates and orphan objects to delete.
it handles concurrency issues with optimistic or pessimistic locking (see the small JPA sketch below).
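As a minimal illustration of the optimistic-locking point (plain JPA; the entity is invented for the example):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {
    @Id
    private Long id;

    private long balance;

    @Version              // bumped by the provider on each update; a concurrent
    private long version; // stale write makes it throw OptimisticLockException
}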
Nevertheless, its strengths are also its weaknesses:
The session must be able to compare objects, so you need to implement equals/hashCode methods.
But object equality must be rooted in "business keys" and not the database id (new transient objects have no database ID!).
However, some reified concepts have no business equality (an operation, for instance).
A common workaround relies on GUIDs, which tend to upset database administrators. (A sketch of business-key equality follows after this list.)
The session must track relationship changes, but its mapping rules push you toward collections that are unsuitable for your business algorithms.
Sometimes you would like to use a HashMap, but the ORM will require the key to be another "rich domain object" instead of a lightweight one...
Then you have to implement object equality on the rich domain object acting as a key...
But you can't, because this object has no counterpart in the business world.
So you fall back to a simple list that you have to iterate over (with the resulting performance issues).
The ORM APIs are sometimes unsuitable for real-world use.
For instance, real-world web applications try to enforce session isolation by adding some WHERE clauses when you fetch data...
Then "Session.get(id)" doesn't suffice and you need to turn to a more complex DSL (HQL, Criteria API) or go back to native SQL.
The database objects conflict with other objects dedicated to other frameworks (like OXM frameworks, i.e. Object/XML Mapping).
For instance, if your REST services use the Jackson library to serialize a business object,
but that Jackson-mapped object corresponds exactly to a Hibernate one,
then either you merge both, and a strong coupling between your API and your database appears,
or you must implement a translation, and all the code you saved thanks to the ORM is lost there...
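To make the "business key" point above concrete, a small sketch (the entity and its key are invented for the example):

import java.util.Objects;

// Equality is based on the immutable business key (here an ISBN), not on the
// database-generated id, so transient and persisted instances compare correctly.
class Book {
    private Long id;           // database identity, null until the first flush
    private final String isbn; // business key

    Book(String isbn) { this.isbn = isbn; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        return isbn.equals(((Book) o).isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(isbn);
    }
}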
On the other hand, FRM is a trade-off between object-relational mapping (ORM) and native SQL queries (with JDBC).
The best way to explain the differences between FRM and ORM is to adopt a DDD approach.
Object-relational mapping empowers the use of "rich domain objects", which are Java classes whose state is mutable during the database transaction.
Functional-relational mapping relies on "poor domain objects", which are immutable (so much so that you have to clone a new one each time you want to alter its content).
It relaxes the constraints placed on the ORM session and relies most of the time on a DSL over SQL (so portability doesn't matter much).
But on the other hand, you have to look into the transaction details and the concurrency issues yourself.
// QueryDSL example: the query is composed fluently and only executed by fetch().
List<Person> persons = queryFactory.selectFrom(person)
    .where(
        person.firstName.eq("John"),
        person.lastName.eq("Doe"))
    .fetch();
I'd guess functional to relational mapping should be easier to create and use than OO to RDBMS. As long as you only query the database, that is. I don't really see (yet) how you could do database updates without side effects in a nice way.
The main problem I see is performance. Today's RDBMSs are not designed to be used with functional queries, and will probably behave poorly in quite a few cases.
I haven't done functional-relational mapping, per se, but I have used functional programming techniques to speed up access to an RDBMS.
It's quite common to start with a dataset, do some complex computation on it, and store the results, where the results are a subset of the original with additional values, for example. The imperative approach dictates that you store your initial dataset with extra NULL columns, do your computation, then update the records with the computed values.
Seems reasonable. But the problem with that is it can get very slow. If your computation requires another SQL statement besides the update query itself, or even needs to be done in application code, you literally have to (re-)search for the records that you are changing after the computation to store your results in the right rows.
You can get around this by simply creating a new table for results. This way, you can just always insert instead of update. You end up having another table, duplicating the keys, but you no longer need to waste space on columns storing NULL – you only store what you have. You then join your results in your final select.
I (ab)used an RDBMS this way and ended up writing SQL statements that looked mostly like this...
create table temp_foo_1 as select ...;
create table temp_foo_2 as select ...;
...
create table foo_results as
select * from temp_foo_n inner join temp_foo_1 ... inner join temp_foo_2 ...;
What this is essentially doing is creating a bunch of immutable bindings. The nice thing, though, is you can work on entire sets at once. Kind of reminds you of languages that let you work with matrices, like Matlab.
I imagine this would also allow for parallelism much easier.
An extra perk is that types of columns for tables created this way don't have to be specified because they are inferred from the columns they're selected from.
I'd think that, as Sam mentioned, if the DB needs to be updated, the same concurrency issues have to be faced as in the OO world. The functional nature of the program could perhaps be even a little more problematic than the object nature, because of the state of the data, transactions, etc. in the RDBMS.
But for reading, a functional language could be more natural for some problem domains (as it seems to be regardless of the DB).
The functional<->RDBMS mapping should have no big differences from OO<->RDBMS mappings. But I think that depends a lot on what kind of data types you want to use, and whether you want to develop a program against a brand new DB schema or against a legacy DB schema, etc.
Lazy fetches for associations, for example, could probably be implemented quite nicely with some lazy-evaluation-related concepts (even though they can be done quite nicely with OO as well) - a rough sketch follows below.
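A rough sketch of that idea (in Java, with a memoising Supplier standing in for lazy evaluation; the loader would be whatever query you run):

import java.util.function.Supplier;

// The association is a memoised thunk: the query runs only on first access.
class LazyAssociation<T> implements Supplier<T> {
    private Supplier<T> loader;   // e.g. () -> fetchChildrenFromDb(parentId)
    private T value;

    LazyAssociation(Supplier<T> loader) { this.loader = loader; }

    @Override
    public synchronized T get() {
        if (loader != null) {     // first access: run the query, then drop the loader
            value = loader.get();
            loader = null;
        }
        return value;
    }
}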
Edit : With some googling I found HaskellDB (SQL library for Haskell) - that could be worth trying?
Databases and functional programming can be fused.
For example:
Clojure is a functional programming language based on relational database theory.
Clojure -> DBMS, Super FoxPro
STM -> Transaction, MVCC
Persistent Collections -> db, table, col
hash-map -> indexed data
Watch -> trigger, log
Spec -> constraint
Core API -> SQL, built-in functions
function -> stored procedure
Meta Data -> system table
Note: In the latest spec2, spec is more like an RDBMS.
See: spec-alpha2 wiki: Schema-and-select
I advocate building a relational data model on top of hash-maps to combine the advantages of NoSQL and RDBMSs. This is, in effect, a reverse implementation of PostgreSQL.
Duck typing: if it looks like a duck and quacks like a duck, it must be a duck.
If Clojure's data model is like an RDBMS's, Clojure's facilities are like an RDBMS's, and Clojure's data manipulation is like an RDBMS's, then Clojure must be an RDBMS.
Clojure is a functional programming language based on relational database theory.
Everything is an RDBMS.
Implement a relational data model and programming on top of hash-maps (NoSQL).
Being functional and being OO are two orthogonal concepts. The issue of mapping flat tables to trees of objects is orthogonal to Functional vs Imperative.
However, functional vs. imperative does resolve one particular mismatch, namely the mismatch between imperative updates and MVCC. In imperative programming, locking the table you are working with while you update it is the most intuitive approach, and anything non-sequential is extremely counterintuitive.
In FP, MVCC is much more natural than locks. The natural way to write is to compute the result set, compute the diff against the data you read, write (i.e., pick the updated dataset as the new one, sharing the data they have in common via persistent data structures), and do a rollback & retry if there is a write-write conflict. This matches exactly what MVCC does; a small sketch of the pattern follows below.
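A small sketch of that read-compute-commit-or-retry loop (plain Java, with an AtomicReference standing in for an MVCC store):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class MvccStyleUpdate {
    // Read a snapshot, compute the new version purely, commit only if nothing
    // changed in between; otherwise discard the work ("rollback") and retry.
    static <T> T update(AtomicReference<T> store, UnaryOperator<T> pureTransform) {
        while (true) {
            T read = store.get();                 // snapshot read
            T next = pureTransform.apply(read);   // pure computation of the new version
            if (store.compareAndSet(read, next))  // commit if no write-write conflict
                return next;
        }
    }

    public static void main(String[] args) {
        AtomicReference<List<String>> table = new AtomicReference<>(List.of("row1"));
        update(table, rows -> {
            List<String> copy = new ArrayList<>(rows);   // copy-on-write, original untouched
            copy.add("row2");
            return List.copyOf(copy);
        });
        System.out.println(table.get());                 // [row1, row2]
    }
}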