What is the role of CWA and OWA in semantic web applications? - owl

What roles do CWA and OWA play in semantic web applications? How important is it for those developing ontologies, writing SHACL schemas, or generating linked data to keep these concepts in mind?

Your question is vague but I'll try addressing CWA, OWA and also UNA that you mention in the comments, using examples.
CWA
If you're told that Charles is a married man, and Sharon is a married woman, then with the closed world assumption, you would assume that Charles is not married to Sharon. If, later on, you are additionally informed that Charles and Sharon are indeed husband and wife, then you would have to retract your former conclusion and accept this new fact. With CWA, reasoning is non-monotonic, meaning that with the addition of new knowledge, you may have fewer conclusions than before.
OWA
Under the open world assumption, you simply don't make the CWA. Charles and Sharon may be married, but they may not be married. You simply do not draw any conclusion about their marriage. With the additional knowledge of them being husband and wife, you can add more conclusions. Additional knowledge always increases what you can conclude, thus it's monotonic. OWA, in spite of its name, is really an absence of assumption. If something can't be proved, then it can't be proved.
UNA
If I say that Dave was born on 16th September 1975 and lives on the 3rd floor of a 25-storey building in Soho, London, and that David, born on September the sixteenth of nineteen seventy-five, is living in the district of Soho of the city of London, in an apartment on the third floor of a building with twenty-five floors, then I may be thinking that Dave and David are names of the same person. However, if I adopt the unique name assumption, then I conclude that Dave and David are different, so that there are, coincidentally, two people with the same birthdate living in the same district. Any further knowledge that would allow us to ascertain that Dave and David represent the same person would be inconsistent with the UNA. Without UNA, no such conclusion is made, and further knowledge could identify Dave and David as one person.
The role of these assumptions in the Semantic Web
The standards of the semantic web do not impose either CWA or UNA. So you may think of them as essentially "based on OWA". However OWA and CWA do not apply to every standard. For instance, talking about CWA or OWA for SHACL is irrelevant. SHACL is not a knowledge representation language. It's used to describe the shapes of RDF graphs. If a shape says that an IRI must be the subject of 2 triples in a graph, then you look at the graph and count the triples. If there are indeed 2 such triples, the shape is satisfied, otherwise it's not. There is no need for any assumption.
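The counting check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SHACL itself: the graph is a plain set of tuples, and the `http://example.org/` IRIs and the function names are made up for the example.

```python
# Hypothetical graph: each triple is a (subject, predicate, object) tuple.
EX = "http://example.org/"  # assumed namespace, for illustration only
graph = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "bob", EX + "name", "Bob"),
}

def count_triples_with_subject(graph, subject):
    """Count the triples in the graph whose subject is the given IRI."""
    return sum(1 for s, _p, _o in graph if s == subject)

def shape_satisfied(graph, subject, required_count):
    """A toy 'shape': the IRI must be the subject of exactly required_count triples."""
    return count_triples_with_subject(graph, subject) == required_count

alice_ok = shape_satisfied(graph, EX + "alice", 2)  # two triples have alice as subject
bob_ok = shape_satisfied(graph, EX + "bob", 2)      # only one triple has bob as subject
print(alice_ok, bob_ok)
```

The check only inspects the triples that are present; nothing is inferred about triples that are absent, which is why no open or closed world assumption enters the picture.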
This does not mean that some connections cannot be made between SHACL and reasoning with CWA. Certain aspects of SHACL can emulate logical reasoning with a seemingly CWA flavour. But saying that SHACL makes the CWA is as improper as saying that my IDE makes the CWA when checking the syntax of my code.
Also, in spite of semantic web standards not relying on CWA or UNA in their specifications, nothing prevents anyone from building systems based on, say, OWL, RDFS, SWRL, RIF and applying additional assumptions for the purpose of an application. Using OWL + CWA + UNA in a system can be very useful and is perfectly fine. But if you draw conclusions from these assumptions, you should not believe that others are drawing the same conclusions.
Other assumptions
There are other kinds of assumptions that can be useful, and that are sometimes misinterpreted as CWA. First, it is possible to have a local closed world assumption which only "closes the world" on some parts of the knowledge. For instance, an airline company may know about every flight existing, so we can assume that if it's not in their knowledge base, it simply does not exist. However, it may have incomplete knowledge about touristic places. For instance, if a restaurant does not have any review from the company's customers, it does not mean that none of them ever visited this restaurant.
Second, there is the closed domain assumption, which is often confused with CWA. It assumes that the universe is limited to only the things that are named in the knowledge base. In the example of Charles and Sharon, two people who are married, if we do not have any other facts, we would conclude that Charles and Sharon are necessarily married because they are the only two existing entities (assuming it is known that someone cannot marry themself).
Formal examples
Assume the first order logic theory:
∃x married(Charles,x)
∃x married(Sharon,x)
∀x ∀y married(x,y) ⇒ married(y,x)
∀x ¬married(x,x)
Without any assumption (i.e., under OWA), we can conclude that:
¬married(Charles,Charles)
¬married(Sharon,Sharon)
With CWA, we can additionally conclude (or assume):
¬married(Charles,Sharon)
¬married(Sharon,Charles)
With the closed domain assumption (CDA), but without CWA, we can conclude:
married(Charles,Sharon)
married(Sharon,Charles)
Those FOL sentences can all be expressed equivalently in OWL 2, so this works with semantic web standards as well.
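As a rough sanity check of the three cases above, here is a small brute-force sketch in Python. It enumerates every `married` relation over a finite domain that satisfies the four axioms. The extra individual `someone_else` is an assumption of this sketch: it stands in for "there may be people we have no name for", and a single counter-model is enough to show that a fact is not entailed under OWA.

```python
from itertools import product

def models(domain):
    """Yield every 'married' relation over the domain that satisfies the four axioms."""
    pairs = [(a, b) for a in domain for b in domain]
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, keep in zip(pairs, bits) if keep}
        symmetric = all((b, a) in rel for (a, b) in rel)
        irreflexive = all((a, a) not in rel for a in domain)
        both_married = all(any((who, x) in rel for x in domain)
                           for who in ("Charles", "Sharon"))
        if symmetric and irreflexive and both_married:
            yield rel

def status(atom, domains):
    """Classify an atom as true in all models, false in all models, or neither."""
    found = [atom in rel for d in domains for rel in models(d)]
    if all(found):
        return "entailed"
    if not any(found):
        return "refuted"
    return "unknown"

atom = ("Charles", "Sharon")
named_only = [("Charles", "Sharon")]                             # closed domain
with_unnamed = named_only + [("Charles", "Sharon", "someone_else")]  # open world stand-in

cda_status = status(atom, named_only)    # only the named individuals exist
owa_status = status(atom, with_unnamed)  # unnamed individuals may exist
# CWA: any atom that is not entailed is assumed to be false.
cwa_status = "entailed" if owa_status == "entailed" else "assumed false"
print(cda_status, owa_status, cwa_status)
```

Note that the axioms do not forbid someone from having two spouses, so the counter-model found in the larger domain has both Charles and Sharon married to `someone_else`.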

Related

Difference between homonyms and synonyms in data science with examples

Please share the difference between homonyms and synonyms in data science with examples.
Synonyms for concepts:
When you determine that two concepts are synonyms (say, sofa and couch), you use the owl:equivalentClass axiom. The entailment here is that any instance that was a member of class sofa is now also a member of class couch and vice versa. One of the nice things about this approach is that the "context" of this equivalence is automatically scoped to the ontology in which you make the equivalence statement. If you had a very small mapping ontology between a furniture ontology and an interior decorating ontology, you could say in the map that these two are equivalent. In another situation, if you needed to retain the (subtle) difference between a couch and a sofa, you would do that by merely not including the mapping ontology that declared them equivalent.
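To make the scoping point concrete, here is a minimal Python sketch of what the equivalence entails for class membership. It is not an OWL reasoner; the class names `furn:Sofa` and `decor:Couch`, the instance names, and the function `class_members` are all made up for illustration.

```python
def class_members(type_assertions, equivalences):
    """Return {class: set of instances}, sharing members between equivalent classes."""
    members = {}
    for instance, cls in type_assertions:
        members.setdefault(cls, set()).add(instance)
    changed = True
    while changed:  # repeat until no equivalence adds anything new
        changed = False
        for a, b in equivalences:
            merged = members.get(a, set()) | members.get(b, set())
            if members.get(a) != merged or members.get(b) != merged:
                members[a] = set(merged)
                members[b] = set(merged)
                changed = True
    return members

# Hypothetical data from a furniture ontology and a decorating ontology.
assertions = [("item1", "furn:Sofa"), ("item2", "decor:Couch")]
mapping_ontology = [("furn:Sofa", "decor:Couch")]  # the small mapping ontology

with_mapping = class_members(assertions, mapping_ontology)
without_mapping = class_members(assertions, [])
print(with_mapping["furn:Sofa"], without_mapping["furn:Sofa"])
```

Leaving out the mapping (the second call) keeps the two classes distinct, which mirrors the idea of simply not importing the mapping ontology.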
Homonyms for concepts:
As Led Zeppelin says, "and you know sometimes words have two meanings…" What happens when a "word" has two meanings is that we have what WordNet would call "word senses." In a particular language, a set of characters may represent more than one concept. One example is the English word "mole," for which WordNet has 6 word senses. The Semantic Web approach is to give each its own namespace; for instance, I might refer to the counterspy mole as cia:mole and the burrowing mammal as mammal:mole. (These are shortened qnames for what would be full namespace names.) The nice thing about this is, if the CIA ever needed to refer to the animal, they could unambiguously refer to mammal:mole.
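A tiny Python sketch of the qname idea follows. The namespace IRIs under `http://example.org/` are placeholders, not real vocabularies, and `expand` is a hypothetical helper.

```python
# Hypothetical prefix-to-namespace table; the IRIs are placeholders.
PREFIXES = {
    "cia": "http://example.org/cia#",
    "mammal": "http://example.org/mammal#",
}

def expand(qname):
    """Expand a shortened qname such as 'cia:mole' into a full IRI."""
    prefix, local_name = qname.split(":", 1)
    return PREFIXES[prefix] + local_name

spy = expand("cia:mole")
animal = expand("mammal:mole")
print(spy)
print(animal)
```

The same local name "mole" ends up as two different identifiers, so the two senses can never be confused with each other.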
1. Homonyms are words that have the same sound but different meanings.
2. Synonyms are words that have the same or almost the same meaning.
Homonyms
Machine learning algorithms are now the subject of ethical debate. Bias, in layman's terms, is a pre-formed view created before facts are known. In machine learning and data mining, it refers to an estimation procedure's tendency to produce estimates or predictions that are, on average, off target.
A policy's strength can be measured in a variety of ways, including confidence. "Decision trees" are diagrams that show how decisions are made and what consequences are available. To normalise a statistic is to rescale it to match the scale of other variables in the model.
Confidence is a statistician's metric for determining how reliable a sample is (we are 95 percent confident that the average blood sugar in the group lies between X and Y, based on a sample of N patients). Decision tree algorithms are methods that divide data into pieces that become more and more homogeneous in terms of the outcome measure as they advance.
To statisticians, a graph is a graphical representation of data, also called a plot or chart. To computer programmers, a graph is a data structure that contains the ties and links among items. The act of arranging relational database tables and their columns such that table relationships are consistent is known as normalisation.
Synonyms
Statisticians use the terms record, instance, sample, or example to describe their data. In computer science and machine learning, this can be called an attribute, input variable, or feature. The term "estimation" is also used, though its use is generally limited to numeric outcomes.
Statisticians call the non-time-series data format a record, or record. In statistics, estimation more often refers to the use of a sample statistic to measure something. Predictive modelling involves developing aggregations of low-level predictors into more informative "features".
The spreadsheet format, in which each column is still a variable, so each row is a record, is perhaps the most common non-time-series data type. Modeling in machine learning and artificial intelligence often begins with some very low-level prediction data.

Harmonizing terms in two different RDF ontologies

At first this problem seems trivial: given two ontologies, which term in ontology A best refers to a term in ontology B.
But its simplicity is deceptive: this problem is extremely hard and has led to thousands of academic publications, without any consensus on how to solve it.
Naively, one would expect that simply looking at the term "Heart Attack" in both ontologies would suffice.
However, ontologies almost never encode the same phrase.
In simple cases "Heart Attack" might be coded as "Heart Attacks", or "Heart attack (non-fatal)", but in more complicated cases it might only be coded as "Myocardial infarction".
In other cases it is even more complicated, for example dealing with compound (composed) terms.
More importantly, simply matching the term (or string) ignores the "ontological structure".
What if "Heart Attack" in ontology A is coded as caused-by high blood pressure, whereas in ontology B it is coded as withdrawal-from-trial-non-fatal?
In this case it might be valid to match the two terms, but not trivially so.
And this assumes the equivalent term exists at all.
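The limits of plain string matching described above can be seen with a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as one possible lexical similarity measure; that choice, and the helper name `lexical_similarity`, are assumptions of this example, not something any particular matcher is known to use.

```python
from difflib import SequenceMatcher

def lexical_similarity(term_a, term_b):
    """Case-insensitive string similarity between two labels, in the range [0, 1]."""
    return SequenceMatcher(None, term_a.lower(), term_b.lower()).ratio()

# Candidate labels taken from the examples in the question.
target = "Heart Attack"
candidates = ["Heart Attacks", "Heart attack (non-fatal)", "Myocardial infarction"]

scores = {label: lexical_similarity(target, label) for label in candidates}
for label, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:.2f}  {label}")
```

The near-identical spellings score high, while "Myocardial infarction", which means the same thing, scores low. A purely lexical matcher would miss it, and it also knows nothing about the ontological structure around each term.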
It's a classical problem called Semantic/Ontology Matching, Alignment, or Harmonization. The research out there involves lexical similarity, term usage in free text, graph homomorphisms, curated mappings (like MeSH/WordNet), topic modeling, and logical inference (first- or higher-order logic). But which is the most user friendly and production ready solution, that can be integrated into a Java(/Clojure) or Python app? I've looked at Ontology matching: A literature review but they don't seem to recommend anything ... any suggestions or experiences?
Have a look at http://oaei.ontologymatching.org/2014/results/ . There were several tracks open for matchers to be sent in and be evaluated. Not every matcher participates in every track. So you might want to read the track descriptions and pick one that seems to be the most similar to your problem. For example if you don't have to deal with multiple languages you probably don't have to check the MultiFarm track. After that check the results by having a look at Recall, Precision and F-Measure and decide for yourself. You also might want to check out some earlier years.
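For reading those result tables, it may help to see how the three measures are computed for an alignment. The sketch below assumes an alignment is simply a set of (term in A, term in B) pairs compared against a reference alignment; the pairs themselves are invented for illustration.

```python
def evaluate_alignment(found, reference):
    """Return (precision, recall, f_measure) of a found alignment against a reference."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical reference alignment and matcher output.
reference = {("A:HeartAttack", "B:MyocardialInfarction"),
             ("A:Stroke", "B:Stroke"),
             ("A:Hypertension", "B:HighBloodPressure"),
             ("A:Diabetes", "B:DiabetesMellitus"),
             ("A:Asthma", "B:Asthma")}
found = {("A:Stroke", "B:Stroke"),
         ("A:Asthma", "B:Asthma"),
         ("A:Diabetes", "B:DiabetesMellitus"),
         ("A:HeartAttack", "B:HeartFailure")}  # one wrong match

precision, recall, f_measure = evaluate_alignment(found, reference)
print(precision, recall, round(f_measure, 3))
```

Here 3 of the 4 proposed matches are correct (precision 0.75) and 3 of the 5 reference matches were found (recall 0.6).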

Do I own the copyright to code I write for school projects? [closed]

I'm a student at a US college, and I've been assigned a programming project to complete on my own. I wrote a program to solve a somewhat complex problem, and I'd like to release it under an open-source license so that others can use it and learn from it. However, I'm not entirely sure to whom the code's copyright belongs. The class' syllabus says nothing about the ownership of code produced for the class, but I don't want to take any chances.
Do I own the code?
If nothing is stated by the school that you signed and agreed to...copyright defaults to you, the author.
When you're paid it's a different set of rules - Look in the comments of this answer for some excellent resources from Stephen C and outis. With anything legal it's safest to get an opinion from the experts, in this case a lawyer. (Always a good idea actually, water pipes broken? call a plumber)
IANAL, I would assume it should be specified in University policies.
For instance, the University Policy Office of the University of California specifies a Copyright Ownership Policy:
"This Policy is intended to embody the spirit of academic tradition, which provides copyright ownership to faculty for their scholarly and aesthetic copyrighted works, and is otherwise consistent with the United States Copyright Law, which provides the University ownership of its employment-related works"
It also clarifies to whom the policy applies; for example, ucop.edu says:
"This Policy applies to University employees, students, and other persons or entities using designated University facilities"
Now, regarding student work, it specifies as follows:
"Ownership of copyrights to student works shall reside with the originator."
where originator is also clearly defined:
"One who produces a work by his or her own intellectual labor."
Given this example, I would ask the relevant office at your university to provide you with such a policy document. If no document is available, I think you need to refer to the applicable law, but the absence of a policy should mean the copyright belongs inherently to you.
License it before you turn it in. :)
Do you think it's going to be a problem? What does the professor say? I'd just ask to see if it's going to be a problem before I got too worried about it. The university probably has someone you can ask too. Find that person and ask.
However, there's no contract where there is no exchange of value. They can't own it and also not give you something for it. When I worked at Universities, the ownership was always specified in my contract, whether the work belonged to the school, the government, the grant funder, etc.
Some of this might depend on state law, as well. There was a case in the Perl world where an employer asserted rights to open source code because New York law states that even work an employee does in the same field unrelated to his employment is also property of the employer.
If it really matters to you though, find a free law center near you and get an answer from a real lawyer.
Don't do this. Even if you own the copyright, the school will accuse you of academic misconduct for providing answers to younger students; we had several cases like this at Ohio State.
Yes, I think that's stupid too, but instructors reuse course work for years and stuff like this would make it very hard for them to combat cheating. If you want to share reusable components, make sure it doesn't explicitly have the answers to any classwork.
Edit: If the program you wrote is interesting enough, and doesn't explicitly seem like classwork though, it's definitely worth it to talk to the professor, and maybe even work with him to write a professional journal article about it.
I've been reading Open Source Licensing: Software Freedom and Intellectual Property Law by Lawrence Rosen, Prentice Hall, New Jersey, 2005. Very good book. I recommend it. He discusses mostly Copyright Law in the context of Software Development, talks briefly about Patents, and mentions Trademarks.
Copyright protection subsists ... in original works of authorship fixed in any tangible medium of expression ... from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device ... (17 U.S.C. § 102) ... In the case of a work made for hire, the employer or other person for whom the work was prepared is considered the author... (17 U.S.C. § 201(b)) (Rosen, pp. 19-20)
In your case Evan, as a student, I recommend taking your time to read this book. It even includes copies of and discusses the Licenses: BSD, MIT, Apache, Artistic, GPL, Mozilla, CPL, and the author's own OSL and AFL. Of course, since the book's publishing in 2005, there are newer versions of those licenses available online. The author recommends, and is a member of, www.opensource.org/.
However, in light of your disclosure of solving a problem for class: Section 101 of the copyright law defines a "work made for hire" as: ... 2 a work specifically ordered or commissioned for use as: ... answer material for a test ... (http://www.copyright.gov/circs/circ01.pdf, April 2012, p. 2)
I suppose one could argue that you were ordered to create answer material for a test: but are you not the one who might be considered the employer? Maybe that would depend upon whether you attend a public or private school? I agree that it is just rude to publish answers to quizzes/tests, but I don't believe that is your intent. At this point I would recommend discussing it further with your professor and taking his advice to further your development beyond the scope of just an answer for a test into a truly original work of authorship.
However, from what I believe, and perhaps I am slightly presumptuous: it's all yours to do with what you wish. Realize that, unless they agree to whichever license you propose, nobody may copy your work without infringing on your copyright. Enforcing and proving that right requires a few extra steps, as briefly described in the copyright.gov publication.
Good Luck Mr Evan. I was delighted and envious to hear about your wonderful summer: I hope, like you said, next summer will be even better. Intellectual property is a concept I often stumble over, but I hope to be fearless moving forward as I continue to enlighten myself with such books.
If you wrote a paper for an English or history class, you'd own that, wouldn't you? It's your work, and if someone else copied it and turned it in, that would be plagiarism and presumably they'd fail the project at the very least.
If you were paid to write said code, whoever paid you owns the rights to the code. If you were not paid (like for a class project), you own all the rights to the code.

EMR (Electronic Medical Record) standard record format?

A few associates and I are starting an EMR project (Electronic Medical Records). I have heard talk in the past, and more so lately, about a standard record format to facilitate the transfer of records, when appropriate (HIPAA), from one facility to another. Has anyone seen any information on this?
You can look to HL7 for interoperability between systems (http://www.hl7.org/). Patient demographic information and textual notes can be passed. I've been out of the EMR space too long to know if any standards groups have done anything interesting of late. A standard format that maintains semantic meaning is a really, really difficult problem. See SnoMed (http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html) for one long-running ontology effort -- barely the start of a rich interchange format.
A word of warning from someone who spent several years with an upstart EMR vendor...This is a very hard business to be in. Sales cycles for large health systems literally can take years, and the amount of hand-holding required for smaller medical practices can quickly erode margins. Integration with existing practice management systems is non-standard, even if those vendors claim otherwise. More and more issues abound. I'm not sure that it's a wise space for an unfunded start-up to enter.
I think it's an error to consider HL7 to be a standard in the sense you seem to mean. It is heavily customized and can be quite different from one customer to the next. It's one of those standards with too much flexibility.
I recommend you read the standard (which should take you a while), then try to find a community of developers working with the standard. Ask them for horror stories, and be prepared for what you'll hear.
A month late, but...
The standard to shoot for is definitely HL7. It is used in many fields, so it is highly customizable, but there is a well-defined standard for healthcare. Each message (ACK, DSR, MCF), segment (PID, PV1, OBR, MSH, etc.), sequence and event type (A08, A12, A36) has a specific meaning regardless of your system of choice.
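To give a feel for the message, segment and event structure mentioned above, here is a minimal Python sketch that splits an HL7 v2 style message by hand. The sample message is invented for illustration (sender, facility and patient values are made up), and the sketch ignores escaping, repetitions and custom delimiters, so it is not a substitute for a real HL7 library or interface engine.

```python
# Invented sample message: segments end with a carriage return, fields are
# separated by '|' and components by '^'.
message = (
    "MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|202001011200||ADT^A08|MSG0001|P|2.3\r"
    "PID|1||12345^^^HOSP||DOE^JOHN\r"
    "PV1|1|I"
)

def parse_segments(raw):
    """Return {segment name: list of fields} (keeps the last segment of each name)."""
    segments = {}
    for line in raw.split("\r"):
        if line:
            fields = line.split("|")
            segments[fields[0]] = fields
    return segments

segments = parse_segments(message)

# In MSH the field separator itself counts as MSH-1, so MSH-9 (message type)
# sits at list index 8 after splitting on '|'.
message_code, event_type = segments["MSH"][8].split("^")[:2]
# For other segments, field N is simply list index N: PID-5 is the patient name.
family_name, given_name = segments["PID"][5].split("^")[:2]
print(message_code, event_type, family_name, given_name)
```

A filter of the kind an interface engine applies could then route or tweak messages based on values such as `message_code` and `event_type`.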
We haven't had a problem interfacing MiSYS, Statlan, Oacis, Epic, MUSE, GE Centricity/Lastword and others sending DICOM, ADT, PACS information between the systems we have in use. Most of these systems will be set up with an interface engine to tweak messages where needed, so adding a way to filter HL7 messages as they come through to your system, and as they go out to the downstreams, would be a must.
Even if there would be a new "presidential standard" for interoperability, and I would hazard a guess that it will be HL7 anyway, I would build the system with HL7 messaging as this is currently the industry accepted standard.
While solving interoperability, you shouldn't care only about the interchange format, the local storage formats should be standardized also, to simplify the transformation to the interchange format and vice versa.
openEHR is a great format for storage, it is more expressive than HL7 v2, v3 and CDA, so it can be transformed easily to any of those. The specs are open and here: http://openehr.org/programs/specification/releases/1.0.2
For the interchange format, any of HL7 v2, v3 and CDA are good. Also consider CCR and CCD.
http://www.aafp.org/practice-management/health-it/astm.html
If you want to go outside HL7 thinking and are looking for a comprehensive EMR or EHR with a specified record format rather than a record extract message interchange format, then have a look at openEHR, http://www.openehr.org/. The ISO 13606 extract standard is (almost) a subset of openEHR. You will also find open source reference libraries and openEHR implementations of different maturity available in Java, .NET, Ruby, Python, Groovy etc.
Some organisations are also producing HL7 artifacts like CDA as output from openEHR based EHR/EMR systems.
Have a look at the Continuity of Care Record--IIRC, that's what Google Health uses for input. It's not an HL7-family standard (there's a competing HL7-family standard--don't recall what it's called off-top).
There likely will not be a standard medical record format until the government dictates the format of one and requires its use by force of law.
That almost assuredly will not happen without socialized national health care. So in reality zero chance.
That is a correct answer, but I think something should be added about meaningful use of EMRs: Officials Announce ‘Meaningful Use,’ EHR Certification Criteria
Last week, CMS released proposed regulations defining the “meaningful use” of electronic health records, Reuters reports (Wutkowski/Heavey, Reuters, 12/31/09).
In addition, the Office of the National Coordinator for Health IT released an interim final rule describing the required certification standards for EHR technology (Simmons, HealthLeaders Media, 12/31/09).
Under the 2009 federal economic stimulus package, health care providers who demonstrate meaningful use of certified EHRs will qualify for incentive payments through Medicaid and Medicare.
Officials will offer a 60-day public comment period after both regulations are published in the Federal Register on Jan. 13. The interim final rule on EHR certification is scheduled to take effect 30 days after publication (Goedert, Health Data Management, 12/30/09). http://www.myemrstimulus.com/
This is a very hard problem because data collection starts with an MD and the only coding they know (ICD and CPT) is all about billing, not anything likely to be of use between providers (esp. in a form where the MD can be held legally liable). And they hate even that much paperwork.
Add to that the fact that HIPAA dictates that the patient not the provider owns the data. Not that they could understand it or do anything useful with it if they had it.
Good luck. Whatever happens will result from coercion by the govt and be a long long time coming IMHO.
Interestingly the one source of solid medical info turns out to be the VA (because they don't have the adversarial issues of payment and legal liability.) Go figure. That might be a good place to start for a standard with any existing data and some momentum, though. Here's another question with some info.

Are zip code and postal code violation of 3rd normal form?

Given that state information is implicit in the zip code, isn't storing both of them a violation of third normal form? Can or should you simply combine them into one field?
According to this post, there are a few zip codes that cross state boundaries. So no, it is not a violation of 3NF.
Actually, there are a few rare cases where a ZIP Code crosses state boundaries. Usually it is due to access problems, such as being on a military base, or due to constraints of the transportation network.
One such case is Protem, Missouri (ZIP Code 65733). Some of the Arkansas roads north of Bull Shoals Lake can best be accessed by the Protem delivery unit rather than an Arkansas post office. Some examples of such roads include Ann Street, Kalijah Road, McBride Road, Red Oak Lane, and Vance Road on Highway Carrier Route H002 in ZIP Code 65733. McBride Road actually crosses across the state boundary. If you look at the road network in an online mapping program, you can see that a rural carrier from say, nearby Diamond City, AR (ZIP code 72644), on the south side of Bull Shoals Lake, would need to drive several miles to be able to access the roads listed above.
For another example, Fort Campbell, Kentucky (ZIP Code 42223) also has some roads that exist within Tennessee.
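The argument above amounts to saying that the functional dependency ZIP code → state does not hold. A short Python sketch of how one might test that on address rows is given below. The rows are simplified from the examples in the quote (ZIP 65733 serving addresses in both Missouri and Arkansas); the column names and the helper `fd_violations` are assumptions of this sketch.

```python
def fd_violations(rows, determinant, dependent):
    """Return {determinant value: set of dependent values} wherever the FD fails."""
    seen = {}
    for row in rows:
        seen.setdefault(row[determinant], set()).add(row[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}

# Simplified rows based on the examples quoted above.
addresses = [
    {"city": "Protem", "state": "MO", "zip": "65733"},
    {"city": "Kalijah Road area", "state": "AR", "zip": "65733"},
    {"city": "Diamond City", "state": "AR", "zip": "72644"},
]

violations = fd_violations(addresses, "zip", "state")
print(violations)
```

Because at least one ZIP code maps to two states, state is not determined by ZIP code alone, so keeping both columns is not a transitive dependency in the 3NF sense.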
That statement isn't actually true in all geographical areas. Australia has a few sister cities that straddle state boundaries yet share the same postcode.
And 3NF, while incredibly useful, is not inviolable. I've sometimes reverted some table information back to 2NF for performance reasons.
Nope. There are some zip codes that cross state lines. See Wikipedia for some examples. Furthermore, normalization reduces redundancy, while addresses are actually fairly complicated things that are easy to get one component of wrong. Redundancy means that even if part of the address is wrong, there is a good chance that the mail will be able to get where it's going.
I recall a time when a hiker from Europe stayed at my fraternity, and wanted to send a thank-you note. He did not understand American addresses or geography very well, so when he sent the note it was addressed to "<fraternity name> <not quite correct name of university> New England? USA". The mail actually got there, amazingly enough.
Redundancy in addresses can be a very good thing, and you generally shouldn't assume more about an address than you need to. For instance, some people don't have a street number; you put "general delivery", and the mailman is expected to know where the letter goes (or you can pick it up at the post office if he doesn't).
There is a different issue. You might want to make a difference between the data that was entered (which could be conflicting) and the conclusion you make from that.
3NF violation by example
Consider a denormalized table for a blog posts project, keyed by Post Id, that also stores an Author Email on every row. It is not in 3rd normal form. If there are multiple posts by the same author, we may update a few rows and leave others un-updated, leaving the table data inconsistent.
This violates a common way of describing tables in 3rd normal form, which is that every non-key attribute in the table must provide a fact about the key, the whole key and nothing but the key. That is a play on words on what you say in a US courtroom: telling the truth, the whole truth and nothing but the truth. The key in this case is the Post Id, and the non-key attribute Author Email does not follow that rule, because it in fact tells something about the author rather than about the post. So the table violates 3rd normal form and does not achieve the goals of normalization.
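The update anomaly can be sketched with plain Python data. The rows, column names and email addresses below are hypothetical; only the Post Id key and the Author Email attribute come from the description above.

```python
# Denormalized: the author's email is repeated on every post row.
posts_denormalized = [
    {"post_id": 1, "title": "First", "author_id": 7, "author_email": "old@example.com"},
    {"post_id": 2, "title": "Second", "author_id": 7, "author_email": "old@example.com"},
]
# Updating only one of the rows leaves two different emails for the same author.
posts_denormalized[0]["author_email"] = "new@example.com"
emails_seen = {row["author_email"] for row in posts_denormalized if row["author_id"] == 7}
inconsistent = len(emails_seen) > 1

# Normalized: the email is a fact about the author, so it lives in an authors table.
authors = {7: {"email": "old@example.com"}}
posts = [
    {"post_id": 1, "title": "First", "author_id": 7},
    {"post_id": 2, "title": "Second", "author_id": 7},
]
authors[7]["email"] = "new@example.com"  # a single update, nothing left behind
emails_after_update = {authors[post["author_id"]]["email"] for post in posts}
print(inconsistent, emails_after_update)
```

In the normalized layout every post sees the new email through the author reference, so the inconsistency cannot arise.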
hope this helps.

Resources