How to design a database for well-structured Java API documentation?

I need to design a database for Java API documentation, in which I have to present information about every class and method in a given piece of code. For example, consider:
public class Example {
    public static void main(String[] args) {
        String foo = "test";
        foo.substring(1, 2);
    }
}
Here, I have to show the documentation for the class String and the method substring from the Javadoc (the classes and methods can be any valid class or method).
My observations:
Class names may repeat across packages, so a simple class name is not unique. The same goes for method names.
A method name such as foo() can refer to:
1) a method declared in this class,
2) a method that overrides a method of some parent class, or
3) a method that is simply inherited.
With this in mind, I have the following tables:
1)
CREATE TABLE "JAVACLASSDESCRIPTION"
( "CLASSFULLNAME" VARCHAR2(400) NOT NULL ENABLE,
"CLASSNAME" VARCHAR2(400),
"CLASSDEFINATION" CLOB,
"CLASSDECLARATION" CLOB,
"INHERITEDCLASSES" CLOB,
CONSTRAINT "JAVACLASSDESCRIPTION_PK" PRIMARY KEY ("CLASSFULLNAME") ENABLE
) ;
INHERITEDCLASSES is a multi-valued attribute. I know that is a poor design, but I have my reasons. A method is looked up as follows:
1) First, check whether the method is in the JAVAMETHODDESCRIPTION table (either as a method of the class itself or as an override).
2) If not, it must be a method inherited from some parent class, so we have to show the documentation of that parent class's method. To avoid searching multiple times, INHERITEDCLASSES contains a value like the following (for an arbitrary class):
java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
That is, the parent class (here java.lang.Object) followed by the list of methods inherited from it, so that it is easy to match the method name.
2)
CREATE TABLE "JAVAMETHODDESCRIPTION"
( "CLASSFULLNAME" VARCHAR2(400) NOT NULL ENABLE,
"METHODNAME" VARCHAR2(400) NOT NULL ENABLE,
"METHODDECLARATION" VARCHAR2(400),
"METHODDEFINATION" CLOB
) ;
ALTER TABLE "JAVAMETHODDESCRIPTION" ADD CONSTRAINT "JAVAMETHODDESCRIPTION_FK" FOREIGN KEY ("CLASSFULLNAME")
REFERENCES "JAVACLASSDESCRIPTION" ("CLASSFULLNAME") ON DELETE CASCADE ENABLE;
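The two-step lookup described above (check JAVAMETHODDESCRIPTION first, then fall back to the INHERITEDCLASSES list) can be sketched in plain Python. This is a minimal sketch that assumes both tables have already been loaded into dictionaries; the helper names parse_inherited and resolve_method are hypothetical, and the assumption that several parent entries are separated by ';' is mine, since the question only shows a single entry.

```python
def parse_inherited(value):
    """Parse an INHERITEDCLASSES value such as
    'java.lang.Object: clone, equals, wait'
    into {'java.lang.Object': ['clone', 'equals', 'wait']}.
    Assumption: several parent entries would be separated by ';'."""
    result = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        parent, _, methods = entry.partition(":")
        result[parent.strip()] = [m.strip() for m in methods.split(",") if m.strip()]
    return result


def resolve_method(class_name, method_name, declared_methods, inherited_classes):
    """Return the class whose documentation should be shown, or None.

    declared_methods:  {class full name: set of method names}  (JAVAMETHODDESCRIPTION)
    inherited_classes: {class full name: INHERITEDCLASSES text} (JAVACLASSDESCRIPTION)
    """
    # Step 1: the method is declared (or overridden) in the class itself.
    if method_name in declared_methods.get(class_name, set()):
        return class_name
    # Step 2: otherwise it must be inherited; find the parent that lists it.
    for parent, methods in parse_inherited(inherited_classes.get(class_name, "")).items():
        if method_name in methods:
            return parent
    return None
```

Note that the three wait overloads collapse to the same name, so matching by name alone cannot tell them apart.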
I know there are a lot of design issues. How can I improve my database design?
EDIT:
About the multi-valued entry: decomposing it into another table may result in redundant entries. For example, the Object class is the superclass of every class.

It's hard to cover everything because a lot is unspecified, but here are a few things to consider:
A loaded Java class is identified by its unique fully qualified name (x.y.z.Myclass$Inner.class) and its class loader. If you don't care about loaded classes but only about the source code (Javadoc), you can probably skip the class loader.
Javadoc can exist at the method, class, package, and field level (each of these can be uniquely identified by its signature).
If you want to support Javadoc for inherited methods, you need to model multiple inheritance (Javadoc can also be inherited from interfaces) and have your application traverse that tree upward on request to display the Javadoc. The other option is to do the traversal when saving the content to the database.
Another thing is versioning of libraries and the JDK; I don't know whether you want to support different versions.
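One way to follow this advice without a multi-valued column is sketched below with Python's built-in sqlite3 module. It is an illustration under assumptions, not a production schema: the table and column names (java_class, class_supertype, java_method) are hypothetical, each direct supertype (superclass or interface) gets its own row, and a recursive CTE walks up the hierarchy on request to find the nearest type that declares a method. Because java.lang.Object is stored once and every class links to it through a single class_supertype row, the methods of Object are not duplicated, which addresses the redundancy concern from the question's edit.

```python
import sqlite3

SCHEMA = """
CREATE TABLE java_class (
    id        INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL UNIQUE,  -- e.g. java.lang.String
    doc       TEXT                   -- class-level Javadoc
);
-- One row per direct supertype (superclass or interface): no multi-valued column.
CREATE TABLE class_supertype (
    class_id INTEGER NOT NULL REFERENCES java_class(id),
    super_id INTEGER NOT NULL REFERENCES java_class(id),
    PRIMARY KEY (class_id, super_id)
);
-- Methods are keyed by signature, so overloads stay distinct.
CREATE TABLE java_method (
    class_id  INTEGER NOT NULL REFERENCES java_class(id),
    name      TEXT NOT NULL,
    signature TEXT NOT NULL,
    doc       TEXT,                  -- method-level Javadoc
    PRIMARY KEY (class_id, signature)
);
"""

# Walk up the type hierarchy, remembering how far away each ancestor is.
RESOLVE = """
WITH RECURSIVE ancestors(id, depth) AS (
    SELECT id, 0 FROM java_class WHERE full_name = ?
    UNION
    SELECT s.super_id, a.depth + 1
    FROM class_supertype AS s JOIN ancestors AS a ON s.class_id = a.id
)
SELECT a.depth, c.full_name, m.signature
FROM ancestors AS a
JOIN java_method AS m ON m.class_id = a.id
JOIN java_class AS c ON c.id = a.id
WHERE m.name = ?
ORDER BY a.depth, c.full_name, m.signature
"""


def resolve(conn, class_name, method_name):
    """Return (declaring type, signature) pairs from the nearest type(s) declaring the method."""
    rows = conn.execute(RESOLVE, (class_name, method_name)).fetchall()
    if not rows:
        return []
    nearest = rows[0][0]
    return [(cls, sig) for depth, cls, sig in rows if depth == nearest]


def build_demo_db():
    """A tiny, simplified slice of the JDK hierarchy for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO java_class (id, full_name) VALUES (?, ?)",
                     [(1, "java.lang.Object"), (2, "java.lang.CharSequence"), (3, "java.lang.String")])
    # String extends Object and implements CharSequence.
    conn.executemany("INSERT INTO class_supertype VALUES (?, ?)", [(3, 1), (3, 2)])
    conn.executemany("INSERT INTO java_method (class_id, name, signature) VALUES (?, ?, ?)", [
        (1, "toString", "toString()"),
        (1, "wait", "wait()"),
        (1, "wait", "wait(long)"),
        (1, "wait", "wait(long, int)"),
        (2, "length", "length()"),
        (3, "length", "length()"),
        (3, "toString", "toString()"),
        (3, "substring", "substring(int, int)"),
    ])
    return conn
```

Keying methods by signature keeps overloads such as wait() and wait(long) apart. The alternative mentioned above, resolving inheritance when saving, would replace the recursive query with a precomputed table.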

The best way to go about something like this is to get to third normal form. You may not stay there, but at least by "seeing it" you learn a few things along the way about your system, the relationships, and such.
http://en.wikipedia.org/wiki/Third_normal_form
Once you get to that stage, start looking at what you have and ask yourself how it will behave in your specific situation: how are you accessing it, is it going to perform poorly, would it help to denormalize things a little to improve query performance, etc.

Related

How to design a system backend in which users can customize some configuration

I need to model a system in which clients can apply some configuration to separate entities.
Let me explain with an example:
We have users that have a config tab in their dashboards.
We have a feature to send notifications on their browsers and we have a feature which we can send an email to them.
We also have a pop-up feature.
The user should be able to modify our default notification message, our default email template, and our default text in emails or other elements.
For the pop-up, the user should be able to modify the width and height of the pop-up, change the default texts, modify the background color, and change the button location on the pop-up.
When I send an email to a user, I should apply these settings to the template and then send the email. Likewise, when the front end shows those pop-ups, it gets these settings from my API and applies them.
The number of settings will keep growing, so I don't think it is a good idea to define a settings table with a fixed set of fields.
What can I do? How to design and model this scenario? What are the best practices?
Can I use a NoSQL database like MongoDB instead of a relational database?
Thanks a lot.
PS:
I am using Django to develop this system.
I have built similar sub-systems before, by hand.
I don't know much about Django, but do some research to see if it has any "out of the box" or community developed / open source add-ons that do what you want.
If you have to do it yourself...
A key-value pair is not going to be enough, but it's close. You only need a simple data structure:
ID (how your code recognizes this property), e.g. UserPopupBackgroundColor.
Property name (what the user sees / how they recognize this property in the UI), e.g. "Popup Background Color".
Optional - Data type. This is essential if you want to do any sensible input validation. E.g. pop-up height should probably expect an integer and have a sensible min/max value on it, whereas an email address is totally different.
Optional - some kind of flag to identify valid properties.
That last flag is a bit of an edge case, but it's useful if you use the subsystem to hold more properties than you want users to have access to. E.g. imagine you want to get a list of all properties and display the list to the user: are there any 'special' ones you need to filter out that they should not see?
You then need somewhere to put the values, and link them to the user:
Row ID / GUID. You can use a unique constraint across the User and PropertyID if you wanted to instead, but personally I find a unique row ID is a reliable and flexible approach for most scenarios.
UserID.
PropertyID - refers to ID mentioned above.
PropertyValue
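A minimal sketch of the two structures described above, using Python's built-in sqlite3 module. The table and column names are hypothetical, and the default_value column is an assumption (the question talks about "our default" texts and colors, so the defaults have to live somewhere). Values are kept in a single string column, the simplest option.

```python
import sqlite3

SCHEMA = """
CREATE TABLE property (
    id            TEXT PRIMARY KEY,           -- how the code recognizes it, e.g. UserPopupBackgroundColor
    name          TEXT NOT NULL,              -- what the user sees, e.g. Popup Background Color
    data_type     TEXT,                       -- optional, used for input validation
    user_visible  INTEGER NOT NULL DEFAULT 1, -- optional flag to hide internal properties
    default_value TEXT                        -- assumption: the system default is stored here
);
CREATE TABLE property_value (
    row_id      INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    property_id TEXT NOT NULL REFERENCES property(id),
    value_text  TEXT,                         -- one string column for every type
    UNIQUE (user_id, property_id)
);
"""


def set_value(conn, user_id, property_id, value):
    """Update the user's value, or insert it if the user has none yet (keeps row_id stable)."""
    cur = conn.execute(
        "UPDATE property_value SET value_text = ? WHERE user_id = ? AND property_id = ?",
        (value, user_id, property_id))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO property_value (user_id, property_id, value_text) VALUES (?, ?, ?)",
            (user_id, property_id, value))


def get_settings(conn, user_id):
    """All user-visible settings for one user, falling back to the defaults."""
    rows = conn.execute(
        """SELECT p.id, COALESCE(v.value_text, p.default_value)
           FROM property AS p
           LEFT JOIN property_value AS v ON v.property_id = p.id AND v.user_id = ?
           WHERE p.user_visible = 1""",
        (user_id,))
    return dict(rows.fetchall())
```

The email sender and the pop-up API would both call something like get_settings and apply the result, so a new property only needs a new row in property.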
Depending on how serious you need to get, you can dump all the values into the one PropertyValue column (assuming you're persisting this in a database) - which means that column needs to be a string, or, you can add a column per data type.
If you want to add a column per data type, don't kill yourself. The most I have ever done is:
PropertyValue_text (text/varchar)
PropertyValue_int (or double)
PropertyValue_DateTime (date/time - surprise!!)
So when I say 'column per data type' I mean per data type your stack needs/wants to handle - not the 'optional' data types you define in the logic - since that data type is partially just about input validation.
Obviously, if you use different logical data types, you can map those to data type columns in the database. The reasons for doing this (using different data types in the database) are:
To reduce the amount of casting you need to do (code to database, and vice versa).
To leverage database-level query features, which can be useful. E.g. find email values and verify them; find expired date values; etc.
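The "column per data type" idea can be sketched as a mapping from logical data types (used for validation) to the three physical columns listed above. The logical type names, the validators, and the 50-2000 pixel range are hypothetical examples; the email check is deliberately naive.

```python
from datetime import datetime


def _pixels(raw):
    """Integer with an assumed sensible min/max, e.g. for pop-up width or height."""
    value = int(raw)
    if not 50 <= value <= 2000:
        raise ValueError(f"out of range: {value}")
    return value


def _email(raw):
    """Deliberately naive check, for illustration only."""
    if "@" not in raw:
        raise ValueError(f"not an email address: {raw!r}")
    return raw


# Hypothetical logical type -> (physical column, validator/converter).
LOGICAL_TYPES = {
    "text":     ("PropertyValue_text", str),
    "email":    ("PropertyValue_text", _email),
    "pixels":   ("PropertyValue_int", _pixels),
    "datetime": ("PropertyValue_DateTime", datetime.fromisoformat),
}


def to_columns(data_type, raw):
    """Validate a raw UI value and return {column name: typed value} for the insert."""
    column, convert = LOGICAL_TYPES[data_type]
    return {column: convert(raw)}
```

Several logical types can share one physical column (here text and email), which is why adding a new logical type is cheap once a few are in place.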
It takes a bit of work to build all this, but it's powerful once you get set-up because you can add any number of properties. If you are using the 'full' solution with explicit data types then adding new logical data types isn't too painful if you already have a few set-up.
Before you design and build this, think about future reuse and any way you can package it up for later, or for community use. Remember it impacts all layers (UI, logic, and data).
Final tip: when coming up with the property IDs (that the code uses), make them human-readable and use some sort of naming convention, so that adding new ones later is easy and follows a predictable path.
Update - Defining Property and PropertyValue in database tables is an obvious way to go. Depending on the situation you can also define Property in code - especially if you don't add new ones or change existing ones very frequently. Another bonus is that if you're in an MVP situation you can use the code effectively as a stub, and build out the database/persistence part for that later.
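Defining Property in code can be as small as a frozen dataclass and a dictionary keyed by the property ID. This is a hypothetical sketch: the class, the helper visible_properties, and the example IDs (which follow the human-readable naming convention suggested above) are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    id: str                    # how the code recognizes this property
    name: str                  # what the user sees in the UI
    data_type: str = "text"    # optional, drives input validation
    user_visible: bool = True  # optional flag to hide internal properties


# The registry lives in code; only PropertyValue rows need to be persisted.
PROPERTIES = {p.id: p for p in [
    Property("UserPopupBackgroundColor", "Popup Background Color"),
    Property("UserPopupWidth", "Popup Width", data_type="pixels"),
    Property("UserPopupHeight", "Popup Height", data_type="pixels"),
    Property("InternalTemplateVersion", "Template Version", user_visible=False),
]}


def visible_properties():
    """Properties that may be listed in the user's config tab."""
    return [p for p in PROPERTIES.values() if p.user_visible]
```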

Unique identifiers for UI automation

Our UI automation team is asking for a better way to select elements for their automated tests. My thinking is that we can inject a dedicated attribute (say "ui-auto") for each testable element. This attribute would have a value which is:
unique
persistent (doesn't change across sessions or page loads so as to not break the tests)
predictable (follows some naming convention depending on action type, location, etc.)
My questions are:
Is this a good idea? Better ideas are welcome.
Are there existing conventions for this?
What is the best way to implement this?
I should mention that we are using Angular, and I thought that some kind of directive and/or service would help automate this.
I should also say that I don't want to use the "id" attribute, because I'd like to keep development concerns (ids may be used by JavaScript) separate from QA concerns (selecting elements for automated tests).
In our implementation, we add a data-awt attribute to the DOM element; its value consists of a context (page and mode), a type, and a unique string. As we use the ExtJS library, our type is the xtype and the unique string is the component's name or text property. The context is developer-controlled: a unique property is placed on the uppermost parent, and all children use it as their context.
In practice we end up with data-awt values like devicesListing-button-edit, deviceDetails-displayfield-name, deviceDetailsEditWindow-textfield-name.
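The naming scheme can be captured in one small helper so that developers and testers build identifiers the same way. This is a hypothetical Python sketch (the function names awt_id and awt_selector are made up); it only builds strings and does not assume any particular test driver. Restricting each segment to letters and digits is an added assumption that keeps the '-' separator unambiguous.

```python
import re

_SEGMENT = re.compile(r"[A-Za-z0-9]+")


def awt_id(context, xtype, name):
    """Build a data-awt value such as 'devicesListing-button-edit'."""
    parts = (context, xtype, name)
    for part in parts:
        # A '-' inside a segment would make the value impossible to split back apart.
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid data-awt segment: {part!r}")
    return "-".join(parts)


def awt_selector(context, xtype, name):
    """CSS attribute selector that a test can pass to its driver."""
    return f'[data-awt="{awt_id(context, xtype, name)}"]'
```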
We found that relying on CSS, ids, or other attributes isn't reliable or predictable, since we don't want to rewrite our tests whenever there is some UI change. Now a test only needs updating if an existing element changes its name (for example, the PM says the name field should now use the 'customer' data from the DTO).
You can also use the class of the element and provide a unique identifier prefixed with something like "auto_" or "t_".
The agreement exists that if anyone changes the class name with that prefix, tests will break.
@o4ohel I agree that not using ids is better, as devs also depend on them and they need to change sometimes. Identifiers for automation should be isolated. It's nice to have that separation.

JDO, unowned relationship and persisting multiple objects at once

I would like to have an unowned relationship between two persistence-capable classes, both created at once. The relationship has to be unowned because my app can’t really guarantee that the two instances I want to persist will stay in the same entity group for good. My relationship is a bidirectional one-to-many:
// in first class
@Persistent
private Set<Key> instancesOfSecondClass;
// in second class
@Persistent
private Key instanceOfFirstClass;
Then, in one servlet doPost() call, I need to persist one instance of each of these classes. I actually wrote convenience methods to maintain both sides of the relationship. First, the Key id (I use …
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key id;
… as primary keys everywhere) is added to the Set in the first class, then a convenience method is called on the instance of the second class to notify it of this change and update its single Key property accordingly, comparing old values, doing null checks and everything. Indeed, I don’t yet have any Key while the instance is fresh, so I need to persist them both first with a pm.makePersistent() call, then do the assignment and wait for pm.close() to happen later.
But, what really happens, is pretty confusing. After running the servlet (whose sole purpose is to return serialized Keys of these two instances for use elsewhere), I go check the datastore viewer and see:
the instance of the first class got persisted (I actually have one more issue here: a NullPointerException is thrown from org.datanucleus.store.mapped.mapping.PersistenceCapableMapping.postInsert(PersistenceCapableMapping.java:1039) at the very moment I call pm.makePersistent() on the instance with the Set<Key>)
the instance of second class got persisted (so far so good)
the instance of second class has the Key reference to the first instance persisted, yay
the set of Keys on the first instance is… wait for it… empty
The local datastore shows just empty space, while the one online shows <null>, even though my constructor for the class (the one with arguments) creates a new instance of HashSet<T> for the relation.
Using Google App Engine for Java 1.6.4, happened in 1.6.3 too.
I spent a whole day trying to solve this: putting transactions in between, using two PersistenceManagers, changing the order of the persisting calls, enabling cross-group transactions. Nothing helped.
Generally, I would be happy to find a working way to create and update two instances in two separate entity groups with an unowned relationship between them, regardless of possible inconsistency (that doesn’t worry me much, given the expected frequency of updates).
In the meantime, I found two more possible solutions but haven’t tried them yet: 1. create not two but four PersistenceManagers (one to create the first instance, a second to create the second instance, a third to update one side of the relationship, and a fourth to update the other side), or 2. detach the instances and make them persistent again after the update. I'm not sure which way to go now.
v2 of the plugin provides "real" unowned relations with none of this Key hackery needed before. You can persist both sides in the same transaction if you have the multiple-Entity-Groups flag set, or you can persist them non-transactional.
Just look at the tests for ideas http://code.google.com/p/datanucleus-appengine/source/browse/trunk/tests/com/google/appengine/datanucleus/jdo/JDOUnownedOneToManyTest.java
I have no control over what goes in their docs, just what goes in their code ;-)

Silverlight LINQtoSQL: one big dataclass, or several small ones?

I'm new to Silverlight, but being dumped right into the fray - good way to learn I suppose :o)
Anyway, the webapp I'm working on has a relatively complex database structure that represents various object types that are linked to each other, and I was wondering 2 things:
1- What is the recommended approach when it comes to dataclasses? Have just one big dataclass, or try and separate it into several smaller dataclasses, keeping in mind they will need to reference each other?
2- If the recommended approach is to have several dataclasses, how do you define the inter-dataclasses references?
I'm asking because I did a small test. In my DB (simplified here, real model is more complex but that's not important), I have a table "Orders" and a table "Parameters". "Orders" has a foreign key on "Parameters". What I did is create 2 dataclasses.
The first one, ParamClass, where I dropped the "Parameters" table only, so I can have a nice "parameter" class. I then created a simple service to add basic SELECT and INSERT functionality.
The second one, OrdersClass, where I dropped both tables, so that the relation between the tables would automatically create a "EntityRef<parameter>" variable inside the "order" class. I then removed the "parameters" class that was automatically created in the OrdersClass dataclass, since the class has already been declared in the ParamClass dataclass. Again I created a small service to test it.
So far so good, it builds happily. The problem is that when I try to handle things on the application code, I added service references for both dataclasses, but it is not happy doing something like:
OrdersServiceReference.order myOrder = new OrdersServiceReference.order();
myOrder.parameter = new ParamServiceReference.parameter(); //<-PROBLEM IS HERE
It complains that it cannot implicitly convert from type 'MytestDC.ParamServiceReference.parameter' to 'MytestDC.OrdersServiceReference.parameter'
Do I somehow need to declare some sort of reference to ParamClass from OrdersClass, or how do I "convert" one to the other?
Is this even a recommended and efficient way of doing this?
Since it's a team-project, I initially wanted to separate the dataclasses so that they (and their services) can be easily checked out by one member without checking out the whole entire dataclass.
Any help appreciated!
PS: using Silverlight 4, in case that's important
Based on the widely accepted Single Responsibility Principle (SRP), a class should always be responsible for one task, and one task only.
That pretty much invalidates your "one big dataclass" approach.
I would always recommend smaller, more manageable bits that can be combined, instead of one humongous class that does everything (except brew coffee for you).
Resources for the SRP:
Wikipedia on SRP
OODesign: Single Responsibility Principle
ObjectMentor: list of articles on good app design - which has a few links to PDF documents, like this one on SRP written by Robert C. Martin - the "guru" on proper OO design
OK, some more research led me to this: it is not simple to separate classes from a relational model using LINQtoSQL. I ended up switching to an Entity Framework approach, which itself doesn't handle this gracefully (see here and there, for example), but at least it solved another major problem I had with LINQtoSQL.
There are other ORMs out there that are apparently much more capable at this (NHibernate comes up often in recommendations), unfortunately, I don't have time to investigate them now, being under such a tight deadline.
As for the referencing, it was quite simple, change the line to:
myOrder.parameter = new OrdersServiceReference.parameter();
even though I removed the declaration from that dataclass.
Hope this helps someone!

Django models generic modelling

Say, there is a Page that has many blocks associated with it. And each block needs custom rendering, saving and data.
From the code's point of view, the simplest approach is to define a different class (and hence a different model) for each block type. Simplified as follows:
from django.db import models


class Page(models.Model):
    name = models.CharField(max_length=64)


class Block(models.Model):
    # Abstract base: no table of its own, so there is no page.block_set.
    page = models.ForeignKey(Page, on_delete=models.CASCADE)

    class Meta:
        abstract = True


class BlockType1(Block):
    other_data = models.CharField(max_length=32)

    def render(self):
        """Some "stuff" here """
        pass


class BlockType2(Block):
    other_data2 = models.CharField(max_length=32)

    def render(self):
        """Some "other stuff" here """
        pass
But even with this code, I can't do a query like page.block_set.all() to obtain all the blocks, irrespective of block type.
The reason is that each model defines a different table. Working around this with a linking model and generic foreign keys can solve the problem, but it still requires queries against multiple database tables per page.
What would be the right way to model it? Can generic foreign keys (or something else) be used in some way to store the data, preferably in the same database table, while still achieving inheritance?
Update:
My point was: how can I still get OOP paradigms to work? Using the same method with lots of ifs is not what I want to do.
The best solution, it seems to me, is to create separate standard Python classes (preferably in a separate blocks.py) that define a save method, which saves the data and its "type" by instantiating the same model. Then create a template tag and a filter that call the render, save, and other methods based on the model's type.
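The idea of storing a "type" next to the data and dispatching to ordinary Python classes can be sketched without Django. In this hypothetical sketch, plain dicts stand in for rows of a single shared table (page id, type, serialized data); in a real project the row would be one Django model, and the registry, class names, and JSON serialization are assumptions rather than anything Django provides.

```python
import json
from html import escape

BLOCK_TYPES = {}  # type name -> Block subclass


def register(type_name):
    """Class decorator that records which class handles a given stored type."""
    def decorator(cls):
        cls.type_name = type_name
        BLOCK_TYPES[type_name] = cls
        return cls
    return decorator


class Block:
    type_name = None

    def __init__(self, data):
        self.data = data

    def to_row(self, page_id):
        # One shared table: page, type discriminator, serialized data.
        return {"page_id": page_id, "type": self.type_name, "data": json.dumps(self.data)}

    @classmethod
    def from_row(cls, row):
        # Polymorphic load: pick the subclass from the stored type, no if-chains.
        return BLOCK_TYPES[row["type"]](json.loads(row["data"]))

    def render(self):
        raise NotImplementedError


@register("text")
class TextBlock(Block):
    def render(self):
        return f"<p>{escape(self.data['body'])}</p>"


@register("image")
class ImageBlock(Block):
    def render(self):
        return f'<img src="{escape(self.data["src"])}">'
```

All blocks of a page then come back from one query on one table, and each row renders through its own class.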
Don't model the page in the database. Pages are a presentation thing.
First -- and foremost -- get the data right.
"And each block needs custom rendering, saving and data." Break this down: you have unique data. Ignore the "block" and "rendering" from a model perspective. Just define the data without regard to presentation.
Seriously. Just define the data in the model without any consideration of presentation or rendering or anything else. Get the data model right.
If you confuse the model and the presentation, you'll never get anything to work well. And if you do get it to work, you'll never be able to extend or reuse it.
Second -- only after the data model is right -- you can turn to presentation.
Your "blocks" may be done simply with HTML <div> tags and a style sheet. Try that first.
After all, the model works and is very simple. This is just HTML and CSS, separate from the model.
Your "blocks" may require custom template tags to create more complex, conditional HTML. Try that second.
Your "blocks" may -- in an extreme case -- be so complex that you have to write a specialized view function to transform several objects into HTML. This is very, very rare. You should not do this until you are sure that you can't do this with template tags.
Edit.
"query different external data sources"
"separate simple classes (not Models) that have a save method, that write to the same database table."
You have three completely different, unrelated, separate things.
Model. The persistent model. With the save() method. These do very, very little.
They have attributes and a few methods. No "query different external data sources". No "rendering in HTML".
External Data Sources. These are ordinary Python classes that acquire data.
These objects (1) get external data and (2) create Model objects. And nothing else. No "persistence". No "rendering in HTML".
Presentation. These are ordinary Django templates that present the Model objects. No external query. No persistence.
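The three-way split can be sketched in plain Python. This is a hypothetical illustration: Item stands in for the persistent model, VendorFeed for an external data source (its fetch callable is an assumed injection point), and render for what would normally be a Django template.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Item:
    """Stand-in for the persistent Model: attributes only, no fetching, no HTML."""
    name: str
    vendor: str


class VendorFeed:
    """External data source: acquires data and creates model objects, nothing else."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable returning a list of dicts

    def items(self):
        return [Item(name=r["name"], vendor=r["vendor"]) for r in self.fetch()]


def render(items):
    """Presentation only: stands in for a template; no queries, no persistence."""
    return "".join(f"<li>{escape(i.name)}</li>" for i in items)
```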
I just finished a prototype of a system that has this problem in spades: a base Product class and about 200 detail classes that vary wildly. There are many situations where we are doing general queries against Product, but then want to deal with the subclass-specific details during rendering. E.g. get all Products from Vendor X, but display with slightly different templates for each group from a specific subclass.
I added hidden fields for a GenericForeignKey to the base class and it auto-fills the content_type & object_id of the child class at save() time. When we have a generic Product object we can say obj = prod.detail and then work directly with the subclass object. Took about 20 lines of code and it works great.
The one gotcha we ran into during testing was that manage.py dumpdata followed by manage.py loaddata kept throwing IntegrityErrors. Turns out this is a well-known problem, and a fix is expected in the 1.2 release. We work around it by using MySQL commands to dump/reload the test dataset.
