What is the best practice for logging activity in a WCF service to a database (SQL Server), so that it can later be reviewed in a UI?
I have checked with a couple of people: one says that I could save the message as XML in an XML column in the database, whilst the other says that it's better to save it in standard relational tables.
The thing that points to the first (XML-based) solution is that it simplifies things, given that the business object sent from the service contains list properties. On the other hand, it introduces some complexity for handling the data and will probably require some kind of table-like view.
The plain old tables, however, make it more complicated to save the list properties in a smart way. On the other hand, it's simpler to write queries and use the data for possible future data warehousing/analysis.
Another thing that worries me is if/when the business object's schema changes. Which solution makes it easier to handle that, and in what way (especially for the UI part)?
Why not do both? Use an XML column that gives you the unstructured flexibility you seek, combined with multiple structured fields that enable easier filtering and viewing.
It's really a matter of opinion - there is no single correct answer to this question. I've seen it done both ways, as well as the hybrid approach suggested above. It just depends on your needs. Displaying XML in the UI isn't that much more difficult than single-value fields - you just need the right toolset (see LINQ to XML, XML computed columns, and XML indexes).
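For illustration, here is a minimal sketch of that hybrid layout (table and column names are hypothetical, not from the question): a few promoted scalar columns carry the values the UI filters on, while the full message, list properties and all, stays in an XML column backed by an XML index.

-- Hypothetical hybrid log table: promoted scalar columns for filtering,
-- with the full serialized business object kept as XML.
CREATE TABLE dbo.ServiceLog
(
    LogId         int IDENTITY(1,1) PRIMARY KEY,
    LoggedAt      datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    OperationName nvarchar(200) NOT NULL,  -- promoted field for cheap filtering
    MessageBody   xml NOT NULL             -- complete message, lists and all
);

-- A primary XML index speeds up XQuery predicates against MessageBody.
CREATE PRIMARY XML INDEX IX_ServiceLog_MessageBody
    ON dbo.ServiceLog (MessageBody);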
I'm looking into storing CQRS read models in SQL Server tables due to legacy system concerns (see approaches 2 & 3 of this question).
While I'd like to implement the read models using a document database such as MongoDB, due to outside systems that can't be reworked at this time, I'm stuck with keeping everything in the RDBMS for now.
Since I'm looking at storing records in a properly de-normalized way, what's the best way to actually store them when dealing with hierarchical data, such as the typical Customer / Order / LineItems, etc., that must all be displayed in the same view? [EDIT: What I'm thinking is that I put the data needed to query the model in separate fields, but the full object in an "object data field" alongside them]
Due to my legacy systems (mostly out of my control) I'm thinking that I'll add triggers to the legacy system tables or make sproc changes to keep my read models current, but how should I actually store the data itself?
I considered simply storing them as JSON in a field, or storing them as XML, as both can easily be serialized/deserialized from a .NET application, and can reasonably easily be updated by triggers from other activities in the database. (XPath/XQuery isn't so bad when you get used to it, and from another answer here, I found a JSON parser for T-SQL.)
Is there a better approach? If not, should I use XML or JSON?
I would go with XML, as it has built-in support in SQL Server. In general I would avoid using any additional stuff written in T-SQL, as maintaining it can be a nightmare.
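As a minimal sketch of that built-in support (table, column, and element names here are hypothetical), reading a scalar out of a stored XML read model uses value(), and a trigger or sproc can patch a single node in place with XML DML via modify():

-- Hypothetical read model table with an XML column named ReadModel.
DECLARE @CustomerId int = 42, @NewName nvarchar(100) = N'Contoso';

-- Read one scalar out of the stored XML.
SELECT ReadModel.value('(/Customer/Name)[1]', 'nvarchar(100)') AS CustomerName
FROM   dbo.CustomerReadModels
WHERE  CustomerId = @CustomerId;

-- Patch a single node in place (e.g. from a trigger on a legacy table).
UPDATE dbo.CustomerReadModels
SET    ReadModel.modify('replace value of (/Customer/Name/text())[1]
                         with sql:variable("@NewName")')
WHERE  CustomerId = @CustomerId;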
I want to make a database that can store any kind of object, with different features for each class of objects.
Given some of the questions I asked on different forums, the solution is either the entity-attribute-value model (http://en.wikipedia.org/wiki/Entity-attribute-value_model) or XML (http://en.wikipedia.org/wiki/Xml) with some kind of validation before storage.
Can you please give me an alternative to the two above, or some advantages or examples that would help me decide which of the two methods is the best one in my case?
Thanks
UPDATE 1:
Is your db read or write intensive?
It will be both - it's an auction engine.
Will you ever conceivably move off SQL Server and onto another platform?
I won't move it, I will use a WCF Service to expose functionality to mobile devices.
How do you plan to surface your data to the application?
Entity Framework for the DAL and a WCF service layer for the business logic.
Will people connect to your data through means other than those you control?
No
While #marc_s is correct in his cautions, there unarguably are situations where the relational model is just not flexible enough. For quite a number of years now, I've been working with a database that is straightforwardly relational for the most part, but has a small EAV part. This is because users can invent new properties at any time for observation purposes in trials.
Admittedly, it is awkward with respect to querying and reporting, to name a few things, but no other strategy would suffice here. We use stored procedures with T-SQL's PIVOT to offer flattened data structures for reporting, and grids with dynamic columns for display. Once the infrastructure stands, it's pretty comfortable altogether.
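To illustrate the flattening step (all table and attribute names here are hypothetical), the core of such a stored procedure is a PIVOT over the EAV rows; in practice the column list is built dynamically, since users keep inventing new attributes:

-- Flatten EAV rows so each attribute becomes a column.
SELECT EntityId, [Color], [Size], [Weight]
FROM (
    SELECT EntityId, AttributeName, AttributeValue
    FROM   dbo.EntityAttributeValues
) AS src
PIVOT (
    MAX(AttributeValue)
    FOR AttributeName IN ([Color], [Size], [Weight])
) AS p;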
We never considered using XML data because it wasn't there yet and, apart from its common limitations, it has some drawbacks in our context:
The EAV data is queried heavily. A development team needs more than standard SQL knowledge because of the special syntax. Indexing is possible, but "there is a cost associated with maintaining the index during data modification" (as per MSDN).
The XML datatype is far less accessible than regular tables and fields when it comes to data processing and reporting.
Hardly ever do users fetch all attribute values of an entity, but the whole XML would have to be crunched anyway.
And, not unimportant: XML datatype is not (yet) supported by Entity Framework.
So, to conclude, I would go for a design that is relational as much as possible but EAV where necessary. Auction items could have a number of fixed fields and EAV attributes for the flexible data.
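A minimal sketch of that hybrid shape for auction items (all names are hypothetical, not taken from the question):

-- Fixed columns for the stable attributes...
CREATE TABLE dbo.AuctionItem
(
    ItemId     int IDENTITY(1,1) PRIMARY KEY,
    Title      nvarchar(200) NOT NULL,
    StartPrice decimal(18,2) NOT NULL,
    EndsAt     datetime2 NOT NULL
);

-- ...and an EAV side table for the user-invented ones.
CREATE TABLE dbo.AuctionItemAttribute
(
    ItemId         int NOT NULL REFERENCES dbo.AuctionItem(ItemId),
    AttributeName  nvarchar(100) NOT NULL,
    AttributeValue nvarchar(400) NULL,
    PRIMARY KEY (ItemId, AttributeName)
);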
I will use my answer from another question:
EAV:
Storage. If your values will often be reused across products, e.g. clothes, where the attribute "size" and its values are repeated often, your attribute/value tables will be smaller. Meanwhile, if values are mostly unique rather than repeated (e.g. values for the attribute "page count" for books), you will end up with a fairly large values table, where every value is linked to a single product.
Speed. This scheme is not the weakest part of the project, because this data will change rarely. And remember that you can always denormalize the database schema to prepare a DW-like solution. You can also use caching if the database part turns out to be slow.
Elasticity. This is the strongest part of the solution. You can easily add/remove attributes and values, and even move values from one attribute to another!
XML storage is more like NoSQL: you give up database functionality, so you must wisely prepare your solution to:
not lose data integrity;
not rewrite all database functionality in the application (that would be senseless).
I think there is way too much context missing for anyone to add any kind of valid comment to the discussion.
Is your db read or write intensive?
Will you ever conceivably move off SQL Server and onto another platform?
How do you plan to surface your data to the application?
Will people connect to your data through means other than those you control?
First, do not go either route unless the structure truly cannot be known in advance. Using EAV or XML because you don't want to actually define the requirements will result in an unmaintainable mess, and a badly performing mess at that. Usually at least 90+% (a conservative estimate based on my own experience) of the fields can be known in advance and should be in ordinary relational tables. Only use special techniques for structures that can't be known in advance. I can't stress this strongly enough. EAV tables look simple but are actually very hard to query, especially for complex reporting queries. Sure, it is easy to get data into them, but very, very difficult to get the data back out.
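As an illustration of why getting the data back out is hard (all names hypothetical), even reading three attributes as one flat row takes one self-join per attribute, and reporting queries grow from there:

-- One LEFT JOIN per attribute just to reassemble a single flat row.
SELECT e.EntityId,
       c.AttributeValue AS Color,
       s.AttributeValue AS Size,
       w.AttributeValue AS Weight
FROM dbo.Entities e
LEFT JOIN dbo.EntityAttributeValues c
       ON c.EntityId = e.EntityId AND c.AttributeName = 'Color'
LEFT JOIN dbo.EntityAttributeValues s
       ON s.EntityId = e.EntityId AND s.AttributeName = 'Size'
LEFT JOIN dbo.EntityAttributeValues w
       ON w.EntityId = e.EntityId AND w.AttributeName = 'Weight';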
If you truly need to go the EAV route, consider using a NoSQL database for that part of the application and a relational database for the rest. NoSQL databases simply handle EAV-style data better.
We have to redesign a legacy POI database from MySQL to PostgreSQL. Currently all entities have 80-120+ attributes that represent individual properties.
We have been asked to consider flexibility as well as a good design approach for the new database. The new design should allow:
any number of attributes/properties for an entity, i.e. the number of attributes for any entity is not fixed and may change on a regular basis;
content admins to add new properties to existing entities on the fly through admin interfaces, rather than making changes to the DB schema all the time.
There are quite a few discussions about the performance issues of EAV, but if we don't go with a hybrid EAV we end up:
having a lot of empty columns (we still go and add new columns even if 99% of the rows don't have those properties);
spending more time maintaining the database, especially when attributes keep changing;
having no way for content admins to add new properties to existing entities.
Anyway, here's what we are thinking for the new design:
Have separate tables for each entity containing some basic info that is exclusive to it, e.g. id, name, address, contact, created, etc.
Have two tables, attribute type and attribute, to store property information.
Link each entity to its attributes using a many-to-many relation.
Store addresses in a different table and link them to entities using a foreign key.
We think this will allow us to be more flexible when adding, removing, or updating properties.
This design, however, will result in an increased number of joins when fetching data; e.g. to display all "attributes" for a given stadium, we might have a query with 20+ joins to fetch all related attributes in a single row.
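For concreteness, a minimal PostgreSQL sketch of the design described above (object names are illustrative, not from the actual schema):

-- Attribute metadata.
CREATE TABLE attribute_type (
    attribute_type_id serial PRIMARY KEY,
    name              text NOT NULL   -- e.g. 'text', 'number', 'boolean'
);

CREATE TABLE attribute (
    attribute_id      serial PRIMARY KEY,
    attribute_type_id int NOT NULL REFERENCES attribute_type,
    name              text NOT NULL
);

-- One table per entity for its exclusive basic info.
CREATE TABLE stadium (
    stadium_id serial PRIMARY KEY,
    name       text NOT NULL,
    created    timestamptz NOT NULL DEFAULT now()
);

-- Many-to-many link that also carries the value itself.
CREATE TABLE stadium_attribute (
    stadium_id   int NOT NULL REFERENCES stadium,
    attribute_id int NOT NULL REFERENCES attribute,
    value        text,
    PRIMARY KEY (stadium_id, attribute_id)
);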
What are your thoughts on this design, and what would be your advice to improve it?
Thank you for reading.
I'm maintaining a 10 year old system that has a central EAV model with 10M+ entities, 500M+ values and hundreds of attributes. Some design considerations from my experience:
If you have any business logic that applies to a specific attribute it's worth having that attribute as an explicit column. The EAV attributes should really be stuff that is generic, the application shouldn't distinguish attribute A from attribute B. If you find a literal reference to an EAV attribute in the code, odds are that it should be an explicit column.
Having significant amounts of empty columns isn't a big technical issue. It does need good coding and documentation practices to compartmentalize different concerns that end up in one table:
Have conventions and rules that let you know which part of your application reads and modifies which part of the data.
Use views to ease poking around the database with debugging tools.
Create and maintain test data generators so you can easily create schema conforming dummy data for the parts of the model that you are not currently interested in.
Use rigorous database versioning. The only way to make schema changes should be via a tool that keeps track of and applies change scripts. PostgreSQL has transactional DDL; that is one killer feature for automating schema changes.
PostgreSQL doesn't really like skinny tables. Each attribute value results in 32 bytes of data storage overhead, in addition to the extra work of traversing all the rows to pull the data together. If you mostly read and write the attributes as a batch, consider serializing the data into the row in some way. attr_ids int[], attr_values text[] is one option, hstore is another, or something client side, like JSON or protobuf, if you don't need to touch anything specific on the database side.
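A minimal sketch of the hstore variant mentioned above (table and key names are hypothetical):

-- hstore keeps one row per entity instead of one skinny row per value.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE entity (
    entity_id bigint PRIMARY KEY,
    attrs     hstore NOT NULL DEFAULT ''::hstore
);

-- Write or overwrite one attribute.
UPDATE entity SET attrs = attrs || hstore('color', 'red') WHERE entity_id = 1;

-- Read one attribute back.
SELECT attrs -> 'color' FROM entity WHERE entity_id = 1;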
Don't go out of your way to put everything into one single entity table. If the entities don't share any attributes in a sensible way, use multiple instantiations of the specific EAV pattern you use. But do try to use the same pattern and share any accessor code between the different instantiations. You can always parametrise the code on the entity name.
Always keep in mind that code is data and data is code. You need to find the correct balance between pushing decisions into the meta-model and expressing them as code. If you make the meta-model do too much, modifying it will need the same kind of ability to understand the system, versioning tools, QA procedures, staging as your code, but it will have none of the tools. In essence you will be doing programming in a very awkward non-standard language. On the other hand, if you leave too much in the code, every trivial change will need a new version of your software. People tend to err on the side of making the meta-model too complex. Building developer tools for meta-models is hard and tedious work and has limited benefit. On the other hand, making the release process cheaper by automating everything that happens from commit to deploy has many side benefits.
EAV can be useful for some scenarios. But it is a little like "the dark side". Powerful, flexible and very seductive it is. But it's something of an easy way out. An easy way out of doing proper analysis and design.
I think "entity" is a bit too general. You seem to have some idea of what should be connected to that entity, like address and contact. What if you decide to have "Books" in the model? Would they also have addresses and contacts? I think you should try to find the right generalizations and keep the EAV parts of the model to a minimum. Whenever you find yourself wanting to show a certain subset of the attributes, or test for the existence of a value, or determine behaviour based on a value, you should really have it modelled as a column.
You will not get a better opportunity to design this system than now. The requirements are known since the previous version, and also what worked and what didn't. (Just don't fall victim to the Second System Effect)
One good implementation of EAV can be found in Magento, a CMS for e-commerce. There is a lot of bad talk about EAV these days, but I challenge anyone to come up with a solution other than EAV for dealing with open-ended product attributes.
Sure, you could go about enumerating all the columns you would need for every product in the world, but that would take you a lot of time and you would inevitably forget product attributes along the way.
So the bottom line is: use EAV for open-ended stuff, but don't rely on EAV for all the database's tables. Hence a hybrid of EAV and relational DB, when done right, is a powerful tool whose results could not be accomplished by using fixed columns alone.
Basically EAV is trying to implement a database within a database, and it leads to madness. The queries to pull data become overly complex, and your data has no stable, specific model to keep it in some kind of order.
I've written EAV systems for limited applications, but as a generic solution it's usually a bad idea.
One of my coworkers is going to build an API directly from the database. He wants to return XML for queries. Is that really a good idea?
We have to build an API for our partners, and we are looking for a good architectural approach. We have a few million products, reviews, and so on.
Some partners will take 1000 products; others would like to copy almost the whole database.
We have to restrict access to some fields; for example, one partner will see productDescription, another only id and productName.
Sometimes we would like to return only info about categories in the XML response; sometimes we would like to include 100 products for each category returned.
One of the programmers is pushing a solution based almost entirely on MSSQL 2005's FOR XML AUTO. He wants to build the query in the application, send it to the server, and then return the XML to the partner, without any caching within the application.
Is that really a good idea?
I have used this technique for a particular web application. I have mixed feelings about this approach.
One pro is that this is really convenient for simple requirements. Another pro is that it is really easy to translate a database schema change into a change in the XML format, since everything is in one place.
I found there are cons too. When your target XML gets more complex and has more nested structures, this solution can rapidly get out of hand. Consider for example this (taken from http://msdn.microsoft.com/en-us/library/ms345137(SQL.90).aspx#forxml2k5_topic5):
SELECT CustomerID AS "CustomerID",
       (SELECT OrderID AS "OrderID"
        FROM Orders "Order"
        WHERE "Order".CustomerID = Customer.CustomerID
        FOR XML AUTO, TYPE),
       (SELECT DISTINCT LastName AS "LastName"
        FROM Employees Employee
        JOIN Orders "Order" ON "Order".EmployeeID = Employee.EmployeeID
        WHERE Customer.CustomerID = "Order".CustomerID
        FOR XML AUTO, TYPE)
FROM Customers Customer
FOR XML AUTO, TYPE
So essentially, you see that you begin writing SQL to mirror the structure of the XML output. And if you think about it, this is a bad thing - you're mixing data retrieval logic with presentation logic. The fact that the presentation is, in this case, a representation in a data exchange format does not change the fact that you're mixing two different things, making both of them harder.
For example, it is quite possible that the requirements for the exact structure of the XML change over time, whereas the actual associated data requirements remain unchanged. Then you would be rewriting queries even though there is nothing wrong with the actual dataset you are already retrieving. That's a code smell if you ask me.
Another consideration is performance/query tuning. I cannot say I have done much benchmarking of these types of queries, but I would typically avoid correlated subqueries like this whenever I can...and now, just because of this syntactic sugar, I would suddenly throw that overboard because of the convenience of generating XML with no intermediary language? I don't think it's a good idea.
So in short, I would use this technique if I could considerably simplify things. But if I could anticipate that I would need an intermediary language anyway to generate all the XML structures I need, I would choose to not use this technique at all. If you are going to generate XML, do it all in one place, don't put some in the query and some in your program because it will become a nightmare to manage change and maintain it.
It's a bad idea. SQL Server can return data only through the TDS protocol, meaning it can only be a result set (rows). Returning XML means you still return a rowset, but a rowset of SQL data formatted as XML. Ultimately, you still need a TDS protocol client (i.e. SqlClient, OLE DB, ODBC, JDBC, etc.). You still need to deal with rows, columns, and T-SQL errors. You are returning a column containing XML data, not an XML response.
So if your client has to be a database client, what advantage does XML give? Other than that you lose all result set schema metadata in the process...
In addition, consider that stored procedures are an access API for everything, including SSIS tasks, maintenance and ETL jobs, other applications deployed on the database, etc. Having everything presented to that layer as XML will be a mess. Two stored procedures from related apps, both in the same instance, exchanging calls via FOR XML and then XPath/XQuery? Why? Keep in mind, your database will outlive every application you have in mind today.
I understand XML as a good format for exchange of data between web services. But not between the database and the client. So the answer is that your partners should see XML, but from your web service layer, not from your database.
In general, I see nothing wrong with exposing information stored in a RDBMS via a "read only" API that enforces access restrictions based on user privilege. Since your application is building the queries you can expose whatever the appropriate names are for your nouns (tables) and attributes (columns) in the user-facing API.
Most DBs can cache queries (and although I don't know SQL Server at all, I imagine it can do this), and the main advantage of not caching "downstream" in the application is simplicity - the data returned for each API call will be up to date, without any of the complexity of having to figure out when to refresh a "downstream" cache. You can always add caching later - when you're sure that everything works properly and you actually need the performance boost.
As for keeping the queries and the XML in sync - if you're simply dumping data records generated from a single table, then there's not much of an issue here. It is true that as you start combining information from multiple tables on the back end, it may become harder to generate data records with a single query, but fixing this with intermediate data structures in the web server application is an approach that (typically) scales poorly as the tables grow - you're often better off putting the data you need to expose in a single query/API call into a database "view".
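A minimal sketch of that view-based approach (all object names are hypothetical): the API call then maps to a single SELECT against the view, and the join logic lives in the database rather than in the web tier.

CREATE VIEW dbo.PartnerProductFeed
AS
SELECT p.ProductId,
       p.ProductName,
       c.CategoryName,
       r.AverageRating
FROM dbo.Products p
JOIN dbo.Categories c ON c.CategoryId = p.CategoryId
LEFT JOIN (SELECT ProductId,
                  AVG(CAST(Rating AS decimal(4,2))) AS AverageRating
           FROM dbo.Reviews
           GROUP BY ProductId) r ON r.ProductId = p.ProductId;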
If your XML is designed in such a way that you have to load all the data into memory and calculate statistics (to be displayed in XML element attributes, for example) before rendering begins, then you'll have scalability problems no matter what you do. So try to avoid designing your XML this way from the beginning.
Note also that XML can often be "normalized" (just as DB tables should be), using internal links and "GLOSSARY" elements that factor out repeated information. This reduces the size of XML generated, and by rendering a GLOSSARY element at the end of the XML document (perhaps with information extracted from a subsequent SQL query) you can avoid having to hold lots of data in web server memory while serving the API call.
It depends on who will consume this API. If the API is going to be consumed by a large range of different languages, then yes, it may make sense to expose returned data in an XML format, as pretty much everything is able to parse XML.
If, on the other hand, the API is predominantly going to be consumed by only one language (e.g. C# / .NET), then you would be much better off writing the API in that language and directly exposing data in a format native to that language - exposing XML-based results will result in needless generation and subsequent parsing of XML.
Personally, I would probably opt for a mixed approach - choose a suitable, commonly used language (for the customers of this API) to write the API in, and then on top of that expose an extra XML-based API if it turns out it's needed.
What is the best practice for introducing custom (typically volatile) data into entity model classes? This may sound like a bad practice at first, but it seems to be quite a common scenario. In our recent web application we have developed a proper model, and in most cases we are fine with loading model entities. But there are cases where we cannot afford to load an entire hierarchy of entities; we need to load, say, the results of a couple of SQL COUNTs, or possibly some additional information alongside (or embedded inside) the model entities. So basically, the requirements and conditions are:
It’s a web application where 99.9999999999% of all operations are read operations.
They don’t need to process or do any complicated business logic. We just need to get data quickly to HTML.
In several performance critical cases, we need to load results of SQL aggregates which don’t fit any model properties.
We need an extensible way to introduce any new custom data if needed.
How do you usually solve this issue without working too much around your ORM (for instance, with raw data from the DB)? I'm sure this has been discussed many times, but I cannot figure out a good Google query to find anything useful.
Edit: Since I later realized the question was not very well formed, I decided to reformulate it and start a new one.
If you're just getting relational data to and from a browser, with little or no behavior in between, it sounds like you're trying to solve a relational problem with an OO paradigm.
I might be inclined to dispense with the Object Oriented approach altogether.
My team recently rewrote an application by asking "What is the simplest thing that can possibly work?" and "What is the closest language to the problem?". Our new app, replacing an OO one, ended up being 10 times smaller, faster, and cheaper.
We used SQL, stored procedures, XML libraries on the DB server, XSLT (to get the HTML), and JavaScript.
An OOP purist like myself would go with the Decorator pattern:
http://en.wikipedia.org/wiki/Decorator_pattern
But the thing is, some people may not need the flexibility it offers. Plus, creating new classes for each distinct operation may seem like overkill, but it provides good compile-time type checking.
The best practice in my view is that your application consumes data using the Domain Model pattern. The Domain Model can offer business-logic methods for doing the type of queries that make sense and are relevant to your application needs.
These can fetch "live" results that map directly to database rows and can therefore be edited and "saved."
But additionally, the Domain Model can provide methods that fetch read-only results that are too complex to be easily saved back to the database. This includes your example of grouped aggregate query results, and also includes joined query result sets, expressions as columns, etc.
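For example, a query method on the Domain Model might run read-only SQL like the following (a sketch against a hypothetical schema) and map the rows to a display-only result type rather than to editable entities:

SELECT c.CustomerId,
       c.CustomerName,
       COUNT(o.OrderId)          AS OrderCount,    -- grouped aggregate
       COALESCE(SUM(o.Total), 0) AS LifetimeValue  -- expression as a column
FROM dbo.Customers c
LEFT JOIN dbo.Orders o ON o.CustomerId = c.CustomerId
GROUP BY c.CustomerId, c.CustomerName;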
The Domain Model pattern offers a way to decouple the OO design of an application from the design of the physical database.