I'm looking into storing CQRS read models in SQL Server tables due to legacy system concerns (see approaches 2 & 3 of this question).
While I'd like to implement the read models in a document database such as MongoDB, outside systems that can't be reworked at this time mean I'm stuck keeping everything in the RDBMS for now.
Since I'm looking at storing records in a properly denormalized way, what's the best way to actually store them when dealing with typical hierarchical data, such as the classic Customer / Order / LineItems example, all of which must be displayed in the same view? [EDIT: What I'm thinking is that I put the data needed to query the model in separate fields, but the full object in an "object data" field alongside them, roughly as sketched below.]
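Something like this is what I have in mind (names are purely illustrative):

CREATE TABLE dbo.CustomerOrderView (
    OrderId      INT           NOT NULL PRIMARY KEY,
    CustomerId   INT           NOT NULL,      -- promoted so the read model can be queried
    CustomerName NVARCHAR(200) NOT NULL,      -- promoted for display/search
    OrderData    XML           NOT NULL       -- the full Customer/Order/LineItems object
);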
Due to my legacy systems (mostly out of my control), I'm thinking I'll add triggers to the legacy system tables or make sproc changes to keep my read models current. But how should I actually store the data itself?
I considered simply storing the models as JSON in a field, or as XML, since both can easily be serialized/deserialized from a .NET application and can reasonably easily be updated by triggers from other activity in the database. (XPath/XQuery isn't so bad once you get used to it, and from another answer here I found a JSON parser for T-SQL.)
Is there a better approach? If not, should I use XML or JSON?
I would go with XML, as it has built-in support in SQL Server. In general I would avoid any extra machinery written in T-SQL, as maintaining it can be a nightmare.
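For example (borrowing the hypothetical names from the question's sketch), the built-in xml data type methods let you reach into the stored document without any extra T-SQL machinery:

SELECT OrderId,
       OrderData.value('(/Order/LineItems/Item/@Sku)[1]', 'NVARCHAR(50)') AS FirstSku
FROM dbo.CustomerOrderView
WHERE CustomerId = 42;   -- filter on a promoted column, dig into the XML for display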
Related
What is the best practice for logging activity in a WCF service to a database (SQL Server), so that it can later be reviewed in a UI?
I have checked with a couple of people: one says I could save the message as XML in an XML column in the database, while the other says it's better to save it in standard relational tables.
What points to the first (XML-based) solution is that it stays simple even though the business object sent from the service contains list properties. On the other hand, it introduces some complexity in handling the data and will probably require some kind of table-like view.
Plain old tables, however, make it more complicated to save the list properties in a smart way. On the other hand, it's simpler to write queries and to use the data for possible future data warehousing/analysis.
Another thing that worries me is if/when the business object's schema changes. Which solution makes that easier to handle, and in what way (especially for the UI part)?
Why not do both? Use an XML column for the unstructured flexibility you seek, combined with multiple structured fields that enable easier filtering and viewing.
It's really a matter of opinion - there is no single correct answer to this question. I've seen it done both ways, as well as the hybrid approach suggested above; it just depends on your needs. Displaying XML in the UI isn't much more difficult than displaying single-value fields - you just need the right toolset (see LINQ to XML, XML computed columns, and XML indexes).
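As a rough sketch of the hybrid approach (all names here are invented): promote a couple of fields into regular columns for cheap filtering, keep the whole message in an XML column, and index it.

CREATE TABLE dbo.ServiceLog (
    LogId      INT IDENTITY PRIMARY KEY,     -- clustered key, needed for the XML index
    LoggedAt   DATETIME      NOT NULL DEFAULT GETUTCDATE(),
    Operation  NVARCHAR(100) NOT NULL,       -- structured field for cheap filtering
    MessageXml XML           NOT NULL        -- full business object, list properties included
);

CREATE PRIMARY XML INDEX IX_ServiceLog_MessageXml
    ON dbo.ServiceLog (MessageXml);

-- The UI filters on the structured columns and projects values out of the XML:
SELECT LogId, LoggedAt,
       MessageXml.value('(/Order/Customer/@Name)[1]', 'NVARCHAR(100)') AS CustomerName
FROM dbo.ServiceLog
WHERE Operation = N'SubmitOrder';

If the business object's schema changes, only the XML payload changes; the promoted columns keep the UI's filter queries stable.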
Raw data is stored in a database (multiple tables). It needs to be manually checked and corrected, and the checked data should then be stored in the database alongside the raw data. In that case, is it a good idea to create two separate databases (e.g. raw_data and checked_data), or should there be only one database?
Generally speaking it is a lot easier to work within a single instance than across multiple instances. Distributed transactions perform slower. They require more typing (always having to add a database link). This is not just a matter of convenience but also of integrity. You may want to ensure that a given record is either in the RAW data set or the CLEANSED data set but not both. Checking this sort of thing is more manageable in a single database.
How you organize things in a single database depends to some extent on your chosen DBMS flavour and what it supports. You can have a single schema (user account) and use a naming convention such as a prefix, for example RAW_TABLE_1 and CLEAN_TABLE_1. Or you may want to use different schemas, which will allow you to retain the same table name, for example RAW_USER.TABLE_1 and CLEAN_USER.TABLE_1. Both approaches have advantages. It is always good to have a constant reminder of whether we are working with raw or clean data. On the other hand, we may have tools or applications we would like to use that expect the normal table names. Synonyms can help in this regard.
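In SQL Server, for instance, that layout might look like this (schema and table names are hypothetical):

CREATE SCHEMA raw;
GO
CREATE SCHEMA clean;
GO
-- Same table name in both schemas, so which data set you're touching is always explicit.
CREATE TABLE raw.Measurement   (Id INT PRIMARY KEY, Reading DECIMAL(10,2));
CREATE TABLE clean.Measurement (Id INT PRIMARY KEY, Reading DECIMAL(10,2));
GO
-- A synonym lets tools that expect the plain name keep working against the clean set:
CREATE SYNONYM dbo.Measurement FOR clean.Measurement;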
Only use two different databases if your raw data and checked data are going to be truly enormous.
With normalization and stored procedures, you can maintain both in one database.
There is no recommended method here beyond your own preferences. You can store the cleansed data alongside the raw data in the same database but in different tables, perhaps prefixing the raw tables with something like raw_.
Otherwise you may have a separate database for each type of data. The benefit would be separation, whereas the drawback would be costlier joins, etc., if they need to be done between the two.
One of my coworkers is going to build an API directly from the database. He wants to return XML for queries. Is that really a good idea?
We have to build an API for our partners, and we are looking for a good architectural approach. We have a few million products, reviews, and so on.
Some partners will take 1000 products; others would like to copy almost the whole database.
We have to restrict access to some fields; for example, one partner will see productDescription, another only id and productName.
Sometimes we would like to return only info about categories in the XML response; sometimes we would like to include 100 products for each category returned.
One of the programmers is pushing a solution based almost entirely on MSSQL 2005's FOR XML AUTO. He wants to build the query in the application, send it to the server, and return the XML to the partner, without any caching in the application.
Is that really a good idea?
I have used this technique for a particular web application. I have mixed feelings about this approach.
One pro is that this is really convenient for simple requirements. Another is that it is really easy to translate a database schema change into a change in the XML format, since everything is in one place.
I found there are cons too. When your target XML gets more complex and has more nested structures, this solution can rapidly get out of hand. Consider for example this (taken from http://msdn.microsoft.com/en-us/library/ms345137(SQL.90).aspx#forxml2k5_topic5):
SELECT CustomerID AS "CustomerID",
       -- nested correlated subquery: one <Order> child per order
       (SELECT OrderID AS "OrderID"
        FROM Orders "Order"
        WHERE "Order".CustomerID = Customer.CustomerID
        FOR XML AUTO, TYPE),
       -- second correlated subquery: <Employee> children for this customer's orders
       (SELECT DISTINCT LastName AS "LastName"
        FROM Employees Employee
        JOIN Orders "Order" ON "Order".EmployeeID = Employee.EmployeeID
        WHERE Customer.CustomerID = "Order".CustomerID
        FOR XML AUTO, TYPE)
FROM Customers Customer
FOR XML AUTO, TYPE
So essentially, you see that you begin writing SQL to mirror the structure of the XML output. And if you think about it, this is a bad thing: you're mixing data retrieval logic with presentation logic. The fact that the presentation here is a data exchange format does not change the fact that you're mixing two different concerns, making both of them harder to maintain.
For example, it is quite possible that the requirements for the exact structure of the XML change over time, whereas the actual associated data requirements remain unchanged. Then you would be rewriting queries even though there is nothing wrong with the actual dataset you are already retrieving. That's a code smell if you ask me.
Another consideration is performance/query tuning. I cannot say I have done much benchmarking of these types of queries, but I would typically avoid correlated subqueries like this whenever I can. And now, just because of this syntactic sugar, I would suddenly throw that overboard for the convenience of generating XML with no intermediary language? I don't think it's a good idea.
So in short, I would use this technique if I could considerably simplify things. But if I could anticipate that I would need an intermediary language anyway to generate all the XML structures I need, I would choose to not use this technique at all. If you are going to generate XML, do it all in one place, don't put some in the query and some in your program because it will become a nightmare to manage change and maintain it.
It's a bad idea. SQL Server can return data only through the TDS protocol, meaning it can only return a result set (rows). Returning XML means you still return a rowset, just a rowset of SQL data formatted as XML. Ultimately, you still need a TDS protocol client (i.e. SqlClient, OLE DB, ODBC, JDBC, etc.). You still need to deal with rows, columns, and T-SQL errors. You are returning a column containing XML data, not an XML response.
So if your client has to be a database client anyway, what advantage does the XML give? Other than losing all the result-set schema metadata in the process...
In addition, consider that stored procedures are an access API for everything, including SSIS tasks, maintenance and ETL jobs, other applications deployed on the database, etc. Having everything presented to that layer as XML will be a mess. Two stored procedures from related apps, both in the same instance, exchanging calls via FOR XML and then XPath/XQuery? Why? Keep in mind that your database will outlive every application you have in mind today.
I understand XML as a good format for exchanging data between web services, but not between the database and the client. So the answer is that your partners should see XML, but from your web service layer, not from your database.
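As a sketch of that separation (hypothetical names): the database exposes a plain rowset through a stored procedure, and the web service layer is what serializes it to XML for partners.

CREATE PROCEDURE dbo.GetProductsForPartner
    @PartnerId INT
AS
BEGIN
    SET NOCOUNT ON;
    -- Return only the columns this partner is entitled to see;
    -- turning the rowset into XML is the web service's job, not the database's.
    SELECT p.ProductId, p.ProductName
    FROM dbo.Products AS p
    JOIN dbo.PartnerProducts AS pp
        ON pp.ProductId = p.ProductId
    WHERE pp.PartnerId = @PartnerId;
END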
In general, I see nothing wrong with exposing information stored in a RDBMS via a "read only" API that enforces access restrictions based on user privilege. Since your application is building the queries you can expose whatever the appropriate names are for your nouns (tables) and attributes (columns) in the user-facing API.
Most DBs can cache queries (and although I don't know SQL Server at all, I imagine it can do this), and the main advantage of not caching "downstream" in the application is simplicity: the data returned for each API call will be up to date, without any of the complexity of figuring out when to refresh a downstream cache. You can always add caching later, once you're sure that everything works properly and you actually need the performance boost.
As for keeping the queries and the XML in sync: if you're simply dumping data records generated from a single table, there's not much of an issue here. It is true that as you start combining information from multiple tables on the back end, it may become harder to generate data records with a single query, but fixing this with intermediate data structures in the web server application is an approach that (typically) scales poorly as the tables grow. You're often better off putting the data you need to expose in a single query/API call into a database view.
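A hedged example of that idea (table names invented): fold the multi-table join into a view, so one query per API call still suffices.

CREATE VIEW dbo.ProductExport AS
SELECT p.ProductId,
       p.ProductName,
       c.CategoryName,
       r.AverageRating
FROM dbo.Products AS p
JOIN dbo.Categories AS c
    ON c.CategoryId = p.CategoryId
LEFT JOIN (SELECT ProductId, AVG(CAST(Rating AS DECIMAL(4,2))) AS AverageRating
           FROM dbo.Reviews
           GROUP BY ProductId) AS r
    ON r.ProductId = p.ProductId;

The API layer then issues one simple SELECT against the view, and the join logic lives in the database where it can be tuned.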
If your XML is designed in such a way that you have to load all the data into memory and calculate statistics (to be displayed in XML element attributes, for example) before rendering begins, then you'll have scalability problems no matter what you do. So try to avoid designing your XML this way from the beginning.
Note also that XML can often be "normalized" (just as DB tables should be), using internal links and "GLOSSARY" elements that factor out repeated information. This reduces the size of XML generated, and by rendering a GLOSSARY element at the end of the XML document (perhaps with information extracted from a subsequent SQL query) you can avoid having to hold lots of data in web server memory while serving the API call.
It depends on who will consume this API. If it is going to be consumed by a wide range of different languages, then yes, it may make sense to expose the returned data as XML, since pretty much everything can parse XML.
If, on the other hand, the API is predominantly going to be consumed by one language (e.g. C#/.NET), then you would be much better off writing the API in that language and directly exposing data in a format native to it; XML-based results would mean needless generation and subsequent parsing of XML.
Personally, I would probably opt for a mixed approach: choose a suitable, commonly used language (for the customers of this API) to write the API in, and then expose an extra XML-based API on top of it if that turns out to be needed.
I'm working on a MUD (Multi User Dungeon) in Python and am just now getting around to the point where I need to add some rooms, enemies, items, etc. I could hardcode all this in, but it seems like this is more of a job for a database.
However, I've never really done any work with databases before so I was wondering if you have any advice on how to set this up?
What format should I store the data in?
I was thinking of storing a Dictionary object in the database for each entity. That way, I could simply add new attributes on the fly without altering the columns of the database. Does that sound reasonable?
Should I store all the information in the same database but in different tables, or should different entities (enemies and rooms) go in different databases?
I know this will be a can of worms, but what are some suggestions for a good database? Is MySQL a good choice?
1) There's almost never any reason to have data for the same application in different databases. Not unless you're a Fortune 500-sized company (OK, I'm exaggerating).
2) Store the info in different tables.
As an example:
T1: Rooms
T2: Room common properties (applicable to every room), with a row per room
T3: Room unique properties (applicable to a minority of rooms), with a row per property per room - this makes it easy to add custom properties without adding new columns
T4: Room-Room connections
Having T2 and T3 is important, as it lets you combine the efficiency and speed of the row-per-room idea, where it's applicable, with the flexibility/maintainability/space savings of a row-per-attribute-per-entity (or object/attribute/value, as IIRC it's called in fancy terms) schema.
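A rough sketch of that schema in SQL (SQLite-compatible, names made up, with T1 and T2 collapsed into one table for brevity):

CREATE TABLE rooms (                       -- T1/T2: one row per room, common properties as columns
    room_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT NOT NULL
);

CREATE TABLE room_attributes (             -- T3: one row per property per room, for the rare stuff
    room_id     INTEGER NOT NULL REFERENCES rooms(room_id),
    attr_name   TEXT    NOT NULL,
    attr_value  TEXT,
    PRIMARY KEY (room_id, attr_name)
);

CREATE TABLE room_exits (                  -- T4: room-to-room connections
    from_room   INTEGER NOT NULL REFERENCES rooms(room_id),
    direction   TEXT    NOT NULL,          -- e.g. 'north'
    to_room     INTEGER NOT NULL REFERENCES rooms(room_id),
    PRIMARY KEY (from_room, direction)
);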
A good discussion is here.
3) Implementation-wise, try to write something reusable, e.g. generic "get_room" methods which access the DB underneath - ideally via Transact-SQL or ANSI SQL, so you can survive a change of DB back-end fairly painlessly.
For initial work, you can use SQLite. Cheap, easy, and SQL-compatible (the best property of all). Installation is pretty much nothing, and DB management can be done with freeware tools or even a Firefox plugin, IIRC (all of Firefox 3's data stores - history, bookmarks, places, etc. - are SQLite databases).
For later, either MySQL or Postgres (I don't use either professionally, so I can't recommend one over the other). IIRC, at some point Sybase had a free personal DB server as well, but I have no idea whether that's still the case.
This technique is called the entity-attribute-value model. It's normally preferable to have a DB schema that reflects the structure of your objects and to update the schema when the object structure changes. Such a strict schema is easier to query, and it's easier to ensure on the database level that the data is correct.
One database with multiple tables is the way to go.
If you want a database server, I'd recommend PostgreSQL. MySQL has some advantages, like easy replication, but PostgreSQL is generally nicer to work with. If you want something smaller that works directly within the application, SQLite is a good embedded database.
Storing an entire object (serialized/encoded) as a value in the database is bad for querying: some queries in your MUD will surely not need 100% of an object's attributes, or may need to retrieve a list of objects by the value of some attribute.
"it seems like this is more of a job for a database"
True, although 'database' doesn't have to mean 'relational database'. Most existing MUDs store all data in memory, and read it in from flat-file saved in a plain-text data format. I'm not necessarily recommending this route, just pointing out that a traditional database is by no means necessary. If you do want to go the relational route, recent versions of Python come with sqlite which is a lightweight embedded relational database with good SQL support.
Using relational databases with your code can be awkward. Any change to a game-logic class can require a parallel change to the database, and to the code that reads from and writes to the database. For this reason good planning will help you a lot, but it's hard to plan a good database schema without experience. At least get your entity classes planned first, then build a database schema around them. Reading up on database normalization and understanding the principles there will help.
You may want to use an 'object-relational mapper' which can simplify a lot of this for you. Examples in Python include SQLObject, SQLAlchemy, and Autumn. These hide a lot of the complexities for you, but as a result can hide some of the important details too. I'd recommend using the database directly until you are more familiar with it, and consider using an ORM in the future.
"I was thinking of storing a Dictionary object in the database for each entity. In this way, I could then simply add new attributes to the database on the fly without altering the columns of the database. Does that sound reasonable?"
Unfortunately not: if you do that, you waste 99% of the capabilities of the database and are effectively using it as a glorified data store. However, if you don't need the aforementioned database capabilities, this is a valid route; just use the right tool for the job. The standard shelve module is well worth looking at for this purpose.
"Should I store all the information in the same database but in different tables, or different entities (enemies and rooms) in different databases?"
One database. One table in the database per entity type. That's the typical approach when using a relational database (e.g. MySQL, SQL Server, SQLite, etc.).
"I know this will be a can of worms, but what are some suggestions for a good database? Is MySQL a good choice?"
I would advise sticking with SQLite until you're more familiar with SQL. Otherwise, MySQL is a reasonable choice for a free game database, as is PostgreSQL.
One database. Each database table should refer to an actual data object.
For instance, create a table for all items, all creatures, all character classes, all treasures, etc.
Spend some time now and figure out how objects will relate to each other, as this will affect your database structure. For example, can a character have more than one character class? Can monsters have character classes? Can monsters carry items? Can rooms have more than one monster?
It seems pedantic, but you'll save yourself a whole lot of trouble early by figuring out what database objects "belong" to which other database objects.
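For instance (purely hypothetical tables), each of those "belongs to" questions turns into a foreign key or a junction table:

CREATE TABLE characters (character_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE classes    (class_id     INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- "Can a character have more than one character class?" If yes, use a junction table:
CREATE TABLE character_classes (
    character_id INTEGER NOT NULL REFERENCES characters(character_id),
    class_id     INTEGER NOT NULL REFERENCES classes(class_id),
    PRIMARY KEY (character_id, class_id)
);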
I have a database that has lots of data and is all "neat", normalized (within reason - using EAV), and I have stored procedures to access and modify the data.
I also have a WinForms application that users download to search and view this data (no inserts). To make things handy for use and updates, I've been using SQLite to store this data and it works really well.
I'm working on updating the entire process, and I was wondering whether I should ship the users a denormalized view of the data - i.e. one table with all the properties as columns - or continue to use the same schema as the master database.
My initial thoughts are along the lines of :
Denormalized View:
Pros...
Provides a simple method of querying the data (since I'm not doing a lot of joins, just a bunch of column searching).
Cons...
I'd have to manage a second data access layer. Granted, I don't think it will be difficult, but it is still a bit more work.
If a new property is added, I'd have to modify the schema again to accommodate the change, whereas with the property-bag schema I can simply query the property bag and work from there.
Same Schema:
Pros...
Same layout as the master database, so updates are minimal, and I can even use the same queries when building my data access layer, since SQLite doesn't support stored procedures.
Cons...
There are a lot of small tables for lookup codes and the like, so I could start running into issues when building the queries and managing them in the DAL.
How should I proceed?
If you develop your application to query views of the data rather than the underlying tables themselves, you will be able to keep the same schema for both scenarios without concern, and without the need to alter your DAL.
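For example (a minimal sketch with invented names), the SQLite copy could hide the EAV rows behind a view that flattens them into named columns, so the DAL sees one wide shape either way:

CREATE VIEW item_search AS
SELECT e.entity_id,
       MAX(CASE WHEN a.attr_name = 'name'  THEN a.attr_value END) AS name,
       MAX(CASE WHEN a.attr_name = 'color' THEN a.attr_value END) AS color
FROM entities e
JOIN entity_attributes a ON a.entity_id = e.entity_id
GROUP BY e.entity_id;

Because the application only ever names the view, the same queries keep working whether the backing store is denormalized or still EAV-shaped.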