I have several entities to which I need users to be able to add custom fields.
If I had an entity called customer with base variables like {Name, DateOfBirth, StoreId}
and another one called Store with {Name}
Then I would want the owner of that store to be able to log in and add a new variable for all their customers called "favourite colour", which is a dropdown with red, green, or blue as options.
Now I have had a look at EAV and come up with a solution that looks like this
Attribute {StoreId, Name, DataType},
Value {AttributeId, EntityName, EntityId, Value}
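In rough DDL terms, that pair might look something like this (a sketch only - the types, key names and constraints are my assumptions, not part of the design above):

CREATE TABLE dbo.Attribute (
    AttributeId INT IDENTITY(1,1) PRIMARY KEY,
    StoreId     INT NOT NULL,               -- assumes Store has an integer key
    Name        NVARCHAR(100) NOT NULL,
    DataType    NVARCHAR(20) NOT NULL       -- e.g. 'text', 'date', 'dropdown'
);

CREATE TABLE dbo.AttributeValue (
    AttributeId INT NOT NULL REFERENCES dbo.Attribute(AttributeId),
    EntityName  SYSNAME NOT NULL,           -- e.g. 'Customer'
    EntityId    INT NOT NULL,
    Value       NVARCHAR(MAX) NULL,
    PRIMARY KEY (AttributeId, EntityName, EntityId)
);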
I'm wondering whether there is some solution that will work best for SQL Server 2008, especially given that I'll want to be able to view and query this information easily.
I've heard that you can query within the xml datatype. Is that a better way to go?
I will also probably want users to be able to add custom fields that are foreign keys at some point too.
I'll be looking at this all day, so I'll answer any questions quickly.
EAV in general is an anti-pattern that results in dismal performance and chokes scalability. That said, if you do decide to go with EAV, the SQL Server Customer Advisory Team has published a white paper covering common pitfalls and problems and how to avoid them: Best Practices for Semantic Data Modeling for Performance and Scalability.
Querying an XML data type is possible in SQL Server, but if your XML has no schema then querying it will be slow. If it has a schema and the schema is EAV, then it will have all the problems of relational EAV plus some XML-specific performance problems of its own. Again, the good folks of the CAT team have published a couple of white papers on the topic: XML Best Practices for Microsoft SQL Server 2005 and Performance Optimizations for the XML Data Type in SQL Server 2005. They are valid for SQL 2008 too.
I've been using the XML features of SQL 2005 / 2008 for a while. I've come to rely on XML columns quite a bit.
What you want to do sounds like the perfect candidate for XML.
For instance, the following snippet defines your 2 entities (@customers and @stores), with a column called "attrs" that can be expanded to include more attributes.
I hope this helps!
declare @customers as table ( id int, attrs xml);
INSERT INTO @customers VALUES
(1,'<Attrs Name="Peter" DateOfBirth="1996-01-25" StoreId="10" />'),
(2,'<Attrs Name="Smith" DateOfBirth="1993-05-02" StoreId="20" />')
;
declare @stores as table ( id int, attrs xml);
insert into @stores VALUES
(10, '<Attrs Name="Store1" />'),
(20, '<Attrs Name="Store2" />')
;
With c as (
select id as CustomerID,
attrs.value('(/Attrs[1])/@Name', 'nvarchar(100)') as Name,
attrs.value('(/Attrs[1])/@DateOfBirth', 'date') as DateOfBirth,
attrs.value('(/Attrs[1])/@StoreId', 'int') as StoreId
from @customers
), s as (
select id as StoreID,
attrs.value('(/Attrs[1])/@Name', 'nvarchar(100)') as Name
from @stores
)
select *
from c left outer join s on (c.StoreId=s.StoreID);
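To let a store owner bolt a new custom field onto a customer (the favourite colour from the question), the attrs blob can be extended in place with the xml modify() method. A rough sketch - the attribute name here is made up:

-- Hypothetical: tag customer 1 with a new store-defined attribute
UPDATE @customers
SET attrs.modify('insert attribute FavouriteColour {"red"} into (/Attrs)[1]')
WHERE id = 1;

-- ...and it reads back like any other attribute
SELECT id,
       attrs.value('(/Attrs/@FavouriteColour)[1]', 'nvarchar(20)') AS FavouriteColour
FROM @customers;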
Excellent answers already. I'll only add the suggestion that you maintain metadata for the custom fields as well. This would make a UI for entering the custom fields easier - you'd be able to limit the set of custom fields for a Customer, for instance, and to specify that DateOfBirth is to be a date, and that StoreID is meant to match the ID of an actual store.
Some of this metadata could be maintained as XML schemas. I've seen that done, with the schemas stored in the database, and used to validate custom fields being input. I do not know if those schemas can also be used to strongly-type the XML data.
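For what it's worth, SQL Server 2005/2008 can do both: an XML SCHEMA COLLECTION stored in the database is used to type an xml column, and the engine then validates every value written to it. A minimal sketch (the collection name and schema content are just an illustration):

CREATE XML SCHEMA COLLECTION dbo.CustomerAttrsSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Attrs">
    <xs:complexType>
      <xs:attribute name="Name" type="xs:string" use="required"/>
      <xs:attribute name="DateOfBirth" type="xs:date"/>
      <xs:attribute name="StoreId" type="xs:int"/>
    </xs:complexType>
  </xs:element>
</xs:schema>';

-- A column typed against the collection rejects XML that does not validate
CREATE TABLE dbo.CustomerCustomFields
(
    CustomerId INT PRIMARY KEY,
    Attrs      xml(dbo.CustomerAttrsSchema) NOT NULL
);

The trade-off is that adding a brand-new custom field later means altering (or re-creating) the schema collection, so some people keep the column untyped and validate in application code instead.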
SELECT
(
SELECT SUM(ISNULL(Volume,0))
FROM Order_【a1.Login】
WHERE Login = a1.Login
) AS SelfVolume
FROM dbo.Account a1
I want the table name in the sub-select (【a1.Login】) to match the value a1.Login from the outer select statement (field Login of table Account). How can I get this result?
The technical answer is: By using dynamic SQL. It's complicated, error-prone and potentially dangerous (beware of Bobby Tables). Your SQLs will become unreadable and unmaintainable. You are entering a world of pain.
The correct answer is: You don't. Don't create a separate Orders table for every user. Create one Orders table with a foreign key to your Account table.
If you still want to go ahead and work with this broken database design (remember: You are entering a world of pain, and you are just getting started), you will somehow need to construct the following SQL dynamically:
SELECT SUM(ISNULL(Login_Volume,0)) FROM
(
SELECT SUM(ISNULL(Volume,0)) AS Login_Volume FROM Order_SomeUser WHERE Login = 'SomeUser'
UNION ALL
SELECT SUM(ISNULL(Volume,0)) AS Login_Volume FROM Order_SomeOtherUser WHERE Login = 'SomeOtherUser'
UNION ALL
...
) AS AllSums
You can do that in the language of your choice, either in your target language (C#, Java, PHP, etc.), which is probably the easiest and most maintainable solution, or directly in T-SQL using cursors and loops (= the hard way). Whichever language you choose, the algorithm is straightforward (a rough T-SQL sketch follows the list below):
Loop through your Account table and get the Logins.
Sanitize the value and validate that the corresponding Order_ table exists.
Create one SQL statement for each account.
Join them with UNION ALL.
Wrap them in the outer SELECT as shown above.
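A rough T-SQL sketch of that algorithm (table and column names are taken from the question; the string-building details are mine):

DECLARE @sql NVARCHAR(MAX);
SET @sql = N'';

-- Build one SELECT per account, but only for Order_ tables that actually exist
SELECT @sql = @sql
    + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
    + N'SELECT SUM(ISNULL(Volume,0)) AS Login_Volume FROM dbo.'
    + QUOTENAME(N'Order_' + a.Login)
    + N' WHERE Login = ' + QUOTENAME(a.Login, '''')
FROM dbo.Account AS a
WHERE OBJECT_ID(N'dbo.' + QUOTENAME(N'Order_' + a.Login), N'U') IS NOT NULL;

SET @sql = N'SELECT SUM(ISNULL(Login_Volume,0)) AS SelfVolume FROM (' + @sql + N') AS AllSums;';

EXEC sys.sp_executesql @sql;

The variable-concatenation trick is compact; an explicit cursor over dbo.Account is the more verbose but more predictable way to build the same string.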
Again: If there is any chance of fixing your broken DB design instead, do that, it will pay off in the long run.
I had an old database with a single table containing customer orders and customer details. I went on to create a new database model using separate tables for customers and details. I managed to migrate the customer details to the new database, but was unable to migrate the customer orders. We thought that this would be OK, and that we would just build the order records from now on, ignoring all previous orders in the old database. This was a while ago, and I cannot remember the exact reason why I was unable to import the customer orders. However, now we have discovered that we will need the old orders in the new database. Is there an easy way to do this using Microsoft Access?
This is the reason why:
Split a table in access into two linked tables
Depending on how complex your schema is, a simple approach would be schema mapping via an INSERT INTO ... SELECT query.
For example if your old database had a table:
Orders
------
OrdID
CustID
ProductName
Price
oDay
oMonth
oYear
And your new database had fields with different names, extra fields, etc:
OrderDetails
------
Order_ID
Customer_ID
Product
Price
DeliveryAddress
OrderDate
All you would need to do is create an insert query to append the old records to the new table. In defining the query, you can specify the source and destination field names, and you can even perform functions / expressions on the data. You can even query the old table directly, without linking or importing it into your new database:
INSERT INTO OrderDetails (Order_ID,Customer_ID,Product,Price,OrderDate)
SELECT OrdID,CustID,ProductName,Price,DateSerial(oYear,oMonth,oDay) AS oDate
FROM Orders IN 'C:\oldDatabasePath.mdb';
If you have to do additional transformations to the data, such as run expressions on column values, I would recommend testing out the SELECT part of the query before adding the INSERT line.
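For example, previewing just the converted values from the old database before writing anything:

SELECT OrdID, CustID, ProductName, Price,
       DateSerial(oYear, oMonth, oDay) AS oDate
FROM Orders IN 'C:\oldDatabasePath.mdb';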
So I'm trying to copy some data from one database table to another. The problem, though, is that the target database table has 2 new columns that are required. I wanted to use the export/import wizard in SQL Server Management Studio, but if I use that I will need to write a query for each table, and I can only execute 1 query at a time. I was wondering if there is a more efficient way of doing it.
Here's an example of 1 table:
dbase1.dbo.Appointment { id, name, description, createdate }
dbase2.dbo.Appointment { id, name, description, createdate, auditby, auditat}
I have a total of 8 tables with those 2 additional columns, and most of them are related to each other via FKs, so I wanted to use the wizard as it figures out which table gets inserted first. The problem with that is that it only works if I do a "copy data from one or more tables" and not the "write a query to specify data" option (which I would use to populate those two new columns).
I've been going through this very slow process of copying data because I'm using MVC Code First for my application and I don't have access to the server to be able to drop and create the tables at my leisure. So I have to resort to this to keep the data that I already have.
An idea: temporarily disable the foreign key constraints in the destination database. Then it doesn't matter what order you run your inserts in. To populate the two new required columns, you just need to pick some stock values to put in there (since these pre-existing rows obviously aren't subject to auditing). For example:
INSERT dbase2.dbo.appointment
(id, name, description, createdate, auditby, auditat)
SELECT id, name, description, createdate,
auditby = 'me', auditat = GETDATE()
FROM dbase1.dbo.Appointment;
Since it seems the challenge is merely that the destination requires columns that aren't in the source, and that you need to determine what should be populated in these audit columns, this seems to solve multiple problems at once. You just need to figure out what to put in there instead of 'me' and GETDATE().
(To get the wizard to pull these 8 tables for you, you might be able to create a view similar to the select portion of the above query, but that's more work and it won't see the underlying FK constraints to generate them in the right order anyway.)
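If you do go the disable-constraints route, a sketch of the wrapper for one destination table (repeat per table, or generate the statements from sys.foreign_keys):

USE dbase2;

-- Stop enforcing the FKs on the destination table while loading
ALTER TABLE dbo.Appointment NOCHECK CONSTRAINT ALL;

-- ... run the INSERT ... SELECT statements for all 8 tables here ...

-- Re-enable and re-validate the constraints afterwards
ALTER TABLE dbo.Appointment WITH CHECK CHECK CONSTRAINT ALL;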
Write the sql query for each of the insert processes in the order you want it. That would be the simplest approach.
Set default values for these two columns:
For AuditAt - a date default, i.e. GETDATE()
For AuditBy - the person ID/name
Now you can insert into these tables without supplying values for those two columns (a sketch follows below).
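A sketch of what that looks like for one table (the constraint names and the 'migration' marker are made up):

USE dbase2;

ALTER TABLE dbo.Appointment
    ADD CONSTRAINT DF_Appointment_AuditAt DEFAULT (GETDATE()) FOR auditat;

ALTER TABLE dbo.Appointment
    ADD CONSTRAINT DF_Appointment_AuditBy DEFAULT ('migration') FOR auditby;

-- With the defaults in place, the two columns can simply be left out of the insert
INSERT INTO dbo.Appointment (id, name, description, createdate)
SELECT id, name, description, createdate
FROM dbase1.dbo.Appointment;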
Background
Recently I've started to use XML a lot more as a column type in SQL Server 2005. During a bit of downtime yesterday, I noticed that two of the link tables I use are really just in the way, and it bores me to tears having to write yet more supporting structure code for a couple of joins.
To actually generate the data for these two link tables, I pass in two XML fields to my stored procedure, which writes the main record, breaks the two XML variables down into #tables and inserts them into the actual tables with the new SCOPE_IDENTITY() from the master record.
After some thought, I decided to just do away with those tables altogether and just store the XML in XML fields. Now I understand there are some pitfalls here, like general querying performance, the fact that GROUP BY doesn't work on XML data, and the queries generally being a bit of a mess, but overall I like that I can now work with XElement when I get the data back.
Also, this stuff isn't going to get changed. It's a one shot affair, so I don't have to worry about modification.
I am wondering about the best way to actually get at this data. A lot of my queries involve getting a master record based upon the criteria of a child or even a subchild record. Most of the sprocs in the database do this but on a far more elaborate scale, usually requiring UDFs and subqueries to work effectively, but I have knocked up a trivial example to test querying some data...
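(For the snippets below, something like this table shape is assumed - the column names are inferred from the queries further down:)

CREATE TABLE Customers
(
    ID           INT IDENTITY(1,1) PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL,
    Notes        NVARCHAR(MAX) NOT NULL,
    PhoneNumbers XML NOT NULL
);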
INSERT INTO Customers VALUES ('Tom', '', '<PhoneNumbers><PhoneNumber Type="1" Value="01234 456789" /><PhoneNumber Type="2" Value="01746 482954" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Andy', '', '<PhoneNumbers><PhoneNumber Type="2" Value="07948 598348" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Mike', '', '<PhoneNumbers><PhoneNumber Type="3" Value="02875 482945" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Steve', '', '<PhoneNumbers></PhoneNumbers>')
Now I can see two ways of grabbing it.
Method 1
DECLARE @PhoneType INT
SET @PhoneType = 2
SELECT ct.*
FROM Customers ct
WHERE ct.PhoneNumbers.exist('/PhoneNumbers/PhoneNumber[@Type=sql:variable("@PhoneType")]') = 1
Really? sql:variable feels a bit unwholesome. It does work, however, but it's distinctly more difficult to access the data in a meaningful way.
Method 2
SELECT ct.*, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') = @PhoneType
This is more like it. Already I can easily expand it to do joins and all other good stuff. I've used CROSS APPLY before on a table valued function, and it was very good. The execution plan for this as opposed to the previous query is seriously more advanced. Admittedly I haven't done any indexing and whatnot on these tables, but it's 97% of the entire batch cost.
Method 2 (expanded)
SELECT ct.ID, ct.CustomerName, ct.Notes, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') IN (SELECT ID FROM PhoneTypes)
Nice IN clause here. I can also do something like pt.PhoneType = 'Work'
Finally
So I'm essentially obtaining the results that I want, but is there anything I should be aware of when using this mechanism to interrogate small amounts of XML data? Will it fall down on performance during elaborate searches? And is the storage of such markup style data too much of an overhead?
Side note
I've used things like sp_xml_preparedocument and OPENXML in the past just to pass lists into sprocs, but this is like a breath of fresh air in comparison!
One approach we've taken for some of our key items of information stored inside an XML column is to "surface" them as computed, persisted properties on the "parent" table. This is done using a little stored function.
It works great, because the value is computed only once every time the XML changes - as long as it's not changing, there's no recomputation, the value is stored on the table like any other column.
It's also great since it can be indexed! So if you're searching and/or joining on such a field - that works like a charm!
So you basically need a stored function along the lines of this:
CREATE FUNCTION [dbo].[GetPhoneNo1](@DataXML XML)
RETURNS VARCHAR(50)
WITH SCHEMABINDING
AS BEGIN
DECLARE @result VARCHAR(50)
SELECT
@result = @DataXML.value('(/PhoneNumbers/PhoneNumber[@Type="1"]/@Value)[1]', 'VARCHAR(50)')
RETURN @result
END
If you don't have a phone number of type 1, you'll just get back a NULL.
Then, you need to extend your parent table with a computed, persisted column:
ALTER TABLE dbo.Customers
ADD PhoneNumberType1 AS dbo.GetPhoneNo1(PhoneNumbers) PERSISTED
As you can see - it works just fine for single entries, but unfortunately, you cannot surface a whole list of properties. But if you have some key items, like ID's or something, that you expect most of your rows to have, this can be a very nice and slick way to get at that information more easily and more efficiently.
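The index on the surfaced column is then just an ordinary one (the name is made up):

CREATE NONCLUSTERED INDEX IX_Customers_PhoneNumberType1
    ON dbo.Customers (PhoneNumberType1);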
What is the best way to shred XML data into various database columns? So far I have mainly been using the nodes and value functions like so:
INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)
However, I find that this gets very slow for even moderately sized XML data.
I stumbled across this question whilst having a very similar problem: I'd been running a query processing a 7.5MB XML file (approx. 10,000 nodes) for around 3.5-4 hours before finally giving up.
However, after a little more research I found that, having typed the XML using a schema and created an XML index (I'd bulk inserted the file into a table), the same query completed in ~0.04 ms.
How's that for a performance improvement!
Code to create a schema:
IF EXISTS ( SELECT * FROM sys.xml_schema_collections where [name] = 'MyXmlSchema')
DROP XML SCHEMA COLLECTION [MyXmlSchema]
GO
DECLARE @MySchema XML
SET @MySchema =
(
SELECT * FROM OPENROWSET
(
BULK 'C:\Path\To\Schema\MySchema.xsd', SINGLE_CLOB
) AS xmlData
)
CREATE XML SCHEMA COLLECTION [MyXmlSchema] AS @MySchema
GO
Code to create the table with a typed XML column:
CREATE TABLE [dbo].[XmlFiles] (
[Id] [uniqueidentifier] NOT NULL,
-- Data from CV element
[Data] xml(CONTENT dbo.[MyXmlSchema]) NOT NULL,
CONSTRAINT [PK_XmlFiles] PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Code to create Index
CREATE PRIMARY XML INDEX PXML_Data
ON [dbo].[XmlFiles] (Data)
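If the workload leans heavily on .value() or specific paths, secondary XML indexes can be layered on top of the primary one - for example:

-- Secondary indexes require the primary XML index created above
CREATE XML INDEX IXML_Data_Path
    ON [dbo].[XmlFiles] (Data)
    USING XML INDEX PXML_Data FOR PATH;

CREATE XML INDEX IXML_Data_Value
    ON [dbo].[XmlFiles] (Data)
    USING XML INDEX PXML_Data FOR VALUE;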
There are a few things to bear in mind though. SQL Server's implementation of Schema doesn't support xsd:include. This means that if you have a schema which references other schema, you'll have to copy all of these into a single schema and add that.
Also I would get an error:
XQuery [dbo.XmlFiles.Data.value()]: Cannot implicitly atomize or apply 'fn:data()' to complex content elements, found type 'xs:anyType' within inferred type 'element({http://www.mynamespace.fake/schemas}:SequenceNumber,xs:anyType) ?'.
if I tried to navigate above the node I had selected with the nodes function. E.g.
SELECT
C.value('CVElementId[1]', 'INT') AS [CVElementId]
,C.value('../SequenceNumber[1]', 'INT') AS [Level]
FROM
[dbo].[XmlFiles]
CROSS APPLY
[Data].nodes('/CVSet/Level/CVElement') AS T(C)
I found that the best way to handle this was to use OUTER APPLY to, in effect, perform an "outer join" on the XML.
SELECT
C.value('CVElementId[1]', 'INT') AS [CVElementId]
,B.value('SequenceNumber[1]', 'INT') AS [Level]
FROM
[dbo].[XmlFiles]
CROSS APPLY
[Data].nodes('/CVSet/Level') AS T(B)
OUTER APPLY
B.nodes ('CVElement') AS S(C)
Hope that helps someone, as that's pretty much been my day.
In my case I'm running SQL 2005 SP2 (9.0).
The only thing that helped was adding OPTION ( OPTIMIZE FOR ( @your_xml_var = NULL ) ).
The explanation is at the link below.
Example:
INSERT INTO #tbl (Tbl_ID, Name, Value, ParamData)
SELECT 1,
tbl.cols.value('name[1]', 'nvarchar(255)'),
tbl.cols.value('value[1]', 'nvarchar(255)'),
tbl.cols.query('./paramdata[1]')
FROM @xml.nodes('//root') as tbl(cols) OPTION ( OPTIMIZE FOR ( @xml = NULL ) )
https://connect.microsoft.com/SQLServer/feedback/details/562092/an-insert-statement-using-xml-nodes-is-very-very-very-slow-in-sql2008-sp1
I'm not sure what the best method is. I used the OPENXML construction:
INSERT INTO Test
SELECT Id, Data
FROM OPENXML (@XmlDocument, '/Root/blah', 2)
WITH (Id int '@ID',
Data varchar(10) '@DATA')
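For completeness, @XmlDocument here is the integer handle returned by sp_xml_preparedocument, and it should be released again afterwards - roughly like this (the sample XML is made up):

DECLARE @XmlDocument INT;
DECLARE @xml NVARCHAR(MAX);
SET @xml = N'<Root><blah ID="1" DATA="abc" /></Root>';

EXEC sp_xml_preparedocument @XmlDocument OUTPUT, @xml;

INSERT INTO Test
SELECT Id, Data
FROM OPENXML (@XmlDocument, '/Root/blah', 2)
WITH (Id   int         '@ID',
      Data varchar(10) '@DATA');

-- Free the in-memory DOM when done
EXEC sp_xml_removedocument @XmlDocument;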
To speed it up, you can create XML indexes. You can add an index specifically to optimize value() performance. You can also use typed XML columns, which perform better.
We had a similar issue here. Our DBA (SP, you the man) took a look at my code, made a little tweak to the syntax, and we got the speed we had been expecting. It was unusual because my select from XML was plenty fast, but the insert was way slow. So try this syntax instead:
INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value(N'(@column1/text())[1]', 'varchar(20)'),
Rows.n.value(N'(@column2/text())[1]', 'nvarchar(100)'),
Rows.n.value(N'(@column3/text())[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)
So specifying the text() parameter really seems to make a difference in performance. It took our insert of 2K rows from 'I must have written that wrong - let me stop it' to about 3 seconds, which was 2x faster than the raw insert statements we had been running through the connection.
I wouldn't claim this is the "best" solution, but I've written a generic SQL CLR procedure for this exact purpose - it takes a "tabular" Xml structure (such as that returned by FOR XML RAW) and outputs a resultset.
It does not require any customization / knowledge of the structure of the "table" in the Xml, and turns out to be extremely fast / efficient (although this wasn't a design goal). I just shredded a 25MB (untyped) xml variable in under 20 seconds, returning 25,000 rows of a pretty wide table.
Hope this helps someone:
http://architectshack.com/ClrXmlShredder.ashx
This isn't an answer, more an addition to this question - I have just come across the same problem and I can give the figures that edg asks for in the comments.
My test has xml which results in 244 records being inserted - so 244 nodes.
The code that I am rewriting takes on average 0.4 seconds to run (10 test runs, spread from .344 secs to .56 secs). Performance is not the main reason the code is being rewritten, but the new code needs to perform as well or better. The old code loops over the XML nodes, calling an sp to insert once per loop.
The new code is pretty much just a single sp; pass the xml in; shred it.
Tests with the new code switched in show the new sp takes on average 3.7 seconds - almost 10 times slower.
My query is in the form posted in this question;
INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)
The execution plan appears to show that for each column, SQL Server is doing a separate "Table Valued Function [XMLReader]" returning all 244 rows, then joining everything back up with Nested Loops (Inner Join). So in my case, where I am shredding from / inserting into about 30 columns, this appears to happen separately 30 times.
I am going to have to dump this code; I don't think any optimisation is going to get over this method being inherently slow. I am going to try the sp_xml_preparedocument/OPENXML method and see if the performance is better for that. If anyone comes across this question from a web search (as I did), I would highly advise you to do some performance testing before using this type of shredding in SQL Server.
There is an XML Bulk Load COM object (.NET example).
From MSDN:
You can insert XML data into a SQL Server database by using an INSERT statement and the OPENXML function; however, the Bulk Load utility provides better performance when you need to insert large amounts of XML data.
My current solution for large XML sets (> 500 nodes) is to use SQL Bulk Copy (System.Data.SqlClient.SqlBulkCopy), using a DataSet to load the XML into memory and then passing the table to SqlBulkCopy (defining an XML schema helps).
Obviously there are pitfalls, such as needlessly using a DataSet and loading the whole document into memory first. I would like to go further in the future and implement my own IDataReader to bypass the DataSet method, but currently the DataSet is "good enough" for the job.
Basically I never found a solution to my original question regarding the slow performance of that type of XML shredding. It could be slow due to the typed XML queries being inherently slow, or something to do with transactions and the SQL Server log. I guess the typed XML functions were never designed for operating on non-trivial node sizes.
XML Bulk Load: I tried this and it was fast, but I had trouble getting the COM DLL to work in 64-bit environments and I generally try to avoid COM DLLs that no longer appear to be supported.
sp_xml_preparedocument/OPENXML: I never went down this road so would be interested to see how it performs.