I'm getting some horrific performance from an XQuery projection in SQL Server.
What would be the best way to write the following transformation?
select DocumentData.query(
'<object type="dynamic">
<state>
<OrderTotal type="decimal">
{fn:sum(
for $A in /object[1]/state[1]/OrderDetails[1]/object/state[1]
return ($A/ItemPrice[1] * $A/Quantity[1]))}
</OrderTotal>
<CustomerId type="guid">
{xs:string(/object[1]/state[1]/CustomerId[1])}
</CustomerId>
<Details type="collection">
{/object[1]/state[1]/OrderDetails[1]/object}
</Details>
</state>
</object>') as DocumentData
from documents
(I know the code is a bit out of context)
If I check the execution plan for this code, there are 10+ joins going on.
Should I break this down to use a for $var for each level in the structure?
For more context, this is what I'm trying to accomplish:
http://rogeralsing.com/2011/03/02/linq-to-sqlxml-projections/
I'm writing a "Linq to XQuery translator" / NoSQL document DB emulator; filtering works like a charm, but projections suffer from performance problems.
This article is quite useful:
Performance Optimizations for the XML Data Type in SQL Server 2005
In particular it recommends that instead of writing paths of the form...
/object[1]/state[1]/CustomerId[1]
you should instead write...
(/object/state/CustomerId)[1]
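Applied to the projection above, that would mean rewriting it roughly like this (an untested sketch based on the question's document shape; it assumes a single state/OrderDetails node per level, as the sample implies). The intent is to tell the optimizer up front that only one node is expected, rather than enforcing a positional predicate at every step of the path.
select DocumentData.query(
'<object type="dynamic">
<state>
<OrderTotal type="decimal">
{fn:sum(
for $A in (/object/state/OrderDetails/object/state)
return ($A/ItemPrice[1] * $A/Quantity[1]))}
</OrderTotal>
<CustomerId type="guid">
{xs:string((/object/state/CustomerId)[1])}
</CustomerId>
<Details type="collection">
{(/object/state/OrderDetails)[1]/object}
</Details>
</state>
</object>') as DocumentData
from documents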
We use Neo4j AuraDB for our graph database, but we have issues with data upload there, so we decided to move to AWS Neptune using the migration tool.
We have 3.7M nodes and 11.2M relationships in our database. The DB instance is db.r5.large with 2 CPUs and 16 GiB RAM.
The same queries run as openCypher on AWS Neptune are much slower than the Cypher queries on AuraDB (about 7-10 times slower). We also tried rewriting the queries in Gremlin and testing performance, but it is still very slow. We have node and lookup indexes on AuraDB, but we can't create them on AWS Neptune, as it handles indexing automatically.
Is there any way to get better performance on AWS Neptune?
UPDATE:
Example of Gremlin query:
g.V().hasLabel('Member').has('address', eq('${address}')).
  outE('HAS').as('member_has').inV().as('token').hasLabel('Token').
  inE('HAS').as('other_member_has').outV().as('other_member').hasLabel('Member').
  where(__.select('member_has').where(neq('other_member_has'))).
  select('other_member', 'token').
  group().
    by(__.select('other_member').local(__.properties().group().by(__.key()).by(__.map(__.value())))).
    by(__.fold().project('member', 'number_of_tokens').
      by(__.unfold().select('other_member').choose(neq('cypher.null'), __.local(__.properties().group().by(__.key()).by(__.map(__.value()))))).
      by(__.unfold().select('token').count())).
  unfold().select(values).
  order().by(__.select('number_of_tokens'), desc).
  limit(20)
Example of Cypher query:
MATCH (member:Member { address: '${address}' })-[:HAS]->(token:Token)<-[:HAS]-(other_member:Member) RETURN PROPERTIES(other_member) as member, COUNT(token) AS number_of_tokens ORDER BY number_of_tokens DESC LIMIT 20
As discussed in the comments, as of this moment the openCypher support is a preview and not quite at GA level. The more recent engine versions do have some significant improvements, but more are yet to be delivered. As to the Gremlin query, tools that convert Cypher to Gremlin tend to build quite complex queries. I think the Gremlin equivalent of the Cypher query will look something like this:
g.V().has('Member','address', address).as('m').
  out('HAS').hasLabel('Token').as('t').
  in('HAS').hasLabel('Member').as('om').
  where(neq('m')).
  group().
    by(select('om')).
    by(select('t').count()).
  order(local).
    by(values, desc).
  limit(local, 20)
and if you want all of the properties just add a valueMap as in:
g.V().has('Member','address', address).as('m').
  out('HAS').hasLabel('Token').as('t').
  in('HAS').hasLabel('Member').as('om').
  where(neq('m')).
  group().
    by(select('om').valueMap(true)).
    by(select('t').count()).
  order(local).
    by(values, desc).
  limit(local, 20)
I am generating and sending XML EVENTS from the database through Service Broker using SQL CLR - and it works great. However, I am looking at the SQL PLAN and am a little shocked at some of the statistics. Small transformations seem to cost quite a bit of CPU TIME.
All the examples I see online optimize the TABLE the XML sits in by adding an index (etc.)... but there is no table for me (I am simply generating the XML).
As such...
Q: Is there a way to "optimize" these kinds of "generational" statements?
Maybe some approaches are better than others?
I have yet to see anything online about this.
Thanks.
SAMPLES OF EXPENSIVE STATEMENTS:
DECLARE @CurrentId UNIQUEIDENTIFIER = (SELECT @Event.value('(/Event/@auditId)[1]', 'UNIQUEIDENTIFIER'));
SET @Event.modify('replace value of (/Event/@auditId)[1] with sql:variable("@NewId")');
EVENT XML:
An event would look like...
<Event auditId="FE4D0A4C-388B-E611-9B4D-0050569B733D" force="false" CreatedOn="2016-10-05T20:14:20.020">
<DataSource machineName="ABC123">DatabaseName</DataSource>
<Topic>
<Filter>TOPIC/ENTITY/ACTION</Filter>
</Topic>
<Name>Something.Created</Name>
<Contexts>
<Context>
<Name>TableName</Name>
<Key>
<IssueId>000</IssueId>
</Key>
</Context>
</Contexts>
</Event>
An XML index will not help you with this (read this). There are only rare situations where this kind of index helps: the benefit is greatest when you read from your XML with fully specified paths. The moment you use XQuery with any kind of navigation, it can even make things worse.
.modify() is quite heavy. In this special case it could be faster to rebuild the XML from scratch (you know more about its structure than the engine does):
DECLARE @xml XML = N'
<Event auditId="FE4D0A4C-388B-E611-9B4D-0050569B733D" force="false" CreatedOn="2016-10-05T20:14:20.020">
<DataSource machineName="ABC123">DatabaseName</DataSource>
<Topic>
<Filter>TOPIC/ENTITY/ACTION</Filter>
</Topic>
<Name>Something.Created</Name>
<Contexts>
<Context>
<Name>TableName</Name>
<Key>
<IssueId>000</IssueId>
</Key>
</Context>
</Contexts>
</Event>';
DECLARE @NewId UNIQUEIDENTIFIER = NEWID();
SELECT @NewId AS [@auditId]
      ,e.value('@force','nvarchar(max)') AS [@force]         --read this as string to avoid expensive conversions
      ,e.value('@CreatedOn','nvarchar(max)') AS [@CreatedOn] --same here
      ,e.query('*') AS [node()]                              --read "as-is"
FROM @xml.nodes('/Event') AS A(e)
FOR XML PATH('Event');
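If the rebuilt XML needs to go back into the variable (as the original .modify() did), the same SELECT can be wrapped in an assignment - a sketch using the question's @Event and @NewId variables; note the TYPE directive, which keeps the result typed as XML rather than nvarchar:
SET @Event = (SELECT @NewId AS [@auditId]
                    ,e.value('@force','nvarchar(max)') AS [@force]
                    ,e.value('@CreatedOn','nvarchar(max)') AS [@CreatedOn]
                    ,e.query('*') AS [node()]
              FROM @Event.nodes('/Event') AS A(e)
              FOR XML PATH('Event'), TYPE);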
There is - for sure! - no general approach to making XML operations faster; if such a thing existed, it would be the one and only way to do it...
I'd monitor the system, pick out the most expensive calls, and try to modify them one by one...
I am trying to retrieve data out of the ECM eRoom database (which isn't documented, as far as I know).
I have an eRoom with a custom "Database" and some fields.
When I query the Objects table I find the "Database" row
select * from [dbo].[Objects] where internalId = 1234567
and the rows for the entries
select top 10 * from [dbo].[Objects] where parentInternalId = 1234567
but I don't find any field with the values of the entries, only a column called NonSearchableProperties, which contains only hex data.
My questions:
How could I retrieve the values?
Is it possible to retrieve them with mere SQL?
What is the simplest way?
This is not a silver bullet, but it is okay for my use case.
After a lot of googling and a lot of test scripts I found some answers - probably hard to come by because the system is near end-of-life and the documentation is not easy to read. Here are my findings.
Is it possible to retrieve them with mere SQL?
As far as I could find out, no! (Please correct me if I'm wrong.)
How could I retrieve the values?
With the eRoom API (on the server there are some sample programs to query the data/objects under <installation-path>\eRoom Server\Toolkit\Samples, in C++, VB, VBScript, ... all a bit too much overhead), or with the eRoom XML Query Language (EXQL) over SOAP calls.
What is the simplest way?
After a lot of tests, searching in forums, and many experiments with SoapUI, I found that queries with EXQL seem to be the simplest way to retrieve data, once you understand the structure.
Here are some resources that were helpful:
(very) basic info on EXQL from the manufacturer: https://eroom.asce.org/eRoomHelp/en/XML_Help/Introduction.htm
(Disclaimer: I didn't find it very helpful, but it shows at least some basics.)
a short 9-page developer guide: https://developer-content.emc.com/developer/downloads/eRoomXMLCapabilitiesUseCaseProgramDashboard.pdf (the last example on page 8 helped me understand how to set up the query, with a lot of imagination)
But for this to work, don't forget to activate Allow XML queries and commands from external applications in the Site Settings.
TIP 1:
You can always go deeper, you just need to know the right XML element underneath; <Database>, <Cells> and <DBCell> can help you go deeper.
TIP 2:
Don't query too much data, since such a query will likely run into timeouts.
Update 1:
Just to save time for anyone who is looking, this "query" returns all rows (properties) for the Database(s) created in an eRoom root.
(Don't forget to set the facility and room in the URL, e.g. http://server/eroomxml/facilities/TEST/Rooms/TestRoom, although it could also be set in the query.)
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:er="http://www.eroom.com/eRoomXML/2003/700">
   <soapenv:Header/>
   <soapenv:Body>
      <er:ExecuteXMLCommand>
         <er:eRoomXML>
            <er:command er:select="HomePage/Items">
               <er:getproperties>
                  <er:Item>
                     <Database>
                        <Rows>
                           <Item>
                              <Cells>
                                 <DBCell>
                                    <Content>
                                    </Content>
                                 </DBCell>
                              </Cells>
                           </Item>
                        </Rows>
                     </Database>
                  </er:Item>
               </er:getproperties>
            </er:command>
         </er:eRoomXML>
      </er:ExecuteXMLCommand>
   </soapenv:Body>
</soapenv:Envelope>
I have been using EF6 for my application and will soon move to 6.1. Normally all my EF is handled with LINQ like this:
var exams = _examsRepository
.GetAll()
.Where(q => q.SubjectId == subjectId)
.Include(q => q.Tests)
.ToList();
However I just had a suggestion to use this for a particular query:
var exams1 = (from ex in DbContext.Exams
join t in DbContext.Tests on ex.ExamId equals t.ExamId
join ut in DbContext.UserTests on t.TestId equals ut.TestId
where ut.UserId == "123"
select new { ex, t, ut }).ToList();
Can someone tell me what the advantages of using the second way are? I realize one advantage is that it seems I can do things which cannot be done the first way (nobody has yet been able to write what's needed in the second example the first way). What, if any, are the other advantages? Are these the only two recommended ways I could use EF to query a SQL Server 2012 database?
I would like to learn more about how to code the second way. Does anyone know of some good links which explain this?
Method versus query-based syntax with LINQ:
Method-based syntax is more comprehensive than query-based syntax.
You can mix them a little.
Query syntax can be more readable, especially around joins.
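For example (a sketch using the DbContext.Exams, Tests and UserTests sets from the question - query syntax compiles down to these method calls anyway), the second query from the question could be written in method syntax like this:
// Same join as the query-syntax example, written with method syntax.
// Names (Exams, Tests, UserTests, ExamId, TestId, UserId) are taken from the question.
var exams1 = DbContext.Exams
    .Join(DbContext.Tests,
          ex => ex.ExamId,           // outer key
          t => t.ExamId,             // inner key
          (ex, t) => new { ex, t })
    .Join(DbContext.UserTests,
          x => x.t.TestId,
          ut => ut.TestId,
          (x, ut) => new { x.ex, x.t, ut })
    .Where(x => x.ut.UserId == "123")
    .ToList();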
I've got some XML Data in a SQL Server Table in an XML Column as follows:
<AffordabilityResults>
<matchlevel xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">IndividualMatch</matchlevel>
<searchdate xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">2013-07-29T11:20:53</searchdate>
<searchid xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">{E40603B5-B59C-4A6A-92AB-98DE83DB46E7}</searchid>
<calculatedgrossannual xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">13503</calculatedgrossannual>
<debtstress xmlns="urn:callcredit.co.uk/soap:affordabilityapi2">
<incomedebtratio>
<totpaynetincome>0.02</totpaynetincome>
<totamtunsecured>0.53</totamtunsecured>
<totamtincsec>0.53</totamtincsec>
</incomedebtratio>
</debtstress>
</AffordabilityResults>
You'll note that some of the elements have an xmlns attribute and some don't...
I need to write queries to return the data - and more importantly show a business analyst how to write her own queries to get the data she needs so I want it to be as simple as possible.
I can query the data easily using the WITH XMLNAMESPACES element as follows:
WITH XMLNAMESPACES (N'urn:callcredit.co.uk/soap:affordabilityapi2' as x )
SELECT
ResponseXDoc.value('(/AffordabilityResults/x:matchlevel)[1]','varchar(max)' ) AS MatchLevel
, ResponseXDoc.value('(/AffordabilityResults/x:debtstress/x:incomedebtratio/x:totamtunsecured)[1]','nvarchar(max)' ) AS UnsecuredDebt
FROM [NewBusiness].[dbo].[t_TacResults]
But adding the x: part to the query makes it look overly complicated, and I want to keep it simple for the business analyst.
I tried adding:
WITH XMLNAMESPACES (DEFAULT 'urn:callcredit.co.uk/soap:affordabilityapi2' )
and removing the x: from the XQuery - but this returns null (possibly because of the lack of the xmlns on the root element?)
Is there any way I can simplify these queries either with or without the default namespace?
If namespaces are not important in your use case, you could use the namespace wildcard selector *:, which selects both nodes without a namespace and nodes in any namespace.
An example query could be
(/*:AffordabilityResults/*:matchlevel)[1]
The business analyst will still have to add the selector in front of every node test, but it's the same "prefix" all the time and the only error to be expected is forgetting to use it somewhere.
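Applied to the query from the question, it would look something like this (a sketch reusing the question's table and column names, with no WITH XMLNAMESPACES clause needed):
SELECT
    ResponseXDoc.value('(/*:AffordabilityResults/*:matchlevel)[1]', 'varchar(max)') AS MatchLevel
  , ResponseXDoc.value('(/*:AffordabilityResults/*:debtstress/*:incomedebtratio/*:totamtunsecured)[1]', 'nvarchar(max)') AS UnsecuredDebt
FROM [NewBusiness].[dbo].[t_TacResults]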