How to decode an OLAP query? - sql-server

I am totally new to OLAP servers. I have an OLAP query that is working fine; I just want to know which tables are joined to produce the result, and how (I mean with which joins). Here is the query:
WITH
MEMBER [Measures].[ThisYearMonthToDate] AS
    'Sum({[Time].[All Time].[2013].[Q1].[January],
          [Time].[All Time].[2013].[Q1].[February],
          [Time].[All Time].[2013].[Q1].[March],
          [Time].[All Time].[2013].[Q2].[April],
          [Time].[All Time].[2013].[Q2].[May]},
         [Measures].[Main Temp Id])'
MEMBER [Measures].[LastYearMonthToDate] AS
    'Sum({[Time].[All Time].[2012].[Q1].[January],
          [Time].[All Time].[2012].[Q1].[February],
          [Time].[All Time].[2012].[Q1].[March],
          [Time].[All Time].[2012].[Q2].[April],
          [Time].[All Time].[2012].[Q2].[May]},
         [Measures].[Main Temp Id])'
SELECT
    {[Measures].[LastYearMonthToDate], [Measures].[ThisYearMonthToDate]} ON COLUMNS,
    {([PublicRegion].[All Regions].[USA]),
     ([PublicRegion].[All Regions].[USA].[Northeast]),
     ([PublicRegion].[All Regions].[USA].[Midwest]),
     ([PublicRegion].[All Regions].[USA].[Southeast]),
     ([PublicRegion].[All Regions].[USA].[Southwest]),
     ([PublicRegion].[All Regions].[USA].[West Coast]),
     ([PublicRegion].[All Regions].[USA].[Misc]),
     ([PublicRegion].[All Regions].[Europe]),
     ([PublicRegion].[All Regions].[Europe].[UK]),
     ([PublicRegion].[All Regions].[Europe].[France]),
     ([PublicRegion].[All Regions].[Europe].[Italy]),
     ([PublicRegion].[All Regions].[Europe].[Germany]),
     ([PublicRegion].[All Regions].[Europe].[Spain]),
     ([PublicRegion].[All Regions].[Canada]),
     ([PublicRegion].[All Regions].[Other])} ON ROWS
FROM Public
I cannot work out how to decode this query. Please help me.

There are two fairly easy ways to find out:
Log of your OLAP server: I'm almost certain that all leading OLAP tools log the SQL queries they send to the database server.
Log of your database server: Set your database server to log all queries from all users. Using the execution time and the user name you declared in the metadata file, you can easily filter out the queries sent by the OLAP tool; a minimal sketch of this approach follows.
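For example, if the backing database is SQL Server 2008 or later, you could capture the incoming SQL with an Extended Events session. This is only a sketch: the session name, file name, and the olap_service_account login are hypothetical placeholders for whatever account your OLAP tool connects as.
CREATE EVENT SESSION [capture_olap_sql] ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.username, sqlserver.client_app_name)
    -- assumption: the login the OLAP tool uses to reach the database
    WHERE sqlserver.username = N'olap_service_account')
ADD TARGET package0.event_file (SET filename = N'capture_olap_sql.xel');
GO
ALTER EVENT SESSION [capture_olap_sql] ON SERVER STATE = START;
Once the session is running, fire the MDX query from the OLAP client and read the captured batches back from the .xel file; the relational SQL you see there shows exactly which tables are joined and how.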
Hope this helps,
Best regards

Related

AWS Neptune performance / Comparing to Neo4j AuraDB

We use Neo4j AuraDB for our graph database, but we have issues uploading data to it, so we decided to move to AWS Neptune using the migration tool.
We have 3.7M nodes and 11.2M relations in our database. The DB instance is db.r5.large, with 2 CPUs and 16 GiB RAM.
The same queries run about 7-10 times slower as AWS Neptune openCypher than as AuraDB Cypher. We also tried rewriting the queries in Gremlin and testing performance, but it is still very slow. We have node and lookup indexes on AuraDB, but we can't create them on AWS Neptune, as it handles indexing automatically.
Is there any way to reach better performance on AWS Neptune?
UPDATE:
Example of Gremlin query:
g.V().hasLabel('Member').has('address', eq('${address}')).
  outE('HAS').as('member_has').inV().as('token').hasLabel('Token').
  inE('HAS').as('other_member_has').outV().as('other_member').hasLabel('Member').
  where(__.select('member_has').where(neq('other_member_has'))).
  select('other_member', 'token').
  group().
    by(__.select('other_member').local(__.properties().group().by(__.key()).by(__.map(__.value())))).
    by(__.fold().project('member', 'number_of_tokens').
      by(__.unfold().select('other_member').choose(neq('cypher.null'),
        __.local(__.properties().group().by(__.key()).by(__.map(__.value()))))).
      by(__.unfold().select('token').count())).
  unfold().select(values).
  order().by(__.select('number_of_tokens'), desc).
  limit(20)
Example of Cypher query:
MATCH (member:Member { address: '${address}' })-[:HAS]->(token:Token)<-[:HAS]-(other_member:Member)
RETURN PROPERTIES(other_member) AS member, COUNT(token) AS number_of_tokens
ORDER BY number_of_tokens DESC LIMIT 20
As discussed in the comments, as of this moment the openCypher support is a preview, not quite GA level. The more recent engine versions do have some significant improvements, but more are yet to be delivered. As to the Gremlin query, tools that convert Cypher to Gremlin tend to build quite complex queries. I think the Gremlin equivalent to the Cypher query is going to look something like this.
g.V().has('Member','address', address).as('m').
  out('HAS').hasLabel('Token').as('t').
  in('HAS').hasLabel('Member').as('om').
  where(neq('m')).
  group().
    by(select('om')).
    by(select('t').count()).
  order(local).
    by(values, desc).
  limit(local, 20)
and if you want all of the properties, just add a valueMap, as in:
g.V().has('Member','address', address).as('m').
  out('HAS').hasLabel('Token').as('t').
  in('HAS').hasLabel('Member').as('om').
  where(neq('m')).
  group().
    by(select('om').valueMap(true)).
    by(select('t').count()).
  order(local).
    by(values, desc).
  limit(local, 20)

pandas.DataFrame.to_sql inserts data, but doesn't commit the transaction

I have a pandas dataframe I'm trying to insert into MS SQL EXPRESS as per below:
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc://user:password@testodbc")
connection = engine.connect()

data = {'Host': ['HOST1', 'HOST2', 'HOST3', 'HOST4'],
        'Product': ['Apache HTTP 2.2', 'RedHat 6.9', 'OpenShift 2', 'JRE 1.3'],
        'ITBS': ['Infrastructure', 'Accounting', 'Operations', 'Accounting'],
        'Remediation': ['Upgrade', 'No plan', 'Decommission', 'Decommission'],
        'TargetDate': ['2018-12-31', 'NULL', '2019-03-31', '2019-06-30']}
df = pd.DataFrame(data)
When I call:
df.to_sql(name='TLMPlans', con=connection, index=False, if_exists='replace')
and then:
print(engine.execute("SELECT * FROM TLMPLans").fetchall())
I can see the data alright, but it actually doesn't commit any transaction:
D:\APPS\Python\python.exe
C:/APPS/DashProjects/dbConnectors/venv/Scripts/readDataFromExcel.py
[('HOST1', 'Apache HTTP 2.2', 'Infrastructure', 'Upgrade', '2018-12-31'), ('HOST2', 'RedHat 6.9', 'Accounting', 'No plan', 'NULL'), ('HOST3', 'OpenShift 2', 'Operations', 'Decommission', '2019-03-31'), ('HOST4', 'JRE 1.3', 'Accounting', 'Decommission', '2019-06-30')]
Process finished with exit code 0
It says here I don't have to commit as SQLAlchemy does it:
Does the Pandas DataFrame.to_sql() function require a subsequent commit()?
and the below suggestions don't work:
Pandas to_sql doesn't insert any data in my table
I spent a good 3 hours looking for clues all over the Internet, but I'm not getting any relevant answers, or maybe I don't know how to ask the question.
Any guidance on what to look for would be highly appreciated.
UPDATE
I'm able to commit changes using a pyodbc connection and a full INSERT statement; however, pandas.DataFrame.to_sql() with the SQLAlchemy engine doesn't work. It sends the data to memory instead of the actual database, regardless of whether a schema is specified or not.
I would really appreciate help with this one, or possibly it is a pandas issue I need to report?
I had the same issue; I realised you need to tell pyodbc which database you want to use. For me the default was master, so my data ended up there.
There are two ways you can do this, either:
connection.execute("USE <dbname>")
Or define the schema in the df.to_sql():
df.to_sql(name='<TABLENAME>', con=connection, schema='<dbname>.dbo')
(Note that to_sql's connection keyword is con, not conn.) In my case the schema was <dbname>.dbo. I think .dbo is the default, so it could be something else if you have defined an alternative schema.
This was referenced in this answer; it just took me a bit longer to realise what the schema name should be.
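If you are not sure which database the table actually landed in, a quick way to hunt for it is to search every database on the instance. A sketch, assuming SQL Server (sp_MSforeachdb is an undocumented but long-shipped system procedure; TLMPlans is the table name from the question):
-- runs the inner statement once per database, with ? replaced by the database name
EXEC sp_MSforeachdb
    'IF EXISTS (SELECT 1 FROM [?].sys.tables WHERE name = ''TLMPlans'')
         SELECT ''?'' AS database_name;';
If the table turns up in master, pointing the connection at the right database or schema as above should fix it.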

Scaling issue with a view in sql server 2005

I have an issue with scaling when using a view in MS SQL Server 2005. My view returns the data of interest on a small test dataset; however, when I try to return the view on a large dataset (>10,000 rows), my query just runs and runs without returning. Is this an inefficiency in the view definition, or do I perhaps need to add indexing? Are there other ways I could make the view return data more quickly and efficiently, so that it can be used in practice?
My current view definition is as follows:
CREATE VIEW reportedProjects AS
SELECT vc.projectID, vc.sampleID, vc.runID, vc.ngsSubpanel,
va.[HGVS_c], va.[HGVS_p], va.geneID, va.transcriptID,
sc.score, sc.scorer, sc.checker
FROM dbo.varCall vc, dbo.varAnnotation va, dbo.varScore sc
WHERE vc.varCallID = va.varCallID
AND vc.variantID = va.variantID
AND va.variantID = sc.variantID
AND va.transcriptID = sc.transcriptID;
GO
-- Update --
Apologies for any red herrings this may have thrown up for readers. Changing which base table columns (not listed in the problem description above) the view joins on has solved my issue. All the suggestions offered were much appreciated, though.
Add indexes on varCallID, variantID, and transcriptID, then rewrite the view with explicit joins and see how it performs.
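A minimal sketch of both steps, assuming the tables and join columns shown in the question (the index names are hypothetical):
-- composite indexes covering the join predicates
CREATE INDEX IX_varAnnotation_varCall ON dbo.varAnnotation (varCallID, variantID);
CREATE INDEX IX_varScore_variant ON dbo.varScore (variantID, transcriptID);
GO
-- same view, rewritten with explicit joins
ALTER VIEW reportedProjects AS
SELECT vc.projectID, vc.sampleID, vc.runID, vc.ngsSubpanel,
       va.[HGVS_c], va.[HGVS_p], va.geneID, va.transcriptID,
       sc.score, sc.scorer, sc.checker
FROM dbo.varCall vc
INNER JOIN dbo.varAnnotation va
    ON va.varCallID = vc.varCallID AND va.variantID = vc.variantID
INNER JOIN dbo.varScore sc
    ON sc.variantID = va.variantID AND sc.transcriptID = va.transcriptID;
GO
The explicit JOIN form is semantically identical to the comma-join in the question, but it makes the join conditions easier to check against the available indexes.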

Query Performance in Access 2007 - drawing on a SQL Server Express backend

I've been banging my head on this issue for a little while now, and decided I should ask for help. I have a table which holds temperature/humidity chart recorder data (currently over 775,000 records), against which I am trying to run a statistical query. The problem is that this often takes up to two minutes, and sometimes does not come back at all, forcing me to close the program (Ctrl-Alt-Delete). At first I didn't have much of a problem; it was only after I hit the magical 500k-record mark that I started getting serious slowdowns, which have grown progressively worse as more data is compiled and imported into the table.
Here is the query (pass-through):
SELECT dbo.tblRecorderLogs.strAreaAssigned, Min(dbo.tblRecorderLogs.datDateRecorded) AS FirstRecorderDate, Max(dbo.tblRecorderLogs.datDateRecorded) AS LastRecordedDate,
Round(Avg(dbo.tblRecorderLogs.intTempCelsius),2) AS AverageTempC,
Round(Avg(dbo.tblRecorderLogs.intRHRecorded),2) AS AverageRH,
Count(dbo.tblRecorderLogs.strAreaAssigned) AS Records
FROM dbo.tblRecorderLogs
GROUP BY dbo.tblRecorderLogs.strAreaAssigned
ORDER BY dbo.tblRecorderLogs.strAreaAssigned;
Here is the table structure in which the chart data is stored:
idRecorderDataID Number Primary Key
datDateEntered Date/Time (indexed, duplicates OK)
datTimeEntered Date/Time
intTempCelcius Number
intDewPointCelcius Number
intWetBulbCelcius Number
intMixingGPP Number
intRHRecorded Number
strAssetRecorder Text (indexed, duplicates OK)
strAreaAssigned Text (indexed, duplicates OK)
I am trying to write a program which will allow people to pull data from this table based on Area Assigned, as well as start and end dates. With the dataset size I currently have, this kind of report simply seems to be too much for it, and the machine never returns an answer. I've had to extend the ODBC timeout to almost 180 seconds in any queries dealing with this table, simply because of the size. I could use some serious help, if people have any. Thank you in advance!
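One thing worth trying on the SQL Server side, if you can get access to the box: a covering index keyed on the grouping column that includes the aggregated columns, so the whole query can be answered from a single index scan. This is only a sketch: the index name is made up, and the column names follow the query above.
CREATE INDEX IX_RecorderLogs_Area_Covering
    ON dbo.tblRecorderLogs (strAreaAssigned)
    INCLUDE (datDateRecorded, intTempCelsius, intRHRecorded);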
-- Edited 08/13/2012 @ 1050 hours --
I have not been able to test the query on the SQL Server because the IT department has taken control of the machine in question, and someone is logged into it full-time using the remote management console. I have tried an interim step to lessen the impact of the performance issue, but I am still looking for a permanent solution.
Interim step:
I created a local table mirroring the structure of the dbo.tblRecorderLogs SQL Server table, into which I do an INSERT INTO using the former SELECT statement as its subquery. Any subsequent statistical analysis is then drawn from this 'temporary' local table. After the process is complete, the local table is truncated.
-- Edited 08/13/2012 @ 1217 hours --
Ran the query shown above in the SQL Server management console; it took 1 minute 38 seconds to complete according to the query timer provided by the console.
-- Edit 08/15/2012 @ 1531 hours --
Tried to run the query as a VBA DoCmd.RunSQL statement to populate a temporary table, using the following code:
INSERT INTO tblTempRecorderDataStatsByArea ( strAreaAssigned, datFirstRecord,
    datLastRecord, intAveTempC, intAveRH, intRecordCount )
SELECT dbo_tblRecorderLogs.strAreaAssigned,
    Min(dbo_tblRecorderLogs.datDateRecorded) AS MinOfdatDateRecorded,
    Max(dbo_tblRecorderLogs.datDateRecorded) AS MaxOfdatDateRecorded,
    Round(Avg(dbo_tblRecorderLogs.intTempCelsius), 2) AS AveTempC,
    Round(Avg(dbo_tblRecorderLogs.intRHRecorded), 2) AS AveRHRecorded,
    Count(dbo_tblRecorderLogs.strAreaAssigned) AS CountOfstrAreaAssigned
FROM dbo_tblRecorderLogs
GROUP BY dbo_tblRecorderLogs.strAreaAssigned
ORDER BY dbo_tblRecorderLogs.strAreaAssigned
The problem arises when the code is executed: the query takes so long that it hits the timeout before it finishes. Still hoping for a 'magic bullet' to fix this...
-- Edited 08/20/2012 @ 1241 hours --
The only 'quasi' solution I've found is running the failed query repeatedly (sort of priming the pump, as it were), so that when the query is called again by my program it has a reasonable chance of actually completing before the ODBC SQL Server driver times out. Basically a filthy, filthy hack, but I don't have a better one to combat this issue.
I've tried creating a view, which works on the server side - but doesn't speed things up.
The proper fields being aggregated are indexed properly, so I can't make any changes there.
I am only pulling information from the database that is immediately useful to the user - no 'SELECT * madness' going on here.
I think I am, officially, out of things to try, aside from throwing raw computing horsepower at the problem, which isn't a solution right now as the item isn't live and I have no budget to procure better hardware. I will post this as an 'answer' and leave it up until Sept 3rd; if I do not have better answers by then, I will accept my own answer and accept defeat.
When I've had to run min/max functions on several fields from the same table, I've often found it quicker to compute each column separately, as a subquery in the FROM clause of the main/outer query.
So your query would be like this:
SELECT rLogs1.strAreaAssigned, rLogs1.FirstRecorderDate, rLogs2.LastRecordedDate,
       rLogs3.AverageTempC, rLogs4.AverageRH, rLogs5.Records
FROM (((
    (SELECT strAreaAssigned, Min(datDateRecorded) AS FirstRecorderDate
     FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs1
    INNER JOIN
    (SELECT strAreaAssigned, Max(datDateRecorded) AS LastRecordedDate
     FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs2
    ON rLogs1.strAreaAssigned = rLogs2.strAreaAssigned)
    INNER JOIN
    (SELECT strAreaAssigned, Round(Avg(intTempCelsius), 2) AS AverageTempC
     FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs3
    ON rLogs1.strAreaAssigned = rLogs3.strAreaAssigned)
    INNER JOIN
    (SELECT strAreaAssigned, Round(Avg(intRHRecorded), 2) AS AverageRH
     FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs4
    ON rLogs1.strAreaAssigned = rLogs4.strAreaAssigned)
    INNER JOIN
    (SELECT strAreaAssigned, Count(strAreaAssigned) AS Records
     FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs5
    ON rLogs1.strAreaAssigned = rLogs5.strAreaAssigned
ORDER BY rLogs1.strAreaAssigned;
If you take your query and the one above, copy them into the same query window in SQL Server, and run the estimated execution plan, you should be able to compare them and see which one works better.
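If you'd rather compare actual cost than estimates, standard T-SQL session statistics work too (a generic sketch, nothing specific to this schema):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run each candidate query here, then compare logical reads and
-- CPU/elapsed time reported in the Messages tab
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;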

SOQL - Convert Date To Owner Locale

We use DBAmp for integrating Salesforce.com with SQL Server (it basically adds a linked server), and we run queries against our SF data using OPENQUERY.
I'm trying to do some reporting against opportunities and want to return the created date of the opportunity in the opportunity owner's local date/time (i.e. the date/time the user sees in Salesforce).
Our DBAmp configuration forces the dates to UTC.
I stumbled across a date function in the Salesforce documentation that I thought might be of some help, but I get an error when I try to use it, so I can't prove it. Below is the example usage for the convertTimezone function:
SELECT HOUR_IN_DAY(convertTimezone(CreatedDate)), SUM(Amount)
FROM Opportunity
GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))
Below is the error returned:
OLE DB provider "DBAmp.DBAmp" for linked server "SALESFORCE" returned message "Error 13005 : Error translating SQL statement: line 1:37: expecting "from", found '('".
Msg 7350, Level 16, State 2, Line 1
Cannot get the column information from OLE DB provider "DBAmp.DBAmp" for linked server "SALESFORCE".
Can you not use SOQL functions in OPENQUERY as below?
SELECT *
FROM OPENQUERY(SALESFORCE, '
    SELECT HOUR_IN_DAY(convertTimezone(CreatedDate)), SUM(Amount)
    FROM Opportunity
    GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))')
UPDATE:
I've just had some correspondence with Bill Emerson (I believe he is the creator of the DBAmp Integration Tool):
You should be able to use SOQL functions so I am not sure why you are
getting the parsing failure. I'll setup a test case and report back.
I'll update the post again when I hear back. Thanks
A new version of DBAmp (2.14.4) has just been released that fixes the issue with using ConvertTimezone in openquery.
Version 2.14.4
Code modified for better memory utilization
Added support for API 24.0 (SPRING 12)
Fixed issue with embedded question marks in string literals
Fixed issue with using ConvertTimezone in openquery
Fixed issue with "Invalid Numeric" when using aggregate functions in openquery
I'm fairly sure that because DBAmp uses SQL and not SOQL, SOQL functions would not be available, sorry.
You would need to expose this data some other way. Perhaps it's possible with a Salesforce report, a web service, or by compiling the data through the program you are using to access the (DBAmp) SQL Server.
If you were to create a Salesforce web service, the following example might be helpful.
global class MyWebService
{
    // A GROUP BY query returns one AggregateResult per group,
    // so the result must be held in a list.
    webservice static List<AggregateResult> MyWebServiceMethod()
    {
        List<AggregateResult> ar = [
            SELECT
                HOUR_IN_DAY(convertTimezone(CreatedDate)) Hour,
                SUM(Amount) Amount
            FROM Opportunity
            GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))];
        System.debug(ar);
        return ar;
    }
}
