Better way to join table valued function - sql-server

I am trying to generate a table that holds a client age analysis at a certain time. My source data is from Pastel Evolution accounting system.
It has a table-valued function, [_efnAgedPostARBalancesSum], that takes two parameters (a date and a client link) and returns Age1, Age2, etc. for the client link entered. I need to get the ageing for all the clients in the Client table.
I managed to get it working with CROSS APPLY as shown below, but it takes a long time to execute. If I run the age analysis from within Pastel it takes about 20 seconds; in SQL Server it takes about 6 minutes.
The function is encrypted so I cannot see what it does. I am using SQL Server 2008 R2.
Is there a more efficient alternative to cross apply?
SELECT
f.AccountLink,
f.AccountBalance,
f.Age1,
f.Age2,
f.Age3,
f.Age4,
f.Age5,
f.Age6,
f.Age7
FROM
Client
CROSS APPLY [_efnAgedPostARBalancesSum] ('2014-09-30', Client.DCLink) AS f

From the outside it looks like an AR aging bucket function, probably with custom bucket sizes (given the generic Age1, Age2, etc.). These are notoriously compute-intensive. It's the kind of query that often spawns the need for a separate BI database, since an OLTP system is not ideal for analytical queries. It's not only slow to run; it's also likely to be impacting other work in your OLTP system while this function is banging on it.
You can bet it's looking at the due dates on the various documents that carry balances due (very likely several sources). They might not all be indexed on the due-date columns; look for that first. If you run the query in SSMS with the actual execution plan turned on (right-click in the query window and select "Include Actual Execution Plan"), it may suggest one or more indexes to speed up execution. From the plan you can at least discover which tables are being touched and which predicates are involved in gathering the data, and you might get lucky with indexing.
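If the plan's suggestions are hard to pick out, the missing-index DMVs summarize what the optimizer wished it had while the query ran. A quick diagnostic sketch (treat the output as hints, not gospel, and only create indexes you have evaluated):

SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;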
There's no telling how efficiently the function computes the buckets. If it isn't using some kind of window function, it can be pretty horrible. You might find it advantageous to write your own UDF that returns only what you need; since the vendor's function is generic, it may do a lot of extra work to cover all the possible bases, work your organization may not need.
If it is an inline function, you might get some relief by asking only for the columns you really need. It returns (at least) seven buckets, and a lot of AR reporting and analysis needs only three (the 30-, 60-, and 90-day buckets, for example). It might also be worth doing a little pre-analysis to find out which clients you actually need to apply the function to, so you don't have to run it against your whole client domain.
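As a rough sketch of that pre-filtering idea (the Balance test here is hypothetical; substitute whatever tells you a client has open items in your Evolution schema):

-- only age the clients that can actually have a non-zero ageing
WITH ClientsToAge AS
(
    SELECT c.DCLink
    FROM Client AS c
    WHERE c.Balance <> 0               -- hypothetical filter; adjust to your data
)
SELECT f.AccountLink,
       f.AccountBalance,
       f.Age1, f.Age2, f.Age3          -- ask only for the buckets you report on
FROM ClientsToAge AS c
CROSS APPLY [_efnAgedPostARBalancesSum]('2014-09-30', c.DCLink) AS f;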
Just looking at the function name makes me think it's not a documented API per se, and the encryption reinforces that hunch. I'm not sure how badly you really want to depend on such a function; there's no telling how it might get refactored (or removed) going forward.

Related

Concurrent queries in PostgreSQL - what is actually happening?

Let us say we have two users running a query against the same table in PostgreSQL. So,
User 1: SELECT * FROM table WHERE year = '2020' and
User 2: SELECT * FROM table WHERE year = '2019'
Are they going to be executed at the same time as opposed to executing one after the other?
I would expect that if I have 2 processors, I can run both at the same time. But I suspect things become more complicated depending on where the data is located (e.g. on disk), given that it is the same table, and on partitioning, configuration, transactions, etc. Can someone help me understand how I can ensure that I get my desired behaviour as far as PostgreSQL is concerned? Under which circumstances will I get my desired behaviour, and under which will I not?
EDIT: I have found this other question, which is very close to what I am asking: https://dba.stackexchange.com/questions/72325/postgresql-if-i-run-multiple-queries-concurrently-under-what-circumstances-wo. It is a bit old and doesn't have many answers, so I would appreciate a fresh outlook on it.
If the two users have two independent connections and they don't go out of their way to block each other, then the queries will execute at the same time. If they need to access the same buffer at the same time, or read the same disk page into a buffer at the same time, they will use very fast locking/coordination methods (LWLocks, spin locks, or atomic operations like CAS) to coordinate that. The exact techniques vary from version to version, as better methods become widely available on supported platforms and as people find the time to change the implementation to use those better methods.
As for "how I can ensure that I get my desired behaviour as far as PostgreSQL is concerned":
You should always get the correct answer to your query (or possibly some kind of serialization-failure ERROR if you are using the highest, non-default, isolation level, but that doesn't seem to be a risk if each of those queries runs in a single-statement transaction).
I think you are overthinking this. The point of using a database management system is that you don't need to micromanage it.
Also, "parallel-query" refers to a single query using multiple CPUs, not to different queries running at the same time.

ORDER BY is taking too much time when executed for VIEW

I have a relatively complicated setup. I'm using SQL Server 2012 with 3 linked servers, which are IBM DB2 servers. I have several queries which join tables from all three linked servers to fetch data. Due to some specifics of the version I'm using, I can't use some OLAP functions directly, and since an upgrade is not an option, the workaround was to create views and execute those functions against the views. One problem I'm facing right now is that using ORDER BY on the view almost triples the time needed for the view to execute.
When I execute the view with a plain SELECT it takes 24 seconds. (Yes, I know we're talking about ridiculous times here, but I still want to fix the ORDER BY problem, since I'm not allowed to change the queries sent to the DB2 servers and the ORDER BY is on my side.) With ORDER BY it goes from 68 to 80 seconds, depending on which column I'm ordering on. I can't create a schema-bound view because that's not allowed with OPENQUERY, and I've read that ORDER BY isn't allowed when creating a view anyway. I haven't tried that, but since I need the ordering to be available on multiple columns, it's not an option unless I create as many views as I have columns, which sounds kind of ridiculous.
Since my SQL knowledge is fairly basic, I'm not sure what the best choice is here. Even if the raw execution times were acceptable, I don't want my ORDER BY clause to be so time-consuming compared to the time needed for the whole query. When I skip the view and add the ORDER BY to the initial query directly, the original 24 seconds only goes up to 36, which as a percentage is much better than the performance when the same ORDER BY is executed against the view.
So my questions are: what causes the ORDER BY to be executed so slowly from the view, and how can I make it as fast as if it were part of the original query? And if that's just not possible, how can I reduce the time it takes?
Views can end up with different execution plans than the queries that define them. This is, in my opinion, a bit of a shortcoming of views. ORDER BY is a particularly expensive operation, so it makes the difference in execution plans very noticeable.
The alternative I've found for this issue is going the table-valued function route, as it does appear to use the same execution plan as just running the query.
Here's a decent write-up of Table Valued Functions:
http://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
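A minimal sketch of that route (the function, view, and column names here are hypothetical; the point is that the ORDER BY is applied outside the function at call time):

CREATE FUNCTION dbo.fnMyLinkedData()     -- hypothetical name
RETURNS TABLE
AS
RETURN
(
    -- the same SELECT that currently defines the view,
    -- e.g. the joins over the OPENQUERY() calls to the DB2 linked servers
    SELECT col1, col2, col3
    FROM dbo.MyExistingView
);
GO

SELECT col1, col2, col3
FROM dbo.fnMyLinkedData()
ORDER BY col2;                           -- order on whichever column you need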

Query optimization in Delphi 4

In Delphi 4, we have one SELECT query which fetches 3 TEXT-type fields along with the other required fields, using a TQuery component.
There are more than 1,000 records (which might increase in future).
This query consumes a lot of memory, and I think that because of this the next query takes a huge amount of time to execute.
I'm using the BDE to connect to SQL Server.
I need to optimize the performance so that it won't take so much time. Please advise.
You should consider some kind of paging mechanism: do not fetch 1,000 (or 1 million) records to the client, but instead use paging with SQL Server's ROW_NUMBER() to get blocks of, say, 50-100 records per page.
So a query like:
SELECT id, username FROM mytable ORDER BY id
could look like this:
SELECT * FROM (
    SELECT id, username,
           COUNT(*) OVER () AS TOTAL_ROWS,
           ROW_NUMBER() OVER (ORDER BY id) AS ROW_NUM
    FROM mytable
) AS T1
WHERE ROW_NUM BETWEEN 1 AND 50
The ORDER BY field(s) should be Indexed (if possible) to speed things up.
If you use a TQuery, make sure you bind local TField variables outside the retrieval loop for faster processing (the FieldByName method is somewhat slow).
You can try our freeware Open Source classes to access any DB engine.
It provides a direct access to MS SQL via OleDB, without calling the ADO layer.
It is very optimized for speed, and is Unicode ready even on older versions of Delphi. It has been tested on Windows XP, Vista, and Seven (including 64-bit).
It has a TQuery emulator: this is not a true TQuery as defined in the DB.pas unit, but a class with most of the same methods, and you won't need to work with all the BDE classes and units. The drawback is that you can't use the Delphi DB visual controls, but for a quick TQuery replacement it will do the job.
It has some unique features (like late-binding use for field access), which are worth considering.
It does not require any third-party library (like the BDE), and works from Delphi 5 up to XE2. I guess it will run under Delphi 4 also.
You can download and ask for support in our site.
Fetching TEXT column values is really what takes the time and the memory.
To speed up fetching, exclude the TEXT columns from the SELECT list, and fetch them with an additional query, by the record's primary key, only when you really need their values.
To reduce memory usage, do as above, use a unidirectional query, or both.
To reduce the time (depending on the data), you can also use DATALENGTH in the query, like:
WHERE DATALENGTH(TextColumn) <> 0
This skips records that have no value in the TEXT field.
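A rough sketch of that two-query approach (table, column, and parameter names are hypothetical):

-- list query: only the small columns, no TEXT fields
SELECT Id, CustomerName, CreatedOn
FROM MyTable
ORDER BY Id

-- detail query: fetch the large TEXT value for one record,
-- and only when the user actually opens it
SELECT Notes
FROM MyTable
WHERE Id = :SelectedId                   -- :SelectedId is a TQuery parameter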
Which field types did you define? If they are large, they will take up memory, and there's little you can do about it. You may try different libraries: some are smart enough to allocate only the actual size of the field and not the declared one, others will always allocate the declared one, so if you have three fields of 4,000 characters and 1,000 records, you will have 3 * 4000 * 1000 bytes allocated just for the text fields.
Do you need to load the whole dataset at once? Fetching just the needed data using a WHERE condition and/or incremental fetching will help reduce both memory usage and perhaps execution time.
If the query takes long to execute, you have to understand why and where. It could be the query execution time itself, or it could be the time taken to transfer the result set to the client application. You need to profile your query to understand what the issue actually is and take the proper corrective action. 1,000 records is a very small dataset today; if it is slow, something is really wrong.
Each database has subtle optimization differences. You have to study the one you're using carefully and write the proper query for that database, after you have designed the proper database.
Just changing the database components without determining that they are actually the cause is plainly stupid, and if the problem is elsewhere it's just wasted time.
The BDE works well enough, especially compared to ADO. And Microsoft is dropping support for ADO as well, so I won't invest time and money in it.
Update: it would be interesting to know why this answer was downvoted. Just because ADO worshippers will have a hard time in the future and feel the need to hide the truth?
KEEP ON DOWNVOTING MORONS. YOU JUST SHOW YOUR IGNORANCE!

SQL Server 2008: The reasonable stress tests scenario

I am performing stress testing on SQL Server 2008 with JMeter.
I wish to improve a stored procedure that has to serve 20 requests per second.
The procedure takes an xml parameter and returns an xml result.
Should I use only one parameter value or test multiple scenarios?
My main doubts are:
recompilations of the procedure execution plan (this may slow down the procedure)
extraction of data from disk (not all the necessary data may be held in main memory)
Designing a realistic Stress Test/Load Test in SQL Server is an art.
There are many factors that can impact performance:
Hardware: You need to run your tests against the same hardware for which you have defined your target (20 calls per second). This includes disk configuration, redundancy, clustering, and so on. This is not always possible, so you need to make the test environment as close as you can; the more your test environment differs, the more unrealistic the results become. For example, if you use 2 CPUs instead of 4, you cannot simply scale the numbers to compensate.
Data load: in terms of number of the records you need to test, it is ideal to have around 30%-40% more of the maximum rows you expect in the tables.
Data and index distribution: It is a common mistake to load the server with preset or completely random data. Both are wrong. The distribution of the values needs to be realistic. For example, the distribution of marital status is not uniform across all possible values, so you need to design your data generation to reflect this (see the sketch at the end of this answer).
Index fragmentation: this is a tough one. Normally indexes are rebuilt overnight, but during the course of the day, indexes become fragmented so the performance can be very different during those times.
Concurrent load: A server could give you 20 requests per second if that is the only call you are making to the database, but as soon as you start making other calls it all falls to pieces. The load needs to include the other related parts of the system.
Operation load: There is absolutely no point in making 20 calls per second if the requests are all the same. You need to use data-generation techniques to make the requests realistic, not purely random.
If you are using C#, I built this tool a while back, which might help you with creating realistic random data.
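To illustrate the distribution point above, here is a small T-SQL sketch (the marital-status values and the 55/30/10/5 split are invented; substitute a distribution measured from your real data):

-- generate skewed, rather than uniform, test values
SELECT CASE
           WHEN r < 55 THEN 'Married'    -- roughly 55%
           WHEN r < 85 THEN 'Single'     -- roughly 30%
           WHEN r < 95 THEN 'Divorced'   -- roughly 10%
           ELSE 'Widowed'                -- roughly 5%
       END AS MaritalStatus
FROM (
    SELECT ABS(CHECKSUM(NEWID())) % 100 AS r
    FROM sys.all_objects                 -- just a convenient row source
) AS t;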

SQL Server performance and fully qualified table names

It seems to be fairly accepted that including the schema owner in the query increases db performance, e.g.:
SELECT x FROM [dbo].Foo vs SELECT x FROM Foo.
This is supposed to save a lookup, because SQL Server will otherwise look for a Foo table belonging to the user in the connection context.
Today I was told that always including the database name improves the performance the same way, even if you are querying the database you selected in your connection string:
SELECT x FROM MyDatabase.[dbo].Foo
Is there any truth to this? Does this make sense as a coding standard? Does any of this (even the first example) translate to measurable benefits?
Are we talking about a few cycles for an extra dictionary lookup on the database server vs more bloated SQL and extra concatenation on the web server (or other client)?
One thing to keep in mind is that this is a compile-time binding, not an execution-time one. So if you execute the same query a million times, only the first execution will 'hit' the lookup time; the rest will reuse the same plan, and plans are pre-bound (names are already resolved to object IDs). A small sketch at the end of this answer shows how to watch that reuse.
In this case I would personally prefer readability over the tiny performance increase that this could possibly cause, if any.
SELECT * FROM Foo
Seems a lot easier to scan than:
SELECT * FROM MyDatabase.[dbo].Foo
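If you want to watch that reuse happen, the plan cache exposes a use count for each cached statement. A diagnostic sketch (the LIKE filter is just an example; adjust it to match your query text):

SELECT st.text, cp.objtype, cp.usecounts
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%FROM Foo%';

The compile (and the name resolution with it) happens once; after that, usecounts keeps climbing while the same plan is reused.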
Try it out? Just loop through a million queries of both and see which one finishes first.
I'm guessing it's a load of bunk though. The developers of MS SQL spent millions of hours researching efficient search algorithms and storage methods, only to be thwarted by users not specifying fully qualified table names? Laughable.
SQL Server will not do an extra lookup if the default schema is the same. The schema should be included if it isn't the same and the query is used a lot.
The database name will not benefit query performance. I think this could be verified with the Estimated Execution Plan in Management Studio.
As Spencer said, try it; just make sure you clear the cache each time, as otherwise plan reuse will interfere with your results.
http://www.devx.com/tips/Tip/14401
I also would be surprised if it made any appreciable difference.
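A rough way to test it yourself (a sketch only; MyDatabase and Foo are the names from the question, and never run DBCC FREEPROCCACHE against a production server):

DBCC FREEPROCCACHE                       -- clears cached plans; test server only!
SET STATISTICS TIME ON
GO
SELECT x FROM Foo                        -- unqualified
GO
SELECT x FROM dbo.Foo                    -- schema-qualified
GO
SELECT x FROM MyDatabase.dbo.Foo         -- fully qualified
GO
-- any difference shows up in the "SQL Server parse and compile time" messages,
-- not in the execution time of a plan that is being reused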

Resources