Query optimization in Delphi 4 (SQL Server)

In Delphi 4, we have one SELECT query which fetches 3 TEXT-type fields along with other required fields at a time, using a TQuery component.
There are more than 1,000 records (which might increase in the future).
This query consumes a lot of memory, and I think that because of this the next query takes a huge amount of time to execute.
I'm using the BDE to connect to SQL Server.
I need to optimize the performance so that it won't take so much time. Please advise.

You should consider some kind of paging mechanism. Do not fetch 1,000 (or 1 million) records to the client; instead, use paging with SQL Server's ROW_NUMBER() to get blocks of, say, 50-100 records per page.
So a query like:
SELECT id, username FROM mytable ORDER BY id
could look like this:
SELECT * FROM (
    SELECT id, username, TOTAL_ROWS = COUNT(*) OVER(), ROW_NUMBER() OVER(ORDER BY id) AS ROW_NUM
    FROM mytable
) T1
WHERE ROW_NUM BETWEEN 1 AND 50
The ORDER BY field(s) should be indexed, if possible, to speed things up.
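To fetch subsequent pages, parameterize the window bounds. A minimal sketch (ROW_NUMBER() requires SQL Server 2005 or later; the @Page and @PageSize names are just for illustration):

DECLARE @Page INT, @PageSize INT
SET @Page = 2
SET @PageSize = 50

SELECT * FROM (
    SELECT id, username, TOTAL_ROWS = COUNT(*) OVER(), ROW_NUMBER() OVER(ORDER BY id) AS ROW_NUM
    FROM mytable
) T1
WHERE ROW_NUM BETWEEN (@Page - 1) * @PageSize + 1 AND @Page * @PageSize

From Delphi you would pass the two bounds as TQuery parameters instead of declaring variables.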

If you use a TQuery, make sure that you assign the fields to local TField variables outside of the retrieval loop for faster processing (the FieldByName method is somewhat slow).
You can try our freeware Open Source classes to access any DB engine.
It provides direct access to MS SQL via OleDB, without calling the ADO layer.
It is very optimized for speed, and is Unicode-ready, even on older versions of Delphi. It has been tested on Windows XP, Vista, and Seven (including 64-bit).
It has a TQuery emulator: this is not a true TQuery as defined in the DB.pas unit, but a class with most of the same methods, and you won't need to work with all the BDE classes and units. The drawback is that you can't use the Delphi DB visual controls, but for a quick TQuery replacement, it will do the work.
It has some unique features (like late-binding use for field access) which are worth considering.
It does not require any third-party library (like the BDE), and works from Delphi 5 up to XE2. I guess it will run under Delphi 4 also.
You can download it and ask for support on our site.

Fetching TEXT column values is really what takes the time and the memory.
To speed up fetching, exclude the TEXT columns from the SELECT list, and fetch them with an additional query, by record primary key, only when you really need their values.
To reduce memory usage, do as above, use a unidirectional query, or both.
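A minimal sketch of that two-query approach (the table and column names are placeholders):

-- list query: everything except the TEXT columns
SELECT id, name, created_at
FROM mytable

-- detail query: fetch the TEXT columns for one record, on demand
SELECT text_field1, text_field2, text_field3
FROM mytable
WHERE id = :id

The :id parameter would be bound via the TQuery's Params when the user actually opens that record.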

To reduce the time (depending on the data), we can use DATALENGTH in the query, with a condition like:
DATALENGTH(TextColumn) <> 0
Used in the WHERE clause, this will skip records having no value in the TEXT field.
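As a complete example of that filter (the table and column names are placeholders):

SELECT id, name, text_field1
FROM mytable
WHERE DATALENGTH(text_field1) <> 0

Note that DATALENGTH returns NULL for NULL input, so this condition also filters out rows where the column is NULL, not just rows where it is empty.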

Which field types did you define? If they are large they will take up memory; there's little you can do about it. You may try different libraries: some are smart enough to allocate only the actual size of the field and not the declared one, while others will always allocate the declared one, so if you have three fields of 4,000 characters and 1,000 records, you will have 3 * 4000 * 1000 bytes (about 12 MB) allocated just for the text fields.
Do you need to load the whole dataset at once? Fetching just the needed data using a WHERE condition and/or incremental fetching will help to reduce both memory usage and maybe execution time.
If the query takes long to execute, you have to understand why and where: it could be the query execution time itself, or it could be the time taken to transfer the result set to the client application. You need to profile your query to understand what the actual issue is, and take the proper corrective action. 1,000 records is a very small dataset today; if it is slow, something is really wrong.
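One simple way to split server time from transfer time, if you can run the query from Query Analyzer or SSMS (a generic sketch, using a placeholder query):

SET STATISTICS TIME ON
SET STATISTICS IO ON

SELECT id, name, text_field1, text_field2, text_field3
FROM mytable  -- substitute your actual SELECT here

The TIME output separates parse/compile time from execution time, and the IO output shows logical reads per table; if both are small but the application is still slow, the time is going into transferring or processing the result set on the client side.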
Each database has subtle optimization differences. You have to study the one you're using carefully, and write the proper query for that database, after you have designed the proper database.
Just changing the database components without first determining that they are exactly the cause is plainly stupid, and if the problem is elsewhere it is just wasted time.
BDE works well enough, especially compared to ADO. And Microsoft is desupporting ADO as well, therefore I won't invest time and money in it.
Update: it would be interesting to know why this answer was downvoted. Just because ADO worshippers will have a hard time in the future, and they feel the need to hide the truth?

Related

Better way to join table valued function

I am trying to generate a table that holds a client age analysis at a certain time. My source data is from the Pastel Evolution accounting system.
They have a table-valued function, [_efnAgedPostARBalancesSum], that takes 2 parameters (date and client link) and returns Age1, Age2, etc. for the given client link. I need to get the ageing for all the clients in the client table.
I managed to get it working by using CROSS APPLY as per below, but it takes a long time to execute. If I run the age analysis from within Pastel it takes about 20 seconds; in SQL it takes about 6 minutes.
The function is encrypted so I cannot see what it does. I am using SQL Server 2008 R2.
Is there a more efficient alternative to cross apply?
SELECT
f.AccountLink,
f.AccountBalance,
f.Age1,
f.Age2,
f.Age3,
f.Age4,
f.Age5,
f.Age6,
f.Age7
FROM
Client
CROSS APPLY [_efnAgedPostARBalancesSum] ('2014-09-30', Client.DCLink) AS f
It looks like an AR aging bucket function from the outside, and probably has custom bucket sizes (given the generic Age1, Age2, etc.). They're notoriously compute-intensive. It's the kind of query that often spawns the need for a separate BI database, as an OLTP system is not ideal for analytical queries. It's not only slow to run; it's also likely to be impacting other work in your OLTP system while this function is banging on it.
You can bet it's looking at the due dates from the various documents that contain balances due (very likely several sources). They might not all be indexed on the due-date columns. Look for that first. If you run the query in SSMS with the actual execution plan shown, it may suggest one or more indexes to speed the execution - right-click in the query window and select "Include Actual Execution Plan". From this, you can at least discover the tables that are being touched and the predicates involved in gathering the data... and you might get lucky with indexing.
There's no telling how efficiently the function computes the buckets. If they're not using some kind of window functions, it can be kind of horrible. You might find it advantageous to write your own UDF to get only what you want. Since it's generic, it may have a lot more work to do to cover all the possible bases - something your organization may not need.
If it is an inline function, you might get some relief by asking only for the columns you really need to look at. They're returning (at least) 7 buckets, and a lot of AR reporting and analysis needs only 3 (the 30-, 60-, and 90-day buckets, for example). It might also be worth doing a little pre-analysis to find out which clients you need to apply the function to, to keep from having to run it against your whole client domain.
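A sketch combining both ideas (the On_Hold pre-filter is a hypothetical example; substitute whatever criterion identifies the clients you actually report on):

SELECT
  f.AccountLink,
  f.Age1,
  f.Age2,
  f.Age3
FROM
  Client
  CROSS APPLY [_efnAgedPostARBalancesSum] ('2014-09-30', Client.DCLink) AS f
WHERE
  Client.On_Hold = 0  -- hypothetical filter: only the clients you need aged

If the function is inline (not multi-statement), the optimizer can sometimes avoid computing columns you don't select; if it's multi-statement, reducing columns won't help, but reducing the client set still will.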
Just looking at the function name makes me think it's not a documented API per se. Encryption reinforces this hunch. Not sure how badly you really want to rely on such a function; there's no telling how it might get refactored (or removed) going forward.

ORDER BY is taking too much time when executed for VIEW

I have a relatively complicated setup. I'm using SQL Server 2012 with 3 linked servers, which are IBM DB2 servers. I have several queries which join tables from all three linked servers to fetch data. Due to some specifics of the version that I'm using, I can't use some OLAP functions directly, and since an upgrade is not an option, the workaround was to create views and execute those functions on the views. One problem that I'm facing right now is that using ORDER BY on the view almost triples the time needed for the view to execute.
When I execute the view with only a SELECT it takes 24 seconds (yeah, I know we're talking about ridiculous times here, but still, I want to fix the problem with the ORDER BY, since I'm not allowed to change the queries to the DB2 servers but the ORDER BY is on my side). When I use ORDER BY it goes from 68 to 80 seconds, depending on which column I'm ordering by. I can't create a schema-bound view because that's not allowed with OPENQUERY. I've read that anyway it's not allowed to use ORDER BY when creating a view; I haven't tried that, but since I need the ORDER BY to be available on multiple columns, it's not an option unless I create as many views as I have columns, which sounds kind of ridiculous but... dunno.
Since I have trivial knowledge about SQL in general, I'm not sure what the best choice is here. Even if the execution times are fixed, I don't want my ORDER BY clause to be so time-consuming compared to the time needed for the whole query. I'd like to make it as fast as it is when I execute it directly in the query: when I don't use the view and I add the ORDER BY to the initial query, the original time is 24 seconds and then it goes up to 36, which in percentage terms is still much better than the performance when the same ORDER BY is executed against the view.
So my questions are: what causes the ORDER BY to be executed so slowly from the view, and how can I make it as fast as if it were part of the original query? Also, if this is just not possible, how can I reduce the huge amount of time it takes?
Views can end up with different execution plans than the queries that make them up. This is, in my opinion, a bit of a shortcoming of views. ORDER BY is a particularly expensive operation, so it makes the difference in execution plans very noticeable.
The alternative I've found for this issue was going the table-valued function route, as a TVF does appear to use the same execution plan as just running the query.
Here's a decent write-up of table-valued functions:
http://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
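A minimal sketch of that approach; the linked-server name and the view body below are hypothetical stand-ins for your actual definition:

-- instead of: CREATE VIEW dbo.MyLinkedView AS SELECT ... FROM OPENQUERY(...)
CREATE FUNCTION dbo.fnMyLinkedData()
RETURNS TABLE
AS
RETURN
(
  SELECT col1, col2, col3
  FROM OPENQUERY(DB2LINK, 'SELECT col1, col2, col3 FROM remote.table')  -- your existing view body
);

-- callers can then order on whatever column they need:
SELECT col1, col2, col3
FROM dbo.fnMyLinkedData()
ORDER BY col2;

Because it's an inline function, the body is expanded into the calling query, so the ORDER BY is planned together with the rest of the statement.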

Database storage requirements and management for lots of numerical data

I'm trying to figure out how to manage and serve a lot of numerical data, and I'm not sure an SQL database is the right approach. The scenario is as follows:
10000 sets of time series data collected per hour
5 floating point values per set
About 5000 hours worth of data collected
So that gives me about 250 million values in total. I need to query this set of data by set ID and by time, and if possible also filter by one or two of the values themselves. I'm also continuously adding to this data.
This seems like a lot of data. Assuming 4 bytes per value, that's about 1 GB of raw data. I don't know what a general "overhead multiplier" for an SQL database is; let's say it's 2, then that's 2 GB of disk space.
What are good approaches to handling this data? Some options I can see:
Single PostgreSQL table with indices on ID, time
Single SQLite table -- this seemed to be unbearably slow
One SQLite file per set -- lots of .sqlite files in this case
Something like MongoDB? Don't even know how this would work ...
Appreciate commentary from those who have done this before.
Mongo is a document store; it might work for your data, but I don't have much experience with it.
I can tell you that PostgreSQL will be a good choice. It will be able to handle that kind of data. SQLite is definitely not optimized for those use cases.
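For the PostgreSQL route, a minimal schema sketch (the names are illustrative):

-- one row per set per timestamp, five values per row
CREATE TABLE samples (
  set_id  integer     NOT NULL,
  ts      timestamptz NOT NULL,
  v1 real, v2 real, v3 real, v4 real, v5 real,
  PRIMARY KEY (set_id, ts)
);

-- covers time-only range queries
CREATE INDEX samples_ts_idx ON samples (ts);

-- typical query: one set over a time window, filtered on a value
SELECT ts, v1, v2
FROM samples
WHERE set_id = 42
  AND ts BETWEEN '2014-01-01' AND '2014-01-02'
  AND v1 > 0.5;

The composite primary key (set_id, ts) serves the main access pattern, and at 10,000 sets times 5,000 hours this is roughly 50 million rows, which is well within PostgreSQL's comfort zone on modest hardware.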

What is SQL Server doing between the time my first record is returned and when my last record is returned?

Say I have a query that returns 10,000 records. When the first record has been returned, what can I assume about the state of my query?
Has it finished, and is it just returning records from the server to my instance of SSMS?
Is the query itself still being executed on the server?
What is it that causes the 10,000 records to be returned slowly for one query and nearly instantly for another?
There is potentially some mix of progressive processing on the server side, network transfer of the data, and rendering by the client.
If one query returns 10,000 rows quickly and another slowly, and they are of similar row size, data types, etc., and are both destined for results-to-grid or results-to-text, there is little we can do to analyze the differences unless you show us execution plans and/or client statistics for each one. These are options you can set in SSMS when running a query.
As an aside, when switching between results-to-grid and results-to-text you might notice slightly different runtimes. This is because in one case Management Studio has to work harder to align the columns, etc.
You cannot make a generic assumption; a query's plan is composed of a number of different types of operations, or iterators. Some of these are navigational and work like a pipeline, whilst others are set-based operations, such as a sort.
If a query contains a set-based operation, it requires all the records before it can output the results (e.g. an ORDER BY clause within your statement), but if you have no set-based iterators, you can expect the rows to be streamed to you as they become available.
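A pair of illustrative queries against a hypothetical table (assuming no index on created_at):

-- streaming: rows can be sent as the scan produces them,
-- so the first row arrives almost immediately
SELECT id, name FROM mytable

-- blocking: the Sort iterator must consume every row before
-- emitting the first one, so there is a pause up front
SELECT id, name FROM mytable ORDER BY created_at

With an index on created_at (and a plan that uses it), the second query can stream as well, because the rows already come back in order.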
The answer to each of your individual questions is "it depends."
For example, consider what happens if you include an ORDER BY clause and there isn't an index for the column(s) you're ordering by. In this case, the server has to find all the records that satisfy your query, then sort them, before it can return the first record. This causes a long pause before you get your first record, but you should normally get the rest quite quickly once you start receiving any.
Without the ORDER BY clause, the server will normally send each record as it's found, so the first record will often show up sooner, but you may see a long pause between one record and the next.
As far as simply "why is one query faster than another" goes, a lot depends on what indexes are available and whether they can be used for a particular query. For example, something like some_column LIKE '%something' will almost always be quite slow: the leading '%' means this won't be able to use an index, even if some_column has one. A search for 'something%' instead of '%something' might easily be 100 or 1,000 times faster. If you really need the former, you want to use full-text searching instead (create a full-text index, and use CONTAINS() instead of LIKE).
Of course, a lot can also depend simply on whether the database has an index for a particular column (or group of columns). With a suitable index, the query will usually be quite a lot faster.
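A sketch of the full-text alternative (assuming a unique key index already exists on the table; the names are illustrative):

-- one-time setup
CREATE FULLTEXT CATALOG ft_catalog;
CREATE FULLTEXT INDEX ON mytable(some_column)
  KEY INDEX PK_mytable ON ft_catalog;

-- slow: the leading wildcard defeats any index on some_column
SELECT id FROM mytable WHERE some_column LIKE '%something%';

-- fast: uses the full-text index
SELECT id FROM mytable WHERE CONTAINS(some_column, 'something');

Note that CONTAINS matches on word boundaries rather than arbitrary substrings, so it is a replacement for most, but not all, '%something%' searches.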

SQL Server performance and fully qualified table names

It seems to be fairly accepted that including the schema owner in the query increases db performance, e.g.:
SELECT x FROM [dbo].Foo vs SELECT x FROM Foo.
This is supposed to save a lookup, because SQL Server will otherwise look for a Foo table belonging to the user in the connection context.
Today I was told that always including the database name improves the performance the same way, even if you are querying the database you selected in your connection string:
SELECT x FROM MyDatabase.[dbo].Foo
Is there any truth to this? Does this make sense as a coding standard? Does any of this (even the first example) translate to measurable benefits?
Are we talking about a few cycles for an extra dictionary lookup on the database server vs more bloated SQL and extra concatenation on the web server (or other client)?
One thing to keep in mind is that this is a compilation-time binding, not an execution-time one. So if you execute the same query 1 million times, only the first execution will 'hit' the lookup time; the rest will reuse the same plan, and plans are pre-bound (names are already resolved to object IDs).
In this case I would personally prefer readability over the tiny performance increase that this could possibly bring, if any.
SELECT * FROM Foo
Seems a lot easier to scan than:
SELECT * FROM MyDatabase.[dbo].Foo
Try it out? Just loop through a million queries of both and see which one finishes first.
I'm guessing it's a load of bunk, though. The developers of MS SQL spend millions of hours researching the efficiency of search algorithms and storage methods, only to be thwarted by users not specifying fully qualified table names? Laughable.
SQL Server will not make an extra lookup if the default schema is the same. The schema should be included if it's not the default and the query is used a lot.
The database name will not benefit query performance. I think this could be verified with the Estimated Execution Plan in Management Studio.
As Spencer said, try it; of course, make sure you clear the cache each time, as caching will interfere with your results.
http://www.devx.com/tips/Tip/14401
I also would be surprised if it made any appreciable difference.
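If you do want to measure it, a rough sketch (clearing the plan cache is fine on a test box, but never do it in production):

DBCC FREEPROCCACHE;  -- wipe cached plans so each run pays the binding cost
SET STATISTICS TIME ON;

SELECT x FROM Foo;
GO
SELECT x FROM dbo.Foo;
GO
SELECT x FROM MyDatabase.dbo.Foo;
GO

Compare the parse and compile times in the output; the execution times should be essentially identical, since name resolution happens only at compile time.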
