JPA 2 - How to cache/optimize count queries? - query-optimization

I have a Java EE application with a list of groups. I want to display the number of items in each group, so I run:
SELECT COUNT(p) FROM Product p WHERE LOWER(p.group) = LOWER(:group)
Group is a plain String.
The problem is that with 50+ groups the page takes about 10 s to load, which is unacceptable. When I remove the counting, the load time drops to ~1 s. How can I optimize or cache this?
My first idea was to use an EJB singleton to keep the counts locally and refresh them with an EJB timer.
I'm using GlassFish v3, EclipseLink and an Oracle 10g database on the same machine.
I'm using the EclipseLink cache, but it does not seem to cache count queries.
<property name="eclipselink.cache.shared.default" value="false"/>
<property name="eclipselink.cache.size.default" value="500"/>

How about selecting all groups and associated counts in one query? This is not a JPA answer, but a plain SQL one:
SELECT LOWER(p.group), COUNT(*)
FROM Product p
GROUP BY LOWER(p.group)
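If slightly stale counts are acceptable, the same aggregate can also be cached on the database side with a materialized view that Oracle refreshes on a schedule, which mirrors the refresh-with-a-timer idea from the question. A minimal sketch, assuming the mapped table and column are named product and group_name (both are assumptions):
CREATE MATERIALIZED VIEW product_group_counts
  REFRESH COMPLETE
  START WITH SYSDATE NEXT SYSDATE + 5/1440  -- complete refresh every 5 minutes
AS
SELECT LOWER(group_name) AS group_name, COUNT(*) AS product_count
FROM product
GROUP BY LOWER(group_name);
The application would then read the pre-computed counts from product_group_counts instead of running 50+ COUNT queries per page.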

A long-running query is usually a sign of a missing index in the database. Try adding an index on product(lower(group)).
Also, it's hard to say without knowing the exact use case, but I find it strange to count the products of a group by matching a group label in the product table. Shouldn't you have a Group entity and reference the group's primary key in the product table? That would eliminate the need for an index on lower(group) (but would need an index on product.group_id).
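In Oracle, an index on an expression is a function-based index. A sketch, assuming the mapped column is called group_name (group itself is a reserved word, so the real column name is likely different):
CREATE INDEX product_group_lower_idx ON product (LOWER(group_name));
With this in place, WHERE LOWER(group_name) = :value can be resolved with an index lookup instead of a full scan.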

If you are passing in :group as a parameter anyway, I would move the lower-casing into the Java code.
SELECT COUNT(p) FROM Product p WHERE LOWER(p.group) = :groupInLower
The SQL interpreter may optimize this away anyway, but it's worth a try...

Related

KeyLookup in massive columns in sql server

I have one simple query that selects many columns (more than 1000).
When I run it with a single column it returns in 2 seconds, with a proper index seek; logical reads, CPU and everything else are under thresholds.
But when I select more than 1000 columns it takes 11 minutes and the plan shows a key lookup.
Have you folks faced this type of issue?
Any suggestions?
Normally I would suggest adding those columns to the INCLUDE list of your non-clustered index; including them removes the key lookup from the execution plan. But as with everything in SQL Server, it depends on how the table is used: if you update the table heavily rather than just SELECTing from it, the lookup might be acceptable.
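A sketch of what that looks like, with made-up table, index and column names; in practice you would INCLUDE only the columns the query actually selects, which is rarely practical for 1000+ columns:
CREATE NONCLUSTERED INDEX ix_orders_status_covering
ON dbo.Orders (Status)
INCLUDE (CustomerId, OrderDate, TotalAmount);
The key column stays in the index key; the INCLUDE columns are stored only at the leaf level, so the query can be answered from the index without the key lookup.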
If this query is run once per year, the overhead of an additional index is probably not worth it. If you need a quick response that single time of year, look into 'pre-executing' it and just presenting the result to the user.
The difference in your query plan might be due to join elimination (if your query joins multiple tables), or simply that the additional columns you are requesting do not exist in your existing indexes...

Add DATE column to store when last read

We want to know which rows in a certain table are used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive. (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using an old version of Microsoft SQL Server.
This kind of logging/tracking is classically the application server's job. If you want your own tracking architecture, implement it in your own layer.
In any case you will need the application server involved: you are not going to update a tracking field in the same transaction as the SELECT, are you? What about rollbacks? So you need some manager that first runs the SELECT and then writes the tracking information. And what is the point of saving the tracking information together with the entity data by sending it back to the DB? Save it to a file on the application server.
You could update the column in the table as you suggested, but if it were me I'd log the event to another table: the id of the record, datetime, user id (maybe IP address, browser version, etc.), just about anything else I could capture that was even possibly relevant. (For example, six months from now your manager decides not only does s/he want to know which records were used the most, s/he wants to know which users are using the most records, or what time of day that usage occurs, etc.)
This type of information can be useful for things you've never even thought of down the road, and if the log starts to grow large you can always roll it up and prune it to a smaller table if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never regret having it available down the road, and it is impossible to re-create historically.
To make sure the application doesn't slow down, you may want to SELECT the data from within a stored procedure that also issues the logging command, so the client is not doing two round trips (one for the select, one for the update/insert).
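A minimal sketch of that stored-procedure shape, with invented table and column names:
CREATE PROCEDURE dbo.GetRecordAndLog
    @RecordId INT,
    @UserId   INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Record the access in a separate log table.
    INSERT INTO dbo.RecordAccessLog (RecordId, UserId, AccessedAt)
    VALUES (@RecordId, @UserId, GETDATE());

    -- Return the row in the same round trip.
    SELECT Id, Name, CreatedAt
    FROM dbo.Records
    WHERE Id = @RecordId;
END;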
Alternatively, if this is a web application, you could use an async AJAX call to issue the logging action, which wouldn't slow down the user's experience at all.
Adding a new column to track SELECTs is not good practice, because it can hurt database performance, and performance is one of the major concerns in database server administration.
Instead, you can use a very good database feature called auditing; it is easy to set up and puts less stress on the database.
For more information, search for database auditing for SELECT statements.
Use another table as a key/value pair with two columns (e.g. id_selected, times) to store the ids of the records you select in your standard table, and increment the times value by 1 every time a record is selected.
To do this you'd do a mass insert/update of the selected ids from your select query into the counting table. E.g., as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1 = 'somevalue';

INSERT INTO countTable (id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1 = 'somevalue'  # or just build a list of ids from your last result
ON DUPLICATE KEY UPDATE times = times + 1;
The ON DUPLICATE KEY syntax is off the top of my head and is MySQL-specific. For a conditional insert-or-update in MSSQL you would need to use MERGE instead.
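A hedged sketch of the SQL Server version (MERGE requires SQL Server 2008 or later, so it may not be available on an older instance), reusing the table and column names from the example above:
MERGE countTable AS target
USING (SELECT id FROM myTable WHERE stuff1 = 'somevalue') AS src
    ON target.id_selected = src.id
WHEN MATCHED THEN
    UPDATE SET times = times + 1
WHEN NOT MATCHED THEN
    INSERT (id_selected, times) VALUES (src.id, 1);
On versions without MERGE, the same effect needs an UPDATE followed by an INSERT of the ids that were not updated.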

Cognos Report Studio: Cascading Prompts populating really slow

I have a Cognos report with cascading prompts. The hierarchy is defined in the attached image.
The first parent (Division) fills the two cascading children in 3-5 seconds.
But when I select any Policy (which populates the two children beneath it), it takes around 2 minutes.
Facts:
The result set after those two minutes is normal (~20 rows).
The queries behind all the prompts are simple SELECT DISTINCT col_name.
I've created indexes on all the prompt columns.
I've tried turning on the local cache and setting the execution method to concurrent.
I'm on Cognos Report Studio 10.1
Any help would be much appreciated.
Thanks,
Nuh
There is an alternative to a one-off dimension table. Create a Query Subject in Framework Manager for your AL-No prompt. In it, build a query that gets the distinct AL-No values (you said that is fast, probably because there is an index on AL-No). Wrap that in a select that filters on '#prompt('pPolicy')#' (assuming your Policy prompt is keyed to ?pPolicy?).
This forces the Policy into the SQL before it is sent to the database, while wrapping around the distinct AL-No still allows the database to use the AL-No index.
select AL_NO from
(
select AL_NO, Policy_NO
from CLAIMS
group by AL_NO, Policy_NO
)
where Policy_NO = #prompt('pPolicyNo')#
Your issue is just too much table scanning. Typically one would build a prompt page from dimension tables, not the fact table, though I admit that is not always possible with cascading prompts. The ideal solution is to create a one-off dimension table with these distinct values, then model it strictly for the prompts.
Watch out for indexing each field individually, as those indexes will not be used due to the selectivity of the values; a compound index on the fields may work instead. As with any DDL change, open SQL Profiler to see what SQL Cognos generates, and run an explain plan before and after the change.
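For instance, a compound index covering both prompt columns (names taken from the query above; the right leading column depends on how the prompts filter):
CREATE INDEX ix_claims_policy_alno ON CLAIMS (Policy_NO, AL_NO);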

MSSQL/Oracle Query Tuning 500,000+ records Coldfusion - does lower() reduce performance

I'm not trying to start a debate on which is better in general, I'm asking specifically about this case. :)
I need to write a query to pull back a list of user ids (uid) from a database table containing 500k+ records. I'm returning just the one field, uid. I can query either our Oracle box or our MSSQL 2000 box. The query looks like this (it has not been simplified):
select uid
from employeeRec
where uid = 'abc123'
Yes, it really is that simple a query. Where I need the tuning help: uid is indexed, but some uids (not many, but some) could be 'ABC123' rather than 'abc123'. MSSQL doesn't care about case sensitivity, whereas Oracle does. So for Oracle, my query would look like this:
select uid
from employeeRec
where lower(uid) = 'abc123'
I've learned that if you use lower() on an indexed field in MSSQL, you render the index useless (there are ways around it, but that is beyond the scope of my question here, since if I choose MSSQL I don't need to use lower() at all). I wanted to know: if I choose Oracle and use the lower() function, will that also hurt the performance of the query?
I'm looping over this query about 200 times, in addition to some other queries, and the whole loop takes about 1 second per iteration; I've narrowed the slowness down to this particular query. For a web page, 200 seconds seems like an eternity. For the CF readers: the timeout value has been increased so the page doesn't error out; there are no page errors, I'm just trying to speed up this query.
Another item to note: This database is in a different city than the other queries being run so I do expect some lag time there.
As TomTom put it, your index will simply not be used by Oracle. But you can create a function-based index, and that new index will be used when you issue your query:
create index my_new_ix on employeeRec(lower(uid));
Wrapping an indexed column in a function call would have the potential to cause performance problems in Oracle. Oracle couldn't use a plain index on UID to process your query. On the other hand, you could create a function-based index on lower(uid) that would be used by the query, i.e.
CREATE INDEX case_insensitive_idx
ON employeeRec( lower( uid ) );
Note that if you want to do case-insensitive queries in general, you may be better served setting NLS parameters to force case-insensitivity. You'd still need function-based indexes on the columns you're searching on, but it can simplify your queries a bit.
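A sketch of that NLS route on the Oracle side; these are session-level settings, so check the impact on other queries before relying on them:
ALTER SESSION SET NLS_COMP = LINGUISTIC;
ALTER SESSION SET NLS_SORT = BINARY_CI;

-- A matching linguistic index so the case-insensitive comparison can still use an index:
CREATE INDEX employeeRec_uid_ci ON employeeRec (NLSSORT(uid, 'NLS_SORT=BINARY_CI'));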
"I wanted to know if I choose Oracle, and use the lower() function, will that also hurt performance of the query?"
Yes. The performance hit comes from the fact that the index is on the original value and the collation is case-sensitive, so every possible value must be run through the function to filter out the matching ones.

Have you ever encountered a query that SQL Server could not execute because it referenced too many tables?

Have you ever seen any of these error messages?
-- SQL Server 2000
Could not allocate ancillary table for view or function resolution.
The maximum number of tables in a query (256) was exceeded.
-- SQL Server 2005
Too many table names in the query. The maximum allowable is 256.
If yes, what have you done?
Given up? Convinced the customer to simplify their demands? Denormalized the database?
To everyone asking me to post the query:
I'm not sure I can paste 70 kilobytes of code into the answer editing window.
Even if I could, it wouldn't help, since those 70 kilobytes of code reference 20 or 30 views that I would also have to post; otherwise the code would be meaningless.
I don't want to sound like I'm boasting here, but the problem is not in the queries. The queries are optimal (or at least almost optimal). I have spent countless hours optimizing them, looking for every single column and every single table that can be removed. Imagine a report that has 200 or 300 columns that has to be filled with a single SELECT statement (because that's how it was designed a few years ago, when it was still a small report).
For SQL Server 2005, I'd recommend using table variables and partially building the data as you go.
To do this, create a table variable that represents your final result set you want to send to the user.
Then find your primary table (say, the orders table in your example above) and pull that data, plus a bit of supplementary data that is only one join away (customer name, product name). You can INSERT ... SELECT this straight into your table variable (SELECT INTO only works with temp tables, not table variables).
From there, iterate through the table and for each row, do a bunch of small SELECT queries that retrieves all the supplemental data you need for your result set. Insert these into each column as you go.
Once complete, you can then do a simple SELECT * from your table variable and return this result set to the user.
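A minimal sketch of the approach, with invented table and column names; the supplemental step is shown here as a set-based UPDATE rather than a row-by-row loop:
-- Final result shape to send to the user.
DECLARE @Report TABLE (
    OrderId      INT PRIMARY KEY,
    CustomerName NVARCHAR(100),
    ProductName  NVARCHAR(100),
    LastShipped  DATETIME NULL
);

-- Step 1: seed the table variable from the primary table plus the cheap one-join lookups.
INSERT INTO @Report (OrderId, CustomerName, ProductName)
SELECT o.OrderId, c.Name, p.Name
FROM dbo.Orders o
JOIN dbo.Customers c ON c.CustomerId = o.CustomerId
JOIN dbo.Products  p ON p.ProductId  = o.ProductId;

-- Step 2: fill in the remaining columns with small, targeted queries.
UPDATE r
SET LastShipped = (SELECT MAX(s.ShippedAt)
                   FROM dbo.Shipments s
                   WHERE s.OrderId = r.OrderId)
FROM @Report AS r;

-- Step 3: return the assembled result set.
SELECT * FROM @Report;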
I don't have any hard numbers for this, but there have been three distinct instances that I have worked on to date where doing these smaller queries has actually worked faster than doing one massive select query with a bunch of joins.
#chopeen You could change the way you're calculating these statistics, and instead keep a separate table of all per-product stats.. when an order is placed, loop through the products and update the appropriate records in the stats table. This would shift a lot of the calculation load to the checkout page rather than running everything in one huge query when running a report. Of course there are some stats that aren't going to work as well this way, e.g. tracking customers' next purchases after purchasing a particular product.
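A rough sketch of that incremental approach, with hypothetical table and column names; the procedure would be called once per order at checkout:
CREATE PROCEDURE dbo.UpdateProductStatsForOrder
    @OrderId INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Bump the per-product counters for every product on the order.
    UPDATE s
    SET s.UnitsSold   = s.UnitsSold + oi.Quantity,
        s.LastOrdered = GETDATE()
    FROM dbo.ProductStats AS s
    JOIN dbo.OrderItems   AS oi ON oi.ProductId = s.ProductId
    WHERE oi.OrderId = @OrderId;
END;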
This would happen all the time when writing Reporting Services reports for Dynamics CRM installations running on SQL Server 2000. CRM has a nicely normalised data schema, which results in a lot of joins. There's actually a hotfix that ups the limit from 256 to a whopping 260: http://support.microsoft.com/kb/818406 (we always thought this a great joke on the part of the SQL Server team).
The solution, as Dillie-O alludes to, is to identify appropriate "sub-joins" (preferably ones that are used multiple times) and factor them out into temp-table variables that you then use in your main joins. It's a major PIA and often kills performance. I'm sorry for you.
#Kevin, love that tee -- says it all :-).
I have never come across this kind of situation, and to be honest the idea of referencing > 256 tables in a query fills me with a mortal dread.
Your first question should probably be "Why so many?", closely followed by "Which bits of information do I NOT need?" I'd also be worried that the amount of data returned by such a query would begin to impact application performance quite severely.
I'd like to see that query, but I imagine it's some problem with some sort of iterator, and while I can't think of any situations where it's possible, I bet it's from a bad while/case/cursor or a ton of poorly implemented views.
Post the query :D
Also, I feel like one of the possible problems could be having a ton (read: 200+) of name/value tables, which could be condensed into a single lookup table.
I had this same problem: my development box runs SQL Server 2008 (where the view worked fine), but on production (SQL Server 2005) the view didn't. I ended up creating views to avoid this limitation, using the new views as part of the query in the view that threw the error.
Kind of silly, considering the logical execution is the same...
Had the same issue in SQL Server 2005 (it worked in 2008) when I wanted to create a view. I resolved it by creating a stored procedure instead of the view.
