Select a random row from Oracle DB in a performant way - database

Using:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
I am trying to fetch a random row. As suggested in other Stack Overflow questions, I used DBMS_RANDOM.VALUE like this:
SELECT column FROM
( SELECT column
FROM table
WHERE COLUMN_VALUE = 'Y' -- value of COLUMN_VALUE
ORDER BY dbms_random.value
)
WHERE rownum <= 1
But this query doesn't perform well as the number of requests increases, so I am looking for an alternative.
SAMPLE wouldn't work for me because the sampled rows wouldn't necessarily match my WHERE clause. The query looked like this:
SELECT column FROM table SAMPLE(1) WHERE COLUMN_VALUE = 'Y'
Because the SAMPLE is applied before my WHERE clause, this returns no data most of the time.
P.S.: I am OK with moving part of the logic to the application layer (though I am definitely not looking for answers that suggest loading everything into memory).

The performance problems consist of two aspects:
selecting the data with column_value = 'Y' and
sorting this subset to get a random record
You didn't say whether the subset of your table with column_value = 'Y' is large or small. This is important and will drive your strategy.
If there are lots of records with column_value = 'Y', use SAMPLE to limit the number of rows to be sorted.
You are right that this can return an empty result; in that case, repeat the query (you may additionally add logic that increases the sample percentage to avoid excessive repeats, as sketched below). This boosts performance because you sort only a sample of the data:
select id from (
select id from tt SAMPLE(1) where column_value = 'Y' order by dbms_random.value )
where rownum <= 1;
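The repeat-with-growing-sample logic could look roughly like this in PL/SQL; a minimal sketch, assuming the table tt from above (the starting percentage and growth factor are arbitrary illustrations):
DECLARE
  l_id      tt.id%TYPE;
  l_percent NUMBER := 1;  -- starting sample percentage
BEGIN
  LOOP
    BEGIN
      -- SAMPLE requires a literal percentage, so the statement is built dynamically
      EXECUTE IMMEDIATE
        'select id from (
           select id from tt sample(' || l_percent || ')
           where column_value = ''Y''
           order by dbms_random.value
         ) where rownum <= 1'
        INTO l_id;
      EXIT;  -- a row was found, stop retrying
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        l_percent := LEAST(l_percent * 2, 99);  -- widen the sample and retry
    END;
  END LOOP;
END;
/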
If there are only a few records with column_value = 'Y', define an index on this column (or a separate partition); this enables efficient access to the records. Then use the ORDER BY dbms_random.value approach; sorting will not degrade performance for a small number of rows:
select id from (
select id from tt where column_value = 'Y' order by dbms_random.value )
where rownum <= 1;
Basically, both approaches keep the sorted row set small. The first approach performs a table access comparable to a FULL TABLE SCAN; the second performs an INDEX ACCESS for the selected column_value.

Related

Indexing and optimization of where clause based on datetime field

I have a database table with more than a million rows. When I execute this query it takes hours, mostly due to PAGEIOLATCH_SH waits. There is currently no indexing. Can you suggest possible indexes based on the WHERE clause? I believe the index should be on the datetime column, as it is used in the WHERE as well as the ORDER BY; if so, which index should I use?
if(<some condition>)
BEGIN
select <some columns>
From <some tables with joins(no lock)>
WHERE
((@var2 IS NULL AND a.addr IS NOT NULL) OR
(a.addr LIKE @var2 + '%')) AND
((@var3 IS NULL AND a.ca_id IS NOT NULL) OR
(a.ca_id = @var3)) AND
b.time >= @from_datetime AND b.time <= @to_datetime AND
(
(
b.shopping_product IN ('CX12343', 'BG8945', 'GF4543') AND
b.shopping_category IN ('online', 'COD')
)
OR
(
b.shopping_product = 'LX3454' and b.sub_shopping_list in ('FF544','GT544','KK543','LK5343')
)
OR
(
b.shopping_product = 'LK434434' and b.sub_shopping_list in ('LL5435','PO89554','IO948854','OR4334','TH5444')
)
OR
(
b.shopping_product = 'AZ434434' and b.sub_shopping_list in ('LL54352','PO489554','IO9458854','OR34334','TH54344')
)
)
ORDER BY
b.time desc
END
ELSE
BEGIN
select <some columns>
From <some tables with joins(no lock)>
where <similar where as above with slight difference>
END
Okay then,
I said: "First, create indexes on these: shopping_product, shopping_category, and sub_shopping_list; secondly, you can try one on the date. After that, check the execution plan (or it would be better to create a partition on the time column)."
I'm working on Oracle, but the basics are the same.
You can create 3 distinct indexes on those columns: shopping_product, shopping_category, sub_shopping_list. Or you can create 1 composite index covering all 3. The point is that you need to examine the execution plan to see which one is most effective for you.
Oh, and there is the a.ca_id column (almost forgot); you need an index for this too.
For the date column, I think you would be better off creating a partition instead of an index.
In summary, two ways:
- create 4 distinct indexes (shopping_product, shopping_category, sub_shopping_list, ca_id) and a range partition on the date column, or
- create 1 composite index (shopping_product, shopping_category, sub_shopping_list) plus 1 normal index (ca_id), and a range partition on the date column.
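In T-SQL-style DDL, the two options would look roughly like this; a sketch only, assuming the base tables behind the aliases a and b (index names are made up):
-- Option 1: four distinct indexes (plus a range partition on the time column)
CREATE INDEX ix_b_shopping_product  ON b (shopping_product);
CREATE INDEX ix_b_shopping_category ON b (shopping_category);
CREATE INDEX ix_b_sub_shopping_list ON b (sub_shopping_list);
CREATE INDEX ix_a_ca_id             ON a (ca_id);

-- Option 2: one composite index instead of the first three above
CREATE INDEX ix_b_shopping_composite
    ON b (shopping_product, shopping_category, sub_shopping_list);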
You probably should learn about indexing if you're dealing with tables of this size. It's not a trivial process. JOIN operations are a big deal when sorting out which indexes you need. Read this. http://use-the-index-luke.com/
In the meantime, if your date-range is highly selective (that is, if
b.time >= @from_datetime AND b.time <= @to_datetime
chooses a reasonably small fraction of the rows in your database) you should try the following compound index.
b.shopping_product, b.time
If that doesn't help, try
b.time
by itself. The idea is to structure your index so the server can do a range scan. Without knowledge of your whole query, there's not much else to offer.
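As DDL, the two attempts above would be something like the following (a sketch; the alias b stands in for the real base table, and the index names are made up):
-- First attempt: equality column first, range column second
CREATE INDEX ix_b_product_time ON b (shopping_product, time);

-- Fallback: the range column by itself
CREATE INDEX ix_b_time ON b (time);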

ORACLE, PLSQL, Select from pre-populated set of values in WHERE IN clause

Condensed Example & Explanation
I want to write a WHERE IN clause that selects from a pre-populated set of numbers
Here's some code. I want to store this set of numbers and select from them so I don't have to repeat the query that generates the set.
ARRAY_OF_NUMBERS = Values from some select statement
-- SHIPMENTS CURSOR
OPEN O_SHIPMENTS_CURSOR FOR
SELECT *
FROM Q194977.AN_SHIPMENT_INFO SI
WHERE INTERNAL_ASN IN (ARRAY_OF_NUMBERS) -- need to populate something
ORDER BY INTERNAL_ASN;
-- ORDER CURSOR
OPEN O_ORDERS_CURSOR FOR
SELECT *
FROM Q194977.AN_ORDER_INFO OI
WHERE INTERNAL_ASN IN (ARRAY_OF_NUMBERS) -- need to populate something
ORDER BY INTERNAL_ASN;
I read something about using an array, but it said it had to be a global array rather than session level. I'm not sure how true this is, and I'm not sure what a global array even is, but I imagine this needs to be session level, as it would change with each procedure call. Perhaps I could use a temporary table.
Any ideas on the best way I can accomplish this?
------------- EDIT ------------
(Adding detailed example)
Detailed Example and Explanation
I have 4 tables at 4 different hierarchical levels, and 4 stored procedures. Each procedure contains input criteria to build a selection of data at all 4 levels via criteria for a certain level.
In this example, my caller will input selection criteria that exists at the carton level. Then I will use the INTERNAL_ASN numbers narrowed from this selection to move up hierarchical levels and retrieve the ORDERS this carton is on and the SHIPMENTS that ORDER is on, and then down to retrieve the ITEMS on this CARTON.
I noticed that when going up levels I was repeating the same selection, and thought I should somehow store this set of numbers so I don't rerun the selection each time, but I wasn't sure how.
-- SHIPMENTS CURSOR
OPEN O_SHIPMENTS_CURSOR FOR
SELECT *
FROM Q194977.AN_SHIPMENT_INFO SI
WHERE INTERNAL_ASN IN
(SELECT INTERNAL_ASN
FROM Q194977.AN_CARTON_INFO CI
WHERE (I_BOL IS NULL OR BILL_OF_LADING = I_BOL)
AND ( I_CARTON_NO IS NULL
OR CARTON_NO = I_CARTON_NO)
AND (I_PO_NO = 0 OR PO_NO = I_PO_NO)
AND (I_STORE_NO = 0 OR STORE_NO = I_STORE_NO))
ORDER BY INTERNAL_ASN;
-- ORDER CURSOR
OPEN O_ORDERS_CURSOR FOR
SELECT *
FROM Q194977.AN_ORDER_INFO OI
WHERE INTERNAL_ASN IN
(SELECT INTERNAL_ASN
FROM Q194977.AN_CARTON_INFO CI
WHERE (I_BOL IS NULL OR BILL_OF_LADING = I_BOL)
AND ( I_CARTON_NO IS NULL
OR CARTON_NO = I_CARTON_NO)
AND (I_PO_NO = 0 OR PO_NO = I_PO_NO)
AND (I_STORE_NO = 0 OR STORE_NO = I_STORE_NO))
AND (I_PO_NO = 0 OR PO_NO = I_PO_NO)
ORDER BY INTERNAL_ASN;
-- CARTONS CURSOR
OPEN O_CARTONS_CURSOR FOR
SELECT *
FROM Q194977.AN_CARTON_INFO CI
WHERE (I_BOL IS NULL OR BILL_OF_LADING = I_BOL)
AND (I_CARTON_NO IS NULL OR CARTON_NO = I_CARTON_NO)
AND (I_PO_NO = 0 OR PO_NO = I_PO_NO)
AND (I_STORE_NO = 0 OR STORE_NO = I_STORE_NO)
ORDER BY INTERNAL_ASN;
-- ITEMS CURSOR
OPEN O_ITEMS_CURSOR FOR
SELECT *
FROM Q194977.AN_ITEM_INFO II
WHERE CARTON_NO IN
(SELECT CARTON_NO
FROM Q194977.AN_CARTON_INFO CI
WHERE (I_BOL IS NULL OR BILL_OF_LADING = I_BOL)
AND ( I_CARTON_NO IS NULL
OR CARTON_NO = I_CARTON_NO)
AND (I_PO_NO = 0 OR PO_NO = I_PO_NO)
AND (I_STORE_NO = 0 OR STORE_NO = I_STORE_NO))
ORDER BY INTERNAL_ASN;
Assuming that you mean a collection of numbers (there are three collection types in PL/SQL, one of which is an associative array, but that doesn't sound like what you want here), you could do something like
CREATE OR REPLACE TYPE num_tbl
AS TABLE OF NUMBER;
Then, in your procedure
l_nums num_tbl;
BEGIN
SELECT some_number
BULK COLLECT INTO l_nums
FROM <<your query to get the numbers>>;
<<more code>>
OPEN O_SHIPMENTS_CURSOR FOR
SELECT *
FROM Q194977.AN_SHIPMENT_INFO SI
WHERE INTERNAL_ASN IN (SELECT column_value
FROM TABLE( l_nums ))
ORDER BY INTERNAL_ASN;
That is syntactically valid. Whether it is actually going to be useful to you, however, is a separate question.
Collections are stored in the relatively expensive PGA memory on the database server. If you're storing a couple hundred numbers in a collection, that's probably not a huge concern. If, on the other hand, you're storing 10's or 100's of MB of data and running this in multiple sessions, this one bit of code could easily consume many GB of the RAM on the database server leading to lots of performance issues.
Moving large quantities of data from SQL to PL/SQL and then back to SQL can also be somewhat problematic from a performance standpoint-- it's generally more efficient to leave everything in SQL and let the SQL engine handle it.
If you use a collection in this way, you're preventing the optimizer from considering join orders and query plans that merge the two queries in a more efficient manner. If you are certain that the most efficient plan is one where a small number of internal_asn values are used to probe the an_shipment_info table using an index, that may not be a major concern. If you're not sure about what the best query plan is, and particularly if your actual queries are more complicated than what you posted, however, you might be preventing the optimizer from using the most efficient plan for each query.
What is the problem that you're trying to solve? You talk about not wanting to duplicate code, which leads me to suspect that you really just want a view you can reference in your queries rather than repeating a complicated SQL statement. But that presumes the issue you're trying to solve is one of code elegance, which may or may not be accurate.
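If the goal is simply to avoid repeating the carton subquery, the asker's own global temporary table idea also works in Oracle; a rough sketch (the table name is made up, and the DDL is created once, not per call):
-- One-time DDL; rows are private to each session
CREATE GLOBAL TEMPORARY TABLE selected_asn (
  internal_asn NUMBER PRIMARY KEY
) ON COMMIT PRESERVE ROWS;

-- In the procedure: populate once from the carton-level criteria...
INSERT INTO selected_asn (internal_asn)
  SELECT DISTINCT internal_asn
  FROM Q194977.AN_CARTON_INFO
  WHERE (I_BOL IS NULL OR BILL_OF_LADING = I_BOL);  -- etc.

-- ...then let every cursor reference it
OPEN O_SHIPMENTS_CURSOR FOR
  SELECT si.*
  FROM Q194977.AN_SHIPMENT_INFO si
  WHERE si.INTERNAL_ASN IN (SELECT internal_asn FROM selected_asn)
  ORDER BY si.INTERNAL_ASN;
The optimizer caveat above still applies in spirit (the queries stay separate), though a temporary table at least keeps the data out of PGA memory.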

Preserving ORDER BY in SELECT INTO

I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:
SELECT VibeFGEvents.*
INTO VibeFGEventsAfterStudyStart
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id
The code using the table relies on its order, and the copy above does not preserve the order I expected; i.e., the rows in the new table VibeFGEventsAfterStudyStart are not monotonically increasing in the VibeFGEventsAfterStudyStart.id column copied from VibeFGEvents.id.
In T-SQL, how might I preserve the ordering of the rows from VibeFGEvents in VibeFGEventsAfterStudyStart?
I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this using SELECT TOP n with ORDER BY NEWID(). Without the TOP n, the order was not preserved and the second table had rows in the same order as the first. With TOP n, however, the order (random in my case) was preserved. I used a value of n greater than the number of rows. So my query was along the lines of:
insert Table2 (T2Col1, T2Col2)
select top 10000 T1Col1, T1Col2
from Table1
order by newid()
What for?
The point is that data in a table is not ordered. In SQL Server, the intrinsic storage order of a table is that of the clustered index (if one is defined).
The order in which data is inserted is basically irrelevant; it is forgotten the moment the data is written into the table.
As such, nothing is gained even if you achieve this. If you need an order when dealing with data, you HAVE to put an ORDER BY clause on the SELECT that reads it. Anything else is random, i.e., the order you get data in is not determined and may change.
So it makes no sense to enforce a specific order on the insert, as you are trying to do.
SQL 101: sets have no order.
Just add TOP to your SQL with a number that is greater than the actual number of rows:
SELECT top 25000 *
into spx_copy
from SPX
order by date
I've found a specific scenario where we want the new table to be created with its rows in a specific order:
The number of rows is very large (from 200 to 2000 million rows), so we use SELECT INTO instead of CREATE TABLE + INSERT because the data needs to be loaded as fast as possible (minimal logging). We have tested using trace flag 610 for loading an already created empty table with a clustered index, but it still takes longer than the following approach.
We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.
It happens that if you manage to somehow create the table with the rows already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.
The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.
A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.
So you can rewrite this query:
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
ORDER BY -- ORDER BY is ignored!
FirstColumn,
SecondColumn
to
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
UNION ALL
-- A "fake" row to be deleted
SELECT
FirstColumn = 0,
SecondColumn = 0
ORDER BY
FirstColumn,
SecondColumn
We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.
You cannot do this with ORDER BY, but if you create a clustered index on VibeFGEvents.id after your SELECT INTO, the table will be sorted on disk by VibeFGEvents.id.
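For example, a minimal sketch of that suggestion (the index name is made up):
-- After the SELECT INTO completes, cluster the copy on id
CREATE CLUSTERED INDEX IX_VibeFGEventsAfterStudyStart_id
    ON VibeFGEventsAfterStudyStart (id);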
I've made a test on MS SQL 2012, and it clearly shows that insert into ... select ... order by makes sense. Here is what I did:
create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);
insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');
select * from tmp1 order by id;
And I got this list:
1 Apple
2 Carrot
3 Pineapple
4 Orange
5 Kiwi
6 Ananas
7 Banana
8 Blackberry
No surprises here. Then I made a copy from tmp1 to tmp2 this way:
insert into tmp2 (name)
select name
from tmp1
order by id;
select * from tmp2 order by id;
I got the exact same result as before: Apple to Blackberry.
Now reverse the order to test it:
delete from tmp2;
insert into tmp2 (name)
select name
from tmp1
order by id desc;
select * from tmp2 order by id;
9 Blackberry
10 Banana
11 Ananas
12 Kiwi
13 Orange
14 Pineapple
15 Carrot
16 Apple
So the order in tmp2 is reversed too; ORDER BY makes sense when there is an identity column in the target table!
The reason one would want this (a specific order) is that you cannot define the order in a subquery. The idea is that if you create a table variable and then query from that table variable, you would expect to retain the order (say, to concatenate rows that must be in order, for XML or JSON), but you can't.
So, what do you do?
The answer is to force SQL Server to order it by using TOP in your SELECT (just pick a number high enough to cover all your rows), as in the sketch below.
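A minimal sketch of the trick, with made-up names (TOP is set higher than any plausible row count):
DECLARE @ordered TABLE (rn INT IDENTITY(1,1), val NVARCHAR(100));

-- TOP forces the ORDER BY to be honored while the IDENTITY values are assigned
INSERT INTO @ordered (val)
SELECT TOP 2147483647 SomeColumn
FROM dbo.SourceTable
ORDER BY SomeColumn;

-- rn now reflects the intended order, e.g. for ordered concatenation
SELECT val FROM @ordered ORDER BY rn;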
I have run into the same issue. One reason I have needed to preserve the order is when I use ROLLUP to get a weighted average based on the raw data, not an average of what is in that column. For instance, say I want to see the average profit based on the number of units sold by four store locations. I can do this very easily by creating the calculation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Now I think to myself, "This is good info, but I want to see it in order of best average to worst and keep the overall at the bottom (or top) of the list." ROLLUP will fail you in this, so you take a different approach.
Why not create row numbers based on the sequence (order) you need to preserve?
SELECT OrderBy = ROW_NUMBER() OVER(PARTITION BY 'field you want to count' ORDER BY 'field(s) you want to use ORDER BY')
, VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.
I found this approach helpful to solve this problem:
WITH ordered as
(
SELECT TOP 1000
[Month]
FROM SourceTable
GROUP BY [Month]
ORDER BY [Month]
)
INSERT INTO DestinationTable (MonthStart)
(
SELECT * from ordered
)
Try using INSERT INTO instead of SELECT INTO
INSERT INTO VibeFGEventsAfterStudyStart
SELECT VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id

Microsoft SQL Server Paging

There are a number of SQL Server paging questions on Stack Overflow, and many of them talk about using ROW_NUMBER() OVER (ORDER BY ...) and a CTE. Once you get into the hundreds of thousands of rows and start adding sorting on non-primary-key values and custom WHERE clauses, these methods become very inefficient. I have a dataset of several million rows I am trying to page through with custom sorting and filtering, but I am getting poor performance even with indexes on all the fields that I sort and filter by. I even went as far as to include my SELECT columns in each of the indexes, but this barely helped and severely bloated my database.
I noticed the Stack Overflow paging only takes about 500 milliseconds no matter what sorting criteria or page number you click on. Does anyone know how to make paging work efficiently in SQL Server 2008 with millions of rows? This would include getting the total row count as efficiently as possible.
My current query has the exact same logic as this stackoverflow question about paging:
Best paging solution using SQL Server 2005?
Anyone know how to make paging work efficiently in SQL Server 2008 with millions of rows?
If you want accurate, perfect paging, there is no substitute for building an index key (a positional row number) for each record. However, there are alternatives.
(1) total number of pages (records)
You can use an approximation from sysindexes.rows (almost instant), assuming the rate of change is small.
You can use triggers to maintain a completely accurate, to-the-second table row count, as sketched below.
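A rough sketch of the trigger-maintained count (the table, trigger, and column names are made up):
-- One row per tracked table
CREATE TABLE dbo.RowCounts (TableName SYSNAME PRIMARY KEY, Cnt BIGINT NOT NULL);
GO
CREATE TRIGGER trg_tbl_rowcount ON dbo.tbl
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.RowCounts
    SET Cnt = Cnt + (SELECT COUNT(*) FROM inserted)
                  - (SELECT COUNT(*) FROM deleted)
    WHERE TableName = 'tbl';
END;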
(2) paging
(a)
You can show page jumps within, say, the next five pages to either side of a record. These need to scan at most {page size} x 5 rows on each side. If your underlying query lends itself to travelling along the sort order quickly, this should not be slow. So given a record X, you can go to the previous page using the following (assuming the sort order is a asc, b desc):
select top(@pagesize) t.*
from tbl x
inner join tbl t on (t.a = x.a and t.b > x.b) OR
(t.a < x.a)
where x.id = @X
order by t.a asc, t.b desc
(i.e. the last {page size} of records prior to X)
To go five pages back, you increase it to TOP(@pagesize*5) and then take a further TOP(@pagesize) from that subquery.
Downside: with this option you cannot directly jump to a particular location; your only options are FIRST (easy), LAST (easy), NEXT/PRIOR, and up to 5 pages to either side.
(b)
If the paging is always going to be quite specific and predictable, maintain an INDEXED view or trigger-updated table that does not contain gaps in the row number. This may be an option if the tables normally only see updates at one end of the spectrum, with gaps from deletes easily filled quickly by shifting not-so-many records.
This approach gives you a rowcount (last row) and also direct access to any page.
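A sketch of what such a gapless position table might look like, with made-up names (dbo.tbl stands in for the real table):
CREATE TABLE dbo.tbl_positions (
    rn BIGINT NOT NULL PRIMARY KEY,  -- gapless position in the sort order
    id INT    NOT NULL UNIQUE        -- key of the underlying row
);

-- Direct access to any page is then a range seek on rn
DECLARE @page INT = 42, @pagesize INT = 20;
SELECT t.*
FROM dbo.tbl_positions p
JOIN dbo.tbl t ON t.id = p.id
WHERE p.rn BETWEEN (@page - 1) * @pagesize + 1 AND @page * @pagesize
ORDER BY p.rn;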
Try this; let's say you have a Country table as below:
DECLARE @pageIndex INT = 0;
DECLARE @pageSize INT = 10;
DECLARE @sortByColumn NVARCHAR(200) = 'Code';
DECLARE @sortByDesc BIT = 0;
;WITH tbl AS (
SELECT COUNT(id) OVER() [RowTotal], c.Id, c.Code, c.Name
FROM dbo.[Country] c
ORDER BY
CASE WHEN @sortByColumn='Code' AND @sortByDesc=0 THEN c.Code END ASC,
CASE WHEN @sortByColumn='Code' AND @sortByDesc<>0 THEN c.Code END DESC,
CASE WHEN @sortByColumn='Name' AND @sortByDesc=0 THEN c.Name END ASC,
CASE WHEN @sortByColumn='Name' AND @sortByDesc<>0 THEN c.Name END DESC,
c.Name ASC --DEFAULT SORTING ORDER
OFFSET @pageIndex*@pageSize ROWS
FETCH NEXT @pageSize ROWS ONLY
) SELECT (@pageIndex*@pageSize)+(ROW_NUMBER() OVER(ORDER BY Id)) [RowNo], * FROM tbl;

The fastest way to check if some records in a database table?

I have a huge table to work with. I want to check whether there are any records whose parent_id equals the value I pass in.
Currently I implement this using "select count(*) from mytable where parent_id = :id"; if the result is > 0, they exist.
Because this is a very large table, and I don't care about the exact number of matching records but only whether any exist, I think count(*) is a bit inefficient.
How do I implement this requirement in the fastest way? I am using Oracle 10.
According to Hibernate Tips & Tricks (https://www.hibernate.org/118.html#A2), it suggests writing it like this:
Integer count = (Integer) session.createQuery("select count(*) from ....").uniqueResult();
I don't know what the magic of uniqueResult() is here. Why does it make this fast?
Compared to "select 1 from mytable where parent_id = passingId and rownum < 2", which is more efficient?
An EXISTS query is the one to go for if you're not interested in the number of records:
select 'Y' from dual where exists (select 1 from mytable where parent_id = :id)
This will return 'Y' if a record exists and nothing otherwise.
[In terms of your question on Hibernate's "uniqueResult" - all this does is return a single object when there is only one object to return - instead of a set containing 1 object. If multiple results are returned the method throws an exception.]
There's no real difference between:
select 'y'
from dual
where exists (select 1
from child_table
where parent_key = :somevalue)
and
select 'y'
from mytable
where parent_key = :somevalue
and rownum = 1;
... at least in Oracle 10gR2 and up. Oracle is smart enough in that release to do a FAST DUAL operation, which zeroes out any real activity against it. The second query would be easier to port if that's ever a consideration.
The real performance differentiator is whether or not the parent_key column is indexed. If it's not, then you should run something like:
select 'y'
from dual
where exists (select 1
from parent_table
where parent_key = :somevalue)
select count(*) should be lightning fast if you have an index, and if you don't, allowing the database to abort after the first match won't help much.
But since you asked:
boolean exists = session.createQuery("select parent_id from Entity where parent_id=?")
.setParameter(...)
.setMaxResults(1)
.uniqueResult()
!= null;
(Some syntax errors are to be expected, since I don't have Hibernate to test against on this computer.)
For Oracle, maxResults is translated into rownum by hibernate.
As for what uniqueResult() does, read its JavaDoc! Using uniqueResult instead of list() has no performance impact; if I recall correctly, the implementation of uniqueResult delegates to list().
First of all, you need an index on mytable.parent_id.
That should make your query fast enough, even for big tables (unless there are also a lot of rows with the same parent_id).
If not, you could write
select 1 from mytable where parent_id = :id and rownum < 2
which would return a single row containing 1, or no row at all. It does not need to count the rows; it just finds one and then quits. But this is Oracle-specific SQL (because of rownum), so you should rather avoid it.
For DB2 there is something like select * from mytable where parent_id = ? fetch first 1 row only. I assume that something similar exists for Oracle.
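For completeness: Oracle did eventually get the same syntax, but only in 12c, so it wouldn't help on the asker's 10g:
select 1 from mytable where parent_id = :id fetch first 1 row only;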
This query will return 1 if a matching record exists and 0 otherwise:
SELECT COUNT(1) FROM (SELECT 1 FROM mytable WHERE parent_id = :id AND ROWNUM < 2);
It can help when you need to check whether data exists, regardless of table size and without performance issues.
