Sybase - row count - sybase

Could these counts be different, ever? (in Sybase 15)
SELECT COUNT(1) FROM MY_TABLE
and
select st.rowcnt
from sysobjects ob, systabstats st
where ob.name = "MY_TABLE"
and st.id=ob.id

Yes, they can be different, for example when there is insert/delete activity going on for the table. That may be tricky to reproduce however.

Yes, the two can differ, for example when rows are being inserted into or deleted from the table. It takes some time for rowcnt in systabstats to be updated.
COUNT(1), on the other hand, always returns the exact count.

They could be different, as the data in the stats table is not always exactly up to date.

Yes, they can differ because of concurrent DML on the table.

Related

dirty reads: Different results within single query?

In SQL Server 2014, when I issue the following SQL:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM TableA
UNION ALL
SELECT * FROM TableB
WHERE NOT EXISTS (
SELECT 1 FROM TableA WHERE TableA.ID = TableB.ID
)
Is it possible to read different versions of one table even within a single Statement because of dirty reads?
Example: Reading 2 Rows from TableA in the first part of the union but reading just 1 Row from TableA in the inner select of the second part of the union because one row got deleted by another transaction meanwhile.
Short answer: yes, depending on the execution plan generated. It doesn't matter that you're doing it in a single statement; no special guarantees are associated with a statement boundary. READ UNCOMMITTED means reads take no shared locks on the data, and that's exactly what you'll get. This is also why using it is generally not recommended; it's terribly easy to get inconsistent or "impossible" results. Heck, even a single SELECT is not safe: you're not even guaranteed that rows will not be skipped or duplicated!
It seems to me that it is entirely possible.
The query execution plan (not reproduced here) contains two separate reads of TableA, so the outcome depends on the delay between them and on how many insert, update, and delete operations hit those tables in the meantime.
READ UNCOMMITTED is really not a great choice for such a query.

Select Count(*) vs Select Count(id) vs select count(1). Are these indeed equivalent?

As a follow up to my earlier question:
Some of the answers and comments suggest that
select count(*) is mostly equivalent to select count(id) where id is the primary key.
I have always favored select count(1); I even always use if exists (select 1 from table_name) ...
Now my question is:
1) What is the optimal way of issuing a select count query over a table?
2) If we add a where clause: where msg_type = X; if msg_type has a non_clustered index, would select count(msg_type) from table_name where msg_type = X be the preferred option for counting?
Side-bar:
From a very early age, I was taught that select * from... is BAD BAD BAD. I guess this has made me skeptical of select count(*) as well.
count(*) -- counts all rows, including rows that contain nulls
count(id) -- counts the rows where this column is not null
count(1) is the same as count(*)
If we add a where clause: where msg_type = X; if msg_type has a non_clustered index, would select count(msg_type) from table_name where msg_type = X be the preferred option for counting?
As I mentioned in my previous answer, SQL Server uses a cost-based optimizer, and the plan chosen depends on many factors. It tries to find the cheapest plan in the shortest time possible.
Now, when you issue count(msg_type), SQL Server may choose this index if it is the cheapest option, or scan another one, as long as it gives the right result (nulls not counted).
I always tend to use count(*), unless I want to exclude nulls.
Well, those count queries are not identical and will do different things.
select count(1)
select count(*)
are identical, and will count every record.
select count(col_name)
will count only the NOT NULL values of col_name.
So, unless col_name is the PK as you said, those queries will do different things.
As for your second question, it depends; we can't provide a generic answer that will always be true. You will have to look at the explain plan or just check for yourself, although I believe that adding this WHERE clause while having this index will be better.
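The difference between the three forms is easy to check on a small table that contains a NULL. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server; the messages table and its rows are made up for illustration, and it says nothing about which index SQL Server would pick.

```python
import sqlite3

# Hypothetical table for illustration; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, msg_type TEXT)")
conn.executemany(
    "INSERT INTO messages (id, msg_type) VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, None), (4, "B")],  # row 3 has a NULL msg_type
)

# COUNT(*) and COUNT(1) count rows; COUNT(col) counts non-NULL values of col.
count_star, count_one, count_id, count_msg_type = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(msg_type) FROM messages"
).fetchone()

print(count_star, count_one, count_id, count_msg_type)  # 4 4 4 3
```

COUNT(id) matches COUNT(*) here only because id is a primary key and so can never be NULL.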

Optimizing ROW_NUMBER() in SQL Server

We have a number of machines which record data into a database at sporadic intervals. For each record, I'd like to obtain the time period between this recording and the previous recording.
I can do this using ROW_NUMBER as follows:
WITH TempTable AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Machine_ID ORDER BY Date_Time) AS Ordering
FROM dbo.DataTable
)
SELECT [Current].*, Previous.Date_Time AS PreviousDateTime
FROM TempTable AS [Current]
INNER JOIN TempTable AS Previous
ON [Current].Machine_ID = Previous.Machine_ID
AND Previous.Ordering = [Current].Ordering - 1
The problem is, it runs really slowly (several minutes on a table with about 10k entries). I tried creating separate indexes on Machine_ID and Date_Time, and a single composite index, but nothing helps.
Is there any way to rewrite this query to go faster?
The given ROW_NUMBER() partition and order require an index on (Machine_ID, Date_Time) to satisfy in one pass:
CREATE INDEX idxMachineIDDateTime ON DataTable (Machine_ID, Date_Time);
Separate indexes on Machine_ID and Date_Time will help little, if any.
How does it compare to this version?:
SELECT x.*
,(SELECT MAX(Date_Time)
FROM dbo.DataTable
WHERE Machine_ID = x.Machine_ID
AND Date_Time < x.Date_Time
) AS PreviousDateTime
FROM dbo.DataTable AS x
Or this version?:
SELECT x.*
,triang_join.PreviousDateTime
FROM dbo.DataTable AS x
INNER JOIN (
SELECT l.Machine_ID, l.Date_Time, MAX(r.Date_Time) AS PreviousDateTime
FROM dbo.DataTable AS l
LEFT JOIN dbo.DataTable AS r
ON l.Machine_ID = r.Machine_ID
AND l.Date_Time > r.Date_Time
GROUP BY l.Machine_ID, l.Date_Time
) AS triang_join
ON triang_join.Machine_ID = x.Machine_ID
AND triang_join.Date_Time = x.Date_Time
Both would perform best with an index on (Machine_ID, Date_Time), and for correct results I'm assuming that this combination is unique.
You haven't mentioned what is hidden away in *, and that can sometimes mean a lot, since a (Machine_ID, Date_Time) index will not generally be covering, and if you have a lot of columns there or they hold a lot of data, ...
If the number of rows in dbo.DataTable is large, then it is likely that you are experiencing the issue because the CTE is joined to itself. There is a blog post explaining the issue in some detail here
Occasionally in such cases I have resorted to creating a temporary table, inserting the result of the CTE query into it, and then doing the joins against that temporary table (although this has usually been for cases where a large number of joins against the temp table are required; with a single join the performance difference will be less noticeable).
I have had some strange performance problems using CTEs in SQL Server 2005. In many cases, replacing the CTE with a real temp table solved the problem.
I would try this before going any further with using a CTE.
I never found any explanation for the performance problems I've seen, and really didn't have any time to dig into the root causes. However I always suspected that the engine couldn't optimize the CTE in the same way that it can optimize a temp table (which can be indexed if more optimization is needed).
Update
After your comment that this is a view, I would first test the query with a temp table to see if that performs better.
If it does, and using a stored proc is not an option, you might consider making the current CTE into an indexed/materialized view. You will want to read up on the subject before going down this road, as whether this is a good idea depends on a lot of factors, not the least of which is how often the data is updated.
What if you use a trigger to store the last timestamp and subtract each time to get the difference?
If you require this data often, rather than calculating it each time you pull the data, why not add a column and calculate/populate it whenever a row is added?
(Remus' compound index will make the query fast; running it only once should make it faster still.)
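To compare the ROW_NUMBER() self-join with the correlated-subquery alternative, here is a minimal sketch using Python's built-in sqlite3 module (it assumes SQLite 3.25 or later for window functions). SQLite is only a stand-in for SQL Server, so it shows that the two formulations return the same rows, not how they perform. The sample rows are invented, timestamps are stored as ISO-style text so they sort correctly, and a LEFT JOIN is used so that the first reading of each machine is kept with a NULL previous time, which differs from the INNER JOIN in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataTable (Machine_ID INTEGER, Date_Time TEXT)")
# Composite index suggested in the answers above.
conn.execute("CREATE INDEX idxMachineIDDateTime ON DataTable (Machine_ID, Date_Time)")
conn.executemany(
    "INSERT INTO DataTable VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00"), (1, "2024-01-01 10:05"), (1, "2024-01-01 10:20"),
        (2, "2024-01-01 09:00"), (2, "2024-01-01 09:30"),
    ],
)

# Number the rows per machine, then pair each row with the row numbered one lower,
# which is the earlier reading under an ascending sort.
row_number_sql = """
WITH TempTable AS (
    SELECT Machine_ID, Date_Time,
           ROW_NUMBER() OVER (PARTITION BY Machine_ID ORDER BY Date_Time) AS Ordering
    FROM DataTable
)
SELECT cur.Machine_ID, cur.Date_Time, prev.Date_Time AS PreviousDateTime
FROM TempTable AS cur
LEFT JOIN TempTable AS prev
  ON prev.Machine_ID = cur.Machine_ID
 AND prev.Ordering = cur.Ordering - 1
ORDER BY cur.Machine_ID, cur.Date_Time
"""

# Correlated subquery: the latest reading strictly before the current one.
subquery_sql = """
SELECT x.Machine_ID, x.Date_Time,
       (SELECT MAX(p.Date_Time)
          FROM DataTable AS p
         WHERE p.Machine_ID = x.Machine_ID
           AND p.Date_Time < x.Date_Time) AS PreviousDateTime
FROM DataTable AS x
ORDER BY x.Machine_ID, x.Date_Time
"""

via_row_number = conn.execute(row_number_sql).fetchall()
via_subquery = conn.execute(subquery_sql).fetchall()

for row in via_row_number:
    print(row)
```

Both queries agree only if (Machine_ID, Date_Time) is unique, as one of the answers assumes.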

Count of Distinct Rows Without Using Subquery

Say I have Table1 which has duplicate rows (forget the fact that it has no primary key...) Is it possible to rewrite the following without using a JOIN, subquery or CTE and also without having to spell out the columns in something like a GROUP BY?
SELECT COUNT(*)
FROM (
SELECT DISTINCT * FROM Table1
) T1
You can do something like this.
SELECT Count(DISTINCT ProductName) FROM Products
but if you want a count of completely distinct records then you will have to use one of the other options you mentioned.
If you wanted to do something like you suggested in the question, then that would imply you have duplicate records in your table.
If you didn't have duplicate records SELECT DISTINCT * from table would be the same without the distinct.
No, it's not possible.
If you are limited by your framework/query tool/whatever, can't use a subquery, and can't spell out each column name in the GROUP BY, you are SOL.
If you are not limited by your framework/query tool/whatever, there's no reason not to use a subquery.
If you really, really want to do that, you can just run "SELECT COUNT(*) FROM table1 GROUP BY all,columns,here" and take the number of rows in the result set as your count.
But it would be dailywtf-worthy code ;)
I just wanted to refine the answer by saying that you need to check that the datatype of the columns is comparable - otherwise you will get an error trying to make them DISTINCT:
e.g.
com.microsoft.sqlserver.jdbc.SQLServerException: The ntext data type cannot be selected as DISTINCT because it is not comparable.
This is true for large binary columns, xml columns and others, depending on your RDBMS; check its documentation. The solution for SQL Server, for example, is to cast the column from ntext to nvarchar(MAX), which is available from SQL Server 2005 onwards.
If you stick to the PK columns then you should be OK (I haven't verified this myself, but I'd have thought logically that PK columns would have to be comparable).
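The subquery form from the question and the GROUP BY workaround mentioned above can be checked against each other. This is a small sketch with Python's built-in sqlite3 module standing in for the real database; the two-column Table1 and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column table with duplicate rows and no primary key.
conn.execute("CREATE TABLE Table1 (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (2, "z"), (2, "z")],
)

# Form from the question: count the rows of a DISTINCT subquery.
(subquery_count,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM Table1) AS T1"
).fetchone()

# Workaround: group by every column and count the groups on the client side.
groups = conn.execute("SELECT COUNT(*) FROM Table1 GROUP BY a, b").fetchall()
group_by_count = len(groups)

print(subquery_count, group_by_count)  # 3 3
```

The GROUP BY variant still has to spell out every column and ships one row per group to the client, which is why the subquery remains the cleaner choice.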

How to simplify this Sql query

The table Query has 2 columns (functionId, depFunctionId).
I want all values that are in either functionid or depfunctionid.
I am using this:
select distinct depfunctionid from Query
union
select distinct functionid from Query
How to do it better?
I think that's the best you'll get.
That's as good as it gets, I think.
Lose the DISTINCT clauses, as your UNION (vs UNION ALL) will take care of removing duplicates.
An alternative, though perhaps less clear and not necessarily any faster, would be to do a FULL JOIN across the 2 columns. It needs a DISTINCT, because the join itself can repeat values.
SELECT DISTINCT
COALESCE(Query1.FunctionId, Query2.DepFunctionId) as FunctionId
FROM Query as Query1
FULL OUTER JOIN Query as Query2 ON
Query1.FunctionId = Query2.DepFunctionId
I am almost sure you can lose the DISTINCTs.
When you use UNION instead of UNION ALL, duplicate rows are thrown away.
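That UNION already removes duplicates can be confirmed with a quick sketch. It uses Python's built-in sqlite3 module as a stand-in for the actual database, with invented sample rows; the table name is quoted because QUERY is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Query" (functionId INTEGER, depFunctionId INTEGER)')
conn.executemany(
    'INSERT INTO "Query" VALUES (?, ?)',
    [(1, 2), (1, 3), (2, 3), (4, 1)],  # made-up dependency pairs
)

with_distinct = conn.execute(
    'select distinct depfunctionid from "Query" '
    'union select distinct functionid from "Query"'
).fetchall()

without_distinct = conn.execute(
    'select depfunctionid from "Query" union select functionid from "Query"'
).fetchall()

# UNION ALL keeps every row from both branches, duplicates included.
union_all = conn.execute(
    'select depfunctionid from "Query" union all select functionid from "Query"'
).fetchall()

print(sorted(v for (v,) in with_distinct))     # [1, 2, 3, 4]
print(sorted(v for (v,) in without_distinct))  # [1, 2, 3, 4]
print(len(union_all))                          # 8
```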
It all depends on how heavy your inline view query is. The key to better performance would be to execute it only once, but that is not possible given the data that it returns.
If you do it like this :
select depfunctionid , functionid from Query
group by depfunctionid , functionid
It is very likely that you'll get repeated results for depfunctionid or functionid.
I may be wrong, but it seems to me that you're trying to retrieve a tree of dependencies. If that's the case, I personally would try to use a materialized path approach.
If the materialized path is stored in a self-referencing table, I would retrieve the tree using something like
select asrt2.function_id
from a_self_referencig_table asrt1,
a_self_referencig_table asrt2
where asrt1.function_name = 'blah function'
and asrt2.materialized_path like (asrt1.materialized_path || '%')
order by asrt2.materialized_path, asrt2.some_child_node_ordering_column
This would retrieve the whole tree in the proper order. What sucks is having to construct the materialized path based on the function_id and parent_function_id (or in your case, functionid and depfunctionid), but a trigger could take care of it quite easily.
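A minimal sketch of the materialized path lookup, using Python's built-in sqlite3 module as a stand-in. The functions table, its columns, the '/1/2/' path format and the sample rows are all assumptions made for illustration, and the secondary ordering column from the query above is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical self-referencing table; each path lists the ancestors down to the node.
conn.execute(
    "CREATE TABLE functions ("
    " function_id INTEGER PRIMARY KEY,"
    " function_name TEXT,"
    " materialized_path TEXT)"
)
conn.executemany(
    "INSERT INTO functions VALUES (?, ?, ?)",
    [
        (1, "root function", "/1/"),
        (2, "blah function", "/1/2/"),
        (3, "child a", "/1/2/3/"),
        (4, "child b", "/1/2/4/"),
        (5, "unrelated", "/1/5/"),
    ],
)

# Every node whose path starts with the path of 'blah function' is in its subtree.
subtree = conn.execute(
    "SELECT f2.function_id"
    " FROM functions AS f1, functions AS f2"
    " WHERE f1.function_name = ?"
    "   AND f2.materialized_path LIKE (f1.materialized_path || '%')"
    " ORDER BY f2.materialized_path",
    ("blah function",),
).fetchall()

subtree_ids = [function_id for (function_id,) in subtree]
print(subtree_ids)  # [2, 3, 4]
```

The trailing separator in each path matters: without it, a path such as '/1/2' would also match '/1/20'.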
