SQLite GROUP_BY doesn't maintain data order - database

If I have names like this in database
A
C
D
B
If I use group by keyword why sqlite give data like this
A
B
C
D
I mean why does it return data sequentially ?? I don't need to maintain sequence. is there any way to maintain original format by using group by

To expand on TokenMacGuy's answer:
A real database can have hundreds of tables, each with thousands of rows, each having a variable number of rows. The database software (MySQL, SQL Server, etc.) manages where and how each of these items are stored in memory. Let's say that there is a table whose rows are stored in three different places in the hard drive. When you do a SELECT statement on that table, and you don't have ORDER BY, the database software grabs the appropriate rows and shoves them into the output.
SQLite is different - the file is generally smaller, generally local. When you do a SELECT statement on a table in SQLite, the information is likely stored in the same place. To quote the sqlite language specification,
"If a SELECT statement that returns more than one row does not have an ORDER BY clause, the order in which the rows are returned is undefined." Notice that the language specification does not guarantee that the rows won't be in order.
Long story short, the GROUP BY only changes which rows show, not what order those rows are returned in. Try creating multiple tables, and inserting to them in different orders, at different times, then running your SELECT again. It will likely not be in order this time.

Rows in databases are not in any particular order, and they are not returned in any particular order.
That is, unless you provide an order by clause in your query.

Related

SQL Server query taking long time to execute

I have the below query. It currently takes 5.5 hours to execute with the results of 5,838 rows. The whole thing runs in less than 2 minutes if I remove the timestamp limiter and just run it against all the results of the views.
I am looking for a way to make this execute faster when trying to limit the time/date range and am open to any suggestions. I can go into more detail about the datasets if needed.
SELECT
m.rpp
,m.SourceName
,m.ckt AS l_ckt
,m.amp AS l_amp
,MAX(m.reading) AS l_reading
,k.ckt AS r_ckt
,k.amp AS r_amp
,MAX(k.reading) AS r_reading
FROM
vRPPPanelLeft m
INNER JOIN
vRPPPanelRight k ON (m.sourcename = k.sourcename)
AND (CAST(m.ckt AS int) + 1 = CAST(k.ckt AS int))
WHERE
m.timestampserverlocal BETWEEN '2020-10-10' AND '2020-10-12'
AND k.timestampserverlocal BETWEEN '2020-10-10' AND '2020-10-11'
GROUP BY
m.rpp, m.sourcename, m.ckt, m.amp, k.ckt, k.amp
I'm assuming m and k are views, and potentially rather complex ones and/or having GROUP BY statements.
I'm guessing that when you look at the execution plans and/or statistics, it will actually be running at least one of the views many times - e.g., one for each row in the other view.
Quick fix
If you are able to (e.g., it's in a stored procedure) I suggest the following process
Creating temporary tables with the same structures as m and k (or with relevant columns for this purpose). Include columns for the modified versions of ckt (e.g., ints for both). Consider a PK (or at least clustered index) of sourcename and ckt (the int) for both temporary tables - it may help, or may not.
Run m and k views independently, storing results in these temporary tables. Use filtering (e.g., WHERE clause on timestampserverlocal) and potentially the relevant GROUP BY clauses to reduce the number of rows created.
Run the original SQL but using the temporary tables rather than the views, and without the need for the WHERE clause. It may still need the GROUP BY.
Longer fix
I suggest doing the quick fix first to confirm that the running-views-many-times is the problem.
If it does (and the issue is that the views are being run many times) one longer fix is to stop using the views in the FROMs, and instead putting it all together in one query and then trying to simplify the code.

SQL Server : Tables vs Cursors

I'm asking for a high level understanding of what these two things are.
From what I've read, it seems that in general, a query with an ORDER BY clause returns a cursor, and basically cursors have order to them whereas tables are literally a set where order is not guaranteed.
What I don't really understand is, why are these two things talked about like two separate animals. To me, it seems like cursors are a subset of tables. The book I'm reading vaguely mentioned that
"Some language elements and operations in SQL expect to work with
table results of queries and not with cursors; examples include table
expressions and set operators"
My question would be... why not? Why won't SQL handle it like a table anyways even if it's given an ordered set?
Just to clarify, I will type out the paragraph from the book:
A query with an ORDER BY clause results in what standard SQL calls a cursor - a nonrelational result with order guaranteed among rows. You're probably wondering why it matters whether a query returns a table result or a cursor. Some language elements and operations in SQL expect to work with table results of queries and not with cursors; examples include table expressions and set operators..."
A table is a result set. It has columns and rows. You can join to it with other tables to either filter or combine the data in ONE operation:
SELECT *
FROM TABLE1 T1
JOIN TABLE2 T2
ON T1.PK = T2.PK
A cursor is a variable that stores a result set. It has columns, but the rows are inaccessible - except the top one! You can't access the records directly, rather you must fetch them ONE ROW AT A TIME.
DECLARE TESTCURSOR CURSOR
FOR SELECT * FROM Table1
OPEN TESTCURSOR
FETCH NEXT FROM TESTCURSOR
You can also fetch them into variables, if needed, for more advanced processing.
Please let me know if that doesn't clarify it for you.
With regard to this sentence,
"Some language elements and operations in SQL expect to work with
table results of queries and not with cursors; examples include table
expressions and set operators"
I think the author is just saying that there are cases where it doesn't make sense to use an ORDER BY in a fragment of a query, because the ORDER BY should be on the outer query, where it will actually affect the final result of the query.
For instance, I can't think of any point in putting an ORDER BY on a CTE ("table expression") or on the Subquery in an IN( ) expression. UNLESS (in both cases) a TOP n was used as well.
When you create a VIEW, SQL Server will actually not allow you to use an ORDER BY unless a TOP n is also used. Otherwise the ORDER BY should be specified when Selecting from the VIEW, not in the code of the VIEW itself.

retrieve data from multiple tables and store them in one table confronted the duplicates

I'm working on some critical data retrieving report tasks and find some difficulties to
proceed. Basically, it's belonging to medical area and the whole data is distributed in
several tables and I can't change the architecture of database tables design. In order to
finish my report, I need the following steps:
1- divide the whole report to several parts, for each parts retrieve data by using
several joins. (like for part A can be retrieved by this:
select a1.field1, a2.field2 from a1 left join a2 on a1.fieldA= a2.fieldA ) then I can
got all the data from part A.
2- the same things happened for part B
select b1.field1, b2.field2 from b1 left join b2 on b1.fieldB= b2.fieldB, then I also
get all the data from part B.
3- same case for part C, part D.....and so on.
The reason I divide them is that for each part I need to have more than 8 joins (medical data is always complex) so I can't finish all of them within a single join (with more than 50 joins which is impossible to finish...)
After that, I run my Spring Batch program to insert all the data from part A and data from part b, part c.. into one table as my final report table. The problem is not every part will have same number of rows which means part A may return 10 rows while part b may return 20 rows. Since the time condition for each part is the same (1 day) and can't be changed so just wondering how can I store all these different part of data into one table with minimum overhead. I don't want to have to many duplicates, thanks for the great help.
Lei
Looks to me like what you need are joins over the "data from part A", "data from part B" & "data from part C". Lets call them da, db & dc.
It's perfectly alright that num rows in da/b/c are different. But as you're trying to put them all in a single table at the end, obviously there is some relation between them. Without better description of that relation it's not possible to provide a more concrete answer. So I'll just write my thoughts, which you might already know, but anyway...
Simplest way is to join results from your 3 [inner] queries in a higher level [outer] query.
select j.x, j.y, j.z
from (
' da join db join dc
) j;
If this is not possible (due to way too many joins as you said) then try one of these:
Create 3 separate materialized views (one each for da, db & dc) and perform the join these views. Materialized is optional (i.e. you can use the "normal" views too), but it should improve the performance greatly if available in your DB.
First run queries for da/b/c, fetch the data and put this data in intermediate tables. Run a join on those tables.
PS: If you want to run reports (many/frequent/large size) on some data then that data should be designed appropriately, else you'll run into heap of trouble in future.
If you want something more concrete, please post the relationship between da/b/c.

how in clause in SQL server works

I would like to know how comparisons for IN clause in a DB work. In this case, I am interested in SQL server and Oracle.
I thought of two comparison models - binary search, and hashing. Can someone tell me what method does SQL server follow.
SQL Server's IN clause is basically shorthand for a wordier WHERE clause.
...WHERE column IN (1,2,3,4)
is shorthand for
...WHERE Column = 1
OR Column = 2
OR column = 3
OR column = 4
AFAIK there is no other logic applied that would be different from a standard WHERE clause.
It depends on the query plan the optimizer chooses.
If there is a unique index on the column you're comparing against and you are providing relatively few values in the IN list in comparison to the number of rows in the table, it's likely that the optimizer would choose to probe the index to find out the handful of rows in the table that needed to be examined. If, on the other hand, the IN clause is a query that returns a relatively large number of rows in comparison to the number of rows in the table, it is likely that the optimizer would choose to do some sort of join using one of the many join methods the database engine understands. If the IN list is relatively non-selective (i.e. something like GENDER IN ('Male','Female')), the optimizer may choose to do a simple string comparison for each row as a final processing step.
And, of course, different versions of each database with different statistics may choose different query plans that result in different algorithms to evaluate the same IN list.
IN is the same as EXISTS in SQL Server usually. They will give a similar plan.
Saying that, IN is shorthand for OR..OR as JNK mentioned.
For more than you possibly ever needed to know, see Quassnoi's blog entry
FYI: The OR shorthand leads to another important difference NOT IN is very different to NOT EXISTS/OUTER JOIN: NOT IN fails on NULLs in the list

SQL Server: Persistant "Cache" Table Across Connections

I have a set of approx 1 million rows (approx rowsize: 1.5kb) that needs to be "cached" so that many different parts of our application can utilize it.
These rows are a derived/denormalized "view" of compiled data from other tables. Generating this data isn't terribly expensive (30-60sec) but is far too slow to generate "on the fly" as part of a view or table-valued function that the application can query directly. I want to update this data periodically, perhaps every few minutes.
My first thought is to have a scheduled job that updates a global temp table with this data every n minutes.
What's the best strategy, performance-wise? I'm not sure of the performance implications of storing it in a real table versus a global temp table (##tablename) versus other strategies I haven't thought of. I don't want to muck up the transaction logs with inserts to this table... it's all derived data and doesn't need to be persisted.
I'm using Microsoft SQL Server 2000. Upgrading during the timeframe of this project isn't an option, but if there's functionality in 2005/2008/2010 that would make this easier, I'd appreciate hearing about that.
I'd recommend using a materialized view (AKA indexed view).
Limitations:
View definition must always return the same results from the same underlying data.
Views cannot use non-deterministic functions.
The first index on a View must be a clustered, UNIQUE index.
If you use Group By, you must include the new COUNT_BIG(*) in the select list.
View definition cannot contain the following:
TOP
Text, ntext or image columns
DISTINCT
MIN, MAX, COUNT, STDEV, VARIANCE, AVG
SUM on a nullable expression
A derived table
Rowset function
Another view
UNION
Subqueries, outer joins, self joins
Full-text predicates like CONTAIN or FREETEXT
COMPUTE or COMPUTE BY
Cannot include order by in view definition

Resources