SQLite Row_Num/ID - database

I have a SQLite database that I'm trying to use data from, basically there are multiple sensors writing to the database. And I need to join one row to the proceeding row to calculate the value difference for that time period. But the only catch is the ROWID field in the database can't be used to join on anymore since there are more sensors beginning to write to the database.
In SQL Server it would be easy to use Row_Number and partition by sensor. I found this topic: How to use ROW_NUMBER in sqlite and implemented the suggestion:
select id, value ,
(select count(*) from data b where a.id >= b.id and b.value='yes') as cnt
from data a where a.value='yes';
It works but is very slow. Is there anything simple I'm missing? I've tried to join on the time difference possibly, create a view. Just at wits end! Thanks for any ideas!
Here is sample data:
ROWID - SensorID - Time - Value
1 2 1-1-2015 245
2 3 1-1-2015 4456
3 1 1-1-2015 52
4 2 2-1-2015 325
5 1 2-1-2015 76
6 3 2-1-2015 5154
I just need to join row 6 with row 2 and row 3 with row 5 and so forth based on the sensorID.

The subquery can be sped up with an index with the correct structure.
In this case, the column with the equality comparison must come first, and the one with unequality, second:
CREATE INDEX xxx ON MyTable(SensorID, Time);

Related

tableau custom sql pivot

I have a SQL Server data source that I cannot change the structure of.
I have started pivoting the data with CustomSQL query below, but I need to modify the query so that when iterations 3, 4 ,5 ...n data are added to the source in the future, it automatically includes it in the pivoted data. I don't want to have to keep updating the query. Any ideas?
KPI Name Iteration 1 Iteration 2
a 1 2
b 50 51
Select [KPI]
, 'Iteration1' as [Iteration]
, [Iteration1] as [Count]
From [MC_KPI]
Union ALL
Select [KPI]
, 'Iteration2' as [Iteration]
, [Iteration2] as [Count]
From [KPI]
Now I have this
KPI Name Iteration 1 Iteration 2
a 1 1
a 2 2
b 1 50
b 2 51
What you are doing is called "unpivot" in SQL Server. You can see the description here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
If you want to be able to add iterations without having to modify something in tableau, you can create a view in SQL that does unpivot and just do "select * from view" in tableau. This would give you the chance to change the view under the covers from tableau and have things continue to work (since the unpivot output is just a property bag and the columns are not really changing as you add properties into the output)

SSMS T-SQL set column for duplicate row number

I have a database with approximately 10 million rows (and 20 columns - about 4 GB) where about 10% of the rows have a duplicate column. Database is in SQL Server 2014 Express and using SSMS.
I created a new column CNT (int, null) to count the occurrences of each row where I have a duplicate ID. Desired result would look like:
ID CNT
100 1
100 2
101 1
102 1
102 2
103 1
104 1
Not being really familiar with advanced SQL capabilities I did some research and came up with using a CTE to set the CNT column. Worked fine on a small test table - but it was obvious this is not the way to go for a large table (I killed it after 5+ hours on a pretty decent system.)
Here's the code that I attempted to implement:
with CTE as
(select dbo.database.id, dbo.database.cnt,
RN = row_number() over (partition by id order by id)
from dbo.databasee)
update CTE set CNT = RN
Column ID is of type Int. All columns allow nulls - there are no keys or indexed columns.
Edit: Martin is right, I can only offer an alternate solution than the CTE at the moment. Make a new table exactly like your old one, and insert the old table's data into it with this.
INSERT INTO newTable
SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM oldTable;
Then you can delete your old table. Definitely not a perfect solution, but it should work.

How to (or Can I) select a random value from the Postgresql database excluding some particular records?

Is it possible to randomly select a record from the database excluding some records with particular status?
For eg,
For example, I have a table for storing employee details.
id employeename employeestatus
1 ab 1
2 cd 1
3 ef 2
4 gh 1
5 ij 1
What I want from the query is to fetch a single random record whose status is not 2. Is it possible to do so? The database I'm using is PostgreSQL 8.4.15.
TRY This
SELECT *
FROM employee
WHERE employeestatus != 2
ORDER BY RANDOM()
LIMIT 1
Try this other question on the same topic
Best way to select random rows PostgreSQL
It's tricker than you think (to do efficiently)

Is the 'BETWEEN' function very expensive in SQL Server?

I'm trying to join two relatively simple tables together, but my query is experiencing serious hangups. I'm not sure why, but I think it might have something to do with the 'between' function. My first table looks something like this (with a lot of other columns, but this would be the only column I'm pulling):
RowNumber
1
2
3
4
5
6
7
8
My second table "groups" my rows into "blocks", and has the following schema:
BlockID RowNumberStart RowNumberStop
1 1 3
2 4 7
3 8 8
The desired result I'm looking to get is to link the RowNumber with the BlockID like below, with the same number of rows with the first table. So the result would look like this:
RowNumber BlockID
1 1
2 1
3 1
4 2
5 2
6 2
7 2
8 3
In order to get that, I used the following query, writing the results into a temp table:
select A.RowNumber, B.BlockID
into TEMP_TABLE
from TABLE_1 A left join TABLE_2 B
on A.RowNumber between B.RowNumberStart and B.RowNumberStop
TABLE_1 and TABLE_2 are actually very large tables. Table 1 is about 122M Rows, and TABLE_2 is about 65M rows. In TABLE_1, the RowNumber is defined as a 'bigint', and in TABLE_2, the BlockID, RowNumberStart, and RowNumberStop are all defined as 'int'. Not sure that makes a difference, but just wanted include that information, too.
The query has now been hung up for eight hours. Similar queries on this type and volume of data are not taking anywhere near this long. So I'm wondering if it could be the 'between' statement that's hanging up this query.
Definitely would welcome any suggestions on how to make this more efficient.
BETWEEN is simply shorthand for :
select A.RowNumber, B.BlockID
into TEMP_TABLE
from TABLE_1 A left join TABLE_2 B
on A.RowNumber >= B.RowNumberStart AND A.RowNumber <= B.RowNumberStop
If execution plan goes from B to A (but left join would indicate it has to go from A to B, really), then I'm assuming TABLE_1 is indexed on RowNumber (and that should be covering on this query). If it's only got a clustered index on RowNumber and the table is very wide, I recommend a non-clustered index only on RowNumber, since you'll fit a lot more rows per page that way.
Otherwise, you want to index on TABLE_2 on RowNumberStart DESC or RowNumberStop ASC, because for given A you'd need a DESC on RowNumberStart to match.
I think you might want to change your join to an INNER JOIN, the way your join criteria is set up. (Are you ever going to get TABLE_1 in no block?)
If you look at your execution plan, you should get more clues as to why the performance might be bad, but the Stop criteria is probably not used on the seek into TABLE_1.
Unfortunately SQLMenace's answer about the SELECT INTO has been deleted. My comment regarding that was meant to be: #Martin SELECT INTO performance isn't as bad as it once was, but I still recommend a CREATE TABLE for most production because SELECT INTO will infer types and NULLability. This is fine if you verify it is doing what you think it is doing, but creating a super long varchar or a decimal column with very strange precision can result in not only odd tables, but performance issues (especially with some of those big varchars when you forget a LEFT or whatever). I think it just helps to make it clear what you are expecting the table to look like. Often I will SELECT INTO using WHERE 0 = 1 and check out the schema and then script it with my tweaks (like adding an IDENTITY or adding a column with a timestamp default).
You have one main problem: you want to display too much data volume at once. Ar you really sure you want handle the result of ALL 122M rows from table 1 at once? Do you really need that?

MS Access row number, specify an index

Is there a way in MS access to return a dataset between a specific index?
So lets say my dataset is:
rank | first_name | age
1 Max 23
2 Bob 40
3 Sid 25
4 Billy 18
5 Sally 19
But I only want to return those records between 'rank' 2 and 4, so my results set is Bob, Sid and Billy? However, Rank is not part of the table, and this should be generated when the query is run. Why don't I use an autogenerated number, because if a record is deleted, this will be inconsistent, and what if I wanted the results in reverse!
This obviously very simple, and the reason I ask is because I am working on a product catalogue and I am looking for a more efficient way of paging through the returned dataset, so if I only return 1 page worth of data from the database this is obviously going to be quicker then return a complete set of 3000 records and then having to subselect from that set!
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after comment, that rank is not existing in structure:
Select top 100 * from table;
And if you want to choose subsequent results, you can choose the ID of the last record from the first query, say it was ID 101, and use a WHERE clause to get the next 100;
Select top 100 * from table where ID > 100;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performace will start to noticeably degrade beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset then finding the page (then perhaps fabricate smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to only fetch the number of rows for the page.
Unfortunately the LIMIT keyword isn't available in MS Access -- that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASCENDING
If I understand you correctly, there is ionly first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query. Unless you do something like
SELECT * FROM Table
WHERE FirstName = 'Bob'
OR FirstName = 'Sid'
OR FirstName = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto incrementing integer (see Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you are running an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how you delete 1 row without deleting the other. If you had another auto incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment, If you add an autoincrement field, and want to read 3 rows, and you know the ID of the first row you want to read, then you can use "TOP" to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1 Max 23
2 Bob 40
6 Sid 25
8 Billy 18
15 Sally 19
You can wuery Bob, Sid and Billy with the following QUERY.
SELECT TOP 3 FirstName, Age
From Table
WHERE ID >= 2
ORDER BY ID

Resources