The amount of data in the super table is too large to scan; is there any way to count the number of rows? The table INFORMATION_SCHEMA.INS_STABLES does not exist.
taos> select count(*) from s_device ; DB error: Query killed (4328.710142s)
I want to count the number of rows in the super table.
I have a SQL Server table with approximately 900 million rows and an auto-increment Id column. My goal is to update the table 40,000 rows at a time: fetch the first 40,000 rows, generate the records for them by calling an API, and update them in the table. Then take the next 40,000 rows starting from Id 40001, generate the records, and store them in the table.
For this, I'm creating a temp table, inserting 40,000 records into it from the target table, processing them, and updating the target table. In the next iteration I truncate the temp table, take the next 40,000 rows from the target table, insert them into the temp table, and process them.
I need the temp table because I want to get the max Id from it, so that in the next iteration I can select rows from the target table whose Id is greater than that max Id.
Is there any better process to do it?
How about simply using your ID column to limit the range?
SELECT TOP (40000)
*
FROM [table]
WHERE id > @id
ORDER BY id ASC;
Then loop through in your code, making the start id the last id returned by the prior select.
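A minimal sketch of that loop, done entirely in T-SQL for illustration (dbo.BigTable, the payload column, and the #Chunk holding table are placeholder names, not from the question):
CREATE TABLE #Chunk (id bigint PRIMARY KEY, payload nvarchar(max));

DECLARE @LastId bigint = 0;

WHILE 1 = 1
BEGIN
    TRUNCATE TABLE #Chunk;

    INSERT INTO #Chunk (id, payload)
    SELECT TOP (40000) id, payload
    FROM dbo.BigTable              -- placeholder table name
    WHERE id > @LastId
    ORDER BY id;

    IF @@ROWCOUNT = 0 BREAK;       -- nothing left to fetch

    -- ... process the rows in #Chunk here (call the API, write results back to dbo.BigTable) ...

    SELECT @LastId = MAX(id) FROM #Chunk;   -- the start id for the next batch
END;

DROP TABLE #Chunk;
Because each batch seeks on id > @LastId against an indexed column, the cost per batch stays roughly constant instead of growing as you move through the table.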
I'm using SQL Server Export Wizard to migrate 2 million rows over to a Postgres database. After 10 hours, I got to 1.5 million records and it quit. Argh.
So I'm thinking the safest way to get this done is to do it in batches. 100k rows at a time. But how do you do that?
Conceptually, I'm thinking:
SELECT * FROM invoices WHERE RowNum BETWEEN 300001 AND 400000
But RowNum doesn't exist, right? Do I need to create a new column and somehow get a +1 incremental ID in there that I can use in a statement like this? There is no primary key and there are no columns with unique values.
Thanks!
The rows are invoices, so I created a new variable 'Quartile' that divides the invoice dollar values into quartiles using:
SELECT *,
NTILE(4) OVER(ORDER BY TOTAL_USD) AS QUARTILE
INTO invoices2
FROM invoices
This created four groups of 500k rows each. Then in the export wizard, I asked to:
SELECT * FROM invoices2 WHERE QUARTILE = 1 -- (or 2, 3, 4 etc)
And I'm going to send each group of 500k rows to its own Postgres table and then merge them back together in pgAdmin. That way, if any one crashes, I can just redo that smaller grouping without affecting the integrity of the others. Does that make sense? Maybe it would have been just as easy to create an incrementing primary key?
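For what it's worth, a sketch of that incrementing-key alternative (RowId is an assumed column name): add an identity column to the staging copy and let each wizard run pull one fixed range.
-- Add a sequential key to the staging copy of the table
ALTER TABLE invoices2 ADD RowId int IDENTITY(1,1) NOT NULL;

-- Each wizard run then pulls one fixed slice
-- (exclude the helper columns when mapping to the Postgres target)
SELECT * FROM invoices2 WHERE RowId BETWEEN 1       AND 500000;
SELECT * FROM invoices2 WHERE RowId BETWEEN 500001  AND 1000000;
SELECT * FROM invoices2 WHERE RowId BETWEEN 1000001 AND 1500000;
SELECT * FROM invoices2 WHERE RowId BETWEEN 1500001 AND 2000000;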
Update:
All four batches transferred successfully. Worth noting that the transfer was about 4x faster when sending the 2M rows as four simultaneous batches of 500k: 4 hours instead of 16! I combined them back into a single table using the following query in pgAdmin:
--Combine tables back into one, total row count matches original
SELECT * INTO invoices_all FROM (
SELECT * FROM quar1
UNION ALL
SELECT * FROM quar2
UNION ALL
SELECT * FROM quar3
UNION ALL
SELECT * FROM quar4
) AS tmp
And I checked the sums of all variables that had to be converted from SQL Server "money" to Postgres "numeric":
--All numeric sums match those from original
SELECT SUM("TOTAL_BEFORE_TP_TAX")
,SUM("TP_SELLER")
,SUM("TOTAL_BEFORE_TAX")
,SUM("TAX_TOTAL")
,SUM("TOTAL")
,SUM("TOTAL_BEFORE_TP_TAX_USD")
,SUM("TP_SELLER_USD")
,SUM("TOTAL_BEFORE_TAX_USD")
,SUM("TAX_TOTAL_USD")
,SUM("TOTAL_USD")
FROM PUBLIC.invoices_all
We are logging real-time data every second to a SQL Server database, and we want to generate charts from 10 million rows or more. At the moment we use something like the code below. The goal is to get at least 1000-2000 values to pass to the chart.
In the query below, we average groups of every n rows, where n depends on how many rows the date range selects from LargeTable. This works fine up to about 200,000 selected rows, but beyond that it is far too slow.
SELECT
    AVG(X),
    AVG(Y)
FROM
    (SELECT
        X, Y,
        (Id / @AvgCount) AS [Group]
     FROM
        [LargeTable]
     WHERE
        Timestmp > @From
        AND Timestmp < @Till) j
GROUP BY
    [Group]
ORDER BY
    [Group];
Now we have tried selecting only every n-th row from LargeTable and then averaging that data to get more performance, but it takes nearly the same time.
SELECT
    X, Y
FROM
    (SELECT
        X, Y,
        ROW_NUMBER() OVER (ORDER BY Id) AS rownr
     FROM
        LargeTable
     WHERE
        Timestmp >= @From
        AND Timestmp <= @Till) a
WHERE
    a.rownr % (@count / 10000) = 0;
It is only pseudo code! We have indexes on all relevant columns.
Are there better and faster ways to get chart data?
I can think of two approaches to improving the performance of the charts:
Trying to improve the performance of the queries.
Reducing the amount of data needed to be read.
It's almost impossible for me to improve the performance of the queries without the full DDL and execution plans, so I'm suggesting that you reduce the amount of data to be read.
The key is summarizing groups at a given granularity level as the data arrives, and storing the summaries in a separate table like the following:
CREATE TABLE SummarizedData
(
    GroupId int PRIMARY KEY,
    FromDate datetime,
    ToDate datetime,
    SumX float,
    SumY float,
    GroupCount int
);
GroupId should be equal to Id/100 or Id/1000, depending on how much granularity you want in the groups. With larger groups you get coarser granularity but more efficient charts.
I'm assuming the LargeTable Id column increases monotonically, so you can store the last Id that has been processed in another table, called SummaryProcessExecutions.
You would need a stored procedure ExecuteSummaryProcess that:
Reads the last processed Id from SummaryProcessExecutions into a @LastProcessedId variable
Reads the last Id in LargeTable and stores it in a @NewLastProcessedId variable
Summarizes all rows from LargeTable with Id > @LastProcessedId and Id <= @NewLastProcessedId and stores the results in the SummarizedData table
Stores @NewLastProcessedId in the SummaryProcessExecutions table
You can execute the ExecuteSummaryProcess stored procedure frequently from a SQL Server Agent job.
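A minimal sketch of such a procedure, assuming the SummarizedData table above, Id/1000 groups, and a one-row SummaryProcessExecutions table with a LastProcessedId column (the column name is an assumption, this is only an illustration of the steps):
CREATE PROCEDURE ExecuteSummaryProcess
AS
BEGIN
    DECLARE @LastProcessedId bigint, @NewLastProcessedId bigint;

    SELECT @LastProcessedId = LastProcessedId FROM SummaryProcessExecutions;
    SELECT @NewLastProcessedId = MAX(Id) FROM LargeTable;

    -- Only summarize complete groups, so each GroupId is inserted exactly once
    SET @NewLastProcessedId = (@NewLastProcessedId / 1000) * 1000 - 1;

    IF @NewLastProcessedId IS NULL OR @NewLastProcessedId <= @LastProcessedId
        RETURN;   -- nothing new to summarize yet

    INSERT INTO SummarizedData (GroupId, FromDate, ToDate, SumX, SumY, GroupCount)
    SELECT Id / 1000,
           MIN(Timestmp),
           MAX(Timestmp),
           SUM(X),
           SUM(Y),
           COUNT(*)
    FROM LargeTable
    WHERE Id > @LastProcessedId
      AND Id <= @NewLastProcessedId
    GROUP BY Id / 1000;

    UPDATE SummaryProcessExecutions SET LastProcessedId = @NewLastProcessedId;
END;
The chart then reads SumX/GroupCount and SumY/GroupCount from SummarizedData instead of scanning LargeTable.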
I believe that grouping by date would be a better choice than grouping by Id. It would simplify things: the SummarizedData GroupId column would not be related to the LargeTable Id, and you would only ever need to insert rows into SummarizedData, never update them.
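A sketch of what that date-based grouping might look like, bucketing to the minute (the epoch constant and the way the watermark is derived are assumptions, not from the answer):
-- Summarize every complete minute that is not yet in SummarizedData
DECLARE @UpToMinute datetime = DATEADD(minute, DATEDIFF(minute, 0, GETDATE()), 0);  -- start of the current, still-incomplete minute

INSERT INTO SummarizedData (GroupId, FromDate, ToDate, SumX, SumY, GroupCount)
SELECT DATEDIFF(minute, '20000101', Timestmp),   -- minutes since a fixed epoch as the group key
       MIN(Timestmp),
       MAX(Timestmp),
       SUM(X),
       SUM(Y),
       COUNT(*)
FROM LargeTable
WHERE Timestmp >= ISNULL((SELECT DATEADD(minute, MAX(GroupId) + 1, '20000101') FROM SummarizedData), '19000101')
  AND Timestmp < @UpToMinute
GROUP BY DATEDIFF(minute, '20000101', Timestmp);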
Since the time to scan the table increases with the number of rows in it, I assume there is no index on the Timestmp column. An index like the one below may speed up your query:
CREATE NONCLUSTERED INDEX [IDX_Timestmp] ON [LargeTable](Timestmp) INCLUDE(X, Y, Id)
Please note that creating such an index may take a significant amount of time, and it will impact your inserts too.
I have a table with the following values:
location date count
2150 4/5/14 100
Now I need to insert 100 rows into another table. The table should have 100 rows of:
location date
2150 4/5/14
Help me in achieving this. My database is Netezza.
Netezza has a system view, _v_vector_idx, with 1024 rows, each carrying an idx value from 0 to 1023. You can exploit this to generate an arbitrary number of rows by joining to it. Note that this approach requires you to have determined some reasonable upper limit, so you know whether a single copy of _v_vector_idx is enough or how many times you need to join it to itself.
INSERT INTO target_table
SELECT a.location,
       a.date
FROM base_table a
JOIN _v_vector_idx b
  ON b.idx < 100;
Then if you want to drive it based on the third column from the base_table, you could do this:
INSERT INTO target_table
SELECT location,
DATE
FROM base_table a
JOIN _v_vector_idx b
ON b.idx < a.count;
One could also take a procedural approach and create a stored procedure if you don't have a feel for what that reasonable upper limit might be.
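A related set-based workaround (not from the original answer, just a sketch) for when the count can exceed the 1024 rows of a single _v_vector_idx is to cross-join the view to itself, which generates up to 1024 * 1024 distinct idx values:
INSERT INTO target_table
SELECT a.location,
       a.date
FROM base_table a
JOIN (SELECT v1.idx * 1024 + v2.idx AS idx    -- 0 .. 1048575, each combination unique
      FROM _v_vector_idx v1
      CROSS JOIN _v_vector_idx v2) b
  ON b.idx < a.count;
Because every v1.idx * 1024 + v2.idx value is distinct, the join still produces exactly a.count rows per base row.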
I have a system that needs to suck up an entire MS SQL database. Currently it does so with something like:
select top 1000 * from [table] where id > 0 order by id;
Then, for the next chunk:
select top 1000 * from [table] where id > 1000 order by id;
And then:
select top 1000 * from [table] where id > 2000 order by id;
And so forth.
In MySQL, I've learned that doing LIMIT and OFFSET queries is brutally slow because the database has to first sort the results, then scan over the OFFSET count. When that count gets big, life starts to suck as the read count skyrockets.
My question is this: does the same problem apply to TOP? Put another way, can I expect a really high read count when I run these queries on a database with, say 10,000,000 records, at the time when id > 9,999,000? If so, are there any ways to handle this better?
It will be very fast if ID is indexed. If that column is not indexed, it would cause a full table scan.
I would suggest the following in addition:
select * from [table] where id > 0 and id <= 1000 order by id;
This way, even if the ids have gaps (you don't have every id), you don't get bugs such as duplicates between chunks.
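A minimal sketch of driving those fixed ranges from T-SQL (the [table] name and chunk size are placeholders; the same loop could just as well live in application code):
DECLARE @ChunkSize int = 1000;
DECLARE @Low bigint = 0;
DECLARE @MaxId bigint = (SELECT MAX(id) FROM [table]);

WHILE @Low < @MaxId
BEGIN
    -- Chunks may return fewer than @ChunkSize rows if ids have gaps; that is fine
    SELECT *
    FROM [table]
    WHERE id > @Low AND id <= @Low + @ChunkSize
    ORDER BY id;

    -- ... consume this chunk ...

    SET @Low = @Low + @ChunkSize;
END;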