Is there any way to easily count all tables' rows in TDengine?

I'm currently using Python to iterate over the table names and run
select count(*) from ag1807
for each one. Is there any way to get all tables' counts more quickly?

It seems that, up to now, TDengine still doesn't have any query syntax that can be used like a loop.
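One workaround, if the tables were created from a common super table (STable), is to let the server group by table name, so a single query returns one count per child table. A sketch, assuming a hypothetical super table named ag_stable (recent TDengine versions use PARTITION BY tbname in place of GROUP BY tbname):

select count(*) from ag_stable group by tbname;

This avoids one round trip per table from Python, since the server produces all counts in a single result set.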

Related

How to store 300M records in PostgreSQL to run efficient queries

I have the following table:
CREATE TABLE public.shop_prices
(
shop_name text COLLATE pg_catalog."default",
product text COLLATE pg_catalog."default",
product_category text COLLATE pg_catalog."default",
price text COLLATE pg_catalog."default"
)
For this table I have a dataset covering 18 months; each monthly file contains about 15M records. I have to do some analysis, such as finding the months in which a shop increased or decreased its prices. I imported two months into a table and ran the following query just to test:
select shop, product from shop_prices group by shop, product limit 10
I waited more than 5 minutes, but got no result or response; it was still working. What is the best way to store these datasets and run efficient queries? Is it a good idea to create a separate table for each dataset?
Using explain analyze select shop_name, product from shop_prices group by shop_name, product limit 10 you can see how Postgres plans and executes the query and how long the execution takes. You'll see that it needs to read the whole table (with time-consuming disk reads) and then sort it in memory - which will probably need to be cached on disk - before returning the results. On the next run you might discover that the same query is very snappy if the number of shop_name+product combinations is very limited and thus ends up in pg_stats after that explain analyze. The point being that a simple query like this can be deceiving.
You will get faster execution by creating an index on the columns you are using (create index shop_prices_shop_prod_idx on public.shop_prices(shop_name,product)).
You should definitely change the price column type to numeric (or float/float8) if you plan to do any numerical calculations on it.
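Assuming every stored value parses as a number, the standard PostgreSQL conversion is a single statement:

alter table public.shop_prices alter column price type numeric using price::numeric;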
Having said all that, I suspect this table is not what you will be using as it does not have any timestamp to compare prices between months to begin with.
I suggest you complete the table design and experiment with indexes to improve performance. You might even want to consider table partitioning: https://www.postgresql.org/docs/current/ddl-partitioning.html
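As a sketch only - the price_date column is an assumption, since the current table has no timestamp - a declarative monthly range partition could look like this:

create table public.shop_prices (
    shop_name        text,
    product          text,
    product_category text,
    price            numeric,
    price_date       date not null  -- hypothetical column, needed for partitioning
) partition by range (price_date);

create table shop_prices_2020_01 partition of public.shop_prices
    for values from ('2020-01-01') to ('2020-02-01');

Each month then lands in its own partition, and queries restricted to a date range only scan the partitions they need.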
You will probably be doing all sorts of queries on this data so there is no simple solution to them all.
By all means return with more specific questions, including a complete table description and the output of explain analyze for the queries you are trying out, and you'll get some good advice.
Best regards,
Bjarni
What is your PostgreSQL version?
First, there is a typo: the column shop should be shop_name.
Second, your query looks strange because it has only a LIMIT clause, without any ORDER BY or WHERE clause: do you really want "random" rows for this query?
Can you try to post EXPLAIN output for the SQL statement:
explain select shop_name, product from shop_prices group by shop_name, product limit 10;
Can you also check if any statistics have been computed for this table with:
select * from pg_stats where tablename='shop_prices';
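If that returns no rows, statistics have probably never been gathered; collecting them is one standard command:

analyze public.shop_prices;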

SQL Query select from multiple large tables, Execution Time too long

I have written a SQL query that does what I need (I think); however, it takes way too long to execute, as the query searches through every record and column, and there are a lot of them.
I have tried using joins, but my understanding of joins is limited and I did not manage to make it work.
I also tried adding a nested select statement in the WHERE clause, but that didn't seem to help, or I've done it incorrectly.
SELECT DISTINCT
    OCRD.[E_Mail],
    OCPR.[E_MailL],
    OCPR.[Name],
    OWHS.[WhsCode]
FROM OCRD, OCPR, OWHS
WHERE OWHS.WhsCode = 'zzdb';
If possible, I would like to check that the values in OCRD.[E_Mail] as well as OCPR.[E_MailL] are not duplicated, but are included from both tables.
I want the query to simply return names and emails where WhsCode = 'zzdb',
and not take an hour+ to execute.
Thank you, any help is appreciated.
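Note that the comma-separated FROM list above has no join conditions, so the three tables are cross-joined and the row count is the product of their sizes, which explains the runtime. A sketch of an explicit join - the relationships are assumptions (these look like SAP Business One tables, where OCPR usually links to OCRD via CardCode, and OCRD carries a default warehouse column DflWhs):

SELECT DISTINCT
    OCRD.[E_Mail],
    OCPR.[E_MailL],
    OCPR.[Name],
    OWHS.[WhsCode]
FROM OCRD
JOIN OCPR ON OCPR.[CardCode] = OCRD.[CardCode]  -- assumed relationship
JOIN OWHS ON OWHS.[WhsCode] = OCRD.[DflWhs]     -- assumed relationship
WHERE OWHS.[WhsCode] = 'zzdb';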

HANA SQL CTE WHERE CONDITION

I'm writing a scripted calculation view on HANA using SQL.
I'm looking for some higher-performance alternatives to the logic I have implemented with a while loop. A simplified version of the code is below.
It tries to find similar-looking vendors in table B for the vendors from table A.
Please bear with me regarding the inaccurate syntax.
v = select vendor, vendorname from A;
while --set a counter here
    vendorname = capture the record from v for row number represented by counter here
    t = select vendor, vendorname from v where (read single vendor from counter row)
        union all
        select vendor, vendorname from B where contains(vendorname, :vendorname, fuzzy(0.3))
        union all
        select vendor, vendorname from t
endwhile
This query dies when there are thousands of records in both tables. After reading a few blogs, I realized that I'm going in the wrong direction by using a loop.
To make this a little faster, I came across something called a CTE.
However, when I tried to implement the same code using a CTE, I wasn't allowed to do so.
The sample code I'm trying to write is below. Can anybody please help me get this right? The syntax is not accepted by the system.
t = with mytab ("Vendor", "VendorName")
AS ( select "Vendor", "VendorName" from "A" WHERE ( "Updated_Date" >= :From_Date AND "Updated_Date" <= :To_Date ) )
select * from "B" WHERE CONTAINS ("VendorName", mytab."VendorName",FUZZY(0.3));
The SQL error for this syntax is:
SQL: invalid identifier: MYTAB
I would like to know:
Whether such an operation is allowed with a CTE. If yes, what is the correct syntax in HANA SQL?
If no, how do I achieve the desired result without looping through one table?
Thanks,
Anup
CTEs are allowed in SAP HANA - you might want to check the HANA SQL reference if you're looking for the syntax.
But as you're in a SQLScript context anyhow, you might as well use table variables instead.
What I'm not sure about is what you are actually trying to do. Provide a description of your usage scenario, if possible.
Ok, based on your comments, the following approach could work for you.
Note, in my example I use a copy of the USERS system table, so you will have to fit the query to your tables.
do
begin
    declare user_names nvarchar(5000);

    -- collect all names to fuzzy-search for into one space-separated string
    select string_agg(user_name, ' ') into user_names
    from cusers
    where user_name in ('SYS', 'SYSTEM');

    -- a single CONTAINS call searches for all collected terms at once
    select *
    from cusers
    where contains (user_name, :user_names, fuzzy(0.3));
end;
What I do here is to get all the potential names for which I want to do a fuzzy lookup into a variable user_names (separated by a space). For this I use the STRING_AGG() aggregation function.
After the first statement is finished, :user_names contains SYSTEM SYS in my example.
Now, CONTAINS allows searching multiple columns for multiple search terms at once (you may want to re-check the reference documentation for the details here), so
CONTAINS (<column_name>, 'term1 term2 term3')
looks for all three terms in the column <column_name>.
With that we feed the string SYS SYSTEM into the second query and the CONTAINS clause.
That works fine for me, avoids a join and runs over the table to be searched only once.
BTW: no idea where you get that statement about table variables in read-only procedures from - it's wrong. Of course you can use table variables, in fact it's recommended to make use of them.
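A minimal sketch of a table variable, reusing the cusers copy from above: the result set is assigned to a name and read back with a leading colon, without ever being persisted.

do
begin
    -- assign a result set to a table variable
    candidates = select user_name from cusers where user_name like 'SYS%';
    -- read the table variable back with a leading colon
    select * from :candidates;
end;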

Oracle 11g: Index not used in "select distinct"-query

My question concerns Oracle 11g and the use of indexes in SQL queries.
In my database, there is a table that is structured as follows:
Table tab (
    rowid NUMBER(11),
    unique_id_string VARCHAR2(2000),
    year NUMBER(4),
    dynamic_col_1 NUMBER(11),
    dynamic_col_1_text NVARCHAR2(2000)
) TABLESPACE tabspace_data;
I have created two indexes:
CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;
The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:
SELECT distinct
"sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM
(
SELECT "tab".* FROM "tab"
where "tab".year = 2011
) "sub_select"
Unfortunately, the query needs around 1 hour to execute, although I created both indexes described above.
The explain plan shows that Oracle uses a "Table Full Access", i.e. a full table scan. Why is the index not used?
As an experiment, I tested the following SQL command:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
Even in this case, the index is not used and a full table scan is performed.
In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text".
The whole index file has a size of about 50 GB.
A few more details:
The database is Oracle 11g installed on my local computer.
I use Windows 7 Enterprise 64bit.
The whole index is split across 3 dbf files of about 50 GB total.
I would really be glad if someone could tell me how to make Oracle use the index in the first query.
Because the first query is used by another program to extract the data from the database, it can hardly be changed, so it would be better to tweak the table instead.
Thanks in advance.
[01.10.2011: UPDATE]
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan.
The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.
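In Oracle syntax, those two changes would look roughly like this (a sketch; it assumes no NULLs remain in either column):

-- fails if any row still holds NULL in these columns
ALTER TABLE tab MODIFY (dynamic_col_1 NOT NULL, dynamic_col_1_text NOT NULL);

CREATE INDEX index_year ON tab (year) TABLESPACE tabspace_index;

With both columns declared NOT NULL, every row is present in Index_dyn_col1, so Oracle can answer the DISTINCT query from the index alone instead of scanning the table.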
Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans are 20 times faster than reading an index. If tab has more than 5% of its data in 2011, it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, having year as the second column makes the index even slower.
If the index really is faster, than the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
Run an explain plan with and without an index hint (see the sketch after this list). Are the cardinalities off by 10x or more? Are the times off by 10x or more?
If the cardinality is off, make sure there are up to date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
If the time is off, check your workload statistics.
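The first check can be run as follows; the hint requests the Index_unique_id_year index from the question, and DBMS_XPLAN.DISPLAY shows the resulting plan:

EXPLAIN PLAN FOR
SELECT /*+ INDEX(tab index_unique_id_year) */ DISTINCT
       dynamic_col_1, dynamic_col_1_text
FROM tab
WHERE year = 2011;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- then repeat without the hint and compare cardinalities and costs

If the statistics look stale, they can be refreshed with DBMS_STATS.GATHER_TABLE_STATS(USER, 'TAB', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE).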
Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
Your index should be:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query, though, you could create it to include the two fetched columns as well; then the optimiser would not have to go to the table for the query data but could retrieve it directly from the index, making your query more efficient again.
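Such a covering index might look like this (the index name is hypothetical):

CREATE INDEX index_year_cols
ON tab (year, dynamic_col_1, dynamic_col_1_text)
TABLESPACE tabspace_index;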
Hope it helps...
I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.
Your second test query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.
I don't know if it's relevant, but I tested the following query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query show that Oracle uses an index scan in this scenario.
The columns dynamic_col_1 and dynamic_col_1_text are nullable. Does this have an effect on the usage of the index?
Try this:
1) Create an index on the year field (see Ollie's answer).
2) And then use this query:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year=2011)
or
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID IN (SELECT ID FROM tab WHERE year=2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.

Problem trying to fix SQL query to return a single result

I'm trying to use this query to delete the rows that are already on a linked server's Database:
GO
USE TAMSTest
GO
DELETE from [dbo].[Hour]
WHERE [dbo].[Hour].[InHour] = (SELECT [InHour]
FROM [TDG-MBL-005].[TAMSTEST].[dbo].[Hour])
GO
When there is only 1 row in the linked server's table, SELECT [InHour] FROM [TDG-MBL-005].[TAMSTEST].[dbo].[Hour] returns that single row and the DELETE works as expected. However, with multiple rows in the linked server's table it doesn't work, since that part of the query returns multiple rows as its result. How can I work around this?
If further information is needed, please ask; I need to get this done ASAP.
Thanks in advance,
Eton B.
Change your equals sign to an IN clause:
DELETE from [dbo].[Hour]
WHERE [dbo].[Hour].[InHour] IN (SELECT [InHour]
FROM [TDG-MBL-005].[TAMSTEST].[dbo].[Hour])
The IN clause allows you to have multiple values in your WHERE clause, and it can be used with subqueries as well.
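An equivalent formulation uses EXISTS, which can be friendlier to the optimizer when the subquery targets a linked server (a sketch against the same tables):

DELETE h
FROM [dbo].[Hour] AS h
WHERE EXISTS (SELECT 1
              FROM [TDG-MBL-005].[TAMSTEST].[dbo].[Hour] AS r
              WHERE r.[InHour] = h.[InHour]);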
