I need to query a Netezza data warehouse, and for the moment I've been told not to reference any table. I checked for the existence of a DUAL table and found online that there is a _v_dual table, similar to Oracle's DUAL.
Is it correct that I can query with _v_dual? Is it available by default in every Netezza database, or is there something else I have to do to get DUAL-like functionality when querying the Netezza data warehouse?
You do not need to use _v_dual at all. Simply perform a SELECT with no FROM clause. For example:
select current_timestamp, UPPER('abc');
TIMESTAMP | UPPER
---------------------+-------
2016-06-04 02:25:20 | ABC
(1 row)
That being said, _v_dual was added as a convenience for Oracle users, and it is available by default.
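If you do want the Oracle-style syntax, a query against _v_dual looks like this (a minimal sketch; _v_dual is a one-row system view, so the result matches the FROM-less form above):
-- _v_dual is a one-row view, so this returns exactly one row
select current_timestamp, UPPER('abc') from _v_dual;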
I have a local SQL Server DB table with about 5 million records.
I have a Snowflake server with a similar table that is updated daily.
I need to update my local table with the new records that are added on the Snowflake table.
This code works, but it takes about an hour to retrieve about 200,000 records. I insert the records into a local temp table and then insert them into my SQL Server DB.
Is there a faster way to retrieve the records from Snowflake and get them into SQL Server?
TIA
JohnB
SELECT A.*
into #Sale2020New
FROM OPENQUERY(SNOW, 'SELECT * FROM "DATA"."PUBLIC"."Sales" where "Sales"."Date" >= ''1/1/2020'' and "Sales"."Date" <= ''12/31/2020'' ') A
Left JOIN [SnowFlake].[dbo].Sale2020 B
ON B.PrimaryKey = A.PrimaryKey
WHERE
b.PrimaryKey IS NULL;
Does it take 1 hour just to retrieve the data from Snowflake, or for the whole process?
To speed up data retrieval from Snowflake, define a clustering key on the Date column of the Snowflake table. This prunes micro-partitions and avoids a full table scan. You can get more information on clustering here.
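A minimal sketch of defining the clustering key, assuming the table is the one referenced in the OPENQUERY above:
-- Run in Snowflake; reclustering happens in the background and
-- lets the optimizer prune micro-partitions on "Date" filters
ALTER TABLE "DATA"."PUBLIC"."Sales" CLUSTER BY ("Date");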
As for the delta load, instead of a join you can apply a filter on the Date column for the current date; this avoids a costly join operation and filters the data at the source.
SELECT * FROM "DATA"."PUBLIC"."Sales"
WHERE "Sales"."Date" = '2020-04-07'
If I have a table whose structure was updated (i.e. system.query_log after the latest update), but the distributed "view" somehow still has the old structure, how can I query the data in the new columns from the entire cluster?
What I meant:
If you have a distributed table, this can be done easily with:
select count(1) from distributed_query_log where event_date = '2019-01-24'
But select Settings.names, Settings.values from distributed_query_log where event_date = '2019-01-24' limit 1\G will fail, because the distributed table does not have those fields, while system.query_log does:
select Settings.names, Settings.values from system.query_log where event_date = '2019-01-24' limit 1\G
The cluster function was added in ClickHouse release 1.1.54362.
So you can do it like this:
select Settings.names, Settings.values from cluster('CLUSTER_TITLE', 'system.query_log') where event_date = '2019-01-24' limit 1\G
where CLUSTER_TITLE is the name of your cluster.
Thanks: Alexander Bocharov
In the general case: after changing the underlying table, you need to recreate (or alter) the Distributed table.
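A sketch of recreating the Distributed table so it picks up the new columns (CLUSTER_TITLE is a placeholder, and distributed_query_log is the table name used above):
DROP TABLE IF EXISTS distributed_query_log;
-- Copy the current structure of system.query_log and wrap it in the Distributed engine
CREATE TABLE distributed_query_log AS system.query_log
ENGINE = Distributed(CLUSTER_TITLE, system, query_log);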
I am currently using an IN clause on a varchar field. Will using CONTAINS with full-text search (FTS) help performance?
For example:
Select * from Orders where City IN ('London', 'New York')
vs
Select * from Orders where Contains(City, 'London or New York')
Thanks in advance.
Table Definition
CREATE TABLE Orders(ID INT PRIMARY KEY NOT NULL IDENTITY(1,1),City VARCHAR(100))
GO
INSERT INTO Orders
VALUES ('London'),('Newyork'),('Paris'),('Manchester')
,('Liverpool'),('Sheffield'),('Bolton')
GO
Create FTS on the City column using ID as the key.
I used SSMS to create the FTS index.
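For reference, the T-SQL equivalent is roughly the sketch below (the catalog name and the KEY INDEX name are assumptions; use the actual name of the primary key index on Orders):
CREATE FULLTEXT CATALOG OrdersCatalog AS DEFAULT;
GO
-- PK_Orders is a placeholder for the real primary key index name
CREATE FULLTEXT INDEX ON Orders(City)
    KEY INDEX PK_Orders
    ON OrdersCatalog
    WITH CHANGE_TRACKING AUTO;
GO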
Queries
-- Query 1
Select * from Orders
where City IN ('London' , 'NewYork')
GO
-- Query 2
Select * from Orders where
Contains (City, '"London" or "NewYork"')
GO
Execution Plans for both queries
As you can see, the query which used FTS cost three times more than the query which used the IN operator.
Having said this, when it comes to finding language-specific terms in SQL Server, FTS is the way to go, for example when looking for inflectional forms, synonyms, and much more. Read here for more information.
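A hypothetical illustration of those language features (the table and column names are made up, since inflectional forms and synonyms don't make much sense for city names):
-- Matches inflectional forms such as run, ran, running
SELECT * FROM Products WHERE CONTAINS(Description, 'FORMSOF(INFLECTIONAL, run)');
-- Matches synonyms defined for the word in the thesaurus file
SELECT * FROM Products WHERE CONTAINS(Description, 'FORMSOF(THESAURUS, car)');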
I have a simple model (table) in a PostgreSQL database and I want to replicate rows of this table to another database on another machine.
The replication should cover only some columns of this table, not all of them.
What is the solution?
If by replication you mean import, look into foreign data wrappers:
http://wiki.postgresql.org/wiki/Foreign_data_wrappers
One of them ought to do the trick.
If you truly mean replication, then… if the other DB is not using Postgres, you could imagine using the above and triggers to keep the changes in sync, assuming of course that Postgres remains the master. If it is using Postgres, there are plenty of additional options to choose from:
http://www.postgresql.org/docs/current/static/high-availability.html
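For example, with postgres_fdw (run on whichever database needs to read the data) you can expose just the columns you need from the remote table. This is only a sketch; the server, user, and table names below are all assumptions:
CREATE EXTENSION postgres_fdw;
-- Connection details for the other machine (placeholders)
CREATE SERVER remote_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '192.168.1.10', dbname 'remotedb', port '5432');
CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_srv
    OPTIONS (user 'replicator', password 'secret');
-- Declare only the columns you want to see locally
CREATE FOREIGN TABLE remote_subset (col1 integer, col2 text)
    SERVER remote_srv
    OPTIONS (schema_name 'public', table_name 'source_table');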
Take a look at the dblink module. You could try something like:
create extension dblink;
then once you have that installed:
select dblink_connect('myconn', 'hostaddr=127.0.0.1 port=5432 dbname=gis user=ubuntu password=ubuntu');
create table some_columns_table as select * from dblink('myconn', 'select col1, col2 from all_columns_table') AS t(col1 int, col2 text);
select dblink_disconnect('myconn');
I need to execute the following SQL (SQL Server 2008) periodically in a scheduled job. The query plan shows that 53% of the cost is a sort after the data is pulled from the Oracle server, even though I've already ordered the data in the OPENQUERY. How can I force the query not to sort when merge joining?
merge target as t
using (select * from openquery(oracle, '
select * from t1 where UpdateTime > ''....'' order by k1, k2')
) as s on s.k1=t.k1 and s.k2=t.K2 -- the clustered PK of "target" is K1,k2
when matched then ......
when not matched then ......
Is there something like BULK INSERT's "with (order( { column [ ASC | DESC ] } [ ,...n ] ))" option? If something like that exists, would it help improve the query plan of the MERGE statement?
If the Oracle table already has a PK on K1,K2, would it be better to just use oracle.db.owner.tablename directly instead of OPENQUERY? (Will SQL Server figure out the index from the Oracle metadata?)
Or is the best I can do to store the Oracle data in a local temp table and create a clustered primary key on K1,K2? I am trying to avoid creating a temp table because sometimes the returned OPENQUERY data set can be large.
I think a table is the best way to go because then you can create whatever indexes you need, but there's no reason why it should be temporary; why not create a permanent staging table? A local join using local indexes will probably be much more efficient than a join on the results of a remote query, although the only way to know for sure is to test it and see.
If you're worried about the large number of rows, you can look into only copying over new or changed rows. If the Oracle table already has columns for row creation and update times, that would be quite easy.
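A rough sketch of the permanent staging table, keyed the same way as the target so the MERGE can use a merge join without an extra sort (names and types are placeholders):
-- One-time setup
CREATE TABLE dbo.OracleStage (
    K1 int NOT NULL,
    K2 int NOT NULL,
    UpdateTime datetime NOT NULL,
    -- ...remaining columns from the Oracle table...
    CONSTRAINT PK_OracleStage PRIMARY KEY CLUSTERED (K1, K2)
);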
Alternatively, you could consider using SSIS instead of a scheduled job. I understand that if you're not already using SSIS you may not want to invest time in learning it, but it's a very powerful tool and it's designed for moving large amounts of data into MSSQL. You would create a package with the following workflow:
Delete existing rows from the staging table (only if you can't populate it incrementally)
Copy the data from Oracle
Execute the MERGE statement
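In T-SQL, the same three steps might boil down to the sketch below, reusing the staging table above (column lists are abbreviated and the OPENQUERY filter keeps the placeholder from the original post):
-- 1. Clear the staging table (skip this if you load incrementally)
TRUNCATE TABLE dbo.OracleStage;

-- 2. Pull the changed rows from Oracle into the locally indexed staging table
INSERT INTO dbo.OracleStage (K1, K2, UpdateTime /* , ... */)
SELECT K1, K2, UpdateTime /* , ... */
FROM OPENQUERY(oracle, 'select K1, K2, UpdateTime from t1 where UpdateTime > ''....''');

-- 3. Merge from the staging table; both sides are now clustered on (K1, K2)
MERGE target AS t
USING dbo.OracleStage AS s
    ON s.K1 = t.K1 AND s.K2 = t.K2
WHEN MATCHED THEN
    UPDATE SET t.UpdateTime = s.UpdateTime   -- placeholder update list
WHEN NOT MATCHED THEN
    INSERT (K1, K2, UpdateTime) VALUES (s.K1, s.K2, s.UpdateTime);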