Group polygons with Postgis (beginner) - postgis

Good morning all, I'm trying to group the polygons that touch each other into one polygon.
I use the following formula:
drop table if exists filtre4;
create table filtre4 as
(
select st_unaryunion(unnest(st_clusterintersecting(geom))) as geom
from data
)
It works perfectly when I have fewer than 6,000,000 items.
example: I have this normal message that appears, with the number of entities created.
https://zupimages.net/viewer.php?id=20/15/ielc.png
But if I exceed 6,000,000 entities, the query finishes but no rows are created in the table. This message is displayed, but nothing is returned:
https://zupimages.net/viewer.php?id=20/15/o41z.png
I do not understand.
Thank you.

I think you were using PgAdmin to run the queries. Oddly enough, sometimes you will not be notified even if there is a memory error or another runtime error. (The same happened to me while testing.) In such a case I would recommend saving the query as an SQL file and running it with psql to make sure you receive an error message:
psql -U #your_username -d #your_database -f "#your_sqlfile.sql"
The first thing I would do is adjust "work_mem" in postgresql.conf. The default is 4MB; given your specs, you can probably afford more memory per operation. I would suggest 64MB to start, per the following article:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Make sure to reload the configuration (or restart your server) after doing so.
I tested with a comparable data set and also ran into memory issues. Adjusting work_mem was the first part; geohashing was the second. Geohashing applies to your data set because you want smaller groups of clusters so that the processing fits in memory, and it lets you order your geometries spatially, which reduces the amount of sorting needed while running ST_CLUSTERINTERSECTING. (From what I understand, you don't have attributes to group on.) Here is what the following example does:
Creates the output table, or truncates it if it exists; creates a sequence, or resets it if it exists.
"ordered": pulls the geometry from the input table and orders it by its geohash (the geometry must be in degree units, such as EPSG 4326, in order to geohash).
"grouped": uses the sequence to put the data into groups. I divide by 10,000 here, so your total number of entities divided by 10,000 gives the number of groups. Try making the groups small enough to fit in memory but large enough to be performant. For each group, it performs ST_CLUSTERINTERSECTING, unnest, and finally ST_UNARYUNION.
Inserts the result using ST_COLLECT and another ST_UNARYUNION of the geometries.
Here's the code:
DO $$
DECLARE
    input_table    VARCHAR(50) := 'valid_geom';
    input_geometry VARCHAR(50) := 'geom_good';
    output_table   VARCHAR(50) := 'unary_output';
    sequence_name  VARCHAR(50) := 'bseq';
BEGIN
    -- Create the output table, or empty it if it already exists.
    IF NOT EXISTS (SELECT 0 FROM pg_class where relname = format('%s', output_table))
    THEN
        EXECUTE '
            CREATE TABLE ' || quote_ident(output_table) || '(
                geom geometry NOT NULL)';
    ELSE
        EXECUTE '
            TRUNCATE TABLE ' || quote_ident(output_table);
    END IF;

    -- Reset the sequence if it exists, otherwise create it.
    IF EXISTS (SELECT 0 FROM pg_class where relname = format('%s', sequence_name))
    THEN
        EXECUTE '
            ALTER SEQUENCE ' || quote_ident(sequence_name) || ' RESTART';
    ELSE
        EXECUTE '
            CREATE SEQUENCE ' || quote_ident(sequence_name);
    END IF;

    -- Order spatially, cluster and union within groups of 10,000, then union the groups.
    EXECUTE '
        WITH ordered AS (
            SELECT ' || quote_ident(input_geometry) || ' as geom
            FROM ' || quote_ident(input_table) || '
            ORDER BY ST_GeoHash(' || quote_ident(input_geometry) || ')
        ),
        grouped AS (
            SELECT nextval(' || quote_literal(sequence_name) || ') / 10000 AS id,
                ST_UNARYUNION(unnest(ST_CLUSTERINTERSECTING(geom))) AS geom
            FROM ordered
            GROUP BY id
        )
        INSERT INTO ' || quote_ident(output_table) || '
        SELECT ST_UNARYUNION(ST_COLLECT(geom)) as geom FROM grouped';
END;
$$;
Caveats:
Change the declare variables to your needs.
Since your input geometry column is already named 'geom', the alias as geom can fail, so I would change SELECT ' || quote_ident(input_geometry) || ' as geom to SELECT ' || quote_ident(input_geometry).
Make sure all of your input geometries are valid, or ST_UNARYUNION will fail. Check out ST_ISVALID and ST_MAKEVALID.
As said before, geohashing requires the projection to be in degree units. Check out ST_TRANSFORM (I transformed my geometry data to 4326).
Let me know if you have any more questions.
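To make the "cluster in small groups, then union again" idea concrete, here is a minimal Python sketch. It is only an analogy, not PostGIS code: 1-D intervals stand in for polygons, sorting by start value stands in for ordering by geohash, and the function names merge_touching and merge_in_chunks are made up for this illustration. It shows why a second pass over the partial results is needed: clusters that were split across a group boundary are only joined in that final step.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def merge_touching(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals that overlap or touch (1-D analogue of cluster + union)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Touches the previous cluster: extend that cluster.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def merge_in_chunks(intervals: List[Interval], chunk_size: int) -> List[Interval]:
    """Cluster in fixed-size chunks, then merge the partial results once more."""
    ordered = sorted(intervals)  # stand-in for ORDER BY ST_GeoHash(...)
    partial: List[Interval] = []
    for i in range(0, len(ordered), chunk_size):
        # Each chunk is small enough to process on its own.
        partial.extend(merge_touching(ordered[i:i + chunk_size]))
    # Second pass: clusters split across a chunk boundary are joined here.
    return merge_touching(partial)


data = [(0, 2), (1, 3), (2.5, 4), (10, 11), (11, 12), (20, 21)]
result = merge_in_chunks(data, chunk_size=2)
print(result)
```

With a chunk size of 2, the first pass leaves (0, 3) and (2.5, 4) as separate pieces because they fall in different chunks; the second pass joins them, so the chunked result matches what a single pass over all the data would give.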

Related

limitations in SQL used in Snowflake Dashboard Tiles

We are trying to use multi statements in Snowflake Dashboard tiles and do not quite understand the behaviour.
Let's say I create these 2 statements in my tile
SET MyVar = ( SELECT TOP 1 TABLE_NAME FROM DEV_CONTROL.INFORMATION_SCHEMA.TABLES WHERE NOT TABLE_NAME = :Subscription );
SELECT $MyVar;
If I highlight the first line and run it, I get a successful statement that does not return anything.
If I get back to my tile, I see "Statement executed successfully."
If I then go back to my SQL statements, highlight both, and run them, I get the name of the first table.
Going back to my dashboard, I now see the result of the second statement, my table name.
I find this both confusing and incoherent...
The data shown in the tile should reflect ALL the code I entered, not just what I happened to highlight and run the last time I looked at the code, shouldn't it?
Unfortunately, this is not well documented. As you mentioned, the tiles show only the result of the last executed query; at least, that is what I observed in my tests.
Using Snowflake Scripting can be helpful here:
DECLARE
MyVar VARCHAR;
Rcount NUMBER;
BEGIN
SELECT TOP 1 TABLE_NAME INTO :MyVar FROM GOKHAN_DB.INFORMATION_SCHEMA.TABLES WHERE NOT TABLE_NAME LIKE 'TEST%' ORDER BY random();
SELECT IFNULL(ROW_COUNT,0) INTO :Rcount FROM GOKHAN_DB.INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = :MyVar;
RETURN :MyVar || ' ' || :Rcount;
END;
The above code will be executed as a block.

Creating replica of one schema to another schema in Exasol

Can anyone help me create a replica in EXASOL? I need to copy all the tables, including views, functions and scripts, from one schema to another schema on the same server.
For example, I want all the data from Schema A to be copied (not moved) to Schema B.
Many thanks.
Thank you wildraid for your suggestion :)
In order to copy the DDL of all the tables in a schema, I found a simple query that generates the DDL for every table:
select t1.CREATE_STATEMENT||t2.PK||');' from
(Select C.COLUMN_TABLE,'CREATE TABLE ' || C.COLUMN_TABLE ||'(' || group_concat( '"'||C.COLUMN_NAME||'"' || ' ' || COLUMN_TYPE || case when (C.COLUMN_DEFAULT is not null
and C.COLUMN_IS_NULLABLE='true') or(C.COLUMN_DEFAULT<>'NULL' and C.COLUMN_IS_NULLABLE='false') then
' DEFAULT ' || C.COLUMN_DEFAULT end || case when C.COLUMN_IS_NULLABLE='false' then ' NOT NULL ' end
order by column_ordinal_position) CREATE_STATEMENT
from EXA_ALL_COLUMNS C
where
upper(C.COLUMN_SCHEMA)=upper('Source_Schema') and column_object_type='TABLE'
group by C.COLUMN_SCHEMA, C.COLUMN_TABLE order by C.COLUMN_TABLE ) t1 left join
(select CONSTRAINT_TABLE,', PRIMARY KEY (' ||group_concat('"'||COLUMN_NAME||'"' order by ordinal_position) || ')' PK
from EXA_ALL_CONSTRAINT_COLUMNS where
constraint_type='PRIMARY KEY' and upper(COnstraint_SCHEMA)=upper('Source_Schema') group by CONSTRAINT_TABLE ) t2
on t1.COLUMN_TABLE=t2.constraint_table
order by 1;
Replace Source_Schema with your schema name and it will generate the CREATE statements, which you can run in EXAplus.
For copying the data, I have used the same way that you've mentioned in step 2.
Ok, this question consists of two smaller problems.
1) How to copy DDL of all objects in schema
If you only need to copy a small number of schemas, the fastest way is to use the EXAplus client. Right-click on the schema name and select "CREATE DDL". It will give you the SQL to create all objects, which you can simply run in the context of the new schema.
If you have to automate it, you may take a look at this official script: https://www.exasol.com/support/browse/SOL-231
It creates DDL for all schemas, but it can be adapted to use single schema only.
2) How to copy data
This is easier. Just run the following SQL to generate INSERT ... SELECT statements for every table:
SELECT 'INSERT INTO <new_schema>.' || table_name || ' SELECT * FROM <old_schema>.' || table_name || ';'
FROM EXA_ALL_TABLES
WHERE table_schema='<old_schema>';
Copy-paste result and run it to make the actual copy.
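If you would rather script the data copy than copy-paste the generated statements, the same idea can be sketched in Python. This is only an illustration: copy_statements and quote_ident are hypothetical helpers written for this post, and the table names are assumed to come from a query against EXA_ALL_TABLES like the one above. Identifiers are double-quoted so that names are used exactly as the catalog reports them.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statements(old_schema: str, new_schema: str,
                    tables: Iterable[str]) -> List[str]:
    """Build one INSERT ... SELECT statement per table name."""
    return [
        f"INSERT INTO {quote_ident(new_schema)}.{quote_ident(table)} "
        f"SELECT * FROM {quote_ident(old_schema)}.{quote_ident(table)};"
        for table in tables
    ]


# Hypothetical table names, as they would be returned by EXA_ALL_TABLES.
statements = copy_statements("A", "B", ["ORDERS", "CUSTOMERS"])
for statement in statements:
    print(statement)
```

The target tables have to exist already (step 1), since the statements only copy rows.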

How to design an errors table for data validations in a star schema

I am working in SQL Server 2008. I have been tasked with writing a stored procedure to do some data validations on external data before we move it into our star schema data warehouse environment. One type of test requested is domain integrity / reference lookup from our external fact data tables to our dimension tables. To do this, I use the following technique:
SELECT
some_column
FROM some_fact_table
LEFT JOIN some_dimension_table
ON
some_fact_table.some_column = some_dimension_table.lookup_column
WHERE
some_fact_table.some_column IS NOT NULL
AND
some_dimension_table.lookup_column IS NULL
The SELECT clause will match the column definition for an errors table that I will eventually move the output into via SSIS. So, the SELECT clause actually looks like:
SELECT
primary_key,
'some_column' AS Offending_Column,
'not found in lookup' AS Error_Message,
some_column AS Offending_Value
But, because the fact tables are very large, we want to minimize the number of times that we have to select from it. Hence, I have just 1 query for each fact table to check each column in question, which looks something like:
SELECT
primary_key,
'col1|col2|col3' AS Potentially_Offending_Columns,
'not found in lookup|not found in lookup|not found in lookup' AS Error_Messages,
col1 + '|' + col2 + '|' + col3 AS Potentially_Offending_Values
FROM fact_table
LEFT JOIN dim_table1
ON
fact_table.col1 = dim_table1.lookup_column
LEFT JOIN dim_table2
ON
fact_table.col2 = dim_table2.lookup_column
LEFT JOIN dim_table3
ON
fact_table.col3 = dim_table3.lookup_column
WHERE
dim_table1.lookup_column IS NULL
OR
dim_table2.lookup_column IS NULL
OR
dim_table3.lookup_column IS NULL
This has some problems. (1) If any of the source columns is null in a row, then the string concatenation in Potentially_Offending_Values will result in NULL. If I wrap each column with ISNULL (and swap the nulls for something like an empty string), then I won't be able to tell whether the value was a true empty string in the source or a NULL that was swapped for an empty string. (2) If just one of the columns fails the lookup, then the error message will still read 'not found in lookup|not found in lookup|not found in lookup', i.e., I can't tell which of the columns actually failed. (3) The Potentially_Offending_Columns column in the output will always be static, which means I can't tell whether any of the columns failed just by looking at it.
So, in effect, I am having some design problems with my errors table. Is there a standard way of outputting to an errors table in this situation? Or, if not, what do I need to fix to make the output readable and useful?
I don't know what your data looks like, but instead of using an empty string with ISNULL, couldn't you return the word FAIL or something else that's meaningful to you? You could also use a CASE WHEN for your 'not found in lookup' column:
CASE WHEN Col1 IS NULL THEN 'not found in lookup' ELSE '' END + '|' +
CASE WHEN Col2 IS NULL THEN 'not found in lookup' ELSE '' END + '|' +
CASE WHEN Col3 IS NULL THEN 'not found in lookup' ELSE '' END AS Error_Messages,
ISNULL(col1,'FAIL') + '|' + ISNULL(col2,'FAIL') + '|' + ISNULL(col3,'FAIL') AS Potentially_Offending_Values
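Here is a small, self-contained sketch of the CASE WHEN idea, run against SQLite from Python so it can be tried without a SQL Server instance. The tables fact, dim1 and dim2 and their contents are made up for the illustration, and the flags test the dimension's lookup column (my reading of the intent), so each row shows which column actually failed. NULL source values are skipped, as in the original single-column test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim1 (lookup_column TEXT);
CREATE TABLE dim2 (lookup_column TEXT);
CREATE TABLE fact (primary_key INTEGER, col1 TEXT, col2 TEXT);
INSERT INTO dim1 VALUES ('a');
INSERT INTO dim2 VALUES ('x');
INSERT INTO fact VALUES (1, 'a', 'x'), (2, 'b', 'x'), (3, 'a', NULL), (4, 'b', 'y');
""")

# One scan of the fact table; one error flag per checked column.
rows = conn.execute("""
SELECT f.primary_key,
       CASE WHEN f.col1 IS NOT NULL AND d1.lookup_column IS NULL
            THEN 'not found in lookup' ELSE '' END AS col1_error,
       CASE WHEN f.col2 IS NOT NULL AND d2.lookup_column IS NULL
            THEN 'not found in lookup' ELSE '' END AS col2_error
FROM fact AS f
LEFT JOIN dim1 AS d1 ON f.col1 = d1.lookup_column
LEFT JOIN dim2 AS d2 ON f.col2 = d2.lookup_column
WHERE (f.col1 IS NOT NULL AND d1.lookup_column IS NULL)
   OR (f.col2 IS NOT NULL AND d2.lookup_column IS NULL)
ORDER BY f.primary_key
""").fetchall()

for row in rows:
    print(row)
```

Row 2 fails only on col1 and row 4 fails on both, so the output makes it visible which lookups failed. Rows 1 and 3 are not reported.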

Strange Behaviour on MSSQL Stored Procedure using Conditional WHERE with CONTAINS (Full Text Index)

I need some help from a MS SQL Master...
Short version:
When I put a condition in front of a CONTAINS in the WHERE clause, my query takes 1 minute (normally it executes in 200 milliseconds).
With this query, everything works fine:
Where
Contains(table.product_name, @search_word)
But with a conditional WHERE, it takes 1 minute to execute:
Where
(@ExecuteWhereStatement = 0 Or (Contains(table.product_name, @search_word)))
Long Version:
I'm using a stored procedure that receives some parameters. It queries a really large table, but everything is indexed properly and the query has performed very well so far.
The main query is fairly big, so I want to make the WHERE clause as smart as possible, to avoid repeating the same statement multiple times.
The database is a history of purchases made by the State, so this query involves 3 tables:
Table 1 (table_purchase) - The purchase itself
id_purchase int (PK)
date_purchase datetime
buyer_code int (Nullable)
Table 2 (table_purchase_product) - The Items of a Purchase
id_product int (PK)
id_purchase int (FK of table_purchase)
product_quantity int (Nullable)
product_name varchar(255) (Nullable) (Full-Text-Indexed)
product_description varchar(2000) (Nullable) (Full-Text-Indexed)
id_product_bid_winner int (FK of table_product_bid)
Table 3 (table_product_bid) - The Bids for Each product of a Purchase
id_product_bid int (PK)
id_product int (FK of table_purchase_product)
product_brand varchar(255) (Nullable) (Full-Text-Indexed)
bid_value decimal (20,6)
So basically, we have a "Purchase" that has several "Products" (or items), and each "Product" has some "Bids" (or prices).
And here is the Bad Girl (the SQL stored procedure):
ALTER PROCEDURE [dbo].[procPesquisaFullText]
    @search_date datetime,
    @search_word varchar(8000),
    @search_brand varchar(255),
    @only_one_bid bit = 0,
    @search_buyer_code int = 0,
    @quantityFrom decimal(20,6) = 0,
    @quantityTo decimal(20,6) = 0
AS
BEGIN
    SET NOCOUNT ON;

    -- Skip the word search only when a buyer code is given and no search word is supplied.
    Declare @ExecuteWordSearch AS bit;
    if (@search_buyer_code != 0 And @search_word = '')
    begin
        Set @ExecuteWordSearch = 0;
        Set @search_word = 'nothing';
    end
    else
    begin
        Set @ExecuteWordSearch = 1;
    end

    -- Skip the brand search when no brand is supplied.
    Declare @ExecuteBrandSearch AS bit;
    if (@search_brand = '')
    begin
        Set @ExecuteBrandSearch = 0;
        Set @search_brand = 'nothing';
    end
    else
    begin
        Set @ExecuteBrandSearch = 1;
    end

    begin
        SELECT
            pp.id_product,
            pp.id_purchase,
            pp.product_description
        FROM
            table_purchase_product pp
            inner join table_purchase p on p.id_purchase = pp.id_purchase
        WHERE
            (p.date_purchase >= @search_date)
            and (@search_buyer_code = 0 or (p.buyer_code = @search_buyer_code))
            and (@quantityFrom = 0 or (pp.product_quantity >= @quantityFrom))
            and (@quantityTo = 0 or (pp.product_quantity <= @quantityTo))
            and (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word))
            and (@only_one_bid = 0
                or ((Select COUNT(*) From table_product_bid Where table_product_bid.id_product = pp.id_product) = 1))
            and (@ExecuteBrandSearch = 0 Or (exists(
                    select 1
                    from table_product_bid ppb
                    where ppb.id_product_bid = pp.id_product_bid_winner
                    and contains(ppb.product_brand, @search_brand)
                )
            ))
        ORDER BY p.date_purchase DESC
    end
END
So far, so good...
At the beginning I set two variables that are used inside the query.
The first checks whether the user specified a "Buyer Code" AND didn't specify a "Search Word" (in which case neither the product's description nor the product's name is searched).
The second checks whether the user specified a "Specific Brand". If so, the winning bid's brand is checked against the user's one.
Observation: You'll notice that when a search term is empty, I set it to "nothing". I do this because an empty search term in CONTAINS throws an exception, even when that condition is not executed (I tested this in another, completely isolated query too).
As You can see, my user is able to search for:
- "Products" of Some Distinct Buyer "Purchase" (passing the @search_buyer_code parameter)
- A "Product" that contains a distinct word in its name or description
- A "Product" that has the Winner Bid of a specific Brand
- A "Product" that has only 1 bid at all
- A "Product" with a maximum and minimum quantity
You'll notice that I used a lot of conditions INSIDE the WHERE, producing a very dynamic WHERE, instead of using a big IF/ELSE statement and repeating a lot of code. (I guess some "Googlers" will land here looking for conditional WHEREs, and if so, I'm glad to help!)
OK, so far everything works great. The query executes flawlessly. But here is the strange, tricky issue:
I want the user to be able to specify only a "Buyer Code" for the purchase, with no word to search for in the product (which is what the first piece of code in the stored procedure prepares for). So I change this:
and (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word))
To:
and (@ExecuteWordSearch = 0 Or (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word)))
The query then takes nearly 1 minute! (The original version executes in about 200 milliseconds.)
But WHY? I use the same logic in all the conditional WHEREs. I also use the same flag/variable logic to decide whether to run the condition for both the word search and the brand search, and the brand search works PERFECTLY! So why does my query take 1 minute only when the condition is followed by a CONTAINS?
This issue is not related to the amount of data, because I tried removing the entire CONTAINS condition, which returns a lot more data, and it takes 1 second at most.
Oh, and it's Microsoft SQL Server 2008 R2.
Thanks for reading this far!
I cannot find the documentation I had around a very similar issue, but it sounded so familiar that I at least wanted to share what I remembered. Part of the issue is that in SQL Server the full-text search engine is separate from the regular query execution engine, so when you mix the two, performance can tank in some cases. This is particularly true when the condition is an 'OR' rather than an 'AND'. (I remember hitting this exact situation.) Conditional ANDs worked fine, but for OR it's as if each condition gets evaluated repeatedly, row by row.
Among the workarounds, one is, as already suggested, create your sql dynamically before execution.
Another would be to break the full-text and non-full text conditions into two search functions (literally UDF's) and then do whatever is needed (INTERSECT, EXCEPT, etc) with the two resultsets.
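As a rough illustration of the dynamic SQL workaround, here is a sketch in Python that builds the statement on the client side and adds the CONTAINS predicates only when a search word is actually supplied, so the optimizer never sees the "flag OR CONTAINS" pattern. The function build_search_query is hypothetical, only some of the filters from the question are included, and the ? markers assume a driver with qmark-style parameters (pyodbc, for example). The same thing could be done inside the procedure with sp_executesql.

```python
from typing import Any, List, Optional, Tuple


def build_search_query(search_date: str,
                       search_word: Optional[str] = None,
                       buyer_code: Optional[int] = None) -> Tuple[str, List[Any]]:
    """Return (sql, params), adding only the filters that were supplied."""
    clauses = ["p.date_purchase >= ?"]
    params: List[Any] = [search_date]

    if buyer_code:
        clauses.append("p.buyer_code = ?")
        params.append(buyer_code)

    if search_word:
        # The full-text predicates are present only when there is a word to search.
        clauses.append(
            "(CONTAINS(pp.product_description, ?) OR CONTAINS(pp.product_name, ?))"
        )
        params.extend([search_word, search_word])

    sql = (
        "SELECT pp.id_product, pp.id_purchase, pp.product_description "
        "FROM table_purchase_product pp "
        "INNER JOIN table_purchase p ON p.id_purchase = pp.id_purchase "
        "WHERE " + " AND ".join(clauses) +
        " ORDER BY p.date_purchase DESC"
    )
    return sql, params


sql_no_word, params_no_word = build_search_query("2012-01-01", buyer_code=7)
sql_word, params_word = build_search_query("2012-01-01", search_word="paper")
print(sql_no_word)
print(sql_word)
```

Values are always passed as parameters and never concatenated into the SQL text, so the usual injection concern with dynamic SQL does not apply to this sketch.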
Try changing your WHERE clause to use a CASE expression, e.g.:
WHERE
    CASE
        WHEN @ExecuteWhereStatement = 0 THEN 1
        WHEN @ExecuteWhereStatement = 1 THEN
            CASE
                WHEN CONTAINS([table].product_name, @search_word) THEN 1
                ELSE 0
            END
    END = 1;

Intra-SELECT variables?

Would it be possible to alias an expression returned by a SELECT statement in order to refer to it in other parts of the same SELECT, as if it were a column like any other?
A kind of "temporary variable" whose scope would be limited to the SELECT statement, a little like the WITH clause that lets a SELECT use a temporary named recordset.
A naive sample of what I'd like to achieve :
SELECT
FIRSTNAME + ' ' + NAME AS FULLNAME,
CASE WHEN LEN(FULLNAME)>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
where FULLNAME could be used to determine the subsequent output field ISCORRECT, though not being a real column of the table USERS... instead of this laboured error-prone (but working) copy/paste :
SELECT
FIRSTNAME + ' ' + NAME AS FULLNAME,
CASE WHEN LEN(FIRSTNAME + ' ' + NAME)>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
This sample well describes what I want, but I can easily imagine similar needs where FULLNAME might also be used in other parts of the SELECT statement : in a JOIN, in the WHERE, in a GROUP BY, ORDER BY, etc.
PS : I use SQL Server 2005 but would be also interested in any 2008-specific answer.
Thanks a lot ! :-)
Edit :
In spite of my great respect for those of you proposing a side query or inner query, I don't feel at ease with such options. My sample really is a naive one. The real queries have around 30 output fields with complex expressions (including calls to CLR functions), 15 inner/left outer joins, and 20 additional WHERE criteria. I would rather not multiply the indirections through co-queries if I can avoid it.
I believe you would have to put it in an inner query, and then be able to refer to it outside of the query.
Simplest example based on yours:
select a.fullname,
       case when len(a.fullname) > 3 then 1 else 0 end as iscorrect
from (select firstname + ' ' + name as fullname
      from users) a
Example with a CTE
;with names (FULLNAME) as (
SELECT FIRSTNAME + ' ' + NAME
FROM USERS
) select
FULLNAME,
CASE WHEN LEN(FULLNAME) > 3 THEN 1 ELSE 0 END AS ISCORRECT
FROM names
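For anyone who wants to try the CTE approach without a SQL Server instance, here is a minimal sketch using SQLite from Python. It is an adaptation, not the T-SQL above: SQLite concatenates with || instead of + and uses LENGTH instead of LEN, and the two sample rows are made up. The point is the same: the expression is written once in the CTE and then reused by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (firstname TEXT, name TEXT);
INSERT INTO users VALUES ('Al', 'Doe'), ('', 'X');
""")

# fullname is computed once in the CTE and referenced twice in the outer SELECT.
rows = conn.execute("""
WITH names AS (
    SELECT firstname || ' ' || name AS fullname
    FROM users
)
SELECT fullname,
       CASE WHEN LENGTH(fullname) > 3 THEN 1 ELSE 0 END AS iscorrect
FROM names
ORDER BY fullname
""").fetchall()

for row in rows:
    print(row)
```

The outer query can also use fullname in its WHERE, GROUP BY or ORDER BY, which covers the other uses mentioned in the question.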
You can use cross apply to concatenate strings or do calculations etc.. that involves just the current row.
select T.fullname,
case when len(T.fullname) > 3
then 1
else 0
end iscorrect
from users as U
cross apply
(select U.firstname+' '+U.name) as T(fullname)
order by T.fullname
Though not very satisfied with it, I chose (temporarily?) a third option: I avoid both co-queries and copy/pasting my complex, hard-to-read expression (symbolized here by the simple one aliased as FULLNAME) by embedding it in a scalar function, which is therefore called several times in different parts of my SELECT.
SELECT
dbo.GetFULLNAME(FIRSTNAME,NAME) AS FULLNAME,
CASE WHEN LEN(dbo.GetFULLNAME(FIRSTNAME,NAME))>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
What do you think of it ?
(To be precise: though more complex and less readable than in my original post, the real expression remains a "simple" matter of string manipulation using several input fields, and doesn't involve any sub-querying or anything like that.)
