Under what conditions will the gin index in opengauss be used?

CREATE TABLE tsearch.pgweb(id int, body text, title text, last_mod_date date);
CREATE TABLE
omm=# INSERT INTO tsearch.pgweb VALUES(1, 'China, officially the People''s Republic of China(PRC), located in Asia, is the world''s most populous state.', 'China', '2010-1-1');
INSERT 0 1
omm=# INSERT INTO tsearch.pgweb VALUES(2, 'America is a rock band, formed in England in 1970 by multi-instrumentalists Dewey Bunnell, Dan Peek, and Gerry Beckley.', 'America', '2010-1-1');
INSERT 0 1
omm=# INSERT INTO tsearch.pgweb VALUES(3, 'England is a country that is part of the United Kingdom. It shares land borders with Scotland to the north and Wales to the west.', 'England','2010-1-1');
-- To speed up text searches, GIN indexes can be created (specify the english configuration to parse and normalize strings)
omm=# CREATE INDEX pgweb_idx_1 ON tsearch.pgweb USING gin(to_tsvector('english', body));
CREATE INDEX
-- concatenated-columns index
omm=# CREATE INDEX pgweb_idx_3 ON tsearch.pgweb USING gin(to_tsvector('english', title || ' ' || body));
CREATE INDEX
At this point, executing EXPLAIN SELECT body FROM tsearch.pgweb WHERE to_tsvector(body) @@ to_tsquery('america'); shows that the GIN index is not used.
I would like to ask: under what circumstances will such an index be used? (The index was still not used when I tested after inserting 10,000 rows.)
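For reference, a minimal sketch of a query that should be able to use the index, assuming openGauss follows PostgreSQL's rule that the WHERE-clause expression must match the indexed expression exactly (here, the two-argument to_tsvector('english', body) rather than the one-argument form):
omm=# EXPLAIN SELECT body FROM tsearch.pgweb WHERE to_tsvector('english', body) @@ to_tsquery('english', 'america');
Note also that on a table with only a handful of rows the planner may still prefer a sequential scan; in PostgreSQL-family databases, temporarily setting enable_seqscan = off is a common way to check whether the index can be used at all.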

Related

How to implement many-to-many-to-many database relationship?

I am building a SQLite database and am not sure how to proceed with this scenario.
I'll use a real-world example to explain what I need:
I have a list of products that are sold by many stores in various states. Not every store sells a particular product at all, and those that do may only sell it in one state or another. Most stores sell a product in most states, but not all.
For example, let's say I am trying to buy a vacuum cleaner in Hawaii. Joe's Hardware sells vacuums in 18 states, but not in Hawaii. Walmart sells vacuums in Hawaii, but not microwaves. Burger King does not sell vacuums at all, but will give me a Whopper anywhere in the US.
So if I am in Hawaii and search for a vacuum, I should only get Walmart as a result. While other stores may sell vacuums, and may sell in Hawaii, they don't do both but Walmart does.
How do I efficiently create this type of relationship in a relational database (specifically, I am currently using SQLite, but need to be able to convert to MySQL in the future).
Obviously, I would need tables for Product, Store, and State, but I am at a loss on how to create and query the appropriate join tables...
If I, for example, query a certain Product, how would I determine which Store would sell it in a particular State, keeping in mind that Walmart may not sell vacuums in Hawaii, but they do sell tea there?
I understand the basics of 1:1, 1:n, and M:n relationships in RD, but I am not sure how to handle this complexity where there is a many-to-many-to-many situation.
If you could show some SQL statements (or DDL) that demonstrates this, I would be very grateful. Thank you!
An accepted and common way is to use a table that has a column referencing the product and another referencing the store. There are many names for such a table: reference table, associative table, mapping table, to name a few.
You want these to be efficient, so reference by a number which, of course, has to uniquely identify what it is referencing. With SQLite, by default a table has a special column, normally hidden, that is such a unique number. It's the rowid, and it is typically the most efficient way of accessing rows, as SQLite has been designed with this common usage in mind.
SQLite allows you to create a column per table that is an alias of the rowid: you simply define the column as INTEGER PRIMARY KEY, and typically you'd name the column id.
So utilising these the reference table would have a column for the product's id and another for the store's id catering for every combination of product/store.
As an example, three tables are created (stores, products, and a reference/mapping table), the first two being populated using :-
CREATE TABLE IF NOT EXISTS _products(id INTEGER PRIMARY KEY, productname TEXT, productcost REAL);
CREATE TABLE IF NOT EXISTS _stores (id INTEGER PRIMARY KEY, storename TEXT);
CREATE TABLE IF NOT EXISTS _product_store_relationships (storereference INTEGER, productreference INTEGER);
INSERT INTO _products (productname,productcost) VALUES
('thingummy',25.30),
('Sky Hook',56.90),
('Tartan Paint',100.34),
('Spirit Level Bubbles - Large', 10.43),
('Spirit Level bubbles - Small',7.77)
;
INSERT INTO _stores (storename) VALUES
('Acme'),
('Shops-R-Them'),
('Harrods'),
('X-Mart')
;
The resultant tables being :-
_product_store_relationships would be empty
Placing products into stores (for example) could be done using :-
-- Build some relationships/references/mappings
INSERT INTO _product_store_relationships VALUES
(2,2), -- Sky Hooks are in Shops-R-Them
(2,4), -- Sky Hooks in X-Mart
(1,3), -- thingummys in Harrods
(1,1), -- and Acme
(1,2), -- and Shops-R-Them
(4,4), -- Spirit Level Bubbles Large in X-Mart
(5,4), -- Spirit Level Bubbles - Small in X-Mart
(3,3) -- Tartan Paint in Harrods
;
The _product_store_relationships would then be :-
A query such as the following would list the products in stores sorted by store and then product :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
ORDER BY storename, productname
;
The resultant output being :-
This query will only list stores that have a product name containing an s or S (LIKE is case-insensitive for ASCII by default in SQLite), the output being sorted by productcost in ascending order, then storename, then productname :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE productname LIKE '%s%'
ORDER BY productcost,storename, productname
;
Output :-
Expanding the above to consider states requires 2 new tables: _states and a _store_state_references reference table.
Although there is no real need for a reference table (a store would only be in one state, unless you consider a chain of stores to be a store, in which case this would also cope).
The SQL could be :-
CREATE TABLE IF NOT EXISTS _states (id INTEGER PRIMARY KEY, statename TEXT);
INSERT INTO _states (statename) VALUES
('Texas'),
('Ohio'),
('Alabama'),
('Queensland'),
('New South Wales')
;
CREATE TABLE IF NOT EXISTS _store_state_references (storereference INTEGER, statereference INTEGER);
INSERT INTO _store_state_references VALUES
(1,1),
(2,5),
(3,1),
(4,3)
;
If the following query were run :-
SELECT storename,productname,productcost,statename
FROM _stores
JOIN _store_state_references ON _stores.id = _store_state_references.storereference
JOIN _states ON _store_state_references.statereference =_states.id
JOIN _product_store_relationships ON _stores.id = _product_store_relationships.storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE statename = 'Texas' AND productname = 'Sky Hook'
;
The output would be :-
Without the WHERE clause :-
The following would make Shops-R-Them have a presence in all states :-
INSERT INTO _store_state_references VALUES
(2,1),(2,2),(2,3),(2,4)
;
Now the Sky Hook in Texas query results in :-
Note This just covers the basics of the topic.
You will need to create a combined mapping table of products, states and stores, e.g. tbl_product_states_stores, which will store the mapping of product, state and store. The columns would be id, product_id, state_id, stores_id.
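A minimal sketch of that combined mapping table and a lookup query, reusing the _products, _stores and _states tables from the answer above (table and column names follow the suggestion and are illustrative only):
CREATE TABLE IF NOT EXISTS tbl_product_states_stores (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    state_id INTEGER,
    stores_id INTEGER
);
-- Which stores sell a given product in a given state?
SELECT s.storename
FROM tbl_product_states_stores pss
JOIN _products p ON p.id = pss.product_id
JOIN _states st ON st.id = pss.state_id
JOIN _stores s ON s.id = pss.stores_id
WHERE p.productname = 'Sky Hook' AND st.statename = 'Texas';
Only stores with a row for that exact (product, state) pair are returned, which is the behaviour asked for in the question (e.g. only Walmart for vacuums in Hawaii).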

Partitioning table based on first letter of a varchar field

I have a massive table (over 1B records) that has a specific requirement for table partitioning:
(1) Is it possible to partition a table in Postgres based on the first character of a varchar field?
For example:
For the following 3 records:
a-blah
a-blah2
b-blah
a-blah and a-blah2 would go in the "A" partition, b-blah would go into the "B" partition.
(2) If the above is not possible with Postgres, what is a good way to evenly partition a large growing table? (without partitioning by create date -- since that is not something these records have).
You can use an expression in the partition by clause, e.g.:
create table my_table(name text)
partition by list (left(name, 1));
create table my_table_a
partition of my_table
for values in ('a');
create table my_table_b
partition of my_table
for values in ('b');
Results:
insert into my_table
values
('abba'), ('alfa'), ('beta');
select 'a' as partition, name from my_table_a
union all
select 'b' as partition, name from my_table_b;
 partition | name
-----------+------
 a         | abba
 a         | alfa
 b         | beta
(3 rows)
If the partitioning should be case insensitive you might use
create table my_table(name text)
partition by list (lower(left(name, 1)));
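For question (2), if there is no natural key to list or range over and the goal is simply to spread rows evenly, hash partitioning is one option; a minimal sketch (requires PostgreSQL 11 or later, partition count chosen arbitrarily here):
create table my_table(name text)
partition by hash (name);
create table my_table_p0 partition of my_table for values with (modulus 4, remainder 0);
create table my_table_p1 partition of my_table for values with (modulus 4, remainder 1);
create table my_table_p2 partition of my_table for values with (modulus 4, remainder 2);
create table my_table_p3 partition of my_table for values with (modulus 4, remainder 3);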
Read in the documentation:
Table Partitioning
CREATE TABLE

TSQL Update Issue

Ok SQL Server fans, I have an issue with a legacy stored procedure that sits inside a SQL Server 2008 R2 instance that I have inherited, along with the PROD data, which is, to say the least, horrible. Also, I can NOT make any changes to the data nor the table structures.
So here is my problem: the stored procedure in question runs daily and is used to update the employee table. As you can see from my example, the incoming data (#New_Employees) contains the updated data, and I need to use it to update the Employee data stored in the #Existing_Employees table. Throughout the years different formats of the EMP_ID value have been used and must be maintained as is (I fought and lost that battle). Thankfully, I have been successful in changing the format of the EMP_ID column in the #New_Employees table (Yeah!), and any new records will use this format!
So now you may see my problem: I need to update the ID column in the #New_Employees table with the corresponding ID from the #Existing_Employees table by matching (that's right, you guessed it) on the EMP_ID columns. I came up with an extremely hacky way to handle the disparate formats of the EMP_ID columns, but it is very slow considering the number of rows that I need to process (1M+).
I thought of creating a staging table where I could simply cast the EMP_ID columns to an INT and then back to an NVARCHAR in each table to remove the leading zeros, and I am sort of leaning that way, but I wanted to see if there was another way to handle this dysfunctional data. Any constructive comments are welcome.
IF OBJECT_ID(N'TempDB..#NEW_EMPLOYEES') IS NOT NULL
DROP TABLE #NEW_EMPLOYEES
CREATE TABLE #NEW_EMPLOYEES(
ID INT
,EMP_ID NVARCHAR(50)
,NAME NVARCHAR(50))
GO
IF OBJECT_ID(N'TempDB..#EXISTING_EMPLOYEES') IS NOT NULL
DROP TABLE #EXISTING_EMPLOYEES
CREATE TABLE #EXISTING_EMPLOYEES(
ID INT PRIMARY KEY
,EMP_ID NVARCHAR(50)
,NAME NVARCHAR(50))
GO
INSERT INTO #NEW_EMPLOYEES
VALUES(NULL, '00123', 'Adam Arkin')
,(NULL, '00345', 'Bob Baker')
,(NULL, '00526', 'Charles Nelson O''Reilly')
,(NULL, '04321', 'David Numberman')
,(NULL, '44321', 'Ida Falcone')
INSERT INTO #EXISTING_EMPLOYEES
VALUES(1, '123', 'Adam Arkin')
,(2, '000345', 'Bob Baker')
,(3, '526', 'Charles Nelson O''Reilly')
,(4, '0004321', 'Ed Sullivan')
,(5, '02143', 'Frank Sinatra')
,(6, '5567', 'George Thorogood')
,(7, '0000123-1', 'Adam Arkin')
,(8, '7', 'Harry Hamilton')
-- First Method - Not Successful
UPDATE NE
SET ID = EE.ID
FROM
#NEW_EMPLOYEES NE
LEFT OUTER JOIN #EXISTING_EMPLOYEES EE
ON EE.EMP_ID = NE.EMP_ID
SELECT * FROM #NEW_EMPLOYEES
-- Second Method - Successful but Slow
UPDATE NE
SET ID = EE.ID
FROM
dbo.#NEW_EMPLOYEES NE
LEFT OUTER JOIN dbo.#EXISTING_EMPLOYEES EE
ON CAST(CASE WHEN NE.EMP_ID LIKE N'%[^0-9]%'
THEN NE.EMP_ID
ELSE LTRIM(STR(CAST(NE.EMP_ID AS INT))) END AS NVARCHAR(50)) =
CAST(CASE WHEN EE.EMP_ID LIKE N'%[^0-9]%'
THEN EE.EMP_ID
ELSE LTRIM(STR(CAST(EE.EMP_ID AS INT))) END AS NVARCHAR(50))
SELECT * FROM #NEW_EMPLOYEES
the number of rows that I need to process (1M+).
A million employees? Per day?
I think I would add a 3rd table:
create table #ids ( id INT not NULL PRIMARY KEY
, emp_id NVARCHAR(50) not NULL unique );
Populate that table using your LTRIM(STR(CAST(...))) (ahem) algorithm, and update Employees directly from a join of those three tables.
I recommend using an ANSI update, not Microsoft's nonstandard UPDATE ... FROM, because the ANSI version prevents nondeterministic results in cases where the FROM produces more than one matching row.
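A rough sketch of that three-table approach against the temp tables above (the normalization simply reuses the CASE expression from the second method; this is an illustration, not tuned code):
CREATE TABLE #ids (id INT NOT NULL PRIMARY KEY, emp_id NVARCHAR(50) NOT NULL UNIQUE);
-- Populate once with the normalized EMP_IDs of the existing employees
INSERT INTO #ids (id, emp_id)
SELECT EE.ID,
       CAST(CASE WHEN EE.EMP_ID LIKE N'%[^0-9]%'
                 THEN EE.EMP_ID
                 ELSE LTRIM(STR(CAST(EE.EMP_ID AS INT))) END AS NVARCHAR(50))
FROM #EXISTING_EMPLOYEES EE;
-- ANSI-style update: the scalar subquery errors out if it ever returns more than one row
UPDATE #NEW_EMPLOYEES
SET ID = (SELECT i.id
          FROM #ids i
          WHERE i.emp_id = CAST(CASE WHEN #NEW_EMPLOYEES.EMP_ID LIKE N'%[^0-9]%'
                                     THEN #NEW_EMPLOYEES.EMP_ID
                                     ELSE LTRIM(STR(CAST(#NEW_EMPLOYEES.EMP_ID AS INT))) END AS NVARCHAR(50)));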

TSQL: Is it possible to modify the value of a PK identity?

Let say I have this table
+--------+---------+------------------+
| Id(PK) | Country | Capital City     |
+--------+---------+------------------+
| 1      | USA     | Washington, D.C. |
+--------+---------+------------------+
| 2      | Japan   | Tokyo            |
+--------+---------+------------------+
What if I want to insert France-Paris, but I want that row to have Id = 1, USA Id = 2, and Japan Id = 3?
Is there a way to do that using SQL? What I'm about to do is to generate a Script where I'll do that manually. However, if there is a different way, I'd like to apply it.
Thanks for helping
Yes, but it isn't pretty if you get your values wrong. Use IDENTITY_INSERT
SET IDENTITY_INSERT dbo.Country ON
--Do your manipulation here.
INSERT INTO dbo.Country(Id, country, [Capital City]) SELECT 1, 'France', 'Paris'
SET IDENTITY_INSERT dbo.Country OFF
BUT: you have an identity and a primary key, and you need to shuffle values about, so you would need to drop the ID column and re-create it, since you cannot update an IDENTITY column. Given you may have other tables relying on this, it is likely to be more trouble than it's worth. If you're really desperate to have France at position 1, you will have to do quite a bit of work: disable the FKs, copy the ID column to a new column, insert your value where you want it, make sure the numbers are sequential, and finally re-create the identity and primary key and re-enable the foreign keys.
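If renumbering really is required and nothing references dbo.Country.Id, a sketch of one way to do it is to reload the table with IDENTITY_INSERT turned on (table and column names are assumed from the example above):
SELECT Id, Country, [Capital City] INTO #old FROM dbo.Country;
DELETE FROM dbo.Country;
SET IDENTITY_INSERT dbo.Country ON;
-- France takes Id = 1, every existing row shifts up by one
INSERT INTO dbo.Country (Id, Country, [Capital City])
SELECT 1, 'France', 'Paris'
UNION ALL
SELECT Id + 1, Country, [Capital City] FROM #old;
SET IDENTITY_INSERT dbo.Country OFF;
DROP TABLE #old;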
As a side note, try and avoid spaces in object names.

What's wrong with my fulltext search query?

I'm having some trouble with the fulltext CONTAINS operator. Here's a quick script to show what I'm doing. Note that the WAITFOR line simply gives the fulltext index a moment to finish populating.
create table test1 ( id int constraint pk primary key, string nvarchar(100) not null );
insert into test1 values (1, 'dog')
insert into test1 values (2, 'dogbreed')
insert into test1 values (3, 'dogbreedinfo')
insert into test1 values (4, 'dogs')
insert into test1 values (5, 'breeds')
insert into test1 values (6, 'breed')
insert into test1 values (7, 'breeddogs')
go
create fulltext catalog cat1
create fulltext index on test1 (string) key index pk on cat1
waitfor delay '00:00:03'
go
select * from test1 where contains (string, '"*dog*"')
go
drop table test1
drop fulltext catalog cat1
The result set returned is:
1 dog
2 dogbreed
3 dogbreedinfo
4 dogs
Why is record #7 'breeddogs' not returned?
EDIT
Is there another way I should be searching for strings that are contained in other strings? A way that is faster than LIKE '%searchword%' ?
That's because MS Full-Text Search does not support suffix search, only prefix, i.e. the '*' in front of '*dog*' is simply ignored. It is clearly stated in Books Online, btw.
CONTAINS can search for:
A word or phrase.
The prefix of a word or phrase.
A word near another word.
A word inflectionally generated from another (for example, the word drive is the inflectional stem of drives, drove, driving, and driven).
A word that is a synonym of another word using a thesaurus (for example, the word metal can have synonyms such as aluminum and steel).
Where prefix term is defined like this:
< prefix term > ::= { "word *" | "phrase *" }
So, unfortunately: there's no way to issue a LIKE search in fulltext search.
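To illustrate with the test1 table above: because only the prefix part of the term is honoured, the query below returns exactly the same four rows as the '"*dog*"' query in the question, and 'breeddogs' is still missing (a sketch of the behaviour, not a workaround):
select * from test1 where contains (string, '"dog*"')
-- returns: dog, dogbreed, dogbreedinfo, dogs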

Resources