Why is a simple index not being used in Postgres? - database

I have a simple table and some sample data
create table test_index (
id serial primary key,
name char(255)
);
insert into test_index (name) values ('tom');
insert into test_index (name) values ('john');
insert into test_index (name) values ('ken');
After creating the table and data, I created an index on the name column
CREATE INDEX idx_test_index_shop_name ON test_index(name);
But when I run a simple query on the name column
select * from test_index where name = 'tom';
It's not using the index; it just scans through the whole table.
It seems like a simple thing, but I can't figure out why it's not working. Does anyone know what the cause is?
Update 1
I see the answer suggests that the data is too small for the index to be used, so I can understand why it's not used here.
But I have a similar setup with a char(255) column and an index on that column, and that table has 16 million rows. It also doesn't use the index. Does anyone know why?
Update 2
Here is the actual table with index but not using it when querying the table
Here is the verbose output

The planner estimates that, with just three rows, it is faster to read the three rows straight from the heap (the table). If it used the index instead, it would first have to find the matching pointers in the index and then go to the table to retrieve the data, which is slower. In the example below, forcing the index scan is indeed slower.
create table test_index (
id serial primary key,
name char(255)
);
insert into test_index (name) values ('tom');
insert into test_index (name) values ('john');
insert into test_index (name) values ('ken');
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
CREATE INDEX idx_test_index_shop_name ON test_index(name);
CREATE INDEX
Seq Scan on test_index (cost=0.00..1.04 rows=1 width=1028) (actual time=0.010..0.011 rows=1 loops=1)
Filter: (name = 'tom'::bpchar)
Rows Removed by Filter: 2
Planning Time: 0.324 ms
Execution Time: 0.032 ms
set enable_seqscan to off;
Index Scan using idx_test_index_shop_name on test_index (cost=0.13..8.15 rows=1 width=1028) (actual time=0.044..0.046 rows=1 loops=1)
Index Cond: (name = 'tom'::bpchar)
Planning Time: 0.065 ms
Execution Time: 0.072 ms
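The sequential-scan estimate of 1.04 in the plan above can be reproduced by hand. The following is a minimal sketch in Python, assuming PostgreSQL's default cost settings (seq_page_cost = 1.0, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) and assuming the three rows fit in a single page. The function name is hypothetical, and the formula is a simplification of what the planner does for a sequential scan with one filter condition.

```python
def seq_scan_cost(pages, rows, quals=1,
                  seq_page_cost=1.0,
                  cpu_tuple_cost=0.01,
                  cpu_operator_cost=0.0025):
    """Simplified cost of a sequential scan that applies `quals` filter operators per row.

    The default parameter values are assumed to match PostgreSQL's default settings.
    """
    io_cost = pages * seq_page_cost                              # read every page once
    cpu_cost = rows * (cpu_tuple_cost + quals * cpu_operator_cost)  # process and filter every row
    return io_cost + cpu_cost


# Assumption: the three-row table occupies exactly one page.
estimate = seq_scan_cost(pages=1, rows=3)
print(f"{estimate:.4f}")
```

Under these assumptions the estimate comes to roughly 1.0375, which the plan displays as 1.04. That is well below the 8.15 upper estimate of the index scan, so the planner picks the sequential scan.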

Related

(Alembic, SQLAlchemy) Can I copy data from a non-partitioned table to a partitioned one in the migration script?

I have a table that needs to be partitioned, but since postgresql_partition_by wasn't specified when the table was created, I am trying to:
create a new partitioned table with the same structure as the original one.
move the data from the old table to the new one.
drop the original table.
rename the new table.
So what is the best practice for moving the data from the old table to the new one?
I tried this and it didn't work
COPY partitioned_table
FROM original_table;
also tried
INSERT INTO partitioned_table (column1, column2, ...)
SELECT column1, column2, ...
FROM original_table;
but neither worked :(
Note that I am using Alembic to generate the migration scripts, and SQLAlchemy from Python.
Basically, you have the two scenarios described below.
- The table is large and you need to split the data into several partitions
- The table becomes the first partition and you add new partitions for new data
Let's use this setup for the non-partitioned table
create table jdbn.non_part
(id int not null, name varchar(100));
insert into jdbn.non_part (id,name)
SELECT id, 'xxxxx'|| id::varchar(20) name
from generate_series(1,1000) id;
The table contains ids from 1 to 1000, and for the first case you need to split them into two partitions of 500 rows each.
Create the partitioned table
with identical structure and constraints as the original table
create table jdbn.part
(like jdbn.non_part INCLUDING DEFAULTS INCLUDING CONSTRAINTS)
PARTITION BY RANGE (id);
Add partitions
to cover current data
create table jdbn.part_500 partition of jdbn.part
for values from (1) to (501); /* 1 <= id < 501 */
create table jdbn.part_1000 partition of jdbn.part
for values from (501) to (1001);
for future data (as required)
create table jdbn.part_1500 partition of jdbn.part
for values from (1001) to (1501);
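The range bounds above are half-open: the lower bound is inclusive and the upper bound is exclusive, as the comment on part_500 notes. As a rough illustration only (this is not how PostgreSQL implements tuple routing), the following Python sketch maps an id to the partition that would receive it. The names BOUNDS, NAMES and partition_for are hypothetical and simply mirror the three partitions created above.

```python
from bisect import bisect_right

# Assumption: these mirror the partitions created above.
# Partition i covers BOUNDS[i] <= id < BOUNDS[i + 1].
BOUNDS = [1, 501, 1001, 1501]
NAMES = ["part_500", "part_1000", "part_1500"]


def partition_for(id_value):
    """Return the name of the partition covering id_value, or None if no range matches."""
    i = bisect_right(BOUNDS, id_value) - 1
    if i < 0 or i >= len(NAMES):
        return None  # outside every range
    return NAMES[i]


for probe in (1, 500, 501, 1000, 1001, 1501):
    print(probe, partition_for(probe))
```

An id such as 1501 falls outside every range in this sketch. In PostgreSQL, inserting such a row into the partitioned table fails unless a matching or default partition exists, which is why partitions for future data are created ahead of time.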
Use insert to copy data
Note that this approach copies the data, which means you need twice the space and possibly a cleanup of the old data.
insert into jdbn.part (id,name)
select id, name from jdbn.non_part;
Check partition pruning
Note that only the partition part_500 is accessed
EXPLAIN SELECT * FROM jdbn.part WHERE id <= 500;
QUERY PLAN |
----------------------------------------------------------------+
Seq Scan on part_500 part (cost=0.00..14.00 rows=107 width=222)|
Filter: (id <= 500) |
Second Option - MOVE Data to one Partition
If you can live with the one (big) initial partition, you may use the second approach
Create the partitioned table
same as above
Attach the table as a partition
ALTER TABLE jdbn.part ATTACH PARTITION jdbn.non_part
for values from (1) to (1001);
Now the original table becomes the first partition of your partitioned table, i.e. no data is duplicated.
EXPLAIN SELECT * FROM jdbn.part WHERE id <= 500;
QUERY PLAN |
---------------------------------------------------------------+
Seq Scan on non_part part (cost=0.00..18.50 rows=500 width=12)|
Filter: (id <= 500) |
Similar answer with some hints to automation of partition creation here
After trying a few things, the solution was:
INSERT INTO new_table(fields ordered as the result of the select statement) SELECT * FROM old_table
I don't know if there is an easier way to get the fields in order, but I tried inserting a row in DBeaver from these options:
Then I got the column names with these steps:
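One way to avoid depending on column order at all is to name the columns explicitly on both sides of the INSERT ... SELECT. The following is a minimal sketch in Python that builds such a statement from a column list. The helper name build_copy_statement and the table and column names are hypothetical, and the identifiers are assumed to be trusted, since they are not quoted or escaped.

```python
def build_copy_statement(source, target, columns):
    """Build an INSERT ... SELECT that lists the same columns, in the same order, on both sides.

    Assumption: source, target and columns are trusted identifiers (no quoting is applied).
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return f"INSERT INTO {target} ({column_list}) SELECT {column_list} FROM {source}"


# Hypothetical table and column names, for illustration only.
sql = build_copy_statement("old_table", "new_table", ["id", "name"])
print(sql)
```

In an Alembic migration, a string like this could be passed to op.execute() between creating the partitioned table and dropping the original one.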

index on persisted computed column slower than index on non-persisted computed column

we have a big table with a varchar(max) column containing elements that are 'inspected' with charindex. For instance:
select x from y where charindex('string',[varchar_max_field]) > 0
In order to speed that up, I created a computed column with the result of the charindex command. As a test, I created both a persisted and a non-persisted version of that column and created a nonclustered (nc) index for each, containing only the computed column:
CREATE TABLE [schema].[table] (
[other fields...]
[State] [NVARCHAR](MAX) NULL, /* Contains JSON information */
[NonPersistedColumn] AS (CHARINDEX('something',[State],(1))),
[PersistedColumn] AS (CHARINDEX('something',[State],(1))) PERSISTED )
CREATE NONCLUSTERED INDEX [ix_NonPersistedColumn] ON [schema].[table]
([NonPersistedColumn] ASC )
CREATE NONCLUSTERED INDEX [ix_PersistedColumn] ON [schema].[table]
([PersistedColumn] ASC )
Next,
SELECT TOP (50) [NonPersistedColumn] FROM [table] WHERE [NonPersistedColumn] > 0
uses an index seek on the index for the non-persisted column, as expected.
However,
SELECT TOP (50) [PersistedColumn] FROM [table] WHERE [PersistedColumn] > 0
uses the index of the non-persisted column (equal charindex logic, so ok) and performs an identical index seek.
If I force it to use the index on the persisted column, it falls back to a Key Lookup on the clustered index, with the table's ID column as the seek predicate and the [State] column in the query plan's Output List. I am only asking for the column in the nc index; it is not a covering query.
Why is it using the PK index (containing an ID column)?
Is that related to the PK always being added to the nc index?
In this way, there is little advantage in persisting the computed column, or am I missing something?
Links to the plans (persisted and non-persisted, respectively):
https://www.brentozar.com/pastetheplan/?id=S1zoLmEEs
https://www.brentozar.com/pastetheplan/?id=S1CHwmE4j
Link to the plan for the query on the persisted column without the index hint (uses the index on the non-persisted column)
https://www.brentozar.com/pastetheplan/?id=HJB6j7EVs

Using Index scan instead of seek with lookup

I have a table with the following structure:
CREATE TABLE Articles
(
id UNIQUEIDENTIFIER PRIMARY KEY,
title VARCHAR(60),
content VARCHAR(2000),
datePosted DATE,
srcImg VARCHAR(255),
location VARCHAR(255)
);
I then put a non clustered index on location:
CREATE NONCLUSTERED INDEX Articles_location
ON Articles (location);
Running a query like this one:
select a.content
from Articles a
where a.location = 'Japan, Tokyo';
results in an: "Index Scan (Clustered)"
Running another query like this :
select a.location
from Articles a
where a.location = 'Japan, Tokyo';
results in an: "Index Seek (NonClustered)"
So the nonclustered index is working. Why does it do a scan instead of a seek with a lookup when I select additional columns?
The total number of rows in the table is 200
The total amount of rows retrieved is 86 for this query
It looks like the query optimizer decides to scan the table instead of using the index because the predicate is not selective enough.
It may actually be faster to read the table directly than to seek in the index and then perform a Key Lookup for each match. This may no longer be the case if the table has more rows (> 10k). Here, 86 rows out of 200 is more than 40%.
select a.content from Articles a where a.location = 'Japan, Tokyo';
-- clustered index scan
select a.location from Articles a where a.location = 'Japan, Tokyo';
-- covering index
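The "more than 40%" figure is simply the fraction of the table that matches the predicate. A tiny Python sketch of that arithmetic follows; the function name is hypothetical, and the optimizer itself works from statistics and cost estimates rather than from a fixed percentage threshold.

```python
def matched_fraction(matching_rows, total_rows):
    """Fraction of the table's rows that satisfy the predicate."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return matching_rows / total_rows


# Row counts taken from the question: 86 matching rows out of 200.
fraction = matched_fraction(86, 200)
print(f"{fraction:.0%}")  # 43%
```

With roughly four rows in ten matching, a seek followed by one lookup per match would touch the clustered index many times, so a single scan is the cheaper plan here.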
Scans vs. Seeks
Thus, a seek is generally a more efficient strategy if we have a highly selective seek predicate; that is, if we have a seek predicate that eliminates a large fraction of the table.

how to define full text searching on table with two keys?

I have a table with 2 columns as primary key like below.
create table table1(key1 int NOT NULL,key2 int NOT NULL,content NVARCHAR(MAX), primary key(key1,key2))
I have created index on table with this query
CREATE unique INDEX index1 ON table1 (key1,key2);
and with this query, I create full-text searching
create fulltext index on table1 (content ) key index index1;
but I get this error because index must be single-column
'index1' is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, does not have a filter, and has maximum size of 900 bytes. Choose another index for the full-text key.
and with a single-column index, I get a duplicate-key error when I insert a new row.
What should I do?
I am using SQL Server and the EF ORM.
Update
I solved this problem by creating a computed column that returns unique data
ALTER TABLE Table1 ADD indexKey AS cast(key1 as float) + cast((cast(key2 as float)/power(10,len(key2))) as float) PERSISTED not null
and I created my index on this column, and it works fine.
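One thing worth checking with this formula is whether it really yields distinct values for distinct key pairs. The following is a rough Python emulation of the arithmetic, not the T-SQL itself. It assumes non-negative integer keys and assumes that len(key2) counts decimal digits; index_key is a hypothetical name. It suggests that values of key2 differing only by trailing zeros can collide:

```python
def index_key(key1, key2):
    """Emulate: cast(key1 as float) + cast(key2 as float) / power(10, len(key2)).

    Assumption: key1 and key2 are non-negative integers, and len() counts decimal digits.
    """
    digits = len(str(key2))
    return float(key1) + float(key2) / 10 ** digits


# key2 = 1 gives 1/10 and key2 = 10 gives 10/100, so both pairs map to the same value.
print(index_key(5, 1) == index_key(5, 10))  # True
print(index_key(5, 1) == index_key(5, 2))   # False
```

If such pairs can occur in the data, a dedicated single-column surrogate key (for example, an identity column with a unique index) may be a safer choice for the full-text key.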

Does SQL Server allow including a computed column in a non-clustered index? If not, why not?

When a column is included in non-clustered index, SQL Server copies the values for that column from the table into the index structure (B+ tree). Included columns don't require table look up.
If the included column is essentially a copy of original data, why does not SQL Server also allow including computed columns in the non-clustered index - applying the computations when it is copying/updating the data from table to index structure? Or am I just not getting the syntax right here?
Assume:
DateOpened is datetime
PlanID is varchar(6)
This works:
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(PlanID)
This does not work with left(PlanID, 3):
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(left(PlanID, 3))
or
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(left(PlanID, 3) as PlanType)
My use case is somewhat like below query.
select
case
when left(PlanID, 3) = '100' then 'Basic'
else 'Professional'
end as 'PlanType'
from
CustomerAccount
where
DateOpened between '2016-01-01 00:00:00.000' and '2017-01-01 00:00:00.000'
The query cares only for the left 3 of PlanID and I was wondering instead of computing it every time the query runs, I would include left(PlanID, 3) in the non-clustered index so the computations are done when the index is built/updated (fewer times) instead at the query time (frequently)
EDIT: We use SQL Server 2014.
As Laughing Vergil stated, you CAN index computed columns provided that they are persisted. You have a few options; here are a couple:
Option 1: Create the column as PERSISTED then index it
(or, in your case, include it in the index)
First the sample data:
CREATE TABLE dbo.CustomerAccount
(
PlanID int PRIMARY KEY,
DateOpened datetime NOT NULL,
First3 AS LEFT(PlanID,3) PERSISTED
);
INSERT dbo.CustomerAccount (PlanID, DateOpened)
VALUES (100123, '20160114'), (100999, '20151210'), (255657, '20150617');
and here's the index:
CREATE NONCLUSTERED INDEX nc_CustomerAccount ON dbo.CustomerAccount(DateOpened)
INCLUDE (First3);
Now let's test:
-- Note: IIF is available for SQL Server 2012+ and is cleaner
SELECT PlanID, PlanType = IIF(First3 = 100, 'Basic', 'Professional')
FROM dbo.CustomerAccount;
Execution Plan:
As you can see, the optimizer picked the nonclustered index.
Option #2: Perform the CASE logic inside your table DDL
First the updated table structure:
DROP TABLE dbo.CustomerAccount;
CREATE TABLE dbo.CustomerAccount
(
PlanID int PRIMARY KEY,
DateOpened datetime NOT NULL,
PlanType AS
CASE -- NOTE: casting as varchar(12) will make the column a varchar(12) column:
WHEN LEFT(PlanID,3) = 100 THEN CAST('Basic' AS varchar(12))
ELSE 'Professional'
END
PERSISTED
);
INSERT dbo.CustomerAccount (PlanID, DateOpened)
VALUES (100123, '20160114'), (100999, '20151210'), (255657, '20150617');
Notice that I use CAST to assign the data type; the table will be created with this column as varchar(12).
Now the index:
CREATE NONCLUSTERED INDEX nc_CustomerAccount ON dbo.CustomerAccount(DateOpened)
INCLUDE (PlanType);
Let's test again:
SELECT DateOpened, PlanType FROM dbo.CustomerAccount;
Execution plan:
... again, it used the nonclustered index
A third option, which I don't have time to go into, would be to create an indexed view. This would be a good option for you if you were unable to change your existing table structure.
SQL Server 2014 allows creating indexes on computed columns, but you're not doing that -- you're attempting to create the index directly on an expression. This is not allowed. You'll have to make PlanType a column first:
ALTER TABLE dbo.CustomerAccount ADD PlanType AS LEFT(PlanID, 3);
And now creating the index will work just fine (if your SET options are all correct, as outlined here):
CREATE INDEX ixn_DateOpened_CustomerAccount ON CustomerAccount(DateOpened) INCLUDE (PlanType)
It is not required that you mark the column PERSISTED. This is required only if the column is not precise, which does not apply here (this is a concern only for floating-point data).
Incidentally, the real benefit of this index is not so much that LEFT(PlanID, 3) is precalculated (the calculation is inexpensive), but that no clustered index lookup is needed to get at PlanID. With an index only on DateOpened, a query like
SELECT PlanType FROM CustomerAccount WHERE DateOpened >= '2012-01-01'
will result in an index seek on CustomerAccount, followed by a clustered index lookup to get PlanID (so we can calculate PlanType). If the index does include PlanType, the index is covering and the extra lookup disappears.
This benefit is relevant only if the index is truly covering, however. If you select other columns from the table, an index lookup is still required and the included computed column is only taking up space for little gain. Likewise, suppose that you had multiple calculations on PlanID or you needed PlanID itself as well -- in this case it would make much more sense to include PlanID directly rather than PlanType.
Computed columns are allowed in indexes if they are persisted, that is, if the data is written to the table; a non-persisted computed column can be indexed only when its expression is deterministic and precise. If the value is neither persisted nor indexed, it isn't calculated or available until the column is queried.
