Neo4j: Step-by-step creation of an automatic index

I am creating a new Neo4j database. I have a type of node called User, and I would like an index on the User properties Identifier and EmailAddress. How does one go about setting up an index when the database is new? I have noticed that the neo4j.properties file appears to support creating indexes. However, when I set them like so:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=EmailAddress,Identifier
and then add a node and run a query to find an Identifier that I know exists:
START n=node:Identifier(Identifier = "USER0")
RETURN n;
then I get:
MissingIndexException: Index `Identifier` does not exist
How do I create an index and use it in a START query? I only want to use config files and Cypher to achieve this, since at present I am only experimenting in the Power Tool Console.

Add the following to the neo4j.properties file
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=EmailAddress,Identifier
Create the auto index for nodes
neo4j-sh (0)$ index --create node_auto_index -t Node
Check that it exists
neo4j-sh (0)$ index --indexes
Should return
Node indexes:
node_auto_index
When querying, use the following syntax to specify the index:
start a = node:node_auto_index(Identifier="USER0")
return a;
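Because the node is auto-indexed, the index is named node_auto_index rather than after the property, which is why node:Identifier(...) fails with MissingIndexException.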
This information came from a comment at the bottom of this page
Update
In case you want to index existing data that was there before automatic indexing was turned on, set each property to its own value so that the auto-indexer sees a write (where Property_Name is the name of the property to index):
START nd = node(*)
WHERE has(nd.Property_Name)
WITH nd
SET nd.Property_Name = nd.Property_Name
RETURN count(nd);
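To make the auto-indexing steps above concrete, here is a toy Python model. It is not Neo4j's implementation, and the class and method names are hypothetical. It assumes the behaviour described in this answer: there is a single index named node_auto_index, it only covers the keys listed in node_keys_indexable, and it is only updated when a property is written. That explains why the lookup must name node_auto_index rather than the property, and why re-setting a property to its own value backfills nodes created before auto-indexing was switched on.

```python
from collections import defaultdict


class ToyAutoIndexedGraph:
    """Hypothetical toy model of Neo4j's legacy node auto-index (not Neo4j code)."""

    def __init__(self, indexable_keys, auto_indexing=True):
        self.auto_indexing = auto_indexing          # like node_auto_indexing
        self.indexable_keys = set(indexable_keys)   # like node_keys_indexable
        self.nodes = {}                             # node id -> {key: value}
        # A single index called "node_auto_index": key -> value -> node ids.
        self.indexes = {"node_auto_index": defaultdict(lambda: defaultdict(set))}

    def set_property(self, node_id, key, value):
        props = self.nodes.setdefault(node_id, {})
        old = props.get(key)
        props[key] = value
        # The auto-indexer only reacts to writes, and only for configured keys.
        if self.auto_indexing and key in self.indexable_keys:
            index = self.indexes["node_auto_index"][key]
            if old is not None:
                index[old].discard(node_id)
            index[value].add(node_id)

    def lookup(self, index_name, key, value):
        # Rough equivalent of START n=node:<index_name>(<key>="<value>")
        if index_name not in self.indexes:
            raise KeyError(f"Index `{index_name}` does not exist")
        return set(self.indexes[index_name][key].get(value, set()))


g = ToyAutoIndexedGraph(["EmailAddress", "Identifier"], auto_indexing=False)
g.set_property(1, "Identifier", "USER0")  # written before auto-indexing was on
g.auto_indexing = True
print(g.lookup("node_auto_index", "Identifier", "USER0"))  # set(): not indexed yet

# Backfill, like SET nd.Identifier = nd.Identifier for every node that has it.
for node_id, props in list(g.nodes.items()):
    if "Identifier" in props:
        g.set_property(node_id, "Identifier", props["Identifier"])
print(g.lookup("node_auto_index", "Identifier", "USER0"))  # {1}
```

The model keeps one nested dict per property key, so a lookup is a direct dict access instead of a scan over all nodes.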

Indexes are mainly created on properties used in WHERE conditions. In Neo4j 2.0, indexes are easy to create.
Create index on a label
CREATE INDEX ON :Person(name)
Drop index on a label
DROP INDEX ON :Person(name)
Create uniqueness constraint
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
Drop uniqueness constraint
DROP CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
To list all indexes and constraints in the Neo4j Browser, the following command is useful:
:schema
List indexes and constraints for a specific label with:
:schema ls -l :YourLabel

In Neo4j 2.0, you should use labels and the new constraints instead:
CREATE CONSTRAINT ON (n:User) ASSERT n.Identifier IS UNIQUE
CREATE CONSTRAINT ON (n:User) ASSERT n.EmailAddress IS UNIQUE
If email isn't unique per user, just create a plain index instead:
CREATE INDEX ON :User(EmailAddress)
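The rule in this answer (unique property: constraint; non-unique property: plain index) can be expressed as a small Python helper that emits the Neo4j 2.0 statements shown above. The function name and its dict argument are hypothetical. The statements use the 2.0 syntax from this answer, which later Neo4j versions have replaced, and actually running them (in the Browser or through a driver) is not shown.

```python
def schema_statements(label, properties):
    """Build Neo4j 2.0 schema statements for one label (hypothetical helper).

    `properties` maps a property name to True when its value is unique per node.
    Unique properties get a uniqueness constraint, which Neo4j backs with an
    index; the others get a plain label index.
    """
    statements = []
    for prop, unique in properties.items():
        if unique:
            statements.append(f"CREATE CONSTRAINT ON (n:{label}) ASSERT n.{prop} IS UNIQUE")
        else:
            statements.append(f"CREATE INDEX ON :{label}({prop})")
    return statements


for statement in schema_statements("User", {"Identifier": True, "EmailAddress": False}):
    print(statement)
# CREATE CONSTRAINT ON (n:User) ASSERT n.Identifier IS UNIQUE
# CREATE INDEX ON :User(EmailAddress)
```

Because a uniqueness constraint already creates an index, there is no need to add a separate CREATE INDEX for a property that is constrained.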

Related

Postgres UPDATE: add to an array and, if it has more than 5 elements, remove the last

I have a little program that collects local news headlines from all over a country. It should collect the top headline every day in an array, and if the array has more than 5 headlines, it should remove the oldest one and add the newest one at the top.
Here's the table:
CREATE TABLE place (
name text PRIMARY KEY,
coords text,
headlines json[]
);
The headlines array holds JSON objects with a timestamp and a headline property, which are upserted like this:
insert into place VALUES ('giglio','52.531677;13.381777',
ARRAY[
'{"timestamp":"2012-01-13T13:37:27+00:00","headline":"costa concordia sunk"}'
]::json[])
ON CONFLICT ON CONSTRAINT place_pkey DO
UPDATE SET headlines = place.headlines || EXCLUDED.headlines
But obviously, once it reaches 5 elements, it just keeps growing. So is there a way to add these headlines and limit the array to 5?
So is there a way to add these headlines and limit them to 5?
I believe so.
You can declare a maximum array size (see section 8.15.1 at https://www.postgresql.org/docs/current/arrays.html#ARRAYS-DECLARATION) like this:
headlines json[5]
However, the current implementation of Postgres does not enforce it (it is still worth declaring, for future compatibility and a clearer data model definition).
So I'd check whether a CHECK constraint helps here:
headlines json[5] CHECK (array_length(headlines, 1) <= 5)
This should give you a basic consistency check. From here there are two ways to continue (which seem out of the scope of this question):
Catch the PG exception on your app layer, clean up the data, and try inserting it again
Implement a function in your DB schema, that would attempt insert and cleanup.
Here's how I ended up doing it:
insert into place VALUES ('giglio','52.531677;13.381777',
ARRAY[
'{"timestamp":"2012-01-13T13:37:27+00:00","headline":"costa concordia sunk"}'
]::json[])
ON CONFLICT ON CONSTRAINT place_pkey DO
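UPDATE SET headlines = EXCLUDED.headlines || place.headlines[1:4]
RETURNING *
Postgres arrays are 1-based by default, so place.headlines[1:4] keeps the four most recent existing headlines, and prepending EXCLUDED.headlines puts the new one at the top, dropping the oldest once there are five.
For an explanation of EXCLUDED, see: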
https://www.postgresql.org/docs/9.5/sql-insert.html
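A small Python model can be used to sanity-check the intended behaviour: newest headline at the top, at most five kept, oldest dropped. The helper name is hypothetical. Python lists are 0-based while Postgres arrays are 1-based by default, so Python's existing[:4] corresponds to place.headlines[1:4] in Postgres, and prepending the new element corresponds to EXCLUDED.headlines || place.headlines[1:4].

```python
def upsert_headline(existing, new_headline, limit=5):
    """Return a newest-first list of at most `limit` headlines (hypothetical helper).

    Models: SET headlines = EXCLUDED.headlines || place.headlines[1:limit - 1]
    existing[: limit - 1] is the Python (0-based) version of that 1-based slice.
    """
    if existing is None:  # first insert for this place: no row yet
        existing = []
    return [new_headline] + existing[: limit - 1]


history = None
for day in range(1, 8):
    history = upsert_headline(history, {"timestamp": f"day {day}", "headline": f"news {day}"})

print([h["headline"] for h in history])
# ['news 7', 'news 6', 'news 5', 'news 4', 'news 3']
```

Appending instead (old[:4] + [new]) would keep the four oldest headlines forever and keep replacing only the newest slot, which is the failure mode worth testing for.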

Indexing in room prepackaged database not working

I have a table in a prepackaged database whose index I created using this query:
CREATE INDEX index_less_equal_to_L ON entries_less_equal_to_L(entry_word);
And I specified it in my Room entity like this:
@Entity(tableName = "entries_less_equal_to_L", indices = {@Index("index_less_ equal_to_L"), @Index(value = "entry_word")})
But it doesn't work; it shows me this:
[build error output]: https://i.stack.imgur.com/lQmtp.png
What am I doing wrong?
You are trying to create two indices, one for each @Index.
The first will be named index_less_ equal_to_L but will have no columns (hence the error message).
The second will be created (successfully) with a generated name, on the entry_word column. Per the Room documentation, the name of the index will be:
If not set, Room will set it to the list of columns joined by '_' and prefixed by "index_${tableName}". So if you have a table with name "Foo" and with an index of {"bar", "baz"}, generated index name will be "index_Foo_bar_baz".
https://developer.android.com/reference/androidx/room
So you need to combine the two into a single index.
You could use either of these (note that I removed the space between _ and equal_to_L, as I assume it is a typo):
indices = {@Index(name = "index_less_equal_to_L", value = "entry_word")}
or
indices = {@Index(name = "index_less_equal_to_L", value = {"entry_word"})}
The latter is more flexible, as it allows composite indices (multiple columns).
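The default naming rule quoted from the Room documentation can be written as a one-line Python function (a hypothetical helper, not part of Room). It shows that the unnamed @Index on entry_word in the question would get the name index_entries_less_equal_to_L_entry_word, which differs from the index_less_equal_to_L index created in the prepackaged database.

```python
def room_default_index_name(table_name, columns):
    """Default Room index name: "index_" + table name, then the columns, all joined by "_".

    Hypothetical helper that follows the rule quoted from the Room docs.
    """
    return "_".join(["index", table_name, *columns])


print(room_default_index_name("Foo", ["bar", "baz"]))  # index_Foo_bar_baz
print(room_default_index_name("entries_less_equal_to_L", ["entry_word"]))
# index_entries_less_equal_to_L_entry_word
```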

Single-column or multi-column index(es)?

I have a table with 3 columns: "Id", "A", "B".
All of them are searchable. Id is an identity column used only to look up exact rows, so that case is clear. But I have doubts about "A" and "B". I have 3 search cases in my application: search by "A", search by "B", and search by "A" and "B" simultaneously. So I'm not sure which index type to choose. Should I use two single-column indexes or one multi-column index? Or is it better to combine single-column indexes with a multi-column one (3 indexes in total)? I don't really care about INSERT/UPDATE/DELETE duration; my priority is to make SELECT as fast as possible.
I use SQL Server 2017.
Thank you.
I think two additional indexes will be enough:
CREATE INDEX IDX_YourTable_AB ON YourTable(A,B) -- put first the column with more distinct values
CREATE INDEX IDX_YourTable_B ON YourTable(B) INCLUDE (A)
If the table has other columns, you can add them as included columns:
CREATE INDEX IDX_YourTable_AB ON YourTable(A,B) INCLUDE(C,D,E,...)
CREATE INDEX IDX_YourTable_B ON YourTable(B) INCLUDE(A,C,D,E,...)
Index IDX_YourTable_AB may be used for conditions such as WHERE A='...', WHERE A='...' AND B='...', or WHERE A LIKE '...%' AND B='...', i.e. filters on A alone or on A and B together.
Index IDX_YourTable_B may be used for conditions on B alone (WHERE B='...' or WHERE B LIKE '...%').
Also try testing CREATE INDEX IDX_YourTable_BA ON YourTable(B,A) instead of CREATE INDEX IDX_YourTable_B ON YourTable(B) INCLUDE (A). It may perform better.
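The reasoning behind these two indexes is the leftmost-prefix rule: a composite B-tree index can only seek on a leading run of its key columns. The Python sketch below is a simplified model of that rule, with hypothetical helper names; it is not SQL Server's optimizer, which also weighs statistics, covering columns and scan costs. For each of the three search cases it picks the index whose key prefix matches the most equality-filtered columns.

```python
def usable_prefix(index_columns, equality_columns):
    """Count leading index key columns that the WHERE clause pins with equality."""
    count = 0
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap in the prefix stops the seek here
        count += 1
    return count


def best_index(indexes, equality_columns):
    """Return the index with the longest usable key prefix, or None if none can seek."""
    name, columns = max(indexes.items(), key=lambda item: usable_prefix(item[1], equality_columns))
    return name if usable_prefix(columns, equality_columns) > 0 else None


# Key columns of the two suggested indexes (INCLUDE columns are not key columns).
INDEXES = {
    "IDX_YourTable_AB": ["A", "B"],
    "IDX_YourTable_B": ["B"],
}

for filtered in ({"A"}, {"B"}, {"A", "B"}):
    print(sorted(filtered), "->", best_index(INDEXES, filtered))
# ['A'] -> IDX_YourTable_AB
# ['B'] -> IDX_YourTable_B
# ['A', 'B'] -> IDX_YourTable_AB
```

With only IDX_YourTable_AB, a search on B alone has no usable prefix, which is why the second index on B is suggested.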

auto increment field in Peewee

Is there a way to define an autoincrement field in peewee?
I understand we could define a sequence, but having to create the sequence manually, outside of create_tables, deters me from using it. (The build process is managed by create_tables, and I would prefer not to add manual steps.)
import peewee
class TestModel(peewee.Model):
    test_id = peewee.BigIntegerField(sequence='test_id_seq')
Instead of the code above, I would rather have something like the following. As most databases have a serial field type, I don't see the point of maintaining a sequence.
import peewee
class TestModel(peewee.Model):
    test_id = peewee.AutoIncrementIntField()  # hypothetical field type I wish existed
You can either use PrimaryKeyField(), as @wyatt mentioned in a comment,
or use Playhouse signal support (peewee extensions):
from peewee import IntegerField, fn
from playhouse.signals import Model, pre_save
class MyModel(Model):
    data = IntegerField()
    temp_id = IntegerField(null=True)  # filled in by the handler below
@pre_save(sender=MyModel)
def on_save_handler(model_class, instance, created):
    # Only number new rows; leave temp_id alone on later updates.
    if created:
        # Find the current max temp_id (None on an empty table) and add one.
        current_max = MyModel.select(fn.MAX(MyModel.temp_id)).scalar()
        instance.temp_id = (current_max or 0) + 1
The answers given here are outdated, but this was still my first Google search result.
Peewee has a special field type for an auto-incrementing primary key called AutoField:
The AutoField is used to identify an auto-incrementing integer primary
key. If you do not specify a primary key, Peewee will automatically
create an auto-incrementing primary key named “id”.
Take a look at the documentation. Example usage:
from peewee import AutoField, Model
class Event(Model):
    event_id = AutoField()  # Event.event_id will be an auto-incrementing PK.

clone some relationships according to a condition

I exported two tables, Keys and Acc, from SQL Server as CSV files and successfully imported them into Neo4j using the commands below.
CREATE INDEX ON :Keys(IdKey)
USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:///C:/Keys.txt' AS line
MERGE (k:Keys { IdKey: line[0] })
SET k.KeyNam=line[1], k.KeyLib=line[2], k.KeyTyp=line[3], k.KeySubTyp=line[4]
USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:///C:/Acc.txt' AS line
MERGE (callerObject:Keys { IdKey : line[0] })
MERGE (calledObject:Keys { IdKey : line[1] })
MERGE (callerObject)-[rc:CALLS]->(calledObject)
SET rc.AccKnd=line[2], rc.Prop=line[3]
Keys holds the source code objects, and Acc holds the relations among them. I imported these two tables three times, for three different application projects. To keep the IdKey property unique across the three applications, I prepended a five-character application prefix to IdKey while exporting from SQL Server, because, as I learned from the manuals, an index cannot be created on multiple properties. Now my aim is to construct the relations among applications. For example:
Node1 is a source code object of Application1
Node2 is another source code object of Application1
Node3 is a source code object of Application2
There is already a CALLS relationship from Node1 to Node2, created from a record in Acc that was already imported.
The name of Node2 is equal to the name of Node3, so Node2 and Node3 are in fact the same source code, and we should create a relationship from Node1 to Node3. To do this I wrote the command below, but I want to be sure that it is correct, because I do not know how long it will take to execute.
MATCH (caller:Keys)-[rel:CALLS]->(called:Keys),(calledNew:Keys)
WHERE calledNew.KeyNam = called.KeyNam
and calledNew.IdKey <> called.IdKey
CREATE (caller)-[:CALLS]->(calledNew)
The following query should be efficient, assuming you also create an index on :Keys(KeyNam).
MATCH (caller:Keys)-[rel:CALLS]->(called:Keys)
WITH caller, COLLECT(called.KeyNam) AS names
MATCH (calledNew:Keys)
WHERE calledNew.KeyNam IN names AND NOT (caller)-[:CALLS]->(calledNew)
CREATE (caller)-[:CALLS]->(calledNew)
Cypher will not use an index when doing comparisons directly between property values. So this query puts all the called names for each caller into a names collection, and then does a comparison between calledNew.KeyNam and the items in that collection. This causes the index to be used, and will speed up the identification of potential duplicate called nodes.
This query also does a NOT (caller)-[:CALLS]->(calledNew) check, to avoid creating duplicate relationships between the same nodes.
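To check what the query does before running it on the real graph, here is a Python model of the same steps. It assumes the graph can be represented as a dict of IdKey to KeyNam plus a set of existing CALLS pairs; the function name, node names and the APP01/APP02 prefixes are hypothetical. The by_name dict plays the role of the index on :Keys(KeyNam), so each collected name is a direct lookup rather than a comparison against every node.

```python
from collections import defaultdict


def clone_calls(key_names, calls):
    """Return the (caller, calledNew) pairs the Cypher query would create.

    key_names: IdKey -> KeyNam for every :Keys node.
    calls: set of existing (caller IdKey, called IdKey) CALLS relationships.
    """
    # Stand-in for the index on :Keys(KeyNam): name -> IdKeys with that name.
    by_name = defaultdict(set)
    for id_key, name in key_names.items():
        by_name[name].add(id_key)

    # WITH caller, COLLECT(called.KeyNam) AS names
    names_per_caller = defaultdict(set)
    for caller, called in calls:
        names_per_caller[caller].add(key_names[called])

    new_calls = set()
    for caller, names in names_per_caller.items():
        for name in names:                      # calledNew.KeyNam IN names
            for called_new in by_name[name]:    # lookup via the name "index"
                # NOT (caller)-[:CALLS]->(calledNew): skip existing relationships
                if (caller, called_new) not in calls:
                    new_calls.add((caller, called_new))
    return new_calls


# Hypothetical IdKeys with five-character application prefixes.
nodes = {"APP01Node1": "Foo", "APP01Node2": "Bar", "APP02Node3": "Bar"}
existing = {("APP01Node1", "APP01Node2")}
print(clone_calls(nodes, existing))  # {('APP01Node1', 'APP02Node3')}
```

Running the model a second time with the new pair added to the existing relationships returns an empty set, which mirrors how the NOT check keeps the Cypher query from creating duplicates on a rerun.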
