Does Django automatically generate indexes for foreign keys, or does that depend on the underlying database's policy?
Django automatically creates an index for all models.ForeignKey columns.
From the Django documentation:
A database index is automatically created on the ForeignKey. You can disable this by setting db_index to False. You may want to avoid the overhead of an index if you are creating a foreign key for consistency rather than joins, or if you will be creating an alternative index like a partial or multiple column index.
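For example, on PostgreSQL a model with a plain ForeignKey produces schema roughly along these lines (table and index names are illustrative; Django's real index names carry a hash suffix):

    CREATE TABLE app_document (
        id       serial PRIMARY KEY,
        owner_id integer NOT NULL REFERENCES auth_user (id)
    );

    -- Added by Django unless the field is declared with db_index=False:
    CREATE INDEX app_document_owner_id ON app_document (owner_id);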
Related
Given a distributed system which is persisting records with 'url' as the primary key. Since multiple servers are collecting data, the URL is a convenient and accurate means of guaranteeing uniqueness. Our system currently queries documents by URL as frequently as 10,000 times per minute.
We would like to add another unique key, a 'uuid', so that we can refer to resources as:
http://example.com/fju98hfhsiu
Rather than, for example:
http://example.com/?u=http%3A%2F%2Fthis.is.a.long.url.com%2Fthis_is%2Fa%2Fpagewitha%2Flong-url.html
It seems that creating a secondary index on UUIDs is not ideal in Cassandra. Is there any way to avoid creating a secondary index on UUIDs in Cassandra?
Let's start with the fact that the best practice, and the main pattern, of Cassandra is to create tables for your queries, not queries for your tables; if you need to create an index on a table, that is almost automatically an anti-pattern. Based on this, the simplest solution is just to use two tables with two keys, as sketched below.
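A minimal CQL sketch of the two-table approach (table and column names are hypothetical; the application writes each record to both tables, typically in a logged batch):

    CREATE TABLE docs_by_url (
        url  text PRIMARY KEY,
        uuid text,
        body text
    );

    CREATE TABLE docs_by_uuid (
        uuid text PRIMARY KEY,
        url  text,
        body text
    );

    -- Each lookup pattern hits its own table by partition key:
    SELECT body FROM docs_by_url  WHERE url  = 'http://this.is.a.long.url.com/this_is/a/pagewitha/long-url.html';
    SELECT body FROM docs_by_uuid WHERE uuid = 'fju98hfhsiu';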
In your case, the "uuid" is not a UUID; it is, I believe, some concatenation of the domain and a hash of the rest of the URL. If your application can generate this key at request time, you can just use it as the partition key and the full URL as the clustering key.
Also, if there are no hot domains (for example http://example.com), you can use the domain as the partition key and the hash and long URL as clustering keys, creating materialized views to support the different queries.
In the end, just add the secondary index and measure the performance impact in your specific case. If it works for you, and you don't want to deal with two tables, materialized views, etc., just use it.
We are using Hazelcast as an in-memory data grid. We want to extend it for analytics using in-memory computation. I have a few questions regarding this:
Which data structure should we use? (I do not have a primary key, since the table is denormalized, and there is a huge amount of data.)
If IMap is the only option, can we use a composite key or a dummy key, which would need to support indexes and predicates?
Or is this not the right use case, i.e. Hazelcast cannot be used for analytics?
You can generate random keys with UUID::randomUUID, or you can create composite keys. Indexes can be created over both values and keys (for keys, use the magic keyword __key# and append the property of the key you're interested in).
Predicates use the same keyword if you're querying against a composite-key property; otherwise just query as you would for any other data.
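As an aside, recent Hazelcast releases also expose IMaps through a SQL engine, where properties of the entry key are mapped via the __key path. A hedged sketch (the map name analytics, the json-flat formats, and all field names are assumptions, not anything from the question):

    CREATE MAPPING analytics (
        region VARCHAR EXTERNAL NAME "__key.region",
        seq    BIGINT  EXTERNAL NAME "__key.seq",
        amount DECIMAL
    )
    TYPE IMap
    OPTIONS ('keyFormat' = 'json-flat', 'valueFormat' = 'json-flat');

    SELECT region, amount FROM analytics WHERE region = 'EMEA';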
We are trying to enforce a unique table constraint on certain data tables in SQL Server, which I have working, but I am running into a few issues. I want the index to be ordered by the primary key, but if I include the primary key in the index keys, it no longer enforces uniqueness, because every row obviously has a unique ID by virtue of being the primary key.
If I remove the ID from the indexed keys, the constraint works as it is supposed to, but it no longer sorts by the primary key, which is what I want; it sorts by another one of the columns.
How do I include the primary key in the constraint so I can use it for sorting, but have it be ignored when checking the table constraint for uniqueness (i.e., it should still reject a new record whose fields all match an existing row except for the ID)?
UPDATE: How do I handle a situation where a table has more columns than can be put into an index? Can I not prevent duplicate entries in such tables?
A relational database is built on set theory and predicate logic, and according to set theory there is no difference between the sets A = {1,2,3} and B = {2,3,1}.
This is the reason no RDBMS guarantees that results will come back in any particular order.
But you will get rows in your desired order when you explicitly provide an ORDER BY in the SELECT statement.
So you are better off sorting in the front end, or by adding an ORDER BY clause to your query.
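A T-SQL sketch of that combination (table and column names are hypothetical): keep the ID out of the unique index so duplicates are still rejected, and request the ordering per query:

    CREATE TABLE dbo.Widget (
        ID    int IDENTITY(1,1) PRIMARY KEY,
        Name  varchar(50) NOT NULL,
        Color varchar(20) NOT NULL
    );

    -- Uniqueness is enforced over the data columns only, not the ID:
    CREATE UNIQUE INDEX UX_Widget_Data ON dbo.Widget (Name, Color);

    -- Sorting is a property of the query, not of the constraint:
    SELECT ID, Name, Color FROM dbo.Widget ORDER BY ID;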
Do database engines utilize foreign keys transparently, or should a query use them explicitly?
Based on my experience, there is no explicit notion of using a foreign key in a query; there is only a constraint that maintains the uniqueness of the key, and the fact that the key (a single field or a group of fields) is a key, which makes searching efficient.
To clarify why this matters, here is an example: I have a middleware (ArcGIS, in my case), for which I can control the back-end database (so I can create keys, indices, etc.), and I usually use the front end (a RESTful API here). The middleware itself is a black box that is supposed to provide effective tools for taking advantage of the underlying DBMS's capabilities. So what I want to understand is: if I build foreign key constraints, and issue queries that, implemented normally, would translate into queries using those foreign keys, should I see performance improvements?
Is that generally the case, or do various engines handle it differently? (I am using PostgreSQL.)
Foreign keys aren't there to improve performance. They're there to enforce data integrity. They will decrease performance for inserts/updates/deletes, but they make no difference to queries.
Some DBMSs will automatically add an index to the foreign key field, which may be where the confusion is coming from. Postgres does not do this; you'll need to create the index yourself. (And yes, the database will use this index transparently.)
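A PostgreSQL sketch (hypothetical tables): the referenced side gets an index through its primary key, while the referencing column must be indexed by hand:

    CREATE TABLE customers (
        id   serial PRIMARY KEY,   -- indexed automatically via the PK
        name text
    );

    CREATE TABLE orders (
        id          serial PRIMARY KEY,
        customer_id integer NOT NULL REFERENCES customers (id)
    );

    -- Not created automatically; needed for fast joins and for FK checks
    -- when rows in customers are deleted or their keys updated:
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);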
As far as I know, database engines need specific queries to use foreign keys; you have to write some sort of join query to get data from related tables.
However, some data access frameworks hide the complexity of accessing data through foreign keys by providing transparent access to related tables, but I am not sure that provides much improvement in performance.
This depends entirely on the database engine.
In PostgreSQL, constraints won't cause performance improvements directly; only indexes will do that.
CREATE INDEX is a PostgreSQL language extension. There are no provisions for indexes in the SQL standard.
However, adding certain constraints will automatically create an index on the affected column(s); for example, UNIQUE and PRIMARY KEY constraints each create a btree index on the affected column(s).
The FOREIGN KEY constraint won't create indexes on the referencing column(s), but:
A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index (the one underlying the primary key or unique constraint); so checks on whether a referencing row has a match will be efficient. Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.
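To make the asymmetry concrete (hypothetical tables; PostgreSQL names the automatic indexes roughly as shown in the comments):

    CREATE TABLE account (
        id    serial PRIMARY KEY,  -- btree index account_pkey created automatically
        email text UNIQUE          -- btree index account_email_key created automatically
    );

    CREATE TABLE login (
        account_id integer REFERENCES account (id)  -- no index created here
    );

    -- Indexing the referencing column is a separate, manual step:
    CREATE INDEX login_account_id_idx ON login (account_id);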
In tables where you need only one column as the key, and the values in that column can be integers, when shouldn't you use an identity field?
Conversely, for the same table and column, when would you generate its values manually rather than using an autogenerated value for each record?
I guess that would be the case when there are lots of inserts and deletes on the table. Am I right? What other situations could there be?
If you have already settled on the surrogate side of the Great Primary Key Debacle, then I can't find a single reason not to use identity keys. The usual alternatives are GUIDs (which have many disadvantages, primarily their size and randomness) and application-layer generated keys. But creating a surrogate key in the application layer is a little bit harder than it seems, and it also does not cover non-application data access (i.e. batch loads, imports, other apps, etc.). The one special case is distributed applications, where GUIDs, and even sequential GUIDs, may offer a better alternative than site-id + identity keys.
I suppose if you are creating a many-to-many linking table, where both fields are foreign keys, you don't need an identity field.
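A sketch of such a linking table (hypothetical names): the pair of foreign keys serves as the composite primary key, so no identity column is needed:

    CREATE TABLE student (id int PRIMARY KEY);
    CREATE TABLE course  (id int PRIMARY KEY);

    CREATE TABLE student_course (
        student_id int NOT NULL REFERENCES student (id),
        course_id  int NOT NULL REFERENCES course (id),
        PRIMARY KEY (student_id, course_id)   -- enforces uniqueness of the pair
    );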
Nowadays I imagine that most ORMs expect there to be an identity field in every table. In general, it is a good practice to provide one.
I'm not sure I understand enough about your context, but I interpret your question to be:
"If I need the database to create a unique column (for whatever reason), when shouldn't it be a monotonically increasing integer (identity) column?"
In those cases, there's no reason to use anything other than the facility provided by the DBMS for the purpose; in your case (SQL Server?) that's an identity.
Except:
If you'll ever need to merge the table with data from another source, use a GUID, which will prevent duplicate keys from colliding.
If you need to merge databases it's a lot easier if you don't have to regenerate keys.
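A T-SQL sketch of that exception (names hypothetical): a GUID key generated by the server, so rows from different databases will not collide on merge:

    CREATE TABLE Customer (
        CustomerID uniqueidentifier NOT NULL
            CONSTRAINT DF_Customer_ID DEFAULT NEWSEQUENTIALID()
            CONSTRAINT PK_Customer PRIMARY KEY,
        -- NEWID() gives fully random GUIDs; NEWSEQUENTIALID() reduces
        -- index fragmentation at the cost of predictability.
        Name       varchar(100) NOT NULL
    );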
One case where you would not want an identity field is a one-to-one relationship: the secondary table would have as its primary key the same value as the primary table. The only reason to have an identity field in that situation would be to satisfy an ORM.
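For example (hypothetical tables), the secondary table simply reuses the primary table's key as both its primary key and a foreign key:

    CREATE TABLE person (
        id   int PRIMARY KEY,
        name varchar(100) NOT NULL
    );

    CREATE TABLE person_detail (
        person_id int PRIMARY KEY REFERENCES person (id),  -- same value as person.id
        bio       varchar(1000)
    );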
You cannot (normally) specify values when inserting into identity columns, so, for example, if the column "id" were specified as an identity, the following SQL would fail:
INSERT INTO MyTable (id, name) VALUES (1, 'Smith')
In order to perform this sort of insert you need to turn IDENTITY_INSERT ON for that table; it is not intended to be on normally, and it can only be ON for at most one table at a time in a session.
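A minimal sketch using the table from the answer above:

    SET IDENTITY_INSERT MyTable ON;
    INSERT INTO MyTable (id, name) VALUES (1, 'Smith');  -- now allowed
    SET IDENTITY_INSERT MyTable OFF;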
If I need a surrogate, I would either use an IDENTITY column or a GUID column depending on the need for global uniqueness.
If there is a natural primary key, or the primary key is defined as a unique combination of other foreign keys, then I typically do not have an IDENTITY, nor do I use it as the primary key.
There is an exception: snapshot configuration tables, which I track with an audit trigger. In this case there is usually a logical "primary key" (usually the date of the snapshot plus the natural key of the row, like a cost center or GL account number for which the row is a configuration record). But instead of using that natural "primary key" as the primary key, I add an IDENTITY, make it the primary key, and put a unique index or constraint on the date and natural key. Although in theory the date and natural key shouldn't change, if a user changes them instead of adding a new row and deleting the old one, I want the audit (which reflects a change to a row identified by its primary key) to really reflect a change in the row, not the disappearance of one key and the appearance of a new one.
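A T-SQL sketch of that snapshot pattern (names hypothetical): the surrogate identity is the primary key the audit trigger sees, while the logical key is protected by a unique constraint:

    CREATE TABLE CostCenterSnapshot (
        SnapshotID   int IDENTITY(1,1) PRIMARY KEY,  -- what the audit trail keys on
        SnapshotDate date          NOT NULL,
        CostCenter   varchar(20)   NOT NULL,
        Budget       decimal(12,2) NOT NULL,
        CONSTRAINT UX_CostCenterSnapshot UNIQUE (SnapshotDate, CostCenter)
    );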
I recently implemented a suffix trie in C# that could index novels and then allow searches to be done extremely fast, linear in the size of the search string. Part of the requirements (this was a homework assignment) was to use offline storage, so I used MS SQL and needed a structure to represent a node in a table.
I ended up with the following structure: NodeID, Character, ParentID, etc., where NodeID was the primary key.
I didn't want this to be an autoincrementing identity, for two main reasons:
How do I get the value of a NodeID after I add it to the database/data table?
I wanted more control when it came to generating my own IDs.
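A sketch of the node table with application-assigned keys (column names from the answer; types are assumptions): with no identity column, the application picks NodeID up front, so it both knows the value immediately and can wire up ParentID without a round trip:

    CREATE TABLE TrieNode (
        NodeID    int     NOT NULL PRIMARY KEY,  -- assigned by the application
        Character char(1) NOT NULL,
        ParentID  int     NULL REFERENCES TrieNode (NodeID)
    );

    INSERT INTO TrieNode (NodeID, Character, ParentID)
    VALUES (1, 'a', NULL), (2, 'b', 1);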