MongoDB inserting case insensitive strings automatically match without finding and inserting [duplicate] - database
@CompoundIndexes({
    @CompoundIndex(name = "fertilizer_idx",
        unique = true,
        def = "{'name': 1, 'formula': 1, 'type': 1}")
})
public class Fertilizer extends Element implements Serializable {
    // class stuff
}
Is it possible to make the index case insensitive? Right now it treats NAME and NAMe as different values. Storing a second, lowercased (or uppercased) copy of the field is not an option for me.
Thanks,
Pedro
Prior to MongoDB 3.4 it was not possible to create a case-insensitive index.
Version 3.4 added the collation option, which allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.
The collation option has the following syntax:
collation: {
locale: <string>,
caseLevel: <boolean>,
caseFirst: <string>,
strength: <int>,
numericOrdering: <boolean>,
alternate: <string>,
maxVariable: <string>,
backwards: <boolean>
}
where the locale field is mandatory; all other fields are optional.
To create a case-insensitive index we need the mandatory locale field plus the strength field, which sets the string comparison level. strength accepts values from 1 to 5. Read more about collation.
The strength attribute determines whether accents or case are taken into account when collating or matching text.
Example:
if strength=1 then role = Role = rôle
if strength=2 then role = Role < rôle
if strength=3 then role < Role < rôle
Comparison level doc
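To see the three levels in action without a MongoDB instance, here is a minimal sketch using the JDK's java.text.Collator. This is only a stand-in: MongoDB uses ICU collation, not this class, and the class name CollationStrengthDemo and the helper equalAt are made up for the illustration. Also note that the Java constants are numbered differently (Collator.PRIMARY, SECONDARY and TERTIARY correspond to strength 1, 2 and 3 above).

```java
import java.text.Collator;
import java.util.Locale;

// Illustration only: java.text.Collator stands in for the ICU collation
// that MongoDB applies. It is not a MongoDB API.
final class CollationStrengthDemo {

    // Returns true when the two strings compare as equal at the given strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String accented = "r\u00f4le"; // "role" with a circumflex on the o

        // Primary: case and accents are both ignored.
        System.out.println(equalAt(Collator.PRIMARY, "role", accented));
        // Secondary: case is ignored, accents are not.
        System.out.println(equalAt(Collator.SECONDARY, "role", "Role"));
        System.out.println(equalAt(Collator.SECONDARY, "role", accented));
        // Tertiary: case differences count as well.
        System.out.println(equalAt(Collator.TERTIARY, "role", "Role"));
    }
}
```

The secondary level is the one that behaves like a case-insensitive but accent-sensitive comparison, which is what a unique index on names usually wants.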
So we need to use strength=2 when creating the index, like this:
db.collectionName.createIndex(
{ name: 1, formula: 1, type: 1 },
{
name: "fertilizer_idx",
collation: {locale: "en", strength: 2},
unique: true
}
)
N.B.: the collation option is not available for text indexes.
Yes, it is now available in MongoDB 3.4 with the new collation feature.
You can create a case-insensitive index like this:
db.collection.createIndex({
name:1,
formula:1,
type:1
},
{
collation:{
locale:"en",
strength:2
}
});
where the strength attribute is the comparison level.
You can then get a case-insensitive match with this query:
db.collection.find({name: "name"}).collation({locale: "en", strength: 2});
See collation for details.
If you upgraded to MongoDB 3.4 from a previous version, you may need to set the feature compatibility version before creating the index, like this:
db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } )
Spring Data MongoDB 2.2 provides 'Annotation-based Collation support through @Document and @Query.'
Ref.: What's new in Spring Data MongoDB 2.2
@Document(collection = "fertilizer", collation = "{'locale':'en', 'strength':2}")
public class Fertilizer extends Element implements Serializable {
    @Indexed(unique = true)
    private String name;
    // class stuff
}
When the application is started, it will create the indexes along with respective collation for every document.
db.collection.createIndex(
{ name: 1, formula: 1, type: 1 },
{ name: "fertilizer_idx", unique: true, collation:{ locale: "en", strength: 2 } }
)
Use collation as an option for db.collection.createIndex()
more info here:
https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/
here for locale/language information:
https://docs.mongodb.com/manual/reference/collation-locales-defaults/#collation-languages-locales
strength: integer
Optional. The level of comparison to perform. Possible values are:
1: Primary level of comparison. Collation performs comparisons of the base characters only, ignoring other differences such as diacritics and case.
2: Secondary level of comparison. Collation performs comparisons up to secondary differences, such as diacritics. That is, collation performs comparisons of base characters (primary differences) and diacritics (secondary differences). Differences between base characters takes precedence over secondary differences.
3: Tertiary level of comparison. Collation performs comparisons up to tertiary differences, such as case and letter variants. That is, collation performs comparisons of base characters (primary differences), diacritics (secondary differences), and case and variants (tertiary differences). Differences between base characters takes precedence over secondary differences, which takes precedence over tertiary differences.
This is the default level.
4: Quaternary Level. Limited for specific use case to consider punctuation when levels 1-3 ignore punctuation or for processing Japanese text.
5: Identical Level. Limited for specific use case of tie breaker.
Mongo 3.4 has collation, which allows users to specify language-specific rules for string comparison
Collation includes:
collation: {
locale: <string>,
caseLevel: <boolean>,
caseFirst: <string>,
strength: <int>,
numericOrdering: <boolean>,
alternate: <string>,
maxVariable: <string>,
backwards: <boolean>
}
As mentioned above by Shaishab Roy, you should use collation.strength.
There is no way to define that with Spring Data annotations.
But you can implement it manually. In a Spring application, create an event listener that fires once the application is ready, inject the MongoOperations bean, and define the index as in the example below:
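import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.EventListener;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.data.mongodb.core.index.Index;
import org.springframework.data.mongodb.core.query.Collation;

@Configuration
public class MongoConfig {
    @Autowired
    private MongoOperations mongoOperations;

    // Runs once the application is ready and creates the index if it is missing
    @EventListener(ApplicationReadyEvent.class)
    public void initMongo() {
        mongoOperations
            .indexOps(YourCollectionClass.class)
            .ensureIndex(
                new Index()
                    .on("indexing_field_name", Sort.Direction.ASC)
                    .unique()
                    // strength 2 = case insensitive, accent sensitive
                    .collation(Collation.of("en").strength(2)));
    }
}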
Related
How can I generate a DB that fits my Room schema?
I have a database with quite a lot of entities, and I want to preload data from a file when the database is first created. For that, the Room schema needs to match the schema of the database file. Since converting the JSON schema to SQLite statements by hand is very error-prone (I would need to copy and paste every single statement and exchange the variable names), I am looking for a way to automatically generate a database from the schema, which I then just need to fill with the data.
However, I could not find any information online on whether that is possible, or how to do it. It's my first time working with SQLite (normally I use MySQL) and also the first time I have seen a database schema in JSON. (The standard MariaDB export options always just export the CREATE TABLE statements.)
Is there a way? Or does Room provide any way to actually get the CREATE TABLE statements as proper text, not split up into tons of JSON arrays?
I followed the guide on Android Developer Guidelines to get the JSON schema, so I have that file already. For those who do not know its structure, it looks like this:
{
  "formatVersion": 1,
  "database": {
    "version": 1,
    "identityHash": "someAwesomeHash",
    "entities": [
      {
        "tableName": "Articles",
        "createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`id` INTEGER NOT NULL, `germanArticle` TEXT NOT NULL, `frenchArticle` TEXT, PRIMARY KEY(`id`))",
        "fields": [
          { "fieldPath": "id", "columnName": "id", "affinity": "INTEGER", "notNull": true },
          { "fieldPath": "germanArticle", "columnName": "germanArticle", "affinity": "TEXT", "notNull": true },
          { "fieldPath": "frenchArticle", "columnName": "frenchArticle", "affinity": "TEXT", "notNull": false }
        ],
        "primaryKey": { "columnNames": [ "id" ], "autoGenerate": false },
        "indices": [
          {
            "name": "index_Articles_germanArticle",
            "unique": true,
            "columnNames": [ "germanArticle" ],
            "createSql": "CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_germanArticle` ON `${TABLE_NAME}` (`germanArticle`)"
          },
          {
            "name": "index_Articles_frenchArticle",
            "unique": true,
            "columnNames": [ "frenchArticle" ],
            "createSql": "CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_frenchArticle` ON `${TABLE_NAME}` (`frenchArticle`)"
          }
        ],
        "foreignKeys": []
      },
      ...
Note: My question was not how to create the Room DB out of the schema. To get the schema, I already had to create all the entities and the database. My question was how to get the structure Room creates as SQL, so that I can prepopulate my database. However, I think the answer is a really nice explanation, and in fact I found the SQL statements I was searching for in the generated Java file, which was an awesome hint. ;)
"Is there a way? Or does Room provide any way to actually get the CREATE TABLE statements as proper text, not split up into tons of JSON arrays?"
You cannot simply provide the CREATE SQL to Room. What you need to do is generate the Java/Kotlin classes (entities) from the JSON and then add those classes to the project. Native SQLite (i.e. not using Room) would be a different matter, as it could be done at runtime.
The way Room works is that the database is generated from the classes annotated with @Entity (at compile time). The entity classes have to exist for the compile to correctly generate its code.
Furthermore, the entities have to be included in a class for the database, which is annotated with @Database (this class is typically abstract).
In addition, to access the database tables you have abstract classes or interfaces for the SQL, each annotated with @Dao, and again these require the entity classes, as the SQL is checked at compile time.
For example, the JSON you provided would equate to something like:
@Entity(
    indices = {
        @Index(value = "germanArticle", name = "index_Articles_germanArticle", unique = true),
        @Index(value = "frenchArticle", name = "index_Articles_frenchArticle", unique = true)
    },
    primaryKeys = {"id"}
)
public class Articles {
    //@PrimaryKey // Could use this as an alternative
    long id;
    @NonNull
    String germanArticle;
    String frenchArticle;
}
So your process would have to convert the JSON into the above, which could then be copied into the project.
You would then need a class for the database, which could be for example:
@Database(entities = {Articles.class}, version = 1)
abstract class MyDatabase extends RoomDatabase {
}
Note that Dao classes would be added to the body of the above along the lines of:
abstract MyDaoClass getDao();
"Or does Room provide any way to actually get the CREATE TABLE statements as proper text, not split up into tons of JSON arrays?"
Yes it does ....
At this stage, if you compile, it generates Java (MyDatabase_Impl for the above, i.e. the name of the database class suffixed with _Impl). However, as there are no Dao classes/interfaces, the database would be unusable from a Room perspective (and thus wouldn't even get created).
Part of the generated code would be:
@Override
public void createAllTables(SupportSQLiteDatabase _db) {
    _db.execSQL("CREATE TABLE IF NOT EXISTS `Articles` (`id` INTEGER NOT NULL, `germanArticle` TEXT NOT NULL, `frenchArticle` TEXT, PRIMARY KEY(`id`))");
    _db.execSQL("CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_germanArticle` ON `Articles` (`germanArticle`)");
    _db.execSQL("CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_frenchArticle` ON `Articles` (`frenchArticle`)");
    _db.execSQL("CREATE TABLE IF NOT EXISTS room_master_table (id INTEGER PRIMARY KEY,identity_hash TEXT)");
    _db.execSQL("INSERT OR REPLACE INTO room_master_table (id,identity_hash) VALUES(42, 'f7294cddfc3c1bc56a99e772f0c5b9bb')");
}
As you can see, the Articles table and the two indices are created; the room_master_table is used for validation checking.
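Comparing the schema JSON in the question with the generated statements, the only visible difference is that the createSql strings carry a ${TABLE_NAME} placeholder where the generated code has the table name. A minimal sketch of that substitution, assuming the createSql and tableName values have already been read from the JSON by whatever parser you use (the class RoomSchemaSql and the method resolveCreateSql are hypothetical names, not part of Room):

```java
// Hypothetical helper, not part of Room: turns a "createSql" template from
// the exported schema JSON into a plain SQL statement.
final class RoomSchemaSql {

    // Replaces the ${TABLE_NAME} placeholder with the entity's "tableName".
    static String resolveCreateSql(String createSqlTemplate, String tableName) {
        return createSqlTemplate.replace("${TABLE_NAME}", tableName);
    }

    public static void main(String[] args) {
        // Template copied from the schema JSON shown in the question.
        String template = "CREATE UNIQUE INDEX IF NOT EXISTS `index_Articles_germanArticle` "
                + "ON `${TABLE_NAME}` (`germanArticle`)";
        System.out.println(resolveCreateSql(template, "Articles"));
    }
}
```

The result can be checked against the corresponding execSQL line in the generated _Impl class.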
How to search for a case insensitive substring in an array of objects
In ArangoDB I am playing around with a test collection that is the IMDB dataset downloaded from their site as csv. The movies document is structured as follows:
movies:
{
    _key: 123456,
    name: "Movie title",
    ... ,
    releases: [
        { title: "Local title", region: 'US', language: 'en', ... },
        { title: "Other title", region: 'GB', language: '??' ... }
    ]
}
I have created an index on the movies.releases[*].title field. I am interested in querying that field, not only by equality, but also by using case insensitive and substring matching.
The problem is that the only kind of query that uses the index is when I do something like that:
FOR doc IN movies:
    FILTER 'search' IN doc.releases[*].title
With this I can only match the whole string in a case sensitive way: how can I look for a substring in a case insensitive way?
I cannot use a full-text index, since ArangoDB does not support it in arrays, and I cannot use LOWER() and CONTAINS() since it is an array.
Any ideas?
Thanks!
It's possible to nest your search, giving you the power to search within the array without the constraints that come with the '[*]' notation.
Here is an example that searches inside each releases array for a case-insensitive match and returns the movie if it gets any hits. The FILTER statement there will only return the movie if at least one of the releases has a match.
FOR doc IN movies
    LET matches = (
        FOR release IN doc.releases
            FILTER CONTAINS(LOWER(release.title), LOWER('title'))
            RETURN release
    )
    FILTER LENGTH(matches) > 0
    RETURN doc
It's straightforward to change 'title' to a parameter.
Note: to put less pressure on the query, keep in mind that the only goal of the matches variable is to have a LENGTH greater than 0 if there is a release with your keyword in it. The query above has the line RETURN release, which possibly returns a large amount of data that you won't be reading. An alternative is to replace that line with RETURN true, as that is all that is needed to make matches an array with a LENGTH greater than 0.
Fast matching of input string with strings in db
I have a string-matching algorithm that works with concrete strings (e.g. "123456789") and string patterns (e.g. "1*******9"). String patterns are not any kind of regexp or SQL LIKE pattern; they only provide the "*" placeholder, which means "a single digit or letter". So, the algorithm will treat these values as "equal":
12ABCDE89
12A***E89
**A****8*
*********
The data is stored in a relational database (MS SQL Server) and a .NET Core app accesses it via Entity Framework Core.
The required scenario is to take 500 input strings (each either a concrete string or a pattern) and find the matching rows in the database (in a table containing 1 million rows).
First I implemented it using LIKE pattern matching (I transformed the input strings to LIKE patterns and then built a predicate for the WHERE clause), but tests showed that it has unacceptable performance.
Can I implement this task using the FULL-TEXT SEARCH feature of MSSQL? What will the predicate look like in this case?
Any other ideas on the implementation?
You can try the CLR user-defined function approach (example).
But you don't need to use SQL queries at all. Just compare the two strings using your algorithm. In theory, this approach should be faster:
using Microsoft.SqlServer.Server;

public class UserDefinedFunctions // class name is arbitrary
{
    [SqlFunction(
        DataAccess = DataAccessKind.None,
        SystemDataAccess = SystemDataAccessKind.None,
        IsPrecise = true,
        IsDeterministic = true)]
    // Has to be public and static
    public static bool CustomIsEqualTo(string baseString, string stringToCompare)
    {
        // Placeholder: apply the custom comparison algorithm here
        return true;
    }
}
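The comparison described in the question could look roughly like the sketch below. It is written in Java rather than C#, and it rests on assumptions the question does not spell out: both values must have the same length, the comparison is case sensitive, and the code does not check that a character standing opposite a "*" really is a digit or letter. The names PlaceholderMatcher and matches are made up for the illustration.

```java
// Sketch of the "*" placeholder comparison; either side may contain placeholders.
final class PlaceholderMatcher {

    private static final char WILDCARD = '*';

    // Two values match when they have the same length and, at every position,
    // the characters are equal or at least one of them is the placeholder.
    static boolean matches(String left, String right) {
        if (left.length() != right.length()) {
            return false; // assumption: values of different length never match
        }
        for (int i = 0; i < left.length(); i++) {
            char a = left.charAt(i);
            char b = right.charAt(i);
            if (a != WILDCARD && b != WILDCARD && a != b) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values from the question that should be treated as "equal".
        System.out.println(matches("12ABCDE89", "12A***E89"));
        System.out.println(matches("12A***E89", "**A****8*"));
        // A concrete mismatch in the first position.
        System.out.println(matches("12ABCDE89", "92A***E89"));
    }
}
```

The loop is linear in the string length and allocates nothing, so the same logic ported into the CLR function body should stay cheap per row.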
Retrieve ContentPart using IContentManager, filtered by case-insensitive field
In the development of an Orchard module, how do I retrieve ContentParts case insensitively filtered by a field?
I have tried
var name = viewModel.Name.ToUpper();
var samples = _contentManager.Query<SamplePart, SamplePartRecord>()
    .Where(x => x.Name.ToUpper() == name)
    .List();
and I'm getting an error
Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index
but when I tried to retrieve without bothering if it's case sensitive
var name = viewModel.Name;
var samples = _contentManager.Query<SamplePart, SamplePartRecord>()
    .Where(x => x.Name == name)
    .List();
No errors reported.
What gives?
Be aware that the expression inside the Where clause is translated by NHibernate into an SQL query at some point, so you're pretty restricted in what you can do there. In this case it seems the ToUpper method is not supported.
Another thing: string comparison behavior in SQL Server depends on the collation set on your database. By default it's case insensitive, so any string comparison will ignore case.
So, if you stick to the defaults, you're good with just an ordinary == on two strings, as in your last example.
Is it possible to use a database sequence for a non-PK field in a Grails app?
In my application I need a unique value for a specific field in the database. This field has to be of type Integer. So I was wondering whether it is possible to use a sequence for the field, and how do I implement that in my GORM domain class?
See the Grails doc on how to use sequences. Depending on the source of the sequence (an Oracle/Postgres sequence number generator, or a database table):
static mapping = {
    intField1 generator:'sequence', params:[sequence:'sequence_name']
    intField2 generator:'sequence', params:[table: 'hi_value', column: 'next_value', max_lo: 100]
}