I'm considering using RavenDb for a new project we're doing at our company.
The project will consist of entities that have a set of dynamic properties based on the labels that a user might attach to them.
Example:
Entity called Image has:
Id
Name
Size
We want to use Labels (just another entity in the system) to allow the users to create specific properties for an image.
A label consists of a name, and might have a parent label.
If the user creates two Labels:
House
Car
The House label has the following properties:
Location
Color
Size
The Car label has the following properties:
Brand
Color
Engine type
Total doors
(These labels and properties must be managed by the user with special edit screens in our application).
When a user then creates an Image and assigns a specific label to that image, all the properties from that label must be present on the new image.
There can be multiple Labels attached to one Image. The Labels should be queried separately in order to show them in the GUI.
My question is:
I know how to do this in SQL, but I'm a bit concerned about performance when there might be 300,000 images with all kinds of properties, especially when we want to search on those properties.
Can anyone give me a jump start (or an already existing tutorial) for this kind of setup? I'm not sure on how to model my entities for this kind of data.
Thanks!
Not quite sure what you mean by "Labels should be queried separately in order to show them in the GUI", but a document for each image, with the properties from each label stored directly, would be a starting point.
E.g.
public class Image
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Dictionary<string, Dictionary<string, object>> LabelProperties { get; set; }
}
You can then populate this as follows:
var img1 = new Image
{
    Name = "Image1",
    LabelProperties = new Dictionary<string, Dictionary<string, object>>
    {
        {
            "Car",
            new Dictionary<string, object>
            {
                { "Brand", "GM" },
                { "Color", "Blue" },
                { "Engine type", "Big" },
                { "Total doors", 4 }
            }
        },
        {
            "House",
            new Dictionary<string, object>
            {
                { "Location", "Downtown" },
                { "Color", "Blue" },
                { "Size", 240 }
            }
        }
    }
};
This then ends up as nicely structured JSON in the db:
{
    "Name": "Image1",
    "LabelProperties": {
        "Car": {
            "Brand": "GM",
            "Color": "Blue",
            "Engine type": "Big",
            "Total doors": 4
        },
        "House": {
            "Location": "Downtown",
            "Color": "Blue",
            "Size": 240
        }
    }
}
You could then query this using dynamic indexes. E.g. to find all images that contain a blue house:
var blueHouses = session.Query<Image>()
    .Customize(x => x.WaitForNonStaleResults())
    .Where(x => Equals(x.LabelProperties["House"]["Color"], "Blue"));

Console.WriteLine("--- All Images Containing a House of Color Blue");
foreach (var item in blueHouses)
{
    Console.WriteLine("{0} | {1}", item.Id, item.Name);
}
If you want to query on the labels themselves, an index might be required. See the gist for the full example.
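For querying the labels themselves, a minimal sketch of such a static index might look like the following (assuming the same client API era as the code above; the index and field names are my own illustrations, not from the gist):

using System.Linq;
using Raven.Client.Indexes;

// A hedged sketch: a static fanout index that flattens each label's
// properties, so images can be filtered by label name, property name,
// and property value.
public class Images_ByLabelProperty : AbstractIndexCreationTask<Image>
{
    public Images_ByLabelProperty()
    {
        Map = images => from image in images
                        from label in image.LabelProperties
                        from prop in label.Value
                        select new
                        {
                            LabelName = label.Key,
                            PropertyName = prop.Key,
                            PropertyValue = prop.Value
                        };
    }
}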
First of all, I tried searching a lot but I was not able to find any resource that satisfies my need. I know there might be some answers already; if you know of one, please help with the link.
I know how to show search suggestions, but I don't know how to show the full search results when someone clicks on a search suggestion, ideally with a MERN stack example if possible.
I need a solution that best fits my scenario:
I have three models,
tags - holds tags
categories - holds categories
items - holds item data - has both categories and tags
Currently, I am not storing references to the categories and tags collections; instead, I am storing a copy directly inside items.
Now, I basically want to search the items having the specific categories and tags when someone searches for a keyword.
What I am doing currently is: I search for tags matching the keyword, then categories, then take their _id(s) and find those in the items collection:
const tags = await Tags.find(
  { tag: { $regex: category.toString(), $options: "i" } },
  { projection: { createdBy: 0 } });

const categories = await Categories.find(
  { category: { $regex: category.toString(), $options: "i" } },
  { projection: { createdBy: 0 } });

const tagsIdArray = tags.map((item) => new ObjectId(item._id));
const catIdArray = categories.map((item) => new ObjectId(item._id));
$match: {
  $and: [
    {
      $or: [
        { "tags._id": { $in: [...tagsIdArray] } },
        { "category._id": { $in: [...catIdArray] } },
      ],
    },
  ],
},
And I know that this is not the best way; it takes a lot of time to search for a given keyword.
Please suggest a schema structure and a way to implement search with suggestions.
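As a hedged illustration of the current setup (not a full redesign): since copies of the tags and categories are already embedded in items, the lookup could be collapsed into a single query on items. The embedded field names ("tags.tag", "category.category") are assumptions inferred from the snippets above.

// Hedged sketch: query the embedded copies directly in one step.
// "category" here is the incoming search keyword, as in the code above.
const keyword = new RegExp(category.toString(), "i");
const results = await Items.find({
  $or: [
    { "tags.tag": keyword },          // assumed embedded tag field
    { "category.category": keyword }, // assumed embedded category field
  ],
});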
We are using Azure Cognitive Search to index various documents, e.g. Word or PDF files, which are stored in Azure Blob Storage. We would like to be able to translate the extracted content of non-English documents and store the translation result into a dedicated field in the index.
Currently the built-in Text Translation cognitive skill supports up to 50,000 characters on the input. The documents that we have could contain up to 1 MB of text. According to the documentation it's possible to split the text into chunks using the built-in Split Skill, however there's no skill that could merge the translated chunks back together. Our goal is to have all the extracted text translated and stored in one index field of type Edm.String, not an array.
Is there any way to translate large text blocks when indexing, other than creating a custom Cognitive Skill via Web API for that purpose?
Yes, the Merge Skill will actually do this. Define the skill in your skillset like the example below. The "text" and "offsets" inputs to this skill are optional, and you can use "itemsToInsert" to specify the text you want to merge together (specify the appropriate source for your translation output). Use insertPreTag and insertPostTag if you want to insert, say, a space before or after each merged section.
{
    "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
    "description": "Merge text back together",
    "context": "/document",
    "insertPreTag": "",
    "insertPostTag": "",
    "inputs": [
        {
            "name": "itemsToInsert",
            "source": "/document/translation_output/*/text"
        }
    ],
    "outputs": [
        {
            "name": "mergedText",
            "targetName": "merged_text_field_in_your_index"
        }
    ]
}
Below is a snippet in C#, using the Microsoft.Azure.Search classes. It follows the suggestion given by Jennifer in the reply above.
The skillset definition was tested to properly support translation of text blocks bigger than 50k characters.
private static IList<Skill> GetSkills()
{
    var skills = new List<Skill>();

    skills.AddRange(new Skill[] {
        // ...some skills in the pipeline before translation

        new ConditionalSkill(
            name: "05-1-set-language-code-for-split",
            description: "Set compatible language code for split skill (e.g. 'ru' is not supported)",
            context: "/document",
            inputs: new []
            {
                new InputFieldMappingEntry(name: "condition", source: SplitLanguageExpression),
                new InputFieldMappingEntry(name: "whenTrue", source: "/document/language_code"),
                new InputFieldMappingEntry(name: "whenFalse", source: "= 'en'")
            },
            outputs: new [] { new OutputFieldMappingEntry(name: "output", targetName: "language_code_split") }
        ),

        new SplitSkill
        (
            name: "05-2-split-original-content",
            description: "Split original merged content into chunks for translation",
            defaultLanguageCode: SplitSkillLanguage.En,
            textSplitMode: TextSplitMode.Pages,
            maximumPageLength: 50000,
            context: "/document/merged_content_original",
            inputs: new []
            {
                new InputFieldMappingEntry(name: "text", source: "/document/merged_content_original"),
                new InputFieldMappingEntry(name: "languageCode", source: "/document/language_code_split")
            },
            outputs: new [] { new OutputFieldMappingEntry(name: "textItems", targetName: "pages") }
        ),

        new TextTranslationSkill
        (
            name: "05-3-translate-original-content-pages",
            description: "Translate original merged content chunks",
            defaultToLanguageCode: TextTranslationSkillLanguage.En,
            context: "/document/merged_content_original/pages/*",
            inputs: new []
            {
                new InputFieldMappingEntry(name: "text", source: "/document/merged_content_original/pages/*"),
                new InputFieldMappingEntry(name: "fromLanguageCode", source: "/document/language_code")
            },
            outputs: new [] { new OutputFieldMappingEntry(name: "translatedText", targetName: "translated_text") }
        ),

        new MergeSkill
        (
            name: "05-4-merge-translated-content-pages",
            description: "Merge translated content into one text string",
            context: "/document",
            insertPreTag: " ",
            insertPostTag: " ",
            inputs: new []
            {
                new InputFieldMappingEntry(name: "itemsToInsert", source: "/document/merged_content_original/pages/*/translated_text")
            },
            outputs: new [] { new OutputFieldMappingEntry(name: "mergedText", targetName: "merged_content_translated") }
        ),

        // ...some skills in the pipeline after translation
    });

    return skills;
}
private static string SplitLanguageExpression
{
    get
    {
        var values = Enum.GetValues(typeof(SplitSkillLanguage)).Cast<SplitSkillLanguage>();
        var parts = values.Select(v => "($(/document/language_code) == '" + v.ToString().ToLower() + "')");
        return "= " + string.Join(" || ", parts);
    }
}
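For reference, SplitLanguageExpression generates a skillset condition of roughly this shape (abridged; the exact language list depends on the SplitSkillLanguage enum in your SDK version):

= ($(/document/language_code) == 'da') || ($(/document/language_code) == 'de') || ($(/document/language_code) == 'en') || ...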
Given a model that looks like this:
{
    [Key]
    public string Id { get; set; }

    [IsSearchable]
    [Analyzer(AnalyzerName.AsString.Keyword)]
    public string AccountId { get; set; }
}
And sample data for the AccountId that would look like this:
1-ABC123
1-333444555
1-A4KK498
The field can have any combination of letters/digits and a dash in the middle.
I need to be able to search on this field using queries like 1-ABC*. However, none of the basic analyzers seem to support the dash except Keyword, which isn't picking up any wildcard queries, only full matches. I've seen some other articles about custom analyzers, but I can't find enough information about how to build one to solve this issue.
I need to know whether I have to build a custom analyzer for this field, and whether I need a different search analyzer and index analyzer.
I'm using StandardLucene for other alphanumeric fields without dashes, and I have another field with dashes that is all digits, and Keyword works just fine there. The issue seems to be with a mix of letters AND digits.
Custom analyzer is indeed the way to go here.
Basically you could define a custom analyzer that uses a “keyword” tokenizer with a “lowercase” token filter.
Add the custom analyzer to your Index class, and change the analyzer name in your model to match the custom analyzer name:
new Index()
{
    ...
    Analyzers = new[]
    {
        new CustomAnalyzer()
        {
            Name = "keyword_lowercase",
            Tokenizer = TokenizerName.Keyword,
            TokenFilters = new[] { TokenFilterName.Lowercase }
        }
    }
}
Model:
{
    [Key]
    public string Id { get; set; }

    [IsSearchable]
    [Analyzer("keyword_lowercase")]
    public string AccountId { get; set; }
}
In the REST API this would look something like:
{
    "fields": [
        {
            "name": "Id",
            "type": "Edm.String",
            "key": true
        },
        {
            "name": "AccountId",
            "type": "Edm.String",
            "searchable": true,
            "retrievable": true,
            "analyzer": "keyword_lowercase"
        }
    ],
    "analyzers": [
        {
            "name": "keyword_lowercase",
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "keyword_v2",
            "tokenFilters": ["lowercase"]
        }
    ]
}
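To sanity-check the analyzer, a hypothetical query using the full Lucene syntax could look like the request below (the service and index names are placeholders). Note that wildcard terms are not analyzed at query time, so the prefix must be lowercased manually to match the lowercased index terms:

POST https://[service].search.windows.net/indexes/[index]/docs/search?api-version=2020-06-30
{
  "search": "AccountId:1-abc*",
  "queryType": "full"
}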
I am using Vue with Laravel and trying to create a page that shows a list of items based on a category. Each item is stored in the database with a "category" column. I want to display only the unique categories (no repeats), each of which could have a dropdown that shows all items with that category.
For example, if these were objects, they would look like this:
{
    name: "Car",
    category: "Red"
},
{
    name: "Car2",
    category: "Blue"
},
{
    name: "Motorcycle",
    category: "Red"
},
{
    name: "Motorcycle2",
    category: "Blue"
}
If the data were like this, I would want to display two options, Red and Blue, and under both I would want to show their respective objects. For example:
Red: Car, Motorcycle
Blue: Car2, Motorcycle2
I've tried using JavaScript to "stack" them while mapping over the results, and I've tried selecting only unique values from the database.
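For the grouping step itself, here is a minimal plain-JavaScript sketch (assuming items is the array shown above) that could back a Vue computed property:

// Hedged sketch: group the items array by its category value.
const grouped = items.reduce((acc, item) => {
  (acc[item.category] = acc[item.category] || []).push(item);
  return acc;
}, {});

// grouped => { Red: [Car, Motorcycle], Blue: [Car2, Motorcycle2] }
// Object.keys(grouped) yields the unique categories for the dropdown headers.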
I am using backgrid.js with backbone.js and trying to populate a JSON user list in backgrid. Below is my JSON:
[{"name": "kumnar", "emailId":"kumar#xxx.com",
"locations":{"name":"ABC Inc.,", "province":"CA"}
}]
I can access name & emailId as below:
var User = Backbone.Model.extend({});

var Users = Backbone.Collection.extend({
    model: User,
    url: 'https://localhost:8181/server/rest/user'
});

var users = new Users();

var columns = [{
    name: "loginId",
    label: "Name",
    cell: "string"
}, {
    name: "emailId",
    label: "E-mail Id",
    cell: "string"
}];

var grid = new Backgrid.Grid({
    columns: columns,
    collection: users
});

$("#grid-result").append(grid.render().$el);
users.fetch();
My question is, how do I add a column for showing locations.name?
I have specified locations.name in the name property of columns but it doesn't work.
{
    name: "locations.name",
    label: "Location Name",
    cell: "string"
}
Thanks
Both backbone and backgrid currently don't offer any support for nested model attributes, although there are a number of tickets underway. To properly display the locations info, you can either turn the locations object into a string on the server and use a string cell in backgrid, or you can attempt to supply your own cell implementation for the locations column.
Also, you may try out backbone-deep-model as it seems to support the path syntax you are looking for. I haven't tried it before, but if it works, you can just create 2 string columns called location.name and location.province respectively.
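If backbone-deep-model does support that path syntax, the two columns might be declared like this (an untested sketch; the labels are my own):

var columns = [
    { name: "locations.name", label: "Location", cell: "string" },
    { name: "locations.province", label: "Province", cell: "string" }
];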
It's really easy to extend Cell (or any of the existing extensions like StringCell). Here's a start for you:
var DeepstringCell = Backgrid.DeepstringCell = StringCell.extend({
    render: function () {
        this.$el.empty();
        var modelDepth = this.column.get("name").split(".");
        var lastValue = this.model;
        for (var i = 0; i < modelDepth.length; i++) {
            lastValue = lastValue.get(modelDepth[i]);
        }
        this.$el.text(this.formatter.fromRaw(lastValue));
        this.delegateEvents();
        return this;
    }
});
In this example you'd use "deepstring" instead of "string" for the "cell" attribute of your column. Extend it further to use a different formatter (like EmailFormatter) if you want to reuse the built-in formatters along with the deep model support. That's what I've done and it works great. Even better is to override the Cell definitions to look for a "." in the name value and treat it as a deep model.
Mind you, this only works because I use backbone-relational which returns Model instances from "get" calls.
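If you aren't using backbone-relational, a hedged variation of that loop can fall back to plain property access whenever the current value is not a Backbone model:

// Hedged sketch: walk the dotted path, using Model#get for Backbone
// models and plain property access for raw nested objects.
for (var i = 0; i < modelDepth.length && lastValue != null; i++) {
    lastValue = (lastValue instanceof Backbone.Model)
        ? lastValue.get(modelDepth[i])
        : lastValue[modelDepth[i]];
}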