Creating schema not successful in Solr - solr

I have the data ready to index; it is a JSON file:
{"122": "20180320-08:08:35.038", "49": "VIPER", "382": "0", "151": "1.0", "9": "653", "10071": "20180320-08:08:35.088", "15": "JPY", "56": "XSVC", "54": "1", "10202": "APMKTMAKING", "10537": "XOSE", "10217": "Y", "48": "179492540", "201": "1", "40": "2", "8": "FIX.4.4", "167": "OPT", "421": "JPN", "10292": "115", "10184": "337912000000002", "456": "101", "11210": "337912000000002", "1133": "G", "10515": "178", "10": "200", "11032": "-1", "10436": "20180320-08:08:35.038", "10518": "178", "11": "337912000000002", "75": "20180320", "10005": "178", "10104": "Y", "35": "RIO", "10208": "APAC.VIPER.OOE", "59": "0", "60": "20180320-08:08:35.088", "528": "P", "581": "13", "1": "TEST", "202": "25375.0", "455": "179492540", "55": "JNI253D8.OS", "100": "XOSE", "52": "20180320-08:08:35.088", "10241": "viperooe", "150": "A", "10039": "viperooe", "39": "A", "10438": "RIO.4.5", "38": "1", "37": "337912000000002", "372": "D", "660": "102", "44": "2.0", "10066": "20180320-08:08:35.038", "29": "4", "50": "JPNIK01", "22": "101"}
You can inspect the json here: https://jsonformatter.org/
I need to create an index and enable searching on tags 37 (order_id), 75 (trade_date) and 10242 (where available; this sample message doesn't have it).
My understanding is that I need to create the file managed-schema. I added two fields as below:
<field name="order_id" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="trd_date" type="text_general" indexed="true" stored="false" multiValued="true"/>
Then I went back to Solr Admin, but I don't see the two new fields in the Schema section.
Am I missing anything here? And once the two fields are in the managed-schema, can I add the JSON file through the upload in Solr Admin?
Thank you very much.
Update: I have 100+ fields in the data to be indexed, and the data is in JSON format. What is the best practice for creating the schema file? Thanks.

You shouldn't have to create the file yourself, that should be created by Solr (since it's a managed schema). If you're manually editing the file, you have to reload the collection/core or restart Solr afterwards.
Otherwise you can use the Schema API to add or change fields. If you're running in a cloud context / cluster, you'll want to use the Schema API so that your changes can be spread across all nodes (and your schema would live in Zookeeper in that case anyway).
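As a rough sketch of the Schema API route, the two fields from the question could be added with an "add-field" command posted to the collection's /schema endpoint. The host http://localhost:8983/solr and the core name "mycore" are assumptions for illustration, and the helper names are made up here; the field attributes are copied from the question.

```python
import json
import urllib.request


def add_field_command(name, field_type="text_general", indexed=True,
                      stored=False, multi_valued=True):
    """Build one Schema API 'add-field' command as a plain dict."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": indexed,
            "stored": stored,
            "multiValued": multi_valued,
        }
    }


def post_schema_command(base_url, collection, command):
    """POST a single command to <base_url>/<collection>/schema."""
    request = urllib.request.Request(
        f"{base_url}/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Show the JSON body that would be sent for each field in the question.
for field_name in ("order_id", "trd_date"):
    print(json.dumps(add_field_command(field_name)))

# To actually send it (assumed host and core name, adjust to your setup):
# post_schema_command("http://localhost:8983/solr", "mycore",
#                     add_field_command("order_id"))
```

Fields added this way should show up in the Schema section of Solr Admin without editing managed-schema by hand.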


How to see BACnet scrape data on volttron.log

Can someone give me a tip on how to see the BACnet scrape data in the VOLTTRON log?
Would this have anything to do with the log level? Maybe I just can't see any data because of incorrect log levels? Any tips on setting the log level appropriately are greatly appreciated.
vctl config get platform.driver devices/201201
returns this:
{
"driver_config": {
"device_address": "12345:2",
"device_id": 201201
},
"driver_type": "bacnet",
"interval": 60,
"registry_config": "config://registry_configs/201201.csv"
}
Running:
vctl config get platform.driver registry_configs/201201.csv
Looks good I can see all of the device points that were discovered:
{
"Reference Point Name": "Oat",
"Volttron Point Name": "Oat",
"Units": "degreesFahrenheit",
"Unit Details": "",
"BACnet Object Type": "analogValue",
"Property": "presentValue",
"Writable": "FALSE",
"Index": "301",
"Write Priority": "",
"Notes": ""
},
{
"Reference Point Name": "RmTmpSpt",
"Volttron Point Name": "RmTmpSpt",
"Units": "degreesFahrenheit",
"Unit Details": "",
"BACnet Object Type": "analogValue",
"Property": "presentValue",
"Writable": "FALSE",
"Index": "302",
"Write Priority": "",
"Notes": ""
},
{
"Reference Point Name": "RmTmp",
"Volttron Point Name": "RmTmp",
"Units": "degreesFahrenheit",
"Unit Details": "",
"BACnet Object Type": "analogValue",
"Property": "presentValue",
"Writable": "FALSE",
"Index": "300",
"Write Priority": "",
"Notes": ""
}
Running vctl status, and even restarting the agents with UUID a and 4, doesn't seem to do anything.
UUID  AGENT                     IDENTITY               TAG              STATUS           HEALTH
a     bacnet_proxyagent-0.5     platform.bacnet_proxy  proxy            running [73753]  GOOD
4     platform_driveragent-4.0  platform.driver        platform_driver  running [73754]  GOOD
6     simplewebagent-0.1        webagent               simpleWebAgent
Also the BACpypes.ini has the proper ID address set for the IP address of the computer running VOLTTRON.
Any tips appreciated.
Generally you won't want all of the data that goes through the message bus written to the log, as that will make your log huge and fill up the system.
However, if you install and start a listener agent you will get that behaviour. A listener agent will write to the log everything that goes through the message bus. It is located in the examples/ListenerAgent from the volttron repository.

Is there any way to store folder content in a .JSON file variable?

I am completely new to JSON and Java in general.
I have a task with a block of code similar to this:
{
"name": "Chew Barka",
"breed": "Bichon",
"age": "2 years",
"weight": 8,
"bio": "The park, The pool or the Playground - I love to go anywhere!",
"filename": ""
},
I would like the contents of a folder, for example "C:/Temp", to be stored in "filename",
so that when I read "filename" I get the contents of "C:/Temp".

NoSQL Database schema design for simple courses sharing application

I am working on a mini-simple app using NoSQL. I currently have the design below and I am seeking practical advice and feedback on the database schema design.
Here is the basic overview of the website: A user can log in (via Google or Github), create a course review, other users can rate this course, give comments and also favorite this course.
Here is the schema design for now:
/user
"userid": "12jdfbvsidf3123"
"username": "admin"
"FirstName": "Jobs"
"LastName": "Tim"
"email": "123123@123.edu"
"rated":
"course1": 2
"course2": 3
"favorites":
"course1_id": 2
"course2_id": 3
/courses
"courseid": "sfgsdfwthrw34523"
"userid": "12jdfbvsidf3123"
"title": "the Art of Copy and Paste from StackOverflow"
"instructor": "you-know-nothing"
"rating": 0
"introduction": "bla bla bla"
"timestamp": "2017-08-09"
/comments
"commentid": "242334h5kjh2j4"
"userid": "12jdfbvsidf3123"
"courseid": "sfgsdfwthrw34523"
"content": "Great course, tell me all about how to copy and paste without thinking"
"timestamp": "2019-09-07"

How to handle multi-word/phrase synonyms in Azure Search

According to the article https://azure.microsoft.com/pl-pl/blog/azure-search-synonyms-public-preview/ I should be able to use multi-word/phrase synonyms in synonymMaps:
Multi-word synonyms
In many full text search engines, support for synonyms is limited to single words. Our team has engineered a solution that allows Azure Search to support multi-word synonyms. This allows for phrase queries (“”) to function properly while using synonyms. If someone has mapped ‘hot tub’ to ‘whirlpool bath’ and they then search for “large hot tub,” Azure Search will return matches which contain both “large hot tub” and “large whirlpool bath.”
However, in my case I get matches on sub-words.
My synonymMap looks like:
{"name":"map",
"format":"solr",
"synonyms":"Gastroenterology (acute and chronic),vomiting, diarrhoea, weight loss\n"}
And I have documents in the search index which contain medical disciplines like Gastroenterology (acute and chronic).
What I receive after ?search="vomiting" is:
{
"@search.score": 1.0405536,
"@search.highlights": {
"disciplines/name": [
"<em>Acute</em> <em>and</em> <em>chronic</em> ear disease",
"<em>Acute</em> <em>and</em> <em>chronic</em> skin disease",
"<em>Gastroenterology</em> (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Haematology (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Respiratory medicine (<em>acute</em> <em>and</em> <em>chronic</em>)"
],
And I am expecting:
{
"@search.score": 1.0405536,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Gastroenterology (acute and chronic)</em>",
],
Am I doing something wrong?
I tried to shorten the main term to a single word like Gastroenterology, but some of them simply cannot be shortened.
Providing quotes, like synonyms => "Gastroenterology (acute and chronic)", also does not work.
UPDATED
I was wondering why I thought there was a problem.
Well, in the question I provided:
{"name":"map",
"format":"solr",
"synonyms":"Gastroenterology (acute and chronic),vomiting, diarrhoea, weight loss\n"}
But I am actually using:
{"name":"map",
"format":"solr",
"synonyms":"Gastroenterology (acute and chronic),vomiting, diarrhoea, weight loss
=> Gastroenterology (acute and chronic)\n"}
In that case I have 4 results:
"@odata.count": 4,
"value": [
{
"@search.score": 1.0137179,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Acute</em> <em>and</em> <em>chronic</em> ear disease",
"<em>Acute</em> <em>and</em> <em>chronic</em> skin disease",
"<em>Gastroenterology</em> (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Haematology (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Respiratory medicine (<em>acute</em> <em>and</em> <em>chronic</em>)"
],
"equipment/translatedName": [
"Emergency <em>and</em> crictial care",
"In house skin <em>and</em> ear cyology"
],
"disciplines/translatedName": [
"Anaesthesia <em>and</em> analgesia",
"Emergency <em>and</em> critical care"
]
},
...
{
"@search.score": 0.33542877,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Chronic</em> pain management"
],
"disciplines/translatedName": [
"Anaesthesia <em>and</em> analgesia"
]
},
...
{
"@search.score": 0.13757591,
"@search.highlights": {
"equipment/translatedName": [
"Emergency <em>and</em> crictial care"
],
"disciplines/translatedName": [
"Emergency <em>and</em> critical care"
]
},
...
{
"@search.score": 0.07112321,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Chronic</em> pain management"
]
},
Could you explain to me how it works in that case?
Azure Search does support multi-word synonyms and the result in your case is as expected. There are a couple of things to be called out here.
First, ?search="vomiting" returns documents that match 'vomiting' or its specified synonyms anywhere within the document. The multi-word synonym Gastroenterology (acute and chronic) in the collection disciplines/name matches your query, so the document is returned.
The second thing, which is probably the source of confusion, is the highlighting. Azure Search doesn't currently support phrase highlighting. When used with a phrase query, it highlights the individual terms in the phrase. Since the matching document also had those individual terms elsewhere, all of them were highlighted. Check Azure search highlights for phrases with double quotes for more details.
So, the multi-word synonym expansion and search is functioning as expected. You can test this by indexing a test document that contains just Gastroenterology (acute and chronic) and another that contains just acute and chronic. The query should return only the first document.
If you have a strict requirement on highlighting phrases, you'll have to do some client-side processing after retrieving the search results.
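A minimal sketch of what such client-side processing might look like: strip the term-level <em> tags from each highlight fragment, then wrap only whole-phrase matches. The function names are made up for illustration, and it assumes the default <em>/</em> highlight tags seen in the results above.

```python
import re

# Matches the default opening and closing highlight tags.
EM_TAG = re.compile(r"</?em>")


def highlight_phrase(fragment, phrase, pre="<em>", post="</em>"):
    """Strip term-level tags, then wrap each whole-phrase match."""
    plain = EM_TAG.sub("", fragment)
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, plain)


def rehighlight(fragments, phrase):
    """Keep only fragments containing the full phrase, re-highlighted."""
    result = []
    for fragment in fragments:
        plain = EM_TAG.sub("", fragment)
        marked = highlight_phrase(fragment, phrase)
        if marked != plain:  # the phrase was found and wrapped
            result.append(marked)
    return result


fragments = [
    "<em>Acute</em> <em>and</em> <em>chronic</em> ear disease",
    "<em>Gastroenterology</em> (<em>acute</em> <em>and</em> <em>chronic</em>)",
    "Haematology (<em>acute</em> <em>and</em> <em>chronic</em>)",
]
print(rehighlight(fragments, "Gastroenterology (acute and chronic)"))
```

The phrase to re-highlight has to come from your own synonym map, since the response does not say which synonym produced the match.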

Solr gives different querynorm for the same query

I am using Solr for searching institutions... My Solr DB has around 400k documents each of which has multiple fields like ("name","id","city",...)...
A document in my DB looks like this:
"docs":
{
"id": "91348",
"p_code": "71637",
"name": "University of Toronto - Mississauga",
"ext_name": "",
"city": "Mississauga",
"country": "CA",
"state": "ON",
"type": "academic/campus",
"alt_name": "",
"ext_city": "",
"zip": "L5L 1C6",
"alt_ext_city": "",
}
I write a query like {name: (university of toronto)}... Top two matches are:
"docs":
{
"id": "91348",
"p_code": "71637",
"name": "University of Toronto - Mississauga",
"ext_name": "",
"city": "Mississauga",
"country": "CA",
"state": "ON",
"type": "academic/campus",
"alt_name": "",
"ext_city": "",
"zip": "L5L 1C6",
"alt_ext_city": "",
"_version_": 1473710223400108000,
"score": 1.499069
},
{
"id": "10624",
"p_code": "7938",
"name": "University of Toronto",
"ext_name": "",
"city": "Toronto",
"country": "CA",
"state": "ON",
"type": "academic",
"alt_name": "Saint George Downtown Campus",
"ext_city": "",
"zip": "M5S 1A1",
"alt_ext_city": "",
"_version_": 1473710220148473900,
"score": 1.4967358
}
I am really surprised to see that "University of Toronto - Mississauga" returns a higher score than "university of Toronto". Intuitively, the field containing "University of Toronto - Mississauga" should get a lower score since it is longer than the other one.
I was also very surprised to see that Solr gives different values for querynorm as follows:
(0.03198291 = queryNorm) for the top document and (0.03203078 = queryNorm) for the second-ranked document. I presumed that the query norm should be exactly the same for all documents, as it is only a function of the query.
I am not sure if I got something wrong about how Solr works or there is something wrong in indexing or configuration? Has anybody faced the same problem?
Make sure that omitNorms is set to false for that field and that your collection is using the latest version of the schema. Then re-index all of your documents for the change to the field to take effect.
I've found that some schema modifications are best treated with a complete wipe of the index prior to indexing in new content. I am not sure, but I believe this may be one of them. For most of the changes you can just re-index all of your content and overwrite the old stuff.
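On the queryNorm part of the question: in Lucene's classic TF-IDF similarity, queryNorm is 1/sqrt(sumOfSquaredWeights), and each weight includes the term's idf, so the value depends on index statistics as well as on the query text. The sketch below only illustrates that dependence. The document counts are invented, and whether the two documents in the question were really scored against different statistics (for example on different shards) is an assumption, not something the post confirms.

```python
import math


def classic_idf(num_docs, doc_freq):
    """idf as in Lucene's classic TF-IDF similarity (4.x era formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def query_norm(idfs, query_boost=1.0):
    """1 / sqrt(sum of squared term weights), with term boosts of 1."""
    sum_of_squared_weights = query_boost ** 2 * sum(idf ** 2 for idf in idfs)
    return 1.0 / math.sqrt(sum_of_squared_weights)


# Hypothetical (num_docs, doc_freq) pairs for "university", "of", "toronto"
# as seen by two index partitions with slightly different statistics.
stats_a = [(200_000, 30_000), (200_000, 50_000), (200_000, 400)]
stats_b = [(200_000, 30_100), (200_000, 50_000), (200_000, 390)]

norm_a = query_norm([classic_idf(n, df) for n, df in stats_a])
norm_b = query_norm([classic_idf(n, df) for n, df in stats_b])
print(norm_a, norm_b)
```

With the same query, slightly different document frequencies are enough to move queryNorm in the later decimal places, which is the kind of difference shown in the question.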
