Multi-dimensional points in Solr

Lucene added multi-dimensional point support in 6.0:
https://issues.apache.org/jira/browse/LUCENE-7494
How can I use this in Solr? I am hoping for a "simple end to end example"; it would be a worthy blog post.
1) Define the field type in the schema
curl -XPOST -H 'Content-type:application/json' --data-binary '{
"add-field-type" : {
"name":"mypoint",
"class":"solr.DoublePointField" //is this right?
}}' http://localhost:8983/solr/mycore/schema
curl -XPOST -H 'Content-type:application/json' --data-binary '{
"add-field":{
"name":"coords",
"type":"mypoint",
"stored":true,
"indexed":true
}
}' http://localhost:8983/solr/mycore/schema
2) Post data
curl -X POST -H 'Content-Type: application/json' --data-binary '{
"id": "1",
"coords": "1.5 -0.2222 14213 here I can use my n-dimensional point?"
}' http://localhost:8983/solr/mycore/update/json/docs
3) Do a point range or distance query
??? I don't know how to do this.
Update: I ended up using PostgreSQL, which supports k-d tree Euclidean distance search through the cube extension.

As of this writing, this has not been implemented in Solr yet; only the underlying Lucene layer has it. See the tracking issue:
https://issues.apache.org/jira/browse/SOLR-11077
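For what it is worth, Solr does support one-dimensional point fields, and a plain range query works against the coords field defined above. A sketch, assuming the mypoint/coords setup from the question; note this is a 1-D range, not the n-dimensional search being asked about:
curl -G 'http://localhost:8983/solr/mycore/select' --data-urlencode 'q=coords:[1.0 TO 15000.0]' --data-urlencode 'wt=json'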

Related

Apache Solr fuzzy search on list values

Environment: solr-8.9.0
To implement fuzzy search on the column "name" (fuzzy search for 'alaistiar~') of a CSV file in Apache Solr, I am issuing the following query:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'alaistiar~'&wt=json
To implement fuzzy search on the column "name" (fuzzy search for 'shanka~'):
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'shanka~'&wt=json
Can I combine both of the above queries into a single one and find the documents?
My first HTTP request does a fuzzy search for the value alaistiar~ on the name column and returns documents with score values; the second request does the same for shanka~. If I combine both with the 'OR' operator, will it behave the same as the individual requests? My actual goal is to avoid issuing multiple HTTP requests for multiple names. I also want the matched fuzzy-search name in the output, indicating that this document is for alaistiar~ and that document is for shanka~.
I have loaded a CSV file with 4 columns (size: 5 GB) containing 100 million records. The .csv file has the following column names:
'name', 'father_name', 'date_of_passing', 'admission_number'
I have created an index on the column 'name'. To do this I executed the following curl requests against the managed schema (solr-8.9.0, jdk-11.0.12):
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"name","type":"text_general","stored":true,"indexed":true }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"father_name","type":"text_general","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"date_of_passing","type":"pdate","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"admission_number","type":"text_general","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
Is this the right way to create an index on one column (only on name) as described above?
Now I have a list of 1 million names. For each name I have to do a fuzzy search (column: name) on the already loaded data. In the output, for each name, I have to return a list of Java objects including all 4 columns of the .csv file.
Note: in the output I also have to include the name that was supplied as input (in the where clause).
For each name, I am doing a fuzzy search as follows:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'alaistiar~'&wt=json
Doing this would require 1 million HTTP requests, which I want to avoid. Can I do it in a single HTTP request instead?
I understand that the 'OR' operator will not solve my problem, because I will not be able to group the output documents by the name that was passed as input.
Yes, you could unify the queries by using "OR":
name:alaistiar~ OR name:shanka~
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:alaistiar~ OR name:shanka~&wt=json
You could omit "OR" if your default operator is "OR". The query would look like:
name:alaistiar~ name:shanka~
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:alaistiar~ name:shanka~&wt=json
Of course, the space should be escaped in the URL.
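For example, you can let curl do the escaping with -G and --data-urlencode; a sketch equivalent to the query above:
curl -G 'http://localhost:8983/solr/bigboxstore/select' --data-urlencode 'q=name:alaistiar~ name:shanka~' --data-urlencode 'indent=on' --data-urlencode 'wt=json'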
Hello again. After you edited your question, it is clearer what you are looking for:
only 1 query for 1 million names
seeing in the result which response corresponds to which name
There is a solution, but you have to do some post-processing. You can use a POST request with a JSON params body (for 1) and hit highlighting (for 2), like I did here:
curl 'http://localhost:8983/solr/bigboxstore/query?hl.fl=name&hl.simple.post=</b>&hl.simple.pre=<b>&hl=on' -H "Content-Type: application/x-www-form-urlencoded" -X POST -d 'json={"query":"name:alaistiar~ name:shanka~"}'
The response contains two parts: the first with the results and the second with the ids and the highlights; you will have to pair them up on id after receiving the response.
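For illustration, the relevant parts of the response would look roughly like this (the ids and snippets here are invented):
{
  "response": {"docs": [{"id": "42", "name": "Alaistair"}, {"id": "97", "name": "Shankar"}]},
  "highlighting": {
    "42": {"name": ["<b>Alaistair</b>"]},
    "97": {"name": ["<b>Shankar</b>"]}
  }
}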

CouchDB - Mango Query to select records based on complex composite key

I have records with keys looking like this:
"001_test_66"
"001_testab_54"
"002_testbc_88"
"0020_tesgdtbc_38"
How can I query a CouchDB database using Mango queries based on the first part of the key (001 or 002)? The fourth one should not match if I search on '002'.
You can use the $regex operator described in the Condition Operators chapter of the CouchDB API Reference. In the example below, I assumed _id to be the key you want to search by.
"selector": {
"_id": {
"$regex": "^001.*"
}
}
Here's an example using curl (replace <db> with the name of your database).
curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_find -d '{"selector":{"_id":{"$regex": "^001.*"}}}'
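Note that '^002' alone would also match the fourth key, '0020_tesgdtbc_38'; anchoring the pattern on the underscore separator keeps it out:
curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_find -d '{"selector":{"_id":{"$regex": "^002_"}}}'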

Solr add field to schema using curl

A:\DOS> curl -X POST -H 'Content-type:application/json' \
--data-binary '{"add-field":{"name":"timestamp","type":"date","indexed":true,"stored":true,\
"default":NOW,"multiValued":false}}' http://localhost:8983/solr/testt/schema
{
"responseHeader":{
"status":0,
"QTime":0},
"errors":"no stream"}
I am trying to add a 'timestamp' field to solr and this is the error which I am getting. Can anyone help me figure out where I am wrong in this?
There may be two problems:
&commit=true
The commit parameter was not added at the end of the URL.
schema.xml
The schema does not contain the field you want to set.
There is a problem with your curl command, because if you try it differently it works:
curl -d @timestamp.json -X POST -H 'Content-type:application/json' http://localhost:8983/solr/testt/schema
and create the timestamp.json file ("NOW" must be quoted for the JSON to be valid):
{
"add-field":{"name":"timestamp","type":"date","indexed":true,"stored":true,"default":"NOW","multiValued":false}
}
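A likely cause of the original "no stream" error is visible in the A:\DOS> prompt: Windows cmd.exe does not treat single quotes as quoting characters, so the JSON body never reaches Solr. A double-quoted variant with escaped inner quotes (and "NOW" quoted to keep the JSON valid) should work there:
curl -X POST -H "Content-type:application/json" --data-binary "{\"add-field\":{\"name\":\"timestamp\",\"type\":\"date\",\"indexed\":true,\"stored\":true,\"default\":\"NOW\",\"multiValued\":false}}" http://localhost:8983/solr/testt/schema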

How can I export all results from Blazegraph into a file?

I would like to export the results of my SPARQL query from Blazegraph into a file. However, it exports only the first page of the results. When I try to display all results, my browser crashes.
How can I fix this?
I'm running Blazegraph 2.1.2 on a local cluster.
To export results you can rely on curl and query your SPARQL endpoint from the command line like this:
curl -X POST http://localhost:9999/bigdata/namespace/YOUR_NAMESPACE/sparql --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' --data-urlencode 'format=json' > outputfile
You have to specify your endpoint's address, of course, and whatever query you want; this is just an example, but it may give you an idea.
You can also change the output format (CSV, XML, JSON, etc.) and include headers if you want.
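For example, to get CSV instead of JSON you can send an Accept header; a sketch, assuming the endpoint honours standard SPARQL result MIME types such as text/csv:
curl -X POST http://localhost:9999/bigdata/namespace/YOUR_NAMESPACE/sparql --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o }' -H 'Accept: text/csv' > outputfile.csv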
If you want to download your whole graph, you should use a CONSTRUCT query:
curl -X POST \
--url 'https://{host}/bigdata/namespace/{namespace}/sparql' \
--data-urlencode 'query=CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }' \
--header 'Accept: application/x-turtle' > outputfile.ttl
In this case the export is in Turtle format.

Solr 4.1: Can't delete documents older than 30 days

I am running Solr 4.1.
I do have a _version_ field, but I do not understand how to delete by query with regard to time. I don't have any field in my schema that contains a timestamp, as far as I can see.
What I am trying to do is run a query that deletes all documents older than, say, 30 days.
I have tried everything I can find on the net:
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>_version_:[* TO NOW-60DAYS] </query></delete>'
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>timestamp:[* TO NOW-60DAYS] </query></delete>'
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>createTime:[* TO NOW-60DAYS] </query></delete>'
Other deletes work fine, e.g.
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>field:value</query></delete>'
You can enable the timestamp field that is included in schema.xml; it is just commented out. That field is auto-populated with the current datetime each time a document is inserted into the index. Look for the following in your schema.xml:
<!-- Uncommenting the following will create a "timestamp" field using
a default value of "NOW" to indicate when each document was indexed.
-->
<!--
<field name="timestamp" type="date" indexed="true" stored="true"
default="NOW" multiValued="false"/>
-->
You will need to re-index your documents for them to have this value set.
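Once the field is enabled and the documents re-indexed, the delete-by-query from the question should work against it, e.g. for anything older than 30 days:
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>timestamp:[* TO NOW-30DAYS]</query></delete>'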
