Index Data into Solr Core using Postman

I have created a Solr core. Now I want to insert data into it using Postman. Can we do that, and how can we insert data into a Solr core using Postman? Is there a REST API in Apache Solr that can be called directly from Postman to insert data into a Solr core?
This is the JSON data I want to insert. I am getting this exception:
Exception writing document id 6 to the index; possible analysis error: For input string: \"\"","code":400}}
[{
"id":6,
"AssetId": 123456,
"Availability": "Up"
},
{
"id":7,
"AssetId": 223456,
"Availability": "Up"
},
{
"id":8,
"AssetId": 987456,
"Availability": "Up"
},
{
"id":9,
"AssetId": 122726,
"Availability": "Up"
}]
I want to insert this data into my Solr core named asset, but I am getting the exception above.

As you can see in the docs, you should be able to index that directly:
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/my_collection/update' --data-binary '
[
{
"id" : "978-0641723445",
"cat" : ["book","hardcover"],
"name" : "The Lightning Thief",
"author" : "Rick Riordan",
"series_t" : "Percy Jackson and the Olympians",
"sequence_i" : 1,
"genre_s" : "fantasy",
"inStock" : true,
"price" : 12.50,
"pages_i" : 384
},
{
...
}
]'
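The same request works from Postman: set the method to POST, point the URL at your core's update handler, add a Content-Type: application/json header, and paste the JSON array as the raw body. A minimal sketch of the equivalent request against the asset core from the question (commit=true just makes the documents searchable immediately):
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/asset/update?commit=true' --data-binary '
[
{"id": 6, "AssetId": 123456, "Availability": "Up"},
{"id": 7, "AssetId": 223456, "Availability": "Up"}
]'
Note that AssetId and Availability must either exist in your schema or be covered by a dynamic/schemaless field; a "possible analysis error: For input string" message usually means a value (often an empty string) could not be parsed into the field's declared type.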
If you need to further transform the data, you should look into this

Related

Salesforce Bulk API Delete by externalIdFieldName

Note: I am not using Bulk API 2.0.
Is there a way to delete Salesforce objects using the external Salesforce Id?
When I set up the job I sent the following:
{
"operation" : "delete",
"object" : "Subscription",
"contentType" : "JSON",
"externalIdFieldName": "ExternalId"
}
But when I post the batch with the external Id, it fails.
Request:
[{"externalId":"123456789"}]
Response:
[
    {
        "success": false,
        "created": false,
        "id": null,
        "errors": [
            {
                "message": "bad id 123456789",
                "fields": [],
                "statusCode": "MALFORMED_ID",
                "extendedErrorDetails": null
            }
        ]
    }
]
Other combinations also fail:
[{"Id":"123456789"}]
[{"externalIdFieldName":"123456789"}]
It does delete if I use the salesforce ID
[{"Id":"xu97987oUv"}]
But I want to delete using the external ID if that is possible.
This is not possible. You have to query the Salesforce record by its external Id field to get the record's Salesforce Id, and then use that Id to delete the record.
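A minimal sketch of that two-step flow, assuming you also have REST API access alongside the Bulk job, using the Subscription object and ExternalId field from the job definition above (URL-encode the SOQL; the API version in the URL is just an example):
GET /services/data/v52.0/query?q=SELECT Id FROM Subscription WHERE ExternalId = '123456789'
Then post the Id returned by that query as the delete batch body, exactly like the working example:
[{"Id":"<Id returned by the query>"}]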

MongoDB updating all records in a collection with the results of query from another collection

I have around 40k records to update and each record gets the data from querying another collection.
I have an existing query to do this, but it runs for more than an hour; it usually disconnects, and then I rerun it.
I think there is a better way to do this; I am just a noob with MongoDB, and while this solution works, I am not happy with the execution speed.
Maybe you have a better or much faster solution.
To better illustrate the data, please see below:
accounts
[
    {
        "_id": ObjectId("AC101"),
        "emails": null,
        "name": "Account 101",
        ...
    },
    {
        "_id": ObjectId("AC102"),
        "emails": null,
        "name": "Account 102",
        ...
    },
    {
        "_id": ObjectId("AC103"),
        "emails": null,
        "name": "Account 103",
        ...
    },
    ...
]
account_contacts
[
    {
        "_id": ObjectId("ACC001"),
        "account": {
            "$ref": "account",
            "$id": ObjectId("AC101")
        },
        "email": "acc001@test.com",
        "name": "Contact 001",
        ...
    },
    {
        "_id": ObjectId("ACC002"),
        "account": {
            "$ref": "account",
            "$id": ObjectId("AC102")
        },
        "email": "acc002@test.com",
        "name": "Contact 002",
        ...
    },
    {
        "_id": ObjectId("ACC003"),
        "account": {
            "$ref": "account",
            "$id": ObjectId("AC103")
        },
        "email": "acc003@test.com",
        "name": "Contact 003",
        ...
    },
    {
        "_id": ObjectId("ACC004"),
        "account": {
            "$ref": "account",
            "$id": ObjectId("AC103")
        },
        "email": "acc004@test.com",
        "name": "Contact 004",
        ...
    },
    {
        "_id": ObjectId("ACC005"),
        "account": {
            "$ref": "account",
            "$id": ObjectId("AC103")
        },
        "email": "acc005@test.com",
        "name": "Contact 005",
        ...
    },
    ...
]
Query:
db.getCollection('accounts').find({ 'emails': { $eq: null } }).forEach(p => {
    const emails = [];
    db.getCollection('account_contacts').find({ "account.$id": p._id }).forEach(c => {
        emails.push(c.email);
    });
    db.getCollection('accounts').updateOne({ "_id": p._id }, { $set: { "emails": emails } });
});
I have a filter to get only the accounts with null emails, so that if it gets a timeout error (1hr)... I just rerun the script and it will process those accounts with null emails.
Currently, I do not have any idea on how to improve the query... but I know it is not the best solution for this case since it takes more than an hour.
Update:
While I still cannot make the aggregate/$lookup approach work, I did try running the old script in the mongo console. I had previously run it from my IDE, where it took more than an hour; run directly in the mongo console it only takes 12-14 minutes, which is not bad.
This is what I did for now, but I still want to convert my script to use aggregation.
TIA
Using MongoDB 4.2, you can avoid pulling the documents to the client side if you are willing to use a temporary collection.
Use aggregation to match all of the documents with null email, extract just the _id and store it in a temporary collection. Note that if you have an index on {emails:1, _id:1} it will streamline this part. You may want to procedurally generate the temporary collection name so it doesn't use the same name for successive runs.
db.accounts.aggregate([
{$match: {emails: null}},
{$project: {_id: 1}},
{$out: "temporary_null_email_collection"}
])
Then aggregate the temporary collection, lookup the email from the account_contacts collection, get rid of any extraneous fields, and merge the results back with the accounts collection.
db.temporary_null_email_collection.aggregate([
    {$lookup: {
        from: "account_contacts",
        localField: "_id",
        foreignField: "$id",          // verify this field name is correct
        as: "contacts"
    }},
    {$project: {
        _id: 1,
        emails: "$contacts.email"     // each contact stores a single `email` field
    }},
    {$merge: "accounts"}
])
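If the $lookup route doesn't pan out (the DBRef $id path can be awkward to join on), a smaller change that usually helps a lot is batching the original loop with bulkWrite instead of issuing one updateOne per account. A rough sketch, assuming the same collections and field names as above:
// Collect the updates in memory and flush them in batches of 1000.
let ops = [];
db.getCollection('accounts').find({ emails: null }, { _id: 1 }).forEach(p => {
    // Pull just the email field for this account's contacts.
    const emails = db.getCollection('account_contacts')
        .find({ "account.$id": p._id }, { email: 1, _id: 0 })
        .toArray()
        .map(c => c.email);
    ops.push({ updateOne: { filter: { _id: p._id }, update: { $set: { emails: emails } } } });
    if (ops.length >= 1000) {
        db.getCollection('accounts').bulkWrite(ops, { ordered: false });
        ops = [];
    }
});
if (ops.length > 0) {
    db.getCollection('accounts').bulkWrite(ops, { ordered: false });
}
This still runs one contacts query per account, but it collapses the 40k individual updates into a handful of bulk calls, which is usually where much of the time goes.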

Find similar documents/records in database

I have quite a big number of records currently stored in MongoDB; each looks somewhat like this:
{
"_id" : ObjectId("5c38d267b87d0a05d8cd4dc2"),
"tech" : "NodeJs",
"packagename" : "package-name",
"packageversion" : "0.0.1",
"total_loc" : 474,
"total_files" : 7,
"tecloc" : {
"JavaScript" : 316,
"Markdown" : 116,
"JSON" : 42
}
}
What I want to do is find similar records, e.g. records that have about the same (+/-10%) total_loc, or that use some of the same technologies (tecloc).
Can I somehow do this with a query against MongoDB, or is there a technology that fits better for what I want to do? I am fine with regenerating the data and storing it e.g. in Elasticsearch or some graph DB.
Thank you
One possibility to solve this problem is to use Elasticsearch. I'm not claiming that it's the only solution you have.
On a high level, you would need to set up Elasticsearch and index your data. There are various ways to achieve this: mongo-connector, Logstash with the JDBC input plugin, or even just dumping the data from MongoDB and indexing it manually. There are no limits on how you do this job.
The change I would propose initially is to make tecloc a multi-valued field, by replacing the object ({) with an array ([) and adding explicit fields for the lines of code, e.g.:
{
    "tech": "NodeJs",
    "packagename": "package-name",
    "packageversion": "0.0.1",
    "total_loc": 474,
    "total_files": 7,
    "tecloc": [
        {
            "name": "JavaScript",
            "loc": 316
        },
        {
            "name": "Markdown",
            "loc": 116
        },
        {
            "name": "JSON",
            "loc": 42
        }
    ]
}
This data model is very trivial and obviously has some limitations, but it's already something for you to start with and see how well it fits your other use cases. Later you should look into the nested type as one possibility to model your data more accurately.
Regarding your exact search scenario, you could search for those kinds of documents with a query like this:
{
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "tecloc.name.keyword": {
                            "value": "Java"
                        }
                    }
                },
                {
                    "term": {
                        "tecloc.name.keyword": {
                            "value": "Markdown"
                        }
                    }
                }
            ],
            "must": [
                {
                    "range": {
                        "total_loc": {
                            "gte": 426,
                            "lte": 521
                        }
                    }
                }
            ]
        }
    }
}
Unfortunately, there is no query syntax for +/-10%, so this is something that should be calculated on the client.
On the other side, I specified that we are searching for documents that should have Java or Markdown, which returns the example document as well. If a document had both Java and Markdown, its score would be higher.
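Computing the +/-10% window on the client is a one-liner; here is a sketch in JavaScript (referenceDoc is just a placeholder for whatever record you start from):
// Build the range bounds for total_loc from the reference document.
const loc = referenceDoc.total_loc;        // 474 in the example above
const range = {
    total_loc: {
        gte: Math.floor(loc * 0.9),        // roughly 10% below
        lte: Math.ceil(loc * 1.1)          // roughly 10% above
    }
};
// Drop `range` into the "must" clause of the query above.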

Elasticsearch build SQL Server index and create search query fails

After reading and trying several articles and getting no result...
I want to create an Elasticsearch query that returns database results.
Example:
[Step 1]:
My DB is [my_db] and my table name is [my_table].
To build a new index on localhost:9200:
POST /my_index/my_type/_meta
{
    "type": "jdbc",
    "jdbc": {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url": "jdbc:sqlserver://[my_db_ip];databaseName=[my_db]",
        "user": "sa",
        "password": "xxxxxx",
        "sql": "SELECT * FROM [my_table]",
        "poll": "5s",
        "index": "my_index",
        "type": "my_type"
    }
}
The index creation result:
{
"_index": "my_index",
"_type": "my_type",
"_id": "_meta",
"_version": 1,
"created": true
}
[Step 2]:
The search query
POST /my_index/_search
{
"query_string" : {
"query" : "FreeText"
}
}
The search result
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures....
}
What is wrong with my search query?
How can I create a query that returns results from the [my_table] rows?
Try the match_all query (see here for the official documentation). This will bring you all the results of my_type.
Example:
POST /my_index/my_type/_search
{
"query": { "match_all": {} }
}
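As a side note, the failure in the question is most likely a parse error: the query_string clause needs to be wrapped in a top-level query object. A corrected version of the original request would look roughly like this:
POST /my_index/_search
{
    "query": {
        "query_string": { "query": "FreeText" }
    }
}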
If you need to search for some specific term then you must pay attention to the mappings of your type and the query type that you'll use.
Update
Mappings:
From the schema of your table I understand that the below mappings for my_type would suit you well.
{
    "my_table": {
        "properties": {
            "orderid": { "type": "integer", "index": "not_analyzed" },
            "ordername": { "type": "string" }
        }
    }
}
Keep in mind that if the data are already indexed you cannot change the mappings. You must reindex your data after defining the proper mapping.
Generally I'd propose that you follow the methodology below:
Create your index with the index settings that you need
Define the mappings of your type
Index your data
Do not mingle all of these steps into one, and avoid leaving things to luck (like default mappings).
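A minimal sketch of the first two steps combined into a single index-creation request (assuming the ES 1.x-era API implied by the question, the my_type type used elsewhere in this answer, and the two fields from the mapping example):
PUT /my_index
{
    "mappings": {
        "my_type": {
            "properties": {
                "orderid": { "type": "integer" },
                "ordername": { "type": "string" }
            }
        }
    }
}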
You can use the match query in order to search the data of a field on the document.
Example
POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "FIELD": "TEXT"
        }
    }
}
You can use the multi-match query in order to search the data of multiple fields of the document.
Example
POST /my_index/my_type/_search
{
    "query": {
        "multi_match": {
            "query": "TEXT",
            "fields": [ "field1", "field2" ]
        }
    }
}
For more querying options check the official documentation on Query DSL.

How to count the occurrence of each value in an array?

I have a database of ISSUES in MongoDB; some of the issues have comments, which is an array, and each comment has a writer. How can I count the number of comments each writer has written?
I've tried
db.test.issues.group(
{
key = "comments.username":true;
initial: {sum:0},
reduce: function(doc, prev) {prev.sum +=1},
}
);
but no luck :(
A Sample:
{
"_id" : ObjectId("50f48c179b04562c3ce2ce73"),
"project" : "Ruby Driver",
"key" : "RUBY-505",
"title" : "GETMORE is sent to wrong server if an intervening query unpins the connection",
"description" : "I've opened a pull request with a failing test case demonstrating the bug here: https://github.com/mongodb/mongo-ruby-driver/pull/134\nExcerpting that commit message, the issue is: If we do a secondary read that is large enough to require sending a GETMORE, and then do another query before the GETMORE, the secondary connection gets unpinned, and the GETMORE gets sent to the wrong server, resulting in CURSOR_NOT_FOUND, even though the cursor still exis ts on the server that was initially queried.",
"status" : "Open",
"components" : [
"Replica Set"
],
"affected_versions" : [
"1.7.0"
],
"type" : "Bug",
"reporter" : "Nelson Elhage",
"priority" : "major",
"assignee" : "Tyler Brock",
"resolution" : "Unresolved",
"reported_on" : ISODate("2012-11-17T20:30:00Z"),
"votes" : 3,
"comments" : [
{
"username" : "Nelson Elhage",
"date" : ISODate("2012-11-17T20:30:00Z"),
"body" : "Thinking some more"
},
{
"username" : "Brandon Black",
"date" : ISODate("2012-11-18T20:30:00Z"),
"body" : "Adding some findings of mine to this ticket."
},
{
"username" : "Nelson Elhage",
"date" : ISODate("2012-11-18T20:30:00Z"),
"body" : "I think I tracked down the 1.9 dependency."
},
{
"username" : "Nelson Elhage",
"date" : ISODate("2012-11-18T20:30:00Z"),
"body" : "Forgot to include a link"
}
]
}
You forgot the curly braces on the key value and you need to terminate that line with a , instead of a ;.
db.issues.group({
key: {"comments.username":true},
initial: {sum:0},
reduce: function(doc, prev) {prev.sum +=1},
});
UPDATE
After realizing comments is an array...you'd need to use aggregate for that so that you can 'unwind' comments and then group on it:
db.issues.aggregate(
{$unwind: '$comments'},
{$group: {_id: '$comments.username', sum: {$sum: 1}}}
);
For the sample doc in the question, this outputs:
{
"result": [
{
"_id": "Brandon Black",
"sum": 1
},
{
"_id": "Nelson Elhage",
"sum": 3
}
],
"ok": 1
}
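If you want the writers ordered by how much they comment, a $sort stage can be appended to the same pipeline; a small sketch using only the fields already shown above:
db.issues.aggregate(
    {$unwind: '$comments'},
    {$group: {_id: '$comments.username', sum: {$sum: 1}}},
    {$sort: {sum: -1}}   // most prolific commenters first
);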
Just a side answer here to complement @JohnnyHK's answer: it sounds like you're new to MongoDB and as such possibly working with a recent version of MongoDB (if not, I would upgrade). Either way, the old group count is kinda bad; for one, it won't work with sharding.
Instead in MongoDB 2.2 you can just do:
db.col.aggregate({$unwind: "$comments"}, {$group: {_id: "$comments.username", count: {$sum: 1}}})
Or something similar. You can read more about it here: http://docs.mongodb.org/manual/applications/aggregation/
