Azure Cognitive Search and Dictionary serialization problem

I have a C# class that includes a Dictionary (this is just one line of the class):
public Dictionary<string, double?> Resultat { get; set; }
When I created the index, I defined that field as a collection of a complex type:
Collection(Edm.ComplexType)
  key (Edm.String)
  value (Edm.Double)
I use the native C# library to post documents to the search index, and when I run the code I get an error stating that it expects a start marker. I figured out that the Azure Cognitive Search serializer converts the Dictionary to a JSON object instead of a JSON array.
The result looks a bit like:
{
"key1": "value1",
"key2": "value2"
}
but Cognitive Search expects the data to look like:
[
{"key1":"value1"},
{"key2":"value2"}
]
Since the dictionary is dynamic in both count and keys (unknown counts and unknown keys), the field cannot be created as an Edm.ComplexType with a fixed set of properties.
Is there any way to send serializer instructions to Cognitive Search so that Dictionaries are serialized to arrays instead of objects? Are there any other solutions?

Related

How to use Avro Generated Classes with children as UNION in Flink Operator

I am trying to build a list of Avro-generated records while iterating with a .map() over Kafka records that are being pulled from a Kafka source.
The problem I'm having is that I have to work with multiple types of events on that Kafka topic, so I ended up having a GenericType (schema/Avro-generated object) that has a UNION on a field ('data').
While processing those records and trying to build the result, I debugged and ended up in the PojoType validation phase, and since the class GenericType has a child declared as a UNION, that field becomes: private java.lang.Object data;
While processing this field, in the PojoType validator, it throws an exception:
Exception in thread "main" java.lang.IllegalStateException: Expecting type to be a PojoTypeInfo
My generated GenericType Java class does explicitly extend org.apache.avro.specific.SpecificRecordBase, but the problem still remains, because its 'data' field is of type java.lang.Object.
Here is the code that is causing issues:
SingleOutputStreamOperator<GenericType> producedRecords =
    kafkaRecords
        .map(
            value -> {
                String kafkaKey = value.get(KEY).asText();
                String kafkaRecordJson = MAPPER.writeValueAsString(value.get(VALUE));
                return (GenericType) Converter.convert(kafkaKey, kafkaRecordJson);
            })
        .returns(
            TypeInformation.of(
                new TypeHint<>() {
                    @Override
                    public TypeInformation<GenericType> getTypeInfo() {
                        return super.getTypeInfo();
                    }
                }));
Avro Schemas:
{
  "type": "record",
  "name": "GenericType",
  "namespace": "com.zzz.yyy",
  "fields": [
    {
      "name": "data",
      "type": [
        "com.zzz.yyy.Schema1",
        "com.zzz.yyy.Schema2"
      ]
    }
  ]
}
I have also tried with an Avro schema looking like this:
[
  "com.zzz.yyy.Schema1",
  "com.zzz.yyy.Schema2"
]
That is just a UNION for the generic type object, but I can't make the Avro plugin that generates the object actually work; it always states that the schema is invalid.
The first schema will generate a Java object looking like this (I've obviously cleared out the boilerplate code that Avro adds). It is worth mentioning that the class below does extend SpecificRecordBase, just to exclude that as the cause of the exception.
public class GenericType {
    // boilerplate here
    private java.lang.Object data;
    // boilerplate here
}
And this is the actual problem: while debugging, as I said, during validation of the object's fields, the 'data' field fails, because it is neither a primitive nor a POJO type (being a plain java.lang.Object), and it does not respect some of the rules (having a no-args constructor, getters, setters, etc.).
I'm trying to figure out how I could generate those Avro objects, or what I could use instead of the generic one inside my job, so that I can move past that exception. Honestly, looking at that validation, I'm not sure how this would be possible, since the Avro plugin will always generate a java.lang.Object field for a UNION.
More context:
Avro schemas are registered with the schema registry.
Produced Avro objects are sent to a Kafka sink.
It turned out to be a silly problem: the Maven plugin used to generate the Avro classes had a flag set to not create setters. After enabling setter generation, all validation passed on the Avro POJOs and the flow was successful.
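For reference, a minimal sketch of the relevant avro-maven-plugin configuration; createSetters is the flag in question, while the plugin version and directory paths below are assumptions:
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.11.1</version> <!-- version is an assumption -->
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <!-- Flink's POJO validation requires setters on the generated classes -->
        <createSetters>true</createSetters>
        <sourceDirectory>${project.basedir}/src/main/avro</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>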

Query by object value inside array on Firebase Firestore [duplicate]

This is the structure of my Firestore database:
Expected result: to get all the jobs where, in the experience array, the lang value is "Swift".
So as per this, I should get the first 2 documents. The 3rd document does not have experience "Swift".
Query jobs = db.collection("Jobs").whereArrayContains("experience.lang", "Swift");
jobs.get().addOnSuccessListener(new OnSuccessListener<QuerySnapshot>() {
    @Override
    public void onSuccess(QuerySnapshot queryDocumentSnapshots) {
        // Always the queryDocumentSnapshots size is 0
    }
});
I tried most of the answers, but none worked out. Is there any way to query data in this structure? The docs are only available for a normal array, not for an array of custom objects.
Actually, it is possible to perform such a query with a database structure like yours. I have replicated your schema, and here are document1, document2, and document3.
Note that you cannot query using partial (incomplete) data. You are using only the lang property to query, which is not correct. You should use an object that contains both properties, lang and years.
Seeing your screenshot, at first glance the experience array is a list of HashMap objects. But here comes the nicest part: that list can simply be mapped into a list of custom objects. Let's try to map each object from the array to an object of type Experience. The model contains only two properties:
public class Experience {
    public String lang, years;

    public Experience() {}

    public Experience(String lang, String years) {
        this.lang = lang;
        this.years = years;
    }
}
I don't know how you named the class that represents a document, but I named it simply Job. To keep it simple, I have only used two properties:
public class Job {
    public String name;
    public List<Experience> experience;
    // Other properties

    public Job() {}
}
Now, to perform a search for all documents that contain in the array an object with the lang set to Swift, please follow the next steps. First, create a new object of the Experience class:
Experience firstExperience = new Experience("Swift", "1");
Now you can query like so:
CollectionReference jobsRef = rootRef.collection("Jobs");
jobsRef.whereArrayContains("experience", firstExperience).get().addOnCompleteListener(new OnCompleteListener<QuerySnapshot>() {
    @Override
    public void onComplete(@NonNull Task<QuerySnapshot> task) {
        if (task.isSuccessful()) {
            for (QueryDocumentSnapshot document : task.getResult()) {
                Job job = document.toObject(Job.class);
                Log.d(TAG, job.name);
            }
        } else {
            Log.d(TAG, task.getException().getMessage());
        }
    }
});
The result in the logcat will be the names of document1 and document2:
firstJob
secondJob
And this is because only those two documents contain in the array an object where the lang is set to Swift.
You can also achieve the same result when using a Map:
Map<String, Object> firstExperience = new HashMap<>();
firstExperience.put("lang", "Swift");
firstExperience.put("years", "1");
So there is no need to duplicate data in this use case. I have also written an article on the same topic: How to map an array of objects from Cloud Firestore to a List of objects?
Edit:
In your approach, it provides the result only if experience is "1" and lang is "Swift", right?
That's correct, it only searches for one element. However, if you need to query for more than that:
Experience firstExperience = new Experience("Swift", "1");
Experience secondExperience = new Experience("Swift", "4");
//Up to ten
We use another approach, which is actually very simple. I'm talking about Query's whereArrayContainsAny() method:
Creates and returns a new Query with the additional filter that documents must contain the specified field, the value must be an array, and that the array must contain at least one value from the provided list.
And in code should look like this:
jobsRef.whereArrayContainsAny("experience", Arrays.asList(firstExperience, secondExperience)).get().addOnCompleteListener(new OnCompleteListener<QuerySnapshot>() {
    @Override
    public void onComplete(@NonNull Task<QuerySnapshot> task) {
        if (task.isSuccessful()) {
            for (QueryDocumentSnapshot document : task.getResult()) {
                Job job = document.toObject(Job.class);
                Log.d(TAG, job.name);
            }
        } else {
            Log.d(TAG, task.getException().getMessage());
        }
    }
});
The result in the logcat will be:
firstJob
secondJob
thirdJob
And this is because all three documents contain one or the other object.
Why am I talking about duplicating data in a document? Because documents have limits on how much data they can hold. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB of data in total in a single document, so storing duplicated data will only increase the chance of reaching that limit.
If I send null data for "experience" and "Swift" as "lang", will it be queried?
No, it will not work.
Edit2:
The whereArrayContainsAny() method works with a maximum of 10 objects. If you have 30, then you should save each query.get() of 10 objects into a Task object and then pass them one by one to Tasks.whenAllSuccess(Task... tasks).
You can also pass them directly as a list to the Tasks.whenAllSuccess(Collection<? extends Task<?>> tasks) method, as sketched below.
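A minimal sketch of that chunking, assuming the Play services Tasks API; chunksOfTen() is a hypothetical helper that splits the list into sublists of at most 10 elements:
List<Task<QuerySnapshot>> tasks = new ArrayList<>();
for (List<Experience> chunk : chunksOfTen(allExperiences)) { // hypothetical helper
    tasks.add(jobsRef.whereArrayContainsAny("experience", chunk).get());
}
// whenAllSuccess succeeds only when every query succeeds and returns a flat result list.
Tasks.whenAllSuccess(tasks).addOnSuccessListener(results -> {
    for (Object result : results) {
        QuerySnapshot snapshot = (QuerySnapshot) result;
        // process snapshot
    }
});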
With your current document structure, it's not possible to perform the query you want. Firestore does not allow queries for individual fields of objects in list fields.
What you would have to do is create an additional field in your document that is queryable. For example, you could create a list field with only the list of string languages that are part of the document. With this, you could use an array-contains query to find the documents where a language is mentioned at least once.
For the document shown in your screenshot, you would have a list field called "languages" with values ["Swift", "Kotlin"].
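A minimal sketch of that query, assuming the extra field is named languages:
// Matches documents whose "languages" array contains "Swift" at least once.
db.collection("Jobs").whereArrayContains("languages", "Swift").get();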

How to get a document from MongoDB by using its string id and string collection name in Spring Data

I know that generally we need to do something similar to this to get a document back from MongoDB in Spring Data:
Define a class and annotate it with @Document:
@Document("persons")
public class Person
Use MongoTemplate:
mongoOps.findById(p.getId(), Person.class);
The problem is that at runtime I don't know the class type of the document; I just have its collection name and its id as strings. How is it possible to retrieve the document using Spring Data? Something like this:
db.myCollectionName.findOne({_id: myId})
The result object type is not a concern; it can even be Object. I just want to map it to a Jackson JsonNode.
A possible workaround: you can use the aggregate function of MongoOperations like this:
AggregationResults<Object> aggResults = mongoOps.aggregate(
        newAggregation(match(Criteria.where("_id").is(myId))),
        myCollectionName, Object.class);
return aggResults.getUniqueMappedResult();
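If the end goal is a Jackson JsonNode, a minimal alternative sketch, assuming Spring Data's findById overload that takes a collection name (myId and myCollectionName are the strings from the question):
// Fetch the raw BSON document without a mapped class, then re-parse it as JSON.
org.bson.Document doc = mongoOps.findById(myId, org.bson.Document.class, myCollectionName);
// readTree throws JsonProcessingException, which must be handled by the caller.
JsonNode node = (doc == null) ? null : new ObjectMapper().readTree(doc.toJson());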

SolrJ nested documents

What I have is this:
public class Product {
    @Field("object_id")
    private String objectId;

    private List<MyObject> listOfMyObjects;
}
I use SolrJ to save the info. How can I make listOfMyObjects look like a list of nested documents in the Solr response? I can make the field multivalued, but I need the list to be a list of documents.
I can see that this question has been asked a few times (e.g. Solrj Block Join Bean support) but with no answer. Solr supports nested documents, but how do you make this happen using SolrJ with annotations and schema.xml?

Storing JSON document with AppEngine

I'm trying to store a JSON document in the AppEngine datastore using Objectify as the persistence layer. To be able to query for document values, instead of just inserting the whole document as a String field, I created a MapEntity which looks like this:
@Entity(name="Map")
public class MapEntity {
    @Id
    private Long id;
    private Map<String, String> field;
    // Code omitted
}
Eventually, when "unrolled", every key-value pair in the JSON document can be represented in the Map.
Example:
String subText = "{\"first\": 111, \"second\": [2, 2, 2], \"third\": 333}";
String jsonText = "{\"first\": 123, \"second\": [4, 5, 6], \"third\": 789, \"fourth\":"
+ subText + "}";
I will have the map fields stored in the datastore:
KEY VALUE
field.first => 123
field.second => [4,5,6]
field.third => 789
field.fourth-first => 111
field.fourth-second => [2,2,2]
field.fourth-third => 333
I parse the JSON document using the JSON.Simple library and then do a recursive parse with my parse() method:
private MapEntity parse(String root, MapEntity entity, Map json) {
    Iterator iter = json.entrySet().iterator();
    while (iter.hasNext()) {
        Map.Entry entry = (Map.Entry) iter.next();
        if (entry.getValue() instanceof Map) {
            // Recurse into the nested object, accumulating the key prefix
            entity = parse(root + entry.getKey() + "-", entity, (Map) entry.getValue());
            System.out.println("Map instance");
        } else {
            entity.setField(root + String.valueOf(entry.getKey()), String.valueOf(entry.getValue()));
        }
    }
    return entity;
}
My app works like this:
MapEntity jsonEntity = new MapEntity();
Map json = (Map) parser.parse(jsonText, containerFactory); // JSON.Simple parser
jsonEntity = parse("", jsonEntity, json);
The problems I encounter are:
I can't use the "." dot in the Map key field, so I have to use "-" instead.
Also, my approach to storing the JSON document is not very efficient.
If your JSON follows a strict format, you'd probably be better off constructing a class to represent your data format and serializing directly to and from that class using a library like Jackson. You can use that class directly as your entity class in Objectify, but whether you want to do so depends on whether you want to (see the sketch after this list):
Store and expose the exact same set of data
Tightly couple your storage and JSON representations
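A minimal sketch of the Jackson approach; the JsonDoc class name and its fields are assumptions that mirror the example JSON from the question:
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

// Hypothetical class mirroring the example document's top-level fields.
public class JsonDoc {
    public int first;
    public List<Integer> second;
    public int third;
    public JsonDoc fourth; // nested object of the same shape; null when absent

    public JsonDoc() {} // no-arg constructor for Jackson (and Objectify, if reused)
}

// Usage (readValue throws JsonProcessingException, which the caller must handle):
JsonDoc doc = new ObjectMapper().readValue(jsonText, JsonDoc.class);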
You could use JSONObject as a replacement for your MapEntity and store the JSON in Google App Engine as a string using the toString() method. Upon retrieval, you could simply restore the JSONObject using the appropriate constructor. This, of course, limits your ability to index properties in App Engine and query against them.
If you want Objectify to do this for you, you could register a Translator to take care of calling toString() and the reconstruction.
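A minimal sketch of the store-as-string variant, assuming org.json's JSONObject; the entity name and accessors are illustrative:
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import org.json.JSONObject;

@Entity
public class JsonDocEntity {
    @Id private Long id;
    private String json; // whole document stored as one unindexed string

    public void setJson(JSONObject obj) { this.json = obj.toString(); }
    public JSONObject getJson() { return new JSONObject(json); } // restore via the String constructor
}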
