Kotlin Parsing json array with new line separator - arrays

I'm using OKHttpClient in a Kotlin app to post a file to an API that gets processed. While the process is running the API is sending back messages to keep the connection alive until the result has been completed. So I'm receiving the following (this is what is printed out to the console using println())
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"DONE","transcript":"Hello, world.","error":null}
Which I believe is being separated by a new line character, not a comma.
I figured out how to extract the data by doing the following but is there a more technically correct way to transform this? I got it working with this but it seems error-prone to me.
data class Status (status : String?, transcript : String?, error : String?)
val myClient = OkHttpClient ().newBuilder ().build ()
val myBody = MultipartBody.Builder ().build () // plus some stuff
val myRequest = Request.Builder ().url ("localhost:8090").method ("POST", myBody).build ()
val myResponse = myClient.newCall (myRequest).execute ()
val myString = myResponse.body?.string ()
val myJsonString = "[${myString!!.replace ("}", "},")}]".replace (",]", "]")
// Forces the response from "{key:value}{key:value}"
// into a readable json format "[{key:value},{key:value},{key:value}]"
// but hoping there is a more technically sound way of doing this
val myTranscriptions = gson.fromJson (myJsonString, Array<Status>::class.java)

An alternative to your solution would be to use a JsonReader in lenient mode. This allows parsing JSON which does not strictly comply with the specification, such as in your case multiple top level values. It also makes other aspects of parsing lenient, but maybe that is acceptable for your use case.
You could then use a single JsonReader wrapping the response stream, repeatedly call Gson.fromJson and collect the deserialized objects in a list yourself. For example:
val gson = GsonBuilder().setLenient().create()
val myTranscriptions = myResponse.body!!.use {
val jsonReader = JsonReader(it.charStream())
jsonReader.isLenient = true
val transcriptions = mutableListOf<Status>()
while (jsonReader.peek() != JsonToken.END_DOCUMENT) {
transcriptions.add(gson.fromJson(jsonReader, Status::class.java))
}
transcriptions
}
Though, if the server continously provides status updates until processing is done, then maybe it would make more sense to directly process the parsed status instead of collecting them all in a list before processing them.

Related

What's the simplest reliable way to encode multiple jpeg images in a single byte string?

I need to publish a Google Cloud Pub/Sub message with multiple jpeg images. It needs to go in the data body. Putting it as a base64-encoded string in an attribute won't work, because attribute values are limited to 1024 bytes:
https://cloud.google.com/pubsub/quotas#resource_limits
What's a simple and reliable pattern for doing that? It might seem possible to choose some fixed delimiter, but I want to avoid the possibility of that delimiter occurring inside an image. Is it possible something like |||| might occur in a jpeg byte array? Another possibility might seem to encode as multi-part mime, but I haven't found any general-purpose non-http libraries to do that. I need implementations in both Java/Scala and Python. Or maybe can I just concatenate the jpeg byte arrays without any external delimiter, and split them based on header identifiers?
You probably want to store data in some kind of schema-based message using something like Avro or Protocol Buffers. Both can generate code that can be used to serialize and deserialize messages in Java/Scala and Python.
For example, in protocol buffers, you could create a message in a file image.proto:
syntax = "proto3";
message Images {
bytes images = 1;
}
You could generate the python code for this with protoc compiler:
$ protoc -I=. --python_out=. image.proto
In Python3, to add images, serialize the message, and send it, you would do the following:
import image_pb2
from google.cloud import pubsub_v1
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(<project name>, <topic name>)
def send_images(images):
img_msg = image_pb2.Images()
for i in images:
img_msg.images.append(i)
msg_data = img_msg.SerializeToString()
message_future = publisher.publish(topic_path, data=msg_data)
print(message_future.result())
To receive the images and process them:
import image_pb2
from google.cloud import pubsub_v1
def receive(message):
images = image_pb2.Images()
images.ParseFromString(message.data)
for i in images.images:
# Process the image
message.ack()
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(<project name>, <subscription name>)
subscribe_future = subscriber.subscribe(subscription_path, receive)
print(subscribe_future.result())
It looks like the following approach may work, written in Scala, using only natural delimiters:
def serializeJpegs(jpegs: Seq[Array[Byte]]): Array[Byte] =
jpegs.foldLeft(Array.empty[Byte])(_ ++ _)
def deserializeJpegs(bytes: Array[Byte]): Seq[Array[Byte]] = {
val JpegHeader = Array(0xFF.toByte, 0xD8.toByte)
val JpegFooter = Array(0xFF.toByte, 0xD9.toByte)
val Delimiter = JpegFooter ++ JpegHeader
val jpegs: mutable.Buffer[Array[Byte]] = mutable.Buffer.empty
var (start, end) = (0, 0)
end = bytes.indexOfSlice(Delimiter, start) + JpegFooter.length
while (end > JpegFooter.length) {
jpegs += bytes.slice(start, end)
start = end
end = bytes.indexOfSlice(Delimiter, start) + JpegFooter.length
}
if (start < bytes.length) {
jpegs += bytes.drop(start)
}
jpegs
}
I'm sure there's a more efficient and functional implementation, but that's for another day!

Regex to match the last value in given data

I get data from a URL, and am working on the data to check for a few conditions. The data from the URL look like this:
1528190345":100,"1528190346":100,"1528190368":100,"1528190414":100,"1528190439":99,"1528190440":99,"1528190463":100,"1528190485":100,"1528190508":100,"1528190550":100,"1528190575":100,"1528190576":100,"1528190599":100,"1528190600":100,"1528190622":100,"1528190667":100,"1528190688":100,"1528190689":100,"1528190712":100,"1528190736":100,"1528190762":100,"1528190785":100,"1528190786":100,"1528190807":100,"1528190828":100,"1528190853":100,"1528190877":100,"1528190901":100,"1528190925":100,"1528190948":100,"1528190968":100,"1528190991":100}}]
====
I have converted that too JSON
{"metric"=>"Insta_real-unique_value", "tags"=>{"host"=>"letme.quickly.com", "tier"=>"2", "device"=>"tester1", "dc"=>"xxx"}, "aggregateTags"=>["device_name", "device_ip"], "dps"=>{"1526972408"=>100, "1526972424"=>100, "1526972440"=>100, "1526972456"=>100, "1526972472"=>100, "1526972488"=>100, "1526972504"=>100, "1526972520"=>100, "1526972536"=>100, "1526972552"=>100, "1526972568"=>100, "1526972569"=>100, "1526972584"=>100, "1526972585"=>100, "1526972601"=>100, "1526972617"=>100, "1526972633"=>100, "1526972649"=>100, "1526972665"=>100, "1526972681"=>100}}
I want to extract the value that corresponds to 100. When I do this:
url = "#{URL}"
uri = URI(url)
response = Net::HTTP.get(uri)
value = response[-6..-4]
puts value
I get the last value, but when the last value changes to 99/9/0, it prints :99 or ":9.
Is there a way to get the exact value as is?
When dealing with JSON data, it's almost always better to parse the data properly rather than using regex against the string.
In this case, we can do:
JSON.parse(response)['dps'].values.last #=> 100
If the response is a json response, you must use a json parser else if is not a json response, you can use a regex expression with a Regex Object.
In case of a json response, assuming that the object is something like is declared into the variable response of the next code, you can parse it into a JObject. (using Newtonsoft.Json available from nuget repository).
See the next example :
string response = "[{\"response\":{\"1528190345\":100,\"1528190346\":100,\"1528190368\":100,\"1528190414\":100,\"1528190439\":99,\"1528190440\":99,\"1528190463\":100,\"1528190485\":100,\"1528190508\":100,\"1528190550\":100,\"1528190575\":100,\"1528190576\":100,\"1528190599\":100,\"1528190600\":100,\"1528190622\":100,\"1528190667\":100,\"1528190688\":100,\"1528190689\":100,\"1528190712\":100,\"1528190736\":100,\"1528190762\":100,\"1528190785\":100,\"1528190786\":100,\"1528190807\":100,\"1528190828\":100,\"1528190853\":100,\"1528190877\":100,\"1528190901\":100,\"1528190925\":100,\"1528190948\":100,\"1528190968\":100,\"1528190991\":100}}]";
List<Dictionary<string, Dictionary<string, int>>> values = JsonConvert.DeserializeObject<List<Dictionary<string, Dictionary<string, int>>>>(response);
Dictionary<string, Dictionary<string, int>> firstLevel = values[0]; // Access to the first object of the list closed with ']'
Dictionary<string, int> secondLevel = firstLevel["response"]; // Access to the first object response and get's it's object context of first '}' starting from the end of response
/** This is an option, if you ever knows the name of the element (1528190991) */
int thirdLevel = secondLevel["1528190991"]; // Access to the last element of the object by it's name, context of second '}' starting from the end of response.
Console.WriteLine(thirdLevel);
/** This is another option if you doesn't know the name of the element and wants ever the last element. */
List<int> listOfValues = secondLevel.Values.ToList();
Console.WriteLine(listOfValues[listOfValues.Count-1]);
Note that i've chenged a little bit your response adding [{\"response\":{\" at the start to become a json response.
If is not a json response you can use this pattern with regular expression :
:(.{2,6})}}\]$
Hope will help!

How to buffer and drop a chunked bytestring with a delimiter?

Lets say you have a publisher using broadcast with some fast and some slow subscribers and would like to be able to drop sets of messages for the slow subscriber without having to keep them in memory. The data consists of chunked ByteStrings, so dropping a single ByteString is not an option.
Each set of ByteStrings is followed by a terminator ByteString("\n"), so I would need to drop a set of ByteStrings ending with that.
Is that something you can do with a custom graph stage? Can it be done without aggregating and keeping the whole set in memory?
Avoid Custom Stages
Whenever possible try to avoid custom stages, they are very tricky to get correct as well as being pretty verbose. Usually some combination of the standard akka-stream stages and plain-old-functions will do the trick.
Group Dropping
Presumably you have some criteria that you will use to decide which group of messages will be dropped:
type ShouldDropTester : () => Boolean
For demonstration purposes I will use a simple switch that drops every other group:
val dropEveryOther : ShouldDropTester =
Iterator.from(1)
.map(_ % 2 == 0)
.next
We will also need a function that will take in a ShouldDropTester and use it to determine whether an individual ByteString should be dropped:
val endOfFile = ByteString("\n")
val dropGroupPredicate : ShouldDropTester => ByteString => Boolean =
(shouldDropTester) => {
var dropGroup = shouldDropTester()
(byteString) =>
if(byteString equals endOfFile) {
val returnValue = dropGroup
dropGroup = shouldDropTester()
returnValue
}
else {
dropGroup
}
}
Combining the above two functions will drop every other group of ByteStrings. This functionality can then be converted into a Flow:
val filterPredicateFunction : ByteString => Boolean =
dropGroupPredicate(dropEveryOther)
val dropGroups : Flow[ByteString, ByteString, _] =
Flow[ByteString] filter filterPredicateFunction
As required: the group of messages do not need to be buffered, the predicate will work on individual ByteStrings and therefore consumes a constant amount of memory regardless of file size.

Gmail API .NET: Get full message

How do I get the full message and not just the metadata using gmail api?
I have a service account and I am able to retrieve a message but only in the metadata, raw and minimal formats. How do I retrieve the full message in the full format? The following code works fine
var request = service.Users.Messages.Get(userId, messageId);
request.Format = UsersResource.MessagesResource.GetRequest.FormatEnum.Metadata;
Message message = request.Execute();
However, when I omit the format (hence I use the default format which is FULL) or I change the format to UsersResource.MessagesResource.GetRequest.FormatEnum.Full
I get the error: Metadata scope doesn't allow format FULL
I have included the following scopes:
https://www.googleapis.com/auth/gmail.readonly,
https://www.googleapis.com/auth/gmail.metadata,
https://www.googleapis.com/auth/gmail.modify,
https://mail.google.com/
How do I get the full message?
I had to remove the scope for the metadata to be able to get the full message format.
The user from the SO post have the same error.
Try this out first.
Go to https://security.google.com/settings/security/permissions
Choose the app you are working with.
Click Remove > OK
Next time, just request exactly which permissions you need.
Another thing, try to use gmailMessage.payload.parts[0].body.dataand to decode it into readable text, do the following from the SO post:
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.StringUtils;
System.out.println(StringUtils.newStringUtf8(Base64.decodeBase64(gmailMessage.payload.parts[0].body.data)));
You can also check this for further reference.
try something like this
public String getMessage(string user_id, string message_id)
{
Message temp =service.Users.Messages.Get(user_id,message_id).Execute();
var parts = temp.Payload.Parts;
string s = "";
foreach (var part in parts) {
byte[] data = FromBase64ForUrlString(part.Body.Data);
s += Encoding.UTF8.GetString(data);
}
return s
}
public static byte[] FromBase64ForUrlString(string base64ForUrlInput)
{
int padChars = (base64ForUrlInput.Length % 4) == 0 ? 0 : (4 - (base64ForUrlInput.Length % 4));
StringBuilder result = new StringBuilder(base64ForUrlInput, base64ForUrlInput.Length + padChars);
result.Append(String.Empty.PadRight(padChars, '='));
result.Replace('-', '+');
result.Replace('_', '/');
return Convert.FromBase64String(result.ToString());
}

Bug when sending array in node.js and socket.io

I use socket.io version 0.8.4
I have boiled down my problem to the following. I have data looking like this:
data.prop1 = [];
data.prop1.push("man");
data.prop2 = [];
data.prop2["hey"] = "man";
I send the data from the server to the client this way:
socket.emit("data", data);
On the client side I receive the data this way:
socket.on("data", function(data){ console.log(data); });
The weird thing is:
data.prop1 = [];
data.prop1.push("man"); // This data exists in the client side data object
data.prop2 = [];
data.prop2["hey"] = "man"; // This data does not exist.
data.prop2 is just an empty array on the client side.
Is there a known bug in json serializing arrays on the form in prop2?
Thankyou in advance
EDIT:
Problem solved using this workaround:
data.prop1 = [];
data.prop1.push("man");
data.prop2 = {}; // <= Object instead of array
data.prop2["hey"] = "man";
ECMA-262 about JSON.stringify:
The representation of arrays includes only the elements between zero and array.length – 1 inclusive. Named properties are excluded from the stringification.
Arrays are supposed to have numerical property names. So when the data.prop2 is transformed to JSON (which socket.io sends the data in, I imagine), it doesn't get the 'hey' property. If you want to use non-numerical property names, you should use objects instead of arrays:
data.prop1 = [];
data.prop1.push("man");
data.prop2 = {}; // Notice we're creating an object, not an array.
data.prop2["hey"] = "man"; // Alternatively: data.prop2.hey = "man"
Unfortunately, Javascript doesn't really work like that.
Check out this article, about half way down. It explains the problem where you try to set data.prop2["hey"] = "man";

Resources