End of JSON string identification - C

JSON string: "{\n\t"tag": "Value",\n\t"attributes": {\n\t\t"rfid": "2"\n\t},\n\t"dat": "1"\n}
I am receiving the JSON string from the web server part by part, i.e. 8 bytes at a time.
When I try to collect the data in one buffer with the logic below in C:
static char *jsonString;
bool found_json = false;

jsonString = (char*)malloc (1024, sizeof(char));
while (data[i] != "}")
{
    found_json = true;
    for (i = 0; i < len; i++)
    {
        memcpy(jsonString, data, len);
    }
}
Can somebody throw some light on how to detect the end of the JSON string, given that there will be two closing braces in the JSON object?

I think you have two proper ways: either fully parse the JSON (you can use a library for that) or somehow receive the length of the string (if this is an HTTP request, there should be a Content-Length header which indicates the length). Counting curly braces is not reliable in general, because even a simple number like 1233 or a boolean value like true is a valid JSON document.
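If you go the Content-Length route, here is a minimal sketch of pulling the value out of a header buffer (my own illustration, not part of your code; the scan is simplified and case-sensitive, while real HTTP header names are case-insensitive):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the body length announced by Content-Length, or -1 if the header
   is missing. `headers` must be a NUL-terminated buffer of the HTTP headers. */
static long content_length(const char *headers)
{
    const char *p = strstr(headers, "Content-Length:");
    if (p == NULL)
        return -1;
    return strtol(p + strlen("Content-Length:"), NULL, 10);
}

int main(void)
{
    const char *headers = "HTTP/1.1 200 OK\r\nContent-Length: 42\r\n\r\n";
    printf("body length: %ld\n", content_length(headers)); /* prints 42 */
    return 0;
}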

Here is some pseudo code for finding the end of your string:
open = 0;
close = 0;
while ( visit each character in your buffer )
{
    if (new character == '{')
        open++;
    else if (new character == '}')
        close++;

    if (open == 0)            // Did we even start the JSON string yet?
        discard character
    else if (open == close)   // Matching number of { and } found. We are done.
    {
        process JSON string
        open = close = 0;     // Prepare for next incoming JSON string.
    }
}
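The same idea as a rough C sketch (an illustration only: it assumes the payload is an object and does not account for braces that may occur inside JSON string values):

#include <stddef.h>

/* Feed each received chunk to this function. It returns the index just past
   the '}' that completes the object, or -1 if the object is not complete yet.
   `open` and `close` must persist across calls (caller-owned counters). */
static int find_json_end(const char *data, size_t len, int *open, int *close)
{
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '{')
            (*open)++;
        else if (data[i] == '}')
            (*close)++;

        if (*open > 0 && *open == *close)
            return (int)(i + 1);   /* complete object ends here */
    }
    return -1;  /* keep buffering, the object is not finished yet */
}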

Related

JSON object is incorrectly decoded as JSON String (Parson library)

I'm using the parson library to decode a JSON message body sent from a backend (AzureIoTHub direct method call). My string is always decoded as type JSONString instead of JSONObject, so I cannot extract any keys and values. I think something is wrong with my string in the payload buffer, but I'm stuck on what it is. Any help is appreciated.
Here are two examples of different JSON strings stored in the payload_buffer variable; neither works.
"{'device': '65'}"
JSON Type: 2
or
"{"device": "3445667"}"
JSON Type: 2
uint8_t payload_buffer[448];
JSON_Value *root_value = NULL;
JSON_Object *root_object = NULL;
JSON_Value *device_name = NULL;
const char *device;

printf("%.*s \r\n", (INT)insert_index, (CHAR *)&payload_buffer);
root_value = json_parse_string((const char *)&payload_buffer[0]);
if (root_value != NULL)
{
    if (json_value_get_type(root_value) == JSONObject)
    {
        root_object = json_value_get_object(root_value);
        device_name = json_object_dotget_value(root_object, "device");
        device = json_value_get_string(device_name);
        if (device != NULL)
        {
            printf("device: %s", device);
        }
    }
    else
    {
        printf("JSON Type: %u \r\n", json_value_get_type(root_value));
    }
}
For the parson library to interpret a buffer as a JSONObject rather than a JSONString, the buffer must not start and end with a " character. Once I edited my buffer to this
{"device": "3445667"}
my code was able to parse the JSON object correctly.
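A rough sketch of that fix (my own illustration; it assumes the extra surrounding quotes come from the sender and simply strips them before handing the buffer to parson):

#include <stdio.h>
#include <string.h>
#include "parson.h"

int main(void)
{
    /* Hypothetical payload that arrived wrapped in an extra pair of quotes. */
    char payload[] = "\"{\"device\": \"3445667\"}\"";
    size_t len = strlen(payload);

    /* Strip the surrounding '"' so parson sees a bare object. */
    if (len >= 2 && payload[0] == '"' && payload[len - 1] == '"') {
        payload[len - 1] = '\0';
        memmove(payload, payload + 1, len - 1);
    }

    JSON_Value *root = json_parse_string(payload);
    if (root != NULL && json_value_get_type(root) == JSONObject) {
        const char *device = json_object_get_string(json_value_get_object(root), "device");
        if (device != NULL)
            printf("device: %s\n", device);
    }
    if (root != NULL)
        json_value_free(root);
    return 0;
}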

Extracting jansson JSON data

I am using the C jansson library: http://www.digip.org/jansson/
It's quite easy to use: https://jansson.readthedocs.org/en/2.7/tutorial.html#the-program
But I cannot get a simple int out of my JSON string. I can successfully receive and load a string of JSON (as in I get no errors, nothing is null), but when I use the jansson get functions to get an int, my int is always 0, even though, stepping through with breakpoints, the jansson function that processes the int is not returning 0.
The JSON string looks like this:
{"type":3}
Here is the code:
static void foo(json_t *jsonRoot) {
    // json root is error checked even before this, and is not null
    if (jsonRoot == NULL) {
        return;
    }

    // Trying to get type = 3
    json_t *j_type;
    int type = 0;

    j_type = json_object_get(jsonRoot, "type");
    if (!json_is_integer(j_type)) {
        printf("Not an int!\n");
        return;
    } else {
        // I get into the else.
        // json_integer_value has its own internal check and
        // will return 0 if the value is not an int, but it is not
        // returning 0. It is running the macro json_to_integer(json)->value
        type = json_integer_value(j_type);
    }

    printf("type is %d\n", type);
    // type is 0
}
My issue was with strtoll. I had to redefine it: jansson relies on strtoll to convert integer tokens while parsing, so a broken strtoll on my platform silently produced 0.
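For reference, a minimal self-contained check (my own sketch, assuming jansson 2.x) that parses the same string; with a working strtoll it prints 3:

#include <stdio.h>
#include <jansson.h>

int main(void)
{
    json_error_t error;
    json_t *root = json_loads("{\"type\":3}", 0, &error);
    if (root == NULL) {
        fprintf(stderr, "parse error: %s\n", error.text);
        return 1;
    }

    json_t *j_type = json_object_get(root, "type");
    if (json_is_integer(j_type))
        printf("type is %lld\n", (long long)json_integer_value(j_type));

    json_decref(root);
    return 0;
}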

Reaching from one string to another using a given dictionary

For this question, a dictionary and two strings were given, and the task was to get from one string to the other using only words in the dictionary, changing one letter at a time. I came up with this solution. There were some corner cases that my code could not handle. Can you help me find all the corner cases and make this code prettier?
public static int findNumberOfSteps(String start, String end, HashSet<String> dict){
    if (start == null || end == null || dict.isEmpty()){
        throw new IllegalArgumentException();
    }
    dict.add(end);
    Queue<String> wordHolder = new LinkedList<>();
    Queue<Integer> distanceCount = new LinkedList<Integer>();
    wordHolder.add(start);
    distanceCount.add(1);
    int result = Integer.MAX_VALUE;
    while (!wordHolder.isEmpty()){
        String currentWord = wordHolder.poll();
        int currDistance = distanceCount.poll();
        if (currentWord.equals(end)){
            int result = currDistance;
            return result;
        }
        for (int i = 0; i < currentWord.length(); i++){
            char[] charCurrentWord = currentWord.toCharArray();
            for (char c = 'a'; c <= 'z'; c++){
                charCurrentWord[i] = c;
                String newWord = new String(charCurrentWord);
                if (dict.contains(newWord)){
                    wordHolder.add(newWord);
                    distanceCount.add(currDistance+1);
                    dict.remove(newWord);
                }
            }
        }
    }
    return 0;
}
There are a couple of problems in the code. The first problem is in this code
if (currentWord.equals(end)){
    result = Math.min(result, currDistance);
}
Note that when you reach the end word, that code updates the result, but then the code goes on searching for ways to change the end word into something else. That's a huge waste of time; the code should continue with the while (!wordHolder.isEmpty()) loop after the end is found.
The second problem is in this code
if (dict.contains(newWord)){
    wordHolder.add(newWord);
    distanceCount.add(currDistance+1);
    dict.remove(newWord);
}
Note that if newWord is equal to the end word, then that code removes the end word from the dictionary, which means that you'll never find the end word again.
The solution to both problems is to check for the end word inside that if statement. When the end is found, don't add it to the wordHolder and don't remove it from the dictionary.
if (dict.contains(newWord)){
    if (newWord.equals(end)){
        result = Math.min(result, currDistance+1);
    }
    else {
        wordHolder.add(newWord);
        distanceCount.add(currDistance+1);
        dict.remove(newWord);
    }
}

Any efficient way to parse large text files and store parsing information?

My purpose is to parse text files and store the information in respective tables.
I have to parse around 100 folders holding more than 8,000 files, about 20 GB in total.
When I tried to store the whole file contents in a string, an out-of-memory exception was thrown.
That is:
using (StreamReader objStream = new StreamReader(filename))
{
    string fileDetails = objStream.ReadToEnd();
}
Hence I tried logic like this:
using (StreamReader objStream = new StreamReader(filename))
{
    // Getting total number of lines in a file
    int fileLineCount = File.ReadLines(filename).Count();
    if (fileLineCount < 90000)
    {
        fileDetails = objStream.ReadToEnd();
        fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
        string[] fileInfo = fileDetails.ToString().Split('\n');
        //call respective method for parsing and insertion
    }
    else
    {
        while ((firstLine = objStream.ReadLine()) != null)
        {
            lineCount++;
            fileDetails = (fileDetails != string.Empty) ? string.Concat(fileDetails, "\n", firstLine)
                                                        : string.Concat(firstLine);
            if (lineCount == 90000)
            {
                fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
                string[] fileInfo = fileDetails.ToString().Split('\n');
                lineCount = 0;
                //call respective method for parsing and insertion
            }
        }
        //when content is 90057, to parse 57
        if (lineCount < 90000)
        {
            string[] fileInfo = fileDetails.ToString().Split('\n');
            lineCount = 0;
            //call respective method for parsing and insertion
        }
    }
}
Here 90,000 is the bulk size that is safe to process without causing an out-of-memory exception in my case.
Still, the process takes more than 2 days to complete. I observed that this is because of reading line by line.
Is there any better approach to handle this?
Thanks in advance :)
You can use a profiler to detect what sucks your performance. In this case it's obvious: disk access and string concatenation.
Do not read a file more than once. Let's take a look at your code. First of all, the line int fileLineCount = File.ReadLines(filename).Count(); means you read the whole file and discard what you've read. That's bad. Throw away your if (fileLineCount < 90000) and keep only the else branch.
It almost doesn't matter if you read line-by-line in consecutive order or the whole file because reading is buffered in any case.
Avoid string concatenation, especially for long strings.
fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
string[] fileInfo = fileDetails.ToString().Split('\n');
That's really bad. You are reading the file line by line, so why do this replacement/split at all? File.ReadLines() already gives you an enumeration of the lines; just pass it to your parsing routine.
If you do this properly, I expect a significant speedup. It can be optimized further by reading files in a separate thread while processing them in the main thread. But that is another story.

How to remove request headers from HttpInputStream

I need help with a servlet.
I need to read an input stream in one request and write a TIFF file.
The input stream comes with request headers, and I don't know how to remove those bytes and write only the file.
See the initial bytes of the written file:
-qF3PFkB8oQ-OnPe9HVzkqFtLeOnz7S5Be
Content-Disposition: form-data; name=""; filename=""
Content-Type: application/octet-stream; charset=ISO-8859-1
Content-Transfer-Encoding: binary
I want to remove that and write only the bytes of the TIFF file.
PS: the sender of the file is not me.
I'm not sure why you're not using HttpServletRequest's getInputStream() method to get the content without its headers. Either way, you have the option of reading the input stream and discarding the content until you find two consecutive CRLFs, which mark the end of the headers.
One way of doing that is like this:
String headers = new java.util.Scanner(inputStream).useDelimiter("\\r\\n\\r\\n").next();
// Read rest of input stream
Apache Commons solves 90% of your problems... you only need to know which keywords to use in a search :)
"parse multipart request"
and Google says:
http://www.oreillynet.com/onjava/blog/2006/06/parsing_formdata_multiparts.html
int boundaryIndex = contentType.indexOf("boundary=");
byte[] boundary = (contentType.substring(boundaryIndex + 9)).getBytes();
ByteArrayInputStream input = new ByteArrayInputStream(buffer.getBytes());

MultipartStream multipartStream = new MultipartStream(input, boundary);
boolean nextPart = multipartStream.skipPreamble();
while (nextPart) {
    String headers = multipartStream.readHeaders();
    System.out.println("Headers: " + headers);

    ByteArrayOutputStream data = new ByteArrayOutputStream();
    multipartStream.readBodyData(data);
    System.out.println(new String(data.toByteArray()));

    nextPart = multipartStream.readBoundary();
}
For me, I use an annotation and parameter like this:
@Consumes(MediaType.APPLICATION_OCTET_STREAM)
public Response testUpload(File uploadedInputStream)
And then I can read the file content with:
byte[] totalBytes = Files.readAllBytes(Paths.get(uploadedInputStream.toURI()));
Then I have to ignore the first 4 lines, and also the end-of-content part, like this:
int headerLen = 0;
int index = 0;
while (index < totalBytes.length && totalBytes[index] != '\n') {
    headerLen++;
    index++;
}
// ignore the next three lines
for (int i = 0; i < 3; i++) {
    index++;
    while (index < totalBytes.length && totalBytes[index] != '\n') {
        index++;
    }
}
index++;

out.write(totalBytes, index, totalBytes.length - index - (headerLen + 3));
out.flush();
out.close();
