java nio socketChannel.write() missing bytes

System.out.println(" # bBuffer = " + bBuffer.capacity());
headerBuffer.rewind();
socketChannel.write(headerBuffer);
int writen = socketChannel.write(bBuffer);
System.out.println(" # writen = " + writen);
bBuffer is a ByteBuffer that came from FileChannel.map() (it's an image file). When I received this image file on the client, it wasn't a complete image--about half of the image was missing. So I checked how many bytes were written by printing some statistics to the console. The output was:
# bBuffer = 319932
# writen = 131071
What happened to the rest of the bytes? It seems that (319932 - 131071) = 188861 bytes are missing.
Sometimes the number written equals bBuffer.capacity(), and this seems unrelated to file size or buffer capacity.

Your code makes an invalid assumption. read() and write() aren't obliged to transfer more than one byte per call. You have to loop. But you're doing this wrong anyway. It should be:
    while (headerBuffer.position() > 0)
    {
        headerBuffer.flip();
        socketChannel.write(headerBuffer);
        headerBuffer.compact();
    }
or
    headerBuffer.flip();
    while (headerBuffer.hasRemaining())
    {
        socketChannel.write(headerBuffer);
    }
    headerBuffer.compact();
If the SocketChannel is in non-blocking mode, it gets considerably more complex, but you haven't mentioned that in your question.
You also need to check your receiving code for the same assumption.

Related

Decoding TIFF LZW codes not yet in the dictionary

I made a decoder for LZW-compressed TIFF images, and all the parts work; it can decode large images at various bit depths, with or without horizontal prediction, except in one case. While it decodes files written by most programs (like Photoshop and Krita with various encoding options) fine, there's something very strange about the files created by ImageMagick's convert: it produces LZW codes that aren't yet in the dictionary, and I don't know how to handle them.
Most of the time the 9- to 12-bit code in the LZW stream that isn't yet in the dictionary is the next one that my decoding algorithm will try to put in the dictionary (which I'm not sure should be a problem, although my algorithm fails on an image that contains such cases), but at times it can be hundreds of codes into the future. In one case the first code after the clear code (256) is 364, which seems quite impossible given that the clear code clears my dictionary of all codes 258 and above; in another case the code is 501 when my dictionary only goes up to 317!
I have no idea how to deal with this, and it seems that I'm the only one with the problem; the decoders in other programs load such images fine. So how do they do it?
Here's the core of my decoding algorithm. Obviously, given how much code is involved, I can't provide complete compilable code in a compact manner, but since this is a matter of algorithmic logic this should be enough. It follows the algorithm described in the official TIFF specification (page 61) closely; in fact most of the spec's pseudocode is in the comments.
    void tiff_lzw_decode(uint8_t *coded, buffer_t *dec)
    {
        buffer_t word={0}, outstring={0};
        size_t coded_pos;               // position in bits
        int i, new_index, code, maxcode, bpc;
        buffer_t *dict={0};
        size_t dict_as=0;

        bpc = 9;                        // starts with 9 bits per code, increases later
        tiff_lzw_calc_maxcode(bpc, &maxcode);
        new_index = 258;                // index at which new dict entries begin
        coded_pos = 0;                  // bit position
        lzw_dict_init(&dict, &dict_as);

        while ((code = get_bits_in_stream(coded, coded_pos, bpc)) != 257)   // while ((Code = GetNextCode()) != EoiCode)
        {
            coded_pos += bpc;

            if (code >= new_index)
                printf("Out of range code %d (new_index %d)\n", code, new_index);

            if (code == 256)            // if (Code == ClearCode)
            {
                lzw_dict_init(&dict, &dict_as);     // InitializeTable();
                bpc = 9;
                tiff_lzw_calc_maxcode(bpc, &maxcode);
                new_index = 258;

                code = get_bits_in_stream(coded, coded_pos, bpc);   // Code = GetNextCode();
                coded_pos += bpc;

                if (code == 257)        // if (Code == EoiCode)
                    break;

                append_buf(dec, &dict[code]);       // WriteString(StringFromCode(Code));
                clear_buf(&word);
                append_buf(&word, &dict[code]);     // OldCode = Code;
            }
            else if (code < 4096)
            {
                if (dict[code].len)     // if (IsInTable(Code))
                {
                    append_buf(dec, &dict[code]);   // WriteString(StringFromCode(Code));
                    lzw_add_to_dict(&dict, &dict_as, new_index, 0, word.buf, word.len, &bpc);
                    lzw_add_to_dict(&dict, &dict_as, new_index, 1, dict[code].buf, 1, &bpc);    // AddStringToTable
                    new_index++;
                    tiff_lzw_calc_bpc(new_index, &bpc, &maxcode);
                    clear_buf(&word);
                    append_buf(&word, &dict[code]); // OldCode = Code;
                }
                else
                {
                    clear_buf(&outstring);
                    append_buf(&outstring, &word);
                    bufwrite(&outstring, word.buf, 1);  // OutString = StringFromCode(OldCode) + FirstChar(StringFromCode(OldCode));
                    append_buf(dec, &outstring);        // WriteString(OutString);
                    lzw_add_to_dict(&dict, &dict_as, new_index, 0, outstring.buf, outstring.len, &bpc); // AddStringToTable
                    new_index++;
                    tiff_lzw_calc_bpc(new_index, &bpc, &maxcode);
                    clear_buf(&word);
                    append_buf(&word, &dict[code]); // OldCode = Code;
                }
            }
        }

        free_buf(&word);
        free_buf(&outstring);
        for (i=0; i < dict_as; i++)
            free_buf(&dict[i]);
        free(dict);
    }
As for the results my code produces in such situations, it's quite clear from the output that only those few codes are badly decoded; everything before and after is decoded properly. In most cases, though, the rest of the image after one of these mystery future codes is ruined, simply because the remaining decoded bytes are shifted by a few places. That tells me my reading of the 9- to 12-bit code stream is correct, so I really do see a 364 code right after a 256 dictionary-clearing code.
Edit: Here's an example file that contains such weird codes. I've also found a small TIFF LZW loading library that suffers from the same problem; it crashes where my loader finds the first weird code in this image (code 3073 when the dictionary only goes up to 2051). The good thing is that since it's a small library you can test it with the following code:
#include "loadtiff.h"
#include "loadtiff.c"
void loadtiff_test(char *path)
{
int width, height, format;
floadtiff(fopen(path, "rb"), &width, &height, &format);
}
And if anyone insists on diving into my code (which should be unnecessary, and it's a big library) here's where to start.
The bogus codes come from trying to decode more than we're supposed to. The problem is that an LZW strip may sometimes not end with an End-of-Information (257) code, so the decoding loop has to stop once a given number of decoded bytes has been output. That number of bytes per strip is ROWSPERSTRIP * IMAGEWIDTH * BITSPERSAMPLE / 8, multiplied by SAMPLESPERPIXEL if PLANARCONFIG is 1 (which means interleaved channels, as opposed to planar). So on top of stopping the decoding loop when a 257 code is encountered, the loop must also stop once that count of decoded bytes has been reached.
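As an illustration, here is a minimal sketch of that stopping condition in the shape of the decoder above. The tag-value variables (image_width, bits_per_sample, samples_per_pixel, rows_per_strip, planar_config) are illustrative names, not from the original code; dec->len is assumed to track the number of bytes output so far, as in the question's buffer_t:

    /* Sketch only: the tag variables are illustrative names. */
    size_t row_bytes = (size_t)image_width * bits_per_sample / 8;
    if (planar_config == 1)                 /* interleaved channels */
        row_bytes *= samples_per_pixel;
    size_t strip_limit = (size_t)rows_per_strip * row_bytes;

    /* Stop on EOI (257) *or* once a strip's worth of bytes is out. */
    while (dec->len < strip_limit &&
           (code = get_bits_in_stream(coded, coded_pos, bpc)) != 257)
    {
        /* ... decode as before ... */
    }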

Reading serial port faster

I have computer software that sends RGB color codes to an Arduino over USB. It works fine when they are sent slowly, but when tens of them are sent every second, it freaks out. What I think happens is that the Arduino's serial buffer fills up so quickly that the processor can't handle it the way I'm reading it.
    #define INPUT_SIZE 11

    void loop() {
        if (Serial.available()) {
            char input[INPUT_SIZE + 1];
            byte size = Serial.readBytes(input, INPUT_SIZE);
            input[size] = 0;

            int channelNumber = 0;
            char* channel = strtok(input, " ");
            while (channel != 0) {
                color[channelNumber] = atoi(channel);
                channel = strtok(0, " ");
                channelNumber++;
            }
            setColor(color);
        }
    }
For example, the computer might send 255 0 123, where the numbers are separated by spaces. This works fine when the sending interval is slow enough or the buffer always holds exactly one color code, for example 255 255 255, which is 11 bytes (INPUT_SIZE). However, if a color code is shorter than 11 bytes and a second code is sent immediately, the code still reads 11 bytes from the serial buffer and starts combining the two codes, messing up the colors. How do I avoid this while keeping it as efficient as possible?
It is not a matter of reading the serial port faster, it is a matter of not reading a fixed block of 11 characters when the input data has variable length.
You are telling it to read until 11 characters are received or the timeout occurs, but if the first group is fewer than 11 characters, and a second group follows immediately there will be no timeout, and you will partially read the second group. You seem to understand that, so I am not sure how you conclude that "reading faster" will help.
Using your existing data encoding of ASCII decimal, space-delimited triplets, one solution would be to read the input one character at a time until an entire triplet is read; however, you can more simply use the Arduino Stream::readBytesUntil() function:
    #define INPUT_SIZE 3

    void loop()
    {
        if (Serial.available())
        {
            char rgb_str[3][INPUT_SIZE+1] = {{0},{0},{0}};

            // readBytesUntil() takes a char terminator, not a string
            Serial.readBytesUntil( ' ', rgb_str[0], INPUT_SIZE );
            Serial.readBytesUntil( ' ', rgb_str[1], INPUT_SIZE );
            Serial.readBytesUntil( ' ', rgb_str[2], INPUT_SIZE );

            for( int channelNumber = 0; channelNumber < 3; channelNumber++)
            {
                color[channelNumber] = atoi( rgb_str[channelNumber] );
            }
            setColor(color);
        }
    }
Note that this solution does not require the somewhat heavyweight strtok() processing since the Stream class has done the delimiting work for you.
However, there is a simpler and even more efficient solution. In your scheme you are sending ASCII decimal strings and then requiring the Arduino to spend CPU cycles needlessly extracting the fields and converting them to integer values, when you could simply send the three byte values directly - leaving the vastly more powerful PC to do any processing necessary to pack the data that way. Then the code might be simply:
    void loop()
    {
        if( Serial.available() )
        {
            for( int channelNumber = 0; channelNumber < 3; channelNumber++)
            {
                color[channelNumber] = Serial.read();
            }
            setColor(color);
        }
    }
Note that I have not tested any of the above code, and the Arduino documentation is lacking in some cases, for example with respect to descriptions of return values. You may need to tweak the code somewhat.
Neither of the above solves the synchronisation problem - i.e. when the colour values are streaming, how do you know which byte is the start of an RGB triplet? You have to rely on getting the first field value and maintaining count and sync thereafter - which is fine until, say, the Arduino is started after the data stream starts, or is reset, or the PC process is terminated and restarted asynchronously. That was a problem with your original implementation too, so perhaps it is a problem to be dealt with elsewhere; one possible framing scheme is sketched below nonetheless.
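For what it's worth, here is a hedged sketch of one such framing scheme, not from the original answer: the PC clamps channel values to 0..254 and prefixes every triplet with a 0xFF start marker, so the Arduino can resynchronise at any point in the stream (color and setColor() are assumed from the question):

    void loop()
    {
        static int channelNumber = -1;   // -1 = waiting for the start marker
        while (Serial.available())
        {
            int b = Serial.read();
            if (b == 0xFF)               // start of a new triplet
            {
                channelNumber = 0;
            }
            else if (channelNumber >= 0)
            {
                color[channelNumber++] = b;
                if (channelNumber == 3)
                {
                    setColor(color);
                    channelNumber = -1;  // wait for the next marker
                }
            }
        }
    }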
First of all, I agree with Thomas Padron-McCarthy: sending a character string instead of a byte array (11 bytes instead of 3, plus the parsing) would simply be a waste of resources. On the other hand, the approach you should follow depends on your sender:
Is it periodic or not?
Is it fixed size or not?
If it's periodic, you can check for a new message once per message period. If not, you need to check for messages before the buffer fills up.
Even if you decide printable encoding is somehow not suitable for you, in any case I would add a checksum to the message. Let's say you have a fixed-size message structure:
    struct MyMessage
    {
        // unsigned char id;        // id of a message, maybe?
        unsigned char colors[3];    // or unsigned char r,g,b; maybe
        unsigned char checksum;     // more than one byte would make a stronger checksum
    };

    unsigned char calcCheckSum(struct MyMessage msg)
    {
        //...
    }

    unsigned int validateCheckSum(struct MyMessage msg)
    {
        //...
        if (valid)
            return 1;
        else
            return 0;
    }
Now you should check every 4 bytes (the size of struct MyMessage) in a sliding-window fashion to see whether the message is valid:
    void findMessages()
    {
        struct MyMessage *msg;
        byte size = Serial.readBytes(input, INPUT_SIZE);   // input[] as in the question
        byte msgSize = sizeof(struct MyMessage);

        for (int i = 0; i + msgSize <= size; i++)
        {
            msg = (struct MyMessage*) &input[i];   // address of the i-th byte
            if (validateCheckSum(*msg))
            {
                // found a message
                processMessage(msg);
            }
            else
            {
                // discard this byte; it's part of a corrupted message
                // (you may be too late to process that one anyway)
            }
        }
    }
If it's not a fixed size, it gets complicated, but I'm guessing you don't need to hear about that for this case.
EDIT (2)
I've struck out this edit following the comments.
One last thing: I would use a circular buffer. First add the received bytes into the buffer, then check the bytes in that buffer.
EDIT (3)
I've given the comments some thought, and I see the point of printable encoded messages. I guess my problem is that I work at a military company; we don't have printable encoded "fire" arguments here :) A lot of messages come and go all the time, and decoding/encoding printable messages would be a waste of time. Also, we use hardware that usually has very small messages with bitfields. I accept that a printable message can be easier to examine and understand.
Hope it helps,
Gokhan.
If faster is really what you want... this is a little far-fetched.
The fastest way I can think of to meet your needs and provide synchronisation is to send one byte per colour and change the parity bit in a defined way, assuming you can read the parity as well as the value of a byte that arrives with the "wrong" parity.
You will have to deal with the changing parity, and most of the characters will not be human-readable, but it's got to be one of the fastest ways to send three bytes of data.

Ada - reading large files

I'm working on building an HTTP server, mostly for learning/curiosity purposes, and I came across a problem I've never had in Ada before. If I try to read files that are too big using Direct_IO, I get a Storage_Error (stack overflow) exception. This almost never happens, but when I request a video file, the exception is thrown.
So I got the idea to read and send files in chunks of 1M characters at a time, but this leaves me with End_Error exceptions, since most files aren't going to be exactly 1M characters in length. I'm also not entirely sure I did it right anyway, since reading the whole file has always been sufficient before. Here is the procedure I've written:
    procedure Send_File(Channel : GNAT.Sockets.Stream_Access; Filepath : String) is
       File_Size : Natural := Natural(Ada.Directories.Size (Filepath));
       subtype Meg_String is String(1 .. 1048576);
       package Meg_String_IO is new Ada.Direct_IO(Meg_String);
       Meg : Meg_String;
       File : Meg_String_IO.File_Type;
       Count : Natural := 0;
    begin
       loop
          Meg_String_IO.Open(File, Mode => Meg_String_IO.In_File, Name => Filepath);
          Meg_String_IO.Read(File, Item => Meg);
          Meg_String_IO.Close(File);
          String'Write(Channel, Meg);
          exit when Count >= File_Size;
          Count := Count + 1048576;
       end loop;
    end Send_File;
I had the thought to declare two separate Direct_IO packages/string sizes, where one would be 1048576 in length while the other would be the file length mod 1048576 in length but I'm not sure how I would use the two readers sequentially.
Thanks to anyone who can help.
I'd use Stream_IO (ARM A.12.1), which allows you to read into a buffer and tells you how much data was actually read; see the second form of Read,
    procedure Read (File : in  File_Type;
                    Item : out Stream_Element_Array;
                    Last : out Stream_Element_Offset);
with semantics described in ARM 13.13.1 (8),
The Read operation transfers stream elements from the specified stream to fill the array Item. Elements are transferred until Item'Length elements have been transferred, or until the end of the stream is reached. If any elements are transferred, the index of the last stream element transferred is returned in Last. Otherwise, Item'First - 1 is returned in Last. Last is less than Item'Last only if the end of the stream is reached.
    procedure Send_File (Channel  : GNAT.Sockets.Stream_Access;
                         Filepath : String) is
       File   : Ada.Streams.Stream_IO.File_Type;
       Buffer : Ada.Streams.Stream_Element_Array (1 .. 1024);
       Last   : Ada.Streams.Stream_Element_Offset;
       use type Ada.Streams.Stream_Element_Offset;
    begin
       Ada.Streams.Stream_IO.Open (File,
                                   Mode => Ada.Streams.Stream_IO.In_File,
                                   Name => Filepath);
       loop
          --  Read the next Buffer-full from File. Last receives the index
          --  of the last byte that was read; if we've reached end-of-file
          --  in this read, Last will be less than Buffer'Last.
          Ada.Streams.Stream_IO.Read (File, Item => Buffer, Last => Last);
          --  Write the data that was actually read. If File's size is a
          --  multiple of Buffer'Length, the last Read will read no bytes
          --  and will return a Last of 0 (Buffer'First - 1), so this will
          --  write Buffer (1 .. 0), i.e. no bytes.
          Ada.Streams.Write (Channel.all, Buffer (1 .. Last));
          --  The only reason for reading less than a Buffer-full is that
          --  end-of-file was reached.
          exit when Last < Buffer'Last;
       end loop;
       Ada.Streams.Stream_IO.Close (File);
    end Send_File;
(note also that it’s best to open and close the file outside the loop!)

An Audio Buffer after mpg123_read, what is it? How can I manipulate it?

This is the example code:
    while (mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
    {
        // -> I'm considering this line
        if ((ao_play(dev, (char*)buffer, done) == 0)) {
        }
    }
In this code I want to edit the audio before it's played. Someone suggested using an FFT for this; personally, I tried the following:
    while (mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
    {
        buffer = ((int)buffer) * 2;   // multiplies the pointer, not the samples
        if ((ao_play(dev, (char*)buffer, done) == 0))
            ;   // (no body in the original)
    }
as an experiment, but this doesn't do anything useful. So, what is a buffer? How can I change it in real time? And can I stop it and then resume it (also called "pause" in a music player)?
Sorry for the noob questions, but I've only been programming for 6 months.
A buffer is a memory block used to contain an arbitrary, bounded amount of data. In C it's used as an array. If the buffer is dynamically allocated, then the variable buffer is a pointer holding the address where the actual buffer (memory block) begins. You have to look at the declaration of the variable buffer to know the type of the elements inside that array.
You also have to look at the mpg123 documentation to know how to interpret the data returned by the mpg123_read() function.
Making an educated guess based on the nature of the data you have decoded, I would say that buffer is probably an array of interleaved short ints comprising data for the stereo channels L and R of an uncompressed 16-bit audio signal, with channel L at even-indexed elements and channel R at odd-indexed elements.
So, a possible edit would be like this:
    /* Assuming 16-bit interleaved stereo; done is a byte count, so the
       buffer holds done / sizeof(short) short elements. */
    short *samples = (short *)buffer;
    for (size_t i = 0; i + 1 < done / sizeof(short); i += 2)
    {
        // centre-cancel: keep only the difference between R and L
        samples[i] = samples[i+1] = (samples[i+1] - samples[i]) / 2;
    }
This subtracts the left-channel data from the right-channel data, cancelling any audio that is identical on both channels. It's the basic technique for cancelling vocals in a song.
Your proposed edit has no meaning: you are changing the value of the pointer buffer by multiplying it by two. That makes the pointer hold a very different memory address, quite possibly an illegal one, so when that pointer is used in ao_play() you will get a segmentation fault.
I guess what you actually want to do in your example is make your audio twice as loud, don't you? In that case, you are looking for something like this:
for (int i=0;i<done;i+=2)
{
if (buffer[i]>16383)
buffer[i] = 32767;
else if (buffer[i]<-16384)
buffer[i] = -32768;
else
buffer[i] = 2*buffer[i];
}
To stop and resume, you have to find a way for your program to check the value of something you can change with an input device (a button pressed in a window, a key pressed, etc.).
For example, let's say you have a function called kbhit() that returns non-zero if a key is being pressed (this function is present in DOS compilers and is sometimes available as a non-standard library to ease porting of older DOS programs; look for conio.h if you have it). Then you can do something like this:
    int paused = 0; /* flip-flop variable to pause/resume playing */

    while (mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
    {
        if (!paused)
        {
            if ((ao_play(dev, (char*)buffer, done) == 0))
                break;
        }
        if (kbhit() && getchar() == ' ')
            paused = !paused;
    }
This will play/pause your music using the SPACE bar.
It will not pause the music, though, only mute it: reading (mpg123_read) goes on, so you're just skipping a part. A sketch of a true pause follows.
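A hedged sketch of a true pause, not from the original answer: skip the read as well as the play while paused, so no audio is consumed (mh, dev, buffer, buffer_size, done and kbhit() as above):

    int paused = 0;
    for (;;)
    {
        if (kbhit() && getchar() == ' ')
            paused = !paused;
        if (paused)
            continue;   /* nothing is read, so the position holds (a real
                           player would sleep here instead of spinning) */
        if (mpg123_read(mh, buffer, buffer_size, &done) != MPG123_OK)
            break;
        if (ao_play(dev, (char*)buffer, done) == 0)
            break;
    }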

What could cause a Labwindows/CVI C program to hate the number 2573?

Using Windows
So I'm reading from a binary file a list of unsigned int data values. The file contains a number of datasets listed sequentially. Here's the function to read a single dataset from a char* pointing to the start of it:
    void read_dataset(char* stream, t_dataset *dataset){
        //...some init, including setting dataset->size;
        for(i=0;i<dataset->size;i++){
            dataset->samples[i] = *((unsigned int *) stream);
            stream += sizeof(unsigned int);
        }
        //...
    }
and read_dataset is called in a context such as this:
//...
    char buff[10000];
    t_dataset* dataset = malloc( sizeof( *dataset) );
    unsigned long offset = 0;

    for(i=0;i<number_of_datasets; i++){
        fseek(fd_in, offset, SEEK_SET);
        if( (n = fread(buff, sizeof(char), sizeof(*dataset), fd_in)) != sizeof(*dataset) ){
            break;
        }
        read_dataset(buff, *dataset);
        // Do something with dataset here. It's screwed up before this, I checked.
        offset += profileSize;
    }
//...
Everything goes swimmingly until my loop reads the number 2573. All of a sudden it starts spitting out random and huge numbers.
For example, what should be
...
1831
2229
2406
2637
2609
2573
2523
2247
...
becomes
...
1831
2229
2406
2637
2609
0xDB00000A
0xC7000009
0xB2000008
...
If you think those hex numbers look suspicious, you're right. It turns out the hex values of the numbers that were changed look really familiar:
2573 -> 0xA0D
2523 -> 0x9DB
2247 -> 0x8C7
So apparently the number 2573 causes my stream pointer to gain a byte. This persists until the next dataset is loaded and parsed, and god forbid it contain a 2573 as well. I have checked a number of spots where this happens, and each one I've checked began at a 2573.
I admit I'm not so talented in the world of C. What could cause this is completely and entirely opaque to me.
You don't specify how you obtained the bytes in memory (pointed to by stream), nor what platform you're running on, but I wouldn't be surprised to find you're on Windows and that you used the C stdio library call fopen(filename, "r"). Try using fopen(filename, "rb") instead. On Windows (and MS-DOS), fopen() translates MS-DOS line endings "\r\n" (hex 0x0D 0x0A) in the file to Unix-style "\n", unless you append "b" to the file mode to indicate binary. Note that 2573 is 0x0A0D, which is stored little-endian as the byte pair 0x0D 0x0A - exactly the sequence that text mode collapses to a single 0x0A, which is why your stream loses a byte at each 2573.
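A minimal sketch of the fix, assuming fd_in comes from a plain fopen() call ("input.bin" is a placeholder name):

    /* "rb": binary mode, so 0x0D 0x0A pairs inside the data (e.g. the
       little-endian bytes of 2573 = 0x0A0D) reach fread() untranslated. */
    FILE *fd_in = fopen("input.bin", "rb");
    if (fd_in == NULL)
        perror("fopen");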
A couple of tangential points:
sizeof(*dataset) doesn't do what you think it does: it's the in-memory size of the struct, not the size of a dataset on disk.
There is no need to seek before every read; successive freads already advance through the file.
I don't understand how the call read_dataset(buff, *dataset) compiles when the function expects a t_dataset pointer rather than a struct (or at least I don't understand why your compiler doesn't object).
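Putting those points together, a hedged sketch of how the reading loop might look, assuming profileSize is the on-disk size of one dataset (profileSize, number_of_datasets and fd_in as in the original):

    char buff[10000];
    t_dataset *dataset = malloc(sizeof *dataset);

    for (i = 0; i < number_of_datasets; i++) {
        /* successive freads advance through the file, so no fseek, and
           the read size is the on-disk record size, not sizeof(*dataset) */
        if (fread(buff, 1, profileSize, fd_in) != profileSize)
            break;
        read_dataset(buff, dataset);   /* pass the pointer, not *dataset */
        /* ... do something with dataset here ... */
    }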
