I am trying to read a block which contains the block bitmap and the inode bitmap.
I read the block as an unsigned char array,
then I convert it to binary as follows:
for (i = 0; i < 4096; i++) {
    /* bits[7] receives the least significant bit, bits[0] the most significant */
    for (j = 7; j >= 0; j--) {
        bits[j] = bitmap[i] % 2;
        bitmap[i] = bitmap[i] / 2;
    }
    for (t = 0; t < 8; t++)
        printf("%d\t", bits[t]);
    printf("\n");
}
When I put '0' in a char and print it as
printf("%d",'0');
I get 48,
and my bits array contains 00110000.
That works; however, when I check the inode bitmap
it does not work.
For example, a bitmap is:
1 1 1 0 0 0 0
but I get
0 0 0 0 1 1 1
I could not check whether the same thing happens with the block bitmap.
To repeat, the code works for a normal conversion: for example, it prints 00110000, which is 48, for the char '0', which also prints as 48. The swapping occurs only with the inode bitmap.
When I change the for loop it works for the inode bitmap, but how can I know it will work for the block bitmap? This would fix the output, but the logic seems wrong.
Any ideas?
The lines
for(t=0; t<8;t++)
printf("%d\t",bits[t]);
print bits[0] first and bits[7] last. If you want it the other way around, change the loop to:
for(t=7; t>=0;t--)
or similar.
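A minimal sketch of the same idea using shifts instead of repeated division, so the bitmap itself is left unmodified and the bit order is explicit. The function name byte_to_bits and its msb_first flag are hypothetical, not part of the original code:

```c
/* Fill bits[0..7] with the bits of `byte`.
   msb_first != 0: bits[0] is the most significant bit (0x30 -> 0 0 1 1 0 0 0 0).
   msb_first == 0: bits[0] is the least significant bit (0x07 -> 1 1 1 0 0 0 0 0). */
void byte_to_bits(unsigned char byte, int bits[8], int msb_first)
{
    for (int k = 0; k < 8; k++) {
        int shift = msb_first ? 7 - k : k;
        bits[k] = (byte >> shift) & 1;
    }
}
```

Calling it once per byte with the flag set either way lets you print both orders side by side and see which one matches the bitmap layout you expect.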
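It looks like your bit order is swapped: the bits are printed in the opposite order from the one you expect.
This is similar in spirit to big-endian vs. little-endian byte order:
http://en.wikipedia.org/wiki/Endianness
Note that the htonl, htons, ntohl, ntohs family of functions (try man htons) swaps the byte order of multi-byte integers, not the bits within a single byte, so it will not help with a single char.
Instead, run your loop in reverse.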
I wrote this function that performs a slightly modified variation of run-length encoding on text files in C.
I'm trying to generalize it to binary files but I have no experience working with them. I understand that, while I can compare bytes of binary data much the same way I can compare chars from a text file, I am not sure how to go about printing the number of occurrences of a byte to the compressed version like I do in the code below.
A note on the type of RLE I'm using: bytes that occur more than once in a row are duplicated to signal the next-to-come number is in fact the number of occurrences vs just a number following the character in the file. For occurrences longer than one digit, they are broken down into runs that are 9 occurrences long.
For example, aaaaaaaaaaabccccc becomes aa9aa2bcc5.
Here's my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* encode(char* str)
{
    char* ret = calloc(2 * strlen(str) + 1, 1);
    size_t retIdx = 0, inIdx = 0;
    while (str[inIdx]) {
        size_t count = 1;
        size_t contIdx = inIdx;
        while (str[inIdx] == str[++contIdx]) {
            count++;
        }
        size_t tmpCount = count;
        // break down counts with 2 or more digits into counts ≤ 9
        while (tmpCount > 9) {
            tmpCount -= 9;
            ret[retIdx++] = str[inIdx];
            ret[retIdx++] = str[inIdx];
            ret[retIdx++] = '9';
        }
        char tmp[2];
        ret[retIdx++] = str[inIdx];
        if (tmpCount > 1) {
            // repeat character (this tells the decompressor that the next digit
            // is in fact the # of consecutive occurrences of this char)
            ret[retIdx++] = str[inIdx];
            // convert single-digit count to string
            snprintf(tmp, 2, "%ld", tmpCount);
            ret[retIdx++] = tmp[0];
        }
        inIdx += count;
    }
    return ret;
}
What changes are needed to adapt this to a binary stream? The first problem I see is the snprintf call, since it operates on a text format. The way I'm handling multiple-digit runs also looks suspect: we're not working in base 10 anymore, so that has to change. I'm just unsure how, having almost never worked with binary data.
A few ideas that can be useful to you:
One simple method to generalize RLE to binary data is to use a bit-based compression. For example, the bit sequence 00000000011111100111 can be translated to the first bit value 0 followed by the run lengths 9, 6, 2, 3. Since the binary alphabet is composed of only two symbols, you only need to store the first bit value (this can be as simple as storing it in the very first bit) and then the lengths of the runs of contiguous equal values. Arbitrarily large integers can be stored in a binary format using Elias gamma coding. Extra padding can be added to fit the entire sequence into a whole number of bytes. Using this method, the above sequence can be encoded like this:
00000000011111100111 -> 0 0001001 00110 010 011
                        ^ ^       ^     ^   ^
                first bit 9       6     2   3
If you want to keep it byte-based, one idea is to treat the bytes at even positions as frequencies (interpreted as unsigned char) and the bytes at odd positions as the values. If a byte occurs more than 255 times in a row, then you can just repeat the pair. This can be very inefficient, but it is definitely simple to implement, and it might be good enough if you can make some assumptions about the input.
Also, you can consider moving out from RLE and implement Huffman's coding or other sophisticated algorithms (e.g. LZW).
Implementation-wise, I think tucuxi already gave you some hints.
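A rough sketch of the byte-based (frequency, value) idea, for illustration only. The function name rle_pairs and the requirement that the caller provides an output buffer of at least 2 * length bytes are assumptions of this sketch:

```c
#include <stddef.h>

/* Encode `length` input bytes as (count, value) pairs.
   `out` must have room for at least 2 * length bytes.
   Returns the number of bytes written to `out`. */
size_t rle_pairs(const unsigned char *in, size_t length, unsigned char *out)
{
    size_t o = 0, i = 0;
    while (i < length) {
        size_t run = 1;
        /* extend the run, but never past 255 so the count fits in one byte */
        while (i + run < length && in[i + run] == in[i] && run < 255)
            run++;
        out[o++] = (unsigned char)run;  /* even position: frequency */
        out[o++] = in[i];               /* odd position: value */
        i += run;
    }
    return o;
}
```

For data without long runs this doubles the size, which is the inefficiency mentioned above.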
You only have to address 2 problems:
you cannot use any str-related functions, because C strings do not deal well with '\0'. So for example, strlen will return the index of the 1st 0x0 byte in a string. The length of the input must be passed in as an additional parameter: char *encode(char *start, size_t length)
your output cannot have an implicit length of strlen(ret), because there may be extra 0-bytes sprinkled about in the output. You again need an extra parameter: size_t encode(char *start, size_t length, char *output) (this version would require the output buffer to be reserved externally, with a size of at least length*2, and return the length of the encoded string)
The rest of the code, assuming it was working before, should continue to work correctly now. If you want to go beyond base-10, and for instance use base-256 for greater compression, you would only need to change the constant in the break-things-up loop (from 9 to 255), and replace the snprintf as follows:
// before
snprintf(tmp, 2, "%ld", tmpCount);
ret[retIdx++] = tmp[0];
// after: much easier
ret[retIdx++] = tmpCount;
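Putting the two changes together, a binary-safe variant could look roughly like the sketch below. It keeps the doubled-byte scheme from the question but uses explicit lengths and a one-byte count of up to 255. The name encode_bin is hypothetical, and the caller is assumed to supply an output buffer of at least 2 * length bytes:

```c
#include <stddef.h>

/* Doubled-byte RLE on raw bytes: a byte that appears twice in a row in the
   output is followed by a one-byte run count (2..255).
   Returns the number of bytes written to `out`. */
size_t encode_bin(const unsigned char *in, size_t length, unsigned char *out)
{
    size_t o = 0, i = 0;
    while (i < length) {
        size_t count = 1;
        while (i + count < length && in[i + count] == in[i])
            count++;
        size_t left = count;
        /* break long runs into chunks of 255 */
        while (left > 255) {
            out[o++] = in[i];
            out[o++] = in[i];
            out[o++] = 255;
            left -= 255;
        }
        out[o++] = in[i];
        if (left > 1) {
            out[o++] = in[i];
            out[o++] = (unsigned char)left;
        }
        i += count;
    }
    return o;
}
```

The return value replaces strlen(ret) as the length of the encoded data, since the output may legitimately contain zero bytes.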
Pre-condition:
I can't use malloc.
Errors occur within two-byte units, which means I can search two bytes at a time.
My CPU is a 32-bit ARM11, with no OS running at this stage.
The first two bytes are important: if the first two bytes are 0x00, all the remaining bytes should be 0x00.
If the first two bytes are 0xFF, all the remaining bytes should be 0xFF.
If the first two bytes are neither 0x0000 nor 0xFFFF, I just report an error; there is no need to compare the rest.
I read a 256 KB block of data, which should only have two states:
all 0xFF
all 0x00
However, some data may change to an unpredictable value, and I need to find it. I can search byte by byte, but that seems too slow, so I decided to use a bisection approach, which looks like this:
Divide the data read out into two equal halves, then compare.
If neither half is all 0x00 or all 0xFF, the data is corrupt on both sides and I only need to find the earliest error, so I give up the second half and divide the first half again. If only one side has a problem, I give up the good one and focus on the problematic one.
Repeat the idea above in a loop.
After about 17 iterations, this should find the point.
How do I write this as a loop? Do I need 17 different static reference arrays of different sizes to use with memcmp?
My current code looks like:
#include <string.h>

unsigned char gReferData1[2] = {0xFF, 0xFF};
unsigned char gReferData2[2] = {0x00, 0x00};
static unsigned char readBuff[0x40000];   /* 256 KB block */

int main(void)
{
    int i = 0, result1 = 0, result2 = 0;
    int offset = 2;   /* byte offset of the next two-byte word to test */
    read_somewhere(readBuff, sizeof(readBuff)); //read out data (provided elsewhere)
    //first test first two bytes
    result1 = memcmp(gReferData1, readBuff, 2); //test if 0xFFFF
    result2 = memcmp(gReferData2, readBuff, 2); //test if 0x0000
    if(result1 == 0)
    {
        // means all rest data should be 0xFF
        for(i=1; i<(0x40000/2); i++)
        {
            result1 = memcmp(gReferData1, readBuff + offset, 2); //test if 0xFFFF
            if(result1 != 0)
            {
                //means find
                // do error handle
            }
            offset+=2;
        }
    }
    else if(result2 == 0)
    {
        // means all rest data should be 0x00
        for(i=1; i<(0x40000/2); i++)
        {
            result2 = memcmp(gReferData2, readBuff + offset, 2); //test if 0x0000
            if(result2 != 0)
            {
                //means find
                // do error handle
            }
            offset+=2;
        }
    }
    else
    {
        //just error
        // do error handle
    }
    return 0;
}
In order to find a defect at a random position, you will need to examine each byte at least once. There is no algorithm faster than O(n) for this.
However, your proposed algorithm requires examining each byte more than once. In order to "divide read out data into equal half, then compare", you will have to read every byte. This is what memcmp does internally: it loops through both memory segments from start to finish until there is a discrepancy. It isn't magic. It can't do that any more efficiently than you could with a simple loop.
An optimization which might speed this up (test and measure it!) could be to go through your data array not byte by byte but in steps of sizeof(long), reading each segment as a long before you compare it. This makes use of the fact that on many 32-bit CPUs (not all, test and measure it!) it doesn't take more time to compare two 32-bit values than it takes to compare two 8-bit values.
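As a sketch of that word-at-a-time scan, assuming the buffer is 4-byte aligned and its size is a multiple of 4 (true for a 256 KB block placed in a suitably aligned array). The function name check_block is hypothetical, and unlike the question it uses the first four bytes rather than the first two as the reference pattern:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the first word is all zero bits or all one bits and every
   other word equals it. Otherwise returns 1 and stores the index of the
   first offending 32-bit word in *bad_index. Requires nwords >= 1. */
int check_block(const uint32_t *buf, size_t nwords, size_t *bad_index)
{
    uint32_t expected = buf[0];
    if (expected != 0x00000000u && expected != 0xFFFFFFFFu) {
        *bad_index = 0;          /* the reference word itself is invalid */
        return 1;
    }
    for (size_t i = 1; i < nwords; i++) {
        if (buf[i] != expected) {
            *bad_index = i;      /* first corrupted word */
            return 1;
        }
    }
    return 0;
}
```

Once a bad word is found, the two-byte unit inside it can be located with at most two further comparisons. Whether this is actually faster than a byte loop on the ARM11 needs to be measured.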
You need to check that no byte of that buffer has an illegal state, so you have to check each byte at least once.
On most systems, jumping around is expensive, and reading bytes sequentially is cheaper than anything else. So I'd read as sequentially as possible.
One thing you might try to do is to read the whole buffer sequentially and compare each entry with that of the previous entry, "entry" being a byte or a 16, 32, or 64-bit word, depending on which is faster:
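/* needs <stdint.h> and <stddef.h>; DATATYPE may be uint8_t, uint16_t, uint32_t or uint64_t */
typedef uint32_t DATATYPE;
const DATATYPE *bufptr = (const DATATYPE *)readBuff;
size_t n = sizeof(readBuff) / sizeof(DATATYPE);   /* length of buffer in entries */
size_t i;
DATATYPE previous = bufptr[0];
for (i = 1; i < n; i++) {
    if (previous != bufptr[i]) {
        break;
    }
}
if (i != n) {
    // There has been an error at entry i.
}
// Verify that previous is either 0 or all one bits (0xFF repeated for the width of DATATYPE).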
Another possibility is to run a memcmp() between the first half of the buffer and the second half of the buffer, then (just for the lols) verify that the first byte is indeed either 0x00 or 0xFF. This fails if two bits in the same relative positions in the two halves flip at the same time. How likely is that? It also depends on the hardware (suppose the buffer is two identical chips one over the other, skewered by the same cosmic ray incoming at a perfectly right angle...?).
Depending on the architecture, the compiler and optimizations used, either solution might turn out to be faster; probably not by all that much.
I tried to run this program through the terminal and this error showed up.
"Segmentation Fault: 11"
I would like to know why. This program reads a .ppm file and saves its information in a matrix variable of type Pixel. A PPM file is basically composed of a first line that is "P3" by default, a second line with the size of the matrix, and a third line with the highest value possible for a Pixel attribute. The remaining lines hold groups of 3 integers with a maximum value of 255, so for each member of the matrix there is a pixel with R, G, B.
What I tried to do in the function save_image: first recognize whether we are dealing with a PPM file (by checking for P3 in the first line), then read the number of rows and columns for the matrix, then create a new matrix using malloc, then save the data from the lines of the file into the .r, .g and .b members of the variable myImg.
I am very new to debugging/programming so I'm sorry if this isn't enough information but I tried my best.
#include <stdio.h>
#include <stdlib.h>
typedef struct{
    int r;
    int g;
    int b;
}Pixel;
void save_image(FILE* img, Pixel ** newImg) {
    int i;
    int j;
    int fcount;
    int scount;
    int count;
    int dcc;
    char init[3];
    fscanf(img,"%s",init);
    if(init[0]=='P' && init[1]=='3'){
        printf("worked!\n");
        fscanf(img,"%d %d",&j,&i);
        fscanf(img, "%d",&dcc);
        *newImg = (Pixel*)malloc(sizeof(Pixel) * i);
        for ( count = 0; count < i ; ++count)
        {
            newImg[count] = (Pixel*)malloc(sizeof(Pixel) * j);
        }
        for (fcount = 0; fcount <= i ; ++fcount)
        {
            for (scount = 0; scount <= j; ++scount)
            {
                fscanf(img,"%i %i %i",&newImg[i][j].r,&newImg[i][j].g,&newImg[i][j].b);
            }
        }
    }
    else
        printf("Type of file not recognized\n");
    fclose(img);
}
int main(int argc, char const *argv[])
{
    FILE* image;
    Pixel myImg;
    Pixel** newImg;
    **newImg = myImg;
    image = fopen(argv[1],"r");
    save_image(image,newImg);
    return 0;
}
The program fails for two reasons: the initial malloc for newImg[] allocates a multiple of the size of Pixel rather than of the size of a pointer to Pixel, and the pointer newImg is passed incorrectly as a parameter to the save_image() function (in main() it is dereferenced before it points to anything). See my comment about where the variable newImg should be defined and the desirable modification to the declaration of the save_image() function.
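A minimal sketch of how the row-pointer allocation could look, under the assumption that the image is stored as an array of row pointers as in the posted code. The helper name alloc_image is hypothetical:

```c
#include <stdlib.h>

typedef struct { int r, g, b; } Pixel;

/* Allocate a rows x cols matrix of Pixel as an array of row pointers.
   Returns NULL if any allocation fails. */
Pixel **alloc_image(int rows, int cols)
{
    /* the outer array holds pointers, so its element size is sizeof(Pixel *) */
    Pixel **img = malloc(sizeof *img * (size_t)rows);
    if (img == NULL)
        return NULL;
    for (int r = 0; r < rows; r++) {
        img[r] = malloc(sizeof **img * (size_t)cols);
        if (img[r] == NULL) {
            while (r-- > 0)      /* release the rows allocated so far */
                free(img[r]);
            free(img);
            return NULL;
        }
    }
    return img;
}
```

With such a matrix, the read loops would index img[fcount][scount] with strict < bounds; the posted loops use <= and index with i and j, which reads past the allocated rows and columns.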
Given the way the posted code is written, it seems to be expecting the 'plain' .ppm file format,
and the posted code fails to allow for any embedded comments within the file,
given this description of the format of a .ppm file:
The format definition is as follows. You can use the libnetpbm C subroutine library to read and interpret the format conveniently and accurately.
A PPM file consists of a sequence of one or more PPM images. There are no data, delimiters, or padding before, after, or between images.
Each PPM image consists of the following:
A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P6".
Whitespace (blanks, TABs, CRs, LFs).
A width, formatted as ASCII characters in decimal.
Whitespace.
A height, again in ASCII decimal.
Whitespace.
The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
A single whitespace character (usually a newline).
A raster of Height rows, in order from top to bottom. Each row consists of Width pixels, in order from left to right. Each pixel is a triplet of red, green, and blue samples, in that order. Each sample is represented in pure binary by either 1 or 2 bytes. If the Maxval is less than 256, it is 1 byte. Otherwise, it is 2 bytes. The most significant byte is first.
A row of an image is horizontal. A column is vertical. The pixels in the image are square and contiguous.
In the raster, the sample values are "nonlinear." They are proportional to the intensity of the ITU-R Recommendation BT.709 red, green, and blue in the pixel, adjusted by the BT.709 gamma transfer function. (That transfer function specifies a gamma number of 2.2 and has a linear section for small intensities). A value of Maxval for all three samples represents CIE D65 white and the most intense color in the color universe of which the image is part (the color universe is all the colors in all images to which this image might be compared).
ITU-R Recommendation BT.709 is a renaming of the former CCIR Recommendation 709. When CCIR was absorbed into its parent organization, the ITU, ca. 2000, the standard was renamed. This document once referred to the standard as CIE Rec. 709, but it isn't clear now that CIE ever sponsored such a standard.
Note that another popular color space is the newer sRGB. A common variation on PPM is to substitute this color space for the one specified.
Note that a common variation on the PPM format is to have the sample values be "linear," i.e. as specified above except without the gamma adjustment. pnmgamma takes such a PPM variant as input and produces a true PPM as output.
Strings starting with "#" may be comments, the same as with PBM.
Note that you can use pamdepth to convert between the format with 1 byte per sample and the one with 2 bytes per sample.
All characters referred to herein are encoded in ASCII. "newline" refers to the character known in ASCII as Line Feed or LF. A "white space" character is space, CR, LF, TAB, VT, or FF (I.e. what the ANSI standard C isspace() function calls white space).
Plain PPM
There is actually another version of the PPM format that is fairly rare: "plain" PPM format. The format above, which is generally considered the normal one, is known as the "raw" PPM format. See pbm for some commentary on how plain and raw formats relate to one another and how to use them.
The difference in the plain format is:
There is exactly one image in a file.
The magic number is P3 instead of P6.
Each sample in the raster is represented as an ASCII decimal number (of arbitrary size).
Each sample in the raster has white space before and after it. There must be at least one character of white space between any two samples, but there is no maximum. There is no particular separation of one pixel from another -- just the required separation between the blue sample of one pixel from the red sample of the next pixel.
No line should be longer than 70 characters.
Here is an example of a small image in this format.
P3
# feep.ppm
4 4
15
0 0 0 0 0 0 0 0 0 15 0 15
0 0 0 0 15 7 0 0 0 0 0 0
0 0 0 0 0 0 0 15 7 0 0 0
15 0 15 0 0 0 0 0 0 0 0 0
There is a newline character at the end of each of these lines.
Programs that read this format should be as lenient as possible, accepting anything that looks remotely like a PPM image.
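To allow for the "#" comments mentioned in the format description, the header values could be read through a small helper such as the sketch below. The name read_ppm_int is hypothetical, and the sketch treats a "#" found between tokens as starting a comment that runs to the end of the line:

```c
#include <stdio.h>
#include <ctype.h>

/* Skip whitespace and '#' comments, then read one decimal integer.
   Returns 1 on success, 0 on end of file or malformed input. */
int read_ppm_int(FILE *fp, int *out)
{
    int c;
    for (;;) {
        c = fgetc(fp);
        if (c == EOF) {
            return 0;
        } else if (c == '#') {
            /* discard the rest of the comment line */
            while ((c = fgetc(fp)) != EOF && c != '\n')
                ;
        } else if (!isspace(c)) {
            ungetc(c, fp);       /* first character of the number */
            break;
        }
    }
    return fscanf(fp, "%d", out) == 1;
}
```

After the "P3" magic number has been checked, three calls would yield the width, the height and the maximum color value of the feep.ppm example above.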
Can anyone help me with some code? I need to implement bit stuffing on an array of data. The program is for an AVR microcontroller (Tiny84A) using GNU C.
unsigned char datas[3] = {0b00011111, 0b10000001, 0b00000000};
I want to add a 0 after each run of five 1s, i.e. after every five consecutive 1s a zero is inserted.
Therefore the data
00011111,10000001 should become 00011111 01000000 10000000
I'm unsure where to start, an example would be great!
There are lots of examples on the web for bit stuffing (inserting a 0 after 5 1s). Unfortunately, many of them involve reading and writing strings of characters 0 and 1.
There are also examples of bit streams. It looks like you need to find examples of each and combine them.
You need to expand your question to explain more of the design decisions you have made.
In particular, it looks like you are expecting the input to be in an array instead of a stream. Are you expecting to write the output to the same array? That is a little bit trickier than writing to a different array. A stream of bits would be different again.
The basic idea is to keep counters of where you are in the input and output streams, and load and write bytes appropriately as you reach the end of 8 bits.
unsigned char datas[3] = {0b00011111, 0b10000001, 0b00000000};  /* GNU C binary literals */
int in_byte, in_bit, out_byte, out_bit;
void init()
{
    in_byte = 0; in_bit = 0; out_byte = 0; out_bit = 0;
}
int get_bit()
{
    int ret = 0;
    /* test the current bit, most significant bit first */
    if (datas[in_byte] & 1 << (7 - in_bit))
    {
        ret = 1;
    }
    ++in_bit;
    if (in_bit == 8)
    {
        in_bit = 0;
        ++in_byte;
    }
    return ret;
}
You have to write your own void put_bit(int bit), but it is similar using the "or" operator |.
You also need to make a function to do the loop that counts up to 5 and puts an extra 0. Look at the examples on the web for that.
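For reference, the whole procedure can also be sketched as one function that writes to a separate output array. This is only one possible shape; the name bit_stuff is hypothetical, bits are processed most significant first, and the caller is assumed to provide a zero-initialised output buffer with room for nbits + nbits / 5 bits:

```c
#include <stddef.h>

/* Copy `nbits` bits from `in` to `out`, inserting a 0 bit after every run of
   five consecutive 1 bits. `out` must be zero-initialised by the caller.
   Returns the number of bits written. */
size_t bit_stuff(const unsigned char *in, size_t nbits, unsigned char *out)
{
    size_t out_bits = 0;
    int ones = 0;                       /* length of the current run of 1s */
    for (size_t i = 0; i < nbits; i++) {
        int bit = (in[i / 8] >> (7 - i % 8)) & 1;
        if (bit)
            out[out_bits / 8] |= (unsigned char)(1u << (7 - out_bits % 8));
        out_bits++;
        ones = bit ? ones + 1 : 0;
        if (ones == 5) {
            out_bits++;                 /* stuffed 0: the output bit is already 0 */
            ones = 0;
        }
    }
    return out_bits;
}
```

Applied to the 16 input bits 00011111 10000001, this yields 17 output bits, stored as 00011111 01000000 10000000.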
Following my previous question (Why do I get weird results when reading an array of integers from a TCP socket?), I have come up with the following code, which seems to work, sort of. The code sample works well with a small number of array elements, but once it becomes large, the data is corrupt toward the end.
This is the code to send the array of int over TCP:
#define ARRAY_LEN 262144
long *sourceArrayPointer = getSourceArray();
long sourceArray[ARRAY_LEN];
for (int i = 0; i < ARRAY_LEN; i++)
{
    sourceArray[i] = sourceArrayPointer[i];
}
int result = send(clientSocketFD, sourceArray, sizeof(long) * ARRAY_LEN, 0);
And this is the code to receive the array of int:
#define ARRAY_LEN 262144
long targetArray[ARRAY_LEN];
int result = read(socketFD, targetArray, sizeof(long) * ARRAY_LEN);
The first few numbers are fine, but further down the array the numbers become completely different. At the end, the numbers should look like this:
0
0
0
0
0
0
0
0
0
0
But they actually come out like this:
4310701
0
-12288
32767
-1
-1
10
0
-12288
32767
Is this because I'm using the wrong send/receive size?
The call to read(..., len) doesn't read len bytes from the socket, it reads a maximum of len bytes. Your array is rather big and it will be split over many TCP/IP packets, so your call to read probably returns just a part of the array while the rest is still "in transit". read() returns how many bytes it received, so you should call it again until you received everything you want. You could do something like this:
long targetArray[ARRAY_LEN];
char *buffer = (char*)targetArray;
size_t remaining = sizeof(long) * ARRAY_LEN;
while (remaining) {
    ssize_t recvd = read(socketFD, buffer, remaining);
    if (recvd <= 0) {
        // read error (recvd < 0) or connection closed early (recvd == 0)
        break;
    }
    remaining -= recvd;
    buffer += recvd;
}
Is the following ok?
for (int i = 0; sourceArrayPointer < i; i++)
You are comparing apples and oranges (that is, pointers and integers). This loop does not get executed, since the pointer to the array of longs is greater than 0 (almost always). So, on the sending side, you are sending an uninitialized array, which results in those incorrect numbers arriving at the receiving end.
It should rather be:
for (int i = 0; i < ARRAY_LEN; i++)
Use the functions declared in <arpa/inet.h> (htonl, htons, ntohl, ntohs).
http://en.wikipedia.org/wiki/Endianness#Endianness_in_networking
This is not related to the problem in this question, but you also need to take care of the endianness of the platforms if you want to use TCP between different platforms.
It is much simpler to use a networking library like curl or ACE, if that is an option (additionally, you learn a lot more at a higher level, such as design patterns).
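As an illustration of the byte-order conversion, here is a sketch that converts an array in place before sending and after receiving. It assumes the values are carried as fixed-width uint32_t rather than long, whose size differs between platforms; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Convert n values from host byte order to network (big-endian) byte order. */
void to_network_order(uint32_t *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        values[i] = htonl(values[i]);
}

/* Convert n values from network byte order back to host byte order. */
void from_network_order(uint32_t *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        values[i] = ntohl(values[i]);
}
```

The sender would call to_network_order before send, and the receiver from_network_order once the whole array has arrived.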
There is nothing to guarantee how TCP will split into packets the data you send to a stream; it only guarantees that the data will arrive in the correct order at the application level. So you need to check the value of result, and keep on reading until you have read the right number of bytes. Otherwise you won't have read all of the data. You're making this more difficult for yourself by using a long array rather than a byte array: the data may be sent in any number of chunks, which may not be aligned to long boundaries.
I see a number of problems here. First, this is how I would rewrite your send code as I understand it. I assume getSourceArray always returns a valid pointer to a static or malloced buffer of size ARRAY_LEN. I'm also assuming you don't need sourceArrayPointer later in the code.
#define ARRAY_LEN 262144
long *sourceArrayPointer = getSourceArray();
long sourceArray[ARRAY_LEN];
long *sourceArrayIdx = sourceArray;
for (; sourceArrayIdx < sourceArray+ARRAY_LEN ; )
    *sourceArrayIdx++ = *sourceArrayPointer++;   /* copy the values, not the pointers */
ssize_t result = send(clientSocketFD, sourceArray, sizeof(long) * ARRAY_LEN, 0);
if (result < 0 || (size_t)result < sizeof(long) * ARRAY_LEN)
    printf("send returned %zd\n", result);
Looking at your original code, I'm guessing that your for loop was messed up and never executed, resulting in you sending whatever random junk happened to be in the memory sourceArray occupies. Basically, your condition
sourceArrayPointer < i;
is pretty much guaranteed to fail the first time through.
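The check on the return value of send shown earlier can be turned into a loop, because send, like read, may transfer fewer bytes than requested. A minimal sketch, with the hypothetical name send_all and no special handling of interrupted calls:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Keep calling send() until all `len` bytes have been handed to the socket.
   Returns 0 on success, -1 if send() reports an error. */
int send_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0)
            return -1;
        p += n;                 /* advance past the bytes already sent */
        len -= (size_t)n;
    }
    return 0;
}
```

It would be called as send_all(clientSocketFD, sourceArray, sizeof(long) * ARRAY_LEN), paired with a read loop on the receiving side.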