How to read caret notations from a binary file in C? - c

I need to read a binary file one byte at a time and concatenate the bytes when a certain condition is met. I'm running into problems with the null character, i.e. ^@ in caret notation. Neither snprintf nor strcpy lets me concatenate this null character with other characters. This is strange, because when I print the character using
printf("%c",char1);
it prints the null character in caret notation, i.e. ^@. So my understanding is that snprintf should have been able to concatenate it as well.
Could anybody please let me know how can I achieve such a concatenation?
Thanks

C strings are null-terminated. If your input data can contain null bytes, you cannot use the string functions safely. Instead, consider just allocating a large-enough buffer (or dynamically resizing it as needed) and write each incoming byte to the right place in that buffer.

Since you're not working with raw ANSI (null-terminated) strings, you can't use the functions meant for such strings, because of the way they interpret strings.
In C (and C++), strings are usually null-terminated, i.e. the last character is \0 (value 0x00). At least this is true for the standard functions for string manipulation and input/output (like printf() or strcpy()).
For example, the line
const char *text = "Hello World";
behind the scenes becomes
const char *text = "Hello World\0";
So when you read \0 from a file and put it into your string, every string function treats the string as ending at that point; if the \0 is the first byte, you end up with an effectively empty string.
To make the issue clearer, here is a simple example:
// Let's just assume the sequence 0x00, 0x01 is some special encoding
const char *input = "Hello\0\1World!";
char output[256];
strcpy(output, input);
// strcpy() is for string manipulation, so it stops copying once it encounters a null terminator
printf("%s\n", output); // This will print 'Hello'
memcpy(output, input, 14); // 14 is the string length above plus null terminator
printf("%s\n", output); // This will again print 'Hello' (since it stops at the terminator)
printf("%s\n", output + 7); // This will print "World!" (the offset skips past the terminator and the \1 byte)
The following is a quick example I put together. It doesn't necessarily show best practice and it's also likely that there are a few bugs in there, but it should show you some possible concepts how to work with raw byte data, avoiding standard string functions wherever possible.
#include <stdio.h>
#define WIDTH 16
int main (int argc, char **argv) {
int offset = 0;
FILE *fp;
int byte;
char buffer[WIDTH] = ""; // This buffer will store the data read, essentially concatenating it
if (argc < 2)
return 1;
if (fp = fopen(argv[1], "rb")) {
for(;;) {
byte = fgetc(fp); // get the next byte
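if (byte == EOF) { // did we read over the end of the file?
if (offset % WIDTH) // partial last line: pad the hex column, then print the chars
printf("%*s %*.*s", 3 * (WIDTH - offset % WIDTH), "", offset % WIDTH, offset % WIDTH, buffer);
else if (offset) // the last line was full: print its char representation
printf(" %*.*s", WIDTH, WIDTH, buffer);
printf("\n");
break; // leave the loop so the file gets closed below
}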
if (offset % WIDTH == 0) { // should we print the offset?
if (offset)
printf(" %*.*s", WIDTH, WIDTH, buffer); // print the char representation of the last line
printf("\n0x%08x", offset);
}
// print the hex representation of the current byte
printf(" %02x", byte);
// add printable characters to our buffer
if (byte >= ' ')
buffer[offset % WIDTH] = byte;
else
buffer[offset % WIDTH] = '.';
// move the offset
++offset;
}
fclose(fp);
}
return 0;
}
Once compiled, pass any file as the first parameter to view its contents (shouldn't be too big to not break formatting).

Related

C - Using sprintf() to put a prefix inside of a string

I'm trying to use sprintf() to put a string "inside itself", so I can change it to have an integer prefix. I was testing this on a character array of length 12 with "Hello World" inside it already.
The basic premise is that I want a prefix that denotes the amount of words within a string. So I copy 11 characters into a character array of length 12.
Then I try to put the integer followed by the string itself by using "%i%s" in the function. To get past the integer (I don't just use myStr as the argument for %s), I make sure to use myStr + snprintf(NULL, 0, "%i", wordCount), which should be myStr + characters taken up by the integer.
The problem I'm having is that it eats the 'H' when I do this and prints "2ello World" instead of placing the '2' right in front of "Hello World".
So far I've tried different options for getting "past the integer" in the string when I try to copy it inside itself, but nothing really seems to be the right case, as it either comes out as an empty string or just the integer prefix itself '222222222222' copied throughout the entire array.
int main() {
char myStr[12];
strcpy(myStr, "Hello World");//11 Characters in length
int wordCount = 2;
//Put the integer wordCount followed by the string myStr (past whatever amount of characters the integer would take up) inside of myStr
sprintf(myStr, "%i%s", wordCount, myStr + snprintf(NULL, 0, "%i", wordCount));
printf("\nChanged myStr '%s'\n", myStr);//Prints '2ello World'
return 0;
}
First, to insert a one-digit prefix into a string “Hello World”, you need a buffer of 13 characters—one for the prefix, eleven for the characters in “Hello World”, and one for the terminating null character.
Second, you should not pass the same buffer to sprintf or snprintf as both the output buffer and an input string. The C standard does not define the behavior when copying takes place between objects that overlap.
Below is a program that shows you how to insert a prefix by moving the string with memmove. This is largely tutorial, as it is not generally a good way to manipulate strings. For short strings, where space is not an issue, most programmers would simply print the desired string into a temporary buffer, avoiding overlap issues.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Insert a decimal numeral for Prefix into the beginning of String.
Length specifies the total number of bytes available at String.
*/
static void InsertPrefix(char *String, size_t Length, int Prefix)
{
// Find out how many characters the numeral needs.
int CharactersNeeded = snprintf(NULL, 0, "%i", Prefix);
// Find the current string length.
size_t Current = strlen(String);
/* Test whether there is enough space for the prefix, the current string,
and the terminating null character.
*/
if (Length < CharactersNeeded + Current + 1)
{
fprintf(stderr,
"Error, not enough space in string to insert prefix.\n");
exit(EXIT_FAILURE);
}
// Move the string to make room for the prefix.
memmove(String + CharactersNeeded, String, Current + 1);
/* Remember the first character, because snprintf will overwrite it with a
null character.
*/
char Temporary = String[0];
// Write the prefix, including a terminating null character.
snprintf(String, CharactersNeeded + 1, "%i", Prefix);
// Restore the first character of the original string.
String[CharactersNeeded] = Temporary;
}
int main(void)
{
char MyString[13] = "Hello World";
InsertPrefix(MyString, sizeof MyString, 2);
printf("Result = \"%s\".\n", MyString);
}
The best way to deal with this is to create another buffer to output to, and then if you really need to copy back to the source string then copy it back once the new copy is created.
There are other ways to "optimise" this if you really needed to, like putting your source string into the middle of the buffer so you can append and change the string pointer for the source (not recommended, unless you are running on an embedded target with limited RAM and the buffer is huge). Remember code is for people to read so best to keep it clean and easy to read.
#include <stdio.h>
#include <string.h>
#define MAX_BUFFER_SIZE 128
int main() {
char srcString[MAX_BUFFER_SIZE];
char destString[MAX_BUFFER_SIZE];
strncpy(srcString, "Hello World", MAX_BUFFER_SIZE);
int wordCount = 2;
snprintf(destString, MAX_BUFFER_SIZE, "%i%s", wordCount, srcString);
printf("Changed string '%s'\n", destString);
// Or if you really want the string put back into srcString then:
strncpy(srcString, destString, MAX_BUFFER_SIZE);
printf("Changed string in source '%s'\n", srcString);
return 0;
}
Notes:
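To guard against buffer overflows, use the bounded functions strncpy and snprintf. Be aware that strncpy does not write a null terminator when the source is at least as long as the size you pass, so terminate the destination yourself in that case.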

Assigning part of a string to a char * in C

I want to copy X to Y words of a string to the out char * array.
unsigned char * string = "HELLO WORLD!!!" // length 14
unsigned char out[9];
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
printf("%s = string\n%s = out\n", string, out);
When looking at the output of out, why is there gibberish after a certain point of my string? I see the string of out as LO WORLD!# . Why are there weird characters appearing after the content I copied, isn't out supposed to be a an array of 9? I expected the output to be
LO WORLD!
In C you need to terminate your string with a 0x00 value so a string of length 9 needs ten bytes to store it with the last set to 0. Otherwise your print statements run off into random data.
unsigned char * string = "HELLO WORLD!!!" // length 14
unsigned char out[10];
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
out[length] = 0x00;
printf("%s = string\n%s = out\n", string, out);
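A minor point, but string literals are arrays of char (const char in C++) that decay to char* (const char*), not unsigned char*. Even where plain char happens to be unsigned, char* and unsigned char* remain distinct types, so it is cleaner to declare the pointer as char*.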
Furthermore, this is not true:
unsigned char * string = "HELLO WORLD!!!" // length 14
The string actually occupies 15 bytes -- there is an extra, hidden '\0' at the end, called a nul byte, which marks the end of the string. These nul terminators are very important, because if they're not present, then many C library functions which manipulate strings will keep going until they hit a byte with a value equal to '\0' -- and so can end up reading or trampling over bits of memory they shouldn't do. This is called a buffer overrun, and is a classic bug (and exploitable security problem) in C programmes.
In your example, you haven't included this nul terminator in your copied string, so printf() just keeps going until it finds one, hence the gibberish you're seeing. In general, it's a good idea to use C library functions to manipulate C strings where possible, as most of them add the terminator for you. One exception matters here: strncpy from string.h copies the nine characters you're after, but it does not add a terminator when the source has more characters than the count you pass, so you still have to set out[9] = '\0' yourself.
A 9 character string needs 10 bytes because it must be null ( 0 ) terminated. Try this:
unsigned char out[10]; // make this 10
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
out[i] = 0; // add this to terminate the string
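A more compact approach uses strncpy, but you still have to add the terminator, because strncpy does not write one when it copies the full count:
strncpy((char *)out, (const char *)string + 3, 9);
out[9] = 0;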
C strings must be null terminated. You only created an array large enough for 8 characters + the null terminator, but you never added the terminator.
So, you need to allocate the length plus 1 and add the terminator.
// initializes all elements to 0
char out[10] = {0};
// alternatively, add it at the end.
out[9] = '\0';
Think of it this way; you're passed a char* which represents a string. How do you know how long it is? How can you read it? Well, in C, a sentinel value is added to the end. This is the null terminator. It is how strings are read in C, and passing around unterminated strings to functions which expect C strings results in undefined behavior.
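And then... just use strncpy to copy strings, keeping in mind that it only adds the terminator when the source is shorter than the count you pass.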
If you want to copy 9 characters from your string, you need an array of 10, because a C string needs '\0' as its terminating character. So your code should be rewritten like this:
unsigned char * string = "HELLO WORLD!!!" // length 14
unsigned char out[10];
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
out[9] = 0;
printf("%s = string\n%s = out\n", string, out);

Print a string reversed in C

I'm coding a program that takes some files as parameters and prints all lines reversed. The problem is that I get unexpected results:
If I apply it to a file containing the following lines
one
two
three
four
I get the expected result, but if the file contains
september
november
december
It returns
rebmetpes
rebmevons
rebmeceds
And I don't understand why it adds a "s" at the end
Here is my code
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void reverse(char *word);
int main(int argc, char *argv[], char*envp[]) {
/* No arguments */
if (argc == 1) {
return (0);
}
FILE *fp;
int i;
for (i = 1; i < argc; i++) {
fp = fopen(argv[i],"r"); // read mode
if( fp == NULL )
{
fprintf(stderr, "Error, no file");
}
else
{
char line [2048];
/*read line and reverse it. the function reverse it prints it*/
while ( fgets(line, sizeof line, fp) != NULL )
reverse(line);
}
fclose(fp);
}
return (0);
}
void reverse(char *word)
{
char *aux;
aux = word;
/* Store the length of the word passed as parameter */
int longitud;
longitud = (int) strlen(aux);
/* Allocate memory enough ??? */
char *res = malloc( longitud * sizeof(char) );
int i;
// in this loop I copy the string reversed into a new one
for (i = 0; i < longitud-1; i++)
{
res[i] = word[longitud - 2 - i];
}
fprintf(stdout, "%s\n", res);
free(res);
}
(NOTE: some code has been deleted for clarity but it should compile)
You forgot to terminate your string with a \0 character, so fprintf keeps reading past the characters you copied. First allocate memory for one more character than you do now:
char *res = malloc( longitud * sizeof(char) + 1);
And then try this:
for (i = 0; i < longitud-1; i++)
{
res[i] = word[longitud - 2 - i];
}
res[i] = '\0'; // Terminating string with '\0'
I think I know the problem, and it's a bit of a weird issue.
Strings in C are zero terminated. This means that the string "Hi!" in memory is actually represented as 'H','i','!','\0'. The way strlen etc then know the length of the string is by counting the number of characters, starting from the first character, before the zero terminator. Similarly, when printing a string, fprintf will print all the characters until it hits the zero terminator.
The problem is that your reverse function never sets the zero terminator at the end, which it needs to do since you're copying into the buffer character by character. This means fprintf runs off the end of the characters you wrote and into uninitialised memory, which just happened to contain a zero when you hit it (malloc makes no promises about the contents of the buffer you allocate, just that it's big enough). You should get different behaviour on Windows, since I believe that in debug builds the Microsoft C runtime fills freshly allocated heap memory with the byte 0xCD.
So, what's happening is you copy september, reversed, into res. This works as you see, because it just so happens that there's a zero at the end.
You then free res, then malloc it again. Again, by chance (and because of some smartness in malloc) you get the same buffer back, which already contains "rebmetpes". You then put "november" in, reversed, which is slightly shorter, hence your buffer now contains "rebmevons".
So, the fix? Make sure there is room for the zero terminator (char *res = malloc( longitud * sizeof(char) + 1);) and, after you reverse the string, set the terminator right after the last character you copied. Since the loop copies longitud - 1 characters (it skips the newline that fgets stored), that is res[longitud - 1] = '\0';.
There are two errors. The first is that the buffer needs room for the terminator as well as the characters of the string:
char *res = malloc( (longitud+1) * sizeof(char) );
The second is that you have to terminate the string. Your loop copies longitud-1 characters (it drops the newline), so the terminator goes right after them:
res[longitud-1]='\0';
You can terminate the string before entering the loop, because you already know the size of the destination string.
Note that if you use calloc instead of malloc you do not need to terminate the string, as the memory is already zero-initialised.
Thanks, it solved my problem. I read something about the "\0" in strings but wasn't very clear, which is now after reading all the answers (all are pretty good). Thank you all for the help.

How to get the length of a standardinput in C? [duplicate]

I will start with my code:
char input[40];
fgets( input, 40, stdin );
if( checkPalin(input) == 0 ) {
printf("%s ist ein Palindrom \n", input);
}
else {
printf("%s ist kein Palindrom \n", input);
}
What I want to do is: Read some standardinput and check with my function if it is a Palindrome or not.
My problems are the following:
How can I get the length of the standard input? If it is longer than 40 characters I want to print an error message, and furthermore I want my char array to be exactly as long as the actual input.
Anybody can help me?
fgets( input, 40, stdin );
With this call, the stored input cannot exceed 40 bytes: 39 characters plus the nul character.
If you enter a line longer than 39 characters, fgets() reads the first 39 characters, stores the nul character ('\0') as the 40th, and leaves the remaining characters in the input stream for the next read.
If you enter fewer than 39 characters, for example 5, fgets() also stores the newline, so the length becomes 6 (excluding the nul character).
Do not forget to remove the newline character.
char input[60];
fgets(input,sizeof input,stdin);
For example, if you want to report an error for input longer than 40 characters, declare a larger input buffer, say 60 bytes, as above.
Then simply check the result with strlen() and show an error message if the length, not counting the newline, is more than 40.
To detect a read error or end of file, check the return value of fgets() against NULL.
There is no standard function that does this; you need to write it yourself, i.e. read byte by byte until fgetc() returns EOF. But I guess you're asking in order to avoid an overflow, right? If the input is longer than 40 characters, you don't need to worry: fgets() is guaranteed never to store more than the size you requested (40, including the terminator) in your buffer. The stored length may be less than that, but never greater.
EDIT:
By "How to get the length of a standardinput in C?" I thought you were asking how many bytes there are in stdin. I'm sorry for that. If you want to know how many bytes fgets() has written, just use strlen().
With
fgets( input, 40, stdin );
input is guaranteed to hold at most 40 bytes (null terminator included).
You don't have to perform any overflow checks.
And for getting size of the input you can always use strlen() function on input, as the produced character string from fgets is always null terminated.
It just turned out that it is not so easy to write a function which uses fgets() repeatedly in order to return a malloc()ed string.
The function does no proper error reporting: If there was an error using realloc() or fgets(), the data retrieved till now is returned.
Apart from these, the function proved quite usable.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <malloc.h> // glibc-specific: declares malloc_usable_size(), used in main()
char * read_one_line(FILE * in)
{
size_t alloc_length = 64;
size_t cumulength = 0;
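char * data = malloc(alloc_length);
if (!data)
return NULL; // allocation failed
data[0] = '\0'; // so an immediate EOF still returns a valid empty string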
while (1) {
char * cursor = data + cumulength; // here we continue.
char * ret = fgets(cursor, alloc_length - cumulength, in);
printf("r %p %p %zd %zd %zd\n", data, cursor, cumulength, alloc_length, alloc_length - cumulength);
if (!ret) {
// Suppose we had EOF, no error.
// we just return what we read till now...
// there is still a \0 at cursor, so we are fine.
break;
}
size_t newlength = strlen(cursor); // how much is new?
cumulength += newlength; // add it to what we have.
if (cumulength < alloc_length - 1 || data[cumulength-1] == '\n') {
// not used the whole buffer... so we are probably done.
break;
}
// we need more!
// At least, probably.
size_t newlen = alloc_length * 2;
char * r = realloc(data, newlen);
printf("%zd\n", newlen);
if (r) {
data = r;
alloc_length = newlen;
} else {
// realloc error. Return at least what we have...
// TODO: or better free and return NULL?
return data;
}
}
char * r = realloc(data, cumulength + 1);
printf("%zd\n", cumulength + 1);
return r ? r : data; // shrinking should always have succeeded, but who knows?
}
int main()
{
char * p = read_one_line(stdin);
printf("%p\t%zd\t%zd\n", p, malloc_usable_size(p), strlen(p));
printf("%s\n", p);
free(p);
}

After using fopen to open a text file in C, it has additional characters

I need to read in a table of data in the format x[tab]y[tab]z[tab]\n, so I am using fopen and fgetc to stream characters. The loop ends when c==EOF. (c is the character.)
But I had difficulties with that as it overflows my array. After doing some debugging I realised that the opened file after the last line contains:
Northampton Oxford 68
ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ[...]ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍýýýý««««««««îþîþ
What is that? And why does that not appear in my plain text file? And how do I overcome this problem?
destination = fopen("ukcities.txt", "rt"); // r = read, t=text
if (destination != NULL) {
do {
c = fgetc (destination);
if (c == ' ') {
temp_input[i][n] = '\0';
i++;
n=0;
} else if (c == '\n') {
temp_input[i][n] = '\0';
printf("%s %s %s \n", temp_input[0], temp_input[1], temp_input[2]);
i = 0;
n=0;
} else {
temp_input[i][n] = c;
n++;
}
} while (c != -1);
return 1;
} else {
return 0;
}
Looking into my crystal ball, I see that fread or whatever you're using (apparently that's fgetc which makes it even more true) doesn't null-terminate the data it reads and you're trying to print it as a C-string. Terminate the data with a NUL character (a 0) and then it will print correctly.
That string looks unterminated. In C, strings that don't end with a '\0' character (a.k.a. null character) lead to constant trouble because a lot of the standard library and system libraries expect strings to be null-terminated.
Make sure that when you have finished reading in all the data, that the string is terminated; in some cases it must be done manually. There are a few ways to do this (the below makes all characters of the string null, so as long as you don't overwrite the very last one, the string will always be null terminated):
// (1) declare an array of char, set all characters to null character
char buffer[1000] = {0};
Alternatively, if you are keeping track of where you are in the buffer, you can also do this:
// (2) after reading in all data, add the null character yourself:
int n; // number of bytes read
char buf[1000];
// read data into buf, updating n
buf[n] = '\0'; // valid only while n < sizeof buf, so read at most sizeof buf - 1 bytes
In either case, it is important that you don't overstep the end of the buffer. If you've only allocated 1000 bytes, then use only 999 bytes and save 1 byte for the null character.

Resources