I am reading from a file using fgetc, which gives me a single char. However, I want to convert this char to a string so that I can use the strtok function on it. How would I go about doing this?
int xp;
while(1) {
xp = fgetc(filename);
char xpchar = xp;
//convert xpchar into a string
}
Simply create an array with two items, your character and the null terminator:
char str[] = {ch, '\0'};
Or if you will, use a compound literal to do the same:
(char[]){ch, '\0'}
Compound literals can be used to convert your character directly, inside an expression:
printf("%s", (char[]){ch, '\0'} );
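If you need the one-character string in more than one place, the two-element array idea can be wrapped in a small function. This is only a sketch: char_to_str is a hypothetical helper name, not a standard library function, and the caller supplies the two-byte buffer.

```c
/* Hypothetical helper (not part of the standard library):
   writes ch followed by a terminator into out, which must
   have room for at least 2 chars. Returns out. */
char *char_to_str(int ch, char out[2])
{
    out[0] = (char)ch;
    out[1] = '\0';
    return out;
}
```

With fgetc you would check the returned int against EOF first and only then pass it to the helper, for example char buf[2]; printf("%s", char_to_str(xp, buf));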
You will probably read more than one character from the file, so consider the following example:
#define STR_SIZE 10
// STR_SIZE defines the maximum number of characters to be read from the file
int xp;
char str[STR_SIZE + 1] = { 0 }; // the whole array is filled with 0
// the +1 in the array size ensures that at least one '\0'
// remains in the array to terminate the string
int strCnt = 0; // counter of characters stored in the array
while (strCnt < STR_SIZE) { // stop reading once the array is full
    xp = fgetc(f);
    if (xp == EOF) // stop at end of file or on a read error
        break;
    str[strCnt] = (char)xp; // store the character at the next free position
    strCnt++;
}
Also, the name of your file pointer variable, filename, looks strange: filename is a good name for a string variable that stores the name of a file, but fgetc and getc need a FILE *. Check that your program has something like:
FILE * f = fopen(filename, "r");
or consider renaming filename.
I'm relatively new to C, so any help understanding what's going on would be awesome!!!
I have a struct called Token that is as follows:
//Token struct
struct Token {
char type[16];
char value[1024];
};
I am trying to read from a file and append characters read from the file into Token.value like so:
struct Token newToken;
char ch;
ch = fgetc(file);
strncat(newToken.value, &ch, 1);
THIS WORKS!
My problem is that Token.value begins with several values I don't understand, preceding the characters that I appended. When I print the result of newToken.value to the console, I get #�����TheCharactersIWantedToAppend. I could probably figure out a band-aid solution to retroactively remove or work around these characters, but I'd rather not if I don't have to.
In analyzing the � characters, I see them as (in order from index 1-5): \330, \377, \377, \377, \177. I read that \377 is a special character for EOF in C, but also 255 in decimal? Do these values make up a memory address? Am I adding the address to newToken.value by using &ch in strncat? If so, how can I keep them from getting into newToken.value?
Note: I get a segmentation fault if I use strncat(newToken.value, ch, 1) instead of strncat(newToken.value, &ch, 1) (ch vs. &ch).
I'll try to consolidate the answers already given in the comments.
This version of the code uses strncat(), like yours, but fixes the problems noted by Nick (we must initialize the target) and Dúthomhas (the second parameter to strncat() should be a string, not a pointer to a single char). (Yes, a "string" is actually a char[] and the value passed to the function is a char*, but it should point to an array of at least two chars, the last one containing a '\0'.)
Please be aware that strncat(), strncpy() and related functions are tricky. Both copy at most N chars from the source. But strncpy() only adds the final '\0' to the target string when the source has fewer than N chars, whereas strncat() always adds it, even if the source has exactly N chars or more (edited; thanks, @Clifford).
#include <stdio.h>
#include <string.h>
int main() {
FILE* file = stdin; // fopen("test.txt", "r");
if (file) {
struct Token {
char type[16];
char value[1024];
};
struct Token newToken;
newToken.value[0] = '\0'; // A '\0' at the first position means "empty"
int aux;
char source[2] = ""; // A literal "" has a single char with value '\0', but this syntax fills the entire array with '\0's
while ((aux = fgetc(file)) != EOF) {
source[0] = (char)aux;
strncat(newToken.value, source, 1); // This appends AT MOST 1 CHAR (and always adds a final '\0')
}
strncat(newToken.value, "", 1); // As the source string is empty, it just adds a final '\0' (superfluous in this case)
printf("%s", newToken.value); // Never pass file data as the format string itself
}
return 0;
}
This other version uses an index variable and writes each single char directly into the "current" position of the target string, without using strncat(). I think it is simpler and more secure, because it doesn't mix the confusing semantics of single chars and strings.
#include <stdio.h>
#include <string.h>
int main() {
FILE* file = stdin; // fopen("test.txt", "r");
if (file) {
struct Token {
size_t index; // C does not allow an initializer here, so it is set below
char type[16];
char value[1024]; // Max size is 1023 chars + '\0'
};
struct Token newToken;
newToken.index = 0;
newToken.value[0] = '\0'; // A '\0' at the first position means "empty". This is not really necessary anymore
int aux;
while ((aux = fgetc(file)) != EOF)
// Index will stop BEFORE 1024-1 (value[1022] will be the last "real" char, leaving space for a final '\0')
if (newToken.index < sizeof newToken.value -1)
newToken.value[newToken.index++] = (char)aux;
newToken.value[newToken.index++] = '\0';
printf("%s", newToken.value); // Never pass file data as the format string itself
}
return 0;
}
Edited: fgetc() returns an int and we should check for EOF before casting it to a char (thanks, @chqrlie).
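As an aside, the strncpy()/strncat() difference described above is easy to check for yourself. The two functions below are hypothetical probes written for this illustration only; they assume the destination buffers are large enough as stated in the comments.

```c
#include <string.h>

/* Copies at most n chars with strncpy and reports whether a '\0'
   ended up inside the first n bytes of dst. dst must hold n bytes. */
int strncpy_terminates(char *dst, const char *src, size_t n)
{
    memset(dst, 'X', n);        /* poison so a missing '\0' is visible */
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) != NULL;
}

/* Appends at most n chars of src to an empty string in dst and
   returns the resulting length. dst must hold n + 1 bytes. */
size_t strncat_length(char *dst, const char *src, size_t n)
{
    dst[0] = '\0';
    strncat(dst, src, n);       /* always writes a final '\0' */
    return strlen(dst);
}
```

With a 4-byte limit, strncpy leaves the target unterminated as soon as the source has 4 or more characters, while strncat writes up to 4 characters plus the terminator, which is why its destination needs one extra byte.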
You are appending to a string that is not initialised, so it can contain anything. The end of a string is indicated by a NUL (0) character, and in your example there happened to be one after 6 bytes, but there need not be any within the value array, so the code is seriously flawed and will result in non-deterministic (formally undefined) behaviour.
You need to initialise the newToken instance to an empty string. For example:
struct Token newToken = { "", "" } ;
or to zero initialise the whole structure:
struct Token newToken = { 0 } ;
The point is that C does not initialise non-static objects without an explicit initialiser.
Furthermore, using strncat() is very inefficient here: every call has to scan the destination for its end first, so its execution time grows with the length of the destination string (see https://www.joelonsoftware.com/2001/12/11/back-to-basics/).
In this case you would do better to maintain a count of the number of characters added, and write the character and terminator directly to the array. For example:
size_t index = 0 ;
int ch = 0 ;
newToken.value[0] = '\0' ; // keep the string valid even if nothing is read
do
{
    ch = fgetc(file);
    if( ch != EOF )
    {
        newToken.value[index] = (char)ch ;
        index++ ;
        newToken.value[index] = '\0' ;
    }
} while( ch != EOF &&
         index < sizeof(newToken.value) - 1 ) ;
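The same counting idea can be packaged as a function so that the buffer size travels with the buffer. This is a sketch under assumptions: read_into is a hypothetical name, and the caller is expected to pass a capacity of at least 1.

```c
#include <stdio.h>

/* Hypothetical helper: reads characters from file into buf until EOF
   or until cap - 1 characters are stored, then terminates buf.
   Returns the number of characters stored. cap must be at least 1. */
size_t read_into(FILE *file, char *buf, size_t cap)
{
    size_t index = 0;
    int ch;

    /* The size check comes first so no character is read and then dropped. */
    while (index < cap - 1 && (ch = fgetc(file)) != EOF) {
        buf[index++] = (char)ch;
    }
    buf[index] = '\0';
    return index;
}
```

For the Token struct this would be called as read_into(file, newToken.value, sizeof newToken.value), which makes the earlier initialisation question moot because the function always writes a terminator.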
I would like to write a program in C that gets the file content via stdin and reads it line by line and, for each line, converts it to an array of 8-bit integer values.
I also would like to be able to do the reverse process. After working with my array of 8-bit values, I would like to convert it again to "lines" that would be organized as a new buffer.
So basically, I would like to convert a char * line to an int array[] and back (an int array[] to a char * line) while keeping consistency, so that when I recreate the file from the conversions, the file is valid. By valid I mean that the conversion from int array[] to char * line generates the same content as the original char * line, while reading each line of stdin.
My code is currently as follows:
#include <stdio.h>
#include <stdlib.h>
int main() {
FILE *stream;
char *line = NULL;
size_t len = 0;
ssize_t read;
stream = stdin;
if (stream == NULL)
exit(EXIT_FAILURE);
while ((read = getline(&line, &len, stream)) != -1) {
char * array = line_to_array(line);
// here I include the rest of my code
// where I am going to use the generated array
// ...
}
free(line);
fclose(stream);
exit(EXIT_SUCCESS);
}
The line_to_array function would be the one to convert the "line" content to the array of integers. In a second file, I would just do the opposite.
The mechanics of the process would be like this:
The first program (first.c) would receive a file's content via stdin. By reading it using getline, I would have each line to convert to an array of integers and send each line to a second program (second.c) that would convert each array to a char * buffer again and then reconstruct the file.
In the terminal, I would run it like this:
./first | ./second
I appreciate any help on this matter.
Thank you.
You may already know that the name of an array converts ("decays") to a pointer to its first element in most expressions. You can verify this with the following code:
char hello[] = "hello world!";
for( int idx=0; *(hello + idx) != 0; idx++ )
{
printf("%c", *(hello + idx));
}
printf("\n");
So there is no reason to convert a character pointer to an array. For your information, a char is a small integer type in C (8 bits on most platforms) that holds the numeric code of a character: 65 represents 'A' in ASCII.
Secondly, this link may help you understand how to convert between a C string and std::string.
On second thought, your input file may be a Unicode or UTF-8 encoded file that uses multi-byte character codes. In that case, you may not be able to use getline() to read the string from the file. If so, please refer to this question: Reading unicode characters.
I hope the following code helps you understand the char type, arrays and pointers in C/C++:
std::string hello("Hello world");
const char *ptr = hello.c_str();
for( int idx=0; idx < hello.size(); idx++ )
{
printf("%3d ", *(ptr + idx));
}
printf("\n");
std::string hello("Hello world");
const char *ptr = hello.c_str();
for( int idx=0; idx < hello.size(); idx++ )
{
printf("%3d ", ptr[idx]);
}
printf("\n");
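For the round trip the question asks about, a plain byte copy in each direction is enough. The sketch below is written in C and reuses the name line_to_array from the question, but the signatures, the uint8_t element type and the array_to_line counterpart are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copies the bytes of line (without the terminator) into a newly
   allocated uint8_t array and stores the byte count in *out_len.
   Returns NULL on allocation failure. The caller frees the result. */
uint8_t *line_to_array(const char *line, size_t *out_len)
{
    size_t len = strlen(line);
    uint8_t *array = malloc(len ? len : 1);   /* avoid malloc(0) */
    if (array == NULL)
        return NULL;
    memcpy(array, line, len);
    *out_len = len;
    return array;
}

/* Reverse direction: builds a terminated string from len bytes.
   Returns NULL on allocation failure. The caller frees the result. */
char *array_to_line(const uint8_t *array, size_t len)
{
    char *line = malloc(len + 1);
    if (line == NULL)
        return NULL;
    memcpy(line, array, len);
    line[len] = '\0';
    return line;
}
```

Because both directions copy bytes unchanged, the rebuilt line matches the original, including the trailing '\n' that getline keeps. If a line could contain embedded '\0' bytes, the length returned by getline would be a better source for the count than strlen.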
I have a char array list that contains text from a text file, for example:
this is the first line
this is the second line
I want to have the first line copied to another char array without \n (and/or \r).
I do not know the size of the first line exactly but I do know it is less than 100 bytes.
Snippet of my code:
unsigned char *line;
line = (u_char *)calloc(100, sizeof(char));
//read txt file to list
while(list[0] != '\n'){
line[0] = list[0];
list++;
line++;
}
Unfortunately, line is empty. Note that I know for sure list isn't empty and contains the text as shown above.
Any suggestions on this code, or another solution? The file is opened using open() and not fopen(), so I have to loop through my list array.
You can do it like this:
for ( int i = 0; list[i] && list[i] != '\n'; ++i ) {
line[i] = list[i];
}
You could also use strcspn() from the standard library header string.h:
Declaration:
size_t strcspn(const char *str1, const char *str2);
Scans str1 for the first occurrence of any character specified in str2.
Returns the length of the initial part of str1 that contains none of
the characters in str2.
Your program would then become
unsigned char *line;
size_t firstlineLength;
//read txt file to list
/* count the characters up to the first line break (strcspn, not strspn) */
firstlineLength = strcspn((const char *)list, "\r\n");
/* allocate just the memory you need, +1 for the terminating zero */
line = (u_char *)calloc(firstlineLength + 1, sizeof(char));
/* calloc zero-fills the buffer, so the copy is already terminated */
if (line != NULL)
    memcpy(line, list, firstlineLength);
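If the destination is a fixed buffer (the question mentions a line of fewer than 100 bytes), the strcspn() approach can also be written with an explicit capacity. This is a sketch: copy_first_line is a hypothetical helper, and it assumes list is a null-terminated string.

```c
#include <string.h>

/* Hypothetical helper: copies the first line of list into line,
   stopping at the first '\r' or '\n' (or at the end of list) and
   truncating to cap - 1 characters. Always terminates line.
   cap must be at least 1. Returns the number of characters copied. */
size_t copy_first_line(const char *list, char *line, size_t cap)
{
    size_t n = strcspn(list, "\r\n");
    if (n > cap - 1)
        n = cap - 1;
    memcpy(line, list, n);
    line[n] = '\0';
    return n;
}
```

Unlike the loop in the question, this never advances the destination pointer, so line still points at the start of the copied text afterwards.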
I want to copy X to Y words of a string to the out char * array.
unsigned char * string = "HELLO WORLD!!!" // length 14
unsigned char out[9];
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
printf("%s = string\n%s = out\n", string, out);
When looking at the output of out, why is there gibberish after a certain point of my string? I see the string of out as LO WORLD!# . Why are there weird characters appearing after the content I copied? Isn't out supposed to be an array of 9? I expected the output to be
LO WORLD!
In C you need to terminate your string with a 0x00 value so a string of length 9 needs ten bytes to store it with the last set to 0. Otherwise your print statements run off into random data.
const char *string = "HELLO WORLD!!!"; // length 14
char out[10]; // 9 characters + terminator
size_t length = 9;
size_t i;
for(i=0 ;i < length ;++i)
{
    out[i] = string[i+3];
}
out[length] = 0x00; // terminate the copied string
printf("%s = string\n%s = out\n", string, out);
A minor point, but string literals have type char[N] in C (const char[N] in C++), which decays to char* (or const char*), not unsigned char*. char and unsigned char are distinct types even when plain char happens to be unsigned in your implementation.
Furthermore, this is not true:
unsigned char * string = "HELLO WORLD!!!" // length 14
The string actually occupies 15 bytes -- there is an extra, hidden '\0' at the end, called a nul byte, which marks the end of the string. These nul terminators are very important, because if they're not present, then many C library functions which manipulate strings will keep going until they hit a byte with a value equal to '\0' -- and so can end up reading or trampling over bits of memory they shouldn't do. This is called a buffer overrun, and is a classic bug (and exploitable security problem) in C programmes.
In your example, you haven't included this nul terminator in your copied string, so printf() just keeps going until it finds one, hence the gibberish you're seeing. In general, it's a good idea to use C library functions to manipulate C strings where possible, but check what each one does about the terminator. In this case, strncpy from string.h copies the characters you're after, but it does not append a '\0' when the source is longer than the count, so you still need to set out[9] = '\0' yourself in a 10-byte buffer.
A 9 character string needs 10 bytes because it must be null ( 0 ) terminated. Try this:
unsigned char out[10]; // make this 10
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
out[i] = 0; // add this to terminate the string
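A shorter approach would be these two lines (strncpy does not add the terminator here, because the source is longer than 9 characters):
strncpy((char *)out, (const char *)string + 3, 9);
out[9] = 0;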
C strings must be null terminated. You only created an array large enough for 8 characters + the null terminator, but you never added the terminator.
So, you need to allocate the length plus 1 and add the terminator.
// initializes all elements to 0
char out[10] = {0};
// alternatively, add it at the end.
out[9] = '\0';
Think of it this way; you're passed a char* which represents a string. How do you know how long it is? How can you read it? Well, in C, a sentinel value is added to the end. This is the null terminator. It is how strings are read in C, and passing around unterminated strings to functions which expect C strings results in undefined behavior.
And then... just use strncpy to copy strings, remembering that it does not write the terminator when it copies the full count.
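Putting the pieces together, a range copy that always terminates its output could look like the sketch below. copy_range is a hypothetical helper; it uses memcpy with an explicit terminator rather than strncpy, since the length is already known once strlen has been called.

```c
#include <string.h>

/* Hypothetical helper: copies up to count characters of src, starting
   at offset start, into out and terminates the result. out must hold
   count + 1 bytes. Returns the number of characters copied, which is
   smaller than count if src is too short. */
size_t copy_range(char *out, const char *src, size_t start, size_t count)
{
    size_t len = strlen(src);
    if (start > len)
        start = len;
    if (count > len - start)
        count = len - start;
    memcpy(out, src + start, count);
    out[count] = '\0';
    return count;
}
```

For the example in the question, copy_range(out, string, 3, 9) with a char out[10] gives the expected "LO WORLD!".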
If you want to copy 9 characters from your string, you need an array of 10. This is because a C string needs a '\0' as its terminating character. So your code should be rewritten like this:
const char *string = "HELLO WORLD!!!"; // length 14
char out[10]; // 9 characters + terminator
size_t length = 9;
size_t i;
for(i=0 ;i < length ;++i)
{
    out[i] = string[i+3];
}
out[9] = 0; // terminate the copied string
printf("%s = string\n%s = out\n", string, out);
As simple as that. I'm on C++ btw. I've read the cplusplus.com's cstdlib library functions, but I can't find a simple function for this.
I know the length of the char, I only need to erase last three characters from it. I can use C++ string, but this is for handling files, which uses char*, and I don't want to do conversions from string to C char.
If you don't need to copy the string somewhere else and can change it
/* make sure strlen(name) >= 3 */
namelen = strlen(name); /* possibly you've saved the length previously */
name[namelen - 3] = 0;
If you need to copy it (because it's a string literal or you want to keep the original around)
/* make sure strlen(name) >= 3 */
namelen = strlen(name); /* possibly you've saved the length previously */
strncpy(copy, name, namelen - 3);
/* add a final null terminator */
copy[namelen - 3] = 0;
I think some of your post was lost in translation.
To truncate a string in C, you can simply insert a terminating null character in the desired position. All of the standard functions will then treat the string as having the new length.
#include <stdio.h>
#include <string.h>
int main(void)
{
char string[] = "one one two three five eight thirteen twenty-one";
printf("%s\n", string);
string[strlen(string) - 3] = '\0';
printf("%s\n", string);
return 0;
}
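The example above assumes the string has at least three characters. A guarded version might look like this sketch, where truncate_tail is a hypothetical helper name and the string must be writable (an array, not a string literal).

```c
#include <string.h>

/* Hypothetical helper: removes the last n characters of s in place.
   Returns 1 on success, or 0 if s is shorter than n, in which case
   s is left unchanged. */
int truncate_tail(char *s, size_t n)
{
    size_t len = strlen(s);
    if (len < n)
        return 0;
    s[len - n] = '\0';
    return 1;
}
```

Checking the length first matters because strlen(s) - 3 wraps around to a huge unsigned value when the string is shorter than three characters.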
If you know the length of the string you can use pointer arithmetic to get a string with the last three characters:
const char* mystring = "abc123";
const int len = 6;
const char* substring = mystring + len - 3;
Please note that substring points to the same memory as mystring and is only valid as long as mystring is valid and left unchanged. The reason this works is that a C string doesn't have any special marker at the beginning, only the null terminator at the end.
I interpreted your question as wanting the last three characters, getting rid of the start, as opposed to how David Heffernan read it. One of us is obviously wrong.
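If the length is not known in advance, the same pointer arithmetic can be written with strlen. This is a sketch: last_chars is a hypothetical helper, and falling back to the whole string when it is shorter than n is a choice made for this example.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the last n characters of
   s, or to the whole string if it is shorter than n. No copy is made,
   so the result is only valid while s is valid and unchanged. */
const char *last_chars(const char *s, size_t n)
{
    size_t len = strlen(s);
    return len > n ? s + (len - n) : s;
}
```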
bool TakeOutLastThreeChars(char* src, int len) {
if (len < 3) return false;
memset(src + len - 3, 0, 3);
return true;
}
I assume mutating the string memory is safe since you did say erase the last three characters. I'm just overwriting the last three characters with "NULL" or 0.
It might help to understand how C char* "strings" work:
You start reading them from the char that the char* points to until you hit a \0 char (or simply 0).
So if I have
char* str = "theFile.nam";
then str+3 represents the string File.nam.
But you want to remove the last three characters, so you want something like:
char str2[9];
strncpy (str2,str,8); // now str2 contains "theFile.#" where # is some character you don't know about
str2[8]='\0'; // now str2 contains "theFile.\0" and is a proper char* string.