Splitting and modifying a string from a text file in C

For a school project I am writing a Caesar cipher. The requirements are: read text from a text file, store it in a two-dimensional array of strings with a maximum of 81 characters per line (80 useful + '\0') and 1000 lines, and then modify the content to cipher or decipher it. What if a single line of the file has more than 80 useful characters? I thought about reading it so that each space is turned into a '\0' and starts a new row in the array, but I don't know whether I can do that with fgets instead of fgetc, which is what I have been using.
This is what I have right now:
int lerficheiro(char * texto[MAXLINHAS][MAXCARPORLINHA])
{
char caractere;
FILE * fp;
fp = fopen("tudomaiusculas.txt", "r");
if(fp==NULL)
{
printf("Erro ao ler ficheiro.");
return (-1);
}
for(int linha = 0; linha < MAXLINHAS; linha++)
{
for(int coluna = 0; coluna < MAXCARPORLINHA; coluna++)
{
caractere = fgetc(fp);
if(caractere == ' ') caractere = '\0'; break;
if(caractere == '\n') caractere = '\0'; break;
if(caractere < 'A' || caractere > 'Z')
{
printf("Erro ao ler, o ficheiro não contem as letras todas maiusculas");
return (-1);
}
* texto[linha][coluna] = caractere;
}
}
}

Initialize your array to 0 and make use of the fact that fgets lets you choose how many bytes to read. Check the return value of fgets to see whether you've reached the end of the file.
Also note that the result parameter does not need to be an array of pointers: an array passed to a function decays to a pointer to its first element, so the callee already writes into the caller's storage (EDIT to mirror @user3629249's suggestions).
EDIT2: Edited the code to take the newline problem into account. Also removed the -1 that was causing 79-character lines instead of 80.
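#include <stdio.h>
#include <string.h>
#define MAX_LINES 8000
#define HSIZE 81
/* Fills res with chunks of at most HSIZE-1 characters.
   Returns the number of completely filled rows (a final partial row is not counted). */
int parse_file( FILE * inp_file, char res[MAX_LINES][HSIZE])
{
    int l = 0;
    int len = HSIZE;
    /* stop when the array is full or fgets() reports end of file */
    while( l < MAX_LINES && fgets( res[l]+(HSIZE-len), len, inp_file ))
    {
        len = HSIZE - strlen( res[l]);
        if( len <= 1)
        {
            l++;
            len = HSIZE;
        }
    }
    return l;
}
int main()
{
    char parsed_file[MAX_LINES][HSIZE] = {0};
    FILE * inp_file;
    inp_file = fopen( "file_to_parse.txt", "r");
    if( inp_file == NULL)
    {
        printf( "Failed to read input file...\n");
        return 1;
    }
    parse_file( inp_file, parsed_file);
    fclose( inp_file);
    /* an empty row marks the end of the data; never index past the array */
    for( int i=0; i < MAX_LINES && parsed_file[i][0] != 0; i++)
        printf( "line %04d: %s\n", i+1,parsed_file[i]);
    return 0;
}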
If you want, you could also replace the newlines in parsed_file with something like this (line stands for one row, e.g. parsed_file[i]):
char *pos;
while( (pos = strchr( line, '\n')) != NULL)
*pos = ' ';
With test file:
This is a random file that I'm testing out for the pure randomness of random files.
Still reading, m'kay man lets get going!!!!!!!!!!! So last day the craziest thing happened, let me tell you about it....
and output
line 0001: This is a random file that I'm testing out for the pure randomness of random fil
line 0002: es.
Still reading, m'kay man lets get going!!!!!!!!!!! So last day the craziest
line 0003: thing happened, let me tell you about it....
Note that printf will still print the newline characters stored in each row.
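The question also asked about breaking at spaces rather than in the middle of a word. As a rough sketch of that idea (not part of the answer above), the function below copies an already-read string into 81-byte rows and backs up to the last space when a row would overflow. The names wrap_text and ROW_SIZE are made up for this illustration, and a word longer than 80 characters is still cut hard.

```c
#include <string.h>

#define ROW_SIZE 81   /* 80 useful characters + '\0' (assumed from the question) */

/* Copies text into rows of at most ROW_SIZE-1 characters, preferring to
   break at a space. Returns the number of rows used. */
int wrap_text(const char *text, char rows[][ROW_SIZE], int max_rows)
{
    size_t len = strlen(text);
    size_t pos = 0;
    int used = 0;

    while (pos < len && used < max_rows) {
        size_t take = len - pos;
        if (take > ROW_SIZE - 1) {
            size_t cut = ROW_SIZE - 1;
            /* back up to the last space so a word is not cut in half */
            while (cut > 0 && text[pos + cut] != ' ')
                cut--;
            /* no space found: fall back to a hard cut at 80 characters */
            take = (cut > 0) ? cut : ROW_SIZE - 1;
        }
        memcpy(rows[used], text + pos, take);
        rows[used][take] = '\0';
        used++;
        pos += take;
        /* skip the space(s) the row was broken on */
        while (text[pos] == ' ')
            pos++;
    }
    return used;
}
```

Each row ends up NUL-terminated, so the rows can be ciphered or printed one by one with the usual string functions.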

Related

How can I create a 2D array to store a collection of words scanned from a .txt file in C?

I am working on a program where I want to scan a .txt file that contains a poem. After scanning the poem, I want to be able to store each individual word as a single string and store those strings in a 2D array. For example, if my .txt file contains the following:
Haikus are easy.
But sometimes they don't make sense.
Refrigerator.
I want to be able to store each word as the following in a single array:
H a i k u s \0
a r e \0
e a s y . \0
B u t \0
s o m e t i m e s \0
t h e y \0
d o n ' t \0
m a k e \0
s e n s e . \0
R e f r i g e r a t o r . \0
So far, this is the code I have. I am having difficulties understanding 2D arrays, so if someone could explain that to me as well in context to this problem, that would be great. I am still learning the C language, so it takes time for me to understand some things. I have been scratching my head at this for a few hours now and am using this as help after trying everything I could think of!
The following is my function for getting the words and storing them in to arrays (it also returns the number of words there are, which is used separately for a different part of the program):
int getWords(int maxSize, FILE* inFile, char strings[][COL_SIZE]){
int numWords;
for(int i = 0; i < maxSize; i++){
fscanf(inFile, "%s", strings[i]);
while(fscanf(inFile, "%s", strings[i] == 10){
numWords++;
}
}
return numWords;
}
Here's the code I have where I call the function in the main function (I am not sure what numbers to set the COL_SIZE and MAX_LENGTH to, like I said, I am new to this and am trying my best to understand 2D arrays and how they work):
#define COL_SIZE 10
#define MAX_LENGTH 500
int main(){
FILE* fp;
char strArray[MAX_LENGTH][COL_SIZE];
fp = fopen(FILE_NAME, "r");
if(fp == NULL){
printf("File could not be found!");
}
else{
getWords(MAX_LENGTH, fp, strArray);
fclose(fp);
}
return 0;
}
What you are not understanding is that COL_SIZE must be large enough to store the longest word +1 for the nul-terminating character. Take:
R e f r i g e r a t o r . \0
----------------------------
1 2 3 4 5 6 7 8 9 0 1 2 3 4 -> 14 characters of storage required
You declare a 500 x 10 2D array of char:
char strArray[500][10]
"Refrigerator." cannot fit in one row of strArray, so what happens is "Refrigerat" fills one row, and then "or." plus the terminating '\0' overwrite the first 4 characters of the next.
There are a number of ways to handle the input, but if you want to use fscanf, then you need (1) to include a field-width modifier with the string conversion to limit the number of characters stored to the amount of storage available, and (2) validate the next character after those you have read is a whitespace character, e.g.
#include <ctype.h>
int getWords(int maxSize, FILE* inFile, char strings[][COL_SIZE])
{
char c;
int n = 0;
while (n < maxSize) {
int rtn = fscanf (inFile, "%9s%c", strings[n], &c);
if (rtn == 2 && isspace(c))
n++;
else if (rtn == 1) {
n++;
break;
}
else
break;
}
return n;
}
Note the format string contains a field-width modifier of one-less than the total number of characters available, and then the character conversion stores the next character and validates it is whitespace (if it isn't you have a word that is too long to fit in your array)
With any user-input function, you cannot use it correctly unless you check the return. Above, the return from fscanf() is saved in rtn. If both conversions succeed (the string, limited to COL_SIZE - 1 characters by the field-width modifier, and c) and c is whitespace, you have read a complete word and are not yet at EOF. If the return is 1, you have read a word and then reached EOF (non-POSIX line end on the last line). Otherwise, you will either reach the limit of MAX_LENGTH and exit the loop, or you will reach EOF and fscanf() will return EOF, forcing an exit from the loop through the else clause.
Lastly, don't skimp on buffer size. The longest word in a non-medical unabridged dictionary is 29 characters, requiring 30 characters of storage, so #define COL_SIZE 32 makes more sense than 10. If you change COL_SIZE, change the field width in the format string to match (COL_SIZE - 1, e.g. %31s).
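One way to keep the field width and the row size from drifting apart is to build the format string from a macro. This is only a sketch: WORD_WIDTH, STR and STR_ are names introduced here, not part of the code above, and WORD_WIDTH still has to be kept equal to COL_SIZE - 1 by hand because the preprocessor cannot stringify an expression like COL_SIZE - 1 into "31".

```c
#include <ctype.h>
#include <stdio.h>

#define COL_SIZE   32          /* row width, including the terminator */
#define WORD_WIDTH 31          /* keep equal to COL_SIZE - 1 */
#define STR_(x) #x
#define STR(x)  STR_(x)        /* expands WORD_WIDTH before stringifying it */

int getWords(int maxSize, FILE *inFile, char strings[][COL_SIZE])
{
    char c;
    int n = 0;

    while (n < maxSize) {
        /* the adjacent literals are concatenated into "%31s%c" */
        int rtn = fscanf(inFile, "%" STR(WORD_WIDTH) "s%c", strings[n], &c);
        if (rtn == 2 && isspace((unsigned char)c))
            n++;                /* complete word followed by whitespace */
        else if (rtn == 1) {
            n++;                /* last word, EOF right after it */
            break;
        }
        else
            break;              /* EOF, or a word too long for one row */
    }
    return n;
}
```

The cast to unsigned char before isspace() avoids undefined behaviour if the file contains bytes outside the ASCII range.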
Look things over and let me know if you have more questions.
stdio.h Only
If you are limited to stdio.h, then you can manually confirm that c contains a whitespace character:
if (rtn == 2 && (c == ' ' || c == '\t' || c == '\n'))
n++;
You probably don't want a traditional 2D array. Those are usually rectangular, which is not well suited to storing variable length words. Instead, you would want an array of pointers to buffers, sort of like argv is. Since the goal is to load from a file, I suggest using a contiguous buffer rather than allocating a separate one for each word.
The general idea is this:
First pass: get total file size and read in the whole thing (+1 byte for trailing NUL).
Second pass: count the words and split them with NULs.
Third pass: allocate a buffer for the word pointers and fill it in
Here's how to load the entire file:
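#include <sys/stat.h>
#include <stdlib.h>
#include <stdio.h>
char *load_file(const char *fname, int *n)
{
    struct stat st;
    if(stat(fname, &st) == -1 || st.st_size == 0) return NULL;
    char *buffer = malloc(st.st_size + 1);
    if(buffer == NULL) return NULL;
    /* binary mode so the byte count matches st_size on every platform */
    FILE *file = fopen(fname, "rb");
    if(file == NULL) {
        free(buffer);
        return NULL;
    }
    /* fread returns the number of bytes read; a short read is a failure */
    if(fread(buffer, 1, st.st_size, file) != (size_t)st.st_size) {
        fclose(file);
        free(buffer);
        return NULL;
    }
    fclose(file);
    buffer[st.st_size] = '\0'; /* the extra byte holds the trailing NUL */
    *n = st.st_size;
    return buffer;
}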
You can count the words by just stepping through the file contents and marking the end of each word.
#include <ctype.h>
char *skip_nonword(char *text, char *end)
{
while(text != end && !isalpha(*text)) text++;
return text;
}
char *skip_word(char *text, char *end)
{
while(text != end && isalpha(*text)) text++;
return text;
}
int count_words(char *text, int n)
{
char *end = text + n;
int count = 0;
while(text < end) {
text = skip_nonword(text, end);
if(text < end) {
count++;
text = skip_word(text, end);
*text = '\0';
}
}
return count;
}
Now you are in position to allocate the word buffer and fill it in:
char **list_words(char *text, int n, int count)
{
    /* text is not const: skip_nonword() and skip_word() take and return char * */
    char *end = text + n;
    char **words = malloc(count * sizeof(char *));
    if(words == NULL) return NULL;
    for(int i = 0; i < count; i++) {
        words[i] = skip_nonword(text, end);
        text = skip_word(words[i], end);
    }
    return words;
}
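To see the second and third passes working together without touching the file system, here is a compact sketch that does both on a buffer already in memory. split_words is a name made up for this example; like the functions above, it treats only letters as word characters (so "don't" becomes "don" and "t") and expects one spare byte at text[n] for a terminator.

```c
#include <ctype.h>
#include <stdlib.h>

/* Splits text[0..n) in place and returns a malloc'd array of pointers to
   the words; the number of words is stored in *count. The caller frees
   the returned array (the words themselves live inside text). */
char **split_words(char *text, size_t n, size_t *count)
{
    char *end = text + n;
    size_t words = 0;

    /* pass over the text: terminate every word and count it */
    for (char *p = text; p < end; ) {
        while (p < end && !isalpha((unsigned char)*p)) p++;
        if (p == end) break;
        words++;
        while (p < end && isalpha((unsigned char)*p)) p++;
        *p = '\0';              /* p may equal end: that is the spare byte */
    }

    /* allocate at least one slot so an empty text does not look like failure */
    char **list = malloc((words ? words : 1) * sizeof *list);
    if (list == NULL) return NULL;

    /* second pass: record where each word starts */
    char *p = text;
    for (size_t i = 0; i < words; i++) {
        while (p < end && !isalpha((unsigned char)*p)) p++;
        list[i] = p;
        while (p < end && isalpha((unsigned char)*p)) p++;
    }
    *count = words;
    return list;
}
```

For a string literal copied into a char array, the array's own terminating '\0' serves as the spare byte, so n can simply be the string length.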

Reading columns of strings and integers from file

I am able to read chars, words, sentences and integers from separate files but I am struggling to read words and integers from the same file. Let's say my file contains the following:
Patrice 95
Rio 96
Marcus 78
Wayne 69
Alex 67
Chris 100
Nemanja 78
My partial solution (to read in strings) so far was to use fgetc() and check for spaces and or carriage returns in my text file to separate the name from the number.
The main issue with fgetc is that it reads in character by character, and so integers are not meant to be read in like this. As a workaround, I am converting the character to an integer whenever a number is read in.
This is the main code structure:
typedef struct person {
char name[10][10];
char surname[10][10];
int age [10];
} person_t;
FILE *inp; /* pointer to input file */
char c;
int word_count = 0;
int char_count = 0;
int i = 0;
int x;
person_t my_person;
while ((c = fgetc(inp)) != EOF) {
if (c == ' ' || c == '\r') {
printf("\n");
my_person.name[word_count][char_count] = '\0'; //Terminate the string
char_count = 0; //Reset the counter.
word_count++;
}
else {
if (c >= '0' && c <= '9') {
x = c - '0'; //converting to int
my_person.age[i] = x;
printf("%d", my_person.age[i]);
i++;
}
else {
my_person.name[word_count][char_count] = c;
printf("%c",my_person.name[word_count][char_count]);
if (char_count < 19) {
char_count++;
}
else {
char_count = 0;
}
}
}
}
}
for (int i = 0; i<7; i++) {
printf("ages: %d \n",my_person.age[i] ); //never executes
}
Sample Output:
Patrice
95
Rio
96
Marcus
78
Wayne
69
Alex
67
Chris
Full code can be found on pastebin.
Why is the for loop never executing? Any suggestions on what I can improve to read the columns of strings and integers?
Use fgets() to read a whole line.
char line[100];
while (fgets(line, sizeof line, inp)) {
// got a line, need to isolate parts
}
Then, depending on whether the words can have embedded spaces choose one of the strategies below.
a) sscanf() to isolate name and age
while (fgets(line, sizeof line, inp)) {
char name[30];
int age;
if (sscanf(line, "%29s%d", name, &age) != 2) /* error, bad line */;
// ...
}
b) strrchr() to find the last space, then string manipulation to extract name and age.
while (fgets(line, sizeof line, inp)) {
char name[30];
int age;
char *space = strrchr(line, ' ');
if (!space) /* error, bad line */;
if (space - line >= 30) /* error, name too long */;
sprintf(name, "%.*s", (int)(space - line), line); // the precision argument for %.*s must be an int
age = strtol(space, NULL, 10); // needs error checking
// ...
}
strategy b) on https://ideone.com/ZOLie9
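Strategy b) leaves the error handling as comments. A possible way to fill it in is sketched below; parse_person and its return convention (0 on success, -1 on a bad line) are assumptions made for this example, not something from the answer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Splits "name age" at the last space. Returns 0 on success, -1 if the
   line is malformed, the name does not fit, or the age is not a number. */
int parse_person(const char *line, char *name, size_t name_size, int *age)
{
    const char *space = strrchr(line, ' ');
    if (space == NULL || space == line)
        return -1;                          /* no separator or empty name */

    size_t name_len = (size_t)(space - line);
    if (name_len >= name_size)
        return -1;                          /* name too long for the buffer */

    char *endp;
    errno = 0;
    long value = strtol(space + 1, &endp, 10);
    if (endp == space + 1 || errno == ERANGE || value < 0 || value > INT_MAX)
        return -1;                          /* no digits, or out of range */

    /* only a line ending may follow the number */
    while (*endp == '\r' || *endp == '\n')
        endp++;
    if (*endp != '\0')
        return -1;

    memcpy(name, line, name_len);
    name[name_len] = '\0';
    *age = (int)value;
    return 0;
}
```

Inside the fgets() loop this would be called once per line, skipping or reporting the lines for which it returns -1.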

C extract words from a txt file except spaces and punctuations

I'm trying to extract the words from a .txt file which contains the following sentence
Quando avevo cinqve anni, mia made mi perpeteva sempre che la felicita e la chiave della vita. Quando andai a squola mi domandrono come vuolessi essere da grande. Io scrissi: selice. Mi dissero che non avevo capito il corpito, e io dissi loro che non avevano capito la wita.
The problem is that in the array that I use to store the words, it stores also empty words ' ' which come always after one of the following ',' '.' ':'
I know that things like "empty words" or "empty chars" don't make sense but please try the code with the text that I've passed and you'll understand.
Meanwhile I'm trying to understand the use of sscanf with this modifier sscanf(buffer, "%[^.,:]"); that should allow me to store strings ignoring the . and , and : characters however I don't know what should i write in %[^] to ignore the empty character ' ' which always gets saved.
The code is the following
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
static void load_array(const char* file_name){
char buffer[2048];
char a[100][100];
int buf_size = 2048;
FILE *fp;
int j = 0, c = 0;
printf("\nLoading data from file...\n");
fp = fopen(file_name,"r");
if(fp == NULL){
fprintf(stderr,"main: unable to open the file");
exit(EXIT_FAILURE);
}
fgets(buffer,buf_size,fp);
//here i store each word in an array of strings when I encounter
//an unwanted char I save the word into the next element of the
//array
for(int i = 0; i < strlen(buffer); i++) {
if((buffer[i] >= 'a' && buffer[i] <= 'z') || (buffer[i] >= 'A' && buffer[i] <= 'Z')) {
a[j][c++] = buffer[i];
} else {
j++;
c = 0;
continue;
}
}
//this print is used only to see the words in the array of strings
for(int i = 0; i < 100; i++)
printf("%s %d\n", a[i], i);
fclose(fp);
printf("\nData loaded\n");
}
//Here I pass the file_name from command line
int main(int argc, char const *argv[]) {
if(argc < 2) {
printf("Usage: ordered_array_main <file_name>\n");
exit(EXIT_FAILURE);
}
load_array(argv[1]);
}
I know that I should store only the necessary number and words and not 100 everytime, I want to think about that later on, right now I want to fix the issue with the empty words.
Compilation and execution
gcc -o testloadfile testloadfile.c
./testloadfile "correctme.txt"
You could instead try strtok:
fgets(buffer,buf_size,fp);
for (char* tok = strtok(buffer,".,: \n"); tok != NULL; tok = strtok(NULL,".,: \n"))
{
printf("%s\n", tok);
}
strtok returns NULL when there are no more tokens, so the loop tests the pointer itself rather than *tok. Note that if you want to store what strtok returns, you need to copy what tok points to (into your own array, or with strdup/malloc+strcpy), since strtok modifies the buffer passed as its first argument in place and the returned pointers point into that buffer.
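A minimal sketch of the copying step, assuming the words should end up in a fixed 2D array like the a[100][100] in the question. store_tokens and MAX_WORD_LEN are names invented for this example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD_LEN 100   /* row width, assumed to match the question's array */

/* Tokenizes buffer in place and copies each token into words[].
   Returns the number of words stored (at most max_words). */
int store_tokens(char *buffer, char words[][MAX_WORD_LEN], int max_words)
{
    const char *delims = ".,: \t\n";
    int count = 0;

    for (char *tok = strtok(buffer, delims);
         tok != NULL && count < max_words;
         tok = strtok(NULL, delims)) {
        /* snprintf truncates over-long tokens and always terminates */
        snprintf(words[count], MAX_WORD_LEN, "%s", tok);
        count++;
    }
    return count;
}
```

Because consecutive delimiters are treated as one separator, ", " no longer produces an empty entry.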
You forgot to add the final '\0' to each row of a, and your algorithm has several flaws. For example, you increment j every time a non-letter appears, so for ", " you increment twice instead of once.
One "easy" way is to use strtok, as Anders K. shows.
fgets(buffer,buf_size,fp);
for (char* tok = strtok(buffer,".,:"); tok != NULL; tok = strtok(NULL,".,:")) {
printf("%s\n", tok);
}
The "problem" with that function is that you have to specify every delimiter, so you also have to add ' ' (space), '\t' (tab), and so on.
Since you only want "words" in the sense of "containing only letters, lowercase or uppercase", you can do the following:
#include <ctype.h>
#include <stdio.h>
int main(void)
{
char line[] = "Hello ! What a beautiful day, isn't it ?";
char *beginWord = NULL;
for (size_t i = 0; line[i]; ++i) {
if (isalpha(line[i])) { // upper or lower letter ==> valid character for a word
if (!beginWord) {
// We found the beginning of a word
beginWord = line + i;
}
} else {
if (beginWord) {
// We found the end of a word
char tmp = line[i];
line[i] = '\0';
printf("'%s'\n", beginWord);
line[i] = tmp;
beginWord = NULL;
}
}
}
return (0);
}
Note how "isn't" is split into "isn" and "t", since ' is not an accepted character for a word.
The algorithm is pretty simple: loop over the string; if the current character is a letter and beginWord == NULL, it is the beginning of a word. If it is not a letter and beginWord != NULL, it is the end of a word. Any number of non-letter characters can sit between two words and the words are still detected cleanly. A word that runs up to the very end of the string is not printed by this loop, so check beginWord once more after the loop if your text can end with a letter.
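Applied to the loop from the question, the same idea might look like the sketch below: the row index only advances when a word has actually ended, and every row gets its terminator. collect_words, MAX_WORDS and MAX_WORD_LEN are names chosen for this example.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_WORDS    100
#define MAX_WORD_LEN 100

/* Copies every run of letters in buffer into its own row of words.
   Returns the number of words stored. */
int collect_words(const char *buffer, char words[MAX_WORDS][MAX_WORD_LEN])
{
    int j = 0;   /* index of the word being filled */
    int c = 0;   /* next free position inside that word */

    for (size_t i = 0; buffer[i] != '\0' && j < MAX_WORDS; i++) {
        if (isalpha((unsigned char)buffer[i])) {
            if (c < MAX_WORD_LEN - 1)       /* keep room for the '\0' */
                words[j][c++] = buffer[i];
        } else if (c > 0) {
            /* first non-letter after a word: terminate it and move on */
            words[j][c] = '\0';
            j++;
            c = 0;
        }
        /* further non-letters (c == 0) are simply skipped */
    }
    if (c > 0) {                            /* text ended in the middle of a word */
        words[j][c] = '\0';
        j++;
    }
    return j;
}
```

The returned count also tells the caller how many rows to print, instead of always printing all 100.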

How to use scanf() to capture only Strings

Hi, I am new to C and I am trying to use the character array below to capture input from users. How do I prevent or skip numerical characters? I only want strings to be captured.
char str_input[105];
I have tried
scanf("%[^\n]s",str_input);
scanf("%[^\n]",str_input);
scanf("%[^0-9]",str_input);
scanf("%[A-Zaz-z]",str_input);
str_input = fgetc(stdin);
None of the above worked for me.
Input
2
hacker
Expected Output
Hce akr
int main() {
char *str_input;
size_t bufsize = 108;
size_t characters;
str_input = (char *)malloc(bufsize * sizeof(char));
if (str_input == NULL)
{
perror("Unable to allocate buffer");
exit(1);
}
characters = getline(&str_input,&bufsize,stdin);
printf("%zu characters were read.\n",characters);
int i;
int len = 0;
for (i = 0, len = strlen(str_input); i<=len; i++) {
i%2==0? printf("%c",str_input[i]): 'b';
}
printf(" ");
for (i = 0, len = strlen(str_input); i<=len; i++) {
i%2!=0? printf("%c",str_input[i]): 'b';
}
return 0;
}
Error
solution.c: In function ‘main’:
solution.c:21:5: warning: implicit declaration of function ‘getline’ [-Wimplicit-function-declaration]
characters = getline(&str_input,&bufsize,stdin);
Since your buffer has limited size, using fgets(3) is fine. fgets() returns NULL on failure to read a line, and it keeps the newline character at the end of the buffer when the whole line fits.
In terms of preventing numerical characters from being in your buffer, you can simply create another buffer and only add non-numerical characters to it. You could just delete the numerical characters from your original buffer, but this can be a tedious procedure if you are still grasping the basics of C. Another method would be to read single characters with getchar(3), which allows you to assess each character and simply ignore numbers. This method is by far the easiest to implement.
Since you asked for an example of using fgets(), here is some example code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define INPUTSIZE 108
int main(void) {
char str_input[INPUTSIZE], characters[INPUTSIZE];
size_t slen, char_count = 0;
printf("Enter input:\n");
if (fgets(str_input, INPUTSIZE, stdin) != NULL) {
/* removing newline from fgets() */
slen = strlen(str_input);
if (slen > 0 && str_input[slen-1] == '\n') {
str_input[slen-1] = '\0';
} else {
fprintf(stderr, "Number of characters entered exceeds buffer size\n");
exit(EXIT_FAILURE);
}
/* checking if string is valid */
if (*str_input == '\0') {
fprintf(stderr, "No input found\n");
exit(EXIT_FAILURE);
}
printf("Buffer: %s\n", str_input);
/* only adding non-numbers */
for (size_t i = 0; str_input[i] != '\0'; i++) {
if (!isdigit(str_input[i])) {
characters[char_count++] = str_input[i];
}
}
/* terminating buffer */
characters[char_count] = '\0';
printf("New buffer without numbers: %s\n", characters);
}
return 0;
}
Example input:
Enter input:
2ttt4y24t4t3t2g
Output:
Buffer: 2ttt4y24t4t3t2g
New buffer without numbers: tttytttg
Update:
You could just use this even simpler approach of ignoring numeric characters:
char str_input[INPUTSIZE];
int ch;
size_t char_count = 0;
while ((ch = getchar()) != EOF && ch != '\n') {
if (!isdigit(ch)) {
if (char_count < sizeof(str_input) - 1) { /* leave room for the terminating '\0' */
str_input[char_count++] = ch;
}
}
}
str_input[char_count] = '\0';
If you're using Linux, I would use the getline() function to get a whole line of text, then verify it. If it is not valid input, I would ask the user in a loop to enter a line of text again and again until the input is acceptable.
If not using Linux, well, your best bet is probably to reimplement getline(). You can also use fgets() if you find a limited-size buffer acceptable. I don't find limited-size buffers acceptable, so that's why I prefer getline().
getline() is used according to the way explained in its man page: http://man7.org/linux/man-pages/man3/getdelim.3.html
Basically, your loop should be something similar to:
char *buf = NULL;
size_t bufsiz = 0;
while (1)
{
if (getline(&buf, &bufsiz, stdin) < 0)
{
handle_error();
}
if (is_valid(buf))
{
break;
}
printf("Error, please re-enter input\n");
}
use_buffer(buf);
free(buf);
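The loop above leaves is_valid() undefined. One possible definition, assuming "valid" means the line contains no digits and at least one non-whitespace character (that rule is a guess at what the question wants, not something stated in this answer):

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical validity rule: no digits anywhere, and not a blank line. */
bool is_valid(const char *buf)
{
    bool seen_text = false;

    for (const char *p = buf; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p))
            return false;               /* digits are rejected outright */
        if (!isspace((unsigned char)*p))
            seen_text = true;           /* something other than whitespace */
    }
    return seen_text;
}
```

With the sample input from the question, the line "2" would be rejected and "hacker" accepted.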
Well, that's not possible: numbers are strings too. But you can loop over the input looking for digits and print an error, like this:
char *str = "ab234cid20kd", *p = str;
while (*p) { // While there are more characters to process...
if (isdigit(*p)) { // Upon finding a digit, ...
printf("Numbers are forbidden");
return 0;
} else {
p++;
}
}

Counting lines, numbers, and characters in C

I'm new to C and I got an assignment today that requires that I read text in from a file, count the number of lines, characters, and words, and return it in a specific format.
Just to be clear - I need to read in this text file:
"I must not fear.
Fear is the mind-killer.
Fear is the little-death that brings total obliteration.
I will face my fear.
I will permit it to pass over me and through me.
And when it has gone past I will turn the inner eye to see its path.
Where the fear has gone there will be nothing... only I will remain"
Litany Against Fear, Dune by Frank Herbert
and have it output like so:
1)"I must not fear.[4,17]
2)Fear is the mind-killer.[4,24]
3)Fear is the little-death that brings total obliteration.[8,56]
4)I will face my fear.[5,20]
5)I will permit it to pass over me and through me.[11,48]
6)And when it has gone past I will turn the inner eye to see its path.[16,68]
7)Where the fear has gone there will be nothing... only I will remain"[13,68]
8) Litany Against Fear, Dune by Frank Herbert[7,48]
Now, I've written something that will accept the file, it counts the number of lines properly, but I have 2 major issues - 1. How do I get the text from the file to appear in the output? I can't get that at all. My word count doesn't work at all, and my character count is off too. Can you please help?
#include <stdio.h>
#define IN 1
#define OUT 0
void main()
{
int numChars = 0;
int numWords = 0;
int numLines = 0;
int state = 0;
int test = 0;
FILE *doesthiswork;
doesthiswork = fopen("testWords.in", "r");
state = OUT;
while ((test = fgetc(doesthiswork)) != EOF)
{
++numChars;
if ( test == '\n')
{
++numLines;
if (test == ' ' || test == '\t' || test == '\n')
{
state = OUT;
}
else if (state == OUT)
{
state = IN;
++numWords;
}
}
printf("%d) I NEED TEXT HERE. [%d %d]\n",numLines, numWords, numChars);
}
}
It would be better to use the getline() function to read each line from the file.
After reading a line, process it with the strtok() function. This gives you the number of words in the line, which you can keep in a variable.
Take the length of the line (before strtok() modifies it) to get the number of characters.
Output the line number, the number of words and the number of characters.
Then read the next line, and so on.
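The per-line part of those steps could be sketched as below. line_stats is a name made up for this example, and it works on a line that has already been read (by getline(), which is POSIX, or by fgets()). Since strtok() overwrites the separators, the text should be printed or copied before calling it.

```c
#include <string.h>

/* Counts the characters (without the line ending) and the
   whitespace-separated words in line. The line is modified. */
void line_stats(char *line, int *words, int *chars)
{
    line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
    *chars = (int)strlen(line);           /* measure before strtok inserts NULs */

    *words = 0;
    for (char *tok = strtok(line, " \t"); tok != NULL; tok = strtok(NULL, " \t"))
        (*words)++;
}
```

For the second line of the sample text, "Fear is the mind-killer.", this gives 4 words and 24 characters, matching the expected [4,24].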
How do I get the text from the file to appear in the output?
Store each line in a buffer as you read it, then print the buffer.
My word count doesn't work at all, and my character count is off too.
The order of the tests is wrong.
Fix it like this:
#include <stdio.h>
#define IN 1
#define OUT 0
int main(){
int numChars = 0;
int numWords = 0;
int numLines = 0;
int state = OUT;
int test;
char buffer[1024];
int buff_pos = 0;
FILE *doesthiswork;
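doesthiswork = fopen("data.txt", "r");
if(doesthiswork == NULL){//fgetc on a NULL stream is undefined behaviour
perror("data.txt");
return 1;
}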
state = OUT;
while((test = fgetc(doesthiswork)) != EOF) {
++numChars;
buffer[buff_pos++] = test;
if(test == ' ' || test == '\t' || test == '\n'){
state = OUT;
if(test == '\n') {
++numLines;
--numChars;//no count newline
buffer[--buff_pos] = '\0';//rewrite newline
printf("%d)%s[%d,%d]\n", numLines, buffer, numWords, numChars);
buff_pos = 0;
numWords = numChars = 0;
}
} else {
if(state == OUT){
state = IN;
++numWords;
}
}
}
fclose(doesthiswork);
if(buff_pos != 0){//Input remains in the buffer.
++numLines;
buffer[buff_pos] = '\0';
printf("%d)%s[%d,%d]\n", numLines, buffer, numWords, numChars);
}
return 0;
}

Resources