C text parser does not identify word - c

I'm trying to do a simple parsing of text with a C program. A function I have written is supposed to check the buffer that has a line of text saved into it and see if this line contains a particular word at BOL.
The input arguments are:
size: the sizeof(word), calculated before the function is called.
buf: the buffer containing a line from the text being parsed.
word: the word that the function looks for at BOL.
The code is as follows:
#include <stdio.h>
#include <string.h>
int strchk(int size, const char buf[1024], char *word) {
char a[size];
int i;
for (i = 0; i < size - 1; i++) {
a[i] = buf[i];
}
if (strcmp(a, word) == 0)
return 1;
else
return 0;
}
The problem is that for some reason, a word is not being recognized. Previous words have been correctly identified by the same function. Below are two contexts wherein the function is being called, the first one results in a correct identification, the second does not, while the text contains both words at the start of different lines within the text.
char c[] = "|conventional_long_name";
if (strchk(sizeof(c), buf, c)) {
fputs(" conventional_long_name: \"", stdout);
getdata(buf, c, sizeof(c));
}
char d[] = "|official_languages";
if (strchk(sizeof(d), buf, d)) {
fputs(" religion: \"", stdout);
getdata(buf, d, sizeof(d));
}
When I check string a in the strchk() function for size first, it gives me a size of 20, but if I make it print out the string it tells me it is in fact |official_languagesfici. When you count the number of characters it's just as long as the previously mentioned |conventional_long_name, which would suggest some parameter from that function call is at play in the next function call, I just can't figure out where I have made the mistake. Any help would be greatly appreciated.

You need to set the null terminator of the a array in the function strchk.
Do it using
a[size - 1] = '\0';
after the for-loop.
Notes:
Since you don't modify the string word in the function strchk, declare the parameter const. Const-correctness is important!

Related

How do I read the contents of a file in C and store it into an array using fgets()?

So I need to create a word search program that will read a data file containing letters and then the words that need to be found at the end
for example:
f a q e g g e e e f
o e q e r t e w j o
t e e w q e r t y u
government
free
The list of letters and words is longer, but in any case I need to save the letters into an array, and I'm having a difficult time because it never stores the correct data. Here's what I have so far:
#include <stdio.h>
int main()
{
int value;
char letters[500];
while(!feof(stdin))
{
value = fgets(stdin);
for(int i =0; i < value; i++)
{
scanf("%1s", &letters[i]);
}
for(int i=0; i<1; i++)
{
printf("%1c", letters[i]);
}
}
}
I also don't know how I am gonna store the words into a separate array after I get the chars into an array.
You said you want to read from a data file. If so, you should open the file.
FILE *fin=fopen("filename.txt", "r");
if(fin==NULL)
{
perror("filename.txt not opened.");
}
In your input file, the first few lines have single letters, each separated by a space.
If you want to store each of these letters into the letters character array, you could load each line with the following loop.
char c;
int i=0;
while(fscanf(fin, "%c", &c)==1 && c!='\n')
{
if(c!=' ')
{
letters[i++]=c;
}
}
This will only store the letters and is not a string as there is no \0 character.
Reading the words which are at the bottom may be done with fgets().
Your usage of the fgets() function is wrong.
Its prototype is
char *fgets(char *str, int n, FILE *stream);
See here.
Note that fgets() will store the trailing newline(\n) into string as well. You might want to remove it like
str[strlen(str)-1]='\0';
Use fgets() to read the words at the bottom into a character array and replace the \n with a \0.
and do
fgets(letters, sizeof(letters), fin);
Use stdin instead of fin here if you want to accept input from the keyboard and store it into letters.
Note that fgets() will store the trailing newline(\n) into letters as well. You might want to remove it like
letters[strlen(letters)-1]='\0';
Just saying, letters[i] will be a character and not a string.
scanf("%1s", &letters[i]);
should be
scanf("%c", &letters[i]);
One way to store the lines with characters or words is to store them in an array of pointers to arrays - lines,
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLET 500
#define MAXLINES 1000
int main()
{
char *lptr;
// Array with letters from a given line
char letters[MAXLET];
// Array with pointers to lines with letters
char *lineptr[MAXLINES];
// Length of current array
unsigned len = 0;
// Total number of lines
int nlines = 0;
// Read lines from stdin and store them in
// an array of pointers
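while(nlines < MAXLINES && fgets(letters,MAXLET,stdin)!=NULL)
{
// Strip the newline only if the line actually has one
letters[strcspn(letters, "\n")] = '\0';
len = strlen(letters);
// len characters plus the terminating '\0'
lptr = malloc(len + 1);
if (lptr == NULL)
break;
strcpy(lptr, letters);
lineptr[nlines++]=lptr;
}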
// Print lines
for (int i = 0; i < nlines; i++)
printf("%s\n", lineptr[i]);
// Free allocated memory
for (int i = 0; i < nlines; i++)
free(lineptr[i]);
}
In the code above, a pointer to every line from stdin is stored in lineptr. Once stored, you can access and manipulate each of the lines. In this simple case I only print them one by one, but examples of simple manipulation are shown later on. At the end, the program frees the previously allocated memory. It is good practice to free allocated memory once it is no longer in use.
The process of storing a line consists of getting each line from stdin, getting its length with strlen, stripping its newline character by replacing it with \0 (optional), allocating memory for it with malloc, and finally storing the pointer to that memory location in lineptr. During this process the program also counts the number of input lines.
You can implement this sequence for both of your inputs, chars and words. It will result in clean, ready-to-use input. You can also consider moving the line collection into a function; that may require making the lineptr-type arrays global. Let me know if you have any questions.
One thing to remember is that MAXLET and especially MAXLINES may have to be increased for a given dataset (MAXLINES 1000 literally assumes you won't have more than 1000 lines).
Also, on Unix and Mac this program can already read from a file through redirection, as in $ prog_name < in_file, and it can be readily modified to open files directly.
Here are some usage examples - lineptr stores pointers to each line (array) hence the program first retrieves the pointer to a line and then it proceeds as with any array:
// Print 3rd character of each line
// then substitute 2nd with 'a'
char *p;
for (int i = 0; i < nlines; i++){
p = lineptr[i];
printf("%c\n", p[2]);
p[1] = 'a';
}
// Print lines
for (int i = 0; i < nlines; i++)
printf("%s\n", lineptr[i]);
// Swap first and second element
// of each line
char tmp;
for (int i = 0; i < nlines; i++){
p = lineptr[i];
tmp = p[0];
p[0] = p[1];
p[1] = tmp;
}
// Print lines
for (int i = 0; i < nlines; i++)
printf("%s\n", lineptr[i]);
Note that these examples are just a demonstration and assume that each line has at least 3 characters. Also, in your original input the characters are separated by a space - that is not necessary, in fact it's easier without it.
The code in your post does not appear to match your stated goals, and indicates you have not yet grasped the proper application of the functions you are using.
You have expressed an idea describing what you want to do, but the steps you have taken (at least those shown) will not get you there. Not even close.
It is always good to have a map in hand to plan your steps. An algorithm is a kind of software map. Before you can plan your steps though, you need to know where you are going.
Your stated goals:
1) Open and read a file into lines.
2) Store the lines, somehow. (using fgets(,,)?)
3) Use some lines as content to search through.
4) Use other lines as objects to search for
Some questions to answer:
a) How is the search content distinguished from the strings to search
for?
b) How is the search content to be stored?
c) How are the search words to be stored?
d) How will the comparison between content and search word be done?
e) How many lines in the file? (example)
f) Length of longest line? (discussion and example) (e & f used to create storage)
g) How is fgets() used. (maybe a google search: How to use fgets)
h) Are there things to be aware of when using feof()? (discussion and example feof)
i) Why is my input not right after the second call to scanf? (answer)
Finish identifying and crystallizing the list of items in your goals, then answer these (and maybe other) questions. At that point you will be ready to start identifying the steps to get there.
value = fgets(stdin); is a terrible expression! You don't respect at all the syntax of the fgets function. My man page says
char *
fgets(char * restrict str, int size, FILE * restrict stream);
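So here, as you pass one argument instead of three, a conforming compiler rejects the call outright once <stdio.h> is included. If an old or lenient compiler lets it through, the behavior is undefined; in practice fgets most likely fails and returns NULL, which is converted to the int value 0. And then the next loop is just a no-op.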
The correct way to read a line with fgets is:
if (NULL == fgets(letters, sizeof(letters), stdin)) {
// indication of end of file or error
...
}
// Ok letters contains the line...

Convert buffer to array of 8-bit values and back in C

I would like to write a program in C that gets the file content via stdin and reads it line by line and, for each line, converts it to an array of 8-bit integer values.
I also would like to be able to do the reverse process. After working with my array of 8-bit values, I would like to convert it again to "lines" that would be organized as a new buffer.
So basically, I would like to convert a char * line to an int array[] and back (an int array[] to a char * line) while keeping the consistency, so when I create the file again out of the conversions, the file is valid (and by valid I mean, the conversion from int array[] to char * line generates the same content as the original char * line, while reading each line of stdin).
My code is currently as follows:
#include <stdio.h>
#include <stdlib.h>
int main() {
FILE *stream;
char *line = NULL;
size_t len = 0;
ssize_t read;
stream = stdin;
if (stream == NULL)
exit(EXIT_FAILURE);
while ((read = getline(&line, &len, stream)) != -1) {
char * array = line_to_array(line);
// here I include the rest of my code
// where I am going to use the generated array
// ...
}
free(line);
fclose(stream);
exit(EXIT_SUCCESS);
}
The line_to_array function would be the one to convert the "line" content to the array of integers. In a second file, I would just do the opposite.
The mechanics of the process would be like this:
The first program (first.c) would receive a file's content via stdin. By reading it using getline, I would have each line to convert to an array of integers and send each line to a second program (second.c) that would convert each array to a char * buffer again and then reconstruct the file.
In the terminal, I would run it like this:
./first | ./second
I appreciate any help on this matter.
Thank you.
I believe you may already know that the name of an array behaves like a constant pointer. You can verify this with the following code:
char hello[] = "hello world!";
for( int idx=0; *(hello + idx) != 0; idx++ )
{
printf("%c", *(hello + idx));
}
printf("\n");
So there is no reason to convert a character pointer to an array. For your information, a char variable is 8-bit data in C; it holds an integer value that represents a character: 65 represents 'A' in ASCII.
Secondly, this link may help you to understand how to convert between a C string and std::string.
On second thought, maybe your input file is a Unicode or UTF-8 encoded file that uses multi-byte character codes. In that case, you may not be able to use getline() to read strings from the file. If so, please refer to this question: Reading unicode characters.
I hope the following code helps you understand the char type, arrays, and pointers in C/C++:
std::string hello("Hello world");
const char *ptr = hello.c_str();
for( int idx=0; idx < hello.size(); idx++ )
{
printf("%3d ", *(ptr + idx));
}
printf("\n");
std::string hello("Hello world");
const char *ptr = hello.c_str();
for( int idx=0; idx < hello.size(); idx++ )
{
printf("%3d ", ptr[idx]);
}
printf("\n");

How to pass output from one function to another in C?

My program has two separate functions: it excludes non-prime positions and then reverses the output from this. The issue I'm having is that at present it is not working how I want it to, in the sense that the first function excludes the non-prime places and then the second function uses the original input and reverses that instead of the output from function one. I'm new to C so please go easy on me.
#include <stdio.h>
#include <string.h>
void reverse(char[], long, long);
int main()
{
char str[50];
int i, j, k, cnt = 0;
long size;
printf("Enter: ");
scanf("%s", str);
size = strlen(str);
reverse(str, 0, size - 1);
printf(str);
return 0;
}
void reverse(char str[], long index, long size)
{
char temp;
temp = str[index];
str[index] = str[size - index];
str[size - index] = temp;
if (index == size / 2)
{
return;
}
reverse(str, index + 1, size);
}
Sorry for being so vague, a sample output from an input of 1234567 would be 2357 then this output reversed into 7532.
Here is what your code does (as it is written right now):
Input a string from console
Print out every character that is in a "prime position" (2, 3, 5, 7, etc) without modifying the string
Reverse the original, unmodified string
print out the reversed string
It sounds to me that what you are looking for is the following:
When you "exclude characters from a string" you create a new string that you populate with the characters from the input string that are in prime positions. Then you use that new string as the argument to the reverse() function.
I have no doubt that if you understand the above paragraph you'd have no problem fixing your code.
Here are the steps to achieve what you need:
input string into str (as you currently do)
introduce a new string str2 of the same size 50 as the original string
in your loop copy every character from str into str2 that is in a "prime position"
call reverse() providing it the str2 as a parameter, not the (current) str

How to correctly input a string in C

I am currently learning C, and so I wanted to make a program that asks the user to input a string and to output the number of characters that were entered, the code compiles fine, when I enter just 1 character it does fine, but when I enter 2 or more characters, no matter what number of character I enter, it will always say there is just one character and crashes after that. This is my code and I can't figure out what is wrong.
int main(void)
{
int siz;
char i[] = "";
printf("Enter a string.\n");
scanf("%s", i);
siz = sizeof(i)/sizeof(char);
printf("%d", siz);
getch();
return 0;
}
I am currently learning to program, so if there is a way to do it using the same scanf() function I will appreciate that since I haven't learned how to use any other function and probably won't understand how it works.
Please, FORGET that scanf exists. The problem you are running into, whilst caused mostly by your understandable inexperience, will continue to BITE you even when you have experience - until you stop.
Here is why:
scanf will read the input, and put the result in the char buffer you provided. However, it will make no check to make sure there is enough space. If it needs more space than you provided, it will overwrite other memory locations - often with disastrous consequences.
A safer method uses fgets - this is a function that does broadly the same thing as scanf, but it will only read in as many characters as you created space for (or: as you say you created space for).
Other observation: sizeof can only evaluate the size known at compile time : the number of bytes taken by a primitive type (int, double, etc) or size of a fixed array (like int i[100];). It cannot be used to determine the size during the program (if the "size" is a thing that changes).
Your program would look like this:
#include <stdio.h>
#include <string.h>
#define BUFLEN 100 // your buffer length
int main(void) // <<< for correctness, include 'void'
{
int siz;
char i[BUFLEN]; // <<< now you have space for a 99 character string plus the '\0'
printf("Enter a string.\n");
fgets(i, BUFLEN, stdin); // read the input, storing at most BUFLEN-1 characters plus the '\0' in i
siz = sizeof(i)/sizeof(char); // it turns out that this will give you the answer BUFLEN
// probably not what you wanted. 'sizeof' gives size of array in
// this case, not size of string
// what you want instead is:
siz = strlen(i) - 1; // strlen is a function that is declared in string.h
// it produces the string length
// subtract 1 if you don't want to count \n
printf("The string length is %d\n", siz); // don't just print the number, say what it is
// and end with a newline: \n
printf("hit <return> to exit program\n"); // tell user what to do next!
getc(stdin);
return 0;
}
I hope this helps.
Update: you asked the reasonable follow-up question: "how do I know the string was too long".
See this code snippet for inspiration:
#include <stdio.h>
#include <string.h>
#define N 50
int main(void) {
char a[N];
char *b;
printf("enter a string:\n");
b = fgets(a, N, stdin);
if(b == NULL) {
printf("an error occurred reading input!\n"); // can't think how this would happen...
return 0;
}
if (strlen(a) == N-1 && a[N-2] != '\n') { // used all space, didn't get to end of line
printf("string is too long!\n");
}
else {
printf("The string is %s which is %zu characters long\n", a, strlen(a)-1); // all went according to plan; %zu because strlen returns size_t
}
}
Remember that when you have space for N characters, the last character (at location N-1) must be a '\0' and since fgets includes the '\n' the largest string you can input is really N-2 characters long.
This line:
char i[] = "";
is equivalent to:
char i[1] = {'\0'};
The array i has only one element, the program crashes because of buffer overflow.
I suggest you using fgets() to replace scanf() like this:
#include <stdio.h>
#include <string.h>
#define MAX_LEN 1024
int main(void)
{
char line[MAX_LEN];
if (fgets(line, sizeof(line), stdin) != NULL)
printf("%zu\n", strlen(line) - 1);
return 0;
}
The length is decremented by 1 because fgets() would store the new line character at the end.
The problem is here:
char i[] = "";
You are essentially creating a char array with a size of 1 due to setting it equal to "";
Instead, use a buffer with a larger size:
char i[128]; /* You can also malloc space if you desire. */
scanf("%s", i);
See the link below to a similar question if you want to include spaces in your input string. There is also some good input there regarding scanf alternatives.
How do you allow spaces to be entered using scanf?
That's because char i[] = ""; is actually a one-element array.
Strings in C are stored as text that ends with \0 (a char of value 0). You should use a bigger buffer as others said, for example:
char i[100];
scanf("%s", i);
Then, when calculating length of this string you need to search for the \0 char.
int length = 0;
while (i[length] != '\0')
{
length++;
}
After running this code length contains length of the specified input.
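You need to allocate space where scanf will put the input data. Note that a longer initializer does not really help here:
char i[] = " ";
This reserves only two bytes (the space and the terminating \0), so any longer input still overflows it. Declare an explicitly sized array such as char i[100]; or allocate the buffer with malloc. Check out the man pages.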

Reading formatted strings from file into Array in C

I am new to the C programming language and trying to improve by solving problems from the Project Euler website using only C and its standard libraries. I have covered basic C fundamentals (I think), functions, pointers, and some basic file IO but now am running into some issues.
The question is about reading a text file of first names and calculating a "name score" blah blah, I know the algorithm I am going to use and have most of the program setup but just cannot figure out how to read the file correctly.
The file is in the format
"Nameone","Nametwo","billy","bobby","frank"...
I have searched and searched and tried countless things but cannot seem to read these as individual names into an array of strings (I think that's the right way to store them individually?). I have tried using sscanf/fscanf with %[^\",]. I have tried different combos of those functions and fgets, but my understanding of fgets is that every time I call it, it will get a new line, and this is a text file with over 45,000 characters all on the same line.
I am unsure if I am running into problems with my misunderstanding of the scanf functions, or my misunderstanding with storing an array of strings. As far as the array of strings goes, I (think) I have realized that when I declare an array of strings it does not allocate memory for the strings themselves, something that I need to do. But I still cannot get anything to work.
Here is the code I have now to try to just read in some names I enter from the command line to test my methods.
This code works to input any string up to buffer size(100):
int main(void)
{
int i;
char input[100];
char* names[10];
printf("\nEnter up to 10 names\nEnter an empty string to terminate input: \n");
for(int i = 0; i < 10; i++)
{
int length = 0;
printf("%d: ", i);
fgets(input, 100, stdin);
length = (int)strlen(input);
input[length-1] = 0; // Delete newline character
length--;
if(length < 1)
{
break;
}
names[i] = malloc(length+1);
assert(names[i] != NULL);
strcpy(names[i], input);
}
}
However, I simply cannot make this work for reading in the formatted strings.
PLEASE advise me as to how to read it in with format. I have previously used sscanf on the input buffer and that has worked fine, but I dont feel like I can do that on a 45000+ char line? Am I correct in assuming this? Is this even an acceptable way to read strings into an array?
I apologize if this is long and/or not clear, it is very late and I am very frustrated.
Thank anyone and everyone for helping, and I am looking forward to finally becoming an active member on this site!
There are really two basic issues here:
Whether scanning string input is the proper strategy here. I would argue not because while it might work on this task you are going to run into more complicated scenarios where it too easily breaks.
How to handle a 45k string.
In reality you won't run into too many strings of this size, but it is nothing that a modern computer of any capacity can't easily handle. Insofar as this is for learning purposes, learn iteratively.
The easiest first approach is to fread() the entire line/file into an appropriately sized buffer and parse it yourself. You can use strtok() to break up the comma-delimited tokens and then pass the tokens to a function that strips the quotes and returns the word. Add the word to your array.
For a second pass you can do away with strtok() and just parse the string yourself by iterating over the buffer and breaking up the comma tokens yourself.
Last but not least you can write a version that reads smaller chunks of the file into a smaller buffer and parses them. This has the added complexity of handling multiple reads and managing the buffers to account for half-read tokens at the end of a buffer and so on.
In any case, break the problem into chunks and learn with each refinement.
EDIT
#include <string.h>
#define MAX_STRINGS 5000
#define MAX_NAME_LENGTH 30
char* stripQuotes(char *str, char *newstr)
{
char *temp = newstr;
while (*str)
{
if (*str != '"')
{
*temp = *str;
temp++;
}
str++;
}
return(newstr);
}
int main(int argc, char *argv[])
{
char fakeline[] = "\"Nameone\",\"Nametwo\",\"billy\",\"bobby\",\"frank\"";
char *token;
char namebuffer[MAX_NAME_LENGTH] = {'\0'};
char *name;
int index = 0;
char nameArray[MAX_STRINGS][MAX_NAME_LENGTH];
token = strtok(fakeline, ",");
if (token)
{
name = stripQuotes(token, namebuffer);
strcpy(nameArray[index++], name);
}
while (token != NULL)
{
token = strtok(NULL, ",");
if (token)
{
memset(namebuffer, '\0', sizeof(namebuffer));
name = stripQuotes(token, namebuffer);
strcpy(nameArray[index++], name);
}
}
return(0);
}
scanf("%s", input) reads one token (a string delimited by whitespace) at a time. You can either scan the input until you encounter a specific "end-of-input" string, such as "!", or you can wait for the end-of-file signal, which is achieved by pressing "Ctrl+D" on a Unix console or by pressing "Ctrl+Z" on a Windows console.
The first option:
scanf("%s", input);
if (input[0] == '!') {
break;
}
// Put input on the array...
The second option:
result = scanf("%s", input);
if (result == EOF) {
break;
}
// Put input on the array...
Either way, as you read one token at a time, there is no limit on the total size of the input; each token only has to fit in input.
Why not search the giant string for quote characters instead? Something like this:
#include <stdio.h>
#include <string.h>
int main(void)
{
char mydata[] = "\"John\",\"Smith\",\"Foo\",\"Bar\"";
char namebuffer[20];
unsigned int i, j;
int begin = 1;
unsigned int beginName, endName;
for (i = 0; i < sizeof(mydata); i++)
{
if (mydata[i] == '"')
{
if (begin)
{
beginName = i;
}
else
{
endName = i;
for (j = beginName + 1; j < endName; j++)
{
namebuffer[j-beginName-1] = mydata[j];
}
namebuffer[endName-beginName-1] = '\0';
printf("%s\n", namebuffer);
}
begin = !begin;
}
}
}
You find the first double quote, then the second, and then read out the characters in between to your name string. Then you process those characters as needed for the problem in question.

Resources