Splitting "String" into characters in C

I am taking a beginner's course in C, and trying to wrap my head around "strings". I have previously programmed in Java, so it is a bit confusing.
I am wondering how to split a "string" into characters, so as to be able to remove certain characters. I have written code for a linked list, with each node holding a data value, as well as a next pointer (calling it node_line, as it holds lines).
typedef struct node {
char *data;
struct node *next;
} node_line;
This works without problems, and I can traverse the entire list and print out each element:
void print_list(node_line head) {
node_line * current = head;
while(current != NULL) {
printf("%s\n", current->data);
current = current->next;
}
}
However, I am having problems with converting the "string" in current->data into characters. That is, reading one character at a time.
For instance, I want to write a program that removes all the vowels in a "string". I have managed to solve this when reading a file, using the getc() function. However, I can't seem to do so with the text in current->data.
int c;
while((c = getc(f)) != EOF) {
//REMOVE
if(c=='a' || c=='e' || c=='i' || c=='o' || c=='u' || c=='y') {
printf(""); //Remove the vowel
}
else {
putchar(c); //Write out one character at the time.
}
}
I imagine it being something like:
while ((c = getc(current->data) != NULL) { ... }
Any help, tips, etc. are highly appreciated!

getc is for reading from files. To access chars in a char * buffer (string) you would typically do something like this:
for (const char * p = current->data; *p != '\0'; ++p)
{
char c = *p;
if (c=='a' || c=='e' || c=='i' || c=='o' || c=='u' || c=='y') {
...
}
}
Or if you prefer explicit array indexing rather than pointers:
const char * s = current->data;
for (int i = 0; i < strlen(s); ++i)
{
char c = s[i];
if (c=='a' || c=='e' || c=='i' || c=='o' || c=='u' || c=='y') {
...
}
}
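To tie this back to the linked list in the question, here is a minimal sketch that walks each node's string with the pointer loop shown above. The helper names is_question_vowel, copy_without_vowels and print_list_without_vowels are made up for illustration; the vowel set (lowercase only, with 'y') is simply the one the question uses.

```c
#include <stdio.h>
#include <string.h>

/* Same node type as in the question. */
typedef struct node {
    char *data;
    struct node *next;
} node_line;

/* Hypothetical helper: non-zero if c is one of the letters the question
 * treats as a vowel. strchr() would also match '\0', so exclude it. */
int is_question_vowel(char c)
{
    return c != '\0' && strchr("aeiouy", c) != NULL;
}

/* Hypothetical helper: copy src into dst, skipping vowels.
 * dst must have room for strlen(src) + 1 bytes. Returns dst. */
char *copy_without_vowels(char *dst, const char *src)
{
    char *out = dst;
    for (const char *p = src; *p != '\0'; ++p) {
        if (!is_question_vowel(*p))
            *out++ = *p;
    }
    *out = '\0';
    return dst;
}

/* Print every line held in the list with its vowels left out. */
void print_list_without_vowels(const node_line *head)
{
    for (const node_line *current = head; current != NULL; current = current->next) {
        for (const char *p = current->data; *p != '\0'; ++p) {
            if (!is_question_vowel(*p))
                putchar(*p);
        }
        putchar('\n');
    }
}
```

Printing leaves the stored strings untouched; copy_without_vowels is the variant to reach for if the filtered text has to be kept.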

void print_list(node_line head) {
should be
void print_list(node_line *head) {
getc:
Returns the character currently pointed by the internal file position
indicator of the specified stream.
That's not what you want; use pointer arithmetic:
char *s = current->data;
while (*s) {
if(*s=='a' || *s=='e' || *s=='i' || *s=='o' || *s=='u' || *s=='y') {
printf(""); //Remove the vowel
}
else {
putchar(*s); //Write out one character at the time.
}
s++;
}
or better:
char *s = current->data;
while (*s) {
if(*s!='a' && *s!='e' && *s!='i' && *s!='o' && *s!='u' && *s!='y') {
putchar(*s); //Write out one character at the time.
}
s++;
}

Related

Is there a way to make my if statement more efficient with an enum?

I am working on a simple parser which takes a string as input and parses a string to see if the opening and closing parentheses/brackets/braces are correctly placed. One step in this involves me skipping every character that is not a valid token (a parenthesis, bracket, or brace), because at this point I don't care whether the expression inside the parentheses are valid or not–I'm only interested in whether the parentheses are syntactically correct. I wrote an if statement which tells the loop to skip to the next iteration when it encounters anything that's not an opening or closing brace, but the code looks ugly and is repetitive. I was wondering if there was a better way to do it–perhaps with an enum. Directly below is the function in question, (parse), and below that, I've pasted the code for the entire program so far. If you answer or attempt to answer this, thank you for your time.
void parse(char *string) {
for (int i = 0; i < strlen(string); i++) {
if (string[0] == ')' || string[0] == ']' || string[0] == '}') {
printf("ParseError: Statement begins with invalid token '%c'", string[0]);
return;
}
if (string[i] != '(' || string[i] != '[' || string[i] != '{' ||
string[i] != ')' || string[i] != ']' || string[i] != '}') {
continue;
}
}
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct Node {
char character;
struct Node *link;
} * top;
struct Node *getNewNode(char);
void push(char);
void pop(void);
void parse(char *);
void print(void);
int main() {
top = NULL;
char string[100];
char *string_ptr = string;
printf("Enter an expression to parse: ");
fgets(string, sizeof(string), stdin);
parse(string_ptr);
print();
}
struct Node *getNewNode(char character) {
struct Node *newNode = (struct Node *)(malloc(sizeof(struct Node)));
newNode->character = character;
newNode->link = top;
return newNode;
}
void push(char character) {
struct Node *newNode = getNewNode(character);
top = newNode;
}
void pop(void) {
struct Node *temp;
if (top == NULL) {
printf("Stack is empty!");
return;
}
temp = top;
top = top->link;
free(temp);
temp = NULL;
}
void print(void) {
struct Node *temp = top;
while (temp != NULL) {
printf("%c", temp->character);
temp = temp->link;
}
}
void parse(char *string) {
for (int i = 0; i < strlen(string); i++) {
if (string[0] == ')' || string[0] == ']' || string[0] == '}') {
printf("ParseError: Statement begins with invalid token '%c'", string[0]);
return;
}
if (string[i] != '(' || string[i] != '[' || string[i] != '{' ||
string[i] != ')' || string[i] != ']' || string[i] != '}') {
continue;
}
}
}
There are college courses on parsing theory. We construct software “machines” of various kinds to parse strings, so good solutions for issues like this involve incorporating them into an overall parsing scheme, not solving each one individually.
However, given that, a typical way to handle something like this is to prepare an array with information about the characters:
#include <limits.h>
// Define bit flags for character properties.
enum { IsOpener = 1, IsCloser = 2, };
// Initialize array with flags for characters.
static unsigned CharacterProperties[UCHAR_MAX+1] =
{
['('] = IsOpener,
['['] = IsOpener,
['{'] = IsOpener,
[')'] = IsCloser,
[']'] = IsCloser,
['}'] = IsCloser,
};
…
if (CharacterProperties[string[0]] & IsOpener)
… // Here string[0] is one of the “open parentheses” type of characters.
Note there are some sign issues to watch out for: The subscript to CharacterProperties should be nonnegative, so string should be unsigned char * or you should cast it with (unsigned char) in the subscript or you should ensure char is unsigned. And, if any of the characters in the initialization could be negative (are not in C’s basic execution character set), they should be cast too.
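As a rough sketch of how the table and the cast fit together, the lookup can be wrapped in a small function. The names char_flags and count_delimiters are invented for this example; the table itself is the one from the answer.

```c
#include <limits.h>
#include <stddef.h>

/* Bit flags for character properties, as in the answer. */
enum { IsOpener = 1, IsCloser = 2 };

/* One entry per possible unsigned char value; unlisted entries are 0. */
static const unsigned CharacterProperties[UCHAR_MAX + 1] = {
    ['('] = IsOpener, ['['] = IsOpener, ['{'] = IsOpener,
    [')'] = IsCloser, [']'] = IsCloser, ['}'] = IsCloser,
};

/* Hypothetical helper: look up the flags for c. The cast to unsigned char
 * keeps a negative char value from indexing before the start of the array. */
unsigned char_flags(char c)
{
    return CharacterProperties[(unsigned char)c];
}

/* Hypothetical helper: count the delimiter characters in s,
 * skipping everything that is neither an opener nor a closer. */
size_t count_delimiters(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (char_flags(*s) & (IsOpener | IsCloser))
            ++n;
    }
    return n;
}
```

With this in place, the long chain of comparisons in the question reduces to a single test such as char_flags(string[i]) == 0 to skip a character.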
This may be a good use case for a switch:
void parse( char *string )
{
/**
* Make sure string[0] is valid first
*/
switch( string[0] )
{
case ')':
case ']':
case '}':
fprintf( stderr, "parse error..." );
return;
break;
default:
break;
}
/**
* Compute the length of the string once rather
* than every time through the loop.
*/
size_t len = strlen( string );
for ( size_t i = 0; i < len; i ++ )
{
/**
* Is the current character a delimiter?
*/
switch( string[i] )
{
case '(':
case ')':
case '[':
case ']':
case '{':
case '}':
// we'll process the delimiter character following
// the end of the switch statement
break;
default:
// this is not a delimiter, go back to the beginning of
// the loop
continue;
break;
}
// process delimiter character
}
}
but that heavily depends on how you're going to process other characters once you beef up your parser. Switches can get ugly and unmaintainable in a hurry. I've written this such that the switches act only as filters; they simply decide whether to proceed with the current operation or not, there's no processing logic in either one.
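For illustration only, here is one way the "process delimiter character" step could be filled in to check that delimiters match. This sketch swaps the question's linked-list stack for a fixed-size array, and both the function names and the depth limit of 128 are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the closer that matches an opener,
 * or '\0' if c is not an opener. */
static char matching_closer(char c)
{
    switch (c) {
    case '(': return ')';
    case '[': return ']';
    case '{': return '}';
    default:  return '\0';
    }
}

/* Returns true if every (, [ and { in s is closed in the right order. */
bool is_balanced(const char *s)
{
    char expected[128];  /* stack of closers still awaited (assumed limit) */
    size_t depth = 0;

    for (; *s != '\0'; ++s) {
        char closer = matching_closer(*s);
        if (closer != '\0') {
            if (depth == sizeof expected)
                return false;  /* nested deeper than this sketch supports */
            expected[depth++] = closer;
            continue;
        }
        switch (*s) {
        case ')':
        case ']':
        case '}':
            /* A closer must match the most recently opened delimiter. */
            if (depth == 0 || expected[--depth] != *s)
                return false;
            break;
        default:
            break;  /* not a delimiter: skip it */
        }
    }
    return depth == 0;  /* anything left on the stack was never closed */
}
```

Pushing the expected closer rather than the opener keeps the comparison on the closing side to a single equality test.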

Problems in data structure

I'm making a data structure that stores strings from a file in memory. The structure is:
typedef struct node
{
bool end;
struct node *letters[27];
} node;
Each char from the string goes through a hash function:
int
hash(int letter)
{
int n;
// converts upper case to a number from 1 to 26
if (letter >= 65 && letter <= 90)
n = letter - 64;
// converts lower case to a number from 1 to 26
else if (letter >= 97 && letter <= 122)
n = letter - 96;
// converts apostrophe to 0
else if (letter == '\'')
n = 0;
return n;
}
Thus, the structure is similar to a tree where the position of each pointer in the array corresponds to a letter, as follows:
[Figure: tree]
The function that loads the words into memory is as follows:
bool
load(const char *dict)
{
// open the dictionary file
FILE *infile = fopen(dict, "r");
if (infile == NULL)
{
return false;
}
// pointer for the first tree
node *first = calloc(28, sizeof(node));
if (first == NULL)
return false;
//pointer to the nodes
node *nextptr = NULL;
// word storage variables
int index = 0;
char word[LENGTH+1];
// stores the words of the file in the struct
for (int c = fgetc(infile); c != EOF; c = fgetc(infile))
{
// reads only letters and apostrophes
if ((c >= 65 && c <= 90) || (c >= 97 && c <= 122) || (c == '\''))
{
word[index] = c;
// creates a new tree from the first tree
if (index == 0)
{
// checks if there is a struct for the char
if (first->letters[hash(word[0])] != NULL)
{
nextptr = first->letters[hash(word[0])];
index++;
}
// creates a new struct for the char
else
{
first->letters[hash(word[0])] = calloc(28, sizeof(node));
if (first->letters[hash(word[0])] == NULL)
return false;
nextptr = first->letters[hash(word[0])];
index++;
}
}
// create the following structures
else
{
// checks if there is a struct for the char
if (nextptr->letters[hash(word[index])] != NULL)
{
nextptr = nextptr->letters[hash(word[index])];
index++;
}
// creates a new struct for the char
else
{
nextptr->letters[hash(word[index])] = calloc(28, sizeof(node));
if (nextptr->letters[hash(word[index])] == NULL)
return false;
nextptr = nextptr->letters[hash(word[index])];
index++;
}
}
}
// creates the last struct for a word
else if (c == '\n')
{
// ends the word
word[index] = '\0';
// the boolean end is set to true, defining the end of the word
nextptr->end = true;
nextptr = NULL;
// prepares for a new word
index = 0;
}
}
return true;
}
The function with the error is the one that checks whether a string is in the structure:
bool
check(const char *word)
{
node *checkptr = first;
for (int i = 0; i < sizeof(word); i++)
{
if (word[i] == '\0' && checkptr->end == true)
return true;
else if (checkptr->letters[hash(word[i])] != NULL)
checkptr = checkptr->letters[hash(word[i])];
else
return false;
}
return false;
}
When the program starts, a segmentation fault occurs on the line else if (checkptr->letters[hash(word[i])] != NULL), and valgrind reports Invalid read of size 8.
I still have to write a function that frees the allocated memory, but I suspect the problem lies here: when I checked whether checkptr really pointed to the same structure as first, I discovered that first is NULL. Why?
Sorry for my bad English. I'm a beginner programmer and this is my first time on Stack Overflow, but I really don't know how to solve this. If anyone can help me in any way, thank you in advance.
The problem was solved. The error really was in the pointer first. In load, I had accidentally assigned the result of calloc to a new pointer instead of the original first, so check was reading the original first pointer, which had remained NULL.
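The mistake can be reproduced in a few lines. This is only a sketch: it assumes first is a file-scope pointer (which is what check appears to rely on), and the function names load_shadowing and load_fixed are invented for the comparison.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    bool end;
    struct node *letters[27];
} node;

/* File-scope root pointer, the one check() reads (assumed). */
node *first = NULL;

/* Buggy pattern: "node *first = ..." declares a new local variable
 * that hides the file-scope one, so the global stays NULL. */
bool load_shadowing(void)
{
    node *first = calloc(1, sizeof(node));  /* local, not the global */
    if (first == NULL)
        return false;
    free(first);  /* released here only so the sketch does not leak */
    return true;
}

/* Fixed pattern: assign to the existing pointer instead of redeclaring it. */
bool load_fixed(void)
{
    first = calloc(1, sizeof(node));
    return first != NULL;
}
```

Compiling with -Wshadow under GCC or Clang reports the local declaration in load_shadowing, which is a cheap way to catch this class of bug. Note also that one node per level is enough here; the array of 27 child pointers already lives inside the struct.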

I am not able to find out the position of string from file

My program counts the number of words in a file supplied by the user.
I read each string from the file and compute several properties for it (position, line number), but I am not able to find the position. Could anyone please help me work it out?
Below is my code:
void find(FILE *str)
{
short i = 0;
int ch; // int rather than char, so that EOF can be told apart from valid characters
char substr[20];
char *p;
while((ch = fgetc(str))!=EOF)
{
noc++;
if(ch == '\n')
{
pos = 1;
nol++;
}
if (ch != '\n' && ch != '\t' && ch != ' ')
substr[i++] = ch;
else
{
now++;
substr[i] = '\0';
create(substr);
i = 0;
}
}
return;
}
Thanks in advance
-Rubesh G.
The position is zero (pos = 0) at the beginning of each line and is incremented every time a character other than a newline is read.
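A minimal sketch of that bookkeeping, assuming a 1-based line number and a column that restarts at 0 on each line. The struct position type and both function names are made up for this example and do not appear in the question.

```c
#include <stdio.h>

/* Hypothetical record of where the reader currently is. */
struct position {
    long line;    /* 1-based line number */
    long column;  /* 0 at the start of each line */
};

/* Update pos for one character that has just been read. */
void advance_position(struct position *pos, int ch)
{
    if (ch == '\n') {
        pos->line++;
        pos->column = 0;  /* a new line starts: reset the column */
    } else {
        pos->column++;
    }
}

/* Read the stream to EOF and return the position reached at the end. */
struct position scan_positions(FILE *in)
{
    struct position pos = { 1, 0 };
    int ch;  /* int, not char, so EOF is distinguishable from data */
    while ((ch = fgetc(in)) != EOF)
        advance_position(&pos, ch);
    return pos;
}
```

To record where a word starts, one option is to save a copy of the position just before the word's first character is passed to advance_position.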

HEAP CORRUPTION DETECTED in C

I am having some problems with my program and getting this error :
HEAP CORRUPTION DETECTED: before Normal block (#9873672) at 0x00968988.
CRT detected that the application wrote to memory before start of heap buffer.
I have tried to look for fixes, but I can't figure out what is wrong with my program, what to fix, and where :(
Here is the code I'm using that is causing me problems:
What I am doing is basically looking in a file for a specific keyword (the argument of the function getText) and printing its matching value.
Sorry that most of the variables are in French; it's a school project and our teacher requires us to use French names >_<
#include "getText.h"
#include "main.h"
#include <stdlib.h>
texteLangue* ressourcesTexteLangue = NULL;
int compteur = 0;
char* getText(char* clef)
{
char* texte = NULL;
texte = clef; //clef is the keyword passed in the function as argument
texteLangue temp;
temp.clef = clef;
texteLangue* resultat = (texteLangue*) bsearch(&temp, ressourcesTexteLangue, compteur, sizeof(texteLangue), comparerClef); //returns the value associated with the key
if (clef != NULL)
{
if (resultat != NULL)
texte = resultat->valeur;
}
return texte;
}
void lectureTexte(char* langue)
{
char nomFichierRessources[64];
sprintf(nomFichierRessources, "ressources_%s.txt", langue); //give the file name a specific ending depending on the language chosen
FILE* FichierRessources = fopen(nomFichierRessources, "r");
if (FichierRessources == NULL)
{
system("cls");
perror("The following error occured ");
system("PAUSE");
exit(42);
}
//allocates memory for the language resources
int taille = 10;
ressourcesTexteLangue = (texteLangue *) calloc(taille, sizeof(texteLangue));
if (ressourcesTexteLangue == NULL)
printf("Pas assez de place mémoire pour les ressources texte");
//gives a value to TextResource.key and TextResource.value for each line of the file
char* ligne;
while ((ligne = lectureLigne(FichierRessources)))
{
if (strlen(ligne) > 0)
{
if (compteur == taille)
{
taille += 10;
ressourcesTexteLangue = (texteLangue *) realloc(ressourcesTexteLangue, taille * sizeof(texteLangue));
}
ressourcesTexteLangue[compteur].clef = ligne;
while (*ligne != '=')
{
ligne++;
}
*ligne = '\0';
ligne++;
ressourcesTexteLangue[compteur].valeur = ligne;
compteur++;
}
}
//sorts out the values of TextResource obtained
qsort(ressourcesTexteLangue, compteur, sizeof(texteLangue), comparerClef);
fclose(FichierRessources);
}
//reads a line and returns it
char* lectureLigne(FILE *fichier)
{
int longeur = 10, i = 0, c = 0;
char* ligne = (char*) calloc(longeur, sizeof(char));
if (fichier)
{
c = fgetc(fichier);
while (c != EOF)
{
if (i == longeur)
{
longeur += 10;
ligne = (char*) realloc(ligne, longeur * sizeof(char));
}
ligne[i++] = c;
c = fgetc(fichier);
if ((c == '\n') || (c == '\r'))
break;
}
ligne[i] = '\0';
while ((c == '\n') || (c == '\r'))
c = fgetc(fichier);
if (c != EOF)
ungetc(c,fichier);
if ((strlen(ligne) == 0) && (c == EOF))
{
free(ligne);
ligne = NULL;
}
}
return ligne;
}
//frees the TextRessources
void libererTexte()
{
if (ressourcesTexteLangue != NULL)
{
while (compteur--)
{
free(ressourcesTexteLangue[compteur].clef);
}
free(ressourcesTexteLangue);
}
}
//compares the keys
int comparerClef(const void* e1, const void* e2)
{
return strcmp(((texteLangue*) e1)->clef, ((texteLangue*) e2)->clef);
}
The structure of ressourcesTexteLangue (TextResources) looks like this:
typedef struct texteLangue {
char* clef;
char* valeur;
} texteLangue;
There are several potential problems with your code that could be causing the error report you see.
Here is one:
if (i == longeur)
should be:
if ((i+1) == longeur)
otherwise,
ligne[i] = '\0';
can occur in conditions when
ligne[i++] = c;
has caused i to become equal to longeur.
Here is another:
while (*ligne != '=')
{
ligne++;
}
*ligne = '\0';
the above code should be:
while (*ligne != '=' && *ligne != '\0')
{
ligne++;
}
*ligne = '\0';
otherwise, you will corrupt memory in the case when there is no '=' to be found in the string.
Although either of these could cause the symptom you report, I see some other oddities that make me think there is more wrong than I have seen so far. Nevertheless, fixing those two problems will at least reduce the number of possibilities you have to consider.
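As a sketch of the second fix, the search for '=' can be handed to strchr, which stops at the terminator on its own. The helper name split_key_value is an invention for this example, not part of the original program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "key=value" in place.
 * On success, the '=' is overwritten with '\0' so that line holds the key,
 * *value points at the text after the '=', and 1 is returned.
 * If there is no '=', line is left untouched and 0 is returned. */
int split_key_value(char *line, char **value)
{
    char *eq = strchr(line, '=');  /* never reads past the terminator */
    if (eq == NULL)
        return 0;
    *eq = '\0';
    *value = eq + 1;
    return 1;
}
```

A caller in lectureTexte could then skip (and free) any line for which the function returns 0 instead of storing it in the table.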
Is your input guaranteed to contain a '=' in each line?
while (*ligne != '=') // if not, this will fly off the end of your buffer...
{
ligne++;
}
*ligne = '\0'; // ...and write to unallocated heap memory
Edit
Given @Heath's comment, if your input contains a blank line (including ending with a single blank line) then the above would most certainly be triggered.
c = fgetc(fichier); // reads '\n'
while (c != EOF)
{
...
ligne[i++] = c;
...
ligne[i] = '\0';
ligne now contains "\n" and is returned. Later it is used:
if (strlen(ligne) > 0) // strlen("\n") is greater than 0
{
...
while (*ligne != '=') // oops! loop until we find a '=' somewhere
// in the heap or crash trying.
{
ligne++;
}
*ligne = '\0'; // corrupt the heap here

C Devoweller memory error

I have almost completed my devowelling program for class but have encountered the memory access violation error when it reaches a while loop where I have tried to detect the absence of vowels from a linked list node. I realise that the way I have done this is ridiculously inefficient (many logical OR checks) but I was struggling with other ways to do this. Utterly confused. Not expecting too much help but any pointers (:S) would be greatly appreciated.
https://gist.github.com/3992412
Or, copy'n'paste:
#include <iostream>
#include <stdlib.h>
struct NODE {
char letter;
struct NODE *next;
};
int vowelcheck(struct NODE *llist, int num);
void addnode(struct NODE *llist, char c);
void showsentence(struct NODE *llist);
void devowel(struct NODE *llist);
int main(void) {
char charin;
int input = 1;
struct NODE *llist;
int nodeno = 0;
llist = (struct NODE *)malloc(sizeof(struct NODE));
llist->letter = 0;
llist->next = NULL;
while(input != 0) {
printf("\n\n --Disemvoweler--\n");
printf("(0) Quit\n");
printf("(1) Enter sentence\n");
printf("(2) Disemvowel\n");
printf("(3) Display parsed sentence\n");
scanf("%d", &input);
switch(input) {
case 0: //exit
default:
printf("Exiting\n");
break;
case 1: //sentence input
printf("\nEnter sentence, finish sentence with full stop (.) :\n");
do
{
charin=getchar();
addnode(llist, charin);
}
while (charin != '.');
break;
case 2: //remove vowels
printf("Your choice: `Disembvowel'\n");
while(llist->next != NULL) {
devowel(llist);
llist = llist->next;
}
printf("Disembvoweled!\n");
break;
case 3: //show sentence in memory (devoweled or not)
printf("\n Parsed sentence: \n");
showsentence(llist);
break;
}
}
free(llist);
return(0);
}
void showsentence(struct NODE *llist) {
while(llist->next != NULL) { //while not the last link (ie not full stop)
printf("%c ", llist->letter); //print letter
llist = llist->next; //move to next link
}
}
void addnode(struct NODE *llist, char charin) {
while(llist->next != NULL)
llist = llist->next;
llist->next = (struct NODE *)malloc(sizeof(struct NODE));
llist->next->letter = charin;
llist->next->next = NULL;
}
void devowel(struct NODE *llist) {
struct NODE *temp;
temp = (struct NODE *)malloc(sizeof(struct NODE));
if(llist->letter == 'A' || llist->letter == 'a' || llist->letter == 'E' || llist->letter == 'e' || llist->letter == 'I' || llist->letter == 'i' || llist->letter == 'O' || llist->letter == 'o' || llist->letter == 'U' || llist->letter == 'u')
{
/* remove the node */
temp = llist->next;
free(llist);
llist = temp;
} else {
while(llist->next->letter != 'A' || llist->next->letter != 'a' || llist->next->letter != 'E' || llist->next->letter != 'e' || llist->next->letter != 'I' || llist->next->letter != 'i' || llist->next->letter != 'O' || llist->next->letter != 'o' || llist->next->letter != 'U' || llist->next->letter != 'u')
llist = llist->next;
temp = llist->next->next;
free(llist->next);
llist->next = temp;
}
}
while(...HORRIBLE CONDITION DEREFERENCING llist->next SNIPPED...)
llist = llist->next;
temp = llist->next->next;
free(llist->next);
llist->next = temp;
}
This piece of code has at least two fatal problems. First of all, in C++ indentation does not determine blocks, {} do, and you are missing a pair. Second, you access contents of llist->next without checking if there is next element, or if list has ended.
You might have use for something like:
int character_is_vowel(char ch)
{
/* strchr() also matches the terminating '\0', so rule that out first. */
return ch != '\0' && strchr("AEIOUaeiou", ch) != NULL;
}
See what I did, there? I took a small part of the problem at hand (determine whether a character is a vowel in English) and broke it out into a standalone piece of the program.
Using standard library functions to cut down on repetition is another idea that generally makes code more readable.
Regarding your code, your linked-list code is severely broken in many places. You should consider if you really must implement this using linked lists. It's a string transform, and strings in C are not typically treated as linked lists. Of course, since this was for class, I guess your hands are tied.
Then, you should look hard at all the list operations you do, and analyze if they make sense. Think about memory validity, checks for NULL so you don't overstep the end of the list, and (again) if it's maybe possible to break these operations out into dedicated functions that you can write, think about, and test in isolation from each other and from the actual problem you're trying to solve.
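One possible shape for such a dedicated function is sketched below. It is not the asker's devowel: the name remove_vowels and the pointer-to-pointer interface are choices made for this example, so that the first node can be removed like any other and no node is dereferenced without a NULL check.

```c
#include <stdlib.h>
#include <string.h>

struct NODE {
    char letter;
    struct NODE *next;
};

/* Non-zero if ch is an English vowel; '\0' is excluded explicitly
 * because strchr() would otherwise match the terminator. */
int character_is_vowel(char ch)
{
    return ch != '\0' && strchr("AEIOUaeiou", ch) != NULL;
}

/* Hypothetical replacement for devowel(): unlink and free every node
 * holding a vowel. head is the address of the list's head pointer. */
void remove_vowels(struct NODE **head)
{
    struct NODE **link = head;  /* the pointer that leads to the current node */
    while (*link != NULL) {
        struct NODE *node = *link;
        if (character_is_vowel(node->letter)) {
            *link = node->next;  /* bypass the node, then release it */
            free(node);
        } else {
            link = &node->next;  /* keep the node and move on */
        }
    }
}
```

Because the loop condition is the only place the list is advanced, there is a single NULL check to reason about, and the function can be tested on its own with a hand-built list.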
Epigrams about mountains and molehills spring to mind.
Here's a working program that reads lines of input and then disemvowels those lines.
Sample run
$ ./disemvowel
What's the point of including vowels if you're going to strip 'em all?
Entered: What's the point of including vowels if you're going to strip 'em all?
Disemvowelled: Wht's th pnt f ncldng vwls f y'r gng t strp 'm ll?
$
Sample source
#include <string.h>
#include <stdio.h>
static int is_vowel(char c)
{
return(strchr("aeiouAEIOU", c) != 0);
}
int main(void)
{
char line[4096];
while (fgets(line, sizeof(line), stdin) != 0)
{
printf("Entered: %s", line);
char *dst = line;
char *src = line;
char c;
while ((c = *src++) != '\0')
{
if (!is_vowel(c))
*dst++ = c;
}
*dst = '\0';
printf("Disemvowelled: %s", line);
}
return(0);
}
If the string contains no vowels, it copies the string, byte-by-byte, over itself. However, adding a conditional in the loop to see if dst < src would complicate things for negligible benefit (and, in the long run, it would slow things down). If you want to speed this up, you'd use a table-driven is_vowel() function:
static int is_vowel(char c)
{
static unsigned char vowels[256]; // zero-initialized; filled in on the first call
if (vowels['a'] == 0)
{
const unsigned char *v = (const unsigned char *)"aeiouAEIOU";
while (*v != '\0')
vowels[*v++] = 1;
}
return vowels[(unsigned char)c];
}
Note the coercion of possibly signed char c to unsigned char to ensure no problems.
The next level of performance would replace the is_vowel() function with a macro that accesses a global array vowels that is appropriately initialized before first use. That's harder to orchestrate (but it's what typically happens with the macros in <ctype.h> such as isalpha()).
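A rough sketch of that macro variant follows. It sidesteps the "initialized before first use" problem by filling the table at compile time with C99 designated initializers, which is a substitution made for this example rather than what <ctype.h> implementations do. The names IS_VOWEL and disemvowel are likewise invented here.

```c
#include <limits.h>

/* Global table: 1 for vowels, 0 for everything else.
 * Filled in at compile time, so no run-time setup step is needed. */
const unsigned char vowels[UCHAR_MAX + 1] = {
    ['a'] = 1, ['e'] = 1, ['i'] = 1, ['o'] = 1, ['u'] = 1,
    ['A'] = 1, ['E'] = 1, ['I'] = 1, ['O'] = 1, ['U'] = 1,
};

/* Macro form of the lookup. The cast keeps a negative char value
 * from indexing outside the table. */
#define IS_VOWEL(c) (vowels[(unsigned char)(c)])

/* Remove the vowels from line in place, using the same
 * source/destination pointer technique as the program above. */
char *disemvowel(char *line)
{
    char *dst = line;
    char c;
    for (const char *src = line; (c = *src) != '\0'; ++src) {
        if (!IS_VOWEL(c))
            *dst++ = c;
    }
    *dst = '\0';
    return line;
}
```

The macro evaluates its argument exactly once, so passing an expression with side effects is no more dangerous than calling a function.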
Note, too, that the name is_vowel() steers clear of the names reserved by the <ctype.h> header:
§7.31.2 Character handling <ctype.h>
¶1 Function names that begin with either is or to, and a lowercase letter may be added to
the declarations in the <ctype.h> header.
