I am trying to create the function delete_comments(). The read_file() and main functions are given.
Implement function char *delete_comments(char *input) that removes C comments from program stored at input. input variable points to dynamically allocated memory. The function returns pointer to the polished program. You may allocate a new memory block for the output, or modify the content directly in the input buffer.
You’ll need to process two types of comments:
Traditional block comments delimited by /* and */. These comments may span multiple lines. You should remove only characters starting from /* and ending to */ and for example leave any following newlines untouched.
Line comments starting with // until the newline character. In this case, newline character must also be removed.
The function calling delete_comments() only handles return pointer from delete_comments(). It does not allocate memory for any pointers. One way to implement delete_comments() function is to allocate memory for destination string. However, if new memory is allocated then the original memory in input must be released after use.
I'm having trouble understanding why my current approach is wrong or what is the specific problem that I'm getting weird output. I'm approaching the problem by trying to create a new array where to copy the input string with the new rules.
#include "source.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Remove C comments from the program stored in memory block <input>.
* Returns pointer to code after removal of comments.
* Calling code is responsible of freeing only the memory block returned by
* the function.
*/
char *delete_comments(char *input)
{
input = malloc(strlen(input) * sizeof (char));
char *secondarray = malloc(strlen(input) * sizeof (char));
int x, y = 0;
for (x = 0, y = 0; input[x] != '\0'; x++) {
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
else if ((input[x] == '/') && (input[x + 1] == '/')) {
int j = 0;
while (input[x + j] != '\n') {
y++;
j++;
}
}
else {
secondarray[x] = input[y];
y++;
}
}
return secondarray;
}
/* Read given file <filename> to dynamically allocated memory.
* Return pointer to the allocated memory with file content, or
* NULL on errors.
*/
char *read_file(const char *filename)
{
FILE *f = fopen(filename, "r");
if (!f)
return NULL;
char *buf = NULL;
unsigned int count = 0;
const unsigned int ReadBlock = 100;
unsigned int n;
do {
buf = realloc(buf, count + ReadBlock + 1);
n = fread(buf + count, 1, ReadBlock, f);
count += n;
} while (n == ReadBlock);
buf[count] = 0;
return buf;
}
int main(void)
{
char *code = read_file("testfile.c");
if (!code) {
printf("No code read");
return -1;
}
printf("-- Original:\n");
fputs(code, stdout);
code = delete_comments(code);
printf("-- Comments removed:\n");
fputs(code, stdout);
free(code);
}
Your program has fundamental issues.
It fails to tokenize the input. Comment start sequences can occur inside string literals, in which case they do not denote comments: "/* not a comment".
You have some basic bugs:
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
Here, when we enter the loop, with i = 0, input + x is still pointing to the opening /. We did not skip over the opening * and are already looking for a closing *. This means that the sequence /*/ will be recognized as a complete comment, which it isn't.
This loop's also assumes that every /* comment is properly closed. It's not checking for the null character which can terminate the input, so if the comment is not closed, it will march beyond the end of the buffer.
C has line continuations. In ISO C translation stage 2, all backlash-newline sequences are deleted, converting one or more physical lines into logical lines. What that means is that a // comment can span multiple physical lines:
// this is an \
extended comment
You can see, by the way, that StackOverflow's automatic language detector for syntax highlighting is getting this right!
Line continuations are independent of tokenization, which doesn't happen until translation stage 3. Which means:
/\
/\
this is an extended \
comment
That one has defeated StackOverflow's syntax highlighting.
Furthermore, a line continuation can happen in any token, possibly multiple times:
"\
this is a string literal\
"
If you really want to make this work 100% correctly, you need to parse the input. By "parse" I mean a more formal, rigorous detection routine that understands what it is reading, in the context it is reading it.
For example, there are many times where this code could be defeated.
printf("the answer is %d // %d\n", a, b);
would likely trip your // detection and strip the end of the printf.
There are two general approaches to the problem above:
Find every corner case where comment-like characters could be used, and write conditional statements to avoid them before stripping.
Fully parse the language, so you will know if you are within a string or some other context that's wrapping comment like characters, or if you are in the top level context where the characters really mean "this is a comment"
To learn about parsing, I generally recommend "The Dragon Book" but it is a hard read, unless you have studied a bit of Discrete Mathematics. It covers a lot of different parsing techniques, and in doing so it doesn't have many pages left for examples. This means that it's the kind of book where you have to read, think, and then program a mini-example. If you follow that path, there is no input you can't tackle.
If you are pragmatic in your solution, and it is not about learning parsing, but about stripping comments, I recommend that you find a well constructed parser for C, and then learn how to walk the Abstract Syntax Tree in an Emitter, which fails to emit the comments.
There are some projects that do this already; but, I don't know if they have the right structure for easy modification. lint comes to mind, as well as other "pretty-printers" GCC certainly has the parsing code in there, but I've heard that GCC's Abstract Syntax Tree isn't easy to learn.
Your solution has several problems:
The worst issue
As the first instruction in delete_comments() you overwrite input with a new pointer returned by malloc(), which points to memory of random contents.
In consequence the address to the real input is lost.
Oh, and please check the returned value, if you call malloc().
Failing to increment the scanned position in comments correctly
You are scanning the input by the index x, but if you detect a comment, you don't change it.
You are actually advancing y but this is only used for the copying.
Think about lines like these:
int x; /* some /* weird /* comment */
///////////////////////////////
for (;;) { }
Ignoring character and string literals
Your solution should take character and string literals into account.
For example:
int c_plus_plus_comment_start = '//'; /* multi character constant */
const char* c_comment_start = "/*";
Note: There are more. Learn to use a debugger, or at least insert lots of printf()s in "interesting" places.
Related
So I need to create a word search program that will read a data file containing letters and then the words that need to be found at the end
for example:
f a q e g g e e e f
o e q e r t e w j o
t e e w q e r t y u
government
free
and the list of letters and words are longer but anyway I need to save the letters into an array and i'm having a difficult time because it never stores the correct data. here's what I have so far
#include <stdio.h>
int main()
{
int value;
char letters[500];
while(!feof(stdin))
{
value = fgets(stdin);
for(int i =0; i < value; i++)
{
scanf("%1s", &letters[i]);
}
for(int i=0; i<1; i++)
{
printf("%1c", letters[i]);
}
}
}
I also don't know how I am gonna store the words into a separate array after I get the chars into an array.
You said you want to read from a data file. If so, you should open the file.
FILE *fin=fopen("filename.txt", "r");
if(fin==NULL)
{
perror("filename.txt not opened.");
}
In your input file, the first few lines have single alphabets each separated by a space.
If you want to store each of these letters into the letters character array, you could load each line with the following loop.
char c;
int i=0;
while(fscanf(fin, "%c", &c)==1 && c!='\n')
{
if(c!=' ')
{
letters[i++]=c;
}
}
This will only store the letters and is not a string as there is no \0 character.
Reading the words which are at the bottom may be done with fgets().
Your usage of the fgets() function is wrong.
Its prototype is
char *fgets(char *str, int n, FILE *stream);
See here.
Note that fgets() will store the trailing newline(\n) into string as well. You might want to remove it like
str[strlen(str)-1]='\0';
Use fgets() to read the words at the bottom into a character array and replace the \n with a \0.
and do
fgets(letters, sizeof(letters, fin);
You use stdin instead of the fin here when you want to accept input from the keyboard and store into letters.
Note that fgets() will store the trailing newline(\n) into letters as well. You might want to remove it like
letters[strlen(letters)-1]='\0';
Just saying, letters[i] will be a character and not a string.
scanf("%1s", &letters[i]);
should be
scanf("%c", &letters[i]);
One way to store the lines with characters or words is to store them in an array of pointers to arrays - lines,
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLET 500
#define MAXLINES 1000
int main()
{
char *lptr;
// Array with letters from a given line
char letters[MAXLET];
// Array with pointers to lines with letters
char *lineptr[MAXLINES];
// Length of current array
unsigned len = 0;
// Total number of lines
int nlines = 0;
// Read lines from stdin and store them in
// an array of pointers
while((fgets(letters,MAXLET,stdin))!=NULL)
{
len = strlen(letters);
letters[len-1] = '\0';
lptr = (char*) malloc(len);
strcpy(lptr, letters);
lineptr[nlines++]=lptr;
}
// Print lines
for (int i = 0; i < nlines; i++)
printf("%s\n", lineptr[i]);
// Free allocated memory
for (int i = 0; i < nlines; i++)
free(lineptr[i]);
}
In the following, pointer to every line from stdin is stored in lineptr. Once stored, you can access and manipulate each of the lines - in this simple case I only print them one by one but the examples of simple manipulation are shown later on. At the end, program frees the previously allocated memory. It is a good practice to free the allocated memory once it is no longer in use.
The process of storing a line consists of getting each line from the stdin, collecting it's length with strlen, stripping it's newline character by replacing it with \0 (optional), allocating memory for it with malloc, and finally storing the pointer to that memory location in lineptr. During this process the program also counts the number of input lines.
You can implement this sequence for both of your inputs - chars and words. It will result in a clean, ready to use input. You can also consider moving the line collection into a function, that may require making lineptr type arrays global. Let me know if you have any questions.
Thing to remember is that MAXLET and especially MAXLINES may have to be increased for a given dataset (MAXLINES 1000 literally assumes you won't have more than a 1000 lines).
Also, while on Unix and Mac this program allows you to read from a file as it is by using $ prog_name < in_file it can be readily modified to read directly from files.
Here are some usage examples - lineptr stores pointers to each line (array) hence the program first retrieves the pointer to a line and then it proceeds as with any array:
// Print 3rd character of each line
// then substitute 2nd with 'a'
char *p;
for (int i = 0; i < nlines; i++){
p = lineptr[i];
printf("%c\n", p[2]);
p[1] = 'a';
}
// Print lines
for (int i = 0; i < nlines; i++)
printf("%s\n", lineptr[i]);
// Swap first and second element
// of each line
char tmp;
for (int i = 0; i < nlines; i++){
p = lineptr[i];
tmp = p[0];
p[0] = p[1];
p[1] = tmp;
}
// Print lines
for (int i = 0; i < nlines; i++)
printf("%s\n", lineptr[i]);
Note that these examples are just a demonstration and assume that each line has at least 3 characters. Also, in your original input the characters are separated by a space - that is not necessary, in fact it's easier without it.
The code in your post does not appear to match your stated goals, and indicates you have not yet grasp the proper application of the functions you are using.
You have expressed an idea describing what you want to do, but the steps you have taken (at least those shown) will not get you there. Not even close.
It is always good to have a map in hand to plan to plan your steps. An algorithm is a kind of software map. Before you can plan your steps though, you need to know where you are going.
Your stated goals:
1) Open and read a file into lines.
2) Store the lines, somehow. (using fgets(,,)?)
3) Use some lines as content to search though.
4) Use other lines as objects to search for
Some questions to answer:
a) How is the search content distinguished from the strings to search
for?
b) How is the search content to be stored?
c) How are the search words to be stored?
d) How will the comparison between content and search word be done?
e) How many lines in the file? (example)
f) Length of longest line? (discussion and example) (e & f used to create storage)
g) How is fgets() used. (maybe a google search: How to use fgets)
h) Are there things to be aware of when using feof()? (discussion and examaple feof)
i) Why is my input not right after the second call to scanf? (answer)
Finish identifying and crystallizing the list of items in your goals, then answer these (and maybe other) questions. At that point you will be ready to start identifying the steps to get there.
value = fgets(stdin); is a terrible expression! You don't respect at all the syntax of the fgets function. My man page says
char *
fgets(char * restrict str, int size, FILE * restrict stream);
So here, as you do not pass the stream at the right place, you probably get an underlying io error and fgets returns NULL, which is converted to the int 0 value. And then the next loop is just a no-op.
The correct way to read a line with fgets is:
if (NULL == fgets(letters, sizeof(letters), stdin) {
// indication of end of file or error
...
}
// Ok letters contains the line...
What would be the reason for out[0] = '\0'; on the main() function?
It does seem to be working without it.
Code
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXTOKEN 100
enum { NAME, PARENS, BRACKETS };
int tokentype;
char token[MAXTOKEN]; /*last token string */
char name[MAXTOKEN]; /*identifier name */
char datatype[MAXTOKEN]; /*data type = char, int, etc. */
char out[1000];
void dcl(void);
void dirdcl(void);
int gettoken(void);
/*
Grammar:
dcl: optional * direct-dcl
direct-dcl: name
(dcl)
direct-dcl()
direct-dcl[optional size]
*/
int main() /* convert declaration to words */
{
while (gettoken() != EOF) { /* 1st token on line */
/* 1. gettoken() gets the datatype from the token */
strcpy(datatype, token);
/* 2. Init out to end of the line? */
/* out[0] = '\0'; */
/* parse rest of line */
dcl();
if (tokentype != '\n')
printf("syntax error\n");
printf("%s: %s %s\n", name, out, datatype);
}
return 0;
}
int gettoken(void) /* return next token */
{
int c, getch(void);
void ungetch(int);
char *p = token;
/* Skip blank spaces and tabs */
while ((c = getch()) == ' ' || c == '\t')
;
if (c == '(') {
if ((c = getch()) == ')') {
strcpy(token, "()");
return tokentype = PARENS;
} else {
ungetch(c);
return tokentype = '(';
}
} else if (c == '[') {
for (*p++ = c; (*p++ = getch()) != ']'; )
;
*p = '\0';
return tokentype = BRACKETS;
} else if (isalpha(c)) {
/* Reads the next character of input */
for (*p++ = c; isalnum(c = getch()); ) {
*p++ = c;
}
*p = '\0';
ungetch(c); /* Get back the space, tab */
return tokentype = NAME;
} else
return tokentype = c;
}
/* dcl: parse a declarator */
void dcl(void)
{
int ns;
for (ns = 0; gettoken() == '*'; ) /* count *'s */
ns++;
dirdcl();
while (ns-- > 0)
strcat(out, " pointer to");
}
/* dirdcl: parse a direct declarator */
void dirdcl(void)
{
int type;
if (tokentype == '(') {
dcl();
if (tokentype != ')')
printf("error: missing )\n");
}
else if (tokentype == NAME) /* variable name */ {
strcpy(name, token);
printf("token: %s\n", token);
}
else
printf("error: expected name or (dcl)\n");
while ((type = gettoken()) == PARENS || type == BRACKETS) {
if (type == PARENS)
strcat(out, " function returning");
else {
strcat(out, " array");
strcat(out, token);
strcat(out, " of");
}
}
}
You need out[0] to be zero in order for strcat to work.
While this line
out[0] = '\0';
was required prior to the introduction of static initialization rules, it is no longer required, because static arrays, such as out[], are initialized to all zeros.
According to initialization rules of C99,
...
if it has arithmetic type, it is initialized to (positive or unsigned) zero.
if it is an aggregate, every member is initialized (recursively) according to these rules.
It is resetting the char array (aka string) to empty array. (removing junk values)
like we use:
int i = 0;
before doing something like:
i += 1;
so that junk value don't add
So just '\0' in 0 index of array tells that array is completely empty and the strcat function starts appending value from 0 index, over writing the junk values in other indexes of array.
If program is working without resetting array then it means your IDE tool is doing that for you, but it is good practice to reset it.
In short: In this particular case it's not strictly necessary, but in many other cases that look suspiciously similar, it is, so most people do it as "good style". So why would it be necessary?
There is no such thing as "empty" memory. There is no such thing as a "length". Unless you explicitly keep track of it, or define your own.
Memory is just bytes, which are numbers from 0 to 255. Since 0 is just as valid a number as 255, there is no way to tell whether a byte is used or not. You can "add up" several bytes if you need larger numbers, but everything is built out of bytes, in the end. Text is simply mapped to a number. A couple decades ago it was decided which number represents which character. So if you see a byte with the value 32, it could be a 32. Or it could be the 32nd letter in the computer's alphabet (which is the space character).
When you receive a string and you don't know how much text you will be dealing with, what you usually do is you reserve a large block of bytes. This is what char out[1000]; above does. But how do you tell where the text ends? How much of the 1000 bytes you've already used?
Well, in the old days, some people would just declare another variable, say, int length; and keep track of how many bytes they've used so far. The designers of C went a different route. They decided to pick a very rare character and use that as a marker. They picked the character with the value 0 for that (That is not the character '0'. The character '0' actually is the 48th letter of a computer's alphabet).
So you can just look at all the bytes in your string from the start, and if a character is > 0, you know it is used. If you reach a 0 character, you know this is the end of your string. There are various advantages to either approach. An int uses 4 bytes, an additional 0-character only 1. On the other hand, if you use an int, a string can also contain a 0-character, it's just another character, nobody cares.
Whenever you write "foo" in C, what C actually does is reserve room for 4 bytes, for 'f', 'o', 'o' and for the 0 to indicate the end. When you write "" in C, what it does is reserve room for a single byte, the 0. So that you can tell that the string is empty.
So, what is memory filled with before you put something into it at startup? Well, in most cases, it is just garbage. Whatever was in that memory the last time it was used (after all, you have limited RAM, so when you quit one application on your computer, its memory can get re-used for the next app you launch after that). These will be random numbers, often outside of the range of common characters.
So, if you want strcat to see out as an empty string, you need to give it a block of memory that starts with this 0 value character. If you just leave memory like it is, there might be some random characters in it. Your buffer might contain "jbhasugaudq7e1723876123798dbkda0skno§§^^%$#-9H0HWDZmwus0/usr/local/bin"
or whatever was in that memory before. If you now appended some text to it, it would think the stuff before the first 0 (which is just randomly in this place) was a valid string, and append it to that. It will only know that this string is supposed to be empty, if you put a 0 right at the start.
So why did I say it is "not strictly necessary"? Well, because in your case, out is a global variable, and global variables are special because they automatically get cleared to 0 when your application starts up (or assigned any value that you assign them when you declare them).
However, this is only true for global variables (both regular globals and static globals). So many programmers make it a habit to always initialize their blocks of bytes. That way, if someone later decides to change a global into a local variable, or copy-and-pastes the code to another spot to use with a local variable, they do not have to worry about forgetting to add this statement.
This is especially useful as random memory often contains 0 characters. So depending on what program you previously used, you might not notice you forgot the initial 0 because there happened to be one already in there. And only later, when one of your users runs this application, they get garbage at the start of their string.
Does that clarify things a bit?
I'm working on a project and I just encountered a really annoying problem. I have a file which stores all the messages that my account received. A message is a data structure defined this way:
typedef struct _message{
char dest[16];
char text[512];
}message;
dest is a string that cannot contain spaces, unlike the other fields.
Strings are acquired using the fgets() function, so dest and text can have "dynamic" length (from 1 character up to length-1 legit characters). Note that I manually remove the newline character after every string is retrieved from stdin.
The "inbox" file uses the following syntax to store messages:
dest
text
So, for example, if I have a message from Marco which says "Hello, how are you?" and another message from Tarma which says "Are you going to the gym today?", my inbox-file would look like this:
Marco
Hello, how are you?
Tarma
Are you going to the gym today?
I would like to read the username from the file and store it in string s1 and then do the same thing for the message and store it in string s2 (and then repeat the operation until EOF), but since text field admits spaces I can't really use fscanf().
I tried using fgets(), but as I said before the size of every string is dynamic. For example if I use fgets(my_file, 16, username) it would end up reading unwanted characters. I just need to read the first string until \n is reached and then read the second string until the next \n is reached, this time including spaces.
Any idea on how can I solve this problem?
#include <stdio.h>
int main(void){
char username[16];
char text[512];
int ch, i;
FILE *my_file = fopen("inbox.txt", "r");
while(1==fscanf(my_file, "%15s%*c", username)){
i=0;
while (i < sizeof(text)-1 && EOF!=(ch=fgetc(my_file))){
if(ch == '\n' && i && text[i-1] == '\n')
break;
text[i++] = ch;
}
text[i] = 0;
printf("user:%s\n", username);
printf("text:\n%s\n", text);
}
fclose(my_file);
return 0;
}
As the length of each string is dynamic then, if I were you, I would read the file first for finding each string's size and then create a dynamic array of strings' length values.
Suppose your file is:
A long time ago
in a galaxy far,
far away....
So the first line length is 15, the second line length is 16 and the third line length is 12.
Then create a dynamic array for storing these values.
Then, while reading strings, pass as the 2nd argument to fgets the corresponding element of the array. Like fgets (string , arrStringLength[i++] , f);.
But in this way you'll have to read your file twice, of course.
You can use fgets() easily enough as long as you're careful. This code seems to work:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAX_MESSAGES = 20 };
typedef struct Message
{
char dest[16];
char text[512];
} Message;
static int read_message(FILE *fp, Message *msg)
{
char line[sizeof(msg->text) + 1];
msg->dest[0] = '\0';
msg->text[0] = '\0';
while (fgets(line, sizeof(line), fp) != 0)
{
//printf("Data: %zu <<%s>>\n", strlen(line), line);
if (line[0] == '\n')
continue;
size_t len = strlen(line);
line[--len] = '\0';
if (msg->dest[0] == '\0')
{
if (len < sizeof(msg->dest))
{
memmove(msg->dest, line, len + 1);
//printf("Name: <<%s>>\n", msg->dest);
}
else
{
fprintf(stderr, "Error: name (%s) too long (%zu vs %zu)\n",
line, len, sizeof(msg->dest)-1);
exit(EXIT_FAILURE);
}
}
else
{
if (len < sizeof(msg->text))
{
memmove(msg->text, line, len + 1);
//printf("Text: <<%s>>\n", msg->dest);
return 0;
}
else
{
fprintf(stderr, "Error: text for %s too long (%zu vs %zu)\n",
msg->dest, len, sizeof(msg->dest)-1);
exit(EXIT_FAILURE);
}
}
}
return EOF;
}
int main(void)
{
Message mbox[MAX_MESSAGES];
int n_msgs;
for (n_msgs = 0; n_msgs < MAX_MESSAGES; n_msgs++)
{
if (read_message(stdin, &mbox[n_msgs]) == EOF)
break;
}
printf("Inbox (%d messages):\n\n", n_msgs);
for (int i = 0; i < n_msgs; i++)
printf("%d: %s\n %s\n\n", i + 1, mbox[i].dest, mbox[i].text);
return 0;
}
The reading code will handle (multiple) empty lines before the first name, between a name and the text, and after the last name. It is slightly unusual in they way it decides whether to store the line just read in the dest or text parts of the message. It uses memmove() because it knows exactly how much data to move, and the data is null terminated. You could replace it with strcpy() if you prefer, but it should be slower (the probably not measurably slower) because strcpy() has to test each byte as it copies, but memmove() does not. I use memmove() because it is always correct; memcpy() could be used here but it only works when you guarantee no overlap. Better safe than sorry; there are plenty of software bugs without risking extras. You can decide whether the error exit is appropriate — it is fine for test code, but not necessarily a good idea in production code. You can decide how to handle '0 messages' vs '1 message' vs '2 messages' etc.
You can easily revise the code to use dynamic memory allocation for the array of messages. It would be easy to read the message into a simple Message variable in main(), and arrange to copy into the dynamic array when you get a complete message. The alternative is to 'risk' over-allocating the array, though that is unlikely to be a major problem (you would not grow the array one entry at a time anyway to avoid quadratic behaviour when the memory has to be moved during each allocation).
If there were multiple fields to be processed for each message (say, date received and date read too), then you'd need to reorganize the code some more, probably with another function.
Note that the code avoids the reserved namespace. A name such as _message is reserved for 'the implementation'. Code such as this is not part of the implementation (of the C compiler and its support system), so you should not create names that start with an underscore. (That over-simplifies the constraint, but only slightly, and is a lot easier to understand than the more nuanced version.)
The code is careful not to write any magic number more than once.
Sample output:
Inbox (2 messages):
1: Marco
How are you?
2: Tarma
Are you going to the gym today?
I have a tiny problem with my assignment. The whole program is about tree data structures but I do not have problems with that.
My problem is about some basic stuff: reading strings from user input and then storing them in an array list.
char str[1000];
fgets(str, 1000, stdin);
int x = 0;
int y = 0;
int z = 0;
char **list;
list = (char**)malloc((x+1)*sizeof(char));
list[x] = (char*)malloc((y+1)*sizeof(char));
while(str[z] != '\n')
{
list[x][y] = str[z];
z++;
if(str[z] == ',')
{
x++;
y = 0;
list = (char**)realloc(list, (x+1) * sizeof(char*));
list[x] = (char*)malloc((y + 1)*sizeof(char));
z++;
if(str[z] == ' ') // Skips space after the comma
{
z++;
}
}
else if(str[z] == '\n')
{
break;
}
else
{
y++;
list[x] = (char*)realloc(list[x], (y+1)*sizeof(char));
}
}
I pass this list array into another function.
As an example, inputs could be something like
Abcde, Fghijk, Lmnop, Qrstu
and I am trying to split each of these words into the array list.
Abcde
Fghijk
Lmnop
Qrstu
When I try to output the strings I sometimes get weird, excessive characters such as upside down question marks and numbers.
printf("%s ", list[some_number]);
gets me
Fghijk¿
or
Fghijk\200
All of my program works as expected except for this minor problem which I am having trouble solving. Even with the same exact inputs the bugs may or may not appear. I am guessing it has to do with memory allocation?
Thanks for your help!
You need to put '\0' at the end of your new string.
See most of the C library functions such as printf and strlen process strings assuming \0 as the end character of all. Otherwise, they keep on reading the memory out of bounds either making a memory violation or gets some where the value 0 and stops and all the bytes in between in the memory are interpreted to their extended ascii equivalent hence you are getting such a strange behaviour.
So, allocate an extra byte for \0 character and assign it to the last byte.
Either initialize your variables to null, or as tomato said, put a null character at the end of the new string.
C lacks many of the luxuries programmers now take for granted when it comes to memory management. You're on the right path with malloc but that function only allocates memory... it doesn't clear it out. As a result, your variables will have the correct amount of space (critical for reducing memory leaks and overflow errors), but will be filled with garbage. This garbage could be anything, and in your case, it's an upside down question mark. Appropriate, don't you think?
I could be mistaken since I can't run the code myself without more information, but after your
char **list;
list = (char**)malloc((x+1)*sizeof(char));
list[x] = (char*)malloc((y+1)*sizeof(char));
statements, you'll want to do something like this:
list = NULL;
and the like to clear out the garbage.
Furthermore, you may care to use the strlen() function (contained in string.h) to figure out just how many blocks of memory you need to allocate.
Clearing out the spaces you use for variables is a good practice to get into with C. Good to see you learning it as well.
I'm learning C from the k&r as a first language, and I just wanted to ask, if you thought this exercise was being solved the right way, I'm aware that it's probably not as complete as you'd like, but I wanted views, so I'd know I'm learning C right.
Thanks
/* Exercise 1-22. Write a program to "fold" long input lines into two or
* more shorter lines, after the last non-blank character that occurs
* before then n-th column of input. Make sure your program does something
* intelligent with very long lines, and if there are no blanks or tabs
* before the specified column.
*
* ~svr
*
* [NOTE: Unfinished, but functional in a generic capacity]
* Todo:
* Handling of spaceless lines
* Handling of lines consisting entirely of whitespace
*/
#include <stdio.h>
#define FOLD 25
#define MAX 200
#define NEWLINE '\n'
#define BLANK ' '
#define DELIM 5
#define TAB '\t'
int
main(void)
{
int line = 0,
space = 0,
newls = 0,
i = 0,
c = 0,
j = 0;
char array[MAX] = {0};
while((c = getchar()) != EOF) {
++line;
if(c == NEWLINE)
++newls;
if((FOLD - line) < DELIM) {
if(c == BLANK) {
if(newls > 0) {
c = BLANK;
newls = 0;
}
else
c = NEWLINE;
line = 0;
}
}
array[i++] = c;
}
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
return 0;
}
I'm sure you on the rigth track, but some pointers for readability:
comment your stuff
name the variables properly and at least give a description if you refuse
be consequent, some single-line if's you use and some you don't. (imho, always use {} so it's more readable)
the if statement in the last for-loop can be better, like
if(array[0] != NEWLINE)
{
printf("%c", array[line]);
}
That's no good IMHO.
First, it doesn't do what you were asked for. You were supposed to find the last blank after a nonblank before the output line boundary. Your program doesn't even remotely try to do it, it seems to strive for finding the first blank after (margin - 5) characters (where did the 5 came from? what if all the words had 9 letters?). However it doesn't do that either, because of your manipulation with the newls variable. Also, this:
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
is probably wrong, because you check for a condition that never changes throughout the loop.
And, last but not least, storing the whole file in a fixed-size buffer is not good, because of two reasons:
the buffer is bound to overflow on large files
even if it would never overflow, people still wouldn't like you for storing eg. a gigabyte file in memory just to cut it into 25-character chunks
I think you should start again, rethink your algorithm (incl. corner cases), and only after that, start coding. I suggest you:
process the file line-by-line (meaning output lines)
store the line in a buffer big enough to hold the largest output line
search for the character you'll break at in the buffer
then print it (hint: you can terminate the string with '\0' and print with printf("%s", ...)), copy what you didn't print to the start of the buffer, proceed from that
An obvious problem is that you statically allocate 'array' and never check the index limits while accessing it. Buffer overflow waiting to happen. In fact, you never reset the i variable within the first loop, so I'm kinda confused about how the program is supposed to work. It seems that you're storing the complete input in memory before printing it word-wrapped?
So, suggestions: merge the two loops together and print the output for each line that you have completed. Then you can re-use the array for the next line.
Oh, and better variable names and some comments. I have no idea what 'DELIM' is supposed to do.
It looks (without testing) like it could work, but it seems kind of complicated.
Here's some pseudocode for my first thought
const int MAXLINE = ?? — maximum line length parameter
int chrIdx = 0 — index of the current character being considered
int cand = -1 — "candidate index", Set to a potential break character
char linebuf[bufsiz]
int lineIdx = 0 — index into the output line
char buffer[bufsiz] — a character buffer
read input into buffer
for ix = 0 to bufsiz -1
do
if buffer[ix] == ' ' then
cand = ix
fi
linebuf[lineIdx] = buffer[ix]
lineIdx += 1
if lineIdx >= MAXLINE then
linebuf[cand] = NULL — end the string
print linebuf
do something to move remnants to front of line (memmove?)
fi
od
It's late and I just had a belt, so there may be flaws, but it shows the general idea — load a buffer, and copy the contents of the buffer to a line buffer, keeping track of the possible break points. When you get close to the end, use the breakpoint.