Duplicate string detection in C - c

I'm trying to create a function that removes duplicate strings from a stream in C. The strings are already sorted so the only necessary step is to check the string that just appeared to make sure the current string is not a duplicate. However, my attempted implementation is not giving me the correct output.In fact, I get no output at all. The strings are separated by newline characters. Can anyone tell me what I'm missing here?
void dupEliminate(int file, char string[100])
{
FILE *stream;
stream = fdopen(file, "r");
char* savedString;
char* prevString;
while(!feof(stream)){
(fgets(savedString, 100, stream));
if(strcmp(savedString,prevString) != 0 ){
strcat(string, savedString);
strcpy(prevString,savedString);
}

char* prevString;
prevString is uninitialized in this function , and yet you compare it here -
if(strcmp(savedString,prevString) != 0 )
Also , before taking input in savedString using fgets , you need to allocate memory to it using malloc or calloc ,as it is an unintialized pointer.
What it will compare to ? Initialize you prevString and then compare it .
Note- Just a suggestion instead of using while(!feof(stream)) , use fgets to control loop -
while(fgets(savedString, 100, stream)!=NULL){
...
}

You need to give savedString and prevString some memory - malloc or make them char[101]
You will also need to initialise prevString
prevString needs to be updated with the last line unconditionally, not in the if block
I don't like the names either :)

It is unclear what you ultimately want to do with the non-duplicate strings given the code you have posted. At present, even if you collect only non-duplicate strings, you are currently overwriting the value in string without doing any thing with it.
If you are reading from a continual stream, then you either need to just print (or save or copy) the non-duplicate string somewhere to make use of it, or you need to buffer the non-duplicate strings somewhere and make use of them when your buffer is full.
Since you say you are not getting the correct output, but have no output functions listed, it is somewhat of a guess what you are trying to do. If you simply want to output the non-duplicate strings, then you can simply print them (note using fgets you will have a newline at the end of each string read). An example would be:
#define MAXC 100
...
char string[MAXC] = {0};
...
void dupEliminate (int file, char *string)
{
FILE *stream;
stream = fdopen (file, "r");
if (!stream) {
fprintf (stderr, "dupEliminate() error: stream open failed.\n");
return;
}
char pre[MAXC] = {0}; /* previous string */
size_t word = 0;
while (fgets (string, MAXC, stream)) {
if (word == 1) {
if (strcmp (pre, string) != 0) {
printf ("%s", pre);
printf ("%s", string);
}
else
printf ("%s", string);
}
if (word > 1 && strcmp (pre, string) != 0)
printf ("%s", string);
strncpy (pre, string, MAXC);
word++;
}
fclose (stream);
}
If your intent was to buffer all non-duplicate strings and make them available in main(), then there are several things you need to do. Let me know if you have any further questions and I'm happy to work with your further once I have more details about your intent.

Related

Issue with fread () and fwrite () functions in C

I have written a basic code which writes into a file a string in binary mode (using fwrite()). Also I can read the same string from the file (using fread()) in to the buffer and print it. It works but in the part where I read from the file, extra junk is also read into the buffer. My question is how to know the length of the bytes to be read, correctly?
The following is the code --
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define BUFSZ 81
char * get_string (char *, size_t);
int main (int argc, char * argv[])
{
if (argc != 2)
{
fprintf (stderr, "Invalid Arguments!!\n");
printf ("syntax: %s <filename>\n", argv[0]);
exit (1);
}
FILE * fp;
if ((fp = fopen(argv[1], "ab+")) == NULL)
{
fprintf (stderr, "Cannot openm file <%s>\n", argv[1]);
perror ("");
exit (2);
}
char string[BUFSZ];
char readString[BUFSZ];
size_t BYTES, BYTES_READ;
puts ("Enter a string: ");
get_string (string, BUFSZ);
// printf ("You have entered: %s\n", string);
BYTES = fwrite (string, sizeof (char), strlen (string), fp);
printf ("\nYou have written %zu bytes to file <%s>.\n", BYTES, argv[1]);
printf ("\nContents of the file <%s>:\n", argv[1]);
rewind (fp);
BYTES_READ = fread (readString, sizeof (char), BUFSZ, fp);
printf ("%s\n", readString);
printf ("\nYou have read %zu bytes from file <%s>.\n", BYTES_READ, argv[1]);
getchar ();
fclose (fp);
return 0;
}
char * get_string (char * str, size_t n)
{
char * ret_val = fgets (str, n, stdin);
char * find;
if (ret_val)
{
find = strchr (str, '\n');
if (find)
* find = '\0';
else
while (getchar () != '\n')
continue;
}
return ret_val;
}
in the part where I read from the file, extra junk is also read into the buffer.
No, it isn't. Since you're opening the file in append mode, it's possible that you're reading in extra data preceding the string you've written, but you are not reading anything past the end of what you wrote, because there isn't anything there to read. When the file is initially empty or absent, you can verify that by comparing the value of BYTES to the value of BYTES_READ.
What you are actually seeing is the effect of the read-back data not being null terminated. You did not write the terminator to the file, so you could not read it back. It might be reasonable to avoid writing the terminator, but in that case you must supply a new one when you read the data back in. For example,
readString[BYTES_READ] = '\0';
My question is how to know the length of the bytes to be read, correctly?
There are various possibilities. Among the prominent ones are
use fixed-length data
write the string length to the file, ahead of the string data.
Alternatively, in your particular case, when the file starts empty and you write only one string in it, there is also the possibility of capturing and working with how many bytes were read instead of knowing in advance how many should be read.
First of all you get string from the user, which will contain up to BUFSZ-1 characters (get_string() function will remove the trailing newline or skip any character exceeding the BUFSZ limit.
For example, the user might have inserted the word Hello\n, so that after get_string() call string array contains
-------------------
|H|e|l|l|o|'\0'|...
-------------------
Then you fwrite the string buffer to the output file, writing strlen (string) bytes. This doesn't include the string terminator '\0'.
In our example the contents of the output file is
--------------
|H|e|l|l|o|...
--------------
Finally you read back from the file. But since readString array is not initialized, the file contents will be followed by every junk character might be present in the uninitialized array.
For example, readString could have the following initial contents:
---------------------------------------------
|a|a|a|a|a|T|h|i|s| |i|s| |j|u|n|k|!|'\0'|...
---------------------------------------------
and after reading from the file
---------------------------------------------
|H|e|l|l|o|T|h|i|s| |i|s| |j|u|n|k|!|'\0'|...
---------------------------------------------
So that the following string would be printed
HelloThis is junk!
In order to avoid these issues, you have to make sure that a trailing terminator is present in the target buffer. So, just initialize the array in this way:
char readString[BUFSZ] = { 0 };
In this way at least a string terminator will be present in the target array.
Alternatively, memset it to 0 before every read:
memset (readString, 0, BUFSZ);

How to read a text file into matrix form in C

Given the following text file with the following content in it
SpotA B C
SpotB pass D
Spotc A E F
How to do I break up the words into tokens and store them in a 10 x 10 matrix.
Note that if the content in the file is a matrix size with smaller than 10 x 10, I want to add the character ~ to those positions.
So far this is my code:
char *matrix[10][10];
int loadFileToMatrix(char *filename){
FILE *fp;
int row = 0;
int col= 0;
char *tokens;
char buffer[1000];
fp = fopen(filename,"r");
if(fp == NULL){
perror(filename);
return(1);
}
while((fgets(buffer, sizeof(buffer), fp))!= NULL) {
tokens = strtok(buffer," ");
map[row++][col++] = tokens;
}
return(0);
}
If some can help me figure out how to achieve my goal that would be nice. Currently, I am really confused on how to proceed.
Just use fscanf to read tokens from file to buffer, then copy tokens into your the matrix map. You can use fgetc to detect if it reaches the end of line and the end of file.
char ch;
while (1) {
fscanf(fp, "%s", buffer);
matrix[row][col] = (char *)malloc(sizeof(char) * (strlen(buffer) + 1));
strcpy(matrix[row][col], buffer);
ch = fgetc(fp);
if (ch == ' ') {
col += 1;
}
else if (ch == '\n') {
row += 1;
col = 0;
}
else if (ch == EOF) {
break; // end of file.
}
}
strtok() is a weird function.
The key part of the man page is this:
"On the first call to strtok() the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str should be NULL."
The reason for this is that strtok() alters the string you pass it. It searches through a string until it finds the next character that matches one of the delimiters, and then replaces that delimiter with a null terminator. If the delimiter is found at position n, internally, strtok() saves the position n+1 as the start of the rest of the string.
By calling strtok a second time with a non-null value, you are telling the function to start all over again at the start of that string, and try again to find a delimiter -- which it can never do, because it already found the first one. Instead, your second call to strtok() should pass NULL as the first argument, so each pass can bring out the next token.
If for some reason you need to call strtok() on multiple strings simultaneously, you will overwrite the internally-saved address; only the most recent call is saved properly. The reentrant function strtok_r() is useful in that situation.
If you're ever not sure how to use a function, the man pages are the best resource. You can type man strtok at the command line, or even just google it.
It looks like, in this case, you're using strtok() only once. This will just return the address of the first piece of the buffer, delimited by your delimiters. You need to call strtok() in a loop to get each piece in turn.

How to save every line in file (IN C) in a variable? :)

I need to save every line of text file in c in a variable.
Here's my code
int main()
{
char firstname[100];
char lastname[100];
char string_0[256];
char string[256] = "Vanilla Twilight";
char string2[256];
FILE *file;
file = fopen("record.txt","r");
while(fgets(string_0,256,file) != NULL)
{
fgets(string2, 256, file);
printf("%s\n", string2);
if(strcmp(string, string2)==0)
printf("A match has been found");
}
fclose(file);
return 0;
}
Some lines are stored in the variable and printed on the cmd but some are skipped.
What should I do? When I tried sscanf(), all lines were complete but only the first word of each line is printed. I also tried ffscanf() but isn't working too. In fgets(), words per line are complete, but as I've said, some lines are skipped (even the first line).
I'm just a beginner in programming, so I really need help. :(
You're skipping over the check every odd number of lines, as you have two successive fgets() calls and only one strcmp(). Reduce your code to
while(fgets(string_0,256,file) != NULL)
{
if( ! strcmp(string_0, string2) )
printf("A match has been found\n");
}
FWIW, fgets() reads and stores the trailing newline, which can cause problem is string comparison, you need to take care of that, too.
As a note, you should always check the return value of fopen() for success before using the returned pointer.

fscanf() how to go in the next line?

So I have a wall of text in a file and I need to recognize some words that are between the $ sign and call them as numbers then print the modified text in another file along with what the numbers correspond to.
Also lines are not defined and columns should be max 80 characters.
Ex:
I $like$ cats.
I [1] cats.
[1] --> like
That's what I did:
#include <stdio.h>
#include <stdlib.h>
#define N 80
#define MAX 9999
int main()
{
FILE *fp;
int i=0,count=0;
char matr[MAX][N];
if((fp = fopen("text.txt","r")) == NULL){
printf("Error.");
exit(EXIT_FAILURE);
}
while((fscanf(fp,"%s",matr[i])) != EOF){
printf("%s ",matr[i]);
if(matr[i] == '\0')
printf("\n");
//I was thinking maybe to find two $ but Idk how to replace the entire word
/*
if(matr[i] == '$')
count++;
if(count == 2){
...code...
}
*/
i++;
}
fclose(fp);
return 0;
}
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array..also I don't know how to replace $word$ with a number.
Not only will fscanf("%s") read one whitespace-delimited string at a time, it will also eat all whitespace between those strings, including line terminators. If you want to reproduce the input whitespace in the output, as your example suggests you do, then you need a different approach.
Also lines are not defined and columns should be max 80 characters.
I take that to mean the number of lines is not known in advance, and that it is acceptable to assume that no line will contain more than 80 characters (not counting any line terminator).
When you say
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array
I suppose you're talking about this code:
char matr[MAX][N];
/* ... */
if(matr[i] == '\0')
Given that declaration for matr, the given condition will always evaluate to false, regardless of any other consideration. fscanf() does not factor in at all. The type of matr[i] is char[N], an array of N elements of type char. That evaluates to a pointer to the first element of the array, which pointer will never be NULL. It looks like you're trying to determine when to write a newline, but nothing remotely resembling this approach can do that.
I suggest you start by taking #Barmar's advice to read line-by-line via fgets(). That might look like so:
char line[N+2]; /* N + 2 leaves space for both newline and string terminator */
if (fgets(line, sizeof(line), fp) != NULL) {
/* one line read; handle it ... */
} else {
/* handle end-of-file or I/O error */
}
Then for each line you read, parse out the "$word$" tokens by whatever means you like, and output the needed results (everything but the $-delimited tokens verbatim; the bracket substitution number for each token). Of course, you'll need to memorialize the substitution tokens for later output. Remember to make copies of those, as the buffer will be overwritten on each read (if done as I suggest above).
fscanf() does recognize '\0', under select circumstances, but that is not the issue here.
Code needs to detect '\n'. fscanf(fp,"%s"... will not do that. The first thing "%s" directs is to consume (and not save) any leading white-space including '\n'. Read a line of text with fgets().
Simple read 1 line at a time. Then march down the buffer looking for words.
Following uses "%n" to track how far in the buffer scanning stopped.
// more room for \n \0
#define BUF_SIZE (N + 1 + 1)
char buffer[BUF_SIZE];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
char *p = buffer;
char word[sizeof buffer];
int n;
while (sscanf(p, "%s%n", word, &n) == 1) {
// do something with word
if (strcmp(word, "$zero$") == 0) fputs("0", stdout);
else if (strcmp(word, "$one$") == 0) fputs("1", stdout);
else fputs(word, stdout);
fputc(' ', stdout);
p += n;
}
fputc('\n', stdout);
}
Use fread() to read the file contents to a char[] buffer. Then iterate through this buffer and whenever you find a $ you perform a strncmp to detect with which value to replace it (keep in mind, that there is a 2nd $ at the end of the word). To replace $word$ with a number you need to either shrink or extend the buffer at the position of the word - this depends on the string size of the number in ascii format (look solutions up on google, normally you should be able to use memmove). Then you can write the number to the cave, that arose from extending the buffer (just overwrite the $word$ aswell).
Then write the buffer to the file, overwriting all its previous contents.

How to copy one line from long string C

I'm looking to copy the FIRST line from a LONG string P into a buffer
I have no idea how to make it.
while (*pros_id != '/n'){
*pros_id_line=*pros_id;
pros_id++;
pros_id_line++;
}
And tried
fgets(pros_id_line, sizeof(pros_id_line), pros_id);
Both are not working. Can I get some help please?
Note, as Adriano Repetti pointed out in a comment and an answer, that the newline character is '\n' and not '/n'.
Your initial code can be fixed up to work, provided that the destination buffer is big enough:
while (*pros_id != '\n' && *pros_id != '\0')
*pros_id_line++ = *pros_id++;
*pros_id_line = '\0';
This code does not include the newline in the copied buffer; it is easy enough to add it if you need it.
One advantage of this code is that it makes a single pass through the data up to the newline (or end of string). An alternative makes two passes through the data, one to find the newline and another to copy to the newline:
if ((end = strchr(pros_id, '\n')) != 0)
{
memmove(pros_id_line, pros_id, end - pros_id);
pros_id_line[end - pros_id] = '\0';
}
This ensures that the string is null-terminated; again, it omits the newline, and assumes there is enough space in the pros_id_line buffer for the data. You have to decide what is the correct behaviour when there is no newline in the buffer. It might be sufficient to copy the buffer without the newline into the target area, or you might prefer to report a problem.
You can use strncpy() instead of memmove() but it has a more complex loop condition than memmove() — it has to check for a null byte as well as the count, whereas memmove() only has to check the count. You can use memcpy() instead of memmove() if you're sure there's no overlap between source and target, but memmove() always works and memcpy() sometimes doesn't (though only when the source and target areas overlap), and I prefer reliability over possible misbehaviour.
Note that setting a buffer to zero before copying a string to it is a waste of energy. The parts that you're about to overwrite with data didn't need to be zeroed. The parts that you aren't going to overwrite with data didn't need to be zeroed either. You should know exactly which byte needs to be zeroed, so why waste the time on zeroing anything except the one byte that needs to be zeroed?
(One exception to this is if you are dealing with sensitive data and are concerned that some function that your code will call may deliberately read beyond the end of the string and come across parts of a password or other sensitive data. Then it may be appropriate to wipe the memory before writing new data to it. On the whole, though, most people aren't writing such code.)
New line is \n not /n anyway I'd use strchar for this:
char* endOfFirstLine = strchr(inputString, '\n');
if (endOfFirstLine != NULL)
{
strncpy(yourBuffer, inputString,
endOfFirstLine - inputString);
}
else // Input is one single line
{
strcpy(yourBuffer, inputString);
}
With inputString as your char* multiline string and inputBuffer (assuming it's big enough to contain all data from inputString and it has been zeroed) as your required output (first line of inputString).
If you're going to be doing a lot of reading from long text buffers, you could try using a memory stream, if you system supports them: https://www.gnu.org/software/libc/manual/html_node/String-Streams.html
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
static char buffer[] = "foo\nbar";
int
main()
{
char arr[100];
FILE *stream;
stream = fmemopen(buffer, strlen(buffer), "r");
fgets(arr, sizeof arr, stream);
printf("First line: %s\n", arr);
fgets(arr, sizeof arr, stream);
printf("Second line: %s\n", arr);
fclose (stream);
return 0;
}
POSIX 2008 (e.g. most Linux systems) has getline(3) which heap-allocates a buffer for a line.
So you could code
FILE* fil = fopen("something.txt","r");
if (!fil) { perror("fopen"); exit(EXIT_FAILURE); };
char *linebuf=NULL;
size_t linesiz=0;
if (getline(&linebuf, &linesiz, fil) {
do_something_with(linebuf);
}
else { perror("getline"; exit(EXIT_FAILURE); }
If you want to read an editable line from stdin in a terminal consider GNU readline.
If you are restricted to pure C99 code you have to do the heap allocation yourself (malloc or calloc or perhaps -with care- realloc)
If you just want to copy the first line of some existing buffer char*bigbuf; which is non-NULL, valid, and zero-byte terminated:
char*line = NULL;
char *eol = strchr(bigbuf, '\n');
if (!eol) { // bigbuf is a single line so duplicate it
line = strdup(bigbuf);
if (!line) { perror("strdup"); exit(EXIT_FAILURE); }
} else {
size_t linesize = eol-bugbuf;
line = malloc(linesize+1);
if (!line) { perror("malloc"); exit(EXIT_FAILURE);
memcpy (line, bigbuf, linesize);
line[linesize] = '\0';
}

Resources