Writing and Reading to and from a binary file C language - c

I was given an assignment to manage a sort of music store. In the database (which is saved as a .dat file) we have an artists name, and the album.
I'm having problems writing and reading the file.
First thing is, even if I don't write anything, just create the file, and then open the file in notepad, i see gibberish and letters in chinese or japanese.
Even if i write to the fail, or read from it using visual studio, this doesnt seem to change. Here's my code:
I opened the file with:
p=fopen("database.dat","w+");
The add item function:
void add_item(char* artist,char* record,FILE* p) //adds an item with artist and record to store
{
item node;
int item_size=sizeof(item);
rewind(p);
strcpy(node.artist,artist);
strcpy(node.record,record);
fwrite(&node,item_size,1,p);
printf("Data added\n");
}
item is the struct that is used to define a single item in the store. it has 2 fields, string artist and string record.
typedef struct item
{
char artist[100],record[100];
}item;
This is for reading:
void print_file(FILE* p) //print the entire file
{
int size=sizeof(item);
item node;
rewind(p);
while(!feof(p))
{
fread(&node,size,1,p);
printf("%s - %s\n",node.artist,node.record);
}
}
if i use print_file, i see gibberish, if i actually open the file with notepad, i see japanese.
Help! :D
edit: Just discovered something. If I add an item, and then read the file, i will read the item. But if i run the program again, and try to read the file immediatly, i see gibberish.

Problem is with "w+", it will truncate existing file to zero length and open it for write & read. When you run the program second time this is what happens and your read (before write) returns gibberish.
More on fopen here

I'm guessing based on your symptoms that the strings are defined as char* and not char[], which would be your problem. String IO on files doesn't work that way. Keep in mind that a string is actually of type char*, that is, a pointer to one or more 8-bit integers. When you write the string to the file, you're actually writing the value of the address itself, not the characters it points to.
You should use the fprintf() function to write strings to files:
http://www.cplusplus.com/reference/cstdio/fprintf/
And then fscanf() to read them:
http://www.cplusplus.com/reference/cstdio/fscanf/
In general, if you're going to use a struct as a file format, you can't put pointers in the struct. You could put char[] values, because they aren't handled the same as pointers, but that would require a hard limit on string size. This is one of the reasons why structs as file formats are discouraged by some-- better to read and write the values one at a time, handling strings and so on appropriately.
The reason it works when you read the file back immediately (before quitting your program) is because that pointer address is still valid for that string. But the string itself never got written into the file.

Related

In C how do you delete data (a structure) from a binary file IN UNIX?

I have this struct of people:
struct patient{
char name[100];
char address[100];
int age;
}
struct patient p1;
int f;
f = open("patients.dat",O_RDWR,S_IRUSR|S_IWUSR);
And I wrote a binary file by using f = open... and write(f,&p1,sizeof(struct patient))
Now there is a task where I have to delete certain people from the binary file (for example those whose name I enter) I thought about changing the last person with the one I wanna delete, but then there is still the last one in the file to be deleted.
Is there any way to delete that from the file, I don't know, like changing the p1's name,address,and age to '\0', but it didn't work, it still shows "ghost" things.
You cannot remove (or add) data from/to the middle of a file in POSIX -- only at the end. You can overwrite data in the middle, but that does not change the size of the file.
So if you want to delete a data record from the middle, you need to overwrite it with other data (possibly moving all subsequent data down, or possibly by just copying the last element), and then change the size to remove the last element (which is now a duplicate).
You change the size of a file with ftruncate
And I wrote a binary file by using f = open... and write(f,&p1,sizeof(struct patient)).
Please read the documentation of open(2) and write(2) (and also of lseek(2), close(2), ftruncate(2), inode(7), credentials(7) ...).
You need to check the return value of open and write.
As with most other syscalls(2). See also errno(3) and read Advanced Linux Programming
You could adopt a convention (and document it): every patient entry (inside your binary file) whose name starts with a 0 byte is cleared and could be reused.
You might also consider using mmap(2) to access your binary file. This would work well if your binary file is not too big (less than a few gigabytes in 2020). see also this answer.
Actually, you could consider using some database library, such as sqlite. Or some indexed file library, such as gdbm.
And both sqlite and gdbm are open source software coded in C for Linux. You'll learn a lot by studying their source code.

Difficulty in reading in a DNA sequence file using C code

I am currently taking Coursera Bioinformatics course, and doing the programming assignments. Being, a first year undergraduate just learning C, while I am aware that python is the language that's becoming much more popular for bioinformatics, I am challenging myself to implement every single algorithm in the course in the C language to master it, as it will also benefit me in all my CS courses here, which use C/C++ a lot.
In working on one of the assignments, our goal is to write a program that can take in a shorter DNA pattern and compare it with a long complete DNA strand, and the output the count, which is the number of times the shorter DNA pattern appears in the long complete DNA strand. We are given a file consisting of all the inputs we need, and the gold output, but I am having great trouble in parsing the file properly, even though I've consulted my textbook and numerous documentation. I actually have no problem implementing the algorithm itself; I tested it using smaller hardcoded character arrays in the program itself.
The input file is as follows:
Input
TAACAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTGAGCCTTTGAGCCTTTTAGCCTTTCAGCCTTTAGCCTTTAAGCCTTTCCGCATCGAGCCTTTCAGCCTTTCGTAGCCTTTCGAGCCTTTAGCCTTTCAGCCTTTAGAGCCTTTAAGCCTTTAGTCGATGTAGCCTTTAGCCTTTAGCCTTTAGCCTTTGCAGCCTTTAGTAGGCAAGCCTTTTAGCCTTTGAGCCTTTCGAGCCTTTCTCGCTAGCCTTTAGCCTTTGGTGAGCCTTTTAGCCTTTAGCCTTTTCGCAGCCTTTTGAGCCTTTCTTGTTTGAATGGCAAGAGCCTTTTCGAGCCTTTAGCCTTTGAGCCTTTCAGCCTTTAAAAGCCTTTCGTTAGCCTTTAGCCTTTATCGAGCCTTTAAGCCTTTTATGCAAAGCCTTTGAGCCTTTAGCCTTTCAGCCTTTCAGCCTTTCATTGACAAGCCTTTCAGCCTTTAGCCTTTAGCCTTTCTCAGCCTTTGAGCCTTTGAGCCTTTGTCGAGCCTTTTTTCAGAGCCTTTTAGCCTTTAGCCTTTGAGCCTTTAGCCTTTAGCCTTTTACGAGCCTTTGCAAGCCTTTCAGCCTTTCCAAGCCTTTAGCCTTTGCTTTAGCCTTTCATGGGATAGCCTTTAGCCTTTATTAAGCCTTTTTTATCAAGCCTTTGTAGCCTTTAAGCCTTTCCAGCCTTTAGGAGCCTTTGTATAGCCTTTTGAGCCTTTCTACAGTAAAGCCTTTTTTGGTCAGCCTTTCTAGCCTTTGATAGCCTTTCTGAAGCCTTTGGCGGAGCCTTTCTGTTAACAGCCCAGCCTTTCTCATAGCCTTTGCGGTATCAGCCTTTGCAGCCTTTCTGGAGCGATAGCCTTTCAGCCTTTCCGAGCCTTTTTCAGAGCCTTTGAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTCCGAGCCTTTTCAGCCTTTACAGCCTTTTTAGCCTTTGAGCCTTTCACAGCCTTTGAGCTAGCCTTTAAGTTAAAGCCTTTAGCCTTTCAGCCTTTACATTAGCCTTTTAGCCTTTTCAGCCTTTAGAGCCTTTAGCCTTTGCTGAAGCCTTTAGTAGCCTTTAGCCTTTGCGAAGCCTTTCTGGTGCAACAAGTGAAGCCTTTGCCCTAGCCTTTGCTAGCCTTTCCGAGCCTTTGTCGATATAGCCTTTAGCCTTTAGAAAGCCTTTAGCCTTTGCTAGCCTTTATAGCCTTTAGCCTTTAGCCTTTCCCAGCCTTTAGCCTTTATCCTAAGCCTTTAGCCTTTTCCAGAAGCAGCCTTTTGATCAGAGCCTTTCTTCGGACTGCTCCCAGCCTTTAGCCTTTCAGCCTTTAGCCTTTAGCCTTTCAGCCTTTAGCCTTTTCTAGCCTTTAGCCTTTGTGTAGCCTTTCTAGCCTTTACGAGCCTTTGCCCAGCCTTTCCCAGCCTTTGAGAGCCTTTACCATATAGCCTTTACATAAGCCTTTGATGAGCCTTTAGCCTTTCGAGCCTTTCAGCCTTTACTCCAGCCTTTATAGCCTTTATATAGCCTTTCCTGTTAGGCCGTCGGTGCAGCCTTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTTAGCCTTTCAGCCTTTGCTAGCCTTTCCAGCCTTTACAGCCTTTTACCGAGCCTTTCCATCAGCCTTTAGCCTTTATATCTCTGATCGGGTAGCCTTTCGGCTAGCCTTTGGTAGCCTTTTTCAGCCTTTATGTAAAGCCTTTGTAGCCTTTGATGTGAGCCTTTAGAAGCCTTTGTCAGCCTTTTAGCCTTTAGCCTTTAGCCTTTTTAAGCCTTTTACAAGCCTTTACTCAGAGCCTTTACGAGCCTTTTAGCCTTTGCAGCCTTTTAGCCTTTTGATGAGCCTTTGCGGAGCCTTTTCTTTCAGCCTTTTGGCAGCCTTTAGCCTTTGTACAGCCTTTTTGAGCACAGCCTTTCGCGAAAGAGCCTTTATCAGCCTTTAAGCCTTTGCTAGCCTTTAGCCTTTTGAGCCTTTAGCCTTTGTATCTGTCTATCATCGAGCCTTTCTAAGCCTTTGCGGAAGCCTTTAGCCTTTGTCAGCCTTTCAAAGCCTTTAGCCTTTTATTCAAGCCTTTGAACCATAGCCTTTGGCAGCCTTTCAAGCCTTTGACGACAGCCTTTAGCCTTTCATTAGCCTTTTAGGAGGCTCATCCGTCTAGCCTTTAAATAGCCTTTAGCCTTTATAGCCTTTAAGCCTTTAGCCTTTTAAGCCTTTGAGAGCCTTTAAAGCCTTTAAACCAAGCCTTTGCGAAGCCTTTAGCCTTTAGCCTTTCGCAGCCTTTAGCCTTTCGAGCCTTTTGTAGCCTTTTGGAGAGCCTTTGGGCAAGCCTTTAGTATAAGCCTTTAGCCTTTTAGCCTTTCAAGCCTTTAGCCTTTAAGCCTTTGGCACAGCCTTTTGAGCCTTTCAGGAGCCTTTATTGGTGAGCCTTTAGTATAGCCTTTTCAGCCTTTGGCAGCCTTTAATGAAAGCCTTTGCTCAGCCTTTTTCAGCCTTTACAGCCTTTCAAGCCTTTAGCCACAAGCCTTTAGCCTTTAGCCTTTCAGCCTTTGCGAGCCTTTGTAGCCTTTAAGCCTTTCAGCCTTTAGCCTTTAGCCTTTTTCAAGCCTTTTCAGCCTTTCAAAGCCTTTCAAGCCTTTGAAGCCTTTCTAAGCCTTTGAGCCTTTGAGCCTTTGAGCCTTTAGCCTTTGTTCCTAGCCTTTATAGCCTTTTAGGCAGCCTTTCAGAGCCTTTTAAGCCTTTAGCCTTTCAGAAAGAGCCTTTAGCCCAGCCTTTTGATTAGCCTTTAGGGAACAGCCTTTAGCCTTTTAAGCCTTTGGTATACAATCAACGCAGCCTTTAGCCTTTAAGCCTTTTGGAGCCTTTCAGACTGATCCCAGCCTTTCAGCCTTTCTCAGCCTTTAAGCCTTTCTCCAAGCCTTTTGAGCCTTTTCGAGCCTTTAGTGAGCCTTTTGAAGCCTTTGTTTAGCCTTTTGTATAGGGTAGCCTTTAGCCTTTCCGGAAGCCTTTTGTAGCCTTTAAGCCTTTTGTCCGGGAAAGCCTTTGTAAGCCTTTAATGCAGCCTTTCCTATAGCCTTTAAGCCTTTCAGCCTTTTGGAGCCTTTTCTCAGCCTTTAGCCTTTCGCCAGCCTTTCTCCCGAGCAGCCTTTTAGAAAAAGCCTTTTAGCCTTTTACCGTGGACAGCCTTTCACGAGCCTTTACAGGCTAGCCTTTAGCCTTTGCTAGCCTTTTCCCAGCCTTTTGAGCCTTTAAGCCTTTCTAAGTTCTACGCTTGGGCTAAAGCCTTTAGCCTTTAAGCCTTTCAGCCTTTTGCAGCCTTTATATAACTTGAGCCTTTAGCCTTTAGCCTTTATAGCCTTTAGCCTTTTAGCCTTTTATATCCCTTAAGCCTTTGTAAGCCTTTAGCCTTTAAGCCTTTACGAGGAAAGCCTTTCATGCAGCCTTTAGCCTTTAGCCTTTGAGCCTTTCCAGCCTTTCAGCCTTTCAGCCTTTAGCCTTTAGCCTTTAGCCTTTTAGCCTTTATGAGCCTTTATAGCCTTTAGCCTTTTCACCAGCCTTTCCAGATGCACAAGCCTTTCAGCCTTTAGCCTTTCGAGCCTTTGGCTTATAGCCTTTCATCAGCCTTTCTAGCCTTTTAGCCTTTAGCCTTTAGCCTTTTCTAGCCTTTCAGCCTTTAGCCTTTTCGAAGCCTTTAGCCTTTTTAGCCTTTAGCTCAGCCTTTAGCCTTTATCTAACAGCCTTTAGCCTTTAGCCTTTAAAGCCTTTATGTCCAATTCTAACAGCCTTTAGCCTTTAAAGCCTTTGCAGCCTTTGAGCCTTTTAGCCTTTGAAGCCTTTAGCCTTTGTCAGCCTTTCCAGCCTTTTAGCCTTTAGCAGCCTTTAGTACGCCAGCCTTTAGCCTTTGTATAAGCCTTTAGCCTTTAGCCTTTCCACTAGCCTTTAGCCTTTAGAGGAGCGATAGCCTTTCAGCCTTTAGAAAGCCTTTGTTGCTGCTAGCCTTTGGGTTCTCAGCCTTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTTGTAGCCTTTTACATAGGATTGATTCAAAAGCCTTTTTGAGCCTTTCTGCATTAGCCTTTTCCTCTAGCCTTTAGCCTTTCGCAGCCTTTAGCCTTTTAGAGCCTTTAGATAGCCTTTCGCGACAGCCTTTTGTTTAGCCTTTAGCCTTTGTTAGCCTTTGAGCCTTTGAGCCTTTTAGCCTTTCCTAGCCTTTCAGCCTTTCCAAAGCCTTTGACAGGGTGTAGCCTTTCTAGCCTTTTTAGCCTTTAGCCTTTAAACTTAAGCCTTTTTAGCCTTTAGCCTTTCAACCCAGCCTTTAGCCTTTTAAGCCTTTAGCCTTTAGCCTTTTTAGAAGCCTTTTAGCCTTTAGCCTTTGGAGCCTTTCAGATCTCAGCCTTTTCGAGCCTTTTAGCCTTTTCAGAAAAGTAGCCTTTTTAGCAGCCTTTTAAAGCCTTTGGAGCCTTTAGCCTTTAGCCTTTGTAGCCTTTTCCCAAAAGCCTTTACAGCCTTTGTGAGCCTTTTAGTTCGTTTGAGCCTTTCCAGCCTTTCAGCCTTTAGCCTTTATAGCCTTTTGCGAGAAGCCTTTAAGCCTTTAGCCTTTTGACGTTCTAGAGCCTTTGGAGCCTTTCACGCGAGCCTTTCAAGCCTTTGACTCCGCAGCCTTTTCGCGACCAGCCTTTGCCGTGCCAGCCTTTAGCCTTTCAACACAGCCTTTAGCCTTTGGGCCGCAGAGCCTTTGAGTAGCCTTTAGCCTTTGACAGCCTTTAGCCTTTCTAGCCTTTGCAGCCTTTGTCTAGGTAGCCTTTAGCCTTTAGCCTTTCTAGCCTTTTAGCCTTTAGCCTTTTGAGCCTTTTGGAAGCCTTTCAGCCTTTAGCCTTTCGCGAGCCTTTGAGCCTTTACCCAGCCTTTACGGAGCCTTTAGCCTTTCCCATAGCCTTTAGCCTTTCCAGCCTTTAGCCTTTTAGCCTTTCAAATCTAAGCCTTTCGCATATATGGTAGCCTTTAGCCTTTAGCCTTTATGGTCCTTCAGTTTGAGCCTTTTAGAGCCTTTAAAGGAGCCTTTGTAAGACGAAGGTAGCCTTTAGCCTTTGCCAGCCTTTTTAGCCTTTAGCCTTTAAAAAGCCTTTGAGCCTTTAGCCTTTAGCCTTTGAGCCTTTAGCCTTTTCTCCTAGCCTTTCATAGCCTTTGAGCCTTTAGCCTTTTAGCCTTTTAGCCTTTAGCCTTTAGCCTTTGGAGGTCAGCCTTTATGTTAAAGCCTTTAGTTCCCAGCCTTTCAGCCTTTAGCCTTTAGCCTTTGAGCCTTTCAGCCTTTTAGCCTTTCAGCCTTTCAGCCTTTGAAGCCTTTTGTAGCCTTTGCCCGAGCCTTTAGCCTTTAGCCTTTCCCAACCCTGATCCGTAGCCTTTGGGCTGATCCTGAGCCTTTTCAGCCTTTAAGCCTTTAGCCTTTAGCCTTTGAGAAGCCTTTAGCCTTTCAGCCTTTAACAGCCTTTAAGCCTTTATAGCCTTTAGCCAGCCTTTGCAGCCTTTCAGTAGCCTTTAGCCTTTAGCCTTTCTAGCCTTTCTTGGAGCCTTTCCCAGCCTTTAAGAGCCTTTAGCCTTTTAGCCTTTCAGCCTTTAGCCTTTTCGTAGCCTTTGACCATTGTCAGCCTTTCTACTGAGCCTTTCATAGCCTTTTTTAGCCTTTCTAGCAGCCTTTGGAGCCTTTAGAAGAGCCTTTAGCCTTTTAAGCCTTTGAGCCTTTAACACAAGCCTTTATCTGGGCCGCGAGCCTTTTCAACCTAACTACAGCCTTTCTAAGCCTTTAGCCTTTAGCCTTTCAGCCTTTTAGCCTTTACCGAGCCTTTGCGGGAAGCCTTTAAAGAGCCTTTAGAAAAAGCCTTTGGGATAGCCTTTCCAGCCTTTCCAGCCTTTTTAGCCTTTTCCTCAAGATTTAGCCTTTGATGAAGCCTTTGAGCCTTTAGCCTTTCATTGAGCCTTTTAAGCCTTTCAGCCTTTTCTCATCAGCCTTTCACAGCCTTTCTACAGCCTTTAGCCTTTAGCCTTTGGAGCCTTTTCGCCCCGAGCCTTTAGCCTTTAGCCTTTTAGCCTTTCAGCCTTTGTAGCCTTTAGAGCCTTTGCTTAGCCTTTAGCCTTTAGTAGCCTTTAGATAGCCTTTTCTGGGAGCCTTTACAGCCTTTAGCCTTTAGCCTTTAGCCTTTTAAAGCCTTTCCCCAAAGCCTTTGTTGAGCCTTTAGCCTTTACAGTCTAGCCTTTAGCCTTTCAAGCCTTTACCTTAGCCTTTGGCAGCCTTTCTAGCCTTTAGCCTTTTCAGCCTTTAGCCTTTAAGCCTTTAGCCTTTTCGAGCCTTTGAGCCTTTAAGCCTTTATAAAAAGCCTTTAGCCTTTAAGCCTTTACCAGCCTTTAGCCTTTCAGCCTTTTATCGGAAAGCCTTTAAGCCTTTTAGCCTTTCAGCCTTTGAGCCTTTCAGCCTTTAGCCTTTGGCAAAGCCTTTTTGCAGCCTTTGGAAGCCTTTAGCCTTTTTCAAGCCTTTCAGCCTTTAGCCTTTGCACGTATTAGGAAGCCTTTTACTCTAAGCCTTTATCAGCCTTTAGCCTTTAGCCTTTAAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTAGCCTTTACGGTCAGCCTTTGGTAGCCTTTTCAGCCTTTAAGCCTTTAAGCCTTTGAGCCTTTAGCCTTTAGCCTTTGAGCCTTTAAGAGCCTTTCAGCCTTTTTTAGCCTTTTAGCCTTTGAGCCTTTCCTAGCCTTTCAAGCCTTTGAGCCTTTCGAAGCCTTTTAGCCTTTAGCCTTTAGCCTTTATGGAGCCTTTAGCCTTTAGCGGAGCCTTTGAGCCTTTACAGAGCCTTTAGCCTTTAGCCTTTTAAGCCTTTTGCAGCCTTTCAAAGAGCCTTTAGCCTTTACGGAGCCTTTAGCCTTTAAGCCTTTCTCACTAGCCTTTTTAGCCTTTGAGCCTTTATGACGAAGCCTTTAGCCTTTTGTCGTGACCTGAGCCTTTAGCCTTTACAGCCTTTCAGCCTTTAGCCTTTCTTAAAAGCCTTTTAGCCTTTTTGAGCCTTTACAGCCTTTCGAGCCTTTGAGCCTTTCCCAGCCTTTGAAGCCTTTTGGACAGAGCCTTTGCTAGCCTTTAGCCTTTTAGCCTTTAGCCTTTAGCCTTTACTTAGCCTTTTAGCCTTTATGGATAGCCTTTAGCCTTTGAGAGCCTTTGCCTAGCCTTTGAAGCCTTTTTAGCCTTTAACGAGCCTTTAGCCTTTAGCCTTTAGCCTTTAAGCCTTTAGCCTTTCGAGCCTTTCTCAGCCTTTGTAGCCTTTAGCCTTTAGAGCAGCCTTTAGCCTTTCCAGCCTTTAGCCTTTTCAGCCTTTAGCCTTTCAGCCTTTGCCCCGAGCACGTAGCCTTTACAGCCTTTAGCCTTTAGCCTTTTAGCCTTTACAGCCTTTTGAGCCTTTAGCCTTTGAAAGCCTTTTGAAGAGCCTTTCAGCCTTTCTTACTAGCCTTTGCAGCCTTTTAGCCTTTCCGAGCCTTTGATAGCCTTTGTCGGTAAGCCTTTGTAGAGCCTTTAGCCTTTAAGCCTTTGGTAAAGAGCCTTTTCAACAGCCTTTCGGAGCCTTTCGCTACAAGCCTTTTGGCCTAGCCTTTAGCCTTTCAGCCTTTCAAGAGCCTTTAGCCTTTCGCAGCCTTTATAGCCTTTCAGCCTTTCAGCCTTTAGCCTTTAGAGCCTTTGAGCCTTTCGTTATCTAAGCCTTTACTCCATAGCCTTTGAGCCTTTAGCCTTTGTCAGTCGAGCCTTTGTTCTTGAGCCTTTAGCCTTTGCAGCCTTTAGCCTTTTGTTTGTGGAGCCTTTAGCCTTTGAATACAGCCTTTAGCCTTTAGCCTTTAGCCTTTCTAGCCTTTCAGCAGCCTTTGTAGCCTTTGAACCAGCCTTTAGCCTTTTAGCCTTTTCCTTAGCCTTTCCAGCCTTTTAGTGAGCCTTTAGCCTTTGCACCAGCCTTTAGCCTTTAGCCTTTCAGCCTTTAGCCTTTCGAGCCTTTTAGCCTTTGAACAGCCTTTTGAGCCTTTGACGATATGAGCCTTTAGCCTTTTGTAGCCTTTTTTAGCCTTTGAACAGCCTTTGGAGTCAAGCCTTTACGCAGCCTTTCCAGCCTTTCAGCCTTTAGCCTTTGGTCAGCCTTTTCAGAGCCTTTGCGGTTAGCCTTTGAATAGCCTTTAAAGCCTTTCTCAGCCTTTGTAAGCCTTTAGCCTTTTAGCCTTTGTGAGCCTTTCAGCCTTTCCGAGCCTTTAGCCTTTGCCTACGGAAGCCTTTAGCCTTTGCTATCAGCTTGAGCCTTTTAGCCTTTAGTAGCAGCCTTTTAGCCTTTTAGCCTTTCAGCCTTTCTCTAGCCTTTAGCCTTTATCCGAGCCTTTACCAGCCTTTGAGCCTTTAGCCTTTATAGCCTTTATACGTAGCTAGCCTTTAGCCTTTAGAGCCTTTACCCTGTACCAGCCTTTAAGCCTTTCTCGTGAAGCCTTTAGCCTTTGAGCCTTTCGAGCCTTTAGCCTTTAGCCTTTAAGCCTTTTTGTGTGAGCCTTTAGCCTTTGGGGAGCCTTTAGCCTTTCAGCCTTTTAGCCTTTTCAAGCCTTTAGCCTTTAGCCTTTTGAGCCTTTAAAGCCTTTAGCCTTTAGGTAGCAAGCCTTTCGTTATAGCCTTTTATAAGCCTTTTTTAATGAGCCTTTAGCCTTTAGCCTTTGAGCAGCCTTTAGCCTTTAGTAGCCTTTTGATATTAGCCTTTCAGCCTTTAGCCTTTCCCCGAGCCTTTGTTAGAGCCTTTGCAGCCTTTGGAGCCTTTAGCCTTTCGGAGCCTTTAGCCTTTGGGACAGCCTTTAGCCTTTAGCCTTTGAAGCCTTTTGCAGCCTTTAAGATAGCCTTTGAGCCTTTTCAGCCTTTACAGCCTTTAAGCCTTTAGCCTTTGAGCCTTTGAGCCTTTTGAGCCTTTTAGCCTTTGTTGCAGCCTTTAGCCTTTAGCCTTTTAGCCTTTAGCCTTTAGCCTTTGAGCCTTTGAGCCTTTTAGCCTTTAGCCTTTGAGCCTTTTGGACAGCCTTTCTGAGCCTTTCGTAGCCTTTACCGCAAGCCTTTATAGCCTTTGAAGAGGAGCCTTTATAGCCTTTCAGAAGCCTTTTAAGCCTTTTCGCAGCCTTTTATCAGCCTTTAGCCTTTAGCCTTTTAGCCTTTCAGCCTTTAGCCTTTACAAGCCTTTAGCCTTTAGCCTTTATCAAGCCTTTCTAGCCTTTGAGCCTTTGTGAGCCTTTGTGTCAGCCTTTCAAGCCTTTTTAAGTACAGCCTTTACTCAGCCTTTATAGCCTTTGTCGTAAGCCTTTAGCCTTTAGCCTTTGAAAAGCCTTTACGCACAGACAAGTAGCCTTTCAGCCTTTAAGCCTTTGAGTATGTCCTTGAGCCTTTAAAAGAGCCTTTGGTAGCCTTTAGCCTTTAGCCTTTTATAGCCTTTAAGCCTTTAAGCCTTT
AGCCTTTAG
Output
294
First line is the string "Input"
Second line is entire DNA sequence (which as the label "Input" implies, is my input DNA sequence, and which I will refer to it throughout as text)
Third line the shorter DNA sequence, which I will refer to as pattern.
Fourth line is string "Output"
Fifth line is the gold output, the number of counts which my program should be returning.
I tried parsing the file with the following code:
int main(int argc, char **argv)
{
if(argc>1)
{
FILE * dataset = fopen(argv[1], "r");
if(dataset==NULL)
{
printf("File count not be opened or found!\n");
return 1;
}
char in_label[1000], dna_text[10000], dna_pattern[1000], out_label[1000];
int count=0;
fscanf(dataset, "%s, %s, %s, %s, %d", in_label, dna_text,dna_pattern, out_label,&count);
... and other code below that calls the counting algorithm which I won't show here ...
While my call to fscanf does return me in_label correctly, it does not work for the remaining arguments. Basically when I printf out each of my in_label, dna_text, dna_pattern, out_label and count, only in_label correctly gives me the string Input, but the rest all are garbage. I'm really confused, because I thought that the fscanf function automatically skips linefeeds or spaces when reading in from the stream. So why did Input get correctly read into in_label, but not the others???
Also a second question I have is about one shortcoming that I'm aware of in my program, which are the hardcoded array sizes. I know about malloc function, and just learnt about it this week in class, but I just can't figure out how to use it here. Because in order to use malloc, we need to be able to at least "soft code" the size of our array in advance, and here, I just can't imagine how I would be able to tell the compiler, in any "soft coded" manner, what my array sizes will be especially for the dna_text array, which varies greatly from dataset to dataset.
C is really challenging, a world away from python whose convenience I've been so spoilt by. I would greatly appreciate any help to overcome this issue, so that I can move on with my learning of bioinformatics. Thank you very much!
You can use fstat() to get the file size and malloc() for allocating proper buffers
Use fgets() for reading text files line by line. Do not use fscanf() at all.
Never read a file at once as you should never know what exactly went wrong if your reading API returned an error. My experience tells me that reading line-by-line is the best strategy when working with text files. Just be sure you have a buffer large enough for storing the longest possible line.

Flex - Function that compares strings in C

I'm programing with flex using C, a C code compiler and I want to compare strings on a file, in this case my symbol table, with yytext. If yytext and the respective string of the table are the same one it should exit the function and if there are no instances on the table then the function will write the string down on the symbol table.
This is my function:
search (char *x){
int c;
int n = 0;
char *cdn;
while ((c = fgetc(comp)) != EOF){
fscanf(comp, "%s", cdn);
if (strcmp(cdn, yytext) == 0){
n++; //if n>0 when it finishes searching the file then there's a copy on the file
}else{}
return 0;
}
if (n==0){
fprintf(comp, "%d\t %s\n", pos++, yytext); //will write if there's no copy in the table
}else return 0;}
The input for the function is yytext, yytext will have for example "a".
After running this, the program doesn't write anything and it need to be closed manually. (More like program.extension has stopped working.)
Can someone help me with this?
First of all, putting your symbol table into a file is a very debatable design choice:
symbols are checked very frequently, so file accesses will slow down your compiler a lot.
symbols will be associated with a lot of grammatical informations (for instance, they may represent a variable with an associated type), so storing only the names will not be enough to make later stages of your compiler work.
If you store all symbol informations into a file, you will have to re-read the entire file and convert each bit of information into a memory representation each time you will want to access a given symbol.
This is not only inefficient, it will also force you to write tons of unnecessary and complicated code.
Now for your search function.
Regardless of the current bugs, what your function does is not a search, though you need to search your file to make it work.
What your function does is create a unique list of yytext values. The "search" you're performing inside it simply makes sure an already present value is not duplicated.
The very first thing to do would be to give it a less misleading name, or modify it so that it does what its name implies.
Now for the bugs
If for some reason you still want to use a file, I suppose you will put each name on a single line.
So why not use fgets(), that will take care of the line endings for you?
Whatever method you are using to read each name, you will have to provide a buffer with actual storage space for the string, not just an unitialized pointer.
If your input string is yytext, your x parameter will never be used.
Lastly, your search function (which inserts the current yytext value into an unsorted list), has no reason to return anything (except an error code if your disk gets full and you can't add new names to the list).

Overwriting an ID3v2 tag in C using fwrite()

For some reason, when I call fwrite(), it doesn't overwrite the file pointed to by the pointer. When I run the program, it displays that the tag of the file was replaced by the inputted tag specified by the user. After I check the file using a separate code to see the current tags, the tag wasn't replaced at all. I believe there is something wrong with how I used fwrite and into thinking that it would really overwrite the file. The tag in this case is the title. Here is the code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include "title.h"
char title[30];
char title1[30];
int main(int argc, char *argv[]){
FILE *fPtrs;
// print the current tag
fPtrs = fopen(argv[1],"r+b");
fseek(fPtrs,-125,SEEK_END);
fread(title1,1,30,fPtrs);
strncpy(title,title1,30);
printf("%s\n",title);
fclose(fPtrs);
// call to overwrite the current tag
fPtrs = fopen(argv[1],"w+b");
title_tag(fPtrs,argv[2]);
fclose(fPtrs);
// print the tag of the should be overwritten file
fPtrs = fopen(argv[1],"r+b");
fseek(fPtrs,-125,SEEK_END);
fread(title1,1,30,fPtrs);
strncpy(title,title1,30);
printf("%s\n",title);
fclose(fPtrs);
return 0;
}
#include<stdio.h>
void title_tag(FILE* fName, char title_s[]){
fseek(fName,-125,SEEK_END);
fwrite(title_s,1,30,fName);
}
This is a project I'm working on in the university and we were told that we were not allowed to use the id3lib -.-
You aren't doing any input validation. Your program depends on there being two command-line arguments, but you aren't testing for argc == 3 or anything similar.
There's no error checking. After blindly assuming that the first command-line argument was a correctly specified filename, you aren't checking the return value of the fopen() call to verify that the call succeeded. You also aren't doing any checking to verify that the file you've opened has an ID3v1 tag at the end, you're simply assuming it is present.
When you call fread(), you're not making sure that what you read was a valid string, but you're then proceeding to treat it as one. (You're reading in a 30-byte field, but it doesn't appear to me that it's required to be a null-terminated 30-byte field. What happens if the track title is 30 bytes long?
Why are you copying the title from title into title1? Totally leaving aside that you haven't ensured that it's null-terminated, and therefore should be either null-terminating it or using memcpy(), the only thing you're doing with the copy is passing it to printf(). What did copying it gain you?
You shouldn't be closing/reopening/closing/reopening the file. The first time you open it, you're using mode "r+b", which opens the file for reading and writing, so all you needed to do was that fseek(). (And, of course, sanity-checking to see if you can update an existing ID3v1 tag or if you need to append one.)
You really shouldn't be closing/reopening/closing/reopening the file. The open mode "w+b", which you're using when you unnecessarily close/reopen the file to update it, is documented to truncate the file to zero length if the file already exists, or create a new file otherwise.
What you're passing to your title_tag() function.
Because of what you're doing with it, the second argument to title_tag() needs to be a block of at least 30 bytes of storage. You're passing it argv[2], which is not guaranteed to be such. You'll need to copy argv[2] into your own storage, and pass that instead.
You've written the title_tag() function such that someone might mistakenly think the second argument in the function is an array. However, due to the mechanics of the C language, within the context of the function the argument is simply a pointer-to-char. (You're allowed to write it as char title_s[], but in C, if you pass an array as an argument to a function, what the function actually gets is a pointer to the first element of the array. For most purposes, this makes no functional difference.)
There. Hope that helps.
highly recommend using the id3lib. will allow you better control over the manipulation of id3 tags rather than just trying to write to the file raw
http://id3lib.sourceforge.net/

Writing structure into a file in C

I am reading and writting a structure into a text file which is not readable. I have to write readable data into the file from the structure object.
Here is little more detail of my code:
I am having the code which reads and writes a list of itemname and code into a file (file.txt). The code uses linked list concept to read and write data.
The data are stored into a structure object and then writen into a file using fwrite.
The code works fine. But I need to write a readable data into the text file.
Now the file.txt looks like bellow,
㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀\䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠\㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈\䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔\䥆㘸䘠㵅㩃䠀䵏㵈
I am expecting the file should be like this,
pencil aaaa
Table bbbb
pen cccc
notebook nnnn
Here is the snippet:
struct Item
{
char itemname[255];
char dspidc[255];
struct Item *ptrnext;
};
// Writing into the file
printf("\nEnter Itemname: ");
gets(ptrthis->itemname);
printf("\nEnter Code: ");
gets(ptrthis->dspidc);
fwrite(ptrthis, sizeof(*ptrthis), 1, fp);
// Reading from the file
while(fread(ptrthis, sizeof(*ptrthis), 1, fp) ==1)
{
printf("\n%s %s", ptrthis->itemname,ptrthis->dspidc);
ptrthis = ptrthis->ptrnext;
}
Writing the size of an array that is 255 bytes will write 255 bytes to file (regardless of what you have stuffed into that array). If you want only the 'textual' portion of that array you need to use a facility that handles null terminators (i.e. printf, fprintf, ...).
Reading is then more complicated as you need to set up the idea of a sentinel value that represents the end of a string.
This speaks nothing of the fact that you are writing the value of a pointer (initialized or not) that will have no context or validity on the next read. Pointers (i.e. memory locations) have application only within the currently executing process. Trying to use one process' memory address in another is definitely a bad idea.
The code works fine
not really:
a) you are dumping the raw contents of the struct to a file, including the pointer to another instance if "Item". you can not expect to read back in a pointer from disc and use it as you do with ptrthis = ptrthis->ptrnext (i mean, this works as you "use" it in the given snippet, but just because that snippet does nothing meaningful at all).
b) you are writing 2 * 255 bytes of potential crap to the file. the reason why you see this strange looking "blocks" in your file is, that you write all 255 bytes of itemname and 255 bytes of dspidc to the disc .. including terminating \0 (which are the blocks, depending on your editor). the real "string" is something meaningful at the beginning of either itemname or dspidc, followed by a \0, followed by whatever is was in memory before.
the term you need to lookup and read about is called serialization, there are some libraries out there already which solve the task of dumping data structures to disc (or network or anything else) and reading it back in, eg tpl.
First of all, I would only serialize the data, not the pointers.
Then, in my opinion, you have 2 choices:
write a parser for your syntax (with yacc for instance)
use a data dumping format such as rmi serialization mechanism.
Sorry I can't find online docs, but I know I have the grammar on paper.
Both of those solution will be platform independent, be them big endian or little endian.

Resources