Replacing bytes at current offset in C

I'm currently developing a program that mimics a UNIX file system. I've prepared my disk as a 1 MB file that holds all the data blocks. Now I'm implementing some simple commands like mkdir, ls, etc. To make those commands work, I need to read from a specific offset (no problem with that) and write the modified blocks back to a specific location.
Put simply, my goal is:
SIIIDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD (Current Disk)
I want to replace three blocks with AAA after the 16th byte, so it will look like:
SIIIDDDDDDDDDDDDAAADDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD (Modified Disk)
I'm not going to provide my whole implementation here; I just want some ideas on how to implement this without buffering the entire 1 MB in my program. In short, I know the locations of my data blocks, so I want to replace just that part of the file, not the whole file. Can't I simply do this with the file stream functions?
Another example:
fseek(from_disk,superblock.i_node_bit_map_starting_addr , SEEK_SET); //seek to known offset.
read_bit_map(&from_disk); // I can read at specific location without problem
... manipulate bit map ...
fseek(to_disk,superblock.i_node_bit_map_starting_addr , SEEK_SET); //seek to known offset.
write_bit_map(&to_disk); //Write back the data.
//This will destroy the current data of file. (Tried with w+, a modes.)
Note: it is not shown in the example, but I have two file pointers, one for reading and one for writing, and I'm aware I need to close one before opening the other.

I think you are looking for the r+ mode (or rb+ for binary data), which opens an existing file for reading and writing without truncating it. Here is a complete example; afterwards you can run grep -n hello data.txt to verify the result for yourself. You can build and run it with make prog && ./prog.
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Create a test file filled with dummy data. */
    FILE *file = fopen("data.txt", "w+");
    if (!file) {
        perror("fopen");
        return 1;
    }
    char dummy_data[] = "This is stackoverflow.com\n";
    size_t dummy_data_length = strlen(dummy_data);
    for (int i = 0; i < 1000; ++i)
        fwrite(dummy_data, dummy_data_length, 1, file);
    fclose(file);

    /* Reopen with "r+": read/write access, existing contents are kept. */
    file = fopen("data.txt", "r+");
    if (!file) {
        perror("fopen");
        return 1;
    }
    fseek(file, 500, SEEK_SET);   /* absolute offset 500 */
    fwrite("hello", 5, 1, file);  /* overwrites 5 bytes in place */
    fclose(file);
    return 0;
}
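Applied to the disk layout in the question, the same idea can be wrapped in a small helper. This is only a sketch: write_at is a hypothetical name, not something from the question or from any library, and it assumes the disk file already exists.

```c
#include <stdio.h>

/* Overwrite len bytes at byte offset `offset` of an existing file,
 * leaving every other byte untouched.
 * Returns 0 on success, -1 on error.
 * write_at is a hypothetical helper name used for illustration. */
int write_at(const char *path, long offset, const void *data, size_t len)
{
    FILE *fp = fopen(path, "rb+");   /* read/update mode: no truncation */
    if (!fp)
        return -1;
    if (fseek(fp, offset, SEEK_SET) != 0 ||
        fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

For the example disk, write_at("disk.bin", 16, "AAA", 3) would replace the three bytes that follow the first 16 (disk.bin is a placeholder file name). The "w+" mode truncates the file and the "a" modes force every write to the end, which is why those two modes appear to destroy or misplace the data.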

Related

Why does the calling of dup2 go wrong?

As you can see, the program has two file pointers, sport and fruit, and after the dup2 call both point to the file fruit.txt. The problem is that after running the program, sport.txt is empty and fruit.txt contains Chinese characters. I expected sport.txt to contain the word "basketball", because it is written to the file before the redirection happens. So, what is wrong here?
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include "../cus_header/cus_header.h"
int main(){
FILE *fruit = fopen("fruit.txt", "w");
if(!fruit)
error("cannot open fruit.txt");
FILE *sport = fopen("sport.txt", "w");
if(!sport)
error("cannot open sport.txt");
int de_sport = fileno(sport);
int de_fruit = fileno(fruit);
printf("file number of sport.txt: %i and of fruit.txt: %i\n", de_sport, de_fruit);
fwrite("basketball", sizeof(char), 10, sport);
fwrite("apple", sizeof(char), 6, fruit);
if(dup2(de_fruit, de_sport) == -1)
error("cannot redirect");
fwrite("basketball", sizeof(char), 10, sport); //???
fwrite("apple", sizeof(char), 6, fruit); // ???
fclose(sport);
fclose(fruit);
return 0;
}
As the comments already mention, you shouldn't mix file manipulation with streams (using FILE*, fopen, fwrite, fclose) with raw file manipulation (using file descriptors, open, write, close, dup2). And especially don't mix them on the same file pointer/descriptor like you are doing in this piece of code.
Let's go through the code to see why it behaves the way it does:
FILE *fruit = fopen("fruit.txt", "w");
...
FILE *sport = fopen("sport.txt", "w");
You shouldn't care about what the FILE structure looks like; let's just suppose it keeps the underlying file descriptor somewhere.
int de_sport = fileno(sport);
int de_fruit = fileno(fruit);
You create local variables holding the same file descriptors that the two FILE*s refer to.
fwrite("basketball", sizeof(char), 10, sport);
fwrite("apple", sizeof(char), 6, fruit);
You write something in each of the two files. Because C file streams are buffered by default, the actual writing in the file on disk might not happen right away (and in your case it doesn't).
dup2(de_fruit, de_sport)
This closes the file descriptor de_sport and makes it refer to the same file as de_fruit. The actual numerical values remain the same, only the actual files that they refer to are changed. This means that the two FILE handles will write to the same file after the dup2 call.
fwrite("basketball", sizeof(char), 10, sport); //???
fwrite("apple", sizeof(char), 6, fruit); // ???
This will write to the same underlying file because the two descriptors now refer to the same file. But again, because streams are buffered, this might actually just append to the buffers of those two FILE*s.
fclose(sport);
fclose(fruit);
This flushes the buffers, so the actual writing to disk happens here. Because the descriptors have been changed, if no flushing happened until now, both streams will actually flush to the same file on disk.
This is probably why you're seeing that behavior, but keep in mind that what you're doing is not safe and that the behavior or file contents might differ.
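If the goal is simply to get "basketball" into sport.txt before the redirection, one possible fix is to flush the stream buffers before calling dup2. The sketch below assumes a POSIX system; redirect_stream is a hypothetical helper name, not something from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Make the descriptor behind `from` refer to the file behind `to`.
 * Both streams are flushed first, so anything already written through
 * `from` reaches its original file before the descriptor changes.
 * Returns 0 on success, -1 on error.
 * redirect_stream is a hypothetical helper name. */
int redirect_stream(FILE *from, FILE *to)
{
    if (fflush(from) == EOF)   /* push pending bytes to the old file */
        return -1;
    if (fflush(to) == EOF)
        return -1;
    if (dup2(fileno(to), fileno(from)) == -1)
        return -1;
    return 0;
}
```

After the call, the two descriptors share one file offset, so output written through either stream lands in the same file in the order in which the streams are flushed. Mixing the two layers remains fragile, so this is a workaround rather than a recommended pattern.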

read multiple fasta sequence using external library kseq.h

I am trying to extract the FASTA sequences for 5 IDs/names supplied by the user from a big FASTA file (containing 80000 sequences), using the external header file kseq.h described at http://lh3lh3.users.sourceforge.net/kseq.shtml. When I run the search in a for loop, I have to open and close the big FASTA file again and again (commented out in the code), which makes the computation slow. If instead I open and close it only once outside the loop, the program stops as soon as it encounters an entry that is not present in the big FASTA file, i.e. it reaches the end of the file. Can anyone suggest how to get all the sequences without losing computation time? The code is:
#include <zlib.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "ext_libraries/kseq.h"
KSEQ_INIT(gzFile, gzread)
int main(int argc, char *argv[])
{
char gwidd_ids[100];
kseq_t *seq;
int i=0, nFields=0, row=0, col=0;
int size=1000, flag1=0, l=0, index0=0;
printf("Opening file %s\n", argv[1]);
char **gi_ids=(char **)malloc(sizeof(char *)*size);
for(i=0;i<size;i++)
{
gi_ids[i]=(char *)malloc(sizeof(char)*50);
}
FILE *fp_inp = fopen(argv[1], "r");
while(fscanf(fp_inp, "%s", gwidd_ids) == 1)
{
printf("%s\n", gwidd_ids);
strcpy(gi_ids[index0], gwidd_ids);
index0++;
}
fclose(fp_inp);
FILE *f0 = fopen("xxx.txt", "w");
FILE *f1 = fopen("yyy.txt", "w");
FILE *f2 = fopen("zzz", "w");
FILE *instream = NULL;
instream = fopen("fasta_seq_uniprot.txt", "r");
gzFile fpf = gzdopen(fileno(instream), "r");
for(col=0;col<index0;col++)
{
flag1=0;
// FILE *instream = NULL;
// instream = fopen("fasta_seq_nr_uniprot.txt", "r");
// gzFile fpf = gzdopen(fileno(instream), "r");
kseq_t *seq = kseq_init(fpf);
while((kseq_read(seq)) >= 0 && flag1 == 0)
{
if(strcasecmp(gi_ids[col], seq->name.s) == 0)
{
fprintf(f1, ">%s\n", gi_ids[col]);
fprintf(f2, ">%s\n%s\n", seq->name.s, seq->seq.s);
flag1 = 1;
}
}
if(flag1 == 0)
{
fprintf(f0, "%s\n", gi_ids[col]);
}
kseq_destroy(seq);
// gzclose(fpf);
}
gzclose(fpf);
fclose(f0);
fclose(f1);
fclose(f2);
for(i=0;i<size;i++)
{
free(gi_ids[i]);
}
free(gi_ids);
return 0;
}
A few example entries from the input file (fasta_seq_uniprot.txt):
P21306
MSAWRKAGISYAAYLNVAAQAIRSSLKTELQTASVLNRSQTDAFYTQYKNGTAASEPTPITK
P38077
MLSRIVSNNATRSVMCHQAQVGILYKTNPVRTYATLKEVEMRLKSIKNIEKITKTMKIVASTRLSKAEKAKISAKKMD
-----------
-----------
The user entry file is
P37592\n
Q8IUX1\n
B3GNT2\n
Q81U58\n
P70453\n
Your problem appears a bit different than you suppose. That the program stops after trying to retrieve a sequence that is not present in the data file is a consequence of the fact that it never rewinds the input. Therefore, even for a query list containing only sequences that are present in the data file, if the requested sequence IDs are not in the same relative order as the data file then the program will fail to find some of the sequences (it will pass them by when looking for an earlier-listed sequence, never to return).
Furthermore, I think it likely that the time savings you observe come from making only a single pass through the file, instead of a (partial) pass for each requested sequence, not so much from opening it only once. Opening and closing a file is a bit expensive, but nowhere near as expensive as reading tens or hundreds of kilobytes from it.
To answer your question directly, I think you need to take these steps:
Move the kseq_init(fpf) call to just before the loop.
Move the kseq_destroy(seq) call to just after the loop.
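Put in a call to kseq_rewind(seq) as the last statement in the loop. As far as I can tell, kseq_rewind only resets the parser's internal state, so you will probably also need gzrewind(fpf) to move the underlying file back to the start.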
That should make your program right again, but it is likely to kill pretty much all your time savings, because you will return to scanning the file from the beginning for each requested sequence.
The library you are using appears to support only sequential access. Therefore, the most efficient way to do the job both right and fast would be to invert the logic: read sequences one at a time in an outer loop, testing each one as you go to see whether it matches any of the requested ones.
Supposing that the list of requested sequences will contain only a few entries, like your example, you probably don't need to do any better testing for matches than just using an inner loop to test each requested sequence id vs. the then-current sequence. If the query lists may be a lot longer, though, then you could consider putting them in a hash table or sorting them into the same order as the data file to make it possible to test more efficiently for matches.
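The inner-loop test described above could look roughly like the following sketch. It is kept independent of kseq.h so that it compiles on its own; match_request and the found array are hypothetical names, and strcasecmp is assumed to be available (POSIX).

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Return the index of the first requested id that has not been found yet
 * and equals `name` (case-insensitively), and mark it as found.
 * Returns -1 if no outstanding request matches.
 * match_request and found[] are hypothetical names for illustration. */
int match_request(char *const ids[], int found[], size_t n_ids,
                  const char *name)
{
    for (size_t i = 0; i < n_ids; i++) {
        if (!found[i] && strcasecmp(ids[i], name) == 0) {
            found[i] = 1;
            return (int)i;
        }
    }
    return -1;
}
```

In the program from the question, this would be called once per record inside a single while (kseq_read(seq) >= 0) loop, passing gi_ids, a zero-initialized found array, index0 and seq->name.s. A non-negative result means the current record should be written to f1 and f2; once the loop ends, every id whose found entry is still 0 goes to f0.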

Open image file as binary, store image as string of bytes, save the image - possible in plain C?

I would like to read an image, let's say picture.png, in C. I know I can open it in binary mode and then read it; that's pretty simple.
But I need something more: I would like to be able to read the image once, store it in my code, for example, in *.h file, as 'string of bytes', for example:
unsigned char image[] = "0x87 0x45 0x56 ... ";
and then, be able to just do:
delete physical file I read from disk,
save image into file - it will create my file once again,
EVEN if I removed image from disk (deleted physical file picture.png I read earlier) I will still be able to create an image on disk, simply by writing my image array into file using binary mode. Is that possible in pure C? If so, how can I do this?
There's even a special format for this task, called XPM, and a library to manipulate these files. But remember that, due to its nature, it's suitable only for relatively small images. Still, it was used for years in the X Window System to provide icons. Well, in those good old days icons were 16x16 pixels and contained no more than 256 colors :)
Of course it's possible, but it's a bit unclear what you're after.
There are stand-alone programs that convert binary data to C source code, you don't need to implement that. But doing it that way of course means that the image becomes a static part of your program's executable.
If you want it to be more dynamic, like specifying the filename to your program when it's running, then the whole thing about converting to C source code becomes moot; your program is already compiled. C programs can't add to their own source at run-time.
UPDATE If all you want to do is load a file, hold it in memory and then write it back out, all in the same run of your program, that's pretty trivial.
You'd use fopen() to open the file, fseek() to go to the end, ftell() to read the size of the file. Then rewind() it to the start, malloc() a suitable buffer, fread() the file's contents into the buffer and fclose() the file. Later, fopen() a new output file, and fwrite() the buffer into that before using fclose() to close the file. Then you're done. You can do it again, as many times as you like. It can be an image, a program, a document or any other kind of file, it doesn't matter.
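The sequence of calls just described can be sketched as two small functions. The names load_file and save_file are made up for this illustration, and the ftell-based size assumes a regular file that fits in a long.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd buffer and store its length in *size.
 * Returns NULL on error; the caller frees the buffer.
 * load_file is a hypothetical helper name. */
unsigned char *load_file(const char *path, size_t *size)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;
    long len = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        len = ftell(fp);                 /* size of the file in bytes */
    if (len < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf && fread(buf, 1, (size_t)len, fp) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(fp);
    if (buf)
        *size = (size_t)len;
    return buf;
}

/* Write size bytes from buf to path. Returns 0 on success, -1 on error.
 * save_file is a hypothetical helper name. */
int save_file(const char *path, const unsigned char *buf, size_t size)
{
    FILE *fp = fopen(path, "wb");
    if (!fp)
        return -1;
    size_t written = fwrite(buf, 1, size, fp);
    if (fclose(fp) != 0 || written != size)
        return -1;
    return 0;
}
```

Once load_file has returned, the original file can be deleted and recreated later with save_file, as long as the program is still running and the buffer has not been freed.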
pic2h.c :
#include <stdio.h>
int main(int argc, char *argv[]){
if(argc != 3){
fprintf(stderr, "Usage >pic2h image.png image.h\n");
return -1;
}
FILE *fi = fopen(argv[1], "rb");
FILE *fo = fopen(argv[2], "w");
int ch, count = 0;
fprintf(fo, "extern unsigned char image[];\n");
fprintf(fo, "unsigned char image[] =");
while(EOF!=(ch=fgetc(fi))){
if(count == 0)
fprintf(fo, "\n\"");
fprintf(fo, "\\x%02X", ch);
if(++count==24){
count = 0;
fprintf(fo, "\"");
}
}
if(count){
fprintf(fo, "\"");
}
fprintf(fo, ";\n");
fclose(fo);
fclose(fi);
return 0;
}
resave.c :
#include <stdio.h>
#include "image.h"
int main(int argc, char *argv[]){
if(argc != 2){
fprintf(stderr, "Usage >resave image.png\n");
return 0;
}
size_t size = sizeof(image)-1;
FILE *fo = fopen(argv[1], "wb");
fwrite(image, size, 1, fo);
fclose(fo);
return 0;
}

How do I read a file backwards using read() in C? [duplicate]

I am supposed to create a program that takes a given file and creates a file with the text reversed. Is there a way I can start the read() from the end of the file and copy it to the first byte of the created file if I don't know the exact size of the file?
Also, I have googled this and came across many examples with fread, fopen, etc. However, I can't use those for this project; I can only use read, open, lseek, write, and close.
Here is my code so far. It's not much, but just for reference:
#include<stdio.h>
#include<unistd.h>
int main (int argc, char *argv[])
{
if(argc != 2)/*argc should be 2 for correct execution*/
{
printf("usage: %s filename",argv[0[]);}
}
else
{
int file1 = open(argv[1], O_RDWR);
if(file1 == -1){
printf("\nfailed to open file.");
return 1;
}
int reversefile = open(argv[2], O_RDWR | O_CREAT);
int size = lseek(argv[1], 0, SEEK_END);
char *file2[size+1];
int count=size;
int i = 0
while(read(file1, file2[count], 0) != 0)
{
file2[i]=*read(file1, file2[count], 0);
write(reversefile, file2[i], size+1);
count--;
i++;
lseek(argv[2], i, SEEK_SET);
}
I doubt that most filesystems are designed to support this operation effectively. Chances are, you'd have to read the whole file to get to the end. For the same reasons, most languages probably don't include any special feature for reading a file backwards.
Just come up with something. Try to read the whole file in memory. If it is too big, dump the beginning, reversed, into a temporary file and keep reading... In the end combine all temporary files into one. Also, you could probably do something smart with manual low-level manipulation of disk sectors, or at least with low-level programming directly against the file system. Looks like this is not what you are after, though.
Why don't you try fseek to navigate inside the file? This function is contained in stdio.h, just like fopen and fclose.
Another idea would be to implement a simple stack...
This has no error checking == really bad
1. get the file size using stat
2. create a buffer with malloc
3. fread the file into the buffer
4. set a pointer to the end of the data in the buffer
5. print each character going backwards through the buffer.
If you get creative with google you can get several examples just like this.
IMO the assistance you are getting so far is not really even good hints.
This appears to be schoolwork, so beware of copying. Do some reading about the calls used here. stat (fstat) fread (read)
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
int main(int argc, char **argv)
{
struct stat st;
char *buf;
char *p;
FILE *in=fopen(argv[1],"r");
fstat(fileno(in), &st); // get file size in bytes
buf=malloc(st.st_size +2); // buffer for file
memset(buf, 0x0, st.st_size +2 );
fread(buf, st.st_size, 1, in); // fill the buffer
p=buf+st.st_size; // one past the last byte read
while(p>buf) // print traversing backwards
printf("%c", *--p);
fclose(in);
return 0;
}
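Since the question allows only open, read, lseek, write and close, here is a sketch that stays within those calls. It finds the size with lseek(fd, 0, SEEK_END), then walks the input from the end in fixed-size chunks, reversing each chunk before writing it. The name reverse_file, the 4096-byte chunk size and the 0644 permissions are arbitrary choices for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write the bytes of src to dst in reverse order, using only
 * open, lseek, read, write and close.
 * Returns 0 on success, -1 on error.
 * reverse_file is a hypothetical helper name. */
int reverse_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }

    char buf[4096];
    int rc = 0;
    off_t pos = lseek(in, 0, SEEK_END);      /* file size in bytes */
    if (pos == (off_t)-1)
        rc = -1;

    while (rc == 0 && pos > 0) {
        /* Take the chunk that ends at pos. */
        size_t chunk = pos < (off_t)sizeof buf ? (size_t)pos : sizeof buf;
        pos -= (off_t)chunk;
        if (lseek(in, pos, SEEK_SET) == (off_t)-1) {
            rc = -1;
            break;
        }
        size_t got = 0;
        while (got < chunk) {                /* read() may return short */
            ssize_t n = read(in, buf + got, chunk - got);
            if (n <= 0) {
                rc = -1;
                break;
            }
            got += (size_t)n;
        }
        if (rc != 0)
            break;
        /* Reverse the chunk in place. */
        for (size_t i = 0, j = chunk - 1; i < j; i++, j--) {
            char tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }
        size_t done = 0;
        while (done < chunk) {               /* write() may be partial */
            ssize_t n = write(out, buf + done, chunk - done);
            if (n == -1) {
                rc = -1;
                break;
            }
            done += (size_t)n;
        }
    }

    close(in);
    if (close(out) == -1)
        rc = -1;
    return rc;
}
```

Memory use stays at one chunk regardless of file size, and the file is reversed byte by byte, so a trailing newline in the input becomes the first byte of the output.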

How can I use Linux's splice() function to copy a file to another file?

Here's another question about splice(). I'm hoping to use it to copy files, and am trying to use two splice calls joined by a pipe like the example on splice's Wikipedia page. I wrote a simple test case which only tries to read the first 32K bytes from one file and write them to another:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
int main(int argc, char **argv) {
int pipefd[2];
int result;
FILE *in_file;
FILE *out_file;
result = pipe(pipefd);
in_file = fopen(argv[1], "rb");
out_file = fopen(argv[2], "wb");
result = splice(fileno(in_file), 0, pipefd[1], NULL, 32768, SPLICE_F_MORE | SPLICE_F_MOVE);
printf("%d\n", result);
result = splice(pipefd[0], NULL, fileno(out_file), 0, 32768, SPLICE_F_MORE | SPLICE_F_MOVE);
printf("%d\n", result);
if (result == -1)
printf("%d - %s\n", errno, strerror(errno));
close(pipefd[0]);
close(pipefd[1]);
fclose(in_file);
fclose(out_file);
return 0;
}
When I run this, the input file seems to be read properly, but the second splice call fails with EINVAL. Anybody know what I'm doing wrong here?
Thanks!
From the splice manpage:
EINVAL Target file system doesn't support splicing; target file is
opened in append mode; neither of the descriptors refers to a
pipe; or offset given for non-seekable device.
We know one of the descriptors is a pipe, and the file's not open in append mode. We also know no offset is given (0 is equivalent to NULL - did you mean to pass in a pointer to a zero offset?), so that's not the problem. Therefore, the filesystem you're using doesn't support splicing to files.
What kind of file system(s) are you copying to/from?
Your example runs on my system when both files are on ext3 but fails when I use an external drive (I forget offhand if it is DOS or NTFS). My guess is that one or both of your files are on a file system that splice does not support.
The splice(2) system call copies between a file and a pipe, not directly between two files: one of the two descriptors must refer to a pipe, so it cannot copy from one file to another in a single call.
As of Linux 4.5 however a new copy_file_range(2) system call is available that can copy between files. In the case of NFS it can even cause server side copying.
The linked man page contains a full example program.
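For reference, a copy loop built on copy_file_range(2) might look like the sketch below. It is not the man page's example; it assumes Linux with a glibc that provides the wrapper (2.27 or later), and copy_file is a hypothetical name. The fallback to plain read/write when the call fails is an addition made here so the function still works on kernels or file systems without support.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst. Tries copy_file_range(2) first and falls back to a
 * read/write loop if that call fails (old kernel, unsupported file system).
 * Returns 0 on success, -1 on error.
 * copy_file is a hypothetical helper name. */
int copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }

    char buf[65536];
    int use_cfr = 1;
    int rc = 0;
    for (;;) {
        ssize_t n;
        if (use_cfr) {
            /* NULL offsets: use and advance each descriptor's position. */
            n = copy_file_range(in, NULL, out, NULL, sizeof buf, 0);
            if (n == -1) {           /* e.g. ENOSYS or EXDEV: fall back */
                use_cfr = 0;
                continue;
            }
        } else {
            n = read(in, buf, sizeof buf);
            if (n == -1) {
                rc = -1;
                break;
            }
            for (ssize_t off = 0; off < n; ) {
                ssize_t w = write(out, buf + off, (size_t)(n - off));
                if (w == -1) {
                    rc = -1;
                    goto done;
                }
                off += w;
            }
        }
        if (n == 0)
            break;                   /* end of input */
    }
done:
    close(in);
    if (close(out) == -1)
        rc = -1;
    return rc;
}
```

Because both paths rely on the descriptors' current file positions, switching to the fallback partway through continues from wherever the previous call stopped.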
