Search a pattern in .gz file in C

I want to read a .gz file (text.gz) of about 300 MB and search for a pattern in it. I opened the file in binary mode using fopen with "rb" and stored its contents in a buffer. When I search for a pattern that I know exists in the text, the result is wrong. When I debug the program, the elements of the buffer are different from what I expect. Do I have to read and store this kind of file in some other way?

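The bytes of a .gz file are compressed (DEFLATE) data, not your text, so searching them directly will not find the pattern. You might try using zlib and gzread() to read the decompressed contents instead.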
http://zlib.net/manual.html
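A minimal sketch of the search side, assuming zlib does the decompression: the function scans the stream chunk by chunk, so the 300 MB file never has to fit in memory. The callback type reader_fn and the names stream_contains, mem_src and mem_read are illustrative, not part of zlib; mem_read is only an in-memory stand-in used for testing.

```c
#include <string.h>

/* Reader callback (an assumption of this sketch, not a zlib type): copy up to
 * len bytes into buf and return how many were copied, 0 at end of stream, or
 * -1 on error. zlib's gzread() has this shape, so an adapter could be:
 *
 *   static int gz_reader(void *ctx, char *buf, unsigned len)
 *   { return gzread((gzFile)ctx, buf, len); }
 */
typedef int (*reader_fn)(void *ctx, char *buf, unsigned len);

/* Returns 1 if pat occurs in the stream, 0 if it does not, and -1 on a read
 * error or if pat is too long for this sketch. The last strlen(pat) - 1
 * bytes of each chunk are carried over, so a match that straddles two chunks
 * is still found. memcmp() is used rather than strstr() because the data may
 * contain '\0' bytes. */
int stream_contains(reader_fn rd, void *ctx, const char *pat)
{
    char buf[1 << 16];
    size_t patlen = strlen(pat);
    size_t keep = 0;                      /* bytes carried over from last chunk */

    if (patlen == 0)
        return 1;
    if (patlen > sizeof buf / 2)
        return -1;

    for (;;) {
        int n = rd(ctx, buf + keep, (unsigned)(sizeof buf - keep));
        if (n < 0)
            return -1;
        if (n == 0)
            return 0;                     /* end of stream, no match */
        size_t have = keep + (size_t)n;
        for (size_t i = 0; i + patlen <= have; i++)
            if (memcmp(buf + i, pat, patlen) == 0)
                return 1;
        keep = have < patlen - 1 ? have : patlen - 1;
        memmove(buf, buf + have - keep, keep);
    }
}

/* In-memory stand-in reader for testing: hands out at most `step` bytes per
 * call so that chunk boundaries are exercised. */
struct mem_src { const char *data; size_t left; unsigned step; };

static int mem_read(void *ctx, char *buf, unsigned len)
{
    struct mem_src *m = ctx;
    size_t n = m->left < len ? m->left : len;
    if (n > m->step)
        n = m->step;
    memcpy(buf, m->data, n);
    m->data += n;
    m->left -= n;
    return (int)n;
}
```

With zlib, the adapter from the comment could be used as gzFile gz = gzopen("text.gz", "rb"); then stream_contains(gz_reader, gz, "pattern"); and finally gzclose(gz); (link with -lz). According to the zlib manual, gzread() also passes through files that are not in gzip format unchanged.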

Try this.
gunzip -c file.gz | grep <pattern>
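The same pipeline can also be driven from inside a C program with POSIX popen(). This is a sketch that assumes a POSIX system with gunzip on the PATH. count_matching_lines is a hypothetical helper that behaves roughly like grep -c:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Runs cmd through the shell (e.g. "gunzip -c text.gz") and returns the
 * number of output lines that contain pat, roughly like `cmd | grep -c pat`.
 * Returns -1 if the command cannot be started. */
long count_matching_lines(const char *cmd, const char *pat)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    long count = 0;
    while (getline(&line, &cap, p) != -1)   /* getline() grows the buffer */
        if (strstr(line, pat) != NULL)
            count++;

    free(line);
    pclose(p);
    return count;
}
```

For example: count_matching_lines("gunzip -c text.gz", "pattern"). The command string is passed to the shell, so do not build it from untrusted file names.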

If the program exits because it fails to read the file, a really common cause is that the file is still open in Notepad or whatever else is using it, and the file I/O fails because the file cannot be accessed. Make sure nothing else has that file open before you test your program.

Related

How to get, in a C program, the name of the file I gave with input redirection?

steps:
Let's say I have a C program inputFileName.c
I run inputFileName with input redirection such as ./inputFileName < file
How can I print the name of the file in my C program that I have typed in the terminal as an input redirection file?
Input redirection is a function of the shell. Your inputFileName executable sees it only as standard input. Depending on the exact operating system, you may be able to use system-specific functions to get the information you want, but there is no standard way of doing so.
Input redirection can be achieved not only with the '<' symbol, but also with '|'.
program < filename
is equivalent to
cat filename | program
From there, one could go to
cat file1 file2 file3 | program
You begin to see why the initial 'stdin' for an executable cannot and does not have a "filename" associated with it.
If input comes from a pipe, there can't be an associated filename. Also if the file has been deleted or moved before closing the file descriptor, there is no associated filename. A file can have multiple names. In that case there are multiple filenames.
Given that, the "associated filename" of a file descriptor doesn't really make much sense. And even if you could get that info, using the filename in any way might make race conditions an issue.
The Linux kernel does try to track an associated filename if a file descriptor was created by opening a file. But the key word here is "tries".
If you are running Linux, you can find the filename for standard input as a symlink under "/proc/self/fd/0". Just remember that you should not rely on that name for anything more than debugging or display purposes.
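Here is a sketch of reading that symlink with readlink(). It assumes Linux with /proc mounted, and stdin_name is a hypothetical helper name. readlink() does not NUL-terminate its result, so the code adds the terminator itself:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Linux-only: copies the target of /proc/self/fd/0 into buf. Returns 0 on
 * success, -1 on error or if the name may have been truncated. Use the
 * result for display/debugging only. */
int stdin_name(char *buf, size_t size)
{
    if (size < 2)
        return -1;
    ssize_t n = readlink("/proc/self/fd/0", buf, size - 1);  /* no NUL added */
    if (n < 0 || (size_t)n == size - 1)   /* error, or possibly truncated */
        return -1;
    buf[n] = '\0';
    return 0;
}
```

Run as ./inputFileName < file, the link typically holds the absolute path of file. When input comes from a pipe, it holds something like pipe:[12345] instead.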

Untarring file fails when fd is not previously closed

Consider the following scenario: I am opening a tar file (say abc.tar.gz), writing the data, and before closing the file descriptor, I am trying to extract the same file.
I am unable to do so. But if I extract the file after closing the fd, it works fine.
I wonder what could be the reason.
Every open file has a position (offset) where data is read or written. After writing to the file, the position is at the end, and a read will start from that position. You have to move the position back to the beginning of the file with a function like lseek.
Also, did you open the file in both read and write mode?
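Below is a sketch of the write-then-read case on a single descriptor, assuming POSIX open()/lseek(). The file must be opened O_RDWR. write_then_reread and the path used with it are illustrative only:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens path for reading AND writing (O_RDWR), writes len bytes of data,
 * then rewinds and reads them back into out. Returns the number of bytes
 * read back, or -1 on error (a short write counts as an error here). */
ssize_t write_then_reread(const char *path, const char *data, size_t len,
                          char *out, size_t outsize)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    if (write(fd, data, len) == (ssize_t)len &&
        /* After write() the offset is at end of file; without this
         * lseek() the read() below would simply return 0. */
        lseek(fd, 0, SEEK_SET) != (off_t)-1)
        n = read(fd, out, outsize);

    close(fd);
    return n;
}
```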
Edit
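After reading your comments, I see that you do not actually read the file from inside your program but from an external program. Then it might be as simple as the data not having been flushed yet, which happens automatically when you close the file. Data still sitting in a user-space buffer is not in the file yet when the external program reads it. That buffer might be the stdio buffer behind a FILE *, or a compressor such as zlib, which writes the gzip trailer only when the stream is finished (for example by gzclose()). For a FILE *, call fflush(). The fsync function, or possibly the sync function, goes further and asks the kernel to write its cached data to the disk itself. Other processes on the same machine can already read data once it has reached the kernel.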
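This sketch shows how to make buffered stdio data visible to an external program while the file stays open. write_visible is a hypothetical helper, and its durable flag is an illustrative option:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Writes len bytes through stdio and pushes them out of the FILE buffer so
 * another process can read them while fp stays open. fflush() hands the data
 * to the kernel; with durable != 0, fsync() also asks the kernel to commit
 * it to the storage device. Returns 0 on success, -1 on error. */
int write_visible(FILE *fp, const void *data, size_t len, int durable)
{
    if (fwrite(data, 1, len, fp) != len)
        return -1;
    if (fflush(fp) != 0)                     /* user-space buffer -> kernel */
        return -1;
    if (durable && fsync(fileno(fp)) != 0)   /* kernel cache -> disk */
        return -1;
    return 0;
}
```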

fopen() returning a NULL pointer, but the file definitely exists

The code I have is as follows:
FILE *txt_file = fopen("data.txt", "r");
if (txt_file == NULL) {
    perror("Can't open file");
}
The error message returned is:
Can't open file: No such file or directory
The file 'data.txt' definitely exists in the working directory (it exists in the directory that contains my .c and .h files), so why is fopen() returning a NULL pointer?
Standard problem. Try
FILE *txt_file = fopen("C:\\SomeFolder\\data.txt", "r");
I.e. try opening it with the full absolute path first; if that works, then you just have to find out what the current directory is with _getcwd() (getcwd() on POSIX systems) and fix your relative path.
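The following sketch shows that debugging step on a POSIX system. On Windows, _getcwd() from <direct.h> takes the place of getcwd(). open_verbose is a hypothetical helper, not a standard function:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>            /* getcwd(); Windows: _getcwd() in <direct.h> */

/* Like fopen(), but on failure prints the reason and, for a relative path,
 * the working directory it was resolved against. errno is preserved. */
FILE *open_verbose(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        int err = errno;                   /* save before other calls change it */
        char cwd[4096];
        fprintf(stderr, "fopen(\"%s\") failed: %s\n", path, strerror(err));
        if (path[0] != '/' && getcwd(cwd, sizeof cwd) != NULL)
            fprintf(stderr, "  relative to working directory: %s\n", cwd);
        errno = err;
    }
    return fp;
}
```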
Is it possible that the filename is not really "data.txt"?
On Unix, filenames are really byte strings, not character strings, and it is possible to create files with control characters such as backspace in their names. I have seen cases in which copy-pasting into terminals resulted in files with ordinary-looking names, but trying to open the filename that appears in a directory listing results in an error.
One way to tell for sure that the filenames really are what you think they are:
$ python
>>> import os
>>> os.listdir('.')
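A rough C equivalent of that check prints every name in quotes with unprintable bytes escaped. This sketch assumes POSIX opendir()/readdir(), and escape_name and list_dir_escaped are illustrative names:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>

/* Writes name into out wrapped in double quotes; every byte that is not
 * printable ASCII (plus '"' and '\\') is shown as \xNN. The quotes expose
 * leading/trailing spaces, the escapes expose CRs, backspaces, etc.
 * Returns 0, or -1 if out is too small. */
int escape_name(const char *name, char *out, size_t outsize)
{
    size_t j = 0;
    if (outsize < 3)                      /* two quotes and the NUL */
        return -1;
    out[j++] = '"';
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        if (*p >= 0x20 && *p < 0x7f && *p != '"' && *p != '\\') {
            if (j + 3 > outsize)          /* char, closing quote, NUL */
                return -1;
            out[j++] = (char)*p;
        } else {
            if (j + 6 > outsize)          /* \xNN, closing quote, NUL */
                return -1;
            snprintf(out + j, 5, "\\x%02x", (unsigned)*p);
            j += 4;
        }
    }
    out[j++] = '"';
    out[j] = '\0';
    return 0;
}

/* Prints each entry of dir on its own line in escaped form. A name is at
 * most NAME_MAX (255 on Linux) bytes, so 255 * 4 + 3 bytes fit in buf. */
int list_dir_escaped(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    char buf[1024];
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (escape_name(e->d_name, buf, sizeof buf) == 0)
            puts(buf);
    closedir(d);
    return 0;
}
```

With list_dir_escaped("."), a name that ends in a carriage return would show up as "data.txt\x0d".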
My problem was that I had a file filename.txt and didn't realize that it was actually filename.txt.txt, because Windows was hiding the extension.
Make sure that your input file is in the program's current working directory, which may be different from the directory where your source files are kept. If you're running the program in an IDE debugger, make sure that its working directory is set to the location of the input file. Note that on *nix, prepending "./" to the filename makes no difference to fopen(): a relative name is always resolved against the current working directory.
Invisible SPACE character in file name?
Once a year I have a similar problem:
I try to open a file whose name is in a string obtained from a string operation. When I print the name it looks OK, but fopen() returns a null pointer. The only thing that helps is printing the name with delimiters showing the exact beginning and end of the string. Of course, this does not help with unprintable characters.
I just had a similar issue where I knew the path was correct and the file was in the right location. Check the file permissions: the program may be unable to access the file because permission is denied. In that case perror() reports "Permission denied" (EACCES) rather than "No such file or directory".
I got the same errno from fopen() on Linux, caused by a script file that had been corrupted by Windows.
ENOENT 2 No such file or directory
Wordpad on Windows (or some other Microsoft culprit) had replaced the newline, LF (0x0A), in my Linux script files with CRLF (0x0D, 0x0A). When I read the file name into a buffer and called fopen(), it failed because of the invisible trailing CR character.
In the Codelite editor on Linux Mint I was able to show EOL characters (View > Display EOL) and remove them with find and replace, using copy and paste of the CRLF from the corrupted script files and the LF from an uncorrupted file into the text fields.
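When the name is read with fgets() from a file that may have CRLF line endings, the ending can also be stripped in code. This is a minimal sketch, and strip_eol is a hypothetical helper that removes both '\n' and the leftover '\r':

```c
#include <string.h>

/* Removes any trailing '\n' and '\r' characters in place and returns the
 * new length, so the result can be passed to fopen(). */
size_t strip_eol(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
        s[--n] = '\0';
    return n;
}
```

A typical use is fgets(name, sizeof name, list); followed by strip_eol(name); before calling fopen(name, "r").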

C: copying multiple files into one

I am struggling with a problem in C (Linux): using only API calls, I want to copy multiple input files given on the command line into one output file. I have searched the Internet for answers, but none of them seem to solve it.
My program allows me to specify multiple input files and one output file via the command line. For example:
./archiver file1.txt file2 file3 file4 outputfile
I read these parameters using argc/argv. For some reason, when I run ls -l, ./archiver and outputfile have the same number of bytes, which means none of my input files have been copied to outputfile, just whatever was in memory (when I run cat outputfile, it shows a bunch of garbage characters).
None of the contents from my input files are in my output files.
Could you please help me? Having seen those garbage characters, I don't know what to do. I have tried reading up on malloc() etc., but I don't know how to apply it or whether it is even relevant here.
Any help is appreciated, thanks for your time.
file_desc_in = open(argv[i], O_RDONLY, 0);
//NEED a loop to copy multiple files in...
while (!eof) {
    bytes_read = read(file_desc_in, &buffer, sizeof(buffersize));
    if (bytes_read > 1)
        bytes_written = write(file_desc_out, &i, bytes_read);
    else {
        eof = 1;
    }
}
I haven't included the error handling here, but I do have it. Thanks for replying so quickly.
It'd help to see your code. There's not a lot here to go on, but I'm going to take a wild guess. I suspect you're copying the file specified by argv[0] (your program) and not getting the rest. I don't think I can do any better with what you've given.
You say you are only using API calls. What API are you talking about? The POSIX API? The standard C file I/O API?
If you are just combining input files, you don't really need to write a C program to do it. Since you are running Linux, try using the shell command cat input1 input2 input3 > output.
If you must write a C program to do it, start simple. Before you actually do any file I/O, make sure that you can interpret the input arguments correctly. Have your program simply read in the command-line input and print out something like this:
Input files: file1.txt file2.txt file3.txt
Output file: outputfile.txt
That way, you can verify that your CLI parsing code works correctly before you start worrying about file I/O. It's much easier to debug things one piece at a time.
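Here is a minimal sketch of that first step, assuming the layout ./archiver in1 ... inN outfile from the question. print_args is an illustrative name:

```c
#include <stdio.h>

/* Prints how the command line was interpreted, assuming
 *   ./archiver in1 in2 ... inN outfile
 * Returns 0, or -1 if there is not at least one input and one output. */
int print_args(int argc, char *argv[], FILE *out)
{
    if (argc < 3)
        return -1;
    fputs("Input files:", out);
    for (int i = 1; i < argc - 1; i++)    /* argv[0] is the program itself */
        fprintf(out, " %s", argv[i]);
    fprintf(out, "\nOutput file: %s\n", argv[argc - 1]);
    return 0;
}
```

From main, print_args(argc, argv, stdout) should list every input and the output before any file is opened.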
Your outer loop needs to open each filename, and close it at the end of the loop. You close the output file at the very end, after all the input files are read.
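The sketch below follows that structure using only POSIX calls (open, read, write, close), as the question requires. concat_files is a hypothetical name. Compared with the code in the question, it reads with sizeof buffer instead of sizeof(buffersize), writes from buffer instead of &i, and treats any positive read count (not only > 1) as data:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Appends the contents of inputs[0..n-1], in order, to out_path (which is
 * created or truncated). Returns 0 on success, -1 on the first error. */
int concat_files(char *inputs[], int n, const char *out_path)
{
    char buffer[8192];
    int out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0)
        return -1;

    for (int i = 0; i < n; i++) {          /* outer loop: one input per pass */
        int in = open(inputs[i], O_RDONLY);
        if (in < 0) {
            close(out);
            return -1;
        }
        ssize_t r;
        while ((r = read(in, buffer, sizeof buffer)) > 0) {
            ssize_t off = 0;
            while (off < r) {              /* write() may write less than asked */
                ssize_t w = write(out, buffer + off, (size_t)(r - off));
                if (w < 0) {
                    close(in);
                    close(out);
                    return -1;
                }
                off += w;
            }
        }
        close(in);                         /* each input closed inside the loop */
        if (r < 0) {                       /* read error */
            close(out);
            return -1;
        }
    }
    return close(out);                     /* output closed once, at the end */
}
```

From main, after checking that argc >= 3, it could be called as concat_files(argv + 1, argc - 2, argv[argc - 1]).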
You should also learn the difference between open, read, write and fopen, fread, fwrite.

How to check whether two file names point to the same physical file

I have a program that accepts two file names as arguments: it reads the first file in order to create the second file. How can I ensure that the program won't overwrite the first file?
Restrictions:
The method must keep working when the file system supports (soft or hard) links
File permissions are fixed and it is only required that the first file is readable and the second file writeable
It should preferably be platform-neutral (although Linux is the primary target)
On Linux, open both files and use fstat to check whether both st_dev and st_ino are the same. open will follow symbolic links. Don't use stat on the names directly, to avoid race conditions.
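This sketch implements that check with POSIX open()/fstat(). same_file and would_overwrite are illustrative names. The output is opened without O_TRUNC so nothing is destroyed before the comparison. A real program would keep both descriptors open and ftruncate() the output only after a 0 result:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the two descriptors refer to the same file (same device and
 * inode), 0 if not, -1 if fstat() fails. */
int same_file(int fd1, int fd2)
{
    struct stat a, b;
    if (fstat(fd1, &a) != 0 || fstat(fd2, &b) != 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Opens in_path read-only and out_path for writing WITHOUT O_TRUNC. open()
 * follows symlinks and a hard link shares the inode, so both are caught.
 * Returns 1 if same file, 0 if not, -1 on error. O_CREAT creates out_path
 * if it does not exist yet. */
int would_overwrite(const char *in_path, const char *out_path)
{
    int in = open(in_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(out_path, O_WRONLY | O_CREAT, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    int r = same_file(in, out);
    close(in);                 /* a real program would keep both open */
    close(out);
    return r;
}
```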
The best bet is not to use filenames as identities. Instead, when you open the file for reading, lock it, using whatever mechanism your OS supports. When you then also open the file for writing, also lock it - if the lock fails, report an error.
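One way to sketch the locking idea on Linux or BSD is flock(), which is not part of POSIX. POSIX fcntl() record locks would not work here, because locks held by the same process never conflict with each other. flock() locks, by contrast, belong to the open file description, so two separate open() calls can conflict. lock_conflict is a hypothetical helper. A refused lock can also mean that another process holds a lock on the output file:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>          /* flock(): Linux/BSD, not POSIX */
#include <unistd.h>

/* Takes a shared lock on the input, then tries a non-blocking exclusive lock
 * on the output. Returns 1 if that lock is refused, 0 if it is granted, and
 * -1 on other errors. Closing the descriptors releases the locks. */
int lock_conflict(const char *in_path, const char *out_path)
{
    int in = open(in_path, O_RDONLY);
    if (in < 0)
        return -1;
    if (flock(in, LOCK_SH) != 0) {
        close(in);
        return -1;
    }
    int out = open(out_path, O_WRONLY | O_CREAT, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    int r = 0;
    if (flock(out, LOCK_EX | LOCK_NB) != 0)
        r = (errno == EWOULDBLOCK) ? 1 : -1;
    close(out);
    close(in);
    return r;
}
```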
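If possible, open the first file read-only (O_RDONLY) on Linux. Be aware, though, that this by itself does not make a second open() of the same file for writing fail, so it still has to be combined with a check such as comparing st_dev and st_ino.
You can use stat to get the file status and check whether both the device (st_dev) and inode (st_ino) numbers are the same.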
Maybe you could use the system() function in order to invoke some shell commands?
In bash, you would simply call:
stat -c %i filename
This displays the inode number of a file (the -c option is GNU coreutils stat). You can compare two files this way: if their inode numbers are identical and they are on the same file system, they are hard links to the same file. The following call:
stat -c %N filename
will display the file's name and, if it is a symbolic link, the name of the file it links to as well. It prints only one name, even if the target has other hard links, so checking a symbolic link still requires comparing the inode number of the second file with that of the file the symbolic link points to.
You could redirect stat output to a text file and then parse the file in your program.
If you mean the same inode, in bash you could do
[ FILE1 -ef FILE2 ] && echo equal || echo difference
-ef compares both the device and inode numbers, and because it follows symbolic links, soft links are handled as well.
