Reading the first two bytes from a file efficiently - Golang - file

I'm trying to find a good way of reading the first two bytes from a file using Go.
I have some .zip files in my current directory, mixed in with other files.
I would like to loop through all the files in the directory and check if the first two bytes contain the right .zip identifier, namely 50 4B.
What would be a good way to accomplish this using the standard library without having to read the entire file?
Going through the available functions in the io package I managed to find:
func LimitReader(r Reader, n int64) Reader
This seems to fit my description: it reads from a Reader (how do I get a Reader?) but stops after n bytes. Since I'm rather new to Go, I'm not sure how to go about it.

You get the initial reader by opening the file. For 2 bytes, I wouldn't use the LimitReader though. Just reading 2 bytes with io.ReadFull is easier.
r, err := os.Open(file)
if err != nil {
	return err
}
defer r.Close()
// Read exactly two bytes; io.ReadFull returns an error if fewer are available.
var header [2]byte
if _, err := io.ReadFull(r, header[:]); err != nil {
	return err
}

Related

How to get the cursor location during parsing?

I made a minimal example for Packcc parser generator.
Here, the parser has to recognize float or integer numbers.
I try to print the location of the detected numbers. For simplicity there is no
line/column count, just the number from "ftell".
%auxil "FILE*" # The type sent to "pcc_create" for access in "ftell".
test <- line+
/
_ EOL+
line <- num _ EOL
num <- [0-9]+'.'[0-9]+ {printf("Float at %li\n", ftell(auxil));}
/
[0-9]+ {printf("Integer at %li\n", ftell(auxil));}
_ <- [ \t]*
EOL <- '\n' / '\r\n' / '\r'
%%
int main()
{
FILE* file = fopen("test.txt", "r");
stdin = file;
if(file == NULL) {
// try to open.
puts("File not found");
}
else {
// parse.
pcc_context_t *ctx = pcc_create(file);
while(pcc_parse(ctx, NULL));
pcc_destroy(ctx);
}
return 0;
}
The file to parse can be
2.0
42
The command can be
packcc test.peg && cc test.c && ./a.out
The problem is that the reported cursor value is always the end of the file,
whatever the position of the number.
Positions can be retrieved through special variables.
In the example above, "ftell" must be replaced by "$0s" or "$0e".
$0s is the beginning of the matched pattern, and $0e is the end of the matched pattern.
https://github.com/arithy/packcc/blob/master/README.md
Without looking more closely at the generated code, it would seem that the parser insists on reading the entire text into memory before executing any of the actions. That seems unnecessary for this grammar, and it is certainly not the way a typical generated lexical scanner would work. It's particularly odd since it seems like the generated scanner uses getchar to read one byte at a time, which is not very efficient if you are planning to read the entire file.
To be fair, you wouldn't be able to use ftell in a flex-generated scanner either, unless you forced the scanner into interactive mode. (The original AT&T lex, which also reads one character at a time, would give you a reasonable value from ftell. But you're unlikely to find a scanner built with it anymore.)
Flex would give you the wrong answer because it deliberately reads its input in chunks the size of its buffer, usually 8k. That's a lot more efficient than character-at-a-time reading. But it doesn't work for interactive environments -- for example, where you are parsing directly from user input -- because you don't want to read beyond the end of the line the user typed.
You'll have to ask whoever maintains packcc what their intended approach for maintaining source position is. It's possible that they have something built in.

How do I open a zip within a zip with libzip

I am trying to open a zip inside a zip
#include "zip.h"
#include "gtk.h"
zip_t *mainzipfile = zip_open(g_file_get_path(file), ZIP_CHECKCONS, &error);
zip_file_t *childzip = zip_fopen(mainzipfile, "child.zip", ZIP_RDONLY);// this segfaults
zip_file_t *childofchild = zip_fopen_index((zip_t*)childzip, 1, ZIP_RDONLY);
From what I am seeing, childzip is not being read as a zip, so it's segfaulting.
I tried casting because I know childzip is a zip file, but the program fails to see it as such.
How do I treat the zip_file_t as a zip_t so that I can also extract its children?
There is no generic support for opening a ZIP file inside a ZIP. To some extent, this is because reading a ZIP file requires direct access to the data (the ability to seek by offset), and compressed ZIP entries do not support reading at an arbitrary offset. The only way to reach a specific offset is to rewind the zip_file_t object and skip over bytes.
This leaves two possible scenarios (assuming the goal is to avoid extracting the inner zip to a file).
1. Reading from uncompressed zip.
In most cases, when a ZIP archive is placed into another ZIP archive, the zip program will realize that compression will not be effective and will use the 'store' method. In those situations, it is possible to use zip_source_zip to create a (seekable) zip_source, which can then be opened.
See https://libzip.org/documentation/zip_source.html
// Find index
zip_int64_t child_idx= zip_name_locate(main_zip, "child.zip", flags);
// Create zip_source from the complete child.zip
zip_source_t *src = zip_source_zip(archive, main_zip, child_idx, flags, 0, 0);
// Create zip_t
zip_t *child_zip = zip_open_from_source(src, flags, &error);
// work with the child zip
2. Unzipping into memory.
As an alternative, and assuming that the ZIP can fit in memory, consider reading the whole child zip into memory, then using the same zip_source mechanism to create a zip_source from the buffer, which can then be opened. In theory, this is simpler to implement.
zip_stat (...) ;
N = size_of_child_zip(...) ;
zip_file_t *child_file = zip_fopen(main_zip, "child.zip", flags);
char *buffer = calloc(1, N);
zip_fread(child_file, buffer, N) ;
zip_source = zip_source_buffer_create(buffer, N, ...);
// Create zip_t
zip_t *child_zip = zip_open_from_source(zip_source, flags, &error);
// work with the child zip

Why does writing to a deleted file not return an error in Go?

This program successfully runs even though it's writing to a deleted file. Why does this work?
package main
import (
"fmt"
"os"
)
func main() {
const path = "test.txt"
f, err := os.Create(path) // Create file
if err != nil {
panic(err)
}
err = os.Remove(path) // Delete file
if err != nil {
panic(err)
}
_, err = f.WriteString("test") // Write to deleted file
if err != nil {
panic(err)
}
err = f.Close()
if err != nil {
panic(err)
}
fmt.Printf("No errors occurred") // test.txt doesn't exist anymore
}
On Unix-like systems, when a process opens a file it gets a file descriptor, which points to an entry in the process's file table, which in turn refers to the inode structure on disk. The inode holds the file's metadata, including the location of its data.
The contents of a directory are just pairs of names and inode numbers.
If you delete a file, you simply remove the directory's link to the inode. The inode still exists as long as something still refers to it (another link, or an open file descriptor in a process), so data can still be read from and written to the data location.
On Windows this code fails, since Windows does not allow an open file to be deleted:
panic: remove test.txt: The process cannot access the file because it is being used by another process.
goroutine 1 [running]:
main.main()
D:/tmp/main.go:18 +0x1d1
exit status 2

Get file size given file descriptor in Go

If given a path, I would use this to get file size
file, _ := os.Open(path)
fi, _ := file.Stat()
fsize := fi.Size()
But if only given fd, how can I get the file size?
Is there any way in Go like this in C:
lseek(fd, 0, SEEK_END)
You create a new *os.File from a file descriptor using the os.NewFile function.
You can do it exactly the same way as in C, using Seek
offset, err := f.Seek(0, io.SeekEnd) // io.SeekEnd replaces the deprecated os.SEEK_END
But since you have the *os.File already, you can call Stat even if it was derived directly from the file descriptor.
Try getting the file's stat:
fileInfo, err := file.Stat()
if err != nil {
	// ...
}
fileSize := fileInfo.Size()

Truncate file in ocaml

How do I truncate a file to size N in OCaml?
I don't see a function in Pervasives. The closest thing is the "open_trunc" flag, and I'm not really sure what it does.
If you're under Unix, you can use
val Unix.truncate : string -> int -> unit
It "truncates the named file to the given size".
But this function is not implemented in the Windows version of OCaml (or more precisely, it is not emulated).
If you're under Windows and want to emulate it, you might be interested in
val really_input : in_channel -> string -> int -> int -> unit
"really_input ic buf pos len reads len characters from channel ic, storing them in string buf, starting at character number pos. Raise End_of_file if the end of file is reached before len characters have been read. Raise Invalid_argument "really_input" if pos and len do not designate a valid substring of buf."
I think you are correct.
open_trunc: Open the named file for writing, and return a new output channel on that file, positioned at the beginning of the file. The file is truncated to zero length if it already exists. It is created if it does not already exist. Raise Sys_error if the file could not be opened.
Refer this link also.
The OS capabilities of Pervasives correspond roughly to what you can do in standard C. There is no function to truncate a file to a specified length in standard C; all you can do is truncate a file to be empty when you open it (exposed through the Open_trunc flag). There is one in Unix/POSIX (truncate), so look in the Unix module, which does have a truncate function (and ftruncate for an open file, again following Unix/POSIX).
