How to get file length in Go dynamically?

I have the following code snippet:
package main

import (
	"compress/gzip"
	"fmt"
	"log"
	"os"
)

func main() {
	// Some text we want to compress (stands in for the rows
	// returned by the database cursor).
	rows := []string{"bird and frog"}
	// Open a file for writing.
	f, err := os.Create("C:\\programs\\file.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// Create gzip writer.
	w := gzip.NewWriter(f)
	// Write bytes in compressed form to the file, one row at a time.
	for _, row := range rows {
		w.Write([]byte(row))
	}
	// Close the gzip writer; this does not close f (the deferred call does).
	w.Close()
	fmt.Println("DONE")
}
However, I need a small modification: when the size of the file reaches a certain threshold, I want to close it and open a new file, also in compressed format.
For example:
Assume a database has 10 rows, each 50 bytes.
Assume a compression factor of 2, i.e. 1 row of 50 bytes is compressed to 25 bytes.
Assume a file size limit of 50 bytes.
That means after every 2 records I should close the file and open a new one.
How do I keep track of the file size while it is still open and I am still writing compressed data to it?

gzip.NewWriter takes an io.Writer, so it is easy to implement a custom io.Writer that does what you want.
For example (Playground):
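// MultiFileWriter switches to a new underlying writer once writing data
// would push the current one past maxLimit bytes.
type MultiFileWriter struct {
	maxLimit      int
	currentSize   int
	currentWriter io.WriteCloser
	// createNextFile opens the next output file (e.g. part-1, part-2, ...).
	createNextFile func() (io.WriteCloser, error)
}

func (m *MultiFileWriter) Write(data []byte) (n int, err error) {
	if m.currentWriter == nil || len(data)+m.currentSize > m.maxLimit {
		if m.currentWriter != nil {
			if err := m.currentWriter.Close(); err != nil {
				return 0, err
			}
		}
		if m.currentWriter, err = m.createNextFile(); err != nil {
			return 0, err
		}
		m.currentSize = 0 // the new file starts empty
	}
	n, err = m.currentWriter.Write(data)
	m.currentSize += n
	return n, err
}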
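Note: You will need to handle a few edge cases, such as len(data) being greater than maxLimit. And maybe you don't want to split a record across files. Also, if you wrap this writer in a single gzip.Writer, the compressed stream is split across files, so only the concatenation of all the files is a valid gzip stream; for files that can be decompressed independently, create a new gzip.Writer for each file.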

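You can use the os.File.Seek method to get your current position in the file, which, while you are writing the file sequentially, is the current file size in bytes. Call Flush on the gzip writer first, so that any buffered compressed data has actually reached the file.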
For example:
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

func main() {
	// Some text we want to compress.
	lines := []string{
		"this is a test",
		"the quick brown fox",
		"jumped over the lazy dog",
		"the end",
	}
	// Open a file for writing.
	f, err := os.Create("file.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	// Create gzip writer.
	w := gzip.NewWriter(f)
	// Write bytes in compressed form to the file.
	for _, line := range lines {
		if _, err := w.Write([]byte(line)); err != nil {
			panic(err)
		}
		// Flush pending compressed data to f so the position is up to date.
		if err := w.Flush(); err != nil {
			panic(err)
		}
		pos, err := f.Seek(0, io.SeekCurrent)
		if err != nil {
			panic(err)
		}
		fmt.Printf("pos: %d\n", pos)
	}
	// Close the gzip writer. This writes out any remaining data
	// and the final checksum, but does not close f.
	if err := w.Close(); err != nil {
		panic(err)
	}
	pos, err := f.Seek(0, io.SeekCurrent)
	if err != nil {
		panic(err)
	}
	fmt.Printf("pos: %d\n", pos)
	fmt.Println("DONE")
}
Which outputs:
pos: 30
pos: 55
pos: 83
pos: 94
pos: 107
DONE
And we can confirm with wc:
$ wc -c file.gz
107 file.gz

Related

How to make write operations to a file faster

I am trying to write a large amount of data to a file, but it takes quite some time. I have tried 2 solutions, but they both take the same amount of time. Here are the solutions I have tried:
Solution A:
f, err := os.Create("file.txt")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
w := bufio.NewWriter(f)
for _, d := range data {
	// Write the record plus a newline into the buffer.
	_, err := w.WriteString(d + "\n")
	if err != nil {
		fmt.Println(err)
	}
}
err = w.Flush()
if err != nil {
	log.Fatal(err)
}
Solution B:
e, err := os.OpenFile(filePath, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0666)
if err != nil {
panic(err)
}
defer e.Close()
for _, d := range data {
	_, err = e.WriteString(d)
	if err != nil {
		return err
	}
	// Sync forces the data to disk after every single write.
	err = e.Sync()
	if err != nil {
		return err
	}
}
Any other suggestion on how I can make this write operation faster?
I think bufio is your friend, as it can help reduce the number of syscalls required to write the data to disk. You are already using it in solution A; however, note that the default buffer size is 4096 bytes. If you want to try larger buffer sizes, you can use NewWriterSize() to set a larger buffer for the writer.
See https://pkg.go.dev/bufio#NewWriterSize
Based on your solution A, I created a benchmark you can use to experiment with different buffer sizes. The test writes a data set of 100k records of 600 bytes each to the file. These are the results I get on my machine for 10 repeated calls of the function under test (FUT) with various buffer sizes:
BenchmarkWriteTest/Default_Buffer_Size
BenchmarkWriteTest/Default_Buffer_Size-10 15 73800317 ns/op
BenchmarkWriteTest/Buffer_Size_16K
BenchmarkWriteTest/Buffer_Size_16K-10 21 55606873 ns/op
BenchmarkWriteTest/Buffer_Size_64K
BenchmarkWriteTest/Buffer_Size_64K-10 25 49562057 ns/op
As you can see, the number of iterations in the benchmark interval (the first number) increases significantly with a larger buffer size. Accordingly, the time spent per operation drops.
https://gist.github.com/mwittig/f1e6a81c2378906292e2e4961f422870
Combine all your data into a single string and write it in one operation. This avoids the overhead of many small filesystem calls, at the cost of holding all the data in memory at once.

Jump to specific line in file in Go

In Go, is it possible to jump to a particular line number in a file and delete it? Something like linecache in Python.
I'm trying to match some substrings in a file and remove the corresponding lines. The matching part I've taken care of and I have an array with line numbers I need to delete but I'm stuck on how to delete the matching lines in the file.
This is an old question, but if anyone is looking for a solution, I wrote a package, github.com/stoicperlman/fls, that handles going to any line in a file. It can open a file and seek to any line position without reading the whole file into memory and splitting it.
import (
	"io"
	"log"
	"os"

	"github.com/stoicperlman/fls"
)

// This is just a wrapper around os.OpenFile. Alternatively
// you could open an *os.File and use fls.LineFile(file) to get f.
// The file must be readable, since finding lines requires reading it.
f, err := fls.OpenFile("test.log", os.O_CREATE|os.O_RDWR, 0600)
if err != nil {
	log.Fatal(err)
}
defer f.Close()
// Beginning of line 1 (beginning of file),
// equivalent to f.Seek(0, io.SeekStart).
pos, err := f.SeekLine(0, io.SeekStart)
// Beginning of line 2.
pos, err = f.SeekLine(1, io.SeekStart)
// Beginning of the last line.
pos, err = f.SeekLine(0, io.SeekEnd)
// Beginning of the second-to-last line.
pos, err = f.SeekLine(-1, io.SeekEnd)
Unfortunately, I'm not sure how you would delete; this just handles getting you to the correct position in the file. In your case, you could use it to go to the line you want to delete and save the position, then seek to the next line and save that as well. You now have the bookends of the line to delete.
// might want lineToDelete - 1
// this acts like 0 based array
pos1, err := f.SeekLine(lineToDelete, io.SeekStart)
// skip ahead 1 line
pos2, err := f.SeekLine(1, io.SeekCurrent)
// pos2 will be the position of the first character in next line
// might want pos2 - 1 depending on how the function works
DeleteBytesFromFileFunction(f, pos1, pos2)
Based on my reading of the linecache module, it takes a file and explodes it into an array based on '\n' line endings. You could replicate the same behavior in Go using the strings or bytes package. You could also use the bufio package to read a file line by line and only store or save the lines you want.
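package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
)

func main() {
	b, err := ioutil.ReadFile("filename.txt")
	if err != nil {
		panic(err)
	}
	// Split into lines on '\n', similar to what linecache does.
	lines := bytes.Split(b, []byte("\n"))
	// %q prints each line as a quoted string rather than raw byte values.
	fmt.Printf("%q\n", lines)
}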
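I wrote a small function that lets you remove a specific line from a file.
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"strings"
)

func main() {
	path := "path/to/file.txt"
	// Remove the third line (line numbers are 0-based).
	if err := removeLine(path, 2); err != nil {
		log.Fatal(err)
	}
}

// removeLine deletes the line at the 0-based index lineNumber from the file,
// keeping the file's permissions.
func removeLine(path string, lineNumber int) error {
	file, err := ioutil.ReadFile(path)
	if err != nil {
		return err
	}
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	lines := strings.Split(string(file), "\n")
	if lineNumber < 0 || lineNumber >= len(lines) {
		return fmt.Errorf("line %d out of range", lineNumber)
	}
	lines = append(lines[:lineNumber], lines[lineNumber+1:]...)
	return ioutil.WriteFile(path, []byte(strings.Join(lines, "\n")), info.Mode())
}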

Golang How to read input filename in Go

I would like to run my Go file on input.txt, so that my Go program reads input.txt when I type the go run command, i.e.:
go run goFile.go input.txt
I don't want to hard-code input.txt in goFile.go, since my program should work with any input file name, not just input.txt.
I tried ioutil.ReadAll(os.Stdin), but then I need to change my command to
go run goFile.go < input.txt
I only use the packages fmt, os, bufio and io/ioutil. Is it possible to do this without any other packages?
Please take a look at the package documentation of io/ioutil which you are already using.
It has a function exactly for this: ReadFile()
func ReadFile(filename string) ([]byte, error)
Example usage:
func main() {
// First element in os.Args is always the program name,
// So we need at least 2 arguments to have a file name argument.
if len(os.Args) < 2 {
fmt.Println("Missing parameter, provide file name!")
return
}
data, err := ioutil.ReadFile(os.Args[1])
if err != nil {
fmt.Println("Can't read file:", os.Args[1])
panic(err)
}
// data is the file content, you can use it
fmt.Println("File content is:")
fmt.Println(string(data))
}
First, check that an argument was provided. If it was, pass os.Args[1] to ioutil.ReadFile.
package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

func main() {
	// os.Args[0] is the program name, so a file name needs len(os.Args) >= 2.
	if len(os.Args) < 2 {
		fmt.Println("Usage: " + os.Args[0] + " <file name>")
		os.Exit(1)
	}
	file, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("Cannot read the file")
		os.Exit(1)
	}
	// do something with the file
	fmt.Print(string(file))
}
Another possibility is to use:
f, err := os.Open(os.Args[1])
but for this you need to provide the number of bytes to read:
b := make([]byte, 5) // 5 is the length
n, err := f.Read(b)
fmt.Printf("%d bytes: %s\n", n, string(b[:n]))
To run a .go file from the command line with a file name as an input parameter (for example abc.txt), we mainly need the os, io/ioutil and fmt packages. To read the command-line parameters we use os.Args. Here is example code:
package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

func main() {
	fmt.Println(" Hi guys ('-') ")
	inputFiles := os.Args[1:]
	if len(inputFiles) < 1 {
		fmt.Println("Not detected files.")
	} else {
		fmt.Println("File_name is : ", inputFiles[0])
		content, err := ioutil.ReadFile(inputFiles[0])
		if err != nil {
			fmt.Println("Can't read file :", inputFiles[0], "Error : ", err)
		} else {
			fmt.Println("Output file content is(like string type) : \n", string(content)) // string output
			fmt.Println("Output file content is(like byte type) : \n", content)          // bytes output
		}
	}
}
Note that the os/exec documentation (https://golang.org/pkg/os/exec/) describes a different field, exec.Cmd.Args, which holds the arguments of a command you start, including the command itself as Args[0]. For your own program, os.Args is a []string that holds the command-line arguments, starting with the program name. So the short way to take the file names from the command line is os.Args[1:]. And here is the output:
elshan_abd$ go run main.go abc.txt
Hi guys ('-')
File_name is : abc.txt
Output file content is(like string type) :
aaa
bbb
ccc
1234
Output file content is(like byte type) :
[97 97 97 10 98 98 98 10 99 99 99 10 49 50 51 52 10]
Finally, to read the file content we need this function:
func ReadFile(filename string) ([]byte, error) (source: https://golang.org/pkg/io/ioutil/#ReadFile)

How to read a file starting from a specific line number using Scanner?

I am new to Go and I am trying to write a simple script that reads a file line by line. I also want to save the progress (i.e. the last line number that was read) on the filesystem somewhere so that if the same file was given as the input to the script again, it starts reading the file from the line where it left off. Following is what I have started off with.
package main
// Package Imports
import (
"bufio"
"flag"
"fmt"
"log"
"os"
)
// Variable Declaration
var (
ConfigFile = flag.String("configfile", "../config.json", "Path to json configuration file.")
)
// The main function that reads the file and parses the log entries
func main() {
flag.Parse()
settings := NewConfig(*ConfigFile)
inputFile, err := os.Open(settings.Source)
if err != nil {
log.Fatal(err)
}
defer inputFile.Close()
scanner := bufio.NewScanner(inputFile)
for scanner.Scan() {
fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}
// Saves the current progress
func SaveProgress() {
}
// Get the line count from the progress to make sure
func GetCounter() {
}
I could not find any methods that deal with line numbers in bufio.Scanner. I know I can declare an integer, say counter := 0, and increment it each time a line is read, like counter++. But the next time, how do I tell the scanner to start from a specific line? For example, if I read up to line 30, the next time I run the script with the same input file, how can I make the scanner start reading from line 31?
Update
One solution I can think of here is to use the counter as I stated above and use an if condition like the following.
scanner := bufio.NewScanner(inputFile)
counter := 0
for scanner.Scan() {
	counter++
	// Only print lines after the saved progress.
	if counter > progress {
		fmt.Println(scanner.Text())
	}
}
I am pretty sure something like this would work, but it is still going to loop over the lines that we have already read. Please suggest a better way.
If you don't want to read but just skip the lines you read previously, you need to acquire the position where you left off.
The different solutions are presented in the form of a function which takes the input to read from and the start position (byte position) to start reading lines from, e.g.:
func solution(input io.ReadSeeker, start int64) error
A special io.Reader is used as input, one which also implements io.Seeker, the common interface that allows skipping data without having to read it. *os.File implements this, so you can pass a *File to these functions. The "merged" interface of both io.Reader and io.Seeker is io.ReadSeeker.
If you want a clean start (to start reading from the beginning of the file), simply pass start = 0. If you want to resume a previous processing, pass the byte position where the last processing was stopped/aborted. This position is the value of the pos local variable in the functions (solutions) below.
All the examples below with their testing code can be found on the Go Playground.
1. With bufio.Scanner
bufio.Scanner does not maintain the position, but we can very easily extend it to maintain the position (the read bytes), so when we want to restart next, we can seek to this position.
In order to do this with minimal effort, we can use a new split function which splits the input into tokens (lines). We can use Scanner.Split() to set the split function (the logic that decides where the boundaries of tokens/lines are). The default split function is bufio.ScanLines().
Let's take a look at the split function declaration: bufio.SplitFunc
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
It returns the number of bytes to advance: advance. Exactly what we need to maintain the file position. So we can create a new split function using the builtin bufio.ScanLines(), so we don't even have to implement its logic, just use the advance return value to maintain position:
func withScanner(input io.ReadSeeker, start int64) error {
fmt.Println("--SCANNER, start:", start)
if _, err := input.Seek(start, io.SeekStart); err != nil {
return err
}
scanner := bufio.NewScanner(input)
pos := start
scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
advance, token, err = bufio.ScanLines(data, atEOF)
pos += int64(advance)
return
}
scanner.Split(scanLines)
for scanner.Scan() {
fmt.Printf("Pos: %d, Scanned: %s\n", pos, scanner.Text())
}
return scanner.Err()
}
2. With bufio.Reader
In this solution we use the bufio.Reader type instead of the Scanner. bufio.Reader already has a ReadBytes() method, which is very similar to "read a line" functionality if we pass the '\n' byte as the delimiter.
This solution is similar to JimB's, with the addition of handling both valid line terminator sequences (in regular expression notation, \r?\n) and stripping them from the returned line (they are rarely needed).
func withReader(input io.ReadSeeker, start int64) error {
fmt.Println("--READER, start:", start)
if _, err := input.Seek(start, io.SeekStart); err != nil {
return err
}
r := bufio.NewReader(input)
pos := start
for {
data, err := r.ReadBytes('\n')
pos += int64(len(data))
if err == nil || err == io.EOF {
if len(data) > 0 && data[len(data)-1] == '\n' {
data = data[:len(data)-1]
}
if len(data) > 0 && data[len(data)-1] == '\r' {
data = data[:len(data)-1]
}
fmt.Printf("Pos: %d, Read: %s\n", pos, data)
}
if err != nil {
if err != io.EOF {
return err
}
break
}
}
return nil
}
Note: If the content ends with a line terminator, this solution will process an empty last line. If you don't want this, you can simply check for it like this:
if len(data) != 0 {
fmt.Printf("Pos: %d, Read: %s\n", pos, data)
} else {
// Last line is empty, omit it
}
Testing the solutions:
The testing code simply uses the content "first\r\nsecond\nthird\nfourth", which contains multiple lines with varying line terminators. We use strings.NewReader() to obtain an io.ReadSeeker whose source is a string.
The test code first calls withScanner() and withReader() passing 0 as the start position: a clean start. In the next round we pass a start position of 14, which is the position of the 3rd line, so the first 2 lines are not processed (printed): a resume simulation.
func main() {
const content = "first\r\nsecond\nthird\nfourth"
if err := withScanner(strings.NewReader(content), 0); err != nil {
fmt.Println("Scanner error:", err)
}
if err := withReader(strings.NewReader(content), 0); err != nil {
fmt.Println("Reader error:", err)
}
if err := withScanner(strings.NewReader(content), 14); err != nil {
fmt.Println("Scanner error:", err)
}
if err := withReader(strings.NewReader(content), 14); err != nil {
fmt.Println("Reader error:", err)
}
}
Output:
--SCANNER, start: 0
Pos: 7, Scanned: first
Pos: 14, Scanned: second
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 0
Pos: 7, Read: first
Pos: 14, Read: second
Pos: 20, Read: third
Pos: 26, Read: fourth
--SCANNER, start: 14
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 14
Pos: 20, Read: third
Pos: 26, Read: fourth
Try the solutions and testing code on the Go Playground.
Instead of using a Scanner, use a bufio.Reader, specifically the ReadBytes or ReadString methods. This way you can read up to each line termination, and still receive the full line with line endings.
r := bufio.NewReader(inputFile)
var line []byte
var fPos int64 // or saved position
for i := 1; ; i++ {
	line, err = r.ReadBytes('\n')
	fmt.Printf("[line:%d pos:%d] %q\n", i, fPos, line)
	if err != nil {
		break
	}
	fPos += int64(len(line))
}
if err != io.EOF {
	log.Fatal(err)
}
You can store the combination of file position and line number however you choose, and the next time you start, you use inputFile.Seek(fPos, io.SeekStart) to move to where you left off.
If you want to use Scanner, you have to go through the file from the beginning until you have found GetCounter() end-of-line symbols.
scanner := bufio.NewScanner(inputFile)
// context line above
// skip first GetCounter() lines
for i := 0; i < GetCounter(); i++ {
scanner.Scan()
}
// context line below
for scanner.Scan() {
fmt.Println(scanner.Text())
}
Alternatively, you could store an offset instead of a line number in the counter, but remember that the termination token is stripped when using Scanner, and for new lines the token is \r?\n (regexp notation), so it isn't clear whether you should add 1 or 2 to the text length:
// Not clear how to store offset unless custom SplitFunc provided
inputFile.Seek(GetCounter(), 0)
scanner := bufio.NewScanner(inputFile)
So it is better to use the previous solution, or not to use Scanner at all.
There are a lot of words in the other answers, and they're not really reusable code, so here's a reusable function that seeks to the given line number and returns the line and the offset where it starts (play.golang).
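// SeekToLine returns the 1-based line lineNo from r together with the
// byte offset at which that line starts.
func SeekToLine(r io.Reader, lineNo int) (line []byte, offset int, err error) {
	s := bufio.NewScanner(r)
	var pos int
	// Wrap ScanLines so pos tracks how many bytes have been consumed.
	s.Split(func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		advance, token, err = bufio.ScanLines(data, atEOF)
		pos += advance
		return advance, token, err
	})
	for i := 0; i < lineNo; i++ {
		offset = pos // start of the line about to be scanned
		if !s.Scan() {
			if err := s.Err(); err != nil {
				return nil, 0, err
			}
			return nil, 0, io.EOF
		}
	}
	return s.Bytes(), offset, nil
}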

Go : concatenate file contents

I'm currently learning how to develop with Go (or golang) and I have a strange issue:
I'm trying to create a script that looks inside an HTML file in order to get the sources of all <script> tags.
The goal of the script is to merge all the retrieved files.
So, that's the story: for now, I'm able to get the content of each JavaScript file but... I can't concatenate them...
You can see below my script:
//Open main file
mainFilePath := "/path/to/my/file.html"
mainFileDir := path.Dir(mainFilePath)+"/"
mainFileContent, err := ioutil.ReadFile(mainFilePath)
if err == nil {
mainFileContent := string(mainFileContent)
var finalFileContent bytes.Buffer
//Start RegExp searching for JavaScript src
scriptReg, _ := regexp.Compile("<script src=\"(.*)\">")
scripts := scriptReg.FindAllStringSubmatch(mainFileContent,-1)
//For each SRC found...
for _, path := range scripts {
//We open the corresponding file
subFileContent, err := ioutil.ReadFile(mainFileDir+path[1])
if err == nil {
//And we add its content to the "final" variable
fmt.Println(finalFileContent.Write(subFileContent))
} else {
fmt.Println(err)
}
}
//Try to display the final result
// fmt.Println(finalFileContent.String())
fmt.Printf(">>> %#v", finalFileContent)
fmt.Println("Y U NO WORKS? :'(")
} else {
fmt.Println(err)
}
So, each fmt.Println(finalFileContent.Write(subFileContent)) displays something like 6161, so I assume the Write() method executes correctly.
But fmt.Printf(">>> %#v", finalFileContent) displays nothing. Absolutely nothing (even the ">>>" is not displayed!) And it's the same for the commented line just above.
The funny part is that the string "Y U NO WORKS? :'(" is correctly displayed...
Do you know why?
And do you know how to solve this issue?
Thanks in advance!
You are ignoring some errors. What are your results when you run the following version of your code?
package main
import (
"bytes"
"fmt"
"io/ioutil"
"path"
"regexp"
)
func main() {
//Open main file
mainFilePath := "/path/to/my/file.html"
mainFileDir := path.Dir(mainFilePath) + "/"
mainFileContent, err := ioutil.ReadFile(mainFilePath)
if err == nil {
mainFileContent := string(mainFileContent)
var finalFileContent bytes.Buffer
//Start RegExp searching for JavaScript src
scriptReg, _ := regexp.Compile("<script src=\"(.*)\">")
scripts := scriptReg.FindAllStringSubmatch(mainFileContent, -1)
//For each SRC found...
for _, path := range scripts {
//We open the corresponding file
subFileContent, err := ioutil.ReadFile(mainFileDir + path[1])
if err == nil {
//And we add its content to the "final" variable
// fmt.Println(finalFileContent.Write(subFileContent))
n, err := finalFileContent.Write(subFileContent)
fmt.Println("finalFileContent Write:", n, err)
} else {
fmt.Println(err)
}
}
//Try to display the final result
// fmt.Println(finalFileContent.String())
// fmt.Printf(">>> %#v", finalFileContent)
n, err := fmt.Printf(">>> %#v", finalFileContent)
fmt.Println()
fmt.Println("finalFileContent Printf:", n, err)
fmt.Println("Y U NO WORKS? :'(")
} else {
fmt.Println(err)
}
}
UPDATE:
The statement:
fmt.Println("finalFileContent Printf:", n, err)
Outputs:
finalFileContent Printf: 0 write /dev/stdout: winapi error #8
or
finalFileContent Printf: 0 write /dev/stdout: Not enough storage is available to process this command.
From MSDN:
ERROR_NOT_ENOUGH_MEMORY
8 (0x8)
Not enough storage is available to process this command.
The formatted output to the Windows console overflows the buffer (circa 64KB).
There is a related Go open issue:
Issue 3376: windows: detect + handle console in os.File.Write
