Read line-by-line for big files


I'm trying to write a reader for big files in Clojure that works by iteration. How can I return the lines one at a time? I want something like this:
(println (do_something (readFile (:file opts)))) ; process and print first line
(println (do_something (readFile (:file opts)))) ; process and print second line
Code:
(ns testapp.core
(:gen-class)
(:require [clojure.tools.cli :refer [cli]])
(:require [clojure.java.io]))
(defn readFile [file, cnt]
; Iterate over opened file (read line by line)
(with-open [rdr (clojure.java.io/reader file)]
(let [seq (line-seq rdr)]
; how return only one line there? and after, when needed, take next line?
)))
(defn -main [& args]
; Main function for project
(let [[opts args banner]
(cli args
["-h" "--help" "Print this help" :default false :flag true]
["-f" "--file" "REQUIRED: File with data"]
["-c" "--clusters" "Count of clusters" :default 3]
["-g" "--hamming" "Use Hamming algorithm"]
["-e" "--evklid" "Use Evklid algorithm"]
)]
; Print help, when no typed args
(when (:help opts)
(println banner)
(System/exit 0))
; Or process args and start work
(if (and (:file opts) (or (:hamming opts) (:evklid opts)))
(do
; Use Hamming algorithm
(if (:hamming opts)
(do
(println (readFile (:file opts)))
(println (readFile (:file opts)))
)
;(count (readFile (:file opts)))
; Use Evklid algorithm
(println "Evklid")))
(println "Please, type path for file and algorithm!"))))

Maybe I don't understand correctly what you mean by "return line by line", but I'd suggest writing a function that accepts a file and a processing function, then prints the result of the processing function for every line of your big file. Or, more generally, accept both a processing function and an output function (println by default), so that instead of just printing you can send the result over the network, save it somewhere, pass it to another thread, and so on:
(defn process-file-by-lines
"Process file reading it line-by-line"
([file]
(process-file-by-lines file identity))
([file process-fn]
(process-file-by-lines file process-fn println))
([file process-fn output-fn]
(with-open [rdr (clojure.java.io/reader file)]
(doseq [line (line-seq rdr)]
(output-fn
(process-fn line))))))
So:
(process-file-by-lines "/tmp/tmp.txt") ;; Will just print the file line by line
(process-file-by-lines "/tmp/tmp.txt"
clojure.string/reverse) ;; Will print each line reversed
;; (plain reverse would print each line as a seq of characters instead)

Try doseq:
(defn readFile [file]
(with-open [rdr (clojure.java.io/reader file)]
(doseq [line (line-seq rdr)]
(println line))))

You can also try to read lazily from the reader, which is not the same as the lazy list of strings returned by line-seq. The details are discussed in this answer to a very similar question, but the gist of it is here:
(defn lazy-file-lines [file]
(letfn [(helper [rdr]
(lazy-seq
(if-let [line (.readLine rdr)]
(cons line (helper rdr))
(do (.close rdr) nil))))]
(helper (clojure.java.io/reader file))))
You can then map over the lines, which will only be read as far as necessary. As discussed in more detail in the linked answer, the downside is that if you don't read to the end of the file, (.close rdr) will never run, potentially leaking the open file handle.
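One way to stay lazy without leaving the reader open is to hand the consuming function to a helper that owns the with-open. The following is only a minimal sketch: with-file-lines is a hypothetical name, not something from the answers above, and it assumes the caller's function realizes everything it needs before returning.

```clojure
(require '[clojure.java.io :as io])

(defn with-file-lines
  "Calls f with a lazy seq of the lines of file while the reader is open.
  Hypothetical helper: f must realize whatever it returns (e.g. with doall),
  because the reader is closed as soon as f returns."
  [file f]
  (with-open [rdr (io/reader file)]
    (f (line-seq rdr))))

(comment
  ;; Take just the first two lines; the reader is closed afterwards
  ;; even though the rest of the file was never consumed.
  (with-file-lines "/tmp/tmp.txt" #(doall (take 2 %))))
```

The trade-off is that the lazy seq cannot escape the helper, so processing has to be expressed as a function passed in rather than as a value returned to the caller.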

Related

How can I successfully re-write a file appending an element to a list using Common Lisp?

I want to write a Common Lisp list in a .lisp file. If the file does not exist, it will be created and add the element.
If the file already exists, it will re-write the file appending new content to the list.
This implementation partially works:
(defun append-to-list-in-file (filename new-item &aux contents) ;;;;
(setq contents (list)) ;; default in case reading fails
(ignore-errors
(with-open-file (str filename :direction :input)
(setq contents (read str))))
(setq contents (nconc contents (list new-item)))
(with-open-file (str filename :direction :output :if-exists :overwrite)
(write contents :stream str)))
If I do:
(append-to-list-in-file "/home/pedro/miscellaneous/misc/tests-output/CL.lisp" 4)
It works. The code creates the file AND puts 4 inside of it as '(4). However, if I run the code again with a new element using the file that was just created:
(append-to-list-in-file "/home/pedro/miscellaneous/misc/tests-output/CL.lisp" 5)
It throws an error:
Error opening #P"/home/pedro/miscellaneous/misc/tests-output/CL.lisp"
[Condition of type SB-EXT:FILE-EXISTS]
I was expecting: '(4 5)
What do I need to change?

Probably a good idea to create the file. Otherwise you can't overwrite it.
... :if-does-not-exist :create :if-exists :overwrite ...

How to read a whole binary file (Nippy) into byte array in Clojure?

I need to convert Nippy data structures stored on disk into something that can be read by Nippy. Nippy uses byte arrays, so I need some way to convert the file into a byte array. I have tried:
(clojure.java.io/to-byte-array (clojure.java.io/file folder-path file-path))
but this gives
java.lang.IllegalArgumentException: Value out of range for byte: ?
Then I try:
(into-array Byte/TYPE (map byte (slurp (clojure.java.io/file folder-path file-path))))
but somehow the namespace is wrong, and I can't find the right one.
To write the Nippy structures in the first place, I am using:
(with-open [w (clojure.java.io/output-stream file-path)]
(.write w (nippy/freeze data)))

Here's how I do it generically with clojure built-ins
(defn slurp-bytes
"Slurp the bytes from a slurpable thing"
[x]
(with-open [in (clojure.java.io/input-stream x)
out (java.io.ByteArrayOutputStream.)]
(clojure.java.io/copy in out)
(.toByteArray out)))
EDIT: Updated answer based on Jerry101's suggestion in comments.

I'm not aware of anything built-in to Clojure that will handle this. You definitely don't want slurp because that will decode the stream contents as text.
You could write your own method to do this, basically reading from the InputStream into a buffer and writing the buffer to a java.io.ByteArrayOutputStream. Or you could use the IOUtils class from Apache Commons IO:
(require '[clojure.java.io :as io])
(import '[org.apache.commons.io IOUtils])
(IOUtils/toByteArray (io/input-stream file-path))
You should also take a look at Nippy's thaw-from-in! and freeze-to-out! functions:
(import '[java.io DataInputStream DataOutputStream])
(with-open [w (io/output-stream file-path)]
(nippy/freeze-to-out! (DataOutputStream. w) some-data))
(with-open [r (io/input-stream file-path)]
(nippy/thaw-from-in! (DataInputStream. r)))

Since you know the .length of the file, you can allocate once and use DataInputStream's readFully method. No additional libraries, buffer copies, or loops required.
(defn file-to-byte-array
[^java.io.File file]
(let [result (byte-array (.length file))]
(with-open [in (java.io.DataInputStream. (clojure.java.io/input-stream file))]
(.readFully in result))
result))
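On Java 7 or later, the same whole-file read can also be delegated to java.nio.file.Files/readAllBytes. This is a swapped-in alternative rather than anything the answers above use, and read-all-bytes is a hypothetical helper name; treat it as a sketch.

```clojure
(require '[clojure.java.io :as io])

(defn read-all-bytes
  "Returns the full contents of path-or-file as a byte array.
  Hypothetical helper name; relies on java.nio.file.Files (Java 7+)."
  [path-or-file]
  (java.nio.file.Files/readAllBytes (.toPath (io/file path-or-file))))

(comment
  ;; The resulting byte array could then be passed to nippy/thaw.
  (read-all-bytes "/tmp/data.nippy"))
```

Like the readFully version, this loads the entire file into memory at once, so it only suits files that comfortably fit on the heap.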

A quick make-shift solution may be this code:
(defn slurpb
"Convert an input stream is to a byte array"
[is]
(with-open [baos (java.io.ByteArrayOutputStream.)]
(let [ba (byte-array 2000)]
(loop [n (.read is ba 0 2000)]
(when (> n 0)
(.write baos ba 0 n)
(recur (.read is ba 0 2000))))
(.toByteArray baos))))
;;test
(String. (slurpb (java.io.ByteArrayInputStream. (.getBytes "hello"))))

Please note that I just cut Nippy v2.13.0 which now includes a pair of helper utils to help simplify this use case: freeze-to-file and thaw-from-file.
Release details at: https://github.com/ptaoussanis/nippy/releases/tag/v2.13.0
Cheers!

You can give ClojureWerkz's Buffy a try: https://github.com/clojurewerkz/buffy.
Buffy is a Clojure library for working with binary data, writing complete binary protocol implementations in Clojure, storing complex data structures in an off-heap cache, reading binary files and doing everything you would usually do with ByteBuffer.
It's very neat if your binary data is structured as you can define complex composite types and frames depending on structure types, even decode UTF.

Emacs & LaTeX: amount of columns for tables/matrices/arrays

Would anyone have a suggestion for how to handle the following:
\begin{array}{cc}
Lorem & Ipsum \\
More & Stuff \\
\end{array}
I would like adding or removing a c, l or r in the column specification after array to add or remove the corresponding & on every line of the array environment.
Basically the same trick could then be applied to matrices or table environments.
At the least I'd be interested in how others go about this "easy-to-go-wrong", "hard-to-efficiently-alter" task.

I usually generate the tables from a different format (tab separated values or org-mode tables) in which such operations are simpler.

This is not exactly the answer, but this is how I was doing it:
Align on &, for example: C-x . &.
Select the entire column I need using regular selection commands.
Cut a rectangular area by using C-x r k.
This is not super automatic, but given some exercise isn't really a hurdle, except, perhaps, if you have to re-format some old document and make a lot of changes all at once.
EDIT
(defun latex-merge-next-column (start end column)
"Works on selected region, removes COLUMN'th ampersand
in every line in the selected region"
(interactive "r\nnColumn to merge: ")
(labels ((%nth-index-of
(line)
(let ((i -1) (times 0))
(while (and (< times column) i)
(setq i (position ?\& line :start (1+ i))
times (1+ times))) i)))
(let ((region (split-string (buffer-substring start end) "\n"))
amp-pos
replacement)
(dolist (line region)
(setq amp-pos (%nth-index-of line)
replacement
(cons (if amp-pos
(concat (subseq line 0 amp-pos)
(subseq line (1+ amp-pos)))
line) replacement)))
(kill-region start end)
(insert (mapconcat #'identity (reverse replacement) "\n")))))
This would work on the selected region and remove the n'th ampersand in every line. You could bind it to some key that is comfortable for you, say:
(global-set-key (kbd "C-c C-n") 'latex-merge-next-column)
Then C-c C-n 2 would remove every second ampersand in the selected lines.

As suggested, you can make a YASnippet that automatically adds the appropriate number of &s to the first row of the array, based on the number of letters in the array's column argument:
# -*- mode: snippet -*-
# name: array
# key: arr
# expand-env: ((yas/indent-line 'fixed))
# --
\begin{array}{${1:cc}}$0
${1:$
(let ((row ""))
(dotimes (i (- (string-width yas/text) 1) row)
(setq row (concat row "& "))))
}\\\\
\end{array}
The manual exemplifies this technique. The line with (yas/indent-line 'fixed) is to avoid AUCTeX indenting the row. The reason for placing the exit point of the snippet ($0) at the end of the declaration of the array rather than at the beginning of the first row is that when placed at the beginning of the first row the exit point does not behave as expected.
The following snippet will also add as many rows as there are columns:
# -*- mode: snippet -*-
# name: array
# key: arr
# expand-env: ((yas/indent-line 'fixed))
# --
\begin{array}{${1:cc}}$0
${1:$
(let ((row "") (allrows ""))
(dotimes (i (- (string-width yas/text) 1))
(setq row (concat row "& ")))
(dotimes (i (string-width yas/text) allrows)
(setq allrows (concat allrows row "\\\\\\\\\n"))))
}\end{array}
A problem with this snippet is that it adds \\ even if there is only one column, but such arrays may be rare.
There seem to be problems with adding Lisp comments to Lisp code embedded in snippets, so I simply add a commented version of only the Lisp code to explain it:
;; Make an empty row with as many columns as symbols in $1 (the $1 in
;; the snippet which is what yas/text refer to)
(let ((row "") (allrows ""))
;; Make an empty row with as many columns as symbols in $1
(dotimes (i (- (string-width yas/text) 1))
(setq row (concat row "& ")))
;; Make as many rows as symbols in $1
(dotimes (i (string-width yas/text) allrows)
(setq allrows (concat allrows row "\\\\\\\\\n"))))

Building on the solution by @wvxvw, how about just using M-x align-current in the tabular/matrix/array environment and then manipulating using the block selection/insertion commands? This seems to work intelligently with escaped ampersands. I find it useful to disable wrapping during this operation. I don't find this hard to edit at all, as relatively regular re-alignment makes everything quite readable.

Reading an array from a text file in Common Lisp

I am trying to read data (which is actually an array) in Lisp from a text file.
I tried to use with-open-file and read-line stuff but could not achieve my goal. What I am looking for is equivalent to doing data=load('filename.txt') in MATLAB, so that I get an array called data which has loaded the whole information in filename.txt.
The text file will be in a format like
1.0 2.0 3.0 ...
1.5 2.5 3.5 ...
2.0 3.0 4.0 ...
.....
The size may also vary. Thanks a lot in advance.

The basic way to do that is to use with-open-file for getting the input stream, read-line in a loop to get the lines, split-sequence (from the library of the same name) to split it into fields, and parse-number (from the library of the same name) to transform the strings into numbers. All libraries mentioned are available from Quicklisp.
EDIT: Just to get you started, this is a simple version without validation:
(defun load-array-from-file (filename)
(with-open-file (in filename
:direction :input)
(let* ((data-lol (loop :for line := (read-line in nil)
:while line
:collect (mapcar #'parse-number:parse-number
(cl-ppcre:split "\\s+" line))))
(rows (length data-lol))
(columns (length (first data-lol))))
(make-array (list rows columns)
:initial-contents data-lol))))
You should add some checks and think about what you want to get in case they are not fulfilled:
Are the rows all the same length?
Are all fields valid numbers?

Assuming your file follows the formatting pattern you gave in your question: a sequence of numbers separated with white spaces, this is a quick snippet that should do what you want.
(defun read-array (filename)
(with-open-file (in filename)
(loop for num = (read in nil)
until (null num)
collect num)))

Another approach is to leverage the lisp reader to parse the data in the text file. To do this, I'd probably convert the entire file into a string first, and then call
(eval (read-from-string (format nil "~a~a~a" "(initial wrapper code " str ")")))
For example, if you wanted to read in a data file that is all numbers, delimited by whitespace/newlines, into a list, the previous command would look like:
(eval (read-from-string (format nil "~a~a~a" "(list " str ")")))

I followed Svante's advice. I just needed a single column in the text file, this is what I am using for this purpose.
(defun load_data (arr column filename)
(setf lnt (first (array-dimensions arr)))
(with-open-file (str (format nil "~A.txt" filename) :direction :input)
(loop :for i :from 0 :to (1- lnt) :do
(setf (aref arr i 0) (read-from-string (nth (1- column) (split-sequence:SPLIT-SEQUENCE #\Space (read-line str))))))))
Thank you all for your help.

Stuck in a Clojure loop, need some guidance

I am stuck in a Clojure loop and need help to get out.
I first want to define a vector
(def lawl [1 2 3 4 5])
I do
(get lawl 0)
And get "1" in return. Now, I want a loop that get each number in the vector, so I do:
(loop [i 0]
(if (< i (count lawl))
(get lawl i)
(recur (inc i))))
In my mind this is supposed to set the value of i to 0; then, if i is lower than the count of the lawl vector, it should get each lawl value, increase i by 1 and try again, getting the next value in the vector.
However, this does not work. I have spent some time trying to get it working and am totally stuck, so I would appreciate some help. I have also tried changing "if" to "when" with the same result: it doesn't provide any data, the REPL just enters a new line and blinks.
EDIT: Fixed the recur.

You need to consider what is "to get each lawl value" supposed to mean. Your get call does indeed "get" the appropriate value, but since you never do anything with it, it is simply discarded; Bozhidar's suggestion to add a println is a good one and will allow you to see that the loop does indeed access all the elements of lawl (just replace (get ...) with (println (get ...)), after fixing the (inc) => (inc i) thing Bozhidar mentioned also).
That said, if you simply want to do something with each number in turn, loop / recur is not a good way to go about it at all. Here are some others:
;;; do some side-effecty thing to each number in turn:
(dotimes [i (count lawl)]
(println (str i ": " (lawl i)))) ; you don't really need the get either
;; doseq is more general than dotimes, but doesn't give you equally immediate
;; access to the index
(doseq [n lawl]
(println n))
;;; transform the lawl vector somehow and return the result:
; produce a seq of the elements of lawl transformed by some function
(map inc lawl)
; or if you want the result to be a vector too...
(vec (map inc lawl))
; produce a seq of the even members of lawl multiplied by 3
(for [n lawl
:when (even? n)]
(* n 3))
This is just the beginning. For a good tour around Clojure's standard library, see the Clojure -- Functional Programming for the JVM article by Mark Volkmann.
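If both the index and the element are needed without an explicit counter, map-indexed is one option. The sketch below is illustrative only; index-lines is a hypothetical helper name, and lawl is redefined so the block stands alone.

```clojure
(def lawl [1 2 3 4 5])

(defn index-lines
  "Returns a lazy seq of \"index: value\" strings for the elements of v.
  Hypothetical helper for illustration."
  [v]
  (map-indexed (fn [i n] (str i ": " n)) v))

;; Print each formatted element; doseq forces the lazy seq for its side effects.
(doseq [line (index-lines lawl)]
  (println line))
```

Keeping the formatting in a pure function and the printing in doseq separates the transformation from the side effect, which makes the former easy to reuse.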

(recur (inc)) should be (recur (inc i))
Even so, this code will just return 1 in the end; if you want a listing of the numbers, you might add a print expression :-) By the way, index-based loops are not needed at all in scenarios such as this:
(loop [list [1 2 3 4 5] ]
(if (empty? list)
(println "done")
(do
(println (first list))
(recur (rest list)))))

OK, I'm about 10-1/2 years too late on this, but here goes:
The problem here is a pretty common misunderstanding of how the arguments to the if special form are used. if takes three arguments: the condition/predicate, the form to be evaluated if the predicate is true, and the form to be evaluated if the predicate is false. In this case both the true and false cases are supplied. Perhaps if we fix the indentation and add some appropriate comments we'll be able to see what's happening more easily:
(loop [i 0]
(if (< i (count lawl))
(get lawl i) ; then
(recur (inc i)))) ; else
So the problem is not that the code gets "stuck" in the loop - the problem is that the recur form is never executed. Here's how the execution flows:
The loop form is entered; i is set to 0.
The if form is entered.
The predicate form is executed and found to be true.
The code for the then branch of the if is executed, returning 1.
Execution then falls out the bottom of the loop form.
Right now I hear people screaming "Wait! WHAT?!?". Yep - in an if form you can only have a single form in the "then" and "else" branches. "But...THAT'S STUPID!" I hear you say. Well...not really. You just need to know how to work with it. There's a way to group multiple forms together in Clojure into a single form, and that's done by using do. If we want to group (get lawl i) and (recur... together we could write it as
(loop [i 0]
(if (< i (count lawl))
(do
(get lawl i) ; then
(recur (inc i))
)
)
)
As you can see, we have no "else" branch on this if form - instead, the (get... and (recur... forms are grouped together by the (do, so they execute one after the other. So after recurring its way through the lawl vector the above snippet returns nil, which is kind of ugly. So let's have it return something more informative:
(loop [i 0]
(if (< i (count lawl))
(do
(get lawl i) ; then
(recur (inc i)))
(str "All done i=" i) ; else
)
)
Now our else branch returns "All done i=5".
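The loops above still discard each value that get returns. If the goal is to keep the values, one common pattern is to carry an accumulator through recur. This is a minimal sketch; collect-with-loop and acc are hypothetical names introduced here, and lawl is redefined so the block stands alone.

```clojure
(def lawl [1 2 3 4 5])

(defn collect-with-loop
  "Walks v by index and returns its elements in a new vector.
  Hypothetical helper showing an accumulator carried through recur."
  [v]
  (loop [i 0
         acc []]
    (if (< i (count v))
      (recur (inc i) (conj acc (get v i))) ; then: keep the value, move on
      acc)))                               ; else: return everything collected

(collect-with-loop lawl)
```

Here recur sits in the "then" branch and the accumulated result in the "else" branch, so no do form is needed. In practice (vec lawl) or map would do the same job more directly.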
