Awk - Split one .txt file into several files by condition

I have a problem: I would like to split one file into several files according to a condition.
INPUT: One text file
variable chrom=chr1
1000 10
1010 20
1020 10
variable chrom=chr2
1000 20
1100 30
1200 10
OUTPUT: two files for this example.
chr1.txt
variable chrom=chr1
1000 10
1010 20
1020 10
chr2.txt
variable chrom=chr2
1000 20
1100 30
1200 10
So the split condition is: whenever a row starts with variable chrom=chr$i (i={1..22}), that row and the rows after it go to a separate text file.
Thank you

Something along these lines:
awk 'BEGIN { filename="unknown.txt" } /^variable chrom=/ { close(filename); filename = substr($0, index($0, "=") + 1) ".txt"; } { print > filename }'
Where the awk code is
BEGIN { filename = "unknown.txt" }   # default file name, used only if the
                                     # file doesn't start with a variable chrom=
                                     # line
/^variable chrom=/ {                 # in such a line:
    close(filename)                  # close the previous file (if open)
                                     # and set the new filename
    filename = substr($0, index($0, "=") + 1) ".txt"
}
{ print > filename }                 # print everything to the current file.
The basic algorithm is straightforward: read the file line by line, change the filename when you find a line that starts a new section, and always print the current line to the current file. The devil is in the detail of isolating the file name from the marker line. The
filename = substr($0, index($0, "=") + 1) ".txt"
approach is simplistic but serviceable for the example you showed: It takes everything after the = and attaches .txt to get the file name. If your marker lines are more complicated than variable chrom=filenamestub, this will have to be amended, but in that case I could only guess your requirements and would probably guess wrong.
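
For comparison, the same marker-line algorithm can be sketched in Python. This is only an illustration under assumptions: the function names split_by_marker and write_sections are made up for this example, and the marker is assumed to be exactly the variable chrom= prefix shown in the question.

```python
import os


def split_by_marker(lines, marker="variable chrom=", default="unknown.txt"):
    """Group lines into {filename: [lines]}, switching file at each marker line.

    Hypothetical helper: `marker` and `default` mirror the awk answer above.
    """
    sections = {}
    filename = default  # used only if the input does not start with a marker
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(marker):
            # Everything after the marker becomes the file name stub.
            filename = line[len(marker):] + ".txt"
        sections.setdefault(filename, []).append(line)
    return sections


def write_sections(sections, directory="."):
    """Write each group of lines to its own file inside `directory`."""
    for name, content in sections.items():
        with open(os.path.join(directory, name), "w") as handle:
            handle.write("\n".join(content) + "\n")
```

One design difference to be aware of: if the same marker appears twice, this sketch collects both sections under one name, whereas the awk version reopens the file with > after close() and so overwrites the earlier section.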

If every section has the same, known number of lines, you could use
split -l 4 textfile.txt
This starts a new output file every 4 lines, producing files named xaa, xab, and so on.

Related

Read file txt with lua

A simple question. I have a file test.txt at userPath().."/log/test.txt" with 15 lines.
I want to read the first line and then remove it, so that test.txt ends up with 14 lines.
local iFile = 'the\\path\\test.txt'
local contentRead = {}
local i = 1
local file = io.open(iFile, 'r')
for lines in file:lines() do
    if i ~= 1 then
        table.insert(contentRead, lines)
    else
        i = i + 1 -- this will prevent us from collecting the first line
        print(lines) -- just in case you want to display the first line before deleting it
    end
end
io.close(file)
file = io.open(iFile, 'w')
for _, v in ipairs(contentRead) do
    file:write(v .. "\n")
end
io.close(file)
There must be other ways to simplify this, but basically what the code does is:
Open the file in read mode and store every line except the first in the table contentRead.
Open the file again, this time in write mode, which erases its entire contents, and then write the lines stored in contentRead back to the file.
Thus the first line of the file is "deleted" and only the other 14 lines remain.
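
The read-everything-then-rewrite approach described above is not specific to Lua. As a rough sketch, here is the same idea in Python; pop_first_line is a hypothetical name chosen for this example, and the whole file is assumed to fit in memory.

```python
def pop_first_line(path):
    """Remove the first line of the file at `path` and return it.

    Returns None (and leaves the file untouched) if the file is empty.
    """
    with open(path, "r") as handle:
        lines = handle.readlines()
    if not lines:
        return None
    # Reopening in write mode truncates the file; write back all but line 1.
    with open(path, "w") as handle:
        handle.writelines(lines[1:])
    return lines[0].rstrip("\n")
```

Because the file is truncated before the remaining lines are written, a crash between the two steps could lose data; writing to a temporary file and renaming it would be safer for files that matter.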

How can I go about filtering a variable or file in autohotkey

I am trying to filter specific information to variable via parsing the clipboard but I need some help doing this.
Loop, parse, clipboard, `n, `r
{
    If A_LoopField contains XYZ
        ;Copy whatever text is found 2 or 3 lines below into file but continue on.
}
Here is an example of what's in the clipboard:
Clipboard =
(
Line 1 - Blank
Line 2 - XYZ Some text telling my script to copy line 4 and so on
Line 3 - Blank
Line 4 - "Text to be copied"
Line 5 - Blank
Line 6 - XYZ Some text telling my script to copy lines 8 and so on
Line 7 . . .
)
Not sure if something like this is what you are looking for
cb =
(LTrim
Line1
Line2
copy:5
Line4
Line5
Line6
copy:10
Line8
Line9
Line10
Line11
Line12
)
copied := []
Loop, parse, cb, `n, `r
{
    pos := (v:=strSplit(A_LoopField, ":")[2]) ? v:pos
    if (pos && A_Index >= pos){
        copied.push(A_LoopField)
    }
}
for k, v in copied
{
    msgBox % v
}

How do you count the number of characters in each line and then add them all up?

I have been given a question to write a function "that returns a count of the number of characters in the file whose name is given as a parameter."
So if a file called "data.txt" contains "Hi there!" and is printed using my code below, it returns a value of 10 (which is correct).
"""Attempting Question 7.
Author: Ark
Date: 28/04/2015
"""
def file_size(filename):
    """extracts word from a line"""
    filename = open(filename, 'r')
    for line in filename:
        result = len(line) #count number of characters in a line.
        return result
However, let's say I have made another file called "data2.txt" and it contains
EEEEE
DDDD
CCC
BB
A
If I print this out, it gives the value 6. So my challenge starts here: what can I change in my code to read all the lines and add their lengths up?
print(file_size("data2.txt"))
expected 16 words (?)
You must sum the lengths of the lines; right now you return the length of the very first line.
Also, you must strip the trailing newline if it's there. This should work:
def character_count(filename):
    with open(filename) as f:
        return sum(len(line.rstrip("\n")) for line in f)
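
Whether newlines should count is not settled by the question (the asker treats 10 as correct for "Hi there!", which includes the newline). A small sketch that makes the choice explicit might look like this; the strip_newlines parameter is an addition for illustration and is not part of the answer above.

```python
def character_count(filename, strip_newlines=True):
    """Sum the character counts of all lines in the file.

    strip_newlines is a hypothetical switch added for this example:
    True ignores the trailing newline of each line, False counts it.
    """
    with open(filename) as f:
        if strip_newlines:
            return sum(len(line.rstrip("\n")) for line in f)
        return sum(len(line) for line in f)
```

For the data2.txt example, assuming the file ends with a newline, this gives 15 with strip_newlines=True and 20 with strip_newlines=False.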

Find a list of max values in a text file using awk

I am new to awk and I cannot figure out the correct syntax for the task I am working on.
I have a text file which looks something like this (the content is always sorted but is not always the same, so I cannot hard code the index of the array):
27 abc123
27 abd333
27 dce123
23 adb234
21 abc789
18 bcd213
So apparently the max is 27. However, I want my output to be:
27 abc123
27 abd333
27 dce123
and not the first row only.
The second column is just there, my code always sorts the text file based on the first column.
My code currently sets the max to the first value (27, for example) and, as it reads through the lines, stores only the rows with the max value in an array and eventually prints them out.
awk 'BEGIN {max=$1} {if(($1)==max) a[NR]=($0)} END {for (i in a) print a[i]}' file
You can't read fields in a BEGIN block, since it's executed before the file is read.
To find the first record, use the pattern NR == 1. NR is the number of the current record. To find the other records, just check whether $1 equals the max value.
NR == 1 { max = $1 }
$1 == max { print }
Since your input is always sorted, you can optimise this program by exiting after reading all the records with the max value:
$1 != max { exit }
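
The same logic (take the first field of the first record as the max, keep matching rows, stop at the first mismatch) can be sketched in Python for comparison. The function name rows_with_leading_max is made up for this example, and the first fields are compared as strings here, which is an assumption that differs slightly from awk's numeric comparison of number-like fields.

```python
def rows_with_leading_max(lines):
    """Return the leading rows whose first field equals that of the first row.

    Assumes the input is already sorted so that the max rows come first.
    """
    result = []
    max_key = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines (a choice made for this sketch)
        if max_key is None:
            max_key = fields[0]  # first record defines the max
        if fields[0] != max_key:
            break  # sorted input: no later row can match
        result.append(line.rstrip("\n"))
    return result
```

Called on the six sample rows from the question, it returns the three rows that start with 27.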

AutoHotkey's Loop (read file contents) issues related to "+" symbol

Referring to Loop (read file contents), a quite strange thing happens every time I use a code like this one to run a script:
^+k::
{
    Gosub, MySub
}
Return
MySub:
{
    Send, +{Enter}
    Loop, read, C:\MyFile.txt
    {
        temp = %A_LoopReadLine%
        Send, %temp%
        Send, +{Enter}
    }
}
Return
MyFile.txt is a simple text file where sometimes the "plus" symbol (+) is used together with normal letters and numbers.
Despite this, what I see if I run the hotkey on an empty document, either a Notepad or a Microsoft Word blank sheet, is that every + is replaced by an underscore (_), an exclamation mark (!) or a question mark (?). I've seen an occurrence with a dollar symbol ($) replacement, too.
I tried to debug it by displaying a message box with
MsgBox, %temp%
before sending text and it shows the original content of MyFile.txt perfectly.
Thus the issue should be on Send rather than on file reading.
The content of my file is something like this (repeated for about 20 rows more):
+---------------------------------
120001267381 ~ TEXT 0 10/20/18 VARIABLE word text -> numbers: 17,000 x 108.99 | 109.26 x 15,000 /// number = +5.500% some text
+---------------------------------
120001267381 ~ TEXT 0 10/20/18 VARIABLE word text -> numbers: 17,000 x 108.99 | 109.26 x 15,000 /// number = +5.500% some text
+---------------------------------
120001267381 ~ TEXT 0 10/20/18 VARIABLE word text -> numbers: 17,000 x 108.99 | 109.26 x 15,000 /// number = +5.500% some text
+---------------------------------
120001267381 ~ TEXT 0 10/20/18 VARIABLE word text -> numbers: 17,000 x 108.99 | 109.26 x 15,000 /// number = +5.500% some text
+---------------------------------
What can be the cause of this?
Found the answer: Send interprets each + read from my file as the Shift modifier, so the character that follows is typed with Shift held down instead of the literal + being sent.
In order to send the original content of my file without triggering modifier keys, I have to use SendRaw instead of Send, as in this example:
^+k::
{
    Gosub, MySub
}
Return
MySub:
{
    Send, +{Enter}
    Loop, read, C:\MyFile.txt
    {
        temp = %A_LoopReadLine%
        SendRaw, %temp%
        Send, +{Enter}
    }
}
Return
Here's an updated version that pastes using CTRL-V instead of Send to "retype" rows of data:
^+k::
{
    Gosub, MySub
}
Return
MySub:
{
    Send, +{Enter}
    Loop, read, C:\MyFile.txt
    {
        temp = %A_LoopReadLine%
        Clipboard = %temp% ; Write to clipboard
        Send, ^v+{enter} ; Paste from clipboard
        Sleep 10
        ; Short delay so it doesn't try to paste again before the clipboard has changed
        ; This check can get a lot more complex, but just increase it if 10 doesn't work
    }
}
Return
