Octave - Adding '\n' to String Array is Not Creating a New Line

I want to replace each ',' character with a newline ('\n') and save the result to a text file.
All files are in this format:
546,234,453,685,.....,234
I want to make it like:
546
234
453
685
...
234
My initial attempt at this problem looks like this:
fid=fopen(files{i});
strArr=fscanf(fid,'%s');
newstrArr=strrep(strArr,',','\n');
% Take each .txt input
for j=1:length(newstrArr)
  Array=[Array newstrArr(j)];
endfor
Let me explain step by step:
1st I open the current text file
fid=fopen(files{i});
2nd I find the strings in text file
strArr=fscanf(fid,'%s');
Please note that you can't replace %s with %d. (Correct me if I am wrong.)
3rd I replace commas with newline character
newstrArr=strrep(strArr,',','\n');
4th I add each character to a new array with for loop
for j=1:length(newstrArr)
  Array=[Array newstrArr(j)];
endfor
However, when I display the result using
disp(Array);
I have this output
How can I properly replace the commas with newlines?
Regards

The issue is that you are inserting a literal '\n' (the characters \ and n) and not a newline character. This is because in Octave, a single-quote enclosed string ignores escape sequences. If you want Octave to respect escape sequences you could use a double-quoted string which will convert \n into a newline.
strrep(strArr, ',', "\n");
Or if you want your code to be MATLAB-compatible, you'll want to instead use char(10) (an actual newline character). This is because MATLAB does not process escape sequences such as \n in either single- or double-quoted text.
output = strrep(strArr, ',', char(10));
Another option would be to split your input at the , and use sprintf to add the newlines (it'll treat \n as a newline)
values = strsplit(strArr, ',');
output = sprintf('%s\n', values{:});
If you just want to save each entry to a new line in a file, you can use fprintf instead.
values = strsplit(strArr, ',');
fout = fopen('output.txt', 'w');
fprintf(fout, '%s\n', values{:});
fclose(fout);
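The same distinction exists in other languages. As a rough illustration only (this is Python, not Octave, and the sample string is made up), a raw string keeps the backslash and the n as two separate characters, while an ordinary string literal holds a single newline character:

```python
# Illustration in Python, not Octave; the sample data is made up.
csv_line = "546,234,453,685,234"

# r"\n" is two characters (backslash, n), like '\n' in Octave single quotes.
literal = csv_line.replace(",", r"\n")

# "\n" is one newline character, like "\n" in Octave double quotes or char(10).
converted = csv_line.replace(",", "\n")

print(len(r"\n"), len("\n"))  # 2 1
print(literal)                # one line, with visible backslash-n pairs
print(converted)              # one value per line
```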

If you really just want to replace "," with newline simply do
in = fileread ("yourfile");
out = strrep (in, ",", "\n")
out = 546
234
453
685
234
Btw, see the difference between "\n" (in GNU Octave a newline) and '\n' (literally \n)

Another option is to use regexprep(), this has the advantage of being MATLAB compatible. Assuming that the newline convention you want is \n, then
regexprep('123,456,789',',','\n')
ans = 123
456
789
When output to a file via fprintf() the result looks like
123
456
789
provided the text editor understands the newline convention.

Related

Flex string recognition "unrecognised rule" error

I'm trying to create a string recognition rule to run in flex. The string can consist of escape characters (\n , \t , \r , \ , " , '), symbols ( -, +, *, /, :, _, $, !, #, #, &, ~, ^, (, ) ) and a-zA-Z0-9 characters. I have tried many variations of the code below, but I keep getting the same error mentioned above.
ESCAPECHAR [\n] | [\t] | [\r] | [\] | ['] | ["]
SYMBOLS [-+*/:_$!##&~^()]
CHARACTERS [0-9a-zA-Z]
STRING ("({ESCAPECHAR} | {SYMBOLS} | {CHARACTERS})*") | ('({ESCAPECHAR} | {SYMBOLS} | {CHARACTERS})*')
You would do well to read the Flex manual chapter on patterns syntax. It is not very long, and it gives a complete description of the syntax of Flex patterns.
Here are a few of the errors you have made:
Flex patterns cannot include unquoted whitespace (unless you put them inside of a subexpression marked with the x flag). So
[\n] | [\t] | [\r] | [\] | ['] | ["]
is invalid.
Also, the \ is used to indicate that:
the following letter is a code for a control character (so that \n is a newline character), or
the following punctuation symbol should not be given special significance.
So in [\], the \ indicates that the following ] should be treated as an ordinary character, instead of being the end of a character class, which means that the character class will continue up to the next ]. Space characters inside a character class are considered to be quoted, so the character class consists of the characters ], space, |, [ and '. (Flex lets you repeat characters inside a character class, so it won't complain about the fact that there are two space characters.) You probably meant [\\].
Anyway, you should write character classes in the same way you wrote the other character classes, as a series of characters or escaped codes inside [ and ]:
[\n\t\r\\ '"]
Flex lets you quote characters by surrounding them with quotation marks, so that "({ESCAPECHAR} | {SYMBOLS} | {CHARACTERS})*" is treated as a single literal string, which must be matched literally in the text. You probably intended the quotation marks to be ordinary characters, so you should have escaped them or put them into a single-character character class:
["]({ESCAPECHAR}|{SYMBOLS}|{CHARACTERS})*["]
Again, it is necessary to remove the whitespace from the pattern.
I assume that your intention was to allow "escape characters" to appear in a string only if they are actually escaped. Your {ESCAPECHAR} macro expands to a collection of actual characters, so that it includes newline, tab and carriage return characters. It also includes quote and apostrophe, which really should be reserved for terminating the string literal. Probably, what you meant was to allow escape codes if they are preceded with a \ (as with C or, as mentioned above, flex itself). In that case, what you really need to write is
ESCAPECHAR \\[ntr'"]
(That is, a \\, followed by exactly one of the characters n, t, r, ', ".) Even that is not precise, though: it does not allow the use of \\ to indicate a single \, and it forces the user to write "Don\'t just copy code." and '\"', both of which would normally be written without the backslash escapes.
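One common way around those last two limitations is to accept a backslash followed by any character as an escape, and otherwise accept anything except the closing quote and a bare backslash. The sketch below shows that idea using Python's re module rather than flex syntax; the names STRING_RE and is_string_literal are hypothetical, and it only handles double-quoted strings.

```python
import re

# Hypothetical pattern (Python `re`, not flex syntax): a double-quoted string
# whose body is either a backslash followed by any non-newline character, or
# any character other than a quote, a backslash, or a newline.
STRING_RE = re.compile(r'"(?:\\.|[^"\\\n])*"')

def is_string_literal(text):
    """Return True if the whole text is one double-quoted string literal."""
    return STRING_RE.fullmatch(text) is not None

print(is_string_literal(r'"tab\there"'))   # True: \t is an escape pair
print(is_string_literal('"Don\'t"'))       # True: apostrophe needs no escape
print(is_string_literal(r'"a\\"'))         # True: \\ is an escaped backslash
print(is_string_literal(r'"abc\"'))        # False: the closing quote is escaped
```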

Python joining a list by "\"

Lets say I have a list of elements.
l = ["xf3", "x03", "x8c"] etc.
Now I would like to join the elements inside my list with a "\". I tried r"\".join(l) but it didn't work.
\ is used to escape 'special' characters, hence a Python string can not terminate with a single \ because it escapes the closing quote.
You have to escape it by using a second \, ie '\\'.join(l)
l = ["xf3", "x03", "x8c"]
'\\'.join(l)
The important part is to escape the backslash as '\\', because in Python strings:
the backslash "\" is a special character, also called the "escape"
character. It is used in representing certain whitespace characters:
"\t" is a tab, "\n" is a newline, and "\r" is a carriage return. As well, "\"
can be used to escape itself: "\\" is the literal backslash character.
I'm assuming that what you actually want is to create a string containing those escaped characters. The easiest way I can think of is ast.literal_eval:
>>> import ast
>>> ast.literal_eval("'\\" + "\\".join(l) + "'")
'ó\x03\x8c'
This works by first creating a string of those strings joined by backslash characters (xf3\x03\x8c), surrounding those by quotes and adding the initial backslash ('\xf3\x03\x8c'), and finally, by evaluating it as a literal, to turn it from a length 12 string into a length 3 string.
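To make the two readings of the question concrete, here is a small sketch. The second half is an alternative that is not in the answers above: it assumes every item is the letter x followed by two hex digits, and converts each one directly instead of evaluating source text.

```python
items = ["xf3", "x03", "x8c"]

# "\\" is a one-character string holding a single backslash.
joined = "\\".join(items)
print(joined)        # xf3\x03\x8c
print(len(joined))   # 11

# Assumption: each item is "x" plus two hex digits, so it can be parsed
# as a character code without going through literal_eval.
decoded = "".join(chr(int(item[1:], 16)) for item in items)
print(len(decoded))  # 3
```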

Can't display unicode characters from file properly

I'm writing a script which should operate on words from a number of files which have unicode characters in the form something\u0142somethingelse.
I use Python 3, so I assumed that after reading a line, \u0142 would be replaced by the 'ł' character, but it isn't. I receive "something\u0142somethingelse" in the console.
After manually copying "bad" output from console and pasting it to: print("something\u0142somethingelse") it is displayed correctly.
Problematic part of the script:
list_of_files = ['test/stack.txt']
for file in list_of_files:
    with open(file,'r') as fp:
        for line in fp:
            print(line)
print("something\u0142somethingelse")
stack.txt:
something\u0142somethingelse
Output:
something\u0142somethingelse
somethingłsomethingelse
I experimented with utf-8 encoding when opening this file and really I'm out of ideas...
I think you can do what you want with ast.literal_eval. This uses the same syntax as the Python interpreter to understand literals: like eval but safer. So this works, for example:
a = 'something\\u0142somethingelse'
import ast
b = ast.literal_eval('"' + a + '"')
print('"' + a + '"')
print(b)
The output should be:
"something\u0142somethingelse"
somethingłsomethingelse
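An alternative that avoids building a quoted literal is to run the text through the unicode_escape codec. This is a sketch rather than part of the answer above; the helper name decode_escapes is hypothetical, and it assumes every backslash in the input is meant as an escape.

```python
def decode_escapes(text):
    """Turn literal backslash escapes such as \\u0142 into real characters."""
    # backslashreplace rewrites any non-ASCII characters as escapes, so the
    # unicode_escape codec can restore them along with the original escapes.
    return text.encode("ascii", "backslashreplace").decode("unicode_escape")

line = "something\\u0142somethingelse"   # what reading the file returns
fixed = decode_escapes(line)

print(len(line), len(fixed))                    # 28 23
print(fixed == "something\u0142somethingelse")  # True
```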

How do you count number of characters from each lines then add them all up?

I have been given a question to write a function "that returns a count of the number of characters in the file whose name is given as a parameter."
So if a file called "data.txt" contains "Hi there!" and is printed by using my codes from below, it will return value of 10. (which is correct)
"""Attemping Question 7.
Author: Ark
Date: 28/04/2015
"""
def file_size(filename):
    """extracts word from a line"""
    filename = open(filename, 'r')
    for line in filename:
        result = len(line) #count number of characters in a line.
        return result
However, let say I have made another file called "data2.txt" and it contains
EEEEE
DDDD
CCC
BB
A
If I print this out it would give the value of 6. So, my challenge starts here.. what can I do with my coding to read the lines and add them all up?
print(file_size("data2.txt"))
expected 16 words (?)
You must sum the lengths of the lines; right now you return the length of the very first line.
Also, you must strip a trailing newline if it's there. This should work:
def character_count(filename):
    with open(filename) as f:
        return sum(len(line.rstrip("\n")) for line in f)
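Whether newlines should count depends on the exercise, so the sketch below reports both totals. It is only an illustration: count_characters is a hypothetical helper, and the temporary file assumes that every line of data2.txt, including the last, ends with a newline.

```python
import os
import tempfile

def count_characters(filename, include_newlines=True):
    """Sum the lengths of all lines; optionally ignore newline characters."""
    with open(filename) as f:
        if include_newlines:
            return sum(len(line) for line in f)
        return sum(len(line.rstrip("\n")) for line in f)

# Hypothetical stand-in for data2.txt (assumes a trailing newline on each line).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("EEEEE\nDDDD\nCCC\nBB\nA\n")

with_newlines = count_characters(tmp.name)
without_newlines = count_characters(tmp.name, include_newlines=False)
os.remove(tmp.name)

print(with_newlines)     # 20
print(without_newlines)  # 15
```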

Handling CR line endings in Lua

I'm trying to read a file with CR line endings using the file:read method which seems to be acting up for some reason. The file contents look like this:
ABCDEFGH
12345
##
6789
I want it to behave consistently with all types of line endings. Every time I try to read the file, it returns the last line in the file concatenated with any trailing characters from the previous lines that have a greater position than the position of the last character in the last line. Here's what I mean:
> file=io.open("test.lua", "rb")
> function re_read(openFile)
    openFile:seek("set");
    return openFile:read("*a");
  end
> =re_read(file) -- With CR
67895FGH
> =re_read(file) -- With CRLF
ABCDEFGH
12345
##
6789
> =re_read(file) -- with LF
ABCDEFGH
12345
##
6789
>
As you can see, the string being returned is the last string plus 5 in the previous line and plus FGH from the first line. Any lines shorter than the last line are skipped.
My goal is to use the file:lines() method to read the file line by line. My hope is that if a 'fix' for file:read is found then it can be applied to file:lines().
In the case with CR only, re_read actually works as expected: it returns the lines separated by CR. But when the interpreter displays it, it interprets the CR characters as "go back to the beginning of the line". So here is how the result changes line by line:
ABCDEFGH
12345FGH
##345FGH
67895FGH
EDIT: here it is character by character, with a "virtual cursor" (|).
|
A|
AB|
ABC|
ABCD|
ABCDE|
ABCDEF|
ABCDEFG|
ABCDEFGH|
|ABCDEFGH
1|BCDEFGH
12|CDEFGH
123|DEFGH
1234|EFGH
12345|FGH
|12345FGH
#|2345FGH
##|345FGH
|##345FGH
6|#345FGH
67|345FGH
678|45FGH
6789|5FGH
Proof:
> s = "ABCDEFGH\r12345\r##\r6789"
> =s
67895FGH
You could normalize your line endings with gsub then iterate over the product with gmatch.
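local function cr_lines(s)
  -- Normalize CR and CRLF to LF, then iterate over LF-terminated lines.
  -- Note: a final line with no line terminator is not matched by this pattern.
  return s:gsub('\r\n?', '\n'):gmatch('(.-)\n')
end
local function cr_file_lines(filename)
  local f = io.open(filename, 'rb')
  local s = f:read('*a')
  f:close()
  return cr_lines(s)
end
for ln in cr_file_lines('test.txt') do
  print(ln)
end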
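For comparison, the same two ideas (inspect the raw string instead of trusting the terminal, then split on any line-ending style) can be sketched in Python. This is only an illustration alongside the Lua code, using the sample string from the proof above.

```python
text = "ABCDEFGH\r12345\r##\r6789"

# repr shows the carriage returns instead of letting the terminal act on them.
print(repr(text))

# str.splitlines treats \r, \n and \r\n as line boundaries, so all three
# variants of the file produce the same list of lines.
variants = (text, text.replace("\r", "\n"), text.replace("\r", "\r\n"))
results = [variant.splitlines() for variant in variants]
for lines in results:
    print(lines)  # ['ABCDEFGH', '12345', '##', '6789']
```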
