Leading zeroes are getting trimmed when I load data into Excel from Unix. My platform is Mac. Is there any way to handle this in Unix without manual effort?
Thanks
The leading zeroes are removed because strings that contain only digits are converted into numbers. Nothing stops you from changing the number format to show leading zeroes, or from converting the column to text.
Convert the numbers into strings with something like:
while IFS= read -r line
do
# quote every field by turning each , into ","
quoteLine=$(printf '%s\n' "$line" | sed 's/,/","/g')
# do not forget first and last quote, some ugly backslashes here
printf '%s\n' "\"${quoteLine}\""
done < inputfile > outputfile.csv
In your case (TAB-separated), you can replace the first , in the sed command with a TAB. A TAB is difficult to see when you copy-paste, so you can make that change after copying.
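For the TAB-separated case, the same idea can also be written as a single awk pass. This is only a sketch: the function name tsv_to_quoted_csv is made up for illustration, and it assumes that fields contain no embedded TABs or newlines. Excel may still treat quoted digit-only fields as numbers when a .csv is opened directly, so importing the column as text can still be needed.

```shell
# tsv_to_quoted_csv FILE  (hypothetical helper name)
# Reads TAB-separated lines and prints them as CSV with every field quoted.
tsv_to_quoted_csv() {
  awk -F '\t' '
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        field = $i
        gsub(/"/, "\"\"", field)            # CSV escapes a quote by doubling it
        out = out (i > 1 ? "," : "") "\"" field "\""
      }
      print out
    }
  ' "$1"
}
```

Usage would be something like tsv_to_quoted_csv inputfile > outputfile.csv.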
We are trying to create a file format using the not character ¬ as the delimiter. We can't get Snowflake to work with a file delimited in this format. The documentation says multibyte delimiters are now supported. We've tried:
Just typing ¬ in the file format dialog
The hex code (permutations of 0xC2AC , 0xC20xAC etc)
The octal code 302 254 entered as permutations of \302254 etc
But whatever we try, we get errors. When we type the delimiter directly, it seems to treat 0xC2 as the delimiter and gets confused by the second byte (0xAC). Using the hex code or octal code gives an error about the wrong number of columns. Any advice please?
Answer from Sergiu works perfectly:
For octal format use \302\254
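To see where \302\254 comes from, you can dump the UTF-8 bytes of the character yourself. The sketch below is not Snowflake-specific; to_octal_escapes is a made-up helper name, and it simply prints every byte of its argument as a backslash followed by the octal value.

```shell
# to_octal_escapes STRING  (hypothetical helper name)
# Prints each byte of STRING as \ooo, using od to get the octal values.
to_octal_escapes() {
  printf '%s' "$1" |
    od -An -v -to1 |
    awk '{ for (i = 1; i <= NF; i++) printf "\\%s", $i } END { print "" }'
}

# The not sign is the two bytes C2 AC in UTF-8; build it from its octal
# escapes so the example does not depend on the encoding of this script.
not_sign=$(printf '\302\254')
to_octal_escapes "$not_sign"
```

Each byte gets its own backslash, which is the difference from the \302254 form tried in the question.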
I am trying to write code to scan a file and produce a "match!" message when the tool reads a certain line of code preceded and followed by blank lines. The line I am interested in matching is:
Appliance Version 3.1.2
Using regex.h, I have a simple tool that compiles my regex pattern then executes it against every line in the file to search for a match. The basic functionality of the tool is fine: I am able to get it to successfully search for various regex matches. Trouble arises when I try to match a regex containing a blank line before and after the above line of text. Here is my precompiled regex:
[[:space:]]+\n^Appliance Version [[:alnum:]]$\n
I have tried a series of different combinations similar to this, and nothing seems to work. I think it might have to do with \n, in which case I would need to figure out a new way to specify the two blank lines. Any insight into POSIX regex would be greatly appreciated!
Looking at your regex, it looks like it is trying to match
Appliance Version [[:alnum:]]
at the end of a line ($). That would be matched by
Appliance Version 3
(3 is an instance of [:alnum:]), but not by
Appliance Version 33
([[:alnum:]] only matches one character), and much less by
Appliance Version 3.1.2
(the above problem, and also . is not an instance of [:alnum:])
So at a minimum you need to change [[:alnum:]] to [.[:alnum:]]* (or some such).
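A quick way to check this from the shell is to run both patterns through grep -E, which also uses POSIX extended regular expressions. This is just a sketch; the helper name matches and the variables old and new are made up for the example.

```shell
# matches LINE PATTERN: succeed if the extended regex matches the line
matches() { printf '%s\n' "$1" | grep -Eq -- "$2"; }

old='^Appliance Version [[:alnum:]]$'      # pattern from the question
new='^Appliance Version [.[:alnum:]]*$'    # pattern with the suggested change

for line in 'Appliance Version 3' 'Appliance Version 33' 'Appliance Version 3.1.2'; do
  matches "$line" "$old" && o=yes || o=no
  matches "$line" "$new" && n=yes || n=no
  printf '%-26s old=%s new=%s\n' "$line" "$o" "$n"
done
```

Only the first line matches the old pattern, while all three match the new one.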
In addition, your use of ^ and $ is redundant with the explicit \n, but nothing in the regex requires the match to be preceded or followed by a blank line. For example, [[:space:]]\n would happily be matched with the line:
Not a blank line, but with a blank at the end: \n
(where I've written the \n explicitly to show the blank character at the end of the line.)
Matching blank lines
A single blank line is matched with ^[[:space:]]*$. That does not match the newlines at either end. If you want to match a blank line before something, use: ^[[:space:]]*\nSOMETHING. To match a blank line after something: SOMETHING\n[[:space:]]*$. Or, if you really want a blank line before and after: ^[[:space:]]*\nSOMETHING\n[[:space:]]*$. (But that won't match if SOMETHING happens to be the first line of the input, for example. Or the last line.)
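If the tool only ever sees one line at a time, another option is to keep the two previous lines around and test each of the three lines separately. The sketch below does that in plain shell with grep -E instead of regex.h; the function names is_blank, is_version and match_isolated are made up, and like the combined regex it will not report a version line that is the first or last line of the input.

```shell
# Succeed if the argument is empty or contains only whitespace.
is_blank() { printf '%s\n' "$1" | grep -Eq '^[[:space:]]*$'; }

# Succeed if the argument looks like "Appliance Version 3.1.2".
is_version() { printf '%s\n' "$1" | grep -Eq '^Appliance Version [.[:alnum:]]+$'; }

# match_isolated FILE: print "match!" for every version line that has a
# blank line directly before it and directly after it.
match_isolated() {
  prev2=x   # line two back; starts non-blank so the top of the file never counts
  prev1=x   # previous line
  while IFS= read -r line; do
    if is_blank "$prev2" && is_blank "$line" && is_version "$prev1"; then
      echo 'match!'
    fi
    prev2=$prev1
    prev1=$line
  done < "$1"
}
```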
As @rici notes, you cannot combine \n^ to match two blank lines -- the markers ^ and $ match a position, not a literal \n character.
To match a blank line, use \n\n, or -- better, because you probably don't want to do anything with the hard return that ends the line above -- (?<=\n)\n at the start. Note that (?<=...) is a lookbehind, which PCRE-style engines support but POSIX regex.h does not. You can leave the \n\n at the end, though.
I have written some data to a file manually i.e. not by my application.
My code is reading the data char by char and storing them in different arrays but my program gets stuck when I insert the condition EOF.
After some investigation I found out that in my file before EOF there are three to four \n characters. I have not inserted them. I don't understand why they are in my file.
Want to remove those pesky extra characters? First, see how many of them there are at the end of your file:
od -c <filename> | tail
Then, remove however many characters you don't like. If it's 3:
truncate -s -3 <filename>
But overall, if it were me, I'd change my program to discard undesired newline characters, unless they're truly invalid according to the input file format specification.
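If you do decide to fix the file instead, you don't have to count the characters first. A small sketch, assuming a plain text file without NUL bytes (strip_trailing_newlines is a made-up name): command substitution drops all trailing newlines, and printf then puts exactly one back.

```shell
# strip_trailing_newlines FILE  (hypothetical helper name)
# Rewrites FILE in place so that it ends with exactly one newline.
# Note: an empty file becomes a file containing a single newline.
strip_trailing_newlines() {
  # $(...) removes every trailing newline; the file is fully read
  # before the redirection below truncates it.
  content=$(cat "$1") && printf '%s\n' "$content" > "$1"
}
```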
It is very easy to add additional newlines to the end of a file in every text editor. You have to move the cursor around to see them. Open your file in your editor and navigate to the end; you'll see the extra newlines.
There is no such thing as an EOF character in general. Windows treats control-Z as EOF in some cases. Perhaps you are talking about the return value from some API that indicates that it has reached the end of file?
Hi all,
Suppose we have a text file (file1.txt).
file1.txt contains many words, spaces, and line breaks (CR+LF).
I want to find a specific word that is followed by a line break and replace it with just that word, i.e. eliminate the CR+LF after it.
How can I do this?
Thank you
I assume you're asking how to do it programmatically.
LF and CR are characters, and as such they have ASCII codes assigned (10 and 13 respectively). You'll need to load the text file and copy it to a new buffer word by word; whenever you encounter the word you want to replace, check whether it is followed by CR+LF (13, 10) and, if so, don't copy those two characters.
Then write the new buffer back to the file.
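The same idea can be sketched with awk, working line by line instead of word by word. The name join_after_word is made up for the example. It only looks at the end of each line, so a longer word that merely ends with the given word is joined as well, and the word is passed with -v, so backslashes in it would be interpreted by awk.

```shell
# join_after_word WORD FILE  (hypothetical helper name)
# Prints FILE with the line break removed after every line that ends in WORD.
# All other lines keep their original ending (CR+LF or LF).
join_after_word() {
  awk -v w="$1" '
    {
      line = $0
      cr = ""
      # awk already split on LF (10); remember and strip a trailing CR (13)
      if (substr(line, length(line)) == "\r") {
        cr = "\r"
        line = substr(line, 1, length(line) - 1)
      }
      n = length(w)
      if (n > 0 && length(line) >= n && substr(line, length(line) - n + 1) == w)
        printf "%s", line              # drop the line break after WORD
      else
        printf "%s%s\n", line, cr      # keep the original line ending
    }
  ' "$2"
}
```

Write the output to a new file and then move it over the original, for example join_after_word word file1.txt > file1.new.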
Use of regular expressions should make short work of this:
replace word\r\n with word
How exactly this is done depends on your environment / editor / tools. You mentioned CR + LF, which hints that you're using Windows.
If you use Notepad++ for example, it has builtin regex support and you can use these facilities to obtain your goal.
Update: I have tried this variant and it works:
Download Vim for Windows.
Open your file in Vim.
In it, issue the following command:
:%s/\v([[:digit:]]+NPN[[:alpha:]]+)\n/\1 /g
Explanation:
%s - work for all lines
\v - easier regex manipulation regarding backslashes
([[:digit:]]+NPN[[:alpha:]]+) - match some digits, then NPN, then letters and capture this
\n - match end of line
\1 - replace the whole match with the first group, followed by the whitespace typed after \1
g - do this many times for each line (this is basically optional)
If you want to convert CRLF to LF:
sed 's/.$//' # assumes that all lines end with CR/LF
If you want to remove CRLF altogether:
tr '\n' ' ' < file1.txt # join the lines with a space
tr -d '\n' < file1.txt # join the lines without a space
You might have to convert the line endings to unix (CRLF to LF) first and then do the translation.
I was recently editing a Unicode-encoded text file that also includes Thai characters (alongside "normal" characters). For some reason, after each sequence of Thai characters, a new line appeared.
After some mucking around with C, trying to remove all newline characters, I fired up vim to inspect the file. Apparently, after each Thai character sequence, there appears a "^M" string (without quotes).
Why is this happening, and what's that "^M"? I've found that I can fix the problem by removing the last three characters from the Thai string, but there surely must be a more elegant way to fix this ...
This has nothing to do with the Thai characters in the file. The ^M ('caret M', i.e. Ctrl-M) is how vim displays a carriage return (CR), the first half of a DOS/Windows CR+LF line ending. Run dos2unix on the file to get rid of these before editing it in vim.
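If dos2unix is not installed, tr can do the same job. A minimal sketch with made-up helper names; note that strip_cr removes every carriage return in the file, not only those at the end of a line.

```shell
# count_cr_lines FILE: print how many lines end in a carriage return
count_cr_lines() {
  cr=$(printf '\r')
  grep -c "${cr}\$" "$1"
}

# strip_cr FILE: print FILE with all carriage returns removed
strip_cr() { tr -d '\r' < "$1"; }
```

Redirect the output of strip_cr to a new file rather than to the file being read.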