vim, reformat text to initializers - c

I've a big file with lines that look like
2 No route to specified transit
network
3 No route to destination
i.e. a number at the start of a line followed by a description.
And I'd like to transform that for use as a struct initializer
{2,"No route to specified transit
network"},
{3,"No route to destination"},
How would I do this?

Try
:%s/^\(\d\+\)\s\(.*\)$/{\1, "\2"},/
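This is a search-and-replace over the whole file (%). It matches a line that starts with one or more digits, followed by a single whitespace character, followed by arbitrary text up to the end of the line. The match is replaced with the initializer form you asked for, where \1 is the number and \2 is the description.
Or, using “very magic” mode (\v), which needs fewer backslashes (thanks to Al in the comments):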
:%s/\v^(\d+)\s(.*)$/{\1, "\2"},/
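
For anyone who wants to run the same conversion outside vim, here is a minimal Python sketch of the same substitution. The function name to_initializers is made up for illustration, and [ \t] stands in for vim's \s because Python's \s would also match a newline.

```python
import re

# Leading digits, one space or tab, then the rest of the line.
LINE = re.compile(r'^(\d+)[ \t](.*)$', re.MULTILINE)

def to_initializers(text: str) -> str:
    """Rewrite every 'N description' line as '{N, "description"},'."""
    return LINE.sub(r'{\1, "\2"},', text)

if __name__ == "__main__":
    print(to_initializers("3 No route to destination"))
```

Like the vim command, this only rewrites lines that start with a number, so a wrapped continuation line such as "network" is left as it is and would need to be joined to the previous line first.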

ksh: remove last extension from a multiple extension filename

I have a filename in the format dir1/dir2/filename.txt.org and I would like to rename it to dir1/dir2/filename.txt. How can this be done? I tried 'cut' with '.' as the separator, but it also removes .txt
You can use Korn shell variable expansion instead of a subprocess (e.g. cut). This can be much faster.
example:
var1=dir1/dir2/filename.txt.org
var2=${var1%.*}
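If you now print $var2, its value will be dir1/dir2/filename.txt
The % tells the shell to delete the shortest match of the pattern .* from the right-hand end of the value. Here .* is a shell pattern, not a regular expression: a literal period followed by anything, so everything from the rightmost period onward is removed.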
${variable%pattern} - return the value of variable without the smallest ending portion that matches pattern.
Other variable expansion formats are available; it is worthwhile to study the docs.
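
To make the "shortest match from the right" behaviour concrete, here is a rough Python sketch of what ${var1%.*} does to a string. The function name strip_last_extension is hypothetical, and this only mimics the one pattern used above, not shell pattern matching in general.

```python
def strip_last_extension(value: str) -> str:
    """Drop everything from the rightmost '.' onward, like ${value%.*}."""
    head, dot, _tail = value.rpartition(".")
    # With no '.' in the value the shell pattern does not match,
    # so the value is returned unchanged.
    return head if dot else value

if __name__ == "__main__":
    print(strip_last_extension("dir1/dir2/filename.txt.org"))
```

Only the last extension is removed, which is why .txt survives.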

How to avoid sub folders in snowflake copy statement

I have a requirement to exclude a certain folder under a prefix and process the remaining data in Snowflake (COPY statement).
In the example below I need to process the files under emp/ and exclude the files under abc/
Input :
s3://bucket1/emp/
E1.CSV
E2.CSV
/abc/E11.csv
s3://bucket1/emp/abc/ - E11.csv
Output :
s3://bucket1/emp/
E1.CSV
E2.CSV
Is there a pattern that would handle this?
With the PATTERN keyword you can try to exclude certain files. However, when you negate with a character class ([^...]), you exclude every file that has any of the listed characters at that position, not just files in the folder you meant.
Assuming your stage URL is defined as s3://bucket1/emp/
LS @MY_STAGE pattern = '[^abc].*';
Excludes anything starting with a, b, or c
LS @MY_STAGE pattern = '[^a][^b][^c][^\\/].*';
Excludes anything where:
The first character is a, OR
The second character is b, OR
The third character is c, OR
The fourth character is a forward slash /
Edit
After testing with Sharvan's example. Here is what I've found:
Doesn't work:
ls @my_stage PATTERN='^((?!/abc/).)*$'; because the first forward slash is duplicated as part of the stage URL (it is automatically appended to the stage URL if not present)
Works: ls @my_stage PATTERN='^((?!abc/).)*$'; because the first forward slash is removed
Updated as the forward slash does not need to be escaped
Snowflake does not support backreferences (per their documentation), but the documentation does not mention lookaheads or lookbehinds, which I had assumed were unsupported.
https://docs.snowflake.net/manuals/sql-reference/functions-regexp.html#backreferences
Use this to exclude the prefix pattern
ls @stage PATTERN='^((?!/abc/).)*$'
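
To see why the leading slash matters, the two lookahead patterns can be tried locally. This is only a sketch: it uses Python's re module rather than Snowflake's regex engine, and it assumes (as the edit above suggests) that the pattern is applied to paths relative to the stage URL, such as abc/E11.csv. The helper name kept and the sample paths are made up for illustration.

```python
import re

# The two patterns discussed above, with and without the leading slash.
WITH_SLASH = re.compile(r'^((?!/abc/).)*$')
WITHOUT_SLASH = re.compile(r'^((?!abc/).)*$')

def kept(pattern, paths):
    """Return the paths the pattern accepts, i.e. the files that would be listed."""
    return [p for p in paths if pattern.match(p)]

# Assumed shape of the paths when the stage URL already ends in emp/.
paths = ["E1.CSV", "E2.CSV", "abc/E11.csv"]

if __name__ == "__main__":
    print(kept(WITHOUT_SLASH, paths))
    print(kept(WITH_SLASH, paths))
```

Under that assumption the pattern with the leading slash never sees "/abc/" in a relative path, so it keeps everything, while the pattern without it drops the abc/ file. If the paths do include a parent folder (emp/abc/E11.csv), the version with the slash excludes the file as well.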

How to modify word2vec code to build embedding for tab-delimited sequence of phrases?

Given text file with lines as follows:
Phrase foo\tPhrase bla\tPhrase blabla\t...
Phrase bar\tPhrase blabla\tPhrase blablabla\t...
where each text line is a tab-delimited sequence of phrases, which can contain spaces and other special characters. We are interested in embeddings at the phrase level, NOT the word level.
The current word2vec.c supports "space", "tab", and "new line" as delimiters. How can I disable "space" and keep only "tab" and "new line" as delimiters in word2vec.c in this case?
I got word2vec.c from Tomas Mikolov's GitHub repository.
The line https://github.com/tmikolov/word2vec/blob/20c129af10659f7c50e86e3be406df663beff438/word2vec.c#L80 defines the delimiters in word2vec.c; if you're compiling that file, you could edit that line & re-compile to make it behave differently.
But, it'd be easier and more robust (if in fact you're using some other word2vec implementation) if you simply pre-processed your text to transform it into the expected form. For example, you might change all spaces ' ' to underscores '_' (or some other plug character, if any original underscores are important to keep distinct).
When later interpreting the results, remember to apply the same space-to-underscore transform on lookups, or reverse it by replacing underscore-with-space to display results.
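
The pre-processing route could look roughly like the sketch below. The function names and the choice of '_' as the plug character are assumptions for illustration; the output is one space-delimited line of tokens per input line, which is the form word2vec.c already expects.

```python
def phrases_to_tokens(line: str, plug: str = "_") -> str:
    """Turn a tab-delimited line of phrases into space-delimited tokens."""
    phrases = [p for p in line.rstrip("\n").split("\t") if p]
    # Spaces inside a phrase become the plug character, so each phrase
    # survives as a single token.
    return " ".join(p.replace(" ", plug) for p in phrases)

def token_to_phrase(token: str, plug: str = "_") -> str:
    """Reverse the transform when displaying results."""
    return token.replace(plug, " ")

if __name__ == "__main__":
    print(phrases_to_tokens("Phrase foo\tPhrase bla\tPhrase blabla\n"))
```

The reverse mapping is only exact if the original phrases contain no plug characters, which is why a different plug may be needed when underscores matter.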

Changing backslashes to forward slashes changes file size

I have two small to medium sized files (2k) that are for all intents and purposes identical. The second file is the result of the first file being duplicated and replacing backslashes with forward slashes. The new file is bigger by 80 bytes (or one byte per line).
I did this with a simple batch script, and at first I thought the script might have unintentionally added some spaces or other artifacts. Or maybe the fact that their extensions are different has something to do with it (one has a tmp extension and the other has a lst extension).
From an editor, I replaced all forward slashes in the new file with backslashes and saved it without changing the extension.
And, hey guess what? The files were the same size again.
Now, before this is written off as a random fluke, I also see the same behavior exhibited in three other pairs of files (in other words six files) created in the same manner as the first. They are all one byte bigger per line in the file. The largest is about 12k bytes, and the smallest is about 2k.
I wouldn't think it has anything to do with escaping because I am on a Windows box using the Windows 7 cmd.exe shell.
Also one other thing. I tried the following:
echo \\\\\ >> a.txt
echo ///// >> b.txt
The files matched in size (7 bytes)
Does anyone have an explanation for this behavior?
I would suggest opening the files with an editor like Notepad++ that shows the type of linefeed (Windows/Mac/Unix). This is most likely your problem if the file size differs 1 byte per line.
Notepad++ can show line endings as small CR/LF symbols (View -> Show Symbol -> Show End of Line) and convert between the Windows/Mac/Unix line endings (Edit -> EOL Conversion).
Unix and Mac systems usually store files with a one-byte line ending (classic Mac OS: CR, Unix and modern macOS: LF); Windows uses two bytes (CR LF).
Depending on the programs your batch scripts use, this might occur even though your system is a pure Windows box. The reason you don't get a difference when using an editor is that editors usually keep the file's original line endings.
Okay, I just solved it. @schnaader pointed me in the right direction. It actually has nothing to do with forward slashes or backslashes.
What happened is that my script added one character of trailing whitespace to each line. The files became the same size again after I reverted the slashes because the editor I used for the find and replace (Komodo Edit) is set up to automatically trim trailing whitespace on save.
Funny.
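
Both explanations in this thread, different line endings and trailing whitespace, add exactly one byte per line, so it helps to count them directly instead of guessing. Here is a small Python sketch that does that on the raw bytes of a file; the function name and the keys of the returned dict are made up for illustration.

```python
def line_ending_stats(data: bytes) -> dict:
    """Count line-ending styles and lines with trailing spaces or tabs."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # bare LF (Unix style)
    cr = data.count(b"\r") - crlf   # bare CR (classic Mac style)
    trailing = sum(
        1 for line in data.splitlines() if line != line.rstrip(b" \t")
    )
    return {"crlf": crlf, "lf": lf, "cr": cr, "trailing_ws": trailing}

if __name__ == "__main__":
    import sys
    # Usage: python stats.py a.tmp a.lst
    for name in sys.argv[1:]:
        with open(name, "rb") as fh:
            print(name, line_ending_stats(fh.read()))
```

Running it on both files of a pair should show which of the counts differs by the number of lines.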

C: How do I specify a POSIX regex that begins with a blank line and ends with a blank line?

I am trying to write code to scan a file and produce a "match!" message when the tool reads a certain line of code preceded and followed by blank lines. The line I am interested in matching is:
Appliance Version 3.1.2
Using regex.h, I have a simple tool that compiles my regex pattern then executes it against every line in the file to search for a match. The basic functionality of the tool is fine: I am able to get it to successfully search for various regex matches. Trouble arises when I try to match a regex containing a blank line before and after the above line of text. Here is my precompiled regex:
[[:space:]]+\n^Appliance Version [[:alnum:]]$\n
I have tried a series of different combinations similar to this, and nothing seems to work. I think it might have to do with \n, in which case I would need to figure out a new way to specify the two blank lines. Any insight into POSIX regex would be greatly appreciated!
Looking at your regex, it looks like it is trying to match
Appliance Version [[:alnum:]]
at the end of a line ($). That would be matched by
Appliance Version 3
(3 is an instance of [:alnum:]), but not by
Appliance Version 33
([[:alnum:]] only matches one character), and much less by
Appliance Version 3.1.2
(the above problem, and also . is not an instance of [:alnum:])
So at a minimum you need to change [[:alnum:]] to [.[:alnum:]]* (or some such).
In addition, your use of ^ and $ is redundant with the explicit \n, but nothing in the regex requires the match to be preceded or followed by a blank line. For example, [[:space:]]\n would happily be matched with the line:
Not a blank line, but with a blank at the end: \n
(where I've written the \n explicitly to show the blank character at the end of the line.)
Matching blank lines
A single blank line is matched with ^[[:space:]]*$. That does not match the newlines at either end. If you want to match a blank line before something, use: ^[[:space:]]*\nSOMETHING. To match a blank line after something: SOMETHING\n[[:space:]]*$. Or, if you really want a blank line before and after: ^[[:space:]]*\nSOMETHING\n[[:space:]]*$. (But that won't match if SOMETHING happens to be the first line of the input, for example. Or the last line.)
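
As a rough illustration of the "blank line before and after" form, here is the same idea in Python. This is a sketch with Python's re module, not regex.h: [ \t] and [.0-9A-Za-z] are stand-ins for [[:space:]] and [.[:alnum:]], the names are made up, and the pattern is run against the whole file contents rather than one line at a time, since a single line can never contain its neighbouring blank lines.

```python
import re

# ^[[:space:]]*\nSOMETHING\n[[:space:]]*$ rewritten for Python's re.
BLANK_AROUND = re.compile(
    r'^[ \t]*\n(Appliance Version [.0-9A-Za-z]*)\n[ \t]*$',
    re.MULTILINE,
)

def find_version_line(text: str):
    """Return the version line if it sits between two blank lines, else None."""
    m = BLANK_AROUND.search(text)
    return m.group(1) if m else None

if __name__ == "__main__":
    sample = "header\n\nAppliance Version 3.1.2\n\nfooter\n"
    print("match!" if find_version_line(sample) else "no match")
```

As noted above, this does not match when the version line is the very first or last line of the input, because there is no blank line on that side.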
As @rici notes, you cannot combine \n^ to match two blank lines -- the markers ^ and $ match a position, not a literal \n character.
To match a blank line, use \n\n. In an engine that supports lookbehind you could use (?<=\n)\n at the start instead, so that the hard return ending the line above is not consumed, but POSIX regular expressions as provided by regex.h have no lookbehind, so with regcomp only the \n\n form applies. You can leave the \n\n at the end, though.
