Replacing string within a file using bash + environmental variables - c

Let's say for simplicity's sake I have a file (please forgive my useless pseudocode):
file.txt
std::string filename = "filename.txt"
double v_no = 2.0;
const int v_minor = 0; // < --- Target
std::string random_var1 = "Hello"
std::string random_var2 = "Hello 2"
int main()
{
// ..
}
And I have a bash script in the same directory: set_version.sh
I want to use this script to replace a string in the file, specifically "v_minor = 0" with "v_minor = $VARIABLE". In my case the variable will be an environment variable set on a server.
So let's say it has been run successfully a couple of times and the string now reads "v_minor = 2". I still want the same set_version.sh script to change the 2 to whatever the variable is.
In the Windows build of my software I have a batch file that changes "v_minor = %d" to "v_minor = %VERSION%".
My question is: how do I do something similar in bash, i.e. ignore whatever integer is currently in the string and replace it with the variable?
What I've got so far:
set_version.sh
#!/bin/bash
VERSION=75
sed -i '' 's/v_minor = %d/v_minor = $VERSION/g/' file.txt
Version var being set is just for testing purposes.
This returns error
sed: 1: "s/v_minor = %d/v_minor ...": bad flag in substitute command: '/'
I'm running Mac OS X Yosemite for this test.
Again, essentially %d can be any integer.
Thank you

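This will work for you:
sed -i '' "s/v_minor = [0-9]*/v_minor = $VERSION/" file.txt
[0-9]* matches the digits after "v_minor = ", whatever they currently are, and leaves the rest of the line (the semicolon and the comment) untouched. A pattern such as .*$ would match to the end of the line and swallow the semicolon as well.
Use double quotes around the sed expression so that the shell expands $VERSION; inside single quotes it is passed to sed literally.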
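For comparison, the same substitution can be sketched in Python. This is only an illustration, assuming the target line looks like the one in the question; set_minor is a hypothetical helper name, not something from the original script. It matches only the digits after "v_minor = ", so the semicolon and the trailing comment survive, and it works regardless of which number is currently in the file.

```python
import re


def set_minor(source: str, version: str) -> str:
    """Replace the integer after 'v_minor = ' and leave the rest of the text alone."""
    # A function is used as the replacement so that digits in `version`
    # can never be misread as part of a group reference.
    return re.sub(r"(v_minor = )\d+", lambda m: m.group(1) + version, source)
```

In a real script the version would typically come from the environment, for example os.environ["VERSION"], and the result would be written back to file.txt.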

sed -i '' 's/v_minor = %d/v_minor = $VERSION/g/' file.txt
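# Remove the final slash (the one after the g flag): sed reads it as an unknown flag, which is what the error message is complaining about.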

Based on your description, I would suggest another, simpler approach (simplicity means fewer bugs):
First, change your target to:
const int v_minor = V_MINOR; // < --- Target
Second, add an include line, anywhere before the target statement:
#include "version.h"
Third, write a script that generates a version.h similar to the following:
#ifndef _VERSION_H_
#define _VERSION_H_
#define V_MINOR 0 // <== this 0 is what you want to change.
#endif
A script that outputs this version.h is trivial to write (a few fixed lines plus the target number), so I don't provide it here.
Compared to the potentially error-prone sed/awk/perl solutions, I prefer this simple one.
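A generator along those lines could look like the following minimal sketch. It assumes the version arrives in an environment variable named VERSION (the name used in the question's set_version.sh) and falls back to 0 when it is unset; the function names are hypothetical.

```python
import os


def render_version_header(minor: int) -> str:
    """Return the text of version.h for the given minor version."""
    return (
        "#ifndef _VERSION_H_\n"
        "#define _VERSION_H_\n"
        f"#define V_MINOR {minor}\n"
        "#endif\n"
    )


def write_version_header(path: str = "version.h") -> None:
    """Write version.h, taking the number from the VERSION environment variable."""
    # Assumption: VERSION holds a plain integer; int() raises ValueError otherwise.
    minor = int(os.environ.get("VERSION", "0"))
    with open(path, "w") as header:
        header.write(render_version_header(minor))
```

Calling write_version_header() from the build step, before compiling, regenerates the header on every build, so the source file itself never has to be edited.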

Related

How can I replace two characters in a 40GB file in Unix?

I have two huge json files (20GB each) and I need to join them. The files have the following content:
file_1.json = [{"key": "value"}, {...}]
file_2.json = [{"key": "value"}, {...}]
The main problem, however, is that I need all dict to be in the same list. I tried to do this in python, but unfortunately, I don't have the memory to do this operation.
So, I thought maybe I could tackle this with unix commands, by replacing, in the first file, the ] for , (note that there is a space after the comma) and erasing [ from the second file. Then, I would join the two files with the cat unix command.
Is there a way for me to edit only the last 10 char in unix?
I tried to use echo and tr but I might be doing something wrong with the syntax.
You can very easily append to a file in place, i.e. add characters at the end without rewriting the data that's already there. With the right tools (truncate if your system has it), you can truncate a file in place, i.e. remove characters at the end without rewriting the data that's staying. With the right tools (dd, if you're feeling adventurous), you can replace a part of a file by a string of the same length, without rewriting the unchanged parts. On the other hand, you can't remove characters from the beginning or middle of a file without rewriting the file (with a few exceptions that aren't relevant here).
But anyway rewriting both files in place wouldn't help you that much. You will need to at least rewrite the content of the second file to append it to the first file.
If you don't need to keep the split files around, you can append the second file to the first file in place, after taking care of the middle punctuation. Remove the last ] character from the first file, as well as any following spaces and line breaks. Assuming that the first file ends in ] and a newline and you have GNU core utilities (e.g. non-embedded Linux):
truncate -s -2 file_1.json
Now you can add a comma and optionally a line break to the first file, and append the data from the second file without its first character.
echo , >>file_1.json
tail -c +2 file_2.json >>file_1.json
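The same in-place approach can be sketched in Python. This is an illustration under stated assumptions, not a tested tool: both files hold non-empty JSON arrays, the first ends in ] optionally followed by whitespace within its last 1024 bytes, and the second starts with [ as its very first byte. The function name is hypothetical.

```python
import os
import re


def append_json_array_in_place(first: str, second: str, block: int = 1024 * 1024) -> None:
    """Append the items of the JSON array in `second` to the array in `first`."""
    with open(first, "r+b") as out:
        # Inspect only the tail of the first file to locate its closing "]".
        size = out.seek(0, os.SEEK_END)
        tail_start = max(0, size - 1024)
        out.seek(tail_start)
        match = re.search(rb"\]\s*\Z", out.read())
        if match is None:
            raise ValueError(f"{first} does not end with ']'")
        # Cut the file just before the "]" without rewriting earlier data.
        out.truncate(tail_start + match.start())
        out.seek(0, os.SEEK_END)
        out.write(b",\n")
        with open(second, "rb") as inp:
            # Skip the opening "[" of the second array.
            if inp.read(1) != b"[":
                raise ValueError(f"{second} does not start with '['")
            # Copy the rest block by block so memory use stays small.
            while True:
                chunk = inp.read(block)
                if not chunk:
                    break
                out.write(chunk)
```

Only the last few bytes of the first file are inspected and changed; the second file is streamed, so memory use is bounded by the block size.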
If you want to keep the original files unmodified, you can make a copy of the first file and truncate it. Or you can directly make a truncated copy of the first file (still assuming GNU coreutils):
head -c -2 file_1.json >concatenated.json
echo , >>concatenated.json
tail -c +2 file_2.json >>concatenated.json
If you're more comfortable with Python, you can do all of this in Python. Just don't read the whole file in one go, i.e. don't call read() or use readline() in a way that reads all the lines at once. Instead, read and process a single line at a time (if the lines are short) or a single block of data. Untested code:
import io
import re

BLOCK = 1024 * 1024  # copy in 1 MiB blocks so memory use stays small

with open('concatenated.json', 'wb') as out:
    with open('file_1.json', 'rb') as inp:
        # Look at the tail of the first file to find its closing "]".
        size = inp.seek(0, io.SEEK_END)
        tail_start = max(0, size - 1024)
        inp.seek(tail_start)
        m = re.search(rb'\]\s*\Z', inp.read())
        stop_at = tail_start + m.start()  # absolute offset of the "]"
        # Copy everything that comes before the "]".
        inp.seek(0, io.SEEK_SET)
        remaining = stop_at
        while remaining > 0:
            chunk = inp.read(min(BLOCK, remaining))
            out.write(chunk)
            remaining -= len(chunk)
    out.write(b',')
    with open('file_2.json', 'rb') as inp:
        # Replace the opening "[" with a newline, then copy the rest.
        first = inp.read(1)
        assert first == b'['
        out.write(b'\n')
        while True:
            chunk = inp.read(BLOCK)
            if not chunk:
                break
            out.write(chunk)

How to use C code variable inside system()

I am using C code with sed. I want to read lines in the interval 1-10,11-20 etc. to perform some calculation.
int i,j,m,n;
for(i=0;i<10;i++){
j=i+1;
//correction. m,n is modified which was incorrect earlier.
m=i*10;
n=j*10;
system("sed -n 'm,n p' oldfile > newfile");
}
Output:
m,n p
It looks like the variables are not passed to system(). Is there any way to do that?
Use sprintf to build the command line:
char cmdline[100];
sprintf(cmdline, "sed -n '%d,%dp' oldfile.txt > newfile.txt", 10*i+1, 10*(i+1));
puts(cmdline); // optionally, verify manually it's going to do the right thing
system(cmdline);
(This is vulnerable to buffer overflow, but if your command-line arguments are not too flexible, 100 bytes should be enough.)
You cannot replace part of a string literal in C. What you need to do is form a string containing format placeholders and then replace those placeholders with the proper values using the formatted I/O functions.
sprintf()/snprintf() will be your friend here. You can do something like this (copied from pmg's comment):
char cmd[100];
snprintf(cmd, 100, "sed -n '%d,%dp' oldfile > newfile", 10*i+1, 10*(i+1));
system(cmd);

Splitting a file into multiple files on a delimiter with awk

I am trying to split files evenly in a number of chunks. This is my code:
awk '/*/ { delim++ } { file = sprintf("splits/audio%s.txt", int(delim /2)); print >> file; }' < input_file
My file looks like this:
"*/audio1.lab"
0 6200000 a
6200000 7600000 b
7600000 8200000 c
.
"*/audio2.lab"
0 6300000 a
6300000 8300000 w
8300000 8600000 e
8600000 10600000 d
.
It is giving me an error: awk: line 1: syntax error at or near *
I do not know enough about awk to understand this error. I tried escaping characters but still haven't been able to figure it out. I could write a script in python but I would like to learn how to do this in awk. Any awkers know what I am doing wrong?
Edit: I have 14021 files. I gave the first two as an example.
For one thing, your regular expression is illegal; '*' says to match the previous character 0 or more times, but there is no previous character.
It's not entirely clear what you're trying to do, but it looks like when you encounter a line with an asterisk you want to bump the file number. To match an asterisk, you'll need to escape it:
awk '/\*/ { close(file); delim++ } { file = sprintf("splits/audio%d.txt", int(delim /2)); print >> file; }' < input_file
Also note %d is the correct format character for decimal output from an int.
I don't know what the rest of this question is about, but to just split your input file into separate output files, all you need is:
awk '/\*/{close(out); out="splits/audio"++c".txt"} {print > out}' file
Repetition metacharacters such as *, ? or + can take on a literal meaning when they are the first character in a regexp, so the regexp /*/ works fine in some awks (e.g. gawk) but not in all of them. Since you apparently also have a problem with too many open files, you are probably not using gawk (which manages open files for you), so you need to escape the * and close() each output file when you are done writing to it. There is no harm in doing both, and it makes the script portable to all awks.
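Since the question mentions Python as a fallback, here is a rough Python counterpart of the one-file-per-record awk command above. It is a sketch under assumptions: every record starts with a line containing *, output files are named audio1.txt, audio2.txt and so on as in that command, and any lines before the first marker are skipped. The function name is hypothetical.

```python
import os


def split_on_marker(input_path: str, out_dir: str, marker: str = "*") -> int:
    """Write each record (starting at a line containing `marker`) to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    out = None
    try:
        with open(input_path) as src:
            for line in src:
                if marker in line:
                    # Close the previous file so only one output is open at a time.
                    if out is not None:
                        out.close()
                    count += 1
                    out = open(os.path.join(out_dir, f"audio{count}.txt"), "w")
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return count
```

Like the close() call in the awk version, closing each output before opening the next keeps the number of open files constant no matter how many records there are.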

Search and replace a string as shown below

I am reading a file, say x.c, and I have to find the string "shared". Once it has been found, the following has to be done.
Example:
shared(x,n)
Output has to be
*var = &x;
*var1 = &n;
The pointers can have any names. The output has to be written to a different file. How can I do this?
I'm developing a source-to-source compiler for concurrent platforms using lex and yacc. The solution can be a routine written in C or, if possible, one using lex and yacc. Can anyone please help?
Thanks.
If, as you state, the arguments can only be variables and not any kind of other expressions, then there are a couple of simple solutions.
One is to use regular expressions, and do a simple search/replace on the whole file using a pretty simple regular expression.
Another is to simply load the entire source file into memory, search using strstr for "shared(", and use e.g. strtok to get the arguments. Copy everything else verbatim to the destination.
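The regular-expression option can be sketched as follows. This is a minimal illustration that assumes both arguments are plain identifiers, as stated above, and that uses the pointer names var and var1 from the question; the function name is hypothetical. It is written in Python for brevity rather than in C or lex.

```python
import re

# shared(<identifier>, <identifier>) with an optional trailing semicolon.
SHARED_CALL = re.compile(r"\bshared\(\s*(\w+)\s*,\s*(\w+)\s*\)(?:\s*;)?")


def rewrite_shared(source: str) -> str:
    """Replace every shared(x, n) call with the two pointer assignments."""
    def expand(match: re.Match) -> str:
        first, second = match.group(1), match.group(2)
        return f"*var = &{first};\n*var1 = &{second};"

    # Everything that is not a shared(...) call is copied verbatim.
    return SHARED_CALL.sub(expand, source)
```

Reading x.c into a string, passing it through rewrite_shared and writing the result to a second file gives the requested output. Occurrences inside comments or string literals would be rewritten as well, which a lex-based scanner could avoid.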
Take advantage of the C preprocessor.
Put this at the top of the file
#define shared(x,n) { *var = &(x); *var1 = &(n); }
and run it through cpp. This will also pull in external includes and expand all other macros, but you can remove all the #something lines from the code, convert it using the injected preprocessor rule, and then re-add them.
By the way, why not a simple macro set in a header file for the developer to include?
A doubt: where do var and var1 come from?
EDIT: corrected as shown by johnchen902
When it comes to preprocessor, I'll do this:
#define shared(x,n) (*var=&(x),*var1=&(n))
Why do I think this is better than esseks's answer?
Suppose this situation:
if( someBool )
shared(x,n);
else { /* something else */ }
With esseks's answer it becomes:
if( someBool )
{ *var = &x; *var1 = &n; }; // compile error
else { /* something else */ }
And with my answer it becomes:
if( someBool )
(*var=&(x),*var1=&(n)); // good!
else { /* something else */ }

How to embed a Lua script within a C binary?

I've been getting spoiled in the shell world where I can do:
./lua <<EOF
> x="hello world"
> print (x)
> EOF
hello world
Now I'm trying to include a Lua script within a C application that I expect will grow with time. I've started with a simple:
const char *lua_script="x=\"hello world\"\n"
"print(x)\n";
luaL_loadstring(L, lua_script);
lua_pcall(L, 0, 0, 0);
But that has several drawbacks. Primarily, I have to escape the line feeds and quotes. But now I'm also hitting the warning "string length ‘1234’ is greater than the length ‘509’ ISO C90 compilers are required to support" when compiling with gcc, and I'd like to keep this program not only self-contained but also portable to other compilers.
What is the best way to include a large Lua script inside of a C program, and not shipped as a separate file to the end user? Ideally, I'd like to move the script into a separate *.lua file to simplify testing and change control, and have that file somehow compiled into the executable.
On systems which support binutils, you can also 'compile' a Lua file into a .o with 'ld -r', link the .o into a shared object, and then link your application to the shared library. At runtime, you dlsym(RTLD_DEFAULT,...) in the lua text and can then evaluate it as you like.
To create some_stuff.o from some_stuff.lua:
ld -s -r -o some_stuff.o -b binary some_stuff.lua
objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents some_stuff.o some_stuff.o
This will get you an object file with symbols that delimit the start, end, and size of your lua data. These symbols are, as far as I know, determined by ld from the filename. You don't have control over the names, but they are consistently derived. You will get something like:
$ nm some_stuff.o
000000000000891d R _binary_some_stuff_lua_end
000000000000891d A _binary_some_stuff_lua_size
0000000000000000 R _binary_some_stuff_lua_start
Now link some_stuff.o into a shared object like any other object file. Then, within your app, write a function that will take the name "some_stuff_lua", and do the appropriate dlsym magic. Something like the following C++, which assumes you have a wrapper around lua_State called SomeLuaStateWrapper:
void SomeLuaStateWrapper::loadEmbedded(const std::string& embeddingName)
{
const std::string prefix = "_binary_";
const std::string data_start = prefix + embeddingName + "_start";
const std::string data_end = prefix + embeddingName + "_end";
const char* const data_start_addr = reinterpret_cast<const char*>(
dlsym(RTLD_DEFAULT, data_start.c_str()));
const char* const data_end_addr = reinterpret_cast<const char*>(
dlsym(RTLD_DEFAULT, data_end.c_str()));
THROW_ASSERT(
data_start_addr && data_end_addr,
"Couldn't obtain addresses for start/end symbols " <<
data_start << " and " << data_end << " for embedding " << embeddingName);
const ptrdiff_t delta = data_end_addr - data_start_addr;
THROW_ASSERT(
delta > 0,
"Non-positive offset between lua start/end symbols " <<
data_start << " and " << data_end << " for embedding " << embeddingName);
// NOTE: You should also load the size and verify it matches.
static const ssize_t kMaxLuaEmbeddingSize = 16 * 1024 * 1024;
THROW_ASSERT(
delta <= kMaxLuaEmbeddingSize,
"Embedded lua chunk exceeds upper bound of " << kMaxLuaEmbeddingSize << " bytes");
namespace io = boost::iostreams;
io::stream_buffer<io::array_source> buf(data_start_addr, data_end_addr);
std::istream stream(&buf);
// Call the code that knows how to feed a
// std::istream to lua_load with the current lua_State.
// If you need details on how to do that, leave a comment
// and I'll post additional details.
load(stream, embeddingName.c_str());
}
So, now within your application, assuming you have linked or dlopen'ed the library containing some_stuff.o, you can just say:
SomeLuaStateWrapper wrapper;
wrapper.loadEmbedded("some_stuff_lua");
and the original contents of some_stuff.lua will have been lua_load'ed in the context of 'wrapper'.
If, in addition, you want the shared library containing some_stuff.lua to be able to be loaded from Lua with 'require', simply give the same library that contains some_stuff.o a luaopen entry point in some other C/C++ file:
extern "C" {
int luaopen_some_stuff(lua_State* L)
{
SomeLuaStateWrapper wrapper(L);
wrapper.loadEmbedded("some_stuff_lua");
return 1;
}
} // extern "C"
Your embedded Lua is now available via require as well. This works particularly well with luabind.
With SCons, it is fairly easy to educate the build system that when it sees a .lua file in the sources section of a SharedLibrary that it should 'compile' the file with the ld/objcopy steps above:
# NOTE: The 'cd'ing is annoying, but unavoidable, since
# ld in '-b binary' mode uses the name of the input file to
# set the symbol names, and if there is path info on the
# filename that ends up as part of the symbol name, which is
# no good. So we have to cd into the source directory so we
# can use the unqualified name of the source file. We need to
# abspath $TARGET since it might be a relative path, which
# would be invalid after the cd.
env['SHDATAOBJCOM'] = 'cd $$(dirname $SOURCE) && ld -s -r -o $TARGET.abspath -b binary $$(basename $SOURCE)'
env['SHDATAOBJROCOM'] = 'objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents $TARGET $TARGET'
env['BUILDERS']['SharedLibrary'].add_src_builder(
SCons.Script.Builder(
action = [
SCons.Action.Action(
"$SHDATAOBJCOM",
"$SHDATAOBJCOMSTR"
),
SCons.Action.Action(
"$SHDATAOBJROCOM",
"$SHDATAOBJROCOMSTR"
),
],
suffix = '$SHOBJSUFFIX',
src_suffix='.lua',
emitter = SCons.Defaults.SharedObjectEmitter))
I'm sure it is possible to do something like this with other modern build systems like CMake as well.
This technique is of course not limited to Lua, but can be used to embed just about any resource in a binary.
A really cheap, but not so easy to alter way is to use something like bin2c to generate a header out of a selected lua file (or its compiled bytecode, which is faster and smaller), then you can pass that to lua to execute.
You can also try embedding it as a resource, but I have no clue how that works outside of visual studio/windows.
Depending on what you want to do, you might even find exeLua of use.
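The bin2c idea mentioned above can be sketched as a small build-time generator. This is not the actual bin2c tool, only an illustration of the technique: it assumes a non-empty script, and the array name lua_script and the function name are hypothetical.

```python
def lua_to_c_header(data: bytes, name: str = "lua_script") -> str:
    """Render `data` as a C unsigned char array plus a length constant."""
    rows = []
    # Emit 12 bytes per row to keep the generated header readable.
    for start in range(0, len(data), 12):
        row = data[start:start + 12]
        rows.append("    " + ", ".join(f"0x{byte:02x}" for byte in row))
    body = ",\n".join(rows)
    return (
        f"static const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"static const unsigned int {name}_len = {len(data)};\n"
    )
```

On the C side the generated array can be handed to luaL_loadbuffer together with its length. Because the data is an array initializer rather than a string literal, the C90 limit of 509 characters per string literal does not apply, and nothing in the script needs escaping.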
