Replacing C++ single-line comments with C89 comments - c

What's a good way to replace single-line // input number comments with multi-line /* input number */ comments?
I don't have any preference for the language used to accomplish the task; I was thinking of Perl or sed. The source language will be C (ANSI X3.159-1989).
Simple scripts like
while(<>) {
    if (m#^(.*?)//#) {
        print $1;
    } else {
        print $_;
    }
}
would be fooled by strings containing // and are not OK. Similarly, // inside a multi-line comment should be left alone.
Edit: Code can assume that there are no trigraphs.
This is the opposite of replace C style comments by C++ style comments. It is similar to Replacing // comments with /* comments */ in PHP (though the accepted answer there cannot handle the special cases I mentioned and so is arguably wrong).
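To make the tricky cases concrete, here is a small illustrative fragment (my own example, not from the original question): the // inside the string literal and the // inside the /* ... */ comment must be left untouched, while the real line comment at the end should be converted to a /* ... */ comment.
#include <stdio.h>

int main(void)
{
    /* A // inside an existing C comment must be left alone. */
    const char *url = "http://example.com";  // this one should be converted
    printf("%s\n", url);
    return 0;
}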

You can use the output of the boost::wave lexer to replace all the C++-style comments with C-style comments, without having to worry about the edge cases.
#include <iostream>
#include <fstream>
#include <sstream>
#include <boost/wave/cpplexer/cpp_lex_token.hpp>
#include <boost/wave/cpplexer/cpp_lex_iterator.hpp>

typedef boost::wave::cpplexer::lex_token<> token_type;
typedef boost::wave::cpplexer::lex_iterator<token_type> token_iterator;
typedef token_type::position_type position_type;

int main()
{
    const char* infile = "infile.h";
    const char* outfile = "outfile.h";
    std::string instr;
    std::stringstream outstrm;
    std::string cmt_str;
    std::ifstream instream(infile);
    std::ofstream outstream(outfile);
    if (!instream.is_open()) {
        std::cerr << "Could not open file: " << infile << "\n";
        return 1;
    }
    if (!outstream.is_open()) {
        std::cerr << "Could not open file: " << outfile << "\n";
        return 1;
    }
    instream.unsetf(std::ios::skipws);
    instr = std::string(std::istreambuf_iterator<char>(instream.rdbuf()),
                        std::istreambuf_iterator<char>());
    position_type pos(infile);
    token_iterator it = token_iterator(instr.begin(), instr.end(), pos,
        boost::wave::language_support(
            boost::wave::support_cpp | boost::wave::support_option_long_long));
    token_iterator end = token_iterator();
    while (it != end) {
        boost::wave::token_id id = *it;
        // Here we check for C++-style comments.
        if (id == boost::wave::T_CPPCOMMENT) {
            std::cout << "Found CPP COMMENT\n";
            cmt_str = it->get_value().c_str();
            cmt_str[0] = '/';
            cmt_str[1] = '*';
            // The last character of the token is the newline, so overwrite
            // it with the closing '*' ...
            cmt_str[cmt_str.size() - 1] = '*';
            cmt_str.push_back('/');
            // ... and then append the newline at the end of the string.
            cmt_str.push_back('\n');
            outstrm << cmt_str;
        }
        else {
            outstrm << it->get_value();
        }
        ++it;
    }
    outstream << outstrm.str();
    return 0;
}
For further documentation please see:
http://www.boost.org/doc/libs/1_47_0/libs/wave/index.html

There are a lot of corner cases to consider. Stray //s can appear in string literals, character constants (yes, really), and within /* ... */ comments and // comments. Line-splicing with trailing \ characters can really mess things up -- and a \ can be represented as the trigraph ??/. I seriously doubt that I've thought of all of them.
If you need a 100% reliable replacement, you're going to have to reproduce (or steal!) part of the preprocessor of a C compiler.
If you don't need 100% reliability, you might consider just doing a naive replacement, then comparing the input to the output and manually cleaning up any problems. (For typical code, it's likely that there won't be any, but you'll need to check.) The practicality of this approach depends in part on how much code you need to translate.
Most of the corner cases will result in code that won't compile:
printf("Hello // world\n");
-->
print("Hello /* world\n"); */
You might also consider whether this is really necessary. Most C89/C90 compilers do support // comments, at least optionally.
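For illustration (my own example, not from the answer above): a file like the following is accepted by many C89/C90 compilers as an extension, but a compiler running in strict C89 mode has to diagnose the //, since it is not a comment there.
int main(void)
{
    // fine as an extension in most C89/C90 compilers, not in strict C89
    /* the portable spelling is a comment like this one */
    return 0;
}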

This won't cover 100% of corner cases, but it covers the ones you mentioned in your request.
#!/usr/bin/env python
import re
from sys import stdin, stdout

for line in stdin.readlines():
    line = line[:-1]  # Trim the newline
    stripped = re.sub(r'[\'"].*[\'"]', '', line)  # Ignore strings
    stripped = re.sub(r'/\*.*\*/', '', stripped)  # Ignore multi-line comments
    m = re.match(r'.*?//(.*)', stripped)  # Only match actual C++-style comments
    if m:
        offset = len(m.group(1)) + 2
        content = line[:offset * -1]  # Get the original line sans comment
        print '%s/* %s */' % (content, m.group(1))  # Combine the two with C-style comments
    else:
        print line

Related

Flex and Bison, Windows Error using symbol table

The program is intended to store values in a symbol table and then print them back out with their part of speech. The parser is further meant to determine whether the input forms a sentence, and more.
I create the executable file by
flex try1.l
bison -dy try1.y
gcc lex.yy.c y.tab.c -o try1.exe
in cmd (WINDOWS)
My issue occurs when I try to declare any value when running the executable,
verb run
it goes like this
BOLD IS INPUT
verb run
run
run
syntax error
noun cat
cat
syntax error
run
run
syntax error
cat run
syntax error
MY THOUGHTS: I'm unsure why the code gives me back this syntax error. After debugging and trying to print out what value was being stored, I figured there has to be some kind of issue with the linked list, as it seemed only one value was being stored in it, causing an error of sorts. When I tried to print out the stored word_type integer value for "run", it printed the correct value, 259, but it would refuse to let me define any other words in my symbol table. I reversed the changes to the print statements and now it works as previously stated. I think there is again an issue with the add_word method: words aren't being added properly, so the lookup method crashes the program.
Lexer file. This example is taken from the O'Reilly book lex & yacc, 2nd edition, Examples 1-5 and 1-6.
I am trying to learn Lex and Yacc on my own and reproduce this example.
%{
/*
* We now build a lexical analyzer to be used by a higher-level parser.
*/
#include <stdlib.h>
#include <string.h>
#include "ytab.h" /* token codes from the parser */
#define LOOKUP 0 /* default - not a defined word type. */
int state;
%}
/*
* Example from page 9 Word recognizer with a symbol table. PART 2 of Lexer
*/
%%
\n { state = LOOKUP; } /* end of line, return to default state */
\.\n    { state = LOOKUP;
          return 0; /* end of sentence */
        }
/* whenever a line starts with a reserved part of speech name */
/* start defining words of that type */
^verb { state = VERB; }
^adj { state = ADJ; }
^adv { state = ADV; }
^noun { state = NOUN; }
^prep { state = PREP; }
^pron { state = PRON; }
^conj { state = CONJ; }
[a-zA-Z]+ {
            if(state != LOOKUP) {
                add_word(state, yytext);
            } else {
                switch(lookup_word(yytext)) {
                case VERB:
                    return(VERB);
                case ADJECTIVE:
                    return(ADJECTIVE);
                case ADVERB:
                    return(ADVERB);
                case NOUN:
                    return(NOUN);
                case PREPOSITION:
                    return(PREPOSITION);
                case PRONOUN:
                    return(PRONOUN);
                case CONJUNCTION:
                    return(CONJUNCTION);
                default:
                    printf("%s: don't recognize\n", yytext);
                    /* don't return, just ignore it */
                }
            }
          }
. ;
%%
int yywrap()
{
    return 1;
}

/* define a linked list of words and types */
struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list; /* first element in word list */

extern void *malloc();

int
add_word(int type, char *word)
{
    struct word *wp;

    if(lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined \n", word);
        return 0;
    }

    /* word not there, allocate a new entry and link it on the list */
    wp = (struct word *) malloc(sizeof(struct word));
    wp->next = word_list;

    /* have to copy the word itself as well */
    wp->word_name = (char *) malloc(strlen(word)+1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    return 1; /* it worked */
}

int
lookup_word(char *word)
{
    struct word *wp = word_list;

    /* search down the list looking for the word */
    for(; wp; wp = wp->next) {
        if(strcmp(wp->word_name, word) == 0)
            return wp->word_type;
    }
    return LOOKUP; /* not found */
}
Yacc file,
%{
/*
* A lexer for the basic grammar to use for recognizing English sentences.
*/
#include <stdio.h>
%}
%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION
%%
sentence: subject VERB object{ printf("Sentence is valid.\n"); }
;
subject: NOUN
| PRONOUN
;
object: NOUN
;
%%
extern FILE *yyin;
main()
{
    do
    {
        yyparse();
    }
    while (!feof(yyin));
}

yyerror(s)
char *s;
{
    fprintf(stderr, "%s\n", s);
}
Header file. I had to create two versions of some values; I'm not sure why, but the code was having an issue with them and I wasn't understanding why, so I just created a token with the full name and one with the shortened name, whereas the book had only one for each.
# define NOUN 257
# define PRON 258
# define VERB 259
# define ADVERB 260
# define ADJECTIVE 261
# define PREPOSITION 262
# define CONJUNCTION 263
# define ADV 260
# define ADJ 261
# define PREP 262
# define CONJ 263
# define PRONOUN 258
If you feel that there is a problem with your linked list implementation, you'd be a lot better off testing and debugging it with a simple driver program rather than trying to do that with some tools (flex and bison) which you are still learning. On the whole, the simpler a test is and the fewer dependencies it has, the easier it is to track down problems. See this useful essay by Eric Lippert for some suggestions on debugging.
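For example, a throwaway driver along the following lines exercises the linked list without involving flex or bison at all (the file layout and the token values here are only illustrative; the point is to link it against your add_word and lookup_word with everything flex-specific stripped out):
/* test_symtab.c - minimal standalone driver for the symbol-table code.
 * Assumes add_word() and lookup_word() have been moved to their own
 * source file, away from the flex-generated scanner. */
#include <stdio.h>

#define LOOKUP 0   /* must match the value used by the lexer */
#define NOUN   1   /* arbitrary test values */
#define VERB   2

int add_word(int type, char *word);
int lookup_word(char *word);

int main(void)
{
    add_word(VERB, "run");
    add_word(NOUN, "cat");
    printf("run -> %d (expect %d)\n", lookup_word("run"), VERB);
    printf("cat -> %d (expect %d)\n", lookup_word("cat"), NOUN);
    printf("dog -> %d (expect %d)\n", lookup_word("dog"), LOOKUP);
    return 0;
}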
I don't understand why you felt the need to introduce "short versions" of the token IDs. The example code in Levine's book does not anywhere use these symbols. You cannot just invent symbols and you don't need these abbreviations for anything.
The comment that you "had to create 2 versions [of the header file] for some values" but that the "code was having an issue with them, and I wasn't understanding why" is far too unspecific for an answer. Perhaps the problem was that you thought you could use identifiers which are not defined anywhere, which would certainly cause a compiler error. But if there is some other issue, you could ask a question with an accurate problem description (that is, exactly what problem you encountered) and a Minimal, Complete, and Verifiable example (as indicated in the StackOverflow help pages).
In any case, manually setting the values of the token IDs is almost certainly what is preventing your inputs from being recognized. Bison/yacc reserves the values 256 and 257 for internal tokens, so the first one which will be generated (and therefore used in the parser) has the value 258. That means that the token values you are returning from your lexical scanner have a different meaning inside bison. Bottom line: never manually set token values. If your header isn't being generated correctly, figure out why.
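For reference, the generated header declares the token codes itself, roughly along these lines (the exact layout varies between bison versions; the values below only illustrate the numbering scheme starting at 258):
/* Sketch of what a bison-generated y.tab.h provides; never write this by hand. */
#ifndef YYTOKENTYPE
#define YYTOKENTYPE
enum yytokentype {
    NOUN = 258,
    PRONOUN = 259,
    VERB = 260,
    ADVERB = 261,
    ADJECTIVE = 262,
    PREPOSITION = 263,
    CONJUNCTION = 264
};
#endif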
As far as I can see, the only legal input for your program has the form:
sentence: subject VERB object
Since none of your sample inputs ("run", for example) have this form, a syntax error is not surprising. However, the fact that you receive a very early syntax error on the input "cat" does suggest there might be a problem with your symbol table lookup. (That's probably the result of the problem noted above.)

How to create Yacc/Lex rules for embedding C source code snippets?

I'm implementing a custom parser generator with embedded lexer and parser to parse HTTP headers in an event-driven state machine way. Here's some definitions the eventual parser generator could consume to parse a single header field without CRLF at the end:
token host<prio=1> = "[Hh][Oo][Ss][Tt]" ;
token ospace = "[ \t]*" ;
token htoken = "[-!#$%&'*+.^_`|~0-9A-Za-z]+" ;
token hfield = "[\t\x20-\x7E\x80-\xFF]*" ;
token space = " " ;
token htab = "\t" ;
token colon = ":" ;
obsFoldStart = 1*( space | htab ) ;
hdrField =
obsFoldStart hfield
| host colon ospace hfield<print>
| htoken colon ospace hfield
;
The lexer is based on a maximal munch rule, and the tokens are dynamically turned on and off depending on the context, so there is no conflict between htoken and hfield, and the priority value resolves the conflict between host and htoken. I'm planning to implement the parser as an LL(1) table-driven parser. I haven't yet decided whether I'll implement regexp token matching by simulating the nondeterministic finite automaton or go all the way and explode it into a deterministic finite automaton.
Now, I would like to include some C source code in my parser generator input:
hdrField =
obsFoldStart hfield
| host {
parserState->userdata.was_host = 1;
} colon ospace hfield<print>
| htoken {
parserState->userdata.was_host = 0;
} colon ospace hfield
;
What I thus need is some way to read a text token that ends when the number of } characters read matches the number of { characters read.
How to do this? I'm handling comments using BEGIN(COMMENTS) and BEGIN(INITIAL) but I don't believe such a strategy would work for embedded C source. Also, the comment handling could complicate the embedded C source code handling a lot, because I don't believe a single token can have a comment in the middle of it.
Basically, I need the embedded C language snippet as a C string I can store to my data structures.
So, I took some of the generated lex code and made it self-standing.
I hope it's OK that I used C++ even though only C is being recognized. IMHO, it concerns only the less relevant parts of this sample code. (Memory management in C is much more tedious than simply delegating it to std::string.)
scanC.l:
%{
#include <iostream>
#include <string>
#ifdef _WIN32
/// disables #include <unistd.h>
#define YY_NO_UNISTD_H
#endif // _WIN32
// buffer for collected C/C++ code
static std::string cCode;
// counter for braces
static int nBraces = 0;
%}
/* Options */
/* make never interactive (prevent usage of certain C functions) */
%option never-interactive
/* force lexer to process 8 bit ASCIIs (unsigned characters) */
%option 8bit
/* prevent usage of yywrap */
%option noyywrap
EOL ("\n"|"\r"|"\r\n")
SPC ([ \t]|"\\"{EOL})*
LITERAL "\""("\\".|[^\\"])*"\""
%s CODE
%%
<INITIAL>"{" { cCode = '{'; nBraces = 1; BEGIN(CODE); }
<INITIAL>. |
<INITIAL>{EOL} { std::cout << yytext; }
<INITIAL><<EOF>> { return 0; }
<CODE>"{" {
cCode += '{'; ++nBraces;
//updateFilePos(yytext, yyleng);
} break;
<CODE>"}" {
cCode += '}'; //updateFilePos(yytext, yyleng);
if (!--nBraces) {
BEGIN(INITIAL);
//return new Token(filePosCCode, Token::TkCCode, cCode.c_str());
std::cout << '\n'
<< "Embedded C code:\n"
<< cCode << "// End of embedded C code\n";
}
} break;
<CODE>"/*" { // C comments
cCode += "/*"; //_filePosCComment = _filePos;
//updateFilePos(yytext, yyleng);
char c1 = ' ';
do {
char c0 = c1; c1 = yyinput();
switch (c1) {
case '\r': break;
case '\n':
cCode += '\n'; //updateFilePos(&c1, 1);
break;
default:
if (c0 == '\r' && c1 != '\n') {
c0 = '\n'; cCode += '\n'; //updateFilePos(&c0, 1);
} else {
cCode += c1; //updateFilePos(&c1, 1);
}
}
if (c0 == '*' && c1 == '/') break;
} while (c1 != EOF);
if (c1 == EOF) {
//ErrorFile error(_filePosCComment, "'/*' without '*/'!");
//throw ErrorFilePrematureEOF(_filePos);
std::cerr << "ERROR! '/*' without '*/'!\n";
return -1;
}
} break;
<CODE>"//"[^\r\n]* | /* C++ one-line comments */
<CODE>"'"("\\".|[^\\'])+"'" | /*"/* C/C++ character constants */
<CODE>{LITERAL} | /* C/C++ string constants */
<CODE>"#"[^\r\n]* | /* preprocessor commands */
<CODE>[ \t]+ | /* non-empty white space */
<CODE>[^\r\n] { // any other character except EOL
cCode += yytext;
//updateFilePos(yytext, yyleng);
} break;
<CODE>{EOL} { // special handling for EOL
cCode += '\n';
//updateFilePos(yytext, yyleng);
} break;
<CODE><<EOF>> { // premature EOF
//ErrorFile error(_filePosCCode,
// compose("%1 '{' without '}'!", _nBraces));
//_errorManager.add(error);
//throw ErrorFilePrematureEOF(_filePos);
std::cerr << "ERROR! Premature end of input. (Not enough '}'s.)\n";
}
%%
int main(int argc, char **argv)
{
    return yylex();
}
A sample text to scan scanC.txt:
Hello juhist.
The text without braces doesn't need to have any syntax.
It just echoes the characters until it finds a block:
{ // the start of C code
// a C++ comment
/* a C comment
* (Remember that nested /*s are not supported.)
*/
#define MAX 1024
static char buffer[MAX] = "", empty="\"\"";
/* It is important that tokens are recognized to a limited amount.
* Otherwise, it would be too easy to fool the scanner with }}}
* where they have no meaning.
*/
char *theSameForStringConstants = "}}}";
char *andCharConstants = '}}}';
int main() { return yylex(); }
}
This code should be just copied
(with a remark that the scanner recognized the C code a such.)
Greetings, Scheff.
Compiled and tested on cygwin64:
$ flex --version
flex 2.6.4
$ flex -o scanC.cc scanC.l
$ g++ --version
g++ (GCC) 7.3.0
$ g++ -std=c++11 -o scanC scanC.cc
$ ./scanC < scanC.txt
Hello juhist.
The text without braces doesn't need to have any syntax.
It just echoes the characters until it finds a block:
Embedded C code:
{ // the start of C code
// a C++ comment
/* a C comment
* (Remember that nested /*s are not supported.)
*/
#define MAX 1024
static char buffer[MAX] = "", empty="\"\"";
/* It is important that tokens are recognized to a limited amount.
* Otherwise, it would be too easy to fool the scanner with }}}
* where they have no meaning.
*/
char *theSameForStringConstants = "}}}";
char *andCharConstants = '}}}';
int main() { return yylex(); }
}// End of embedded C code
This code should be just copied
(with a remark that the scanner recognized the C code a such.)
Greetings, Scheff.
$
Notes:
This is taken from a helper tool (not for sale). Hence, it is not bullet-proof, but just good enough for production code.
What I saw when adapting it: The line continuation of pre-processor lines is not handled.
It's surely possible to fool the tool with a creative combination of macros with unbalanced { } – something we would never do in our production code (see 1.).
So, it might be at least a start for further development.
To check this against a C lex specification, I have ANSI C grammar, Lex specification at hand, though it's 22 years old. (There are probably newer ones available matching the current standards.)

How can I disable the maximal munch rule in Lex?

Suppose I want to deal with certain patterns and pass the other text (VHDL code) through as-is to the output file.
For that purpose I would be required to write a master rule at the end, such as
(MY_PATTERN) {
    // do something with my pattern
}
(.*) {
    return TOK_VHDL_CODE;
}
The problem with this strategy is that MY_PATTERN is useless in this case: it would be swallowed by .* under the maximal munch rule.
So how can I get this functionality?
The easy way is to get rid of the * in your default rule at the end and just use
. { append_to_buffer(*yytext); }
so your default rule takes all the stuff that isn't matched by the previous rules and stashes it in a buffer somewhere, to be dealt with by someone else.
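append_to_buffer is not a flex primitive here; it stands for whatever accumulation mechanism you provide. A minimal sketch of one possible implementation (the names and the growth policy are purely illustrative):
#include <stdlib.h>

/* Grows a private buffer one character at a time; whoever consumes the
 * accumulated text can read vhdl_buf and reset vhdl_len afterwards. */
static char  *vhdl_buf = NULL;
static size_t vhdl_len = 0, vhdl_cap = 0;

static void append_to_buffer(char c)
{
    if (vhdl_len + 1 >= vhdl_cap) {
        size_t new_cap = vhdl_cap ? vhdl_cap * 2 : 256;
        char *p = realloc(vhdl_buf, new_cap);
        if (p == NULL)
            abort();               /* out of memory: give up in this sketch */
        vhdl_buf = p;
        vhdl_cap = new_cap;
    }
    vhdl_buf[vhdl_len++] = c;
    vhdl_buf[vhdl_len] = '\0';
}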
In theory, it's possible to find a regular expression which will match a string not containing a pattern, but except in the case of very simple patterns, it is neither easy nor legible.
If all you want to do is search for (and react to) specific patterns, you could use a default rule which matches one character and does nothing:
{Pattern1} { /* Do something with the pattern */ }
{Pattern2} { /* Do something with the pattern */ }
.|\n /* Default rule does nothing */
If, on the other hand, you want to do something with the not-otherwise-matched strings (as in your example), you'll need the default rule to accumulate the strings, and the pattern rules to "send" (return) the accumulated token before acting on the token which they matched. That means that some actions will need to send two tokens, which is a bit awkward with the standard "parser calls the scanner for a token" architecture, because it requires the scanner to maintain some state.
If you have a not-too-ancient version of bison, you could use a "push parser" instead, which allows the scanner to call the parser. That makes it easy to send two tokens in a single action. Otherwise, you need to build a kind of state machine into your scanner.
Below is a simple example (which needs pattern definitions, among other things) using a push-parser.
%{
#include <stdlib.h>
#include <string.h>
#include "parser.tab.h"
/* Since the lexer calls the parser and we call the lexer,
* we pass through a parser (state) to the lexer. This is
* how you change the `yylex` prototype:
*/
#define YY_DECL static int yylex(yypstate* parser)
%}
pattern1 ...
pattern2 ...
/* Standard "avoid warnings" options */
%option noyywrap noinput nounput nodefault
%%
  /* Indented code before the first pattern is inserted at the beginning
   * of yylex, perfect for local variables.
   */
  size_t vhdl_length = 0;

  /* These are macros because they do out-of-sequence return on error. */
  /* If you don't like macros, please accept my apologies for the offense. */
  #define SEND_(toke, leng) do { \
      size_t leng_ = leng; \
      char* text = memmove(malloc(leng_ + 1), yytext, leng_); \
      text[leng_] = 0; \
      int status = yypush_parse(parser, toke, &text); \
      if (status != YYPUSH_MORE) return status; \
    } while(0);
  #define SEND_TOKEN(toke) SEND_(toke, yyleng)
  #define SEND_TEXT do if(vhdl_length){ \
      SEND_(TEXT, vhdl_length); \
      yytext += vhdl_length; yyleng -= vhdl_length; vhdl_length = 0; \
    } while(0);

{pattern1}  { SEND_TEXT; SEND_TOKEN(TOK_1); }
{pattern2}  { SEND_TEXT; SEND_TOKEN(TOK_2); }

/* Default action just registers that we have one more char and
 * calls yymore() to keep accumulating the token.
 */
.|\n        { ++vhdl_length; yymore(); }

/* In the push model, we're responsible for sending EOF to the parser */
<<EOF>>     { SEND_TEXT; return yypush_parse(parser, 0, 0); }
%%
/* In this model, the lexer drives everything, so we provide the
 * top-level interface here.
 */
int parse_vhdl(FILE* in) {
    yyin = in;
    /* Create a new pure push parser */
    yypstate* parser = yypstate_new();
    int status = yylex(parser);
    yypstate_delete(parser);
    return status;
}
To actually get that to work with bison, you need to provide a couple of extra options:
parser.y
%code requires {
/* requires blocks get copied into the tab.h file */
/* Don't do this if you prefer a %union declaration, of course */
#define YYSTYPE char*
}
%code {
#include <stdio.h>
void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
}
%define api.pure full
%define api.push-pull push
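For completeness, here is a sketch of how the parse_vhdl() entry point defined in the scanner above might be driven; the file handling is my own illustration, not part of the original answer:
#include <stdio.h>

int parse_vhdl(FILE *in);   /* defined at the bottom of the scanner above */

int main(int argc, char **argv)
{
    /* Read from the file named on the command line, or from stdin. */
    FILE *in = (argc > 1) ? fopen(argv[1], "r") : stdin;
    if (in == NULL) {
        perror("fopen");
        return 1;
    }
    int status = parse_vhdl(in);
    if (in != stdin)
        fclose(in);
    return status;
}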

Checking for a blank line in C - Regex

Goal:
Find if a string contains a blank line. Whether it be '\n\n',
'\r\n\r\n', '\r\n\n', '\n\r\n'
Issues:
I don't think my current regex for finding '\n\n' is right. This is my first time really using regex outside of simple use of * when removing files in command line.
Is it possible to check for all of these cases (listed above) in one regex, or do I have to do 4 separate calls to compile_regex?
Code:
int checkForBlankLine(char *reader) {
    regex_t r;
    compile_regex(&r, "*\n\n");
    match_regex(&r, reader);
    return 0;
}

void compile_regex(regex_t *r, char *matchText) {
    int status;
    regcomp(r, matchText, 0);
}

int match_regex(regex_t *r, char *reader) {
    regmatch_t match[1];
    int nomatch = regexec(r, reader, 1, match, 0);
    if (nomatch) {
        printf("No matches.\n");
    } else {
        printf("MATCH!\n");
    }
    return 0;
}
Notes:
I only need to worry about finding one blank line, that's why my regmatch_t match[1] is only one item long
reader is the char array containing the text I am checking for a blank line.
I have seen other examples and tried to base the code off of those examples, but I still seem to be missing something.
Thank you kindly for the help/advice.
If anything needs to be clarified please let me know.
It seems that you have to compile the regex as extended:
regcomp(&re, "\r?\n\r?\n", REG_EXTENDED);
The first atom, \r? is probably unnecessary, because it doesn't add to the blank-line condition if you don't capture the result.
In the above, blank line really means empty line. If you want blank line to mean a line that has no characters except for white space, you can use:
regcomp(&re, "\r?\n[ \t]*\r?\n", REG_EXTENDED);
(I don't think you can use the space character pattern, \s here instead of [ \t], because that would include carriage return and new-line.)
As others have already hinted at, the "simple use of * in the command line" is not a regular expression. This wildcard matching is called file globbing and has different semantics.
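Putting the pieces above together, a minimal self-contained sketch could look like this (the helper function and the test strings are mine, not from the question):
#include <regex.h>
#include <stdio.h>

/* Returns 1 if text contains a blank (empty) line, 0 if not, -1 on error.
 * "\r?\n\r?\n" covers \n\n, \r\n\r\n, \r\n\n and \n\r\n in one pattern,
 * because the C compiler turns \r and \n into literal CR and LF bytes
 * before the regex engine ever sees them. */
static int has_blank_line(const char *text)
{
    regex_t re;
    int found;

    if (regcomp(&re, "\r?\n\r?\n", REG_EXTENDED) != 0)
        return -1;
    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}

int main(void)
{
    printf("%d\n", has_blank_line("line one\n\nline two\n"));  /* prints 1 */
    printf("%d\n", has_blank_line("no blank line here\n"));    /* prints 0 */
    return 0;
}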
Check what the * in a regex means. It's not like the "anything" wildcard in the command line. The * means that the previous component can appear any number of times. The wildcard in regex is the dot (.). So if you want to say "match anything" you can do .*, which means anything, any number of times.
So in your case you can do .*\n\n.* which would match anything that has \n\n.
Finally, you can use "or" (|) in a regex and ( ) to group things. So you can do something like .*(\n\n|\r\n\r\n).*, and that would match anything that has a \n\n or a \r\n\r\n.
Hope that helps.
Rather than looking for only \r or \n, look for not \r or \n?
Your regex would simply be
'[^\r\n]'
and a match result of false indicates a blank line to your specification.

Search and replace a string as shown below

I am reading a file, say x.c, and I have to search for the string "shared". Once such a string has been found, the following has to be done.
Example:
shared(x,n)
Output has to be
*var = &x;
*var1 = &n;
The pointers can have any name. The output has to be written to a different file. How can I do this?
I'm developing a source-to-source compiler for concurrent platforms using lex and yacc. This can be a routine written in C or, if you can, using lex and yacc. Can anyone please help?
Thanks.
If, as you state, the arguments can only be variables and not any kind of other expressions, then there are a couple of simple solutions.
One is to use regular expressions, and do a simple search/replace on the whole file using a pretty simple regular expression.
Another is to simply load the entire source file into memory, search using strstr for "shared(", and use e.g. strtok to get the arguments. Copy everything else verbatim to the destination.
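A rough sketch of that second approach, assuming the arguments really are plain identifiers (the function names and buffer sizes are mine, and anything after the closing parenthesis is simply dropped here):
#include <stdio.h>
#include <string.h>

/* Copies one line to out; if it contains shared(x,n), emits the two
 * pointer assignments instead of the call. */
static void rewrite_line(const char *line, FILE *out)
{
    const char *p = strstr(line, "shared(");
    char a[64], b[64];

    if (p != NULL && sscanf(p, "shared(%63[^,],%63[^)])", a, b) == 2) {
        fwrite(line, 1, (size_t)(p - line), out);  /* text before the call */
        fprintf(out, "*var = &%s;\n*var1 = &%s;\n", a, b);
    } else {
        fputs(line, out);
    }
}

int main(void)
{
    char line[1024];

    /* e.g. ./rewrite < x.c > x_out.c */
    while (fgets(line, sizeof line, stdin) != NULL)
        rewrite_line(line, stdout);
    return 0;
}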
Take advantage of the C preprocessor.
Put this at the top of the file
#define shared(x,n) { *var = &(x); *var1 = &(n); }
and run it through cpp. This will also include external resources and expand all macros, but you can simply remove all #something lines from the code, convert it using the injected preprocessor rules, and then re-add them.
By the way, why not a simple macro set in a header file for the developer to include?
A doubt: where do var and var1 come from?
EDIT: corrected as shown by johnchen902
When it comes to preprocessor, I'll do this:
#define shared(x,n) (*var=&(x),*var1=&(n))
Why do I think it's better than esseks's answer?
Suppose this situation:
if( someBool )
shared(x,n);
else { /* something else */ }
With esseks's answer it becomes:
if( someBool )
{ *var = &x; *var1 = &n; }; // compile error
else { /* something else */ }
And with my answer it becomes:
if( someBool )
(*var=&(x),*var1=&(n)); // good!
else { /* something else */ }
