How to store a paragraph in C source code?


I'm a Perl programmer and was surprised to find that C has no convenient way to store a paragraph, like:
my $a = <<'dd';
hello world..
1
2
3
dd
So how do I do the equivalent in C?

It's done like this:
char a[] =
"hello world\n"
" 1\n"
" 2\n"
" 3\n";

You have two different options for your question, which provide much of the same functionality.
Both
char *str="Line 1\nLine 2\nLine 3";
and
char str[]="Line 1\nLine 2\nLine 3";
allow you to print the paragraph like this:
printf("%s",str);
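However, the first declaration (char *str) makes str point to a string literal, which is generally stored in read-only memory (modifying it is undefined behavior), whereas the second copies the text into an array that can be edited at run time. This distinction is important, but not always clear. See this question for a few more details.
The character \n is the line feed character, and you should check that it behaves the way you expect on your target platform. For instance, DOS and Windows use "\r\n" (carriage return + line feed) as the native line ending; streams opened in text mode translate "\n" to it automatically, but binary-mode streams do not. Wikipedia has an article about this.
Another difference between these forms, as one commenter pointed out, is that char *str declares a pointer variable (it can be reassigned, and sizeof str gives the size of a pointer), whereas char str[] declares an array (it cannot be reassigned, and sizeof str gives the size of the whole string including the terminating null). They often, but not always, behave the same; this question has more information regarding this.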
As some commenters have pointed out, there is a limit on the length of string literals in some compilers. MSVC has a limit of 2048 characters (see here) whereas GCC has no limit, by some accounts. A length of at least 509 one-byte characters is guaranteed by C90; this was increased to 4095 in C99.
Regardless, if you want to work around a per-literal limit such as MSVC's, or you want to lay the text out more readably, you can use this format (note that newlines and quotes must be written explicitly; the compiler concatenates the adjacent strings, and the standard's minimum length applies to the result after concatenation):
char *str =
"Line 1\n"
"Line 2\n"
"Line 3\n";
or this (the backslash at the end of each line escapes the newline you inserted for formatting; if you indent the continuation lines, that indentation becomes part of the string):
char *str =
"Line 1 \
Line 2 \
Line 3";

You may try:
char *str = "hello world\n 1\n 2\n 3\n";

Related

String splicing in the C language: a strange phenomenon

Recently, I saw the following C code:
printf("%s\n", "1234" "qwer");
// output: 1234qwer
snprintf(buffer, sizeof(buffer), "bvcx" "mju");
// buffer data: bvcxmju
To be honest, this surprised me. Before that, I didn't know that strings could be pasted together in the "1234" "qwer" format. Why does this work?
Then I tried char a[] = "1234" "qwer", and gcc returned an error!
So, can someone explain this phenomenon and the theory behind it?

What you saw has been part of C syntax for a long time. A string literal can be split into multiple parts separated only by white space (after preprocessing and comment removal). This syntax enables, for example:
writing a long string literal on multiple lines:
char message[] = "This is a long message that can be split on "
"multiple lines for readability";
combining string fragments defined as macros:
printf("The value of i32 is %" PRId32 "\n", i32);
separating string contents that have a different meaning if juxtaposed:
char s1[] = "This is ESC 4: \x1B" "4";
char s2[] = "so is this: \0334 and this: \33""4";
char s3[] = "but not this: \334";
char s4[] = "nor this: \x1B4";
combining stringified macro arguments

Adjacent string literals are always concatenated into a single one as part of the translation phases. See C17 6.4.5/5:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence.
Formally, translation phase 6 happens after macro expansion (phase 4) but before preprocessing tokens are converted into tokens (phase 7). This means, for example, that
sizeof "hello " "world" yields the result 12, equivalent to:
sizeof "hello world"
Practically, this is convenient when writing various "stringification" macros, example:
#include <stdio.h>
#define STRINGIFY(x) #x
#define STRINGIFY_CONCAT(a,b) STRINGIFY(a) " " STRINGIFY(b)
int main (void)
{
puts(STRINGIFY_CONCAT(hello,world));
}
It's also a useful feature whenever you have to use hex escape sequences and need to terminate them, since C allows them to be of variable length: puts("\xABBA") vs puts("\xAB" "BA") will give different outputs.

Why does C not recognize strings across multiple lines?

(I am very new to C.)
Visual newlines seem to be unimportant in C.
For instance:
int i; int j;
is same as
int i;
int j;
and
int k = 0 ;
is same as
int
k
=
0
;
so why is
"hello
hello"
not the same as
"hello hello"

It is because a line that contains a starting quote character and not an ending quote character was more likely a typing mistake or other error than an attempt to write a string across multiple lines, so the decision was made that string literals would not span source lines, unless deliberately indicated with \ at the end of a line.
Further, when such an error occurs, the compiler would be faced with reading possibly thousands of lines of code before determining that there was no closing quote character (end of file was reached), or finding what was intended as the opening quote character of some other string literal and then attempting to parse the contents of that string literal as C code. In addition to burdening early compilers with limited compute resources, this could result in confusing error messages in a part of the source code far removed from the missing quote character.
This choice is codified in C 2018 6.4.5 1, which gives the syntax of a string literal as " s-char-sequence(opt) ", where each s-char is any member of the source character set except the double-quote ", the backslash \, or the new-line character, or else an escape sequence (and a string literal may also have an encoding prefix, u8, u, U, or L, before the first ").

Strings can be continued over newlines by putting a backslash immediately before the newline:
"hello \
hello"
Or (better), using string concatenation:
"hello "
"hello"
Note that the space has been carefully preserved so that both are equivalent to "hello hello", except for the line numbering of the rest of the file.
The backslash-newline line elimination is done very early in the translation process — in phase 2 of the conceptual translation phases.
Note that leading blanks are not stripped. If you write:
printf("Some long string with maybe an integer %d in it\
and some more data on the next line\n", i);
Then the string contains a sequence of (at least) 8 blanks between "in it" and "and some". The count of 8 assumes that the printf() statement starts at the left margin; if it is indented, you'd need to add the extra white space corresponding to the indentation.

1- Using double quotes for each string:
char *str = "hello "
"hello" ;
One problem with this approach is that special characters, such as the quotation mark " itself, must be escaped (this is true inside any C string literal, including the backslash form).
2- Using \ at the end of the line:
char *str = "hello \
hello" ;
This form is a lot easier to write; we don't need to write quotation marks for each line.

We can think of a C program as a series of tokens: groups of characters that can't be split up without changing their meaning. Identifiers and keywords are tokens. So are operators like + and -, punctuation marks such as the comma and semicolon, and string literals.
For example, the line
int i; int j;
consists of 6 tokens: int, i, ;, int, j and ;. Most of the time, and particularly in this case, the amount of space (space, tab and newline characters) is not critical. That's why the compiler will treat
int i
;int
j;
the same.
Writing
"Hello
Hello"
is like writing
un signed
and hoping that the compiler treats it as
unsigned
Just as a space is not allowed inside a keyword, a newline character is not allowed inside a string literal token. A newline can still be included using the escape sequence '\n' when needed.
To write strings across lines, use string concatenation:
"Hello"
"Hello"
Although the above method is recommended, you can also use a backslash
"Hello \
Hello"
With the backslash method, beware of leading spaces on the continuation line: the string includes everything on that line up to the closing quote or the next line-ending backslash.

**argv contains a lot more characters than expected

First, I need to execute commands with system(). For example, I receive a string and open that string with a text editor, like this:
$ ./myprogram string1
The program should then run a command like this:
$ vim string1
But I cannot find a way to do this like in this pseudocode:
system("vim %s",argv[1]); //Error:
test.c:23:3: error: too many arguments to function 'system'
system("vim %s",argv[1]);
Therefore, my solution is to store argv[1] in a char array that is already initialized with four characters, like this:
char command[strlen(argv[1])+4];
command[0] = 'v'; command [1] = 'i'; command[2] = 'm'; command[3] = ' ';
And copy argv[1] into my new char array:
for(int i = 0; i < strlen(argv[1]) ; i++)
command[i+4] = argv[1][i];
And finally:
system(command);
If the argument given to my program has fewer than 3 characters, it works fine, but if not, some weird characters that I do not expect appear in the output, like this:
./myprogram 1234
And the output is:
$ vim 12348�M�
How can I solve this bug and why does this happen?
The full code is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (int argc,char **argv) {
char command[strlen(argv[1])+4];
command[0] = 'v'; command [1] = 'i'; command[2] = 'm'; command[3] = ' ';
for(int i = 0; i < strlen(argv[1]) ; i++)
command[i+4] = argv[1][i];
system(command);
return 0;
}

You need to NUL terminate your C-style strings, and that includes allocating enough memory to hold the NUL.
Your array is one byte short (it must be char command[strlen(argv[1])+4+1]; to leave space for the NUL), and you should probably just use something like sprintf to fill it in, e.g.:
sprintf(command, "vim %s", argv[1]);
That's simpler than manual loops, and it also fills in the NUL for you.
The garbage you see is caused by the search for the NUL byte (which terminates the string) wandering off into unrelated (and, for that matter, undefined) memory that happens to follow the buffer.

The reason you're running into problems is that you aren't terminating your command string with a null character ('\0'). But you really want to use sprintf (or, even better, snprintf) for something like this. It works similarly to printf but writes to memory instead of stdout, and it adds the terminating null character for you. E.g.:
char cmd[80];
snprintf(cmd, 80, "vim %s", argv[1]);
system(cmd);
As @VTT points out, this simplified code assumes that argv[1] is at most 75 characters long (80 minus 4 for "vim " minus 1 for the terminating null character); anything longer is silently truncated by snprintf. One safer option would be to verify this assumption first and report an error if it doesn't hold. To be more flexible, you could dynamically allocate the cmd buffer:
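const char *cmd = "vim ";
char *buf = malloc(strlen(argv[1]) + strlen(cmd) + 1); /* +1 for the terminating null */
if (buf == NULL)
    return EXIT_FAILURE; /* allocation failed */
sprintf(buf, "%s%s", cmd, argv[1]);
system(buf);
free(buf);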
Of course you should also check to be sure argc > 1.

I know that there are already good answers here, but I'd like to expand them a little bit.
I often see this kind of code
system("vim %s",argv[1]); //Error:
and beginners often wonder, why that is not working.
The reason is that "%s", some_string is not a feature of the C
language: the sequence of characters %s has no special meaning of its own; in fact, it is
as meaningful as the sequence mickey mouse.
The reason it works with printf (and the other members of the
printf family) is that printf was designed to replace sequences like
%s with a value passed as an argument. It's printf that makes %s special,
not the C language.
As you may have noticed, "hello" + " world" doesn't do string
concatenation. C doesn't have a native string type that behaves like C++'s
std::string or Python's str. In C, a string is just a sequence of
characters terminated by a byte with the value 0 (also called
the '\0'-terminating byte).
That's why you pass printf a format as the first argument. It tells
printf to print character by character until it finds a %,
which tells printf that the next character(s)[1] is/are special and
must be replaced by the values passed as subsequent arguments to printf.
These %-sequences are called conversion specifiers, and the documentation of printf
lists all of them and how to use them.
Other functions, like the scanf family, use a similar strategy, but that doesn't
mean that all C functions that expect strings work the same way. In
fact, the vast majority of C functions that expect strings do not work that
way.
man system
#include <stdlib.h>
int system(const char *command);
Here you see that system is a function that expects one argument only.
That's why your compiler complains with a line like this: system("vim %s",argv[1]);.
That's where functions like sprintf or snprintf come in handy.
[1] If you take a look at the printf documentation, you will see that
the conversion specifier together with length modifiers can be longer than 1
character.

Writing printf(#"") with multiple line string

Is it possible to write something like this:
printf(#"
-
-
-
-
");
I can do it in C#, but not in C. It gives me an error in CodeBlocks. Am I allowed to do this?
Error message: error: stray '#' in program.

No. That syntax doesn't exist in C.
If you want a multiple-line string, write it as multiple double-quoted strings with no other tokens in between them. They will be combined.
printf(
"some string"
"more of the string"
"even more of the string"
);
(You will, of course, need to add a \n at the end of each line if that's what you want.)

No, that's not syntax that C understands; C doesn't have raw string literals.
You can use \ as the last character to continue on the next line:
const char *str = "hello\n\
world";
Also, consecutive string literals will be concatenated. So you can do e.g.
const char *str = "Hello\n"
"world\n";

C#'s verbatim strings are not available in C. If you have characters to escape, like " or \, escape them with a backslash; there is no other option in this language.
If you want to embed multiple lines in a string literal, you can either insert \n at the appropriate location in your string, or escape the return character as well:
printf("Here's \
a multiline \
string literal");

Line continuation with \ at the end of the line.
printf("\
\
-\
-\
-\
-\
");

String literals in C may not contain newlines. You have two workarounds:
Use implicit string concatenation (done by the compiler).
printf("The quick brown"
" fox jumps over"
" the sleazy dog.");
Escape the newline by placing a backslash in front of it.
printf("The quick brown \
fox jumps over \
the sleazy dog.");
Personally, I prefer the first form since the second looks ugly (my opinion) and forces you to ruin your code indentation.
In either case, the string will simply not contain the newlines. So if you really meant for them to be there, you'll have to add them via \n.

Can scanf identify a format character within a string?

Let's say that I expect a list of items from standard input, separated by commas, like this:
item1, item2, item3,...,itemn
and I also want to allow the user to omit the white space between items and commas, so this kind of input is legal in my program as well:
item1,item2,item3,...,itemn
If I use scanf like this:
scanf("%s,%s,%s,%s,...,%s", s1, s2, s3, s4,...,sn);
it fails when there is no white space (I tested it), because %s reads the whole input as one string. So how can I solve this problem using only C standard library functions?

The quick answer is: never, ever use scanf to read user input. It is intended for reading strictly formatted input from files, and even then it isn't much good. At the least, you should be reading entire lines and then parsing them with sscanf(), which gives you some chance to correct errors. At best, you should be writing your own parsing functions.
If you are actually using C++, investigate the C++ string and stream classes, which are much more powerful and safe.

You could have a look at strtok. First read the line into a buffer, then tokenize:
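#include <stdio.h>
#include <string.h>

#define BUFFERSIZE 32768 /* a macro, since a const int is not a constant expression in C */

int main(void)
{
    char buffer[BUFFERSIZE];
    const char *delimiters = " ,\n"; /* split on spaces, commas and the newline */
    if (fgets(buffer, sizeof(buffer), stdin) == NULL)
        return 1; /* no input */
    char *p = strtok(buffer, delimiters);
    while (p != NULL)
    {
        printf("%s\n", p);
        p = strtok(NULL, delimiters);
    }
    return 0;
}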
However, with strtok you'll need to be aware of the potential issues related to reentrance.

I guess it is better to write your own parsing function for this. But if you still prefer scanf despite its pitfalls, there is a workaround: substitute %s with %[^, \t\r\n].
The problem is that %s matches a sequence of non-white-space characters, so it swallows the comma too. If you replace %s with %[^, \t\r\n], it will work almost the same way (the difference is that %s uses isspace(3) to decide which characters are white space, whereas here you list them explicitly, and this list is probably not the same as the one isspace uses).
Please note, if you want to allow spaces before and after comma you must add white space to your format string. Format string "%[^, \t\r\n] , %[^, \t\r\n]" matches strings like "hello,world", "hello, world", "hello , world".
