I am trying to split a long format string passed to fprintf() across multiple lines using the \ character, as shown below:
fprintf(stdout, "This program take a date supplied by the user in dd/mm/yyyy format...\n\
And returns the day of the week for that date using Zeller's rule.\n\n\
Enter date in (dd/mm/yyyy) format: ");
This causes whitespace to be added to the output, as shown below:
This program take a date supplied by the user in dd/mm/yyyy format...
And returns the day of the week for that date using Zeller's rule.
Enter date in (dd/mm/yyyy) format:
This answer suggests this should work. I also checked this answer before posting here. A comment on it mentions that this approach...
...suffers from the fact that it breaks if there is any whitespace after
the '\'; a bug that can be baffling when it occurs.
cat -A's output on the program file...
^Ifprintf(stdout, "This program take a date supplied by the user in dd/mm/yyyy format...\n\$
^I^I^IAnd returns the day of the week for that date using Zeller's rule.\n\n\$
^I^I^IEnter date in (dd/mm/yyyy) format: ");$
...shows no whitespace after \; although it does introduce <TAB>s into the lines that follow. I am using vim to edit my source files.
I use line continuation with \ all the time in Bash, and was under the impression that it works similarly with fprintf() format strings in C.
I'd like to keep my code readable and line widths reasonable, without splitting the long string into multiple fprintf() calls.
Is this standard behavior for printf()/fprintf()?
Is it normal for vim to indent my code when I do line continuation with \?
How do I fix this issue?
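You do not need \ here, as this is not a macro definition. Inside a string literal, a backslash-newline only splices the two lines together, so the leading tabs on the continuation lines become part of the string.
Simply write as many string literals as you want, separated by as much whitespace as you want (a newline is also whitespace). The C compiler ignores the whitespace and concatenates adjacent string literals into one.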
#include <stdio.h>
int main(void)
{
fprintf(stdout, "This program take a date supplied by the user in dd/mm/yyyy format...\n"
"And returns the day of the week for that date using Zeller's rule.\n\n"
"Enter date in (dd/mm/yyyy) format: ");
}
#include <stdio.h>
int main(void)
{
fprintf(stdout, "This program take a date"
" supplied by the user in dd/"
"mm/yyyy format...\n"
"And returns "
"the "
"day of "
"the "
"week for that "
"date using Zeller's rule.\n\n"
"Enter date in (dd/mm/yyyy) format: ");
}
https://godbolt.org/z/6ovj3G
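The C standard only guarantees fairly low limits on the length of a string literal after concatenation (509 characters in C90, 4095 in C99 and later), but in practice you can almost certainly do: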
fprintf(stdout, "This program take a date supplied by the user in dd/mm/yyyy format...\n"
"And returns the day of the week for that date using Zeller's rule.\n\n"
"Enter date in (dd/mm/yyyy) format: "
);
If you do start hitting limits on concatenating strings like that, you can also do:
fprintf(stdout, "%s\n%s\n\n%s",
"This program take a date supplied by the user in dd/mm/yyyy format...",
"And returns the day of the week for that date using Zeller's rule.",
"Enter date in (dd/mm/yyyy) format: "
);
I want to read a text file in C and perform a search on it. This is the content of the text file:
(EDIT: The original format looks a bit different as there are no newlines in the file. It has been reformatted to remove the whitespace between the text strings and filtered through a multicolumn program for a 80col screen.)
^%1~3~31225~2999 ^%1~8~33983~5304
~MAC100 ~MAC100
~RAJU ~LATHA CHERIAN
CR ~ELIM VILLA
~CHEMPOLA ~1
~VT : 2999 ~9847569922
~9847569922 ~32166
~29408 ~Message for bill gro~1960.0
~Message for bill gro~750.0 ~160.0
~250.0 ~0.0
~0.0 ~1~scheme name
~1~scheme name ~0
~0 ~June
~June ~VA019_95784~-
~VA019_93159~- ~0.0
~0.0 ~0~amc date 1~amc date 2~990
~1~amc date 1~amc date 2~990 ~15.0
~15.0 ~150.0
~150.0 ~narration
~narration ^%1~9~31588~3235
^%1~5~30882~2496 ~MAC100
~MAC100 ~BABU
~VISWAMPARAN T. P. ~NADUMPARMBIL
~THALAKOTTUCHALIL ~0
~C 4771 ~9847569922
~9847569922 ~29771
~29065 ~Message for bill gro~3304.0
~Message for bill gro~4320.0 ~160.0
~160.0 ~0.0
~0.0 ~1~scheme name
~1~scheme name ~0
~0 ~June
~June ~VA019_93516~-
~VA019_92833~- ~0.0
~0.0 ~0~amc date 1~amc date 2~990
~0~amc date 1~amc date 2~990 ~15.0
~15.0 ~150.0
~150.0 ~narration
~narration ^?
This is a database in the format of a billing system. I want to write a generic search function for this file based on the name and the id (the third field, e.g. 31588 in ^%1~9~31588~3235). The file is a sequence of records. Each record begins with ^%1, and ~ is used to separate the column values of each record. The first and last markers are not needed (^%1 at the start of each record and ^? at the end of the file). Please help me to do this.
You first should define (or understand) precisely your input format (what are the possible & forbidden characters), perhaps using some EBNF notation.
Then you could process your input line by line (using fgets or getline) and parse each line individually (using sscanf or strtol and extra manual parsing).
Read the file line by line using fgets(), then tokenize each line with strtok() using "~" as the delimiter. Note that strtok() skips empty fields.
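The tokenizing step suggested above can be sketched roughly as follows. This is only an illustration under a few assumptions: the record has already been read into a writable buffer, the id is the third '~'-separated field (as in ^%1~9~31588~3235), and the helper names split_record and record_has_id as well as the MAX_FIELDS bound are made up for this example. It uses strchr() rather than strtok() so that empty columns are kept in place.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 64  /* assumed upper bound on columns per record */

/* Splits `record` in place on '~' and stores a pointer to each field in
 * `fields`. Empty fields are preserved (strtok() would skip them).
 * Returns the number of fields stored; fields beyond max_fields are ignored. */
size_t split_record(char *record, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *start = record;

    while (count < max_fields) {
        char *sep = strchr(start, '~');
        fields[count++] = start;
        if (sep == NULL)
            break;
        *sep = '\0';        /* terminate the current field */
        start = sep + 1;    /* next field begins after the separator */
    }
    return count;
}

/* Returns 1 if the record's id (assumed to be the third field) equals `id`. */
int record_has_id(char *fields[], size_t count, const char *id)
{
    return count > 2 && strcmp(fields[2], id) == 0;
}
```

A name search would work the same way, comparing whichever field index holds the name in your real data (index 5 in the sample records, but that is a guess from the excerpt).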
I just got done doing a project here on R and am now doing some work with matlab.
I need to make 3 vectors :
DOD
Country
Age
Count and store a .txt list with 236 data points. The data in the text file looks like this:
Unknown woman
Cause of death: found dead, with eyes removed.
Location of death: Jardim dos Ipês Itaquaquecetuba, São Paulo, Brazil
Date of death: August 9th, 2014
Cris
Cause of death: multiple gunshot wounds
Location of death: Portal da Foz, Foz do Iguaçu, Brazil
Date of death: September 13th, 2014
Betty Skinner (52 years old)
Cause of death: blunt force trauma to the head
Location of death: Cleveland, Ohio, USA
Date of death: December 4th, 2013
Brittany Stergis (22 years old)
Cause of death: gunshot wound to the head
Location of death: Cleveland, Ohio, USA
Date of death: December 5th, 2013
I have no idea how to look for strings and organize them, but would appreciate any ideas on how to get started.
You can use textscan to read the file into a cell array of strings, and then use regexp to parse the strings to get your desired fields.
First, we read the text file into a cell array of strings:
fid = fopen('deaths.txt');
scanned_fields = textscan(fid, '%s', 'Delimiter','\n');
text_array = scanned_fields{1};
fclose(fid);
While textscan is capable of some rudimentary parsing, it's not sophisticated enough for what we're doing. So we're just using it to read each line as a single string: format %s means we are expecting a string, and setting Delimiter to \n means that the strings are separated by newline characters.
Next, we can use regular expressions to parse each entry:
format = {
'(?<name>[ \w]*)'
' \('
'(?<age>[\d]*)'
' years old\) - Cause of death: '
'(?<cause>[ \w]*)'
' - Location of death: '
'(?<city>[ \w]*)'
', '
'(?<province>[ \w]*)'
', '
'(?<country>[ \w]*)'
' - Date of death: '
'(?<date>[ ,\w]*)'
};
format = [format{:}];
Here we're just defining a format string. I've broken it up like this to make it a little clearer what's going on. Let's go through it line-by-line:
(?<name>[ \w]*) The parentheses indicate that this is a chunk of text (a.k.a. a "token") that we wish to capture. The ?<name> says that we will call this token "name". Finally, the [ \w]* specifies what kind of text to match. The stuff inside the square brackets specifies which characters to look for: spaces ( ) and/or alphanumeric characters (\w). The * outside the square brackets indicates that we will accept any number of these characters.
\( Next we are looking for a space and an open parenthesis. The backslash in front of the parenthesis is to indicate that we are looking for a literal parenthesis, i.e. this parenthesis should not be interpreted as the start of another token to capture.
(?<age>[\d]*) Another token to capture. This one is called "age" and contains any number of \d (numeric characters).
years old\) - Cause of death: More text to look for. Again, we will be matching this text, but we will not be capturing it (because it is not enclosed in parentheses).
(?<city>[ \w]*) Another token to capture. This one is called "city" and contains any number of spaces and/or alphanumeric characters.
, Comma, space
(?<province>[ \w]*), (?<country>[ \w]*) - Date of death: You get the idea
(?<date>[ ,\w]*) Our final token, called "date", which contains any number of spaces, commas, and/or alphanumeric characters.
Then we parse the strings into a struct array:
parsed_fields = regexp(text_array, format, 'names');
parsed_fields = [parsed_fields{:}]'
This is what the output should look like:
>> parsed_fields(1)
ans =
name: 'Jacqueline Cowdrey'
age: '50'
cause: 'unknown'
city: 'Worthing'
province: 'West Sussex'
country: 'United Kingdom'
date: 'November 20th, 2013'
So you can get your vector of countries pretty straightforwardly:
Country = {parsed_fields.country}';
Age is a simple numeric conversion:
Age_str = {parsed_fields.age};
Age = cellfun(@str2double, Age_str)';
Date as a string is pretty easy:
Date_str = {parsed_fields.date}';
But it's nice to have it as a MATLAB "serial date number", which allows arithmetic computations and reformatting into different representation formats. Unfortunately, having the day as "20th" instead of "20" is incompatible with the conversion functions, so we'll first need to strip the "st", "nd", "rd", and "th" suffixes from "1st", "2nd", "3rd", "4th", etc.:
Date_str = regexprep(Date_str, '(?<day>[\d]+)(st|nd|rd|th)', '$<day>');
Date_num = datenum(Date_str, 'mmmm dd, yyyy');
Some other notes:
If the file is very large, you may wish to use fgetl to read it one line at a time (and then also parse it one line at a time) rather than reading the entire file into memory as we did above.
In your example, it looks like the entries are separated by an extra newline. I'm not sure if that's the case in your actual data or if that's just a Stack Overflow thing, but if you need to remove these empty lines you can do so with:
is_empty_line = cellfun(@isempty, text_array);
text_array = text_array(~is_empty_line);
In your example, there were a lot of typos (an extra space here and there, sometimes the colons or dashes were other symbols). If these typos exist in your actual data, you will need to adjust the format specification to account for this. For example, instead of using ' - ' to match (space, dash, space), you can use \s*\W\s* to match (any number of whitespace characters, a single non-alphanumeric character, any number of whitespace characters).
If syntax like format = [format{:}]; or Country = {parsed_fields.country}'; looks strange to you, these are equivalent to:
format = [format{1} format{2} format{3} ... format{end}];
Country = cell(length(parsed_fields),1);
for ii = 1:length(parsed_fields)
Country{ii} = parsed_fields(ii).country;
end
MATLAB R2014b added a new datetime class, so there may be a better way to deal with that nowadays.
Sorry about my previous answer; I had misunderstood how exactly the data is formatted.
As before, let's first read the text file into a cell array of strings:
fid = fopen('deaths.txt');
scanned_fields = textscan(fid, '%s', 'Delimiter','\n');
text_array = scanned_fields{1};
fclose(fid);
While textscan is capable of some rudimentary parsing, it's not sophisticated enough for what we're doing. So we're just using it to read each line as a single string: format %s means we are expecting a string, and setting Delimiter to \n means that the strings are separated by newline characters.
In the sample data you posted, each entry is 4 lines (name, cause, location, date) followed by an empty line. As long as we can rely on this formatting, this provides an easy way to split up the data (instead of the regexp parsing I proposed in my previous answer).
name_str_array = text_array(1:5:end);
cause_str_array = text_array(2:5:end);
loc_str_array = text_array(3:5:end);
date_str_array = text_array(4:5:end);
So for example, name_str_array is going to be every 5th line, starting with line #1. Likewise, cause_str_array is every 5th line, starting with line #2. Just be careful that there are not any extra or missing lines in the data.
Next we will parse each of these to get the information that we want. In my previous answer, I proposed parsing all of the strings at once, but I think it would be easier to understand if we went through it one entry at a time. For example, let's consider the first entry.
name_str = name_str_array{1};
loc_str = loc_str_array{1};
date_str = date_str_array{1};
Let's start with the easiest one: parsing the date.
date_format = 'Date of death:\s*(?<date>.*)';
parsed_fields = regexp(date_str, date_format, 'names');
DOD = parsed_fields.date;
The format we're looking for is the string Date of death:, followed by any number of whitespace characters (\s*), followed by the chunk of text (aka "token") that we wish to capture: (?<date>.*)
The parentheses indicate that this is a token we wish to capture, the ?<date> indicates that we wish to call this token "date", and the .* specifies which characters to look for. The . is the universal wildcard, i.e. it matches all possible characters. The * indicates that we are interested in any number of repeats. So in essence, this .* means "match all remaining characters in the string".
Calling regexp with the names option causes it to return a struct with the named tokens as its fields.
Next, let's do the country. This one is a little trickier because there is a variable number of city/region specifiers. But the country will always be the last one, so that's the one we'll grab.
country_format = '(?<country>\w[ \w]*)$';
parsed_fields = regexp(loc_str, country_format, 'names');
Country = parsed_fields.country;
This format specification is the token (?<country>\w[ \w]*) followed by the end of the string (denoted by the special character $). In the token specification we are matching an alphanumeric character (\w) followed by any number of spaces and/or alphanumeric characters ([ \w]*). The reason for specifying this leading \w is so that we don't match the space between the previous comma and the start of the country name.
Finally, let's do the age. This one is tricky because not every entry has an age. At least it's easy because the age (if it exists) is the only numeric data in the line. Hence:
age_format = '(?<age>[\d]+)';
parsed_fields = regexp(name_str, age_format, 'names');
if isempty(parsed_fields)
Age = -1;
else
Age = str2double(parsed_fields.age);
end
The format specification is simply the token (?<age>[\d]+), which specifies that we are looking for numeric characters (\d), and we are looking for one or more of them (+).
After parsing, we check whether or not there was a match. If not (parsed_fields is empty), then we assign Age a value of -1. Otherwise, we convert the parsed age field into a number.
So putting it all together:
date_format = 'Date of death:\s*(?<date>.*)';
country_format = '(?<country>\w[ \w]*)[\W]?$';
age_format = '(?<age>[\d]+)';
nEntries = length(date_str_array);
DOD = cell(nEntries, 1);
Country = cell(nEntries, 1);
Age = zeros(nEntries, 1);
for ii = 1:nEntries
name_str = name_str_array{ii};
loc_str = loc_str_array{ii};
date_str = date_str_array{ii};
parsed_fields = regexp(date_str, date_format, 'names');
assert(~isempty(parsed_fields), 'Could not parse date from:\n%s', date_str);
DOD{ii} = parsed_fields.date;
parsed_fields = regexp(loc_str, country_format, 'names');
assert(~isempty(parsed_fields), 'Could not parse country from:\n%s', loc_str);
Country{ii} = parsed_fields.country;
parsed_fields = regexp(name_str, age_format, 'names');
if isempty(parsed_fields)
Age(ii) = -1;
else
Age(ii) = str2double(parsed_fields.age);
end
end
I added the assert statements to help debug what's going on if you encounter errors in parsing.
For example, you may also notice that I added an [\W]? to the country format. This is because while running it on your example data, I encountered one country that contained a period at the end of the line (i.e. it ended with "Brazil." instead of just "Brazil"). So now we're looking to match a non-alphanumeric character (\W) repeated zero or 1 times (?), and it's outside of the parentheses so it is not being captured as part of the "country" token.
I need to parse a string like this:
Apr 3, 2014 10:03:51 AM
to something like this:
YYYY-MM-DD HH:MM:SS
And also, this long long:
1396682344000
To the same kind of string:
YYYY-MM-DD HH:MM:SS
Is there any library or function to do that? I am not very comfortable writing C and I am not used to parsing this kind of string.
I tried strptime with this code:
observationDate_message is like the first string (Apr 3, 2014 10:45:01 AM)
strptime(observationDate_message, "%G-%m-%d %r", &result);
debugLog(DEB_INFO, "observationDateConverted: %d-%d-%d %d:%d:%d\n", result.tm_year, result.tm_mon, result.tm_mday, result.tm_hour, result.tm_min, result.tm_sec);
And what I get is:
0-52-0 36905376:32630:1497284224
Tutorial: http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/index.jsp?topic=%2Frtref%2Fstrpti.htm
Check if your system has the function strptime. It's part of POSIX and will do the parsing of the string for you. To convert in the opposite direction there's the C standard function strftime.
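As a rough sketch of that strptime/strftime round trip, assuming a POSIX system and the default "C" locale (so that "Apr" and "AM" are recognized): the function names reformat_date and reformat_epoch_ms are invented for this example, and the second one assumes the long long is milliseconds since the Unix epoch and that UTC output is acceptable.

```c
#define _XOPEN_SOURCE 700  /* exposes strptime() and gmtime_r() on glibc */
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Converts e.g. "Apr 3, 2014 10:03:51 AM" to "2014-04-03 10:03:51".
 * Returns 0 on success, -1 if the input does not match the format. */
int reformat_date(const char *in, char *out, size_t out_size)
{
    struct tm tm;
    memset(&tm, 0, sizeof tm);

    /* %b = abbreviated month name, %I with %p = 12-hour clock plus AM/PM */
    if (strptime(in, "%b %d, %Y %I:%M:%S %p", &tm) == NULL)
        return -1;
    if (strftime(out, out_size, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        return -1;
    return 0;
}

/* Converts milliseconds since the Unix epoch to the same layout, in UTC. */
int reformat_epoch_ms(long long ms, char *out, size_t out_size)
{
    time_t secs = (time_t)(ms / 1000);
    struct tm tm;

    if (gmtime_r(&secs, &tm) == NULL)
        return -1;
    if (strftime(out, out_size, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        return -1;
    return 0;
}
```

The format string has to describe the input text field by field; a pattern such as "%G-%m-%d %r" describes a different layout from "Apr 3, 2014 10:03:51 AM", so strptime would fail on it and leave the struct unfilled. Checking the return value against NULL catches that case. Swap gmtime_r for localtime_r if local time is wanted instead of UTC.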
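You can parse the string
Apr 3, 2014 10:03:51 AM
using sscanf() to get the year, month, day, and time information.
If str contains the string, and month and ampm are char arrays of at least 16 and 3 bytes,
sscanf(str, "%15s %d, %d %d:%d:%d %2s", month, &dd, &yy, &hh, &mm, &ss, ampm);
extracts the data from the string. Check that the return value is 7, and add 12 to hh when ampm is "PM" (treating 12 AM as hour 0). This is just an example; you can extract formatted data from a string however you want using sscanf().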
I have struct as:
struct stored
{
char *dates; // 12/May/2010, 10/Jun/2010 etc..
};
// const
struct stored structs[] = {{"12/May/2010"}, {"12/May/2011"},
{"21/May/2009"}, {"13/May/2011"},
{"10/May/2011"}, {"19/May/2011"}};
What I want to do is sort the array of struct stored by stored.dates.
qsort(structs, sizeof structs / sizeof structs[0], sizeof structs[0], sortdates); // sortdates function
I'm not quite sure what would be a good way to sort those dates. Compare them as C strings?
I would convert the dates to numbers using something like:
year * 10000 + month * 100 + day;
and then do a simple numeric comparison (and for month, you'll need to map from Jan to 1, Feb to 2, etc.).
If you're doing a lot of comparisons, you may want to cache the numeric equivalent in the structure.
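A minimal sketch of that idea, assuming every date has the dd/Mon/yyyy form from the question with English three-letter month names: date_key is a helper name invented here, and sortdates is written to match the comparator name used in the question's qsort call.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct stored {
    char *dates; /* "12/May/2010", "10/Jun/2010", ... */
};

/* Maps "dd/Mon/yyyy" to the number yyyymmdd, or returns -1 if the
 * string cannot be parsed or the month name is unknown. */
long date_key(const char *s)
{
    static const char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
    int day, year;
    char mon[4];

    if (sscanf(s, "%d/%3[A-Za-z]/%d", &day, mon, &year) != 3)
        return -1;
    for (int m = 0; m < 12; m++)
        if (strcmp(mon, months[m]) == 0)
            return year * 10000L + (m + 1) * 100L + day;
    return -1;
}

/* qsort comparator: orders struct stored elements by ascending date. */
int sortdates(const void *a, const void *b)
{
    long ka = date_key(((const struct stored *)a)->dates);
    long kb = date_key(((const struct stored *)b)->dates);
    return (ka > kb) - (ka < kb);
}
```

It would be called as qsort(structs, sizeof structs / sizeof structs[0], sizeof structs[0], sortdates). This version re-parses both strings on every comparison, which is fine for a handful of entries; for larger arrays, storing the key in the struct once avoids the repeated parsing.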
If you convert the dates to the format YYYYMMDD (as in 20100314), you can compare them as a string or as an integer (after conversion).
ISO 8601 formatted dates ("YYYYMMDD" or "YYYY-MM-DD" etc.) are trivially comparable as C strings. Your format is not - would changing the format of the date strings be an option?
PS: If you get rid of the "-", you could even store the date as plain 32bit integer. Depending on what your application does with those dates, that might be an additional bonus.
You can't compare these as strings, but you can compare substrings. Compare the years, and if they aren't equal you have your answer. Next compare the months, you'll need some kind of table to order the months by name. Finally if the months are the same, compare the days.