I'm trying to fully justify text read from files (so that both the left and right edges line up), and this is what I came up with. The input files have embedded commands, so in my pseudo output below I start justifying at the company's line and end at the telephone line. As you can see, it randomly joins two of the lines it reads together. Can someone please tell me why it's doing this? My input files definitely contain newline characters; I double-checked that they were entered.
Also, how do I check whether a line I've read will fit into my output array (40 chars)? If it doesn't, I want to move the overflowing word(s), or characters if that's easier, onto the next line. This isn't as important as my first question, but I'd really like the output to look as nice as possible, and I don't know how to limit a line and carry the overflow from a read line into the next output array.
Since it began to escape from AT&T's Bell Laboratories in
the early 1970's, the success of the UNIX
operating system has led to many different
versions: recipients of the (at that time free) UNIX system
code all began developing their own different
versions in their own different ways for use and sale.
Universities, research
institutes, government bodies and computer
companies all began using the powerful
UNIX system to develop many of the
technologies which today are part of a
UNIX system. Computer aided design,
manufacturing control systems,laboratorysimulations,even the Internet itself,
all began life with and because of UNIX
Today, without UNIX systems, the Internewould come to a screeching halt.
Most telephone calls could not be made,
electronic commerce would grind to a halt and
there would have never been "Jurassic Park"!
Below is my justify function; another function reads each line of the file with fgets and passes it in. The printf lines are just for debugging.
void justify(char strin[]){
int i = 0; //strin iterator
int j = 0; //out iterator
int endSpaces = LINE + 1 - strlen(strin);
int voids = countwords(strin) - 1;
printf("Voids: %d\n", voids);
printf("Input: %s", strin);
//No words in line, exit
if (voids <= 0)
return;
//How many to add between words
int addEvenly = endSpaces/voids;
int addUnevenly = endSpaces % voids;
printf("space to distribute: %d evenly: %d unevenly: %d\n", endSpaces, addEvenly, addUnevenly);
//Copy space left of array to output
while (strin[i] == ' '){
outLine[j++] = ' ';
i++;
}
//One word at a time
while (endSpaces > 0 || addUnevenly > 0){
//Copy letters into out
while (strin[i] != ' '){
outLine[j] = strin[i];
i++;
j++;
}
//Add the necessary spaces between words
if (addEvenly > 0){
for (int k = 0; k < addEvenly; k++){
outLine[j++] = ' ';
}
}
//Distribute to the left
if (addUnevenly > 0){
outLine[j++] = ' ';
endSpaces--;
addUnevenly--;
}
printf("Output: %s\n\n", outLine);
endSpaces = endSpaces - addEvenly;
//Finish copying rest of input to output when no more spaces to add
if (endSpaces == 0 && addUnevenly == 0){
while (strin[i] != '\0')
outLine[j++] = strin[i++];
printf("Output 2: %s\n", outLine);
}
}
fprintf(out, "%s", outLine);
}
On Sunday I wrote a function, justifyLine(), that justifies and indents a line you give it as input. It fills a buffer with the justified (formatted) text and reports where any leftover text begins; that remainder can be passed to justifyLine() again as input.
I then used the file below (text.txt) to test the function. The test showed that word wrapping between lines is needed as well, so I wrote the function formatLineByLine(). formatLineByLine() does not take empty lines into account.
Text file (text.txt): (I used the text from your question and tried to correct it, but I did not fix everything, so the input file still has some flaws!)
Since it began to escape from AT&T's
Bell Laboratories in the early 1970's,
the success of the UNIX operating system
has led to many different versions:
recipients of the (at that time free)
UNIX system code all began developing
their own different versions in their
own different ways for use and sale.
Universities, research institutes,
government bodies and computer companies
all began using the powerful UNIX system
to develop many of the technologies which
today are part of a UNIX system.
Computer aided design, manufacturing
control systems, laboratory simulations,
even the Internet itself, all began life
with and because of UNIX Today, without
UNIX systems, the Internet would come to a
screeching halt. Most telephone calls
could not be made, electronic commerce
would grind to a halt and there would
have never been "Jurassic Park"!
The output of the function formatLineByLine()
ABCDE12345678901234567890123456789012345
Since it began to escape from
AT&T's Bell Laboratories in the
early 1970's, the success of the
UNIX operating system has led to
many different versions: recipients
of the (at that time free) UNIX
system code all began developing
their own different versions in
their own different ways for use
and sale. Universities, research
institutes, government bodies and
computer companies all began using
the powerful UNIX system to develop
many of the technologies which
today are part of a UNIX system.
Computer aided design,
manufacturing control systems,
laboratory simulations, even the
Internet itself, all began life
with and because of UNIX Today,
without UNIX systems, the Internet
would come to a screeching halt.
Most telephone calls could not be
made, electronic commerce would
grind to a halt and there would
have never been "Jurassic Park"!
The next step was to justify paragraph by paragraph, so I wrote the function justifyParagraph(). The function formatInParagraphs() reads the file text.txt and prints it justified using justifyParagraph().
The output of the function formatInParagraphs()
ABCDE12345678901234567890123456789012345
Since it began to escape from
AT&T's Bell Laboratories in the
early 1970's, the success of the
UNIX operating system has led to
many different versions: recipients
of the (at that time free) UNIX
system code all began developing
their own different versions in
their own different ways for use
and sale.
Universities, research
institutes, government bodies and
computer companies all began using
the powerful UNIX system to develop
many of the technologies which
today are part of a UNIX system.
Computer aided design,
manufacturing control systems,
laboratory simulations, even the
Internet itself, all began life
with and because of UNIX Today,
without UNIX systems, the Internet
would come to a screeching halt.
Most telephone calls could not be
made, electronic commerce would
grind to a halt and there would
have never been "Jurassic Park"!
The function justifyLine() creates a justified buffer with indentation (parameter size_t indent); it can also put just a single space between words (parameter int nospacing passed as 1).
The function justifyParagraph() creates a justified buffer with line indentation (parameter size_t indent) and first-line indentation (parameter size_t indentstart). The formatted output is printed directly when NULL is passed as the output buffer (parameter char **outbuf passed as NULL). The last line the function generates can be left unjustified (parameter int notFmtLast passed as 1).
Both justification functions allocate memory with malloc() when the parameter char **outbuf points to a NULL pointer (*outbuf == NULL). In that case you have to free the buffer after use. If outbuf itself is passed as NULL to justifyParagraph(), the function prints the formatted output; if outbuf is passed as NULL to justifyLine(), the function returns an error.
The code is below. One issue with this code is that, in some cases, the length of the string should be computed with a function other than strlen(), since strlen() also counts repeated spaces between words. To avoid the problem, use these functions with lines that have a single space between words. The problem affects justifyParagraph() and formatLineByLine().
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int justifyLine(char *inbuf, char **outbuf, size_t linelen, char ** endptr, size_t indent, int nospacing);
int justifyParagraph(char *inbuf,char **outbuf,size_t linelen,size_t indentstart,size_t indent,int notFmtLast);
int formatLineByLine(FILE *f, size_t linelen,size_t indent, int notFrmtLast);
int formatInParagraphs(FILE *f, size_t linelen,size_t indentstart,size_t indent, int notFrmtLast);
int justifyParagraph(char *inbuf,char **outbuf,size_t linelen,size_t indentstart,size_t indent,int notFmtLast)
{
char *optr=NULL,*endp=NULL;
size_t len,s;
int retval=0,nf; //retval stays 0 if inbuf is an empty string
for(;;) { //Error control loop
if (inbuf==NULL) {
retval=0x10;break;
}
if (indent+indentstart>linelen) {
retval=0x20;break;
}
if (outbuf!=NULL) {
if (*outbuf==NULL) {
if ( (*outbuf=malloc(linelen+1))==NULL ){
retval=0x30;break;
}
}
optr=*outbuf;
}
endp=inbuf;
indent+=indentstart;
len=linelen-indent;
s=indentstart;nf=0;
while( *endp!=0) {
if (notFmtLast && strlen(endp)<linelen-indent)
nf=1;
if ( (retval=justifyLine(endp,&optr,linelen,&endp,
indent,nf)) ) {
retval|=0x40;break;
}
if (outbuf!=NULL) {
optr+=strlen(optr);
*optr++='\n';
*optr=0;
} else {
puts(optr);
}
indent-=s;
len+=s;
s=0;
}
break; //Close error ctrl loop!
}
if (outbuf==NULL && optr!=NULL)
free(optr);
return retval;
}
int justifyLine(char *inbuf,char **outbuf,size_t linelen, char ** endptr,size_t indent,int nospacing)
{
size_t textlen,tmp;
size_t spctoadd,spcodd,spcin;
size_t timetoodd=0; //only meaningful when spcodd!=0
size_t ibidx,obidx,k,wc;
char * endp;
char * outb=NULL;
int retval=0;
for(;;) { //Error control loop
endp=inbuf;
if (inbuf==NULL) {
retval=1;break;
}
if (indent>linelen) {
retval=2;break;
}
if (outbuf==NULL) {
retval=3;break;
}
if (*outbuf==NULL) {
if ( (*outbuf=malloc(linelen+1))==NULL ){
retval=4;break;
}
}
outb=*outbuf;
//Leave right spaces
while(*inbuf==' ')
inbuf++;
if (*inbuf==0) {
endp=inbuf;
*outb=0;
break; //exit from error loop without error!
}
linelen-=indent;
//Count words and the minimum number of characters
ibidx=0;
wc=0;textlen=0;k=1;endp=NULL;
while ( *(inbuf+ibidx)!=0 ) {
if (*(inbuf+ibidx)==' ') {
ibidx++;continue;
}
//There's a char!
k=ibidx; //last word start
tmp=textlen;
wc++;textlen++; //add the space after the words
//textlen<linelen because textlen contains also the space after the word
// while(textlen<=linelen && *(inbuf+ibidx)!=' ' && *(inbuf+ibidx) ) {
while(*(inbuf+ibidx)!=' ' && *(inbuf+ibidx) ) {
textlen++;ibidx++;
}
if (textlen>linelen+1) {
endp=inbuf+k;
textlen=tmp;
wc--;
break;
}
}
textlen=textlen-wc;
if (endp==NULL) {
endp=inbuf+ibidx;
}
if (textlen<2) {
*outb=0;
break; //exit from error loop without error!
}
//Prepare outbuf
memset(outb,' ',linelen+indent);
*(outb+linelen+indent)=0;
ibidx=0;
obidx=indent;
if (wc>1) {
if (!nospacing) {
//The odds are max in number == wc-2
spctoadd=linelen-textlen;
} else {
spctoadd=wc-1;
}
spcin=spctoadd/(wc-1);
spcodd=spctoadd % (wc-1);
if (spcodd)
timetoodd=(wc-1)/spcodd;
k=timetoodd;
while(spctoadd) {
while(*(inbuf+ibidx)!=' ') {
*(outb+obidx++)=*(inbuf+ibidx++);
}
obidx+=spcin;spctoadd-=spcin;
if (spcodd && !(--k)) {
k=timetoodd;
spcodd--;
spctoadd--;
obidx++;
}
while(*(inbuf+ ++ibidx)==' ');
}
}
while(*(outb+obidx) && *(inbuf+ibidx) && *(inbuf+ibidx)!=' ')
*(outb+obidx++)=*(inbuf+ibidx++);
//There're words longer then the line!!!
if (*(inbuf+ibidx) && *(inbuf+ibidx)!=' ')
endp=inbuf+ibidx;
break; //Terminate error ctrl loop.
}
if (endptr!=NULL)
*endptr=endp;
return retval;
}
int formatLineByLine(FILE *f, size_t linelen,size_t indent, int notFrmtLast)
{
char text[250],*app;
//justifyLine allocates memory for the line if the outbuf (optr) value is NULL
char * optr=NULL;
size_t j,k;
//print a ruler
for(j=0;j<indent;j++)
printf("%c",'A'+(char)j);
for(j=1;j<=linelen-indent;j++)
printf("%c",'0'+(char)(j%10));
printf("\n");
//starts printing
fseek(f,0,SEEK_SET);
j=0;
while(fgets(text+j,sizeof(text)-j,f)) {
if ( (app=strrchr(text+j,'\n')) ) {
*app=0;
}
k=strlen(text);
if (strlen(text)<linelen-indent) {
if (!*(text+k) && *(text+k-1)!=' ') {
*(text+k++)=' ';
*(text+k)=0;
}
j=k;
continue;
}
app=text;
do {
//justifyLine allocates memory for the line if the outbuf (optr) value is NULL
if ( justifyLine(app,&optr,linelen,&app,indent,0) ) {
if (optr!=NULL)
free(optr);
return 1;
}
printf("%s\n",optr);
j=(*app!=0)?strlen(app):0;
} while(j>linelen-indent);
if (j) {
strcpy(text,app);
*(text+j++)=' ';
*(text+j)=0;
}
}
if (*text!=0 && j) {
if ( justifyLine(text,&optr,linelen,NULL,indent,notFrmtLast) )
{
if (optr!=NULL)
free(optr);
return 2;
}
printf("%s\n",optr);
}
//justifyLine allocates memory for the line if the outbuf value is NULL
if (optr!=NULL)
free(optr);
return 0;
}
int formatInParagraphs(FILE *f, size_t linelen,size_t indentstart,size_t indent, int notFrmtLast)
{
char text[1024], *app;
//To uncomment when you use the commented justifyParagraph line.
//see below
//char *outbuf=NULL;
size_t j;
//print a ruler
for(j=0;j<indent;j++)
printf("%c",'A'+(char)j);
for(j=1;j<=linelen-indent;j++)
printf("%c",'0'+(char)(j%10));
printf("\n");
//starts printing
fseek(f,0,SEEK_SET);
j=0;
while(fgets(text+j,sizeof(text)-j,f)) { //only the space left after offset j
if ( (app=strrchr(text+j,'\n')) ) {
*app++=' ';*app=0;
}
if ( *(text+j)==' ' && !*(text+j+1) ) {
//The following commented line allocates memory creating a paragraph buffer!
//doesn't print the formatted line.
//justifyParagraph(text,&outbuf,linelen,indentstart,indent,notFrmtLast);
//This line directly print the buffer allocating and de-allocating
//only a line buffer. It prints the formatted line.
justifyParagraph(text,NULL,linelen,indentstart,indent,notFrmtLast);
j=0;
//To uncomment when you use the commented justifyParagraph line.
// printf("%s\n\n",outbuf);
puts("");
} else {
j+=strlen(text+j);
}
}
return 0;
}
int main(void)
{
FILE * file;
file=fopen("text.txt","r");
if (file==NULL) {
perror("text.txt");
return 1;
}
formatLineByLine(file,40,5,1);
puts("");
formatInParagraphs(file,40,5,5,1);
fclose(file);
return 0;
}
You were incredibly close – but you forgot one thing!
After copying a word into outLine, you insert the correct number of additional spaces, and continue with 'the next word'. However, at that point the input pointer i still is at the end of the previously copied word (so it points to the first space immediately after that). The test while (strin[i] != ' ') then immediately fails and you insert the additional spaces at that point again. This continues until you run out of spaces to add, and at the very end you add what was not processed, which is "the entire rest of the string".
The fix is simple: after copying your word into outLine, copy the original space(s) as well, so the i iterator gets updated to point to the next word.
//One word at a time
while (endSpaces > 0 || addUnevenly > 0)
{
//Copy letters into out
while (strin[i] != ' ')
{
outLine[j] = strin[i];
i++;
j++;
}
//Copy original spaces into out <-- FIX!
while (strin[i] == ' ')
{
outLine[j] = strin[i];
i++;
j++;
}
With this, your code works entirely as you intended. Output:
|Since it began to escape from AT&T's Bell Laboratories in|
|the early 1970's, the success of the UNIX|
|operating system has led to many different|
|versions: recipients of the (at that time free) UNIX system|
|code all began developing their own different|
|versions in their own different ways for use and sale.|
| Universities, research|
|institutes, government bodies and computer|
|companies all began using the powerful |
|UNIX system to develop many of the |
|technologies which today are part of a |
|UNIX system. Computer aided design, |
|manufacturing control systems,laboratorysimulations,even the Internet itself, |
|all began life with and because of UNIX |
|Today, without UNIX systems, the Internewould come to a screeching halt.|
|Most telephone calls could not be made,|
|electronic commerce would grind to a halt and|
|there would have never been "Jurassic Park"! |
Possible improvements
Justified lines should never begin with whitespace (your Copy space left of array to output part). Just increment the pointer there:
//Copy space left of array to output
while (strin[i] == ' ')
{
// outLine[j++] = ' ';
i++;
endSpaces++;
}
(and move the calculation for How many to add between words below this, because it changes endSpaces).
The same goes for spaces at the end. You can adjust endSpaces at the start
int l = strlen(strin);
while (l > 0 && strin[l-1] == ' ')
{
l--;
endSpaces++;
}
and suppress copying the trailing spaces into outLine at the bottom. (That needs some additional tinkering; I couldn't get it right the first time.)
It is much neater to ignore multiple spaces inside the input string as well, but that takes a bit more code.
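As a rough sketch of that "bit more code": one option is to normalize the input line before justifying it. The helper name squeeze_spaces is made up for this example, and it assumes the trailing newline from fgets has already been removed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the question's code): rewrites s in
   place so that leading and trailing spaces are dropped and every inner
   run of spaces becomes a single space. Returns the new length. */
size_t squeeze_spaces(char *s)
{
    size_t r = 0;   /* read position  */
    size_t w = 0;   /* write position */

    while (s[r] == ' ')             /* skip leading spaces */
        r++;
    while (s[r] != '\0') {
        if (s[r] != ' ') {
            s[w++] = s[r++];        /* copy a letter */
        } else {
            while (s[r] == ' ')     /* swallow the whole run */
                r++;
            if (s[r] != '\0')       /* not trailing: keep one space */
                s[w++] = ' ';
        }
    }
    s[w] = '\0';
    return w;
}
```

Calling this on strin at the top of justify() would let endSpaces be computed from the returned length, without separate loops for leading and trailing spaces.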
With these three implemented, you get a slightly neater output:
|Since it began to escape from AT&T's Bell Laboratories in|
|the early 1970's, the success of the UNIX|
|operating system has led to many different|
|versions: recipients of the (at that time free) UNIX system|
|code all began developing their own different|
|versions in their own different ways for use and sale.|
|Universities, research|
|institutes, government bodies and computer|
|companies all began using the powerful|
|UNIX system to develop many of the|
|technologies which today are part of a|
|UNIX system. Computer aided design,|
|manufacturing control systems,laboratorysimulations,even the Internet itself,|
|all began life with and because of UNIX|
|Today, without UNIX systems, the Internewould come to a screeching halt.|
|Most telephone calls could not be made,|
|electronic commerce would grind to a halt and|
|there would have never been "Jurassic Park"!|
A drawback of this one-line-at-a-time method is that it cannot easily be rewritten to gather input until a line overflows. To do so, you need:
1. a routine that skips all spaces and returns a pointer to the next word.
2. a routine that reads words until a line is 'overfull', that is, until the total length of the words plus (the number of words - 1) for spaces is larger than your LINE value. This uses routine #1 and outputs exactly one justified line.
You need to pass on the location and number of strings from your main to both these routines, and in both check if you are at the end of either a single input line or the entire input array.
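The two routines could look roughly like the sketch below. It makes several assumptions that are not in the question: all input has been gathered into one string, LINE is 40, newlines count as spaces, a word longer than LINE is simply split, and the final line is left ragged with single spaces. The names next_word, word_len and fill_line are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define LINE 40   /* assumed output width */

/* Routine 1: skip spaces (and newlines), return a pointer to the next word. */
const char *next_word(const char *p)
{
    while (*p == ' ' || *p == '\n')
        p++;
    return p;
}

/* Length of the word at p, capped at LINE so an over-long word gets split. */
size_t word_len(const char *p)
{
    size_t n = strcspn(p, " \n");
    return n > LINE ? LINE : n;
}

/* Routine 2: take as many words from *text as fit in LINE columns, write
   them to out (at least LINE + 1 bytes) as one justified line, and advance
   *text past them. Returns the number of words used, 0 at end of input. */
size_t fill_line(const char **text, char *out)
{
    const char *start = next_word(*text);
    const char *p = start;
    size_t letters = 0, words = 0;

    /* First pass: count the words that fit with one space per gap. */
    while (*p != '\0') {
        size_t n = word_len(p);
        if (words > 0 && letters + n + words > LINE)
            break;
        letters += n;
        words++;
        p = next_word(p + n);
    }

    size_t gaps = words > 0 ? words - 1 : 0;
    size_t pad = LINE - letters;      /* spaces to distribute */
    int last = (*p == '\0');          /* final line: do not stretch it */
    size_t j = 0;
    const char *q = start;

    /* Second pass: copy the words, widening the leftmost gaps first. */
    for (size_t w = 0; w < words; w++) {
        size_t n = word_len(q);
        memcpy(out + j, q, n);
        j += n;
        q = next_word(q + n);
        if (w < gaps) {
            size_t s = last ? 1 : pad / gaps + (w < pad % gaps ? 1 : 0);
            memset(out + j, ' ', s);
            j += s;
        }
    }
    out[j] = '\0';
    *text = p;
    return words;
}
```

A caller would loop with `while (fill_line(&cursor, out) > 0) puts(out);`, where cursor starts at the beginning of the gathered text.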
I've written this main, which contains two simple methods for centering text in a line. The first method only prints the text variable without modifying it; the second modifies the text variable and then prints it. (Here "method" does not mean function: the code contains two examples that you can easily turn into simple functions.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char text[81],fmt[10];
int linelen=80,tlen;
int spacetocenter=0;
printf("Insert text to center [max %zu char]:\n",sizeof(text)-1);
if (scanf("%80[^\n]",text)<1) {
perror("scanf");
return -1;
}
getchar(); //Removes the newline left in the input buffer
tlen=strlen(text);
spacetocenter=(linelen-tlen)/2;
if (spacetocenter<0)
spacetocenter=0;
//Method one (this doesn't modify text)
//This method directly prints the contents of text centered.
//----------------------------------------------------------
snprintf(fmt,sizeof(fmt),"%%%ds\n",spacetocenter+tlen);
//printf("%s\n",fmt); // prints the used format
printf(fmt,text);
//Method two (this modifies text)
//This method modifies the contents of the variable text
//----------------------------------------------------------
memmove(text+spacetocenter,text,tlen+1);
memset(text,' ',spacetocenter);
printf("%s\n",text);
return 0;
}
Note:
After the second method is applied, tlen no longer contains the length of text!
The program assumes a line of 80 chars; if you need shorter or longer lines, change the value of the variable linelen (and, for longer lines, the size of text as well).
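As a possible third variant (a sketch, not part of the program above): printf-style functions can take the field width from an argument via `*`, so no format string has to be built at run time. The helper name center_line and its parameters are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes text into out (outsize bytes), preceded by
   the spaces needed to center it in linelen columns. Returns the number
   of leading spaces. Output is truncated if out is too small. */
int center_line(char *out, size_t outsize, const char *text, int linelen)
{
    int tlen = (int)strlen(text);
    int pad = (linelen - tlen) / 2;

    if (pad < 0)
        pad = 0;
    /* "%*s" reads the width from the argument list; printing an empty
       string in a field of pad columns yields exactly pad spaces. */
    snprintf(out, outsize, "%*s%s", pad, "", text);
    return pad;
}
```

For example, `center_line(buf, sizeof buf, "abcd", 10)` leaves three spaces followed by abcd in buf.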
The code reads a file containing many URLs and passes each URL to evhttp_uri_parse to get its host and path. The problem is that evhttp_uri_parse fails to parse the URL and returns NULL. A possible reason is a stack overflow.
FILE *fp=fopen(argv[1],"rb");
if(NULL==fp)
{
printf("open url_file is error %d::%s\n",errno,strerror(errno));
return 0;
}
char url_buf[2048];
memset(url_buf,'\0',sizeof(url_buf));
fgets(url_buf,sizeof(url_buf),fp);
while(!feof(fp))
{
if(strlen(url_buf)>1)
{
printf("url_buf::%s",url_buf);
#if 1
struct evhttp_uri *ev_uri=NULL;
ev_uri=evhttp_uri_parse(url_buf);
if(ev_uri==NULL)
{
printf("parse uri error::%d,%s\n",errno,strerror(errno));
}
const char *host=evhttp_uri_get_host(ev_uri);
const char *path=evhttp_uri_get_path(ev_uri);
printf("query host::%s,path::%s\n",host,path);
evhttp_uri_free(ev_uri);
#endif
}
memset(url_buf,'\0',sizeof(url_buf));
fgets(url_buf,sizeof(url_buf),fp);
}
fclose(fp);
fgets(url_buf,sizeof(url_buf)+1,fp) should be changed to fgets(url_buf,sizeof(url_buf),fp).
fgets keeps the '\n' at the end of the string. Try removing it and see if it helps.
If a URL is for any reason longer than the buffer (2048 bytes), fgets will not return the complete URL; it returns only the first part (2047 characters), with a null character in the 2048th position.
That is why passing sizeof(url_buf)+1 is a bad idea: it leads to undefined behavior, since fgets may write to a location outside the bounds of the url_buf array.
So check whether the string you got contains a newline character and replace it with a null character. If the string has no newline character, you may want to keep reading until you get one, to obtain the complete URL.
This applies only if your URLs are delimited by newlines.
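A minimal sketch of that check, using only standard C. The helper name read_url_line is invented here, and it takes the simpler route of discarding and reporting an over-long line instead of reassembling it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (size bytes) and
   strips the line ending. Returns 1 for a complete line, 0 if the line did
   not fit (the rest of it is skipped), -1 at end of file or on error. */
int read_url_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    int had_eol = (strchr(buf, '\n') != NULL);
    buf[strcspn(buf, "\r\n")] = '\0';     /* cut at the first CR or LF */

    if (had_eol || feof(fp))
        return 1;                         /* whole line is in buf */

    /* No newline seen: the buffer filled up first. Skip what is left. */
    int c, dropped = 0;
    while ((c = getc(fp)) != EOF && c != '\n')
        dropped++;
    return dropped == 0;
}
```

In the question's loop this would replace the fgets/feof pair, with evhttp_uri_parse called only when the result is 1.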
I'm learning to use libcurl in C. To start, I'm using a randomized list of accession names to search for protein sequence files that may be found hosted here. These follow a set format where the first line is a variable length (but which contains no information I'm trying to query) then a series of capitalized letters with a new line every sixty (60) characters (what I want to pull down, but reformat to eighty (80) characters per line).
I have the call itself in a single function:
//finds and saves the fastas for each protein (assuming on exists)
void pullFasta (proteinEntry *entry, char matchType, FILE *outFile) {
//Local variables
URL_FILE *handle;
char buffer[2] = "", url[32] = "http://www.uniprot.org/uniprot/", sequence[2] = "";
//Build full URL
/*printf ("u:%s\nt:%s\n", url, entry->title); /*This line was used for debugging.*/
strcat (url, entry->title);
strcat (url, ".fasta");
//Open URL
/*printf ("u:%s\n", url); /*This line was used for debugging.*/
handle = url_fopen (url, "r");
//If there is data there
if (handle != NULL) {
//Skip the first line as it's got useless info
do {
url_fread(buffer, 1, 1, handle);
} while (buffer[0] != '\n');
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
//Print it
printFastaEntry (entry->title, sequence, matchType, outFile);
}
url_fclose (handle);
return;
}
With proteinEntry being defined as:
//Entry for fasta formatable data
typedef struct proteinEntry {
char title[7];
struct proteinEntry *next;
} proteinEntry;
The url_fopen, url_fclose, url_feof, url_fread, and URL_FILE code can be found here; the functions mimic the file functions for which they are named.
As you can see I've been doing some debugging with the URL generator (uniprot URLs follow the same format for different proteins), I got it working properly and can pull down the data from the site and save it to file in the proper format that I want. I set the read buffer to 1 because I wanted to get a program that was very simplistic but functional (if inelegant) before I start playing with things, so I would have a base to return to as I learned.
I've tested the url_<function> calls and they are giving no errors. So I added incremental printf calls after each line to identify exactly where the bus error is occurring and it is happening at return;.
My understanding of bus errors is that it's a memory access issue wherein I'm trying to get at memory that my program doesn't have control over. My confusion comes from the fact that this is happening at the return of a void function. There's nothing being read, written, or passed to trigger the memory error (as far as I understand it, at least).
Can anyone point me in the right direction to fix my mistake please?
EDIT: As @BLUEPIXY pointed out I had a potential url_fclose (NULL). As @deltheil pointed out I had sequence as a static array. This also made me notice I'm repeating my bad memory allocation for url, so I updated it and it now works. Thanks for your help!
If we look at e.g. http://www.uniprot.org/uniprot/Q6GZX1.fasta and skip the first line (as you do) we have:
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
which is a 60-character string.
When you try to read this sequence with:
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
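The problem is that sequence is not expandable and not large enough (it is a fixed-length array of size 2), so every strcat after the first character writes past the end of the array. Overwriting the stack this way can corrupt the function's return address, which would explain why the crash only shows up at return.
So make sure to choose a size large enough to hold any sequence, or implement the ability to expand it on the fly.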
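One way to expand on the fly, as a sketch using realloc. The strbuf type and strbuf_push are hypothetical names, not part of the question's code or of libcurl.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string. */
typedef struct {
    char  *data;   /* NUL-terminated contents, or NULL while empty */
    size_t len;    /* characters stored, not counting the NUL */
    size_t cap;    /* bytes currently allocated */
} strbuf;

/* Appends one character, doubling the allocation when it is full.
   Returns 0 on success, -1 if realloc fails (old contents stay valid). */
int strbuf_push(strbuf *sb, char c)
{
    if (sb->len + 2 > sb->cap) {              /* need room for c and NUL */
        size_t ncap = sb->cap ? sb->cap * 2 : 64;
        char *p = realloc(sb->data, ncap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = ncap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 0;
}
```

In pullFasta this would mean declaring `strbuf seq = {0};`, replacing the strcat call with `strbuf_push(&seq, buffer[0])`, passing seq.data to printFastaEntry, and calling `free(seq.data)` before returning.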
Goal:
Find out whether a string contains a blank line, whether it be '\n\n', '\r\n\r\n', '\r\n\n', or '\n\r\n'.
Issues:
I don't think my current regex for finding '\n\n' is right. This is my first time really using regex outside of simple use of * when removing files in command line.
Is it possible to check for all of these cases (listed above) in one regex? Or do I have to do 4 separate calls to compile_regex?
Code:
int checkForBlankLine(char *reader) {
regex_t r;
compile_regex(&r, "*\n\n");
match_regex(&r, reader);
return 0;
}
void compile_regex(regex_t *r, char *matchText) {
int status;
regcomp(r, matchText, 0);
}
int match_regex(regex_t *r, char *reader) {
regmatch_t match[1];
int nomatch = regexec(r, reader, 1, match, 0);
if (nomatch) {
printf("No matches.\n");
} else {
printf("MATCH!\n");
}
return 0;
}
Notes:
I only need to worry about finding one blank line, that's why my regmatch_t match[1] is only one item long
reader is the char array containing the text I am checking for a blank line.
I have seen other examples and tried to base the code off of those examples, but I still seem to be missing something.
Thank you kindly for the help/advice.
If anything needs to be clarified please let me know.
It seems that you have to compile the regex as extended:
regcomp(&re, "\r?\n\r?\n", REG_EXTENDED);
The first atom, \r? is probably unnecessary, because it doesn't add to the blank-line condition if you don't capture the result.
In the above, blank line really means empty line. If you want blank line to mean a line that has no characters except for white space, you can use:
regcomp(&re, "\r?\n[ \t]*\r?\n", REG_EXTENDED);
(I don't think you can use the space character pattern, \s here instead of [ \t], because that would include carriage return and new-line.)
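Put together, a check along these lines might look like the sketch below. The wrapper name has_blank_line is made up; the regcomp, regexec and regfree calls and the REG_EXTENDED and REG_NOSUB flags are standard POSIX.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper: returns 1 if text contains a blank line (empty or
   only spaces/tabs), 0 if it does not, -1 if the pattern fails to compile. */
int has_blank_line(const char *text)
{
    regex_t re;
    int rc;

    /* The C compiler turns \r, \n and \t into the actual characters before
       regcomp sees the pattern. REG_NOSUB: only a yes/no answer is needed. */
    if (regcomp(&re, "\r?\n[ \t]*\r?\n", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Because regexec looks for a match anywhere in the string, no leading or trailing .* is needed, and this one pattern covers all four line-ending combinations from the question.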
As others have already hinted at, the "simple use of * in the command line" is not a regular expression. This wildcard matching is called file globbing and has different semantics.
Check what the * in a regex means. It's not like the "anything" wildcard on the command line. The * means that the previous component can appear any number of times. The wildcard in a regex is the dot (.). So if you want to match anything, you can use .*, which means any character, any number of times.
So in your case you can do .*\n\n.* which would match anything that has \n\n.
Finally, you can use | (or) in a regex and ( ) to group things; with POSIX regcomp this needs REG_EXTENDED. So you can write something like .*(\n\n|\r\n\r\n).* and that would match anything that has a \n\n or a \r\n\r\n.
Hope that helps.
Rather than looking for only \r or \n, look for not \r or \n?
Your regex would simply be
'[^\r\n]'
and a match result of false indicates a blank line to your specification.