Parsing a document in C

I need to parse a document in C. I was about to use the strtok function, but I don't know whether it's the best method or whether a simple hand-written token scan (searching for \n, spaces, etc.) is enough.
The structure of each line of the document is: element \n element "x".
thanks :-)

A token system is fine; strtok is just one implementation of that. However, you're better off using strtok_r, which keeps its position in a pointer you pass in instead of in hidden internal state outside the control of your program.
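To make the strtok_r suggestion concrete, here is a minimal sketch. The helper name split_fields and the delimiter set (space, tab, newline) are assumptions for illustration, since the question does not pin down the exact line format. Note that strtok_r is POSIX, not ISO C.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r, which is POSIX rather than ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces, tabs and newlines,
 * stores up to `max` token pointers in `out`, and returns how many it stored.
 * The buffer is modified: the delimiter after each token becomes '\0'. */
size_t split_fields(char *line, char **out, size_t max)
{
    char *save = NULL;   /* tokenizer state lives here, not inside the library */
    size_t n = 0;

    for (char *tok = strtok_r(line, " \t\n", &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, " \t\n", &save)) {
        out[n++] = tok;
    }
    return n;
}
```

Because the state is in the caller's save pointer, two strings can be tokenized at the same time (for example lines in an outer loop and fields in an inner one) without the loops interfering with each other.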

I don't remember the details, but I saw in several sources that strtok was an unsafe piece of work. You'd be better off rolling your own, if you ask me.

Related

Elegant way of obtaining the position of a certain char in txt File in C

I'm practicing C right now. I want to read in a txt file and find the line, and the position within that line, of the char 'X'. Is there a more elegant way than iterating over fgets for the line number and then iterating over getchar?
That is pretty much the only way to do it.
Of course, you can practice using any of the file-access and string-manipulation functions, to understand their beauty or their limitations. But essentially, that is it - read characters one by one. Whether you do it "by hand", or by calling a specialized function (which will do that for you).
You might understand it better if you extend the problem to finding all occurrences of some character.
Note: many times, computers do things exactly like humans do, but faster. Why? Because the programmer implements in software the same algorithm that is in their mind. From there, optimizations can be done.
Just think about it: How would you find the first "X" in a (random) book?
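For what it's worth, the "read characters one by one" approach can be sketched as below. The function name find_char and the 1-based line/column convention are assumptions made for this example, not something from the question.

```c
#include <stdio.h>

/* Hypothetical helper: scans `fp` from its current position and reports the
 * 1-based line and column of the first occurrence of `target`.
 * Returns 1 if found, 0 if the stream ended first. */
int find_char(FILE *fp, int target, long *line, long *col)
{
    long ln = 1, cl = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        cl++;
        if (c == target) {
            *line = ln;
            *col = cl;
            return 1;
        }
        if (c == '\n') {   /* the next character starts a new line */
            ln++;
            cl = 0;
        }
    }
    return 0;
}
```

Calling it again on the same stream continues after the previous hit, which is one way to extend it to all occurrences; the counters would then have to be carried over between calls.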

Check if a string has only whitespace characters in C

I am implementing a shell in C11, and I want to check if the input has the correct syntax before doing a system call to execute the command. One of the possible inputs that I want to guard against is a string made up of only white-space characters. What is an efficient way to check if a string contains only white spaces, tabs or any other white-space characters?
The solution must be in C11, preferably using standard libraries. The string is read from the command line using readline() from readline.h, and it is saved in a char array (char[]). So far, the only solution that I've thought of is to loop over the array and check each individual char with isspace(). Is there a more efficient way?
So far, the only solution that I've thought of is to loop over the array, and check each individual char with isspace().
That sounds about right!
Is there a more efficient way?
Not really. You need to check each character if you want to be sure only space is present. There could be some trick involving bitmasks to detect non-space characters in a faster way (like strlen() does to find a NUL terminator), but I would definitely not advise it.
You could make use of strspn() or strcspn() checking the returned value, but that would surely be slower since those functions are meant to work on arbitrary accept/reject strings and need to build lookup tables first, while isspace() is optimized for its purpose using a pre-built lookup table, and will most probably also get inlined by the compiler using proper optimization flags. Other than this, vectorization of the code seems like the only way to speed things up further. Compile with -O3 -march=native -ftree-vectorize (see also this post) and run some benchmarks.
"loop over the array, and check each individual char with isspace()" --> Yes go with that.
The time to do that is trivial compared to readline().
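A minimal sketch of that loop, assuming a NUL-terminated string; the function name is_all_whitespace is made up for this example. The cast to unsigned char matters because passing a negative char value to isspace() is undefined behaviour.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if `s` is empty or contains only
 * characters for which isspace() is true in the current locale. */
bool is_all_whitespace(const char *s)
{
    for (; *s != '\0'; s++) {
        if (!isspace((unsigned char)*s))
            return false;   /* stop at the first non-space character */
    }
    return true;
}
```

Whether an empty string should count as "only whitespace" is a design choice; for a shell prompt, treating it the same as a blank line seems reasonable.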
I'm going to provide an alternative solution to your problem: use strtok. It splits a string into substrings based on a specific set of ignored delimiters. With an empty string, you'd just get no tokens at all.
If you need more complicated matching than that for your shell (e.g. to handle quoted arguments), you're best off writing a small tokenizer/lexer. The strtok method is basically to look for any of the delimiters you've specified, temporarily replace them with \0, return the substring up to that point, put the old character back, and repeat until it reaches the end of the string.
Edit:
As the busybee points out in the comment below, strtok does not put back the character that it replaces with \0. The above paragraph was worded poorly, but my intent was to explain how to implement your own simple tokenizer/lexer if you needed to, not to explain exactly how strtok works down to the smallest detail.
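If you do want to roll your own, one possible starting point is a tokenizer that never writes to the input at all and reports each token as a pointer plus a length. This is only a sketch, built on the standard strspn() and strcspn(); the name next_token and its interface are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the next token in *cursor and
 * stores its length in *len, or returns NULL when only delimiters remain.
 * The input is never modified; *cursor is advanced past the token. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *start = *cursor + strspn(*cursor, delims);  /* skip leading delimiters */

    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }
    *len = strcspn(start, delims);   /* token runs up to the next delimiter or NUL */
    *cursor = start + *len;
    return start;
}
```

A string made up only of whitespace yields NULL on the first call, so the same function also covers the blank-input check. Quoted arguments would need extra state on top of this.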

Extracting the domain extension of a URL stored in a string using scanf()

I am writing code that takes a URL as a string literal, then looks up the domain extension of the URL in an array and returns the index if it finds a match, or -1 if it does not.
For example, an input would be www.stackoverflow.com, in this case, I'd need to extract only the com part. In case of www.google.com.tr, I'd need only com again, ignoring the .tr part.
I can think of basically writing a function that'll do that just fine but I'm wondering if it is possible to do it using scanf() itself?
Using scanf here is really overkill, but you can do something like this:
char a[MAXLEN],b[MAXLEN],c[MAXLEN];
scanf("%[^.].%[^.].%[^. \n]",a,b,c);
printf("Desired part is = %s\n",c);
To be sure that formatting is correct you can check whether this scanf call is successful or not. For example:
if (3 != scanf("%[^.].%[^.].%[^. \n]", a, b, c)) {
    fprintf(stderr, "Format must be at least sth.something.sth\n");
    exit(EXIT_FAILURE);
}
Another way to achieve the same thing is to use fgets to read the whole line and then split it with strtok using "." as the delimiter, which gives you each part in turn. With fgets you can easily support different kinds of rules, and error handling is simpler than packing everything into a scanf format string.
The scanf solution above only considers the first three parts of the URL; the rest is not parsed. That is rarely what you want in practice. Most of the time you have to process all the parts of the URL, and you don't know in advance how many there will be. In that case you are better off using fgets/strtok as mentioned above.
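Here is a rough sketch of the strtok variant, working on a string that has already been read (for example with fgets). The function name find_extension, the 256-byte working buffer and the "first matching label wins" rule are all assumptions for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits a copy of `url` on '.' and returns the index in
 * `exts` of the first label that matches an entry, or -1 if none does. */
int find_extension(const char *url, const char *const exts[], size_t n_exts)
{
    char buf[256];                     /* assumed upper bound on URL length */

    if (strlen(url) >= sizeof buf)
        return -1;
    strcpy(buf, url);                  /* strtok modifies its input, so work on a copy */

    /* '\n' is a delimiter too, so a line straight from fgets works unchanged */
    for (char *label = strtok(buf, ".\n"); label != NULL; label = strtok(NULL, ".\n")) {
        for (size_t i = 0; i < n_exts; i++) {
            if (strcmp(label, exts[i]) == 0)
                return (int)i;
        }
    }
    return -1;
}
```

With this rule www.google.com.tr matches on com and the trailing tr is ignored. A host name whose earlier labels happen to equal an extension (say com.example.org) would match on the first one, so you may want to skip the leading labels depending on your input.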

Scan all characters entered until see a tab char

I want to autocomplete in a command line application, a bit like in bash, where you can use the tab key to complete the command. But getchar() seems to wait until a newline char is received before it starts reading any characters.
scanf seems to work the same way.
Is there any way I can scan characters one at a time no matter if they are whitespace or control characters?
I want to be able to read char by char as entered building up a command and then as soon as tab char received I will attempt to lookup how to complete and print full command in my application.
Don't reinvent the wheel.
Use what other projects use to build command lines: Libraries that implement that job for you.
Many CLI prompts depend on GNU readline, which I think is fine, if a bit cluttered and heavy. Also, it's GPL, so depends on whether you like that or not.
I'd look into linenoise. It's very lightweight. If you decide you really want to implement the user interface yourself rather than include one or two files, fine, but first look at how concise that reference implementation is. Caveat: I haven't used it myself so far. The API is pretty simple, though, as can be seen in their example.
A popular alternative is libedit/editline.
You need the GNU Readline Library. Use the rl_bind_key() function to bind a filename-completion function that runs whenever the user presses a key ('\t', in your case).
You are in for a world of hurt if you try to roll your own readline style function.
(If you are on Windows.)

How can I parse a string using C

Given below is my string
char test[1000] =
    "$GPGSA,A,3,14,20,22,25,31,32,,,,,,,2.4,1.4,1.9*3A\n"
    "$GPGSV,4,1,16,31,76,060,35,14,28,070,34,20,32,309,32,32,61,309,32*72\n"
    "$GPGSV,4,2,16,25,21,053,29,24,37,258,29,23,14,277,27,12,,,21*44\n"
    "$GPGSV,4,3,16,22,13,133,20,11,20,272,,16,11,161,,30,,,*4F\n"
    "$GPGSV,4,4,16,29,,,,28,,,,27,,,,26,,,*7E\n"
    "$GPGGA,150427.8,4001.022852,N,10505.269674,W,1,06,1.4,1559.6,M,-21.0,M,,*53\n"
    "$PQXFI,150427.8,4001.022852,N,10505.269674,W,1559.6,35.12,25.46,2.05*4A\n"
    "$GPVTG,nan,T,nan,M,0.0,N,0.0,K,A*23\n"
    "$GPRMC,150427.8,A,4001.022852,N,10505.269674,W,0.0,,280611,,,A*50";
I want to get string
"$GPGGA,150427.8,4001.022852,N,10505.269674,W,1,06,1.4,1559.6,M,-21.0,M,,*53\n"
from above big string using C Language.
Please help me out.
You say which line you want, but you didn't say why. If you say what it is about this line that makes it the line you are after, then I could comment on how you'd find it.
But basically, you'll probably want to separate the string into lines. You can use strtok() to break on \n. You can then examine the lines, one at a time.
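As a sketch of that idea, assuming the line you are after is identified by how it starts (for example "$GPGGA"); the function name find_line is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walks `text` line by line with strtok and returns the
 * first line that starts with `prefix`, or NULL if there is none.
 * `text` is modified: every '\n' that strtok reaches is overwritten with '\0'. */
char *find_line(char *text, const char *prefix)
{
    size_t plen = strlen(prefix);

    for (char *line = strtok(text, "\n"); line != NULL; line = strtok(NULL, "\n")) {
        if (strncmp(line, prefix, plen) == 0)
            return line;
    }
    return NULL;
}
```

The returned line no longer carries its trailing \n, because strtok replaced it with a terminator. If you need the newline as well, append it afterwards or use a non-destructive search instead.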
This looks like GPS data to me and is used (and parsed) in many applications.
http://mbed.org/users/todotani/notebook/gps-nmea-parser/
http://www.edaboard.com/thread204021.html
You might be able to save yourself some time by re-using some other open source parsers.
The strstr() function, part of the standard C library, can be used to find a substring within a string.
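For example, a sketch along these lines finds the tag with strstr() and copies everything up to and including the next \n into a separate buffer, leaving the original string untouched. The function name extract_sentence and its return convention are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text from the first occurrence of `tag`
 * up to and including the next '\n' (or to the end of `text` if there is no
 * newline) into `out`. Returns the number of characters copied, or 0 if the
 * tag is not found or `out` is too small. */
size_t extract_sentence(const char *text, const char *tag, char *out, size_t out_size)
{
    const char *start = strstr(text, tag);
    if (start == NULL)
        return 0;

    const char *nl = strchr(start, '\n');
    size_t len = nl ? (size_t)(nl - start) + 1 : strlen(start);

    if (len + 1 > out_size)          /* need room for the terminating '\0' */
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return len;
}
```

Keep in mind that strstr() matches anywhere in the string, not only at the start of a line. That is fine for a tag like "$GPGGA", which starts with '$', but a tag that could also appear inside a field would need an additional check.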

Resources