locating all substring instances in a given file - c

I'm working on a function that finds all images referenced in an HTML file. I am trying to find these substrings within the file: ".bmp", ".gif", ".jpg" and ".png". I also want to find their roots, e.g. /images/foo/, and then use the two substrings to make a new string: /images/foo/bar.jpg. I know how I am going to concatenate the strings, but I have no idea how to locate the actual substrings. I feel quite overwhelmed right now and would really appreciate some help.

The "right" answer to this question ought to urge you to use tools that were built for the job. Smart people write stuff like libxml for a reason. Re-inventing the wheel will only make things more difficult. With libxml, for example, you can easily traverse an XML tree like so:
for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
    if (cur_node->type == XML_ELEMENT_NODE) {
        printf("node type: Element, name: %s\n", cur_node->name);
    }
}
The "wrong" answer is to come up with some "trick" for finding the beginning of an image string, either by looking for the beginning of the image tag (<img) or a quote " as Doug mentions in the comments.
You'll notice that I put right and wrong in quotations. I'm somewhat of a purist and would strongly suggest an XML-oriented solution because it's wholly generalizable and easily extendible (tomorrow you may say: oh I also need the anchor text). A DOM parser makes every subsequent problem a breeze to solve.
But if you're working on a proof of concept or prototype (or maybe even homework) where everything's well-formed and you don't release your code in the wild, the "wrong" approach may be sufficient.
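For the quick-and-dirty route, a minimal sketch in C might look like the following. It assumes the whole file is already loaded into a NUL-terminated buffer and that every image path sits inside single or double quotes. The names find_image_paths and IMG_PATH_MAX are made up for this illustration, and nothing here understands HTML beyond spotting quotes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define IMG_PATH_MAX 256   /* hypothetical limit on the length of one path */

/* Case-insensitive check that the len-byte span s ends with ext
 * (ext must be given in lower case). */
static int ends_with_ext(const char *s, size_t len, const char *ext)
{
    size_t n = strlen(ext);
    if (len < n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)s[len - n + i]) != ext[i])
            return 0;
    }
    return 1;
}

/* Collect every quoted string in html that ends in an image extension.
 * Copies at most max_out paths into out and returns how many were found. */
size_t find_image_paths(const char *html, char out[][IMG_PATH_MAX],
                        size_t max_out)
{
    static const char *exts[] = { ".bmp", ".gif", ".jpg", ".png" };
    size_t count = 0;
    const char *p = html;

    assert(html != NULL && out != NULL);

    while (count < max_out && (p = strpbrk(p, "\"'")) != NULL) {
        char quote = *p;                       /* opening quote character */
        const char *start = p + 1;
        const char *end = strchr(start, quote);
        if (end == NULL)
            break;                             /* unterminated quote: stop */

        size_t len = (size_t)(end - start);
        for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++) {
            if (len < IMG_PATH_MAX && ends_with_ext(start, len, exts[i])) {
                memcpy(out[count], start, len);
                out[count][len] = '\0';
                count++;
                break;
            }
        }
        p = end + 1;                           /* continue after closing quote */
    }
    return count;
}
```

Each result holds the complete quoted path, so the root and the file name can be separated afterwards with strrchr(path, '/') if they are needed individually. Malformed or unquoted attributes will be missed, which is exactly the weakness a real parser avoids.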

Related

regex: extract text between two string with text that match a specific word

I'm refactoring a very big C project and I need to find the parts of the code written by a specific programmer.
Fortunately, everyone involved in this project marks their own code with their email address in standard C-style comments.
OK, someone could say that this could be achieved easily with a grep from the command line, but that is not my goal: I may need to remove these comments or replace them with other text, so a regex is the only solution.
Ex.
/*********************************************
*
* ... some text ....
*
* author: user#domain.com
*
*********************************************/
From this post I found the right expression to search for C style comments which is:
\/\*(\*(?!\/)|[^*])*\*\/
But that is not enough! I only need the comments that contain a specific email address. Fortunately, the domain of the email address I'm looking for seems to be unique in the whole project, so this could make it simpler.
I think I must use some positive lookahead assertion. I've tried this one:
(\/\*)(\*(?!\/)|[^*](?=.*domain.com))*(\*\/)
but it doesn't run!
Any advice?
You can use
\/\*[^*]*(?:\*(?!\/)[^*]*)*#domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
Pattern details:
\/\* - comment start
[^*]*(?:\*(?!\/)[^*]*)* - any text that does not contain */
#domain\.com - literal #domain.com
[^*]*(?:\*(?!\/)[^*]*)* - any text that does not contain */
\*\/ - comment end
A faster alternative (as the first part will be looking for everything but the comment end and the word #domain):
\/\*[^*#]*(?:\*(?!\/)[^*#]*|#(?!domain\.com)[^*#]*)*#domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
In these patterns, I used an unrolled construct for (\*(?!\/)|[^*])*: [^*]*(?:\*(?!\/)[^*]*)*. Unrolling helps construct more efficient patterns.
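If the tool doing the replacement is itself written in C, the same selection logic can be sketched without a regex engine. This is a swapped-in technique, not the regex above: it walks the buffer with strstr, pairs each /* with the next */, and keeps the comment only when the marker occurs in between. The names comment_span, span_contains and find_marked_comments are hypothetical, and, like the patterns above, the sketch does not know about string literals or // comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One matched comment: pointer to its "/" and its length including the closing delimiter. */
struct comment_span {
    const char *start;
    size_t len;
};

/* Returns 1 if needle occurs inside the len-byte span s. */
static int span_contains(const char *s, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0)
        return 1;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(s + i, needle, n) == 0)
            return 1;
    }
    return 0;
}

/* Record every C-style comment in src that contains marker.
 * Stores at most max_out spans in out and returns how many were stored. */
size_t find_marked_comments(const char *src, const char *marker,
                            struct comment_span *out, size_t max_out)
{
    size_t count = 0;
    const char *p = src;

    assert(src != NULL && marker != NULL && out != NULL);

    while (count < max_out && (p = strstr(p, "/*")) != NULL) {
        const char *end = strstr(p + 2, "*/");
        if (end == NULL)
            break;                      /* unterminated comment: stop */
        end += 2;                       /* include the closing delimiter */

        if (span_contains(p, (size_t)(end - p), marker)) {
            out[count].start = p;
            out[count].len = (size_t)(end - p);
            count++;
        }
        p = end;                        /* resume after this comment */
    }
    return count;
}
```

Each span gives the exact byte range to delete or overwrite, which covers the "remove or substitute" requirement from the question.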

Generate MEL to rebuild a model in Maya

I made a 3D tree in Maya and need MEL code to run in a for loop and generate as many trees as I want. Is there any way to convert a model made in Maya into MEL code that would rebuild the tree?
I can't just duplicate it because the script needs to generate the tree from scratch. Unfortunately, I cleared my history because it was messy, so I am looking for a way to generate the MEL code given just the geometry.
It is slightly unfortunate that you deleted the history. If you had not, you could have just built a script out of the history tree. But all is not lost. I will still explain how to do that, in case you do remake the history at some point.
The complexity of your history is never really an issue, at least when modeling. I would advise not deleting any history until you're sure it's of no use. Having a 1000-node history does not slow you down in any way, since most of it is inactive. It's only a problem once you enter animation, and even then it's not necessarily a problem at all. Since history deletion is a bit destructive, it's a good idea to save the scene when you are deleting big areas of history. So, as a rule of thumb, do not delete history or use the freeze transformation operation unless it's really necessary (and it may be that you never actually need to).
Another side remark: copying stuff is still valid, unless, as in this case, there is an artificial limitation. The only time this limitation is valid is if this is homework (in which case you may want to demonstrate that you're willing to follow instructions). Doing a copy does not detract from the end result in any way. Copying is MEL too (and all Maya ASCII files are MEL of sorts).
Method 1: (when history is cleared)
You can reduce the problem to a normal rigging operation. What you do is you rig the tree that you built with bones, clusters and possibly blend shapes. Then just randomize the channels of your rig on each copy operation. The neat thing here is that the script can be easily extended with different base trees, or just about anything, just by making new separate rigs.
The bonus here is that you could now also easily animate the tree being blown by the wind etc.
This might feel like something outside the scope of the assignment. However, a good MEL programmer does not really separate tasks. Using a node to do something is just as valid, if not more valid, than writing everything in code. Something like this demonstrates a really thorough understanding of Maya use and Maya programming. On the other hand, not all users are as enlightened (most are not).
Everything in Maya is, or at least should be, a rig. (The corollary is that your MEL should strive to build a rig, or you should use the API to make nodes.)
Method 2: (when history is cleared)
Chunk your result into pieces. Then randomize the chunk positions with an L-system-like rule-based generator. It may seem counterintuitive to use an L-system with hand-modeled pieces when the entire thing could be generated with an L-system.
But chunking allows a very simple L-system to be built. The end result also retains the artistic integrity of the original, in ways that might be much more pleasing.
A slightly different version of this and method 3 is to make 2-3 overlapping sets of branches in the model and then randomly delete branches for variation.
Method 3: (when history is cleared)
Cheat. Well, I would actually argue that in CG there's no such thing as cheating. Just copy the same tree about and alter its rotation and scale. Then randomize new shaders for your model (swap in different leaf textures, different colors, etc.). Combined with even a slight amount of rigging, this approach can totally hide the fact that there is just one tree. Copies don't have to be all that exact; they might be shaped with a lattice or something. This is actually pretty efficient, and most people wouldn't notice.
Method 4: (when history is not cleared)
When you model something manually, Maya records this quite efficiently. The history can be turned into a script with little or no effort. This is the best-kept secret: if you save this kind of scene as Maya ASCII, the Maya ASCII file is (almost) just MEL, and you can repeat the build by adding variables to the stream to instrument it into MEL. Beware, though, that not all tools build meaningful history, so point tweaks should be done on clusters instead of as manual point tweaking that ends up in the shape's tweak array.
It is also possible to build the MEL automatically with a MEL script that checks for connections and values in nodes to reproduce the thing in code. The good thing is that you can readily introduce the instrumentation. Here's a really old version of the code as a giveaway (you need to handle the shaders and inputComponents, for example):
/* jooConvertNodeNetworkToMelLadder.mel 0.0
Authors: Janne 'Joojaa' Ojala
Testing: Has known bugs, code deprecated, no bug reporting
Licence: Creative Commons Attribution-ShareAlike 3.0 Unported
About:
A deprecated early version of code generation it has some bugs and
is nowhere near perfect but you can have this code for the heck of
it. Main thing missing is that it does not check for selection
sets for nodes so you need to make those manually.
Install Instructions:
copy jooConvertNodeNetworkToMelLadder.mel to your maya script
directory
Usage:
Select node and run jooConvertNodeNetworkToMelLadder from mel
commandline.
Deprecated at 31.12.2006
*/
proc string generateAttrib(string $node, string $createNode,
                           string $attrib, string $varName){
    string $return="";
    // Emit a setAttr line only when the value differs from a fresh node's default.
    if (getAttr($node + "." + $attrib) !=
        getAttr($createNode + "." + $attrib))
        $return = (" setAttr (" +
                   $varName + "+\"." + $attrib+"\") "+
                   getAttr($node + "." + $attrib) + ";\n");
    return $return;
}
proc string doOneNode(string $nodeOrig,string $connections[]){
    // Connections come back in pairs: this node's plug, then the connected plug.
    string $hist[]=`listConnections -c 1 -p 1 $nodeOrig`;
    string $return="";
    $type=`nodeType $nodeOrig`;
    $varName=("$"+$nodeOrig);
    // Temporary node of the same type, used only to read default values.
    $createNode=`createNode $type`;
    $return=(" "+$varName+"=`createNode -n "+$nodeOrig+" "+$type+"`;\n");
    if ($type != "mesh") {
        for ($attrib in `listAttr -settable -multi -w -scalar $createNode`){
            $return += generateAttrib($nodeOrig, $createNode,
                                      $attrib, $varName);
        }
    }
    delete $createNode;
    for ($i=0;$i< size($hist);$i=$i+2){
        // Only plugs that drive something produce a connectAttr line.
        if (size(`connectionInfo -dfs $hist[$i]`)){
            string $node,$port,$type,$nodeOrig,$portOrig;
            $node=`match "[^.]*" $hist[$i+1]`;
            $port=`match "[.].*$" $hist[$i+1]`;
            $nodeOrig=`match "[^.]*" $hist[$i]`;
            $portOrig=`match "[.].*$" $hist[$i]`;
            $connections[size($connections)]=
                (" connectAttr -f ($" +
                 $nodeOrig + "+\"" + $portOrig + "\") ($" +
                 $node+"+\""+$port+"\");");
        }
    }
    return $return;
}
global proc jooConvertNodeNetworkToMelLadder()
{
    print "\n// jooConvertNodeNetworkToMelLadder result:\n{\n";
    string $a[]={};
    // Print the node creation code first, then all collected connections.
    for ($node in `listHistory`)
        print(`doOneNode $node $a`);
    print $a;
    print "}\n";
}
But this is all I have time for, hope this helps.

Understanding parsing SVG file format

First off, gist here
Map.svg in the gist is the original Map I'm working with, got it off wikimedia commons.
Now, there is a land mass off the eastern coast of Texas in that original SVG. I removed it using Inkscape, and it rewrote the path in a strange new way. The diff is included in the gist.
This new way of writing the path blows up my parser logic, and I'm trying to understand what happened. I'm hoping someone here knows more about the SVG file format than I do. I will admit I have not read through the entire SVG standard spec, however the parts of it I did read didn't mention anything about missing commands or relative coordinates. Then again, I may have been looking at the wrong spec, not sure.
The way I understood it, SVG path data was very straight forward, something like this:
(M,L,C)[point{n}] .... [Z] then repeat ad-nauseum
Now the part I'm trying to understand is that the new path Inkscape has written out uses what seem like relative coordinates, without commands like L, or with L being implied somehow. My gut is telling me that what has happened here is obvious to someone. For what it's worth, I'm doing my parsing in C.
If you're parsing SVG, why not look at the SVG specification?
Start a new sub-path at the given (x,y) coordinate. M (uppercase) indicates that absolute coordinates will follow; m (lowercase) indicates that relative coordinates will follow. If a moveto is followed by multiple pairs of coordinates, the subsequent pairs are treated as implicit lineto commands.
From: http://www.w3.org/TR/2011/REC-SVG11-20110816/paths.html#PathDataMovetoCommands
You said,
"The way I understood it, SVG path data was very straight forward, something like this: (M,L,C)[point{n}] .... [Z]"
I don't know where you got that information. Stop getting your information from that source.
"I will admit I have not read through the entire SVG standard spec..."
Nobody reads the entire spec. Just focus on the part you're implementing at the moment. You could also start with SVG Tiny, and work with that subset for now.
Path Grammar is where you should start when writing a parser. If you can't read it, then buy a book on compilers.
Path grammar: http://www.w3.org/TR/2011/REC-SVG11-20110816/paths.html#PathDataBNF
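To make the moveto rule concrete, here is a minimal sketch in C of a reader that follows it. It is not a full implementation of the path grammar: it only handles M/m, L/l and Z/z, stops at the first command it does not know, and converts everything to absolute points. The names point, skip_separators and path_to_points are invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

struct point { double x, y; };

/* Skip the whitespace and commas that may separate numbers. */
static const char *skip_separators(const char *p)
{
    while (*p == ',' || isspace((unsigned char)*p))
        p++;
    return p;
}

/* Convert path data d into absolute points (subset: M/m, L/l, Z/z).
 * Stores at most max_out points in out and returns how many were stored. */
size_t path_to_points(const char *d, struct point *out, size_t max_out)
{
    size_t count = 0;
    char cmd = 0;                        /* command currently in effect */
    double cur_x = 0, cur_y = 0;         /* current point */
    double start_x = 0, start_y = 0;     /* start of the current sub-path */
    const char *p;

    assert(d != NULL && out != NULL);

    p = skip_separators(d);
    while (*p != '\0' && count < max_out) {
        if (isalpha((unsigned char)*p)) {
            cmd = *p++;
            if (cmd == 'Z' || cmd == 'z') {
                cur_x = start_x;         /* closepath returns to the start */
                cur_y = start_y;
                p = skip_separators(p);
                continue;
            }
            if (cmd != 'M' && cmd != 'm' && cmd != 'L' && cmd != 'l')
                break;                   /* command outside this sketch */
        } else if (cmd == 0 || cmd == 'Z' || cmd == 'z') {
            break;                       /* coordinates with no command */
        }

        char *end;
        p = skip_separators(p);
        double x = strtod(p, &end);
        if (end == p)
            break;                       /* not a number */
        p = skip_separators(end);
        double y = strtod(p, &end);
        if (end == p)
            break;
        p = skip_separators(end);

        if (islower((unsigned char)cmd)) {   /* lowercase means relative */
            x += cur_x;
            y += cur_y;
        }
        cur_x = x;
        cur_y = y;
        if (cmd == 'M' || cmd == 'm') {
            start_x = x;
            start_y = y;
            /* Further pairs after a moveto are implicit linetos,
             * relative if the moveto was relative. */
            cmd = (cmd == 'M') ? 'L' : 'l';
        }
        out[count].x = x;
        out[count].y = y;
        count++;
    }
    return count;
}
```

With this rule in place, "m 10,10 5,0 0,5 z" yields the absolute points (10,10), (15,10) and (15,15), which is the kind of command-free, relative data Inkscape writes. Curves (C/c and friends) would need the same treatment with more numbers per step.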

ANTLR and arrays

I have a question about implementing arrays with the Java+ANTLR combo (I'm mainly talking about Java/C-style arrays).
So basically I'm asking how you would implement such a feature, whether an example is already available, or whether someone could point me to anything that may help solve it.
On the other hand, I've searched a bit for what a possible solution would look like. The main problem I see is that the user may create arrays of various dimensions, and even go crazy if he or she wants (like creating 5-dimensional arrays or worse).
While the grammar for something like this is fairly simple, like
new ID (INT (',' INT)* )
the back end really gets a bit involved. As I said, the user may input any number of dimensions, so array dimensions should be created dynamically (at least as I see it; maybe I'm overcomplicating things?).
After searching, I did find something that pretty much solves this problem perfectly. Here is a link to the question:
Is it possible to dynamically build a multi-dimensional array in Java?
Of course, my question is: is this a viable example? It is a bit complicated, to say the least. Is there a more elegant solution?
With that in mind, I was thinking the answer might lie in somehow transforming multiple dimensions into a more linear structure. Could something like that be useful? A simple search on Stack Overflow turned up many solutions for this, like:
Algorithm to convert a multi-dimensional array to a one-dimensional array
Would it be worth searching in that direction?
Finally, given that arrays are a really common feature in many languages, I find it surprising that, after searching the ANTLR mailing list, there is no similar question, which, as I said, leads me to believe that I may be overcomplicating things (unless I really suck at searching?). I would really appreciate feedback.
Your syntax, if I'm not mistaken, corresponds to something like
new char 4,5,6,7
which is kind of strange. I expect that you really meant
new char[4,5,6,7]
However from a purely syntactic point of view, there's no reason not to just store the indices in an array and let the semantic analysis pass worry about it.
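On the "linear structure" idea from the question: once the dimension list is stored, a back end can keep every array in one flat buffer and compute offsets in row-major order. The sketch below is in C rather than Java to keep it short, and element_count and flat_index are hypothetical helper names, not part of ANTLR or any runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in an array with the given dimensions. */
size_t element_count(const size_t *dims, size_t rank)
{
    size_t total = 1;

    assert(dims != NULL);
    for (size_t i = 0; i < rank; i++)
        total *= dims[i];
    return total;
}

/* Row-major offset of idx in an array with dimensions dims.
 * Returns (size_t)-1 if any index is out of range. */
size_t flat_index(const size_t *dims, const size_t *idx, size_t rank)
{
    size_t offset = 0;

    assert(dims != NULL && idx != NULL);
    for (size_t i = 0; i < rank; i++) {
        if (idx[i] >= dims[i])
            return (size_t)-1;
        /* Horner-style accumulation: shift by this dimension, add index. */
        offset = offset * dims[i] + idx[i];
    }
    return offset;
}
```

For new char[4,5,6,7], element_count gives 840 slots and the element at [1,2,3,4] lands at offset 319. The number of dimensions is just the length of the list the parser collected, so five or fifty dimensions cost nothing extra in the grammar.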

MD5 code kata and BDD

I was thinking to implement MD5 as a code kata and wanted to use BDD to drive the design (I am a BDD newb).
However, the only test I can think of starting with is to pass in an empty string, and the simplest thing that will work is embedding the hash in my program and returning that.
The logical extension of this is that I end up embedding the hash in my solution for every test and switching on the input to decide what to return. Which of course will not result in a working MD5 program.
One of my difficulties is that there should only be one public function:
public static string MD5(byte[] input)
And I don't see how to test the internals.
Is my approach completely flawed or is MD5 unsuitable for BDD?
I believe you chose a pretty hard exercise for a BDD code-kata. The thing about code-kata, or what I've understood about it so far, is that you somehow have to see the problem in small incremental steps, so that you can perform these steps in red, green, refactor iterations.
For example, an exercise of finding an element position inside an array, might be like this:
If array is empty, then position is 0, no matter the needle element
Write test. Implementation. Refactor
If array is not empty, and element does not exist, position is -1
Write test. Implementation. Refactor
If array is not empty, and element is the first in list, position is 1
Write test. Implementation. Refactor
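Written out, the three steps above might end up as something like this C sketch. The function name position_of and the test helper run_position_tests are invented here, and the return values simply follow the three rules as stated (0 for an empty array, -1 for a missing element, 1-based position otherwise).

```c
#include <assert.h>
#include <stddef.h>

/* Position of needle in haystack, following the three rules above. */
int position_of(const int *haystack, size_t len, int needle)
{
    assert(len == 0 || haystack != NULL);

    if (len == 0)
        return 0;                        /* rule 1: empty array */
    for (size_t i = 0; i < len; i++) {
        if (haystack[i] == needle)
            return (int)i + 1;           /* rule 3: 1-based position */
    }
    return -1;                           /* rule 2: element not present */
}

/* One assertion per rule, added one red-green-refactor cycle at a time. */
void run_position_tests(void)
{
    int values[] = { 7, 8, 9 };

    assert(position_of(values, 0, 7) == 0);
    assert(position_of(values, 3, 42) == -1);
    assert(position_of(values, 3, 7) == 1);
}
```

Each assertion was a failing test before the matching line of position_of existed, which is the small-step rhythm that is hard to reproduce with MD5.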
I don't really see how to break the MD5 algorithm into those kinds of steps. But that may be because I'm not really an algorithm guy. If you understand the steps involved in the MD5 algorithm better, then you may have a better chance.
It depends on what you mean by unsuitable... :-) It is suitable if you want to document a few examples that describe your implementation. It should also be possible to have the algorithm emerge from your specification if you add one more character for each test.
By just adding a switch statement you're trying to "cheat the system". Using BDD/TDD does not mean you have to implement stupid things. Also, the hardcoded hash values and the switch statement in your code are clear code smells and should be refactored away. That is how your algorithm should emerge: when you see the hardcoded values, you first remove them (by calculating the value), and then you see that the cases are all the same, so you remove the switch statement.
Also, if your question is about finding good katas, I would recommend looking in the Kata catalogue.

Resources