This seems like a standard enough problem to warrant a standard design for the solution:
Say I want to write x+2 (or fewer) strings to a file. The x strings make up the content of a section, and the other two strings form a kind of header and footer for that section. The catch is that I don't want to write the header/footer strings if there are no content strings. Furthermore, these x strings are written from disparate places in the code. So the current flow is:
write header string
write content strings
write footer string
This writes the header/footer strings even when the content is empty, and I have to address that, i.e. not write the header/footer strings in that case.
The solution I can think of is to write the header string just before the first content string (funnelling each content write through a wrapper that writes the header first, with a boolean flag preventing multiple header writes), and then to write the footer string only if the header has been written (governed by the same flag).
This is the top-level gist of it; I'm just wondering if there are standard approaches for cases like these.
Thanks!
There are a number of solutions to this:
Write the header and data lines to an in-memory cache and output them at the time you try to write the footer (but only if there are data lines, otherwise output nothing).
Same thing but using a temporary file for the data cache in case it's too big.
Remember the header and whether or not you've output it.
Since the first two solutions involve inefficiencies (caching possibly large amounts of data, or using relatively slow external storage), I'll concentrate on the third. See note (a) at the bottom on how to do the caching.
The approach which doesn't require caching the data is to just have an indicator as to whether or not you've written the header. Before each data line, output the header (and set the flag) only if the flag is not yet set. You can also use this flag to control the footer (if the header hasn't been output, neither should the footer be):
headerText = ""
headerSent = False

def outHeader(s):
    global headerText, headerSent
    headerText = s        # remember the header; don't write it yet
    headerSent = False

def outData(s):
    global headerSent
    if not headerSent:    # first content line: emit the header now
        write(headerText)
        headerSent = True
    write(s)

def outFooter(s):
    if headerSent:        # no header written means no content, so no footer
        write(s)
This solution is also much simpler, in that no data caching is required.
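For illustration, the call sites then keep the original header/content/footer flow unchanged (contentLines and the bracketed strings here are hypothetical):

outHeader("[section]")
for line in contentLines:    # may be an empty list
    outData(line)
outFooter("[end]")

If contentLines is empty, nothing at all gets written.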
(a) If you did want to go with the caching solution (despite the advice that it's a sub-optimal solution), the following pseudo-code shows how it could be done:
def outHeader(s):
    global cachedHeader, cachedData
    cachedHeader = s                      # hold the header until data arrives
    cachedData = ""

def outData(s):
    global cachedData
    cachedData = cachedData + s + "\n"    # accumulate content in memory

def outFooter(s):
    if cachedData != "":                  # only write anything if content arrived
        write(cachedHeader)
        write(cachedData)
        write(s)
The only differences between that in-memory cache and a file-based cache (sketched below) are:
creating an empty temporary file and setting lineCount to 0 where you currently create cachedData in outHeader().
sending str to the temporary file and incrementing lineCount in outData().
using lineCount to decide if there's cached data in outFooter and reading the lines back from the temporary file for output as data.
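A minimal sketch of that file-based variant (assuming the same pseudo-code write primitive as above; Python's tempfile stands in for whatever temporary-file facility you use):

import tempfile

def outHeader(s):
    global cachedHeader, tmpFile, lineCount
    cachedHeader = s
    tmpFile = tempfile.TemporaryFile(mode="w+")   # scratch file instead of a string
    lineCount = 0

def outData(s):
    global lineCount
    tmpFile.write(s + "\n")
    lineCount += 1

def outFooter(s):
    if lineCount > 0:             # cached data exists
        write(cachedHeader)
        tmpFile.seek(0)           # rewind and replay the cached lines
        for line in tmpFile:
            write(line.rstrip("\n"))
        write(s)
    tmpFile.close()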
Related
Below is a picture of my current accessibility tree. You can see that the 4 text leaves in it are separated, but they still form only one line of content. Is this still accessible ("accessible" here meaning screen readers can tell that the leaves form one complete sentence), or should all of the text leaves be combined into one leaf?
If they should be combined, how can you concatenate variables into the text in React, while keeping it as one single leaf? This is my current code: <p>{cloudiness}% ({cloudinessDescription})</p>
How they are read aloud depends on the screen reader being used. VoiceOver reads it as one phrase, but that doesn't mean others will. Having it split up wouldn't be a nice experience, but it doesn't mean it's not accessible.
If you really want to make sure it's read as one phrase but don't like the noise of a template literal inside the JSX (I agree), why not build the string outside the JSX, at least until you are able to test on multiple screen readers?
const cloudinessSummary = `${cloudiness}% (${cloudinessDescription})`;
return <p>{cloudinessSummary}</p>;
I'm trying to edit code (C++/C#/Java) in a RichTextBox of a WinForms application.
Since different platforms use different newline sequences, I'm using this code to encode the rich text:
value = value
.Replace("\r\n", "\\par ")
.Replace("\r", "\\par ")
.Replace("\n", "\\par ");
All 3 types of newline sequence are displayed fine. But then I noticed that, since "\r\n" (2 chars in the file content) has been encoded as "\\par " (which becomes 1 char in RichTextBox.Text), the index of a given piece of text differs. Thus RichTextBox.Select(index, length) will select text at the wrong position if the position was calculated from the raw file content. For each additional line, the index difference increases by 1.
I'm wondering if there is anything like "\\return\\par" that counts as 2 chars but displays as 1 line break, and that in RichTextBox.Text (which gets saved back as the file content) decodes back to "\r\n".
And if not, is there any other way to align indexes between the file content and RichTextBox.Text? Thanks.
BTW I have a dirty workaround: RichTextBox.Select(index - lineNo, length). But it's really ugly.
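A more general form of that workaround would be to count the "\r\n" pairs before the raw index instead of assuming one per line. A hypothetical C# helper (ToRtbIndex is my own name, not a framework method):

// Map an offset in the raw file text (where lines end in "\r\n") to an
// offset in RichTextBox.Text (where each "\r\n" collapsed to one char).
static int ToRtbIndex(string rawText, int rawIndex)
{
    int crlfPairs = 0;
    for (int i = 0; i + 1 < rawText.Length && i + 1 < rawIndex; i++)
        if (rawText[i] == '\r' && rawText[i + 1] == '\n')
            crlfPairs++;          // each pair loses one character
    return rawIndex - crlfPairs;
}

The length argument would need the same treatment if a selection can span line breaks.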
Is there an easy way to parse a file for a directive (as I can't think of a better word for it)?
I need to scan a file for <!--#directive parameter=value -->, copy the value, and find the location and length where this directive occurs in the file, so it can be replaced with whatever.
I come from microcontrollers, and don't have a lot of experience with extra / full libraries.
Is there a better way to implement this than manually scanning line by line? (I guess with ftell for the position, fgets to read, sscanf to parse, and fseek back to the last position if it was a match.)
Here is a regular expression which can help:
<!--\s*#.+=(\S+)\s?-->
Group with index 1 in each match is your value.
You can test it here: https://regex101.com/
Also consider using a high-level language for this. Here is a snippet in C# which prints all values from a text file:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

var inputText = File.ReadAllText("D:\\myTextFile.txt");
var regex = new Regex("<!--\\s*#.+=(\\S+)\\s?-->");
var matches = regex.Matches(inputText);
foreach (var g in matches.Cast<Match>().Select(match => match.Groups[1]))
    Console.WriteLine(g.ToString());
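Since the question also asks for the location and length of each directive (so it can be replaced), note that each Match carries those directly:

// Match.Index and Match.Length give the directive's position and extent
// in inputText, which is exactly what the replacement step needs.
foreach (Match m in regex.Matches(inputText))
    Console.WriteLine($"value={m.Groups[1].Value} index={m.Index} length={m.Length}");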
According to this site:
http://pedrowa.weba.sk/docs/ApiDoc/apidoc_ap_get_client_block.html
This function:
ap_get_client_block(request_rec *r, char *buffer, int bufsiz);
Reads a chunk of POST data coming in from the network when a user requests a webpage.
So far, I have the following core code that reads the data:
while (len_read > 0) {
    len_read = ap_get_client_block(r, argsbuffer, length);
    if ((rpos + len_read) > length) { rsize = length - rpos; }
    else { rsize = len_read; }
    memcpy((char *)*rbuf + rpos, (char *)argsbuffer, (size_t)rsize);
    rpos += rsize;
}
argsbuffer is a character array of length bytes, rbuf is a valid pointer, and the rest of the variables are of the apr_off_t data type.
If I changed this code to this:
while (len_read > 0) {
    len_read = ap_get_client_block(r, argsbuffer, length);
    if ((rpos + len_read) > length) { rsize = length - rpos; }
    else { rsize = len_read; }
    memcpy((char *)*rbuf + rpos, (char *)argsbuffer, (size_t)rsize);
    rpos += rsize;
    if (rpos > wantedlength) { len_read = 0; }
}
would I be able to close the stream in some way, and maintain processing speed, without corrupt data coming in?
I already executed ap_setup_client_block(r, REQUEST_CHUNKED_ERROR) and made sure ap_should_client_block(r) returned true before running the first code above. So that is, in a sense, like opening a file, and ap_get_client_block(r, argsbuffer, length) is like reading from it. Now, what about some ap_ function equivalent to close?
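In other words, something like the following sketch is what I'm after (ap_discard_request_body() is my guess at a close-like call; whether it fits here is part of what I'm asking):

while ((len_read = ap_get_client_block(r, argsbuffer, length)) > 0) {
    rsize = ((rpos + len_read) > length) ? (length - rpos) : len_read;
    memcpy((char *)*rbuf + rpos, (char *)argsbuffer, (size_t)rsize);
    rpos += rsize;
    if (rpos > wantedlength)
        break;                      /* got the fixed prefix I care about */
}
ap_discard_request_body(r);         /* drain and ignore the remainder? */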
What I want to avoid is corrupt data.
The data that comes in is of various sizes, and I only want to capture a certain piece of it without having a loop go through the entire set of data every time. That's why I posted this question.
For example: if I wanted to look for "A=123" within the first fixed 15 bytes of the input, and the first set of data is something like:
S=1&T=2&U=35238952958&V=3468348634683963869&W=fjfdslhjsdflhjsldjhsljsdlkj
Then I want the program to examine only:
S=1&T=2&U=35238
I'd be tempted to use the second block of code. The first block works, but it goes through everything.
Anyone have any idea? I want this to execute on a live server, as I am improving a security detection system. If anyone knows any other functionality that I should add to or remove from my code, let me know. I want to optimize for speed.
Original question below; first, an update regarding the solution, in case someone has a similar problem:
For a fast regex I found http://re2c.org/ ; for xml parsing http://expat.sourceforge.net/
Is there an XML library I can use to parse XML from memory (and not from a file) in a streaming manner in C?
Currently I have:
libxml2: its xmlReader seems to be usable only with a file handle, not in-memory
RapidXml: is C++ and does not seem to expose a C interface
Requirements:
I need to process the individual XML nodes without having the whole XML (400 GB uncompressed, and "only" 29 GB as the original .bz2 file) in memory (the bzip'd file gets read in and decompressed piecewise, and I would pass those uncompressed pieces on to be consumed by the XML parser)
It does not need to be very fast, but I would prefer an efficient solution
I (most probably) don't need the path of an extracted node, so it would be fine to just discard nodes as soon as they have been processed by my callback (if, contrary to what I think right now, I do need the path, I could still track it myself)
This is part of me trying to solve my own problem posted here (and no, it's not the same question): How to efficiently parse large bz2 xml file in C
Ideally I'd like to be able to feed the library a certain number of bytes at a time and have a function called whenever a node is complete.
Thank you very much
Here's some pseudo C code (way shorter than the actual C code) for better understanding:
// extracted data gets put here
strm.next_out = buffer_ptr;

while (bytes_processed_total < filesize) {
    // extracts up to the amount of data set in strm.avail_in
    BZ2_bzDecompress(&strm);
    bytes_processed = strm.next_out - buffer_ptr;
    bytes_processed_total += bytes_processed;
    // here I would like to pass bytes_processed bytes of buffer_ptr to the xml reader
}
About the data I want to parse: http://wiki.openstreetmap.org/wiki/OSM_XML
At the moment I only need certain <node ...> elements from this, namely those which have a subnode <tag k="place" v="country|county|city|town|village"> (the '|' here means at least one of those; in the file it's of course just "country" etc. without the '|')
xmlReaderForMemory from libxml2 seems a good fit to me (but I haven't used it, so I may be wrong).
The char * buffer needs to point to a valid XML document (which can be a part of your entire XML file). You can extract that by reading your file in chunks, as long as each chunk you hand over forms a valid XML fragment.
What's the structure of your XML file? A root containing a sequence of similar nodes, or a fully fledged tree?
If I had an XML like this:
<root>
<node>...</node>
<node>...</node>
<node>...</node>
</root>
I'd read from the opening <node> to the closing </node>, parse that with the xmlReaderForMemory function, do what I need to do, then go on to the next <node>.
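A minimal sketch of that per-fragment idea (process_fragment, fragment and len are hypothetical names; error handling omitted):

#include <libxml/xmlreader.h>

/* Parse one extracted <node>...</node> fragment held in memory. */
void process_fragment(const char *fragment, int len)
{
    xmlTextReaderPtr reader = xmlReaderForMemory(fragment, len, NULL, NULL, 0);
    if (reader == NULL)
        return;
    while (xmlTextReaderRead(reader) == 1) {
        /* inspect the current node, e.g. via xmlTextReaderConstName(reader) */
    }
    xmlFreeTextReader(reader);
}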
Of course, if your <node> content is too complex/long, you may have to go a few levels deeper:
<node>
<subnode>....</subnode>
<subnode>....</subnode>
<subnode>....</subnode>
<subnode>....</subnode>
</node>
And read from the file until you have an entire <subnode> element (while keeping track of the fact that you're inside a <node>).
I know it's ugly, but it's a viable way. Or you can try to use a SAX parser (I don't know offhand whether a C implementation exists).
SAX parsing fires events on each node start and node end, so you can do nothing until you find your nodes, and process just those.
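(As it happens, a C implementation does exist: libxml2 itself has a SAX-style push interface, which also matches the question's wish to feed the parser a chunk at a time. A minimal sketch; the callback and function names are hypothetical:)

#include <libxml/parser.h>

/* Fired on every element start/end; react only to the elements you want. */
static void on_start(void *ctx, const xmlChar *name, const xmlChar **attrs)
{
    /* e.g. check name/attrs for <node> and the place <tag> */
}

static void on_end(void *ctx, const xmlChar *name)
{
}

void parse_in_chunks(void)
{
    xmlSAXHandler sax = { 0 };
    sax.startElement = on_start;
    sax.endElement = on_end;

    xmlParserCtxtPtr ctxt = xmlCreatePushParserCtxt(&sax, NULL, NULL, 0, NULL);
    /* for each decompressed buffer from the question's loop: */
    /*     xmlParseChunk(ctxt, buffer_ptr, bytes_processed, 0); */
    xmlParseChunk(ctxt, NULL, 0, 1);    /* terminate = 1 flushes the parser */
    xmlFreeParserCtxt(ctxt);
}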
Another viable way is to use some external tool (an XQuery or XPath processor) to filter the whole XML, extract just the interesting nodes from it into a smaller document, and then work on that.
EDIT: Zorba is a good XQuery framework with a command-line processor; it may be a good place to look.
EDIT2: Well, given these dimensions, one alternative solution is to manage the file as plain text: read and uncompress it in chunks and then match something like:
<yourNode>.*</yourNode>
with regexp.
If you're on Linux/Unix you should have the POSIX regex library. Check this question on S.O. for further insights.
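A minimal POSIX-regex sketch of that matching step (find_node is a hypothetical name; note that .* is greedy, which matters if one buffer holds several nodes):

#include <regex.h>
#include <stdio.h>

/* Locate <yourNode>...</yourNode> in a text buffer. */
int find_node(const char *text)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, "<yourNode>.*</yourNode>", REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        printf("match at [%ld, %ld)\n", (long)m.rm_so, (long)m.rm_eo);
    regfree(&re);
    return 0;
}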