How do I normalize a string using ICU4C?

I find the ICU docs somewhat challenging.
My question is: How do I normalize a string using ICU4C?
I'm looking at unorm2_normalize, but what if the buffer isn't large enough? How would I know this before? Naturally, I want to normalize the entire string.
Thanks! :>
P.S. Here is the API doc on that function: http://icu-project.org/apiref/icu4c/unorm2_8h.html#a0a596802db767da410b4b04cb75cbc53

You get an error code back from each of these function calls in the pErrorCode parameter. This is how you call such a function:
UErrorCode error = U_ZERO_ERROR;
unorm2_normalize( ... &error );
....
if( !U_SUCCESS( error ) )
{
    // handle error...
}
Here are the error codes: http://icu-project.org/apiref/icu4c/utypes_8h.html#a3343c1c8a8377277046774691c98d78c
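In your case you might want to do something like this. Note that U_STRING_NOT_TERMINATED_WARNING is only a warning, so U_SUCCESS still reports true for it and this check has to sit outside the failure branch above:
if( error == U_STRING_NOT_TERMINATED_WARNING
    || error == U_BUFFER_OVERFLOW_ERROR )
{
    // enlarge the buffer...
}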

Related

Proper way to parse a file and build output

I'm trying to learn D and I thought after doing the hello world stuff, I could try something I wanted to do in Java before, where it was a big pain because of the way the Regex API worked: A little template engine.
So, I started with some simple code to read through a file, character by character:
import std.stdio, std.file, std.uni, std.array;
void main(string [] args) {
    File f = File("src/res/test.dtl", "r");
    bool escape = false;
    char [] result;
    Appender!(char[]) appender = appender(result);
    foreach(c; f.rawRead(new char[f.size])) {
        if(c == '\\') {
            escape = true;
            continue;
        }
        if(escape) {
            escape = false;
            // do something special
        }
        if(c == '#') {
            // start of scope
        }
        appender.put(c);
    }
    writeln(appender.data());
}
The contents of my file could be something like this:
<h1>#{hello}</h1>
The goal is to replace the #{hello} part with some value passed to the engine.
So, I actually have two questions:
1. Is that a good way to process characters from file in D? I hacked this together after searching through all the imported modules and picking what sounded like it might do the job.
2. Sometimes, I would want to access more than one character (to improve checking for escape sequences, find a whole scope, etc.). Should I slice the array for that? Or are D's regex functions up to that challenge? So far, I only found the matchFirst and matchAll methods, but I would like to match, replace and return to that position. How could that be done?
The D standard library does not provide what you require. What you need is called "string interpolation", and here is a very nice implementation in D that you can use the way you describe: https://github.com/Abscissa/scriptlike/blob/4350eb745531720764861c82e0c4e689861bb17e/src/scriptlike/core.d#L139
Here is a blog post about this library: https://p0nce.github.io/d-idioms/#String-interpolation-as-a-library
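For the character-by-character approach from the question, the scan itself is small once you index into the buffer instead of looking at one character at a time. Here is a rough sketch of the idea, written in C rather than D purely for illustration. The function name render and its single key/value signature are hypothetical, and a real engine would look names up in a table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replaces every "#{key}" in tmpl with value.
   A backslash copies the following character literally.
   Returns a malloc'd string that the caller frees, or NULL on failure. */
char *render(const char *tmpl, const char *key, const char *value)
{
    assert(tmpl != NULL && key != NULL && value != NULL);
    size_t n = strlen(tmpl), klen = strlen(key), vlen = strlen(value);

    /* Each placeholder takes klen + 3 input bytes ("#{" + key + "}"),
       so this is an upper bound on the output size. */
    char *out = malloc(n + (n / (klen + 3)) * vlen + 1);
    if (out == NULL) {
        return NULL;
    }

    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (tmpl[i] == '\\' && i + 1 < n) {
            out[o++] = tmpl[++i];          /* escaped: copy next char as-is */
        } else if (tmpl[i] == '#' && tmpl[i + 1] == '{'
                   && strncmp(tmpl + i + 2, key, klen) == 0
                   && tmpl[i + 2 + klen] == '}') {
            memcpy(out + o, value, vlen);  /* start of scope: substitute */
            o += vlen;
            i += klen + 2;                 /* the loop's i++ skips the '}' */
        } else {
            out[o++] = tmpl[i];
        }
    }
    out[o] = '\0';
    return out;
}
```

Calling render("<h1>#{hello}</h1>", "hello", "world") yields "<h1>world</h1>". The same lookahead maps directly onto slicing the array in D.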

Does the error returned by db.Exec(...) have a code?

I'm trying to delete a database using the postgres driver (lib/pq) by doing a:
db.Exec("DROP DATABASE dbName;")
But I'd like to do a different conditional based on whether the error received is something strange, or is a "database does not exist" error.
Is there a constant variable or something I can use to check if the error returned is a "database does not exist" error message, or would I have to manually parse the error string myself?
I tried to look in the documentation, but could not find anything for "database does not exist". I did however find this list.
Perhaps it fits under some other error code? Also I'm not quite sure the semantically correct way of fetching and comparing the error codes through the Postgres driver. I presume I should do something like this:
if err.ErrorCode != "xxx"
The lib/pq package may return errors of type *pq.Error, which is a struct. If it does, you may use all its fields to inspect for details of the error.
This is how it can be done:
if err, ok := err.(*pq.Error); ok {
    // Here err is of type *pq.Error, you may inspect all its fields, e.g.:
    fmt.Println("pq error:", err.Code.Name())
}
pq.Error has the following fields:
type Error struct {
    Severity         string
    Code             ErrorCode
    Message          string
    Detail           string
    Hint             string
    Position         string
    InternalPosition string
    InternalQuery    string
    Where            string
    Schema           string
    Table            string
    Column           string
    DataTypeName     string
    Constraint       string
    File             string
    Line             string
    Routine          string
}
The meaning and possible values of these fields are Postgres-specific, and the full list can be found here: Error and Notice Message Fields
You could use this: https://github.com/omeid/pgerror
It has lots of mappings for various postgres errors.
With the package, you can do the following (taken from the README):
// example use:
_, err = stmt.Exec(SomeInsertStateMent, params...)
if err != nil {
    if e := pgerror.UniqueViolation(err); e != nil {
        // you can use e here to check the fields et al
        return SomeThingAlreadyExists
    }
    return err // other cases.
}
This package has all the PG error constants:
https://github.com/jackc/pgerrcode
Just import and you're good to go:
import "github.com/jackc/pgerrcode"
// ...
if err, ok := err.(*pq.Error); ok {
    if err.Code == pgerrcode.UniqueViolation {
        return fmt.Errorf("unique field violation on column %s", err.Column)
    }
}
The package comes from the same author as "pgx", one of the two or three most popular Go PostgreSQL drivers, so it should be reliable enough.

Extracting jansson JSON data

I am using the C jansson library http://www.digip.org/jansson/
It's quite easy to use https://jansson.readthedocs.org/en/2.7/tutorial.html#the-program
But I cannot get a simple int out of my JSON string. I can successfully receive and load a string of JSON (as in, I get no errors and nothing is null), but when I use the jansson get functions to get an int, my int is always 0. Stepping through with breakpoints, I can see that the jansson function processing the int is not returning 0.
The JSON string looks like this:
{"type":3}
Here is the code:
static void foo(json_t *jsonRoot) {
    // json root is error checked even before this, and is not null
    if (jsonRoot == NULL) {
        return;
    }
    // Trying to get type = 3
    json_t *j_type;
    int type = 0;
    j_type = json_object_get(jsonRoot, "type");
    if (!json_is_integer(j_type)) {
        printf("Not an int!\n");
        return;
    } else {
        // I get into the else
        // json_integer_value has its own internal check and
        // will return 0 if the value is not an int, but it is not
        // returning 0. It is running the macro json_to_integer(json)->value
        type = json_integer_value(j_type);
    }
    printf("type is %d\n", type);
    // type is 0
}
My issue was with strtoll. I had to redefine it.
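For anyone hitting the same symptom: as far as I can tell, jansson parses integers with strtoll when json_int_t is long long, so a broken or mis-declared strtoll on the platform would produce exactly this kind of silent 0. That link is my assumption, not something the library documentation states here. A quick way to check strtoll in isolation is a small sketch like the following, where parse_int is a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a base-10 integer with strtoll.
   Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_int(const char *text, long long *out)
{
    assert(text != NULL && out != NULL);

    char *end = NULL;
    errno = 0;
    long long v = strtoll(text, &end, 10);

    /* No digits consumed, trailing garbage, or overflow. */
    if (end == text || *end != '\0' || errno == ERANGE) {
        return -1;
    }
    *out = v;
    return 0;
}
```

If parse_int("3", &v) does not leave 3 in v on your toolchain, the problem is in the C runtime rather than in the JSON handling.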

Drop xml declaration in libxml2

I want to get rid of the xml declaration when I use libxml2's function xmlSaveFile. How can I accomplish that?
Is there any macro to do that, or can I use another function? (I tried with xmlSaveToFilename from xmlsave.h, but I do not know how it works.)
Something like this should work:
xmlSaveCtxtPtr saveCtxt = xmlSaveToFilename(filename, NULL, XML_SAVE_NO_DECL);
if (saveCtxt == NULL) {
    // Error handling
}
if (xmlSaveDoc(saveCtxt, doc) < 0) {
    // Error handling
}
xmlSaveClose(saveCtxt);
The documentation of the xmlsave module can be found here.

How do you print an associative array in DTrace?

The question pretty much sums it up. "dtrace 'print an associative array'" has exactly one google hit and the similar searches are equally useless.
EDIT:
If I were to use an aggregation, I'm not aware that I'd still be able to remove entries. My application requires that I be able to do things like:
file_descriptors[0] = "stdin"
file_descriptors[3] = "service.log"
...
...
file_descriptors[3] = 0
...
...
# should print only those entries that have not been cleared.
print_array(file_descriptors)
I know that you can clear an entire aggregation, but what about a single entry?
UPDATE:
Since I'm doing this in OS X and my application is to track all of the file descriptors that have been opened by a particular process, I was able to have an array of 256 pathnames, thusly:
syscall::open*:entry
/execname == $1/
{
    self->path = copyinstr(arg0);
}
syscall::open*:return
/execname == $1/
{
    opened[arg0] = self->path;
}
syscall::close*:entry
/execname == $1/
{
    opened[arg0] = 0;
}
tick-10sec
{
    printf(" 0: %s\n", opened[0]);
}
The above probe repeated 255 more times...
It sucks. I'd really like to have something better.
Is this the link Google found? Because the advice seems pretty sound:
I think the effect you're looking for should be achieved by using an
aggregation rather than an array. So you'd actually do something like:
@requests[remote_ip,request] = count();
... and then:
profile:::tick-10sec
{
    /* print all of the requests */
    printa(@requests);
    /* Nuke the requests aggregation */
    trunc(@requests);
}
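Key the aggregation the way you would key an associative array, and use sum(1) and sum(-1) instead of count(), so that an entry that has been opened and then closed nets out to zero.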
