What is the difference between CStringGetTextDatum() and CStringGetDatum() in PostgreSQL?

In a previous question about using C strings to create table data in an SQL extension, Datums made with CStringGetDatum() did not work for table columns that expect VARCHAR. The solution was to use CStringGetTextDatum(). Now I am curious why.
Here are the definitions. I am not sure in which situation to use CStringGetDatum() rather than CStringGetTextDatum(), if the former cannot be used to put C strings into such columns:
#define CStringGetDatum(X) PointerGetDatum(X)
#define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
#define PointerGetDatum(X) ((Datum) (X))
text *
cstring_to_text(const char *s)
{
    return cstring_to_text_with_len(s, strlen(s));
}
text *
cstring_to_text_with_len(const char *s, int len)
{
    text *result = (text *) palloc(len + VARHDRSZ);
    SET_VARSIZE(result, len + VARHDRSZ);
    memcpy(VARDATA(result), s, len);
    return result;
}

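The data types text and varchar are not stored as zero-terminated arrays of characters. Each value is a varlena: it begins with a header that encodes the total length (plus a few flag bits), followed by the bytes, with no terminating zero.
The data type cstring is a plain zero-terminated C string and is internal to PostgreSQL. You cannot use it as a column type in SQL; it appears only in the signatures of functions such as type input and output functions.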
Use CStringGetDatum whenever you need to pass a Datum that is a C string and use CStringGetTextDatum to convert a C string to a text or varchar that you need to pass as a Datum.
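To make the layout difference concrete, here is a minimal standalone sketch in plain C. It deliberately does not use the PostgreSQL headers: toy_text, toy_cstring_to_text and TOY_HDRSZ are hypothetical stand-ins for text, cstring_to_text and VARHDRSZ, and the header is a plain 4-byte length, whereas the real varlena header is encoded and should only be touched through macros such as SET_VARSIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's text: a length header followed by
   the bytes, with no terminating zero. */
typedef struct toy_text {
    uint32_t total_len;   /* header size + payload size, in bytes */
    char     data[];      /* payload, NOT zero-terminated */
} toy_text;

#define TOY_HDRSZ ((uint32_t) sizeof(uint32_t))

/* Mirrors the shape of cstring_to_text(): copy the bytes, record the length.
   Returns NULL if allocation fails; the caller frees the result. */
static toy_text *toy_cstring_to_text(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    toy_text *t = malloc(TOY_HDRSZ + len);
    if (t == NULL)
        return NULL;
    t->total_len = TOY_HDRSZ + (uint32_t) len;
    memcpy(t->data, s, len);   /* the terminator is not copied */
    return t;
}

/* The payload length comes from the header, never from a terminator. */
static size_t toy_text_len(const toy_text *t)
{
    assert(t != NULL);
    return t->total_len - TOY_HDRSZ;
}
```

A pointer to a C string and a pointer to such a value are both just pointers once wrapped in a Datum, which is why handing a bare C string to code that expects text makes it read the first characters as a length header.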

Related

How to write a Postgres user-defined type with an array

I'm writing a user defined type in Postgres called personname:
#define FLXIBLE_ARRAY_MEMBER 0
PG_MODULE_MAGIC;
typedef struct personname{
    int familyLen;
    int givenLen;
    int givenStart;
    char pname[FLXIBLE_ARRAY_MEMBER];
}personname;
I wrote my personname_in and personname_out functions roughly like this:
PG_FUNCTION_INFO_V1(pname_in);
Datum
pname_in(PG_FUNCTION_ARGS){
    char* str = PG_GETARG_CSTRING(0);
    personname *name;
    ...
    name = (personname*) palloc(sizeof(personname) + strlen(str) + 1);
    name->familyLen = familyNameLen;
    name->givenLen = givenNameLen;
    name->givenStart = givenNameStart;
    strcpy(name->pname, str);
    PG_RETURN_POINTER(name);
}
PG_FUNCTION_INFO_V1(pname_out);
Datum
pname_out(PG_FUNCTION_ARGS){
    personname *name = (personname*) PG_GETARG_POINTER(0);
    char* family = getFamily(name);
    char* given = getGiven(name);
    char* nameStr;
    nameStr = psprintf("%s,%s", family, given);
    pfree(family);
    pfree(given);
    PG_RETURN_CSTRING(nameStr);
}
And my sql is like this:
CREATE FUNCTION pname_in(cstring)
RETURNS personname
AS '_OBJWD_/pname'
LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION pname_out(personname)
RETURNS cstring
AS '_OBJWD_/pname'
LANGUAGE C IMMUTABLE STRICT;
CREATE TYPE personname (
internallength = 12,
input = pname_in,
output = pname_out
);
Now my code responds correctly to select "NAME" :: personname;, and when I insert and select, it can correctly access every field of personname except the pname array.
I created a table called users which contains the pname array. When I type select * from users; it shows this:
However, when I copy my personname_in and personname_out code into another C file, replace palloc with malloc and test it with an input string from the terminal, it prints the correct pname value.
Could someone please tell me what I did wrong, or what the correct way is to create a new type with an array in PostgreSQL?
The CREATE TYPE statement does not fit the code, and the 4-byte varlena header is missing.
Quoth the documentation:
While the details of the new type's internal representation are only known to the I/O functions and other functions you create to work with the type, there are several properties of the internal representation that must be declared to PostgreSQL. Foremost of these is internallength. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, indicated by setting internallength to VARIABLE. (Internally, this is represented by setting typlen to -1.) The internal representation of all variable-length types must start with a 4-byte integer giving the total length of this value of the type. (Note that the length field is often encoded, as described in Section 68.2; it's unwise to access it directly.)
You must define the type with
INTERNALLENGTH = VARIABLE
and the struct has to start with a 4-byte integer.
I didn't check for other errors.
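As a rough illustration of "the struct has to start with a 4-byte integer", here is a standalone model in plain C rather than extension code. All names (toy_personname, toy_personname_make) are hypothetical, the "Family,Given" parsing is an assumption because the question elides that part, and a raw int32_t with malloc() stands in for what SET_VARSIZE() and palloc() would do in a real extension.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Plain-C model of a variable-length personname value. */
typedef struct toy_personname {
    int32_t total_len;    /* must come first: total size of this value */
    int32_t familyLen;
    int32_t givenLen;
    int32_t givenStart;
    char    pname[];      /* C99 flexible array member */
} toy_personname;

/* Builds a value from "Family,Given" (assumed input format). Returns NULL on
   bad input or if allocation fails; the caller frees the result. */
static toy_personname *toy_personname_make(const char *str)
{
    assert(str != NULL);
    const char *comma = strchr(str, ',');
    if (comma == NULL)
        return NULL;

    size_t slen  = strlen(str);
    size_t total = sizeof(toy_personname) + slen + 1;
    toy_personname *name = malloc(total);
    if (name == NULL)
        return NULL;

    /* The recorded length covers the fixed fields AND the string, so code
       that copies the value by its length also copies pname. */
    name->total_len  = (int32_t) total;
    name->familyLen  = (int32_t) (comma - str);
    name->givenStart = name->familyLen + 1;
    name->givenLen   = (int32_t) slen - name->givenStart;
    memcpy(name->pname, str, slen + 1);   /* include the terminator */
    return name;
}
```

With a fixed internallength of 12, only the three int fields of the original struct would be stored, which matches the symptom that everything except pname survives a round trip through a table.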

Absorb equal strings onto the same pointer

I'm making a program that reads a text file composed by strings, each one on a line. Basically I do this:
...
char* name;
char* buffer = malloc(sizeof(char) * SIZE); //size is a defined constant in the header
while(fgets(buffer, SIZE, pf)){ //pf is the opened stream
name = malloc(sizeof(char) * SIZE);
strcpy(name, strtok(buffer, "\n"));
manipulate(name); //call an extern function
}
Function manipulate is declared in this manner:
void manipulate(void* ptr);
The problem is that this way two equal strings will have different memory addresses, so the manipulate function will treat them as two different elements.
How can I make them recognized as a single element?
Store the strings in a set, a data type which stores no repeated values and is fast to search. Basically it's a hash table where the key is the string and the value doesn't matter.
You can write your own hash table, it's a good exercise, but for production you're better off using an existing one like from GLib. It already has convenience methods for using a hash table as a set. While we're at it, we can use their g_strchomp() and g_strdup().
#include <stdio.h>
#include <glib.h>
int main () {
    // Initialize our set of strings.
    GHashTable *set = g_hash_table_new(g_str_hash, g_str_equal);
    // Allocate a line buffer on the stack.
    char line[1024];
    // Read lines from stdin.
    while(fgets(line, sizeof(line), stdin)) {
        // Strip trailing whitespace, including the newline.
        g_strchomp(line);
        // Look up the string in the set.
        char *string = g_hash_table_lookup(set, line);
        if( string == NULL ) {
            // Haven't seen this string before.
            // Copy it, using only the memory we need.
            string = g_strdup(line);
            // Add it to the set.
            g_hash_table_add(set, string);
        }
        printf("%p - %s\n", string, string);
    }
}
And here's a quick demonstration.
$ ./test
foo
0x60200000bd90 - foo
foo
0x60200000bd90 - foo
bar
0x60200000bd70 - bar
baz
0x60200000bd50 - baz
aldskflkajd
0x60200000bd30 - aldskflkajd
aldskflkajd
0x60200000bd30 - aldskflkajd
If you indeed have two strings then they necessarily have different addresses, regardless of whether their contents are the same. It sounds like you want to keep track of the strings you've already read, so as to avoid / merge duplicates. That starts with the "keeping track" part.
Evidently, then, you need some kind of data structure in which to record the strings you've already read. You have many choices for that, and they have different advantages and disadvantages. If the number of distinct strings you'll need to handle is relatively small then a simple array or linked list could suffice, but if it is large enough then a hash table will provide much better performance.
With that in hand, you check each newly-read string against the previously read ones and act accordingly.
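For the "relatively small number of strings" case, the simple-array option might look like the sketch below. It is only an illustration: str_pool, str_pool_intern and str_pool_free are hypothetical names, and the lookup is a linear scan, so a hash table remains the better choice once the number of distinct strings grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string pool: a growable array of owned copies. */
typedef struct str_pool {
    char  **items;
    size_t  count;
    size_t  cap;
} str_pool;

/* Returns the pooled copy of s, adding one on first sight, so equal strings
   always yield the same pointer. Returns NULL if allocation fails. */
static const char *str_pool_intern(str_pool *p, const char *s)
{
    assert(p != NULL && s != NULL);

    /* Linear scan over everything seen so far. */
    for (size_t i = 0; i < p->count; i++)
        if (strcmp(p->items[i], s) == 0)
            return p->items[i];

    /* Not seen before: grow the array if needed, then store a copy. */
    if (p->count == p->cap) {
        size_t ncap = p->cap ? p->cap * 2 : 16;
        char **n = realloc(p->items, ncap * sizeof *n);
        if (n == NULL)
            return NULL;
        p->items = n;
        p->cap = ncap;
    }

    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    p->items[p->count++] = copy;
    return copy;
}

/* Releases every pooled string and resets the pool to empty. */
static void str_pool_free(str_pool *p)
{
    for (size_t i = 0; i < p->count; i++)
        free(p->items[i]);
    free(p->items);
    p->items = NULL;
    p->count = p->cap = 0;
}
```

In the reading loop from the question, the pool would start as str_pool pool = {0}; and each stripped line would be passed through str_pool_intern(&pool, buffer) before the resulting pointer is handed to manipulate.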

C strncasecmp that can handle NUL (character 0)

I have strings that may contain character 0. They are stored in a structure like this:
typedef struct somestruct_s {
const unsigned char *string;
size_t length;
};
If I wish to compare 2 of these together I can use memcmp as such:
int match = (a->length == b->length) ? !memcmp (a->string, b->string, a->length) : 0;
But if I wish to compare 2 of these together without regard to case, my first instinct is to use strncasecmp/_strnicmp -- however, that function stops on null characters.
Is there a common C function already around that can do this? I don't mind writing my own, but before I do I want to make sure there isn't a standard function that I am unaware of.
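I am not aware of such a function in standard C, so the hand-written version might look like the following sketch. The name mem_caseeq is hypothetical, and case folding is done with tolower() in the current locale, which is an assumption about what "without regard to case" should mean here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if both buffers have the same length and the
   same bytes after tolower(), 0 otherwise. Embedded zero bytes are compared
   like any other byte, because the loop is driven by the length. */
static int mem_caseeq(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    if (alen != blen)
        return 0;
    assert(alen == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < alen; i++)
        if (tolower(a[i]) != tolower(b[i]))
            return 0;
    return 1;
}
```

With the structure from the question, a call would be mem_caseeq(a->string, a->length, b->string, b->length). Keeping the buffers as unsigned char matters, since tolower() is only defined for values in that range (and EOF).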

Function logic reuse between char string and wchar_t string without explicit string copying?

I'm writing a data structure in C to store commands; Here is the source pared down to what I'm unsatisfied with:
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <errno.h>
#include "dbg.h"
#include "commandtree.h"
struct BranchList
{
CommandTree *tree;
BranchList *next;
};
struct CommandTree
{
wchar_t id; // wchar support actually has no memory cost due to the
bool term; // padding that would otherwise exist, and may in fact be
BranchList *list; // marginally faster to access due to its alignable size.
};
static inline BranchList *BranchList_create(void)
{
return calloc(1, sizeof(BranchList));
}
inline CommandTree *CommandTree_create(void)
{
return calloc(1, sizeof(CommandTree));
}
int CommandTree_putnw(CommandTree *t, const wchar_t *s, size_t n)
{
    for(BranchList **p = &t->list;;)
    {
        if(!*p)
        {
            // Test the returned pointers: errno may hold a stale value.
            *p = BranchList_create();
            if(!*p) return 1;
            (*p)->tree = CommandTree_create();
            if(!(*p)->tree) { free(*p); *p = NULL; return 1; }
            (*p)->tree->id = *s;
        }
        else if(*s != (*p)->tree->id)
        {
            p = &(*p)->next;
            continue;
        }
        if(n == 1)
        {
            (*p)->tree->term = 1;
            return 0;
        }
        p = &(*p)->tree->list;
        s++;
        n--;
    }
}
int CommandTree_putn(CommandTree *t, const char *s, size_t n)
{
wchar_t *passto = malloc(n * sizeof(wchar_t));
mbstowcs(passto, s, n);
int ret = CommandTree_putnw(t, passto, n);
free(passto);
return ret;
}
This works perfectly well, but I'm rather unsatisfied with how I'm handling the fact that my tree supports wchar_t. I decided to add this when I realized that the padding of CommandTree would make any datatype smaller than 7 bytes cost just as much memory anyway, but so as not to duplicate code, I have CommandTree_putn reuse the logic in the wchar_t-supporting CommandTree_putnw.
However, due to the difference in size between char and wchar_t, I can't just pass the array; I have to convert using mbstowcs and pass a temporary wchar_t * to CommandTree_putnw. This is suboptimal, given that CommandTree_putn is going to see the most usage and this quintuples the memory usage of the stored string (from sizeof (char) to sizeof (char) + sizeof (wchar_t) per character, assuming a 4-byte wchar_t), which could add up if lots of these are going to be instantiated with longish commands.
I was wondering if I could do something like create a third function that would contain the logic and be passed a size_t; depending on its value, the function would cast the string, passed to it as a void *, to either const char * or const wchar_t *. But given that C is statically typed, I'd pretty much have to duplicate the logic with s cast to its respective type, which would ruin the idea I'm going for of a "single instance of logic".
So ultimately, the question is, can I provide the program logic only once and pass wrappers const char * and const wchar_t * respectively, without creating a temporary wchar_t * in the function to handle const char *?
I don't know your hard requirements, but wchar_t tends to be difficult to work with precisely because of this problem; it's too hard to mesh with existing code that uses char.
All of the codebases I've worked with eventually migrated to UTF-8, which removes the necessity to store strings in a different type. UTF-8 works with the standard strcpy/strlen type of string manipulation functions and is fully Unicode savvy. The only challenge is that you will need to convert it to UTF-16 to invoke Windows Unicode APIs. (OS X can use UTF-8 directly.) You didn't mention platform so I don't know if this will be an issue for you. In our case we just wrote Win32 wrappers that took UTF-8 strings.
Can you use C++? If so, and the actual type wchar_t is important (rather than Unicode support), you can templatize the functions and then instantiate them with std::wstring or std::string depending on string width. You can also write them to be based on char and wchar_t if you are brave, but you'll need to write special wrapper functions to handle basic operations like strcpy versus wcscpy and so it ends up being more work overall by far.
In plain C, I don't think there's a silver bullet at all. There are yucky answers, but none I could recommend with a straight face.
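For completeness, one of the less pretty options is to let the preprocessor stamp out a typed copy of the logic for each character type. The sketch below shows only the mechanism on a tiny stand-in function; DEFINE_COMMON_PREFIX, common_prefix and common_prefixw are hypothetical names, not part of the question's code. Note that a char version generated this way would treat each byte as one character, so unlike mbstowcs it would not decode multibyte input.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* The body is written once; each expansion produces a typed copy of it. */
#define DEFINE_COMMON_PREFIX(NAME, CHAR_T)                              \
    static size_t NAME(const CHAR_T *a, const CHAR_T *b, size_t n)      \
    {                                                                   \
        size_t i = 0;                                                   \
        assert(a != NULL && b != NULL);                                 \
        while (i < n && a[i] == b[i])                                   \
            i++;                                                        \
        return i; /* number of leading elements that match */           \
    }

/* Two instantiations of the same logic, one per character type. */
DEFINE_COMMON_PREFIX(common_prefix,  char)
DEFINE_COMMON_PREFIX(common_prefixw, wchar_t)
```

The source contains a single copy of the logic, but the compiled program still has two functions, and errors inside the macro body are harder to read and debug, which is part of why this is hard to recommend.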

C - Append strings until end of allocated memory

Let's consider following piece of code:
int len = 100;
char *buf = (char*)malloc(sizeof(char)*len);
printf("Appended: %s\n",struct_to_string(some_struct,buf,len));
Someone has allocated some amount of memory to be filled with string data. The problem is that the string data taken from some_struct could be of ANY length. So what I want is for the struct_to_string function to do the following:
Not allocate any memory itself (so buf has to be allocated outside the function and passed in), and never write past the end of buf.
Inside struct_to_string I want to do something like:
char* struct_to_string(const struct type* some_struct, char* buf, int len) {
//it will be more like pseudo code to show the idea :)
char var1_name[] = "int l1";
buf += var1_name + " = " + some_struct->l1;
//when l1 is a int or some non char, I need to cast it
char var2_name[] = "bool t1";
buf += var2_name + " = " + some_struct->t1;
// buf+= (I mean appending function) should check if there is a place in a buf,
//if there is not it should fill buf with
//as many characters as possible (without writting to memory) and stop
//etc.
return buf;
}
Output should be like:
Appended: int l1 = 10 bool t1 = 20 //if there was good amount of memory allocated or
ex: Appended: int l1 = 10 bo //if there was not enough memory allocated
To sum up:
I need a function (or a couple of functions) that appends given strings to the base string without overwriting the base string;
it should do nothing when the base string's memory is full;
I cannot use C++ libraries.
Other things I would like to ask, though they are not as important right now:
Is there a way in C to iterate through a structure's member list to get the members' names, or at least their values without their names (for example, iterating through a structure as if it were an array)?
I do not normally use C, but for now I am obliged to, so I have only very basic knowledge.
(sorry for my English)
Edit:
Good way to solve that problem is shown in post below: stackoverflow.com/a/2674354/2630520
I'd say all you need is the standard strncat function defined in the string.h header. Note that its size argument is the maximum number of characters to append, not the size of the buffer, so pass the space that remains minus one for the terminator.
As for the 'iterate through structure variable list' part, I'm not exactly sure what you mean. If you're talking about iterating over the structure's members, the short answer is: you can't introspect C structs for free.
You need to know beforehand which structure type you're using so that the compiler knows at what offset in memory it can find each member of your struct. Otherwise it's just an array of bytes like any other.
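If you are willing to pay for it by hand, one common workaround is to describe the members yourself in a table built with offsetof. This is only a sketch: struct sample, struct int_field, sample_fields and sample_get are hypothetical, and every member is an int here to keep the table simple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example struct. */
struct sample {
    int l1;
    int t1;
    int v2;
};

/* One descriptor per member: its name and where it lives in the struct. */
struct int_field {
    const char *name;
    size_t      offset;
};

/* The table has to be kept in sync with the struct manually. */
static const struct int_field sample_fields[] = {
    { "l1", offsetof(struct sample, l1) },
    { "t1", offsetof(struct sample, t1) },
    { "v2", offsetof(struct sample, v2) },
};

#define SAMPLE_FIELD_COUNT (sizeof sample_fields / sizeof sample_fields[0])

/* Reads member i of s through the descriptor table. */
static int sample_get(const struct sample *s, size_t i)
{
    assert(s != NULL && i < SAMPLE_FIELD_COUNT);
    return *(const int *) ((const char *) s + sample_fields[i].offset);
}
```

A loop from 0 to SAMPLE_FIELD_COUNT can then print sample_fields[i].name next to sample_get(&s, i). Members of different types would need a type tag in the descriptor as well.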
Don't hesitate to ask if I wasn't clear enough or if you want more details.
Good luck.
So basically I did it like here: stackoverflow.com/a/2674354/2630520
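int struct_to_string(const struct struct_type* struct_var, char* buf, const int len)
{
    int length = 0; /* assumes len > 0 */
    /* snprintf returns the length it wanted to write, which can exceed the
       space left, so stop as soon as the buffer is full. */
    length += snprintf(buf+length, len-length, "v0[%d]", struct_var->v0);
    if (length >= len) return len - 1;
    length += other_struct_to_string(struct_var->sub, buf+length, len-length);
    if (length >= len) return len - 1;
    length += snprintf(buf+length, len-length, "v2[%d]", struct_var->v2);
    if (length >= len) return len - 1;
    length += snprintf(buf+length, len-length, "v3[%d]", struct_var->v3);
    if (length >= len) return len - 1;
    ....
    return length;
}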
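snprintf writes as much as fits and discards the rest, so it was exactly what I was looking for. Note that its return value is the length the full output would have had, not the number of characters actually stored.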
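The bookkeeping around each snprintf call can be folded into one small helper. The following is a sketch under my own naming (buf_appendf is hypothetical): it tracks the offset as a size_t and clamps it once the buffer is full, so later appends simply do nothing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: appends formatted text to buf at offset used, never
   writing past buf[cap - 1]. Returns the new offset, clamped to cap - 1 once
   the buffer is full, so further calls become no-ops. */
static size_t buf_appendf(char *buf, size_t cap, size_t used,
                          const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL && cap > 0 && used < cap);

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);

    if (n < 0) {
        buf[used] = '\0';            /* output error: keep the old contents */
        return used;
    }
    if ((size_t) n >= cap - used)
        return cap - 1;              /* truncated: buffer is now full */
    return used + (size_t) n;
}
```

Usage follows the same pattern as above, for example used = buf_appendf(buf, len, used, "v0[%d]", struct_var->v0); repeated for each member, starting from used = 0.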

Resources