How can I get a table tuple and transform it into an array in C for Postgres?

I'm following the Postgres documentation (https://www.postgresql.org/docs/8.2/xfunc-c.html) for writing C functions and creating an extension (for hierarchical clustering), and I'm confused.
So I can get a tuple by using HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0); right?
How can I get the attribute values in this tuple? There is GetAttributeByNum; can I get a value for each column and put it into an array? (I want to get the data from a table and, for example, cluster it.)
The documentation has an example of a function that uses a specific table (the emp table). How can I use an arbitrary table with my function? (I couldn't find an example of that.)
Is c_overpaid(emp, limit) (in the documentation) called one time for the emp table, or is it called once per row in the table?
For the hierarchical clustering: can I get table data from Postgres, write it into a temp file, read that file, put it into an array, cluster it, and put the result into the database? (E.g., create or alter a table and do partitioning: hub is the whole table, part_1 is one cluster, part_2 is the second one, etc.)

You should read the documentation for the current version.
Yes.
As the example shows, with GetAttributeByName, but there is also a GetAttributeByNum function. I assume you are talking about a C array and not a PostgreSQL array. You can stuff all the values into an array, sure, if they have the same data type.
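For reference, the documentation's c_overpaid example (which reads the salary column of the emp row type) boils down to this:
#include "postgres.h"
#include "fmgr.h"
#include "executor/executor.h"  /* for GetAttributeByName() */

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(c_overpaid);

Datum
c_overpaid(PG_FUNCTION_ARGS)
{
    HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);
    int32           limit = PG_GETARG_INT32(1);
    bool            isnull;
    Datum           salary;

    salary = GetAttributeByName(t, "salary", &isnull);
    if (isnull)
        PG_RETURN_BOOL(false);

    /* DatumGetInt32() unpacks the Datum into a plain C int32 */
    PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
}
To fill a C array, call GetAttributeByName (or GetAttributeByNum) once per column and convert each Datum with the DatumGetXXX() macro that matches the column's type.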
Then you would have to use the special type record. For a code sample, look at the functions record_to_json and composite_to_json in src/backend/utils/adt/json.c.
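The core of what those functions do looks roughly like this (an untested sketch; error handling and per-type conversion omitted, and deform_record is just a made-up name):
#include "postgres.h"
#include "fmgr.h"
#include "access/htup_details.h"
#include "utils/typcache.h"

PG_FUNCTION_INFO_V1(deform_record);

Datum
deform_record(PG_FUNCTION_ARGS)
{
    HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
    Oid             tupType = HeapTupleHeaderGetTypeId(rec);
    int32           tupTypmod = HeapTupleHeaderGetTypMod(rec);
    TupleDesc       tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
    int             ncolumns = tupdesc->natts;
    HeapTupleData   tmptup;
    Datum          *values = (Datum *) palloc(ncolumns * sizeof(Datum));
    bool           *nulls = (bool *) palloc(ncolumns * sizeof(bool));

    /* Wrap the bare tuple data in a HeapTupleData control structure */
    tmptup.t_len = HeapTupleHeaderGetDatumLength(rec);
    tmptup.t_data = rec;

    /* One call splits the tuple into one Datum per column */
    heap_deform_tuple(&tmptup, tupdesc, values, nulls);

    /* ... inspect the attribute metadata in tupdesc and convert each
       non-null Datum to the matching C type here ... */

    ReleaseTupleDesc(tupdesc);
    PG_RETURN_INT32(ncolumns);
}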
It is called for each row found, since it appears in the SELECT list.
That's a bit vague, but sure. I don't see why you'd want to extract that from a table, though. Why not write your own table access method, since it looks like you want to define a new way of storing tables?
But be warned, that would be decidedly non-trivial, and you'd better first get your feet wet with more mundane stuff.

Related

Pass an entire Row as parameter to User-Defined Table Function in Flink Table API

How can I pass an entire Row to my ScalarFunction RowToTupleConverter in the following code? All the examples only address passing single or multiple values by name, but I want the whole result of the SELECT statement to be passed as a Row. My guess was using *, but that's not recognized as a valid parameter.
envT.registerFunction("toTuple", new RowToTupleConverter());
envT.createTemporaryView("t", envT.fromDataStream(ds));
Table result = envT.from("t").select("getAvroFieldString(f1, 'HASH_KEY') as hk, getAvroFieldLong(f1, 'LOAD_DATE') as ld, 'test' as NAME");
envT.toAppendStream(result.select("*").map("toTuple(*)"), new TupleTypeInfo[...]).print();
I do not want to address the individual fields but the whole row, since I'm building everything up generically, so my ScalarFunction requires a parameter of type Row. The function iterates through the row and creates a Tuple2<GenericRecord, GenericRecord> from its values.
Background:
The job is built up like this because we need both the key and the value from a Kafka source using the Confluent Schema Registry, and the job should be generic to allow for an arbitrary schema, allowing multiple instantiations without changing the codebase. The only way we found to achieve this is creating a DataStream<Tuple2<GenericRecord, GenericRecord>> from a FlinkKafkaConsumer, where the Tuple2 holds the key and the value of a message, each as a GenericRecord, and transforming this to a Flink table.
Since GenericRecord is a black box to the Table API, I followed recommendations in another thread and created simple ScalarFunctions which extract the specific values I need. Right now that part is still hardcoded, but once everything works it will also be made generic. However, I'm struggling to map the result table back to a Tuple2 in order to write the transformed records back to another Kafka topic, which is why I introduced another ScalarFunction to map from a Row to a Tuple2<GenericRecord, GenericRecord>.
Is this possible and if so, how? If not, what kind of workaround could I use to solve this problem? I'd also appreciate suggestions for a more elegant way in general, but judging from the amount of research I did into that direction and due to the nature of the use case, I doubt there is. Unfortunately, moving to SpecificRecord is not an option.

Define a String constant in SQL Server?

Is it possible in SQL Server to define a string constant? I am rewriting some queries to use stored procedures, and each has the same long string as part of an IN statement: [a], [b], [c], etc.
It isn't expected to change, but could at some point in the future. It is also a very long string (a few hundred characters), so if there is a way to define a global constant for it, that would be much easier to work with.
If this is possible, I would also be interested to know whether it works in this scenario. I had tried to pass this string as a parameter, so I could control it from a single point within my application, but the stored procedure didn't like it.
You can create a table with a single column and row and disallow writes on it.
Use that as your global string constant (and add more constants, if you wish).
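A minimal sketch of that idea (all names here are made up for illustration):
CREATE TABLE dbo.StringConstants
(
    ConstantName  varchar(50)   NOT NULL PRIMARY KEY,
    ConstantValue varchar(1000) NOT NULL
);

INSERT INTO dbo.StringConstants (ConstantName, ConstantValue)
VALUES ('InListValues', '[a], [b], [c]');

-- Disallow further writes for ordinary users
DENY INSERT, UPDATE, DELETE ON dbo.StringConstants TO public;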
You are asking for one thing (a string constant in MS SQL), but appear to maybe need something else. The reason I say this is because you have given a few hints at your ultimate objective, which appears to be using the same IN clause in multiple stored procedures.
The biggest clue is in the last sentence: "I had tried to pass this String as a parameter, so I could control it from a single point within my application but the Stored Procedure didn't like it."
Without details of your SQL scripts, I am going to attempt to use some psychic debugging techniques to see if I can get you to what I believe is your actual goal, and not necessarily your stated goal.
Given your Stored Procedure "didn't like that" when you tried to pass in a string as a parameter, I am guessing the composition of the string was simply a delimited list of values, something like "10293, 105968, 501940" or "Juice, Milk, Donuts" (pay no attention to the actual list values - the important part is the delimited list itself). And your SQL may have looked something like this (again, ignore the specific names and focus on the general concept):
SELECT Column1, Column2, Column3
FROM UnknownTable
WHERE Column1 IN (@parameterString);
If this approximately describes the path you tried to take, then you will need to reconsider your approach. Using a regular T-SQL statement, you will not be able to pass a string of parameter values to an IN clause - it just doesn't know what to do with them.
There are alternatives, however:
1. Dynamic SQL: you can build up the whole SQL statement, parameters and all, then execute that in the SQL database. This probably is not what you are trying to achieve, since you are moving scripts to stored procedures, but it is listed here for completeness.
2. Table of values: you can create a single-column table that holds the specific values you are interested in. Then your stored procedure can simply use the column from this table for the IN clause, so no dynamic SQL is required. Since you indicate that the values are not likely to change, you may just need to populate the table once and use it wherever appropriate.
3. String parsing to derive the list of values: you can pass the list of values as a string, then implement code to parse the list into a table structure on the fly. An alternative form of this technique is to pass an XML structure containing the values and use MS SQL Server's XML functionality to derive the table.
4. Define a table-valued function that returns the values to use: I have not tried this one, so I may be missing something, but you should be able to define the values in a table-valued function (possibly using a bunch of UNION statements or something) and call that function in the IN clause. Again, this is an untested suggestion and would need to be worked through to determine its feasibility; see the sketch after this list.
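As a rough, untested sketch of that last suggestion (names invented for illustration):
-- Inline table-valued function holding the list of values
CREATE FUNCTION dbo.AllowedValues()
RETURNS TABLE
AS
RETURN
(
    SELECT 'a' AS Value
    UNION ALL SELECT 'b'
    UNION ALL SELECT 'c'
);
GO

-- Inside the stored procedure:
SELECT Column1, Column2, Column3
FROM UnknownTable
WHERE Column1 IN (SELECT Value FROM dbo.AllowedValues());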
I hope that helps (assuming I have guessed your underlying quandary).
For future reference, it would be extremely helpful if you could include SQL script showing your table structure and stored procedure logic, so we can see what you have actually attempted. This will considerably improve the effectiveness of the answers you receive. Thanks.
P.S. The link for String Parsing actually includes a large variety of techniques for passing arrays (i.e. lists) of information to Stored Procedures - it is a very good resource for this kind of thing.
In addition to a string-constants table, as Oded suggests, I have used scalar functions to encapsulate some constants. That is better suited to a small number of constants, of course, but their use is simple.
Perhaps a combination: a string-constants table with a function that takes a key and returns the string. You could even use that for localization by having the function take a 'region' and combine it with the key to return a different string!
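A sketch of that key-based function idea (hypothetical names; untested):
CREATE FUNCTION dbo.GetConstant (@key varchar(50))
RETURNS varchar(1000)
AS
BEGIN
    RETURN CASE @key
               WHEN 'InListValues' THEN '[a], [b], [c]'
               WHEN 'Greeting'     THEN 'Hello'
           END;
END;
GO

-- Usage:
SELECT dbo.GetConstant('InListValues');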

Can't change data type on MS Access 2007

I have a huge database (800 MB) with a field called 'Date Last Modified'. At the moment this field is stored with the Text data type, but I need to change it to Date/Time to carry out some queries.
I have another database with exactly the same structure but only 35 MB of data in it, and when I change the data type there it works fine. But when I try to change the data type in the big database, it gives me an error:
Microsoft Office Access can't change the data type.
There isn't enough disk space or memory
After doing some research, some sites mentioned changing a registry setting (MaxLocksPerFile). I tried that as well, but no luck :-(
Can anyone help please?
As John W. Vinson says here, the problem you're running into is that Access wants to hold a copy of the table while it makes the changes, and that causes it to exceed the maximum allowable size of an Access file. Compacting and repairing might help get the file under the size limit, but it didn't work for me.
If, like me, you have a lot of complex relationships and reports on the old table that you don't want to have to redo, try this variation on #user292452's solution instead:
1. Copy the table (i.e. 'YourTable'), then paste Structure Only back into your database with a different name (i.e. 'YourTable_new').
2. Copy YourTable again, and paste-append the data to YourTable_new. (To paste-append, first paste, then select Append Data to Existing Table.)
3. You may want to make a copy of your Access database at this point, just in case something goes wrong with the next part.
4. Delete all data in YourTable using a delete query: select all fields, using the asterisk, and then run with default settings.
5. Now you can change the fields in YourTable as needed and save again.
6. Paste-append the data from YourTable_new to YourTable, and check that there were no errors from type conversion, length, etc.
7. Delete YourTable_new.
One relatively tedious (but straightforward) solution would be to break the big database up into smaller databases, do the conversion on the smaller databases, and then recombine them.
This has an added benefit that if, by some chance, the text is an invalid date in one chunk, it will be easier to find (because of the smaller chunk sizes).
Assuming you have some kind of integer key on the table that ranges from 1 to (say) 10000000, you can just do queries like
SELECT *
INTO newTable1
FROM yourtable
WHERE yourkey >= 0 AND yourkey < 1000000
SELECT *
INTO newTable2
FROM yourtable
WHERE yourkey >= 1000000 AND yourkey < 2000000
etc.
Make sure to enter and run these queries separately, since it seems that Access will give you a syntax error if you try to run more than one at a time.
If your keys are something else, you can do the same kind of thing, but you'll have to be a bit more tricky about your WHERE clauses.
Of course, a final thing to consider, if you can swing it, is to migrate to a different database that has a little more power. I'm guessing you have reasons that this isn't easy, but with the amount of data you're talking about, you'll probably be running into other problems as well as you continue to use Access.
EDIT
Since you are still having some troubles, here is some more detail in the hopes that you'll see something that I didn't describe well enough before:
Here, you can see that I've created a table "OutputIDrive" similar to what you're describing. I have an ID tag, though I only have three entries.
Here, I've created a query, gone into SQL mode, and entered the appropriate SQL statement. In my case, because my query only grabs values >= 0 and < 2, we'll just get one row... the one with ID = 1.
When I click the run button, I get a popup that tells/warns me what's going to happen...it's going to put a row into a new table. That's good...that's what we're looking for. I click "OK".
Now our new table has been created, and when I click on it, we can see that our one line of data with ID = 1 has been copied over to this new table.
Now you should be able to just modify the table name and the number values in your SQL query, and run it again.
Hopefully this will help you with whatever tripped you up.
EDIT 2:
Aha! This is the trick. You have to enter and run the SQL statements one at a time in Access. If you try to put multiple statements in and run them, you'll get that error. So run the first one, then erase it and run the second one, etc. and you should be fine. I think that will do it! I've edited the above to make it clearer.
Adapted from Karl Donaubauer's answer on an MSDN post:
Switch to the Immediate window (Ctrl + G)
Execute the following statement:
DBEngine.SetOption dbMaxLocksPerFile, 200000
Microsoft has a KnowledgeBase article that addresses this problem directly and describes the cause:
The page locks required for the transaction exceed the MaxLocksPerFile value, which defaults to 9500 locks. The MaxLocksPerFile setting is stored in the Windows registry.
The KnowledgeBase article says it applies to Access 2002 and 2003, but it worked for me when changing a field in an .mdb from Access 2013.
It's entirely possible that in a database of that size, you've got text data that won't convert to a valid Date/Time.
I would suggest (and you may hate me for this) that you export all those prospective date values from "Big" and go through them (perhaps in Excel) to see which ones are not formatted the way you'd expect.
Assuming that the error message is accurate, you're running up against a disk or memory limitation. Assuming that you have more than a couple of gigabytes free on your disk drive, my best guess is that rebuilding the table would put the database (including work space) over the 2 gigabyte per file limit in Access.
If that's the case, you'll need to do one of the following:
Unload the data into some convenient format and load it back into an empty database with an already existing table definition, or
Move a subset of the data into a smaller table, change the data type in the smaller table, compact and repair the database, and repeat until all the data is converted.
If the error message is NOT correct (which is possible), the most likely cause is a bad or out-of-range date in your text-date column.
Copy the table (i.e. 'YourTable') then paste just its structure back into your database with a different name (i.e. 'YourTable_new').
Change the fields in the new table to what you want and save it.
Create an append query and copy all the data from your old table into the new one.
Hopefully Access will automatically convert the old text field directly to the correct value for the new Date/Time field. If not, you might have to clear out the old table, re-append all the data, and use a string-to-date function to convert that one field when you do the append.
Also, if there is an autonumber field in the old table this might not work because there is no way to ensure that the old autonumber values will line up with the new autonumber values that get assigned.
You've been offered a bunch of different ways to get around the disk space error message.
Have you tried adding a new field to your existing table with the Date/Time data type and then updating that field with the value of the existing string date field? If that works, you can then delete the old field and rename the new one to the old name. That would probably take up less temp space than doing a direct conversion from string to date on a single field.
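In Access SQL that might look like this (table and field names are placeholders; rows whose text isn't a valid date will show up as type conversion failures when the update runs):
ALTER TABLE BigTable ADD COLUMN DateLastModifiedNew DATETIME;

UPDATE BigTable
SET DateLastModifiedNew = CDate([Date Last Modified]);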
If it still doesn't work, you may be able to do it with a second table with two columns: the first a long integer (make it the primary key), the second a date. Append the PK and the string date field to this empty table, then add a new date field to the existing table and, using a join, update the new field with the values from the two-column table.
This may run into the same problem. It depends on number of things internal to the Jet/ACE database engine over which we have no real control.

Generating Primary Key with Non-Zero Index (SSIS Data Flow)

I've got a data flow task that takes a pair of tables, mashes the relevant data together, and comes out with some results to be put into an indexed table. The indexed table already has data that I'm not getting rid of and for simplicity's sake should retain their existing keys. So, I need to generate a key that starts from the highest Primary Key value already in the column.
I have found a blog post that works when starting from any known value, but this data flow will eventually be used on different databases, so that value won't be constant. It will always be the max of the column, though, but I can't find a way to grab that value using the script component suggested there.
This type of thing is notoriously difficult to do in SSIS which is why I try to avoid it. You need to:
...brace yourself...
- Create a variable in your SSIS package to hold the start value.
- Create an Execute SQL Task with a parameter mapped to that variable, with a direction of Output and a query something like "SET ? = (SELECT MAX(IDValue) FROM Table)". The question mark is the placeholder for the parameter, which maps to the variable. (See the note after this list.)
- Work the variable into your data flow, probably with a Derived Column transformation.
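One small guard worth adding to that query, in case the target table is empty (ISNULL keeps the variable from coming back NULL; IDValue and Table are the placeholders from the steps above):
SET ? = (SELECT ISNULL(MAX(IDValue), 0) FROM Table);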
I hope this helps...

How to insert a row into a dataset using SSIS?

I'm trying to create an SSIS package that takes data from an XML data source and for each row inserts another row with some preset values. Any ideas? I'm thinking I could use a DataReader source to generate the preset values by doing the following:
SELECT 'foo' as 'attribute1', 'bar' as 'attribute2'
The question is, how would I insert one row of this type for every row in the XML data source?
I'm not sure if I understand the question... My assumption is that you have n number of records coming into SSIS from your data source, and you want your output to have n * 2 records.
In order to do this, you can do the following:
- Multicast to create multiple copies of your input data
- Derived Column transforms to set the "preset" values on the copies
- Sort
- Merge
Am I on the right track w/ what you're trying to accomplish?
I've never tried it, but it looks like you might be able to use a Derived Column transformation to do it: set the expression for attribute1 to "foo" and the expression for attribute2 to "bar".
You'd then transform the original data source, then only use the derived columns in your destination. If you still need the original source, you can Multicast it to create a duplicate.
At least I think this will work, based on the documentation. YMMV.
I would probably switch to using a Script Task and place your logic in there. You may still be able to leverage the file-reading and other objects in SSIS to save some code.
