How can I pass an entire Row to my ScalarFunction RowToTupleConverter in the following code? All the examples only address passing single or multiple values by name, but I want the whole result of the SELECT statement to be passed as a Row. My guess was using *, but that's not recognized as a valid parameter.
envT.registerFunction("toTuple", new RowToTupleConverter());
envT.createTemporaryView("t", envT.fromDataStream(ds));
Table result = envT.from("t").select(
    "getAvroFieldString(f1, 'HASH_KEY') as hk, getAvroFieldLong(f1, 'LOAD_DATE') as ld, 'test' as NAME");
envT.toAppendStream(result.select("*").map("toTuple(*)"), new TupleTypeInfo[...]).print();
I do not want to address the individual fields but the whole row, since I'm building everything generically, so my ScalarFunction requires a parameter of type Row. The function iterates over the row and creates a Tuple2<GenericRecord, GenericRecord> from its values.
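For reference, a minimal sketch of what such a converter could look like. The constructor-injected key/value schemas and the assumption that the key is derived from the first row field are placeholders for illustration (the question's code uses a no-arg constructor), not the actual implementation:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.functions.ScalarFunction;
import org.apache.flink.types.Row;

public class RowToTupleConverter extends ScalarFunction {

    // Assumption: the key and value schemas are supplied from outside
    // (e.g. looked up in the schema registry); how is not shown here.
    private final Schema keySchema;
    private final Schema valueSchema;

    public RowToTupleConverter(Schema keySchema, Schema valueSchema) {
        this.keySchema = keySchema;
        this.valueSchema = valueSchema;
    }

    // The whole row would arrive here as a single Row argument.
    public Tuple2<GenericRecord, GenericRecord> eval(Row row) {
        GenericRecord key = new GenericData.Record(keySchema);
        GenericRecord value = new GenericData.Record(valueSchema);
        // Placeholder mapping: key from the first field, value from all fields.
        key.put(keySchema.getFields().get(0).name(), row.getField(0));
        for (int i = 0; i < row.getArity(); i++) {
            value.put(valueSchema.getFields().get(i).name(), row.getField(i));
        }
        return Tuple2.of(key, value);
    }
}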
Background:
The job is built this way because we need both the key and the value from a Kafka source that uses the Confluent Schema Registry, and the job should be generic enough to handle an arbitrary schema, so that it can be instantiated multiple times without changing the codebase. The only way we found to achieve this is to create a DataStream from a FlinkKafkaConsumer, where each Tuple2 holds the key and the value of a message, each as a GenericRecord, and then transform that stream into a Flink table.
Since GenericRecord is a black box to the Table API, I followed the recommendations in another thread and created simple ScalarFunctions that extract the specific values I need. Right now that part is still hardcoded, but once everything works it will also be made generic. However, I'm struggling to map the result table back to a Tuple2 in order to write the transformed records to another Kafka topic, which is why I introduced another ScalarFunction that maps a Row to a Tuple2<GenericRecord, GenericRecord>.
Is this possible, and if so, how? If not, what kind of workaround could I use to solve this problem? I'd also appreciate suggestions for a more elegant approach in general, but judging from the amount of research I've done in that direction and the nature of the use case, I doubt there is one. Unfortunately, moving to SpecificRecord is not an option.
In my SSIS Data Flow I create a derived column (DT_WSTR) based on a concatenation of two other columns. I want to save the maximum length of this column in a variable (in SQL it would be MAX(LEN(COLUMN))). How is this done?
Add another Derived Column after your existing Derived Column that calculates the length of the computed column. Let's call it ColumnLength, with the expression:
LEN(COLUMN)
Now add a Multicast transformation. One path from here will go on to the "rest" of your data flow. A new path will lead to an Aggregate transformation. There, specify you want the maximum.
Now - what do you want to do with that information?
Write to a table -> OLE DB Destination
Report to the log -> Script Task that fires and information event
Use elsewhere in the package -> Recordset destination, and then use a Foreach loop to pop the one row, one value out of it
Something else?
Sample data flow assuming you chose option 3 - recordset destination
I need to create two variables in my SSIS package: objRecordset of type Object and MaxColumnLength of type Int32.
When the data flow finishes, all my data will have arrived in my table (represented here by the Script Component), and the aggregated maximum length will flow into the Recordset Destination, which uses my variable objRecordset.
To get the value out of the ADO.NET recordset and into our single variable, we need to "shred the recordset". Google that term and you'll find many, many examples.
My control flow will look something like this
The Foreach (ADO.NET) Enumerator Loop Container consumes every row in our dataset, and we specify that our variable MaxColumnLength maps to the 0th element of the row.
Finally, I put a Sequence Container in there so I can get a breakpoint to fire. We see that my max-length variable is 15, which matches my source query:
SELECT 'a' As [COLUMN]
UNION ALL SELECT 'ZZZZZZZZZZZZZZZ'
I believe this addresses the problem you asked about.
As a data warehouse practitioner, I would encourage you to rethink your approach to the lookups. Yes, the 400-character column is going to wreak havoc on your memory, so "math it out". Use the cryptographic functions available to you to compute a fixed-width, effectively unique key for that column, and then work only with that key.
SELECT
CONVERT(binary(20), HASHBYTES('SHA1', MyBusinessKeys)) AS BusHashKey
FROM
dbo.MyDimension;
Now you always have 20 bytes, and SHA1 is unlikely to generate duplicate values.
I am new to using SSIS and am on my third package. We are taking data from Oracle into SQL Server. On my Oracle table, the unique key is called recnum and is numeric(12,0). In this particular package, I am trying to take each record from Oracle, look it up in a SQL Server table to see if that unique key is found, and if not, add the record to the SQL Server table. My issue was that the lookup wouldn't find a match. After much testing, I came up with the following method that works, but I don't understand why I had to do this.
How I currently have it working:
I get the data from Oracle. In the next step, I add a derived column based on the Oracle column (the expression is just that field, with no other formatting). Then, in the lookup, I use the derived column instead of the column from Oracle.
We had already done this on another table where the unique key was numeric(8,0) and it worked ok without needing a derived column.
SSIS is very fussy about data types, lookups only work nicely if data types match.
Double-click the data path lines between data flow objects to check the data types. I use Data Conversion transformations or CAST statements to force matching data types whenever I use lookups.
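For example, a hedged illustration of the CAST approach (the table name is made up; recnum and the precision come from your description): cast the Oracle key in the source query so its type matches the lookup column on the SQL Server side.

-- Hypothetical source query for the OLE DB Source: force the Oracle
-- NUMBER(12,0) key into an explicit type before the Lookup runs.
SELECT CAST(recnum AS NUMERIC(12,0)) AS recnum
FROM my_oracle_table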
Hope this helps.
I'm currently creating my first real-life project in Pervasive. The task is to map a certain XML structure containing orders (as in shops and products) to three tables I created myself. These tables live in an MS SQL Server instance.
All of the tables have a unique key called "id", an automatically incremented column. I've dropped this column from all mappings so that Pervasive will not try to fill it itself.
For certain calculations, for a split key in one of the tables, and for references to the created records in other tables, I will need the id that the database has just created. I have googled the answer for that: I can use "select @@identity;" as a statement, which returns the id most recently created on the current connection. This means that in Pervasive I will have to execute this statement over the already existing target connection object.
But how do I do that? I am quite sure that I will need a DJImport or DJExport object, but how do I get one that is associated with the connection Pervasive uses to insert the records?
Or is there any other way to handle this auto increment when I need to reference the id in other tables?
Not sure how things work in Pervasive, but you may run into issues with @@identity. Scope_identity() would probably be safer, but may still not work in Pervasive.
Hopefully your tables have a natural key in addition to the generated id, in which case you can select your id based on the natural key. This will avoid any issues you may have with disparate sessions and scope.
If anyone looks this post up and wonders about the answer, it's "You can't". Pervasive does not give you access to its own connection object, the one it uses to query the database, and without access to it you cannot reliably fetch the right id. The solution for us was this: we used a stored procedure, called in the Before-Transformation event, that creates the header record and returns the id and an optional error message as a table. We execute it, save the returned id, and use it throughout our mapping.
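For anyone wanting to copy that approach, a minimal sketch of such a procedure (the table, column, and parameter names are placeholders; the real one depends on your schema):

-- Inserts the header record and returns the generated id plus an
-- optional error message as a one-row result set, as described above.
CREATE PROCEDURE dbo.CreateOrderHeader
    @OrderNumber NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        INSERT INTO dbo.OrderHeader (OrderNumber) VALUES (@OrderNumber);
        -- SCOPE_IDENTITY() only sees ids created in this scope, so it is
        -- not affected by other sessions the way @@IDENTITY can be.
        SELECT CAST(SCOPE_IDENTITY() AS INT) AS id,
               CAST(NULL AS NVARCHAR(200)) AS error_message;
    END TRY
    BEGIN CATCH
        SELECT CAST(NULL AS INT) AS id,
               ERROR_MESSAGE() AS error_message;
    END CATCH
END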
I have a huge database (800 MB) that contains a field called 'Date Last Modified'. At the moment this field is stored as a Text data type, but I need to change it to a Date/Time field to carry out some queries.
I have another, otherwise identical database with only 35 MB of data inside it, and when I change the data type there it works fine. But when I try to change the data type in the big database, it gives me an error:
Microsoft Office Access can't change the data type.
There isn't enough disk space or memory
After doing some research, I found some sites that mentioned changing a registry value (MaxLocksPerFile). I tried that as well, but no luck :-(
Can anyone help please?
As John W. Vinson says here, the problem you're running into is that Access wants to hold a copy of the table while it makes the changes, and that causes it to exceed the maximum allowable size of an Access file. Compacting and repairing might help get the file under the size limit, but it didn't work for me.
If, like me, you have a lot of complex relationships and reports on the old table that you don't want to have to redo, try this variation on #user292452's solution instead:
Copy the table (i.e. 'YourTable'), then paste Structure Only back into your database with a different name (i.e. 'YourTable_new').
Copy YourTable again, and paste-append the data to YourTable_new. (To paste-append, first paste, and then select Append Data to Existing Table.)
You may want to make a copy of your Access database at this point, just in case something goes wrong with the next part.
Delete all data in YourTable using a delete query: select all fields using the asterisk, then run with default settings. (See the SQL sketch after these steps.)
Now you can change the fields in YourTable as needed and save again.
Paste-append the data from YourTable_new back into YourTable, and check that there were no errors from type conversion, length, etc.
Delete YourTable_new.
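If you prefer to run the delete and re-append steps as saved queries rather than through copy and paste, the Access SQL equivalents would look roughly like this (a sketch using the YourTable / YourTable_new names from the steps above; the field changes still happen in between):

-- Step: delete all data in YourTable before changing its field types.
DELETE YourTable.* FROM YourTable;

-- Step: after the fields have been changed, append the data back from the copy.
INSERT INTO YourTable SELECT * FROM YourTable_new;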
One relatively tedious (but straightforward) solution would be to break the big database up into smaller databases, do the conversion on the smaller databases, and then recombine them.
This has an added benefit that if, by some chance, the text is an invalid date in one chunk, it will be easier to find (because of the smaller chunk sizes).
Assuming you have some kind of integer key on the table that ranges from 1 to (say) 10000000, you can just do queries like
SELECT *
INTO newTable1
FROM yourtable
WHERE yourkey >= 0 AND yourkey < 1000000
SELECT *
INTO newTable2
FROM yourtable
WHERE yourkey >= 1000000 AND yourkey < 2000000
etc.
Make sure to enter and run these queries separately, since it seems that Access will give you a syntax error if you try to run more than one at a time.
If your keys are something else, you can do the same kind of thing, but you'll have to be a bit more tricky about your WHERE clauses.
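For instance, with a text key you could split on alphabetical ranges instead (a rough sketch; the actual ranges depend on how your key values are distributed):

SELECT *
INTO newTable1
FROM yourtable
WHERE yourkey >= 'A' AND yourkey < 'H'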
Of course, a final thing to consider, if you can swing it, is to migrate to a different database that has a little more power. I'm guessing you have reasons that this isn't easy, but with the amount of data you're talking about, you'll probably be running into other problems as well as you continue to use Access.
EDIT
Since you are still having some troubles, here is some more detail in the hopes that you'll see something that I didn't describe well enough before:
Here, you can see that I've created a table "OutputIDrive" similar to what you're describing. I have an ID tag, though I only have three entries.
Here, I've created a query, gone into SQL mode, and entered the appropriate SQL statement. In my case, because my query only grabs values >= 0 and < 2, we'll just get one row: the one with ID = 1.
When I click the run button, I get a popup that tells/warns me what's going to happen...it's going to put a row into a new table. That's good...that's what we're looking for. I click "OK".
Now our new table has been created, and when I click on it, we can see that our one line of data with ID = 1 has been copied over to this new table.
Now you should be able to just modify the table name and the number values in your SQL query, and run it again.
Hopefully this will help you with whatever tripped you up.
EDIT 2:
Aha! This is the trick. You have to enter and run the SQL statements one at a time in Access. If you try to put multiple statements in and run them, you'll get that error. So run the first one, then erase it and run the second one, etc. and you should be fine. I think that will do it! I've edited the above to make it clearer.
Adapted from Karl Donaubauer's answer on an MSDN post:
Switch to the Immediate window (Ctrl + G)
Execute the following statement:
DBEngine.SetOption dbMaxLocksPerFile, 200000
Microsoft has a KnowledgeBase article that addresses this problem directly and describes the cause:
The page locks required for the transaction exceed the MaxLocksPerFile value, which defaults to 9500 locks. The MaxLocksPerFile setting is stored in the Windows registry.
The KnowledgeBase article says it applies to Access 2002 and 2003, but it worked for me when changing a field in an .mdb from Access 2013.
It's entirely possible that in a database of that size, you've got text data that won't convert to a valid Date/Time.
I would suggest (and you may hate me for this) that you export all those prospective date values from "Big" and go through them (perhaps in Excel) to see which ones are not formatted the way you'd expect.
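A quick way to narrow that down before exporting everything: a query like the one below (assuming, purely for illustration, that the table is called BigTable) lists only the rows whose text Access cannot interpret as a date.

SELECT [Date Last Modified]
FROM BigTable
WHERE IsDate([Date Last Modified]) = False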
Assuming that the error message is accurate, you're running up against a disk or memory limitation. Assuming that you have more than a couple of gigabytes free on your disk drive, my best guess is that rebuilding the table would put the database (including work space) over the 2 gigabyte per file limit in Access.
If that's the case you'll need to:
Unload the data into some convenient format and load it back into an empty database that already contains the table definition.
Move a subset of the data into a smaller table, change the data type in the smaller table, compact and repair the database, and repeat until all the data is converted.
If the error message is NOT correct (which is possible), the most likely cause is a bad or out-of-range date in your text-date column.
Copy the table (i.e. 'YourTable') then paste just its structure back into your database with a different name (i.e. 'YourTable_new').
Change the fields in the new table to what you want and save it.
Create an append query and copy all the data from your old table into the new one.
Hopefully Access will automatically convert the old text field directly to the correct value for the new Date/Time field. If not, you might have to clear out the table you appended into and re-append all the data, using a string-to-date function to convert that one field when you do the append.
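For example, a hedged sketch of such an append query in Access SQL, using CDate() as the string-to-date function (the field list and table names here are placeholders for your real columns):

INSERT INTO YourTable_new ( [Date Last Modified], OtherField1, OtherField2 )
SELECT CDate([Date Last Modified]), OtherField1, OtherField2
FROM YourTable;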
Also, if there is an autonumber field in the old table this might not work because there is no way to ensure that the old autonumber values will line up with the new autonumber values that get assigned.
You've been offered a bunch of different ways to get around the disk space error message.
Have you tried adding a new field to your existing table with the Date/Time data type and then updating that field with the value of the existing string date field? If that works, you can then delete the old field and rename the new one to the old name. That would probably take up less temp space than doing a direct conversion from string to date on a single field.
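A rough sketch of that approach in Access SQL (the table and field names are placeholders, and CDate() is just one way to do the conversion; run the statements one at a time, as noted earlier):

-- Add the new Date/Time field alongside the old text field.
ALTER TABLE BigTable ADD COLUMN DateLastModifiedNew DATETIME;

-- Fill it from the existing text field. Rows whose text cannot be
-- interpreted as a date will fail to convert and need cleaning first.
UPDATE BigTable
SET DateLastModifiedNew = CDate([Date Last Modified]);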
If it still doesn't work, you may be able to do it with a second table with two columns: the first a long integer (make it the primary key), the second a date. Append the primary key and the string date field into this empty table. Then add a new date field to the existing table and, using a join, update the new field with the values from the two-column table.
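In Access SQL, the join update for that last step might look something like this (again with placeholder names, assuming the two-column table is called DateStage):

UPDATE BigTable INNER JOIN DateStage ON BigTable.ID = DateStage.ID
SET BigTable.DateLastModifiedNew = DateStage.ConvertedDate;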
This may run into the same problem. It depends on a number of things internal to the Jet/ACE database engine over which we have no real control.