Hey, can someone tell me what the Field, File and Index .ddf files do in Pervasive? Do they have to be changed or updated when a table definition changes? Any insight would be GREATLY appreciated.
Cheers.
FILE.DDF links the underlying Btrieve Data files to a logical table name.
FIELD.DDF uses the File Id from FILE.DDF to define all of the fields including offsets, data types, etc for each table.
INDEX.DDF defines the indexes on the fields in FIELD.DDF.
They are the field information metadata used by PSQL to access the data files through a relational access method (ODBC, OLEDB, ADO.NET, etc).
They do have to be changed if the underlying data file is changed through Btrieve. If the table definition changes through SQL (like ALTER TABLE statements), the Pervasive Control Center, DTI (Distributed Tuning Interface), DTO (Distributed Tuning Object), PDAC, ActiveX, or DDF Builder then the DDFs are updated automatically.
I'm SUPER new to Snowflake and Snowpark, but I do have respectable SQL and Python experience. I'm trying to use Snowpark to do my data prep and eventually use it in a data science model. However, I cannot write to the database I'm pulling from -- I need to create all tables in a second DB.
I've created code blocks to represent both input and output DBs in their own sessions, but I'm not sure that's helpful, since I have to be in the first session in order to even get the data.
I use code similar to the following to create a new table while in the session for the "input" DB:
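from snowflake.snowpark.functions import col

# `session` is an existing Snowpark Session connected to the "input" database
my_table = session.table("<SCHEMA>.<TABLE_NAME>")
my_table.toPandas()  # pulls the rows into a local pandas DataFrame (result unused here)
table_info = my_table.select(
    col("<col_name1>"),
    col("<col_name2>"),
    col("<col_name3>").alias("<new_name>"),
    col("<col_name4>"),
    col("<col_name5>"),
)
table_info.write.mode('overwrite').saveAsTable('MAINTABLE')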
I need to save the table MAINTABLE to a secondary database that is different from the one where the data was pulled from. How do I do this?
It is possible to provide a fully qualified name:
table_info.write.mode('overwrite').saveAsTable('DATABASE_NAME.SCHEMA_NAME.MAINTABLE')
DataFrameWriter.save_as_table
Parameters:
table_name – A string or list of strings that specify the table name or fully-qualified object identifier (database name, schema name, and table name).
I am working on a project to create a simplified version of the SQLite database. I got stuck trying to figure out how it manages to store the data of multiple tables with different schemas in a single file. I suppose it uses some indexes to map the data of the different tables. Can someone shed more light on how it's actually done? Thanks.
Edit: I suppose there is already an explanation in the docs, but looking for some easier way to understand it better and faster.
In SQLite, the schema is the list of all entities (tables, views, etc.) in the database as a whole; a database does not consist of many separate schemas, one per entity.
The data itself is stored in pages, each page being owned by one entity. It is these pages that are written to the file.
The default page size is 4K. You will notice that the file size is always a multiple of the page size. You could also experiment: create a database with some tables, note its size, then add some data, and if the added data does not require another page, see that the size of the file is the same. This demonstrates that it's all about pages rather than a linear/contiguous stream of data.
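The page-based layout can be checked with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module; the file name, the two tables, and the helper `file_size_in_pages` are made up for illustration, and the page size is read from the database with PRAGMA page_size rather than assumed to be 4K.

```python
import os
import sqlite3
import tempfile


def file_size_in_pages(db_path):
    """Return (file size in bytes, page size, page count) for a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    finally:
        conn.close()
    return os.path.getsize(db_path), page_size, page_count


# Build a throwaway database with two tables of different shapes.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.commit()
conn.close()

size, page_size, page_count = file_size_in_pages(db_path)
# The file is a whole number of pages: size == page_size * page_count.
print(size, page_size, page_count)
```

Each table gets its own root page in addition to the first page of the file, which is why even two empty tables already occupy several pages.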
The schema is saved in a table called sqlite_master. This table has the columns :-
type (the type, e.g. table, index etc),
name (the name given to the entity),
tbl_name (the table to which the entity applies),
rootpage (the number of the entity's first page),
sql (the SQL used to generate the entity, if any).
Note that another schema table, sqlite_temp_master, may also exist if there are temporary tables.
For example :-
Using SELECT * FROM sqlite_master; could result in something like :-
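A minimal way to try this, assuming Python's built-in sqlite3 module and a made-up `users` table with one index (both names are only for illustration):

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")

# One row per entity: its type, name, owning table, root page and defining SQL.
schema_rows = conn.execute(
    "SELECT type, name, tbl_name, rootpage, sql "
    "FROM sqlite_master ORDER BY rootpage"
).fetchall()
for row in schema_rows:
    print(row)
conn.close()
```

The rootpage value of each row is what ties an entity's name to the pages that hold its data, which is how tables with different column layouts can share one file.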
2.6. Storage Of The SQL Database Schema
I want to migrate a database from Btrieve (PSQL) to Oracle. For this I'll first export my source DB to CSV, then import the exported CSV into the target DB.
I'm not sure, but as far as I know it is not possible to retain the schema when exporting a DB to CSV.
It retains its schema insofar as it can tell you the column names, and column order. And from values, you can derive the column type (for example lots of unquoted numbers suggest an int or decimal type).
But it doesn't maintain useful things like primary keys, foreign keys, constraints, defaults.
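As a rough sketch of deriving column types from values, the following uses only Python's standard csv module. The function names, the three type names, and the sample data are assumptions for illustration; a real migration would need to handle dates, quoting conventions, and nulls more carefully.

```python
import csv
import io


def infer_type(values):
    """Guess a SQL-ish type name from a column's string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"
    # Try the narrowest type first; fall back to TEXT if nothing fits.
    for caster, name in ((int, "INTEGER"), (float, "REAL")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "TEXT"


def infer_schema(csv_text):
    """Map each header name to a guessed type, based on all data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data)) if data else [() for _ in header]
    return {name: infer_type(col) for name, col in zip(header, columns)}


sample = "id,price,name\n1,9.99,Widget\n2,12,Gadget\n"
schema = infer_schema(sample)
print(schema)
```

Nothing in the CSV says which column is a key or what the defaults are, so those still have to come from the source database's own definitions.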
You can try copying a table schema from the source DB, then running it against your new DB to see if it works (with some minor tweaks). Or you could use a tool like Liquibase, which should be able to help here.
I have several CSV files and have their corresponding tables (which will have same columns as that of CSVs with appropriate datatype) in the database with the same name as the CSV. So, every CSV will have a table in the database.
I somehow need to map all of them dynamically. Once I run the mapping, the data from all the CSV files should be transferred to the corresponding tables. I don't want to have a different mapping for every CSV.
Is this possible through informatica?
Appreciate your help.
PowerCenter does not provide such a feature out of the box. Unless the structures of the source files and target tables are the same, you need to define separate source/target definitions and create mappings that use them.
However, you can use Stage Mapping Generator to generate a mapping for each file automatically.
My understanding is that you have many CSV files with different column layouts and you need to load them into the appropriate tables in the database.
Approach 1: If you use any RDBMS, you should have some kind of import option. Explore that route to create tables based on the CSV files. This is a manual task.
Approach 2: Open the CSV file and write formulae that use the header to generate a CREATE TABLE statement. Execute the formula result in your DB, so that you end up with many tables created. Then use Informatica to import the table definitions, read the CSV files and load the data into the tables.
Approach 3: Use Informatica. You need to do a lot of coding to create a dynamic mapping on the fly.
Proposed solution:
Mapping 1:
1. Read the CSV file and pass the header information to a Java transformation.
2. The Java transformation should normalize and split the header columns into rows. You can write them to a text file.
3. Now you have all the columns in a text file. Read this text file and use an SQL transformation to create the tables in the database.
Mapping 2:
Now that the table is available, read the CSV file, excluding the header, and load the data into the table created by mapping 1 via an SQL transformation (insert statement).
You can follow this approach for all the CSV files. I haven't tried this solution at my end, but I am sure that the above approach would work.
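Outside Informatica, the same two steps (create the table from the header, then insert the remaining rows) can be sketched in plain Python. This is not an Informatica mapping, just an illustration of the idea using the standard csv and sqlite3 modules; the helper names, the feed names, and the sample data are made up, and the columns are created untyped for brevity.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote a table or column name so odd header text cannot break the SQL."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, table_name, csv_file):
    """Create table_name from the CSV header, then insert every data row."""
    reader = csv.reader(csv_file)
    header = next(reader)  # step 1: the header defines the columns
    cols = ", ".join(quote_ident(c) for c in header)
    table = quote_ident(table_name)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    # step 2: the remaining rows are loaded with one parameterised INSERT
    conn.executemany(
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", reader
    )
    return len(header)


# Each feed name doubles as the target table name, as in the question.
feeds = {
    "customers": "id,name\n1,Ann\n2,Bob\n",
    "orders": "order_id,customer_id,total\n10,1,25.5\n",
}
conn = sqlite3.connect(":memory:")
for table_name, text in feeds.items():
    load_csv(conn, table_name, io.StringIO(text))

counts = {
    t: conn.execute(f"SELECT COUNT(*) FROM {quote_ident(t)}").fetchone()[0]
    for t in feeds
}
print(counts)
```

Because the table name and column list are both taken from the input, one routine covers every file, which is the property the question asks for.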
If you're not using any transformations, it's wise to use the import option of the database (e.g. a BTEQ script in Teradata). But if you are doing transformations, then you have to create as many sources and targets as the number of files you have.
On the other hand you can achieve this in one mapping.
1. Create a separate flow for every file(i.e. Source-Transformation-Target) in the single mapping.
2. Use target load plan for choosing which file gets loaded first.
3. Configure the file names and corresponding database table names in the session for that mapping.
If all the mappings (if you have to create them separately) are the same, use the Indirect file method. In the session properties, under the Mapping tab, in the source options, you will find this setting. The default is Direct; change it to Indirect.
I don't have the tool at hand to explore further and guide you more precisely, but do explore the Indirect file load type in Informatica. I am sure that this will solve the requirement.
I have written a workflow in Informatica that does it, but some of the complex steps are handled inside the database. The workflow watches a folder for new files. Once it sees all the files that constitute a feed, it starts to process the feed. It takes a backup in a time stamped folder and then copies all the data from the files in the feed into an Oracle table. An Oracle procedure gets to work and then transfers the data from the Oracle table into their corresponding destination staging tables and finally the Data Warehouse. So if I have to add a new file or a feed, I have to make changes in configuration tables only. No changes are required either to the Informatica Objects or the db objects. So the short answer is yes this is possible but it is not an out of the box feature.
I need to store the change history of data in a database. For example, at some point a user modifies a property of some piece of data. The expected result is that we can get the change history for one piece of data, like
Tom changed title to 'Title one;'
James changed name to 'New name'
Steve added new_tag 'tag23'
Based on these change histories we can get all versions for some data.
Any good ideas on how to achieve this? It is not limited to traditional relational databases.
These are commonly called audit tables. I generally manage this using triggers on the database. For every insert/update on a source table, the trigger copies the data into another table with the same name plus _AUDIT appended to it (the naming convention does not matter, it's just what I use). Oracle provides you with something called journal tables. Using Oracle Designer (or manually) you can achieve the same thing, and developers often append _JN to the name of the journal/audit table. This works the same way, with triggers on the source table copying data into the audit table.
EDIT:
I should also note that you can create a new separate schema to manage just your audit tables or you can keep them in your schema with the source tables. I do both, it just depends on the situation.
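The trigger approach can be sketched as follows. This is a minimal illustration using SQLite through Python's sqlite3 module rather than Oracle; the `item` table, its columns, the `action` and `changed_at` audit columns, and the trigger names are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, name TEXT);

-- Audit table: same columns as the source table plus bookkeeping columns.
CREATE TABLE item_AUDIT (
    audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    action     TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    id INTEGER, title TEXT, name TEXT
);

-- Copy the new row into the audit table on every insert and update.
CREATE TRIGGER item_ai AFTER INSERT ON item BEGIN
    INSERT INTO item_AUDIT (action, id, title, name)
    VALUES ('INSERT', NEW.id, NEW.title, NEW.name);
END;
CREATE TRIGGER item_au AFTER UPDATE ON item BEGIN
    INSERT INTO item_AUDIT (action, id, title, name)
    VALUES ('UPDATE', NEW.id, NEW.title, NEW.name);
END;
""")

conn.execute("INSERT INTO item (id, title, name) VALUES (1, 'Draft', 'Old name')")
conn.execute("UPDATE item SET title = 'Title one' WHERE id = 1")
conn.execute("UPDATE item SET name = 'New name' WHERE id = 1")

# Every row in the audit table is one historical version of the item.
history = conn.execute(
    "SELECT action, title, name FROM item_AUDIT ORDER BY audit_id"
).fetchall()
for row in history:
    print(row)
```

This sketch does not record who made the change; that would need an extra column that the application fills in, since the trigger itself only sees the row data.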
I wrote an article about various options: http://blog.schauderhaft.de/2009/11/29/versioned-data/
If you are not tied to a relational database, there are things called 'append only' databases (I think), which never change data, but only append new versions. For your case this sounds kind of perfect. Unfortunately I don't know of any implementation.
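The append-only idea itself is simple enough to sketch in memory. The following is not a database and not any particular product, only an illustration in plain Python: changes are appended and never modified, and any version is rebuilt by replaying the log. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One immutable log entry: who set which field to what."""
    user: str
    field: str
    value: str


class ChangeLog:
    def __init__(self):
        self._changes = []

    def append(self, user, field, value):
        """Record a change and return the version number it produces."""
        self._changes.append(Change(user, field, value))
        return len(self._changes)

    def version(self, n):
        """Rebuild the state as it was after the first n changes."""
        state = {}
        for change in self._changes[:n]:
            state[change.field] = change.value
        return state

    def describe(self):
        """Human-readable history, one line per change."""
        return [
            f"{c.user} changed {c.field} to '{c.value}'" for c in self._changes
        ]


log = ChangeLog()
log.append("Tom", "title", "Title one")
log.append("James", "name", "New name")
latest = log.append("Steve", "new_tag", "tag23")

print(log.describe())
print(log.version(2))
print(log.version(latest))
```

Replaying from the start keeps the sketch short; with long histories one would typically store periodic snapshots so that a version does not have to be rebuilt from the first change.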