I have three tables/entities for which I want to preserve their data when I load Doctrine2 fixtures. Of course, right now, when I run doctrine:fixtures:load, it purges the entire database (except migration_versions) and then loads the fixtures appropriately.
I realize that I can use the --append switch to only add data to the database, but I do want to remove most of the data from the database.
How do I preserve table data from only three tables/entities when using Doctrine2 fixtures?
How about you separate the "append-only" and "delete-only" fixtures classes into separate folders and then run two console commands specifying the fixtures path using --fixtures ?
Related
Background: I have few models which are materialized as 'Table'. This tables are populated with wipe(Truncate) and Load. Now I want to protect my existing data in the Table if the query used to populate data is returning empty result set. How can I make sure an empty result set is not replacing my existing data in table.
My table lies in Snowflake and using dbt to model the output table.
Nutshell: Commit the transaction only when SQL statement used is returning Not empty result set.
Have you tried using dbt ref() function, which allows us to reference one model within another?
https://docs.getdbt.com/reference/dbt-jinja-functions/ref
If you are loading data in a way that is not controlled via dbt and then you are using this table - this is called a source. You can read more about this in here.
dbt does not control what you load into a source, everything else that is the T in the ELT is controlled where you reference a model via ref() function. A great example if you have a source that changes and you load it into a table and make sure that incoming data does not "drop" already recorded data is "incremental" materialization. I suggest you read more in here.
Thinking incremental takes time and practise, also it is recommended every now and then to do a --full-refresh.
You can have pre-hooks and post-hooks that can check your sources with clever macros and add dbt tests. We would really need a little bit more context of what you have and what you wish to achieve to suggest a real response.
I'm testing out a trial version of Snowflake. I created a table and want to load a local CSV called "food" but I don't see any "load" data option as shown in tutorial videos.
What am I missing? Do I need to use a PUT command somewhere?
Don't think Snowsight has that option in the UI. It's available in the classic UI though. Go to Databases tab, select a database. Go to Tables tab and select a table the option will be at the top
If the classic UI is limiting you or you are already using Snowsight and don't want to switch back, then here is another way to upload a CSV file.
A preliminary is that you have installed SnowSQL on your device (https://docs.snowflake.com/en/user-guide/snowsql-install-config.html).
Start SnowSQL and perform the following steps:
Use the database where to upload the file to. You need various privileges for creating a stage, a fileformat, and a table. E.g. USE MY_TEST_DB;
Create the fileformat you want to use for uploading your CSV file. E.g.
CREATE FILE FORMAT "MY_TEST_DB"."PUBLIC".MY_FILE_FORMAT TYPE = 'CSV';
If you don't configure the RECORD_DELIMITER, the FIELD_DELIMITER, and other stuff, Snowflake uses some defaults. I suggest you have a look at https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html. Some of the auto detection stuff can make your life hard and sometimes it is better to disable it.
Create a stage using the previously created fileformat
CREATE STAGE MY_STAGE file_format = "MY_TEST_DB"."PUBLIC".MY_FILE_FORMAT;
Now you can put your file to this stage
PUT file://<file_path>/file.csv #MY_STAGE;
You can find documentation for configuring the stage at https://docs.snowflake.com/en/sql-reference/sql/create-stage.html
You can check the upload with
SELECT d.$1, ..., d.$N FROM #MY_STAGE/file.csv d;
Then, create your table.
CREATE TABLE MY_TABLE (col1 varchar, ..., colN varchar);
Personally, I prefer creating first a table with only varchar columns and then create a view or a table with the final types. I love the try_to_* functions in snowflake (e.g. https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html).
Then, copy the content from your stage to your table. If you want to transform your data at this point, you have to use an inner select. If not then the following command is enough.
COPY INTO mycsvtable from #MY_STAGE/file.csv;
I suggest doing this without the inner SELECT because then the option ERROR_ON_COLUMN_COUNT_MISMATCH works.
Be aware that the schema of the table must match the format. As mentioned above, if you go with all columns as varchars first and then transform the columns of interest in a second step, you should be fine.
You can find documentation for copying the staged file into a table at https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
If you can check the dropped lines as follows:
SELECT error, line, character, rejected_record FROM table(validate("MY_TEST_DB"."MY_SCHEMA"."MY_CSV_TABLE", job_id=>'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'))
Details can be found at https://docs.snowflake.com/en/sql-reference/functions/validate.html.
If you want to add those lines to your success table you can copy the the dropped lines to a new table and transform the data until the schema matches with the schema of the success table. Then, you can UNION both tables.
You see that it is pretty much to do for loading a simple CSV file to Snowflake. It becomes even more complicated when you take into account that every step can cause some specific failures and that your file might contain erroneous lines. This is why my team and I are working at Datameer to make these types of tasks easier. We aim for a simple drag and drop solution that does most of the work for you. We would be happy if you would try it out here: https://www.datameer.com/upload-csv-to-snowflake/
Context: Our business users receive excel sheets (.xlsx) via mail that they want to import into Foundry. We agreed on a given structure and naming convention for the files and tabs in order to simply drag and drop them into a specific folder and append them to the existing dataset. The change of this existing dataset then triggers a pipeline (raw->clean->ontology).
Issue: We use "Additional Columns" to clean up the data and apply some logic based on them (_filePath, _byteOffset, _importedAt) but every time a new excel is appended the schema seems to be reset and the "Additional Columns" are unticked.
Is there a way of keeping the "Additional Columns" after importing and appending an excel sheet to an existing dataset?
Unfortunately, imports through the drag-and-drop interface always replace the existing schema on import which is why you are losing the additional columns. If you can create the files as CSV's instead of XLS then you can append and keep the existing schema, including the additional columns. Another approach, albeit indirect, would be to have an additional step between raw and clean that calls the metadata API to add the optional columns.
You'd want to set these textParserParam arguments:
textParserParams["addFilePath"] = True
textParserParams["addByteOffset"] = True
textParserParams["addImportedAt"] = True
I am working in ODI12c Project. I have a Scenario in which i need to load multiple files in a single table Parallelly.
I tried through file List and using Loop in ODI package,but its loading data in serial i.e. 1st file then 2nd file etc.
Please suggest how should i load data Parallelly.
If you want to load multiple files in odi in one go you can merge all those files in one single file and load it in a single go. I'm considering all files have same structure. if you want to load them separately then creating separate model for each file will be headache and re usability of code will also compromised. You can run scenarios in asynchronous mode to made them run independent . I believe this is my best answer till now.
I have an another approach to achieve this:
Create a simple package for loading the file.
Create a another package which is calling the another one.
Load all the files name in a oracle table like:
File_id Name
1 File1
2 File2
Now create a increment variable with pull the name of 1st file and then 2 .
Now call the file loading package inside another package & make it asynchronous.
Pass the increment variable which is having the file name into model created for loading file ( resource Name).
Each time when variable is getting incremented it's gonna pull the next file and create a different session for each file loading . there might be a gap of 1 sec for each file loading because of increment of variables,i think that would be ok.
I hope you got the way in case any help you can add comment.
Is your target table partitioned?
The bottleneck you might be having could be due to the target destination being the same physical storage.
Partitioning the target (or similar techniques) might help
I have several CSV files and have their corresponding tables (which will have same columns as that of CSVs with appropriate datatype) in the database with the same name as the CSV. So, every CSV will have a table in the database.
I somehow need to map those all dynamically. Once I run the mapping, the data from all the csv files should be transferred to the corresponding tables.I don't want to have different mappings for every CSV.
Is this possible through informatica?
Appreciate your help.
PowerCenter does not provide such feature out-of-the-box. Unless the structures of the source files and target tables are the same, you need to define separate source/target definitions and create mappings that use them.
However, you can use Stage Mapping Generator to generate a mapping for each file automatically.
PMy understanding is you have mant CSV files with different column layouts and you need to load them into appropriate tables in the Database.
Approach 1 : If you use any RDBMS you should have have some kind of import option. Explore that route to create tables based on csv files. This is a manual task.
Approach 2: Open the csv file and write formuale using the header to generate a create tbale statement. Execute the formula result in your DB. So, you will have many tables created. Now, use informatica to read the CSV and import all the tables and load into tables.
Approach 3 : using Informatica. You need to do lot of coding to create a dynamic mapping on the fly.
Proposed Solution :
mapping 1 :
1. Read the CSV file pass the header information to a java transformation
2. The java transformation should normalize and split the header column into rows. you can write them to a text file
3. Now you have all the columns in a text file. Read this text file and use SQL transformation to create the tables on the database
Mapping 2
Now, the table is available you need to read the CSV file excluding the header and load the data into the above table via SQL transformation ( insert statement) created by mapping 1
you can follow this approach for all the CSV files. I haven't tried this solution at my end but, i am sure that the above approach would work.
If you're not using any transformations, its wise to use Import option of the database. (e.g bteq script in Teradata). But if you are doing transformations, then you have to create as many Sources and targets as the number of files you have.
On the other hand you can achieve this in one mapping.
1. Create a separate flow for every file(i.e. Source-Transformation-Target) in the single mapping.
2. Use target load plan for choosing which file gets loaded first.
3. Configure the file names and corresponding database table names in the session for that mapping.
If all the mappings (if you have to create them separately) are same, use Indirect file Method. In the session properties under mappings tab, source option.., you will get this option. Default option will be Direct change it to Indirect.
I dont hav the tool now to explore more and clearly guide you. But explore this Indirect File Load type in Informatica. I am sure that this will solve the requirement.
I have written a workflow in Informatica that does it, but some of the complex steps are handled inside the database. The workflow watches a folder for new files. Once it sees all the files that constitute a feed, it starts to process the feed. It takes a backup in a time stamped folder and then copies all the data from the files in the feed into an Oracle table. An Oracle procedure gets to work and then transfers the data from the Oracle table into their corresponding destination staging tables and finally the Data Warehouse. So if I have to add a new file or a feed, I have to make changes in configuration tables only. No changes are required either to the Informatica Objects or the db objects. So the short answer is yes this is possible but it is not an out of the box feature.