My question is as follows, when using postgresql or mongodb if i have new data flowing into the DB can it handle duplicates?
Going more into the details the data is dynamic for example product AAA price 100$ type package, after the initial creation future updates of the data can contain old entries so there is a duplicates issue and also some fields of the older entries might be updated so what i want is only to get the updated data without creating duplicate entries so if the price changes on the older entries that gets updated and if there are newer entries they get inserted.
Is it possible on the DB level or am i asking for too much?
Related
My question is exactly as described above
My case is:
I need to migrate old server data to a new server
I have extracted all the required data on excel sheets
Example of an entry of an article
id 200
title whatever
linked_item_id 1400
Many deletion occured as filtering to the old data so it's not starting from 1 anymore
That's why when i import this data on the new server, I'll need the article to have ID 200 and the linked item to be 1400 so i can associate it on the new system using xlsx importer with a custom methods
After that, normal and proper usage for the DB should be happening through the app (This is to import the data only)
3 Questions here which i need the answer for:
1- I just want to make sure that updating the ID after saving the record won't harm the DB, will it ?
2- Will that affect the performance of loading the relations ?, as the foreign_keys isn't entered properly
3- Assuming i have newly initiated DB and i added first record and it got ID 1, then i manually updated the record to be ID 200
Will the next record take 201 or 2 ?
Thanks in advance
I have a 290 million source data set and I get a daily download of 12 million records daily which contain data from the previous days downloads. I am having trouble inserting the daily records into the source and excluding the records I already have. Some of the records that are new may not be from the previous day they could be several days back so a date restriction wont work. Please help.
I just had this exact same issue basically in your Data flow of your SSIS you need to add a Lookup. Have it match the data your inserting to the new data based on the PK. then you can separate the data from here, choose Redirect Rows to no match output. This will make the green arrow contain all data that is no present.
Lookup component using a key field and with the no match output, do an insert (you could also with the match output do an update; though 290million rows IS going to take A WHILE)...
I've got a table in SQLite, and it already has many rows stored in it. I know realise I need another column in the table. Up to now I've just deleted the database and started again because the data has just been test data. But now the data in the database can't be deleted.
I know the query to add a column to the table, my question is what is a good way to do this so that it works for both existing users and new users? (I have updated the CREATE query I have for when the table is not found (because it's a new user or an existing user has cleared the database). It seems wrong to have an ALTER query in software that ships, and check every time. Is there some way of telling SQLite to automatically add the column if it doesn't exist during the UPDATE query I now need?
If I discover I need more columns in the future, is having a bunch of ALTER statements on startup (or somewhere?) really the best way to do it?
(If relevant this is for a node js app)
I'd just throw a table somewhere that marks what version of your database it is, and check that to determine if an update is needed. Either that or if you have a table already where there's always going to be just one record in it add a new field 'DatabaseVersion' to it.
So for example if you check the version number, and find it's a version 1 database when the newest version should be version 3, you know which updates to perform on it.
You can use PRAGMA user_version to store the version number of the database and check if the database needs to be updated.
The project I'm working on tracks data on a year by year basis. The user will log into the system and specify the year it wants to access the data of. For example, the user could specify the year 2004, and the .jsp pages will display 2004 data.
My problem is that from 2013 onward, the data for one .jsp page will be different, and the current database table schema needs to be modified, but backwards compatibility for the 2012 and before years needs to be maintained.
Currently (2012 and before), the relevant database table displays two columns, "continuing students" & "new starts" that is displayed by a single .jsp. For 2013 and onward, 4 columns need to be displayed. The original two columns are being split into two subcategories each, undergrad and graduate. So I can't simply add those new columns to the existing table because that would violate third normal form.
What do you think the best way to manage this situation? How do I display the new data while still maintaining backwards compatibility to display the data for older years?
Some options:
Introduce the fields and allow for nulls for older data. I think you rejected this idea.
Create new table structures to store the new data going forward. It's an least an option if you don't want (1). You could easily create a view that queries from both tables and presents a unified set of data. You could also handle this in the UI and call two separate stored procedures depending on the year queried.
Create a new table with the new attributes and then optionally join back to your original table. This keeps the old table the same and the new table is just an extension of the old data. You would write a stored procedure potentially to take in the year and then return the appropriate data.
One of the things to really consider is that the old data is now inactive. If you aren't writing to it anymore, it's just historical data that can be "archived" mentally. In that case I think it's ok to freeze the schema and the data and let it live by itself.
Also consider if your customers are likely to change the schema yet again. If so, then maybe (1) is the best.
I receive new data files every day. Right now, I'm building the database with all the required tables to import the data and perform the required calculations.
Should I just append each new day's data to my current tables? Each file contains a date column, which would allow for a "WHERE" query in the future if I need to analyze data for one particular day. Or should I be creating a new set of tables for every day?
I'm new to database design (coming from Excel). I will be using SQL Server for this.
Assuming that the structure of the data being received is the same, you should only need one set of tables rather than creating new tables each day.
I'd recommend storing the value of the date column from your incoming data in your database, and also having a 'CreateDate' column in your tables, with a default value of 'GetDate()' so that it automatically gets populated with the current date when the row is inserted.
You may also want to have another column to store the data filename that the row was imported from, but if you're already storing the value of the date column and the date that the row was inserted, this shouldn't really be necessary.
In the past, when doing this type of activity using a custom data loader application, I've also found it useful to create log files to log success/error/warning messages, including some type of unique key of the source data and target database - ie. if coming from an Excel file and going into a database column, you could store the row index from Excel and the primary key of the inserted row. This helps tracking down any problems later on.
You might want to consider having a look at SSIS (SqlServer Integration Services). It's the SqlServer tool for doing ETL activities.
yes, append each day's data to the tables; 1 set of tables for all data.
yes, use a date column to identify the day that the data was loaded.
maybe have another table with a date column and a clob column. The date to contain the load date and the clob to contain the file that you imported.
Good question. You most definitely should have a single set of tables and append the data daily. Consider this: if you create a new set of tables each day, what would, say, a monthly report query look like? A quarterly report query? It would be a mess, with UNIONs and JOINs all over the place.
A single set of tables with a WHERE clause makes the querying and reporting manageable.
You might do a little reading on relational database theory. Wikipedia is a good place to start. The basics are pretty straightforward if you have the knack for it.
I would have the data load into a stage table regardless and append to the main tables after. Once a week i would then refresh all data in the main table to ensure that the data remains correct as per the source.
Marcus