I have two tables, A and B, related by a common field. I want to update a field in a subset of the rows of table B, where that subset is selected by a value from table A. The new value also comes from table B, but from a different subset of rows selected by another value from table A. Can this be done in DB2 with a single query?
Thank you
I am currently using ClickHouse to store a few billion rows each week. We use aggregated tables to fetch data, and so far so good. Now we need to fetch a single row from this database.
ClickHouse is not designed for this kind of point lookup: even after applying the optimizations recommended by ClickHouse, a single-row SELECT is still slow (a few seconds).
To clarify a little more: the table is indexed by columns a, b, c, and d and partitioned monthly (it has some other columns as well). A new service has to query this table but only knows a, b, and z (a UUID column). The average response time is between 3 and 10 seconds on 10 billion rows.
I have the option of adding an extra data store layer, i.e. copying the data into another database for this use case.
Now the actual question: which database would be best suited to a case where we only need to read a single row out of billions?
P.S.:
Due to storage and network costs, we can't use Redis.
We can't add more columns to the select query to optimize it.
Cassandra?
You can use an additional table and a materialized view to emulate an inverted index.
This additional table should be sorted by z and contain the primary key columns (a, b, c, d) of the main table.
Then query the main table like this:
select ... from main_table where (a,b,c,d) in
( select a,b,c,d from additional_table where z= ... )
and z = ...
additional_table can be filled automatically from main_table by the materialized view.
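A minimal sketch of that setup in ClickHouse SQL, assuming the column types shown (they are hypothetical; use the real types of a, b, c, d and z from main_table). The view name additional_table_mv and the UUID literal are also placeholders.

```sql
-- Lookup table sorted by z, so a filter on z uses its primary index.
-- Column types are hypothetical: copy them from main_table.
CREATE TABLE additional_table
(
    z UUID,
    a UInt32,
    b UInt32,
    c UInt32,
    d Date
)
ENGINE = MergeTree
ORDER BY z;

-- Copies (z, a, b, c, d) of every block inserted into main_table from now on.
CREATE MATERIALIZED VIEW additional_table_mv TO additional_table AS
SELECT z, a, b, c, d
FROM main_table;

-- One-off backfill for rows that existed before the view was created.
INSERT INTO additional_table
SELECT z, a, b, c, d
FROM main_table;

-- Point lookup: the subquery finds the key by z, the outer query then
-- reads main_table through its own sorting key (a, b, c, d).
SELECT *
FROM main_table
WHERE (a, b, c, d) IN
(
    SELECT a, b, c, d
    FROM additional_table
    WHERE z = toUUID('00000000-0000-0000-0000-000000000001')
)
AND z = toUUID('00000000-0000-0000-0000-000000000001');
```

The materialized view only sees new inserts, hence the separate backfill. Rows inserted between creating the view and running the backfill may end up twice in additional_table; that is harmless here, because the IN lookup ignores duplicates.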
I have two databases, A and B. Both databases have a table called 'company'.
I want to check whether the two tables contain identical data.
You could export the data from both tables (with the same ORDER BY clause) and compare the resulting files, or you could define a postgres_fdw foreign table that presents one table in the other database, then use EXCEPT in both directions to compute the differences.
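A sketch of the postgres_fdw route, run in database A. The server name db_b_server, the connection options, the credentials and the local schema db_b are hypothetical. It also assumes both tables have the same columns in the same order.

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Connection to database B (hypothetical host, port and credentials).
CREATE SERVER db_b_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'B');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER db_b_server
    OPTIONS (user 'b_user', password 'b_password');

-- Expose B's company table locally as db_b.company.
CREATE SCHEMA IF NOT EXISTS db_b;
IMPORT FOREIGN SCHEMA public LIMIT TO (company)
    FROM SERVER db_b_server INTO db_b;

-- EXCEPT is one-directional, so check both directions.
-- An empty result means the tables hold the same set of rows.
SELECT 'only in A' AS side, *
FROM (SELECT * FROM public.company EXCEPT SELECT * FROM db_b.company) AS a_only
UNION ALL
SELECT 'only in B' AS side, *
FROM (SELECT * FROM db_b.company EXCEPT SELECT * FROM public.company) AS b_only;
```

Plain EXCEPT removes duplicates. If the tables can contain duplicate rows and the duplicate counts matter, use EXCEPT ALL instead.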
I am working on a star schema and want to track the history of some dimensions, specifically of some of their columns. Is it possible to use temporal tables as an alternative? If so, how do I store the current record in a temporal table? Also, does it make sense for the source of my dimension to be the history table of my temporal table?
Determining whether two rows or expressions are equal can be difficult and resource-intensive. This is the case, for example, with UPDATE statements that are conditional on whether all of the columns of a given row are equal.
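In SQL Server, the CHECKSUM function helps in your case: it computes a hash value over a list of columns, so two records can be compared through a single value. The value is not guaranteed to be unique, though. Different inputs can produce the same checksum, so HASHBYTES is the safer choice when a missed change would matter.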
You then compare your two sources, which are logically the ODS and the data warehouse. If the checksums of the two versions of a record differ, you update (expire) the old record and insert the new version.
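A T-SQL sketch of that compare-expire-insert cycle (a type 2 slowly changing dimension). The tables ods.Customer and dw.DimCustomer, and the columns CustomerId, Name, City, Segment, ValidFrom, ValidTo and IsCurrent, are hypothetical. The dimension is also assumed to have an identity surrogate key that is omitted from the INSERT.

```sql
BEGIN TRANSACTION;

-- 1. Expire the current version of every row whose tracked columns changed.
UPDATE d
SET IsCurrent = 0,
    ValidTo   = SYSUTCDATETIME()
FROM dw.DimCustomer AS d
JOIN ods.Customer   AS s
  ON s.CustomerId = d.CustomerId
WHERE d.IsCurrent = 1
  AND CHECKSUM(s.Name, s.City, s.Segment) <> CHECKSUM(d.Name, d.City, d.Segment);

-- 2. Insert a new current version for changed rows (expired in step 1)
--    and for brand-new customers; both now lack a current row.
INSERT INTO dw.DimCustomer (CustomerId, Name, City, Segment, ValidFrom, ValidTo, IsCurrent)
SELECT s.CustomerId, s.Name, s.City, s.Segment, SYSUTCDATETIME(), NULL, 1
FROM ods.Customer AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dw.DimCustomer AS d
    WHERE d.CustomerId = s.CustomerId
      AND d.IsCurrent = 1
);

COMMIT TRANSACTION;
```

Only the columns passed to CHECKSUM are tracked, which matches the goal of keeping history for specific columns only. Replacing CHECKSUM(...) with HASHBYTES('SHA2_256', CONCAT_WS('|', ...)) on both sides avoids most collisions, at a higher CPU cost.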
I have been unable to pin down how temporal table histories are stored.
If you have a table with several columns of nvarchar data and one stock quantity column that is updated regularly, does SQL Server store copies of the static columns for each change made to stock quantity, or is there an object-oriented method of storing the data?
I want to include all columns in the history, because there may be rare changes to the nvarchar columns, but I am wary of the size of the history table if millions of quantity updates duplicate the other columns.
I suggest using a SQL Server temporal table only for the values that need monitoring. Otherwise the fixed attribute values are duplicated with every change, because SQL Server stores a whole copy of the row whenever the row is updated. From the docs:
UPDATES: On an UPDATE, the system stores the previous value of the row in the history table and sets the value for the SysEndTime column to the begin time of the current transaction (in the UTC time zone) based on the system clock.
Move the fixed nvarchar attributes to another table and link the two tables with a relationship (1:1 or whatever is suitable).
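A T-SQL sketch of that split, assuming hypothetical names dbo.Product, dbo.ProductStock and dbo.ProductStockHistory. Only the table holding the quantity is system-versioned, so each quantity update adds a narrow history row instead of a copy of the nvarchar columns.

```sql
-- Static, rarely changing descriptive columns.
CREATE TABLE dbo.Product
(
    ProductId   INT           NOT NULL PRIMARY KEY,
    Name        NVARCHAR(200) NOT NULL,
    Description NVARCHAR(MAX) NULL
);

-- Frequently updated quantity, 1:1 with dbo.Product, system-versioned.
CREATE TABLE dbo.ProductStock
(
    ProductId     INT NOT NULL PRIMARY KEY
                  REFERENCES dbo.Product (ProductId),
    StockQuantity INT NOT NULL,
    ValidFrom     DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo       DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductStockHistory));

-- Each update like this copies only (ProductId, StockQuantity, ValidFrom, ValidTo)
-- of the previous version into dbo.ProductStockHistory (42 is a placeholder id).
UPDATE dbo.ProductStock
SET StockQuantity = StockQuantity - 1
WHERE ProductId = 42;
```

If you also want history for the rare nvarchar changes, dbo.Product can be system-versioned as well. Its history table then grows only when those columns actually change.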
See also these related questions under the temporal-tables tag:
SQL Server - Temporal Table - Storage costs
SQL Server Temporal Table Creating Duplicate Records
Duplicates in temporal history table
I am new to SSIS and I hope someone can point me in the right direction!
I need to move data from one database to another. I have written a query that takes data from a number of tables (SOURCE). I then use a conditional split (condition: Id = id) to route the rows to a number of tables in the destination database. Here is my problem: I also need to populate a fourth table that takes the ‘id’ values from the three tables and uses them as attributes, along with additional data from SOURCE.
I think I need to pass the id values to parameters, but there does not seem to be a way to do this when inserting into an ADO NET Destination.
The fourth table will hold the inserted (auto-incremented) id values from table1, table2 and table3.
Am I going about this correctly or is there a better way?
Thanks in advance!
I know of no way to get the IDENTITY values of rows inserted in a Dataflow destination for use in the same Dataflow.
Probably the way to do this is to add a fourth branch to your dataflow that inserts the columns you already have into the fourth table, leaving the foreign keys (the ids from the other 3 tables) empty (NULL).
Then, after the Dataflow, use an Execute SQL Task to call a stored procedure that populates the missing columns in the fourth table by looking up their ids in the other three tables.
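A T-SQL sketch of such a stored procedure. It assumes each of the three tables keeps a business key from SOURCE (called SourceKey here) and that the fourth table stores the matching keys in SourceKey1 to SourceKey3. All table, column and procedure names are hypothetical, and the foreign key columns must be nullable for the first load.

```sql
CREATE PROCEDURE dbo.usp_FillTable4ForeignKeys
AS
BEGIN
    SET NOCOUNT ON;

    -- Resolve each business key to the IDENTITY value generated during the Dataflow.
    UPDATE t4
    SET Table1Id = t1.Id,
        Table2Id = t2.Id,
        Table3Id = t3.Id
    FROM dbo.Table4 AS t4
    JOIN dbo.Table1 AS t1 ON t1.SourceKey = t4.SourceKey1
    JOIN dbo.Table2 AS t2 ON t2.SourceKey = t4.SourceKey2
    JOIN dbo.Table3 AS t3 ON t3.SourceKey = t4.SourceKey3
    WHERE t4.Table1Id IS NULL
       OR t4.Table2Id IS NULL
       OR t4.Table3Id IS NULL;
END;
```

The Execute SQL Task would then run EXEC dbo.usp_FillTable4ForeignKeys;. Because of the inner joins, rows whose keys have no match stay NULL, which makes them easy to find afterwards.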
If your fourth table doesn't have the values you need to look up the ids in the other three tables, you can have the dataflow write to a staging table that does have those values, and then populate the fourth table from the staging table, looking up the ids from the corresponding values.
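A T-SQL sketch of the staging variant, using the same hypothetical SourceKey columns. dbo.Table4Staging and ExtraData are also placeholder names for the staging table and the additional SOURCE data.

```sql
CREATE PROCEDURE dbo.usp_LoadTable4FromStaging
AS
BEGIN
    SET NOCOUNT ON;
    -- Roll back everything on a runtime error, so staging is never
    -- truncated after a failed insert.
    SET XACT_ABORT ON;

    BEGIN TRANSACTION;

    -- Look up the generated ids while inserting the final rows.
    INSERT INTO dbo.Table4 (Table1Id, Table2Id, Table3Id, ExtraData)
    SELECT t1.Id, t2.Id, t3.Id, s.ExtraData
    FROM dbo.Table4Staging AS s
    JOIN dbo.Table1 AS t1 ON t1.SourceKey = s.SourceKey1
    JOIN dbo.Table2 AS t2 ON t2.SourceKey = s.SourceKey2
    JOIN dbo.Table3 AS t3 ON t3.SourceKey = s.SourceKey3;

    -- Clear staging for the next package run.
    TRUNCATE TABLE dbo.Table4Staging;

    COMMIT TRANSACTION;
END;
```

Here the fourth table never contains rows with empty foreign keys, so its foreign key columns can stay NOT NULL.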