Database column Varchar Int dilemma - database

I have an ERD for recipes:
recipe -> recipecomponent <- component
When I insert a recipe with ingredients, I insert into both the recipe and component tables, take the IDs of the inserted rows, and then insert them into the middle table.
The middle table has 2 columns, each a foreign key to the primary key of one of the other 2 tables; those primary keys are auto-increment INT columns.
The problem is that when I insert a recipe with 2 ingredients, I insert 2 rows into component, which means I need to insert 2 component IDs into recipecomponent.
For example:
Say I just inserted a recipe with 2 ingredients.
The inserted recipe gets the ID 1 (AI, INT).
Since it has 2 ingredients, I insert 2 rows into component,
which then have the IDs 1 (AI, INT) and 2 (AI, INT).
I then have to insert those IDs (which are PKs of the 2 tables) as FKs into the middle table.
The expected row in the recipecomponent table would be:
recipeid - componentid
1 || 1 2
How do I insert the component IDs? Do I insert them as an array?
$insert_row = array('recipeid'=>$recipeid,'componentid'=>$componentids);
Assume that $componentids is an array that contains the IDs 1 and 2 from the component table.
Building the array is no problem, but when I try to insert it, the value is rendered as Array, which produces this error:
Severity: Notice
Message: Array to string conversion
Filename: mysqli/mysqli_driver.php
Line Number: 553
and
Error Number: 1054
Unknown column 'Array' in 'field list'
INSERT INTO recipecomponent ( recipeid, componentid) VALUES ( 1,
Array)
Filename: C:\www\KG\system\database\DB_driver.php
Line Number: 330
I found a workaround for this, though: I converted the array to a string with implode
$new_component_id = implode(' ',$componentid);
but since it is now the string "1 2", when I insert it into the column, which is an int type, the row only shows the first number, which is 1.
I thought about just inserting the rows separately. That would be no problem for a recipe with only 2 ingredients.
It would look like this:
recipeid - componentid
1 || 1
1 || 2
But say I insert a recipe with at least 4 ingredients, with many more recipes to follow. Would that be a waste of memory?
If so, I was wondering whether there is any character that is treated as part of an integer and accepted as a value to be inserted. For example, assume the character -
so that when I insert the string 1-2 it shows up as 1-2 in my column, which is an int type.
I need some professional help and advice.

The last option, with one record per ingredient, is the correct way to go. This is what the bridge table "recipecomponent" is for.
Inserting multiple values in the same column (like in your first example) is against normalisation (again, that's what the bridge table is for). More importantly, when you're querying for a particular id, having the ids on different records is quicker than parsing a string with multiple ids.
What happens with $new_component_id (i.e. the second id is cut off) is probably because of some data type conversion (whether this happens on the database side, or on the PHP side, is not explicit from the problem you're reporting).
If you wish to insert multiple ingredients using only one query, you can use the following syntax:
INSERT INTO recipecomponent (recipeid, componentid) VALUES (1,1), (1,2), (1,3);
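The one-row-per-ingredient approach can be sketched as follows. This is a minimal illustration that assumes Python with an in-memory SQLite database in place of PHP and MySQL; the helper name link_components and the composite primary key are assumptions for the example, not part of the original schema.

```python
import sqlite3


def link_components(conn, recipe_id, component_ids):
    """Insert one bridge row per component for the given recipe (hypothetical helper)."""
    rows = [(recipe_id, component_id) for component_id in component_ids]
    conn.executemany(
        "INSERT INTO recipecomponent (recipeid, componentid) VALUES (?, ?)",
        rows,
    )


conn = sqlite3.connect(":memory:")
# Assumed bridge table: both columns are plain integers, as in the question.
conn.execute(
    "CREATE TABLE recipecomponent ("
    " recipeid INTEGER NOT NULL,"
    " componentid INTEGER NOT NULL,"
    " PRIMARY KEY (recipeid, componentid))"
)

# Recipe 1 with three ingredients becomes three rows, never one packed string.
link_components(conn, 1, [1, 2, 3])
linked = conn.execute(
    "SELECT recipeid, componentid FROM recipecomponent ORDER BY componentid"
).fetchall()
print(linked)
```

Each ID stays a real integer, so the database can index it and check it against the component table instead of parsing a string.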

Related

Copy data from one table to another with an array of structs in BigQuery

We are trying to copy data from one table to another using an INSERT INTO ... SELECT statement.
Our original table schema is as follows, with several columns including a repeated record containing 5 structs of various data types:
[Image: original table schema]
We want an exact copy of this table, plus 3 new regular columns, so we made an empty table with the new schema. However, when using the following code, the input table ends up with fewer rows overall than the original table.
insert into input_table
select column1, column2, null as newcolumn1, null as newcolumn2, null as newcolumn3,
array_agg(struct (arr.struct1, arr.struct2, arr.struct3, arr.struct4, arr.struct5)) as arrayname, column3
from original_table, unnest(arrayname) as arr
group by column1, column2, column3;
We tried the solution from this page: How to copy data from one table into another table which has a record repeated column in GCP Bigquery
but the query would error as it would treat the 5 structs within the array as arrays themselves (data type = eg. string, mode = repeated, rather than nullable/required).
The error we see says that our repeated record column "has type ARRAY<STRUCT<struct1name ARRAY, struct2name ARRAY, struct3name ARRAY, ...>> which cannot be inserted into column summary, which has type ARRAY<STRUCT<struct1name STRING, struct2name STRING, struct3name STRING, ...>> at [4:1]"
Additionally, a query to find rows that exist in the original but not in the input table returns no results.
We also need the columns in this order (cannot do a simple copy of the table and add the 3 new columns at the end).
Why are we losing rows when using the above code to do an insert into... select?
Is there a way to copy over the data in this way and retain the exact number of rows?

Filtering SQL rows based on certain alphabets combination

I have a column that stores free-text user input from a frontend website. Users can enter any kind of text, but they also include a specific letter combination to represent a job type, for example 'dri'. As an example:
Row 1: P49384; Open vehicle bonnet-BO-dri 22/10
Row 2: P93818; Vehicle exhaust-BO 10/20
Row 3: P1933; battery dri-pu-103/2
Row 4: P3193; screwdriver-pu 423
Row 5: X939; seats bo
Row 6: P9381-vehicle-pu-bo dri
In this case, I would like to filter only the rows that contain dri. As the example shows, the text can be in any order (users key in whatever they like without following any format). The one constant is that for a particular job type, they will put in dri.
I know that I can simply use LIKE in SQL Server to get these rows. Unfortunately, row 4 is also returned when I use this operator, because screwdriver contains dri.
Is there any way in SQL Server to obtain strictly the rows that have the dri job type, while excluding words like screwdriver?
I tried PATINDEX but it failed too: PATINDEX('%[d][r][i]%', column) > 0
Thanks in advance.
Your data is the problem here. Unfortunately even for denormalised data it doesn't appear to have a reliable/defined format, making parsing your data in a language like T-SQL next to impossible. What problems are there? Based on the original sample data, at a glance the following problems exist:
The first data value's delimiter isn't consistent. Rows 1-5 use a semicolon (;), but row 6 uses a hyphen (-)
The last data value's delimiter isn't consistent. Rows 1, 2 & 4 use a space ( ), but row 3 uses a hyphen (-).
Internal data doesn't use a consistent delimiter. For example:
Row 1 has the value Open vehicle bonnet-BO-dri, which appears to be the values Open vehicle bonnet, BO and dri; so the hyphen (-) is the delimiter.
Row 5 has seats bo, which appears to be the values seats and bo, so uses a space ( ) as a delimiter.
The fact that row 6 has vehicle as its own value (vehicle-pu-bo dri), however, implies that Open vehicle bonnet and Vehicle exhaust (on rows 1 and 2 respectively) could actually be the values Open, vehicle, & bonnet and Vehicle & exhaust respectively.
Honestly, the solution is to fix your design. As such, your tables should likely look something like this:
CREATE TABLE dbo.Job (
    JobID varchar(6) CONSTRAINT PK_JobID PRIMARY KEY NONCLUSTERED, --NONCLUSTERED because it's not always ascending
    YourNumericalLikeValue varchar(5) NULL --Obviously use a better name
);
CREATE TABLE dbo.JobTypeCompleted (
    JobTypeID int IDENTITY (1,1) CONSTRAINT PK_JobTypeID PRIMARY KEY CLUSTERED,
    JobID varchar(6) NOT NULL CONSTRAINT FK_JobType_Job FOREIGN KEY REFERENCES dbo.Job (JobID),
    JobType varchar(30) NOT NULL --Most likely this'll actually be a foreign key to an actual job type table
);
GO
Then, for a couple of your rows, the data would be inserted like so:
INSERT INTO dbo.Job (JobID, YourNumericalLikeValue)
VALUES('P49384','22/10'),
('P9381',NULL);
GO
INSERT INTO dbo.JobTypeCompleted(JobID,JobType)
VALUES('P49384','Open vehicle bonnet'),
('P49384','BO'),
('P49384','dri'),
('P9381','vehicle'),
('P9381','pu'),
('P9381','bo'),
('P9381','dri');
Then you can easily get the jobs you want with a simple query:
SELECT J.JobID,
J.YourNumericalLikeValue
FROM dbo.Job J
WHERE EXISTS (SELECT 1
FROM dbo.JobTypeCompleted JTC
WHERE JTC.JobID = J.JobID
AND JTC.JobType = 'dri');
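To see the normalised design behave as described, here is a minimal sketch that assumes Python with an in-memory SQLite database instead of SQL Server. The column types are simplified, and the P3193 rows (the screwdriver job from the question's row 4) are added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobID TEXT PRIMARY KEY, YourNumericalLikeValue TEXT);
CREATE TABLE JobTypeCompleted (
    JobTypeID INTEGER PRIMARY KEY,
    JobID TEXT NOT NULL REFERENCES Job (JobID),
    JobType TEXT NOT NULL
);
INSERT INTO Job VALUES ('P49384', '22/10'), ('P9381', NULL), ('P3193', '423');
INSERT INTO JobTypeCompleted (JobID, JobType) VALUES
    ('P49384', 'Open vehicle bonnet'), ('P49384', 'BO'), ('P49384', 'dri'),
    ('P9381', 'vehicle'), ('P9381', 'pu'), ('P9381', 'bo'), ('P9381', 'dri'),
    -- Assumed rows for the question's row 4: 'screwdriver' is its own value.
    ('P3193', 'screwdriver'), ('P3193', 'pu');
""")

# Exact match on the job type: 'screwdriver' can no longer match 'dri'.
dri_jobs = [row[0] for row in conn.execute("""
    SELECT J.JobID
    FROM Job J
    WHERE EXISTS (SELECT 1
                  FROM JobTypeCompleted JTC
                  WHERE JTC.JobID = J.JobID
                    AND JTC.JobType = 'dri')
    ORDER BY J.JobID
""")]
print(dri_jobs)
```

Because each job type is stored as its own value, the comparison is an equality test and needs no pattern matching.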
You can use the LIKE operator in your query, as in column_name LIKE '%-dri'. This finds records that end with "-dri".
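If the data cannot be restructured, one alternative is to treat dri as a match only when it is not surrounded by other letters. The sketch below is not T-SQL. It uses Python's re module on the six sample rows to illustrate the idea; the pattern and the helper name has_dri_job are assumptions.

```python
import re

rows = [
    "P49384; Open vehicle bonnet-BO-dri 22/10",
    "P93818; Vehicle exhaust-BO 10/20",
    "P1933; battery dri-pu-103/2",
    "P3193; screwdriver-pu 423",
    "X939; seats bo",
    "P9381-vehicle-pu-bo dri",
]

# 'dri' counts only when no letter sits directly before or after it.
DRI_TOKEN = re.compile(r"(?<![A-Za-z])dri(?![A-Za-z])", re.IGNORECASE)


def has_dri_job(text):
    """Return True when 'dri' appears as a standalone token (hypothetical helper)."""
    return DRI_TOKEN.search(text) is not None


# Row numbers (1-based) that carry the dri job type.
matching_rows = [n for n, text in enumerate(rows, start=1) if has_dri_job(text)]

# For comparison: rows that literally end with '-dri', as LIKE '%-dri' requires.
ends_with_dash_dri = [n for n, text in enumerate(rows, start=1) if text.lower().endswith("-dri")]
print(matching_rows, ends_with_dash_dri)
```

On the sample data the standalone-token check keeps rows 1, 3 and 6 and skips screwdriver. None of the six rows ends with "-dri", so a pattern anchored to the end of the value would need adjusting for this data.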

Update strategy for table with sequence generated number as primary key in Informatica

I have a mapping that gets data from multiple SQL Server source tables and assigns a sequence-generated number as the ID for each row. In the target table, the ID field is set as the primary key.
Every time I run this mapping, it creates new rows and assigns new IDs for records that already exist in the target. Below is an example:
1st run:
ID SourceID Name State
1 123 ABC NY
2 456 DEF PA
2nd run:
ID SourceID Name State
1 123 ABC NY
2 456 DEF PA
3 123 ABC NY
4 456 DEF PA
The desired output must:
1) create a new row and assign a new ID if a record is updated in the source.
2) create a new row and assign a new ID if a new row is inserted in the source.
How can this be obtained in Informatica?
Thank you in advance!
I'll take a flyer and assume the ACTUAL question here is 'How can I tell if the incoming record is neither an insert nor an update, so that I can ignore it?'. You could:
a) Use a date field in your source data that records when each record was last updated, and restrict your source qualifier to pick up only records updated after the last time this mapping ran. The drawback is that if fields you're not interested in were updated, you'll process a lot of redundant records.
b) Better suggestion! Configure a dynamic lookup that stores the latest state of each record, matched by SourceID. You can then use the NewLookupRow indicator port to tell whether the record is an insert, an update or unchanged, and filter out the unchanged records in a subsequent transformation.
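The decision logic behind option b) can be sketched outside Informatica. The following is a rough Python illustration, not Informatica configuration: the dictionary plays the role of the lookup cache keyed by SourceID, and the function name classify_rows and the sample rows are assumptions.

```python
def classify_rows(target_cache, incoming_rows):
    """Split incoming rows into inserts and updates, skipping unchanged ones.

    target_cache maps SourceID -> (Name, State) and is updated while rows
    are processed, loosely imitating a dynamic lookup cache.
    """
    inserts, updates = [], []
    for source_id, name, state in incoming_rows:
        current = target_cache.get(source_id)
        if current is None:
            inserts.append((source_id, name, state))   # not seen before
        elif current != (name, state):
            updates.append((source_id, name, state))   # seen, but changed
        else:
            continue                                   # unchanged: ignore
        target_cache[source_id] = (name, state)
    return inserts, updates


# State of the target after the first run in the question.
cache = {123: ("ABC", "NY"), 456: ("DEF", "PA")}
# Second run: 123 is unchanged, 456 changed state, 789 is new (assumed data).
incoming = [(123, "ABC", "NY"), (456, "DEF", "NJ"), (789, "GHI", "CA")]
new_rows, changed_rows = classify_rows(cache, incoming)
print(new_rows, changed_rows)
```

Only the rows in new_rows and changed_rows would go on to receive a new sequence-generated ID; the unchanged record 123 is dropped.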
Give the ID field an IDENTITY property:
Create Table SomeTable (ID int identity(1,1),
SourceID int,
[Name] varchar(64),
[State] varchar(64))
When you insert into it... you don't insert anything for ID. For example...
insert into SomeTable
select
SourceID,
[Name],
[State]
from
someOtherTable
The ID field will auto-increment, starting at 1 and increasing by 1 each time a row is inserted. As for your question about adding rows each time a row is updated or inserted in another table, this is what TRIGGERS are for.
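The same idea can be tried without a SQL Server instance. This sketch assumes Python with an in-memory SQLite database, where INTEGER PRIMARY KEY AUTOINCREMENT stands in for IDENTITY(1,1); the sample rows are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE someOtherTable (SourceID INTEGER, Name TEXT, State TEXT);
INSERT INTO someOtherTable VALUES (123, 'ABC', 'NY'), (456, 'DEF', 'PA');

-- SQLite counterpart of an IDENTITY(1,1) column.
CREATE TABLE SomeTable (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    SourceID INTEGER,
    Name TEXT,
    State TEXT
);

-- ID is omitted from the column list, so the database generates it.
INSERT INTO SomeTable (SourceID, Name, State)
SELECT SourceID, Name, State FROM someOtherTable ORDER BY SourceID;
""")

generated = conn.execute("SELECT ID, SourceID FROM SomeTable ORDER BY ID").fetchall()
print(generated)
```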

TSQL: getting next available ID

Using SQL Server 2008, I have three tables: table A, table B and table C.
All have an ID column. For tables A and B the ID column is an identity integer; for table C the ID column is a varchar type.
Currently a stored procedure takes a name parameter and, following certain logic, inserts into table A or table B, gets the identity, prefixes it with 'A' or 'B', and then inserts into table C.
The problem is that the table C ID column can potentially contain duplicate values. For example, if the identity from table A is 2, there might already be 'A2', 'A3', 'A5' in the ID column of table C. How can I write a T-SQL query to identify the next available value in table C and then update table A/B accordingly?
[Update]
These are the current steps:
1. Depending on the input parameter, insert into table A or table B
2. Initialize seed value = @@IDENTITY
3. Calculate the ID value to insert into table C by prefixing the seed value with 'A' or 'B'
4. Look for a matching record in table C by the ID value from step 3; if no record is found, insert it, otherwise increase the seed value by 1 and repeat step 3
The issue is that within a certain value range a huge block of values may already exist in the table C ID column, e.g. A3000 to A500000, and the query is extremely slow when following the existing logic. I need to figure out logic that smartly gets the minimum available number (without the prefix).
It is hard to describe; I hope this makes more sense. I truly appreciate any help on this. Thanks in advance!
This should do the trick. It is a simple self-contained example that will work in SSMS. I even put the values out of order just in case. You would just put your table where @Data is and replace 'Id' with your identifier field.
declare @Data table ( Id varchar(3) );
insert into @Data values ('A5'),('A2'),('B1'),('A3'),('B2'),('A4'),('A1'),('A6');
With a as
(
Select
Id
, cast(right(Id, len(Id)-1) as int) as Pos
, left(Id, 1) as TableFrom
from @Data
)
select
TableFrom
, max(Pos) + 1 as NextNumberUp
from a
group by TableFrom
EDIT: If you don't want to worry about production data, you could amend what I wrote by adding this last part:
Select
TableFrom
, max(Pos) as LastPos
into #Temp
from a
group by TableFrom
select TableFrom, LastPos + 1
from #Temp
Regardless, if this is a production environment you are going to have to hit part of it at some time to get the data. If the dataset is not too large (varchar(256) or less, and 5 million rows or fewer), you could dump that entire column from table C into a temp table. Honestly, query performance versus imports varies vastly from system to system.
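The per-prefix computation, and the gap search the question describes, can be sketched in Python once the ID column has been loaded into memory. This is an illustration under assumptions: the function names are hypothetical and the IDs are assumed to be one prefix letter followed by digits.

```python
def next_number_up(ids):
    """Return {prefix: highest numeric suffix + 1} for IDs such as 'A5' or 'B12'."""
    highest = {}
    for identifier in ids:
        prefix, number = identifier[0], int(identifier[1:])
        highest[prefix] = max(highest.get(prefix, 0), number)
    return {prefix: number + 1 for prefix, number in highest.items()}


def first_free_number(ids, prefix, seed):
    """Return the smallest number >= seed that is not yet used with this prefix."""
    used = {int(identifier[1:]) for identifier in ids if identifier.startswith(prefix)}
    candidate = seed
    while candidate in used:   # set lookups, so no query per candidate
        candidate += 1
    return candidate


data = ["A5", "A2", "B1", "A3", "B2", "A4", "A1", "A6"]
next_numbers = next_number_up(data)
print(next_numbers)
print(first_free_number(["A2", "A3", "A5"], "A", 2))
```

next_number_up mirrors the max(Pos) + 1 query, while first_free_number follows the seed-and-increment steps from the question but checks candidates against an in-memory set rather than running one lookup per value.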
Following your design, there shouldn't be any duplicates in table C, considering that the IDs in A and B are unique.
A | B | C
1 | 1 | A1
2 | 2 | A2
  |   | B1
  |   | B2

how to increment a sequence number while triggering data from one table to another

I want to write a trigger that transfers some columns of all inserted rows in one table to another table, while incrementing the maximum value of a sequence number field in the destination table. This field is not auto-increment, but it is a primary key field.
What I used to do was find the max sequence number in the destination table, increment it, and then insert the new value. This worked fine when data was inserted one row at a time. But when many rows are inserted by a single query, how can I increment the sequence number? A sample of the problem follows:
insert into [mssql].mssql.dbo.destination_table (name,seq_no)
select name,?
from inserted
Even a few thousand rows can be inserted at once.
seq_no is part of a composite primary key, so, for example, if data is inserted under a different name, seq_no will be different. (This requirement should be considered once I can increment seq_no without considering its part in the primary key.)
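Okay, I understand your problem. Try this:
insert into [mssql].mssql.dbo.destination_table (name, seq_no)
select i.name, x.MaxSeq + row_number() over (order by i.name)
from inserted i
cross join (select isnull(max(seq_no), 0) as MaxSeq
            from [mssql].mssql.dbo.destination_table) x -- current max in the destination table; 0 when it is empty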
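The numbering idea in that query, the current maximum plus each row's position in the batch, can be shown in a few lines of Python. This is a sketch with a hypothetical function name and sample names; it uses one global maximum and ignores the per-name part of the composite key.

```python
def assign_seq_numbers(current_max, inserted_names):
    """Pair each inserted name with current_max plus its 1-based position.

    Sorting by name plays the role of ROW_NUMBER() OVER (ORDER BY name).
    """
    ordered = sorted(inserted_names)
    return [(name, current_max + offset) for offset, name in enumerate(ordered, start=1)]


# Assume the destination table's highest seq_no is currently 10.
assigned = assign_seq_numbers(10, ["carol", "alice", "bob"])
print(assigned)
```

Every row in the batch gets a distinct number above the existing maximum, regardless of how many rows arrive in a single insert.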
