Inject SQL before CodeIgniter's $this->db->get() call - sql-server

Here is what I am trying to do. I have a table with the following structure, which is supposed to hold translated values of data in any other table:
Translations
+-------------+-------------+-----------+-------------+------------+
| Language_id | translation | record_id | column_name | table_name |
+-------------+-------------+-----------+-------------+------------+
| 1           | Hello       | 1         | test_column | test_table |
| 2           | Aloha       | 1         | test_column | test_table |
| 1           | Test input  | 2         | test_column | test_table |
+-------------+-------------+-----------+-------------+------------+
In my views, I have a function that looks up this table and returns the string in the user's language. If the string is not translated in that language, the function returns the string in the application's default language (let's say the one with ID = 1).
It works fine, but I would have to go through about 600 view files to apply it... I was wondering whether it is possible to inject some SQL into my CodeIgniter models right before the $this->db->get() of the original record, so that the original column is replaced with the translated one.
Something like this:
$this->db->select('column_name, col_2, col_3');
// Injected SQL pseudocode:
// IF a RECORD EXISTS in table Translations WHERE Language_id = 2 AND record_id = 2 AND column_name = test_column AND table_name = test_table
// BEGIN
//     SELECT translations.translation AS column_name
//     WHERE translations.table_name = test_table AND column_name = test_column AND record_id = 2 AND Language_id = 2
// END
// ELSE
// BEGIN
//     SELECT translations.translation AS column_name
//     WHERE translations.table_name = test_table AND column_name = test_column AND record_id = 2 AND Language_id = 1
// END
$this->db->get('test_table');
Is it possible to do this somehow?

What you're asking for doesn't really make sense. You "inject" by simply making a different query first, then altering your second query based on the results.
The other option (perhaps better) would be to do all of this in a stored procedure, but it is still essentially the same thing, just with fewer connections and probably quicker processing.
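For illustration, the lookup-with-fallback could live in a single query (for example inside that stored procedure or a view). This is only a sketch against the Translations layout above; the primary-key column of test_table is assumed to be called id:
DECLARE @lang INT = 2;  -- the user's language

SELECT COALESCE(tr.translation, trDef.translation, t.test_column) AS test_column,
       t.col_2,
       t.col_3
FROM test_table t
-- translation in the user's language, if it exists
LEFT JOIN Translations tr
       ON tr.table_name  = 'test_table'
      AND tr.column_name = 'test_column'
      AND tr.record_id   = t.id
      AND tr.Language_id = @lang
-- fallback: translation in the application's default language (1)
LEFT JOIN Translations trDef
       ON trDef.table_name  = 'test_table'
      AND trDef.column_name = 'test_column'
      AND trDef.record_id   = t.id
      AND trDef.Language_id = 1;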

Related

SQL Server 2017 - get column name, datatype and value of table

I thought it was a simple task, but I've been struggling with it for a couple of hours :-(
I want to get the list of column names of a table, together with each column's datatype and the value it currently contains, but I have no idea how to bind the table itself to get the current values:
DECLARE @TTab TABLE
(
    fieldName nvarchar(128),
    dataType nvarchar(64),
    currentValue nvarchar(128)
)

INSERT INTO @TTab (fieldName, dataType)
SELECT
    i.COLUMN_NAME,
    i.DATA_TYPE
FROM
    INFORMATION_SCHEMA.COLUMNS i
WHERE
    i.TABLE_NAME = 'Users'
Expected result:
+------------+----------+---------------+
| fieldName | dataType | currentValue |
+------------+----------+---------------+
| userName | nvarchar | John |
| active | bit | true |
| age | int | 43 |
| balance | money | 25.20 |
+------------+----------+---------------+
In general the answer is: No, this is impossible. But there is a hack using text-based containers like XML or JSON (v2016+):
--Let's create a test table with some rows
CREATE TABLE dbo.TestGetMetaData(ID INT IDENTITY,PreName VARCHAR(100),LastName NVARCHAR(MAX),DOB DATE);
INSERT INTO dbo.TestGetMetaData(PreName,LastName,DOB) VALUES
('Tim','Smith','20000101')
,('Tom','Blake','20000202')
,('Kim','Black','20000303')
GO
--Here's the query
SELECT C.colName
,C.colValue
,D.*
FROM
(
SELECT t.* FROM dbo.TestGetMetaData t
WHERE t.Id=2
FOR XML PATH(''),TYPE
) A(rowSet)
CROSS APPLY A.rowSet.nodes('*') B(col)
CROSS APPLY(VALUES(B.col.value('local-name(.)','nvarchar(500)')
,B.col.value('text()[1]', 'nvarchar(max)'))) C(colName,colValue)
LEFT JOIN INFORMATION_SCHEMA.COLUMNS D ON D.TABLE_SCHEMA='dbo'
AND D.TABLE_NAME='TestGetMetaData'
AND D.COLUMN_NAME=C.colName;
GO
--Clean-up (careful with real data)
DROP TABLE dbo.TestGetMetaData;
GO
Part of the result
+----------+------------+-----------+--------------------------+-------------+
| colName | colValue | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | IS_NULLABLE |
+----------+------------+-----------+--------------------------+-------------+
| ID | 2 | int | NULL | NO |
+----------+------------+-----------+--------------------------+-------------+
| PreName | Tom | varchar | 100 | YES |
+----------+------------+-----------+--------------------------+-------------+
| LastName | Blake | nvarchar | -1 | YES |
+----------+------------+-----------+--------------------------+-------------+
| DOB | 2000-02-02 | date | NULL | YES |
+----------+------------+-----------+--------------------------+-------------+
The idea in short:
Using FOR XML PATH(''),TYPE will create an XML representing your SELECT's result set.
The big advantage with this: the XML's elements will carry the columns' names.
We can use a CROSS APPLY to get each column's name and value.
Now we can JOIN the metadata from INFORMATION_SCHEMA.COLUMNS.
One hint: all values will actually be of type nvarchar(max).
The values being string-typed might lead to unexpected results due to implicit conversions, or might cause trouble with BLOBs.
UPDATE
The following query wouldn't even need to specify the table's name in the JOIN:
SELECT C.colName
,C.colValue
,D.DATA_TYPE,D.CHARACTER_MAXIMUM_LENGTH,IS_NULLABLE
FROM
(
SELECT * FROM dbo.TestGetMetaData
WHERE Id=2
FOR XML AUTO,TYPE
) A(rowSet)
CROSS APPLY A.rowSet.nodes('/*/@*') B(attr)
CROSS APPLY(VALUES(A.rowSet.value('local-name(/*[1])','nvarchar(500)')
,B.attr.value('local-name(.)','nvarchar(500)')
,B.attr.value('.', 'nvarchar(max)'))) C(tblName,colName,colValue)
LEFT JOIN INFORMATION_SCHEMA.COLUMNS D ON CONCAT(D.TABLE_SCHEMA,'.',D.TABLE_NAME)=C.tblName
AND D.COLUMN_NAME=C.colName;
Why?
Using FOR XML AUTO will produce attribute-centered XML. The element's name will be the table's name, while the values sit in attributes.
UPDATE 2
Fully generic function:
CREATE FUNCTION dbo.GetRowWithMetaData(@input XML)
RETURNS TABLE
AS
RETURN
    SELECT C.colName
          ,C.colValue
          ,D.*
    FROM @input.nodes('/*/@*') B(attr)
    CROSS APPLY(VALUES(@input.value('local-name(/*[1])','nvarchar(500)')
                      ,B.attr.value('local-name(.)','nvarchar(500)')
                      ,B.attr.value('.', 'nvarchar(max)'))) C(tblName,colName,colValue)
    LEFT JOIN INFORMATION_SCHEMA.COLUMNS D ON CONCAT(D.TABLE_SCHEMA,'.',D.TABLE_NAME)=C.tblName
                                          AND D.COLUMN_NAME=C.colName;
--You call it like this (note the extra parentheses!)
SELECT * FROM dbo.GetRowWithMetaData((SELECT * FROM dbo.TestGetMetaData WHERE ID=2 FOR XML AUTO));
As you can see, the function does not even have to know anything in advance...

Splitting up one row/column into one or multiple rows over two columns

First post here! I'm trying to update a stored procedure in my employer's data warehouse that links two tables on their IDs. The stored procedure is based on two columns in Table A: its primary key, and a column that contains the primary keys from Table B together with their domain, all in one column. Note that it physically only needs Table A, since the IDs from B are in there. The old code used some PATINDEX/SUBSTRING logic that assumes two things:
The FK's are always 7 characters long
Domain strings look like this "#xx-yyyy" where xx has to be two characters and yyyy four.
The problem however:
We've recently outgrown the 7-digit FK's and are now looking at 7 or 8 digits
Longer domain strings are implemented (where xx may be between 2 and 15 characters)
Sometimes there is no domain string. Just some FK's, delimited the same way.
The code is poorly documented and includes some ID exceptions (not a problem, just annoying)
Some info:
The data warehouse follows the Data Vault method; this procedure is stored on SQL Server and is triggered by SSIS. After this procedure the HUB and satellites are updated, so in short: I can't just create a new stored procedure but will instead try to integrate my code into the old one.
The server is running SQL Server 2012, so I can't use STRING_SPLIT.
This platform is dying out, so I just have to "keep it running" for this year.
An ID and domain are always separated by one space.
If a record has no foreign keys it will always have an empty string.
When a record has multiple (foreign) IDs it will always use the same delimiting, even when the individual FKs have no domain string next to them. The delimiter looks like this:
"12345678 #xx-xxxx[CR][CR][CR][LF]12345679 #yy-xxxx"
I've managed to create some code that will assign row numbers and is flexible in recognising the number of FKs.
This is a piece of the old code:
DECLARE
    @MAXCNT INT = (SELECT MAX(ROW) FROM #Worktable),
    @C_ID INT,
    @R_ID INT,
    @SOURCE CHAR(5),
    @STRING VARCHAR(20),
    @VALUE CHAR(20),
    @LEN INT,
    @STARTSTRINGLEN INT = 0,
    @MAXSTRINGLEN INT,
    @CNT INT = 1

WHILE @CNT <= @MAXCNT
BEGIN
    SELECT @LEN = LEN(REQUESTS), @STRING = REQUESTS, @C_ID = C_ID FROM #Worktable WHERE ROW = @CNT

    --1 REQUEST RELATED TO ONE CHANGE
    IF @LEN < 17
    BEGIN
        INSERT INTO #ChangeRequest
        SELECT @C_ID, SUBSTRING(@STRING, 0, CASE WHEN PATINDEX('%-xxxx%', @STRING) = 0 THEN @LEN + 1 ELSE PATINDEX('%-xxxx%', @STRING) - 4 END)
        --SELECT @STRING AS STRING, @LEN AS LENGTH
    END
    ELSE
    -- MULTIPLE REQUESTS RELATED TO ONE CHANGE
        SET @STARTSTRINGLEN = 0

    WHILE @STARTSTRINGLEN < @LEN
    BEGIN
        SET @MAXSTRINGLEN = (SELECT PATINDEX('%-xxxx%', SUBSTRING(@STRING, @STARTSTRINGLEN, @STARTSTRINGLEN + 17))) + 7

        INSERT INTO #ChangeRequest
        --remove CRLF
        SELECT @C_ID,
               REPLACE(REPLACE(
                   SUBSTRING(@STRING, @STARTSTRINGLEN + 1, @MAXSTRINGLEN)
               , CHAR(13), ''), CHAR(10), '')

        SET @STARTSTRINGLEN = @STARTSTRINGLEN + @MAXSTRINGLEN
        IF @MAXSTRINGLEN = 0 BEGIN SET @STARTSTRINGLEN = @LEN END
    END

    SET @CNT = @CNT + 1;
END;
Since this loop assumes fixed lengths, I need to make it more flexible. My code:
(CASE WHEN LEN([Requests]) = 0
THEN 0
ELSE (LEN(REPLACE(REPLACE(Requests,CHAR(10),'|'),CHAR(13),''))-LEN(REPLACE(REPLACE(Requests,CHAR(10),''),CHAR(13),'')))+1
END)
This consistently shows the accurate number of FK's and thus the number of rows to be created. Now I need to create a loop in which to physically create these rows and split the FK and domain into two columns.
Source table:
+---------+----------------------------------------------------------------------------+
| Some ID | Other ID's |
+---------+----------------------------------------------------------------------------+
| 1 | 21 |
| 2 | 31 #xxx-xxx |
| 3 | 41 #xxx-xxx[CR][CR][CR][LF]42 #yyy-xxx[CR][CR][CR][LF]43 #zzz-xxx |
| 4 | 51[CR][CR][CR][LF]52[CR][CR][CR][LF]53 #xxx-xxx[CR][CR][CR][LF]54 #yyy-xxx |
| 5 | <empty string> |
+---------+----------------------------------------------------------------------------+
Target table:
+-----+----------------+----------------+
| SID | OID | Domain |
+-----+----------------+----------------+
| 1 | 21 | <empty string> |
| 2 | 31 | xxx-xxx |
| 3 | 41 | xxx-xxx |
| 3 | 42 | yyy-xxx |
| 3 | 43 | zzz-xxx |
| 4 | 51 | <empty string> |
| 4 | 52 | <empty string> |
| 4 | 53 | xxx-xxx |
| 4 | 54 | yyy-xxx |
| 5 | <empty string> | <empty string> |
+-----+----------------+----------------+
Currently all rows are created but every one beyond the first for each SID is empty.
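For reference, one set-based way to split those delimited values without STRING_SPLIT (not available on SQL Server 2012) is the XML trick sketched below. Table and column names (SourceTable, SomeID, OtherIDs) are placeholders for the real objects:
;WITH Split AS
(
    -- One row per CR/LF-delimited chunk of the "Other ID's" column
    SELECT s.SomeID,
           ISNULL(n.node.value('text()[1]', 'varchar(200)'), '') AS chunk
    FROM SourceTable s
    CROSS APPLY (SELECT CAST('<r>' +
                     REPLACE(
                         REPLACE(REPLACE(REPLACE(s.OtherIDs, '&', '&amp;'), '<', '&lt;'), CHAR(13), ''),
                         CHAR(10), '</r><r>') + '</r>' AS XML)) x(doc)
    CROSS APPLY x.doc.nodes('/r') n(node)
)
SELECT SomeID AS SID,
       -- Everything before the first space is the FK (or the whole chunk when no domain follows)
       CASE WHEN CHARINDEX(' ', chunk) = 0 THEN chunk
            ELSE LEFT(chunk, CHARINDEX(' ', chunk) - 1) END AS OID,
       -- Everything after the '#' is the domain; empty string when there is none
       CASE WHEN CHARINDEX('#', chunk) = 0 THEN ''
            ELSE SUBSTRING(chunk, CHARINDEX('#', chunk) + 1, LEN(chunk)) END AS [Domain]
FROM Split;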

SQL Server select from multiple tables stored in a column or list

Because it is too complicated to solve this problem without real data, I will try to add some:
| table 1 | table 2 | ... | table n
---------------------------------------------------------------------------------------
columns_name: | name | B | C | D | name | B | C | D | ... | name | B | C | D
---------------------------------------------------------------------------------------
column_content:| John | ... | Ben | ... | ... | John| ...
The objective is to extract the rows in the N tables where name = 'John'.
We already have a table called [table_names] with the n table names stored in the column [column_table_name].
Now we want to do something like that:
SELECT [name]
FROM (SELECT [table_name]
FROM INFORMATION_SCHEMA.TABLES)
WHERE [name] = 'John'
Tables names are dynamic and thus unknown until we run the information_schema.tables query.
This final query is giving me an error. Any clue about how to use multiple stored tables names in a subquery?
You need to alias your subquery in order to reference it. Also, name should be table_name:
SELECT [table_name]
FROM (SELECT [table_name]
FROM INFORMATION_SCHEMA.TABLES) AS X
WHERE [table_name] = 'John'
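That only fixes the aliasing error, though; it still just lists table names. To actually read rows out of every listed table you have to build dynamic SQL. A rough sketch, assuming every table in [table_names] lives in the dbo schema and has a [name] column:
DECLARE @sql nvarchar(max) = N'';

-- Build one UNION ALL branch per table listed in [table_names]
SELECT @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
            + N'SELECT ' + QUOTENAME(column_table_name, '''') + N' AS source_table, [name] FROM dbo.'
            + QUOTENAME(column_table_name) + N' WHERE [name] = @name'
FROM table_names;

EXEC sp_executesql @sql, N'@name nvarchar(100)', @name = N'John';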

Most efficient way to separate rows into groups and populate a main ID field to link the sub-groups in T-SQL on a large dataset

I have a data file that I need to format correctly in order to use it. It is quite a large file (roughly 3.4 million rows).
The issue is that the file I am being sent is in a totally different format to the one I need. I have no say over the format, as the file comes from an external source.
Source file:
100000001 567890 123456ZZZ 0 Description line
100000001 X999999999999 1
100000001 Y999999999999 1
100000001 Z999999999999 1
100000001 123456789 2
100000001 234567890 2
100000001 567890 123456YYY 0 Description line
100000001 X999999999999 1
100000001 Y999999999999 1
100000001 Z999999999999 1
100000001 123456789 2
100000001 234567890 2
100000002 678901 123456ZZZ 0 Description line
100000002 Y999999999999 1
100000002 Z999999999999 1
100000002 123456789 2
The issue is that, with the exception of the first 9 characters (which determine the main record), the data is fixed-width, but the widths change depending on the type, which is a number 0-2.
So in this case the data contains 3 records, consisting of two groups of data which have different formats, but those other lines do not carry any of the reference information (123456ZZZ, 123456YYY).
My plan was to split the data into three separate tables, one for the main records (type 0), one for the 2nd group (type 1) and one for the final group (type 2).
To do this however I would need to populate the data tables for type 1 and type 2 with the two blocks of information from the main record.
567890
123456
YYY
This would then result in the following tables.
Table 1 - Main Records (Type 0)
| ID | Ref | Model | Range | Variant | Description |
|----|-----------|--------|--------|---------|------------------|
| 01 | 100000001 | 567890 | 123456 | ZZZ | Description line |
| 02 | 100000001 | 567890 | 123456 | YYY | Description line |
| 03 | 100000002 | 678901 | 123456 | ZZZ | Description line |
Table 2 - Group 1 (Type 1)
| Ref | ID | Part |
|-----------|----|---------------|
| 100000001 | 01 | X999999999999 |
| 100000001 | 01 | Y999999999999 |
| 100000001 | 01 | Z999999999999 |
| 100000001 | 02 | X999999999999 |
| 100000001 | 02 | Y999999999999 |
| 100000001 | 02 | Z999999999999 |
| 100000001 | 02 | Y999999999999 |
| 100000001 | 03 | Z999999999999 |
Table 3 - Group 2 (Type 2)
| Ref | ID | Operation |
|-----------|----|-----------|
| 100000001 | 01 | 123456789 |
| 100000001 | 01 | 234567890 |
| 100000001 | 02 | 123456789 |
| 100000001 | 02 | 234567890 |
| 100000001 | 03 | 123456789 |
The ID column in tables 2 and 3 is used to link back to the main record, so a join on the final select can bring back these rows as well when the relevant search finds the main record.
The issue I am having is that so far the best way I have found to do this is a CURSOR, but that is obviously a very bad approach with this many records: just the test data set of a few thousand rows takes a while to run, so 3.4 million rows will take well in excess of a day to complete.
My knowledge of T-SQL for this type of manipulation is quite limited, and as I found with a previous MySQL issue, the answers you find when searching are often not the best way of doing something (as with the CURSOR), so I thought I would seek some advice.
Building a program in C# or some other language would be the best way to do it; that is what I am currently doing. One way you could do this without coding is to create a dummy (staging) table, ideally with an index on the Ref column; the index is not mandatory but makes the solution faster. Then, after the whole file has been inserted into that table, you run a couple of INSERT statements with a SELECT to put the data into the correct tables. After that, truncate the dummy table so it is ready for the next file coming in. If you use SQL Server Agent jobs you could automate this so it would be completely hands-off. This is what previously worked for me and I couldn't find a better way. The best solution is to use SSIS, by having a script break the file up into several smaller files and then loading them into the correct tables. SSIS would be a more permanent solution.
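As a rough sketch of those INSERT ... SELECT statements (not part of the original answer), each type-1/type-2 row can pick up the ID of the closest preceding type-0 row; the staging table is assumed to be the import_table(ID, Ref, ContentA, Type, ContentB) described further down, and the target column names are illustrative:
-- Sketch only: set-based load of the type-1 rows; the type-2 load is analogous.
INSERT INTO data_group_1 (Ref, mainID, Part)
SELECT g.Ref,
       m.mainID,
       g.ContentA                                -- the part value sits in ContentA for type-1 rows
FROM import_table g
CROSS APPLY (SELECT TOP (1) i.ID AS mainID       -- closest preceding type-0 (main) row
             FROM import_table i
             WHERE i.Type = 0
               AND i.ID < g.ID
             ORDER BY i.ID DESC) m
WHERE g.Type = 1;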
Following the previous suggestion of using some inserts, the solution I ended up with, which took the run time down from about 16 days with a CURSOR to just under 1 hour, was a series of INSERT statements in a WHILE loop.
DECLARE @counter int
SELECT @counter = COUNT(ID) FROM import_table

DECLARE @i int = 1
DECLARE @ID int, @type int

WHILE (@i <= @counter)
BEGIN
    IF ((SELECT type FROM import_table WHERE ID = @i) = 0)
    BEGIN
        -- Set ID for main record
        SELECT @ID = ID FROM import_table WHERE ID = @i

        -- Import data rows into main table
        INSERT INTO data_main (...)
        SELECT (...)
        FROM import_table
        WHERE pckID = @i
    END

    IF ((SELECT type FROM import_table WHERE ID = @i) = 1)
    BEGIN
        -- Import data rows into group 1 table
        INSERT INTO data_group_1 (..., mainID)
        SELECT (..., @ID)
        FROM import_table
        WHERE pckID = @i
    END

    IF ((SELECT type FROM import_table WHERE ID = @i) = 2)
    BEGIN
        -- Import data rows into group 2 table
        INSERT INTO data_group_2 (..., mainID)
        SELECT (..., @ID)
        FROM import_table
        WHERE pckID = @i
    END

    SET @i = @i + 1
END
While it may not be the most efficient way, and I certainly agree that a program or the SSIS route would be the best way forward, for the moment this is certainly far quicker than using the CURSOR to loop through all the records.
The only requirement is that I have to make sure the import_table has its auto-increment ID value reset to 1 each time, which I will do at the beginning of the stored proc.
In essence, this code uses a counter value to count the total number of rows in the import table, where the ID beginning at 1 links nicely to an incrementing counter.
The next part is a WHILE loop using a simple @i integer that is increased each time round.
Every time a record has type 0, the @ID value is set to that record's ID value and the data is inserted using an INSERT statement.
If the type field is 1 or 2, the @ID value is not changed but is instead used to populate the mainID column of the Group 1 or Group 2 tables, which are loaded with an INSERT statement using that data.
To get around the issue of the field lengths differing for each type (with the exception of the REF and TYPE columns), I have set up the import_table with 5 columns:
ID
Ref
ContentA
Type
ContentB
Then in each of the insert statements the ID field is auto-incremented, and the Ref comes directly from that field in the main data, as does the type.
For the other fields I am using a set of SUBSTRING lookups to extract the data at the points the original specification lists.
, Ref
, SUBSTRING(ContentA,2,5)
, SUBSTRING(ContentA,6,2)
, SUBSTRING(ContentB,1,10)
, SUBSTRING(ContentA,11,5)
Again I am sure that there is probably a better way of doing this by writing a program in C# or using SSIS.
In the interim however this solution is certainly a more efficient way of looping through the records than using a CURSOR.
Hopefully this makes sense and will prove useful to somebody else who is trying to extract data from one source into several tables using purely a SQL Stored Proc without using SSIS.

SSIS data manipulation

I am currently using SSIS to read the data from a table, modify a column and insert it into a new table.
The modification I want to perform will occur if a previously read row has an identical value in a particular column.
My original idea was to use a c# script with a dictionary containing previously read values and a count of how many times it has been seen.
My problem is that I cannot save a dictionary as an SSIS variable. Is it possible to save a C# variable inside an SSIS script component? Or is there another method I could use to accomplish this?
As an example, the data below
/--------------------------------\
| Unique Column | To be modified |
|--------------------------------|
| X5FG | 0 |
| QFJD | 0 |
| X5FG | 0 |
| X5FG | 0 |
| DFHG | 0 |
| DDFB | 0 |
| DDFB | 0 |
will be transformed into
/--------------------------------\
| Unique Column | To be modified |
|--------------------------------|
| X5FG | 0 |
| QFJD | 0 |
| X5FG | 1 |
| X5FG | 2 |
| DFHG | 0 |
| DDFB | 0 |
| DDFB | 1 |
Rather than use a cursor, just use a set-based statement.
Assuming SQL 2005+ or Oracle, use the ROW_NUMBER function in your source query like so. What's important to note is that the PARTITION BY defines your group, i.e. when the numbers restart. The ORDER BY clause controls the order in which the numbers are applied (most recent mod date, oldest first, highest salary, etc.).
SELECT
D.*
, ROW_NUMBER() OVER (PARTITION BY D.unique_column ORDER BY D.unique_column ) -1 AS keeper
FROM
(
SELECT 'X5FG'
UNION ALL SELECT 'QFJD'
UNION ALL SELECT 'X5FG'
UNION ALL SELECT 'X5FG'
UNION ALL SELECT 'DFHG'
UNION ALL SELECT 'DDFB'
UNION ALL SELECT 'DDFB'
) D (unique_column)
Results
unique_column keeper
DDFB 0
DDFB 1
DFHG 0
QFJD 0
X5FG 0
X5FG 1
X5FG 2
You can create a script component. When given the choice, select the row transformation (instead of source or destination).
In the script, you can create a global variable that you will update in the process row method.
Perhaps SSIS isn't the solution for this one task. Using a cursor with a table-valued variable you would be able to accomplish the same result. I'm not a fan of cursors in most situations, but when you need to iterate through data that depends on previous iterations or is self-referential, they can be useful. Here's an example:
DECLARE
    @value varchar(4)
   ,@count int

DECLARE @dictionary TABLE ( value varchar(4), count int )

DECLARE cur CURSOR FOR
    SELECT UniqueColumn FROM SourceTable s

OPEN cur;
FETCH NEXT FROM cur INTO @value;

WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @innerCount int = 0

    IF NOT EXISTS (SELECT 1 FROM @dictionary WHERE value = @value)
    BEGIN
        INSERT INTO @dictionary ( value, count )
        VALUES ( @value, 0 )
    END
    ELSE
    BEGIN
        SET @innerCount = (SELECT count + 1 FROM @dictionary WHERE value = @value)

        UPDATE @dictionary
        SET count = @innerCount
        WHERE value = @value
    END

    INSERT INTO TargetTable ( value, count )
    VALUES ( @value, @innerCount )

    FETCH NEXT FROM cur INTO @value;
END

-- Release the cursor when done
CLOSE cur;
DEALLOCATE cur;
