Resolving ambiguous columns without an alias - sql-server

I have a dynamic query where the SELECT columns are prepared dynamically. Because this is shared code and I have some specific column choices, I have to add aliases to suit my query.
For this I had to use a string replace.
I want to know whether there is a provision in SQL Server to resolve ambiguous columns without an alias,
i.e. something like "always take values from Table1 when there is ambiguity".
DECLARE @ColumnNames VARCHAR(1000)
DECLARE @SQLStmt VARCHAR(MAX)
-- Below I get the list of column names, comma separated, e.g. 'Id,Name,Salary'
SELECT @ColumnNames = testDB.dbo.testfuncion(45)
SET @SQLStmt = 'SELECT '+@ColumnNames+' FROM Table1 LEFT JOIN Table2 ON Table1.ID = Table2.ID'
Name exists in both tables, but let's say Table2 has fewer matching records, so I always want to get Name from Table1.
Is there any way to set a preference for one particular table using a built-in clause?

Simply prepend the columns with the true table name, like this:
SET @SQLStmt = 'SELECT Table1.'+REPLACE(@ColumnNames,',',',Table1.')+' FROM Table1 LEFT JOIN Table2 ON Table1.ID = Table2.ID'
This should turn your string of Id,Name,Salary into Table1.Id,Table1.Name,Table1.Salary
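For completeness, a minimal sketch of the whole pattern put together, assuming the function returns the list with no spaces after the commas, exactly as in the question:
DECLARE @ColumnNames VARCHAR(1000)
DECLARE @SQLStmt VARCHAR(MAX)
SELECT @ColumnNames = testDB.dbo.testfuncion(45)   -- e.g. 'Id,Name,Salary'
-- Prefix every column with Table1 so ambiguous names always resolve to Table1
SET @SQLStmt = 'SELECT Table1.' + REPLACE(@ColumnNames, ',', ',Table1.')
             + ' FROM Table1 LEFT JOIN Table2 ON Table1.ID = Table2.ID'
EXEC (@SQLStmt)   -- runs: SELECT Table1.Id,Table1.Name,Table1.Salary FROM ...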

Related

GROUP BY T1.*? Group by all columns in Table1, joined left to Table2, with aggregate functions on T2 columns?

I have a query that merges 2 tables. Table 1 has many columns and may eventually expand. Table 2 also has several columns, and I will be performing aggregate functions on 90% of its columns. Table 1 has 300+ rows, Table 2 has 84K+ rows.
SELECT
t1.*
,t2.c2
,SUM(t2.c3)
,SUM(t2.c4)
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2 ON t1.c10 = t2.c1
GROUP BY
t1.*
,t2.c2
I'm getting an error, Incorrect syntax near '*', and it points to the line containing the GROUP BY clause.
I am aware that the SELECT t1.* works, as I ran that portion before trying to aggregate the T2 columns and it worked as expected.
Is there a way to quickly GROUP BY all the columns in T1? I know normally we would select only needed columns, but in this case, I need all the T1 columns.
Previous research has led me to only find instances where 1 table was used, and mostly people were looking to get or remove duplicate values. I'm looking to specifically combine the 300 records of T1 to the 84K records of T2 without having to name off all the columns from T1 in the GROUP BY section.
This method is slightly unconventional, but you can build the column list into a variable using dynamic SQL. Below is an example of how you can do it:
declare @test nvarchar(max)
set @test = ''
select @test += Column_name + ',' from information_schema.columns where table_name = 'Table1'
DECLARE @sql nvarchar(max)
SELECT @sql = N'SELECT top 10 ' + @test + 'NULL as a FROM Table1;'
EXEC sp_executesql @sql
You can apply the same principle and rewrite your query's GROUP BY; a rough sketch follows below. Hope this helps.
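For illustration, assuming the column names from the question (c1, c2, c3, c4, c10) and prefixing each Table1 column with t1. so nothing is ambiguous:
-- Build a "t1.[col]" list for every column of Table1, then inject it into the GROUP BY
declare @cols nvarchar(max) = N''
select @cols += N't1.' + QUOTENAME(COLUMN_NAME) + N','
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'Table1'
set @cols = LEFT(@cols, LEN(@cols) - 1)   -- drop the trailing comma
declare @sql nvarchar(max) =
    N'SELECT t1.*, t2.c2, SUM(t2.c3) AS SumC3, SUM(t2.c4) AS SumC4
      FROM Table1 AS t1
      LEFT JOIN Table2 AS t2 ON t1.c10 = t2.c1
      GROUP BY ' + @cols + N', t2.c2;'
exec sp_executesql @sql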
Based on the article posted by @wosi, https://dba.stackexchange.com/questions/21226/why-do-wildcards-in-group-by-statements-not-work, I was able to modify the code and get the expected results. Please note I went from 80K to 70K rows because I was originally joining the tables on 1 column; the way my data is structured, I had to join on 2 columns. The final code looks something like this:
SELECT
t1.*
,t2.c2
,t2.c3
,t2.c4
FROM
Table1 AS t1
LEFT JOIN
(SELECT c1, c2, SUM(c3) AS c3, SUM(c4) AS c4
FROM Table2
GROUP BY c1, c2) AS t2
ON t1.c10 = t2.c1 AND t1.c15 = t2.c2
You can't use * in a GROUP BY clause. There is some dynamic SQL that can save you from typing all the columns in a stored procedure, but if you are using plain T-SQL in a view you have to type all the columns.

SQL Query Results to Local Variable without unique identifier

I'm relatively new to SQL and I'm trying to write a query that will assign the result of multiple rows to a local variable:
DECLARE @x VARCHAR(MAX)
SET @x = (SELECT someCol
          FROM table)
-- Does important stuff to the @x variable
SELECT @x
During my research I realized that this won't work because the subquery can only return one value and my query will return multiple results. However, I cannot simply do something like this:
DECLARE @x VARCHAR(MAX)
SET @x = (SELECT someCol
          FROM table
          WHERE id = 'uniqueIdentifier')
-- Does important stuff to the @x variable
SELECT @x
The reason I can't use a where clause is that I need to do this for the entire table and not just one row. Any ideas?
EDIT: I realized my question was too broad so I'll try to reformat the code to give some context
SELECT col_ID, col_Definition
FROM myTable
If I were to run this query, col_Definition would return a large varchar which holds a lot of information, such as the primary key of another table that I'm trying to obtain. Let's say, for example, I did:
DECLARE @x VARCHAR(MAX)
SET @x = (SELECT col_Definition
          FROM myTable
          WHERE col_ID = 1)
-- Function to do the filtering of the varchar variable that works as expected
SELECT @x AS [Pk of another table] -- returns filtered col_Definition
This will work as expected because it returns a single row. However, I would like to be able to run this query so that it will return the filtered varchar for every single row in the "myTable" table.
If I understand correctly, you store a PK embedded in a string that you eventually want to extract and use to join to that table. I would put the group of records you want to work with into a temp table and then apply some logic to that varchar column to get the PK. That logic is best done set-based, but if you really want row by row, use a scalar function rather than a variable and apply it to the temp table:
select pk_column, dbo.scalarfunction(pk_column) as RowByRowWithFunction, substring(pk_column, 5, 10) as SetBasedMuchFaster
from #tempTable
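For example, a hypothetical sketch of the scalar-function route; dbo.ExtractPk and the SUBSTRING positions are placeholders, since the question doesn't show the real layout of col_Definition:
CREATE FUNCTION dbo.ExtractPk (@definition VARCHAR(MAX))
RETURNS VARCHAR(100)
AS
BEGIN
    -- replace this with the real parsing logic for col_Definition
    RETURN SUBSTRING(@definition, 5, 10)
END
GO
-- Row by row via the function, or inline (set-based) for better performance:
SELECT col_ID,
       dbo.ExtractPk(col_Definition) AS PkViaFunction,
       SUBSTRING(col_Definition, 5, 10) AS PkSetBased
FROM myTable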
You need to define what the 'uniqueIdentifier' is first.
I'm not sure about using a subquery, grabbing the result, and executing another query with that result unless you do an INNER JOIN of some sort; or, if you are using Python or another language to process the data, then use:
("SELECT someCol
FROM table
where id='%s'" % Id)

SQL Server 2012 UPDATE REPLACE with 2 columns from another table

I'm trying to update a column in one table (#TEMPTABLE) using data in another table (#PEOPLE) by using the REPLACE() function.
#TEMPTABLE has a column called "NameString", which is a long string containing a user's name and ID.
#PEOPLE has columns ID and IDNumber.
UPDATE #TEMPTABLE
SET NAMEString = REPLACE(NAMEString, a.[ID], a.[IDNumber]) FROM #PEOPLE a
I'm trying to replace all the IDs in the NameString column with the IDNumbers coming from the #PEOPLE table.
You need to give it a join criterion. For example:
update #TEMPTABLE
set NAMEString = replace(a.NAMEString, a.ID, b.IDNumber)
from #TEMPTABLE a
left join #PEOPLE b
on a.id = b.ID
Also, make sure to reference the right table when you use the IDNumber column - your original query doesn't actually use the table containing IDNumber at all, as far as I can tell from your description of your tables.
Note that my example assumes there's an ID field in #TempTable, or something else to join on - otherwise, you may need to extract it from the NameString column first.
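If #TEMPTABLE really has no separate ID column, one possible sketch (purely an assumption, since the column types and string layout aren't shown) is to match on the ID embedded in the string itself:
-- Match rows by looking for the #PEOPLE ID inside NameString, then swap it for IDNumber.
-- The CASTs are defensive because the column types aren't known; note that a row matching
-- several #PEOPLE entries will only have one of them replaced per UPDATE.
UPDATE t
SET NameString = REPLACE(t.NameString, CAST(p.ID AS VARCHAR(50)), CAST(p.IDNumber AS VARCHAR(50)))
FROM #TEMPTABLE t
JOIN #PEOPLE p
    ON CHARINDEX(CAST(p.ID AS VARCHAR(50)), t.NameString) > 0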

2 nvarchar fields are not matching even though the data is the same?

I want to join 2 tables using an INNER JOIN on 2 columns, both of (nvarchar, null) type. The 2 columns have the same data in them, but the join condition is failing. I think it is due to spaces contained in the column values.
I have also tried LTRIM and RTRIM.
My query:
select
T1.Name1, T2.Name2
from
Table1 T1
Inner Join
Table2 T2 on T1.Name1 = T2.Name2
I have also tried like this:
on LTRIM(RTRIM(T1.Name1)) = LTRIM(RTRIM(T2.Name2))
My data:
Table1              Table2
------              ------
Name1 (column)      Name2 (column)
--------------      --------------
Employee Data       Employee Data
Customer Data       Customer Data
When I check my data in the 2 tables with
select T1.Name1, len(T1.Name1) as Length1, datalength(T1.Name1) as DataLength1 from Table1 T1
select T2.Name2, len(T2.Name2) as Length2, datalength(T2.Name2) as DataLength2 from Table2 T2
the Length and DataLength values are different between the 2 tables; they are not the same.
I can't change the original data in the 2 tables. How can I fix this issue?
Thank you
Joins do not have special rules for equality. The equality operator always works the same way. So if a = b then the join on a = b would work. Therefore, a <> b.
Check the contents of those fields. They will not be the same although you think they are:
select convert(varbinary(max), myCol) from T
Unicode has invisible characters (that only ever seem to cause trouble).
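A common culprit is the non-breaking space, NCHAR(160); here is a sketch for checking whether that is what is hiding in the data (the actual character can only be confirmed from the varbinary dump above):
-- Find rows in Table1 whose Name1 contains a non-breaking space
SELECT T1.Name1, CONVERT(VARBINARY(MAX), T1.Name1) AS RawBytes
FROM Table1 T1
WHERE T1.Name1 LIKE '%' + NCHAR(160) + '%'
-- If NCHAR(160) turns out to be the offender, the join can normalise it away:
SELECT T1.Name1, T2.Name2
FROM Table1 T1
INNER JOIN Table2 T2
    ON REPLACE(T1.Name1, NCHAR(160), ' ') = REPLACE(T2.Name2, NCHAR(160), ' ')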
declare @t table (name varchar(20))
insert into @t(name) values ('Employee Data'),('Customer Data')
declare @tt table (name varchar(20))
insert into @tt(name) values ('EmployeeData'),('CustomerData')
select t.name, tt.name from @t t
INNER JOIN @tt tt
ON RTRIM(LTRIM(REPLACE(t.name,' ',''))) = RTRIM(LTRIM(REPLACE(tt.name,' ','')))
I would follow this schema (a rough sketch follows the list):
1. Create a new table to store all the possible names
2. Add the needed keys and indexes
3. Populate it with the existing names
4. Add columns to your existing tables to store the index of the name
5. Create the relative foreign keys
6. Populate the new columns with the correct indexes
7. Create a procedure that inserts a new name only when the value does not already exist
8. Perform the join using the new indexes
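A rough sketch of that approach; the table and column definitions here (dbo.Names, the NVARCHAR(200) length, the Name1Id/Name2Id column names) are invented for illustration:
-- New lookup table that owns every distinct name
CREATE TABLE dbo.Names
(
    NameId INT IDENTITY(1,1) PRIMARY KEY,
    Name   NVARCHAR(200) NOT NULL UNIQUE
)
GO
-- Populate it from the existing data (UNION removes duplicates)
INSERT INTO dbo.Names (Name)
SELECT Name1 FROM Table1
UNION
SELECT Name2 FROM Table2
GO
-- Add the key columns to the existing tables, then fill them in (separate batches,
-- because a column added by ALTER TABLE can't be referenced in the same batch)
ALTER TABLE Table1 ADD Name1Id INT NULL
ALTER TABLE Table2 ADD Name2Id INT NULL
GO
UPDATE T1 SET Name1Id = n.NameId FROM Table1 T1 JOIN dbo.Names n ON T1.Name1 = n.Name
UPDATE T2 SET Name2Id = n.NameId FROM Table2 T2 JOIN dbo.Names n ON T2.Name2 = n.Name
GO
-- From now on, join on the integer keys instead of the nvarchar text
SELECT T1.Name1, T2.Name2
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.Name1Id = T2.Name2Id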

How to SELECT * but without "Column names must be unique in each view"

I need to encapsulate a set of table JOINs that we frequently make use of on a vendor's database server. We reuse the same JOIN logic in many places, in extracts etc., and it seemed a VIEW would allow the JOINs to be defined and maintained in one place.
CREATE VIEW MasterView
AS
SELECT *
FROM entity_1 e1
INNER JOIN entity_2 e2 ON e2.parent_id = e1.id
INNER JOIN entity_3 e3 ON e3.parent_id = e2.id
/* other joins including business logic */
etc.
The trouble is that the vendor makes regular changes to the DB (column additions, name changes) and I want that to be reflected in the "MasterView" automatically.
SELECT * would allow this, but the underlying tables all have ID columns so I get the "Column names in each view must be unique" error.
I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
Is there any way to achieve the dynamism of SELECT * but effectively exclude certain columns (i.e. the ID ones)?
Thanks
I specifically want to avoid listing the column names from the tables because a) it requires frequent maintenance b) there are several hundred columns per table.
In this case, you can't avoid it. You must specify column names, and for those columns with duplicate names use an alias. Code generation can help with this many columns.
SELECT * is bad practice regardless - if someone adds a 2GB binary column to one of these tables and populates it, do you really want it to be returned?
One simple method to generate the columns you want is
select column_name+',' from information_schema.columns
where table_name='tt'
and column_name not in('ID')
As well as Oded's answer (100% agree with)...
If someone changes the underlying tables, you need view maintenance anyway (with sp_refreshview). The column changes will not appear in the view automatically. See "select * from table" vs "select colA, colB, etc. from table" interesting behaviour in SQL Server 2005
So your "reflected in the "MasterView" automatically requirement can't be satisfied anyway
If you want to ensure the view is up to date, use WITH SCHEMABINDING, which will prevent changes to the underlying tables (until the binding is removed or the view is dropped). Then make the column changes and re-apply the view.
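For reference, a minimal sketch of a schema-bound version (column names taken from the question's example); note that WITH SCHEMABINDING requires two-part table names and an explicit column list, so SELECT * is not allowed there anyway:
CREATE VIEW dbo.MasterView
WITH SCHEMABINDING
AS
SELECT e1.id        AS entity_1_id,
       e2.id        AS entity_2_id,
       e2.parent_id AS entity_2_parent_id
FROM dbo.entity_1 e1
INNER JOIN dbo.entity_2 e2 ON e2.parent_id = e1.id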
I had the same issue, see example below:
ALTER VIEW Summary AS
SELECT * FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
and I encountered the error you mentioned; the easiest solution is to use the alias before the *, like this:
SELECT t1.* FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.Id = t2.Id
You shouldn't see that error anymore.
I went with this in the end, building off Madhivanan's suggestion. It's similar to what t-clausen.dk later suggested (thanks for your efforts), though I find the XML PATH style more elegant than cursors / rank partitions.
The following recreates the MasterView definition when run. All columns in the underlying tables are prepended with the table name, so I can include two similarly named columns in the view by default. This alone solves my original problem, but I also included the "WHERE column_name NOT IN" clause to specifically exclude certain columns that will never be used in the MasterView.
create procedure Utility_RefreshMasterView
as
begin
declare @entity_columns varchar(max)
declare @drop_view_sql varchar(max)
declare @alter_view_definition_sql varchar(max)
/* create comma separated string of columns from underlying tables, aliased to avoid name collisions */
select @entity_columns = stuff((
select ','+table_name+'.['+column_name+'] AS ['+table_name+'_'+column_name+']'
from information_schema.columns
where table_name IN ('entity_1', 'entity_2')
and column_name not in ('column to exclude 1', 'column to exclude 2')
for xml path('')), 1, 1, '')
set @drop_view_sql = 'if exists (select * from sys.views where object_id = object_id(N''[dbo].[MasterView]'')) drop view MasterView'
set @alter_view_definition_sql =
'create view MasterView as select ' + @entity_columns + '
from entity_1
inner join entity_2 on entity_2.id = entity_1.id
/* other joins follow */'
exec (@drop_view_sql)
exec (@alter_view_definition_sql)
end
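Calling it whenever the vendor changes the underlying tables rebuilds the view from the current column metadata, for example:
EXEC Utility_RefreshMasterView
SELECT TOP (10) * FROM MasterView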
If you have a SELECT * and you are using a JOIN, the result might include columns with the same name, and that is not possible in a view. If you run the query by itself it works fine, but not when creating the view.
For example:
**Table A**
ID, CatalogName, CatalogDescription
**Table B**
ID, CatalogName, CatalogDescription
**After the JOIN query**
ID, CatalogName, CatalogDescription, ID, CatalogName, CatalogDescription
That's not possible in a View.
Specify a unique name for each column in the view. Using just * is not a very good practice.
