Resetting values for some columns in a large table - sql-server

I have a very large table (10 million rows and 15 columns). I now have a list of data to update the values in 2 of the columns (5 and 13), and the values are different for each row.
Currently I would have to write 10 million statements like this:
UPDATE table SET col5 = value1 WHERE RowNumber = 1
UPDATE table SET col5 = value2 WHERE RowNumber = 2
UPDATE table SET col5 = value3 WHERE RowNumber = 3
...
UPDATE table SET col5 = value10000000 WHERE RowNumber = 10000000
for column 5, and then the same again for column 13.
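Rather than 10 million single-row statements, a common set-based approach is to bulk load the new values into a staging table keyed by RowNumber and apply them with one joined UPDATE. A minimal sketch, assuming a hypothetical staging table NewValues(RowNumber, col5, col13) with placeholder column types:

-- hypothetical staging table holding the replacement values for each row
CREATE TABLE NewValues (
    RowNumber INT PRIMARY KEY,
    col5  VARCHAR(50),   -- adjust the types to match the real table
    col13 VARCHAR(50)
);

-- bulk load the 10 million value pairs (e.g. with BULK INSERT or bcp),
-- then apply both columns in a single set-based statement:
UPDATE t
SET t.col5  = n.col5,
    t.col13 = n.col13
FROM [table] t
INNER JOIN NewValues n
        ON n.RowNumber = t.RowNumber;

One joined UPDATE lets the engine do a few scans instead of executing 10 million individual statements, each with its own parsing and logging overhead.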

Group By and list not matched

I have a table as under:
Col1  Col2              Col3   Col4  Col5
1     50.9499411799115  Point  imp   A
1     109.69487431133   Point  exp   1
1     107.69487431133   Point  exp   2
1     1019.69487431133  Point  exp   B
2     51.5403193833315  Point  imp   0
2     50.5403193833315  Point  exp   3
I want to group by Col1 and select only the groups where no row has 'A' or 'B' in Col5.
I used the query below to generate the output in MSSQL, but didn't get the correct result. Can someone point out my mistake?
SELECT Col1
FROM table1
WHERE
Col5 NOT LIKE('%A%')
or Col5 NOT LIKE('%B%')
GROUP BY Col1;
Therefore my output should be
Col1
2
The problem is that you're trying to eliminate one row based on data in another: the WHERE clause filters individual rows, but a group has to be excluded if any of its rows contains 'A' or 'B'. So the only way to do that is to check the other rows, e.g.:
select Col1
from table1 D1
where not exists (select 1
                  from table1 D2
                  where D2.Col1 = D1.Col1
                    and (D2.Col5 like '%A%' or D2.Col5 like '%B%'))
group by Col1
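An equivalent set-based alternative, as a sketch of the same logic: count the offending rows per group with conditional aggregation and keep only the groups where that count is zero:

SELECT Col1
FROM table1
GROUP BY Col1
HAVING SUM(CASE WHEN Col5 LIKE '%A%' OR Col5 LIKE '%B%'
                THEN 1 ELSE 0 END) = 0;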

How to select distinct rows, but repeat if it has a different row between the equal ones

Having data like this:
id text bit date
1 row 1 2016-11-24
2 row 1 2016-11-25
3 row 0 2016-11-26
4 row 1 2016-11-27
I want to select the rows where the text and bit columns are distinct, but with a twist: if, in some order (here by id), the data changes between two otherwise identical rows, the selection should repeat that row.
So, if I used DISTINCT in SQL, I would get rows 1 and 3, but I want to retrieve rows 1, 3 and 4, because even though 1 and 4 are identical, row 3 sits between them when ordering by id.
With a larger dataset, like:
id text bit date
1 row 1 2016-11-24
2 row 1 2016-11-25
3 row 0 2016-11-26
4 row 1 2016-11-27
5 foo 1 2016-11-28
6 bar 1 2016-11-29
7 row 1 2016-11-30
8 row 0 2016-12-01
9 row 0 2016-12-02
10 row 1 2016-12-03
Again, selecting with DISTINCT on the text and bit columns, the query would retrieve rows 1, 3, 5 and 6, but I actually want rows 1, 3, 4, 5, 6, 7, 8 and 10.
;with tb(id, [text], [bit], [date]) AS (
    SELECT 1, 'row', 1, '2016-11-24' union
    SELECT 2, 'row', 1, '2016-11-25' union
    SELECT 3, 'row', 0, '2016-11-26' union
    SELECT 4, 'row', 1, '2016-11-27' union
    SELECT 5, 'foo', 1, '2016-11-28' union
    SELECT 6, 'bar', 1, '2016-11-29' union
    SELECT 7, 'row', 1, '2016-11-30' union
    SELECT 8, 'row', 0, '2016-12-01' union
    SELECT 9, 'row', 0, '2016-12-02' union
    SELECT 10, 'row', 1, '2016-12-03')
select t1.*
from tb as t1
OUTER APPLY (select top 1 [text], [bit]
             from tb as tt
             where tt.id < t1.id
             order by id desc) as t2
where t1.[text] != isnull(t2.[text], '')
   or t1.[bit] != isnull(t2.[bit], 1 - t1.[bit])
result set:
1 row 1 2016-11-24
3 row 0 2016-11-26
4 row 1 2016-11-27
5 foo 1 2016-11-28
6 bar 1 2016-11-29
7 row 1 2016-11-30
8 row 0 2016-12-01
10 row 1 2016-12-03
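On SQL Server 2012 or later, a window-function alternative (a sketch, repeating the same sample data) compares each row with its predecessor via LAG and avoids the correlated OUTER APPLY:

;with tb(id, [text], [bit], [date]) as (
    -- same sample data as above
    select 1,'row',1,'2016-11-24' union
    select 2,'row',1,'2016-11-25' union
    select 3,'row',0,'2016-11-26' union
    select 4,'row',1,'2016-11-27' union
    select 5,'foo',1,'2016-11-28' union
    select 6,'bar',1,'2016-11-29' union
    select 7,'row',1,'2016-11-30' union
    select 8,'row',0,'2016-12-01' union
    select 9,'row',0,'2016-12-02' union
    select 10,'row',1,'2016-12-03')
, changes as (
    select *,
           lag([text]) over (order by id) as prev_text,
           lag([bit])  over (order by id) as prev_bit
    from tb
)
select id, [text], [bit], [date]
from changes
where prev_text is null            -- the first row always qualifies
   or [text] <> prev_text
   or [bit]  <> prev_bit

This yields the same rows (1, 3, 4, 5, 6, 7, 8 and 10) as the OUTER APPLY version.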
It seems that you need a row-by-row operation: you need to know whether each new row is the same as the previous one. If it is, skip it; if not, keep it. Here is my solution:
declare @text varchar(100) = (select [text] from Mytable where id = 1)
declare @bit bit = (select [bit] from Mytable where id = 1)
declare @Newtext varchar(100)
declare @Newbit bit
declare @Mytable table (id int, [text] varchar(100), [bit] bit)
-- seed with the first row, which is always kept
insert into @Mytable select id, [text], [bit] from Mytable where id = 1
declare @counter int = 2
-- assumes ids are contiguous and start at 1
while @counter <= (select COUNT(*) from Mytable)
begin
    select @Newtext = (select [text] from Mytable where id = @counter)
    select @Newbit = (select [bit] from Mytable where id = @counter)
    if @Newtext != @text or @Newbit != @bit
    begin
        -- the row differs from its predecessor, so keep it
        insert into @Mytable
        select id, [text], [bit] from Mytable where id = @counter
    end
    set @text = @Newtext
    set @bit = @Newbit
    set @counter = @counter + 1
end
select * from @Mytable

SQL Server : Bulk insert a Datatable into 2 tables

Consider this datatable:
word     wordCount  documentId
-------  ---------  ----------
Ball     10         1
School   11         1
Car      4          1
Machine  3          1
House    1          2
Tree     5          2
Ball     4          2
I want to insert this data into two tables with this structure:
Table WordDictionary
(
Id int,
Word nvarchar(50),
DocumentId int
)
Table WordDetails
(
Id int,
WordId int,
WordCount int
)
FOREIGN KEY (WordId) REFERENCES WordDictionary(Id)
But because I have thousands of records in the initial table, I have to do this in a single transaction (batch query); for example, a bulk insert could serve this purpose.
The question is how I can split this data across the two tables WordDictionary and WordDetails.
For more details:
The final result must be like this:
Table WordDictionary:
Id  word
--  -------
1   Ball
2   School
3   Car
4   Machine
5   House
6   Tree
and table WordDetails:
Id  wordId  WordCount  DocumentId
--  ------  ---------  ----------
1   1       10         1
2   2       11         1
3   3       4          1
4   4       3          1
5   5       1          2
6   6       5          2
7   1       4          2
Notice:
The words in the source can be duplicated, so I must check whether a word already exists in WordDictionary before inserting into these tables; if a word is already in WordDictionary, its existing word Id must be inserted into WordDetails (see the word Ball).
Finally, the million-dollar problem: this insertion must be done as fast as possible.
If you're looking to just load the tables the first time, without any updates to them over time, you could potentially do it this way (note that SELECT ... INTO creates the target tables, so this assumes they do not already exist):
You can put all of the distinct words from the datatable into the WordDictionary table first:
-- the derived table applies DISTINCT before the identity values are generated;
-- IDENTITY(INT, 1, 1) produces the Id column that the join below relies on
SELECT Id = IDENTITY(INT, 1, 1), word
INTO WordDictionary
FROM (SELECT DISTINCT word FROM datatable) AS d;
Then after you populate your WordDictionary you can then use the ID values from it and the rest of the information from datatable to load your WordDetails table:
SELECT WD.Id as wordId, DT.wordCount as WordCount, DT.documentId AS DocumentId
INTO WordDetails
FROM datatable as DT
INNER JOIN WordDictionary AS WD ON WD.word = DT.word
There is a little discrepancy between the declared table schema and your example data (in the example output, DocumentId lives in WordDetails rather than WordDictionary), but it is resolved below:
1) Setup
-- this is the table with the initial data
-- drop table DocumentWordData
create table DocumentWordData
(
Word NVARCHAR(50),
WordCount INT,
DocumentId INT
)
GO
-- these are the result tables with extra information (identity columns, primary key constraints, a working foreign key definition)
-- drop table WordDictionary
create table WordDictionary
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDictionary PRIMARY KEY,
Word nvarchar(50)
)
GO
-- drop table WordDetails
create table WordDetails
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDetails PRIMARY KEY,
WordId int CONSTRAINT FK_WordDetails_Word REFERENCES WordDictionary,
WordCount int,
DocumentId int
)
GO
2) The actual script to put data in the last two tables
begin tran
-- this is to make sure that if anything in this block fails, then everything is automatically rolled back
set xact_abort on
-- the dictionary is obtained by considering all distinct words
insert into WordDictionary (Word)
select distinct Word
from DocumentWordData
-- details are generated from the initial data by joining the word dictionary to get the word id
insert into WordDetails (WordId, WordCount, DocumentId)
SELECT W.Id, DWD.WordCount, DWD.DocumentId
FROM DocumentWordData DWD
JOIN WordDictionary W ON W.Word = DWD.Word
commit
-- just to test the results
select * from WordDictionary
select * from WordDetails
I expect this script to run very fast as long as you do not have a very large number of records (up to a few million at most).
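If the load is repeated over time and WordDictionary may already contain some of the words (the "Notice" in the question), a small variation of the dictionary insert, sketched here, adds only the missing words; the WordDetails insert then stays exactly the same, since it joins on Word and therefore reuses the existing Id (see the word Ball):

insert into WordDictionary (Word)
select distinct d.Word
from DocumentWordData d
where not exists (select 1
                  from WordDictionary w
                  where w.Word = d.Word)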
This is the query. I'm using temp tables to be able to test it; if you use the 2 CTEs, you'll be able to generate the final result.
1. Setting up sample data for the test:
create table #original (word varchar(10), wordCount int, documentId int)
insert into #original values
('Ball', 10, 1),
('School', 11, 1),
('Car', 4, 1),
('Machine', 3, 1),
('House', 1, 2),
('Tree', 5, 2),
('Ball', 4, 2)
2. Use cte1 and cte2. In your real database, you need to replace #original with the actual table that holds all the initial records.
;with cte1 as (
    select ROW_NUMBER() over (order by word) Id, word
    from #original
    group by word
)
select * into #WordDictionary
from cte1

;with cte2 as (
    select ROW_NUMBER() over (order by #original.word) Id,
           #WordDictionary.Id as wordId,
           #original.word, #original.wordCount, #original.documentId
    from #WordDictionary
    inner join #original on #original.word = #WordDictionary.word
)
select * into #WordDetails
from cte2
select * from #WordDetails
This will be the data in #WordDetails:
+----+--------+---------+-----------+------------+
| Id | wordId | word | wordCount | documentId |
+----+--------+---------+-----------+------------+
| 1 | 1 | Ball | 10 | 1 |
| 2 | 1 | Ball | 4 | 2 |
| 3 | 2 | Car | 4 | 1 |
| 4 | 3 | House | 1 | 2 |
| 5 | 4 | Machine | 3 | 1 |
| 6 | 5 | School | 11 | 1 |
| 7 | 6 | Tree | 5 | 2 |
+----+--------+---------+-----------+------------+

SQL Update row column with random lookup value

I am trying to update a lead table to assign a random person from a lookup table. Here is the generic schema:
TableA (header):
    ID int,
    name varchar(30)
TableB (detail):
    ID int,
    fkTableA int, (foreign key to TableA.ID)
    recordOwner varchar(30) null,
    other detail columns...
TableC (owners):
    ID int,
    fkTableA int, (foreign key to TableA.ID)
    name varchar(30)
TableA has 10 entries, one for each type of sales lead pool. TableB has thousands of entries for each row in TableA. I want to assign the recordOwners from TableC to an even number of rows each (or as close as I can get). TableC will have anywhere from one to ten entries for each row in TableA.
Can this be done in one statement? It doesn't have to be. I can't seem to figure out the best approach for speed. Any thoughts or samples are appreciated.
Updated:
TableA has a one-to-many relationship with TableC. For every record of TableA, TableC will have at least one row, each representing an owner that needs to be assigned to rows in TableB.
TableA
id  name
1   LeadSourceOne
2   LeadSourceTwo
TableC
id  fkTableA  name
1   1         Tom
2   1         Bob
3   2         Timmy
4   2         John
5   2         Steve
6   2         Bill
TableB initial data
id  fkTableA  recordOwner  (other detail columns)
1   1         NULL         ....
2   1         NULL         ....
3   1         NULL         ....
4   2         NULL         ....
5   2         NULL         ....
6   2         NULL         ....
7   2         NULL         ....
8   2         NULL         ....
9   2         NULL         ....
TableB end result
id  fkTableA  recordOwner  (other detail columns)
1   1         TOM          ....
2   1         BOB          ....
3   1         TOM          ....
4   2         TIMMY        ....
5   2         JOHN         ....
6   2         STEVE        ....
7   2         BILL         ....
8   2         TIMMY        ....
9   2         BILL         ....
Basically I need to randomly assign a record from tableC to tableB based on the relationship to tableA.
UPDATE TabB
SET name = (SELECT TOP 1 coalesce(TabC.name, '')
            FROM TabC
            INNER JOIN TabA ON TabC.idA = TabA.id
            WHERE TabA.Id = TabB.idA)
Should work, but it's not tested.
Try this:
UPDATE TableB
SET recordOwner = (SELECT TOP(1) [name]
FROM TableC
ORDER BY NEWID())
WHERE recordOwner IS NULL
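Note that this statement picks a single random owner for the whole update and ignores the TableB/TableA relationship. A correlated variation, as an untested sketch, uses CROSS APPLY so that NEWID() is evaluated per outer row and the pick is restricted to owners of the matching lead pool:

UPDATE b
SET recordOwner = c.[name]
FROM TableB b
CROSS APPLY (SELECT TOP (1) [name]
             FROM TableC
             WHERE fkTableA = b.fkTableA
             ORDER BY NEWID()) c
WHERE b.recordOwner IS NULL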
I ended up looping through and updating x percent of the detail records based on how many owners I had. The end result is something like this:
declare @percentUpdate numeric(8,2)
declare @userFullName varchar(30)

create table #tb_owners(userId varchar(30), processed bit)

insert into #tb_owners(
    userId,
    processed)
select userId = name,
       processed = 0
from tableC
where fkTableA = 1

-- 100.0 avoids integer division
select @percentUpdate = cast(100.0 / count(*) as numeric(8,2))
from #tb_owners

while exists(select 1 from #tb_owners o where o.processed = 0)
begin
    -- pick a random owner that has not been processed yet
    select top 1 @userFullName = o.userId
    from #tb_owners o
    where o.processed = 0
    order by newid()

    -- assign that owner to a random slice of the unassigned rows
    update ptbpd
    set recordOwner = @userFullName
    from tableB ptbpd
    inner join (
        select top (@percentUpdate) percent id
        from tableB
        where fkTableA = 1 -- same lead pool as the owners loaded above
          and recordOwner is null
        order by newid()
    ) nd on (ptbpd.id = nd.id)

    update #tb_owners
    set processed = 1
    where userId = @userFullName
end

-- there may be some rows left over; set them to the last person used
update tableB
set recordOwner = @userFullName
where fkTableA = 1
  and recordOwner is null
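For comparison, a set-based alternative (a sketch, untested, assuming TableB.id is unique): number the owners and the unassigned leads per TableA key in random order, then assign owners round-robin with a modulo join, which spreads the owners as evenly as possible without a loop:

;with OwnerNumbers as (
    select fkTableA, [name],
           row_number() over (partition by fkTableA order by newid()) - 1 as OwnerSeq,
           count(*) over (partition by fkTableA) as OwnerCount
    from TableC
),
LeadNumbers as (
    select id, fkTableA,
           row_number() over (partition by fkTableA order by newid()) - 1 as LeadSeq
    from TableB
    where recordOwner is null
)
update b
set recordOwner = o.[name]
from TableB b
inner join LeadNumbers l on l.id = b.id
inner join OwnerNumbers o
        on o.fkTableA = l.fkTableA
       and o.OwnerSeq = l.LeadSeq % o.OwnerCount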

Getting number of records against each row using SQL server

I have a table:
col1  col2
----  ----
a     rrr
a     fff
b     ccc
b     zzz
b     xxx
I want a query that returns the number of occurrences of col1 against each row, like:
rows  col1  col2
----  ----  ----
2     a     rrr
2     a     fff
3     b     ccc
3     b     zzz
3     b     xxx
since a is repeated 2 times and b is repeated 3 times.
You can try COUNT with an OVER (PARTITION BY ...) clause, which divides the result set produced by the FROM clause into partitions to which the function is applied. Without PARTITION BY, the function treats all rows of the query result set as a single group.
Try this:
select count(col1) over (partition by col1) as [rows], col1, col2
from tablename
You can use the OVER clause with aggregate functions like COUNT:
SELECT rows = COUNT(*) OVER(PARTITION BY col1),
col1,
col2
FROM dbo.TableName
