Grouping Similar Strings Together in SQL Server

We have a data dilemma involving our products and how they're set up in our ERP system.
When a product is revised, we end up creating a new part number and changing the name slightly.
Here is how it shows up:
Name                            QtySold
Product 1                       10
Product 1 V2                    08
Product 1 V3                    06
Product 1 V3 With Accessories   12
All of these product names vary in length, otherwise I'd use the LEFT() function to group them together.
How can I write a query that will take anything that is obviously related (by the naked eye) and group and summarize them together?
The ideal output would be something like this:
Name                            QtySold
Product 1                       36
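
One possible starting point, sketched in Python rather than T-SQL. It assumes (based only on the four sample rows) that a revision always appends a marker such as " V2" after the base name, so the base name can be recovered by cutting the name at that marker. The `base_name` helper and its regex are hypothetical, not something from the ERP system.

```python
import re
from collections import defaultdict

# Sample rows from the question: (Name, QtySold).
rows = [
    ("Product 1", 10),
    ("Product 1 V2", 8),
    ("Product 1 V3", 6),
    ("Product 1 V3 With Accessories", 12),
]

# Assumption: a revision is marked by " V<digits>", and everything after it
# (such as "With Accessories") still belongs to the same base product.
REVISION_SUFFIX = re.compile(r"\s+V\d+\b.*$")


def base_name(name: str) -> str:
    """Hypothetical helper: strip the revision marker and anything after it."""
    return REVISION_SUFFIX.sub("", name)


# Sum the quantities per base name.
totals = defaultdict(int)
for name, qty in rows:
    totals[base_name(name)] += qty

for name, qty in totals.items():
    print(name, qty)
```

Names that don't follow the convention would not be grouped by this rule and would probably need an explicit mapping table from part number to base product.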

Related

Multiple Manufacturers ID DB REPLACE - opencart 3.0.3.6

Can someone please help me with database code? SELECT * FROM oc_manufacturer WHERE 1
I have combined two OpenCart stores into one, so I now have duplicate manufacturers, with different products assigned to them from each store. Before I can delete the duplicates, I need to assign the right ID to the right products. So:
If the manufacturer ID is 11
and the same manufacturer also exists as ID 57,
how can I assign the right manufacturer to the products that have ID 57 but need to be 11 now?
Can you send me the right code so I can run it in SQL?

Do this at your own risk. First, make a full backup of your DB.
Next, run this query in the SQL editor in phpMyAdmin:
UPDATE `oc_product` SET manufacturer_id = 255 WHERE manufacturer_id = 5;
In my example, 255 is the new or existing manufacturer_id that you need to assign to the products, and 5 is the manufacturer_id that you will later delete.
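
For the IDs in the question (keep 11, retire 57), the same UPDATE can be tried out safely first. Below is a minimal sketch using Python's standard sqlite3 module as a stand-in for the MySQL database; the table is a cut-down, hypothetical version of `oc_product` with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for OpenCart's oc_product table (only the columns needed here).
conn.execute(
    "CREATE TABLE oc_product (product_id INTEGER PRIMARY KEY, manufacturer_id INTEGER)"
)
# Made-up products: one already on 11, two on the duplicate 57, one unrelated.
conn.executemany(
    "INSERT INTO oc_product (product_id, manufacturer_id) VALUES (?, ?)",
    [(1, 11), (2, 57), (3, 57), (4, 8)],
)

keep_id, duplicate_id = 11, 57  # IDs from the question

with conn:  # commits on success, rolls back if the statement fails
    cur = conn.execute(
        "UPDATE oc_product SET manufacturer_id = ? WHERE manufacturer_id = ?",
        (keep_id, duplicate_id),
    )
    moved = cur.rowcount  # number of products that were reassigned

# Check that nothing still points at the duplicate before deleting it.
remaining = conn.execute(
    "SELECT COUNT(*) FROM oc_product WHERE manufacturer_id = ?", (duplicate_id,)
).fetchone()[0]
print(moved, remaining)
```

Checking that no product still references the old ID before deleting the duplicate manufacturer row is the part worth carrying over to the real database.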

Find Whole Duplicated Invoices in SQL

I'm trying to write some SQL that returns possibly duplicated invoices, that is, invoices that have the same items with the same quantities and may have been issued twice.
Invoices average around 300 items.
Around 2,500 invoices in total need to be reviewed.
The following is a sample of invoices with only 1 item each, but in the real population the average is 300 items per invoice:
Inv_ID Item_Code Item_Q
A-800 101010 24
A-801 101010 24
A-802 202020 9
A-803 101010 18
A-804 202020 9
A-805 202020 9
A-806 101010 18
The expected result would be:
A-800, A-801
A-802, A-804, A-805
A-803, A-806
But a real invoice has around 200 items, and duplicated invoices must have the same items with exactly the same quantity for each.
It's SQL Server.
The result needs to match on all of an invoice's items.
For example, if invoice A has 300 different item lines, each with quantity 2,
the result needs to list every invoice that has exactly the same 300 items with exactly the same quantities.
The supplier has issued multiple duplicated invoices to our accounting
department by mistake over 4 years. It was discovered by chance, so
we need to find the duplicated invoices and remove them from the payment
schedule.
Two invoices must have exactly the same items with exactly the same quantities to be considered duplicates.

You didn't specify your DBMS product, so this answer is for Postgres.
select string_agg(inv_id, ',' order by inv_id) as inv_ids
from the_table
group by item_code, item_q
order by inv_ids;
Other DBMS products have similar functions to do the aggregation. In Oracle you would use listagg(), in SQL Server 2017 and later string_agg(), and in MySQL group_concat(). The individual syntax is slightly different (check the manual), but the idea is the same.
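
The query above groups by a single line (item_code, item_q), which fits the one-item sample. For invoices with many lines, one way to compare whole invoices is a self-join that counts matching lines and keeps a pair only when that count equals the line count of both invoices. The sketch below runs the idea with Python's standard sqlite3 module as a stand-in for SQL Server. It assumes each item_code appears at most once per invoice, and invoice B-900 is a hypothetical addition to show a partial match being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (inv_id TEXT, item_code TEXT, item_q INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        ("A-800", "101010", 24), ("A-801", "101010", 24),
        ("A-802", "202020", 9), ("A-803", "101010", 18),
        ("A-804", "202020", 9), ("A-805", "202020", 9),
        ("A-806", "101010", 18),
        # Hypothetical invoice: shares one line with A-800 but has a second line,
        # so it must NOT be reported as a duplicate.
        ("B-900", "101010", 24), ("B-900", "303030", 1),
    ],
)

# Pair up invoices line by line, then keep only pairs where the number of
# matching lines equals the total number of lines on BOTH invoices.
query = """
SELECT a.inv_id, b.inv_id
FROM the_table a
JOIN the_table b
  ON a.item_code = b.item_code
 AND a.item_q = b.item_q
 AND a.inv_id < b.inv_id
GROUP BY a.inv_id, b.inv_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM the_table t WHERE t.inv_id = a.inv_id)
   AND COUNT(*) = (SELECT COUNT(*) FROM the_table t WHERE t.inv_id = b.inv_id)
ORDER BY a.inv_id, b.inv_id
"""
pairs = conn.execute(query).fetchall()
for first, second in pairs:
    print(first, second)
```

The result is a list of duplicate pairs, so a group of three identical invoices (A-802, A-804, A-805) shows up as three pairs. With roughly 2,500 invoices of a few hundred lines each, the self-join may be slow without an index on (item_code, item_q).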

TSQL Count appearance Group by values in other table

I need to group some data to show in a graph but... it is too difficult for me :-(
In one table I have customer info, including name, Kgs and yearly turnover:
CustomerA 8 415.86
CustomerB 145846 6815.80
..............
CustomerZC 25160 25690.30
and I need to COUNT the customers that have bought less than 50 Kgs, how many bought from 51 to 100, from 100 to 1,000, from 1,000 to 30,000 and so on.
But since the group limits are not uniform, the boundaries of each range are stored in another table, which looks like:
Group0 0-50
Group1 51-100
Group2 101-1000
.....
Group15 1000001-5000000
Group16 5000001-9999999999
but I can modify it if that helps.
My target is to have a result like this:
0-50 14217
51-100 6425
101-1000 841
....
1000001-5000000 43
Right now I get this result by running 15 different queries, but I would like a general approach that adapts to a variable number of groups.
Thanks

A similar question has been answered before; take a look at the second option there, which joins to a range table.
In your case, it would look something like this:
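-- count(c.kgs) skips the NULLs produced by the left join,
-- so a range with no customers shows 0 (count(*) would show 1)
select r.boundary_name, count(c.kgs) as cnt
from ranges r
left join customers c
on c.kgs between r.low_range and r.high_range
group by r.boundary_name;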
Naturally you'd need to tweak the join if you're looking for exclusive ranges vs. inclusive, and the ranges table will need a low and high bound column.
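
A small runnable sketch of the range-table join, using Python's standard sqlite3 module as a stand-in for SQL Server. The `ranges` and `customers` layouts and all rows are assumptions for illustration. Note that it counts `c.kgs` rather than rows, so a range with no customers reports 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per range with explicit, inclusive low/high bounds.
conn.execute(
    "CREATE TABLE ranges (boundary_name TEXT, low_range INTEGER, high_range INTEGER)"
)
conn.execute("CREATE TABLE customers (name TEXT, kgs INTEGER)")
conn.executemany(
    "INSERT INTO ranges VALUES (?, ?, ?)",
    [("0-50", 0, 50), ("51-100", 51, 100),
     ("101-1000", 101, 1000), ("1001-30000", 1001, 30000)],
)
# Made-up customers: two in the first range, one in each of the next two.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("CustomerA", 8), ("CustomerB", 30), ("CustomerC", 75), ("CustomerD", 500)],
)

# LEFT JOIN keeps every range; COUNT(c.kgs) ignores the NULL row that an
# unmatched range gets, so empty ranges are reported with a count of 0.
rows = conn.execute(
    """
    SELECT r.boundary_name, COUNT(c.kgs) AS cnt
    FROM ranges r
    LEFT JOIN customers c
      ON c.kgs BETWEEN r.low_range AND r.high_range
    GROUP BY r.boundary_name, r.low_range
    ORDER BY r.low_range
    """
).fetchall()
for name, cnt in rows:
    print(name, cnt)
```

Adding or changing a group then only means editing rows in the ranges table; the query itself stays the same.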

Explaining row and column dependencies

This is a simple and common scenario at work, and I'd appreciate some input.
Say I am generating a report for the owners of a pet shop, and they want to know which of their customers have bought how many of each pet. In this scenario my only tools are SQL and something that outputs my query to a spreadsheet.
As the shop owner, I might expect reports in the form:
Customer Dog Cat Rabbit
1 2 3 0
2 0 1 1
3 1 2 0
4 0 0 1
And if one day I decided to stock Goldfish, then the report should come out as:
Customer Dog Cat Rabbit Goldfish
1 2 3 0 0
2 0 1 1 0
3 1 2 0 0
4 0 0 1 0
5 0 0 0 1
But as you probably know, to have a query which works this way would involve some form of dynamic code generation and would be harder to do.
The simplest query would work along the lines of:
Cross join Customers and Pets, Outer join Sales, Group, etc.
and generate:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
...etc
a) How would I explain to the shop owners that the report they want is 'harder' to generate? I'm not trying to say it's harder to read, but it is harder to write.
b) What is the name of the concept I am trying to explain to the customer (to aid with my Googling)?

The name of the concept is 'cross-tab' and can be accomplished in several ways.
MS Access has proprietary extensions to SQL to make this happen. SQL Server before 2005 has a CASE trick, and 2005 and later has PIVOT, but I think you still need to know what the columns will be.

Some databases indeed support some way of creating cross tables, but I think most need to know the columns in advance, so you'd have to modify the SQL (and get a database that supports such an extension).
Another alternative is to create a program that will postprocess the second "easy" table to get your clients the cross table as output. This is probably easier and more generic than having to modify SQL or dynamically generate it.
As for a way to explain the problem... you could show them in Excel how many steps are needed to get the desired result:
Source data (your second listing).
Select values from the pets column
Place each pet type found on a new column
Count values per each type per client
Fill the values
and then say that SQL gives you only the source data, so it's of course more work.

This concept is called pivoting
SQL assumes that your data is represented in terms of relations with fixed structure.
Like, equality is a binary relation, "customer has this many pets of this type" is a ternary relation and so on.
When you see this resultset:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
it's actually a relation, defined by all the combinations of domain values that are in this relation.
For example, customer 1 (domain: customer IDs) has exactly 2 (domain: positive numbers) pets of genus dog (domain: pets).
We don't see rows like these in the resultset:
Customer Pet Quantity
1 Dog 3
Pete Wife 0.67
because the first row is false (customer 1 doesn't have 3 dogs, but 2), and the second row's values are outside their domains.
The SQL paradigm implies that your relations are defined when you issue a query and each row returned defines the relation completely.
SQL Server 2005+ can map rows into columns (which is what you want), but you need to know the number of columns when designing the query, not when running it.
As a rule, the reports you are trying to build are built with reporting software which knows how to translate relational SQL resultsets into nice looking human readable reports.

I have always called this pivoting, but that may not be the formal name.
Whatever it's called you can do almost all of this in plain SQL.
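SELECT customer,
       count(*) AS total,
       sum(CASE WHEN pet = 'dog' THEN 1 ELSE 0 END) AS dog,  -- one column per known pet type
       sum(CASE WHEN pet = 'cat' THEN 1 ELSE 0 END) AS cat
FROM customers JOIN pets  -- add the join condition for your schema
GROUP BY customer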
Obviously what's missing is the dynamic columns. I don't know if this is possible in straight SQL, but it's certainly possible in a stored procedure to generate the query dynamically after first querying for a list of pets. The query is built into a string then that string is used to create a prepared statement.
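
The "query for the list of pets first, then build the SQL string" idea can be sketched outside a stored procedure too. Here is a minimal version in Python with the standard sqlite3 module; the `sales` table, its columns and the `quote_ident` helper are assumptions for illustration, and the rows follow the report in the question. Pet names are bound as parameters and only used in the SQL text as quoted column aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed source table: one row per (customer, pet) with the quantity bought.
conn.execute("CREATE TABLE sales (customer INTEGER, pet TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "Dog", 2), (1, "Cat", 3),
        (2, "Cat", 1), (2, "Rabbit", 1),
        (3, "Dog", 1), (3, "Cat", 2),
        (4, "Rabbit", 1),
        (5, "Goldfish", 1),
    ],
)


def quote_ident(name: str) -> str:
    """Hypothetical helper: quote a value for use as a column alias."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: find out which columns the report needs.
pets = [row[0] for row in conn.execute("SELECT DISTINCT pet FROM sales ORDER BY pet")]

# Step 2: generate one SUM(CASE ...) column per pet; values are bound with "?".
columns = ", ".join(
    f"SUM(CASE WHEN pet = ? THEN quantity ELSE 0 END) AS {quote_ident(pet)}"
    for pet in pets
)
query = f"SELECT customer, {columns} FROM sales GROUP BY customer ORDER BY customer"

# Step 3: run the generated query.
cursor = conn.execute(query, pets)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(header)
for row in rows:
    print(row)
```

Stocking a new pet adds a row to the data and a column to the report without anyone editing the query, which is the part that plain static SQL can't do.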

GROUP_CONCAT and DISTINCT are great, but how do i get rid of these duplicates i still have?

i have a mysql table set up like so:
id uid keywords
-- --- ---
1 20 corporate
2 20 corporate,business,strategy
3 20 corporate,bowser
4 20 flowers
5 20 battleship,corporate,dungeon
what i WANT my output to look like is:
20 corporate,business,strategy,bowser,flowers,battleship,dungeon
but the closest i've gotten is:
SELECT DISTINCT uid, GROUP_CONCAT(DISTINCT keywords ORDER BY keywords DESC) AS keywords
FROM mytable
WHERE uid !=0
GROUP BY uid
which outputs:
20 corporate,corporate,business,strategy,corporate,bowser,flowers,battleship,corporate,dungeon
does anyone have a solution? thanks a ton in advance!

What you're doing isn't possible with pure SQL the way you have your data structured.
No SQL implementation is going to look at "Corporate" and "Corporate, Business" and see them as equal strings. Therefore, distinct won't work.
If you can control the database,
The first thing I would do is change the data setup to be:
id uid keyword <- note, not keyword**s** - **ONE** value in this column, not a comma delimited list
1 20 corporate
2 20 corporate
2 20 business
2 20 strategy
Better yet would be
id uid keywordId
1 20 1
2 20 1
2 20 2
2 20 3
with a separate table for keywords
KeywordID KeywordText
1 Corporate
2 Business
Otherwise you'll need to massage the data in code.
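
Massaging the data in code could look roughly like this. It is a minimal Python sketch that takes the rows as they are stored now, splits each comma-delimited list and keeps the first occurrence of every keyword per uid. The `merge_keywords` function is a hypothetical helper; the rows are the ones from the question.

```python
# Rows as stored in the question's table: (id, uid, keywords).
rows = [
    (1, 20, "corporate"),
    (2, 20, "corporate,business,strategy"),
    (3, 20, "corporate,bowser"),
    (4, 20, "flowers"),
    (5, 20, "battleship,corporate,dungeon"),
]


def merge_keywords(rows):
    """Hypothetical helper: per uid, split the lists and drop repeated keywords."""
    merged = {}
    for _id, uid, keywords in rows:
        # A dict keeps insertion order, so it works as an ordered set here.
        seen = merged.setdefault(uid, {})
        for keyword in keywords.split(","):
            keyword = keyword.strip()
            if keyword:
                seen.setdefault(keyword, None)
    return {uid: ",".join(seen) for uid, seen in merged.items()}


result = merge_keywords(rows)
for uid, keywords in result.items():
    print(uid, keywords)
```

This keeps keywords in first-seen order, which happens to match the output asked for in the question.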

Mmm, your keywords need to be in their own table (one record per keyword). Then you'll be able to do it, because the keywords will then GROUP properly.
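
To illustrate why one record per keyword helps, here is a small sketch using Python's standard sqlite3 module as a stand-in for MySQL. The `item_keywords` table name is an assumption; the rows are the question's data split into single keywords. Once each keyword is its own row, DISTINCT inside the aggregate removes the repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed normalized layout: one keyword per row instead of a comma-delimited list.
conn.execute("CREATE TABLE item_keywords (id INTEGER, uid INTEGER, keyword TEXT)")
conn.executemany(
    "INSERT INTO item_keywords VALUES (?, ?, ?)",
    [
        (1, 20, "corporate"),
        (2, 20, "corporate"), (2, 20, "business"), (2, 20, "strategy"),
        (3, 20, "corporate"), (3, 20, "bowser"),
        (4, 20, "flowers"),
        (5, 20, "battleship"), (5, 20, "corporate"), (5, 20, "dungeon"),
    ],
)

# DISTINCT now compares single keywords, so "corporate" is listed only once.
rows = conn.execute(
    """
    SELECT uid, GROUP_CONCAT(DISTINCT keyword) AS keywords
    FROM item_keywords
    WHERE uid != 0
    GROUP BY uid
    """
).fetchall()
for uid, keywords in rows:
    print(uid, keywords)
```

The order of keywords inside the concatenated string is not pinned down in this sketch; in MySQL an ORDER BY inside GROUP_CONCAT can fix it.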

Not sure if MySQL has this, but SQL Server has RANK() OVER (PARTITION BY ...), which you can use to assign each result a rank. Doing so would allow you to select only the rows with rank 1 and discard the rest.

You have two options as I see it.
Option 1:
Change the way you store your data (keywords in their own table, join the existing table with the keywords table using a many-to-many relationship). This will allow you to use DISTINCT. DISTINCT doesn't work currently because the query sees "corporate" and "corporate,business,strategy" as two different values.
Option 2:
Write some 'interesting' sql to split up the keywords strings. I don't know what the limits are in MySQL, but SQL in general is not designed for this.
