Excel data compare and change - database

I'll try to be clear. The problem is that I want to change some data based on cell values.
Let's say column A is the price, column B is the base value to compare with (like item001, item002), column C is the value to compare against (item001, item002, etc.), with the NEW price in column D. I would like to find matching values in columns B and C and, where there is a match, change the value in the corresponding row of column A to the value in column D. Was that clear enough? Basically I have 4 columns: the codes and their prices, twice. I want Excel to find the matching codes and change the old price to the new one. The first two columns (price and codes) are much longer than the other two columns (new price and codes to compare), and the matching entries certainly won't be in the same row.
There is probably no simple solution, but it would help my work a lot. I would appreciate it if someone could tell me which functions to use in combination.
Sorry for being a noob, I have to start learning somewhere.
Best regards,
Endre Kalmar

So, ...
  |   A   | B  | C  |   D
--+-------+----+----+-------
1 | price | ID | ID | price
Is this the situation you've got?
In column E, put
=VLOOKUP(B1; C:D; 2; FALSE)
Depending on your regional settings, the ; might be replaced with a ,
This will give you the price in column D when the value in column B can be found in column C.
If it cannot be found, you get an #N/A error (you can wrap the formula in IFERROR if you want something else instead).
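Outside of Excel, the same match-and-update logic can be sketched in plain Python (the item codes and prices here are made up purely for illustration):

```python
# Columns A/B: existing prices keyed by item code (the long list).
prices = {"item001": 10.0, "item002": 12.5, "item003": 8.0}

# Columns C/D: new prices for some of the codes (the shorter list).
new_prices = {"item002": 11.0, "item003": 9.5}

# For every code that appears in both lists, replace the old price
# with the new one; codes without a match keep their old price.
for code, price in new_prices.items():
    if code in prices:
        prices[code] = price
```

This is exactly what the VLOOKUP does per row: look the code up in the other list and, on a match, take the new price.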

Related

Can I create a Running Totals Calculated Column in Ag Grid

I want to create a new Custom Column in AG Grid which will display the calculated value of another column together with the value of that column in the previous row.
We have created lots of calculated columns in AdapTable, but I cannot work out how to do this one.
In our example we have a Price column, a Date column, and a Running Price calculated column.
For the row where Date is Today, I want the value in the Running Price column to be the Price in this row plus whatever the Running Price value is in the row where Date is Yesterday.
And for yesterday's row I want Running Price to include the value from 2 days ago. And so on.
Perhaps this example will help explain:
Price | Date       | Running Price
  5   | 2 Days Ago | 10
  7   | Yesterday  | 17
  9   | Today      | 26
If I can do this without needing to sort AG Grid on the Date column then even better as my users like to do their own sorts and I don't want it to break the running total.
Yes, this can be done fairly easily in AdapTable.
You need to use what it calls an AggregatedScalarQuery.
Assuming that the columns in your grid are called 'Price' and 'MyDate', the Expression for the 'RunningPrice' Calculated Column will be something like:
CUMUL(SUM([Price]), OVER([MyDate]))
See more at: https://docs.adaptabletools.com/guide/adaptable-ql-expression-aggregation-scalar#cumulative-aggregation
Edit: I should add that you don't need to sort the 'MyDate' column as per your initial message, since OVER will run over the dates in natural sort order. So your users can continue to sort AG Grid however they like without it affecting your Calculated Column.
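The same idea, a running sum accumulated in date order regardless of how the rows are currently sorted, can be sketched in plain Python (the field names mirror the example above; this is an illustration, not AdapTable's implementation):

```python
from datetime import date, timedelta

today = date.today()
# Rows in an arbitrary display order (as if the user had re-sorted the grid).
rows = [
    {"Price": 9, "MyDate": today},
    {"Price": 5, "MyDate": today - timedelta(days=2)},
    {"Price": 7, "MyDate": today - timedelta(days=1)},
]

# Walk the rows in date order to accumulate, but write the result back
# onto each row, so the display order is left untouched.
running = 0
for row in sorted(rows, key=lambda r: r["MyDate"]):
    running += row["Price"]
    row["RunningPrice"] = running
```

Today's row ends up with 5 + 7 + 9 = 21 even though it is listed first, which is why the user's own sorting doesn't break the total.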

Condensing a bunch of columns into one array column mapped to a key

I'm doing a project that analyzes COVID data, and I'm trying to clean the data of null values and the like, but first I need to make it usable. It currently has an individual column for every date holding the number of new cases that day. The Combined_Key column is unique, so that is what I was going to map the dates and cases to.
Every column is also of type String, so I imagine I'll need to insert the data into a DataFrame set up with the correct types, but I don't know how to do that without typing 450 date columns separately. Even more exciting, there isn't an inherent date type in Spark/Scala, so I'm not sure how to handle that either.
UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,Combined_Key,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20
84001001,US,USA,840,1001.0,Autauga,Alabama,US,32.53952745,-86.64408227,"Autauga, Alabama, US",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,5,6,6,6,6,8,8,10,12,12,12,12,12,12,12,17,18,19,19,19,23,24,24,24,25,26,28,30,32,33,36,36,36,37,39,41,42,43,47,51,54,54,56,58,62,63,72,81
That's part of the top 2 rows of the data; a whole lot of date columns have been left out. I'm working in the Spark shell. After turning the data into a table I've tried things like the following, but they fail with "error: 5 more arguments than can be applied to method ->: (y: B)(String, B)" and "error: type mismatch;" respectively.
var covidMap = scala.collection.mutable.Map[String, ArrayBuffer[Int]]()
table.foreach{x => covidMap += (x(10)).toString -> (x(11),x(20),x(30),x(40),x(50),x(60))}
table.foreach{x => covidMap += (x(10)).toString -> (x(11))}
Honestly, I don't know if these are even close to what I need to be doing. I've been coding for 5 weeks in a training program and it's incredibly difficult for me thus far, so, I'm here. Any help is appreciated!
Starting with an example DataFrame (taking your first two example date columns and adding today's date to show it'll work in the future):
val df = List(
  (84001001,"US","USA",840,1001.0,"Autauga","Alabama","US",32.53952745,-86.64408227,"Autauga, Alabama, US",0,0,50)
).toDF("UID","iso2","iso3","code3","FIPS","Admin2","Province_State","Country_Region","Lat","Long_","Combined_Key","1/22/20","1/23/20","4/2/22")

df.show()
gives:
+--------+----+----+-----+------+-------+--------------+--------------+-----------+------------+--------------------+-------+-------+------+
| UID|iso2|iso3|code3| FIPS| Admin2|Province_State|Country_Region| Lat| Long_| Combined_Key|1/22/20|1/23/20|4/2/22|
+--------+----+----+-----+------+-------+--------------+--------------+-----------+------------+--------------------+-------+-------+------+
|84001001| US| USA| 840|1001.0|Autauga| Alabama| US|32.53952745|-86.64408227|Autauga, Alabama, US| 0| 0| 50|
+--------+----+----+-----+------+-------+--------------+--------------+-----------+------------+--------------------+-------+-------+------+
We can then create a new column, which I've called dates, but you can easily rename it. Here the array function is used to combine the values of all the date columns into a single column, which is an array:
import org.apache.spark.sql.functions.{array, col}
val dateRegex = "\\d+/\\d+/\\d+" // matches all columns in x/y/z format
val dateColumns = df.columns.filter(_.matches(dateRegex))
df
// select all date columns and combine into a new column: `dates`
.withColumn("dates", array(dateColumns.map(df(_)): _*))
// drop the original date columns, keeping `dates`
.drop(dateColumns: _*)
.show(false)
gives:
+--------+----+----+-----+------+-------+--------------+--------------+-----------+------------+--------------------+----------+
|UID |iso2|iso3|code3|FIPS |Admin2 |Province_State|Country_Region|Lat |Long_ |Combined_Key |dates |
+--------+----+----+-----+------+-------+--------------+--------------+-----------+------------+--------------------+----------+
|84001001|US |USA |840 |1001.0|Autauga|Alabama |US |32.53952745|-86.64408227|Autauga, Alabama, US|[0, 0, 50]|
+--------+----+----+-----+------+-------+--------------+--------------+-----------+------------+--------------------+----------+
A downside to this is that the output DataFrame doesn't retain the original dates themselves (the column names), only their values.
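If keeping the dates matters, one alternative, sketched here in plain Python rather than Spark, with made-up row data, is to build a map from each Combined_Key to its date/value pairs, reusing the same x/y/z column-name pattern:

```python
import re

# A trimmed header and one data row, standing in for the real CSV.
header = ["UID", "Combined_Key", "1/22/20", "1/23/20", "1/24/20"]
row = ["84001001", "Autauga, Alabama, US", "0", "0", "1"]

date_re = re.compile(r"\d+/\d+/\d+")  # matches column names in x/y/z format

key_idx = header.index("Combined_Key")
# Map the key to {date: cases}, keeping only the date columns
# and casting the String values to ints along the way.
cases = {
    row[key_idx]: {
        name: int(value)
        for name, value in zip(header, row)
        if date_re.fullmatch(name)
    }
}
```

Each key now carries both the dates and the counts, at the cost of a nested structure instead of a flat array column.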

How to create a total quantity based upon multiple criteria from 2 arrays

What I'm trying to accomplish is going to be illustrated from the below picture, the example spreadsheet is linked as well.
I need to create a function that will sum the Quantity columns (D & G) where the ID columns (B & E) are less than 5000, then display those results next to the corresponding ID in column J.
Lastly, the quantities for the Materials in the Chain Material section need to carry over to the respective ID in Column H.
I do NOT need to sum anything from column D where the ID in column B is greater than 5000; that information is useless.
The expected results can be seen next to each cell in column K.
Thank you in advance!
https://docs.google.com/spreadsheets/d/1seoOaGytQ8wKH-wXR7hhN1YGdWiNNNhMIyIpCzRAMG4/edit?usp=sharing
use in J4:
=INDEX(IFNA(VLOOKUP(H4:H, QUERY({B4:D; E4:G},
"select Col1,sum(Col3) group by Col1"), 2, 0)))
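The stack-then-group logic that QUERY performs can be sketched in plain Python (the IDs and quantities here are invented; the two lists stand in for the B:D and E:G ranges):

```python
# Two (ID, quantity) ranges, as in B:D and E:G.
range1 = [(1001, 5), (1002, 3), (6001, 9)]   # 6001 is over 5000, so ignored
range2 = [(1001, 2), (1003, 4)]

# Stack the two ranges, keep only IDs below 5000, and sum per ID,
# mirroring "select Col1, sum(Col3) group by Col1".
totals = {}
for item_id, qty in range1 + range2:
    if item_id < 5000:
        totals[item_id] = totals.get(item_id, 0) + qty
```

The VLOOKUP in the formula then just reads each ID's total out of the grouped result.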

How to Copy Data From One Column Based On Another Column's Values

Here is my datasheet
I have duplicate values in column A and column B. I'm done with getting only the unique values from column A into column C with the =UNIQUE(A2:A) formula. But I also want to get the values from column B that correspond to the unique entries fetched from column A into column C. I want to do this in column D but am unable to. So how can I copy column B values into column D against the unique values in column C via a formula? Hope you understand my question. Sorry for the bad English.
For Ex:
See if this works
=unique({B2:B, D2:D})
You can do a UNIQUE of two columns:
=UNIQUE(A2:B)
or you can use SORTN with grouping mode 2 like:
=SORTN(A2:B, ROWS(A2:A), 2, 1, 1)

How to represent a 2-D data matrix in a database

I have a data set which consists of an ID and a matrix (n x n) of data related to that ID.
Both the column names (A,B,C,D) and the Row names (1,2,3) are also important and need to be held for each individual ID, as well as the data (a1,b1,c1,d1,...)
for example:
ID | A | B | C | D |
1 | a1 | b1 | c1 | d1 |
2 | ... | ... | ... | ... |
3 | ... | ... | ... | ... |
I am trying to determine the best way of modelling this data set in a database, however, it seems like something that is difficult given the flat nature of RDBMS.
Am I better off holding the ID and an XML blob representing the data matrix, or am I overlooking a simpler solution here?
Thanks.
RDBMSes aren't flat. The R part sees to that. What you need is:
Table Entity
------------
ID
Table EntityData
----------------
EntityID
MatrixRow (1, 2, 3...)
MatrixColumn (A, B, C, D...)
Value
Entity:EntityData is a one-to-many relationship; each cell in the matrix has an EntityData row.
Now you have a schema that can be analyzed at the SQL level, instead of just being a data dump where you have to pull and extract everything at the application level in order to find out anything about it.
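A minimal sketch of that schema using SQLite (the table and column names are taken from the answer above; the cell values are just the a1/b1 placeholders from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entity (ID INTEGER PRIMARY KEY);
CREATE TABLE EntityData (
    EntityID     INTEGER REFERENCES Entity(ID),
    MatrixRow    INTEGER,
    MatrixColumn TEXT,
    Value        TEXT
);
""")

conn.execute("INSERT INTO Entity (ID) VALUES (1)")
# One EntityData row per matrix cell.
conn.executemany(
    "INSERT INTO EntityData VALUES (?, ?, ?, ?)",
    [(1, 1, "A", "a1"), (1, 1, "B", "b1"), (1, 2, "A", "a2")],
)

# A single cell can now be addressed directly in SQL:
cell = conn.execute(
    "SELECT Value FROM EntityData "
    "WHERE EntityID = 1 AND MatrixRow = 1 AND MatrixColumn = 'B'"
).fetchone()[0]
```

Because every cell is its own row, aggregates, joins, and filters over the matrix all stay at the SQL level.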
This is one of the reasons why PostgreSQL supports arrays as a data type. See
http://www.postgresql.org/docs/8.4/static/functions-array.html
and
http://www.postgresql.org/docs/8.4/static/arrays.html
Where it shows you can use syntax like ARRAY[[1,2,3],[4,5,6],[7,8,9]] to define the values of a 3x3 matrix or val integer[3][3] to declare a column type to be a 3x3 matrix.
Of course this is not at all standard SQL and is PostgreSQL specific. Other databases may have similar-but-slightly-different implementations.
If you want a truly relational solution:
Matrix
------
id
Matrix_Cell
-----------
matrix_id
row
col
value
But constraints to make sure you had valid data would be hideous.
I would consider a matrix as a single value as far as the DB is concerned and store it as
csv:
Matrix
------
id
cols
data
Which is somewhat lighter than XML.
I'd probably implement it like this:
Table MatrixData
----------------
id
rowName
columnName
datapoint
If all you're looking for is storing the data, this structure will hold any size matrix and allow you to reconstitute any matrix from the ID. You will need some post-processing to present it in "matrix format", but that's what the front-end code is for.
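That post-processing step, reconstituting a matrix from the row/column/value layout, is a small pivot, sketched here in Python with placeholder values:

```python
# (rowName, columnName, datapoint) tuples for one id, as stored in MatrixData.
rows = [
    (1, "A", "a1"), (1, "B", "b1"),
    (2, "A", "a2"), (2, "B", "b2"),
]

# Collect the distinct row and column names in order.
row_names = sorted({r for r, _, _ in rows})
col_names = sorted({c for _, c, _ in rows})
lookup = {(r, c): v for r, c, v in rows}

# Rebuild the matrix as a list of lists, one inner list per row name.
matrix = [[lookup[(r, c)] for c in col_names] for r in row_names]
```

The front-end code only needs the id's rows; the shape of the matrix falls out of the distinct row and column names.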
Can the data be thought of as "row data"? If so, then maybe you could store each row as an object (or XML blob) with data A, B, C, D and then, in your "representation", use something like a LinkedHashMap (assuming Java) to get the objects with an ID key.
Also, it seems that by its very nature, a typical database table already does what you need, doesn't it?
Or, even better, you can create a logical array-like structure:
Say you want to store an m x n array.
Create m attributes in the table.
In each attribute, store n elements separated by delimiters.
When retrieving the data, simply reverse the parsing to get the data back.
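That delimiter approach round-trips like this (a sketch; note that, unlike the cell-per-row schema, the database can no longer query individual cells):

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Store: each row becomes one delimited string attribute.
stored = ["|".join(str(v) for v in row) for row in matrix]

# Retrieve: reverse the parsing to get the data back.
restored = [[int(v) for v in s.split("|")] for s in stored]
```

The delimiter must be a character that can never appear in the data itself, which is one reason this scheme tends to be fragile.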
