I've got a pipe that consists of 5 pieces, each with 5 properties, stored in a database.

Inlet -> front -> middle -> rear -> outlet
Each of those five properties has a value anywhere between 4 and 40. I want to find, for each property, a sum that ends in a full 10 or 5 (i.e. a multiple of 5) when that property is summed across one piece of each type. There might be hundreds of different pipe pieces, all with different properties.
So if I pick all 5 pieces and their summed properties come out to 54, 51, 23, 71, 37, that is not good and not what I'm looking for.
Sums of 55, 50, 25, 70, 40, on the other hand, would be perfect.
My trouble is that there are so many pieces that it would be insane to do the matching manually, and new pieces come up frequently.
I have already inserted about 100 of these into SQLite by hand, but it should be easy to convert the data to Excel or any other database format, so the answer can relate to anything like MySQL or Google Sheets.
I need a calculation that takes every piece into account and either returns "no match" or tells me the ID of each piece required for a match; if multiple matches are available, it should list them separately.
Edit: Even just the math needed for this kind of calculation would be a big help, as I'm not much of a math guy myself. I guess there should be a reference piece I use, which then gets checked against every possible combination.
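One way to think about the math: a sum is a multiple of 5 exactly when the remainders of the individual values after dividing by 5 add up to a multiple of 5, so only each property's value mod 5 actually matters. A brute-force check over every combination of one piece per position is the simplest starting point; here is a minimal Python sketch, assuming a hypothetical SQLite table pieces(id, position, p1, p2, p3, p4, p5) (adjust the names to your actual schema):

import sqlite3
from itertools import product

POSITIONS = ["inlet", "front", "middle", "rear", "outlet"]

def find_matches(db_path="pipes.db"):
    con = sqlite3.connect(db_path)
    # Group the pieces by position: {position: [(id, (p1..p5)), ...]}
    by_pos = {pos: [] for pos in POSITIONS}
    for pid, pos, *props in con.execute(
            "SELECT id, position, p1, p2, p3, p4, p5 FROM pieces"):
        by_pos[pos].append((pid, tuple(props)))

    matches = []
    # Try every combination of one piece per position (brute force).
    for combo in product(*(by_pos[pos] for pos in POSITIONS)):
        sums = [sum(piece[1][i] for piece in combo) for i in range(5)]
        # A match means every summed property ends in 0 or 5.
        if all(s % 5 == 0 for s in sums):
            matches.append(([piece[0] for piece in combo], sums))
    return matches  # an empty list means "no match"

With hundreds of pieces per position the brute force blows up quickly, so in practice you would first group pieces by their (p1 % 5, ..., p5 % 5) signature and only combine signatures whose remainders cancel out; the check itself stays exactly the same.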

If the value you want to verify is in A1, use =ROUND(A1/5,0)*5 to round it to the nearest multiple of 5.
If the pipes may not be shorter than the given values, use =CEILING(A1,5) instead, which always rounds up to the next multiple of 5.

Excel calculate smallest of X columns within Y columns, ignoring zeros

I'm trying to calculate the sum of best segments in a run. For example, each Km gives a list as such:
5:40 6:00 5:45 5:55 6:21 6:30
I'm trying to gather the best segments of 2km/3km/4km etc and would like a simple code to do it. At the moment, I'm using the formula
=Min(If(B1=0,9:9:9,sum(A1:B1)),If(C1=0,9:9:9,sum(B1:C1)))
but this goes all the way to 50 km, meaning a very long formula that I then have to repeat slightly differently at 3 km, then 4 km, then 5 km, etc. Surely there must be a way of
generating an array of summed columns of every n columns, then iterating over that to find the min while ignoring values of 0?
I can do it manually for now, but what if I want to go over 50 km? I might want to incorporate bike rides/car drives in the future just for some data analysis, so I figured it best to find an ideal formula now.
It's frustrating, as I could code it, but I'd ideally like to avoid VBA and stick to formulas in Excel.
Here is a draft for the case where there aren't any zeroes, just for groups of 2 km. I decided the simplest approach initially was to add a couple of helper rows containing the running total of the times (and, for later use, the counts) and use a formula like this to subtract them in pairs:
=MIN(INDEX(A2:J2,SEQUENCE(1,9,2))-IF(SEQUENCE(1,9,0)=0,0,INDEX(A2:J2,SEQUENCE(1,9,0))))
but if you have access to recent additions to Excel 365 like SCAN, you can do it without helper rows.
Here is a more realistic scenario with a couple of zeroes thrown in:
=LET(runningSum,Y$4:AP$4,runningCount,Y$5:AP$5,cols,COLUMNS(runningSum),leg,X7,
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
Note that there are no runs of more than seven consecutive legs that don't contain a zero, so 8, 9, 10 etc. just work out to 0.
As mentioned, you could dispense with the helper rows by using SCAN, but not everyone has access to it, so I will add that version separately:
=LET(data,Y$3:AP$3,runningSum,SCAN(0,data,LAMBDA(a,b,a+b)),
runningCount,SCAN(0,data,LAMBDA(a,b,a+(b>0))),leg,X7,cols,COLUMNS(data),
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
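If it helps to see the arithmetic outside of Excel, the formulas above boil down to a running sum and a windowed difference; a rough Python equivalent (purely illustrative, with times in seconds and 0 meaning a missing segment) would be:

def best_leg(times, leg):
    n = len(times)
    run_sum = [0] * (n + 1)   # running totals, like the first helper row
    run_cnt = [0] * (n + 1)   # running count of non-zero segments
    for i, t in enumerate(times):
        run_sum[i + 1] = run_sum[i] + t
        run_cnt[i + 1] = run_cnt[i] + (1 if t > 0 else 0)
    best = None
    for start in range(n - leg + 1):
        # Only consider windows of length 'leg' that contain no zeros.
        if run_cnt[start + leg] - run_cnt[start] == leg:
            total = run_sum[start + leg] - run_sum[start]
            if best is None or total < best:
                best = total
    return best   # None plays the role of the 0 result in the sheet

# e.g. best_leg([340, 360, 345, 355, 381, 390], 2) returns 700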
Tom, that worked! I learnt a few things along the way too, and using the indexing method alongside SEQUENCE and COLUMNS is something I had not thought of. I'd never heard of the LET function before, and I can already see that this is going to really help with some of the bigger calculations in the future.
Thank you so much; I'd like to show you how it looks now. Row 3087 is my old formula, and row 3088 is a copy of the same data using the new formula. As you can see, I've gotten exactly the same results, so it's clear that it works perfectly and can be easily duplicated.

Most efficient way to pull values that may or may not change?

I am not a trained programmer, but I assist in developing/maintaining macros within our VBA-based systems to expedite various tasks our employees do manually. For instance, copying data from one screen to another. By hand, any instance of this could take 30 seconds to 2 minutes, but with a macro, it could take 2-3 seconds.
Most of the macros we develop rely on the ability to accurately pull data as displayed (not from its relative field!) based on a row/column format for each character. As such, we use a custom command (let's call it, say... Instance.Grab) that pulls what we need from the screen using row x / column y coordinates and the length of what we want to pull. For example, where we would normally pull an 8-character string from coordinates 1,1:
Dim PulledValue As String
PulledValue = Instance.Grab(1, 1, 8)   ' row 1, column 1, 8 characters
If I ran that code on my question so far, the returned value for our macro would have been "I am not"
Unfortunately, our systems are getting their displays altered to handle values of an increased character length. As such, the coordinates of the data we're pulling are changing significantly. Rather than go through our macros and change the coordinates and lengths manually in each one (which would need to be repeated if the screen formats change again), I'm converting our macros so that any time they need to pull a string, we can simply change the needed coordinates/length from a central location.
My question is, what would be the best way to handle this task? I've thought of a few ideas, but want to maximize effectiveness and minimize the time I spend developing it, given my limited programming experience. For the sake of this, let's call what I need to make happen CoorGrab, and where an array is needed, make an array called CoorArray:
1) Creating Public Function CoorGrab(ThisField As Variant) - if I did it this way, then I would simply list all the needed coordinate/length sets based on the variant I enter, then pull whichever set is needed using a 3-dimensional array. For instance, CoorGrab(situationA) would return CoorArray(5, 7, 15). This would be easy enough to edit for one of us who knows something about programming, but if we're not around for any reason, there could be issues.
2) Creating all the needed coordinates as public arrays in the module. I'm not overly familiar with how to implement this, but I think I read up on something called public constants? I kind of like this idea for its simplicity, but am hesitant to make any variable or array public.
3) Creating a .txt file on a shared drive that holds all the needed data, with a label to identify each entry, that any terminal can access when running these macros. This would be the easiest for a non-programmer to jump in and edit in case I or one of our other programming-savvy employees aren't available, but it seems like far more work than is needed, and I fear what could happen if the .txt file got a typo or was accidentally deleted.
Any thoughts on how I should proceed? Is one of the above options inherently better/easier than the others? Or is there another way to handle this situation that I didn't cover? Any info or advice you all can provide would be greatly appreciated!
8/2/15 Note - I should probably mention that the VBA is used as part of a terminal emulator with custom applications for the needs of our department. I don't manage the emulator or its applications, nor do I have system admin access; I just create/edit macros used within it to streamline some of the ways our users handle their workloads. Of the three of us who do this, I'm the least skilled at programming, but also the only one who could be pulled to update the macros before the changes take effect.
Your way is not so bad. I would:
Use a string label as the parameter for CoorGrab.
Return a Range instead of a String (because you can use a single-cell range as text, and you keep a trace of where your data is):
Public Function CoorGrab(ByVal label As String) As Range
Create an Excel sheet with 3 rows: 1 = label, 2 = x, 3 = y (you could
add a 4th if you need to search in another sheet).
CoorGrab() finds the label in the Excel sheet and returns X/Y.
If developers aren't available, users just have to edit the Excel sheet.
You could also keep the coordinates in an external Excel file rather than the local one, or use that file to update everybody's copies (read the file from the server and add/update any label that is in the server file but not in the local file).
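The mechanics are the same in any language; here is a rough Python sketch of the idea (the file name, the labels, and the screen.grab stand-in for Instance.Grab are all hypothetical), just to show the shape of the central lookup:

import csv

def load_coords(path):
    # Read rows of 'label,row,column,length' (no header) into a dict.
    coords = {}
    with open(path, newline="") as f:
        for label, row, col, length in csv.reader(f):
            coords[label] = (int(row), int(col), int(length))
    return coords

def coor_grab(screen, coords, label):
    # Resolve the coordinates centrally, then do the positional grab.
    row, col, length = coords[label]
    return screen.grab(row, col, length)   # stands in for Instance.Grab

# usage:
#   coords = load_coords("screen_coords.csv")   # one shared file to edit
#   value = coor_grab(screen, coords, "CustomerName")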

Best way to store a database field that could potentially be a number or string

We are storing wine data in our database. The vintage of the wine might be a number like 2010 or a string such as Non-Vintage.
2010 means the grapes were harvested in 2010.
Non-Vintage means the grapes were harvested across an unknown time-period.
At first we decided to store the field as a string, since 2010 and Non-Vintage are both potentially strings. However, we need to be able to sort the years or perform some arithmetic (e.g. year > 2010).
We are considering either:
Store the data as a number and "Non-Vintage" would be assigned 0. However, we'd have to provide weird validations everywhere in the app for handling the 0 value.
Store the year as a number and provide a boolean "non_vintage" field for non-vintage wines.
The data pulled from the database will be delivered to an AngularJS front-end via an API. The JavaScript code will have to parse through and use the year at various points, e.g. "show me all wines where year > 2010".
Anyone have any thoughts on which is better and why?
Are you using MySQL? You can cast strings as ints in your MySQL statements.
SELECT * FROM table ORDER BY CAST(string AS SIGNED) ASC;
That would mean you can store as a string.
Option 1
Since you're not really treating the year as a number, nor do you need to, the simplest option you have is to just use a string and have at it.
This has some advantages and disadvantages right off the bat:
Advantage:
Simplest, nothing complex; any database can handle this.
Logic is fairly simple, although a side case is needed to handle your "Non-Vintage" value.
Disadvantage:
Potentially allows values you don't want/expect, etc.:
e.g. "NonVintage", "Non-vantage", "Unknown", "Spider-man!" ... O.o
This might be mitigated by whatever your process for putting values in is (e.g. if it's mostly automated, this might be a smaller issue) :)
========================================
Option 2
A stricter way would be to use two columns:
a number and a string.
Store the vintage year in a 4-digit number field, and you know it'll always be a "proper" year. (you could add a check constraint to prevent years < 1000 if you want ;) )
Store the code "NV" in a 2 (or 3?) digit string "code" column. This gives you good flexibility going forward in case other requirements in future start asking for additional types or such.
You could just do a boolean - it would work, however, if things changed in future (they always do), you'd have to redesign .. the string code column has no real hard disad on the boolean and gives you simple flexibility going forward ;)
========================
It would depend on your system and what you know of it, and how the data's coming in ... but I'd probably lean towards option 2 (number + string) myself.
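To make the trade-off concrete, here is a small sketch of how option 2 behaves in application code (all names are made up): the year stays a plain number, the code column carries the special cases, and the "year > 2010" style filters need no magic values:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Wine:
    name: str
    year: Optional[int]   # e.g. 2010; None when a code applies instead
    code: Optional[str]   # e.g. "NV" for non-vintage

wines = [
    Wine("A", 2012, None),
    Wine("B", 2008, None),
    Wine("C", None, "NV"),
]

# "show me all wines where year > 2010" stays a one-liner; non-vintage
# bottles simply drop out of the comparison instead of needing a fake 0.
recent = [w for w in wines if w.year is not None and w.year > 2010]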

What is a good approach to check if an item is in a very big hashset?

I have a hashset that cannot be loaded entirely into memory. So let's say it has parts A, B, and C; each part could be loaded into memory on its own, but not all of them at the same time.
I also have random entries coming in from time to time, and I can hardly tell which part each one might belong to. So one approach could be that I load A first and check it, then B, then C. But the next entry could belong to B, so I have to unload C, then load A, then B... Hopefully that makes sense.
This would clearly be very slow, so I am wondering whether there is a better way to do it (if using a DB is not an option).
I suggest that you don't split the data into A, B, and C arbitrarily. In other words, if A, B, and C are just the result of dividing the whole data set into three equal parts, I recommend instead applying some criterion when you add a new entry to the set. For example, if your entries are numbers, put those starting with 0-3 into A, those starting with 4-6 into B, and those starting with 7-9 into C. When you search for something, you then know a priori that you only have to search A, or B, or C. If your entries are words, the same solution applies, but the criterion is now the first letter; here it may be better to use not 3 sets but 26, the size of the English alphabet.
Note that you still have to keep one of the sets in memory at any time. The advantage is that you do at most one load/unload operation and you don't need to check all the sets, because you know which one can actually hold your value. This idea is widely used in databases as partitioning. If you store neither numbers nor words but some complex objects, you can still invent a simple criterion.
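A rough Python sketch of that partitioning idea (the bucket files, folder name, and partition count are all made up): route each entry to a bucket with a stable key function, keep the buckets on disk, and load only the single bucket a lookup could possibly be in:

import hashlib, os, pickle

NUM_PARTS = 3   # e.g. A, B, C; use more parts if each must stay small

def part_of(entry):
    # Stable criterion so the same entry always maps to the same bucket.
    return int(hashlib.md5(str(entry).encode()).hexdigest(), 16) % NUM_PARTS

def bucket_path(entry, folder):
    return os.path.join(folder, "part%d.pkl" % part_of(entry))

def add(entry, folder="parts"):
    os.makedirs(folder, exist_ok=True)
    path = bucket_path(entry, folder)
    bucket = set()
    if os.path.exists(path):
        with open(path, "rb") as f:
            bucket = pickle.load(f)
    bucket.add(entry)
    with open(path, "wb") as f:
        pickle.dump(bucket, f)

def contains(entry, folder="parts"):
    path = bucket_path(entry, folder)
    if not os.path.exists(path):
        return False
    with open(path, "rb") as f:
        return entry in pickle.load(f)   # only one part ever gets loaded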

How to format data in (a) CSV file(s) so that it can easily be imported in R?

Edit:
So, this format would work:
featureID charge xcoordinate ycoordinate
1 2 5105.9217 336.125209180674
1 2 5108.7642 336.124751115092
2 0 2434.9217 145.893331325278
But what if I have a column with multiple values that are linked? Say a quality column links a machine and a quality, and the column looks like this:
MachineQuality
[[{1:1224}, {2:3453}], [{1:2242}, {2:4142}]]
Now if I want to split that up like I did with the coordinates of the convex hull, I would need 2 rows instead of 1. But wouldn't I then need 2 rows for every row that is already there (so 4, because there are already 2 extra rows for the coordinates), like this:
featureID charge xcoordinate ycoordinate quality1 quality2
1 2 5105.9217 336.125209180674 1224 3453
1 2 5105.9217 336.125209180674 2242 4142
1 2 5108.7642 336.124751115092 1224 3453
1 2 5108.7642 336.124751115092 2242 4142
[...]
Would it have to be like this?
I'm very new to R; my knowledge doesn't go much further than knowing how to make a vector and some simple plots. I'm going to use R for an internship project over the next couple of months, and during this time I will (hopefully) learn some of the ins and outs of R. However, before I start I need to produce the data that I'm going to do the statistics on. I need to know beforehand how I should format my output CSV data so that I can easily read it in once I start my R analysis.
One thing that I've been asked to do is make a CSV file out of the data so that it can be read in by R. The example CSV files for importing into R that I've seen all look like this:
featureID Charge value
1 2 10
2 0 9
However, my data mostly consists of columns whose values themselves contain multiple values. To clarify:
As an example, my data consists of "features" that, among other information, have a "convexhull". This convex hull consists of paired x and y coordinates. So what I could have for data is (showing only two coordinates; there can be many):
featureID Charge Convexhull
1 2 [[{'y': '336.125209180674'}, {'x': '5105.9217'}], [{'y': '336.124751115092'}, {'x': '5108.7642'}]]
Is it possible to get this into one CSV file that can be read correctly in R (so that the paired x and y coordinates are preserved)? If so, what should the CSV file look like? For example, I've seen examples of CSV files with multiple values that look like this:
featureID charge xcoordinate ycoordinate
1 2 5105.9217 336.125209180674
5108.7642 336.124751115092
2 0 2434.9217 145.893331325278
But I can't find out whether this is easily imported by R.
If this is not doable in one CSV file, can the CSV files be imported independently and linked with a primary key, as in a database?
The only critical things are that you have a unique character separating your data columns and that each column is the same length. As long as the second row in your last example is filled in, that will import fine.
You need to consider what you want to do with the data after it's in R to decide how you might want any other special formatting beforehand. But as long as the column separator is a unique character and the columns are of equal length, it will import.
(You can violate the unique separator requirement if your entries are wrapped in quotes. And if you want to get really fancy you could "import" almost anything. But if someone's asking you to format the data then they probably want a rectangular data.frame compatible layout. They probably want unique values in each column (no columns of points). But that's between you and them.)
long vs. wide form. Your last example is known as long form (except that all cells should be filled in) and your first example is roughly wide form, as discussed on the ?reshape help page and illustrated in the examples at the end of that page. You likely want to stick with long form. For an alternative, see the reshape2 package.
save & load. Note that if you are only writing it out to read it back into R later (as opposed to communicating it to some other software), you could use save and load, which don't require any change to the object at all.
json. Another possibility, given the form of your example, is that you might want to look at the rjson package.
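Since the raw structure in the question looks like nested Python lists/dicts, here is a rough sketch of writing it out in long form, one row per coordinate pair (the field names follow the examples above; the simplified dict layout is an assumption about the real data), which read.csv then picks up directly:

import csv

features = [
    {"featureID": 1, "charge": 2,
     "convexhull": [{"x": "5105.9217", "y": "336.125209180674"},
                    {"x": "5108.7642", "y": "336.124751115092"}]},
    {"featureID": 2, "charge": 0,
     "convexhull": [{"x": "2434.9217", "y": "145.893331325278"}]},
]

with open("features_long.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["featureID", "charge", "xcoordinate", "ycoordinate"])
    for feat in features:
        for point in feat["convexhull"]:
            # Repeat the feature-level fields so every row is complete (long form).
            w.writerow([feat["featureID"], feat["charge"], point["x"], point["y"]])

# In R: read.csv("features_long.csv")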
