select first two characters of values in a concatenated string - salesforce

I am trying to create a formula field that checks a string that is a series of concatenated values separated by a comma. I want to check the first two characters of each comma separated value in the string. For example, the string pattern could be: abcd,efgh,ijkl,mnop,qrst,uvwx
In my formula I'd like to check whether the first two characters of each value are 'ab' or 'ef'.
If so, I would return true, else false.
Thanks.

To do this properly, you need a regular expression. Unfortunately, the REGEX function is not available in formula fields. It is, however, available in formulas in Validation Rules and Workflow Rules, so you can specify the formula below in either a Validation Rule or a Workflow Rule:
OR(
  AND(
    NOT( BEGINS( KXENDev__Languages__c, "ab" ) ),
    NOT( BEGINS( KXENDev__Languages__c, "ef" ) )
  ),
  REGEX( KXENDev__Languages__c, ".*,(?!ab|ef).*" )
)
If it's a Validation Rule, you're done -- this formula will raise an error if any of the entries do not start with "ab" or "ef". If it's a Workflow Rule, you can add a Field Update that sets some field to False when this formula is true (if the formula is true, at least one item doesn't start with "ab" or "ef", so your field should be False).
Some may ask "What's with the BEGINS statements? Couldn't you have done this all with one REGEX?" Yes, I probably could, but that makes for an increasingly complex REGEX statement, and these are quite difficult to debug in Salesforce.com, so I prefer to keep my REGEXes in Salesforce.com as simple as possible.
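Outside of Salesforce, the condition that formula encodes can be sketched in a few lines of Python, just to make the intent concrete (the sample values and helper name are illustrative, not part of the formula):
import re

def has_invalid_entry(languages):
    # True when at least one comma-separated value does NOT start with 'ab' or 'ef' -
    # the same condition the Validation/Workflow formula evaluates to true for.
    return any(not re.match(r"(?:ab|ef)", part) for part in languages.split(","))

print(has_invalid_entry("abcd,efgh,abxy"))  # False - every entry starts with ab or ef
print(has_invalid_entry("abcd,xyzq,efgh"))  # True  - 'xyzq' starts with neither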

I suggest you search for ',ab' and ',ef' using the CONTAINS function. But first you need to re-implement the method that composes this string so that it puts a ',' before the first substring. The returned string should then look like ',abcd,efgh,ijkl,mnop,qrst,uvwx'.
If you are not able to re-implement the method that composes this string, use LEFT([your string goes here], 2) to check the first two characters.
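A quick Python illustration of the leading-comma idea described above (the sample value is made up):
value = "abcd,efgh,ijkl"
padded = "," + value               # ',abcd,efgh,ijkl' - every entry is now preceded by a comma
print(",ab" in padded)             # True, mimics CONTAINS(padded, ",ab")
print(value[:2] in ("ab", "ef"))   # True, mimics checking LEFT(value, 2) against 'ab'/'ef'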

Related

How to split a search string into parts, then check parts against a database

Here's what I'm dealing with:
We have a database of machines and their part lists are specified using strings. For example, one machine might be specified with the string &XXX&YYY-ZZZ, meaning the machine contains parts XXX and YYY and not ZZZ.
We use &XXX to specify that a part exists in a machine, and -XXX to specify that a part does not exist in a machine.
It's also possible that a part is not listed (i.e. not specified whether or not it exists in the machine). For example I might only have &XXX&YYY (ZZZ is not specified).
Additionally, the codes can be in any order, for example I might have &XXX&YYY-ZZZ or &XXX-ZZZ&YYY.
In order to search for machines, I get a string like this: &XXX-YYY/&YYY&ZZZ (/ is an OR operator), meaning "I want to find all machines that either a) contain XXX and do not contain YYY, or b) contain both YYY and ZZZ."
I'm having trouble parsing the string based on the variable ordering, possibility that parts may not be shown, and handling of the / operator. Note, we use Microsoft 365.
Looking for some suggestions!
When I search for &XXX-YYY/&YYY&ZZZ, I should return the following machines:
Machine          Result
&XXX-YYY&ZZZ     TRUE  (because XXX exists and YYY does not exist)
&XXX-YYY-ZZZ     TRUE  (because XXX exists and YYY does not exist)
&XXX&YYY&ZZZ     TRUE  (because YYY exists and ZZZ exists)
&XXX&ZZZ         FALSE (because YYY is specified in the search, but this machine doesn't specify it)
&ZZZ&YYY         TRUE  (showing that parts can be in any order)
You can try it in cell C2 with the following formula:
=LET(query, A2, queries, TEXTSPLIT(query,, "/"), input, B2:B7,
qryNum, ROWS(queries),
SPLIT, LAMBDA(txt,LET(str, SUBSTITUTE(SUBSTITUTE(txt, "&",";1_"),
"-",";0_"), TEXTSPLIT(str,,";",TRUE))),
lkUps, DROP(REDUCE("", queries, LAMBDA(acc,qry, HSTACK(acc, SPLIT(qry)))),,1),
MAP(input, LAMBDA(txt, LET(str, SPLIT(txt),
out, REDUCE("", SEQUENCE(qryNum, 1), LAMBDA(acc,idx,
LET(cols, INDEX(lkUps,,idx), qry, FILTER(cols, cols<>""),
matches, SUM(N(ISNUMBER(XMATCH(str, qry)))),
result, IF(ROWS(qry)=matches,1,0),IF(acc="", result, MAX(acc, result))
))), IF(out=1, TRUE, FALSE)
)))
)
Assumptions:
String values (operation and part) should be unique, i.e. the case &XXX-YYY&XXX is not considered, because &XXX is duplicated.
Explanation
The main idea is to transform the input information in a way we can do comparisons at array level via XMATCH. The first thing to do is to identify each OR condition in the search string because we need to test each one of them against the Input column. The name queries is an array with all the OR conditions.
We transform each string input so that it can be split into an array. SPLIT is a user-defined LAMBDA function that does that:
LAMBDA(txt, LET(str, SUBSTITUTE(SUBSTITUTE(txt, "&", ";1_"), "-", ";0_"), TEXTSPLIT(str,,";",TRUE)))
What it does is convert for example the input: &XXX-YYY&ZZZ into the following array:
1_XXX
0_YYY
1_ZZZ
We change the original operation characters &,- into 1,0 just for convenience; you could keep the original characters, since the exact values are irrelevant for the calculation. It is important to set the fourth TEXTSPLIT input argument to TRUE to ensure no empty rows are generated.
The name lkUps is an array with all the OR conditions organized by column, one column per query, in the format we want. For example:
1_XXX 1_YYY
0_YYY 1_ZZZ
Note: For creating lkUps we use the pattern: DROP/REDUCE/HSTACK, for more information about it, check the answer to the question: how to transform a table in Excel from vertical to horizontal but with different length provided by #DavidLeal.
Now we have all the elements we need to build the recurrence. We use MAP to iterate over all Input column values. For each element (txt) we transform it to the format of our convenience via SPLIT user LAMBDA function and name it str.
We use REDUCE function inside MAP to iterate over all columns of lkUps to check against str. We use SEQUENCE(qryNum, 1) as input of REDUCE to be able to iterate over each lkUps column (qry).
Now we are going to use the above variables in XMATCH and name the variable matches as follows:
SUM(N(ISNUMBER(XMATCH(str, qry))))
If all values from qry are found in str, then we have a match. Each found value contributes 1 to the SUM, otherwise 0, so in the match case the SUM equals the number of rows of qry.
Because we include both parts and operations (1,0) in the XMATCH, we ensure not just that the same parts are found, but also that their corresponding operations are the same. The order of the parts is not relevant; XMATCH takes care of that.
The REDUCE recurrence keeps the maximum value from the previous iteration (previous OR condition). We just need at least one match among all OR conditions. Therefore once we finish all the recurrence, if the result value of REDUCE is 1 at least one match was found. Finally, we transform the result into a TRUE/FALSE.
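If it helps to see the whole idea outside Excel, here is a minimal Python sketch of the same logic (tokenize each string into operation-prefixed parts, then test whether any '/'-separated OR condition is fully contained in the machine's tokens); the function names are illustrative:
import re

def tokenize(spec):
    # Turn '&XXX-YYY&ZZZ' into {'1_XXX', '0_YYY', '1_ZZZ'} ('&' -> present, '-' -> absent).
    return {("1_" if op == "&" else "0_") + part
            for op, part in re.findall(r"([&-])([^&-]+)", spec)}

def machine_matches(query, machine):
    # The machine matches if at least one '/'-separated OR condition is a subset of its tokens.
    machine_tokens = tokenize(machine)
    return any(tokenize(cond) <= machine_tokens for cond in query.split("/"))

query = "&XXX-YYY/&YYY&ZZZ"
for m in ["&XXX-YYY&ZZZ", "&XXX-YYY-ZZZ", "&XXX&YYY&ZZZ", "&XXX&ZZZ", "&ZZZ&YYY"]:
    print(m, machine_matches(query, m))   # True, True, True, False, True - matching the table above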
Note: For a long list of operations, instead of chaining SUBSTITUTE calls as above, the SPLIT function can be defined as follows:
LAMBDA(txt,tks, LET(seq, SEQUENCE(COLUMNS(tks),1),
out, REDUCE("", seq, LAMBDA(acc,idx, LET(str, IF(acc="", txt, acc),
SUBSTITUTE(str, INDEX(tks,1,idx), INDEX(tks,2,idx))))),
TEXTSPLIT(out,,";",TRUE)))
and the input tks (tokens) can be defined as follows: {"&","-";"1_","0_"}, i.e. the old values in the first row and the new values in the second row.
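The same token-table idea in plain Python, as a rough sketch (names are illustrative):
tokens = [("&", ";1_"), ("-", ";0_")]   # old value -> new value, applied in order

def split_tokens(text):
    # Apply each replacement in turn, then split on ';' and drop empty entries,
    # mirroring the REDUCE/SUBSTITUTE/TEXTSPLIT combination above.
    for old, new in tokens:
        text = text.replace(old, new)
    return [t for t in text.split(";") if t]

print(split_tokens("&XXX-YYY&ZZZ"))  # ['1_XXX', '0_YYY', '1_ZZZ']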

Regex pattern matching for input type time

To check whether an input field of type "time" is completed (e.g. 09:00am), I have used a regular expression.
ng-pattern="\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm]))"
But with the same regular expression I also want to allow the input field to be empty. In other words, the time field can be either empty or completed (e.g. 09:30am).
Can anyone help me with this?
In an ng-pattern, you need to use
ng-pattern="/^(?:(?:1[0-2]|0?[1-9]):[0-5]\d\s*[AaPp][Mm])?$/"
and if you need to avoid leading/trailing spaces, also add ng-trim="false".
The (?:...)? optional non-capturing group wraps the whole pattern and makes it optional, i.e. the regex can also match an empty string.
The ^ anchor only matches at the start of the string and $ anchors the match at the end, so the entire string must match.
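To sanity-check the behaviour outside Angular, here is a small Python sketch of the same pattern (fullmatch makes the ^ and $ anchors implicit):
import re

# Optional 12-hour time such as '09:30am'; the outer (?:...)? lets the empty string match too.
time_pattern = re.compile(r"(?:(?:1[0-2]|0?[1-9]):[0-5]\d\s*[AaPp][Mm])?")

for value in ["09:30am", "12:59 PM", "", "25:00am", "9:5am"]:
    print(repr(value), bool(time_pattern.fullmatch(value)))
    # '09:30am' True, '12:59 PM' True, '' True, '25:00am' False, '9:5am' False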
In case somebody is looking for a regex for the European time format (00:00-23:59), as I was, here is the regex for that:
^(?:1[0-9]|2[0-3]|0?[0-9]):[0-5]\d$
Hope this helps somebody save a minute or two.
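And a quick Python check of that 24-hour pattern (illustrative only):
import re

euro_time = re.compile(r"(?:1[0-9]|2[0-3]|0?[0-9]):[0-5]\d")
for value in ["00:00", "9:05", "23:59", "24:00"]:
    print(value, bool(euro_time.fullmatch(value)))  # True, True, True, False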

What is the best way to validate and parse complex fields in Pentaho kettle?

What is the best way to validate fields in a row and if invalid, correct it to the right form?
The simplest example would be checking a phone number field (which can come in various formats: 111-111-1111, (111) 111-1111, etc.); we would ideally want to validate these and standardize them to one form (let's say 1111111111). One way to do this is to use Filter Rows with a regex, or we can use the Data Validator, but these only tell us which data is invalid and don't actually format it for us. We could then use the Modified Java Script Value step to write a JS script to do the formatting. But I am guessing there is a better way (or a built-in integration that I haven't come across) that would do these basic validations. Or is it recommended to just dump rows containing invalid fields into a separate CSV file and then use a script to parse them separately?
G'day,
I use the excellent 'Replace in string' step to handle this circumstance.
You can cumulatively apply rules for removing bad characters from strings within a single step. It is really easy to use for single-character fixes like the ones you have described and, best of all, it also lets you search by regex. In a single step you have documented your standardisation and produced the clean output.
In your case, I would create two 'rules' to replace ( and ) with nothing. The - is a little trickier, however: you need a rule for each removal of a single character, so you would need to know the maximum number of - characters in a single data field and add that many lines to your 'Replace in string' step.
If this is unpalatable, consider the 'User Defined Java Expression' step and a call to replace, e.g.: ( (t0 != null) ? t0.replace("-","") : t0 )
As I stated, each 'fix' is applied in sequential order. The 'In stream field' is the input field name, whereas the 'Out stream field' is left blank, instructing Data Integration to modify the field itself. Here's a more complex example where I search for regexes and replace them with nothing, except for the case where I escape a " double-quote:
In stream field    Out stream field    use RegEx    Search                                          Replace with
sc_srcuri                              N            {Internal.Transformation.Filename.Directory}
re_s_sciname                           Y            ["]                                             \\"
re_s_sciname                           Y            .[\x08]
re_s_sciname                           Y            .[\x08]
re_s_sciname                           Y            .[\x08]
re_s_sciname                           Y            [*]
re_s_sciname                           Y            \s*$
re_s_sciname                           Y            ^\s*
Notice I am removing up to three 'delete' control codes [\x08] from this particular string?
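If it helps to see the intent outside Kettle, a regex-based standardisation like the one the question describes can be sketched in Python (purely illustrative; treating only 10-digit results as valid is an assumption):
import re

def normalize_phone(raw):
    # Strip every non-digit so '(111) 111-1111' and '111-111-1111' both become '1111111111'.
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 10 else None  # anything else is treated as invalid here

print(normalize_phone("(111) 111-1111"))  # 1111111111
print(normalize_phone("111-111-1111"))    # 1111111111
print(normalize_phone("111-1111"))        # None (invalid)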

Properly Using String Functions in Stored Procedures

I have an SSIS package that imports data into SQL Server. I have a field in which I need to cut everything after and including "-". The following is my script:
SELECT LEFT(PartNumber, CHARINDEX('-', PartNumber) - 1)
FROM ExtensionBase
My question is: where in my stored procedure should I use this script so that the value is cut before the data is entered into ExtensionBase? Can I do this in a scalar-valued function?
You have two routes available to you: you can use a Derived Column with Expressions to generate this value, or use a Script Transformation. Generally speaking, reaching for a script first is not a good habit for maintainability and performance in SSIS, but the other rule of thumb is that if you can't see the entire expression without scrolling, it's too much.
Dataflow
Here's a sample data flow illustrating both approaches.
OLE_SRC
I used a simple query to simulate your source data. I checked for empty strings, part numbers with no dashes, multiple dashes and NULL.
SELECT
D.PartNumber
FROM
(
VALUES
('ABC')
, ('def-jkl')
, ('mn-opq-rst')
, ('')
, ('Previous line missing')
, (NULL)
) D(PartNumber);
DER Find dash position
I'm going to use FINDSTRING to determine the starting point. FINDSTRING returns zero if the searched item doesn't exist or the input is NULL. I use the ternary operator to return either the position of the first dash, less one to account for the dash itself, or the length of the source string.
(FINDSTRING(PartNumber,"-",1)) > 0
? (FINDSTRING(PartNumber,"-",1)) - 1
: LEN(PartNumber)
I find it helpful in these situations to first compute the positions before trying to use them later. That way if you make an error, you don't have to fix multiple formulas.
DER Trim out dash plus
The 2012 release of SSIS provided us with the LEFT function, while users of previous editions had to make do with SUBSTRING calls.
The LEFT expression would be
LEFT(PartNumber,StartOrdinal)
whilst the SUBSTRING is simply
SUBSTRING(PartNumber,1,StartOrdinal)
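As a quick cross-check of the logic, the same "everything before the first dash, or the whole value when there is no dash" rule looks like this in Python (a sketch, not part of the package):
def before_dash(part_number):
    # None stays None; otherwise keep everything before the first '-' (the whole string if no dash).
    if part_number is None:
        return None
    dash = part_number.find("-")          # -1 when there is no dash
    return part_number if dash < 0 else part_number[:dash]

for value in ["ABC", "def-jkl", "mn-opq-rst", "", None]:
    print(repr(value), "->", repr(before_dash(value)))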
SCR use net libraries
This approach uses the basic string capabilities of .NET to make life easier. The String.Split method returns an array of strings, minus what we split upon. Since you only want the first piece, the zeroth element, we assign that to the SCRPartNumber column created in this task. Note that we check whether PartNumber is null and, if so, set the null flag on our new column.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (!Row.PartNumber_IsNull)
    {
        string[] partSplit = Row.PartNumber.Split('-');
        Row.SCRPartNumber = partSplit[0];
    }
    else
    {
        Row.SCRPartNumber_IsNull = true;
    }
}
Results
You can see the results are the same however you compute them.

NULL vs Empty when dealing with user input

Yes, another NULL vs empty string question.
I agree with the idea that NULL means not set, while an empty string means "a value that is empty". Here's my problem: if the default value for a column is NULL, how do I allow the user to enter that NULL?
Let's say a new user is created on a system. There is a first and last name field; last name is required while first name is not. When creating the user, the person will see 2 text inputs, one for first and one for last. The person chooses to only enter the last name. The first name is technically not set. During the insert I check the length of each field, setting all fields that are empty to NULL.
When looking at the database, I see that the first name is not set. The question that immediately comes to mind is that maybe they never saw the first name field (i.e., because of an error). But this is not the case; they left it blank.
So, my question is, how do you decide when a field should be set to NULL or an empty string when receiving user input? How do you know that the user wants the field to be not set without detecting focus or if they deleted a value...or...or...?
Related Question: Should I use NULL or an empty string to represent no data in table column?
I'll break the pattern, and say that I would always use NULL for zero-length strings, for the following reasons.
If you start fine-slicing the implications of blanks, then you must ensure somehow that every other developer reads and writes it the same way.
How do you alphabetize it?
Can you unambiguously determine when a user omitted entering a value, compared with intentionally leaving it blank?
How would you unambiguously query for the difference? Can a query screen user indicate NULL vs. blank using standard input form syntax?
In practice, I've never been prohibited from reading and writing data using default, unsurprising behavior using this rule. If I've needed to know the difference, I've used a boolean field (which is easier to map to unambiguous UI devices). In one case, I used a trigger to enforce True => null value, but never saw it invoked because the BR layer filtered out the condition effectively.
If the user provides an empty string, I always treat it as null from the database perspective. In addition, I typically trim my string inputs to remove leading/trailing spaces and then check for empty. It's a small win in the database with varchar() types, and it also reduces the cases for searching, since I only need to check for name is null instead of name is null or name = ''. You could also go the other way, converting null to ''. Either way, choose a way and be consistent.
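A minimal Python sketch of that normalization rule (trim, then map empty to None/NULL); the function name is just for illustration:
def normalize_input(value):
    # Trim whitespace and map empty strings to None so the database only ever sees NULL.
    if value is None:
        return None
    value = value.strip()
    return value if value else None

print(normalize_input("  Smith "))  # 'Smith'
print(normalize_input("   "))       # None
print(normalize_input(""))          # None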
What you need to do is figure out what behavior you want. There is no one fixed algebra of how name strings are interpreted.
Think about the state machine here: you have fields which have several states. It sounds like you're thinking about a state of "uninitialized", another of "purposefully empty", and a third with some set value. ANYTHING you do that makes that assignment and is consistent with the rest of your program will be fine; it sounds like the easy mapping is
NULL → uninitialized
"" → purposefully unset
a name → initialized.
I almost never use NULL when referring to actual data. For foreign keys I would say that NULL is valid, but it is almost never valid for user-entered data. The one exception that probably comes up quite regularly is dates that don't exist, such as an employee database with a "termination_date" field; in that case, all current employees should have a value of NULL in that field. As for getting users to actually enter a null value, for the fields that truly require one I would put a checkbox next to the input field so that the user can toggle it to set the corresponding value to null (or, in a more user-friendly manner, "none"). When the checkbox is checked to set the field to null, the corresponding text box should be disabled, and if a null value is already associated, it should start out disabled and only become enabled once the user unchecks the null checkbox.
I try to keep things simple. In this case, I'd make the first-name column not-nullable and allow blanks. Otherwise, you'll have three cases to deal with anywhere you refer to this field:
Blank first name
Null first name
Non-blank first name
If you go with 'blank is null' or 'null is blank' then you're down to two cases. Two cases are better than three.
To further answer your question: the user entering data probably doesn't (and shouldn't) know anything about what a "null" is and how it compares to an "empty". This issue should be resolved cleanly and consistently in the system--not the UI.
While your example is mostly about strings, I'll add that I use null for numerical and boolean fields.
An account balance of 0 is very different to me from one that is null. The same goes for booleans: if people take a multiple-choice test with true and false answers, it is very important to know whether someone answered true, answered false, or didn't answer at all.
Not using null in these cases would require me to have an extra table or a different setup to see whether someone answered a question. You could use, for instance, -1 for not filled in, 0 for false, and 1 for true, but then you're using a numerical field for something that's essentially a boolean.
I've never, ever had a use for a NULL value in production code. An empty string is a fine sentinel value for a blank name field, phone number, or annual income for any application. That said, I'm sure you could find some use for it, but I just think it's overused. If I were to use a NULL value, however, I imagine I'd use it anywhere I want to represent an empty value.
I have always used NULL for uninitialized values, empty for purposely empty values and 0 for off indicators.
By doing this all the time, it is there even if I am not using it, but I don't have to do anything different if I need that distinction.
I am usually testing for empty(), but sometimes I check for isset() which evaluates false on NULL. This is useful for reminders to answer certain questions. If it is empty, false or 0 then the question is answered.
