How to get each StateName TotalPopulation in 2009 - sql-server

Stateid  StateName  Year  Population
1        andhra     2008  25000
2        andhra     2009  10000
3        ap         2008  15000
2        ap         2009  20000
How can I get the TotalPopulation for each StateName in 2009?
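No table definition was posted, so here is a minimal sketch assuming the rows above live in a table named StatePopulation (the table and column names are assumptions):
SELECT StateName, SUM(Population) AS TotalPopulation
FROM StatePopulation
WHERE [Year] = 2009
GROUP BY StateName;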

Without using LINQ, here is a solution:
Dictionary<string, int> data = new Dictionary<string, int>(); // to store the words and counts
string inputString = "I love red color. He loves red color. She love red kit.";
var details = inputString.Split(' '); // split the string on spaces; you can also exclude non-alphabet characters
foreach (var detail in details)
{
    // based on Ron's comment, skip empty entries in case the string contains multiple spaces
    if (string.IsNullOrEmpty(detail))
        continue;
    if (data.ContainsKey(detail))
        data[detail]++;
    else
        data.Add(detail, 1);
}

What I did was break the string into an array using the Split function, then loop through each element and check whether it has already been seen. If it has, increment its count by 1; otherwise, add the element to the dictionary.
class Program
{
static void Main(string[] args)
{
string inputString = "I love red color. He loves red color. She love red kit.";
Dictionary<string, int> dict = new Dictionary<string, int>();
var arr = inputString.Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in arr)
{
if (dict.ContainsKey(s))
dict[s] += 1;
else
dict.Add(s, 1);
}
foreach (var item in dict)
{
Console.WriteLine(item.Key + "- " + item.Value);
}
Console.ReadKey();
}
}

Try this way
string inputString = "I love red color. He loves red color. She love red kit.";
Dictionary<string, int> wordcount = new Dictionary<string, int>();
var words = inputString.Split(' ');
foreach (var word in words)
{
if (!wordcount.ContainsKey(word))
wordcount.Add(word, words.Count(p => p == word));
}
wordcount will have the output you are looking for. Note that it will have all entries for all words, so if you want for only a subset, then alter it to lookup against a master list.

Check the link given below.
Count Word
This example shows how to use a LINQ query to count the occurrences of a specified word in a string. Note that to perform the count, first the Split method is called to create an array of words. There is a performance cost to the Split method. If the only operation on the string is to count the words, you should consider using the Matches or IndexOf methods instead. However, if performance is not a critical issue, or you have already split the sentence in order to perform other types of queries over it, then it makes sense to use LINQ to count the words or phrases as well.
class CountWords
{
static void Main()
{
string text = @"Historically, the world of data and the world of objects" +
@" have not been well integrated. Programmers work in C# or Visual Basic" +
@" and also in SQL or XQuery. On the one side are concepts such as classes," +
@" objects, fields, inheritance, and .NET Framework APIs. On the other side" +
@" are tables, columns, rows, nodes, and separate languages for dealing with" +
@" them. Data types often require translation between the two worlds; there are" +
@" different standard functions. Because the object world has no notion of query, a" +
@" query can only be represented as a string without compile-time type checking or" +
@" IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to" +
@" objects in memory is often tedious and error-prone.";
string searchTerm = "data";
//Convert the string into an array of words
string[] source = text.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);
// Create and execute the query. It executes immediately
// because a singleton value is produced.
// Use ToLowerInvariant to match "data" and "Data"
var matchQuery = from word in source
where word.ToLowerInvariant() == searchTerm.ToLowerInvariant()
select word;
// Count the matches.
int wordCount = matchQuery.Count();
Console.WriteLine("{0} occurrences(s) of the search term \"{1}\" were found.", wordCount, searchTerm);
// Keep console window open in debug mode
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
}
/* Output:
3 occurrence(s) of the search term "data" were found.
*/
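As the note above says, Regex.Matches avoids the cost of Split when counting is the only goal. A minimal sketch reusing the text and searchTerm variables from the sample (assumes using System.Text.RegularExpressions; is added):
// Count whole-word, case-insensitive matches without splitting the string first.
int count = Regex.Matches(text, @"\b" + Regex.Escape(searchTerm) + @"\b", RegexOptions.IgnoreCase).Count;
Console.WriteLine("{0} occurrence(s) found via Regex.Matches.", count);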

Related

How to split an Array of Strings to another array with respect to lines?

So I am trying to get a list of WebElements into an array of Strings. I have written the code below, which helps me get the WebElements' text into a list.
Code -
List<WebElement> statusLabelSection = driver.findElements(By.xpath("//div[@class='MuiGrid-root paboxLayout MuiGrid-item']//table"));
List<String> stringsOutput = new ArrayList<String>();
for(WebElement ele:statusLabelSection) {
stringsOutput.add(ele.getText());
}
System.out.println(stringsOutput);
Output –
[Not Required
0001F - Code Recovery Composite
Recovery Which That May Apply
None
Additional static Information
None]
Problem Statement -
I want the output to be in an array like this
expected -
[Not Required, 0001F - Code Recovery Composite, Recovery Which That May Apply, None, Additional static Information, None]
Can you please help!!!
Might not be the best way, but it solved it. Any feedback on how to improve it is welcome!
code
List<WebElement> statusLabelSection = driver.findElements(By.xpath("//div[@class='MuiGrid-root paboxLayout MuiGrid-item']//table"));
List<String> stringsOutput = new ArrayList<String>();
for(WebElement ele: statusLabelSection) {
stringsOutput.add(ele.getText());
}
String str = stringsOutput.get(0).toString();
String[] inputArr = str.split("\n");
for (int i=0; i<inputArr.length; i++) {
System.out.println("num " + i + " - "+ inputArr[i]);
}
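A more compact variant (a sketch, assuming java.util.Arrays, java.util.List, and java.util.stream.Collectors are imported) maps each element's text and splits it in one pass:
List<String> stringsOutput = statusLabelSection.stream()
        .map(WebElement::getText)                           // the text of each table element
        .flatMap(text -> Arrays.stream(text.split("\n")))   // one list entry per line
        .collect(Collectors.toList());
System.out.println(stringsOutput);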

Need to optimize the code for mapping codes to description

I have a Text field that holds semicolon-separated codes. These codes have to be replaced with their descriptions. I have a separate map that holds code and description, and there is a trigger that replaces each code with its description. The data will be loaded into this field using the data loader. I am afraid it might not work for a large amount of data, since I had to use inner for loops. Is there any way I can achieve this without inner for loops?
public static void updateStatus(Map<Id,Account> oldMap,Map < Id, Account > newMap)
{
Map<String,String> DataMap = new Map<String,String>();
List<Data_Mapper__mdt> DataMapList = [select Salseforce_Value__c,External_Value__c from Data_Mapper__mdt where
active__c = true AND Field_API_Name__c= :CUSTOMFIELD_MASSTATUS AND
Object_API_Name__c= :OBJECT_ACCOUNT];
for(Data_Mapper__mdt dataMapRec: DataMapList){
DataMap.put(dataMapRec.External_Value__c,dataMapRec.Salseforce_Value__c);
}
for(Account objAcc : newMap.values())
{
if(objAcc.Status__c != ''){
String updatedDescription='';
List<String> delimitedList = objAcc.Status__c.split('; ');
for(String Code: delimitedList) {
updatedDescription = DataMap.get(Code);
}
objAcc.Status__c = updatedDescription;
}
}
}
It should be fine. You have map-based access acting like a dictionary, and you have the query outside of the loop. Write a unit test that populates close to 200 accounts (that's how the trigger will be called in every data loader iteration). There could be some concerns if you had thousands of values in that Status__c, but there's not much that can be done to optimise it.
But I want to ask you 3 things.
The way you wrote it, updatedDescription will always contain only the last decoded value. Are you sure you didn't want to write something like updatedDescription += DataMap.get(Code) + ';';, or maybe add the values to a List<String> and then call String.join on it? If you truly want the first or last element, I'd add break; or simply access the last element of the split (and then you're right, you're removing the inner loop). But written like that, it looks... weird.
Have you thought about multiple runs? I mean, if there's a workflow rule/flow/process builder, you might enter this code again. And because you're overwriting the field, I think it'll completely screw you over.
Map<String, String> mapping = new Map<String, String>{
'one' => '1',
'two' => '2',
'three' => '3',
'2' => 'lol'
};
String text = 'one;two';
List<String> temp = new List<String>();
for(String key : text.split(';')){
temp.add(mapping.get(key));
}
text = String.join(temp, ';');
System.debug(text); // "1;2"
// Oh noo, a workflow caused my code to run again.
// Or user edited the account.
temp = new List<String>();
for(String key : text.split(';')){
temp.add(mapping.get(key));
}
text = String.join(temp, ';');
System.debug(text); // "lol", some data was lost
// And again
temp = new List<String>();
for(String key : text.split(';')){
temp.add(mapping.get(key));
}
text = String.join(temp, ';');
System.debug(text); // "", empty
Are you even sure you need this code? Salesforce is perfectly fine with having separate picklist labels (what's visible to the user) and API values (what's saved to the database, referenced in Apex, validation rules...). Maybe you don't need this transformation at all. Maybe your company should look into Translation Workbench. Or even ditch this code completely and do some search-and-replace before invoking the data loader, in a real ETL tool (or even MS Excel).

XML serialization error in SQL Server when concatenating characters

I'm trying to concatenate a number of characters corresponding to some ints (the first 15 ASCII characters for example):
;with cte as (
select 1 nr
union all
select nr + 1
from cte
where nr <= 15)
select (
select char(nr)
from cte
for xml path (''), type).value('.', 'nvarchar(max)')
option (maxrecursion 0)
but I'm getting an error saying:
Msg 6841, Level 16, State 1, Line 1
FOR XML could not serialize the
data for node 'NoName' because it contains a character (0x0001) which
is not allowed in XML. To retrieve this data using FOR XML, convert it
to binary, varbinary or image data type and use the BINARY BASE64
directive.
Even if I try to modify my CTE's seed from 1 to 10 for example, I still get the error but for a different character, 0x000B.
I have two possible solutions I'm looking for:
find a way to concatenate all the characters (any method other than FOR XML) - preferred solution
or
remove all characters that are not allowed in XML - I've tried this but it seems I just hit other non-allowed characters. I've also looked for a list of these non-allowed characters but I couldn't find one.
Any help is very much appreciated.
Update - context:
This is part of a bigger CTE where I'm trying to generate random character sets from random numbers by doing multiple divisions and modulus operations.
I take each number modulo 256, get the result, turn it into its corresponding CHAR(), then divide the number by 256, and so on until the modulo or the division is 0.
In the end I want to concatenate all of these characters. I have everything in place, I'm just encountering this error which does not allow me to concatenate the generated strings from CHAR().
This might sound weird and you might say that it's not a SQL-task and you can do it in other languages, but I want to try and find a solution in SQL, no matter how low the performance is.
XML PATH is just one of the techniques used for grouped concatenation. Aaron Bertrand explains and compares all of them in Grouped Concatenation in SQL Server. Built-in support for this is coming in the next version of SQL Server in the form of STRING_AGG.
Bertrand's article explains that XML PATH can only work with XML-safe characters. Non-printable characters like 0x1 (SOH) and 0xB (Vertical Tab) won't work without XML-encoding the data first. Typically, this isn't a problem because real data doesn't contain non-printable characters - what would a SOH or a VT look like in a text box?
Perhaps the easiest way to solve your problem is to use NCHAR() instead of CHAR() to generate Unicode characters, and to start from 32 instead of 0 or 1.
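Applied to the query in the question, that suggestion looks roughly like this (a sketch; the seed starts at 32, the first printable character, so every generated character is XML-safe):
;with cte as (
select 32 nr
union all
select nr + 1
from cte
where nr < 47)
select (
select nchar(nr)
from cte
for xml path (''), type).value('.', 'nvarchar(max)')
option (maxrecursion 0)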
For now, the fastest and safest method to aggregate strings is to use a SQLCLR custom aggregate. If you don't use sloppy techniques like concatenating strings directly, it will also consume the least amount of memory. The various GROUP_CONCAT implementations shown in this project are small enough that you can copy and use them in your own projects. They will work with any Unicode character too, even with non-printable ones.
BTW, SQL Server vNext brings STRING_AGG to aggregate strings. We'll just have to wait a year or two.
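Once STRING_AGG is available, the question's query needs no XML trick at all and control characters are not a problem; a sketch using the forthcoming syntax:
;with cte as (
select 1 nr
union all
select nr + 1
from cte
where nr <= 15)
select string_agg(char(nr), '') within group (order by nr)
from cte
option (maxrecursion 0)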
The non-ordered version, GROUP_CONCAT is just 99 lines. It simply collects all strings in a dictionary and writes them out at the end:
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Collections.Generic;
using System.Text;
namespace GroupConcat
{
[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined,
MaxByteSize = -1,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
IsNullIfEmpty = true)]
public struct GROUP_CONCAT : IBinarySerialize
{
private Dictionary<string, int> values;
public void Init()
{
this.values = new Dictionary<string, int>();
}
public void Accumulate([SqlFacet(MaxSize = 4000)] SqlString VALUE)
{
if (!VALUE.IsNull)
{
string key = VALUE.Value;
if (this.values.ContainsKey(key))
{
this.values[key] += 1;
}
else
{
this.values.Add(key, 1);
}
}
}
public void Merge(GROUP_CONCAT Group)
{
foreach (KeyValuePair<string, int> item in Group.values)
{
string key = item.Key;
if (this.values.ContainsKey(key))
{
this.values[key] += Group.values[key];
}
else
{
this.values.Add(key, Group.values[key]);
}
}
}
[return: SqlFacet(MaxSize = -1)]
public SqlString Terminate()
{
if (this.values != null && this.values.Count > 0)
{
StringBuilder returnStringBuilder = new StringBuilder();
foreach (KeyValuePair<string, int> item in this.values)
{
for (int value = 0; value < item.Value; value++)
{
returnStringBuilder.Append(item.Key);
returnStringBuilder.Append(",");
}
}
return returnStringBuilder.Remove(returnStringBuilder.Length - 1, 1).ToString();
}
return null;
}
public void Read(BinaryReader r)
{
int itemCount = r.ReadInt32();
this.values = new Dictionary<string, int>(itemCount);
for (int i = 0; i <= itemCount - 1; i++)
{
this.values.Add(r.ReadString(), r.ReadInt32());
}
}
public void Write(BinaryWriter w)
{
w.Write(this.values.Count);
foreach (KeyValuePair<string, int> s in this.values)
{
w.Write(s.Key);
w.Write(s.Value);
}
}
}
}
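Once the code above is compiled, the aggregate is registered and called roughly like this (a sketch; the DLL path is hypothetical):
CREATE ASSEMBLY GroupConcat FROM 'C:\Assemblies\GroupConcat.dll' WITH PERMISSION_SET = SAFE;
GO
CREATE AGGREGATE dbo.GROUP_CONCAT (@value nvarchar(4000)) RETURNS nvarchar(max)
EXTERNAL NAME GroupConcat.[GroupConcat.GROUP_CONCAT];
GO
-- e.g. SELECT dbo.GROUP_CONCAT(SomeColumn) FROM SomeTable; -- returns a comma-separated list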
Just another approach (works with non-printables too):
You are adding one character after another; you do not need any group concatenation at all. Your recursive (rather iterative) CTE is a hidden RBAR on its own and will do this for you.
The following example uses a list of ints as input (matching your use case, where you need to do this with random numbers):
DECLARE @SomeInts TABLE(ID INT IDENTITY,intVal INT);
INSERT INTO @SomeInts VALUES(36),(33),(39),(32),(35),(37),(1),(2),(65);
WITH cte AS
(
SELECT ID,intVal AS nr,CAST(CHAR(intVal) AS VARCHAR(MAX)) AS targetString FROM @SomeInts WHERE ID=1
UNION ALL
SELECT si.ID,intVal + 1,targetString + CHAR(intVal)
FROM @SomeInts AS si
INNER JOIN cte ON si.ID=cte.ID+1
)
SELECT targetString, CAST(targetString AS varbinary(max))
FROM cte
option (maxrecursion 0);
The result (printed, and as a growing hex list --> beware of x01 and x02):

Using LINQ to find Excel columns that don't exist in array?

I have a solution that works for what I want, but I'm hoping to get some slick LINQ types to help me improve what I have, and learn something new in the process.
The code below is used to verify that certain column names exist on a spreadsheet. I was torn between using column index values or column names to find them. They both have good and bad points, but I decided to go with column names. They'll always exist, though sometimes in a different order, which I'm still working on.
Details:
The GetData() method returns a DataTable from the Excel spreadsheet. I cycle through all the required field names from my array, looking to see if each one matches something in the column collection on the spreadsheet. If not, I append the missing column name to an output parameter from the method. I need both the boolean value and the missing-fields variable, and I wasn't sure of a better way than using the output parameter. I then remove the last comma from the appended string for display on the UI. If the StringBuilder object isn't null (I could have used missingFieldCounter too), then I know there's at least one missing field and the bool will be false. Otherwise, I just return the output param as empty and the method as true.
So, is there a more slick, all-in-one way to check whether fields are missing, and report on them?
private bool ValidateFile(out string errorFields)
{
data = GetData();
List<string> requiredNames = new [] { "Site AB#", "Site#", "Site Name", "Address", "City", "St", "Zip" }.ToList();
StringBuilder missingFields = null;
var missingFieldCounter = 0;
foreach (var name in requiredNames)
{
var foundColumn = from DataColumn c in data.Columns
where c.ColumnName == name
select c;
if (!foundColumn.Any())
{
if (missingFields == null)
missingFields = new StringBuilder();
missingFieldCounter++;
missingFields.Append(name + ",");
}
}
if (missingFields != null)
{
errorFields = missingFields.ToString().Substring(0, (missingFields.ToString().Length - 1));
return false;
}
errorFields = string.Empty;
return true;
}
Here is the LINQ solution that does the same thing.
I call ToArray() to force the LINQ statement to execute:
StringBuilder missingFields = new StringBuilder();
(from col in requiredNames.Except(
from DataColumn dataCol in data.Columns
select dataCol.ColumnName
)
select missingFields.Append(col + ", ")
).ToArray();
errorFields = missingFields.ToString();
Console.WriteLine(errorFields);
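A side-effect-free variant of the same Except idea, folded back into ValidateFile (a sketch; it assumes the data and requiredNames variables defined above):
var missing = requiredNames
    .Except(data.Columns.Cast<DataColumn>().Select(c => c.ColumnName))
    .ToList();
errorFields = string.Join(",", missing);
return missing.Count == 0;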

lucene ngram tokenizer usage for fuzzy phrase match

I am trying to achieve fuzzy phrase search (to match misspelled words) using Lucene. After reading various blogs, I thought I would try ngram indexes for fuzzy phrase search.
But I couldn't find an ngram tokenizer in my lucene3.4 JAR library. Is it deprecated and replaced with something else? Currently I am using StandardAnalyzer, and I get decent results for exact matches of terms.
I have the following two requirements to handle.
My index has a document with the phrase "xyz abc pqr". When I provide the query "abc xyz"~5, I am able to get results, but my requirement is to get results for the same document even when my query has one extra word, like "abc xyz pqr tst" (I understand the match score will be a little lower). With proximity, the extra word in the phrase is not working; if I remove the proximity and the double quotes " " from my query, I get the expected results (but then I get many false positives, like documents containing only xyz, only abc, etc.).
In the same example above, if somebody misspells the query as "abc xxz", I still want to get results for the same document.
I want to give ngrams a try but am not sure it will work as expected.
Any thoughts?
Try to use BooleanQuery and FuzzyQuery like:
public void fuzzysearch(String querystr) throws Exception{
querystr=querystr.toLowerCase();
System.out.println("\n\n-------- Start fuzzysearch -------- ");
// 3. search
int hitsPerPage = 10;
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
BooleanQuery bq = new BooleanQuery();
String[] searchWords = querystr.split(" ") ;
int id=0;
for(String word: searchWords ){
Query query = new FuzzyQuery(new Term(NAME,word));
if(id==0){
bq.add(query, BooleanClause.Occur.MUST);
}else{
bq.add(query, BooleanClause.Occur.SHOULD);
}
id++;
}
System.out.println("query ==> " + bq.toString());
searcher.search(bq, collector );
parseResults( searcher, collector ) ;
searcher.close();
}
public void parseResults(IndexSearcher searcher, TopScoreDocCollector collector ) throws Exception {
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// 4. display results
System.out.println("Found " + hits.length + " hits.");
for(int i=0;i<hits.length;++i) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println((i + 1) + ". " + d.get(NAME));
}
}
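Regarding the ngram part of the question: in Lucene 3.x the n-gram tokenizers are not in the core JAR but in the contrib analyzers JAR (lucene-analyzers-3.4.0.jar), under org.apache.lucene.analysis.ngram. A minimal analyzer sketch (untested; the 2/3 gram sizes are just example values to tune for your data):
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.util.Version;

public class NGramAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // index 2- and 3-character grams so misspelled terms still share tokens
        return new LowerCaseFilter(Version.LUCENE_34, new NGramTokenizer(reader, 2, 3));
    }
}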
