Find a specific number of occurrences of a word from the beginning of a string - arrays

I've been gathering information from my Jira instance using API calls. The gathered information is saved in a body file, and it has the following content:
No tickets:
{"startAt":0,"maxResults":50,"total":0,"issues":[]}{"startAt":0,"maxResults":50,"total":0,"issues":[]}
One Ticket:
{"expand":"names,schema","startAt":0,"maxResults":50,"total":1,"issues":[{"expand":"operations,versionedRepresentations,editmeta,changelog,renderedFields","id":"456881","self":"https://myjira...com","key":"TICKET-1111","fields":{"summary":"[TICKET] New Test jira","created":"2018-12-17T01:47:09.000-0800"}}]}{"expand":"names,schema","startAt":0,"maxResults":50,"total":1,"issues":[{"expand":"operations,versionedRepresentations,editmeta,changelog,renderedFields","id":"456881","self":"https://myjira...com","key":"TICKET-1111","fields":{"summary":"[TICKET] New Test jira","created":"2018-12-17T01:47:09.000-0800"}}]}
Two Tickets:
{expand:schema,names,startAt:0,maxResults:50,total:2,issues:[{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:456881,self:https://myjira...com,key:TICKET-1111,fields:{summary:[TICKET] New Test jira,created:2018-12-17T01:47:09.000-0800}},{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:320281,self:https://myjira...com,key:TICKET-2222,fields:{summary:[TICKET] Test jira,created:2016-03-18T07:58:52.000-0700}}]}{expand:schema,names,startAt:0,maxResults:50,total:2,issues:[{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:456881,self:https://myjira...com,key:TICKET-1111,fields:{summary:[TICKET] New Test jira,created:2018-12-17T01:47:09.000-0800}},{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:320281,self:https://myjira...com,key:TICKET-2222,fields:{summary:[TICKET] Test jira,created:2016-03-18T07:58:52.000-0700}}]}
etc..
Using this code I've been able to get the total number of open tickets:
// needs <fstream>, <string>, <algorithm> and <iterator>
std::ifstream t("BodyOpenIssues.out");
std::string BodyString((std::istreambuf_iterator<char>(t)),
                       std::istreambuf_iterator<char>());
// Removing quotes so the markers are easier to search for
BodyString.erase(std::remove(BodyString.begin(), BodyString.end(), '"'), BodyString.end());
int Result = 0;
// find() returns std::size_t (std::string::npos on failure), not unsigned
std::size_t first = BodyString.find("total:");
std::size_t last = BodyString.find(",issues");
std::string TotalOpenIssues = BodyString.substr(first + 6, last - (first + 6));
Result = std::stoi(TotalOpenIssues);
return Result;
With a second function I'm trying to get the keys based on the total number of open tickets.
if (GetOpenIssuesNumber() > 0)
{
    std::ifstream t("BodyOpenIssues.out");
    std::string BodyString((std::istreambuf_iterator<char>(t)),
                           std::istreambuf_iterator<char>());
    // Removing quotes
    BodyString.erase(std::remove(BodyString.begin(), BodyString.end(), '"'), BodyString.end());
    // This only ever finds the first key
    std::size_t first = BodyString.find("key:TICKET-");
    std::size_t last = BodyString.find(",fields");
    std::string TotalOpenIssues = BodyString.substr(first + 11, last - (first + 11));
    String^ Result = gcnew String(TotalOpenIssues.c_str());   // C++/CLI managed string
    return "TICKET-" + Result;
}
else
{
    return "No open issues found";
}
What I mean is:
If Total is 1, search from the beginning and find the first key, TICKET-1111.
If Total is 2, search from the beginning and get the first key, TICKET-1111, then continue from there and find the next key, TICKET-2222.
And based on that total, find that many keys in the string.
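Something along these lines is what I have in mind, as a rough sketch only (FindKeys is just a placeholder name, and it assumes the quotes were already removed as above):
#include <string>
#include <vector>

// Collect as many "key:TICKET-..." values as 'total' announces,
// resuming each search where the previous match ended.
std::vector<std::string> FindKeys(const std::string& body, int total)
{
    std::vector<std::string> keys;
    std::size_t pos = 0;
    for (int i = 0; i < total; ++i)
    {
        std::size_t first = body.find("key:TICKET-", pos);
        if (first == std::string::npos)
            break;                                  // fewer keys than announced
        std::size_t last = body.find(",fields", first);
        // keep "TICKET-..." by skipping the 4 characters of "key:"
        keys.push_back(body.substr(first + 4, last - (first + 4)));
        pos = last;                                 // continue after this match
    }
    return keys;
}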
I got lost in all the conversions between the types: ifstream reads the file and I save the result in a std::string, and after the find I save the result in a System::String to use it in my label. I've been researching and found out that I can use a char array, but I can't make it dynamic based on BodyString.length().
If more information is required please let me know.
Any suggestions are really appreciated! Thank you in advance!

I went with the nlohmann JSON library. It has everything I need. Thank you, walnut!
These are formatted as JSON. You should use a JSON library for C++ and parse the files with that. Using search/replace is unnecessarily complicated, and sooner or later you will likely run into corner cases you haven't considered (do you really want the code to randomly miss tickets, etc.?). Also, String^ is not C++. Are you writing C++/CLI instead of C++? If so, please tag c++-cli instead of c++. – walnut
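For reference, a minimal sketch of what the nlohmann::json version could look like (my illustration, not code from the question; it reads only the first JSON value, which should be enough here since the sample files just repeat the same response):
#include <fstream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

// Read the first JSON value from the body file and pull out every issue key.
std::vector<std::string> GetOpenIssueKeys(const std::string& path)
{
    std::ifstream in(path);
    nlohmann::json body;
    in >> body;                               // parses the first JSON value in the stream

    std::vector<std::string> keys;
    if (body.value("total", 0) > 0)
    {
        for (const auto& issue : body["issues"])
            keys.push_back(issue["key"].get<std::string>());   // e.g. "TICKET-1111"
    }
    return keys;
}
The total and each issue's key, summary, and created date are then available as normal C++ values, with no quote stripping or substring arithmetic.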

Related

Get WPF TextBox Row and Column Position

I've been searching and the solutions here work for getting the row and column position of the cursor in a WPF text box until the row number is greater than 16,311. After that, the column is a negative value.
private void SetTextFileCursorPosition()
{
var caretIndex = TextFile.CaretIndex;
var row = TextFile.GetLineIndexFromCharacterIndex(caretIndex);
var col = caretIndex - TextFile.GetCharacterIndexFromLineIndex(row);
CursorPosition = $"{col + 1}, {row + 1}";
}
Very strange behavior. Even poring through the .NET source code, I couldn't quite figure out exactly why this was happening. I've spent some time writing up my research and have submitted it as a bug in the .NET Framework.
Here is the link to the bug report:
System.Windows.Controls.TextBox.GetCharacterIndexFromLineIndex returns increasingly incorrect values for larger line numbers
But I'll include a summary of what I found:
My experience was a little different than yours. In my tests, everything works fine until I get to line 8,512. Starting at that line GetCharacterIndexFromLineIndex seems to start returning the starting index of the next line, instead of the one being requested. Meaning, instead of giving me the start of 8,512, it was giving me the start of 8,513.
Testing with larger numbers of lines, I found that at line 25,536, GetCharacterIndexFromLineIndex starts skipping two lines, instead returning the start of line 25,538. The number of line skips increases to 3 at line 42,560 and then to 4 at line 59,584.
This reveals a pattern: every 17,024 lines, the number of skipped lines increases by 1. The pattern starts at line 8,512, because it is 17,024 / 2 (half).
I can't explain exactly why this happens, but the above provides some good documentation of the behavior. And below, I've put together some code to work around the problem.
You can work around this to a degree:
var caretIndex = TextFile.CaretIndex;
var line = TextFile.GetLineIndexFromCharacterIndex(caretIndex);
var colStart = TextFile.GetCharacterIndexFromLineIndex(line);
var pos = caretIndex - colStart;
int posAdj = pos;
int lineAdj = line;
while (posAdj < 0)
{
posAdj = TextFile.GetLineLength(lineAdj) + posAdj;
lineAdj--;
}
CursorPosition = $"{posAdj + 1}, {line + 1}"; //NOT lineAdj
The above adds on the length of previous lines until it reaches a positive value, effectively adding in the skipped lines. This should work no matter how long the text is, and should even keep working after they (hopefully) patch the bug (since then pos should never be < 0).

What is the best way to set up Spring JPA to handle searching for items based on tags?

I am trying to set up a search system for a database where each element (a code) in one table has tags mapped by a many-to-many relationship. I am trying to write a controller, "search", where I can search for a set of tags which basically act like keywords, giving me a list of elements that all have the specified tags. My current function is incredibly naive: it retrieves all the codes that are mapped to a tag, adds those to a set, and then sorts the codes by how many of each code's tags are found in the query string.
public List<Code> naiveSearch(String queryText) {
    String[] tagMatchers = queryText.split(" ");
    Set<Code> retained = new HashSet<>();
    for (int i = 0; i < Math.min(tagMatchers.length, 4); i++) {
        tagRepository.findAllByValueContaining(tagMatchers[i]).ifPresent(tags ->
            tags.forEach(tag -> retained.addAll(tag.getCodes()))
        );
    }
    SortedMap<Integer, List<Code>> matches = new TreeMap<>();
    List<Code> c;
    for (Code code : retained) {
        int sum = 0;
        for (String tagMatcher : tagMatchers) {
            for (Tag tag : code.getTags()) {
                if (tag.getValue().contains(tagMatcher)) {
                    sum += 1;
                }
            }
        }
        c = matches.getOrDefault(sum, new ArrayList<>());
        c.add(code);
        matches.put(sum, c);
    }
    c = new ArrayList<>();
    matches.values().forEach(c::addAll);
    Collections.reverse(c);
    return c;
}
This is quite slow and the overhead is unacceptable. My previous trick was basically a retrieval on the description of each code in the CrudRepository:
public interface CodeRepository extends CrudRepository<Code, Long> {
Optional<Code> findByCode(String codeId);
Optional<Iterable<Code>> findAllByDescriptionContaining(String query);
}
However, this is brittle, since the order of the tags within the description factors into whether the result will be found, e.g. I want "tall ... dog" == "dog ... tall".
So okay, I'm back several days later with how I actually solved this problem. I used Hibernate's built-in search library (Hibernate Search), which has a very easy integration with Spring: just add the required Maven coordinates to your POM.xml and it is ready to roll.
First I removed the many-to-many mapping between tags and codes and just concatenated all my tags into a single string field. Next I added @Field to the tags field and wrote a basic search method. The method is a very simple search function that takes a set of "key words" (tags) and performs a boolean search based on fuzzy terms against the indexed tags of each code. So far it is pretty good. My database is fairly small (about 100k rows), so I'm not sure how this will scale, but currently each search returns in about 20-50 ms, which is fast enough for my purposes.

Proper way to parse a file and build output

I'm trying to learn D, and I thought that after doing the hello-world stuff I could try something I had wanted to do in Java before, where it was a big pain because of the way the regex API worked: a little template engine.
So, I started with some simple code to read through a file, character by character:
import std.stdio, std.file, std.uni, std.array;

void main(string[] args) {
    File f = File("src/res/test.dtl", "r");
    bool escape = false;
    char[] result;
    Appender!(char[]) appender = appender(result);
    foreach (c; f.rawRead(new char[f.size])) {
        if (c == '\\') {
            escape = true;
            continue;
        }
        if (escape) {
            escape = false;
            // do something special
        }
        if (c == '#') {
            // start of scope
        }
        appender.put(c);
    }
    writeln(appender.data());
}
The contents of my file could be something like this:
<h1>#{hello}</h1>
The goal is to replace the #{hello} part with some value passed to the engine.
So, I actually have two questions:
1. Is that a good way to process characters from a file in D? I hacked this together after searching through the imported modules and picking what sounded like it might do the job.
2. Sometimes I would want to access more than one character (to improve checking for escape sequences, find a whole scope, etc.). Should I slice the array for that? Or are D's regex functions up to that challenge? So far I have only found the matchFirst and matchAll functions, but I would like to match, replace, and return to that position. How could that be done?
The D standard library does not provide what you require. What you need is called "string interpolation", and here is a very nice implementation in D that you can use the way you describe: https://github.com/Abscissa/scriptlike/blob/4350eb745531720764861c82e0c4e689861bb17e/src/scriptlike/core.d#L139
Here is a blog post about this library: https://p0nce.github.io/d-idioms/#String-interpolation-as-a-library

Splitting a text file where the information is separated onto different lines

So, I have a text file where the pieces of information are separated by the Enter key (I don't know how to explain it better, so I will paste the code and some sample data).
cha-cha
Fruzsina
Ede
salsa
Szilvia
Imre
Here's what the text file looks like, and I need to split it into three parts: the first being the type of the dance, and then dancer 1 and dancer 2.
using System;
using System.Collections.Generic;
using System.IO;

namespace tanciskola
{
    struct tanc
    {
        public string tancnev;
        public string tancos1;
        public string tancos2;
    }

    class Program
    {
        static void Main(string[] args)
        {
            #region 1.feladat
            StreamReader sr = new StreamReader("tancrend.txt");
            tanc[] tanc = new tanc[140];
            string[] elv;
            int i = 0;
            while (sr.Peek() != 0)
            {
                elv = sr.ReadLine().Split('I don't know what goes here');
                tanc[i].tancnev = elv[0];
                tanc[i].tancos1 = elv[1];
                tanc[i].tancos2 = elv[2];
                i++;
            }
            #endregion
            Console.ReadKey();
        }
    }
}
This is how I tried to solve it, although I don't really get how I should do it. The task is to display the first dance and the last dance, but for that I need to split the file somehow.
As mentioned in my comments, you seem to have a text file where each item is on a new line, and a set of 3 lines constitutes a single 'record'. In that case, you can simply read all the lines of the file, and then create your records, like so:
// File.ReadLines returns IEnumerable<string>; Count() and ElementAt() need using System.Linq;
var v = File.ReadLines("file path");
tanc[] tanc = new tanc[140];
for (int i = 0; i < v.Count(); i += 3)
{
    tanc[i / 3].tancnev = v.ElementAt(i);
    tanc[i / 3].tancos1 = v.ElementAt(i + 1);
    tanc[i / 3].tancos2 = v.ElementAt(i + 2);
}
Note: ReadLines() is better when the file size is large. If your file is small, you could use ReadAllLines() instead.
To split by the "enter character" you can use Environment.NewLine in .NET:
https://msdn.microsoft.com/en-us/library/system.environment.newline(v=vs.110).aspx
elv = sr.ReadToEnd().Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
This constant will contain the sequence that is specific to your OS (I'm guessing Windows).
You should be aware that the characters used for newlines are different on Windows vs. Linux/Unix. So in the rare event that someone edits your file on a different OS, you can run into problems.
On Windows, a newline is a two-character sequence: carriage return + line feed (ASCII 13 + 10). On Linux it is just a line feed. So if you wanted to be extra clever, you could first check for CRLF, and if you only get one element back from Split(), try just LF.

Any efficient way to parse large text files and store the parsed information?

My goal is to parse text files and store the information in the respective tables.
I have to parse around 100 folders containing more than 8,000 files, about 20 GB in total.
When I tried to store the whole file contents in a string, an out-of-memory exception was thrown.
That is:
using (StreamReader objStream = new StreamReader(filename))
{
string fileDetails = objStream.ReadToEnd();
}
Hence I tried logic like this:
using (StreamReader objStream = new StreamReader(filename))
{
    // Getting total number of lines in a file
    int fileLineCount = File.ReadLines(filename).Count();
    if (fileLineCount < 90000)
    {
        fileDetails = objStream.ReadToEnd();
        fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
        string[] fileInfo = fileDetails.ToString().Split('\n');
        // call respective method for parsing and insertion
    }
    else
    {
        while ((firstLine = objStream.ReadLine()) != null)
        {
            lineCount++;
            fileDetails = (fileDetails != string.Empty) ? string.Concat(fileDetails, "\n", firstLine)
                                                        : string.Concat(firstLine);
            if (lineCount == 90000)
            {
                fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
                string[] fileInfo = fileDetails.ToString().Split('\n');
                lineCount = 0;
                // call respective method for parsing and insertion
            }
        }
        // when the content is 90,057 lines, parse the remaining 57
        if (lineCount < 90000)
        {
            string[] fileInfo = fileDetails.ToString().Split('\n');
            lineCount = 0;
            // call respective method for parsing and insertion
        }
    }
}
Here 90,000 is the bulk size that is safe to process without an out-of-memory exception in my case.
Still, the process takes more than 2 days to complete. I observed that this is because of reading line by line.
Is there any better approach to handle this ?
Thanks in Advance :)
You can use a profiler to detect what is eating your performance. In this case it's obvious: disk access and string concatenation.
Do not read a file more than once. Let's take a look at your code. First of all, the line int fileLineCount = File.ReadLines(filename).Count(); means you read the whole file and discard what you've read. That's bad. Throw away the if (fileLineCount < 90000) check and keep only the else branch.
It almost doesn't matter if you read line-by-line in consecutive order or the whole file because reading is buffered in any case.
Avoid string concatenation, especially for long strings.
fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
string[] fileInfo = fileDetails.ToString().Split('\n');
This is really bad. You read the file line by line, so why do this replace/split at all? File.ReadLines() gives you a collection of all lines. Just pass it to your parsing routine.
If you do this properly, I expect a significant speedup. It can be optimized further by reading files in a separate thread while processing them in the main thread, but that is another story.
