Splitting a text file where the information is separated onto different lines

So, I have a text file where the information is separated by the Enter key (I don't know how to explain it well, so I will paste the code and some of the data).
cha-cha
Fruzsina
Ede
salsa
Szilvia
Imre
Here's what the text file looks like, and I need to split it into three parts: the first being the type of the dance, and then dancer 1 and dancer 2.
using System;
using System.Collections.Generic;
using System.IO;

namespace tanciskola
{
    struct tanc
    {
        public string tancnev;
        public string tancos1;
        public string tancos2;
    }

    class Program
    {
        static void Main(string[] args)
        {
            #region 1.feladat
            StreamReader sr = new StreamReader("tancrend.txt");
            tanc[] tanc = new tanc[140];
            string[] elv;
            int i = 0;
            while (sr.Peek() != 0)
            {
                elv = sr.ReadLine().Split('I don't know what goes here');
                tanc[i].tancnev = elv[0];
                tanc[i].tancos1 = elv[1];
                tanc[i].tancos2 = elv[2];
                i++;
            }
            #endregion
            Console.ReadKey();
        }
    }
}
Here is how I tried to solve it, although I don't really get how I should do it. The task is to display the first dance and the last dance, but for that I need to split the data somehow.

As mentioned in my comments, you seem to have a text file where each item is on a new line, and a set of 3 lines constitutes a single 'record'. In that case, you can simply read all the lines of the file, and then create your records, like so:
var v = File.ReadLines("file path"); // Count() and ElementAt() below need using System.Linq;
tanc[] tanc = new tanc[140];
for (int i = 0; i < v.Count(); i += 3)
{
    tanc[i / 3].tancnev = v.ElementAt(i);
    tanc[i / 3].tancos1 = v.ElementAt(i + 1);
    tanc[i / 3].tancos2 = v.ElementAt(i + 2);
}
Note: ReadLines() is better when the file size is large. If your file is small, you could use ReadAllLines() instead.
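Since the actual task is to display the first and the last dance, you can index into the filled array once it is built. A minimal sketch along those lines (the file name "tancrend.txt" and the struct tanc come from the question; ToList() and the variable name dances are my own additions):
var v = File.ReadLines("tancrend.txt").ToList(); // ToList() needs using System.Linq;
tanc[] dances = new tanc[v.Count / 3];
for (int i = 0; i + 2 < v.Count; i += 3)
{
    dances[i / 3].tancnev = v[i];     // dance name
    dances[i / 3].tancos1 = v[i + 1]; // dancer 1
    dances[i / 3].tancos2 = v[i + 2]; // dancer 2
}
Console.WriteLine(dances[0].tancnev + ": " + dances[0].tancos1 + " & " + dances[0].tancos2);
Console.WriteLine(dances[dances.Length - 1].tancnev + ": " + dances[dances.Length - 1].tancos1 + " & " + dances[dances.Length - 1].tancos2);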

To split by the "enter character" you can use Environment.NewLine in .NET:
https://msdn.microsoft.com/en-us/library/system.environment.newline(v=vs.110).aspx
elv = sr.ReadToEnd().Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
This constant will contain the sequence that is specific to your OS (I'm guessing Windows).
You should be aware that the characters used for newlines are different on Windows vs. Linux/Unix. So in the rare event that someone edits your file on a different OS, you can run into problems.
On Windows, newline is a two character sequence: carriage-return + line-feed (ASCII 13 + 10). On Linux it is just line-feed. So if you wanted to be extra clever, you could first check for CRLF and if you only get one element back from Split() then try just LF.
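For example, a sketch of that fallback (sr is the StreamReader from the question; the logic is just the CRLF-then-LF check described above):
string text = sr.ReadToEnd();
// Try the Windows line ending first (CR+LF)...
string[] elv = text.Split(new string[] { "\r\n" }, StringSplitOptions.None);
if (elv.Length == 1)
{
    // ...and if nothing was split, fall back to the Unix line ending (LF).
    elv = text.Split('\n');
}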

Related

Find a specific number of occurrences of a word from the beginning of a string

I've been gathering information using API calls from my Jira. The information gathered is saved in a body file with the following content:
No tickets:
{"startAt":0,"maxResults":50,"total":0,"issues":[]}{"startAt":0,"maxResults":50,"total":0,"issues":[]}
One Ticket:
{"expand":"names,schema","startAt":0,"maxResults":50,"total":1,"issues":[{"expand":"operations,versionedRepresentations,editmeta,changelog,renderedFields","id":"456881","self":"https://myjira...com","key":"TICKET-1111","fields":{"summary":"[TICKET] New Test jira","created":"2018-12-17T01:47:09.000-0800"}}]}{"expand":"names,schema","startAt":0,"maxResults":50,"total":1,"issues":[{"expand":"operations,versionedRepresentations,editmeta,changelog,renderedFields","id":"456881","self":"https://myjira...com","key":"TICKET-1111","fields":{"summary":"[TICKET] New Test jira","created":"2018-12-17T01:47:09.000-0800"}}]}
Two Tickets:
{expand:schema,names,startAt:0,maxResults:50,total:2,issues:[{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:456881,self:https://myjira...com,key:TICKET-1111,fields:{summary:[TICKET] New Test jira,created:2018-12-17T01:47:09.000-0800}},{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:320281,self:https://myjira...com,key:TICKET-2222,fields:{summary:[TICKET] Test jira,created:2016-03-18T07:58:52.000-0700}}]}{expand:schema,names,startAt:0,maxResults:50,total:2,issues:[{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:456881,self:https://myjira...com,key:TICKET-1111,fields:{summary:[TICKET] New Test jira,created:2018-12-17T01:47:09.000-0800}},{expand:operations,versionedRepresentations,editmeta,changelog,renderedFields,id:320281,self:https://myjira...com,key:TICKET-2222,fields:{summary:[TICKET] Test jira,created:2016-03-18T07:58:52.000-0700}}]}
etc..
Using this code I've been able to get the total number of open tickets:
std::ifstream t("BodyOpenIssues.out");
std::string BodyString((std::istreambuf_iterator<char>(t)),
                       std::istreambuf_iterator<char>());
// Removing quotes
BodyString.erase(std::remove(BodyString.begin(), BodyString.end(), '"'), BodyString.end());
int Result = 0;
unsigned first = BodyString.find("total:");
unsigned last = BodyString.find(",issues");
std::string TotalOpenIssues = BodyString.substr(first + 6, last - (first + 6));
Result = std::stoi(TotalOpenIssues);
return Result;
Using a second function I'm trying to get the keys based on total open tickets.
if (GetOpenIssuesNumber() > 0)
{
    std::ifstream t("BodyOpenIssues.out");
    std::string BodyString((std::istreambuf_iterator<char>(t)),
                           std::istreambuf_iterator<char>());
    // Removing quotes
    BodyString.erase(std::remove(BodyString.begin(), BodyString.end(), '"'), BodyString.end());
    unsigned first = BodyString.find("key:TICKET-");
    unsigned last = BodyString.find(",fields");
    std::string TotalOpenIssues = BodyString.substr(first + 11, last - (first + 11));
    String^ Result = gcnew String(TotalOpenIssues.c_str());
    return "TICKET-" + Result;
}
else
{
    return "No open issues found";
}
What I mean is:
If Total is 1, search from the beginning and find the first key, TICKET-1111.
If Total is 2, search from the beginning and get the first key, TICKET-1111, then continue from there and find the next key, TICKET-2222.
And based on that total, find that many keys in the string.
I got lost with all the casting between the types: ifstream reads the file and I save the result in std::string, and after the find I save the result in System::String to use it in my Label. I've been researching and found out that I could use a char array, but I can't make it dynamic based on BodyString.length().
If more information is required please let me know.
Any suggestions are really appreciated! Thank you in advance!
I went for nlohmann json library. It has everything I need. Thank you Walnut!
These are formatted as JSON. You should use a JSON library for C++ and parse the files with that. Using search/replace is unnecessarily complicated and you will likely run into corner cases you haven't considered sooner or later (do you really want the code to randomly miss tickets, etc.?). Also, String^ is not C++. Are you writing C++/CLI instead of C++? If so, please tag c++-cli instead of c++. – walnut

How can I password protect a file regardless of its extension in Java 8 or Java 10

I have tried doing this by encrypting individual files but I have a lot of data (~20GB) and hence it would take a lot of time. In my test it took 2.28 minutes to encrypt a single file of size 80MB.
Is there a quicker way to password protect any file (text/binary/multimedia)?
If you are just trying to hide the file from others, you can try to encrypt the file path instead of encrypting the whole huge file.
For the path you mentioned, text/binary/multimedia, you can try to encrypt it with a method like:
private static String getEncryptedPath(String filePath) {
    String[] tokens = filePath.split("/");
    List<String> tList = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
        tList.add(Hashing.md5().newHasher() // com.google.common.hash.Hashing
                .putString(tokens[i] + filePath, StandardCharsets.UTF_8).hash().toString()
                .substring(2 * i, 2 * i + 5)); // to make it harder to reverse, add your custom secret here
    }
    return String.join("/", tList);
}
and then it becomes an encrypted path like:
72b12/9cbb3/4a5f3
Once you know the real path text/binary/multimedia, any time you want to access the file you can just use this method to map it to the encrypted path 72b12/9cbb3/4a5f3.

TextEncodings.Base64Url.Decode vs Convert.FromBase64String

I was working on creating a method that would generate a JWT token. Part of the method reads a value from my web.config that serves as the "secret" used to generate the hash used to create the signature for the JWT token.
<add key="MySecret" value="j39djak49H893hsk297353jG73gs72HJ3tdM37Vk397" />
Initially I tried using the following to convert the "secret" value to a byte array.
byte[] key = Convert.FromBase64String(ConfigurationManager.AppSettings["MySecret"]);
However, an exception was thrown when this line was reached ...
The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
So I looked into the OAuth code and saw another method being used to change a base64 string into a byte array:
byte[] key = TextEncodings.Base64Url.Decode(ConfigurationManager.AppSettings["MySecret"]);
This method worked without issue. To me it looks like they are doing the same thing. Changing a Base64 text value into an array of bytes. However, I must be missing something. Why does Convert.FromBase64String fail and TextEncodings.Base64Url.Decode work?
I came across the same thing when I migrated our authentication service to .NET Core. I had a look at the source code for the libraries we used in our previous implementation, and the difference is actually in the name itself.
The TextEncodings class has two types of text encoders, Base64TextEncoder and Base64UrlTextEncoder. The latter modifies the string slightly so the base64 string can be used in a URL.
My understanding is that it is quite common to replace + and / with - and _. As a matter of fact we have been doing the same with our handshake tokens. Additionally the padding character(s) at the end can also be removed. This leaves us with the following implementation (this is from the source code):
public class Base64UrlTextEncoder : ITextEncoder
{
    public string Encode(byte[] data)
    {
        if (data == null)
        {
            throw new ArgumentNullException("data");
        }
        return Convert.ToBase64String(data).TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    public byte[] Decode(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        return Convert.FromBase64String(Pad(text.Replace('-', '+').Replace('_', '/')));
    }

    private static string Pad(string text)
    {
        var padding = 3 - ((text.Length + 3) % 4);
        if (padding == 0)
        {
            return text;
        }
        return text + new string('=', padding);
    }
}
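To see the difference in action, here is a small illustration (the secret values below are made up; only their shape matters):
string urlSafeSecret = "j39d-jak_49H"; // contains '-' and '_' from the url-safe alphabet
try
{
    byte[] key = Convert.FromBase64String(urlSafeSecret); // throws FormatException
}
catch (FormatException)
{
    Console.WriteLine("'-' and '_' are not in the standard base64 alphabet");
}
// Undoing the substitutions (and re-padding if needed) yields standard base64 again:
byte[] decoded = Convert.FromBase64String("j39d+jak/49H"); // decodes fine
This is essentially what Base64UrlTextEncoder.Decode() does before calling Convert.FromBase64String(), as shown in the source above.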

Any efficient way to parse large text files and store parsing information?

My purpose is to parse text files and store information in respective tables.
I have to parse around 100 folders containing more than 8,000 files with a total size of approximately 20GB.
When I tried to store the whole file contents in a string, an out-of-memory exception was thrown.
That is:
using (StreamReader objStream = new StreamReader(filename))
{
    string fileDetails = objStream.ReadToEnd();
}
Hence I tried logic like this:
using (StreamReader objStream = new StreamReader(filename))
{
    // Getting total number of lines in a file
    int fileLineCount = File.ReadLines(filename).Count();
    if (fileLineCount < 90000)
    {
        fileDetails = objStream.ReadToEnd();
        fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
        string[] fileInfo = fileDetails.ToString().Split('\n');
        // call respective method for parsing and insertion
    }
    else
    {
        while ((firstLine = objStream.ReadLine()) != null)
        {
            lineCount++;
            fileDetails = (fileDetails != string.Empty) ? string.Concat(fileDetails, "\n", firstLine)
                                                        : string.Concat(firstLine);
            if (lineCount == 90000)
            {
                fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
                string[] fileInfo = fileDetails.ToString().Split('\n');
                lineCount = 0;
                // call respective method for parsing and insertion
            }
        }
        // when the file has e.g. 90,057 lines, parse the remaining 57
        if (lineCount < 90000)
        {
            string[] fileInfo = fileDetails.ToString().Split('\n');
            lineCount = 0;
            // call respective method for parsing and insertion
        }
    }
}
Here 90,000 is the batch size which is safe to process without causing an out-of-memory exception in my case.
Still, the process takes more than 2 days to complete. I observed this is because of reading line by line.
Is there any better approach to handle this ?
Thanks in Advance :)
You can use a profiler to detect what is eating your performance. In this case it's obvious: disk access and string concatenation.
Do not read a file more than once. Let's take a look at your code. First of all, the line int fileLineCount = File.ReadLines(filename).Count(); means you read the whole file and discard what you've read. That's bad. Throw away your if (fileLineCount < 90000) and keep only else.
It almost doesn't matter if you read line-by-line in consecutive order or the whole file because reading is buffered in any case.
Avoid string concatenation, especially for long strings.
fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
string[] fileInfo = fileDetails.ToString().Split('\n');
It's really bad. You read the file line-by-line, why do you do this replacement/split? File.ReadLines() gives you a collection of all lines. Just pass it to your parsing routine.
If you do this properly, I expect a significant speedup. It can be optimized further by reading files in a separate thread while processing them in the main thread. But this is another story.
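As a sketch of what the single-pass version could look like (ParseAndInsert is a hypothetical stand-in for your parsing/insertion method; the 90,000 batch size is kept from the question):
var batch = new List<string>(90000);
foreach (string line in File.ReadLines(filename)) // streams the file exactly once
{
    batch.Add(line);
    if (batch.Count == 90000)
    {
        ParseAndInsert(batch); // your parsing and insertion routine
        batch.Clear();
    }
}
if (batch.Count > 0)
{
    ParseAndInsert(batch); // the leftover lines, e.g. the 57 of 90,057
}
There is no string concatenation and no Replace/Split pass here; each line goes straight to the parser.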

Windows Phone 7 Silverlight string array in isolated storage

I have an array of strings which I am trying to store in isolated storage; however, I need to store each string in the array in a new file of its own.
Any approach is welcomed.
Thanks.
I do something similar in an app with code roughly along these lines, though I am serializing objects in an array to JSON. Same rough idea, though.
using (IsolatedStorageFile file = IsolatedStorageFile.GetUserStoreForApplication()) {
    for (int i = 0; i < array.Length; i++) {
        string fileName = "file" + i.ToString() + ".dat";
        using (var stream = file.CreateFile(fileName)) {
            using (var writer = new StreamWriter(stream)) {
                writer.Write(array[i]);
            }
        }
    }
}
Note this is just typed straight in, I may have a mistake in there :)
Your question is a little vague, but here I go.
What is stopping you from just serializing each string to a file with the index as the name? For example, store stringarray[0] in a file 0.xml.
Just check whether the file exists before trying to read it.
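For reading the strings back, a rough sketch in the same style (the fileN.dat naming is taken from the answer above; FileExists() is the existence check just mentioned):
using (IsolatedStorageFile file = IsolatedStorageFile.GetUserStoreForApplication()) {
    var strings = new List<string>();
    for (int i = 0; file.FileExists("file" + i.ToString() + ".dat"); i++) {
        using (var stream = file.OpenFile("file" + i.ToString() + ".dat", FileMode.Open)) {
            using (var reader = new StreamReader(stream)) {
                strings.Add(reader.ReadToEnd()); // one stored string per file
            }
        }
    }
}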
