So, I have a text file where the information are separated by the enter key (I don't know how to explain, I will paste the code and some stuff).
cha-cha
Fruzsina
Ede
salsa
Szilvia
Imre
Here's how the text file looks like, and I need to split it into three parts, the first being the type of the dance, and then dancer 1 and dancer 2.
using System;
using System.Collections.Generic;
using System.IO;
namespace tanciskola
{
struct tanc
{
public string tancnev;
public string tancos1;
public string tancos2;
}
class Program
{
static void Main(string[] args)
{
#region 1.feladat
StreamReader sr = new StreamReader("tancrend.txt");
tanc[] tanc = new tanc[140];
string[] elv;
int i = 0;
while (sr.Peek() != 0)
{
elv = sr.ReadLine().Split('I don't know what goes here');
tanc[i].tancnev = elv[0];
tanc[i].tancos1 = elv[1];
tanc[i].tancos2 = elv[2];
i++;
}
#endregion
Console.ReadKey();
}
}
}
Here is how I tried to solve it, although I don't really get how I should do it. The task is would be to display the first dance and the last dance, but for that I need to split it somehow.
As mentioned in my comments, you seem to have a text file where each item is on a new line, and a set of 3 lines constitutes a single 'record'. In that case, you can simply read all the lines of the file, and then create your records, like so:
var v = File.ReadLines("file path");
tancr[] tanc = new tancr[140];
for (int i = 0; i < v.Count(); i += 3)
{
tanc[i/3].tancnev= v.ElementAt(i);
tanc[i/3].tancos1 = v.ElementAt(i + 1);
tanc[i/3].tancos2 = v.ElementAt(i + 2);
}
Note: ReadLines() is better when the file size is large. If your file is small, you could use ReadAllLines() instead.
To split by the "enter character" you can use Environment.NewLine in .NET:
https://msdn.microsoft.com/en-us/library/system.environment.newline(v=vs.110).aspx
elv = sr.ReadAllText().Split(new string[] {Environment.NewLine}, StringSplitOptions.None);
This constant will contain the sequence that is specific to your OS (I'm guessing Windows).
You should be aware that the characters used for newlines is different for Windows vs. Linux/Unix. So in the rare event that someone edits your file on a different OS, you can run into problems.
On Windows, newline is a two character sequence: carriage-return + line-feed (ASCII 13 + 10). On Linux it is just line-feed. So if you wanted to be extra clever, you could first check for CRLF and if you only get one element back from Split() then try just LF.
char (* text)[1][45+1];
text = calloc(5000,(130+1));
strcpy(0[*text],"sometext)");
Now I want to encode "sometext" to base58, however, I do not know how, and oddly enough, there isn't one example of BASE58 in C.
The base58 encoding I'm interested in uses these symbols:
123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ
It's been optimized to lessen the risk of mis-reading, so 0 and 'O' are both gone, for instance.
P.S
Don't mind the weird allocation and declaration of the variables, I was experimenting.
You're not supposed to encode strings, you're supposed to encode integers.
If starting with a string, you must first decide how to interpret it as an integer (might be base128, or something), then re-encode in base58.
Satoshi has the reference implementation (https://github.com/bitcoin/bitcoin/blob/master/src/base58.h)
However, he uses some utility bignum class to do it, and it's in C++. If you have access to a bignum library, you just keep dividing by 58 until the number is broken up. If you don't have a bignum library, AFAIK you're outta luck.
Here's an implementation in PHP for large numbers I've created for Amithings, beyond the integers (Integer -> http://php.net/manual/en/language.types.integer.php).
For example, try the example below (Don't forget to pass your ID to the function in string format. Use the PHP function strval()):
$number = '123456789009876543211234567890';
$result = base58_encode($number);
echo('Encoded: ' . $result . '<br>');
echo('Decoded: ' . base58_decode($result) . '<br>');
Important: You may consider to change this routine by including some sort of key/password/encryption to ensure that others can not decode your database IDs.
function base58_encode($input)
{
$alphabet = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';
$base_count = strval(strlen($alphabet));
$encoded = '';
while (floatval($input) >= floatval($base_count))
{
$div = bcdiv($input, $base_count);
$mod = bcmod($input, $base_count);
$encoded = substr($alphabet, intval($mod), 1) . $encoded;
$input = $div;
}
if (floatval($input) > 0)
{
$encoded = substr($alphabet, intval($input), 1) . $encoded;
}
return($encoded);
}
function base58_decode($input)
{
$alphabet = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';
$base_count = strval(strlen($alphabet));
$decoded = strval(0);
$multi = strval(1);
while (strlen($input) > 0)
{
$digit = substr($input, strlen($input) - 1);
$decoded = bcadd($decoded, bcmul($multi, strval(strpos($alphabet, $digit))));
$multi = bcmul($multi, $base_count);
$input = substr($input, 0, strlen($input) - 1);
}
return($decoded);
}
My simple code with Crypto++ library:
string base58_encode(Integer num, string vers)
{
string alphabet[58] = {"1","2","3","4","5","6","7","8","9","A","B","C","D","E","F",
"G","H","J","K","L","M","N","P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c",
"d","e","f","g","h","i","j","k","m","n","o","p","q","r","s","t","u","v","w","x","y","z"};
int base_count = 58; string encoded; Integer div; Integer mod;
while (num >= base_count)
{
div = num / base_count; mod = (num - (base_count * div));
encoded = alphabet[ mod.ConvertToLong() ] + encoded; num = div;
}
encoded = vers + alphabet[ num.ConvertToLong() ] + encoded;
return encoded;
}
It's just for cryptocurrency wallets. string can be changed for other tasks.
Here is an implementation that seems to be pure c.
https://github.com/trezor/trezor-crypto/blob/master/base58.c
I have wondered for some time, what a nice, clean solution for joining an array of strings might look like.
Example: I have ["Alpha", "Beta", "Gamma"] and want to join the strings into one, separated by commas – "Alpha, Beta, Gamma".
Now I know that most programming languages offer some kind of join method for this. I just wonder how these might be implemented.
When I took introductory courses, I often tried to go it alone, but never found a satisfactory algorithm. Everything seemed rather messy, the problem being that you can not just loop through the array, concatenating the strings, as you would add one too many commas (either before or after the last string).
I don’t want to check conditions in the loop. I don’t really want to add the first or the last string before/after the loop (I guess this is maybe the best way?).
Can someone show me an elegant solution? Or tell me exactly why there can’t be anything more elegant?
The most elegant solution i found for problems like this is something like this (in pseudocode)
separator = ""
foreach(item in stringCollection)
{
concatenatedString += separator + item
separator = ","
}
You just run the loop and only after the second time around the separator is set. So the first time it won't get added. It's not as clean as I'd like it to be so I'd still add comments but it's better than an if statement or adding the first or last item outside the loop.
All of these solutions are decent ones, but for an underlying library, both independence of separator and decent speed are important. Here is a function that fits the requirement assuming the language has some form of string builder.
public static string join(String[] strings, String sep) {
if(strings.length == 0) return "";
if(strings.length == 1) return strings[0];
StringBuilder sb = new StringBuilder();
sb.append(strings[0]);
for(int i = 1; i < strings.length; i++) {
sb.append(sep);
sb.append(strings[i]);
}
return sb.toString();
}
EDIT: I suppose I should mention why this would be speedier. The main reason would be because any time you call c = a + b; the underlying construct is usually c = (new StringBuilder()).append(a).append(b).toString();. By reusing the same string builder object, we can reduce the amount of allocations and garbage we produce.
And before someone chimes in with optimization is evil, we're talking about implementing a common library function. Acceptable, scalable performance is one of the requirements them. A join that takes a long time is one that's going to be not oft used.
Most languages nowadays - e.g. perl (mention by Jon Ericson), php, javascript - have a join() function or method, and this is by far the most elegant solution. Less code is better code.
In response to Mendelt Siebenga, if you do require a hand-rolled solution, I'd go with the ternary operator for something like:
separator = ","
foreach (item in stringCollection)
{
concatenatedString += concatenatedString ? separator + item : item
}
I usually go with something like...
list = ["Alpha", "Beta", "Gamma"];
output = "";
separator = "";
for (int i = 0; i < list.length ; i++) {
output = output + separator;
output = output + list[i];
separator = ", ";
}
This works because on the first pass, separator is empty (so you don't get a comma at the start, but on every subsequent pass, you add a comma before adding the next element.
You could certainly unroll this a little to make it a bit faster (assigning to the separator over and over isn't ideal), though I suspect that's something the compiler could do for you automatically.
In the end though, I suspect pretty this is what most language level join functions come down to. Nothing more than syntax sugar, but it sure is sweet.
For pure elegance, a typical recursive functional-language solution is quite nice. This isn't in an actual language syntax but you get the idea (it's also hardcoded to use comma separator):
join([]) = ""
join([x]) = "x"
join([x, rest]) = "x," + join(rest)
In reality you would write this in a more generic way, to reuse the same algorithm but abstract away the data type (doesn't have to be strings) and the operation (doesn't have to be concatenation with a comma in the middle). Then it usually gets called 'reduce', and many functional languages have this built in, e.g. multiplying all numbers in a list, in Lisp:
(reduce #'* '(1 2 3 4 5)) => 120
#Mendelt Siebenga
Strings are corner-stone objects in programming languages. Different languages implement strings differently. An implementation of join() strongly depends on underlying implementation of strings. Pseudocode doesn't reflect underlying implementation.
Consider join() in Python. It can be easily used:
print ", ".join(["Alpha", "Beta", "Gamma"])
# Alpha, Beta, Gamma
It could be easily implemented as follow:
def join(seq, sep=" "):
if not seq: return ""
elif len(seq) == 1: return seq[0]
return reduce(lambda x, y: x + sep + y, seq)
print join(["Alpha", "Beta", "Gamma"], ", ")
# Alpha, Beta, Gamma
And here how join() method is implemented in C (taken from trunk):
PyDoc_STRVAR(join__doc__,
"S.join(sequence) -> string\n\
\n\
Return a string which is the concatenation of the strings in the\n\
sequence. The separator between elements is S.");
static PyObject *
string_join(PyStringObject *self, PyObject *orig)
{
char *sep = PyString_AS_STRING(self);
const Py_ssize_t seplen = PyString_GET_SIZE(self);
PyObject *res = NULL;
char *p;
Py_ssize_t seqlen = 0;
size_t sz = 0;
Py_ssize_t i;
PyObject *seq, *item;
seq = PySequence_Fast(orig, "");
if (seq == NULL) {
return NULL;
}
seqlen = PySequence_Size(seq);
if (seqlen == 0) {
Py_DECREF(seq);
return PyString_FromString("");
}
if (seqlen == 1) {
item = PySequence_Fast_GET_ITEM(seq, 0);
if (PyString_CheckExact(item) || PyUnicode_CheckExact(item)) {
Py_INCREF(item);
Py_DECREF(seq);
return item;
}
}
/* There are at least two things to join, or else we have a subclass
* of the builtin types in the sequence.
* Do a pre-pass to figure out the total amount of space we'll
* need (sz), see whether any argument is absurd, and defer to
* the Unicode join if appropriate.
*/
for (i = 0; i < seqlen; i++) {
const size_t old_sz = sz;
item = PySequence_Fast_GET_ITEM(seq, i);
if (!PyString_Check(item)){
#ifdef Py_USING_UNICODE
if (PyUnicode_Check(item)) {
/* Defer to Unicode join.
* CAUTION: There's no gurantee that the
* original sequence can be iterated over
* again, so we must pass seq here.
*/
PyObject *result;
result = PyUnicode_Join((PyObject *)self, seq);
Py_DECREF(seq);
return result;
}
#endif
PyErr_Format(PyExc_TypeError,
"sequence item %zd: expected string,"
" %.80s found",
i, Py_TYPE(item)->tp_name);
Py_DECREF(seq);
return NULL;
}
sz += PyString_GET_SIZE(item);
if (i != 0)
sz += seplen;
if (sz < old_sz || sz > PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"join() result is too long for a Python string");
Py_DECREF(seq);
return NULL;
}
}
/* Allocate result space. */
res = PyString_FromStringAndSize((char*)NULL, sz);
if (res == NULL) {
Py_DECREF(seq);
return NULL;
}
/* Catenate everything. */
p = PyString_AS_STRING(res);
for (i = 0; i < seqlen; ++i) {
size_t n;
item = PySequence_Fast_GET_ITEM(seq, i);
n = PyString_GET_SIZE(item);
Py_MEMCPY(p, PyString_AS_STRING(item), n);
p += n;
if (i < seqlen - 1) {
Py_MEMCPY(p, sep, seplen);
p += seplen;
}
}
Py_DECREF(seq);
return res;
}
Note that the above Catenate everything. code is a small part of the whole function.
In pseudocode:
/* Catenate everything. */
for each item in sequence
copy-assign item
if not last item
copy-assign separator
' Pseudo code Assume zero based
ResultString = InputArray[0]
n = 1
while n (is less than) Number_Of_Strings
ResultString (concatenate) ", "
ResultString (concatenate) InputArray[n]
n = n + 1
loop
In Perl, I just use the join command:
$ echo "Alpha
Beta
Gamma" | perl -e 'print(join(", ", map {chomp; $_} <> ))'
Alpha, Beta, Gamma
(The map stuff is mostly there to create a list.)
In languages that don't have a built in, like C, I use simple iteration (untested):
for (i = 0; i < N-1; i++){
strcat(s, a[i]);
strcat(s, ", ");
}
strcat(s, a[N]);
Of course, you'd need to check the size of s before you add more bytes to it.
You either have to special case the first entry or the last.
collecting different language implementations ?
Here is, for your amusement, a Smalltalk version:
join:collectionOfStrings separatedBy:sep
|buffer|
buffer := WriteStream on:''.
collectionOfStrings
do:[:each | buffer nextPutAll:each ]
separatedBy:[ buffer nextPutAll:sep ].
^ buffer contents.
Of course, the above code is already in the standard library found as:
Collection >> asStringWith:
so, using that, you'd write:
#('A' 'B' 'C') asStringWith:','
But here's my main point:
I would like to put more emphasis on the fact that using a StringBuilder (or what is called "WriteStream" in Smalltalk) is highly recommended. Do not concatenate strings using "+" in a loop - the result will be many many intermediate throw-away strings. If you have a good Garbage Collector, thats fine. But some are not and a lot of memory needs to be reclaimed. StringBuilder (and WriteStream, which is its grand-grand-father) use a buffer-doubling or even adaptive growing algorithm, which needs MUCH less scratch memory.
However, if its only a few small strings you are concatenating, dont care, and "+" them; the extra work using a StringBuilder might be actually counter-productive, up to an implementation- and language-dependent number of strings.
The following is no longer language-agnostic (but that doesn't matter for the discussion because the implementation is easily portable to other languages). I tried to implement Luke's (theretically best) solution in an imperative programming language. Take your pick; mine's C#. Not very elegant at all. However, (without any testing whatsoever) I could imagine that its performance is quite decent because the recursion is in fact tail recursive.
My challenge: give a better recursive implementation (in an imperative language). You say what “better” means: less code, faster, I'm open for suggestions.
private static StringBuilder RecJoin(IEnumerator<string> xs, string sep, StringBuilder result) {
result.Append(xs.Current);
if (xs.MoveNext()) {
result.Append(sep);
return RecJoin(xs, sep, result);
} else
return result;
}
public static string Join(this IEnumerable<string> xs, string separator) {
var i = xs.GetEnumerator();
if (!i.MoveNext())
return string.Empty;
else
return RecJoin(i, separator, new StringBuilder()).ToString();
}
join() function in Ruby:
def join(seq, sep)
seq.inject { |total, item| total << sep << item } or ""
end
join(["a", "b", "c"], ", ")
# => "a, b, c"
join() in Perl:
use List::Util qw(reduce);
sub mjoin($#) {$sep = shift; reduce {$a.$sep.$b} #_ or ''}
say mjoin(', ', qw(Alpha Beta Gamma));
# Alpha, Beta, Gamma
Or without reduce:
sub mjoin($#)
{
my ($sep, $sum) = (shift, shift);
$sum .= $sep.$_ for (#_);
$sum or ''
}
Perl 6
sub join( $separator, #strings ){
my $return = shift #strings;
for #strings -> ( $string ){
$return ~= $separator ~ $string;
}
return $return;
}
Yes I know it is pointless because Perl 6 already has a join function.
I wrote a recursive version of the solution in lisp. If the length of the list is greater that 2 it splits the list in half as best as it can and then tries merging the sublists
(defun concatenate-string(list)
(cond ((= (length list) 1) (car list))
((= (length list) 2) (concatenate 'string (first list) "," (second list)))
(t (let ((mid-point (floor (/ (- (length list) 1) 2))))
(concatenate 'string
(concatenate-string (subseq list 0 mid-point))
","
(concatenate-string (subseq list mid-point (length list))))))))
(concatenate-string '("a" "b"))
I tried applying the divide and conquer strategy to the problem, but I guess that does not give a better result than plain iteration. Please let me know if this could have been done better.
I have also performed an analysis of the recursion obtained by the algorithm, it is available here.
Use the String.join method in C#
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx
In Java 5, with unit test:
import junit.framework.Assert;
import org.junit.Test;
public class StringUtil
{
public static String join(String delim, String... strings)
{
StringBuilder builder = new StringBuilder();
if (strings != null)
{
for (String str : strings)
{
if (builder.length() > 0)
{
builder.append(delim);
}
builder.append(str);
}
}
return builder.toString();
}
#Test
public void joinTest()
{
Assert.assertEquals("", StringUtil.join(", ", null));
Assert.assertEquals("", StringUtil.join(", ", ""));
Assert.assertEquals("", StringUtil.join(", ", new String[0]));
Assert.assertEquals("test", StringUtil.join(", ", "test"));
Assert.assertEquals("foo, bar", StringUtil.join(", ", "foo", "bar"));
Assert.assertEquals("foo, bar, baz", StringUtil.join(", ", "foo", "bar", "baz"));
}
}