I have a function which returns a dynamic array of Byte:
type
  TMyEncrypt = array of Byte;
  TMyDecrypt = array of Byte;

function decrypt(original: TMyEncrypt): TMyDecrypt;
The content of the returned dynamic array TMyDecrypt is standard text with CRLF line breaks.
How can I load this into a TStringList, with CRLF as the separator, without saving it to a temporary file first?
EDIT: the returned array of Byte contains Unicode-encoded characters.
Decode the byte array to a string, and then assign to the Text property of the string list.
var
  Bytes: TBytes;
  StringList: TStringList;
....
StringList.Text := TEncoding.Unicode.GetString(Bytes);
Note the use of TBytes, which is the standard type used to hold dynamic arrays of bytes. For compatibility reasons it makes sense to use TBytes; that way your data can be processed by other RTL and library code, a fact we immediately take advantage of by using TEncoding.
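Putting it together, a minimal sketch might look like this (Original stands in for whatever encrypted input you have; since TMyDecrypt and TBytes are both declared as dynamic arrays of Byte, a hard cast between them is safe):
var
  Decrypted: TMyDecrypt;
  StringList: TStringList;
begin
  Decrypted := decrypt(Original);
  StringList := TStringList.Create;
  try
    // interpret the raw bytes as UTF-16 text; TStringList splits on CRLF
    StringList.Text := TEncoding.Unicode.GetString(TBytes(Decrypted));
    // ... use StringList here ...
  finally
    StringList.Free;
  end;
end;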
You could use SetString, as my answer originally suggested:
var
  Text: string;
  Bytes: TBytes;
  StringList: TStringList;
....
SetString(Text, PChar(Bytes), Length(Bytes) div SizeOf(Char));
StringList.Text := Text;
Personally I prefer to use TEncoding because it is very explicit about the encoding being used.
If your text was null terminated then you could use:
StringList.Text := PChar(Bytes);
Again, I'd prefer to be explicit about the encoding. And I might be a little paranoid about my data somehow not being null terminated.
You might find that UTF-8 is a more efficient representation than UTF-16.
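For example, if the decrypted data turned out to be UTF-8 encoded (an assumption; it depends on what produced it), only the decoding line would change:
StringList.Text := TEncoding.UTF8.GetString(Bytes);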
I'm wondering if there is a canonical way to read Unicode files in Rust. Most examples read a line, because if the file is well-formed UTF-8, a line should consist of whole/complete 'characters' (Unicode Scalar Values).
Here's a simple example of reading a file as a UTF-8 file, but it only works if 'one byte' == 'one character', which isn't guaranteed.
use std::fs::File;
use std::io::{BufReader, Read};

let f = File::open(filename).expect("File not found");
let mut rdr = BufReader::new(f);
loop {
    let mut x: [u8; 1] = [0];
    let n = rdr.read(&mut x).expect("read failed");
    if n == 0 { break; } // EOF
    let bytes = utf8_char_width(x[0]); // unstable feature
    let chr = x[0] as char; // only correct for single-byte characters
    ...
}
I'm new to Rust, but the only thing I could find that would help me read a full character was utf8_char_width, which is marked unstable.
Does Rust have a facility such that I can open a file as (Unicode) 'text' and it will read/respect the BOM (if available) and allow me to iterate over the contents of that file returning a Rust char type for each 'character' (Unicode Scalar Value) found?
Am I making something easy hard? Again, I'm new to Rust so everything is hard to me currently :-D
Update (in response to comments)
A "Unicode file" is a file containing only Unicode encoded data. I'd like to be able to read Unicode encoded files, without worrying about the various details of character size or endianness. Since Rust uses a four byte (u32) 'char' I'd like to be able to read the file one character at a time, not worrying about line length (or it's allocation).
While UTF-8 is byte-oriented, the Unicode standard does define a BOM for it, as well as saying that the default (no BOM) is UTF-8.
It is somewhat counter-intuitive (I'm new to Rust) that the char type is UTF-32 while a string is (effectively) a vector of u8. However, I can see the reasoning behind forcing the developer to be explicit about 'byte' versus 'char', as I've seen a lot of bugs caused by people assuming those are the same size. Clearly, there is an iterator to return chars from a string, so the code to handle the UTF-8 -> UTF-32 conversion is in place; it just needs to take its input from a file stream rather than a memory vector. Perhaps as I learn more a solution will present itself.
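For reference, the closest I have come so far is to pull the whole file into memory and let String's iterator do the decoding; a minimal sketch, assuming the file is UTF-8 (it strips a UTF-8 BOM but does not handle UTF-16/32):

use std::fs;

fn main() -> std::io::Result<()> {
    // read_to_string validates that the bytes are well-formed UTF-8
    let text = fs::read_to_string("input.txt")?;
    // read_to_string does not strip a BOM, so drop a leading U+FEFF ourselves
    let text = text.strip_prefix('\u{FEFF}').unwrap_or(&text);
    // chars() yields one Unicode Scalar Value (char) at a time
    for chr in text.chars() {
        println!("{}", chr);
    }
    Ok(())
}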
var originalMsg *C.uchar
C.ecall_pay_w(8, 10, &originalMsg, &signature)
originalMsgStr := fmt.Sprintf("%c", originalMsg)
//TODO: convert originalMsgStr to the same value as originalMsg
I have to convert the Go string (originalMsgStr) to a *C.uchar with the same value as originalMsg.
How can I do it?
You get a C-string back from your call to C.ecall_pay_w and want to convert that C-string to a Go-string. You can do this by manually following the C-string until you reach the terminating 0.
Assuming that:
There is a terminating 0 at the end of the C-string
The C-string is encoded as ASCII, so every byte represents an ASCII character (in the range [0..127]). This means it is both ASCII and UTF-8 at the same time, because UTF-8 is backward compatible with ASCII.
Then your solution could be this:
func convertCStringToGoString(c *C.uchar) string {
    var buf []byte
    for *c != 0 {
        buf = append(buf, byte(*c)) // copy one byte of the C-string
        // advance the pointer by one byte
        c = (*C.uchar)(unsafe.Pointer(uintptr(unsafe.Pointer(c)) + 1))
    }
    return string(buf)
}
Note that doing "unsafe" things like this in Go is cast-heavy. That was done on purpose by the Go authors. You need to convert to unsafe.Pointer before you can convert to uintptr. The uintptr can be added to (+ 1) while the unsafe.Pointer does not support that. These are the reasons for that much casting.
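As an alternative to walking the pointer by hand, cgo's stock helper C.GoString does the same job; it expects a *C.char, so a cast is still needed (a brief sketch, reusing the pointer c and the same null-termination assumption):

goStr := C.GoString((*C.char)(unsafe.Pointer(c)))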
I do not know Go in much detail, but do not forget that in C the *C.uchar would be something like unsigned char *, which is often used to reference a string (a null-terminated array of characters).
Here you use fmt.Sprintf("%c", originalMsg), where %c expects a single character; so apart from the language detail of how you would cast the resulting string to a *C.uchar, you have most probably lost content already.
%c the character represented by the corresponding Unicode code point
From https://golang.org/pkg/fmt/#hdr-Printing
I want to know the size of a String/Blob in Apex.
All I found is the size() method, which returns the number of characters in a String/Blob.
What is the size of a single character in Salesforce?
Or is there any way to know the size in bytes directly?
I think the only real answer here is "it depends". Why do you need to know this?
The methods on String like charAt and codePointAt suggest that UTF-16 might be used internally; in that case, each character would be represented by 2 or 4 bytes, but this is hardly "proof".
Apex seems to be translated to Java and run on some form of JVM, and Strings in Java are represented internally as UTF-16, so again that could indicate that characters are 2 or 4 bytes in Apex.
Any time Strings are sent over the wire (e.g. as responses from a @RestResource-annotated class), UTF-8 seems to be used as the character encoding, which would mean 1 to 4 bytes per character are used, depending on what character it is. (See section 2.5 of the Unicode standard.)
But you should really ask yourself why you think your code needs to know this because it most likely doesn't matter.
You can estimate string size by doing the following:
String testString = 'test string';
Blob testBlob = Blob.valueOf(testString);
// below converts the blob to its hexadecimal representation; four bits
// from the blob get converted to a single hexadecimal character
String hexString = EncodingUtil.convertToHex(testBlob);
// one byte is eight bits, i.e. two hex characters, hence the
// number of bytes would be
Integer numOfBytes = hexString.length() / 2;
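For instance, a quick anonymous-Apex check (Blob.valueOf encodes as UTF-8, so the é below is one character but two bytes):

String s = 'café';
Blob b = Blob.valueOf(s);
System.debug(s.length()); // 4 characters
System.debug(EncodingUtil.convertToHex(b).length() / 2); // 5 bytes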
Another option to estimate the size would be to get the heap size before and after assigning a value to the String variable:
String testString;
System.debug(Limits.getHeapSize());
testString = 'testString';
System.debug(Limits.getHeapSize());
The difference between the two printed numbers would be the size the string takes on the heap.
Please note that the values obtained from those two methods will be different. We don't know what encoding is used for storing a string in the Salesforce heap or when converting a string to a blob.
I've migrated my project from XE5 to 10 Seattle. I'm still using ASCII codes to communicate with devices. With my new build, my Seattle application is sending the † character instead of the space char (which is #32 in ASCII) in a Char array. I need to send the space character data to a text file but I can't.
I tried #32 (as I used before), #032 and #127, but it doesn't work. Any idea?
Here is how I use it:
fillChar(X, 50, #32);
The method signature is: procedure FillChar(var X; Count: Integer; Value: Ordinal);
Despite its name, FillChar() fills bytes, not characters.
Char is an alias for WideChar (2 bytes) in Delphi 2009+, in prior versions it is an alias for AnsiChar (1 byte) instead.
So, if you have a 50-element array of WideChar elements, the array is 100 bytes in size. When you call fillChar(X,50,#32), it fills in the first 50 bytes with a value of $20 each. Thus the first 25 WideChar elements will have a value of $2020 (aka Unicode codepoint U+2020 DAGGER, †) and the second 25 elements will not have any meaningful value.
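A minimal console sketch of the effect (the array size matches the question; nothing else is assumed):

var
  X: array[0..49] of Char; // Char = WideChar since Delphi 2009, so X is 100 bytes
begin
  FillChar(X, 50, #32);    // fills only the first 50 bytes with $20
  Writeln(X[0]);           // prints '†': two $20 bytes read back as WideChar $2020
end.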
This issue is explained in the FillChar() documentation:
Fills contiguous bytes with a specified value.
In Delphi, FillChar fills Count contiguous bytes (referenced by X) with the value specified by Value (Value can be of type Byte or AnsiChar)
Note that if X is a UnicodeString, this may not work as expected, because FillChar expects a byte count, which is not the same as the character count.
In addition, the filling character is a single-byte character. Therefore, when Buf is a UnicodeString, the code FillChar(Buf, Length(Buf), #9); fills Buf with the code point $0909, not $09. In such cases, you should use the StringOfChar routine.
This is also explained in Embarcadero's Unicode Migration Resources white papers, for instance on page 28 of Delphi Unicode Migration for Mere Mortals: Stories and Advice from the Front Lines by Cary Jensen:
Actually, the complexity of this type of code is not related to pointers and buffers per se. The problem is due to Chars being used as pointers. So, now that the size of Strings and Chars in bytes has changed, one of the fundamental assumptions that much of this code embraces is no longer valid: That individual Chars are one byte in length.
Since this type of code is so problematic for Unicode conversion (and maintenance in general), and will require detailed examination, a good argument can be made for refactoring this code where possible. In short, remove the Char types from these operations, and switch to another, more appropriate data type. For example, Olaf Monien wrote, "I wouldn't recommend using byte oriented operations on Char (or String) types. If you need a byte-buffer, then use ‘Byte’ as [the] data type: buffer: array[0..255] of Byte;."
For example, in the past you might have done something like this:
var
Buffer: array[0..255] of AnsiChar;
begin
FillChar(Buffer, Length(Buffer), 0);
If you merely want to convert to Unicode, you might make the following changes:
var
Buffer: array[0..255] of Char;
begin
FillChar(Buffer, Length(buffer) * SizeOf(Char), 0);
On the other hand, a good argument could be made for dropping the use of an array of Char as your buffer, and switch to an array of Byte, as Olaf suggests. This may look like this (which is similar to the first segment, but not identical to the second, due to the size of the buffer):
var
Buffer: array[0..255] of Byte;
begin
FillChar(Buffer, Length(buffer), 0);
Better yet, use this form of the second argument to FillChar, which works regardless of the data type of the array:
var
Buffer: array[0..255] of Byte;
begin
FillChar(Buffer, Length(buffer) * SizeOf(Buffer[0]), 0);
The advantage of these last two examples is that you have what you really wanted in the first place, a buffer that can hold byte-sized values. (And Delphi will not try to apply any form of implicit string conversion since it's working with bytes and not code units.) And, if you want to do pointer math, you can use PByte. PByte is a pointer to a Byte.
The one place where changes like this may not be possible is when you are interfacing with an external library that expects a pointer to a character or character array. In those cases, they really are asking for a buffer of characters, and these are normally AnsiChar types.
So, to address your issue, since you are interacting with an external device that expects Ansi data, you need to declare your array as using AnsiChar or Byte elements instead of (Wide)Char elements. Then your original FillChar() call will work correctly again.
If you want to use ANSI for communication with devices, you would define the array as
x: array[1..50] of AnsiChar;
In this case to fill it with space characters you use
FillChar(x, 50, #32);
Using an array of AnsiChar as communication buffer may become troublesome in a Unicode environment, so therefore I would suggest to use a byte array as communication buffer
x: array[1..50] of byte;
and initialize it with
FillChar(x, 50, 32);
It's easy to define a string with a size of 3 (in old Delphi code):
st:string[3];
Now we wish to move the code to ANSI:
st:ansiString[3];
won't work!
And for the advanced OEM type:
st:oemString[3];
same problem, where
type
OemString = Type AnsiString(CP_OEMCP);
How can a fixed-length ANSI string, and the new OEM type, be declared?
Update: I know it will create a fixed-length string. It is part of the design of the software to protect against mistakes, and is essential for the program.
You don't need to define the size of an AnsiString.
The notation
string[3]
is for short strings used by Pascal (and Delphi 1) and it is mostly kept for legacy purposes.
Short strings can be 1 to 255 bytes long. The first ("hidden") byte contains the length.
AnsiString is a pointer to a character buffer (0 terminated). It has some internal magic like reference counting. And you can safely add characters to an existing string because the compiler will handle all the nasty details for you.
UnicodeStrings are like AnsiStrings, but with unicode chars (2 bytes in this case). The default string now (Delphi 2009) maps to UnicodeString.
The AnsiString type has a construct to add a code page (used to define the characters above 127), hence the CP_OEMCP:
OemString = Type AnsiString(CP_OEMCP);
"Short Strings" are "Ansi" String, because there are only available for backward compatibility of pre-Delphi code.
st: string[3];
will always create a fixed-length "short string" of AnsiChars (using the current ANSI code page / charset), even in Delphi 2009 and later.
But such short strings are NOT the same as the so-called AnsiString type. There is no code page for short strings, just as there is no reference count for short strings.
The code page exists only for the AnsiString type, which is not fixed-length but variable-length, and reference counted: a completely different type from a short string defined by string[...].
You can't mix the short string and AnsiString type declarations, by design. Both are called 'strings' but they are different types.
Here is the memory layout of a short string:
st[0] = length(st)
st[1] = 1st char (if any) of st
st[2] = 2nd char (if any) of st
st[3] = 3rd char (if any) of st
Here is the memory mapping of an AnsiString or UnicodeString type:
st = nil if st=''
st = PAnsiChar if st<>''
and here is the PSt: PAnsiChar layout:
PWord(PSt-12)^ = code page
PWord(PSt-10)^ = element size (1 for AnsiChar, 2 for WideChar)
PInteger(PSt-8)^ = reference count
PInteger(PSt-4)^ = length(st) in AnsiChar or WideChar count
PAnsiChar(PSt) / PWideChar(PSt) = Ansi or Unicode text stored in st, terminated by a #0 char (AnsiChar or WideChar)
So while there are some similarities between the AnsiString and UnicodeString types, the short string type is totally different, and they can't be mixed as you wished.
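To illustrate the layout above, a minimal console sketch (Delphi 2009+; it assumes a non-empty AnsiString so that PSt is not nil):

var
  s: AnsiString;
  PSt: PAnsiChar;
begin
  s := 'abc';
  PSt := PAnsiChar(s);
  Writeln(PWord(PSt - 12)^);   // code page
  Writeln(PWord(PSt - 10)^);   // element size: 1 for AnsiString
  Writeln(PInteger(PSt - 8)^); // reference count
  Writeln(PInteger(PSt - 4)^); // length: 3
end.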
That would only be useful if string[3] in Unicode versions of Delphi defaulted to 3 WideChars. That would surprise me, but in case it does, use:
st: array[1..3] of AnsiChar;
The size of an ansistring and unicodestring will grow dynamically. The compiler and runtime code handle all this stuff for you.
See: http://delphi.about.com/od/beginners/l/aa071800a.htm
For a more in depth explanation see: http://www.codexterity.com/delphistrings.htm
The length can be anything from 1 char to 2GB.
Unlike the old ShortString type, the newer string types in Delphi are dynamic. They grow and shrink as needed. You can preallocate a string to a given length by calling SetLength(), which is useful to avoid re-allocating memory when you add data piece by piece to a string whose final length you already know; but even after that, the string can still grow and shrink as data are added or deleted.
If you need static strings you can use array[0..n] of Char, whose size won't change dynamically.
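For example, a minimal sketch of the SetLength pattern mentioned above (the content is arbitrary):

var
  s: AnsiString;
  i: Integer;
begin
  SetLength(s, 3);              // preallocate room for 3 AnsiChars at once
  for i := 1 to 3 do
    s[i] := AnsiChar(Ord('a') + i - 1);
  Writeln(s);                   // abc
  s := s + 'd';                 // still dynamic: the string can grow afterwards
end.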