I am allowing the user to choose any username he wants, and it can be anything at all, such as
AC♀¿!$"Man'#
Now I need to create a directory for him. What function can I use to escape the text so I don't get a file-system problem or exception?
Whether you replace invalid characters or remove them, there's always going to be the possibility of collisions. If I were you, I'd have a separate primary key for the user (a GUID perhaps) and use that for the directory name. That way you can have your user names anything you'd like without worrying about invalid directory names.
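A minimal sketch of the GUID idea, in Python for illustration. The helper name `create_user_directory`, the `users` dict (standing in for a database table), and the temporary base directory are all assumptions, not part of the question.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Tuple


def create_user_directory(base_dir: Path) -> Tuple[str, Path]:
    """Create a directory named after a freshly generated GUID (hypothetical helper)."""
    user_id = uuid.uuid4().hex  # 32 hex characters, no file-system-special characters
    user_dir = base_dir / user_id
    user_dir.mkdir(parents=True)
    return user_id, user_dir


# The user name is stored next to the key; a dict stands in for a DB row here.
users = {}
base = Path(tempfile.mkdtemp())
user_id, user_dir = create_user_directory(base)
users[user_id] = 'AC♀¿!$"Man\'#'
```

The user name never touches the file system, so it can contain anything and can be changed later without renaming the directory.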
Depending on whether your characters are ASCII or Unicode, you can use the byte/character values as a replacement and use some character to mark these replaced characters (like an underscore), giving you something like _001_255_200ValidPart_095_253. Note that you have to replace the marking character itself, too (_095 in the example; 95 is the ASCII code for the underscore).
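Here is one way that scheme could look, sketched in Python. The choices below are assumptions for illustration: only ASCII letters and digits are kept, every other character is written as its UTF-8 bytes, and each byte becomes an underscore plus three decimal digits. The function names are made up.

```python
SAFE = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def escape_name(name):
    """Replace every character outside SAFE with _NNN per UTF-8 byte."""
    out = []
    for ch in name:
        if ch in SAFE:
            out.append(ch)
        else:
            # The underscore is not in SAFE, so the marker itself becomes _095.
            out.extend("_%03d" % b for b in ch.encode("utf-8"))
    return "".join(out)


def unescape_name(escaped):
    """Reverse escape_name: _NNN turns back into one byte."""
    data = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "_":
            data.append(int(escaped[i + 1:i + 4]))
            i += 4
        else:
            data.append(ord(escaped[i]))
            i += 1
    return data.decode("utf-8")


print(escape_name("a_b"))  # a_095b
```

The encoding is reversible, but on a case-insensitive file system "Abc" and "abc" would still map to the same directory, so this alone does not rule out collisions.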
Use Path.GetInvalidFileNameChars or Path.GetInvalidPathChars to check for characters to remove.
http://msdn.microsoft.com/en-us/library/system.io.path.getinvalidfilenamechars.aspx
I think your best bet would be to create a Dictionary that maps invalid filesystem characters to a valid replacement string. Then scan the string for invalid characters, replacing with valid strings as you go. This would allow the user to pick anything they want and give you the consistency to translate it back into the user's name if you want.
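A rough sketch of such a mapping in Python, assuming the characters Windows rejects in file names and a percent-style replacement (both are my assumptions; the names `REPLACEMENTS`, `to_dir_name` and `from_dir_name` are hypothetical). The marker character `%` is mapped as well so the translation can be reversed.

```python
import re

# Hypothetical mapping: characters rejected in Windows file names, plus the
# "%" marker itself, each mapped to "%" followed by two hex digits.
REPLACEMENTS = {ch: "%%%02X" % ord(ch) for ch in '%<>:"/\\|?*'}
TABLE = str.maketrans(REPLACEMENTS)


def to_dir_name(username):
    """Replace each mapped character with its replacement string."""
    return username.translate(TABLE)


def from_dir_name(dirname):
    """Translate a directory name back into the original user name."""
    return re.sub(r"%([0-9A-F]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  dirname)


print(to_dir_name("a/b%c"))  # a%2Fb%25c
```

This does not cover control characters, reserved device names, or trailing dots and spaces, which a real implementation would also have to handle.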
You're not going to find a way to escape the username that will give a valid non-clashing directory name in every instance.
A more reliable approach is to create the directory using some arbitrary convention, and then store a mapping between the two. This also supports the case where your user wants to be able to change their name.
Check out Question #663520 for more on this.
Does the user need to know the exact name of his or her directory? If not, why not create directories using arbitrary rules and associate each one with its owner in a DB or something?
A customer asked to create a custom character mapper function from specific names to ASCII in their SQL database.
Here is a simplified fragment that works (shortened for brevity):
select TRANSLATE(N'àáâãäåāąæậạả',
N'àáâãäåāąæậạả',
N'aaaaaaaaaaaa');
While analyzing the results on the customer's dataset, I noticed one more unmapped symbol, ă. So I added it to the mapper as follows:
select TRANSLATE(N'àáâãäåāąæậạảă',
N'àáâãäåāąæậạảă',
N'aaaaaaaaaaaaa');
Unexpectedly, it started failing with the message:
The second and third arguments of the TRANSLATE built-in function must contain an equal number of characters.
Obviously, TRANSLATE thinks that ă is special and consists of more than one character. Actually, even Notepad thinks the same (copy ă and try to delete it using Backspace key - something unusual will happen. Delete key works normally, though).
Then I thought - if TRANSLATE considers it a two-char symbol, let's add a two char mapping then:
select TRANSLATE(N'àáâãäåāąæậạảă',
N'àáâãäåāąæậạảă',
N'aaaaaaaaaaaaaa');
No errors this time, yay. But the input string was not processed correctly, ă was not replaced with a.
What is the correct (case-sensitive) way to replace such "double symbols"? Can it be done using TRANSLATE at all? I don't want to add a bunch of REPLACE for every such symbol I find.
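One plausible explanation for the "double symbol" is that this ă arrived in decomposed form, i.e. a plain "a" followed by a combining breve (U+0306), rather than as the single precomposed character U+0103. That is an assumption about the data, but it is easy to check outside SQL. The Python sketch below shows the difference and a generic way to drop combining marks; `strip_marks` is a made-up helper, not a SQL Server feature.

```python
import unicodedata


def strip_marks(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


decomposed_a_breve = "a\u0306"  # "a" + COMBINING BREVE: two code points
composed_a_breve = unicodedata.normalize("NFC", decomposed_a_breve)  # U+0103

print(len(decomposed_a_breve), len(composed_a_breve))  # 2 1
print(strip_marks("àáâãäåāąậạảă"))
```

Letters such as æ have no decomposition and pass through unchanged, so they would still need an explicit mapping.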
Suppose I have a user whose name is in a non-Latin character set (e.g. Юрий Гагарин).
What is the recommended way to store the name: as a transliterated representation (Jurij Gagarin)?
And is there any field that can store the original name?
I've checked list of user attributes (http://www.kouti.com/tables/userattributes.htm) but I haven't found anything that can fit.
You won't find an attribute that was created for exactly that purpose, but there are several single-value string attributes that are often unused. You can just use one of those, as long as no one else in your organization is using it for something else.
One attribute that is usually unused, but actually kind of makes sense to use for this is adminDisplayName.
Otherwise, you could just create your own attribute. You can add attributes to your schema, but I think that's a bit overkill just for a plain-text attribute.
The FM EPS2_GET_DIRECTORY_LISTING has a parameter file_mask which, I guess, should act as a pattern. I need to read from the AS the files whose names contain a given word, but file_mask does not work correctly. For example, if I pass "*ZIP" it returns a file named '.TXT'. Is there a proper way to use that parameter?
The parameters are described in SAP note 1860206 which I will not quote here because I'm not sure about the copyright status. However, wildcards generally do not work as expected in this case - your best bet is to read without the parameter and filter the table afterwards.
I had a similar problem. Because of the poor implementation of the standard FILE_MASK-based filtering in EPS2_GET_DIRECTORY_LISTING (e.g. the * wildcard can be used only at the end of the file mask string :/ ), I ended up with a solution where I read the entire directory content and then process it with regular expressions to find the matching files/directories.
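The "read everything, then filter" idea is language-independent, so here is a small sketch in Python rather than ABAP. The `listing` values are invented stand-ins for the names returned by the function module, and `filter_names` is a hypothetical helper.

```python
import re


def filter_names(names, pattern):
    """Keep only the names that match the regular expression (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if rx.search(name)]


# Stand-in for the unfiltered directory listing.
listing = ["REPORT.ZIP", "NOTES.TXT", "archive_zip.txt", "data.zip"]

print(filter_names(listing, r"\.zip$"))  # ['REPORT.ZIP', 'data.zip']
```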
I've been assigned the task of creating a table that stores an email signature for each username. The question is, how should I store the signature block? I could use a regular varchar type, but then how do I store the formatting metadata?
Any ideas or suggestions would be welcome.
Thanks!
Another idea I had was that you could design a specific email signature template, let people specify fields such as username, quote, avatar, alignment etc., and have them modify their signature in a "signature editor". This way you could store just the "data" and not the rendering, so you could store something like the following:
<signature>
<username>chama</username>
<avatar href="http://url to my image"/>
<quote>A bird in the hand is not in the nest</quote>
</signature>
and it could look something like:
Chama
A bird in the hand is not in the nest
use varchar(max), or whatever length limit is appropriate.
otherwise, the only real concern is that you might want to make sure the html is html-encoded before you stick it in the database. (i.e., replace < with &lt;, etc.) Not sure what you're using, but some tools have a setting so you don't have to do it manually.
other things you can do besides / in addition to html-encoding
1) restrict the formatting tags to some pre-defined set (i.e., search/replace tags you don't want before doing the insert). You can manage this in your db stored procedure or, better yet, in your front-end (if you have control over that).
2) disqualify attempts to insert data if they include certain tags (like '<script>', etc.)
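As a rough illustration of the encoding and the tag check, here is a Python sketch using the standard `html` module. The list of forbidden tags and the helper names are assumptions; a regex check like this is a first line of defence, not a complete HTML sanitizer.

```python
import html
import re

# Hypothetical blacklist of tags to reject outright.
FORBIDDEN = re.compile(r"<\s*/?\s*(script|iframe|object)\b", re.IGNORECASE)


def contains_forbidden_tag(signature):
    """Return True if the signature contains one of the blacklisted tags."""
    return FORBIDDEN.search(signature) is not None


def encode_signature(signature):
    """HTML-encode the signature (< becomes &lt;, & becomes &amp;, ...)."""
    return html.escape(signature)


print(encode_signature("<b>Chama</b>"))  # &lt;b&gt;Chama&lt;/b&gt;
```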
HTML, RTF, XML. There are several standard choices.
Note: "email signature" is NOT "digital signature". The term digital signature has a specific meaning: a SIGNATURE that ensures, for email, that the message comes from the real sender and has not been tampered with.
I'd suggest going with your initial thought -- varchar(max). This will allow you to store signatures that are ASCII based. This includes plaintext, RTF or HTML signatures.
If users want to embed images (i.e. not a link to an image), then you'd have to determine a way for the caller to convert those images to Base64 or other before storing and after reading from your table.
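The Base64 round trip could look like this in Python; the function names are made up, and the sample bytes are just the 8-byte PNG file header standing in for a real image.

```python
import base64


def image_to_text(data):
    """Encode raw image bytes as ASCII text that fits in a varchar column."""
    return base64.b64encode(data).decode("ascii")


def text_to_image(text):
    """Decode the stored text back into the original image bytes."""
    return base64.b64decode(text)


png_header = b"\x89PNG\r\n\x1a\n"  # stand-in for real image data
stored = image_to_text(png_header)
restored = text_to_image(stored)
```

Base64 text is roughly a third larger than the binary it encodes, which matters when choosing a length limit for the column.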
Based on what I'm finding, you have basically two options:
1) Convert your formatted signature data to Binary and store it as a BLOB.
2) Instead of saving the signature itself in the DB, save them as files somewhere and store a reference to that file location in the DB.
As expected, I get an error when entering some characters not included in my database collation:
(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
Is there any function I could use to make sure a string only contains characters existing in my database collation?
thanks
You can use a regular expression to only allow certain characters. The following allows only letters, numbers and _(underscore), but you can change to include whatever you want:
import re

# Allow only ASCII letters, digits and the underscore.
# my_string is the user input to validate.
exp = r'^[A-Za-z0-9_]+$'
re.match(exp, my_string)
If a match object is returned, the string is valid; if None is returned, the string contains a character that is not allowed.
I'd look at Python's unicode.translate() and codecs.encode() functions. Both of these would allow more elegant handling of illegal input characters, and IIRC, translate() has been shown to be faster than a regexp for similar use cases (should be easy to google the findings).
From Python's docs:
"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."
http://docs.python.org/library/stdtypes.html
http://docs.python.org/library/codecs.html
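A small sketch of both ideas, written for Python 3, where `str` plays the role of the `unicode` type quoted above. It assumes the column's character set corresponds to Python's "latin-1" codec, which may not be exact for MySQL's latin1 (it is reported to be closer to Windows-1252); `fits_charset` is a hypothetical helper.

```python
def fits_charset(text, charset="latin-1"):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# translate(): map Unicode ordinals to replacements; None deletes the character.
table = {ord("♀"): None}
cleaned = "AC♀¿".translate(table)

# encode() with an error handler substitutes "?" for unencodable characters.
replaced = "AC♀".encode("latin-1", errors="replace")

print(fits_charset("café"), fits_charset("AC♀"))  # True False
```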