Error when adding a document with special characters like *, # to the index - azure-cognitive-search

The request is invalid. Details: actions : 0: Invalid document key: 'TESTS123*14'. Keys can only contain letters, digits, underscore (_), dash (-), or equal sign (=). If the keys in your source data contain other characters, we recommend encoding them with a URL-safe version of Base64 before uploading them to your index. If that is not an option, you can add the 'allowUnsafeKeys' query string parameter to disable this check.
We use the .NET SDK; how do we set allowUnsafeKeys?
We tried the URL-safe version of Base64, but then the index stores only the encoded content, not the actual content.
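For illustration, here is a minimal sketch (in Python rather than the .NET SDK) of the round trip the error message suggests: encode the key with URL-safe Base64 before uploading and decode it again when reading documents back, so the application still sees the original key. The helper names are hypothetical:
import base64

def encode_key(key):
    # URL-safe Base64 keeps the key within the allowed character set
    # (letters, digits, underscore, dash, equal sign).
    return base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")

def decode_key(encoded_key):
    # Reverse the encoding when reading results, so callers see the original key.
    return base64.urlsafe_b64decode(encoded_key.encode("ascii")).decode("utf-8")

print(encode_key("TESTS123*14"))              # VEVTVFMxMjMqMTQ=
print(decode_key(encode_key("TESTS123*14")))  # TESTS123*14
If the actual content also needs to be visible in the index, one option is to keep the raw value in a separate, non-key field alongside the encoded key.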

Related

Is it OK to use unencoded reserved characters in a URL?

I have a requirement to add multiple nested paths to the query string. To do this, I'm encoding the individual path names and combining them with / as the delimiter. If a path itself contains a slash, it is encoded as %2F. However, we are not encoding the delimiter slash (which is used for splitting the path).
example:
Input1: a->b->c
Input 2: path_with_/_slash->d->e
Output: ?q=a/b/c+path_with_%2F_slash/d/e
Note: I'm creating the query string manually (not using URLSearchParams, as it encodes all the slashes, including the separator).
Is it OK to use an unencoded slash (used as a separator) in a query string?
Will that create any problem in any of the browsers?
Is there a better way to handle this scenario?
If you're manually forming the query string, you must follow the procedure outlined in the URL Standard, section 5.2, "application/x-www-form-urlencoded serializing":
Let output be the empty string.
For each tuple of tuples:
Let name be the result of running percent-encode after encoding with encoding, tuple’s name, the application/x-www-form-urlencoded percent-encode set, and true.
Let value be the result of running percent-encode after encoding with encoding, tuple’s value, the application/x-www-form-urlencoded percent-encode set, and true.
If output is not the empty string, then append U+0026 (&) to output.
Append name, followed by U+003D (=), followed by value, to output.
Return output.
And unless you hand-rolled your own server, any functions/middleware/etc. for working with url queries during route handling will have automatically urldecoded those values for you.
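As a quick sketch of that round trip in Python: urlencode follows essentially the same serialization rules as the algorithm quoted above, and parse_qs stands in for the server-side decoding. The value below is a guess at the combined example input from the question, treated as one opaque string:
from urllib.parse import urlencode, parse_qs

# Serialize the whole value; every '/' (separator or not) becomes %2F
# and the space becomes '+'.
value = "a/b/c path_with_/_slash/d/e"
qs = urlencode({"q": value})
print(qs)                    # q=a%2Fb%2Fc+path_with_%2F_slash%2Fd%2Fe
print(parse_qs(qs)["q"][0])  # a/b/c path_with_/_slash/d/e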

PostgreSQL full text search for numbers preceded by a hash sign

The documents that I want to run full text search on contain sequences of a hash sign followed by a series of digits, e.g. #12345, #9999. None of the parsers seem to recognize such a sequence as a single token.
The blank parser does recognize '#' as a token, so I thought I could use a synonym dictionary to match '#' with 'num' and then use the follows operator, e.g. # <-> 1234. However, the blank parser groups all the blank characters into one token, so the token usually contains a leading space (' #'). I can't make a synonym entry with a leading space (or at least I don't know how to).
If I include the english_stem dictionary in the mapping for the blank parser, then ' #' is recognized as a lexeme. But so are all the other blank characters, which adds too much noise to the generated tsvector.
Short of creating a custom parser, is there any way I can configure the search so that I can use full text search to query explicitly for #0000 patterns?
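One rough sketch of a workaround, assuming the text can be normalized both at indexing time and at query time, is to rewrite the hash-number pairs into a single token before the text ever reaches to_tsvector / to_tsquery. The 'num' prefix mirrors the synonym idea above and is otherwise arbitrary:
import re

def normalize_hash_numbers(text):
    # Rewrite '#12345' as a single searchable token such as 'num12345'
    # before the document (or the query) is passed to the parser.
    return re.sub(r"#(\d+)", r"num\1", text)

print(normalize_hash_numbers("see tickets #12345 and #9999"))
# see tickets num12345 and num9999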

Passing a pound sign as a value in the query string of a URL, using Angular

I have an input tag, and on input it filters a list of data in a table according to the input value. That value is passed via the query string in the request URL. Typically I get data returned and the table updates appropriately. However, when searching for the pound sign (#), I receive a 500 Internal Server Error. My question is: is there a known issue with Angular when passing a pound sign in the query string?
To pass reserved characters in URLs, you need to use percent encoding. For #, it's %23.
The Wikipedia page for percent-encoding has a nice lookup table.
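For a quick sanity check of the encoding (Angular code would typically use encodeURIComponent for the same thing), here is the equivalent in Python:
from urllib.parse import quote, unquote

# '#' starts the URL fragment, so it has to be percent-encoded when it is
# part of a query value.
print(quote("#", safe=""))      # %23
print(unquote("color=%23fff"))  # color=#fff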

SimpleDB HMAC signing

I'm writing a basic client to access the Amazon SimpleDB service and I'm having some trouble understanding the logic behind the signing of the request.
Here is an example request:
https://sdb.amazonaws.com/?Action=PutAttributes
&DomainName=MyDomain
&ItemName=Item123
&Attribute.1.Name=Color&Attribute.1.Value=Blue
&Attribute.2.Name=Size&Attribute.2.Value=Med
&Attribute.3.Name=Price&Attribute.3.Value=0014.99
&Version=2009-04-15
&Timestamp=2010-01-25T15%3A01%3A28-07%3A00
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&AWSAccessKeyId=<Your AWS Access Key ID>
Following is the string to sign.
The message to sign:
GET\n
sdb.amazonaws.com\n
/\n
AWSAccessKeyId=<Your AWS Access Key ID>
&Action=PutAttributes
&Attribute.1.Name=Color
&Attribute.1.Value=Blue
&Attribute.2.Name=Size
&Attribute.2.Value=Med
&Attribute.3.Name=Price
&Attribute.3.Value=0014.99
&DomainName=MyDomain
&ItemName=Item123
&SignatureMethod=HmacSHA256
&SignatureVersion=2
&Timestamp=2010-01-25T15%3A01%3A28-07%3A00
&Version=2009-04-15
Following is the signed request.
https://sdb.amazonaws.com/?Action=PutAttributes
&DomainName=MyDomain
&ItemName=Item123
&Attribute.1.Name=Color&Attribute.1.Value=Blue
&Attribute.2.Name=Size&Attribute.2.Value=Med
&Attribute.3.Name=Price&Attribute.3.Value=0014.99
&Version=2009-04-15
&Timestamp=2010-01-25T15%3A01%3A28-07%3A00
&Signature=<URLEncode(Base64Encode(Signature))>
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&AWSAccessKeyId=<Your AWS Access Key ID>
What I don't get is the message to sign. Why don't I get it? Well, the parameter order is completely different between the request and the message to sign. It appears from the example that the parameters may be ordered alphabetically.
Has anyone played around with SimpleDB enough to tell me what the logic is behind the message to sign, i.e. the parameter order etc.? The documentation is not very specific here.
To answer my own question: the answer is buried in the documentation. I was right; the parameters have to be sorted first.
http://docs.amazonwebservices.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/index.html?Query_QueryAuth.html
For those reading this question later, below is a quote of the relevant section from the docs. This section seems to have disappeared from the SimpleDB docs but is still present in the SQS docs. It still applies directly to SimpleDB.
A key issue is that you have to properly URL encode all of the HTTP parameter values.
Do not URL encode any of the unreserved characters that RFC 3986 defines.
These unreserved characters are A-Z, a-z, 0-9, hyphen ( - ), underscore ( _ ), period ( . ), and tilde ( ~ ).
Percent encode all other characters with %XY, where X and Y are hex characters 0-9 and uppercase A-F.
Percent encode extended UTF-8 characters in the form %XY%ZA
Percent encode the space character as %20 (and not +, as common encoding schemes do).
A common error involves failure to encode the asterisk character (*) which can appear in both data values and in SelectExpressions.
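Putting the quoted rules together, here is a minimal Python sketch of the Signature Version 2 recipe described above: sort the parameters, percent-encode them per RFC 3986, build the four-line string to sign, then HMAC-SHA256 and Base64 it. It is an illustration rather than production code, and sign_request is a hypothetical helper name:
import base64, hashlib, hmac
from urllib.parse import quote

def sign_request(params, secret_key, host="sdb.amazonaws.com", path="/", method="GET"):
    # RFC 3986 encoding: leave only A-Z a-z 0-9 - _ . ~ unencoded;
    # spaces become %20 and '*' becomes %2A.
    enc = lambda s: quote(str(s), safe="-_.~")
    # Sort parameters by name (natural byte order) and build the canonical query string.
    canonical_query = "&".join(f"{enc(k)}={enc(v)}" for k, v in sorted(params.items()))
    string_to_sign = "\n".join([method, host, path, canonical_query])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    # The Signature parameter is the Base64 digest, URL-encoded when it is
    # appended to the request's query string.
    return enc(base64.b64encode(digest).decode("ascii"))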

Any python/django function to check whether a string only contains characters included in my database collation?

As expected, I get an error when entering some characters not included in my database collation:
(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
Is there any function I could use to make sure a string only contains characters existing in my database collation?
thanks
You can use a regular expression to allow only certain characters. The following allows only letters, numbers, and underscores (_), but you can change it to include whatever you want:
import re
# Allow only ASCII letters, digits, and the underscore.
exp = '^[A-Za-z0-9_]+$'
# Returns a match object if my_string is valid, or None otherwise.
re.match(exp, my_string)
If a match object is returned, a match was found; if nothing is returned (None), the string is invalid.
I'd look at Python's unicode.translate() and codec.encode() functions. Both of these allow more elegant handling of illegal input characters, and IIRC, translate() has been shown to be faster than a regexp for similar use cases (the findings should be easy to google).
From Python's docs:
"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."
http://docs.python.org/library/stdtypes.html
http://docs.python.org/library/codecs.html
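As a sketch of the encode() idea: in modern Python the same check is just str.encode with the target charset. fits_collation is a hypothetical helper, and 'latin-1' matches the latin1_swedish_ci collation in the error above:
def fits_collation(value, charset="latin-1"):
    # The string can be stored under a latin1_* collation only if every
    # character exists in that charset; encode() raises otherwise.
    try:
        value.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

print(fits_collation("café"))    # True  ('é' exists in latin-1)
print(fits_collation("naïve中"))  # False ('中' does not)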
