Model Monitor Capture data - EndpointOutput Encoding is BASE64 - amazon-sagemaker

https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture-endpoint.html
I have followed the steps in this link, and it appears I cannot change the encoding for EndpointOutput in the data capture file: it comes out as BASE64 for the XGBoost model. I am using the latest version, 1.2.3.
The monitoring scheduler requires both EndpointOutput and EndpointInput to have the same encoding. My EndpointInput is CSV, but EndpointOutput comes out as BASE64, and nothing I do changes it.
This causes an issue when the analyzer runs. After the baseline is generated and data is captured, the monitoring schedule's analyzer throws an encoding-mismatch error; for it to run, EndpointOutput and EndpointInput must have the same encoding.
It appears there is nothing we can do to change the output encoding. I also tried the LightGBM and CatBoost algorithms and found that for these the EndpointOutput encoding is JSON, which is readable but still does not solve the problem.
Is there a way to change the EndpointOutput encoding for data capture?
I have tried setting a deserializer on the predictor, using both JSONDeserializer and CSVDeserializer, but I still get BASE64 with XGBoost and JSON with LightGBM and CatBoost.
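For reference, here is a minimal sketch of this kind of setup using the SageMaker Python SDK v2; the S3 path and the model object are placeholders, and the sketch only shows where the serializer, deserializer, and capture content types are configured, not a confirmed fix for the BASE64 output.

from sagemaker.model_monitor import DataCaptureConfig
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://my-bucket/datacapture",  # placeholder
    csv_content_types=["text/csv"],                   # payloads with this content type are captured as CSV
    json_content_types=["application/json"],
)

predictor = model.deploy(                             # `model` is the trained Model object (placeholder)
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    serializer=CSVSerializer(),      # request body is sent as text/csv
    deserializer=CSVDeserializer(),  # response is requested as text/csv
    data_capture_config=capture_config,
)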

Related

Converting .trc to ASCII

I have accidentally saved some data using the Oracle TRACE file format. I have been trying to convert these data files into normal text (ASCII) format for a day now without much success. Can someone point me in the right direction? I would prefer to use Linux but also have access to a Windows machine. I could upload an example file as well. The files come from a RIGOL scope, and using "vim" to peer into them gives something along the lines of: "sc8^#DS1104Z^#^#^#^#^#^#^#^#^#^#^#^#^#00.04.01.SP2^#^#^#^#^#^#^#^#^A^#^E^#^[s`¢^#^#^#^#<8c><9c>^#^#^L^#^B^#^#É^C^#ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ..."

Decoding audio from non-file source with Microsoft Media Foundation

My question is basically that I am new to this framework and I am looking for pointers on how to work with non-file sources in Media Foundation, since the documentation on this front seems lacking to me. Below is some info on what I am doing and the approach I am working with right now, but I have no idea if it is the correct way to use the framework.
I am currently trying to use Microsoft Media Foundation to decode audio that I'm receiving over Bluetooth and then send it along as PCM audio. Looking at the documentation for Media Foundation, it seems that almost all examples assume the source is a file.
Looking at the tutorial for decoding audio, for example, they use MFCreateSourceReaderFromURL, which I cannot use since my source is not a file.
As I wanted to follow the tutorial and change as little as possible, I'm thinking that I only need to change how I create the source reader, and the rest of the process would be the same. I looked at the other source reader creation functions available, and MFCreateSourceReaderFromByteStream sounds about right for my purposes.
Is there a chance that I only need to create a byte stream, continuously fill it with the data I get over the air, and have the media source created by MFCreateSourceReaderFromByteStream handle this well? Or do I need to create a custom media source and do more manual work at the lower levels of the API to get something like this to work?
Or maybe a source reader is the wrong approach altogether when the source is not a file? The main page about the Source Reader has the following picture:
And this picture shows the media source within the source reader pointing to a source file only; is this a real limitation or simply an example?
I'm writing this in plain C, but pointing to C++ documentation or examples is fine, as it's usually pretty straightforward to translate C++ to C, and there seems to be no documentation for C anyway.
Edit:
I'm adding an image of the kind of data I'm getting; the red area is the chunks of data I refer to in the comments below.
"Non-file source" is not an accurate description. Does it have a file structure, just not a file? Structured differently? A raw stream?
If you look at the samples that use a source reader, they assume the presence of a stream handler capable of parsing the incoming stream into elementary streams with known types and properties. Then you, or Media Foundation, can apply a decoder or otherwise transform the data.
Since you specified that the data comes "in chunks", most likely you are interested in the alternative option of using the AAC decoder explicitly. You can create an instance of it, initialize its input and output types, then feed it compressed audio and pull decoded PCM from the output. The decoder exposes an MFT interface.

Urlencoding weird characters in another charset

I'm learning C, and I'm using libcURL to send a POST request to log into a website.
I've stumbled upon a problem: my password contains the ü character.
Reading the POST request from my browser, I can see it gets encoded as %FC.
However, when using curl_easy_escape() to encode it, it encodes as %C3%BC.
I searched and found that it's a different encoding, I think ISO-8859-1, because the page has this meta tag: <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
However, I can't figure out how to do the conversion.
Now, how would I go about urlencoding ü as %FC?
Use of non-UTF-8 encodings for POST is an utter mess and the behavior actually varied quite a bit between browsers, so it's considered very bad practice to do this. But since you're stuck with a site that does, you'll have to work around it.
I can't find a curl API for doing percent-encoding with alternate charsets, so you might have to do it yourself: first use iconv to convert from your system's native encoding (hopefully UTF-8) to ISO-8859-1 (Latin-1), then do the percent-encoding manually.
One idea: are you sure you even should be doing your own escaping? My impression is that curl_easy_escape() is just meant for URLs, and the curl API for POSTing forms may already do escaping internally (I'm not sure), in which case you probably just have to tell it the right content type.
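Not C, but as a quick illustration of the byte-level difference involved, here is the same character percent-encoded both ways in Python; the Latin-1 result is what the iconv-plus-manual-escaping route would need to reproduce:

from urllib.parse import quote

print(quote("ü"))                      # '%C3%BC' - percent-encodes the UTF-8 bytes 0xC3 0xBC
print(quote("ü", encoding="latin-1"))  # '%FC'    - encodes to ISO-8859-1 first, giving the single byte 0xFC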

Unreadable characters in a file on a remote server when viewing in a browser

I have to work with a text file on a remote server. The file can be accessed by a direct link using any browser, in the form http://server.school.com/files/people.all (not a real link, since access requires a password). When I view it in Firefox, some of the characters are unreadable, for example: 'José Luis Paniagua Sánchez'. I have a few questions.
Could the issue be caused by incorrect settings in my browser, or could there be a problem with the file itself?
Is opening a file in a web browser and copying the entire content to a text editor using copy/paste inherently different from downloading the information with a script? Could it affect the encoding of the data?
Thanks.
Select the encoding in the browser, likely UTF-8 (Firefox: View - Character Encoding). The problem is that the server does not specify the file's encoding (or specifies a default encoding).
A binary download, such as downloading an image file (which you could try), should keep the file as-is.
Copy/paste using the right encoding in the browser should work for UTF-8.
Assuming it is indeed UTF-8 (multibyte sequences for special characters), and you are working on Windows (whose default codepage is single-byte), you'd be better off using a programmer's editor like Notepad++ or jEdit, both free. They can set the encoding explicitly, and even convert between encodings.
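As a concrete illustration of the "download with a script" route, here is a rough Python sketch; the URL is the placeholder from the question, authentication is omitted, and it assumes the file really is UTF-8:

import urllib.request

url = "http://server.school.com/files/people.all"  # placeholder; the real file is password-protected
raw = urllib.request.urlopen(url).read()           # raw bytes, untouched by any browser charset guessing

# Decode explicitly; a UnicodeDecodeError here would mean the file is not valid UTF-8.
text = raw.decode("utf-8")
print(text[:200])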

Mercurial: which config property controls the encoding of file contents?

Is there a dedicated Mercurial configuration property which specifies the encoding of file contents and hence should be used by a Mercurial client to properly display a file?
I've found web.encoding, which does not seem to be exactly what I'm looking for. Google also gave some results for ui.encoding, but I couldn't find any hints in the reference.
Mercurial is not concerned with the encoding of the files you put in your repository: Mercurial is happy to store files with any encoding (or no particular encoding at all).
This means that you can add files with UTF-8, Latin-1, or any other encoding to your repository and Mercurial will check them out exactly as they were when you added them.
The encoding of each file is not stored anywhere in Mercurial and it is up to the client to recognize the encoding (perhaps based on file content where it makes sense, e.g., for XML files).
For a Mercurial desktop client (as per your comments below), I suggest looking at the file content:
Can you decode it as UTF-16?
Can you decode it as UTF-8?
Are there NUL bytes in the file? Then stop and declare it to be "binary".
Fall back on a Latin-N encoding such as Latin-1 for Western Europe.
The UTF-16 and UTF-8 encodings are nice since they are structured and this makes it possible for you to detect that a file isn't valid UTF-8 encoded, say. The above list is written with a European perspective — you should probably also consult someone with knowledge about Shift JIS and other encodings used in Asia.
In any case, I would only expect a Mercurial client to do a best effort attempt at showing me a file with an encoding other than ASCII.
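A rough sketch of that heuristic in Python; the UTF-16 check is simplified to a byte-order-mark test, and the rest follows the order of the list above:

def guess_encoding(data: bytes) -> str:
    """Best-effort guess of a file's text encoding."""
    # UTF-16 check, simplified: look for a byte order mark.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    # UTF-8 is structured, so a successful decode is a meaningful signal.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # NUL bytes in anything else: stop and declare it binary.
    if b"\x00" in data:
        return "binary"
    # Fall back on a Latin-N encoding (Latin-1 here); any byte sequence decodes as Latin-1.
    return "latin-1"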
Some alternative interpretations of your question:
If you're really asking how to make your files look "correct" when you view them in hgweb, then it's a matter of using a consistent encoding in the repository and setting web.encoding.
If you're really asking how to ensure that text files get the OS-native line ending on different platforms (\n on Unix, \r\n on Windows), then take a look at the eol extension that comes with Mercurial.
No. The encoding (charset) is a property of the file in the repository itself, not a Mercurial configuration setting.
