Uploaded file integrity check with Azure Data Lake Gen1 - MD5

We are uploading a lot of files to Azure Data Lake Gen1, and one question came up: how do we check that a file was not corrupted or tampered with during upload?
How do I verify that the file I uploaded is the same file that ended up in the Data Lake?
Are there any libraries that perform this integrity check?
Does ADLUploader provide anything for this?
Is there any other way to achieve the same thing?
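To my knowledge, ADLUploader does not perform an integrity check for you, and ADLS Gen1 does not expose a stored Content-MD5 the way Blob Storage does. One workable approach is to hash the file locally, upload it, then stream it back and compare digests. A sketch using the azure-datalake-store Python SDK (tenant, client, store name, and paths are placeholders):

```python
import hashlib
from azure.datalake.store import core, lib, multithread

# Placeholder credentials and store name -- substitute your own.
token = lib.auth(tenant_id="<tenant>", client_id="<app-id>", client_secret="<secret>")
adl = core.AzureDLFileSystem(token, store_name="<store>")

local_path, remote_path = "data.csv", "/uploads/data.csv"  # hypothetical paths

# Hash the local file before upload (chunked, so large files don't exhaust memory).
h = hashlib.md5()
with open(local_path, "rb") as f:
    for chunk in iter(lambda: f.read(4 * 1024 * 1024), b""):
        h.update(chunk)
local_md5 = h.hexdigest()

multithread.ADLUploader(adl, lpath=local_path, rpath=remote_path, overwrite=True)

# Stream the uploaded copy back and hash it the same way.
h = hashlib.md5()
with adl.open(remote_path, "rb") as f:
    for chunk in iter(lambda: f.read(4 * 1024 * 1024), b""):
        h.update(chunk)

if h.hexdigest() != local_md5:
    raise RuntimeError("uploaded file differs from local copy")
```

Re-downloading doubles the transfer, so for very large files you may prefer to compare sizes first (via adl.info(remote_path)["length"]) and only hash when something looks off.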

Related

Unable to see any columns in table after running AWS Glue crawler

I am relatively new to AWS Glue, but after creating my crawler and running it successfully, I can see that a new table has been created, yet I can't see any columns in that table. It is absolutely blank.
I am using a .csv file from an S3 bucket as my data source.
Is your file UTF-8 encoded? Glue has a problem if it's not.
Does your file have at least 2 records?
Does the file have more than one column?
There are various factors that affect whether the crawler can identify a CSV file.
Please refer to this documentation on the built-in classifiers and what they need to crawl a CSV file properly:
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html
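Before re-running the crawler, you can verify those three conditions on a local copy of the file. A quick sketch (the file name is hypothetical):

```python
import csv

def csv_looks_crawlable(path: str) -> bool:
    """Check the conditions above: UTF-8 encoding, at least 2 rows, more than 1 column."""
    try:
        with open(path, encoding="utf-8", errors="strict", newline="") as f:
            rows = list(csv.reader(f))
    except UnicodeDecodeError:
        return False  # not valid UTF-8
    if len(rows) < 2:
        return False  # the built-in CSV classifier wants a header row plus data
    if len(rows[0]) < 2:
        return False  # single-column files are often not classified as CSV
    return True

print(csv_looks_crawlable("mydata.csv"))  # local copy of the S3 object
```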

Preventing batch file formats in GPO

Is it possible to prevent AD users from saving a text file with a .bat extension via GPO? If so, how?
Thanks in advance
Since you didn't specify where you want to prevent the saving of files, I will give a general answer.
You can prevent users from saving files with a specific extension (file screening) on network shares using File Server Resource Manager.
You can prevent users from executing or running files with specific extensions through a GPO.
You cannot prevent a user from saving a file with a specific extension to their own workstation.
So, depending on the Microsoft technology you are using and where the file is being saved, the overall answer is: it depends.

Batch Processing Design Patterns

A partner who cannot support a real-time web service interface must SFTP CSV files to my Linux environment.
The files are zipped and encrypted. The SFTP server is a different virtual server from the one that will process the CSV data into my application's database.
I don't need help with the technical steps (bash script, etc.), but I'm looking for file management conventions that support the following requirements:
Good auditability
Non-destructive
Recoverable
Basically, I'm trying to figure out when it makes sense to make copies of a file, when to rename it to indicate that some processing step has been completed, and so on. (E.g., do I keep the zip files, or do I delete them once unzipped?)
Responses will involve personal preference, but that is exactly what I'm looking for: to learn from someone with more experience working with this type of interface. That seems better than inventing something myself.
If the files are encrypted on the network and within the file settings, they cannot be transmitted successfully unless the file is parsed within another file. You could make the SFTP server forward the file onto a separate machine, but this would only cause more issues because of the encryption type of the files.
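On the conventions question itself, one common pattern is a staged directory layout: keep the original zip in an archive, record a checksum, and use timestamped names so every step is auditable and reversible. A minimal sketch (the directory layout and naming scheme are assumptions, not a standard):

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Assumed directory layout -- adjust to your environment.
INCOMING = Path("/data/partner/incoming")  # SFTP drop zone
ARCHIVE = Path("/data/partner/archive")    # original zips, kept for audit/recovery
WORK = Path("/data/partner/work")          # files currently being processed

def stage(zip_path: Path) -> Path:
    """Archive the original with a checksum, then hand a copy to the work area."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archived = ARCHIVE / f"{stamp}_{zip_path.name}"
    shutil.copy2(zip_path, archived)  # copy first: this step is non-destructive
    digest = hashlib.sha256(archived.read_bytes()).hexdigest()  # chunk for huge files
    Path(f"{archived}.sha256").write_text(digest + "\n")        # audit trail
    working = WORK / archived.name
    shutil.move(str(zip_path), working)  # incoming stays empty once staged
    return working
```

Keeping the original zips in the archive (rather than deleting them after unzipping) is what makes the pipeline recoverable: any downstream step can be re-run from the archived copy.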

grails file upload

Hey. I need to upload some files (images/pdf/pp) to my SQLS database and later download them again. I'm not sure what the best solution is: store them as bytes, or store them as files (not sure if that's possible). I later need to data-bind multiple domain classes together with that file upload.
Any help would be very much appreciated,
JM
Saving files in the file system versus in the DB is a general question that has been asked here several times.
Check this: Store images (jpg, gif, png) in filesystem or DB?
I recommend saving the files in the file system and just saving the path in the DB.
(If you want to work with Google App Engine, though, you have to save the file as a byte array in the DB, since saving files in the file system is not possible on Google App Engine.)
To upload a file with Grails, check this: http://www.grails.org/Controllers+-+File+Uploads
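The file-system-plus-path pattern isn't Grails-specific; here is a minimal, language-agnostic sketch of it in Python (the schema and storage root are made up for illustration):

```python
import sqlite3
from pathlib import Path

UPLOAD_DIR = Path("uploads")  # hypothetical storage root
db = sqlite3.connect("app.db")
db.execute("CREATE TABLE IF NOT EXISTS attachment (id INTEGER PRIMARY KEY, path TEXT)")

def save_upload(filename: str, data: bytes) -> int:
    """Write the bytes to the file system; store only the path in the DB."""
    UPLOAD_DIR.mkdir(exist_ok=True)
    path = UPLOAD_DIR / filename
    path.write_bytes(data)
    cur = db.execute("INSERT INTO attachment (path) VALUES (?)", (str(path),))
    db.commit()
    return cur.lastrowid
```

Downloading is then just reading the file back from the stored path, which keeps large blobs out of the database entirely.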

Best methodology for file uploading (how does YouTube do it?)

What is the best method for uploading files to a server from my website? Each user will upload some files to his profile. Should I place all files in a single directory, or should I create a folder for each user and keep the files there?
How does YouTube do this? How should I store the upload metadata in the database? What's the most efficient approach when handling a large number of users and files? I don't want to know about API usage; I want to know the best approach to file organization.
I would have a folder for each user; it's more manageable!
I wouldn't fancy one massive folder with loads of files!
With a single folder you could also end up with duplicate file names and delete another user's files!
So I would create a directory for every user!
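A sketch of that per-user layout, with a unique prefix so duplicate file names can't clobber each other (the storage root and naming scheme are illustrative assumptions):

```python
import uuid
from pathlib import Path

UPLOAD_ROOT = Path("uploads")  # hypothetical storage root

def store_upload(user_id: int, original_name: str, data: bytes) -> Path:
    """One directory per user; a UUID prefix prevents duplicate-name clashes."""
    user_dir = UPLOAD_ROOT / str(user_id)
    user_dir.mkdir(parents=True, exist_ok=True)
    dest = user_dir / f"{uuid.uuid4().hex}_{original_name}"
    dest.write_bytes(data)
    return dest  # record this path, the original name, and the owner in the database
```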
