Azure Logic App how to read files from ADLS using HTTP GET - azure-logic-apps

I am trying to use an HTTP GET to iterate through a folder in ADLS. The folder "TemplatesUpdated" has a bunch of subfolders, each containing a few files. I want to iterate through each subfolder and then copy each file to a new location. This is what I have so far, but I am not sure what to put in the Body to get each subfolder and each item within the subfolders.

Achieving this over HTTP still requires a way to iterate (loop) through each folder. There are two ways of meeting the requirement.
WAY - 1: USING LOGIC APPS WITH AZURE BLOB STORAGE CONNECTOR
You will require 2 List Blobs actions: one gets the subfolders in TemplatesUpdated, and the other retrieves the files in each subfolder. Below is the flow of my logic app:
RESULT:
WAY - 2: USING SDK
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob (pre-v12) SDK

ACCOUNT_NAME = "<STORAGE ACCOUNT NAME>"
SAS_TOKEN = '<SAS TOKEN>'

# Authenticate with a SAS token instead of an account key
blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_TOKEN)

# Walk every container, then every blob inside it
containers = blob_service.list_containers()
for c in containers:
    generator = blob_service.list_blobs(c.name)
    for blob in generator:
        print("\t Blob name: " + c.name + '/' + blob.name)
RESULTS:
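Note that the snippet above uses the legacy BlockBlobService client. If you are on the current azure-storage-blob package (v12 or later), a roughly equivalent sketch would be the following (the account name, SAS token, and the TemplatesUpdated prefix are placeholders to adjust):

from azure.storage.blob import BlobServiceClient

ACCOUNT_NAME = "<STORAGE ACCOUNT NAME>"
SAS_TOKEN = "<SAS TOKEN>"

service = BlobServiceClient(
    account_url=f"https://{ACCOUNT_NAME}.blob.core.windows.net",
    credential=SAS_TOKEN)

# List every container, then list blobs under the TemplatesUpdated prefix
for container in service.list_containers():
    container_client = service.get_container_client(container.name)
    for blob in container_client.list_blobs(name_starts_with="TemplatesUpdated/"):
        print(f"{container.name}/{blob.name}")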

Related

Display a 1 GB video file in a React JS frontend, stored in a private S3 bucket, using Flask

I need to display/stream large video files in React. The files are uploaded to a private S3 bucket by users through a React form and Flask.
I tried the getObject method, but my file size is too large. The signed-URL method required me to download the file.
I am new to the AWS/Python/React setup. What is the best/most efficient/least costly approach to display large video files in React?
AWS offers other streaming-specific services, but if you really want to serve the files straight from S3 you could retrieve them as torrents, which, with the right client/video player, lets playback start without downloading the whole file.
Since you mentioned you're using Python, you could do this with the AWS SDK (boto3) like so:
import boto3

s3 = boto3.client('s3')
# Request the .torrent metadata for the object (not the object itself)
response = s3.get_object_torrent(
    Bucket='my_bucket',
    Key='/some_prefix/my_video.mp4'
)
The response object will have this format:
{
    'Body': StreamingBody()
}
Full docs are in the boto3 reference for get_object_torrent.
Then you could use something like webtorrent to stream it on the frontend.
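For instance, the torrent metadata returned in Body could be written to disk before being handed to such a client (a small sketch; the output filename is just a placeholder):

# Persist the .torrent metadata so a torrent client (e.g. webtorrent) can load it
with open("my_video.torrent", "wb") as f:
    f.write(response["Body"].read())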
Two things to note about this approach (quoting docs):
Amazon S3 does not support the BitTorrent protocol in AWS Regions launched after May 30, 2016.
You can only get a torrent file for objects that are less than 5 GBs in size.

Fetch thousands of files from S3/minio with a single-page webapp (no server)

I'm developing a single-page app for image annotation. Each .jpg file is stored on S3/minIO, coupled with a .xml file (Pascal VOC notation) that describes the coordinates and positions of each annotation associated with the image.
I'd like to fetch all the xml data so I can filter my image results within the webapp (based on ReactJS). But thousands of requests to an S3 server directly from a web app seem a bit odd to me; nevertheless, I would prefer to avoid any "middleware" servers (like Python/Flask or Node.js) and rely on the ReactJS app alone.
I haven't been able to find any way to download the content of all the xml files with a single AJAX call; do you have any ideas for addressing this kind of issue?
The S3 API doesn't provide a way to fetch multiple files in a single operation. As you suggested in your question, your application will need to handle this logic by first getting a list of the objects and then iterating through that list.
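For illustration, the list-then-fetch pattern looks roughly like this in Python with boto3 (the bucket name and prefix are placeholders; a browser-only app would use the AWS SDK for JavaScript instead):

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

annotations = {}
# List every key under the annotations prefix, then fetch each .xml object
for page in paginator.paginate(Bucket="my-bucket", Prefix="annotations/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(".xml"):
            body = s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()
            annotations[key] = body.decode("utf-8")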
Alternatively, you could consider storing the xml files as a single archive so they can be fetched in one request.

How to download all image (.jpg) files from an AWS S3 bucket

I am trying to download the images stored in an AWS S3 folder inside a bucket and display them in my frontend. The problem is that I am only able to download one image at a time; I want to download all the images in one go and then display them in my React UI. I am using Spring Boot in my backend. Below is my code.
public byte[] downloadUserProfileImage(int userProfileId) {
    String path = String.format("%s/%s",
            BucketName.PROFILE_IMAGE.getBucketName(),
            userProfileId);
    String filename = "profile_image.jpg";
    return fileStore.download(path, filename);
}
I have not used Spring Boot with AWS, but I have done this in Python, so in Java/Spring Boot only the syntax will change.
You need to loop through all the files in S3, get their keys, and then call s3.download_file(...) for each one.
To loop through the files, use a paginator; check the documentation.
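As a rough Python illustration of that loop (the bucket name and prefix are placeholders; the AWS SDK for Java exposes equivalent listing and download calls):

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Page through every key under the prefix and download the .jpg files locally
for page in paginator.paginate(Bucket="my-bucket", Prefix="profile-images/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(".jpg"):
            s3.download_file("my-bucket", key, key.split("/")[-1])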

google appengine searching buckets to find a particular "content_type" text/csv

I have multiple buckets and I would like to find the buckets that store the csv files. I do not know how to search buckets to find what I need. Is there a method to query the buckets to find only content type "text/csv"? Ultimately I am attempting to find the csv files' blobkey that begins with "encoded_gs_file:". Also, what is the relationship between the datastore and storage?
The blobstore viewer that I am running on localhost only shows the encoded_gs_file for images, but I know that there should be an encoded_gs_file for the csv files.
When i visit the following url:
http://localhost:8000/datastore?kind=__GsFileInfo__
i can see the csv file type, but when i go to this url:
http://localhost:8000/datastore?kind=__BlobInfo__
the csv file does not appear. I think if I can get the csv file to appear in the __BlobInfo__ endpoint, then I can download it.
There is no specific method to search for objects inside a bucket, but you can combine several API methods, for example using the JSON API:
1. List all the buckets in your project: https://cloud.google.com/storage/docs/json_api/v1/buckets/list?apix_params=%7B%22project%22%3A%22edp44591%22%7D
2. Then, with the list of buckets, you can list all the objects in each one:
https://cloud.google.com/storage/docs/json_api/v1/objects/list
3. Once you have the list of objects inside the bucket, you can filter by content type in your preferred programming language (a Python sketch follows at the end of this answer).
Basically you can do the same with the XML API; here is the reference to it:
https://cloud.google.com/storage/docs/xml-api/reference-methods
Or using the gsutil tool:
gsutil list : to list all the buckets in your project: https://cloud.google.com/storage/docs/listing-buckets
gsutil ls -r gs://[BUCKET_NAME]/** : to list all the objects inside the bucket.
https://cloud.google.com/storage/docs/listing-objects
If you want to see examples of how to use the API in different languages, go to the Cloud Storage Client Libraries documentation: https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-nodejs
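As a minimal sketch of step 3 using the google-cloud-storage Python client library (assuming default credentials are already configured):

from google.cloud import storage

client = storage.Client()

# Walk every bucket in the project and keep only the text/csv objects
for bucket in client.list_buckets():
    for blob in client.list_blobs(bucket.name):
        if blob.content_type == "text/csv":
            print(f"gs://{bucket.name}/{blob.name}")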

how to create directory structure on CREATE BLOB action

Is it possible to check for the presence of a specific directory structure and create it if it doesn't exist?
I'm creating a blob like so:
When the directory doesn't exist, I am getting:
How do we create the directory structure when it doesn't exist?
Firstly, Azure Blob storage doesn't support real folders; directories are simulated. You may specify a character or string delimiter within a blob name to create a virtual hierarchy (e.g., the forward slash /). You could refer to this link.
So you don't need to check for the presence of a specific directory; just include the directory in the blob name. The "folder" is created implicitly, and blobs sharing the same directory prefix are grouped in the same folder.
Just name the blob with a directory prefix, like foldername/blobname.
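For example, with the azure-storage-blob Python SDK (a sketch; the connection string, container, and blob names are placeholders), uploading a blob whose name contains "/" creates the virtual folder automatically:

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<CONNECTION STRING>")
container = service.get_container_client("mycontainer")

# No separate "create directory" step is needed; the name prefix acts as the folder
container.upload_blob(name="myfolder/subfolder/testblob.txt", data=b"hello", overwrite=True)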
As far as I know, there isn't a Logic Apps action to create a container, so you can add an Azure Function to your logic app to create the container and then create the blob. I post the screenshots below:
Create the Azure Function (here I use the v1 runtime version).
Then add this function to your logic app and provide the request body.
Create the blob and fill in the "Specify folder path to upload" box with /azure, because the request body above is "name":"azure".
Run this logic app; it will create a container named "azure" and create the blob named "testblob".
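The function itself is only shown as a screenshot above; as an illustration of what it needs to do, a container-creation sketch with the Python SDK might look like this (the connection string is a placeholder, and "azure" matches the request body in the example):

from azure.storage.blob import BlobServiceClient
from azure.core.exceptions import ResourceExistsError

service = BlobServiceClient.from_connection_string("<CONNECTION STRING>")
try:
    # Create the container named in the request body ("azure" in the example above)
    service.create_container("azure")
except ResourceExistsError:
    pass  # the container already exists, which is fine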
