Let's say I have imported a large number of audio files into S3. I need to map my audio files' metadata (artist, track name, duration, release date, ...) to a DynamoDB table in order to query them through a GraphQL API in a React app. However, I can't yet figure out how to extract this metadata so it can be mapped into DynamoDB.
In the DynamoDB developer guide, it is mentioned (p.914) that the S3 object identifier can be stored in the DynamoDB item.
It is also mentioned that S3 object metadata support can provide a link back to the parent item in DynamoDB (by storing the primary key value of the table item as the S3 metadata).
However, the process is not really detailed; the closest approach I found is from J. Beswick, who uses a Lambda function to load a large amount of data from a JSON file stored in an S3 bucket.
(https://www.youtube.com/watch?v=f0sE_dNrimU&feature=emb_logo).
S3 object metadata is something different from audio metadata.
Think of it this way: everything that you put in S3 is an object. That object has a key (its name), some metadata attached to it by S3 by default, and other metadata that you can attach yourself. All of this is explained here.
Audio file metadata is a different thing. It lives inside the file (let's suppose it is an MP3 file). To access this data you need to read the file with an API that knows the file format and how to extract the data.
When you upload a file to S3, S3 does not extract any of that data (artist, track number, etc. from MP3 files) and attach it to your object metadata. You need to do that yourself.
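For concreteness, here is roughly what user-defined S3 object metadata looks like with boto3 (the bucket and key names below are just placeholders):

import boto3

s3 = boto3.client("s3")

# Attach user-defined metadata when uploading an object.
with open("song.mp3", "rb") as body:
    s3.put_object(
        Bucket="my-audio-bucket",   # placeholder bucket name
        Key="tracks/song.mp3",
        Body=body,
        Metadata={"artist": "Some Artist", "title": "Some Title"},
    )

# Read the object's metadata back without downloading the body.
head = s3.head_object(Bucket="my-audio-bucket", Key="tracks/song.mp3")
print(head["Metadata"])  # {'artist': 'Some Artist', 'title': 'Some Title'}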
A suggested solution: have every upload to S3 trigger a Lambda function that knows how to extract the audio metadata from the file. The function extracts the metadata and saves it in DynamoDB together with the key of the object in S3. After that you can run the queries you planned against the table and, once you find a record, point back to the correct object in S3 (see the sketch below).
You can also run the same function over all objects that already exist in the S3 bucket, so nothing has to be re-uploaded.
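A rough sketch of that suggestion (not a drop-in implementation): it assumes the mutagen library is bundled with the Lambda deployment package and that a hypothetical DynamoDB table named AudioTracks, keyed on s3_key, already exists.

import os
import urllib.parse

import boto3
from mutagen.mp3 import MP3            # assumes mutagen ships with the function
from mutagen.easyid3 import EasyID3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("AudioTracks")  # hypothetical table name

def handler(event, context):
    # Triggered by an S3 "ObjectCreated" event notification.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Download the new audio file to Lambda's temporary storage.
        local_path = os.path.join("/tmp", os.path.basename(key))
        s3.download_file(bucket, key, local_path)

        # Extract the audio metadata from inside the file.
        tags = EasyID3(local_path)                    # artist, title, ...
        duration = int(MP3(local_path).info.length)   # seconds

        # Save it in DynamoDB together with the object's key in S3.
        table.put_item(Item={
            "s3_key": key,
            "artist": tags.get("artist", ["unknown"])[0],
            "title": tags.get("title", [os.path.basename(key)])[0],
            "duration_seconds": duration,
        })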
We have our data files as JSON on GCP Cloud Storage.
Which of the 2 approaches below is the ideal/efficient way to load it into an existing Snowflake table?
Use GCS as Named External Stage
Use GCS as External Location to load data
If (1), should we go with calling the Snowpipe REST endpoints to load the data?
The "efficiency" is pretty much the same for either method, but I'd strongly recommend going the route of Auto Ingest Snowpipe, as outlined in this link:
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-gcs.html
This works really well and allows for a "set it and forget it" type of project.
I'm developing a single-page app for image annotation. Each .jpg file is stored on S3/minIO, coupled with a .xml file (Pascal VOC notation) that describes the coordinates and positions of each annotation associated with the image.
I'd like to fetch all the XML data so I can filter my image results within the web app (based on ReactJS). But thousands of requests to an S3 server directly from a web app seems a bit odd to me; nevertheless, I would prefer to avoid any "middleware" servers (like Python/Flask or Node.js) and rely on the ReactJS app alone.
I haven't been able to find a way to download the contents of all the XML files with a single AJAX call; do you have any ideas for addressing this kind of issue?
The S3 API doesn't provide a way to fetch multiple objects in a single operation. As you suggested in your question, your application will need to handle this logic itself by first getting a list of the objects and then iterating through that list.
Alternatively, consider storing the xml files as a single archive.
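For illustration, a minimal list-then-fetch sketch with boto3 (the bucket and prefix names are placeholders); the same two-step flow applies from the browser with the AWS SDK for JavaScript:

import boto3

s3 = boto3.client("s3")
annotations = {}

# Step 1: list the objects (paginated, up to 1000 keys per page).
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-annotation-bucket", Prefix="annotations/"):
    for obj in page.get("Contents", []):
        if not obj["Key"].endswith(".xml"):
            continue
        # Step 2: fetch each XML file individually.
        body = s3.get_object(Bucket="my-annotation-bucket", Key=obj["Key"])["Body"]
        annotations[obj["Key"]] = body.read().decode("utf-8")

print("fetched %d Pascal VOC files" % len(annotations))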
I have multiple buckets and I would like to find the buckets that store the csv files. I do not know how to search the buckets to find what I need. Is there a method to query the buckets for only the content type "text/csv"? Ultimately I am attempting to find the csv files' blobkey, which begins with "encoded_gs_file:". Also, what is the relationship between Datastore and Storage?
The Blobstore viewer that I am running on localhost only shows the encoded_gs_file keys for images, but I know there should be an encoded_gs_file key for the csv files as well.
When I visit the following URL:
http://localhost:8000/datastore?kind=__GsFileInfo__
I can see the csv file type, but when I go to this URL:
http://localhost:8000/datastore?kind=__BlobInfo__
the csv file does not appear. I think that if I can get the csv file to appear under __BlobInfo__, then I can download it.
There is no specific method to search for objects in a bucket, but what you can do is combine different API calls, for example using the JSON API:
1. List all the buckets in your project: https://cloud.google.com/storage/docs/json_api/v1/buckets/list
2. Then, with the list of buckets, list all the objects in each one: https://cloud.google.com/storage/docs/json_api/v1/objects/list
3. Once you have the list of objects inside a bucket, you can filter them with your preferred programming language.
Basically you can do the same with the XML API; here is its reference:
https://cloud.google.com/storage/docs/xml-api/reference-methods
Or use the gsutil tool:
gsutil ls : lists all the buckets in your project: https://cloud.google.com/storage/docs/listing-buckets
gsutil ls -r gs://[BUCKET_NAME]/** : lists all the objects inside a bucket: https://cloud.google.com/storage/docs/listing-objects
If you want to see examples of how to use the API from different languages, see the Cloud Storage client libraries documentation: https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-nodejs
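As a small illustration of step 3, here is a sketch using the google-cloud-storage Python client that reports every object whose content type is text/csv (credentials are assumed to come from the environment):

from google.cloud import storage   # pip install google-cloud-storage

client = storage.Client()

# Walk every bucket in the project and print the csv objects it contains.
for bucket in client.list_buckets():
    for blob in client.list_blobs(bucket.name):
        if blob.content_type == "text/csv":
            print("gs://%s/%s" % (bucket.name, blob.name))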
When you ask Kloudless to retrieve the files in an account using GET /v0/accounts/{account_id}/folders/{id}/contents/, it only lists the actual files; there are no thumbnail files.
So you cannot use the get-file-contents request, GET /v0/accounts/{account_id}/files/{id}/contents/, because it needs a specific file ID for the thumbnail file, and you never get one since no thumbnails are listed by the previous call.
So how do you retrieve thumbnails for the files?
2016-09 Update: A thumbnails endpoint (docs) is now available for select services. The prior answer has been preserved below, as it describes the File Download endpoint, which is useful for obtaining file contents from services that do not yet support thumbnails.
At the current time the Kloudless API does not support returning thumbnails for
files stored in users' cloud storage accounts.
The request that you are making:
GET /v0/accounts/{account_id}/files/{id}/contents/
is a download request which fetches the full contents of the file.
The file ID can be obtained from the objects listed in the
children request which you referenced before:
GET /v0/accounts/{accounts_id}/folders/{id}/contents/
This will return a list of file/folder objects which have the ID of the
resource as well as other metadata. The ID in the returned file objects can be
used in the download request to fetch the contents of the file.
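A rough sketch of that two-step flow in Python. The base URL, the Bearer-token header, the "root" folder ID, and the shape of the JSON response are assumptions based on my reading of the docs; the account ID is a placeholder:

import requests

BASE = "https://api.kloudless.com/v0"             # assumed base URL
HEADERS = {"Authorization": "Bearer MY_API_KEY"}  # assumed auth scheme
account_id = "12345"                              # placeholder

# 1. List the folder's children ("root" assumed to be the top-level folder).
listing = requests.get(
    "%s/accounts/%s/folders/root/contents/" % (BASE, account_id),
    headers=HEADERS,
).json()

# 2. Use each returned object's ID in the download request.
for obj in listing.get("objects", []):
    if obj.get("type") != "file":
        continue
    resp = requests.get(
        "%s/accounts/%s/files/%s/contents/" % (BASE, account_id, obj["id"]),
        headers=HEADERS,
    )
    with open(obj["name"], "wb") as fh:
        fh.write(resp.content)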
I'm trying to upload to GCS using the Blobstore. I have set the GCS bucket name while generating the upload url, and the file gets uploaded successfully.
In the upload handler, blobInfo.getFilename() returns the right file name, but the file actually gets saved in the GCS bucket under a different name. Each time, the file name is a random-looking hash like this one:
L2FwcGhvc3RpbmdfcHJvZC9ibG9icy9BRW5CMlVvbi1XNFEyWEJkNGlKZHNZRlJvTC0wZGlXVS13WTF2c0g0LXdzcEVkaUNEbEEyc3daS3Vham1MVlZzNXlCSk05ZnpKc1RudDJpajF1TmxwdWhTd2VySVFLdUw3US56ZXFHTEZSLVoxT3lablBI
Is this how it is supposed to work, or is it an anomaly?
I store the file name in the datastore based on the value returned by blobInfo.getFilename(), which is the correct file name, but I'm unable to access the file using GcsFilename since the file is stored in GCS with that random hash as its name.
Any pointers would be greatly helpful.
Thanks!
PS: The Blobstore page says that BlobInfo is currently not available for GCS objects, yet BlobInfo.getFilename returns the right value for me. Am I doing something wrong on my end?
That's how it works; see https://cloud.google.com/appengine/docs/python/blobstore/fileinfoclas ...:
FileInfo metadata is not persisted to datastore [...] You must save
the gs_object_name yourself in your upload handler or this data will
be lost
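The question uses the Java API, but the same idea in the Python runtime (which the quoted docs describe) looks roughly like this; UploadedFile is a hypothetical model name, and webapp2 with ndb is assumed:

from google.appengine.ext import ndb
from google.appengine.ext.webapp import blobstore_handlers

class UploadedFile(ndb.Model):
    filename = ndb.StringProperty()        # human-readable name, for queries
    gs_object_name = ndb.StringProperty()  # the actual GCS object name

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # FileInfo for uploads that were written directly to GCS.
        file_info = self.get_file_infos()[0]
        UploadedFile(
            filename=file_info.filename,
            gs_object_name=file_info.gs_object_name,
        ).put()
        self.redirect("/")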
I personally recommend that new applications use https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/ directly, rather than the blobstore emulation on top of it.
The latter is currently provided essentially only for (limited, partial) backwards compatibility: it's not really all that suitable for new applications.