In Drupal, is there a way to index files (PDF, DOC) that were submitted via a Webform?

I'm trying to find a way to index and search PDF, DOC, and possibly TXT files that were uploaded via a webform. I've found a module (Search API attachments) that indexes files, but it appears to index only files that are attached to nodes. :(
Our client wants to be able to search the contents of resumés that are submitted from a webform.

If your clients are expecting hundreds of nodes, it might be worthwhile to set up Apache Solr. You can then use Apache Tika to extract and index the text of many kinds of files: http://tika.apache.org/
If that's not an option, you can write a custom module that uses the Webform API to save each attached file as a node, and then index those nodes with the Search API attachments module.
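For the Solr route, Solr's extraction handler ("Solr Cell", which uses Tika internally) accepts a raw file POSTed to /update/extract. The Python sketch below only builds that request with the standard library and does not send it. The base URL, the core name "resumes", and the document ID are placeholders, and it assumes the extraction handler is enabled on your core. Your Drupal Solr integration may take care of this step for you.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base: str, core: str, doc_id: str,
                          file_bytes: bytes, content_type: str) -> Request:
    """Build (but do not send) a POST to Solr Cell's /update/extract handler.

    solr_base, core and doc_id are placeholders for your own setup.
    """
    params = urlencode({
        "literal.id": doc_id,  # store this value in the document's "id" field
        "commit": "true",      # make the document searchable right away
    })
    url = f"{solr_base}/{core}/update/extract?{params}"
    return Request(url, data=file_bytes,
                   headers={"Content-Type": content_type}, method="POST")


# Hypothetical usage; send it with urllib.request.urlopen(req) once Solr is running.
req = build_extract_request("http://localhost:8983/solr", "resumes",
                            "resume-42", b"%PDF-1.4 ...", "application/pdf")
print(req.get_method(), req.full_url)
```

Committing on every file keeps the example simple; for a large batch of resumés you would probably commit once at the end instead.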

Related

Fetch thousands of files from S3/MinIO with a single-page web app (no server)

I'm developing a single-page app for image annotation. Each .jpg file is stored on S3/MinIO, paired with an .xml file (Pascal VOC format) that describes the coordinates and position of each annotation associated with the image.
I'd like to fetch all the XML data so that I can filter my image results within the web app (based on ReactJS). But thousands of requests to an S3 server directly from a web app seem a bit odd to me; nevertheless, I would prefer to avoid any "middleware" servers (like Python/Flask or Node.js) and rely on the ReactJS app alone.
I haven't been able to find any workaround to download the content of all the XML files with a single AJAX call. Do you have any ideas for addressing this kind of issue?
The S3 API doesn't provide a way to fetch multiple objects in a single operation. As you suggested in your question, your application will need to handle this logic itself: first get a list of the objects, then iterate through that list and fetch each one.
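The list-then-iterate approach can be sketched against the ListObjectsV2 response (GET /<bucket>?list-type=2), which S3 and MinIO return as XML. It is written in Python for brevity; fetch_page is a hypothetical callable standing in for your signed HTTP request (in the React app, a fetch call). Note that each XML key it yields still needs its own GET.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple

# Namespace used by S3 (and MinIO) ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_list_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the object keys on one ListObjectsV2 page and the next continuation token."""
    root = ET.fromstring(xml_text)
    keys = [el.text or "" for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS) if truncated else None
    return keys, token


def iter_xml_keys(fetch_page: Callable[[Optional[str]], str]) -> Iterator[str]:
    """Walk every listing page and yield only the .xml annotation keys.

    fetch_page(token) is hypothetical: it should perform the signed GET
    ?list-type=2[&continuation-token=token] and return the response body.
    """
    token: Optional[str] = None
    while True:
        keys, token = parse_list_page(fetch_page(token))
        for key in keys:
            if key.endswith(".xml"):
                yield key
        if token is None:  # last page reached
            break
```

Each listing page returns up to 1,000 keys by default, so the listing itself stays cheap; the per-file downloads are what add up.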
Alternatively, if you can, consider storing the XML files in a single archive so that they can be fetched with one request.
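With a single archive, the app needs only one GET. As a rough Python sketch, assuming a zip with one Pascal VOC file per image (the layout and the function name are assumptions), reading it and building a label index for filtering could look like this. A browser app would need a JavaScript zip library to do the equivalent.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, Set


def labels_by_image(archive_bytes: bytes) -> Dict[str, Set[str]]:
    """Map each image filename to the set of object labels in its VOC annotation."""
    index: Dict[str, Set[str]] = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for member in zf.namelist():
            if not member.endswith(".xml"):
                continue  # ignore anything that is not an annotation file
            root = ET.fromstring(zf.read(member))  # the <annotation> element
            image = root.findtext("filename", default=member)
            labels = {obj.findtext("name") for obj in root.iter("object")}
            index[image] = {label for label in labels if label}
    return index
```

Filtering then becomes a local lookup, e.g. every image whose label set contains "cat".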

Google App Engine: searching buckets to find a particular content_type (text/csv)

I have multiple buckets and I would like to find the buckets that store CSV files. I don't know how to search buckets to find what I need. Is there a method to query the buckets for only content type "text/csv"? Ultimately I am trying to find the CSV files' blob keys, which begin with "encoded_gs_file:". Also, what is the relationship between the Datastore and Cloud Storage?
The Blobstore viewer that I am running on localhost only shows the encoded_gs_file entries for images, but I know that there should be an encoded_gs_file for the CSV files as well.
When I visit the following URL:
http://localhost:8000/datastore?kind=__GsFileInfo__
I can see the CSV file type, but when I go to this URL:
http://localhost:8000/datastore?kind=__BlobInfo__
the CSV file does not appear. I think that if I can get the CSV file to appear in the __BlobInfo__ kind, then I can download it.
There is no specific method to search for objects in a bucket, but you can combine different API methods, for example with the JSON API:
1. List all the buckets in your project: https://cloud.google.com/storage/docs/json_api/v1/buckets/list?apix_params=%7B%22project%22%3A%22edp44591%22%7D
2. Then, for each bucket in that list, list all the objects in it: https://cloud.google.com/storage/docs/json_api/v1/objects/list
3. Once you have the list of objects in a bucket, filter it (for example, on each object's contentType field) in your preferred programming language.
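Step 3 could look like the following Python sketch. It assumes each page of objects.list arrives as the parsed JSON resource described in the reference (an "items" array whose entries carry "name" and "contentType", plus an optional "nextPageToken"); list_objects_page is a hypothetical stand-in for the authenticated HTTP call or client-library method you actually use.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical signature: (bucket_name, page_token) -> parsed objects.list JSON.
ListPage = Callable[[str, Optional[str]], Dict]


def iter_objects_by_type(buckets: Iterable[str], list_objects_page: ListPage,
                         content_type: str = "text/csv") -> Iterator[Tuple[str, str]]:
    """Yield (bucket, object_name) for every object whose contentType matches exactly."""
    for bucket in buckets:
        token: Optional[str] = None
        while True:
            page = list_objects_page(bucket, token)
            # "items" is absent when a bucket (or page) has no objects.
            for item in page.get("items", []):
                if item.get("contentType") == content_type:
                    yield bucket, item["name"]
            token = page.get("nextPageToken")
            if not token:  # no more pages for this bucket
                break
```

The match is exact; if some uploads were stored with extra parameters (such as a charset), a prefix check may suit you better.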
You can do the same with the XML API; here is its reference:
https://cloud.google.com/storage/docs/xml-api/reference-methods
Or you can use the gsutil tool:
gsutil ls : lists all the buckets in your project: https://cloud.google.com/storage/docs/listing-buckets
gsutil ls -r gs://[BUCKET_NAME]/** : lists all the objects in a bucket: https://cloud.google.com/storage/docs/listing-objects
For examples of using the API in different programming languages, see the Cloud Storage Client Libraries documentation: https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-nodejs

How do you retrieve thumbnails from the cloud accounts?

When you ask Kloudless to retrieve the files from an account using GET /v0/accounts/{account_id}/folders/{id}/contents/, it only lists the actual files; there are no thumbnail files.
So you cannot use the file contents endpoint, GET /v0/accounts/{account_id}/files/{id}/contents/, because it needs a specific file ID for the thumbnail file, and you don't get one because no thumbnails are listed in the folder contents call.
So how do you retrieve thumbnails for the files?
2016-09 Update: A thumbnails endpoint is now available for select services. The prior SO answer has been preserved below because it describes the File Download endpoint, which is useful for obtaining file contents from services that do not yet support thumbnails.
At the time of writing, the Kloudless API does not support returning thumbnails for files stored in users' cloud storage accounts.
The request that you are making:
GET /v0/accounts/{account_id}/files/{id}/contents/
is a download request, which fetches the full contents of the file.
The file ID can be obtained from the objects listed by the children request you referenced earlier:
GET /v0/accounts/{account_id}/folders/{id}/contents/
This returns a list of file and folder objects, each with the resource's ID and other metadata. The ID in a returned file object can be used in the download request to fetch the contents of that file.
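The two-step flow (list the folder, then download each file by ID) can be sketched in Python. The listing shape used here, an "objects" array whose entries have "id" and "type" fields, is an assumption; check it against the Kloudless API docs. Only the request paths are built; authentication and the HTTP calls are left out.

```python
from typing import Dict, List


def download_paths(account_id: str, folder_listing: Dict) -> List[str]:
    """Build a File Download path for every file object in a folder listing.

    Assumes folder_listing is the parsed JSON returned by
    GET /v0/accounts/{account_id}/folders/{id}/contents/ with an "objects" list.
    """
    return [
        f"/v0/accounts/{account_id}/files/{obj['id']}/contents/"
        for obj in folder_listing.get("objects", [])
        if obj.get("type") == "file"  # skip sub-folders; they have no file contents
    ]
```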

How are file objects uploaded from a website stored?

What design is used to store file objects that are uploaded to a website? For instance, suppose I have a website that accepts documents or images:
Use Case 1.
User logs in, selects an MS Word file on his machine, and uploads it to the website.
Use Case 2.
User logs in, selects an image on his machine, and uploads it to the website.
How do I store these file objects in the database?
The first step is just getting the file from the AngularJS application to the server. This page talks about sending requests to the server from the client and should get you started.
Once you have done that (assuming you are using PHP), you will need to save the resulting file to the database. This post will get you started with saving files to PostgreSQL, but the details will end up being very specific to your situation.
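As a self-contained illustration of the server-side step, the sketch below stores an uploaded file's bytes together with its owner, name, and MIME type in a BLOB column. It uses Python's built-in sqlite3 in place of PHP and PostgreSQL (where the comparable column type would be bytea); the table and function names are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create a table holding each upload's metadata and raw bytes."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS uploads (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id INTEGER NOT NULL,
               filename TEXT NOT NULL,
               content_type TEXT NOT NULL,
               data BLOB NOT NULL
           )"""
    )


def save_upload(conn: sqlite3.Connection, user_id: int, filename: str,
                content_type: str, data: bytes) -> int:
    """Insert one uploaded file (Word document, image, ...) and return its row ID."""
    cur = conn.execute(
        "INSERT INTO uploads (user_id, filename, content_type, data) VALUES (?, ?, ?, ?)",
        (user_id, filename, content_type, data),
    )
    conn.commit()
    return cur.lastrowid


def load_upload(conn: sqlite3.Connection,
                upload_id: int) -> Optional[Tuple[str, str, bytes]]:
    """Return (filename, content_type, data), or None if no such upload exists."""
    return conn.execute(
        "SELECT filename, content_type, data FROM uploads WHERE id = ?", (upload_id,)
    ).fetchone()
```

Keeping the MIME type next to the bytes lets the server send the right Content-Type header when the file is downloaded again.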
If you have more questions after reading through those resources, please add specific details about your setup to your question.

Uploading multiple files to blobstore (redux)

Yes, I've seen this question already, but I'm finding information in the GAE docs that contradicts its accepted answer and Nick Johnson's blog.
The docs talk about uploading more than one file at the same time; the function that gets the uploaded files returns a list:
The get_uploads() method returns a list of BlobInfo objects, one for each uploaded file in the request.
But everywhere I've looked, the going assumption is that only one file at a time can be uploaded, and that a new upload URL needs to be created each time.
Is it even possible to upload more than one file at the same time with HTML5/Flash using Plupload?
Currently, the blobstore service upload URLs only support one file upload per post. In order to upload multiple files, you need to use the pattern documented in my blog posts. In future, we may extend the blobstore API to support more flexible upload URLs, supporting multiple uploaded files in a single request.
Edit: The blobstore now supports multiple file uploads in a single request.
Here's how I use the get_uploads() method for more than one file:
uploads = self.get_uploads()  # list of BlobInfo objects, one per uploaded file
blob_info = uploads[0]
blob_info2 = uploads[1]
Nick Johnson's dropbox service is another example. I hope you find what suits your needs.
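Sending several files in one request means posting a single multipart/form-data body, with one part per file, to the URL returned by blobstore.create_upload_url(); on the server, get_uploads() then returns one BlobInfo per part, as the quoted docs describe. The sketch below builds such a body with the standard library only. The helper name and the field name "file" are assumptions, and the HTTP send itself is omitted.

```python
import uuid
from typing import List, Tuple


def build_multipart(files: List[Tuple[str, str, bytes]],
                    field_name: str = "file") -> Tuple[bytes, str]:
    """Encode several (filename, content_type, data) tuples as one multipart/form-data body.

    Returns (body, content_type_header). field_name is a placeholder for the
    input name your upload form uses. Filenames are not escaped, so this
    assumes plain names without quotes.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for filename, content_type, data in files:
        chunks.append(f"--{boundary}\r\n".encode())  # start of one file part
        chunks.append(
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        chunks.append(f"Content-Type: {content_type}\r\n\r\n".encode())
        chunks.append(data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

Plupload and HTML5 `<input type="file" multiple>` forms produce this same kind of body; the sketch just makes its structure explicit.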
