What is the inverse of blobstore.create_gs_key

What is the inverse of blobstore.create_gs_key - google-app-engine

Is there a way to reverse blobstore.create_gs_key("gs/xxxxxxx"), i.e. to recreate the file id from the returned string?
NB: I need to do this with a blob key (from the datastore) whose blob has already been deleted. I need the file name to retrieve the deleted file from Google Storage versioning.

Related

Identifying duplicate files based on data content in SSIS

I receive files in a shared location. Every file has different metadata, i.e. file name and date created.
I have to extract the data using SSIS if and only if the file content differs from that of previously processed files.

This should be fairly straightforward:
Use a Foreach Loop container configured with the Foreach File enumerator. The folder name would be the shared location. The file name should be a wildcard (for example, *.csv).
Create a table in SQL called LoadedFiles, which will hold the names of the files already loaded. Note that when you create the Foreach Loop container you will also have created a variable that holds the file name. Inside the container, check whether the value of this variable already exists in the LoadedFiles table. Load the file only if it does not.
I've assumed that all the files have the same metadata (column names and data types). Even if they do not, you can employ the same logic.
Also, if it isn't obvious, for this to work you need to insert a new row into the LoadedFiles table every time you do decide to load a file.
EDIT: It seems that, for the OP, the same file name does not imply the same content. In that case, do a MERGE into the SQL table instead of a blind insert.
MERGE on the primary key: when matched, do nothing; when not matched, INSERT.

I found a workaround:
an SSIS Execute Process Task that calls FC.exe.
http://www.howtogeek.com/206123/how-to-use-fc-file-compare-from-the-windows-command-prompt/
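As an alternative to comparing files pairwise with FC.exe, content can be compared through a hash. The following is only a minimal sketch of that idea, not the SSIS solution described above: it uses Python with an in-memory SQLite database standing in for the SQL table, keys LoadedFiles on a SHA-256 digest of the file bytes (an assumption; the original answer keys on the file name), and uses `INSERT OR IGNORE` to play the role of "MERGE, do nothing when matched". The function names are hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the LoadedFiles table, keyed on the content hash (assumed design)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS LoadedFiles ("
        "content_hash TEXT PRIMARY KEY, file_name TEXT NOT NULL)"
    )
    return conn


def should_load(conn: sqlite3.Connection, path: Path) -> bool:
    """Register the file; return True only if its content was not seen before."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO LoadedFiles (content_hash, file_name) VALUES (?, ?)",
        (content_hash(path), path.name),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when the hash already existed.
    return cursor.rowcount == 1
```

A file whose bytes match an earlier file is skipped regardless of its name or creation date, which is the behaviour the question asks for.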

Does the stream_id change if I move the file to some other directory within the same filetable?

I am using MSSQL 2012 and its FileTable feature to store a large number of files in hierarchical directories. I reference the entries in the FileTable from other custom tables via the stream_id column, which is unique for every record in the FileTable. Sometimes I need to move files to another location within the same FileTable. Until now I had observed that the stream_id does not change when I move a file to another directory. However, in the production environment the stream_id does change after the move, so my custom table ends up referencing a nonexistent entry in the FileTable.
For moving the files I am using File.Move(source, target);
Is there something wrong with the deployment of the FileTable in my production environment, or is it expected that the stream_id can sometimes change when I change the location?
I haven't found any reference on the internet regarding the stream_id and its lifetime.

The stream_id is the pointer to the blob, the path_locator is the PK of the table. The former refers to the file no matter where it is in the system, the latter refers to whatever file is currently occupying the space at that path location.
My understanding is that the stream_id will not change for the life of that file. If you are seeing the "same" file with a different stream_id, consider whether an application (like MS Word) created a temp file and then renamed the temp file to the current file name when you saved. This would result in the new file having the same path_locator but a different stream_id.

Database With Users and Images

How do I design a database with users and images so that I know which images belong to each user? For example, the database contains a user (id=1, name=Peter), another user (id=2, name=Alex), and so on. How do I use foreign keys and primary keys to record which images belong to which user?

I suggest storing the location of the images in the database in a VARCHAR column instead of a BLOB or other binary data type. A column for the image URL or path is enough; there is no need for a separate table.
Storing only the location greatly reduces the size of the database, and updating or replacing an image is much simpler, because it is just a file operation instead of a massive update/insert/delete in the database.
Note: before saving an image to the actual folder, don't forget to rename it, because uploaded images can have the same name. Try using a GUID, as it gives a unique value.
http://blog.sqlauthority.com/2007/12/13/sql-server-do-not-store-images-in-database-store-location-of-images-url/
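A minimal sketch of this advice, assuming Python with SQLite purely for illustration. The answer above suggests a single path column; the separate images table with a foreign key to users is an assumption added here for the case where one user owns several images. The table, column, and function names are hypothetical.

```python
import sqlite3
import uuid
from pathlib import Path

# users.id is the primary key; images.user_id is the foreign key pointing at it.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
CREATE TABLE images (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    path    VARCHAR(260) NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)", [(1, "Peter"), (2, "Alex")]
    )
    return conn


def unique_image_name(uploaded_name: str) -> str:
    """Replace the uploaded file name with a GUID, keeping the extension."""
    return f"{uuid.uuid4().hex}{Path(uploaded_name).suffix.lower()}"


def add_image(conn: sqlite3.Connection, user_id: int, uploaded_name: str,
              folder: str = "images") -> str:
    """Store only the image location; the file itself lives on disk."""
    stored_path = f"{folder}/{unique_image_name(uploaded_name)}"
    conn.execute(
        "INSERT INTO images (user_id, path) VALUES (?, ?)", (user_id, stored_path)
    )
    return stored_path


def images_of(conn: sqlite3.Connection, user_id: int) -> list[str]:
    """Return the stored paths of all images belonging to one user."""
    rows = conn.execute(
        "SELECT path FROM images WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]
```

For example, calling `add_image(conn, 1, "holiday.jpg")` twice stores two different paths for Peter, and `images_of(conn, 1)` returns both.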

how to access raw data in opensearchserver?

I searched the documentation and cannot find where it stores all the data.
I want to access all crawled data in order to do my own processing.

The file StartStopListener sets up the index directories: look for the value of the environment variables OPENSEARCHSERVER_DATA, OPENSEARCHSERVER_MULTIDATA, or OPENSHIFT_DATA_DIR.
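To see which of these variables is set on a given machine, a small check like the following can help. It is only a sketch in Python: the helper name is hypothetical, and it makes no claim about the order in which OpenSearchServer itself consults the variables.

```python
import os

# The three variable names mentioned above.
CANDIDATE_VARIABLES = (
    "OPENSEARCHSERVER_DATA",
    "OPENSEARCHSERVER_MULTIDATA",
    "OPENSHIFT_DATA_DIR",
)


def find_data_directories(environ=os.environ) -> dict:
    """Return {variable: value} for every candidate that is set and non-empty."""
    return {name: environ[name] for name in CANDIDATE_VARIABLES if environ.get(name)}


if __name__ == "__main__":
    for name, value in find_data_directories().items():
        print(f"{name} = {value}")
```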
Now, whether you'll be able to parse the files easily/correctly is another debate: I haven't ever tried to directly open a search server's indexes by hand, and I don't know that the index format is well documented.

By default, the crawled data are not stored. Only the extracted text is stored. It is possible to store the crawled data, here is the process:
Create a new field: Set the "stored" parameter to yes or to compressed.
Go to the Schema / Parser List
Edit the HTML parser
In the "Field Mapping" tab, link the parser field "htmlSource" to the new field.
Restart the indexing process. From then on, all crawled data will be copied to this field. Don't forget to add it as a returned field in your query.

How to insert existing documents stored on NTFS into SQL Server FILESTREAM storage

I am investigating FILESTREAM (asking on Stack Overflow while reading whitepapers and searching Google). In my current scenario, documents are managed this way:
1) I have a DB table where I keep the document id and the doc path (like \fileserver\DocumentRepository\file000000001.pdf)
2) I have a document folder (\fileserver\DocumentRepository) where I store the documents
Of course I need to change this to a varbinary(max)/filestream storage.
What is the best way to perform this task?
Is it possible to declare that "\fileserver\DocumentRepository\file000000001.pdf" is assigned to a varbinary(max) field, or do I have to insert it explicitly? In other words, can I somehow tell the varbinary(max) field: "now you are a pointer to the existing document"?

You can't assign an existing file to a varbinary(max)/filestream value. You have to explicitly insert it.
That being said, if for some reason this is not an option for you (e.g. you can't afford to copy huge amounts of data, or you would hit a disk space problem while copying), there are some hacks to do the migration without copying any data. The trick would be to do the following steps:
Switch the DB to simple recovery model.
Insert placeholder filestream files for all the files you're about to migrate. When inserting, use a varbinary value of 0x. While inserting, collect the (document id/file path) => (filestream file name) pairs.
Stop Sql Server.
Overwrite the empty filestream files with the real documents (use move/hard links to avoid copying data).
Start Sql Server, perform some sanity checks (DBCC) and start a new backup chain.
Obviously, this hack is not recommended and is prone to database corruption. :)
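The overwrite step could be sketched as follows, in Python and under loudly stated assumptions: SQL Server is stopped, the real document and the placeholder live on the same volume (hard links cannot cross volumes, so a document on a remote file server would have to be moved instead), and the document-to-placeholder pairs were collected during the insert step. The function name is hypothetical, and the same warning about corruption applies.

```python
import os
from pathlib import Path


def replace_with_hard_link(document: Path, placeholder: Path) -> None:
    """Make `placeholder` refer to the same data as `document`, without copying.

    Assumes both paths are on the same volume and nothing has the
    placeholder open (i.e. SQL Server is stopped).
    """
    temporary = placeholder.with_name(placeholder.name + ".tmp")
    # Create a second directory entry for the document's data.
    os.link(document, temporary)
    # Swap the link into place, replacing the empty placeholder file.
    os.replace(temporary, placeholder)
```

After the call both names refer to the same file data, so no bytes are copied and the original document path keeps working until it is deleted.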