Databricks Filesystem - %sh ls vs %fs ls - filesystems

I have some files that show up in %sh ls, and I want to move them into the Databricks filesystem (so that they are visible in %fs ls).
Does anyone know what the difference between %sh ls and %fs ls is, and how I can move files between them?
I know we can use dbutils.fs.cp to copy files that are already in the %fs ls location.
Any help or pointers are appreciated.

When you execute commands via %sh, they run on the driver node only and show the content of that machine's local filesystem. %fs ls by default shows the content of DBFS (Databricks File System), but it can also show local content if you add the file:// prefix to the path.
You can copy or move files as follows:
Using dbutils.fs.cp("file:///local-path", "dbfs-path") (or dbutils.fs.mv)
By using the so-called FUSE mount, which exposes DBFS on the local machine: add the /dbfs/ prefix to the path that you want to have on DBFS, for example /dbfs/FileStore/.... (If you're using Community Edition with DBR >= 7.x, this may not work, so only the first method is available.)
P.S. You can find more information in the documentation.
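Both methods above can be sketched in Python. This is a minimal sketch, not an official recipe: the helper names local_uri and copy_via_fuse, the fuse_root parameter and the example path /tmp/report.csv are hypothetical, and dbutils itself exists only inside a Databricks notebook, so that call is left as a comment.

```python
import shutil
from pathlib import Path


def local_uri(path: str) -> str:
    """Build the file:// URI form of a driver-local path (method 1)."""
    return Path(path).absolute().as_uri()


def copy_via_fuse(src: str, dbfs_path: str, fuse_root: str = "/dbfs") -> str:
    """Copy a driver-local file to DBFS through the FUSE mount (method 2).

    fuse_root is a parameter only so the function can be tried outside
    Databricks; on a cluster with the FUSE mount it would stay "/dbfs".
    """
    dest = Path(fuse_root) / dbfs_path.lstrip("/")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dest)
    return str(dest)


# Method 1, inside a Databricks notebook where `dbutils` is predefined:
# dbutils.fs.cp(local_uri("/tmp/report.csv"), "dbfs:/FileStore/report.csv")
```

With the FUSE mount available, copy_via_fuse("/tmp/report.csv", "/FileStore/report.csv") would write to /dbfs/FileStore/report.csv, which should then be listed by %fs ls /FileStore.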

Related

Docker File No such file or directory - Absolute Path issue

I'm writing because I have a strange problem with the Dockerfile build process.
The problem concerns the Dockerfile build context. As far as I understand, the directory context that I can access from a Dockerfile is one directory up and one directory down.
Example Directory Tree
A - B - C - D - E
If my Dockerfile is in C
I can access B and D
But I can't access A or E
I have a problem because this is my case:
My Dockerfile is in C
And I need to access files from B, D and E
And I really don't know how to do it
I need to access E
Because my target jar is in E
And I need to ADD this file to implement Docker hot deploy with Spring Dev Tools
Something like this in the Dockerfile:
ADD .\D\E\jar.file jar.file
ENTRYPOINT xxx
EXPOSE xxx
And I still need to access B to get some other files.
Is that clear?
Sorry, I know this is strange.
Just because you can do something does not mean it is right; if something is not recommended, issues can arise.
If you read the General guidelines and recommendations, they recommend keeping everything the build needs inside the build context, so why do you need to copy things from outside it? In any case this is not possible on Linux, because Docker can only copy from the build context, so it is better to keep your jar file inside the Dockerfile's context.
Understand build context
When you issue a docker build command, the current working directory
is called the build context. By default, the Dockerfile is assumed to
be located here, but you can specify a different location with the
file flag (-f). Regardless of where the Dockerfile actually lives, all
recursive contents of files and directories in the current directory
are sent to the Docker daemon as the build context.
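Going by the quoted rule, what matters is the directory passed as the build context, not where the Dockerfile sits. Assuming the tree in the question is nested as A/B/C/D/E, running the build from B and pointing the -f flag at C/Dockerfile would put B, C, D and E inside the context, while A stays outside. The check below is a minimal sketch of that containment rule; the function name in_build_context and the example paths are hypothetical.

```python
from pathlib import Path


def in_build_context(context: str, target: str) -> bool:
    """Return True if target is the context directory or lies beneath it."""
    ctx = Path(context).resolve()
    tgt = Path(target).resolve()
    return tgt == ctx or ctx in tgt.parents


# Context rooted at B: the jar under E is reachable, files in A are not.
print(in_build_context("/A/B", "/A/B/C/D/E/app.jar"))
print(in_build_context("/A/B", "/A/settings.xml"))
```

Inside the Dockerfile, source paths for ADD or COPY are then written relative to the context root (B in this sketch), not relative to the Dockerfile's own directory.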

File extracted from ZIP not recognized until re-save

Q: Why would re-saving a file be different vs a direct extraction from a zip file? Particularly on Windows?
Context
I have an angular application that prepares a text file for import into a commercial machine. For user convenience, we provide the file inside a zip file so that the required folder structure can be provided to the user. They write this file to a USB drive and use that to import into the machine.
Problem
If the downloaded zip file is extracted directly onto the USB (to get the file and the required folder structure), the machine cannot recognize the embedded text file.
Troubleshooting
If I open the file in any text editor, add a space, delete the space, and re-save the file on the USB, then the machine will recognize the file. Alternatively, if I extract the zip onto the local file system, then copy the folder structure from the local file system to the USB, then the machine also will recognize it.
If I switch to Linux, then a 'write out' from nano will fix the file. If I use the touch command on the file, the problem remains.
Suspecting a whitespace/line-ending issue, I've tried several diff tools which reveal no apparent differences:
$ diff original.txt resaved.txt (Linux)
$ vbindiff original.txt resaved.txt (Linux)
> fc /b original.txt resaved.txt (Windows 7)
Other info:
Angular version: 5.2.10
Zip Utility in angular: JSZip 3.1.5
Unzip Utils: 7-Zip and Native Windows Explorer extract
JSZip code:
const zip = new JSZip();
zip.folder('FolderA/FolderB/FolderC').file('FILE.TXT', new File([contentString], 'TEMP.TXT', { type: 'text/plain' }));
zip.generateAsync({ type: 'blob' })
  .then(function (content) {
    saveAs(content, 'ZipFile.ZIP');
  });
At this point, I'm out of ideas. Hoping someone here may have some insight into this odd behavior.
TL;DR: Check the file attributes (e.g. Archive, Read-Only, Hidden, System, etc).
Our system was specifically looking for the Archive bit and modifying the file in any way set this bit.
This was an ugly one to ferret out, but chatting with our embedded systems programmer for a bit led to the answer.
Our machine was specifically searching for the archive bit (a Windows file attribute) when it was searching for files to import. This bit is a relic of the DOS/FAT file attributes (NTFS carries it too) and is nearly obsolete. For all intents and purposes it is a dirty flag used to mark files that should be archived or backed up in the next backup run. There are much better ways to do this, so it has fallen out of style.
However, for whatever reason, our system is searching only for files with that bit set. That's why opening/copying/moving the file would fix the problem, because altering it in any way set this archive bit (dirty flag).
If you want to learn more about it, see here and here.
So, the moral of the story is to check these file attributes if you have a similar issue.
We are using the Harmony USB driver from Microchip, so this may be a nuance of that tool (or maybe just an artifact from one of the online examples).
You can see this attribute using the file properties in Windows Explorer or with the > attrib <file> command in the Windows command prompt.
To fix:
Windows: You can set the value from the command prompt using > attrib +a <file> or remove it using > attrib -a <file>.
If using node.js on a Windows host, you can use the winattr library from NPM to manipulate these attributes.
Linux: You can use $ getfattr and $ setfattr to set the bit (see here and here).
Note: the answers I linked say to use $ setfattr -h -v 0x00000020 -n system.ntfs_attrib_be <target-file>, but I got an "operation not supported" error when I tried to do the same. I ended up using the Java solution, but when I inspected the file afterward, it seemed the equivalent command would have been $ setfattr -n user.DOSATTRIB -v 0sMHgyMAA= <target-file>. Your mileage may vary, but I offer it in case it helps anyone.
Java: You can also use Java from any system.
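As a cross-check on the values quoted above, here is a minimal Python sketch (the function names are hypothetical). In setfattr, a value prefixed with 0s is base64-encoded, so the string after 0s in the user.DOSATTRIB command can be decoded and compared with the archive flag 0x20 that appears in the other command. On a Windows host, os.stat(path).st_file_attributes returns the same kind of bitmask for a real file; that field is Windows-only, so the sketch works on plain integers.

```python
import base64
import stat


def has_archive_bit(attributes: int) -> bool:
    """Check the archive flag (0x20) in a Windows file-attribute bitmask."""
    return bool(attributes & stat.FILE_ATTRIBUTE_ARCHIVE)


def decode_dosattrib(value_b64: str) -> int:
    """Decode the base64 payload that follows the `0s` prefix in setfattr."""
    raw = base64.b64decode(value_b64)  # NUL-terminated ASCII hex string
    return int(raw.rstrip(b"\x00").decode("ascii"), 16)


attrs = decode_dosattrib("MHgyMAA=")
print(hex(attrs), has_archive_bit(attrs))  # prints: 0x20 True
```

So the two setfattr commands appear to encode the same thing, the archive bit alone, just under different attribute names and encodings.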

can't execute external command in cgi

I have this line of code in my .pl file:
system("./read_image http://echopaw.fr/wp-content/uploads/2015/09/animals-41.jpg");
read_image is a C executable. It works well from the command line, but when I run my .pl file on the web server, this line doesn't work. Its job is to write some data to a file, so I can tell whether it ran.
I also tried backticks (``), but it still didn't work.
Does anyone have any ideas?
I think it's because when your web server runs your CGI, it does so from some weird directory (/var/www/htdocs or / or whatever). Your read_image is then also expected to be in that directory because of the ./read_image.
./ means current directory, which is not necessarily the directory where your .cgi is located.
I'd suggest using the Perl module FindBin:
your cgi:
...
use strict;
use warnings;
use FindBin;
...
system("$FindBin::Bin/read_image http://echopaw.fr/wp-content/uploads/2015/09/animals-41.jpg");
$FindBin::Bin resolves to the path where your .cgi is located, no matter where you call it from. It's quite a handy module and can also be used to extend the @INC path to find your own modules:
use lib "$FindBin::Bin/../lib";
use MyModule;
This way MyModule.pm is expected to be in $current_script_path/../lib. Very convenient.
Addendum
As the discussion evolves, this is apparently not only a problem of whether Apache can find the read_image command, but also of whether read_image in turn can find the wget command which it tries to execute.
As @CDahn already noted in a comment, Apache runs CGI scripts and the like with a limited environment for security reasons. If you run read_image from your shell, you have a fully working environment with, say, a PATH that includes 15 different directories (like /usr/bin:/opt/bin:/usr/share/bin:/usr/local/bin:..., whatever). When Apache runs your scripts, PATH may contain only the 2 or 3 directories that are absolutely necessary, e.g. /usr/bin:/usr/local/bin. The same applies to other environment variables.
You can verify this with a little .cgi script that simply echoes the current environment, like
print "Content-type: text/plain\n\n";  # CGI header, so the server accepts the response
while (my ($key, $value) = each %ENV) {
    print "$key=$value\n";
}
(Taken from http://perldoc.perl.org/functions/each.html)
Call this .cgi from your browser and you'll see the difference.
If your read_image cannot find the wget command, it is probably because wget is located in a directory that is not in Apache's PATH. Instead of teaching Apache about the additional directory (bad idea), I would give the full path to the wget command in your read_image program. From your terminal window run
$ which wget
/usr/bin/wget
and use that very path when calling wget from your read_image.
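The two ideas in this answer, anchoring a helper to the script's own directory and not relying on a trimmed-down PATH, are not specific to Perl. Purely as an illustration, here is a minimal Python sketch; the names beside_script and find_tool and the example paths are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def beside_script(script_file: str, name: str) -> str:
    """Absolute path of `name` in the directory that holds script_file.

    Plays the role of $FindBin::Bin: the result does not depend on the
    current working directory the web server happened to choose.
    """
    return str(Path(script_file).resolve().parent / name)


def find_tool(name: str, search_path: str) -> Optional[str]:
    """Look up an executable in an explicit PATH-style string."""
    return shutil.which(name, path=search_path)


# Same lookup under a rich interactive PATH and under a restricted one.
print(find_tool("wget", "/usr/bin:/opt/bin:/usr/local/bin"))
print(find_tool("wget", "/usr/local/bin"))
print(beside_script("/var/www/cgi-bin/script.py", "read_image"))
```

If the second lookup returns None while the first returns a path, that would be the situation described above: the tool exists, but not in any directory the server's PATH includes, which is why hard-coding the full path is the more robust choice.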

How to bind root (/) to itself with fuse on linux?

I'm writing a FUSE file system that mounts a directory onto itself. I want to log some calls (flush, for example). I started by adapting the FUSE tutorial sample code. If I bind any ordinary directory, it works great:
./bbfs -o nonempty ./test ./test
but if I try to bind the root directory ("/") in particular:
sudo ./bbfs -o nonempty / /
not a single line is written to the log file.
Is it possible?
This is my mangled version of the sample program. I've changed only the bbfs.c file.
You can't mount a FUSE filesystem (or any other type of filesystem, for that matter) at /, because your root filesystem is already there.
Doing so would be disastrous anyway, as mounting a filesystem at a path makes any files which previously existed under that path inaccessible. You can't use FUSE as a filter like this -- you will need to find another solution to whatever it is you're trying to do.

vim cannot connect to cscope database

I have openSUSE 11.4 installed, with Vim version 7. I normally use it to browse the Linux kernel source. I generated the cscope database inside a directory within my home folder, i.e. /home/aijazbaig1/cscope_DB/, and got 3 files, namely cscope.out, cscope.po.out and cscope.in.out, besides the cscope.files file, which contains a list of all the relevant files that I want to search.
Additionally I have added the following to my .bashrc:
CSCOPE_DB=/home/aijazbaig1/cscope_DB/cscope.out
export CSCOPE_DB
But when I do a :cscope show from within vim, it says there are no connections. Can anyone please let me know what is going wrong?
Keen to hear from you,
This is mentioned in the comments above, but I want to make sure it's preserved in an answer.
The issue that came up for me was that vim didn't know where to look for the cscope database. When I added
cs add $CSCOPE_DB
to my .vimrc, everything worked fine.
I figure since I've made the visit, I would try responding.
I was getting this error when searching using ctrl-space s (or any search for that matter):
E567: no cscope connections
I finally found the full solution at http://cscope.sourceforge.net/cscope_vim_tutorial.html, Step 11.
The idea is that you create a list of source files to be included in the view of cscope, generate the cscope.out in the same location, and update the export path accordingly:
find /my/project/dir -name '*.c' -o -name '*.h' > /foo/cscope.files
cscope -R -b (this may take a while depending on the size of your source)
export CSCOPE_DB=/foo/cscope.out (put this in your .bashrc/.zshrc/other-starting-script if you don't want to repeat this every time you log into the terminal)
You need to add a "cscope connection", like this in vim:
:cscope add $PATH_TO_CSCOPE.out
See :help cs for more examples.
Here's how I explore linux kernel source using cscope:
I use vim as my editor.
While standing inside the kernel source root directory, run cscope in interactive mode while recursively going through subdirectories during search for source files:
cscope -R
When run for the first time, it will generate the database file with the name: cscope.out inside the current directory. Any subsequent runs will use the already generated database.
Search for anything or any file and open it.
Set cscope tags in vim to make the :tag and CTRL-] commands search through cscope first and then ctags' tags:
:set cscopetag
Set cscope database inside current VIM session:
:cs add cscope.out
Now you can use CTRL-] and CTRL-t as you would do in ctags to navigate around! :)
I have the same issue on my PC. For now, to solve the issue:
In a terminal, run: which cscope
Edit your .vimrc and add the path it prints: set csprg=/usr/bin/cscope
I ran into a similar problem with no cscope connections on Ubuntu 18.04, and then discovered that my .vimrc does not load the CSCOPE_DB variable. I looked around a little and found a solution.
You can just copy this directly into your .vimrc file.
Part of the code loads your cscope file from your directory. The keybinds are just a nice bonus.
Hope this helps.

Resources