I'm attempting to validate that a file that is uploaded to my application is actually that file to increase security.
For image formats, we can check the magic bits and the file extension to determine the format.
I'm looking to do the same for a CSS file. From my understanding, there's no magic bits for a CSS file. The extent of what I could check would be the magic bits from a UTF-8 formatted file, which wouldn't protect against scripts in the event of an injection flaw.
Currently we validate the file extension is correct, but if you were to change the file extension, any file could be uploaded.
Is there a best practice way of validating a CSS file?
Can you read file content and use regex to check for valid css styles, such as:
([#.#]?[\w.:> ]+)[\s]{[\r\n]?([A-Za-z\- \r\n\t]+[:][\s]*[\w .\/()\-!]+;[\r\n]*(?:[A-Za-z\- \r\n\t]+[:][\s]*[\w .\/()\-!]+;[\r\n]*(?2)*)*)}
See:
https://regex101.com/r/fK9mY3/1
Related
As a general question: What's the role of file extension when determining file types?
For example, I can change .jpeg file to .png extension and even .txt. Of course, in the case of changing to .txt, it will neither be opened as picture, nor readable.
To determine file type, it seems the safe way is to parse the first few bytes of the file. If extension is not trustable, extension is no more than file name.
As a general rule, you should ALWAYS parse the COMPLETE file in order to be sure that the file is what the extension says. As you can easily imagine, it is pretty simple to create a binary file resembling a e.g. BMP (with a correct header) but then containing something different.
You should never trust the extension neither the header because otherwise a malicious user could exploit some of your code to generate e.g. a buffer overflow, and this is absolutely paramount if you are writing programs that must run at root/admin privilege.
Having said the obvious, the file extension nowadays is mainly used so that the OS can associate a program to that particular file (usually calling the program and passing the selected file as first parameter), and then it's up to the program to determine the file content.
It is a little bit different when talking about executable files. Under Unix, in order to be executable a file has to have the "x" flag set, otherwise it would not run, regardless of the extension. Under Windows, there is not such thing and the OS relies on only a few extensions (EXE, COM, BAT, etc.) to determine which files can be executed.
The EXE file, for example, has to start with "MZ" followed by some information for its allocation and size (http://www.delorie.com/djgpp/doc/exe/) and the OS surely checks its internal headers. Other formats (e.g. the COM executable format of the MS-DOS era) is just "pure" assembly code, so there is no check done by the OS. It just interprets those opcodes, hoping that everything will be fine.
So, to summarize:
File extension is mainly used so that the OS can call the appropriate program to open it (and passing the filename as the first parameter, argc/argv in C language for example)
Windows relies on some file extension to know if a file is executable, while Unix/Mac relies on a particular flag (x) associated with the file
Two things that are not well known about file extensions: directory names can have extension too, and extension can be way longer than the usual 3 characters.
With the help of file extension, you know how to read the first few and all the rest of the bytes. You also know what program to use to read the file. Or if it is an executable, you know that it is to be executed and not shown as a picture.
Yes you can change the file extension, but what does it mean then? It only means that OS (or any program that tried to read the file) is working correctly. Only you are providing bad data to it.
File extension is not something that some bytes of data inherently have. Extensions are given to those bytes depending upon the protocol followed to write them that way. After you have encoded the letters in binary form, you provide that binary form with .txt extension so that the text reader knows that these bytes convert to letters. That's the role of file extension. With bad file extension, this role is not fulfilled, resulting in incomprehension of the data you saved in binary.
As a general question: What's the role of file extension when determining file types?
The file extension usually identifies the application that opens a file.
If you rename a .JPG to a .PNG and while having JPG and PNG opened by the same application (usually an image viewer) that application can read the image stream and process it correctly regardless of having an incorrect file stream.
The problem arises if you rename the file in such a way that the file gets routed to an application that cannot handle the file's content.
If you rename a .DOCX (word) file to an Autocad extension (.DWG), opening the word file in autocad is likely to produce errors (unless per chance autocad can read word files).
How Application will detect file extension?
I knew that every file has header that contains all the information related to that file.
My question is how application will use that header to detect that file?
Every file in file system associated some metadata with it for example, if i changed audio file's extension from .mp3 to .txt and then I opened that file with VLC but still VLC is able to play that file.
I found out that every file has header section which contains all the information related to that file.
I want to know how can I access that header?
Just to give you some more details:
A file extension is basically a way to indicate the format of the data (for example, TIFF image files have a format specification).
This way an application can check if the file it handles is of the right format.
Some applications don't check (or accept wrong) file formats and just tries to use them as the format it needs. So for your .mp3 file, the data in this file is not changed when you simply change the extension to .txt.
When VLC reads the .txt byte by byte and interprets it as a .mp3 it can just extract the correct music data from that file.
Now some files include a header for extra validation of what kind of format the data inside the file is. For example a unicode text file (should) include a BOM to indicate how the data in the file needs to be handled. This way an application can check whether the header tag matches the expected header and so it knows for sure that your '.txt` file actually contains data in the 'mp3' format.
Now there are quite some applications to read those header tags, but they are often specific for each format. This TIFF Tag Viewer for example (I used it in the past to check the header tags from my TIFF files).
So or you could just open your file with some kind of hex viewer and then look at the format specifications what every bytes means, or you search Google for a header viewer for the format you want to see them.
I want to send a file as an email attachment, but at present there is an email filter that prevents that. Is there a simple method or library to encapsulate a file of any length inside an uncompressed ZIP file? I'd like to avoid adding an actual ZIP library that compresses, if I can. For one thing, the file I'm sending is already compressed.
The zip format has a stored method (method 0) that would allow you to simply enclose the file in the appropriate headers. See the PKWare appnote.txt for a description of the format. You would need to calculate the CRC-32 of the data to include in the headers.
I have a binary file (.bin) and a (.txt) file.
Using Python3, is there any way to combine these two files into one file (WITHOUT using any compressor tool if possible)?
And if I have to use a compressor, I want to do this with python.
As an example, I have 'file.txt' and 'file.bin', I want a library that gets these two and gives me one file, and also be able to un-merge the file.
Thank you
Just create a tar archive, a module that let's you accomplish this task is already bundled with Cpython, and it's called tarfile.
more examples here.
there are a lot of solutions for compressing!
gzip or zlib would allows compression and decompression and could be a solution for your problem.
Example of how to GZIP compress an existing file from [http://docs.python.org]:
import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()
but also tarfile is a good solution!
Tar's the best solution to get binary file.
If you want the output to be a text, you can use base64 to transform binary file into a text data, then concatenate them into one file (using some unique string (or other technique) to mark the point they were merged).
Is there any way to decode a PICS Rules File?
Contains settings and preferences used by the Windows operating system.
The file is given by:
Q:F]9y3n5|dp:n4hU)qbs;!&-bzh
The PICSRules spec defines what PICS rules should look like. I don't think your file is one of those.
Is that the entire file contents above?
Are you sure that it's definitely a PICS rules file? There are many file types which use the .prf extension, perhaps it's one of those.