Is there an elegant way to check file integrity with md5 in ansible using md5 files fetched from server?

I have several files on a server that I need to download from an ansible playbook, but because the connection has good chances of interruption I would like to check their integrity after download.
I'm considering two approaches:
Store the md5 of those files in ansible as vars
Store the md5 of those files on the server as files with the extension .md5. Such a pair would look like: file.extension and file.extension.md5.
The first approach introduces overhead in maintaining the md5s in ansible: every time someone adds a new file, they need to make sure they add the md5 in the right place.
Its advantage is that a solution already exists: the built-in check of the get_url module, used with checksum=md5. E.g.:
- get_url: url=http://example.com/path/file.conf dest=/etc/foo.conf checksum=md5:66dffb5228a211e61d6d7ef4a86f5758
The second approach is more elegant and narrows the responsibility. When someone adds a new file on the server, they make sure to add the .md5 as well and don't even need to touch the ansible playbooks.
Is there a way to use the checksum approach to match the md5 from a file?

If you wish to go with your method of storing the checksum in files on the server, you can definitely use the get_url checksum arg to validate it.
Download the .md5 file and read it into a var:
- set_fact:
    md5_value: "{{ lookup('file', '/etc/myfile.md5') }}"
And then when you download the file, pass the contents of md5_value to get_url:
- get_url:
    url: http://example.com
    dest: /my/dest/file
    checksum: "md5:{{ md5_value }}"
    force: true
Note that it is vital to specify a path to a file in dest; if you set this to a directory (and have a filename in url), the behavior changes significantly.
Note also that you probably need the force: true. This will cause a new file to download every time you run it. The checksum is only triggered when files are downloaded. If the file already exists on your host it won't bother to validate the sum of the existing file, which might not be desirable.
To avoid the download every time you could stat to see if the file already exists, see what its sum is, and set the force param conditionally.
- stat:
    path: /my/dest/file
  register: existing_file
- set_fact:
    force_new_download: "{{ existing_file.stat.md5 != md5_value }}"
  when: existing_file.stat.exists
- get_url:
    url: http://example.com
    dest: /my/dest/file
    checksum: "md5:{{ md5_value }}"
    force: "{{ force_new_download | default(false) }}"
Also, if you are pulling the sums/artifacts from some sort of web server, you can get the value of the sum straight from the URL without first downloading the .md5 file to the host. Here is an example using a Nexus server that hosts the artifacts and their sums:
- set_fact:
    md5_value: "{{ item }}"
  with_url: "http://my_nexus_server.com:8081/nexus/service/local/artifact/maven/content?g=log4j&a=log4j&v=1.2.9&r=central&e=jar.md5"
This could be used in place of using get_url to download the md5 file and then using lookup to read from it.

With the stat module:
- stat:
    path: "path/to/your/file"
  register: your_file_info
- debug:
    var: your_file_info.stat.md5

An elegant solution uses the following three features provided by Ansible itself:
http://docs.ansible.com/ansible/stat_module.html
Use the stat module to extract the md5 value of the file and register it in a variable.
http://docs.ansible.com/ansible/copy_module.html
When using the copy module to copy the file from the server, register the md5 it returns in another variable.
http://docs.ansible.com/ansible/playbooks_conditionals.html
Use a conditional (when) to compare the two variables and report whether the file was copied properly.
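The steps above could be sketched roughly as follows. This is a variant, not necessarily what the answer's author had in mind: instead of relying on the value returned by copy, it runs stat on both sides with checksum_algorithm: md5 and compares the two results. The paths files/file.conf and /etc/foo.conf are placeholders chosen for illustration.

```yaml
- name: Checksum the source file on the control node
  stat:
    path: "{{ playbook_dir }}/files/file.conf"   # hypothetical source path
    checksum_algorithm: md5
  delegate_to: localhost
  register: source_info

- name: Copy the file to the managed host
  copy:
    src: "{{ playbook_dir }}/files/file.conf"
    dest: /etc/foo.conf                           # hypothetical destination

- name: Checksum the copied file on the managed host
  stat:
    path: /etc/foo.conf
    checksum_algorithm: md5
  register: dest_info

- name: Report whether the copy is intact
  debug:
    msg: "file copied properly"
  when: dest_info.stat.checksum == source_info.stat.checksum
```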

Another solution is to use url lookup (tested on ansible-2.3.1.0):
- name: Download
  get_url:
    url: "http://localhost/file"
    dest: "/tmp/file"
    checksum: "md5:{{ lookup('url', 'http://localhost/file.md5') }}"
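One thing to watch for: .md5 files produced by tools such as md5sum usually contain the hash followed by the file name, which would not be a valid checksum value on its own. A minimal sketch, assuming the server publishes the file in that "hash, whitespace, name" layout, keeps only the first whitespace-separated token:

```yaml
- name: Download and verify against the published .md5 file
  get_url:
    url: "http://localhost/file"
    dest: "/tmp/file"
    # split() | first keeps only the hash when the .md5 file also lists a file name
    checksum: "md5:{{ lookup('url', 'http://localhost/file.md5').split() | first }}"
```

If the .md5 file holds nothing but the hash, the expression still yields that hash, so the same task should cover both layouts.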

Wrote an ansible module with the help of https://pypi.org/project/checksumdir
The module can be found here
Example:
- get_checksum:
    path: path/to/directory
    checksum_type: sha1  # one of sha1, md5, sha256, sha512
  register: checksum

Related

ansible loop over file list and check file exists, if not download it

I'm not sure how to implement this logic. I know how to do it for a single file:
- name: Obtain information about a file
  win_stat:
    path: 'C:\myfile.txt'
  register: fileinfo
- [...]
  when: not fileinfo.stat.exists
How should I do this for a list of files?
If you just want to reduce the steps for doing this, you should be able to do your download step (not shown in your example) with ignore_errors: yes on your download commands. If you use a combination of ignore_errors: yes and register, you can even tell whether the command failed.
If you're looking to make it a bit more efficient, you can do the stat in a single task and then examine the results. When you run a task over a list and register it, the registered variable holds a results list with one entry per item.
Assuming you have a list of file names/paths in target_files, you run stat over it and then loop over the results (each of which conveniently carries the original file name in item).
- name: Check to see if file exists
  stat:
    path: "{{ remote_dir }}/{{ item }}"
  register: stat_results
  with_items: "{{ target_files }}"
  ignore_errors: True
- name: perform operation
  fetch:
    src: "{{ remote_dir }}/{{ item.item }}"
    dest: "{{ your_dest_dir }}"
    flat: yes
  with_items: "{{ stat_results.results }}"
  when: item.stat.exists == False
In this case, the assumptions are that remote_dir contains the remote directory on the host, target_files contains the actual file names, and your_dest_dir contains the location you want the files placed locally.
I don't do much with Windows and Ansible, but win_stat is documented pretty much the same as stat, so you can likely just replace that.
Also note that this expects a list of files, not a glob. If you use a glob (for example, you want to retrieve all files with a certain extension from the remote), then you would not use the with_items clause, and you'd need to use item.stat.filename and/or item.stat.path to retrieve the file remotely (since item.item would contain the requested item, which would be the glob).

Ansible copy doesn't set file mode correctly

I've got an Ansible script which, among many other things, copies some files to the server:
- name: copy vhost basic files to folder
  copy:
    src: "{{ item }}"
    dest: /var/www/vhosts/mmpew/
    mode: 664
    owner: "{{ deploy_user }}"
    group: "{{ deploy_user }}"
  with_fileglob:
    - ../files/vhost/*
Locally on my Macbook the files have the permissions -rw-r--r--, but even though I set the mode in the ansible script to 664, the resulting files on the server have the permissions -r-----rwt.
Why oh why do the resulting files on the server not match either the mode set in the ansible script, or the original permissions from my local filesystem from which they are copied?
I even tried to set the mode correctly using the Ansible file module:
- name: Make sure the files I just uploaded are chmodded correctly
  file:
    path: /var/www/vhosts/mmpew/{{ item }}
    mode: 644
  with_items:
    - the.txt
    - files.php
    - here.py
but even though I get no errors from Ansible, the file modes are not set correctly.
Could anybody enlighten me as to what is wrong here? All tips are welcome!
Use mode: 0644
The 0 is necessary.
You can specify the mode symbolically:
mode: u=rw,g=r,o=r
This is more readable and less error-prone. Symbolic mode is supported by Ansible >= 1.8, according to the documentation.
There are two ways to define the mode.
First:
mode: 0644
Second:
mode: '644'
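The reason the leading zero (or the quotes) matters is that YAML reads an unquoted 664 as the decimal integer 664, which is 1230 in octal and therefore not the intended permission set. A sketch of the copy task from the question with the mode written as a quoted octal string, reusing the question's own paths and variables:

```yaml
- name: copy vhost basic files to folder
  copy:
    src: "{{ item }}"
    dest: /var/www/vhosts/mmpew/
    mode: "0664"        # quoted, so it reaches Ansible as an octal string
    owner: "{{ deploy_user }}"
    group: "{{ deploy_user }}"
  with_fileglob:
    - ../files/vhost/*
```

Quoting is arguably the safer habit of the two, since the value is then never reinterpreted as a number by the YAML parser.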

How to store command output into array in Ansible?

Essentially, I want to be able to handle "wildcard filenames" in Linux using ansible. In essence, this means using the ls command with part of a filename followed by an "*" so that it will list ONLY certain files.
However, I cannot store the output properly in a variable as there will likely be more than one filename returned. Thus, I want to be able to store these results no matter how many there might be in an array during one task. I then want to be able to retrieve all of the results from the array in a later task. Furthermore, since I don't know how many files might be returned, I cannot do a task for each filename, and an array makes more sense.
The reason behind this is that there are files in a random storage location that are changed often, but they always have the same first half. It's their second half of their names that are random, and I don't want to have to hard code that into ansible at all.
I'm not certain at all how to properly implement/manipulate an array in ansible, so the following code is an example of what I'm "trying" to accomplish. Obviously it won't function as intended if more than one filename is returned, which is why I was asking for assistance on this topic:
- hosts: <randomservername>
  remote_user: remoteguy
  become: yes
  become_method: sudo
  vars:
    aaaa: b
  tasks:
    - name: Copy over all random file contents from directory on control node to target clients. This is to show how to manipulate wildcard filenames.
      copy:
        src: /opt/home/remoteguy/copyable-files/testdir/
        dest: /tmp/
        owner: remoteguy
        mode: u=rwx,g=r,o=r
      ignore_errors: yes
    - name: Determine the current filenames and store in variable for later use, obviously for this exercise we know part of the filenames.
      shell: "ls {{item}}"
      changed_when: false
      register: annoying
      with_items: [/tmp/this-name-is-annoying*, /tmp/this-name-is-also*]
    - name: Run command to cat each file and then capture that output.
      shell: cat {{ annoying }}
      register: annoying_words
    - debug: msg=Here is the output of the two files. {{annoying_words.stdout_lines }}
    - name: Now, remove the wildcard files from each server to clean up.
      file:
        path: '{{ item }}'
        state: absent
      with_items:
        - "{{ annoying.stdout }}"
I understand the YAML format got a little mussed up, but if it's fixed, this "would" run normally, it just won't give me the output I'm looking for. Thus if there were 50 files, I'd want ansible to be able to manipulate them all, and/or be able to delete them all.. etc etc etc.
If anyone here could let me know how to properly utilize an array in the above test code fragment that would be fantastic!
Ansible stores the output of shell and command action modules in stdout and stdout_lines variables. The latter contains separate lines of the standard output in a form of a list.
To iterate over the elements, use:
  with_items:
    - "{{ annoying.stdout_lines }}"
You should remember that parsing ls output might cause problems in some cases.
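One way to sidestep ls parsing altogether is the find module, which returns the matches as a list of file entries. The following is only a sketch under the question's assumptions (files in /tmp with the two known prefixes); the variable name annoying_files is made up for illustration.

```yaml
- name: Find the files whose names start with a known prefix
  find:
    paths: /tmp
    patterns:
      - "this-name-is-annoying*"
      - "this-name-is-also*"
  register: annoying_files

- name: Show the matched paths as a list
  debug:
    msg: "{{ annoying_files.files | map(attribute='path') | list }}"

- name: Remove the matched files
  file:
    path: "{{ item.path }}"
    state: absent
  with_items: "{{ annoying_files.files }}"
```

Because annoying_files.files is already a list, later tasks can loop over it regardless of how many files matched.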
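Can you try the following? Each entry in annoying.results holds the output of one ls call, so join its lines before passing them to cat:
- name: Run command to cat each file and then capture that output.
  shell: cat {{ item.stdout_lines | join(' ') }}
  register: annoying_words
  with_items:
    - "{{ annoying.results }}"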
annoying.stdout_lines is already a list.
From doc of stdout_lines
When stdout is returned, Ansible always provides a list of strings, each containing one item per line from the original output.
To assign the list to another variable do:
..
  register: annoying
- set_fact:
    varName: "{{annoying.stdout_lines}}"
# print first element on the list
- debug: msg="{{varName | first}}"

Ansible read after write file operations in playbooks

I am working on a project using Ansible which requires me to write some data to a file using one playbook and then read the data from the same file using another playbook.
The playbook will be something like this
test1.yml
---
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Writing data to test file
      local_action: shell echo "data:" {{ 100 |random(step=10) }} > test.txt
- include: test2.yml
and would need to read it using test2.yml
---
- hosts: localhost
  connection: local
  gather_facts: no
  vars_files:
    - test.txt
  tasks:
    - name: Writing data to test file
      local_action: shell echo "{{ data }}" > result.txt
However, the second playbook is not able to read the latest data written by the first playbook.
If I view the data written in test.txt and result.txt, they are different. Is there a way to achieve consistency between the results of playbook calls?
Are those two playbooks called separately? If they are included inside a master playbook, then this would explain it. All includes in the master playbook are resolved before execution, so Ansible would already have read both playbooks and the vars_file before any of them gets executed. You should be able to solve this by dynamically including the vars file during play with the include_vars module.
If I was wrong with my assumption and you're not including the playbooks in a parent playbook: What exactly do you mean by "different"? Is it completely different data or is it a formatting issue? I'm puzzled how data in general could not be consistent between calls. There is no magic in writing to and reading from a file. That should theoretically work.
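The include_vars suggestion could look something like the sketch below for test2.yml. It assumes test.txt is found relative to where the play runs and that its content (a line such as "data: 30") parses as YAML; both points are assumptions about the asker's setup rather than something the thread confirms.

```yaml
---
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Load test.txt at run time instead of at parse time
      include_vars:
        file: test.txt
    - name: Write the loaded value to the result file
      local_action: shell echo "{{ data }}" > result.txt
```

Since include_vars runs as a task, it reads the file only after the first play has written it.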

How to move/rename a file using an Ansible task on a remote system

How is it possible to move/rename a file/directory using an Ansible module on a remote system? I don't want to use the command/shell tasks and I don't want to copy the file from the local system to the remote system.
From version 2.0, the copy module has a remote_src parameter.
If True, the module looks for src on the remote/target machine.
- name: Copy files from foo to bar
  copy: remote_src=True src=/path/to/foo dest=/path/to/bar
If you want to move the file, delete the old file afterwards with the file module:
- name: Remove old files foo
  file: path=/path/to/foo state=absent
From version 2.8, remote_src in the copy module supports recursive copying.
The file module doesn't copy files on the remote system. The src parameter is only used by the file module when creating a symlink to a file.
If you want to move/rename a file entirely on a remote system then your best bet is to use the command module to just invoke the appropriate command:
- name: Move foo to bar
  command: mv /path/to/foo /path/to/bar
If you want to get fancy then you could first use the stat module to check that foo actually exists:
- name: stat foo
  stat: path=/path/to/foo
  register: foo_stat
- name: Move foo to bar
  command: mv /path/to/foo /path/to/bar
  when: foo_stat.stat.exists
I have found the creates option in the command module useful. How about this:
- name: Move foo to bar
  command: creates="/path/to/bar" mv /path/to/foo /path/to/bar
I used to do a 2 task approach using stat like Bruce P suggests. Now I do this as one task with creates. I think this is a lot clearer.
- name: Move the src file to dest
  command: mv /path/to/src /path/to/dest
  args:
    removes: /path/to/src
    creates: /path/to/dest
This runs the mv command only when /path/to/src exists and /path/to/dest does not, so it runs once per host, moves the file, then doesn't run again.
I use this method when I need to move a file or directory on several hundred hosts, many of which may be powered off at any given time. It's idempotent and safe to leave in a playbook.
Another option that has worked well for me is the synchronize module, followed by removing the original directory with the file module.
Here is an example from the docs:
- synchronize:
    src: /first/absolute/path
    dest: /second/absolute/path
    archive: yes
  delegate_to: "{{ inventory_hostname }}"
I know it's a YEARS old topic, but I got frustrated and built a role for myself to do exactly this for an arbitrary list of files. Extend as you see fit:
main.yml
- name: created destination directory
  file:
    path: /path/to/directory
    state: directory
    mode: '0750'
- include_tasks: move.yml
  loop:
    - file1
    - file2
    - file3
move.yml
- name: stat the file
  stat:
    path: "/original/path/to/{{ item }}"
  register: my_file
- name: hard link the file into directory
  file:
    src: "/original/path/to/{{ item }}"
    dest: "/path/to/directory/{{ item }}"
    state: hard
  when: my_file.stat.exists
- name: Delete the original file
  file:
    path: "/original/path/to/{{ item }}"
    state: absent
  when: my_file.stat.exists
Note that hard linking is preferable to copying here, because it inherently preserves ownership and permissions (in addition to not consuming more disk space for a second copy of the file).
This is the way I got it working for me:
Tasks:
- name: checking if the file 1 exists
  stat:
    path: /path/to/foo/abc.xts
  register: stat_result
- name: moving file 1
  command: mv /path/to/foo/abc.xts /tmp
  when: stat_result.stat.exists
The playbook above checks whether the file abc.xts exists before moving it to the tmp folder.
Another way to achieve this is using file with state: hard.
This is an example I got to work:
- name: Link source file to another destination
  file:
    src: /path/to/source/file
    path: /target/path/of/file
    state: hard
Only tested on localhost (OSX) though, but should work on Linux as well. I can't tell for Windows.
Note that absolute paths are needed. Else it wouldn't let me create the link. Also you can't cross filesystems, so working with any mounted media might fail.
The hardlink is very similar to moving, if you remove the source file afterwards:
- name: Remove old file
  file:
    path: /path/to/source/file
    state: absent
Another benefit is that changes are persisted when you're in the middle of a play. So if someone changes the source, any change is reflected in the target file.
You can verify the number of links to a file via ls -l. The number of hardlinks is shown next to the mode (e.g. rwxr-xr-x 2, when a file has 2 links).
Bruce wasn't attempting to stat the destination to check whether or not to move the file if it was already there; he was making sure the file to be moved actually existed before attempting the mv.
If your interest, like Tom's, is to only move if the file doesn't already exist, I think we should still integrate Bruce's check into the mix:
- name: stat foo
  stat: path=/path/to/foo
  register: foo_stat
- name: Move foo to bar
  command: creates="/path/to/bar" mv /path/to/foo /path/to/bar
  when: foo_stat.stat.exists
This may seem like overkill, but if you want to avoid using the command module (which I do, because using command is not idempotent) you can use a combination of copy and unarchive.
Use tar to archive the file(s) you will need. If you think ahead, this actually makes sense: you may want a series of files in a given directory, so create that directory with all of the files and archive them in a tar.
Then use the unarchive module. Combined with the dest and remote_src parameters, this lets you copy all of your files to a temporary folder first and then unpack them exactly where you want them.
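A rough sketch of that copy-then-unarchive sequence might look like this. The archive name bundle.tar and the temporary location /tmp/bundle.tar are hypothetical, and /path/to/bar is assumed to be an existing directory on the target.

```yaml
- name: Copy the prepared archive to a temporary location on the host
  copy:
    src: files/bundle.tar        # hypothetical archive built beforehand with tar
    dest: /tmp/bundle.tar

- name: Unpack the archive where the files should end up
  unarchive:
    src: /tmp/bundle.tar
    dest: /path/to/bar           # assumed to exist already
    remote_src: yes              # the archive is already on the managed host
```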
On Windows:
- name: Move old folder to backup
  win_command: "cmd.exe /c move /Y {{ sourcePath }} {{ destinationFolderPath }}"
To rename, use the rename or ren command instead.
You can do it in either of two ways.
Using an ad hoc command:
ansible all -m command -a "mv /path/to/foo /path/to/bar"
Or, if you want to do it with a playbook:
- name: Move File foo to destination bar
  command: mv /path/to/foo /path/to/bar
- name: Example
  hosts: localhost
  become: yes
  tasks:
    - name: checking if a file exists
      stat:
        path: "/projects/challenge/simplefile.txt"
      register: file_data
    - name: move the file if file exists
      copy:
        src: /projects/challenge/simplefile.txt
        dest: /home/user/test
      when: file_data.stat.exists
    - name: report a missing file
      debug:
        msg: "the file or directory doesn't exist"
      when: not file_data.stat.exists
