Join two files in PowerShell - arrays

Really need help on this :( I'll try to be as simple as possible.
I got one big file looking like this:
ID,Info1,Info2,info3,...
On each line I have one ID and a lot of comma-separated values. There can be more than 3000 lines.
Now I have a second file like this:
ID,Info4,Info5,Info6,...
The first file contains ALL the elements whereas the second file contains only some of them.
For example, first one:
BLA1,some stuff...
BLA2,some stuff...
BLA3,some stuff...
ALO1,some stuff...
ALO2,some stuff...
And the second one:
BLA3,some stuff2...
ALO1,some stuff2...
BLA1,some stuff2...
What I want is simple: I want to append all the 'some stuff2...' values from the second file to the matching lines of the first, like a SQL LEFT JOIN.
The first file should then contain:
BLA1,some stuff...,some stuff2...
BLA2,some stuff...
BLA3,some stuff...,some stuff2...
ALO1,some stuff...,some stuff2...
ALO2,some stuff...
I tried something like this:
ForEach ($line in $file1) {
    $colA = $line.Split(',')
    ForEach ($line in $file2) {
        $colB = $line.Split(',')
        if ($colA[0] -eq $colB[0]) { #Item found in file2
            $out += $date + $colA[1] + "," + ... + "," + $colB[1] + "," + ... + "`n"
        } else {
            $out += $date + $colA[1] + "," + ... + "`n"
        }
    }
}
But it takes so much time that it never finishes (and maybe there were other problems I didn't see). What's the best way? A 2D array? I could try to sort the IDs and then script a little, but as they're not purely numerical I don't know how to proceed.
Thanks a lot for your help!

Use a hashtable where the key is the ID.
$ht = [ordered]@{}
foreach ($line in $file1) {
    $id, $rest = $line -split ',', 2
    $ht[$id] = $line
}
foreach ($line in $file2) {
    $id, $rest = $line -split ',', 2
    # [ordered] dictionaries expose Contains(), not ContainsKey()
    if ($ht.Contains($id)) {
        $ht[$id] += ",$rest"
    }
    else {
        $ht[$id] = $line
    }
}
$ht.Values > newfile.txt

I went with the assumption that you either have known header lines or can add them...
f1.csv
Name,Item_1
BLA1,thing_bla1_1
ALB1,thing_alb1_1
BLA2,thing_bla2_1
ALB2,thing_alb2_1
BLA3,thing_bla3_1
ALB3,thing_alb3_1
f2.csv
Name,Item_2
BLA3,thing_bla3_2
ALB3,thing_alb3_2
BLA1,thing_bla1_2
ALB1,thing_alb1_2
BLA2,thing_bla2_2
ALB2,thing_alb2_2
Code:
$grouped = Import-Csv .\f1.csv, .\f2.csv | Group-Object -Property Name -AsHashTable
$($grouped.Keys | ForEach-Object {
    $grouped.Item("$_")[0].Name + "," + $grouped.Item("$_")[0].Item_1 + "," + $grouped.Item("$_")[1].Item_2
}) | Out-File .\test.csv
What we are doing here is importing the two CSVs into one collection, then grouping the items with the same Name into a hash table. Then we pipe the keys (the de-duplicated names from the files) into a foreach that combines each group into one line. We need the $() around those statements so the output can be piped to Out-File.
I'm nearly positive that there is a cleaner way to do the inside of the foreach, but this does work.
The output (test.csv):
ALB1,thing_alb1_1,thing_alb1_2
BLA2,thing_bla2_1,thing_bla2_2
ALB3,thing_alb3_1,thing_alb3_2
BLA1,thing_bla1_1,thing_bla1_2
ALB2,thing_alb2_1,thing_alb2_2
BLA3,thing_bla3_1,thing_bla3_2

If you want to do a LEFT JOIN, you could load the files into a temporary database and actually do a LEFT JOIN, for example with SQLite.
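As a rough sketch of that idea (using Python's built-in sqlite3 module rather than PowerShell; the table and column names are made up for illustration):

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE f1 (id TEXT PRIMARY KEY, info1 TEXT)")
con.execute("CREATE TABLE f2 (id TEXT PRIMARY KEY, info4 TEXT)")

# Sample rows standing in for the contents of the two CSV files.
con.executemany("INSERT INTO f1 VALUES (?, ?)",
                [("BLA1", "some stuff"), ("BLA2", "some stuff"),
                 ("BLA3", "some stuff"), ("ALO1", "some stuff"),
                 ("ALO2", "some stuff")])
con.executemany("INSERT INTO f2 VALUES (?, ?)",
                [("BLA3", "some stuff2"), ("ALO1", "some stuff2"),
                 ("BLA1", "some stuff2")])

# LEFT JOIN keeps every row of f1 and appends f2's data where the ID matches;
# IDs with no match in f2 get NULL (None) in the joined column.
joined = con.execute(
    "SELECT f1.id, f1.info1, f2.info4 "
    "FROM f1 LEFT JOIN f2 ON f1.id = f2.id"
).fetchall()
for row in joined:
    print(row)
```

In a real script you would bulk-load the CSV rows instead of hard-coding them; the point is that the database does the join for you.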

Related

How to convert a Delimited List of Mixed Strings to an array for powershell script

I'm looking to solve a problem where I have a long file of comma-delimited, ordered, 8-digit numbers and ranges (with leading zeros), as below:
00001253,00001257-00001268,00001288,...,02154320,02154321,02154323-02154327,...
I want to
(a) store any values that aren't ranges as tokens in a PowerShell array while retaining leading zeros
and
(b) expand ranges to all of their corresponding values and store the tokens in the same array. Here's the PowerShell "script" I threw together for my purpose so far:
$refids = @(ARRAY_DERIVED_FROM_ABOVE_LIST)
foreach ($refid in $refids) {
    New-Item c:\scripts\$refid.txt -type file -force -value "KEY:$refid"
}
Any ideas on how to proceed? Thanks in advance for any assistance.
You can start with this, maybe:
$string = '00001253,00001257-00001268,00001288,02154320,02154321,02154323-02154327'
$string.Split(',') |
    foreach {
        if ($_.Contains('-')) {
            Invoke-Expression ($_.Replace('-', '..')) |
                foreach { '{0:D8}' -f $_ }
        }
        else { $_ }
    }
00001253
00001257
00001258
00001259
00001260
00001261
00001262
00001263
00001264
00001265
00001266
00001267
00001268
00001288
02154320
02154321
02154323
02154324
02154325
02154326
02154327
Mjolinor's answer is very good. Some people refrain from using Invoke-Expression. Here is another example that accomplishes the same thing while showing a slightly different approach.
$string = '00001253,00001257-00001268,00001288,02154320,02154321,02154323-02154327'
$string.Split(',') | ForEach-Object {
    If ($_.Contains('-')) {
        $_.Split('-')[0]..$_.Split('-')[1]
    } Else {
        $_
    }
} | ForEach-Object { Write-Output ([string]$_).PadLeft(8, '0') }
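For comparison, the same expand-the-ranges idea sketched in Python (not part of either answer; the zero-padding width of 8 matches the question):

```python
s = '00001253,00001257-00001268,00001288,02154320,02154321,02154323-02154327'

tokens = []
for part in s.split(','):
    if '-' in part:
        lo, hi = part.split('-')
        # Expand the range, re-padding each value to 8 digits.
        tokens.extend(f'{n:08d}' for n in range(int(lo), int(hi) + 1))
    else:
        # Not a range: keep the token as-is, leading zeros intact.
        tokens.append(part)

print(tokens[0], tokens[-1], len(tokens))  # 00001253 02154327 21
```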

Powershell, Directory Verification

I am fairly new to PowerShell. I have created an exe that can be run by a coworker; it goes to 8 separate SQL servers, to individual directories, and checks them for a list of created files. My current code checks that each file is less than one day old and not empty. However, I have been presented with a new problem: I need to take a list in a text file/array/CSV/etc. and compare that list to the directory, to ensure that there are not any additional files in the folder. All of the information is then formatted, exported to a file, and emailed to certain recipients.
My question is: how do I create a script that works using this idea, preferably without considerable change to my current code, as it is fairly lengthy considering the subdirectories of the 8 SQL servers? (Current script, redacted, is as follows:)
$today = Get-Date
$yesterday = $today.AddDays(-1)
$file_array = "XXX_backup*.trn", "XXX_backup*.bak", "XXY_backup*.trn", "XXY_backup*.bak"
$server_loc = '\\hostname\e$\'
$server_files = @(Get-ChildItem $server_loc | Where-Object {$_.FullName -notmatch "(subdirectory)"})
$server_complete_array = @("Files Directory ($server_loc)", " ")
foreach ($file in $server_files) {
    if ($file.Length -eq 0) {
        [string]$string = $file
        $server_complete_array = $server_complete_array + "$string is empty."
    }
    elseif ($file.LastWriteTime -lt $yesterday) {
        [string]$string = $file
        $server_complete_array = $server_complete_array + "$string is old."
    }
    else {
        [string]$string = $file
        $server_complete_array = $server_complete_array + "$string is okay."
    }
}
Thank you for your help in advance. :)
Bit of a hack, but it should work with what you already have:
$file_array = "XXX_backup*.trn", "XXX_backup*.bak", "XXY_backup*.trn", "XXY_backup*.bak"
$FileNames = $server_files | select -ExpandProperty Name
$FileCheck = '$Filenames ' + (($file_array | foreach { "-notlike '$_'" }) -join ' ')
Since you're already using wildcards in your file specs, this works with that by creating a command which filters the file name array through a series of -notlike operators, one for each file spec.
$FileCheck
$Filenames -notlike 'XXX_backup*.trn' -notlike 'XXX_backup*.bak' -notlike 'XXY_backup*.trn' -notlike 'XXY_backup*.bak'
Each one will filter out the names that match its file spec and pass the rest on to the next -notlike operator. Whatever falls out the end didn't match any of them.
At this point $FileCheck is just a string, and you'll need to use Invoke-Expression to run it.
$ExtraFiles = Invoke-Expression $FileCheck
and then $ExtraFiles should contain the names of all the files that did not match any of the file specs in $file_array. Since the filters are being created from $file_array, there's no other maintenance to do if you add or remove filespecs from the array.
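If you were rebuilding this in another language, the same "keep only the names that match none of the wildcard specs" check can be expressed without any string-building; here is a hedged Python sketch using fnmatch for the wildcards (the sample file names are invented):

```python
from fnmatch import fnmatch

# The wildcard specs from the question; the sample names are invented.
file_specs = ["XXX_backup*.trn", "XXX_backup*.bak",
              "XXY_backup*.trn", "XXY_backup*.bak"]
file_names = ["XXX_backup_01.trn", "rogue.txt", "XXY_backup_02.bak"]

# A file is "extra" if it matches none of the expected wildcard specs.
extra = [name for name in file_names
         if not any(fnmatch(name, spec) for spec in file_specs)]
print(extra)  # ['rogue.txt']
```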

Filtering Search Results Against a CSV in Powershell

I'm interested in some ideas on how one would approach coding a search of a filesystem for files that match any entries contained in a master CSV file. I have a function to search the filesystem, but filtering against the CSV is proving harder than I expected. I have a CSV with headers in it for Name & IPaddr:
#create CSV object
$csv = import-csv filename.csv
#create filter object containing only Name column
$filter = $csv | select-object Name
#Now run the search function
SearchSubfolders | where {$_.name -match $filter} #returns no results
I guess my question is this: Can I filter against an array within a pipeline like this???
You need a pair of loops:
#create CSV object
$csv = import-csv filename.csv
#Now run the search function
#loop through the folders
foreach ($folder in (SearchSubfolders)) {
    #check that folder against each item in the csv filter list
    #this sets up the loop
    foreach ($Filter in $csv.Name) {
        #and this does the checking and outputs anything that is matched
        If ($folder.name -match $Filter) { "$filter" }
    }
}
Usually CSVs are 2-dimensional data structures, so you can't use them directly for filtering. You can convert the 2-dimensional array into a 1-dimensional array, though:
$filter = Import-Csv 'C:\path\to\some.csv' | % {
    $_.PSObject.Properties | % { $_.Value }
}
If the CSV has just a single column, the "mangling" can be simplified to this (replace Name with the actual column name):
$filter = Import-Csv 'C:\path\to\some.csv' | % { $_.Name }
or this:
$filter = Import-Csv 'C:\path\to\some.csv' | select -Expand Name
Of course, if the CSV has just a single column, it would've been better to make it a flat list right away, so it could've been imported like this:
$filter = Get-Content 'C:\path\to\some.txt'
Either way, with the $filter prepared, you can apply it to your input data like this:
SearchSubFolders | ? { $filter -contains $_.Name } # ARRAY -contains VALUE
The -match operator won't work, because it compares a value (left operand) against a regular expression (right operand).
See Get-Help about_Comparison_Operators for more information.
Another option is to create a regex from the filename collection and use that to filter for all the filenames at once:
$filenames = import-csv filename.csv |
    foreach { $_.name }
[regex]$filename_regex = '(?i)^(' + (($filenames | foreach { [regex]::escape($_) }) -join '|') + ')$'
$SearchSubfolders |
    where { $_.name -match $filename_regex }
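The same build-one-alternation-regex trick, sketched in Python for comparison (the sample names are invented):

```python
import re

# Names that would come from the CSV's Name column (invented samples).
filenames = ["report.txt", "data(1).csv"]

# Escape each name so regex metacharacters like ( ) . are treated literally,
# then join them into one case-insensitive ^(a|b|...)$ pattern.
filename_regex = re.compile(
    '^(' + '|'.join(re.escape(n) for n in filenames) + ')$',
    re.IGNORECASE)

candidates = ["REPORT.TXT", "other.txt", "data(1).csv"]
matched = [c for c in candidates if filename_regex.match(c)]
print(matched)  # ['REPORT.TXT', 'data(1).csv']
```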
You can use Compare-Object to do this pretty easily if you are matching the actual Names of the files to names in the list. An example:
$filter = import-csv files.csv
ls | Compare-Object -ReferenceObject $filter -IncludeEqual -ExcludeDifferent -Property Name
This will print the files in the current directory that match the any Name in files.csv. You could also print only the different ones by dropping -IncludeEqual and -ExcludeDifferent flags. If you need full regex matching you will have to loop through each regex in the csv and see if it is a match.
Here's an alternate solution that uses regular-expression filters. Note that we create and cache the regex instances ourselves, so we don't have to rely on the runtime's internal cache (which defaults to 15 items). First we have a useful helper function, Test-Any, that loops through an array of items and stops as soon as any of them satisfies a predicate:
function Test-Any() {
    param(
        [Parameter(Mandatory=$True,ValueFromPipeline=$True)]
        [object[]]$Items,
        [Parameter(Mandatory=$True,Position=2)]
        [ScriptBlock]$Predicate)
    begin {
        $any = $false
    }
    process {
        foreach ($item in $items) {
            if ($predicate.Invoke($item)) {
                $any = $true
                break
            }
        }
    }
    end { $any }
}
With this, the implementation is relatively simple:
$filters = import-csv files.csv | foreach { [regex]$_.Name }
ls -recurse | where { $name = $_.Name; $filters | Test-Any { $_.IsMatch($name) } }
I ended up using a 'loop within a loop' construct to get this done after much trial and error:
#the SearchSubFolders function was amended to force results into a variable, SearchResults
$SearchResults2 = @()
foreach ($result in $SearchResults) {
    foreach ($line in $filter) {
        if ($result -match $line) {
            $SearchResults2 += $result
        }
    }
}
This works great after collapsing my CSV file down to a text-based array containing only the necessary column data from that CSV. Much thanks to Ansgar Wiechers for assisting me with that particular thing!!!
All of you presented viable solutions, some more complex than I cared for; nevertheless, if I could mark multiple answers as correct, I would! I chose the answer based not only on correctness but also on simplicity.

Search for, and remove column from CSV file

I'm trying to write a subroutine that will take two arguments, a filename and the column name inside a CSV file. The subroutine will search for the second argument (column name) and remove that column (or columns) from the CSV file and then return the CSV file with the arguments removed.
I feel like I've gotten through the first half of this sub (opening the file, retrieve the headers and values) but I can't seem to find a way to search the CSV file for the string that the user inputs and delete that whole column. Any ideas? Here's what I have so far.
sub remove_columns {
    my @Para = @_;
    my $args = @Para;
    die "Insufficient arguments\n" if ($args < 2);
    open file, $file;
    $header = <file>;
    chomp $header;
    my @hdr = split ',', $header;
    while (my $line = <file>) {
        chomp $line;
        my @vals = split ',', $line;
        #hash that will allow me to access column name and values quickly
        my %h;
        for (my $i = 0; $i <= $#hdr; $i++) {
            $h{$hdr[$i]} = $i;
        }
        ....
    }
}
Here's where the search and removal will be done. I've been thinking about how to go about this; the CSV files that I'll be modifying will be huge, so speed is a factor, but I can't seem to think of a good way to go about this. I'm new to Perl, so I'm struggling a bit.
Here are a few hints that will hopefully get you going.
To remove the element at position $index of an array, use:
splice @array, $index, 1;
As speed is an issue, you probably want to construct an array of column numbers at the start and then loop over the elements of that array. Loop in descending index order, so that removing one element doesn't shift the positions of those still to be removed:
for my $index (reverse sort { $a <=> $b } @indices) {
    splice @array, $index, 1;
}
(This is more idiomatic Perl than a C-style for (my $i = 0; $i <= $#hdr; $i++) loop.)
Another thing to consider: the CSV format is surprisingly complicated. Might your data have fields with a , inside quotes, such as
1,"column with a , in it"
I would consider using something like Text::CSV
You should probably look in the direction of Text::CSV
Or you can do something like this:
my $colnum;
my #columns = split(/,/, <$file>);
for(my $i = 0; $i < scalar(#columns); $i++) {
if($columns[$i] =~ /^$unwanted_column_name$/) {
$colnum = $i;
last;
};
};
while(<$file>) {
my #row = split(/,/, $_);
splice(#row, $colnum, 1);
#do something with resulting array #row
};
Side note:
you really should use strict and warnings;
split(/,/, <$file>);
won't work with all CSV files
There is an elegant way to remove some columns from an array. If I have the columns to remove in array @cols, and the headers in @headers, I can make an array of indexes to preserve:
my %to_delete;
@to_delete{@cols} = ();
my @idxs = grep !exists $to_delete{$headers[$_]}, 0 .. $#headers;
Then it's easy to make the new headers:
@headers[@idxs]
and also a new row from the columns read:
@columns[@idxs]
The same approach can be used, for example, for rearranging arrays. It is very fast and a pretty idiomatic Perl way to do this sort of task.
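For readers more comfortable outside Perl, the same keep-these-indices idea looks like this in Python (the column names and data are invented):

```python
# Invented sample data: headers plus one data row.
headers = ["ID", "Name", "Secret", "Email"]
row = ["42", "Alice", "hunter2", "a@example.com"]
cols_to_delete = {"Secret"}

# Compute once the indices of the columns we keep...
keep = [i for i, h in enumerate(headers) if h not in cols_to_delete]

# ...then slice the header and every data row with them.
new_headers = [headers[i] for i in keep]
new_row = [row[i] for i in keep]
print(new_headers)  # ['ID', 'Name', 'Email']
print(new_row)      # ['42', 'Alice', 'a@example.com']
```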

How do I add a variable’s value to an array?

I am looking for files in a directory. If I can’t find the file, I want to send the name of that file to an array, so that by the time the loop is done I’ll have an array of the files that weren’t found. How do I code this in Perl?
foreach $missing (@miss) {
    chomp $missing;
    ($a, $b) = split(/\.m_inproc./, $missing);
    @find = `find /home1/users/virtual/ -name .m_inproc.$b`;
    $find_size = scalar @find;
    $flag = "/home1/t01jkxj/check_st/flags/$b";
    if ($find_size < 1 && -e $flag) {
        $doit = `$b > @re_missing`;
    }
}
This is my searching code: if it doesn't find a file ($find_size is less than 1) and there is a flag file (meaning we've done this search before), I want to write that variable $b (the filename) to an array. Obviously, my current syntax ($doit = `$b > @re_missing`;) is incorrect. What would it be? Thanks!
How about:
push @re_missing, $b;
By the way, using $a and $b is bad form. These are the implicitly declared variables used in the body of the comparator for sort.
Not sure of the full context of your code, but this code, combined with your already-implemented file-size checking, should put you on the right track. The short answer here is "use push", but I felt like coding things, so sorry if that causes unnecessary reading when "use push" would have sufficed.
#!/usr/bin/perl
use strict;
use warnings;

my @files = ("tester.txt", "pooter.txt", "output.txt");
my @notfound;

foreach (@files) {
    if (!(-e $_)) {
        push(@notfound, $_);
    }
}

foreach (@notfound) {
    print $_;
}
my @missing_files = grep { ! -e } @files_to_search_for;