Yesterday I had to look at a list of stuff on a third-party website and match it up to a list of stuff we control. It took quite a bit of time and some Excel shenanigans to complete the task and, as I know it’s going to crop up again, I decided to employ PowerShell to do it in the future.
Rather than go through the whole process, we’ll just look at the common bit that you might want to use, which is grabbing a file (in my case an HTML source file, but it can be any text file) and stripping out the regex matches into a new clean file.
PowerShell Commands
We’re going to use the following:
- Get-Content to grab the text from the source file
- Select-String to regex match the content we want
- ForEach-Object to iterate matches
- Out-File to chuck it into a new file
- … and most importantly, lots of | to pipe everything along the conveyor belt
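Before wiring the pieces together, it can help to see what Select-String actually hands down the pipe. The snippet below is a minimal sketch using a made-up sample line (not data from the real site): each matching input line becomes one MatchInfo object, and with -AllMatches its Matches property carries every hit on that line.

```powershell
# Sample line with two codes on it (made-up data for illustration)
$line = 'KT-2002 and AR-9999 on one line'

# Select-String emits one MatchInfo object per input line that matches
$matchInfo = $line | Select-String -Pattern '[A-Z]{2}-[0-9]{2,4}' -AllMatches

# With -AllMatches, the Matches property holds every hit found on that line
$codes = $matchInfo.Matches | ForEach-Object { $_.Value }

$codes   # KT-2002, then AR-9999
```

Without -AllMatches, Matches would only contain the first hit on each line.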
In the example below I’m sending it all into a CSV – this is arbitrary as it is just a new line for each match. In my case, treating it as a CSV data source is useful in the next step. You could send it to a plain text file too.
Complete PowerShell Script
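$sourcePath = "example.txt"
$outPath = "example.csv"
$regexPattern = "([A-Z]{2}\-[0-9]{2,4})"

# Read the file, find every match on every line, and write one match per line.
# Enumerating $_.Matches first means a line with several matches yields all of them,
# not just the first.
Get-Content $sourcePath |
    Select-String -Pattern $regexPattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value } |
    Out-File $outPath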
The regex happens to be looking for a particular pattern I’m interested in – you can BYOR.
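One thing worth knowing when you bring your own regex: Select-String matches case-insensitively by default, so a class like [A-Z] will also accept lower-case letters unless you add the -CaseSensitive switch. A small sketch with a made-up sample string shows the difference.

```powershell
# Made-up sample: one upper-case code and one lower-case code
$sample  = 'KT-2002 kt-2003'
$pattern = '[A-Z]{2}-[0-9]{2,4}'

# Default behaviour is case-insensitive, so both codes match
$loose = @(($sample | Select-String -Pattern $pattern -AllMatches).Matches.Value)

# With -CaseSensitive, only the upper-case code matches
$strict = @(($sample | Select-String -Pattern $pattern -AllMatches -CaseSensitive).Matches.Value)

"Loose:  $($loose -join ', ')"
"Strict: $($strict -join ', ')"
```

Whether you want the strict version depends on how tidy your source data is.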
Input / Output
This is a brief example of input and output.
Input
<div class="checkbox_block">
  <input type="checkbox" id="f_projects_box2205109" value="2205109">
  <label for="f_projects_box2205109">
    <span class="text"> KT-2002 Some description here </span>
  </label>
</div>
<div class="checkbox_block">
  <input type="checkbox" id="f_projects_box2205208" value="2205208">
  <label for="f_projects_box2205208">
    <span class="text"> Some additional description AR-9999 etc. </span>
  </label>
</div>
Output
KT-2002
AR-9999
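If you want to try the round trip without touching real files, the sketch below writes two lines from the sample input to a temporary file, runs the pipeline over it, and then reads the result back as a one-column CSV. The temp paths and the 'Code' header are assumptions made for this illustration. It enumerates each match individually, so a line carrying more than one code would produce every hit.

```powershell
# Temporary paths so the sketch does not touch real files
$tempDir    = [System.IO.Path]::GetTempPath()
$sourcePath = Join-Path $tempDir 'example.txt'
$outPath    = Join-Path $tempDir 'example.csv'

# Two lines taken from the sample input above
@(
    '<span class="text"> KT-2002 Some description here </span>'
    '<span class="text"> Some additional description AR-9999 etc. </span>'
) | Set-Content -Path $sourcePath

$regexPattern = '([A-Z]{2}-[0-9]{2,4})'

# Same conveyor belt: read, match, pull out capture group 1, write
Get-Content $sourcePath |
    Select-String -Pattern $regexPattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value } |
    Out-File $outPath

# Read the result back as a one-column CSV; 'Code' is a made-up header name
$rows = Import-Csv -Path $outPath -Header 'Code'
$rows.Code
```

Supplying -Header is what lets a headerless one-value-per-line file behave like a CSV data source in a later step.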
Summary
Whenever I have a task that has distinct steps, I automate it. Even running this once would make it worth the effort because (a) I’m a human not a robot, so writing a PowerShell script is a better use of my time than doing manual repetitive work (which is boring and not aligned to the way of the punk), (b) the process results in a task being automated and my brain containing more knowledge as the more I PowerShell the more I learn about it and the faster I am the next time I automate something, and (c) I get to share this with my future self so if I need to do something similar later, I won’t be starting from scratch.
It is a common misconception that it is only worth automating work if it “looks like the hours spent will be more than the time to automate” – but I suggest you rethink your strategy before all eight hours of your day get eaten by stuff your computer could do for you. Or, to put it another way… if someone asks me to do something manual and I automate it every time, I am setting a standard about what I am (and am not) willing to do with my ~27,375 days on the planet, ~15,435 of which are no longer open to choice. It’s not just about economics… it’s about quality of life. Go automate that thing now!