Slicing and Dicing Log Files


 


This post is part of the #PSBlogWeek PowerShell blogging series. #PSBlogWeek is a regular event where anyone interested in writing great content about PowerShell is welcome to volunteer for. The purpose is to pool our collective PowerShell knowledge together over a 5-day period and write about a topic that anyone using PowerShell may benefit from. #PSBlogWeek is a Twitter hashtag so feel free to stay up to date on the topic on Twitter at the #PSBlogWeek hashtag. For more information on #PSBlogWeek or if you’d like to volunteer for future sessions, contact Adam Bertram (@adbertram) on Twitter.


To begin to slice or dice a log file, first, we need to get
it into a variable.

To open a log file or text file, open a PowerShell prompt
and run a Get-Content
command. For example, this command uses Get-Content to get the content of the
License.txt file in the local path.

image001

Now that we know how to use Get-content, we can save the
results in whatever variable we choose. In my case, I chose $file.

image002

Now that we have the contents of our file in the $file
variable, we can begin to chop this file up.

To find out how many lines are in our file, type in $file.Count.
This expression uses the Count property of the string array in $file to get the
count of array elements.  By default, Get-Content reads the contents of the
file into a System.Array type. Each line is placed in a separate array element.
To see the first line of our file, type:

image003

As you can see from the first image, this is the first line
of our License.txt file.

Now, if we want to search through this line for matches on
the word ‘and’, we can use the Match operator.

image004

The Match operator automatically populates the $matches variable
with the first match of the item specified after the –Match operator.  If we
had used the –Match
operator
on the entire file our results would have shown us the number of
matches in a result set. For Brevity the count of the mached value and is
shown.

PS Ps:\> ($file -match
 'and').count
4
 PS Ps:\> ($file).count
10 
 

As you can see our total count for the $file is 10
and the matched number is 4.

If we wanted to replace the word and, instead of using a
match, we could use the –Replace
operator
.

image005

As you can see, the ‘and’ in the string is now
replaced with ‘replaced And’.

To do the same operation on the entire file use the –Replace
operator

 PS ps:\> $file -replace 'and', 'replaced And'

Copyright (c) 2010-2013 Keith
 Dahlby replaced And contributors Permission is hereby granted,
 free of charge, to any person obtaining a copy of this software replaced And
 associated documentation files (the "Software"), to deal in the Software
 without restriction, including without limitation the rights to use, copy,
 modify, merge, publish, distribute, sublicense, replaced And/or sell
 copies of the Software, replaced And to permit persons to whom the
 Software is furnished to do so, subject to the following conditions:

The above copyright notice
 replaced And this permission notice shall be included in all copies or
 substantial portions of the Software.
THE SOFTWARE IS PROVIDED
 "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
 BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE replaced And NONINFRINGEMENT.
 IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE. 

To save the file, pipe the results to the Set-Content cmdlet with
the path of where you want it saved.

To read the file in one string so there is no array, specify
the Raw dynamic parameter. It will save the contents of the file as a single
string in the variable you specify.

$file = get-content -raw
.\LICENSE.txt

To separate this file into two chunks, we’ll use the Substring
method
of string.

$file.substring(0,542) |
 out-file .\LICENSE1.txt
 

The first value in Substring
method
tells Substring the position of the character in our $file
string we want to start with (beginning at 0) and the second number is the
position of the character where we want to end.  So now we end up with half of our
original License1.txt file.

It’s even easier to get the second half of the file:

$file.substring(542) | out-file
.\LICENSE542.txt

Substring starts from position 542 and goes to the end of
the string.

Using a license file is not very practical. However, we did
figure out how to read the file into a variable, match items in the result, get
a count and split the file in two, now let’s move onto logs that IIS produces.

These files can be found on
C:\inetpub\logs\logfiles\w3svc(x), where (x) is the number of the site that is
logging data for the site.

After inspecting logs for this web server we’ve discovered
that there are a great number of small files with not much information in them.
Since this is just a local desktop we haven’t worked much with IIS here so the
first thing we’ll do is to consolidate all these log files into a single log
and then begin the process of drilling into the information in the consolidated
log file.

 PS
 C:\inetpub\logs\logfiles\w3svc1> dir 
    Directory:
 C:\inetpub\logs\logfiles\w3svc1

 Mode
 LastWriteTime         Length Name                          
----
 -------------         ------ ----                          
-a----         3/7/2015  10:11
 AM            338 u_ex150307.log                
-a----        6/14/2015   7:03
 PM           5522 u_ex150615.log                
-a----        6/15/2015   8:27
 PM            472 u_ex150616.log                
-a----        8/17/2015   9:19
 PM            593 u_ex150818.log                
-a----        8/19/2015   6:43
 AM           2616 u_ex150819.log                
-a----        8/19/2015   7:36
 PM           1516 u_ex150820.log                
-a----        8/20/2015   6:08
 PM            761 u_ex150821.log                
-a----        8/23/2015  10:36
 AM           3666 u_ex150822.log                
-a----        8/24/2015   5:42
 PM            761 u_ex150825.log                
-a----        8/26/2015   4:04
 PM           2117 u_ex150826.log                
-a----        8/30/2015   8:59
 AM           2574 u_ex150829.log                
-a----        9/24/2015   8:27
 PM            763 u_ex150925.log                
-a----        9/28/2015   8:53
 PM           1457 u_ex150929.log                
-a----        10/3/2015   8:46
 AM            769 u_ex151001.log  
-a----        11/9/2015   7:52
 PM           1469 u_ex151110.log 

As you can see, we have

(dir).count

22

…22 files in our  IIS logging folder. Get-Content takes a
Path value by property name. So, you can pipe Get-ChildItem to Get-Content
without the need for a foreach loop.

   dir | get-content | set-content iislog.txt 
 

Now that we have the content of all my files in iislog.txt, we
want to filter out anything that isn’t really a log.  This can be done with a
regular expression.  The one we’ll use is this filter:

“(\d+)-(\d+)-(\d+) (\d+):(\d+):(?:\d+) ::”

This will match on items like this one: 2015-11-10 02:36:54
::1

(get-content .\iislog.txt) -match “(\d+)-(\d+)-(\d+)
(\d+):(\d+):(?:\d+) ::”
| Set-Content filterediislog.txt

Now since we have our log file trimmed and the excess items
that were at the top of the log chopped we can now begin the process of
discovering interesting items about our log file.

How many 404’s has this web server experienced?  We can find
the number of 404’s by reading our log file back into a variable and using the –Match operator to find
our 404’s

PS
 C:\inetpub\logs\LogFiles\W3SVC1> $iislog = get-content  .\filterediislog.txt
 
PS
 C:\inetpub\logs\LogFiles\W3SVC1> $404Logs = $iislog -match ' 404 '
 
PS
 C:\inetpub\logs\LogFiles\W3SVC1> $404Logs.count
16
PS
 C:\inetpub\logs\LogFiles\W3SVC1> ($iislog –match ‘ 404 ‘).count
16

As we can see from the scripting above we used the Get-Content command to
get the file. Then we used the –Match operator to get
us the number of matches.  Since we didn’t use the –raw switch on the Get-Content command the
result set is scalar allowing matches to give us the number of matches it found
in the collection. Here are the first 2 matches in the results of the scalar
match:

PS
 C:\inetpub\logs\LogFiles\W3SVC1> $iislog -match ' 404 ' | select -first 2

2015-06-15 00:05:31 ::1 GET /t/curl_http_..clusterexternalnic_8090.storage.172.25.3.20.nodestat
 - 80 - ::1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebK it/537.36+(KHTML,+like+Gecko)+Chrome/43.0.2357.124+Safari/537.36
 - 404 3 50 496
2015-06-15 00:05:31 ::1 GET
 /favicon.ico - 80 - ::1
 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/43.0.2357.124+Safari/537.36
 http://localhost/t/curl_http_..clusterexternalnic_8090.storage.172.25.3.20.nodestat
 404 0 2 4 

Now that we have the number of 404’s in our log file we can
see what queries resulted in a 404. To achieve this we’ll use the first result
set to figure out what we need to select with a Regular expression to get the query
names out of our log file.


 C:\inetpub\logs\LogFiles\W3SVC1> $iislog -match ' 404 ' | select -first 1 

2015-06-15 00:05:31 ::1 GET
 /t/curl_http_..clusterexternalnic_8090.storage.172.25.3.20.nodestat - 80 - ::1
 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/43.0.2357.124+Safari/537.36
 - 404 3 50 496 

The starting character in log is a / and the ending
characters are  (a space followed by a – ).

So we’ll re-use our regular expression used earlier and add
::1 GET and add it to a $404Filtered Variable.

  $404filtered = ($iislog -match ‘ 404
)
-REPLACE “(\d+)-(\d+)-(\d+)
(\d+):(\d+):(?:\d+) ::1 GET”
,

Note: This works for this log
file because the address written after the :: is a local loopback address. So
you may have to change this based on your scenario.

Below is the 1st item in the array of 16 based on
the –replace String above.

PS
 C:\inetpub\logs\LogFiles\W3SVC1> $404filtered | select -first 1
 
 /t/curl_http_..clusterexternalnic_8090.storage.172.25.3.20.nodestat
 - 80 - ::1
 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/43.0.2357.124+Safari/537.36
 - 404 3 50 496
 

Now we need to remove the string on the end to give us our
failing page. Again we can do this with a regular expression we’ll use this
expression: ‘-.*’

PS
 C:\inetpub\logs\LogFiles\W3SVC1> $404filtered -replace '-.*',''
 /t/curl_http_..clusterexternalnic_8090.storage.172.25.3.20.nodestat
  /favicon.ico 
 /t/console_print.css  

For brevity sake we’ve only selected the first 3 failing
pages. Now we can take these files to the developers and ask them if they
should be here or if some other file in the website is hitting them that
shouldn’t.

As you can see you can do a number of things with very few
commands to split / join or search through logs and drill into the information
that will help you the most.  PowerShell has many other methods and commands
that can be used to search through files and operate on them. In my next blog
post I’ll go through some of the other methods and ways that will make dealing
with logs that much easier.

For more really good examples, see this blog post, which was
used as a primary source for the IIS regular expressions:

https://infracloud.wordpress.com/2015/09/28/iis-log-parsing-using-powershell/

For many other regular expressions that can be used with
PowerShell see this website.

http://regexlib.com/Search.aspx?k=iis&c=-1&m=-1&ps=20

Until next time keep on scripting!

Thom

 

 

One thought on “Slicing and Dicing Log Files

Leave a comment