# Posts Tagged perl

Sometimes (especially in one-liners) you want to assign a value only if a corresponding regex (regular expression) that picks out the value matches. I.e if it has once matched you don’t want it overwritten with undef if the regex later fails on a subsequent row in your file.

This can be solved thusly:

 $var =$1 if (/Correct (\d)+ %/); 

The above snippet will assign $var if the regex on the right hand side matches and picks out a value (via the capturing parenthesis on the right hand side and otherwise leave it unchanged. Tags: A quick and dirty one-liner (depending on the length of your lines ;)) to calculate the average of a value in a bunch of files in a directory structure. The below one line picks out a value in each file that matches the name “Logfile*.txt” in the underlying directory structure. In the below case, the line was in the form of:  Correctly Classified Instances 37 60.6557 %  or  Correctly Classified Instances 37 60 %  The code traverses the directory structure from the current dir and picks out the “60.6557″ and sums that over the number of files that matched and then divides with however many files that matched.  find . -name "Logfile*.txt" -exec perl -ne '($var) = (/^Correctly.*\s+((\.|\d)+)\s+%/); print "$var\n" if$var;' '{}' \; | xargs perl -e 'use List::Util qw(sum); print(sum(@ARGV)/scalar(@ARGV)); print "\n";' 

OBS: Not very robust!! But it IS a one-liner!

Tags:

So I had this huge bunch of Spreadsheet::WriteExcel code where I was generating Excel sheets from some statistics I was gathering and I noticed that everytime I opened the Excel file I got a message with something along the lines “File error: data may have been lost”, but on a casual look the spreadsheet looked fine… but of course the devil is in the details. It turns out that if you happen to Spreadsheet::WriteExcel::write in the same cell twice (or more I guess ) that error message is what my Excel produces…. So how to find where the problem is… in this huge bunch of writes…

Enter Adam Kennedy’s excellent Aspect library for Perl, a truly brilliant module!! Thanks Adam!

How did I use it in this case? Here goes…

 .... lots of other code.... use Aspect; my $pointcut = call qr/Spreadsheet::WriteExcel::Worksheet::write$/; # Observe the $at the end, otherwise write_string will also match, and we don't want that ...... code passes ... sub in_my_big_excel_write_block { my %spcoords = (); my$before_write = before { my @args = @{$_->params}; if ($spcoords{$args[1]}{$args[2]} ) { croak "Will do double write at coord: [" . $args[1] . "," .$args[2] . "] = >" . $args[3] . "<\n" . "Previous write at coord: [" .$args[1] . "," . $args[2] . "] = >" .$spcoords{$args[1]}{$args[2]} . "<\n"; } $spcoords{$args[1]}{$args[2]} =$args[3]; } $pointcut; .... lots of$worksheet->write(...) code.... 

 } 
Now my poor perl script will die with a message telling me where and what I tried to write double and what I wrote there previously, now it’s easy to find!

A more general debugging tip would be this simple “before” advice:

 before { if( ($cnt % 1000) == 0 ) { my @args = @{$_->params}; print "Calling " . $_->sub_name . " with args : " . Dumper(@args) . "\n"; sleep 1; }$cnt++; } $pointcut;  Neat huh!? And trust me, this is only some small simple example of the power of the Aspect library. Tags: , , , Har inte upplevt några problem förut, men nu har det börjat strula… ÅÄÖ i Mac OS X Terminalen har blivit ett elände. Det är 3 olika problem. 1. Få ut åäö från applikationer, t.ex ‘cat textfil.txt’ (där filen innehåller åäö) eller från t.ex Perl program 2. Skriva åäö på kommandoraden, för mig har den nu börjat ge ingenting (ok, ett “beep”) eller “(arg: 6)” 3. Få ‘ls’ att korrekt lista filer med åäö i filnamen. Problem 1: Löses genom att ändra terminal settings till Western (ISO Latin 1) Se bild. Mac OS X Terminal Settings Problem 2: Löses med en kombination av lösningen till problem 1 och detta tips på 99.se Dvs, skapa en fil i hemkatalogen som heter “.inputrc” och innehåller följande: set convert-meta off set input-meta on set output-meta on Om man bara skapar .inputrc så blir det knas när man t.ex skriver åäö, följt av 3 backspace som då backar tillbaka in på prompten för mig. Och att bara skapa .inputrc löser inte Problem 1 (för mig iallafall). Problem 3: Har jag ingen lösning på. När jag gör ‘ls’ i en mapp med filer men åäö i filnamnen så visas “a?” istället för å, “a??” isf ä och “o??” isf ö och jag kan inte tab-completa filnamnet, dock så kan man skriva in filnamnet som det står eller copy-pasta det. Om jag gör export LC_ALL=”sv_SE” i terminalen och har “Terminal Settings” satt till UTF-8 då ger ‘ls’ svenska tecken i llistningen men då är jag tillbaka med Problem 1. Har jag Terminal Settings satt till Western (ISO Latin 1) så blir åäö några andra konstiga tecken (typ Ì). Tags: , Ok, this is sooo cool! For us who love Perl and often have the feeling of “If I could just filter this emacs buffer through Perl, how happy I would be!” here’s a tip: Step 1: Mark the buffer (or the parts of it that you are interested on running through Perl). Step 2: Type M+| (“Escape” + “|” (vertical bar) on your normal keyboard) or Step 2 (alt): Issue the “shell-command-on-region” elisp function Step 3: Enter your perl magic, for instance ‘perl -ne “print if /gnu/”‘ to remove all lines which does not match “gnu” Step 4: Be amazed! Now, the result will end up in the minibuffer if it can fit on one line, otherwise a new “shell command output buffer” will be created and hold your stuff. Tags: , , , I often find myself wanting to pass a regex as and argument to some function or another with the initial idea of passing the regex either as a string or a compiled regex (see the qr operator) and I invariably end up with problems. One example is for instance when I pass it as a string I mess up quoting ? / * . etc.. Another example of problems with this approach is the following, imagine that you have written a function to take a regex and you imagine searching all cells in an Excel sheet for cells matching this regex, as follows: sub find_in_worksheet { my$regex = shift;
my $workbook = shift; my$worksheet_name = shift;
my $worksheet =$workbook->worksheet($worksheet_name); my ($row_min, $row_max ) =$worksheet->row_range();
my ( $col_min,$col_max ) = $worksheet->col_range(); for my$row ( $row_min ..$row_max ) {
for my $col ($col_min .. $col_max ) { my$cell = $worksheet->get_cell($row, $col ); next unless$cell;

if( $cell->unformatted() =~$regex ) {
return ($row,$col);
}
}
}
return undef;
}


That works fine, until inevetably you want the cell contents NOT to match the regex… *sigh*, what now…?
Well it turns out that a much more fruitful (and a more general) approach is as follows:

sub find_in_worksheet {
my $predicate = shift; my$workbook = shift;
my $worksheet_name = shift; my$worksheet  = $workbook->worksheet($worksheet_name);

my ( $row_min,$row_max ) = $worksheet->row_range(); my ($col_min, $col_max ) =$worksheet->col_range();

for my $row ($row_min .. $row_max ) { for my$col ( $col_min ..$col_max ) {

my $cell =$worksheet->get_cell( $row,$col );
next unless $cell; if($predicate->($cell->unformatted()) ) { return ($row, $col); } } } return undef; }  You would call this functions as follows: my ($row, $col) = find_in_worksheet( sub {$_[0] =~ /regex to match cell contents/ } , $workbook,$worksheetname);


See the “sub { $_[0] =~ /regex to match cell contents/ }” part? There’s the magic. Now if you later on figure out that you want the cell contents NOT to match, you just call the method thusly: my ($row, $col) = find_in_worksheet( sub {$_[0] !~ /regex to match cell contents/ } , $workbook,$worksheetname);


Volá!

Now you can even change the code for the predicate to something completely different, maybe not even involving regexes… for instance:

my ($row,$col) = find_in_worksheet( sub { $_[0] == 10 } ,$workbook, $worksheetname);  This is just a small example of the power of using “sub’s” as arguments to functions and the very nice syntax in Perl for achieveing it. It also carries the benefit of giving you compile time checking of your regex! (Yes, Perl is a compiled language!) Tags: , , As a followup on date calculations, here’s small one to calculate the number of days between two dates, the same disclamer as for the previous post applies, so, without further ado, here goes: # Given two dates, calculate the number of days between them sub diff_dates { my ($sec1,$min1,$hour1,$mday1,$mon1,$year1) = @_[0..5]; my ($sec2,$min2,$hour2,$mday2,$mon2,$year2) = @_[6..11]; my$time1 = timegm($sec1,$min1,$hour1,$mday1,$mon1,$year1);
my $time2 = timegm($sec2,$min2,$hour2,$mday2,$mon2,$year2); my$diff_seconds = $time1 -$time2;
my $daydiff =$diff_seconds / 24 / 3600;

return int( $daydiff ); }  Tags: , , Ok, here’s a small snippet for calculating a new date given a date and the numer of days forward or backwards from that date without having to include Date::Time or some other big date handling lib which is not part of the standard perl distro. It wouldn’t surprise me if there are some subtile problems with this way of doing it, maybe leap years or something will mess it up, but I have found that it works fine for the most for me at least YMMV! The format of the input is of course according to timegm. # Given a date and the number of days forward or backwards (negative) calculate that date sub calc_date { my ($sec,$min,$hour,$mday,$mon,$year) = @_[0..5]; my$diff_days = $_[6]; my$time = timegm($sec,$min,$hour,$mday,$mon,$year);
my $diff_seconds =$diff_days * 24 * 3600;
my $end_time =$time + $diff_seconds; return localtime($end_time);
}


Tags: , ,

There are two basic alternatives for file slurping:

my @lines = <FH>;


Or…

{
local( $/, *FH ) ; open( FH,$file ) or die "sudden flaming death\n";
$text = <FH>; }  A small comment on Perl 6, it has finally managed to get it right!  my$content = slurp $filename; # or my @lines = slurp$filename;


Does it get easier than that? Shall we look at the Java code for the corresponding operation? No better not, the post will become so long that noone will have time to read it!

Tags: ,

To continue on the track of installing modules as a user in an administered environment, what if you want to use CPAN (you do!) then how would you go about it??

Well, first of fire away cpan:

   perl -MCPAN -e shell


If you haven’t already configured CPAN it will ask you to do it. If you have, you can reconfigure CPAN by issuing the following command at the CPAN prompt:

   o conf init


At this point the configuration will start. Where it is especially interresting for us trapped users are:

Every Makefile.PL is run by perl in a separate process. Likewise we
run 'make' and 'make install' in processes. If you have any
parameters (e.g. PREFIX, LIB, UNINST or the like) you want to pass
to the calls, please specify them here.

If you don't understand this question, just press ENTER.

Parameters for the 'perl Makefile.PL' command?
Typical frequently used settings:

PREFIX=~/perl       non-root users (please see manual for more hints)



Here you would put in whatever existing directory in your home catalog. For instance:

   PREFIX=~/perllib


Or you should be able to use:

   INSTALL_BASE=~/perllib


The next interesting section is:

If you're accessing the net via proxies, you can specify them in the
CPAN configuration or via environment variables. The variable in
the \$CPAN::Config takes precedence.



Ok, now you should only need to finish the configuration and be ready to start installing perl modules from CPAN in your own home directory.

Remember, to use these installed modules later on you may need to add the

use lib

directive to your perl scripts or export a suitable environment variable (see previous post on the topic).

Update: Had a particularly difficult machine where CPAN persistently tried to run ‘make install’ as root, which it did not have privilage to.

This:

perl -Iperl5/lib/perl5 -MCPAN -Mlocal::lib -e 'CPAN::install(Lingua::EN::StopWords)'


Solved that problem…