MEMOMEM

¶ CommonLisp_CL-PPCRE_01

I tried out CL-PPCRE, a regular expression package for Common Lisp.

The implementation used here is CLISP.



;; Search Quicklisp for "ppcre"
> (ql:system-apropos "ppcre")
#<SYSTEM arnesi+.cl-ppcre-extras / arnesi+-20120909-darcs / quicklisp 2013-04-20>
#<SYSTEM arnesi.cl-ppcre-extras / arnesi-20101006-darcs / quicklisp 2013-04-20>
#<SYSTEM cl-ppcre / cl-ppcre-2.0.4 / quicklisp 2013-04-20>
#<SYSTEM cl-ppcre-template / cl-unification-20130128-cvs / quicklisp 2013-04-20>
#<SYSTEM cl-ppcre-test / cl-ppcre-2.0.4 / quicklisp 2013-04-20>
#<SYSTEM cl-ppcre-unicode / cl-ppcre-2.0.4 / quicklisp 2013-04-20>
#<SYSTEM optima.ppcre / optima-20130420-git / quicklisp 2013-04-20>
#<SYSTEM parser-combinators-cl-ppcre / cl-parser-combinators-20121125-git / quicklisp 2013-04-20>


;; Load CL-PPCRE
> (ql:quickload :cl-ppcre)
To load "cl-ppcre":
  Load 1 ASDF system:
      cl-ppcre
; Loading "cl-ppcre"


(:CL-PPCRE)


;; Sample string
> (defvar user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.28 (KHTML, like Gecko) Chrome/12.0.728.0 Safari/534.28")


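;; The first matching substring
;; (scan-to-strings also returns, as a second value, a vector of the
;; register matches, which is empty here because the regex has no groups)
> (ppcre:scan-to-strings "M\\S+" user-agent)
"Mozilla/5.0" ;
#()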


;; Global match (like Perl's m//g)
> (ppcre:all-matches-as-strings "M\\S+" user-agent)
("Mozilla/5.0" "Macintosh;" "Mac" "ML,")


;; Maybe a helper like this, named after Perl's m/re/g, is more intuitive?
(defun m/re/g (re str)
 (ppcre:all-matches-as-strings re str))


> (m/re/g "\\d+" user-agent)
("5" "0" "10" "6" "6" "534" "28" "12" "0" "728" "0" "534" "28")
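For comparison, here is a rough sketch of the same queries in Python's standard re module (Python is swapped in here; this is not CL-PPCRE, and first_match / m_re_g are hypothetical names). re.search plays the role of scan-to-strings, returning the whole match plus the group captures, and finditer collects every whole match the way all-matches-as-strings does.

```python
import re

USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) "
              "AppleWebKit/534.28 (KHTML, like Gecko) "
              "Chrome/12.0.728.0 Safari/534.28")


def first_match(regex, s):
    # Like scan-to-strings: the first match plus a tuple of its group captures
    m = re.search(regex, s)
    if m is None:
        return None, None
    return m.group(0), m.groups()


def m_re_g(regex, s):
    # Like all-matches-as-strings: every whole match, left to right.
    # finditer + group(0) is used instead of findall, because findall
    # returns the groups rather than the whole match when the regex has groups.
    return [m.group(0) for m in re.finditer(regex, s)]


print(first_match(r"M\S+", USER_AGENT))  # ('Mozilla/5.0', ())
print(m_re_g(r"M\S+", USER_AGENT))       # ['Mozilla/5.0', 'Macintosh;', 'Mac', 'ML,']
print(m_re_g(r"\d+", USER_AGENT))
```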

There is still a lot more to explore.

References:

CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp

Part 4: Common Lisp Library Guide

Part 2: A library environment with Quicklisp

#CommonLisp #Lisp #regex #CL-PPCRE #Quicklisp #CLISP

SN 2013/07/20 22:55:01

¶ Perl_hash_and_functions

Hash operations in Perl.

For some reason I keep forgetting these, so here is a memo.



# To bind each key and value of a hash
# to variables, use the each function
my %hash = ("key1" => "value1", 
            "key2" => "value2",);
while (my ($key, $value) = each %hash) {
    # ...
}


# To check whether a given key exists in the hash,
# use the exists function
if (exists $hash{"key1"}) {
    # To remove a key from the hash, use the delete function
    delete $hash{"key1"};
}
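The same three operations map onto a Python dict roughly as follows (a comparison sketch in a swapped-in language, not Perl). One difference worth remembering: Perl's each walks the hash in no particular order, while a Python dict iterates in insertion order.

```python
# A dict is Python's counterpart of a Perl hash
h = {"key1": "value1", "key2": "value2"}

# each %hash  ->  dict.items() binds key and value on every iteration
for key, value in h.items():
    print(key, value)

# exists $hash{"key1"}  ->  the "in" operator
if "key1" in h:
    # delete $hash{"key1"}  ->  del (h.pop("key1") would also return the value)
    del h["key1"]

print(h)  # {'key2': 'value2'}
```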

#Perl #hash #each #exists #delete

SN 2013/07/08 01:19:53

¶ Perl_multi_sort

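Depending on the locale, the sort command can ignore multibyte characters such as Japanese when it compares lines, so it is handy to write a Perl script like this and save it as a command, say msort. Perl's sort compares strings with plain cmp (byte by byte here, since the input is never decoded), so every character takes part in the comparison.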



#! /usr/bin/perl
use warnings;
use strict;


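# Read every line from the files named on the command line (or STDIN)
my @lines = ();
while (<>) {
    push @lines, $_;
}


# Sort with plain string comparison and print the result
foreach my $line (sort @lines) {
    print $line;
}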

With this approach, when several files are given, the lines of all of them go into a single array and are sorted together.
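A rough Python sketch of the same idea, for comparison (Python is swapped in here; msort and main are hypothetical names). It assumes the input files are text in the locale's default encoding. Sorting decoded str values compares code points, which for UTF-8 gives the same order as Perl's byte-wise cmp.

```python
import fileinput
import sys


def msort(lines):
    # sorted() compares str objects code point by code point and ignores
    # the locale, so multibyte characters count in the comparison
    return sorted(lines)


def main():
    # Like Perl's <>: read the files named in sys.argv[1:], or stdin if none,
    # collecting the lines of every file into one list before sorting
    sys.stdout.writelines(msort(fileinput.input()))
```

To use it as a command, end the script with the usual `if __name__ == "__main__": main()` guard.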

#Perl #Shell #sort

SN 2013/07/08 01:04:13

¶ Perl_get_http_status

Checking the HTTP status of a URL (200 OK, 404 Not Found, and so on) with Perl.



#!/usr/bin/perl
use strict;
use warnings;
use LWP;
use HTTP::Status;


my $ua = LWP::UserAgent->new();


sub get_http_status {
    my ($url, $ua) = @_; 
    return unless $url and $ua;    # need both a URL and a user agent
    my $response = $ua->head($url);
    my $msg = status_message($response->code);
    return $msg;
}


my @urls = qw(	http://basicwerk.com/ 
                http://basicwerk.com/memo.cgi
                http://basicwerk.com/contact.html
                http://basicwerk.com/not_found.html);


foreach my $url (@urls) {
    print "$url\t";
    print get_http_status($url, $ua);
    print "\n";
}


# The output looks like this
# http://basicwerk.com/	OK
# http://basicwerk.com/memo.cgi	OK
# http://basicwerk.com/contact.html	OK
# http://basicwerk.com/not_found.html	Not Found

To turn this into a script, say get_http_status.pl, that takes arbitrary URLs as arguments and prints the results, replace @urls with @ARGV:



#!/usr/bin/perl
use strict;
use warnings;
use LWP;
use HTTP::Status;


my $ua = LWP::UserAgent->new();


sub get_http_status {
    my ($url, $ua) = @_; 
    return unless $url and $ua;    # need both a URL and a user agent
    my $response = $ua->head($url);
    my $msg = status_message($response->code);
    return $msg;
}


foreach my $url (@ARGV) {
    print "$url\t";
    print get_http_status($url, $ua);
    print "\n";
}

$ chmod 0755 get_http_status.pl
$ get_http_status.pl http://basicwerk.com/ http://basicwerk.com/not_found.html
http://basicwerk.com/	OK
http://basicwerk.com/not_found.html	Not Found
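For comparison, a sketch of the same check using only Python's standard library (Python is swapped in here; the function names just mirror the Perl script). urllib.request sends the HEAD request, and http.HTTPStatus supplies the reason phrase in place of HTTP::Status's status_message. Network failures (urllib.error.URLError) are not handled in this sketch.

```python
import sys
import urllib.error
import urllib.request
from http import HTTPStatus


def status_message(code):
    # Map a numeric status code to its standard reason phrase, e.g. 404 -> "Not Found"
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return None  # a code the http module does not know


def get_http_status(url, timeout=10):
    # Send a HEAD request. urlopen raises HTTPError for 4xx/5xx responses,
    # but the error object still carries the status code.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            code = response.status
    except urllib.error.HTTPError as err:
        code = err.code
    return status_message(code)


if __name__ == "__main__":
    # Like the @ARGV version: one "URL<TAB>status" line per argument
    for url in sys.argv[1:]:
        print(f"{url}\t{get_http_status(url)}")
```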

#Perl #LWP #HTTP::Status

SN 2013/07/06 14:53:09

¶ 20130630_Gauche_study

Gauche study notes



(define html "<p>This is a a a <em>text</em>.</p>")


;; The equivalent of Perl's
;; $foo =~ s/regex/replacement/g;
(regexp-replace-all #/<.+?>/ html "")
;; -> "This is a a a text."


;; Return the list of strings matched by a global match
(use gauche.generator)


(define (m/re/g re str)
 (map 
  rxmatch-substring 
  (generator->list 
   (grxmatch re str))))


(m/re/g #/\w+/ html)
;; -> ("p" "This" "is" "a" "a" "a" "em" "text" "em" "p")


;; Remove duplicate elements (the first occurrence is kept)
(delete-duplicates (m/re/g #/\w+/ html))
;; -> ("p" "This" "is" "a" "em" "text")


;; For example, define a blacklist rule like this:
;; words shorter than 2 characters are rejected
(define (is_ng? x)
 (< (string-length x) 2))


;; Drop the blacklisted words
(remove is_ng? (delete-duplicates (m/re/g #/\w+/ html)))
;; -> ("This" "is" "em" "text")
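The same pipeline (strip tags, global match, dedupe, filter) as a rough Python sketch for comparison (Python is swapped in here, not Gauche). dict.fromkeys is used as an order-preserving stand-in for delete-duplicates, keeping the first occurrence just as the Gauche result above does.

```python
import re

html = "<p>This is a a a <em>text</em>.</p>"

# regexp-replace-all (Perl's s/<.+?>//g): re.sub replaces every match
stripped = re.sub(r"<.+?>", "", html)
print(stripped)  # This is a a a text.

# m/re/g: every whole match of \w+, left to right
words = [m.group(0) for m in re.finditer(r"\w+", html)]

# delete-duplicates: dict.fromkeys keeps the first occurrence of each word, in order
unique = list(dict.fromkeys(words))

# remove is_ng?: keep only words that are 2 or more characters long
kept = [w for w in unique if len(w) >= 2]
print(kept)  # ['This', 'is', 'em', 'text']
```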

Writing strings to a file, getting the date, and zero-padding numbers with format.



;; The equivalent of Perl's
;; open my $out, ">", "file";
(define out (open-output-file "file" :if-exists :supersede))
(format out "This is a text.\n")


;; For dates, use srfi-19
(use srfi-19)
(define today (current-date))
(format out 
 "And Today is ~4,'0D/~2,'0D/~2,'0D.\n" 
 (date-year today) 
 (date-month today) 
 (date-day today))


;; The equivalent of Perl's
;; close $out;
(close-output-port out)


;; The contents of file:
;; This is a text.
;; And Today is 2013/07/01.
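For comparison, a Python sketch of the same file output (Python is swapped in here; dated_line and write_file are hypothetical names). Format specs like `04d` do the zero padding that `~4,'0D` does in Gauche's format.

```python
import datetime


def dated_line(d):
    # Same padding as ~4,'0D/~2,'0D/~2,'0D: widths 4 / 2 / 2, filled with '0'
    return f"And Today is {d.year:04d}/{d.month:02d}/{d.day:02d}.\n"


def write_file(path, d):
    # Mode "w" truncates an existing file (like :if-exists :supersede),
    # and leaving the with-block closes it (like close-output-port)
    with open(path, "w", encoding="utf-8") as out:
        out.write("This is a text.\n")
        out.write(dated_line(d))
```

Calling `write_file("file", datetime.date.today())` would produce the same two lines as the Gauche version.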

References:

6.13 Regular expressions

http://practical-scheme.net/gauche/man/gauche-refj_51.html

9.8.2 Generator operations

http://practical-scheme.net/gauche/man/gauche-refj_82.html#Generator-operations

Removing duplicate elements from a list - Gauche Cookbook

http://d.hatena.ne.jp/rui314/20070219/p1

6.22.8 Output

http://practical-scheme.net/gauche/man/gauche-refj_60.html#g_t_00e5_0087_00ba_00e5_008a_009b

#Gauche #Scheme #Lisp #Regex

SN 2013/07/01 00:55:19