NAME
mb - Write scripts easily in Big5, Big5-HKSCS, GBK, Sjis (also CP932), UHC, UTF-8, ...
SYNOPSIS
$ perl mb.pm script_by_mbcs.pl (auto detect encoding of script)
$ perl mb.pm -e big5 script_by_big5.pl
$ perl mb.pm -e big5hkscs script_by_big5hkscs.pl
$ perl mb.pm -e eucjp script_by_eucjp.pl
$ perl mb.pm -e gb18030 script_by_gb18030.pl
$ perl mb.pm -e gbk script_by_gbk.pl
$ perl mb.pm -e sjis script_by_sjis.pl
$ perl mb.pm -e sjis script_by_cp932.pl
$ perl mb.pm -e uhc script_by_uhc.pl
$ perl mb.pm -e utf8 script_by_utf8.pl
$ perl mb.pm -e wtf8 script_by_wtf8.pl
C:\WINDOWS> perl mb.pm script.pl ??-DOS-like *wildcard* available
MBCS quotes:
qq/ DAMEMOJI <MBCS text> /
q/ DAMEMOJI <MBCS text> /
m/ DAMEMOJI <MBCS text> /
s/ DAMEMOJI <MBCS text> / DAMEMOJI <MBCS text> /
split / DAMEMOJI <MBCS text> /
tr/ DAMEMOJI <MBCS text> / DAMEMOJI <MBCS text> /
y/ DAMEMOJI <MBCS text> / DAMEMOJI <MBCS text> /
qr/ DAMEMOJI <MBCS text> /
MBCS subroutines:
mb::chop(...);
mb::chr(...);
mb::do 'file';
mb::dosglob(...);
mb::eval 'string';
mb::getc(...);
mb::index(...);
mb::index_byte(...);
mb::length(...);
mb::ord(...);
mb::require 'file';
mb::reverse(...);
mb::rindex(...);
mb::rindex_byte(...);
mb::substr(...);
mb::use Module;
mb::no Module;
MBCS special variables:
$mb::PERL
$mb::ORIG_PROGRAM_NAME
supported encodings:
Big5, Big5-HKSCS, EUC-JP, GB18030, GBK, Sjis(also CP932), UHC, UTF-8, WTF-8
supported operating systems:
Apple Inc. OS X,
Hewlett-Packard Development Company, L.P. HP-UX,
International Business Machines Corporation AIX,
Microsoft Corporation Windows,
Oracle Corporation Solaris,
and Other Systems
supported perl versions:
perl version 5.005_03 to newest perl
DESCRIPTION
This software is a source code filter, a transpiler-modulino.
Perl is said to have been able to handle Unicode since version 5.8. However,
unlike JPerl, it lost "Easy jobs easy". (But we have got it back again :-D)
Shift_JIS and similar encodings (Big5, Big5-HKSCS, GB18030, GBK, Sjis, CP932,
UHC) contain DAMEMOJI: multibyte characters whose second octet is a
metacharacter. Which characters count as DAMEMOJI depends on whether the
enclosing delimiter is a single quote or a double quote.
This software escapes the DAMEMOJI in your script, generates a new script, and
runs it.
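The DAMEMOJI problem can be seen with plain octets, without mb.pm at all. The
sketch below is only an illustration under stated assumptions: the helper
"escape_damemoji" is hypothetical, it handles Shift_JIS double-octet characters
only, and it treats just "@" and "\" as metacharacters. It is not the code that
mb.pm itself uses.

```perl
use strict;

# Shift_JIS encodes KATAKANA LETTER SO as the two octets 0x83 0x5C.
# The second octet, 0x5C, is the same octet as the ASCII backslash.
my $so = "\x83\x5C";

# Hypothetical helper (not part of mb.pm): walk a Shift_JIS octet string
# and put a backslash in front of any trail octet that is "@" or "\".
sub escape_damemoji {
    my ($bytes) = @_;
    my $out = '';
    # Each match is either a lead octet plus its trail octet, or one single octet.
    while ($bytes =~ /([\x81-\x9F\xE0-\xFC][\x00-\xFF]|[\x00-\xFF])/g) {
        my $token = $1;
        if (length($token) == 2 and substr($token, 1, 1) =~ /[\@\\]/) {
            $out .= substr($token, 0, 1) . "\\" . substr($token, 1, 1);
        }
        else {
            $out .= $token;
        }
    }
    return $out;
}

my $escaped = escape_damemoji($so);

# Unescaped, the trail octet escapes the closing quote, so the
# double-quoted literal never terminates and eval fails.
my $broken = eval '"' . $so . '"';
my $broken_failed = defined($broken) ? 0 : 1;

# Escaped, the literal evaluates back to the original two octets.
my $fixed = eval '"' . $escaped . '"';

printf "escaped octets: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $escaped;
print "unescaped literal failed to compile\n" if $broken_failed;
print "escaped literal round-trips\n" if defined($fixed) and $fixed eq $so;
```

A single-octet backslash is left alone on purpose, because there it is a real
escape character written by the script author.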
Larry Wall san's Style
If you're using the utf8 pragma and you have a big headache, you are probably
on the wrong path. You should go back, once, to Larry Street, where there is a
sign that says ver.5.00503.
There is another path there. Follow that path. Soon, your headache will
improve.
In the script, "length()" always behaves as "bytes::length()", and "substr()"
always behaves as "bytes::substr()".
If you want to know the number of code points of the multibyte characters
contained in a scalar value, you have to write "mb::length()". If you want to
execute "substr()" in code point context, you have to write "mb::substr()".
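The difference between counting octets and counting code points can be
sketched in plain Perl. The helper "count_code_points" below is a hypothetical
stand-in, assuming well-formed UTF-8 input. It is not how "mb::length()" is
implemented. In a script run through mb.pm you would simply write
"mb::length($text)" instead.

```perl
use strict;

# Two UTF-8 encoded characters (U+3042 and U+3044), written as raw octets.
my $text = "\xE3\x81\x82\xE3\x81\x84";

# Without the utf8 pragma, the built-in length() counts octets.
my $octets = length($text);

# Hypothetical helper (not part of mb.pm): count code points in
# well-formed UTF-8 by counting each lead octet together with the
# continuation octets (0x80-0xBF) that follow it.
sub count_code_points {
    my ($bytes) = @_;
    my $count = () = $bytes =~ /[\x00-\x7F\xC2-\xF4][\x80-\xBF]*/g;
    return $count;
}

my $chars = count_code_points($text);

print "octets: $octets, code points: $chars\n";   # octets: 6, code points: 2
```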
Larry Wall san once said:
"Easy jobs must be easy."
Welcome to the world of Larry Wall san's Style!!
SEE ALSO
https://metacpan.org/author/INA
http://backpan.cpantesters.org/authors/id/I/IN/INA/
https://metacpan.org/release/Jacode4e-RoundTrip
https://metacpan.org/release/Jacode4e
https://metacpan.org/release/Jacode