I thought some people might find AWK code that writes AWK code interesting. I know it is the type of thing that the style police hate. The idea here is to write code that quickly computes a canonical alias. By canonical, I mean a standard form: we choose just one alias to be the canonical alias, and every alias for the same package returns the same canonical result.
Background
This will be applied to finding categories as follows. Puppy has a file that defines the categories for each package:
/usr/local/petget/categories.dat
Each category is defined like an array, in the form:
PKGCAT_Desktop_applet=" gfontsel glipper minixcal xclipboard "
AWK uses associative arrays (aka dictionaries, maps, hashmaps, hashtables). Well, the exact name depends on the implementation: if the associative arrays are not implemented with a hash table, then some of these names don't apply. Typically, though, associative arrays have fast lookup, like a hash table.
We can instead use the package name as the key and the category as the value. This way we can quickly look up the category. However, the name of the package might not be in the same format as in categories.dat. We might have to trim a version number off it or translate it to a different alias for the package (different Linux distros name the packages differently).
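To make the lookup idea concrete, here is a minimal sketch of an AWK associative array keyed by package name. The function name lookup_demo and the array name category are made up for illustration, and the two entries are just copied from the example line above.

```sh
# Hypothetical demo: look up a package name in an AWK associative array.
lookup_demo() {
    awk -v pkg="$1" 'BEGIN {
        category["glipper"]  = "Desktop_applet"
        category["minixcal"] = "Desktop_applet"
        # "in" tests for a key without creating it as a side effect
        if (pkg in category)
            print category[pkg]
        else
            print "unknown"
    }'
}
```

Calling `lookup_demo glipper` prints Desktop_applet, and any name that was never stored prints unknown.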
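Flipping the file around so the package is the key could look roughly like the sketch below. It assumes every relevant line has exactly the PKGCAT_Name=" pkg pkg ... " form shown above; the function name category_of and the array name cat_for are hypothetical, and no version trimming or alias translation is attempted here.

```sh
# Hypothetical helper: category_of PKG [FILE...]
# Builds a package -> category map from PKGCAT_ lines, then looks PKG up.
category_of() {
    pkg=$1; shift
    awk -v pkg="$pkg" '
    /^PKGCAT_[A-Za-z0-9_]*=/ {
        eq = index($0, "=")
        category = substr($0, 8, eq - 8)   # text between "PKGCAT_" and "="
        list = substr($0, eq + 1)
        gsub(/"/, "", list)                # drop the surrounding quotes
        n = split(list, names, " ")        # " " splits on runs of blanks
        for (i = 1; i <= n; i++)
            cat_for[names[i]] = category   # package name is the key
    }
    END {
        print ((pkg in cat_for) ? cat_for[pkg] : "unknown")
    }' "$@"
}
```

It would be called as `category_of glipper /usr/local/petget/categories.dat`, or with the data piped in on stdin.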
What this code does
This code only solves the first part of the problem, which is the translation of a package name to a different alias.
Test Input
rxvt-unicode,urxvt,urxvt-unicode
gtk+,gtk+2*
gtkdialog,gtkdialog3
dbus*,libdbus*,libdbus-glib*
mesa,mesa_*,libgl1-mesa*,mesa-common*
sane,sane-backends
samba,samba-tng,samba_*,mountcifs
udev,udev_*,libudev*,libgudev*
xdg_puppy,xdg-utils
perl_tiny,perl-base,perl-modules,perlapi*
** This alias data was taken from sc0ttmann's pkg (/usr/sbin/pkg#L212)
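For anyone who wants the gist of the generator without reading the pastebin, here is a minimal sketch of AWK that writes AWK from alias lines like the ones above. It is not the actual script: the name gen_canonical_awk and the helper glob_to_regex are hypothetical, it assumes the first field of each line is the canonical name (with a trailing * stripped), and it emits plain if tests with anchored patterns rather than a switch so that the generated code does not depend on gawk.

```sh
# Hypothetical generator: reads alias lines on stdin (or from files) and
# writes AWK source for two functions on stdout.
gen_canonical_awk() {
    awk -F, '
    # Turn a glob such as gtk+2* into an anchored regex such as ^gtk\+2.*$
    function glob_to_regex(g,    i, c, out) {
        out = "^"
        for (i = 1; i <= length(g); i++) {
            c = substr(g, i, 1)
            if (c == "*")
                out = out ".*"
            else if (index(".+?()[]{}|^$\\/", c))
                out = out "\\" c          # escape regex metacharacters
            else
                out = out c
        }
        return out "$"
    }
    NF == 0 { next }                      # skip blank lines
    {
        canon = $1
        sub(/\*$/, "", canon)             # dbus* names the canonical form dbus
        for (f = 1; f <= NF; f++) {
            if (index($f, "*"))           # wildcard alias: becomes a regex test
                pats[++np] = sprintf("    if (s ~ /%s/) return \"%s\"", glob_to_regex($f), canon)
            else                          # exact alias: becomes an array entry
                exact[++ne] = sprintf("    CANONICAL_ARY[\"%s\"] = \"%s\"", $f, canon)
        }
    }
    END {
        print "function init_CANONICAL_ARY() {"
        for (k = 1; k <= ne; k++) print exact[k]
        print "}"
        print "function get_canonical_name(s) {"
        print "    if (s in CANONICAL_ARY) return CANONICAL_ARY[s]"
        for (k = 1; k <= np; k++) print pats[k]
        print "    return s"
        print "}"
    }' "$@"
}
```

The generated text can be saved to a file and loaded with `awk -f`, or pasted in front of a main program that calls init_CANONICAL_ARY() once in BEGIN and get_canonical_name() per record. Entries come out in input order here, which is a choice of this sketch.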
Test Output
function get_canonical_name(s){
    # Exact aliases are resolved with a single array lookup.
    if (s in CANONICAL_ARY){
        return CANONICAL_ARY[s]
    } else {
        # Wildcard aliases fall through to regex cases (switch is a gawk extension).
        switch(s){
            case /gtk\+2.*/:
                return "gtk+"
            case /libdbus-glib.*/:
                return "dbus"
            case /libdbus.*/:
                return "dbus"
            case /dbus.*/:
                return "dbus"
            case /mesa-common.*/:
                return "mesa"
            case /libgl1-mesa.*/:
                return "mesa"
            case /mesa_.*/:
                return "mesa"
            case /samba_.*/:
                return "samba"
            case /libgudev.*/:
                return "udev"
            case /libudev.*/:
                return "udev"
            case /udev_.*/:
                return "udev"
            case /perlapi.*/:
                return "perl_tiny"
        }
        # No alias matched: the name is its own canonical form.
        return s
    }
}
function init_CANONICAL_ARY(){
    CANONICAL_ARY["perl_tiny"]="perl_tiny"
    CANONICAL_ARY["perl-base"]="perl_tiny"
    CANONICAL_ARY["samba-tng"]="samba"
    CANONICAL_ARY["sane-backends"]="sane"
    CANONICAL_ARY["gtkdialog3"]="gtkdialog"
    CANONICAL_ARY["xdg_puppy"]="xdg_puppy"
    CANONICAL_ARY["mountcifs"]="samba"
    CANONICAL_ARY["mesa"]="mesa"
    CANONICAL_ARY["urxvt-unicode"]="rxvt-unicode"
    CANONICAL_ARY["gtkdialog"]="gtkdialog"
    CANONICAL_ARY["perl-modules"]="perl_tiny"
    CANONICAL_ARY["urxvt"]="rxvt-unicode"
    CANONICAL_ARY["rxvt-unicode"]="rxvt-unicode"
    CANONICAL_ARY["xdg-utils"]="xdg_puppy"
    CANONICAL_ARY["gtk+"]="gtk+"
    CANONICAL_ARY["samba"]="samba"
    CANONICAL_ARY["sane"]="sane"
    CANONICAL_ARY["udev"]="udev"
}
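One detail worth knowing when reading patterns like /dbus.*/: an AWK regex without a leading ^ matches anywhere in the string, so it also hits names that merely contain dbus. The sketch below just demonstrates that difference. The function name match_demo is made up and the package names are only illustrations, not taken from the alias data.

```sh
# Hypothetical demo: unanchored versus anchored matching of one name.
match_demo() {
    awk -v s="$1" 'BEGIN {
        loose  = (s ~ /dbus.*/) ? "yes" : "no"   # matches anywhere in s
        strict = (s ~ /^dbus/)  ? "yes" : "no"   # matches only at the start
        print loose, strict
    }'
}
```

`match_demo dbus-x11` prints "yes yes", while `match_demo python-dbus` prints "yes no". Whether the looser behaviour matters depends on the package names that actually get fed in.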
https://pastebin.com/kwmtNern
and I have even rougher code than this showing how I will use the generated code above:
https://pastebin.com/4Pw5QrW5
I don't recommend looking too closely at either of these pastebin scripts yet because they are not finished. What I will say, though, is that since AWK code is applied repeatedly over a data file, it makes sense to me to optimize it, and if you can optimize AWK code by having it written by other code, then so be it. Style police be damned!
Anyway, when I finish everything that I have in mind here, I will apply it toward resolving (at least in part):
Issue #44 in pkg - slack2pup and ppa2pup can't get good package categories
Other AWK Topics
AWK: match($2,/^(.*[^:digit:])([:digit:]*$|$)/,pkg_split)
awk: Converting deb dependency info into puppy format
AWK Based Version Comparison