Local Test Repo Generation: Download Doc File,
Posted: Sat 28 Dec 2019, 00:01
1.0 Introduction
This code is preliminary but works. As noted in the title, it:
1. Generates a Test Repo and then
2. Adds the test repo to pkg
Part of this process involves running a web server. You can specify both the web server to run and fallback web servers to run (or install) if for some reason the specified web server won't run.
The actual fallback logic might need more work, but currently the web server specified is "busybox httpd", which I covered in a previous post. Since the Puppy version of busybox has this web server, no other web servers should be run (or installed) unless the user changes the specified web server in the script.
I created this code to test the repo update scripts in sc0ttman's package manager (i.e. pkg). I'll also want to try it with other web servers, as a means to test package installation while at the same time testing the repo update scripts. The code which is the subject of this thread is a good demonstration of what can be done with "pkg".
2.0 Cherry Picking Items for the Test Repo
The code to select the repo items has three parts:
2.1. Identify the items of interest
2.2. Randomly pick a few of the items of interest
2.3. Filter the Repo DB Doc File to select only those randomly selected items of interest.
After the Repo Doc File has been filtered, then:
3.1 download only the items in the filtered repo db doc file
3.2 start the web server
3.3 add the new repo to pkg. This adds the item to ~/.pkg/sources, ~/.pkg/sources-all and /etc/apt/sources.list and then converts the repo doc file into puppy format.
There are two scripts which are part of pkg to convert the repo into puppy format. They are ppa2pup and ppa2pup_gawk. The latter gawk version is many times faster for a large repo but not necessarily faster if there are only a few items. The gawk version is part of the main branch but not yet part of an official release of pkg.
2.1 Cherry Picking Items of Interest
As noted above, the first step is to identify the items of interest for testing. In our case we are interested in packages which include the epoch number in the version (see the deb-version man page). Historically, the puppy package manager has stripped the epoch number from the repo database, but this information could be useful for version comparison. The following awk program extracts the first three fields from a puppy "repo db doc file" (e.g. /var/packages/Packages-ubuntu-bionic-main) but only for the packages of interest, which are the ones that have a colon in their version number. The colon means that the version number includes the epoch.
Code: Select all
AWK_PRG_1=\
'BEGIN {FS="|"; OFS="|"}
# Field 1 is the full package name including the version; a colon there means an epoch is present
$1 ~ /^[^|]+:[^|]+$/ {
  print $1, $2, $3  # We might want to use some of these other fields for a different application
}'
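To see what this filter keeps, here is a minimal sketch that runs the same pattern over three made-up records in the pipe-separated puppy format. The sample package names and versions are invented for illustration. Only records whose first field contains a colon (an epoch) should be printed.

```shell
#!/bin/sh
# Sample records are hypothetical: fullname|name|version|
sample_db='foo-1:2.0-1|foo|1:2.0-1|
bar-3.1|bar|3.1|
baz-2:0.9|baz|2:0.9|'

# Same test as above: keep records whose first field contains a colon
AWK_EPOCH='BEGIN {FS="|"; OFS="|"}
$1 ~ /^[^|]+:[^|]+$/ { print $1, $2, $3 }'

epoch_records=$(printf '%s\n' "$sample_db" | awk "$AWK_EPOCH")
printf '%s\n' "$epoch_records"
```

With this sample input the "bar" record is dropped because its version has no epoch.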
2.2 Randomly Pick a few of the Items of Interest for Testing
As noted above, step two is to randomly pick some of these packages of interest and programmatically generate AWK code to select only these randomly picked items of interest.
Code: Select all
# Wrap a package name read from stdin in an awk array assignment
function echo_filter_line(){
  read -r a_pkg_name
  echo "pkg_filter[\"${a_pkg_name}\"]=\"true\""
}
# Field 2 of each matching record is the package name
while read -r pkg_record; do
  echo "$pkg_record" | cut -f2 -d'|' | echo_filter_line
done < <( awk "$AWK_PRG_1" "$REPO_DB_DOC_FILE_in" ) \
  | sort -R | head -n 3 >> "$filter_lines_path"
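The random packages of interest are selected in the above code by taking the first three records of a random sort:
Code: Select all
sort -R | head -n 3
Rather than output just the package name, we output an assignment into an array that holds all the packages we want to include in our filtered "repo db doc file". This array is an associative array (also known as a dictionary or, in some languages, a hash map). Typically this type of data structure has a fast lookup. The keys are simply the package names. If the array has a key equal to the package name, then we print the record. Ironically, the purpose of the code generation here is readability: inlining the data like this is more readable when the amount of data is small. For large data sets it would be better for the program to read the data from an external file.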
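The generated assignments fill an awk associative array keyed by package name. A minimal sketch of that lookup follows. It is an illustration rather than the script's actual code: the package names are made up, and it uses awk's `in` operator to test for a key instead of comparing the stored value to "true".

```shell
#!/bin/sh
# Hypothetical package names standing in for the randomly picked ones
picked='alpha
gamma'

# Turn each name into one awk array assignment, in the same shape echo_filter_line emits
filter_lines=$(printf '%s\n' "$picked" | while read -r name; do
  printf 'pkg_filter["%s"]="true"\n' "$name"
done)

# Inline the generated assignments into a BEGIN block, then test key membership
lookup_result=$(printf 'alpha\nbeta\ngamma\n' | awk "
BEGIN {
$filter_lines
}
(\$0 in pkg_filter) { print \$0 \" is selected\" }")
printf '%s\n' "$lookup_result"
```

Here "beta" is not a key of the array, so only the other two names are reported.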
2.3 Filter the Repo DB doc file for only the items of interest.
Here is an example of the code generated by my script:
Code: Select all
#!/usr/bin/gawk -f
function init_filter(){
  pkg_filter["libreoffice-l10n-nso"]="true"
  pkg_filter["libmythes-dev"]="true"
  pkg_filter["libgcc1-ppc64el-cross"]="true"
}
function filter_accept(s){ #Return true if we are to print the result
  if ( pkg_filter[s] == "true" ){
    return "true"
  } else {
    return "false"
  }
}
BEGIN {init_filter()}
/^Package:/ { PKG=$0; sub(/^Package: /,"",PKG); FILTER_ACTION=filter_accept(PKG)}
{if (FILTER_ACTION == "true"){
    print $0
  }
}
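Lines such as:
Code: Select all
pkg_filter["libreoffice-l10n-nso"]="true"
were generated by the previously mentioned function "echo_filter_line()" and this output is written to a file. The file is then read back into a string "representing the program" with the following code:
Code: Select all
$(cat "$filter_lines_path")
Depending on the options, you can execute the program as a string or have it first written to a file. Executing it as a string might be faster, but if you write it to a file then it is easier to debug.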
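An awk program held in a shell string can be run either directly from that string or from a file. The small sketch below compares the two. The program text, the input and the temporary file are placeholders of my own, not taken from the script. Both invocations should give the same result.

```shell
#!/bin/sh
# A tiny stand-in program; the real one is assembled from the generated filter lines
awk_prog='/^Package:/ { sub(/^Package: /, ""); print }'
input='Package: alpha
Version: 1:1.0
Package: beta'

# Mode 1: pass the program text directly as a string
out_string=$(printf '%s\n' "$input" | awk "$awk_prog")

# Mode 2: write the program to a file first, which is easier to inspect when debugging
prog_file=$(mktemp)
printf '%s\n' "$awk_prog" > "$prog_file"
out_file=$(printf '%s\n' "$input" | awk -f "$prog_file")
rm -f "$prog_file"

printf '%s\n' "$out_string"
```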
3.1 download only the items in the filtered repo db doc file
The code to download only the filtered items is quite simple.
Code: Select all
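AWK_PRG_3=\
'/^Filename:/ {
  FPATH=$2  # path of the file, relative to the repo root
  # $repo_url_in is expanded by the shell that system() starts, so it must be exported;
  # appending FPATH assumes it holds the base URL of the source repo
  system("wget --quiet \"$repo_url_in/" FPATH "\" -O \"" RROOT "/" FPATH "\" 1>/dev/null")
}'
awk -v "RROOT=$repo_root_path" "$AWK_PRG_3" "${doc_path}/Packages"
This AWK code only processes lines that start with "Filename". These lines give the path of the file to download. To download the file, the AWK code calls an external command by using AWK's system() function, which we use to call wget. The repo root on the local file system is passed to awk as an input variable, while the repo URL is inlined. Whether we inline or alternatively use the -v (for variable) option is somewhat arbitrary.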
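The difference between passing a value with -v and inlining it into the program text can be shown without touching the network. In this sketch the URL, the root path and the Packages stanza are all made up, and the wget command is only printed rather than executed.

```shell
#!/bin/sh
# Hypothetical values for illustration only
repo_url='http://localhost:8080/repo'
repo_root='/tmp/test-repo'
packages='Package: alpha
Filename: pool/main/a/alpha/alpha_1.0_all.deb'

# Variant 1: both values are handed to awk with -v
cmd_v=$(printf '%s\n' "$packages" | awk -v "URL=$repo_url" -v "RROOT=$repo_root" '
/^Filename:/ { print "wget --quiet " URL "/" $2 " -O " RROOT "/" $2 }')

# Variant 2: the URL is inlined by the shell into the awk program text
cmd_inline=$(printf '%s\n' "$packages" | awk -v "RROOT=$repo_root" "
/^Filename:/ { print \"wget --quiet $repo_url/\" \$2 \" -O \" RROOT \"/\" \$2 }")

printf '%s\n' "$cmd_v"
```

Both variants build the same command line. Inlining needs extra escaping and would break if the value contained a double quote, which is one reason to prefer -v when the value is not fixed.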
3.2 start the web server
Given that there are fallback web servers both to run and/or install, the full code to start the web server is quite complicated. But in my example the basic code to start the web server is as follows:
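Code: Select all
httpd -h /var/www/html
Currently the code uses a configuration file (the -c option) but the actual configuration file is empty. Also, as mentioned in my previous post, displaying the contents of a directory with busybox httpd requires CGI. Instructions on how to do this are in my previous post.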
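The fallback idea can be sketched roughly as below. This is not the script's actual logic: the function name and the candidate list are hypothetical, and the sketch only reports which candidate is available instead of starting or installing anything.

```shell
#!/bin/sh
# Print the first candidate whose command (its first word) is found on PATH.
# Returns non-zero if none of the candidates is available.
pick_webserver() {
  for candidate in "$@"; do
    cmd=${candidate%% *}
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# The preferred server comes first, fallbacks follow (the list is an example only)
chosen=$(pick_webserver "httpd -h /var/www/html" "busybox httpd -h /var/www/html") \
  || chosen="none found"
printf 'would run: %s\n' "$chosen"
```

Checking availability before running keeps the decision separate from the start-up itself, which makes the fallback order easy to test.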
3.3 add the new repo to pkg
The code to add a new repo to pkg is straightforward. For instance, on Debian systems the node.js repo can be added as follows:
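Code: Select all
pkg add-repo https://deb.nodesource.com/node_9.x stretch main
As mentioned above, there are two alternative functions that pkg uses to add a Debian repo. They are ppa2pup and ppa2pup_gawk. In the test code you choose which one you want to use: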
Code: Select all
TEST_CMD=ppa2pup_gawk
...
( exec <<< "$repo_name_out"
  pkg add-repo "$repo_url_out" "$distro_ver_out" "$stream_out" )
...
case "$TEST_CMD" in
  ppa2pup) PKG_PPA2PUP_FN=ppa2pup pkg --repo-update ;;
  ppa2pup_gawk) pkg --repo-update ;;
esac
Conclusion
This coding exercise has given me some examples of how I can filter a Debian repo and automatically create a mirror of the filtered packages with sc0ttman's package manager (i.e. pkg). It will be useful for testing sc0ttman's package manager, and I will also be able to adapt the code to other applications. The biggest weakness is perhaps the complexity of using fallback web server packages, but I think this fallback approach will be useful for testing and that there are other things I can learn from these fallback techniques.