Local Test Repo Generation: Download Doc File,

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48

Local Test Repo Generation: Download Doc File,

#1 Post by s243a »

1.0 Introduction

This code is preliminary, but it works. As noted in the title, it:
1. Generates a Test Repo and then
2. Adds the test repo to pkg

Part of this process involves running a webserver. You can specify both the webserver to run as well as fallback webservers to run (or install) if for some reason the specified webserver won't run.

The actual fallback logic might need more work, but currently the webserver specified is "busybox httpd", which I covered in a previous post. Since the puppy version of busybox includes this webserver, no other webservers should be run (or installed) unless the user changes the specified webserver in the script.

I created this code to test the repo update scripts in sc0ttman's package manager (i.e. pkg), and I'll also want to try it with other web servers, both as a means to test package installation and to test the repo update scripts at the same time. The code which is the subject of this thread is a good demonstration of what can be done with "pkg".

2.0 Cherry Picking Items for the Test Repo

The code to select the repo items has three parts:
2.1. Identify the items of interest
2.2. Randomly pick a few of the items of interest
2.3. Filter the Repo DB Doc File to select only those randomly selected items of interest.

After the Repo Doc File has been filtered, then:
3.1 download only the items in the filtered repo db doc file
3.2 start the web server
3.3 add the new repo to pkg. This adds the item to ~/.pkg/sources, ~/.pkg/sources-all and /etc/apt/sources.list and then converts the repo doc file into puppy format.

There are two scripts which are part of pkg to convert the repo into puppy format. They are ppa2pup and ppa2pup_gawk. The latter gawk version is many times faster for a large repo, but not necessarily faster if there are only a few items. The gawk version is part of the main branch but not yet part of an official release of pkg.
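To illustrate the kind of transformation involved, here is a toy sketch that flattens one Debian stanza into a single pipe-delimited record. To be clear, this is NOT the real ppa2pup code, and the puppy field layout shown is only approximate; the package data is made up.

Code: Select all

```shell
# Toy sketch only -- NOT the real ppa2pup; the puppy field layout is approximate.
rec=$(awk -v RS= -F'\n' '{
  for (i = 1; i <= NF; i++) {          # each stanza line is "Key: value"
    split($i, kv, ": ")
    f[kv[1]] = kv[2]
  }
  # rough shape of a puppy record: fullname|name|version|...|filename
  print f["Package"] "-" f["Version"] "|" f["Package"] "|" f["Version"] "||||" f["Filename"]
}' <<'EOF'
Package: libmythes-dev
Version: 2:1.2.4
Filename: pool/main/m/mythes/libmythes-dev_1.2.4.deb
EOF
)
echo "$rec"
```

The real scripts of course carry over more fields (size, dependencies, description and so on), but the basic move is the same: one stanza in, one pipe-delimited line out.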

2.1 Cherry Picking Items of Interest
As noted above, the first step is to identify the items of interest for testing. In our case we are interested in packages which include the epoch number in the version (see the deb-version manpage). Historically, the puppy package manager has stripped the epoch number from the repo database, but this information could be useful for version comparison. The following awk program extracts the first three fields from a puppy "repo db doc file" (e.g. /var/packages/Packages-ubuntu-bionic-main), but only for the packages of interest, which are the ones that have a colon in their version number. The colon means that the version number includes the epoch.

Code: Select all

  AWK_PRG_1=\
'BEGIN {FS="|"; OFS="|"}
{ if ($1 ~ /^[^|]+:[^|]+$/  ){
    print $1 "|" $2 "|" $3 #We might want to use some of these other fields for a different application
 }}'
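To see the filter in isolation, here it is run against two made-up records in the puppy db layout assumed above (fullname|name|version|...). Only the record whose version carries an epoch (the "1:" prefix) should survive:

Code: Select all

```shell
AWK_PRG_1='BEGIN {FS="|"; OFS="|"}
{ if ($1 ~ /^[^|]+:[^|]+$/  ){
    print $1 "|" $2 "|" $3
 }}'
# two hypothetical records: only libfoo has an epoch in its version
picked=$(printf '%s\n' \
    'bash-5.0|bash|5.0|BuildingBlock|100K' \
    'libfoo-1:2.3|libfoo|1:2.3|BuildingBlock|50K' \
  | awk "$AWK_PRG_1")
echo "$picked"   # libfoo-1:2.3|libfoo|1:2.3
```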
2.2 Randomly Pick a few of the Items of Interest for Testing

As noted above, step two is to randomly pick some of these packages of interest and programmatically generate AWK code to select only these randomly picked items.

Code: Select all

function echo_filter_line(){
    read a_pkg_name
    echo "pkg_filter[\""$a_pkg_name"\"]=\"true\""
}
  while read pkg_record; do
    echo "$pkg_record" | cut -f2 -d'|' | echo_filter_line
  done < <( cat $REPO_DB_DOC_FILE_in | awk "$AWK_PRG_1" ) \
  | sort -R | head -n 3 >> "$filter_lines_path"
The random packages of interest are selected in the above code by taking the first three records of a random sort:

Code: Select all

 sort -R | head -n 3 
Rather than output just the package names, we output an associative array (AKA a dictionary, or in some cases a hashmap) containing all the packages which we want to include in our filtered "repo db doc file". Typically this type of data structure has a fast lookup. The keys are simply the package names. If the array has a key equal to the package name then we print the record. The purpose of the code generation here is, ironically, readability: in-lining the data like this is more readable when the amount of data is small. For a large data set it would be better for the program to read the data from an external file.
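For example, feeding one made-up record through the cut/echo_filter_line pipeline yields one line of generated AWK:

Code: Select all

```shell
echo_filter_line(){
    read a_pkg_name
    echo "pkg_filter[\"$a_pkg_name\"]=\"true\""
}
# field 2 of the record is the bare package name
line=$(echo 'libfoo-1:2.3|libfoo|1:2.3' | cut -f2 -d'|' | echo_filter_line)
echo "$line"   # pkg_filter["libfoo"]="true"
```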

2.3 Filter the Repo DB doc file for only the items of interest.

Here is an example of the code generated by my script:

Code: Select all

#!/usr/bin/gawk -f
function init_filter(){
pkg_filter["libreoffice-l10n-nso"]="true"
pkg_filter["libmythes-dev"]="true"
pkg_filter["libgcc1-ppc64el-cross"]="true"
}
function filter_accept(s){ #Return true if we are to print the result
  if ( pkg_filter[s] == "true" ){
    return "true"
  } else {
    return "false"
  }
}
BEGIN {init_filter()}
/^Package:/ { PKG=$0; sub(/^Package: /,"",PKG); FILTER_ACTION=filter_accept(PKG)}
{if (FILTER_ACTION == "true"){
    print $0
  }
}
Lines such as:

Code: Select all

pkg_filter["libreoffice-l10n-nso"]="true"
were generated by the previously mentioned function "echo_filter_line()", and this output is written to a file. The file is then read back into a string representing the program with the following code:

Code: Select all

$(cat $filter_lines_path)
Depending on the options you can execute the program as a string or have it first written to a file. Executing it as a string might be faster but if you write it to a file then it is easier to debug.
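As a quick end-to-end check, here is a filter in the same style as the generated script above, written to a file (the easier-to-debug variant) and applied to a tiny made-up Packages fragment; only the libmythes-dev stanza should come through. Plain awk is enough here, since nothing gawk-specific is used:

Code: Select all

```shell
cat > /tmp/filter.awk <<'EOF'
function init_filter(){
  pkg_filter["libmythes-dev"]="true"
}
function filter_accept(s){
  if ( pkg_filter[s] == "true" ) { return "true" } else { return "false" }
}
BEGIN {init_filter()}
/^Package:/ { PKG=$0; sub(/^Package: /,"",PKG); FILTER_ACTION=filter_accept(PKG)}
{ if (FILTER_ACTION == "true") print $0 }
EOF
cat > /tmp/Packages.sample <<'EOF'
Package: bash
Version: 4.4

Package: libmythes-dev
Version: 2:1.2.4
EOF
filtered=$(awk -f /tmp/filter.awk /tmp/Packages.sample)
echo "$filtered"
```

Note that lines before the first "Package:" header are dropped automatically, because FILTER_ACTION starts out empty, which compares unequal to "true".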

3.1 download only the items in the filtered repo db doc file

The code to download only the filtered items is quite simple.

Code: Select all

  AWK_PRG_3=\
'/^Filename:/ {
     FPATH=$2   # e.g. pool/main/m/mythes/libmythes-dev_1.2.4.deb
     DIR=FPATH; sub(/\/[^\/]*$/, "", DIR)        # parent directory of the file
     system("mkdir -p \"" RROOT "/" DIR "\"")    # so wget -O has somewhere to write
     system("wget --quiet \"" RURL "/" FPATH "\" -O \"" RROOT "/" FPATH "\"")
     }'
   awk -v "RROOT=$repo_root_path" -v "RURL=$repo_url_in" \
    "$AWK_PRG_3" "${doc_path}/Packages"
This AWK code only processes lines that start with "Filename:". These lines give the path of each file to download, relative to the repo root. To download a file, the AWK code calls wget via AWK's system() function. The repo root on the local file system is passed to awk as an input variable, and the repo url can either be inlined into the program string or passed the same way; whether we inline or use the -v (for variable) option is somewhat arbitrary.
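A dry run, with print substituted for the system(wget ...) call and made-up repo root/url values, shows the command that would be issued for each Filename line:

Code: Select all

```shell
# Dry-run variant: print the wget command instead of executing it.
AWK_PRG_DRY='/^Filename:/ {
    FPATH=$2
    print "wget --quiet \"" RURL "/" FPATH "\" -O \"" RROOT "/" FPATH "\""
}'
cmd=$(printf 'Package: foo\nFilename: pool/main/f/foo/foo_1.0.deb\n' \
  | awk -v RROOT=/var/www/html -v RURL=http://archive.example/ubuntu "$AWK_PRG_DRY")
echo "$cmd"
```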

3.2 start the web server

Given that there are fallback webservers to run and/or install, the full code to start the webserver is quite complicated. But in my example the basic command to start the web server is as follows:

Code: Select all

httpd -h /var/www/html
Currently the code uses a configuration file (the -c option), but the actual configuration file is empty. Also, as mentioned in my previous post, displaying the contents of a directory with busybox httpd requires CGI. Instructions on how to do this are in my previous post.

3.3 add the new repo to pkg

The code to add a new repo to pkg is straightforward. For instance, on Debian systems the node.js repo can be added as follows:

Code: Select all

pkg add-repo https://deb.nodesource.com/node_9.x stretch main
As mentioned above, there are two alternative functions that pkg uses to add a Debian repo: ppa2pup and ppa2pup_gawk. In the test code you choose which one you want to use:

Code: Select all

TEST_CMD=ppa2pup_gawk
...
    ( exec <<< "$repo_name_out"
      pkg add-repo "$repo_url_out" "$distro_ver_out" "$stream_out" )
...
case "$TEST_CMD" in
  ppa2pup)      PKG_PPA2PUP_FN=ppa2pup pkg --repo-update ;;
  ppa2pup_gawk) pkg --repo-update ;;
esac
Conclusion

This coding exercise has given me some examples of how I can filter a Debian repo and automatically create a mirror of the filtered packages with sc0ttman's package manager (i.e. pkg). It will be useful for testing sc0ttman's package manager, and I will also be able to adapt the code to other applications. The biggest weakness is perhaps the complexity of using fallback webserver packages, but I think this fallback approach will be useful for testing, and I think that there are other things that I can learn from these fallback techniques.
Find me on [url=https://www.minds.com/ns_tidder]minds[/url] and on [url=https://www.pearltrees.com/s243a/puppy-linux/id12399810]pearltrees[/url].

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48

#2 Post by s243a »

The above post is now ready for reading.
Find me on [url=https://www.minds.com/ns_tidder]minds[/url] and on [url=https://www.pearltrees.com/s243a/puppy-linux/id12399810]pearltrees[/url].

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#3 Post by musher0 »

For now this is just a note to myself. TWYL.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#4 Post by sc0ttman »

In case this is of interest to anyone (it's somewhat related), there is lots of
Bash CGI related stuff here:

http://murga-linux.com/puppy/viewtopic.php?t=115252
[b][url=https://bit.ly/2KjtxoD]Pkg[/url], [url=https://bit.ly/2U6dzxV]mdsh[/url], [url=https://bit.ly/2G49OE8]Woofy[/url], [url=http://goo.gl/bzBU1]Akita[/url], [url=http://goo.gl/SO5ug]VLC-GTK[/url], [url=https://tiny.cc/c2hnfz]Search[/url][/b]
