Steeph's Web Site

Entries tagged 'cat:Software' (Page 4)

SBWG 1.0.0 Delayed To Next Life

I've decided to not publish version 1.0.0 of SBWG.

Very early on I decided against a versioning approach that would have let me stay below version 1.0 for a very long time by only incrementing the version number by 0.0.1 for important changes. Instead I chose to increment the version and publish a new version whenever. I stick to this approach because it allows me to express felt overall progress in the version number. But now that the goals that I at some point set for version 1.0.0 are nearly reached (they pretty much are), this leaves me in a spot where publishing version 1.0.0 would be the next thing to do, despite the fact that it's not actually absolutely perfect yet. Absolute perfection isn't really my approach. It's still more like a learning project.

So to avoid having published a version that looks perfect on the label but isn't inside, I will skip this version number. Don't get me wrong. My goals are met. I've tested it more than I thought I would, thought of potential problems and fixed bugs that I'd argue are not something you'd expect most shell scripters to catch. It definitely works reliably for what it is intended to do and at least works well for much more than I thought it would in the first few months of this project.

So, there is no link in this entry. No new version published today. The next version will be 1.x.x something.

Alright. Now that that's done, I can start to implement new features again next year.

This entry is an update of the entry 'Backup Shell Script'.

I've updated/improved my backup script again.




# Expects budir, num, list and now to be set before this point.

# Let a pipeline fail if any command in it fails, so the "rsync | tee"
# check in single() reflects rsync's exit status instead of tee's.
set -o pipefail

function usage () {
  echo "This script requires either one or two arguments."
  echo "Usage:"
  echo "$0 JOBLIST"
  echo "JOBLIST File that contains one backup job per line in the format SOURCE_DIRECTORY NAME"
  echo "SOURCE_DIRECTORY Directory that should be backed up. Please, no trailing slash."
  echo "NAME A string that is used to name the backup in the destination."
  exit 1
}

# Rotate the existing backups of one job and create a new incremental backup.
function single() {
  if grep -qs "$budir " /proc/mounts; then
    printf "\n"
    printf "Attempting backup of source directory '%s' to '%s'.\n" "$srcdir" "$fullname"
    printf "Number of differential backups to keep: %s\n" "$num"
    printf "Removing oldest backup... "
#    rm -rf "$fullname.$num"
    rsync --archive --delete "$budir/empty/" "$fullname.$num/"    # This is quicker than rm.
    rmdir "$fullname.$num"    # Remove the emptied directory so mv below renames instead of moving into it.
    printf "Done.\n"
    for ((i=$num; i>=2; i--)); do
      printf "Renaming '%s' to '%s' ... " "$fullname.$((i-1))" "$fullname.$i"
      mv "$fullname.$((i-1))" "$fullname.$i"
      printf "Done.\n"
    done
    printf "Duplicating last backup ('%s' to '%s')... " "$fullname.0" "$fullname.1"
    [[ -d $fullname.1 ]] && exit 1    # This directory should not exist at that point.
    cp -al "$fullname.0" "$fullname.1"
#    rsync --archive --acls --xattrs --hard-links "$fullname.0/" "$fullname.1/"
    printf "Done.\n"
    printf "\n\n\n" >> "$fullname.log"
    printf "STARTING INCREMENTAL BACKUP AT %s\n" "$now" >> "$fullname.log"
    printf "Starting new incremental backy uppy at '%s.0'..." "$fullname"
    if rsync --exclude ".cache" --archive --no-links --delete "$srcdir/" "$fullname.0/" 2>&1 | tee -a "$fullname.log"; then
      printf " Done.\n"
      printf "Done.\n" >> "$fullname.log"
    else
      printf " Failed.\n"
      printf "Failed.\n" >> "$fullname.log"
    fi
    printf "\nIf this line is here the script finished (with or without errors) at %s\n" "$now" >> "$fullname.log"
  else
    printf "'%s' is not mounted. Aborting." "$budir"
    exit 1
  fi
}

# Run single() for every non-empty line of the job list.
function fromlist() {
  while read -r job; do
    if [ -n "$job" ]; then
      name="${job#* }"      # Everything after the first space.
      srcdir="${job% *}"    # Everything before the last space.
      fullname="$budir/$name"    # Assumed: backups are stored below budir under NAME.
      single
    fi
  done < "$list"
}

mkdir -p "$budir/empty"    # Empty directory for a quicker method to delete a large directory.

# The branch labels, assignments and function calls below are inferred from context.
case $# in
  0)
    mount "$budir"
    printf "Reading backup jobs from list. Defaulting to %s.\n" "$list"
    fromlist
    ;;
  1)
    list="$1"
    mount "$budir"
    printf "Reading backup jobs from %s." "$list"
    fromlist
    ;;
  2)
    srcdir="$1"
    name="$2"
    fullname="$budir/$name"
    mount "$budir"
    single
    ;;
#  I can't decide whether to make the third argument num or budir. I don't need it anyway.
#  num=$3
#  budir=$3
#  mount $budir
#  single
  *)
    usage
    ;;
esac

exit 0

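
The job list format from the usage text (SOURCE_DIRECTORY NAME) is split with two parameter expansions. The following is only a small sketch of how they behave. The function name parse_job and the example paths are made up, and using ${job##* } for the name is a variation that tolerates spaces in the source directory, not what the script above does.

```bash
# parse_job LINE: split "SOURCE_DIRECTORY NAME" at the last space.
parse_job() {
  local job="$1"
  local srcdir="${job% *}"    # everything before the last space
  local name="${job##* }"     # everything after the last space
  printf 'srcdir=%s name=%s\n' "$srcdir" "$name"
}

parse_job "/home/alice docs"
parse_job "/home/alice/my files pictures"
```

With ${job#* } the second line would yield the name "files pictures", because that expansion cuts at the first space instead of the last one.
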
SBWG - The Pathshortener And Other Recent Changes

I made it my goal to harden SBWG before I start to implement new features. Before I call the next version of this project 1.0 I want to make sure that unexpected input from the command line or from source files, absurd numbers of absurdly long tags and content items, stupidly weird filenames or random binary data as tag values, as well as purposefully created traps in the various places where input is processed, are handled well. That means that nothing fails unless there is no sensible way around it, and if something fails, that nothing breaks. Data should be filtered carefully, errors should be handled well and wherever possible data should be made processable if it was supplied in an unprocessable form, to reduce the chances of errors. On top of that I wanted to make sure that the script does its job in a reasonable amount of time considering the circumstances. I mean, it will never be very fast. Bash is just not the right language for that. But there certainly were some repetitive tasks that could be improved. For the latter I created a simple caching functionality that will probably be extended in the future. I managed to reduce the (calculated/estimated) generation time of my biggest test web site from almost 300 years to a few days. Actual web sites will of course not take that long to generate, even on a slow machine. A huge web site will maybe take up to one day to generate completely, even without the new options that keep the script from re-generating existing unchanged parts of the web site. But before somebody tries to create such a big web site with SBWG I will probably have improved speed further. And even then it's a worst-case time.
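
How the caching in SBWG works isn't shown here. As a rough illustration of the general idea only: the output of a repetitive task can be stored in a file the first time it is computed and read back on later calls. The names cached, cache_dir and slow_title are made up for this sketch and are not taken from SBWG.

```bash
cache_dir=$(mktemp -d)    # hypothetical cache location

# cached COMMAND [ARGS...]: run COMMAND once per distinct argument list
# and replay the stored output on every later call.
# cksum is only a short checksum; a real cache would want a stronger key.
cached() {
  local key
  key=$(printf '%s\0' "$@" | cksum | cut -d ' ' -f 1)
  if [ ! -f "$cache_dir/$key" ]; then
    "$@" > "$cache_dir/$key"
  fi
  cat "$cache_dir/$key"
}

# Stand-in for an expensive, repetitive task.
slow_title() {
  sleep 1
  printf '%s\n' "$1" | tr 'a-z' 'A-Z'
}

cached slow_title "hello"    # runs slow_title and stores the result
cached slow_title "hello"    # answered from the cache file
```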

As part of the aforementioned goals I have started working on the last new feature before version 1.0. I call it the pathreducer. Since many of the files created by SBWG are named after the tags they represent, their names can become quite long and contain almost any printable character, including multi-byte Unicode characters or characters of character sets I haven't even heard of. I definitely don't want to restrict what characters and how many of them tag values can contain any more than I already have. But some filesystems, especially those used by operating systems from Microsoft, are relatively restrictive in the maximum allowed directory, path and filename length and in the allowed characters. By default the pathreducer is not used. But if enabled via command line option or in a web site's settings file, it will filter directory names and filenames and shorten them to a user-defined maximum length. If the pathreducer decides to change a path element, it also adds a 6-character hash value to make shortened or otherwise reduced path elements as good as unique.
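
A minimal sketch of what such a reduction could look like, assuming a conservative set of allowed characters and sha256sum as the hash. The function name reduce_element, the allowed character set and the choice of hash are assumptions made for this example. The actual pathreducer may work differently.

```bash
# reduce_element ELEMENT MAXLEN: filter one path element and shorten it
# to at most MAXLEN characters (MAXLEN must be greater than 7).
# Changed elements get "-" plus the first 6 hex digits of a hash of the
# original element appended, so different inputs stay distinguishable.
reduce_element() {
  local element="$1" maxlen="$2" filtered hash
  # Replace every byte outside the allowed set with an underscore.
  filtered=$(printf '%s' "$element" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_')
  if [ "$filtered" = "$element" ] && [ "${#element}" -le "$maxlen" ]; then
    printf '%s\n' "$element"    # nothing to reduce
    return
  fi
  hash=$(printf '%s' "$element" | sha256sum | cut -c1-6)
  filtered=$(printf '%s' "$filtered" | cut -c1-"$(( maxlen - 7 ))")
  printf '%s-%s\n' "$filtered" "$hash"
}

reduce_element "plain-name.html" 64
reduce_element "a very long tag: with/odd*chars" 20
```

The first call prints its input unchanged. The second prints a filtered, truncated name of 20 characters that ends in the hash.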

That works well for now and can even create 8.3 or 6.3 filenames for old DOS filesystems. But the result is not very nice because it isn't aware of what filesystem it is going to write a file to. To be safe it removes more characters than it would have to for ext and NTFS filesystems. In the future I may extend the pathreducer to automatically detect the filesystem, at least of the root of the output directory, and decide how exactly path elements should be reduced according to the actual limitations of that filesystem. Then it may even be enabled by default, even though it can increase the generation time quite a bit.
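
One possible building block for such a detection is getconf, which reports the length limits that apply to a given directory. This is only a sketch and not part of SBWG. The function name fs_limits is made up, and length limits say nothing about which characters a filesystem forbids.

```bash
# fs_limits DIR: print the maximum filename and path lengths that
# the system reports for DIR (defaults to the current directory).
fs_limits() {
  local dir="${1:-.}"
  printf 'NAME_MAX=%s\n' "$(getconf NAME_MAX "$dir")"
  printf 'PATH_MAX=%s\n' "$(getconf PATH_MAX "$dir")"
}

fs_limits /
```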

There are still some tests that I want to do and I will probably find some more things that I want to fix before version 1.0. But I see light.

Generating Bitmap Files With Bash

I needed a large amount of image files to try something. I wanted them to be different images. But what's in them didn't matter. So I looked at a BMP file to see how I could create one byte for byte automatically. Bitmap is just the first uncompressed format that I thought of. This is what I came up with:

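# Create a 24 bit bitmap file of 16x9 randomly colored pixels
rbmp() {
  # 122 bytes of header, copied from an existing BMP file.
  echo -n -e '\x42\x4D\x2A\x02\x00\x00\x00\x00\x00\x00\x7A\x00\x00\x00\x6C\x00\x00\x00\x10\x00\x00\x00\x09\x00\x00\x00\x01\x00\x18\x00\x00\x00\x00\x00\xB0\x01\x00\x00\x23\x2E\x00\x00\x23\x2E\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x42\x47\x52\x73\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
  # 864 random hex digits = 432 bytes = 16 * 9 pixels * 3 bytes.
  # Each pair gets its own \x prefix because every echo call interprets
  # escapes on its own; a \x left dangling at the end of the header
  # would be printed literally.
  c=$(</dev/urandom tr -dc '0123456789ABCDEF' | head -c864 | sed 's/.\{2\}/\\x&/g')
  echo -n -e "$c"
}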

Or, if you would like to run a command for each pixel/color before it is generated, you can do it like this:

# Create a 24 bit bitmap file of 16x9 randomly colored pixels
randombmp() {
  local i=0 c
  echo -n -e '\x42\x4D\x2A\x02\x00\x00\x00\x00\x00\x00\x7A\x00\x00\x00\x6C\x00\x00\x00\x10\x00\x00\x00\x09\x00\x00\x00\x01\x00\x18\x00\x00\x00\x00\x00\xB0\x01\x00\x00\x23\x2E\x00\x00\x23\x2E\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x42\x47\x52\x73\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
  # 16 * 9 pixels * 3 bytes = 432 bytes of pixel data, one byte per iteration.
  while (( i < 432 )); do
    c=$(</dev/urandom tr -dc '0123456789ABCDEF' | head -c2)
    echo -n -e '\x'"$c"
    i=$((i+1))
  done
}

I'm sure I did something wrong. I just copied the header of some BMP file with the right format. I don't even know what it was created with. I didn't actually look up how the header of a BMP file is composed. But it worked.
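
For reference, some of the copied header bytes can be read as little-endian 32-bit fields. The sketch below is based on the general BMP layout (file size at byte offset 2, pixel data offset at 10, width at 18, height at 22, image size at 34) and was not checked against the file the header came from. The helper name le32 is made up for this example.

```bash
# le32 N: print N as four little-endian bytes in \xHH notation.
le32() {
  local n="$1"
  printf '\\x%02X\\x%02X\\x%02X\\x%02X' \
    "$(( n & 255 ))" "$(( (n >> 8) & 255 ))" \
    "$(( (n >> 16) & 255 ))" "$(( (n >> 24) & 255 ))"
}

width=16
height=9
offset=122                            # bytes before the pixel data
datasize=$(( width * height * 3 ))    # 3 bytes per pixel; 48-byte rows need no padding

printf 'file size:   %s\n' "$(le32 $(( offset + datasize )))"
printf 'data offset: %s\n' "$(le32 "$offset")"
printf 'width:       %s\n' "$(le32 "$width")"
printf 'height:      %s\n' "$(le32 "$height")"
printf 'image size:  %s\n' "$(le32 "$datasize")"
```

The printed values match the corresponding bytes in the header used above, for example \x2A\x02\x00\x00 for the file size of 554 bytes.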

The Three Bad Reasons Why I Don't Use Git

1. Github or other public Git repositories: Wouldn't be complete

Haven't published everything always, don't want to publish everything, would want to include everything but couldn't. So using public Git repositories would always be accompanied by a feeling of imperfection.

2. I've never used Git for anything really.

Apart from cloning and occasionally updating others' repositories, I've never used Git. I'm not used to using it and I don't struggle with not using it. So starting to do so now would require me to clear quite a hurdle. I never used it. Why would I start now?

3. It's too late to start now.

I've noticed that the point where it would have been a good idea and would have made a lot of sense to get accustomed to using Git has long passed. So by starting now or in the future I would admit that I didn't take the hurdle when it would have been the right thing to do, back when the best time to do so was the present. I would admit to having done some things the wrong way in the past if I started to do them right from now on. It's easier to pretend that the way it always has been is the right way, the way I'm used to doing things.

That all makes no logical sense. It would be an improvement to start using Git for some things, be it coding projects, any category of texts that I have on my computers, any collection of files, ... The costs of these improvements would be hard disk space, which isn't all that scarce for me nowadays, and getting used to using Git, which isn't complicated.

So why don't I even try to use it in some cases? Well, I've just honestly told you my three reasons.