About a year ago, GitHub launched a feature that lets you add a README to your user profile. To add a README to your profile, you have to:

  • create a public repository with a name matching your GitHub username
  • place a README.md file in the root of the repository

You can learn more about it in the GitHub documentation.

What is a Dynamic GitHub Profile?

A dynamic GitHub profile is one that is updated automatically, either on some external event or on a schedule. This is possible with GitHub Actions, another recently released GitHub feature. GitHub Actions is essentially a CI/CD system that lets you create and run custom workflows.

I first learned about the profile README in this article on Hackernoon. The author used PHP to fetch and update a list of the latest posts from his blog. Although I am a PHP expert myself, I wanted to make it more challenging. I realized that parsing XML and replacing text in a file can be done using only Bash and the standard command-line tools.

Parsing RSS feed

An RSS feed is a plain XML file with a simple schema. Here’s a sample from my freshly launched blog:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Posts on Aleksandr Tabakov&#39;s technical blog | atabakoff</title>
    <link>https://atabakoff.com/posts/</link>
    <description>Recent content in Posts on Aleksandr Tabakov&#39;s technical blog | atabakoff</description>
    <image>
      <url>https://atabakoff.com/aleksandr_tabakov.jpeg</url>
      <link>https://atabakoff.com/aleksandr_tabakov.jpeg</link>
    </image>
    <generator>Hugo -- gohugo.io</generator>
    <lastBuildDate>Fri, 27 May 2022 21:33:51 +0200</lastBuildDate><atom:link href="https://atabakoff.com/posts/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>How to run Vaultwarden in Docker/Podman as a systemd service</title>
      <link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</link>
      <pubDate>Fri, 27 May 2022 21:33:51 +0200</pubDate>
      
      <guid>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</guid>
      <description>Running Vaultwarden in a container as a systemd service using Podman. How to install Podman, run Vaultwarden in a container, create a systemd config for Vaultwarden service and manage it using systemctl.</description>
    </item>
    
  </channel>
</rss>

Each post is represented by an item element, from which we need the title, link, and pubDate.

Parsing RSS feed with grep

The naive approach is to use grep and then build markdown in a bash loop. Let’s first try to grep:

> wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
> cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+'
<title>Posts on Aleksandr Tabakov&#39;s technical blog | atabakoff
<link>https://atabakoff.com/posts/
<link>https://atabakoff.com/aleksandr_tabakov.jpeg
<title>How to run Vaultwarden in Docker/Podman as a systemd service
<link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
<pubDate>Fri, 27 May 2022 21:33:51 +0200

Not bad, but we need to get rid of the first three lines and the opening tag:

> cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
     | grep -oE '>([^>]+)' | grep -oE '([^>]+)'
How to run Vaultwarden in Docker/Podman as a systemd service
https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
Fri, 27 May 2022 21:33:51 +0200
Just to test grep
https://atabakoff.com/testing-grep/
Fri, 31 May 2022 18:33:51 +0200

I added one extra item to check that my expression works with multiple posts. At this point, I started to think that grep might not be the best option. Before researching other options, I quickly wrote a script to convert the RSS feed to markdown:

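#!/bin/bash

# Extract the text of every title, link and pubDate element, skip the three
# channel-level lines, then strip the opening tag from each remaining line.
items=$( cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
    | grep -oE '>([^>]+)' | grep -oE '([^>]+)' )

# Split the result on newlines only, so that titles keep their spaces.
IFS=$'\n'
count=0
for item in $items
do
    # Lines arrive in groups of three: title, link, pubDate.
    case $(expr $count % 3) in
    '0')
        title=$item
        link=''
        pubDate=''
        ;;
    '1')
        link=$item
        ;;
    '2')
        pubDate=$( date -d "$item" +'%d/%m/%Y' )
        cat<<EOF
* $pubDate [$title]($link)
EOF
        ;;
    esac
    count=$(($count + 1))
done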

Run it to validate:

> ./grep-rss.sh
* 27/05/2022 [How to run Vaultwarden in Docker/Podman as a systemd service](https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/)
* 31/05/2022 [Just to test grep](https://atabakoff.com/testing-grep/)
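The two trailing grep calls and the modulo counter can be folded into something shorter. What follows is only a sketch, not the script I benchmarked: it assumes GNU grep built with PCRE support (the -P flag already used above), it assumes the feed starts with the same three channel-level lines, and rss_to_markdown is a hypothetical name. It also prints pubDate as it appears in the feed instead of reformatting it.

```shell
#!/bin/bash
# Hypothetical helper: read an RSS feed on stdin and print one markdown
# list item per post.
rss_to_markdown() {
    # \K discards the part of the match seen so far, so the opening tag
    # never reaches the output and no second grep is needed.
    grep -Po '<(title|link|pubDate)>\K[^<]+' \
        | tail -n +4 \
        | while IFS= read -r title && IFS= read -r link && IFS= read -r pubDate; do
            # Three consecutive lines form one post: title, link, pubDate.
            printf '* %s [%s](%s)\n' "$pubDate" "$title" "$link"
        done
}

# Sample input with the three channel-level lines that tail skips.
rss_to_markdown <<'XML'
<title>Blog</title>
<link>https://example.com/</link>
<link>https://example.com/logo.jpeg</link>
<item>
  <title>First post</title>
  <link>https://example.com/first/</link>
  <pubDate>Fri, 27 May 2022 21:33:51 +0200</pubDate>
</item>
XML
```

Reading three lines per iteration replaces the counter, but it relies just as much on every item having exactly these three elements in this order.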

Testing performance

My RSS feed is tiny. To measure performance, we need to run the parser many times. I created a test.sh file:

#!/bin/bash

# Run the script passed as the first argument 100 times, discarding its output.
x=100

while [ "$x" -gt "0" ] ; do
   /bin/bash "$1" >/dev/null 2>&1
   x=$((x-1))
done

It accepts a script file as a parameter and runs it 100 times in a loop. Let’s run it with time to see how long it takes to parse the feed:

> time ./test.sh grep-rss.sh 
./test.sh grep.sh  1,87s user 0,72s system 137% cpu 1,883 total

Not very impressive, but expected: each run makes three regular-expression passes over the feed and starts a handful of extra processes (tail, expr, date).
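The same kind of measurement can also be taken without a separate wrapper file, using the time keyword built into Bash. This is a minimal sketch under that assumption; run_many is a hypothetical helper and the numbers it produces are not the ones quoted in this post.

```shell
#!/bin/bash
# Hypothetical helper: run a script a given number of times (100 by
# default) and discard everything it prints.
run_many() {
    local script=$1 runs=${2:-100} i
    for ((i = 0; i < runs; i++)); do
        /bin/bash "$script" >/dev/null 2>&1
    done
}

# Usage, assuming grep-rss.sh is in the current directory:
#   TIMEFORMAT='%R s total'
#   time run_many grep-rss.sh 100
```

Because time is a shell keyword, it can wrap a function call directly, and TIMEFORMAT controls what is reported.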

Parsing RSS feed in Bash

I started googling for a way to parse XML in Bash and found this awesome solution. It deals with the same problem of parsing an RSS feed. I adapted the code to my needs and saved it as parse-rss.sh:

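#!/bin/bash

# Read the input up to the next '<', then split that chunk on '>':
# TAG gets the tag name, VALUE gets the text that follows the tag.
xmlgetnext () {
   local IFS='>'
   read -r -d '<' TAG VALUE
}

cat "$1" | while xmlgetnext ; do
   case $TAG in
      'item')
         # A new post starts: reset the fields.
         title=''
         link=''
         pubDate=''
         ;;
      'title')
         title=$VALUE
         ;;
      'link')
         link=$VALUE
         ;;
      'pubDate')
         pubDate=$( date -d "$VALUE" +'%d/%m/%Y' )
         ;;
      '/item')
         # The post is complete: print it as a markdown list item.
         cat<<EOF
* $pubDate [$title]($link)
EOF
         ;;
   esac
done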
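To see what xmlgetnext actually hands to the case statement, it helps to print every TAG and VALUE pair for a tiny input. The sketch below reuses the same function; dump_tags is a hypothetical helper added only for illustration.

```shell
#!/bin/bash
# Same splitting trick as in parse-rss.sh: read up to the next '<',
# then split that chunk on '>' into the tag name and the text after it.
xmlgetnext() {
    local IFS='>'
    read -r -d '<' TAG VALUE
}

# Hypothetical helper: print one "tag=value" line for every chunk
# that has a non-empty tag.
dump_tags() {
    while xmlgetnext; do
        if [ -n "$TAG" ]; then
            printf '%s=%s\n' "$TAG" "$VALUE"
        fi
    done
}

printf '%s\n' '<item><title>Hello</title><link>https://example.com/</link></item>' \
    | dump_tags
```

Closing tags show up as tags of their own with an empty value, which is why the script can react to /item. The very last closing tag is never reported: read returns a non-zero status when it reaches the end of the input without finding another '<', and that ends the loop. In an RSS feed that chunk is the closing rss tag, so nothing is lost.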

I ran the same test to compare performance:

> time ./test.sh parse-rss.sh
./test.sh parse.sh  0,81s user 0,33s system 109% cpu 1,042 total

Almost twice as fast: 1,042 vs 1,883 seconds in total. This is the approach I chose for processing the RSS feed.

Updating README.md

Updating the list of posts is a simple replacement. Since markdown allows HTML, we can use HTML comments to mark a placeholder for the posts:

<!--blog:start-->
<!--blog:end-->

The standard tool for replacing text in Bash is sed, but it has one limitation. It is a stream editor that processes its input one line at a time. In our case, both the placeholder and the list of posts are multiline text. Here’s how I solved it:

#!/bin/bash

# $1 is the file with the posts, $2 is the number of posts to show.
# parse-rss.sh writes one line per post, so this is also the number of lines.
NUM=$2

# Join the lines with tabs so that sed sees the whole text as a single line.
POSTS=$( cat "$1" | head -n "$NUM" | tr '\n' '\t' )
cat README.md | tr '\n' '\t' \
    | sed -E "s#(<\!--blog:start-->).*(<\!--blog:end-->)#\1\t${POSTS}\2#g" \
    | tr '\t' '\n' > README.tmp
mv README.tmp README.md
rm -f rss.xml posts.md

Some things worth explaining:

  • NUM=$2 is the number of posts to show, five in my case; since posts.md has one line per post, it is also the number of lines to take from the file
  • tr '\n' '\t' converts the text to a single line so that sed can process it
  • tr '\t' '\n' brings the newlines back
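The round trip through tabs can be tried in isolation. Below is a minimal sketch, assuming GNU sed (for \t in the replacement) and assuming that neither the document nor the posts contain tab characters; replace_between_markers is a hypothetical helper, not part of my repository.

```shell
#!/bin/bash
# Hypothetical helper: read a document on stdin and replace everything
# between the two markers with the text given as the first argument.
replace_between_markers() {
    local posts
    # Join the replacement lines with tabs, keeping one trailing tab.
    posts=$(printf '%s\n' "$1" | tr '\n' '\t')
    # The sed script is single-quoted; only ${posts} is expanded by the shell.
    tr '\n' '\t' \
        | sed -E 's#(<!--blog:start-->).*(<!--blog:end-->)#\1\t'"${posts}"'\2#' \
        | tr '\t' '\n'
}

replace_between_markers '* 27/05/2022 [A post](https://example.com/a/)' <<'MD'
# Hello
<!--blog:start-->
* an outdated entry
<!--blog:end-->
MD
```

One caveat of this technique: the posts end up inside the sed expression, so a title containing #, & or a backslash would be interpreted by sed rather than inserted literally.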

GitHub Actions Pipeline

Now that we have our scripts, we need to put them into a pipeline. GitHub Actions looks in the special .github/workflows directory and processes each .yml or .yaml file there. I created a posts.yml file there with the following content:

name: Update blog posts

on:
  push:
  workflow_dispatch:
  schedule:
    - cron:  '0 0 * * *'

jobs:
  update-readme-with-latest-posts:
    runs-on: ubuntu-latest
    steps:
    - name: Clone repository
      uses: actions/checkout@v2
      with:
        fetch-depth: 1
    - name: Fetch RSS feed
      run: wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
    - name: Parse RSS feed
      run: |
        cd ${GITHUB_WORKSPACE}
        ./src/parse-rss.sh rss.xml > posts.md        
    - name: Update README.md
      run:  |
        cd ${GITHUB_WORKSPACE}
        ./src/update-readme.sh posts.md 5        
    - name: Push changes
      run: |
        git config --global user.name "${GITHUB_ACTOR}"
        git config --global user.email "${GITHUB_ACTOR}@users.noreply.github.com"
        git commit -am "Updated blog posts" || exit 0
        git push

Here’s what needs explaining:

  • push runs the workflow on every push to the repository
  • workflow_dispatch allows starting it manually
  • cron: '0 0 * * *' is the schedule, in my case every day at midnight UTC
  • uses: actions/checkout@v2 clones the repository

Then I split fetching, parsing, and updating into separate steps. This lets me quickly locate a problem if something goes wrong. A few things worth noting:

  • cd ${GITHUB_WORKSPACE} moves to the workspace directory, which holds the newly cloned repository
  • ${GITHUB_ACTOR} is the username of the account that triggered the workflow
  • ${GITHUB_ACTOR}@users.noreply.github.com is a special GitHub email address that can be used to push the changes to the repository

Conclusion

You can find the full solution in my profile repository. It’s been a lot of fun solving this problem with pure Bash.

That being said, there are lots of community-made GitHub actions. They let you create a dynamic profile without writing any code: all you need to do is write some YAML. But there’s little challenge in that. It is not the warrior’s way.