Create a Dynamic GitHub Profile Using GitHub Actions and Bash

About a year ago, GitHub launched a feature that lets you add a README to your user profile. To add the README to your profile, you have to:

create a public repository with a name matching your GitHub username
place the README.md in the root of the repository

You can learn more about it in the GitHub documentation.

What is a Dynamic GitHub Profile?

A dynamic GitHub profile is updated automatically, either in response to an external event or on a schedule. This is possible with GitHub Actions, another recently released GitHub feature. GitHub Actions is essentially a CI/CD system that lets you create and run custom workflows.

I first learned about the profile README in this article on Hackernoon. The author used PHP to fetch and update a list of the latest posts from his blog. Although I am a PHP expert myself, I wanted to make it more challenging. I realized that parsing XML and replacing text in a file is achievable using only native Bash tools.

Parsing RSS feed

An RSS feed is a plain XML file with a simple schema. Here’s a sample from my freshly launched blog:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Posts on Aleksandr Tabakov&#39;s technical blog | atabakoff</title>
    <link>https://atabakoff.com/posts/</link>
    <description>Recent content in Posts on Aleksandr Tabakov&#39;s technical blog | atabakoff</description>
    <image>
      <url>https://atabakoff.com/aleksandr_tabakov.jpeg</url>
      <link>https://atabakoff.com/aleksandr_tabakov.jpeg</link>
    </image>
    <generator>Hugo -- gohugo.io</generator>
    <lastBuildDate>Fri, 27 May 2022 21:33:51 +0200</lastBuildDate><atom:link href="https://atabakoff.com/posts/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>How to run Vaultwarden in Docker/Podman as a systemd service</title>
      <link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</link>
      <pubDate>Fri, 27 May 2022 21:33:51 +0200</pubDate>
      
      <guid>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</guid>
      <description>Running Vaultwarden in a container as a systemd service using Podman. How to install Podman, run Vaultwarden in a container, create a systemd config for Vaultwarden service and manage it using systemctl.</description>
    </item>
    
  </channel>
</rss>

Each post is represented by an item element, from which we need title, link, and pubDate.

Parsing RSS feed with `grep`

The naive approach is to use grep and then build markdown in a bash loop. Let’s first try to grep:

> wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
> cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+'
<title>Posts on Aleksandr Tabakov&#39;s technical blog | atabakoff
<link>https://atabakoff.com/posts/
<link>https://atabakoff.com/aleksandr_tabakov.jpeg
<title>How to run Vaultwarden in Docker/Podman as a systemd service
<link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
<pubDate>Fri, 27 May 2022 21:33:51 +0200

Not bad, but we need to get rid of the first three lines and the opening tag:

> cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
     | grep -oE '>([^>]+)' | grep -oE '([^>]+)'
How to run Vaultwarden in Docker/Podman as a systemd service
https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
Fri, 27 May 2022 21:33:51 +0200
Just to test grep
https://atabakoff.com/testing-grep/
Fri, 31 May 2022 18:33:51 +0200

I added one extra item to test whether my expression works with multiple posts. At this point, I started thinking that grep might not be the best option. Before researching other options, I quickly wrote a script to convert RSS to markdown:

#!/bin/bash

items=$( cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
    | grep -oE '>([^>]+)' | grep -oE '([^>]+)' )

IFS=$'\n'
count=0
for item in $items
do
   case $(expr $count % 3) in
    '0')
        title=$item
        link=''
        pubDate=''
        ;;
    '1')
        link=$item
        ;;
    '2')
        pubDate=$( date -d "$item" +'%d/%m/%Y' )
        cat<<EOF
* $pubDate [$title]($link)
EOF
        ;;
    esac
    count=$(($count + 1))
done

Run it to validate:

> ./grep-rss.sh
* 27/05/2022 [How to run Vaultwarden in Docker/Podman as a systemd service](https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/)
* 31/05/2022 [Just to test grep](https://atabakoff.com/testing-grep/)
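
The counter-based loop above can also be sketched with paste, which joins every three input lines into one tab-separated record. This is only an illustration: lines_to_markdown is a hypothetical helper name, the date is printed as-is rather than reformatted, and the sketch assumes that titles contain no tab characters.

```shell
#!/bin/bash
# Hypothetical helper: turn a stream of title/link/date lines (three per post)
# into markdown list items, one per post.
lines_to_markdown () {
   # "paste - - -" reads three lines at a time and joins them with tabs
   paste - - - | while IFS=$'\t' read -r title link pubDate ; do
      printf '* %s [%s](%s)\n' "$pubDate" "$title" "$link"
   done
}
```

It would be fed with the output of the grep chain, for example `grep ... | lines_to_markdown`.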

Testing performance

My RSS feed is tiny. To measure performance, we need to run the parser many times. I created a test.sh file:

#!/bin/bash

x=100

while [ "$x" -gt "0" ] ; do
   /bin/bash "$1" >/dev/null 2>&1   # run the script itself; do not execute its output
   x=$((x-1))
done

It accepts a script file as a parameter and runs it 100 times in a loop. Let’s run it with time to see how long it takes to parse the feed:

> time ./test.sh grep-rss.sh 
./test.sh grep.sh  1,87s user 0,72s system 137% cpu 1,883 total

Not very impressive but expected due to the use of regular expressions.
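
A slightly more general harness could look like the sketch below. The bench function is a hypothetical name: it takes the number of runs followed by the command to execute, and relies on Bash's time keyword with TIMEFORMAT set to print only the elapsed wall-clock seconds.

```shell
#!/bin/bash
# Hypothetical helper: run a command N times and report the elapsed seconds.
bench () {
   local runs=$1 i
   shift
   local TIMEFORMAT='%R'           # the time keyword prints only real (wall-clock) seconds
   time {
      for (( i = 0; i < runs; i++ )); do
         "$@" >/dev/null 2>&1      # discard the command's own output
      done
   }
}
```

For example, `bench 100 ./grep-rss.sh` runs the script 100 times; the timing is written to standard error.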

Parsing RSS feed in Bash

I started googling for a way to parse XML in Bash and found this awesome solution. It addresses the same problem of parsing an RSS feed. I modified the code for my needs and stored it in the parse-rss.sh file:

#!/bin/bash

xmlgetnext () {
   local IFS='>'
   read -d '<' TAG VALUE
}

cat $1 | while xmlgetnext ; do
   case $TAG in
      'item')
         title=''
         link=''
         pubDate=''
         ;;
      'title')
         title=$VALUE
         ;;
      'link')
         link=$VALUE
         ;;
      'pubDate')
         pubDate=$( date -d "$VALUE" +'%d/%m/%Y' )
         ;;
      '/item')
         cat<<EOF
* $pubDate [$title]($link)
EOF
         ;;
   esac
done

I ran the same test to compare performance:

> time ./test.sh parse-rss.sh
./test.sh parse.sh  0,81s user 0,33s system 109% cpu 1,042 total

Almost twice as fast: 1,042 s vs 1,883 s. This is the approach I finally chose for processing the RSS feed.
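
To see why this works, it helps to look at what xmlgetnext actually returns. read -d '<' consumes input up to the next <, and IFS='>' splits that chunk into the tag name and the text that follows it. The sketch below prints each pair for a made-up one-item document; dump_pairs is a hypothetical helper, and the -r flag is an addition here to keep backslashes intact.

```shell
#!/bin/bash
# Same idea as in parse-rss.sh: read up to the next '<', then split on '>'.
xmlgetnext () {
   local IFS='>'
   read -r -d '<' TAG VALUE
}

# Hypothetical helper for illustration: print every TAG/VALUE pair.
dump_pairs () {
   while xmlgetnext ; do
      printf 'TAG=[%s] VALUE=[%s]\n' "$TAG" "$VALUE"
   done
}

printf '%s' '<channel><item><title>Hello</title><link>https://example.com/</link></item></channel>' | dump_pairs
# TAG=[] VALUE=[]
# TAG=[channel] VALUE=[]
# TAG=[item] VALUE=[]
# TAG=[title] VALUE=[Hello]
# TAG=[/title] VALUE=[]
# TAG=[link] VALUE=[https://example.com/]
# TAG=[/link] VALUE=[]
# TAG=[/item] VALUE=[]
```

The first pair is empty because nothing precedes the first <. The final closing tag is never reported: read reaches the end of input without finding another < and returns a non-zero status, which ends the loop. That is harmless for a feed, where the last tag is the closing rss element.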

Updating README.md

Updating the list of posts is a simple text replacement. Since markdown allows HTML, we can use HTML comments to mark a placeholder for posts:

<!--blog:start-->
<!--blog:end-->

The standard tool for replacing text in Bash is sed, but it has one limitation: it is a stream editor that processes its input one line at a time. In our case, both the placeholder and the list of posts span multiple lines. Here’s how I solved it:

#!/bin/bash

NUM=$(($2*3))

POSTS=$( head -n "$NUM" "$1" | tr '\n' '\t' )
cat README.md | tr '\n' '\t' \
    | sed -E "s#(<\!--blog:start-->).*(<\!--blog:end-->)#\1\t${POSTS}\2#g" \
    | tr '\t' '\n' > README.tmp
mv README.tmp README.md
rm -f rss.xml posts.md

Some things worth explaining:

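NUM=$(($2*3)) is the number of lines to take for the requested number of posts; in my case, I want to show five posts at three lines each (title, link, date). This assumes the posts file holds three lines per post; parse-rss.sh above prints one line per post, so with its output the multiplier is not needed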
tr '\n' '\t' converts the text to a single line so that sed can process it
tr '\t' '\n' brings the newlines back
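
One caveat with the sed expression: # is used as its delimiter and & has a special meaning in the replacement, so a post title containing either character could break the substitution. As an alternative sketch (a different technique, not the script used above), awk can copy the file line by line and swap in the posts between the markers. replace_block is a hypothetical helper, and it assumes each marker sits on its own line.

```shell
#!/bin/bash
# Hypothetical helper: print README with the block between the markers
# replaced by the contents of the posts file.
replace_block () {
   local readme=$1 posts=$2
   awk -v posts="$posts" '
      /<!--blog:start-->/ {
         print                                            # keep the start marker
         while ((getline line < posts) > 0) print line    # insert the posts verbatim
         close(posts)
         skipping = 1                                     # drop the old block
         next
      }
      /<!--blog:end-->/ { skipping = 0 }                  # resume copying at the end marker
      !skipping { print }
   ' "$readme"
}
```

Usage would be along the lines of `replace_block README.md posts.md > README.tmp && mv README.tmp README.md`.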

GitHub Actions Pipeline

Now we have our scripts, and we need to put them into a pipeline. GitHub Actions looks in a special .github/workflows directory and processes each .yml or .yaml file there. I’ve created a posts.yml file there with the following content:

name: Update blog posts

on:
  push:
  workflow_dispatch:
  schedule:
    - cron:  '0 0 * * *'

jobs:
  update-readme-with-latest-posts:
    runs-on: ubuntu-latest
    steps:
    - name: Clone repository
      uses: actions/checkout@v2
      with:
        fetch-depth: 1
    - name: Fetch RSS feed
      run: wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
    - name: Parse RSS feed
      run: |
        cd ${GITHUB_WORKSPACE}
        ./src/parse-rss.sh rss.xml > posts.md
    - name: Update README.md
      run:  |
        cd ${GITHUB_WORKSPACE}
        ./src/update-readme.sh posts.md 5
    - name: Push changes
      run: |
        git config --global user.name "${GITHUB_ACTOR}"
        git config --global user.email "${GITHUB_ACTOR}@users.noreply.github.com"
        git commit -am "Updated blog posts" || exit 0
        git push

Here’s what needs to be explained:

push runs the workflow on every push
workflow_dispatch allows running it manually
cron: '0 0 * * *' is a schedule, in my case every day at midnight (UTC)
uses: actions/checkout@v2 clones the repository

Then I split fetching, parsing, and updating into separate steps. This lets me quickly localize a problem if something goes wrong. A few things worth noting:

cd ${GITHUB_WORKSPACE} moves to the workspace directory, which contains the newly cloned repository
${GITHUB_ACTOR} is the username of the account that triggered the workflow
${GITHUB_ACTOR}@users.noreply.github.com is a special GitHub email address one can use to push the changes to the repository
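
The commit step can also be written so that the push is skipped entirely when the README did not change. A minimal sketch, assuming a hypothetical push_if_changed helper and that only tracked files such as README.md are modified:

```shell
#!/bin/bash
# Hypothetical helper: commit and push only when tracked files differ from HEAD.
push_if_changed () {
   # "git diff --quiet HEAD" exits with 0 when the working tree matches HEAD
   if git diff --quiet HEAD ; then
      echo "No changes to commit"
      return 0
   fi
   git commit -am "Updated blog posts"
   git push
}
```

In the workflow, the function body would replace the last two commands of the Push changes step.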

Conclusion

You can find the full solution in my profile repository. It’s been a lot of fun solving this problem with pure Bash.

That being said, there are lots of community-made GitHub actions. They allow creating a dynamic profile without writing any code; all you need to do is write some YAML. But there’s little challenge in that. It is not the warrior’s way.
