About a year ago, GitHub launched a feature that lets you add a README to your user profile. To add one, you have to:
- create a public repository with a name matching your GitHub username
- place the README.md in the root of the repository
You can learn more about it in the GitHub documentation.
What is a Dynamic GitHub Profile?
A dynamic GitHub profile is updated automatically on some external event or on a schedule. This is possible with GitHub Actions, another recently released GitHub feature. GitHub Actions is essentially a CI/CD system that allows creating and running custom workflows.
I first learned about the profile README in this article on Hackernoon. The guy used PHP to fetch and update a list of the latest posts from his blog. Although I am a PHP expert myself, I wanted to make it more challenging. I realized that parsing XML and replacing text in a file is achievable using only native Bash tools.
Parsing RSS feed
An RSS feed is a plain XML file with a simple schema. Here’s a sample from my freshly launched blog:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Posts on Aleksandr Tabakov's technical blog | atabakoff</title>
<link>https://atabakoff.com/posts/</link>
<description>Recent content in Posts on Aleksandr Tabakov's technical blog | atabakoff</description>
<image>
<url>https://atabakoff.com/aleksandr_tabakov.jpeg</url>
<link>https://atabakoff.com/aleksandr_tabakov.jpeg</link>
</image>
<generator>Hugo -- gohugo.io</generator>
<lastBuildDate>Fri, 27 May 2022 21:33:51 +0200</lastBuildDate><atom:link href="https://atabakoff.com/posts/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>How to run Vaultwarden in Docker/Podman as a systemd service</title>
<link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</link>
<pubDate>Fri, 27 May 2022 21:33:51 +0200</pubDate>
<guid>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/</guid>
<description>Running Vaultwarden in a container as a systemd service using Podman. How to install Podman, run Vaultwarden in a container, create a systemd config for Vaultwarden service and manage it using systemctl.</description>
</item>
</channel>
</rss>
Each post is represented by an item element, from which we need title, link, and pubDate.
Parsing RSS feed with grep
The naive approach is to use grep and then build markdown in a Bash loop. Let’s first try grep:
> wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
> cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+'
<title>Posts on Aleksandr Tabakov's technical blog | atabakoff
<link>https://atabakoff.com/posts/
<link>https://atabakoff.com/aleksandr_tabakov.jpeg
<title>How to run Vaultwarden in Docker/Podman as a systemd service
<link>https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
<pubDate>Fri, 27 May 2022 21:33:51 +0200
Not bad, but we need to get rid of the first three lines and the opening tag:
> cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
| grep -oE '>([^>]+)' | grep -oE '([^>]+)'
How to run Vaultwarden in Docker/Podman as a systemd service
https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/
Fri, 27 May 2022 21:33:51 +0200
Just to test grep
https://atabakoff.com/testing-grep/
Fri, 31 May 2022 18:33:51 +0200
I added one extra item to test whether my expression works with multiple posts. At this point, I started thinking that grep might not be the best option. Before researching other options, I quickly wrote a script to convert the RSS to markdown:
#!/bin/bash
items=$( cat rss.xml | grep -Po '<(title|link|pubDate)>[^<]+' | tail -n +4 \
| grep -oE '>([^>]+)' | grep -oE '([^>]+)' )
IFS=$'\n'
count=0
for item in $items
do
case $(expr $count % 3) in
'0')
title=$item
link=''
pubDate=''
;;
'1')
link=$item
;;
'2')
pubDate=$( date -d "$item" +'%d/%m/%Y' )
cat<<EOF
* $pubDate [$title]($link)
EOF
;;
esac
count=$(($count + 1))
done
Run it to validate:
> ./grep-rss.sh
* 27/05/2022 [How to run Vaultwarden in Docker/Podman as a systemd service](https://atabakoff.com/how-to-run-vaultwarden-in-podman-as-a-systemd-service/)
* 31/05/2022 [Just to test grep](https://atabakoff.com/testing-grep/)
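The tail -n +4 step above hardcodes how many channel-level lines precede the first post. A minimal sketch of an alternative is to restrict matching to the item blocks first. It assumes each item tag sits on its own line, as in the Hugo feed shown earlier; item_fields and the sample feed are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Sketch: keep only the lines between <item> and </item>, then extract the
# three fields. item_fields is a hypothetical helper name.
item_fields() {
    sed -n '/<item>/,/<\/item>/p' "$1" \
        | grep -oE '<(title|link|pubDate)>[^<]+' \
        | sed -E 's/^<[^>]+>//'
}

# Throwaway sample feed so the sketch runs on its own.
feed=$(mktemp)
cat > "$feed" <<'EOF'
<channel>
<title>Blog title</title>
<link>https://example.com/</link>
<item>
<title>First post</title>
<link>https://example.com/first/</link>
<pubDate>Fri, 27 May 2022 21:33:51 +0200</pubDate>
</item>
</channel>
EOF

# Prints the title, link and pubDate of the single item, one per line.
item_fields "$feed"
rm -f "$feed"
```

The channel-level title and link are skipped because they fall outside the item range, so no line counting is needed.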
Testing performance
My RSS feed is tiny. To measure performance, we need to run the parser many times. I created a test.sh file:
#!/bin/bash
x=100
while [ "$x" -gt "0" ] ; do
/bin/bash "$1" >/dev/null 2>&1
x=$((x-1))
done
It accepts a script file as a parameter and runs it 100 times in a loop. Let’s run it with time to see how long it takes to parse the feed:
> time ./test.sh grep-rss.sh
./test.sh grep.sh 1,87s user 0,72s system 137% cpu 1,883 total
Not very impressive but expected due to the use of regular expressions.
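The same kind of measurement can be sketched without a separate wrapper file by timing a shell function. This is only an illustration: repeat is a hypothetical helper, and the trivial true command stands in for the parser.

```shell
#!/bin/bash
# Sketch: run a command N times, discarding its output. repeat is a
# hypothetical helper name, not part of the original scripts.
repeat() {
    local n=$1
    shift
    local i
    for ((i = 0; i < n; i++)); do
        "$@" > /dev/null 2>&1
    done
}

# Time 100 runs of a trivial command. To measure a parser instead, swap in
# something like: time repeat 100 ./parse-rss.sh rss.xml
time repeat 100 true
```

Passing the command through "$@" keeps its arguments intact, so a script that expects a file name can be timed the same way.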
Parsing RSS feed in Bash
I started googling for a way to parse XML in Bash and found this awesome solution. It describes the same problem of parsing an RSS feed. I modified the code for my needs and stored it in the parse-rss.sh file:
#!/bin/bash
xmlgetnext () {
local IFS='>'
read -d '<' TAG VALUE
}
cat $1 | while xmlgetnext ; do
case $TAG in
'item')
title=''
link=''
pubDate=''
;;
'title')
title=$VALUE
;;
'link')
link=$VALUE
;;
'pubDate')
pubDate=$( date -d "$VALUE" +'%d/%m/%Y' )
;;
'/item')
cat<<EOF
* $pubDate [$title]($link)
EOF
;;
esac
done
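To see what the loop receives on each iteration, it can help to print every TAG/VALUE pair for a tiny input. The sketch below reuses xmlgetnext as written above; dump_tokens and the sample string are hypothetical and only there for the demonstration.

```shell
#!/bin/bash
# Same function as in parse-rss.sh: read up to the next '<', then split the
# chunk on '>' into the tag name and the text that follows it.
xmlgetnext () {
    local IFS='>'
    read -d '<' TAG VALUE
}

# Hypothetical helper: print each pair so the tokenization is visible.
dump_tokens() {
    while xmlgetnext; do
        printf '[%s] [%s]\n' "$TAG" "$VALUE"
    done
}

# The trailing </channel> matters: the chunk after the last '<' hits end of
# input, read reports failure, and the loop stops without processing it.
printf '%s' '<item><title>Hello</title></item></channel>' | dump_tokens
```

The first pair is empty because nothing precedes the first '<'. After that, opening tags arrive with their text as VALUE (for example [title] [Hello]) and closing tags arrive as /title or /item with an empty VALUE, which is why the case statement can match on '/item'.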
I ran the same test to compare performance:
> time ./test.sh parse-rss.sh
./test.sh parse.sh 0,81s user 0,33s system 109% cpu 1,042 total
Almost two times faster: 1,042 vs 1,883. This is the approach I ultimately chose for processing the RSS feed.
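Both scripts format pubDate with date -d, which converts the timestamp to the time zone of the machine running it, so the same feed can produce different days on a laptop and on a CI runner. A minimal sketch that pins the zone, assuming GNU date and that UTC is acceptable for display; format_pubdate is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: format an RSS pubDate as dd/mm/YYYY in a fixed time zone.
# format_pubdate is a hypothetical helper; -d requires GNU date.
format_pubdate() {
    TZ=UTC date -d "$1" +'%d/%m/%Y'
}

format_pubdate 'Fri, 27 May 2022 21:33:51 +0200'
```

With the zone fixed to UTC, a post published shortly after midnight at +0200 is shown with the previous day's date, so the choice of zone is a trade-off rather than a strict improvement.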
Updating README.md
Updating a list of posts is simply a replacement. Since markdown allows the usage of HTML code, we can use HTML comments to mark a placeholder for posts:
<!--blog:start-->
<!--blog:end-->
The standard tool to replace text in Bash is sed, but it has one limitation. It is a stream editor that processes one line at a time. In our case, both the placeholder and the posts list are multiline text. Here’s how I solved it:
#!/bin/bash
NUM=$(($2*3))
POSTS=$( cat $1 | head -n $NUM | tr '\n' '\t' )
cat README.md | tr '\n' '\t' \
| sed -E "s#(<\!--blog:start-->).*(<\!--blog:end-->)#\1\t${POSTS}\2#g" \
| tr '\t' '\n' > README.tmp
mv README.tmp README.md
rm -f rss.xml posts.md
Some things worth explaining:
- NUM=$(($2*3)) is the number of lines for the specified number of posts; in my case, I want to show five posts taking three lines each (title, link, date)
- tr '\n' '\t' converts the text to a single line so that sed can process it
- tr '\t' '\n' brings back the newlines
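The tab trick is easier to follow on throwaway files. The sketch below applies the same idea to a temporary README; replace_block and the sample content are hypothetical, and the \t in the sed replacement assumes GNU sed.

```shell
#!/bin/bash
# Sketch of the tr/sed trick. replace_block is a hypothetical helper that
# prints file $1 with everything between the markers replaced by file $2.
# Assumes GNU sed; posts containing '#', '&' or '\' would need escaping,
# and any literal tabs in the README would come back as newlines.
replace_block() {
    local posts
    posts=$(tr '\n' '\t' < "$2")
    tr '\n' '\t' < "$1" \
        | sed -E 's#(<!--blog:start-->).*(<!--blog:end-->)#\1\t'"${posts}"'\2#' \
        | tr '\t' '\n'
}

# Throwaway files so the sketch runs on its own.
dir=$(mktemp -d)
printf '%s\n' 'Hello' '<!--blog:start-->' '* old post' '<!--blog:end-->' 'Footer' > "$dir/README.md"
printf '%s\n' '* 27/05/2022 [First](https://example.com/first/)' > "$dir/posts.md"

# Prints the README with "* old post" replaced by the new post line.
replace_block "$dir/README.md" "$dir/posts.md"
rm -rf "$dir"
```

Because the markers themselves are kept via the two capture groups, the replacement can be run repeatedly on the same file.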
GitHub Actions Pipeline
Now that we have our scripts, we need to put them into a pipeline. GitHub Actions looks at a special .github/workflows directory and processes each .yml or .yaml file there. I’ve created a posts.yml file there with the following content:
name: Update blog posts
on:
  push:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'
jobs:
  update-readme-with-latest-posts:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repository
        uses: actions/checkout@v2
        with:
          fetch-depth: 1
      - name: Fetch RSS feed
        run: wget --quiet -O rss.xml https://atabakoff.com/posts/index.xml
      - name: Parse RSS feed
        run: |
          cd ${GITHUB_WORKSPACE}
          ./src/parse-rss.sh rss.xml > posts.md
      - name: Update README.md
        run: |
          cd ${GITHUB_WORKSPACE}
          ./src/update-readme.sh posts.md 5
      - name: Push changes
        run: |
          git config --global user.name "${GITHUB_ACTOR}"
          git config --global user.email "${GITHUB_ACTOR}@users.noreply.github.com"
          git commit -am "Updated blog posts" || exit 0
          git push
Here’s what needs to be explained:
- push is to run it on push
- cron: '0 0 * * *' is a schedule, in my case every day at midnight
- uses: actions/checkout@v2 clones a repository
Then I split fetching, parsing, and updating into separate steps. This lets me quickly localize a problem if something goes wrong. Something worth noting:
- cd ${GITHUB_WORKSPACE} moves to the working directory, which is the newly cloned repository
- ${GITHUB_ACTOR} is the username of the account that triggered the run, normally your own
- ${GITHUB_ACTOR}@users.noreply.github.com is a special GitHub email one can use to push the changes to the repository
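The push step depends on how the commit command behaves when the feed has not changed and there is nothing to commit. One way to make that case explicit is to check for changes first. This is only a sketch of the idea, not what the workflow above does; has_changes and commit_and_push are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: commit and push only when tracked files differ from the index.
# git diff --quiet exits non-zero when there are unstaged changes.
has_changes() {
    ! git diff --quiet
}

# Hypothetical wrapper for the workflow step.
commit_and_push() {
    if has_changes; then
        git commit -am "$1" && git push
    else
        echo "Nothing to commit"
    fi
}

# In the workflow step this would be called as:
#   commit_and_push "Updated blog posts"
```

The check only looks at tracked files, which matches git commit -a, so leftover files such as rss.xml do not trigger a commit.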
Conclusion
You can find the full solution in my profile repository. It’s been a lot of fun solving this problem with pure Bash.
That being said, there are lots of community-made GitHub Actions. They allow creating a dynamic profile without writing any code. All you need to do is write some YAML. But there’s little challenge in that. It is not the warrior’s way.