Hello and welcome to my little nook of the internet. Here I have my blog, which mainly contains posts about tech, the outdoors, my journey as a PhD Fellow, cooking, and sometimes mead brewing.

A bit of background information before you step into my world of crazy. I am Lars, a.k.a. looopTools, a Software Engineer and PhD Fellow living in Northern Denmark. My research focuses on the application of generalised deduplication in real-world storage systems. This includes the design, implementation, and evaluation of prototypes and more complex systems. I have 10+ years of industry experience, mainly through student positions, but also as a self-employed consultant and as a full-time employee. I mainly work in low-level user space, high-level kernel space, and storage systems in general. Besides research and software development, I also love the outdoors and try to get out as often as possible (not enough at the moment), and I am an aspiring author currently working on a few different novels. I also dabble in being a more advanced home cook and baker, which you may see some posts about. Finally, I like the ancient art of brewing mead, a.k.a. honey wine, and experiment with different flavour combinations and ageing times.

# Developing a Cake recipe trial by trial - Part 1

Baking cakes is one of the easiest things you can do in a kitchen, and often it is just as fast as shake and bake but tastier. However, for quite a while I have struggled with baking chocolate cakes: the flavour has been great, but the texture has not been up to par. For a long while I have sworn by this (Danish) Alletiders mumse CHOKOLADEKAGE recipe, as it got closest to what I wanted, but it was still lacking something.

In this post and the following posts, I will take you through the development of my own recipe, which is based on the recipe above.

# Step 1: The base recipe

Let us start on common ground with the base ingredient list and recipe in English

| Ingredient | Amount |
| --- | --- |
| Cocoa | 36 g |
| All-purpose flour | 250 g |
| Sugar | 300 g |
| Baking soda | 5 g |
| Baking powder | 5 g |
| Salt | a pinch |
| Egg | 2 |
| Water | 2.5 dl |

A few comments on this: the cocoa, baking soda, and baking powder are given in tablespoons and teaspoons in the original recipe. I have converted them to grams here based on standard conversion tables I have from a Danish cookbook.

Sift the dry ingredients together and melt the butter. Mix in the butter, eggs, and water. Put the mixture in a 23 by 23 cm pan and bake in a preheated oven at 170 °C on the lowest rack for 35 min using conventional heat. Now, the recipe does not call for buttering the pan, so I am assuming the original author wants us to use baking paper. The author also states that the cake is freezer-friendly. This is totally true, and in my opinion this cake is best served “luke” or cold and not piping hot.

## What is wrong with this Recipe?

Nothing! I really love it. Then why the hell do I want to change it? There are a few reasons: I like an airy cake, which I do not think the original recipe provides; it has a very monotone flavour; and I like to mix it up.

So let us get nutty!

# Step 2: First prototype

So my thoughts behind the first changes were: what do I want to change? I want a more airy cake and I want to modify the flavour a bit. Also, I hate melting butter for cakes, for some random reason, so why not cream it?

Making the cake more airy can be done by whipping the egg whites and gently folding them in. This also gives us a bit more control when adding the yolks, as we can “cream” them in with the butter.

For the fun of it I add 125g of dark chocolate chips as well.

To change the flavour: chocolate and cocoa go well with dark berries, and I positively love boysenberries (a Frankenstein berry cultivated by crossing European raspberries, blackberries, American dewberries, and loganberries), and jam made from them is pretty readily available in most supermarkets in Denmark. I decided to go with 100 g, without changing the other ingredients. Now let us move on to the modified recipe!

In a bowl, cream together the sugar and butter until fully combined and white. Separate the egg yolks and whites. Put the whites in a cool bowl (if you have a metal one use that, copper is even better) and the yolks in a separate dish. In a separate bowl, sift the flour, cocoa, salt, baking soda, and baking powder together. Now add one egg yolk and one tablespoon of the flour mix to the butter and sugar. Mix until fully combined and then repeat with the next egg yolk. Now mix in the water and the rest of the flour mixture until fully combined. Mix in the chocolate chips. Set aside for a bit while you whip the egg whites to stiff peaks. If you do not have dexterity problems, there is no reason to use an electric whisk for this, just learn a proper whisking technique. First fold in half of the egg whites until fully combined. Then gently fold in the other half of the whites. Rub a baking tin or springform pan (I use the latter) with butter and sprinkle it with sugar, then gently pour in the mixture. Increase the baking time to 60 minutes.

## Comments and things to change

The cake did become more airy due to whipping the whites and gently folding them in. The chocolate chips made the cake too rich (which is a thing I never thought I would say). Finally, the test people and I still thought some texture was missing and the boysenberry flavour was not powerful enough.

1. The reason the baking time increases is the added amount of liquid from the jam.
2. The amount of jam was too little; the flavour of the boysenberries was not as powerful as I wanted.
3. The chocolate chips made the cake too rich.
4. The reason I sprinkled the form with sugar as well is that it creates a caramelised outside on the cake, which gives an extra layer of texture.
5. What if we add a marzipan wrapping? This came about because I again thought the flavour was too monotone.

# Step 3: Second prototype and currently newest

So for this bake, I removed the chocolate chips and increased the jam to 300 g (an entire jar), and after the bake I rolled out marzipan and wrapped the cake in it. The rest of the mixing process was the same. I kept the same baking temperature, but increased the baking time to 90 minutes.

## Comments and things to change

First off, the cake was nice, lovely and airy (these cakes never get light), and tasted of both chocolate and boysenberry. But a few things: 1) Due to the heavily increased amount of sugar added by both the jam and the marzipan, the cake got way too sweet. This needs to be rectified, and I will comment on how I want to try to do this. 2) The marzipan did add a lovely almond flavour (for the love of the gods, buy good quality), but the almond flavour was not pronounced enough. 3) Marzipan clearly works best if the cake is served “luke” / cold. I still have a few pieces in my freezer, and the ones I have tried from there had a much stronger marzipan flavour.

Now let us talk about 1) in more detail. I am not sure I want to keep the marzipan, but I do want to keep the jam level. The solution to reduce the sweetness is therefore to reduce the amount of sugar. So I am thinking about cutting as much as 125 grams of sugar. But this will have to be trial and error, as it is very hard to judge.

Next, 2). Adding more almond flavour can also help us address 1). A friend said that some almond cream in between two layers of this cake would be nice. This got me thinking: lots of chocolate cakes use the bitterness of coffee to cut through the richness and sweetness of the cake. Therefore the next version will feature an espresso cream with Amaretto (Italian almond liqueur).

Finally 3) I am a bit torn here, either I keep the marzipan or I find another way to put in more almond flavour (besides Amaretto). But I do not like chopped almonds inside of cake. I could use essence of almond, but I kind of see that as cheating. So I have to come up with ideas for how to fix this.

This is where this first post drops off, but I will keep you posted as I change the recipe. In the end I will add the final recipe to my recipe collection, which you can find on my GitHub page!

# What keyboard and mouse do I use?

As a developer and a PhD Fellow I spend a lot of time in front of a monitor with a mouse and keyboard, and I have done so for quite a while. To put it into perspective: when I was fifteen I earned money for the first website I made for someone else, and by the time I was 18 I coded Java for companies in my spare time. During my summer holiday at 18 I coded 8-16 hours a day for a start-up company, and in basically 2 months I shattered my wrists. It got really bad, but I thought I was young and powered through, resulting in RSI pains in my wrists. I had to dramatically decrease my working hours and focus on studying (though I did work anyway), and through exercises I got to a point where I could type for 5-6 hours, which is decent even for an everyday working developer. But prolonged use of standard keyboards still gives me pain, so I had to come up with a solution to this problem. In this post I will cover the hardware I use to achieve an ergonomic desk setup and share links to some of the stuff. Bear in mind that none of it is sponsored; it is purely my recommendation.

Now back to why I sometimes whip out my Apple keyboard and others. Sometimes I get a weird pain in my shoulder (due to a bike crash a few years back), and the only way I have found to lessen the pain while I work is to switch between different keyboards. At home it is the Apple keyboard, and at work I have a Das Keyboard 4 Professional and a Vortex Pok3r. I actually got the Pok3r first on the recommendation of a friend, who loves it, but the only positive thing I have to say about that keyboard is that it can be used as a blunt weapon in an emergency. I can, best case, use it for 45 minutes before I have so much pain in my wrists that I need to take a 30 minute break, and my shoulders will start aching pretty quickly after. This is of course without having pain in the shoulder beforehand. Multiple people have claimed that this is just because I am not used to using it and that my fingers are not used to Cherry MX Blue switches. Well, the shoulder pain maybe, but I love MX Blue switches, and although my Kinesis does not have MX Blues I would love it to, and I actually had a Das Keyboard 4 Professional with MX Blues at a former employer, which was amazing. The Das Keyboard I have at work now has MX Browns, just so my colleagues do not kill me when I type on it, and it helps a lot with the shoulder pains I feel from time to time. I think it is simply down to the change of position, and it is also the keyboard I use as my auxiliary keyboard.

Next, mouse! What type of mouse do I use? I am an ergonomics lover and therefore I love trackball mice. At work I use a Logitech M570, which Logitech no longer makes; it has been replaced by the mouse I have at home, the Ergo M575. Both mice are amazing and I have no complaints. Well, I have one, but I am not sure if that is the M570 or my Mac being annoying. At work I mainly use a Dell XPS 15” (model 9560 I think) running (surprise) Fedora, with which the M570 just works with no problems whatsoever. But if I connect the M570 to my MacBook Pro the connection is often dropped, and I have been unable to figure out why. Finally, at home I also use an Apple Magic Trackpad (gen 1), and I have had one ever since it became available in Denmark. The reason is that moving windows between spaces on macOS with the keyboard is just a pain in the arse, and none of the tools I found did a good job. So for that I use the trackpad and nothing else really. I have had a Magic Mouse (gen 1) as well, and that is a mouse sent from hell; it is even worse than the Pok3r at causing wrist pain. I much prefer the old Apple Mighty Mouse, which I still have a few of and use to play Warcraft.

So this is the hardware I use and what I have tried.

## TL;DR

I use a Kinesis Advantage 2 LF keyboard and Logitech M570 and M575 trackball mice.

# Using multiple bytea values in Postgres queries with libpqxx

In deduplication, one of the things we use a lot is data fingerprints, where we use some partial function to generate a unique fingerprint for a data chunk. We use this fingerprint for a variety of things: comparing chunks, referencing chunks, and sometimes as file names for the chunks. For one of the systems I work on, I need to store these fingerprints in a database, and to be honest this has caused me a bit of concern, which I will go into in a bit. But first let me describe how we generate the fingerprints and why.

In the system we have our chunks represented as vectors of bytes in the form of std::vector<uint8_t>. These vary in size depending on the chunk size we are using for the experiments, but the most common one I use is 4 kB. Comparing chunks of 4 kB may not seem expensive, but you are still comparing 32768 bits for each chunk you compare against. To reduce the processing time we generate fingerprints, and in our system we use SHA-1. To do this we use Harpocrates [1], which is a wrapper around OpenSSL’s implementation that makes it super easy to use. Using SHA-1 gives us a 20 byte fingerprint, so only 160 bits are now compared, which is a reduction of roughly 99.51%, which is pretty cool.
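As a quick sanity check of the numbers above, here is a minimal Python sketch. It is only an illustration: Python's hashlib stands in for Harpocrates and OpenSSL, the all-zero chunk is made up, and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # a 4 kB chunk, as in the text


def fingerprint(chunk: bytes) -> bytes:
    """Return the SHA-1 digest of a chunk (always 20 bytes)."""
    return hashlib.sha1(chunk).digest()


def reduction_percent(chunk_size: int, fingerprint_size: int) -> float:
    """Percentage of the data we no longer have to compare."""
    return (1 - fingerprint_size / chunk_size) * 100


chunk = bytes(CHUNK_SIZE)  # hypothetical chunk consisting of zero bytes
fp = fingerprint(chunk)

print(len(chunk) * 8, len(fp) * 8)                        # 32768 160
print(round(reduction_percent(len(chunk), len(fp)), 2))   # 99.51
```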

Now that is all good and cool but how do we store this in a database? This has caused me some concern. First I was thinking (and have done this [2]) that we could store the fingerprints in the database as “safe strings” by converting them to hexadecimal representation, using the following conversion method:

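    #include <cstdint>
    #include <iomanip>
    #include <sstream>
    #include <string>
    #include <vector>

    std::string fingerprint_to_string(const std::vector<uint8_t>& fingerprint)
    {
        std::stringstream ss;
        ss << std::hex << std::setfill('0');
        for (const auto& elm : fingerprint)
        {
            // Cast to int so the byte is written as a number and not as a character;
            // setw(2) keeps the leading zero so every byte becomes two hex digits
            ss << std::setw(2) << static_cast<int>(elm);
        }
        return ss.str();
    }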


My main concern with this approach is that it increases the length of the fingerprint from 20 to 40 bytes, admittedly not a lot but enough that I was annoyed. Also, when it is stored in the database a couple of bytes might be added. Some would say, why not just use blobs then? Well, I would like to be able to compare the fingerprints, so using a blob is not always an option, and blobs are more for large amounts of data. Luckily we are using PostgreSQL, which has a data type bytea [3] (short for byte array or binary string), which costs the length of the binary string plus 1 to 4 bytes. Not quite sure why it is 1 to 4 and not one or the other. This was intriguing to me, as libpqxx (pqxx) [4] actually does support bytea with the data type pqxx::binarystring. But I ran into a problem. I like using prepared statements with pqxx, so I tried the following:

    #include <pqxx/pqxx>

    ...
    // Prepare statement on connection
    m_conn.prepare("seen_fingerprints", "SELECT fingerprint FROM basis WHERE fingerprint IN $1");
    ...
    std::vector<std::vector<uint8_t>> seen_fingerprints(const std::vector<std::vector<uint8_t>>& fingerprints)
    {
        std::vector<pqxx::binarystring> b_fingerprints;
        b_fingerprints.reserve(fingerprints.size());
        for (size_t i = 0; i < fingerprints.size(); ++i)
        {
            b_fingerprints.emplace_back(fingerprints.at(i).data(), fingerprints.at(i).size());
        }
        // Start transaction
        pqxx::work worker{m_conn};
        pqxx::result res{worker.exec_prepared("seen_fingerprints", b_fingerprints)};
        // Handle result
        ...
        // Commit transaction
        worker.commit();
        return result;
    }

But I ran into this error: 'pqxx::string_traits<pqxx::binarystring>::size_buffer' is not defined, and the only solution I could find was to do it for each fingerprint individually. Those who program know that this sounds like a dumb idea, and I was aware of this. So I ended up opening an issue on pqxx’s GitHub page [5], and with the help of some of the people there I actually found a solution. Below is a simplified insert function using this solution.

    #include <fmt/core.h>

    void register_fingerprints(const std::vector<std::vector<uint8_t>>& fingerprints)
    {
        std::vector<pqxx::binarystring> params;
        params.reserve(fingerprints.size());
        std::string param_list;

        // pqxx parameter placeholders are 1-indexed
        size_t param_index = 1;
        for (const auto& fingerprint : fingerprints)
        {
            params.emplace_back(fingerprint.data(), fingerprint.size());
            param_list += fmt::format("(${}, 1),", param_index);
            ++param_index;
        }

        // substr with size - 1 because we remove the last ,
        std::string query = fmt::format("INSERT INTO basis (fingerprint, reference) VALUES {}",
                                        param_list.substr(0, param_list.size() - 1));

        pqxx::work worker(m_conn);
        worker.exec_params(query, pqxx::prepare::make_dynamic_params(params));
        worker.commit();
    }


Basically, we create a query with a number of parameters equal to the number of fingerprints, construct an insert statement from this, and pass the params vector as a dynamic parameter list, which basically solved my problems.
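To make the shape of the generated statement easier to see, here is a small sketch of just the placeholder construction. It is written in Python purely for illustration (the real code is the C++ function in this post), and build_insert_query is a hypothetical helper name, not part of pqxx.

```python
def build_insert_query(count: int) -> str:
    """Build an INSERT statement with one ($n, 1) tuple per fingerprint."""
    if count < 1:
        raise ValueError("need at least one fingerprint")
    # Placeholders are 1-indexed: $1, $2, ...
    values = ",".join(f"(${index}, 1)" for index in range(1, count + 1))
    return f"INSERT INTO basis (fingerprint, reference) VALUES {values}"


print(build_insert_query(3))
# INSERT INTO basis (fingerprint, reference) VALUES ($1, 1),($2, 1),($3, 1)
```

Joining the tuples with a separator avoids having to strip a trailing comma afterwards, which is one possible alternative to the substr call.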

Now, one thing I have yet to test is how the underlying libpq [6] actually handles the pqxx::binarystring and whether it converts it to a hexadecimal string. Based on the answers I got in [5] I think it does, in which case the storage savings are none. But I still believe I am using a more appropriate data type, so that is a win.
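For a feeling of the sizes involved, the sketch below compares a raw SHA-1 fingerprint with its hexadecimal text form, including the \x prefix that PostgreSQL uses in its hex text format for bytea. It is a Python illustration with a made-up input, and it says nothing about what libpq or pqxx actually send or store.

```python
import hashlib

digest = hashlib.sha1(b"example chunk").digest()  # made-up input data

raw_size = len(digest)           # size of the binary fingerprint in bytes
hex_digits = digest.hex()        # two hex digits per byte
hex_text = "\\x" + hex_digits    # hex text form: a \x prefix followed by the digits

print(raw_size, len(hex_digits), len(hex_text))  # 20 40 42
```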

Final note: some may ask “what is fmt and where does it come from?” fmt [7] is a string formatting library for C++ by Victor Zverovich [8], which brings easy string formatting to C++, and it feels and looks a lot like what is available by default in Python. I was turned on to this library by Jason Turner (lefticus) [9], the host of C++ weekly [10] (which I recommend that you follow). fmt will be integrated into C++20 as std::format from the header <format> [11]. It is a nifty “little” library and I recommend that you give it a spin; it has made my life so much easier.

# Autumn has come to the northern realms

Autumn has come to the north, and it has hit the realm of Denmark with rain, cold winds, and despair. The Danes are retreating into their homes like Hobbits, waiting for spring to release them upon the world again!

For some reason this seems to be the view of many a person outside of the Nordic countries. They seem to expect us to stay inside for most of autumn and winter. But let us be honest, what a sad life that would be. In this post I will talk about some of the things I do outdoors in autumn.

In general, it is true that we try to make our homes even more cosy for autumn and winter: we light candles, make tea or coffee, and more often settle down with a book inside than out. But autumn for me also means a lot of walks in the forests, experiencing the colours gradually switch from dark or bright green to a tapestry of green, brown, yellow, and red, and observing how the forests grow quiet as the activity of the animals lessens. It is quite amazing and a gift to be able to observe this transformation.

However, it is not just the woods of deciduous trees that change, the evergreens do too. The woods become more damp, the rain reinvigorates the moss, and life somehow comes back into the green landscape. You can truly believe that in these forests and woods of ours there be magic. Sitting down and imagining the elves of old (read: before Tolkien) or the goddess Freya walking through these lands is easy, and it is a thing I do. But I do not limit myself to just going for walks in nature. I enjoy going on overnight shelter trips in the autumn and winter, or okay, I gladly do it all year round. I usually do this with a group of friends. We meet up on a Friday and stay in the shelter until Sunday. If it is mushroom season we will forage for mushrooms, or if we are near a lake that is safe for bathing, we will take a dip.

Finally, FIRE!!!! Be it at a camp site or in a garden, making a fire and sitting with blankets or warm clothes, drinking a good scotch or a cup of coffee, and roasting sausages or marshmallows is really nice and a thing I try to do as much as possible. Although currently living in an apartment kind of puts a lid on this.

So those are some of the things I do outdoors during autumn.

# Printing via smbclient on Linux command line

After I updated to Fedora 32, my life with my Dell XPS 15” work laptop became so much easier. No random boot locks or “I do not want to connect to that Bluetooth device” problems. But one thing broke, and that was printing using printers connected through a SAMBA server.

There are a couple of ways to solve this: either go with CUPS or run the program smbclient straight from the command line, and to be honest, the latter speaks to me on a primal level. Running the client is relatively easy; we simply need to type the command:

    smbclient <SERVER>/<PRINTER> -U <USER> -W <DOMAIN> -c "print <FILE>"


Where <SERVER> is the URL for the SAMBA server and <PRINTER> is the name of the printer you want to print from, more specifically the name of the printer on the SAMBA server. <USER> is your username for that SAMBA server, <DOMAIN> is the work-group domain of your user on the SAMBA server, and <FILE> is the name of the file you want to print. Okay, it may not be that simple, and you have to remember all those details. Next, what about print settings like duplex printing? Well, for that you need another tool, and the one I found works best is pdftops. With pdftops you can change the format of the file you want to print. If you want to change an A4 document to duplex, you will need to run pdftops using the following parameters:

    pdftops -paper A4 -duplex <INPUT> <OUTPUT>


Where <INPUT> is the path to the input file (I have tested with .odt and .pdf files) and <OUTPUT> is the path of the generated PostScript file. Yes, it generates an additional file. Then you run the smbclient command with the output file as the <FILE> input. But that is a lot of work, so can we script ourselves out of this? Well, naturally, my young padawan, we can!

So first, what do we need? Well, we need to be able to call system commands. What else? Well, nothing really, but it would be nice not to have to remember the server URL, the printer names, and where each printer is. Okay, seems fair. What scripting/programming language do we use? Well, whatever we want, and for this example let us go with Python and let us use a JSON file to keep information on the server, printers, and more.

Let us start with the JSON file and let us assume it has this format:

    {
        "server": "SERVER_URL",
        "domain": "DOMAIN",
        "printers": [
            {
                "name": "PRINTER_NAME",
                "location": "WHERE IS THIS THING"
            }
        ]
    }


Here printers is a list of the printers you want to be able to print from. Every time you need to add a new printer, you simply add its information to the JSON file. I have not included the username for security reasons, so you will have to type the username and password every time you run the script. Additionally, I will assume that the first printer in the list is our default printer.
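Since the rest of the script trusts this file blindly, a small check of its structure can save some confusion. The following is only a sketch under the assumption that the file has exactly the format shown above; parse_config and REQUIRED_KEYS are hypothetical names that are not part of the script.

```python
import json

REQUIRED_KEYS = ('server', 'domain', 'printers')


def parse_config(text):
    """Parse the JSON configuration and check the fields the script relies on."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError('config is missing: ' + ', '.join(missing))
    if not data['printers']:
        # The first printer doubles as the default, so the list cannot be empty
        raise ValueError('config needs at least one printer')
    for printer in data['printers']:
        if 'name' not in printer or 'location' not in printer:
            raise ValueError('every printer needs a name and a location')
    return data


example = '''
{
    "server": "SERVER_URL",
    "domain": "DOMAIN",
    "printers": [{"name": "PRINTER_NAME", "location": "WHERE IS THIS THING"}]
}
'''
config = parse_config(example)
print(config['printers'][0]['name'])  # PRINTER_NAME
```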

So first, how do we list printers?

    import json

    configFilePath = 'PATH_TO_CONFIG_FILE'

    def listPrinters():
        # Read the list of printers from the JSON configuration file
        with open(configFilePath, 'r') as configFile:
            printers = json.load(configFile)['printers']

        for index, printer in enumerate(printers):
            print('{!s}: {!s} - {!s}'.format(index, printer['name'], printer['location']))


So how do we select a printer? Well, notice how I printed the index of the printer. Let us use that index, and remember how I assumed that the default printer was the first in the printers list.

    def getPrinter(index=0):
        with open(configFilePath, 'r') as configFile:
            printers = json.load(configFile)['printers']
        return printers[index]['name']

Okay, now we can get the printer information that we need. So how do we make a function for running the duplex command? I prefer to use the subprocess module for tasks like this, so we will define a function that takes a file and makes a PostScript version in the same path as the input file. We will assume for now that we only handle A4 paper documents.

    import os
    import subprocess

    def generateDuplex(file):
        # Same path and name as the input file, but with a .ps extension
        psFile = '{!s}.ps'.format(os.path.splitext(file)[0])

        subprocess.call(['pdftops', '-paper', 'A4', '-duplex', file, psFile], encoding='utf-8')

        return psFile


This creates the PostScript file with the same name as the input file but with .ps as its extension. As stated, the output file will be created in the same directory as the input file, so you may want to clean this up after printing.
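To make the naming rule and the clean-up concrete, here is a small sketch that can be run on its own. The helper names and the example paths are hypothetical, and os.remove is just one possible way of cleaning up.

```python
import os
import tempfile


def psPathFor(path):
    """Same directory and base name as the input, but with a .ps extension."""
    return '{!s}.ps'.format(os.path.splitext(path)[0])


def removeIfExists(path):
    """Delete the generated file; return True if something was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


print(psPathFor('/tmp/report.pdf'))  # /tmp/report.ps
print(psPathFor('notes.v2.pdf'))     # notes.v2.ps

# A throwaway file stands in for the generated PostScript file
with tempfile.NamedTemporaryFile(suffix='.ps', delete=False) as handle:
    leftover = handle.name

print(removeIfExists(leftover), removeIfExists(leftover))  # True False
```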

Next step is to print the file, right? Yes, but how do we get the server and the domain? Like all the rest of the configurations, I just like having a function for it. So let us make that one quickly.

    def getServerAndDomain():
        with open(configFilePath) as configFile:
            data = json.load(configFile)
        return data['server'], data['domain']


Now we are ready to print, and we will again use subprocess to call the system command, this time for smbclient.

    import shlex

    def printDoc(printer, user, file):
        server, domain = getServerAndDomain()
        # The print command and the file name are passed to -c as one argument
        subprocess.call(['smbclient', '{!s}/{!s}'.format(server, printer),
                         '-U', user, '-W', domain, '-c', 'print ' + shlex.quote(file)], encoding='utf-8')


I think a few comments are needed here. First, what does shlex.quote do? It adds quotation marks around the file name for us where needed, so we do not have to handle it. Secondly, I do not check the output of the smbclient call to see if it succeeded. I should do that, and it is in the future plans; right now I just needed to get the script to work. But it is fairly easy to get the response, and we would use subprocess.check_call(...) to get the needed information. Finally, where does the user come from? Well, where does the printer come from? Those are command-line arguments, which are given by you! Now let us make the entry point part of the script. To handle user inputs, I will use the argparse package, and I will explain this part of the code in, well, parts.
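As a rough idea of what such a check could look like, here is a sketch built around subprocess.check_call. It assumes that smbclient reports a failed print through a non-zero exit status, which would need to be verified; runChecked is a hypothetical helper, and the commands below are stand-ins so the sketch runs without a SAMBA server.

```python
import subprocess
import sys


def runChecked(command):
    """Run a command; return True on exit status 0 and False otherwise."""
    try:
        subprocess.check_call(command)
        return True
    except subprocess.CalledProcessError as error:
        print('command failed with exit status {!s}'.format(error.returncode))
        return False
    except FileNotFoundError:
        print('command not found: {!s}'.format(command[0]))
        return False


# Stand-in commands: the Python interpreter exiting with status 0 and 3
ok = runChecked([sys.executable, '-c', 'import sys; sys.exit(0)'])
failed = runChecked([sys.executable, '-c', 'import sys; sys.exit(3)'])
print(ok, failed)  # True False
```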

First, we set up the argument parser and the inputs it should be able to handle, as shown in the code block below. One thing that has always annoyed me with argparse was that I could not find a way to avoid providing a value after the argument name. As an example, when listing printers, I do not want to have to type python print_doc --printers 1. I just want to type python print_doc --printers. During the development of this script, I talked to a friend about this, and he told me about the action='store_true' parameter to argparse, which I was unaware of. I may have missed it while reading the documentation. So as --printers and --duplex do not need an input parameter, we configure these two in that way.

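    import argparse

    if __name__ == '__main__':

        parser = argparse.ArgumentParser()
        parser.add_argument('--user', '-u', help='set the user')
        parser.add_argument('--document', help='path to the document to print')
        parser.add_argument('--printer', '-p', help='index of the printer to use')
        parser.add_argument('--duplex', '-d', help='print duplex', action='store_true')
        parser.add_argument('--printers', help='print list of all printers', action='store_true')
        args = parser.parse_args()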
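To see what action='store_true' does in isolation, here is a tiny stand-alone sketch. It passes explicit argument lists to parse_args instead of reading the real command line, purely for demonstration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--printers', action='store_true', help='print list of all printers')
parser.add_argument('--duplex', '-d', action='store_true', help='print duplex')

# A flag that is present becomes True, a flag that is absent becomes False
withFlag = parser.parse_args(['--printers'])
withoutFlags = parser.parse_args([])

print(withFlag.printers, withFlag.duplex)          # True False
print(withoutFlags.printers, withoutFlags.duplex)  # False False
```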


Next, I will assume that --printers is given just to list printers, so if that argument is present we only list the printers and do not act on any other parameters.

    import sys  # goes with the other imports at the top of the script

        if args.printers:
            listPrinters()
            sys.exit(0)


Next, if we are not looking for printers, we check for the user and document.

        if not args.document:
            print('You have not provided a document to print')
            sys.exit(1)

        if not args.user:
            print('You have not provided a user')
            sys.exit(1)


Next, we select the printer; if we provide no printer, we choose the default printer.

        if args.printer:
            printer = getPrinter(int(args.printer))
        else:
            printer = getPrinter()


Next, we check for enabled duplex and generate the duplex version of the document if needed. We keep the path of the document, such that we can replace it with the duplex document path if needed.

        document = args.document
        if args.duplex:
            document = generateDuplex(document)


Finally, we print the document and remove the duplex version if duplex was enabled.

        printDoc(printer, args.user, document)

        if args.duplex:
            subprocess.call(['rm', document], encoding='utf-8')


So now we can print using this script by typing python SCRIPT_NAME.py -u <USERNAME> --document <PATH_TO_DOCUMENT>. If we want to enable duplex, we type python SCRIPT_NAME.py -u <USERNAME> --document <PATH_TO_DOCUMENT> -d, and if we want to select a printer, we type python SCRIPT_NAME.py -u <USERNAME> --document <PATH_TO_DOCUMENT> -p <PRINTER_INDEX>, where the index is the one shown by --printers. smbclient will then prompt you for your password if needed.

Now finally, say we want to be fancy and not have to type python <PATH_TO_SCRIPT> every time we print, but want to type, let us say, smbprint instead. For that we make an alias in our shell configuration. The name of the configuration file varies, but the command we add should be structured the same for BASH, ZSH, and most other common shells.

    alias smbprint="python <PATH_TO_SCRIPT>"


This is how I handle the problem of printing via SAMBA printers that came with Fedora 32 for me. If you want the full script, I have made it available on GitLab, and the version I have presented in this post is tagged as 1.0.0.

# Future plans

I have a few plans and dreams for this script.

• Enable other paper sizes
• Change document orientation
• Check the output from subprocess call to smbclient
• Get a list of printers from the SAMBA server

# Known bugs

There is one known bug, which is that some files with white space(s) in their name cause a problem with the print statement. I have the issue both with the script and with the raw smbclient command.
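When looking into this, it may help to see exactly which command string the script builds for such file names. The sketch below only shows what shlex.quote produces; whether smbclient's own command parser treats these single quotes the way a POSIX shell would is an open question here and may or may not be related to the bug. printCommand is a hypothetical helper name.

```python
import shlex


def printCommand(fileName):
    """Build the string that is handed to smbclient's -c option."""
    return 'print ' + shlex.quote(fileName)


print(printCommand('report.pdf'))     # print report.pdf
print(printCommand('my report.pdf'))  # print 'my report.pdf'
```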