Free beer is not OK

Coding, Operating Systems, Security, Technology

The phrase “free, as in beer” is often used in connection with Open Source software, to indicate that the software is being given to users without any expectation of payment. This distinguishes it from “free, as in speech” which might erroneously suggest that the software could do whatever it liked.

Were it not for Andres Freund’s recent discovery, a certain piece of software called XZ Utils might actually have become free to do whatever it liked (or more correctly, whatever its evil master desired). NIST rates the vulnerability, CVE-2024-3094, at the maximum criticality of 10/10. Freund announced his discovery about a month after the tainted xz had been released, though thankfully before it had worked its way into production systems.

The xz utilities provide various data compression features that are widely used by many other software packages, notably sshd, the software responsible for providing secure access to a server by administrators. By compromising sshd, an attacker armed with a suitable digital key (matching the one injected into the poisoned xz utilities) could easily access the server and do absolutely anything. Steal data. Initiate fraudulent transactions. Forge identities. Plant additional malware. Encrypt or destroy everything on the server, and anything securely connected to the server. The ramifications are terrifying.

This was no ordinary attack. The attacker(s) created a number of personas as far back as 2022, notably one named Jia Tan, to gradually pressure the principal XZ Utils maintainer, Lasse Collin, into trusting the malicious contributors. Once trust had been established, a complex set of well-hidden modifications was made, and Tan released version 5.6.0 to the unsuspecting world. An attack so sophisticated suggests nation-state involvement, and fingers are pointing in many directions.

There is currently no universally accepted mechanism to determine the bona fides of open source contributors. Pressuring a lone project maintainer to let you into the inner circle, especially one who is exhausted/poor/vulnerable, is therefore a viable attack vector. Given the number of “one person” open source projects out there, many of which have roles in critical infrastructure, it would surprise nobody if it were to be revealed that other projects have also been subject to similar long-term attacks.

For now, the best we can hope for is increased vigilance, more lucky breaks like that of Andres Freund and perhaps better support/funding for the open source developers.

AI, AI captain

Legal and Political, Security, Technology

Artificial Intelligence is appearing everywhere and it is increasingly difficult to stop it seeping into our lives. It learns and grows by observing everything we do, in our work, in our play, in our conversations, in everything we express to our communities and everything that community says to us. We are being watched. Many think it is just a natural progression from what we already created. To me, it is anything but natural.

Spellchecking: an AI precursor

Half a century ago, automatic spell-checking was introduced to word processing systems. Simple pattern matching built into the software enabled it to detect unknown words and suggest similar alternatives. By adding statistical information it could rearrange the alternatives so that the most likely correct word would be suggested first. Expand the statistics to include nearby words and the words typed to date and the accuracy of the spell-checking can become almost prescient. Nevertheless, it is all based on statistical information baked into your software.
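To make the idea concrete, here is a minimal sketch of frequency-ranked suggestions. The tiny word table and the function names are invented for illustration; a real spell-checker would ship far larger statistics and smarter matching.

```python
# Toy word-frequency table (hypothetical counts, for illustration only).
WORD_COUNTS = {"the": 500, "then": 120, "they": 90, "there": 80, "than": 60, "hen": 5}

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def one_edit_away(word):
    """Return every string that is one deletion, swap, substitution or insertion from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)


def suggest(word, limit=3):
    """Return known words close to `word`, most frequent first (empty if the word is known)."""
    if word in WORD_COUNTS:
        return []
    candidates = {w for w in one_edit_away(word) if w in WORD_COUNTS}
    return sorted(candidates, key=lambda w: -WORD_COUNTS[w])[:limit]


print(suggest("thn"))
```

The pattern matching finds the candidates; the baked-in counts decide their order.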

But where did those statistics come from? We know that over a thousand years ago the military cryptographers were determining word frequency in various languages as an aid to deciphering battlefield communications. Knowledge of letter, word and phrase frequencies was a key component of the effort to defeat the Enigma machine during World War II. So by the time the word processor was commonplace, the statistical basis of spellchecking was also present. It evolved from hundreds of years of analysis, and one could not in any way discern any of the original analysed text from the resulting statistics.

Grammar checking: pseudo-intelligence

In time, spellcheckers were enhanced with the ability to parse sentences and detect syntactic errors. The language models, lexical analysers, pattern matchers and everything else that goes into a grammar checker can be self-contained. The rules and procedures are generally unchanging, though one could gradually build up some adjustments to the recorded statistics based on previous text that was exposed to the system. It appears somewhat intelligent but only because there is a level of complexity involved that a human might find challenging.

Predictive text: spooky cleverness

Things started to get interesting when predictive text systems became mainstream, especially among mobile device users where text entry was cumbersome. Once again, statistics played a huge role, but over time these systems were enhanced to update themselves based on contemporary analysis. Eventually the emergence of (large) language models “trained” on massive amounts of content (much of it from the Web) enabled these tools to make seemingly mind-reading predictions of the next words you would type. Accepting the predicted text could save time, but sometimes the predictions are wildly off base, or comically distracting. Worse, however, is the risk that as more and more people accept the predicted text the more we lose the unique voice of human writers.
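The statistical core of early predictive text can be sketched in a few lines. This is only a toy bigram counter with made-up function names and training text, nothing like a large language model, but it shows how "what usually comes next" falls out of counting.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(followers, word, limit=2):
    """Return the most frequent followers of `word`, best guess first."""
    counts = followers.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(limit)]


model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))
```

Whatever text is fed to `train_bigrams` shapes every later prediction, which is exactly why the source of the training content matters.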

Certain risks surface from the use of predictive text based on public and local content, notably plagiarism and loss of privacy. Unlike the simple letter/word counting of the military cryptographers of the ninth century, today’s writing assistance tools have been influenced by vast amounts of other people’s creative works beyond mere words, and their suggestions can be near copies of substantial portions of this material.

While unintended plagiarism is worrying, the potential for one’s own content to become part of an AI’s corpus of knowledge is a major concern. In the AI industry’s endless quest for more training data, every opportunity is being exhausted, whether or not the original creators agree. In many cases the content was created by people long before feeding it to an AI became a realistic possibility. The authors would never have imagined how their work could be used (abused?), and many are no longer with us to voice their opinions on it. If they were asked, that is.

And what of your local content? You might not want to feed that to some AI in the cloud so that it influences what the AI delivers to other people. Maybe it is content that you must protect. Maybe you are both morally and legally obliged to protect it. In that case, knowing that an AI is nearby you would take precautions to not expose your sensitive content to such an AI. Right?

Embedded AI: the hidden danger

What if the AI were embedded in many of the tools at your disposal? Protecting your sensitive content (legal correspondence, medical reports etc.) from the “eyes” of an AI would be challenging. Your first task would be to make yourself aware of its presence. That, unfortunately, is where it is getting harder every day.

Microsoft introduced Copilot in 2023, both in Windows and in the business versions of its Office suite, meaning that AI is present in your computer’s operating system and your main productivity tools. Thankfully it’s either an optional feature or a paid-for feature so you are not forced to use it. But that may change.

A particularly worrying development, and the motivation behind this post, is Adobe’s recent announcement (Feb 2024) of its AI Assistant embedded into Acrobat and Reader. These are the tools that most people use to create and read PDF documents. It will allow the user to easily search through a PDF document for important information (not just simple pattern searching), create short summaries of the content and much more. Adobe states that the new AI is “governed by data security protocols and no customer document content is stored or used for training AI Assistant without their consent”. It’s currently in beta, and when it is finally released it will be a paid-for service.

Your consent regarding the use of AI is all-or-nothing because you accept (or reject) certain terms when you are installing/updating the software. Given how tempting the features are, granting consent could be commonplace. Today you might have nothing sensitive to worry about, so you grant consent. Some time later, when getting one-paragraph summaries of your PDFs seems a natural part of your daily workflow, you might receive something important, sensitive, perhaps something you are legally obliged to protect. You open the PDF and now the AI in the cloud has it too, and there is no way for you to re-cork the genie.

“No AI here”

We are entering choppy waters for sure. Maybe we need something we can add to our content that says “not for AI consumption”? Without such control by authors and readers alike we could be facing a lot more trouble.

Amazon Linux 2023 on VirtualBox

Operating Systems, Technology

About seven months ago I threw my hat into a GitHub thread that had opened over a year before (March 2022!) asking Amazon to make good on its promise to release off-prem images of its AL 2023 operating system. My jab at Amazon was picked up in an article on The Register and a few weeks later there was finally some movement by Amazon, raising the profile of the issue and eventually leading to a release of KVM and VMware images mid-November. There was no image for VirtualBox and I mentioned this omission in a follow-up on GitHub. The current January 2024 release still only supports KVM and VMware. The online instructions also omit VirtualBox. This is unusual because Amazon had provided VirtualBox images and instructions for previous versions of its OS.

Two weeks after Amazon failed to produce a VirtualBox image I decided to solve the problem myself. Here’s the environment in which I created the solution:

  • Windows 10
  • Oracle VirtualBox v7
  • WinZip / 7Zip or similar Zip tool
  • CDBurnerXP

First get the OVA file from the latest release page by navigating to the VMware sub-page and downloading the .ova file from the link therein. For the Jan 2024 release you want the file named al2023-vmware_esx-2023.3.20240122.0-kernel-6.1-x86_64.xfs.gpt.ova, and remember to check the SHA256 checksum!
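If you have Python to hand, the checksum comparison could look something like the sketch below. The function names are my own; the expected value is whatever hex string is published alongside the download, which you paste in yourself.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so a multi-gigabyte image need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the file's SHA256 against the published value (case and whitespace tolerant)."""
    return sha256_of(path) == published_hex.strip().lower()
```

Call `matches_published("the-downloaded-file.ova", "<published hex string>")` and only carry on if it returns True.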

Using your preferred Zip tool open the .ova file and extract the .vmdk file therein.
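An .ova file is an ordinary tar archive, so if you would rather not use a Zip tool the extraction could be sketched with Python's standard library. The function name is made up for this example.

```python
import os
import shutil
import tarfile


def extract_vmdk(ova_path, dest_dir="."):
    """Copy every .vmdk member of the OVA into dest_dir and return the new file paths."""
    extracted = []
    with tarfile.open(ova_path) as ova:
        for member in ova.getmembers():
            if member.isfile() and member.name.lower().endswith(".vmdk"):
                # Use only the base name so the archive cannot write outside dest_dir.
                target = os.path.join(dest_dir, os.path.basename(member.name))
                with ova.extractfile(member) as source, open(target, "wb") as output:
                    shutil.copyfileobj(source, output)
                extracted.append(target)
    return extracted
```

The returned path is the .vmdk file to feed to the conversion step.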

You will find the VBoxManage.exe program in Program Files\Oracle\VirtualBox and you can use it to generate a .vdi file for VirtualBox as follows:

  VBoxManage.exe clonehd al2023-___.vmdk al2023-___.vdi --format VDI

(I am using “___” as a shorthand.) Now create three files named “meta-data”, “network-config” and “user-data” as follows:

meta-data

local-hostname: myhost.mydomain.example.org

network-config

network:
  version: 2
  ethernets:
    enp0s3:
      dhcp4: false
      addresses:
        - 192.168.1.234/24
      gateway4: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8]

user-data

#cloud-config
package_upgrade: false
ssh_pwauth: True
chpasswd:
  list: |
    ec2-user:mY-C0mpl3x-Pwd
  expire: False
write_files:
  - path: /etc/cloud/cloud.cfg.d/80_disable_network_after_firstboot.cfg
    content: |
      network:
        config: disabled

These are YAML files with two-space indenting. If you are interested in such configurations, check out some official examples! Feel free to use a different IP address for your VM and whatever DNS nameserver you want, and choose a different (complex) password to your liking.
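If you build VMs like this more than once, the three seed files could be generated rather than typed. The following is just a sketch with an invented function name and parameters; it reproduces the same content as above, writes Unix line endings, and starts user-data with the "#cloud-config" line that cloud-init looks for.

```python
import os


def write_seed_files(dest_dir, hostname, address_cidr, gateway, dns, password):
    """Write meta-data, network-config and user-data into dest_dir; return their paths."""
    files = {
        "meta-data": f"local-hostname: {hostname}\n",
        "network-config": (
            "network:\n"
            "  version: 2\n"
            "  ethernets:\n"
            "    enp0s3:\n"
            "      dhcp4: false\n"
            "      addresses:\n"
            f"        - {address_cidr}\n"
            f"      gateway4: {gateway}\n"
            "      nameservers:\n"
            f"        addresses: [{dns}]\n"
        ),
        "user-data": (
            "#cloud-config\n"
            "package_upgrade: false\n"
            "ssh_pwauth: True\n"
            "chpasswd:\n"
            "  list: |\n"
            f"    ec2-user:{password}\n"
            "  expire: False\n"
            "write_files:\n"
            "  - path: /etc/cloud/cloud.cfg.d/80_disable_network_after_firstboot.cfg\n"
            "    content: |\n"
            "      network:\n"
            "        config: disabled\n"
        ),
    }
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name, content in files.items():
        path = os.path.join(dest_dir, name)
        # newline="\n" stops Windows from converting line endings to CRLF.
        with open(path, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(content)
        paths.append(path)
    return paths
```

For example, `write_seed_files("seed", "myhost.mydomain.example.org", "192.168.1.234/24", "192.168.1.1", "8.8.8.8", "mY-C0mpl3x-Pwd")` recreates the files shown above in a folder named seed.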

Finally use the command line tool from CDBurnerXP to create an ISO containing the above three files:

cdbxpcmd.exe --burn-data -name:cidata -file:meta-data -file:network-config -file:user-data -iso:seed.iso -format:iso -changefiledates

Run VirtualBox and add the al2023-___.vdi file to the collection of virtual media images. Then set up a new VM with the following configuration:

  • Type: Linux 64-bit
  • System: 4 GB RAM, 1 or 2 CPUs
  • Storage [Controller=IDE] mounted image seed.iso
  • Storage [Controller=SATA] mounted image al2023-___.vdi
  • Display: 33MB, 1 monitor, VMSVGA.
  • Network: bridged adapter, Realtek
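The same configuration could probably be scripted instead of clicked through. The sketch below only builds the VBoxManage command lines; I set my VM up through the GUI, so treat the option names and the "Linux_64" OS type as assumptions to check against the help output of your VirtualBox version. The function name and the controller names "IDE" and "SATA" are my own choices.

```python
def vm_setup_commands(vm_name, vdi_path, seed_iso, bridge_adapter, memory_mb=4096, cpus=2):
    """Return VBoxManage argument lists mirroring the GUI settings listed above."""
    vbox = "VBoxManage"
    return [
        # Create and register an empty 64-bit Linux VM.
        [vbox, "createvm", "--name", vm_name, "--ostype", "Linux_64", "--register"],
        # Memory, CPUs, display and bridged networking.
        [vbox, "modifyvm", vm_name,
         "--memory", str(memory_mb), "--cpus", str(cpus),
         "--vram", "33", "--graphicscontroller", "vmsvga",
         "--nic1", "bridged", "--bridgeadapter1", bridge_adapter],
        # IDE controller carrying the cloud-init seed ISO.
        [vbox, "storagectl", vm_name, "--name", "IDE", "--add", "ide"],
        [vbox, "storageattach", vm_name, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "dvddrive", "--medium", seed_iso],
        # SATA controller carrying the converted system disk.
        [vbox, "storagectl", vm_name, "--name", "SATA", "--add", "sata"],
        [vbox, "storageattach", vm_name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", vdi_path],
    ]
```

Each list can be passed to `subprocess.run(command, check=True)` in order; `bridge_adapter` must be the exact name of your host network adapter as VirtualBox reports it.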

Boot the VM and after some initialisation sequences you should be at a login prompt in a minute or two. Log in via the console or use PuTTY (SSH). The user name is ec2-user and the password is per the user-data file above. At this point you can unmount the seed.iso as it has done its job.

WUps

Operating Systems, Technology

Windows Update is both essential and painful. It regularly interrupts the normal flow of work, sometimes saps all the energy out of the computer, takes control for long periods of time (on older machines this could be hours!) and occasionally goes “whoops…” Like the past few days, in which all except one of my PCs have choked on KB5034441. There are suggestions that the problem is due to the relatively new requirement that the Windows recovery partition have at least 250 MB of available space. All of mine have more than double that, so the cause of the update failure is likely more complex. The remedy (partition resizing) proposed by Microsoft is far more convoluted than anything the average user would be familiar with, and infeasible for any central IT administrator to apply to their many users. It comes with significant risks, notably disk corruption, and while the patch is an essential fix for a security issue, it only applies to the minority of people who have BitLocker enabled. Even for those affected, it only applies if physical access to the affected PC by an attacker is possible. That’s a lot of “if”s.

What should be done while we wait for Microsoft to fix their fix? Since the failed patch keeps insisting on a retry, my strategy is simple: ignore it. Or at least, instruct my PCs to ignore this particular patch.

Ignoring a WU patch

Microsoft once offered a tool called “Show or Hide Updates” (S&HU) that scanned for available updates and allowed you to select which of them would be hidden from the WU process. This tool doesn’t require any installation. Just run the wushowhide.diagcab file, select the Hide option, wait for it to present the list of available updates and (in this case) select the offending KB5034441. Sadly Microsoft no longer offer the S&HU tool on their site, but thanks to the Wayback Machine you can download wushowhide.diagcab from the archive.

After hiding the offending update via the S&HU tool, if it is still marked as “retry” in the Windows Update section of Windows Settings, just click the retry link and watch the update disappear.

What next?

Microsoft will eventually release a fix for KB5034441. This might be a revision of the patch, in which case the patch identifier may stay the same, which unfortunately means the S&HU configuration will prevent the fix from being applied. You could re-run S&HU to un-hide the patch, but only if you are sure the patch has been fixed.

Alternatively, Microsoft could withdraw the broken patch so it is no longer offered via WU. In its place they would issue a new patch with a new ID to be applied automatically via WU in the usual way. Hopefully this time without choking.

Wet January

LUE

My small patch on planet Earth has not much climate but plenty of weather. An island subject to ocean buffeting, chills from northern icy regions and occasional heat from the nearby continent. Often on the same day. I recall being greeted by snow in the morning, beaming sunshine in the afternoon and torrential rain that evening. It has been a bit turbulent of late, two storms in two days. Winds at 100km/h, gusts even worse. And rain.

This has me a little peeved, to be honest. I like to go for a short walk now and then, clear the cobwebs out of my head, put some air in my lungs, stop staring at screens for a while. This January I was looking forward to my walks on account of my new hat, a deep blue pure wool Fedora, which sadly in this weather won’t last a minute unless it is nailed to my head. So I sit here with the rain drumming a tattoo on the window behind me while I stare at one of my screens and ponder another wet January day without a nice walk.

OK then. Coffee break is over. Time to get on with writing that report. I wonder if it would be odd to wear a hat while typing…?