Uwe’s Blog

My writing about data engineering, open-source development, general programming and thoughts about engineering culture.

  • 2025-39: A week in conda-forge

    In the third week of reporting on my conda-forge work, you will see how quickly a large number of contributions adds up. As we’re getting closer to the Python 3.14 release, I spent some time moving that forward.

  • 2025-38: A week in conda-forge

    To keep giving you a glimpse into my conda-forge work, here is the second week of reporting. This week was a bit leaner on activities, as I spent my time preparing for PyData Paris 2025.

  • 2025-37: A week in conda-forge

    While I spend the majority of my time on QuantCo-internal (often strategic, non-code) work, my GitHub profile still hovers around 8,000 yearly contributions. Part of this comes from my internal work, but the majority of the count comes from work in conda-forge. This is a place where I feel I have gamed the metric. As many people will...

  • Let people invite themselves to Google Calendar entries using AppScript

    If you want to organise an event with a group of people within your Google Workspace, you can invite the whole workspace or ask around who wants to attend. It has been the norm at my current workplace to post in Slack and let people react with an emoji if they wish to attend. This was convenient as any attendee...

  • The implications of pickling ML models

    When you have trained a machine learning model (pipeline), you will make predictions directly afterwards to assess its quality. To actually use the model for something useful, we also want to make predictions with it at a later point in time. This forces us to store the model on disk and think of a way to serialise it.

  • Apache Arrow on the Apple M1

    In the previous blog post I explained how I got a well-working setup on my M1 MacBook. With that in place, I mostly worked on getting my main work setup running. But as a core Apache Arrow developer, I was also very eager to go the extra mile and get Arrow (the C++ and Python part) working on the M1....

  • The first two weeks with the Apple M1

    Apple recently released new computers that contain their new M1 processors. I was quite excited about them because of the promises made by various benchmarks regarding performance and energy consumption, but also because it is a new platform. Most things won’t work there and some assumptions about how we work today have to change if you want to use...

  • Calculating levenshtein distances with fletcher

    Levenshtein distance is a common measure for comparing two strings. It gives you the minimal number of insert, delete and replace operations needed to transform one string into another.

  • Trimming down pyarrow’s conda footprint (Part 2 of X)

    We have again reduced the footprint of creating a conda environment with pyarrow. This time we have done some detective work on the package contents and removed contents from thrift-cpp and pyarrow that are definitely not needed at runtime.

  • Removing Python as a dependency of R

    Surprisingly, Python was a runtime dependency of R on conda-forge. As R doesn’t need Python to run, this was a bit weird. We got rid of it by splitting up the GLib package.