Creating A Timelapse with bash, sshfs, imagemagick and ffmpeg in 2007

This is the story of how I created a million+ image timelapse with absolutely no knowledge of how to do it correctly.

One day in the late 2000s I’m sitting in the windowless dungeons of Bethel University. A new building is being constructed next door. I can hear the construction happening and I want a window.

“What are the odds they put up a webcam?”

They did! Cool.

“Can I cron this?”

I sure could.

Over the course of a couple of years I saved millions of jpgs from the construction, and then needed to figure out how to put them into a timelapse.

It wasn’t as easy as just stringing them all together, because when it got dark at night you’d end up with a long black spell in the video. How to get around that?

Simple time-of-day comparisons (e.g., 8-5) wouldn’t work, especially in MN where the day length changes dramatically.

Solution: imagemagick.

IM would give me the darkness/lightness of an image. So for months my workflow was to compute the relative brightness of each and every image every time I wanted to update the timelapse.

Something like:

#!/bin/bash

mkdir -p frames

i=0

# Define the brightness threshold (10% of the maximum brightness value, which is 1.0)
BRIGHTNESS_THRESHOLD=0.1

# Read from process substitution (not a pipeline) so $i survives the loop,
# and sort so frames come out in filename (i.e. capture) order
while read -r image; do
    # Get brightness (mean pixel value, typically 0.0 to 1.0)
    brightness=$(identify -format "%[fx:mean]" "$image")

    # Perform the brightness comparison using bc for floating-point arithmetic
    if (( $(echo "$brightness > $BRIGHTNESS_THRESHOLD" | bc -l) )); then
        i_filename=$(printf "%04d.jpg" "$i")
        ln -s "$(readlink -f "$image")" "frames/$i_filename"
        ((i++))
    fi
done < <(find . -maxdepth 1 -type f -name "*.jpg" | sort)

# Check if any frames were linked before attempting to create the video
if [ "$i" -gt 0 ]; then
    ffmpeg -r 25 -i frames/%04d.jpg -c:v libx264 -vf "fps=25,format=yuv420p" output.mp4
    echo "Processing complete. Symbolic links created in 'frames/' for images over ${BRIGHTNESS_THRESHOLD} brightness."
    echo "Video 'output.mp4' generated from ${i} selected frames."
else
    echo "No images met the brightness threshold of ${BRIGHTNESS_THRESHOLD}. No symbolic links created or video generated."
fi

Except mine was far less pretty; I had Gemini clean that up for me.

So every time I generated a new video, I recomputed every frame’s brightness (I forgot the comparison in there; you can add it in your imagination) and used symlinks to give ffmpeg numbered frames.

Yes, I was creating tens of thousands and then hundreds of thousands of symlinks to get ffmpeg to pick up on them as individual frames.

Eventually I figured out how to not re-process everything, I think by moving processed images to a different folder. Something very high tech like that.
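
Something like this, probably. The real script is long gone, so the directory layout and threshold below are reconstructions, not the original:

# Sketch: measure each new image once, then park it in processed/ so the
# next run never touches it again. Frame numbers continue from whatever
# is already in frames/. (Names and the 0.1 threshold are illustrative.)
mkdir -p processed frames
n=$(ls frames | wc -l)

for image in ./*.jpg; do
    [ -e "$image" ] || continue                # glob matched nothing
    brightness=$(identify -format "%[fx:mean]" "$image")
    mv "$image" processed/                     # never re-scan this one
    if (( $(echo "$brightness > 0.1" | bc -l) )); then
        ln -s "$(readlink -f "processed/${image##*/}")" \
              "frames/$(printf '%04d.jpg' "$n")"
        n=$((n + 1))
    fi
done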

Was going great until some idiot started leaving a light on overnight.

Completely threw my heuristic out the window.

BUT, I soon found that counting the number of unique colors in the image was even better than the overall lightness. So same loop, but get the count of unique colors.
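
Something like this (the threshold is made up; I don’t remember the real cutoff):

# %k is ImageMagick's count of unique colors in the image. Night shots --
# even with a stray light left on -- have far fewer distinct colors than a
# daytime construction site. (The 5000 cutoff is illustrative.)
COLOR_THRESHOLD=5000

colors=$(identify -format "%k" "$image")
if [ "$colors" -gt "$COLOR_THRESHOLD" ]; then
    i_filename=$(printf "%04d.jpg" "$i")
    ln -s "$(readlink -f "$image")" "frames/$i_filename"
    ((i++))
fi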

Problem: we were now at millions of images and still in the Pentium age.

What to do?

What any self-respecting bash guy does: get more computers, then write bash scripts that create a mysql database and load up every image into the database.

I created a job queueing system — of sorts — that required bash scripts to loop over MySQL SELECT results, write batches of imagemagick commands into .sh files, and execute them.

Something like:

brightness=$(identify -format "%[fx:mean]" "$image")
sql=$(printf "INSERT INTO images_and_brightness (image_path, brightness_value) VALUES ('%s', %.4f);" "$image" "$brightness")
echo "$sql" | mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" "$MYSQL_DB"

But put ten of those into a sh file at a time, and create thousands of sh files.
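
The batching looked roughly like this. The queue/ path, the source directory, and the measure_image.sh helper are stand-ins; the real names are lost:

# Sketch: write job files of ten commands each into a shared queue/ directory.
# measure_image.sh is a hypothetical helper that runs the identify + mysql
# INSERT shown above for a single image.
mkdir -p queue
batch=0
count=0

while read -r image; do
    if [ "$count" -eq 0 ]; then
        jobfile=$(printf "queue/job_%06d.sh" "$batch")
        batch=$((batch + 1))
    fi
    echo "/opt/timelapse/measure_image.sh '$image'" >> "$jobfile"
    count=$(( (count + 1) % 10 ))
done < <(find /data/webcam -type f -name "*.jpg")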

The main server wrote those and the “workers” picked up a file over an sshfs filesystem and ran it locally. When they were done, they deleted the file and that’s how it was removed from the “queue.”
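
The worker side was something like this (the mount point and polling interval are guesses):

# Sketch of a worker: queue/ lives on the main server and is mounted locally
# over sshfs. Running a job file and then deleting it is the entire "dequeue"
# operation.
QUEUE=/mnt/main-server/queue

while true; do
    for job in "$QUEUE"/*.sh; do
        [ -e "$job" ] || continue   # glob matched nothing; queue is empty
        bash "$job"                 # run the batch of identify + INSERT commands
        rm -f "$job"                # deleting it removes it from the "queue"
    done
    sleep 30                        # poll again in a bit
done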

So lots and lots and lots of bash scripts, now running in a distributed computing environment over ssh on old Pentium 3s my employer had no use for.

I don’t know if I spent more time figuring this out than just letting my Pentium run it.

And still symlinking to the original images. Ain’t nobody got enough space for two copies of those jpegs.

At least nobody still eating ramen twice a day.

To this day, it’s the only distributed computing system I ever made, and, I believe, one of the more unique systems a person could have come up with.

AI Small Wins; Real World Difference

I’m deeply skeptical of most commercial claims about AI “personalizing” education—at least in the short term. My doubt isn’t about the underlying technology, but about how poorly it’s likely to be executed. (I ought to write another post on this, because it also shows great promise.)

But there is room for real-world wins in knowledge and education.

One of the most useful tools I’ve built is a system that turns any article or website into a podcast episode with high-quality voice narration. I use it daily—catching up on news while driving or listening to long-form Tolkien essays before bed.

Today, I showed Gavrel how to use it.

Gav devours audio. He’s only 14 and has already logged over 13 months on Audible. We can’t get him books fast enough.

He’s not a fan of reading text—but he loves to learn.
So today, I taught him how to ask AI to generate a custom article and add it to his podcast feed.

Prompt (I do not know what these things are):

Please write a story on the v1v2 mouse-tank (mauz?) and the rat-tank landship concept.

Give me a long story article that is very detailed, uses trustworthy and historical information that you can find on the web and is optmized for a podcast, then add it to my podcast feed.

Result several minutes later: a 12-minute podcast on… whatever those WWII German weapons were.

This is a small win at home: a way for my 14-year-old to generate articles tailored to him, in the way that he learns best. It’s not without risk; the AI might get some things wrong. But because it’s pulling from general web knowledge, it will be roughly as accurate as what he could Google himself, in a fraction of the time.

There are so many ways that use-cases like this can be beneficial.

The real catch is this: for it to be tailored, it also can’t be a mass-market product; that isn’t how customization works…

P.S. Listen to more examples:

On Gondolin via Tolkien Gateway

On Larry the Chief Mouser:

My morning report, dynamically generated for me each day:

OpenAI Has No Moat (And Sam Knows It)

This might surprise many, but OpenAI’s large language models are not far enough ahead of their competitors’ to be a serious barrier to competition. I use models from OpenAI, Anthropic, and Google every day. When any one of them goes down, I can swap over to another competitor pretty easily[1]. This doesn’t mean they don’t have unique strengths, just that none are so much stronger than the others that I can’t swap them in and out without missing a beat.

The differences that exist are primarily in the frontier models—the top models from each competitor[2]. However, as soon as you step down from that top tier, there’s really no significant difference in the mid-tier and low-tier models. For the majority of use cases, one of those lower or mid-tier models will be sufficient.

That includes the open-source models Gemma, Llama, and DeepSeek.

Where the real differences lie is in the systems each vendor is building around their core models.

Anthropic’s best interface is their coding tool, followed perhaps by Claude Desktop, but both are geared towards technical people.

Google’s interface is minimal, focusing on providing access to some of the most advanced models without a lot of extra features.

OpenAI, on the other hand, has created a web interface with significant differentiation through:

  • Integrations with third-party platforms like Box, Dropbox, Microsoft, and Google[3]
  • Canvas
  • Infinite-length conversations
  • Advanced voice mode[4]
  • Memories
  • The ability to do analysis in languages like Python

These are NOT part of the model; they are all part of the software interface they’ve built on top of the models. The implication is that OpenAI’s strategy is about locking users into ChatGPT, not just the underlying models.

This is why OpenAI is allegedly considering its own productivity suite, like Microsoft Office or Google Docs, and has launched an asynchronous coding tool: it’s about getting people hooked on the platform, not the tech.

This will be a problem for all the AI providers to solve: when the models have roughly the same capabilities at competitive prices, what is your differentiator?

1. Claude Code is the one I miss the most when unavailable, but Gemini 2.5 Pro and OpenAI’s o3 are reasonable substitutes.
2. Gemini 2.5 Pro, o3-pro, Sonnet/Opus 4.
3. Claude can do this through MCP servers, but those take moderate technical know-how to set up. ChatGPT’s integrations just work.
4. Google arguably has better voice models, but not a user-friendly interface.
Give Claude Code Access to Gemini

One of the best things I’ve done recently is to install Simon Willison’s llm tool in my development environments and give Claude Code instructions on how to use it. I set the model to gemini-2.5-pro-preview-05-06 by default, but I plan to try the latest flash model today.
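
If you want to replicate the setup, it’s roughly the following. Check the llm documentation for current plugin names, key names, and model IDs; this is a sketch of my configuration, not a canonical guide:

```
pip install llm
llm install llm-gemini                            # adds the Gemini models
llm keys set gemini                               # paste in a Gemini API key
llm models default gemini-2.5-pro-preview-05-06   # the default mentioned above

# Sanity check: pipe something in and ask a question.
echo "print('hi')" | llm "What does this code do?"
```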

There are two primary benefits.

1. Gemini’s context limit is 500% of Claude’s. By teaching Claude how to use llm, you give Claude the ability to search 5x(!) as much code and documentation.
2. It avoids clogging up Claude’s context window. Claude can use llm to search documentation or the codebase for specific snippets and return only the relevant portions.

Examples:
    You can use the bash tool 'llm' to ask questions of the codebase or the documentation, this is AMAZING for working with the massive itglue documentation. Example:
    
    ```
     cat itglue.html|llm "What are the properties for the contacts entity?"  
    Based on the provided API documentation, here are the properties for the **Contacts** entity:
    
    **Attributes (from `GET /contacts` example and `POST/PATCH /contacts` params):**
    
    *   **`id`**: (String, Read-only) The unique identifier for the contact.
    *   **`type`**: (String, Read-only) Always "contacts".
    *   **`organization-id`**: (Integer) The ID of the organization this contact belongs to. (Required on create).
    *   **`organization-name`**: (String, Read-only) The name of the organization this contact belongs to.
    *   **`name`**: (String, Read-only) The full name of the contact (likely derived from `first-name` and `last-name`).
    *   **`first-name`**: (String) The first name of the contact.
    *   **`last-name`**: (String) The last name of the contact.
    *   **`title`**: (String) The job title of the contact.
    *   **`contact-type-id`**: (Integer, Optional) The ID of the contact type (e.g., Approver, Champion).
    *   **`contact-type-name`**: (String, Read-only) The name of the contact type.
    *   **`location-id`**: (Integer, Optional) The ID of the location associated with this contact.
    *   **`location-name`**: (String, Read-only) The name of the location associated with this contact.
    *   **`important`**: (Boolean, Optional) Whether the contact is marked as important.
    *   **`notes`**: (String, Optional) Any notes related to the contact.
    *   **`restricted`**: (Boolean, Optional) Whether the contact is restricted.
    *   **`created-at`**: (String/Timestamp, Read-only) The date and time the contact was created.
    *   **`updated-at`**: (String/Timestamp, Read-only) The date and time the contact was last updated.
    *   **`contact-emails`**: (Array of Objects, Optional) A list of email addresses for the contact. Each object has:
        *   `value`: (String, Required) The email address.
        *   `primary`: (Boolean, Optional, defaults to false) Whether this is the primary email.
        *   `label-name`: (String, Required) The label for the email (e.g., Work, Home).
    
    When creating or updating a contact, you primarily work with the `attributes` (like `first-name`, `notes`, `contact-emails`, `contact-phones`) and the `organization-id`. The other `-name` fields are usually derived by the system.
    ```
    
    example -- query code base --
    ```
    find ./ -type f -name "*.py"  -exec cat {} \;|llm "What function syncs contacts with it glue?"
    The primary function responsible for syncing contacts with IT Glue is:
    
    1.  **`sync_contacts(self, organization_id=None, since=None)`**
        *   **File:** `tandem/itglue/tasks/tasks_itglue_sync_contacts.py`
        *   **Purpose:** This Celery task is designed for bulk synchronization. It can sync contacts for a specific organization (`organization_id`) or for all organizations if `organization_id` is not provided. It can also sync contacts updated since a specific time (`since`).
        *   It calls `_sync_organization_contacts` for each organization.
    
    There is also a function to sync a single contact:
    
    2.  **`sync_contact(self, contact_id)`**
        *   **File:** `tandem/itglue/tasks/tasks_itglue_sync_contacts.py`
        *   **Purpose:** This Celery task syncs a *specific* contact from IT Glue given its `contact_id`.
    
    Both of these tasks rely on a helper function to perform the actual database update/create operation:
    
    3.  **`_upsert_contact(contact_data, organization)`**
        *   **File:** `tandem/itglue/tasks/tasks_itglue_sync_contacts.py`
        *   **Purpose:** This function takes the contact data fetched from the IT Glue API (`contact_data`) and the corresponding local `Organizations` model instance. It then uses `Contacts.objects.update_or_create()` to save the contact to the local database, setting its `sync_status` to `"synced"`.
        *   It uses utilities like `snake_dict` and `safe_int_cast` from `tandem.itglue.utils.transformers` to process the incoming data.
    ```

I’ve actually cut out some of the llm response for this post. The itglue documentation in particular has been great. Their docs are way too large for Claude to take in on its own, and it shouldn’t need to ingest all of them at once. By offloading some of the work to Gemini, Claude gets only the relevant information in return.

Here is a portion of today’s session where I am auto-categorizing software. The two scripts mentioned are scripts Claude wrote to extract portions of a database of software entities.

    That’s great to know about the llm tool! Let me use it to analyze the development tools data I just gathered:

    ● Bash (python /app/explore_dev_tools.py && python /app/explore_more_dev_tools.py | llm “Analyze this output of development tools found in a
    software database. Identify…)
    ⎿ === VISUAL STUDIO / MICROSOFT DEVELOPMENT TOOLS ===
    Microsoft Corporation – Workflow Manager Tools 1.0 for Visual Studio 17.7.40616.8
    Microsoft Corporation – Windows Phone SDK 8.0 Assemblies for Visual Studio 2017 15.0.28302
    Microsoft Corporation – Visual Studio Community 2022 17.13.4
    … +302 lines (ctrl+r to expand)

    ● Excellent analysis! The llm tool provided great insights. Let me now search for more specific development tool patterns:

Later, after we had written the rules for most of the programming tools, I wanted to do a second pass: nearly 3,000 rows of software remained to be tagged. Could there be any more programming holdouts?

    take all of the remaining pieces of software and send the name and vendor to the llm
    tool, ask it if there are any other development/programming pieces of software in there. Do not count
    redistributables. Use that to see if there is any we missed

    ● Bash(python manage.py auto_tag_software)
    ⎿ Adobe products tagged: 0
    Microsoft Office products tagged: 0
    Vulnerable Office versions tagged: 0
    File sharing products tagged: 0
    … +87 lines (ctrl+r to expand)

    ● Great! The real run successfully tagged 527 development tools. Now let me get all the remaining untagged
    software and analyze it with the LLM tool to find any missed development tools:

    ● Bash(python manage.py shell -c “
    from AMSDevices.models import DeviceSoftwareTypes…)
    ⎿ === UNTAGGED SOFTWARE (VENDOR – SOFTWARE NAME) ===
    Microsoft Corporation – windows_toolscorepkg
    Microsoft Corporation – windows_toolscorepkg
    Microsoft Corporation – vs_vswebprotocolselectormsi
    … +533 lines (ctrl+r to expand)

    ● Let me get a complete list and send it to the LLM tool:

    ● Bash(python manage.py shell -c “
    from AMSDevices.models import DeviceSoftwareTypes…)
    ⎿ Running…

    cat /tmp/untagged_software.txt | llm “Analyze this list of untagged software. Identify any
    development/programming tools that may have been missed from our tagg…)

The result? Several hundred software names that needed to be added to our rules.

Should We Tell Them?

Does someone need to tell Cyrisma about the current state of the shuttle?

Or at least their marketing department?
