ChatGPT’s User-Agent… Obfuscation
If you ask ChatGPT to “please fetch https://whatmyuseragent.com/” in regular mode, it gives an answer like Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT‑User/1.0; +https://openai.com/bot
Notice it clearly labels itself ChatGPT.
However, if you ask it the same thing in agent mode, it lies:
That is my user-agent, not ChatGPT’s!
This comment on Hacker News tries to find a gray area:
I find this problem quite difficult to solve:
1. If I as a human request a website, then I should be shown the content. Everyone agrees.
2. If I as the human request the software on my computer to modify the content before displaying it, for example by installing an ad-blocker into my user agent, then that’s my choice and the website should not be notified about it. Most users agree, some websites try to nag you into modifying the software you run locally.
3. If I now go one step further and use an LLM to summarize content because the authentic presentation is so riddled with ads, JavaScript, and pop-ups, that the content becomes borderline unusable, then why would the LLM accessing the website on my behalf be in a different legal category as my Firefox web browser accessing the website on my behalf?
But I really don’t think it’s a gray area. While I would be equally annoyed to find my requests to ChatGPT to do research stymied, that doesn’t give ChatGPT the right to lie to other online businesses about ‘who’ it is.
Creating A Timelapse with bash, sshfs, imagemagick and ffmpeg in 2007
This is the story of how I created a million-plus image timelapse with absolutely no knowledge of how to do it correctly.
One day in the late 2000s I’m sitting in the windowless dungeons of Bethel University. A new building is being constructed next door. I can hear the construction happening and I want a window.
“What are the odds they made a webcam?”
They did! Cool.
“Can I cron this?”
I sure could.
Over the course of a couple of years I saved millions of jpgs from the construction, and then needed to figure out how to put them into a timelapse.
It wasn’t as easy as just stringing them all together, because when it got dark at night you’d end up with a long black spell in the video. How to get around that?
Simple time comparisons (e.g., only keeping frames from 8-5) wouldn’t work, especially in MN where the day length changes dramatically.
Solution: imagemagick.
IM would give me the darkness/lightness of an image. So for months my workflow was to compute the relative brightness of each and every image every time I wanted to update the timelapse.
Something like:
#!/bin/bash
mkdir -p frames
i=0

# Brightness threshold: identify's %[fx:mean] ranges from 0.0 (black) to 1.0 (white)
BRIGHTNESS_THRESHOLD=0.1

# Feed the loop with process substitution rather than a pipe so the counter
# survives past `done`, and sort so frames come out in filename (capture) order
while read -r image; do
    # Mean brightness of the image
    brightness=$(identify -format "%[fx:mean]" "$image")

    # bc handles the floating-point comparison against the threshold
    if (( $(echo "$brightness > $BRIGHTNESS_THRESHOLD" | bc -l) )); then
        i_filename=$(printf "%04d.jpg" "$i")
        ln -s "$(readlink -f "$image")" "frames/$i_filename"
        ((i++))
    fi
done < <(find . -maxdepth 1 -type f -name "*.jpg" | sort)

# Only build the video if at least one frame passed the threshold
if [ "$i" -gt 0 ]; then
    ffmpeg -r 25 -i frames/%04d.jpg -c:v libx264 -vf "fps=25,format=yuv420p" output.mp4
    echo "Linked ${i} frames brighter than ${BRIGHTNESS_THRESHOLD} into 'frames/'."
    echo "Video 'output.mp4' generated from ${i} selected frames."
else
    echo "No images met the brightness threshold of ${BRIGHTNESS_THRESHOLD}. Nothing to encode."
fi
Except the original was far less pretty; I had Gemini clean that up for me.
So every time I generated a new video, I recomputed the brightness of every frame and used symlinks to give ffmpeg sequentially numbered frames.
Yes, I was creating tens of thousands and then hundreds of thousands of symlinks to get ffmpeg to pick up on them as individual frames.
Eventually I figured out how to not re-process everything, I think by moving processed images to a different folder. Something very high tech like that.
Was going great until some idiot started leaving a light on overnight.
Completely threw my heuristic out the window.
BUT, I soon found that counting the number of unique colors in the image was an even better signal than the overall lightness. So: same loop, but get the count of unique colors.
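In ImageMagick terms that’s nearly a one-line change. A sketch (the COLOR_THRESHOLD variable and its default value are made up for illustration):

# %k is ImageMagick's count of unique colors in the image; a nearly-black
# night frame has very few, even with a stray light left on
unique_colors=$(identify -format "%k" "$image")

# Keep the frame only if it's "colorful" enough (threshold is illustrative)
if [ "$unique_colors" -gt "${COLOR_THRESHOLD:-1000}" ]; then
    ln -s "$(readlink -f "$image")" "frames/$(printf "%04d.jpg" "$i")"
    ((i++))
fi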
Problem: we were now at millions of images and still in the Pentium age.
What to do?
What any self-respecting bash guy does: get more computers, then write bash scripts that create a mysql database and load up every image into the database.
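Picture a table along these lines. (A sketch: the brightness column matches the INSERT snippet further down; the unique_colors and processed columns are assumptions used by the queueing sketches below.)

# One-time setup on the main server
mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" "$MYSQL_DB" <<'SQL'
CREATE TABLE IF NOT EXISTS images_and_brightness (
    id               INT AUTO_INCREMENT PRIMARY KEY,
    image_path       VARCHAR(255) NOT NULL,
    brightness_value DECIMAL(6,4),
    unique_colors    INT,
    processed        TINYINT NOT NULL DEFAULT 0
);
SQL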
I created a job queueing system — of sorts — that required bash scripts to loop over mysql SELECT statements and write multiple imagemagick commands to a single sh file, and execute that.
Something like:
brightness=$(identify -format "%[fx:mean]" "$image")

# Build the INSERT with printf so the path and value actually get substituted
sql=$(printf "INSERT INTO images_and_brightness (image_path, brightness_value) VALUES ('%s', %.4f);" "$image" "$brightness")

echo "$sql" | mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" "$MYSQL_DB"
But put ten of those into a sh file at a time, and create thousands of sh files.
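In spirit, the generator half looked something like the sketch below. (The queue directory, batch size, and the measure_one.sh helper that runs identify and writes the result back to mysql are all stand-ins, not my original names.)

#!/bin/bash
# Pull unprocessed images out of mysql and chop the list into small job
# scripts for the workers to pick up
QUEUE_DIR=/mnt/shared/queue        # exported to the workers over sshfs
BATCH_SIZE=10
mkdir -p "$QUEUE_DIR"

mysql -N -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" "$MYSQL_DB" \
    -e "SELECT image_path FROM images_and_brightness WHERE processed = 0" |
while read -r image; do
    # One imagemagick-plus-mysql command per line, via a helper script
    echo "/opt/timelapse/measure_one.sh '$image'"
done | split -a 4 -l "$BATCH_SIZE" - "$QUEUE_DIR/job_"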
The main server wrote those and the “workers” picked up a file over an sshfs filesystem and ran it locally. When they were done, they deleted the file and that’s how it was removed from the “queue.”
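And a worker, in spirit (the mount path and polling interval are invented; the real loop was surely cruder):

#!/bin/bash
# Worker loop on each Pentium 3: grab a job file from the sshfs mount,
# run it, then delete it; deletion is what "dequeues" it. Two workers
# could race for the same file; this sketch doesn't try to prevent that.
QUEUE_DIR=/mnt/shared/queue

while true; do
    job=$(ls "$QUEUE_DIR"/job_* 2>/dev/null | head -n 1)
    if [ -z "$job" ]; then
        sleep 60      # nothing queued; check back in a minute
        continue
    fi
    bash "$job" && rm -f "$job"
done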
So lots and lots and lots of bash scripts now running in a distributed computing environment over ssh on old Pentium 3s my employer had no use for.
I don’t know if I spent more time figuring this out than just letting my Pentium run it.
And still symlink to the original images. Ain’t nobody got enough space for two copies of those jpegs.
At least nobody still eating ramen twice a day.
To this day, it’s the only distributed computing system I ever made, and, I believe, one of the more unique systems a person could have come up with.
AI Small Wins; Real World Difference
I’m deeply skeptical of most commercial claims about AI “personalizing” education, at least in the short term. My doubt isn’t about the underlying technology, but about how poorly it’s likely to be executed. (I ought to write another post on this, because the idea also shows great promise.)
But there is room for real-world wins in knowledge and education.
One of the most useful tools I’ve built is a system that turns any article or website into a podcast episode with high-quality voice narration. I use it daily—catching up on news while driving or listening to long-form Tolkien essays before bed.
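Stripped to its core, the idea is simple enough to sketch in a few lines of bash. This is just the shape of it, not my actual tool: lynx for text extraction and OpenAI’s text-to-speech endpoint are stand-ins, and the “feed” step is reduced to dropping an mp3 into the directory the feed is generated from.

#!/bin/bash
# Rough sketch: URL in, narrated mp3 out. Any text extractor and TTS
# voice would do; these are just examples.
URL="$1"
OUT="episode-$(date +%Y%m%d-%H%M%S).mp3"

# 1. Pull readable text out of the page
text=$(lynx -dump -nolist "$URL")

# 2. Narrate it (this endpoint caps input length, so a long article
#    would need chunking and audio concatenation)
echo "$text" | head -c 4000 |
    jq -Rs '{model: "tts-1", voice: "alloy", input: .}' |
    curl -s https://api.openai.com/v1/audio/speech \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -H "Content-Type: application/json" \
        -d @- -o "$OUT"

# 3. Drop the episode wherever the private podcast RSS feed is built from
mv "$OUT" /var/www/podcast/episodes/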
Today, I showed Gavrel how to use it.
Gav devours audio. He’s only 14 and has already logged over 13 months on Audible. We can’t get him books fast enough.
He’s not a fan of reading text—but he loves to learn.
So today, I taught him how to ask AI to generate a custom article and add it to his podcast feed.
Prompt (I do not know what these things are):
Please write a story on the v1v2 mouse-tank (mauz?) and the rat-tank landship concept.
Give me a long story article that is very detailed, uses trustworthy and historical information that you can find on the web and is optimized for a podcast, then add it to my podcast feed.
Result several minutes later: a 12-minute podcast on… whatever those WWII German weapons were.
This is a small win at home: a way for my 14-year-old to generate articles tailored to him, in the way that he learns best. It’s not without risk (the AI might get some things wrong), but because it’s pulling from general web knowledge it will be roughly as accurate as what he could Google himself, in a fraction of the time.
There are so many ways that use-cases like this can be beneficial.
The real catch is this: for it to be tailored, it also can’t be a mass-market product; that isn’t how customization works…
P.S. Listen to more examples:
On Gondolin via Tolkien Gateway
My morning report, dynamically generated for me each day:
OpenAI Has No Moat (And Sam Knows It)
This might surprise many, but OpenAI’s large language models are not far enough ahead of their competitors to be a serious barrier to competition. I use models from OpenAI, Anthropic, and Google every day. When any one of them goes down, I can swap over to another competitor pretty easily1. This doesn’t mean they don’t have unique strengths, just that none is so much stronger than the others that I can’t swap them in and out without missing a beat.
The differences that exist are primarily in the frontier models—the top models from each competitor2. However, as soon as you step down from that top tier, there’s really no significant difference in the mid-tier and low-tier models. For the majority of use cases, one of those lower or mid-tier models will be sufficient.
That includes the open-source models Gemma, Llama and Deepseek.
The real differences lie in the systems each vendor is building around its core models.
Anthropic’s best interface is its coding tool, Claude Code, followed perhaps by Claude Desktop, but both are geared toward technical people.
Google’s interface is minimal, focusing on providing access to some of the most advanced models without a lot of extra features.
OpenAI, on the other hand, has created a web interface with significant differentiation through:
- Integrations with third-party platforms like Box, Dropbox, Microsoft, and Google3
- Canvas
- Infinite-length conversations
- Advanced voice mode4
- Memories
- The ability to do analysis in languages like Python
These are NOT part of the model; they are all part of the software interface OpenAI has built on top of the models. The implication is that OpenAI’s strategy is about locking users into ChatGPT, not just the underlying models.
This is why OpenAI is allegedly considering its own productivity suite, like Microsoft Office or Google Docs, and launched an asynchronous coding tool: it’s about getting people hooked on the platform, not the tech.
This will be a problem for all the AI providers to solve: when the models have roughly the same capabilities at competitive prices, what is your differentiator?
- Claude Code is the one I miss the most when unavailable, but Gemini 2.5 Pro and OpenAI’s o3 are reasonable substitutes. ↩︎
- Gemini 2.5 Pro, o3-pro, Sonnet/Opus 4 ↩︎
- Claude can do this through MCP servers, but those take moderate technical know-how to set up. ChatGPT’s integrations just work. ↩︎
- Google arguably has better voice models, but not a user-friendly interface. ↩︎