hrbrmstr's Daily Drop<p><strong>Drop #640 (2025-04-17): Twisted Topics Thursday</strong></p><p><em>DuckDB MCP; Bad Bots: Whatcha Gonna Do?; Terrifying Telemetry</em></p><p>As this tariFFable week slinks ever closer to its end, we’ll mix it up a bit for the Thursday Drop and cover some AI, the increasingly dead internet, and why you should never open your internet browser ever again.</p><p>And, we’ll make today’s a bit beefier to make up for no Wednesday Drop.</p> <p><strong>TL;DR</strong></p><p><em>(This is an LLM/GPT-generated summary of today’s Drop using Ollama + llama 3.2 and a custom prompt.)</em></p><p><em>(Linkifying the WHOLE PARAGRAPH was an unexpected ‘hallucination’.)</em></p><p><a href="https://sizeof.cat/post/web-browser-telemetry-2025-edition/" rel="nofollow noopener noreferrer" target="_blank">The 2025 Web Browser Telemetry report from sizeof.cat shows that nearly all mainstream browsers, including privacy-focused ones, make unsolicited background connections, with only modified versions like LibreWolf and Waterfox minimizing telemetry at the expense of slower updates.</a></p><p><a href="https://github.com/motherduckdb/mcp-server-motherduck" rel="nofollow noopener noreferrer" target="_blank">A hands-on walkthrough demonstrates connecting Claude Desktop to a DuckDB Model Context Protocol server, running anomaly detection SQL queries, and interpreting results to identify significant scanning spikes in security data.</a></p><p><a href="https://www.imperva.com/resources/resource-library/reports/2025-bad-bot-report/" rel="nofollow noopener noreferrer" target="_blank">Imperva’s 2025 Bad Bot Report reveals that automated bots now account for 51% of web traffic, with malicious bots making up 37% and targeting APIs and industries like travel and retail using increasingly sophisticated tactics.</a></p> <p><strong>DuckDB MCP</strong></p>Photo by Brenadette Grindle Faust on <a
href="https://www.pexels.com/photo/mallard-2551452/" rel="nofollow noopener noreferrer" target="_blank">Pexels.com</a><p>Our fine, feathered friends over at MotherDuck (the cloudy version of DuckDB) have a spiffy <a href="https://github.com/motherduckdb/mcp-server-motherduck" rel="nofollow noopener noreferrer" target="_blank">Model Context Protocol server for DuckDB</a> which is super easy to hook up to, say, Claude Desktop to start exploring your data in some fancy and fun new ways.</p><p>I’m not a big fan of including all the <code>args</code> in the JSON configs of all these MCP clients (including Claude), so I made an entry for some tag, CVE, and sensor data from my $WORK like this:</p> <pre>"gndb": {
  "command": "/Users/me/bin/gnddbmcp.sh"
}</pre> <p>and that corresponding shell script is just:</p> <pre>#!/usr/bin/env bash
/Users/me/.cargo/bin/uvx mcp-server-motherduck --db-path /Users/me/Data/gn.ddb</pre> <p>Claude has an “Attach to MCP” button which you can use to connect to the server and start exploring your data. When I asked it what tables I had, I got something like this back:</p> <pre>> what tables are there?

To see what tables exist in your database, I can help you run a query. Let me do that for you:

{
  `query`: `SHOW TABLES;`
}

Based on the query results, there are 4 tables in your database:

1. cveActivity
2. cveCpe
3. sensors
4. tagActivity

Would you like to explore the schema of any of these tables to understand their structure?</pre> <p>I then asked it a question I already have scripts for (to see how well it performed):</p> <pre>> What tags have anomalous recent spikes with at least one day having over 200 unique IPs?

To find tags with anomalous recent spikes with at least one day having over 200 unique IPs, I'll need to analyze the tagActivity table.
Let me write a query to identify this pattern:

{
  `query`: `-- First, let's examine the schema of the tagActivity table to understand its structure
DESCRIBE tagActivity;`
}

Now I'll query for tags that have had at least one day with over 200 unique IPs, and analyze patterns to identify anomalous spikes:

{
  `query`: `WITH tag_stats AS (
  -- Calculate the average unique IPs per day for each tag
  SELECT
    tag_id,
    name,
    AVG(unique_ips) AS avg_unique_ips,
    MAX(unique_ips) AS max_unique_ips,
    STDDEV(unique_ips) AS stddev_unique_ips
  FROM tagActivity
  GROUP BY tag_id, name
  HAVING MAX(unique_ips) > 200 -- Filter for tags with at least one day over 200 unique IPs
),
tag_anomalies AS (
  -- Join back to get days with anomalous activity
  -- defined as days where unique_ips exceeds average + 2*stddev
  SELECT
    ta.day,
    ta.tag_id,
    ta.name,
    ta.unique_ips,
    ts.avg_unique_ips,
    ts.stddev_unique_ips,
    (ta.unique_ips - ts.avg_unique_ips) / NULLIF(ts.stddev_unique_ips, 0) AS z_score
  FROM tagActivity ta
  JOIN tag_stats ts ON ta.tag_id = ts.tag_id AND ta.name = ts.name
  WHERE ta.unique_ips > ts.avg_unique_ips + 2 * ts.stddev_unique_ips
    AND ta.unique_ips > 200
  ORDER BY z_score DESC
)
-- Get the final results with the most anomalous tags first
SELECT
  tag_id,
  name,
  MAX(z_score) AS max_z_score,
  MAX(unique_ips) AS max_unique_ips,
  AVG(unique_ips) AS avg_unique_ips_during_anomalies,
  COUNT(*) AS anomalous_days
FROM tag_anomalies
GROUP BY tag_id, name
ORDER BY max_z_score DESC
LIMIT 15;`
}

Let's get a bit more detail on the most significant anomalies by looking at their daily activity patterns:

{
  `query`: `-- Let's look at the daily pattern for the top 5 anomalous tags
WITH top_anomalous_tags AS (
  SELECT tag_id FROM (
    WITH tag_stats AS (
      SELECT
        tag_id,
        name,
        AVG(unique_ips) AS avg_unique_ips,
        MAX(unique_ips) AS max_unique_ips,
        STDDEV(unique_ips) AS stddev_unique_ips
      FROM tagActivity
      GROUP BY tag_id, name
      HAVING MAX(unique_ips) > 200
    ),
    tag_anomalies AS (
      SELECT
        ta.day,
        ta.tag_id,
        ta.name,
        ta.unique_ips,
        ts.avg_unique_ips,
        ts.stddev_unique_ips,
        (ta.unique_ips - ts.avg_unique_ips) / NULLIF(ts.stddev_unique_ips, 0) AS z_score
      FROM tagActivity ta
      JOIN tag_stats ts ON ta.tag_id = ts.tag_id AND ta.name = ts.name
      WHERE ta.unique_ips > ts.avg_unique_ips + 2 * ts.stddev_unique_ips
        AND ta.unique_ips > 200
    )
    SELECT tag_id, name, MAX(z_score) AS max_z_score
    FROM tag_anomalies
    GROUP BY tag_id, name
    ORDER BY max_z_score DESC
    LIMIT 5
  )
)
-- Get daily activity for these tags over the last 30 days
SELECT ta.day, ta.tag_id, ta.name, ta.unique_ips
FROM tagActivity ta
JOIN top_anomalous_tags t ON ta.tag_id = t.tag_id
WHERE ta.day >= (SELECT MAX(day) FROM tagActivity) - INTERVAL 30 DAY
ORDER BY ta.tag_id, ta.day;`
}

Let's now find out when the anomalous spikes occurred (finding days when the unique_ips count exceeded 200 for these tags):

{
  `query`: `-- Find days where the unique_ips count exceeded 200 for these tags
SELECT day, tag_id, name, unique_ips
FROM tagActivity
WHERE unique_ips > 200
ORDER BY day DESC, unique_ips DESC;`
}

…</pre> <p>I’ll spare you the entire session and just drop the result (all of it 100% accurate, based on what I already knew):</p><blockquote><p><strong>Most Significant Anomalous Spikes</strong></p><ol><li><strong>AWS Configuration Scanner</strong> (bbf098dc-a1e5-4e28-a5c3-cf7d96765151)<ul><li>Had a major spike on April 9, 2025, with 1,742 unique IPs</li><li>This is an extreme outlier with a z-score of 6.52 (meaning it’s 6.52 standard deviations above normal activity)</li><li>Normal activity for this tag averages around 130 unique IPs per day</li><li>Also had another spike on April 16, 2025, with 280 unique IPs</li></ul></li><li><strong>robots.txt Scanner</strong> (215d03ae-6899-49dd-9167-4c22fe1e832d)<ul><li>Experienced a significant spike on April 7, 2025, with 9,478 unique IPs</li><li>This represents a z-score of 5.75</li><li>This scanner typically has high activity (average ~5,800 IPs daily), but the April 7 spike was still notably anomalous</li></ul></li><li><strong>Spring Boot
Actuator Crawler</strong> (d9f421c7-d877-448f-a822-95a24f2a0ad7)<ul><li>Saw a major spike on April 1, 2025, with 4,810 unique IPs</li><li>Represents a z-score of 5.63</li><li>Normal activity averages around 348 unique IPs per day</li></ul></li></ol><p><strong>Other Recent Anomalous Activity in April</strong></p><ol start="4"><li><strong>Git Config Crawler</strong> (c536e78c-7206-47e9-90f3-6e4302444185)<ul><li>Spike on April 16, 2025, with 1,004 unique IPs (z-score 3.31)</li><li>Normal activity averages around 347 unique IPs</li></ul></li><li><strong>MVPower CCTV DVR RCE CVE-2016-20016 Attempt</strong> (3b9f5a45-63d0-4b08-a4c7-cd514cdc92f4)<ul><li>Spike on April 16, 2025, with 1,917 unique IPs (z-score 3.24)</li><li>Normal activity averages around 515 unique IPs</li></ul></li><li><strong>JAWS Webserver RCE</strong> (55ca1afe-c8a3-4d7b-a741-81075b9f8200)<ul><li>Spike on April 16, 2025, with 1,917 unique IPs (z-score 3.24)</li><li>Normal activity averages around 515 unique IPs</li></ul></li><li><strong>WordPress Backup Migration RCE Attempt</strong> (95ecfd16-5307-47d7-9500-2f64b93adb11)<ul><li>Spike on April 13, 2025, with 388 unique IPs (z-score 3.33)</li><li>Normal activity averages around 51 unique IPs</li></ul></li><li><strong>SPIP Scanner</strong> (487d4a3f-884c-416e-9e69-239b04baa5c8)<ul><li>Spike on April 12, 2025, with 273 unique IPs (z-score 3.02)</li><li>Normal activity averages around 63 unique IPs</li></ul></li><li><strong>Apache Tomcat Crawler</strong> (ebd17317-0de5-431b-9aa6-c91407eaf40f)<ul><li>Spike on April 14, 2025, with 547 unique IPs (z-score 3.12)</li><li>Normal activity averages around 218 unique IPs</li></ul></li></ol><p><strong>Patterns and Observations</strong></p><ul><li>The AWS Configuration Scanner showed the most statistically significant anomaly, with unique IP counts over 13x normal levels</li><li>Several security vulnerability scanners appear in the list, suggesting coordinated scanning activities</li><li>Many of these anomalous 
spikes occurred on specific dates (April 9, April 16, April 13), which could indicate coordinated scanning campaigns</li><li>There’s a notable pattern of targeting web-related vulnerabilities (AWS configurations, Spring Boot, WordPress, Apache Tomcat)</li></ul><p>These spikes represent unusual scanning activity that deviates significantly from normal patterns and could indicate new vulnerability scanning campaigns or increased interest in exploiting specific vulnerabilities.</p></blockquote><p>I went on to ask a more gnarly question: <em>“Which sensor + persona combos experienced degraded performance during the timespan in the records?”</em>, and I put the results <a href="https://paste.hrbrmstr.app/p/zQ" rel="nofollow noopener noreferrer" target="_blank">here</a>.</p><p>You’ll see I had to tell it that the last day’s records were incomplete, so it had to rerun the analysis. I double-checked the results against my Observable Framework dashboard, and, sure enough, it got it right.</p><p>I know lots of folks are reluctant to have these tools touch real data and draw real conclusions. And, that is 100% fair. Just because you’re reducing the content pool for the probabilistic choices does not mean it won’t make 💩 up, but, if you know your domain, these can be tremendously useful tools to augment your regular data analysis.</p> <p><strong>Bad Bots: Whatcha Gonna Do?</strong></p>Photo by Pavel Danilyuk on <a href="https://www.pexels.com/photo/close-up-shot-of-white-robot-toy-8294606/" rel="nofollow noopener noreferrer" target="_blank">Pexels.com</a><p>Imperva is one of the decent names in both the “cloud” and “security” spaces. And, like all vendors (including my $WORK), they released a <a href="https://www.imperva.com/resources/resource-library/reports/2025-bad-bot-report/" rel="nofollow noopener noreferrer" target="_blank">new report</a> right before the annual corporate cyber megacon (RSA).
This year’s edition focuses on “bad bots” and paints a stark picture of an internet where automated bots, empowered by artificial intelligence, now dominate web traffic and pose escalating risks across the globe.</p><p>For the first time in a decade, automated traffic exceeded human traffic, making up 51% of all web traffic in 2024. This surge is largely attributed to the proliferation of AI tools and large language models, which have dramatically lowered the technical barriers for attackers. As a result, malicious bots (those designed to commit fraud, steal data, or disrupt services) accounted for 37% of all internet traffic, up sharply from 32% the previous year. “Good” bots, such as search engine crawlers, now represent just 14% of traffic.</p><p>This rise in bad bot activity is not just a matter of volume but also of sophistication. Attackers increasingly use advanced tactics to mimic human behavior, evade detection, and exploit vulnerabilities in business logic, especially within APIs. In 2024, 44% of advanced bot traffic targeted APIs, compared to just 10% directed at traditional web applications. These attacks are highly effective because APIs are the backbone of digital transformation, powering everything from payments to analytics, and often lack the same level of scrutiny as user-facing web interfaces. Attackers exploit API logic to automate fraud, scrape sensitive data, and bypass security controls, frequently with devastating financial and reputational consequences for organizations.</p><p>Industry analysis reveals shifting patterns in bot targeting. The travel sector overtook retail in 2024 as the most attacked industry, accounting for 27% of all bad bot attacks. Travel and retail both face advanced threats: bots disrupt inventory, manipulate pricing models, and hijack customer accounts.
In the travel industry, “seat spinning” bots hold tickets without purchasing, while scraping bots distort look-to-book ratios, undermining revenue management and competitive pricing. Retailers face ongoing threats from scalping, gift card fraud, and price scraping, with bot attacks now occurring year-round rather than just during peak shopping seasons.</p><p>The report also details the economic and regulatory consequences of unchecked bot activity. Successful attacks can lead to direct financial losses, regulatory penalties under laws like GDPR and CCPA, legal costs, and long-term reputational harm. For instance, a case study highlights how a global talent agency saw 83% of its web traffic generated by bad bots, skewing marketing analytics and draining advertising budgets until advanced bot protection was deployed.</p><p>There’s TONS more info in the report, along with recommendations for mitigating bot threats. It’s a great read, with a pretty modern/bold design (though I could have done without the 🍩).</p> <p><strong>Terrifying Telemetry</strong></p>Photo by dabatepatfotos on <a href="https://www.pexels.com/photo/a-satellite-dish-on-the-roof-of-a-building-7633266/" rel="nofollow noopener noreferrer" target="_blank">Pexels.com</a><p>The <a href="https://sizeof.cat/post/web-browser-telemetry-2025-edition/" rel="nofollow noopener noreferrer" target="_blank">2025 “Web Browser Telemetry” report from sizeof.cat</a> reveals how modern browsers communicate with external servers without our knowledge. The findings suggest that privacy concerns are warranted, as nearly all mainstream browsers engage in background network activities.</p><p>Researchers tested various browsers by analyzing network requests during fresh launches and navigation to blank pages. 
Chrome and Edge proved to be the most communicative, immediately sending data to Google and Microsoft servers for updates, safe browsing features, and browser-wielder metrics collection.</p><p>Firefox, though often recommended for privacy, still connects to Mozilla servers at startup for telemetry, updates, and security checks. Even after disabling telemetry in settings, some network requests continue for security updates and features like Pocket.</p><p>Modified Firefox versions like LibreWolf and Waterfox present a more nuanced situation. These browsers avoid unsolicited connections to Mozilla or analytics servers at startup, supporting their privacy-focused claims. Yet community reports suggest they may still make minimal connections for extension updates or certificate verification, though far fewer than standard Firefox. For folks prioritizing minimal telemetry, these modified versions currently offer the best balance, despite potentially slower security updates due to smaller development teams.</p><p>Chromium-based browsers like Brave and Vivaldi, which emphasize privacy features, also maintain some background connections for their own systems. While they block many external trackers, they still communicate with their respective servers for updates and occasionally for metrics collection, though less aggressively than Chrome or Edge.</p><p>Advanced privacy tools like Pi-hole or DNS-based blocking provide only partial protection, as some browsers bypass system DNS settings through direct IP connections to telemetry endpoints. 
This bypassing is particularly common on Windows and in Chrome-based browsers that increasingly use hardcoded DNS-over-HTTPS or direct IP addresses to sidestep local network controls.</p><p>It’s a pretty bleak and detailed report, but the blow is softened if you switch to a different tab and watch the site cycle through a series of <code>&lt;title&gt;</code> values that make it look like various other online services.</p> <p><strong>FIN</strong></p><p>Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:</p><ul><li>🐘 Mastodon via <code>@dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev</code></li><li>🦋 Bluesky via <code>https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy</code></li></ul><p>☮️</p><p><a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://dailydrop.hrbrmstr.dev/tag/duckdb/" target="_blank">#duckdb</a></p>
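<p>For anyone who wants to poke at the spike rule Claude used in the DuckDB MCP section without wiring up an MCP server, here’s a minimal, stdlib-only Python sketch of the same logic. It keeps tags whose busiest day tops 200 unique IPs, flags days that exceed the mean plus two standard deviations (and 200), and reports the largest z-score per tag. It is an illustration under assumptions, not the queries or scripts from this post: the <code>(day, tag, unique_ips)</code> row shape, the <code>spiky_tags</code> helper, and the <code>tag-a</code>/<code>tag-b</code> sample data are all hypothetical stand-ins for the <code>tagActivity</code> table. It uses the sample standard deviation, which is what DuckDB’s <code>STDDEV</code> computes.</p>

```python
from statistics import mean, stdev


def spiky_tags(rows, floor=200, k=2.0):
    """Sketch of the z-score spike rule from the Claude-generated SQL.

    rows:  iterable of (day, tag, unique_ips) tuples (hypothetical shape)
    floor: a tag only qualifies if some day exceeds this many unique IPs
    k:     a day is anomalous if unique_ips > mean + k * stddev
    """
    # Group daily counts by tag (the SQL's GROUP BY tag_id, name).
    by_tag = {}
    for day, tag, ips in rows:
        by_tag.setdefault(tag, []).append((day, ips))

    results = []
    for tag, days in by_tag.items():
        counts = [ips for _, ips in days]
        # HAVING MAX(unique_ips) > 200; stdev() also needs at least two points.
        if max(counts) <= floor or len(counts) < 2:
            continue
        mu, sd = mean(counts), stdev(counts)  # sample stddev, like DuckDB's STDDEV
        if sd == 0:
            continue  # flat series: no day can exceed mean + k * 0
        anomalous = [
            (day, ips, (ips - mu) / sd)
            for day, ips in days
            if ips > mu + k * sd and ips > floor
        ]
        if anomalous:
            results.append({
                "tag": tag,
                "max_z": round(max(z for _, _, z in anomalous), 2),
                "max_ips": max(ips for _, ips, _ in anomalous),
                "anomalous_days": len(anomalous),
            })
    # Most anomalous first, like ORDER BY max_z_score DESC.
    return sorted(results, key=lambda r: r["max_z"], reverse=True)


# Hypothetical sample data: one tag with a single-day spike, one quiet tag.
sample = [(d, "tag-a", 100) for d in range(1, 10)] + [(10, "tag-a", 1000)]
sample += [(d, "tag-b", 50 + d) for d in range(1, 11)]
print(spiky_tags(sample))
# → [{'tag': 'tag-a', 'max_z': 2.85, 'max_ips': 1000, 'anomalous_days': 1}]
```

<p>One wrinkle worth knowing about this kind of rule: with n days of history and the sample standard deviation, a lone one-day spike can never score above (n−1)/√n, no matter how big it is. That is why the ten-day toy example tops out at about 2.85, and why with only five days (4/√5 ≈ 1.79) a single spike can never clear the 2σ bar at all. Short windows quietly cap how “anomalous” anything can look.</p>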