

The web is broken, IMHO

So there is a (IMHO) shady market out there that pays app developers on iOS, Android, macOS and Windows for including a library that sells their users' network bandwidth. Infatica [1] is just one example; there are many more.

I am 99% sure that these companies cause what are effectively DDoS attacks, which many webmasters have been dealing with for months. This business model should simply not exist. Apple, Microsoft and Google should act.

1/8

[1] infatica.io/sdk-monetization/


In reply to Jan Wildeboer 😷

What these companies then sell to *their* customers is network access through the devices/PCs that have an app with this SDK installed. They are proud to tell you how you can funnel your (AI) web scraping etc through millions of rotating, residential and mobile IP addresses. Exactly the pattern we see hitting our servers.

infatica.io/pricing/

2/8

In reply to Jan Wildeboer 😷

Now, again, this company is just one of many selling similar services. And they all promise that they carefully check what commands their customers send to the (IMHO) infected apps on your phone and PC. Yeah, I am sure they "do no evil". And when they do, they can claim it's not their problem because they are merely the proxy. Again, IMHO, a shady business model.

3/8

In reply to Jan Wildeboer 😷

But this explains the explosion of bot traffic that really cripples a lot of smaller services (like my Forgejo instance, which I had to make non-public).

So if you include such an SDK in your app to make some money — you are part of the problem and I think you should be punished for that. You are delivering malware to your users, making them botnet members.

Unfortunately it is next to impossible for normal users to detect the inclusion of such shady SDKs and the network traffic they cause.

4/8

In reply to Jan Wildeboer 😷

I already blogged about this at jan.wildeboer.net/2025/02/Bloc…

I might rewrite that blog post to make the problem clearer. And to explain why I am now of the opinion that *every* form of web-scraping should be considered abusive. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the "AI" hype for moving you to the bad corner.

TL;DR certain companies recruit app developers to create botnets. Botnets are malware. Period.

The web is broken, IMHO.

5/8

In reply to Jan Wildeboer 😷

From being on the development team for sciop.net, I've done some work on filtering out (malicious) bot activity, and while our log retention policy (zero retention) makes it slightly harder to get the same level of statistics as you have, I've identified behaviors similar to this.
Everything from the "AI" scrapers that endlessly hammer endpoints which no longer exist, as well as ones that do, to bots trying to hit variations of ".git/config" or "admin.php". There's also been some more insidious behavior, which I believe is looking for endpoints susceptible to DDoS attacks (probing for pages with high server response times and without caching). I very much agree: the "AI" scraping, and all that, has fundamentally broken the internet that I knew.
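
A minimal sketch of flagging the probing requests described above (".git/config", "admin.php"), assuming the request paths have already been pulled out of an access log. The pattern list and the `flag_probes` helper are hypothetical illustrations, not sciop.net's actual filtering.

```python
import re

# Hypothetical list of paths that vulnerability-probing bots commonly request.
PROBE_PATTERNS = [
    re.compile(r"/\.git/config$"),
    re.compile(r"/admin\.php$"),
    re.compile(r"/wp-login\.php$"),
]

def flag_probes(paths):
    """Return the request paths that match any known probe pattern."""
    return [p for p in paths if any(rx.search(p) for rx in PROBE_PATTERNS)]

requests = ["/index.html", "/.git/config", "/blog/.git/config", "/admin.php", "/about"]
print(flag_probes(requests))  # ['/.git/config', '/blog/.git/config', '/admin.php']
```

Because the patterns use `search` with an end anchor, nested variations such as "/blog/.git/config" are caught too.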
In reply to Jan Wildeboer 😷

Addendum: Trend Micro did some research on these companies back in 2023 and it confirms my suspicions. And I guess with AI scraping this kind of business is booming. For the paranoid:

„There are malicious actors who repacked freeware and shareware written by other people to conduct drive-by downloads of the Infatica peer-to-business (P2B) service“

trendmicro.com/vinfo/ae/securi…

6/8

In reply to Jan Wildeboer 😷

Addendum 2: If you want to feel really dirty, go to proxyway.com/reviews?e-filter-… for a collection of reviews on these services. It's a huge market and I am 100% convinced that "AI" web scraping is currently the biggest "growth" driver for these companies.

And when I see that quite a few of them rely on injecting SDKs into 3rd party apps to "extend" their "Reach", I would call these "residential proxy providers" malware/botnets. But that's just my personal opinion. I am sure they are all legit.

7/8

In reply to Jan Wildeboer 😷

Oh my ... For me, that's just another good reason to avoid proprietary software as much as possible.

I had to lock down several of my personal services and put them behind VPNs, because that bot traffic simply got too much during the past year.

In reply to Jan Wildeboer 😷

If you've made it to this final post of this thread — thank you for your time and interest! I hope it helps you understand why web crawlers have become a real problem and how this is more and more an attack on the foundation of the Web as it was intended to be. This "residential proxy" business is just one part of this. And we webmasters/admins can only try to block. It is getting more and more difficult to keep up with these waves. Thanks "AI"!

I will convert this thread to a blog post.

8/8

In reply to Jan Wildeboer 😷

It's a big problem. I run some serious infrastructure (6 really beefy servers, hosted in my own rack in a colocation datacenter) and had to put some mitigations in place to keep up with the increasing drain on computing resources:

- Putting services behind VPN gateways
- Moving public repositories from my own Forgejo instance to Codeberg
- Using static site generators like Jekyll instead of a CMS

But I hate that these things are necessary. My personal code repo server had been openly available on the net since the CVS pserver days in the early 2000s, and now I basically have to put it on a private network.

In reply to Jan Wildeboer 😷

what does that traffic look like on the home LAN? I ask because I've seen a huge increase in QUIC traffic from someone's mobile device (like, as much as the TCP coming from that device) and figure it's an unsavory app, but haven't had time to look into it.
In reply to Kevin Neely :donor:

Most of the time it looks like legit traffic, browsing web pages etc. Sometimes you will see SMTP/IMAP traffic, when they try to bruteforce their way into mail servers. When you can log the outgoing DNS requests, you might see a lot of traffic going to/from domain names that are unusual for your network. Hitting the same endpoints over and over again.
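
If you can log outgoing DNS requests, a first pass at spotting "the same endpoints over and over again" is simply counting queries per device and domain. A minimal sketch, assuming dnsmasq-style query log lines (the format Pi-hole also writes); the sample lines, domains and the threshold are made-up values for illustration.

```python
import re
from collections import Counter

# Assumed dnsmasq-style query log line, e.g.:
# "Apr 10 12:00:01 dnsmasq[1234]: query[A] example.com from 192.168.1.23"
QUERY_RE = re.compile(r"query\[\w+\]\s+(\S+)\s+from\s+(\S+)")

def queries_per_client(lines):
    """Count (client, domain) pairs found in DNS query log lines."""
    counts = Counter()
    for line in lines:
        m = QUERY_RE.search(line)
        if m:  # "forwarded"/"reply" lines do not match and are skipped
            domain, client = m.groups()
            counts[(client, domain)] += 1
    return counts

def repeated_lookups(lines, threshold):
    """Return (client, domain) pairs queried at least `threshold` times."""
    return {pair: n for pair, n in queries_per_client(lines).items() if n >= threshold}

sample = [
    "Apr 10 12:00:01 dnsmasq[1234]: query[A] odd-endpoint.example from 192.168.1.23",
    "Apr 10 12:00:02 dnsmasq[1234]: query[AAAA] odd-endpoint.example from 192.168.1.23",
    "Apr 10 12:00:03 dnsmasq[1234]: query[A] odd-endpoint.example from 192.168.1.23",
    "Apr 10 12:00:04 dnsmasq[1234]: query[A] news.example from 192.168.1.40",
]
print(repeated_lookups(sample, threshold=3))  # {('192.168.1.23', 'odd-endpoint.example'): 3}
```

High counts alone prove nothing; they only point at which device and domain to look at more closely.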
In reply to Jan Wildeboer 😷

that makes a lot of sense, and I figured it was something like that. Since QUIC is essentially the transport that accelerates HTTPS (it underlies HTTP/3), I was wondering if they're trying to do something tricky to both hide and speed up the scraping.
In reply to Jan Wildeboer 😷

Done: This thread is now a blog post at jan.wildeboer.net/2025/04/Web-…
In reply to Jan Wildeboer 😷

I see some providers listed there that I used to scrape Google services (which otherwise easily block you), so I think there are actually some good uses here, such as scraping large providers, and it's not just about AI.
In reply to dusoft

I had to make my Forgejo instance non-public, as 95% of the traffic that hit it was scraper bots from everywhere. We are talking about saturation of my 250 Mbit/s upstream link for more than 6 days, making the site completely inaccessible for "normal" visitors. You might think your scraper is "nice", but my logs, unfortunately, tell a clear story.
In reply to Jan Wildeboer 😷

I am saying there are other use cases than the only one identified by you.
In reply to dusoft

@dusoft Sure. But I will block your scraper bot just like all the other ones. The AI and ad mafia has destroyed any good reputation that scrapers might have had. Don't blame me, blame them. That's all.
In reply to Jan Wildeboer 😷

not sure it is listed on it but a tool such as @exodus may be able to detect if the Infatica SDK is embedded in an app binary.

Edit: didn't find this specific sdk on reports.exodus-privacy.eu.org/… but I guess detection rules could be added to their detection engine
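
Signature-based detection of embedded SDKs roughly works by matching an app's class names against known package patterns (Exodus Privacy's tracker definitions include such code signatures). A minimal sketch, assuming the class names have already been extracted from the app's DEX files by some other tool; the signature below is a hypothetical placeholder, since the real Infatica package name is not known here.

```python
import re

# Hypothetical SDK rules: rule name -> regex over fully qualified class names.
# "com.example.proxysdk" is a placeholder, NOT the real Infatica package name.
SIGNATURES = {
    "Hypothetical proxy SDK": re.compile(r"^com\.example\.proxysdk\."),
}

def detect_sdks(class_names):
    """Return the names of all rules that match at least one class name."""
    return sorted(
        name
        for name, rx in SIGNATURES.items()
        if any(rx.search(cls) for cls in class_names)
    )

classes = [
    "com.example.myapp.MainActivity",
    "com.example.proxysdk.BandwidthService",
]
print(detect_sdks(classes))  # ['Hypothetical proxy SDK']
```

This only catches SDKs whose package names are known and not obfuscated, which is one reason a shady SDK can stay hidden.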

In reply to Jan Wildeboer 😷

So is this a bot net that has been legitimized by people failing to read a EULA or TOS?
In reply to Lyrial

That's one way to see it, though not one I would subscribe to. Yes, these companies create and sell access to botnets. But no, it's not the users' fault because they didn't read the T&Cs.
In reply to Lyrial

@lyrial They approached me earlier this year (and have since been ghosted and blocked for good), and yes, they tell potential future collaborators to just add their SDK and some lines to the TOS and that's all, nothing to worry about, all fine.

Fscking disgusting.

In reply to Gina Häußge

So how is this not a bigger story‽ I'd guess that companies tend to not want to risk easy revenue and would want to keep it quiet, but this seems like something that could affect the infosec of everyone everywhere.
In reply to Lyrial

@lyrial I am also wondering about that, TBH (To Be Honest). I do hope that some investigative journalist (team) picks up on this and publishes a story that will get noticed. That's why I post about this. Not just to inform my dear followers and friends, but also to raise awareness in general. @foosel
In reply to Jan Wildeboer 😷

@lyrial yeah... I thought long and hard about trying to get some attention on this myself, or rather, how, after I got contacted by this company. At some point I considered reaching out to the BSI too. But the past months have simply been too much already on their own.
In reply to Jan Wildeboer 😷

welp. I was wondering how the hell the LLM companies were using residential IPs -- this is clearly how. Ugh.
In reply to Jan Wildeboer 😷

Well, it hasn't been included in the Pi-hole/ABP lists my Pi-hole is subscribed to.
Do you know whether there’s a list of domains for this service, which could be domain blocked?
In reply to AliveDevil

While you might be able to block the intake points of their "service", it wouldn't really reduce the traffic caused by the infected apps. That's kind of their whole selling point. That they are stealthy and happily accept when a residential IP gets blocked because they have enough in their pool. That the affected user has lost his/her/their access and doesn't know why that happened — not their problem.
In reply to Jan Wildeboer 😷

@AliveDevil I would like Pi-hole to help identify which apps show this behaviour. Ideally some kind of plugin that says: hey, check your device xy, it's misbehaving.
In reply to Jan Wildeboer 😷

feels like mobile apps should have to declare which servers they will access, as part of app permissions. Only very rarely (e.g. browsers) would one then grant a "contact any server" permission.
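
The proposal above boils down to checking each outgoing connection against a list of servers the app declared up front. A minimal sketch of that check under a hypothetical permission model: the `DECLARED_HOSTS` manifest, the app's hostnames and the "*" wildcard convention are all made up for illustration.

```python
# Hypothetical per-app manifest of declared servers.
# "*.domain" allows subdomains; a bare "*" is the rare "contact any server" grant.
DECLARED_HOSTS = {"api.example-weather.app", "*.cdn.example-weather.app"}

def host_allowed(host, declared):
    """Check a destination host against an app's declared server list."""
    host = host.lower().rstrip(".")  # normalise case and a trailing root dot
    for entry in declared:
        if entry == "*":
            return True
        if entry.startswith("*."):
            if host.endswith(entry[1:]):  # entry[1:] is ".cdn.example-weather.app"
                return True
        elif host == entry:
            return True
    return False

print(host_allowed("api.example-weather.app", DECLARED_HOSTS))      # True
print(host_allowed("img.cdn.example-weather.app", DECLARED_HOSTS))  # True
print(host_allowed("victim-forge.example.org", DECLARED_HOSTS))     # False
```

A proxy SDK needs to reach arbitrary sites, so under such a model it would have to request the broad grant, which would make it visible to users and store reviewers.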
In reply to Steve Purcell

@sanityinc I'd prefer if Apple, Google etc declare these SDKs to be malware and kick all apps that include such "proxy" SDKs out of their respective stores.
In reply to Jan Wildeboer 😷

On that page they mention that they work with #bitdefender. What a disaster that security vendors are just helping this abuse instead of combatting it.
In reply to Frehi

@frehi I chose to use them as an example because they are very open and transparent about what they do. A lot of their "competitors" are happily hiding this proxy functionality in SDKs that claim to do more legit stuff like ad delivery. So in a lot of cases not even the app developer using such an SDK knows that they are causing their users to "donate" network bandwidth and cause possibly abusive behaviour, leading to their IP being blocked.
In reply to Jan Wildeboer 😷

Thank you, this explains a lot of what I have been seeing. Is there any way to identify which apps are using these SDKs?
In reply to Jan Wildeboer 😷

Here’s my question: how do we—the users—figure out which apps are doing this shi…stuff so we can get rid of the apps?
In reply to Jan Wildeboer 😷

Not just apps?
I'd suspect black boxes, e.g. IoT things and TVs, as prime culprits for loading such bandwidth-abusing SDKs. And thanks for being there, filling us in and letting us know we've all been abused via monetization. Again.
In reply to Jan Wildeboer 😷

Not them having a CAPTCHA on the page. What's the matter? Don't like people hitting your site from a wide range of IPs?
In reply to Jan Wildeboer 😷

Yes. This killed my little hobby website with a basic plan (a cycling fantasy league). Traffic went through the roof in the off-season when there shouldn't be any, and I couldn't afford to invest the time to figure out how to stop them.
In reply to Jan Wildeboer 😷

Of freaking course this is a thing... Why use malware to create a botnet if you can just sell it as a service... :/
In reply to Hazelnoot

Also, what addresses is this library phoning home to? Would love to add them to my Pi-hole.
In reply to Abi

@letsbekind2 @hazelnoot I honestly don’t know. And if they are smart enough, they will rotate these endpoints all the time too.
In reply to Jan Wildeboer 😷

This is why many folks are deploying anubis.techaro.lol -- kernel.org, FFmpeg, FreeBSD, SourceHut, heck, /me, and even UNESCO deployed[1] it to lighten that load and get rid of all the AI scrapers, but also these botnets using residential IPs.

Yes, constructs like "VPNs" (that share the endpoint of the user who wanted protection) and rackets like these libraries: criminals they are, but that is the advertising & AI world...

[1] = anubis.techaro.lol/docs/user/k…
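
The idea behind proof-of-work gates like Anubis is that every visitor's browser must burn some CPU before getting the page, while the server checks the answer with a single hash. Rotating through millions of residential IPs does not avoid that per-request cost. A generic sketch of such a SHA-256 challenge, NOT Anubis's actual code, protocol or challenge format; the challenge string and difficulty are made-up values.

```python
import hashlib
from itertools import count

def solve(challenge: str, difficulty: int) -> int:
    """Client side: find a nonce so SHA-256(challenge + nonce) starts with
    `difficulty` zero hex digits (about 16**difficulty attempts on average)."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a submitted nonce costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

challenge = "per-visitor-random-challenge"  # hypothetical; a real server issues a fresh one
nonce = solve(challenge, difficulty=4)
print(nonce, verify(challenge, nonce, difficulty=4))
```

The asymmetry is the design choice: raising `difficulty` by one multiplies the client's expected work by 16 while the server's check stays one hash.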

In reply to Jan Wildeboer 😷

Do these scrapers send non-GET requests, by any chance?

I wonder if we should redefine GET as being only for requests that are not only idempotent but also inexpensive.