Hello! I have a server that runs 24/7, and I've recently started doing some work on it that requires scraping the web. The websites detect that the server's IP isn't residential, though, and it's causing issues.

I’d like to host a proxy server on the small server I have running 24/7 in my house, so that everything for that one page can be proxied through it. Does anyone have any idea how I’d set up a server like that? Thanks.

  • Max-P@lemmy.max-p.me
    1 year ago

    You can pretty easily install Squid; it’s fairly simple to configure and works well for most use cases. It’s just a plain, simple HTTP proxy.
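
For what it's worth, here is a rough sketch of how a Python scraper could send its requests through a proxy like that using only the standard library. The proxy address is a placeholder I made up (port 3128 is Squid's default listening port), and `make_proxied_opener` is a hypothetical helper name, not anything from Squid itself.

```python
import urllib.request

# Hypothetical address: replace with wherever your Squid instance is reachable.
# 3128 is Squid's default port; the host below is a documentation-only placeholder.
PROXY_URL = "http://192.0.2.10:3128"


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (needs a reachable proxy, so it is left commented out here):
# opener = make_proxied_opener(PROXY_URL)
# with opener.open("https://example.com/", timeout=10) as resp:
#     print(resp.status)
```

Only requests made through the returned opener go via the proxy, so the rest of the program keeps using the server's normal connection.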

    You could also set up a VPN to your home to achieve something similar, by binding some requests to the VPN IP. It’s a bit harder to set up, however, as it involves routing tables, route metrics, and conditionally binding the outgoing connection to a specific interface.
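
The "binding" part could look roughly like this in Python, assuming the scraper is written in it. This is only a sketch: `VPN_LOCAL_IP` is a made-up address for the VPN interface, and `fetch_via` is a hypothetical helper. On its own this only picks the source address of the connection; the routing-table work mentioned above is still needed so that packets from that address actually leave through the VPN.

```python
import http.client

# Hypothetical address assigned to this machine's VPN interface.
VPN_LOCAL_IP = "10.8.0.2"


def fetch_via(local_ip: str, host: str, path: str = "/", port: int = 80) -> tuple[int, bytes]:
    """GET http://host:port/path with the outgoing socket bound to local_ip."""
    conn = http.client.HTTPConnection(
        host,
        port,
        timeout=10,
        # Port 0 lets the OS pick any free source port on that address.
        source_address=(local_ip, 0),
    )
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


# Usage (commented out because it needs the VPN interface to exist):
# status, body = fetch_via(VPN_LOCAL_IP, "example.com")
```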

    • neoney@lemmy.neoney.dev (OP)
      1 year ago

      Thanks, sounds like Squid will be perfect. I’ll just need to figure out some way to connect to it. I wish I could just open a port, but that hasn’t been working since I enabled IPv6 on my router. Do you think I could make it accessible through Cloudflare Tunnels?

      • Max-P@lemmy.max-p.me
        1 year ago

        Cloudflare Tunnels won’t work, as Cloudflare won’t tunnel HTTP proxy traffic, at least as far as I know.

        What you can do, however, is have your home server VPN into your remote server; your remote server will then have no problem connecting to Squid over the VPN link. WireGuard is very simple to configure like that, probably 5-10 lines of config on each end. You don’t need any routing or forwarding, just a plain VPN with 2 peers that can ping each other, so there’s no need for ip_forward, iptables -j MASQUERADE, or anything else that most guides would include. You can also use something like Tailscale, or anything else that will let the two machines talk to each other.

        Depending on the performance and reliability needs, you could even just forward a port with SSH. Connect to your remote server from the home server with something like ssh -N -R localhost:8088:localhost:8080 $remoteServer and port 8088 on the remote will forward to port 8080 on the home server as long as that SSH connection is up. -N simply makes SSH not open a shell on the remote, dedicating the SSH session to the forwarding. Nice and easy, especially for prototyping.
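
Since the forward only lasts as long as that SSH connection is up, a small wrapper that restarts it can help. Here is a minimal sketch, assuming Python is available on the home server. The function names are hypothetical, and the two `-o` options (`ExitOnForwardFailure`, `ServerAliveInterval`) are my additions on top of the command above, so that a dead or half-open tunnel makes `ssh` exit instead of hanging.

```python
import subprocess
import time


def reverse_tunnel_cmd(remote: str, remote_port: int = 8088, local_port: int = 8080) -> list[str]:
    """Build the ssh command that forwards remote_port on the remote to local_port here."""
    return [
        "ssh",
        "-N",  # no remote command or shell, forwarding only
        "-o", "ExitOnForwardFailure=yes",  # addition: exit if the port can't be bound
        "-o", "ServerAliveInterval=30",  # addition: notice a dropped connection
        "-R", f"localhost:{remote_port}:localhost:{local_port}",
        remote,
    ]


def keep_tunnel_up(remote: str, retry_delay: float = 5.0) -> None:
    """Run the tunnel in a loop, reconnecting whenever ssh exits."""
    while True:
        subprocess.run(reverse_tunnel_cmd(remote))
        time.sleep(retry_delay)


# Usage (runs until interrupted; "user@remote.example" is a placeholder):
# keep_tunnel_up("user@remote.example")
```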

        • neoney@lemmy.neoney.dev (OP)
          1 year ago

          That seems overcomplicated for me, honestly, but it just occurred to me that I can actually host the scraper on the home server, since the scraper itself only scrapes simple data and the downloads are handled by a separate program.

          • neoney@lemmy.neoney.dev (OP)
            1 year ago

            The downloader talks to the scraper through HTTP, which I can publish through CF Tunnels, so it’s perfect.
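
A setup like that could be sketched as follows, assuming the scraper is a small Python service. Everything here is hypothetical, since the thread doesn't describe the real scraper: `scrape` is a stand-in for the actual logic, and the path-as-target convention and port 8080 are placeholders. The tunnel would then simply point at this local port.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def scrape(target: str) -> dict:
    """Placeholder for the real scraping logic, which is not shown in the thread."""
    return {"target": target, "links": []}


class ScraperHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Treat the request path (minus the leading slash) as the thing to scrape.
        body = json.dumps(scrape(self.path.lstrip("/"))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    """Create the HTTP server the downloader (or a tunnel) would connect to."""
    return HTTPServer((host, port), ScraperHandler)


# Usage (blocks until interrupted):
# make_server().serve_forever()
```

Listening on 127.0.0.1 only keeps the service off the LAN, so the tunnel is the only way in.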