Problem With Email Scraper Custom Crawler

So my issue is that I'm trying to use wildcards in the custom crawler, if that's possible. Basically I have text like this:

<div class="AAA" data-name="BBB">
    <div class="CCC">

        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>

My issue is I need to get EEE scraped, but I can't seem to figure out how. Is there any way to use multiple markers? I would like to do it like this somehow:

– Start it with class="AAA"
– Then go to <p>
– Then end with <br>

Is there a way to just add a wildcard to take care of all the text in between, from AAA to the <p>?
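For what it's worth, here is a rough sketch of the "multiple markers with a wildcard" idea as a plain regular expression in Python. I don't know whether the custom crawler accepts regex syntax, so treat this as a standalone illustration only. The sample markup is an assumption too: the snippet above doesn't actually contain a <p> tag, so I've made up a variant where the wanted text (EEE) sits between <p> and <br>. The names `SAMPLE`, `PATTERN` and `extract_between` are hypothetical.

```python
import re

# Hypothetical sample: the original snippet, altered so that the wanted
# text "EEE" sits between <p> and <br> as described in the three steps.
SAMPLE = '''<div class="AAA" data-name="BBB">
    <div class="CCC">
        <h4 class="DDD-title">United States, NY</h4>
        <p>EEE<br>more text</p>
'''

# class="AAA"  -> first marker
# .*?          -> non-greedy wildcard: skips everything up to the next marker
# <p>(.*?)<br> -> captures the text between <p> and the first <br>
# re.DOTALL lets "." match newlines, since the markers are on different lines.
PATTERN = re.compile(r'class="AAA".*?<p>(.*?)<br>', re.DOTALL)


def extract_between(html):
    """Return the text between <p> and <br> after class="AAA", or None."""
    match = PATTERN.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(extract_between(SAMPLE))
```

The non-greedy `.*?` is what plays the role of the wildcard here. A greedy `.*` would run to the last <p> on the page instead of the first one after the AAA marker.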

Possible attack vectors for a web site scraper

I’ve written a little utility that, given a web site address, goes and gets some metadata from the site. My ultimate goal here is to use this inside a web site that allows users to enter a site, and then this utility goes and gets some information: title, URL, and description.

I’m looking specifically at certain tags within the HTML, and I’m encoding the return data, so I believe I’ll be safe from XSS attacks. However, I wonder if there are any other attack vectors that this leaves me open to.
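To make the setup concrete, here is a minimal sketch of the kind of utility described above, using only the Python standard library. It is not the actual code: the names `MetaParser` and `extract_metadata` are hypothetical, it assumes the page HTML has already been downloaded as a string, and it assumes "description" means the `<meta name="description">` tag. It reads only the title and description and HTML-escapes both before returning them.

```python
from html import escape
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collects the <title> text and the <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            # Attribute values can be None (e.g. <meta name>), hence the "or".
            if (attr_map.get("name") or "").lower() == "description":
                self.description = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Return the escaped title and description found in html_text."""
    parser = MetaParser()
    parser.feed(html_text)
    return {
        # escape() encodes &, <, > and quotes so the values can be
        # placed into an HTML page without being interpreted as markup.
        "title": escape(parser.title.strip()),
        "description": escape(parser.description.strip()),
    }
```

Note that the escaping covers only the output side, that is, what gets rendered back to users. It says nothing about how the page is fetched in the first place.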

Residential vs. Datacenter proxies. GSA Proxy Scraper.

It would be great if you could add a filter to GSA Proxy Scraper that separates residential from datacenter proxies. It would also be great if we could drill down on location a bit more. I know country is a filter option now, but if you could also offer city / state (for the U.S.), that would be very useful. Are these requests something that could be added?

Google Search Scraper

“Facebook provides a debugger tool for its scraper. Interestingly, Google doesn’t limit the requests made by this debugger (whitelisted?) and hence it can be used to scrape the Google search results without being blocked by the CAPTCHA.
Since Facebook is involved, a Facebook session cookie must be supplied to the library with each request.”

