  1. #1
    Join Date
    Dec 2014
    Posts
    204

    ESB Scraping Ability Options

    The mention of Ubot in another thread reminded me to ask this about the scraping options with ESB builds...

    1. Can we create builds with the ability to use proxies?

    2. Can we create builds that can use multiple connections?

    I noticed commercial scrapers like ScrapeBox, among others, let you set proxies and choose the number of connections.

    Just wondering if that is possible now or in the future with ESB builds. I'm not sure we'll be creating anything like that. It's just something we've been looking into along with other things.

    Thanks!

    Mel

  2. #2
    Join Date
    Dec 2008
    Posts
    3,201

    The web automation uses Chrome. I believe Ubot does something similar.

    You would need to use the Set Proxy action before you start running any browser actions, so it should go at the beginning of the script. You would need to build a form so people can enter different proxies, and then in the script load a proxy and set it on the browser.
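
    As a rough illustration of that flow outside of ESB, here is a minimal Python sketch: take the proxies someone typed into a form, pick one, and build the launch argument for Chrome before any browser work starts. The function names and the placeholder addresses are made up for the example, and this is not how ESB's Set Proxy action is implemented. The only real piece is Chrome's `--proxy-server` command-line switch.

```python
import random


def pick_proxy(raw_text: str) -> str:
    """Pick one proxy from user-entered text, one host:port per line."""
    proxies = [line.strip() for line in raw_text.splitlines() if line.strip()]
    if not proxies:
        raise ValueError("no proxies were entered")
    return random.choice(proxies)


def chrome_proxy_args(proxy: str) -> list[str]:
    """Build the Chrome launch argument that routes traffic through the proxy."""
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {proxy!r}")
    # --proxy-server is a Chrome command-line switch, so it has to be
    # supplied when the browser is launched, before any page is loaded.
    return [f"--proxy-server={proxy}"]


# Hypothetical form input; the addresses are documentation placeholders.
form_input = "203.0.113.10:8080\n203.0.113.11:3128\n"
args = chrome_proxy_args(pick_proxy(form_input))
print(args)
```

    The point of the sketch is the ordering: the proxy is chosen and applied first, and only then would the browser be started with those arguments.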

    I'm not sure what you mean by connections. Scripts are run simultaneously, depending on how much processing power you have.

    You can see how it works here.

    http://profittigersystems.com/vbforu...n-the-Database


    I purposely didn't build ESB to be a commercial scraper since that stuff is so abused. There are legitimate uses for scraping, but when you start talking about connections, that leads me to believe people will be blasting a website. I don't want people who create hundreds of Gmail accounts so they can spam the internet. That was why I decided against adding captcha breaking. It just leads to people abusing other websites.

    That type of behavior leads to bad karma.

    I wouldn't want to support a product like that. Websites are already trying to stop automation. I think I even remember Ubot was having issues with one of the browsers they were using. It was getting blocked by websites because that browser was mainly used for bots.



    Thomas

  3. #3
    Join Date
    Dec 2008
    Posts
    3,201

    Getting into a tirade over internet abuse made me forget something really important. haha

    Your customers won't be able to create automation scripts. You create them and they run them on their computer. That means they won't be able to scrape whatever website they want. You will have to set it up for them.

    Hopefully this makes sense.


    Thomas

  4. #4
    Join Date
    Dec 2014
    Posts
    204

    That sounds like a good setup. You're right about mass scraping. Websites are getting more aggressive about stopping it, too. With our new hosting we have Sucuri set up to block a lot of bot activity ourselves.

    Thanks!

    Mel

  5. #5
    Join Date
    Dec 2008
    Posts
    3,201

    Mel, if you have a more specific idea that you want to keep secret, you're welcome to email or PM me the specifics and I can try to help you better. Sorry for my rant. I got triggered when I saw ScrapeBox.


    Thomas

  6. #6
    Join Date
    Dec 2014
    Posts
    204

    I was just wondering about it generally. We did some research on the scraping software out there and saw that a lot of these scrapers use proxies, and some of them (though not all) support multiple connections. I didn't know how those would work with ESB, but you explained it well.

    Thanks again,

    Mel

  7. #7
    Join Date
    Dec 2008
    Posts
    3,201

    There is a multiple run field on the scripts. So when a script gets scheduled to run, it will schedule as many as you have in that field. Set it to 10 and it will schedule that script to run 10 times. The scripts do run simultaneously depending on the computing power. It may run 10 at the same time or 8 and queue the other 2. I don't see it ever running a script 100 times simultaneously.
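
    To picture the "run 8 and queue the other 2" behavior, here is a small Python sketch using a thread pool. It is only an analogy under assumed names (`run_script_many`, `fake_script`) and an assumed limit of 8 workers. It says nothing about how the framework behind ESB actually schedules work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_script_many(script, run_count: int, max_workers: int):
    """Schedule `script` run_count times; at most max_workers run at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(run_number: int):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return script(run_number)
        finally:
            with lock:
                running -= 1

    # Runs beyond max_workers wait in the executor's internal queue
    # until one of the busy workers finishes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, range(1, run_count + 1)))
    return results, peak


def fake_script(run_number: int) -> str:
    time.sleep(0.05)  # stand-in for the browser work a real script would do
    return f"run {run_number} done"


results, peak = run_script_many(fake_script, run_count=10, max_workers=8)
print(len(results), "runs finished; peak simultaneous runs:", peak)
```

    All 10 runs complete, but the peak never goes above the worker limit, which is the same idea as setting the field to 10 and seeing some runs wait their turn.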

    The browser stuff will be slower compared to ScrapeBox because it is opening up a hidden Chrome browser.

    The advantage of using Chrome is being able to simulate a real person clicking and typing in text. Sometimes a website will expect interactions before it shows all the data, like showing more results when the person scrolls down. I never used ScrapeBox but would guess that it just pulls down all the HTML and parses it looking for the data. That is faster, but the HTML can change based on interaction with the webpage.
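
    Here is a small Python sketch of the "pull down the HTML and parse it" approach, to show what it can miss. The page source and the `ResultCollector` class are invented for the example, and it is a guess at the general technique, not a description of any particular scraper.

```python
from html.parser import HTMLParser


class ResultCollector(HTMLParser):
    """Collect the text of every <li class="result"> element."""

    def __init__(self):
        super().__init__()
        self.in_result = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "result") in attrs:
            self.in_result = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_result = False

    def handle_data(self, data):
        if self.in_result and data.strip():
            self.results.append(data.strip())


# Hypothetical page source as first delivered by the server.
initial_html = """
<ul id="results">
  <li class="result">Item 1</li>
  <li class="result">Item 2</li>
</ul>
<script>/* more items are appended when the visitor scrolls */</script>
"""

collector = ResultCollector()
collector.feed(initial_html)
print(collector.results)
```

    The parser only sees the two items in the initial HTML. Anything the page would add after scrolling never shows up, because nothing ever runs the page's script or scrolls.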

    The disadvantage is that it is slower, since you are opening up a browser and waiting for a page to load.

    Throughout the development of ESB I always chose consistency over speed. My main concern is that these scripts run successfully, and if I needed to give up some speed to get that, then so be it. I'd also point out that ESB is really two programs: the main program and the automation server, which runs most scripts. The person can still work while all of these scripts run in the background, so they are not typically waiting for a response from a script. If you are waiting for a script to bring back data, you will be more concerned with speed. Our approach was to let the user keep working, so speed, while still important, isn't a necessity. It was more important that the script ran successfully.


    Thomas

  8. #8
    Join Date
    Dec 2014
    Posts
    204

    Hey, that's interesting that it will run simultaneous scripts depending on the computing power and queue the remaining scripts. So you have it set up somehow to detect if the computer meets certain criteria to handle x-amount of scripts simultaneously like that, or does it do it some other way? An interesting feature for sure.

    Mel

  9. #9
    Join Date
    Dec 2008
    Posts
    3,201

    It is part of the framework I use. It depends on the number of threads available (connections is probably the same thing). If you have other things running, fewer threads are available. A higher-end computer has more threads available; a lower-end one has fewer.
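
    For a feel of how a limit like that could be derived, here is a tiny Python sketch that bases it on the number of CPU cores and subtracts whatever is already busy. The function name, the `per_core` factor, and the formula are all assumptions for illustration. The framework's real rule isn't described here and may be quite different.

```python
import os


def pick_worker_count(busy_workers: int = 0, per_core: int = 1) -> int:
    """Estimate how many scripts to run at once on this machine."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    available = cores * per_core - busy_workers
    return max(1, available)  # always allow at least one run


print("simultaneous runs on this machine:", pick_worker_count())
```

    On a machine with more cores the number goes up, and when other work is already occupying workers it goes down, never dropping below one.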


    Thomas
