Automate Web Browsing
Thursday, May 01, 2008
If you're looking for scripting access to client-side JavaScript, or for screen-scraping mechanisms that capture content as rendered in the browser, this will be of interest to you:
I first noticed Ruby about 3 years ago, occasionally stumbling onto Ruby on Rails and finding it sparsely dispersed, but proudly heralded, within the development community. Until recently I pretty much ignored Ruby and stuck with traditional LAMP platforms, relying on PHP for server-side scripting.
Something I've wanted to do for a long time is to automate web browsing tasks. While I've used Perl's Mechanize library, my most pressing desire was to capture client-side JavaScript. My research uncovered two possible solutions. The first is JSSh, a Firefox extension that runs a TCP/IP JavaScript shell server for Mozilla, which I found over at Ideas for Dozens: Telnet to JavaScript.
JSSh accepts telnet connections and exposes Mozilla's JavaScript environment through them. While JavaScript Window objects are available as objects in JSSh, there seem to be limitations: these objects do not appear to inherit everything a browser Window object offers. Basic Math, Array and other objects are present, but what I needed was the Window.setTimeout() method. Maybe I am not fully understanding the functionality of JSSh, but if it has more features, they're not well documented. For certain limited applications, JSSh offers great flexibility by giving any telnet-capable application access to JavaScript, and it is nonetheless very cool.
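To give a feel for what "any telnet-capable application" means in practice, here is a minimal Ruby sketch that sends one line of JavaScript to a JSSh-style shell over a plain TCP socket and returns whatever text comes back. The helper names `read_available` and `jssh_eval` are my own, and the host, port 9997 and the half-second "quiet" timeout are assumptions; adjust them to match your setup. The reply will include whatever prompt text the shell prints, since the sketch makes no attempt to parse it.

```ruby
require 'socket'

# Collect whatever the server has sent, stopping after `quiet` seconds of silence.
def read_available(socket, quiet)
  data = +''
  while IO.select([socket], nil, nil, quiet)
    begin
      data << socket.readpartial(4096)
    rescue EOFError
      break # server closed the connection
    end
  end
  data
end

# Send a single JavaScript expression and return the raw text of the reply.
# host, port and quiet are assumed defaults for this sketch.
def jssh_eval(expression, host: 'localhost', port: 9997, quiet: 0.5)
  TCPSocket.open(host, port) do |socket|
    read_available(socket, quiet)      # discard any welcome banner or prompt
    socket.write("#{expression}\n")
    read_available(socket, quiet).strip
  end
end
```

With the extension listening, something like `puts jssh_eval('1 + 1')` should print the shell's response to that expression, prompt text included.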
My next tangent led me to Watir, a Ruby library that automates IE and works well as a screen scraper. With Ruby and the Watir and BeautifulSoup libraries, I was able to build a fully functional automated screen scraper in a couple of hours (it should have been minutes, had I already been familiar with Ruby).
The class does three things: it opens a specific page, logs in if required, and then monitors the contents of a specific HTML tag. When the content changes, it raises an alarm.
On initialization:
- Open desired web page in a hidden IE window
- Login if redirected to login page
- Hold the contents of a single specific HTML tag in a Class variable
- Wait a specified delay interval
- Refresh the page
- Raise alarm and open a visible IE window if content has changed
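The steps above can be sketched roughly as follows. This is not my original class, just a minimal outline of the same flow, and it deliberately avoids calling Watir directly: `browser` is any object that responds to `goto`, `url`, `refresh` and `text_of`, which are method names I made up for this sketch rather than Watir's API. The class name `PageMonitor`, the `login` callable and the `delay` parameter are likewise assumptions.

```ruby
# Watches the text of one element and reports when it changes.
# `browser` must respond to goto(url), url, refresh and text_of(selector);
# these names are hypothetical, not part of Watir.
class PageMonitor
  attr_reader :last_content

  def initialize(browser, url:, selector:, login: nil, delay: 60)
    @browser  = browser
    @url      = url
    @selector = selector
    @login    = login   # optional callable that fills in and submits the login form
    @delay    = delay   # seconds to wait between refreshes
    @browser.goto(@url)
    log_in_if_redirected
    @last_content = current_content
  end

  # One cycle: wait, refresh, compare. Returns true if the content changed.
  def poll_once
    sleep(@delay)
    @browser.refresh
    log_in_if_redirected
    fresh = current_content
    changed = fresh != @last_content
    @last_content = fresh
    changed
  end

  # Poll until the content changes, then pass the new content to the block (the alarm).
  def watch
    loop { break if poll_once }
    yield @last_content if block_given?
  end

  private

  # If we ended up somewhere other than the target page, assume it is the login page.
  def log_in_if_redirected
    return if @login.nil? || @browser.url == @url
    @login.call(@browser)
    @browser.goto(@url)
  end

  def current_content
    @browser.text_of(@selector)
  end
end
```

A thin adapter around a hidden IE window would supply those four methods, and the block passed to `watch` is where the alarm goes, for example opening a visible window on the page. Keeping the browser behind a small interface also means the polling logic can be exercised with a fake browser, without launching IE at all.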
Labels:
Tech