
Pseudo Test Driven Development with Selenium and Page Object Model for developers and QA

2 Aug

A thought came to me today and I did a quick search online. There don’t appear to be many articles about test driven development with respect to Selenium. And what info there is is either too brief (e.g. presentation slides) or just high level and developer process oriented. Both those approaches just seemed common sense and didn’t cover the heart of the matter. So I’ll write about it from my perspective, giving some details from both the developer and QA/tester sides on how to use Selenium effectively for both parties, and not reiterate (as much as possible) what’s already covered in other online articles on this subject (at the time of this post). What I cover should be a useful reference to both developers and QA/testers alike, and perhaps even other groups. Pardon the long descriptive post to get at the heart of the content, I’ve got to clearly describe it all… Hear me out til the end before you voice any disagreements or consider this post a crap rant.

First, to the reader, I will just assume you generally know what Test Driven Development (TDD), unit testing, (user) acceptance testing, Acceptance Testing Driven Development (ATDD), Selenium, and Page Object Model (POM) are. If not, read up on those topics beforehand.

Let’s start from TDD and unit testing. TDD is usually associated with unit testing and then may expand from there. Unit tests cover testing input & output (I/O) of individual software components/modules that perform (ideally) a simple function. The next tier up is integration tests that cover interactions between the units (i.e. components/modules) and the respective overall I/O between the units as a system. When the integration test covers all the components that form the system, then it may be considered a system integration test. The final tier is the user interface level where we create UI automation tests that are often also (or then extended to create) acceptance tests.

So with all that in mind, how does Selenium (and similar UI automation tools) fit in with test driven development? Now this question will only apply to the front end codebase as Selenium can only test for that. In terms of back end code, we do nothing in terms of Selenium and have to wait until back end dependencies are addressed/fulfilled.

For the front end, we should treat building the UI itself as a collection of units, whether they are visible in the UI or not. For the javascript portions (sometimes not visible), you can cover that with javascript unit testing tools. You can also cover it to some extent with Selenium/WebDriver by using its JavascriptExecutor feature to execute javascript on the page. That feature can also be useful for integration/system testing of javascript when all the javascript + CSS + HTML code is combined to render the page UI.

In terms of the visible portion of the UI units – for simplicity, let’s consider them as widgets, headers, footers, components that fit together to form the page. As such, you should test with Selenium the same way you do unit testing. For each UI unit, you have Selenium code that will test the functionality of that unit in terms of I/O. Input would be clicks, mouse overs, typing text, etc. Output would be things like what element states change, what elements get hidden/unhidden, what events get triggered, do you go to another page (from link/button/menu, form submission, etc.).

Ok, but you need to use Selenium with a test framework or build one out so you don’t duplicate code over and over again for such unit-like tests for Selenium. True. Then you might ask, isn’t this overkill and we should use Selenium at the higher level and when UI is ready? We’ll cover that a bit later. In terms of framework, yes we need one. I leave it to the reader to figure out what framework is needed, since each organization’s needs are different.

With Selenium base unit testing idea covered, integration testing with Selenium is similar, just expanding to interactions between visual units and the I/O resulting from those interactions. And system integration testing = traditional UI automation followed by automated acceptance testing.

Now, with all that kind of covered, how should you then implement those different levels of testing with Selenium? Well, I see 2 approaches:

* You be crazy and actually implement unit test like coverage of your UI components using some Selenium test framework. For QA, think of these as really small & short Selenium tests that take less than a minute to run. But we can end up with many Selenium tests in the end (UI unit tests + UI integration tests + normal QA Selenium tests). This approach is fine if you don’t mind a large test suite to run and have the bandwidth to write and maintain the tests.

* You don’t write Selenium tests at the unit and integration test level, but you develop your Selenium framework (especially when using page object model) with the unit test and integration test level mentality so that you can be able to write better Selenium tests and do more test coverage efficiently. This is the approach I advocate.

Now, for my advocated approach you might say, can you clarify? Let’s begin by referring to the page object model. A page object represents all the functionality on a page and all its visual elements. So all the logical things you can do on the page, you model with methods/functions, and all the UI elements are translated as element locators for Selenium (property members of the page object class).
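As a rough illustration of that description, here is a minimal page object sketch in javascript. The page (a login form), the class name LoginPage, the locator values, and the method names are all hypothetical. The driver is assumed to expose findElement() taking a locator object and returning an element with sendKeys() and click(), in the style of the selenium-webdriver javascript bindings.

```javascript
// Hypothetical page object for an imaginary login page.
class LoginPage {
  constructor(driver) {
    this.driver = driver;
    // Element locators live on the page object as property members.
    this.locators = {
      username: { css: '#username' },
      password: { css: '#password' },
      submit: { css: 'button[type="submit"]' },
    };
  }

  // A logical action a user can take on the page, modeled as a method.
  async loginAs(username, password) {
    await this.type(this.locators.username, username);
    await this.type(this.locators.password, password);
    const button = await this.driver.findElement(this.locators.submit);
    await button.click();
  }

  // Small shared helper: find a field by locator and type into it.
  async type(locator, text) {
    const field = await this.driver.findElement(locator);
    await field.sendKeys(text);
  }
}
```

A test then only calls something like new LoginPage(driver).loginAs("user", "pass") and never touches the locators directly.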

Ok, so you may or may not already know that so then what? Page object model is great, problem is not everyone uses it to maximum efficiency. For those of you who do (or near it) congratulations. For those who don’t, you got work to do. What do I mean by maximum efficiency? I’ve witnessed that some people and organizations tend to have a golden path, happy path, basic sanity test coverage mentality when building out test automation. So they build into the Selenium framework only the methods and locators they need at the given time they are writing tests to cover some portion of the page only. Meaning a lot of the functionality of the page is left out and filled in over time as more tests are written to cover them, if that ever happens. This choice makes sense for resourcing constraints, and not writing unused test/framework code but is a terrible design choice, at least in my humble opinion.

The proper TDD way to do it is you build out the page object as you are building out the UI itself. In an ideal paired programming or agile environment where team members collaborate and you don’t have “waterfall” development process, the (front end) developer builds the UI and simultaneously, a QA member will build out the matching page objects for the UI. Alternatively, the developer could also be the one building both the UI and the Selenium page objects. I don’t advocate building the page objects after the UI is ready, you just can’t keep up that way, unless of course you follow the route I dislike where you only implement portions of the page to cover only what you need at the time.

But you can’t build page objects without a working UI or can you?! Wrong if you thought no. Before you get a working UI, you usually have mockups and wireframes and other specifications that you can go by. The good specifications also cover the interactions of the page/site flow and interactions between elements. Using that info, you can define the page object’s model: what methods should it offer, what element locators it should have, how the locators might be utilized by the methods. Filling in the actual logic to make the methods work and what the locator values actually are can be done as the UI gets implemented, piece by piece. Without really thinking about it, in this way, you are essentially building a custom (page object Selenium) API for consumption by Selenium tests (as the clients of the API). Another way to look at it is you are also building a UI consuming client that consumes the UI (acting as a server in a client-server application model).
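To make that concrete, a page object skeleton can be written from a wireframe alone, with methods that throw until the UI exists. This is a minimal sketch assuming a hypothetical search page; every name and locator slot here is a placeholder, not something from a real application.

```javascript
// Helper marking a method as designed but not yet implemented.
function notImplemented(name) {
  throw new Error(name + ' is not implemented yet');
}

// Hypothetical skeleton written from a mockup, before the UI exists.
class SearchPage {
  constructor(driver) {
    this.driver = driver;
    // Locator values are unknown until the UI is built; fill them in later.
    this.locators = { queryBox: null, searchButton: null, resultRows: null };
  }

  // Parameterized from the start so it covers any query, not one fixed case.
  searchFor(query) { notImplemented('searchFor'); }

  // The sort order is an argument rather than one method per ordering.
  sortResultsBy(column, direction) { notImplemented('sortResultsBy'); }

  getResultCount() { notImplemented('getResultCount'); }
}
```

The method signatures are the API that tests will consume; the bodies and locator values get filled in piece by piece as the UI is implemented.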

With my analogies in mind, things get more interesting. Unless you have a really smart QA/tester (who is well versed in those analogy types) and/or one who is developer-oriented, you may have problems with the QA/tester building out a maximally efficient Selenium framework (or the page objects) that mimics what you can consider an API or a client of something. The reasoning is that they lack the technical experience or mindset to model the page objects properly – you can certainly get by with the deficient model, it just won’t provide as optimal a test coverage as you can actually get. In a sense, working on the Selenium framework and page objects is really meant for QA people in the roles of QA architects and Software Development Engineers in Test (SDET), while writing the tests that utilize the page objects can be delegated to the other QA roles of QA automation engineer (a bit vague a role/term to me), QA tester, software QA engineer, etc. The roles of QA architect and SDET of course are for QAs who have a developer type background, which surprisingly most QAs tend to lack – some may have developer skills and even engineering or Computer Science degrees but lack the depth of knowledge desired compared to a developer. In essence, by developer type background we mean a QA who is capable of being a developer and doing application development (not just working with Selenium and test frameworks) but chooses to focus on the testing aspect rather than the core application development aspect, hence the SDET name is quite literally correct.

Now, if the organization lacks SDETs and QA architects and doesn’t have the luck (or resources) to attract such people, you can make up for that loss (where possible) by shifting your developers to take on some (but not all) of the testing work. With good object oriented programming and software development experience, developers can build out the page objects that the less experienced QAs can consume by creating actual automated tests cases. Because once you have a fairly readable and easy to interpret page object model (i.e. designed well), less technical people can easily write scripts, etc. that call the page object methods and pass in the needed test data, something which quite some postings online talk about. When calling on your developers to help out in this area, I would say that those who are software architects, design APIs (or maybe consume them too), develop client-server software, or are good with designing classes would be best at designing your Selenium page objects.

On a related note, implementing page objects is best done by those with a development background, but defining the page object method signatures (the API or input/output) can be done by other members like User Interface/Experience folks, the business analyst, or QA, basically someone with good modeling skills. A bad developer can easily define a bad page object model.

Before we continue, let’s briefly come back to the thought of why QAs with less technical experience create deficient page objects.

Reason 1 – they may lack the architect/developer mindset to properly define method behavior and parameterize methods. Meaning methods use default static test data and/or cover a specific case only and aren’t generalized to handle different cases. They may add more methods for other cases and at some point figure out they need to refactor and simplify for code reuse. Bear in mind that all the presently possible cases to cover are already defined by the UI (and hence the matching page object model), so if one knows all the possible cases, when writing a method for at least one of the cases, who in their right mind would not build the method in a general way to handle all cases and parameterize it? Given time constraints, no one said you could not leave the other cases unimplemented (do nothing actions or throw an exception about not being implemented yet), only handling what is most important at first. This way you leave a well designed framework with minimal refactoring later on, and it is easier to fill in the missing code later on too.

Reason 2 – lack of element locator definition skills. To achieve good page object modelling, especially for complex javascript library based web applications, and also when developers don’t cater to QA ideal requirements (more on that later), you often need to resort to XPath and CSS selectors to achieve locator templating and parameterization for effective modeling, minimizing the lines of code you would otherwise need for many static value locators. And due to CSS limitations around node text “contains” matching and parent/child/sibling node axis traversal, you often have to go with XPath. Many QAs only have basic knowledge of XPath and CSS and fail to construct good XPaths or CSS selectors that are templatized and can be used with variables and/or chained with a prefix, suffix, or infix to define the final value of a locator at runtime. Developers tend to be better at working with XPath and CSS, since they work with them in the web applications. WebDriver also indirectly offers an object oriented way to use a WebElement to traverse back up to its parent/ancestor or down to its children, etc. though it’s not well documented and requires some technical knowledge to make use of.

Reason 3 – they may lack experience with javascript, as in programming in javascript. WebDriver and even Selenium RC expose a javascript execution interface to execute arbitrary javascript code, which comes in handy for browser limitation workarounds, missing functionality that Selenium doesn’t offer, or for extra (lower level) test coverage by executing javascript code. Without js knowledge QAs either have to search online for js tricks or consult their developers to get some js code to use. Some people advise against executing js code for testing, but there is one area where it comes in handy too: fetching test data that is already offered in the web application via a javascript API or hidden DOM elements on the page, data you’d otherwise have to get by other means such as “screen scraping” the UI or pulling from the database or an external API.
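The locator templating mentioned under reason 2 can be sketched as small functions that build an XPath at runtime. This is only an illustration: the function names, the table id, and the markup they assume are hypothetical, and the quoting helper handles values containing single or double quotes but not both.

```javascript
// Wrap a value as an XPath string literal. XPath 1.0 has no escape
// character, so pick whichever quote the value does not contain.
function xpathLiteral(value) {
  if (!value.includes("'")) return "'" + value + "'";
  if (!value.includes('"')) return '"' + value + '"';
  throw new Error('value contains both quote types: ' + value);
}

// Template: a row of the (hypothetical) orders table with the text in any cell.
function rowContaining(text) {
  return '//table[@id="orders"]//tr[td[contains(., ' + xpathLiteral(text) + ')]]';
}

// Chain a suffix onto the row locator to reach a button inside that row.
function buttonInRow(text, label) {
  return rowContaining(text) +
    '//button[normalize-space(.)=' + xpathLiteral(label) + ']';
}
```

The resulting string, e.g. buttonInRow("Acme", "Delete"), is then passed to whatever find-by-XPath call your Selenium bindings provide, so one template replaces a static locator per row.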

Now let’s briefly go over why have your developers help out with the page objects and even Selenium tests? Because they have more technical experience that comes in handy for the modeling and heavy programming portions of Selenium page object and framework setup. And also you force the developers to eat their own dog food (i.e. consume their web application for testing) and they see how horrible locating elements can be with XPath and CSS – it can be done, but sometimes the string is long, a bit messy, and complex though still technically “beautiful” when well defined. And we know with developers tending to be lazy (like using jQuery over direct DOM/document object) they may want to finally make their apps use unique IDs, names, classes, or other attributes to avoid using XPath and CSS for locators.

And to finally conclude it, why build out page objects together with the UI development? If you didn’t get it from reading so far, it’s because this keeps the functionality in sync between what’s available in the app and what you are/can/will be testing. The keeping in sync part also ensures any change to the UI causes a timely matching update to the page object code (and updates to any affected tests that consume those page objects). Doing this also ensures proper test coverage from the start. By building out the page object early on, you already build out the infrastructure to support test coverage whenever you get around to adding it. To clarify, say you need to cover A, B, and C but you only have time for A right now. If you build out the page object to support B & C, you may have no tests for them now, but adding a test later is easy: simply chain together calls to the right page object methods with the right test data, do some test runs, fix as needed, and you’re good. Without page object build out early, you have to go back and add the missing pieces in and then write tests, which means more work later on just to save on work now. Do you want to pay later or pay now?

One analogy to this is when you build houses (and perhaps anything else): you build from a good foundation (bottom up, not top down starting with the roof first). And you complete the house as a whole, not build it in pieces with the house still missing rooms/walls (hard to live in). You build it fully and hopefully perfect from the start so you don’t later have to, say, upgrade/trade up (e.g. switch test frameworks) or do some remodeling or rebuild the house later on (e.g. refactoring or framework/page object code redesign).

With the complete page objects in place, you also know what test coverage you are missing based on unused code. Whatever page object methods and locators exist but are not used means at some point you need to write tests to cover them, so you don’t later have to do assessments on what test coverage you may be missing. If you can run code coverage tools against the Selenium framework code, that could also give you such stats.

It’s also good to build out page objects early on with the UI since the project is fresh in everyone’s mind, so you know what functionality there is and what to cover. When things get deferred you forget over time. Plus when you work on related things, even for page object code, doing them together at once is faster and gives a better design than doing a little now and coming back for the rest later when you might have forgotten some stuff. In my experience, I’ve found that writing missing tests from existing well defined methods is a whole lot easier and faster than having to create the missing page object code later on before writing the missing tests as well.
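The point about spotting missing coverage from unused page object code can be sketched by wrapping a page object so that method lookups are recorded. This is my own illustration of the idea rather than an established tool; trackUsage and its return shape are made-up names, and it relies on the standard javascript Proxy object.

```javascript
// Wrap a page object so every method lookup is recorded by name.
function trackUsage(pageObject) {
  const used = new Set();
  const proxy = new Proxy(pageObject, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value === 'function') used.add(prop);
      return value;
    },
  });
  // List the methods defined on the class that no caller has touched.
  const unused = () =>
    Object.getOwnPropertyNames(Object.getPrototypeOf(pageObject))
      .filter((name) => name !== 'constructor' && !used.has(name));
  return { proxy, unused };
}
```

Tests would use the proxy in place of the page object, and calling unused() at the end of a run lists the methods that still need a test.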

And some final words, while I talk about Selenium and page objects, this approach can apply to other types of frameworks without page objects (just use object oriented or functional programming design) and other types of applications (desktop, mobile, APIs, etc.). For the QAs who are less technical/developer-oriented, hopefully this post has given you insight on how to do better with Selenium. For developers, hopefully this post has given you insight on how to help QA out.

Writing Selenium automation and tests in and of itself is writing software. Hence, just as with software, you want to do TDD with it too, and I’ve described how I would do it in a TDDish way.

Update 01/29/2014: Came across a similarly useful post:

Update 06/11/2014: came across a post that used the build a house analogy with regards to testing:


Shorthand to XPath and CSS in developer console and javascript libraries but not WebDriver API?

15 Jul

This thought just came to mind while I blogged and commented online about finding elements by XPath and CSS using the DOM following DOM conventions (accessed/executed in javascript whether in browser developer console or with WebDriver JavascriptExecutor) like:




where some people pointed out you could just do

$x('//someXPath') and $$('someCss')

in the browser developer console (Firefox, Chrome) to do same thing. And yes, it is much shorter to type out.

And of course, you can do similar if your web application already uses jQuery or other javascript libraries, in which case you can execute their ways of locating elements with WebDriver’s JavascriptExecutor. If the web application under test doesn’t use them, the alternative if you want those location strategies is to inject the jQuery and like libraries into the page under test using JavascriptExecutor first before you can utilize them.

So it seems a bit interesting how some people prefer shorthand use in javascript libraries and in the developer console, yet we don’t see many complaints or much work done on the WebDriver side to mimic the same conveniences found on the web development/debugging side.

Like how document.getElementByXyz() is considered unnecessary extra work but driver.findElement(By.xpath()).click() is ok, instead of aiming for something like driver.$x('//someXpath').click() or driver.$$('some css').click(). Granted, in the WebDriver case, due to object oriented programming and instantiating classes, you’d always have a driver object that you can’t skip typing, though you could shorten it by naming it “d”. But maybe we can also shorten the find by XPath and find by CSS methods with alias methods. And if the answer is no, then one might also ask: since we can live with this in WebDriver, why insist that you can and should use shorthand in the browser console and jQuery?
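Nothing stops a test framework from adding such aliases itself. Here is a minimal sketch, assuming a driver object whose findElement() accepts locator objects such as { xpath: ... } and { css: ... } (as I understand the selenium-webdriver javascript bindings to do). The wrapper name and the alias methods are my own invention, not part of WebDriver.

```javascript
// Add console-style shorthand on top of an existing driver object.
function withShorthand(driver) {
  return {
    driver,
    // Named after $x() in the console, but returns the first match only
    // so that a call like .click() can follow directly.
    $x: (xpath) => driver.findElement({ xpath }),
    // Named after $$() in the console; also returns the first match only.
    $$: (css) => driver.findElement({ css }),
  };
}
```

With that in place, const d = withShorthand(driver); d.$x("//someXpath").click(); reads much like the console version.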

I guess it is easier to code and debug with shorthand, but standardization and consistency of APIs and naming is more important to me. Less to remember & learn, especially for novices. To me, document.getElementByXyz() clearly maps better to driver.findElement(By.xyz()) than $x() and $$() do, and vice versa. The same applies for the multiple elements versions of the methods.

Testing XPath and CSS locators FirePath style across browsers

2 May

FirePath (for Firebug on Firefox) is a nice tool for finding and testing XPath and CSS selector locators. Firebug alone (and similar developer tools/console on other browsers) can only inspect element(s) but can’t give you the XPath/CSS to it nor allow you to directly test a given XPath/CSS locator value to see if it matches/finds any element to see if your locator is correct or not.

Well, I recently came up with a workaround that works across browsers. I’d still use FirePath on Firefox, but go with the workaround for all other browsers (until someone comes up with a FirePath port on those browsers).

Here’s the technique. You simply inject/execute some javascript code in your desired browser’s javascript (or error) console. Once done, you can then query for elements by XPath and CSS. Note that this isn’t a perfect workaround, it may have issues, but in general seems to work.

Here’s the javascript code snippet to run in the browser console:

// Return the first node matching the XPath (undefined if nothing matches).
document.getElementByXPath = function(sValue) {
  var a = this.evaluate(sValue, this, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  if (a.snapshotLength > 0) { return a.snapshotItem(0); }
};
// Return all nodes matching the XPath as an array.
document.getElementsByXPath = function(sValue) {
  var aResult = [];
  var a = this.evaluate(sValue, this, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  for (var i = 0; i < a.snapshotLength; i++) { aResult.push(a.snapshotItem(i)); }
  return aResult;
};
// Aliases for the native CSS selector methods.
document.getElementByCssSelector = document.querySelector;
document.getElementsByCssSelector = document.querySelectorAll;

To then query by XPath or CSS, simply do:

document.getElementByXPath("value here");
document.getElementByCssSelector("value here");

For getting multiple locators/elements, use the plural versions getElementsBy…

Be sure to escape the double & single quotes as needed.

And note that you can directly query by CSS w/o executing the code snippet above. You simply call document.querySelector("css value here"); and document.querySelectorAll("multiple element css value here"); the above code snippet simply creates aliases to the native methods to keep the naming convention closer to the Selenium/WebDriver APIs.

Now the results show as HTML tags in the javascript console and when you hover over the result, it should highlight in the browser. You’d get nothing returned if there was no match (or undefined, null, etc.)

Neat trick don’t you think?

As for find element by ID, name, class, tag, we already have those natively in the browser as document.getElementById(), getElementsByClassName(), getElementsByName(), getElementsByTagName(). Note that only by ID returns a single element; the others return an array-like collection, so even if only 1 element matches you get back a collection of 1. You can then access the element directly like document.getElementsByName('uniqueName')[0].

The code snippets here are derived from Selenium discussion thread:

Update 5/22/2013: Thought I’d mention this as well, forgot to mention it before. For CSS selector inspection & testing, one can also look at SelectorDetector and SelectorGadget. However, having tried them, I found them cumbersome to use. They are useful for finding CSS selectors but for testing (modifications to) them, I’d prefer the approach documented here. There may also be other browser extensions for Chrome/Safari/IE that fulfill what FirePath does for Firefox, but as of this writing I know not of any, if there is such now or in the future, please do enlighten me.

Update 6/28/2013: it looks like the code snippet above doesn’t work well in IE by default (tested with IE9, would also apply to older versions, not sure about IE10). But it works fine for Chrome and Safari (tested on Windows for both). For IE support, you need a few more tricks to do:

  1. IE developer tools doesn’t seem to return DOM elements back for inspection (e.g. run some javascript that returns a DOM element), where you can click the result to see it highlighted in browser, unlike FF, Chrome, and Safari. The workaround is to load Firebug Lite with IE (or equivalent add-on javascript console) to get that missing functionality.
  2. IE doesn’t offer native XPath support, so even with Firebug Lite and executing the code snippet above, it may fail XPath lookup. CSS selectors are OK. For XPath workaround then, you’d have to execute some additional code snippet first before the snippet above, and this prerequisite snippet simply injects the needed javascript library (that offers XPath support) into the page’s DOM/HTML source.

var script = document.createElement("script"); script.src = "URL to javascript XPath library"; script.setAttribute("type","text/javascript"); document.body.appendChild(script);

Then wait at least one second, maybe a few seconds just in case, before executing the original code snippet at the beginning of this post. The wait is needed for the injected script to load into the page’s DOM.
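Instead of a fixed wait, the injected script element’s load event can signal when the library is ready. This is a sketch only: loadScript is a made-up helper name, the URL remains whichever library you choose, and it assumes the browser fires onload for script elements (IE9 and later do, as far as I know; older IE versions would need onreadystatechange instead).

```javascript
// Inject a script into the page and call onReady once it has loaded.
function loadScript(doc, url, onReady) {
  var script = doc.createElement('script');
  script.type = 'text/javascript';
  // Attach the handler before setting src so the event cannot be missed.
  script.onload = function () { onReady(); };
  script.src = url;
  doc.body.appendChild(script);
  return script;
}

// Usage in the console (commented out so nothing runs on paste):
// loadScript(document, "URL to javascript XPath library", function () {
//   /* run the original XPath snippet from this post here */
// });
```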

Now, for the URL to the javascript XPath library, you can choose what library to use. I don’t know how many there are out there, but I came across two:

The first one I think is what the Selenium project uses to support XPath for IE, but not 100% sure. I did my IE testing using this library.

BONUS TIP 1: you can also use the following trick to “find” XPath value for an element in IE, should work for other browsers too. But to verify/test the XPath, you still have to use the tricks in this blog post, or some alternate tools.

BONUS TIP 2: if you don’t like tip 1 for IE or it doesn’t work for you, try the Fire IE Selenium tool, which requires Microsoft Excel to work:

Update 10/30/2017: I was informed about yet another CSS bookmarklet tool, SuperSelector, that is supposedly good. Also, there are now Selenium WebDriver based GUI tools that allow Firebug/FirePath type inspection of elements across browsers. Take a look at Looking Glass,  Selenium Webdriver Elementor Toolkit, and SWD Page Recorder

Update 02/18/2016: I came across info about alternate but similar XPath query in javascript to return the DOM element, with more details about the particular query:

document.getElementByXPath = function(xpathExpression) { return document.evaluate(xpathExpression, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue; };


Manually setting cookie value in browser for testing and automation

26 Apr

Sometimes you have to set/use cookie for testing, and you may not have control to force set cookie on server side application/site, so you have to set it on the client side.

But what’s a good option to set cookie that’s cross platform across browsers and operating systems? I assume probably javascript, as not all browsers have a cookie editor section like Firebug related tools for Firefox. Most browsers only seem to have a cookie viewer built in with the dev tools unless I’m mistaken.

With javascript, you can set the cookie for use with Selenium (or use Selenium’s API to set cookie instead of pure javascript), and you can also use it for manual testing as well.

Here’s one example of how to do it adapted from

var doSetCookie = function setCookie(c_name,value,exdays){ var exdate=new Date(); exdate.setDate(exdate.getDate() + exdays); var c_value=escape(value) + ((exdays==null) ? "" : "; expires="+exdate.toUTCString()); document.cookie=c_name + "=" + c_value; }; doSetCookie("cookieName","cookieValue",1);

For manual testing, simply open up the javascript/error/developer console of the browser developer tools and then paste & execute the above code. May then have to refresh page for cookie to take effect.

I do notice that setting cookies this way for Selenium doesn’t persist as well as doing it via native Selenium APIs, but this is a workaround should you have issues with the native approach like with Selenium issue 5503.

And one last thing to end this post, anyone have a more optimized javascript version of the above code to execute? What I have above I’m assuming is just rudimentary proof of concept and could be fine tuned.
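As one possible tidier variant (a sketch, not a definitive answer to the question above): build the cookie string in a separate function so it can be checked on its own, and use encodeURIComponent() rather than the deprecated escape(). The function names, the optional "now" argument, and the added path attribute are my own choices and not part of the original snippet.

```javascript
// Build the string to assign to document.cookie.
// days may be null/undefined for a session cookie; now is optional and
// only exists so the expiry date can be pinned down when checking output.
function buildCookie(name, value, days, now) {
  var parts = [name + '=' + encodeURIComponent(value)];
  if (days != null) {
    var start = (now || new Date()).getTime();
    var expires = new Date(start + days * 24 * 60 * 60 * 1000);
    parts.push('expires=' + expires.toUTCString());
  }
  // Assumption: make the cookie visible site-wide rather than per path.
  parts.push('path=/');
  return parts.join('; ');
}

// Set the cookie on a document (pass the browser's document object).
function setCookie(doc, name, value, days) {
  doc.cookie = buildCookie(name, value, days);
}
```

In the console this would be called as setCookie(document, "cookieName", "cookieValue", 1);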

Selenium automation with execution of custom site specific javascript

21 Feb

Expanding on my previous post: Javascript is your ally for Selenium / WebDriver

I am wondering if anyone has used WebDriver’s javascript execution facility to execute custom javascript code (not general code used for browser/Selenium limitation workarounds) but javascript code that’s specific to their website or web application under test.

I asked a colleague within our organization and he said his department recommends avoiding such use. While I understand that it is best not to dig into specific detailed functionality of the app under test for UI automation it does have some uses:

Limitations of the UI and element locators (IDs, class names, name attributes or other DOM attributes, visual text on screen) in providing a good way to define locators in an object oriented, programmatic, parameterized way so that your locators are dynamic rather than static. For example, say we have a method that selects a color and we “select/click” the element (finding it) based on color name rather than some less user friendly/understandable strategy. In some cases, that’s not possible from the UI alone, but might be if you execute app-specific javascript code to find out the info you want that is not exposed via the UI. The same goes for checking application state that might not be fully exposed visually in the UI. I’ve found it helpful in my case.

The other useful case may be for more white box javascript API & unit testing integrated within Selenium UI tests for extended test coverage. This might be useful compared to issues that may come up with pure javascript API/unit tests alone per this blog:

What are your thoughts on this?

Javascript is your ally for Selenium / WebDriver

9 Feb

First a background introduction / discussion

When you run into issues or limitations for Selenium, particularly WebDriver, whether it be in Selenium itself or a browser specific limitation, javascript is your ally.

Though javascript isn’t all-powerful and limitless, there’s much that can be done with javascript provided you know what to do with it and how to use it.

With Selenium RC, you have access to the browserbot object that gives you javascript-based access to windows and the HTML Document Object Model (DOM) via Selenium. You can do a web search on Selenium browserbot to learn more about it or find usage examples. One of my posts shows how to use it in interesting ways (under the DOM examples): Special element state validation with Selenium and CSS and DOM

With Selenium 2 / WebDriver, you have even more power with javascript as you’re given direct access to javascript with the only security restrictions being that of the browser itself (e.g. no system file access, no cross domain AJAX calls w/o workarounds), whatever you can do with javascript, you can execute with the WebDriver JavascriptExecutor. Again, you can web search for some examples.

Now the goodies:

Unfortunately, most Selenium users are not well versed in javascript (not being front end web developers), and fortunately, Selenium is good enough 80-95% of the time that you don’t need to resort to javascript. But for the 5-20% of the time where you need a workaround, javascript might be it. So first, it is a good idea to learn javascript programming so it can help you be better with Selenium. Outside of Selenium, it also helps you better debug/troubleshoot front end web application bugs.

In this blog post, I’ll list some useful javascript related content links that apply towards Selenium, and I’ll update this post as I find more. It sort of acts as a central repository of useful Selenium javascript workaround hacks, since it’s not easy scouring the web for such solutions. Feel free to contact me with additional hacks you find.

NOTE: to understand how these hacks work, you have to learn more about javascript…

List of useful Selenium workaround javascript hacks

Verifying images – “really” verifying an image is rendered/displayed on the browser, not just a broken link
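
One common way to do this check (a sketch, not necessarily what the linked solution does) is to look at the image's complete and naturalWidth properties. A broken image finishes loading but reports a natural width of 0.

```javascript
// Returns true only when the <img> has finished loading AND decoded to
// real pixels. A broken link gives complete === true but naturalWidth === 0.
function isImageRendered(img) {
  return Boolean(img) && img.complete === true && img.naturalWidth > 0;
}

// Possible use with selenium-webdriver (assumed `driver` and `imgElement`):
//   const ok = await driver.executeScript(isImageRendered, imgElement);
```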

Checking scrollbars – checking whether an element (div, iframe, textarea, etc.) that holds other content has scrollbars (vertical and/or horizontal). The solution here is for Selenium RC in Java, but can be adapted for WebDriver and other languages. A good javascript learning exercise on how to adapt it.
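
As a rough WebDriver-era sketch of the same idea (my own adaptation, not the linked Selenium RC code), you can compare an element's scroll size to its client size. Note this really detects overflowing content, so an element styled with overflow: hidden can report true without showing a scrollbar.

```javascript
// Content taller/wider than the visible box usually means a scrollbar.
function getScrollbarState(el) {
  return {
    vertical: el.scrollHeight > el.clientHeight,
    horizontal: el.scrollWidth > el.clientWidth
  };
}

// Possible use with selenium-webdriver (assumed `driver` and `element`):
//   const state = await driver.executeScript(getScrollbarState, element);
```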

Manipulating (or clearing) HTML5 local & session storage
You can add/remove items, check for data, etc., but in terms of automated testing, we more likely just want to clear the storage cache, like clearing cookies. The above link is a Java example for session storage, but it can be adapted to other languages, and local storage has the same API, just use window.localStorage instead. And read this for more info on working with HTML5 local & session storage:
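
For the clearing case, a minimal sketch in javascript could look like the following. The helper name clearWebStorage is made up. It takes the window as a parameter only so it can be tried outside a browser.

```javascript
// Clears both HTML5 storage areas for the current origin and returns how
// many items are left afterwards (0 when both are empty).
function clearWebStorage(win) {
  win.localStorage.clear();
  win.sessionStorage.clear();
  return win.localStorage.length + win.sessionStorage.length;
}

// Possible use with selenium-webdriver (assumed `driver`):
//   await driver.executeScript(
//     'window.localStorage.clear(); window.sessionStorage.clear();');
```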

Drag & Drop using javascript instead of native interaction / Actions API
For when the native Actions drag & drop fails to work in a given browser, or where it is not available, like SafariDriver.

Mouse over using javascript instead of native interaction / Actions API, comment #60 & other related comments (the ones before & after it are relevant). This method is useful when the native method fails for any given browser, and is currently the only option for SafariDriver.

Mouse click using javascript instead of native interaction / Actions API or
See the mouse over link above for code. You then just need to modify the code to replace “mouseover” with “click” and “onmouseover” with “onclick”. How is this useful? When the regular method fails to cause the expected click action, and in case the native interactions / Actions API fails to work either (or where not available like SafariDriver).
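
As a sketch of the general approach for both mouse over and click (this uses the modern MouseEvent constructor, which may differ from the linked code), you build a synthetic event and dispatch it on the element. Keep in mind that a synthetic mouseover runs the page's event handlers but does not trigger CSS :hover styling.

```javascript
// Dispatches a synthetic mouse event ('mouseover', 'click', ...) on `el`.
// The MouseEvent constructor is taken from the element's own window so
// the helper also works for elements inside frames.
function fireMouseEvent(el, type) {
  const view = el.ownerDocument.defaultView;
  const evt = new view.MouseEvent(type, {
    bubbles: true,
    cancelable: true,
    view: view
  });
  // dispatchEvent returns false if a handler called preventDefault().
  return el.dispatchEvent(evt);
}

// Possible use with selenium-webdriver (assumed `driver` and `element`):
//   await driver.executeScript(fireMouseEvent, element, 'mouseover');
//   await driver.executeScript(fireMouseEvent, element, 'click');
```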

Force setting value to a (Web)Element – sometimes, you can’t seem to manipulate a WebElement, either because the action does not seem to take effect or because it is not allowed when the element is not visible/displayed (e.g. hidden). One trick is to use javascript to set the value (e.g. the value of a form input field, or the value of a hidden input field that is actually used by the web app and set via javascript when you perform an action against some other element that’s usually near it in the DOM tree). For an example of this, see my blog post (WebDriver update at the end): Special element state validation with Selenium and CSS and DOM
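
A minimal sketch of the trick follows. Firing the input and change events afterwards is my own addition, on the assumption that the web app listens for them. Some frameworks track values in other ways and may need more than this.

```javascript
// Sets the value directly, then fires the events a real user edit would
// normally produce so that listeners in the page get a chance to react.
function forceSetValue(el, value) {
  const view = el.ownerDocument.defaultView;
  el.value = value;
  el.dispatchEvent(new view.Event('input', { bubbles: true }));
  el.dispatchEvent(new view.Event('change', { bubbles: true }));
  return el.value;
}

// Possible use with selenium-webdriver (assumed `driver` and `hiddenInput`):
//   await driver.executeScript(forceSetValue, hiddenInput, 'new value');
```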

Mouse wheel zoom action
This currently doesn’t appear to be available in Selenium, even with the native interactions / Actions API? Or I overlooked it. Here’s a possible javascript solution. I haven’t fully tested whether it works or not. As the link is to web developer/application code, you do have to adapt it to work within Selenium. Not simply copy, paste, and run.

Highlighting elements
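
One simple way to do this (a sketch of my own, other approaches flash the background or border instead) is to set an outline on the element and keep the old value so it can be restored.

```javascript
// Draws an outline around the element and returns the previous outline
// value so the caller can restore it afterwards.
function highlight(el, color) {
  const previous = el.style.outline;
  el.style.outline = '3px solid ' + (color || 'red');
  return previous;
}

// Puts back whatever outline the element had before highlight().
function unhighlight(el, previous) {
  el.style.outline = previous;
}

// Possible use with selenium-webdriver (assumed `driver` and `element`):
//   const old = await driver.executeScript(highlight, element, 'magenta');
```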

Getting a WebElement from x and y coordinate among other neat stuff too
You might be able to get the WebElement from the WebDriver API too, not sure, but here’s a way to do it from javascript, plus more that might be useful for QA test automation purposes, although I’ve not found an actual need for them myself at present besides getting an element from given coordinates.
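
For the coordinates case, the DOM call involved is document.elementFromPoint(x, y), which takes viewport-relative coordinates and returns null when the point is outside the viewport. A small sketch, with a made-up helper name:

```javascript
// Looks up the topmost element at viewport coordinates (x, y) and
// returns a short description of it, or null if nothing is there.
function describeElementAt(doc, x, y) {
  const el = doc.elementFromPoint(x, y);
  if (!el) return null;
  return { tag: el.tagName.toLowerCase(), id: el.id || null };
}

// Possible use with selenium-webdriver (assumed `driver`); returning the
// element itself gives you a WebElement back in the test:
//   const el = await driver.executeScript(
//     'return document.elementFromPoint(arguments[0], arguments[1]);', 100, 200);
```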

more to come…

2/15/2018: Extracting image(s) off HTML5 canvas for validation/testing: Forgot to mention before, but using javascript, you can extract an image/graphic off an HTML5 canvas as base64 encoded text of the binary image data. Using the scripting/programming language that you use with Selenium, you can then base64 decode the data into binary and save it as the corresponding image file type, or keep the base64 version for processing if you have tools that work with base64 or that render base64 data visually. Once it is saved to a file, you can compare the image file against a known expected image (file) for validation, or run fuzzy image matching with image comparison tools like Sikuli/Applitools, etc. To do that, you just need access to the canvas element in javascript (you can pass in the WebElement reference via Selenium to JavascriptExecutor), and call this API: canvasElement.toDataURL(optionalImageTypeDefaultingToPng,optionalEncodingOptions). The returned string is a data URL, so strip the “data:image/png;base64,” style prefix before decoding.
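
As a sketch of the decode step on the test side, assuming Node.js as the language driving Selenium (the helper name dataUrlToBuffer is made up):

```javascript
// Splits a base64 data URL (as returned by canvas.toDataURL) into its
// MIME type and the decoded binary image bytes.
function dataUrlToBuffer(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error('Not a base64 data URL');
  return { mimeType: match[1], bytes: Buffer.from(match[2], 'base64') };
}

// Possible use with selenium-webdriver (assumed `driver` and `canvasElement`):
//   const dataUrl = await driver.executeScript(
//     'return arguments[0].toDataURL("image/png");', canvasElement);
//   require('fs').writeFileSync('canvas.png', dataUrlToBuffer(dataUrl).bytes);
```

Note that the browser refuses toDataURL on a canvas that has drawn cross-origin images without CORS approval, so this only works for canvases the page is allowed to read.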

Getting window handle on existing windows in javascript by use of window name

9 Nov

I came across this recently. I don’t recall if I found the basis for the technique off the web or not (I think I did, but forgot the source URL), but I adapted it for use in javascript code.

In rare cases, could also be used for Selenium/WebDriver, if you’re not using their built in window handling APIs.

//get handle to some existing window
var someWinHdl = window.open(null, "some existing window name if you know it");

//get handle to self (current window)
var selfWinHdl = window.open(null, window.name);

Granted, getting the current window handle this way isn’t all that useful, since you could also just use “window.self”. But it is useful for gaining access to an existing window if you know its “window.name” value.

This same technique can also be applied for opening new windows to keep a handle on the newly opened window; just provide a real URL instead of null to window.open.

With this window handle in place, you could access it like any window object, such as closing it, using the stored handle variable example above:

someWinHdl.close();

NOTE – update 7/8/14:

For this technique, the code will only work under these conditions:

  • It will only work for the current window & any windows that were opened from the current window. That means that if window A opened window B, then window B won’t be able to access window A with this technique, even if you knew window A’s name. Or if window B was opened separately/manually by the user as a separate tab/window and it somehow has a name, window A won’t be able to access it. This is likely from browser security sandboxing. You can only access windows you control (e.g. the current window and any that you opened via javascript from the current window).
  • window.name needs to be defined with a value (in the case of getting a handle to self/current window), or the window name of another window (opened by the current window) needs to be a valid defined window name. Otherwise, it will just open a blank window and you won’t get a reference to the (correct) window handle. For the current window or self, you can arbitrarily set the value (e.g. to “testing123”) if it is undefined, and then you will be able to use the code above. For other windows currently open, we assume those windows were opened with a defined window.name (or that it is defined at some point) so that you can reference it. If the name is undefined (anonymous), then this code won’t work against other windows that were previously opened by the current window.
  • The code here is generally not that helpful for windows already opened, since you get the window handle when opening the window with window.open and storing the return value. However, sometimes one might not have stored that value in the original call, or for testing purposes, you don’t have that handle but you do have the window name. In that case, you can make a repeat call with the presented code to retrieve the window handle without changing the current location/URL of that window (i.e. without causing it to navigate away or refresh the page).
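
Putting the technique and the conditions above together, here is a small sketch with made-up helper names. One assumption to flag: it passes an empty string rather than null as the URL, since an empty URL is what tells window.open to target the named window without navigating it.

```javascript
// Returns a handle to an already open window with the given name.
// Only works for windows the current window is allowed to reach.
function getWindowByName(win, name) {
  if (!name) throw new Error('window name must be defined');
  // Empty URL: reuse the named window as-is instead of loading a page.
  return win.open('', name);
}

// Returns a handle to the current window, giving it an arbitrary name
// first if it does not have one yet.
function getSelfHandle(win) {
  if (!win.name) win.name = 'testing123';
  return win.open('', win.name);
}

// In a page: var someWinHdl = getWindowByName(window, 'reportWindow');
// ('reportWindow' is a hypothetical window name.)
```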