
Selenium page objects beyond pages like a cart object?

23 Sep

The Selenium page object pattern is a design pattern that helps you model test code better. But one doesn’t have to follow the full guidelines of the design pattern.

Some people have used it to model parts of pages as well (headers, footers, navigation, templates, widgets, etc.). But perhaps it can be useful for more than that. Some people might have already done this, but I couldn’t find much by searching, or I don’t know what to search for specifically. If people have already done this, they haven’t widely publicized it.

What I have found so far is this:

It’s an interesting piece to review. I had this similar thought in mind recently and decided to blog about it:

A shopping cart page doesn’t really do much. It contains cart items and offers a visual call to action (a button to click) that takes you to checkout, along with the standard site template (header/footer/navigation) actions (login, logout, links to other areas of the site).

The core functionality in the shopping cart page really belongs to the cart items and what you can do with them. So in my mind, having the shopping cart page object manipulate cart item actions doesn’t seem quite appropriate for object oriented design.

For example, this would be how you might typically implement the cart page in basic page object model:

cartPage.updateQuantity(cartItemIndex, quantity);
//obviously access cart item by index, text string name of item, or by some unique ID

However, perhaps you can extract the cart items out of the page object, to manipulate individually either as a collection or set of related WebElements (name, quantity field, remove button, etc.) or, for more advanced usage, as an encapsulated cart item object model itself.

Both modeling options are presented here below (since I find it hard to list code in a “basic” WordPress blog)
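
A minimal Java sketch of the two options might look like the following. To keep it runnable without a browser, `Element` and `FakeElement` are hypothetical stand-ins for Selenium's WebElement, and every class and method name here is an assumption made for illustration, not the code referenced above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Selenium's WebElement, so the sketch runs without a browser.
interface Element {
    String getValue();
    void setValue(String value);
    void click();
}

// In-memory Element used only to exercise the sketch.
class FakeElement implements Element {
    private String value;
    boolean clicked = false;
    FakeElement(String value) { this.value = value; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
    public void click() { clicked = true; }
}

// Option 1: a simple container grouping the related elements of one cart item.
class CartItemElements {
    final Element name;
    final Element quantityField;
    final Element removeButton;
    CartItemElements(Element name, Element quantityField, Element removeButton) {
        this.name = name;
        this.quantityField = quantityField;
        this.removeButton = removeButton;
    }
}

// Option 2: an encapsulated cart item that exposes behavior, much like a page object.
class CartItem {
    private final CartItemElements elements;
    CartItem(CartItemElements elements) { this.elements = elements; }
    String getName() { return elements.name.getValue(); }
    int getQuantity() { return Integer.parseInt(elements.quantityField.getValue()); }
    void updateQuantity(int quantity) { elements.quantityField.setValue(String.valueOf(quantity)); }
    void remove() { elements.removeButton.click(); }
}

// The cart page only hands out cart items; it no longer manipulates them itself.
class CartPage {
    private final List<CartItemElements> rows;
    // In a real page object the rows would be located via CSS/XPath on the live page.
    CartPage(List<CartItemElements> rows) { this.rows = rows; }
    List<CartItem> getCartItems() {
        List<CartItem> items = new ArrayList<>();
        for (CartItemElements row : rows) {
            items.add(new CartItem(row));
        }
        return items;
    }
}
```

With this shape, a test would call something like cartPage.getCartItems().get(0).updateQuantity(3) instead of cartPage.updateQuantity(0, 3).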

I would note that modeling cart items in such a way, while more object oriented, can make implementing the cart page object (particularly the getCartItems method), and the associated procedure to locate and group the related cart item elements together, functionally more complicated. Often the web application will not have a UI in which all the related elements are easily tied together and uniquely identifiable, especially on a cart page with N cart items.

That usually requires sophisticated use of CSS and XPath patterns to locate and relate the set of elements for N cart items on the page. So this whole approach is not something novice page object model and Selenium users can easily tackle. It takes time and skill to do, but it is worth trying out.

In the long run, I feel this type of approach is more maintainable, scalable, and makes the tests more readable. It just requires more thought in architectural design and more work upfront to implement. However, the complexity to implement could be reduced if you can get the developers to make the element locator values easily defined w/o resorting to custom CSS and XPath, and make it work for N cart items, and X related cart item elements (e.g. item color, item description, item this, item that, for every cart item)

If you ask me personally which cart item object model I prefer, it is the latter one that resembles a page object rather than the one that is simply a container of WebElements for a cart item.

What are your thoughts on modeling things or objects on a page like a page object? A cart item is the one that tends to come to mind, but there are others that can be thought of as objects but are not widgets, headers, footers, or navigation. Some other possible examples include a search result, a category listing, etc.

Also, please do inform me if you come across other articles about using page objects for things like cart items, search results, etc. where we’re not working with a page but some other object per se.

IDEs and fancy development tools can be bad for you

23 Aug

Especially for testers (at least those without a good developer/hacker type skillset), you become reliant on them. And when you want to do something outside of them, you are lost.

I’ve seen quite a few posts from testers about how to execute Java (Selenium/TestNG) tests outside of Eclipse, and they usually have maven set up with Eclipse too.

As someone who likes to have options & builds tools & utilities from (glue) scripts and custom mini applications, whether for testing or not, it irritates me to see those questions pop up.

Either the poster is ignorant & an idiot for not knowing how to search up the solution (one search may not give you the entire solution, but search up the components that will make up your solution individually and you can piece together the puzzle to solve), or they are plain lazy to just want someone to solve it for them.

Those who work with the Java toolset should know how to compile Java from scratch on the command line, how to execute Maven from the command line, and how to execute JUnit and TestNG tests from the command line test runner.

Those who work with .NET/C# should know how to compile the code from scratch on command line, how to execute NUnit/etc. tests from command line test runner, and the similar equivalents to what you do in Java.

Those who work with scripting languages should know how to run code from files on command line, run code from interpreter via command line (in GUI interpreter mode or pure command line).

Those who work with any language should know how to use libraries/packages/modules, installing, referencing them, etc. all without the fancy IDE – just command line compilation, installation, etc. Should also know how to use methods & functions of libraries/packages/modules without auto-complete in IDE, rather simply by looking up quick references and API documentation instead.

Those who work on Windows should know how to work with batch files and the Windows task scheduler. Even better to learn WSH, WMI, VBScript and/or Powershell.

Those who work on *nix should know shell scripting and cron jobs at least.

Selenium WebDriver – extracting an image on page by use of take screenshot and cropping

4 Dec

Normally, if we wanted to get an image off the web page under test, we’d download it externally using the URL extracted from the image element’s source attribute. Unfortunately, in today’s AJAX based web apps, that doesn’t always work. Sometimes elements look like images to the user but are rendered by JavaScript into DOM elements that are not “image” elements, or are composed of sprites, etc. assembled on the server side.

In such cases, you have to go an alternate route…

The technique to do it is explained here:

and an example implementation in Java here:
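
The referenced implementation isn't reproduced here, but the cropping step can be sketched with the JDK alone. This is a minimal sketch under assumptions: `ElementCropper` and `crop` are hypothetical names, and the screenshot is assumed to already be loaded as a BufferedImage (with WebDriver, typically from the bytes returned by TakesScreenshot, with the rectangle taken from the element's getLocation() and getSize()).

```java
import java.awt.image.BufferedImage;

class ElementCropper {
    // Crops the element's rectangle out of a full-page screenshot.
    // x/y are the element's top-left corner, width/height its size, all in pixels.
    static BufferedImage crop(BufferedImage screenshot, int x, int y, int width, int height) {
        // Clamp to the screenshot bounds, because getSubimage throws
        // if the requested rectangle falls outside the image.
        int left = Math.max(0, Math.min(x, screenshot.getWidth() - 1));
        int top = Math.max(0, Math.min(y, screenshot.getHeight() - 1));
        int w = Math.max(1, Math.min(width, screenshot.getWidth() - left));
        int h = Math.max(1, Math.min(height, screenshot.getHeight() - top));
        return screenshot.getSubimage(left, top, w, h);
    }
}
```

The clamping is a design choice for this sketch: it trades an exception for a slightly smaller image when the reported element rectangle runs past the edge of the screenshot.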

Side comment 1 – it would be nice if others in the community can contribute other language implementations to the Java source code example above so others can use it. Python, Ruby, C# perhaps? And a sprinkle of PHP, Perl…I may contribute code snippets when I have time.

After you’ve obtained the image, you can do whatever you want with it. Binary/MD5/SHA-1 hash compare the image against a known benchmark one?

For my case, that’s not sufficient as the image isn’t always a single image but a collection formed as one and dynamically generated. So we do some flexible fuzzy logic image comparison with our internal in house image comparison solution.

And on that topic, side comment 2:

I’ve noticed that taking a screenshot and then cropping to the desired element location & size isn’t 100% cross platform compatible. So I’m blogging this for reference and to see if anyone else has the same issue. The problem is that different browsers crop with slightly different coordinates and/or width/height, ending up with cropped images that differ slightly across browsers. I haven’t tested across all browsers, but my findings are that Safari provides the most exact cropping, while Firefox and IE have slight variations that are not desirable. I didn’t test Chrome, nor mobile Safari with Appium.

With that in mind, you definitely can’t do hash/binary compare of the cropped images across browsers due to differences, unless you have a different benchmark image per browser. So with a single benchmark image, you’d need some flexible fuzzy logic image comparison technique. There’s multiple options here, but none readily available, other than maybe ones like Sikuli. The in house solution we use is based off ImageMagick and does the dirty work of calculating the parameters for the image comparison for you wrapped in a nice REST API.
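
To make the idea of a tolerant comparison concrete, here is a minimal sketch using only the JDK. It is not the in-house ImageMagick based solution mentioned above; `FuzzyImageCompare`, its method names, and the two thresholds are assumptions for illustration. It compares only the overlapping area of the two images, since crops from different browsers can differ slightly in size.

```java
import java.awt.image.BufferedImage;

class FuzzyImageCompare {
    // Fraction of pixels (within the overlapping area) where any of the red,
    // green, or blue channels differs by more than channelTolerance.
    static double differingFraction(BufferedImage a, BufferedImage b, int channelTolerance) {
        int width = Math.min(a.getWidth(), b.getWidth());
        int height = Math.min(a.getHeight(), b.getHeight());
        int differing = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (pixelDiffers(a.getRGB(x, y), b.getRGB(x, y), channelTolerance)) {
                    differing++;
                }
            }
        }
        return (double) differing / (width * height);
    }

    // Compares the blue, green, and red channels (bit shifts 0, 8, 16) of two packed RGB pixels.
    private static boolean pixelDiffers(int p, int q, int channelTolerance) {
        for (int shift = 0; shift <= 16; shift += 8) {
            int c1 = (p >> shift) & 0xFF;
            int c2 = (q >> shift) & 0xFF;
            if (Math.abs(c1 - c2) > channelTolerance) {
                return true;
            }
        }
        return false;
    }

    // True when no more than maxDifferingFraction of the compared pixels differ.
    static boolean similar(BufferedImage a, BufferedImage b,
                           int channelTolerance, double maxDifferingFraction) {
        return differingFraction(a, b, channelTolerance) <= maxDifferingFraction;
    }
}
```

A per-pixel threshold like this is crude next to a real image comparison tool, but it shows why a single benchmark image can still work across browsers once exact hash equality is dropped.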

Now anyone have experiences they can share around the topics discussed above?

Developing Selenium tests with proper abstraction

11 Jul

This post is with respect to page object modeling of Selenium tests but also applies to general Selenium test development when not using page object model. In the latter case, one may be using object oriented programming/design (OOP/D) or functional (but not necessarily object oriented) programming/design (FP/D), or not. I would advise if not using OOP/D, at least use FP/D.

In this post, I’m ranting from my experience, how some people do not use page object modeling properly (with some examples) and how it should be used. Pardon the long blocks of text & sparse use of formatting, it is a rant after all, I may try to pretty up the formatting in the future.

While one doesn’t have to adopt the page object model fully, there are some guiding principles one should follow. Others may have differing opinions. This article may be a nice read for some of you who have experienced the frustration of what I see, and/or a nice reference for those not so experienced with programming, particularly OOP/D and/or FP/D. From my perspective, FP/D is the same as OOP/D with the exception that you don’t organize your code into class objects with methods; instead you just have a set of functions (i.e. the equivalent of methods) to do things. In both cases, the methods/functions are parameterized and abstract away the low level details to describe a higher level view of functionality. Hopefully, in the larger community of Selenium testers, there are not many who actually do the things I rant about in this post. For those who actually do it, I’m assuming they’re either novice programmers or that they’ve been used to the Selenium IDE and the old school way of writing tests as a series of singular test commands.

Things you can choose not to adopt:

  • page objects returning other pages. This is a nice to have, but you can manage without it; you just have to manually instantiate the correct page objects in the needed flow/sequence rather than expect the starting page object to return the next page object in sequence for you to continue the test with. When returning page objects from a page object, you would still want to check that the page returned is correct (in the correct state), so it isn’t too much to expect the test writer to know which page to check for, and as such they can also manually instantiate the correct page if the page object doesn’t return the next page. This is one feature our team chose not to adopt; why, I have no idea, as I’m not the lead architect who spec’d out our framework. In OOP/D, class objects are not required to return other class objects, but they may. They could simply return basic types (int, string, boolean, etc.). So you can do the same with page objects. Here are some related posts about not wanting to return page objects from a page object:

Things you should adopt:

  • abstraction of page behavior from test behavior. Page behaviors being UI element locators, actions that can be performed on page, UI/page/element state. Test behavior being assertions performed, comparison of data, sequences of actions across pages. Following this allows for good, clean, modular test design that’s easier to read & maintain.

Problems observed with improper adoption of page object model:

  • UI element locators of page object made public and directly accessed by test to perform assertions against (text of locator, DOM/element attributes of locator), or to invoke actions against (e.g. click, type, select). It might be easy to do so, but then it makes tests aware of internal page UI and structure, which is not good abstraction and modeling of test. More brittle tests and code all over the place to update for changes. UI element locators should always be made private to the page object (or protected), and must be accessed only by methods of the page object or associated classes. If you need to access some locator (extract data or manipulate it) and find no method to do so, it means you need to create proper methods for it. In Java, you’d get warnings about private members that don’t get used by some method of the class. It’s actually ok though to leave private member locators defined with no methods that manipulate them as you can fill them in later on, though it is good practice to do it all at once (but who has the time to do the best case scenario anyways?).
  • Convenience methods added to page object. Page objects are meant to expose behavior of the page (or whatever the page represents as it can be a group of pages or section of a page). As such, the page object should contain methods that define particular individual logical actions on the page only (which may or may not reference other page objects). You should not create “convenience” methods that call multiple methods (which may or may not involve other page objects) just so that you have a single simple method to call in a test case to do a group of things and not have to write out a group of method calls & class object instantiations in the test cases. It makes sense to have convenience methods, but they should go in some common or base test case class that test cases are derived/extended from. They don’t belong in page object, unless it is actually logical as a behavioral action of the page. These are easy to notice when the method name or the implementation within the method indicates something like doThisThatFromPointAToGoToPointB().
  • Improper abstraction or defining of page object methods. For example, hard coding values to a page object method. This leans toward making the method a convenience method so that you can get through some step to test something else. Because you may later or should have tests that invoke the method with other data to test other use cases, therefore, the data used should never be hard coded. If you desire some hard coding, use/add as a separate additional convenience method (in common/base test case class) that calls the parameterized abstracted page object method in question with the predefined static input/data. Another example, is too granular, low level, atomic methods like clickThis(), clickThat(), because it is easy to do and is the old fashioned and Selenium IDE way to automate tests by per command basis. But ideally you want to abstract it up higher to something more functional or logical such as doThis() or goThroughSomeFlow() that actually perform the actions of clickThis() and clickThat() at the lower level. If you keep it too granular and low level, that defeats the purpose of OOP/D with page objects, as going so low level, you might as well go back to Selenium IDE style of executing singular Selenium commands and chaining them up to build a test.
  • adding test data into page objects like locators. For example, defining static variables for text on a page, and later using it to perform assertions or checking that the text exists on page. If the variables/text is used like a (or to define a) locator, then it is fine. But when you see it used like test data, then it should belong under a common/base test case class (or the test case itself) as a static variable rather than in a page object. Using like test data usually means it is used in assertion as the “expected” value rather than the “actual” value. If it’s treated as “actual” value then that’s ok, because then it is behaving like a locator that you’re extracting text from.
  • not using enums and constants where available in programming language used for automated test. This doesn’t directly apply to page objects, but does indirectly in terms of usage with page object methods (their parameter arguments or the internal method implementations). I’ve seen people use integer values like 0, 1, 2, or perhaps strings to specify option path to branch with (via if/else or switch case statement). While that’s fine, it makes the test design more brittle to updates and harder to interpret the meaning of the integer value. Strings are less of an issue but brittle should the string value change. While one can’t avoid use of strings and integers in actual implementation as some UI elements (or non-UI items but some type of data) will be based on string identifiers or identified by index or integer value, it helps at the interface/API level where you define the page object method signature (parameter arguments and return data type) to use constants and enums. Internally within the method, you use the enum/constant to lookup/translate to the proper string or integer value (e.g. SIDES.FRONT translates to “front” or COOKIE.OATMEAL translates to 2). This way, the test cases don’t need to have knowledge of the actual low level developer-centric values (of the web application or HTML source), it just needs to know on high level what logic we’re handling like which side of an object (front or back) or what type of cookie (oatmeal or peanut butter, without caring what the actual index position or value an oatmeal cookie is in the web application), which is part of abstraction with OOP/D. All the low level logic/values is handled within the page object methods.
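
The enum point in the last item can be sketched in Java as follows. Only the mappings to “front” and to 2 for oatmeal come from the text; the other values, the `CookieOrderPage` class, its method names, and the recorded action strings are assumptions for illustration (the `performed` list stands in for real WebDriver calls, and the enums follow Java naming rather than SIDES/COOKIE).

```java
import java.util.ArrayList;
import java.util.List;

// Which side of an object the test cares about; the string is the app's low level value.
enum Side {
    FRONT("front"), BACK("back");
    private final String domValue;
    Side(String domValue) { this.domValue = domValue; }
    String domValue() { return domValue; }
}

// Which cookie the test cares about; the int is the option's index in the app.
enum Cookie {
    CHOCOLATE_CHIP(1), OATMEAL(2), PEANUT_BUTTER(3);
    private final int optionIndex;
    Cookie(int optionIndex) { this.optionIndex = optionIndex; }
    int optionIndex() { return optionIndex; }
}

class CookieOrderPage {
    // Stand-in for the browser: records what would be sent to WebDriver.
    final List<String> performed = new ArrayList<>();

    // The method signature speaks in enums; the translation stays inside the page object.
    void showSide(Side side) {
        performed.add("click #side-" + side.domValue());
    }

    void chooseCookie(Cookie cookie) {
        performed.add("select option index " + cookie.optionIndex());
    }
}
```

A test then reads page.chooseCookie(Cookie.OATMEAL), and if the app later moves oatmeal to a different index, only the enum changes.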

Proper page object modeling

Here are some of my thoughts on what properly modeling a page object should be like. I didn’t do an extensive search on page objects online, but the example presented by the Selenium project is a little too bare/minimalistic to guide a novice tester not fully acquainted with OOP/D. Hopefully, what I present here is a bit more thorough in clarifying page object modeling with respect to OOP/D.

Any and all page objects should contain a set of the following types of methods that define the overall behavior of the page:

  • state reporting methods like isSomethingInSomeState(), isPageInSomeState(). Examples can be isColorSelected(“red”), isFontSelected(“Times New Roman”) or amIOnThisStepInwizard(2). With such methods, you can then call from the test case to assert against expected state, the return value of the method being the actual value that you compare against the expected. Alternatively, you might call the method to check the state and then perform the appropriate if/else/switch statement branching, such that if in state A do this, if in state B do that. This is a type of method I’ve noticed some people miss. They don’t create any such methods because we generally aim for automating the happy/basic path test cases and don’t plan for robust regression testing that should involve state checking. The happy path tests generally assume state and indirectly test state via other actions (luckily that does happen); otherwise, there would be critical missed test coverage. Improper usage I’ve seen in this area is where, instead of having a proper state checking method, people simply check an attribute or text of a locator (exposed publicly) as a way to check state.
  • read (or data extraction) methods like getThisFromThat() or getThisOfThat(), where we extract out data on page via page object to then assert against. The extracted data is the actual value to assert against some expected value. An example method could be something like getPriceOfCookies(COOKIES.CHOCOLATE_CHIP) where page shows a table of cookies and their prices, and COOKIES.CHOCOLATE_CHIP is an enum constant value defining chocolate chip cookies. The method internally uses the enum to figure out what locator to extract price value from. Improper use in this area I’ve seen is where people expose locators publicly and extract data off it rather than abstract it within a read method.
  • write (or action) methods like doThis(), doThat(), doTheseMultipleThings(), where the multiple things method could be invoking the multiple things in sequence (serially) or concurrently in parallel. This is the simplest type of method people would be comfortable with creating methods for to do certain actions or groups of actions on a page like filling in form fields, clicking buttons & links, selecting menu items that involve mouse over dropdown menus, etc. The only issues of bad usage I’ve seen here are the problems reported earlier (e.g. too low level like clickThis(), hard coded data, convenience methods).
  • parameterization of locators and methods. By parameterization for methods, we mean they should not have hard coded values. All input should be taken as parameter arguments to the method, and page class member property constants/variables. Anything sort of hard coded in the method might be for template constructs to form a locator or test data. But nothing truly hard coded. By parameterization for locators, we mean where applicable, don’t statically define locators by a static ID/name/XPath/CSS/etc. With true OOP/D, you’ll notice that certain locators form a logical grouping like icons on page for red, blue, green color, or a dropdown menu of font names. If you carefully inspect the HTML/DOM source, you’ll notice that these locators share common attributes. In essence, you can define them with a regular expression (or string or positional index matching) type of locator using XPath or CSS selectors. So then you can templatize/parameterize the locator as a static expression with some dynamic variable inserted into it somewhere (in front, at end, in middle, etc.) to form the actual locator at runtime. The alternatives to parameterizing locators are (1) offering fixed test coverage (e.g. we’ll only test so much with this statically defined subset of color or font locators, we won’t test all possible colors or fonts with data driven testing), (2) using a static set of locators defined individually as scalar variables or as a set within an array/dictionary/hash/map. Alternative 1 doesn’t offer future scalability in full data driven test coverage; alternative 2 makes it a hassle to maintain a large set of locators, also expanding your lines of code unnecessarily. The trade-off for locator parameterization, though, is that your locators can become more complex (for those not well familiar with complex XPath and CSS) and possibly slower, as you sometimes have to use XPath or CSS over ID, name, etc.
And it allows for better page behavior modeling as you can offer something like isColorSelected(“color name as string or enum constant”) vs isRedSelected(), isBlueSelected(), etc. While you can still take the former approach of isColorSelected(value) with alternative 2 (array/map of locators), it ends up being a large block of if/else or switch case statements to filter through all possible values in the set, rather than a single or few lines call to manipulate the correct locator by dynamically inserting/injecting the passed in value into the parameterized locator (template). Last, alternative 1 for locators is in my opinion not desirable, as you limit your extensibility and expanded test coverage in your framework (via data driven testing) in the future. Say the set of colors or fonts were switched around in the app that you use for testing: you have to keep updating them in the framework, whereas if it was built well with parameterization, you won’t have to worry about it; you just update the test case data to match what the app offers, and the framework remains the same. In good test design, we want to avoid updating the test framework and prefer to update test cases (flow and/or data), as the latter matches manual test use case expectation (what does the test framework have to do with testing, with respect to use cases and manual testing?). I do know that one reason parameterized locators are not used by some people may be because they aren’t familiar enough with XPath and CSS selectors to utilize the functionality in cases where it is hard to parameterize without them, in which case you would be stuck with alternative 1 or 2.
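
As a rough illustration of a parameterized locator behind a state reporting method, here is a minimal Java sketch. The markup it assumes (a data-color attribute and a “selected” CSS class) is hypothetical, as are the class and field names; the `attributeReader` function is a stand-in for the browser so the sketch runs without Selenium (with WebDriver it would be roughly a findElement by XPath followed by reading the attribute).

```java
import java.util.Arrays;
import java.util.function.BiFunction;

class ColorPickerPage {
    // Locator template: one expression covers every color icon on the page.
    // It stays private, so tests never see the locator itself.
    private static final String COLOR_ICON_XPATH = "//span[@data-color='%s']";

    // Stand-in for the browser: given (xpath, attribute name), returns the attribute value.
    private final BiFunction<String, String, String> attributeReader;

    ColorPickerPage(BiFunction<String, String, String> attributeReader) {
        this.attributeReader = attributeReader;
    }

    // Builds the actual locator at runtime by injecting the color into the template.
    private String colorIconLocator(String color) {
        return String.format(COLOR_ICON_XPATH, color);
    }

    // State reporting method: the test asserts on the returned boolean.
    boolean isColorSelected(String color) {
        String cssClass = attributeReader.apply(colorIconLocator(color), "class");
        return cssClass != null
                && Arrays.asList(cssClass.trim().split("\\s+")).contains("selected");
    }
}
```

One method plus one template replaces isRedSelected(), isBlueSelected(), and so on, and a data driven test can pass in any color the app offers.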

To conclude, this may or may not be a lot to expect of a QA person who works with Selenium, particularly with page objects and/or a functional test framework. But writing (real) test automation itself is software development, so technically those who work with proper Selenium automated tests should be “Software Development Engineers in Test” (SDET), not really Software Quality Assurance (QA) Engineers since you’re writing software for testing.

For those who still don’t (fully) understand what I’m talking about after reading this post, you need to learn more about OOP/D as well as how to define functions and methods in programming.

Selenium file download by code and request for more platform options

23 Aug

I see in the Selenium discussion groups, the following has been a frequent topic:

How to download file with Selenium/WebDriver?

particularly a cross-browser, cross-OS solution, or when the FF/Chrome profile approach doesn’t work for auto-downloading a file to some location (w/o prompting) and AutoIt, etc. are not perfect.

The typical suggestion is to do it in external code via the HTTP libraries for your language platform. But so far, the only suggestion is for Java:

Being an idealist, I think it would be nice if there were more options besides Java. And furthermore, could provide more library code for integrated testing to do more than just download files. I’m thinking along the lines of sharing session cookies to make HTTP POSTs and other REST API calls outside of Selenium but that require a session that is started from and available with Selenium. Think functional GUI + API testing together.

On that note at least, here’s a start for similar example to the Java one in PHP:

Now who’s with me in trying to offer more to the community than just Java (and PHP)? We should have equivalent examples in .NET, Python, Ruby, etc.

Update 4/6/2015:

A Python version can be derived from this:

and an alternate simple gist for Java:

Optimizing Selenium tests with HTTP requests

23 Aug

I found a Selenium talk by Santi from SauceLabs interesting:

and wanted to add a comment/elaborate on one of the topics.

For optimizing Selenium tests by bypassing GUI for nonessential actions (e.g. test data/state setup, etc.), you don’t necessarily need:

  • special APIs
  • hidden test pages to do the state/data setup
  • or go through the database & memcache config route

Since we’re talking about the web applications space, the first approach one should have in mind is what other methods can I test the web app with without using GUI test tools or involving a developer to provide access hooks? Or perhaps phrased differently, how else would you load/performance test the web app w/o using Selenium (Grid)?

For the latter question, ask load test pros, and they’ll probably tell you to use a tool like JMeter. Consider how JMeter is used, and going back to the former question, the answer would be to use HTTP requests (normally GETs and POSTs).

And for these HTTP requests, you’re simulating a user and making the same requests but without a browser/Selenium. This also offers the benefit of test coverage closer to what Selenium gives you, since all that you’re missing is the rendering of the page and JavaScript/AJAX by a browser, while all the client-server interactions are replicated by the HTTP requests. Therefore you don’t need special (REST) APIs created or need to involve a developer (unless you lack expertise to work with HTTP requests in code). All the APIs needed are already part of the web app / website itself. You just need to find out the API (i.e. HTTP requests). How to do that?

Step 1, check if there’s existing documentation already; it saves you time and serves as a useful reference. If none is available, go to step 2: reverse engineer the web app from the API/HTTP level. To do this, you would use tools like the network analysis feature of browser developer tools (for Chrome/Safari), Firebug (for Firefox), or tools like Fiddler, Wireshark, or a web proxy that logs HTTP requests. For IE, in addition to the IE developer tools, there’s also a handy tool called ieHttpHeaders.

With these tools you spy on the network traffic that occurs whenever you perform actions like click a link/button, submit a form, login, etc. You’ll see the actual HTTP (GET or POST) requests that were made, the data that was sent along with them, any associated HTTP headers and cookies (cookies are where the session state is stored), and the response from the server.

Once you make note of the requests you need to make, the type of data to pass along, the cookies you need to create/send or extract from response, you can replicate them back using code with the HTTP libraries offered for your programming language.
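
For instance, a form submission spotted in the network trace could be replayed roughly like this with the JDK's built-in HTTP client classes (Java 11+). This is a sketch under assumptions: `FormPostReplay`, the URL, and the field names are hypothetical, and a real app may expect extra headers or hidden fields that you would copy from the trace.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

class FormPostReplay {
    // Encodes form fields the way a browser does for a regular form submit.
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Builds (but does not send) the POST request observed in the trace.
    static HttpRequest buildFormPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }
}
```

Sending it is then a matter of passing the request to an HttpClient and reading the response, including any Set-Cookie headers you need to carry forward.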

In addition, this works both ways in that you could first use HTTP code to set up session cookie and make configurations then extract cookie and pass to Selenium, or you could have Selenium start some work, then extract Selenium cookie(s) and pass to HTTP code to do some other work where it’s optimal over Selenium.
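
The Selenium-to-HTTP direction can be sketched as follows, again with JDK classes only (Java 11+). `SessionCookieBridge` and its method names are hypothetical, and the cookie map is a stand-in for the name/value pairs you would read from the WebDriver session's cookies.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.StringJoiner;

class SessionCookieBridge {
    // Joins cookie name/value pairs into a single Cookie request header value.
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            header.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return header.toString();
    }

    // Builds (but does not send) a GET request that reuses the browser's session,
    // e.g. to download a file without going through the browser's download dialog.
    static HttpRequest buildSessionGet(String url, Map<String, String> cookies) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Cookie", cookieHeader(cookies))
                .GET()
                .build();
    }
}
```

The request can then be sent with an HttpClient and a body handler that writes to a file. The opposite direction works the same way in reverse: parse the cookies from the HTTP response and add them to the WebDriver session before navigating.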

It’s some amount of work to provide good code/test examples, but you can find some basic examples at least on how to extract cookie from Selenium to then pass to HTTP code (say to do a file download):

Java example for WebDriver

PHP example for Selenium RC

and last, this article I wrote a while back is also a good example and code demo of how to spy for HTTP requests from browser session then convert the traces from the manual exploration into code that makes same HTTP requests, which can be adapted for Selenium and for other languages (original is in Perl):

A Selenium IDE alternative for other browsers and another record & playback method

3 May

I see quite a few posts about Selenium IDE support for IE, etc.

While there’s no such thing, and as such no record & playback support for those browsers, I just wanted to mention an alternative:

It does require you to know Selenium RC or WebDriver API to make use of. But it also allows a form of record & playback as well.

Read my other posts for background reference:

Using Selenium with interactive interpreter shells?

Developing and debugging in Java via an interpreter

Using the interpreter shell method, you can try things out in shell (~using Selenium IDE but with RC/WebDriver commands) and when you have what you want, copy & paste to file and clean up to make into a script (Python, Ruby, Perl, PHP) or then compile code (Java, .NET/C#) (~record), and use that output to run commands again (~playback).

This method will work for whatever browser Selenium supports making it sort of an IDE that will work for the non-FF browsers.

And this method also allows you to test/verify Selenium locators. You can’t find them using this method (you need Firebug, or other tools for IE and other browsers), but you can test them using this method to make sure the locators work. And you can also run Firebug and other plugins while using this method, so combine those tools with this method and you can find and test locators at the same time and know Selenium will work with them, with the exception of timing problems that don’t show up under this method (compared to scripts/code).

One other benefit of this method, is it can help you find problems in other browsers like quirks for locators, windows, and other things, if you develop in the other browsers using this method rather than always starting with Firefox.

Update 1/23/2014:

Besides Selenium Builder, and my interpreter shell method above, there is a new alternative option for a Selenium IDE, see SWD Recorder, read a blog post review of it, or watch the video about it. It’s not an exact replacement but a similar one and is cross browser supported.
