Getting a list of all link URLs on a page with Selenium

You can get a list of all the URLs on a page by using the getHtmlSource command to retrieve the page’s complete HTML source. You can then use either a regular expression or an HTML parser to extract what you need; in this case, all of the a href elements on the page.
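As a sketch of the HTML parser route, the class below uses the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) to collect the href value of every a element in a string of HTML, such as the one getHtmlSource returns. The class name HrefParser and the method extractHrefs are hypothetical, and the JDK parser only understands HTML 3.2, so treat this as a starting point rather than a robust scraper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; not part of Selenium.
public class HrefParser {

    // Returns the href value of every <a> start tag in the HTML string,
    // in document order, using the HTML 3.2 parser bundled with the JDK.
    public static List<String> extractHrefs(String html) {
        final List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) { // skip anchors without an href, e.g. <a name="top">
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        try {
            // true: ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Reading from a StringReader should not fail, but parse() declares IOException
            throw new UncheckedIOException(e);
        }
        return hrefs;
    }
}
```

Inside a Selenium test you would call it as HrefParser.extractHrefs(selenium.getHtmlSource()). A parser is generally more tolerant of quoting and attribute-order variations than a hand-written regular expression.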

If you’re using Selenium RC with Java, you can add the following code to your script.

This opens yahoo.com in Safari, gets every link on the page, and prints the output to your console. If you call this for every page that loads, the console output can get long quickly when you’re dealing with multiple pages, so it’s better to write the output to a text file.

If you’re using JUnit reports, you can easily have all of that information written to a text file. If you’re new to reporting, please see my post on how to set up JUnit reporting in Eclipse with JUnit and Ant.

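import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.Selenium;

// Class name is illustrative; use whatever name fits your project.
public class LinkListTest {

    private Selenium selenium;

    @Before
    public void setUp() throws Exception {
        // Start a Selenium RC session (server on localhost:4444) driving Safari against yahoo.com
        selenium = new DefaultSelenium("localhost", 4444, "*safari", "http://www.yahoo.com");
        selenium.start();
    }

    @Test
    public void testUntitled() throws Exception {
        selenium.open("/");
        selenium.waitForPageToLoad("20000");

        // Get the full HTML source of the loaded page
        String htmlSource = selenium.getHtmlSource();

        // Match every <a ... href="..."> ... </a> element, ignoring case and spanning line breaks
        Pattern linkElementPattern = Pattern.compile("<a[^>]*href=\"[^>]*>(.*?)</a>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher linkElementMatcher = linkElementPattern.matcher(htmlSource);
        while (linkElementMatcher.find()) {
            // group() is the whole matched <a> element, including its href and link text
            System.out.println(linkElementMatcher.group());
        }

        // Print the URL of the page the links were taken from
        String myurl = selenium.getLocation();
        System.out.println(myurl);
    }

    @After
    public void tearDown() throws Exception {
        selenium.stop();
    }
}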
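To send the links to a text file instead of the console, one option is a small helper that you call once per page. The sketch below assumes a hypothetical LinkReport class with an appendLinks method and a report path of your choosing. It uses a variant of the regular expression from the test, with a capture group around the href value so that only the URL is written, and it writes a header line with the page URL before each page’s links.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Selenium or JUnit.
public class LinkReport {

    // Captures the double-quoted href value of each <a ...> tag (case-insensitive).
    private static final Pattern HREF_PATTERN =
            Pattern.compile("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the href values found in the HTML source, in document order.
    public static List<String> extractHrefs(String htmlSource) {
        List<String> hrefs = new ArrayList<>();
        Matcher matcher = HREF_PATTERN.matcher(htmlSource);
        while (matcher.find()) {
            hrefs.add(matcher.group(1));
        }
        return hrefs;
    }

    // Appends "== <pageUrl>" followed by one line per link, creating the file if it does not exist.
    public static void appendLinks(Path reportFile, String pageUrl, String htmlSource) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("== " + pageUrl);
        lines.addAll(extractHrefs(htmlSource));
        Files.write(reportFile, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

In testUntitled, after the page loads, you could call LinkReport.appendLinks(Paths.get("links.txt"), selenium.getLocation(), selenium.getHtmlSource()) (with java.nio.file.Paths imported); the file name links.txt is only an example. Appending rather than overwriting keeps the results from every page in one file.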
