A comprehensive, in-depth guide that teaches not just how to use Selenium, but how to architect reliable, maintainable, and scalable test automation solutions for modern web applications.
Selenium WebDriver Tutorial - Outline

1. Introduction to Selenium
● What is Selenium?
● History and Evolution
● Components of Selenium Suite
● Advantages and Limitations

2. Setting Up the Environment
● Installing Java/Python
● Installing IDE (Eclipse, IntelliJ, VS Code)
● Downloading and Configuring WebDriver
● Setting Up Browsers (Chrome, Firefox, Edge)

3. First Selenium Script
● Writing Your First Test
● Explaining WebDriver Methods
● Running Tests on Different Browsers

4. Locating Web Elements
● By ID, Name, Class, Tag, Link Text
● XPath and CSS Selectors
● Dynamic Locators
● Best Practices

5. WebDriver Commands
● Browser Commands
● Navigation Commands
● WebElement Commands (Click, SendKeys, etc.)
● Handling Waits (Implicit, Explicit, Fluent)

6. Advanced User Interactions
● Handling Alerts
● Working with Frames and Windows
● Drag & Drop, Mouse Hover, Keyboard Actions

7. Handling Web Forms and Tables
● Input Fields, Dropdowns, Checkboxes
● Reading Web Tables
● Form Submission Scenarios

8. Page Object Model (POM)
● What is POM?
● Benefits of POM
● Implementing POM in Framework

9. Data-Driven Testing
● Reading Data from Excel or CSV
● Using TestNG or PyTest Data Providers
● Parameterization

10. Framework Development
● TestNG/JUnit Integration
● Logging with Log4j
● Reporting with Extent Reports/Allure

11. Parallel and Cross-Browser Testing
● Introduction to Grid
● Selenium Grid Setup
● Running Tests in Parallel

12. Continuous Integration with Selenium
● Integrating with Jenkins
● Triggering Tests from CI/CD Pipelines
● Reporting and Notifications

13. Troubleshooting and Best Practices
● Debugging Selenium Tests
● Common Errors and Fixes
● Optimizing Test Performance

14. Selenium with Other Languages (Optional)
● Selenium with Python/Ruby/C#
● Language-Specific Nuances

15. Real-Time Project
● End-to-End Test Case
● Folder Structure
● Code Walkthrough

Appendices
● Selenium WebDriver API Reference
● Useful Tools & Browser Add-ons
● Interview Questions & Answers

Chapter 1: Introduction to Selenium
1.1 What is Selenium?

Selenium is a powerful, open-source framework for automating web browsers. Primarily used for testing web applications, Selenium lets testers simulate user interactions such as clicking buttons, entering text, and navigating between pages, just as a real user would in a browser.

Unlike many tools tied to a specific browser or operating system, Selenium supports multiple browsers (Chrome, Firefox, Safari, Edge) and cross-platform execution (Windows, macOS, Linux). This flexibility makes it a preferred choice for UI automation in web development and QA.

1.2 Evolution of Selenium

Selenium has come a long way since its inception. Here's a brief timeline:
● Selenium Core (2004): The original JavaScript-based framework developed by Jason Huggins. Limited due to same-origin policy restrictions.
● Selenium RC (Remote Control): Introduced to bypass same-origin issues using a server as a proxy.
● Selenium WebDriver (2008): A complete rewrite, WebDriver directly controls the browser, providing more native support and better performance.
● Selenium Grid: Allows running tests in parallel across different browsers and systems.
● Selenium 4 (Latest major release): Includes W3C WebDriver standard support, better debugging tools, relative locators, and enhanced documentation.

Today, Selenium WebDriver is the core component used for browser automation, and it integrates well with modern CI/CD tools and frameworks.

1.3 Components of the Selenium Suite

Selenium is not a single tool but a suite of tools:

Component            Description
Selenium IDE         A record-and-playback tool; great for quick demos or prototypes. Not suited for complex testing.
Selenium RC          Now deprecated. Allowed writing test scripts in various languages but required a proxy server.
Selenium WebDriver   Most widely used; allows test scripts in Java, Python, C#, Ruby, and more. Controls browsers natively.
Selenium Grid        Supports distributed test execution across multiple environments. Useful for large-scale testing.

1.4 Why Selenium?

Open Source and Free
There's no licensing cost involved. Anyone can download, use, or even contribute to its source code.

Language Flexibility
You can write Selenium tests in Java, Python, C#, Ruby, JavaScript, and more, whichever language best suits your team or project.

Cross-Browser and Cross-Platform
Supports all major browsers and works on Windows, Linux, and macOS. This makes it a strong choice for testing web apps with broad user bases.

Integration Friendly
Easily integrates with tools like:
● TestNG / JUnit / PyTest for unit testing
● Maven / Gradle for build management
● Jenkins / GitHub Actions for CI/CD
● ExtentReports / Allure for test reporting

Scalable
Selenium Grid and third-party platforms like Testgrid, BrowserStack, or Sauce Labs allow scaling up test runs across multiple machines and browsers.

1.5 Limitations of Selenium

Despite its strengths, Selenium has some limitations:
● Only for Web Applications: Cannot test desktop or mobile native apps directly.
● No Built-in Reporting: Requires third-party libraries or frameworks for test result visualization.
● Steep Learning Curve: Beginners may need time to grasp concepts like locators, synchronization, and framework integration.
● Flaky Tests: Tests may occasionally fail due to timing issues, requiring robust wait strategies.

These challenges can be mitigated with best practices, reusable frameworks, and reliable infrastructure.

1.6 Who Uses Selenium?

Selenium is widely adopted by companies across industries for functional, regression, and smoke testing of web apps. Common user groups include:
● QA Engineers and Automation Testers
● Software Development Engineers in Test (SDET)
● DevOps Engineers (in CI/CD workflows)
● Technical Leads for validation at scale

Popular companies that rely on Selenium include Netflix, LinkedIn, Salesforce, and Amazon.

1.7 Real-World Example

Imagine you're testing an e-commerce website. You might want to verify:
● A user can search for a product
● The product details page loads correctly
● The user can add the product to cart
● Checkout functionality redirects to payment gateway

Selenium allows you to write automated test scripts that simulate all of the above actions, speeding up testing and reducing manual effort.

For a comprehensive guide on using Selenium WebDriver effectively, Testgrid offers valuable insights through its Selenium WebDriver tutorial on their blog. It's a great resource to enhance your automation testing knowledge and improve your Selenium workflows.
Chapter 2: Setting Up the Environment

Before writing Selenium tests, we need to prepare the development environment. This involves installing programming languages, selecting an IDE, and configuring browser drivers. The setup process may vary slightly based on whether you're using Java, Python, or another language, but the overall structure remains consistent.

2.1 Choosing the Programming Language

Selenium supports multiple languages, including Java, Python, C#, Ruby, and JavaScript. The most commonly used languages for Selenium automation are Java and Python, due to their community support and extensive documentation.

For this book, we'll demonstrate examples in both Java and Python, so you can follow along with the language of your choice.

2.2 Installing the Prerequisites

Java Setup

If you're using Java:
1. Install JDK (Java Development Kit):
○ Download from the Oracle or OpenJDK site.
○ Set the JAVA_HOME environment variable.
○ Add %JAVA_HOME%\bin to the system PATH.
○ Verify installation:
java -version
2. Install an IDE:
○ Recommended: Eclipse or IntelliJ IDEA
○ Download and install the IDE from the official site.

Python Setup
If you're using Python:
1. Install Python:
○ Download the latest version from python.org.
○ During installation, check the option "Add Python to PATH".
○ Verify installation:
python --version
2. Install an IDE or Code Editor:
○ Recommended: PyCharm, VS Code, or Jupyter Notebook
3. Install Selenium Library:
pip install selenium

2.3 Installing Web Browsers

Selenium interacts directly with browsers. Install the latest stable versions of at least two major browsers (e.g., Google Chrome and Mozilla Firefox) for testing.
● Google Chrome: https://www.google.com/chrome/
● Mozilla Firefox: https://www.mozilla.org/firefox/
● Microsoft Edge: https://www.microsoft.com/edge

2.4 Downloading WebDriver Executables

Each browser needs a WebDriver executable, which acts as a bridge between Selenium and the browser. Make sure to download a version that matches your browser version.

Note: Selenium 4.6 and later include Selenium Manager, which can download a matching driver automatically when none is found on the PATH. The manual steps below are mainly needed for older Selenium versions or machines without internet access.

ChromeDriver
1. Find your Chrome version: Go to chrome://settings/help
2. Download the corresponding ChromeDriver: https://sites.google.com/chromium.org/driver/ (for Chrome 115 and later, drivers are listed at https://googlechromelabs.github.io/chrome-for-testing/)
3. Extract and place the executable in a known directory.
4. Add it to your system's PATH or specify its location in your script.

GeckoDriver (Firefox)
1. Download from: https://github.com/mozilla/geckodriver/releases
2. Unzip and add to system PATH.

EdgeDriver
1. Check your Edge version.
2. Download from: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/

2.5 Setting Up a Java Project with Selenium

Here's how to create a basic Selenium project in Java using Eclipse:
1. Open Eclipse and create a new Java Project.
2. Add Selenium libraries:
○ Download the Selenium Java client from: https://www.selenium.dev/downloads/
○ Include the .jar files in your project's Build Path.
3. Write a simple script:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class FirstTest {
    public static void main(String[] args) {
        // Tell Selenium where the driver binary lives (not needed if it is on PATH)
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com");
        System.out.println("Title: " + driver.getTitle());
        driver.quit();
    }
}

4. Run the program. The browser should launch and load the URL.

2.6 Setting Up a Python Project with Selenium

For Python, use VS Code or PyCharm:
1. Create a new Python file, e.g., test_example.py
2. Add the following code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service("path/to/chromedriver"))
driver.get("https://example.com")
print("Title:", driver.title)
driver.quit()

3. Run the script:
python test_example.py

If your chromedriver is in the system PATH, you can call webdriver.Chrome() with no arguments. The executable_path argument shown in older tutorials is deprecated in Selenium 4 and no longer accepted by recent releases.

2.7 Folder Structure (Best Practice)

As your project grows, organize it well:

selenium_project/
│
├── drivers/    # WebDriver executables
├── tests/      # Test scripts
├── pages/      # Page Object classes
├── data/       # Test data files (Excel, JSON, CSV)
├── reports/    # Test reports
└── utils/      # Helper functions

This structure is scalable and supports long-term maintenance.

2.8 Common Issues During Setup

● Driver Version Mismatch: Always match WebDriver with your browser version.
● PATH Errors: If you get "driver not found" errors, confirm that PATH is correctly set.
● Permissions: On Linux/macOS, give executable permission using chmod +x chromedriver.
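The driver version mismatch described in section 2.8 can be caught before a test run by comparing the major versions that the browser and the driver report. The following is a minimal sketch, assuming Linux or macOS where `google-chrome --version` and `chromedriver --version` print a line containing a dotted version number. The helper names `major_version`, `versions_match`, and `read_version` are hypothetical and not part of Selenium.

```python
import re
import subprocess


def major_version(version_output):
    """Extract the major version from text such as 'ChromeDriver 120.0.6099.109 (abc)'."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(browser_output, driver_output):
    """Return True when browser and driver report the same major version."""
    return major_version(browser_output) == major_version(driver_output)


def read_version(command):
    """Run a '--version' command and return its stdout.

    Assumes the binary named in `command` is on the PATH.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

A possible call, assuming both binaries are on the PATH, is `versions_match(read_version(["google-chrome", "--version"]), read_version(["chromedriver", "--version"]))`. The binary name for Chrome differs between systems, so treat the command as a placeholder.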
Chapter 3: First Selenium Script

With your environment set up, it's time to write your first Selenium test. This chapter walks you through writing and executing a simple script in both Java and Python, explaining each line along the way.

3.1 What This Script Will Do

We'll create a simple script that:
1. Opens a browser
2. Navigates to https://example.com
3. Fetches the page title
4. Prints the title in the console
5. Closes the browser

3.2 Writing the First Script in Java

Let's start with Java, using ChromeDriver.

Prerequisites:
● Java installed
● Chrome browser installed
● Selenium .jar files added to the project
● ChromeDriver in your system PATH or specified in your script

Java Code:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class FirstTest {
    public static void main(String[] args) {
        // Set path to chromedriver if not added to PATH
        System.setProperty("webdriver.chrome.driver", "C:\\drivers\\chromedriver.exe");

        // Initialize WebDriver
        WebDriver driver = new ChromeDriver();

        // Navigate to the website
        driver.get("https://example.com");

        // Get and print the title
        String title = driver.getTitle();
        System.out.println("Page Title: " + title);

        // Close the browser
        driver.quit();
    }
}

Explanation:
● System.setProperty(...): Sets the path for the ChromeDriver executable.
● new ChromeDriver(): Launches a new Chrome browser window.
● get(): Navigates to the provided URL.
● getTitle(): Returns the page title.
● quit(): Closes all browser windows and ends the session.

3.3 Writing the First Script in Python

If you're using Python, the process is very similar.
Prerequisites:
● Python and pip installed
● Selenium installed via pip install selenium
● Chrome browser and ChromeDriver installed

Python Code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Create a new Chrome browser instance, pointing Selenium at the driver binary
driver = webdriver.Chrome(service=Service("C:/drivers/chromedriver.exe"))

# Open the target website
driver.get("https://example.com")

# Print the title
print("Page Title:", driver.title)

# Close the browser
driver.quit()

Note: If chromedriver is in your system PATH, you can simply use webdriver.Chrome() with no arguments. The executable_path argument shown in older tutorials is deprecated in Selenium 4 and no longer accepted by recent releases.

3.4 Running the Script

In Java (Eclipse or IntelliJ):
● Right-click on the file and select Run As > Java Application.
● You should see a browser window open, load the page, and then close after printing the title in the console.

In Python (VS Code or terminal):
● Run the file in your terminal:
python first_test.py
● The browser should launch, display the page, and then close.

3.5 Tips for First-Time Execution

● Browser Closes Too Fast: If your browser closes too quickly to see anything, insert time.sleep(5) (Python, after import time) or Thread.sleep(5000) (Java) before quit().
● Driver Exceptions: Make sure the version of ChromeDriver matches your installed browser version.
● Permission Issues: On macOS/Linux, make the driver executable:
chmod +x chromedriver

3.6 Customizing the Script

You can expand your basic script by:
● Navigating to a different site
● Extracting text using locators (we'll cover these in the next chapter)
● Taking screenshots with driver.save_screenshot() (Python) or TakesScreenshot in Java

Example (Python):
driver.save_screenshot("screenshot.png")

Example (Java; FileUtils comes from Apache Commons IO):
File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
FileUtils.copyFile(screenshot, new File("screenshot.png"));
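Screenshots are most useful when they are captured at the moment a step fails. The sketch below shows one way to do that in Python. The helper names `screenshot_name` and `run_with_screenshot` are hypothetical; the only WebDriver call used is `driver.save_screenshot()`, and `step` is assumed to be any function that accepts the driver.

```python
from datetime import datetime


def screenshot_name(prefix, moment):
    """Build a timestamped file name such as 'failure_20240102_030405.png'."""
    return f"{prefix}_{moment:%Y%m%d_%H%M%S}.png"


def run_with_screenshot(driver, step, prefix="failure"):
    """Run step(driver); if it raises, save a screenshot and re-raise the error."""
    try:
        return step(driver)
    except Exception:
        # Capture the browser state before the exception propagates
        driver.save_screenshot(screenshot_name(prefix, datetime.now()))
        raise
```

In a script like the one in section 3.3, a step could be written as `run_with_screenshot(driver, lambda d: d.get("https://example.com"))`. Wrapping the whole script in try/finally with `driver.quit()` in the finally block keeps the browser from staying open after a failure.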
Chapter 4: Locating Web Elements

Automated testing with Selenium isn't just about opening a browser and visiting pages. The real power comes from interacting with elements on those pages: buttons, text fields, links, dropdowns, and more. To interact with any web element, Selenium first needs to locate it.

This chapter focuses on one of the most critical aspects of Selenium automation: locating web elements reliably. We'll explore all major locator strategies, how and when to use them, and what best practices to follow to avoid flaky tests.

4.1 What Are Web Elements?

In a web application, a web element is any component you can interact with in a browser:
● Buttons
● Links
● Input fields
● Checkboxes
● Drop-down menus
● Images
● Tables
● Alerts

Selenium allows you to identify these elements using attributes in their underlying HTML code. Once an element is located, you can perform actions such as clicking, entering text, selecting options, etc.

4.2 The Importance of Locators

Think of locators as the address of a house. If your address is precise, the mail gets delivered to the right door. If not, it gets lost. The same principle applies to element locators in Selenium.

Poorly chosen locators are one of the most common causes of test failures in UI automation. Well-crafted locators ensure that your tests are:
● Accurate (point to the right element)
● Stable (not easily broken by small UI changes)
● Maintainable (easy to update if needed)

4.3 Types of Locator Strategies

Selenium provides multiple methods to locate elements on a webpage. Each comes with its strengths and ideal use cases. The Java examples assume import org.openqa.selenium.By; the Python examples assume from selenium.webdriver.common.by import By.

1. ID
Java: driver.findElement(By.id("username"));
Python: driver.find_element(By.ID, "username")
When to use: Use this if the element has a unique and static id attribute.
Pros: Fast and reliable (ID is supposed to be unique).
Cons: Sometimes developers use dynamic IDs that change each session.

2. Name
Java: driver.findElement(By.name("email"));
Python: driver.find_element(By.NAME, "email")
When to use: When the element has a name attribute and it's unique on the page.
Pros: Readable and easy to use.
Cons: Not always unique; less reliable than ID.

3. Class Name
Java: driver.findElement(By.className("btn-primary"));
Python: driver.find_element(By.CLASS_NAME, "btn-primary")
When to use: Use when the class is specific to that element or uniquely identifies it.
Pros: Convenient for buttons and styling elements.
Cons: Avoid if the class is shared among multiple elements. Only a single class name is accepted; for several classes at once, use a CSS selector.

4. Tag Name
Java: driver.findElement(By.tagName("input"));
Python: driver.find_element(By.TAG_NAME, "input")
When to use: Use sparingly; mostly helpful when you're looking for all inputs, images, or links.
Pros: Good for finding groups of similar elements.
Cons: Usually not specific enough alone.

5. Link Text and Partial Link Text
Java: driver.findElement(By.linkText("Login"));
Python: driver.find_element(By.LINK_TEXT, "Login")
When to use: When you're interacting with hyperlinks and the text is visible and unique.
Partial Link Text allows matching just a part of the visible link:
Python: driver.find_element(By.PARTIAL_LINK_TEXT, "Log")
Pros: Good for navigation.
Cons: Breaks if link text changes or is localized.

6. CSS Selector
Java: driver.findElement(By.cssSelector("input[type='text']"));
Python: driver.find_element(By.CSS_SELECTOR, "input[type='text']")
When to use: CSS Selectors are highly flexible and powerful for complex DOM structures.
Pros: Precise and concise.
Cons: Slightly harder to read and learn for beginners.

7. XPath
Java: driver.findElement(By.xpath("//input[@id='username']"));
Python: driver.find_element(By.XPATH, "//input[@id='username']")
When to use: XPath is the most powerful locator strategy and can traverse the entire DOM.
Pros: Can find deeply nested elements, supports complex queries.
Cons: Can become long and brittle if not used wisely.

4.4 Understanding XPath in Detail

XPath is like a map through the HTML structure of a page. It supports two styles:
● Absolute XPath (not recommended): /html/body/div[2]/form/input[1]
● Relative XPath (preferred): //input[@id='email']

Common XPath patterns:

Pattern                               Meaning
//tagname                             Selects all elements with the given tag
//div[@class='login-box']             Selects div with a specific class
//input[@type='text']                 Input elements of type text
//a[contains(text(), 'Register')]     Links containing "Register"
//button[@disabled]                   Buttons with disabled attribute
//div[@class='menu']//a               Nested anchors inside menu div

4.5 Which Locator Should You Use?

Recommended Priority:
1. id
2. name
3. cssSelector
4. xpath
5. className
6. linkText

Use XPath or CSS Selectors only when simpler locators like ID and Name are not available or reliable.

4.6 Practical Examples

Let's use a sample login form:

<form id="loginForm">
  <input type="text" id="username" name="user">
  <input type="password" name="pass">
  <button class="btn login">Login</button>
</form>

Example locators:
● By ID: By.id("username")
● By Name: By.name("pass")
● By Class: By.className("login")
● By XPath: //button[@class='btn login']
● By CSS: form#loginForm input[name='user']

4.7 Locating Multiple Elements

Sometimes you need to capture all matching elements:

links = driver.find_elements(By.TAG_NAME, "a")
for link in links:
    print(link.text)

Or in Java:

List<WebElement> links = driver.findElements(By.tagName("a"));
for (WebElement link : links) {
    System.out.println(link.getText());
}

This is useful for validating menus, lists, or table data.

4.8 Best Practices for Locating Elements

● Prefer unique attributes like id, name, or custom data attributes.
● Avoid brittle locators based on element index or long XPaths.
● Use descriptive locators: a good locator describes what it's targeting.
● Avoid relying on visual text that might change with UI updates or localization.
● Always verify your locator manually in browser DevTools.

4.9 Tools to Help You Find Locators

● Browser DevTools (F12): Right-click > Inspect to view HTML.
● Copy XPath or Selector from the context menu.
● Selenium IDE: Helps record actions and generate locators.
● Extensions: Tools like ChroPath or SelectorsHub assist in building XPath/CSS.

Chapter 5: Interacting with Web Elements
In the previous chapter, we explored how to locate web elements using various strategies such as ID, name, XPath, and CSS selectors. Now that we can accurately find elements on a page, the next step is to interact with them.

This chapter covers the most common and essential types of user interactions in Selenium: clicking buttons, entering text, selecting from dropdowns, checking boxes, and handling dynamic user events. Each interaction will be explained with detailed examples in both Java and Python.

5.1 Introduction to WebElement Interface

When you use a locator to find an element in Selenium, what you get is a WebElement object. This object serves as the interface to interact with that element, whether it's a button, text field, checkbox, or link.

Example in Java:
WebElement loginButton = driver.findElement(By.id("login"));
loginButton.click();

Example in Python:
login_button = driver.find_element(By.ID, "login")
login_button.click()

Almost all interactions start with locating the element and then calling a method on the resulting WebElement.

5.2 Clicking Elements

The .click() method simulates a mouse click on elements like buttons, links, and checkboxes.

Java:
WebElement submit = driver.findElement(By.id("submitBtn"));
submit.click();

Python:
submit = driver.find_element(By.ID, "submitBtn")
submit.click()

Important Notes:
● The element must be visible and enabled.
● If an element is hidden or not interactable, .click() will throw an exception.
● Always ensure page loading or animations are completed before clicking.

5.3 Typing Text into Input Fields

Use the sendKeys() method (send_keys() in Python) to simulate keyboard input.

Java:
WebElement username = driver.findElement(By.name("user"));
username.sendKeys("myUsername");

Python:
username = driver.find_element(By.NAME, "user")
username.send_keys("myUsername")

To clear an input field before typing:

Java:
username.clear();
username.sendKeys("newUser");

Python:
username.clear()
username.send_keys("newUser")

Tip: Always use .clear() if there's a chance the field contains pre-filled values.

5.4 Handling Checkboxes and Radio Buttons
Checkboxes and radio buttons are also handled using .click(). Before clicking, you might want to check whether they're already selected.

Java:
WebElement checkbox = driver.findElement(By.id("subscribe"));
if (!checkbox.isSelected()) {
    checkbox.click();
}

Python:
checkbox = driver.find_element(By.ID, "subscribe")
if not checkbox.is_selected():
    checkbox.click()

Common Methods:
● .isSelected(): returns true if the box is checked.
● .click(): toggles the checked state.

5.5 Selecting from Dropdown Menus

There are two types of dropdowns: standard HTML <select> dropdowns and custom dropdowns created with JavaScript. Let's start with standard ones.

Standard <select> Dropdown

Java provides a Select class:

import org.openqa.selenium.support.ui.Select;

Select country = new Select(driver.findElement(By.id("country")));
country.selectByVisibleText("Canada");
country.selectByValue("CA");
country.selectByIndex(2);

Python also provides a Select class:

from selenium.webdriver.support.ui import Select

select = Select(driver.find_element(By.ID, "country"))
select.select_by_visible_text("Canada")
select.select_by_value("CA")
select.select_by_index(2)

Methods:
● .selectByVisibleText(String): matches what the user sees.
● .selectByValue(String): matches the value attribute.
● .selectByIndex(int): selects by order (starts at 0).

Custom Dropdowns

Custom dropdowns (often using <div> or <li>) require you to click to open the menu and click again to select the item.

driver.find_element(By.ID, "dropdownMenu").click()
driver.find_element(By.XPATH, "//li[text()='Canada']").click()

Use developer tools to inspect such elements and write a reliable locator strategy.

5.6 Handling Text, Attributes, and State

Getting Text:
Java: String message = driver.findElement(By.id("msg")).getText();
Python: message = driver.find_element(By.ID, "msg").text

Getting Attributes:
Java: String type = driver.findElement(By.id("email")).getAttribute("type");
Python: type = driver.find_element(By.ID, "email").get_attribute("type")

Checking State:
● isDisplayed(): is the element visible?
● isEnabled(): can the user interact with it?
● isSelected(): for checkboxes/radios.

5.7 Submitting Forms

Some forms submit when you press Enter or click a button. You can also explicitly submit them:

Java:
driver.findElement(By.id("loginForm")).submit();

Python:
driver.find_element(By.ID, "loginForm").submit()

Note: .submit() only works on form elements or their children.

5.8 Waiting for Elements (Implicit & Explicit Waits)

Real-world web apps are often slow to load elements. Selenium offers two waiting strategies.
Implicit Wait: Sets a default maximum time that Selenium keeps retrying when locating elements.

Java (Selenium 4; requires import java.time.Duration):
driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));

Python:
driver.implicitly_wait(10)

Explicit Wait: Waits for a specific condition.

Java:
import java.time.Duration;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
WebElement element = wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("login")));

Python:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
element = wait.until(EC.visibility_of_element_located((By.ID, "login")))

Use explicit waits for dynamic elements that load after an AJAX call. Avoid mixing implicit and explicit waits in the same session, since the combination can produce unpredictable wait times.

5.9 Real-World Example: Login Script

Let's put it all together in a login automation scenario.

Python:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")

driver.find_element(By.ID, "username").send_keys("testuser")
driver.find_element(By.ID, "password").send_keys("securepass")
driver.find_element(By.ID, "loginButton").click()

# Validate the result (add an explicit wait first if the next page loads slowly)
print("Title after login:", driver.title)

driver.quit()
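The login script reads the title immediately after the click, which can race with page loading. An explicit wait avoids this by polling a condition until it holds or a timeout expires. The sketch below illustrates that polling idea in plain Python. It is a simplified stand-in written for illustration, not Selenium's WebDriverWait implementation, and the name `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep briefly so the loop does not spin at full speed
        time.sleep(poll_interval)
```

In a real script, the built-in form shown in section 5.8 is preferable. For the login example, something like `WebDriverWait(driver, 10).until(EC.title_contains("Dashboard"))` would wait for the post-login page, where "Dashboard" is an assumed title fragment for the hypothetical site.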
Chapter 6: Advanced User Interactions with the Actions Class

So far, we've explored basic interactions with web elements: clicks, typing, selecting, and reading values. However, many modern web applications use advanced UI features like drag-and-drop, hover menus, right-click context menus, and keyboard shortcuts.

Selenium provides the Actions class to simulate these more complex interactions. This chapter covers how to use it for building robust, human-like behavior in your automated tests.

6.1 What is the Actions Class?

The Actions class (in Java) or ActionChains (in Python) enables you to chain together multiple low-level input events like:
● Mouse movements (hover, click and hold, release)
● Composite gestures (drag-and-drop)
● Double-click, right-click
● Keyboard input (press, hold, release)

These are especially useful when testing modern JavaScript-heavy UI components that respond to hover states or gesture-like behavior.

6.2 Setting Up Actions in Selenium

Java Example:
import org.openqa.selenium.interactions.Actions;

Actions actions = new Actions(driver);

Python Example:
from selenium.webdriver.common.action_chains import ActionChains

actions = ActionChains(driver)

Once initialized, the actions object can be used to perform a variety of gestures and interactions.

6.3 Hovering Over Elements (Mouse Over)

Hovering is often required to reveal hidden menus or tooltips.

Java:
WebElement menu = driver.findElement(By.id("menu"));
Actions actions = new Actions(driver);
actions.moveToElement(menu).perform();

Python:
menu = driver.find_element(By.ID, "menu")
actions = ActionChains(driver)
actions.move_to_element(menu).perform()

Common Use Case: Navigating dropdown menus or tooltips that appear only on hover.

6.4 Click and Hold, Then Release

Used to simulate a long-press or press-and-hold action.

Java (requires import java.time.Duration):
WebElement element = driver.findElement(By.id("draggable"));
actions.clickAndHold(element).pause(Duration.ofSeconds(2)).release().perform();

Python:
element = driver.find_element(By.ID, "draggable")
actions.click_and_hold(element).pause(2).release().perform()

This can simulate holding a mouse button or preparing for drag-and-drop.

6.5 Drag and Drop

Many web UIs support dragging elements to reorder items or move between lists. Selenium handles this smoothly using Actions.

Java:
WebElement source = driver.findElement(By.id("source"));
WebElement target = driver.findElement(By.id("target"));
actions.dragAndDrop(source, target).perform();

Python:
source = driver.find_element(By.ID, "source")
target = driver.find_element(By.ID, "target")
actions.drag_and_drop(source, target).perform()

You can also build it step-by-step:
actions.click_and_hold(source).move_to_element(target).release().perform()

6.6 Right-Click (Context Click)

Context menus often require a right-click to reveal additional options.
Java:
WebElement element = driver.findElement(By.id("contextMenu"));
actions.contextClick(element).perform();

Python:
element = driver.find_element(By.ID, "contextMenu")
actions.context_click(element).perform()

Tip: After right-clicking, you may need to send keyboard keys or click a menu item.

6.7 Double-Click

Some UI components trigger actions on double-clicks (e.g., edit mode in table rows).

Java:
WebElement row = driver.findElement(By.className("editable"));
actions.doubleClick(row).perform();

Python:
row = driver.find_element(By.CLASS_NAME, "editable")
actions.double_click(row).perform()

6.8 Keyboard Interactions

To simulate keyboard input like pressing Tab, Enter, or combinations (e.g., Ctrl+C), you can use sendKeys() in combination with Keys.

Java:
import org.openqa.selenium.Keys;

WebElement input = driver.findElement(By.id("search"));
input.sendKeys("Selenium" + Keys.ENTER);

Python:
from selenium.webdriver.common.keys import Keys

# Named search_box rather than input to avoid shadowing Python's built-in input()
search_box = driver.find_element(By.ID, "search")
search_box.send_keys("Selenium" + Keys.ENTER)

You can also use Actions for complex sequences. The examples below press Ctrl+A (select all):

Java:
actions.keyDown(Keys.CONTROL).sendKeys("a").keyUp(Keys.CONTROL).perform();

Python:
actions.key_down(Keys.CONTROL).send_keys("a").key_up(Keys.CONTROL).perform()

Common Keys:
● Keys.ENTER, Keys.TAB, Keys.ESCAPE
● Keys.CONTROL, Keys.SHIFT, Keys.ALT
● Arrow keys: Keys.ARROW_UP, Keys.ARROW_DOWN

6.9 Combining Actions

You can chain multiple actions to simulate realistic user flows.

Example in Python:
actions.move_to_element(menu)\
    .click()\
    .send_keys("search term")\
    .send_keys(Keys.ENTER)\
    .perform()

Example in Java:
actions.moveToElement(menu)
    .click()
    .sendKeys("search term")
    .sendKeys(Keys.ENTER)
    .perform();

This is useful for simulating user flows like menu navigation, typing, and submitting, all in one chained sequence.

6.10 Limitations and Troubleshooting

● Timing issues: Ensure elements are visible and stable before interaction. Use WebDriverWait if needed.
● Overlapping elements: Sometimes hover or click may fail if another element is on top.
● Non-standard UIs: Some JS-based UI frameworks don't behave like standard HTML. In such cases, JavaScriptExecutor might be needed.
● Test flakiness: Avoid hardcoded waits. Prefer explicit waits and consistent interaction timing.

Chapter 7: Handling Alerts, Frames, and Windows
As you continue building automation scripts for modern web applications, it's important to handle more complex interactions that go beyond basic element manipulation. These interactions include dealing with pop-up alerts, iframes, and multiple browser windows. This chapter will guide you through handling these elements effectively using Selenium WebDriver.

7.1 Introduction to Alerts in Selenium

Alerts are a common feature of modern web applications. They provide notifications or require user input in the form of acceptance or dismissal. Selenium offers built-in support to handle JavaScript alerts, confirmations, and prompts.

Alert Types:
● Simple Alert: Displays a message and has an OK button.
● Confirmation Alert: Displays a message and has OK and Cancel buttons.
● Prompt Alert: Displays a message and a text input field, along with OK and Cancel buttons.

7.2 Handling Alerts

Accepting Alerts

To accept a simple alert (clicking the OK button), you can use the .accept() method.

Java:
Alert alert = driver.switchTo().alert();
alert.accept();

Python:
alert = driver.switch_to.alert
alert.accept()

Use Case: This is often used to confirm actions like form submission or deletion.
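Because an alert disappears as soon as it is accepted, any message a test wants to check has to be read before calling accept(). The sketch below shows that ordering in a small Python helper. The name `accept_alert` is hypothetical; `driver.switch_to.alert`, `.text`, and `.accept()` are the Selenium calls it relies on.

```python
def accept_alert(driver):
    """Switch to the open alert, read its message, click OK, and return the message."""
    alert = driver.switch_to.alert
    message = alert.text  # read first: the alert is gone after accept()
    alert.accept()
    return message
```

If no alert is open when the helper runs, Selenium raises NoAlertPresentException. When the alert appears with a delay, waiting with `WebDriverWait(driver, 10).until(EC.alert_is_present())` before calling the helper is a common way to avoid that.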

Dismissing Alerts
To dismiss a confirmation alert (the equivalent of clicking the Cancel button), use the .dismiss() method.

Java:
    Alert alert = driver.switchTo().alert();
    alert.dismiss();

Python:
    alert = driver.switch_to.alert
    alert.dismiss()

Use Case: You might use this for canceling an operation, like deleting an item.

Retrieving Alert Text
You can retrieve the message text from an alert using .getText() in Java or the .text property in Python.

Java:
    String alertText = driver.switchTo().alert().getText();
    System.out.println(alertText);

Python:
    alert_text = driver.switch_to.alert.text
    print(alert_text)

Sending Text to Prompt Alerts
For prompt alerts, where you are asked to enter text, you can send input using .sendKeys() in Java or .send_keys() in Python. The text is submitted when you accept the alert.

Java:
    Alert alert = driver.switchTo().alert();
    alert.sendKeys("Hello, Selenium!");
    alert.accept();

Python:
    alert = driver.switch_to.alert
    alert.send_keys("Hello, Selenium!")
    alert.accept()

Important Note: Make sure the prompt is actually open before sending keys. Switching to an alert that is not present raises NoAlertPresentException; an explicit wait for the alert avoids this.

7.3 Working with Frames
Frames allow you to embed one HTML document within another. Elements inside a frame are not reachable from the main page context, so you must switch into the frame before interacting with them. Selenium lets you switch by index, by name or ID, or by WebElement.

Switching to a Frame
There are several ways to switch to a frame in Selenium.

Java:
● By index (0-based):
    driver.switchTo().frame(0);
● By name or ID:
    driver.switchTo().frame("frameName");
● By WebElement:
    WebElement frame = driver.findElement(By.id("frameId"));
    driver.switchTo().frame(frame);

Python:
    driver.switch_to.frame(0)  # By index
    driver.switch_to.frame("frameName")  # By name or ID
    frame_element = driver.find_element(By.ID, "frameId")
    driver.switch_to.frame(frame_element)  # By WebElement

Interacting Inside a Frame
Once switched to a frame, you can interact with elements inside it just as you would with elements on the main page.

Java:
    WebElement button = driver.findElement(By.id("buttonInFrame"));
    button.click();

Python:
    button = driver.find_element(By.ID, "buttonInFrame")
    button.click()

Switching Back to the Default Content
To interact with elements outside of the frame, you need to switch back to the default page context.

Java:
    driver.switchTo().defaultContent();

Python:
    driver.switch_to.default_content()

Tip: With nested frames, you must switch into each frame in the hierarchy in turn, from the outermost frame to the target. defaultContent() returns directly to the top-level page; to step up a single level, use parentFrame() in Java or parent_frame() in Python.
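
A common source of errors is forgetting to switch back out of a frame, especially when a step inside it fails. One way to guard against that is a small context manager; the sketch below is an illustration under assumptions, and inside_frame is a hypothetical helper name rather than part of Selenium. It works with any driver-like object that exposes switch_to.frame() and switch_to.parent_frame().

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block (hypothetical helper).

    frame_reference may be an index, a name or ID, or a frame element,
    matching what switch_to.frame() accepts.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Step back up one level, even if the block raised an exception.
        driver.switch_to.parent_frame()
```

Because the helper steps up one level rather than jumping to the top-level page, with-blocks can be nested for nested frames: "with inside_frame(driver, "outerFrame"):" followed by an inner "with inside_frame(driver, "innerFrame"):" leaves the driver back in the main page context once both blocks exit.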

7.4 Working with Multiple Browser Windows
Web applications often open new browser windows or tabs. Selenium handles multiple windows through window handles, which are unique string identifiers for each open window or tab.

Getting the Current Window Handle
The current window handle identifies the window that currently has focus. Store it before opening another window so you can return to it later.

Java:
    String mainWindowHandle = driver.getWindowHandle();

Python:
    main_window_handle = driver.current_window_handle

Getting All Window Handles
To get all open window handles, use .getWindowHandles() in Java, which returns a Set<String>, or the window_handles property in Python, which returns a list. You can iterate over the result to switch between windows.

Java:
    Set<String> allWindowHandles = driver.getWindowHandles();
    for (String windowHandle : allWindowHandles) {
        driver.switchTo().window(windowHandle);
    }

Python:
    all_window_handles = driver.window_handles
    for window_handle in all_window_handles:
        driver.switch_to.window(window_handle)

Switching Between Windows
After obtaining the window handles, you can switch between windows using .switchTo().window() (Java) or .switch_to.window() (Python) with the desired window handle.

Java:
    driver.switchTo().window(windowHandle);

Python:
    driver.switch_to.window(window_handle)

Important Note: When you open a new window or tab, Selenium will continue to interact with the original window unless explicitly told to switch to the new one.

Closing Windows
To close a window, call .close() on the driver.

Java:
    driver.close();

Python:
    driver.close()

Important: Calling .close() closes the window that currently has focus. If there are multiple windows, switch to the one you want to close first. After closing it, switch back to a remaining window before issuing further commands.

7.5 Example Scenario: Handling Alerts and Windows
Let’s combine alerts and windows in a practical example. Imagine that clicking a button triggers an alert, and then clicking a link opens a new window.

Java Example:
    // Remember the original window before anything new opens
    String mainWindow = driver.getWindowHandle();

    // Trigger the alert, then accept it
    driver.findElement(By.id("alertButton")).click();
    Alert alert = driver.switchTo().alert();
    alert.accept();

    // Open the new window and switch to it
    driver.findElement(By.id("newWindowButton")).click();
    Set<String> allWindows = driver.getWindowHandles();
    for (String window : allWindows) {
        if (!window.equals(mainWindow)) {
            driver.switchTo().window(window);
            break;
        }
    }
    driver.findElement(By.id("windowElement")).click();

Python Example:
    # Remember the original window before anything new opens
    main_window = driver.current_window_handle

    # Trigger the alert, then accept it
    driver.find_element(By.ID, "alertButton").click()
    alert = driver.switch_to.alert
    alert.accept()

    # Open the new window and switch to it
    driver.find_element(By.ID, "newWindowButton").click()
    all_windows = driver.window_handles
    for window in all_windows:
        if window != main_window:
            driver.switch_to.window(window)
            break
    driver.find_element(By.ID, "windowElement").click()
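
The handle-comparison loop from the 7.5 scenario tends to recur in every test that opens a window, so it can be worth factoring out. The sketch below is one possible shape, under assumptions: switch_to_new_window and close_and_return are hypothetical helper names, not Selenium APIs, and they rely only on the window_handles, switch_to.window(), and close() members used earlier in this chapter.

```python
def switch_to_new_window(driver, handles_before):
    """Switch to the first window handle that was not open before.

    handles_before is a copy of driver.window_handles taken before the
    action that opened the new window (hypothetical helper).
    """
    for handle in driver.window_handles:
        if handle not in handles_before:
            driver.switch_to.window(handle)
            return handle
    raise LookupError("no new window handle found")


def close_and_return(driver, return_to_handle):
    """Close the focused window, then move focus back to a window that is still open."""
    driver.close()
    driver.switch_to.window(return_to_handle)
```

A typical flow would record main = driver.current_window_handle and before = list(driver.window_handles), click the link that opens the window, call switch_to_new_window(driver, before), do the work in the new window, and finish with close_and_return(driver, main). If the new window opens slowly, combine this with an explicit wait so the handle exists before the lookup runs.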