Hybrid of a11y tree & DOM for input to observe #459
Merged
Changes from 12 commits (14 commits total)
f4d096d include backendDOMNodeId (seanmcguire12)
40862fd skip ax nodeId if negative (seanmcguire12)
79a2b1f replace role with dom tag name if none or generic (seanmcguire12)
8212480 add xpath to AXNode type (seanmcguire12)
8ec593b revert unnecessary changed lines (seanmcguire12)
22aee72 revert more unnecessary changed lines (seanmcguire12)
b04d7c1 changeset (seanmcguire12)
332b864 Merge remote-tracking branch 'origin/main' into a11y-dom-hybrid (seanmcguire12)
7c46416 Merge remote-tracking branch 'origin/main' into a11y-dom-hybrid (seanmcguire12)
90e645c speedup (miguelg719)
8b60212 prettier (miguelg719)
b2fbf4f prune before updating roles (seanmcguire12)
cbd9eb1 take xpath out of AXnode type (seanmcguire12)
5a7bd49 rm commented code (seanmcguire12)
@@ -0,0 +1,5 @@
+---
+"@browserbasehq/stagehand": patch
+---
+
+create a11y + dom hybrid input for observe
@@ -13,7 +13,9 @@ export function formatSimplifiedTree(
   level = 0,
 ): string {
   const indent = " ".repeat(level);
-  let result = `${indent}[${node.nodeId}] ${node.role}${node.name ? `: ${node.name}` : ""}\n`;
+  let result = `${indent}[${node.nodeId}] ${node.role}${
+    node.name ? `: ${node.name}` : ""
+  }\n`;
 
   if (node.children?.length) {
     result += node.children
@@ -29,39 +31,113 @@ export function formatSimplifiedTree(
  * 1. Removes generic/none nodes with no children
  * 2. Collapses generic/none nodes with single child
  * 3. Keeps generic/none nodes with multiple children but cleans their subtrees
+ *    and attempts to resolve their role to a DOM tag name
  */
-function cleanStructuralNodes(
+async function cleanStructuralNodes(
   node: AccessibilityNode,
-): AccessibilityNode | null {
-  // Filter out nodes with negative IDs
+  page?: StagehandPage,
+  logger?: (logLine: LogLine) => void,
+): Promise<AccessibilityNode | null> {
+  // 1) Filter out nodes with negative IDs
   if (node.nodeId && parseInt(node.nodeId) < 0) {
     return null;
   }
 
-  // Base case: leaf node
-  if (!node.children) {
+  // 2) Base case: if no children exist, this is effectively a leaf.
+  //    If it's "generic" or "none", we remove it; otherwise, keep it.
+  if (!node.children || node.children.length === 0) {
     return node.role === "generic" || node.role === "none" ? null : node;
   }
 
-  // Recursively clean children
-  const cleanedChildren = node.children
-    .map((child) => cleanStructuralNodes(child))
-    .filter(Boolean) as AccessibilityNode[];
-
-  // Handle generic/none nodes specially
+  // 3) Recursively clean children
+  const cleanedChildrenPromises = node.children.map((child) =>
+    cleanStructuralNodes(child, page, logger),
+  );
+  const resolvedChildren = await Promise.all(cleanedChildrenPromises);
+  const cleanedChildren = resolvedChildren.filter(
+    (child): child is AccessibilityNode => child !== null,
+  );
+
+  // 4) **Prune** "generic" or "none" nodes first,
+  //    before resolving them to their tag names.
   if (node.role === "generic" || node.role === "none") {
     if (cleanedChildren.length === 1) {
-      // Collapse single-child generic nodes
+      // Collapse single-child structural node
       return cleanedChildren[0];
-    } else if (cleanedChildren.length > 1) {
-      // Keep generic nodes with multiple children
-      return { ...node, children: cleanedChildren };
+    } else if (cleanedChildren.length === 0) {
+      // Remove empty structural node
+      return null;
+    }
+    // If we have multiple children, we keep this node as a container.
+    // We'll update role below if needed.
+  }
+
+  // 5) If we still have a "generic"/"none" node after pruning
+  //    (i.e., because it had multiple children), now we try
+  //    to resolve and replace its role with the DOM tag name.
+  if (
+    page &&
+    logger &&
+    node.backendDOMNodeId !== undefined &&
+    (node.role === "generic" || node.role === "none")
+  ) {
+    try {
+      const { object } = await page.sendCDP<{
+        object: { objectId?: string };
+      }>("DOM.resolveNode", {
+        backendNodeId: node.backendDOMNodeId,
+      });
+
+      if (object && object.objectId) {
+        try {
+          // Get the tagName for the node
+          const { result } = await page.sendCDP<{
+            result: { type: string; value?: string };
+          }>("Runtime.callFunctionOn", {
+            objectId: object.objectId,
+            functionDeclaration: `
+              function() {
+                return this.tagName ? this.tagName.toLowerCase() : "";
+              }
+            `,
+            returnByValue: true,
+          });
+
+          // If we got a tagName, update the node's role
+          if (result?.value) {
+            node.role = result.value;
+          }
+        } catch (tagNameError) {
+          logger({
+            category: "observation",
+            message: `Could not fetch tagName for node ${node.backendDOMNodeId}`,
+            level: 2,
+            auxiliary: {
+              error: {
+                value: tagNameError.message,
+                type: "string",
+              },
+            },
+          });
+        }
+      }
+    } catch (resolveError) {
+      logger({
+        category: "observation",
+        message: `Could not resolve DOM node ID ${node.backendDOMNodeId}`,
+        level: 2,
+        auxiliary: {
+          error: {
+            value: resolveError.message,
+            type: "string",
+          },
+        },
+      });
     }
-    // Remove generic nodes with no children
-    return null;
   }
 
-  // For non-generic nodes, keep them if they have children after cleaning
+  // 6) Return the updated node.
+  //    If it has children, update them; otherwise keep it as-is.
   return cleanedChildren.length > 0
     ? { ...node, children: cleanedChildren }
     : node;
@@ -73,13 +149,23 @@ function cleanStructuralNodes(
  * @param nodes - Flat array of accessibility nodes from the CDP
  * @returns Object containing both the tree structure and a simplified string representation
  */
-export function buildHierarchicalTree(nodes: AccessibilityNode[]): TreeResult {
+export async function buildHierarchicalTree(
+  nodes: AccessibilityNode[],
+  page?: StagehandPage,
+  logger?: (logLine: LogLine) => void,
+): Promise<TreeResult> {
   // Map to store processed nodes for quick lookup
   const nodeMap = new Map<string, AccessibilityNode>();
 
   // First pass: Create nodes that are meaningful
   // We only keep nodes that either have a name or children to avoid cluttering the tree
   nodes.forEach((node) => {
+    // Skip node if its ID is negative (e.g., "-1000002014")
+    const nodeIdValue = parseInt(node.nodeId, 10);
+    if (nodeIdValue < 0) {
+      return;
+    }
+
     const hasChildren = node.childIds && node.childIds.length > 0;
     const hasValidName = node.name && node.name.trim() !== "";
     const isInteractive =
@@ -99,6 +185,10 @@ export function buildHierarchicalTree(nodes: AccessibilityNode[]): TreeResult {
       ...(hasValidName && { name: node.name }), // Only include name if it exists and isn't empty
       ...(node.description && { description: node.description }),
       ...(node.value && { value: node.value }),
+      ...(node.backendDOMNodeId !== undefined && {
+        backendDOMNodeId: node.backendDOMNodeId,
+      }),
+      ...(node.xpath && { xpath: node.xpath }),
     });
   });
 
@@ -119,13 +209,18 @@ export function buildHierarchicalTree(nodes: AccessibilityNode[]): TreeResult {
   });
 
   // Final pass: Build the root-level tree and clean up structural nodes
-  const finalTree = nodes
+  const rootNodes = nodes
     .filter((node) => !node.parentId && nodeMap.has(node.nodeId)) // Get root nodes
     .map((node) => nodeMap.get(node.nodeId))
-    .filter(Boolean)
-    .map((node) => cleanStructuralNodes(node))
     .filter(Boolean) as AccessibilityNode[];
 
+  const cleanedTreePromises = rootNodes.map((node) =>
+    cleanStructuralNodes(node, page, logger),
+  );
+  const finalTree = (await Promise.all(cleanedTreePromises)).filter(
+    Boolean,
+  ) as AccessibilityNode[];
+
   // Generate a simplified string representation of the tree
   const simplifiedFormat = finalTree
     .map((node) => formatSimplifiedTree(node))
@@ -137,29 +232,46 @@ export function buildHierarchicalTree(nodes: AccessibilityNode[]): TreeResult {
   };
 }
 
+/**
+ * Retrieves the full accessibility tree via CDP and transforms it into a hierarchical structure.
+ */
 export async function getAccessibilityTree(
   page: StagehandPage,
   logger: (logLine: LogLine) => void,
-) {
+): Promise<TreeResult> {
   await page.enableCDP("Accessibility");
 
   try {
     // Fetch the full accessibility tree from Chrome DevTools Protocol
     const { nodes } = await page.sendCDP<{ nodes: AXNode[] }>(
       "Accessibility.getFullAXTree",
     );
+    const startTime = Date.now();
 
-    // Extract specific sources
-    const sources = nodes.map((node) => ({
-      role: node.role?.value,
-      name: node.name?.value,
-      description: node.description?.value,
-      value: node.value?.value,
-      nodeId: node.nodeId,
-      parentId: node.parentId,
-      childIds: node.childIds,
-    }));
     // Transform into hierarchical structure
-    const hierarchicalTree = buildHierarchicalTree(sources);
+    const hierarchicalTree = await buildHierarchicalTree(
+      nodes.map((node) => ({
+        role: node.role?.value,
+        name: node.name?.value,
+        description: node.description?.value,
+        value: node.value?.value,
+        nodeId: node.nodeId,
+        backendDOMNodeId: node.backendDOMNodeId,
+        parentId: node.parentId,
+        childIds: node.childIds,
+        xpath: node.xpath,
+      })),
+      page,
+      logger,
+    );
+
+    logger({
+      category: "observation",
+      message: `got accessibility tree in ${Date.now() - startTime}ms`,
+      level: 1,
+    });
+
+    // fs.writeFileSync("../hybrid_tree.txt", hierarchicalTree.simplified);
+
     return hierarchicalTree;
   } catch (error) {

Suggested change (review comment on the commented-out fs.writeFileSync line): remove the line.
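The pruning order in the `cleanStructuralNodes` hunk (drop negative IDs, drop structural leaves, collapse single-child structural nodes, and only then look up a tag name for structural nodes that survive with several children) can be sketched in isolation. This is a minimal sketch, not the PR's code: `SketchNode`, `TagLookup`, and `pruneSketch` are hypothetical names, the tag lookup is an injected callback standing in for the `DOM.resolveNode` + `Runtime.callFunctionOn` round trip, and the sketch copies the node instead of mutating `node.role`.

```typescript
// Hypothetical, simplified node shape; not the project's AccessibilityNode type.
interface SketchNode {
  nodeId: string;
  role: string;
  name?: string;
  backendDOMNodeId?: number;
  children?: SketchNode[];
}

// Hypothetical stand-in for the CDP calls that resolve a DOM tag name.
type TagLookup = (backendDOMNodeId: number) => Promise<string | undefined>;

const isStructural = (role: string): boolean =>
  role === "generic" || role === "none";

async function pruneSketch(
  node: SketchNode,
  lookupTag?: TagLookup,
): Promise<SketchNode | null> {
  // 1) Nodes with negative accessibility IDs are dropped outright.
  if (parseInt(node.nodeId, 10) < 0) return null;

  // 2) Leaves: structural leaves are removed, everything else is kept.
  if (!node.children || node.children.length === 0) {
    return isStructural(node.role) ? null : node;
  }

  // 3) Clean all children concurrently and discard the pruned ones.
  const cleaned = (
    await Promise.all(node.children.map((c) => pruneSketch(c, lookupTag)))
  ).filter((c): c is SketchNode => c !== null);

  let role = node.role;
  if (isStructural(role)) {
    // 4) Prune first: empty containers vanish, single-child ones collapse.
    if (cleaned.length === 0) return null;
    if (cleaned.length === 1) return cleaned[0];

    // 5) Only surviving multi-child containers pay for a tag lookup.
    if (lookupTag && node.backendDOMNodeId !== undefined) {
      try {
        role = (await lookupTag(node.backendDOMNodeId)) || role;
      } catch {
        // Lookup failed: keep the original structural role.
      }
    }
  }

  // 6) Attach cleaned children if any survived; otherwise return the node as-is.
  return cleaned.length > 0 ? { ...node, role, children: cleaned } : node;
}
```

Pruning before the lookup means nodes that are about to be removed or collapsed never trigger a lookup, which appears to be the point of the "prune before updating roles" commit.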
Isn't backendDOMNodeId === nodeId?
no, nodeId is a11y specific (remember we used to have negative values for nodeIds)
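To make the distinction in this thread concrete: in CDP's `Accessibility.AXNode`, `nodeId` is a string identifier scoped to the accessibility tree, while `backendDOMNodeId` is an optional numeric DOM identifier. The sketch below is only an illustration under that assumption; `AxIds` and `mapAxToDom` are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical subset of the AX node fields discussed in this thread.
interface AxIds {
  nodeId: string; // accessibility-tree ID (a string in CDP)
  backendDOMNodeId?: number; // DOM backend node ID; optional on AX nodes
}

// Build a lookup from AX nodeId to backendDOMNodeId,
// skipping nodes whose AX ID is negative (as the PR does).
function mapAxToDom(nodes: AxIds[]): Map<string, number> {
  const out = new Map<string, number>();
  for (const n of nodes) {
    if (parseInt(n.nodeId, 10) < 0) continue;
    if (n.backendDOMNodeId !== undefined) {
      out.set(n.nodeId, n.backendDOMNodeId);
    }
  }
  return out;
}
```

Keeping the two IDs separate is what lets a caller filter on the accessibility ID while still using the DOM ID for calls such as `DOM.resolveNode`.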