FIX: Convert the relative links into absolute ones by prepending the site's base URL to each relative path.
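As a minimal sketch of that resolution step, the standard library's `urljoin` can resolve a path against the page URL. The helper name `to_absolute` is hypothetical and only used for illustration here.

```python
from urllib.parse import urljoin

# Page URL taken from this report
BASE_URL = "https://myoona.ph/about-us/"


def to_absolute(path: str, base: str = BASE_URL) -> str:
    """Resolve a possibly relative path against the page URL (illustrative helper)."""
    return urljoin(base, path)


# Root-relative paths are resolved against the site root
print(to_absolute("/about-us/"))
# Paths without a leading slash are resolved against the page's directory
print(to_absolute("team.html"))
# Absolute URLs are returned unchanged
print(to_absolute("https://example.com/x"))
```

Using `urljoin` rather than plain string concatenation avoids doubled slashes and leaves links that are already absolute untouched.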
BEFORE:
AFTER:
SAMPLE CODE:
```python
import requests
import html2text
from urllib.parse import urljoin


def fix_links_in_markdown(markdown_content, base_url):
    """
    Fixes relative links in markdown by converting them to absolute URLs.

    Args:
        markdown_content (str): The markdown content with relative links.
        base_url (str): The base URL to convert relative links to absolute ones.

    Returns:
        str: The markdown content with fixed links.
    """
    # Site root derived from the page URL, e.g. "https://myoona.ph/"
    root = urljoin(base_url, "/")

    # Split content by lines and process each line
    lines = markdown_content.splitlines()
    fixed_lines = []
    for line in lines:
        # Convert root-relative image and link URLs to absolute URLs.
        # Only targets that start with "/" are handled here.
        if "](/" in line or 'src="/' in line:  # Markdown and HTML relative links
            line = line.replace("](/", "](" + root)
            line = line.replace('src="/', 'src="' + root)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


def website_to_markdown(url):
    """
    Converts the content of a website into markdown format with fixed links.

    Args:
        url (str): The URL of the website to convert.

    Returns:
        str: The markdown content of the website with fixed links.
    """
    try:
        # Fetch the website content (the timeout keeps the request from hanging)
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Check for HTTP errors

        # Convert HTML to markdown
        html_content = response.text
        markdown_content = html2text.html2text(html_content)

        # Fix relative links
        markdown_content = fix_links_in_markdown(markdown_content, url)
        return markdown_content
    except requests.exceptions.RequestException as e:
        return f"An error occurred while fetching the website: {e}"


# Example usage
if __name__ == "__main__":
    url = "https://myoona.ph/about-us/"  # Replace with the URL of the website you want to convert
    markdown = website_to_markdown(url)

    # Save the markdown content to a file
    with open("website_content_fixed.md", "w", encoding="utf-8") as f:
        f.write(markdown)
    print("Website content saved as markdown with fixed links.")
```
ERROR: Links in the generated markdown are not functional because all of them are relative links.
Wrong Output:
![aboutus_image](/content/dam/oona/aem-images/header/about-us.svg) [About Us](/about-us/)
Correct Output:
![aboutus_image](https://myoona.ph/content/dam/oona/aem-images/header/about-us.svg) [About Us](https://myoona.ph/about-us/)
URL USED:
url = "https://myoona.ph/about-us/"
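One possible way to turn the wrong output into the correct output is to rewrite every Markdown link target with a regular expression and `urljoin`, instead of string replacement on `](/`. This is only a sketch using the standard library; the names `LINK_TARGET` and `absolutize_markdown_links` are hypothetical, and link targets followed by a quoted title (for example `](/path "title")`) are deliberately left untouched.

```python
import re
from urllib.parse import urljoin

# Matches "](target)" where target contains no whitespace and no ")"
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def absolutize_markdown_links(markdown: str, base_url: str) -> str:
    """Resolve each Markdown link or image target against base_url."""
    def _replace(match: re.Match) -> str:
        # urljoin leaves already-absolute URLs (https:, mailto:) unchanged
        return "](" + urljoin(base_url, match.group(1)) + ")"

    return LINK_TARGET.sub(_replace, markdown)


wrong = (
    "![aboutus_image](/content/dam/oona/aem-images/header/about-us.svg) "
    "[About Us](/about-us/)"
)
fixed = absolutize_markdown_links(wrong, "https://myoona.ph/about-us/")
print(fixed)
```

With the URL used in this report, `fixed` matches the correct output shown above. Unlike the `](/` replacement, this variant also resolves targets that do not start with a slash.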