✨ Detect broken links within theBrain notes #13

Open
2 of 4 tasks
chriskyfung opened this issue Nov 20, 2023 · 0 comments

Background

Motivation

We often use links to embed external resources, such as websites, images, or videos, in our digital notes. Over time these links can break because the URL changes, the resource is deleted, or the network fails. Broken links reduce the quality and usability of the notes, so it is important to find and fix them.

Objective

Develop a script that detects all broken links in the Brain Notes, so that users can regularly scan their Markdown files and fix broken links as soon as possible.

Proposal

TheBrain 13 for Windows stores the Brain Notes as individual Markdown (.md) files. To scan the notes and identify broken links, the script should do the following:

  • Loop through all the files in the Brain data directory
  • Scan each Markdown file for links using a regular expression
  • Define a function that checks whether a link is valid by sending a GET request to it and checking the status code
  • Print the results to the console or save them to a file
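
The first two steps can be sketched as follows. This is only an illustration under assumptions not stated in this issue: that notes may sit in subfolders of the data directory, and that only `http://` and `https://` targets are worth checking. The names `find_markdown_files`, `extract_web_links`, and `LINK_PATTERN` are hypothetical helpers, not part of any existing script.

```python
import re
from pathlib import Path

# Captures the target of inline links and images,
# e.g. [text](target) or ![alt](target "title").
LINK_PATTERN = re.compile(r'!?\[[^\]]*\]\(\s*<?([^\s)>]+)>?(?:\s+"[^"]*")?\s*\)')


def find_markdown_files(root):
    """Return every .md file under root, including subfolders, sorted by path."""
    return sorted(Path(root).rglob("*.md"))


def extract_web_links(text):
    """Return the http(s) link targets found in Markdown text, in order."""
    targets = LINK_PATTERN.findall(text)
    # Skip relative paths, mailto: links, and other non-web targets
    return [t for t in targets if t.lower().startswith(("http://", "https://"))]
```

One known limitation of this pattern is that a URL containing parentheses is cut off at the first closing parenthesis, so such links may be reported as broken when they are not.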

Example solutions

Here is a possible solution in Python, generated by Bing AI:

# Import the modules
import os
import re
import requests

# Define the folder path
folder_path = "path/to/folder"

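# Define a function to check if a link is valid
def is_valid_link(link):
    try:
        # Set a timeout so an unresponsive server cannot stall the scan
        response = requests.get(link, timeout=10)
        # Treat any non-error status (below 400) as valid, not only 200
        return response.ok
    except requests.RequestException:
        # Covers connection errors, timeouts, and malformed URLs
        return False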

# Define a function to scan a markdown file for links
def scan_markdown_file(file_path):
    # Open the file and read the content
    with open(file_path, "r", encoding="utf-8") as file:
        content = file.read()
    
    # Find all the links using a regular expression
    links = re.findall(r"\[.*?\]\((.*?)\)", content)
    
    # Check each link and print the result
    for link in links:
        if is_valid_link(link):
            print(f"{link} is valid in {file_path}")
        else:
            print(f"{link} is broken in {file_path}")

# Loop through all the files in the folder
for file_name in os.listdir(folder_path):
    # Check if the file is a markdown file
    if file_name.endswith(".md"):
        # Get the full file path
        file_path = os.path.join(folder_path, file_name)
        # Scan the file for links
        scan_markdown_file(file_path)
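
The script above only prints its findings. To cover the "save to a file" part of the proposal, the results could be collected and written as CSV. The sketch below assumes each result is a `(file_path, link, is_valid)` tuple; that shape, the column names, and the `write_report` helper are illustrative choices, not something the script above produces today.

```python
import csv
import io


def write_report(results, stream):
    """Write (file_path, link, is_valid) rows to stream as CSV."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["file", "link", "status"])
    for file_path, link, is_valid in results:
        writer.writerow([file_path, link, "valid" if is_valid else "broken"])


# Example with made-up rows; in practice these would come from the scan
sample = [
    ("Notes.md", "https://example.com/", True),
    ("Notes.md", "https://example.com/missing", False),
]
buffer = io.StringIO()
write_report(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("report.csv", "w", newline="", encoding="utf-8")` as `stream`.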

Here is a possible solution using PowerShell:

# Define the folder path
$folderPath = "path/to/folder"

# Define a function to check if a link is valid
function Test-Link {
    param (
        [string]$link
    )
    try {
        # -TimeoutSec keeps an unresponsive server from stalling the scan
        $response = Invoke-WebRequest -Uri $link -UseBasicParsing -TimeoutSec 10
        return $response.StatusCode -eq 200
    }
    catch {
        return $false
    }
}

# Define a function to scan a markdown file for links
function Scan-MarkdownFile {
    param (
        [string]$filePath
    )
    # Get the file content
    $content = Get-Content -Path $filePath -Raw
    
    # Find all the links using a regular expression
    $links = [regex]::Matches($content, "\[.*?\]\((.*?)\)") | ForEach-Object {$_.Groups[1].Value}
    
    # Check each link and print the result
    foreach ($link in $links) {
        if (Test-Link $link) {
            Write-Host "$link is valid in $filePath"
        }
        else {
            Write-Host "$link is broken in $filePath"
        }
    }
}

# Loop through all the files in the folder
Get-ChildItem -Path $folderPath -Filter *.md | ForEach-Object {
    # Get the full file path
    $filePath = $_.FullName
    # Scan the file for links
    Scan-MarkdownFile $filePath
}

These are just some possible ways to do it. There may be other ways to achieve the same goal.

Issues / Challenges

One challenge in developing a PowerShell solution is the processing time needed to check each link and to loop through all the links.

PowerShell is a scripting language designed for automation and configuration management rather than for raw speed. Tasks such as sending HTTP requests and matching regular expressions may therefore take longer than in other languages, such as Python or C#. The PowerShell pipeline, which passes the output of one command to the next, can add further overhead.

As a result, the PowerShell solution may take longer to run than the Python solution, especially when the Markdown files contain many links or the links are slow to respond. Both example scripts check links one at a time, so slow responses delay either of them, and the total time grows with the number of links.
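
One way to reduce the total time, whichever language is used, is to check each distinct link only once and to run the checks concurrently. The following is a minimal sketch using only the Python standard library. The names `is_reachable` and `check_links`, the HEAD request, the 10-second timeout, and the worker count of 8 are all assumptions made for illustration; none of them come from the scripts in this issue.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def is_reachable(url, timeout=10):
    """Return True if a HEAD request to url succeeds (illustrative checker)."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (ValueError, OSError, http.client.HTTPException):
        # ValueError: malformed URL; OSError: includes URLError, HTTPError
        # (status >= 400), and timeouts; HTTPException: protocol-level errors
        return False


def check_links(links, checker=is_reachable, max_workers=8):
    """Check each distinct link once, in parallel, and return {link: bool}."""
    unique_links = list(dict.fromkeys(links))  # de-duplicate, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(checker, unique_links))
    return dict(zip(unique_links, results))
```

Threads suit this job because most of the time is spent waiting on the network rather than computing. Some servers reject HEAD requests, so a real checker might need to fall back to GET before reporting a link as broken. The `checker` parameter also makes it possible to swap in a `requests`-based function such as the one in the Python solution.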

@chriskyfung chriskyfung self-assigned this Nov 20, 2023
@chriskyfung chriskyfung converted this from a draft issue Nov 20, 2023
@chriskyfung chriskyfung added the enhancement New feature or request label Dec 10, 2023
Projects
Status: 📋 Backlog