repo-analyzer_summary.txt


**Project Analysis Instructions**

Below is a summary of the project **repo-analyzer**, starting with the directory tree structure to provide an overview. This is followed by the main file, if specified, which serves as the primary runner for this tool. Use this main file to inform your understanding of all subsequent scripts, as it orchestrates the execution flow and primary logic of the project. Additional scripts have also been included for a comprehensive analysis of the tool.

User-specified files to include:
None

User-specified files to ignore:
build/*, repo_analyzer.egg_info/*

Please perform the following tasks:

1. **Summarize the Tool/Code Purpose**: Provide a high-level summary of what this tool or code is designed to do.
2. **Identify and Summarize Critical Functions and Dependencies**: Note any critical functions, methods, or dependencies within the project. Summarize their roles and how they interact with the main file.
3. **Contextual Analysis**: Use the main file to contextualize the functionality and importance of the subsequent scripts. Highlight how they contribute to the overall operation of the tool.

*Prompt Engineering Instructions*:
- **Core Function Focus**: Prioritize analysis on core functions, such as scientific computations, large computations, or scripts that manage multiple functions. Avoid spending time on wrapper functions or utility functions unless they play a significant role.
- **Execution Flow**: Clearly outline the execution flow starting from the main file to other dependent scripts.
- **File Boundaries**: Recognize the start of a new file with the pattern --- <file_path> ---.
- **Avoid Redundancy**: Do not reproduce the code verbatim. Instead, focus on storing and utilizing the code structure and logic in your memory for this analysis.

Directory Structure:
|-- LICENSE
|-- README.md
|-- build/
  |-- bdist.macosx-11.0-arm64/
  |-- lib/
    |-- repo_analyzer/
      |-- __init__.py
      |-- analyzer.py
      |-- cli.py
      |-- file_ops.py
      |-- tree.py
|-- repo-analyzer_summary.txt
|-- repo_analyzer/
  |-- __init__.py
  |-- analyzer.py
  |-- file_ops.py
  |-- tree.py
|-- repo_analyzer.egg-info/
  |-- PKG-INFO
  |-- SOURCES.txt
  |-- dependency_links.txt
  |-- entry_points.txt
  |-- top_level.txt
|-- setup.py
|-- tests/

Concatenated Files:


--- /Users/jonathan/Documents/Research/repo-analyzer/README.md ---

# RepoConcat

This tool combines the contents of a GitHub repository into a single text file, making it easier to use with Large Language Models (LLMs). It generates a summary of the project, including a directory tree and concatenated contents of specified files, with the option to prioritize a main file.

## Features

- **Repository Summarization**: Combines the contents of a GitHub repository into a single text file for easy use with LLMs.
- **Directory Tree Generation**: Provides a detailed directory tree structure of the repository.
- **File Concatenation**: Concatenates specified files into an output file in the order listed in the configuration file.
- **Main File Prioritization**: Prioritizes a specified main file in the output.
- **Hidden Files Inclusion**: Optionally includes hidden files and directories.
- **Pattern Matching**: Supports wildcard and regex entries for specifying file patterns.
- **Ignore Patterns**: Supports excluding files or folders based on specified patterns.
- **Customization**: Allows customization of maximum characters, tree depth, and items per directory.

## Installation

To install the necessary dependencies, run:

```sh
pip install -r requirements.txt
```

## Usage

```sh
python main.py [directory] [-m MAIN_FILE] [-c MAX_CHARS] [-t TREE_DEPTH] [-i] [-x MAX_ITEMS] [-n INCLUDE] [-g IGNORE]
```

### Arguments

- `directory` (str): The path to the local repository.
- `-m`, `--main_file` (str): The main file to prioritize in the output.
- `-c`, `--max_chars` (int): Maximum number of characters to include in the output file.
- `-t`, `--tree_depth` (int): Maximum depth of the directory tree to include in the output file.
- `-i`, `--include_hidden`: Include hidden files and directories.
- `-x`, `--max_items` (int): Maximum number of items to include in each directory to avoid clutter.
- `-n`, `--include` (str): Path to the configuration file with filename patterns to include in the output.
- `-g`, `--ignore` (str): Path to the configuration file with filename patterns to exclude from the output.

### Example

```sh
python main.py /path/to/repository -m main.py -c 10000 -t 5 -i -x 100 -n include.txt -g ignore.txt
```

## Configuration Files

### Include File

The include file should contain filename patterns to include in the output, one pattern per line. Wildcards and regex entries are supported.

Example `include.txt`:

```
*.py
*.md
*.json
```

### Ignore File

The ignore file should contain filename patterns to exclude from the output, one pattern per line. Wildcards and regex entries are supported.

Example `ignore.txt`:

```
*.test.py
tests/
docs/
```


--- /Users/jonathan/Documents/Research/repo-analyzer/setup.py ---

from setuptools import setup, find_packages

setup(
    name='repo_analyzer',
    version='0.1',
    packages=find_packages(),
    install_requires=[],
    entry_points={
        'console_scripts': [
            'repo_analyzer=repo_analyzer.analyzer:main',
        ],
    },
    include_package_data=True,
    package_data={
        '': ['repo_analyzer_config.txt'],
    },
)


--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer/tree.py ---

import os

def generate_tree(directory, depth=3, current_depth=0, max_items=50, include_hidden=False):
    """
    Generate a tree structure of the directory up to a specified depth.

    Parameters:
    directory (str): The root directory to start generating the tree from.
    depth (int): The maximum depth of the directory tree to include.
    current_depth (int): The current depth in the directory tree (used for recursion).
    max_items (int): Maximum number of items to include in each directory to avoid clutter.
    include_hidden (bool): Whether to include hidden files and directories.

    Returns:
    list: A list of strings representing the directory tree.
    """
    tree = []
    with os.scandir(directory) as entries:
        items = [entry.name for entry in entries if include_hidden or not entry.name.startswith('.')]
    items.sort()
    if len(items) > max_items:
        items = items[:max_items]
        items.append('...')  # Indicate that there are more items
    for item in items:
        item_path = os.path.join(directory, item)
        if os.path.isdir(item_path):
            tree.append(f"{'  ' * current_depth}|-- {item}/")
            if current_depth < depth - 1:
                tree.extend(generate_tree(item_path, depth, current_depth + 1, max_items, include_hidden))
        else:
            tree.append(f"{'  ' * current_depth}|-- {item}")
    return tree


--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer/analyzer.py ---

import os
import argparse
import time
import fnmatch
from .file_ops import read_file, read_config_file
from .tree import generate_tree

def find_files_by_config(directory, include_patterns, ignore_patterns, include_hidden, valid_extensions):
    """
    Find and return a list of text and code files in the directory matching the include patterns from the config file,
    while excluding those matching the ignore patterns.

    Parameters:
    directory (str): The root directory to start searching for files.
    include_patterns (list): List of filename patterns to include in the output.
    ignore_patterns (list): List of filename patterns to exclude from the output.
    include_hidden (bool): Whether to include hidden files.
    valid_extensions (set): Set of valid file extensions to include.

    Returns:
    list: List of full paths to files that match the criteria.
    """
    files_to_include = []
    for root, _, files in os.walk(directory):
        if not include_hidden:
            files = [file for file in files if not file.startswith('.')]
        for file in files:
            file_path = os.path.join(root, file)
            if any(fnmatch.fnmatch(file_path, os.path.join(directory, pattern)) for pattern in ignore_patterns):
                continue
            if not include_patterns or any(fnmatch.fnmatch(file_path, os.path.join(directory, pattern)) for pattern in include_patterns):
                if os.path.splitext(file)[1] in valid_extensions:
                    files_to_include.append(file_path)

    # Sort files_to_include according to the order in include_patterns
    sorted_files = []
    for pattern in include_patterns:
        for file_path in files_to_include:
            if fnmatch.fnmatch(file_path, os.path.join(directory, pattern)):
                sorted_files.append(file_path)

    # Reverse the order to match the order specified by the user in the config file
    if include_patterns:
        sorted_files.reverse()
    else:
        sorted_files = files_to_include

    return sorted_files

def concatenate_files(files_to_include, output_file, max_chars, main_file=None):
    """
    Concatenate contents of specified files into an output file.

    Parameters:
    files_to_include (list): List of file paths to include.
    output_file (str): The path of the output file to write concatenated contents.
    max_chars (int): The maximum number of characters to include in the output file.
    main_file (str): Path to the main file to prioritize in the output.
    """
    char_count = 0
    if os.path.exists(output_file):
        files_to_include = [f for f in files_to_include if f != output_file]

    with open(output_file, 'a') as outfile:
        if main_file and main_file in files_to_include:
            outfile.write(f"\n\n--- {main_file} (Main File) ---\n\n")
            content, char_count = read_file(main_file, max_chars, char_count)
            outfile.write(content)
            files_to_include.remove(main_file)

        for file_path in files_to_include:
            start_time = time.time()
            print(f"Processing file: {file_path}")
            content, char_count = read_file(file_path, max_chars, char_count)
            elapsed_time = time.time() - start_time

            if elapsed_time > 10:
                print(f"Skipping file {file_path} (took too long: {elapsed_time:.2f} seconds)")
                continue

            if content:
                outfile.write(f"\n\n--- {file_path} ---\n\n")
                outfile.write(content)

            if max_chars is not None and char_count >= max_chars:
                print(f"Reached maximum character limit: {max_chars}")
                return

    print(f"Files concatenated successfully, total characters: {char_count}")

def main():
    """
    Main function to parse arguments, generate directory tree, and concatenate files.
    """
    parser = argparse.ArgumentParser(description="Analyze the structure and specific file types of a local repository.")
    parser.add_argument("directory", type=str, help="The path to the local repository")
    parser.add_argument("-m", "--main_file", type=str, help="The main file to prioritize in the output")
    parser.add_argument("-c", "--max_chars", type=int, default=None, help="Maximum number of characters to include in the output file")
    parser.add_argument("-t", "--tree_depth", type=int, default=10, help="Maximum depth of the directory tree to include in the output file")
    parser.add_argument("-i", "--include_hidden", action='store_true', help="Include hidden files and directories")
    parser.add_argument("-x", "--max_items", type=int, default=50, help="Maximum number of items to include in each directory to avoid clutter")
    parser.add_argument("-n", "--include", type=str, help="Path to the configuration file with filename patterns to include in the output")
    parser.add_argument("-g", "--ignore", type=str, help="Path to the configuration file with filename patterns to exclude from the output")

    args = parser.parse_args()

    directory = os.path.abspath(args.directory)
    main_file = args.main_file
    max_chars = args.max_chars
    tree_depth = args.tree_depth
    include_hidden = args.include_hidden
    max_items = args.max_items
    include_file = args.include
    ignore_file = args.ignore

    valid_extensions = {'.txt', '.py', '.md', '.json', '.xml', '.html', '.css', '.js', '.java', '.cpp', '.c', '.hpp', '.h', '.m', 'ipynb'}

    include_patterns = []
    ignore_patterns = []

    if include_file:
        include_file = os.path.abspath(include_file)
        include_patterns = read_config_file(include_file)
        if include_patterns is None or not include_patterns or all(name.isspace() for name in include_patterns):
            print(f"Include file {include_file} not found, is empty, or only contains spaces.")
            include_patterns = []

    if ignore_file:
        ignore_file = os.path.abspath(ignore_file)
        ignore_patterns = read_config_file(ignore_file)
        if ignore_patterns is None or not ignore_patterns or all(name.isspace() for name in ignore_patterns):
            print(f"Ignore file {ignore_file} not found, is empty, or only contains spaces.")
            ignore_patterns = []

    project_name = os.path.basename(directory)
    header = f"""
**Project Analysis Instructions**

Below is a summary of the project **{project_name}**, starting with the directory tree structure to provide an overview. This is followed by the main file, if specified, which serves as the primary runner for this tool. Use this main file to inform your understanding of all subsequent scripts, as it orchestrates the execution flow and primary logic of the project. Additional scripts have also been included for a comprehensive analysis of the tool.

User-specified files to include:
{', '.join(include_patterns) if include_patterns else 'None'}

User-specified files to ignore:
{', '.join(ignore_patterns) if ignore_patterns else 'None'}

Please perform the following tasks:

1. **Summarize the Tool/Code Purpose**: Provide a high-level summary of what this tool or code is designed to do.
2. **Identify and Summarize Critical Functions and Dependencies**: Note any critical functions, methods, or dependencies within the project. Summarize their roles and how they interact with the main file.
3. **Contextual Analysis**: Use the main file to contextualize the functionality and importance of the subsequent scripts. Highlight how they contribute to the overall operation of the tool.

*Prompt Engineering Instructions*:
- **Core Function Focus**: Prioritize analysis on core functions, such as scientific computations, large computations, or scripts that manage multiple functions. Avoid spending time on wrapper functions or utility functions unless they play a significant role.
- **Execution Flow**: Clearly outline the execution flow starting from the main file to other dependent scripts.
- **File Boundaries**: Recognize the start of a new file with the pattern --- <file_path> ---.
- **Avoid Redundancy**: Do not reproduce the code verbatim. Instead, focus on storing and utilizing the code structure and logic in your memory for this analysis.

"""

    output_file = os.path.join(directory, project_name + "_summary.txt")
    
    print(f"Generating directory tree for {directory}...")
    tree = generate_tree(directory, depth=tree_depth, max_items=max_items, include_hidden=include_hidden)
    
    with open(output_file, 'w') as outfile:
        outfile.write(header)
        outfile.write("Directory Structure:\n")
        outfile.write("\n".join(tree))
        outfile.write("\n\nConcatenated Files:\n")

    print(f"Finding files to concatenate in {directory}...")
    files_to_include = find_files_by_config(directory, include_patterns, ignore_patterns, include_hidden, valid_extensions)

    if not files_to_include:
        print("No files found to include based on the configuration.")

    print(f"Concatenating files for {directory}...")
    concatenate_files(files_to_include, output_file, max_chars, main_file=main_file)

    print(f"Output saved to {output_file}")

if __name__ == "__main__":
    main()


--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer/file_ops.py ---

import os

def read_file(file_path, max_chars, char_count):
    """
    Read a file up to a maximum number of characters.

    Parameters:
    file_path (str): The path to the file.
    max_chars (int): The maximum number of characters to read from the file.
    char_count (int): The current character count.

    Returns:
    tuple: A tuple containing the file content and the new character count.
    """
    with open(file_path, 'r', errors='ignore') as infile:
        content = []
        for line in infile:
            if max_chars is not None and char_count + len(line) > max_chars:
                content.append(line[:max_chars - char_count])
                char_count = max_chars
                break
            else:
                content.append(line)
                char_count += len(line)
    return ''.join(content), char_count

def read_config_file(config_path):
    """
    Read filenames from a configuration file.

    Parameters:
    config_path (str): Path to the configuration file.

    Returns:
    list: List of filenames read from the configuration file.
    """
    if not os.path.exists(config_path):
        return None
    with open(config_path, 'r') as file:
        filenames = [line.strip() for line in file if line.strip()]
    return filenames


--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/SOURCES.txt ---

LICENSE
README.md
setup.py
repo_analyzer/__init__.py
repo_analyzer/analyzer.py
repo_analyzer/file_ops.py
repo_analyzer/tree.py
repo_analyzer.egg-info/PKG-INFO
repo_analyzer.egg-info/SOURCES.txt
repo_analyzer.egg-info/dependency_links.txt
repo_analyzer.egg-info/entry_points.txt
repo_analyzer.egg-info/top_level.txt

--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/entry_points.txt ---

[console_scripts]
repo_analyzer = repo_analyzer.analyzer:main


--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/top_level.txt ---

repo_analyzer


--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/dependency_links.txt ---