Ignore Binary Files #53

Open
wants to merge 28 commits into
base: main
28 commits
8d59c15
add copy to clipboard option
hargup Mar 17, 2023
b094525
add requirements.txt
hargup Mar 17, 2023
7e2376d
Merge pull request #1 from felvin-search/clipboard
hargup Mar 17, 2023
5fb35a7
add script to install gpt_repository_loader as a console script
hargup Mar 17, 2023
e05b63a
Merge pull request #2 from felvin-search/console_script
hargup Mar 17, 2023
20bab53
fix the ignoring of dist and build
hargup Mar 17, 2023
bfbee00
allow using the script as a library as well
hargup Mar 19, 2023
a214ffc
add a basic release script
hargup Mar 19, 2023
4de5768
bump version
hargup Mar 19, 2023
9c5c7cb
Merge pull request #3 from felvin-search/console_script_2
hargup Mar 19, 2023
204766d
Use .gitignore if .gptignore is not present, also ignore .git
hargup Mar 19, 2023
5ace4f2
Merge pull request #4 from felvin-search/ignore_gitignore
hargup Mar 21, 2023
3c95317
print directory structure
hargup Mar 21, 2023
a07429d
Merge pull request #5 from felvin-search/repo_context
hargup Mar 21, 2023
17b3b19
add pycache in default ignore list
hargup Mar 22, 2023
24229d0
add node_modules in the default ignore list
hargup Mar 23, 2023
c714082
use argparse for argument parsing
hargup Mar 28, 2024
ca767dd
bump patch version
hargup Mar 28, 2024
2748a92
don't write to repo if -c flag is there
hargup Mar 29, 2024
825fe7e
Update README.md
hargup May 16, 2024
4d974a0
use git ls-files to get the list of tracked files
hargup May 16, 2024
05f9d49
bump version
hargup May 16, 2024
6208255
add package-lock.json in default ignore list
hargup May 21, 2024
b7a3311
bump version
hargup May 29, 2024
0189bfb
update readme
hargup May 29, 2024
9ecffe8
ignore image, video and audio files
hargup May 31, 2024
4465230
ignore yarn files
hargup May 31, 2024
a5ce37a
bump version
hargup May 31, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -4,6 +4,10 @@ __pycache__/
*.pyo
*.pyd

dist
build
dist/*
build/*
# Output file
output.txt

6 changes: 5 additions & 1 deletion .gptignore
@@ -4,4 +4,8 @@ __pycache__/
.git/*
.gptignore
LICENSE
.github/*
.github/*
dist
build
dist/*
build/*
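The ignore entries above are matched with `fnmatch.fnmatch` by the `should_ignore` helper in this PR. The following is a minimal sketch of how such patterns behave, assuming paths relative to the repository root; the sample paths are made up for illustration. It suggests why both the bare name (`dist`) and the wildcard form (`dist/*`) are listed: the bare name only matches a path that is exactly `dist`, while `dist/*` matches anything underneath it.

```python
import fnmatch

def should_ignore(file_path, ignore_list):
    """Return True if file_path matches any glob pattern (same logic as the PR's helper)."""
    return any(fnmatch.fnmatch(file_path, pattern) for pattern in ignore_list)

patterns = ["dist", "build", "dist/*", "build/*"]

# Hypothetical paths, used only to illustrate the matching rules.
print(should_ignore("dist", patterns))                   # exact name match
print(should_ignore("dist/pkg-0.9.3.tar.gz", patterns))  # matched by "dist/*"
print(should_ignore("dist/a/b.whl", patterns))           # "*" also spans "/" in fnmatch
print(should_ignore("src/dist_utils.py", patterns))      # not matched by any pattern
```

Note that `fnmatch` treats `*` as matching any characters, including path separators, so `dist/*` also covers nested files.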
65 changes: 29 additions & 36 deletions README.md
@@ -1,38 +1,31 @@
# gpt-repository-loader

`gpt-repository-loader` is a command-line tool that converts the contents of a Git repository into a text format, preserving the structure of the files and file contents. The generated output can be interpreted by AI language models, allowing them to process the repository's contents for various tasks, such as code review or documentation generation.

## Contributing
Some context around building this is [located here](https://github.com/mpoon/gpt-repository-loader/discussions/18). Appreciate any issues and pull requests in the spirit of having mostly GPT build out this tool. Using [ChatGPT Plus](https://chat.openai.com/) is recommended for quick access to GPT-4.

## Getting Started

To get started with `gpt-repository-loader`, follow these steps:

1. Ensure you have Python 3 installed on your system.
2. Clone or download the `gpt-repository-loader` repository.
3. Navigate to the repository's root directory in your terminal.
4. Run `gpt-repository-loader` with the following command:

```bash
python gpt_repository_loader.py /path/to/git/repository
```
Replace `/path/to/git/repository` with the path to the Git repository you want to process.

5. The tool will generate an output.txt file containing the text representation of the repository. You can now use this file as input for AI language models or other text-based processing tasks.

## Running Tests

To run the tests for `gpt-repository-loader`, follow these steps:

1. Ensure you have Python 3 installed on your system.
2. Navigate to the repository's root directory in your terminal.
3. Run the tests with the following command:

```bash
python -m unittest test_gpt_repository_loader.py
```
Now, the test harness is added to the `gpt-repository-loader` project. You can run the tests by executing the command `python -m unittest test_gpt_repository_loader.py` in your terminal.

## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Installation

`pip install gpt-repository-loader`

## Linux Requirements
On Linux, ensure that you have `xclip` installed for clipboard functionality. You can install it using:
```bash
sudo apt-get install xclip # Debian/Ubuntu
sudo yum install xclip # Fedora/CentOS
```

## How to use?
Go to the directory you are interested in and run:
```
gpt-repository-loader . -c
```
This copies ALL the Git-tracked content in the repository to the clipboard, so you can paste it into [Gemini](https://aistudio.google.com/app/prompts/new_chat)/[Claude](https://claude.ai)/[ChatGPT](https://chatgpt.com) and ask questions about it.
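
The generated text uses a simple sectioned layout, as written by `process_repository` and `git_repo_to_text` in this PR: a preamble, then for each file a `----` line, the file path, and the file contents, and finally `--END--`. A minimal sketch of that layout, assuming an in-memory dict instead of a real repository (`render_repo_text`, `sample`, and the placeholder preamble are hypothetical names, not part of the package):

```python
import io

def render_repo_text(files, preamble="(preamble text)"):
    """Lay out {path: contents} as: preamble, then '----', path, contents per file, then '--END--'."""
    out = io.StringIO()
    out.write(f"{preamble}\n")
    for path, contents in files.items():
        out.write("-" * 4 + "\n")   # section separator
        out.write(f"{path}\n")      # file path on its own line
        out.write(f"{contents}\n")  # file contents
    out.write("--END--")            # marks the end of the repository text
    return out.getvalue()

# Hypothetical sample files, for illustration only.
sample = {"README.md": "# demo", "src/app.py": "print('hi')"}
text = render_repo_text(sample)
print(text)
```

Anything typed after `--END--` is meant to be read by the model as instructions about the repository.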

### Available Command Line Flags
* `repo_path`: (Required) Path to the Git repository.
* `-p`, `--preamble`: Path to a preamble file to include before the repository content.
* `-c`, `--copy`: Copies the repository contents to the clipboard. If not provided, the output will be written to a file named `output.txt` in the current directory.
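
The flags above map onto a small `argparse` setup. The sketch below mirrors the arguments defined in `main()` in this PR and parses a sample command line; `build_parser` is a hypothetical helper name used only for this illustration.

```python
import argparse

def build_parser():
    """Build a parser with the same three arguments the CLI accepts."""
    parser = argparse.ArgumentParser(description="Convert a Git repository to text.")
    parser.add_argument("repo_path", help="Path to the Git repository.")
    parser.add_argument("-p", "--preamble", help="Path to a preamble file.")
    parser.add_argument("-c", "--copy", action="store_true",
                        help="Copy the repository contents to clipboard.")
    return parser

# Equivalent to running: gpt-repository-loader . -c
args = build_parser().parse_args([".", "-c"])
print(args.repo_path, args.preamble, args.copy)
```

When `-c` is omitted, `args.copy` is `False` and the tool writes `output.txt` instead of using the clipboard.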

## What to use it for?
- Build a README for codebases
- Work with Legacy code
- Debug issues

Gemini's 1M context window is REALLY big, and it is underutilized.
61 changes: 0 additions & 61 deletions gpt_repository_loader.py

This file was deleted.

1 change: 1 addition & 0 deletions gpt_repository_loader/__init__.py
@@ -0,0 +1 @@
from .gpt_repository_loader import main, git_repo_to_text, print_directory_structure, get_ignore_list
113 changes: 113 additions & 0 deletions gpt_repository_loader/gpt_repository_loader.py
@@ -0,0 +1,113 @@
#!/usr/bin/env python3

import os
import argparse
import fnmatch
import pyperclip
import io
import subprocess

def should_ignore(file_path, ignore_list):
    for pattern in ignore_list:
        if fnmatch.fnmatch(file_path, pattern):
            return True
    return False

def get_ignore_list(repo_path):
    ignore_list = []
    ignore_file_path = None

    gpt_ignore_path = os.path.join(repo_path, ".gptignore")
    git_ignore_path = os.path.join(repo_path, ".gitignore")

    # Prefer .gptignore; fall back to .gitignore when it is absent
    if os.path.exists(gpt_ignore_path):
        ignore_file_path = gpt_ignore_path
    elif os.path.exists(git_ignore_path):
        ignore_file_path = git_ignore_path
    else:
        print("No ignore file present")

    if ignore_file_path:
        with open(ignore_file_path, 'r') as ignore_file:
            for line in ignore_file:
                line = line.strip()
                # Skip blank lines and comments
                if not line or line.startswith("#"):
                    continue
                ignore_list.append(line)

    default_ignore_list = ['dist', 'dist/', 'dist/*', 'sdist', 'sdist/', 'sdist/*', '.git/', '/.git/', '.git', '.git/*', '.gptignore', '.gitignore', 'node_modules', 'node_modules/*', '__pycache__', '__pycache__/*', 'package-lock.json', 'yarn.lock', 'yarn-error.log']
    image_ignore_list = ['*.png', '*.jpg', '*.jpeg', '*.gif', '*.bmp', '*.ico', '*.cur', '*.tiff', '*.webp', '*.avif']
    video_ignore_list = ['*.mp4', '*.mov', '*.wmv', '*.avi', '*.mkv', '*.flv', '*.webm', '*.mp3', '*.wav', '*.aac', '*.m4a', '*.mpa', '*.mpeg', '*.mpe', '*.mpg', '*.mpi', '*.mpt', '*.mpx', '*.ogv', '*.webm', '*.wmv', '*.yuv']
    audio_ignore_list = ['*.mp3', '*.wav', '*.aac', '*.m4a', '*.mpa', '*.mpeg', '*.mpe', '*.mpg', '*.mpi', '*.mpt', '*.mpx', '*.ogv', '*.webm', '*.wmv', '*.yuv']
    ignore_list += default_ignore_list + image_ignore_list + video_ignore_list + audio_ignore_list

    return ignore_list

def process_repository(repo_path, ignore_list, output_stream):
    git_files = subprocess.check_output(["git", "ls-files"], cwd=repo_path, universal_newlines=True).splitlines()

    for file_path in git_files:
        if not should_ignore(file_path, ignore_list):
            full_path = os.path.join(repo_path, file_path)
            with open(full_path, 'r', errors='ignore') as file:
                contents = file.read()
            output_stream.write("-" * 4 + "\n")
            output_stream.write(f"{file_path}\n")
            output_stream.write(f"{contents}\n")


def git_repo_to_text(repo_path, preamble_file=None):
    ignore_list = get_ignore_list(repo_path)

    output_stream = io.StringIO()

    if preamble_file:
        with open(preamble_file, 'r') as pf:
            preamble_text = pf.read()
            output_stream.write(f"{preamble_text}\n")
    else:
        output_stream.write("The following text is a Git repository with code. The structure of the text are sections that begin with ----, followed by a single line containing the file path and file name, followed by a variable amount of lines containing the file contents. The text representing the Git repository ends when the symbols --END-- are encountered. Any further text beyond --END-- are meant to be interpreted as instructions using the aforementioned Git repository as context.\n")

    process_repository(repo_path, ignore_list, output_stream)

    output_stream.write("--END--")

    return output_stream.getvalue()

def main():
    parser = argparse.ArgumentParser(description="Convert a Git repository to text.")
    parser.add_argument("repo_path", help="Path to the Git repository.")
    parser.add_argument("-p", "--preamble", help="Path to a preamble file.")
    parser.add_argument("-c", "--copy", action="store_true", help="Copy the repository contents to clipboard.")
    args = parser.parse_args()

    repo_as_text = git_repo_to_text(args.repo_path, args.preamble)

    if args.copy:
        pyperclip.copy(repo_as_text)
        print("Repository contents copied to clipboard.")
    else:
        with open('output.txt', 'w') as output_file:
            output_file.write(repo_as_text)
        print("Repository contents written to output.txt.")


def print_directory_structure(repo_path, indent=0, max_depth=2, ignore_list=None):
    if ignore_list is None:
        ignore_list = get_ignore_list(repo_path)

    if indent <= max_depth:
        for item in os.listdir(repo_path):
            full_path = os.path.join(repo_path, item)
            if os.path.isdir(full_path):
                if should_ignore(full_path, ignore_list) or should_ignore(item, ignore_list):
                    continue
                print("|   " * indent + "|--" + item + "/")
                print_directory_structure(full_path, indent + 1, max_depth, ignore_list)
            else:
                if should_ignore(full_path, ignore_list) or should_ignore(item, ignore_list):
                    continue
                print("|   " * indent + "|--" + item)

if __name__ == "__main__":
    main()
7 changes: 7 additions & 0 deletions release.sh
@@ -0,0 +1,7 @@
#!/bin/bash

# Remember you'll need to manually update the version in setup.py
# Also TWINE_USERNAME and TWINE_PASSWORD env variables should be set
python setup.py sdist bdist_wheel
source .env
twine upload dist/* --verbose --username "$TWINE_USERNAME" --password "$TWINE_PASSWORD"
1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
pyperclip
37 changes: 37 additions & 0 deletions setup.py
@@ -0,0 +1,37 @@
#!/usr/bin/env python3

from setuptools import setup, find_packages

with open("README.md", "r") as fh:
    long_description = fh.read()

setup(
    name="gpt-repository-loader",
    version="0.9.3",
    author="Felvin",
    author_email="[email protected]",
    description="A utility to convert a Git repository into a text representation.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/felvin-search/gpt-repository-loader",
    packages=find_packages(),
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
    ],
    python_requires=">=3.6",
    install_requires=["pyperclip"],
    entry_points={
        "console_scripts": [
            "gpt-repository-loader=gpt_repository_loader:main",
        ],
    },
)