forked from jonm3D/repo-analyzer
-
Notifications
You must be signed in to change notification settings - Fork 0
/
repo-analyzer_summary.txt
447 lines (343 loc) · 18 KB
/
repo-analyzer_summary.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
**Project Analysis Instructions**
Below is a summary of the project **repo-analyzer**, starting with the directory tree structure to provide an overview. This is followed by the main file, if specified, which serves as the primary runner for this tool. Use this main file to inform your understanding of all subsequent scripts, as it orchestrates the execution flow and primary logic of the project. Additional scripts have also been included for a comprehensive analysis of the tool.
User-specified files to include:
None
User-specified files to ignore:
build/*, repo_analyzer.egg_info/*
Please perform the following tasks:
1. **Summarize the Tool/Code Purpose**: Provide a high-level summary of what this tool or code is designed to do.
2. **Identify and Summarize Critical Functions and Dependencies**: Note any critical functions, methods, or dependencies within the project. Summarize their roles and how they interact with the main file.
3. **Contextual Analysis**: Use the main file to contextualize the functionality and importance of the subsequent scripts. Highlight how they contribute to the overall operation of the tool.
*Prompt Engineering Instructions*:
- **Core Function Focus**: Prioritize analysis on core functions, such as scientific computations, large computations, or scripts that manage multiple functions. Avoid spending time on wrapper functions or utility functions unless they play a significant role.
- **Execution Flow**: Clearly outline the execution flow starting from the main file to other dependent scripts.
- **File Boundaries**: Recognize the start of a new file with the pattern --- <file_path> ---.
- **Avoid Redundancy**: Do not reproduce the code verbatim. Instead, focus on storing and utilizing the code structure and logic in your memory for this analysis.
Directory Structure:
|-- LICENSE
|-- README.md
|-- build/
|-- bdist.macosx-11.0-arm64/
|-- lib/
|-- repo_analyzer/
|-- __init__.py
|-- analyzer.py
|-- cli.py
|-- file_ops.py
|-- tree.py
|-- repo-analyzer_summary.txt
|-- repo_analyzer/
|-- __init__.py
|-- analyzer.py
|-- file_ops.py
|-- tree.py
|-- repo_analyzer.egg-info/
|-- PKG-INFO
|-- SOURCES.txt
|-- dependency_links.txt
|-- entry_points.txt
|-- top_level.txt
|-- setup.py
|-- tests/
Concatenated Files:
--- /Users/jonathan/Documents/Research/repo-analyzer/README.md ---
# RepoConcat
This tool combines the contents of a GitHub repository into a single text file, making it easier to use with Large Language Models (LLMs). It generates a summary of the project, including a directory tree and concatenated contents of specified files, with the option to prioritize a main file.
## Features
- **Repository Summarization**: Combines the contents of a GitHub repository into a single text file for easy use with LLMs.
- **Directory Tree Generation**: Provides a detailed directory tree structure of the repository.
- **File Concatenation**: Concatenates specified files into an output file in the order listed in the configuration file.
- **Main File Prioritization**: Prioritizes a specified main file in the output.
- **Hidden Files Inclusion**: Optionally includes hidden files and directories.
- **Pattern Matching**: Supports wildcard and regex entries for specifying file patterns.
- **Ignore Patterns**: Supports excluding files or folders based on specified patterns.
- **Customization**: Allows customization of maximum characters, tree depth, and items per directory.
## Installation
To install the necessary dependencies, run:
```sh
pip install -r requirements.txt
```
## Usage
```sh
python main.py [directory] [-m MAIN_FILE] [-c MAX_CHARS] [-t TREE_DEPTH] [-i] [-x MAX_ITEMS] [-n INCLUDE] [-g IGNORE]
```
### Arguments
- `directory` (str): The path to the local repository.
- `-m`, `--main_file` (str): The main file to prioritize in the output.
- `-c`, `--max_chars` (int): Maximum number of characters to include in the output file.
- `-t`, `--tree_depth` (int): Maximum depth of the directory tree to include in the output file.
- `-i`, `--include_hidden`: Include hidden files and directories.
- `-x`, `--max_items` (int): Maximum number of items to include in each directory to avoid clutter.
- `-n`, `--include` (str): Path to the configuration file with filename patterns to include in the output.
- `-g`, `--ignore` (str): Path to the configuration file with filename patterns to exclude from the output.
### Example
```sh
python main.py /path/to/repository -m main.py -c 10000 -t 5 -i -x 100 -n include.txt -g ignore.txt
```
## Configuration Files
### Include File
The include file should contain filename patterns to include in the output, one pattern per line. Wildcards and regex entries are supported.
Example `include.txt`:
```
*.py
*.md
*.json
```
### Ignore File
The ignore file should contain filename patterns to exclude from the output, one pattern per line. Wildcards and regex entries are supported.
Example `ignore.txt`:
```
*.test.py
tests/
docs/
```
--- /Users/jonathan/Documents/Research/repo-analyzer/setup.py ---
from setuptools import setup, find_packages
setup(
name='repo_analyzer',
version='0.1',
packages=find_packages(),
install_requires=[],
entry_points={
'console_scripts': [
'repo_analyzer=repo_analyzer.analyzer:main',
],
},
include_package_data=True,
package_data={
'': ['repo_analyzer_config.txt'],
},
)
--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer/tree.py ---
import os
def generate_tree(directory, depth=3, current_depth=0, max_items=50, include_hidden=False):
"""
Generate a tree structure of the directory up to a specified depth.
Parameters:
directory (str): The root directory to start generating the tree from.
depth (int): The maximum depth of the directory tree to include.
current_depth (int): The current depth in the directory tree (used for recursion).
max_items (int): Maximum number of items to include in each directory to avoid clutter.
include_hidden (bool): Whether to include hidden files and directories.
Returns:
list: A list of strings representing the directory tree.
"""
tree = []
with os.scandir(directory) as entries:
items = [entry.name for entry in entries if include_hidden or not entry.name.startswith('.')]
items.sort()
if len(items) > max_items:
items = items[:max_items]
items.append('...') # Indicate that there are more items
for item in items:
item_path = os.path.join(directory, item)
if os.path.isdir(item_path):
tree.append(f"{' ' * current_depth}|-- {item}/")
if current_depth < depth - 1:
tree.extend(generate_tree(item_path, depth, current_depth + 1, max_items, include_hidden))
else:
tree.append(f"{' ' * current_depth}|-- {item}")
return tree
--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer/analyzer.py ---
import os
import argparse
import time
import fnmatch
from .file_ops import read_file, read_config_file
from .tree import generate_tree
def find_files_by_config(directory, include_patterns, ignore_patterns, include_hidden, valid_extensions):
"""
Find and return a list of text and code files in the directory matching the include patterns from the config file,
while excluding those matching the ignore patterns.
Parameters:
directory (str): The root directory to start searching for files.
include_patterns (list): List of filename patterns to include in the output.
ignore_patterns (list): List of filename patterns to exclude from the output.
include_hidden (bool): Whether to include hidden files.
valid_extensions (set): Set of valid file extensions to include.
Returns:
list: List of full paths to files that match the criteria.
"""
files_to_include = []
for root, _, files in os.walk(directory):
if not include_hidden:
files = [file for file in files if not file.startswith('.')]
for file in files:
file_path = os.path.join(root, file)
if any(fnmatch.fnmatch(file_path, os.path.join(directory, pattern)) for pattern in ignore_patterns):
continue
if not include_patterns or any(fnmatch.fnmatch(file_path, os.path.join(directory, pattern)) for pattern in include_patterns):
if os.path.splitext(file)[1] in valid_extensions:
files_to_include.append(file_path)
# Sort files_to_include according to the order in include_patterns
sorted_files = []
for pattern in include_patterns:
for file_path in files_to_include:
if fnmatch.fnmatch(file_path, os.path.join(directory, pattern)):
sorted_files.append(file_path)
# Reverse the order to match the order specified by the user in the config file
if include_patterns:
sorted_files.reverse()
else:
sorted_files = files_to_include
return sorted_files
def concatenate_files(files_to_include, output_file, max_chars, main_file=None):
"""
Concatenate contents of specified files into an output file.
Parameters:
files_to_include (list): List of file paths to include.
output_file (str): The path of the output file to write concatenated contents.
max_chars (int): The maximum number of characters to include in the output file.
main_file (str): Path to the main file to prioritize in the output.
"""
char_count = 0
if os.path.exists(output_file):
files_to_include = [f for f in files_to_include if f != output_file]
with open(output_file, 'a') as outfile:
if main_file and main_file in files_to_include:
outfile.write(f"\n\n--- {main_file} (Main File) ---\n\n")
content, char_count = read_file(main_file, max_chars, char_count)
outfile.write(content)
files_to_include.remove(main_file)
for file_path in files_to_include:
start_time = time.time()
print(f"Processing file: {file_path}")
content, char_count = read_file(file_path, max_chars, char_count)
elapsed_time = time.time() - start_time
if elapsed_time > 10:
print(f"Skipping file {file_path} (took too long: {elapsed_time:.2f} seconds)")
continue
if content:
outfile.write(f"\n\n--- {file_path} ---\n\n")
outfile.write(content)
if max_chars is not None and char_count >= max_chars:
print(f"Reached maximum character limit: {max_chars}")
return
print(f"Files concatenated successfully, total characters: {char_count}")
def main():
"""
Main function to parse arguments, generate directory tree, and concatenate files.
"""
parser = argparse.ArgumentParser(description="Analyze the structure and specific file types of a local repository.")
parser.add_argument("directory", type=str, help="The path to the local repository")
parser.add_argument("-m", "--main_file", type=str, help="The main file to prioritize in the output")
parser.add_argument("-c", "--max_chars", type=int, default=None, help="Maximum number of characters to include in the output file")
parser.add_argument("-t", "--tree_depth", type=int, default=10, help="Maximum depth of the directory tree to include in the output file")
parser.add_argument("-i", "--include_hidden", action='store_true', help="Include hidden files and directories")
parser.add_argument("-x", "--max_items", type=int, default=50, help="Maximum number of items to include in each directory to avoid clutter")
parser.add_argument("-n", "--include", type=str, help="Path to the configuration file with filename patterns to include in the output")
parser.add_argument("-g", "--ignore", type=str, help="Path to the configuration file with filename patterns to exclude from the output")
args = parser.parse_args()
directory = os.path.abspath(args.directory)
main_file = args.main_file
max_chars = args.max_chars
tree_depth = args.tree_depth
include_hidden = args.include_hidden
max_items = args.max_items
include_file = args.include
ignore_file = args.ignore
valid_extensions = {'.txt', '.py', '.md', '.json', '.xml', '.html', '.css', '.js', '.java', '.cpp', '.c', '.hpp', '.h', '.m', 'ipynb'}
include_patterns = []
ignore_patterns = []
if include_file:
include_file = os.path.abspath(include_file)
include_patterns = read_config_file(include_file)
if include_patterns is None or not include_patterns or all(name.isspace() for name in include_patterns):
print(f"Include file {include_file} not found, is empty, or only contains spaces.")
include_patterns = []
if ignore_file:
ignore_file = os.path.abspath(ignore_file)
ignore_patterns = read_config_file(ignore_file)
if ignore_patterns is None or not ignore_patterns or all(name.isspace() for name in ignore_patterns):
print(f"Ignore file {ignore_file} not found, is empty, or only contains spaces.")
ignore_patterns = []
project_name = os.path.basename(directory)
header = f"""
**Project Analysis Instructions**
Below is a summary of the project **{project_name}**, starting with the directory tree structure to provide an overview. This is followed by the main file, if specified, which serves as the primary runner for this tool. Use this main file to inform your understanding of all subsequent scripts, as it orchestrates the execution flow and primary logic of the project. Additional scripts have also been included for a comprehensive analysis of the tool.
User-specified files to include:
{', '.join(include_patterns) if include_patterns else 'None'}
User-specified files to ignore:
{', '.join(ignore_patterns) if ignore_patterns else 'None'}
Please perform the following tasks:
1. **Summarize the Tool/Code Purpose**: Provide a high-level summary of what this tool or code is designed to do.
2. **Identify and Summarize Critical Functions and Dependencies**: Note any critical functions, methods, or dependencies within the project. Summarize their roles and how they interact with the main file.
3. **Contextual Analysis**: Use the main file to contextualize the functionality and importance of the subsequent scripts. Highlight how they contribute to the overall operation of the tool.
*Prompt Engineering Instructions*:
- **Core Function Focus**: Prioritize analysis on core functions, such as scientific computations, large computations, or scripts that manage multiple functions. Avoid spending time on wrapper functions or utility functions unless they play a significant role.
- **Execution Flow**: Clearly outline the execution flow starting from the main file to other dependent scripts.
- **File Boundaries**: Recognize the start of a new file with the pattern --- <file_path> ---.
- **Avoid Redundancy**: Do not reproduce the code verbatim. Instead, focus on storing and utilizing the code structure and logic in your memory for this analysis.
"""
output_file = os.path.join(directory, project_name + "_summary.txt")
print(f"Generating directory tree for {directory}...")
tree = generate_tree(directory, depth=tree_depth, max_items=max_items, include_hidden=include_hidden)
with open(output_file, 'w') as outfile:
outfile.write(header)
outfile.write("Directory Structure:\n")
outfile.write("\n".join(tree))
outfile.write("\n\nConcatenated Files:\n")
print(f"Finding files to concatenate in {directory}...")
files_to_include = find_files_by_config(directory, include_patterns, ignore_patterns, include_hidden, valid_extensions)
if not files_to_include:
print("No files found to include based on the configuration.")
print(f"Concatenating files for {directory}...")
concatenate_files(files_to_include, output_file, max_chars, main_file=main_file)
print(f"Output saved to {output_file}")
if __name__ == "__main__":
main()
--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer/file_ops.py ---
import os
def read_file(file_path, max_chars, char_count):
"""
Read a file up to a maximum number of characters.
Parameters:
file_path (str): The path to the file.
max_chars (int): The maximum number of characters to read from the file.
char_count (int): The current character count.
Returns:
tuple: A tuple containing the file content and the new character count.
"""
with open(file_path, 'r', errors='ignore') as infile:
content = []
for line in infile:
if max_chars is not None and char_count + len(line) > max_chars:
content.append(line[:max_chars - char_count])
char_count = max_chars
break
else:
content.append(line)
char_count += len(line)
return ''.join(content), char_count
def read_config_file(config_path):
"""
Read filenames from a configuration file.
Parameters:
config_path (str): Path to the configuration file.
Returns:
list: List of filenames read from the configuration file.
"""
if not os.path.exists(config_path):
return None
with open(config_path, 'r') as file:
filenames = [line.strip() for line in file if line.strip()]
return filenames
--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/SOURCES.txt ---
LICENSE
README.md
setup.py
repo_analyzer/__init__.py
repo_analyzer/analyzer.py
repo_analyzer/file_ops.py
repo_analyzer/tree.py
repo_analyzer.egg-info/PKG-INFO
repo_analyzer.egg-info/SOURCES.txt
repo_analyzer.egg-info/dependency_links.txt
repo_analyzer.egg-info/entry_points.txt
repo_analyzer.egg-info/top_level.txt
--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/entry_points.txt ---
[console_scripts]
repo_analyzer = repo_analyzer.analyzer:main
--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/top_level.txt ---
repo_analyzer
--- /Users/jonathan/Documents/Research/repo-analyzer/repo_analyzer.egg-info/dependency_links.txt ---