Commit 964216f: Init commit

d4rkc0de committed Sep 18, 2024 (0 parents)
Showing 37 changed files with 15,130 additions and 0 deletions.
74 changes: 74 additions & 0 deletions .env.md
The `.env` file stores environment variables for the backend server. It contains two sections: Text Processing Configuration and Image Processing Configuration.

## Text Processing Configuration

These variables are used for document (text) processing:

- TEXT_API_END_POINT: Specifies the API endpoint for text processing.
- TEXT_MODEL_NAME: Defines the model to be used for text processing.
- TEXT_API_KEYS: A list of API key(s) used for authentication when calling the text API endpoint. **Using multiple keys helps avoid rate limits.**

## Image Processing Configuration

These variables are used for image processing:

- IMAGE_API_END_POINT: Specifies the API endpoint for image processing.
- IMAGE_MODEL_NAME: Defines the model used for image processing.
- IMAGE_API_KEYS: A list of API key(s) used for image processing requests. Using multiple keys helps avoid rate limits (see the rotation sketch below).
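
How the backend actually cycles through these keys is not shown in this commit, but as an illustration, a key list like `TEXT_API_KEYS` can be rotated per request with `itertools.cycle`:

```python
import itertools
import json

# Example values only; in the app they come from the .env file
TEXT_API_KEYS = json.loads('["sk-xxx","sk-yyy"]')

# Round-robin iterator: each call hands out the next key, spreading requests
# across keys so no single key hits its rate limit as quickly
key_cycle = itertools.cycle(TEXT_API_KEYS)

def next_api_key() -> str:
    return next(key_cycle)

print(next_api_key())  # sk-xxx
print(next_api_key())  # sk-yyy
print(next_api_key())  # sk-xxx again
```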


## Examples:

- **OPENAI**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api.openai.com/v1
TEXT_MODEL_NAME=gpt-4o
TEXT_API_KEYS=["sk-xxx","sk-yyy"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=https://api.openai.com/v1
IMAGE_MODEL_NAME=gpt-4o
IMAGE_API_KEYS=["sk-xxx","sk-yyy"]
```

- **GROQ**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api.groq.com/openai/v1
TEXT_MODEL_NAME=llama3-70b-8192
TEXT_API_KEYS=["gsk_xxx","gsk_yyy"]

# API and MODEL used for images processing (Groq has no vision models yet, so a local Ollama model is used)
IMAGE_API_END_POINT=http://localhost:11434/v1
IMAGE_MODEL_NAME=moondream:latest
IMAGE_API_KEYS=["ollama"]
```

- **OLLAMA**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=http://localhost:11434/v1
TEXT_MODEL_NAME=gemma2:latest
TEXT_API_KEYS=["ollama"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=http://localhost:11434/v1
IMAGE_MODEL_NAME=moondream:latest
IMAGE_API_KEYS=["ollama"]
```


- **HUGGING FACE**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api-inference.huggingface.co/v1
TEXT_MODEL_NAME=microsoft/Phi-3-mini-4k-instruct
TEXT_API_KEYS=["hf_xxx","hf_yyy"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=https://api-inference.huggingface.co/v1
IMAGE_MODEL_NAME=nlpconnect/vit-gpt2-image-captioning
IMAGE_API_KEYS=["hf_xxx","hf_yyy"]
```
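
The `*_API_KEYS` values are written as JSON arrays inside the `.env` file. The loading code is not part of this excerpt, but a minimal sketch of how such values can be read (assuming the `python-dotenv` package) looks like this:

```python
import json
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load backend/.env into the process environment
load_dotenv()

TEXT_API_END_POINT = os.getenv("TEXT_API_END_POINT")
TEXT_MODEL_NAME = os.getenv("TEXT_MODEL_NAME")
# The *_API_KEYS values are JSON arrays, so they parse directly with json.loads
TEXT_API_KEYS = json.loads(os.getenv("TEXT_API_KEYS", "[]"))

print(TEXT_API_END_POINT, TEXT_MODEL_NAME, TEXT_API_KEYS)
```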
22 changes: 22 additions & 0 deletions .gitignore
# IDE and OS specific files
**/.idea/
**/.vscode/
.DS_Store
Thumbs.db

# Frontend (Angular) specific files
frontend/node_modules/
frontend/dist/
frontend/.angular/
frontend/*.js.map

# Backend (FastAPI) specific files
backend/__pycache__/
backend/*.pyc
backend/*.pyo
backend/*.pyd
backend/venv/
venv/
backend/env/
**/__pycache__/
backend/app/*.db
144 changes: 144 additions & 0 deletions README.md
# FileWizardAi

## Description

FileWizardAi is a Python/Angular project that automatically organizes your files into a well-structured directory hierarchy and renames them according to their content. The tool is ideal for anyone looking to declutter their digital workspace: it sorts files into appropriate folders and gives them descriptive names, making them easier to manage and locate. You can also enter a text prompt to instantly search for files related to your query; the most relevant files are returned based on their content.

The app also features a caching system to minimize API calls, ensuring that only new or modified files are processed.

### Example:

**Before**

```bash
/home/user
├── Downloads
│ ├── 6.1 Course Curriculum v2.pdf
│ └── trip_paris.txt
│ └── 8d71473c-533f-4ba3-9bce-55d3d9a6662a.jpg
│ └── Screenshot_from_2024-06-10_21-39-24.png
```

**After**

```bash
/home/user/Downloads
├─ docs
│ └─ certifications
│ └─ databricks
│ └─ data_engineer_associate
│ └─ curriculum_v2.pdf
├─ Personal Photos
│ └─ 2024
│ └─ 03
│ └─ 01
│ └─ person_in_black_shirt.jpg
├─ finance-docs
│ └─ trip-expenses
│ └─ paris
│ └─ trip-justification.txt
└─ project Assets
└─ instructions_screenshot.png
```

### Video tutorial:

[![Watch the video](./yt_video_logo.png)](https://www.youtube.com/watch?v=T1rXLox80rM)


## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Run in Development Mode](#run-in-development-mode)
- [Technical architecture](#technical-architecture)
- [License](#license)

## Installation

Make sure you have Python installed on your machine.

First, clone the repository:

```bash
git clone https://github.com/AIxHunter/FileWizardAi.git
```

Navigate to the backend folder and update your `.env` file according to the [documentation](.env.md). Then, install the required packages (preferably in a virtual environment such as venv or conda):

```bash
cd backend
pip install -r requirements.txt
```

## Usage

Run the backend server

```bash
cd backend
uvicorn app.server:app --host localhost --port 8000
```

The app will be available at http://localhost:8000/.
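
To sanity-check the server, note that FastAPI normally exposes interactive API docs at `/docs` unless that has been disabled; for example:

```python
import requests  # assumes the requests package is available

resp = requests.get("http://localhost:8000/docs", timeout=5)
print(resp.status_code)  # 200 means the backend is up and serving
```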

## Run in Development Mode

If you are a developer and want to modify the frontend, you can run the frontend and backend separately. Here is how:

Install Node.js: https://nodejs.org/

Install the Angular CLI:

```bash
npm install -g @angular/cli
```

Run frontend:

```bash
cd frontend
npm install
ng serve
```

The frontend will be available at `http://localhost:4200`.

To package the frontend, run:

```bash
ng build --base-href static/
```

Run backend:

Update your `.env` file with the desired API settings (check the [documentation](.env.md)), then:

```bash
cd backend
uvicorn app.server:app --host localhost --port 8000 --reload
```

## Technical architecture

<img src="filewizardai_architecture.png" alt="FileWizardAi architecture diagram" width="600"/>

1. Send a request from the Angular frontend (e.g., organize files).
2. The backend receives the request through a FastAPI REST API.
3. Check SQLite to see whether the file has already been processed (cached).
4. Return the cached summary if the file was already processed.
5. If the file has not been processed before, send it to the LLM for summarization.
6. Cache the summary in SQLite.
7. Return the summary to the Angular frontend.
8. Display the summary to the user and perform actions if needed (a minimal sketch of this flow follows the list).
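
The endpoint code itself is not part of this excerpt, but steps 3 to 6 can be sketched with the `SQLiteDB` helper added in this commit and a hypothetical `summarize_with_llm` stand-in:

```python
import hashlib
from pathlib import Path

from app.database import SQLiteDB  # helper class added in this commit


def summarize_with_llm(text: str) -> str:
    """Hypothetical stand-in for the real LLM call (not shown in this excerpt)."""
    return text[:100]


def get_summary(file_path: str, db: SQLiteDB) -> str:
    content = Path(file_path).read_bytes()
    file_hash = hashlib.md5(content).hexdigest()  # hashing scheme assumed, not defined in this commit

    # Steps 3-4: reuse the cached summary if this exact file version was processed before
    if db.is_file_exist(file_path, file_hash):
        return db.get_file_summary(file_path)

    # Steps 5-6: otherwise summarize the file and cache the result
    summary = summarize_with_llm(content.decode(errors="ignore"))
    db.insert_file_summary(file_path, file_hash, summary)
    return summary
```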

## License

This project is licensed under the MIT License.
9 changes: 9 additions & 0 deletions backend/.env
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api.groq.com/openai/v1
TEXT_MODEL_NAME=llama3-70b-8192
TEXT_API_KEYS=["gsk_xxx"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=http://localhost:11434/v1
IMAGE_MODEL_NAME=moondream:latest
IMAGE_API_KEYS=["ollama"] # Required but not used
Empty file added backend/app/__init__.py
63 changes: 63 additions & 0 deletions backend/app/database.py
```python
import sqlite3


class SQLiteDB:
    def __init__(self):
        # Open (or create) the local cache database and make sure the summary table exists
        self.conn = sqlite3.connect('FileWizardAi.db')
        self.cursor = self.conn.cursor()
        create_table_query = (
            "CREATE TABLE IF NOT EXISTS files_summary ("
            "file_path TEXT PRIMARY KEY,"
            "file_hash TEXT NOT NULL,"
            "summary TEXT)"
        )
        self.cursor.execute(create_table_query)
        self.conn.commit()

    def select(self, table_name, where_clause=None):
        # Generic helper: fetch all rows from a table, optionally filtered
        sql = f"SELECT * FROM {table_name}"
        if where_clause:
            sql += f" WHERE {where_clause}"
        self.cursor.execute(sql)
        return self.cursor.fetchall()

    def is_file_exist(self, file_path, file_hash):
        # True if this exact path/hash pair is already cached (file unchanged since last run)
        self.cursor.execute(
            "SELECT * FROM files_summary WHERE file_path = ? AND file_hash = ?",
            (file_path, file_hash),
        )
        file = self.cursor.fetchone()
        return bool(file)

    def insert_file_summary(self, file_path, file_hash, summary):
        # Upsert: update the summary if the path is already known, otherwise insert a new row
        c = self.conn.cursor()
        c.execute("SELECT * FROM files_summary WHERE file_path=?", (file_path,))
        row_exists = c.fetchone()

        if row_exists:
            c.execute(
                "UPDATE files_summary SET file_hash=?, summary=? WHERE file_path=?",
                (file_hash, summary, file_path),
            )
        else:
            c.execute(
                "INSERT INTO files_summary (file_path, file_hash, summary) VALUES (?, ?, ?)",
                (file_path, file_hash, summary),
            )
        self.conn.commit()

    def get_file_summary(self, file_path):
        self.cursor.execute("SELECT summary FROM files_summary WHERE file_path = ?", (file_path,))
        result = self.cursor.fetchone()
        return result[0] if result else None

    def drop_table(self):
        self.cursor.execute("DROP TABLE IF EXISTS files_summary")
        self.conn.commit()

    def get_all_files(self):
        self.cursor.execute("SELECT file_path FROM files_summary")
        results = self.cursor.fetchall()
        files_path = [row[0] for row in results]
        return files_path

    def update_file(self, old_file_path, new_file_path, new_hash):
        # Keep the cache in sync after a file has been moved or renamed
        self.cursor.execute(
            "UPDATE files_summary SET file_path = ?, file_hash = ? WHERE file_path = ?",
            (new_file_path, new_hash, old_file_path),
        )
        self.conn.commit()

    def delete_records(self, file_paths):
        placeholders = ",".join("?" * len(file_paths))
        self.cursor.execute(
            f"DELETE FROM files_summary WHERE file_path IN ({placeholders})", file_paths
        )
        self.conn.commit()

    def close(self):
        self.conn.close()
```