Commit 964216f: Init commit

d4rkc0de committed Sep 18, 2024 (0 parents)
Showing 37 changed files with 15,130 additions and 0 deletions.
74 changes: 74 additions & 0 deletions .env.md
The `.env` file stores environment variables for the backend server. It contains two sections: Text Processing Configuration and Image Processing Configuration.

## Text Processing Configuration

These variables are used for document (text) processing:

- TEXT_API_END_POINT: Specifies the API endpoint for text processing.
- TEXT_MODEL_NAME: Defines the model to be used for text processing.
- TEXT_API_KEYS: A list of API key(s) used for authentication when calling the text API endpoint. **Using multiple keys helps avoid rate limits.**

## Image Processing Configuration

These variables are used for image processing:

- IMAGE_API_END_POINT: Specifies the API endpoint for image processing.
- IMAGE_MODEL_NAME: Defines the model used for image processing.
- IMAGE_API_KEYS: A list of API key(s) used for image processing requests. Using multiple keys helps avoid rate limits (see the rotation sketch below).
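
How the backend actually cycles through these keys is not shown in this commit, but as an illustration, a key list like `TEXT_API_KEYS` can be rotated per request with `itertools.cycle`:

```python
import itertools
import json

# Example values only; in the app they come from the .env file
TEXT_API_KEYS = json.loads('["sk-xxx","sk-yyy"]')

# Round-robin iterator: each call hands out the next key, spreading requests
# across keys so no single key hits its rate limit as quickly
key_cycle = itertools.cycle(TEXT_API_KEYS)

def next_api_key() -> str:
    return next(key_cycle)

print(next_api_key())  # sk-xxx
print(next_api_key())  # sk-yyy
print(next_api_key())  # sk-xxx again
```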


## Examples:

- **OPENAI**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api.openai.com/v1
TEXT_MODEL_NAME=gpt-4o
TEXT_API_KEYS=["sk-xxx","sk-yyy"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=https://api.openai.com/v1
IMAGE_MODEL_NAME=gpt-4o
IMAGE_API_KEYS=["sk-xxx","sk-yyy"]
```

- **GROQ**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api.groq.com/openai/v1
TEXT_MODEL_NAME=llama3-70b-8192
TEXT_API_KEYS=["gsk_xxx","gsk_yyy"]

# API and MODEL used for images processing (Groq has no vision models yet, so a local Ollama model is used)
IMAGE_API_END_POINT=http://localhost:11434/v1
IMAGE_MODEL_NAME=moondream:latest
IMAGE_API_KEYS=["ollama"]
```

- **OLLAMA**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=http://localhost:11434/v1
TEXT_MODEL_NAME=gemma2:latest
TEXT_API_KEYS=["ollama"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=http://localhost:11434/v1
IMAGE_MODEL_NAME=moondream:latest
IMAGE_API_KEYS=["ollama"]
```


- **HUGGING FACE**
```bash
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api-inference.huggingface.co/v1
TEXT_MODEL_NAME=microsoft/Phi-3-mini-4k-instruct
TEXT_API_KEYS=["hf_xxx","hf_yyy"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=https://api-inference.huggingface.co/v1
IMAGE_MODEL_NAME=nlpconnect/vit-gpt2-image-captioning
IMAGE_API_KEYS=["hf_xxx","hf_yyy"]
```
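
The `*_API_KEYS` values are written as JSON arrays inside the `.env` file. The loading code is not part of this excerpt, but a minimal sketch of how such values can be read (assuming the `python-dotenv` package) looks like this:

```python
import json
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load backend/.env into the process environment
load_dotenv()

TEXT_API_END_POINT = os.getenv("TEXT_API_END_POINT")
TEXT_MODEL_NAME = os.getenv("TEXT_MODEL_NAME")
# The *_API_KEYS values are JSON arrays, so they parse directly with json.loads
TEXT_API_KEYS = json.loads(os.getenv("TEXT_API_KEYS", "[]"))

print(TEXT_API_END_POINT, TEXT_MODEL_NAME, TEXT_API_KEYS)
```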
22 changes: 22 additions & 0 deletions .gitignore
# IDE and OS specific files
**/.idea/
**/.vscode/
.DS_Store
Thumbs.db

# Frontend (Angular) specific files
frontend/node_modules/
frontend/dist/
frontend/.angular/
frontend/*.js.map

# Backend (FastAPI) specific files
backend/__pycache__/
backend/*.pyc
backend/*.pyo
backend/*.pyd
backend/venv/
venv/
backend/env/
**/__pycache__/
backend/app/*.db
144 changes: 144 additions & 0 deletions README.md
# FileWizardAi

## Description

FileWizardAi is a Python/Angular project that automatically organizes your files into a well-structured directory hierarchy and renames them according to their content. The tool is ideal for anyone looking to declutter their digital workspace: it sorts files into appropriate folders and gives them descriptive names, making them easier to manage and locate. You can also enter a text prompt to instantly search for files related to your query; the most relevant files are returned based on their content.

The app also features a caching system to minimize API calls, ensuring that only new or modified files are processed.

### Example:

**Before**

```bash
/home/user
├── Downloads
│ ├── 6.1 Course Curriculum v2.pdf
│ └── trip_paris.txt
│ └── 8d71473c-533f-4ba3-9bce-55d3d9a6662a.jpg
│ └── Screenshot_from_2024-06-10_21-39-24.png
```

**After**

```bash
/home/user/Downloads
├─ docs
│ └─ certifications
│ └─ databricks
│ └─ data_engineer_associate
│ └─ curriculum_v2.pdf
├─ Personal Photos
│ └─ 2024
│ └─ 03
│ └─ 01
│ └─ person_in_black_shirt.jpg
├─ finance-docs
│ └─ trip-expenses
│ └─ paris
│ └─ trip-justification.txt
└─ project Assets
└─ instructions_screenshot.png
```

### Video tutorial:

[![Watch the video](./yt_video_logo.png)](https://www.youtube.com/watch?v=T1rXLox80rM)


## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Run in Development Mode](#run-in-development-mode)
- [Technical architecture](#technical-architecture)
- [License](#license)

## Installation

Make sure you have Python installed on your machine.

First, clone the repository:

```bash
git clone https://github.com/AIxHunter/FileWizardAi.git
```

Navigate to the backend folder and update your `.env` file according to the [documentation](.env.md). Then, install the required packages (preferably in a virtual environment such as venv or conda):

```bash
cd backend
pip install -r requirements.txt
```

## Usage

Run the backend server

```bash
cd backend
uvicorn app.server:app --host localhost --port 8000
```

The app will be available at http://localhost:8000/.
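
To sanity-check the server, note that FastAPI normally exposes interactive API docs at `/docs` unless that has been disabled; for example:

```python
import requests  # assumes the requests package is available

resp = requests.get("http://localhost:8000/docs", timeout=5)
print(resp.status_code)  # 200 means the backend is up and serving
```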

## Run in Development Mode

If you are a developer and want to modify the frontend, you can run the frontend and backend separately. Here is how:

Install Node.js: https://nodejs.org/

Install the Angular CLI:

```bash
npm install -g @angular/cli
```

Run frontend:

```bash
cd frontend
npm install
ng serve
```

The frontend will be available at `http://localhost:4200`.

To package the frontend, run:

```bash
ng build --base-href static/
```

Run backend:

Update your `.env` file with the desired API settings (check the [documentation](.env.md)), then:

```bash
cd backend
uvicorn app.server:app --host localhost --port 8000 --reload
```

## Technical architecture

<img src="filewizardai_architecture.png" alt="FileWizardAi architecture diagram" width="600"/>

1. Send a request from the Angular frontend (e.g., organize files).
2. The backend receives the request through a FastAPI REST API.
3. Check SQLite to see whether the file has already been processed (cached).
4. Return the cached summary if the file was already processed.
5. If the file has not been processed before, send it to the LLM for summarization.
6. Cache the summary in SQLite.
7. Return the summary to the Angular frontend.
8. Display the summary to the user and perform actions if needed (a minimal sketch of this flow follows the list).
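
The endpoint code itself is not part of this excerpt, but steps 3 to 6 can be sketched with the `SQLiteDB` helper added in this commit and a hypothetical `summarize_with_llm` stand-in:

```python
import hashlib
from pathlib import Path

from app.database import SQLiteDB  # helper class added in this commit


def summarize_with_llm(text: str) -> str:
    """Hypothetical stand-in for the real LLM call (not shown in this excerpt)."""
    return text[:100]


def get_summary(file_path: str, db: SQLiteDB) -> str:
    content = Path(file_path).read_bytes()
    file_hash = hashlib.md5(content).hexdigest()  # hashing scheme assumed, not defined in this commit

    # Steps 3-4: reuse the cached summary if this exact file version was processed before
    if db.is_file_exist(file_path, file_hash):
        return db.get_file_summary(file_path)

    # Steps 5-6: otherwise summarize the file and cache the result
    summary = summarize_with_llm(content.decode(errors="ignore"))
    db.insert_file_summary(file_path, file_hash, summary)
    return summary
```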

## License

This project is licensed under the MIT License.
9 changes: 9 additions & 0 deletions backend/.env
# API and MODEL used for documents processing
TEXT_API_END_POINT=https://api.groq.com/openai/v1
TEXT_MODEL_NAME=llama3-70b-8192
TEXT_API_KEYS=["gsk_xxx"]

# API and MODEL used for images processing
IMAGE_API_END_POINT=http://localhost:11434/v1
IMAGE_MODEL_NAME=moondream:latest
IMAGE_API_KEYS=["ollama"] # Required but not used
Empty file added backend/app/__init__.py
63 changes: 63 additions & 0 deletions backend/app/database.py
```python
import sqlite3


class SQLiteDB:
    def __init__(self):
        # Open (or create) the local cache database and make sure the summary table exists
        self.conn = sqlite3.connect('FileWizardAi.db')
        self.cursor = self.conn.cursor()
        create_table_query = (
            "CREATE TABLE IF NOT EXISTS files_summary ("
            "file_path TEXT PRIMARY KEY,"
            "file_hash TEXT NOT NULL,"
            "summary TEXT)"
        )
        self.cursor.execute(create_table_query)
        self.conn.commit()

    def select(self, table_name, where_clause=None):
        # Generic helper: fetch all rows from a table, optionally filtered
        sql = f"SELECT * FROM {table_name}"
        if where_clause:
            sql += f" WHERE {where_clause}"
        self.cursor.execute(sql)
        return self.cursor.fetchall()

    def is_file_exist(self, file_path, file_hash):
        # True if this exact path/hash pair is already cached (file unchanged since last run)
        self.cursor.execute(
            "SELECT * FROM files_summary WHERE file_path = ? AND file_hash = ?",
            (file_path, file_hash),
        )
        file = self.cursor.fetchone()
        return bool(file)

    def insert_file_summary(self, file_path, file_hash, summary):
        # Upsert: update the summary if the path is already known, otherwise insert a new row
        c = self.conn.cursor()
        c.execute("SELECT * FROM files_summary WHERE file_path=?", (file_path,))
        row_exists = c.fetchone()

        if row_exists:
            c.execute(
                "UPDATE files_summary SET file_hash=?, summary=? WHERE file_path=?",
                (file_hash, summary, file_path),
            )
        else:
            c.execute(
                "INSERT INTO files_summary (file_path, file_hash, summary) VALUES (?, ?, ?)",
                (file_path, file_hash, summary),
            )
        self.conn.commit()

    def get_file_summary(self, file_path):
        self.cursor.execute("SELECT summary FROM files_summary WHERE file_path = ?", (file_path,))
        result = self.cursor.fetchone()
        return result[0] if result else None

    def drop_table(self):
        self.cursor.execute("DROP TABLE IF EXISTS files_summary")
        self.conn.commit()

    def get_all_files(self):
        self.cursor.execute("SELECT file_path FROM files_summary")
        results = self.cursor.fetchall()
        files_path = [row[0] for row in results]
        return files_path

    def update_file(self, old_file_path, new_file_path, new_hash):
        # Keep the cache in sync after a file has been moved or renamed
        self.cursor.execute(
            "UPDATE files_summary SET file_path = ?, file_hash = ? WHERE file_path = ?",
            (new_file_path, new_hash, old_file_path),
        )
        self.conn.commit()

    def delete_records(self, file_paths):
        placeholders = ",".join("?" * len(file_paths))
        self.cursor.execute(
            f"DELETE FROM files_summary WHERE file_path IN ({placeholders})", file_paths
        )
        self.conn.commit()

    def close(self):
        self.conn.close()
```